Docker Container Ate All My VPS RAM: How to Diagnose and Fix OOM Kills

Your Docker container was running fine for weeks. Then it started restarting every few hours. You check the logs and see nothing useful. You run docker ps and the container status flips between Up and Restarting. The only clue is exit code 137, which means the process was killed with SIGKILL. In most cases on a VPS, that means the Linux Out-Of-Memory (OOM) killer stepped in.

This article walks through the exact diagnostic workflow to confirm an OOM kill, find the process that triggered it, and fix the root cause. No guesses. No restarting the entire VPS and hoping for the best.

What exit code 137 actually means

Exit code 137 is the sum of two numbers: 128 (the base for signal-terminated processes) plus 9 (SIGKILL). When Docker reports this code, it means the container's main process was forcefully terminated by something that sent a SIGKILL signal.
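The arithmetic above can be checked in a shell. This is a minimal sketch; the function name decode_exit_code is made up for illustration, and it relies only on the shell's built-in kill -l to turn a signal number into its name.

```shell
#!/usr/bin/env bash
# Decode an exit code into the signal that ended the process.
# By convention, codes above 128 mean "terminated by signal (code - 128)".
decode_exit_code() {
  local code="$1"
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    # kill -l N prints the name of signal number N (9 -> KILL, 15 -> TERM)
    echo "signal ${sig} ($(kill -l "$sig"))"
  else
    echo "exited on its own with status ${code}"
  fi
}

# Example: decode_exit_code 137
```

The same subtraction explains exit code 143, which is 128 plus 15 (SIGTERM), the signal docker stop sends first.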

There are two common causes:

The Linux OOM killer killed the process because the system ran out of RAM and the kernel needed to free memory to keep the OS alive.
Someone or something ran docker kill or kill -9 on the process manually.

If you did not manually kill the container, it is almost certainly the OOM killer. The OOM killer is a Linux kernel mechanism, not a Docker feature. It exists in every Linux distribution and activates when the kernel cannot satisfy a memory allocation request. Its job is to pick a process and terminate it to free RAM.

On a VPS with no swap, or with swap already full, the OOM killer has fewer options. It gives every process a badness score (visible in /proc/PID/oom_score) and kills the one with the highest score, which is usually the process using the most memory. That process could be your database, your reverse proxy, or even the Docker daemon itself.

Step 1: Confirm it was an OOM kill

Do not assume. Verify. Run three commands in this order.

Check docker inspect

docker inspect CONTAINER_NAME --format='{{.State.OOMKilled}} {{.State.ExitCode}}'

If the output is true 137, Docker confirms the container was killed by the OOM killer. If it is false 137, something else sent SIGKILL. Check whether a monitoring script, a CI/CD pipeline, or a person ran docker kill.
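To run the same check across many containers, the two fields can be fed into a small helper. This is only a sketch: classify_state and its three labels are hypothetical names, and the docker commands in the usage comment are the same inspect call shown above.

```shell
#!/usr/bin/env bash
# Classify the two fields printed by:
#   docker inspect NAME --format='{{.State.OOMKilled}} {{.State.ExitCode}}'
classify_state() {
  local oom_killed="$1" exit_code="$2"
  if [ "$oom_killed" = "true" ]; then
    echo "oom-kill"
  elif [ "$exit_code" = "137" ]; then
    echo "sigkill-from-elsewhere"
  else
    echo "other"
  fi
}

# Usage (requires Docker), covering stopped containers too:
#   docker ps -a --format '{{.Names}}' | while read -r name; do
#     docker inspect "$name" --format '{{.State.OOMKilled}} {{.State.ExitCode}}' |
#       { read -r oom code; echo "$name: $(classify_state "$oom" "$code")"; }
#   done
```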

Check the kernel log

sudo dmesg | grep -i "killed process"

The kernel logs every OOM kill. Look for a line like this:

[123456.789012] Out of memory: Killed process 12345 (postgres) total-vm:1048576kB, anon-rss:512000kB, file-rss:0kB, shmem-rss:0kB

This tells you exactly which process was killed, its PID, and how much memory it was using. If you see multiple entries with different PIDs, the OOM killer ran more than once, which means your VPS is chronically under memory pressure.
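When there are many entries, a short filter makes them easier to compare. The sketch below assumes the log lines follow the format shown above; summarize_oom_kills is a hypothetical helper name, and other kernel versions may word the line differently.

```shell
#!/usr/bin/env bash
# Read kernel log lines on stdin and print PID, process name and anon-rss in MB
# for every "Killed process" entry. Lines that do not match are ignored.
summarize_oom_kills() {
  sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \3 \2/p' |
    while read -r pid rss_kb name; do
      # The name is read last so that a process name containing spaces stays intact.
      echo "pid=${pid} name=${name} anon_rss_mb=$((rss_kb / 1024))"
    done
}

# Usage: sudo dmesg | summarize_oom_kills
```

For the sample line above this prints pid=12345 name=postgres anon_rss_mb=500, since 512000 kB divided by 1024 is 500 MB.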

Check docker stats

docker stats --no-stream

This shows the current memory and CPU usage for all running containers. Look for a container with a high MEM % value. Because --no-stream takes a single snapshot, run the command several times to spot a container that keeps growing. If the limit half of the MEM USAGE / LIMIT column shows the total host RAM for every container, no memory limits are set, and a single runaway container can exhaust the whole host.
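A container's configured limit can also be read directly from docker inspect, where the HostConfig.Memory field holds the limit in bytes and is 0 when none is set. The helper below is a sketch with a hypothetical name (describe_limit) that only makes that number readable.

```shell
#!/usr/bin/env bash
# Turn the byte count printed by
#   docker inspect NAME --format '{{.HostConfig.Memory}}'
# into a readable label. Docker reports 0 when no limit is configured.
describe_limit() {
  local bytes="$1"
  if [ "$bytes" -eq 0 ]; then
    echo "no limit"
  else
    echo "$((bytes / 1024 / 1024)) MiB"
  fi
}

# Usage (requires Docker):
#   describe_limit "$(docker inspect app --format '{{.HostConfig.Memory}}')"
```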

Step 2: Find what is consuming the memory

Once you confirm an OOM kill, the next question is: what process inside the container consumed all the RAM? Here are the most common culprits on self-hosted VPS setups.

Machine learning inference containers

Containers that run AI models load the entire model into RAM, either at startup or on the first request. Ollama, Immich machine learning, and similar services can consume 2-8 GB of RAM depending on the model. If your VPS has 2 GB total, one of these containers will trigger an OOM kill as soon as the model is loaded. The Immich self-hosting guide notes that the machine learning container alone needs 2-4 GB of RAM for facial recognition and AI search.

Databases without connection limits

PostgreSQL, MariaDB, and Redis all allocate memory per connection. A default PostgreSQL configuration accepts up to 100 connections, each consuming a few megabytes. The total adds up quickly. If your app opens a connection pool of 100 connections on a 2 GB VPS, the database alone can trigger an OOM kill.

Java applications with default heap sizes

Java applications often default to a heap size of 25% of total system RAM. On a 4 GB VPS, that is 1 GB just for the JVM heap, plus native memory for threads, metaspace, and off-heap buffers. If you run multiple Java services, they collectively allocate more memory than the host has.
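The 25% figure adds up quickly across services. As a rough sketch, assuming every JVM uses the one-quarter default described above and ignoring native memory entirely, the combined heap ceiling can be estimated like this (jvm_default_heap_mb is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Estimate the combined default maximum heap of several JVMs on one host,
# assuming each one defaults to a quarter of total RAM. Native memory
# (threads, metaspace, off-heap buffers) comes on top and is not counted.
jvm_default_heap_mb() {
  local total_mb="$1" jvm_count="$2"
  echo $(( total_mb / 4 * jvm_count ))
}

# Example: three Java services on a 4 GB VPS
#   jvm_default_heap_mb 4096 3
```

Three services on a 4 GB host can therefore grow to 3072 MB of heap between them before anything else on the machine is counted.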

Build processes during `docker compose up`

Building a Docker image from source, especially Node.js or Python projects, can spike RAM usage significantly. npm install, pip install, and compilation steps all run as part of the build, on the same host as everything else. If your docker compose up --build runs on the same VPS as your production containers, the build process can cause a running database or app to be OOM-killed. The Coolify guide specifically warns about this: a Nixpacks build on a 2 GB VPS will OOM-kill itself before it finishes.

Document processing pipelines

OCR engines like Tesseract and document parsers used by Paperless-ngx load entire pages into memory. Processing a large PDF batch can spike RAM usage for minutes. The Paperless-ngx setup guide recommends at least 2 GB of RAM for the stack, and that is before you throw a 100-page scanned document at it.

Step 3: The immediate fix

There are three ways to stop OOM kills, ordered from fastest to most robust.

Option A: Add memory limits to the container

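Memory limits are hard caps. If a container exceeds its limit, the kernel kills a process inside that container only. The rest of the system stays alive. Add this to your docker-compose.yml (the deploy.resources syntax is honored by Docker Compose v2; the legacy docker-compose v1 binary needs the --compatibility flag, or the service-level mem_limit key instead):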

services:
  app:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: "1.0"

Adjust the values based on what the service actually needs. A PostgreSQL database with a small dataset might need 1 GB. A Node.js API might run fine on 256 MB. The key is to set a limit that is lower than what would trigger a system-wide OOM kill.

Without limits, a container with a memory leak will consume all available RAM. The kernel OOM killer then steps in and may kill a different, more important process. Memory limits confine the blast radius to the offending container.
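If recreating the container is not an option right now, docker update can apply a cap to a running container. The wrapper below is a sketch: cap_container_memory and the DRY_RUN variable are hypothetical, and setting --memory-swap equal to --memory is one possible choice that leaves the container no swap beyond its cap.

```shell
#!/usr/bin/env bash
# Build a `docker update` command that caps a running container's memory.
# With DRY_RUN=1 the command is only printed; otherwise it is executed.
cap_container_memory() {
  local name="$1" limit="$2"
  # --memory-swap equal to --memory means no extra swap on top of the cap.
  set -- docker update --memory "$limit" --memory-swap "$limit" "$name"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: DRY_RUN=1 cap_container_memory app 512m
```

A limit set this way lives on the container, not in docker-compose.yml, so it is lost when Compose recreates the service. Treat it as a stopgap until the file is updated.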

Option B: Add swap space

Swap is not a replacement for RAM, but it buys time. When RAM fills up, the kernel moves inactive memory pages to disk. This prevents immediate OOM kills during memory spikes, such as during a Docker build or a large document import.

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Make it permanent by adding to /etc/fstab:

/swapfile none swap sw 0 0

On a VPS with SSD storage, 2 GB of swap is usually enough to handle spikes. Do not rely on swap for sustained workloads. If a container is constantly swapping, it needs more RAM, not more swap.
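To verify that the swap file is actually active, the SwapTotal field in /proc/meminfo can be read. The following is a small sketch; swap_total_mib is a hypothetical helper, and the optional file argument exists only so it can be tried against a saved copy.

```shell
#!/usr/bin/env bash
# Print total configured swap in MiB.
# Reads /proc/meminfo by default, or the file given as the first argument.
swap_total_mib() {
  awk '/^SwapTotal:/ {print int($2 / 1024)}' "${1:-/proc/meminfo}"
}

# Usage: swap_total_mib     (prints 0 when no swap is enabled)
```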

Option C: Upgrade the VPS

If you have already set memory limits and added swap, but containers still hit their caps, the VPS itself is undersized for the workload. This is common when self-hosting grows: you start with one service, add a second, then a third, and suddenly a 2 GB VPS cannot handle the stack.

A ServerSpan KVM VPS can be resized without data loss. Moving from a 2 GB plan to a 4 GB plan gives you enough headroom for a typical self-hosted stack: reverse proxy, database, one or two applications, and the OS itself. The alternative is constant firefighting with memory limits and swap tuning.

Step 4: Tune the application inside the container

Memory limits and swap treat the symptom. To fix the root cause, tune the application so it uses less memory.

PostgreSQL: reduce max connections

max_connections = 50
shared_buffers = 128MB

The default max_connections = 100 is excessive for a small VPS. Each connection consumes a few megabytes. On a 2 GB VPS, 50 connections is a safer ceiling.

Redis: set maxmemory

maxmemory 256mb
maxmemory-policy allkeys-lru

Without a maxmemory limit, Redis will grow until the OOM killer intervenes. The allkeys-lru policy evicts the least recently used keys when the limit is reached.

Java: set explicit heap size

-Xmx512m -Xms256m

These flags cap the JVM heap at 512 MB and set the initial heap to 256 MB. Without them, the JVM defaults to a maximum heap of 25% of the RAM it can see, which competes with other containers.

Nextcloud: tune PHP-FPM pool

Nextcloud on Docker often ships with aggressive PHP-FPM defaults. A pool of 50 workers on a 2 GB VPS will exhaust memory under load. The Nextcloud stability guide covers the exact worker count, OPcache settings, and memory limits that keep the stack alive past the first week.

Step 5: Set up monitoring so you see it coming

OOM kills should never be a surprise. Set up monitoring that alerts you before RAM runs out.

Monitor container memory with docker stats

watch -n 5 docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"

This refreshes every 5 seconds and shows which containers are growing. If one container's memory percentage climbs steadily, you have a leak.
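For unattended checks, the same docker stats output can be filtered instead of watched. A minimal sketch, assuming the input is one "name percent" pair per line; flag_high_mem is a hypothetical helper name and 80 is an arbitrary example threshold.

```shell
#!/usr/bin/env bash
# Read "NAME PERCENT%" pairs on stdin and print the names of containers
# whose memory percentage is at or above the threshold given as $1.
flag_high_mem() {
  awk -v limit="$1" '{
    pct = $2
    sub(/%/, "", pct)              # strip the trailing percent sign
    if (pct + 0 >= limit + 0) print $1
  }'
}

# Usage (requires Docker):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | flag_high_mem 80
```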

Monitor host memory with free

free -h

Check the available column, not just free. Linux uses free RAM for buffers and cache. The available number is what matters for new allocations. If it drops below 10% of total RAM, you are in the danger zone.

Set up a simple alert script

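#!/bin/bash
# Percentage of total RAM still available (column 7 of the Mem: row in free).
AVAILABLE=$(free | awk '/^Mem:/ {printf "%.0f", $7/$2 * 100}')
# Write a warning to syslog when less than 15% is available.
if [ "$AVAILABLE" -lt 15 ]; then
  echo "WARNING: Only ${AVAILABLE}% RAM available on $(hostname)" | logger
fi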

Run this via cron every minute. It logs a warning when available RAM drops below 15%. You can extend it to send an email or webhook notification.
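The column layout of free has changed between procps versions, so a variant that reads /proc/meminfo directly may be more portable. This is a sketch rather than a drop-in replacement: mem_available_pct is a hypothetical function, and it assumes a kernel that exposes the MemAvailable field.

```shell
#!/usr/bin/env bash
# Print available RAM as a whole-number percentage of total RAM.
# Reads /proc/meminfo by default, or the file given as the first argument.
mem_available_pct() {
  awk '/^MemTotal:/     {total = $2}
       /^MemAvailable:/ {avail = $2}
       END { if (total > 0) printf "%.0f\n", avail / total * 100 }' "${1:-/proc/meminfo}"
}

# Usage in the alert script:
#   AVAILABLE=$(mem_available_pct)
```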

The OOM kill decision tree

Use this flow when a container crashes:

Run docker inspect. Is OOMKilled true? If no, check for manual kills or scripts.
Run sudo dmesg | grep -i "killed process". Which process was killed and how much RAM was it using?
Run docker stats. Which container is the largest consumer?
Check if the container has a memory limit. If not, add one immediately.
Check if the VPS has swap. If not, add 1-2 GB.
Check application settings: connection limits, heap size, worker count.
If all limits are set and the app is tuned but it still hits the cap, the VPS is too small.
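The flow above can be condensed into a function that returns the next action. This is only an illustration of the ordering: next_step and its yes/no arguments are hypothetical, and the answers still have to come from the commands in the earlier steps.

```shell
#!/usr/bin/env bash
# Return the next action from the decision tree.
# Arguments: OOMKilled (true/false), has memory limit (yes/no),
#            host has swap (yes/no), application already tuned (yes/no).
next_step() {
  local oom_killed="$1" has_limit="$2" has_swap="$3" app_tuned="$4"
  if [ "$oom_killed" != "true" ]; then
    echo "check for manual kills or scripts"
  elif [ "$has_limit" != "yes" ]; then
    echo "add a memory limit"
  elif [ "$has_swap" != "yes" ]; then
    echo "add 1-2 GB of swap"
  elif [ "$app_tuned" != "yes" ]; then
    echo "tune application settings"
  else
    echo "resize the VPS"
  fi
}

# Example: next_step true yes no no
```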

What NOT to do

Do not disable the OOM killer. It exists to keep the OS alive. Without it, the system can hang or panic when RAM runs out.
Do not set all containers to restart: always without memory limits. A container that is OOM-killed every 30 seconds will restart forever, consuming CPU and disk I/O for log writes.
Do not add 10 GB of swap on a VPS with 2 GB of RAM. Heavy swapping turns a fast SSD into a bottleneck. The system becomes unresponsive without technically crashing.
Do not ignore exit code 137. It is not a "Docker bug." It is a signal that your workload exceeds your resources.

Prevention checklist

Set memory limits on every production container.
Set CPU limits on containers that spike during builds or batch processing.
Tune database connection limits and buffer sizes to match VPS RAM.
Set explicit heap sizes on Java applications.
Add 1-2 GB of swap for memory spike protection.
Monitor docker stats after every new deployment.
Leave 20-30% of host RAM unallocated for the OS and Docker overhead.
Test builds on a staging instance before running them on the production VPS.
Back up your data before resizing or adding swap. The BorgBackup vs Restic comparison explains which backup tool handles memory pressure during prune jobs without killing your VPS.
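The headroom rule in the checklist can be checked with simple arithmetic before a deployment. A minimal sketch, assuming limits are given in MB and reserving 20% of host RAM (the lower end of the range above); check_budget is a hypothetical helper.

```shell
#!/usr/bin/env bash
# check_budget TOTAL_MB LIMIT_MB...
# Sum the per-container memory limits and compare them with 80% of host RAM,
# leaving the remaining 20% for the OS and Docker overhead.
check_budget() {
  local total_mb="$1"
  shift
  local sum=0 limit
  for limit in "$@"; do
    sum=$((sum + limit))
  done
  local budget=$((total_mb * 80 / 100))
  if [ "$sum" -le "$budget" ]; then
    echo "ok: ${sum} of ${budget} MB budget"
  else
    echo "over: ${sum} exceeds ${budget} MB budget"
  fi
}

# Example: a 4 GB host with limits of 1024, 512 and 256 MB
#   check_budget 4096 1024 512 256
```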

When the fix is a bigger VPS

There is a point where tuning stops helping. If you run Immich with machine learning, Nextcloud with preview generation, Paperless-ngx with OCR, and a PostgreSQL database on the same 2 GB VPS, no amount of memory limits will make that setup stable. The aggregate RAM requirement exceeds the hardware.

A ServerSpan KVM VPS in the ct.Steady tier (4 Core, 4 GB RAM, 50 GB SSD) is the minimum for a multi-service self-hosted stack that includes ML inference or heavy document processing. If you are consistently hitting memory limits across multiple containers, resizing the VPS is the only correct fix.

The cost of a VPS upgrade is lower than the cost of downtime, data corruption from ungraceful kills, and the hours spent firefighting memory issues. Diagnose first, tune second, and scale third.

For broader context on keeping Docker infrastructure healthy beyond just memory management, the Docker 29 incident analysis explains why blind auto-updates, missing healthchecks, and untested version bumps create the same class of production problems.

Source & Attribution

This article is based on original data belonging to serverspan.com blog. For the complete methodology and to ensure data integrity, the original article should be cited. The canonical source is available at: Docker Container Ate All My VPS RAM: How to Diagnose and Fix OOM Kills.
