Summary
CloudWatch is reporting what looks to be a memory leak in my ECS task. MemoryUtilization has been rising continually since the last deployment and currently sits at 330% with no sign of stopping.
Container Insights corroborates this, reporting that my app container is using 990MB.
However, memory usage on the entire host is only 441MB and has been stable. So the number ECS is reporting cannot be accurate.
[ec2-user@ip-10-0-0-108 ~]$ free -m
total used free shared buff/cache available
Mem: 1954 441 247 10 1264 1329
Swap: 0 0 0
What's happening is that MemoryUtilization is including kernel slabs, notably dentry. Every time a file is created, information is saved in the dentry cache, but that entry is not cleared when the file is deleted. So for applications like mine that create many short-lived files, the dentry cache can inflate to a massive size.
This unfortunately makes MemoryUtilization meaningless and leaves me with no insight into the memory usage of my containers.
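If it helps with reproduction, something along these lines (a rough sketch; the file names and loop count are arbitrary, and it assumes a cgroup v1 host like the one described below) should make the dentry slab balloon inside any container:
# Run inside the container: create and immediately delete many short-lived files.
for i in $(seq 1 100000); do
    f="/tmp/short-lived-$i"
    touch "$f" && rm -f "$f"
done
# The deleted files leave dentry objects behind in the container's memory cgroup:
grep dentry /sys/fs/cgroup/memory/memory.kmem.slabinfo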
Description
As mentioned above, Container Insights reports 990MB.
@timestamp: 1677516240000
ClusterName: pos
ContainerInstanceId: 6963edae9ae74236a5127d57bba779ad
ContainerKnownStatus: RUNNING
ContainerName: app
CpuReserved: 0.0
CpuUtilized: 39.30253030341968
EC2InstanceId: i-0be30281ecc6ad325
Image: [REDACTED]
MemoryReserved: 256
MemoryUtilized: 990
NetworkRxBytes: 67874
NetworkRxDropped: 0
NetworkRxErrors: 0
NetworkRxPackets: 89263702
NetworkTxBytes: 55667
NetworkTxDropped: 0
NetworkTxErrors: 0
NetworkTxPackets: 81569942
ServiceName: exterminator
StorageReadBytes: 3691008
StorageWriteBytes: 90112
TaskDefinitionFamily: exterminator
TaskDefinitionRevision: 29
TaskId: 5077b121b4754591a8665be324f83e6c
Timestamp: 1677516240000
Type: Container
Docker stats also reports this. (It shows 1008MiB because it was run a few hours later.)
[ec2-user@ip-10-0-0-108 ~]$ docker stats --no-stream
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
28d2df0aa8e0 ecs-exterminator-29-proxy-f6ddeed6fffdaca97200 0.64% 18.99MiB / 1.908GiB 0.97% 63.3GB / 66.6GB 456kB / 0B 2
f25307353a3b ecs-exterminator-29-app-acd3b998add1b1a88801 4.50% 1008MiB / 1.908GiB 51.56% 60.4GB / 51.2GB 4.53MB / 90.1kB 16
e42429bee55a ecs-exterminator-29-forwarder-82fcccd1ea82f1c95a00 0.81% 57.24MiB / 1.908GiB 2.93% 3.13GB / 2.18GB 545kB / 0B 12
09f811287d6e ecs-agent 0.17% 17.32MiB / 1.908GiB 0.89% 0B / 0B 118MB / 6.36MB 12
However, host memory use is only 441MB.
[ec2-user@ip-10-0-0-108 ~]$ free -m
total used free shared buff/cache available
Mem: 1954 441 247 10 1264 1329
Swap: 0 0 0
If we look into the container's memory.stat, we can see RSS is about 158MB (about what I would expect), with cache, inactive_file, and the other fields showing modest amounts that would not account for the discrepancy.
[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.stat
cache 2899968
rss 158834688
rss_huge 0
shmem 0
mapped_file 135168
dirty 135168
writeback 0
swap 0
pgpgin 7420380
pgpgout 7380871
pgfault 77616
pgmajfault 33
inactive_anon 0
active_anon 158777344
inactive_file 2658304
active_file 286720
unevictable 0
hierarchical_memory_limit 9223372036854771712
hierarchical_memsw_limit 9223372036854771712
total_cache 2899968
total_rss 158834688
total_rss_huge 0
total_shmem 0
total_mapped_file 135168
total_dirty 135168
total_writeback 0
total_swap 0
total_pgpgin 7420380
total_pgpgout 7380871
total_pgfault 77616
total_pgmajfault 33
total_inactive_anon 0
total_active_anon 158777344
total_inactive_file 2658304
total_active_file 286720
total_unevictable 0
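As a quick sanity check, plugging the numbers from the memory.stat output above into shell arithmetic:
# rss + cache from memory.stat: roughly what the app is actually consuming
echo $(( (158834688 + 2899968) / 1024 / 1024 ))   # = 154 (MiB)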
memory.usage_in_bytes shows a very large value. I believe ECS takes usage_in_bytes - cache, so that's where our inflated value is coming from.
[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.usage_in_bytes
1059020800
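That is consistent with the numbers above: subtracting memory.stat's cache from usage_in_bytes (a sketch of the subtraction I believe ECS is doing) lands right on what docker stats showed:
# usage_in_bytes - cache, converted to MiB
echo $(( (1059020800 - 2899968) / 1024 / 1024 ))   # = 1007, i.e. the ~1008MiB docker stats reported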
If we look at kmem use, we can see that it's extremely high, which I believe accounts for the discrepancy.
[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
897204224
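Subtracting that kernel memory from usage_in_bytes gets us back to roughly the rss + cache figure from memory.stat, so kmem does appear to account for the entire discrepancy:
# usage_in_bytes - kmem.usage_in_bytes, converted to MiB
echo $(( (1059020800 - 897204224) / 1024 / 1024 ))   # = 154, matching rss + cache above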
And if we break that down we can see that dentry is absolutely massive.
[ec2-user@ip-10-0-0-108 ~]$ docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-96 42 42 96 42 1 : tunables 0 0 0 : slabdata 1 1 0
radix_tree_node 151 168 584 14 2 : tunables 0 0 0 : slabdata 12 12 0
xfs_inode 259 306 960 17 4 : tunables 0 0 0 : slabdata 18 18 0
kmalloc-64 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0
kmalloc-8 1024 1024 8 512 1 : tunables 0 0 0 : slabdata 2 2 0
ovl_inode 532 805 696 23 4 : tunables 0 0 0 : slabdata 35 35 0
kmalloc-1024 32 32 1024 16 4 : tunables 0 0 0 : slabdata 2 2 0
kmalloc-192 42 42 192 21 1 : tunables 0 0 0 : slabdata 2 2 0
inode_cache 26 26 616 13 2 : tunables 0 0 0 : slabdata 2 2 0
mqueue_inode_cache 0 0 960 17 4 : tunables 0 0 0 : slabdata 0 0 0
pid 64 64 128 32 1 : tunables 0 0 0 : slabdata 2 2 0
signal_cache 32 32 1024 16 4 : tunables 0 0 0 : slabdata 2 2 0
sighand_cache 30 30 2112 15 8 : tunables 0 0 0 : slabdata 2 2 0
files_cache 46 46 704 23 4 : tunables 0 0 0 : slabdata 2 2 0
task_struct 23 23 11520 1 4 : tunables 0 0 0 : slabdata 23 23 0
sock_inode_cache 69 69 704 23 4 : tunables 0 0 0 : slabdata 3 3 0
kmalloc-512 32 32 512 16 2 : tunables 0 0 0 : slabdata 2 2 0
kmalloc-256 32 32 256 16 1 : tunables 0 0 0 : slabdata 2 2 0
mm_struct 32 32 2048 16 8 : tunables 0 0 0 : slabdata 2 2 0
shmem_inode_cache 66 66 728 22 4 : tunables 0 0 0 : slabdata 3 3 0
proc_inode_cache 92 92 688 23 4 : tunables 0 0 0 : slabdata 4 4 0
dentry 4582515 4582515 192 21 1 : tunables 0 0 0 : slabdata 218215 218215 0
vm_area_struct 1860 1860 200 20 1 : tunables 0 0 0 : slabdata 93 93 0
cred_jar 210 210 192 21 1 : tunables 0 0 0 : slabdata 10 10 0
anon_vma 780 780 104 39 1 : tunables 0 0 0 : slabdata 20 20 0
anon_vma_chain 1536 1536 64 64 1 : tunables 0 0 0 : slabdata 24 24 0
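A back-of-the-envelope calculation from the dentry line above (218215 slabs × 1 page per slab, assuming 4KiB pages) shows it accounts for nearly all of the kernel memory:
# approximate memory held by the dentry slab, in MiB
echo $(( 218215 * 1 * 4096 / 1024 / 1024 ))   # = 852, i.e. nearly all of the ~855MiB of kmem usage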
And finally, if we clear the caches (echo 3 | sudo tee /proc/sys/vm/drop_caches), MemoryUtilization drops from several hundred percent to about 70%, proving that it is indeed a kernel cache that is inflating the metric.
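For anyone wanting to verify this on their own instance, a before/after check along these lines (same container name as above; echo 2 frees only reclaimable slab objects such as dentries and inodes, echo 3 also drops the page cache) should show the kmem number collapse:
# kernel memory charged to the container before dropping caches
docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes
# reclaim dentries/inodes and the page cache on the host
echo 3 | sudo tee /proc/sys/vm/drop_caches
# re-read the same file after reclaim
docker exec ecs-exterminator-29-app-acd3b998add1b1a88801 cat /sys/fs/cgroup/memory/memory.kmem.usage_in_bytes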
Environment Details
t3.small running Amazon Linux 2 (amzn2-ami-ecs-hvm-2.0.20230214-x86_64-ebs ami-0ae546d2dd33d2039), ECS Agent 1.68.2
(This was initially observed on Fargate but I switched to EC2 to facilitate debugging.)
docker info output:
Client:
Context: default
Debug Mode: false
Server:
Containers: 4
Running: 4
Paused: 0
Stopped: 0
Images: 6
Server Version: 20.10.17
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
runc version: 5fd4c4d144137e991c4acebb2146ab1483a97925
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.14.301-224.520.amzn2.x86_64
Operating System: Amazon Linux 2
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.908GiB
Name: ip-10-0-0-108.us-west-2.compute.internal
ID: 6L62:MSTB:SIPL:L65U:PCMR:2FYH:JZR7:MRMS:IP53:56ZU:NQUH:4FNK
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Prior art
- In researching this I found someone who encountered this exact issue and wrote up a medium article about it: https://medium.com/@bobzsj87/demist-the-memory-ghost-d6b7cf45dd2a
- Container memory stats include filesystem cache usage #280 reported unusual memory usage; it was fixed in exclude cache value from container memory utilization metric #582 by subtracting memory.stat.cache
- A similar issue was reported for the Docker CLI: Exclude inode, dentry and other slabs from MEM USAGE (docker/cli#3171)
