Azure AI Translator Container Memory Leak Symptoms Causes and Solutions Inquiry

Question

BrityMeeting 공용계정 0

We configured AI Translator Container on 4 servers a month ago (June 2025).

There is no problem with the translation itself, but we have confirmed that there is a memory leak as shown in the image below.

I would like to ask how to identify the cause of the memory leak and resolve it.

How to configure
- Install Docker on each of the 4 servers and install the AI Translator container image on them.
Pages referenced during configuration
- https://v4.hkg1.meaqua.org/en-us/azure/ai-services/translator/containers/install-run?tabs=connected
- https://v4.hkg1.meaqua.org/en-us/azure/ai-services/translator/containers/configuration
Spec.
- Server OS : UBUNTU 22.04
- CPU : 24 core
- Memory : 48 GB
- Azure AI Translator container image version : 1.0.03051.790-amd64
- Docker engine ver. 28.2.1
Run command
docker run -d --rm -it -p 5000:5000 --memory 46g --cpus 23.9 \
  --name cpaas_trans \
  --log-driver=syslog \
  --log-opt tag="container" \
  -v /mnt/d/TranslatorContainer:/usr/local/models \
  -e apikey=<<API_KEY>> \
  -e eula=accept \
  -e billing=<<BILLING_URL>> \
  -e Languages=ko,en,ja,zh-Hans,zh-Hant,es,ru,vi,de,fr,it,pt,hu,pl \
  mcr.microsoft.com/azure-cognitive-services/translator/text-translation:${TAG}
Memory Leak graph (grafana)

1 answer

Answer 1

Hello! Thank you for posting on Microsoft Learn Q&A.

First, verify that this is real RSS growth in the translator process and not page cache or host noise. Run the following on each host/container:

# live view
docker stats cpaas_trans

# inside the container: process 1 is the service
docker exec cpaas_trans sh -c 'awk "/VmRSS|VmSize/" /proc/1/status; echo "---"; grep -E "MemAvailable|Cached" /proc/meminfo; echo "---"; head -100 /proc/1/smaps_rollup'

# per-region view (good for spotting growing C/C++ heap mappings); smaps_rollup above only gives totals
docker exec cpaas_trans sh -c 'cat /proc/1/smaps'

If VmRSS tracks your Grafana line while Cached on the host does not, the growth is in the process's own memory (most likely heap) rather than page cache.
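
To line the readings up against the Grafana graph, the same check can be repeated at a fixed interval. The following is a minimal sketch, assuming the container name cpaas_trans from the question and that PID 1 is the translator service; the helper names (rss_kib, sample_rss) and the CSV file name are hypothetical.

```shell
#!/bin/sh
# Sketch: log the container's VmRSS at a fixed interval for comparison with Grafana.
# Assumptions: container "cpaas_trans", PID 1 is the service, output file name is hypothetical.

# Read /proc/<pid>/status-formatted text on stdin and print the VmRSS value in kB.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }'
}

# Append "epoch_seconds,rss_kib" lines to a CSV file until interrupted.
sample_rss() {
    container=$1; out=$2; interval=${3:-60}
    while :; do
        rss=$(docker exec "$container" cat /proc/1/status | rss_kib)
        printf '%s,%s\n' "$(date +%s)" "$rss" >> "$out"
        sleep "$interval"
    done
}

# Usage (runs until Ctrl-C):
#   sample_rss cpaas_trans rss_cpaas_trans.csv 60
```

A steadily rising second column over several hours, independent of traffic, would support the leak hypothesis; a sawtooth that returns to a baseline would not.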

Next, rule out the following easy causes:

- cgroup limits: you set --memory 46g on a 48 GB host. Leave at least 6–8 GB of headroom for the OS and page cache:
```
  docker run ... --memory 38g --memory-swap 38g --memory-reservation 30g ...
```
(Setting --memory-swap to the same value as --memory gives the container no swap, which prevents silent ballooning. Docker ignores --memory-swap 0 and treats it as unset.)
- logging: you’re using --log-driver=syslog; confirm that syslog rotation is configured on the host so logs don’t add memory or disk I/O pressure
- concurrency: check whether bursts of concurrent requests correlate with small step increases in RSS
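
After restarting a container with different limits, it may help to confirm what Docker actually recorded. This is a small sketch, assuming the container name cpaas_trans from the question; bytes_to_gib and show_limits are hypothetical helper names, and the values come from the HostConfig section of docker inspect.

```shell
#!/bin/sh
# Sketch: show the memory limits Docker recorded for a container.
# Assumption: container "cpaas_trans" as in the question; helper names are hypothetical.

# Convert a byte count to whole GiB. Non-positive values (for example -1 for
# unlimited swap) are printed unchanged.
bytes_to_gib() {
    if [ "$1" -le 0 ]; then
        printf '%s\n' "$1"
    else
        printf '%sGiB\n' "$(( $1 / 1073741824 ))"
    fi
}

# Print memory, memory+swap, and reservation for the given container.
show_limits() {
    docker inspect --format \
        '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}} {{.HostConfig.MemoryReservation}}' "$1" |
    while read -r mem memswap reserve; do
        echo "memory=$(bytes_to_gib "$mem") memory+swap=$(bytes_to_gib "$memswap") reservation=$(bytes_to_gib "$reserve")"
    done
}

# Usage:
#   show_limits cpaas_trans
```

If memory and memory+swap print the same value, the container has no swap available to it.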

Then lower the limits on one node: --memory 38g --memory-swap 38g --cpus 16.

Spin up one canary with -e Languages=en,pt only and mirror 10% of traffic to it for 48 h.

Add an hourly dump of /proc/1/smaps_rollup to quantify where memory grows. On the clients, cap HTTP keep-alive lifetime and idle connection pools. Also try a newer image tag on a separate canary.
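
The hourly smaps_rollup dump could be sketched as follows. It assumes the container name cpaas_trans from the question; the dump directory and the helper names (dump_rollup, rollup_diff) are hypothetical, and the scheduling (cron, systemd timer, or a loop) is left open.

```shell
#!/bin/sh
# Sketch: snapshot /proc/1/smaps_rollup and compare two snapshots.
# Assumptions: container "cpaas_trans"; directory and helper names are hypothetical.

# Save one timestamped snapshot; intended to be called once an hour.
dump_rollup() {
    container=$1; dir=$2
    mkdir -p "$dir"
    docker exec "$container" cat /proc/1/smaps_rollup \
        > "$dir/smaps_rollup.$(date +%Y%m%dT%H%M%S)"
}

# Print the change in kB for every field present in both snapshots.
# $1 = older snapshot, $2 = newer snapshot.
rollup_diff() {
    awk 'NR == FNR { if ($3 == "kB") old[$1] = $2; next }
         $3 == "kB" && ($1 in old) { printf "%s %+d kB\n", $1, $2 - old[$1] }' "$1" "$2"
}

# Usage:
#   dump_rollup cpaas_trans /var/tmp/rollup          # run hourly
#   rollup_diff /var/tmp/rollup/<older> /var/tmp/rollup/<newer>
```

Growth concentrated in the Anonymous field points toward heap allocations, while growth in Rss without matching Anonymous growth points toward file-backed mappings such as model files.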
