Azure AI Translator Container Memory Leak: Symptoms, Causes, and Solutions Inquiry

BrityMeeting 공용계정 0 Reputation points
2025-07-18T01:48:28.0933333+00:00

We configured AI Translator Container on 4 servers a month ago (June 2025).

There is no problem with the translation itself, but we have confirmed that there is a memory leak as shown in the image below.

I would like to ask how to identify the cause of the memory leak and resolve it.

  • How to configure
    • Install Docker on each of the 4 servers and install the AI Translator container image on them.
  • Pages referenced during configuration
  • Spec.
    • Server OS : UBUNTU 22.04
    • CPU : 24 core
    • Memory : 48 GB
    • Azure AI Translator container image version : 1.0.03051.790-amd64
    • Docker engine ver. 28.2.1
  • Run command
    docker run -d --rm -it -p 5000:5000 --memory 46g --cpus 23.9 \
      --name cpaas_trans \
      --log-driver=syslog \
      --log-opt tag="container" \
      -v /mnt/d/TranslatorContainer:/usr/local/models \
      -e apikey=<<API_KEY>> \
      -e eula=accept \
      -e billing=<<BILLING_URL>> \
      -e Languages=ko,en,ja,zh-Hans,zh-Hant,es,ru,vi,de,fr,it,pt,hu,pl \
      mcr.microsoft.com/azure-cognitive-services/translator/text-translation:${TAG}
  • Memory Leak graph (grafana)
    User's image
Azure AI Translator
An Azure service to easily conduct machine translation with a simple REST API call.

1 answer

  1. Amira Bedhiafi 41,121 Reputation points Volunteer Moderator
    2025-10-27T15:16:02.16+00:00

    Hello! Thank you for posting on Microsoft Learn Q&A.

    First, verify that this is real RSS growth and not page cache or host noise. Run the following on each host:

    # live view of the container's memory usage
    docker stats cpaas_trans
    
    # inside the container: PID 1 is assumed to be the translator service
    # (/proc/meminfo normally shows host-wide values even inside a container)
    docker exec cpaas_trans sh -c 'grep -E "VmRSS|VmSize" /proc/1/status; echo "---"; grep -E "MemAvailable|Cached" /proc/meminfo'
    
    # aggregated totals over all mappings of the process (Rss, Pss, Anonymous, ...);
    # /proc/1/smaps has the per-mapping detail if you need to drill down
    docker exec cpaas_trans sh -c 'cat /proc/1/smaps_rollup'
    

    If VmRSS tracks your Grafana line while Cached on the host does not, the growth is in the process's own memory rather than in the page cache.
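To line VmRSS up against the Grafana graph, it helps to log it at regular intervals. A minimal sketch, assuming the container name cpaas_trans from the question and that PID 1 is the translator process; rss_mib and sample_rss are hypothetical helper names:

```shell
#!/bin/sh
# Read a /proc/<pid>/status dump on stdin and print VmRSS in whole MiB.
# The kernel reports the value in kB, so divide by 1024.
rss_mib() {
    awk '/^VmRSS:/ { printf "%d\n", $2 / 1024 }'
}

# Print one timestamped sample (UTC) for the container's PID 1.
sample_rss() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(docker exec cpaas_trans cat /proc/1/status | rss_mib)"
}
```

Calling sample_rss from a loop or a cron job and appending the output to a file gives a series that can be compared directly with the dashboard.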

    Next, rule out the easy causes:

    • cgroup limits: you set --memory 46g on a 48 GB host. Leave at least 6–8 GB of headroom for the OS and page cache:
        docker run ... --memory 38g --memory-swap 38g --memory-reservation 30g ...
      
      (Setting --memory-swap equal to --memory gives the container no swap, which prevents silent ballooning. Note that Docker treats --memory-swap 0 as unset, not as "no swap".)
    • logging: you’re using --log-driver=syslog; confirm that syslog rotation is configured on the host so that logs don’t put pressure on memory or disk I/O.
    • concurrency: check whether bursts of concurrent requests correlate with the small step increases in RSS.

    Then lower the limits on one node: --memory 38g --memory-swap 38g --cpus 16 (--memory-swap equal to --memory means no swap; a value of 0 would be treated as unset).
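After changing the limits, it is worth reading them back from the running container to confirm what was actually applied. A minimal sketch, assuming the container name cpaas_trans from the question; swap_bytes and show_limits are hypothetical helper names, and the handling of -1 (unlimited) and 0 (unset) follows Docker's documented --memory-swap semantics:

```shell
#!/bin/sh
# Given the memory limit and the memory+swap limit in bytes, print how much
# swap the container may use. --memory-swap is the combined ceiling, so the
# swap allowance is the difference between the two values.
swap_bytes() {
    mem=$1
    memswap=$2
    if [ "$memswap" -eq -1 ]; then
        echo unlimited
    elif [ "$memswap" -eq 0 ]; then
        echo unset
    else
        echo $((memswap - mem))
    fi
}

# Read the applied limits (in bytes) back from the running container.
show_limits() {
    set -- $(docker inspect \
        --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' cpaas_trans)
    echo "memory=$1 memory+swap=$2 swap=$(swap_bytes "$1" "$2")"
}
```

A result of swap=0 from show_limits means the two ceilings are equal, so the container cannot spill into swap.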

    Also spin up one canary with only -e Languages=en,pt and mirror 10% of the traffic to it for 48 h.
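The canary can be started with the same image and volume as production, just with fewer languages and tighter limits. A rough sketch, assuming it runs on a node taken out of the production pool; the name cpaas_trans_canary and the DOCKER override are hypothetical, API_KEY, BILLING_URL and TAG must be set to the values used in production, and --memory-swap is set equal to --memory so the container gets no swap. Traffic mirroring itself is left to whatever load balancer or proxy sits in front of the servers:

```shell
#!/bin/sh
# Start a reduced-footprint canary: two languages, lower memory/CPU limits.
# Set DOCKER=echo to print the command instead of running it (dry run).
run_canary() {
    ${DOCKER:-docker} run -d --rm -p 5000:5000 \
        --memory 38g --memory-swap 38g --cpus 16 \
        --name cpaas_trans_canary \
        -v /mnt/d/TranslatorContainer:/usr/local/models \
        -e apikey="$API_KEY" \
        -e eula=accept \
        -e billing="$BILLING_URL" \
        -e Languages=en,pt \
        "mcr.microsoft.com/azure-cognitive-services/translator/text-translation:$TAG"
}
```

If the canary's RSS stays flat while the full 14-language containers keep growing, that points at per-language model memory rather than at the request path.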

    Add an hourly dump of /proc/1/smaps_rollup to quantify where memory grows. On the clients, cap the HTTP keep-alive lifetime and the size of idle connection pools. Finally, try a newer image tag on a separate canary.
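The hourly dump can be as simple as one file per snapshot plus a small diff between two of them. A minimal sketch, assuming the container name cpaas_trans from the question and PID 1 as the service; the output directory and the function names dump_rollup and rollup_delta are hypothetical:

```shell
#!/bin/sh
# Save one timestamped smaps_rollup snapshot of the container's PID 1.
dump_rollup() {
    out_dir=${1:-/var/log/translator-mem}
    mkdir -p "$out_dir"
    docker exec cpaas_trans cat /proc/1/smaps_rollup \
        > "$out_dir/rollup-$(date -u +%Y%m%dT%H%M%SZ).txt"
}

# Compare two snapshots and print the growth of each field in kB,
# largest first. Only lines of the form "Name: <value> kB" are compared,
# which skips the address-range header line.
rollup_delta() {
    awk 'NR == FNR { if ($3 == "kB") old[$1] = $2; next }
         $3 == "kB" && ($1 in old) { printf "%s %d\n", $1, $2 - old[$1] }' \
        "$1" "$2" | sort -k2,2nr
}
```

Running dump_rollup from an hourly cron entry and then calling rollup_delta on the first and last files shows whether the growth sits in Anonymous (heap-like) memory or in file-backed mappings.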
