From 7b01e60b989f5fcd40e19a8293eff90d49c70064 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Niklas=20Hamb=C3=BCchen?=
Date: Wed, 26 Mar 2025 23:20:46 +0100
Subject: [PATCH] Document the adverse effects of delayed memory return.

See #2688.

Also document the surprising behaviour that, while the process is sleeping, the times configured via `dirty_decay_ms`/`muzzy_decay_ms` have no effect unless `background_thread:true` is given.

I was very surprised when I had a process waiting for user input sitting around at 100 GB RES memory, with `dirty_decay_ms` not returning the memory after the default 10 seconds, but `dirty_decay_ms:0` fixing the problem immediately. I was even more confused that changing the time configured by `dirty_decay_ms` seemed to be completely ignored by jemalloc, until I got the explanation from https://github.com/jemalloc/jemalloc/issues/2688#issuecomment-2464775694

Personally I think the default-on of `dirty_decay_ms` is questionable, as I encountered the problem documented here with "This can lead to out-of-memory crashes ..." where my program needed 100 GB for some processing, freed it, and then invoked a child process that also needed 100 GB RAM; the parent program continued to sit at 100 GB RES usage and thus was OOM-killed, creating a bug where none should be (as the parent process freed all memory exactly right).

This commit at least documents the impact of this questionable default.
---
 TUNING.md | 5 +++
 doc/jemalloc.xml.in | 75 +++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/TUNING.md b/TUNING.md
index 1f6bef35..bd18eac7 100644
--- a/TUNING.md
+++ b/TUNING.md
@@ -43,6 +43,11 @@ Runtime options can be set via
 between CPU and memory usage. Shorter decay time purges unused pages faster to reduces memory usage (usually at the cost of more CPU cycles spent on purging), and vice versa.
+ Be aware that decay times `> 0` will not be honored until the next relevant call
+ into jemalloc, unless you also enable `background_thread:true`.
+ Without `background_thread:true`, processes that are sleeping
+ (e.g. because they call `sleep()`, block on user input, network/file activity,
+ or run subprocesses) will not purge memory until they call into jemalloc again.
 Suggested: tune the values based on the desired trade-offs.
diff --git a/doc/jemalloc.xml.in b/doc/jemalloc.xml.in
index 2a8573b8..d9f4249d 100644
--- a/doc/jemalloc.xml.in
+++ b/doc/jemalloc.xml.in
@@ -526,6 +526,58 @@ for (i = 0; i < nbins; i++) {
 8, 10, or 16, depending on prefix), and yet others have raw string values.
+
+ DELAYED MEMORY RETURN
+ Some jemalloc behavior delays the return of
+ free()d memory to the operating system.
+ This is an optimization to increase allocator speed (since
+ if the application needs memory again soon after,
+ the system calls to return it to the OS and to re-request it
+ can be skipped).
+ It comes at the cost of retaining system memory, which is then
+ not available to other processes.
+
+ This can lead to out-of-memory crashes.
+ For example, if the program completes a task that needed 70% of
+ the machine's memory, frees it (with delayed return),
+ and then invokes a child process that also needs
+ 70% of memory, the total demand is 140% of the machine's memory.
+ The operating system has not yet seen the program return
+ the memory, so it cannot allocate it to the child process.
+
+
+ Delayed memory return is controlled by the option
+ opt.dirty_decay_ms
+ which is enabled by default.
+
+
+ Be aware that decay times > 0 will not be honored until the next relevant
+ call into jemalloc, unless you also
+ enable background_thread.
+ Without background_thread,
+ processes that are sleeping (e.g.
because they call sleep(),
+ block on user input, network/file activity, or run subprocesses) will not purge memory
+ after the configured time, but at a potentially much later time
+ (seconds, hours, or months, depending on how long the program blocks).
+
+
+ It is thus recommended that, if your program uses and frees significant
+ amounts of memory that other processes (including its own child processes)
+ may subsequently need, you either set
+ opt.dirty_decay_ms
+ to 0 if you need a guarantee that the freed memory is
+ immediately available to them, or enable
+ background_thread
+ if you want a delay that is approximately honored.
+ Alternatively, an intermediate performance trade-off is to set
+ opt.dirty_decay_ms
+ to 0 and enable
+ opt.muzzy_decay_ms
+ (on platforms where it is supported); this makes freed memory
+ immediately available to other processes with lower overhead,
+ but may negatively affect observability as documented for that option.
+
+
 IMPLEMENTATION NOTES
 Traditionally, allocators have used
@@ -1167,7 +1219,14 @@ mallctl("arena." STRINGIFY(MALLCTL_ARENAS_ALL) ".decay",
 purged according to a sigmoidal decay curve that starts and ends with zero purge rate. A decay time of 0 causes all unused dirty pages to be purged immediately upon creation. A decay time of -1 disables purging.
- The default decay time is 10 seconds. See
+ The default decay time is 10 seconds. See
+ DELAYED MEMORY RETURN
+ for the adverse effect that this mechanism can have under memory pressure.
+ Be aware that decay times > 0 will not be honored
+ until the next relevant call into jemalloc,
+ unless background_thread
+ is enabled; see DELAYED MEMORY RETURN.
+ See
 arenas.dirty_decay_ms and arena.<i>.dirty_decay_ms
@@ -1191,11 +1250,21 @@ mallctl("arena." STRINGIFY(MALLCTL_ARENAS_ALL) ".decay",
 subsequently purged in a manner that left them subject to the reclamation whims of the operating system (e.g. madvise(...MADV_FREE)),
- and therefore in an indeterminate state.
The pages are incrementally
+ and therefore in an indeterminate state.
+ A drawback of this method is reduced observability: on Linux, for example,
+ memory freed this way is still reported as resident process memory (RSS)
+ by many tools that display memory usage, making it more difficult to check
+ how much memory a process is actually using.
+ The pages are incrementally
 purged according to a sigmoidal decay curve that starts and ends with zero purge rate. A decay time of 0 causes all unused muzzy pages to be purged immediately upon creation. A decay time of -1 disables purging.
- Muzzy decay is disabled by default (with decay time 0). See
+ Muzzy decay is disabled by default (with decay time 0).
+ Be aware that decay times > 0 will not be honored
+ until the next relevant call into jemalloc,
+ unless background_thread
+ is enabled; see DELAYED MEMORY RETURN.
+ See
 arenas.muzzy_decay_ms and arena.<i>.muzzy_decay_ms