diff --git a/TUNING.md b/TUNING.md
index 1f6bef35..bd18eac7 100644
--- a/TUNING.md
+++ b/TUNING.md
@@ -43,6 +43,11 @@ Runtime options can be set via
between CPU and memory usage. Shorter decay time purges unused pages faster
to reduces memory usage (usually at the cost of more CPU cycles spent on
purging), and vice versa.
+ Be aware that decay times `> 0` will not be honored until the next relevant call
+ into jemalloc, unless you also enable `background_thread:true`.
+ Without `background_thread:true`, processes that are sleeping
+ (e.g. because they call `sleep()`, block on user input or network/file I/O,
+ or wait for subprocesses to finish) will not purge memory until they next call into jemalloc.
Suggested: tune the values based on the desired trade-offs.
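A minimal sketch of opting into this at build time, assuming the program is linked against an unprefixed jemalloc build: jemalloc reads options from the global `malloc_conf` string, and the `MALLOC_CONF` environment variable accepts the same comma-separated `key:value` syntax. The `dirty_decay_ms:10000` entry just restates the 10 second default for illustration, and `requests_background_thread()` is a hypothetical helper, not part of jemalloc.

```c
#include <string.h>

/* Read by jemalloc during initialization when an unprefixed jemalloc is
 * linked in (an assumption of this sketch); otherwise it is an unused string.
 * background_thread:true lets purging proceed while the program sleeps or
 * blocks, so the dirty decay time below is approximately honored. */
const char *malloc_conf = "background_thread:true,dirty_decay_ms:10000";

/* Hypothetical helper for illustration: nonzero if the compiled-in
 * options request the background purging thread. */
int requests_background_thread(void) {
    return strstr(malloc_conf, "background_thread:true") != NULL;
}
```

The same setting can also be changed while the program runs by writing a `bool` to the `background_thread` mallctl.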
diff --git a/doc/jemalloc.xml.in b/doc/jemalloc.xml.in
index 2a8573b8..d9f4249d 100644
--- a/doc/jemalloc.xml.in
+++ b/doc/jemalloc.xml.in
@@ -526,6 +526,58 @@ for (i = 0; i < nbins; i++) {
8, 10, or 16, depending on prefix), and yet others have raw string
values.
+
+ DELAYED MEMORY RETURN
+ Some jemalloc behavior delays the return of
+ free()d memory to the operating system.
+ This is an optimization to increase allocator speed: if the application
+ needs memory again soon afterwards, the system calls that return it to
+ the OS and re-request it can be skipped.
+ It comes at the cost of retaining system memory, which is then
+ not available to other processes.
+
+ This can lead to out-of-memory crashes:
+ For example, if the program completes a task that needed 70% of
+ the machine's memory, frees it (with delayed return),
+ and then invokes a child process that also needs
+ 70% of memory, the combined demand is 140% of the machine's memory.
+ Because the operating system never saw the program return
+ the memory, it cannot allocate that memory to the child process.
+
+
+ Delayed memory return is controlled by the option
+ opt.dirty_decay_ms,
+ which is enabled by default.
+
+
+ Be aware that decay times > 0 will not be honored until the next relevant
+ call into jemalloc, unless you also
+ enable background_thread.
+ Without background_thread,
+ processes that are sleeping (e.g. because they call sleep(),
+ block on user input, network/file activity, or run subprocesses) will not purge memory
+ after the configured time, but at a potentially much later time
+ (seconds or hours or months, depending on how long the program blocks).
+
+
+ It is therefore recommended that, if your program uses and frees significant
+ amounts of memory that other processes (including its own child processes)
+ may subsequently need, you either set
+ opt.dirty_decay_ms
+ to 0, if you need a guarantee that the freed memory is
+ immediately available to them, or enable
+ background_thread,
+ if you want the configured delay to be approximately honored.
+ Alternatively, as a performance trade-off between these two, you can set
+ opt.dirty_decay_ms
+ to 0 and enable muzzy decay via
+ opt.muzzy_decay_ms
+ (on platforms where it is supported); this makes freed memory
+ immediately available to other processes at lower overhead,
+ but may reduce observability, as documented for that option.
+
+
IMPLEMENTATION NOTES
Traditionally, allocators have used
@@ -1167,7 +1219,14 @@ mallctl("arena." STRINGIFY(MALLCTL_ARENAS_ALL) ".decay",
purged according to a sigmoidal decay curve that starts and ends with
zero purge rate. A decay time of 0 causes all unused dirty pages to be
purged immediately upon creation. A decay time of -1 disables purging.
- The default decay time is 10 seconds. See
+ The default decay time is 10 seconds.
+ See DELAYED MEMORY RETURN
+ for the adverse effect that this mechanism can have under memory pressure.
+ Be aware that decay times > 0 will not be honored
+ until the next relevant call into jemalloc,
+ unless background_thread
+ is enabled; see DELAYED MEMORY RETURN.
+ See arenas.dirty_decay_ms
and arena.<i>.dirty_decay_ms
@@ -1191,11 +1250,21 @@ mallctl("arena." STRINGIFY(MALLCTL_ARENAS_ALL) ".decay",
subsequently purged in a manner that left them subject to the
reclamation whims of the operating system (e.g.
madvise(...MADV_FREE)),
- and therefore in an indeterminate state. The pages are incrementally
+ and therefore in an indeterminate state.
+ A drawback of this method is reduced observability: on Linux, for example,
+ memory freed this way is still reported as resident process memory (RSS)
+ by many memory-usage tools until the kernel reclaims it, making it harder
+ to check how much memory a process is actually using.
+ The pages are incrementally
purged according to a sigmoidal decay curve that starts and ends with
zero purge rate. A decay time of 0 causes all unused muzzy pages to be
purged immediately upon creation. A decay time of -1 disables purging.
- Muzzy decay is disabled by default (with decay time 0). See
+ Muzzy decay is disabled by default (with decay time 0).
+ Be aware that decay times > 0 will not be honored
+ until the next relevant call into jemalloc,
+ unless background_thread
+ is enabled; see DELAYED MEMORY RETURN.
+ See arenas.muzzy_decay_ms
and arena.<i>.muzzy_decay_ms
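To work around the observability drawback of muzzy pages on Linux, a sketch like the following can estimate how much of a process's RSS is lazily freed memory. It assumes a kernel that exposes `/proc/<pid>/smaps_rollup` with `Rss:` and `LazyFree:` lines in kB, where `LazyFree` counts pages marked with `madvise(...MADV_FREE)` that the kernel has not yet reclaimed. The helper names `read_kb_field` and `estimated_in_use_kb` are hypothetical, and the subtraction is only a rough estimate.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the value (in kB) of a "Name:   123 kB"
 * line in text shaped like /proc/<pid>/smaps_rollup, or -1 if absent. */
long read_kb_field(const char *text, const char *name) {
    size_t name_len = strlen(name);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        /* Match the field name at the start of a line, followed by ':'. */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ':') {
            /* strtol skips the padding spaces and stops at " kB". */
            return strtol(line + name_len + 1, NULL, 10);
        }
        line = strchr(line, '\n');
        if (line != NULL) {
            line++; /* move past the newline to the next line */
        }
    }
    return -1;
}

/* Hypothetical helper: RSS minus MADV_FREE'd pages the kernel has not
 * reclaimed yet. Returns -1 if either field is missing. */
long estimated_in_use_kb(const char *smaps_rollup_text) {
    long rss = read_kb_field(smaps_rollup_text, "Rss");
    long lazy = read_kb_field(smaps_rollup_text, "LazyFree");
    if (rss < 0 || lazy < 0) {
        return -1;
    }
    return rss - lazy;
}
```

On a live system the text would be read from `/proc/self/smaps_rollup` into a buffer first; keeping the parser separate from the file I/O makes it easy to check against captured samples.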