Commit graph

824 commits

Author SHA1 Message Date
Shai Duvdevani
257e64b968 Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
Qi Wang
607b866035 Check for 0 input when setting max_background_thread through mallctl.
Reported by @nc7s.
2025-01-28 10:38:56 -08:00
Dmitry Ilvokhin
52fa9577ba Fix integer overflow in test/unit/hash.c
`final[3]` is `uint8_t`. Integer conversion rank of `uint8_t` is lower
than integer conversion rank of `int`, so `uint8_t` got promoted to
`int`, which is signed integer type. Shift `final[3]` value left on 24,
when leftmost bit is set overflows `int` and it is undefined behaviour.

Before this change Undefined Behaviour Sanitizer was unhappy about it
with the following message.

```
../test/unit/hash.c:119:25: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
```

After this commit problem is gone.
2025-01-17 12:54:22 -08:00
Guangli Dai
587676fee8 Disable psset test when hugepage size is too large. 2024-12-17 12:35:35 -08:00
Dmitry Ilvokhin
46690c9ec0 Fix test_retained on boxes with a lot of CPUs
We are trying to create `ncpus * 2` threads for this test and place them
into `VARIABLE_ARRAY`, but `VARIABLE_ARRAY` can not be more than
`VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the
box test always fails.

```
$ nproc
176

$ make -j`nproc` tests_unit && ./test/unit/retained
<jemalloc>: ../test/unit/retained.c:123: Failed assertion:
"sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX"
Aborted (core dumped)
```

There is no need for high concurrency for this test as we are only
checking stats there and it's behaviour is quite stable regarding number
of allocating threads.

Limited number of threads to 16 to save compute resources (on CI for
example) and reduce tests running time.

Before the change (`nproc` is 80 on this box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.372s
user    0m14.236s
sys     0m12.338s
```

After the change (same box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.018s
user    0m0.108s
sys     0m0.068s
```
2024-12-02 14:12:26 -08:00
Dmitry Ilvokhin
6092c980a6 Expose psset state stats
When evaluating changes in HPA logic, it is useful to know internal
`hpa_shard` state. Great deal of this state is `psset`. Some of the
`psset` stats was available, but in disaggregated form, which is not
very convenient. This commit exposed `psset` counters to `mallctl`
and malloc stats dumps.

Example of how malloc stats dump will look like after the change.

HPA shard stats:
  Pageslabs: 14899 (4354 huge, 10545 nonhuge)
  Active pages: 6708166 (2228917 huge, 4479249 nonhuge)
  Dirty pages: 233816 (331 huge, 233485 nonhuge)
  Retained pages: 686306
  Purge passes: 8730 (10 / sec)
  Purges: 127501 (146 / sec)
  Hugeifies: 4358 (5 / sec)
  Dehugifies: 4 (0 / sec)

Pageslabs, active pages, dirty pages and retained pages are rows added
by this change.
2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin
3820e38dc1 Remove validation for HPA ratios
Config validation was introduced at 3aae792b with main intention to fix
infinite purging loop, but it didn't actually fix the underlying
problem, just masked it. Later 47d69b4ea was merged to address the same
problem.

Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applied to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for single
pageslab (hugepage). It doesn't make much sense to sum them up together.

While it is true that too high value of `hpa_dirty_mult` and too low
value of `hpa_hugification_threshold` can lead to pathological
behaviour, it is true for other options as well. Poor configurations
might lead to suboptimal and sometimes completely unacceptable
behaviour and that's OK, that is exactly the reason why they are called
poor.

There are other mechanism exist to prevent extreme behaviour, when we
hugified and then immediately purged page, see
`hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly
this case.

Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is
too tight and prevents a lot of valid configurations.
2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin
0ce13c6fb5 Add opt hpa_hugify_sync to hugify synchronously
Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.

Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore and get an instant feedback if
range wasn't hugified.

If `hpa_hugify_sync` option is on, we'll try to perform synchronously
collapse and if it wasn't successful, we'll fallback to asynchronous
behaviour.
2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin
b9758afff0 Add nstime_ms_since to get time since in ms
Milliseconds are used a lot in hpa, so it is convenient to have
`nstime_ms_since` function instead of dividing to `MILLION` constantly.

For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation
is used much more commonly across codebase than `msec`.

```
$ grep -Rn '_msec' include src | wc -l
2

$ grep -RPn '_ms( |,|:)' include src | wc -l
72
```

Function `nstime_msec` wasn't used anywhere in the code yet.
2024-11-08 10:37:28 -08:00
Nathan Slingerland
edc1576f03 Add safe frame-pointer backtrace unwinder 2024-10-01 11:01:56 -07:00
Dmitry Ilvokhin
4f4fd42447 Remove strict_min_purge_interval option
Option `experimental_hpa_strict_min_purge_interval` was expected to be
temporary to simplify rollout of a bugfix. Now, when bugfix rollout is
complete it is safe to remove this option.
2024-09-25 11:49:18 -07:00
Nathan Slingerland
60f472f367 Fix initialization of pop_attempt_results in bin_batching test 2024-09-12 11:36:17 -07:00
Shirui Cheng
baa5a90cc6 fix nstime_update_mock in arena_decay unit test 2024-08-29 10:50:33 -07:00
Shirui Cheng
8c54637f8c Better trigger race condition in bin_batching unit test 2024-08-23 14:10:04 -07:00
Dmitry Ilvokhin
c7ccb8d7e9 Add experimental prefix to hpa_strict_min_purge_interval
Goal is to make it obvious this option is experimental.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
aaa29003ab Limit maximum number of purged slabs with option
Option `experimental_hpa_max_purge_nhp` introduced for backward
compatibility reasons: to make it possible to have behaviour similar
to buggy `hpa_strict_min_purge_interval` implementation.

When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit
to number of slabs we'll purge on each iteration. Otherwise, we'll purge
no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in
turn means we might not purge enough dirty pages to satisfy
`hpa_dirty_mult` requirement.

Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and
`hpa_strict_min_purge_interval` options allows us to have steady rate of
pages returned back to the system. This provides a strickier latency
guarantees as number of `madvise` calls is bounded (and hence number of
TLB shootdowns is limited) in exchange to weaker memory usage
guarantees.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
143f458188 Fix hpa_strict_min_purge_interval option logic
We update `shard->last_purge` on each call of `hpa_try_purge` if we
purged something. This means, when `hpa_strict_min_purge_interval`
option is set only one slab will be purged, because on the next
call condition for too frequent purge protection
`since_last_purge_ms < shard->opts.min_purge_interval_ms` will always
be true. This is not an intended behaviour.

Instead, we need to check `min_purge_interval_ms` once and purge as many
pages as needed to satisfy requirements for `hpa_dirty_mult` option.

Make possible to count number of actions performed in unit tests (purge,
hugify, dehugify) instead of binary: called/not called. Extended current
unit tests with cases where we need to purge more than one page for a
purge phase.
2024-08-20 10:02:38 -07:00
Shirui Cheng
8fefabd3a4 increase the ncached_max in fill_flush test case to 1024 2024-08-06 13:16:09 -07:00
Shirui Cheng
48f66cf4a2 add a size check when declare a stack array to be less than 2048 bytes 2024-08-06 13:16:09 -07:00
Nathan Slingerland
bc32ddff2d Add usize to prof_sample_hook_t 2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin
b66f689764 Emit long string values without truncation
There are few long options (`bin_shards` and `slab_sizes` for example)
when they are specified and we emit statistics value gets truncated.

Moved emitting logic for strings into separate `emitter_emit_str`
function. It will try to emit string same way as before and if value is
too long will fallback emiting rest partially with chunks of `BUF_SIZE`.

Justification for long strings (longer than `BUF_SIZE`) is not
supported.
2024-07-29 13:58:31 -07:00
Shirui Cheng
a1fcbebb18 skip tcache GC for tcache_max unit test 2024-06-25 12:59:45 -07:00
Dmitry Ilvokhin
867c6dd7dc Option to guard hpa_min_purge_interval_ms fix
Change in `hpa_min_purge_interval_ms` handling logic is not backward
compatible as it might increase memory usage. Now this logic guarded by
`hpa_strict_min_purge_interval` option.

When `hpa_strict_min_purge_interval` is true, we will purge no more than
`hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is
false, old purging logic behaviour is preserved.

Long term strategy migrate all users of hpa to new logic and then delete
`hpa_strict_min_purge_interval` option.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
91a6d230db Respect hpa_min_purge_interval_ms option
Currently, hugepages aware allocator backend works together with classic
one as a fallback for not yet supported allocations. When background
threads are enabled wake up time for classic interfere with hpa as there
were no checks inside hpa purging logic to check if we are not purging too
frequently. If background thread is running and `hpa_should_purge`
returns true, then we will purge, even if we purged less than
hpa_min_purge_interval_ms ago.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
90c627edb7 Export hugepage size with arenas.hugepage 2024-06-05 15:37:41 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
2024-05-22 10:30:31 -07:00
David Goldblatt
c085530c71 Tcache batching: Plumbing
In the next commit, we'll start using the batcher to eliminate mutex traffic.
To avoid cluttering up that commit with the random bits of busy-work it entails,
we'll centralize them here.  This commit introduces:
- A batched bin type.
- The ability to mix batched and unbatched bins in the arena.
- Conf parsing to set batches per size and a max batched size.
- mallctl access to the corresponding opt-namespace keys.
- Stats output of the above.
2024-05-22 10:30:31 -07:00
David Goldblatt
70c94d7474 Add batcher module.
This can be used to batch up simple operation commands for later use by another
thread.
2024-05-22 10:30:31 -07:00
Dmitry Ilvokhin
47d69b4eab HPA: Fix infinite purging loop
One of the condition to start purging is `hpa_hugify_blocked_by_ndirty`
function call returns true. This can happen in cases where we have no
dirty memory for this shard at all. In this case purging loop will be an
infinite loop.

`hpa_hugify_blocked_by_ndirty` was introduced at 0f6c420, but at that
time purging loop has different form and additional `break` was not
required. Purging loop form was re-written at 6630c5989, but additional
exit condition wasn't added there at the time.

Repo code was shared by Patrik Dokoupil at [1], I stripped it down to
minimum to reproduce issue in jemalloc unit tests.

[1]: https://github.com/jemalloc/jemalloc/pull/2533
2024-04-30 13:46:32 -07:00
Shirui Cheng
4b555c11a5 Enable heap profiling on MacOS 2024-04-09 12:57:01 -07:00
Daniel Hodges
11038ff762 Add support for namespace pids in heap profile names
This change adds support for writing pid namespaces to the filename of a
heap profile. When running with namespaces pids may reused across
namespaces and if mounts are shared where profiles are written there is
not a great way to differentiate profiles between pids.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodgesd@fb.com>
2024-04-09 10:27:52 -07:00
Shirui Cheng
373884ab48 print out all malloc_conf settings in stats 2024-02-29 12:12:44 -08:00
Qi Wang
a2c5267409 HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as
it's within the huge page size.  These requests do not concern internal
fragmentation with huge pages, since the entire range is expected to be
accessed.
2024-01-18 14:51:04 -08:00
Shirui Cheng
e4817c8d89 Cleanup cache_bin_info_t* info input args 2023-10-25 10:27:31 -07:00
guangli-dai
6fb3b6a8e4 Refactor the tcache initiailization
1. Pre-generate all default tcache ncached_max in tcache_boot;
2. Add getters returning default ncached_max and ncached_max_set;
3. Refactor tcache init so that it is always init with a given setting.
2023-10-18 14:11:46 -07:00
guangli-dai
8a22d10b83 Allow setting default ncached_max for each bin through malloc_conf 2023-10-18 14:11:46 -07:00
guangli-dai
630f7de952 Add mallctl to set and get ncached_max of each cache_bin.
1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the
    ncached_max of the bin with the input sizeclass, passed in through
    oldp (will be upper casted if not an exact bin size is given).
2. `thread_tcache_ncached_max_write` takes in a char array
    representing the settings for bins in the tcache.
2023-10-17 14:53:23 -07:00
guangli-dai
6b197fdd46 Pre-generate ncached_max for all bins for better tcache_max tuning experience. 2023-10-17 14:53:23 -07:00
Shirui Cheng
36becb1302 metadata usage breakdowns: tracking edata and rtree usages 2023-10-11 11:56:01 -07:00
Qi Wang
72cfdce718 Allocate tcache stack from base allocator
When using metadata_thp, allocate tcache bin stacks from base0, which means they
will be placed on huge pages along with other metadata, instead of mixed with
other regular allocations.

In order to do so, modified the base allocator to support limited reuse: freed
tcached stacks (from thread termination) will be returned to base0 and made
available for reuse, but no merging will be attempted since they were bump
allocated out of base blocks. These reused base extents are managed using
separately allocated base edata_t -- they are cached in base->edata_avail when
the extent is all allocated.

One tricky part is, stats updating must be skipped for such reused extents
(since they were accounted for already, and there is no purging for base). This
requires tracking the "if is reused" state explicitly and bypass the stats
updates when allocating from them.
2023-09-18 12:18:32 -07:00
guangli-dai
a442d9b895 Enable per-tcache tcache_max
1. add tcache_max and nhbins into tcache_t so that they are per-tcache,
   with one auto tcache per thread, it's also per-thread;
2. add mallctl for each thread to set its own tcache_max (of its auto tcache);
3. store the maximum number of items in each bin instead of using a global storage;
4. add tests for the modifications above.
5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.
2023-09-06 10:47:14 -07:00
guangli-dai
fbca96c433 Remove unnecessary parameters for cache_bin_postincrement. 2023-09-06 10:47:14 -07:00
Evers Chen
7d9eceaf38 Fix array bounds false warning in gcc 12.3.0
1.error: array subscript 232 is above array bounds of ‘size_t[232]’ in gcc 12.3.0
2.it also optimizer to the code
2023-09-05 14:33:55 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
guangli-dai
254c4847e8 Print colorful reminder for failed tests. 2023-08-08 15:01:07 -07:00
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of
configuration parsing and either normalizing them or aborting depending on
if `abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
7e54dd1ddb Define PROF_TCTX_SENTINEL instead of using magic numbers
This makes the code more readable on its own, and also sets the stage
for more cleanly handling the pointer provenance lints in a following
commit.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
41e0b857be Make headers self-contained by fixing #includes
Header files are now self-contained, which makes the relationships
between the files clearer, and crucially allows LSP tools like `clangd`
to function correctly in all of our header files. I have verified that
the headers are self-contained (aside from the various Windows shims) by
compiling them as if they were C files – in a follow-up commit I plan to
add this to CI to ensure we don't regress on this front.
2023-07-14 09:06:32 -07:00
Kevin Svetlitski
314c073a38 Print the failed assertion before aborting in test cases
This makes it faster and easier to debug, so that you don't need to fire
up a debugger just to see which assertion triggered in a failing test.
2023-07-13 15:07:17 -07:00
Kevin Svetlitski
65d3b5989b Print test error messages in color when stderr is a terminal
When stderr is a terminal and supports color, print error messages
from tests in red to make them stand out from the surrounding output.
2023-07-13 13:03:23 -07:00