Commit graph

1947 commits

Author SHA1 Message Date
Guangli Dai
bffe921ba0 Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE
For PAC, to avoid having too many bins, arena bins still have the same
layout.  This means some extra search is needed for a page-level request that
is not aligned with the orginal size class: it should also search the heap
before the current index since the previous heap might also be able to
have some allocations satisfying it.  The same changes apply to HPA's
psset.

This search relies on the enumeration of the heap because not all allocs in
the previous heap are guaranteed to satisfy the request.  To balance the
memory and CPU overhead, we currently enumerate at most a fixed number
of nodes before concluding none can satisfy the request during an
enumeration.
2025-02-19 12:03:30 -08:00
Guangli Dai
205ba7b223 Prepare tcache for size to grow by PAGE over GROUP*PAGE
To prepare for the upcoming changes where size class grows by PAGE when
larger than NGROUP * PAGE, disable the tcache when it is larger than 2 *
NGROUP * PAGE. The threshold for tcache is set higher to prevent perf
regression as much as possible while usizes between NGROUP * PAGE and 2 *
NGROUP * PAGE happen to grow by PAGE.
2025-02-10 16:34:10 -08:00
guangli-dai
eac4163a95 Add config option limit-usize-gap and runtime option limit_usize_gap.
Adding a build-time config option (--enable-limit-usize-gap) and a
runtime one (limit_usize_gap) to guard the changes.  When build-time
config is enabled, some minor CPU overhead is expected because usize
will be stored and accessed apart from index.  When runtime option is
also enabled (it can only be enabled with the build-time config
enabled). a new usize calculation approach wil be employed.  This new
calculation will ceil size to the closest multiple of PAGE for all sizes
larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes.
2025-02-07 10:56:33 -08:00
roblabla
c17bf8b368 Disable config from file or envvar with build flag
This adds a new autoconf flag, --disable-user-config, which disables
reading the configuration from /etc/malloc.conf or the MALLOC_CONF
environment variable. This can be useful when integrating jemalloc in a
binary that internally handles all aspects of the configuration and
shouldn't be impacted by ambient change in the environment.
2025-02-05 15:01:50 -08:00
Shai Duvdevani
257e64b968 Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
Qi Wang
607b866035 Check for 0 input when setting max_background_thread through mallctl.
Reported by @nc7s.
2025-01-28 10:38:56 -08:00
Qi Wang
20cc983314 Fix the gettid() detection caught by @mrluanma . 2025-01-22 10:30:53 -08:00
Dan Horák
17881ebbfd Add configure check for gettid() presence
The gettid() function is available on Linux in glibc only since version
2.30. There are supported distributions that still use older glibc
version. Thus add a configure check if the gettid() function is
available and extend the check in src/prof_stack_range.c so it's skipped
also when gettid() isn't available.

Fixes: https://github.com/jemalloc/jemalloc/issues/2740
2024-12-17 12:40:54 -08:00
Guangli Dai
587676fee8 Disable psset test when hugepage size is too large. 2024-12-17 12:35:35 -08:00
Guangli Dai
6786934280 Fix ehooks assertion for arena creation 2024-12-11 13:33:32 -08:00
Dmitry Ilvokhin
6092c980a6 Expose psset state stats
When evaluating changes in HPA logic, it is useful to know internal
`hpa_shard` state. Great deal of this state is `psset`. Some of the
`psset` stats was available, but in disaggregated form, which is not
very convenient. This commit exposed `psset` counters to `mallctl`
and malloc stats dumps.

Example of how malloc stats dump will look like after the change.

HPA shard stats:
  Pageslabs: 14899 (4354 huge, 10545 nonhuge)
  Active pages: 6708166 (2228917 huge, 4479249 nonhuge)
  Dirty pages: 233816 (331 huge, 233485 nonhuge)
  Retained pages: 686306
  Purge passes: 8730 (10 / sec)
  Purges: 127501 (146 / sec)
  Hugeifies: 4358 (5 / sec)
  Dehugifies: 4 (0 / sec)

Pageslabs, active pages, dirty pages and retained pages are rows added
by this change.
2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin
3820e38dc1 Remove validation for HPA ratios
Config validation was introduced at 3aae792b with main intention to fix
infinite purging loop, but it didn't actually fix the underlying
problem, just masked it. Later 47d69b4ea was merged to address the same
problem.

Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applied to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for single
pageslab (hugepage). It doesn't make much sense to sum them up together.

While it is true that too high value of `hpa_dirty_mult` and too low
value of `hpa_hugification_threshold` can lead to pathological
behaviour, it is true for other options as well. Poor configurations
might lead to suboptimal and sometimes completely unacceptable
behaviour and that's OK, that is exactly the reason why they are called
poor.

There are other mechanism exist to prevent extreme behaviour, when we
hugified and then immediately purged page, see
`hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly
this case.

Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is
too tight and prevents a lot of valid configurations.
2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin
0ce13c6fb5 Add opt hpa_hugify_sync to hugify synchronously
Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.

Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore and get an instant feedback if
range wasn't hugified.

If `hpa_hugify_sync` option is on, we'll try to perform synchronously
collapse and if it wasn't successful, we'll fallback to asynchronous
behaviour.
2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin
b82333fdec Split stats_arena_hpa_shard_print function
Make multiple functions from `stats_arena_hpa_shard_print` for
readability and ease of change in the future.
2024-11-08 12:18:15 -08:00
Dmitry Ilvokhin
b9758afff0 Add nstime_ms_since to get time since in ms
Milliseconds are used a lot in hpa, so it is convenient to have
`nstime_ms_since` function instead of dividing to `MILLION` constantly.

For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation
is used much more commonly across codebase than `msec`.

```
$ grep -Rn '_msec' include src | wc -l
2

$ grep -RPn '_ms( |,|:)' include src | wc -l
72
```

Function `nstime_msec` wasn't used anywhere in the code yet.
2024-11-08 10:37:28 -08:00
Qi Wang
6d625d5e5e Add support for clock_gettime_nsec_np()
Prefer clock_gettime_nsec_np(CLOCK_UPTIME_RAW) to mach_absolute_time().
2024-10-14 10:33:27 -07:00
Nathan Slingerland
edc1576f03 Add safe frame-pointer backtrace unwinder 2024-10-01 11:01:56 -07:00
Ben Niu
3a0d9cdadb Use MSVC __declspec(thread) for TSD on Windows 2024-09-30 11:33:44 -07:00
Guangli Dai
1c900088c3 Do not support hpa if HUGEPAGE is too large. 2024-09-27 15:34:13 -07:00
Dmitry Ilvokhin
4f4fd42447 Remove strict_min_purge_interval option
Option `experimental_hpa_strict_min_purge_interval` was expected to be
temporary to simplify rollout of a bugfix. Now, when bugfix rollout is
complete it is safe to remove this option.
2024-09-25 11:49:18 -07:00
Qi Wang
44db479fad Fix the lock owner sanity checking during background thread boot.
During boot, some mutexes are not initialized yet, plus there's no point taking
many mutexes while everything is covered by the global init lock, so the locking
assumptions in some functions (e.g. background_thread_enabled_set()) can't be
enforced.  Skip the lock owner check in this case.
2024-09-23 18:06:07 -07:00
Qi Wang
de5606d0d8 Fix a missing init value warning caught by static analysis. 2024-09-20 16:56:07 -07:00
Qi Wang
3eb7a4b53d Fix mutex state tracking around pthread_cond_wait().
pthread_cond_wait drops and re-acquires the mutex internally, w/o
going through our wrapper.  Update the locked state explicitly.
2024-09-20 16:56:07 -07:00
Nathan Slingerland
8c2e15d1a5 Add malloc_open() / malloc_close() reentrancy safe helpers 2024-09-12 15:38:08 -07:00
Qi Wang
c1a3ca3755 Adjust the value width in stats output.
Some of the values are accumulative and can reach high after running for long
periods.
2024-09-11 14:29:32 -07:00
Qi Wang
3383b98f1b Check if the huge page size is expected when enabling HPA. 2024-09-04 15:43:59 -07:00
Qi Wang
cd05b19f10 Fix the VM over-reservation on aarch64 w/ larger pages.
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.

Similarly, make sure the base alloc has a const 2M alignment.
2024-09-04 15:43:59 -07:00
Shirui Cheng
7c99686165 Better handle burst allocation on tcache_alloc_small_hard 2024-08-29 10:50:33 -07:00
Shirui Cheng
0c88be9e0a Regulate GC frequency by requiring a time interval between two consecutive GCs 2024-08-29 10:50:33 -07:00
Shirui Cheng
e2c9f3a9ce Take locality into consideration when doing GC flush 2024-08-29 10:50:33 -07:00
Shirui Cheng
14d5dc136a Allow a range for the nfill passed to arena_cache_bin_fill_small 2024-08-29 10:50:33 -07:00
Shirui Cheng
f68effe4ac Add a runtime option opt_experimental_tcache_gc to guard the new design 2024-08-29 10:50:33 -07:00
Ben Niu
9e123a833c Leverage new Windows API TlsGetValue2 for performance 2024-08-28 16:50:33 -07:00
Qi Wang
bd0a5b0f3b Fix static analysis warnings.
Newly reported warnings included several reserved macro identifier, and
false-positive used-uninitialized.
2024-08-28 16:03:53 -07:00
Dmitry Ilvokhin
c7ccb8d7e9 Add experimental prefix to hpa_strict_min_purge_interval
Goal is to make it obvious this option is experimental.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
aaa29003ab Limit maximum number of purged slabs with option
Option `experimental_hpa_max_purge_nhp` introduced for backward
compatibility reasons: to make it possible to have behaviour similar
to buggy `hpa_strict_min_purge_interval` implementation.

When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit
to number of slabs we'll purge on each iteration. Otherwise, we'll purge
no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in
turn means we might not purge enough dirty pages to satisfy
`hpa_dirty_mult` requirement.

Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and
`hpa_strict_min_purge_interval` options allows us to have steady rate of
pages returned back to the system. This provides a strickier latency
guarantees as number of `madvise` calls is bounded (and hence number of
TLB shootdowns is limited) in exchange to weaker memory usage
guarantees.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
143f458188 Fix hpa_strict_min_purge_interval option logic
We update `shard->last_purge` on each call of `hpa_try_purge` if we
purged something. This means, when `hpa_strict_min_purge_interval`
option is set only one slab will be purged, because on the next
call condition for too frequent purge protection
`since_last_purge_ms < shard->opts.min_purge_interval_ms` will always
be true. This is not an intended behaviour.

Instead, we need to check `min_purge_interval_ms` once and purge as many
pages as needed to satisfy requirements for `hpa_dirty_mult` option.

Make possible to count number of actions performed in unit tests (purge,
hugify, dehugify) instead of binary: called/not called. Extended current
unit tests with cases where we need to purge more than one page for a
purge phase.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
0a9f51d0d8 Simplify hpa_shard_maybe_do_deferred_work
It doesn't make much sense to repeat purging once we done with
hugification, because we can de-hugify pages that were hugified just
moment ago for no good reason. Let them wait next deferred work phase
instead. And if they still meeting purging conditions then, purge them.
2024-08-20 10:02:38 -07:00
Amaury Séchet
a25b9b8ba9 Simplify the logic when bumping lg_fill_div. 2024-08-06 13:31:49 -07:00
Shirui Cheng
47c9bcd402 Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items 2024-08-06 13:16:09 -07:00
Shirui Cheng
48f66cf4a2 add a size check when declare a stack array to be less than 2048 bytes 2024-08-06 13:16:09 -07:00
Burton Li
8dc97b1108 Fix NSTIME_MONOTONIC for win32 implementation 2024-07-30 10:30:41 -07:00
Nathan Slingerland
bc32ddff2d Add usize to prof_sample_hook_t 2024-07-30 10:29:30 -07:00
Danny Lin
c893fcd169 Change macOS mmap tag to fix conflict with CoreMedia
Tag 101 is assigned to "CoreMedia Capture Data", which makes for confusing output when debugging.

To avoid conflicts, use a tag in the reserved application-specific range from 240–255 (inclusive).

All assigned tags: 94d3b45284/osfmk/mach/vm_statistics.h (L773-L775)
2024-06-26 14:53:48 -07:00
Guangli Dai
8477ec9562 Set dependent as false for all rtree reads without ownership 2024-06-24 10:50:20 -07:00
Dmitry Ilvokhin
867c6dd7dc Option to guard hpa_min_purge_interval_ms fix
Change in `hpa_min_purge_interval_ms` handling logic is not backward
compatible as it might increase memory usage. Now this logic guarded by
`hpa_strict_min_purge_interval` option.

When `hpa_strict_min_purge_interval` is true, we will purge no more than
`hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is
false, old purging logic behaviour is preserved.

Long term strategy migrate all users of hpa to new logic and then delete
`hpa_strict_min_purge_interval` option.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
91a6d230db Respect hpa_min_purge_interval_ms option
Currently, hugepages aware allocator backend works together with classic
one as a fallback for not yet supported allocations. When background
threads are enabled wake up time for classic interfere with hpa as there
were no checks inside hpa purging logic to check if we are not purging too
frequently. If background thread is running and `hpa_should_purge`
returns true, then we will purge, even if we purged less than
hpa_min_purge_interval_ms ago.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
90c627edb7 Export hugepage size with arenas.hugepage 2024-06-05 15:37:41 -07:00
David Goldblatt
f9c0b5f7f8 Bin batching: add some stats.
This lets us easily see what fraction of flush load is being taken up by the
bins, and helps guide future optimization approaches (for example: should we
prefetch during cache bin fills? It depends on how many objects the average fill
pops out of the batch).
2024-05-22 10:30:31 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
2024-05-22 10:30:31 -07:00