romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-07-14 12:47:27 +03:00

Author	SHA1	Message	Date
Andrei Pechkurov	4d0ffa075b	Fix background thread initialization race	2026-03-10 18:14:33 -07:00
Slobodan Predolac	6016d86c18	[SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin	2026-03-10 18:14:33 -07:00
Guangli Dai	0988583d7c	Add a mallctl for users to get an approximation of active bytes.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	355774270d	[EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils	2026-03-10 18:14:33 -07:00
Slobodan Predolac	47aeff1d08	Add experimental_enforce_hugify	2026-03-10 18:14:33 -07:00
Slobodan Predolac	3678a57c10	When extracting from central, hugify_eager is different than start_as_huge	2026-03-10 18:14:33 -07:00
guangli-dai	2cfa41913e	Refactor init_system_thp_mode and print it in malloc stats.	2026-03-10 18:14:33 -07:00
Carl Shapiro	f714cd9249	Inline the value of an always false boolean local variable Next to its use, which is always as an argument, we include the name of the parameter in a constant. This completes a partially implemented cleanup suggested in an earlier commit.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	7c40be249c	Add npurges and npurge_passes to output of pa_benchmark	2026-03-10 18:14:33 -07:00
Slobodan Predolac	707aab0c95	[pa-bench] Add clock to pa benchmark	2026-03-10 18:14:33 -07:00
Slobodan Predolac	a199278f37	[HPA] Add ability to start page as huge and more flexibility for purging	2026-03-10 18:14:33 -07:00
Slobodan Predolac	2688047b56	Revert "Do not dehugify when purging" This reverts commit `16c5abd1cd`.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	de886e05d2	Revert "Remove an unused function and global variable" This reverts commit `acd85e5359`.	2026-03-10 18:14:33 -07:00
guangli-dai	261591f123	Add a page-allocator microbenchmark.	2026-03-10 18:14:33 -07:00
guangli-dai	56cdce8592	Adding trace analysis in preparation for page allocator microbenchmark.	2026-03-10 18:14:33 -07:00
Shirui Cheng	2114349a4e	Revert PR #2608 : Manually revert commits 70c94d..f9c0b5 Closes: #2707	2026-03-10 18:14:33 -07:00
lexprfuncall	e4fa33148a	Remove an unused function and global variable When the dehugify functionality was retired in a previous commit, a dehugify-related function and global variable in a test were accidentally left in place, causing builds that add -Werror to CFLAGS to fail.	2026-03-10 18:14:33 -07:00
lexprfuncall	a156e997d7	Do not dehugify when purging Giving the advice MADV_DONTNEED to a range of virtual memory backed by a transparent huge page already causes that range of virtual memory to become backed by regular pages.	2026-03-10 18:14:33 -07:00
guangli-dai	6200e8987f	Reformat the codebase with the clang-format 18.	2026-03-10 18:14:33 -07:00
dzhao.ampere	c5547f9e64	test/unit/psset.c: fix SIGSEGV when PAGESIZE is large When hugepage is enabled and PAGESIZE is large, the test could ask for a stack size larger than user limit. Allocating the memory instead can avoid the failure. Closes: #2408	2026-03-10 18:14:33 -07:00
Slobodan Predolac	015b017973	[thread_event] Add support for user events in thread events when stats are enabled	2026-03-10 18:14:33 -07:00
Slobodan Predolac	e6864c6075	[thread_event] Remove macros from thread_event and replace with dynamic event objects	2026-03-10 18:14:33 -07:00
Qi Wang	1972241cd2	Remove unused options in the batched madvise unit tests.	2025-06-02 11:25:37 -07:00
Jason Evans	27d7960cf9	Revert "Extend purging algorithm with peak demand tracking" This reverts commit `ad108d50f1`.	2025-06-02 10:44:37 -07:00
guangli-dai	edaab8b3ad	Turn clang-format off for codes with multi-line commands in macros	2025-05-28 19:22:21 -07:00
guangli-dai	1818170c8d	Fix binshard.sh by specifying bin_shards for all sizes.	2025-05-28 19:21:49 -07:00
Slobodan Predolac	b6338c4ff6	EASY - be explicit in non-vectorized hpa tests	2025-05-19 16:31:04 -07:00
guangli-dai	554185356b	Sample format on tcache_max test	2025-05-19 15:06:13 -07:00
guangli-dai	8347f1045a	Renaming limit_usize_gap to disable_large_size_classes	2025-05-06 14:47:35 -07:00
Guangli Dai	01e9ecbeb2	Remove build-time configuration 'config_limit_usize_gap'	2025-05-06 14:47:35 -07:00
Slobodan Predolac	1956a54a43	[process_madvise] Use process_madvise across multiple huge_pages	2025-04-25 19:19:03 -07:00
Slobodan Predolac	0dfb4a5a1a	Add output argument to hpa_purge_begin to count dirty ranges	2025-04-25 19:19:03 -07:00
Slobodan Predolac	f19f49ef3e	if process_madvise is supported, call it when purging hpa	2025-04-04 13:57:42 -07:00
Dmitry Ilvokhin	ad108d50f1	Extend purging algorithm with peak demand tracking Implementation inspired by the idea described in the "Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator" paper [1]. The primary idea is to track the maximum number (peak) of active pages in use within a sliding window and then use this number to decide how many dirty pages we would like to keep. We are trying to estimate the maximum amount of active memory we'll need in the near future. We do so by projecting future active memory demand (based on the peak active memory usage we observed in the past within the sliding window) and adding slack on top of it (some overhead is reasonable in exchange for higher hugepage coverage). When peak demand tracking is off, the projection of future active memory is the active memory we have right now. The estimate is essentially the same as `nactive_max * (1 + dirty_mult)`. The peak demand purging algorithm is controlled by two config options. Option `hpa_peak_demand_window_ms` controls the duration of the sliding window over which we track maximum active memory usage, and option `hpa_dirty_mult` controls the amount of slack we are allowed to have as a percentage of maximum active memory usage. By default `hpa_peak_demand_window_ms == 0` now and we have the same behaviour (ratio-based purging) that we had before this commit. [1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf	2025-03-13 10:12:22 -07:00
Qi Wang	22440a0207	Implement process_madvise support. Add opt.process_madvise_max_batch which determines if process_madvise is enabled (non-zero) and the max # of regions in each batch. Added another limiting factor which is the space to reserve on stack, which results in the max batch of 128.	2025-03-07 15:32:32 -08:00
Guangli Dai	6035d4a8d3	Cache extra extents in the dirty pool from ecache_alloc_grow	2025-03-06 15:08:13 -08:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy Until now, jemalloc has converted size to usize by rounding size up to the closest size class. However, this causes a lot of memory waste with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When the build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When the runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will round size up to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note that when the build-time config is enabled, the runtime option is on by default. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE. For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size are no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs use when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note that the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled, but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Guangli Dai	ac279d7e71	Fix profiling sample metadata lookup during xallocx	2025-03-04 14:42:04 -08:00
Dmitry Ilvokhin	499f306859	Fix arena 0 `deferral_allowed` flag init Arena 0 has a dedicated initialization path, which differs from the initialization path of other arenas. The main difference for the purpose of this change is that we initialize arena 0 before we initialize background threads. HPA shard options have a `deferral_allowed` flag which should be equal to the `background_thread_enabled()` return value, but that wasn't the case before this change, because for arena 0 `background_thread_enabled()` was initialized correctly only after the arena 0 initialization phase had already ended. Below is the initialization sequence for arena 0 after this commit, to illustrate that everything should still be initialized correctly. * `hpa_central_init` initializes HPA Central, before we initialize every HPA shard (including arena 0's). * `background_thread_boot1` initializes `background_thread_enabled()` return value. * `pa_shard_enable_hpa` initializes arena 0 HPA shard. ``` malloc_init_hard ------------- / / \ / / \ / / \ malloc_init_hard_a0_locked background_thread_boot1 pa_shard_enable_hpa / / \ / / \ / / \ arena_boot background_thread_enabled_seta hpa_shard_init \| \| pa_central_init \| \| hpa_central_init ```	2025-02-18 12:10:35 -08:00
Qi Wang	3bc89cfeca	Avoid implicit conversion in test/unit/prof_threshold	2025-01-31 10:18:36 -08:00
Qi Wang	1abeae9ebd	Fix test/unit/prof_threshold when !config_stats	2025-01-30 10:39:49 -08:00
Shai Duvdevani	257e64b968	Unlike `prof_sample` which is supported only with profiling mode active, `prof_threshold` is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks.	2025-01-29 18:55:52 -08:00
Qi Wang	607b866035	Check for 0 input when setting max_background_thread through mallctl. Reported by @nc7s.	2025-01-28 10:38:56 -08:00
Dmitry Ilvokhin	52fa9577ba	Fix integer overflow in test/unit/hash.c `final[3]` is `uint8_t`. The integer conversion rank of `uint8_t` is lower than the integer conversion rank of `int`, so `uint8_t` gets promoted to `int`, which is a signed integer type. Shifting the `final[3]` value left by 24 when its leftmost bit is set overflows `int`, which is undefined behaviour. Before this change Undefined Behaviour Sanitizer was unhappy about it with the following message. ``` ../test/unit/hash.c:119:25: runtime error: left shift of 176 by 24 places cannot be represented in type 'int' ``` After this commit the problem is gone.	2025-01-17 12:54:22 -08:00
Guangli Dai	587676fee8	Disable psset test when hugepage size is too large.	2024-12-17 12:35:35 -08:00
Dmitry Ilvokhin	46690c9ec0	Fix `test_retained` on boxes with a lot of CPUs We are trying to create `ncpus * 2` threads for this test and place them into `VARIABLE_ARRAY`, but `VARIABLE_ARRAY` cannot be more than `VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the box, the test always fails. ``` $ nproc 176 $ make -j`nproc` tests_unit && ./test/unit/retained <jemalloc>: ../test/unit/retained.c:123: Failed assertion: "sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX" Aborted (core dumped) ``` There is no need for high concurrency in this test, as we are only checking stats there and its behaviour is quite stable regarding the number of allocating threads. Limit the number of threads to 16 to save compute resources (on CI for example) and reduce test running time. Before the change (`nproc` is 80 on this box). ``` $ make -j`nproc` tests_unit && time ./test/unit/retained <...> real 0m0.372s user 0m14.236s sys 0m12.338s ``` After the change (same box). ``` $ make -j`nproc` tests_unit && time ./test/unit/retained <...> real 0m0.018s user 0m0.108s sys 0m0.068s ```	2024-12-02 14:12:26 -08:00
Dmitry Ilvokhin	6092c980a6	Expose `psset` state stats When evaluating changes in HPA logic, it is useful to know the internal `hpa_shard` state. A great deal of this state is `psset`. Some of the `psset` stats were available, but in disaggregated form, which is not very convenient. This commit exposes `psset` counters to `mallctl` and malloc stats dumps. Example of how the malloc stats dump will look after the change. HPA shard stats: Pageslabs: 14899 (4354 huge, 10545 nonhuge) Active pages: 6708166 (2228917 huge, 4479249 nonhuge) Dirty pages: 233816 (331 huge, 233485 nonhuge) Retained pages: 686306 Purge passes: 8730 (10 / sec) Purges: 127501 (146 / sec) Hugeifies: 4358 (5 / sec) Dehugifies: 4 (0 / sec) Pageslabs, active pages, dirty pages and retained pages are rows added by this change.	2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin	3820e38dc1	Remove validation for HPA ratios Config validation was introduced at `3aae792b` with the main intention of fixing an infinite purging loop, but it didn't actually fix the underlying problem, just masked it. Later `47d69b4ea` was merged to address the same problem. Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different application dimensions: `hpa_dirty_mult` is applied to active memory on the shard, but `hpa_hugification_threshold` is a threshold for a single pageslab (hugepage). It doesn't make much sense to sum them up together. While it is true that too high a value of `hpa_dirty_mult` and too low a value of `hpa_hugification_threshold` can lead to pathological behaviour, this is true for other options as well. Poor configurations might lead to suboptimal and sometimes completely unacceptable behaviour and that's OK, that is exactly the reason why they are called poor. Other mechanisms exist to prevent extreme behaviour, such as hugifying and then immediately purging a page; see the `hpa_hugify_blocked_by_ndirty` function, which exists to prevent exactly this case. Lastly, the `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is too tight and prevents a lot of valid configurations.	2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin	0ce13c6fb5	Add opt `hpa_hugify_sync` to hugify synchronously Linux 6.1 introduced the `MADV_COLLAPSE` flag to perform a best-effort synchronous collapse of the native pages mapped by the memory range into transparent huge pages. Synchronous hugification might be beneficial for at least two reasons: we no longer rely on khugepaged, and we get instant feedback if the range wasn't hugified. If the `hpa_hugify_sync` option is on, we'll try to perform a synchronous collapse and, if it wasn't successful, we'll fall back to asynchronous behaviour.	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	b9758afff0	Add `nstime_ms_since` to get time since in ms Milliseconds are used a lot in hpa, so it is convenient to have an `nstime_ms_since` function instead of constantly dividing by `MILLION`. For consistency, renamed `nstime_msec` to `nstime_ms`, as the `ms` abbreviation is used much more commonly across the codebase than `msec`. ``` $ grep -Rn '_msec' include src \| wc -l 2 $ grep -RPn '_ms( \|,\|:)' include src \| wc -l 72 ``` Function `nstime_msec` wasn't used anywhere in the code yet.	2024-11-08 10:37:28 -08:00
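
Note on commit 52fa9577ba (integer overflow in test/unit/hash.c): the message explains that a `uint8_t` operand is promoted to signed `int` before a left shift, so shifting a byte with its top bit set by 24 overflows `int`. A minimal sketch of the usual remedy follows, assuming a 32-bit `int`. The helper name `bytes_to_u32_le` is hypothetical and is not taken from jemalloc; the actual change made in the commit may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not jemalloc code): assemble a 32-bit value from
 * four bytes, least significant byte first.
 */
static uint32_t
bytes_to_u32_le(const uint8_t b[4]) {
	/*
	 * Cast each byte to uint32_t before shifting, so the shift is done in
	 * an unsigned 32-bit type instead of the promoted signed int.
	 * Writing `b[3] << 24` directly would be undefined behaviour
	 * whenever b[3] >= 0x80 (for example 176, the value in the UBSan
	 * report quoted in the commit message).
	 */
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	    ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Calling it with the bytes `{0x01, 0x02, 0x03, 0xB0}` (0xB0 is 176) yields `0xB0030201` without triggering the sanitizer, because no intermediate value is ever held in a signed type.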


865 commits