romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-04-15 15:11:41 +03:00

Author	SHA1	Message	Date
Slobodan Predolac	6016d86c18	[SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin	2026-03-10 18:14:33 -07:00
Slobodan Predolac	47aeff1d08	Add experimental_enforce_hugify	2026-03-10 18:14:33 -07:00
Slobodan Predolac	3678a57c10	When extracting from central, hugify_eager is different than start_as_huge	2026-03-10 18:14:33 -07:00
Slobodan Predolac	a199278f37	[HPA] Add ability to start page as huge and more flexibility for purging	2026-03-10 18:14:33 -07:00
Slobodan Predolac	2688047b56	Revert "Do not dehugify when purging" This reverts commit `16c5abd1cd`.	2026-03-10 18:14:33 -07:00
lexprfuncall	a156e997d7	Do not dehugify when purging Giving the advice MADV_DONTNEED to a range of virtual memory backed by a transparent huge page already causes that range of virtual memory to become backed by regular pages.	2026-03-10 18:14:33 -07:00
guangli-dai	6200e8987f	Reformat the codebase with the clang-format 18.	2026-03-10 18:14:33 -07:00
Jason Evans	27d7960cf9	Revert "Extend purging algorithm with peak demand tracking" This reverts commit `ad108d50f1`.	2025-06-02 10:44:37 -07:00
Slobodan Predolac	f19f49ef3e	if process_madvise is supported, call it when purging hpa	2025-04-04 13:57:42 -07:00
Dmitry Ilvokhin	ad108d50f1	Extend purging algorithm with peak demand tracking Implementation inspired by the idea described in the "Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator" paper [1]. The primary idea is to track the maximum number (peak) of active pages in use within a sliding window and then use this number to decide how many dirty pages we would like to keep. We are trying to estimate the maximum amount of active memory we'll need in the near future. We do so by projecting future active memory demand (based on the peak active memory usage we observed in the past within the sliding window) and adding slack on top of it (some overhead is reasonable in exchange for higher hugepage coverage). When peak demand tracking is off, the projection of future active memory is the active memory we have right now. The estimate is essentially the same as `nactive_max * (1 + dirty_mult)`. The peak demand purging algorithm is controlled by two config options. Option `hpa_peak_demand_window_ms` controls the duration of the sliding window in which we track maximum active memory usage, and option `hpa_dirty_mult` controls the amount of slack we are allowed to have as a percentage of maximum active memory usage. By default `hpa_peak_demand_window_ms == 0` now and we have the same behaviour (ratio-based purging) that we had before this commit. [1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf	2025-03-13 10:12:22 -07:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy Converting size to usize is what jemalloc has been doing by ceiling size to the closest size class. However, this causes a lot of memory waste with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When the build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When the runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note that when the build-time config is enabled, the runtime option is on by default. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE. For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size are no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs used when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note that the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled, but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Dmitry Ilvokhin	0ce13c6fb5	Add opt `hpa_hugify_sync` to hugify synchronously Linux 6.1 introduced the `MADV_COLLAPSE` flag to perform a best-effort synchronous collapse of the native pages mapped by the memory range into transparent huge pages. Synchronous hugification might be beneficial for at least two reasons: we no longer rely on khugepaged, and we get instant feedback if the range wasn't hugified. If the `hpa_hugify_sync` option is on, we'll try to perform a synchronous collapse, and if it isn't successful, we'll fall back to asynchronous behaviour.	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	4f4fd42447	Remove `strict_min_purge_interval` option Option `experimental_hpa_strict_min_purge_interval` was expected to be temporary to simplify rollout of a bugfix. Now, when bugfix rollout is complete it is safe to remove this option.	2024-09-25 11:49:18 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to the buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to the number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy the `hpa_dirty_mult` requirement. The combination of the `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have a steady rate of pages returned back to the system. This provides stricter latency guarantees, as the number of `madvise` calls is bounded (and hence the number of TLB shootdowns is limited), in exchange for weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	143f458188	Fix `hpa_strict_min_purge_interval` option logic We update `shard->last_purge` on each call of `hpa_try_purge` if we purged something. This means that when the `hpa_strict_min_purge_interval` option is set, only one slab will be purged, because on the next call the condition for too-frequent purge protection, `since_last_purge_ms < shard->opts.min_purge_interval_ms`, will always be true. This is not the intended behaviour. Instead, we need to check `min_purge_interval_ms` once and purge as many pages as needed to satisfy the requirements of the `hpa_dirty_mult` option. Make it possible to count the number of actions performed in unit tests (purge, hugify, dehugify) instead of a binary called/not called. Extended current unit tests with cases where we need to purge more than one page in a purge phase.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	47d69b4eab	HPA: Fix infinite purging loop One of the conditions to start purging is that the `hpa_hugify_blocked_by_ndirty` function call returns true. This can happen in cases where we have no dirty memory for this shard at all. In this case the purging loop will be an infinite loop. `hpa_hugify_blocked_by_ndirty` was introduced at `0f6c420`, but at that time the purging loop had a different form and an additional `break` was not required. The purging loop was rewritten at `6630c5989`, but an additional exit condition wasn't added there at the time. Repro code was shared by Patrik Dokoupil at [1]; I stripped it down to the minimum needed to reproduce the issue in jemalloc unit tests. [1]: https://github.com/jemalloc/jemalloc/pull/2533	2024-04-30 13:46:32 -07:00
Qi Wang	a2c5267409	HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as it's within the huge page size. These requests do not concern internal fragmentation with huge pages, since the entire range is expected to be accessed.	2024-01-18 14:51:04 -08:00
Kevin Svetlitski	589c63b424	Make eligible global variables `static` and/or `const` For better or worse, Jemalloc has a significant number of global variables. Making all eligible global variables `static` and/or `const` at least makes it slightly easier to reason about them, as these qualifications communicate to the programmer restrictions on their use without having to `grep` the whole codebase.	2023-07-06 14:15:12 -07:00
Kevin Svetlitski	e1338703ef	Address compiler warnings in the unit tests	2023-07-03 16:06:35 -07:00
Qi Wang	837b37c4ce	Fix the time-since computation in HPA. nstime module guarantees monotonic clock update within a single nstime_t. This means, if two separate nstime_t variables are read and updated separately, nstime_subtract between them may result in underflow. Fixed by switching to the time since utility provided by nstime.	2021-12-21 23:37:22 -08:00
Alex Lapenkou	f56f5b9930	Pass 'frequent_reuse' hint to PAI Currently used only for guarding purposes, the hint is used to determine if the allocation is supposed to be frequently reused. For example, it might urge the allocator to ensure the allocation is cached.	2021-12-15 10:39:17 -08:00
Qi Wang	400c59895a	Fix uninitialized nstime reading / updating on the stack in hpa. In order for nstime_update to handle non-monotonic clocks, it requires the input nstime to be initialized -- when reading for the first time, zero init has to be done. Otherwise random stack value may be seen as clocks and returned.	2021-11-16 16:54:12 -08:00
Alex Lapenkou	c9ebff0fd6	Initialize deferred_work_generated As the code evolves, some code paths that have previously assigned deferred_work_generated may cease being reached. This would leave the value uninitialized. This change initializes the value for safety.	2021-10-07 11:50:38 -07:00
Qi Wang	deb8e62a83	Implement guard pages. Adding guarded extents, which are regular extents surrounded by guard pages (mprotected). To reduce syscalls, small guarded extents are cached as a separate eset in ecache, and decay through the dirty / muzzy / retained pipeline as usual.	2021-09-26 16:30:15 -07:00
Piotr Balcer	7bb05e04be	add experimental.arenas_create_ext mallctl This mallctl accepts an arena_config_t structure which can be used to customize the behavior of the arena. Right now it contains extent_hooks and a new option, metadata_use_hooks, which controls whether the extent hooks are also used for metadata allocation. The metadata_use_hooks option has two main use cases: 1. In heterogeneous memory systems, to avoid metadata being placed on potentially slower memory. 2. Avoiding virtual memory from being leaked as a result of metadata allocation failure originating in an extent hook.	2021-09-24 13:43:18 -07:00
Alex Lapenkou	8229cc77c5	Wake up background threads on demand This change allows every allocator conforming to PAI to communicate that it deferred some work for the future. Without it, if a background thread goes into indefinite sleep, there is no way to notify it about upcoming deferred work.	2021-09-17 16:56:41 -07:00
David Goldblatt	d93eef2f40	HPA: Introduce a redesigned hpa_central_t. For now, this only handles allocating virtual address space to shards, with no reuse. This is framework, though; it will change over time.	2021-07-23 21:59:59 -07:00
David Goldblatt	6630c59896	HPA: Hugification hysteresis. We wait a while after deciding a huge extent should get hugified to see if it gets purged before long. This avoids hugifying extents that might shortly get dehugified for purging. Rename and use the hpa_dehugification_threshold option support code for this, since it's now ignored.	2021-07-12 17:59:18 -07:00
David Goldblatt	113938b6f4	HPA: Pull out a hooks type. For now, this is a no-op change. In a subsequent commit, it will be useful for testing.	2021-07-12 17:59:18 -07:00
David Goldblatt	ce9386370a	HPA: Implement batch allocation.	2021-02-19 15:10:54 -08:00
David Goldblatt	b3df80bc79	Pull HPA options into a containing struct. Currently that just means max_alloc, but we're about to add more. While we're touching these lines anyways, tweak things to be more in line with testing.	2021-02-04 20:58:31 -08:00
David Goldblatt	ca30b5db2b	Introduce hpdata_t. Using an edata_t both for hugepages and the allocations within those hugepages was convenient at first, but has outlived its usefulness. Representing hugepages explicitly, with their own data structure, will make future development easier.	2020-12-07 06:21:08 -08:00
David Goldblatt	4a15008cfb	HPA unit test: skip if unsupported. Previously, we replicated the logic in hpa_supported in the test as well.	2020-12-07 06:21:08 -08:00
David Goldblatt	43af63fff4	HPA: Manage whole hugepages at a time. This redesigns the HPA implementation to allow us to manage hugepages all at once, locally, without relying on a global fallback.	2020-12-07 06:21:08 -08:00
David Goldblatt	534504d4a7	HPA: add size-exclusion functionality. I.e. only allowing allocations under or over certain sizes.	2020-10-23 11:14:34 -07:00
David Goldblatt	1c7da33317	HPA: Tie components into a PAI implementation.	2020-10-23 11:14:34 -07:00

37 commits
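
The estimate described in commit ad108d50f1 (`nactive_max * (1 + dirty_mult)`) can be illustrated with a small C sketch. This is not jemalloc's code: the function names, the sample arrays, the integer-percent representation of `dirty_mult`, and the reading that the dirty-page allowance is the estimate minus the currently active pages are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a jemalloc function): return the peak number of
 * active pages seen inside the sliding window, never less than the current
 * value. A window of 0 ms means peak tracking is off, so the projection is
 * simply the active page count we have right now.
 */
size_t
sketch_peak_in_window(const size_t *nactive, const unsigned long *t_ms,
    size_t nsamples, unsigned long now_ms, unsigned long window_ms,
    size_t nactive_now) {
	size_t peak = nactive_now;
	if (window_ms == 0) {
		return peak;
	}
	for (size_t i = 0; i < nsamples; i++) {
		/* Skip samples from the future or older than the window. */
		if (t_ms[i] > now_ms || now_ms - t_ms[i] > window_ms) {
			continue;
		}
		if (nactive[i] > peak) {
			peak = nactive[i];
		}
	}
	return peak;
}

/*
 * Hypothetical helper: the estimate is peak * (1 + dirty_mult), with
 * dirty_mult given as an integer percent (an assumption of this sketch).
 * Assumed interpretation: pages above the current active count are the
 * dirty pages we are willing to keep.
 */
size_t
sketch_ndirty_max(size_t nactive_now, size_t nactive_peak,
    unsigned dirty_mult_percent) {
	assert(nactive_peak >= nactive_now);
	size_t estimate = nactive_peak
	    + nactive_peak * dirty_mult_percent / 100;
	return estimate - nactive_now;
}
```

With 80 pages active now, a windowed peak of 100 pages and 25 percent slack, the sketch keeps up to 45 dirty pages; with tracking off (peak equal to the current 80 pages) it keeps up to 20, which corresponds to the ratio-based behaviour the commit message describes as the default.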