romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-04-14 22:51:50 +03:00

Author	SHA1	Message	Date
Slobodan Predolac	7bf49ca689	Remove prof_threshold built-in event. It is trivial to implement it as user event if needed	2026-02-25 09:06:29 -08:00
Slobodan Predolac	afce889cc9	[SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin	2025-12-19 12:51:26 -05:00
Slobodan Predolac	bfb63eaf41	Add experimental_enforce_hugify	2025-11-21 11:30:16 -08:00
Slobodan Predolac	6ced85a8e5	When extracting from central, hugify_eager is different than start_as_huge	2025-10-15 20:27:35 -04:00
Slobodan Predolac	f8ef3195e5	[HPA] Add ability to start page as huge and more flexibility for purging	2025-10-06 10:59:43 -04:00
Slobodan Predolac	db943a0d9e	Running clang-format on two files	2025-10-06 10:59:43 -04:00
Carl Shapiro	218c989047	Always use pthread_equal to compare thread IDs This change replaces direct comparisons of Pthread thread IDs with calls to pthread_equal. Directly comparing thread IDs is neither portable nor reliable since a thread ID is defined as an opaque type that can be implemented using a structure.	2025-08-28 14:45:51 -07:00
Slobodan Predolac	d4cde60066	Remove pidfd_open call handling and rely on PIDFD_SELF	2025-08-27 12:44:45 -04:00
Shirui Cheng	e2da7477f8	Revert PR #2608 : Manually revert commits 70c94d..f9c0b5	2025-08-22 21:55:24 -07:00
Slobodan Predolac	97d25919c3	[process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork	2025-07-29 15:47:53 -07:00
guangli-dai	34f359e0ca	Reformat the codebase with the clang-format 18.	2025-06-20 14:35:15 -07:00
Shirui Cheng	ae4e729d15	Update the default value for opt_experimental_tcache_gc and opt_calloc_madvise_threshold	2025-06-17 13:25:20 -07:00
Slobodan Predolac	390e70c840	[thread_event] Add support for user events in thread events when stats are enabled	2025-06-11 15:37:03 -07:00
Jason Evans	27d7960cf9	Revert "Extend purging algorithm with peak demand tracking" This reverts commit `ad108d50f1`.	2025-06-02 10:44:37 -07:00
guangli-dai	8347f1045a	Renaming limit_usize_gap to disable_large_size_classes	2025-05-06 14:47:35 -07:00
Guangli Dai	01e9ecbeb2	Remove build-time configuration 'config_limit_usize_gap'	2025-05-06 14:47:35 -07:00
Qi Wang	a3910b9802	Avoid forced purging during thread-arena migration when bg thd is on.	2025-04-25 19:18:20 -07:00
Shirui Cheng	3688dfb5c3	fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test	2025-03-20 12:45:23 -07:00
Shirui Cheng	e1a77ec558	Support THP with Huge Arena in PAC	2025-03-17 16:06:43 -07:00
Dmitry Ilvokhin	ad108d50f1	Extend purging algorithm with peak demand tracking Implementation inspired by idea described in "Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator" paper [1]. Primary idea is to track maximum number (peak) of active pages in use with sliding window and then use this number to decide how many dirty pages we would like to keep. We are trying to estimate maximum amount of active memory we'll need in the near future. We do so by projecting future active memory demand (based on peak active memory usage we observed in the past within sliding window) and adding slack on top of it (an overhead is reasonable to have in exchange of higher hugepages coverage). When peak demand tracking is off, projection of future active memory is active memory we are having right now. Estimation is essentially the same as `nactive_max * (1 + dirty_mult)`. Peak demand purging algorithm controlled by two config options. Option `hpa_peak_demand_window_ms` controls duration of sliding window we track maximum active memory usage in and option `hpa_dirty_mult` controls amount of slack we are allowed to have as a percent from maximum active memory usage. By default `hpa_peak_demand_window_ms == 0` now and we have same behaviour (ratio based purging) that we had before this commit. [1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf	2025-03-13 10:12:22 -07:00
Qi Wang	22440a0207	Implement process_madvise support. Add opt.process_madvise_max_batch which determines if process_madvise is enabled (non-zero) and the max # of regions in each batch. Added another limiting factor which is the space to reserve on stack, which results in the max batch of 128.	2025-03-07 15:32:32 -08:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy Converting size to usize is what jemalloc has been doing by ceiling size to the closest size class. However, this causes lots of memory waste with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note when the build-time config is enabled, the runtime option is default on. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size is no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs use when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note for the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Guangli Dai	ac279d7e71	Fix profiling sample metadata lookup during xallocx	2025-03-04 14:42:04 -08:00
Dmitry Ilvokhin	499f306859	Fix arena 0 `deferral_allowed` flag init Arena 0 have a dedicated initialization path, which differs from initialization path of other arenas. The main difference for the purpose of this change is that we initialize arena 0 before we initialize background threads. HPA shard options have `deferral_allowed` flag which should be equal to `background_thread_enabled()` return value, but it wasn't the case before this change, because for arena 0 `background_thread_enabled()` was initialized correctly after arena 0 initialization phase already ended. Below is initialization sequence for arena 0 after this commit to illustrate everything still should be initialized correctly. * `hpa_central_init` initializes HPA Central, before we initialize every HPA shard (including arena's 0). * `background_thread_boot1` initializes `background_thread_enabled()` return value. * `pa_shard_enable_hpa` initializes arena 0 HPA shard. Call tree (left to right): `malloc_init_hard` -> { `malloc_init_hard_a0_locked` -> `arena_boot` -> `pa_central_init` -> `hpa_central_init`; `background_thread_boot1` -> `background_thread_enabled_seta`; `pa_shard_enable_hpa` -> `hpa_shard_init` }	2025-02-18 12:10:35 -08:00
roblabla	c17bf8b368	Disable config from file or envvar with build flag This adds a new autoconf flag, --disable-user-config, which disables reading the configuration from /etc/malloc.conf or the MALLOC_CONF environment variable. This can be useful when integrating jemalloc in a binary that internally handles all aspects of the configuration and shouldn't be impacted by ambient change in the environment.	2025-02-05 15:01:50 -08:00
Shai Duvdevani	257e64b968	Unlike `prof_sample` which is supported only with profiling mode active, `prof_threshold` is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks.	2025-01-29 18:55:52 -08:00
Dmitry Ilvokhin	3820e38dc1	Remove validation for HPA ratios Config validation was introduced at `3aae792b` with main intention to fix infinite purging loop, but it didn't actually fix the underlying problem, just masked it. Later `47d69b4ea` was merged to address the same problem. Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different application dimensions: `hpa_dirty_mult` applied to active memory on the shard, but `hpa_hugification_threshold` is a threshold for single pageslab (hugepage). It doesn't make much sense to sum them up together. While it is true that too high value of `hpa_dirty_mult` and too low value of `hpa_hugification_threshold` can lead to pathological behaviour, it is true for other options as well. Poor configurations might lead to suboptimal and sometimes completely unacceptable behaviour and that's OK, that is exactly the reason why they are called poor. There are other mechanism exist to prevent extreme behaviour, when we hugified and then immediately purged page, see `hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly this case. Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is too tight and prevents a lot of valid configurations.	2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin	0ce13c6fb5	Add opt `hpa_hugify_sync` to hugify synchronously Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort synchronous collapse of the native pages mapped by the memory range into transparent huge pages. Synchronous hugification might be beneficial for at least two reasons: we are not relying on khugepaged anymore and get an instant feedback if range wasn't hugified. If `hpa_hugify_sync` option is on, we'll try to perform synchronously collapse and if it wasn't successful, we'll fallback to asynchronous behaviour.	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	4f4fd42447	Remove `strict_min_purge_interval` option Option `experimental_hpa_strict_min_purge_interval` was expected to be temporary to simplify rollout of a bugfix. Now, when bugfix rollout is complete it is safe to remove this option.	2024-09-25 11:49:18 -07:00
Qi Wang	3383b98f1b	Check if the huge page size is expected when enabling HPA.	2024-09-04 15:43:59 -07:00
Shirui Cheng	f68effe4ac	Add a runtime option opt_experimental_tcache_gc to guard the new design	2024-08-29 10:50:33 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy `hpa_dirty_mult` requirement. Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have steady rate of pages returned back to the system. This provides stricter latency guarantees, as the number of `madvise` calls is bounded (and hence the number of TLB shootdowns is limited), in exchange for weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	867c6dd7dc	Option to guard `hpa_min_purge_interval_ms` fix Change in `hpa_min_purge_interval_ms` handling logic is not backward compatible as it might increase memory usage. Now this logic guarded by `hpa_strict_min_purge_interval` option. When `hpa_strict_min_purge_interval` is true, we will purge no more than `hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is false, old purging logic behaviour is preserved. Long term strategy migrate all users of hpa to new logic and then delete `hpa_strict_min_purge_interval` option.	2024-06-07 10:52:41 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	c085530c71	Tcache batching: Plumbing In the next commit, we'll start using the batcher to eliminate mutex traffic. To avoid cluttering up that commit with the random bits of busy-work it entails, we'll centralize them here. This commit introduces: - A batched bin type. - The ability to mix batched and unbatched bins in the arena. - Conf parsing to set batches per size and a max batched size. - mallctl access to the corresponding opt-namespace keys. - Stats output of the above.	2024-05-22 10:30:31 -07:00
Qi Wang	8d8379da44	Fix background_thread creation for the oversize_arena. Bypassing background thread creation for the oversize_arena used to be an optimization since that arena had eager purging. However #2466 changed the purging policy for the oversize_arena -- specifically it switched to the default decay time when background_thread is enabled. This issue is noticeable when the number of arenas is low: whenever the total # of arenas is <= 4 (which is the default max # of background threads), purging will be stalled since no background thread is created for the oversize_arena.	2024-05-02 14:45:18 -07:00
Daniel Hodges	11038ff762	Add support for namespace pids in heap profile names This change adds support for writing pid namespaces to the filename of a heap profile. When running with namespaces pids may reused across namespaces and if mounts are shared where profiles are written there is not a great way to differentiate profiles between pids. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com> Signed-off-by: Daniel Hodges <hodgesd@fb.com>	2024-04-09 10:27:52 -07:00
Qi Wang	83b075789b	rallocx path: only set errno on the realloc case.	2024-04-05 17:41:43 -07:00
Shirui Cheng	5081c16bb4	Experimental calloc implementation with using memset on larger sizes	2024-04-04 15:31:56 -07:00
Juhyung Park	38056fea64	Set errno to ENOMEM on rallocx() OOM failures realloc() and rallocx() shares path, and realloc() should set errno to ENOMEM upon OOM failures. Fixes: `ee961c2310` ("Merge realloc and rallocx pathways.") Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>	2024-04-04 15:13:22 -07:00
Shirui Cheng	373884ab48	print out all malloc_conf settings in stats	2024-02-29 12:12:44 -08:00
Qi Wang	1aba4f41a3	Allow zero sized memalign to pass. Instead of failing on assertions. Previously the same change was made for posix_memalign and aligned_alloc (#1554). Make memalign behave the same way even though it's obsolete.	2024-02-16 13:06:07 -08:00
guangli-dai	b1792c80d2	Add LOGs when entering and exiting free and sdallocx.	2024-01-11 14:37:20 -08:00
guangli-dai	eda05b3994	Fix static analysis warnings.	2024-01-03 14:18:52 -08:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be upper casted if not an exact bin size is given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
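
Commit 218c989047 replaces direct comparisons of Pthread thread IDs with calls to `pthread_equal`, because `pthread_t` is an opaque type that may be implemented as a structure. A minimal sketch of that pattern follows; the `owner_tag_t` type and its helper functions are hypothetical names made up for illustration and are not taken from the jemalloc sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ownership tag: remembers which thread currently holds it. */
typedef struct {
    pthread_t owner; /* meaningful only while held is true */
    bool held;
} owner_tag_t;

static void owner_tag_init(owner_tag_t *tag) {
    tag->held = false;
}

/* Portable check: never write `tag->owner == pthread_self()`. */
static bool owner_tag_is_self(const owner_tag_t *tag) {
    return tag->held && pthread_equal(tag->owner, pthread_self()) != 0;
}

static void owner_tag_set(owner_tag_t *tag) {
    tag->owner = pthread_self();
    tag->held = true;
}

static void owner_tag_clear(owner_tag_t *tag) {
    /* Only the owning thread is expected to release the tag. */
    assert(owner_tag_is_self(tag));
    tag->held = false;
}
```

`pthread_equal` returns a nonzero value when both IDs refer to the same thread, so the sketch compares its result against 0 rather than treating it as a strict boolean.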


573 commits
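
The message of commit ad108d50f1 (later reverted by 27d7960cf9) describes tracking the peak number of active pages over a sliding window and projecting demand as roughly `nactive_max * (1 + dirty_mult)`. The sketch below illustrates only that idea. The bucketed window, the `peak_demand_t` layout, all function names, and the use of `double` for `dirty_mult` are assumptions made for this example, not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PD_NBUCKETS 4 /* assumed granularity of the sliding window */

typedef struct {
    uint64_t bucket_ms;                /* time span covered by one bucket */
    uint64_t epoch[PD_NBUCKETS];       /* epoch each bucket was last written in */
    size_t nactive_max[PD_NBUCKETS];   /* peak active pages seen in that epoch */
} peak_demand_t;

static void peak_demand_init(peak_demand_t *pd, uint64_t window_ms) {
    assert(window_ms >= PD_NBUCKETS); /* keeps bucket_ms nonzero */
    pd->bucket_ms = window_ms / PD_NBUCKETS;
    for (size_t i = 0; i < PD_NBUCKETS; i++) {
        pd->epoch[i] = 0;
        pd->nactive_max[i] = 0;
    }
}

/* Record the current active page count; now_ms is assumed to be monotonic. */
static void peak_demand_update(peak_demand_t *pd, uint64_t now_ms,
    size_t nactive) {
    uint64_t epoch = now_ms / pd->bucket_ms;
    size_t i = (size_t)(epoch % PD_NBUCKETS);
    if (pd->epoch[i] != epoch) {
        /* The bucket holds data from an older epoch: recycle it. */
        pd->epoch[i] = epoch;
        pd->nactive_max[i] = 0;
    }
    if (nactive > pd->nactive_max[i]) {
        pd->nactive_max[i] = nactive;
    }
}

/* Peak active pages over the buckets that still fall inside the window. */
static size_t peak_demand_nactive_max(const peak_demand_t *pd,
    uint64_t now_ms) {
    uint64_t epoch = now_ms / pd->bucket_ms;
    size_t peak = 0;
    for (size_t i = 0; i < PD_NBUCKETS; i++) {
        if (epoch - pd->epoch[i] < PD_NBUCKETS && pd->nactive_max[i] > peak) {
            peak = pd->nactive_max[i];
        }
    }
    return peak;
}

/* Projected demand in pages: the peak plus slack on top of it. */
static size_t projected_demand(size_t nactive_max, double dirty_mult) {
    return nactive_max + (size_t)((double)nactive_max * dirty_mult);
}
```

A caller would feed `peak_demand_update` on each allocation-state change, then pass the result of `peak_demand_nactive_max` to `projected_demand` when deciding how much memory to keep around. Bucketing trades precision for constant memory: a peak expires only when its whole bucket leaves the window.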