romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-04-14 22:51:50 +03:00

Author	SHA1	Message	Date
Slobodan Predolac	34ace9169b	Remove prof_threshold built-in event. It is trivial to implement it as user event if needed	2026-03-10 18:14:33 -07:00
Slobodan Predolac	d4908fe44a	Revert "Experimental configuration option for fast path prefetch from cache_bin" This reverts commit `f9fae9f1f8`.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	6016d86c18	[SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin	2026-03-10 18:14:33 -07:00
Slobodan Predolac	8a06b086f3	[EASY] Extract hpa_central component from hpa source file	2026-03-10 18:14:33 -07:00
Slobodan Predolac	355774270d	[EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils	2026-03-10 18:14:33 -07:00
Slobodan Predolac	47aeff1d08	Add experimental_enforce_hugify	2026-03-10 18:14:33 -07:00
Shirui Cheng	6d4611197e	move fill/flush pointer array out of tcache.c	2026-03-10 18:14:33 -07:00
Slobodan Predolac	3678a57c10	When extracting from central, hugify_eager is different than start_as_huge	2026-03-10 18:14:33 -07:00
guangli-dai	2cfa41913e	Refactor init_system_thp_mode and print it in malloc stats.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	5e49c28ef0	[EASY] Spelling in the comments	2026-03-10 18:14:33 -07:00
Slobodan Predolac	a199278f37	[HPA] Add ability to start page as huge and more flexibility for purging	2026-03-10 18:14:33 -07:00
Slobodan Predolac	2688047b56	Revert "Do not dehugify when purging" This reverts commit `16c5abd1cd`.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	5d5f76ee01	Remove pidfd_open call handling and rely on PIDFD_SELF	2026-03-10 18:14:33 -07:00
Slobodan Predolac	2a66c0be5a	[EASY][BUGFIX] Spelling and format	2026-03-10 18:14:33 -07:00
lexprfuncall	38b12427b7	Define malloc_{write,read}_fd as non-inline global functions. The static inline definition made more sense when these functions just dispatched to a syscall wrapper. Since they acquired a retry loop, a non-inline definition makes more sense.	2026-03-10 18:14:33 -07:00
lexprfuncall	9fdc1160c5	Handle interruptions and retries of read(2) and write(2)	2026-03-10 18:14:33 -07:00
Shirui Cheng	2114349a4e	Revert PR #2608: Manually revert commits 70c94d..f9c0b5. Closes: #2707	2026-03-10 18:14:33 -07:00
Slobodan Predolac	d73de95f72	Experimental configuration option for fast path prefetch from cache_bin	2026-03-10 18:14:33 -07:00
lexprfuncall	a156e997d7	Do not dehugify when purging. Giving the advice MADV_DONTNEED to a range of virtual memory backed by a transparent huge page already causes that range of virtual memory to become backed by regular pages.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	4246475b44	[process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork	2026-03-10 18:14:33 -07:00
Slobodan Predolac	711fff750c	Add experimental support for usdt systemtap probes	2026-03-10 18:14:33 -07:00
guangli-dai	6200e8987f	Reformat the codebase with the clang-format 18.	2026-03-10 18:14:33 -07:00
Shirui Cheng	a952a3b8b0	Update the default value for opt_experimental_tcache_gc and opt_calloc_madvise_threshold	2026-03-10 18:14:33 -07:00
Slobodan Predolac	015b017973	[thread_event] Add support for user events in thread events when stats are enabled	2026-03-10 18:14:33 -07:00
Slobodan Predolac	e6864c6075	[thread_event] Remove macros from thread_event and replace with dynamic event objects	2026-03-10 18:14:33 -07:00
Jason Evans	27d7960cf9	Revert "Extend purging algorithm with peak demand tracking" This reverts commit `ad108d50f1`.	2025-06-02 10:44:37 -07:00
guangli-dai	edaab8b3ad	Turn clang-format off for codes with multi-line commands in macros	2025-05-28 19:22:21 -07:00
guangli-dai	fd60645260	Add one more check to double free validation.	2025-05-28 19:21:49 -07:00
Xin Yang	5e460bfea2	Refactor: use the cache_bin_sz_t typedef instead of direct uint16_t. Any future changes to the underlying data type for bin sizes (such as upgrading from `uint16_t` to `uint32_t`) can be achieved by modifying only the `cache_bin_sz_t` definition. Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>	2025-05-22 10:43:33 -07:00
Xin Yang	9169e9272a	Fix: Adjust CACHE_BIN_NFLUSH_BATCH_MAX size to prevent assert failures. The maximum allowed value for `nflush_batch` is `CACHE_BIN_NFLUSH_BATCH_MAX`. However, `tcache_bin_flush_impl_small` could potentially declare an array of `emap_batch_lookup_result_t` of size `CACHE_BIN_NFLUSH_BATCH_MAX + 1`, which leads to a `VARIABLE_ARRAY` assertion failure, observed when `tcache_nslots_small_max` is configured to 2048. This patch ensures the array size does not exceed the allowed maximum. Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>	2025-05-22 10:27:09 -07:00
guangli-dai	37bf846cc3	Fixes to prevent static analysis warnings.	2025-05-06 14:47:35 -07:00
guangli-dai	8347f1045a	Renaming limit_usize_gap to disable_large_size_classes	2025-05-06 14:47:35 -07:00
Guangli Dai	01e9ecbeb2	Remove build-time configuration 'config_limit_usize_gap'	2025-05-06 14:47:35 -07:00
Slobodan Predolac	852da1be15	Add experimental option to force using SYS_process_madvise	2025-04-28 18:45:30 -07:00
Slobodan Predolac	1956a54a43	[process_madvise] Use process_madvise across multiple huge_pages	2025-04-25 19:19:03 -07:00
Slobodan Predolac	0dfb4a5a1a	Add output argument to hpa_purge_begin to count dirty ranges	2025-04-25 19:19:03 -07:00
Slobodan Predolac	cfa90dfd80	Refactor hpa purging to prepare for vectorized call across multiple pages	2025-04-25 19:19:03 -07:00
guangli-dai	c20a63a765	Silence the uninitialized warning from clang.	2025-04-16 10:38:10 -07:00
Slobodan Predolac	f19f49ef3e	if process_madvise is supported, call it when purging hpa	2025-04-04 13:57:42 -07:00
Kaspar M. Rohrer	80e9001af3	Move `extern "C"` specifications for C++ to where they are needed. This should fix errors when compiling C++ code with modules enabled on clang.	2025-03-31 10:41:51 -07:00
Shirui Cheng	3688dfb5c3	fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test	2025-03-20 12:45:23 -07:00
Shirui Cheng	e1a77ec558	Support THP with Huge Arena in PAC	2025-03-17 16:06:43 -07:00
Guangli Dai	773b5809f9	Fix frame pointer based unwinder to handle changing stack range	2025-03-13 17:15:42 -07:00
Dmitry Ilvokhin	ad108d50f1	Extend purging algorithm with peak demand tracking. The implementation is inspired by the idea described in the "Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator" paper [1]. The primary idea is to track the maximum number (peak) of active pages in use within a sliding window and then use this number to decide how many dirty pages we would like to keep. We are trying to estimate the maximum amount of active memory we'll need in the near future. We do so by projecting future active memory demand (based on the peak active memory usage we observed in the past within the sliding window) and adding slack on top of it (some overhead is reasonable in exchange for higher hugepage coverage). When peak demand tracking is off, the projection of future active memory is the active memory we have right now. The estimate is essentially the same as `nactive_max * (1 + dirty_mult)`. The peak demand purging algorithm is controlled by two config options. Option `hpa_peak_demand_window_ms` controls the duration of the sliding window in which we track maximum active memory usage, and option `hpa_dirty_mult` controls the amount of slack we are allowed to have as a percentage of maximum active memory usage. By default `hpa_peak_demand_window_ms == 0` now, and we have the same behaviour (ratio-based purging) that we had before this commit. [1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf	2025-03-13 10:12:22 -07:00
Qi Wang	22440a0207	Implement process_madvise support. Add opt.process_madvise_max_batch which determines if process_madvise is enabled (non-zero) and the max # of regions in each batch. Added another limiting factor which is the space to reserve on stack, which results in the max batch of 128.	2025-03-07 15:32:32 -08:00
Guangli Dai	6035d4a8d3	Cache extra extents in the dirty pool from ecache_alloc_grow	2025-03-06 15:08:13 -08:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy. jemalloc has so far converted size to usize by rounding size up to the closest size class. However, this wastes a lot of memory with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When the build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When the runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note that when the build-time config is enabled, the runtime option is on by default. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE. For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index, since the previous heap might also have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size are no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs use when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note that the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled, but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Dmitry Ilvokhin	421b17a622	Remove age_counter from hpa_central. Before this commit we had two age counters: one global in HPA central and one local in each HPA shard. We used the HPA shard counter when we reused an empty pageslab and the HPA central counter anywhere else. They are supposed to be comparable, because we use them for allocation placement decisions, but in reality they are not: there are no ordering guarantees between them. At the moment, there is no way for a pageslab to migrate between HPA shards, so we don't actually need the HPA central age counter.	2025-02-13 16:00:41 -08:00
roblabla	c17bf8b368	Disable config from file or envvar with build flag. This adds a new autoconf flag, --disable-user-config, which disables reading the configuration from /etc/malloc.conf or the MALLOC_CONF environment variable. This can be useful when integrating jemalloc in a binary that internally handles all aspects of the configuration and shouldn't be impacted by ambient changes in the environment.	2025-02-05 15:01:50 -08:00
Shai Duvdevani	257e64b968	Unlike `prof_sample` which is supported only with profiling mode active, `prof_threshold` is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks.	2025-01-29 18:55:52 -08:00

1708 commits