romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-06-02 18:24:17 +03:00

Author	SHA1	Message	Date
Slobodan Predolac	88745978e9	Pass surviving descriptor through jemalloc_postfork_child orchestrator The orchestrator looks up the surviving descriptor via tcache_postfork_arena_descriptor and threads it into arena_postfork_child, eliminating arena's call into tcache. Also reset cache_bin_array_descriptor_ql_mtx right before the queue rebuild it protects.	2026-05-13 17:50:41 -04:00
Slobodan Predolac	3cd9753e23	Move tcache_stats_merge into arena as arena_cache_bins_stats_merge	2026-05-13 17:50:41 -04:00
Slobodan Predolac	35d102fa32	Encapsulate cache_bin_array_descriptor queue ops behind arena helpers tcache.c was reaching into arena->cache_bin_array_descriptor_ql{,_mtx} directly to register / unregister / postfork-relink its descriptor. That queue and mutex are owned by arena, so the locking and ql_* operations belong in arena.c.	2026-05-13 17:50:41 -04:00
Slobodan Predolac	36820f9b76	Drop redundant tcache_t param from tcache_arena_{associate,dissociate,reassociate} After tcache_init runs, tcache_slow->tcache == tcache always holds, so the tcache_t parameter to the three association helpers is derivable from tcache_slow.	2026-05-13 17:50:41 -04:00
Slobodan Predolac	b92420d309	Replace arena->tcache_ql with cache_bin_array_descriptor_ql walks Drop the duplicate arena->tcache_ql; stats merging walks the cache_bin_array_descriptor_ql directly. Rename the protecting mutex from tcache_ql_mtx to cache_bin_array_descriptor_ql_mtx to match. Add an assertion in test_thread_migrate_arena that the dissociate-time flush zeros cache_bin->tstats.nrequests.	2026-05-13 17:50:41 -04:00
Slobodan Predolac	54ef51121b	Extract postfork-child tcache list relink into tcache_arena_postfork_child	2026-05-13 17:50:41 -04:00
Slobodan Predolac	3c1c6ae419	Hide bin slab-locality query behind arena_locality_hint bin_t is an arena implementation detail; tcache should not reach into it. Extract the slab-address lookup into bin.c as bin_current_slab_addr, and expose it to tcache only through arena_locality_hint(tsdn, arena, szind), which composes bin_choose + bin_current_slab_addr.	2026-05-13 17:50:41 -04:00
Slobodan Predolac	e286fba00a	Extract bin->stats.nrequests mutation into bin_stats_nrequests_add	2026-05-13 17:50:41 -04:00
Slobodan Predolac	19bbefe136	Remove dead code: extent_commit_wrapper, large_salloc, tcache_gc_dalloc event waits These functions had zero callers anywhere in the codebase: - extent_commit_wrapper: wrapper never called, _impl used directly - large_salloc: trivial wrapper never called - tcache_gc_dalloc_new_event_wait: no header declaration, no callers - tcache_gc_dalloc_postponed_event_wait: no header declaration, no callers	2026-04-01 17:48:19 -04:00
Carl Shapiro	a056c20d67	Handle tcache init failures gracefully tsd_tcache_data_init() returns true on failure but its callers ignore this return value, leaving the per-thread tcache in an uninitialized state after a failure. This change disables the tcache on an initialization failure and logs an error message. If opt_abort is true, it will also abort. New unit tests have been added to test tcache initialization failures.	2026-03-10 18:14:33 -07:00
Carl Shapiro	1cc563f531	Move bin functions from arena.c to bin.c This is a clean-up change that gives the bin functions implemented in the arena code a prefix of bin_ and moves them into the bin code. To further decouple the bin code from the arena code, bin functions that had taken an arena_t to check arena_is_auto now take an is_auto parameter instead.	2026-03-10 18:14:33 -07:00
Shirui Cheng	6d4611197e	move fill/flush pointer array out of tcache.c	2026-03-10 18:14:33 -07:00
Shirui Cheng	2114349a4e	Revert PR #2608 : Manually revert commits 70c94d..f9c0b5 Closes: #2707	2026-03-10 18:14:33 -07:00
guangli-dai	6200e8987f	Reformat the codebase with the clang-format 18.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	015b017973	[thread_event] Add support for user events in thread events when stats are enabled	2026-03-10 18:14:33 -07:00
Slobodan Predolac	e6864c6075	[thread_event] Remove macros from thread_event and replace with dynamic event objects	2026-03-10 18:14:33 -07:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy Until now, jemalloc has converted size to usize by rounding size up to the closest size class. However, this wastes a lot of memory with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When the build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When the runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note that when the build-time config is enabled, the runtime option is on by default. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE. For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size are no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs use when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note that the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled, but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Shirui Cheng	7c99686165	Better handle burst allocation on tcache_alloc_small_hard	2024-08-29 10:50:33 -07:00
Shirui Cheng	0c88be9e0a	Regulate GC frequency by requiring a time interval between two consecutive GCs	2024-08-29 10:50:33 -07:00
Shirui Cheng	e2c9f3a9ce	Take locality into consideration when doing GC flush	2024-08-29 10:50:33 -07:00
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Qi Wang	bd0a5b0f3b	Fix static analysis warnings. Newly reported warnings included several reserved macro identifiers, and false-positive used-uninitialized.	2024-08-28 16:03:53 -07:00
Amaury Séchet	a25b9b8ba9	Simplify the logic when bumping lg_fill_div.	2024-08-06 13:31:49 -07:00
Shirui Cheng	47c9bcd402	Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items	2024-08-06 13:16:09 -07:00
David Goldblatt	f9c0b5f7f8	Bin batching: add some stats. This lets us easily see what fraction of flush load is being taken up by the bins, and helps guide future optimization approaches (for example: should we prefetch during cache bin fills? It depends on how many objects the average fill pops out of the batch).	2024-05-22 10:30:31 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	44d91cf243	Tcache flush: Partition by bin before locking. This accomplishes two things: - It avoids a full array scan (and any attendant branch prediction misses, etc.) while holding the bin lock. - It allows us to know the number of items that will be flushed before flushing them, which will (in an upcoming commit) let us know if it's safe to use the batched flush (in which case we won't acquire the bin mutex).	2024-05-22 10:30:31 -07:00
David Goldblatt	6e56848850	Tcache: Split up small/large handling. The main bits of shared code are the edata filtering and the stats flushing logic, both of which are fairly simple to read and not so painful to duplicate. The shared code comes at the cost of guarding all the subtle logic with `if (small)`, which doesn't feel worth it.	2024-05-22 10:30:31 -07:00
Amaury Séchet	5afff2e44e	Simplify the logic in tcache_gc_small.	2024-05-02 18:52:19 -07:00
Qi Wang	fa451de17f	Fix the tcache flush sanity checking around ncached and nstashed. When there were many items stashed, it's possible that after flushing stashed, ncached is already lower than the remain, in which case the flush can simply return at that point.	2024-04-12 16:01:55 -07:00
guangli-dai	eda05b3994	Fix static analysis warnings.	2024-01-03 14:18:52 -08:00
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
guangli-dai	d88fa71bbd	Fix nfill = 0 bug when ncached_max is 1	2023-10-18 14:11:46 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (it will be rounded up if the input is not an exact bin size). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcache stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypassing the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Qi Wang	f509703af5	Fix two conversion warnings in tcache.	2022-01-04 13:55:06 -08:00
Qi Wang	8b34a788b5	Fix a used-uninitialized warning (false positive).	2021-12-29 14:44:43 -08:00
Qi Wang	e491cef9ab	Add stats for stashed bytes in tcache.	2021-12-29 14:44:43 -08:00
Qi Wang	b75822bc6e	Implement use-after-free detection using junk and stash. On deallocation, sampled pointers (specially aligned) get junked and stashed into tcache (to prevent immediate reuse). The expected behavior is to have read-after-free corrupted and stopped by the junk-filling, while write-after-free is checked when flushing the stashed pointers.	2021-12-29 14:44:43 -08:00
Qi Wang	06aac61c4b	Split the core logic of tcache flush into a separate function. The core function takes a ptr array as input (containing items to be flushed), which will be reused to flush sanitizer-stashed items.	2021-12-29 14:44:43 -08:00
Qi Wang	041145c272	Report the correct and wrong sizes on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
Qi Wang	f3b2668b32	Report the offending pointer on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
David Goldblatt	20140629b4	Bin: Move stats closer to the mutex. This is a slight cache locality optimization.	2021-02-04 14:10:43 -08:00
David Goldblatt	3967329813	Arena: share bin offsets in a global. This saves us a cache miss when looking up the arena bin offset in a remote arena during tcache flush. All arenas share the base offset, and so we don't need to look it up repeatedly for each arena. Secondarily, it shaves 288 bytes off the arena on, e.g., x86-64.	2021-02-04 14:10:43 -08:00
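
The usize policy described in commit c067a55c79 (round sizes above USIZE_GROW_SLOW_THRESHOLD up to a multiple of PAGE instead of up to a size class) can be illustrated with a small sketch. This is not jemalloc's code: the `sketch_` names, the 4096-byte page, the four-page threshold, and the power-of-two stand-in for size-class rounding are all assumptions made for illustration only.

```c
#include <stddef.h>

/* Assumed page size; the real PAGE is platform-dependent. */
#define SKETCH_PAGE ((size_t)4096)
/* Hypothetical stand-in for USIZE_GROW_SLOW_THRESHOLD. */
#define SKETCH_THRESHOLD ((size_t)4 * SKETCH_PAGE)

/* Round size up to the next multiple of the (power-of-two) page size. */
static size_t sketch_page_ceil(size_t size) {
    return (size + SKETCH_PAGE - 1) & ~(SKETCH_PAGE - 1);
}

/*
 * Stand-in for size-class rounding: the next power of two.
 * Real jemalloc size classes are finer grained than this.
 */
static size_t sketch_class_ceil(size_t size) {
    size_t c = 1;
    while (c < size) {
        c <<= 1;
    }
    return c;
}

/* Above the threshold, grow by pages; otherwise use the class rounding. */
static size_t sketch_usize(size_t size) {
    if (size > SKETCH_THRESHOLD) {
        return sketch_page_ceil(size);
    }
    return sketch_class_ceil(size);
}
```

Under these assumptions, a request of five pages plus one byte becomes six pages with `sketch_usize`, whereas the power-of-two stand-in alone would round it to eight pages. That difference is the kind of waste the commit message says the page-granular policy is meant to limit.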


233 commits