romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-04-14 22:51:50 +03:00

Author	SHA1	Message	Date
Slobodan Predolac	19bbefe136	Remove dead code: extent_commit_wrapper, large_salloc, tcache_gc_dalloc event waits These functions had zero callers anywhere in the codebase: - extent_commit_wrapper: wrapper never called, _impl used directly - large_salloc: trivial wrapper never called - tcache_gc_dalloc_new_event_wait: no header declaration, no callers - tcache_gc_dalloc_postponed_event_wait: no header declaration, no callers	2026-04-01 17:48:19 -04:00
Carl Shapiro	a056c20d67	Handle tcache init failures gracefully tsd_tcache_data_init() returns true on failure but its callers ignore this return value, leaving the per-thread tcache in an uninitialized state after a failure. This change disables the tcache on an initialization failure and logs an error message. If opt_abort is true, it will also abort. New unit tests have been added to test tcache initialization failures.	2026-03-10 18:14:33 -07:00
Carl Shapiro	1cc563f531	Move bin functions from arena.c to bin.c This is a clean-up change that gives the bin functions implemented in the arena code a prefix of bin_ and moves them into the bin code. To further decouple the bin code from the arena code, bin functions that had taken an arena_t to check arena_is_auto now take an is_auto parameter instead.	2026-03-10 18:14:33 -07:00
Shirui Cheng	6d4611197e	move fill/flush pointer array out of tcache.c	2026-03-10 18:14:33 -07:00
Shirui Cheng	2114349a4e	Revert PR #2608 : Manually revert commits 70c94d..f9c0b5 Closes: #2707	2026-03-10 18:14:33 -07:00
guangli-dai	6200e8987f	Reformat the codebase with the clang-format 18.	2026-03-10 18:14:33 -07:00
Slobodan Predolac	015b017973	[thread_event] Add support for user events in thread events when stats are enabled	2026-03-10 18:14:33 -07:00
Slobodan Predolac	e6864c6075	[thread_event] Remove macros from thread_event and replace with dynamic event objects	2026-03-10 18:14:33 -07:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy jemalloc has been converting size to usize by ceiling size to the closest size class. However, this causes a lot of memory waste with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usizes is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note when the build-time config is enabled, the runtime option is default on. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size are no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs used when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Shirui Cheng	7c99686165	Better handle burst allocation on tcache_alloc_small_hard	2024-08-29 10:50:33 -07:00
Shirui Cheng	0c88be9e0a	Regulate GC frequency by requiring a time interval between two consecutive GCs	2024-08-29 10:50:33 -07:00
Shirui Cheng	e2c9f3a9ce	Take locality into consideration when doing GC flush	2024-08-29 10:50:33 -07:00
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Qi Wang	bd0a5b0f3b	Fix static analysis warnings. Newly reported warnings included several reserved macro identifiers, and false-positive used-uninitialized.	2024-08-28 16:03:53 -07:00
Amaury Séchet	a25b9b8ba9	Simplify the logic when bumping lg_fill_div.	2024-08-06 13:31:49 -07:00
Shirui Cheng	47c9bcd402	Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items	2024-08-06 13:16:09 -07:00
David Goldblatt	f9c0b5f7f8	Bin batching: add some stats. This lets us easily see what fraction of flush load is being taken up by the bins, and helps guide future optimization approaches (for example: should we prefetch during cache bin fills? It depends on how many objects the average fill pops out of the batch).	2024-05-22 10:30:31 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	44d91cf243	Tcache flush: Partition by bin before locking. This accomplishes two things: - It avoids a full array scan (and any attendant branch prediction misses, etc.) while holding the bin lock. - It allows us to know the number of items that will be flushed before flushing them, which will (in an upcoming commit) let us know if it's safe to use the batched flush (in which case we won't acquire the bin mutex).	2024-05-22 10:30:31 -07:00
David Goldblatt	6e56848850	Tcache: Split up small/large handling. The main bits of shared code are the edata filtering and the stats flushing logic, both of which are fairly simple to read and not so painful to duplicate. The shared code comes at the cost of guarding all the subtle logic with `if (small)`, which doesn't feel worth it.	2024-05-22 10:30:31 -07:00
Amaury Séchet	5afff2e44e	Simplify the logic in tcache_gc_small.	2024-05-02 18:52:19 -07:00
Qi Wang	fa451de17f	Fix the tcache flush sanity checking around ncached and nstashed. When there were many items stashed, it's possible that after flushing stashed, ncached is already lower than the remain, in which case the flush can simply return at that point.	2024-04-12 16:01:55 -07:00
guangli-dai	eda05b3994	Fix static analysis warnings.	2024-01-03 14:18:52 -08:00
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
guangli-dai	d88fa71bbd	Fix nfill = 0 bug when ncached_max is 1	2023-10-18 14:11:46 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be rounded up if an exact bin size is not given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcached stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypass the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Qi Wang	f509703af5	Fix two conversion warnings in tcache.	2022-01-04 13:55:06 -08:00
Qi Wang	8b34a788b5	Fix a used-uninitialized warning (false positive).	2021-12-29 14:44:43 -08:00
Qi Wang	e491cef9ab	Add stats for stashed bytes in tcache.	2021-12-29 14:44:43 -08:00
Qi Wang	b75822bc6e	Implement use-after-free detection using junk and stash. On deallocation, sampled pointers (specially aligned) get junked and stashed into tcache (to prevent immediate reuse). The expected behavior is to have read-after-free corrupted and stopped by the junk-filling, while write-after-free is checked when flushing the stashed pointers.	2021-12-29 14:44:43 -08:00
Qi Wang	06aac61c4b	Split the core logic of tcache flush into a separate function. The core function takes a ptr array as input (containing items to be flushed), which will be reused to flush sanitizer-stashed items.	2021-12-29 14:44:43 -08:00
Qi Wang	041145c272	Report the correct and wrong sizes on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
Qi Wang	f3b2668b32	Report the offending pointer on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
David Goldblatt	20140629b4	Bin: Move stats closer to the mutex. This is a slight cache locality optimization.	2021-02-04 14:10:43 -08:00
David Goldblatt	3967329813	Arena: share bin offsets in a global. This saves us a cache miss when looking up the arena bin offset in a remote arena during tcache flush. All arenas share the base offset, and so we don't need to look it up repeatedly for each arena. Secondarily, it shaves 288 bytes off the arena on, e.g., x86-64.	2021-02-04 14:10:43 -08:00
David Goldblatt	2fcbd18115	Cache bin: Don't reverse flush order. The items we pick to flush matter a lot, but the order in which they get flushed doesn't; just use forward scans. This simplifies the accessing code, both in terms of the C and the generated assembly (i.e. this speeds up the flush pathways).	2021-02-04 14:10:43 -08:00
David Goldblatt	229994a204	Tcache flush: keep common path state in registers. By carefully force-inlining the division constants and the operation sum count, we can eliminate redundant operations in the arena-level dalloc function. Do so.	2021-02-04 14:10:43 -08:00
David Goldblatt	31a629c3de	Tcache flush: prefetch edata contents. This frontloads more of the miss latency. It also moves it to a pathway where we have not yet acquired any locks, so that it should (hopefully) reduce hold times.	2021-02-04 14:10:43 -08:00
David Goldblatt	9f9247a62e	Tcache flushing: increase cache miss parallelism. In practice, many rtree_leaf_elm accesses are cache misses. By restructuring, we can make it more likely that these misses occur without blocking us from starting later lookups, taking more of those misses in parallel.	2021-02-04 14:10:43 -08:00
David Goldblatt	181ba7fd4d	Tcache flush: Add an emap "batch lookup" path. For now this is a no-op; but the interface is a little more flexible for our purposes.	2021-02-04 14:10:43 -08:00
David Goldblatt	c007c537ff	Tcache flush: Unify edata lookup path.	2021-02-04 14:10:43 -08:00
David Goldblatt	a011c4c22d	cache_bin: Separate out local and remote accesses. This fixes an incorrect debug-mode assert: - T1 starts an arena stats update and reads stack_head from another thread's cache bin, when that cache bin has 1 item in it. - T2 allocates from that cache bin. The cache_bin's stack_head now points to a NULL pointer, since the cache bin is empty. - T1 Re-reads the cache_bin's stack_head to perform an assertion check (since it previously saw that the bin was empty, whatever stack_head points to should be non-NULL).	2021-01-08 14:18:08 -08:00
Qi Wang	bf72188f80	Allow opt.tcache_max to accept small size classes. Previously all the small size classes were cached. However this has downsides -- particularly when page size is greater than 4K (e.g. iOS), which will result in much higher SMALL_MAXCLASS. This change allows tcache_max to be set to lower values, to better control resources taken by tcache.	2020-10-24 20:43:44 -07:00
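
The usize policy described in commit c067a55c79 (rounding sizes above USIZE_GROW_SLOW_THRESHOLD up to a multiple of PAGE) can be illustrated with a minimal sketch. This is not jemalloc's code: the names prefixed with sketch_/SKETCH_ are hypothetical, and the 4096-byte page and 16384-byte threshold are assumed values chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; jemalloc determines PAGE at build time. */
#define SKETCH_PAGE ((size_t)4096)
/* Hypothetical stand-in for USIZE_GROW_SLOW_THRESHOLD; the real value may differ. */
#define SKETCH_THRESHOLD ((size_t)16384)

/* Round size up to the next multiple of SKETCH_PAGE (which is a power of two). */
static size_t
sketch_page_ceiling(size_t size) {
	/* Guard against wrap-around in the addition below. */
	assert(size <= SIZE_MAX - (SKETCH_PAGE - 1));
	return (size + SKETCH_PAGE - 1) & ~(SKETCH_PAGE - 1);
}

/*
 * Bytes lost to rounding for a request above the threshold. With page
 * rounding this is always smaller than one page.
 */
static size_t
sketch_waste(size_t size) {
	assert(size > SKETCH_THRESHOLD);
	return sketch_page_ceiling(size) - size;
}
```

Under these assumptions, a 20481-byte request rounds up to 24576 bytes, so the waste is 4095 bytes, which stays below one page as the commit message intends.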

225 commits