romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-06-24 21:05:40 +03:00

Author	SHA1	Message	Date
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (rounded up to the next bin size if the given size is not an exact bin size). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcache stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypassing the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Kevin Svetlitski	3e82f357bb	Fix all optimization-inhibiting integer-to-pointer casts Following from PR #2481, we replace all integer-to-pointer casts [which hide pointer provenance information (and thus inhibit optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html) with equivalent operations that preserve this information. I have enabled the corresponding clang-tidy check in our static analysis CI so that we do not get bitten by this again in the future.	2023-07-24 14:40:42 -07:00
Qi Wang	f509703af5	Fix two conversion warnings in tcache.	2022-01-04 13:55:06 -08:00
Qi Wang	8b34a788b5	Fix a used-uninitialized warning (false positive).	2021-12-29 14:44:43 -08:00
Qi Wang	e491cef9ab	Add stats for stashed bytes in tcache.	2021-12-29 14:44:43 -08:00
Qi Wang	b75822bc6e	Implement use-after-free detection using junk and stash. On deallocation, sampled pointers (specially aligned) get junked and stashed into tcache (to prevent immediate reuse). The expected behavior is to have read-after-free corrupted and stopped by the junk-filling, while write-after-free is checked when flushing the stashed pointers.	2021-12-29 14:44:43 -08:00
Qi Wang	06aac61c4b	Split the core logic of tcache flush into a separate function. The core function takes a ptr array as input (containing items to be flushed), which will be reused to flush sanitizer-stashed items.	2021-12-29 14:44:43 -08:00
Qi Wang	041145c272	Report the correct and wrong sizes on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
Qi Wang	f3b2668b32	Report the offending pointer on sized dealloc bug detection.	2021-02-08 14:42:27 -08:00
David Goldblatt	20140629b4	Bin: Move stats closer to the mutex. This is a slight cache locality optimization.	2021-02-04 14:10:43 -08:00
David Goldblatt	3967329813	Arena: share bin offsets in a global. This saves us a cache miss when looking up the arena bin offset in a remote arena during tcache flush. All arenas share the base offset, and so we don't need to look it up repeatedly for each arena. Secondarily, it shaves 288 bytes off the arena on, e.g., x86-64.	2021-02-04 14:10:43 -08:00
David Goldblatt	2fcbd18115	Cache bin: Don't reverse flush order. The items we pick to flush matter a lot, but the order in which they get flushed doesn't; just use forward scans. This simplifies the accessing code, both in terms of the C and the generated assembly (i.e. this speeds up the flush pathways).	2021-02-04 14:10:43 -08:00
David Goldblatt	229994a204	Tcache flush: keep common path state in registers. By carefully force-inlining the division constants and the operation sum count, we can eliminate redundant operations in the arena-level dalloc function. Do so.	2021-02-04 14:10:43 -08:00
David Goldblatt	31a629c3de	Tcache flush: prefetch edata contents. This frontloads more of the miss latency. It also moves it to a pathway where we have not yet acquired any locks, so that it should (hopefully) reduce hold times.	2021-02-04 14:10:43 -08:00
David Goldblatt	9f9247a62e	Tcache flushing: increase cache miss parallelism. In practice, many rtree_leaf_elm accesses are cache misses. By restructuring, we can make it more likely that these misses occur without blocking us from starting later lookups, taking more of those misses in parallel.	2021-02-04 14:10:43 -08:00
David Goldblatt	181ba7fd4d	Tcache flush: Add an emap "batch lookup" path. For now this is a no-op; but the interface is a little more flexible for our purposes.	2021-02-04 14:10:43 -08:00
David Goldblatt	c007c537ff	Tcache flush: Unify edata lookup path.	2021-02-04 14:10:43 -08:00
David Goldblatt	a011c4c22d	cache_bin: Separate out local and remote accesses. This fixes an incorrect debug-mode assert: - T1 starts an arena stats update and reads stack_head from another thread's cache bin, when that cache bin has 1 item in it. - T2 allocates from that cache bin. The cache_bin's stack_head now points to a NULL pointer, since the cache bin is empty. - T1 re-reads the cache_bin's stack_head to perform an assertion check (since it previously saw that the bin was nonempty, whatever stack_head points to should be non-NULL).	2021-01-08 14:18:08 -08:00
Qi Wang	bf72188f80	Allow opt.tcache_max to accept small size classes. Previously all the small size classes were cached. However this has downsides -- particularly when page size is greater than 4K (e.g. iOS), which will result in much higher SMALL_MAXCLASS. This change allows tcache_max to be set to lower values, to better control resources taken by tcache.	2020-10-24 20:43:44 -07:00
David Goldblatt	6599651aee	PA: Use an SEC in front of the HPA shard.	2020-10-23 11:14:34 -07:00
Qi Wang	c8209150f9	Switch from opt.lg_tcache_max to opt.tcache_max Though for convenience, keep parsing lg_tcache_max.	2020-10-22 20:40:41 -07:00
Qi Wang	5e41ff9b74	Add a hard limit on tcache max size class. For locality reasons, tcache bins are integrated in TSD. Allowing all size classes to be cached has little benefit, but takes up much thread local storage. In addition, it complicates the layout which we try hard to optimize.	2020-10-16 13:49:51 -07:00
Qi Wang	3de19ba401	Eagerly detect double free and sized dealloc bugs for large sizes.	2020-10-15 10:03:16 -07:00
David Goldblatt	be9548f2be	Tcaches: Fix a subtle race condition. Without a lock held continuously between checking tcaches_past and incrementing it, it's possible for two threads to go down manual creation path simultaneously. If the number of tcaches is one less than the maximum, it's possible for both to create a tcache and increment tcaches_past, with the second thread returning a value larger than TCACHES_MAX.	2020-10-13 15:06:16 -07:00
Yinan Zhang	f28cc2bc87	Extract bin shard selection out of bin locking	2020-07-31 09:16:50 -07:00
David Goldblatt	f1f4ec315a	Tcache: Tweak nslots_max tuning parameter. In making these settings configurable, `634afc4124` unintentionally changed a tuning parameter (reducing the "goal" max by a factor of 4). This commit undoes that change.	2020-07-09 08:58:05 -07:00
Yinan Zhang	a795b19327	Remove beginning define in source files ``` sed -i "/^#define JEMALLOC_[A-Z_]*_C_$/d" src/*.c; ```	2020-06-19 12:15:44 -07:00
David Goldblatt	8da0896b79	Tcache: Make an integer conversion explicit.	2020-05-28 15:52:40 -07:00
David Goldblatt	6cdac3c573	Tcache: Make flush fractions configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	7503b5b33a	Stats, CTL: Expose new tcache settings.	2020-05-16 13:34:23 -07:00
David Goldblatt	ee72bf1cfd	Tcache: Add tcache gc delay option. This can reduce flushing frequency for small size classes.	2020-05-16 13:34:23 -07:00
David Goldblatt	d338dd45d7	Tcache: Make incremental gc bytes configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	ec0b579563	Tcache: Privatize opt_lg_tcache_max default.	2020-05-16 13:34:23 -07:00
David Goldblatt	181093173d	Tcache: make slot sizing configurable.	2020-05-16 13:34:23 -07:00
David Goldblatt	634afc4124	Tcache: Make size computation configurable.	2020-05-16 13:34:23 -07:00
Yinan Zhang	b06dfb9ccc	Push event handlers to constituent modules	2020-05-12 09:16:16 -07:00
Yinan Zhang	abd4674931	Extract out per event postponed wait time fetching	2020-05-12 09:16:16 -07:00
Yinan Zhang	733ae918f0	Extract out per event new wait time fetching	2020-05-12 09:16:16 -07:00
David Goldblatt	cd29ebefd0	Tcache: treat small and large cache bins uniformly	2020-04-14 15:20:19 -07:00
David Goldblatt	a13fbad374	Tcache: split up fast and slow path data.	2020-04-14 15:20:19 -07:00
David Goldblatt	7099c66205	Arena: fill in terms of cache_bins.	2020-04-14 15:20:19 -07:00
David Goldblatt	294b276fc7	PA: Parameterize emap. Move emap_global to arena. This lets us test the PA module without interfering with the global emap used by the real allocator (the one not under test).	2020-04-10 13:12:47 -07:00
David Goldblatt	d701a085c2	Fast path: allow low-water mark changes. This lets us put more allocations on an "almost as fast" path after a flush. This results in around a 4% reduction in malloc cycles in prod workloads (corresponding to about a 0.1% reduction in overall cycles).	2020-03-12 11:54:19 -07:00
David Goldblatt	fef0b1ffe4	Cache bin: Remove last internals accesses.	2020-03-12 11:54:19 -07:00
David Goldblatt	0a2fcfac01	Tcache: Hold cache bin allocation explicitly.	2020-03-12 11:54:19 -07:00
David Goldblatt	d498a4bb08	Cache bin: Add an emptiness assertion.	2020-03-12 11:54:19 -07:00
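
Commit be9548f2be describes a check-then-increment race: two threads can both pass the limit check on tcaches_past before either increments it. The general remedy is to hold one lock across both the check and the increment. The following is a minimal sketch of that pattern only, not jemalloc's actual code; `SLOTS_MAX`, `slots_past`, `slots_mtx`, and `slot_reserve` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit, standing in for a maximum number of manual caches. */
#define SLOTS_MAX 4

static pthread_mutex_t slots_mtx = PTHREAD_MUTEX_INITIALIZER;
static size_t slots_past = 0; /* number of slots handed out so far */

/*
 * Reserve the next slot index. The limit check and the increment happen
 * under one continuous lock hold, so two threads can never both observe
 * "one slot left" and both take it.
 * Returns true and stores the index on success, false when full.
 */
static bool
slot_reserve(size_t *index_out) {
	assert(index_out != NULL);
	bool ok = false;
	pthread_mutex_lock(&slots_mtx);
	if (slots_past < SLOTS_MAX) {
		*index_out = slots_past++;
		ok = true;
	}
	pthread_mutex_unlock(&slots_mtx);
	return ok;
}
```

With this shape, a caller that receives `false` knows the limit was reached, and no returned index can be `SLOTS_MAX` or larger.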


198 commits
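
Commit 3e82f357bb replaces integer-to-pointer casts with operations that keep pointer provenance visible to the compiler. As a rough illustration of the idea, and not necessarily the form used in that commit, aligning an address down can be written as pointer arithmetic on the original pointer instead of masking a `uintptr_t` and casting the result back. `align_down` is a hypothetical helper name, and the sketch assumes `alignment` is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Align ptr down to a multiple of alignment (assumed to be a power of two).
 *
 * The provenance-hiding form would be:
 *     (void *)((uintptr_t)ptr & ~((uintptr_t)alignment - 1))
 * Here the integer is used only to compute an offset; the result is
 * derived from ptr itself, so no integer-to-pointer cast is needed.
 */
static void *
align_down(void *ptr, size_t alignment) {
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	size_t offset = (size_t)((uintptr_t)ptr & (alignment - 1));
	return (void *)((char *)ptr - offset);
}
```

The pointer-to-integer cast is still present, but it is harmless for this purpose: the clang-tidy check cited in the commit message targets the opposite direction, integer to pointer.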