romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-07-14 12:47:27 +03:00

Author	SHA1	Message	Date
Kaspar M. Rohrer	80e9001af3	Move `extern "C"` specifications for C++ to where they are needed This should fix errors when compiling C++ code with modules enabled on clang.	2025-03-31 10:41:51 -07:00
Shirui Cheng	3688dfb5c3	fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test	2025-03-20 12:45:23 -07:00
Shirui Cheng	e1a77ec558	Support THP with Huge Arena in PAC	2025-03-17 16:06:43 -07:00
Guangli Dai	773b5809f9	Fix frame pointer based unwinder to handle changing stack range	2025-03-13 17:15:42 -07:00
Dmitry Ilvokhin	ad108d50f1	Extend purging algorithm with peak demand tracking Implementation inspired by idea described in "Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator" paper [1]. Primary idea is to track maximum number (peak) of active pages in use with sliding window and then use this number to decide how many dirty pages we would like to keep. We are trying to estimate maximum amount of active memory we'll need in the near future. We do so by projecting future active memory demand (based on peak active memory usage we observed in the past within sliding window) and adding slack on top of it (an overhead is reasonable to have in exchange of higher hugepages coverage). When peak demand tracking is off, projection of future active memory is active memory we are having right now. Estimation is essentially the same as `nactive_max * (1 + dirty_mult)`. Peak demand purging algorithm controlled by two config options. Option `hpa_peak_demand_window_ms` controls duration of sliding window we track maximum active memory usage in and option `hpa_dirty_mult` controls amount of slack we are allowed to have as a percent from maximum active memory usage. By default `hpa_peak_demand_window_ms == 0` now and we have same behaviour (ratio based purging) that we had before this commit. [1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf	2025-03-13 10:12:22 -07:00
Qi Wang	22440a0207	Implement process_madvise support. Add opt.process_madvise_max_batch which determines if process_madvise is enabled (non-zero) and the max # of regions in each batch. Added another limiting factor which is the space to reserve on stack, which results in the max batch of 128.	2025-03-07 15:32:32 -08:00
Guangli Dai	6035d4a8d3	Cache extra extents in the dirty pool from ecache_alloc_grow	2025-03-06 15:08:13 -08:00
guangli-dai	c067a55c79	Introducing a new usize calculation policy Converting size to usize is what jemalloc has been doing by ceiling size to the closest size class. However, this causes a lot of memory waste with HPA enabled. This commit changes how usize is calculated so that the gap between two contiguous usize is no larger than a page. Specifically, this commit includes the following changes: 1. Adding a build-time config option (--enable-limit-usize-gap) and a runtime one (limit_usize_gap) to guard the changes. When build-time config is enabled, some minor CPU overhead is expected because usize will be stored and accessed apart from index. When runtime option is also enabled (it can only be enabled with the build-time config enabled), a new usize calculation approach will be employed. This new calculation will ceil size to the closest multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes. Note when the build-time config is enabled, the runtime option is default on. 2. Prepare tcache for size to grow by PAGE over GROUP*PAGE. To prepare for the upcoming changes where size class grows by PAGE when larger than NGROUP * PAGE, disable the tcache when it is larger than 2 * NGROUP * PAGE. The threshold for tcache is set higher to prevent perf regression as much as possible while usizes between NGROUP * PAGE and 2 * NGROUP * PAGE happen to grow by PAGE. 3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE For PAC, to avoid having too many bins, arena bins still have the same layout. This means some extra search is needed for a page-level request that is not aligned with the original size class: it should also search the heap before the current index since the previous heap might also be able to have some allocations satisfying it. The same changes apply to HPA's psset. This search relies on the enumeration of the heap because not all allocs in the previous heap are guaranteed to satisfy the request. To balance the memory and CPU overhead, we currently enumerate at most a fixed number of nodes before concluding none can satisfy the request during an enumeration. 4. Add bytes counter to arena large stats. To prepare for the upcoming usize changes, stats collected by multiplying alive allocations and the bin size is no longer accurate. Thus, add separate counters to record the bytes malloced and dalloced. 5. Change structs use when freeing to avoid using index2size for large sizes. - Change the definition of emap_alloc_ctx_t - Change the read of both from edata_t. - Change the assignment and usage of emap_alloc_ctx_t. - Change other callsites of index2size. Note the changes in the data structure, i.e., emap_alloc_ctx_t, will be used when the build-time config (--enable-limit-usize-gap) is enabled but they will store the same value as index2size(szind) if the runtime option (opt_limit_usize_gap) is not enabled. 6. Adapt hpa to the usize changes. Change the settings in sec to limit its usage for sizes larger than USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests. 7. Modify usize calculation and corresponding tests. Change the sz_s2u_compute. Note sz_index2size is not always safe now while sz_size2index still works as expected.	2025-03-06 15:08:13 -08:00
Dmitry Ilvokhin	421b17a622	Remove age_counter from hpa_central Before this commit we had two age counters: one global in HPA central and one local in each HPA shard. We used the HPA shard counter when we reused an empty pageslab and the HPA central counter everywhere else. They are supposed to be comparable, because we use them for allocation placement decisions, but in reality they are not: there are no ordering guarantees between them. At the moment, there is no way for pageslab to migrate between HPA shards, so we don't actually need HPA central age counter.	2025-02-13 16:00:41 -08:00
roblabla	c17bf8b368	Disable config from file or envvar with build flag This adds a new autoconf flag, --disable-user-config, which disables reading the configuration from /etc/malloc.conf or the MALLOC_CONF environment variable. This can be useful when integrating jemalloc in a binary that internally handles all aspects of the configuration and shouldn't be impacted by ambient change in the environment.	2025-02-05 15:01:50 -08:00
Shai Duvdevani	257e64b968	Unlike `prof_sample` which is supported only with profiling mode active, `prof_threshold` is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks.	2025-01-29 18:55:52 -08:00
Dmitry Ilvokhin	ef8e512e29	Fix `bitmap_ffu` out of range read We tried to load `g` from `bitmap[i]` before checking it is actually a valid load. Tweaked a loop a bit to `break` early, when we are done scanning for bits. Before this commit undefined behaviour sanitizer from GCC 14+ was unhappy at `test/unit/bitmap` test with following error. ``` ../include/jemalloc/internal/bitmap.h:293:5: runtime error: load of address 0x7bb1c2e08008 with insufficient space for an object of type 'const bitmap_t' <...> #0 0x62671a149954 in bitmap_ffu ../include/jemalloc/internal/bitmap.h:293 #1 0x62671a149954 in test_bitmap_xfu_body ../test/unit/bitmap.c:275 #2 0x62671a14b767 in test_bitmap_xfu ../test/unit/bitmap.c:323 #3 0x62671a376ad1 in p_test_impl ../test/src/test.c:149 #4 0x62671a377135 in p_test ../test/src/test.c:200 #5 0x62671a13da06 in main ../test/unit/bitmap.c:336 <...> ```	2025-01-28 10:42:20 -08:00
Qi Wang	20cc983314	Fix the gettid() detection caught by @mrluanma .	2025-01-22 10:30:53 -08:00
appujee	4b88bddbca	Conditionally remove unreachable for C23+	2024-12-17 12:39:00 -08:00
appujee	d8486b2653	Remove unreachable() macro as c23 already defines it. Taken from https://android-review.git.corp.google.com/c/platform/external/jemalloc_new/+/3316478 This might need more cleanups to remove the definition of JEMALLOC_INTERNAL_UNREACHABLE.	2024-12-17 12:39:00 -08:00
Guangli Dai	587676fee8	Disable psset test when hugepage size is too large.	2024-12-17 12:35:35 -08:00
Dmitry Ilvokhin	6092c980a6	Expose `psset` state stats When evaluating changes in HPA logic, it is useful to know internal `hpa_shard` state. Great deal of this state is `psset`. Some of the `psset` stats were available, but in disaggregated form, which is not very convenient. This commit exposed `psset` counters to `mallctl` and malloc stats dumps. Example of what the malloc stats dump will look like after the change. HPA shard stats: Pageslabs: 14899 (4354 huge, 10545 nonhuge) Active pages: 6708166 (2228917 huge, 4479249 nonhuge) Dirty pages: 233816 (331 huge, 233485 nonhuge) Retained pages: 686306 Purge passes: 8730 (10 / sec) Purges: 127501 (146 / sec) Hugeifies: 4358 (5 / sec) Dehugifies: 4 (0 / sec) Pageslabs, active pages, dirty pages and retained pages are rows added by this change.	2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin	0ce13c6fb5	Add opt `hpa_hugify_sync` to hugify synchronously Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort synchronous collapse of the native pages mapped by the memory range into transparent huge pages. Synchronous hugification might be beneficial for at least two reasons: we are not relying on khugepaged anymore and get an instant feedback if range wasn't hugified. If `hpa_hugify_sync` option is on, we'll try to perform a synchronous collapse and if it wasn't successful, we'll fall back to asynchronous behaviour.	2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin	b9758afff0	Add `nstime_ms_since` to get time since in ms Milliseconds are used a lot in hpa, so it is convenient to have `nstime_ms_since` function instead of dividing by `MILLION` constantly. For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation is used much more commonly across codebase than `msec`. ``` $ grep -Rn '_msec' include src | wc -l 2 $ grep -RPn '_ms( |,|:)' include src | wc -l 72 ``` Function `nstime_msec` wasn't used anywhere in the code yet.	2024-11-08 10:37:28 -08:00
Qi Wang	2a693b83d2	Fix the sized-dealloc safety check abort msg.	2024-10-14 10:34:15 -07:00
Qi Wang	6d625d5e5e	Add support for clock_gettime_nsec_np() Prefer clock_gettime_nsec_np(CLOCK_UPTIME_RAW) to mach_absolute_time().	2024-10-14 10:33:27 -07:00
Nathan Slingerland	edc1576f03	Add safe frame-pointer backtrace unwinder	2024-10-01 11:01:56 -07:00
Ben Niu	3a0d9cdadb	Use MSVC __declspec(thread) for TSD on Windows	2024-09-30 11:33:44 -07:00
Dmitry Ilvokhin	4f4fd42447	Remove `strict_min_purge_interval` option Option `experimental_hpa_strict_min_purge_interval` was expected to be temporary to simplify rollout of a bugfix. Now, when bugfix rollout is complete it is safe to remove this option.	2024-09-25 11:49:18 -07:00
Qi Wang	6cc42173cb	Assert the mutex is locked within malloc_mutex_assert_owner().	2024-09-23 18:06:07 -07:00
Qi Wang	44db479fad	Fix the lock owner sanity checking during background thread boot. During boot, some mutexes are not initialized yet, plus there's no point taking many mutexes while everything is covered by the global init lock, so the locking assumptions in some functions (e.g. background_thread_enabled_set()) can't be enforced. Skip the lock owner check in this case.	2024-09-23 18:06:07 -07:00
Guangli Dai	0181aaa495	Optimize edata_cmp_summary_compare when __uint128_t is available	2024-09-23 16:23:42 -07:00
Qi Wang	1960536b61	Add malloc_mutex_is_locked() sanity checks.	2024-09-20 16:56:07 -07:00
Qi Wang	661fb1e672	Fix the locked flag for malloc_mutex_trylock().	2024-09-20 16:56:07 -07:00
Nathan Slingerland	8c2e15d1a5	Add malloc_open() / malloc_close() reentrancy safe helpers	2024-09-12 15:38:08 -07:00
Qi Wang	323ed2e3a8	Optimize fast path to allow static size class computation. After inlining at LTO time, many callsites have input size known which means the index and usable size can be translated at compile time. However the size-index lookup table prevents it -- this commit solves that by switching to the compute approach when the size is detected to be a known const.	2024-09-12 11:34:09 -07:00
Qi Wang	3383b98f1b	Check if the huge page size is expected when enabling HPA.	2024-09-04 15:43:59 -07:00
Qi Wang	cd05b19f10	Fix the VM over-reservation on aarch64 w/ larger pages. HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages), in which case it would cause grow_retained / exp_grow to over-reserve VMs. Similarly, make sure the base alloc has a const 2M alignment.	2024-09-04 15:43:59 -07:00
Shirui Cheng	7c99686165	Better handle burst allocation on tcache_alloc_small_hard	2024-08-29 10:50:33 -07:00
Shirui Cheng	0c88be9e0a	Regulate GC frequency by requiring a time interval between two consecutive GCs	2024-08-29 10:50:33 -07:00
Shirui Cheng	e2c9f3a9ce	Take locality into consideration when doing GC flush	2024-08-29 10:50:33 -07:00
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Shirui Cheng	f68effe4ac	Add a runtime option opt_experimental_tcache_gc to guard the new design	2024-08-29 10:50:33 -07:00
Ben Niu	9e123a833c	Leverage new Windows API TlsGetValue2 for performance	2024-08-28 16:50:33 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy `hpa_dirty_mult` requirement. Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have steady rate of pages returned back to the system. This provides stricter latency guarantees as number of `madvise` calls is bounded (and hence number of TLB shootdowns is limited) in exchange for weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
Shirui Cheng	47c9bcd402	Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items	2024-08-06 13:16:09 -07:00
Shirui Cheng	48f66cf4a2	add a size check when declaring a stack array to be less than 2048 bytes	2024-08-06 13:16:09 -07:00
Nathan Slingerland	bc32ddff2d	Add usize to prof_sample_hook_t	2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin	b66f689764	Emit long string values without truncation There are a few long options (`bin_shards` and `slab_sizes`, for example) whose values get truncated when they are specified and we emit statistics. Moved emitting logic for strings into separate `emitter_emit_str` function. It will try to emit string same way as before and if value is too long will fall back to emitting the rest in chunks of `BUF_SIZE`. Justification for long strings (longer than `BUF_SIZE`) is not supported.	2024-07-29 13:58:31 -07:00
Guangli Dai	8477ec9562	Set dependent as false for all rtree reads without ownership	2024-06-24 10:50:20 -07:00
Guangli Dai	21bcc0a8d4	Make JEMALLOC_CXX_THROW definition compatible with newer C++ versions	2024-06-13 11:03:05 -07:00
Dmitry Ilvokhin	867c6dd7dc	Option to guard `hpa_min_purge_interval_ms` fix Change in `hpa_min_purge_interval_ms` handling logic is not backward compatible as it might increase memory usage. Now this logic is guarded by the `hpa_strict_min_purge_interval` option. When `hpa_strict_min_purge_interval` is true, we will purge no more than `hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is false, old purging logic behaviour is preserved. The long-term strategy is to migrate all users of hpa to the new logic and then delete the `hpa_strict_min_purge_interval` option.	2024-06-07 10:52:41 -07:00
David Goldblatt	f9c0b5f7f8	Bin batching: add some stats. This lets us easily see what fraction of flush load is being taken up by the bins, and helps guide future optimization approaches (for example: should we prefetch during cache bin fills? It depends on how many objects the average fill pops out of the batch).	2024-05-22 10:30:31 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
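
The peak demand tracking described in commit ad108d50f1 above can be illustrated with a small standalone sketch. It is not jemalloc's implementation: the type and function names (`peak_window_t`, `peak_window_update`, `peak_window_max`, `projected_pages`), the bucket count, and the treatment of `dirty_mult` as an integer percentage are all assumptions made for illustration. The window is split into a few time buckets, each remembering the largest active-page count seen in its slice, and the projection follows the `nactive_max * (1 + dirty_mult)` shape quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resolution of the sliding window (not a jemalloc constant). */
#define PEAK_NBUCKETS 4

typedef struct {
    uint64_t epoch;      /* index of the time slice this bucket describes */
    size_t nactive_max;  /* largest active-page count seen in that slice */
    bool used;           /* false until the bucket is first written */
} peak_bucket_t;

typedef struct {
    uint64_t bucket_ms;  /* duration of one time slice */
    peak_bucket_t buckets[PEAK_NBUCKETS];
} peak_window_t;

static void peak_window_init(peak_window_t *w, uint64_t window_ms) {
    assert(window_ms >= PEAK_NBUCKETS);
    w->bucket_ms = window_ms / PEAK_NBUCKETS;
    for (size_t i = 0; i < PEAK_NBUCKETS; i++) {
        w->buckets[i].used = false;
    }
}

/* Record the current number of active pages at time now_ms. */
static void peak_window_update(peak_window_t *w, uint64_t now_ms,
    size_t nactive) {
    uint64_t epoch = now_ms / w->bucket_ms;
    peak_bucket_t *b = &w->buckets[epoch % PEAK_NBUCKETS];
    if (!b->used || b->epoch != epoch) {
        /* The bucket holds data from an older slice; recycle it. */
        b->used = true;
        b->epoch = epoch;
        b->nactive_max = 0;
    }
    if (nactive > b->nactive_max) {
        b->nactive_max = nactive;
    }
}

/* Largest active-page count among buckets still inside the window. */
static size_t peak_window_max(const peak_window_t *w, uint64_t now_ms) {
    uint64_t epoch = now_ms / w->bucket_ms;
    size_t peak = 0;
    for (size_t i = 0; i < PEAK_NBUCKETS; i++) {
        const peak_bucket_t *b = &w->buckets[i];
        if (b->used && b->epoch <= epoch &&
            epoch - b->epoch < PEAK_NBUCKETS && b->nactive_max > peak) {
            peak = b->nactive_max;
        }
    }
    return peak;
}

/* Projected demand: peak plus slack, with the slack given in percent. */
static size_t projected_pages(size_t nactive_max, unsigned dirty_mult_pct) {
    return nactive_max + nactive_max * dirty_mult_pct / 100;
}
```

With a 1000 ms window, a peak of 100 pages recorded at t = 0 still dominates a query at t = 300, but has aged out by t = 1200, at which point a later, smaller observation determines the projection. Bucketing trades precision at the window edge for constant memory and constant-time updates.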


1669 commits
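
The usize policy introduced in commit c067a55c79 can be contrasted with size-class rounding in a short sketch. This is a simplified illustration, not jemalloc's code: the `SKETCH_*` constants (4 KiB pages, four size classes per doubling, a 16-byte quantum), the helper names, and the threshold passed by the caller are all assumptions; the real `USIZE_GROW_SLOW_THRESHOLD` value is not stated in the log.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed parameters for illustration only. */
#define SKETCH_PAGE ((size_t)4096) /* 4 KiB pages */
#define SKETCH_LG_NGROUP 2         /* 4 size classes per doubling */
#define SKETCH_LG_QUANTUM 4        /* 16-byte minimum spacing */

/* Floor of log2(x) for x > 0. */
static unsigned lg_floor(size_t x) {
    unsigned lg = 0;
    while (x >>= 1) {
        lg++;
    }
    return lg;
}

/* Round up to a size class whose spacing grows with the size itself. */
static size_t size_class_ceil(size_t size) {
    assert(size > 0);
    unsigned x = lg_floor((size << 1) - 1);
    unsigned lg_delta = (x < SKETCH_LG_NGROUP + SKETCH_LG_QUANTUM + 1)
        ? SKETCH_LG_QUANTUM
        : x - SKETCH_LG_NGROUP - 1;
    size_t mask = ((size_t)1 << lg_delta) - 1;
    return (size + mask) & ~mask;
}

/* Round up to the next multiple of the page size. */
static size_t page_ceil(size_t size) {
    return (size + SKETCH_PAGE - 1) & ~(SKETCH_PAGE - 1);
}

/*
 * Above the (hypothetical) threshold, the gap between neighbouring usable
 * sizes is capped at one page; below it, size classes are used as before.
 */
static size_t usize_limited_gap(size_t size, size_t threshold) {
    return size > threshold ? page_ceil(size) : size_class_ceil(size);
}
```

For a request of 1 MiB + 1 byte, size-class rounding in this sketch yields 1.25 MiB, while page rounding yields 1 MiB + 4 KiB, which is the kind of waste the commit message says the new policy avoids with HPA enabled.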