romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-07-14 12:47:27 +03:00

Author	SHA1	Message	Date
Nathan Slingerland	edc1576f03	Add safe frame-pointer backtrace unwinder	2024-10-01 11:01:56 -07:00
Ben Niu	3a0d9cdadb	Use MSVC __declspec(thread) for TSD on Windows	2024-09-30 11:33:44 -07:00
Dmitry Ilvokhin	4f4fd42447	Remove `strict_min_purge_interval` option Option `experimental_hpa_strict_min_purge_interval` was expected to be temporary to simplify rollout of a bugfix. Now, when bugfix rollout is complete it is safe to remove this option.	2024-09-25 11:49:18 -07:00
Qi Wang	6cc42173cb	Assert the mutex is locked within malloc_mutex_assert_owner().	2024-09-23 18:06:07 -07:00
Qi Wang	44db479fad	Fix the lock owner sanity checking during background thread boot. During boot, some mutexes are not initialized yet, plus there's no point taking many mutexes while everything is covered by the global init lock, so the locking assumptions in some functions (e.g. background_thread_enabled_set()) can't be enforced. Skip the lock owner check in this case.	2024-09-23 18:06:07 -07:00
Guangli Dai	0181aaa495	Optimize edata_cmp_summary_compare when __uint128_t is available	2024-09-23 16:23:42 -07:00
Qi Wang	1960536b61	Add malloc_mutex_is_locked() sanity checks.	2024-09-20 16:56:07 -07:00
Qi Wang	661fb1e672	Fix the locked flag for malloc_mutex_trylock().	2024-09-20 16:56:07 -07:00
Nathan Slingerland	8c2e15d1a5	Add malloc_open() / malloc_close() reentrancy safe helpers	2024-09-12 15:38:08 -07:00
Qi Wang	323ed2e3a8	Optimize fast path to allow static size class computation. After inlining at LTO time, many callsites have input size known which means the index and usable size can be translated at compile time. However the size-index lookup table prevents it -- this commit solves that by switching to the compute approach when the size is detected to be a known const.	2024-09-12 11:34:09 -07:00
Qi Wang	3383b98f1b	Check if the huge page size is expected when enabling HPA.	2024-09-04 15:43:59 -07:00
Qi Wang	cd05b19f10	Fix the VM over-reservation on aarch64 w/ larger pages. HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages), in which case it would cause grow_retained / exp_grow to over-reserve VMs. Similarly, make sure the base alloc has a const 2M alignment.	2024-09-04 15:43:59 -07:00
Shirui Cheng	7c99686165	Better handle burst allocation on tcache_alloc_small_hard	2024-08-29 10:50:33 -07:00
Shirui Cheng	0c88be9e0a	Regulate GC frequency by requiring a time interval between two consecutive GCs	2024-08-29 10:50:33 -07:00
Shirui Cheng	e2c9f3a9ce	Take locality into consideration when doing GC flush	2024-08-29 10:50:33 -07:00
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Shirui Cheng	f68effe4ac	Add a runtime option opt_experimental_tcache_gc to guard the new design	2024-08-29 10:50:33 -07:00
Ben Niu	9e123a833c	Leverage new Windows API TlsGetValue2 for performance	2024-08-28 16:50:33 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy `hpa_dirty_mult` requirement. Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have steady rate of pages returned back to the system. This provides a strickier latency guarantees as number of `madvise` calls is bounded (and hence number of TLB shootdowns is limited) in exchange to weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
Shirui Cheng	47c9bcd402	Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items	2024-08-06 13:16:09 -07:00
Shirui Cheng	48f66cf4a2	add a size check when declare a stack array to be less than 2048 bytes	2024-08-06 13:16:09 -07:00
Nathan Slingerland	bc32ddff2d	Add usize to prof_sample_hook_t	2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin	b66f689764	Emit long string values without truncation There are few long options (`bin_shards` and `slab_sizes` for example) when they are specified and we emit statistics value gets truncated. Moved emitting logic for strings into separate `emitter_emit_str` function. It will try to emit string same way as before and if value is too long will fallback emiting rest partially with chunks of `BUF_SIZE`. Justification for long strings (longer than `BUF_SIZE`) is not supported.	2024-07-29 13:58:31 -07:00
Guangli Dai	8477ec9562	Set dependent as false for all rtree reads without ownership	2024-06-24 10:50:20 -07:00
Guangli Dai	21bcc0a8d4	Make JEMALLOC_CXX_THROW definition compatible with newer C++ versions	2024-06-13 11:03:05 -07:00
Dmitry Ilvokhin	867c6dd7dc	Option to guard `hpa_min_purge_interval_ms` fix Change in `hpa_min_purge_interval_ms` handling logic is not backward compatible as it might increase memory usage. Now this logic guarded by `hpa_strict_min_purge_interval` option. When `hpa_strict_min_purge_interval` is true, we will purge no more than `hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is false, old purging logic behaviour is preserved. Long term strategy migrate all users of hpa to new logic and then delete `hpa_strict_min_purge_interval` option.	2024-06-07 10:52:41 -07:00
David Goldblatt	f9c0b5f7f8	Bin batching: add some stats. This lets us easily see what fraction of flush load is being taken up by the bins, and helps guide future optimization approaches (for example: should we prefetch during cache bin fills? It depends on how many objects the average fill pops out of the batch).	2024-05-22 10:30:31 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	c085530c71	Tcache batching: Plumbing In the next commit, we'll start using the batcher to eliminate mutex traffic. To avoid cluttering up that commit with the random bits of busy-work it entails, we'll centralize them here. This commit introduces: - A batched bin type. - The ability to mix batched and unbatched bins in the arena. - Conf parsing to set batches per size and a max batched size. - mallctl access to the corresponding opt-namespace keys. - Stats output of the above.	2024-05-22 10:30:31 -07:00
David Goldblatt	70c94d7474	Add batcher module. This can be used to batch up simple operation commands for later use by another thread.	2024-05-22 10:30:31 -07:00
David Goldblatt	86f4851f5d	Add clang static analyzer suppression macro.	2024-05-22 10:30:31 -07:00
Qi Wang	8d8379da44	Fix background_thread creation for the oversize_arena. Bypassing background thread creation for the oversize_arena used to be an optimization since that arena had eager purging. However #2466 changed the purging policy for the oversize_arena -- specifically it switched to the default decay time when background_thread is enabled. This issue is noticable when the number of arenas is low: whenever the total # of arenas is <= 4 (which is the default max # of background threads), in which case the purging will be stalled since no background thread is created for the oversize_arena.	2024-05-02 14:45:18 -07:00
Daniel Hodges	11038ff762	Add support for namespace pids in heap profile names This change adds support for writing pid namespaces to the filename of a heap profile. When running with namespaces pids may reused across namespaces and if mounts are shared where profiles are written there is not a great way to differentiate profiles between pids. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com> Signed-off-by: Daniel Hodges <hodgesd@fb.com>	2024-04-09 10:27:52 -07:00
Shirui Cheng	5081c16bb4	Experimental calloc implementation with using memset on larger sizes	2024-04-04 15:31:56 -07:00
Dmitry Ilvokhin	b2e59a96e1	Introduce getters for page allocator shard stats Access nactive, ndirty and nmuzzy throught getters and not directly. There are no functional change, but getters are required to propagate HPA's statistics up to Page Allocator's statitics.	2024-04-04 12:17:30 -07:00
Amaury Séchet	92aa52c062	Reduce nesting in phn_merge_siblings using an early return.	2024-03-14 13:08:17 -07:00
Amaury Séchet	10d713151d	Ensure that the root of a heap is always the best element.	2024-03-14 13:07:45 -07:00
Shirui Cheng	373884ab48	print out all malloc_conf settings in stats	2024-02-29 12:12:44 -08:00
Qi Wang	a2c5267409	HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as it's within the huge page size. These requests do not concern internal fragmentation with huge pages, since the entire range is expected to be accessed.	2024-01-18 14:51:04 -08:00
guangli-dai	b1792c80d2	Add LOGs when entrying and exiting free and sdallocx.	2024-01-11 14:37:20 -08:00
guangli-dai	eda05b3994	Fix static analysis warnings.	2024-01-03 14:18:52 -08:00
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
Qi Wang	3025b021b9	Optimize mutex and bin alignment / locality.	2023-10-23 20:28:26 -07:00
Qi Wang	04d1a87b78	Fix a zero-initializer warning on macOS.	2023-10-18 14:12:43 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initiailization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	867eedfc58	Fix the bug in dalloc promoted allocations. An allocation small enough will be promoted so that it does not share an extent with others. However, when dalloc, such allocations may not be dalloc as a promoted one if nbins < SC_NBINS. This commit fixes the bug.	2023-10-17 14:53:23 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be upper casted if not an exact bin size is given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00

1 2 3 4 5 ...

1648 commits