romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-05-30 08:37:29 +03:00

Author	SHA1	Message	Date
Shirui Cheng	14d5dc136a	Allow a range for the nfill passed to arena_cache_bin_fill_small	2024-08-29 10:50:33 -07:00
Shirui Cheng	f68effe4ac	Add a runtime option opt_experimental_tcache_gc to guard the new design	2024-08-29 10:50:33 -07:00
Ben Niu	9e123a833c	Leverage new Windows API TlsGetValue2 for performance	2024-08-28 16:50:33 -07:00
Dmitry Ilvokhin	c7ccb8d7e9	Add `experimental` prefix to `hpa_strict_min_purge_interval` Goal is to make it obvious this option is experimental.	2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin	aaa29003ab	Limit maximum number of purged slabs with option. Option `experimental_hpa_max_purge_nhp` introduced for backward compatibility reasons: to make it possible to have behaviour similar to buggy `hpa_strict_min_purge_interval` implementation. When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit to the number of slabs we'll purge on each iteration. Otherwise, we'll purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in turn means we might not purge enough dirty pages to satisfy the `hpa_dirty_mult` requirement. The combination of the `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and `hpa_strict_min_purge_interval` options allows us to have a steady rate of pages returned back to the system. This provides stricter latency guarantees, as the number of `madvise` calls is bounded (and hence the number of TLB shootdowns is limited), in exchange for weaker memory usage guarantees.	2024-08-20 10:02:38 -07:00
Shirui Cheng	47c9bcd402	Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items	2024-08-06 13:16:09 -07:00
Shirui Cheng	48f66cf4a2	Add a size check when declaring a stack array, requiring it to be less than 2048 bytes	2024-08-06 13:16:09 -07:00
Nathan Slingerland	bc32ddff2d	Add usize to prof_sample_hook_t	2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin	b66f689764	Emit long string values without truncation. There are a few long options (`bin_shards` and `slab_sizes`, for example) whose values get truncated when they are specified and we emit statistics. Moved the emitting logic for strings into a separate `emitter_emit_str` function. It will try to emit the string the same way as before and, if the value is too long, will fall back to emitting the rest in chunks of `BUF_SIZE`. Justification for long strings (longer than `BUF_SIZE`) is not supported.	2024-07-29 13:58:31 -07:00
Guangli Dai	8477ec9562	Set dependent as false for all rtree reads without ownership	2024-06-24 10:50:20 -07:00
Guangli Dai	21bcc0a8d4	Make JEMALLOC_CXX_THROW definition compatible with newer C++ versions	2024-06-13 11:03:05 -07:00
Dmitry Ilvokhin	867c6dd7dc	Option to guard `hpa_min_purge_interval_ms` fix. The change in `hpa_min_purge_interval_ms` handling logic is not backward compatible, as it might increase memory usage. Now this logic is guarded by the `hpa_strict_min_purge_interval` option. When `hpa_strict_min_purge_interval` is true, we will purge no more often than once every `hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is false, the old purging logic behaviour is preserved. The long-term strategy is to migrate all users of HPA to the new logic and then delete the `hpa_strict_min_purge_interval` option.	2024-06-07 10:52:41 -07:00
David Goldblatt	f9c0b5f7f8	Bin batching: add some stats. This lets us easily see what fraction of flush load is being taken up by the bins, and helps guide future optimization approaches (for example: should we prefetch during cache bin fills? It depends on how many objects the average fill pops out of the batch).	2024-05-22 10:30:31 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	c085530c71	Tcache batching: Plumbing In the next commit, we'll start using the batcher to eliminate mutex traffic. To avoid cluttering up that commit with the random bits of busy-work it entails, we'll centralize them here. This commit introduces: - A batched bin type. - The ability to mix batched and unbatched bins in the arena. - Conf parsing to set batches per size and a max batched size. - mallctl access to the corresponding opt-namespace keys. - Stats output of the above.	2024-05-22 10:30:31 -07:00
David Goldblatt	70c94d7474	Add batcher module. This can be used to batch up simple operation commands for later use by another thread.	2024-05-22 10:30:31 -07:00
David Goldblatt	86f4851f5d	Add clang static analyzer suppression macro.	2024-05-22 10:30:31 -07:00
Qi Wang	8d8379da44	Fix background_thread creation for the oversize_arena. Bypassing background thread creation for the oversize_arena used to be an optimization since that arena had eager purging. However #2466 changed the purging policy for the oversize_arena -- specifically it switched to the default decay time when background_thread is enabled. This issue is noticeable when the number of arenas is low: whenever the total # of arenas is <= 4 (which is the default max # of background threads), in which case the purging will be stalled since no background thread is created for the oversize_arena.	2024-05-02 14:45:18 -07:00
Daniel Hodges	11038ff762	Add support for namespace pids in heap profile names. This change adds support for writing pid namespaces to the filename of a heap profile. When running with namespaces, pids may be reused across namespaces, and if mounts are shared where profiles are written there is not a great way to differentiate profiles between pids. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com> Signed-off-by: Daniel Hodges <hodgesd@fb.com>	2024-04-09 10:27:52 -07:00
Shirui Cheng	5081c16bb4	Experimental calloc implementation with using memset on larger sizes	2024-04-04 15:31:56 -07:00
Dmitry Ilvokhin	b2e59a96e1	Introduce getters for page allocator shard stats. Access nactive, ndirty and nmuzzy through getters and not directly. There is no functional change, but getters are required to propagate HPA's statistics up to Page Allocator's statistics.	2024-04-04 12:17:30 -07:00
Amaury Séchet	92aa52c062	Reduce nesting in phn_merge_siblings using an early return.	2024-03-14 13:08:17 -07:00
Amaury Séchet	10d713151d	Ensure that the root of a heap is always the best element.	2024-03-14 13:07:45 -07:00
Shirui Cheng	373884ab48	print out all malloc_conf settings in stats	2024-02-29 12:12:44 -08:00
Qi Wang	a2c5267409	HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as it's within the huge page size. These requests do not concern internal fragmentation with huge pages, since the entire range is expected to be accessed.	2024-01-18 14:51:04 -08:00
guangli-dai	b1792c80d2	Add LOGs when entering and exiting free and sdallocx.	2024-01-11 14:37:20 -08:00
guangli-dai	eda05b3994	Fix static analysis warnings.	2024-01-03 14:18:52 -08:00
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
Qi Wang	3025b021b9	Optimize mutex and bin alignment / locality.	2023-10-23 20:28:26 -07:00
Qi Wang	04d1a87b78	Fix a zero-initializer warning on macOS.	2023-10-18 14:12:43 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization: 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	867eedfc58	Fix the bug in dalloc promoted allocations. An allocation small enough will be promoted so that it does not share an extent with others. However, on dalloc, such allocations may not be deallocated as promoted ones if nbins < SC_NBINS. This commit fixes the bug.	2023-10-17 14:53:23 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (will be rounded up if the given size is not an exact bin size). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Shirui Cheng	36becb1302	metadata usage breakdowns: tracking edata and rtree usages	2023-10-11 11:56:01 -07:00
Qi Wang	005f20aa7f	Fix comments about malloc_conf to enable logging.	2023-10-04 11:49:10 -07:00
guangli-dai	7a9e4c9073	Mark jemalloc.h as system header to resolve header conflicts.	2023-10-04 11:41:30 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcached stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is, stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypass the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Qi Wang	b71da25b8a	Fix reading CPU id using rdtscp. As pointed out in #2527, the correct register containing CPU id should be ecx instead of edx.	2023-08-28 11:46:39 -07:00
Kevin Svetlitski	da66aa391f	Enable a few additional warnings for CI and fix the issues they uncovered - `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are helpful for finding dead code and/or things that should be `static` but aren't marked as such. - `-Wunused-macros` is of similar utility, but for identifying dead macros. - `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly what they say: flag unreachable code.	2023-08-11 13:56:23 -07:00
Kevin Svetlitski	d2c9ed3d1e	Ensure short `read(2)`s/`write(2)`s are properly handled by IO utilities `read(2)` and `write(2)` may read or write fewer bytes than were requested. In order to robustly ensure that all of the requested bytes are read/written, these edge-cases must be handled.	2023-08-11 13:36:24 -07:00
Kevin Svetlitski	4f50f782fa	Use compiler-provided assume builtins when available There are several benefits to this: 1. It's cleaner and more reliable to use the builtin to inform the compiler of assumptions instead of hoping that the optimizer understands your intentions. 2. `clang` will warn you if any of your assumptions would produce side-effects (which the compiler will discard). [This blog post](https://fastcompression.blogspot.com/2019/01/compiler-checked-contracts.html) by Yann Collet highlights that a hazard of using the `unreachable()`-based method of signaling assumptions is that it can sometimes result in additional instructions being generated (see [this Godbolt link](https://godbolt.org/z/lKNMs3) from the blog post for an example).	2023-08-08 14:59:36 -07:00
Kevin Svetlitski	3aae792b10	Fix infinite purging loop in HPA As reported in #2449, under certain circumstances it's possible to get stuck in an infinite loop attempting to purge from the HPA. We now handle this by validating the HPA settings at the end of configuration parsing and either normalizing them or aborting depending on if `abort_conf` is set.	2023-08-08 14:36:19 -07:00
Kevin Svetlitski	424dd61d57	Issue a warning upon directly accessing an arena's bins An arena's bins should normally be accessed via the `arena_get_bin` function, which properly takes into account bin-shards. To ensure that we don't accidentally commit code which incorrectly accesses the bins directly, we mark the field with `__attribute__((deprecated))` with an appropriate warning message, and suppress the warning in the few places where directly accessing the bins is allowed.	2023-08-04 15:47:05 -07:00
Kevin Svetlitski	120abd703a	Add support for the `deprecated` attribute This is useful for enforcing the usage of getter/setter functions to access fields which are considered private or have unique access constraints.	2023-08-04 15:47:05 -07:00
Kevin Svetlitski	b01d496646	Add an override for the compile-time malloc_conf to `jemalloc_internal_overrides.h`	2023-07-31 14:53:15 -07:00
Kevin Svetlitski	8ff7e7d6c3	Remove errant `#include`s in public `jemalloc.h` header In an attempt to make all headers self-contained, I inadvertently added `#include`s which refer to intermediate, generated headers that aren't included in the final install. Closes #2489.	2023-07-25 16:26:50 -07:00
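
Commit d2c9ed3d1e notes that `read(2)` and `write(2)` may transfer fewer bytes than requested. A minimal sketch of the usual retry loop follows, assuming POSIX; the helper names `example_write_all` and `example_read_all` are hypothetical and are not taken from the jemalloc sources.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all len bytes are
 * written. Returns 0 on success, -1 on error (errno is set by write). */
int example_write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing anything: retry */
            }
            return -1;
        }
        p += n;            /* short write: advance past what was written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: keep calling read(2) until len bytes are read or
 * end-of-file is reached. Returns the byte count read, or -1 on error. */
ssize_t example_read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        if (n == 0) {
            break; /* end of file */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Both loops treat a short transfer as progress rather than failure, which is the edge case the commit message describes.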

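Commits 120abd703a and 424dd61d57 describe marking a struct field with `__attribute__((deprecated))` so that direct access produces a warning, while a designated getter suppresses it. The sketch below illustrates the idea under the assumption of a GCC- or Clang-compatible compiler; every `example_` name is hypothetical, and the modulo indexing is only a stand-in for real bin-shard logic.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#  define EXAMPLE_DEPRECATED(msg) /* no-op on other compilers */
#endif

#define EXAMPLE_NBINS 4

typedef struct {
    int nregs;
} example_bin_t;

typedef struct {
    /* Direct use of this field warns; callers should use the getter. */
    example_bin_t bins[EXAMPLE_NBINS]
        EXAMPLE_DEPRECATED("use example_get_bin() instead");
} example_arena_t;

/* The one place where touching the field is allowed, so the warning is
 * suppressed locally. */
example_bin_t *example_get_bin(example_arena_t *arena, unsigned ind) {
#if defined(__GNUC__) || defined(__clang__)
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wdeprecated-declarations"
#endif
    return &arena->bins[ind % EXAMPLE_NBINS];
#if defined(__GNUC__) || defined(__clang__)
#  pragma GCC diagnostic pop
#endif
}
```

With this in place, an expression such as `arena->bins[0]` outside the getter would be flagged at compile time, which is what makes accidental direct access visible in review or CI.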

1633 commits
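
Commit b66f689764 describes emitting an over-long string in `BUF_SIZE` chunks instead of truncating it. The following is a rough sketch of that chunking idea only; it does not reproduce `emitter_emit_str`, and the callback type, the sink, and the tiny `EXAMPLE_BUF_SIZE` are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_BUF_SIZE 8 /* deliberately tiny so chunking is visible */

typedef void (*example_write_cb)(void *opaque, const char *chunk);

/* Emit s through cb in pieces that each fit a fixed-size buffer
 * (including the terminating NUL), so nothing is truncated. */
void example_emit_str(example_write_cb cb, void *opaque, const char *s) {
    char buf[EXAMPLE_BUF_SIZE];
    size_t remaining = strlen(s);
    while (remaining > 0) {
        size_t n = remaining < sizeof(buf) - 1 ? remaining : sizeof(buf) - 1;
        memcpy(buf, s, n);
        buf[n] = '\0';
        cb(opaque, buf);
        s += n;
        remaining -= n;
    }
}

/* Hypothetical sink that concatenates chunks and counts callback calls. */
typedef struct {
    char data[64];
    size_t len;
    unsigned ncalls;
} example_sink_t;

void example_sink_write(void *opaque, const char *chunk) {
    example_sink_t *sink = opaque;
    size_t n = strlen(chunk);
    if (sink->len + n >= sizeof(sink->data)) {
        n = sizeof(sink->data) - 1 - sink->len; /* clamp to sink capacity */
    }
    memcpy(sink->data + sink->len, chunk, n);
    sink->len += n;
    sink->data[sink->len] = '\0';
    sink->ncalls++;
}
```

A 20-character string passed through `example_emit_str` with this buffer size reaches the sink in three calls (7, 7, and 6 bytes) and is reassembled unchanged.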