romenskiy2012/jemalloc

mirror of https://github.com/jemalloc/jemalloc.git synced 2026-05-30 16:47:29 +03:00

Author	SHA1	Message	Date
Nathan Slingerland	bc32ddff2d	Add usize to prof_sample_hook_t	2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin	b66f689764	Emit long string values without truncation. A few long options (`bin_shards` and `slab_sizes`, for example) have their values truncated when they are specified and we emit statistics. Moved the emitting logic for strings into a separate `emitter_emit_str` function. It tries to emit the string the same way as before and, if the value is too long, falls back to emitting the rest in chunks of `BUF_SIZE`. Justification for long strings (longer than `BUF_SIZE`) is not supported.	2024-07-29 13:58:31 -07:00
Shirui Cheng	a1fcbebb18	skip tcache GC for tcache_max unit test	2024-06-25 12:59:45 -07:00
Dmitry Ilvokhin	867c6dd7dc	Option to guard `hpa_min_purge_interval_ms` fix. The change in `hpa_min_purge_interval_ms` handling logic is not backward compatible, as it might increase memory usage. This logic is now guarded by the `hpa_strict_min_purge_interval` option. When `hpa_strict_min_purge_interval` is true, we will purge no more often than once every `hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is false, the old purging behaviour is preserved. The long-term strategy is to migrate all users of hpa to the new logic and then delete the `hpa_strict_min_purge_interval` option.	2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin	91a6d230db	Respect `hpa_min_purge_interval_ms` option. Currently, the hugepage-aware allocator backend works together with the classic one as a fallback for not yet supported allocations. When background threads are enabled, the wake-up time for the classic backend interferes with hpa, as there were no checks inside the hpa purging logic to ensure we are not purging too frequently. If a background thread is running and `hpa_should_purge` returns true, then we will purge, even if we purged less than `hpa_min_purge_interval_ms` ago.	2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin	90c627edb7	Export hugepage size with `arenas.hugepage`	2024-06-05 15:37:41 -07:00
David Goldblatt	fc615739cb	Add batching to arena bins. This adds a fast-path for threads freeing a small number of allocations to bins which are not their "home-base" and which encounter lock contention in attempting to do so. In producer-consumer workflows, such small lock hold times can cause lock convoying that greatly increases overall bin mutex contention.	2024-05-22 10:30:31 -07:00
David Goldblatt	c085530c71	Tcache batching: Plumbing In the next commit, we'll start using the batcher to eliminate mutex traffic. To avoid cluttering up that commit with the random bits of busy-work it entails, we'll centralize them here. This commit introduces: - A batched bin type. - The ability to mix batched and unbatched bins in the arena. - Conf parsing to set batches per size and a max batched size. - mallctl access to the corresponding opt-namespace keys. - Stats output of the above.	2024-05-22 10:30:31 -07:00
David Goldblatt	70c94d7474	Add batcher module. This can be used to batch up simple operation commands for later use by another thread.	2024-05-22 10:30:31 -07:00
Dmitry Ilvokhin	47d69b4eab	HPA: Fix infinite purging loop. One of the conditions to start purging is that the `hpa_hugify_blocked_by_ndirty` function call returns true. This can happen in cases where we have no dirty memory for this shard at all. In this case the purging loop will be an infinite loop. `hpa_hugify_blocked_by_ndirty` was introduced at `0f6c420`, but at that time the purging loop had a different form and an additional `break` was not required. The purging loop was rewritten at `6630c5989`, but an additional exit condition wasn't added there at the time. Repro code was shared by Patrik Dokoupil at [1]; I stripped it down to the minimum needed to reproduce the issue in jemalloc unit tests. [1]: https://github.com/jemalloc/jemalloc/pull/2533	2024-04-30 13:46:32 -07:00
Shirui Cheng	4b555c11a5	Enable heap profiling on MacOS	2024-04-09 12:57:01 -07:00
Daniel Hodges	11038ff762	Add support for namespace pids in heap profile names. This change adds support for writing pid namespaces to the filename of a heap profile. When running with namespaces, pids may be reused across namespaces, and if mounts are shared where profiles are written, there is no good way to differentiate profiles between pids. Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com> Signed-off-by: Daniel Hodges <hodgesd@fb.com>	2024-04-09 10:27:52 -07:00
Shirui Cheng	373884ab48	print out all malloc_conf settings in stats	2024-02-29 12:12:44 -08:00
Qi Wang	a2c5267409	HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as it's within the huge page size. These requests do not concern internal fragmentation with huge pages, since the entire range is expected to be accessed.	2024-01-18 14:51:04 -08:00
Shirui Cheng	e4817c8d89	Cleanup cache_bin_info_t* info input args	2023-10-25 10:27:31 -07:00
guangli-dai	6fb3b6a8e4	Refactor the tcache initialization 1. Pre-generate all default tcache ncached_max in tcache_boot; 2. Add getters returning default ncached_max and ncached_max_set; 3. Refactor tcache init so that it is always init with a given setting.	2023-10-18 14:11:46 -07:00
guangli-dai	8a22d10b83	Allow setting default ncached_max for each bin through malloc_conf	2023-10-18 14:11:46 -07:00
guangli-dai	630f7de952	Add mallctl to set and get ncached_max of each cache_bin. 1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the ncached_max of the bin with the input sizeclass, passed in through oldp (it will be rounded up if an exact bin size is not given). 2. `thread_tcache_ncached_max_write` takes in a char array representing the settings for bins in the tcache.	2023-10-17 14:53:23 -07:00
guangli-dai	6b197fdd46	Pre-generate ncached_max for all bins for better tcache_max tuning experience.	2023-10-17 14:53:23 -07:00
Shirui Cheng	36becb1302	metadata usage breakdowns: tracking edata and rtree usages	2023-10-11 11:56:01 -07:00
Qi Wang	72cfdce718	Allocate tcache stack from base allocator. When using metadata_thp, allocate tcache bin stacks from base0, which means they will be placed on huge pages along with other metadata, instead of mixed with other regular allocations. In order to do so, modified the base allocator to support limited reuse: freed tcache stacks (from thread termination) will be returned to base0 and made available for reuse, but no merging will be attempted since they were bump allocated out of base blocks. These reused base extents are managed using separately allocated base edata_t -- they are cached in base->edata_avail when the extent is all allocated. One tricky part is that stats updating must be skipped for such reused extents (since they were accounted for already, and there is no purging for base). This requires tracking the "if is reused" state explicitly and bypassing the stats updates when allocating from them.	2023-09-18 12:18:32 -07:00
guangli-dai	a442d9b895	Enable per-tcache tcache_max 1. add tcache_max and nhbins into tcache_t so that they are per-tcache, with one auto tcache per thread, it's also per-thread; 2. add mallctl for each thread to set its own tcache_max (of its auto tcache); 3. store the maximum number of items in each bin instead of using a global storage; 4. add tests for the modifications above. 5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.	2023-09-06 10:47:14 -07:00
guangli-dai	fbca96c433	Remove unnecessary parameters for cache_bin_postincrement.	2023-09-06 10:47:14 -07:00
Evers Chen	7d9eceaf38	Fix array bounds false warning in gcc 12.3.0 1. error: array subscript 232 is above array bounds of ‘size_t[232]’ in gcc 12.3.0 2. it also optimizes the code	2023-09-05 14:33:55 -07:00
Kevin Svetlitski	da66aa391f	Enable a few additional warnings for CI and fix the issues they uncovered - `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are helpful for finding dead code and/or things that should be `static` but aren't marked as such. - `-Wunused-macros` is of similar utility, but for identifying dead macros. - `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly what they say: flag unreachable code.	2023-08-11 13:56:23 -07:00
guangli-dai	254c4847e8	Print colorful reminder for failed tests.	2023-08-08 15:01:07 -07:00
Kevin Svetlitski	3aae792b10	Fix infinite purging loop in HPA As reported in #2449, under certain circumstances it's possible to get stuck in an infinite loop attempting to purge from the HPA. We now handle this by validating the HPA settings at the end of configuration parsing and either normalizing them or aborting depending on if `abort_conf` is set.	2023-08-08 14:36:19 -07:00
Kevin Svetlitski	7e54dd1ddb	Define `PROF_TCTX_SENTINEL` instead of using magic numbers This makes the code more readable on its own, and also sets the stage for more cleanly handling the pointer provenance lints in a following commit.	2023-07-24 14:40:42 -07:00
Kevin Svetlitski	41e0b857be	Make headers self-contained by fixing `#include`s Header files are now self-contained, which makes the relationships between the files clearer, and crucially allows LSP tools like `clangd` to function correctly in all of our header files. I have verified that the headers are self-contained (aside from the various Windows shims) by compiling them as if they were C files – in a follow-up commit I plan to add this to CI to ensure we don't regress on this front.	2023-07-14 09:06:32 -07:00
Kevin Svetlitski	314c073a38	Print the failed assertion before aborting in test cases This makes it faster and easier to debug, so that you don't need to fire up a debugger just to see which assertion triggered in a failing test.	2023-07-13 15:07:17 -07:00
Kevin Svetlitski	65d3b5989b	Print test error messages in color when stderr is a terminal When stderr is a terminal and supports color, print error messages from tests in red to make them stand out from the surrounding output.	2023-07-13 13:03:23 -07:00
Kevin Svetlitski	589c63b424	Make eligible global variables `static` and/or `const` For better or worse, Jemalloc has a significant number of global variables. Making all eligible global variables `static` and/or `const` at least makes it slightly easier to reason about them, as these qualifications communicate to the programmer restrictions on their use without having to `grep` the whole codebase.	2023-07-06 14:15:12 -07:00
Qi Wang	602edd7566	Enabled -Wstrict-prototypes and fixed warnings.	2023-07-06 12:00:02 -07:00
Kevin Svetlitski	ebd7e99f5c	Add a test-case for small profiled allocations Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`) which are sampled for profiling maintain the expected invariants even though they now take up less space.	2023-07-03 16:19:06 -07:00
Kevin Svetlitski	e1338703ef	Address compiler warnings in the unit tests	2023-07-03 16:06:35 -07:00
Qi Wang	d131331310	Avoid eager purging on the dedicated oversize arena when using bg thds. We have observed new workload patterns (namely ML training type) that cycle through oversized allocations frequently, because 1) the dataset might be sparse which is faster to go through, and 2) GPU accelerated. As a result, the eager purging from the oversize arena becomes a bottleneck. To offer an easy solution, allow normal purging of the oversized extents when background threads are enabled.	2023-06-27 11:57:41 -07:00
Kevin Svetlitski	f2e00d2fd3	Remove trailing whitespace Additionally, added a GitHub Action to ensure no more trailing whitespace will creep in again in the future. I'm excluding Markdown files from this check, since trailing whitespace is significant there, and also excluding `build-aux/install-sh` because there is significant trailing whitespace on the line that sets `defaultIFS`.	2023-06-23 11:58:18 -07:00
Qi Wang	d4a2b8bab1	Add the prof_sys_thread_name feature in the prof_recent unit test. This tests the combination of the prof_recent and thread_name features. Verified that it catches the issue being fixed in this PR. Also explicitly set thread name in test/unit/prof_recent. This fixes the name testing when no default thread name is set (e.g. FreeBSD).	2023-05-11 09:10:57 -07:00
Kevin Svetlitski	70344a2d38	Make eligible functions `static` The codebase is already very disciplined in making any function which can be `static`, but there are a few that appear to have slipped through the cracks.	2023-05-08 15:00:02 -07:00
Qi Wang	434a68e221	Disallow decay during reentrancy. Decay should not be triggered during reentrant calls (may cause lock order reversal / deadlocks). Added a delay_trigger flag to the tickers to bypass decay when reentrancy_level is not zero.	2023-04-05 10:16:37 -07:00
Qi Wang	ce0b7ab6c8	Inline the storage for thread name in prof_tdata_t. The previous approach managed the thread name in a separate buffer, which causes races because the thread name update (triggered by new samples) can happen at the same time as prof dumping (which reads the thread names) -- these two operations are under separate locks to avoid blocking each other. Implemented the thread name storage as part of the tdata struct, which resolves the lifetime issue and also avoids internal alloc / dalloc during prof_sample.	2023-04-05 10:03:12 -07:00
Qi Wang	6cab460a45	Add a multithreaded test for prof_sys_thread_name. Verified that this catches the issue being fixed in `5fd5583`.	2023-04-05 10:03:12 -07:00
Qi Wang	8b64be3441	Explicit arena assignment in test_tcache_max. Otherwise the associated arena could change with percpu arena enabled.	2023-03-22 15:16:43 -07:00
Qi Wang	8e7353a19b	Explicit arena assignment in test_thread_idle. Otherwise the associated arena could change with percpu arena enabled.	2023-03-22 15:16:43 -07:00
Qi Wang	71bc1a3d91	Avoid assuming the arena id in test when percpu_arena is used.	2023-03-13 10:50:10 -07:00
guangli-dai	09e4b38fb1	Use asm volatile during benchmarks.	2023-02-24 11:17:48 -08:00
Qi Wang	97b313c7d4	More conservative setting for /test/unit/background_thread_enable. Lower the thread and arena count to avoid resource exhaustion on 32-bit.	2023-02-16 14:42:21 -08:00
Qi Wang	8580c65f81	Implement prof sample hooks "experimental.hooks.prof_sample(_free)". The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to allow advanced users to track additional information, to enable new ways of profiling on top of the jemalloc heap profile and sample features. The sample hook is invoked after the allocation and backtracing, and forwards both the allocation and backtrace to the user hook; the sample_free hook happens before the actual deallocation, and forwards only the ptr and usz to the hook.	2022-12-07 16:06:49 -08:00
guangli-dai	a74acb57e8	Fix dividing 0 error in stress/cpp/microbench Summary: Per issue #2356, some CXX compilers may optimize away the new/delete operation in stress/cpp/microbench.cpp. Thus, this commit (1) bumps the time interval to 1 if it is 0, and (2) modifies the pointers in the microbench to volatile.	2022-12-06 10:46:14 -08:00
Guangli Dai	e8f9f13811	Inline free and sdallocx into operator delete	2022-11-21 11:14:05 -08:00

805 commits