Commit graph

1902 commits

Dmitry Ilvokhin
867c6dd7dc Option to guard hpa_min_purge_interval_ms fix
The change in `hpa_min_purge_interval_ms` handling logic is not backward
compatible, as it might increase memory usage. This logic is now guarded
by the `hpa_strict_min_purge_interval` option.

When `hpa_strict_min_purge_interval` is true, we will purge no more often
than once per `hpa_min_purge_interval_ms`. When it is false, the old
purging behaviour is preserved.

The long-term strategy is to migrate all users of HPA to the new logic and
then delete the `hpa_strict_min_purge_interval` option.
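
A usage sketch, assuming these options follow jemalloc's usual
`malloc_conf` key:value syntax (the option names come from this message;
the values are illustrative):

    /* Opt into the strict behaviour: purge at most once per second. */
    const char *malloc_conf =
        "hpa_strict_min_purge_interval:true,hpa_min_purge_interval_ms:1000";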
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
91a6d230db Respect hpa_min_purge_interval_ms option
Currently, the hugepage-aware allocator (HPA) backend works together with
the classic one, which serves as a fallback for allocations HPA does not
yet support. When background threads are enabled, the classic backend's
wake-ups interfere with HPA, because the HPA purging logic had no check
that we are not purging too frequently: if a background thread is running
and `hpa_should_purge` returns true, we will purge, even if we purged less
than `hpa_min_purge_interval_ms` ago.
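
A minimal sketch of the missing guard (all names except
`hpa_min_purge_interval_ms` are invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* Regardless of who woke us (background thread or otherwise), skip
     * purging unless the minimum interval has elapsed. */
    static bool
    hpa_min_purge_interval_elapsed(uint64_t now_ms, uint64_t last_purge_ms,
        uint64_t hpa_min_purge_interval_ms) {
        return now_ms - last_purge_ms >= hpa_min_purge_interval_ms;
    }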
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
90c627edb7 Export hugepage size with arenas.hugepage 2024-06-05 15:37:41 -07:00
David Goldblatt
f9c0b5f7f8 Bin batching: add some stats.
This lets us easily see what fraction of flush load is being taken up by the
bins, and helps guide future optimization approaches (for example: should we
prefetch during cache bin fills? It depends on how many objects the average fill
pops out of the batch).
2024-05-22 10:30:31 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
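
A sketch of the shape of this fast path, using pthreads and invented names
rather than jemalloc's internal APIs:

    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define BIN_BATCH_CAP 16

    typedef struct {
        pthread_mutex_t lock;       /* the contended bin lock */
        pthread_mutex_t batch_lock; /* held only for a few instructions */
        void *batch[BIN_BATCH_CAP];
        size_t batch_n;
    } bin_t;

    /* Free to a non-home bin: if the bin lock is contended, stash the
     * pointer in the batch for the bin owner to flush later, keeping our
     * lock hold time tiny. Returns false if the batch is full and the
     * caller must fall back to blocking on bin->lock. */
    static bool
    bin_free_remote_fast(bin_t *bin, void *ptr) {
        if (pthread_mutex_trylock(&bin->lock) == 0) {
            /* Uncontended: do the normal dalloc (elided). */
            pthread_mutex_unlock(&bin->lock);
            return true;
        }
        pthread_mutex_lock(&bin->batch_lock);
        bool pushed = bin->batch_n < BIN_BATCH_CAP;
        if (pushed) {
            bin->batch[bin->batch_n++] = ptr;
        }
        pthread_mutex_unlock(&bin->batch_lock);
        return pushed;
    }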
2024-05-22 10:30:31 -07:00
David Goldblatt
44d91cf243 Tcache flush: Partition by bin before locking.
This accomplishes two things:
- It avoids a full array scan (and any attendant branch prediction misses, etc.)
  while holding the bin lock.
- It allows us to know the number of items that will be flushed before flushing
  them, which will (in an upcoming commit) let us know if it's safe to use the
  batched flush (in which case we won't acquire the bin mutex); see the sketch below.
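
A sketch of the partition step (invented names): one counting pass and one
placement pass, both done before any bin lock is taken.

    #include <stddef.h>
    #include <string.h>

    /* Group items by destination bin: afterwards bin b's items occupy
     * out[start[b] .. start[b]+counts[b]) and counts[b] is known up
     * front, so each bin lock protects only a contiguous, pre-counted
     * run instead of a full-array scan. */
    static void
    flush_partition(void *const *items, const unsigned *bin_of, size_t n,
        void **out, size_t *start, size_t *counts, unsigned nbins) {
        memset(counts, 0, nbins * sizeof(*counts));
        for (size_t i = 0; i < n; i++) {
            counts[bin_of[i]]++;
        }
        size_t pos = 0;
        for (unsigned b = 0; b < nbins; b++) {
            start[b] = pos;
            pos += counts[b];
        }
        for (size_t i = 0; i < n; i++) {
            out[start[bin_of[i]]++] = items[i];
        }
        /* Placement advanced each start[b] by counts[b]; restore. */
        for (unsigned b = 0; b < nbins; b++) {
            start[b] -= counts[b];
        }
    }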
2024-05-22 10:30:31 -07:00
David Goldblatt
6e56848850 Tcache: Split up small/large handling.
The main bits of shared code are the edata filtering and the stats flushing
logic, both of which are fairly simple to read and not so painful to duplicate.
The shared code comes at the cost of guarding all the subtle logic with
`if (small)`, which doesn't feel worth it.
2024-05-22 10:30:31 -07:00
David Goldblatt
c085530c71 Tcache batching: Plumbing
In the next commit, we'll start using the batcher to eliminate mutex traffic.
To avoid cluttering up that commit with the random bits of busy-work it entails,
we'll centralize them here.  This commit introduces:
- A batched bin type.
- The ability to mix batched and unbatched bins in the arena.
- Conf parsing to set batches per size and a max batched size.
- mallctl access to the corresponding opt-namespace keys.
- Stats output of the above.
2024-05-22 10:30:31 -07:00
David Goldblatt
70c94d7474 Add batcher module.
This can be used to batch up simple operation commands for later use by another
thread.
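
A minimal sketch of the idea (invented names; the real module's interface
is not shown in this log): producers push under a mutex, and the consuming
thread drains everything in one critical section.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    #define BATCHER_CAP 16

    typedef struct {
        pthread_mutex_t mtx;
        size_t n;
        void *items[BATCHER_CAP];
    } batcher_t;

    /* Returns false when full; the caller then takes its slow path. */
    static bool
    batcher_push(batcher_t *b, void *item) {
        pthread_mutex_lock(&b->mtx);
        bool ok = b->n < BATCHER_CAP;
        if (ok) {
            b->items[b->n++] = item;
        }
        pthread_mutex_unlock(&b->mtx);
        return ok;
    }

    /* Drain all pending items into out[]; returns how many were popped. */
    static size_t
    batcher_pop_all(batcher_t *b, void *out[BATCHER_CAP]) {
        pthread_mutex_lock(&b->mtx);
        size_t n = b->n;
        memcpy(out, b->items, n * sizeof(void *));
        b->n = 0;
        pthread_mutex_unlock(&b->mtx);
        return n;
    }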
2024-05-22 10:30:31 -07:00
Amaury Séchet
5afff2e44e Simplify the logic in tcache_gc_small. 2024-05-02 18:52:19 -07:00
Qi Wang
8d8379da44 Fix background_thread creation for the oversize_arena.
Bypassing background thread creation for the oversize_arena used to be an
optimization since that arena had eager purging.  However #2466 changed the
purging policy for the oversize_arena -- specifically it switched to the default
decay time when background_thread is enabled.

This issue is noticeable when the number of arenas is low: whenever the
total number of arenas is <= 4 (the default max number of background
threads), purging stalls because no background thread is created for the
oversize_arena.
2024-05-02 14:45:18 -07:00
Dmitry Ilvokhin
47d69b4eab HPA: Fix infinite purging loop
One of the conditions to start purging is that the
`hpa_hugify_blocked_by_ndirty` function call returns true. This can happen
even when we have no dirty memory for this shard at all, in which case the
purging loop becomes an infinite loop.

`hpa_hugify_blocked_by_ndirty` was introduced in 0f6c420, but at that
time the purging loop had a different form and no additional `break` was
required. The loop was rewritten in 6630c5989, but the extra exit
condition was not added there at the time.

Repro code was shared by Patrik Dokoupil at [1]; I stripped it down to a
minimum to reproduce the issue in jemalloc unit tests.

[1]: https://github.com/jemalloc/jemalloc/pull/2533
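
A distilled sketch of the bug and the fix (invented names):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        size_t ndirty;       /* dirty pages owned by this shard */
        bool blocked_hugify; /* hpa_hugify_blocked_by_ndirty()'s verdict */
    } shard_t;

    static void
    shard_purge_loop(shard_t *s) {
        /* blocked_hugify can be true even when ndirty == 0, so without
         * the explicit break below the loop never terminates. */
        while (s->blocked_hugify) {
            if (s->ndirty == 0) {
                break; /* the exit condition this commit adds */
            }
            s->ndirty--; /* stand-in for purging one dirty run */
        }
    }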
2024-04-30 13:46:32 -07:00
Qi Wang
fa451de17f Fix the tcache flush sanity checking around ncached and nstashed.
When many items were stashed, it's possible that after flushing the
stashed items, ncached is already lower than the remain (the number of
items to keep), in which case the flush can simply return at that point.
2024-04-12 16:01:55 -07:00
debing.sun
630434bb0a Fixed a type error with `allocated` that caused incorrect printing on 32-bit 2024-04-09 14:44:43 -07:00
Shirui Cheng
4b555c11a5 Enable heap profiling on MacOS 2024-04-09 12:57:01 -07:00
Daniel Hodges
11038ff762 Add support for namespace pids in heap profile names
This change adds support for writing pid namespaces to the filename of a
heap profile. When running with namespaces, pids may be reused across
namespaces, and if the mounts where profiles are written are shared, there
is not a great way to differentiate profiles between pids.
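
For background: on Linux, the pid as seen from each nested pid namespace is
exposed via the `NSpid:` line of `/proc/self/status`. Whether this commit
uses exactly that mechanism is not stated here; this is just an
illustrative way to read it:

    #include <stdio.h>
    #include <string.h>

    /* Print the NSpid line: the process's pid in each nested pid
     * namespace, outermost first. */
    static void
    print_nspid(void) {
        FILE *f = fopen("/proc/self/status", "r");
        if (f == NULL) {
            return;
        }
        char line[256];
        while (fgets(line, sizeof(line), f) != NULL) {
            if (strncmp(line, "NSpid:", 6) == 0) {
                fputs(line, stdout);
                break;
            }
        }
        fclose(f);
    }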

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodgesd@fb.com>
2024-04-09 10:27:52 -07:00
Qi Wang
83b075789b rallocx path: only set errno on the realloc case. 2024-04-05 17:41:43 -07:00
Shirui Cheng
5081c16bb4 Experimental calloc implementation using memset on larger sizes 2024-04-04 15:31:56 -07:00
Juhyung Park
38056fea64 Set errno to ENOMEM on rallocx() OOM failures
realloc() and rallocx() share a path, and realloc() should set errno to
ENOMEM upon OOM failures.
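
A small example of what callers rely on (standard C/POSIX behaviour that
jemalloc now matches):

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void) {
        void *p = malloc(16);
        errno = 0;
        void *q = realloc(p, SIZE_MAX); /* all but guaranteed to fail */
        if (q == NULL) {
            /* POSIX requires ENOMEM on realloc OOM failures. */
            printf("%s\n", errno == ENOMEM ? "ENOMEM" : "errno not set!");
            free(p);
        }
        return 0;
    }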

Fixes: ee961c2310 ("Merge realloc and rallocx pathways.")
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-04-04 15:13:22 -07:00
Dmitry Ilvokhin
268e8ee880 Include HPA ndirty into page allocator ndirty stat 2024-04-04 12:17:30 -07:00
Dmitry Ilvokhin
b2e59a96e1 Introduce getters for page allocator shard stats
Access nactive, ndirty and nmuzzy through getters rather than directly.
There is no functional change, but the getters are required to propagate
HPA's statistics up to the page allocator's statistics.
2024-04-04 12:17:30 -07:00
XChy
ed9b00a96b Replace unsigned induction variable with size_t in background_threads_enable
This patch avoids unnecessary vectorization in clang and a missed memset recognition in gcc. See also https://godbolt.org/z/aoeMsjr4c.
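
The pattern in question, illustratively:

    #include <stdbool.h>
    #include <stddef.h>

    void
    mark_all_unseen(bool *seen, size_t n) {
        /* With `unsigned i` on 64-bit targets, the compiler must model
         * 32-bit index wraparound: gcc then fails to recognize the loop
         * as a memset, and clang emits needless vector code. With
         * size_t, both can compile this down to a single memset call. */
        for (size_t i = 0; i < n; i++) {
            seen[i] = false;
        }
    }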
2024-03-05 14:54:50 -08:00
Shirui Cheng
373884ab48 print out all malloc_conf settings in stats 2024-02-29 12:12:44 -08:00
Qi Wang
1aba4f41a3 Allow zero sized memalign to pass.
Instead of failing on assertions.  Previously the same change was made for
posix_memalign and aligned_alloc (#1554).  Make memalign behave the same way
even though it's obsolete.
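
A usage example of the now-permitted call; per posix_memalign/aligned_alloc
semantics, a size of 0 yields either NULL or a unique free-able pointer:

    #include <malloc.h> /* memalign (obsolete; glibc) */
    #include <stdlib.h>

    int
    main(void) {
        /* Previously this could trip an assertion in debug builds; now
         * it returns either NULL or a pointer that can be freed. */
        void *p = memalign(64, 0);
        free(p);
        return 0;
    }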
2024-02-16 13:06:07 -08:00
Qi Wang
a2c5267409 HPA: Allow frequently reused allocs to bypass the slab_max_alloc limit
as long as they are within the huge page size. These requests do not raise
internal-fragmentation concerns with huge pages, since the entire range is
expected to be accessed.
2024-01-18 14:51:04 -08:00
guangli-dai
b1792c80d2 Add LOGs when entering and exiting free and sdallocx. 2024-01-11 14:37:20 -08:00
Qi Wang
05160258df When safety_check_fail, also embed the hint msg in the abort function name,
because there are cases that log only crash stack traces.
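
A sketch of the trick (invented names): encode the hint in the symbol
itself, so a crash reporter that records only the stack trace still
captures it as a frame.

    #include <stdlib.h>

    __attribute__((noinline))
    static void
    safety_check_abort_double_free_detected(void) {
        /* The function name is the message: it shows up in backtraces. */
        abort();
    }

    int
    main(void) {
        safety_check_abort_double_free_detected();
        return 0;
    }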
2024-01-11 14:19:54 -08:00
guangli-dai
eda05b3994 Fix static analysis warnings. 2024-01-03 14:18:52 -08:00
Shirui Cheng
e4817c8d89 Cleanup cache_bin_info_t* info input args 2023-10-25 10:27:31 -07:00
Qi Wang
3025b021b9 Optimize mutex and bin alignment / locality. 2023-10-23 20:28:26 -07:00
guangli-dai
e2cd27132a Change stack_size assertion back to the more compatible one. 2023-10-23 20:28:26 -07:00
guangli-dai
d88fa71bbd Fix nfill = 0 bug when ncached_max is 1 2023-10-18 14:11:46 -07:00
guangli-dai
6fb3b6a8e4 Refactor the tcache initialization
1. Pre-generate all default tcache ncached_max in tcache_boot;
2. Add getters returning default ncached_max and ncached_max_set;
3. Refactor tcache init so that it is always initialized with a given setting.
2023-10-18 14:11:46 -07:00
guangli-dai
8a22d10b83 Allow setting default ncached_max for each bin through malloc_conf 2023-10-18 14:11:46 -07:00
guangli-dai
630f7de952 Add mallctl to set and get ncached_max of each cache_bin.
1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the
    ncached_max of the bin for an input size class, passed in through
    oldp (the size class is rounded up to the nearest bin size if an
    exact bin size is not given); a usage sketch follows this list.
2. `thread_tcache_ncached_max_write` takes in a char array
    representing the settings for bins in the tcache.
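
A hedged usage sketch of item 1: the mallctl path is guessed from the
function name, and per the message the size class is passed in (and the
result returned) through oldp.

    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int
    main(void) {
        size_t val = 4096; /* in: size class; out: that bin's ncached_max */
        size_t sz = sizeof(val);
        if (mallctl("thread.tcache.ncached_max.read_sizeclass",
            &val, &sz, NULL, 0) == 0) {
            printf("ncached_max for the 4096-byte bin: %zu\n", val);
        }
        return 0;
    }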
2023-10-17 14:53:23 -07:00
guangli-dai
6b197fdd46 Pre-generate ncached_max for all bins for better tcache_max tuning experience. 2023-10-17 14:53:23 -07:00
Shirui Cheng
36becb1302 metadata usage breakdowns: tracking edata and rtree usages 2023-10-11 11:56:01 -07:00
Qi Wang
72cfdce718 Allocate tcache stack from base allocator
When using metadata_thp, allocate tcache bin stacks from base0, which means they
will be placed on huge pages along with other metadata, instead of mixed with
other regular allocations.

In order to do so, the base allocator was modified to support limited
reuse: freed tcache stacks (from thread termination) will be returned to
base0 and made available for reuse, but no merging will be attempted since
they were bump-allocated out of base blocks. These reused base extents are
managed using separately allocated base edata_t -- they are cached in
base->edata_avail when the extent is all allocated.

One tricky part is that stats updates must be skipped for such reused
extents (since they were already accounted for, and there is no purging
for base). This requires tracking the "is reused" state explicitly and
bypassing the stats updates when allocating from them.
2023-09-18 12:18:32 -07:00
guangli-dai
a442d9b895 Enable per-tcache tcache_max
1. add tcache_max and nhbins into tcache_t so that they are per-tcache;
   with one auto tcache per thread, they are effectively per-thread as well;
2. add a mallctl for each thread to set its own tcache_max (of its auto
   tcache); a usage sketch follows this list;
3. store the maximum number of items in each bin instead of using a global storage;
4. add tests for the modifications above;
5. rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.
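
A hedged sketch of item 2 (the mallctl path is assumed, not shown in this
log):

    #include <jemalloc/jemalloc.h>

    int
    main(void) {
        /* Cap this thread's auto tcache at 4 KiB items. */
        size_t new_max = 4096;
        mallctl("thread.tcache.max", NULL, NULL, &new_max, sizeof(new_max));
        return 0;
    }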
2023-09-06 10:47:14 -07:00
guangli-dai
fbca96c433 Remove unnecessary parameters for cache_bin_postincrement. 2023-09-06 10:47:14 -07:00
Qi Wang
7d563a8f81 Update safety check message to remove --enable-debug when it's already on. 2023-09-05 14:15:45 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of configuration
parsing and either normalizing them or aborting, depending on whether
`abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
424dd61d57 Issue a warning upon directly accessing an arena's bins
An arena's bins should normally be accessed via the `arena_get_bin`
function, which properly takes into account bin-shards. To ensure that
we don't accidentally commit code which incorrectly accesses the bins
directly, we mark the field with `__attribute__((deprecated))` with an
appropriate warning message, and suppress the warning in the few places
where directly accessing the bins is allowed.
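
A minimal demonstration of the mechanism (illustrative types, not
jemalloc's real layout):

    typedef struct bin_s { int placeholder; } bin_t;

    typedef struct arena_s {
        bin_t bins[4] __attribute__((deprecated(
            "Use arena_get_bin instead of accessing bins directly.")));
    } arena_t;

    /* The sanctioned accessor suppresses the warning locally. */
    static inline bin_t *
    arena_get_bin(arena_t *arena, unsigned i) {
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wdeprecated-declarations"
        return &arena->bins[i];
    #pragma GCC diagnostic pop
    }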
2023-08-04 15:47:05 -07:00
Kevin Svetlitski
07a2eab3ed Stop over-reporting memory usage from sampled small allocations
@interwq noticed [while reviewing an earlier PR](https://github.com/jemalloc/jemalloc/pull/2478#discussion_r1256217261)
that I missed modifying this statistics accounting in line with the rest
of the changes from #2459. This is now fixed, such that sampled small
allocations increment the `.nmalloc`/`.ndalloc` of their effective bin
size instead of over-reporting memory usage by attributing all such
allocations to `SC_LARGE_MINCLASS`.
2023-08-03 16:12:22 -07:00
Qi Wang
6816b23862 Include the unrecognized malloc conf option in the error message.
Previously the offending option would not be printed unless it was in the
key:value pair format.
2023-08-02 10:44:55 -07:00
Kevin Svetlitski
62648c88e5 Ensured sampled allocations are properly deallocated during arena_reset
Sampled allocations were not being demoted before being deallocated
during an `arena_reset` operation.
2023-08-01 11:35:37 -07:00
Kevin Svetlitski
9ba1e1cb37 Make ctl_arena_clear slightly more efficient
While this function isn't particularly hot (accounting for just 0.27% of
time spent inside the allocator on average across the fleet), looking
at the generated assembly and performance profiles does show we're dispatching
to multiple different `memset`s when we could instead be just tail-calling
`memset` once, reducing code size and marginally improving performance.
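
The shape of the change, illustratively:

    #include <string.h>

    typedef struct {
        size_t nmalloc[8];
        size_t ndalloc[8];
    } ctl_stats_t;

    /* Before (sketch): multiple memset call sites, extra code size. */
    static void
    clear_fields(ctl_stats_t *s) {
        memset(s->nmalloc, 0, sizeof(s->nmalloc));
        memset(s->ndalloc, 0, sizeof(s->ndalloc));
    }

    /* After (sketch): one memset over the whole struct, which the
     * compiler can emit as a single tail call. */
    static void
    clear_all(ctl_stats_t *s) {
        memset(s, 0, sizeof(*s));
    }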
2023-07-31 14:44:04 -07:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
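
An example of the rewrite pattern (illustrative, not the exact diff):

    #include <stddef.h>
    #include <stdint.h>

    /* Before: the integer-to-pointer cast hides the result's provenance,
     * so the optimizer must assume it may alias almost anything. */
    static void *
    align_down_old(void *p, size_t align) {
        return (void *)((uintptr_t)p & ~((uintptr_t)align - 1));
    }

    /* After: derive the result from the original pointer, keeping
     * provenance visible to the compiler. */
    static void *
    align_down_new(void *p, size_t align) {
        return (char *)p - ((uintptr_t)p & ((uintptr_t)align - 1));
    }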
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
1431153695 Define SBRK_INVALID instead of using a magic number 2023-07-24 14:40:42 -07:00