Commit graph

298 commits

Author SHA1 Message Date
Slobodan Predolac
a0f2bdf91d Fix missing negation in large_ralloc_no_move usize_min fallback
The second expansion attempt in large_ralloc_no_move omitted the !
before large_ralloc_no_move_expand(), inverting the return value.
On expansion failure, the function falsely reported success, making
callers believe the allocation was expanded in-place when it was not.
On expansion success, the function falsely reported failure, causing
callers to unnecessarily allocate, copy, and free.

Add unit test that verifies the return value matches actual size change.
2026-04-01 23:15:19 -04:00
Carl Shapiro
86b7219213 Add unit tests for conf parsing and its helpers 2026-03-10 18:14:33 -07:00
Carl Shapiro
ad726adf75 Separate out the configuration code from initialization 2026-03-10 18:14:33 -07:00
Carl Shapiro
a056c20d67 Handle tcache init failures gracefully
tsd_tcache_data_init() returns true on failure but its callers ignore
this return value, leaving the per-thread tcache in an uninitialized
state after a failure.

This change disables the tcache on an initialization failure and logs
an error message.  If opt_abort is true, it will also abort.

New unit tests have been added to test tcache initialization failures.
2026-03-10 18:14:33 -07:00
Carl Shapiro
a75655badf Add unit test coverage for bin interfaces 2026-03-10 18:14:33 -07:00
guangli-dai
c73ab1c2ff Add a test to check the output in JSON-based stats is consistent with mallctl results. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
34ace9169b Remove prof_threshold built-in event. It is trivial to implement it as user event if needed 2026-03-10 18:14:33 -07:00
Andrei Pechkurov
4d0ffa075b Fix background thread initialization race 2026-03-10 18:14:33 -07:00
Slobodan Predolac
6016d86c18 [SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin 2026-03-10 18:14:33 -07:00
Slobodan Predolac
8a06b086f3 [EASY] Extract hpa_central component from hpa source file 2026-03-10 18:14:33 -07:00
Slobodan Predolac
355774270d [EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils 2026-03-10 18:14:33 -07:00
Slobodan Predolac
3678a57c10 When extracting from central, hugify_eager is different than start_as_huge 2026-03-10 18:14:33 -07:00
guangli-dai
261591f123 Add a page-allocator microbenchmark. 2026-03-10 18:14:33 -07:00
guangli-dai
56cdce8592 Adding trace analysis in preparation for page allocator microbenchmark. 2026-03-10 18:14:33 -07:00
Carl Shapiro
daf44173c5 Replace an instance of indentation with spaces with tabs 2026-03-10 18:14:33 -07:00
Shirui Cheng
2114349a4e Revert PR #2608: Manually revert commits 70c94d..f9c0b5
Closes: #2707
2026-03-10 18:14:33 -07:00
Slobodan Predolac
e6864c6075 [thread_event] Remove macros from thread_event and replace with dynamic event objects 2026-03-10 18:14:33 -07:00
Jason Evans
27d7960cf9 Revert "Extend purging algorithm with peak demand tracking"
This reverts commit ad108d50f1.
2025-06-02 10:44:37 -07:00
Slobodan Predolac
1956a54a43 [process_madvise] Use process_madvise across multiple huge_pages 2025-04-25 19:19:03 -07:00
Slobodan Predolac
f19f49ef3e if process_madvise is supported, call it when purging hpa 2025-04-04 13:57:42 -07:00
Dmitry Ilvokhin
ad108d50f1 Extend purging algorithm with peak demand tracking
Implementation inspired by idea described in "Beyond malloc efficiency
to fleet efficiency: a hugepage-aware memory allocator" paper [1].

Primary idea is to track maximum number (peak) of active pages in use
with sliding window and then use this number to decide how many dirty
pages we would like to keep.

We are trying to estimate maximum amount of active memory we'll need in
the near future. We do so by projecting future active memory demand
(based on peak active memory usage we observed in the past within
sliding window) and adding slack on top of it (an overhead is reasonable
to have in exchange of higher hugepages coverage). When peak demand
tracking is off, projection of future active memory is active memory we
are having right now.

Estimation is essentially the same as `nactive_max * (1 + dirty_mult)`.

Peak demand purging algorithm controlled by two config options. Option
`hpa_peak_demand_window_ms` controls duration of sliding window we track
maximum active memory usage in and option `hpa_dirty_mult` controls
amount of slack we are allowed to have as a percent from maximum active
memory usage. By default `hpa_peak_demand_window_ms == 0` now and we
have same behaviour (ratio based purging) that we had before this
commit.

[1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf
2025-03-13 10:12:22 -07:00
Shai Duvdevani
257e64b968 Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
Dmitry Ilvokhin
3820e38dc1 Remove validation for HPA ratios
Config validation was introduced at 3aae792b with main intention to fix
infinite purging loop, but it didn't actually fix the underlying
problem, just masked it. Later 47d69b4ea was merged to address the same
problem.

Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applied to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for single
pageslab (hugepage). It doesn't make much sense to sum them up together.

While it is true that too high value of `hpa_dirty_mult` and too low
value of `hpa_hugification_threshold` can lead to pathological
behaviour, it is true for other options as well. Poor configurations
might lead to suboptimal and sometimes completely unacceptable
behaviour and that's OK, that is exactly the reason why they are called
poor.

There are other mechanism exist to prevent extreme behaviour, when we
hugified and then immediately purged page, see
`hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly
this case.

Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is
too tight and prevents a lot of valid configurations.
2024-11-20 18:59:07 -08:00
Nathan Slingerland
edc1576f03 Add safe frame-pointer backtrace unwinder 2024-10-01 11:01:56 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
2024-05-22 10:30:31 -07:00
David Goldblatt
70c94d7474 Add batcher module.
This can be used to batch up simple operation commands for later use by another
thread.
2024-05-22 10:30:31 -07:00
guangli-dai
8a22d10b83 Allow setting default ncached_max for each bin through malloc_conf 2023-10-18 14:11:46 -07:00
guangli-dai
630f7de952 Add mallctl to set and get ncached_max of each cache_bin.
1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the
    ncached_max of the bin with the input sizeclass, passed in through
    oldp (will be upper casted if not an exact bin size is given).
2. `thread_tcache_ncached_max_write` takes in a char array
    representing the settings for bins in the tcache.
2023-10-17 14:53:23 -07:00
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of
configuration parsing and either normalizing them or aborting depending on
if `abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
ebd7e99f5c Add a test-case for small profiled allocations
Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`)
which are sampled for profiling maintain the expected invariants even
though they now take up less space.
2023-07-03 16:19:06 -07:00
barracuda156
4422f88d17 Makefile.in: link with g++ when cxx enabled 2023-02-21 13:26:58 -08:00
guangli-dai
06374d2a6a Benchmark operator delete
Added the microbenchmark for operator delete.
Also modified bench.h so that it can be used in C++.
2022-11-21 11:14:05 -08:00
divanorama
3de0c24859 Disable builtin malloc in tests
With `--with-jemalloc-prefix=` and without `-fno-builtin` or `-O1` both clang and gcc may optimize out `malloc` calls
whose result is unused. Comparing result to NULL also doesn't necessarily count as being used.

This won't be a problem in most client programs as this only concerns really unused pointers, but in
tests it's important to actually execute allocations.
`-fno-builtin` should disable this optimization for both gcc and clang, and applying it only to tests code shouldn't hopefully be an issue.
Another alternative is to force "use" of result but that'd require more changes and may miss some other optimization-related issues.

This should resolve https://github.com/jemalloc/jemalloc/issues/2091
2022-10-03 10:39:13 -07:00
Alex Lapenkou
df7ad8a9b6 Revert "Echo installed files via verbose 'install' command"
This reverts commit f15d8f3b41. "install -v"
turned out to be not portable and not work on NetBSD.
2022-06-07 12:28:45 -07:00
Qi Wang
0e29ad4efa Rename zero_realloc option "strict" to "alloc".
With realloc(ptr, 0) being UB per C23, the option name "strict" makes less sense
now.  Rename to "alloc" which describes the behavior.
2022-04-20 10:27:25 -07:00
Charles
eaaa368bab Add comments and use meaningful vars in sz_psz2ind. 2022-03-24 16:56:59 -07:00
Shuduo Sang
640c3c72e6 Add support for 'make uninstall' 2022-01-19 12:28:16 -08:00
Alex Lapenkou
f15d8f3b41 Echo installed files via verbose 'install' command
It's not necessary to manually echo all install commands, similar effect
is achieved via 'install -v'
2022-01-19 12:28:16 -08:00
Qi Wang
011449f17b Fix doc build with install-suffix. 2022-01-11 21:15:24 -08:00
Qi Wang
b75822bc6e Implement use-after-free detection using junk and stash.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse).  The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
2021-12-29 14:44:43 -08:00
Alex Lapenkou
0f6da1257d San: Implement bump alloc
The new allocator will be used to allocate guarded extents used as slabs
for guarded small allocations.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
62f9c54d2a San: Rename 'guard' to 'san'
This prepares the foundation for more sanitizer-related work in the
future.
2021-12-15 10:39:17 -08:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
Alex Lapenkou
f7d46b8119 Allow setting custom backtrace hook
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows to augment the backtraces to attribute allocations to
native functions in heap profiles.
2021-09-22 15:04:01 -07:00
Mingli Yu
e5062e9fb9 Makefile.in: make sure doc generated before install
There is a race between the doc generation and the doc installation,
so make the install depend on the build for doc.

Signed-off-by: Mingli Yu <mingli.yu@windriver.com>
2021-09-13 13:40:39 -07:00
David Goldblatt
e09eac1d4e Remove hpa_central.
This is now dead code.
2021-07-23 21:59:59 -07:00
David Goldblatt
113938b6f4 HPA: Pull out a hooks type.
For now, this is a no-op change.  In a subsequent commit, it will be useful for
testing.
2021-07-12 17:59:18 -07:00
David Goldblatt
1d4a7666d5 HPA: Do deferred operations on background threads. 2021-07-12 17:59:18 -07:00
David Goldblatt
de033f56c0 mpsc_queue: Add module.
This is a simple multi-producer, single-consumer queue.  The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards.  We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
2021-06-24 14:55:49 -07:00
David Goldblatt
4452a4812f Add opt.experimental_infallible_new.
This allows a guarantee that operator new never throws.

Fix the .gitignore rules to include test/integration/cpp while we're here.
2021-06-24 12:22:51 -07:00