Commit graph

2000 commits

Author SHA1 Message Date
Slobodan Predolac
7bf49ca689 Remove prof_threshold built-in event. It is trivial to implement it as a user event if needed 2026-02-25 09:06:29 -08:00
Andrei Pechkurov
bee30a9cd3 Fix background thread initialization race 2026-02-21 15:12:42 -08:00
Carl Shapiro
0d7a26e9b6 Remove an incorrect use of the address operator
The address of the local variable created_threads is a different
location than the data it points to.  Incorrectly treating these
values as being the same can cause out-of-bounds writes to the stack.

Resolves #59
2025-12-23 21:59:04 -08:00
Slobodan Predolac
afce889cc9 [SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin 2025-12-19 12:51:26 -05:00
Guangli Dai
9aaad0544f Add a mallctl for users to get an approximation of active bytes. 2025-12-08 10:47:57 -08:00
Slobodan Predolac
3f1aa90426 [EASY] Extract hpa_central component from hpa source file 2025-12-02 13:20:18 -08:00
Slobodan Predolac
8384b00ebe [EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils 2025-12-02 13:20:18 -08:00
Slobodan Predolac
bfb63eaf41 Add experimental_enforce_hugify 2025-11-21 11:30:16 -08:00
Shirui Cheng
f5f0f063c1 move fill/flush pointer array out of tcache.c 2025-11-02 13:46:00 -08:00
Slobodan Predolac
6ced85a8e5 When extracting from central, hugify_eager is different from start_as_huge 2025-10-15 20:27:35 -04:00
guangli-dai
0f8a8d7ebe Refactor init_system_thp_mode and print it in malloc stats. 2025-10-13 16:56:12 -07:00
Slobodan Predolac
832a332c33 Do not release the hpa_shard->mtx when inserting newly retrieved page from central before allocating from it 2025-10-09 15:39:07 -04:00
Slobodan Predolac
f8ef3195e5 [HPA] Add ability to start page as huge and more flexibility for purging 2025-10-06 10:59:43 -04:00
Slobodan Predolac
db943a0d9e Running clang-format on two files 2025-10-06 10:59:43 -04:00
Slobodan Predolac
2bbcb10d10 Revert "Do not dehugify when purging"
This reverts commit 16c5abd1cd.
2025-10-06 10:59:43 -04:00
Slobodan Predolac
b7bedaf7a0 [sdt] Add some tracepoints to sec and hpa modules 2025-09-16 16:27:04 -07:00
Carl Shapiro
218c989047 Always use pthread_equal to compare thread IDs
This change replaces direct comparisons of Pthread thread IDs with
calls to pthread_equal.  Directly comparing thread IDs is neither
portable nor reliable since a thread ID is defined as an opaque type
that can be implemented using a structure.
2025-08-28 14:45:51 -07:00
Slobodan Predolac
d4cde60066 Remove pidfd_open call handling and rely on PIDFD_SELF 2025-08-27 12:44:45 -04:00
Slobodan Predolac
88b29da00a [EASY][BUGFIX] Spelling and format 2025-08-23 21:47:21 -07:00
lexprfuncall
f890bbed4a Define malloc_{write,read}_fd as non-inline global functions
The static inline definition made more sense when these functions just
dispatched to a syscall wrapper.  Since they acquired a retry loop, a
non-inline definition makes more sense.
2025-08-23 11:42:23 -04:00
lexprfuncall
5f06b14d1c Remove an orphaned comment
This was left behind when the definitions of malloc_open and
malloc_close were extracted from the code that followed.
2025-08-23 11:42:23 -04:00
Shirui Cheng
e2da7477f8 Revert PR #2608: Manually revert commits 70c94d..f9c0b5 2025-08-22 21:55:24 -07:00
Slobodan Predolac
4f126b47b3 Save and restore errno when calling process_madvise 2025-08-21 16:22:06 -07:00
lexprfuncall
c45b6223e5 Use relaxed atomics to access the process madvise pid fd
Relaxed atomics already guarantee a single total modification order for
accesses to one memory location, so stronger ordering buys nothing for
single-location data.
2025-08-13 18:33:27 -07:00
lexprfuncall
16c5abd1cd Do not dehugify when purging
Giving the advice MADV_DONTNEED to a range of virtual memory backed by
a transparent huge page already causes that range of virtual memory to
become backed by regular pages.
2025-08-13 18:31:50 -07:00
lexprfuncall
0d071a086b Fix several spelling errors in comments 2025-08-08 14:12:12 -07:00
Slobodan Predolac
97d25919c3 [process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork 2025-07-29 15:47:53 -07:00
Slobodan Predolac
fb52eac372 Add several USDT probes for hpa 2025-06-26 13:16:04 -07:00
guangli-dai
34f359e0ca Reformat the codebase with clang-format 18. 2025-06-20 14:35:15 -07:00
Shirui Cheng
ae4e729d15 Update the default value for opt_experimental_tcache_gc and opt_calloc_madvise_threshold 2025-06-17 13:25:20 -07:00
Slobodan Predolac
390e70c840 [thread_event] Add support for user events in thread events when stats are enabled 2025-06-11 15:37:03 -07:00
Slobodan Predolac
b2a35a905f [thread_event] Remove macros from thread_event and replace with dynamic event objects 2025-06-11 15:37:03 -07:00
Jason Evans
27d7960cf9 Revert "Extend purging algorithm with peak demand tracking"
This reverts commit ad108d50f1.
2025-06-02 10:44:37 -07:00
Xin Yang
5e460bfea2 Refactor: use the cache_bin_sz_t typedef instead of direct uint16_t
Any future changes to the underlying data type for bin sizes
(such as upgrading from `uint16_t` to `uint32_t`) can then be achieved
by modifying only the `cache_bin_sz_t` definition.

Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>
2025-05-22 10:43:33 -07:00
Jiebin Sun
3c14707b01 To improve reuse efficiency, the maximum coalesced size for large extents
in the dirty ecache has been limited. This patch was tested with real
workloads using ClickHouse (Clickbench Q35) on a system with 2x240 vCPUs.
The results showed a 2X improvement in queries per second (QPS) and a
reduction in page faults to 29% of the previous rate. Additionally, a
microbenchmark of 256 memory reallocations resizing from 4KB to 16KB in
one arena demonstrated a 5X performance improvement.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2025-05-12 15:45:36 -07:00
guangli-dai
37bf846cc3 Fixes to prevent static analysis warnings. 2025-05-06 14:47:35 -07:00
guangli-dai
8347f1045a Renaming limit_usize_gap to disable_large_size_classes 2025-05-06 14:47:35 -07:00
Guangli Dai
01e9ecbeb2 Remove build-time configuration 'config_limit_usize_gap' 2025-05-06 14:47:35 -07:00
Slobodan Predolac
852da1be15 Add experimental option force using SYS_process_madvise 2025-04-28 18:45:30 -07:00
Slobodan Predolac
1956a54a43 [process_madvise] Use process_madvise across multiple huge_pages 2025-04-25 19:19:03 -07:00
Slobodan Predolac
0dfb4a5a1a Add output argument to hpa_purge_begin to count dirty ranges 2025-04-25 19:19:03 -07:00
Slobodan Predolac
cfa90dfd80 Refactor hpa purging to prepare for vectorized call across multiple pages 2025-04-25 19:19:03 -07:00
Qi Wang
a3910b9802 Avoid forced purging during thread-arena migration when bg thd is on. 2025-04-25 19:18:20 -07:00
guangli-dai
c23a6bfdf6 Add opt.limit_usize_gap to stats 2025-04-16 10:38:10 -07:00
Slobodan Predolac
f19f49ef3e if process_madvise is supported, call it when purging hpa 2025-04-04 13:57:42 -07:00
Shirui Cheng
3688dfb5c3 fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test 2025-03-20 12:45:23 -07:00
Shirui Cheng
e1a77ec558 Support THP with Huge Arena in PAC 2025-03-17 16:06:43 -07:00
Audrey Dutcher
86bbabac32 background_thread: add fallback for pthread_create dlsym
If jemalloc is linked into a shared library, the RTLD_NEXT dlsym call
may fail since RTLD_NEXT is only specified to search all objects after
the current one in the loading order, and the pthread library may be
earlier in the load order. Instead of failing immediately, attempt one
more time to find pthread_create via RTLD_GLOBAL.

Errors cascading from this were observed on FreeBSD 14.1.
2025-03-17 09:41:04 -07:00
Guangli Dai
773b5809f9 Fix frame pointer based unwinder to handle changing stack range 2025-03-13 17:15:42 -07:00
Dmitry Ilvokhin
ad108d50f1 Extend purging algorithm with peak demand tracking
Implementation inspired by idea described in "Beyond malloc efficiency
to fleet efficiency: a hugepage-aware memory allocator" paper [1].

Primary idea is to track maximum number (peak) of active pages in use
with sliding window and then use this number to decide how many dirty
pages we would like to keep.

We are trying to estimate the maximum amount of active memory we'll need
in the near future. We do so by projecting future active memory demand
(based on the peak active memory usage observed in the past within the
sliding window) and adding slack on top of it (some overhead is
reasonable to accept in exchange for higher hugepage coverage). When
peak demand tracking is off, the projection of future active memory is
simply the active memory we have right now.

The estimate is essentially `nactive_max * (1 + dirty_mult)`.

The peak demand purging algorithm is controlled by two config options.
Option `hpa_peak_demand_window_ms` controls the duration of the sliding
window over which we track maximum active memory usage, and option
`hpa_dirty_mult` controls the amount of slack we are allowed to have, as
a percentage of maximum active memory usage. By default
`hpa_peak_demand_window_ms == 0`, so we keep the same behaviour
(ratio-based purging) that we had before this commit.

[1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf
2025-03-13 10:12:22 -07:00