Commit graph

47 commits

Author SHA1 Message Date
guangli-dai
6200e8987f Reformat the codebase with the clang-format 18. 2026-03-10 18:14:33 -07:00
guangli-dai
8347f1045a Renaming limit_usize_gap to disable_large_size_classes 2025-05-06 14:47:35 -07:00
guangli-dai
c067a55c79 Introducing a new usize calculation policy
Converting size to usize is what jemalloc has been done by ceiling
size to the closest size class. However, this causes lots of memory
wastes with HPA enabled.  This commit changes how usize is calculated so
that the gap between two contiguous usize is no larger than a page.
Specifically, this commit includes the following changes:

1. Adding a build-time config option (--enable-limit-usize-gap) and a
runtime one (limit_usize_gap) to guard the changes.
When build-time
config is enabled, some minor CPU overhead is expected because usize
will be stored and accessed apart from index.  When runtime option is
also enabled (it can only be enabled with the build-time config
enabled). a new usize calculation approach wil be employed.  This new
calculation will ceil size to the closest multiple of PAGE for all sizes
larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes.
Note when the build-time config is enabled, the runtime option is
default on.

2. Prepare tcache for size to grow by PAGE over GROUP*PAGE.
To prepare for the upcoming changes where size class grows by PAGE when
larger than NGROUP * PAGE, disable the tcache when it is larger than 2 *
NGROUP * PAGE. The threshold for tcache is set higher to prevent perf
regression as much as possible while usizes between NGROUP * PAGE and 2 *
NGROUP * PAGE happen to grow by PAGE.

3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE
For PAC, to avoid having too many bins, arena bins still have the same
layout.  This means some extra search is needed for a page-level request that
is not aligned with the orginal size class: it should also search the heap
before the current index since the previous heap might also be able to
have some allocations satisfying it.  The same changes apply to HPA's
psset.
This search relies on the enumeration of the heap because not all allocs in
the previous heap are guaranteed to satisfy the request.  To balance the
memory and CPU overhead, we currently enumerate at most a fixed number
of nodes before concluding none can satisfy the request during an
enumeration.

4. Add bytes counter to arena large stats.
To prepare for the upcoming usize changes, stats collected by
multiplying alive allocations and the bin size is no longer accurate.
Thus, add separate counters to record the bytes malloced and dalloced.

5. Change structs use when freeing to avoid using index2size for large sizes.
  - Change the definition of emap_alloc_ctx_t
  - Change the read of both from edata_t.
  - Change the assignment and usage of emap_alloc_ctx_t.
  - Change other callsites of index2size.
Note for the changes in the data structure, i.e., emap_alloc_ctx_t,
will be used when the build-time config (--enable-limit-usize-gap) is
enabled but they will store the same value as index2size(szind) if the
runtime option (opt_limit_usize_gap) is not enabled.

6. Adapt hpa to the usize changes.
Change the settings in sec to limit is usage for sizes larger than
USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests.

7. Modify usize calculation and corresponding tests.
Change the sz_s2u_compute. Note sz_index2size is not always safe now
while sz_size2index still works as expected.
2025-03-06 15:08:13 -08:00
guangli-dai
eda05b3994 Fix static analysis warnings. 2024-01-03 14:18:52 -08:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
2023-07-24 14:40:42 -07:00
Qi Wang
602edd7566 Enabled -Wstrict-prototypes and fixed warnings. 2023-07-06 12:00:02 -07:00
Qi Wang
ce0b7ab6c8 Inline the storage for thread name in prof_tdata_t.
The previous approach managed the thread name in a separate buffer, which causes
races because the thread name update (triggered by new samples) can happen at
the same time as prof dumping (which reads the thread names) -- these two
operations are under separate locks to avoid blocking each other.  Implemented
the thread name storage as part of the tdata struct, which resolves the lifetime
issue and also avoids internal alloc / dalloc during prof_sample.
2023-04-05 10:03:12 -07:00
Qi Wang
5fd55837bb Fix thread_name updating for heap profiling.
The current thread name reading path updates the name every time, which requires
both alloc and dalloc -- and the temporary NULL value in the middle causes races
where the prof dump read path gets NULLed in the middle.

Minimize the changes in this commit to isolate the bugfix testing; will also
refactor the whole thread name paths later.
2023-02-15 17:49:40 -08:00
Guangli Dai
a0734fd6ee Making jemalloc max stack depth a runtime option 2022-09-12 13:56:22 -07:00
yunxu
b798fabdf7 Add prof_leak_error option
The option makes the process to exit with error code 1 if a memory leak
is detected. This is useful for implementing automated tools that rely
on leak detection.
2022-01-21 16:24:20 -08:00
Qi Wang
d038160f3b Fix shadowed variable usage.
Verified with EXTRA_CFLAGS=-Wshadow.
2021-12-23 10:55:08 -08:00
Yinan Zhang
20f2479ed7 Do not create size class tables for non-prof builds 2020-08-24 20:10:02 -07:00
Yinan Zhang
8efcdc3f98 Move unbias data to prof_data 2020-08-24 20:10:02 -07:00
David Goldblatt
60993697d8 Prof: Add prof_unbias.
This gives more accurate attribution of bytes and counts to stack traces,
without introducing backwards incompatibilities in heap-profile parsing tools.
We track the ideal reported (to the end user) number of bytes more carefully
inside core jemalloc.  When dumping heap profiles, insteading of outputting our
counts directly, we output counts that will cause parsing tools to give a result
close to the value we want.

We retain the old version as an opt setting, to let users who are tracking
values on a per-component basis to keep their metrics stable until they decide
to switch.
2020-08-05 18:33:55 -07:00
Yinan Zhang
c2e7a06392 No need to intercept prof_dump_header() in tests 2020-06-29 14:27:50 -07:00
Yinan Zhang
f58ebdff7a Generalize prof_cnt_all() for testing 2020-06-29 14:27:50 -07:00
Yinan Zhang
d4259ea53b Simplify signatures for prof dump functions 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d823f3a91 Consolidate struct definitions for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
1f5fe3a3e3 Pass write callback explicitly in prof_data 2020-06-29 14:27:50 -07:00
Yinan Zhang
4556d3c0c8 Define structures for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
dad821bb22 Move unwind to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
d128efcb6a Relocate a few prof utilities to the right modules 2020-06-29 14:27:50 -07:00
Yinan Zhang
4736fb4fc9 Move file handling logic in prof_data to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
adfd9d7b1d Change tsdn to tsd for thread name allocation 2020-06-29 14:27:50 -07:00
Yinan Zhang
841af2b426 Move thread name handling to prof_data module 2020-06-29 14:27:50 -07:00
Yinan Zhang
c8683bee80 Unify printing for prof counts object 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d292b5660 Push error handling logic out of core dumping logic 2020-06-29 14:27:50 -07:00
Yinan Zhang
354183b10d Define prof dump buffer size centrally 2020-06-29 14:27:50 -07:00
Yinan Zhang
7455813e57 Make dump file writing replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
21e44c45d9 Make maps file opening replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
4bb4037dbe Extract utility function for opening maps file 2020-06-29 14:27:50 -07:00
Yinan Zhang
f307b25804 Only replace the dump file opening function in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
a795b19327 Remove beginning define in source files
```
sed -i "/^#define JEMALLOC_[A-Z_]*_C_$/d" src/*.c;
```
2020-06-19 12:15:44 -07:00
Yinan Zhang
b7858abfc0 Expose prof testing internal functions 2020-06-19 09:16:51 -07:00
Yinan Zhang
1e2524e15a Do not reset sample wait time when re-initing tdata 2020-05-12 09:16:16 -07:00
Yinan Zhang
84b28c6a13 Properly handle tdata deletion race 2020-01-21 16:51:26 -08:00
Yinan Zhang
d331208560 Get rid of redundant logic in prof 2020-01-21 16:51:26 -08:00
Yinan Zhang
b8df719d5c No tdata creation for backtracing on dying thread 2020-01-16 21:54:14 -08:00
Yinan Zhang
9a60cf54ec Last-N profiling mode 2019-12-30 15:58:57 -08:00
Yinan Zhang
3fa142cf39 Remove _externs from prof internal header names 2019-12-23 11:14:15 -08:00
Yinan Zhang
ea42174d07 Refactor profiling headers 2019-12-20 17:17:48 -08:00
Yinan Zhang
7d2bac5a38 Refactor destroy code path for prof_tctx 2019-12-10 16:31:05 -08:00
Yinan Zhang
7e3671911f Get rid of old indentation style for prof 2019-12-06 09:47:51 -08:00
Yinan Zhang
dfdd46f6c1 Refactor prof_tctx_t creation 2019-12-06 09:47:51 -08:00
Qi Wang
da50d8ce87 Refactor and optimize prof sampling initialization.
Makes the prof sample prng use the tsd prng_state.  This allows us to properly
initialize the sample interval event, without having to create tdata.  As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
2019-11-11 10:35:37 -08:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00