guangli-dai c067a55c79 Introducing a new usize calculation policy
jemalloc has converted size to usize by ceiling the size to the closest
size class. However, this wastes a lot of memory when HPA is enabled.
This commit changes how usize is calculated so that the gap between two
contiguous usizes is no larger than a page. Specifically, this commit
includes the following changes:

1. Add a build-time config option (--enable-limit-usize-gap) and a
runtime one (limit_usize_gap) to guard the changes.
When the build-time config is enabled, some minor CPU overhead is
expected because usize is stored and accessed separately from the
index. When the runtime option is also enabled (it can only be enabled
when the build-time config is enabled), a new usize calculation approach
is employed: size is ceiled to the closest multiple of PAGE for all
sizes larger than USIZE_GROW_SLOW_THRESHOLD instead of to a size class.
Note that when the build-time config is enabled, the runtime option is
on by default.
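
The difference between the two rounding policies can be sketched as a
simplified model in C. This is not jemalloc's sz_s2u_compute: the 4 KiB
page, the 16-byte quantum, the four-classes-per-doubling layout (tiny
classes ignored), and the value used in place of
USIZE_GROW_SLOW_THRESHOLD (assumed here to be 4 * PAGE) are all
assumptions, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants for this sketch (not read from jemalloc's headers). */
#define SKETCH_LG_PAGE 12
#define SKETCH_PAGE ((size_t)1 << SKETCH_LG_PAGE) /* 4 KiB */
#define SKETCH_LG_NGROUP 2                         /* 4 classes per doubling */
#define SKETCH_QUANTUM ((size_t)16)
/* Hypothetical stand-in for USIZE_GROW_SLOW_THRESHOLD. */
#define SKETCH_GROW_SLOW_THRESHOLD (((size_t)1 << SKETCH_LG_NGROUP) * SKETCH_PAGE)

/* Round x up to a multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t x, size_t align) {
    assert((align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* floor(log2(x)) for x > 0. */
static unsigned sketch_lg_floor(size_t x) {
    unsigned lg = 0;
    assert(x > 0);
    while (x >>= 1) {
        lg++;
    }
    return lg;
}

/*
 * Simplified size-class ceiling: quantum spacing up to 4 * QUANTUM, then
 * 2^LG_NGROUP evenly spaced classes in each power-of-two interval.
 */
static size_t sketch_size_class_ceil(size_t size) {
    assert(size > 0);
    if (size <= (SKETCH_QUANTUM << SKETCH_LG_NGROUP)) {
        return sketch_align_up(size, SKETCH_QUANTUM);
    }
    size_t delta = (size_t)1 << (sketch_lg_floor(size - 1) - SKETCH_LG_NGROUP);
    return sketch_align_up(size, delta);
}

/* Limited-gap policy: page granularity above the threshold, size classes below. */
static size_t sketch_s2u_limited_gap(size_t size) {
    if (size > SKETCH_GROW_SLOW_THRESHOLD) {
        return sketch_align_up(size, SKETCH_PAGE);
    }
    return sketch_size_class_ceil(size);
}
```

In this model a 40961-byte request rounds to 49152 bytes (48 KiB) under
size classes but to 45056 bytes (44 KiB) under the page-granular policy,
cutting the rounding waste from just under 8 KiB to just under 4 KiB.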

2. Prepare tcache for sizes that grow by PAGE above NGROUP * PAGE.
To prepare for the upcoming change where size classes grow by PAGE when
larger than NGROUP * PAGE, disable the tcache for sizes larger than
2 * NGROUP * PAGE. The tcache threshold is set higher than
NGROUP * PAGE to minimize performance regression: the existing usizes
between NGROUP * PAGE and 2 * NGROUP * PAGE already grow by PAGE.
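
The tcache cutoff and the reason it can sit at 2 * NGROUP * PAGE can be
sketched as below, assuming a 4 KiB page and NGROUP = 4. The sketch_*
names are hypothetical, and the interaction with the tunable
opt.tcache_max setting is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for this sketch; jemalloc derives these from its build config. */
#define SKETCH_PAGE ((size_t)4096)
#define SKETCH_NGROUP ((size_t)4)
/* Hypothetical cap: the tcache serves usizes up to 2 * NGROUP * PAGE. */
#define SKETCH_TCACHE_MAX (2 * SKETCH_NGROUP * SKETCH_PAGE)

/* Returns true if a request of this usize may be served from the thread cache. */
static bool sketch_tcache_eligible(size_t usize) {
    assert(usize > 0);
    return usize <= SKETCH_TCACHE_MAX;
}

/*
 * Spacing between consecutive size classes in (base, 2 * base] with NGROUP
 * classes per doubling. For base = NGROUP * PAGE the spacing is exactly PAGE,
 * so every class up to the tcache cap is already a multiple of PAGE.
 */
static size_t sketch_class_spacing(size_t base) {
    assert(base % SKETCH_NGROUP == 0);
    return base / SKETCH_NGROUP;
}
```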

3. Prepare PAC and the HPA psset for sizes that grow by PAGE above
NGROUP * PAGE.
For PAC, to avoid having too many bins, arena bins keep the same layout.
This means extra searching is needed for a page-level request that is
not aligned with an original size class: the heap before the current
index must also be searched, since it may hold allocations that satisfy
the request. The same change applies to HPA's psset.
This search relies on enumerating the heap because not all allocs in
the previous heap are guaranteed to satisfy the request. To balance
memory and CPU overhead, we currently enumerate at most a fixed number
of nodes before concluding that none can satisfy the request.
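
The bounded search of the previous heap can be sketched as follows. This
is a simplified model under stated assumptions: plain arrays stand in for
jemalloc's heaps, the search is first-fit in array order rather than in
jemalloc's heap order, and the enumeration cap (SKETCH_MAX_ENUM) and all
sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on how many nodes of the previous heap are examined. */
#define SKETCH_MAX_ENUM 4

/* Stand-in for one heap: free extent sizes, each in [lo, hi). */
typedef struct {
    size_t lo;
    size_t hi;
    const size_t *sizes;
    size_t n;
} sketch_heap_t;

/*
 * Find a free extent of at least `request` bytes. `ceil_ind` is the first
 * heap with lo >= request, so any extent there fits. The heap just below it
 * may also hold a fit because `request` can fall strictly inside its range,
 * so it is enumerated first, but only up to SKETCH_MAX_ENUM nodes.
 * Returns the size of the extent found, or 0 if none.
 */
static size_t sketch_find_fit(const sketch_heap_t *heaps, size_t nheaps,
    size_t ceil_ind, size_t request) {
    if (ceil_ind > 0) {
        const sketch_heap_t *prev = &heaps[ceil_ind - 1];
        size_t limit = prev->n < SKETCH_MAX_ENUM ? prev->n : SKETCH_MAX_ENUM;
        for (size_t i = 0; i < limit; i++) {
            if (prev->sizes[i] >= request) {
                return prev->sizes[i];
            }
        }
    }
    for (size_t h = ceil_ind; h < nheaps; h++) {
        assert(heaps[h].lo >= request);
        if (heaps[h].n > 0) {
            return heaps[h].sizes[0];
        }
    }
    return 0;
}
```

If a fitting node sits deeper in the previous heap than the cap allows,
the request falls through to the next heap, trading a little memory for a
bounded CPU cost.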

4. Add bytes counters to arena large stats.
With the upcoming usize changes, stats computed by multiplying the
number of live allocations by the bin size are no longer accurate.
Thus, add separate counters to record the bytes malloced and dalloced.
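
Why a count-times-class-size estimate drifts, and how byte counters fix
it, can be sketched with a hypothetical per-class stats struct. The
struct and function names are illustrative only and are not jemalloc's
actual large-extent stats layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-size-class large stats; not jemalloc's real struct. */
typedef struct {
    uint64_t nmalloc;
    uint64_t ndalloc;
    uint64_t malloc_bytes; /* sum of usizes allocated */
    uint64_t dalloc_bytes; /* sum of usizes freed */
} sketch_large_stats_t;

static void sketch_record_malloc(sketch_large_stats_t *s, size_t usize) {
    s->nmalloc++;
    s->malloc_bytes += usize;
}

static void sketch_record_dalloc(sketch_large_stats_t *s, size_t usize) {
    assert(s->ndalloc < s->nmalloc);
    s->ndalloc++;
    s->dalloc_bytes += usize;
}

/* Count-based estimate: live allocations times the class size. */
static uint64_t sketch_live_bytes_estimate(const sketch_large_stats_t *s,
    size_t class_size) {
    return (s->nmalloc - s->ndalloc) * (uint64_t)class_size;
}

/* Exact live bytes from the separate byte counters. */
static uint64_t sketch_live_bytes_exact(const sketch_large_stats_t *s) {
    return s->malloc_bytes - s->dalloc_bytes;
}
```

For example, a 45056-byte and a 49152-byte allocation that share the
48 KiB class are estimated at 98304 live bytes by the count-based
formula, while the byte counters report the true 94208.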

5. Change the structs used when freeing to avoid calling index2size for
large sizes.
  - Change the definition of emap_alloc_ctx_t.
  - Change how both are read from edata_t.
  - Change the assignment and usage of emap_alloc_ctx_t.
  - Change other callsites of index2size.
Note that the changes to the data structure (emap_alloc_ctx_t) take
effect whenever the build-time config (--enable-limit-usize-gap) is
enabled, but the stored value equals index2size(szind) if the runtime
option (opt_limit_usize_gap) is not enabled.
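
The idea of carrying usize next to szind, so that the free path no longer
derives the size from the index, can be sketched with a hypothetical
context struct. The struct layout, the lookup table, and all sketch_*
names are assumptions for illustration, not the real emap_alloc_ctx_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allocation context carrying usize alongside szind. */
typedef struct {
    unsigned szind;
    size_t usize;
} sketch_alloc_ctx_t;

/* Hypothetical table mapping a size-class index to its class size. */
static const size_t sketch_index2size_tab[] = {40960, 49152, 57344, 65536};

/*
 * Build the context at allocation time. With the limited-gap policy off, the
 * stored usize is exactly index2size(szind) and `usize` is ignored; with it
 * on, the page-rounded usize is kept.
 */
static sketch_alloc_ctx_t sketch_ctx_make(unsigned szind, size_t usize,
    bool limit_usize_gap) {
    sketch_alloc_ctx_t ctx;
    assert(szind < sizeof(sketch_index2size_tab) / sizeof(sketch_index2size_tab[0]));
    ctx.szind = szind;
    ctx.usize = limit_usize_gap ? usize : sketch_index2size_tab[szind];
    assert(ctx.usize <= sketch_index2size_tab[szind]);
    return ctx;
}

/* On free, the size comes from the context rather than from index2size(). */
static size_t sketch_ctx_usize(const sketch_alloc_ctx_t *ctx) {
    return ctx->usize;
}
```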

6. Adapt HPA to the usize changes.
Change the settings in sec to limit its usage for sizes larger than
USIZE_GROW_SLOW_THRESHOLD, and modify the corresponding tests.
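
One plausible way to limit sec usage is to clamp the largest size it will
cache to the threshold; the sketch below shows that reading only. The
clamp, the assumed threshold of 4 * PAGE, and the sketch_* names are
hypothetical and may not match how the commit adjusts the sec settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real threshold and sec options come from jemalloc. */
#define SKETCH_PAGE ((size_t)4096)
#define SKETCH_GROW_SLOW_THRESHOLD ((size_t)4 * SKETCH_PAGE)

/* Clamp a configured cache maximum so it never exceeds the threshold. */
static size_t sketch_sec_max_alloc(size_t configured) {
    return configured < SKETCH_GROW_SLOW_THRESHOLD ? configured
                                                   : SKETCH_GROW_SLOW_THRESHOLD;
}

/* Whether a page-multiple extent of this size would be cached. */
static bool sketch_sec_may_cache(size_t size, size_t configured) {
    assert(size % SKETCH_PAGE == 0);
    return size <= sketch_sec_max_alloc(configured);
}
```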

7. Modify the usize calculation and the corresponding tests.
Change sz_s2u_compute. Note that sz_index2size is no longer always safe
to use, while sz_size2index still works as expected.
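
Why index2size stops being safe while size2index keeps working can be
sketched with a hypothetical class table for one power-of-two interval
(4 KiB pages, four classes per doubling). The table and sketch_* names
are assumptions; the point is that mapping a page-rounded usize to its
class and back returns the class size, which can exceed the real usize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class sizes for one power-of-two interval. */
static const size_t sketch_class_size[] = {32768, 40960, 49152, 57344, 65536};
#define SKETCH_NCLASSES (sizeof(sketch_class_size) / sizeof(sketch_class_size[0]))

/* size2index: index of the smallest class that can hold `size` (still well defined). */
static size_t sketch_size2index(size_t size) {
    for (size_t i = 0; i < SKETCH_NCLASSES; i++) {
        if (size <= sketch_class_size[i]) {
            return i;
        }
    }
    assert(0 && "size out of modeled range");
    return SKETCH_NCLASSES;
}

/* index2size: the class size, only an upper bound on a page-rounded usize. */
static size_t sketch_index2size(size_t ind) {
    assert(ind < SKETCH_NCLASSES);
    return sketch_class_size[ind];
}
```

Here a usize of 45056 maps to index 2, but index2size(2) yields 49152,
so code that recovers the size from the index would overstate it.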
2025-03-06 15:08:13 -08:00
.github/workflows Update acitons/checkout and actions/upload-artifact to v4 2024-03-12 12:59:15 -07:00
bin Updated jeprof with more symbols to filter. 2024-10-14 10:31:58 -07:00
build-aux Remove trailing whitespace 2023-06-23 11:58:18 -07:00
doc Update doc to reflect muzzy decay is disabled by default. 2024-10-10 16:41:23 -07:00
doc_internal update PROFILING_INTERNALS.md 2022-10-03 10:48:29 -07:00
include Introducing a new usize calculation policy 2025-03-06 15:08:13 -08:00
m4 Support C++17 over-aligned allocation 2019-11-22 10:14:16 -08:00
msvc Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
scripts Enable large hugepage tests for arm64 on Travis 2024-12-17 12:35:35 -08:00
src Introducing a new usize calculation policy 2025-03-06 15:08:13 -08:00
test Introducing a new usize calculation policy 2025-03-06 15:08:13 -08:00
.appveyor.yml Appveyor: fix 404 errors. 2020-10-27 15:28:20 -07:00
.autom4te.cfg Disable autom4te cache. 2014-09-02 17:49:29 -07:00
.cirrus.yml Remove unsupported Cirrus CI config 2025-03-03 16:29:04 -08:00
.clang-format Add a .clang-format file. 2020-10-02 14:49:56 -07:00
.gitattributes fix git handling of newlines on windows 2014-05-07 18:48:39 -04:00
.gitignore gitignore: Start ignoring clangd dirs. 2024-01-23 17:02:01 -08:00
.travis.yml Enable large hugepage tests for arm64 on Travis 2024-12-17 12:35:35 -08:00
autogen.sh build: Make autogen.sh accept quoted extra options 2024-01-03 14:20:34 -08:00
ChangeLog Update ChangeLog for 5.3.0. 2022-05-06 11:24:21 -07:00
config.stamp.in Move repo contents in jemalloc/ to top level. 2011-03-31 20:36:17 -07:00
configure.ac Introducing a new usize calculation policy 2025-03-06 15:08:13 -08:00
COPYING Update copyright dates. 2019-01-25 13:25:20 -08:00
INSTALL.md Update the configure cache file example in INSTALL.md 2024-10-10 16:41:48 -07:00
jemalloc.pc.in Expose jemalloc_prefix via pkg-config 2023-09-05 14:30:21 -07:00
Makefile.in Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
README switch to https 2023-03-09 11:44:02 -08:00
run_tests.sh Introduce scripts to run all possible tests 2017-01-30 17:51:57 -08:00
TUNING.md switch to https 2023-03-09 11:44:02 -08:00

jemalloc is a general purpose malloc(3) implementation that emphasizes
fragmentation avoidance and scalable concurrency support.  jemalloc first came
into use as the FreeBSD libc allocator in 2005, and since then it has found its
way into numerous applications that rely on its predictable behavior.  In 2010
jemalloc development efforts broadened to include developer support features
such as heap profiling and extensive monitoring/tuning hooks.  Modern jemalloc
releases continue to be integrated back into FreeBSD, and therefore versatility
remains critical.  Ongoing development efforts trend toward making jemalloc
among the best allocators for a broad range of demanding applications, and
eliminating/mitigating weaknesses that have practical repercussions for
real-world applications.

The COPYING file contains copyright and licensing information.

The INSTALL file contains information on how to configure, build, and install
jemalloc.

The ChangeLog file contains a brief summary of changes for each release.

URL: https://jemalloc.net/