Compare commits


1682 commits
5.0.0 ... dev

Author SHA1 Message Date
Guangli Dai
81034ce1f1
Update ChangeLog for release 5.3.1 2026-04-13 17:12:37 -07:00
Ian Ker-Seymer
b8646f4db3 Fix opt.max_background_threads default in docs 2026-04-13 14:46:53 -07:00
Guangli Dai
6515df8cec
Documentation updates (#2869)
* Document new mallctl interfaces added since 5.3.0

Add documentation for the following new mallctl entries:
- opt.debug_double_free_max_scan: double-free detection scan limit
- opt.prof_bt_max: max profiling backtrace depth
- opt.disable_large_size_classes: page-aligned large allocations
- opt.process_madvise_max_batch: batched process_madvise purging
- thread.tcache.max: per-thread tcache_max control
- thread.tcache.ncached_max.read_sizeclass: query ncached_max
- thread.tcache.ncached_max.write: set ncached_max per size range
- arena.<i>.name: get/set arena names
- arenas.hugepage: hugepage size
- approximate_stats.active: lightweight active bytes estimate

Remove config.prof_frameptr since it is still experimental and needs
more development.

Co-authored-by: lexprfuncall <carl.shapiro@gmail.com>
2026-04-07 10:41:44 -07:00
Slobodan Predolac
f265645d02 Emit retained HPA slab stats in JSON 2026-04-01 23:15:19 -04:00
Slobodan Predolac
db7d99703d Add TODO to benchmark possibly better policy 2026-04-01 23:15:19 -04:00
Slobodan Predolac
6281482c39 Nest HPA SEC stats inside hpa_shard JSON 2026-04-01 23:15:19 -04:00
Slobodan Predolac
3cc56d325c Fix large alloc nrequests under-counting on cache misses 2026-04-01 23:15:19 -04:00
Slobodan Predolac
a47fa33b5a Run clang-format on test/unit/tcache_max.c 2026-04-01 23:15:19 -04:00
Slobodan Predolac
b507644cb0 Fix conf_handle_char_p zero-sized dest and remove unused conf_handle_unsigned 2026-04-01 23:15:19 -04:00
Slobodan Predolac
3ac9f96158 Run clang-format on test/unit/conf_parse.c 2026-04-01 23:15:19 -04:00
Slobodan Predolac
5904a42187 Fix memory leak of old curr_reg on san_bump_grow_locked failure
When san_bump_grow_locked fails, it sets sba->curr_reg to NULL.
The old curr_reg (saved in to_destroy) was never freed or restored,
leaking the virtual memory extent. Restore sba->curr_reg from
to_destroy on failure so the old region remains usable.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
2fceece256 Fix extra size argument in edata_init call in extent_alloc_dss
An extra 'size' argument was passed where 'slab' (false) should be,
shifting all subsequent arguments: slab got size (nonzero=true),
szind got false (0), and sn got SC_NSIZES instead of a proper serial
number from extent_sn_next(). Match the correct pattern used by the
gap edata_init call above.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
234404d324 Fix wrong loop variable for array index in sz_boot_pind2sz_tab
The sentinel fill loop used sz_pind2sz_tab[pind] (constant) instead
of sz_pind2sz_tab[i] (loop variable), writing only to the first
entry repeatedly and leaving subsequent entries uninitialized.
2026-04-01 23:15:19 -04:00
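For illustration, a minimal sketch of the loop-index pitfall described above, with hypothetical table and constant names (not jemalloc's actual definitions):

```
#include <stddef.h>

#define NPSIZES 8                         /* hypothetical table size */
static size_t pind2sz_tab[NPSIZES + 1];   /* hypothetical lookup table */

static void
fill_sentinels(size_t start, size_t sentinel) {
	for (size_t i = start; i < NPSIZES + 1; i++) {
		/* The buggy form wrote pind2sz_tab[start] on every
		 * iteration, leaving entries past the first uninitialized. */
		pind2sz_tab[i] = sentinel;
	}
}
```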
Slobodan Predolac
675ab079e7 Fix missing release of acquired neighbor edata in extent_try_coalesce_impl
When emap_try_acquire_edata_neighbor returned a non-NULL neighbor but
the size check failed, the neighbor was never released from
extent_state_merging, making it permanently invisible to future
allocation and coalescing operations.

Release the neighbor when it doesn't meet the size requirement,
matching the pattern used in extent_recycle_extract.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
3f6e63e86a Fix wrong type for malloc_read_fd return value in prof_stack_range
Used size_t (unsigned) instead of ssize_t for the return value of
malloc_read_fd, which returns -1 on error. With size_t, -1 becomes
a huge positive value, bypassing the error check and corrupting the
remaining byte count.
2026-04-01 23:15:19 -04:00
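A small sketch of the signedness pitfall this fixes; the helper name is illustrative, not jemalloc's:

```
#include <unistd.h>

/* read(2)-style calls return ssize_t and signal errors with -1.  Storing
 * the result in size_t turns -1 into SIZE_MAX, so the error branch below
 * would never be taken and the remaining-byte accounting would corrupt. */
static int
drain_fd(int fd, char *buf, size_t remaining) {
	while (remaining > 0) {
		ssize_t nread = read(fd, buf, remaining); /* not size_t */
		if (nread <= 0) {
			return -1; /* error or EOF */
		}
		buf += nread;
		remaining -= (size_t)nread;
	}
	return 0;
}
```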
Slobodan Predolac
dd30c91eaa Fix wrong fallback value in os_page_detect when sysconf fails
Returned LG_PAGE (log2 of page size, e.g. 12) instead of PAGE (actual
page size, e.g. 4096) when sysconf(_SC_PAGESIZE) failed. This would
cause os_page to be set to an absurdly small value, breaking all
page-aligned operations.
2026-04-01 23:15:19 -04:00
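A hedged sketch of the fallback fix, with illustrative constants (the actual values are platform-dependent):

```
#include <stddef.h>
#include <unistd.h>

#define LG_PAGE 12                       /* log2 of the assumed page size */
#define PAGE    ((size_t)1 << LG_PAGE)   /* 4096 */

static size_t
page_size_detect(void) {
	long result = sysconf(_SC_PAGESIZE);
	if (result == -1) {
		return PAGE;   /* the buggy fallback returned LG_PAGE (12) */
	}
	return (size_t)result;
}
```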
Slobodan Predolac
3a8bee81f1 Fix pac_mapped stats inflation on allocation failure
newly_mapped_size was set unconditionally in the ecache_alloc_grow
fallback path, even when the allocation returned NULL. This inflated
pac_mapped stats without a corresponding deallocation to correct them.

Guard the assignment with an edata != NULL check, matching the pattern
used in the batched allocation path above it.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
c2d57040f0 Fix out-of-bounds write in malloc_vsnprintf when size is 0
When called with size==0, the else branch wrote to str[size-1] which
is str[(size_t)-1], a massive out-of-bounds write. Standard vsnprintf
allows size==0 to mean "compute length only, write nothing".

Add unit test for the size==0 case.
2026-04-01 23:15:19 -04:00
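A sketch of the size == 0 contract the fix restores, using standard vsnprintf as a stand-in for jemalloc's internal formatter:

```
#include <stdarg.h>
#include <stdio.h>

/* With size == 0, compute the would-be length and write nothing.  The
 * buggy path terminated at str[size - 1] unconditionally, which for
 * size == 0 is str[(size_t)-1], a far out-of-bounds write. */
static size_t
format_buf(char *str, size_t size, const char *fmt, ...) {
	va_list ap;
	va_start(ap, fmt);
	int len = vsnprintf(str, size, fmt, ap); /* size == 0 is safe here */
	va_end(ap);
	if (size > 0) {
		str[(size_t)len < size ? (size_t)len : size - 1] = '\0';
	}
	return (size_t)len;
}
```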
Slobodan Predolac
eab2b29736 Fix off-by-one in stats_arenas_i_bins_j and stats_arenas_i_lextents_j bounds checks
Same pattern as arenas_bin_i_index: used > instead of >=, allowing
access one past the end of the bstats[] and lstats[] arrays.

Add unit tests that verify boundary indices return ENOENT.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
a0f2bdf91d Fix missing negation in large_ralloc_no_move usize_min fallback
The second expansion attempt in large_ralloc_no_move omitted the !
before large_ralloc_no_move_expand(), inverting the return value.
On expansion failure, the function falsely reported success, making
callers believe the allocation was expanded in-place when it was not.
On expansion success, the function falsely reported failure, causing
callers to unnecessarily allocate, copy, and free.

Add unit test that verifies the return value matches actual size change.
2026-04-01 23:15:19 -04:00
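A minimal sketch of the inverted-return bug, with hypothetical helpers that mirror the convention described above (the expand helper returns true on failure):

```
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: returns true on FAILURE to expand in place. */
static bool
expand_in_place(void *ptr, size_t usize) {
	(void)ptr; (void)usize;
	return true;
}

/* Returns true on failure (caller must alloc/copy/free), false on success. */
static bool
ralloc_no_move(void *ptr, size_t usize_min) {
	if (!expand_in_place(ptr, usize_min)) { /* buggy code omitted the '!' */
		return false;   /* expanded in place: success */
	}
	return true;            /* could not expand: failure */
}
```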
Slobodan Predolac
87f9938de5 Fix duplicate "nactive_huge" JSON key in HPA shard stats output
In both the full_slabs and empty_slabs JSON sections of HPA shard
stats, "nactive_huge" was emitted twice instead of emitting
"ndirty_huge" as the second entry. This caused ndirty_huge to be
missing from the JSON output entirely.

Add a unit test that verifies both sections contain "ndirty_huge".
2026-04-01 23:15:19 -04:00
Slobodan Predolac
513778bcb1 Fix off-by-one in arenas_bin_i_index and arenas_lextent_i_index bounds checks
The index validation used > instead of >=, allowing access at index
SC_NBINS (for bins) and SC_NSIZES-SC_NBINS (for lextents), which are
one past the valid range. This caused out-of-bounds reads in bin_infos[]
and sz_index2size_unsafe().

Add unit tests that verify the boundary indices return ENOENT.
2026-04-01 23:15:19 -04:00
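A sketch of the bounds-check fix; the array name and size constant are stand-ins for the jemalloc internals named above:

```
#include <errno.h>
#include <stddef.h>

#define NBINS 36                 /* stand-in for SC_NBINS */
static int bin_infos[NBINS];     /* stand-in table */

static int
bin_i_lookup(size_t i, int *out) {
	if (i >= NBINS) {        /* buggy check used: i > NBINS */
		return ENOENT;   /* i == NBINS is one past the end */
	}
	*out = bin_infos[i];
	return 0;
}
```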
Slobodan Predolac
176ea0a801 Remove experimental.thread.activity_callback 2026-04-01 16:23:41 -07:00
Slobodan Predolac
19bbefe136 Remove dead code: extent_commit_wrapper, large_salloc, tcache_gc_dalloc event waits
These functions had zero callers anywhere in the codebase:
- extent_commit_wrapper: wrapper never called, _impl used directly
- large_salloc: trivial wrapper never called
- tcache_gc_dalloc_new_event_wait: no header declaration, no callers
- tcache_gc_dalloc_postponed_event_wait: no header declaration, no callers
2026-04-01 17:48:19 -04:00
Weixie Cui
a87c518bab Fix typo in prof_log_rep_check: use != instead of || for alloc_count
The condition incorrectly used 'alloc_count || 0' which was likely a typo
for 'alloc_count != 0'. While both evaluate similarly for the zero/non-zero
case, the fix ensures consistency with bt_count and thr_count checks and
uses the correct comparison operator.
2026-03-26 10:42:29 -07:00
Slobodan Predolac
d758349ca4 Fix psset_pick_purge when last candidate with index 0 dirtiness is ineligible
psset_pick_purge used max_bit-- after rejecting a time-ineligible
candidate, which caused unnecessary re-scanning of the same bitmap
(and an assertion failure in debug mode) and a size_t underflow
when the lowest-index entry was rejected.  Use max_bit = ind - 1
to skip directly past the rejected index.
2026-03-26 10:39:37 -07:00
Tony Printezis
1d018d8fda improve hpdata_assert_consistent()
A few ways this consistency check can be improved:
* Print which conditions fail and associated values.
* Accumulate the result so that we can print all conditions that fail.
* Turn hpdata_assert_consistent() into a macro so, when it fails,
  we can get the line number of the call site.
2026-03-26 10:39:23 -07:00
Carl Shapiro
86b7219213 Add unit tests for conf parsing and its helpers 2026-03-10 18:14:33 -07:00
Carl Shapiro
ad726adf75 Separate out the configuration code from initialization 2026-03-10 18:14:33 -07:00
Carl Shapiro
a056c20d67 Handle tcache init failures gracefully
tsd_tcache_data_init() returns true on failure but its callers ignore
this return value, leaving the per-thread tcache in an uninitialized
state after a failure.

This change disables the tcache on an initialization failure and logs
an error message.  If opt_abort is true, it will also abort.

New unit tests have been added to test tcache initialization failures.
2026-03-10 18:14:33 -07:00
Carl Shapiro
a75655badf Add unit test coverage for bin interfaces 2026-03-10 18:14:33 -07:00
Carl Shapiro
0ac9380cf1 Move bin inline functions from arena_inlines_b.h to bin_inlines.h
This is a continuation of my previous clean-up change, now focusing on
the inline functions defined in header files.
2026-03-10 18:14:33 -07:00
Carl Shapiro
1cc563f531 Move bin functions from arena.c to bin.c
This is a clean-up change that gives the bin functions implemented in
the arena code a prefix of bin_ and moves them into the bin code.

To further decouple the bin code from the arena code, bin functions
that had taken an arena_t to check arena_is_auto now take an is_auto
parameter instead.
2026-03-10 18:14:33 -07:00
guangli-dai
c73ab1c2ff Add a test to check that the output in JSON-based stats is consistent with mallctl results. 2026-03-10 18:14:33 -07:00
guangli-dai
12b33ed8f1 Fix wrong mutex stats in json-formatted malloc stats
When emitting mutex stats, derived counters are not emitted for JSON.
Yet the array indexing counter should still be incremented to skip
derived elements in the output, which it was not. This commit fixes that.
2026-03-10 18:14:33 -07:00
Carl Shapiro
79cc7dcc82 Guard os_page_id against a NULL address
While undocumented, the prctl system call will set errno to ENOMEM
when passed NULL as an address.  Under that condition, an assertion
that checks for EINVAL as the only possible errno value will fail.  To
avoid the assertion failure, this change skips the call to os_page_id
when the address is NULL.  NULL can only occur after mmap fails, in
which case there is no mapping to name.
2026-03-10 18:14:33 -07:00
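A hedged sketch of such a guard, assuming the mapping is named via prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) on Linux 5.17+; the function name is illustrative:

```
#ifdef __linux__
#include <stddef.h>
#include <sys/prctl.h>

static void
name_mapping(void *addr, size_t len, const char *name) {
	if (addr == NULL) {
		/* No mapping to name (mmap failed); prctl would set errno
		 * to ENOMEM here, tripping an EINVAL-only assertion. */
		return;
	}
#ifdef PR_SET_VMA
	(void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)addr,
	    (unsigned long)len, (unsigned long)name);
#endif
}
#endif
```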
Yuxuan Chen
a10ef3e1f1 configure: add --with-cxx-stdlib option
When C++ support is enabled, configure unconditionally probes
`-lstdc++` and keeps it in LIBS if the link test succeeds. On
platforms using libc++, this probe can succeed at compile time (if
libstdc++ headers/libraries happen to be installed) but then cause
runtime failures when configure tries to execute test binaries
because `libstdc++.so.6` isn't actually available.

Add a `--with-cxx-stdlib=<libstdc++|libcxx>` option that lets the
build system specify which C++ standard library to link. When given,
the probe is skipped and the specified library is linked directly.
When not given, the original probe behavior is preserved.
2026-03-10 18:14:33 -07:00
Tony Printezis
0fa27fd28f Run single subtest from a test file
Add a mechanism to select a test to run from a test file. The test harness reads the JEMALLOC_TEST_NAME environment variable and, if set, runs only subtests with that name.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
34ace9169b Remove the prof_threshold built-in event. It is trivial to implement as a user event if needed 2026-03-10 18:14:33 -07:00
Andrei Pechkurov
4d0ffa075b Fix background thread initialization race 2026-03-10 18:14:33 -07:00
Slobodan Predolac
d4908fe44a Revert "Experimental configuration option for fast path prefetch from cache_bin"
This reverts commit f9fae9f1f8.
2026-03-10 18:14:33 -07:00
Carl Shapiro
c51abba131 Determine the page size on Android from NDK header files
The definition of the PAGE_SIZE macro is used as a signal for a 32-bit
target or a 64-bit target with an older NDK.  Otherwise, a 16KiB page
size is assumed.

Closes: #2657
2026-03-10 18:14:33 -07:00
Carl Shapiro
5f353dc283 Remove an incorrect use of the address operator
The address of the local variable created_threads is a different
location than the data it points to.  Incorrectly treating these
values as being the same can cause out-of-bounds writes to the stack.

Closes: facebook/jemalloc#59
2026-03-10 18:14:33 -07:00
Carl Shapiro
365747bc8d Use the BRE construct \{1,\} for one or more consecutive matches
This removes duplication introduced by my earlier commit that
eliminated the use of the non-standard "\+" from BREs in the
configure script.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
6016d86c18 [SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin 2026-03-10 18:14:33 -07:00
Slobodan Predolac
c7690e92da Remove Cirrus CI 2026-03-10 18:14:33 -07:00
Slobodan Predolac
441e840df7 Add a script to generate github actions instead of Travis CI and Cirrus 2026-03-10 18:14:33 -07:00
Guangli Dai
0988583d7c Add a mallctl for users to get an approximate of active bytes. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
8a06b086f3 [EASY] Extract hpa_central component from hpa source file 2026-03-10 18:14:33 -07:00
Slobodan Predolac
355774270d [EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils 2026-03-10 18:14:33 -07:00
Slobodan Predolac
47aeff1d08 Add experimental_enforce_hugify 2026-03-10 18:14:33 -07:00
Shirui Cheng
6d4611197e move fill/flush pointer array out of tcache.c 2026-03-10 18:14:33 -07:00
Slobodan Predolac
3678a57c10 When extracting from central, hugify_eager is different than start_as_huge 2026-03-10 18:14:33 -07:00
guangli-dai
2cfa41913e Refactor init_system_thp_mode and print it in malloc stats. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
87555dfbb2 Do not release the hpa_shard->mtx when inserting newly retrieved page from central before allocating from it 2026-03-10 18:14:33 -07:00
Carl Shapiro
f714cd9249 Inline the value of an always false boolean local variable
Next to its use, which is always as an argument, we include the name
of the parameter in a constant.  This completes a partially
implemented cleanup suggested in an earlier commit.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
5e49c28ef0 [EASY] Spelling in the comments 2026-03-10 18:14:33 -07:00
Slobodan Predolac
7c40be249c Add npurges and npurge_passes to output of pa_benchmark 2026-03-10 18:14:33 -07:00
Slobodan Predolac
707aab0c95 [pa-bench] Add clock to pa benchmark 2026-03-10 18:14:33 -07:00
Slobodan Predolac
a199278f37 [HPA] Add ability to start page as huge and more flexibility for purging 2026-03-10 18:14:33 -07:00
Slobodan Predolac
ace437d26a Running clang-format on two files 2026-03-10 18:14:33 -07:00
Slobodan Predolac
2688047b56 Revert "Do not dehugify when purging"
This reverts commit 16c5abd1cd.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
de886e05d2 Revert "Remove an unused function and global variable"
This reverts commit acd85e5359.
2026-03-10 18:14:33 -07:00
guangli-dai
755735a6bf Remove Travis Windows CI for now since it has infra failures. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
d70882a05d [sdt] Add some tracepoints to sec and hpa modules 2026-03-10 18:14:33 -07:00
Carl Shapiro
67435187d1 Improve the portability of grep patterns in configure.ac
The configure.ac script uses backslash plus in its grep patterns to
match one or more occurrences.  This is a GNU grep extension to the
Basic Regular Expressions syntax that fails on systems with a more
traditional grep.  This change fixes grep patterns that use backslash
plus to use a star instead.

Closes: #2777
2026-03-10 18:14:33 -07:00
guangli-dai
261591f123 Add a page-allocator microbenchmark. 2026-03-10 18:14:33 -07:00
guangli-dai
56cdce8592 Adding trace analysis in preparation for page allocator microbenchmark. 2026-03-10 18:14:33 -07:00
Carl Shapiro
daf44173c5 Replace an instance of indentation with spaces with tabs 2026-03-10 18:14:33 -07:00
Aurélien Brooke
ce02945070 Add missing thread_event_registry.c to Visual Studio projects
This file was added by b2a35a905f.
2026-03-10 18:14:33 -07:00
lexprfuncall
c51949ea3e Update config.guess and config.sub to the latest versions
These files need to be refreshed periodically to support new platform
types.

The following command was used to retrieve the updates

curl -L -O https://git.savannah.gnu.org/cgit/config.git/plain/config.guess
curl -L -O https://git.savannah.gnu.org/cgit/config.git/plain/config.sub

Closes: #2814
2026-03-10 18:14:33 -07:00
Carl Shapiro
5a634a8d0a Always use pthread_equal to compare thread IDs
This change replaces direct comparisons of Pthread thread IDs with
calls to pthread_equal.  Directly comparing thread IDs is neither
portable nor reliable since a thread ID is defined as an opaque type
that can be implemented using a structure.
2026-03-10 18:14:33 -07:00
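A short sketch of the portable comparison:

```
#include <pthread.h>
#include <stdbool.h>

/* pthread_t is opaque and may be a struct, so direct == comparison is
 * neither portable nor reliable; pthread_equal is the portable check. */
static bool
is_current_thread(pthread_t owner) {
	/* non-portable: return owner == pthread_self(); */
	return pthread_equal(owner, pthread_self()) != 0;
}
```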
Slobodan Predolac
5d5f76ee01 Remove pidfd_open call handling and rely on PIDFD_SELF 2026-03-10 18:14:33 -07:00
lexprfuncall
9442300cc3 Change the default page size to 64KiB on Aarch64 Linux
This updates the configuration script to set the default page size to
64KiB on Aarch64 Linux.  This is motivated by compatibility as a build
configured for a 64KiB page will work on kernels that use the smaller
4KiB or 16KiB pages, whereas the reverse is not true.

To make the configured page size setting more visible, the script now
displays the page size when printing the configuration results.

Users that want to override the page size to choose a smaller value
can still do so with the --with-lg-pagesize configuration option.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
2a66c0be5a [EASY][BUGFIX] Spelling and format 2026-03-10 18:14:33 -07:00
lexprfuncall
38b12427b7 Define malloc_{write,read}_fd as non-inline global functions
The static inline definition made more sense when these functions just
dispatched to a syscall wrapper.  Since they acquired a retry loop, a
non-inline definition makes more sense.
2026-03-10 18:14:33 -07:00
lexprfuncall
9fdc1160c5 Handle interruptions and retries of read(2) and write(2) 2026-03-10 18:14:33 -07:00
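A minimal sketch of such a retry loop around read(2); the helper name is illustrative and jemalloc's actual wrappers may differ:

```
#include <errno.h>
#include <unistd.h>

static ssize_t
read_fd_full(int fd, void *buf, size_t count) {
	size_t done = 0;
	while (done < count) {
		ssize_t r = read(fd, (char *)buf + done, count - done);
		if (r == -1) {
			if (errno == EINTR) {
				continue;   /* interrupted: retry */
			}
			return -1;          /* real error */
		}
		if (r == 0) {
			break;              /* EOF */
		}
		done += (size_t)r;          /* short read: keep going */
	}
	return (ssize_t)done;
}
```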
lexprfuncall
48b4ad60a7 Remove an orphaned comment
This was left behind when definitions of malloc_open and malloc_close
were abstracted from code that had followed.
2026-03-10 18:14:33 -07:00
Shirui Cheng
2114349a4e Revert PR #2608: Manually revert commits 70c94d..f9c0b5
Closes: #2707
2026-03-10 18:14:33 -07:00
lexprfuncall
ced8b3cffb Fix the compilation check for process madvise
An include of unistd.h is needed to make the declaration of the
syscall function visible to the compiler.  The include of sys/mman.h
is not used at all.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
5e98585b37 Save and restore errno when calling process_madvise 2026-03-10 18:14:33 -07:00
lexprfuncall
e4fa33148a Remove an unused function and global variable
When the dehugify functionality was retired in a previous commit, a
dehugify-related function and global variable in a test were
accidentally left in place, causing builds that add -Werror to CFLAGS
to fail.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
d73de95f72 Experimental configuration option for fast path prefetch from cache_bin 2026-03-10 18:14:33 -07:00
lexprfuncall
9528a2e2dd Use relaxed atomics to access the process madvise pid fd
Relaxed atomics already provide sequentially consistent access to single
location data structures.
2026-03-10 18:14:33 -07:00
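A sketch of the idea in C11 atomics (illustrative names): a single, independently read and written fd only needs atomicity and per-location coherence, which relaxed ordering provides.

```
#include <stdatomic.h>

static _Atomic int madvise_pidfd = -1;

static int
get_madvise_pidfd(void) {
	return atomic_load_explicit(&madvise_pidfd, memory_order_relaxed);
}

static void
set_madvise_pidfd(int fd) {
	atomic_store_explicit(&madvise_pidfd, fd, memory_order_relaxed);
}
```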
lexprfuncall
a156e997d7 Do not dehugify when purging
Giving the advice MADV_DONTNEED to a range of virtual memory backed by
a transparent huge page already causes that range of virtual memory to
become backed by regular pages.
2026-03-10 18:14:33 -07:00
lexprfuncall
395e63bf7e Fix several spelling errors in comments 2026-03-10 18:14:33 -07:00
Slobodan Predolac
4246475b44 [process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork 2026-03-10 18:14:33 -07:00
Slobodan Predolac
f87bbab22c Add several USDT probes for hpa 2026-03-10 18:14:33 -07:00
Slobodan Predolac
711fff750c Add experimental support for usdt systemtap probes 2026-03-10 18:14:33 -07:00
guangli-dai
5847516692 Ignore the clang-format changes in the git blame. 2026-03-10 18:14:33 -07:00
guangli-dai
6200e8987f Reformat the codebase with the clang-format 18. 2026-03-10 18:14:33 -07:00
Shirui Cheng
a952a3b8b0 Update the default value for opt_experimental_tcache_gc and opt_calloc_madvise_threshold 2026-03-10 18:14:33 -07:00
Guangli Dai
e350c71571 Remove --enable-limit-usize-gap for cirrus CI since the config-time option is removed. 2026-03-10 18:14:33 -07:00
guangli-dai
95fc091b0f Update appveyor settings. 2026-03-10 18:14:33 -07:00
dzhao.ampere
c5547f9e64 test/unit/psset.c: fix SIGSEGV when PAGESIZE is large
When hugepage is enabled and PAGESIZE is large, the test could
ask for a stack size larger than the user limit. Allocating the
memory instead avoids the failure.

Closes: #2408
2026-03-10 18:14:33 -07:00
Slobodan Predolac
015b017973 [thread_event] Add support for user events in thread events when stats are enabled 2026-03-10 18:14:33 -07:00
Slobodan Predolac
e6864c6075 [thread_event] Remove macros from thread_event and replace with dynamic event objects 2026-03-10 18:14:33 -07:00
Qi Wang
1972241cd2 Remove unused options in the batched madvise unit tests. 2025-06-02 11:25:37 -07:00
Jason Evans
27d7960cf9 Revert "Extend purging algorithm with peak demand tracking"
This reverts commit ad108d50f1.
2025-06-02 10:44:37 -07:00
guangli-dai
edaab8b3ad Turn clang-format off for codes with multi-line commands in macros 2025-05-28 19:22:21 -07:00
guangli-dai
4531411abe Modify .clang-format to have declarations aligned 2025-05-28 19:22:21 -07:00
guangli-dai
1818170c8d Fix binshard.sh by specifying bin_shards for all sizes. 2025-05-28 19:21:49 -07:00
guangli-dai
fd60645260 Add one more check to double free validation. 2025-05-28 19:21:49 -07:00
Xin Yang
5e460bfea2 Refactor: use the cache_bin_sz_t typedef instead of direct uint16_t
Any future changes to the underlying data type for bin sizes
(such as upgrading from `uint16_t` to `uint32_t`) can be achieved
by modifying only the `cache_bin_sz_t` definition.

Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>
2025-05-22 10:43:33 -07:00
Xin Yang
9169e9272a Fix: Adjust CACHE_BIN_NFLUSH_BATCH_MAX size to prevent assert failures
The maximum allowed value for `nflush_batch` is
`CACHE_BIN_NFLUSH_BATCH_MAX`. However, `tcache_bin_flush_impl_small`
could potentially declare an array of `emap_batch_lookup_result_t`
of size `CACHE_BIN_NFLUSH_BATCH_MAX + 1`. leads to a `VARIABLE_ARRAY`
assertion failure, observed when `tcache_nslots_small_max` is
configured to 2048. This patch ensures the array size does not exceed
the allowed maximum.

Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>
2025-05-22 10:27:09 -07:00
guangli-dai
f19a569216 Ignore formatting commit in blame. 2025-05-20 14:21:08 -07:00
Slobodan Predolac
b6338c4ff6 EASY - be explicit in non-vectorized hpa tests 2025-05-19 16:31:04 -07:00
guangli-dai
554185356b Sample format on tcache_max test 2025-05-19 15:06:13 -07:00
guangli-dai
3cee771cfa Modify .clang-format to make it more aligned with current freebsd style 2025-05-19 15:06:13 -07:00
Jiebin Sun
3c14707b01 To improve reuse efficiency, the maximum coalesced size for large extents
in the dirty ecache has been limited. This patch was tested with real
workloads using ClickHouse (Clickbench Q35) on a system with 2x240 vCPUs.
The results showed a 2X improvement in queries per second (QPS) and
a reduction in page faults to 29% of the previous rate. Additionally,
microbenchmark testing involved 256 memory reallocations resizing
from 4KB to 16KB in one arena, which demonstrated a 5X performance
improvement.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2025-05-12 15:45:36 -07:00
guangli-dai
37bf846cc3 Fixes to prevent static analysis warnings. 2025-05-06 14:47:35 -07:00
guangli-dai
8347f1045a Renaming limit_usize_gap to disable_large_size_classes 2025-05-06 14:47:35 -07:00
Guangli Dai
01e9ecbeb2 Remove build-time configuration 'config_limit_usize_gap' 2025-05-06 14:47:35 -07:00
Slobodan Predolac
852da1be15 Add experimental option force using SYS_process_madvise 2025-04-28 18:45:30 -07:00
Slobodan Predolac
1956a54a43 [process_madvise] Use process_madvise across multiple huge_pages 2025-04-25 19:19:03 -07:00
Slobodan Predolac
0dfb4a5a1a Add output argument to hpa_purge_begin to count dirty ranges 2025-04-25 19:19:03 -07:00
Slobodan Predolac
cfa90dfd80 Refactor hpa purging to prepare for vectorized call across multiple pages 2025-04-25 19:19:03 -07:00
Qi Wang
a3910b9802 Avoid forced purging during thread-arena migration when bg thd is on. 2025-04-25 19:18:20 -07:00
guangli-dai
c23a6bfdf6 Add opt.limit_usize_gap to stats 2025-04-16 10:38:10 -07:00
guangli-dai
c20a63a765 Silence the uninitialized warning from clang. 2025-04-16 10:38:10 -07:00
Qi Wang
f81fb92a89 Remove Travis CI macOS configs (not supported anymore). 2025-04-14 15:27:38 -07:00
Slobodan Predolac
f19f49ef3e if process_madvise is supported, call it when purging hpa 2025-04-04 13:57:42 -07:00
Kaspar M. Rohrer
80e9001af3 Move `extern "C" specifications for C++ to where they are needed
This should fix errors when compiling C++ code with modules enabled on clang.
2025-03-31 10:41:51 -07:00
Shirui Cheng
3688dfb5c3 fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test 2025-03-20 12:45:23 -07:00
Jay Lee
a4defdb854 detect false failure of strerror_r
See tikv/jemallocator#108.

In summary, the test on `strerror_r` can fail for reasons other
than `strerror_r` itself, so add an additional test to determine
whether the failure is expected.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2025-03-17 17:50:20 -07:00
Shirui Cheng
e1a77ec558 Support THP with Huge Arena in PAC 2025-03-17 16:06:43 -07:00
Audrey Dutcher
86bbabac32 background_thread: add fallback for pthread_create dlsym
If jemalloc is linked into a shared library, the RTLD_NEXT dlsym call
may fail since RTLD_NEXT is only specified to search all objects after
the current one in the loading order, and the pthread library may be
earlier in the load order. Instead of failing immediately, attempt one
more time to find pthread_create via RTLD_GLOBAL.

Errors cascading from this were observed on FreeBSD 14.1.
2025-03-17 09:41:04 -07:00
Guangli Dai
81f35e0b55 Modify Travis tests to use frameptr when profiling 2025-03-13 17:15:42 -07:00
Guangli Dai
773b5809f9 Fix frame pointer based unwinder to handle changing stack range 2025-03-13 17:15:42 -07:00
Dmitry Ilvokhin
ad108d50f1 Extend purging algorithm with peak demand tracking
Implementation inspired by the idea described in the "Beyond malloc
efficiency to fleet efficiency: a hugepage-aware memory allocator"
paper [1].

The primary idea is to track the maximum number (peak) of active pages
in use within a sliding window and then use this number to decide how
many dirty pages we would like to keep.

We are trying to estimate the maximum amount of active memory we'll need
in the near future. We do so by projecting future active memory demand
(based on the peak active memory usage observed in the past within the
sliding window) and adding slack on top of it (some overhead is
reasonable to have in exchange for higher hugepage coverage). When peak
demand tracking is off, the projection of future active memory is simply
the active memory we have right now.

The estimate is essentially `nactive_max * (1 + dirty_mult)`.

The peak demand purging algorithm is controlled by two config options.
Option `hpa_peak_demand_window_ms` controls the duration of the sliding
window in which we track maximum active memory usage, and option
`hpa_dirty_mult` controls the amount of slack we are allowed to have as
a percentage of the maximum active memory usage. By default
`hpa_peak_demand_window_ms == 0`, and we keep the same behaviour (ratio
based purging) that we had before this commit.

[1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf
2025-03-13 10:12:22 -07:00
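A worked example of the estimate above with made-up numbers: with a windowed peak of 100,000 active pages and hpa_dirty_mult = 0.25, up to 125,000 pages (active plus dirty) would be retained before purging kicks in.

```
#include <stddef.h>

/* nactive_max * (1 + dirty_mult), per the description above. */
static size_t
allowed_pages(size_t nactive_max, double dirty_mult) {
	return (size_t)((double)nactive_max * (1.0 + dirty_mult));
}
/* allowed_pages(100000, 0.25) == 125000 */
```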
Qi Wang
22440a0207 Implement process_madvise support.
Add opt.process_madvise_max_batch, which determines whether process_madvise is
enabled (non-zero) and the maximum number of regions in each batch.  Another
limiting factor is the stack space reserved for the batch, which caps the
maximum batch at 128.
2025-03-07 15:32:32 -08:00
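A hedged sketch of what a batched purge via process_madvise(2) (Linux 5.10+) can look like; the batch constant and helper are illustrative, and the pidfd is assumed to refer to the current process:

```
#ifdef __linux__
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#define PURGE_BATCH_MAX 128   /* stack-space-driven cap, per the commit */

#ifdef SYS_process_madvise
static long
purge_batch(int pidfd, struct iovec *regions, size_t nregions) {
	if (nregions > PURGE_BATCH_MAX) {
		nregions = PURGE_BATCH_MAX;
	}
	/* One syscall purges up to PURGE_BATCH_MAX dirty ranges. */
	return syscall(SYS_process_madvise, pidfd, regions, nregions,
	    MADV_DONTNEED, 0);
}
#endif
#endif
```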
Guangli Dai
70f019cd3a Enable limit-usize-gap in CI tests.
Considering the new usize calculation will be the default soon, add the
config option for Travis, Cirrus, and AppVeyor.
2025-03-06 15:08:13 -08:00
Guangli Dai
6035d4a8d3 Cache extra extents in the dirty pool from ecache_alloc_grow 2025-03-06 15:08:13 -08:00
guangli-dai
c067a55c79 Introducing a new usize calculation policy
jemalloc has historically converted size to usize by rounding size up
to the closest size class. However, this causes a lot of memory waste
with HPA enabled.  This commit changes how usize is calculated so that
the gap between two contiguous usizes is no larger than a page.
Specifically, this commit includes the following changes:

1. Adding a build-time config option (--enable-limit-usize-gap) and a
runtime one (limit_usize_gap) to guard the changes.
When the build-time config is enabled, some minor CPU overhead is
expected because usize will be stored and accessed apart from the index.
When the runtime option is also enabled (it can only be enabled with the
build-time config enabled), a new usize calculation approach will be
employed.  This new calculation will round size up to the closest
multiple of PAGE for all sizes larger than USIZE_GROW_SLOW_THRESHOLD
instead of using the size classes.  Note that when the build-time config
is enabled, the runtime option is on by default.

2. Prepare tcache for size to grow by PAGE over GROUP*PAGE.
To prepare for the upcoming changes where size class grows by PAGE when
larger than NGROUP * PAGE, disable the tcache when it is larger than 2 *
NGROUP * PAGE. The threshold for tcache is set higher to prevent perf
regression as much as possible while usizes between NGROUP * PAGE and 2 *
NGROUP * PAGE happen to grow by PAGE.

3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE
For PAC, to avoid having too many bins, arena bins still have the same
layout.  This means some extra search is needed for a page-level request that
is not aligned with the original size class: it should also search the heap
before the current index since the previous heap might also be able to
have some allocations satisfying it.  The same changes apply to HPA's
psset.
This search relies on the enumeration of the heap because not all allocs in
the previous heap are guaranteed to satisfy the request.  To balance the
memory and CPU overhead, we currently enumerate at most a fixed number
of nodes before concluding none can satisfy the request during an
enumeration.

4. Add bytes counter to arena large stats.
To prepare for the upcoming usize changes, stats collected by
multiplying alive allocations and the bin size is no longer accurate.
Thus, add separate counters to record the bytes malloced and dalloced.

5. Change the structs used when freeing to avoid using index2size for large sizes.
  - Change the definition of emap_alloc_ctx_t
  - Change the read of both from edata_t.
  - Change the assignment and usage of emap_alloc_ctx_t.
  - Change other callsites of index2size.
Note that the changes to the data structure, i.e., emap_alloc_ctx_t,
take effect when the build-time config (--enable-limit-usize-gap) is
enabled, but the fields will store the same value as index2size(szind)
if the runtime option (opt_limit_usize_gap) is not enabled.

6. Adapt hpa to the usize changes.
Change the settings in sec to limit its usage for sizes larger than
USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests.

7. Modify usize calculation and corresponding tests.
Change the sz_s2u_compute. Note sz_index2size is not always safe now
while sz_size2index still works as expected.
2025-03-06 15:08:13 -08:00
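A minimal sketch of the page-granular rounding described in change 7, with illustrative constants (the real threshold and the classic size-class path live in jemalloc's sz module):

```
#include <stddef.h>

#define PAGE ((size_t)4096)                        /* illustrative */
#define USIZE_GROW_SLOW_THRESHOLD ((size_t)16384)  /* illustrative */

static size_t
size_to_usize_page_granular(size_t size) {
	if (size <= USIZE_GROW_SLOW_THRESHOLD) {
		/* Classic size-class rounding would apply here. */
		return size;
	}
	/* Round up to the next multiple of PAGE, so adjacent usizes
	 * never differ by more than one page. */
	return (size + PAGE - 1) & ~(PAGE - 1);
}
```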
Guangli Dai
ac279d7e71 Fix profiling sample metadata lookup during xallocx 2025-03-04 14:42:04 -08:00
Qi Wang
f55e0c3f5c Remove unsupported Cirrus CI config 2025-03-03 16:29:04 -08:00
Dmitry Ilvokhin
499f306859 Fix arena 0 deferral_allowed flag init
Arena 0 have a dedicated initialization path, which differs from
initialization path of other arenas. The main difference for the purpose
of this change is that we initialize arena 0 before we initialize
background threads. HPA shard options have `deferral_allowed` flag which
should be equal to `background_thread_enabled()` return value, but it
wasn't the case before this change, because for arena 0
`background_thread_enabled()` was initialized correctly after arena 0
initialization phase already ended.

Below is initialization sequence for arena 0 after this commit to
illustrate everything still should be initialized correctly.

* `hpa_central_init` initializes HPA Central, before we initialize every
  HPA shard (including arena's 0).
* `background_thread_boot1` initializes `background_thread_enabled()`
  return value.
* `pa_shard_enable_hpa` initializes arena 0 HPA shard.

```
                       malloc_init_hard -------------
                      /           /                  \
                     /           /                    \
                    /           /                      \
malloc_init_hard_a0_locked  background_thread_boot1  pa_shard_enable_hpa
        /                     /                          \
       /                     /                            \
      /                     /                              \
arena_boot       background_thread_enabled_set          hpa_shard_init
     |
     |
pa_central_init
     |
     |
hpa_central_init
```
2025-02-18 12:10:35 -08:00
Dmitry Ilvokhin
421b17a622 Remove age_counter from hpa_central
Before this commit we had two age counters: one global in HPA central
and one local in each HPA shard. We used the HPA shard counter when we
reused an empty pageslab, and the HPA central counter everywhere else.
They are supposed to be comparable, because we use them for allocation
placement decisions, but in reality they are not: there is no ordering
guarantee between them.

At the moment, there is no way for a pageslab to migrate between HPA
shards, so we don't actually need the HPA central age counter.
2025-02-13 16:00:41 -08:00
roblabla
c17bf8b368 Disable config from file or envvar with build flag
This adds a new autoconf flag, --disable-user-config, which disables
reading the configuration from /etc/malloc.conf or the MALLOC_CONF
environment variable. This can be useful when integrating jemalloc in a
binary that internally handles all aspects of the configuration and
shouldn't be impacted by ambient change in the environment.
2025-02-05 15:01:50 -08:00
Dmitry Ilvokhin
34c823f147 Add autoconf options to enable sanitizers
This commit allows enabling sanitizers with autoconf options, instead
of modifying `CFLAGS`, `CXXFLAGS` and `LDFLAGS` directly.

* `--enable-tsan` option to enable Thread Sanitizer.
* `--enable-ubsan` option to enable Undefined Behaviour Sanitizer.

The end goal is to speed up development by finding problems quickly,
early, and more easily. Eventually, when all current issues are fixed,
we can enable sanitizers in CI. Fortunately, there are not a lot of
problems we need to fix.

Address Sanitizer is a bit controversial, because it replaces the memory
allocator, so we decided to leave it out for a while.

Below are a couple of examples of how tests look under different
sanitizers at the moment.

```
$  ../configure --enable-tsan --enable-debug
<...>
asan               : 0
tsan               : 1
ubsan              : 0
$ make -j`nproc` check
<...>
  Thread T13 (tid=332043, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x61748)
    #1 thd_create ../test/src/thd.c:25 (bin_batching+0x5631ca)
    #2 stress_run ../test/unit/bin_batching.c:148
(bin_batching+0x40364c)
    #3 test_races ../test/unit/bin_batching.c:249
(bin_batching+0x403d79)
    #4 p_test_impl ../test/src/test.c:149 (bin_batching+0x562811)
    #5 p_test_no_reentrancy ../test/src/test.c:213
(bin_batching+0x562d35)
    #6 main ../test/unit/bin_batching.c:268 (bin_batching+0x40417e)

SUMMARY: ThreadSanitizer: data race
../include/jemalloc/internal/edata.h:498 in edata_nfree_inc
```

```
$ ../configure --enable-ubsan --enable-debug
<...>
asan               : 0
tsan               : 0
ubsan              : 1
$ make -j`nproc` check
<...>
=== test/unit/hash ===
../test/unit/hash.c:119:16: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
<...>
```
2025-02-05 14:28:28 -08:00
Qi Wang
3bc89cfeca Avoid implicit conversion in test/unit/prof_threshold 2025-01-31 10:18:36 -08:00
Qi Wang
1abeae9ebd Fix test/unit/prof_threshold when !config_stats 2025-01-30 10:39:49 -08:00
Shai Duvdevani
257e64b968 Unlike prof_sample, which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The threshold allows performance-critical callers to change program execution based on the callback: e.g. drop caches when memory use becomes high, or predict ahead of time that the program is about to OOM using peak memory watermarks. 2025-01-29 18:55:52 -08:00
Dmitry Ilvokhin
ef8e512e29 Fix bitmap_ffu out of range read
We tried to load `g` from `bitmap[i]` before checking that it is actually
a valid load. Tweaked the loop a bit to `break` early when we are done
scanning for bits.

Before this commit the undefined behaviour sanitizer from GCC 14+ was
unhappy at the `test/unit/bitmap` test with the following error.

```
../include/jemalloc/internal/bitmap.h:293:5: runtime error: load of
address 0x7bb1c2e08008 with insufficient space for an object of type
'const bitmap_t'
<...>
    #0 0x62671a149954 in bitmap_ffu ../include/jemalloc/internal/bitmap.h:293
    #1 0x62671a149954 in test_bitmap_xfu_body ../test/unit/bitmap.c:275
    #2 0x62671a14b767 in test_bitmap_xfu ../test/unit/bitmap.c:323
    #3 0x62671a376ad1 in p_test_impl ../test/src/test.c:149
    #4 0x62671a377135 in p_test ../test/src/test.c:200
    #5 0x62671a13da06 in main ../test/unit/bitmap.c:336
<...>
```
2025-01-28 10:42:20 -08:00
Qi Wang
607b866035 Check for 0 input when setting max_background_thread through mallctl.
Reported by @nc7s.
2025-01-28 10:38:56 -08:00
Qi Wang
20cc983314 Fix the gettid() detection caught by @mrluanma . 2025-01-22 10:30:53 -08:00
Dmitry Ilvokhin
52fa9577ba Fix integer overflow in test/unit/hash.c
`final[3]` is `uint8_t`. The integer conversion rank of `uint8_t` is
lower than that of `int`, so `uint8_t` gets promoted to `int`, which is
a signed integer type. Shifting the `final[3]` value left by 24 when
the leftmost bit is set overflows `int`, which is undefined behaviour.

Before this change the Undefined Behaviour Sanitizer was unhappy about
it with the following message.

```
../test/unit/hash.c:119:25: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
```

After this commit the problem is gone.
2025-01-17 12:54:22 -08:00
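A small sketch of the promotion pitfall and the usual fix (widen to an unsigned type before shifting):

```
#include <stdint.h>

static uint32_t
pack_high_byte(uint8_t b) {
	/* Undefined when b >= 128, since b promotes to (signed) int:
	 *   return b << 24;
	 * Widening first keeps the shift well defined: */
	return (uint32_t)b << 24;
}
```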
Dan Horák
17881ebbfd Add configure check for gettid() presence
The gettid() function is available on Linux in glibc only since version
2.30. There are supported distributions that still use older glibc
version. Thus add a configure check if the gettid() function is
available and extend the check in src/prof_stack_range.c so it's skipped
also when gettid() isn't available.

Fixes: https://github.com/jemalloc/jemalloc/issues/2740
2024-12-17 12:40:54 -08:00
appujee
4b88bddbca Conditionally remove unreachable for C23+ 2024-12-17 12:39:00 -08:00
appujee
d8486b2653 Remove unreachable() macro as c23 already defines it.
Taken from https://android-review.git.corp.google.com/c/platform/external/jemalloc_new/+/3316478

This might need more cleanups to remove the definition of JEMALLOC_INTERNAL_UNREACHABLE.
2024-12-17 12:39:00 -08:00
Guangli Dai
587676fee8 Disable psset test when hugepage size is too large. 2024-12-17 12:35:35 -08:00
Guangli Dai
a17385a882 Enable large hugepage tests for arm64 on Travis 2024-12-17 12:35:35 -08:00
Guangli Dai
6786934280 Fix ehooks assertion for arena creation 2024-12-11 13:33:32 -08:00
Dmitry Ilvokhin
46690c9ec0 Fix test_retained on boxes with a lot of CPUs
We are trying to create `ncpus * 2` threads for this test and place them
into a `VARIABLE_ARRAY`, but a `VARIABLE_ARRAY` can not be more than
`VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the
box, the test always fails.

```
$ nproc
176

$ make -j`nproc` tests_unit && ./test/unit/retained
<jemalloc>: ../test/unit/retained.c:123: Failed assertion:
"sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX"
Aborted (core dumped)
```

There is no need for high concurrency in this test, as we are only
checking stats there and its behaviour is quite stable with respect to
the number of allocating threads.

Limited the number of threads to 16 to save compute resources (on CI,
for example) and reduce test running time.

Before the change (`nproc` is 80 on this box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.372s
user    0m14.236s
sys     0m12.338s
```

After the change (same box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.018s
user    0m0.108s
sys     0m0.068s
```
2024-12-02 14:12:26 -08:00
Dmitry Ilvokhin
6092c980a6 Expose psset state stats
When evaluating changes in HPA logic, it is useful to know the internal
`hpa_shard` state. A great deal of this state is the `psset`. Some of
the `psset` stats were available, but in disaggregated form, which is
not very convenient. This commit exposes `psset` counters to `mallctl`
and malloc stats dumps.

Example of how the malloc stats dump looks after the change.

HPA shard stats:
  Pageslabs: 14899 (4354 huge, 10545 nonhuge)
  Active pages: 6708166 (2228917 huge, 4479249 nonhuge)
  Dirty pages: 233816 (331 huge, 233485 nonhuge)
  Retained pages: 686306
  Purge passes: 8730 (10 / sec)
  Purges: 127501 (146 / sec)
  Hugeifies: 4358 (5 / sec)
  Dehugifies: 4 (0 / sec)

Pageslabs, active pages, dirty pages and retained pages are rows added
by this change.
2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin
3820e38dc1 Remove validation for HPA ratios
Config validation was introduced at 3aae792b with the main intention of
fixing an infinite purging loop, but it didn't actually fix the
underlying problem, just masked it. Later 47d69b4ea was merged to
address the same problem.

Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applies to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for a single
pageslab (hugepage). It doesn't make much sense to sum them up together.

While it is true that too high a value of `hpa_dirty_mult` and too low a
value of `hpa_hugification_threshold` can lead to pathological
behaviour, the same is true for other options as well. Poor
configurations might lead to suboptimal and sometimes completely
unacceptable behaviour, and that's OK; that is exactly why they are
called poor.

Other mechanisms exist to prevent extreme behaviour, such as hugifying
and then immediately purging a page; see the
`hpa_hugify_blocked_by_ndirty` function, which exists to prevent exactly
this case.

Lastly, the `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint
is too tight and prevents a lot of valid configurations.
2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin
0ce13c6fb5 Add opt hpa_hugify_sync to hugify synchronously
Linux 6.1 introduced the `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.

Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore, and we get instant feedback if
the range wasn't hugified.

If the `hpa_hugify_sync` option is on, we'll try to collapse
synchronously, and if that isn't successful, we'll fall back to
asynchronous behaviour.
2024-11-20 10:52:52 -08:00
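A hedged sketch of synchronous hugification with an asynchronous fallback, assuming MADV_COLLAPSE (Linux 6.1+) and MADV_HUGEPAGE are available; the names are illustrative, not jemalloc's:

```
#ifdef __linux__
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

static bool
hugify_range(void *addr, size_t len) {
#ifdef MADV_COLLAPSE
	if (madvise(addr, len, MADV_COLLAPSE) == 0) {
		return true;   /* collapsed synchronously */
	}
#endif
	/* Fall back to asking khugepaged to collapse the range later. */
	return madvise(addr, len, MADV_HUGEPAGE) == 0;
}
#endif
```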
Dmitry Ilvokhin
a361e886e2 Move je_cv_thp logic closer to definition 2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin
b82333fdec Split stats_arena_hpa_shard_print function
Make multiple functions from `stats_arena_hpa_shard_print` for
readability and ease of change in the future.
2024-11-08 12:18:15 -08:00
Dmitry Ilvokhin
b9758afff0 Add nstime_ms_since to get time since in ms
Milliseconds are used a lot in hpa, so it is convenient to have an
`nstime_ms_since` function instead of constantly dividing by `MILLION`.

For consistency, renamed `nstime_msec` to `nstime_ms`, as the `ms`
abbreviation is used much more commonly across the codebase than `msec`.

```
$ grep -Rn '_msec' include src | wc -l
2

$ grep -RPn '_ms( |,|:)' include src | wc -l
72
```

Function `nstime_msec` wasn't used anywhere in the code yet.
2024-11-08 10:37:28 -08:00
Qi Wang
2a693b83d2 Fix the sized-dealloc safety check abort msg. 2024-10-14 10:34:15 -07:00
Qi Wang
6d625d5e5e Add support for clock_gettime_nsec_np()
Prefer clock_gettime_nsec_np(CLOCK_UPTIME_RAW) to mach_absolute_time().
2024-10-14 10:33:27 -07:00
Guangli Dai
397827a27d Updated jeprof with more symbols to filter. 2024-10-14 10:31:58 -07:00
Qi Wang
02251c0070 Update the configure cache file example in INSTALL.md 2024-10-10 16:41:48 -07:00
Qi Wang
8c2b8bcf24 Update doc to reflect muzzy decay is disabled by default.
It has been disabled since 5.2.0 (in #1421).
2024-10-10 16:41:23 -07:00
Nathan Slingerland
edc1576f03 Add safe frame-pointer backtrace unwinder 2024-10-01 11:01:56 -07:00
Ben Niu
3a0d9cdadb Use MSVC __declspec(thread) for TSD on Windows 2024-09-30 11:33:44 -07:00
Guangli Dai
1c900088c3 Do not support hpa if HUGEPAGE is too large. 2024-09-27 15:34:13 -07:00
Dmitry Ilvokhin
4f4fd42447 Remove strict_min_purge_interval option
Option `experimental_hpa_strict_min_purge_interval` was expected to be
temporary, to simplify the rollout of a bugfix. Now that the bugfix
rollout is complete, it is safe to remove this option.
2024-09-25 11:49:18 -07:00
Qi Wang
6cc42173cb Assert the mutex is locked within malloc_mutex_assert_owner(). 2024-09-23 18:06:07 -07:00
Qi Wang
44db479fad Fix the lock owner sanity checking during background thread boot.
During boot, some mutexes are not initialized yet, plus there's no point taking
many mutexes while everything is covered by the global init lock, so the locking
assumptions in some functions (e.g. background_thread_enabled_set()) can't be
enforced.  Skip the lock owner check in this case.
2024-09-23 18:06:07 -07:00
Guangli Dai
0181aaa495 Optimize edata_cmp_summary_compare when __uint128_t is available 2024-09-23 16:23:42 -07:00
roblabla
734f29ce56 Fix compilation with MSVC 2022
On MSVC, log is an intrinsic that doesn't require libm. However,
AC_SEARCH_LIBS does not successfully detect this, as it will try to
compile a program using the wrong signature for log. Newer versions of
MSVC CL detect this and reject the program with the following
messages:

conftest.c(40): warning C4391: 'char log()': incorrect return type for intrinsic function, expected 'double'
conftest.c(44): error C2168: 'log': too few actual parameters for intrinsic function

Since log is always available on MSVC (it's been around since the dawn
of time), we simply always assume it's there if MSVC is detected.
2024-09-23 10:42:31 -07:00
Qi Wang
de5606d0d8 Fix a missing init value warning caught by static analysis. 2024-09-20 16:56:07 -07:00
Qi Wang
1960536b61 Add malloc_mutex_is_locked() sanity checks. 2024-09-20 16:56:07 -07:00
Qi Wang
3eb7a4b53d Fix mutex state tracking around pthread_cond_wait().
pthread_cond_wait drops and re-acquires the mutex internally, w/o
going through our wrapper.  Update the locked state explicitly.
2024-09-20 16:56:07 -07:00
Qi Wang
661fb1e672 Fix the locked flag for malloc_mutex_trylock(). 2024-09-20 16:56:07 -07:00
Guangli Dai
db4f0e7182 Add travis tests for arm64. 2024-09-12 15:40:04 -07:00
Nathan Slingerland
8c2e15d1a5 Add malloc_open() / malloc_close() reentrancy safe helpers 2024-09-12 15:38:08 -07:00
Nathan Slingerland
60f472f367 Fix initialization of pop_attempt_results in bin_batching test 2024-09-12 11:36:17 -07:00
Qi Wang
323ed2e3a8 Optimize fast path to allow static size class computation.
After inlining at LTO time, many callsites have the input size known, which means the
index and usable size can be translated at compile time.  However the size-index
lookup table prevents it -- this commit solves that by switching to the compute
approach when the size is detected to be a known const.
2024-09-12 11:34:09 -07:00
Qi Wang
c1a3ca3755 Adjust the value width in stats output.
Some of the values are accumulative and can reach high after running for long
periods.
2024-09-11 14:29:32 -07:00
Qi Wang
3383b98f1b Check if the huge page size is expected when enabling HPA. 2024-09-04 15:43:59 -07:00
Qi Wang
cd05b19f10 Fix the VM over-reservation on aarch64 w/ larger pages.
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.

Similarly, make sure the base alloc has a const 2M alignment.
2024-09-04 15:43:59 -07:00
Shirui Cheng
baa5a90cc6 fix nstime_update_mock in arena_decay unit test 2024-08-29 10:50:33 -07:00
Shirui Cheng
7c99686165 Better handle burst allocation on tcache_alloc_small_hard 2024-08-29 10:50:33 -07:00
Shirui Cheng
0c88be9e0a Regulate GC frequency by requiring a time interval between two consecutive GCs 2024-08-29 10:50:33 -07:00
Shirui Cheng
e2c9f3a9ce Take locality into consideration when doing GC flush 2024-08-29 10:50:33 -07:00
Shirui Cheng
14d5dc136a Allow a range for the nfill passed to arena_cache_bin_fill_small 2024-08-29 10:50:33 -07:00
Shirui Cheng
f68effe4ac Add a runtime option opt_experimental_tcache_gc to guard the new design 2024-08-29 10:50:33 -07:00
Ben Niu
9e123a833c Leverage new Windows API TlsGetValue2 for performance 2024-08-28 16:50:33 -07:00
Qi Wang
e29ac61987 Limit Cirrus CI to freebsd 15 and 14 2024-08-28 16:33:36 -07:00
Qi Wang
bd0a5b0f3b Fix static analysis warnings.
Newly reported warnings included several reserved macro identifiers and
a false-positive used-uninitialized warning.
2024-08-28 16:03:53 -07:00
Guangli Dai
5b72ac098a Remove tests for ppc64 on Travic CI. 2024-08-26 09:53:00 -07:00
Shirui Cheng
8c54637f8c Better trigger race condition in bin_batching unit test 2024-08-23 14:10:04 -07:00
Dmitry Ilvokhin
c7ccb8d7e9 Add experimental prefix to hpa_strict_min_purge_interval
The goal is to make it obvious this option is experimental.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
aaa29003ab Limit maximum number of purged slabs with option
Option `experimental_hpa_max_purge_nhp` was introduced for backward
compatibility reasons: to make it possible to have behaviour similar
to the buggy `hpa_strict_min_purge_interval` implementation.

When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit
on the number of slabs we'll purge on each iteration. Otherwise, we'll
purge no more than `experimental_hpa_max_purge_nhp` hugepages (slabs).
This in turn means we might not purge enough dirty pages to satisfy the
`hpa_dirty_mult` requirement.

The combination of the `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp`
and `hpa_strict_min_purge_interval` options allows us to have a steady
rate of pages returned to the system. This provides stricter latency
guarantees, as the number of `madvise` calls is bounded (and hence the
number of TLB shootdowns is limited), in exchange for weaker memory
usage guarantees.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
143f458188 Fix hpa_strict_min_purge_interval option logic
We update `shard->last_purge` on each call of `hpa_try_purge` if we
purged something. This means that when the `hpa_strict_min_purge_interval`
option is set, only one slab will be purged, because on the next call
the too-frequent-purge protection condition
`since_last_purge_ms < shard->opts.min_purge_interval_ms` will always
be true. This is not the intended behaviour.

Instead, we need to check `min_purge_interval_ms` once and purge as many
pages as needed to satisfy the requirements of the `hpa_dirty_mult`
option.

Made it possible to count the number of actions performed in unit tests
(purge, hugify, dehugify) instead of a binary called/not called.
Extended the current unit tests with cases where we need to purge more
than one page in a purge phase.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
0a9f51d0d8 Simplify hpa_shard_maybe_do_deferred_work
It doesn't make much sense to repeat purging once we are done with
hugification, because we could de-hugify pages that were hugified just a
moment ago for no good reason. Let them wait for the next deferred work
phase instead, and if they still meet the purging conditions then, purge
them.
2024-08-20 10:02:38 -07:00
Amaury Séchet
a25b9b8ba9 Simplify the logic when bumping lg_fill_div. 2024-08-06 13:31:49 -07:00
Shirui Cheng
8fefabd3a4 increase the ncached_max in fill_flush test case to 1024 2024-08-06 13:16:09 -07:00
Shirui Cheng
47c9bcd402 Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items 2024-08-06 13:16:09 -07:00
Shirui Cheng
48f66cf4a2 add a size check when declare a stack array to be less than 2048 bytes 2024-08-06 13:16:09 -07:00
Burton Li
8dc97b1108 Fix NSTIME_MONOTONIC for win32 implementation 2024-07-30 10:30:41 -07:00
Nathan Slingerland
bc32ddff2d Add usize to prof_sample_hook_t 2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin
b66f689764 Emit long string values without truncation
There are a few long options (`bin_shards` and `slab_sizes`, for
example); when they are specified and we emit statistics, the value gets
truncated.

Moved the string-emitting logic into a separate `emitter_emit_str`
function. It will try to emit the string the same way as before and, if
the value is too long, fall back to emitting the rest in chunks of
`BUF_SIZE`.

Justification for long strings (longer than `BUF_SIZE`) is not
supported.
2024-07-29 13:58:31 -07:00
Danny Lin
c893fcd169 Change macOS mmap tag to fix conflict with CoreMedia
Tag 101 is assigned to "CoreMedia Capture Data", which makes for confusing output when debugging.

To avoid conflicts, use a tag in the reserved application-specific range from 240–255 (inclusive).

All assigned tags: 94d3b45284/osfmk/mach/vm_statistics.h (L773-L775)
2024-06-26 14:53:48 -07:00
Shirui Cheng
a1fcbebb18 skip tcache GC for tcache_max unit test 2024-06-25 12:59:45 -07:00
Guangli Dai
8477ec9562 Set dependent as false for all rtree reads without ownership 2024-06-24 10:50:20 -07:00
Guangli Dai
21bcc0a8d4 Make JEMALLOC_CXX_THROW definition compatible with newer C++ versions 2024-06-13 11:03:05 -07:00
Dmitry Ilvokhin
867c6dd7dc Option to guard hpa_min_purge_interval_ms fix
The change in the `hpa_min_purge_interval_ms` handling logic is not
backward compatible, as it might increase memory usage. This logic is
now guarded by the `hpa_strict_min_purge_interval` option.

When `hpa_strict_min_purge_interval` is true, we will purge no more often
than once per `hpa_min_purge_interval_ms`. When
`hpa_strict_min_purge_interval` is false, the old purging behaviour is
preserved.

The long-term strategy is to migrate all users of hpa to the new logic
and then delete the `hpa_strict_min_purge_interval` option.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
91a6d230db Respect hpa_min_purge_interval_ms option
Currently, the hugepage-aware allocator backend works together with the
classic one as a fallback for not-yet-supported allocations. When
background threads are enabled, the wake-up time for the classic backend
interferes with hpa, as there were no checks inside the hpa purging
logic to verify that we are not purging too frequently. If a background
thread is running and `hpa_should_purge` returns true, then we will
purge, even if we purged less than hpa_min_purge_interval_ms ago.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
90c627edb7 Export hugepage size with arenas.hugepage 2024-06-05 15:37:41 -07:00
David Goldblatt
f9c0b5f7f8 Bin batching: add some stats.
This lets us easily see what fraction of flush load is being taken up by the
bins, and helps guide future optimization approaches (for example: should we
prefetch during cache bin fills? It depends on how many objects the average fill
pops out of the batch).
2024-05-22 10:30:31 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
2024-05-22 10:30:31 -07:00
David Goldblatt
44d91cf243 Tcache flush: Partition by bin before locking.
This accomplishes two things:
- It avoids a full array scan (and any attendant branch prediction misses, etc.)
  while holding the bin lock.
- It allows us to know the number of items that will be flushed before flushing
  them, which will (in an upcoming commit) let us know if it's safe to use the
  batched flush (in which case we won't acquire the bin mutex).
2024-05-22 10:30:31 -07:00
David Goldblatt
6e56848850 Tcache: Split up small/large handling.
The main bits of shared code are the edata filtering and the stats flushing
logic, both of which are fairly simple to read and not so painful to duplicate.
The shared code comes at the cost of guarding all the subtle logic with
`if (small)`, which doesn't feel worth it.
2024-05-22 10:30:31 -07:00
David Goldblatt
c085530c71 Tcache batching: Plumbing
In the next commit, we'll start using the batcher to eliminate mutex traffic.
To avoid cluttering up that commit with the random bits of busy-work it entails,
we'll centralize them here.  This commit introduces:
- A batched bin type.
- The ability to mix batched and unbatched bins in the arena.
- Conf parsing to set batches per size and a max batched size.
- mallctl access to the corresponding opt-namespace keys.
- Stats output of the above.
2024-05-22 10:30:31 -07:00
David Goldblatt
70c94d7474 Add batcher module.
This can be used to batch up simple operation commands for later use by another
thread.
2024-05-22 10:30:31 -07:00
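A toy version of the batching idea, assuming a fixed-capacity array guarded by a mutex (an illustrative sketch, not the actual batcher API):
```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_CAP 64

/* Sketch: one thread pushes deferred items; another thread later drains the
 * whole batch at once, paying for the lock once per batch rather than once
 * per item. */
typedef struct {
	pthread_mutex_t mtx;
	void *items[BATCH_CAP];
	size_t nitems;
} batcher_t;

static bool
batcher_push(batcher_t *b, void *item) {
	pthread_mutex_lock(&b->mtx);
	bool ok = b->nitems < BATCH_CAP;
	if (ok) {
		b->items[b->nitems++] = item;
	}
	pthread_mutex_unlock(&b->mtx);
	return ok;
}

static size_t
batcher_pop_all(batcher_t *b, void **out) {
	pthread_mutex_lock(&b->mtx);
	size_t n = b->nitems;
	for (size_t i = 0; i < n; i++) {
		out[i] = b->items[i];
	}
	b->nitems = 0;
	pthread_mutex_unlock(&b->mtx);
	return n;
}
```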
David Goldblatt
86f4851f5d Add clang static analyzer suppression macro. 2024-05-22 10:30:31 -07:00
Amaury Séchet
5afff2e44e Simplify the logic in tcache_gc_small. 2024-05-02 18:52:19 -07:00
Qi Wang
8d8379da44 Fix background_thread creation for the oversize_arena.
Bypassing background thread creation for the oversize_arena used to be an
optimization since that arena had eager purging.  However #2466 changed the
purging policy for the oversize_arena -- specifically it switched to the default
decay time when background_thread is enabled.

This issue is noticeable when the number of arenas is low: whenever the total #
of arenas is <= 4 (the default max # of background threads), purging will be
stalled since no background thread is created for the oversize_arena.
2024-05-02 14:45:18 -07:00
Dmitry Ilvokhin
47d69b4eab HPA: Fix infinite purging loop
One of the conditions to start purging is that the `hpa_hugify_blocked_by_ndirty`
function call returns true. This can happen in cases where we have no
dirty memory for this shard at all, in which case the purging loop becomes an
infinite loop.

`hpa_hugify_blocked_by_ndirty` was introduced in 0f6c420, but at that
time the purging loop had a different form and an additional `break` was not
required. The purging loop was re-written in 6630c5989, but the additional
exit condition wasn't added there at the time.

Repro code was shared by Patrik Dokoupil at [1]; I stripped it down to the
minimum needed to reproduce the issue in jemalloc unit tests.

[1]: https://github.com/jemalloc/jemalloc/pull/2533
2024-04-30 13:46:32 -07:00
Qi Wang
fa451de17f Fix the tcache flush sanity checking around ncached and nstashed.
When there are many items stashed, it's possible that after flushing the stashed
items, ncached is already lower than the remaining count, in which case the flush
can simply return at that point.
2024-04-12 16:01:55 -07:00
debing.sun
630434bb0a Fixed type error with allocated that caused incorrect printing on 32bit 2024-04-09 14:44:43 -07:00
Shirui Cheng
4b555c11a5 Enable heap profiling on MacOS 2024-04-09 12:57:01 -07:00
Daniel Hodges
11038ff762 Add support for namespace pids in heap profile names
This change adds support for writing the pid namespace to the filename of a
heap profile. When running with namespaces, pids may be reused across
namespaces, and if the mounts where profiles are written are shared, there is
not a great way to differentiate profiles between pids.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodgesd@fb.com>
2024-04-09 10:27:52 -07:00
Qi Wang
83b075789b rallocx path: only set errno on the realloc case. 2024-04-05 17:41:43 -07:00
Shirui Cheng
5081c16bb4 Experimental calloc implementation with using memset on larger sizes 2024-04-04 15:31:56 -07:00
Juhyung Park
38056fea64 Set errno to ENOMEM on rallocx() OOM failures
realloc() and rallocx() share a path, and realloc() should set errno to
ENOMEM upon OOM failures.

Fixes: ee961c2310 ("Merge realloc and rallocx pathways.")
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-04-04 15:13:22 -07:00
Dmitry Ilvokhin
268e8ee880 Include HPA ndirty into page allocator ndirty stat 2024-04-04 12:17:30 -07:00
Dmitry Ilvokhin
b2e59a96e1 Introduce getters for page allocator shard stats
Access nactive, ndirty and nmuzzy through getters and not directly.
There is no functional change, but getters are required to propagate
HPA's statistics up to the page allocator's statistics.
2024-04-04 12:17:30 -07:00
Amaury Séchet
92aa52c062 Reduce nesting in phn_merge_siblings using an early return. 2024-03-14 13:08:17 -07:00
Amaury Séchet
10d713151d Ensure that the root of a heap is always the best element. 2024-03-14 13:07:45 -07:00
Minsoo Choo
1978e5cdac Update actions/checkout and actions/upload-artifact to v4 2024-03-12 12:59:15 -07:00
XChy
ed9b00a96b Replace unsigned induction variable with size_t in background_threads_enable
This patch avoids unnecessary vectorizations in clang and missed recognition of memset in gcc. See also https://godbolt.org/z/aoeMsjr4c.
2024-03-05 14:54:50 -08:00
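For illustration, the difference is roughly the following sketch (the bound and array are placeholders, not the actual code in background_threads_enable):
```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARENAS 4096  /* placeholder bound for the sketch */

static bool marked[MAX_ARENAS];

void
mark_all_unsigned(unsigned narenas) {
	/* An 'unsigned' induction variable can force the compiler to reason
	 * about 32-bit wrap-around on 64-bit targets, hindering memset
	 * recognition and inviting pointless vectorization. */
	for (unsigned i = 0; i < narenas; i++) {
		marked[i] = true;
	}
}

void
mark_all_size_t(size_t narenas) {
	/* With size_t the trip count matches the pointer width, and both gcc
	 * and clang readily turn the loop into a memset. */
	for (size_t i = 0; i < narenas; i++) {
		marked[i] = true;
	}
}
```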
Shirui Cheng
373884ab48 print out all malloc_conf settings in stats 2024-02-29 12:12:44 -08:00
Qi Wang
1aba4f41a3 Allow zero sized memalign to pass.
Instead of failing on assertions.  Previously the same change was made for
posix_memalign and aligned_alloc (#1554).  Make memalign behave the same way
even though it's obsolete.
2024-02-16 13:06:07 -08:00
Qi Wang
6d181bc1b7 Fix Cirrus CI.
13.0-RELEASE does not exist anymore.  "The resource
'projects/freebsd-org-cloud-dev/global/images/family/freebsd-13-0' was not
found"
2024-02-16 13:05:40 -08:00
David Goldblatt
f96010b7fa gitignore: Start ignoring clangd dirs. 2024-01-23 17:02:01 -08:00
Qi Wang
a2c5267409 HPA: Allow frequently reused allocs to bypass the slab_max_alloc limit, as long as
they are within the huge page size.  Internal fragmentation within huge pages is not
a concern for these requests, since the entire range is expected to be
accessed.
2024-01-18 14:51:04 -08:00
guangli-dai
b1792c80d2 Add LOGs when entering and exiting free and sdallocx. 2024-01-11 14:37:20 -08:00
Qi Wang
05160258df When safety_check_fail, also embed the hint msg in the abort function name,
because there are cases where only crash stack traces are logged.
2024-01-11 14:19:54 -08:00
Qi Wang
3a6296e1ef Disable FreeBSD on Travis CI since it's not working.
Travis CI currently provides only FreeBSD 12 which is EOL.
2024-01-04 14:47:52 -08:00
Minsoo Choo
d284aad027 Test on more FreeBSD versions
Added 14.0-RELEASE
Added 15-CURRENT
Added 14-STABLE
Added 13-STABLE

13.0-RELEASE will be updated when 13.3-RELEASE comes out.
2024-01-04 12:48:24 -08:00
Connor
dfb3260b97 Fix missing cleanup message for collected profiles.
```
sub cleanup {
  unlink($main::tmpfile_sym);
  unlink(keys %main::tempnames);

  # We leave any collected profiles in $HOME/jeprof in case the user wants
  # to look at them later.  We print a message informing them of this.
  if ((scalar(@main::profile_files) > 0) &&
      defined($main::collected_profile)) {
    if (scalar(@main::profile_files) == 1) {
      print STDERR "Dynamically gathered profile is in $main::collected_profile\n";
    }
    print STDERR "If you want to investigate this profile further, you can do:\n";
    print STDERR "\n";
    print STDERR "  jeprof \\\n";
    print STDERR "    $main::prog \\\n";
    print STDERR "    $main::collected_profile\n";
    print STDERR "\n";
  }
}
```
On cleanup, it would print out a message for the collected profile.
If there is only one collected profile, it would be popped at L691; then `scalar(@main::profile_files)` would be 0, and no message would be printed.
2024-01-03 14:24:38 -08:00
Honggyu Kim
f6fe6abdcb build: Make autogen.sh accept quoted extra options
The current autogen.sh script doesn't allow receiving quoted extra
options.

If someone wants to pass extra CFLAGS that are split into multiple
options by whitespace, then quoting is required.

However, the configure inside autogen.sh fails in this case as follows.

  $ ./autogen.sh CFLAGS="-Dmmap=cxl_mmap -Dmunmap=cxl_munmap"
  autoconf
  ./configure --enable-autogen CFLAGS=-Dmmap=cxl_mmap -Dmunmap=cxl_munmap
  configure: error: unrecognized option: `-Dmunmap=cxl_munmap'
  Try `./configure --help' for more information
  Error 0 in ./configure

This is because the quoting is discarded unexpectedly when calling configure.

This patch fixes this problem.

Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
2024-01-03 14:20:34 -08:00
guangli-dai
eda05b3994 Fix static analysis warnings. 2024-01-03 14:18:52 -08:00
Shirui Cheng
e4817c8d89 Cleanup cache_bin_info_t* info input args 2023-10-25 10:27:31 -07:00
Qi Wang
3025b021b9 Optimize mutex and bin alignment / locality. 2023-10-23 20:28:26 -07:00
guangli-dai
e2cd27132a Change stack_size assertion back to the more compatible one. 2023-10-23 20:28:26 -07:00
guangli-dai
756d4df2fd Add util.c into vs project file. 2023-10-18 22:11:13 -07:00
Qi Wang
04d1a87b78 Fix a zero-initializer warning on macOS. 2023-10-18 14:12:43 -07:00
guangli-dai
d88fa71bbd Fix nfill = 0 bug when ncached_max is 1 2023-10-18 14:11:46 -07:00
guangli-dai
6fb3b6a8e4 Refactor the tcache initialization
1. Pre-generate all default tcache ncached_max in tcache_boot;
2. Add getters returning default ncached_max and ncached_max_set;
3. Refactor tcache init so that it is always init with a given setting.
2023-10-18 14:11:46 -07:00
guangli-dai
8a22d10b83 Allow setting default ncached_max for each bin through malloc_conf 2023-10-18 14:11:46 -07:00
guangli-dai
867eedfc58 Fix the bug in dalloc promoted allocations.
A sufficiently small allocation will be promoted so that it does not
share an extent with others.  However, on dalloc, such allocations
may not be dalloc'd as promoted ones if nbins < SC_NBINS.  This
commit fixes the bug.
2023-10-17 14:53:23 -07:00
guangli-dai
630f7de952 Add mallctl to set and get ncached_max of each cache_bin.
1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the
    ncached_max of the bin for the input size class, passed in through
    oldp (the size is rounded up to a bin size if an exact bin size is not given).
2. `thread_tcache_ncached_max_write` takes in a char array
    representing the settings for bins in the tcache.
2023-10-17 14:53:23 -07:00
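A hypothetical usage sketch of these controls via mallctl; the key names and the settings-string format below are assumptions for illustration, not taken verbatim from this commit:
```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
	/* Query ncached_max for a size class: the size goes in through oldp
	 * and the answer comes back in the same slot (assumed convention). */
	size_t val = 4096;
	size_t len = sizeof(val);
	if (mallctl("thread.tcache.ncached_max.read_sizeclass", &val, &len,
	    NULL, 0) == 0) {
		printf("ncached_max for 4096-byte bins: %zu\n", val);
	}

	/* Set ncached_max for a range of bin sizes via a settings string;
	 * "0-4096:200" is a hypothetical format. */
	const char *settings = "0-4096:200";
	if (mallctl("thread.tcache.ncached_max.write", NULL, NULL,
	    (void *)&settings, sizeof(settings)) == 0) {
		printf("updated tcache ncached_max settings\n");
	}
	return 0;
}
```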
guangli-dai
6b197fdd46 Pre-generate ncached_max for all bins for better tcache_max tuning experience. 2023-10-17 14:53:23 -07:00
Shirui Cheng
36becb1302 metadata usage breakdowns: tracking edata and rtree usages 2023-10-11 11:56:01 -07:00
Qi Wang
005f20aa7f Fix comments about malloc_conf to enable logging. 2023-10-04 11:49:10 -07:00
guangli-dai
7a9e4c9073 Mark jemalloc.h as system header to resolve header conflicts. 2023-10-04 11:41:30 -07:00
Qi Wang
72cfdce718 Allocate tcache stack from base allocator
When using metadata_thp, allocate tcache bin stacks from base0, which means they
will be placed on huge pages along with other metadata, instead of mixed with
other regular allocations.

In order to do so, modified the base allocator to support limited reuse: freed
tcached stacks (from thread termination) will be returned to base0 and made
available for reuse, but no merging will be attempted since they were bump
allocated out of base blocks. These reused base extents are managed using
separately allocated base edata_t -- they are cached in base->edata_avail when
the extent is all allocated.

One tricky part is that stats updating must be skipped for such reused extents
(since they were accounted for already, and there is no purging for base). This
requires tracking the "is reused" state explicitly and bypassing the stats
updates when allocating from them.
2023-09-18 12:18:32 -07:00
guangli-dai
a442d9b895 Enable per-tcache tcache_max
1. add tcache_max and nhbins into tcache_t so that they are per-tcache,
   with one auto tcache per thread, it's also per-thread;
2. add mallctl for each thread to set its own tcache_max (of its auto tcache);
3. store the maximum number of items in each bin instead of using a global storage;
4. add tests for the modifications above.
5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.
2023-09-06 10:47:14 -07:00
guangli-dai
fbca96c433 Remove unnecessary parameters for cache_bin_postincrement. 2023-09-06 10:47:14 -07:00
Evers Chen
7d9eceaf38 Fix array bounds false warning in gcc 12.3.0
1. error: array subscript 232 is above array bounds of ‘size_t[232]’ in gcc 12.3.0
2. it is also an optimization of the code
2023-09-05 14:33:55 -07:00
BtbN
ce8ce99a4a Expose jemalloc_prefix via pkg-config 2023-09-05 14:30:21 -07:00
BtbN
ed7e6fe71a Expose private library dependencies via pkg-config
When linking statically, these need to be included for linking to succeed.
2023-09-05 14:29:33 -07:00
Qi Wang
7d563a8f81 Update safety check message to remove --enable-debug when it's already on. 2023-09-05 14:15:45 -07:00
Qi Wang
b71da25b8a Fix reading CPU id using rdtscp.
As pointed out in #2527, the correct register containing the CPU id should be ecx
instead of edx.
2023-08-28 11:46:39 -07:00
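For reference, `rdtscp` returns the TSC in EDX:EAX and the contents of IA32_TSC_AUX in ECX, which is where the kernel stores the CPU id; a minimal sketch (not the exact jemalloc code):
```c
#include <stdint.h>

/* Sketch: read the current CPU id via rdtscp.  The id lives in the ECX
 * output (IA32_TSC_AUX), not in EDX; on Linux the low 12 bits hold the
 * CPU number. */
static inline uint32_t
cpu_id_rdtscp(void) {
	uint32_t eax, edx, ecx;
	__asm__ volatile("rdtscp" : "=a"(eax), "=d"(edx), "=c"(ecx));
	(void)eax;
	(void)edx;
	return ecx & 0xfff;
}
```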
Qi Wang
87c56c8df8 Fix arenas.i.bins.j.mutex link id in manual. 2023-08-28 11:01:13 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
Kevin Svetlitski
d2c9ed3d1e Ensure short read(2)s/write(2)s are properly handled by IO utilities
`read(2)` and `write(2)` may read or write fewer bytes than were
requested. In order to robustly ensure that all of the requested bytes
are read/written, these edge-cases must be handled.
2023-08-11 13:36:24 -07:00
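The usual pattern for handling short writes looks roughly like this (a generic sketch, not the jemalloc utility itself):
```c
#include <errno.h>
#include <unistd.h>

/* Sketch: keep calling write(2) until all n bytes have been written,
 * retrying on EINTR and advancing past partial writes. */
static int
write_all(int fd, const void *buf, size_t n) {
	const char *p = buf;
	while (n > 0) {
		ssize_t r = write(fd, p, n);
		if (r < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		p += r;
		n -= (size_t)r;
	}
	return 0;
}
```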
guangli-dai
254c4847e8 Print colorful reminder for failed tests. 2023-08-08 15:01:07 -07:00
Kevin Svetlitski
4f50f782fa Use compiler-provided assume builtins when available
There are several benefits to this:
1. It's cleaner and more reliable to use the builtin to
   inform the compiler of assumptions instead of hoping that the
   optimizer understands your intentions.
2. `clang` will warn you if any of your assumptions would produce
   side-effects (which the compiler will discard). [This blog post](https://fastcompression.blogspot.com/2019/01/compiler-checked-contracts.html)
   by Yann Collet highlights that a hazard of using the
   `unreachable()`-based method of signaling assumptions is that it
   can sometimes result in additional instructions being generated (see
   [this Godbolt link](https://godbolt.org/z/lKNMs3) from the blog post
   for an example).
2023-08-08 14:59:36 -07:00
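The shape of such a macro is roughly the following (a sketch; jemalloc's actual macro name and fallbacks may differ):
```c
/* Sketch: prefer a dedicated assume builtin where available, otherwise fall
 * back to the unreachable()-based idiom, otherwise do nothing. */
#if defined(__clang__)
#  define ASSUME(expr) __builtin_assume(expr)
#elif defined(__GNUC__)
#  define ASSUME(expr) do { if (!(expr)) __builtin_unreachable(); } while (0)
#else
#  define ASSUME(expr) ((void)0)
#endif
```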
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of
configuration parsing and either normalizing them or aborting depending on
if `abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
424dd61d57 Issue a warning upon directly accessing an arena's bins
An arena's bins should normally be accessed via the `arena_get_bin`
function, which properly takes into account bin-shards. To ensure that
we don't accidentally commit code which incorrectly accesses the bins
directly, we mark the field with `__attribute__((deprecated))` with an
appropriate warning message, and suppress the warning in the few places
where directly accessing the bins is allowed.
2023-08-04 15:47:05 -07:00
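The mechanism is the `deprecated` attribute applied to the field, roughly like this sketch (the type and message are illustrative, not the real arena definition):
```c
/* Sketch: any direct access to 'bins' now trips -Wdeprecated-declarations;
 * the few call sites allowed to touch it suppress the warning locally. */
typedef struct bin_s bin_t;

typedef struct {
	bin_t *bins
#if defined(__GNUC__) || defined(__clang__)
	    __attribute__((deprecated("use arena_get_bin() instead")))
#endif
	    ;
} arena_sketch_t;
```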
Kevin Svetlitski
120abd703a Add support for the deprecated attribute
This is useful for enforcing the usage of getter/setter functions to
access fields which are considered private or have unique access constraints.
2023-08-04 15:47:05 -07:00
Kevin Svetlitski
162ff8365d Update the Ubuntu version used by Travis CI
Update from Ubuntu Focal Fossa to Ubuntu Jammy Jellyfish. Staying up to
date is always good, but I'm also hoping that perhaps this newer release
contains fixes so that PowerPC VMs don't randomly hang indefinitely
while booting anymore, stalling our CI pipeline.
2023-08-04 15:32:15 -07:00
Kevin Svetlitski
07a2eab3ed Stop over-reporting memory usage from sampled small allocations
@interwq noticed [while reviewing an earlier PR](https://github.com/jemalloc/jemalloc/pull/2478#discussion_r1256217261)
that I missed modifying this statistics accounting in line with the rest
of the changes from #2459. This is now fixed, such that sampled small
allocations increment the `.nmalloc`/`.ndalloc` of their effective bin
size instead of over-reporting memory usage by attributing all such
allocations to `SC_LARGE_MINCLASS`.
2023-08-03 16:12:22 -07:00
Kevin Svetlitski
ea5b7bea31 Add configuration option controlling DSS support
In many environments, the fallback `sbrk(2)` allocation path is never
used even if the system supports the syscall; if you're at the point
where `mmap(2)` is failing, `sbrk(2)` is unlikely to succeed. Without
changing the default, I've added the ability to disable the usage of DSS
altogether, so that you do not need to pay for the additional code size
and handful of extra runtime branches in such environments.
2023-08-03 11:52:25 -07:00
Qi Wang
6816b23862 Include the unrecognized malloc conf option in the error message.
Previously the option causing trouble would not be printed unless the option
key:value pair format was found.
2023-08-02 10:44:55 -07:00
Kevin Svetlitski
62648c88e5 Ensured sampled allocations are properly deallocated during arena_reset
Sampled allocations were not being demoted before being deallocated
during an `arena_reset` operation.
2023-08-01 11:35:37 -07:00
Kevin Svetlitski
b01d496646 Add an override for the compile-time malloc_conf to jemalloc_internal_overrides.h 2023-07-31 14:53:15 -07:00
Kevin Svetlitski
9ba1e1cb37 Make ctl_arena_clear slightly more efficient
While this function isn't particularly hot, (accounting for just 0.27% of
time spent inside the allocator on average across the fleet), looking
at the generated assembly and performance profiles does show we're dispatching
to multiple different `memset`s when we could instead be just tail-calling
`memset` once, reducing code size and marginally improving performance.
2023-07-31 14:44:04 -07:00
Kevin Svetlitski
8ff7e7d6c3 Remove errant #includes in public jemalloc.h header
In an attempt to make all headers self-contained, I inadvertently added
`#include`s which refer to intermediate, generated headers that aren't
included in the final install. Closes #2489.
2023-07-25 16:26:50 -07:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
4827bb17bd Remove vestigial TCACHE_STATE_* macros 2023-07-24 14:40:42 -07:00
Kevin Svetlitski
1431153695 Define SBRK_INVALID instead of using a magic number 2023-07-24 14:40:42 -07:00
Kevin Svetlitski
7e54dd1ddb Define PROF_TCTX_SENTINEL instead of using magic numbers
This makes the code more readable on its own, and also sets the stage
for more cleanly handling the pointer provenance lints in a following
commit.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
c49c17f128 Suppress verbose frame address warnings
These warnings are not useful, and make the output of some CI jobs
enormous and difficult to read, so let's suppress them.
2023-07-24 10:44:17 -07:00
Kevin Svetlitski
cdb2c0e02f Implement C23's free_sized and free_aligned_sized
[N2699 - Sized Memory Deallocation](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2699.htm)
introduced two new functions which were incorporated into the C23
standard, `free_sized` and `free_aligned_sized`. Both already have
analogues in Jemalloc, all we are doing here is adding the appropriate
wrappers.
2023-07-20 15:06:41 -07:00
Kevin Svetlitski
41e0b857be Make headers self-contained by fixing #includes
Header files are now self-contained, which makes the relationships
between the files clearer, and crucially allows LSP tools like `clangd`
to function correctly in all of our header files. I have verified that
the headers are self-contained (aside from the various Windows shims) by
compiling them as if they were C files – in a follow-up commit I plan to
add this to CI to ensure we don't regress on this front.
2023-07-14 09:06:32 -07:00
Kevin Svetlitski
856db56f6e Move tsd implementation details into tsd_internals.h
This is a prerequisite to achieving self-contained headers. Previously,
the various tsd implementation headers (`tsd_generic.h`,
`tsd_tls.h`, `tsd_malloc_thread_cleanup.h`, and `tsd_win.h`) relied
implicitly on being included in `tsd.h` after a variety of dependencies
had been defined above them. This commit instead makes these
dependencies explicit by splitting them out into a separate file,
`tsd_internals.h`, which each of the tsd implementation headers includes
directly.
2023-07-14 09:06:32 -07:00
Kevin Svetlitski
36ca0c1b7d Stop concealing pointer provenance in phn_link_get
At least for LLVM, [casting from an integer to a pointer hides provenance information](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
and inhibits optimizations. Here's a [Godbolt link](https://godbolt.org/z/5bYPcKoWT)
showing how this change removes a couple unnecessary branches in
`phn_merge_siblings`, which is a very hot function. Canary profiles show
only minor improvements (since most of the cost of this function is in
cache misses), but there's no reason we shouldn't take it.
2023-07-13 15:12:31 -07:00
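The transformation is essentially the following (a generic sketch):
```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: computing a pointer by round-tripping through an integer hides
 * provenance from the optimizer; doing the arithmetic on a char * keeps it. */
static inline void *
member_at_int_cast(void *base, size_t offset) {
	return (void *)((uintptr_t)base + offset);  /* provenance hidden */
}

static inline void *
member_at_ptr_arith(void *base, size_t offset) {
	return (void *)((char *)base + offset);     /* provenance preserved */
}
```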
Kevin Svetlitski
314c073a38 Print the failed assertion before aborting in test cases
This makes it faster and easier to debug, so that you don't need to fire
up a debugger just to see which assertion triggered in a failing test.
2023-07-13 15:07:17 -07:00
Kevin Svetlitski
65d3b5989b Print test error messages in color when stderr is a terminal
When stderr is a terminal and supports color, print error messages
from tests in red to make them stand out from the surrounding output.
2023-07-13 13:03:23 -07:00
Kevin Svetlitski
1d9e9c2ed6 Fix inconsistent parameter names between definition/declaration pairs
For the sake of consistency, function definitions and their
corresponding declarations should use the same names for parameters.
I've enabled this check in static analysis to prevent this issue from
occurring again in the future.
2023-07-13 12:59:47 -07:00
Kevin Svetlitski
5711dc31d8 Only enable -Wstrict-prototypes in CI to unbreak feature detection
Adding `-Wstrict-prototypes` to the default `CFLAGS` in PR #2473 had the
non-obvious side-effect of breaking configure-time feature detection,
because the [test-program `autoconf` generates for feature
detection](https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Generating-Sources.html#:~:text=main%20())
defines `main` as:
```c
int main()
```
Which causes all feature checks to fail, since this triggers
`-Wstrict-prototypes` and the feature checks use `-Werror`.

Resolved by only adding `-Wstrict-prototypes` to
`EXTRA_{CFLAGS,CXXFLAGS}` in CI, since these flags are not used during
feature detection and we control which compiler is used.
2023-07-06 18:03:13 -07:00
Kevin Svetlitski
589c63b424 Make eligible global variables static and/or const
For better or worse, Jemalloc has a significant number of global
variables. Making all eligible global variables `static` and/or `const`
at least makes it slightly easier to reason about them, as these
qualifications communicate to the programmer restrictions on their use
without having to `grep` the whole codebase.
2023-07-06 14:15:12 -07:00
Qi Wang
e249d1a2a1 Remove unreachable code. 2023-07-06 12:06:06 -07:00
Qi Wang
602edd7566 Enabled -Wstrict-prototypes and fixed warnings. 2023-07-06 12:00:02 -07:00
Kevin Svetlitski
ebd7e99f5c Add a test-case for small profiled allocations
Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`)
which are sampled for profiling maintain the expected invariants even
though they now take up less space.
2023-07-03 16:19:06 -07:00
Kevin Svetlitski
5a858c64d6 Reduce the memory overhead of sampled small allocations
Previously, small allocations which were sampled as part of heap
profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory
usage becomes problematic when the page size is increased, as noted in #2358.

Small allocations are now rounded up to the nearest multiple of `PAGE`
instead, reducing the memory overhead by a factor of 4 in the most
extreme cases.
2023-07-03 16:19:06 -07:00
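In other words, the sampled usable size goes from a fixed SC_LARGE_MINCLASS to a page-rounded value; a rough sketch assuming 4 KiB pages (where SC_LARGE_MINCLASS is 16384):
```c
#include <stddef.h>

#define PAGE ((size_t)4096)  /* assumed page size for the sketch */
#define PAGE_CEILING(s) (((s) + PAGE - 1) & ~(PAGE - 1))

/* Sketch: a sampled small allocation of 100 bytes now occupies
 * PAGE_CEILING(100) == 4096 bytes rather than being rounded up to
 * SC_LARGE_MINCLASS; the savings grow further with larger page sizes. */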
Kevin Svetlitski
e1338703ef Address compiler warnings in the unit tests 2023-07-03 16:06:35 -07:00
Qi Wang
d131331310 Avoid eager purging on the dedicated oversize arena when using bg thds.
We have observed new workload patterns (namely ML training type) that cycle
through oversized allocations frequently, because 1) the dataset might be sparse
which is faster to go through, and 2) GPU accelerated.  As a result, the eager
purging from the oversize arena becomes a bottleneck.  To offer an easy
solution, allow normal purging of the oversized extents when background threads
are enabled.
2023-06-27 11:57:41 -07:00
Kevin Svetlitski
46e464a26b Fix downloading LLVM in GitHub Action
It turns out LLVM does not include a build for every platform in the
assets for every release, just some of them. As such, I've pinned us to
the latest release version with a corresponding build.
2023-06-23 14:30:49 -07:00
Kevin Svetlitski
f2e00d2fd3 Remove trailing whitespace
Additionally, added a GitHub Action to ensure no more trailing
whitespace will creep in again in the future.

I'm excluding Markdown files from this check, since trailing whitespace
is significant there, and also excluding `build-aux/install-sh` because
there is significant trailing whitespace on the line that sets
`defaultIFS`.
2023-06-23 11:58:18 -07:00
Kevin Svetlitski
05385191d4 Add GitHub action which runs static analysis
Now that all of the various issues that static analysis uncovered have
been fixed (#2431, #2432, #2433, #2436, #2437, #2446), I've added a
GitHub action which will run static analysis for every PR going forward.
When static analysis detects issues with your code, the GitHub action
provides a link to download its findings in a form tailored for human
consumption.

Take a look at [this demonstration of what it looks like when static
analysis issues are
found](https://github.com/Svetlitski/jemalloc/actions/runs/5010245602)
on my fork for an example (make sure to follow the instructions in the
error message to download and inspect the results).
2023-06-23 11:55:43 -07:00
Kevin Svetlitski
bb0333e745 Fix remaining static analysis warnings
Fix or suppress the remaining warnings generated by static analysis.
This is a necessary step before we can incorporate static analysis into
CI. Where possible, I've preferred to modify the code itself instead of
just disabling the warning with a magic comment, so that if we decide to
use different static analysis tools in the future we will be covered
against them raising similar warnings.
2023-06-23 11:50:29 -07:00
Kevin Svetlitski
210f0d0b2b Fix read of uninitialized data in prof_free
In #2433, I inadvertently introduced a regression which causes the use of
uninitialized data. Namely, the control path I added for the safety
check in `arena_prof_info_get` neglected to set `prof_info->alloc_tctx`
when the check fails, resulting in `prof_info.alloc_tctx` being
uninitialized [when it is read at the end of
`prof_free`](90176f8a87/include/jemalloc/internal/prof_inlines.h (L272)).
2023-06-15 18:30:05 -07:00
Kevin Svetlitski
90176f8a87 Fix segfault in rb *_tree_remove
Static analysis flagged this. It's possible to segfault in the
`*_tree_remove` function generated by `rb_gen`, as `nodep` may
still be `NULL` after the initial for loop. I can confirm from reviewing
the fleetwide coredump data that this was in fact being hit in
production, primarily through `tctx_tree_remove`, and much more rarely
through `gctx_tree_remove`.
2023-06-07 14:48:41 -07:00
Qi Wang
86eb49b478 Fix the arena selection for oversized allocations.
Use the per-arena oversize_threshold, instead of the global setting.
2023-06-06 15:03:13 -07:00
Christos Zoulas
5832ef6589 Use a local variable to set the alignment for this particular allocation
instead of changing mmap_flags which makes the change permanent. This was
enforcing large alignments for allocations that did not need it causing
fragmentation. Reported by Andreas Gustafsson.
2023-05-31 14:44:24 -07:00
Kevin Svetlitski
6d4aa33753 Extract the calculation of psset heap assignment for an hpdata into a common function
This is in preparation for upcoming changes I plan to make to this
logic. Extracting it into a common function will make this easier and
less error-prone, and cleans up the existing code regardless.
2023-05-31 11:44:04 -07:00
Arne Welzel
c1d3ad4674 Prune je_malloc_default and do_rallocx in jeprof
Running a simple Ruby and Python execution shows je_malloc_default and
do_rallocx() in the resulting SVG / text output. Prune these, too.

    MALLOC_CONF='stats_print:true,lg_prof_sample:8,prof:true,prof_final:true' \
        python3 -c '[x for x in range(10000000)]'

    MALLOC_CONF='stats_print:true,lg_prof_sample:8,prof:true,prof_final:true' \
        ruby -e 'puts (0..1000).map{"0"}.join(" ")'
2023-05-31 11:41:09 -07:00
Arne Welzel
d59e30cbc9 Rename fallback_impl to fallbackNewImpl and prune in jeprof
The existing fallback_impl name seemed a bit generic, and given
it's static it is probably okay to rename.

Closes #2451
2023-05-31 11:41:09 -07:00
Qi Wang
d577e9b588 Explicitly cast to unsigned for MALLOCX_ARENA and _TCACHE defines. 2023-05-26 11:52:42 -07:00
Qi Wang
a2259f9fa6 Fix the include path of "jemalloc_internal_overrides.h". 2023-05-25 15:22:02 -07:00
Kevin Svetlitski
9c32689e57 Fix bug where hpa_shard was not being destroyed
It appears that this was a simple mistake where `hpa_shard_disable` was
being called instead of `hpa_shard_destroy`. At present
`hpa_shard_destroy` is not called anywhere at all outside of test-cases,
which further suggests that this is a bug. @davidtgoldblatt noted
however that since HPA is disabled for manual arenas and we don't
support destruction for auto arenas that presently there is no way to
actually trigger this bug. Nonetheless, it should be fixed.
2023-05-18 14:17:38 -07:00
Kevin Svetlitski
4e6f1e9208 Allow overriding LG_PAGE
This is useful for our internal builds where we override the
configuration in the header files generated by autoconf.
2023-05-17 13:55:38 -07:00
Kevin Svetlitski
3e2ba7a651 Remove dead stores detected by static analysis
None of these are harmful, and they are almost certainly optimized
away by the compiler. The motivation for fixing them anyway is that
we'd like to enable static analysis as part of CI, and the first step
towards that is resolving the warnings it produces at present.
2023-05-11 20:27:49 -07:00
Kevin Svetlitski
0288126d9c Fix possible NULL pointer dereference from mallctl("prof.prefix", ...)
Static analysis flagged this issue. Here is a minimal program which
causes a segfault within Jemalloc:
```
#include <jemalloc/jemalloc.h>

const char *malloc_conf = "prof:true";

int main() {
  mallctl("prof.prefix", NULL, NULL, NULL, 0);
}
```

Fixed by checking if `prefix` is `NULL`.
2023-05-11 14:47:50 -07:00
Qi Wang
d4a2b8bab1 Add the prof_sys_thread_name feature in the prof_recent unit test.
This tests the combination of the prof_recent and thread_name features.
Verified that it catches the issue being fixed in this PR.

Also explicitly set thread name in test/unit/prof_recent.  This fixes the name
testing when no default thread name is set (e.g. FreeBSD).
2023-05-11 09:10:57 -07:00
Qi Wang
94ace05832 Fix the prof thread_name reference in prof_recent dump.
As pointed out in #2434, the thread_name in prof_tdata_t was changed in #2407.
This also requires an update for the prof_recent dump, specifically the emitter
expects a "char **" which is fixed in this commit.
2023-05-11 09:10:57 -07:00
Qi Wang
6ea8a7e928 Add config detection for JEMALLOC_HAVE_PTHREAD_SET_NAME_NP.
and use it on the background thread name setting.
2023-05-11 09:10:57 -07:00
auxten
5bac384970 If ptr present check if alloc_ctx.edata == NULL 2023-05-10 17:18:22 -07:00
auxten
019cccc293 Make arenas_lookup_ctl triable 2023-05-10 17:18:22 -07:00
Kevin Svetlitski
dc0a184f8d Fix possible NULL pointer dereference in VERIFY_READ
Static analysis flagged this. Fixed by simply checking `oldlenp`
before dereferencing it.
2023-05-09 10:57:09 -07:00
Kevin Svetlitski
12311fe6c3 Fix segfault in extent_try_coalesce_impl
Static analysis flagged this. `extent_record` was passing `NULL` as the
value for `coalesced` to `extent_try_coalesce`, which in turn passes
that argument to `extent_try_coalesce_impl`, where it is written to
without checking if it is `NULL`. I can confirm from reviewing the
fleetwide coredump data that this was in fact being hit in production.
2023-05-09 10:55:44 -07:00
Kevin Svetlitski
70344a2d38 Make eligible functions static
The codebase is already very disciplined in making any function which
can be `static`, but there are a few that appear to have slipped through
the cracks.
2023-05-08 15:00:02 -07:00
Kevin Svetlitski
6841110bd6 Make edata_cmp_summary_comp 30% faster
`edata_cmp_summary_comp` is one of the very hottest functions, taking up
3% of all time spent inside Jemalloc. I noticed that all existing
callsites rely only on the sign of the value returned by this function,
so I came up with this equivalent branchless implementation which
preserves this property. After empirical measurement, I have found that
this implementation is 30% faster, therefore representing a 1% speed-up
to the allocator as a whole.

At @interwq's suggestion, I've applied the same optimization to
`edata_esnead_comp` in case this function becomes hotter in the future.
2023-05-04 09:59:17 -07:00
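The classic branchless trick for a three-way comparison whose callers only use the sign is the following (a generic sketch, not the exact jemalloc comparator):
```c
#include <stdint.h>

/* Sketch: returns <0, 0, or >0 without any branches; equivalent in sign to
 * the usual if/else comparator. */
static inline int
cmp_u64(uint64_t a, uint64_t b) {
	return (int)(a > b) - (int)(a < b);
}
```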
Amaury Séchet
f2b28906e6 Some nits in cache_bin.h 2023-05-01 10:21:17 -07:00
Kevin Svetlitski
fc680128e0 Remove errant assert in arena_extent_alloc_large
This codepath may generate deferred work when the HPA is enabled.
See also [@davidtgoldblatt's relevant comment on the PR which
introduced this](https://github.com/jemalloc/jemalloc/pull/2107#discussion_r699770967)
which prevented a similarly incorrect `assert` from being added elsewhere.
2023-05-01 10:00:30 -07:00
Eric Mueller
521970fb2e Check for equality instead of assigning in asserts in hpa_from_pai.
It appears like a simple typo means we're unconditionally overwriting
some fields in hpa_from_pai when asserts are enabled. From hpa_shard_init,
it looks like these fields have these values anyway, so this shouldn't
cause bugs, but if something is wrong it seems better to have these
asserts in place.

See issue #2412.
2023-04-17 20:57:48 -07:00
guangli-dai
5f64ad60cd Remove locked flag set in malloc_mutex_trylock
As a hint flag of the lock, the parameter locked should be set only
when the lock is acquired or released.
2023-04-06 10:57:04 -07:00
Qi Wang
434a68e221 Disallow decay during reentrancy.
Decay should not be triggered during reentrant calls (may cause lock order
reversal / deadlocks).  Added a delay_trigger flag to the tickers to bypass
decay when reentrancy_level is not zero.
2023-04-05 10:16:37 -07:00
Qi Wang
e62aa478c7 Rearrange the bools in prof_tdata_t to save some bytes.
This lowered the sizeof(prof_tdata_t) from 200 to 192, which is a round size
class.  Afterwards the tdata_t size remains unchanged with the last commit, which
effectively inlined the storage of thread names for free.
2023-04-05 10:03:12 -07:00
Qi Wang
ce0b7ab6c8 Inline the storage for thread name in prof_tdata_t.
The previous approach managed the thread name in a separate buffer, which causes
races because the thread name update (triggered by new samples) can happen at
the same time as prof dumping (which reads the thread names) -- these two
operations are under separate locks to avoid blocking each other.  Implemented
the thread name storage as part of the tdata struct, which resolves the lifetime
issue and also avoids internal alloc / dalloc during prof_sample.
2023-04-05 10:03:12 -07:00
Qi Wang
6cab460a45 Add a multithreaded test for prof_sys_thread_name.
Verified that this catches the issue being fixed in 5fd5583.
2023-04-05 10:03:12 -07:00
Amaury Séchet
5266152d79 Simplify the logic in ph_remove 2023-03-31 14:35:31 -07:00
Amaury Séchet
be6da4f663 Do not maintain root->prev in ph_remove. 2023-03-31 14:34:57 -07:00
Amaury Séchet
543e2d61e6 Simplify the logic in ph_insert
Also fixes what looks like an off by one error in the lazy aux list
merge part of the code that previously never touched the last node in
the aux list.
2023-03-31 14:34:24 -07:00
guangli-dai
31e01a98f1 Fix the rdtscp detection bug and add prefix for the macro. 2023-03-23 11:16:19 -07:00
Qi Wang
8b64be3441 Explicit arena assignment in test_tcache_max.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Qi Wang
8e7353a19b Explicit arena assignment in test_thread_idle.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Marvin Schmidt
45249cf5a9 Fix exception specification error for hosts using musl libc
It turns out that the previous commit did not suffice since the
JEMALLOC_SYS_NOTHROW definition also causes the same exception specification
errors as JEMALLOC_USE_CXX_THROW did:
```
x86_64-pc-linux-musl-cc -std=gnu11 -Werror=unknown-warning-option -Wall -Wextra -Wshorten-64-to-32 -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-
arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -fvisibility=hidden -Wimplicit-fallthrough -O3 -funroll-loops -march=native -O2 -pipe -c -march=native -O2 -pipe -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
In file included from src/jemalloc_cpp.cpp:9:
In file included from include/jemalloc/internal/jemalloc_preamble.h:27:
include/jemalloc/internal/../jemalloc.h:254:32: error: exception specification in declaration does not match previous declaration
    void JEMALLOC_SYS_NOTHROW   *je_malloc(size_t size)
                                 ^
include/jemalloc/internal/../jemalloc.h:75:21: note: expanded from macro 'je_malloc'
                    ^
/usr/x86_64-pc-linux-musl/include/stdlib.h:40:7: note: previous declaration is here
void *malloc (size_t);
      ^
```

On systems using the musl C library we have to omit the exception specification
on malloc function family like it's done for MacOS, FreeBSD and OpenBSD.
2023-03-16 12:11:40 -07:00
Marvin Schmidt
aba1645f2d configure: Handle *-linux-musl* hosts properly
This is the same as the `*-*-linux*` case with the two exceptions that
we don't set glibc=1 and don't define JEMALLOC_USE_CXX_THROW
2023-03-16 12:11:40 -07:00
Qi Wang
d503d72129 Add the missing descriptions in AC_DEFINE 2023-03-14 16:47:00 -07:00
Qi Wang
71bc1a3d91 Avoid assuming the arena id in test when percpu_arena is used. 2023-03-13 10:50:10 -07:00
Amaury Séchet
f743690739 Remove unused mutex from hpa_central 2023-03-10 11:25:47 -08:00
Chris Seymour
4edea8eb8e switch to https 2023-03-09 11:44:02 -08:00
guangli-dai
09e4b38fb1 Use asm volatile during benchmarks. 2023-02-24 11:17:48 -08:00
Fernando Pelliccioni
e8b28908de [MSVC] support for Visual Studio 2019 and 2022 2023-02-21 13:39:25 -08:00
barracuda156
4422f88d17 Makefile.in: link with g++ when cxx enabled 2023-02-21 13:26:58 -08:00
Qi Wang
c7805f1eb5 Add a header in HPA stats for the nonfull slabs. 2023-02-17 13:31:27 -08:00
Qi Wang
b6125120ac Add an explicit name to the dedicated oversize arena. 2023-02-17 13:31:09 -08:00
Qi Wang
97b313c7d4 More conservative setting for /test/unit/background_thread_enable.
Lower the thread and arena count to avoid resource exhaustion on 32-bit.
2023-02-16 14:42:21 -08:00
Qi Wang
5fd55837bb Fix thread_name updating for heap profiling.
The current thread name reading path updates the name every time, which requires
both alloc and dalloc -- and the temporary NULL value in the middle causes races
where the prof dump read path gets NULLed in the middle.

Minimize the changes in this commit to isolate the bugfix testing; will also
refactor the whole thread name paths later.
2023-02-15 17:49:40 -08:00
Qi Wang
8580c65f81 Implement prof sample hooks "experimental.hooks.prof_sample(_free)".
The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to
allow advanced users to track additional information, to enable new ways of
profiling on top of the jemalloc heap profile and sample features.

The sample hook is invoked after the allocation and backtracing, and forwards
both the allocation and the backtrace to the user hook; the sample_free hook
happens before the actual deallocation, and forwards only the ptr and usz to the
hook.
2022-12-07 16:06:49 -08:00
guangli-dai
a74acb57e8 Fix divide-by-zero error in stress/cpp/microbench
Summary:
Per issue #2356, some CXX compilers may optimize away the
new/delete operation in stress/cpp/microbench.cpp.
Thus, this commit (1) bumps the time interval to 1 if it is 0, and
(2) modifies the pointers in the microbench to volatile.
2022-12-06 10:46:14 -08:00
Guangli Dai
e8f9f13811 Inline free and sdallocx into operator delete 2022-11-21 11:14:05 -08:00
guangli-dai
06374d2a6a Benchmark operator delete
Added the microbenchmark for operator delete.
Also modified bench.h so that it can be used in C++.
2022-11-21 11:14:05 -08:00
guangli-dai
14ad8205bf Update the ratio display in benchmark
In bench.h, specify the ratio as the time consumption ratio and
modify the display of the ratio.
2022-11-21 11:14:05 -08:00
Qi Wang
481bbfc990 Add a configure option --enable-force-getenv.
Allows the use of getenv() rather than secure_getenv() to read MALLOC_CONF.
This helps in situations where hosts are under full control, and setting
MALLOC_CONF is needed while also setuid.  Disabled by default.
2022-11-04 13:37:14 -07:00
Qi Wang
143e9c4a2f Enable fast thread locals for dealloc-only threads.
Previously if a thread does only deallocations, it stays on the slow path /
minimal initialized state forever.  However, dealloc-only is a valid pattern for
dedicated reclamation threads -- this means thread cache is disabled (no batched
flush) for them, which causes high overhead and contention.

Added the condition to fully initialize TSD when a fair amount of dealloc
activities are observed.
2022-10-25 09:54:38 -07:00
Paul Smith
be65438f20 jemalloc_internal_types.h: Use alloca if __STDC_NO_VLA__ is defined
No currently-available version of Visual Studio C compiler supports
variable length arrays, even if it defines __STDC_VERSION__ >= C99.
As far as I know Microsoft has no plans to ever support VLAs in MSVC.

The C11 standard requires that the __STDC_NO_VLA__ macro be defined if
the compiler doesn't support VLAs, so fall back to alloca() if so.
2022-10-14 15:48:32 -07:00
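The fallback looks roughly like this sketch (the real code lives in jemalloc_internal_types.h and uses its own macros):
```c
#include <stdlib.h>
#if defined(_WIN32)
#  include <malloc.h>   /* _alloca */
#  define STACK_ALLOC(n) _alloca(n)
#else
#  include <alloca.h>
#  define STACK_ALLOC(n) alloca(n)
#endif

void
use_scratch(size_t nptrs) {
#ifdef __STDC_NO_VLA__
	/* Compilers without VLA support (e.g. MSVC) define __STDC_NO_VLA__:
	 * fall back to alloca, which has the same stack lifetime a VLA would. */
	void **scratch = STACK_ALLOC(nptrs * sizeof(void *));
#else
	void *scratch[nptrs];
#endif
	(void)scratch;
}
```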
divanorama
1897f185d2 Fix safety_check segfault in double free test 2022-10-03 10:55:10 -07:00
Jordan Rome
b04e7666f2 update PROFILING_INTERNALS.md
Expand the bad example of summing before unbiasing.
2022-10-03 10:48:29 -07:00
David Carlier
4c95c953e2 fix build for non linux/BSD platforms. 2022-10-03 10:42:09 -07:00
divanorama
3de0c24859 Disable builtin malloc in tests
With `--with-jemalloc-prefix=` and without `-fno-builtin` or `-O1` both clang and gcc may optimize out `malloc` calls
whose result is unused. Comparing result to NULL also doesn't necessarily count as being used.

This won't be a problem in most client programs as this only concerns really unused pointers, but in
tests it's important to actually execute allocations.
`-fno-builtin` should disable this optimization for both gcc and clang, and applying it only to test code hopefully shouldn't be an issue.
Another alternative is to force "use" of result but that'd require more changes and may miss some other optimization-related issues.

This should resolve https://github.com/jemalloc/jemalloc/issues/2091
2022-10-03 10:39:13 -07:00
Lily Wang
c0c9783ec9 Add vcpkg installation instructions 2022-09-19 15:15:28 -07:00
Guangli Dai
c9ac1f4701 Fix a bug in C++ integration test. 2022-09-16 15:04:59 -07:00
Guangli Dai
ba19d2cb78 Add arena-level name.
An arena-level name can help identify manual arenas.
2022-09-16 15:04:59 -07:00
Guangli Dai
a0734fd6ee Making jemalloc max stack depth a runtime option 2022-09-12 13:56:22 -07:00
Abael He
56ddbea270 error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99
./autogen.sh \
&& ./configure --prefix=/usr/local  --enable-static   --enable-autogen --enable-xmalloc --with-static-libunwind=/usr/local/lib/libunwind.a --enable-lazy-lock --with-jemalloc-prefix='' \
&& make -j16

...
gcc -std=gnu11 -Werror=unknown-warning-option -Wall -Wextra -Wshorten-64-to-32 -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -Wimplicit-fallthrough -O3 -funroll-loops -fPIC -DPIC -c -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/edata_cache.sym.o src/edata_cache.c
src/background_thread.c:768:6: error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
            pthread_create_fptr_init()) {
            ^
src/background_thread.c:768:6: note: did you mean 'pthread_create_wrapper_init'?
src/background_thread.c:34:1: note: 'pthread_create_wrapper_init' declared here
pthread_create_wrapper_init(void) {
^
1 error generated.
make: *** [src/background_thread.sym.o] Error 1
make: *** Waiting for unfinished jobs....
2022-09-07 11:56:41 -07:00
Guangli Dai
ce29b4c3d9 Refactor the remote / cross thread cache bin stats reading
Refactored cache_bin.h so that only one function is racy.
2022-09-06 19:41:19 -07:00
Guangli Dai
42daa1ac44 Add double free detection using slab bitmap for debug build
Add a sanity check for the double free issue in the arena, in case the tcache has been flushed.
2022-09-06 12:54:21 -07:00
Ivan Zaitsev
36366f3c4c Add double free detection in thread cache for debug build
Add new runtime option `debug_double_free_max_scan` that specifies the max
number of stack entries to scan in the cache bin when trying to detect the
double free bug (currently debug build only).
2022-08-04 16:58:22 -07:00
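Conceptually, the check is a bounded scan of the most recently cached pointers before inserting a newly freed one (an illustrative sketch, not the actual cache_bin code):
```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch: before pushing ptr onto the cache bin, scan up to max_scan of the
 * most recently cached entries; a match strongly suggests a double free. */
static bool
looks_like_double_free(void **cached, size_t ncached, size_t max_scan,
    void *ptr) {
	size_t limit = ncached < max_scan ? ncached : max_scan;
	for (size_t i = 0; i < limit; i++) {
		if (cached[i] == ptr) {
			return true;
		}
	}
	return false;
}
```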
David CARLIER
adc70c0511 update travis 2022-07-19 13:23:08 -07:00
David CARLIER
4e12d21c8d enabled percpu_arena settings on macOS.
follow-up on #2280
2022-07-19 13:23:08 -07:00
David Carlier
58478412be OpenBSD build fix. still no cpu affinity.
- enabling pthread_get/pthread_set_name_np api.
- disabling per thread cpu affinity handling, unsupported on this platform.
2022-07-19 13:20:11 -07:00
Qi Wang
a1c7d9c046 Add the missing opt.cache_oblivious handling. 2022-07-14 22:41:27 -07:00
Jasmin Parent
41a859ef73 Remove duplicated words in documentation 2022-07-11 15:30:16 -07:00
Azat Khuzhin
cb578bbe01 Fix possible "nmalloc >= ndalloc" assertion
In arena_stats_merge(), nmalloc was read first, and ndalloc after.

However with this order, it is possible for some thread to increment
ndalloc in between, and then nmalloc < ndalloc and the assertion will fail,
as again found by ClickHouse CI [1] (even after #2234).

  [1]: https://github.com/ClickHouse/ClickHouse/issues/31531

Swap the order to avoid possible assertion.

Cc: @interwq
Follow-up for: #2234
2022-07-11 15:27:51 -07:00
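The fix amounts to reading the counters in an order that cannot break the invariant under concurrent increments, roughly as in this sketch:
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Sketch: read ndalloc first, then nmalloc.  Concurrent increments between
 * the two loads can only make the nmalloc snapshot larger, so the snapshot
 * still satisfies nmalloc >= ndalloc. */
static void
snapshot_counts(_Atomic uint64_t *nmalloc_p, _Atomic uint64_t *ndalloc_p,
    uint64_t *out_nmalloc, uint64_t *out_ndalloc) {
	uint64_t ndalloc = atomic_load_explicit(ndalloc_p, memory_order_relaxed);
	uint64_t nmalloc = atomic_load_explicit(nmalloc_p, memory_order_relaxed);
	assert(nmalloc >= ndalloc);
	*out_nmalloc = nmalloc;
	*out_ndalloc = ndalloc;
}
```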
David CARLIER
a9215bf18a CI update FreeBSD version. 2022-06-28 11:48:23 -07:00
Alex Lapenkou
3713932836 Update building for Windows instructions
Explain how to build for Windows in INSTALL.md and remove another readme.txt in
an obscure location.
2022-06-14 14:04:48 -07:00
David Carlier
4fc5c4fbac New configure option '--enable-pageid' for Linux
The option makes jemalloc use prctl with PR_SET_VMA to tag memory mappings with
"jemalloc_pg" or "jemalloc_pg_overcommit". This allows to easily identify
jemalloc's mappings in /proc/<pid>/maps. PR_SET_VMA is only available in Linux
5.17 and above.
2022-06-09 18:54:08 -07:00
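The underlying mechanism is the PR_SET_VMA_ANON_NAME prctl added in Linux 5.17; a minimal sketch of tagging an anonymous mapping (constants defined as fallbacks for older headers):
```c
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#  define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#  define PR_SET_VMA_ANON_NAME 0
#endif

/* Sketch: name an anonymous mapping so it shows up with that label in
 * /proc/<pid>/maps; fails with EINVAL on kernels without the feature. */
static int
tag_mapping(void *addr, size_t len, const char *name) {
	return prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
	    (unsigned long)(uintptr_t)addr, len, (unsigned long)name);
}
```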
Qi Wang
b950934916 Enable retain by default on macOS.
A high number of mappings results in unusually high fork() cost on macOS.  Retain
fixes the issue, at the small cost of extra VM space reserved.
2022-06-09 11:37:44 -07:00
David Carlier
df8f7d10af Implement malloc_getcpu for amd64 and arm64 macOS
This enables per CPU arena on MacOS
2022-06-08 15:13:55 -07:00
Alex Lapenkou
df7ad8a9b6 Revert "Echo installed files via verbose 'install' command"
This reverts commit f15d8f3b41. "install -v"
turned out to be not portable and not work on NetBSD.
2022-06-07 12:28:45 -07:00
barracuda156
70e3735f3a jemalloc: fix PowerPC definitions in quantum.h 2022-05-26 10:51:10 -07:00
Alex Lapenkou
5b1f2cc5d7 Implement pvalloc replacement
Despite being an obsolete function, pvalloc is still present in GLIBC and should
work correctly when jemalloc replaces libc allocator.
2022-05-18 17:01:09 -07:00
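pvalloc's contract is page-aligned memory with the request rounded up to a multiple of the page size; a rough sketch of those semantics (not jemalloc's implementation, and assuming a zero-byte request rounds up to one page):
```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch of pvalloc semantics: round the size up to the next multiple of the
 * page size and return page-aligned memory. */
static void *
pvalloc_sketch(size_t size) {
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	if (size > SIZE_MAX - page) {
		return NULL;  /* rounding would overflow */
	}
	size_t rounded = (size + page - 1) & ~(page - 1);
	if (rounded == 0) {
		rounded = page;  /* assumed: pvalloc(0) still returns one page */
	}
	return aligned_alloc(page, rounded);
}
```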
Qi Wang
cd5aaf308a Improve the failure message upon opt_experimental_infallible_new. 2022-05-17 16:07:40 -07:00
Yuriy Chernyshov
70d4102f48 Fix compiling edata.h with MSVC
At the time of writing, an attempt to compile jemalloc 5.3.0 with MSVC 2019 results in the following error message:

> jemalloc/include/jemalloc/internal/edata.h:660: error C4576: a parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax
2022-05-09 14:51:07 -07:00
Qi Wang
54eaed1d8b Merge branch 'dev' 2022-05-06 11:28:25 -07:00
Qi Wang
304c919829 Update ChangeLog for 5.3.0. 2022-05-06 11:24:21 -07:00
Qi Wang
8cb814629a Make the default option of zero realloc match the system allocator. 2022-05-05 17:11:18 -07:00
Qi Wang
66c889500a Make test/unit/background_thread_enable more conservative.
To avoid resource exhaustion on 32-bit platforms.
2022-05-04 15:32:57 -07:00
Qi Wang
a7d73dd4c9 Update TUNING.md to include the new tcache_max option. 2022-05-04 10:59:40 -07:00
Qi Wang
254b011915 Small doc tweak of opt.trust_madvise.
Avoid quoted enabled and disabled because it's a bool type instead of char *.
2022-04-28 21:16:25 -07:00
Qi Wang
f5e840bbf0 Minor typo fix in doc. 2022-04-27 20:25:29 -07:00
Qi Wang
ceca07d2ca Correct the name of stats.mutexes.prof_thds_data in doc. 2022-04-25 20:12:58 -07:00
Qi Wang
391bad4b95 Avoid abort() in test/integration/cpp/infallible_new_true.
Allow setting the safety check abort hook through mallctl, which avoids abort()
and core dumps.
2022-04-25 11:29:32 -07:00
cuishuang
9a242f16d9 fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-25 11:29:00 -07:00
Qi Wang
0e29ad4efa Rename zero_realloc option "strict" to "alloc".
With realloc(ptr, 0) being UB per C23, the option name "strict" makes less sense
now.  Rename to "alloc" which describes the behavior.
2022-04-20 10:27:25 -07:00
Qi Wang
5841b6dbe7 Update FreeBSD image to 12.3 for cirrus ci. 2022-04-19 15:29:30 -07:00
Qi Wang
ed5fc14b28 Use volatile to workaround buffer overflow false positives.
In test/integration/rallocx, full usable size is checked which may confuse
overflow detection.
2022-04-04 12:16:46 -07:00
Alex Lapenkou
25517b852e Reoreder TravisCI jobs to optimize CI time
Sorting jobs by descending expected runtime helps to utilize concurrency
better.
2022-03-29 11:58:27 -07:00
Alex Lapenkou
8a49b62e78 Enable TravisCI for Windows 2022-03-29 11:58:27 -07:00
Alex Lapenkou
fdb6c10162 Add FreeBSD to TravisCI
Implement the generation of Travis jobs for FreeBSD. The generated jobs
replicate the existing CirrusCI config.
2022-03-29 11:58:27 -07:00
Alex Lapenkou
a93931537e Do not disable SEC by default for 64k pages platforms
Default SEC max_alloc option value was 32k, disabling SEC for platforms with
lg-page=16. This change enables SEC for all platforms, making minimum max_alloc
value equal to PAGE.
2022-03-24 22:05:35 -07:00
Charles
eaaa368bab Add comments and use meaningful vars in sz_psz2ind. 2022-03-24 16:56:59 -07:00
Alex Lapenkou
5bf03f8ce5 Implement PAGE_FLOOR macro 2022-03-22 17:45:55 -07:00
Alex Lapenkou
52631c90f6 Fix size class calculation for sec
Due to a bug in sec initialization, the number of cached size classes
was equal to 198. The bug caused the creation of more than a hundred
unused bins, although it didn't affect the caching logic.
2022-03-22 17:45:55 -07:00
Qi Wang
7ae0f15c59 Add a default page size when cross-compile for Apple M1.
When cross-compiling for M1 and no page size is specified, use the default 16K and
skip detecting the page size (which would likely be incorrect).
2022-03-21 14:30:48 -07:00
Alex Lapenkov
eb65d1b078 Fix FreeBSD system jemalloc TSD cleanup
Before this commit, in case FreeBSD libc jemalloc was overridden by another
jemalloc, proper thread shutdown callback was involved only for the overriding
jemalloc. A call to _malloc_thread_cleanup from libthr would be redirected to
user jemalloc, leaving data about dead threads hanging in system jemalloc. This
change tackles the issue in two ways. First, for current and old system
jemallocs, which we can not modify, the overriding jemalloc would locate and
invoke system cleanup routine. For upcoming jemalloc integrations, the cleanup
registering function will also be redirected to user jemalloc, which means that
system jemalloc's cleanup routine will be registered in user's jemalloc and a
single call to _malloc_thread_cleanup will be sufficient to invoke both
callbacks.
2022-03-02 10:10:27 -08:00
Azat Khuzhin
78b58379c8 Fix possible "nmalloc >= ndalloc" assertion.
It is possible that ndalloc will be updated before nmalloc in
arena_large_ralloc_stats_update(); fix this by reordering those calls.

It was found by ClickHouse CI, that periodically hits this assertion [1].

  [1]: https://github.com/ClickHouse/ClickHouse/issues/31531

That issue contains lots of examples, with core dump and some gdb output [2].

  [2]: https://s3.amazonaws.com/clickhouse-test-reports/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/stress_test__debug__actions_.html

Here you can find binaries for that particular report [3] you need
clickhouse debug build [4].

  [3]: https://s3.amazonaws.com/clickhouse-builds/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/clickhouse_build_check_(actions)/report.html
  [4]: https://s3.amazonaws.com/clickhouse-builds/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/package_debug/clickhouse

Brief info from that report:

    2 0x000000002ad6dbfe in arena_stats_merge (tsdn=0x7f2399abdd20, arena=0x7f241ce01080, nthreads=0x7f24e4360958, dss=0x7f24e4360960, dirty_decay_ms=0x7f24e4360968, muzzy_decay_ms=0x7f24e4360970, nactive=0x7f24e4360978, ndirty=0x7f24e43
    e4360988, astats=0x7f24e4360998, bstats=0x7f24e4363310, lstats=0x7f24e4364990, estats=0x7f24e4366e50, hpastats=0x7f24e43693a0, secstats=0x7f24e436a020) at ../contrib/jemalloc/src/arena.c:138
            ndalloc = 226
            nflush = 0
            curlextents = 0
            nmalloc = 225
            nrequests = 0

Here you can see that they differs only by 1.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-01 12:28:28 -08:00
Alex Lapenkou
ca709c3139 Fix failed assertion due to racy memory access
While calculating the number of stashed pointers, multiple variables
potentially modified by a concurrent thread were used for the
calculation.  This led to some inconsistencies, correctly detected by
the assertions.  The change eliminates some possible inconsistencies by
using unmodified variables and reading a concurrently modified one only once.
The assertions are omitted for the cases where we acknowledge potential
inconsistencies too.
2022-02-17 09:35:52 -08:00
Qi Wang
063d134aeb Properly detect background thread support on Darwin.
When cross-compiling, the host type / ABI should be checked to determine
background thread compatibility.
2022-02-15 10:10:11 -08:00
Alex Lapenkou
a4e81221cc Document 'make uninstall'
Update INSTALL.md, reflecting the addition of 'uninstall' target.
2022-01-31 14:55:00 -08:00
Qi Wang
20f9802e4f Avoid overflow warnings in test/unit/safety_check. 2022-01-27 10:29:54 -08:00
Qi Wang
8c59c44ffa Add a dependency checking step at the end of malloc_conf_init.
Currently only prof_leak_error and prof_final are checked.
2022-01-26 17:17:48 -08:00
Qi Wang
efc539c040 Initialize prof_leak during prof init.
Otherwise, prof_leak may get set after prof_leak_error, and disagree with each
other.
2022-01-26 17:17:48 -08:00
Alex Lapenkou
002f0e9397 Disable TravisCI jobs generation for Windows
These jobs take about 20 minutes to complete. We don't want to enable
them until we switch to the unlimited concurrency plan; otherwise the builds
will take way too long.
2022-01-26 10:16:57 -08:00
Alex Lapenkou
01a293fc08 Add Windows to TravisCI
Implement the generation of Travis jobs for Windows. Currently, the
generated jobs replicate Appveyor setup and complete successfully. There
is support for MinGW GCC and MSVC compilers as well as 64 and 32 bit
compilation. Linux and MacOS jobs behave identically, but some
environment variables change - CROSS_COMPILE_32BIT=yes is added for
builds with cross compilation, empty COMPILER_FLAGS are not set anymore.
2022-01-26 10:16:57 -08:00
yunxu
b798fabdf7 Add prof_leak_error option
The option makes the process exit with error code 1 if a memory leak
is detected. This is useful for implementing automated tools that rely
on leak detection.
2022-01-21 16:24:20 -08:00
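A minimal usage sketch for the option above, assuming an unprefixed jemalloc build with profiling enabled and the malloc_conf spelling prof_leak_error; the related prof options are included since leak reporting only happens when prof_final runs at exit.

    #include <stdlib.h>

    /* jemalloc reads this global before the first allocation. */
    const char *malloc_conf =
        "prof:true,prof_final:true,prof_leak:true,prof_leak_error:true";

    int main(void) {
        void *p = malloc(64);
        (void)p;    /* intentionally leaked */
        return 0;   /* if a leak is reported, the process exits with code 1 */
    }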
Alex Lapenkou
eafd2ac39f Forbid spaces in prefix and exec_prefix
Spaces in these are also not handled correctly by Make, so it makes sense
not to allow them.
2022-01-19 12:28:16 -08:00
Alex Lapenkou
36a09ba2c7 Forbid spaces in install suffix
To avoid potential issues with removing unintended files after 'make
uninstall', spaces are no longer allowed in install suffix. It's worth
mentioning that with GNU Make on Linux, spaces in the install suffix didn't
work anyway, leading to errors in the Makefile. But being verbose
about this restriction makes it more transparent for the developers.
2022-01-19 12:28:16 -08:00
Shuduo Sang
640c3c72e6 Add support for 'make uninstall' 2022-01-19 12:28:16 -08:00
Alex Lapenkou
f15d8f3b41 Echo installed files via verbose 'install' command
It's not necessary to manually echo all install commands; a similar effect
is achieved via 'install -v'.
2022-01-19 12:28:16 -08:00
Charles
eb196815d6 Avoid calculating size of size class twice & delete sc_data_global. 2022-01-18 11:54:12 -08:00
Qi Wang
011449f17b Fix doc build with install-suffix. 2022-01-11 21:15:24 -08:00
Qi Wang
8b49eb132e Fix the HELP_STRING of --enable-doc. 2022-01-11 21:15:24 -08:00
Qi Wang
ddb170b1d9 Simplify arena_migrate() to take arena_t* instead of indices.
This makes debugging slightly easier and avoids the confusion of "should we
create new arenas" here.
2022-01-11 16:59:22 -08:00
Qi Wang
648b3b9f76 Lower the num_threads in the stress test of test/unit/prof_recent
This takes a fair amount of resources.  Under high concurrency it was causing
resource exhaustion such as pthread_create and mmap failures.
2022-01-11 16:58:56 -08:00
Qi Wang
d66162e032 Fix the extent state checking on the merge error path.
With DSS as primary, the default merge impl will (correctly) decline to merge
when one of the extents is non-dss.  The error path should tolerate the
not-merged extent being in a merging state.
2022-01-11 16:58:47 -08:00
Craig Leres
c9946fa7e6 FreeBSD also needs the OS-X "don't declare system functions as
nothrow" fix since it also has jemalloc in the base system
2022-01-11 11:53:25 -08:00
Jonathan Swinney
89fe8ee6bf Use the isb instruction instead of yield for spin locks on arm
isb introduces a small delay which is closer to the x86 pause instruction.
2022-01-10 15:29:56 -08:00
Qi Wang
6230cc88b6 Add background thread sleep retry in test/unit/hpa_background_thread
Under high concurrency / heavy test load (e.g. using run_tests.sh), the
background thread may not get scheduled for a longer period of time.  Retry 100
times max before bailing out.
2022-01-07 10:28:28 -08:00
Qi Wang
61978bbe69 Purge all if the last thread migrated away from an arena. 2022-01-06 19:02:26 -08:00
Yuriy Chernyshov
c91e62dd37 #include <features.h> as requested 2022-01-05 18:45:27 -08:00
Yuriy Chernyshov
18510020e7 Fix symbol conflict with musl libc
`__libc` prefixed functions are used by musl libc as non-replaceable malloc stubs.

Fix this conflict by checking if we are linking against glibc.
2022-01-05 18:45:27 -08:00
Qi Wang
f509703af5 Fix two conversion warnings in tcache. 2022-01-04 13:55:06 -08:00
Qi Wang
067c2da074 Fix unnecessary returns in san_(un)guard_pages_two_sided. 2022-01-04 13:55:06 -08:00
Qi Wang
d660683d3d Fix test config of lg_san_uaf_align.
The option may be configure-disabled, which resulted in invalid options
output from the tests.
2022-01-04 11:03:51 -08:00
Qi Wang
eabe889162 Rename full_position to low_bound in cache_bin.h. 2021-12-29 14:44:43 -08:00
Qi Wang
dfdd7562f5 Rename san_enabled() to san_guard_enabled(). 2021-12-29 14:44:43 -08:00
Qi Wang
01d61a3c6f Fix a conversion warning. 2021-12-29 14:44:43 -08:00
Qi Wang
8b34a788b5 Fix an used-uninitialized warning (false positive). 2021-12-29 14:44:43 -08:00
Qi Wang
e491cef9ab Add stats for stashed bytes in tcache. 2021-12-29 14:44:43 -08:00
Qi Wang
b75822bc6e Implement use-after-free detection using junk and stash.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse).  The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
2021-12-29 14:44:43 -08:00
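A conceptual sketch of the junk-and-stash scheme described above (not jemalloc's implementation; the fixed-size stash array and the junk byte value are arbitrary choices for the example): freed objects are junk-filled and held back from reuse, and the junk pattern is re-verified when the stash is flushed.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define JUNK 0x5a

    static void  *stash[16];
    static size_t stash_sz[16];
    static int    stash_n;

    /* On (sampled) free: junk-fill so read-after-free sees garbage, and
     * stash the pointer so the memory is not immediately reused. */
    static void sampled_free(void *ptr, size_t sz) {
        memset(ptr, JUNK, sz);
        stash[stash_n] = ptr;
        stash_sz[stash_n++] = sz;
    }

    /* On flush: any byte that no longer holds the junk pattern indicates a
     * write-after-free; only then is the memory actually released. */
    static void flush_stash(void) {
        for (int i = 0; i < stash_n; i++) {
            unsigned char *bytes = stash[i];
            for (size_t j = 0; j < stash_sz[i]; j++) {
                if (bytes[j] != JUNK) {
                    fprintf(stderr, "write-after-free at %p+%zu\n",
                        stash[i], j);
                    abort();
                }
            }
            free(stash[i]);
        }
        stash_n = 0;
    }

    int main(void) {
        char *p = malloc(32);
        sampled_free(p, 32);
        /* p[0] = 'X';  a write-after-free here would abort in flush_stash */
        flush_stash();
        return 0;
    }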
Qi Wang
06aac61c4b Split the core logic of tcache flush into a separate function.
The core function takes a ptr array as input (containing items to be flushed),
which will be reused to flush sanitizer-stashed items.
2021-12-29 14:44:43 -08:00
Qi Wang
d038160f3b Fix shadowed variable usage.
Verified with EXTRA_CFLAGS=-Wshadow.
2021-12-23 10:55:08 -08:00
Qi Wang
bd70d8fc0f Add the profiling settings for tests explicitly.
Many profiling related tests make assumptions about the profiling settings,
e.g. opt_prof is off by default, and prof_active defaults to on when opt_prof is
on.  However the default settings can be changed via --with-malloc-conf at build
time.  Fix the tests by adding the assumed settings explicitly.
2021-12-22 20:10:28 -08:00
Joshua Watt
e491df1d2f Fix warnings when using autoheader. 2021-12-22 13:57:41 -08:00
Qi Wang
60b9637cc0 Only invoke malloc_cpu_count_is_deterministic() when necessary.
Also refactor the handling of the non-deterministic case.  Notably allow the
case with narenas set to proceed w/o warnings, to not affect existing valid use
cases.
2021-12-22 13:52:12 -08:00
Qi Wang
837b37c4ce Fix the time-since computation in HPA.
The nstime module guarantees monotonic clock updates within a single nstime_t.  This
means that if two separate nstime_t variables are read and updated separately,
nstime_subtract between them may result in underflow.  Fixed by switching to the
time-since utility provided by nstime.
2021-12-21 23:37:22 -08:00
Qi Wang
310af725b0 Add nstime_ns_since which obtains the duration since the input time. 2021-12-21 23:37:22 -08:00
Azat Khuzhin
cafe9a3158 Disable percpu arena in case of non deterministic CPU count
A deterministic number of CPUs is important for percpu arena to work
correctly, since it uses the CPU index from sched_getcpu(), and if that is
greater than the number of CPUs, bad things will happen, or an assertion will
fail in a debug build:

    <jemalloc>: ../contrib/jemalloc/src/jemalloc.c:321: Failed assertion: "ind <= narenas_total_get()"
    Aborted (core dumped)

Number of CPUs can be obtained from the following places:
- sched_getaffinity()
- sysconf(_SC_NPROCESSORS_ONLN)
- sysconf(_SC_NPROCESSORS_CONF)

For sched_getaffinity() you may simply use taskset(1) to run the program
on a different CPU, and if that CPU is not the first one, percpu will work
incorrectly, e.g.:

    $ taskset --cpu-list $(( $(getconf _NPROCESSORS_ONLN)-1 )) <your_program>

_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online; LXD/LXC
virtualize the /sys/devices/system/cpu/online file [1], and so when you run a
container with limited limits.cpus, it will bind a randomly selected CPU to
it.

  [1]: https://github.com/lxc/lxcfs/issues/301

_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody
plays with dentries there.

So if all three of these are equal, percpu arenas should work correctly.

And a small note regarding _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF:
musl uses sched_getaffinity() for both. So this will also increase the
entropy.

Also note that you can check whether percpu arena was really applied using
abort_conf:true.

Refs: https://github.com/jemalloc/jemalloc/pull/1939
Refs: https://github.com/ClickHouse/ClickHouse/issues/32806

v2: move malloc_cpu_count_is_deterministic() into
    malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
    allocations for readdir()
v3:
- mark cpu_count_is_deterministic static
- check only if percpu arena is enabled
- check narenas
2021-12-21 11:53:09 -08:00
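A standalone sketch of the consistency check described above (Linux/glibc only; this is not jemalloc's code, just the three CPU-count sources compared side by side):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        cpu_set_t set;
        long n_affinity = -1;
        if (sched_getaffinity(0, sizeof(set), &set) == 0) {
            n_affinity = CPU_COUNT(&set);
        }
        long n_online = sysconf(_SC_NPROCESSORS_ONLN);
        long n_configured = sysconf(_SC_NPROCESSORS_CONF);

        /* percpu arena is only safe when all three sources agree. */
        int deterministic =
            n_affinity == n_online && n_online == n_configured;
        printf("affinity=%ld online=%ld configured=%ld -> %s\n",
            n_affinity, n_online, n_configured,
            deterministic ? "percpu arena ok" : "disable percpu arena");
        return 0;
    }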
mweisgut
bb5052ce90 Fix base_ehooks_get_for_metadata 2021-12-20 15:37:53 -08:00
Alex Lapenkov
9015e129bd Update visual studio projects
Add relevant source files to the projects.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
d90655390f San: Create a function for committing and zeroing
Committing and zeroing an extent is usually done together, hence a new
function.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
800ce49c19 San: Bump alloc frequently reused guarded allocations
To utilize a separate retained area for guarded extents, use bump alloc
to allocate those extents.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
f56f5b9930 Pass 'frequent_reuse' hint to PAI
Currently used only for guarding purposes, the hint is used to determine
if the allocation is supposed to be frequently reused. For example, it
might urge the allocator to ensure the allocation is cached.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
2c70e8d351 Rename 'arena_decay' to 'arena_util'
While initially this file contained helper functions for one particular
test, its usage has now spread across different test files. Its purpose has
shifted towards a collection of handy arena ctl wrappers.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
0f6da1257d San: Implement bump alloc
The new allocator will be used to allocate guarded extents used as slabs
for guarded small allocations.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
34b00f8969 San: Avoid running san tests with prof enabled
With prof enabled, the number of page aligned allocations doesn't match the
number of slab "ends" because prof allocations skew the addresses. It
leads to 'pages' array overflow and hard-to-debug failures.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
62f9c54d2a San: Rename 'guard' to 'san'
This prepares the foundation for more sanitizer-related work in the
future.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
d9bbf539ff CI: Refactor gen_travis.py
The CI consolidation project adds more operating systems to Travis. This
refactoring is aimed to decouple the configuration of each individual OS
from the actual job matrix generation and formatting. Otherwise,
format_job function would turn into a huge collection of ad-hoc
conditions.
2021-12-06 15:11:14 -08:00
Qi Wang
7dcf77809c Mark slab as true on sized dealloc fast path.
For sized dealloc, fastpath only handles lookup-able sizes, which must be slabs.
2021-12-06 14:28:34 -08:00
Qi Wang
af6ee27c0d Enforce abort_conf:true when malloc_conf is not fully recognized.
Ensures the malloc_conf "ends with key", "ends with comma" and "malformed conf
string" cases abort under abort_conf:true.
2021-12-06 14:27:25 -08:00
David CARLIER
113e8e68e1 freebsd 14 build fix proposal.
FreeBSD 14 seems to have finally introduced more Linux CPU-affinity API (sched_*
family) compatibility, detected at configure time; adjust accordingly.
2021-12-06 13:15:21 -08:00
Alex Lapenkou
3b3257a709 Correct opt.prof_leak documentation
The option has been misleading, because it stays disabled unless
prof_final is also specified. In practice it's impossible to detect that
the option is silently disabled, because it just doesn't provide any
output, as if no memory leaks were detected.
2021-11-23 15:10:21 -08:00
Qi Wang
cdabe908d0 Track the initialized state of nstime_t on debug build.
Some nstime_t operations require and assume the input nstime is initialized
(e.g. nstime_update) -- uninitialized input may cause silent failures which are
difficult to reproduce / debug.  Add an explicit flag to track the state
(limited to debug build only).

Also fixed a use case in hpa (time of last_purge).
2021-11-17 15:49:27 -08:00
Qi Wang
400c59895a Fix uninitialized nstime reading / updating on the stack in hpa.
In order for nstime_update to handle non-monotonic clocks, it requires the input
nstime to be initialized -- when reading for the first time, zero init has to be
done.  Otherwise a random stack value may be interpreted as a clock reading and returned.
2021-11-16 16:54:12 -08:00
Qi Wang
8b81d3f214 Fix the initialization of last_event in thread event init.
The event counters maintain a relationship with the current bytes: last_event <=
current < next_event.  When a reinit happens (e.g. a reincarnated tsd), the last
event needs to be advanced because all events start fresh from the current bytes.
2021-11-16 10:28:00 -08:00
Qi Wang
6bdb4f5ab0 Check prof_active in addition to opt_prof during batch_alloc(). 2021-11-12 09:20:18 -08:00
Qi Wang
37342a4d32 Add ctl interface for experimental_infallible_new. 2021-11-05 13:20:09 -07:00
Alex Lapenkou
6cb585b13a San: Unguard guarded slabs during arena destruction
When opt_retain is on, slab extents remain guarded in all states, even
retained. This works well if arena is never destroyed, because we
anticipate those slabs will be eventually reused. But if the arena is
destroyed, the slabs must be unguarded to prevent leaking guard pages.
2021-11-03 17:55:50 -07:00
Qi Wang
b6a7a535b3 Optimize away a branch on the free fastpath.
On the rtree metadata lookup fast path, there will never be a NULL returned when
the cache key matches (which is unknown to the compiler).  The previous logic
was checking for NULL return value, resulting in the extra branch (in addition to
the cache key match checking).  Make the lookup_fast return a bool to indicate
cache miss / match, so that the extra branch is avoided.
2021-10-28 16:55:54 -07:00
Qi Wang
4d56aaeca5 Optimize away the tsd_fast() check on free fastpath.
To ensure that the free fastpath can tolerate uninitialized tsd, improve the
static initializer for rtree_ctx in tsd.
2021-10-28 10:05:59 -07:00
Ashutosh Grewal
26f5257b88 Remove declaration of an undefined function 2021-10-18 11:10:22 -07:00
Wang JinLong
2159615419 Add new architecture loongarch.
Signed-off-by: Wang JinLong <wangjinlong@uniontech.com>
2021-10-18 10:57:34 -07:00
Alex Lapenkou
8daac7958f Redefine functions with test hooks only for tests
The Android build has issues with these defines; this will allow the build to
succeed if it doesn't need to build the tests.
2021-10-15 15:25:36 -07:00
Alex Lapenkou
c9ebff0fd6 Initialize deferred_work_generated
As the code evolves, some code paths that previously assigned
deferred_work_generated may cease being reached. This would leave the value
uninitialized. This change initializes the value for safety.
2021-10-07 11:50:38 -07:00
Stan Angelov
912324a1ac Add debug check outside of the loop in hpa_alloc_batch.
This optimizes the whole loop away for non-debug builds.
2021-10-01 14:40:43 -07:00
David CARLIER
cf9724531a Darwin malloc_size override support proposal.
Darwin has an API similar to Linux/FreeBSD's malloc_usable_size.
2021-10-01 14:32:40 -07:00
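As a hedged sketch of the parallel noted above, a portability shim might look like the following (the header locations assume macOS and glibc respectively):

    #if defined(__APPLE__)
    #  include <malloc/malloc.h>        /* malloc_size() */
    #  define USABLE_SIZE(ptr) malloc_size(ptr)
    #else
    #  include <malloc.h>               /* malloc_usable_size() on glibc */
    #  define USABLE_SIZE(ptr) malloc_usable_size(ptr)
    #endif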
Qi Wang
ab0f1604b4 Delay the atexit call to prof_log_start().
So that atexit() is only done when prof_log is used.
2021-09-29 13:35:50 -07:00
David Carlier
11b6db7448 CPU affinity on BSD platforms support. 2021-09-28 11:40:21 -07:00
Qi Wang
83f3294027 Small refactors around 7bb05e0. 2021-09-27 16:05:13 -07:00
Qi Wang
3c4b717ffc Remove unused header base_structs.h. 2021-09-27 16:05:13 -07:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
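An illustrative sketch of the guard-page idea (not jemalloc's implementation): reserve one extra page on each side of an extent and mprotect them to PROT_NONE so overflows and underflows fault immediately.

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t usable = 4 * page;
        size_t total = usable + 2 * page;   /* guard page on both sides */

        char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Make the first and last page inaccessible. */
        mprotect(base, page, PROT_NONE);
        mprotect(base + page + usable, page, PROT_NONE);

        char *extent = base + page;         /* the usable, guarded extent */
        extent[0] = 'x';                    /* fine */
        /* extent[usable] = 'x';               would fault on the guard */

        munmap(base, total);
        return 0;
    }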
Piotr Balcer
7bb05e04be add experimental.arenas_create_ext mallctl
This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.

The metadata_use_hooks option has two main use cases:

1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.

2. Avoiding virtual memory being leaked as a result
of metadata allocation failure originating in an extent hook.
2021-09-24 13:43:18 -07:00
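A hedged usage sketch for the mallctl above; it assumes the arena_config_t fields are named extent_hooks and metadata_use_hooks as the commit text suggests, and that the type is exposed via jemalloc/jemalloc.h.

    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void) {
        arena_config_t cfg = {
            .extent_hooks = NULL,        /* keep the default extent hooks */
            .metadata_use_hooks = false, /* metadata stays on default hooks */
        };
        unsigned arena_ind;
        size_t sz = sizeof(arena_ind);
        if (mallctl("experimental.arenas_create_ext", &arena_ind, &sz,
            &cfg, sizeof(cfg)) != 0) {
            fprintf(stderr, "arenas_create_ext failed\n");
            return 1;
        }
        printf("created arena %u\n", arena_ind);
        return 0;
    }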
Alex Lapenkou
a9031a0970 Allow setting a dump hook
If users want to be notified when a heap dump occurs, they can set this hook.
2021-09-22 15:04:01 -07:00
Alex Lapenkou
f7d46b8119 Allow setting custom backtrace hook
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows augmenting the backtraces to attribute allocations to
native functions in heap profiles.
2021-09-22 15:04:01 -07:00
Qi Wang
523cfa55c5 Guard prof related mallctl with opt_prof.
The prof initialization is done only when opt_prof is true.  This change makes
sure the prof_* mallctls only have limited read access (i.e. no access to prof
internals) when opt_prof is false.

In addition, initialize the global prof mutexes even if opt_prof is false.  This
makes sure the mutex stats are set properly.
2021-09-20 10:42:16 -07:00
Alex Lapenkou
6e848a005e Remove opt_background_thread_hpa_interval_max_ms
Now that HPA can communicate the time until its deferred work should be done,
this option is not used anymore.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
8229cc77c5 Wake up background threads on demand
This change allows every allocator conforming to PAI to communicate that it
deferred some work for the future. Without it, if a background thread goes into
indefinite sleep, there is no way to notify it about upcoming deferred work.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
97da57c13a HPA: Add min_purge_interval_ms option
This rate limiting option is required to avoid purging too often.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
b8b8027f19 Allow PAI to calculate time until deferred work
Previously the calculation of sleep time between wakeups was implemented within
background_thread. This resulted in some parts of decay and hpa specific
logic mixing with background thread implementation. In this change, background
thread delegates this calculation to arena and it, in turn, delegates it to PAI.
The next step is to implement the actual calculation of time until deferred work
in HPA.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
26140dd246 Reject --enable-prof-libunwind without --enable-prof
Prior to the change you could specify --enable-prof-libunwind without
--enable-prof which would do effectively nothing. This was confusing as I
expected --enable-prof-libunwind to act like --enable-prof, but use libunwind.
2021-09-13 14:02:40 -07:00
Mingli Yu
e5062e9fb9 Makefile.in: make sure doc generated before install
There is a race between the doc generation and the doc installation,
so make the install depend on the build for doc.

Signed-off-by: Mingli Yu <mingli.yu@windriver.com>
2021-09-13 13:40:39 -07:00
Qi Wang
8b24cb8fdf Don't assume initialized arena in the default alloc hook.
Specifically, this change allows the default alloc hook to be used during
arenas.create.  One use case is to invoke the default alloc hook in a customized
hook arena, i.e. the default hooks can be read out of a default arena, and then
customized ones can be created based on these hooks.  Note that mixing the default with
customized hooks is not recommended, and should only be considered when the
customization is simple and straightforward.
2021-08-25 14:19:25 -07:00
Alex Lapenkou
c01a885e94 HPA: Correctly calculate retained pages
Retained pages are those which haven't been touched and are unbacked from OS
perspective. For a pageslab their number should equal "total pages in slab"
minus "touched pages".
2021-08-20 18:06:17 -07:00
Alex Lapenkou
2c625d5cd9 Fix warnings when compiled with clang
When clang sees an unknown warning option, unlike gcc it doesn't fail the build
with an error. It issues a warning. Hence JE_CFLAGS_ADD with warning options that
didn't exist in clang would still mark those options as available. This led to
several warnings when built with clang or "gcc" on OSX. This change fixes those
warnings by simply making clang fail builds with non-existent warning options.
2021-08-13 14:14:46 -07:00
Alex Lapenkou
9d02bdc883 Port gen_run_tests.py to python3
Insignificant changes to make the script runnable on python3.
2021-08-13 10:59:32 -07:00
Qi Wang
5884a076fb Rename prof.dump_prefix to prof.prefix
This better aligns with our naming convention.  The option has not been included
in any upstream release yet.
2021-08-12 23:04:29 -07:00
Qi Wang
6a01600712 Add Cirrus CI testing matrix
Contains 16 testing configs -- a mix of debug, prof, -m32
and a few uncommon options.
2021-08-10 09:59:10 -07:00
Alex Lapenkou
f58064b932 Verify that HPA is used before calling its functions
This change eliminates the possibility of PA calling functions of uninitialized
HPA.
2021-08-05 16:43:28 -07:00
David Goldblatt
27f71242b7 Mutex: Tweak internal spin count.
The recent pairing heap optimizations flattened the lock hold time profile.
This was a win for raw cycle counts, but ended up causing us to "just miss"
acquiring the mutex before sleeping more often.  Bump those counts.
2021-08-05 14:33:16 -07:00
David Goldblatt
6f41ba55ee Mutex: Make spin count configurable.
Don't document it since we don't want to support this as a "real" setting, but
it's handy for testing.
2021-08-05 10:13:53 -07:00
David Goldblatt
dae24589bc PH: Insert-below-min fast-path. 2021-08-02 15:02:49 -07:00
David Goldblatt
40d53e007c ph: Add aux-list counting and pre-merging. 2021-08-02 15:02:49 -07:00
David Goldblatt
dcb7b83fac Eset: Cache summary information for heap edatas.
This lets us do a single array scan to find first fits, instead of taking a
cache miss per examined size class.
2021-08-02 15:02:49 -07:00
David Goldblatt
252e0942d0 Eset: Pull per-pszind data into structs.
We currently have one for stats and one for the data.  The data struct is just a
wrapper around the edata_heap_t, but this will change shortly.
2021-08-02 15:02:49 -07:00
David Goldblatt
dc0a4b8b2f Edata: Pull out comparison fields into a summary.
For now, this is a no-op; eventually, it will allow some caching in the eset.
2021-08-02 15:02:49 -07:00
David Goldblatt
0170dd198a Edata: Fix a couple typos.
Some readability-enhancing whitespace, and a spelling error.
2021-08-02 15:02:49 -07:00
David Goldblatt
08a4cc0969 Pairing heap: inline functions instead of macros.
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).

This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
2021-08-02 15:02:49 -07:00
David Goldblatt
92a1e38f52 edata_cache: Allow unbounded fast caching.
The edata_cache_small had a fill/flush heuristic.  In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.

As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink.  The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyways.
2021-07-26 15:14:37 -07:00
David Goldblatt
d93eef2f40 HPA: Introduce a redesigned hpa_central_t.
For now, this only handles allocating virtual address space to shards, with no
reuse.  This is framework, though; it will change over time.
2021-07-23 21:59:59 -07:00
David Goldblatt
e09eac1d4e Remove hpa_central.
This is now dead code.
2021-07-23 21:59:59 -07:00
Alex Lapenkou
c88fe355e6 Add unit tests for decay
After slight changes in the interface, it's an opportunity to enhance unit
tests.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
aaea4fd1e6 Add more documentation to decay.c
It took me a while to understand why some things are implemented the way they
are, so hopefully it will help future readers.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
4b633b9a81 Clean up background thread sleep computation
Isolate the computation of purge interval from background thread logic and
move into more suitable file.
2021-07-22 23:19:09 -07:00
David Goldblatt
6630c59896 HPA: Hugification hysteresis.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long.  This avoids hugifying extents that might shortly get
dehugified for purging.

Rename and use the hpa_dehugification_threshold option support code for this,
since it's now ignored.
2021-07-12 17:59:18 -07:00
David Goldblatt
113938b6f4 HPA: Pull out a hooks type.
For now, this is a no-op change.  In a subsequent commit, it will be useful for
testing.
2021-07-12 17:59:18 -07:00
David Goldblatt
1d4a7666d5 HPA: Do deferred operations on background threads. 2021-07-12 17:59:18 -07:00
David Goldblatt
583284f2d9 Add HPA deferral functionality. 2021-07-12 17:59:18 -07:00
David Goldblatt
ace329d11b HPA batch dalloc: Just do one deferred work check.
We only need to do one check per batch dalloc, not one check per dalloc in the
batch.
2021-07-12 17:59:18 -07:00
David Goldblatt
47d8a7e6b0 psset: Purge empty slabs first.
These are particularly good candidates for purging (listed in the diff).
2021-07-12 17:59:18 -07:00
David Goldblatt
41fd56605e HPA: Purge across retained extents.
This lets us cut down on the number of expensive system calls we perform.
2021-07-12 17:59:18 -07:00
David Goldblatt
347523517b PAI: Fix a typo. 2021-07-12 17:59:11 -07:00
David Goldblatt
9c42ed2d14 Travis: Don't test "clang" on OS X.
On OS X, "gcc" is really just clang anyways, so this combination gets tested by
the gcc test.  This is purely redundant, and (since it runs early in the output)
increases time to signal for real breakages further down in the list.
2021-07-08 09:53:28 -07:00
David Goldblatt
d202218e86 HPA: Fix typos with big performance implications.
This fixes two simple but significant typos in the HPA:
- The conf string parsing accidentally set a min value of PAGE for
  hpa_sec_batch_fill_extra; i.e. allocating 4096 extra pages every time we
  attempted to allocate a single page.  This puts us over the SEC flush limit,
  so we then immediately flush all but one of them (probably triggering
  purging).
- The HPA was using the default PAI batch alloc implementation, which meant it
  did not actually get any locking advantages.

This snuck by because I did all the performance testing without using the PAI
interface or config settings.  When I cleaned it up and put everything behind
nice interfaces, I only did correctness checks, and didn't try any performance
ones.
2021-06-24 16:26:55 -07:00
David Goldblatt
de033f56c0 mpsc_queue: Add module.
This is a simple multi-producer, single-consumer queue.  The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards.  We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
2021-06-24 14:55:49 -07:00
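A generic sketch of such a queue (not jemalloc's mpsc_queue module): producers push with a single CAS onto an atomic inbox, and the lone consumer takes the whole batch with one exchange.

    #include <stdatomic.h>
    #include <stddef.h>

    typedef struct node_s {
        struct node_s *next;
        int payload;
    } node_t;

    typedef struct {
        _Atomic(node_t *) head;
    } mpsc_t;

    /* Any producer thread may call this; one CAS per push. */
    static void mpsc_push(mpsc_t *q, node_t *n) {
        node_t *old = atomic_load_explicit(&q->head, memory_order_relaxed);
        do {
            n->next = old;
        } while (!atomic_compare_exchange_weak_explicit(&q->head, &old, n,
            memory_order_release, memory_order_relaxed));
    }

    /* Only the single consumer calls this; returns the batch in LIFO order. */
    static node_t *mpsc_pop_batch(mpsc_t *q) {
        return atomic_exchange_explicit(&q->head, (node_t *)NULL,
            memory_order_acquire);
    }

    int main(void) {
        static node_t n1 = {NULL, 1}, n2 = {NULL, 2};
        mpsc_t q = {NULL};
        mpsc_push(&q, &n1);
        mpsc_push(&q, &n2);
        node_t *batch = mpsc_pop_batch(&q);   /* n2 -> n1 */
        return batch == &n2 ? 0 : 1;
    }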
David Goldblatt
4452a4812f Add opt.experimental_infallible_new.
This allows a guarantee that operator new never throws.

Fix the .gitignore rules to include test/integration/cpp while we're here.
2021-06-24 12:22:51 -07:00
David Goldblatt
0689448b1e Travis: Unbreak the builds.
In the hopes of future-proofing as much as possible, jump to the latest
distribution Travis supports.
2021-06-24 07:40:28 -07:00
David Carlier
4fb93a18ee extent_can_acquire_neighbor typo fix 2021-06-19 08:13:11 -07:00
Vineet Gupta
2381efab57 ARC: add Minimum allocation alignment
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-06-03 13:43:38 -07:00
Ondřej Surý
2c0f4c2ac3 Fix typo in configure.ac: experimetal -> experimental 2021-05-25 08:20:37 -07:00
David Goldblatt
36c6bfb963 SEC: Allow arbitrarily many shards, cached sizes. 2021-05-22 08:17:41 -07:00
Deanna Gelbart
11beab38bc Added --debug-syms-by-id option 2021-05-17 10:00:40 -07:00
Qi Wang
08089589f7 Fix an interaction between the oversize_threshold test and bgthds.
Also added the shared utility to check if background_thread is enabled.
2021-05-13 16:19:14 -07:00
David Goldblatt
5417938215 Red-black tree: add summarize/filter.
This allows tracking extra information in the nodes of an red-black tree to
filter searches in the tree to just those that match some property.
2021-05-12 11:14:23 -07:00
David Goldblatt
b2c08ef2e6 RB unit tests: don't test reentrantly.
The RB code doesn't do any allocation, and takes a little bit of time to run.
There's no sense in doing everything three times.
2021-05-12 11:14:23 -07:00
David Goldblatt
aea91b8c33 Clean up some minor data structure inconsistencies
Namely, unify the include guard styling with the majority of the project, and do
flat_bitmap -> fb, to match its naming convention.
2021-05-12 11:14:23 -07:00
David Goldblatt
1f688490e1 Stats: Fix a printing bug when hpa_dirty_mult = -1
Missed a layer of indirection.
2021-05-05 19:45:25 -07:00
David Goldblatt
4f7cb3a413 Sized deallocation: fix a typo.
dealloction -> deallocation.
2021-05-04 16:46:15 -07:00
David Goldblatt
12cd13cd41 Fix thread.name/prof_sys_thread_name interaction
When prof_sys_thread_name is true, we don't allow setting the thread name.
Teach the unit test this.
2021-03-31 14:45:12 -07:00
David Goldblatt
304cdbb132 Fix a prof_recent/prof_sys_thread_name interaction
When both of these are enabled, the output format changes slightly.  Teach the
unit test about the interaction.
2021-03-31 14:45:12 -07:00
Qi Wang
9b523c6c15 Refactor the locking in extent_recycle().
Hold the ecache lock across extent_recycle_extract() and extent_recycle_split(),
so that the extent_deactivate after split can avoid re-taking the ecache mutex.
2021-03-31 14:42:33 -07:00
Qi Wang
ce68f326b0 Avoid the release & re-acquire of the ecache locks around the merge hook. 2021-03-31 14:42:33 -07:00
Qi Wang
7dc77527ba Delete the mutex_pool module. 2021-03-29 17:19:53 -07:00
Qi Wang
03d95cba88 Remove the unnecessary arena_ind_set in base_alloc_edata().
All edata alloc sites are already followed with proper edata_init().
2021-03-29 17:19:53 -07:00
Qi Wang
3093d9455e Move the edata mergeability related functions to extent.h. 2021-03-29 17:19:53 -07:00
Qi Wang
7c964b0352 Add rtree_write_range(): writing the same content to multiple leaf elements.
Apply to emap_(de)register_interior which became noticeable in perf profiles.
2021-03-29 17:19:53 -07:00
Qi Wang
add636596a Stop checking head state in the merge hook.
Now that all merging goes through try_acquire_edata_neighbor, the mergeability
checks (including head state checking) are done before reaching the merge hook.
In other words, the merge hook will never be called if the head states don't agree.
2021-03-29 17:19:53 -07:00
Qi Wang
49b7d7f0a4 Passing down the original edata on the expand path.
Instead of passing down the new_addr, pass down the active edata which allows us
to always use a neighbor-acquiring semantic.  In other words, this tells us both
the original edata and neighbor address.  With this change, only neighbors of a
"known" edata can be acquired, i.e. acquiring an edata based on an arbitrary
address isn't possible anymore.
2021-03-29 17:19:53 -07:00
Qi Wang
1784939688 Use rtree tracked states to protect edata outside of ecache locks.
This avoids the addr-based mutexes (i.e. the mutex_pool), and instead relies on
the metadata tracked in rtree leaf: the head state and extent_state.  Before
trying to access the neighbor edata (e.g. for coalescing), the states will be
verified first -- only neighbor edatas from the same arena and with the same
state will be accessed.
2021-03-29 17:19:53 -07:00
Qi Wang
9ea235f8fe Add witness_assert_positive_depth_to_rank(). 2021-03-29 17:19:53 -07:00
Qi Wang
4d8c22f9a5 Store edata->state in rtree leaf and make edata_t 128B aligned.
Verified that this doesn't result in any real increase of edata_t bytes
allocated.
2021-03-29 17:19:53 -07:00
Qi Wang
70d1541c5b Track extent is_head state in rtree leaf. 2021-03-29 17:19:53 -07:00
Qi Wang
862219e461 Add quiescence sync before deleting base during arena_destroy. 2021-03-29 17:19:53 -07:00
Evers Chen
a137a68252 Remove redundant declaration, pac_retain_grow_limit_get_set was declared twice in pac.h 2021-03-29 16:42:46 -07:00
lirui
2ae1ef7dbd Fix doc large size 54 KiB error 2021-03-26 13:34:49 -07:00
Qi Wang
61afb6a405 Fix locking on arena_i_destroy_ctl().
The ctl_mtx should be held to protect against concurrent arenas.create.
2021-03-22 23:18:52 -07:00
David Goldblatt
9193ea2248 Cirrus: fix build.
Remaining on 12.1 has started to break with an m4 error.  Upgrading fixes
things.

Mangle public symbols to work around a public definition error.
2021-03-19 13:42:30 -07:00
Qi Wang
3913077146 Mark head state during dss alloc.
Specifically, the extent_dalloc_gap relies on the correct head state to
coalesce.
2021-03-12 19:17:25 -08:00
Qi Wang
11127240ca Remove redundant enable-debug definition in configure. 2021-03-12 11:30:56 -08:00
Qi Wang
22be724af4 Set is_head in extent_alloc_wrapper w/ retain.
When retain is on, when extent_grow_retained failed (e.g. due to split hook
failures), we'll try extent_alloc_wrapper as the last resort.  Set the is_head
bit in that case to be consistent.  The allocated extent in that case will be
retained properly, but not merged with other extents.
2021-03-12 10:20:08 -08:00
David Goldblatt
73ca4b8ef8 HPA: Use dirtiest-first purging.
This seems to be practically beneficial, despite some pathological corner cases.
2021-02-19 15:10:54 -08:00
David Goldblatt
0f6c420f83 HPA: Make purging/hugifying more principled.
Before this change, purge/hugify decisions had several sharp edges that could
lead to pathological behavior if tuning parameters weren't carefully chosen.
It's the first of a series; this introduces basic "make every hugepage with
dirty pages purgeable" functionality, and the next commit expands that
functionality to have a smarter policy for picking hugepages to purge.

Previously, the dehugify logic would *never* dehugify a hugepage unless it was
dirtier than the dehugification threshold.  This can lead to situations in which
these pages (which themselves could never be purged) would push us above the
maximum allowed dirty pages in the shard.  This forces immediate purging of any
pages deallocated in non-hugified hugepages, which in turn places nonobvious
practical limitations on the relationships between various config settings.

Instead, we make our preference not to dehugify to purge a soft one rather than
a hard one.  We'll avoid purging them, but only so long as we can do so by
purging non-hugified pages.  If we need to purge them to satisfy our dirty page
limits, or to hugify other, more worthy candidates, we'll still do so.
2021-02-19 15:10:54 -08:00
David Goldblatt
6bddb92ad6 psset: Rename "bitmap" to "pageslab_bitmap".
It tracks pageslabs.  Soon, we'll have another bitmap (to track dirty pages)
that we want to disambiguate.

While we're here, fix an out-of-date comment.
2021-02-19 15:10:54 -08:00
David Goldblatt
154aa5fcc1 Use the flat bitmap for eset and psset bitmaps.
This is simpler (note that the eset field comment was actually incorrect!), and
slightly faster.
2021-02-19 15:10:54 -08:00
David Goldblatt
271a676dcd hpdata: early bailout for longest free range.
A number of common special cases allow us to stop iterating through an hpdata's
bitmap earlier rather than later.
2021-02-19 15:10:54 -08:00
David Goldblatt
d21d5b46b6 Edata: Move sn into its own field.
This lets the bins use a fragmentation avoidance policy that matches the HPA's
(without affecting the PAC).
2021-02-19 15:10:54 -08:00
David Goldblatt
fb327368db SEC: Expand option configurability.
This change pulls the SEC options into a struct, which simplifies their handling
across various modules (e.g. PA needs to forward on SEC options from the
malloc_conf string, but it doesn't really need to know their names).  While
we're here, make some of the fixed constants configurable, and unify naming from
the configuration options to the internals.
2021-02-19 15:10:54 -08:00
David Goldblatt
ce9386370a HPA: Implement batch allocation. 2021-02-19 15:10:54 -08:00
David Goldblatt
cdae6706a6 SEC: Use batch fills.
Currently, this doesn't help much, since no PAI implementation supports
flushing.  This will change in subsequent commits.
2021-02-19 15:10:54 -08:00
David Goldblatt
480f3b11cd Add a batch allocation interface to the PAI.
For now, no real allocator actually implements this interface; this will change
in subsequent diffs.
2021-02-19 15:10:54 -08:00
David Goldblatt
bf448d7a5a SEC: Reduce lock hold times.
Only flush a subset of extents during flushing, and drop the lock while doing
so.
2021-02-19 15:10:54 -08:00
David Goldblatt
1944ebbe7f HPA: Implement batch deallocation.
This saves O(n) mutex locks/unlocks during SEC flush.
2021-02-19 15:10:54 -08:00
David Goldblatt
f47b4c2cd8 PAI/SEC: Add a dalloc_batch function.
This lets the SEC flush all of its items in a single call, rather than flushing
them one at a time.
2021-02-19 15:10:54 -08:00
David Goldblatt
4b8870c7db SEC: Fix a comment typo. 2021-02-19 15:10:54 -08:00
Jordan Rome
cde7097eca Update INSTALL.md to mention 'autoconf'
'autoconf' needs to be installed for './autogen.sh' to work.
2021-02-16 17:48:46 -08:00
Qi Wang
a11be50332 Implement opt.cache_oblivious.
Keep config.cache_oblivious for now to remain backward-compatible.
2021-02-11 11:32:01 -08:00
Jordan Rome
8c5e5f50a2 Fix stats for "tcache_max" (was "lg_tcache_max")
This opt was changed here: c8209150f9
and it looks like this got missed.

Also update the write type to be unsigned.
2021-02-10 23:01:46 -08:00
Qi Wang
041145c272 Report the correct and wrong sizes on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
Qi Wang
f3b2668b32 Report the offending pointer on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
David Goldblatt
edbfe6912c Inline malloc fastpath into operator new.
This saves a small but non-negligible amount of CPU in C++ programs.
2021-02-08 14:17:47 -08:00
David Goldblatt
79f81a3732 HPA: Make dirty_mult configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
32dd153796 HPA: Make dehugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
4790db15ed HPA: make the hugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
b3df80bc79 Pull HPA options into a containing struct.
Currently that just means max_alloc, but we're about to add more.  While we're
touching these lines anyways, tweak things to be more in line with testing.
2021-02-04 20:58:31 -08:00
David Goldblatt
bdb7307ff2 fxp: Add FXP_INIT_PERCENT
This lets us specify fxp values easily in source.
2021-02-04 20:58:31 -08:00
David Goldblatt
caef4c2868 FXP: add fxp_mul_frac.
This can multiply size_ts by a fraction without the risk of overflow.
2021-02-04 20:58:31 -08:00
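An illustrative sketch of how such an overflow-free multiply can work (not the fxp API itself; the 16.16 layout and a fraction strictly below 1 are assumptions for the example): split the size_t into high and low halves so neither partial product can overflow.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint32_t fxp_t;   /* assumed 16 integer bits . 16 fraction bits */

    /* n * (frac / 2^16), computed as hi*frac + (lo*frac >> 16) where
     * n == hi*2^16 + lo; with frac < 2^16 neither term overflows size_t. */
    static inline size_t
    fxp_mul_frac_sketch(size_t n, fxp_t frac) {
        size_t hi = n >> 16;
        size_t lo = n & 0xffff;
        return hi * (size_t)frac + ((lo * (size_t)frac) >> 16);
    }

    int main(void) {
        fxp_t three_quarters = (fxp_t)(3 * 65536 / 4);   /* 0.75 in 16.16 */
        return fxp_mul_frac_sketch(1000, three_quarters) == 750 ? 0 : 1;
    }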
David Goldblatt
56e85c0e47 HPA: Use a whole-shard purging heuristic.
Previously, we used only hpdata-local information to decide whether to purge.
2021-02-04 20:58:31 -08:00
David Goldblatt
dc886e5608 hpdata: Return the number of pages to be purged.
We'll use this in the next commit.
2021-02-04 20:58:31 -08:00
David Goldblatt
9fd9c876bb psset: keep aggregate stats.
This will let us quickly query these stats to make purging decisions quickly.
2021-02-04 20:58:31 -08:00
David Goldblatt
da63f23e68 HPA: Track pending purges/hugifies in the psset.
This finishes the refactoring of the HPA/psset interactions the past few commits
have been building towards.

Rather than the HPA removing and then reinserting hpdatas, it simply begins
updates and ends them.  These updates can set flags on the hpdata that prevent
it from being returned for certain types of requests.  For example, it can call
hpdata_alloc_allowed_set(hpdata, false) during an update, at which point the
given hpdata will no longer be returned for psset_pick_alloc requests.

This has various benefits:
- It maintains stats correctness during purges and hugifies.
- It allows simpler and more explicit concurrency control for the various
  special cases (e.g. allocations are disallowed during purge, but not during
  hugify).
- It lets allocations and deallocations avoid disturbing the purging and
  hugification orderings.  If an hpdata "loses its place" in one of the queues
  just due to an alloc / dalloc, it can result in pathological edge cases where
  very hot, very full hugepages never get hugified (and cold extents on the
  same hugepage as hot ones never get purged).

The key benefit though is that tracking hpdatas to be purged / hugified in a
principled way will let us do delayed purging and hugification.  Eventually this
will let us move these operations to background threads, but in the short term
the benefit is that it will let us have global purging policies (e.g. purge when
the entire arena has too many dirty pages, rather than any particular hugepage).
2021-02-04 20:58:31 -08:00
David Goldblatt
0ea3d6307c CTL, Stats: report HPA empty slab stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
bf64557ed6 Move empty slab tracking to the psset.
We're moving towards a world in which purging decisions are less rigidly
enforced at a single-hugepage level.  In that world, it makes sense to keep
around some hpdatas which are not completely purged, in which case we'll need to
track them.
2021-02-04 20:58:31 -08:00
David Goldblatt
99fc0717e6 psset: Reconceptualize insertion/removal.
Really, this isn't a functional change, just a naming change.  We start thinking
of pageslabs as being always in the psset.  What we used to think of as removal
is now thought of as being in the psset, but in the process of being updated
(and therefore, unavailable for serving new allocations).

This is in preparation of subsequent changes to support deferred purging;
allocations will still be in the psset for the purposes of choosing when to
purge, but not for purposes of allocation/deallocation.
2021-02-04 20:58:31 -08:00
David Goldblatt
061cabb712 HPA stats: report retained instead of inactive.
This more closely maps to the PAC.
2021-02-04 20:58:31 -08:00
David Goldblatt
d3e5ea03c5 HPA: Track dirty stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
68a1666e91 hpdata: Rename "dirty" to "touched".
This matches the usage in the rest of the codebase.
2021-02-04 20:58:31 -08:00
David Goldblatt
be0d7a53f3 HPA: Don't track inactive pages.
This is really only useful for human consumption.  Correspondingly, emit it only
in the human-readable stats, and let everybody else compute from the hugepage
size and nactive.
2021-02-04 20:58:31 -08:00
David Goldblatt
55e0f60ca1 psset stats: Simplify handling.
We can treat the huge and nonhuge cases uniformly using huge state as an array
index.
2021-02-04 20:58:31 -08:00
David Goldblatt
94cd9444c5 HPA: Some minor reformattings. 2021-02-04 20:58:31 -08:00
David Goldblatt
b25ee5d88e HPA: Add purge stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
746ea3de6f HPA stats: Allow some derived stats.
However, we put them in their own struct, to avoid the messiness that the arena
has (mixing derived and non-derived stats in the arena_stats_t).
2021-02-04 20:58:31 -08:00
David Goldblatt
30b9e8162b HPA: Generalize purging.
Previously, we would purge a hugepage only when it's completely empty.  With
this change, we can purge even when only partially empty.  Although the
heuristic here is still fairly primitive, this infrastructure can scale to
become more advanced.
2021-02-04 20:58:31 -08:00
David Goldblatt
70692cfb13 hpdata: Add state changing helpers.
We're about to allow hugepage subextent purging; get as much of our metadata
handling ready as possible.
2021-02-04 20:58:31 -08:00
David Goldblatt
9b75808be1 flat bitmap: Add a bitwise and/or/not.
We're about to need them.
2021-02-04 20:58:31 -08:00
David Goldblatt
2ae966222f hpdata: track per-page dirty state. 2021-02-04 20:58:31 -08:00
David Goldblatt
ff4086aa6b hpdata: count active pages instead of free ones.
This will be more consistent with later naming choices.
2021-02-04 20:58:31 -08:00
David Goldblatt
3624dd42ff hpdata: Add a comment for hpdata_consistent. 2021-02-04 20:58:31 -08:00
David Goldblatt
20140629b4 Bin: Move stats closer to the mutex.
This is a slight cache locality optimization.
2021-02-04 14:10:43 -08:00
David Goldblatt
c259323ab3 Use ticker_geom_t for arena tcache decay. 2021-02-04 14:10:43 -08:00
David Goldblatt
8edfc5b170 Add ticker_geom_t.
This lets a single ticker object drive events across a large number of different
tick streams while sharing state.
2021-02-04 14:10:43 -08:00
David Goldblatt
3967329813 Arena: share bin offsets in a global.
This saves us a cache miss when looking up the arena bin offset in a remote
arena during tcache flush.  All arenas share the base offset, and so we don't
need to look it up repeatedly for each arena.  Secondarily, it shaves 288 bytes
off the arena on, e.g., x86-64.
2021-02-04 14:10:43 -08:00
David Goldblatt
2fcbd18115 Cache bin: Don't reverse flush order.
The items we pick to flush matter a lot, but the order in which they get flushed
doesn't; just use forward scans.  This simplifies the accessing code, both in
terms of the C and the generated assembly (i.e. this speeds up the flush
pathways).
2021-02-04 14:10:43 -08:00
David Goldblatt
4c46e11365 Cache an arena's index in the arena.
This saves us a pointer hop down some perf-sensitive paths.
2021-02-04 14:10:43 -08:00
David Goldblatt
229994a204 Tcache flush: keep common path state in registers.
By carefully force-inlining the division constants and the operation sum count,
we can eliminate redundant operations in the arena-level dalloc function.  Do
so.
2021-02-04 14:10:43 -08:00
David Goldblatt
31a629c3de Tcache flush: prefetch edata contents.
This frontloads more of the miss latency.  It also moves it to a pathway where
we have not yet acquired any locks, so that it should (hopefully) reduce hold
times.
2021-02-04 14:10:43 -08:00
David Goldblatt
9f9247a62e Tcache flushing: increase cache miss parallelism.
In practice, many rtree_leaf_elm accesses are cache misses.  By restructuring,
we can make it more likely that these misses occur without blocking us from
starting later lookups, taking more of those misses in parallel.
2021-02-04 14:10:43 -08:00
David Goldblatt
181ba7fd4d Tcache flush: Add an emap "batch lookup" path.
For now this is a no-op; but the interface is a little more flexible for our
purposes.
2021-02-04 14:10:43 -08:00
David Goldblatt
c007c537ff Tcache flush: Unify edata lookup path. 2021-02-04 14:10:43 -08:00
David CARLIER
35a8552605 Mac OS: Tag mapped pages.
This can be used to help profiling tools (e.g. vmmap) identify the
sources of mappings more specifically.
2021-02-03 15:05:53 -08:00
Yinan Zhang
f6699803e2 Fix duration in prof log 2021-01-25 16:38:38 -08:00
Azat Khuzhin
a943172b73 Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)
qemu does not support this yet [1], and you can get a very tricky assert
if you run a program with jemalloc in use under qemu:

    <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

  [1]: https://patchwork.kernel.org/patch/10576637/

Here is a simple example that shows the problem [2]:

    // Gist to check possible issues with MADV_DONTNEED
    // For example it does not supported by qemu user
    // There is a patch for this [1], but it hasn't been applied.
    //   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>
    #include <assert.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(addr, 'A', 1<<16);

        if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
            puts("MADV_DONTNEED does not return error. Check memory.");
            for (int i = 0; i < 1<<16; ++i) {
                assert(((unsigned char *)addr)[i] == 0);
            }
        } else {
            perror("madvise");
        }

        if (munmap(addr, 1<<16)) {
            perror("munmap");
            return 1;
        }

        return 0;
    }

  ### unpatched qemu

      $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
      MADV_DONTNEED does not return error. Check memory.
      test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
      qemu: uncaught target signal 6 (Aborted) - core dumped
      Aborted (core dumped)

  ### patched qemu (by returning ENOSYS error)

      $ qemu-x86_64 /tmp/test-MADV_DONTNEED
      madvise: Success

  ### patch for qemu to return ENOSYS

      diff --git a/linux-user/syscall.c b/linux-user/syscall.c
      index 897d20c076..5540792e0e 100644
      --- a/linux-user/syscall.c
      +++ b/linux-user/syscall.c
      @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                  turns private file-backed mappings into anonymous mappings.
                  This will break MADV_DONTNEED.
                  This is a hint, so ignoring and returning success is ok.  */
      -        return 0;
      +        return ENOSYS;
       #endif
       #ifdef TARGET_NR_fcntl64
           case TARGET_NR_fcntl64:

  [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece

v2:
- review fixes
- add opt_dont_trust_madvise
v3:
- review fixes
- rename opt_dont_trust_madvise to opt_trust_madvise
2021-01-20 20:08:30 -08:00
Uwe L. Korn
2e3104ba07 Update config.{sub,guess} to support support-aarch64-apple-darwin as a target 2021-01-11 14:00:03 -08:00
David Goldblatt
a011c4c22d cache_bin: Separate out local and remote accesses.
This fixes an incorrect debug-mode assert:
- T1 starts an arena stats update and reads stack_head from another thread's
  cache bin, when that cache bin has 1 item in it.
- T2 allocates from that cache bin.  The cache_bin's stack_head now points to a
  NULL pointer, since the cache bin is empty.
- T1 Re-reads the cache_bin's stack_head to perform an assertion check (since it
  previously saw that the bin was empty, whatever stack_head points to should be
  non-NULL).
2021-01-08 14:18:08 -08:00
Yinan Zhang
14d689c0f9 Add prof stats mutex stats 2021-01-07 20:39:49 -08:00
Yinan Zhang
9f71b5779b Output prof stats in stats print 2021-01-07 20:39:49 -08:00
Yinan Zhang
1f1a0231ed Split macros for initializing stats headers 2021-01-07 20:39:49 -08:00
Yinan Zhang
4352cbc21c Add alignment tests for prof stats 2021-01-07 20:39:49 -08:00
Yinan Zhang
54f3351f1f Add mallctl for prof stats fetching 2021-01-07 20:39:49 -08:00
Yinan Zhang
40fa4d29d3 Track per size class internal fragmentation 2021-01-07 20:39:49 -08:00
Yinan Zhang
afa489c3c5 Record request size in prof info 2021-01-07 20:39:49 -08:00
David Goldblatt
f9bb8dedef Un-force-inline do_rallocx.
The additional overhead of the function-call setup and flags checking is
relatively small, but costs us the replication of the entire realloc pathway in
terms of size.
2021-01-04 14:55:49 -08:00
David Goldblatt
a9fa2defdb Add JEMALLOC_COLD, and mark some functions cold.
This hints to the compiler that it should care more about space than CPU (among
other things).  In cases where the compiler lacks profile-guided information,
this can be a substantial space savings.

For now, we mark the mallctl or atexit driven profiling and stats functions that
take up the most space.
2021-01-04 14:55:49 -08:00
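A sketch of the mechanism (the macro name follows the commit; the exact compiler guard is an assumption): rarely-executed functions are annotated so the compiler optimizes them for size and keeps them out of hot code layout.

    #if defined(__GNUC__) || defined(__clang__)
    #  define JEMALLOC_COLD __attribute__((cold))
    #else
    #  define JEMALLOC_COLD
    #endif

    /* Example: a large, rarely-run reporting path. */
    JEMALLOC_COLD
    void stats_print_slow_path(void) {
        /* verbose stats formatting would live here */
    }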
David Goldblatt
5d8e70ab26 prof_recent: cassert(config_prof) more often.
This tells the compiler that these functions are never called, which lets them
be optimized away in builds where profiling is disabled.
2021-01-04 14:55:49 -08:00
David Goldblatt
83cad746ae prof_log: cassert(config_prof) in public functions
This lets the compiler infer that the code is dead in builds where profiling is
enabled, saving on space there.
2021-01-04 14:55:49 -08:00
David Goldblatt
526180b76d Extent.c: Avoid an rtree NULL-check.
The edge case in which pages_map returns (void *)PAGE can trigger an incorrect
assertion failure.  Avoid it.
2021-01-04 14:50:49 -08:00
Yinan Zhang
b35ac00d58 Do not bump to large size for page aligned request 2020-12-29 17:09:58 -08:00
Yinan Zhang
8a56d6b636 Add last-N mutex stats 2020-12-29 09:44:19 -08:00
Yinan Zhang
22d62d8cbd Handle ending gap properly for HPA stats 2020-12-18 16:40:57 -08:00
Yinan Zhang
6c5a3a24dd Omit bin stats rows with no data 2020-12-18 16:40:57 -08:00
Yinan Zhang
ea013d8fa4 Enforce realloc sizing stability 2020-12-18 11:41:52 -08:00
Yinan Zhang
74bd63b203 Optimize stats print using partial name-to-mib 2020-12-18 10:39:58 -08:00
Yinan Zhang
4557c0a67d Enable ctl on partial mib and partial name 2020-12-18 10:39:58 -08:00
Yinan Zhang
006dd0414e Add partial name-to-mib functionality 2020-12-18 10:39:58 -08:00
Yinan Zhang
f2e1a5be77 Do not fail on partial ctl path for ctl_nametomib()
We do not fail on partial ctl path when the given `mib` array is
shorter than the given name, and we should keep the behavior the
same in the reverse case, which I feel is also the more natural way.
2020-12-18 10:39:58 -08:00
Yinan Zhang
6ab181d2b7 Extract node lookup given mib input 2020-12-18 10:39:58 -08:00
Yinan Zhang
3a627b9674 No need to record all nodes in ctl_lookup() 2020-12-18 10:39:58 -08:00
Yinan Zhang
91e006c4c2 Enable ctl_lookup() to start from arbitrary node 2020-12-18 10:39:58 -08:00
Jin Qian
063a767ffe Define JEMALLOC_HAS_ALLOCA_H for QNX
QNX has <alloca.h>
2020-12-18 10:05:59 -08:00
Jin Qian
4e3fe218e9 Use posix_madvise to purge pages when available 2020-12-18 10:05:59 -08:00
Jin Qian
26c1dc5a3a Support AutoConf for posix_madvise and POSIX_MADV_DONTNEED 2020-12-18 10:05:59 -08:00
Jin Qian
96a59c3bb5 Fix recursive malloc during bootstrap on QNX
pthread_key_create on QNX triggers recursive allocation during tsd
bootstrapping. Use tsd_init_check_recursion to detect that.

Before pthread_key_create, the address of tsd_boot_wrapper is returned
from tsd_get_wrapper instead of using TLS to store the pointer.
tsd_set_wrapper becomes a no-op. After that, the address of
tsd_boot_wrapper is written to TLS and bootstrap continues as before.

Signed-off-by: Jin Qian <jqian@aurora.tech>
2020-12-18 10:05:59 -08:00
Jin Qian
986cbe4881 Disable JEMALLOC_TLS for QNX
TLS access triggers recursive malloc during bootstrapping. Need to use
pthread_getspecific and pthread_setspecific with a follow up fix.
2020-12-18 10:05:59 -08:00
David Goldblatt
1e3b8636ff HPA: Remove unused malloc_conf options. 2020-12-08 12:10:48 -08:00
Yinan Zhang
e82771807e Cache mallctl mib for batch allocation stress test 2020-12-07 09:10:11 -08:00
Yinan Zhang
0dfdd31e0f Add tiny batch size to batch allocation stress test 2020-12-07 09:10:11 -08:00
Aditya Kumar
9522ae41d6 Move n_search outside of assert as reported by static analyzer 2020-12-07 06:49:27 -08:00
David Goldblatt
a559caf74a hpdata: Strengthen assertions.
Now that we have flat bitmap bit counting functions, we can easily assert that
nfree is always correct.  While we're tightening up this code, enforce
consistency on API boundaries as well.
2020-12-07 06:21:08 -08:00
David Goldblatt
f51948d9e1 psset unit test: fix a bug.
The next commit adds assertions that reveal a bug in the test code
(double-free).  Fix it.
2020-12-07 06:21:08 -08:00
David Goldblatt
54c94c1679 flat bitmap: add scount / ucount functions.
These can compute the number of set or unset bits in a subrange of the bitmap.
2020-12-07 06:21:08 -08:00
David Goldblatt
e6c057ad35 fb: implement assign in terms of a visitor.
We'll reuse this visitor in the next commit.
2020-12-07 06:21:08 -08:00
David Goldblatt
734e72ce8f bit_util: Guarantee popcount's presence.
Implement popcount generically, so that we can rely on it being present.
2020-12-07 06:21:08 -08:00
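As a hedged sketch of what a generic (builtin-free) popcount fallback can look like; jemalloc's actual bit_util implementation may differ:

```
#include <stdint.h>

/* Classic SWAR bit count: fold pairs, then nibbles, then bytes, and sum the
 * per-byte counts with a multiply. */
static inline unsigned
popcount_u32_generic(uint32_t x) {
	x = x - ((x >> 1) & 0x55555555u);
	x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
	x = (x + (x >> 4)) & 0x0f0f0f0fu;
	return (unsigned)((x * 0x01010101u) >> 24);
}
```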
David Goldblatt
d9f7e6c668 hpdata: Add a test.
We're about to make the functionality here more complicated; testing hpdata
directly (rather than relying on user's tests) will make debugging easier.
2020-12-07 06:21:08 -08:00
David Goldblatt
3ed0b4e8a3 HPA: Add an nevictions counter.
I.e. the number of times we've purged a hugepage-sized region.
2020-12-07 06:21:08 -08:00
David Goldblatt
fffcefed33 malloc_conf: Clarify HPA options. 2020-12-07 06:21:08 -08:00
David Goldblatt
f7cf23aa4d psset: Relegate alloc/dalloc to test code.
This is no longer part of the "core" functionality; we only need the stub
implementations as an end-to-end test of hpdata + psset interactions when
metadata is being modified.  Treat them accordingly.
2020-12-07 06:21:08 -08:00
David Goldblatt
f9299ca572 HPA: Use psset fit/insert/remove.
This will let us remove alloc_new and alloc_reuse functions from the psset.
2020-12-07 06:21:08 -08:00
David Goldblatt
0971e1e4e3 hpdata: Use addr/size instead of begin/npages.
This is easier for the users of the hpdata.
2020-12-07 06:21:08 -08:00
David Goldblatt
5228d869ee psset: Use fit/insert/remove as basis functions.
All other functionality can be implemented in terms of these; doing so (while
retaining the same API) will be convenient for subsequent refactors.
2020-12-07 06:21:08 -08:00
David Goldblatt
089f8fa442 Move hpdata bitmap logic out of the psset. 2020-12-07 06:21:08 -08:00
David Goldblatt
ca30b5db2b Introduce hpdata_t.
Using an edata_t both for hugepages and the allocations within those hugepages
was convenient at first, but has outlived its usefulness.  Representing
hugepages explicitly, with their own data structure, will make future
development easier.
2020-12-07 06:21:08 -08:00
David Goldblatt
4a15008cfb HPA unit test: skip if unsupported.
Previously, we replicated the logic in hpa_supported in the test as well.
2020-12-07 06:21:08 -08:00
David Goldblatt
43af63fff4 HPA: Manage whole hugepages at a time.
This redesigns the HPA implementation to allow us to manage hugepages all at
once, locally, without relying on a global fallback.
2020-12-07 06:21:08 -08:00
David Goldblatt
63677dde63 Pages: Statically detect if pages_huge may succeed 2020-12-07 06:21:08 -08:00
David Goldblatt
c1b2a77933 psset: Move in stats.
A later change will benefit from having these functions pulled into a
psset-module set of functions.
2020-12-07 06:21:08 -08:00
David Goldblatt
d0a991d47b psset: Add insert/remove functions.
These will allow us to (for instance) move pageslabs from a psset dedicated to
not-yet-hugeified pages to one dedicated to hugeified ones.
2020-12-07 06:21:08 -08:00
David Goldblatt
d438296b1f narenas_ratio: Accept fractional values.
With recent scalability improvements to the HPA, we're experimenting with much
lower arena counts; this gets annoying when trying to test across different
hardware configurations using only the narenas setting.
2020-12-04 23:48:19 -08:00
David Goldblatt
ecd39418ac Add fxp: A fixed-point math library.
This will be used in the next commit to allow non-integer values for
narenas_ratio.
2020-12-04 23:48:19 -08:00
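As a rough illustration of the fixed-point idea (assuming a Q16.16 layout; the fxp module's actual representation and API may differ), a fractional ratio can be stored in an integer and applied with shifts:

```
#include <stdint.h>

typedef uint32_t q16_16_t;	/* 16 integer bits, 16 fractional bits */
#define Q16_16_ONE ((q16_16_t)1 << 16)

static inline q16_16_t
q16_16_from_double(double d) {
	return (q16_16_t)(d * (double)Q16_16_ONE);
}

/* Scale an integer count (e.g. the number of CPUs) by a fractional ratio. */
static inline uint32_t
q16_16_mul_int(uint32_t n, q16_16_t ratio) {
	return (uint32_t)(((uint64_t)n * ratio) >> 16);
}
```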
Igor Wiedler
99c2d6c232 Backport jeprof --collapse for flamegraph generation 2020-12-04 10:48:21 -08:00
David Carlier
520b75fa2d utrace support with label based signature. 2020-11-30 11:43:00 -08:00
Yinan Zhang
92e189be8b Add some comments to the batch allocation logic flow 2020-11-16 20:58:01 -08:00
Yinan Zhang
d96e4525ad Route batch allocation of small batch size to tcache 2020-11-16 20:58:01 -08:00
Yinan Zhang
ac480136d7 Split out locality checking in batch allocation tests 2020-11-16 20:58:01 -08:00
Yinan Zhang
be5e49f4fa Add a batch mode for cache_bin_alloc() 2020-11-16 20:58:01 -08:00
Yinan Zhang
4a65f34930 Fix a cache bin test 2020-11-16 20:58:01 -08:00
Yinan Zhang
566c4a8594 Slight changes to cache bin internal functions 2020-11-16 20:58:01 -08:00
Yinan Zhang
9545c2cd36 Add sample interval to prof last-N dump 2020-11-13 15:33:27 -08:00
David Goldblatt
cf2549a149 Add a per-arena oversize_threshold.
This can let manual arenas trade off memory and CPU the way auto arenas do.
2020-11-13 13:45:35 -08:00
David Goldblatt
4ca3d91e96 Rename geom_grow -> exp_grow.
This was promised in the review of the introduction of geom_grow, but would have
been painful to do there because of the series that introduced it.  Now that
those are committed, renaming is easier.
2020-11-13 13:42:33 -08:00
David Goldblatt
b4c37a6e81 Rename edata_tree_t -> edata_avail_t.
This isn't a tree any more, and it mildly irritates me any time I see it.
2020-11-13 13:42:11 -08:00
David Carlier
95f0a77fde Detect pthread_getname_np explicitly.
At least one libc (musl) defines pthread_setname_np without defining
pthread_getname_np. Detect the presence of each individually, rather than
inferring both must be defined if set is.
2020-11-11 17:31:22 -08:00
Issam E. Maghni
b3c5690b7e Update config.{guess,sub} to 2020-11-07@77632d9 2020-11-10 13:32:32 -08:00
David Goldblatt
589638182a Use the edata_cache_small_t in the HPA. 2020-11-05 12:34:43 -08:00
David Goldblatt
03a6047111 Edata cache small: rewrite.
In previous designs, this was intended to be a sort of cache that couldn't fail.
In the current design, we want to use it just as a contention reduction
mechanism.  Rewrite it with those goals in mind.
2020-11-05 12:34:43 -08:00
David Goldblatt
c9757d9e3b HPA: Don't disable shards that were never started. 2020-11-05 12:34:43 -08:00
David Goldblatt
1b3ee75667 Add experimental.thread.activity_callback.
This (experimental, undocumented) functionality can be used by users to track
various statistics of interest at a finer level of granularity than the thread.
2020-11-05 12:33:25 -08:00
David Carlier
27ef02ca9a Android build fix proposal.
These are detected at configure time, but they are glibc-specific.  The
bionic equivalent is not API-compatible, and dlopen is restricted on this
platform.
2020-11-02 13:38:44 -08:00
David Carlier
d2d941017b MADV_DO[NOT]DUMP support equivalence on FreeBSD. 2020-11-02 09:15:15 -08:00
David Goldblatt
180b843159 Appveyor: fix 404 errors.
It looks like the mirrors we were using no longer carry this package, but that
it is installed by default and so no longer needs a remote mirror.
2020-10-27 15:28:20 -07:00
DC
ef6d51ed44 DragonFlyBSD build support. 2020-10-27 12:35:19 -07:00
Qi Wang
bf72188f80 Allow opt.tcache_max to accept small size classes.
Previously all the small size classes were cached.  However this has downsides
-- particularly when page size is greater than 4K (e.g. iOS), which will result
in much higher SMALL_MAXCLASS.

This change allows tcache_max to be set to lower values, to better control
resources taken by tcache.
2020-10-24 20:43:44 -07:00
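A hedged usage sketch: with this change an application can cap the tcache below SMALL_MAXCLASS, e.g. at 1 KiB, via the application-provided malloc_conf string (the same option can also be supplied through the MALLOC_CONF environment variable):

```
/* Cache only size classes up to 1024 bytes in the per-thread cache. */
const char *malloc_conf = "tcache_max:1024";
```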
David Goldblatt
ea32060f9c SEC: Implement thread affinity.
For now, just have every thread pick a shard once and stick with it.
2020-10-23 11:14:34 -07:00
David Goldblatt
d16849c91d psset: Do first-fit based on slab age.
This functions more like the serial number strategy of the ecache and
hpa_central_t.  Longer-lived slabs are more likely to continue to live for
longer in the future.
2020-10-23 11:14:34 -07:00
David Goldblatt
634ec6f50a Edata: add an "age" field. 2020-10-23 11:14:34 -07:00
David Goldblatt
6599651aee PA: Use an SEC in front of the HPA shard. 2020-10-23 11:14:34 -07:00
David Goldblatt
ea51e97bb8 Add SEC module: a small extent cache.
This can be used to take pressure off a more centralized, worse-sharded
allocator without requiring a full break of the arena abstraction.
2020-10-23 11:14:34 -07:00
David Goldblatt
1964b08394 HPA: Add stats for the hpa_shard. 2020-10-23 11:14:34 -07:00
David Goldblatt
534504d4a7 HPA: add size-exclusion functionality.
I.e. only allowing allocations under or over certain sizes.
2020-10-23 11:14:34 -07:00
David Goldblatt
484f04733e HPA: Add central mutex contention stats. 2020-10-23 11:14:34 -07:00
David Goldblatt
bf025d2ec8 HPA: Make slab sizes and maxes configurable.
This allows easy experimentation with them as tuning parameters.
2020-10-23 11:14:34 -07:00
David Goldblatt
1c7da33317 HPA: Tie components into a PAI implementation. 2020-10-23 11:14:34 -07:00
Qi Wang
c8209150f9 Switch from opt.lg_tcache_max to opt.tcache_max
Though for convenience, keep parsing lg_tcache_max.
2020-10-22 20:40:41 -07:00
Yinan Zhang
5ba861715a Add thread name in prof last-N records 2020-10-20 15:58:24 -07:00
David Goldblatt
4ef5b8b4df Add a logo to doc_internal.
This is the logo from the jemalloc development team's snazzy windbreakers.  We
don't actually use it in any documentation yet, but there's no reason we
couldn't.  In the meantime, it's probably best if it exists somewhere more
stable than various email inboxes.
2020-10-19 15:32:51 -07:00
Qi Wang
5e41ff9b74 Add a hard limit on tcache max size class.
For locality reasons, tcache bins are integrated in TSD.  Allowing all size
classes to be cached has little benefit, but takes up much thread local storage.
In addition, it complicates the layout which we try hard to optimize.
2020-10-16 13:49:51 -07:00
Qi Wang
3de19ba401 Eagerly detect double free and sized dealloc bugs for large sizes. 2020-10-15 10:03:16 -07:00
David Goldblatt
be9548f2be Tcaches: Fix a subtle race condition.
Without a lock held continuously between checking tcaches_past and incrementing
it, it's possible for two threads to go down manual creation path
simultaneously.  If the number of tcaches is one less than the maximum, it's
possible for both to create a tcache and increment tcaches_past, with the second
thread returning a value larger than TCACHES_MAX.
2020-10-13 15:06:16 -07:00
Qi Wang
a9aa6f6d0f Fix the alloc_ctx check in free_fastpath.
The sanity check requires a functional TSD, which free_fastpath only guarantees
after the threshold branch.  Move the check function to afterwards.
2020-10-12 19:02:27 -07:00
David Goldblatt
b971f7c4dd Add "default" option to slab sizes.
This comes in handy when overriding earlier settings to test alternate ones.  We
don't really include tests for this, but I claim that's OK here:
- It's fairly straightforward
- It's fairly hard to test well
- This entire code path is undocumented and mostly for our internal
  experimentation in the first place.
- I tested manually.
2020-10-07 12:54:29 -07:00
David Goldblatt
21b70cb540 Add hpa_central module
This will be the centralized component of the coming hugepage allocator; the
source of larger chunks of memory from which smaller ones can be obtained.
2020-10-05 19:55:57 -07:00
David Goldblatt
1ed7ec369f Emap: Add emap_assert_not_mapped.
The counterpart to emap_assert_mapped, it lets callers check that some edata is
not already in the emap.
2020-10-05 19:55:57 -07:00
David Goldblatt
2a6ba121b5 PRNG test: cleanups.
Since we no longer have both atomic and non-atomic variants, there's no reason
to try to test both.
2020-10-05 19:55:57 -07:00
David Goldblatt
9e6aa77ab9 PRNG: Remove atomic functionality.
These had no uses and complicated the API.  As a rule we now expect to only use
thread-local randomization for contention-reduction reasons, so we only pay the
API costs and never get the functionality benefits.
2020-10-05 19:55:57 -07:00
David Goldblatt
0513047170 PRNG: Allow a range argument of 1.
This is convenient when the range argument itself is generated from some
computation whose value we don't know in advance.
2020-10-05 19:55:57 -07:00
David Goldblatt
bdb60a8053 Appveyor: don't update msys2 keyring.
This is no longer required, and the step now fails.
2020-10-05 19:54:21 -07:00
David Goldblatt
025d8c37c9 Add a script to check for clang-formattedness. 2020-10-02 14:49:56 -07:00
David Goldblatt
f6bbfc1e96 Add a .clang-format file. 2020-10-02 14:49:56 -07:00
David Goldblatt
259c5e3e8f psset: Add stats 2020-09-18 12:39:25 -07:00
David Goldblatt
018b162d67 Add psset: a set of pageslabs.
This introduces a new sort of edata_t; a pageslab, and a set to manage them.
This is part of a series of a commits to implement a hugepage allocator; the
pageset will be per-arena, and track small page allocations requests within a
larger extent allocated from a centralized hugepage allocator.
2020-09-18 12:39:25 -07:00
David Goldblatt
ed99d300b9 Flat bitmap: Add longest-range computation.
This will come in handy in the (upcoming) page-slab set assertions.
2020-09-18 12:39:25 -07:00
David Goldblatt
e034500698 Edata: rename "ranged" bit to "pai".
This better represents its intended purpose; the hugepage allocator design
evolved away from needing contiguity of hugepage virtual address space.
2020-09-18 12:39:25 -07:00
David Goldblatt
7ad2f78663 Avoid a -Wundef warning on LG_SLAB_MAXREGS. 2020-09-17 10:05:40 -07:00
David Goldblatt
40cf71a06d Remove --with-slab-maxregs options from INSTALL.md
The variable slab sizes feature is still experimental; we don't want people to
start using it willy-nilly, or document its existence as a guarantee.
2020-09-17 10:05:40 -07:00
ezeeyahoo
36ebb5abe3 CI support for PPC64LE architecture 2020-09-17 10:03:08 -07:00
Hao Liu
1541ffc765 configure: add --with-lg-slab-maxregs configure option.
Specify the maximum number of regions in a slab, which is
(<lg-page> - <lg-tiny-min>) by default. This increases the limit of slab sizes
specified by "slab_sizes" in malloc_conf. This should never be less than
the default value. The max value of this option is related to LG_BITMAP_MAXBITS
(see more in bitmap.h).

For example, on a 4k page size system, if we:
  1) configure jemalloc with --with-lg-slab-maxregs=12.
  2) export MALLOC_CONF="slab_sizes:9-16:4"
The slab size for the 16-byte size class is set to 4 pages.  Previously, the
default lg-slab-maxregs was 9 (i.e. 12 - 3), so the maximum slab size for the
16-byte class was 2 pages (i.e. (1<<9) * 16 bytes).  Increasing the value from
9 to 12 raises the maximum slab size settable via MALLOC_CONF to 16 pages
(i.e. (1<<12) * 16 bytes).
2020-09-16 13:58:38 -07:00
David Goldblatt
d243b4ec48 Add PROFILING_INTERNALS.md
This documents and explains some of the logic behind the profiling
implementation.
2020-09-10 15:56:59 -07:00
Yinan Zhang
09eda2c9b6 Add unit tests for usize in prof recent records 2020-09-09 13:31:35 -07:00
Yinan Zhang
b549389e4a Correct usize in prof last-N record 2020-09-09 13:31:35 -07:00
Yinan Zhang
202f01d4f8 Fix szind computation in profiling 2020-08-27 15:52:25 -07:00
Yinan Zhang
866231fc61 Do not repeat reentrancy test in profiling 2020-08-25 16:49:32 -07:00
Yinan Zhang
20f2479ed7 Do not create size class tables for non-prof builds 2020-08-24 20:10:02 -07:00
Yinan Zhang
8efcdc3f98 Move unbias data to prof_data 2020-08-24 20:10:02 -07:00
David Goldblatt
5e90fd006e Geom_grow: Don't keep the mutex internal.
We're about to use it in ways that will have external synchronization.
2020-08-19 16:53:21 -07:00
David Goldblatt
c57494879f Geom_grow: Don't take tsdn at init.
It's never used.
2020-08-19 16:53:21 -07:00
David Goldblatt
ffe552223c Geom_grow: Move in advancing logic. 2020-08-19 16:53:21 -07:00
David Goldblatt
131b1b5338 Rename ecache_grow -> geom_grow.
We're about to start using it outside of the ecaches, in the HPA central
allocator.
2020-08-19 16:53:21 -07:00
David Goldblatt
b399463fba flat_bitmap unit test: Silence a warning. 2020-08-17 12:50:27 -07:00
David Goldblatt
b0ffa39cac Mallctl stress test: fix a type.
The mallctlbymib_long helper was copy-pasted from mallctlbymib_short, and
incorrectly used its output variable (a char *) rather than the output variable
of the mallctl call it was using (a uint64_t), causing breakages when
sizeof(char *) differed from sizeof(uint64_t).
2020-08-17 12:50:14 -07:00
David Goldblatt
753bbf1849 Benchmarks: Also print ns / iter.
This is often what we really care about.  It's not easy to do the division
mentally in all cases.
2020-08-13 10:03:15 -07:00
David Goldblatt
7b187360e9 IO: Support 0-padding for unsigned numbers. 2020-08-13 10:03:15 -07:00
David Goldblatt
32d4673221 Add a mallctl speed stress test. 2020-08-13 10:03:15 -07:00
David Goldblatt
38867c5c17 Makefile: alphabetize stress/analyze utilities. 2020-08-13 10:03:15 -07:00
David Goldblatt
ab274a23b9 Add narenas_ratio.
This allows setting arenas per cpu dynamically, rather than forcing the user to
know the number of CPUs in advance if they want a particular CPU/space tradeoff.
2020-08-12 16:41:57 -07:00
David Goldblatt
9e18ae639f Config: safety checks don't imply size checks.
The commit introducing size checks accidentally enabled them whenever any safety
checks were on.  This ends up causing the regression that splitting up the
features was intended to avoid.  Fix the issue.
2020-08-12 13:00:19 -07:00
Yinan Zhang
8f9e958e1e Add alignment stress test for rallocx 2020-08-11 11:56:43 -07:00
Yinan Zhang
743021b63f Fix size miscalculation bug in reallocation 2020-08-11 11:56:43 -07:00
David Goldblatt
eaed1e39be Add sized-delete size-checking functionality.
The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them.  Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.

This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
2020-08-05 19:34:05 -07:00
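A hedged sketch of the class of bug the new size checks target, using the non-standard mallocx/sdallocx entry points; with the size-checking feature compiled in, a mismatched size argument is reported instead of silently corrupting allocator state:

```
#include <jemalloc/jemalloc.h>

static void
sized_dealloc_example(void) {
	void *p = mallocx(100, 0);
	sdallocx(p, 100, 0);	/* correct: the size matches the request */

	void *q = mallocx(100, 0);
	/* sdallocx(q, 200, 0);	   bug: wrong size; the checks flag this */
	sdallocx(q, 100, 0);
}
```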
David Goldblatt
53084cc5c2 Safety check: Don't directly abort.
The sized dealloc checks called the generic safety_check_fail, and then called
abort.  This means the failure case isn't mockable, hence not testable.  Fix it
in anticipation of a coming diff.
2020-08-05 19:34:05 -07:00
David Goldblatt
60993697d8 Prof: Add prof_unbias.
This gives more accurate attribution of bytes and counts to stack traces,
without introducing backwards incompatibilities in heap-profile parsing tools.
We track the ideal reported (to the end user) number of bytes more carefully
inside core jemalloc.  When dumping heap profiles, instead of outputting our
counts directly, we output counts that will cause parsing tools to give a result
close to the value we want.

We retain the old version as an opt setting, to let users who are tracking
values on a per-component basis to keep their metrics stable until they decide
to switch.
2020-08-05 18:33:55 -07:00
David Goldblatt
81c2f841e5 Add a simple utility to detect profiling bias. 2020-08-05 18:33:55 -07:00
Yinan Zhang
e032a1a1de Add a stress test for batch allocation 2020-08-03 09:36:40 -07:00
Yinan Zhang
f6cf5eb388 Add mallctl for batch allocation API 2020-07-31 09:16:50 -07:00
Yinan Zhang
978f830ee3 Add batch allocation API 2020-07-31 09:16:50 -07:00
Yinan Zhang
c6f59e9bb4 Add surplus reading API for thread event lookahead 2020-07-31 09:16:50 -07:00
Yinan Zhang
f805468957 Add zero option to arena batch allocation 2020-07-31 09:16:50 -07:00
Yinan Zhang
49e5c2fe7d Add batch allocation from fresh slabs 2020-07-31 09:16:50 -07:00
Yinan Zhang
2bb8060d57 Add empty test and concat for typed list 2020-07-31 09:16:50 -07:00
Yinan Zhang
f28cc2bc87 Extract bin shard selection out of bin locking 2020-07-31 09:16:50 -07:00
David Goldblatt
ddb8dc4ad0 FB: Add range iteration support. 2020-07-30 15:25:23 -07:00
David Goldblatt
ceee823519 Add flat_bitmap.
The flat_bitmap module offers an extended API, at the cost of decreased
performance in the case of very large bitmaps.
2020-07-30 15:25:23 -07:00
David Goldblatt
7fde6ac490 Nbits: Add a couple more interesting sizes.
Previously, all tests with more than two levels came in powers of 2.  It's
useful to check cases where we have a partially filled group above the
second level.
2020-07-30 15:25:23 -07:00
David Goldblatt
efeab1f498 bitset test: Pull NBITS_TAB into its own file. 2020-07-30 15:25:23 -07:00
David Goldblatt
22da836094 bit_util: Add fls_ functions; "find last set".
These simplify a lot of the bit_util module, which had grown bits and pieces of
this functionality across a variety of places over the years.

While we're here, kill off BIT_UTIL_INLINE and don't do reentrancy testing for
bit_util.
2020-07-30 15:25:23 -07:00
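A hedged, generic sketch of "find last set" (here, the 0-based index of the highest set bit, with the caller guaranteeing a nonzero argument); the bit_util helpers themselves may be implemented with compiler builtins instead:

```
#include <stdint.h>

static inline unsigned
fls_u32_generic(uint32_t x) {
	unsigned r = 0;
	while (x >>= 1) {
		r++;
	}
	return r;
}
```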
David Goldblatt
1ed0288d9c bit_util: Change ffs functions indexing.
Making these 0-based instead of 1-based makes calling code simpler and will be
more consistent with functions introduced in subsequent diffs.
2020-07-30 15:25:23 -07:00
David Goldblatt
786a27b9e5 CI: Update keyring. 2020-07-27 15:54:57 -07:00
Yinan Zhang
fb347dc618 Verify output space before doing heavy work in mallctl 2020-07-27 09:48:35 -07:00
Yinan Zhang
f5fb4e5a97 Modify mallctl output length when needed
This is the only reason why `oldlenp` was designed to be in the form
of a pointer.
2020-07-27 09:48:35 -07:00
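A hedged sketch of why `oldlenp` is in/out: the caller passes the output buffer size in, and mallctl() can write the length it actually produced back out; the names used here are standard mallctl entries, purely for illustration.

```
#include <stdio.h>
#include <jemalloc/jemalloc.h>

static void
print_allocated(void) {
	size_t allocated;
	size_t len = sizeof(allocated);	/* in: buffer size; out: bytes written */
	if (mallctl("stats.allocated", &allocated, &len, NULL, 0) == 0) {
		printf("allocated: %zu (output length: %zu)\n", allocated, len);
	}
}
```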
Yinan Zhang
4258402047 Corrections for prof_log_start() 2020-07-22 13:34:49 -07:00
Yinan Zhang
e6cb7a1c9b Shorten wait time for peak events 2020-07-14 09:00:33 -07:00
David Goldblatt
6107857b7b PA->PAC: Move in PAI implementation. 2020-07-09 13:41:04 -07:00
David Goldblatt
6041aaba97 PA -> PAC: Move in destruction functions. 2020-07-09 13:41:04 -07:00
David Goldblatt
cbf096b05e Arena: remove redundant bg inactivity check. 2020-07-09 13:41:04 -07:00
David Goldblatt
471eb5913c PAC: Move in decay rate setting. 2020-07-09 13:41:04 -07:00
David Goldblatt
6a2774719f PA->PAC: Move in decay functions. 2020-07-09 13:41:04 -07:00
David Goldblatt
4ee75be3a3 PA -> PAC: Move in decay_purge enum. 2020-07-09 13:41:04 -07:00
David Goldblatt
72435b0aba PA->PAC: Make extent.c forget about PA. 2020-07-09 13:41:04 -07:00
David Goldblatt
dee5d1c42d PA->PAC: Move in extent_sn. 2020-07-09 13:41:04 -07:00
David Goldblatt
7391382349 PA->PAC: Move in stats. 2020-07-09 13:41:04 -07:00
David Goldblatt
db211eefbf PAC: Move in decay. 2020-07-09 13:41:04 -07:00
David Goldblatt
c81e389996 PAC: Move in ecache_grow. 2020-07-09 13:41:04 -07:00
David Goldblatt
65803171a7 PAC: move in emap 2020-07-09 13:41:04 -07:00
David Goldblatt
7efcb946c4 PAC: Add an init function. 2020-07-09 13:41:04 -07:00
David Goldblatt
722652222a PAC: Move in edata_cache accesses. 2020-07-09 13:41:04 -07:00
David Goldblatt
777b0ba965 Add PAC: Page allocator classic.
For now, this is just a stub containing the ecaches, with no surrounding code
changed.  Eventually all the core allocator bits will be moved in, in the
subsequent stack of commits.
2020-07-09 13:41:04 -07:00
David Goldblatt
1b5f632e0f Introduce PAI: Page allocator interface 2020-07-09 13:41:04 -07:00
David Goldblatt
3cf19c6e5e atomic: add atomic_load_sub_store 2020-07-09 13:41:04 -07:00
David Goldblatt
f1f4ec315a Tcache: Tweak nslots_max tuning parameter.
In making these settings configurable, 634afc4124
unintentionally changed a tuning parameter (reducing the "goal" max by a factor of
4).  This commit undoes that change.
2020-07-09 08:58:05 -07:00
David Goldblatt
ae541d3fab Edata: Reserve some space for hugepages. 2020-07-08 13:20:59 -07:00
David Goldblatt
392f645f4d Edata: split up different list linkage uses. 2020-07-08 13:20:59 -07:00
David Goldblatt
129b727058 Add typed-list module.
This gives some named convenience wrappers.
2020-07-08 13:20:59 -07:00
David Carlier
00f06c9beb enabling mpss on solaris/illumos.
reusing the Linux configuration where possible, and aligning the
address range to HUGEPAGE.
2020-07-06 09:59:10 -07:00
Yinan Zhang
c2e7a06392 No need to intercept prof_dump_header() in tests 2020-06-29 14:27:50 -07:00
Yinan Zhang
f58ebdff7a Generalize prof_cnt_all() for testing 2020-06-29 14:27:50 -07:00
Yinan Zhang
80d18c18c9 Pass prof dump parameters explicitly in prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
d4259ea53b Simplify signatures for prof dump functions 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d823f3a91 Consolidate struct definitions for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
1f5fe3a3e3 Pass write callback explicitly in prof_data 2020-06-29 14:27:50 -07:00
Yinan Zhang
4556d3c0c8 Define structures for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
1c6742e6a0 Migrate prof dumping to use buffered writer 2020-06-29 14:27:50 -07:00
Yinan Zhang
dad821bb22 Move unwind to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
d128efcb6a Relocate a few prof utilities to the right modules 2020-06-29 14:27:50 -07:00
Yinan Zhang
4736fb4fc9 Move file handling logic in prof_data to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
767a2e1790 Move file handling logic in prof to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
03ae509f32 Create prof_sys module for reading system thread name 2020-06-29 14:27:50 -07:00
Yinan Zhang
adfd9d7b1d Change tsdn to tsd for thread name allocation 2020-06-29 14:27:50 -07:00
Yinan Zhang
841af2b426 Move thread name handling to prof_data module 2020-06-29 14:27:50 -07:00
Yinan Zhang
8118056c03 Expose prof_data testing internals only in prof tests 2020-06-29 14:27:50 -07:00
Yinan Zhang
f43ac8543e Correct prof header macro namings 2020-06-29 14:27:50 -07:00
Yinan Zhang
c8683bee80 Unify printing for prof counts object 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d292b5660 Push error handling logic out of core dumping logic 2020-06-29 14:27:50 -07:00
Yinan Zhang
f541871f5d Reduce prof dump buffer size in debug build 2020-06-29 14:27:50 -07:00
Yinan Zhang
354183b10d Define prof dump buffer size centrally 2020-06-29 14:27:50 -07:00
Yinan Zhang
7455813e57 Make dump file writing replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
21e44c45d9 Make maps file opening replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
4bb4037dbe Extract utility function for opening maps file 2020-06-29 14:27:50 -07:00
Yinan Zhang
f307b25804 Only replace the dump file opening function in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
d8cea87562 Move size inspections to test/analyze 2020-06-26 09:45:28 -07:00
Yinan Zhang
537a4bedb4 Add a tool to examine random number distributions 2020-06-26 09:45:28 -07:00
Yinan Zhang
d460333efb Improve naming for prof system thread name option 2020-06-24 14:32:01 -07:00
David T. Goldblatt
25e43c6022 Witness: Make ranks an enum.
This lets us avoid having to increment a bunch of values manually every time we
add a new sort of lock.
2020-06-19 18:05:08 -07:00
Yinan Zhang
092fcac0b4 Remove unnecessary source files 2020-06-19 12:15:44 -07:00
Yinan Zhang
a795b19327 Remove beginning define in source files
```
sed -i "/^#define JEMALLOC_[A-Z_]*_C_$/d" src/*.c;
```
2020-06-19 12:15:44 -07:00
Yinan Zhang
24bbf376ce Unify arena flag reading and selection 2020-06-19 11:06:05 -07:00
Yinan Zhang
e128b170a0 Do not fallback to auto arena when manual arena is requested 2020-06-19 11:06:05 -07:00
Yinan Zhang
95a59d2f72 Unify tcache flag reading and selection 2020-06-19 11:06:05 -07:00
Yinan Zhang
4b0c008489 Unify zero flag reading and setting 2020-06-19 11:06:05 -07:00
Yinan Zhang
2a84f9b8fc Unify alignment flag reading and computation 2020-06-19 11:06:05 -07:00
Yinan Zhang
b7858abfc0 Expose prof testing internal functions 2020-06-19 09:16:51 -07:00
Yinan Zhang
40fa6674a9 Fix prof timestamp conf reading 2020-06-17 16:02:51 -07:00
David Goldblatt
7e09a57b39 stress/sizes: Fix an off-by-one issue.
Algorithmically, a size greater than 1024 ZB could access one-past-the-end of
the sizes array.  This couldn't really happen since SIZE_MAX is less than 1024
ZB on all platforms we support (and we pick the arguments to this function to be
reasonable anyways), but it's not like there's any reason *not* to fix it,
either.
2020-06-16 10:34:19 -07:00
David Goldblatt
dcfa6fd507 stress/sizes: Add a couple more types. 2020-06-16 10:34:19 -07:00
David Goldblatt
40672b0b78 Remove duplicate logging in malloc. 2020-06-16 10:33:55 -07:00
Jon Haslam
4aea743279 High Resolution Timestamps for Profiling 2020-06-15 12:12:49 -07:00
David Goldblatt
d82a164d0d Add thread.peak.[read|reset] mallctls.
These can be used to track net allocator activity on a per-thread basis.
2020-06-11 13:54:22 -07:00
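A hedged usage sketch for the new mallctls: read the current thread's net peak around a region of interest, then reset the counter for the next measurement (the uint64_t output type is assumed here).

```
#include <inttypes.h>
#include <stdio.h>
#include <jemalloc/jemalloc.h>

static void
report_and_reset_thread_peak(void) {
	uint64_t peak;
	size_t len = sizeof(peak);
	if (mallctl("thread.peak.read", &peak, &len, NULL, 0) == 0) {
		printf("thread peak: %" PRIu64 " bytes\n", peak);
	}
	/* Start the next measurement from zero. */
	mallctl("thread.peak.reset", NULL, NULL, NULL, 0);
}
```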
David Goldblatt
fe7108305a Add peak_t, for tracking allocator net max. 2020-06-11 13:54:22 -07:00
David Goldblatt
17a64fe91c Add a small program to print data structure sizes. 2020-06-11 08:13:38 -07:00
Yinan Zhang
3e19ebd2ea Add lock to protect prof last-N dumping 2020-06-09 17:03:05 -07:00
Yinan Zhang
a835d9cf85 Make prof last-N dumping non-blocking 2020-06-09 17:03:05 -07:00
Yinan Zhang
fc8bc4b5c0 Increase dump buffer for prof last-N list 2020-06-09 17:03:05 -07:00
Yinan Zhang
264d89d641 Extract restore and async cleanup functions for prof last-N list 2020-06-09 17:03:05 -07:00
Yinan Zhang
857ebd3daf Make edata pointer on prof recent record an atomic fence 2020-06-09 17:03:05 -07:00
Yinan Zhang
b8bdea6b26 Fix: prof_recent_alloc_max_ctl_read() does not take tsd 2020-06-09 17:03:05 -07:00
Yinan Zhang
730658f72f Extract alloc/dalloc utility for last-N nodes 2020-06-09 17:03:05 -07:00
Yinan Zhang
035be44867 Separate out dumping for each prof recent record 2020-06-09 17:03:05 -07:00
David Goldblatt
8da0896b79 Tcache: Make an integer conversion explicit. 2020-05-28 15:52:40 -07:00
David Goldblatt
cd28e60337 Don't warn on uniform initialization. 2020-05-28 15:52:40 -07:00
David Goldblatt
6cdac3c573 Tcache: Make flush fractions configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
7503b5b33a Stats, CTL: Expose new tcache settings. 2020-05-16 13:34:23 -07:00
David Goldblatt
ee72bf1cfd Tcache: Add tcache gc delay option.
This can reduce flushing frequency for small size classes.
2020-05-16 13:34:23 -07:00
David Goldblatt
d338dd45d7 Tcache: Make incremental gc bytes configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
ec0b579563 Tcache: Privatize opt_lg_tcache_max default. 2020-05-16 13:34:23 -07:00
David Goldblatt
10b96f6351 Tcache: Remove some unused gc constants. 2020-05-16 13:34:23 -07:00
David Goldblatt
181093173d Tcache: make slot sizing configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
b58dea8d1b Cache bin: expose ncached_max publicly. 2020-05-16 13:34:23 -07:00
David Goldblatt
634afc4124 Tcache: Make size computation configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
97b7a9cf77 Add a fill/flush microbenchmark. 2020-05-16 13:34:23 -07:00
David Carlier
33372cbd40 cpu instruction spin wait for arm32/64 2020-05-14 10:31:20 -07:00
Brooks Davis
27f29e424b LQ_QUANTUM should be 4 on mips64 hardware.
This matches the ABI stack alignment requirements.
2020-05-14 10:30:37 -07:00
David Goldblatt
eda9c2858f Edata: zero stack edatas before initializing.
This avoids some UB. No compilers take advantage of it for now, but no sense in
tempting fate.
2020-05-14 10:30:20 -07:00
David Goldblatt
5dead37a9d Allow narenas:default.
This can be useful when you know you want to override some lower-priority
configuration setting with its default value, but don't know what that value
would be.
2020-05-14 10:30:08 -07:00
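A hedged one-line sketch of the new spelling: a later, higher-priority configuration string can restore narenas to its default without knowing the concrete value.

```
const char *malloc_conf = "narenas:default";
```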
Yinan Zhang
dcea2c0f8b Get rid of TSD -> thread event dependency 2020-05-12 09:16:16 -07:00
Yinan Zhang
75dae934a1 Always initialize TE counters in TSD init 2020-05-12 09:16:16 -07:00
Yinan Zhang
b06dfb9ccc Push event handlers to constituent modules 2020-05-12 09:16:16 -07:00
Yinan Zhang
381c97caa4 Treat postponed prof sample event as new event 2020-05-12 09:16:16 -07:00
Yinan Zhang
abd4674931 Extract out per event postponed wait time fetching 2020-05-12 09:16:16 -07:00
Yinan Zhang
f72014d097 Only compute thread event threshold once per trigger 2020-05-12 09:16:16 -07:00
Yinan Zhang
7324c4f85f Break down event init and handler functions 2020-05-12 09:16:16 -07:00
Yinan Zhang
6de77799de Move thread event wait time update to local 2020-05-12 09:16:16 -07:00
Yinan Zhang
733ae918f0 Extract out per event new wait time fetching 2020-05-12 09:16:16 -07:00
Yinan Zhang
1e2524e15a Do not reset sample wait time when re-initing tdata 2020-05-12 09:16:16 -07:00
Yinan Zhang
855d20f6f3 Remove outdated comments in thread event 2020-05-12 09:16:16 -07:00
Yinan Zhang
fc052ff728 Migrate counter to use locked int 2020-05-12 08:23:15 -07:00
Yinan Zhang
b543c20a94 Minor update to locked int 2020-05-12 08:23:15 -07:00
Yinan Zhang
f533ab6da6 Add forking handling for stats 2020-05-11 15:35:06 -07:00
Yinan Zhang
508303077b Add forking handling for prof idump counter 2020-05-11 15:35:06 -07:00
Yinan Zhang
4d970f8bfc Add forking handling for counter module 2020-05-11 15:35:06 -07:00
Yinan Zhang
2097e1945b Unify write callback signature 2020-05-11 14:51:24 -07:00
Yinan Zhang
fef9abdcc0 Cleanup tcache allocation logic
The logic in tcache allocation no longer involves profiling or
filling.
2020-05-11 12:24:56 -07:00
Yinan Zhang
e6cb6919c0 Consolidate prof inline function headers
The prof inline functions are no longer involved in a circular
dependency, so consolidate the two headers into one.
2020-05-11 12:24:56 -07:00
Yinan Zhang
d454af90f1 Remove unused prof_accum field from arena 2020-05-11 12:24:56 -07:00
Yinan Zhang
8be5584494 Initialize prof idump counter once rather than once per arena 2020-05-11 12:24:56 -07:00
Yinan Zhang
e10e5059e8 Make prof_idump_accum() non-inline 2020-05-11 12:24:56 -07:00
Yinan Zhang
039bfd4e30 Do not rollback prof idump counter in arena_prof_promote() 2020-05-11 12:24:56 -07:00
Yinan Zhang
0295aa38a2 Deduplicate entries in witness error message 2020-05-11 12:04:02 -07:00
David Goldblatt
f1f8a75496 Let opt.zero propagate to core allocation.
I.e. set dopts->zero early on if opt.zero is true, rather than leaving it set by
the entry-point function (malloc, calloc, etc.) and then memsetting.  This
avoids situations where we zero once in the large-alloc pathway and then again
via memset.
2020-05-04 12:36:45 -07:00
David Goldblatt
2c09d43494 Add a benchmark of large allocations. 2020-05-04 12:36:45 -07:00
David Goldblatt
46471ea327 SC: Name the max lookup constant. 2020-05-04 12:27:07 -07:00
David Goldblatt
79dd0c04ed SC: Simplify SC_NPSIZES computation.
Rather than taking all the sizes and subtracting out those that don't fit, we
instead just add up all the ones that do.
2020-05-04 12:27:07 -07:00
David Goldblatt
fb6cfffd39 Configure: Get rid of LG_QUANTA.
This is no longer used.
2020-05-04 12:27:07 -07:00
David Goldblatt
4f8efba824 TSD: Make rtree_ctx a slow-path field.
Performance-sensitive users will use sized deallocation facilities, so that
actually touching the rtree_ctx is unnecessary.  We make it the last element of
the slow data, so that it is for practical purposes almost-fast.
2020-04-14 15:20:19 -07:00
David Goldblatt
cd29ebefd0 Tcache: treat small and large cache bins uniformly 2020-04-14 15:20:19 -07:00
David Goldblatt
a13fbad374 Tcache: split up fast and slow path data. 2020-04-14 15:20:19 -07:00
David Goldblatt
7099c66205 Arena: fill in terms of cache_bins. 2020-04-14 15:20:19 -07:00
David Goldblatt
40e7aed59e TSD: Move in some of the tcache fields.
We had put these in the tcache for cache optimization reasons.  After the
previous diff, these no longer apply.
2020-04-14 15:20:19 -07:00
David Goldblatt
58a00df238 TSD: Put all fast-path data together. 2020-04-14 15:20:19 -07:00
David Goldblatt
3589571bfd SC: use SC_LG_NGROUP instead of its value.
This magic constant introduces inconsistencies.  We should be able to change its
value solely by adjusting the definition in the header.
2020-04-13 10:01:30 -07:00
David Goldblatt
877af247a8 QL, QR: Add documentation. 2020-04-11 10:32:11 -07:00
David Goldblatt
79ae7f9211 Rtree: Remove the per-field accessors.
We instead split things into "edata" and "metadata".
2020-04-10 13:12:47 -07:00
David Goldblatt
26e9a3103d PA: Simple decay test. 2020-04-10 13:12:47 -07:00
David Goldblatt
bb6a418523 Emap: Drop szind/slab splitting parameters.
After the previous diff, these are constants.
2020-04-10 13:12:47 -07:00
David Goldblatt
50289750b3 Extent: Remove szind/slab knowledge. 2020-04-10 13:12:47 -07:00
David Goldblatt
dc26b30094 Rtree: Clean up compact/non-compact split. 2020-04-10 13:12:47 -07:00
David Goldblatt
93b99dd140 Extent: Stop passing an edata_cache everywhere.
We already pass the pa_shard_t around everywhere; we can just use that.
2020-04-10 13:12:47 -07:00
David Goldblatt
a4759a1911 Ehooks: avoid touching arena_emap_global in tests.
That breaks our ability to test custom emaps in isolation.
2020-04-10 13:12:47 -07:00
David Goldblatt
11c47cb133 Extent: Take "bool zero" over "bool *zero". 2020-04-10 13:12:47 -07:00
David Goldblatt
1a1124462e PA: Take zero as a bool rather than as a bool *.
Now that we've moved junking to a higher level of the allocation stack, we don't
care about this performance optimization (which only occurred in debug modes).
2020-04-10 13:12:47 -07:00
David Goldblatt
294b276fc7 PA: Parameterize emap. Move emap_global to arena.
This lets us test the PA module without interfering with the global emap used by
the real allocator (the one not under test).
2020-04-10 13:12:47 -07:00
David Goldblatt
f730577277 Eset: Parameterize last globals accesses.
I.e. opt_retain and maps_coalesce.
2020-04-10 13:12:47 -07:00
David Goldblatt
7bb6e2dc0d Eset: take opt_lg_max_active_fit as a parameter.
This breaks its dependence on the global.
2020-04-10 13:12:47 -07:00
David Goldblatt
883ab327cc Emap: Move out last edata state touching. 2020-04-10 13:12:47 -07:00
David Goldblatt
0c96a2f03b Emap: Move out remaining edata modifications. 2020-04-10 13:12:47 -07:00
David Goldblatt
dfef0df71a Emap: Move edata modification out of emap_remap. 2020-04-10 13:12:47 -07:00
David Goldblatt
12eb888e54 Edata: Add a ranged bit.
We steal the dumpable bit, which we ended up not needing.
2020-04-10 13:12:47 -07:00
David Goldblatt
bd4fdf295e Rtree: Pull leaf contents into their own struct. 2020-04-10 13:12:47 -07:00
David Goldblatt
faec7219b2 PA: Move in decay initialization. 2020-04-10 13:12:47 -07:00
David Goldblatt
45671e4a27 PA: Move in retain growth limit setting. 2020-04-10 13:12:47 -07:00
David Goldblatt
daefde88fe PA: Move in mutex stats reading. 2020-04-10 13:12:47 -07:00
David Goldblatt
07675840a5 PA: Move in some more internals accesses. 2020-04-10 13:12:47 -07:00
David Goldblatt
238f3c7430 PA: Move in full stats merging. 2020-04-10 13:12:47 -07:00
David Goldblatt
81c6027592 Arena stats: Give it its own "mapped".
This distinguishes it from the PA mapped stat, which is now named "pa_mapped" to
avoid confusion. The (derived) arena stat includes base memory, and the PA stat
is no longer partially derived.
2020-04-10 13:12:47 -07:00
David Goldblatt
506d907e40 PA: Move in basic stats merging. 2020-04-10 13:12:47 -07:00
David Goldblatt
f29f6090f5 PA: Add pa_extra.c and put PA forking there. 2020-04-10 13:12:47 -07:00
David Goldblatt
8164fad404 Stats: Fix edata_cache size merging.
Previously, we assigned to the output rather than incrementing it.
2020-04-10 13:12:47 -07:00
David Goldblatt
565045ef71 Arena: Make more derived stats non-atomic/locked. 2020-04-10 13:12:47 -07:00
David Goldblatt
d0c43217b5 Arena stats: Move retained to PA, use plain ints.
Retained is a property of the allocated pages.  The derived fields no longer
require any locking; they're computed on demand.
2020-04-10 13:12:47 -07:00
David Goldblatt
e2cf3fb1a3 PA: Move in all modifications of mapped. 2020-04-10 13:12:47 -07:00
David Goldblatt
436789ad96 PA: Make mapped stat atomic.
We always have atomic_zu_t, and mapped/unmapped transitions are always expensive
enough that trying to piggyback on a lock is a waste of time.
2020-04-10 13:12:47 -07:00
David Goldblatt
3c28aa6f17 PA: Move edata_avail stat in, make it non-atomic. 2020-04-10 13:12:47 -07:00
David Goldblatt
f6bfa3dcca Move extent stats to the PA module.
While we're at it, make them non-atomic -- they are purely derived statistics
(and in fact aren't even in the arena_t or pa_shard_t).
2020-04-10 13:12:47 -07:00
David Goldblatt
527dd4cdb8 PA: Move in nactive counter. 2020-04-10 13:12:47 -07:00
David Goldblatt
c075fd0bcb PA: Minor cleanups and comment fixes. 2020-04-10 13:12:47 -07:00
David Goldblatt
46a9d7fc0b PA: Move in rest of purging. 2020-04-10 13:12:47 -07:00
David Goldblatt
2d6eec7b5c PA: Move in decay-all pathway. 2020-04-10 13:12:47 -07:00
David Goldblatt
65698b7f2e PA: Remove public visibility of some internals. 2020-04-10 13:12:47 -07:00
David Goldblatt
f012c43be0 PA: Move in decay_to_limit 2020-04-10 13:12:47 -07:00
David Goldblatt
103f5feda5 Move bg thread activity check out of purging core. 2020-04-10 13:12:47 -07:00
David Goldblatt
3034f4a508 PA: Move in decay_stashed. 2020-04-10 13:12:47 -07:00
David Goldblatt
aef28b2f8f PA: Move in stash_decayed. 2020-04-10 13:12:47 -07:00
David Goldblatt
655a096343 Move bg inactivity check out of purge inner loop.
I.e. do it once per call to arena_decay_stashed instead of once per muzzy purge.
2020-04-10 13:12:47 -07:00
David Goldblatt
71fc0dc968 PA: Move in remaining page allocation functions. 2020-04-10 13:12:47 -07:00
David Goldblatt
74958567a4 PA: have expand take sizes instead of new usize.
This avoids involving usize, which makes some of the stats modifications more
intuitively correct.
2020-04-10 13:12:47 -07:00
David Goldblatt
5bcc2c2ab9 PA: Have expand take szind and slab.
This isn't really necessary, but having a uniform API will help us later.
2020-04-10 13:12:47 -07:00
David Goldblatt
0880c2ab97 PA: Have large expands use it. 2020-04-10 13:12:47 -07:00
David Goldblatt
7be3dea82c PA: Have slab allocations use it. 2020-04-10 13:12:47 -07:00
David Goldblatt
9f93625c14 PA: Move in arena large allocation functionality. 2020-04-10 13:12:47 -07:00
David Goldblatt
7624043a41 PA: Add ehook-getting support. 2020-04-10 13:12:47 -07:00
David Goldblatt
eba35e2e48 Remove extent knowledge of arena. 2020-04-10 13:12:47 -07:00
David Goldblatt
e77f47a85a Move arena decay getters to PA. 2020-04-10 13:12:47 -07:00
David Goldblatt
48a2cd6d79 Decay: Add a (mostly stub) test case. 2020-04-10 13:12:47 -07:00
David Goldblatt
f77cec311e Decay: Take current time as an argument.
This better facilitates testing.
2020-04-10 13:12:47 -07:00
David Goldblatt
bf55e58e63 Rename test/unit/decay -> test/unit/arena_decay.
This is really more of an end-to-end test at the arena level; it's not just of
the decay code in particular any more.
2020-04-10 13:12:47 -07:00
David Goldblatt
d1d7e1076b Decay: move in some background_thread accesses. 2020-04-10 13:12:47 -07:00
David Goldblatt
cdb916ed3f Decay: Add comments for the public API. 2020-04-10 13:12:47 -07:00
David Goldblatt
8f2193dc8d Decay: Move in arena decay functions. 2020-04-10 13:12:47 -07:00
David Goldblatt
4d090d23f1 Decay: Introduce a stub .c file. 2020-04-10 13:12:47 -07:00
David Goldblatt
7b62885476 Introduce decay module and put decay objects in PA 2020-04-10 13:12:47 -07:00
David Goldblatt
497836dbc8 Arena stats: mark edata_avail as derived.
The true number is in the edata_cache itself.
2020-04-10 13:12:47 -07:00
David Goldblatt
3192d6b77d Extents: Have extent_dalloc_gap take ehooks.
We're almost to the point where the extent code doesn't know about arenas at
all.  In that world, we shouldn't pull them out of the arena.
2020-04-10 13:12:47 -07:00
David Goldblatt
22a0a7b93a Move arena_decay_extent to extent module. 2020-04-10 13:12:47 -07:00
David Goldblatt
70d12ffa05 PA: Move mapped into pa stats. 2020-04-10 13:12:47 -07:00
David Goldblatt
6ca918d0cf PA: Add a stats comment. 2020-04-10 13:12:47 -07:00
David Goldblatt
ce8c0d6c09 PA: Move in arena extent_sn counter.
Just another step towards making PA self-contained.
2020-04-10 13:12:47 -07:00
David Goldblatt
1ada4aef84 PA: Get rid of arena_ind_get calls.
This is another step on the path towards breaking the extent reliance on the
arena module.
2020-04-10 13:12:47 -07:00
David Goldblatt
1ad368c8b7 PA: Move in decay stats. 2020-04-10 13:12:47 -07:00
David Goldblatt
356aaa7dc6 Introduce lockedint module.
This pulls out the various abstractions where some stats counter is sometimes an
atomic, sometimes a plain variable, sometimes always protected by a lock,
sometimes protected by reads but not writes, etc.  With this change, these cases
are treated consistently, and access patterns tagged.

In the process, we fix a few missed-update bugs (where one caller assumes
"protected-by-a-lock" semantics and another does not).
2020-04-10 13:12:47 -07:00
David Goldblatt
acd0bf6a26 PA: move in ecache_grow. 2020-04-10 13:12:47 -07:00
David Goldblatt
32cb7c2f0b PA: Add a stats type. 2020-04-10 13:12:47 -07:00
David Goldblatt
688fb3eb89 PA: Move in the arena edata_cache. 2020-04-10 13:12:47 -07:00
David Goldblatt
8433ad84ea PA: move in shard initialization. 2020-04-10 13:12:47 -07:00
David Goldblatt
a24faed569 PA: Move in the ecache_t objects. 2020-04-10 13:12:47 -07:00
David Goldblatt
585f925055 Move cache index randomization out of extent.
This is logically at a higher level of the stack; extent should just allocate
things at the page-level; it shouldn't care exactly why the callers wants a
given number of pages.
2020-04-10 13:12:47 -07:00
David Goldblatt
12be9f5727 Add a stub PA module -- a page allocator. 2020-04-10 13:12:47 -07:00
Yinan Zhang
c4e9ea8cc6 Get rid of locks in prof recent test 2020-04-07 17:22:24 -07:00
Yinan Zhang
2deabac079 Get rid of custom iterator for last-N records 2020-04-07 17:22:24 -07:00
Yinan Zhang
a5ddfa7d91 Use ql for prof last-N list 2020-04-07 17:22:24 -07:00
David Goldblatt
8da6676a02 Don't do reentrant testing in junk tests. 2020-04-07 15:45:40 -07:00
Yinan Zhang
ce17af4221 Better structure ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
4b66297ea0 Add move constructor to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
a62b7ed928 Add emptiness checking to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
1dd24ca6d2 Add rotate functionality to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
0dc95a882f Add concat and split functionality to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
1ad06aa53b deduplicate insert and delete logic in qr module 2020-04-06 09:50:27 -07:00
Yinan Zhang
c9d56cddf2 Optimize meld in qr module
The goal of `qr_meld()` is to change the following four fields
`(a->prev, a->prev->next, b->prev, b->prev->next)` from the values
`(a->prev, a, b->prev, b)` to `(b->prev, b, a->prev, a)`.

This commit changes

```
a->prev->next = b;
b->prev->next = a;
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
```

to

```
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
a->prev->next = a;
b->prev->next = b;
```

The benefit is that we can use `b->prev->next` for `temp`, and so
there's no need to pass in `a_type`.

The restriction is that `b` cannot be a `qr_next()` macro, so users
of `qr_meld()` must pay attention.  (Before this change, neither `a`
nor `b` could be a `qr_next()` macro.)
2020-04-06 09:50:27 -07:00
David Goldblatt
0d6d9e8586 configure.ac: Put public symbols on one line. 2020-04-02 13:27:29 -07:00
Yinan Zhang
f9aad7a49b Add piping API to buffered writer 2020-04-01 09:41:20 -07:00
Yinan Zhang
09cd79495f Encapsulate buffer allocation failure in buffered writer 2020-04-01 09:41:20 -07:00
Yinan Zhang
a166c20818 Make prof_tctx_t pointer a true prof atomic fence 2020-03-31 17:43:42 -07:00
David T. Goldblatt
d936b46d3a Add malloc_conf_2_conf_harder
This comes in handy when you're just a user of a canary system who wants to
change settings set by the configuration system itself.
2020-03-31 06:25:08 -07:00
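A hedged sketch, assuming malloc_conf_2_conf_harder is defined by the application the same way as the existing malloc_conf symbol (a const char * read at startup, applied with higher priority); the option shown is only an example.

```
/* Override a setting injected by the canary/configuration system itself. */
const char *malloc_conf_2_conf_harder = "background_thread:false";
```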
David Goldblatt
3b4a03b92b Mac: don't declare system functions as nothrow.
This contradicts the system headers, which can lead to breakages.
2020-03-26 14:11:24 -07:00
Yinan Zhang
2256ef8961 Add option to fetch system thread name on each prof sample 2020-03-24 21:39:57 -07:00
Yinan Zhang
ccdc70a5ce Fix: assertion could abort on past failures 2020-03-18 20:48:26 -07:00
Yinan Zhang
b30a5c2f90 Reorganize cpp APIs and suppress unused function warnings 2020-03-13 12:16:09 -07:00
David Goldblatt
2e5899c129 Stats: Fix tcache_bytes reporting.
Previously, large allocations in tcaches would have their sizes reduced during
stats estimation.  Added a test, which fails before this change but passes now.

This fixes a bug introduced in 5934846612, which
was itself fixing a bug introduced in 9c0549007d.
2020-03-13 07:53:34 -07:00
Yinan Zhang
a5780598b3 Remove thread_event_rollback() 2020-03-12 13:55:00 -07:00
Yinan Zhang
ba783b3a0f Remove prof -> thread_event dependency 2020-03-12 13:55:00 -07:00
Yinan Zhang
441d88d1c7 Rewrite profiling thread event 2020-03-12 13:55:00 -07:00
David Goldblatt
0dcd576600 Edata cache: atomic fetch-add -> load-store.
The modifications to count are protected by a mutex; there's no need to use the
more costly version.
2020-03-12 11:58:09 -07:00
David Goldblatt
99b1291d17 Edata cache: add edata_cache_small_t.
This can be used to amortize the synchronization costs of edata_cache accesses.
2020-03-12 11:58:09 -07:00
David Goldblatt
734109d9c2 Edata cache: add a unit test. 2020-03-12 11:58:09 -07:00
David Goldblatt
e732344ef1 Inspect test: Reduce checks when profiling is on.
Profiled small allocations don't live in bins, which is contrary to the test
expectation.
2020-03-12 11:58:09 -07:00
David Goldblatt
92485032b2 Cache bin: improve comments. 2020-03-12 11:54:19 -07:00
David Goldblatt
d701a085c2 Fast path: allow low-water mark changes.
This lets us put more allocations on an "almost as fast" path after a flush.
This results in around a 4% reduction in malloc cycles in prod workloads
(corresponding to about a 0.1% reduction in overall cycles).
2020-03-12 11:54:19 -07:00
David Goldblatt
397da03865 Cache bin: rewrite to track more state.
With this, we track all of the empty, full, and low water states together.  This
simplifies a lot of the tracking logic, since we now don't need the
cache_bin_info_t for state queries (except for some debugging).
2020-03-12 11:54:19 -07:00
David Goldblatt
fef0b1ffe4 Cache bin: Remove last internals accesses. 2020-03-12 11:54:19 -07:00
David Goldblatt
0a2fcfac01 Tcache: Hold cache bin allocation explicitly. 2020-03-12 11:54:19 -07:00
David Goldblatt
d498a4bb08 Cache bin: Add an emptiness assertion. 2020-03-12 11:54:19 -07:00
David Goldblatt
6a7aa46ef7 Cache bin: Add a debug method for init checking. 2020-03-12 11:54:19 -07:00
David Goldblatt
370c1ea007 Cache bin: Write the unit test in terms of the API
I.e. stop allowing the unit test to have secret access to implementation
internals.
2020-03-12 11:54:19 -07:00
David Goldblatt
7f5ebd211c Cache bin: set low-water internally. 2020-03-12 11:54:19 -07:00
David Goldblatt
60113dfe3b Cache bin: Move in initialization code. 2020-03-12 11:54:19 -07:00
David Goldblatt
44529da852 Cache-bin: Make flush modifications internal
I.e. the tcache code just calls a cache-bin function to finish flush (and move
pointers around, etc.).  It doesn't directly access the cache-bin's owned memory
any more.
2020-03-12 11:54:19 -07:00
David Goldblatt
ff6acc6ed5 Cache bin: simplify names and argument ordering.
We always start with the cache bin, then its info (if necessary).
2020-03-12 11:54:19 -07:00
David Goldblatt
e1dcc557d6 Cache bin: Only take the relevant cache_bin_info_t
Previously, we took an array of cache_bin_info_ts and an index, and dereferenced
ourselves.  But infos for other cache_bins aren't relevant to any particular
cache bin, so that should be the caller's job.
2020-03-12 11:54:19 -07:00
David Goldblatt
1b00d808d7 cache_bin: Don't let arena see empty position. 2020-03-12 11:54:19 -07:00
David Goldblatt
d303f30796 cache_bin nflush -> n.
We're going to use it on the fill pathway as well.
2020-03-12 11:54:19 -07:00
David Goldblatt
74d36d78ef Cache bin: Make ncached_max a query on the info_t. 2020-03-12 11:54:19 -07:00
David Goldblatt
b66c0973cc cache_bin: Don't allow direct internals access. 2020-03-12 11:54:19 -07:00
David Goldblatt
da68f73296 Move percpu_arena_update.
It's not really part of the API of the arena; it changes which arena we're using
that API on.
2020-03-12 11:54:19 -07:00
David Goldblatt
909c501b07 Cache_bin: Shouldn't know about tcache.
Instead, have it take the cache_bin_info_ts to use by pointer.  While we're
here, add a src file for the cache bin.
2020-03-12 11:54:19 -07:00
David Goldblatt
79f1ee2fc0 Move junking out of arena/tcache code.
This is debug only and we keep it off the fast path.  Moving it here simplifies
the internal logic.

This never tries to junk on regions that were shrunk via xallocx.  I think this
is fine for two reasons:
- The shrunk-with-xallocx case is rare.
- We don't always do that anyway before this diff (it depends on the opt
  settings and extent hooks in effect).
2020-03-12 11:54:19 -07:00
David Goldblatt
b428dceeaf Config: Warn on void * pointer arithmetic.
This is handy while developing, but not portable.
2020-03-12 11:54:19 -07:00
David Goldblatt
22657a5e65 Extents: Silence the "potentially unused" warning. 2020-03-12 11:54:19 -07:00
Yinan Zhang
4a78c6d81b Correct thread event unit test 2020-03-10 09:31:55 -07:00
Yinan Zhang
305b1f6d96 Correction on geometric sampling 2020-03-04 13:55:21 -08:00
David T. Goldblatt
6c3491ad31 Tcache: Unify bin flush logic.
The small and large pathways share most of their logic, even if some of the
individual operations are different.  We pull out the common logic into a
force-inlined function, and then specialize twice, once for each value of
"small".
2020-02-25 10:21:03 -08:00
David T. Goldblatt
9f4fc27389 Ehooks: Fix a build warning.
We wrote `return some_void_func()` in a function returning void, which is
confusing and triggers warnings on MSVC.
2020-02-25 10:21:03 -08:00
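For illustration only (not the ehooks code itself), the warning-prone pattern and its fix look like this:

```
static void some_void_func(void) {}

static void
caller(void) {
	/* Was: `return some_void_func();` -- confusing, and it triggers
	 * warnings on MSVC.  Call and return separately instead. */
	some_void_func();
	return;
}
```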
Qi Wang
bc31041edb Cirrus-CI: test on new freebsd releases. 2020-02-23 20:43:38 -08:00
Yinan Zhang
51bd147422 Make use of assert_* in test/unit/thread_event.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
9d2cc3b0fa Make use of assert_* in test/unit/prof_recent.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
a88d22ea11 Make use of assert_* in test/unit/inspect.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
0ceb31184d Make use of assert_* in test/unit/buf_writer.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
fa61579382 Add assert_* functionality to tests 2020-02-19 16:03:16 -08:00
Yinan Zhang
21dfa4300d Change assert_* to expect_* in tests
```
grep -Irl assert_ test/ | xargs sed -i \
    's/witness_assert/witness_do_not_replace/g';
grep -Irl assert_ test/ | xargs sed -i \
    's/malloc_mutex_assert_owner/malloc_mutex_do_not_replace_owner/g';

grep -Ir assert_ test/ | grep -o "[_a-zA-Z]*assert_[_a-zA-Z]*" | \
    grep -v "^assert_"; # confirm no output
grep -Irl assert_ test/ | xargs sed -i 's/assert_/expect_/g';

grep -Irl witness_do_not_replace test/ | xargs sed -i \
    's/witness_do_not_replace/witness_assert/g';
grep -Irl malloc_mutex_do_not_replace_owner test/ | xargs sed -i \
    's/malloc_mutex_do_not_replace_owner/malloc_mutex_assert_owner/g';
```
2020-02-19 16:03:16 -08:00
David T. Goldblatt
162c2bcf31 Background thread: take base as a parameter. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
29436fa056 Break prof and tcache knowledge of b0. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
a0c1f4ac57 Rtree: take the base allocator as a parameter.
This facilitates better testing by avoiding mixing of the "real" base with the
base used by the rtree under test.
2020-02-18 11:22:09 -08:00
David T. Goldblatt
7013716aaa Emap: Take (and propagate) a zeroed parameter.
Rtree needs this, and we should really treat them similarly.
2020-02-18 11:22:09 -08:00
David T. Goldblatt
182192f83c Base: Pull into a single header. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
34b7165fde Put szind_t, pszind_t in sz.h. 2020-02-18 11:22:09 -08:00
David Goldblatt
7e6c8a7286 Emap: Standardize naming.
Namespace everything under emap_, always specify what it is we're looking up
(emap_lookup -> emap_edata_lookup), and use "ctx" over "info".
2020-02-17 10:50:51 -08:00
David Goldblatt
ac50c1e44b Emap: Remove direct access to emap internals.
In the process, we do a few local cleanups and optimizations.  In particular,
the size safety check on tcache flush no longer does a redundant load.
2020-02-17 10:50:51 -08:00
David Goldblatt
06e42090f7 Make jemalloc.c use the emap interface.
While we're here, we'll also clean up some style nits.
2020-02-17 10:50:51 -08:00
David Goldblatt
f7d9c6c42d Emap: Move in alloc_ctx lookup functionality. 2020-02-17 10:50:51 -08:00
David Goldblatt
65a54d7714 Emap: Move in szind and slab modifications. 2020-02-17 10:50:51 -08:00
David Goldblatt
9b5d105fc3 Emap: Move in iealloc.
This is logically scoped to the emap.
2020-02-17 10:50:51 -08:00
David Goldblatt
1d449bd9a6 Emap: Internal rtree context setting.
The only time that sharing an rtree context across extent operations isn't a
no-op is when tsd is unavailable.  But this happens only in situations like
thread death or initialization, and we don't care about shaving off every
possible cycle in such scenarios.
2020-02-17 10:50:51 -08:00
David Goldblatt
08eb1e6c31 Emap: Comments and cleanup
Document some of the public interface, and hide the functions that are no longer
used outside of the emap module.
2020-02-17 10:50:51 -08:00
David Goldblatt
231d1477e5 Rename emap_split_prepare_t -> emap_prepare_t.
Both the split and merge functions use it.
2020-02-17 10:50:51 -08:00
David Goldblatt
0586a56f39 Emap: Move in merge functionality. 2020-02-17 10:50:51 -08:00
David Goldblatt
040eac77cc Tell edatas their creation arena immediately.
This avoids having to pass it in anywhere else.
2020-02-17 10:50:51 -08:00
David Goldblatt
7c7b702064 Emap: Move over metadata splitting logic. 2020-02-17 10:50:51 -08:00
David Goldblatt
44f5f53605 Emap: Move over deregistration functions. 2020-02-17 10:50:51 -08:00
David Goldblatt
6513d9d923 Emap: Move over deregistration boundary functions. 2020-02-17 10:50:51 -08:00
David Goldblatt
9b5ca0b09d Emap: Move in slab interior registration. 2020-02-17 10:50:51 -08:00
David Goldblatt
d05b61db4a Emap: Move extent boundary registration in. 2020-02-17 10:50:51 -08:00
David Goldblatt
ca21ce4071 Emap: Move in write_acquired from extent. 2020-02-17 10:50:51 -08:00
David Goldblatt
01f255161c Add emap, for tracking extent locking. 2020-02-17 10:50:51 -08:00
Qi Wang
0f686e82a3 Avoid variable length array with length 0. 2020-02-16 14:14:07 -08:00
Yinan Zhang
68e8ddcaff Add mallctl for dumping last-N profiling records 2020-02-14 12:46:38 -08:00
Yinan Zhang
bc05ecebf6 Add const qualifier in assert_cmp() 2020-02-14 12:46:38 -08:00
Qi Wang
ba0e35411c Rework the bin locking around tcache refill / flush.
Previously, tcache fill/flush (as well as small alloc/dalloc on the arena) may
potentially drop the bin lock for slab_alloc and slab_dalloc.  This commit
refactors the logic so that the slab calls happen in the same function / level
as the bin lock / unlock.  The main purpose is to be able to use flat combining
without having to keep track of stack state.

In the meantime, this change reduces the locking, especially for slab_dalloc
calls, where nothing happens after the call.
2020-02-13 23:31:54 -08:00
Kamil Rytarowski
7fd22f7b2e Fix Undefined Behavior in hash.h
hash.h:200:27, left shift of 250 by 24 places cannot be represented in type 'int'
2020-02-13 12:25:26 -08:00
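A minimal sketch of the class of bug named above (a hypothetical helper, not the actual hash.h code): a byte promoted to int and shifted left by 24 can overflow into the sign bit, which is undefined behavior; casting to an unsigned 32-bit type first makes every shift well defined.
```
#include <stdint.h>
#include <stdio.h>

/* Hypothetical illustration: p[0] is promoted to int, so without the
 * uint32_t casts a value such as 250 shifted left by 24 would overflow
 * the sign bit (undefined behavior).  The casts keep the shifts in
 * unsigned 32-bit arithmetic. */
static uint32_t
load_be32(const unsigned char *p) {
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

int
main(void) {
	unsigned char buf[4] = {250, 0, 0, 1};
	printf("%u\n", load_be32(buf));
	return 0;
}
```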
Qi Wang
ca1f082251 Disallow merge across mmap regions to preserve SN / first-fit.
Check the is_head state before merging two extents.  Disallow the merge if it's
crossing two separate mmap regions.  This enforces first-fit (by not losing the
SN) at a very small cost.
2020-02-13 12:18:44 -08:00
Yinan Zhang
7014f81e17 Add ASSURED_WRITE in mallctl 2020-02-05 15:29:14 -08:00
Yinan Zhang
2476889195 Add inspect.c to MSVC filters 2020-02-05 10:01:49 -08:00
Yinan Zhang
9cac3fa8f5 Encapsulate buffer allocation in buffered writer 2020-02-04 13:21:58 -08:00
Yinan Zhang
bdc08b5158 Better naming buffered writer 2020-02-04 13:21:58 -08:00
Qi Wang
c6bfe55857 Update the tsd description. 2020-02-04 13:07:05 -08:00
Qi Wang
e896522616 Abbreviate thread-event to te. 2020-02-04 13:07:05 -08:00
Qi Wang
5e500523a0 Remove thread_event_boot(). 2020-02-04 00:18:15 -08:00
Qi Wang
97dd79db6c Implement deallocation events.
Make the event module to accept two event types, and pass around the event
context.  Use bytes-based events to trigger tcache GC on deallocation, and get
rid of the tcache ticker.
2020-02-04 00:18:15 -08:00
zoulasc
536ea6858e NetBSD specific changes:
- NetBSD overcommits
- When mapping pages, use the maximum of the alignment requested and the
  compiled-in PAGE constant which might be greater than the current kernel
  pagesize, since we compile binaries with the maximum page size supported
  by the architecture (so that they work with all kernels).
2020-02-03 15:49:36 -08:00
Qi Wang
974222c626 Add safety check on sdallocx slow / sampled path. 2020-01-31 00:04:22 -08:00
Qi Wang
88d9eca848 Enforce page alignment for sampled allocations.
This allows sampled allocations to be checked through alignment, thereby
enabling sized deallocation regardless of cache_oblivious.
2020-01-31 00:04:22 -08:00
Qi Wang
0f552ed673 Don't purge huge extents when decay is off. 2020-01-30 14:40:38 -08:00
Qi Wang
38a48e5741 Set reentrancy to 1 for tsd_state_purgatory.
Reentrancy is already set for other non-nominal tsd states (reincarnated and
minimal_initialized).  Add purgatory to be safe and consistent.
2020-01-30 13:55:20 -08:00
Qi Wang
88b0e03a4e Implement opt.stats_interval and the _opts options.
Add options stats_interval and stats_interval_opts to allow interval based stats
printing.  This provides an easy way to collect stats without code changes,
because opt.stats_print may not work (some binaries never exit).
2020-01-29 09:57:55 -08:00
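A minimal usage sketch for the option described above. The option name comes from the commit message; the value's unit (bytes of allocation activity) is an assumption here, so consult the manual for exact semantics.
```
#include <stdlib.h>

/* Hypothetical usage sketch: request interval-based stats printing via
 * the global malloc_conf string.  The interval unit is an assumption. */
const char *malloc_conf = "stats_interval:1073741824";

int
main(void) {
	free(malloc(64));
	return 0;
}
```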
Qi Wang
d71a145ec1 Change prof_accum_t to counter_accum_t for general purpose. 2020-01-29 09:57:55 -08:00
Qi Wang
ea351a7b52 Fix syntax errors in doc for thread.idle. 2020-01-23 16:41:53 -08:00
David Goldblatt
d92f0175c7 Introduce NEITHER_READ_NOR_WRITE in ctl.
This is slightly clearer in meaning.  A function that is both READONLY() and
WRITEONLY() is in fact neither one.
2020-01-22 18:29:13 -08:00
David Goldblatt
6a622867ca Add "thread.idle" mallctl.
This can encapsulate various internal cleaning logic, and can be used to free up
resources before a long sleep.
2020-01-22 18:29:13 -08:00
Yinan Zhang
f81341a48b Fallback to unbuffered printing if OOM 2020-01-21 17:09:44 -08:00
Yinan Zhang
cd6e908241 Add stress test for last-N profiling mode 2020-01-21 16:51:26 -08:00
Yinan Zhang
84b28c6a13 Properly handle tdata deletion race 2020-01-21 16:51:26 -08:00
Yinan Zhang
d331208560 Get rid of redundant logic in prof 2020-01-21 16:51:26 -08:00
Yinan Zhang
a72ea0db60 Restructure and correct sleep utility for testing 2020-01-21 16:51:26 -08:00
Yinan Zhang
7b67ed0b5a Get rid of lock overlap in prof_recent_alloc_reset 2020-01-21 16:51:26 -08:00
David Goldblatt
bd3be8e0b1 Remove commit parameter to ecache functions.
No caller ever wants uncommitted memory.
2020-01-17 10:54:56 -08:00
Yinan Zhang
b8df719d5c No tdata creation for backtracing on dying thread 2020-01-16 21:54:14 -08:00
Qi Wang
dab81bd315 Rework and fix the assertions on malloc fastpath.
The first half of the malloc fastpath may execute before malloc_init.  Make the
assertions work in that case.
2020-01-14 15:00:41 -08:00
Yinan Zhang
ad3f3fc561 Fetch time after tctx and only for samples 2020-01-14 14:36:20 -08:00
Qi Wang
a5d3dd4059 Fix an assertion on extent head state with dss. 2020-01-10 13:29:14 -08:00
Yinan Zhang
2b604a3016 Record request size in prof recent entries 2020-01-10 12:01:01 -08:00
Yinan Zhang
40a391408c Define constructor for buffered writer argument 2020-01-10 11:59:02 -08:00
Yinan Zhang
6d8e616902 Make buffered writer an independent module 2020-01-10 11:59:02 -08:00
Yinan Zhang
6b6b4709b3 Unify buffered writer naming 2020-01-09 14:31:31 -08:00
Yinan Zhang
9a60cf54ec Last-N profiling mode 2019-12-30 15:58:57 -08:00
Yinan Zhang
7a27a05940 Delete tdata states used for cleanup 2019-12-30 15:58:57 -08:00
Yinan Zhang
e98ddf7987 Fix unlikely condition in arena_prof_info_get() 2019-12-30 15:58:57 -08:00
Yinan Zhang
3fa142cf39 Remove _externs from prof internal header names 2019-12-23 11:14:15 -08:00
Yinan Zhang
112dc36dd5 Handle log_mtx during forking 2019-12-20 17:17:48 -08:00
Yinan Zhang
ea42174d07 Refactor profiling headers 2019-12-20 17:17:48 -08:00
David Goldblatt
6342da0970 Ehooks: Further optimize default merge case.
This avoids the cost of an iealloc in cases where the user uses the default
merge hook without using the default extent hooks.
2019-12-20 10:18:40 -08:00
David Goldblatt
f2f2084e79 Ehooks: Assert alloc isn't NULL 2019-12-20 10:18:40 -08:00
David Goldblatt
e210ccc57e Move extent2 -> extent.
Eventually, we may fully break off the extent module; but not for some time.  If
it's going to live on in a non-transitory state, it might as well have the nicer
name.
2019-12-20 10:18:40 -08:00
David Goldblatt
2f4fa80414 Rename extents -> ecache. 2019-12-20 10:18:40 -08:00
David Goldblatt
56cc56b692 Break extent split dependence on arena. 2019-12-20 10:18:40 -08:00
David Goldblatt
0aa9769fb0 Break commit functions' arena dependence 2019-12-20 10:18:40 -08:00
David Goldblatt
48ec5d4355 Break extent_coalesce arena dependence 2019-12-20 10:18:40 -08:00
David Goldblatt
282a382326 Extent: Break [de]activation's arena dependence. 2019-12-20 10:18:40 -08:00
David Goldblatt
576d7047ab Ecache: Should know its arena_ind.
What we call an arena_ind is really the index associated with some particular
set of ehooks; the arena is just the user-visible portion of that.  Making this
explicit, and reframing checks in terms of that, makes the code simpler and
cleaner, and helps us avoid passing the arena itself all throughout extent code.

This lets us put back an arena-specific assert.
2019-12-20 10:18:40 -08:00
David Goldblatt
372042a082 Remove merge dependence on the arena. 2019-12-20 10:18:40 -08:00
David Goldblatt
439219be7e Remove extent_can_coalesce arena dependency. 2019-12-20 10:18:40 -08:00
David Goldblatt
9cad5639ff Ehooks: remove arena_ind parameter.
This lives within the ehooks_t now, so that callers don't need to know it.
2019-12-20 10:18:40 -08:00
David Goldblatt
57fe99d4be Move relevant index into the ehooks_t itself.
It's always passed into the ehooks; keeping it colocated lets us avoid passing
the arena everywhere.
2019-12-20 10:18:40 -08:00
David Goldblatt
c792f3e4ab edata_cache: Remember the associated base_t.
This will save us some trouble down the line when we stop passing arena pointers
everywhere; we won't have to pass around a base_t pointer either.
2019-12-20 10:18:40 -08:00
David Goldblatt
ae23e5f426 Unify extent_alloc_wrapper with the other wrappers.
Previously, it was really more like extents_alloc (it looks in an ecache for an
extent to reuse as its primary allocation pathway).  Make that pathway more
explicitly like extents_alloc, and rename extent_alloc_wrapper_hard accordingly.
2019-12-20 10:18:40 -08:00
David Goldblatt
d8b0b66c6c Put extent_state_t into ecache as well as eset. 2019-12-20 10:18:40 -08:00
David Goldblatt
98eb40e563 Move delay_coalesce from the eset to the ecache. 2019-12-20 10:18:40 -08:00
David Goldblatt
bb70df8e5b Extent refactor: Introduce ecache module.
This will eventually completely wrap the eset, and handle concurrency,
allocation, and deallocation.  For now, we only pull out the mutex from the
eset.
2019-12-20 10:18:40 -08:00
David Goldblatt
0704516245 Ehooks: Add head tracking. 2019-12-20 10:18:40 -08:00
David Goldblatt
09475bf8ac extent_may_dalloc -> ehooks_dalloc_will_fail 2019-12-20 10:18:40 -08:00
David Goldblatt
7859184179 Pull out edata_t caching into its own module. 2019-12-20 10:18:40 -08:00
David Goldblatt
a7862df616 Rename extent_t to edata_t.
This frees us up from the unfortunate extent/extent2 naming collision.
2019-12-20 10:18:40 -08:00
David Goldblatt
865debda22 Rename extent.h -> edata.h.
This name is slightly pithier; a full-on rename will come shortly.
2019-12-20 10:18:40 -08:00
David Goldblatt
a738a66b5c Ehooks: Add some debug zero and addr checks.
These help make sure that the ehooks return properly zeroed memory when required
to.
2019-12-20 10:18:40 -08:00
David Goldblatt
4b2e5ee8b9 Ehooks: Add a "zero" ehook.
This is the first API expansion.  It lets the hooks pick where and how to purge
within themselves.
2019-12-20 10:18:40 -08:00
David Goldblatt
d0f187ad3b Arena: Loosen arena_may_have_muzzy restrictions.
If there are custom extent hooks, pages_can_purge_lazy is not necessarily the
right guard.  We could check ehooks_are_default too, but the case where
purge_lazy is unsupported is rare and getting rarer.  Just checking the decay
interval captures most of the benefit.
2019-12-20 10:18:40 -08:00
David Goldblatt
ebbb973271 Base: Remove some unnecessary reentrancy guards.
The ehooks module will now call these if necessary.
2019-12-20 10:18:40 -08:00
David Goldblatt
403f2d1664 Extents: Split out introspection functionality.
This isn't really part of the core extent allocation facilities.  Especially as
this module grows, having it in its own place may come in handy.
2019-12-20 10:18:40 -08:00
David Goldblatt
92a511d385 Make extent module hermetic.
In the form of extent2.h.  The naming leaves something to be desired, but I'll
leave that for a later diff.
2019-12-20 10:18:40 -08:00
David Goldblatt
e08c581cf1 Extent: Get rid of extent-specific pre/post reentrancy calls.
These are taken care of by the ehook module; the extra increments and
decrements are safe but unnecessary.
2019-12-20 10:18:40 -08:00
David Goldblatt
39fdc690a0 Ehooks comments and cleanup. 2019-12-20 10:18:40 -08:00
David Goldblatt
c8dae890c8 Extent -> Ehooks: Move over default hooks. 2019-12-20 10:18:40 -08:00
David Goldblatt
2fe5108263 Extent -> Ehooks: Move merge hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
1fff4d2ee3 Extent -> Ehooks: Move split hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
a5b42a1a10 Extent -> Ehooks: Move purge_forced hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
368baa42ef Extent -> Ehooks: Move purge_lazy hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
f83fdf5336 Extent: Clean up a comma 2019-12-20 10:18:40 -08:00
David Goldblatt
d78fe241ac Extent -> Ehooks: Move commit and decommit hooks. 2019-12-20 10:18:40 -08:00
David Goldblatt
5459ec9dae Extent -> Ehooks: Move destroy hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
bac8e2e5a6 Extent -> Ehooks: Move dalloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
dc8b4e6e13 Extent -> Ehooks: Move alloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
703fbc0ff5 Introduce unsafe reentrancy guards.
We have to work to circumvent the safety checks in pre_reentrancy when going
down extent hook pathways.  Instead, let's explicitly have checked and unchecked
guards.
2019-12-20 10:18:40 -08:00
David Goldblatt
ae0d8e8591 Move extent ehook calls into ehooks 2019-12-20 10:18:40 -08:00
David Goldblatt
ba8b9ecbcb Add ehooks module 2019-12-20 10:18:40 -08:00
David Goldblatt
837119a948 base_structs.h: Remove some mid-line tabs. 2019-12-20 10:18:40 -08:00
David Goldblatt
9f6eb09585 Extents: Eagerly initialize extent hooks.
When deferred initialization was added, initializing required copying
sizeof(extent_hooks_t) bytes after a pointer chase. Today, it's just a single
pointer loaded from the base_t. In subsequent diffs, we'll get rid of even that.
2019-12-20 10:18:40 -08:00
David Goldblatt
4278f84603 Move extent hook getters/setters to arena.c
This is where they're logically scoped; they access arena data.
2019-12-20 10:18:40 -08:00
Wenbo Zhang
9226e1f0d8 fix opt.thp:never still use THP with base_new 2019-12-19 13:27:00 -08:00
Qi Wang
d5031ea824 Allow dallocx and sdallocx after tsd destruction.
After a thread turns into purgatory / reincarnated state, still allow dallocx
and sdallocx to function normally.
2019-12-19 11:17:03 -08:00
Yinan Zhang
4afd709d1f Restructure setters for profiling info
Explicitly define three setters:

- `prof_tctx_reset()`: set `prof_tctx` to `1U`, if we don't know in
advance whether the allocation is large or not;
- `prof_tctx_reset_sampled()`: set `prof_tctx` to `1U`, if we already
know in advance that the allocation is large;
- `prof_info_set()`: set a real `prof_tctx`, and also set other
profiling info e.g. the allocation time.

Code structure wise, the prof level is kept as a thin wrapper, the
large level only provides low level setter APIs, and the arena level
carries out the main logic.
2019-12-17 10:01:28 -08:00
Yinan Zhang
1d01e4c770 Initialization utilities for nstime 2019-12-16 16:08:56 -08:00
Qi Wang
dd649c9485 Optimize away the tsd_fast() check on fastpath.
Fold the tsd_state check onto the event threshold check.  The fast threshold is
set to 0 when tsd switches to non-nominal.

The fast_threshold can be reset by remote threads, to reflect the non-nominal
tsd state change.
2019-12-11 23:44:20 -08:00
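A rough sketch of the folding described above, with made-up names: forcing the fast threshold to 0 for non-nominal TSD lets one comparison cover both the event check and the state check.
```
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: fast_threshold is forced to 0 whenever the
 * thread's tsd leaves the nominal state, so the single threshold
 * comparison below also rejects non-nominal threads -- no separate
 * tsd_fast() branch on the fast path. */
typedef struct {
	uint64_t allocated_bytes;
	uint64_t fast_threshold;	/* 0 when tsd is not nominal */
} fake_tsd_t;

static bool
fastpath_allowed(const fake_tsd_t *tsd, uint64_t usize) {
	return tsd->allocated_bytes + usize < tsd->fast_threshold;
}

int
main(void) {
	fake_tsd_t tsd = {0, 1 << 20};
	return fastpath_allowed(&tsd, 64) ? 0 : 1;
}
```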
Qi Wang
1decf958d1 Fix incorrect usage of cassert. 2019-12-11 14:02:59 -08:00
Yinan Zhang
45836d7fd3 Pass nstime_t pointer for profiling 2019-12-11 11:38:16 -08:00
Yinan Zhang
7d2bac5a38 Refactor destroy code path for prof_tctx 2019-12-10 16:31:05 -08:00
Yinan Zhang
055478cca8 Threshold is no longer updated before prof_realloc() 2019-12-10 16:31:05 -08:00
Yinan Zhang
7e3671911f Get rid of old indentation style for prof 2019-12-06 09:47:51 -08:00
Yinan Zhang
dfdd46f6c1 Refactor prof_tctx_t creation 2019-12-06 09:47:51 -08:00
Yinan Zhang
aa1d71fb7a Rename prof_tctx to alloc_tctx in prof_info_t 2019-12-06 09:47:51 -08:00
Yinan Zhang
5e0b090992 No need to pass usize to prof_tctx_set() 2019-12-06 09:47:51 -08:00
David Goldblatt
1b1e76acfe Disable some spuriously-triggering warnings 2019-12-04 13:45:17 -08:00
Li-Wen Hsu
a70909b130 Test on all supported release of FreeBSD
Keep 11.2 because 11.3 is temporarily unavailable.
2019-12-02 13:18:12 -08:00
Yinan Zhang
5c47a30227 Guard C++ aligned APIs 2019-11-25 18:02:16 -08:00
Yinan Zhang
6945371778 Change tsdn to tsd for profiling code path 2019-11-22 16:31:56 -08:00
Yinan Zhang
b55419f9b9 Restructure profiling
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`).  The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.

The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released.  Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
2019-11-22 16:31:56 -08:00
Mark Santaniello
8b2c2a596d Support C++17 over-aligned allocation
Summary:
Add support for C++17 over-aligned allocation:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r4.html

Supporting all 10 operators means we avoid thunking thru libstdc++-v3/libsupc++ and just call jemalloc directly.

It's also worth noting that there is now an aligned *and sized* operator delete:
```
void operator delete(void* ptr, std::size_t size, std::align_val_t al) noexcept;
```

If JeMalloc did not provide this, the default implementation would ignore the size parameter entirely:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/del_opsa.cc#L30-L33

(I must also update ax_cxx_compile_stdcxx.m4 to a newer version with C++17 support.)

Test Plan:
Wrote a simple test that allocates and then deletes an over-aligned type:
```
struct alignas(32) Foo {};
Foo *f;

int main()
{
  f = new Foo;
  delete f;
}
```

Before this change, both new and delete go thru PLT, and we end up calling regular old free:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x00000000004029b7 <+55>:    call   0x4022d0 <_ZnwmSt11align_val_t@plt>
...
   0x00000000004029d5 <+85>:    call   0x4022e0 <_ZdlPvmSt11align_val_t@plt>
...
(gdb) s
free (ptr=0x7ffff6408020) at /home/engshare/third-party2/jemalloc/master/src/jemalloc.git-trunk/src/jemalloc.c:2842
2842            if (!free_fastpath(ptr, 0, false)) {
```

After this change, we directly call new/delete and ultimately call sdallocx:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x0000000000402b77 <+55>:    call   0x496ca0 <operator new(unsigned long, std::align_val_t)>
...
   0x0000000000402b95 <+85>:    call   0x496e60 <operator delete(void*, unsigned long, std::align_val_t)>
...
(gdb) s
116             je_sdallocx_noflags(ptr, size);
```
2019-11-22 10:14:16 -08:00
Qi Wang
9a3c738009 Refactor arena_bin_malloc_hard(). 2019-11-21 11:41:26 -08:00
Qi Wang
9a7ae3c97f Reduce footprint of bin_t.
Avoid storing mutex_prof_data_t in bin_t.  Added bin_stats_data_t which is used
for reporting bin stats.
2019-11-21 11:08:36 -08:00
Qi Wang
cb1a1f4ada Remove the unnecessary alloc_ctx on free_fastpath. 2019-11-16 13:41:13 -08:00
Qi Wang
7160617107 Add branch hints to free_fastpath.
Explicitly mark the non-slab case unlikely.  Previously there were jumps in the
common case.
2019-11-16 13:41:13 -08:00
Qi Wang
a787d2f5b3 Prefer getaffinity() to detect number of CPUs. 2019-11-15 16:24:38 -08:00
Qi Wang
04cb7d4d6b Bail out early for muzzy decay.
This avoids taking the muzzy decay mutex with the default setting.
2019-11-15 16:24:15 -08:00
Yinan Zhang
73510dfd15 Revert "Fix bug in prof_realloc"
This reverts commit 3b5eecf102.
2019-11-15 15:13:39 -08:00
Yinan Zhang
3b5eecf102 Fix bug in prof_realloc
We should pass in `old_ptr` rather than the new `ptr` to
`prof_free_sampled_object()` when `old_ptr` points to a sampled
allocation.
2019-11-15 13:28:33 -08:00
Qi Wang
e4c36a6f30 Emphasize no modification through thread.allocatedp allowed. 2019-11-13 09:12:08 -08:00
Leonardo Santagada
c462753cc8 Use __forceinline for JEMALLOC_ALWAYS_INLINE on msvc 2019-11-12 13:50:25 -08:00
Qi Wang
836d7a7e69 Check for large size first in the uncommon case of malloc.
Larger sizes are not that uncommon compared to !tsd_fast.
2019-11-11 13:30:20 -08:00
Qi Wang
9c59abe42a Fix a typo in Makefile. 2019-11-11 12:17:08 -08:00
Qi Wang
da50d8ce87 Refactor and optimize prof sampling initialization.
Makes the prof sample prng use the tsd prng_state.  This allows us to properly
initialize the sample interval event, without having to create tdata.  As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
2019-11-11 10:35:37 -08:00
Qi Wang
bc774a3519 Rename tsd->offset_state to tsd->prng_state. 2019-11-11 10:35:37 -08:00
Qi Wang
19a51abf33 Avoid arena->offset_state when tsd not available for prng.
Use stack locals and remove the offset_state in arena.
2019-11-11 10:35:37 -08:00
Nick Desaulniers
d01b425e5d Add -Wimplicit-fallthrough checks if supported
Clang since r369414 (clang-10) can now check -Wimplicit-fallthrough for
C code, and use the GNU C style attribute to denote fallthrough.

Move the test from header only to autoconf. The previous test used
brittle version detection which did not work for newer clang that
supported this feature.

The attribute has to be its own statement, hence the added `;`. It also
can only precede case statements, so the final cases should be
explicitly terminated with break statements.

Fixes commit 3d29d11ac2 ("Clean compilation -Wextra")
Link: 1e0affb6e5
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
2019-11-08 13:03:03 -08:00
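A small sketch of the attribute usage pattern the commit describes (the macro name is made up, not jemalloc's): the attribute is its own statement terminated by a semicolon, it may only precede a case label, and the final case ends with an explicit break.
```
/* Hypothetical macro; jemalloc defines its own equivalent. */
#if defined(__GNUC__) || defined(__clang__)
#  define FALLTHROUGH __attribute__((__fallthrough__));
#else
#  define FALLTHROUGH
#endif

static int
classify(int c) {
	int score = 0;
	switch (c) {
	case 'a':
		score += 1;
		FALLTHROUGH	/* its own statement; precedes a case label */
	case 'b':
		score += 1;
		break;		/* final case ends with break */
	default:
		break;
	}
	return score;
}

int
main(void) {
	return classify('a') == 2 ? 0 : 1;
}
```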
Yinan Zhang
a8b578d538 Remove mallctl test for zero_realloc 2019-11-05 10:09:18 -08:00
Yinan Zhang
43f0ce92d8 Define general purpose tsd_thread_event_init() 2019-11-04 16:07:56 -08:00
Yinan Zhang
97f93fa0f2 Pull tcache GC events into thread event handler 2019-11-04 16:07:56 -08:00
Yinan Zhang
198f02e797 Pull prof_accumbytes into thread event handler 2019-11-04 15:21:16 -08:00
Yinan Zhang
152c0ef954 Build a general purpose thread event handler 2019-11-04 11:15:50 -08:00
RingsC
6924f83cb2 use SYS_openat when available
Some architectures like AArch64 may not have the open syscall, but do have
the openat syscall.  Check for SYS_openat and use it in init_thp_state when
SYS_open is not supported.
2019-11-01 13:06:40 -07:00
David T. Goldblatt
de81a4eada Add stats counters for number of zero reallocs 2019-10-29 17:48:44 -07:00
David T. Goldblatt
9cfa805947 Realloc: Make behavior of realloc(ptr, 0) configurable. 2019-10-29 17:48:44 -07:00
David T. Goldblatt
ee961c2310 Merge realloc and rallocx pathways. 2019-10-29 17:48:44 -07:00
Yinan Zhang
bd6e28d6a3 Guard slabcur fetching in extent_util 2019-10-28 17:27:51 -07:00
Yinan Zhang
4786099a3a Increase column width for global malloc/free rate 2019-10-24 14:54:51 -07:00
Yinan Zhang
05681e387a Optimize cache_bin_alloc_easy for malloc fast path
`tcache_bin_info` is not accessed on malloc fast path but the
compiler reserves a register for it, as well as an additional
register for `tcache_bin_info[ind].stack_size`.  The optimization
gets rid of the need for the two registers.
2019-10-21 16:43:45 -07:00
Yinan Zhang
4fe50bc7d0 Fix amd64 MSVC warning 2019-10-18 10:16:29 -07:00
Yinan Zhang
4fbbc817c1 Simplify time setting and getting for prof log 2019-10-16 09:24:52 -07:00
Qi Wang
4094b7c03f Limit # of iters of test_bitmap_xfu.
Otherwise the test is too slow for higher page sizes such as 64k.
2019-10-09 11:15:37 -07:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
beb7c16e94 Guard prof_active reset by opt_prof
Set `prof_active` to read-only when `opt_prof` is turned off.
2019-10-02 11:42:53 -07:00
Gareth Lloyd
1df9dd3515 Fix je_ prefix issue in test 2019-09-24 11:24:57 -07:00
David T. Goldblatt
3d84bd57f4 Arena: Add helper function arena_get_from_extent. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
c97d255752 Eset: Remove temporary declaration. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
ce5b128f10 Remove the undefined extent_size_quantize declarations. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
821dd53a1d Extent -> Eset: Rename arena members. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e144b21e4b Extent -> Eset: Move fork handling. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
77bbb35a92 Extent -> Eset: Move extent fit functions. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
1210af9a4e Extent -> Eset: Move insertion and removal. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
a42861540e Extents -> Eset: Convert some stats getters. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
820f070c6b Move page quantization to sz module. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
63d1b7a7a7 Extents -> Eset: move extents_state_get. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
b416b96a39 Extents -> Eset: rename/move extents_init. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e6180fe1b4 Eset: Add a source file.
This will let us move extents_* functions over one by one.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
4e5e43f22e Rename extents_t -> eset_t. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
723ccc6c27 Extents: Split out extent struct. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
41187bdfb0 Extents: Break extent-struct/arena interactions
Specifically, the extent_arena_[g|s]et functions and the address randomization.

These are the only things that tie the extent struct itself to the arena code.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
529cfe2abc Arena: rename arena_structs_b.h -> arena_structs.h
arena_structs_a.h was removed in the previous commit.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
e7cf84a8dd Rearrange slab data and constants
The constants logically belong in the sc module. The slab data bitmap isn't
really scoped to an arena; move it to its own module.
2019-09-23 23:06:27 -07:00
Qi Wang
d1be488cd8 Add --with-lg-page=16 to CI. 2019-09-22 18:51:03 -07:00
Qi Wang
ac5185f73e Fix tcache bin stack alignment.
Set the proper alignment when allocating space for the tcache bin stack.
2019-09-13 12:32:29 -07:00
zhxchen17
b7c7df24ba Add max_per_bg_thd stats for per background thread mutexes.
Added a new stats row to aggregate the maximum value of mutex counters for each
background threads.  Given that the per bg thd mutex is not expected to be
contended, this counter is mainly for sanity check / debugging.
2019-09-13 09:23:57 -07:00
zhxchen17
4b76c684bb Add "prof.dump_prefix" to override filename prefixes for dumps. 2019-09-12 22:26:03 -07:00
zhxchen17
242af439b8 Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx". 2019-09-12 22:26:03 -07:00
Giridhar Prasath R
e06658cb24 check GNU make exists in path
Signed-off-by: Giridhar Prasath R <cristianoprasath@gmail.com>
2019-09-11 16:36:19 -07:00
Qi Wang
22bc75ee3e Workaround the stringop-overflow check false positives. 2019-09-09 11:35:04 -07:00
Yinan Zhang
93d6151800 Pass tsd down to prof_backtrace() 2019-09-05 10:57:43 -07:00
Yinan Zhang
671f120e26 Fix prof_backtrace() reentrancy level 2019-09-05 10:57:43 -07:00
Qi Wang
785b84e603 Make cache_bin_sz_t unsigned.
The bin size type was made signed only because the low_water could go -1, which
was already removed.
2019-09-04 13:37:07 -07:00
Qi Wang
23dc7a7fba Fix index type for cache_bin_alloc_easy. 2019-09-04 13:37:07 -07:00
Qi Wang
2abb02ecd7 Fix MSVC 2015 build, as proposed by @christianaguilera-foundry. 2019-08-28 23:37:24 -07:00
Qi Wang
719583f14a Fix large.nflushes in the merged stats. 2019-08-28 23:37:00 -07:00
Yinan Zhang
adce29c885 Optimize for prof_active off
Move the handling of `prof_active` off case completely to slow path,
so as to reduce register pressure on malloc fast path.
2019-08-27 14:48:56 -07:00
Yinan Zhang
49e6fbce78 Always adjust thread_(de)allocated 2019-08-26 11:56:41 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Yinan Zhang
9e031c1d11 Bug fix for prof_active switch
The bug is subtle but critical: if application performs the following
three actions in sequence: (a) turn `prof_active` off, (b) make at
least one allocation that triggers the malloc slow path via the
`if (unlikely(bytes_until_sample < 0))` path, and (c) turn
`prof_active` back on, then the application would never get another
sample (until a very very long time later).

The fix is to properly reset `bytes_until_sample` rather than
throwing it all the way to `SSIZE_MAX`.

A side minor change is to call `prof_active_get_unlocked()` rather
than directly grabbing the `prof_active` variable - it is the very
reason why we defined the `prof_active_get_unlocked()` function.
2019-08-22 13:00:10 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and multiplied by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
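A simplified sketch of the pointer-based design listed above (layout and names are illustrative only, not the actual jemalloc metadata): emptiness and fullness become plain pointer comparisons, and the common pop needs no per-bin size information.
```
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer-based bin: cur advances toward 'empty' as items
 * are popped; 'full' marks the position at capacity. */
typedef struct {
	void **cur;	/* next slot to pop */
	void **empty;	/* cur == empty -> nothing cached */
	void **full;	/* cur == full  -> at capacity */
} ptr_bin_t;

static inline bool
ptr_bin_is_empty(const ptr_bin_t *bin) {
	return bin->cur == bin->empty;
}

static inline bool
ptr_bin_is_full(const ptr_bin_t *bin) {
	return bin->cur == bin->full;
}

static inline void *
ptr_bin_pop(ptr_bin_t *bin) {
	if (ptr_bin_is_empty(bin)) {
		return NULL;	/* caller takes the slow path */
	}
	return *bin->cur++;	/* pop advances cur toward 'empty' */
}

int
main(void) {
	int x;
	void *slots[4] = {&x, NULL, NULL, NULL};
	ptr_bin_t bin = {&slots[0], &slots[1], &slots[0]};
	return (ptr_bin_pop(&bin) == &x && ptr_bin_is_empty(&bin) &&
	    !ptr_bin_is_full(&bin)) ? 0 : 1;
}
```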
Qi Wang
d2dddfb82a Add hint in the bogus version string. 2019-08-16 16:08:18 -07:00
Qi Wang
d6b7995c16 Update INSTALL.md about the default doc build. 2019-08-16 10:03:34 -07:00
Qi Wang
e2c7584361 Simplify / refactor tcache_dalloc_large. 2019-08-14 13:08:23 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
28ed9b9a51 Buffer stats printing
Without buffering `malloc_stats_print` would invoke the write back
call (which could mean an expensive `malloc_write_fd` call) for every
single `printf` (including printing each line break and each leading
tab/space for indentation).
2019-08-13 09:40:11 -07:00
Yinan Zhang
eb70fef8ca Make compact json format as default
Saves 20-50% of the output size.
2019-08-12 13:59:50 -07:00
Yinan Zhang
a219cfcda3 Clear tcache prof_accumbytes in tcache_flush_cache
`tcache->prof_accumbytes` should always be cleared after being
transferred to arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
2019-08-12 09:08:09 -07:00
Yinan Zhang
ad3f7dbfa0 Buffer prof_log_stop
Make use of the new buffered writer for the output of `prof_log_stop`.
2019-08-12 09:06:01 -07:00
Qi Wang
5934846612 Fix large bin index accessed through cache bin descriptor. 2019-08-11 16:31:12 -07:00
Qi Wang
22746d3c9f Properly dalloc prof nodes with idalloctm.
The prof_alloc_node is allocated through ialloc as internal.  Switch to
idalloctm with tcache and is_internal properly set.
2019-08-09 10:29:49 -07:00
Yinan Zhang
8c8466fa6e Add compact json option for emitter
JSON format is largely meant for machine-machine communication, so
adding the option to the emitter.  According to local testing, the
savings in terms of bytes outputted is around 50% for stats printing
and around 25% for prof log printing.
2019-08-09 09:53:41 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Qi Wang
87e2400cbb Fix tcaches mutex pre- / post-fork handling. 2019-08-08 10:55:32 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually separate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Yinan Zhang
56c8ecffc1 Correct tsd layout graph
Augmented the tsd layout graph so that the two recently added fields,
`offset_state` and `bytes_until_sample`, are properly reflected.
As is shown, the cache footprint is 16 bytes larger than before.
2019-08-05 15:30:20 -07:00
Qi Wang
ea6b3e973b Merge branch 'dev' 2019-08-05 12:59:21 -07:00
Qi Wang
0cfa36a58a Update Changelog for 5.2.1. 2019-08-05 12:52:43 -07:00
Qi Wang
8a94ac25d5 Sanity check on prof dump buffer size. 2019-08-01 17:55:45 -07:00
Yinan Zhang
82b8aaaeb6 Quick fix for prof log printing
The emitter APIs used were incorrect, a side effect of which was
extra lines being printed.
2019-07-30 19:31:28 -07:00
Yinan Zhang
9344d25488 Workaround to address g++ unused variable warnings
g++ 5.5.0+ complained `parameter ‘expected’ set but not used
[-Werror=unused-but-set-parameter]` (despite that `expected` is in
fact used).
2019-07-30 11:37:56 -07:00
Qi Wang
c9cdc1b27f Limit to exact fit on Windows with retain off.
W/o retain, split and merge are disallowed on Windows.  Avoid doing first-fit
which needs splitting almost always.  Instead, try exact fit only and bail out
early.
2019-07-29 16:19:36 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00
Qi Wang
1a0503367b Revert "Refactor profiling"
This reverts commit 0b462407ae.
2019-07-29 14:10:15 -07:00
Yinan Zhang
0b462407ae Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-07-29 13:55:00 -07:00
Yinan Zhang
7618b0b8e4 Refactor prof log
`prof.c` is growing too long, so trying to modularize it.  There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
2019-07-29 13:55:00 -07:00
Qi Wang
85f0cb2d0c Add indent to individual options for confirm_conf. 2019-07-25 17:00:31 -07:00
Qi Wang
9f6a9f4c1f Update manual for opt.retain (new default on Windows). 2019-07-25 15:25:58 -07:00
Qi Wang
10fcff6c38 Lower nthreads in test/unit/retained on 32-bit to avoid OOM. 2019-07-25 13:10:03 -07:00
Qi Wang
a3fa597921 Refactor arena_dalloc() / _sdalloc(). 2019-07-24 18:30:54 -07:00
Qi Wang
bc0998a905 Invoke arena_dalloc_promoted() properly w/o tcache.
When tcache was disabled, the dalloc promoted case was missing.
2019-07-24 18:30:54 -07:00
Qi Wang
1d148f353a Optimize max_active_fit in first_fit.
Stop scanning once the first max_active_fit size is reached.
2019-07-24 11:28:45 -07:00
Qi Wang
4e36ce34c1 Track the leaked VM space via the abandoned_vm counter.
The counter is 0 unless metadata allocation failed (indicates OOM), and is
mainly for sanity checking.
2019-07-24 11:24:22 -07:00
Qi Wang
42807fcd9e extent_dalloc instead of leak when register fails.
extent_register may only fail if the underlying extent and region got stolen /
coalesced before we lock.  Avoid doing extent_leak (which purges the region)
since we don't really own the region.
2019-07-23 22:34:45 -07:00
Qi Wang
57dbab5d6b Avoid leaking extents / VM when split is not supported.
This can only happen on Windows and with opt.retain disabled (which isn't the
default).  The solution is suboptimal; however, this is not a common case, as
retain is the long term plan for all platforms anyway.
2019-07-23 22:18:55 -07:00
Qi Wang
badf8d95f1 Enable opt.retain by default on Windows. 2019-07-23 22:18:55 -07:00
Qi Wang
9a86c65abc Implement retain on Windows.
The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions.  To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.

Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region.  Determine if two extents
can be merged based on the head state, and use serial numbers for sanity checks.
2019-07-23 22:18:55 -07:00
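A minimal sketch of the merge rule the commit describes, with made-up types: refuse to merge when the would-be second half is the head of its own mapped region, i.e. when the pair would straddle two VirtualAlloc regions.
```
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent record; only the fields needed for the rule. */
typedef struct {
	void *addr;
	size_t size;
	bool is_head;	/* first extent of its mapped (VirtualAlloc) region */
} fake_extent_t;

static bool
fake_can_merge(const fake_extent_t *a, const fake_extent_t *b) {
	/* a must immediately precede b, and b must not start a new region;
	 * otherwise the merge would cross a mapping boundary and break
	 * decommit as well as the serial-number / first-fit invariants. */
	return (char *)a->addr + a->size == b->addr && !b->is_head;
}

int
main(void) {
	char region[8192];
	fake_extent_t a = {region, 4096, true};
	fake_extent_t b = {region + 4096, 4096, false};
	return fake_can_merge(&a, &b) ? 0 : 1;
}
```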
Qi Wang
f32f23d6cc Fix posix_memalign with input size 0.
Return a valid pointer instead of a failed assertion.
2019-07-18 00:43:23 -07:00
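A tiny check program for the case being fixed, relying only on the documented POSIX contract: a size-0 request may return either NULL or a unique pointer that can be passed to free(), but must not abort.
```
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
	void *p = NULL;
	/* Size 0: either NULL or a valid, free()-able pointer is allowed. */
	int err = posix_memalign(&p, 64, 0);
	printf("err=%d ptr=%p\n", err, p);
	free(p);
	return err;
}
```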
Yinan Zhang
a2a693e722 Remove prof_accumbytes in arena
`prof_accumbytes` was supposed to be replaced by `prof_accum` in
https://github.com/jemalloc/jemalloc/pull/623.
2019-07-16 15:18:52 -07:00
Yinan Zhang
e0a0c8d4bf Fix a bug in prof_dump_write
The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
always be `false`, so `memcpy` would always try to copy
`PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be
dangerous: in the last round of the `while` loop it would not only
illegally read the memory beyond `s` (which might not always be
disastrous), but it would also illegally overwrite the memory beyond
`prof_dump_buf` (which can be pretty disastrous).  `slen` probably
has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
2019-07-16 15:15:32 -07:00
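A sketch of the corrected bounded-copy pattern the analysis above calls for (names are stand-ins, not the jemalloc code): copy at most the space left in the buffer each round and flush as needed, so inputs longer than the whole buffer stay in bounds.
```
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DUMP_BUFSIZE 8		/* stand-in for PROF_DUMP_BUFSIZE */

static char dump_buf[DUMP_BUFSIZE];
static size_t dump_buf_end;

static void
dump_flush(void) {
	fwrite(dump_buf, 1, dump_buf_end, stdout);
	dump_buf_end = 0;
}

static void
dump_write(const char *s, size_t slen) {
	size_t off = 0;
	while (off < slen) {
		size_t avail = DUMP_BUFSIZE - dump_buf_end;
		size_t n = (slen - off < avail) ? slen - off : avail;
		memcpy(dump_buf + dump_buf_end, s + off, n);
		dump_buf_end += n;
		off += n;
		if (dump_buf_end == DUMP_BUFSIZE) {
			dump_flush();
		}
	}
}

int
main(void) {
	const char *msg = "longer than the eight-byte buffer\n";
	dump_write(msg, strlen(msg));
	dump_flush();
	return 0;
}
```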
Yinan Zhang
d26636d566 Fix logic in printing
`cbopaque` can now be overridden without overriding `write_cb` in
the first place.  (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
2019-07-16 14:54:23 -07:00
Qi Wang
34e75630cc Reorder the configs for AppVeyor.
Enable-debug and 64-bit runs tend to be more relevant.  Run them first.
2019-07-14 23:06:24 -07:00
Yinan Zhang
7720b6e385 Fix redzone setting and checking 2019-07-11 20:51:29 -07:00
frederik-h
40a3435b8d Add missing safety_check.c to MSBuild projects
The file is included in the list of source files in Makefile.in,
but it is missing from the project files. This causes the
build to fail due to unresolved symbols.
2019-05-24 09:00:19 -07:00
Qi Wang
1a71533511 Avoid blocking on background thread lock for stats.
Background threads may run for a long time, especially when the # of dirty pages
is high.  Avoid blocking stats calls because of this (which may cause latency
spikes).
2019-05-22 14:28:38 -07:00
Qi Wang
e13cf65a5f Add experimental.arenas.i.pactivep.
The new experimental mallctl exposes the arena pactive counter to applications,
which allows fast read w/o going through the mallctl / epoch steps.  This is
particularly useful when frequent balancing is required, e.g. when having
multiple manual arenas, and threads are multiplexed to them based on usage.
2019-05-22 14:27:58 -07:00
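A hypothetical usage sketch for the mallctl added above: read the pointer once, then poll the counter directly without further mallctl/epoch round trips. The pointee type (size_t) is an assumption based on the commit text; check the header/manual before relying on it.
```
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int
main(void) {
	size_t *pactivep;	/* pointee type is an assumption */
	size_t sz = sizeof(pactivep);
	if (mallctl("experimental.arenas.0.pactivep", (void *)&pactivep,
	    &sz, NULL, 0) == 0) {
		printf("arena 0 active pages: %zu\n", *pactivep);
	}
	return 0;
}
```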
Yinan Zhang
c92ac30601 Add confirm_conf option
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will
be printed when being set.
2019-05-22 09:38:39 -07:00
Yinan Zhang
4c63b0e76a Improve memory utilization tests
Added tests for large size classes and expanded the tests to
cover wider range of allocation sizes.
2019-05-21 12:57:06 -07:00
Vaibhav Jain
2d6d099fed Fix GCC-9.1 warning with macro GET_ARG_NUMERIC
GCC-9.1 reports following error when trying to compile file
src/malloc_io.c and with CFLAGS='-Werror' :

src/malloc_io.c: In function ‘malloc_vsnprintf’:
src/malloc_io.c:369:2: error: case label value exceeds maximum value for type [-Werror]
  369 |  case '?' | 0x80:      \
      |  ^~~~
src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
  581 |     GET_ARG_NUMERIC(val, 'p');
      |     ^~~~~~~~~~~~~~~
...
<snip>
cc1: all warnings being treated as errors
make: *** [Makefile:388: src/malloc_io.sym.o] Error 1

The warning is reported because the type 'char' is 'signed char' by default,
and or-ing in 0x80 turns the case label negative, which is beyond the
printable ASCII range (0 - 127).

The patch fixes this by explicitly casting the 'len' variable to
'unsigned char' inside the 'switch' statement, so that the value of the
expression "'?' | 0x80" falls within the legal values of the variable 'len'.
2019-05-21 11:20:07 -07:00
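A stripped-down sketch of the issue and fix (unrelated to the real GET_ARG_NUMERIC macro): with a plain char operand the promoted case label '?' | 0x80 is outside the signed range, while switching on an unsigned char keeps it in range.
```
#include <stdio.h>

static const char *
describe(char len) {
	/* Casting to unsigned char keeps the 0x80-tagged case label within
	 * the switch operand's value range (0..255). */
	switch ((unsigned char)len) {
	case '?' | 0x80:
		return "pointer-sized argument";
	default:
		return "other";
	}
}

int
main(void) {
	printf("%s\n", describe((char)('?' | 0x80)));
	return 0;
}
```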
Qi Wang
07c44847c2 Track nfills and nflushes for arenas.i.small / large.
Small is added purely for convenience.  Large flushes weren't tracked before and
can be useful in analysis.  Large fill simply reports nmalloc, since there is no
batch fill for large currently.
2019-05-15 10:05:09 -07:00
Yinan Zhang
13e88ae970 Fix assert in free fastpath
rtree_szind_slab_read_fast() may have not initialized
alloc_ctx.szind, unless after confirming the return is true.
2019-05-15 09:42:52 -07:00
Yinan Zhang
259b15dec5 Improve macro readability in malloc_conf_init
Define more readable macros than yes and no.
2019-05-08 14:15:03 -07:00
Dave Watson
5679751208 Remove best fit
This option saves a few CPU cycles, but potentially adds a lot of
fragmentation - so much so that there are workarounds like
max_active.  Instead, let's just drop it entirely.  It only made
a difference in one service I tested (.3% cpu regression), while
many services saw a memory win (also small, less than 1% mem P99)
2019-05-08 13:15:19 -07:00
Dave Watson
b62d126df8 Add max_active_fit to first_fit
The max_active_fit check is currently only on the best_fit
path, add it to the first_fit path also.
2019-05-08 13:15:19 -07:00
Doron Roberts-Kedes
7fc4f2a32c Add nonfull_slabs to bin_stats_t.
When config_stats is enabled, track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper ceiling on the savings possible by
meshing.
2019-04-29 13:35:02 -07:00
Yinan Zhang
ae124b8684 Improve size class header
Mainly fixing typos.  The only non-trivial change is in the
computation for SC_NPSIZES, though the result wouldn't be any
different when SC_NGROUP = 4 as is always the case at the moment.
2019-04-24 10:45:12 -07:00
Fabrice Fontaine
702d76dbd0 configure.ac: Add an option to disable doc
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
2019-04-23 15:32:02 -07:00
498f47e1ec Fix typo derived from tcmalloc's pprof
The same PR has been submitted to gperftools:

https://github.com/gperftools/gperftools/pull/1105
2019-04-23 15:29:57 -07:00
Qi Wang
1aabab5fdc Enforce TLS_MODEL attribute.
Caught by @zoulasc in #1460.  The attribute needs to be added in the headers as
well.
2019-04-16 11:07:15 -07:00
David Goldblatt
21cfe59ff7 Safety checks: Run tests by default 2019-04-15 16:48:12 -07:00
David Goldblatt
33e1dad680 Safety checks: Add a redzoning feature. 2019-04-15 16:48:12 -07:00
David Goldblatt
b92c9a1a81 Safety checks: Indirect through a function.
This will let us share code on failure pathways.
2019-04-15 16:48:12 -07:00
David Goldblatt
f95a88fcd9 Safety checks: Expose config value via mallctl and stats. 2019-04-15 16:48:12 -07:00
David Goldblatt
f4d24f05e1 Move extra size checks behind a config flag.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
2019-04-15 16:48:12 -07:00
zoulasc
7f7935cf78 Add an autoconf feature test for format_arg and a jemalloc-specific
macro for it.
2019-04-15 15:14:46 -07:00
zoulasc
14e4176758 Fix incorrect macro use.
Compiling with warnings produces missing prototype warnings.
2019-04-15 15:14:46 -07:00
zoulasc
020b5dc7ac Convert the format generator function to an annotated format function,
so that the generated formats can be checked by the compiler.
2019-04-15 15:14:46 -07:00
Yinan Zhang
7ee3897740 Separate tests for extent utilization API
As title.
2019-04-10 13:03:20 -07:00
mgrice
d3d7a8ef09 remove compare and branch in fast path for c++ operator delete[]
Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation).  This branch will probably only rarely be mispredicted; however, it removes two instructions in sdallocx and one at the callsite (to zero out flags).
2019-04-08 10:59:05 -07:00
Qi Wang
c2a3a7cd3f Fix test/unit/prof_log
Compiler optimizations may produce more traces than expected.  Instead verify
the lower bound only.
2019-04-05 13:47:10 -07:00
Qi Wang
93084cdc89 Ensure page alignment on extent_alloc.
This is discovered and suggested by @jasone in #1468.  When custom extent hooks
are in use, we should ensure page alignment on the extent alloc path, instead of
relying on the user hooks to do so.
2019-04-04 13:49:37 -07:00
Yinan Zhang
9aab3f2be0 Add memory utilization analytics to mallctl
The analytics tool is put under experimental.utilization namespace in
mallctl.  Input is one pointer or an array of pointers and the output
is a list of memory utilization statistics.
2019-04-04 13:48:39 -07:00
Qi Wang
b0b3e49a54 Merge branch 'dev' 2019-04-02 17:50:42 -07:00
Qi Wang
f7489dc8f1 Update Changelog for 5.2.0. 2019-04-02 17:40:42 -07:00
Qi Wang
978a7a21ae Use iallocztm instead of ialloc in prof_log functions.
Explicitly use iallocztm for internal allocations.  ialloc could trigger arena
creation, which may cause lock order reversal (narenas_mtx and log_mtx).
2019-04-02 16:53:00 -07:00
Qi Wang
6fe11633b0 Fix the binshard unit test.
The test attempts to trigger usage of multiple sharded bins, which percpu_arena
makes less reliable.
2019-04-02 16:53:00 -07:00
Qi Wang
064d6e570e Tweak the wording about oversize_threshold. 2019-04-01 10:36:29 -07:00
Qi Wang
0101d5ebef Avoid check_min for opt_lg_extent_max_active_fit.
This fixes a compiler warning.
2019-03-29 15:56:53 -07:00
Qi Wang
59d9891948 Add the missing unlock in the error path of extent_register. 2019-03-29 15:56:53 -07:00
Qi Wang
ce03e4c7b8 Document opt.oversize_threshold. 2019-03-29 11:55:05 -07:00
Qi Wang
788a657cee Allow low values of oversize_threshold to disable the feature.
We should allow a way to easily disable the feature (e.g. not reserving the
arena id at all).
2019-03-29 11:33:00 -07:00
Qi Wang
a4d017f5e5 Output message before aborting on tcache size-matching check. 2019-03-29 11:33:00 -07:00
Qi Wang
fb56766ca9 Eagerly purge oversized merged extents.
This change improves memory usage slightly, at virtually no CPU cost.
2019-03-14 17:34:55 -07:00
Qi Wang
f6c30cbafa Remove some unused comments. 2019-03-14 17:34:55 -07:00
Qi Wang
b804d0f019 Fallback to 32-bit when 8-bit atomics are missing for TSD.
When it happens, this might cause a slowdown on fast path operations.  However,
such cases are very rare.
2019-03-09 12:52:06 -08:00
Qi Wang
06f0850427 Detect if 8-bit atomics are available.
In some rare cases (older compiler, e.g. gcc 4.2 w/ MIPS), 8-bit atomics might
be unavailable.  Detect such cases so that we can work around them.
2019-03-09 12:52:06 -08:00
Jason Evans
14d3686c9f Do not use #pragma GCC diagnostic with gcc < 4.6.
This regression was introduced by
3d29d11ac2 (Clean compilation -Wextra).
2019-03-09 12:10:30 -08:00
Qi Wang
ac24ffb21e Fix a syntax error in configure.ac
Introduced in e13400c919.
2019-03-04 10:50:17 -08:00
Jason Evans
775fe302a7 Remove JE_FORCE_SYNC_COMPARE_AND_SWAP_[48].
These macros have been unused since
d4ac7582f3 (Introduce a backport of C11
atomics).
2019-02-22 14:22:16 -08:00
Dave Rigby
cbdb1807ce Stringify tls_callback linker directive
Proposed fix for #1444 - ensure that `tls_callback` in the `#pragma comment(linker)` directive gets the same prefix added as it does in the C declaration.
2019-02-22 12:43:35 -08:00
Qi Wang
18450d0abe Guard libgcc unwind init with opt_prof.
Only triggers libgcc unwind init when prof is enabled.  This helps work around
some bootstrapping issues.
2019-02-21 16:04:47 -08:00
Jason Evans
dca7060d5e Avoid redefining tsd_t.
This fixes a build failure when integrating with FreeBSD's libc.  This
regression was introduced by d1e11d48d4
(Move tsd link and in_hook after tcache.).
2019-02-20 20:27:55 -08:00
Qi Wang
9015deb126 Add build_doc by default.
However, skip building the docs (and output warnings) if XML support is missing.
This allows `make install` to succeed w/o `make dist`.
2019-02-08 14:13:20 -08:00
Qi Wang
23b15e764b Add --disable-libdl to travis. 2019-02-06 21:00:59 -08:00
Qi Wang
2db2d2ef5e Make background_thread not dependent on libdl.
When not using libdl, still allows background_thread to be enabled.
2019-02-06 21:00:59 -08:00
Qi Wang
1f55a15467 Add configure option --disable-libdl.
This makes it possible to build full static binary.
2019-02-06 21:00:59 -08:00
Qi Wang
8e9a613122 Disable muzzy decay by default. 2019-02-04 14:38:54 -08:00
Qi Wang
e13400c919 Sanity check szind on tcache flush.
This adds some overhead to the tcache flush path (which is one of the
popular paths).  Guard it behind a config option.
2019-02-01 12:31:34 -08:00
Qi Wang
b33eb26dee Tweak the spacing for the total_wait_time per second. 2019-01-28 15:37:19 -08:00
Qi Wang
374dc30d3d Update copyright dates. 2019-01-25 13:25:20 -08:00
Qi Wang
e3db480f6f Rename huge_threshold to oversize_threshold.
The keyword huge tends to remind people of huge pages, which is not relevant to
the feature.
2019-01-25 13:15:45 -08:00
Qi Wang
350809dc5d Set huge_threshold to 8M by default.
This feature uses a dedicated arena to handle huge requests, which
significantly improves VM fragmentation.  In production workload we tested it
often reduces VM size by >30%.
2019-01-24 13:29:23 -08:00
Qi Wang
d3145014a0 Explicitly use arena 0 in alignment and OOM tests.
This helps us avoid issues with size based routing (i.e. the huge_threshold
feature).
2019-01-24 13:29:23 -08:00
Edward Tomasz Napierala
a7b0a124c3 Mention different mmap(2) behaviour with retain:true. 2019-01-23 18:34:59 -08:00
Qi Wang
522d1e7b4b Tweak the spacing for nrequests in stats output. 2019-01-23 17:42:12 -08:00
Qi Wang
8c9571376e Fix stats output (rate for total # of requests).
The rate calculation for the total row was missing.
2019-01-23 17:42:12 -08:00
Qi Wang
7a815c1b7c Un-experimental the huge_threshold feature. 2019-01-16 12:28:57 -08:00
Qi Wang
bbe8e6a909 Avoid creating bg thds for huge arena lone.
For low arena count settings, the huge threshold feature may trigger an unwanted
bg thd creation.  Given that the huge arena does eager purging by default,
bypass bg thd creation when initializing the huge arena.
2019-01-15 16:00:34 -08:00
Jason Evans
b6f1f2669a Revert "Customize cloning to include tags so that VERSION is valid."
This reverts commit 646af596d8.
2019-01-14 10:35:48 -08:00
Jason Evans
225d89998b Revert "Remove --branch=${CIRRUS_BASE_BRANCH} in git clone command."
This reverts commit fc13a7f1fa.
2019-01-14 10:35:48 -08:00
Qi Wang
f459454afe Avoid potential issues on extent zero-out.
When custom extent_hooks or transparent huge pages are in use, the purging
semantics may change, which means we may not get zeroed pages on repopulating.
Fixing the issue by manually memset for such cases.
2019-01-11 19:16:12 -08:00
Qi Wang
0ecd5addb1 Force purge on thread death only when w/o bg thds. 2019-01-11 19:15:34 -08:00
Jason Evans
fc13a7f1fa Remove --branch=${CIRRUS_BASE_BRANCH} in git clone command.
The --branch parameter is unnecessary, and removing it may avoid problems when
testing directly on the dev branch.
2019-01-11 13:50:56 -08:00
Jason Evans
646af596d8 Customize cloning to include tags so that VERSION is valid. 2019-01-10 15:14:33 -08:00
Li-Wen Hsu
6910fcb208 Add Cirrus-CI config for FreeBSD builds 2019-01-10 15:14:33 -08:00
Faidon Liambotis
471191075d Replace -lpthread with -pthread
This automatically adds -latomic if and when needed, e.g. on riscv64
systems.

Fixes #1401.
2019-01-09 13:43:33 -08:00
Leonardo Santagada
daa0e436ba implement malloc_getcpu for windows 2019-01-08 14:34:45 -08:00
John Ericson
4e920d2c9d Add --{enable,disable}-{static,shared} to configure script
My distro offers a custom toolchain where it's not possible to make
static libs, so it's insufficient to just delete the libs I don't want.
I actually need to avoid building them in the first place.
2018-12-19 13:34:26 -08:00
Qi Wang
7241bf5b74 Only read arena index from extent on the tcache flush path.
Add extent_arena_ind_get() to avoid loading the actual arena ptr in case we just
need to check arena matching.
2018-12-18 15:19:30 -08:00
Qi Wang
441335d924 Add unit test for producer-consumer pattern. 2018-12-18 15:09:53 -08:00
Alexander Zinoviev
36de5189c7 Add rate counters to stats 2018-12-18 09:59:41 -08:00
Qi Wang
99f4eefb61 Fix incorrect stats merging with sharded bins.
With sharded bins, we may not flush all items from the same arena in one run.
Adjust the stats merging logic accordingly.
2018-12-07 18:16:15 -08:00
Qi Wang
711a61f3b4 Add unit test for sharded bins. 2018-12-03 17:17:03 -08:00
Qi Wang
98b56ab23d Store the bin shard selection in TSD.
This avoids having to choose bin shard on the fly, also will allow flexible bin
binding for each thread.
2018-12-03 17:17:03 -08:00
Qi Wang
45bb4483ba Add stats for arenas.bin.i.nshards. 2018-12-03 17:17:03 -08:00
Qi Wang
3f9f2833f6 Add opt.bin_shards to specify number of bin shards.
The option uses the same format as "slab_sizes" to specify number of shards for
each bin size.
2018-12-03 17:17:03 -08:00
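A minimal configuration sketch, assuming the range:count syntax carries over from slab_sizes as the commit states; the exact accepted format and the specific ranges shown here are assumptions, so check the manual before using them.
```
#include <stdlib.h>

/* Hypothetical example: more shards for the smallest bins, fewer for
 * mid-size bins.  The option string is an assumption, not verified. */
const char *malloc_conf = "bin_shards:1-160:8|161-4096:4";

int
main(void) {
	free(malloc(1));
	return 0;
}
```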
Qi Wang
37b8913925 Add support for sharded bins within an arena.
This makes it possible to have multiple set of bins in an arena, which improves
arena scalability because the bins (especially the small ones) are always the
limiting factor in production workload.

A bin shard is picked on allocation; each extent tracks the bin shard id for
deallocation.  The shard size will be determined using runtime options.
2018-12-03 17:17:03 -08:00
Dave Watson
b23336af96 mutex: fix trylock spin wait contention
If there are 3 or more threads spin-waiting on the same mutex,
there will be excessive exclusive cacheline contention because
pthread_trylock() immediately tries to CAS in a new value, instead
of first checking if the lock is locked.

This diff adds a 'locked' hint flag, and we will only spin wait
without trylock()ing while set.  I don't know of any other portable
way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast.

This blames to the mutex profiling changes, however, we almost never
have 3 or more threads contending in properly configured production
workloads, but still worth fixing.
2018-11-28 15:17:02 -08:00
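A generic test-and-test-and-set sketch of the "check a locked hint before trylocking" idea (standalone C11 atomics, not the pthread-based jemalloc mutex): waiters spin on a plain load and only attempt the exclusive atomic operation once the lock looks free.
```
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_bool locked;	/* doubles as the "locked" hint */
} spin_hint_lock_t;

static bool
spin_hint_trylock(spin_hint_lock_t *l) {
	return !atomic_exchange_explicit(&l->locked, true,
	    memory_order_acquire);
}

static void
spin_hint_lock(spin_hint_lock_t *l) {
	for (;;) {
		/* Read-only spin: no exclusive cacheline traffic while the
		 * lock is held by another thread. */
		while (atomic_load_explicit(&l->locked,
		    memory_order_relaxed)) {
			/* spin */
		}
		if (spin_hint_trylock(l)) {
			return;
		}
	}
}

static void
spin_hint_unlock(spin_hint_lock_t *l) {
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

int
main(void) {
	spin_hint_lock_t l = {false};
	spin_hint_lock(&l);
	spin_hint_unlock(&l);
	return 0;
}
```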
Qi Wang
c4063ce439 Set the default number of background threads to 4.
The setting has been tested in production for a while.  No negative effects were
observed, and we were able to reduce the number of threads per process.
2018-11-16 09:35:12 -08:00
Qi Wang
43f3b1ad0c Deprecate OSSpinLock. 2018-11-14 08:44:05 -08:00
Dave Watson
13c237c7ef Add a fastpath for arena_slab_reg_alloc_batch
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
2018-11-14 07:09:11 -08:00
Dave Watson
17aa470760 add extent_nfree_sub 2018-11-14 07:09:11 -08:00
Dave Watson
4b82872ebf arena: Refactor tcache_fill to batch fill from slab
Refactor tcache_fill, introducing a new function arena_slab_reg_alloc_batch,
which will fill multiple pointers from a slab.

There should be no functional changes here, but allows future optimization
on reg_alloc_batch.
2018-11-14 07:09:11 -08:00
Qi Wang
57553c3b1a Avoid touching all pages in extent_recycle for debug build.
We may have a large number of pages with *zero set (since they are populated on
demand).  Only check the first page to avoid paging in all of them.
2018-11-13 08:54:48 -08:00
Qi Wang
1f56115704 Fix tcache_flush (follow up cd2931a).
Also catch invalid tcache id.
2018-11-13 08:54:09 -08:00
Dave Watson
794e29c0ab Add a free() and sdallocx(where flags=0) fastpath
Add unsized and sized deallocation fastpaths.  Similar to the malloc()
fastpath, this removes all frame manipulation for the majority of
free() calls.  The performance advantages here are less than that
of the malloc() fastpath, but from prod tests seems to still be half
a percent or so of improvement.

Stats and sampling are both supported (sdallocx needs a sampling check;
for rtree lookups, slab will only be set for unsampled objects).

We don't support flush; any flush requests go to the slowpath.
2018-11-12 13:20:37 -08:00
Dave Watson
e2ab215324 refactor tcache_dalloc_small
Add a cache_bin_dalloc_easy (to match the alloc_easy function),
and use it in tcache_dalloc_small.  It will also be used in the
new free fastpath.
2018-11-12 13:20:37 -08:00
Dave Watson
5e795297b3 rtree: add rtree_szind_slab_read_fast
For a free fastpath, we want something that will not make additional
calls.  Assume most free() calls will hit the L1 cache, and use
a custom rtree function for this.

Additionally, roll the ptr=NULL check in to the rtree cache check.
2018-11-12 13:20:37 -08:00
Edward Tomasz Napierala
a4c6b9ae01 Restore a FreeBSD-specific getpagesize(3) optimization.
It was removed in 0771ff2cea.
Add a comment explaining its purpose.
2018-11-09 14:14:49 -08:00
Qi Wang
cd2931ad9b Fix tcaches_flush.
The regression was introduced in 3a1363b.
2018-11-09 13:11:37 -08:00
Qi Wang
7ee0b6cc37 Properly trigger decay on tcache destroy.
When destroying tcache, decay may not be triggered since tsd is non-nominal.
Explicitly decay to avoid pathological cases.
2018-11-09 11:03:19 -08:00
Qi Wang
d66f976628 Optimize large deallocation.
We eagerly coalesce large buffers when deallocating, however the previous logic
around this introduced extra lock overhead -- when coalescing we always lock the
neighbors even if they are active, while for active extents nothing can be done.

This commit checks if the neighbor extents are potentially active before
locking, and avoids locking if possible.  This speeds up large_dalloc by ~20%.
It also fixes some undesired behavior: we could stop coalescing because a small
buffer was merged, while a large neighbor was ignored on the other side.
2018-11-08 13:35:59 -08:00
Qi Wang
8dabf81df1 Bypass extent_dalloc when retain is enabled.
When retain is enabled, the default dalloc hook does nothing (since we avoid
munmap).  But the overhead of preparing the call is high, specifically the extent
de-register and re-register involve locking and extent / rtree modifications.
Bypass the call with retain in this diff.
2018-11-08 11:32:25 -08:00
Qi Wang
50b473c883 Set commit properly for FreeBSD w/ overcommit.
When overcommit is enabled, commit needs to be set when doing mmap().  The
regression was introduced in f80c97e.
2018-11-05 09:47:04 -08:00
Justin Hibbits
be0749f591 Restrict lwsync to powerpc64 only
Nearly all 32-bit powerpc hardware treats lwsync as sync, and some cores
(Freescale e500) trap lwsync as an illegal instruction, which then gets
emulated in the kernel.  To avoid unnecessary traps on the e500, use
sync on all 32-bit powerpc.  This pessimizes 32-bit software running on
64-bit hardware, but those numbers should be slim.
2018-10-24 11:18:55 -07:00
Edward Tomasz Napierala
ceba1dde27 Make use of pthread_set_name_np(3) on FreeBSD. 2018-10-24 10:06:37 -07:00
Dave Watson
936bc2aa15 prof: Fix memory regression
The diff 'refactor prof accum...' moved the bytes_until_sample
subtraction before the load of tdata.  If tdata is null,
tdata_get(true) will overwrite bytes_until_sample, but we
still sample the current allocation.   Instead, do the subtraction
and check logic again, to keep the previous behavior.

blame-rev: 0ac524308d
2018-10-23 12:39:57 -07:00
Dave Watson
0f8313659e malloc: Add a fastpath
This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and
that we hit tcache.  If either of these is false, we fall back to
the previous codepath (renamed 'malloc_default').

Crucially, we only tail call malloc_default, and with the same kind
and number of arguments, so that both clang and gcc tail-calling
will kick in - therefore malloc() gets treated as a leaf function,
and there are *no* caller-saved registers.   Previously malloc() contained
5 caller saved registers on x64, resulting in at least 10 extra
memory-movement instructions.

In microbenchmarks this results in up to ~10% improvement in malloc()
fastpath.  In real programs, this is a ~1% CPU and latency improvement
overall.
2018-10-18 08:32:19 -07:00
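
A sketch of the shape described in the commit above, under assumed helper names; the real fastpath, tcache lookup, and size-class bound are jemalloc internals, and the stand-ins below only illustrate the tail-call structure that lets the entry point stay a leaf function:

  #include <stddef.h>
  #include <stdlib.h>

  #define LOOKUP_MAXCLASS 4096            /* stand-in for SC_LOOKUP_MAXCLASS */

  static void *slow_malloc_default(size_t size) { return malloc(size); }
  static void *fast_tcache_alloc(size_t size) { (void)size; return NULL; }

  void *
  sketch_malloc(size_t size) {
      if (size > LOOKUP_MAXCLASS) {
          return slow_malloc_default(size);   /* tail call, same argument */
      }
      void *ret = fast_tcache_alloc(size);
      if (ret == NULL) {
          return slow_malloc_default(size);   /* tail call again on a miss */
      }
      return ret;
  }
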
Dave Watson
0ec656eb71 ticker: add ticker_trytick
For the fastpath, we want to tick, but undo the tick and jump to the
slowpath if ticker would fire.
2018-10-18 08:32:19 -07:00
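
A small sketch of the trytick idea with illustrative names: decrement the counter, but if the decrement would make the ticker fire, undo it and report true so the caller can branch to the slowpath (which performs the real tick and handles the event):

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
      int32_t tick;      /* counts down to the next event */
      int32_t nticks;    /* reload value after an event */
  } ticker_t;

  static bool
  ticker_trytick(ticker_t *t) {
      t->tick--;
      if (t->tick < 0) {
          t->tick++;     /* undo: the slowpath re-ticks and handles the event */
          return true;
      }
      return false;
  }
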
Dave Watson
ac34afb403 drop bump_empty_alloc option. Size class lookup support used instead. 2018-10-17 08:50:58 -07:00
Dave Watson
4edbb7c64c sz: Support 0 size in size2index lookup/compute 2018-10-17 08:50:58 -07:00
Dave Watson
2b112ea593 add test for zero-sized alloc and aligned alloc 2018-10-17 08:50:58 -07:00
gnzlbg
01e2a38e5a Make smallocx symbol name depend on the JEMALLOC_VERSION_GID
This commit concatenates the `JEMALLOC_VERSION_GID` to the
`smallocx` symbol name, such that the symbol ends up exported
as `smallocx_{git_hash}`.
2018-10-17 07:12:28 -07:00
gnzlbg
837de32496 Test smallocx on Travis-CI
This commit updates the gen_travis script with a new build bot
that covers the experimental `smallocx` API and updates the
travis CI script to test this API under travis.
2018-10-17 07:12:28 -07:00
gnzlbg
741fca1bb7 Hide smallocx even when enabled from the library API
The experimental `smallocx` API is not exposed via header files,
requiring the users to peek at `jemalloc`'s source code to manually
add the external declarations to their own programs.

This should reinforce that `smallocx` is experimental, and that `jemalloc`
does not offer any kind of backwards compatibility or ABI guarantees for it.
2018-10-17 07:12:28 -07:00
gnzlbg
730e57b08f Adapts mallocx integration tests for smallocx 2018-10-17 07:12:28 -07:00
gnzlbg
08260a6b94 Add experimental API: smallocx_return_t smallocx(size, flags)
---

Motivation:

This new experimental memory-allocation API returns a pointer to
the allocation as well as the usable size of the allocated memory
region.

The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to
convey that this API returns the size of the allocated memory region.

It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make
use of it.

The main purpose of these APIs is to improve telemetry. It is more accurate
to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`,
for example. The latter will always line up perfectly with the existing
size classes, causing a loss of telemetry information about the internal
fragmentation induced by potentially poor size-class choices.

Instrumenting `nallocx` does not help much since user code can cache its
result and use it repeatedly.

---

Implementation:

The implementation adds a new `usize` option to `static_opts_s` and a `usize`
variable to `dynamic_opts_s`. These are then used to cache the result of
`sz_index2size` and similar functions in the code paths in which they are
unconditionally invoked. In the code paths in which these functions are not
unconditionally invoked, `smallocx`, as opposed to `mallocx`, calls these
functions explicitly.

---

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html
2018-10-17 07:12:28 -07:00
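
As the commits above note, the experimental API is not declared in any installed header and the exported symbol name carries the git hash, so callers must supply their own declaration.  The shape below is a sketch of what that might look like; the placeholder hash and the struct field names are assumptions based on the description above:

  #include <stddef.h>

  typedef struct {
      void *ptr;     /* the allocation */
      size_t size;   /* its usable size, reported directly by the allocator */
  } smallocx_return_t;

  /* Hypothetical declaration; the real symbol is smallocx_<JEMALLOC_VERSION_GID>. */
  smallocx_return_t smallocx_0123456789abcdef(size_t size, int flags);
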
Dave Watson
325e3305fc remove malloc_init() off the fastpath 2018-10-15 10:11:08 -07:00
Dave Watson
997d86acc6 restrict bytes_until_sample to int64_t. This allows optimal asm
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary.  This also reduces register pressure.
2018-10-15 08:24:12 -07:00
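
A sketch of the pattern described in the commit above, assuming a per-thread signed counter: the subtraction is unconditional, and the sampling decision is a test on the result that the compiler can fold into the flags produced by the subtract, with no separate compare:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  static _Thread_local int64_t bytes_until_sample = 1 << 20;

  static bool
  maybe_sample(size_t usize) {
      bytes_until_sample -= (int64_t)usize;
      return bytes_until_sample <= 0;   /* sign/zero test off the subtraction */
  }
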
Dave Watson
d1a861fa80 add a check for SC_LARGE_MAXCLASS
If we assume SC_LARGE_MAXCLASS will always fit in a SSIZE_T, then we can
optimize some checks by unconditional subtraction, and then checking flags
only, without a compare statement in x86.
2018-10-15 08:24:12 -07:00
Dave Watson
0ac524308d refactor prof accum, so that tdata is not loaded if we aren't going to sample. 2018-10-15 08:24:12 -07:00
Dave Watson
9ed3bdc848 move bytes until sample to tsd. Fastpath allocation does not need
to load tdata now, avoiding several branches.
2018-10-15 08:24:12 -07:00
Dave Watson
09adf18f1a Remove a branch from cache_bin_alloc_easy
Combine the branches for checking for an empty cache_bin, and
checking for the low watermark.
2018-10-15 08:18:15 -07:00
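
A sketch of how the two checks can be folded into a single branch; field names are illustrative, not jemalloc's exact cache_bin layout.  Since an empty bin always satisfies ncached <= low_water, the common path takes only one (usually not-taken) comparison:

  #include <stdbool.h>
  #include <stddef.h>

  typedef struct {
      void **avail;       /* stack of cached pointers */
      int ncached;        /* number of cached items */
      int low_water;      /* low watermark since the last GC pass */
  } cache_bin_t;

  static void *
  cache_bin_alloc_easy(cache_bin_t *bin, bool *success) {
      if (bin->ncached <= bin->low_water) {
          /* Rare path: either empty, or the watermark must be lowered. */
          if (bin->ncached == 0) {
              *success = false;
              return NULL;
          }
          bin->low_water = bin->ncached - 1;
      }
      bin->ncached--;
      *success = true;
      return bin->avail[bin->ncached];
  }
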
jsteemann
856319dc8a check return value of malloc_read_fd
In case `malloc_read_fd` returns a negative error number, the result
would afterwards be cast to an unsigned size_t, which could theoretically
have caused an out-of-bounds memory access in the following
`strncmp` call.
2018-10-11 17:25:20 -07:00
Edward Tomasz Napierala
f80c97e477 Rework the way jemalloc uses mmap(2) on FreeBSD.
This makes it directly use MAP_EXCL and MAP_ALIGNED() instead
of weird workarounds involving mapping at random places and then
unmapping parts of them.
2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala
676cdd6679 Disable runtime detection of lazy purging support on FreeBSD.
The check doesn't seem to serve any purpose here, and this shaves
off three syscalls on binary startup.
2018-10-06 22:06:56 -07:00
Rajeev Misra
115ce93562 bit_util: Don't use __builtin_clz on s390x
There's an optimizer bug upstream that results in test failures; reported at
https://bugzilla.redhat.com/show_bug.cgi?id=1619354.  This works around the
failure reported at https://github.com/jemalloc/jemalloc/issues/1307.
2018-09-20 11:25:17 -07:00
David Goldblatt
88771fa013 Bootstrapping: don't overwrite opt_prof_prefix. 2018-09-12 17:06:06 -07:00
rustyx
9f43defb6e Add sc.c to the MSVC project 2018-09-04 12:58:05 -07:00
Rajeev Misra
4c548a61c8 Bit_util: Use intrinsics for pow2_ceil, where available. 2018-08-15 19:38:31 -07:00
gnzlbg
36eb0b3d77 Add valgrind build bots to CI
This commit adds two build-bots to CI that test the release builds
of jemalloc on linux and macOS under valgrind.

The macOS build is not enabled because valgrind reports
errors about reads of uninitialized memory in some tests and
segfaults in others.
2018-08-13 10:59:20 -07:00
David Goldblatt
1f71e1ca43 Add hook microbenchmark. 2018-08-09 13:16:54 -07:00
David Carlier
0771ff2cea FreeBSD build changes and allow to run the tests. 2018-08-09 10:41:20 -07:00
David Goldblatt
e8ec9528ab Allow the use of readlinkat over readlink.
This can be useful in situations where readlink is disallowed.
2018-08-03 14:04:32 -07:00
Tyler Etzel
126252a7e6 Add stats for the size of extent_avail heap 2018-08-02 10:16:06 -07:00
Tyler Etzel
c14e6c0819 Add extents information to mallocstats output
- Show number/bytes of extents of each size that are dirty, muzzy, retained.
2018-08-02 10:16:06 -07:00
Tyler Etzel
33f1aa5bad Fix comment on SC_NPSIZES. 2018-08-02 10:16:06 -07:00
Tyler Etzel
5e23f96dd4 Add unit tests for logging 2018-08-01 13:27:11 -07:00
Tyler Etzel
b664bd7935 Add logging for sampled allocations
- prof_opt_log flag starts logging automatically at runtime
- prof_log_{start,stop} mallctl for manual control
2018-08-01 13:27:11 -07:00
Tyler Etzel
eb261e53a6 Small refactoring of emitter
- Make API more clear for using as standalone json emitter
- Support cases that weren't possible before, e.g.
	- emitting primitive values in an array
	- emitting nested arrays
2018-08-01 13:27:11 -07:00
David Goldblatt
41b7372ead TSD: Add fork support to tsd_nominal_tsds.
In case of multithreaded fork, we want to leave the child in a reasonable state,
in which tsd_nominal_tsds is either empty or contains only the forking thread.
2018-07-26 17:22:25 -07:00
David Goldblatt
013ab26c86 TSD: Add a tsd_nominal_list death assertion.
A thread should have had its state transition away from nominal before it dies.
This change adds that to the list of thread death assertions.
2018-07-26 17:22:25 -07:00
David Goldblatt
3aba072cef SC: Remove global data.
The global data is mostly only used at initialization, or for easy access to
values we could compute statically.  Instead of consuming that space (and
risking TLB misses), we can just pass around a pointer to stack data during
bootstrapping.
2018-07-23 13:37:08 -07:00
Qi Wang
4bc48718b2 Tolerate experimental features for abort_conf.
Do not abort on unrecognized experimental options.  This helps us test
experimental features with abort_conf enabled.
2018-07-17 20:40:32 -07:00
gnzlbg
6deed86deb Test that .travis.yml has been produced by gen_travis.py on CI
This commit checks on Travis-CI that the current `.travis.yml` file
equals the output of the `gen_travis.py` script, and updates
the `.travis.yml` file accordingly.
2018-07-17 17:55:50 -07:00
gnzlbg
0eb0641cac Simplify output of gen_travis.py script
This commit simplifies the output of the
`gen_travis.py` script by reusing addons.

The `.travis.yml` script is updated to
reflect these changes.
2018-07-17 17:55:50 -07:00
David Goldblatt
55e5cc1341 SC: Make some key size classes static.
The largest small class, smallest large class, and largest large class may all
be needed down fast paths; to avoid the risk of touching another cache line, we
can make them available as constants.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
5112d9e5fd Add MALLOC_CONF parsing for dynamic slab sizes.
This actually enables us to change the values.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
4610ffa942 Bootstrapping: Parse MALLOC_CONF before using slab sizes.
I.e., parse before booting the bin module or sz module.  This lets us tweak size
class settings before committing to them by letting them leak into other
modules.

This commit does not actually do any tweaking of the size classes; it *just*
changes the bootstrapping order; this may help bisect any bootstrapping
failures on poorly-tested architectures.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
a7f68aed3e SC: Add page customization functionality. 2018-07-12 20:53:06 -07:00
David T. Goldblatt
017dca198c SC module: Add a note on style. 2018-07-12 20:53:06 -07:00
David Goldblatt
5b7fc9056c Remove the --with-lg-page-sizes configure option.
This appears to be unused.
2018-07-12 20:53:06 -07:00
David Goldblatt
0552aad91b Kill size_classes.sh.
We've moved size class computations to boot time; they were being used only to
check that the computations resulted in equal values.
2018-07-12 20:53:06 -07:00
David Goldblatt
4f55c0ec22 Translate size class computation from bash shell into C.
This is the last big step in making size classes a runtime computation rather
than a configure-time one.

The compile-time computation has been left in, for now, to allow assertion
checking that the results are identical.
2018-07-12 20:53:06 -07:00
David Goldblatt
2f07e92adb Add lg_ceil to bit_util.
Also, add the bit_util test back to the Makefile.
2018-07-12 20:53:06 -07:00
David Goldblatt
07b89c7673 Move quantum detection into its own file.
This is logically fairly independent.
2018-07-12 20:53:06 -07:00
David Goldblatt
e904f813b4 Hide size class computation behind a layer of indirection.
This class removes almost all the dependencies on size_classes.h, accessing the
data there only via the new module sc.h, which does not depend on any
configuration options.

In a subsequent commit, we'll remove the configure-time size class computations,
doing them at boot time, instead.
2018-07-12 20:53:06 -07:00
gnzlbg
fb924dd7bf Suppress -Wmissing-field-initializer warning only for compilers with buggy implementation 2018-07-10 13:13:36 -07:00
gnzlbg
3d29d11ac2 Clean compilation -Wextra
Before this commit jemalloc produced many warnings when compiled with -Wextra
with both Clang and GCC. This commit fixes the issues raised by these warnings
or suppresses them if they were spurious at least for the Clang and GCC
versions covered by CI.

This commit:

* adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are
  used to modify the stack of enabled diagnostics. The
  `JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete
  diagnostic.

* adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling
  through `case` labels in a `switch` statement is intended

* Removes all UNUSED annotations on function parameters. The warning
  -Wunused-parameter is now disabled globally in
  `jemalloc_internal_macros.h` for all translation units that include
  that header. It is never re-enabled since that header cannot be
  included by users.

* locally suppresses some -Wextra diagnostics:

  * `-Wmissing-field-initializer` is buggy in older Clang and GCC versions,
    where it does not understand that, in C, `= {0}` is a common idiom
    to initialize a struct to zero

  * `-Wtype-bounds` is suppressed in a particular situation where a generic
    macro, used in multiple different places, compares an unsigned integer for
    smaller than zero, which is always true.

  * `-Walloc-larger-than-size=` diagnostics warn when an allocation function is
    called with a size that is too large (out-of-range). These are suppressed in
    the parts of the tests where `jemalloc` explicitly does this to test that the
    allocation functions fail properly.

* adds a new CI build bot that runs the log unit test on CI.

Closes #1196 .
2018-07-09 21:40:42 -07:00
Maks Naumov
ce5c073fe5 Fix MSVC build 2018-07-05 13:50:01 -07:00
Qi Wang
cdf15b458a Rename huge_threshold to experimental, and tweak documentation. 2018-06-29 10:35:02 -07:00
Qi Wang
ff622eeab5 Add unit test for opt.huge_threshold. 2018-06-29 10:35:02 -07:00
Qi Wang
1302af4c43 Add ctl and stats for opt.huge_threshold. 2018-06-29 10:35:02 -07:00
Qi Wang
79522b2fc2 Refactor arena_is_auto. 2018-06-29 10:35:02 -07:00
Qi Wang
94a88c26f4 Implement huge arena: opt.huge_threshold.
The feature allows using a dedicated arena for huge allocations.  We want the
additional arena to separate huge allocations because: 1) mixing small extents
with huge ones causes fragmentation over the long run (this feature reduces VM
size significantly); 2) with many arenas, huge extents rarely get reused across
threads; and 3) huge allocations happen way less frequently, therefore no
concerns for lock contention.
2018-06-29 10:35:02 -07:00
Qi Wang
77a71ef2b7 Fall back to the default pthread_create if RTLD_NEXT fails. 2018-06-28 13:18:21 -07:00
David Goldblatt
d1e11d48d4 Move tsd link and in_hook after tcache.
This can lead to better cache utilization down the common paths where we don't
touch the link.
2018-06-27 13:39:02 -07:00
Qi Wang
50820010fe Add test for remote deallocation. 2018-06-26 23:13:15 -07:00
Qi Wang
fec1ef7c91 Fix arena locking in tcache_bin_flush_large().
This regression was introduced in c834912 (incorrect arena used).
2018-06-26 23:13:15 -07:00
Qi Wang
0ff7ff3ec7 Optimize ixalloc by avoiding a size lookup. 2018-06-05 21:03:51 -07:00
Qi Wang
c834912aa9 Avoid taking large_mtx for auto arenas.
On tcache flush path, we can avoid touching the large_mtx for auto arenas, since
it was only needed for manual arenas where arena_reset is allowed.
2018-06-05 15:16:03 -07:00
Qi Wang
9bd8deb260 Fix stats output for opt.lg_extent_max_active_fit. 2018-06-05 10:23:28 -07:00
Qi Wang
d22e150320 Avoid taking extents_muzzy mutex when muzzy is disabled.
When muzzy decay is disabled, no need to allocate from extents_muzzy.  This
saves us a couple of mutex operations down the extents_alloc path.
2018-05-24 14:40:56 -07:00
David Goldblatt
a7f749c9af Hooks: Protect against reentrancy.
Previously, we made the user deal with this themselves, but that's not good
enough; if hooks may allocate, we should test the allocation pathways down
hooks.  If we're doing that, we might as well actually implement the protection
for the user.
2018-05-18 11:43:03 -07:00
David Goldblatt
0379235f47 Tests: Shouldn't be able to change global slowness.
This can help ensure that we don't leave slowness changes behind in case of
resource exhaustion.
2018-05-18 11:43:03 -07:00
David Goldblatt
59e371f463 Hooks: Add a hook exhaustion test.
When we run out of space in which to store hooks, we should return EAGAIN from
the mallctl, but not otherwise misbehave.
2018-05-18 11:43:03 -07:00
David Goldblatt
bb071db92e Mallctl: Add experimental.hooks.[install|remove]. 2018-05-18 11:43:03 -07:00
David Goldblatt
126e9a84a5 Hooks: move the "extra" pointer into the hook_t itself.
This simplifies the mallctl call to install a hook, which should only take a
single argument.
2018-05-18 11:43:03 -07:00
David Goldblatt
cb0707c0fc Hooks: hook the realloc pathways that move/expand. 2018-05-18 11:43:03 -07:00
David Goldblatt
67270040a5 Hooks: hook the realloc paths that act as pure malloc/free. 2018-05-18 11:43:03 -07:00
David Goldblatt
83e516154c Hooks: hook the pure-expand function. 2018-05-18 11:43:03 -07:00
David Goldblatt
c154f5881b Hooks: hook the pure-deallocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
226327cf66 Hooks: hook the pure-allocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
fe0e399385 Hooks: add an early-exit path for the common no-hook case. 2018-05-18 11:43:03 -07:00
David Goldblatt
5ae6e7cbfa Add "hook" module.
The hook module allows a low-reader-overhead way of finding hooks to invoke and
calling them.

For now, none of the allocation pathways are tied into the hooks; this will come
later.
2018-05-18 11:43:03 -07:00
David Goldblatt
06a8c40b36 Add the Seq module, a simple seqlock implementation.
This allows fast reader-writer concurrency in cases where writers are rare.  The
immediate use case is for the hooking implementation.
2018-05-18 11:43:03 -07:00
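
A minimal C11 seqlock sketch in the spirit of the description above (this is the textbook pattern, not jemalloc's actual seq interface).  The writer is assumed to be serialized externally; readers retry when they observe an odd or changed sequence number:

  #include <stdatomic.h>
  #include <stdbool.h>

  typedef struct {
      atomic_uint seq;    /* odd while a write is in progress */
      atomic_int data;    /* the protected payload */
  } seq_int_t;

  static void
  seq_store(seq_int_t *s, int val) {     /* callers must serialize writers */
      unsigned v = atomic_load_explicit(&s->seq, memory_order_relaxed);
      atomic_store_explicit(&s->seq, v + 1, memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      atomic_store_explicit(&s->data, val, memory_order_relaxed);
      atomic_store_explicit(&s->seq, v + 2, memory_order_release);
  }

  static bool
  seq_try_load(int *val, seq_int_t *s) {
      unsigned v1 = atomic_load_explicit(&s->seq, memory_order_acquire);
      if (v1 & 1) {
          return false;                  /* writer active: caller retries */
      }
      int d = atomic_load_explicit(&s->data, memory_order_relaxed);
      atomic_thread_fence(memory_order_acquire);
      unsigned v2 = atomic_load_explicit(&s->seq, memory_order_relaxed);
      if (v1 != v2) {
          return false;                  /* raced with a writer: retry */
      }
      *val = d;
      return true;
  }
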
David Goldblatt
c7a87e0e0b Rename hooks module to test_hooks.
"Hooks" is really the best name for the module that will contain the publicly
exposed hooks.  So let's rename the current "hooks" module (which hooks external
dependencies, for reentrancy testing) to "test_hooks".
2018-05-18 11:43:03 -07:00
David Goldblatt
e870829e64 TSD: Add the ability to enter a global slow path.
This gives any thread the ability to send other threads down slow paths the next
time they fetch tsd.
2018-05-18 11:43:03 -07:00
David Goldblatt
feff510b9f TSD: Pull name mangling into a macro. 2018-05-18 11:43:03 -07:00
David Goldblatt
39d6420c0c TSD: Make state atomic.
This will let us change the state of another thread remotely, eventually.
2018-05-18 11:43:03 -07:00
David Goldblatt
982c10de35 TSD: Make all state access happen through a function.
Shortly, tsd state will be atomic and have some complicated enough logic down
the state-setting path that we should be aware of it.
2018-05-18 11:43:03 -07:00
David Goldblatt
e74a1a37c8 Atomics: Add atomic_u8_t, force-inline operations.
We're about to need an atomic uint8_t for state operations.

Unfortunately, we're at the point where things won't get inlined into the key
methods unless they're force-inlined.  This is embarrassing and we should do
something about it, but in the meantime we'll force-inline a little more when we
need to.
2018-05-18 11:43:03 -07:00
Qi Wang
09edea3f5c Tweak the format of the per arena summary section.
Increase the width to ensure enough space for long running programs.
2018-05-17 12:58:56 -07:00
Qi Wang
b293a3eb86 Fix the max_background_thread test.
We may set number of background threads separately, e.g. through
--with-malloc-conf, so avoid assuming the default number in the test.
2018-05-15 14:00:51 -07:00
Qi Wang
312352faa8 Fix background thread index issues with max_background_threads. 2018-05-15 12:25:23 -07:00
Qi Wang
e8a63b87c3 Fix an incorrect assertion.
When configured with --with-lg-page, it's possible for the configured page size
to be greater than the system page size, in which case the page address may only
be aligned with the system page size.
2018-05-09 23:52:56 -07:00
Qi Wang
61efbda709 Merge branch 'dev' 2018-05-08 12:12:50 -07:00
Qi Wang
1c51381b7c Update ChangeLog for 5.1.0. 2018-05-08 12:06:34 -07:00
David T. Goldblatt
e94ca7f3e2 run_tests.sh: Don't test large vaddr with -m32. 2018-05-08 11:20:25 -07:00
Qi Wang
a308af360c Reformat the version number in jemalloc.pc.in. 2018-05-07 20:12:03 -07:00
Christoph Muellner
b73380bee0 Fix include path order for out-of-tree builds.
When configuring out-of-tree (source directory is not build directory),
the generated include files from the build directory should have higher
priority than those in the source dir.

This is especially helpful when cross-compiling.

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-05-05 10:11:22 -07:00
David Goldblatt
4c8829e692 run_tests.sh: Test --with-lg-vaddr. 2018-05-04 15:50:12 -07:00
David Goldblatt
b001e6e740 INSTALL.md: Clarify --with-lg-vaddr.
The current wording can be taken to imply that we return tagged pointers to the
user, or otherwise rely on architectural support for them.
2018-05-04 15:50:12 -07:00
Christoph Muellner
63712b4c4e configure: Add --with-lg-vaddr configure option.
This patch allows overriding the lg-vaddr values, which
are defined by the build machine's CPUID information (x86_64)
or default values (other architectures like aarch64).

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-05-04 10:34:10 -07:00
Qi Wang
95789a24fa Update copyright dates. 2018-05-03 15:31:42 -07:00
Qi Wang
2e7af1af73 Add TUNING.md. 2018-05-03 12:52:52 -07:00
Qi Wang
3bcaedeea2 Remove documentation for --disable-thp which was removed. 2018-05-03 12:52:52 -07:00
Qi Wang
c5b72a92cc Fix a typo in INSTALL.md. 2018-05-02 15:08:49 -07:00
Latchesar Ionkov
a32b7bd567 Mallctl: Add arenas.lookup
Implement a new mallctl operation that allows looking up the arena a
region of memory belongs to.
2018-05-01 13:14:36 -07:00
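
A sketch of how the new mallctl might be used.  The read/write convention assumed here (pointer passed in via newp, arena index returned via oldp) follows my reading of the description above and should be checked against the manual:

  #include <stdio.h>
  #include <stdlib.h>
  #include <jemalloc/jemalloc.h>

  int
  main(void) {
      void *p = malloc(64);
      unsigned arena_ind;
      size_t sz = sizeof(arena_ind);
      if (mallctl("arenas.lookup", &arena_ind, &sz, &p, sizeof(p)) == 0) {
          printf("%p belongs to arena %u\n", p, arena_ind);
      }
      free(p);
      return 0;
  }
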
Christoph Muellner
6df90600a7 aarch64: Add ILP32 support.
Instead of setting a fixed value of 48 allowed VA bits,
we distinguish between LP64 and ILP32.

Testsuite result with LP64:
Test suite summary: pass: 13/13, skip: 0/13, fail: 0/13

Testsuite result with ILP32:
Test suite summary: pass: 13/13, skip: 0/13, fail: 0/13

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
2018-04-30 15:04:00 -07:00
Issam Maghni
39b1b20499 Adding install_lib_pc
Related to https://github.com/jemalloc/jemalloc/issues/974
2018-04-22 11:52:47 -07:00
Qi Wang
b8f4c730ef Remove an incorrect assertion.
Background threads are created without holding the global background_thread
lock, which means a paused state is possible (and fine).
2018-04-18 14:17:08 -07:00
Qi Wang
dedfeecc4e Invoke dlsym() on demand.
If no lazy lock or background thread is enabled, avoid dlsym pthread_create on
boot.
2018-04-18 11:20:21 -07:00
David Goldblatt
c95284df1a Avoid a resource leak down extent split failure paths.
Previously, we would leak the extent and memory associated with a salvageable
portion of an extent that we were trying to split in three, in the case where
the first split attempt succeeded and the second failed.
2018-04-18 08:19:41 -07:00
David Goldblatt
a62e42baeb Add the --disable-initial-exec-tls configure option.
Right now we always make our TLS use the initial-exec model if the compiler
supports it.  This change allows configure-time disabling of this setting, which
can be helpful when dynamically loading jemalloc is the only option.
2018-04-17 19:22:01 -07:00
Qi Wang
e40b2f75bd Fix abort_conf processing.
When abort_conf is set, make sure we always error out at the end of the options
processing loop.
2018-04-17 18:23:53 -07:00
Qi Wang
0fadf4a2e3 Add UNUSED to avoid compiler warnings. 2018-04-16 13:50:21 -07:00
Jason Evans
2a80d6f15b Avoid a printf format specifier warning.
This dodges a warning emitted by the FreeBSD system gcc when compiling
libc for architectures which don't use clang as the system compiler.
2018-04-16 11:07:51 -07:00
Qi Wang
3f0dc64c6b Allow setting extent hooks on uninitialized auto arenas.
Setting extent hooks can result in initializing an unused auto arena.  This is
useful for installing extent hooks on auto arenas from the beginning.
2018-04-11 21:21:54 -07:00
Qi Wang
02585420c3 Document liveness requirements for extent_hooks_t structures. 2018-04-11 12:35:28 -07:00
Qi Wang
f0b146acc4 Fix a typo. 2018-04-11 10:42:57 -07:00
Jason Evans
cad27a894a Fix a typo. 2018-04-10 17:59:10 -07:00
Jason Evans
4937309620 Silence a compiler warning. 2018-04-10 17:59:00 -07:00
Dave Watson
8b14f3abc0 background_thread: add max thread count config
Looking at the thread counts in our services, jemalloc's background thread
is useful, but mostly idle.  Add a config option to tune down the number of threads.
2018-04-10 14:01:45 -07:00
Qi Wang
4be74d5112 Consolidate the two memory loads in rtree_szind_slab_read().
szind and slab bits are read on the fast path, where the compiler generated two
separate memory loads for them before this diff.  Manually operate on the bits to
avoid the extra memory load.
2018-04-10 10:18:46 -07:00
Rajeev Misra
5f51882a0a Stack address should not be used for ordering mutexes 2018-04-10 10:16:57 -07:00
Qi Wang
cf2f4aac1c Fix const qualifier warnings. 2018-04-09 16:50:30 -07:00
Qi Wang
d3e0976a2c Fix type warning on Windows.
Add cast since read / write has unsigned return type on windows.
2018-04-09 16:50:30 -07:00
Qi Wang
4df483f0fd Fix arguments passed to extent_init. 2018-04-09 16:35:58 -07:00
Qi Wang
2dccf45640 Control idump and gdump with prof_active. 2018-04-09 16:35:14 -07:00
Dave Watson
6d02421730 extents: Remove preserve_lru feature.
The preserve_lru feature adds lots of complication for little value.
Removing it means merged extents are re-added to the lru list, and may
take longer to madvise away than they otherwise would.

Canaries after removal seem flat for several services (no change).
2018-04-02 12:40:28 -07:00
Qi Wang
21eb0d15a6 Fix a background_thread shutdown issue.
1) make sure background thread 0 is always created; and 2) fix synchronization
between thread 0 and the control thread.
2018-04-02 10:03:47 -07:00
Qi Wang
956c4ad6b5 Change mutable option output in stats to avoid stringify issues. 2018-03-15 14:42:48 -07:00
Qi Wang
baffeb1d0a Fix a typo in stats. 2018-03-15 14:42:48 -07:00
Qi Wang
742416f645 Revert "CI: Remove "catgets" dependency on appveyor."
This reverts commit ae0f5d5c3f.
2018-03-15 13:58:42 -07:00
David Goldblatt
4c36cd2cc5 Stats printing: Convert arena large stats to use emitter.
This completes the conversion; we now have only structured text output.
2018-03-09 11:47:17 -08:00
David Goldblatt
4eed989bbf Stats printing: convert arena bin stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a9f3cedc6e Stats printing: remove a spurious newline.
This was left over from a previous emitter conversion.  It didn't affect the
correctness of the output.
2018-03-09 11:47:17 -08:00
David Goldblatt
a1738f4efd Stats printing: Make arena mutex stats use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
07fb707623 Stats printing: convert most per-arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
8fc850695d Stats printing: convert paging and alloc counts to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
bc6620f73e Stats printing: convert decay stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a6ef061c43 Stats printing: Move emitter cutoff point into stats_arena_print. 2018-03-09 11:47:17 -08:00
David Goldblatt
cbde666d9a Stats printing: move stats_print_helper to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
86c61d4a57 Stats printing: Move global mutex stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
ebe0b5f828 Emitter: Add support for row-based output in table mode.
This is needed for things like mutex stats in table mode.
2018-03-09 11:47:17 -08:00
David Goldblatt
9e1846b004 Stats printing: move non-mutex arena stats to the emitter.
Another step in the conversion process.  The mutex is a little different,
because we want to emit it as an array.
2018-03-09 11:47:17 -08:00
David Goldblatt
8076b28721 Stats printing: Remove explicit callback passing to stats_print_helper.
This makes the emitter the only source of callback information, which is a step
towards where we want to be.
2018-03-09 11:47:17 -08:00
David Goldblatt
0d20eda127 Stats printing: Move emitter -> manual cutoff point.
This makes it so that the "general" portion of the stats code is completely
agnostic to emitter type.
2018-03-09 11:47:17 -08:00
David Goldblatt
ec31d476ff Stats printing: Convert profiling stats to use the emitter.
While we're at it, print them in table form, too.
2018-03-09 11:47:17 -08:00
David Goldblatt
e5acc35400 Stats printing: Convert general arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
4a335e0c6f Stats printing: convert config and opt output to use emitter.
This is a step along the path towards using the emitter for all stats output.
2018-03-09 11:47:17 -08:00
David Goldblatt
b646f89173 Stats printing: Convert header and footer to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
27a8fe6780 Introduce the emitter module.
The emitter can be used to produce structured json or tabular output.  For now
it has no uses; in subsequent commits, I'll begin transitioning stats printing
code over.
2018-03-09 11:47:17 -08:00
Qi Wang
e4f090e8df Add opt.thp which allows explicit hugepage usage.
"always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings.  Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
2018-03-08 13:08:06 -08:00
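
A simplified sketch of what the default extent hooks do for this option, per the description above.  Real code also gates on configure-time and runtime THP support, and the function name here is illustrative:

  #include <string.h>
  #include <sys/mman.h>

  static void
  thp_advise(void *addr, size_t size, const char *thp_mode) {
  #if defined(MADV_HUGEPAGE) && defined(MADV_NOHUGEPAGE)
      if (strcmp(thp_mode, "always") == 0) {
          madvise(addr, size, MADV_HUGEPAGE);
      } else if (strcmp(thp_mode, "never") == 0) {
          madvise(addr, size, MADV_NOHUGEPAGE);
      }
      /* "default": leave the kernel's setting untouched. */
  #else
      (void)addr; (void)size; (void)thp_mode;
  #endif
  }
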
Qi Wang
efa40532dc Remove config.thp which wasn't in use. 2018-03-08 13:08:06 -08:00
Qi Wang
6b35366ef5 Skip test_alignment_and_size if percpu_arena is enabled.
test_alignment_and_size needs a lot of memory.  When percpu_arena is enabled,
multiple arenas may cause the test to OOM.
2018-03-02 14:44:21 -08:00
Qi Wang
548153e789 Remove unused code in test/thread_tcache_enabled. 2018-03-02 14:44:21 -08:00
David Goldblatt
26b1c13982 Background threads: fix an indexing bug.
We have a buffer overrun that manifests in the case where arena indices higher
than the number of CPUs are accessed before arena indices lower than the number
of CPUs.  This fixes the bug and adds a test.
2018-02-27 19:43:05 -08:00
David T. Goldblatt
dd7e283b6f Tweak the ticker paths to help GCC generate better code.
GCC on its own isn't quite able to turn the ticker subtract into a memory
operation followed by a js.
2018-02-21 16:04:23 -08:00
David Goldblatt
ae0f5d5c3f CI: Remove "catgets" dependency on appveyor.
This seems to cause a configuration error with msys2.
2018-02-14 16:21:44 -08:00
Maks Naumov
a3abbb4bdf Fix MSVC build 2018-02-12 10:35:53 -08:00
rustyx
83aa9880b7 Make generated headers usable in both x86 and x64 mode in Visual Studio 2018-01-30 13:11:41 -08:00
rustyx
ed52d24f74 Define JEMALLOC_NO_PRIVATE_NAMESPACE also in Visual Studio x86 targets 2018-01-30 13:11:41 -08:00
Christopher Ferris
f78d4ca3fb Modify configure to determine return value of strerror_r.
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.

Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
2018-01-10 21:01:18 -08:00
Qi Wang
ba5992fe9a Improve the fit for aligned allocation.
We compute the max size required to satisfy an alignment.  However this can be
quite pessimistic, especially with frequent reuse (and combined with state-based
fragmentation).  This commit adds one more fit step specific to aligned
allocations, searching in all potential fit size classes.
2018-01-05 14:27:58 -08:00
Qi Wang
41790f4fa4 Check tsdn_null before reading reentrancy level. 2018-01-05 13:05:17 -08:00
Qi Wang
91b247d311 In iallocztm, check lock rank only when not in reentrancy. 2018-01-05 13:05:17 -08:00
Nehal J Wani
78a87e4a80 Make sure JE_CXXFLAGS_ADD uses CPP compiler
All the invocations of AC_COMPILE_IFELSE inside JE_CXXFLAGS_ADD were
running 'the compiler and compilation flags of the current language'
which was always the C compiler and the CXXFLAGS were never being tested
against a C++ compiler. This patch fixes this issue by temporarily
changing the chosen compiler to C++ by pushing it over the stack and
popping it immediately after the compilation check.
2018-01-04 11:14:46 -08:00
marxin
433c2edabc Disable JEMALLOC_HAVE_MADVISE_HUGE for arm* CPUs. 2018-01-04 11:13:32 -08:00
Rajeev Misra
72bdbc35e3 extent_t bitpacking logic refactoring 2018-01-04 11:11:04 -08:00
Rajeev Misra
f47e39d11a handle 32 bit mutex counters 2018-01-04 11:08:17 -08:00
David Goldblatt
d41b19f9c7 Implement arena regind computation using div_info_t.
This eliminates the need to generate an enormous switch statement in
arena_slab_regind.
2017-12-21 14:25:43 -08:00
David Goldblatt
21f7c13d0b Add the div module, which allows fast division by dynamic values. 2017-12-21 14:25:43 -08:00
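
A sketch of one way such a module can work: precompute a 32.32 fixed-point reciprocal once, then divide with a multiply and a shift.  The variant below is exact whenever the dividend is a multiple of the divisor (as it is for slab region-index computation, where offsets are multiples of the region size); the exact scheme and helper bodies are illustrative, not necessarily jemalloc's:

  #include <assert.h>
  #include <stdint.h>

  typedef struct {
      uint64_t magic;   /* ceil(2^32 / divisor) */
  } div_info_t;

  static void
  div_init(div_info_t *info, uint32_t divisor) {
      info->magic = (((uint64_t)1 << 32) + divisor - 1) / divisor;
  }

  static uint32_t
  div_compute(const div_info_t *info, uint32_t n) {
      return (uint32_t)((n * info->magic) >> 32);
  }

  int
  main(void) {
      div_info_t d;
      div_init(&d, 48);                        /* e.g. a 48-byte region size */
      assert(div_compute(&d, 48u * 1000) == 1000);
      return 0;
  }
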
David T. Goldblatt
7f1b02e3fa Split up and standardize naming of stats code.
The arena-associated stats are now all prefixed with arena_stats_, and live in
their own file.  Likewise, malloc_bin_stats_t -> bin_stats_t, also in its own
file.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
901d94a2b0 Rename cache_alloc_easy to cache_bin_alloc_easy.
This lives in the cache_bin module; just a typo.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
8aafa270fd Move bin stats code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
48bb4a056b Move bin forking code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
a8dd8876fb Move bin initialization from arena module to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
4bf4a1c4ea Pull out arena_bin_info_t and arena_bin_t into their own file.
In the process, kill arena_bin_index, which is unused.  To follow are several
diffs continuing this separation.
2017-12-18 16:29:10 -08:00
Qi Wang
740bdd68b1 Over purge by 1 extent always.
When purging, large allocations are usually the ones that cross the npages_limit
threshold, simply because they are "large".  This means we often leave the large
extent around for a while, which has the downsides of: 1) high RSS and 2) more
chance of them getting fragmented.  Given that they are not likely to be reused
very soon (LRU), let's over purge by 1 extent (which is often large and not
reused frequently).
2017-12-18 12:57:07 -08:00
Qi Wang
f70785de91 Skip test/unit/pack when profiling is enabled.
The test assumes no sampled allocations.
2017-12-18 12:47:46 -08:00
Qi Wang
5e0332890f Output opt.lg_extent_max_active_fit in stats. 2017-12-14 15:49:15 -08:00
nicolov
22460cbebd jemalloc_mangle.sh: set sh in strict mode 2017-12-11 23:35:20 -08:00
Ed Schouten
749caf14ae Also use __riscv to detect builds for RISC-V CPUs.
According to the RISC-V toolchain conventions, __riscv__ is the old
spelling of this definition. __riscv should be used going forward.

https://github.com/riscv/riscv-toolchain-conventions#cc-preprocessor-definitions
2017-12-09 10:10:42 -08:00
Qi Wang
955b1d9cc5 Fix extent deregister on the leak path.
On the leak path we should not adjust gdump when deregistering.
2017-12-08 22:22:03 -08:00
Qi Wang
b5ab3f91ea Fix test/integration/extent.
Should only run the hook tests without background threads.  This was introduced
in 6e841f6.
2017-12-08 22:22:03 -08:00
Qi Wang
6e841f618a Add more tests for extent hooks failure paths. 2017-11-28 21:52:49 -08:00
Qi Wang
26a8f82c48 Add missing deregister before extents_leak.
This fixes a regression introduced by 211b1f3 (refactor extent split).
2017-11-19 21:12:40 -08:00
Qi Wang
e475d03752 Avoid setting zero and commit if split fails in extent_recycle. 2017-11-19 21:12:27 -08:00
Qi Wang
3e64dae802 Eagerly coalesce large extents.
Coalescing is a small price to pay for large allocations since they happen less
frequently.  This reduces fragmentation while also potentially improving
locality.
2017-11-16 15:32:02 -08:00
Qi Wang
eb1b08daae Fix an extent coalesce bug.
When coalescing, we should take both extents off the LRU list; otherwise decay
can grab the existing outer extent through extents_evict.
2017-11-16 15:32:02 -08:00
Qi Wang
fac706836f Add opt.lg_extent_max_active_fit
When allocating from dirty extents (which we always prefer if available), large
active extents can get split even if the new allocation is much smaller, in
which case the introduced fragmentation causes high long term damage.  This new
option controls the threshold to reuse and split an existing active extent.  We
avoid using a large extent for much smaller sizes, in order to reduce
fragmentation.  In some workloads, adding the threshold improves virtual memory
usage by >10x.
2017-11-16 15:32:02 -08:00
Qi Wang
282a3faa17 Use extent_heap_first for best fit.
extent_heap_any makes the layout less predictable and as a result incurs more
fragmentation.
2017-11-16 15:32:02 -08:00
Dave Watson
d6feed6e66 Use tsd offset_state instead of atomic
While working on #852, I noticed the prng state is atomic.  This is the only
atomic use of prng in all of jemalloc.  Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
2017-11-14 08:58:18 -08:00
Qi Wang
cb3b72b975 Fix base allocator THP auto mode locking and stats.
Added proper synchronization for switching to using THP in auto mode.  Also
fixed stats for number of THPs used.
2017-11-09 16:14:12 -08:00
Qi Wang
b5d071c266 Fix unbounded increase in stash_decayed.
Added an upper bound on how many pages we can decay during the current run.
Without this, decay could have unbounded increase in stashed, since other
threads could add new pages into the extents.
2017-11-08 16:33:30 -08:00
Qi Wang
6dd5681ab7 Use hugepage alignment for base allocator.
This gives us an easier way to tell if the allocation is for metadata in the
extent hooks.
2017-11-03 19:37:13 -07:00
Qi Wang
e422fa8e7e Add arena.i.retain_grow_limit
This option controls the max size when grow_retained.  This is useful when we
have customized extent hooks reserving physical memory (e.g. 1G huge pages).
Without this feature, the default increasing sequence could result in fragmented
and wasted physical memory.
2017-11-03 13:53:33 -07:00
Edward Tomasz Napierala
9f455e2786 Try to use sysctl(3) instead of sysctlbyname(3).
This attempts to use the VM_OVERCOMMIT OID - newly introduced in -CURRENT
a few days ago, specifically for this purpose - instead of querying the
sysctl by its string name.  Due to how sysctlbyname(3) works, this means
we do one syscall during binary startup instead of two.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Edward Tomasz Napierala
d591df05c8 Use getpagesize(3) under FreeBSD.
This avoids sysctl(2) syscall during binary startup, using the value
passed in the ELF aux vector instead.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Qi Wang
58eba024c0 metadata_thp: auto mode adjustment for a0.
We observed that arena 0 can have much more metadata allocated compared to
other arenas.  Tune the auto mode to only switch to huge page on the 5th block
(instead of 3 previously) for a0.
2017-11-01 13:52:06 -07:00
Qi Wang
47203d5f42 Output all counters for bin mutex stats.
The saved space is not worth the trouble of missing counters.
2017-10-19 16:31:54 -07:00
David Goldblatt
d14bbf8d81 Add a "dumpable" bit to the extent state.
Currently, this is unused (i.e. all extents are always marked dumpable).  In the
future, we'll begin using this functionality.
2017-10-16 15:35:49 -07:00
David Goldblatt
bbaa72422b Add pages_dontdump and pages_dodump.
This will, eventually, enable us to avoid dumping eden regions.
2017-10-16 15:35:49 -07:00
David Goldblatt
ccd09050aa Add configure-time detection for madvise(..., MADV_DO[NT]DUMP) 2017-10-16 15:35:49 -07:00
David Goldblatt
211b1f3c7d Factor out extent-splitting core from extent lifetime management.
Before this commit, extent_recycle_split intermingles the splitting of an extent
and the return of parts of that extent to a given extents_t.  After it, that
logic is separated.  This will enable splitting extents that don't live in any
extents_t (as the grow retained region soon will).
2017-10-16 15:35:49 -07:00
David Goldblatt
5bad01c38e Document some of the internal extent functions. 2017-10-16 15:35:49 -07:00
rustyx
33df2fa169 Fix MSVC 2015 project and add a VS 2017 solution 2017-10-16 10:26:54 -07:00
Qi Wang
f4f814cd4c Remove the default value for JEMALLOC_PURGE_MADVISE_DONTNEED_ZEROS. 2017-10-11 15:49:22 -07:00
Qi Wang
31ab38be5f Define MADV_FREE on our own when needed.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected.  This allows the feature to be built in and enabled with
runtime detection.
2017-10-11 15:49:22 -07:00
Qi Wang
fc83de0384 Document the potential issues about opt.background_thread. 2017-10-11 09:52:04 -07:00
Qi Wang
7e74093c96 Set isthreaded manually.
Avoid relying on pthread_once, which creates a dependency during init.
2017-10-05 22:57:56 -07:00
Qi Wang
a2e6eb2c22 Delay background_thread_ctl_init to right before thread creation.
ctl_init sets isthreaded, which means it should be done without holding any
locks.
2017-10-05 22:57:56 -07:00
Qi Wang
79e83451ff Enable a0 metadata thp on the 3rd base block.
Since we allocate rtree nodes from a0's base, it's pushed to over 1 block on
initialization right away, which makes the auto thp mode less effective on a0.
We change a0 to make the switch on the 3rd block instead.
2017-10-05 13:39:03 -07:00
David Goldblatt
1245faae90 Power: disable the CPU_SPINWAIT macro.
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :

[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.

We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.

[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
2017-10-04 18:37:23 -07:00
Dave Watson
7c6c99b829 Use ph instead of rb tree for extents_avail_
There does not seem to be any overlap between usage of
extent_avail and extent_heap, so we can use the same hook.

The only remaining usage of rb trees is in the profiling code,
which has some 'interesting' iteration constraints.

Fixes #888
2017-10-04 12:23:03 -07:00
David Goldblatt
8a7ee3014c Logging: capitalize log macro.
Dodge a name-conflict with the math.h logarithm function. D'oh.
2017-10-02 20:44:43 -07:00
David Goldblatt
7a8bc7172b ARM: Don't extend bit LG_VADDR to compute high address bits.
In userspace ARM on Linux, zero-ing the high bits is the correct way to do this.
This doesn't fix the fact that we currently set LG_VADDR to 48 on ARM, when in
fact larger virtual address sizes are coming soon.  We'll cross that bridge when
we come to it.
2017-10-02 14:54:46 -07:00
Qi Wang
0720192a32 Add runtime detection of lazy purging support.
It's possible to build with lazy purge enabled but deploy to systems without
such support.  In this case, rely on the boot time detection instead of
repeatedly making unnecessary madvise calls (which all return EINVAL).
2017-09-26 17:26:22 -07:00
Qi Wang
3959a9fe19 Avoid left shift by negative values.
Fix warnings on -Wshift-negative-value.
2017-09-25 15:38:58 -07:00
Qi Wang
56f0e57844 Add "falls through" comment explicitly.
Fix warnings by -Wimplicit-fallthrough.
2017-09-25 15:38:58 -07:00
Tamir Duberstein
a545f1804a dumpbin doesn't exist in mingw 2017-09-21 12:18:19 -07:00
Tamir Duberstein
24766ccd5b Allow toolchain to determine nm 2017-09-21 12:18:19 -07:00
Tamir Duberstein
96f1468221 whitespace 2017-09-21 12:18:19 -07:00
Qi Wang
eaa58a5026 Put static keyword first.
Fix a warning by -Wold-style-declaration.
2017-09-21 12:18:10 -07:00
Qi Wang
d60f3bac12 Add missing field in initializer for rtree cache.
Fix a warning by -Wmissing-field-initializers.
2017-09-21 12:18:10 -07:00
David Goldblatt
9e39425bf1 Force Ubuntu "precise" for Travis CI builds.
We've been seeing strange errors in jemalloc_cpp.cpp since Travis upgraded from
precise to trusty as their default CI environment (seeming to stem from the new
clang version finding the headers for an old version of libstdc++).  In
the long run we'll have to deal with this "for real", but at that point we may
have a better C++ story in general, making it a moot point.
2017-09-20 10:38:26 -07:00
Qi Wang
9b20a4bf70 Clear cache bin ql postfork.
This fixes a regression in 9c05490, which introduced the new cache bin ql.  The
list needs to be cleaned up after fork, same as tcache_ql.
2017-09-12 16:16:12 -07:00
Qi Wang
886053b966 Fix huge page test in test/unit/pages.
Huge pages could be disabled even if the kernel header has MADV_HUGEPAGE
defined.  Guard the huge page test with runtime detection.
2017-09-12 14:29:49 -07:00
Qi Wang
cf4738455d Fix a link for dirty_decay_ms in manual. 2017-09-11 13:38:45 -07:00
Qi Wang
a315688be0 Relax constraints on reentrancy for extent hooks.
If we guarantee no malloc activity in extent hooks, it's possible to make
customized hooks work on arena 0.  Remove the non-a0 assertion to enable such
use cases.
2017-08-31 11:03:34 -07:00
Qi Wang
e55c3ca267 Add stats for metadata_thp.
Report number of THPs used in arena and aggregated stats.
2017-08-30 16:47:32 -07:00
Qi Wang
47b20bb654 Change opt.metadata_thp to [disabled,auto,always].
To avoid the high RSS caused by THP + low usage arena (i.e. THP becomes a
significant percentage), added a new "auto" option which will only start using
THP after a base allocator used up the first THP region.  Starting from the
second hugepage (in a single arena), "auto" behaves the same as "always",
i.e. madvise hugepage right away.
2017-08-30 16:47:32 -07:00
David Goldblatt
ea91dfa58e Document the ialloc function abbreviations.
In the jemalloc_internal_inlines files, we have a lot of somewhat terse function
names.  This commit adds some documentation to aid in translation.
2017-08-16 17:48:44 -07:00
David Goldblatt
9c0549007d Make arena stats collection go through cache bins.
This eliminates the need for the arena stats code to "know" about tcaches; all
that it needs is a cache_bin_array_descriptor_t to tell it where to find
cache_bins whose stats it should aggregate.
2017-08-16 17:48:44 -07:00
David Goldblatt
f3170baa30 Pull out caching for a bin into its own file.
This is the first step towards breaking up the tcache and arena (since they
interact primarily at the bin level).  It should also make a future arena
caching implementation more straightforward.
2017-08-16 17:48:44 -07:00
Qi Wang
b0825351d9 Add missing mallctl unit test for abort_conf.
The abort_conf option was missed from test/unit/mallctl.
2017-08-11 22:58:58 -07:00
Faidon Liambotis
82d1a3fb31 Add support for m68k, nios2, SH3 architectures
Add minimum alignment for three more architectures, as requested by
Debian users or porters (see Debian bugs #807554, #816236, #863424).
2017-08-11 16:35:44 -07:00
Faidon Liambotis
8da69b69e6 Fix support for GNU/kFreeBSD
The configure.ac section right now is the same for Linux and kFreeBSD,
which results into an incorrect configuration of e.g. defining
JEMALLOC_PROC_SYS_VM_OVERCOMMIT_MEMORY instead of FreeBSD's
JEMALLOC_SYSCTL_VM_OVERCOMMIT.

GNU/kFreeBSD is really a glibc + FreeBSD kernel system, so it needs its
own entry which has a mixture of configuration options from Linux and
FreeBSD.
2017-08-11 16:35:44 -07:00
Qi Wang
3ec279ba1c Fix test/unit/pages.
As part of the metadata_thp support, we now have a separate switch
(JEMALLOC_HAVE_MADVISE_HUGE) for MADV_HUGEPAGE availability.  Use that instead
of JEMALLOC_THP (which doesn't guard pages_huge anymore) in tests.
2017-08-11 15:57:12 -07:00
Qi Wang
8fdd9a5797 Implement opt.metadata_thp
This option enables transparent huge pages for base allocators (requires
MADV_HUGEPAGE support).
2017-08-11 14:51:20 -07:00
Qi Wang
d157864027 Filter out "void *newImpl" in prof output. 2017-08-08 12:28:29 -07:00
Ryan Libby
048c6679cd Remove external linkage for spin_adaptive
The external linkage for spin_adaptive was not used, and the inline
declaration of spin_adaptive that was used caused a problem on FreeBSD
where CPU_SPINWAIT is implemented as a call to a static procedure for
x86 architectures.
2017-08-08 10:30:21 -07:00
Qi Wang
1ab2ab294c Only read szind if ptr is not paged aligned in sdallocx.
If ptr is not page aligned, we know the allocation was not sampled. In this case
use the size passed into sdallocx directly w/o accessing rtree.  This improve
sdallocx efficiency in the common case (not sampled && small allocation).
2017-07-31 15:47:48 -07:00
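
A sketch of the alignment test implied above, with an assumed page size constant: per the commit, a pointer that is not page aligned cannot belong to a sampled allocation, so its passed-in size can be trusted without an rtree lookup.  The helper name is illustrative:

  #include <stdbool.h>
  #include <stdint.h>

  #define EXAMPLE_PAGE ((uintptr_t)4096)   /* stand-in for the configured page size */

  static bool
  size_can_be_trusted(const void *ptr) {
      /* Low bits set => not page aligned => not sampled => skip the rtree. */
      return ((uintptr_t)ptr & (EXAMPLE_PAGE - 1)) != 0;
  }
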
David Goldblatt
9a39b23c9c Remove a redundant '--with-malloc-conf=tcache:false' from gen_run_tests.py
This is already tested via its inclusion in possible_malloc_conf_opts.
2017-07-31 15:36:40 -07:00
Qi Wang
3800e55a2c Bypass extent_alloc_wrapper_hard for no_move_expand.
When retain is enabled, we should not attempt mmap for in-place expansion
(large_ralloc_no_move), because it's virtually impossible to succeed, and causes
unnecessary syscalls (which can cause lock contention under load).
2017-07-31 14:04:17 -07:00
Qi Wang
2d2fa72647 Filter out "newImpl" from profiling output. 2017-07-28 14:08:00 -07:00
David Goldblatt
7c22ea7a93 Only run test/integration/sdallocx non-reentrantly.
This is a temporary workaround until we add some beefier CI machines.  Right
now, we're seeing too many OOMs for this to be useful.
2017-07-24 16:21:24 -07:00
David Goldblatt
e6aeceb606 Logging: log using the log var names directly.
Currently we have to log by writing something like:

  static log_var_t log_a_b_c = LOG_VAR_INIT("a.b.c");
  log (log_a_b_c, "msg");

This is sort of annoying.  Let's just write:

  log("a.b.c", "msg");
2017-07-24 14:55:54 -07:00
Qinfan Wu
b28f31e7ed Split out cold code path in newImpl
I noticed that the whole newImpl is inlined. Since OOM handling code is
rarely executed, we should only inline the hot path.
2017-07-24 13:37:02 -07:00
David Goldblatt
a9f7732d45 Logging: allow logging with empty varargs.
Currently, the log macro requires at least one argument after the format string,
because of the way the preprocessor handles varargs macros.  We can hide some of
that irritation by pushing the extra arguments into a varargs function.
2017-07-22 09:38:19 -07:00
Y. T. Chung
aa6c282137 Validates fd before calling fcntl 2017-07-22 07:46:30 -07:00
David T. Goldblatt
e215a7bc18 Add entry and exit logging to all core functions.
I.e. malloc, free, the allocx API, the posix extensions.
2017-07-20 17:58:37 -07:00
David T. Goldblatt
9761b449c8 Add a logging facility.
This sets up a hierarchical logging facility, so that we can add logging
statements liberally, and turn them on in a fine-grained manner.
2017-07-20 17:58:37 -07:00
Y. T. Chung
0975b88dfd Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable.
Older Linux systems don't have O_CLOEXEC.  If that's the case, we fcntl
immediately after open, to minimize the length of the racy period in which
an operation in another thread can leak a file descriptor to a child.
2017-07-20 14:13:33 -07:00
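
A sketch of the fallback pattern described above: prefer the atomic O_CLOEXEC flag, and where it is unavailable set FD_CLOEXEC with fcntl() immediately after open() so the window in which a concurrent fork/exec could leak the descriptor stays small.  The helper name is illustrative:

  #include <fcntl.h>
  #include <unistd.h>

  static int
  open_cloexec(const char *path) {
  #ifdef O_CLOEXEC
      return open(path, O_RDONLY | O_CLOEXEC);
  #else
      int fd = open(path, O_RDONLY);
      if (fd != -1) {
          /* Racy window between open() and here; keep it as short as possible. */
          fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);
      }
      return fd;
  #endif
  }
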
David Goldblatt
fb6787a78c Add a test of behavior under multi-threaded forking.
Forking a multithreaded process is dangerous but allowed, so long as the child
only executes async-signal-safe functions (e.g. exec).  Add a test to ensure
that we don't break this behavior.
2017-07-10 18:17:12 -07:00
David Goldblatt
0a4f5a7eea Fix deadlock in multithreaded fork in OS X.
On OS X, we rely on the zone machinery to call our prefork and postfork
handlers.

In zone_force_unlock, we call jemalloc_postfork_child, reinitializing all our
mutexes regardless of state, since the mutex implementation will assert if the
tid of the unlocker is different from that of the locker.  This has the effect
of unlocking the mutexes, but also fails to wake any threads waiting on them in
the parent.

To fix this, we track whether or not we're the parent or child after the fork,
and unlock or reinit as appropriate.

This resolves #895.
2017-07-10 18:17:12 -07:00
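
A simplified sketch of the parent/child split described above (illustrative only, not
the zone/jemalloc code): the prefork handler takes the lock, the parent's postfork
handler releases it so waiters wake up, and the child's handler reinitializes it
regardless of state.

  #include <pthread.h>

  static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

  static void
  prefork(void) {
          pthread_mutex_lock(&big_lock);
  }

  static void
  postfork_parent(void) {
          /* Unlock (not reinit), so threads blocked on the mutex wake up. */
          pthread_mutex_unlock(&big_lock);
  }

  static void
  postfork_child(void) {
          /* Only one thread exists in the child; reinitialize regardless
           * of state, as described above. */
          pthread_mutex_init(&big_lock, NULL);
  }

  static void
  install_fork_handlers(void) {
          pthread_atfork(prefork, postfork_parent, postfork_child);
  }
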
Tamir Duberstein
3f5049340e Allow toolchain to determine nm 2017-07-06 14:46:02 -07:00
Tamir Duberstein
ef55006c1d dumpbin doesn't exist in mingw 2017-07-06 14:46:02 -07:00
Tamir Duberstein
f9dfb8db73 whitespace 2017-07-06 14:46:02 -07:00
Jason Evans
aa44ddbcdd Fix a typo. 2017-07-02 21:05:23 -07:00
Jason Evans
896ed3a8b3 Merge branch 'dev' 2017-07-01 17:44:01 -07:00
Jason Evans
284edf02b0 Update ChangeLog for 5.0.1. 2017-07-01 17:34:34 -07:00
Qi Wang
cb032781bd Add extent_grow_mtx in pre_ / post_fork handlers.
This fixes an issue that could cause the child process to get stuck after fork.
2017-06-29 17:01:18 -07:00
Jason Evans
2b31cf5bd2 Enforce minimum autoconf version (currently 2.68).
This resolves #912.
2017-06-29 16:23:35 -07:00
Jason Evans
c99e570a48 Make sure LG_PAGE <= LG_HUGEPAGE.
This resolves #883.
2017-06-28 18:21:47 -07:00
Qi Wang
aa363f9388 Fix pthread_sigmask() usage to block all signals. 2017-06-26 11:27:21 -07:00
Qi Wang
57beeb2fcb Switch ctl to explicitly use tsd instead of tsdn. 2017-06-23 13:27:53 -07:00
Qi Wang
425463a446 Check arena in current context in pre_reentrancy. 2017-06-23 13:27:53 -07:00
Qi Wang
d6eb8ac8f3 Set reentrancy when invoking customized extent hooks.
Customized extent hooks may malloc / free and thus trigger reentrancy.  Support this
behavior by setting reentrancy around hook invocations.
2017-06-23 13:27:53 -07:00
Jason Evans
d49ac4c709 Fix assertion typos.
Reported by Conrad Meyer.
2017-06-23 11:48:00 -07:00
Qi Wang
a3f4977217 Add thread name for background threads. 2017-06-23 10:54:54 -07:00
Qi Wang
52fc887b49 Avoid inactivity_check within background threads.
Pass is_background_thread down the decay path, so that the background thread
itself won't attempt inactivity_check.  This fixes an issue where a background
thread did trylock on a mutex it already owns.
2017-06-22 16:53:58 -07:00
Jason Evans
37f3fa0941 Mask signals during background thread creation.
This prevents signals from being inadvertently delivered to background
threads.
2017-06-20 17:47:38 -07:00
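
A minimal sketch of the masking pattern (the helper name is hypothetical): block every
signal in the creating thread, create the background thread so it inherits the
fully-blocked mask, then restore the caller's original mask.

  #include <pthread.h>
  #include <signal.h>

  static int
  create_thread_signals_blocked(pthread_t *thd, void *(*start)(void *),
      void *arg) {
          sigset_t all, old;
          int err;

          sigfillset(&all);
          pthread_sigmask(SIG_SETMASK, &all, &old);
          err = pthread_create(thd, NULL, start, arg);    /* inherits mask */
          pthread_sigmask(SIG_SETMASK, &old, NULL);
          return err;
  }
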
Qi Wang
d35c037e03 Clear tcache_ql after fork in child. 2017-06-19 21:53:07 -07:00
Qi Wang
9b1befabbb Add minimal initialized TSD.
We use the minimal_initialized tsd (which requires no cleanup) specifically for
free(), if tsd hasn't been initialized yet.

Any other activity will transition the state from minimal to normal.  This works
around the case where a thread has no malloc calls in its lifetime until thread
termination, when free() happens after the TLS destructors.
2017-06-15 17:55:53 -07:00
Qi Wang
ae93fb08e2 Pass tsd to tcache_flush(). 2017-06-15 17:55:53 -07:00
Qi Wang
84f6c2cae0 Log decay->nunpurged before purging.
During purging, we may unlock decay->mtx.  Therefore we should finish logging
decay-related counters before attempting to purge.
2017-06-14 20:18:02 -07:00
Qi Wang
a4d6fe73cf Only abort on dlsym when necessary.
If neither background_thread nor lazy_lock is in use, do not abort on dlsym
errors.
2017-06-14 13:27:41 -07:00
Qi Wang
bdcf40a620 Add alloc hook test in test/integration/extent. 2017-06-14 09:34:29 -07:00
Qi Wang
d955d6f2be Fix extent_hooks in extent_grow_retained().
This issue caused the default extent alloc function to be incorrectly
used even when arena.<i>.extent_hooks is set.  This bug was introduced
by 411697adcd (Use exponential series to
size extents.), which was first released in 5.0.0.
2017-06-14 09:34:29 -07:00
514 changed files with 87735 additions and 26592 deletions


@ -5,33 +5,42 @@ environment:
- MSYSTEM: MINGW64
CPU: x86_64
MSVC: amd64
CONFIG_FLAGS: --enable-debug
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS: --enable-debug
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW32
CPU: i686
MSVC: x86
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS: --enable-debug
- MSYSTEM: MINGW32
CPU: i686
CONFIG_FLAGS: --enable-debug
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW64
CPU: x86_64
MSVC: amd64
CONFIG_FLAGS: --enable-debug
CONFIG_FLAGS:
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS:
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW32
CPU: i686
MSVC: x86
CONFIG_FLAGS: --enable-debug
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS: --enable-debug
CONFIG_FLAGS:
- MSYSTEM: MINGW32
CPU: i686
CONFIG_FLAGS: --enable-debug
CONFIG_FLAGS:
EXTRA_CFLAGS: "-fcommon"
install:
- set PATH=c:\msys64\%MSYSTEM%\bin;c:\msys64\usr\bin;%PATH%
- if defined MSVC call "c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" %MSVC%
- if defined MSVC pacman --noconfirm -Rsc mingw-w64-%CPU%-gcc gcc
- pacman --noconfirm -Suy mingw-w64-%CPU%-make
- pacman --noconfirm -Syuu
- pacman --noconfirm -S autoconf
build_script:
- bash -c "autoconf"

122
.clang-format Normal file

@ -0,0 +1,122 @@
# jemalloc targets clang-format version 8. We include every option it supports
# here, but comment out the ones that aren't relevant for us.
---
# AccessModifierOffset: -2
AlignAfterOpenBracket: DontAlign
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: true
AlignEscapedNewlines: Right
AlignOperands: false
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: AllDefinitions
AlwaysBreakBeforeMultilineStrings: true
# AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: true
AfterControlStatement: true
AfterEnum: true
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: true
AfterStruct: true
AfterUnion: true
BeforeCatch: true
BeforeElse: true
IndentBraces: false
# BreakAfterJavaFieldAnnotations: true
BreakBeforeBinaryOperators: NonAssignment
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true
# BreakConstructorInitializers: BeforeColon
# BreakInheritanceList: BeforeColon
BreakStringLiterals: false
ColumnLimit: 80
# CommentPragmas: ''
# CompactNamespaces: true
# ConstructorInitializerAllOnOneLineOrOnePerLine: true
# ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros: [ ql_foreach, qr_foreach, ]
# IncludeBlocks: Preserve
# IncludeCategories:
# - Regex: '^<.*\.h(pp)?>'
# Priority: 1
# IncludeIsMainRegex: ''
IndentCaseLabels: false
IndentPPDirectives: AfterHash
IndentWidth: 8
IndentWrappedFunctionNames: false
# JavaImportGroups: []
# JavaScriptQuotes: Leave
# JavaScriptWrapImports: True
KeepEmptyLinesAtTheStartOfBlocks: false
Language: Cpp
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
# NamespaceIndentation: None
# ObjCBinPackProtocolList: Auto
# ObjCBlockIndentWidth: 2
# ObjCSpaceAfterProperty: false
# ObjCSpaceBeforeProtocolList: false
PenaltyBreakAssignment: 100
PenaltyBreakBeforeFirstCallParameter: 100
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
# PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
# RawStringFormats:
# - Language: TextProto
# Delimiters:
# - 'pb'
# - 'proto'
# EnclosingFunctions:
# - 'PARSE_TEXT_PROTO'
# BasedOnStyle: google
# - Language: Cpp
# Delimiters:
# - 'cc'
# - 'cpp'
# BasedOnStyle: llvm
# CanonicalDelimiter: 'cc'
ReflowComments: false
SortIncludes: false
SpaceAfterCStyleCast: false
# SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
# SpaceBeforeCpp11BracedList: false
# SpaceBeforeCtorInitializerColon: true
# SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
# SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInCStyleCastParentheses: false
# SpacesInContainerLiterals: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
# Standard: Cpp11
# This is nominally supported in clang-format version 8, but not in the build
# used by some of the core jemalloc developers.
# StatementMacros: []
TabWidth: 8
UseTab: ForIndentation
...

2
.git-blame-ignore-revs Normal file

@ -0,0 +1,2 @@
554185356bf990155df8d72060c4efe993642baf
34f359e0ca613b5f9d970e9b2152a5203c9df8d6

10
.github/workflows/check_formatting.yaml vendored Normal file

@ -0,0 +1,10 @@
name: 'Check Formatting'
on: [pull_request]
jobs:
check-formatting:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Check for trailing whitespace
run: scripts/check_trailing_whitespace.sh

66
.github/workflows/freebsd-ci.yml vendored Normal file

@ -0,0 +1,66 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: FreeBSD CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-freebsd:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
debug: ['--enable-debug', '--disable-debug']
prof: ['--enable-prof', '--disable-prof']
arch: ['64-bit', '32-bit']
uncommon:
- ''
- '--with-lg-page=16 --with-malloc-conf=tcache:false'
name: FreeBSD (${{ matrix.arch }}, debug=${{ matrix.debug }}, prof=${{ matrix.prof }}${{ matrix.uncommon && ', uncommon' || '' }})
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Test on FreeBSD
uses: vmactions/freebsd-vm@v1
with:
release: '15.0'
usesh: true
prepare: |
pkg install -y autoconf gmake
run: |
# Verify we're running in FreeBSD
echo "==== System Information ===="
uname -a
freebsd-version
echo "============================"
# Set compiler flags for 32-bit if needed
if [ "${{ matrix.arch }}" = "32-bit" ]; then
export CC="cc -m32"
export CXX="c++ -m32"
fi
# Generate configure script
autoconf
# Configure with matrix options
./configure --with-jemalloc-prefix=ci_ ${{ matrix.debug }} ${{ matrix.prof }} ${{ matrix.uncommon }}
# Get CPU count for parallel builds
export JFLAG=$(sysctl -n kern.smp.cpus)
gmake -j${JFLAG}
gmake -j${JFLAG} tests
gmake check

695
.github/workflows/linux-ci.yml vendored Normal file

@ -0,0 +1,695 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: Linux CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-linux:
runs-on: ubuntu-24.04
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-stats"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --disable-stats"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary,percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-cache-oblivious --enable-stats --enable-log --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-experimental-smallocx --enable-stats --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== System Information ==="
uname -a
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== OS Release ==="
cat /etc/os-release || true
echo ""
echo "=== CPU Info ==="
lscpu | grep -E "Architecture|CPU op-mode|Byte Order|CPU\(s\):" || true
- name: Install dependencies (32-bit)
if: matrix.env.CROSS_COMPILE_32BIT == 'yes'
run: |
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install -y gcc-multilib g++-multilib libc6-dev-i386
- name: Build and test
env:
CC: ${{ matrix.env.CC }}
CXX: ${{ matrix.env.CXX }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Verify the script generates the same output
./scripts/gen_gh_actions.py > gh_actions_script.yml
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check
test-linux-arm64:
runs-on: ubuntu-24.04-arm
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-lg-hugepage=29"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== System Information ==="
uname -a
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== OS Release ==="
cat /etc/os-release || true
echo ""
echo "=== CPU Info ==="
lscpu | grep -E "Architecture|CPU op-mode|Byte Order|CPU\(s\):" || true
- name: Install dependencies (32-bit)
if: matrix.env.CROSS_COMPILE_32BIT == 'yes'
run: |
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install -y gcc-multilib g++-multilib libc6-dev-i386
- name: Build and test
env:
CC: ${{ matrix.env.CC }}
CXX: ${{ matrix.env.CXX }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Verify the script generates the same output
./scripts/gen_gh_actions.py > gh_actions_script.yml
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check

212
.github/workflows/macos-ci.yml vendored Normal file

@ -0,0 +1,212 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: macOS CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-macos:
runs-on: macos-15-intel
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== macOS Version ==="
sw_vers
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== CPU Info ==="
sysctl -n machdep.cpu.brand_string
sysctl -n hw.machine
- name: Install dependencies
run: |
brew install autoconf
- name: Build and test
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check
test-macos-arm64:
runs-on: macos-15
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-lg-hugepage=29"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== macOS Version ==="
sw_vers
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== CPU Info ==="
sysctl -n machdep.cpu.brand_string
sysctl -n hw.machine
- name: Install dependencies
run: |
brew install autoconf
- name: Build and test
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check

68
.github/workflows/static_analysis.yaml vendored Normal file

@ -0,0 +1,68 @@
name: 'Static Analysis'
on: [pull_request]
jobs:
static-analysis:
runs-on: ubuntu-latest
steps:
# We build libunwind ourselves because sadly the version
# provided by Ubuntu via apt-get is much too old.
- name: Check out libunwind
uses: actions/checkout@v4
with:
repository: libunwind/libunwind
path: libunwind
ref: 'v1.6.2'
github-server-url: 'https://github.com'
- name: Install libunwind
run: |
cd libunwind
autoreconf -i
./configure --prefix=/usr
make -s -j $(nproc) V=0
sudo make -s install V=0
cd ..
rm -rf libunwind
- name: Check out repository
uses: actions/checkout@v4
# We download LLVM directly from the latest stable release
# on GitHub, because this tends to be much newer than the
# version available via apt-get in Ubuntu.
- name: Download LLVM
uses: dsaltares/fetch-gh-release-asset@master
with:
repo: 'llvm/llvm-project'
version: 'tags/llvmorg-16.0.4'
file: 'clang[+]llvm-.*x86_64-linux-gnu.*'
regex: true
target: 'llvm_assets/'
token: ${{ secrets.GITHUB_TOKEN }}
- name: Install prerequisites
id: install_prerequisites
run: |
tar -C llvm_assets -xaf llvm_assets/*.tar* &
sudo apt-get update
sudo apt-get install -y jq bear python3-pip
pip install codechecker
echo "Extracting LLVM from tar" 1>&2
wait
echo "LLVM_BIN_DIR=$(echo llvm_assets/clang*/bin)" >> "$GITHUB_OUTPUT"
- name: Run static analysis
id: run_static_analysis
run: >
PATH="${{ steps.install_prerequisites.outputs.LLVM_BIN_DIR }}:$PATH"
LDFLAGS='-L/usr/lib'
scripts/run_static_analysis.sh static_analysis_results "$GITHUB_OUTPUT"
- name: Upload static analysis results
if: ${{ steps.run_static_analysis.outputs.HAS_STATIC_ANALYSIS_RESULTS }} == '1'
uses: actions/upload-artifact@v4
with:
name: static_analysis_results
path: static_analysis_results
- name: Check static analysis results
run: |
if [[ "${{ steps.run_static_analysis.outputs.HAS_STATIC_ANALYSIS_RESULTS }}" == '1' ]]
then
echo "::error::Static analysis found issues with your code. Download the 'static_analysis_results' artifact from this workflow and view the 'index.html' file contained within it in a web browser locally for detailed results."
exit 1
fi

155
.github/workflows/windows-ci.yml vendored Normal file

@ -0,0 +1,155 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: Windows CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-windows:
runs-on: windows-latest
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: -fcommon
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
CONFIGURE_FLAGS: --enable-debug
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
CROSS_COMPILE_32BIT: yes
- env:
CC: cl.exe
CXX: cl.exe
CROSS_COMPILE_32BIT: yes
CONFIGURE_FLAGS: --enable-debug
steps:
- uses: actions/checkout@v4
- name: Show OS version
shell: cmd
run: |
echo === Windows Version ===
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
ver
echo.
echo === Architecture ===
echo PROCESSOR_ARCHITECTURE=%PROCESSOR_ARCHITECTURE%
echo.
- name: Setup MSYS2
uses: msys2/setup-msys2@v2
with:
msystem: ${{ matrix.env.CROSS_COMPILE_32BIT == 'yes' && 'MINGW32' || 'MINGW64' }}
update: true
install: >-
autotools
git
pacboy: >-
make:p
gcc:p
binutils:p
- name: Build and test (MinGW-GCC)
if: matrix.env.CC != 'cl.exe'
shell: msys2 {0}
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build (mingw32-make is the "make" command in MSYS2)
mingw32-make -j3
mingw32-make tests
# Run tests
mingw32-make -k check
- name: Setup MSVC environment
if: matrix.env.CC == 'cl.exe'
uses: ilammy/msvc-dev-cmd@v1
with:
arch: ${{ matrix.env.CROSS_COMPILE_32BIT == 'yes' && 'x86' || 'x64' }}
- name: Build and test (MSVC)
if: matrix.env.CC == 'cl.exe'
shell: msys2 {0}
env:
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
MSYS2_PATH_TYPE: inherit
run: |
# Export MSVC environment variables for configure
export CC=cl.exe
export CXX=cl.exe
export AR=lib.exe
export NM=dumpbin.exe
export RANLIB=:
# Verify cl.exe is accessible (should be in PATH via inherit)
if ! which cl.exe > /dev/null 2>&1; then
echo "cl.exe not found, trying to locate MSVC..."
# Find and add MSVC bin directory to PATH
MSVC_BIN=$(cmd.exe /c "echo %VCToolsInstallDir%" | tr -d '\\r' | sed 's/\\\\\\\\/\//g' | sed 's/C:/\\/c/g')
if [ -n "$MSVC_BIN" ]; then
export PATH="$PATH:$MSVC_BIN/bin/Hostx64/x64:$MSVC_BIN/bin/Hostx86/x86"
fi
fi
# Run autoconf
autoconf
# Configure with MSVC
./configure CC=cl.exe CXX=cl.exe AR=lib.exe $CONFIGURE_FLAGS
# Build (mingw32-make is the "make" command in MSYS2)
mingw32-make -j3
# Build tests sequentially due to PDB file issues
mingw32-make tests
# Run tests
mingw32-make -k check

19
.gitignore vendored

@ -13,6 +13,8 @@
/doc/jemalloc.html
/doc/jemalloc.3
/doc_internal/PROFILING_INTERNALS.pdf
/jemalloc.pc
/lib/
@ -30,7 +32,6 @@
/include/jemalloc/internal/public_namespace.h
/include/jemalloc/internal/public_symbols.txt
/include/jemalloc/internal/public_unnamespace.h
/include/jemalloc/internal/size_classes.h
/include/jemalloc/jemalloc.h
/include/jemalloc/jemalloc_defs.h
/include/jemalloc/jemalloc_macros.h
@ -44,6 +45,13 @@
/src/*.[od]
/src/*.sym
# These are semantically meaningful for clangd and related tooling.
/build/
/.cache/
compile_commands.json
/static_analysis_raw_results
/static_analysis_results
/run_tests.out/
/test/test.sh
@ -51,6 +59,7 @@ test/include/test/jemalloc_test.h
test/include/test/jemalloc_test_defs.h
/test/integration/[A-Za-z]*
!/test/integration/cpp/
!/test/integration/[A-Za-z]*.*
/test/integration/*.[od]
/test/integration/*.out
@ -64,6 +73,7 @@ test/include/test/jemalloc_test_defs.h
/test/stress/[A-Za-z]*
!/test/stress/[A-Za-z]*.*
!/test/stress/pa/
/test/stress/*.[od]
/test/stress/*.out
@ -72,17 +82,24 @@ test/include/test/jemalloc_test_defs.h
/test/unit/*.[od]
/test/unit/*.out
/test/analyze/[A-Za-z]*
!/test/analyze/[A-Za-z]*.*
/test/analyze/*.[od]
/test/analyze/*.out
/VERSION
*.pdb
*.sdf
*.opendb
*.VC.db
*.opensdf
*.cachefile
*.suo
*.user
*.sln.docstates
*.tmp
.vs/
/msvc/Win32/
/msvc/x64/
/msvc/projects/*/*/Debug*/


@ -1,155 +1,365 @@
language: generic
# This config file is generated by ./scripts/gen_travis.py.
# Do not edit by hand.
matrix:
# We use 'minimal', because 'generic' makes Windows VMs hang at startup. Also
# the software provided by 'generic' is simply not needed for our tests.
# Differences are explained here:
# https://docs.travis-ci.com/user/languages/minimal-and-generic/
language: minimal
dist: jammy
jobs:
include:
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: osx
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=clang CXX=clang++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
addons:
apt:
packages:
- gcc-multilib
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
env: CC=gcc CXX=g++ COMPILER_FLAGS="" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=clang CXX=clang++ EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-lg-hugepage=29" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
# Development build
- os: linux
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-cache-oblivious --enable-stats --enable-log --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
# --enable-experimental-smallocx:
- os: linux
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-experimental-smallocx --enable-stats --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
before_install:
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/before_install.sh"; then
source ./scripts/$TRAVIS_OS_NAME/before_install.sh
fi
before_script:
- autoconf
- ./configure ${COMPILER_FLAGS:+ CC="$CC $COMPILER_FLAGS" CXX="$CXX $COMPILER_FLAGS" } $CONFIGURE_FLAGS
- make -j3
- make -j3 tests
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/before_script.sh"; then
source ./scripts/$TRAVIS_OS_NAME/before_script.sh
else
scripts/gen_travis.py > travis_script && diff .travis.yml travis_script
autoconf
# If COMPILER_FLAGS are not empty, add them to CC and CXX
./configure ${COMPILER_FLAGS:+ CC="$CC $COMPILER_FLAGS" CXX="$CXX $COMPILER_FLAGS"} $CONFIGURE_FLAGS
make -j3
make -j3 tests
fi
script:
- make check
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/script.sh"; then
source ./scripts/$TRAVIS_OS_NAME/script.sh
else
make check
fi

View file

@ -1,10 +1,10 @@
Unless otherwise specified, files in the jemalloc source distribution are
subject to the following license:
--------------------------------------------------------------------------------
Copyright (C) 2002-2017 Jason Evans <jasone@canonware.com>.
Copyright (C) 2002-present Jason Evans <jasone@canonware.com>.
All rights reserved.
Copyright (C) 2007-2012 Mozilla Foundation. All rights reserved.
Copyright (C) 2009-2017 Facebook, Inc. All rights reserved.
Copyright (C) 2009-present Facebook, Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

538
ChangeLog
View file

@ -4,6 +4,542 @@ brevity. Much more detail can be found in the git revision history:
https://github.com/jemalloc/jemalloc
* 5.3.1 (Apr 13, 2026)
This release includes over 390 commits spanning bug fixes, new features,
performance optimizations, and portability improvements. Multiple percent
of system-level metric improvements were measured in tested production
workloads. The release has gone through large-scale production testing
at Meta.
New features:
- Support pvalloc. (@Lapenkov: 5b1f2cc5)
- Add double free detection for the debug build. (@izaitsevfb:
36366f3c, @guangli-dai: 42daa1ac, @divanorama: 1897f185)
- Add compile-time option `--enable-pageid` to enable memory mapping
annotation. (@devnexen: 4fc5c4fb)
- Add runtime option `prof_bt_max` to control the max stack depth for
profiling. (@guangli-dai: a0734fd6)
- Add compile-time option `--enable-force-getenv` to use `getenv` instead
of `secure_getenv`. (@interwq: 481bbfc9)
- Add compile-time option `--disable-dss` to disable the usage of
`sbrk(2)`. (@Svetlitski: ea5b7bea)
- Add runtime option `tcache_ncached_max` to control the number of items
in each size bin in the thread cache. (@guangli-dai: 8a22d10b)
- Add runtime option `calloc_madvise_threshold` to determine if kernel or
memset is used to zero the allocations for calloc. (@nullptr0-0:
5081c16b)
- Add compile-time option `--disable-user-config` to disable reading the
runtime configurations from `/etc/malloc.conf` or environment variable
`MALLOC_CONF`. (@roblabla: c17bf8b3)
- Add runtime option `disable_large_size_classes` to guard the new usable
size calculation, which minimizes the memory overhead for large
allocations, i.e., >= 4 * PAGE. (@guangli-dai: c067a55c, 8347f104)
- Enable process_madvise usage and add runtime option
`process_madvise_max_batch` to control the maximum number of regions in
each madvise batch. (@interwq: 22440a02, @spredolac: 4246475b)
- Add mallctl interfaces:
+ `opt.prof_bt_max` (@guangli-dai: a0734fd6)
+ `arena.<i>.name` to set and get arena names. (@guangli-dai: ba19d2cb)
+ `thread.tcache.max` to set and get the `tcache_max` of the current
thread. (@guangli-dai: a442d9b8)
+ `thread.tcache.ncached_max.write` and
`thread.tcache.ncached_max.read_sizeclass` to set and get the
`ncached_max` setup of the current thread. (@guangli-dai: 630f7de9,
6b197fdd)
+ `arenas.hugepage` to return the hugepage size used, also exported to
malloc stats. (@ilvokhin: 90c627ed)
+ `approximate_stats.active` to return an estimate of the current active
bytes; being an approximation, it should not be compared directly with
other retrieved stats. (@guangli-dai: 0988583d)
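As a quick illustration (a minimal sketch, not part of the release notes), a few of these new entries can be exercised through the standard mallctl() interface. The size_t types used for `arenas.hugepage`, `approximate_stats.active`, and `thread.tcache.max`, and the unprefixed `mallctl` symbol, are assumptions based on the descriptions above and on jemalloc's usual conventions.

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int main(void) {
	size_t hugepage, active, sz;

	/* Hugepage size used by jemalloc (assumed to be reported as size_t). */
	sz = sizeof(hugepage);
	if (mallctl("arenas.hugepage", &hugepage, &sz, NULL, 0) == 0) {
		printf("hugepage size: %zu\n", hugepage);
	}

	/* Lightweight estimate of active bytes (assumed size_t). */
	sz = sizeof(active);
	if (mallctl("approximate_stats.active", &active, &sz, NULL, 0) == 0) {
		printf("approximate active bytes: %zu\n", active);
	}

	/* Lower this thread's tcache_max to 4 KiB (assumed size_t new value). */
	size_t new_max = 4096;
	mallctl("thread.tcache.max", NULL, NULL, &new_max, sizeof(new_max));

	return 0;
}
```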
Bug fixes:
- Prevent potential deadlocks in decaying during reentrancy. (@interwq:
434a68e2)
- Fix segfault in extent coalescing. (@Svetlitski: 12311fe6)
- Add null pointer detections in mallctl calls. (@Svetlitski: dc0a184f,
0288126d)
- Make mallctl `arenas.lookup` triable without crashing on invalid
pointers. (@auxten: 019cccc2, 5bac3849)
- Demote sampled allocations for proper deallocations during
`arena_reset`. (@Svetlitski: 62648c88)
- Fix jemalloc's `read(2)` and `write(2)`. (@Svetlitski: d2c9ed3d, @lexprfuncall:
9fdc1160)
- Fix the pkg-config metadata file. (@BtbN: ed7e6fe7, ce8ce99a)
- Fix the autogen.sh so that it accepts quoted extra options.
(@honggyukim: f6fe6abd)
- Fix `rallocx()` to set errno to ENOMEM upon OOMing. (@arter97: 38056fea,
@interwq: 83b07578)
- Avoid stack overflow for internal variable array usage. (@nullptr0-0:
47c9bcd4, 48f66cf4, @xinydev: 9169e927)
- Fix background thread initialization race. (@puzpuzpuz: 4d0ffa07)
- Guard os_page_id against a NULL address. (@lexprfuncall: 79cc7dcc)
- Handle tcache init failures gracefully. (@lexprfuncall: a056c20d)
- Fix missing release of acquired neighbor edata in
extent_try_coalesce_impl. (@spredolac: 675ab079)
- Fix memory leak of old curr_reg on san_bump_grow_locked failure.
(@spredolac: 5904a421)
- Fix large alloc nrequests under-counting on cache misses. (@spredolac:
3cc56d32)
Portability improvements:
- Fix the build in C99. (@abaelhe: 56ddbea2)
- Add `pthread_setaffinity_np` detection for non Linux/BSD platforms.
(@devnexen: 4c95c953)
- Make `VARIABLE_ARRAY` compatible with compilers not supporting VLA,
i.e., Visual Studio C compiler in C11 or C17 modes. (@madscientist:
be65438f)
- Fix the build on Linux using musl library. (@marv: aba1645f, 45249cf5)
- Reduce the memory overhead in small allocation sampling for systems
with larger page sizes, e.g., ARM. (@Svetlitski: 5a858c64)
- Add C23's `free_sized` and `free_aligned_sized`; see the sketch after
this list. (@Svetlitski: cdb2c0e0)
- Enable heap profiling on MacOS. (@nullptr0-0: 4b555c11)
- Fix incorrect printing on 32bit. (@sundb: 630434bb)
- Make `JEMALLOC_CXX_THROW` compatible with C++ versions newer than
C++17. (@r-barnes, @guangli-dai: 21bcc0a8)
- Fix mmap tag conflicts on MacOS. (@kdrag0n: c893fcd1)
- Fix monotonic timer assumption for win32. (@burtonli: 8dc97b11)
- Fix VM over-reservation on systems with larger pages, e.g., aarch64.
(@interwq: cd05b19f)
- Remove `unreachable()` macro conditionally to prevent definition
conflicts for C23+. (@appujee: d8486b26, 4b88bddb)
- Fix dlsym failure observed on FreeBSD. (@rhelmot: 86bbabac)
- Change the default page size to 64KB on aarch64 Linux. (@lexprfuncall:
9442300c)
- Update config.guess and config.sub to the latest version.
(@lexprfuncall: c51949ea)
- Determine the page size on Android from NDK header files.
(@lexprfuncall: c51abba1)
- Improve the portability of grep patterns in configure.ac.
(@lexprfuncall: 365747bc)
- Add compile-time option `--with-cxx-stdlib` to specify the C++ standard
library. (@yuxuanchen1997: a10ef3e1)
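For the C23 sized-deallocation entry points mentioned in the list above, here is a minimal sketch (not taken from the release), assuming a toolchain or libc that already exposes the C23 prototypes in <stdlib.h>; the sizes and alignment passed back must match the original allocation.

```c
#include <stdlib.h>

int main(void) {
	/* C23 sized deallocation: the size must match the original request. */
	void *p = malloc(64);
	if (p != NULL) {
		free_sized(p, 64);
	}

	/* Aligned variant: alignment and size must match the allocation. */
	void *q = aligned_alloc(64, 256);
	if (q != NULL) {
		free_aligned_sized(q, 64, 256);
	}
	return 0;
}
```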
Optimizations and refactors:
- Enable tcache for deallocation-only threads. (@interwq: 143e9c4a)
- Inline to accelerate operator delete. (@guangli-dai: e8f9f138)
- Optimize pairing heap's performance. (@deadalnix: 5266152d, be6da4f6,
543e2d61, 10d71315, 92aa52c0, @Svetlitski: 36ca0c1b)
- Inline the storage for thread name in the profiling data. (@interwq:
ce0b7ab6, e62aa478)
- Optimize the hot function `edata_cmp_summary_comp`.
(@Svetlitski: 6841110b, @guangli-dai: 0181aaa4)
- Allocate thread cache using the base allocator, which enables thread
cache to use thp when `metadata_thp` is turned on. (@interwq:
72cfdce7)
- Allow oversize arena not to purge immediately when background threads
are enabled, although the default decay time remains 0 for backward
compatibility. (@interwq: d1313313)
- Optimize thread-local storage implementation on Windows. (@mcfi:
9e123a83, 3a0d9cda)
- Optimize fast path to allow static size class computation. (@interwq:
323ed2e3)
- Redesign tcache GC to regulate its frequency and make it
locality-aware. The new design is on by default, guarded by option
`experimental_tcache_gc`. (@nullptr0-0: 0c88be9e, e2c9f3a9,
14d5dc13, @deadalnix: 5afff2e4)
- Reduce the arena switching overhead by avoiding forced purging when
background thread is enabled. (@interwq: a3910b98)
- Improve the reuse efficiency by limiting the maximum coalesced size for
large extents. (@jiebinn: 3c14707b)
- Refactor thread events to allow registration of users' thread events
and remove prof_threshold as the built-in event. (@spredolac: e6864c60,
015b0179, 34ace916)
Documentation:
- Update Windows building instructions. (@Lapenkov: 37139328)
- Add vcpkg installation instructions. (@LilyWangLL: c0c9783e)
- Update profiling internals with an example. (@jordalgo: b04e7666)
* 5.3.0 (May 6, 2022)
This release contains many speed and space optimizations, from micro
optimizations on common paths to rework of internal data structures and
locking schemes, and many more too detailed to list below. Multiple percent
of system level metric improvements were measured in tested production
workloads. The release has gone through large-scale production testing.
New features:
- Add the thread.idle mallctl which hints that the calling thread will be
idle for a nontrivial period of time. (@davidtgoldblatt)
- Allow small size classes to be the maximum size class to cache in the
thread-specific cache, through the opt.[lg_]tcache_max option. (@interwq,
@jordalgo)
- Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc.
(@davidtgoldblatt)
- Add 'make uninstall' support. (@sangshuduo, @Lapenkov)
- Support C++17 over-aligned allocation. (@marksantaniello)
- Add the thread.peak mallctl for approximate per-thread peak memory tracking.
(@davidtgoldblatt)
- Add interval-based stats output opt.stats_interval. (@interwq)
- Add prof.prefix to override filename prefixes for dumps. (@zhxchen17)
- Add high resolution timestamp support for profiling. (@tyroguru)
- Add the --collapsed flag to jeprof for flamegraph generation.
(@igorwwwwwwwwwwwwwwwwwwww)
- Add the --debug-syms-by-id option to jeprof for debug symbols discovery.
(@DeannaGelbart)
- Add the opt.prof_leak_error option to exit with error code when leak is
detected using opt.prof_final. (@yunxuo)
- Add opt.cache_oblivious as a runtime alternative to config.cache_oblivious.
(@interwq)
- Add mallctl interfaces:
+ opt.zero_realloc (@davidtgoldblatt)
+ opt.cache_oblivious (@interwq)
+ opt.prof_leak_error (@yunxuo)
+ opt.stats_interval (@interwq)
+ opt.stats_interval_opts (@interwq)
+ opt.tcache_max (@interwq)
+ opt.trust_madvise (@azat)
+ prof.prefix (@zhxchen17)
+ stats.zero_reallocs (@davidtgoldblatt)
+ thread.idle (@davidtgoldblatt)
+ thread.peak.{read,reset} (@davidtgoldblatt)
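For example (a sketch, not from the release notes), the per-thread peak counters and the idle hint can be driven through mallctl. The uint64_t type for thread.peak.read and the unprefixed `mallctl` symbol are assumptions consistent with the jemalloc manual.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

int main(void) {
	/* Start a fresh per-thread peak measurement window. */
	mallctl("thread.peak.reset", NULL, NULL, NULL, 0);

	void *p = malloc(1 << 20);
	free(p);

	uint64_t peak;
	size_t sz = sizeof(peak);
	if (mallctl("thread.peak.read", &peak, &sz, NULL, 0) == 0) {
		printf("peak bytes allocated on this thread: %" PRIu64 "\n", peak);
	}

	/* Hint that this thread will now be idle for a nontrivial period. */
	mallctl("thread.idle", NULL, NULL, NULL, 0);
	return 0;
}
```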
Bug fixes:
- Fix the synchronization around explicit tcache creation which could cause
invalid tcache identifiers. This regression was first released in 5.0.0.
(@yoshinorim, @davidtgoldblatt)
- Fix a profiling biasing issue which could cause incorrect heap usage and
object counts. This issue existed in all previous releases with the heap
profiling feature. (@davidtgoldblatt)
- Fix the order of stats counter updating on large realloc which could cause
failed assertions. This regression was first released in 5.0.0. (@azat)
- Fix the locking on the arena destroy mallctl, which could cause concurrent
arena creations to fail. This functionality was first introduced in 5.0.0.
(@interwq)
Portability improvements:
- Remove nothrow from system function declarations on macOS and FreeBSD.
(@davidtgoldblatt, @fredemmott, @leres)
- Improve overcommit and page alignment settings on NetBSD. (@zoulasc)
- Improve CPU affinity support on BSD platforms. (@devnexen)
- Improve utrace detection and support. (@devnexen)
- Improve QEMU support with MADV_DONTNEED zeroed pages detection. (@azat)
- Add memcntl support on Solaris / illumos. (@devnexen)
- Improve CPU_SPINWAIT on ARM. (@AWSjswinney)
- Improve TSD cleanup on FreeBSD. (@Lapenkov)
- Disable percpu_arena if the CPU count cannot be reliably detected. (@azat)
- Add malloc_size(3) override support. (@devnexen)
- Add mmap VM_MAKE_TAG support. (@devnexen)
- Add support for MADV_[NO]CORE. (@devnexen)
- Add support for DragonFlyBSD. (@devnexen)
- Fix the QUANTUM setting on MIPS64. (@brooksdavis)
- Add the QUANTUM setting for ARC. (@vineetgarc)
- Add the QUANTUM setting for LoongArch. (@wangjl-uos)
- Add QNX support. (@jqian-aurora)
- Avoid atexit(3) calls unless the relevant profiling features are enabled.
(@BusyJay, @laiwei-rice, @interwq)
- Fix unknown option detection when using Clang. (@Lapenkov)
- Fix symbol conflict with musl libc. (@georgthegreat)
- Add -Wimplicit-fallthrough checks. (@nickdesaulniers)
- Add __forceinline support on MSVC. (@santagada)
- Improve FreeBSD and Windows CI support. (@Lapenkov)
- Add CI support for PPC64LE architecture. (@ezeeyahoo)
Incompatible changes:
- Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an upper
bound of 8MiB. (@interwq)
Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):
- Optimize the common cases of the thread cache operations.
- Optimize internal data structures, including RB tree and pairing heap.
- Optimize the internal locking on extent management.
- Extract and refactor the internal page allocator and interface modules.
Documentation:
- Fix doc build with --with-install-suffix. (@lawmurray, @interwq)
- Add PROFILING_INTERNALS.md. (@davidtgoldblatt)
- Ensure the proper order of doc building and installation. (@Mingli-Yu)
* 5.2.1 (August 5, 2019)
This release is primarily about Windows. A critical virtual memory leak is
resolved on all Windows platforms. The regression was present in all releases
since 5.0.0.
Bug fixes:
- Fix a severe virtual memory leak on Windows. This regression was first
released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
@interwq)
- Fix size 0 handling in posix_memalign(). This regression was first released
in 5.2.0. (@interwq)
- Fix the prof_log unit test which may observe unexpected backtraces from
compiler optimizations. The test was first added in 5.2.0. (@marxin,
@gnzlbg, @interwq)
- Fix the declaration of the extent_avail tree. This regression was first
released in 5.1.0. (@zoulasc)
- Fix an incorrect reference in jeprof. This functionality was first released
in 3.0.0. (@prehistoric-penguin)
- Fix an assertion on the deallocation fast-path. This regression was first
released in 5.2.0. (@yinan1048576)
- Fix the TLS_MODEL attribute in headers. This regression was first released
in 5.0.0. (@zoulasc, @interwq)
Optimizations and refactors:
- Implement opt.retain on Windows and enable by default on 64-bit. (@interwq,
@davidtgoldblatt)
- Optimize away a branch on the operator delete[] path. (@mgrice)
- Add format annotation to the format generator function. (@zoulasc)
- Refactor and improve the size class header generation. (@yinan1048576)
- Remove best fit. (@djwatson)
- Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
* 5.2.0 (April 2, 2019)
This release includes a few notable improvements, which are summarized below:
1) improved fast-path performance from the optimizations by @djwatson; 2)
reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on
setting the number of background threads. In addition, peak / spike memory
usage is improved with certain allocation patterns. As usual, the release and
prior dev versions have gone through large-scale production testing.
New features:
- Implement oversize_threshold, which uses a dedicated arena for allocations
crossing the specified threshold to reduce fragmentation. (@interwq)
- Add extents usage information to stats. (@tyleretzel)
- Log time information for sampled allocations. (@tyleretzel)
- Support 0 size in sdallocx. (@djwatson)
- Output rate for certain counters in malloc_stats. (@zinoale)
- Add configure option --enable-readlinkat, which allows the use of readlinkat
over readlink. (@davidtgoldblatt)
- Add configure options --{enable,disable}-{static,shared} to allow not
building unwanted libraries. (@Ericson2314)
- Add configure option --disable-libdl to enable fully static builds.
(@interwq)
- Add mallctl interfaces:
+ opt.oversize_threshold (@interwq)
+ stats.arenas.<i>.extent_avail (@tyleretzel)
+ stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
+ stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
(@tyleretzel)
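As an illustration of the oversize_threshold feature above (a sketch under the assumption of a default, unprefixed build), opt.oversize_threshold is read-only at runtime, so it is typically set through the malloc_conf mechanism and then inspected via mallctl:

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

/* Route allocations larger than 4 MiB to the dedicated oversize arena. */
const char *malloc_conf = "oversize_threshold:4194304";

int main(void) {
	size_t threshold, sz = sizeof(threshold);
	if (mallctl("opt.oversize_threshold", &threshold, &sz, NULL, 0) == 0) {
		printf("oversize_threshold: %zu\n", threshold);
	}
	return 0;
}
```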
Portability improvements:
- Update MSVC builds. (@maksqwe, @rustyx)
- Workaround a compiler optimizer bug on s390x. (@rkmisra)
- Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
- Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
- Link against -pthread instead of -lpthread. (@paravoid)
- Make background_thread not dependent on libdl. (@interwq)
- Add stringify to fix a linker directive issue on MSVC. (@daverigby)
- Detect and fall back when 8-bit atomics are unavailable. (@interwq)
- Fall back to the default pthread_create if dlsym(3) fails. (@interwq)
Optimizations and refactors:
- Refactor the TSD module. (@davidtgoldblatt)
- Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
- Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
- Optimize ixalloc by avoiding a size lookup. (@interwq)
- Implement opt.oversize_threshold which uses a dedicated arena for requests
crossing the threshold, also eagerly purges the oversize extents. Default
the threshold to 8 MiB. (@interwq)
- Clean compilation with -Wextra. (@gnzlbg, @jasone)
- Refactor the size class module. (@davidtgoldblatt)
- Refactor the stats emitter. (@tyleretzel)
- Optimize pow2_ceil. (@rkmisra)
- Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
- Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
- Improve error handling for THP state initialization. (@jsteemann)
- Rework the malloc() fast path. (@djwatson)
- Rework the free() fast path. (@djwatson)
- Refactor and optimize the tcache fill / flush paths. (@djwatson)
- Optimize sync / lwsync on PowerPC. (@chmeeedalf)
- Bypass extent_dalloc() when retain is enabled. (@interwq)
- Optimize the locking on large deallocation. (@interwq)
- Reduce the number of pages committed from sanity checking in debug build.
(@trasz, @interwq)
- Deprecate OSSpinLock. (@interwq)
- Lower the default number of background threads to 4 (when the feature
is enabled). (@interwq)
- Optimize the trylock spin wait. (@djwatson)
- Use arena index for arena-matching checks. (@interwq)
- Avoid forced decay on thread termination when using background threads.
(@interwq)
- Disable muzzy decay by default. (@djwatson, @interwq)
- Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
@interwq)
Bug fixes (all only relevant to jemalloc 5.x):
- Fix background thread index issues with max_background_threads. (@djwatson,
@interwq)
- Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
- Fix opt.prof_prefix initialization. (@davidtgoldblatt)
- Properly trigger decay on tcache destroy. (@interwq, @amosbird)
- Fix tcache.flush. (@interwq)
- Detect whether explicit extent zero out is necessary with huge pages or
custom extent hooks, which may change the purge semantics. (@interwq)
- Fix a side effect caused by extent_max_active_fit combined with decay-based
purging, where freed extents can accumulate and not be reused for an
extended period of time. (@interwq, @mpghf)
- Fix a missing unlock on extent register error handling. (@zoulasc)
Testing:
- Simplify the Travis script output. (@gnzlbg)
- Update the test scripts for FreeBSD. (@devnexen)
- Add unit tests for the producer-consumer pattern. (@interwq)
- Add Cirrus-CI config for FreeBSD builds. (@jasone)
- Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
@interwq)
Incompatible changes:
- Remove --with-lg-page-sizes. (@davidtgoldblatt)
Documentation:
- Attempt to build docs by default, however skip doc building when xsltproc
is missing. (@interwq, @cmuellner)
* 5.1.0 (May 4, 2018)
This release is primarily about fine-tuning, ranging from several new features
to numerous notable performance and portability enhancements. The release and
prior dev versions have been running in multiple large scale applications for
months, and the cumulative improvements are substantial in many cases.
Given the long and successful production runs, this release is likely a good
candidate for applications to upgrade, from both jemalloc 5.0 and before. For
performance-critical applications, the newly added TUNING.md provides
guidelines on jemalloc tuning.
New features:
- Implement transparent huge page support for internal metadata. (@interwq)
- Add opt.thp to allow enabling / disabling transparent huge pages for all
mappings. (@interwq)
- Add maximum background thread count option. (@djwatson)
- Allow prof_active to control opt.lg_prof_interval and prof.gdump.
(@interwq)
- Allow arena index lookup based on allocation addresses via mallctl.
(@lionkov)
- Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
- Add opt.lg_extent_max_active_fit to set the max ratio between the size of
the active extent selected (to split off from) and the size of the requested
allocation. (@interwq, @davidtgoldblatt)
- Add retain_grow_limit to set the max size when growing virtual address
space. (@interwq)
- Add mallctl interfaces:
+ arena.<i>.retain_grow_limit (@interwq)
+ arenas.lookup (@lionkov)
+ max_background_threads (@djwatson)
+ opt.lg_extent_max_active_fit (@interwq)
+ opt.max_background_threads (@djwatson)
+ opt.metadata_thp (@interwq)
+ opt.thp (@interwq)
+ stats.metadata_thp (@interwq)
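For instance (a sketch, not from the release notes, assuming an unprefixed build), arenas.lookup maps an allocation address back to the index of the arena that owns it:

```c
#include <stdio.h>
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

int main(void) {
	void *ptr = malloc(128);
	if (ptr == NULL) {
		return 1;
	}

	/* Look up which arena owns this allocation. */
	unsigned arena_ind;
	size_t sz = sizeof(arena_ind);
	if (mallctl("arenas.lookup", &arena_ind, &sz, &ptr, sizeof(ptr)) == 0) {
		printf("allocation belongs to arena %u\n", arena_ind);
	}

	free(ptr);
	return 0;
}
```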
Portability improvements:
- Support GNU/kFreeBSD configuration. (@paravoid)
- Support m68k, nios2 and SH3 architectures. (@paravoid)
- Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
- Fix symbol listing for cross-compiling. (@tamird)
- Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
- Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
- Fix MSVC 2015 & 2017 builds. (@rustyx)
- Improve RISC-V support. (@EdSchouten)
- Set name mangling script in strict mode. (@nicolov)
- Avoid MADV_HUGEPAGE on ARM. (@marxin)
- Modify configure to determine return value of strerror_r.
(@davidtgoldblatt, @cferris1000)
- Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
- Fix 32-bit build on MSVC. (@rustyx)
- Fix external symbol on MSVC. (@maksqwe)
- Avoid a printf format specifier warning. (@jasone)
- Add configure option --disable-initial-exec-tls which can allow jemalloc to
be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
- AArch64: Add ILP32 support. (@cmuellner)
- Add --with-lg-vaddr configure option to support cross compiling.
(@cmuellner, @davidtgoldblatt)
Optimizations and refactors:
- Improve active extent fit with extent_max_active_fit. This considerably
reduces fragmentation over time and improves virtual memory and metadata
usage. (@davidtgoldblatt, @interwq)
- Eagerly coalesce large extents to reduce fragmentation. (@interwq)
- sdallocx: only read size info when page aligned (i.e. possibly sampled),
which speeds up the sized deallocation path significantly. (@interwq)
- Avoid attempting new mappings for in place expansion with retain, since
it rarely succeeds in practice and causes high overhead. (@interwq)
- Refactor OOM handling in newImpl. (@wqfish)
- Add internal fine-grained logging functionality for debugging use.
(@davidtgoldblatt)
- Refactor arena / tcache interactions. (@davidtgoldblatt)
- Refactor extent management with dumpable flag. (@davidtgoldblatt)
- Add runtime detection of lazy purging. (@interwq)
- Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
- Use sysctl on startup in FreeBSD. (@trasz)
- Use thread local prng state instead of atomic. (@djwatson)
- Make decay always purge one more extent than before, because in
practice large extents are usually the ones that cross the decay threshold.
Purging the additional extent helps save memory as well as reduce VM
fragmentation. (@interwq)
- Fast division by dynamic values. (@davidtgoldblatt)
- Improve the fit for aligned allocation. (@interwq, @edwinsmith)
- Refactor extent_t bitpacking. (@rkmisra)
- Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
- Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
- Remove preserve_lru feature for extents management. (@djwatson)
- Consolidate two memory loads into one on the fast deallocation path.
(@davidtgoldblatt, @interwq)
Bug fixes (most of the issues are only relevant to jemalloc 5.0):
- Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
- Validate returned file descriptor before use. (@zonyitoo)
- Fix a few background thread initialization and shutdown issues. (@interwq)
- Fix an extent coalesce + decay race by taking both coalescing extents off
the LRU list. (@interwq)
- Fix potentially unbound increase during decay, caused by one thread
repeatedly stashing memory to purge while other threads generate new
pages. The
number of pages to purge is checked to prevent this. (@interwq)
- Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
- Handle 32 bit mutex counters. (@rkmisra)
- Fix an indexing bug when creating background threads. (@davidtgoldblatt,
@binliu19)
- Fix arguments passed to extent_init. (@yuleniwo, @interwq)
- Fix addresses used for ordering mutexes. (@rkmisra)
- Fix abort_conf processing during bootstrap. (@interwq)
- Fix include path order for out-of-tree builds. (@cmuellner)
Incompatible changes:
- Remove --disable-thp. (@interwq)
- Remove mallctl interfaces:
+ config.thp (@interwq)
Documentation:
- Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
* 5.0.1 (July 1, 2017)
This bugfix release fixes several issues, most of which are obscure enough
that typical applications are not impacted.
Bug fixes:
- Update decay->nunpurged before purging, in order to avoid potential update
races and subsequent incorrect purging volume. (@interwq)
- Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
locking and/or background threads). This mitigates an initialization
failure bug for which we still do not have a clear reproduction test case.
(@interwq)
- Modify tsd management so that it neither crashes nor leaks if a thread's
only allocation activity is to call free() after TLS destructors have been
executed. This behavior was observed when operating with GNU libc, and is
unlikely to be an issue with other libc implementations. (@interwq)
- Mask signals during background thread creation. This prevents signals from
being inadvertently delivered to background threads. (@jasone,
@davidtgoldblatt, @interwq)
- Avoid inactivity checks within background threads, in order to prevent
recursive mutex acquisition. (@interwq)
- Fix extent_grow_retained() to use the specified hooks when the
arena.<i>.extent_hooks mallctl is used to override the default hooks.
(@interwq)
- Add missing reentrancy support for custom extent hooks which allocate.
(@interwq)
- Post-fork(2), re-initialize the list of tcaches associated with each arena
to contain no tcaches except the forking thread's. (@interwq)
- Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This
fixes potential deadlocks after fork(2). (@interwq)
- Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
generate corrupt configure scripts. (@jasone)
- Ensure that the configured page size (--with-lg-page) is no larger than the
configured huge page size (--with-lg-hugepage). (@jasone)
* 5.0.0 (June 13, 2017)
Unlike all previous jemalloc releases, this release does not use naturally
@ -480,7 +1016,7 @@ brevity. Much more detail can be found in the git revision history:
these fixes, xallocx() now tries harder to partially fulfill requests for
optional extra space. Note that a couple of minor heap profiling
optimizations are included, but these are better thought of as performance
fixes that were integral to disovering most of the other bugs.
fixes that were integral to discovering most of the other bugs.
Optimizations:
- Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the

View file

@ -9,14 +9,17 @@ If building from unpackaged developer sources, the simplest command sequence
that might work is:
./autogen.sh
make dist
make
make install
Note that documentation is not built by the default target because doing so
would create a dependency on xsltproc in packaged releases, hence the
requirement to either run 'make dist' or avoid installing docs via the various
install_* targets documented below.
You can uninstall the installed build artifacts like this:
make uninstall
Notes:
- "autoconf" needs to be installed
- Documentation is built by the default target only when xsltproc is
available. Build will warn but not stop if the dependency is missing.
## Advanced configuration
@ -136,6 +139,7 @@ any of the following arguments (not a definitive list) to 'configure':
in the following list that appears to function correctly:
+ libunwind (requires --enable-prof-libunwind)
+ frame pointer (requires --enable-prof-frameptr)
+ libgcc (unless --disable-prof-libgcc)
+ gcc intrinsics (unless --disable-prof-gcc)
@ -144,6 +148,12 @@ any of the following arguments (not a definitive list) to 'configure':
Use the libunwind library (http://www.nongnu.org/libunwind/) for stack
backtracing.
* `--enable-prof-frameptr`
Use the optimized frame pointer unwinder for stack backtracing. Safe
to use in mixed code (with and without frame pointers) - but requires
frame pointers to produce meaningful stacks. Linux only.
* `--disable-prof-libgcc`
Disable the use of libgcc's backtracing functionality.
@ -157,11 +167,6 @@ any of the following arguments (not a definitive list) to 'configure':
Statically link against the specified libunwind.a rather than dynamically
linking with -lunwind.
* `--disable-thp`
Disable transparent huge page (THP) integration. This option can be useful
when cross compiling.
* `--disable-fill`
Disable support for junk/zero filling of memory. See the "opt.junk" and
@ -193,13 +198,13 @@ any of the following arguments (not a definitive list) to 'configure':
* `--disable-cache-oblivious`
Disable cache-oblivious large allocation alignment for large allocation
requests with no alignment constraints. If this feature is disabled, all
large allocations are page-aligned as an implementation artifact, which can
severely harm CPU cache utilization. However, the cache-oblivious layout
comes at the cost of one extra page per large allocation, which in the
most extreme case increases physical memory usage for the 16 KiB size class
to 20 KiB.
Disable cache-oblivious large allocation alignment by default, for large
allocation requests with no alignment constraints. If this feature is
disabled, all large allocations are page-aligned as an implementation
artifact, which can severely harm CPU cache utilization. However, the
cache-oblivious layout comes at the cost of one extra page per large
allocation, which in the most extreme case increases physical memory usage
for the 16 KiB size class to 20 KiB.
* `--disable-syscall`
@ -226,13 +231,6 @@ any of the following arguments (not a definitive list) to 'configure':
system page size may change between configuration and execution, e.g. when
cross compiling.
* `--with-lg-page-sizes=<lg-page-sizes>`
Specify the comma-separated base 2 logs of the page sizes to support. This
option may be useful when cross compiling in combination with
`--with-lg-page`, but its primary use case is for integration with FreeBSD's
libc, wherein jemalloc is embedded.
* `--with-lg-hugepage=<lg-hugepage>`
Specify the base 2 log of the system huge page size. This option is useful
@ -265,6 +263,27 @@ any of the following arguments (not a definitive list) to 'configure':
configuration, jemalloc will provide additional size classes that are not
16-byte-aligned (24, 40, and 56).
* `--with-lg-vaddr=<lg-vaddr>`
Specify the number of significant virtual address bits. By default, the
configure script attempts to detect virtual address size on those platforms
where it knows how, and picks a default otherwise. This option may be
useful when cross-compiling.
* `--disable-initial-exec-tls`
Disable the initial-exec TLS model for jemalloc's internal thread-local
storage (on those platforms that support explicit settings). This can allow
jemalloc to be dynamically loaded after program startup (e.g. using dlopen).
Note that in this case, there will be two malloc implementations operating
in the same process, which will almost certainly result in confusing runtime
crashes if pointers leak from one implementation to the other.
* `--disable-libdl`
Disable the usage of libdl, namely dlsym(3) which is required by the lazy
lock option. This can allow building static binaries.
The following environment variables (not a definitive list) impact configure's
behavior:
@ -303,13 +322,13 @@ behavior:
'configure' uses this to find programs.
In some cases it may be necessary to work around configuration results that do
not match reality. For example, Linux 4.5 added support for the MADV_FREE flag
to madvise(2), which can cause problems if building on a host with MADV_FREE
support and deploying to a target without. To work around this, use a cache
file to override the relevant configuration variable defined in configure.ac,
e.g.:
not match reality. For example, Linux 3.4 added support for the MADV_DONTDUMP
flag to madvise(2), which can cause problems if building on a host with
MADV_DONTDUMP support and deploying to a target without. To work around this,
use a cache file to override the relevant configuration variable defined in
configure.ac, e.g.:
echo "je_cv_madv_free=no" > config.cache && ./configure -C
echo "je_cv_madv_dontdump=no" > config.cache && ./configure -C
## Advanced compilation
@ -329,6 +348,7 @@ To install only parts of jemalloc, use the following targets:
install_include
install_lib_shared
install_lib_static
install_lib_pc
install_lib
install_doc_html
install_doc_man
@ -383,6 +403,102 @@ exclusively):
Use this to search for programs used during configuration and building.
## Building for Windows
There are at least two ways to build jemalloc's libraries for Windows. They
differ in their ease of use and flexibility.
### With MSVC solutions
This is the easy, but less flexible approach. It doesn't let you specify
arguments to the `configure` script.
1. Install Cygwin with at least the following packages:
* autoconf
* autogen
* gawk
* grep
* sed
2. Install Visual Studio 2015 or 2017 with Visual C++
3. Add Cygwin\bin to the PATH environment variable
4. Open "x64 Native Tools Command Prompt for VS 2017"
(note: x86/x64 doesn't matter at this point)
5. Generate header files:
sh -c "CC=cl ./autogen.sh"
6. Now the project can be opened and built in Visual Studio:
msvc\jemalloc_vc2017.sln
### With MSYS
This is a more involved approach that offers the same configuration flexibility
as Linux builds. We use it for our CI workflow to test different jemalloc
configurations on Windows.
1. Install the prerequisites
1. MSYS2
2. Chocolatey
3. Visual Studio if you want to compile with MSVC compiler
2. Run your bash emulation. It could be MSYS2 or Git Bash (this manual was
tested on both)
3. Manually and selectively follow
[before_install.sh](https://github.com/jemalloc/jemalloc/blob/dev/scripts/windows/before_install.sh)
script.
1. Skip the `TRAVIS_OS_NAME` check, `rm -rf C:/tools/msys64` and `choco
uninstall/upgrade` part.
2. If using `msys2` shell, add path to `RefreshEnv.cmd` to `PATH`:
`PATH="$PATH:/c/ProgramData/chocolatey/bin"`
3. Assign `msys_shell_cmd`, `msys2`, `mingw32` and `mingw64` as in the
script.
4. Pick `CROSS_COMPILE_32BIT` , `CC` and `USE_MSVC` values depending on
your needs. For instance, if you'd like to build for x86_64 Windows
with `gcc`, then `CROSS_COMPILE_32BIT="no"`, `CC="gcc"` and
`USE_MSVC=""`. If you'd like to build for x86 Windows with `cl.exe`,
then `CROSS_COMPILE_32BIT="yes"`, `CC="cl.exe"`, `USE_MSVC="x86"`.
For x86_64 builds with `cl.exe`, assign `USE_MSVC="amd64"` and
`CROSS_COMPILE_32BIT="no"`.
5. Replace the path to `vcvarsall.bat` with the path on your system. For
instance, on my Windows PC with Visual Studio 17, the path is
`C:\Program Files (x86)\Microsoft Visual
Studio\2017\BuildTools\VC\Auxiliary\Build\vcvarsall.bat`.
6. Execute the rest of the script. It will install the required
dependencies and assign the variable `build_env`, which is a function
that executes following commands with the correct environment
variables set.
4. Use `$build_env <command>` as you would in a Linux shell:
1. `$build_env autoconf`
2. `$build_env ./configure CC="<desired compiler>" <configuration flags>`
3. `$build_env mingw32-make`
If you're having any issues with the above, ensure the following:
5. When you run `cmd //C RefreshEnv.cmd`, you get an output line starting with
`Refreshing` . If it errors saying `RefreshEnv.cmd` is not found, then you
need to add it to your `PATH` as described above in item 3.2
6. When you run `cmd //C $vcvarsall`, it prints a bunch of environment
variables. Otherwise, check the path to the `vcvarsall.bat` in `$vcvarsall`
script and fix it.
### Building from vcpkg
The jemalloc port in vcpkg is kept up to date by Microsoft team members and
community contributors. The url of vcpkg is: https://github.com/Microsoft/vcpkg
. You can download and install jemalloc using the vcpkg dependency manager:
```shell
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh # ./bootstrap-vcpkg.bat for Windows
./vcpkg integrate install
./vcpkg install jemalloc
```
If the version is out of date, please [create an issue or pull
request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.
## Development

View file

@ -24,7 +24,7 @@ abs_srcroot := @abs_srcroot@
abs_objroot := @abs_objroot@
# Build parameters.
CPPFLAGS := @CPPFLAGS@ -I$(srcroot)include -I$(objroot)include
CPPFLAGS := @CPPFLAGS@ -I$(objroot)include -I$(srcroot)include
CONFIGURE_CFLAGS := @CONFIGURE_CFLAGS@
SPECIFIED_CFLAGS := @SPECIFIED_CFLAGS@
EXTRA_CFLAGS := @EXTRA_CFLAGS@
@ -47,6 +47,7 @@ REV := @rev@
install_suffix := @install_suffix@
ABI := @abi@
XSLTPROC := @XSLTPROC@
XSLROOT := @XSLROOT@
AUTOCONF := @AUTOCONF@
_RPATH = @RPATH@
RPATH = $(if $(1),$(call _RPATH,$(1)))
@ -55,8 +56,12 @@ cfghdrs_out := @cfghdrs_out@
cfgoutputs_in := $(addprefix $(srcroot),@cfgoutputs_in@)
cfgoutputs_out := @cfgoutputs_out@
enable_autogen := @enable_autogen@
enable_doc := @enable_doc@
enable_shared := @enable_shared@
enable_static := @enable_static@
enable_prof := @enable_prof@
enable_zone_allocator := @enable_zone_allocator@
enable_experimental_smallocx := @enable_experimental_smallocx@
MALLOC_CONF := @JEMALLOC_CPREFIX@MALLOC_CONF
link_whole_archive := @link_whole_archive@
DSO_LDFLAGS = @DSO_LDFLAGS@
@ -93,29 +98,68 @@ C_SRCS := $(srcroot)src/jemalloc.c \
$(srcroot)src/arena.c \
$(srcroot)src/background_thread.c \
$(srcroot)src/base.c \
$(srcroot)src/bin.c \
$(srcroot)src/bin_info.c \
$(srcroot)src/bitmap.c \
$(srcroot)src/buf_writer.c \
$(srcroot)src/cache_bin.c \
$(srcroot)src/ckh.c \
$(srcroot)src/counter.c \
$(srcroot)src/ctl.c \
$(srcroot)src/decay.c \
$(srcroot)src/div.c \
$(srcroot)src/ecache.c \
$(srcroot)src/edata.c \
$(srcroot)src/edata_cache.c \
$(srcroot)src/ehooks.c \
$(srcroot)src/emap.c \
$(srcroot)src/eset.c \
$(srcroot)src/exp_grow.c \
$(srcroot)src/extent.c \
$(srcroot)src/extent_dss.c \
$(srcroot)src/extent_mmap.c \
$(srcroot)src/hash.c \
$(srcroot)src/hooks.c \
$(srcroot)src/fxp.c \
$(srcroot)src/san.c \
$(srcroot)src/san_bump.c \
$(srcroot)src/hook.c \
$(srcroot)src/hpa.c \
$(srcroot)src/hpa_central.c \
$(srcroot)src/hpa_hooks.c \
$(srcroot)src/hpa_utils.c \
$(srcroot)src/hpdata.c \
$(srcroot)src/inspect.c \
$(srcroot)src/large.c \
$(srcroot)src/log.c \
$(srcroot)src/malloc_io.c \
$(srcroot)src/conf.c \
$(srcroot)src/mutex.c \
$(srcroot)src/mutex_pool.c \
$(srcroot)src/nstime.c \
$(srcroot)src/pa.c \
$(srcroot)src/pa_extra.c \
$(srcroot)src/pac.c \
$(srcroot)src/pages.c \
$(srcroot)src/prng.c \
$(srcroot)src/peak_event.c \
$(srcroot)src/prof.c \
$(srcroot)src/prof_data.c \
$(srcroot)src/prof_log.c \
$(srcroot)src/prof_recent.c \
$(srcroot)src/prof_stack_range.c \
$(srcroot)src/prof_stats.c \
$(srcroot)src/prof_sys.c \
$(srcroot)src/psset.c \
$(srcroot)src/rtree.c \
$(srcroot)src/safety_check.c \
$(srcroot)src/sc.c \
$(srcroot)src/sec.c \
$(srcroot)src/stats.c \
$(srcroot)src/spin.c \
$(srcroot)src/sz.c \
$(srcroot)src/tcache.c \
$(srcroot)src/test_hooks.c \
$(srcroot)src/thread_event.c \
$(srcroot)src/thread_event_registry.c \
$(srcroot)src/ticker.c \
$(srcroot)src/tsd.c \
$(srcroot)src/util.c \
$(srcroot)src/witness.c
ifeq ($(enable_zone_allocator), 1)
C_SRCS += $(srcroot)src/zone.c
@ -138,98 +182,186 @@ else
LJEMALLOC := $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
endif
PC := $(objroot)jemalloc.pc
MAN3 := $(objroot)doc/jemalloc$(install_suffix).3
DOCS_XML := $(objroot)doc/jemalloc$(install_suffix).xml
DOCS_HTML := $(DOCS_XML:$(objroot)%.xml=$(objroot)%.html)
DOCS_MAN3 := $(DOCS_XML:$(objroot)%.xml=$(objroot)%.3)
DOCS := $(DOCS_HTML) $(DOCS_MAN3)
C_TESTLIB_SRCS := $(srcroot)test/src/btalloc.c $(srcroot)test/src/btalloc_0.c \
$(srcroot)test/src/btalloc_1.c $(srcroot)test/src/math.c \
$(srcroot)test/src/mtx.c $(srcroot)test/src/mq.c \
$(srcroot)test/src/mtx.c $(srcroot)test/src/sleep.c \
$(srcroot)test/src/SFMT.c $(srcroot)test/src/test.c \
$(srcroot)test/src/thd.c $(srcroot)test/src/timer.c
ifeq (1, $(link_whole_archive))
C_UTIL_INTEGRATION_SRCS :=
C_UTIL_CPP_SRCS :=
else
C_UTIL_INTEGRATION_SRCS := $(srcroot)src/nstime.c $(srcroot)src/malloc_io.c
C_UTIL_INTEGRATION_SRCS := $(srcroot)src/nstime.c $(srcroot)src/malloc_io.c \
$(srcroot)src/ticker.c
C_UTIL_CPP_SRCS := $(srcroot)src/nstime.c $(srcroot)src/malloc_io.c
endif
TESTS_UNIT := \
$(srcroot)test/unit/a0.c \
$(srcroot)test/unit/arena_decay.c \
$(srcroot)test/unit/arena_reset.c \
$(srcroot)test/unit/atomic.c \
$(srcroot)test/unit/background_thread.c \
$(srcroot)test/unit/background_thread_enable.c \
$(srcroot)test/unit/background_thread_init.c \
$(srcroot)test/unit/base.c \
$(srcroot)test/unit/batch_alloc.c \
$(srcroot)test/unit/bin.c \
$(srcroot)test/unit/binshard.c \
$(srcroot)test/unit/bitmap.c \
$(srcroot)test/unit/bit_util.c \
$(srcroot)test/unit/buf_writer.c \
$(srcroot)test/unit/cache_bin.c \
$(srcroot)test/unit/ckh.c \
$(srcroot)test/unit/conf.c \
$(srcroot)test/unit/conf_init_0.c \
$(srcroot)test/unit/conf_init_1.c \
$(srcroot)test/unit/conf_init_confirm.c \
$(srcroot)test/unit/conf_parse.c \
$(srcroot)test/unit/counter.c \
$(srcroot)test/unit/decay.c \
$(srcroot)test/unit/div.c \
$(srcroot)test/unit/double_free.c \
$(srcroot)test/unit/edata_cache.c \
$(srcroot)test/unit/emitter.c \
$(srcroot)test/unit/extent_quantize.c \
${srcroot}test/unit/fb.c \
$(srcroot)test/unit/fork.c \
${srcroot}test/unit/fxp.c \
${srcroot}test/unit/san.c \
${srcroot}test/unit/san_bump.c \
$(srcroot)test/unit/hash.c \
$(srcroot)test/unit/hooks.c \
$(srcroot)test/unit/hook.c \
$(srcroot)test/unit/hpa.c \
$(srcroot)test/unit/hpa_sec_integration.c \
$(srcroot)test/unit/hpa_thp_always.c \
$(srcroot)test/unit/hpa_vectorized_madvise.c \
$(srcroot)test/unit/hpa_vectorized_madvise_large_batch.c \
$(srcroot)test/unit/hpa_background_thread.c \
$(srcroot)test/unit/hpdata.c \
$(srcroot)test/unit/huge.c \
$(srcroot)test/unit/inspect.c \
$(srcroot)test/unit/junk.c \
$(srcroot)test/unit/junk_alloc.c \
$(srcroot)test/unit/junk_free.c \
$(srcroot)test/unit/json_stats.c \
$(srcroot)test/unit/large_ralloc.c \
$(srcroot)test/unit/log.c \
$(srcroot)test/unit/mallctl.c \
$(srcroot)test/unit/malloc_conf_2.c \
$(srcroot)test/unit/malloc_io.c \
$(srcroot)test/unit/math.c \
$(srcroot)test/unit/mpsc_queue.c \
$(srcroot)test/unit/mq.c \
$(srcroot)test/unit/mtx.c \
$(srcroot)test/unit/nstime.c \
$(srcroot)test/unit/ncached_max.c \
$(srcroot)test/unit/oversize_threshold.c \
$(srcroot)test/unit/pa.c \
$(srcroot)test/unit/pack.c \
$(srcroot)test/unit/pages.c \
$(srcroot)test/unit/peak.c \
$(srcroot)test/unit/ph.c \
$(srcroot)test/unit/prng.c \
$(srcroot)test/unit/prof_accum.c \
$(srcroot)test/unit/prof_active.c \
$(srcroot)test/unit/prof_gdump.c \
$(srcroot)test/unit/prof_hook.c \
$(srcroot)test/unit/prof_idump.c \
$(srcroot)test/unit/prof_log.c \
$(srcroot)test/unit/prof_mdump.c \
$(srcroot)test/unit/prof_recent.c \
$(srcroot)test/unit/prof_reset.c \
$(srcroot)test/unit/prof_small.c \
$(srcroot)test/unit/prof_stats.c \
$(srcroot)test/unit/prof_tctx.c \
$(srcroot)test/unit/prof_thread_name.c \
$(srcroot)test/unit/prof_sys_thread_name.c \
$(srcroot)test/unit/psset.c \
$(srcroot)test/unit/ql.c \
$(srcroot)test/unit/qr.c \
$(srcroot)test/unit/rb.c \
$(srcroot)test/unit/retained.c \
$(srcroot)test/unit/rtree.c \
$(srcroot)test/unit/safety_check.c \
$(srcroot)test/unit/sc.c \
$(srcroot)test/unit/sec.c \
$(srcroot)test/unit/seq.c \
$(srcroot)test/unit/SFMT.c \
$(srcroot)test/unit/size_check.c \
$(srcroot)test/unit/size_classes.c \
$(srcroot)test/unit/slab.c \
$(srcroot)test/unit/smoothstep.c \
$(srcroot)test/unit/spin.c \
$(srcroot)test/unit/stats.c \
$(srcroot)test/unit/stats_print.c \
$(srcroot)test/unit/sz.c \
$(srcroot)test/unit/tcache_init.c \
$(srcroot)test/unit/tcache_max.c \
$(srcroot)test/unit/test_hooks.c \
$(srcroot)test/unit/thread_event.c \
$(srcroot)test/unit/ticker.c \
$(srcroot)test/unit/nstime.c \
$(srcroot)test/unit/tsd.c \
$(srcroot)test/unit/uaf.c \
$(srcroot)test/unit/witness.c \
$(srcroot)test/unit/zero.c
$(srcroot)test/unit/zero.c \
$(srcroot)test/unit/zero_realloc_abort.c \
$(srcroot)test/unit/zero_realloc_free.c \
$(srcroot)test/unit/zero_realloc_alloc.c \
$(srcroot)test/unit/zero_reallocs.c
ifeq (@enable_prof@, 1)
TESTS_UNIT += \
$(srcroot)test/unit/arena_reset_prof.c
$(srcroot)test/unit/arena_reset_prof.c \
$(srcroot)test/unit/batch_alloc_prof.c
endif
TESTS_INTEGRATION := $(srcroot)test/integration/aligned_alloc.c \
$(srcroot)test/integration/allocated.c \
$(srcroot)test/integration/extent.c \
$(srcroot)test/integration/malloc.c \
$(srcroot)test/integration/mallocx.c \
$(srcroot)test/integration/MALLOCX_ARENA.c \
$(srcroot)test/integration/overflow.c \
$(srcroot)test/integration/posix_memalign.c \
$(srcroot)test/integration/rallocx.c \
$(srcroot)test/integration/sdallocx.c \
$(srcroot)test/integration/slab_sizes.c \
$(srcroot)test/integration/thread_arena.c \
$(srcroot)test/integration/thread_tcache_enabled.c \
$(srcroot)test/integration/xallocx.c
ifeq (@enable_experimental_smallocx@, 1)
TESTS_INTEGRATION += \
$(srcroot)test/integration/smallocx.c
endif
ifeq (@enable_cxx@, 1)
CPP_SRCS := $(srcroot)src/jemalloc_cpp.cpp
TESTS_INTEGRATION_CPP := $(srcroot)test/integration/cpp/basic.cpp
TESTS_INTEGRATION_CPP := $(srcroot)test/integration/cpp/basic.cpp \
$(srcroot)test/integration/cpp/infallible_new_true.cpp \
$(srcroot)test/integration/cpp/infallible_new_false.cpp
else
CPP_SRCS :=
TESTS_INTEGRATION_CPP :=
endif
TESTS_STRESS := $(srcroot)test/stress/microbench.c
TESTS_ANALYZE := $(srcroot)test/analyze/prof_bias.c \
$(srcroot)test/analyze/rand.c \
$(srcroot)test/analyze/sizes.c
TESTS_STRESS := $(srcroot)test/stress/batch_alloc.c \
$(srcroot)test/stress/fill_flush.c \
$(srcroot)test/stress/hookbench.c \
$(srcroot)test/stress/large_microbench.c \
$(srcroot)test/stress/mallctl.c \
$(srcroot)test/stress/microbench.c
ifeq (@enable_cxx@, 1)
TESTS_STRESS_CPP := $(srcroot)test/stress/cpp/microbench.cpp
else
TESTS_STRESS_CPP :=
endif
TESTS := $(TESTS_UNIT) $(TESTS_INTEGRATION) $(TESTS_INTEGRATION_CPP) $(TESTS_STRESS)
TESTS := $(TESTS_UNIT) $(TESTS_INTEGRATION) $(TESTS_INTEGRATION_CPP) \
$(TESTS_ANALYZE) $(TESTS_STRESS) $(TESTS_STRESS_CPP)
PRIVATE_NAMESPACE_HDRS := $(objroot)include/jemalloc/internal/private_namespace.h $(objroot)include/jemalloc/internal/private_namespace_jet.h
PRIVATE_NAMESPACE_GEN_HDRS := $(PRIVATE_NAMESPACE_HDRS:%.h=%.gen.h)
@ -245,15 +377,21 @@ C_JET_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.jet.$(O))
C_TESTLIB_UNIT_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.unit.$(O))
C_TESTLIB_INTEGRATION_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.integration.$(O))
C_UTIL_INTEGRATION_OBJS := $(C_UTIL_INTEGRATION_SRCS:$(srcroot)%.c=$(objroot)%.integration.$(O))
C_TESTLIB_ANALYZE_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.analyze.$(O))
C_TESTLIB_STRESS_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.stress.$(O))
C_TESTLIB_OBJS := $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(C_TESTLIB_STRESS_OBJS)
C_TESTLIB_OBJS := $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) \
$(C_UTIL_INTEGRATION_OBJS) $(C_TESTLIB_ANALYZE_OBJS) \
$(C_TESTLIB_STRESS_OBJS)
TESTS_UNIT_OBJS := $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_INTEGRATION_OBJS := $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_INTEGRATION_CPP_OBJS := $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%.$(O))
TESTS_ANALYZE_OBJS := $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_STRESS_OBJS := $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_OBJS := $(TESTS_UNIT_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_STRESS_OBJS)
TESTS_CPP_OBJS := $(TESTS_INTEGRATION_CPP_OBJS)
TESTS_STRESS_CPP_OBJS := $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%.$(O))
TESTS_OBJS := $(TESTS_UNIT_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_ANALYZE_OBJS) \
$(TESTS_STRESS_OBJS)
TESTS_CPP_OBJS := $(TESTS_INTEGRATION_CPP_OBJS) $(TESTS_STRESS_CPP_OBJS)
.PHONY: all dist build_doc_html build_doc_man build_doc
.PHONY: install_bin install_include install_lib
@ -267,11 +405,32 @@ all: build_lib
dist: build_doc
$(objroot)doc/%.html : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/html.xsl
$(objroot)doc/%$(install_suffix).html : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/html.xsl
ifneq ($(XSLROOT),)
$(XSLTPROC) -o $@ $(objroot)doc/html.xsl $<
else
ifeq ($(wildcard $(DOCS_HTML)),)
@echo "<p>Missing xsltproc. Doc not built.</p>" > $@
endif
@echo "Missing xsltproc. "$@" not (re)built."
endif
$(objroot)doc/%.3 : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/manpages.xsl
$(objroot)doc/%$(install_suffix).3 : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/manpages.xsl
ifneq ($(XSLROOT),)
$(XSLTPROC) -o $@ $(objroot)doc/manpages.xsl $<
# The -o option (output filename) of xsltproc may not work (it uses the
# <refname> in the .xml file). Manually add the suffix if so.
ifneq ($(install_suffix),)
@if [ -f $(objroot)doc/jemalloc.3 ]; then \
mv $(objroot)doc/jemalloc.3 $(objroot)doc/jemalloc$(install_suffix).3 ; \
fi
endif
else
ifeq ($(wildcard $(DOCS_MAN3)),)
@echo "Missing xsltproc. Doc not built." > $@
endif
@echo "Missing xsltproc. "$@" not (re)built."
endif
build_doc_html: $(DOCS_HTML)
build_doc_man: $(DOCS_MAN3)
@ -312,17 +471,23 @@ $(C_TESTLIB_UNIT_OBJS): CPPFLAGS += -DJEMALLOC_UNIT_TEST
$(C_TESTLIB_INTEGRATION_OBJS): $(objroot)test/src/%.integration.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_INTEGRATION_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_TEST
$(C_UTIL_INTEGRATION_OBJS): $(objroot)src/%.integration.$(O): $(srcroot)src/%.c
$(C_TESTLIB_ANALYZE_OBJS): $(objroot)test/src/%.analyze.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_ANALYZE_OBJS): CPPFLAGS += -DJEMALLOC_ANALYZE_TEST
$(C_TESTLIB_STRESS_OBJS): $(objroot)test/src/%.stress.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_STRESS_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_TEST -DJEMALLOC_STRESS_TESTLIB
$(C_TESTLIB_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_UNIT_OBJS): CPPFLAGS += -DJEMALLOC_UNIT_TEST
$(TESTS_INTEGRATION_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_TEST
$(TESTS_INTEGRATION_CPP_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_CPP_TEST
$(TESTS_ANALYZE_OBJS): CPPFLAGS += -DJEMALLOC_ANALYZE_TEST
$(TESTS_STRESS_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_TEST
$(TESTS_STRESS_CPP_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_CPP_TEST
$(TESTS_OBJS): $(objroot)test/%.$(O): $(srcroot)test/%.c
$(TESTS_CPP_OBJS): $(objroot)test/%.$(O): $(srcroot)test/%.cpp
$(TESTS_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_CPP_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_OBJS): CFLAGS += -fno-builtin
$(TESTS_CPP_OBJS): CPPFLAGS += -fno-builtin
ifneq ($(IMPORTLIB),$(SO))
$(CPP_OBJS) $(C_SYM_OBJS) $(C_OBJS) $(C_JET_SYM_OBJS) $(C_JET_OBJS): CPPFLAGS += -DDLLEXPORT
endif
@ -337,7 +502,7 @@ $(TESTS_OBJS) $(TESTS_CPP_OBJS): $(objroot)test/include/test/jemalloc_test.h
endif
$(C_OBJS) $(CPP_OBJS) $(C_PIC_OBJS) $(CPP_PIC_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_INTEGRATION_CPP_OBJS): $(objroot)include/jemalloc/internal/private_namespace.h
$(C_JET_OBJS) $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_STRESS_OBJS) $(TESTS_UNIT_OBJS) $(TESTS_STRESS_OBJS): $(objroot)include/jemalloc/internal/private_namespace_jet.h
$(C_JET_OBJS) $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_ANALYZE_OBJS) $(C_TESTLIB_STRESS_OBJS) $(TESTS_UNIT_OBJS) $(TESTS_ANALYZE_OBJS) $(TESTS_STRESS_OBJS) $(TESTS_STRESS_CPP_OBJS): $(objroot)include/jemalloc/internal/private_namespace_jet.h
$(C_SYM_OBJS) $(C_OBJS) $(C_PIC_OBJS) $(C_JET_SYM_OBJS) $(C_JET_OBJS) $(C_TESTLIB_OBJS) $(TESTS_OBJS): %.$(O):
@mkdir -p $(@D)
@ -361,7 +526,7 @@ $(objroot)include/jemalloc/internal/private_namespace_jet.gen.h: $(C_JET_SYMS)
$(SHELL) $(srcroot)include/jemalloc/internal/private_namespace.sh $^ > $@
%.h: %.gen.h
@if ! `cmp -s $< $@` ; then echo "cp $< $<"; cp $< $@ ; fi
@if ! `cmp -s $< $@` ; then echo "cp $< $@"; cp $< $@ ; fi
$(CPP_OBJS) $(CPP_PIC_OBJS) $(TESTS_CPP_OBJS): %.$(O):
@mkdir -p $(@D)
@ -378,7 +543,11 @@ endif
$(objroot)lib/$(LIBJEMALLOC).$(SOREV) : $(if $(PIC_CFLAGS),$(C_PIC_OBJS),$(C_OBJS)) $(if $(PIC_CFLAGS),$(CPP_PIC_OBJS),$(CPP_OBJS))
@mkdir -p $(@D)
ifeq (@enable_cxx@, 1)
$(CXX) $(DSO_LDFLAGS) $(call RPATH,$(RPATH_EXTRA)) $(LDTARGET) $+ $(LDFLAGS) $(LIBS) $(EXTRA_LDFLAGS)
else
$(CC) $(DSO_LDFLAGS) $(call RPATH,$(RPATH_EXTRA)) $(LDTARGET) $+ $(LDFLAGS) $(LIBS) $(EXTRA_LDFLAGS)
endif
$(objroot)lib/$(LIBJEMALLOC)_pic.$(A) : $(C_PIC_OBJS) $(CPP_PIC_OBJS)
$(objroot)lib/$(LIBJEMALLOC).$(A) : $(C_OBJS) $(CPP_OBJS)
@ -394,19 +563,50 @@ $(objroot)test/unit/%$(EXE): $(objroot)test/unit/%.$(O) $(C_JET_OBJS) $(C_TESTLI
$(objroot)test/integration/%$(EXE): $(objroot)test/integration/%.$(O) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LJEMALLOC) $(LDFLAGS) $(filter-out -lm,$(filter -lrt -lpthread -lstdc++,$(LIBS))) $(LM) $(EXTRA_LDFLAGS)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LJEMALLOC) $(LDFLAGS) $(filter-out -lm,$(filter -lrt -pthread -lstdc++,$(LIBS))) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/integration/cpp/%$(EXE): $(objroot)test/integration/cpp/%.$(O) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CXX) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB) $(LDFLAGS) $(filter-out -lm,$(LIBS)) -lm $(EXTRA_LDFLAGS)
$(objroot)test/analyze/%$(EXE): $(objroot)test/analyze/%.$(O) $(C_JET_OBJS) $(C_TESTLIB_ANALYZE_OBJS)
@mkdir -p $(@D)
$(CC) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/%$(EXE): $(objroot)test/stress/%.$(O) $(C_JET_OBJS) $(C_TESTLIB_STRESS_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/pa_data_preprocessor$(EXE): $(objroot)test/stress/pa/pa_data_preprocessor.$(O)
@mkdir -p $(@D)
$(CXX) $(LDTARGET) $(filter %.$(O),$^) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/pa_microbench$(EXE): $(objroot)test/stress/pa/pa_microbench.$(O) $(C_JET_OBJS) $(C_TESTLIB_STRESS_OBJS)
@mkdir -p $(@D)
$(CC) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/%.$(O): $(srcroot)test/stress/pa/%.c
@mkdir -p $(@D)
$(CC) $(CFLAGS) -c $(CPPFLAGS) -DJEMALLOC_STRESS_TEST -I$(srcroot)test/include -I$(objroot)test/include $(CTARGET) $<
ifdef CC_MM
@$(CC) -MM $(CPPFLAGS) -DJEMALLOC_STRESS_TEST -I$(srcroot)test/include -I$(objroot)test/include -MT $@ -o $(@:%.$(O)=%.d) $<
endif
$(objroot)test/stress/pa/%.$(O): $(srcroot)test/stress/pa/%.cpp
@mkdir -p $(@D)
$(CXX) $(CXXFLAGS) -c $(CPPFLAGS) -I$(srcroot)test/include -I$(objroot)test/include $(CTARGET) $<
ifdef CC_MM
@$(CXX) -MM $(CPPFLAGS) -I$(srcroot)test/include -I$(objroot)test/include -MT $@ -o $(@:%.$(O)=%.d) $<
endif
build_lib_shared: $(DSOS)
build_lib_static: $(STATIC_LIBS)
build_lib: build_lib_shared build_lib_static
ifeq ($(enable_shared), 1)
build_lib: build_lib_shared
endif
ifeq ($(enable_static), 1)
build_lib: build_lib_static
endif
install_bin:
$(INSTALL) -d $(BINDIR)
@ -443,16 +643,22 @@ install_lib_pc: $(PC)
$(INSTALL) -m 644 $$l $(LIBDIR)/pkgconfig; \
done
install_lib: install_lib_shared install_lib_static install_lib_pc
ifeq ($(enable_shared), 1)
install_lib: install_lib_shared
endif
ifeq ($(enable_static), 1)
install_lib: install_lib_static
endif
install_lib: install_lib_pc
install_doc_html:
install_doc_html: build_doc_html
$(INSTALL) -d $(DATADIR)/doc/jemalloc$(install_suffix)
@for d in $(DOCS_HTML); do \
echo "$(INSTALL) -m 644 $$d $(DATADIR)/doc/jemalloc$(install_suffix)"; \
$(INSTALL) -m 644 $$d $(DATADIR)/doc/jemalloc$(install_suffix); \
done
install_doc_man:
install_doc_man: build_doc_man
$(INSTALL) -d $(MANDIR)/man3
@for d in $(DOCS_MAN3); do \
echo "$(INSTALL) -m 644 $$d $(MANDIR)/man3"; \
@ -461,17 +667,67 @@ done
install_doc: install_doc_html install_doc_man
install: install_bin install_include install_lib install_doc
install: install_bin install_include install_lib
ifeq ($(enable_doc), 1)
install: install_doc
endif
uninstall_bin:
$(RM) -v $(foreach b,$(notdir $(BINS)),$(BINDIR)/$(b))
uninstall_include:
$(RM) -v $(foreach h,$(notdir $(C_HDRS)),$(INCLUDEDIR)/jemalloc/$(h))
rmdir -v $(INCLUDEDIR)/jemalloc
uninstall_lib_shared:
$(RM) -v $(LIBDIR)/$(LIBJEMALLOC).$(SOREV)
ifneq ($(SOREV),$(SO))
$(RM) -v $(LIBDIR)/$(LIBJEMALLOC).$(SO)
endif
uninstall_lib_static:
$(RM) -v $(foreach l,$(notdir $(STATIC_LIBS)),$(LIBDIR)/$(l))
uninstall_lib_pc:
$(RM) -v $(foreach p,$(notdir $(PC)),$(LIBDIR)/pkgconfig/$(p))
ifeq ($(enable_shared), 1)
uninstall_lib: uninstall_lib_shared
endif
ifeq ($(enable_static), 1)
uninstall_lib: uninstall_lib_static
endif
uninstall_lib: uninstall_lib_pc
uninstall_doc_html:
$(RM) -v $(foreach d,$(notdir $(DOCS_HTML)),$(DATADIR)/doc/jemalloc$(install_suffix)/$(d))
rmdir -v $(DATADIR)/doc/jemalloc$(install_suffix)
uninstall_doc_man:
$(RM) -v $(foreach d,$(notdir $(DOCS_MAN3)),$(MANDIR)/man3/$(d))
uninstall_doc: uninstall_doc_html uninstall_doc_man
uninstall: uninstall_bin uninstall_include uninstall_lib
ifeq ($(enable_doc), 1)
uninstall: uninstall_doc
endif
tests_unit: $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%$(EXE))
tests_integration: $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%$(EXE)) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%$(EXE))
tests_stress: $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%$(EXE))
tests: tests_unit tests_integration tests_stress
tests_analyze: $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%$(EXE))
tests_stress: $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%$(EXE)) $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%$(EXE))
tests_pa: $(objroot)test/stress/pa/pa_data_preprocessor$(EXE) $(objroot)test/stress/pa/pa_microbench$(EXE)
tests: tests_unit tests_integration tests_analyze tests_stress
check_unit_dir:
@mkdir -p $(objroot)test/unit
check_integration_dir:
@mkdir -p $(objroot)test/integration
analyze_dir:
@mkdir -p $(objroot)test/analyze
stress_dir:
@mkdir -p $(objroot)test/stress
check_dir: check_unit_dir check_integration_dir
@ -488,8 +744,15 @@ check_integration_decay: tests_integration check_integration_dir
$(MALLOC_CONF)="dirty_decay_ms:0,muzzy_decay_ms:0" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
check_integration: tests_integration check_integration_dir
$(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
analyze: tests_analyze analyze_dir
ifeq ($(enable_prof), 1)
$(MALLOC_CONF)="prof:true" $(SHELL) $(objroot)test/test.sh $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%)
else
$(SHELL) $(objroot)test/test.sh $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%)
endif
stress: tests_stress stress_dir
$(SHELL) $(objroot)test/test.sh $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%)
$(SHELL) $(objroot)test/test.sh $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%)
check: check_unit check_integration check_integration_decay check_integration_prof
clean:

2
README

@ -17,4 +17,4 @@ jemalloc.
The ChangeLog file contains a brief summary of changes for each release.
URL: http://jemalloc.net/
URL: https://jemalloc.net/

129
TUNING.md Normal file

@ -0,0 +1,129 @@
This document summarizes the common approaches for performance fine tuning with
jemalloc (as of 5.3.0). The default configuration of jemalloc tends to work
reasonably well in practice, and most applications should not have to tune any
options. However, in order to cover a wide range of applications and avoid
pathological cases, the default setting is sometimes kept conservative and
suboptimal, even for many common workloads. When jemalloc is properly tuned for
a specific application / workload, it is common to improve system level metrics
by a few percent, or make favorable trade-offs.
## Notable runtime options for performance tuning
Runtime options can be set via
[malloc_conf](https://jemalloc.net/jemalloc.3.html#tuning).
* [background_thread](https://jemalloc.net/jemalloc.3.html#background_thread)
Enabling jemalloc background threads generally improves the tail latency for
application threads, since unused memory purging is shifted to the dedicated
background threads. In addition, unintended purging delay caused by
application inactivity is avoided with background threads.
Suggested: `background_thread:true` when jemalloc managed threads can be
allowed.
* [metadata_thp](https://jemalloc.net/jemalloc.3.html#opt.metadata_thp)
Allowing jemalloc to utilize transparent huge pages for its internal
metadata usually reduces TLB misses significantly, especially for programs
with large memory footprint and frequent allocation / deallocation
activities. Metadata memory usage may increase due to the use of huge
pages.
Suggested for allocation intensive programs: `metadata_thp:auto` or
`metadata_thp:always`, which is expected to improve CPU utilization at a
small memory cost.
* [dirty_decay_ms](https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms) and
[muzzy_decay_ms](https://jemalloc.net/jemalloc.3.html#opt.muzzy_decay_ms)
Decay time determines how fast jemalloc returns unused pages back to the
operating system, and therefore provides a fairly straightforward trade-off
between CPU and memory usage. A shorter decay time purges unused pages faster
to reduce memory usage (usually at the cost of more CPU cycles spent on
purging), and vice versa.
Suggested: tune the values based on the desired trade-offs.
* [narenas](https://jemalloc.net/jemalloc.3.html#opt.narenas)
By default jemalloc uses multiple arenas to reduce internal lock contention.
However, a high arena count may also increase overall memory fragmentation,
since arenas manage memory independently. When a high degree of parallelism
is not expected at the allocator level, a lower number of arenas often
improves memory usage.
Suggested: if low parallelism is expected, try a lower arena count while
monitoring CPU and memory usage.
* [percpu_arena](https://jemalloc.net/jemalloc.3.html#opt.percpu_arena)
Enables dynamic thread-to-arena association based on the running CPU. This has
the potential to improve locality, e.g. when thread-to-CPU affinity is
present.
Suggested: try `percpu_arena:percpu` or `percpu_arena:phycpu` if
thread migration between processors is expected to be infrequent.
Examples:
* High resource consumption application, prioritizing CPU utilization:
`background_thread:true,metadata_thp:auto` combined with relaxed decay time
(increased `dirty_decay_ms` and / or `muzzy_decay_ms`,
e.g. `dirty_decay_ms:30000,muzzy_decay_ms:30000`).
* High resource consumption application, prioritizing memory usage:
`background_thread:true,tcache_max:4096` combined with shorter decay time
(decreased `dirty_decay_ms` and / or `muzzy_decay_ms`,
e.g. `dirty_decay_ms:5000,muzzy_decay_ms:5000`), and lower arena count
(e.g. number of CPUs).
* Low resource consumption application:
`narenas:1,tcache_max:1024` combined with shorter decay time (decreased
`dirty_decay_ms` and / or `muzzy_decay_ms`, e.g.
`dirty_decay_ms:1000,muzzy_decay_ms:0`).
* Extremely conservative -- minimize memory usage at all costs, only suitable when
allocation activity is very rare:
`narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0`
Note that it is recommended to combine these options with `abort_conf:true`,
which aborts immediately on invalid options. A minimal sketch of supplying
these options from within the application follows.
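As a concrete illustration (a minimal sketch, not a recommendation for any
particular workload), the same option strings can be supplied either through
the environment, e.g. `MALLOC_CONF="narenas:1,tcache_max:1024" ./app`, or by
defining the global `malloc_conf` string inside the application:

```c
/* Minimal sketch: baking a malloc_conf default into the application.
 * jemalloc reads this application-defined symbol at initialization, in
 * addition to the MALLOC_CONF environment variable and /etc/malloc.conf.
 * The option string below is illustrative only. */
#include <stdlib.h>

const char *malloc_conf =
    "background_thread:true,metadata_thp:auto,"
    "dirty_decay_ms:30000,muzzy_decay_ms:30000,abort_conf:true";

int main(void) {
    void *p = malloc(4096); /* served by jemalloc when it is linked in */
    free(p);
    return 0;
}
```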
## Beyond runtime options
In addition to the runtime options, there are a number of programmatic ways to
improve application performance with jemalloc.
* [Explicit arenas](https://jemalloc.net/jemalloc.3.html#arenas.create)
Manually created arenas can help performance in various ways, e.g. by
managing locality and contention for specific usages. For example,
applications can explicitly allocate frequently accessed objects from a
dedicated arena with
[mallocx()](https://jemalloc.net/jemalloc.3.html#MALLOCX_ARENA) to improve
locality. In addition, explicit arenas often benefit from individually
tuned options, e.g. relaxed [decay
time](https://jemalloc.net/jemalloc.3.html#arena.i.dirty_decay_ms) if
frequent reuse is expected.
* [Extent hooks](https://jemalloc.net/jemalloc.3.html#arena.i.extent_hooks)
Extent hooks allow customization for managing underlying memory. One use
case for performance purpose is to utilize huge pages -- for example,
[HHVM](https://github.com/facebook/hhvm/blob/master/hphp/util/alloc.cpp)
uses explicit arenas with customized extent hooks to manage 1GB huge pages
for frequently accessed data, which reduces TLB misses significantly.
* [Explicit thread-to-arena
binding](https://jemalloc.net/jemalloc.3.html#thread.arena)
It is common for some threads in an application to have different memory
access / allocation patterns. Threads with heavy workloads often benefit
from explicit binding, e.g. binding very active threads to dedicated arenas
may reduce contention at the allocator level. A minimal sketch combining
these programmatic interfaces follows this list.
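The sketch below (error handling mostly omitted, and assuming an unprefixed
jemalloc build so the entry points are named `mallctl` / `mallocx` /
`dallocx`) combines the interfaces above: it creates an explicit arena,
allocates a hot object from it, and binds the calling thread to it.

```c
#include <stdio.h>
#include <jemalloc/jemalloc.h>

int main(void) {
    unsigned arena_ind;
    size_t sz = sizeof(arena_ind);

    /* Create a new arena with default extent hooks. */
    if (mallctl("arenas.create", &arena_ind, &sz, NULL, 0) != 0) {
        fprintf(stderr, "arenas.create failed\n");
        return 1;
    }

    /* Allocate a frequently accessed object from the dedicated arena,
     * bypassing the thread cache so the association is explicit. */
    void *hot = mallocx(4096,
        MALLOCX_ARENA(arena_ind) | MALLOCX_TCACHE_NONE);
    if (hot == NULL) {
        return 1;
    }

    /* Bind the calling thread to the arena for its subsequent automatic
     * allocations. */
    mallctl("thread.arena", NULL, NULL, &arena_ind, sizeof(arena_ind));

    dallocx(hot, MALLOCX_TCACHE_NONE);
    return 0;
}
```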


@ -9,8 +9,8 @@ for i in autoconf; do
fi
done
echo "./configure --enable-autogen $@"
./configure --enable-autogen $@
echo "./configure --enable-autogen \"$@\""
./configure --enable-autogen "$@"
if [ $? -ne 0 ]; then
echo "Error $? in ./configure"
exit 1


@ -88,6 +88,7 @@ my %obj_tool_map = (
#"nm_pdb" => "nm-pdb", # for reading windows (PDB-format) executables
#"addr2line_pdb" => "addr2line-pdb", # ditto
#"otool" => "otool", # equivalent of objdump on OS X
#"dyld_info" => "dyld_info", # equivalent of otool on OS X for shared cache
);
# NOTE: these are lists, so you can put in commandline flags if you want.
my @DOT = ("dot"); # leave non-absolute, since it may be in /usr/local
@ -205,6 +206,8 @@ Output type:
--svg Generate SVG to stdout
--gif Generate GIF to stdout
--raw Generate symbolized jeprof data (useful with remote fetch)
--collapsed Generate collapsed stacks for building flame graphs
(see http://www.brendangregg.com/flamegraphs.html)
Heap-Profile Options:
--inuse_space Display in-use (mega)bytes [default]
@ -238,6 +241,7 @@ Miscellaneous:
--test Run unit tests
--help This message
--version Version information
--debug-syms-by-id (Linux only) Find debug symbol files by build ID as well as by name
Environment Variables:
JEPROF_TMPDIR Profiles directory. Defaults to \$HOME/jeprof
@ -332,6 +336,7 @@ sub Init() {
$main::opt_gif = 0;
$main::opt_svg = 0;
$main::opt_raw = 0;
$main::opt_collapsed = 0;
$main::opt_nodecount = 80;
$main::opt_nodefraction = 0.005;
@ -362,6 +367,7 @@ sub Init() {
$main::opt_tools = "";
$main::opt_debug = 0;
$main::opt_test = 0;
$main::opt_debug_syms_by_id = 0;
# These are undocumented flags used only by unittests.
$main::opt_test_stride = 0;
@ -405,6 +411,7 @@ sub Init() {
"svg!" => \$main::opt_svg,
"gif!" => \$main::opt_gif,
"raw!" => \$main::opt_raw,
"collapsed!" => \$main::opt_collapsed,
"interactive!" => \$main::opt_interactive,
"nodecount=i" => \$main::opt_nodecount,
"nodefraction=f" => \$main::opt_nodefraction,
@ -429,6 +436,7 @@ sub Init() {
"tools=s" => \$main::opt_tools,
"test!" => \$main::opt_test,
"debug!" => \$main::opt_debug,
"debug-syms-by-id!" => \$main::opt_debug_syms_by_id,
# Undocumented flags used only by unittests:
"test_stride=i" => \$main::opt_test_stride,
) || usage("Invalid option(s)");
@ -490,6 +498,7 @@ sub Init() {
$main::opt_svg +
$main::opt_gif +
$main::opt_raw +
$main::opt_collapsed +
$main::opt_interactive +
0;
if ($modes > 1) {
@ -572,6 +581,11 @@ sub Init() {
foreach (@prefix_list) {
s|/+$||;
}
# Flag to prevent us from trying over and over to use
# elfutils if it's not installed (used only with
# --debug-syms-by-id option).
$main::gave_up_on_elfutils = 0;
}
sub FilterAndPrint {
@ -621,6 +635,8 @@ sub FilterAndPrint {
PrintText($symbols, $flat, $cumulative, -1);
} elsif ($main::opt_raw) {
PrintSymbolizedProfile($symbols, $profile, $main::prog);
} elsif ($main::opt_collapsed) {
PrintCollapsedStacks($symbols, $profile);
} elsif ($main::opt_callgrind) {
PrintCallgrind($calls);
} else {
@ -673,15 +689,15 @@ sub Main() {
my $symbol_map = {};
# Read one profile, pick the last item on the list
my $data = ReadProfile($main::prog, pop(@main::profile_files));
my $data = ReadProfile($main::prog, $main::profile_files[0]);
my $profile = $data->{profile};
my $pcs = $data->{pcs};
my $libs = $data->{libs}; # Info about main program and shared libraries
$symbol_map = MergeSymbols($symbol_map, $data->{symbols});
# Add additional profiles, if available.
if (scalar(@main::profile_files) > 0) {
foreach my $pname (@main::profile_files) {
if (scalar(@main::profile_files) > 1) {
foreach my $pname (@main::profile_files[1..$#main::profile_files]) {
my $data2 = ReadProfile($main::prog, $pname);
$profile = AddProfile($profile, $data2->{profile});
$pcs = AddPcs($pcs, $data2->{pcs});
@ -2810,6 +2826,40 @@ sub IsSecondPcAlwaysTheSame {
return $second_pc;
}
sub ExtractSymbolNameInlineStack {
my $symbols = shift;
my $address = shift;
my @stack = ();
if (exists $symbols->{$address}) {
my @localinlinestack = @{$symbols->{$address}};
for (my $i = $#localinlinestack; $i > 0; $i-=3) {
my $file = $localinlinestack[$i-1];
my $fn = $localinlinestack[$i-0];
if ($file eq "?" || $file eq ":0") {
$file = "??:0";
}
if ($fn eq '??') {
# If we can't get the symbol name, at least use the file information.
$fn = $file;
}
my $suffix = "[inline]";
if ($i == 2) {
$suffix = "";
}
push (@stack, $fn.$suffix);
}
}
else {
# If we can't get a symbol name, at least fill in the address.
push (@stack, $address);
}
return @stack;
}
sub ExtractSymbolLocation {
my $symbols = shift;
my $address = shift;
@ -2884,6 +2934,17 @@ sub FilterFrames {
return $result;
}
sub PrintCollapsedStacks {
my $symbols = shift;
my $profile = shift;
while (my ($stack_trace, $count) = each %$profile) {
my @address = split(/\n/, $stack_trace);
my @names = reverse ( map { ExtractSymbolNameInlineStack($symbols, $_) } @address );
printf("%s %d\n", join(";", @names), $count);
}
}
sub RemoveUninterestingFrames {
my $symbols = shift;
my $profile = shift;
@ -2895,6 +2956,25 @@ sub RemoveUninterestingFrames {
foreach my $name ('@JEMALLOC_PREFIX@calloc',
'cfree',
'@JEMALLOC_PREFIX@malloc',
'je_malloc_default',
'newImpl',
'void* newImpl',
'fallbackNewImpl',
'void* fallbackNewImpl',
'fallback_impl',
'void* fallback_impl',
'imalloc',
'int imalloc',
'imalloc_body',
'int imalloc_body',
'prof_alloc_prep',
'prof_tctx_t *prof_alloc_prep',
'prof_backtrace_impl',
'void prof_backtrace_impl',
'je_prof_backtrace',
'void je_prof_backtrace',
'je_prof_tctx_create',
'prof_tctx_t* prof_tctx_create',
'@JEMALLOC_PREFIX@free',
'@JEMALLOC_PREFIX@memalign',
'@JEMALLOC_PREFIX@posix_memalign',
@ -2903,10 +2983,16 @@ sub RemoveUninterestingFrames {
'@JEMALLOC_PREFIX@valloc',
'@JEMALLOC_PREFIX@realloc',
'@JEMALLOC_PREFIX@mallocx',
'irallocx_prof',
'void *irallocx_prof',
'@JEMALLOC_PREFIX@rallocx',
'do_rallocx',
'ixallocx_prof',
'size_t ixallocx_prof',
'@JEMALLOC_PREFIX@xallocx',
'@JEMALLOC_PREFIX@dallocx',
'@JEMALLOC_PREFIX@sdallocx',
'@JEMALLOC_PREFIX@sdallocx_noflags',
'tc_calloc',
'tc_cfree',
'tc_malloc',
@ -3015,6 +3101,8 @@ sub RemoveUninterestingFrames {
foreach my $a (@addrs) {
if (exists($symbols->{$a})) {
my $func = $symbols->{$a}->[0];
# Remove suffix in the symbols following space when filtering.
$func =~ s/ .*//;
if ($skip{$func} || ($func =~ m/$skip_regexp/)) {
# Throw away the portion of the backtrace seen so far, under the
# assumption that previous frames were for functions internal to the
@ -4437,16 +4525,54 @@ sub FindLibrary {
# For libc libraries, the copy in /usr/lib/debug contains debugging symbols
sub DebuggingLibrary {
my $file = shift;
if ($file =~ m|^/|) {
if (-f "/usr/lib/debug$file") {
return "/usr/lib/debug$file";
} elsif (-f "/usr/lib/debug$file.debug") {
return "/usr/lib/debug$file.debug";
}
if ($file !~ m|^/|) {
return undef;
}
# Find debug symbol file if it's named after the library's name.
if (-f "/usr/lib/debug$file") {
if($main::opt_debug) { print STDERR "found debug info for $file in /usr/lib/debug$file\n"; }
return "/usr/lib/debug$file";
} elsif (-f "/usr/lib/debug$file.debug") {
if($main::opt_debug) { print STDERR "found debug info for $file in /usr/lib/debug$file.debug\n"; }
return "/usr/lib/debug$file.debug";
}
if(!$main::opt_debug_syms_by_id) {
if($main::opt_debug) { print STDERR "no debug symbols found for $file\n" };
return undef;
}
# Find debug file if it's named after the library's build ID.
my $readelf = '';
if (!$main::gave_up_on_elfutils) {
$readelf = qx/eu-readelf -n ${file}/;
if ($?) {
print STDERR "Cannot run eu-readelf. To use --debug-syms-by-id you must be on Linux, with elfutils installed.\n";
$main::gave_up_on_elfutils = 1;
return undef;
}
my $buildID = $1 if $readelf =~ /Build ID: ([A-Fa-f0-9]+)/s;
if (defined $buildID && length $buildID > 0) {
my $symbolFile = '/usr/lib/debug/.build-id/' . substr($buildID, 0, 2) . '/' . substr($buildID, 2) . '.debug';
if (-e $symbolFile) {
if($main::opt_debug) { print STDERR "found debug symbol file $symbolFile for $file\n" };
return $symbolFile;
} else {
if($main::opt_debug) { print STDERR "no debug symbol file found for $file, build ID: $buildID\n" };
return undef;
}
}
}
if($main::opt_debug) { print STDERR "no debug symbols found for $file, build ID unknown\n" };
return undef;
}
# Parse text section header of a library using objdump
sub ParseTextSectionHeaderFromObjdump {
my $lib = shift;
@ -4556,7 +4682,65 @@ sub ParseTextSectionHeaderFromOtool {
return $r;
}
# Parse text section header of a library in OS X shared cache using dyld_info
sub ParseTextSectionHeaderFromDyldInfo {
my $lib = shift;
my $size = undef;
my $vma;
my $file_offset;
# Get dyld_info output from the library file to figure out how to
# map between mapped addresses and addresses in the library.
my $cmd = ShellEscape($obj_tool_map{"dyld_info"}, "-segments", $lib);
open(DYLD, "$cmd |") || error("$cmd: $!\n");
while (<DYLD>) {
s/\r//g; # turn windows-looking lines into unix-looking lines
# -segments:
# load-address segment section sect-size seg-size perm
# 0x1803E0000 __TEXT 112KB r.x
# 0x1803E4F34 __text 80960
# 0x1803F8B74 __auth_stubs 768
# 0x1803F8E74 __init_offsets 4
# 0x1803F8E78 __gcc_except_tab 1180
my @x = split;
if ($#x >= 2) {
if ($x[0] eq 'load-offset') {
# dyld_info should only be used for the shared lib.
return undef;
} elsif ($x[1] eq '__TEXT') {
$file_offset = $x[0];
} elsif ($x[1] eq '__text') {
$size = $x[2];
$vma = $x[0];
$file_offset = AddressSub($x[0], $file_offset);
last;
}
}
}
close(DYLD);
if (!defined($vma) || !defined($size) || !defined($file_offset)) {
return undef;
}
my $r = {};
$r->{size} = $size;
$r->{vma} = $vma;
$r->{file_offset} = $file_offset;
return $r;
}
sub ParseTextSectionHeader {
# obj_tool_map("dyld_info") is only defined if we're in a Mach-O environment
if (defined($obj_tool_map{"dyld_info"})) {
my $r = ParseTextSectionHeaderFromDyldInfo(@_);
if (defined($r)){
return $r;
}
}
# if dyld_info doesn't work, or we don't have it, fall back to otool
# obj_tool_map("otool") is only defined if we're in a Mach-O environment
if (defined($obj_tool_map{"otool"})) {
my $r = ParseTextSectionHeaderFromOtool(@_);
@ -4597,7 +4781,7 @@ sub ParseLibraries {
$offset = HexExtend($3);
$lib = $4;
$lib =~ s|\\|/|g; # turn windows-style paths into unix-style paths
} elsif ($l =~ /^\s*($h)-($h):\s*(\S+\.so(\.\d+)*)/) {
} elsif ($l =~ /^\s*($h)-($h):\s*(\S+\.(so|dll|dylib|bundle)(\.\d+)*)/) {
# Cooked line from DumpAddressMap. Example:
# 40000000-40015000: /lib/ld-2.3.2.so
$start = HexExtend($1);
@ -4614,6 +4798,15 @@ sub ParseLibraries {
$offset = HexExtend($3);
$lib = $4;
$lib =~ s|\\|/|g; # turn windows-style paths into unix-style paths
} elsif (($l =~ /^\s*($h)-($h):\s*(\S+)/) && ($3 eq $prog)) {
# PIEs and address space randomization do not play well with our
# default assumption that main executable is at lowest
# addresses. So we're detecting main executable from
# DumpAddressMap as well.
$start = HexExtend($1);
$finish = HexExtend($2);
$offset = $zero_offset;
$lib = $3;
}
# FreeBSD 10.0 virtual memory map /proc/curproc/map as defined in
# function procfs_doprocmap (sys/fs/procfs/procfs_map.c)
@ -4984,7 +5177,7 @@ sub MapToSymbols {
} else {
# MapSymbolsWithNM tags each routine with its starting address,
# useful in case the image has multiple occurrences of this
# routine. (It uses a syntax that resembles template paramters,
# routine. (It uses a syntax that resembles template parameters,
# that are automatically stripped out by ShortFunctionName().)
# addr2line does not provide the same information. So we check
# if nm disambiguated our symbol, and if so take the annotated
@ -5144,6 +5337,7 @@ sub ConfigureObjTools {
if ($file_type =~ /Mach-O/) {
# OS X uses otool to examine Mach-O files, rather than objdump.
$obj_tool_map{"otool"} = "otool";
$obj_tool_map{"dyld_info"} = "dyld_info";
$obj_tool_map{"addr2line"} = "false"; # no addr2line
$obj_tool_map{"objdump"} = "false"; # no objdump
}
@ -5336,7 +5530,7 @@ sub GetProcedureBoundaries {
# "nm -f $image" is supposed to fail on GNU nm, but if:
#
# a. $image starts with [BbSsPp] (for example, bin/foo/bar), AND
# b. you have a.out in your current directory (a not uncommon occurence)
# b. you have a.out in your current directory (a not uncommon occurrence)
#
# then "nm -f $image" succeeds because -f only looks at the first letter of
# the argument, which looks valid because it's [BbSsPp], and then since
@ -5364,7 +5558,7 @@ sub GetProcedureBoundaries {
my $demangle_flag = "";
my $cppfilt_flag = "";
my $to_devnull = ">$dev_null 2>&1";
if (system(ShellEscape($nm, "--demangle", "image") . $to_devnull) == 0) {
if (system(ShellEscape($nm, "--demangle", $image) . $to_devnull) == 0) {
# In this mode, we do "nm --demangle <foo>"
$demangle_flag = "--demangle";
$cppfilt_flag = "";

1706
build-aux/config.guess vendored

File diff suppressed because it is too large

3455
build-aux/config.sub vendored

File diff suppressed because it is too large

@ -115,7 +115,7 @@ fi
if [ x"$dir_arg" != x ]; then
dst=$src
src=""
if [ -d $dst ]; then
instcmd=:
else
@ -124,7 +124,7 @@ if [ x"$dir_arg" != x ]; then
else
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.
if [ -f $src -o -d $src ]
@ -134,7 +134,7 @@ else
echo "install: $src does not exist"
exit 1
fi
if [ x"$dst" = x ]
then
echo "install: no destination specified"
@ -201,17 +201,17 @@ else
# If we're going to rename the final executable, determine the name now.
if [ x"$transformarg" = x ]
if [ x"$transformarg" = x ]
then
dstfile=`basename $dst`
else
dstfile=`basename $dst $transformbasename |
dstfile=`basename $dst $transformbasename |
sed $transformarg`$transformbasename
fi
# don't allow the sed command to completely eliminate the filename
if [ x"$dstfile" = x ]
if [ x"$dstfile" = x ]
then
dstfile=`basename $dst`
else
@ -242,7 +242,7 @@ else
# Now rename the file to the real destination.
$doit $rmcmd -f $dstdir/$dstfile &&
$doit $mvcmd $dsttmp $dstdir/$dstfile
$doit $mvcmd $dsttmp $dstdir/$dstfile
fi &&

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,145 @@
# jemalloc profiling
This describes the mathematical basis behind jemalloc's profiling implementation, as well as the implementation tricks that make it effective. Historically, the jemalloc profiling design simply copied tcmalloc's. The implementation has since diverged, due to both the desire to record additional information, and to correct some biasing bugs.
Note: this document is markdown with embedded LaTeX; different markdown renderers may not produce the expected output. Viewing with `pandoc -s PROFILING_INTERNALS.md -o PROFILING_INTERNALS.pdf` is recommended.
## Some tricks in our implementation toolbag
### Sampling
Recording our metadata is quite expensive; we need to walk up the stack to get a stack trace. On top of that, we need to allocate storage to record that stack trace, and stick it somewhere where a profile-dumping call can find it. That call might happen on another thread, so we'll probably need to take a lock to do so. These costs are quite large compared to the average cost of an allocation. To manage this, we'll only sample some fraction of allocations. This will miss some of them, so our data will be incomplete, but we'll try to make up for it. We can tune our sampling rate to balance accuracy and performance.
### Fast Bernoulli sampling
Compared to our fast paths, even a `coinflip(p)` function can be quite expensive. Having to do a random-number generation and some floating point operations would be a sizeable relative cost. However (as pointed out in [[Vitter, 1987](https://dl.acm.org/doi/10.1145/23002.23003)]), if we can orchestrate our algorithm so that many of our `coinflip` calls share their parameter value, we can do better. We can sample from the geometric distribution, and initialize a counter with the result. When the counter hits 0, the `coinflip` function returns true (and reinitializes its internal counter).
This can let us do a random-number generation once per (logical) coinflip that comes up heads, rather than once per (logical) coinflip. Since we expect to sample relatively rarely, this can be a large win.
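A minimal standalone sketch of this counter trick (illustrative only; it is not jemalloc's actual sampler, and the RNG below is a placeholder):

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Coinflips remaining until the next "heads".  A real implementation would
 * prime this per thread before the first call. */
static uint64_t flips_until_heads = 0;

/* Number of tails before the next heads, drawn from a geometric
 * distribution with success probability p (inverse-transform sampling). */
static uint64_t
geometric_draw(double p) {
    double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* u in (0, 1) */
    return (uint64_t)floor(log(u) / log(1.0 - p));
}

static bool
coinflip(double p) {
    if (flips_until_heads > 0) {
        flips_until_heads--;              /* fast path: one decrement, no RNG */
        return false;
    }
    flips_until_heads = geometric_draw(p); /* slow path: runs only on heads */
    return true;
}
```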
### Fast-path / slow-path thinking
Most programs have a skewed distribution of allocations. Smaller allocations are much more frequent than large ones, but shorter lived and less common as a fraction of program memory. "Small" and "large" are necessarily sort of fuzzy terms, but if we define "small" as "allocations jemalloc puts into slabs" and "large" as the others, then it's not uncommon for small allocations to be hundreds of times more frequent than large ones, but take up around half the amount of heap space as large ones. Moreover, small allocations tend to be much cheaper than large ones (often by a factor of 20-30): they're more likely to hit in thread caches, less likely to have to do an mmap, and cheaper to fill (by the user) once the allocation has been returned.
## An unbiased estimator of space consumption from (almost) arbitrary sampling strategies
Suppose we have a sampling strategy that meets the following criteria:
- One allocation being sampled is independent of other allocations being sampled.
- Each allocation has a non-zero probability of being sampled.
We can then estimate the bytes in live allocations through some particular stack trace as:
$$ \sum_i S_i I_i \frac{1}{\mathrm{E}[I_i]} $$
where the sum ranges over some index variable of live allocations from that stack, $S_i$ is the size of the $i$'th allocation, and $I_i$ is an indicator random variable for whether or not the $i$'th allocation is sampled. $S_i$ and $\mathrm{E}[I_i]$ are constants (the program allocations are fixed; the random variables are the sampling decisions), so taking the expectation we get
$$ \sum_i S_i \mathrm{E}[I_i] \frac{1}{\mathrm{E}[I_i]}.$$
This is of course $\sum_i S_i$, as we want (and, a similar calculation could be done for allocation counts as well).
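Concretely, for a single allocation of size $S$ sampled with probability $p = \mathrm{E}[I]$, the estimator contributes $S/p$ when the allocation is sampled and $0$ otherwise, so

$$ \mathrm{E}\left[S I \frac{1}{p}\right] = \frac{S}{p} \mathrm{E}[I] = \frac{S}{p} \cdot p = S; $$

every allocation contributes its true size in expectation no matter how small $p$ is, and only the variance grows as $p$ shrinks.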
This is a fairly general strategy; note that while we require that sampling decisions be independent of one another's outcomes, they don't have to be independent of previous allocations, total bytes allocated, etc. You can imagine strategies that:
- Sample allocations at program startup at a higher rate than subsequent allocations
- Sample even-indexed allocations more frequently than odd-indexed ones (so long as no allocation has zero sampling probability)
- Let threads declare themselves as high-sampling-priority, and sample their allocations at an increased rate.
These can all be fit into this framework to give an unbiased estimator.
## Evaluating sampling strategies
Not all strategies for picking allocations to sample are equally good, of course. Among unbiased estimators, the lower the variance, the lower the mean squared error. Using the estimator above, the variance is:
$$
\begin{aligned}
& \mathrm{Var}[\sum_i S_i I_i \frac{1}{\mathrm{E}[I_i]}] \\
=& \sum_i \mathrm{Var}[S_i I_i \frac{1}{\mathrm{E}[I_i]}] \\
=& \sum_i \frac{S_i^2}{\mathrm{E}[I_i]^2} \mathrm{Var}[I_i] \\
=& \sum_i \frac{S_i^2}{\mathrm{E}[I_i]^2} \mathrm{E}[I_i](1 - \mathrm{E}[I_i]) \\
=& \sum_i S_i^2 \frac{1 - \mathrm{E}[I_i]}{\mathrm{E}[I_i]}.
\end{aligned}
$$
We can use this formula to compare various strategy choices. All else being equal, lower-variance strategies are better.
## Possible sampling strategies
Because of the desire to avoid the fast-path costs, we'd like to use our Bernoulli trick if possible. There are two obvious counters to use: a coinflip per allocation, and a coinflip per byte allocated.
### Bernoulli sampling per-allocation
An obvious strategy is to pick some large $N$, and give each allocation a $1/N$ chance of being sampled. This would let us use our Bernoulli-via-Geometric trick. Using the formula from above, we can compute the variance as:
$$ \sum_i S_i^2 \frac{1 - \frac{1}{N}}{\frac{1}{N}} = (N-1) \sum_i S_i^2.$$
That is, an allocation of size $Z$ contributes a term of $(N-1)Z^2$ to the variance.
### Bernoulli sampling per-byte
Another option we have is to pick some rate $R$, and give each byte a $1/R$ chance of being picked for sampling (at which point we would sample its contained allocation). The chance of an allocation of size $Z$ being sampled, then, is
$$1-(1-\frac{1}{R})^{Z}$$
and an allocation of size $Z$ contributes a term of
$$Z^2 \frac{(1-\frac{1}{R})^{Z}}{1-(1-\frac{1}{R})^{Z}}.$$
In practical settings, $R$ is large, and so this is well-approximated by
$$Z^2 \frac{e^{-Z/R}}{1 - e^{-Z/R}} .$$
Just to get a sense of the dynamics here, let's look at the behavior for various values of $Z$. When $Z$ is small relative to $R$, we can use $e^z \approx 1 + z$, and conclude that the variance contributed by a small-$Z$ allocation is around
$$Z^2 \frac{1-Z/R}{Z/R} \approx RZ.$$
When $Z$ is comparable to $R$, the variance term is near $Z^2$ (we have $\frac{e^{-Z/R}}{1 - e^{-Z/R}} = 1$ when $Z/R = \ln 2 \approx 0.693$). When $Z$ is large relative to $R$, the variance term goes to zero.
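Summarizing the three regimes (the middle case uses $e^{-1}/(1 - e^{-1}) \approx 0.58$):

$$ Z^2 \frac{e^{-Z/R}}{1 - e^{-Z/R}} \approx \begin{cases} RZ & Z \ll R \\ 0.58\, Z^2 & Z = R \\ Z^2 e^{-Z/R} \to 0 & Z \gg R. \end{cases} $$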
## Picking a sampling strategy
The fast-path/slow-path dynamics of allocation patterns point us towards the per-byte sampling approach:
- The quadratic increase in variance per allocation in the first approach is quite costly when heaps have a non-negligible portion of their bytes in those allocations, which is practically often the case.
- The Bernoulli-per-byte approach shifts more of its samples towards large allocations, which are already a slow-path.
- We drive several tickers (e.g. tcache gc) by bytes allocated, and report bytes-allocated as a user-visible statistic, so we have to do all the necessary bookkeeping anyways.
Indeed, this is the approach we use in jemalloc. Our heap dumps record the size of the allocation and the sampling rate $R$, and jeprof unbiases by dividing by $1 - e^{-Z/R}$. The framework above would suggest dividing by $1-(1-1/R)^Z$; instead, we use the fact that $R$ is large in practical situations, and so $e^{-Z/R}$ is a good approximation (and faster to compute). (Equivalently, we may also see this as the factor that falls out from viewing sampling as a Poisson process directly).
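A hedged sketch of what this looks like in code (names and structure are illustrative, not jemalloc's or jeprof's actual implementation): the per-byte decision is a countdown of bytes until the next sample, re-armed with an exponential draw of mean $R$, and each sampled allocation of size $Z$ is weighted by $1/(1 - e^{-Z/R})$ at analysis time.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

static double sample_rate_R = 1 << 20;   /* e.g. one sample per ~1 MiB */
static double bytes_until_sample = 0.0;

/* Per-byte sampling: decrement by the allocation size and sample when the
 * countdown crosses zero.  `u` is a uniform random draw in (0, 1). */
static bool
should_sample(size_t alloc_size, double u) {
    bytes_until_sample -= (double)alloc_size;
    if (bytes_until_sample > 0.0) {
        return false;                    /* fast path */
    }
    /* Re-arm with an exponential draw of mean R (Poisson-process view). */
    bytes_until_sample = -sample_rate_R * log(u);
    return true;
}

/* Weight a sampled allocation of size Z so that summing the weighted
 * samples gives an unbiased estimate of live bytes. */
static double
unbiased_bytes(size_t alloc_size) {
    double z = (double)alloc_size;
    return z / (1.0 - exp(-z / sample_rate_R));
}
```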
## Consequences for heap dump consumers
Using this approach means that there are a few things users need to be aware of.
### Stack counts are not proportional to allocation frequencies
If one stack appears twice as often as another, this by itself does not imply that it allocates twice as often. Consider the case in which there are only two types of allocating call stacks in a program. Stack A allocates 8 bytes, and occurs a million times in a program. Stack B allocates 8 MB, and occurs just once in a program. If our sampling rate $R$ is about 1MB, we expect stack A to show up about 8 times, and stack B to show up once. Stack A isn't 8 times more frequent than stack B, though; it's a million times more frequent.
### Aggregation must be done after unbiasing samples
Some tools manually parse heap dump output, and aggregate across stacks (or across program runs) to provide wider-scale data analyses. When doing this aggregation, though, it's important to unbias-and-then-sum, rather than sum-and-then-unbias. Reusing our example from the previous section: suppose we collect heap dumps of the program from 1 million machines. We then have 8 million samples of stack A (8 per machine, each of 8 bytes), and 1 million samples of stack B (1 per machine, each of 8 MB).
If we sum first and then unbias using the formula $1 - e^{-Z/R}$, we get:
$$Z = 8,000,000 * 8 bytes = 64MB$$
$$64MB / (1 - e^{-64MB/1MB}) \approx 64MB (Stack A)$$
$$Z = 1,000,000 * 8MB = 8TB$$
$$8TB / (1 - e^{-8TB/1MB}) \approx 8TB (Stack B)$$
Clearly we are unbiasing by an infinitesimal amount, which dramatically underreports the amount of memory allocated by stack A. Whereas if we unbias first and then sum:
$$Z = 8 bytes$$
$$8 bytes / (1 - e^{-8 bytes/1MB}) \approx 1MB$$
$$1MB * 8,000,000 = 8TB (Stack A)$$
$$Z = 8MB$$
$$8MB / (1 - e^{-8MB/1MB}) \approx 8MB$$
$$8MB * 1,000,000 = 8TB (Stack B)$$
## An avenue for future exploration
While the framework we laid out above is pretty general, as an engineering decision we're only interested in fairly simple approaches (i.e. ones for which the chance of an allocation being sampled depends only on its size). Our job is then: for each size class $Z$, pick a probability $p_Z$ that an allocation of that size will be sampled. We made some handwave-y references to statistical distributions to justify our choices, but there's no reason we need to pick them that way. Any set of non-zero probabilities is a valid choice.
The real limiting factor in our ability to reduce estimator variance is the fact that sampling is expensive; we want to make sure we only do it on a small fraction of allocations. Our goal, then, is to pick the $p_Z$ to minimize variance given some maximum sampling rate $P$. If we define $a_Z$ to be the fraction of allocations of size $Z$, and $l_Z$ to be the fraction of allocations of size $Z$ still alive at the time of a heap dump, then we can phrase this as an optimization problem over the choices of $p_Z$:
Minimize
$$ \sum_Z Z^2 l_Z \frac{1-p_Z}{p_Z} $$
subject to
$$ \sum_Z a_Z p_Z \leq P $$
Ignoring a term that doesn't depend on $p_Z$, the objective is minimized whenever
$$ \sum_Z Z^2 l_Z \frac{1}{p_Z} $$
is. For a particular program, $l_Z$ and $a_Z$ are just numbers that can be obtained (exactly) from existing stats introspection facilities, and we have a fairly tractable convex optimization problem (it can be framed as a second-order cone program). It would be interesting to evaluate, for various common allocation patterns, how well our current strategy adapts. Do our actual choices for $p_Z$ closely correspond to the optimal ones? How close is the variance of our choices to the variance of the optimal strategy?
You can imagine an implementation that actually goes all the way, and makes $p_Z$ selections a tuning parameter. I don't think this is a good use of development time for the foreseeable future; but I do wonder about the answers to some of these questions.
## Implementation realities
The nice story above is at least partially a lie. Initially, jeprof (copying its logic from pprof) had the sum-then-unbias error described above. The current version of jemalloc does the unbiasing step on a per-allocation basis internally, so that we're always tracking what the unbiased numbers "should" be. The problem is, actually surfacing those unbiased numbers would require a breaking change to jeprof (and the various already-deployed tools that have copied its logic). Instead, we use a little bit more trickery. Since we know at dump time the numbers we want jeprof to report, we simply choose the values we'll output so that the jeprof numbers will match the true numbers. The math is described in `src/prof_data.c` (where the only cleverness is a change of variables that lets the exponentials fall out).
This has the effect of making the output of jeprof (and related tools) correct, while making its inputs incorrect. This can be annoying to human readers of raw profiling dump output.

File diff suppressed because one or more lines are too long

[Image file added: 16 KiB]

@ -1,96 +1,125 @@
#ifndef JEMALLOC_INTERNAL_ARENA_EXTERNS_H
#define JEMALLOC_INTERNAL_ARENA_EXTERNS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/extent_dss.h"
#include "jemalloc/internal/hook.h"
#include "jemalloc/internal/pages.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/stats.h"
/*
* When the amount of pages to be purged exceeds this amount, deferred purge
* should happen.
*/
#define ARENA_DEFERRED_PURGE_NPAGES_THRESHOLD UINT64_C(1024)
extern ssize_t opt_dirty_decay_ms;
extern ssize_t opt_muzzy_decay_ms;
extern const arena_bin_info_t arena_bin_info[NBINS];
extern percpu_arena_mode_t opt_percpu_arena;
extern const char *percpu_arena_mode_names[];
extern const char *const percpu_arena_mode_names[];
extern const uint64_t h_steps[SMOOTHSTEP_NSTEPS];
extern malloc_mutex_t arenas_lock;
extern div_info_t arena_binind_div_info[SC_NBINS];
void arena_stats_large_nrequests_add(tsdn_t *tsdn, arena_stats_t *arena_stats,
szind_t szind, uint64_t nrequests);
void arena_stats_mapped_add(tsdn_t *tsdn, arena_stats_t *arena_stats,
size_t size);
void arena_basic_stats_merge(tsdn_t *tsdn, arena_t *arena,
unsigned *nthreads, const char **dss, ssize_t *dirty_decay_ms,
ssize_t *muzzy_decay_ms, size_t *nactive, size_t *ndirty, size_t *nmuzzy);
extern emap_t arena_emap_global;
extern size_t opt_oversize_threshold;
extern size_t oversize_threshold;
extern bool opt_huge_arena_pac_thp;
extern pac_thp_t huge_arena_pac_thp;
/*
* arena_bin_offsets[binind] is the offset of the first bin shard for size class
* binind.
*/
extern uint32_t arena_bin_offsets[SC_NBINS];
void arena_basic_stats_merge(tsdn_t *tsdn, arena_t *arena, unsigned *nthreads,
const char **dss, ssize_t *dirty_decay_ms, ssize_t *muzzy_decay_ms,
size_t *nactive, size_t *ndirty, size_t *nmuzzy);
void arena_stats_merge(tsdn_t *tsdn, arena_t *arena, unsigned *nthreads,
const char **dss, ssize_t *dirty_decay_ms, ssize_t *muzzy_decay_ms,
size_t *nactive, size_t *ndirty, size_t *nmuzzy, arena_stats_t *astats,
malloc_bin_stats_t *bstats, malloc_large_stats_t *lstats);
void arena_extents_dirty_dalloc(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent);
#ifdef JEMALLOC_JET
size_t arena_slab_regind(extent_t *slab, szind_t binind, const void *ptr);
#endif
extent_t *arena_extent_alloc_large(tsdn_t *tsdn, arena_t *arena,
size_t usize, size_t alignment, bool *zero);
void arena_extent_dalloc_large_prep(tsdn_t *tsdn, arena_t *arena,
extent_t *extent);
void arena_extent_ralloc_large_shrink(tsdn_t *tsdn, arena_t *arena,
extent_t *extent, size_t oldsize);
void arena_extent_ralloc_large_expand(tsdn_t *tsdn, arena_t *arena,
extent_t *extent, size_t oldsize);
ssize_t arena_dirty_decay_ms_get(arena_t *arena);
bool arena_dirty_decay_ms_set(tsdn_t *tsdn, arena_t *arena, ssize_t decay_ms);
ssize_t arena_muzzy_decay_ms_get(arena_t *arena);
bool arena_muzzy_decay_ms_set(tsdn_t *tsdn, arena_t *arena, ssize_t decay_ms);
void arena_decay(tsdn_t *tsdn, arena_t *arena, bool is_background_thread,
bool all);
void arena_reset(tsd_t *tsd, arena_t *arena);
void arena_destroy(tsd_t *tsd, arena_t *arena);
void arena_tcache_fill_small(tsdn_t *tsdn, arena_t *arena, tcache_t *tcache,
tcache_bin_t *tbin, szind_t binind, uint64_t prof_accumbytes);
void arena_alloc_junk_small(void *ptr, const arena_bin_info_t *bin_info,
bool zero);
bin_stats_data_t *bstats, arena_stats_large_t *lstats, pac_estats_t *estats,
hpa_shard_stats_t *hpastats);
void arena_handle_deferred_work(tsdn_t *tsdn, arena_t *arena);
edata_t *arena_extent_alloc_large(
tsdn_t *tsdn, arena_t *arena, size_t usize, size_t alignment, bool zero);
void arena_extent_dalloc_large_prep(
tsdn_t *tsdn, arena_t *arena, edata_t *edata);
void arena_extent_ralloc_large_shrink(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t oldusize);
void arena_extent_ralloc_large_expand(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t oldusize);
bool arena_decay_ms_set(
tsdn_t *tsdn, arena_t *arena, extent_state_t state, ssize_t decay_ms);
ssize_t arena_decay_ms_get(arena_t *arena, extent_state_t state);
void arena_decay(
tsdn_t *tsdn, arena_t *arena, bool is_background_thread, bool all);
uint64_t arena_time_until_deferred(tsdn_t *tsdn, arena_t *arena);
void arena_do_deferred_work(tsdn_t *tsdn, arena_t *arena);
void arena_reset(tsd_t *tsd, arena_t *arena);
void arena_destroy(tsd_t *tsd, arena_t *arena);
cache_bin_sz_t arena_ptr_array_fill_small(tsdn_t *tsdn, arena_t *arena,
szind_t binind, cache_bin_ptr_array_t *arr, const cache_bin_sz_t nfill_min,
const cache_bin_sz_t nfill_max, cache_bin_stats_t merge_stats);
typedef void (arena_dalloc_junk_small_t)(void *, const arena_bin_info_t *);
extern arena_dalloc_junk_small_t *JET_MUTABLE arena_dalloc_junk_small;
void *arena_malloc_hard(tsdn_t *tsdn, arena_t *arena, size_t size, szind_t ind,
bool zero, bool slab);
void *arena_palloc(tsdn_t *tsdn, arena_t *arena, size_t usize, size_t alignment,
bool zero, bool slab, tcache_t *tcache);
void arena_prof_promote(
tsdn_t *tsdn, void *ptr, size_t usize, size_t bumped_usize);
void arena_dalloc_promoted(
tsdn_t *tsdn, void *ptr, tcache_t *tcache, bool slow_path);
void arena_slab_dalloc(tsdn_t *tsdn, arena_t *arena, edata_t *slab);
void *arena_malloc_hard(tsdn_t *tsdn, arena_t *arena, size_t size,
szind_t ind, bool zero);
void *arena_palloc(tsdn_t *tsdn, arena_t *arena, size_t usize,
size_t alignment, bool zero, tcache_t *tcache);
void arena_prof_promote(tsdn_t *tsdn, const void *ptr, size_t usize);
void arena_dalloc_promoted(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
bool slow_path);
void arena_dalloc_bin_junked_locked(tsdn_t *tsdn, arena_t *arena,
extent_t *extent, void *ptr);
void arena_dalloc_small(tsdn_t *tsdn, void *ptr);
bool arena_ralloc_no_move(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t extra, bool zero);
void arena_dalloc_small(tsdn_t *tsdn, void *ptr);
void arena_ptr_array_flush(tsd_t *tsd, szind_t binind,
cache_bin_ptr_array_t *arr, unsigned nflush, bool small,
arena_t *stats_arena, cache_bin_stats_t merge_stats);
bool arena_ralloc_no_move(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t extra, bool zero, size_t *newsize);
void *arena_ralloc(tsdn_t *tsdn, arena_t *arena, void *ptr, size_t oldsize,
size_t size, size_t alignment, bool zero, tcache_t *tcache);
dss_prec_t arena_dss_prec_get(arena_t *arena);
bool arena_dss_prec_set(arena_t *arena, dss_prec_t dss_prec);
size_t size, size_t alignment, bool zero, bool slab, tcache_t *tcache,
hook_ralloc_args_t *hook_args);
dss_prec_t arena_dss_prec_get(arena_t *arena);
ehooks_t *arena_get_ehooks(arena_t *arena);
extent_hooks_t *arena_set_extent_hooks(
tsd_t *tsd, arena_t *arena, extent_hooks_t *extent_hooks);
bool arena_dss_prec_set(arena_t *arena, dss_prec_t dss_prec);
void arena_name_get(arena_t *arena, char *name);
void arena_name_set(arena_t *arena, const char *name);
ssize_t arena_dirty_decay_ms_default_get(void);
bool arena_dirty_decay_ms_default_set(ssize_t decay_ms);
bool arena_dirty_decay_ms_default_set(ssize_t decay_ms);
ssize_t arena_muzzy_decay_ms_default_get(void);
bool arena_muzzy_decay_ms_default_set(ssize_t decay_ms);
bool arena_muzzy_decay_ms_default_set(ssize_t decay_ms);
bool arena_retain_grow_limit_get_set(
tsd_t *tsd, arena_t *arena, size_t *old_limit, size_t *new_limit);
unsigned arena_nthreads_get(arena_t *arena, bool internal);
void arena_nthreads_inc(arena_t *arena, bool internal);
void arena_nthreads_dec(arena_t *arena, bool internal);
size_t arena_extent_sn_next(arena_t *arena);
arena_t *arena_new(tsdn_t *tsdn, unsigned ind, extent_hooks_t *extent_hooks);
void arena_boot(void);
void arena_prefork0(tsdn_t *tsdn, arena_t *arena);
void arena_prefork1(tsdn_t *tsdn, arena_t *arena);
void arena_prefork2(tsdn_t *tsdn, arena_t *arena);
void arena_prefork3(tsdn_t *tsdn, arena_t *arena);
void arena_prefork4(tsdn_t *tsdn, arena_t *arena);
void arena_prefork5(tsdn_t *tsdn, arena_t *arena);
void arena_prefork6(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_parent(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_child(tsdn_t *tsdn, arena_t *arena);
void arena_nthreads_inc(arena_t *arena, bool internal);
void arena_nthreads_dec(arena_t *arena, bool internal);
arena_t *arena_new(tsdn_t *tsdn, unsigned ind, const arena_config_t *config);
bool arena_init_huge(tsdn_t *tsdn, arena_t *a0);
arena_t *arena_choose_huge(tsd_t *tsd);
size_t arena_fill_small_fresh(tsdn_t *tsdn, arena_t *arena, szind_t binind,
void **ptrs, size_t nfill, bool zero);
bool arena_boot(sc_data_t *sc_data, base_t *base, bool hpa);
void arena_prefork0(tsdn_t *tsdn, arena_t *arena);
void arena_prefork1(tsdn_t *tsdn, arena_t *arena);
void arena_prefork2(tsdn_t *tsdn, arena_t *arena);
void arena_prefork3(tsdn_t *tsdn, arena_t *arena);
void arena_prefork4(tsdn_t *tsdn, arena_t *arena);
void arena_prefork5(tsdn_t *tsdn, arena_t *arena);
void arena_prefork6(tsdn_t *tsdn, arena_t *arena);
void arena_prefork7(tsdn_t *tsdn, arena_t *arena);
void arena_prefork8(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_parent(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_child(tsdn_t *tsdn, arena_t *arena);
#endif /* JEMALLOC_INTERNAL_ARENA_EXTERNS_H */


@ -1,9 +1,12 @@
#ifndef JEMALLOC_INTERNAL_ARENA_INLINES_A_H
#define JEMALLOC_INTERNAL_ARENA_INLINES_A_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_structs.h"
static inline unsigned
arena_ind_get(const arena_t *arena) {
return base_ind_get(arena->base);
return arena->ind;
}
static inline void
@ -21,37 +24,4 @@ arena_internal_get(arena_t *arena) {
return atomic_load_zu(&arena->stats.internal, ATOMIC_RELAXED);
}
static inline bool
arena_prof_accum(tsdn_t *tsdn, arena_t *arena, uint64_t accumbytes) {
cassert(config_prof);
if (likely(prof_interval == 0)) {
return false;
}
return prof_accum_add(tsdn, &arena->prof_accum, accumbytes);
}
static inline void
percpu_arena_update(tsd_t *tsd, unsigned cpu) {
assert(have_percpu_arena);
arena_t *oldarena = tsd_arena_get(tsd);
assert(oldarena != NULL);
unsigned oldind = arena_ind_get(oldarena);
if (oldind != cpu) {
unsigned newind = cpu;
arena_t *newarena = arena_get(tsd_tsdn(tsd), newind, true);
assert(newarena != NULL);
/* Set new arena/tcache associations. */
arena_migrate(tsd, oldind, newind);
tcache_t *tcache = tcache_get(tsd);
if (tcache != NULL) {
tcache_arena_reassociate(tsd_tsdn(tsd), tcache,
newarena);
}
}
}
#endif /* JEMALLOC_INTERNAL_ARENA_INLINES_A_H */


@ -1,134 +1,235 @@
#ifndef JEMALLOC_INTERNAL_ARENA_INLINES_B_H
#define JEMALLOC_INTERNAL_ARENA_INLINES_B_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_structs.h"
#include "jemalloc/internal/bin_inlines.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/jemalloc_internal_inlines_b.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/large_externs.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/prof_externs.h"
#include "jemalloc/internal/prof_structs.h"
#include "jemalloc/internal/rtree.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/safety_check.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/tcache_inlines.h"
#include "jemalloc/internal/ticker.h"
static inline szind_t
arena_bin_index(arena_t *arena, arena_bin_t *bin) {
szind_t binind = (szind_t)(bin - arena->bins);
assert(binind < NBINS);
return binind;
static inline arena_t *
arena_get_from_edata(edata_t *edata) {
return (arena_t *)atomic_load_p(
&arenas[edata_arena_ind_get(edata)], ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE prof_tctx_t *
arena_prof_tctx_get(tsdn_t *tsdn, const void *ptr, alloc_ctx_t *alloc_ctx) {
cassert(config_prof);
assert(ptr != NULL);
/* Static check. */
if (alloc_ctx == NULL) {
const extent_t *extent = iealloc(tsdn, ptr);
if (unlikely(!extent_slab_get(extent))) {
return large_prof_tctx_get(tsdn, extent);
}
} else {
if (unlikely(!alloc_ctx->slab)) {
return large_prof_tctx_get(tsdn, iealloc(tsdn, ptr));
}
JEMALLOC_ALWAYS_INLINE arena_t *
arena_choose_maybe_huge(tsd_t *tsd, arena_t *arena, size_t size) {
if (arena != NULL) {
return arena;
}
return (prof_tctx_t *)(uintptr_t)1U;
/*
* For huge allocations, use the dedicated huge arena if both are true:
* 1) the caller is using automatic arena selection (i.e. arena == NULL),
* and 2) the thread is not assigned to a manual arena.
*/
arena_t *tsd_arena = tsd_arena_get(tsd);
if (tsd_arena == NULL) {
tsd_arena = arena_choose(tsd, NULL);
}
size_t threshold = atomic_load_zu(
&tsd_arena->pa_shard.pac.oversize_threshold, ATOMIC_RELAXED);
if (unlikely(size >= threshold) && arena_is_auto(tsd_arena)) {
return arena_choose_huge(tsd);
}
return tsd_arena;
}
JEMALLOC_ALWAYS_INLINE bool
large_dalloc_safety_checks(edata_t *edata, const void *ptr, size_t input_size) {
if (!config_opt_safety_checks) {
return false;
}
/*
* Eagerly detect double free and sized dealloc bugs for large sizes.
* The cost is low enough (as edata will be accessed anyway) to be
* enabled all the time.
*/
if (unlikely(edata == NULL
|| edata_state_get(edata) != extent_state_active)) {
safety_check_fail(
"Invalid deallocation detected: "
"pages being freed (%p) not currently active, "
"possibly caused by double free bugs.",
ptr);
return true;
}
if (unlikely(input_size != edata_usize_get(edata)
|| input_size > SC_LARGE_MAXCLASS)) {
safety_check_fail_sized_dealloc(/* current_dealloc */ true, ptr,
/* true_size */ edata_usize_get(edata), input_size);
return true;
}
return false;
}
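
For illustration, the kind of misuse the check above is designed to catch looks like the following (a hedged sketch, not code from this change; it assumes a build with safety checks enabled and uses the public sdallocx() entry point):

/* Deliberately wrong sized deallocation -- do not do this in real code. */
#include <stdlib.h>
#include <jemalloc/jemalloc.h>

void
sized_dealloc_bug_example(void) {
	void *p = malloc(2 * 1024 * 1024);	/* served as a large allocation */
	/*
	 * The size passed here does not match the allocation's usable size,
	 * so the check above reports a sized-deallocation bug instead of
	 * silently corrupting allocator state.
	 */
	sdallocx(p, 1024 * 1024, 0);
}
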
JEMALLOC_ALWAYS_INLINE void
arena_prof_tctx_set(tsdn_t *tsdn, const void *ptr, size_t usize,
alloc_ctx_t *alloc_ctx, prof_tctx_t *tctx) {
arena_prof_info_get(tsd_t *tsd, const void *ptr, emap_alloc_ctx_t *alloc_ctx,
prof_info_t *prof_info, bool reset_recent) {
cassert(config_prof);
assert(ptr != NULL);
assert(prof_info != NULL);
edata_t *edata = NULL;
bool is_slab;
/* Static check. */
if (alloc_ctx == NULL) {
edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
is_slab = edata_slab_get(edata);
} else if (unlikely(!(is_slab = alloc_ctx->slab))) {
edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
}
if (unlikely(!is_slab)) {
/* edata must have been initialized at this point. */
assert(edata != NULL);
size_t usize = (alloc_ctx == NULL)
? edata_usize_get(edata)
: emap_alloc_ctx_usize_get(alloc_ctx);
if (reset_recent
&& large_dalloc_safety_checks(edata, ptr, usize)) {
prof_info->alloc_tctx = PROF_TCTX_SENTINEL;
return;
}
large_prof_info_get(tsd, edata, prof_info, reset_recent);
} else {
prof_info->alloc_tctx = PROF_TCTX_SENTINEL;
/*
* No need to set other fields in prof_info; they will never be
* accessed if alloc_tctx == PROF_TCTX_SENTINEL.
*/
}
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_tctx_reset(
tsd_t *tsd, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
cassert(config_prof);
assert(ptr != NULL);
/* Static check. */
if (alloc_ctx == NULL) {
extent_t *extent = iealloc(tsdn, ptr);
if (unlikely(!extent_slab_get(extent))) {
large_prof_tctx_set(tsdn, extent, tctx);
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
if (unlikely(!edata_slab_get(edata))) {
large_prof_tctx_reset(edata);
}
} else {
if (unlikely(!alloc_ctx->slab)) {
large_prof_tctx_set(tsdn, iealloc(tsdn, ptr), tctx);
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
large_prof_tctx_reset(edata);
}
}
}
static inline void
arena_prof_tctx_reset(tsdn_t *tsdn, const void *ptr, prof_tctx_t *tctx) {
JEMALLOC_ALWAYS_INLINE void
arena_prof_tctx_reset_sampled(tsd_t *tsd, const void *ptr) {
cassert(config_prof);
assert(ptr != NULL);
extent_t *extent = iealloc(tsdn, ptr);
assert(!extent_slab_get(extent));
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
assert(!edata_slab_get(edata));
large_prof_tctx_reset(tsdn, extent);
large_prof_tctx_reset(edata);
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_info_set(
tsd_t *tsd, edata_t *edata, prof_tctx_t *tctx, size_t size) {
cassert(config_prof);
assert(!edata_slab_get(edata));
large_prof_info_set(edata, tctx, size);
}
JEMALLOC_ALWAYS_INLINE void
arena_decay_ticks(tsdn_t *tsdn, arena_t *arena, unsigned nticks) {
tsd_t *tsd;
ticker_t *decay_ticker;
if (unlikely(tsdn_null(tsdn))) {
return;
}
tsd = tsdn_tsd(tsdn);
decay_ticker = decay_ticker_get(tsd, arena_ind_get(arena));
if (unlikely(decay_ticker == NULL)) {
return;
}
if (unlikely(ticker_ticks(decay_ticker, nticks))) {
tsd_t *tsd = tsdn_tsd(tsdn);
/*
* We use the ticker_geom_t to avoid having per-arena state in the tsd.
* Instead of having a countdown-until-decay timer running for every
* arena in every thread, we flip a coin once per tick, whose
* probability of coming up heads is 1/nticks; this is effectively the
* operation of the ticker_geom_t. Each arena has the same chance of a
* coinflip coming up heads (1/ARENA_DECAY_NTICKS_PER_UPDATE), so we can
* use a single ticker for all of them.
*/
ticker_geom_t *decay_ticker = tsd_arena_decay_tickerp_get(tsd);
uint64_t *prng_state = tsd_prng_statep_get(tsd);
if (unlikely(ticker_geom_ticks(decay_ticker, prng_state, nticks,
tsd_reentrancy_level_get(tsd) > 0))) {
arena_decay(tsdn, arena, false, false);
}
}
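
The coin-flip scheme described in the comment above can be shown with a tiny self-contained sketch (hypothetical helper names, not the actual ticker_geom_t code): each call advances a PRNG and comes up heads with probability roughly 1/nticks, so the decay path fires on average once per nticks ticks without keeping a per-arena countdown.

/* Illustrative only; xorshift64 and the modulo draw are simplifications. */
#include <stdbool.h>
#include <stdint.h>

static uint64_t
example_prng_next(uint64_t *state) {
	uint64_t x = *state;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return *state = x;
}

static bool
example_coinflip_tick(uint64_t *state, unsigned nticks) {
	/* Heads with probability ~1/nticks. */
	return (example_prng_next(state) % nticks) == 0;
}
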
JEMALLOC_ALWAYS_INLINE void
arena_decay_tick(tsdn_t *tsdn, arena_t *arena) {
malloc_mutex_assert_not_owner(tsdn, &arena->decay_dirty.mtx);
malloc_mutex_assert_not_owner(tsdn, &arena->decay_muzzy.mtx);
arena_decay_ticks(tsdn, arena, 1);
}
JEMALLOC_ALWAYS_INLINE void *
arena_malloc(tsdn_t *tsdn, arena_t *arena, size_t size, szind_t ind, bool zero,
tcache_t *tcache, bool slow_path) {
bool slab, tcache_t *tcache, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
assert(size != 0);
if (likely(tcache != NULL)) {
if (likely(size <= SMALL_MAXCLASS)) {
return tcache_alloc_small(tsdn_tsd(tsdn), arena,
tcache, size, ind, zero, slow_path);
if (likely(slab)) {
assert(sz_can_use_slab(size));
return tcache_alloc_small(tsdn_tsd(tsdn), arena, tcache,
size, ind, zero, slow_path);
} else if (likely(ind < tcache_nbins_get(tcache->tcache_slow)
&& !tcache_bin_disabled(ind, &tcache->bins[ind],
tcache->tcache_slow))) {
return tcache_alloc_large(tsdn_tsd(tsdn), arena, tcache,
size, ind, zero, slow_path);
}
if (likely(size <= tcache_maxclass)) {
return tcache_alloc_large(tsdn_tsd(tsdn), arena,
tcache, size, ind, zero, slow_path);
}
/* (size > tcache_maxclass) case falls through. */
assert(size > tcache_maxclass);
/* (size > tcache_max) case falls through. */
}
return arena_malloc_hard(tsdn, arena, size, ind, zero);
return arena_malloc_hard(tsdn, arena, size, ind, zero, slab);
}
JEMALLOC_ALWAYS_INLINE arena_t *
arena_aalloc(tsdn_t *tsdn, const void *ptr) {
return extent_arena_get(iealloc(tsdn, ptr));
edata_t *edata = emap_edata_lookup(tsdn, &arena_emap_global, ptr);
unsigned arena_ind = edata_arena_ind_get(edata);
return (arena_t *)atomic_load_p(&arenas[arena_ind], ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE size_t
arena_salloc(tsdn_t *tsdn, const void *ptr) {
assert(ptr != NULL);
emap_alloc_ctx_t alloc_ctx;
emap_alloc_ctx_lookup(tsdn, &arena_emap_global, ptr, &alloc_ctx);
assert(alloc_ctx.szind != SC_NSIZES);
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback);
szind_t szind = rtree_szind_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true);
assert(szind != NSIZES);
return sz_index2size(szind);
return emap_alloc_ctx_usize_get(&alloc_ctx);
}
JEMALLOC_ALWAYS_INLINE size_t
@@ -142,60 +243,129 @@ arena_vsalloc(tsdn_t *tsdn, const void *ptr) {
* failure.
*/
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback);
extent_t *extent;
szind_t szind;
if (rtree_extent_szind_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, false, &extent, &szind)) {
emap_full_alloc_ctx_t full_alloc_ctx;
bool missing = emap_full_alloc_ctx_try_lookup(
tsdn, &arena_emap_global, ptr, &full_alloc_ctx);
if (missing) {
return 0;
}
if (extent == NULL) {
if (full_alloc_ctx.edata == NULL) {
return 0;
}
assert(extent_state_get(extent) == extent_state_active);
assert(edata_state_get(full_alloc_ctx.edata) == extent_state_active);
/* Only slab members should be looked up via interior pointers. */
assert(extent_addr_get(extent) == ptr || extent_slab_get(extent));
assert(edata_addr_get(full_alloc_ctx.edata) == ptr
|| edata_slab_get(full_alloc_ctx.edata));
assert(szind != NSIZES);
assert(full_alloc_ctx.szind != SC_NSIZES);
return sz_index2size(szind);
return edata_usize_get(full_alloc_ctx.edata);
}
static inline void
arena_dalloc_large_no_tcache(
tsdn_t *tsdn, void *ptr, szind_t szind, size_t usize) {
/*
* szind is still needed in this function mainly because
* szind < SC_NBINS determines not only if this is a small alloc,
* but also if szind is valid (an inactive extent would have
* szind == SC_NSIZES).
*/
if (config_prof && unlikely(szind < SC_NBINS)) {
arena_dalloc_promoted(tsdn, ptr, NULL, true);
} else {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
if (large_dalloc_safety_checks(edata, ptr, usize)) {
/* See the comment in isfree. */
return;
}
large_dalloc(tsdn, edata);
}
}
static inline void
arena_dalloc_no_tcache(tsdn_t *tsdn, void *ptr) {
assert(ptr != NULL);
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback);
szind_t szind;
bool slab;
rtree_szind_slab_read(tsdn, &extents_rtree, rtree_ctx, (uintptr_t)ptr,
true, &szind, &slab);
emap_alloc_ctx_t alloc_ctx;
emap_alloc_ctx_lookup(tsdn, &arena_emap_global, ptr, &alloc_ctx);
if (config_debug) {
extent_t *extent = rtree_extent_read(tsdn, &extents_rtree,
rtree_ctx, (uintptr_t)ptr, true);
assert(szind == extent_szind_get(extent));
assert(szind < NSIZES);
assert(slab == extent_slab_get(extent));
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.szind < SC_NSIZES);
assert(alloc_ctx.slab == edata_slab_get(edata));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(slab)) {
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
arena_dalloc_small(tsdn, ptr);
} else {
extent_t *extent = iealloc(tsdn, ptr);
large_dalloc(tsdn, extent);
arena_dalloc_large_no_tcache(tsdn, ptr, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx));
}
}
JEMALLOC_ALWAYS_INLINE void
arena_dalloc_large(tsdn_t *tsdn, void *ptr, tcache_t *tcache, szind_t szind,
size_t usize, bool slow_path) {
assert(!tsdn_null(tsdn) && tcache != NULL);
bool is_sample_promoted = config_prof && szind < SC_NBINS;
if (unlikely(is_sample_promoted)) {
arena_dalloc_promoted(tsdn, ptr, tcache, slow_path);
} else {
if (szind < tcache_nbins_get(tcache->tcache_slow)
&& !tcache_bin_disabled(
szind, &tcache->bins[szind], tcache->tcache_slow)) {
tcache_dalloc_large(
tsdn_tsd(tsdn), tcache, ptr, szind, slow_path);
} else {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
if (large_dalloc_safety_checks(edata, ptr, usize)) {
/* See the comment in isfree. */
return;
}
large_dalloc(tsdn, edata);
}
}
}
JEMALLOC_ALWAYS_INLINE bool
arena_tcache_dalloc_small_safety_check(tsdn_t *tsdn, void *ptr) {
if (!config_debug) {
return false;
}
edata_t *edata = emap_edata_lookup(tsdn, &arena_emap_global, ptr);
szind_t binind = edata_szind_get(edata);
div_info_t div_info = arena_binind_div_info[binind];
/*
* Calls the internal function bin_slab_regind_impl because the
* safety check does not require a lock.
*/
size_t regind = bin_slab_regind_impl(&div_info, binind, edata, ptr);
slab_data_t *slab_data = edata_slab_data_get(edata);
const bin_info_t *bin_info = &bin_infos[binind];
assert(edata_nfree_get(edata) < bin_info->nregs);
if (unlikely(!bitmap_get(
slab_data->bitmap, &bin_info->bitmap_info, regind))) {
safety_check_fail(
"Invalid deallocation detected: the pointer being freed (%p) not "
"currently active, possibly caused by double free bugs.\n",
ptr);
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
arena_dalloc(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
alloc_ctx_t *alloc_ctx, bool slow_path) {
emap_alloc_ctx_t *caller_alloc_ctx, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
assert(ptr != NULL);
@@ -204,158 +374,165 @@ arena_dalloc(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
return;
}
szind_t szind;
bool slab;
rtree_ctx_t *rtree_ctx;
if (alloc_ctx != NULL) {
szind = alloc_ctx->szind;
slab = alloc_ctx->slab;
assert(szind != NSIZES);
emap_alloc_ctx_t alloc_ctx;
if (caller_alloc_ctx != NULL) {
alloc_ctx = *caller_alloc_ctx;
} else {
rtree_ctx = tsd_rtree_ctx(tsdn_tsd(tsdn));
rtree_szind_slab_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true, &szind, &slab);
util_assume(tsdn != NULL);
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
}
if (config_debug) {
rtree_ctx = tsd_rtree_ctx(tsdn_tsd(tsdn));
extent_t *extent = rtree_extent_read(tsdn, &extents_rtree,
rtree_ctx, (uintptr_t)ptr, true);
assert(szind == extent_szind_get(extent));
assert(szind < NSIZES);
assert(slab == extent_slab_get(extent));
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.szind < SC_NSIZES);
assert(alloc_ctx.slab == edata_slab_get(edata));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(slab)) {
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
tcache_dalloc_small(tsdn_tsd(tsdn), tcache, ptr, szind,
slow_path);
} else {
if (szind < nhbins) {
if (config_prof && unlikely(szind < NBINS)) {
arena_dalloc_promoted(tsdn, ptr, tcache,
slow_path);
} else {
tcache_dalloc_large(tsdn_tsd(tsdn), tcache, ptr,
szind, slow_path);
}
} else {
extent_t *extent = iealloc(tsdn, ptr);
large_dalloc(tsdn, extent);
if (arena_tcache_dalloc_small_safety_check(tsdn, ptr)) {
return;
}
tcache_dalloc_small(
tsdn_tsd(tsdn), tcache, ptr, alloc_ctx.szind, slow_path);
} else {
arena_dalloc_large(tsdn, ptr, tcache, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx), slow_path);
}
}
static inline void
arena_sdalloc_no_tcache(tsdn_t *tsdn, void *ptr, size_t size) {
assert(ptr != NULL);
assert(size <= LARGE_MAXCLASS);
assert(size <= SC_LARGE_MAXCLASS);
szind_t szind;
bool slab;
emap_alloc_ctx_t alloc_ctx;
if (!config_prof || !opt_prof) {
/*
* There is no risk of being confused by a promoted sampled
* object, so base szind and slab on the given size.
*/
szind = sz_size2index(size);
slab = (szind < NBINS);
szind_t szind = sz_size2index(size);
emap_alloc_ctx_init(
&alloc_ctx, szind, (szind < SC_NBINS), size);
}
if ((config_prof && opt_prof) || config_debug) {
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn,
&rtree_ctx_fallback);
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
rtree_szind_slab_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true, &szind, &slab);
assert(szind == sz_size2index(size));
assert((config_prof && opt_prof) || slab == (szind < NBINS));
assert(alloc_ctx.szind == sz_size2index(size));
assert((config_prof && opt_prof)
|| alloc_ctx.slab == (alloc_ctx.szind < SC_NBINS));
if (config_debug) {
extent_t *extent = rtree_extent_read(tsdn,
&extents_rtree, rtree_ctx, (uintptr_t)ptr, true);
assert(szind == extent_szind_get(extent));
assert(slab == extent_slab_get(extent));
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.slab == edata_slab_get(edata));
}
}
if (likely(slab)) {
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
arena_dalloc_small(tsdn, ptr);
} else {
extent_t *extent = iealloc(tsdn, ptr);
large_dalloc(tsdn, extent);
arena_dalloc_large_no_tcache(tsdn, ptr, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx));
}
}
JEMALLOC_ALWAYS_INLINE void
arena_sdalloc(tsdn_t *tsdn, void *ptr, size_t size, tcache_t *tcache,
alloc_ctx_t *alloc_ctx, bool slow_path) {
emap_alloc_ctx_t *caller_alloc_ctx, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
assert(ptr != NULL);
assert(size <= LARGE_MAXCLASS);
assert(size <= SC_LARGE_MAXCLASS);
if (unlikely(tcache == NULL)) {
arena_sdalloc_no_tcache(tsdn, ptr, size);
return;
}
szind_t szind;
bool slab;
UNUSED alloc_ctx_t local_ctx;
emap_alloc_ctx_t alloc_ctx;
if (config_prof && opt_prof) {
if (alloc_ctx == NULL) {
if (caller_alloc_ctx == NULL) {
/* Uncommon case and should be a static check. */
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn,
&rtree_ctx_fallback);
rtree_szind_slab_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true, &local_ctx.szind,
&local_ctx.slab);
assert(local_ctx.szind == sz_size2index(size));
alloc_ctx = &local_ctx;
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
assert(alloc_ctx.szind == sz_size2index(size));
assert(emap_alloc_ctx_usize_get(&alloc_ctx) == size);
} else {
alloc_ctx = *caller_alloc_ctx;
}
slab = alloc_ctx->slab;
szind = alloc_ctx->szind;
} else {
/*
* There is no risk of being confused by a promoted sampled
* object, so base szind and slab on the given size.
*/
szind = sz_size2index(size);
slab = (szind < NBINS);
alloc_ctx.szind = sz_size2index(size);
alloc_ctx.slab = (alloc_ctx.szind < SC_NBINS);
}
if (config_debug) {
rtree_ctx_t *rtree_ctx = tsd_rtree_ctx(tsdn_tsd(tsdn));
rtree_szind_slab_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true, &szind, &slab);
extent_t *extent = rtree_extent_read(tsdn,
&extents_rtree, rtree_ctx, (uintptr_t)ptr, true);
assert(szind == extent_szind_get(extent));
assert(slab == extent_slab_get(extent));
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.slab == edata_slab_get(edata));
emap_alloc_ctx_init(
&alloc_ctx, alloc_ctx.szind, alloc_ctx.slab, sz_s2u(size));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(slab)) {
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
tcache_dalloc_small(tsdn_tsd(tsdn), tcache, ptr, szind,
slow_path);
} else {
if (szind < nhbins) {
if (config_prof && unlikely(szind < NBINS)) {
arena_dalloc_promoted(tsdn, ptr, tcache,
slow_path);
} else {
tcache_dalloc_large(tsdn_tsd(tsdn),
tcache, ptr, szind, slow_path);
}
} else {
extent_t *extent = iealloc(tsdn, ptr);
large_dalloc(tsdn, extent);
if (arena_tcache_dalloc_small_safety_check(tsdn, ptr)) {
return;
}
tcache_dalloc_small(
tsdn_tsd(tsdn), tcache, ptr, alloc_ctx.szind, slow_path);
} else {
arena_dalloc_large(tsdn, ptr, tcache, alloc_ctx.szind,
sz_s2u(size), slow_path);
}
}
static inline void
arena_cache_oblivious_randomize(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t alignment) {
assert(edata_base_get(edata) == edata_addr_get(edata));
if (alignment < PAGE) {
unsigned lg_range = LG_PAGE
- lg_floor(CACHELINE_CEILING(alignment));
size_t r;
if (!tsdn_null(tsdn)) {
tsd_t *tsd = tsdn_tsd(tsdn);
r = (size_t)prng_lg_range_u64(
tsd_prng_statep_get(tsd), lg_range);
} else {
uint64_t stack_value = (uint64_t)(uintptr_t)&r;
r = (size_t)prng_lg_range_u64(&stack_value, lg_range);
}
uintptr_t random_offset = ((uintptr_t)r)
<< (LG_PAGE - lg_range);
edata->e_addr = (void *)((byte_t *)edata->e_addr
+ random_offset);
assert(ALIGNMENT_ADDR2BASE(edata->e_addr, alignment)
== edata->e_addr);
}
}
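
A worked instance of the arithmetic above (assuming LG_PAGE = 12, i.e. 4 KiB pages, and a 64-byte cacheline): for alignment = 64, lg_range = 12 - 6 = 6, so r is drawn from [0, 64) and the resulting offset r << 6 is a cacheline multiple in [0, 4096), randomizing the cache index of the base pointer without violating the requested alignment. A standalone sketch of the computation:

/* Hypothetical illustration; the constants are assumptions, not jemalloc's. */
#include <assert.h>
#include <stdint.h>

#define EX_LG_PAGE 12	/* 4 KiB pages */

static uintptr_t
example_random_offset(uint64_t r, unsigned lg_alignment) {
	unsigned lg_range = EX_LG_PAGE - lg_alignment;	/* e.g. 12 - 6 = 6 */
	uintptr_t offset =
	    (uintptr_t)(r & ((1ULL << lg_range) - 1)) << lg_alignment;
	assert(offset < ((uintptr_t)1 << EX_LG_PAGE));
	return offset;	/* a multiple of the alignment, below one page */
}
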
static inline bin_t *
arena_get_bin(arena_t *arena, szind_t binind, unsigned binshard) {
bin_t *shard0 = (bin_t *)((byte_t *)arena + arena_bin_offsets[binind]);
return shard0 + binshard;
}
#endif /* JEMALLOC_INTERNAL_ARENA_INLINES_B_H */

View file

@@ -0,0 +1,123 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STATS_H
#define JEMALLOC_INTERNAL_ARENA_STATS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/lockedint.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/mutex_prof.h"
#include "jemalloc/internal/pa.h"
#include "jemalloc/internal/sc.h"
JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
typedef struct arena_stats_large_s arena_stats_large_t;
struct arena_stats_large_s {
/*
* Total number of large allocation/deallocation requests served directly
* by the arena.
*/
locked_u64_t nmalloc;
locked_u64_t ndalloc;
/*
* Total large active bytes (allocated - deallocated) served directly
* by the arena.
*/
locked_u64_t active_bytes;
/*
* Number of allocation requests that correspond to this size class.
* This includes requests served by tcache, though tcache only
* periodically merges into this counter.
*/
locked_u64_t nrequests; /* Partially derived. */
/*
* Number of tcache fills / flushes for large (similarly, periodically
* merged). Note that there is no large tcache batch-fill currently
* (i.e. only fill 1 at a time); however, flushes may be batched.
*/
locked_u64_t nfills; /* Partially derived. */
locked_u64_t nflushes; /* Partially derived. */
/* Current number of allocations of this size class. */
size_t curlextents; /* Derived. */
};
/*
* Arena stats. Note that fields marked "derived" are not directly maintained
* within the arena code; rather their values are derived during stats merge
* requests.
*/
typedef struct arena_stats_s arena_stats_t;
struct arena_stats_s {
LOCKEDINT_MTX_DECLARE(mtx)
/*
* resident includes the base stats -- that's why it lives here and not
* in pa_shard_stats_t.
*/
size_t base; /* Derived. */
size_t metadata_edata; /* Derived. */
size_t metadata_rtree; /* Derived. */
size_t resident; /* Derived. */
size_t metadata_thp; /* Derived. */
size_t mapped; /* Derived. */
atomic_zu_t internal;
size_t allocated_large; /* Derived. */
uint64_t nmalloc_large; /* Derived. */
uint64_t ndalloc_large; /* Derived. */
uint64_t nfills_large; /* Derived. */
uint64_t nflushes_large; /* Derived. */
uint64_t nrequests_large; /* Derived. */
/*
* The stats logically owned by the pa_shard in the same arena. This
* lives here only because it's convenient for the purposes of the ctl
* module -- it only knows about the single arena_stats.
*/
pa_shard_stats_t pa_shard_stats;
/* Number of bytes cached in tcache associated with this arena. */
size_t tcache_bytes; /* Derived. */
size_t tcache_stashed_bytes; /* Derived. */
mutex_prof_data_t mutex_prof_data[mutex_prof_num_arena_mutexes];
/* One element for each large size class. */
arena_stats_large_t lstats[SC_NSIZES - SC_NBINS];
/* Arena uptime. */
nstime_t uptime;
};
static inline bool
arena_stats_init(tsdn_t *tsdn, arena_stats_t *arena_stats) {
if (config_debug) {
for (size_t i = 0; i < sizeof(arena_stats_t); i++) {
assert(((char *)arena_stats)[i] == 0);
}
}
if (LOCKEDINT_MTX_INIT(arena_stats->mtx, "arena_stats",
WITNESS_RANK_ARENA_STATS, malloc_mutex_rank_exclusive)) {
return true;
}
/* Memory is zeroed, so there is no need to clear stats. */
return false;
}
static inline void
arena_stats_large_flush_nrequests_add(tsdn_t *tsdn, arena_stats_t *arena_stats,
szind_t szind, uint64_t nrequests) {
LOCKEDINT_MTX_LOCK(tsdn, arena_stats->mtx);
arena_stats_large_t *lstats = &arena_stats->lstats[szind - SC_NBINS];
locked_inc_u64(tsdn, LOCKEDINT_MTX(arena_stats->mtx),
&lstats->nrequests, nrequests);
locked_inc_u64(
tsdn, LOCKEDINT_MTX(arena_stats->mtx), &lstats->nflushes, 1);
LOCKEDINT_MTX_UNLOCK(tsdn, arena_stats->mtx);
}
#endif /* JEMALLOC_INTERNAL_ARENA_STATS_H */

View file

@@ -0,0 +1,111 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STRUCTS_H
#define JEMALLOC_INTERNAL_ARENA_STRUCTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/counter.h"
#include "jemalloc/internal/ecache.h"
#include "jemalloc/internal/edata_cache.h"
#include "jemalloc/internal/extent_dss.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/pa.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/ticker.h"
struct arena_s {
/*
* Number of threads currently assigned to this arena. Each thread has
* two distinct assignments, one for application-serving allocation, and
* the other for internal metadata allocation. Internal metadata must
* not be allocated from arenas explicitly created via the arenas.create
* mallctl, because the arena.<i>.reset mallctl indiscriminately
* discards all allocations for the affected arena.
*
* 0: Application allocation.
* 1: Internal metadata allocation.
*
* Synchronization: atomic.
*/
atomic_u_t nthreads[2];
/* Next bin shard for binding new threads. Synchronization: atomic. */
atomic_u_t binshard_next;
/*
* When percpu_arena is enabled, to amortize the cost of reading /
* updating the current CPU id, track the most recent thread accessing
* this arena, and only read CPU if there is a mismatch.
*/
tsdn_t *last_thd;
/* Synchronization: internal. */
arena_stats_t stats;
/*
* Lists of tcaches and cache_bin_array_descriptors for extant threads
* associated with this arena. Stats from these are merged
* incrementally, and at exit if opt_stats_print is enabled.
*
* Synchronization: tcache_ql_mtx.
*/
ql_head(tcache_slow_t) tcache_ql;
ql_head(cache_bin_array_descriptor_t) cache_bin_array_descriptor_ql;
malloc_mutex_t tcache_ql_mtx;
/*
* Represents a dss_prec_t, but atomically.
*
* Synchronization: atomic.
*/
atomic_u_t dss_prec;
/*
* Extant large allocations.
*
* Synchronization: large_mtx.
*/
edata_list_active_t large;
/* Synchronizes all large allocation/update/deallocation. */
malloc_mutex_t large_mtx;
/* The page-level allocator shard this arena uses. */
pa_shard_t pa_shard;
/*
* A cached copy of base->ind. This can get accessed on hot paths;
* looking it up in base requires an extra pointer hop / cache miss.
*/
unsigned ind;
/*
* Base allocator, from which arena metadata are allocated.
*
* Synchronization: internal.
*/
base_t *base;
/* Used to determine uptime. Read-only after initialization. */
nstime_t create_time;
/* The name of the arena. */
char name[ARENA_NAME_LEN];
/*
* The arena is allocated alongside its bins; really this is a
* dynamically sized array determined by the binshard settings.
* The array is cacheline-aligned to minimize the number of cache lines
* touched on the hot paths.
*/
JEMALLOC_WARN_ON_USAGE(
"Do not use this field directly. "
"Use `arena_get_bin` instead.")
JEMALLOC_ALIGNED(CACHELINE)
bin_t all_bins[0];
};
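
Since all_bins is a trailing, dynamically sized array whose shards are located via boot-time offsets, callers are expected to go through the accessor rather than index the field directly. A minimal usage sketch (binind and binshard are assumed valid, and bin_t's lock field is assumed to be exposed as elsewhere in the codebase):

/* Sketch, not code from this change. */
bin_t *bin = arena_get_bin(arena, binind, binshard);
malloc_mutex_lock(tsdn, &bin->lock);
/* ... operate on the selected bin shard ... */
malloc_mutex_unlock(tsdn, &bin->lock);
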
#endif /* JEMALLOC_INTERNAL_ARENA_STRUCTS_H */

View file

@@ -1,11 +0,0 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STRUCTS_A_H
#define JEMALLOC_INTERNAL_ARENA_STRUCTS_A_H
#include "jemalloc/internal/bitmap.h"
struct arena_slab_data_s {
/* Per region allocated/deallocated bitmap. */
bitmap_t bitmap[BITMAP_GROUPS_MAX];
};
#endif /* JEMALLOC_INTERNAL_ARENA_STRUCTS_A_H */

View file

@@ -1,284 +0,0 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STRUCTS_B_H
#define JEMALLOC_INTERNAL_ARENA_STRUCTS_B_H
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/extent_dss.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/smoothstep.h"
#include "jemalloc/internal/stats.h"
#include "jemalloc/internal/ticker.h"
/*
* Read-only information associated with each element of arena_t's bins array
* is stored separately, partly to reduce memory usage (only one copy, rather
* than one per arena), but mainly to avoid false cacheline sharing.
*
* Each slab has the following layout:
*
* /--------------------\
* | region 0 |
* |--------------------|
* | region 1 |
* |--------------------|
* | ... |
* | ... |
* | ... |
* |--------------------|
* | region nregs-1 |
* \--------------------/
*/
struct arena_bin_info_s {
/* Size of regions in a slab for this bin's size class. */
size_t reg_size;
/* Total size of a slab for this bin's size class. */
size_t slab_size;
/* Total number of regions in a slab for this bin's size class. */
uint32_t nregs;
/*
* Metadata used to manipulate bitmaps for slabs associated with this
* bin.
*/
bitmap_info_t bitmap_info;
};
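
A concrete (illustrative) instance of the slab layout sketched above: for a 16-byte size class backed by a single 4 KiB page, reg_size = 16, slab_size = 4096, and nregs = 4096 / 16 = 256, with bitmap_info describing a 256-bit allocation bitmap. Recovering a region index from a pointer is then simple arithmetic; the real code uses precomputed division tables rather than a runtime divide:

/* Hypothetical helper for illustration only. */
#include <stddef.h>
#include <stdint.h>

static inline size_t
example_regind(uintptr_t slab_base, uintptr_t ptr, size_t reg_size) {
	return (size_t)(ptr - slab_base) / reg_size;	/* e.g. / 16 */
}
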
struct arena_decay_s {
/* Synchronizes all non-atomic fields. */
malloc_mutex_t mtx;
/*
* True if a thread is currently purging the extents associated with
* this decay structure.
*/
bool purging;
/*
* Approximate time in milliseconds from the creation of a set of unused
* dirty pages until an equivalent set of unused dirty pages is purged
* and/or reused.
*/
atomic_zd_t time_ms;
/* time / SMOOTHSTEP_NSTEPS. */
nstime_t interval;
/*
* Time at which the current decay interval logically started. We do
* not actually advance to a new epoch until sometime after it starts
* because of scheduling and computation delays, and it is even possible
* to completely skip epochs. In all cases, during epoch advancement we
* merge all relevant activity into the most recently recorded epoch.
*/
nstime_t epoch;
/* Deadline randomness generator. */
uint64_t jitter_state;
/*
* Deadline for current epoch. This is the sum of interval and per
* epoch jitter which is a uniform random variable in [0..interval).
* Epochs always advance by precise multiples of interval, but we
* randomize the deadline to reduce the likelihood of arenas purging in
* lockstep.
*/
nstime_t deadline;
/*
* Number of unpurged pages at beginning of current epoch. During epoch
* advancement we use the delta between arena->decay_*.nunpurged and
* extents_npages_get(&arena->extents_*) to determine how many dirty
* pages, if any, were generated.
*/
size_t nunpurged;
/*
* Trailing log of how many unused dirty pages were generated during
* each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
* element is the most recent epoch. Corresponding epoch times are
* relative to epoch.
*/
size_t backlog[SMOOTHSTEP_NSTEPS];
/*
* Pointer to associated stats. These stats are embedded directly in
* the arena's stats due to how stats structures are shared between the
* arena and ctl code.
*
* Synchronization: Same as associated arena's stats field. */
decay_stats_t *stats;
/* Peak number of pages in associated extents. Used for debug only. */
uint64_t ceil_npages;
};
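
To make the epoch/backlog bookkeeping above concrete, here is a minimal sketch of what a single-epoch advance does (hypothetical names, plain size_t arithmetic; the real code also handles multi-epoch advances and the smoothstep weighting): the oldest backlog entry is dropped, the rest shift down, and the newest slot records how many dirty pages were generated since the previous epoch.

/* Illustrative only; EX_NSTEPS stands in for SMOOTHSTEP_NSTEPS. */
#include <stddef.h>
#include <string.h>

#define EX_NSTEPS 8

static void
example_backlog_advance(size_t backlog[EX_NSTEPS], size_t *nunpurged_last,
    size_t nunpurged_now) {
	memmove(&backlog[0], &backlog[1], (EX_NSTEPS - 1) * sizeof(size_t));
	backlog[EX_NSTEPS - 1] = (nunpurged_now > *nunpurged_last)
	    ? nunpurged_now - *nunpurged_last : 0;
	*nunpurged_last = nunpurged_now;
}
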
struct arena_bin_s {
/* All operations on arena_bin_t fields require lock ownership. */
malloc_mutex_t lock;
/*
* Current slab being used to service allocations of this bin's size
* class. slabcur is independent of slabs_{nonfull,full}; whenever
* slabcur is reassigned, the previous slab must be deallocated or
* inserted into slabs_{nonfull,full}.
*/
extent_t *slabcur;
/*
* Heap of non-full slabs. This heap is used to assure that new
* allocations come from the non-full slab that is oldest/lowest in
* memory.
*/
extent_heap_t slabs_nonfull;
/* List used to track full slabs. */
extent_list_t slabs_full;
/* Bin statistics. */
malloc_bin_stats_t stats;
};
struct arena_s {
/*
* Number of threads currently assigned to this arena. Each thread has
* two distinct assignments, one for application-serving allocation, and
* the other for internal metadata allocation. Internal metadata must
* not be allocated from arenas explicitly created via the arenas.create
* mallctl, because the arena.<i>.reset mallctl indiscriminately
* discards all allocations for the affected arena.
*
* 0: Application allocation.
* 1: Internal metadata allocation.
*
* Synchronization: atomic.
*/
atomic_u_t nthreads[2];
/*
* When percpu_arena is enabled, to amortize the cost of reading /
* updating the current CPU id, track the most recent thread accessing
* this arena, and only read CPU if there is a mismatch.
*/
tsdn_t *last_thd;
/* Synchronization: internal. */
arena_stats_t stats;
/*
* List of tcaches for extant threads associated with this arena.
* Stats from these are merged incrementally, and at exit if
* opt_stats_print is enabled.
*
* Synchronization: tcache_ql_mtx.
*/
ql_head(tcache_t) tcache_ql;
malloc_mutex_t tcache_ql_mtx;
/* Synchronization: internal. */
prof_accum_t prof_accum;
uint64_t prof_accumbytes;
/*
* PRNG state for cache index randomization of large allocation base
* pointers.
*
* Synchronization: atomic.
*/
atomic_zu_t offset_state;
/*
* Extent serial number generator state.
*
* Synchronization: atomic.
*/
atomic_zu_t extent_sn_next;
/*
* Represents a dss_prec_t, but atomically.
*
* Synchronization: atomic.
*/
atomic_u_t dss_prec;
/*
* Number of pages in active extents.
*
* Synchronization: atomic.
*/
atomic_zu_t nactive;
/*
* Extant large allocations.
*
* Synchronization: large_mtx.
*/
extent_list_t large;
/* Synchronizes all large allocation/update/deallocation. */
malloc_mutex_t large_mtx;
/*
* Collections of extents that were previously allocated. These are
* used when allocating extents, in an attempt to re-use address space.
*
* Synchronization: internal.
*/
extents_t extents_dirty;
extents_t extents_muzzy;
extents_t extents_retained;
/*
* Decay-based purging state, responsible for scheduling extent state
* transitions.
*
* Synchronization: internal.
*/
arena_decay_t decay_dirty; /* dirty --> muzzy */
arena_decay_t decay_muzzy; /* muzzy --> retained */
/*
* Next extent size class in a growing series to use when satisfying a
* request via the extent hooks (only if opt_retain). This limits the
* number of disjoint virtual memory ranges so that extent merging can
* be effective even if multiple arenas' extent allocation requests are
* highly interleaved.
*
* Synchronization: extent_grow_mtx
*/
pszind_t extent_grow_next;
malloc_mutex_t extent_grow_mtx;
/*
* Available extent structures that were allocated via
* base_alloc_extent().
*
* Synchronization: extent_avail_mtx.
*/
extent_tree_t extent_avail;
malloc_mutex_t extent_avail_mtx;
/*
* bins is used to store heaps of free regions.
*
* Synchronization: internal.
*/
arena_bin_t bins[NBINS];
/*
* Base allocator, from which arena metadata are allocated.
*
* Synchronization: internal.
*/
base_t *base;
/* Used to determine uptime. Read-only after initialization. */
nstime_t create_time;
};
/* Used in conjunction with tsd for fast arena-related context lookup. */
struct arena_tdata_s {
ticker_t decay_ticker;
};
/* Used to pass rtree lookup context down the path. */
struct alloc_ctx_s {
szind_t szind;
bool slab;
};
#endif /* JEMALLOC_INTERNAL_ARENA_STRUCTS_B_H */

View file

@@ -1,45 +1,60 @@
#ifndef JEMALLOC_INTERNAL_ARENA_TYPES_H
#define JEMALLOC_INTERNAL_ARENA_TYPES_H
/* Maximum number of regions in one slab. */
#define LG_SLAB_MAXREGS (LG_PAGE - LG_TINY_MIN)
#define SLAB_MAXREGS (1U << LG_SLAB_MAXREGS)
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sc.h"
/* Default decay times in milliseconds. */
#define DIRTY_DECAY_MS_DEFAULT ZD(10 * 1000)
#define MUZZY_DECAY_MS_DEFAULT ZD(10 * 1000)
#define DIRTY_DECAY_MS_DEFAULT ZD(10 * 1000)
#define MUZZY_DECAY_MS_DEFAULT (0)
/* Number of event ticks between time checks. */
#define DECAY_NTICKS_PER_UPDATE 1000
#define ARENA_DECAY_NTICKS_PER_UPDATE 1000
/* Maximum length of the arena name. */
#define ARENA_NAME_LEN 32
typedef struct arena_slab_data_s arena_slab_data_t;
typedef struct arena_bin_info_s arena_bin_info_t;
typedef struct arena_decay_s arena_decay_t;
typedef struct arena_bin_s arena_bin_t;
typedef struct arena_s arena_t;
typedef struct arena_tdata_s arena_tdata_t;
typedef struct alloc_ctx_s alloc_ctx_t;
typedef enum {
percpu_arena_mode_names_base = 0, /* Used for options processing. */
percpu_arena_mode_names_base = 0, /* Used for options processing. */
/*
* *_uninit are used only during bootstrapping, and must correspond
* to the initialized variant plus percpu_arena_mode_enabled_base.
*/
percpu_arena_uninit = 0,
per_phycpu_arena_uninit = 1,
percpu_arena_uninit = 0,
per_phycpu_arena_uninit = 1,
/* All non-disabled modes must come after percpu_arena_disabled. */
percpu_arena_disabled = 2,
percpu_arena_disabled = 2,
percpu_arena_mode_names_limit = 3, /* Used for options processing. */
percpu_arena_mode_names_limit = 3, /* Used for options processing. */
percpu_arena_mode_enabled_base = 3,
percpu_arena = 3,
per_phycpu_arena = 4 /* Hyper threads share arena. */
percpu_arena = 3,
per_phycpu_arena = 4 /* Hyper threads share arena. */
} percpu_arena_mode_t;
#define PERCPU_ARENA_ENABLED(m) ((m) >= percpu_arena_mode_enabled_base)
#define PERCPU_ARENA_DEFAULT percpu_arena_disabled
#define PERCPU_ARENA_ENABLED(m) ((m) >= percpu_arena_mode_enabled_base)
#define PERCPU_ARENA_DEFAULT percpu_arena_disabled
/*
* When allocation_size >= oversize_threshold, use the dedicated huge arena
* (unless an arena index has been explicitly specified). 0 disables the feature.
*/
#define OVERSIZE_THRESHOLD_DEFAULT (8 << 20)
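
For orientation: with the 8 MiB default above, a 16 MiB allocation from a thread using automatic arena selection is routed to the dedicated huge arena, while a 1 MiB allocation is not, and setting the threshold to 0 disables the routing entirely. Assuming the option is surfaced through the usual mallctl namespace as opt.oversize_threshold (as in recent releases), the effective value can be read like this (hedged sketch):

#include <stdio.h>
#include <jemalloc/jemalloc.h>

void
print_oversize_threshold(void) {
	size_t threshold, sz = sizeof(threshold);
	/* A value of 0 would mean the huge-arena routing is disabled. */
	if (mallctl("opt.oversize_threshold", &threshold, &sz, NULL, 0) == 0) {
		printf("oversize_threshold: %zu bytes\n", threshold);
	}
}
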
struct arena_config_s {
/* extent hooks to be used for the arena */
extent_hooks_t *extent_hooks;
/*
* Use extent hooks for metadata (base) allocations when true.
*/
bool metadata_use_hooks;
};
typedef struct arena_config_s arena_config_t;
extern const arena_config_t arena_config_default;
#endif /* JEMALLOC_INTERNAL_ARENA_TYPES_H */

View file

@@ -1,3 +1,4 @@
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/util.h"
@@ -6,51 +7,57 @@
* assertion failure.
*/
#ifndef assert
#define assert(e) do { \
if (unlikely(config_debug && !(e))) { \
malloc_printf( \
"<jemalloc>: %s:%d: Failed assertion: \"%s\"\n", \
__FILE__, __LINE__, #e); \
abort(); \
} \
} while (0)
# define assert(e) \
do { \
if (unlikely(config_debug && !(e))) { \
malloc_printf( \
"<jemalloc>: %s:%d: Failed assertion: \"%s\"\n", \
__FILE__, __LINE__, #e); \
abort(); \
} \
} while (0)
#endif
#ifndef not_reached
#define not_reached() do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Unreachable code reached\n", \
__FILE__, __LINE__); \
abort(); \
} \
unreachable(); \
} while (0)
# define not_reached() \
do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Unreachable code reached\n", \
__FILE__, __LINE__); \
abort(); \
} \
unreachable(); \
} while (0)
#endif
#ifndef not_implemented
#define not_implemented() do { \
if (config_debug) { \
malloc_printf("<jemalloc>: %s:%d: Not implemented\n", \
__FILE__, __LINE__); \
abort(); \
} \
} while (0)
# define not_implemented() \
do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Not implemented\n", \
__FILE__, __LINE__); \
abort(); \
} \
} while (0)
#endif
#ifndef assert_not_implemented
#define assert_not_implemented(e) do { \
if (unlikely(config_debug && !(e))) { \
not_implemented(); \
} \
} while (0)
# define assert_not_implemented(e) \
do { \
if (unlikely(config_debug && !(e))) { \
not_implemented(); \
} \
} while (0)
#endif
/* Use to assert a particular configuration, e.g., cassert(config_debug). */
#ifndef cassert
#define cassert(c) do { \
if (unlikely(!(c))) { \
not_reached(); \
} \
} while (0)
# define cassert(c) \
do { \
if (unlikely(!(c))) { \
not_reached(); \
} \
} while (0)
#endif

View file

@@ -1,20 +1,29 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_H
#define JEMALLOC_INTERNAL_ATOMIC_H
#define ATOMIC_INLINE static inline
#include "jemalloc/internal/jemalloc_preamble.h"
#define JEMALLOC_U8_ATOMICS
#if defined(JEMALLOC_GCC_ATOMIC_ATOMICS)
# include "jemalloc/internal/atomic_gcc_atomic.h"
# include "jemalloc/internal/atomic_gcc_atomic.h"
# if !defined(JEMALLOC_GCC_U8_ATOMIC_ATOMICS)
# undef JEMALLOC_U8_ATOMICS
# endif
#elif defined(JEMALLOC_GCC_SYNC_ATOMICS)
# include "jemalloc/internal/atomic_gcc_sync.h"
# include "jemalloc/internal/atomic_gcc_sync.h"
# if !defined(JEMALLOC_GCC_U8_SYNC_ATOMICS)
# undef JEMALLOC_U8_ATOMICS
# endif
#elif defined(_MSC_VER)
# include "jemalloc/internal/atomic_msvc.h"
# include "jemalloc/internal/atomic_msvc.h"
#elif defined(JEMALLOC_C11_ATOMICS)
# include "jemalloc/internal/atomic_c11.h"
# include "jemalloc/internal/atomic_c11.h"
#else
# error "Don't have atomics implemented on this platform."
# error "Don't have atomics implemented on this platform."
#endif
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
/*
* This header gives more or less a backport of C11 atomics. The user can write
* JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_sizeof_type); to generate
@@ -44,12 +53,30 @@
#define ATOMIC_ACQ_REL atomic_memory_order_acq_rel
#define ATOMIC_SEQ_CST atomic_memory_order_seq_cst
/*
* Another convenience -- simple atomic helper functions.
*/
#define JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(type, short_type, lg_size) \
JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, lg_size) \
ATOMIC_INLINE void atomic_load_add_store_##short_type( \
atomic_##short_type##_t *a, type inc) { \
type oldval = atomic_load_##short_type(a, ATOMIC_RELAXED); \
type newval = oldval + inc; \
atomic_store_##short_type(a, newval, ATOMIC_RELAXED); \
} \
ATOMIC_INLINE void atomic_load_sub_store_##short_type( \
atomic_##short_type##_t *a, type inc) { \
type oldval = atomic_load_##short_type(a, ATOMIC_RELAXED); \
type newval = oldval - inc; \
atomic_store_##short_type(a, newval, ATOMIC_RELAXED); \
}
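
A brief usage sketch of the generated API (illustrative, relying on the atomic_zu_t variants generated later in this header): the C11-style operations take an explicit memory order, while the load-add-store helpers above perform a relaxed load, a plain add, and a relaxed store -- i.e. they are not atomic read-modify-writes and are only suitable where updates are otherwise serialized or an occasional lost update is tolerable.

/* Illustrative usage, not code from this change. */
static atomic_zu_t example_counter = ATOMIC_INIT(0);

static void
example_counter_ops(void) {
	atomic_store_zu(&example_counter, 0, ATOMIC_RELAXED);
	atomic_fetch_add_zu(&example_counter, 1, ATOMIC_RELAXED); /* atomic RMW */
	atomic_load_add_store_zu(&example_counter, 1);            /* plain RMW */
	size_t v = atomic_load_zu(&example_counter, ATOMIC_RELAXED);
	(void)v;
}
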
/*
* Not all platforms have 64-bit atomics. If we do, this #define exposes that
* fact.
*/
#if (LG_SIZEOF_PTR == 3 || LG_SIZEOF_INT == 3)
# define JEMALLOC_ATOMIC_U64
# define JEMALLOC_ATOMIC_U64
#endif
JEMALLOC_GENERATE_ATOMICS(void *, p, LG_SIZEOF_PTR)
@@ -60,16 +87,20 @@ JEMALLOC_GENERATE_ATOMICS(void *, p, LG_SIZEOF_PTR)
*/
JEMALLOC_GENERATE_ATOMICS(bool, b, 0)
JEMALLOC_GENERATE_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT)
JEMALLOC_GENERATE_INT_ATOMICS(size_t, zu, LG_SIZEOF_PTR)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(int, i, LG_SIZEOF_INT)
JEMALLOC_GENERATE_INT_ATOMICS(ssize_t, zd, LG_SIZEOF_PTR)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(size_t, zu, LG_SIZEOF_PTR)
JEMALLOC_GENERATE_INT_ATOMICS(uint32_t, u32, 2)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(ssize_t, zd, LG_SIZEOF_PTR)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint8_t, u8, 0)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint32_t, u32, 2)
#ifdef JEMALLOC_ATOMIC_U64
JEMALLOC_GENERATE_INT_ATOMICS(uint64_t, u64, 3)
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint64_t, u64, 3)
#endif
#undef ATOMIC_INLINE

View file

@@ -1,6 +1,7 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_C11_H
#define JEMALLOC_INTERNAL_ATOMIC_C11_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include <stdatomic.h>
#define ATOMIC_INIT(...) ATOMIC_VAR_INIT(__VA_ARGS__)
@@ -14,6 +15,7 @@
#define atomic_fence atomic_thread_fence
/* clang-format off */
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, \
/* unused */ lg_size) \
typedef _Atomic(type) atomic_##short_type##_t; \
@@ -58,40 +60,35 @@ atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
return atomic_compare_exchange_strong_explicit(a, expected, \
desired, success_mo, failure_mo); \
}
/* clang-format on */
/*
* Integral types have some special operations available that non-integral ones
* lack.
*/
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, \
/* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type \
atomic_fetch_add_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return atomic_fetch_add_explicit(a, val, mo); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_sub_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return atomic_fetch_sub_explicit(a, val, mo); \
} \
ATOMIC_INLINE type \
atomic_fetch_and_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return atomic_fetch_and_explicit(a, val, mo); \
} \
ATOMIC_INLINE type \
atomic_fetch_or_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return atomic_fetch_or_explicit(a, val, mo); \
} \
ATOMIC_INLINE type \
atomic_fetch_xor_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return atomic_fetch_xor_explicit(a, val, mo); \
}
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_add_explicit(a, val, mo); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_sub_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_and_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_or_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_xor_explicit(a, val, mo); \
}
#endif /* JEMALLOC_INTERNAL_ATOMIC_C11_H */

View file

@@ -1,9 +1,13 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H
#define JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#define ATOMIC_INIT(...) {__VA_ARGS__}
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
@@ -36,92 +40,82 @@ atomic_fence(atomic_memory_order_t mo) {
__atomic_thread_fence(atomic_enum_to_builtin(mo));
}
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, \
/* unused */ lg_size) \
typedef struct { \
type repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type \
atomic_load_##short_type(const atomic_##short_type##_t *a, \
atomic_memory_order_t mo) { \
type result; \
__atomic_load(&a->repr, &result, atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE void \
atomic_store_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
__atomic_store(&a->repr, &val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
type result; \
__atomic_exchange(&a->repr, &val, &result, \
atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
true, atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
false, \
atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
}
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
typedef struct { \
type repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type atomic_load_##short_type( \
const atomic_##short_type##_t *a, atomic_memory_order_t mo) { \
type result; \
__atomic_load(&a->repr, &result, atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE void atomic_store_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
__atomic_store(&a->repr, &val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_exchange_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
type result; \
__atomic_exchange( \
&a->repr, &val, &result, atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_weak_##short_type( \
atomic_##short_type##_t *a, UNUSED type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
true, atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_strong_##short_type( \
atomic_##short_type##_t *a, UNUSED type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
false, atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
}
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_add( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_sub( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_and( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_or( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_xor( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
}
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, \
/* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type \
atomic_fetch_add_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __atomic_fetch_add(&a->repr, val, \
atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_sub_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __atomic_fetch_sub(&a->repr, val, \
atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_and_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __atomic_fetch_and(&a->repr, val, \
atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_or_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __atomic_fetch_or(&a->repr, val, \
atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_xor_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __atomic_fetch_xor(&a->repr, val, \
atomic_enum_to_builtin(mo)); \
}
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H */

View file

@@ -1,7 +1,12 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H
#define JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H
#define ATOMIC_INIT(...) {__VA_ARGS__}
#include "jemalloc/internal/jemalloc_preamble.h"
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
@@ -25,11 +30,13 @@ atomic_fence(atomic_memory_order_t mo) {
return;
}
asm volatile("" ::: "memory");
# if defined(__i386__) || defined(__x86_64__)
#if defined(__i386__) || defined(__x86_64__)
/* This is implicit on x86. */
# elif defined(__ppc__)
#elif defined(__ppc64__)
asm volatile("lwsync");
# elif defined(__sparc__) && defined(__arch64__)
#elif defined(__ppc__)
asm volatile("sync");
#elif defined(__sparc__) && defined(__arch64__)
if (mo == atomic_memory_order_acquire) {
asm volatile("membar #LoadLoad | #LoadStore");
} else if (mo == atomic_memory_order_release) {
@@ -37,9 +44,9 @@ atomic_fence(atomic_memory_order_t mo) {
} else {
asm volatile("membar #LoadLoad | #LoadStore | #StoreStore");
}
# else
#else
__sync_synchronize();
# endif
#endif
asm volatile("" ::: "memory");
}
@@ -62,25 +69,25 @@
ATOMIC_INLINE void
atomic_pre_sc_load_fence() {
# if defined(__i386__) || defined(__x86_64__) || \
(defined(__sparc__) && defined(__arch64__))
#if defined(__i386__) || defined(__x86_64__) \
|| (defined(__sparc__) && defined(__arch64__))
atomic_fence(atomic_memory_order_relaxed);
# else
#else
atomic_fence(atomic_memory_order_seq_cst);
# endif
#endif
}
ATOMIC_INLINE void
atomic_post_sc_store_fence() {
# if defined(__i386__) || defined(__x86_64__) || \
(defined(__sparc__) && defined(__arch64__))
#if defined(__i386__) || defined(__x86_64__) \
|| (defined(__sparc__) && defined(__arch64__))
atomic_fence(atomic_memory_order_seq_cst);
# else
#else
atomic_fence(atomic_memory_order_relaxed);
# endif
#endif
}
/* clang-format off */
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, \
/* unused */ lg_size) \
typedef struct { \
@@ -113,8 +120,8 @@ atomic_store_##short_type(atomic_##short_type##_t *a, \
} \
\
ATOMIC_INLINE type \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
/* \
* Because of FreeBSD, we care about gcc 4.2, which doesn't have\
* an atomic exchange builtin. We fake it with a CAS loop. \
@@ -129,8 +136,9 @@ atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type prev = __sync_val_compare_and_swap(&a->repr, *expected, \
desired); \
if (prev == *expected) { \
@@ -142,8 +150,9 @@ atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
} \
ATOMIC_INLINE bool \
atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type prev = __sync_val_compare_and_swap(&a->repr, *expected, \
desired); \
if (prev == *expected) { \
@@ -153,39 +162,36 @@ atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
return false; \
} \
}
/* clang-format on */
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, \
/* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type \
atomic_fetch_add_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __sync_fetch_and_add(&a->repr, val); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_sub_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __sync_fetch_and_sub(&a->repr, val); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_and_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __sync_fetch_and_and(&a->repr, val); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_or_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __sync_fetch_and_or(&a->repr, val); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_xor_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return __sync_fetch_and_xor(&a->repr, val); \
}
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_add(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_sub(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_and(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_or(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_xor(&a->repr, val); \
}
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H */

View file

@@ -1,7 +1,12 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_MSVC_H
#define JEMALLOC_INTERNAL_ATOMIC_MSVC_H
#define ATOMIC_INIT(...) {__VA_ARGS__}
#include "jemalloc/internal/jemalloc_preamble.h"
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
@@ -11,109 +16,106 @@ typedef enum {
atomic_memory_order_seq_cst
} atomic_memory_order_t;
typedef char atomic_repr_0_t;
typedef short atomic_repr_1_t;
typedef long atomic_repr_2_t;
typedef char atomic_repr_0_t;
typedef short atomic_repr_1_t;
typedef long atomic_repr_2_t;
typedef __int64 atomic_repr_3_t;
ATOMIC_INLINE void
atomic_fence(atomic_memory_order_t mo) {
_ReadWriteBarrier();
# if defined(_M_ARM) || defined(_M_ARM64)
#if defined(_M_ARM) || defined(_M_ARM64)
/* ARM needs a barrier for everything but relaxed. */
if (mo != atomic_memory_order_relaxed) {
MemoryBarrier();
}
# elif defined(_M_IX86) || defined (_M_X64)
#elif defined(_M_IX86) || defined(_M_X64)
/* x86 needs a barrier only for seq_cst. */
if (mo == atomic_memory_order_seq_cst) {
MemoryBarrier();
}
# else
# error "Don't know how to create atomics for this platform for MSVC."
# endif
#else
# error "Don't know how to create atomics for this platform for MSVC."
#endif
_ReadWriteBarrier();
}
#define ATOMIC_INTERLOCKED_REPR(lg_size) atomic_repr_ ## lg_size ## _t
#define ATOMIC_INTERLOCKED_REPR(lg_size) atomic_repr_##lg_size##_t
#define ATOMIC_CONCAT(a, b) ATOMIC_RAW_CONCAT(a, b)
#define ATOMIC_RAW_CONCAT(a, b) a ## b
#define ATOMIC_RAW_CONCAT(a, b) a##b
#define ATOMIC_INTERLOCKED_NAME(base_name, lg_size) ATOMIC_CONCAT( \
base_name, ATOMIC_INTERLOCKED_SUFFIX(lg_size))
#define ATOMIC_INTERLOCKED_NAME(base_name, lg_size) \
ATOMIC_CONCAT(base_name, ATOMIC_INTERLOCKED_SUFFIX(lg_size))
#define ATOMIC_INTERLOCKED_SUFFIX(lg_size) \
ATOMIC_CONCAT(ATOMIC_INTERLOCKED_SUFFIX_, lg_size)
#define ATOMIC_INTERLOCKED_SUFFIX(lg_size) \
ATOMIC_CONCAT(ATOMIC_INTERLOCKED_SUFFIX_, lg_size)
#define ATOMIC_INTERLOCKED_SUFFIX_0 8
#define ATOMIC_INTERLOCKED_SUFFIX_1 16
#define ATOMIC_INTERLOCKED_SUFFIX_2
#define ATOMIC_INTERLOCKED_SUFFIX_3 64
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_size) \
typedef struct { \
ATOMIC_INTERLOCKED_REPR(lg_size) repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type \
atomic_load_##short_type(const atomic_##short_type##_t *a, \
atomic_memory_order_t mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) ret = a->repr; \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_acquire); \
} \
return (type) ret; \
} \
\
ATOMIC_INLINE void \
atomic_store_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_release); \
} \
a->repr = (ATOMIC_INTERLOCKED_REPR(lg_size)) val; \
if (mo == atomic_memory_order_seq_cst) { \
atomic_fence(atomic_memory_order_seq_cst); \
} \
} \
\
ATOMIC_INLINE type \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedExchange, \
lg_size)(&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) e = \
(ATOMIC_INTERLOCKED_REPR(lg_size))*expected; \
ATOMIC_INTERLOCKED_REPR(lg_size) d = \
(ATOMIC_INTERLOCKED_REPR(lg_size))desired; \
ATOMIC_INTERLOCKED_REPR(lg_size) old = \
ATOMIC_INTERLOCKED_NAME(_InterlockedCompareExchange, \
lg_size)(&a->repr, d, e); \
if (old == e) { \
return true; \
} else { \
*expected = (type)old; \
return false; \
} \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
/* We implement the weak version with strong semantics. */ \
return atomic_compare_exchange_weak_##short_type(a, expected, \
desired, success_mo, failure_mo); \
}
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_size) \
typedef struct { \
ATOMIC_INTERLOCKED_REPR(lg_size) repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type atomic_load_##short_type( \
const atomic_##short_type##_t *a, atomic_memory_order_t mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) ret = a->repr; \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_acquire); \
} \
return (type)ret; \
} \
\
ATOMIC_INLINE void atomic_store_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_release); \
} \
a->repr = (ATOMIC_INTERLOCKED_REPR(lg_size))val; \
if (mo == atomic_memory_order_seq_cst) { \
atomic_fence(atomic_memory_order_seq_cst); \
} \
} \
\
ATOMIC_INLINE type atomic_exchange_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedExchange, \
lg_size)(&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_weak_##short_type( \
atomic_##short_type##_t *a, type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) \
e = (ATOMIC_INTERLOCKED_REPR(lg_size)) * expected; \
ATOMIC_INTERLOCKED_REPR(lg_size) \
d = (ATOMIC_INTERLOCKED_REPR(lg_size))desired; \
ATOMIC_INTERLOCKED_REPR(lg_size) \
old = ATOMIC_INTERLOCKED_NAME( \
_InterlockedCompareExchange, lg_size)(&a->repr, d, e); \
if (old == e) { \
return true; \
} else { \
*expected = (type)old; \
return false; \
} \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_strong_##short_type( \
atomic_##short_type##_t *a, type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
/* We implement the weak version with strong semantics. */ \
return atomic_compare_exchange_weak_##short_type( \
a, expected, desired, success_mo, failure_mo); \
}
/* clang-format off */
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_size) \
\
@ -154,5 +156,8 @@ atomic_fetch_xor_##short_type(atomic_##short_type##_t *a, \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedXor, lg_size)( \
&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
}
/* clang-format on */
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_MSVC_H */
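
Since the MSVC backend builds both compare-exchange variants on _InterlockedCompareExchange, the weak form never fails spuriously; a retry loop like the following sketch (again assuming the u32 instantiation from atomic.h) behaves identically with either variant:

static uint32_t
bounded_inc(atomic_u32_t *a, uint32_t max) {
	uint32_t cur = atomic_load_u32(a, ATOMIC_RELAXED);
	while (cur < max) {
		/* On failure, cur is refreshed with the observed value. */
		if (atomic_compare_exchange_weak_u32(a, &cur, cur + 1,
		    ATOMIC_RELAXED, ATOMIC_RELAXED)) {
			return cur + 1;
		}
	}
	return cur;
}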

View file

@ -1,23 +1,31 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H
extern bool opt_background_thread;
extern malloc_mutex_t background_thread_lock;
extern atomic_b_t background_thread_enabled_state;
extern size_t n_background_threads;
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/background_thread_structs.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/mutex.h"
extern bool opt_background_thread;
extern size_t opt_max_background_threads;
extern malloc_mutex_t background_thread_lock;
extern atomic_b_t background_thread_enabled_state;
extern size_t n_background_threads;
extern size_t max_background_threads;
extern background_thread_info_t *background_thread_info;
bool background_thread_create(tsd_t *tsd, unsigned arena_ind);
bool background_threads_enable(tsd_t *tsd);
bool background_threads_disable(tsd_t *tsd);
void background_thread_interval_check(tsdn_t *tsdn, arena_t *arena,
arena_decay_t *decay, size_t npages_new);
bool background_thread_is_started(background_thread_info_t *info);
void background_thread_wakeup_early(
background_thread_info_t *info, nstime_t *remaining_sleep);
void background_thread_prefork0(tsdn_t *tsdn);
void background_thread_prefork1(tsdn_t *tsdn);
void background_thread_postfork_parent(tsdn_t *tsdn);
void background_thread_postfork_child(tsdn_t *tsdn);
bool background_thread_stats_read(tsdn_t *tsdn,
background_thread_stats_t *stats);
bool background_thread_stats_read(
tsdn_t *tsdn, background_thread_stats_t *stats);
void background_thread_ctl_init(tsdn_t *tsdn);
#ifdef JEMALLOC_PTHREAD_CREATE_WRAPPER
@ -25,6 +33,6 @@ extern int pthread_create_wrapper(pthread_t *__restrict, const pthread_attr_t *,
void *(*)(void *), void *__restrict);
#endif
bool background_thread_boot0(void);
bool background_thread_boot1(tsdn_t *tsdn);
bool background_thread_boot1(tsdn_t *tsdn, base_t *base);
#endif /* JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H */
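
A hedged usage sketch of the knobs these externs back: applications typically drive them through the "background_thread" and "max_background_threads" mallctl names rather than touching the symbols directly (assuming an unprefixed jemalloc build; otherwise je_mallctl):

#include <stdbool.h>
#include <stddef.h>
#include <jemalloc/jemalloc.h>

static void
enable_background_threads(size_t nthreads) {
	bool enable = true;
	/* Cap the thread count first, then flip the global enable switch. */
	(void)mallctl("max_background_threads", NULL, NULL, &nthreads,
	    sizeof(nthreads));
	(void)mallctl("background_thread", NULL, NULL, &enable, sizeof(enable));
}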

View file

@ -1,34 +1,49 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/background_thread_externs.h"
JEMALLOC_ALWAYS_INLINE bool
background_thread_enabled(void) {
return atomic_load_b(&background_thread_enabled_state, ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE void
background_thread_enabled_set_impl(bool state) {
atomic_store_b(&background_thread_enabled_state, state, ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE void
background_thread_enabled_set(tsdn_t *tsdn, bool state) {
malloc_mutex_assert_owner(tsdn, &background_thread_lock);
atomic_store_b(&background_thread_enabled_state, state, ATOMIC_RELAXED);
background_thread_enabled_set_impl(state);
}
JEMALLOC_ALWAYS_INLINE background_thread_info_t *
arena_background_thread_info_get(arena_t *arena) {
unsigned arena_ind = arena_ind_get(arena);
return &background_thread_info[arena_ind % ncpus];
return &background_thread_info[arena_ind % max_background_threads];
}
JEMALLOC_ALWAYS_INLINE background_thread_info_t *
background_thread_info_get(size_t ind) {
return &background_thread_info[ind % max_background_threads];
}
JEMALLOC_ALWAYS_INLINE uint64_t
background_thread_wakeup_time_get(background_thread_info_t *info) {
uint64_t next_wakeup = nstime_ns(&info->next_wakeup);
assert(atomic_load_b(&info->indefinite_sleep, ATOMIC_ACQUIRE) ==
(next_wakeup == BACKGROUND_THREAD_INDEFINITE_SLEEP));
assert(atomic_load_b(&info->indefinite_sleep, ATOMIC_ACQUIRE)
== (next_wakeup == BACKGROUND_THREAD_INDEFINITE_SLEEP));
return next_wakeup;
}
JEMALLOC_ALWAYS_INLINE void
background_thread_wakeup_time_set(tsdn_t *tsdn, background_thread_info_t *info,
uint64_t wakeup_time) {
background_thread_wakeup_time_set(
tsdn_t *tsdn, background_thread_info_t *info, uint64_t wakeup_time) {
malloc_mutex_assert_owner(tsdn, &info->mtx);
atomic_store_b(&info->indefinite_sleep,
wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP, ATOMIC_RELEASE);
@ -40,17 +55,4 @@ background_thread_indefinite_sleep(background_thread_info_t *info) {
return atomic_load_b(&info->indefinite_sleep, ATOMIC_ACQUIRE);
}
JEMALLOC_ALWAYS_INLINE void
arena_background_thread_inactivity_check(tsdn_t *tsdn, arena_t *arena) {
if (!background_thread_enabled()) {
return;
}
background_thread_info_t *info =
arena_background_thread_info_get(arena);
if (background_thread_indefinite_sleep(info)) {
background_thread_interval_check(tsdn, arena,
&arena->decay_dirty, 0);
}
}
#endif /* JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H */
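
Worked example of the index change above (illustrative numbers): arenas are striped over the configured background threads, so with max_background_threads == 4, arena 9 maps to info slot 9 % 4 == 1. Using the configured thread count rather than ncpus as the modulus keeps the index inside the background_thread_info array when fewer threads than CPUs are configured.

unsigned arena_ind = 9;
size_t max_bg = 4;                     /* illustrative value */
size_t info_slot = arena_ind % max_bg; /* == 1 */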

View file

@ -1,13 +1,29 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_STRUCTS_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_STRUCTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex.h"
/* This file really combines "structs" and "types", but only transitionally. */
#if defined(JEMALLOC_BACKGROUND_THREAD) || defined(JEMALLOC_LAZY_LOCK)
# define JEMALLOC_PTHREAD_CREATE_WRAPPER
# define JEMALLOC_PTHREAD_CREATE_WRAPPER
#endif
#define BACKGROUND_THREAD_INDEFINITE_SLEEP UINT64_MAX
#define MAX_BACKGROUND_THREAD_LIMIT MALLOCX_ARENA_LIMIT
#define DEFAULT_NUM_BACKGROUND_THREAD 4
/*
* These exist only as a transitional state. Eventually, deferral should be
* part of the PAI, and each implementation can indicate wait times with more
* specificity.
*/
#define BACKGROUND_THREAD_HPA_INTERVAL_MAX_UNINITIALIZED (-2)
#define BACKGROUND_THREAD_HPA_INTERVAL_MAX_DEFAULT_WHEN_ENABLED 5000
#define BACKGROUND_THREAD_DEFERRED_MIN UINT64_C(0)
#define BACKGROUND_THREAD_DEFERRED_MAX UINT64_MAX
typedef enum {
background_thread_stopped,
@ -19,33 +35,34 @@ typedef enum {
struct background_thread_info_s {
#ifdef JEMALLOC_BACKGROUND_THREAD
/* Background thread is pthread specific. */
pthread_t thread;
pthread_cond_t cond;
pthread_t thread;
pthread_cond_t cond;
#endif
malloc_mutex_t mtx;
background_thread_state_t state;
malloc_mutex_t mtx;
background_thread_state_t state;
/* When true, it means no wakeup scheduled. */
atomic_b_t indefinite_sleep;
atomic_b_t indefinite_sleep;
/* Next scheduled wakeup time (absolute time in ns). */
nstime_t next_wakeup;
nstime_t next_wakeup;
/*
* Since the last background thread run, newly added number of pages
* that need to be purged by the next wakeup. This is adjusted on
* epoch advance, and is used to determine whether we should signal the
* background thread to wake up earlier.
*/
size_t npages_to_purge_new;
size_t npages_to_purge_new;
/* Stats: total number of runs since started. */
uint64_t tot_n_runs;
uint64_t tot_n_runs;
/* Stats: total sleep time since started. */
nstime_t tot_sleep_time;
nstime_t tot_sleep_time;
};
typedef struct background_thread_info_s background_thread_info_t;
struct background_thread_stats_s {
size_t num_threads;
uint64_t num_runs;
nstime_t run_interval;
size_t num_threads;
uint64_t num_runs;
nstime_t run_interval;
mutex_prof_data_t max_counter_per_bg_thd;
};
typedef struct background_thread_stats_s background_thread_stats_t;

View file

@ -0,0 +1,125 @@
#ifndef JEMALLOC_INTERNAL_BASE_H
#define JEMALLOC_INTERNAL_BASE_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/ehooks.h"
#include "jemalloc/internal/mutex.h"
/*
* Alignment when THP is not enabled. Set to constant 2M in case the HUGEPAGE
* value is unexpectedly high (which would cause VM over-reservation).
*/
#define BASE_BLOCK_MIN_ALIGN ((size_t)2 << 20)
enum metadata_thp_mode_e {
metadata_thp_disabled = 0,
/*
* Lazily enable hugepages for metadata. To avoid high RSS when THP is
* combined with a low-usage arena (i.e. THP-backed metadata becomes a
* significant fraction of RSS), the "auto" option only starts using THP
* after a base allocator has used up its first THP region. Starting from
* the second hugepage (in a single
* arena), "auto" behaves the same as "always", i.e. madvise hugepage
* right away.
*/
metadata_thp_auto = 1,
metadata_thp_always = 2,
metadata_thp_mode_limit = 3
};
typedef enum metadata_thp_mode_e metadata_thp_mode_t;
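
A sketch (not part of the header) of the "auto" policy described in the comment above, assuming the decision hinges on the opt_metadata_thp setting declared just below and the auto_thp_switched flag that struct base_s declares further down:

static bool
base_wants_thp_sketch(const base_t *base) {
	switch (opt_metadata_thp) {
	case metadata_thp_always:
		return true;
	case metadata_thp_auto:
		/* Only once the first hugepage region has been used up. */
		return base->auto_thp_switched;
	default:
		return false; /* metadata_thp_disabled */
	}
}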
#define METADATA_THP_DEFAULT metadata_thp_disabled
extern metadata_thp_mode_t opt_metadata_thp;
extern const char *const metadata_thp_mode_names[];
/* Embedded at the beginning of every block of base-managed virtual memory. */
typedef struct base_block_s base_block_t;
struct base_block_s {
/* Total size of block's virtual memory mapping. */
size_t size;
/* Next block in list of base's blocks. */
base_block_t *next;
/* Tracks unused trailing space. */
edata_t edata;
};
typedef struct base_s base_t;
struct base_s {
/*
* User-configurable extent hook functions.
*/
ehooks_t ehooks;
/*
* User-configurable extent hook functions for metadata allocations.
*/
ehooks_t ehooks_base;
/* Protects base_alloc() and base_stats_get() operations. */
malloc_mutex_t mtx;
/* Using THP when true (metadata_thp auto mode). */
bool auto_thp_switched;
/*
* Most recent size class in the series of increasingly large base
* extents. Logarithmic spacing between subsequent allocations ensures
* that the total number of distinct mappings remains small.
*/
pszind_t pind_last;
/* Serial number generation state. */
size_t extent_sn_next;
/* Chain of all blocks associated with base. */
base_block_t *blocks;
/* Heap of extents that track unused trailing space within blocks. */
edata_heap_t avail[SC_NSIZES];
/* Contains reusable base edata (used by tcache_stacks currently). */
edata_avail_t edata_avail;
/* Stats, only maintained if config_stats. */
size_t allocated;
size_t edata_allocated;
size_t rtree_allocated;
size_t resident;
size_t mapped;
/* Number of THP regions touched. */
size_t n_thp;
};
static inline unsigned
base_ind_get(const base_t *base) {
return ehooks_ind_get(&base->ehooks);
}
static inline bool
metadata_thp_enabled(void) {
return (opt_metadata_thp != metadata_thp_disabled);
}
base_t *b0get(void);
base_t *base_new(tsdn_t *tsdn, unsigned ind, const extent_hooks_t *extent_hooks,
bool metadata_use_hooks);
void base_delete(tsdn_t *tsdn, base_t *base);
ehooks_t *base_ehooks_get(base_t *base);
ehooks_t *base_ehooks_get_for_metadata(base_t *base);
extent_hooks_t *base_extent_hooks_set(
base_t *base, extent_hooks_t *extent_hooks);
void *base_alloc(tsdn_t *tsdn, base_t *base, size_t size, size_t alignment);
edata_t *base_alloc_edata(tsdn_t *tsdn, base_t *base);
void *base_alloc_rtree(tsdn_t *tsdn, base_t *base, size_t size);
void *b0_alloc_tcache_stack(tsdn_t *tsdn, size_t size);
void b0_dalloc_tcache_stack(tsdn_t *tsdn, void *tcache_stack);
void base_stats_get(tsdn_t *tsdn, base_t *base, size_t *allocated,
size_t *edata_allocated, size_t *rtree_allocated, size_t *resident,
size_t *mapped, size_t *n_thp);
void base_prefork(tsdn_t *tsdn, base_t *base);
void base_postfork_parent(tsdn_t *tsdn, base_t *base);
void base_postfork_child(tsdn_t *tsdn, base_t *base);
bool base_boot(tsdn_t *tsdn);
#endif /* JEMALLOC_INTERNAL_BASE_H */
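
A minimal usage sketch for the API above: a base hands out metadata that is never individually freed, carving allocations out of the current block and mapping a new, geometrically larger block (tracked via pind_last) when a request does not fit.

static void
base_usage_sketch(tsdn_t *tsdn, const extent_hooks_t *hooks) {
	base_t *b = base_new(tsdn, /* ind */ 0, hooks,
	    /* metadata_use_hooks */ true);
	if (b == NULL) {
		return;
	}
	/* 256 bytes of cacheline-aligned metadata; freed only with the base. */
	void *meta = base_alloc(tsdn, b, 256, 64);
	(void)meta;
	base_delete(tsdn, b);
}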

View file

@ -1,19 +0,0 @@
#ifndef JEMALLOC_INTERNAL_BASE_EXTERNS_H
#define JEMALLOC_INTERNAL_BASE_EXTERNS_H
base_t *b0get(void);
base_t *base_new(tsdn_t *tsdn, unsigned ind, extent_hooks_t *extent_hooks);
void base_delete(base_t *base);
extent_hooks_t *base_extent_hooks_get(base_t *base);
extent_hooks_t *base_extent_hooks_set(base_t *base,
extent_hooks_t *extent_hooks);
void *base_alloc(tsdn_t *tsdn, base_t *base, size_t size, size_t alignment);
extent_t *base_alloc_extent(tsdn_t *tsdn, base_t *base);
void base_stats_get(tsdn_t *tsdn, base_t *base, size_t *allocated,
size_t *resident, size_t *mapped);
void base_prefork(tsdn_t *tsdn, base_t *base);
void base_postfork_parent(tsdn_t *tsdn, base_t *base);
void base_postfork_child(tsdn_t *tsdn, base_t *base);
bool base_boot(tsdn_t *tsdn);
#endif /* JEMALLOC_INTERNAL_BASE_EXTERNS_H */

View file

@ -1,9 +0,0 @@
#ifndef JEMALLOC_INTERNAL_BASE_INLINES_H
#define JEMALLOC_INTERNAL_BASE_INLINES_H
static inline unsigned
base_ind_get(const base_t *base) {
return base->ind;
}
#endif /* JEMALLOC_INTERNAL_BASE_INLINES_H */

View file

@ -1,55 +0,0 @@
#ifndef JEMALLOC_INTERNAL_BASE_STRUCTS_H
#define JEMALLOC_INTERNAL_BASE_STRUCTS_H
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/size_classes.h"
/* Embedded at the beginning of every block of base-managed virtual memory. */
struct base_block_s {
/* Total size of block's virtual memory mapping. */
size_t size;
/* Next block in list of base's blocks. */
base_block_t *next;
/* Tracks unused trailing space. */
extent_t extent;
};
struct base_s {
/* Associated arena's index within the arenas array. */
unsigned ind;
/*
* User-configurable extent hook functions. Points to an
* extent_hooks_t.
*/
atomic_p_t extent_hooks;
/* Protects base_alloc() and base_stats_get() operations. */
malloc_mutex_t mtx;
/*
* Most recent size class in the series of increasingly large base
* extents. Logarithmic spacing between subsequent allocations ensures
* that the total number of distinct mappings remains small.
*/
pszind_t pind_last;
/* Serial number generation state. */
size_t extent_sn_next;
/* Chain of all blocks associated with base. */
base_block_t *blocks;
/* Heap of extents that track unused trailing space within blocks. */
extent_heap_t avail[NSIZES];
/* Stats, only maintained if config_stats. */
size_t allocated;
size_t resident;
size_t mapped;
};
#endif /* JEMALLOC_INTERNAL_BASE_STRUCTS_H */

View file

@ -1,7 +0,0 @@
#ifndef JEMALLOC_INTERNAL_BASE_TYPES_H
#define JEMALLOC_INTERNAL_BASE_TYPES_H
typedef struct base_block_s base_block_t;
typedef struct base_s base_t;
#endif /* JEMALLOC_INTERNAL_BASE_TYPES_H */

View file

@ -0,0 +1,121 @@
#ifndef JEMALLOC_INTERNAL_BIN_H
#define JEMALLOC_INTERNAL_BIN_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bin_stats.h"
#include "jemalloc/internal/bin_types.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/sc.h"
/*
* A bin contains a set of extents that are currently being used for slab
* allocations.
*/
typedef struct bin_s bin_t;
struct bin_s {
/* All operations on bin_t fields require lock ownership. */
malloc_mutex_t lock;
/*
* Bin statistics. These get touched every time the lock is acquired,
* so put them close by in the hopes of getting some cache locality.
*/
bin_stats_t stats;
/*
* Current slab being used to service allocations of this bin's size
* class. slabcur is independent of slabs_{nonfull,full}; whenever
* slabcur is reassigned, the previous slab must be deallocated or
* inserted into slabs_{nonfull,full}.
*/
edata_t *slabcur;
/*
* Heap of non-full slabs. This heap is used to assure that new
* allocations come from the non-full slab that is oldest/lowest in
* memory.
*/
edata_heap_t slabs_nonfull;
/* List used to track full slabs. */
edata_list_active_t slabs_full;
};
/* A set of sharded bins of the same size class. */
typedef struct bins_s bins_t;
struct bins_s {
/* Sharded bins. Dynamically sized. */
bin_t *bin_shards;
};
void bin_shard_sizes_boot(unsigned bin_shard_sizes[SC_NBINS]);
bool bin_update_shard_size(unsigned bin_shards[SC_NBINS], size_t start_size,
size_t end_size, size_t nshards);
/* Initializes a bin to empty. Returns true on error. */
bool bin_init(bin_t *bin);
/* Forking. */
void bin_prefork(tsdn_t *tsdn, bin_t *bin);
void bin_postfork_parent(tsdn_t *tsdn, bin_t *bin);
void bin_postfork_child(tsdn_t *tsdn, bin_t *bin);
/* Slab region allocation. */
void *bin_slab_reg_alloc(edata_t *slab, const bin_info_t *bin_info);
void bin_slab_reg_alloc_batch(
edata_t *slab, const bin_info_t *bin_info, unsigned cnt, void **ptrs);
/* Slab list management. */
void bin_slabs_nonfull_insert(bin_t *bin, edata_t *slab);
void bin_slabs_nonfull_remove(bin_t *bin, edata_t *slab);
edata_t *bin_slabs_nonfull_tryget(bin_t *bin);
void bin_slabs_full_insert(bool is_auto, bin_t *bin, edata_t *slab);
void bin_slabs_full_remove(bool is_auto, bin_t *bin, edata_t *slab);
/* Slab association / demotion. */
void bin_dissociate_slab(bool is_auto, edata_t *slab, bin_t *bin);
void bin_lower_slab(tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
/* Deallocation helpers (called under bin lock). */
void bin_dalloc_slab_prepare(tsdn_t *tsdn, edata_t *slab, bin_t *bin);
void bin_dalloc_locked_handle_newly_empty(
tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
void bin_dalloc_locked_handle_newly_nonempty(
tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
/* Slabcur refill and allocation. */
void bin_refill_slabcur_with_fresh_slab(tsdn_t *tsdn, bin_t *bin,
szind_t binind, edata_t *fresh_slab);
void *bin_malloc_with_fresh_slab(tsdn_t *tsdn, bin_t *bin,
szind_t binind, edata_t *fresh_slab);
bool bin_refill_slabcur_no_fresh_slab(tsdn_t *tsdn, bool is_auto,
bin_t *bin);
void *bin_malloc_no_fresh_slab(tsdn_t *tsdn, bool is_auto, bin_t *bin,
szind_t binind);
/* Bin selection. */
bin_t *bin_choose(tsdn_t *tsdn, arena_t *arena, szind_t binind,
unsigned *binshard_p);
/* Stats. */
static inline void
bin_stats_merge(tsdn_t *tsdn, bin_stats_data_t *dst_bin_stats, bin_t *bin) {
malloc_mutex_lock(tsdn, &bin->lock);
malloc_mutex_prof_accum(tsdn, &dst_bin_stats->mutex_data, &bin->lock);
bin_stats_t *stats = &dst_bin_stats->stats_data;
stats->nmalloc += bin->stats.nmalloc;
stats->ndalloc += bin->stats.ndalloc;
stats->nrequests += bin->stats.nrequests;
stats->curregs += bin->stats.curregs;
stats->nfills += bin->stats.nfills;
stats->nflushes += bin->stats.nflushes;
stats->nslabs += bin->stats.nslabs;
stats->reslabs += bin->stats.reslabs;
stats->curslabs += bin->stats.curslabs;
stats->nonfull_slabs += bin->stats.nonfull_slabs;
malloc_mutex_unlock(tsdn, &bin->lock);
}
#endif /* JEMALLOC_INTERNAL_BIN_H */
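
Illustrative sketch of how an arena-level stats pass might use bin_stats_merge above; the shard count is passed in explicitly here, since the accessor for it lives outside this header:

static void
size_class_stats_sketch(tsdn_t *tsdn, bins_t *bins, unsigned nshards,
    bin_stats_data_t *dst) {
	for (unsigned i = 0; i < nshards; i++) {
		/* Takes and drops each shard's bin lock internally. */
		bin_stats_merge(tsdn, dst, &bins->bin_shards[i]);
	}
}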

View file

@ -0,0 +1,51 @@
#ifndef JEMALLOC_INTERNAL_BIN_INFO_H
#define JEMALLOC_INTERNAL_BIN_INFO_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bitmap.h"
/*
* Read-only information associated with each element of arena_t's bins array
* is stored separately, partly to reduce memory usage (only one copy, rather
* than one per arena), but mainly to avoid false cacheline sharing.
*
* Each slab has the following layout:
*
* /--------------------\
* | region 0 |
* |--------------------|
* | region 1 |
* |--------------------|
* | ... |
* | ... |
* | ... |
* |--------------------|
* | region nregs-1 |
* \--------------------/
*/
typedef struct bin_info_s bin_info_t;
struct bin_info_s {
/* Size of regions in a slab for this bin's size class. */
size_t reg_size;
/* Total size of a slab for this bin's size class. */
size_t slab_size;
/* Total number of regions in a slab for this bin's size class. */
uint32_t nregs;
/* Number of sharded bins in each arena for this size class. */
uint32_t n_shards;
/*
* Metadata used to manipulate bitmaps for slabs associated with this
* bin.
*/
bitmap_info_t bitmap_info;
};
extern bin_info_t bin_infos[SC_NBINS];
void bin_info_boot(sc_data_t *sc_data, unsigned bin_shard_sizes[SC_NBINS]);
#endif /* JEMALLOC_INTERNAL_BIN_INFO_H */
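
Worked example with illustrative numbers: for a 16-byte size class backed by a one-page (4 KiB) slab, the fields above would be reg_size == 16, slab_size == 4096, and nregs == 4096 / 16 == 256, so bitmap_info describes a 256-bit bitmap with one bit per region.

uint32_t nregs_example = 4096 / 16; /* == 256 regions per 4 KiB slab */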

View file

@ -0,0 +1,112 @@
#ifndef JEMALLOC_INTERNAL_BIN_INLINES_H
#define JEMALLOC_INTERNAL_BIN_INLINES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/sc.h"
/*
* The dalloc bin info contains just the information that the common paths need
* during tcache flushes. By force-inlining these paths, and using local copies
* of data (so that the compiler knows it's constant), we avoid a whole bunch of
* redundant loads and stores by leaving this information in registers.
*/
typedef struct bin_dalloc_locked_info_s bin_dalloc_locked_info_t;
struct bin_dalloc_locked_info_s {
div_info_t div_info;
uint32_t nregs;
uint64_t ndalloc;
};
/* Find the region index of a pointer within a slab. */
JEMALLOC_ALWAYS_INLINE size_t
bin_slab_regind_impl(
div_info_t *div_info, szind_t binind, edata_t *slab, const void *ptr) {
size_t diff, regind;
/* Freeing a pointer outside the slab can cause assertion failure. */
assert((uintptr_t)ptr >= (uintptr_t)edata_addr_get(slab));
assert((uintptr_t)ptr < (uintptr_t)edata_past_get(slab));
/* Freeing an interior pointer can cause assertion failure. */
assert(((uintptr_t)ptr - (uintptr_t)edata_addr_get(slab))
% (uintptr_t)bin_infos[binind].reg_size
== 0);
diff = (size_t)((uintptr_t)ptr - (uintptr_t)edata_addr_get(slab));
/* Avoid doing division with a variable divisor. */
regind = div_compute(div_info, diff);
assert(regind < bin_infos[binind].nregs);
return regind;
}
JEMALLOC_ALWAYS_INLINE size_t
bin_slab_regind(bin_dalloc_locked_info_t *info, szind_t binind,
edata_t *slab, const void *ptr) {
size_t regind = bin_slab_regind_impl(
&info->div_info, binind, slab, ptr);
return regind;
}
JEMALLOC_ALWAYS_INLINE void
bin_dalloc_locked_begin(
bin_dalloc_locked_info_t *info, szind_t binind) {
info->div_info = arena_binind_div_info[binind];
info->nregs = bin_infos[binind].nregs;
info->ndalloc = 0;
}
/*
* Does the deallocation work associated with freeing a single pointer (a
* "step") in between a bin_dalloc_locked begin and end call.
*
* Returns true if arena_slab_dalloc must be called on slab. Doesn't do
* stats updates, which happen during finish (this lets running counts get left
* in a register).
*/
JEMALLOC_ALWAYS_INLINE bool
bin_dalloc_locked_step(tsdn_t *tsdn, bool is_auto, bin_t *bin,
bin_dalloc_locked_info_t *info, szind_t binind, edata_t *slab,
void *ptr) {
const bin_info_t *bin_info = &bin_infos[binind];
size_t regind = bin_slab_regind(info, binind, slab, ptr);
slab_data_t *slab_data = edata_slab_data_get(slab);
assert(edata_nfree_get(slab) < bin_info->nregs);
/* Freeing an unallocated pointer can cause assertion failure. */
assert(bitmap_get(slab_data->bitmap, &bin_info->bitmap_info, regind));
bitmap_unset(slab_data->bitmap, &bin_info->bitmap_info, regind);
edata_nfree_inc(slab);
if (config_stats) {
info->ndalloc++;
}
unsigned nfree = edata_nfree_get(slab);
if (nfree == bin_info->nregs) {
bin_dalloc_locked_handle_newly_empty(
tsdn, is_auto, slab, bin);
return true;
} else if (nfree == 1 && slab != bin->slabcur) {
bin_dalloc_locked_handle_newly_nonempty(
tsdn, is_auto, slab, bin);
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
bin_dalloc_locked_finish(tsdn_t *tsdn, bin_t *bin,
bin_dalloc_locked_info_t *info) {
if (config_stats) {
bin->stats.ndalloc += info->ndalloc;
assert(bin->stats.curregs >= (size_t)info->ndalloc);
bin->stats.curregs -= (size_t)info->ndalloc;
}
}
#endif /* JEMALLOC_INTERNAL_BIN_INLINES_H */
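
Sketch of the intended call pattern (a simplified version of what a tcache flush does): begin once per size class, step once per pointer while holding the bin lock, then finish to publish the batched stats. All pointers are assumed to belong to slabs owned by this bin, and the slab lookup (normally done via the emap) is elided.

static void
flush_batch_sketch(tsdn_t *tsdn, bool is_auto, bin_t *bin, szind_t binind,
    edata_t **slabs, void **ptrs, unsigned n) {
	bin_dalloc_locked_info_t info;
	bin_dalloc_locked_begin(&info, binind);
	malloc_mutex_lock(tsdn, &bin->lock);
	for (unsigned i = 0; i < n; i++) {
		if (bin_dalloc_locked_step(tsdn, is_auto, bin, &info, binind,
		    slabs[i], ptrs[i])) {
			/*
			 * The slab became empty; the caller must hand it to
			 * arena_slab_dalloc() after dropping the bin lock.
			 */
		}
	}
	bin_dalloc_locked_finish(tsdn, bin, &info);
	malloc_mutex_unlock(tsdn, &bin->lock);
}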

View file

@ -0,0 +1,58 @@
#ifndef JEMALLOC_INTERNAL_BIN_STATS_H
#define JEMALLOC_INTERNAL_BIN_STATS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex_prof.h"
typedef struct bin_stats_s bin_stats_t;
struct bin_stats_s {
/*
* Total number of allocation/deallocation requests served directly by
* the bin. Note that tcache may allocate an object, then recycle it
* many times, resulting in many increments to nrequests, but only one
* each to nmalloc and ndalloc.
*/
uint64_t nmalloc;
uint64_t ndalloc;
/*
* Number of allocation requests that correspond to the size of this
* bin. This includes requests served by tcache, though tcache only
* periodically merges into this counter.
*/
uint64_t nrequests;
/*
* Current number of regions of this size class, including regions
* currently cached by tcache.
*/
size_t curregs;
/* Number of tcache fills from this bin. */
uint64_t nfills;
/* Number of tcache flushes to this bin. */
uint64_t nflushes;
/* Total number of slabs created for this bin's size class. */
uint64_t nslabs;
/*
* Total number of slabs reused by extracting them from the slabs heap
* for this bin's size class.
*/
uint64_t reslabs;
/* Current number of slabs in this bin. */
size_t curslabs;
/* Current size of nonfull slabs heap in this bin. */
size_t nonfull_slabs;
};
typedef struct bin_stats_data_s bin_stats_data_t;
struct bin_stats_data_s {
bin_stats_t stats_data;
mutex_prof_data_t mutex_data;
};
#endif /* JEMALLOC_INTERNAL_BIN_STATS_H */

View file

@ -0,0 +1,21 @@
#ifndef JEMALLOC_INTERNAL_BIN_TYPES_H
#define JEMALLOC_INTERNAL_BIN_TYPES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sc.h"
#define BIN_SHARDS_MAX (1 << EDATA_BITS_BINSHARD_WIDTH)
#define N_BIN_SHARDS_DEFAULT 1
/* Used in TSD static initializer only. Real init in arena_bind(). */
#define TSD_BINSHARDS_ZERO_INITIALIZER \
{ \
{ UINT8_MAX } \
}
typedef struct tsd_binshards_s tsd_binshards_t;
struct tsd_binshards_s {
uint8_t binshard[SC_NBINS];
};
#endif /* JEMALLOC_INTERNAL_BIN_TYPES_H */

View file

@ -1,93 +1,391 @@
#ifndef JEMALLOC_INTERNAL_BIT_UTIL_H
#define JEMALLOC_INTERNAL_BIT_UTIL_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#define BIT_UTIL_INLINE static inline
/* Sanity check. */
#if !defined(JEMALLOC_INTERNAL_FFSLL) || !defined(JEMALLOC_INTERNAL_FFSL) \
#if !defined(JEMALLOC_INTERNAL_FFSLL) || !defined(JEMALLOC_INTERNAL_FFSL) \
|| !defined(JEMALLOC_INTERNAL_FFS)
# error JEMALLOC_INTERNAL_FFS{,L,LL} should have been defined by configure
# error JEMALLOC_INTERNAL_FFS{,L,LL} should have been defined by configure
#endif
BIT_UTIL_INLINE unsigned
ffs_llu(unsigned long long bitmap) {
return JEMALLOC_INTERNAL_FFSLL(bitmap);
/*
* Unlike the builtins and posix ffs functions, our ffs requires a non-zero
* input, and returns the position of the lowest bit set (as opposed to the
* posix versions, which return 1 larger than that position and use a return
* value of zero as a sentinel). This tends to simplify logic in callers, and
* allows for consistency with the builtins we build fls on top of.
*/
static inline unsigned
ffs_llu(unsigned long long x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFSLL(x) - 1;
}
BIT_UTIL_INLINE unsigned
ffs_lu(unsigned long bitmap) {
return JEMALLOC_INTERNAL_FFSL(bitmap);
static inline unsigned
ffs_lu(unsigned long x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFSL(x) - 1;
}
BIT_UTIL_INLINE unsigned
ffs_u(unsigned bitmap) {
return JEMALLOC_INTERNAL_FFS(bitmap);
static inline unsigned
ffs_u(unsigned x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFS(x) - 1;
}
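
Worked example of the convention described above: for an input of 0x18 (binary 11000), ffs_u returns 3, the zero-based index of the lowest set bit, whereas POSIX ffs() would return 4 and reserves 0 for a zero input; fls_u (defined below) returns 4 for the same input. Passing 0 here violates the stated precondition (util_assume(x != 0)).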
BIT_UTIL_INLINE unsigned
ffs_zu(size_t bitmap) {
/* clang-format off */
#define DO_FLS_SLOW(x, suffix) do { \
util_assume(x != 0); \
x |= (x >> 1); \
x |= (x >> 2); \
x |= (x >> 4); \
x |= (x >> 8); \
x |= (x >> 16); \
if (sizeof(x) > 4) { \
/* \
* If sizeof(x) is 4, then the expression "x >> 32" \
* will generate compiler warnings even if the code \
* never executes. This circumvents the warning, and \
* gets compiled out in optimized builds. \
*/ \
int constant_32 = sizeof(x) * 4; \
x |= (x >> constant_32); \
} \
x++; \
if (x == 0) { \
return 8 * sizeof(x) - 1; \
} \
return ffs_##suffix(x) - 1; \
} while(0)
/* clang-format on */
static inline unsigned
fls_llu_slow(unsigned long long x) {
DO_FLS_SLOW(x, llu);
}
static inline unsigned
fls_lu_slow(unsigned long x) {
DO_FLS_SLOW(x, lu);
}
static inline unsigned
fls_u_slow(unsigned x) {
DO_FLS_SLOW(x, u);
}
#undef DO_FLS_SLOW
#ifdef JEMALLOC_HAVE_BUILTIN_CLZ
static inline unsigned
fls_llu(unsigned long long x) {
util_assume(x != 0);
/*
* Note that the xor here is more naturally written as subtraction; the
* last bit set is the number of bits in the type minus the number of
* leading zero bits. But GCC implements that as:
* bsr edi, edi
* mov eax, 31
* xor edi, 31
* sub eax, edi
* If we write it as xor instead, then we get
* bsr eax, edi
* as desired.
*/
return (8 * sizeof(x) - 1) ^ __builtin_clzll(x);
}
static inline unsigned
fls_lu(unsigned long x) {
util_assume(x != 0);
return (8 * sizeof(x) - 1) ^ __builtin_clzl(x);
}
static inline unsigned
fls_u(unsigned x) {
util_assume(x != 0);
return (8 * sizeof(x) - 1) ^ __builtin_clz(x);
}
#elif defined(_MSC_VER)
# if LG_SIZEOF_PTR == 3
# define DO_BSR64(bit, x) _BitScanReverse64(&bit, x)
# else
/*
* This never actually runs; we're just dodging a compiler error for the
* never-taken branch where sizeof(void *) == 8.
*/
# define DO_BSR64(bit, x) \
bit = 0; \
unreachable()
# endif
/* clang-format off */
#define DO_FLS(x) do { \
if (x == 0) { \
return 8 * sizeof(x); \
} \
unsigned long bit; \
if (sizeof(x) == 4) { \
_BitScanReverse(&bit, (unsigned)x); \
return (unsigned)bit; \
} \
if (sizeof(x) == 8 && sizeof(void *) == 8) { \
DO_BSR64(bit, x); \
return (unsigned)bit; \
} \
if (sizeof(x) == 8 && sizeof(void *) == 4) { \
/* Dodge a compiler warning, as above. */ \
int constant_32 = sizeof(x) * 4; \
if (_BitScanReverse(&bit, \
(unsigned)(x >> constant_32))) { \
return 32 + (unsigned)bit; \
} else { \
_BitScanReverse(&bit, (unsigned)x); \
return (unsigned)bit; \
} \
} \
unreachable(); \
} while (0)
/* clang-format on */
static inline unsigned
fls_llu(unsigned long long x) {
DO_FLS(x);
}
static inline unsigned
fls_lu(unsigned long x) {
DO_FLS(x);
}
static inline unsigned
fls_u(unsigned x) {
DO_FLS(x);
}
# undef DO_FLS
# undef DO_BSR64
#else
static inline unsigned
fls_llu(unsigned long long x) {
return fls_llu_slow(x);
}
static inline unsigned
fls_lu(unsigned long x) {
return fls_lu_slow(x);
}
static inline unsigned
fls_u(unsigned x) {
return fls_u_slow(x);
}
#endif
#if LG_SIZEOF_LONG_LONG > 3
# error "Haven't implemented popcount for 16-byte ints."
#endif
/* clang-format off */
#define DO_POPCOUNT(x, type) do { \
/* \
* Algorithm from an old AMD optimization reference manual. \
* We're putting a little bit more work than you might expect \
* into the no-intrinsic case, since we only support the \
* GCC intrinsics spelling of popcount (for now). Detecting \
* whether or not the popcount builtin is actually useable in \
* MSVC is nontrivial. \
*/ \
\
type bmul = (type)0x0101010101010101ULL; \
\
/* \
* Replace each 2 bits with the sideways sum of the original \
* values. 0x5 = 0b0101. \
* \
* You might expect this to be: \
* x = (x & 0x55...) + ((x >> 1) & 0x55...). \
* That costs an extra mask relative to this, though. \
*/ \
x = x - ((x >> 1) & (0x55U * bmul)); \
/* Replace each 4 bits with their sideways sum. 0x3 = 0b0011. */\
x = (x & (bmul * 0x33U)) + ((x >> 2) & (bmul * 0x33U)); \
/* \
* Replace each 8 bits with their sideways sum. Note that we \
* can't overflow within each 4-bit sum here, so we can skip \
* the initial mask. \
*/ \
x = (x + (x >> 4)) & (bmul * 0x0FU); \
/* \
* None of the partial sums in this multiplication (viewed in \
* base-256) can overflow into the next digit. So the least \
* significant byte of the product will be the least \
* significant byte of the original value, the second least \
* significant byte will be the sum of the two least \
* significant bytes of the original value, and so on. \
* Importantly, the high byte will be the byte-wise sum of all \
* the bytes of the original value. \
*/ \
x = x * bmul; \
x >>= ((sizeof(x) - 1) * 8); \
return (unsigned)x; \
} while(0)
/* clang-format on */
static inline unsigned
popcount_u_slow(unsigned bitmap) {
DO_POPCOUNT(bitmap, unsigned);
}
static inline unsigned
popcount_lu_slow(unsigned long bitmap) {
DO_POPCOUNT(bitmap, unsigned long);
}
static inline unsigned
popcount_llu_slow(unsigned long long bitmap) {
DO_POPCOUNT(bitmap, unsigned long long);
}
#undef DO_POPCOUNT
static inline unsigned
popcount_u(unsigned bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNT
return JEMALLOC_INTERNAL_POPCOUNT(bitmap);
#else
return popcount_u_slow(bitmap);
#endif
}
static inline unsigned
popcount_lu(unsigned long bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNTL
return JEMALLOC_INTERNAL_POPCOUNTL(bitmap);
#else
return popcount_lu_slow(bitmap);
#endif
}
static inline unsigned
popcount_llu(unsigned long long bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNTLL
return JEMALLOC_INTERNAL_POPCOUNTLL(bitmap);
#else
return popcount_llu_slow(bitmap);
#endif
}
/*
 * Clears the lowest set bit in *bitmap and returns its zero-based index.
 * *bitmap must not be 0.
 */
static inline size_t
cfs_lu(unsigned long *bitmap) {
util_assume(*bitmap != 0);
size_t bit = ffs_lu(*bitmap);
*bitmap ^= ZU(1) << bit;
return bit;
}
static inline unsigned
ffs_zu(size_t x) {
#if LG_SIZEOF_PTR == LG_SIZEOF_INT
return ffs_u(bitmap);
return ffs_u(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG
return ffs_lu(bitmap);
return ffs_lu(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG_LONG
return ffs_llu(bitmap);
return ffs_llu(x);
#else
#error No implementation for size_t ffs()
# error No implementation for size_t ffs()
#endif
}
BIT_UTIL_INLINE unsigned
ffs_u64(uint64_t bitmap) {
static inline unsigned
fls_zu(size_t x) {
#if LG_SIZEOF_PTR == LG_SIZEOF_INT
return fls_u(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG
return fls_lu(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG_LONG
return fls_llu(x);
#else
# error No implementation for size_t fls()
#endif
}
static inline unsigned
ffs_u64(uint64_t x) {
#if LG_SIZEOF_LONG == 3
return ffs_lu(bitmap);
return ffs_lu(x);
#elif LG_SIZEOF_LONG_LONG == 3
return ffs_llu(bitmap);
return ffs_llu(x);
#else
#error No implementation for 64-bit ffs()
# error No implementation for 64-bit ffs()
#endif
}
BIT_UTIL_INLINE unsigned
ffs_u32(uint32_t bitmap) {
static inline unsigned
fls_u64(uint64_t x) {
#if LG_SIZEOF_LONG == 3
return fls_lu(x);
#elif LG_SIZEOF_LONG_LONG == 3
return fls_llu(x);
#else
# error No implementation for 64-bit fls()
#endif
}
static inline unsigned
ffs_u32(uint32_t x) {
#if LG_SIZEOF_INT == 2
return ffs_u(bitmap);
return ffs_u(x);
#else
#error No implementation for 32-bit ffs()
# error No implementation for 32-bit ffs()
#endif
return ffs_u(bitmap);
}
BIT_UTIL_INLINE uint64_t
static inline unsigned
fls_u32(uint32_t x) {
#if LG_SIZEOF_INT == 2
return fls_u(x);
#else
# error No implementation for 32-bit fls()
#endif
}
static inline uint64_t
pow2_ceil_u64(uint64_t x) {
x--;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x |= x >> 32;
x++;
return x;
if (unlikely(x <= 1)) {
return x;
}
size_t msb_on_index = fls_u64(x - 1);
/*
* Range-check; it's on the callers to ensure that the result of this
* call won't overflow.
*/
assert(msb_on_index < 63);
return 1ULL << (msb_on_index + 1);
}
BIT_UTIL_INLINE uint32_t
static inline uint32_t
pow2_ceil_u32(uint32_t x) {
x--;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x++;
return x;
if (unlikely(x <= 1)) {
return x;
}
size_t msb_on_index = fls_u32(x - 1);
/* As above. */
assert(msb_on_index < 31);
return 1U << (msb_on_index + 1);
}
/* Compute the smallest power of 2 that is >= x. */
BIT_UTIL_INLINE size_t
static inline size_t
pow2_ceil_zu(size_t x) {
#if (LG_SIZEOF_PTR == 3)
return pow2_ceil_u64(x);
@ -96,70 +394,38 @@ pow2_ceil_zu(size_t x) {
#endif
}
#if (defined(__i386__) || defined(__amd64__) || defined(__x86_64__))
BIT_UTIL_INLINE unsigned
static inline unsigned
lg_floor(size_t x) {
size_t ret;
assert(x != 0);
asm ("bsr %1, %0"
: "=r"(ret) // Outputs.
: "r"(x) // Inputs.
);
assert(ret < UINT_MAX);
return (unsigned)ret;
}
#elif (defined(_MSC_VER))
BIT_UTIL_INLINE unsigned
lg_floor(size_t x) {
unsigned long ret;
assert(x != 0);
util_assume(x != 0);
#if (LG_SIZEOF_PTR == 3)
_BitScanReverse64(&ret, x);
#elif (LG_SIZEOF_PTR == 2)
_BitScanReverse(&ret, x);
return fls_u64(x);
#else
# error "Unsupported type size for lg_floor()"
#endif
assert(ret < UINT_MAX);
return (unsigned)ret;
}
#elif (defined(JEMALLOC_HAVE_BUILTIN_CLZ))
BIT_UTIL_INLINE unsigned
lg_floor(size_t x) {
assert(x != 0);
#if (LG_SIZEOF_PTR == LG_SIZEOF_INT)
return ((8 << LG_SIZEOF_PTR) - 1) - __builtin_clz(x);
#elif (LG_SIZEOF_PTR == LG_SIZEOF_LONG)
return ((8 << LG_SIZEOF_PTR) - 1) - __builtin_clzl(x);
#else
# error "Unsupported type size for lg_floor()"
return fls_u32(x);
#endif
}
#else
BIT_UTIL_INLINE unsigned
lg_floor(size_t x) {
assert(x != 0);
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
#if (LG_SIZEOF_PTR == 3)
x |= (x >> 32);
#endif
if (x == SIZE_T_MAX) {
return (8 << LG_SIZEOF_PTR) - 1;
}
x++;
return ffs_zu(x) - 2;
static inline unsigned
lg_ceil(size_t x) {
return lg_floor(x) + ((x & (x - 1)) == 0 ? 0 : 1);
}
/* A compile-time version of lg_floor and lg_ceil. */
#define LG_FLOOR_1(x) 0
#define LG_FLOOR_2(x) (x < (1ULL << 1) ? LG_FLOOR_1(x) : 1 + LG_FLOOR_1(x >> 1))
#define LG_FLOOR_4(x) (x < (1ULL << 2) ? LG_FLOOR_2(x) : 2 + LG_FLOOR_2(x >> 2))
#define LG_FLOOR_8(x) (x < (1ULL << 4) ? LG_FLOOR_4(x) : 4 + LG_FLOOR_4(x >> 4))
#define LG_FLOOR_16(x) \
(x < (1ULL << 8) ? LG_FLOOR_8(x) : 8 + LG_FLOOR_8(x >> 8))
#define LG_FLOOR_32(x) \
(x < (1ULL << 16) ? LG_FLOOR_16(x) : 16 + LG_FLOOR_16(x >> 16))
#define LG_FLOOR_64(x) \
(x < (1ULL << 32) ? LG_FLOOR_32(x) : 32 + LG_FLOOR_32(x >> 32))
#if LG_SIZEOF_PTR == 2
# define LG_FLOOR(x) LG_FLOOR_32((x))
#else
# define LG_FLOOR(x) LG_FLOOR_64((x))
#endif
#undef BIT_UTIL_INLINE
#define LG_CEIL(x) (LG_FLOOR(x) + (((x) & ((x) - 1)) == 0 ? 0 : 1))
#endif /* JEMALLOC_INTERNAL_BIT_UTIL_H */
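
Worked examples for the helpers above: lg_floor(12) == 3 and lg_ceil(12) == 4 (12 is not a power of two), while lg_floor(16) == lg_ceil(16) == 4. The LG_FLOOR/LG_CEIL macros compute the same values in constant expressions, which is how bitmap.h below uses LG_CEIL(SC_NSIZES).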

View file

@ -1,27 +1,27 @@
#ifndef JEMALLOC_INTERNAL_BITMAP_H
#define JEMALLOC_INTERNAL_BITMAP_H
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/sc.h"
typedef unsigned long bitmap_t;
#define LG_SIZEOF_BITMAP LG_SIZEOF_LONG
#define LG_SIZEOF_BITMAP LG_SIZEOF_LONG
/* Maximum bitmap bit count is 2^LG_BITMAP_MAXBITS. */
#if LG_SLAB_MAXREGS > LG_CEIL_NSIZES
#if SC_LG_SLAB_MAXREGS > LG_CEIL(SC_NSIZES)
/* Maximum bitmap bit count is determined by maximum regions per slab. */
# define LG_BITMAP_MAXBITS LG_SLAB_MAXREGS
# define LG_BITMAP_MAXBITS SC_LG_SLAB_MAXREGS
#else
/* Maximum bitmap bit count is determined by number of extent size classes. */
# define LG_BITMAP_MAXBITS LG_CEIL_NSIZES
# define LG_BITMAP_MAXBITS LG_CEIL(SC_NSIZES)
#endif
#define BITMAP_MAXBITS (ZU(1) << LG_BITMAP_MAXBITS)
#define BITMAP_MAXBITS (ZU(1) << LG_BITMAP_MAXBITS)
/* Number of bits per group. */
#define LG_BITMAP_GROUP_NBITS (LG_SIZEOF_BITMAP + 3)
#define BITMAP_GROUP_NBITS (1U << LG_BITMAP_GROUP_NBITS)
#define BITMAP_GROUP_NBITS_MASK (BITMAP_GROUP_NBITS-1)
#define LG_BITMAP_GROUP_NBITS (LG_SIZEOF_BITMAP + 3)
#define BITMAP_GROUP_NBITS (1U << LG_BITMAP_GROUP_NBITS)
#define BITMAP_GROUP_NBITS_MASK (BITMAP_GROUP_NBITS - 1)
/*
* Do some analysis on how big the bitmap is before we use a tree. For a brute
@ -29,67 +29,64 @@ typedef unsigned long bitmap_t;
* use a tree instead.
*/
#if LG_BITMAP_MAXBITS - LG_BITMAP_GROUP_NBITS > 3
# define BITMAP_USE_TREE
# define BITMAP_USE_TREE
#endif
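
Worked example of the threshold above, assuming a 64-bit bitmap_t (LG_BITMAP_GROUP_NBITS == 6): the tree layout is only enabled when LG_BITMAP_MAXBITS exceeds 9, i.e. when a bitmap can be larger than 512 bits and a brute-force scan would have to walk more than 8 groups.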
/* Number of groups required to store a given number of bits. */
#define BITMAP_BITS2GROUPS(nbits) \
(((nbits) + BITMAP_GROUP_NBITS_MASK) >> LG_BITMAP_GROUP_NBITS)
#define BITMAP_BITS2GROUPS(nbits) \
(((nbits) + BITMAP_GROUP_NBITS_MASK) >> LG_BITMAP_GROUP_NBITS)
/*
* Number of groups required at a particular level for a given number of bits.
*/
#define BITMAP_GROUPS_L0(nbits) \
BITMAP_BITS2GROUPS(nbits)
#define BITMAP_GROUPS_L1(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(nbits))
#define BITMAP_GROUPS_L2(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))
#define BITMAP_GROUPS_L3(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS((nbits)))))
#define BITMAP_GROUPS_L4(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))))
#define BITMAP_GROUPS_L0(nbits) BITMAP_BITS2GROUPS(nbits)
#define BITMAP_GROUPS_L1(nbits) BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(nbits))
#define BITMAP_GROUPS_L2(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))
#define BITMAP_GROUPS_L3(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits)))))
#define BITMAP_GROUPS_L4(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))))
/*
* Assuming the number of levels, number of groups required for a given number
* of bits.
*/
#define BITMAP_GROUPS_1_LEVEL(nbits) \
BITMAP_GROUPS_L0(nbits)
#define BITMAP_GROUPS_2_LEVEL(nbits) \
(BITMAP_GROUPS_1_LEVEL(nbits) + BITMAP_GROUPS_L1(nbits))
#define BITMAP_GROUPS_3_LEVEL(nbits) \
(BITMAP_GROUPS_2_LEVEL(nbits) + BITMAP_GROUPS_L2(nbits))
#define BITMAP_GROUPS_4_LEVEL(nbits) \
(BITMAP_GROUPS_3_LEVEL(nbits) + BITMAP_GROUPS_L3(nbits))
#define BITMAP_GROUPS_5_LEVEL(nbits) \
(BITMAP_GROUPS_4_LEVEL(nbits) + BITMAP_GROUPS_L4(nbits))
#define BITMAP_GROUPS_1_LEVEL(nbits) BITMAP_GROUPS_L0(nbits)
#define BITMAP_GROUPS_2_LEVEL(nbits) \
(BITMAP_GROUPS_1_LEVEL(nbits) + BITMAP_GROUPS_L1(nbits))
#define BITMAP_GROUPS_3_LEVEL(nbits) \
(BITMAP_GROUPS_2_LEVEL(nbits) + BITMAP_GROUPS_L2(nbits))
#define BITMAP_GROUPS_4_LEVEL(nbits) \
(BITMAP_GROUPS_3_LEVEL(nbits) + BITMAP_GROUPS_L3(nbits))
#define BITMAP_GROUPS_5_LEVEL(nbits) \
(BITMAP_GROUPS_4_LEVEL(nbits) + BITMAP_GROUPS_L4(nbits))
/*
* Maximum number of groups required to support LG_BITMAP_MAXBITS.
*/
#ifdef BITMAP_USE_TREE
#if LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_1_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_1_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 2
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_2_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_2_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 3
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_3_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_3_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 4
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_4_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_4_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 5
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_5_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_5_LEVEL(BITMAP_MAXBITS)
#else
# error "Unsupported bitmap size"
#endif
# if LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_1_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_1_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 2
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_2_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_2_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 3
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_3_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_3_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 4
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_4_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_4_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 5
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_5_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_5_LEVEL(BITMAP_MAXBITS)
# else
# error "Unsupported bitmap size"
# endif
/*
* Maximum number of levels possible. This could be statically computed based
@ -105,42 +102,53 @@ typedef unsigned long bitmap_t;
* unused trailing entries in bitmap_info_t structures; the bitmaps themselves
* are not impacted.
*/
#define BITMAP_MAX_LEVELS 5
# define BITMAP_MAX_LEVELS 5
#define BITMAP_INFO_INITIALIZER(nbits) { \
/* nbits. */ \
nbits, \
/* nlevels. */ \
(BITMAP_GROUPS_L0(nbits) > BITMAP_GROUPS_L1(nbits)) + \
(BITMAP_GROUPS_L1(nbits) > BITMAP_GROUPS_L2(nbits)) + \
(BITMAP_GROUPS_L2(nbits) > BITMAP_GROUPS_L3(nbits)) + \
(BITMAP_GROUPS_L3(nbits) > BITMAP_GROUPS_L4(nbits)) + 1, \
/* levels. */ \
{ \
{0}, \
{BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L1(nbits) + BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L2(nbits) + BITMAP_GROUPS_L1(nbits) + \
BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L3(nbits) + BITMAP_GROUPS_L2(nbits) + \
BITMAP_GROUPS_L1(nbits) + BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L4(nbits) + BITMAP_GROUPS_L3(nbits) + \
BITMAP_GROUPS_L2(nbits) + BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)} \
} \
}
# define BITMAP_INFO_INITIALIZER(nbits) \
{ \
/* nbits. */ \
nbits, /* nlevels. */ \
(BITMAP_GROUPS_L0(nbits) \
> BITMAP_GROUPS_L1(nbits)) \
+ (BITMAP_GROUPS_L1(nbits) \
> BITMAP_GROUPS_L2(nbits)) \
+ (BITMAP_GROUPS_L2(nbits) \
> BITMAP_GROUPS_L3(nbits)) \
+ (BITMAP_GROUPS_L3(nbits) \
> BITMAP_GROUPS_L4(nbits)) \
+ 1, /* levels. */ \
{ \
{0}, {BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L3(nbits) \
+ BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{ \
BITMAP_GROUPS_L4(nbits) \
+ BITMAP_GROUPS_L3(nbits) \
+ BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits) \
} \
} \
}
#else /* BITMAP_USE_TREE */
#define BITMAP_GROUPS(nbits) BITMAP_BITS2GROUPS(nbits)
#define BITMAP_GROUPS_MAX BITMAP_BITS2GROUPS(BITMAP_MAXBITS)
# define BITMAP_GROUPS(nbits) BITMAP_BITS2GROUPS(nbits)
# define BITMAP_GROUPS_MAX BITMAP_BITS2GROUPS(BITMAP_MAXBITS)
#define BITMAP_INFO_INITIALIZER(nbits) { \
/* nbits. */ \
nbits, \
/* ngroups. */ \
BITMAP_BITS2GROUPS(nbits) \
}
# define BITMAP_INFO_INITIALIZER(nbits) \
{ \
/* nbits. */ \
nbits, /* ngroups. */ \
BITMAP_BITS2GROUPS(nbits) \
}
#endif /* BITMAP_USE_TREE */
@ -161,21 +169,21 @@ typedef struct bitmap_info_s {
* Only the first (nlevels+1) elements are used, and levels are ordered
* bottom to top (e.g. the bottom level is stored in levels[0]).
*/
bitmap_level_t levels[BITMAP_MAX_LEVELS+1];
#else /* BITMAP_USE_TREE */
bitmap_level_t levels[BITMAP_MAX_LEVELS + 1];
#else /* BITMAP_USE_TREE */
/* Number of groups necessary for nbits. */
size_t ngroups;
#endif /* BITMAP_USE_TREE */
} bitmap_info_t;
void bitmap_info_init(bitmap_info_t *binfo, size_t nbits);
void bitmap_init(bitmap_t *bitmap, const bitmap_info_t *binfo, bool fill);
void bitmap_info_init(bitmap_info_t *binfo, size_t nbits);
void bitmap_init(bitmap_t *bitmap, const bitmap_info_t *binfo, bool fill);
size_t bitmap_size(const bitmap_info_t *binfo);
static inline bool
bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo) {
#ifdef BITMAP_USE_TREE
size_t rgoff = binfo->levels[binfo->nlevels].group_offset - 1;
size_t rgoff = binfo->levels[binfo->nlevels].group_offset - 1;
bitmap_t rg = bitmap[rgoff];
/* The bitmap is full iff the root group is 0. */
return (rg == 0);
@ -193,7 +201,7 @@ bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo) {
static inline bool
bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
size_t goff;
bitmap_t g;
assert(bit < binfo->nbits);
@ -204,9 +212,9 @@ bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
static inline void
bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
size_t goff;
bitmap_t *gp;
bitmap_t g;
bitmap_t g;
assert(bit < binfo->nbits);
assert(!bitmap_get(bitmap, binfo, bit));
@ -245,12 +253,13 @@ bitmap_ffu(const bitmap_t *bitmap, const bitmap_info_t *binfo, size_t min_bit) {
#ifdef BITMAP_USE_TREE
size_t bit = 0;
for (unsigned level = binfo->nlevels; level--;) {
size_t lg_bits_per_group = (LG_BITMAP_GROUP_NBITS * (level +
1));
bitmap_t group = bitmap[binfo->levels[level].group_offset + (bit
>> lg_bits_per_group)];
unsigned group_nmask = (unsigned)(((min_bit > bit) ? (min_bit -
bit) : 0) >> (lg_bits_per_group - LG_BITMAP_GROUP_NBITS));
size_t lg_bits_per_group = (LG_BITMAP_GROUP_NBITS
* (level + 1));
bitmap_t group = bitmap[binfo->levels[level].group_offset
+ (bit >> lg_bits_per_group)];
unsigned group_nmask =
(unsigned)(((min_bit > bit) ? (min_bit - bit) : 0)
>> (lg_bits_per_group - LG_BITMAP_GROUP_NBITS));
assert(group_nmask <= BITMAP_GROUP_NBITS);
bitmap_t group_mask = ~((1LU << group_nmask) - 1);
bitmap_t group_masked = group & group_mask;
@ -273,25 +282,28 @@ bitmap_ffu(const bitmap_t *bitmap, const bitmap_info_t *binfo, size_t min_bit) {
}
return bitmap_ffu(bitmap, binfo, sib_base);
}
bit += ((size_t)(ffs_lu(group_masked) - 1)) <<
(lg_bits_per_group - LG_BITMAP_GROUP_NBITS);
bit += ((size_t)ffs_lu(group_masked))
<< (lg_bits_per_group - LG_BITMAP_GROUP_NBITS);
}
assert(bit >= min_bit);
assert(bit < binfo->nbits);
return bit;
#else
size_t i = min_bit >> LG_BITMAP_GROUP_NBITS;
bitmap_t g = bitmap[i] & ~((1LU << (min_bit & BITMAP_GROUP_NBITS_MASK))
- 1);
size_t i = min_bit >> LG_BITMAP_GROUP_NBITS;
bitmap_t g = bitmap[i]
& ~((1LU << (min_bit & BITMAP_GROUP_NBITS_MASK)) - 1);
size_t bit;
do {
bit = ffs_lu(g);
if (bit != 0) {
return (i << LG_BITMAP_GROUP_NBITS) + (bit - 1);
while (1) {
if (g != 0) {
bit = ffs_lu(g);
return (i << LG_BITMAP_GROUP_NBITS) + bit;
}
i++;
if (i >= binfo->ngroups) {
break;
}
g = bitmap[i];
} while (i < binfo->ngroups);
}
return binfo->nbits;
#endif
}
@ -299,7 +311,7 @@ bitmap_ffu(const bitmap_t *bitmap, const bitmap_info_t *binfo, size_t min_bit) {
/* sfu: set first unset. */
static inline size_t
bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo) {
size_t bit;
size_t bit;
bitmap_t g;
unsigned i;
@ -308,20 +320,20 @@ bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo) {
#ifdef BITMAP_USE_TREE
i = binfo->nlevels - 1;
g = bitmap[binfo->levels[i].group_offset];
bit = ffs_lu(g) - 1;
bit = ffs_lu(g);
while (i > 0) {
i--;
g = bitmap[binfo->levels[i].group_offset + bit];
bit = (bit << LG_BITMAP_GROUP_NBITS) + (ffs_lu(g) - 1);
bit = (bit << LG_BITMAP_GROUP_NBITS) + ffs_lu(g);
}
#else
i = 0;
g = bitmap[0];
while ((bit = ffs_lu(g)) == 0) {
while (g == 0) {
i++;
g = bitmap[i];
}
bit = (i << LG_BITMAP_GROUP_NBITS) + (bit - 1);
bit = (i << LG_BITMAP_GROUP_NBITS) + ffs_lu(g);
#endif
bitmap_set(bitmap, binfo, bit);
return bit;
@ -329,9 +341,9 @@ bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo) {
static inline void
bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
bitmap_t *gp;
bitmap_t g;
size_t goff;
bitmap_t *gp;
bitmap_t g;
UNUSED bool propagate;
assert(bit < binfo->nbits);

View file

@ -0,0 +1,36 @@
#ifndef JEMALLOC_INTERNAL_BUF_WRITER_H
#define JEMALLOC_INTERNAL_BUF_WRITER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/tsd_types.h"
/*
* Note: when using the buffered writer, cbopaque is passed to write_cb only
* when the buffer is flushed. It would make a difference if cbopaque points
* to something that's changing for each write_cb call, or something that
* affects write_cb in a way dependent on the content of the output string.
* However, the most typical usage in practice is that cbopaque points to
* some option-like context for the write_cb, so it doesn't matter.
*/
typedef struct {
write_cb_t *write_cb;
void *cbopaque;
char *buf;
size_t buf_size;
size_t buf_end;
bool internal_buf;
} buf_writer_t;
bool buf_writer_init(tsdn_t *tsdn, buf_writer_t *buf_writer,
write_cb_t *write_cb, void *cbopaque, char *buf, size_t buf_len);
void buf_writer_flush(buf_writer_t *buf_writer);
write_cb_t buf_writer_cb;
void buf_writer_terminate(tsdn_t *tsdn, buf_writer_t *buf_writer);
typedef ssize_t(read_cb_t)(void *read_cbopaque, void *buf, size_t limit);
void buf_writer_pipe(
buf_writer_t *buf_writer, read_cb_t *read_cb, void *read_cbopaque);
#endif /* JEMALLOC_INTERNAL_BUF_WRITER_H */
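
A plausible usage sketch for the interface above (pieced together from the declarations; the wrapper name and the caller-provided buffer size are illustrative, not part of this change):

/* Illustrative only; not part of this change. */
static void
buf_writer_usage_sketch(tsdn_t *tsdn, write_cb_t *write_cb, void *cbopaque) {
buf_writer_t buf_writer;
char buf[4096];
/* Wrap the caller's write_cb/cbopaque pair with a caller-provided buffer. */
buf_writer_init(tsdn, &buf_writer, write_cb, cbopaque, buf, sizeof(buf));
/* buf_writer_cb has the write_cb_t shape; pass &buf_writer as its cbopaque. */
buf_writer_cb(&buf_writer, "buffered line of output\n");
/* Flush whatever remains in the buffer out to write_cb. */
buf_writer_terminate(tsdn, &buf_writer);
}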

View file

@ -0,0 +1,777 @@
#ifndef JEMALLOC_INTERNAL_CACHE_BIN_H
#define JEMALLOC_INTERNAL_CACHE_BIN_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/jemalloc_internal_externs.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/safety_check.h"
#include "jemalloc/internal/sz.h"
/*
* The cache_bins are the mechanism that the tcache and the arena use to
* communicate. The tcache fills from and flushes to the arena by passing a
* cache_bin_t to fill/flush. When the arena needs to pull stats from the
* tcaches associated with it, it does so by iterating over its
* cache_bin_array_descriptor_t objects and reading out per-bin stats it
* contains. This makes it so that the arena need not know about the existence
* of the tcache at all.
*/
/*
* The size in bytes of each cache bin stack. We also use this to indicate
* *counts* of individual objects.
*/
typedef uint16_t cache_bin_sz_t;
#define JUNK_ADDR ((uintptr_t)0x7a7a7a7a7a7a7a7aULL)
/*
* Leave a noticeable mark pattern on the cache bin stack boundaries, in case a
* bug starts leaking those. Make it look like the junk pattern but be distinct
* from it.
*/
static const uintptr_t cache_bin_preceding_junk = JUNK_ADDR;
/* Note: JUNK_ADDR vs. JUNK_ADDR + 1 -- this tells you which pointer leaked. */
static const uintptr_t cache_bin_trailing_junk = JUNK_ADDR + 1;
/*
* A pointer used to initialize a fake stack_head for disabled small bins
* so that the enabled/disabled assessment does not rely on ncached_max.
*/
extern const uintptr_t disabled_bin;
/*
* That implies the following value, for the maximum number of items in any
* individual bin. The cache bins track their bounds looking just at the low
* bits of a pointer, compared against a cache_bin_sz_t. So that's
* 1 << (sizeof(cache_bin_sz_t) * 8)
* bytes spread across pointer sized objects to get the maximum.
*/
#define CACHE_BIN_NCACHED_MAX \
(((size_t)1 << sizeof(cache_bin_sz_t) * 8) / sizeof(void *) - 1)
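/*
* For example, with the 16-bit cache_bin_sz_t above and 8-byte pointers
* (i.e. a 64-bit platform), this works out to (1 << 16) / 8 - 1 = 8191 items.
*/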
/*
* This lives inside the cache_bin (for locality reasons), and is initialized
* alongside it, but is otherwise not modified by any cache bin operations.
* It's logically public and maintained by its callers.
*/
typedef struct cache_bin_stats_s cache_bin_stats_t;
struct cache_bin_stats_s {
/*
* Number of allocation requests that corresponded to the size of this
* bin.
*/
uint64_t nrequests;
};
/*
* Read-only information associated with each element of tcache_t's tbins array
* is stored separately, mainly to reduce memory usage.
*/
typedef struct cache_bin_info_s cache_bin_info_t;
struct cache_bin_info_s {
cache_bin_sz_t ncached_max;
};
/*
* Responsible for caching allocations associated with a single size.
*
* Several pointers are used to track the stack. To save on metadata bytes,
* only the stack_head is a full sized pointer (which is dereferenced on the
* fastpath), while the others store only the low 16 bits -- this is correct
* because a single stack never takes more space than 2^16 bytes, and at the
* same time only equality checks are performed on the low bits.
*
* (low addr) (high addr)
* |------stashed------|------available------|------cached-----|
* ^ ^ ^ ^
* low_bound(derived) low_bits_full stack_head low_bits_empty
*/
typedef struct cache_bin_s cache_bin_t;
struct cache_bin_s {
/*
* The stack grows down. Whenever the bin is nonempty, the head points
* to an array entry containing a valid allocation. When it is empty,
* the head points to one element past the owned array.
*/
void **stack_head;
/*
* stack_head and tstats are both modified frequently. Let's keep them
* close so that they have a higher chance of being on the same
* cacheline, thus fewer write-backs.
*/
cache_bin_stats_t tstats;
/*
* The low bits of the address of the first item in the stack that
* hasn't been used since the last GC, to track the low water mark (min
* # of cached items).
*
* Since the stack grows down, this is a higher address than
* low_bits_full.
*/
cache_bin_sz_t low_bits_low_water;
/*
* The low bits of the value that stack_head will take on when the array
* is full (of cached & stashed items). But remember that stack_head
* always points to a valid item when the array is nonempty -- this is
* in the array.
*
* Recall that since the stack grows down, this is the lowest available
* address in the array for caching. Only adjusted when stashing items.
*/
cache_bin_sz_t low_bits_full;
/*
* The low bits of the value that stack_head will take on when the array
* is empty.
*
* The stack grows down -- this is one past the highest address in the
* array. Immutable after initialization.
*/
cache_bin_sz_t low_bits_empty;
/* The maximum number of cached items in the bin. */
cache_bin_info_t bin_info;
};
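/*
* A worked example of the low-bits bookkeeping above, assuming a 64-bit
* platform: if low_bits_empty == 0x1080 and the low 16 bits of stack_head are
* 0x1060, the bin currently holds (0x1080 - 0x1060) / sizeof(void *) == 4
* cached items.
*/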
/*
* The cache_bins live inside the tcache, but the arena (by design) isn't
* supposed to know much about tcache internals. To let the arena iterate over
* associated bins, we keep (with the tcache) a linked list of
* cache_bin_array_descriptor_ts that tell the arena how to find the bins.
*/
typedef struct cache_bin_array_descriptor_s cache_bin_array_descriptor_t;
struct cache_bin_array_descriptor_s {
/*
* The arena keeps a list of the cache bins associated with it, for
* stats collection.
*/
ql_elm(cache_bin_array_descriptor_t) link;
/* Pointers to the tcache bins. */
cache_bin_t *bins;
};
static inline void
cache_bin_array_descriptor_init(
cache_bin_array_descriptor_t *descriptor, cache_bin_t *bins) {
ql_elm_new(descriptor, link);
descriptor->bins = bins;
}
JEMALLOC_ALWAYS_INLINE bool
cache_bin_nonfast_aligned(const void *ptr) {
if (!config_uaf_detection) {
return false;
}
/*
* Currently we use alignment to decide which pointer to junk & stash on
* dealloc (for catching use-after-free). In some common cases a
* page-aligned check is needed already (sdalloc w/ config_prof), so we
* are getting it more or less for free -- no added instructions on
* free_fastpath.
*
* Another way of deciding which pointer to sample, is adding another
* thread_event to pick one every N bytes. That also adds no cost on
* the fastpath; however, it will tend to pick large allocations, which is
* not the desired behavior.
*/
return ((uintptr_t)ptr & san_cache_bin_nonfast_mask) == 0;
}
static inline const void *
cache_bin_disabled_bin_stack(void) {
return &disabled_bin;
}
/*
* If a cache bin was zero initialized (either because it lives in static or
* thread-local storage, or was memset to 0), this function indicates whether or
* not cache_bin_init was called on it.
*/
static inline bool
cache_bin_still_zero_initialized(cache_bin_t *bin) {
return bin->stack_head == NULL;
}
static inline bool
cache_bin_disabled(cache_bin_t *bin) {
bool disabled = (bin->stack_head == cache_bin_disabled_bin_stack());
if (disabled) {
assert((uintptr_t)(*bin->stack_head) == JUNK_ADDR);
}
return disabled;
}
/* Gets ncached_max without asserting that the bin is enabled. */
static inline cache_bin_sz_t
cache_bin_ncached_max_get_unsafe(cache_bin_t *bin) {
return bin->bin_info.ncached_max;
}
/* Returns ncached_max: Upper limit on ncached. */
static inline cache_bin_sz_t
cache_bin_ncached_max_get(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
return cache_bin_ncached_max_get_unsafe(bin);
}
/*
* Internal.
*
* Asserts that the pointer associated with earlier is <= the one associated
* with later.
*/
static inline void
cache_bin_assert_earlier(
cache_bin_t *bin, cache_bin_sz_t earlier, cache_bin_sz_t later) {
if (earlier > later) {
assert(bin->low_bits_full > bin->low_bits_empty);
}
}
/*
* Internal.
*
* Does difference calculations that handle wraparound correctly. Earlier must
* be associated with the position earlier in memory.
*/
static inline cache_bin_sz_t
cache_bin_diff(cache_bin_t *bin, cache_bin_sz_t earlier, cache_bin_sz_t later) {
cache_bin_assert_earlier(bin, earlier, later);
return later - earlier;
}
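/*
* For instance, if the bin's array happens to straddle a 16-bit address
* boundary so that earlier == 0xfff0 and later == 0x0010, the uint16_t
* subtraction wraps around and still yields the correct distance of 0x20
* bytes.
*/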
/*
* Number of items currently cached in the bin, without checking ncached_max.
*/
static inline cache_bin_sz_t
cache_bin_ncached_get_internal(cache_bin_t *bin) {
cache_bin_sz_t diff = cache_bin_diff(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head, bin->low_bits_empty);
cache_bin_sz_t n = diff / sizeof(void *);
/*
* We have undefined behavior here; if this function is called from the
* arena stats updating code, then stack_head could change from the
* first line to the next one. Morally, these loads should be atomic,
* but compilers won't currently generate comparisons with in-memory
* operands against atomics, and these variables get accessed on the
* fast paths. This should still be "safe" in the sense of generating
* the correct assembly for the foreseeable future, though.
*/
assert(n == 0 || *(bin->stack_head) != NULL);
return n;
}
/*
* Number of items currently cached in the bin, with checking ncached_max. The
* caller must know that no concurrent modification of the cache_bin is
* possible.
*/
static inline cache_bin_sz_t
cache_bin_ncached_get_local(cache_bin_t *bin) {
cache_bin_sz_t n = cache_bin_ncached_get_internal(bin);
assert(n <= cache_bin_ncached_max_get(bin));
return n;
}
/*
* Internal.
*
* A pointer to the position one past the end of the backing array.
*
* Do not call if racy, because both 'bin->stack_head' and 'bin->low_bits_full'
* are subject to concurrent modifications.
*/
static inline void **
cache_bin_empty_position_get(cache_bin_t *bin) {
cache_bin_sz_t diff = cache_bin_diff(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head, bin->low_bits_empty);
byte_t *empty_bits = (byte_t *)bin->stack_head + diff;
void **ret = (void **)empty_bits;
assert(ret >= bin->stack_head);
return ret;
}
/*
* Internal.
*
* Calculates low bits of the lower bound of the usable cache bin's range (see
* cache_bin_t visual representation above).
*
* No values are concurrently modified, so should be safe to read in a
* multithreaded environment. Currently concurrent access happens only during
* arena statistics collection.
*/
static inline cache_bin_sz_t
cache_bin_low_bits_low_bound_get(cache_bin_t *bin) {
return (cache_bin_sz_t)bin->low_bits_empty
- cache_bin_ncached_max_get(bin) * sizeof(void *);
}
/*
* Internal.
*
* A pointer to the position with the lowest address of the backing array.
*/
static inline void **
cache_bin_low_bound_get(cache_bin_t *bin) {
cache_bin_sz_t ncached_max = cache_bin_ncached_max_get(bin);
void **ret = cache_bin_empty_position_get(bin) - ncached_max;
assert(ret <= bin->stack_head);
return ret;
}
/*
* As the name implies. This is important since it's not correct to try to
* batch fill a nonempty cache bin.
*/
static inline void
cache_bin_assert_empty(cache_bin_t *bin) {
assert(cache_bin_ncached_get_local(bin) == 0);
assert(cache_bin_empty_position_get(bin) == bin->stack_head);
}
/*
* Get low water, but without any of the correctness checking we do for the
* caller-usable version, if we are temporarily breaking invariants (like
* ncached >= low_water during flush).
*/
static inline cache_bin_sz_t
cache_bin_low_water_get_internal(cache_bin_t *bin) {
return cache_bin_diff(bin, bin->low_bits_low_water, bin->low_bits_empty)
/ sizeof(void *);
}
/* Returns the numeric value of low water in [0, ncached]. */
static inline cache_bin_sz_t
cache_bin_low_water_get(cache_bin_t *bin) {
cache_bin_sz_t low_water = cache_bin_low_water_get_internal(bin);
assert(low_water <= cache_bin_ncached_max_get(bin));
assert(low_water <= cache_bin_ncached_get_local(bin));
cache_bin_assert_earlier(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head,
bin->low_bits_low_water);
return low_water;
}
/*
* Indicates that the current cache bin position should be the low water mark
* going forward.
*/
static inline void
cache_bin_low_water_set(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
bin->low_bits_low_water = (cache_bin_sz_t)(uintptr_t)bin->stack_head;
}
static inline void
cache_bin_low_water_adjust(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
if (cache_bin_ncached_get_internal(bin)
< cache_bin_low_water_get_internal(bin)) {
cache_bin_low_water_set(bin);
}
}
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc_impl(cache_bin_t *bin, bool *success, bool adjust_low_water) {
/*
* success (instead of ret) should be checked upon the return of this
* function. We avoid checking (ret == NULL) because there is never a
* null stored on the avail stack (which is unknown to the compiler),
* and eagerly checking ret would cause pipeline stall (waiting for the
* cacheline).
*/
/*
* This may read from the empty position; however the loaded value won't
* be used. It's safe because the stack has one more slot reserved.
*/
void *ret = *bin->stack_head;
cache_bin_sz_t low_bits = (cache_bin_sz_t)(uintptr_t)bin->stack_head;
void **new_head = bin->stack_head + 1;
/*
* Note that the low water mark is at most empty; if we pass this check,
* we know we're non-empty.
*/
if (likely(low_bits != bin->low_bits_low_water)) {
bin->stack_head = new_head;
*success = true;
return ret;
}
if (!adjust_low_water) {
*success = false;
return NULL;
}
/*
* In the fast-path case where we call alloc_easy and then alloc, the
* previous checking and computation is optimized away -- we didn't
* actually commit any of our operations.
*/
if (likely(low_bits != bin->low_bits_empty)) {
bin->stack_head = new_head;
bin->low_bits_low_water = (cache_bin_sz_t)(uintptr_t)new_head;
*success = true;
return ret;
}
*success = false;
return NULL;
}
/*
* Allocate an item out of the bin, failing if we're at the low-water mark.
*/
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc_easy(cache_bin_t *bin, bool *success) {
/* We don't look at info if we're not adjusting low-water. */
return cache_bin_alloc_impl(bin, success, false);
}
/*
* Allocate an item out of the bin, even if we're currently at the low-water
* mark (and failing only if the bin is empty).
*/
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc(cache_bin_t *bin, bool *success) {
return cache_bin_alloc_impl(bin, success, true);
}
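/*
* Typical caller pattern (a sketch, not code from this change; the slow-path
* helper below is hypothetical):
*
* bool success;
* void *ret = cache_bin_alloc(bin, &success);
* if (unlikely(!success)) {
* ret = fallback_fill_and_alloc(tsd, bin, binind);
* }
*/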
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_alloc_batch(cache_bin_t *bin, size_t num, void **out) {
cache_bin_sz_t n = cache_bin_ncached_get_internal(bin);
if (n > num) {
n = (cache_bin_sz_t)num;
}
memcpy(out, bin->stack_head, n * sizeof(void *));
bin->stack_head += n;
cache_bin_low_water_adjust(bin);
return n;
}
JEMALLOC_ALWAYS_INLINE bool
cache_bin_full(cache_bin_t *bin) {
return (
(cache_bin_sz_t)(uintptr_t)bin->stack_head == bin->low_bits_full);
}
/*
* Scans the allocated area of the cache_bin for the given pointer up to limit.
* Fires safety_check_fail and returns true if the ptr is found.
*/
JEMALLOC_ALWAYS_INLINE bool
cache_bin_dalloc_safety_checks(cache_bin_t *bin, void *ptr) {
if (!config_debug || opt_debug_double_free_max_scan == 0) {
return false;
}
cache_bin_sz_t ncached = cache_bin_ncached_get_internal(bin);
unsigned max_scan = opt_debug_double_free_max_scan < ncached
? opt_debug_double_free_max_scan
: ncached;
void **cur = bin->stack_head;
void **limit = cur + max_scan;
for (; cur < limit; cur++) {
if (*cur == ptr) {
safety_check_fail(
"Invalid deallocation detected: double free of "
"pointer %p\n",
ptr);
return true;
}
}
return false;
}
/*
* Free an object into the given bin. Fails only if the bin is full.
*/
JEMALLOC_ALWAYS_INLINE bool
cache_bin_dalloc_easy(cache_bin_t *bin, void *ptr) {
if (unlikely(cache_bin_full(bin))) {
return false;
}
if (unlikely(cache_bin_dalloc_safety_checks(bin, ptr))) {
return true;
}
bin->stack_head--;
*bin->stack_head = ptr;
cache_bin_assert_earlier(bin, bin->low_bits_full,
(cache_bin_sz_t)(uintptr_t)bin->stack_head);
return true;
}
/* Returns false if failed to stash (i.e. bin is full). */
JEMALLOC_ALWAYS_INLINE bool
cache_bin_stash(cache_bin_t *bin, void *ptr) {
if (cache_bin_full(bin)) {
return false;
}
/* Stash at the full position, in the [full, head) range. */
cache_bin_sz_t low_bits_head = (cache_bin_sz_t)(uintptr_t)
bin->stack_head;
/* Wraparound handled as well. */
cache_bin_sz_t diff = cache_bin_diff(
bin, bin->low_bits_full, low_bits_head);
*(void **)((byte_t *)bin->stack_head - diff) = ptr;
assert(!cache_bin_full(bin));
bin->low_bits_full += sizeof(void *);
cache_bin_assert_earlier(bin, bin->low_bits_full, low_bits_head);
return true;
}
/* Get the number of stashed pointers. */
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_nstashed_get_internal(cache_bin_t *bin) {
cache_bin_sz_t ncached_max = cache_bin_ncached_max_get(bin);
cache_bin_sz_t low_bits_low_bound = cache_bin_low_bits_low_bound_get(
bin);
cache_bin_sz_t n = cache_bin_diff(
bin, low_bits_low_bound, bin->low_bits_full)
/ sizeof(void *);
assert(n <= ncached_max);
if (config_debug && n != 0) {
/* Below are for assertions only. */
void **low_bound = cache_bin_low_bound_get(bin);
assert(
(cache_bin_sz_t)(uintptr_t)low_bound == low_bits_low_bound);
void *stashed = *(low_bound + n - 1);
bool aligned = cache_bin_nonfast_aligned(stashed);
#ifdef JEMALLOC_JET
/* Allow arbitrary pointers to be stashed in tests. */
aligned = true;
#endif
assert(stashed != NULL && aligned);
}
return n;
}
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_nstashed_get_local(cache_bin_t *bin) {
cache_bin_sz_t n = cache_bin_nstashed_get_internal(bin);
assert(n <= cache_bin_ncached_max_get(bin));
return n;
}
/*
* Obtain a racy view of the number of items currently in the cache bin, in the
* presence of possible concurrent modifications.
*
* Note that this is the only racy function in this header. Any other functions
* are assumed to be non-racy. The "racy" term here means accessed from another
* thread (that is not the owner of the specific cache bin). This only happens
* when gathering stats (read-only). The only change because of the racy
* condition is that assertions based on mutable fields are omitted.
*
* It's important to keep in mind that 'bin->stack_head' and
* 'bin->low_bits_full' can be modified concurrently and almost no assertions
* about their values can be made.
*
* This function should not call other utility functions because the racy
* condition may cause unexpected / undefined behaviors in unverified utility
* functions. Currently, this function calls two utility functions
* cache_bin_ncached_max_get and cache_bin_low_bits_low_bound_get because
* they help access values that will not be concurrently modified.
*/
static inline void
cache_bin_nitems_get_remote(
cache_bin_t *bin, cache_bin_sz_t *ncached, cache_bin_sz_t *nstashed) {
/* Racy version of cache_bin_ncached_get_internal. */
cache_bin_sz_t diff = bin->low_bits_empty
- (cache_bin_sz_t)(uintptr_t)bin->stack_head;
cache_bin_sz_t n = diff / sizeof(void *);
*ncached = n;
/* Racy version of cache_bin_nstashed_get_internal. */
cache_bin_sz_t low_bits_low_bound = cache_bin_low_bits_low_bound_get(
bin);
n = (bin->low_bits_full - low_bits_low_bound) / sizeof(void *);
*nstashed = n;
/*
* Note that cannot assert anything regarding ncached_max because
* it can be configured on the fly and is thus racy.
*/
}
/*
* For small bins, used to calculate how many items to fill at a time.
* The final nfill is calculated by (ncached_max >> (base - offset)).
*/
typedef struct cache_bin_fill_ctl_s cache_bin_fill_ctl_t;
struct cache_bin_fill_ctl_s {
uint8_t base;
uint8_t offset;
};
/*
* Limit how many items can be flushed in a batch (which is the upper bound
* for the nflush parameter in tcache_bin_flush_impl()).
* This is to avoid stack overflow when we do a batch edata lookup, which
* reserves a nflush * sizeof(emap_batch_lookup_result_t) stack variable.
*/
#define CACHE_BIN_NFLUSH_BATCH_MAX \
((VARIABLE_ARRAY_SIZE_MAX >> LG_SIZEOF_PTR) - 1)
/*
* Filling and flushing are done in batch, on arrays of void *s. For filling,
* the arrays go forward, and can be accessed with ordinary array arithmetic.
* For flushing, we work from the end backwards, and so need to use special
* accessors that invert the usual ordering.
*
* This is important for maintaining first-fit; the arena code fills with
* earliest objects first, and so those are the ones we should return first for
* cache_bin_alloc calls. When flushing, we should flush the objects that we
* wish to return later; those at the end of the array. This is better for the
* first-fit heuristic as well as for cache locality; the most recently freed
* objects are the ones most likely to still be in cache.
*
* This all sounds very hand-wavey and theoretical, but reverting the ordering
* on one or the other pathway leads to measurable slowdowns.
*/
typedef struct cache_bin_ptr_array_s cache_bin_ptr_array_t;
struct cache_bin_ptr_array_s {
cache_bin_sz_t n;
void **ptr;
};
/*
* Declare a cache_bin_ptr_array_t sufficient for nval items.
*
* In the current implementation, this could be just part of a
* cache_bin_ptr_array_init_... call, since we reuse the cache bin stack memory.
* Indirecting behind a macro, though, means experimenting with linked-list
* representations is easy (since they'll require an alloca in the calling
* frame).
*/
#define CACHE_BIN_PTR_ARRAY_DECLARE(name, nval) \
cache_bin_ptr_array_t name; \
name.n = (nval)
/*
* Start a fill. The bin must be empty, and this must be followed by a
* finish_fill call before doing any alloc/dalloc operations on the bin.
*/
static inline void
cache_bin_init_ptr_array_for_fill(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nfill) {
cache_bin_assert_empty(bin);
arr->ptr = cache_bin_empty_position_get(bin) - nfill;
}
/*
* While nfill in cache_bin_init_ptr_array_for_fill is the number we *intend* to
* fill, nfilled here is the number we actually filled (which may be less, in
* case of OOM).
*/
static inline void
cache_bin_finish_fill(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nfilled) {
cache_bin_assert_empty(bin);
void **empty_position = cache_bin_empty_position_get(bin);
if (nfilled < arr->n) {
memmove(empty_position - nfilled, empty_position - arr->n,
nfilled * sizeof(void *));
}
bin->stack_head = empty_position - nfilled;
/* Reset the bin stats as it's merged during fill. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
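/*
* Putting the pieces together, a fill sequence sketch (the provider function
* is illustrative; only the cache_bin calls below are real):
*
* CACHE_BIN_PTR_ARRAY_DECLARE(arr, nfill);
* cache_bin_init_ptr_array_for_fill(bin, &arr, nfill);
* cache_bin_sz_t nfilled = provider_fill_into(arr.ptr, nfill);
* cache_bin_finish_fill(bin, &arr, nfilled);
*/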
/*
* Same deal, but with flush. Unlike fill (which can fail), the user must flush
* everything we give them.
*/
static inline void
cache_bin_init_ptr_array_for_flush(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nflush) {
arr->ptr = cache_bin_empty_position_get(bin) - nflush;
assert(cache_bin_ncached_get_local(bin) == 0 || *arr->ptr != NULL);
}
static inline void
cache_bin_finish_flush(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nflushed) {
unsigned rem = cache_bin_ncached_get_local(bin) - nflushed;
memmove(
bin->stack_head + nflushed, bin->stack_head, rem * sizeof(void *));
bin->stack_head += nflushed;
cache_bin_low_water_adjust(bin);
/* Reset the bin stats as it's merged during flush. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
static inline void
cache_bin_init_ptr_array_for_stashed(cache_bin_t *bin, szind_t binind,
cache_bin_ptr_array_t *arr, cache_bin_sz_t nstashed) {
assert(nstashed > 0);
assert(cache_bin_nstashed_get_local(bin) == nstashed);
void **low_bound = cache_bin_low_bound_get(bin);
arr->ptr = low_bound;
assert(*arr->ptr != NULL);
}
static inline void
cache_bin_finish_flush_stashed(cache_bin_t *bin) {
void **low_bound = cache_bin_low_bound_get(bin);
/* Reset the bin local full position. */
bin->low_bits_full = (uint16_t)(uintptr_t)low_bound;
assert(cache_bin_nstashed_get_local(bin) == 0);
/* Reset the bin stats as it's merged during flush. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
/*
* Initialize a cache_bin_info to represent up to the given number of items in
* the cache_bins it is associated with.
*/
void cache_bin_info_init(
cache_bin_info_t *bin_info, cache_bin_sz_t ncached_max);
/*
* Given an array of initialized cache_bin_info_ts, determine how big an
* allocation is required to initialize a full set of cache_bin_ts.
*/
void cache_bin_info_compute_alloc(const cache_bin_info_t *infos, szind_t ninfos,
size_t *size, size_t *alignment);
/*
* Actually initialize some cache bins. Callers should allocate the backing
* memory indicated by a call to cache_bin_info_compute_alloc. They should then
* preincrement, call init once for each bin and info, and then call
* cache_bin_postincrement. *alloc_cur will then point immediately past the end
* of the allocation.
*/
void cache_bin_preincrement(const cache_bin_info_t *infos, szind_t ninfos,
void *alloc, size_t *cur_offset);
void cache_bin_postincrement(void *alloc, size_t *cur_offset);
void cache_bin_init(cache_bin_t *bin, const cache_bin_info_t *info, void *alloc,
size_t *cur_offset);
void cache_bin_init_disabled(cache_bin_t *bin, cache_bin_sz_t ncached_max);
bool cache_bin_stack_use_thp(void);
#endif /* JEMALLOC_INTERNAL_CACHE_BIN_H */

View file

@ -1,6 +1,7 @@
#ifndef JEMALLOC_INTERNAL_CKH_H
#define JEMALLOC_INTERNAL_CKH_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd.h"
/* Cuckoo hashing implementation. Skip to the end for the interface. */
@ -21,8 +22,8 @@
#define LG_CKH_BUCKET_CELLS (LG_CACHELINE - LG_SIZEOF_PTR - 1)
/* Typedefs to allow easy function pointer passing. */
typedef void ckh_hash_t (const void *, size_t[2]);
typedef bool ckh_keycomp_t (const void *, const void *);
typedef void ckh_hash_t(const void *, size_t[2]);
typedef bool ckh_keycomp_t(const void *, const void *);
/* Hash table cell. */
typedef struct {
@ -55,7 +56,7 @@ typedef struct {
unsigned lg_curbuckets;
/* Hash and comparison functions. */
ckh_hash_t *hash;
ckh_hash_t *hash;
ckh_keycomp_t *keycomp;
/* Hash table with 2^lg_curbuckets buckets. */
@ -88,8 +89,8 @@ bool ckh_iter(ckh_t *ckh, size_t *tabind, void **key, void **data);
* the key and value, and doesn't do any lifetime management.
*/
bool ckh_insert(tsd_t *tsd, ckh_t *ckh, const void *key, const void *data);
bool ckh_remove(tsd_t *tsd, ckh_t *ckh, const void *searchkey, void **key,
void **data);
bool ckh_remove(
tsd_t *tsd, ckh_t *ckh, const void *searchkey, void **key, void **data);
bool ckh_search(ckh_t *ckh, const void *searchkey, void **key, void **data);
/* Some useful hash and comparison functions for strings and pointers. */

View file

@ -0,0 +1,23 @@
#ifndef JEMALLOC_INTERNAL_CONF_H
#define JEMALLOC_INTERNAL_CONF_H
#include "jemalloc/internal/sc.h"
void malloc_conf_init(sc_data_t *sc_data, unsigned bin_shard_sizes[SC_NBINS],
char readlink_buf[PATH_MAX + 1]);
void malloc_abort_invalid_conf(void);
#ifdef JEMALLOC_JET
extern bool had_conf_error;
bool conf_next(char const **opts_p, char const **k_p, size_t *klen_p,
char const **v_p, size_t *vlen_p);
void conf_error(
const char *msg, const char *k, size_t klen, const char *v, size_t vlen);
bool conf_handle_bool(const char *v, size_t vlen, bool *result);
bool conf_handle_signed(const char *v, size_t vlen, intmax_t min, intmax_t max,
bool check_min, bool check_max, bool clip, intmax_t *result);
bool conf_handle_char_p(const char *v, size_t vlen, char *dest, size_t dest_sz);
#endif
#endif /* JEMALLOC_INTERNAL_CONF_H */

View file

@ -0,0 +1,36 @@
#ifndef JEMALLOC_INTERNAL_COUNTER_H
#define JEMALLOC_INTERNAL_COUNTER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/lockedint.h"
#include "jemalloc/internal/mutex.h"
typedef struct counter_accum_s {
LOCKEDINT_MTX_DECLARE(mtx)
locked_u64_t accumbytes;
uint64_t interval;
} counter_accum_t;
JEMALLOC_ALWAYS_INLINE bool
counter_accum(tsdn_t *tsdn, counter_accum_t *counter, uint64_t bytes) {
uint64_t interval = counter->interval;
assert(interval > 0);
LOCKEDINT_MTX_LOCK(tsdn, counter->mtx);
/*
* If the event moves fast enough (and/or if the event handling is slow
* enough), extreme overflow can cause counter triggers to coalesce.
* This is an intentional mechanism that avoids rate-limiting
* allocation.
*/
bool overflow = locked_inc_mod_u64(tsdn, LOCKEDINT_MTX(counter->mtx),
&counter->accumbytes, bytes, interval);
LOCKEDINT_MTX_UNLOCK(tsdn, counter->mtx);
return overflow;
}
bool counter_accum_init(counter_accum_t *counter, uint64_t interval);
void counter_prefork(tsdn_t *tsdn, counter_accum_t *counter);
void counter_postfork_parent(tsdn_t *tsdn, counter_accum_t *counter);
void counter_postfork_child(tsdn_t *tsdn, counter_accum_t *counter);
#endif /* JEMALLOC_INTERNAL_COUNTER_H */
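
A sketch of how this interface might be driven (the names and the 64 KiB interval are illustrative, not from this change):

static counter_accum_t example_counter;

static void
example_counter_boot(void) {
/* Trigger roughly once per 64 KiB accumulated. */
counter_accum_init(&example_counter, 64 * 1024);
}

static void
example_on_alloc(tsdn_t *tsdn, size_t usize) {
if (counter_accum(tsdn, &example_counter, usize)) {
/* The interval was crossed (possibly more than once): handle it. */
}
}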

View file

@ -1,53 +1,65 @@
#ifndef JEMALLOC_INTERNAL_CTL_H
#define JEMALLOC_INTERNAL_CTL_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/background_thread_structs.h"
#include "jemalloc/internal/bin_stats.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/mutex_prof.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/stats.h"
/* Maximum ctl tree depth. */
#define CTL_MAX_DEPTH 7
#define CTL_MAX_DEPTH 7
#define CTL_MULTI_SETTING_MAX_LEN 1000
typedef struct ctl_node_s {
bool named;
} ctl_node_t;
typedef struct ctl_named_node_s {
ctl_node_t node;
ctl_node_t node;
const char *name;
/* If (nchildren == 0), this is a terminal node. */
size_t nchildren;
size_t nchildren;
const ctl_node_t *children;
int (*ctl)(tsd_t *, const size_t *, size_t, void *, size_t *, void *,
size_t);
int (*ctl)(
tsd_t *, const size_t *, size_t, void *, size_t *, void *, size_t);
} ctl_named_node_t;
typedef struct ctl_indexed_node_s {
struct ctl_node_s node;
const ctl_named_node_t *(*index)(tsdn_t *, const size_t *, size_t,
size_t);
const ctl_named_node_t *(*index)(
tsdn_t *, const size_t *, size_t, size_t);
} ctl_indexed_node_t;
typedef struct ctl_arena_stats_s {
arena_stats_t astats;
/* Aggregate stats for small size classes, based on bin stats. */
size_t allocated_small;
size_t allocated_small;
uint64_t nmalloc_small;
uint64_t ndalloc_small;
uint64_t nrequests_small;
uint64_t nfills_small;
uint64_t nflushes_small;
malloc_bin_stats_t bstats[NBINS];
malloc_large_stats_t lstats[NSIZES - NBINS];
bin_stats_data_t bstats[SC_NBINS];
arena_stats_large_t lstats[SC_NSIZES - SC_NBINS];
pac_estats_t estats[SC_NPSIZES];
hpa_shard_stats_t hpastats;
} ctl_arena_stats_t;
typedef struct ctl_stats_s {
size_t allocated;
size_t active;
size_t metadata;
size_t metadata_edata;
size_t metadata_rtree;
size_t metadata_thp;
size_t resident;
size_t mapped;
size_t retained;
@ -59,17 +71,17 @@ typedef struct ctl_stats_s {
typedef struct ctl_arena_s ctl_arena_t;
struct ctl_arena_s {
unsigned arena_ind;
bool initialized;
bool initialized;
ql_elm(ctl_arena_t) destroyed_link;
/* Basic stats, supported even if !config_stats. */
unsigned nthreads;
unsigned nthreads;
const char *dss;
ssize_t dirty_decay_ms;
ssize_t muzzy_decay_ms;
size_t pactive;
size_t pdirty;
size_t pmuzzy;
ssize_t dirty_decay_ms;
ssize_t muzzy_decay_ms;
size_t pactive;
size_t pdirty;
size_t pmuzzy;
/* NULL if !config_stats. */
ctl_arena_stats_t *astats;
@ -91,41 +103,70 @@ typedef struct ctl_arenas_s {
int ctl_byname(tsd_t *tsd, const char *name, void *oldp, size_t *oldlenp,
void *newp, size_t newlen);
int ctl_nametomib(tsdn_t *tsdn, const char *name, size_t *mibp,
size_t *miblenp);
int ctl_nametomib(tsd_t *tsd, const char *name, size_t *mibp, size_t *miblenp);
int ctl_bymib(tsd_t *tsd, const size_t *mib, size_t miblen, void *oldp,
size_t *oldlenp, void *newp, size_t newlen);
int ctl_mibnametomib(
tsd_t *tsd, size_t *mib, size_t miblen, const char *name, size_t *miblenp);
int ctl_bymibname(tsd_t *tsd, size_t *mib, size_t miblen, const char *name,
size_t *miblenp, void *oldp, size_t *oldlenp, void *newp, size_t newlen);
bool ctl_boot(void);
void ctl_prefork(tsdn_t *tsdn);
void ctl_postfork_parent(tsdn_t *tsdn);
void ctl_postfork_child(tsdn_t *tsdn);
void ctl_mtx_assert_held(tsdn_t *tsdn);
#define xmallctl(name, oldp, oldlenp, newp, newlen) do { \
if (je_mallctl(name, oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_printf( \
"<jemalloc>: Failure in xmallctl(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctl(name, oldp, oldlenp, newp, newlen) \
do { \
if (je_mallctl(name, oldp, oldlenp, newp, newlen) != 0) { \
malloc_printf( \
"<jemalloc>: Failure in xmallctl(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctlnametomib(name, mibp, miblenp) do { \
if (je_mallctlnametomib(name, mibp, miblenp) != 0) { \
malloc_printf("<jemalloc>: Failure in " \
"xmallctlnametomib(\"%s\", ...)\n", name); \
abort(); \
} \
} while (0)
#define xmallctlnametomib(name, mibp, miblenp) \
do { \
if (je_mallctlnametomib(name, mibp, miblenp) != 0) { \
malloc_printf( \
"<jemalloc>: Failure in " \
"xmallctlnametomib(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) do { \
if (je_mallctlbymib(mib, miblen, oldp, oldlenp, newp, \
newlen) != 0) { \
malloc_write( \
"<jemalloc>: Failure in xmallctlbymib()\n"); \
abort(); \
} \
} while (0)
#define xmallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) \
do { \
if (je_mallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in xmallctlbymib()\n"); \
abort(); \
} \
} while (0)
#define xmallctlmibnametomib(mib, miblen, name, miblenp) \
do { \
if (ctl_mibnametomib(tsd_fetch(), mib, miblen, name, miblenp) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in ctl_mibnametomib()\n"); \
abort(); \
} \
} while (0)
#define xmallctlbymibname( \
mib, miblen, name, miblenp, oldp, oldlenp, newp, newlen) \
do { \
if (ctl_bymibname(tsd_fetch(), mib, miblen, name, miblenp, \
oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in ctl_bymibname()\n"); \
abort(); \
} \
} while (0)
#endif /* JEMALLOC_INTERNAL_CTL_H */
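
For reference, a typical xmallctl read looks like the following ("stats.allocated" is a standard mallctl name; the snippet itself is illustrative, not part of this change):

size_t allocated;
size_t sz = sizeof(allocated);
xmallctl("stats.allocated", &allocated, &sz, NULL, 0);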

View file

@ -0,0 +1,188 @@
#ifndef JEMALLOC_INTERNAL_DECAY_H
#define JEMALLOC_INTERNAL_DECAY_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/smoothstep.h"
#define DECAY_UNBOUNDED_TIME_TO_PURGE ((uint64_t) - 1)
/*
* The decay_t computes the number of pages we should purge at any given time.
* Page allocators inform a decay object when pages enter a decay-able state
* (i.e. dirty or muzzy), and query it to determine how many pages should be
* purged at any given time.
*
* This is mostly a single-threaded data structure and doesn't care about
* synchronization at all; it's the caller's responsibility to manage their
* synchronization on their own. There are two exceptions:
* 1) It's OK to racily call decay_ms_read (i.e. just the simplest state query).
* 2) The mtx and purging fields live (and are initialized) here, but are
* logically owned by the page allocator. This is just a convenience (since
* those fields would be duplicated for both the dirty and muzzy states
* otherwise).
*/
typedef struct decay_s decay_t;
struct decay_s {
/* Synchronizes all non-atomic fields. */
malloc_mutex_t mtx;
/*
* True if a thread is currently purging the extents associated with
* this decay structure.
*/
bool purging;
/*
* Approximate time in milliseconds from the creation of a set of unused
* dirty pages until an equivalent set of unused dirty pages is purged
* and/or reused.
*/
atomic_zd_t time_ms;
/* time / SMOOTHSTEP_NSTEPS. */
nstime_t interval;
/*
* Time at which the current decay interval logically started. We do
* not actually advance to a new epoch until sometime after it starts
* because of scheduling and computation delays, and it is even possible
* to completely skip epochs. In all cases, during epoch advancement we
* merge all relevant activity into the most recently recorded epoch.
*/
nstime_t epoch;
/* Deadline randomness generator. */
uint64_t jitter_state;
/*
* Deadline for current epoch. This is the sum of interval and per
* epoch jitter which is a uniform random variable in [0..interval).
* Epochs always advance by precise multiples of interval, but we
* randomize the deadline to reduce the likelihood of arenas purging in
* lockstep.
*/
nstime_t deadline;
/*
* The number of pages we cap ourselves at in the current epoch, per
* decay policies. Updated on an epoch change. After an epoch change,
* the caller should take steps to try to purge down to this amount.
*/
size_t npages_limit;
/*
* Number of unpurged pages at beginning of current epoch. During epoch
* advancement we use the delta between arena->decay_*.nunpurged and
* ecache_npages_get(&arena->ecache_*) to determine how many dirty pages,
* if any, were generated.
*/
size_t nunpurged;
/*
* Trailing log of how many unused dirty pages were generated during
* each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
* element is the most recent epoch. Corresponding epoch times are
* relative to epoch.
*
* Updated only on epoch advance, triggered by
* decay_maybe_advance_epoch, below.
*/
size_t backlog[SMOOTHSTEP_NSTEPS];
/* Peak number of pages in associated extents. Used for debug only. */
uint64_t ceil_npages;
};
/*
* The current decay time setting. This is the only public access to a decay_t
* that's allowed without holding mtx.
*/
static inline ssize_t
decay_ms_read(const decay_t *decay) {
return atomic_load_zd(&decay->time_ms, ATOMIC_RELAXED);
}
/*
* See the comment on the struct field -- the limit on pages we should allow in
* this decay state this epoch.
*/
static inline size_t
decay_npages_limit_get(const decay_t *decay) {
return decay->npages_limit;
}
/* How many unused dirty pages were generated during the last epoch. */
static inline size_t
decay_epoch_npages_delta(const decay_t *decay) {
return decay->backlog[SMOOTHSTEP_NSTEPS - 1];
}
/*
* Current epoch duration, in nanoseconds. Given that new epochs are started
* somewhat haphazardly, this is not necessarily exactly the time between any
* two calls to decay_maybe_advance_epoch; see the comments on fields in the
* decay_t.
*/
static inline uint64_t
decay_epoch_duration_ns(const decay_t *decay) {
return nstime_ns(&decay->interval);
}
static inline bool
decay_immediately(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms == 0;
}
static inline bool
decay_disabled(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms < 0;
}
/* Returns true if decay is enabled and done gradually. */
static inline bool
decay_gradually(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms > 0;
}
/*
* Returns true if the passed in decay time setting is valid.
* < -1 : invalid
* -1 : never decay
* 0 : decay immediately
* > 0 : some positive decay time, up to a maximum allowed value of
* NSTIME_SEC_MAX * 1000, which corresponds to decaying somewhere in the early
* 27th century. By that time, we expect to have implemented alternate purging
* strategies.
*/
bool decay_ms_valid(ssize_t decay_ms);
/*
* As a precondition, the decay_t must be zeroed out (as if with memset).
*
* Returns true on error.
*/
bool decay_init(decay_t *decay, nstime_t *cur_time, ssize_t decay_ms);
/*
* Given an already-initialized decay_t, reinitialize it with the given decay
* time. The decay_t must have previously been initialized (and should not then
* be zeroed).
*/
void decay_reinit(decay_t *decay, nstime_t *cur_time, ssize_t decay_ms);
/*
* Compute how many of 'npages_new' pages we would need to purge in 'time'.
*/
uint64_t decay_npages_purge_in(
decay_t *decay, nstime_t *time, size_t npages_new);
/* Returns true if the epoch advanced and there are pages to purge. */
bool decay_maybe_advance_epoch(
decay_t *decay, nstime_t *new_time, size_t current_npages);
/*
* Calculates wait time until a number of pages in the interval
* [0.5 * npages_threshold .. 1.5 * npages_threshold] should be purged.
*
* Returns number of nanoseconds or DECAY_UNBOUNDED_TIME_TO_PURGE in case of
* indefinite wait.
*/
uint64_t decay_ns_until_purge(
decay_t *decay, size_t npages_current, uint64_t npages_threshold);
#endif /* JEMALLOC_INTERNAL_DECAY_H */
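
A rough sketch of the intended call pattern from a page allocator's point of view (the purge step and the way the current time is obtained are placeholders, not code from this change):

nstime_t now;
/* ... read the current time into `now` ... */
if (decay_maybe_advance_epoch(&decay, &now, current_npages)) {
size_t limit = decay_npages_limit_get(&decay);
/* Purge down toward `limit` pages, per the epoch's decay policy. */
}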

View file

@ -0,0 +1,42 @@
#ifndef JEMALLOC_INTERNAL_DIV_H
#define JEMALLOC_INTERNAL_DIV_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
* This module does the division that computes the index of a region in a slab,
* given its offset relative to the base.
* That is, given a divisor d and an n = i * d (all integers), we'll return i.
* We do some pre-computation to do this more quickly than a CPU division
* instruction.
* We bound n < 2^32, and don't support dividing by one.
*/
typedef struct div_info_s div_info_t;
struct div_info_s {
uint32_t magic;
#ifdef JEMALLOC_DEBUG
size_t d;
#endif
};
void div_init(div_info_t *div_info, size_t divisor);
static inline size_t
div_compute(div_info_t *div_info, size_t n) {
assert(n <= (uint32_t)-1);
/*
* This generates, e.g. mov; imul; shr on x86-64. On a 32-bit machine,
* the compilers I tried were all smart enough to turn this into the
* appropriate "get the high 32 bits of the result of a multiply" (e.g.
* mul; mov edx eax; on x86, umull on arm, etc.).
*/
size_t i = ((uint64_t)n * (uint64_t)div_info->magic) >> 32;
#ifdef JEMALLOC_DEBUG
assert(i * div_info->d == n);
#endif
return i;
}
#endif /* JEMALLOC_INTERNAL_DIV_H */
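
Usage sketch (the values are chosen purely for illustration):

div_info_t div_info;
div_init(&div_info, 48); /* e.g. a slab with 48-byte regions */
size_t idx = div_compute(&div_info, 96);
assert(idx == 2); /* 96 == 2 * 48 */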

View file

@ -0,0 +1,56 @@
#ifndef JEMALLOC_INTERNAL_ECACHE_H
#define JEMALLOC_INTERNAL_ECACHE_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/eset.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/san.h"
typedef struct ecache_s ecache_t;
struct ecache_s {
malloc_mutex_t mtx;
eset_t eset;
eset_t guarded_eset;
/* All stored extents must be in the same state. */
extent_state_t state;
/* The index of the ehooks the ecache is associated with. */
unsigned ind;
/*
* If true, delay coalescing until eviction; otherwise coalesce during
* deallocation.
*/
bool delay_coalesce;
};
static inline size_t
ecache_npages_get(ecache_t *ecache) {
return eset_npages_get(&ecache->eset)
+ eset_npages_get(&ecache->guarded_eset);
}
/* Get the number of extents in the given page size index. */
static inline size_t
ecache_nextents_get(ecache_t *ecache, pszind_t ind) {
return eset_nextents_get(&ecache->eset, ind)
+ eset_nextents_get(&ecache->guarded_eset, ind);
}
/* Get the sum total bytes of the extents in the given page size index. */
static inline size_t
ecache_nbytes_get(ecache_t *ecache, pszind_t ind) {
return eset_nbytes_get(&ecache->eset, ind)
+ eset_nbytes_get(&ecache->guarded_eset, ind);
}
static inline unsigned
ecache_ind_get(ecache_t *ecache) {
return ecache->ind;
}
bool ecache_init(tsdn_t *tsdn, ecache_t *ecache, extent_state_t state,
unsigned ind, bool delay_coalesce);
void ecache_prefork(tsdn_t *tsdn, ecache_t *ecache);
void ecache_postfork_parent(tsdn_t *tsdn, ecache_t *ecache);
void ecache_postfork_child(tsdn_t *tsdn, ecache_t *ecache);
#endif /* JEMALLOC_INTERNAL_ECACHE_H */

View file

@ -0,0 +1,795 @@
#ifndef JEMALLOC_INTERNAL_EDATA_H
#define JEMALLOC_INTERNAL_EDATA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/hpdata.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/prof_types.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/slab_data.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/typed_list.h"
/*
* sizeof(edata_t) is 128 bytes on 64-bit architectures. Ensure the alignment
* to free up the low bits in the rtree leaf.
*/
#define EDATA_ALIGNMENT 128
/*
* Defines how many nodes are visited when enumerating the heap to search for
* qualified extents. More nodes visited may result in better choices at
* the cost of longer search time. This size should not exceed 2^16 - 1
* because we use uint16_t for accessing the queue needed for enumeration.
*/
#define ESET_ENUMERATE_MAX_NUM 32
enum extent_state_e {
extent_state_active = 0,
extent_state_dirty = 1,
extent_state_muzzy = 2,
extent_state_retained = 3,
extent_state_transition = 4, /* States below are intermediate. */
extent_state_merging = 5,
extent_state_max = 5 /* Sanity checking only. */
};
typedef enum extent_state_e extent_state_t;
enum extent_head_state_e {
EXTENT_NOT_HEAD,
EXTENT_IS_HEAD /* See comments in ehooks_default_merge_impl(). */
};
typedef enum extent_head_state_e extent_head_state_t;
/*
* Which implementation of the page allocator interface, (PAI, defined in
* pai.h) owns the given extent?
*/
enum extent_pai_e { EXTENT_PAI_PAC = 0, EXTENT_PAI_HPA = 1 };
typedef enum extent_pai_e extent_pai_t;
struct e_prof_info_s {
/* Time when this was allocated. */
nstime_t e_prof_alloc_time;
/* Allocation request size. */
size_t e_prof_alloc_size;
/* Points to a prof_tctx_t. */
atomic_p_t e_prof_tctx;
/*
* Points to a prof_recent_t for the allocation; NULL
* means the recent allocation record no longer exists.
* Protected by prof_recent_alloc_mtx.
*/
atomic_p_t e_prof_recent_alloc;
};
typedef struct e_prof_info_s e_prof_info_t;
/*
* The information about a particular edata that lives in an emap. Space is
* more precious there (the information, plus the edata pointer, has to live in
* a 64-bit word if we want to enable a packed representation).
*
* There are two things that are special about the information here:
* - It's quicker to access. You have one fewer pointer hop, since finding the
* edata_t associated with an item always requires accessing the rtree leaf in
* which this data is stored.
* - It can be read unsynchronized, and without worrying about lifetime issues.
*/
typedef struct edata_map_info_s edata_map_info_t;
struct edata_map_info_s {
bool slab;
szind_t szind;
};
typedef struct edata_cmp_summary_s edata_cmp_summary_t;
struct edata_cmp_summary_s {
uint64_t sn;
uintptr_t addr;
};
/* Extent (span of pages). Use accessor functions for e_* fields. */
typedef struct edata_s edata_t;
ph_structs(edata_avail, edata_t, ESET_ENUMERATE_MAX_NUM);
ph_structs(edata_heap, edata_t, ESET_ENUMERATE_MAX_NUM);
struct edata_s {
/*
* Bitfield containing several fields:
*
* a: arena_ind
* b: slab
* c: committed
* p: pai
* z: zeroed
* g: guarded
* t: state
* i: szind
* f: nfree
* s: bin_shard
*
* 00000000 ... 0000ssss ssffffff ffffiiii iiiitttg zpcbaaaa aaaaaaaa
*
* arena_ind: Arena from which this extent came, or all 1 bits if
* unassociated.
*
* slab: The slab flag indicates whether the extent is used for a slab
* of small regions. This helps differentiate small size classes,
* and it indicates whether interior pointers can be looked up via
* iealloc().
*
* committed: The committed flag indicates whether physical memory is
* committed to the extent, whether explicitly or implicitly
* as on a system that overcommits and satisfies physical
* memory needs on demand via soft page faults.
*
* pai: The pai flag is an extent_pai_t.
*
* zeroed: The zeroed flag is used by extent recycling code to track
* whether memory is zero-filled.
*
* guarded: The guarded flag is used by the sanitizer to track whether
* the extent has page guards around it.
*
* state: The state flag is an extent_state_t.
*
* szind: The szind flag indicates usable size class index for
* allocations residing in this extent, regardless of whether the
* extent is a slab. Extent size and usable size often differ
* even for non-slabs, either due to sz_large_pad or promotion of
* sampled small regions.
*
* nfree: Number of free regions in slab.
*
* bin_shard: the shard of the bin from which this extent came.
*/
uint64_t e_bits;
#define MASK(CURRENT_FIELD_WIDTH, CURRENT_FIELD_SHIFT) \
((((((uint64_t)0x1U) << (CURRENT_FIELD_WIDTH)) - 1)) \
<< (CURRENT_FIELD_SHIFT))
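/*
* For example, MASK(4, 8) expands to ((((uint64_t)0x1U << 4) - 1) << 8), i.e.
* 0xf00: a 4-bit-wide field stored at bit offset 8.
*/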
#define EDATA_BITS_ARENA_WIDTH MALLOCX_ARENA_BITS
#define EDATA_BITS_ARENA_SHIFT 0
#define EDATA_BITS_ARENA_MASK \
MASK(EDATA_BITS_ARENA_WIDTH, EDATA_BITS_ARENA_SHIFT)
#define EDATA_BITS_SLAB_WIDTH 1
#define EDATA_BITS_SLAB_SHIFT (EDATA_BITS_ARENA_WIDTH + EDATA_BITS_ARENA_SHIFT)
#define EDATA_BITS_SLAB_MASK MASK(EDATA_BITS_SLAB_WIDTH, EDATA_BITS_SLAB_SHIFT)
#define EDATA_BITS_COMMITTED_WIDTH 1
#define EDATA_BITS_COMMITTED_SHIFT \
(EDATA_BITS_SLAB_WIDTH + EDATA_BITS_SLAB_SHIFT)
#define EDATA_BITS_COMMITTED_MASK \
MASK(EDATA_BITS_COMMITTED_WIDTH, EDATA_BITS_COMMITTED_SHIFT)
#define EDATA_BITS_PAI_WIDTH 1
#define EDATA_BITS_PAI_SHIFT \
(EDATA_BITS_COMMITTED_WIDTH + EDATA_BITS_COMMITTED_SHIFT)
#define EDATA_BITS_PAI_MASK MASK(EDATA_BITS_PAI_WIDTH, EDATA_BITS_PAI_SHIFT)
#define EDATA_BITS_ZEROED_WIDTH 1
#define EDATA_BITS_ZEROED_SHIFT (EDATA_BITS_PAI_WIDTH + EDATA_BITS_PAI_SHIFT)
#define EDATA_BITS_ZEROED_MASK \
MASK(EDATA_BITS_ZEROED_WIDTH, EDATA_BITS_ZEROED_SHIFT)
#define EDATA_BITS_GUARDED_WIDTH 1
#define EDATA_BITS_GUARDED_SHIFT \
(EDATA_BITS_ZEROED_WIDTH + EDATA_BITS_ZEROED_SHIFT)
#define EDATA_BITS_GUARDED_MASK \
MASK(EDATA_BITS_GUARDED_WIDTH, EDATA_BITS_GUARDED_SHIFT)
#define EDATA_BITS_STATE_WIDTH 3
#define EDATA_BITS_STATE_SHIFT \
(EDATA_BITS_GUARDED_WIDTH + EDATA_BITS_GUARDED_SHIFT)
#define EDATA_BITS_STATE_MASK \
MASK(EDATA_BITS_STATE_WIDTH, EDATA_BITS_STATE_SHIFT)
#define EDATA_BITS_SZIND_WIDTH LG_CEIL(SC_NSIZES)
#define EDATA_BITS_SZIND_SHIFT (EDATA_BITS_STATE_WIDTH + EDATA_BITS_STATE_SHIFT)
#define EDATA_BITS_SZIND_MASK \
MASK(EDATA_BITS_SZIND_WIDTH, EDATA_BITS_SZIND_SHIFT)
#define EDATA_BITS_NFREE_WIDTH (SC_LG_SLAB_MAXREGS + 1)
#define EDATA_BITS_NFREE_SHIFT (EDATA_BITS_SZIND_WIDTH + EDATA_BITS_SZIND_SHIFT)
#define EDATA_BITS_NFREE_MASK \
MASK(EDATA_BITS_NFREE_WIDTH, EDATA_BITS_NFREE_SHIFT)
#define EDATA_BITS_BINSHARD_WIDTH 6
#define EDATA_BITS_BINSHARD_SHIFT \
(EDATA_BITS_NFREE_WIDTH + EDATA_BITS_NFREE_SHIFT)
#define EDATA_BITS_BINSHARD_MASK \
MASK(EDATA_BITS_BINSHARD_WIDTH, EDATA_BITS_BINSHARD_SHIFT)
#define EDATA_BITS_IS_HEAD_WIDTH 1
#define EDATA_BITS_IS_HEAD_SHIFT \
(EDATA_BITS_BINSHARD_WIDTH + EDATA_BITS_BINSHARD_SHIFT)
#define EDATA_BITS_IS_HEAD_MASK \
MASK(EDATA_BITS_IS_HEAD_WIDTH, EDATA_BITS_IS_HEAD_SHIFT)
/* Pointer to the extent that this structure is responsible for. */
void *e_addr;
union {
/*
* Extent size and serial number associated with the extent
* structure (different than the serial number for the extent at
* e_addr).
*
* ssssssss [...] ssssssss ssssnnnn nnnnnnnn
*/
size_t e_size_esn;
#define EDATA_SIZE_MASK ((size_t) ~(PAGE - 1))
#define EDATA_ESN_MASK ((size_t)PAGE - 1)
/* Base extent size, which may not be a multiple of PAGE. */
size_t e_bsize;
};
/*
* If this edata is a user allocation from an HPA, it comes out of some
* pageslab (we don't yet support hugepage allocations that don't fit
* into pageslabs). This tracks it.
*/
hpdata_t *e_ps;
/*
* Serial number. These are not necessarily unique; splitting an extent
* results in two extents with the same serial number.
*/
uint64_t e_sn;
union {
/*
* List linkage used when the edata_t is active; either in
* arena's large allocations or bin_t's slabs_full.
*/
ql_elm(edata_t) ql_link_active;
/*
* Pairing heap linkage. Used whenever the extent is inactive
* (in the page allocators), or when it is active and in
* slabs_nonfull, or when the edata_t is unassociated with an
* extent and sitting in an edata_cache.
*/
union {
edata_heap_link_t heap_link;
edata_avail_link_t avail_link;
};
};
union {
/*
* List linkage used when the extent is inactive:
* - Stashed dirty extents
* - Ecache LRU functionality.
*/
ql_elm(edata_t) ql_link_inactive;
/* Small region slab metadata. */
slab_data_t e_slab_data;
/* Profiling data, used for large objects. */
e_prof_info_t e_prof_info;
};
};
TYPED_LIST(edata_list_active, edata_t, ql_link_active)
TYPED_LIST(edata_list_inactive, edata_t, ql_link_inactive)
static inline unsigned
edata_arena_ind_get(const edata_t *edata) {
unsigned arena_ind = (unsigned)((edata->e_bits & EDATA_BITS_ARENA_MASK)
>> EDATA_BITS_ARENA_SHIFT);
assert(arena_ind < MALLOCX_ARENA_LIMIT);
return arena_ind;
}
static inline szind_t
edata_szind_get_maybe_invalid(const edata_t *edata) {
szind_t szind = (szind_t)((edata->e_bits & EDATA_BITS_SZIND_MASK)
>> EDATA_BITS_SZIND_SHIFT);
assert(szind <= SC_NSIZES);
return szind;
}
static inline szind_t
edata_szind_get(const edata_t *edata) {
szind_t szind = edata_szind_get_maybe_invalid(edata);
assert(szind < SC_NSIZES); /* Never call when "invalid". */
return szind;
}
static inline size_t
edata_usize_get(const edata_t *edata) {
assert(edata != NULL);
/*
* When sz_large_size_classes_disabled() is true, there are two cases:
* 1. if usize_from_ind is not smaller than SC_LARGE_MINCLASS,
* usize_from_size is accurate;
* 2. otherwise, usize_from_ind is accurate.
*
* When sz_large_size_classes_disabled() is not true, the two should be the
* same when usize_from_ind is not smaller than SC_LARGE_MINCLASS.
*
* Note sampled small allocs will be promoted. Their extent size is
* recorded in edata_size_get(edata), while their szind reflects the
* true usize. Thus, usize retrieved here is still accurate for
* sampled small allocs.
*/
szind_t szind = edata_szind_get(edata);
#ifdef JEMALLOC_JET
/*
* Double free is invalid and results in undefined behavior. However,
* for double free tests to end gracefully, return an invalid usize
* when szind shows the edata is not active, i.e., szind == SC_NSIZES.
*/
if (unlikely(szind == SC_NSIZES)) {
return SC_LARGE_MAXCLASS + 1;
}
#endif
if (!sz_large_size_classes_disabled() || szind < SC_NBINS) {
size_t usize_from_ind = sz_index2size(szind);
if (!sz_large_size_classes_disabled()
&& usize_from_ind >= SC_LARGE_MINCLASS) {
size_t size = (edata->e_size_esn & EDATA_SIZE_MASK);
assert(size > sz_large_pad);
size_t usize_from_size = size - sz_large_pad;
assert(usize_from_ind == usize_from_size);
}
return usize_from_ind;
}
size_t size = (edata->e_size_esn & EDATA_SIZE_MASK);
assert(size > sz_large_pad);
size_t usize_from_size = size - sz_large_pad;
/*
* Regardless of whether large size classes are disabled, the usize retrieved
* from the size is not accurate when it is smaller than SC_LARGE_MINCLASS.
*/
assert(usize_from_size >= SC_LARGE_MINCLASS);
return usize_from_size;
}
static inline unsigned
edata_binshard_get(const edata_t *edata) {
unsigned binshard = (unsigned)((edata->e_bits
& EDATA_BITS_BINSHARD_MASK)
>> EDATA_BITS_BINSHARD_SHIFT);
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
return binshard;
}
static inline uint64_t
edata_sn_get(const edata_t *edata) {
return edata->e_sn;
}
static inline extent_state_t
edata_state_get(const edata_t *edata) {
return (extent_state_t)((edata->e_bits & EDATA_BITS_STATE_MASK)
>> EDATA_BITS_STATE_SHIFT);
}
static inline bool
edata_guarded_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_GUARDED_MASK)
>> EDATA_BITS_GUARDED_SHIFT);
}
static inline bool
edata_zeroed_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_ZEROED_MASK)
>> EDATA_BITS_ZEROED_SHIFT);
}
static inline bool
edata_committed_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_COMMITTED_MASK)
>> EDATA_BITS_COMMITTED_SHIFT);
}
static inline extent_pai_t
edata_pai_get(const edata_t *edata) {
return (extent_pai_t)((edata->e_bits & EDATA_BITS_PAI_MASK)
>> EDATA_BITS_PAI_SHIFT);
}
static inline bool
edata_slab_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_SLAB_MASK)
>> EDATA_BITS_SLAB_SHIFT);
}
static inline unsigned
edata_nfree_get(const edata_t *edata) {
assert(edata_slab_get(edata));
return (unsigned)((edata->e_bits & EDATA_BITS_NFREE_MASK)
>> EDATA_BITS_NFREE_SHIFT);
}
static inline void *
edata_base_get(const edata_t *edata) {
assert(edata->e_addr == PAGE_ADDR2BASE(edata->e_addr)
|| !edata_slab_get(edata));
return PAGE_ADDR2BASE(edata->e_addr);
}
static inline void *
edata_addr_get(const edata_t *edata) {
assert(edata->e_addr == PAGE_ADDR2BASE(edata->e_addr)
|| !edata_slab_get(edata));
return edata->e_addr;
}
static inline size_t
edata_size_get(const edata_t *edata) {
return (edata->e_size_esn & EDATA_SIZE_MASK);
}
static inline size_t
edata_esn_get(const edata_t *edata) {
return (edata->e_size_esn & EDATA_ESN_MASK);
}
static inline size_t
edata_bsize_get(const edata_t *edata) {
return edata->e_bsize;
}
static inline hpdata_t *
edata_ps_get(const edata_t *edata) {
assert(edata_pai_get(edata) == EXTENT_PAI_HPA);
return edata->e_ps;
}
static inline void *
edata_before_get(const edata_t *edata) {
return (void *)((byte_t *)edata_base_get(edata) - PAGE);
}
static inline void *
edata_last_get(const edata_t *edata) {
return (void *)((byte_t *)edata_base_get(edata) + edata_size_get(edata)
- PAGE);
}
static inline void *
edata_past_get(const edata_t *edata) {
return (
void *)((byte_t *)edata_base_get(edata) + edata_size_get(edata));
}
static inline slab_data_t *
edata_slab_data_get(edata_t *edata) {
assert(edata_slab_get(edata));
return &edata->e_slab_data;
}
static inline const slab_data_t *
edata_slab_data_get_const(const edata_t *edata) {
assert(edata_slab_get(edata));
return &edata->e_slab_data;
}
static inline prof_tctx_t *
edata_prof_tctx_get(const edata_t *edata) {
return (prof_tctx_t *)atomic_load_p(
&edata->e_prof_info.e_prof_tctx, ATOMIC_ACQUIRE);
}
static inline const nstime_t *
edata_prof_alloc_time_get(const edata_t *edata) {
return &edata->e_prof_info.e_prof_alloc_time;
}
static inline size_t
edata_prof_alloc_size_get(const edata_t *edata) {
return edata->e_prof_info.e_prof_alloc_size;
}
static inline prof_recent_t *
edata_prof_recent_alloc_get_dont_call_directly(const edata_t *edata) {
return (prof_recent_t *)atomic_load_p(
&edata->e_prof_info.e_prof_recent_alloc, ATOMIC_RELAXED);
}
static inline void
edata_arena_ind_set(edata_t *edata, unsigned arena_ind) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_ARENA_MASK)
| ((uint64_t)arena_ind << EDATA_BITS_ARENA_SHIFT);
}
static inline void
edata_binshard_set(edata_t *edata, unsigned binshard) {
/* The assertion assumes szind is set already. */
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
edata->e_bits = (edata->e_bits & ~EDATA_BITS_BINSHARD_MASK)
| ((uint64_t)binshard << EDATA_BITS_BINSHARD_SHIFT);
}
static inline void
edata_addr_set(edata_t *edata, void *addr) {
edata->e_addr = addr;
}
static inline void
edata_size_set(edata_t *edata, size_t size) {
assert((size & ~EDATA_SIZE_MASK) == 0);
edata->e_size_esn = size | (edata->e_size_esn & ~EDATA_SIZE_MASK);
}
static inline void
edata_esn_set(edata_t *edata, size_t esn) {
edata->e_size_esn = (edata->e_size_esn & ~EDATA_ESN_MASK)
| (esn & EDATA_ESN_MASK);
}
static inline void
edata_bsize_set(edata_t *edata, size_t bsize) {
edata->e_bsize = bsize;
}
static inline void
edata_ps_set(edata_t *edata, hpdata_t *ps) {
assert(edata_pai_get(edata) == EXTENT_PAI_HPA);
edata->e_ps = ps;
}
static inline void
edata_szind_set(edata_t *edata, szind_t szind) {
assert(szind <= SC_NSIZES); /* SC_NSIZES means "invalid". */
edata->e_bits = (edata->e_bits & ~EDATA_BITS_SZIND_MASK)
| ((uint64_t)szind << EDATA_BITS_SZIND_SHIFT);
}
static inline void
edata_nfree_set(edata_t *edata, unsigned nfree) {
assert(edata_slab_get(edata));
edata->e_bits = (edata->e_bits & ~EDATA_BITS_NFREE_MASK)
| ((uint64_t)nfree << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_binshard_set(edata_t *edata, unsigned nfree, unsigned binshard) {
/* The assertion assumes szind is set already. */
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
edata->e_bits = (edata->e_bits
& (~EDATA_BITS_NFREE_MASK
& ~EDATA_BITS_BINSHARD_MASK))
| ((uint64_t)binshard << EDATA_BITS_BINSHARD_SHIFT)
| ((uint64_t)nfree << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_inc(edata_t *edata) {
assert(edata_slab_get(edata));
edata->e_bits += ((uint64_t)1U << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_dec(edata_t *edata) {
assert(edata_slab_get(edata));
edata->e_bits -= ((uint64_t)1U << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_sub(edata_t *edata, uint64_t n) {
assert(edata_slab_get(edata));
edata->e_bits -= (n << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_sn_set(edata_t *edata, uint64_t sn) {
edata->e_sn = sn;
}
static inline void
edata_state_set(edata_t *edata, extent_state_t state) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_STATE_MASK)
| ((uint64_t)state << EDATA_BITS_STATE_SHIFT);
}
static inline void
edata_guarded_set(edata_t *edata, bool guarded) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_GUARDED_MASK)
| ((uint64_t)guarded << EDATA_BITS_GUARDED_SHIFT);
}
static inline void
edata_zeroed_set(edata_t *edata, bool zeroed) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_ZEROED_MASK)
| ((uint64_t)zeroed << EDATA_BITS_ZEROED_SHIFT);
}
static inline void
edata_committed_set(edata_t *edata, bool committed) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_COMMITTED_MASK)
| ((uint64_t)committed << EDATA_BITS_COMMITTED_SHIFT);
}
static inline void
edata_pai_set(edata_t *edata, extent_pai_t pai) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_PAI_MASK)
| ((uint64_t)pai << EDATA_BITS_PAI_SHIFT);
}
static inline void
edata_slab_set(edata_t *edata, bool slab) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_SLAB_MASK)
| ((uint64_t)slab << EDATA_BITS_SLAB_SHIFT);
}
static inline void
edata_prof_tctx_set(edata_t *edata, prof_tctx_t *tctx) {
atomic_store_p(&edata->e_prof_info.e_prof_tctx, tctx, ATOMIC_RELEASE);
}
static inline void
edata_prof_alloc_time_set(edata_t *edata, nstime_t *t) {
nstime_copy(&edata->e_prof_info.e_prof_alloc_time, t);
}
static inline void
edata_prof_alloc_size_set(edata_t *edata, size_t size) {
edata->e_prof_info.e_prof_alloc_size = size;
}
static inline void
edata_prof_recent_alloc_set_dont_call_directly(
edata_t *edata, prof_recent_t *recent_alloc) {
atomic_store_p(&edata->e_prof_info.e_prof_recent_alloc, recent_alloc,
ATOMIC_RELAXED);
}
static inline bool
edata_is_head_get(edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_IS_HEAD_MASK)
>> EDATA_BITS_IS_HEAD_SHIFT);
}
static inline void
edata_is_head_set(edata_t *edata, bool is_head) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_IS_HEAD_MASK)
| ((uint64_t)is_head << EDATA_BITS_IS_HEAD_SHIFT);
}
static inline bool
edata_state_in_transition(extent_state_t state) {
return state >= extent_state_transition;
}
/*
* Because this function is implemented as a sequence of bitfield modifications,
* even though each individual bit is properly initialized, we technically read
* uninitialized data within it. This is mostly fine, since most callers get
* their edatas from zeroing sources, but callers who make stack edata_ts need
* to manually zero them.
*/
static inline void
edata_init(edata_t *edata, unsigned arena_ind, void *addr, size_t size,
bool slab, szind_t szind, uint64_t sn, extent_state_t state, bool zeroed,
bool committed, extent_pai_t pai, extent_head_state_t is_head) {
assert(addr == PAGE_ADDR2BASE(addr) || !slab);
edata_arena_ind_set(edata, arena_ind);
edata_addr_set(edata, addr);
edata_size_set(edata, size);
edata_slab_set(edata, slab);
edata_szind_set(edata, szind);
edata_sn_set(edata, sn);
edata_state_set(edata, state);
edata_guarded_set(edata, false);
edata_zeroed_set(edata, zeroed);
edata_committed_set(edata, committed);
edata_pai_set(edata, pai);
edata_is_head_set(edata, is_head == EXTENT_IS_HEAD);
if (config_prof) {
edata_prof_tctx_set(edata, NULL);
}
}
static inline void
edata_binit(
edata_t *edata, void *addr, size_t bsize, uint64_t sn, bool reused) {
edata_arena_ind_set(edata, (1U << MALLOCX_ARENA_BITS) - 1);
edata_addr_set(edata, addr);
edata_bsize_set(edata, bsize);
edata_slab_set(edata, false);
edata_szind_set(edata, SC_NSIZES);
edata_sn_set(edata, sn);
edata_state_set(edata, extent_state_active);
/* See comments in base_edata_is_reused. */
edata_guarded_set(edata, reused);
edata_zeroed_set(edata, true);
edata_committed_set(edata, true);
/*
 * This isn't strictly true, but base-allocated extents never get
 * deallocated and can't be looked up in the emap, so there's no sense in
 * wasting a state bit to encode this fact.
 */
edata_pai_set(edata, EXTENT_PAI_PAC);
}
static inline int
edata_esn_comp(const edata_t *a, const edata_t *b) {
size_t a_esn = edata_esn_get(a);
size_t b_esn = edata_esn_get(b);
return (a_esn > b_esn) - (a_esn < b_esn);
}
static inline int
edata_ead_comp(const edata_t *a, const edata_t *b) {
uintptr_t a_eaddr = (uintptr_t)a;
uintptr_t b_eaddr = (uintptr_t)b;
return (a_eaddr > b_eaddr) - (a_eaddr < b_eaddr);
}
static inline edata_cmp_summary_t
edata_cmp_summary_get(const edata_t *edata) {
edata_cmp_summary_t result;
result.sn = edata_sn_get(edata);
result.addr = (uintptr_t)edata_addr_get(edata);
return result;
}
#ifdef JEMALLOC_HAVE_INT128
JEMALLOC_ALWAYS_INLINE unsigned __int128
edata_cmp_summary_encode(edata_cmp_summary_t src) {
return ((unsigned __int128)src.sn << 64) | src.addr;
}
static inline int
edata_cmp_summary_comp(edata_cmp_summary_t a, edata_cmp_summary_t b) {
unsigned __int128 a_encoded = edata_cmp_summary_encode(a);
unsigned __int128 b_encoded = edata_cmp_summary_encode(b);
if (a_encoded < b_encoded)
return -1;
if (a_encoded == b_encoded)
return 0;
return 1;
}
#else
static inline int
edata_cmp_summary_comp(edata_cmp_summary_t a, edata_cmp_summary_t b) {
/*
 * Logically, what we're doing here is comparing based on `.sn`, and
 * falling back to comparing on `.addr` in the case that `a.sn == b.sn`.
 * We accomplish this by multiplying the result of the `.sn` comparison
 * by 2, so that, as long as it is not 0, it dominates the `.addr`
 * comparison in determining the sign of the returned result value.
 * The justification for doing things this way is that this is
 * branchless - every branch that would be present in a straightforward
 * implementation corresponds to a common case, and thus branch
 * prediction accuracy is poor. As a result, this implementation is
 * measurably faster (by around 30%).
 */
return (2 * ((a.sn > b.sn) - (a.sn < b.sn)))
+ ((a.addr > b.addr) - (a.addr < b.addr));
}
#endif
static inline int
edata_snad_comp(const edata_t *a, const edata_t *b) {
edata_cmp_summary_t a_cmp = edata_cmp_summary_get(a);
edata_cmp_summary_t b_cmp = edata_cmp_summary_get(b);
return edata_cmp_summary_comp(a_cmp, b_cmp);
}
static inline int
edata_esnead_comp(const edata_t *a, const edata_t *b) {
/*
* Similar to `edata_cmp_summary_comp`, we've opted for a
* branchless implementation for the sake of performance.
*/
return (2 * edata_esn_comp(a, b)) + edata_ead_comp(a, b);
}
ph_proto(, edata_avail, edata_t) ph_proto(, edata_heap, edata_t)
#endif /* JEMALLOC_INTERNAL_EDATA_H */
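As a sanity check on the branchless-comparison reasoning above, the following standalone sketch (not part of the tree; cmp_nested and cmp_branchless are hypothetical names mirroring edata_cmp_summary_comp) exhaustively verifies that the 2*primary + secondary form agrees in sign with the straightforward nested comparison:

#include <assert.h>
#include <stdio.h>

/* Straightforward form: compare the primary key, fall back on ties. */
static int
cmp_nested(int a_sn, int a_addr, int b_sn, int b_addr) {
    if (a_sn != b_sn) {
        return (a_sn > b_sn) ? 1 : -1;
    }
    return (a_addr > b_addr) - (a_addr < b_addr);
}

/* Branchless form: the doubled primary comparison dominates the secondary. */
static int
cmp_branchless(int a_sn, int a_addr, int b_sn, int b_addr) {
    return (2 * ((a_sn > b_sn) - (a_sn < b_sn)))
        + ((a_addr > b_addr) - (a_addr < b_addr));
}

int
main(void) {
    for (int a_sn = 0; a_sn < 3; a_sn++) {
        for (int b_sn = 0; b_sn < 3; b_sn++) {
            for (int a_addr = 0; a_addr < 3; a_addr++) {
                for (int b_addr = 0; b_addr < 3; b_addr++) {
                    int n = cmp_nested(a_sn, a_addr, b_sn, b_addr);
                    int b = cmp_branchless(a_sn, a_addr, b_sn, b_addr);
                    /* Only the sign matters for a comparator. */
                    assert((n > 0) == (b > 0) && (n < 0) == (b < 0));
                }
            }
        }
    }
    printf("branchless and nested comparators agree\n");
    return 0;
}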

@@ -0,0 +1,50 @@
#ifndef JEMALLOC_INTERNAL_EDATA_CACHE_H
#define JEMALLOC_INTERNAL_EDATA_CACHE_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
/* For tests only. */
#define EDATA_CACHE_FAST_FILL 4
/*
* A cache of edata_t structures allocated via base_alloc_edata (as opposed to
* the underlying extents they describe). The contents of returned edata_t
* objects are garbage and cannot be relied upon.
*/
typedef struct edata_cache_s edata_cache_t;
struct edata_cache_s {
edata_avail_t avail;
atomic_zu_t count;
malloc_mutex_t mtx;
base_t *base;
};
bool edata_cache_init(edata_cache_t *edata_cache, base_t *base);
edata_t *edata_cache_get(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_put(tsdn_t *tsdn, edata_cache_t *edata_cache, edata_t *edata);
void edata_cache_prefork(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_postfork_parent(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_postfork_child(tsdn_t *tsdn, edata_cache_t *edata_cache);
/*
* An edata_cache_small is like an edata_cache, but it relies on external
* synchronization and avoids first-fit strategies.
*/
typedef struct edata_cache_fast_s edata_cache_fast_t;
struct edata_cache_fast_s {
edata_list_inactive_t list;
edata_cache_t *fallback;
bool disabled;
};
void edata_cache_fast_init(edata_cache_fast_t *ecs, edata_cache_t *fallback);
edata_t *edata_cache_fast_get(tsdn_t *tsdn, edata_cache_fast_t *ecs);
void edata_cache_fast_put(
tsdn_t *tsdn, edata_cache_fast_t *ecs, edata_t *edata);
void edata_cache_fast_disable(tsdn_t *tsdn, edata_cache_fast_t *ecs);
#endif /* JEMALLOC_INTERNAL_EDATA_CACHE_H */
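As a rough usage sketch of the cache API above (hedged: this is not code from the tree, metadata_cache_boot / metadata_grab / metadata_release are hypothetical wrapper names, and it assumes the jemalloc-internal build environment):

static edata_cache_t metadata_cache;

/* Returns true on error, following the convention of edata_cache_init. */
static bool
metadata_cache_boot(base_t *base) {
    return edata_cache_init(&metadata_cache, base);
}

static edata_t *
metadata_grab(tsdn_t *tsdn) {
    /* The returned edata_t holds garbage until the caller edata_init()s it. */
    return edata_cache_get(tsdn, &metadata_cache);
}

static void
metadata_release(tsdn_t *tsdn, edata_t *edata) {
    /* Recycles the edata_t for later edata_cache_get calls. */
    edata_cache_put(tsdn, &metadata_cache, edata);
}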

@@ -0,0 +1,414 @@
#ifndef JEMALLOC_INTERNAL_EHOOKS_H
#define JEMALLOC_INTERNAL_EHOOKS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/extent_mmap.h"
#include "jemalloc/internal/tsd.h"
#include "jemalloc/internal/tsd_types.h"
/*
* This module is the internal interface to the extent hooks (both
* user-specified and external). Eventually, this will give us the flexibility
* to use multiple different versions of user-visible extent-hook APIs under a
* single user interface.
*
* Current API expansions (not available to anyone but the default hooks yet):
* - Head state tracking. Hooks can decide whether or not to merge two
* extents based on whether or not one of them is the head (i.e. was
* allocated on its own). The later extent loses its "head" status.
*/
extern const extent_hooks_t ehooks_default_extent_hooks;
typedef struct ehooks_s ehooks_t;
struct ehooks_s {
/*
* The user-visible id that goes with the ehooks (i.e. that of the base
* they're a part of, the associated arena's index within the arenas
* array).
*/
unsigned ind;
/* Logically an extent_hooks_t *. */
atomic_p_t ptr;
};
extern const extent_hooks_t ehooks_default_extent_hooks;
/*
* These are not really part of the public API. Each hook has a fast-path for
* the default-hooks case that can avoid various small inefficiencies:
* - Forgetting tsd and then calling tsd_get within the hook.
* - Getting more state than necessary out of the extent_t.
* - Doing arena_ind -> arena -> arena_ind lookups.
* By making the calls to these functions visible to the compiler, it can move
* those extra bits of computation down below the fast-paths where they get ignored.
*/
void *ehooks_default_alloc_impl(tsdn_t *tsdn, void *new_addr, size_t size,
size_t alignment, bool *zero, bool *commit, unsigned arena_ind);
bool ehooks_default_dalloc_impl(void *addr, size_t size);
void ehooks_default_destroy_impl(void *addr, size_t size);
bool ehooks_default_commit_impl(void *addr, size_t offset, size_t length);
bool ehooks_default_decommit_impl(void *addr, size_t offset, size_t length);
#ifdef PAGES_CAN_PURGE_LAZY
bool ehooks_default_purge_lazy_impl(void *addr, size_t offset, size_t length);
#endif
#ifdef PAGES_CAN_PURGE_FORCED
bool ehooks_default_purge_forced_impl(void *addr, size_t offset, size_t length);
#endif
bool ehooks_default_split_impl(void);
/*
* Merge is the only default extent hook we declare -- see the comment in
* ehooks_merge.
*/
bool ehooks_default_merge(extent_hooks_t *extent_hooks, void *addr_a,
size_t size_a, void *addr_b, size_t size_b, bool committed,
unsigned arena_ind);
bool ehooks_default_merge_impl(tsdn_t *tsdn, void *addr_a, void *addr_b);
void ehooks_default_zero_impl(void *addr, size_t size);
void ehooks_default_guard_impl(void *guard1, void *guard2);
void ehooks_default_unguard_impl(void *guard1, void *guard2);
/*
* We don't officially support reentrancy from within the extent hooks. But
* various people who sit within throwing distance of the jemalloc team want
* that functionality in certain limited cases. The default reentrancy guards
* assert that we're not reentrant from a0 (since it's the bootstrap arena,
* where reentrant allocations would be redirected), which we would incorrectly
* trigger in cases where a0 has extent hooks (those hooks themselves can't be
* reentrant, then, but there are reasonable uses for such functionality, like
* putting internal metadata on hugepages). Therefore, we use the raw
* reentrancy guards.
*
* Eventually, we need to think more carefully about whether and where we
* support allocating from within extent hooks (and what that means for things
* like profiling, stats collection, etc.), and document what the guarantee is.
*/
static inline void
ehooks_pre_reentrancy(tsdn_t *tsdn) {
tsd_t *tsd = tsdn_null(tsdn) ? tsd_fetch() : tsdn_tsd(tsdn);
tsd_pre_reentrancy_raw(tsd);
}
static inline void
ehooks_post_reentrancy(tsdn_t *tsdn) {
tsd_t *tsd = tsdn_null(tsdn) ? tsd_fetch() : tsdn_tsd(tsdn);
tsd_post_reentrancy_raw(tsd);
}
/* Beginning of the public API. */
void ehooks_init(ehooks_t *ehooks, extent_hooks_t *extent_hooks, unsigned ind);
static inline unsigned
ehooks_ind_get(const ehooks_t *ehooks) {
return ehooks->ind;
}
static inline void
ehooks_set_extent_hooks_ptr(ehooks_t *ehooks, extent_hooks_t *extent_hooks) {
atomic_store_p(&ehooks->ptr, extent_hooks, ATOMIC_RELEASE);
}
static inline extent_hooks_t *
ehooks_get_extent_hooks_ptr(ehooks_t *ehooks) {
return (extent_hooks_t *)atomic_load_p(&ehooks->ptr, ATOMIC_ACQUIRE);
}
static inline bool
ehooks_are_default(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)
== &ehooks_default_extent_hooks;
}
/*
* In some cases, a caller needs to allocate resources before attempting to call
* a hook. If that hook is doomed to fail, this is wasteful. We therefore
* include some checks for such cases.
*/
static inline bool
ehooks_dalloc_will_fail(ehooks_t *ehooks) {
if (ehooks_are_default(ehooks)) {
return opt_retain;
} else {
return ehooks_get_extent_hooks_ptr(ehooks)->dalloc == NULL;
}
}
static inline bool
ehooks_split_will_fail(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)->split == NULL;
}
static inline bool
ehooks_merge_will_fail(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)->merge == NULL;
}
static inline bool
ehooks_guard_will_fail(ehooks_t *ehooks) {
/*
* Before the guard hooks are officially introduced, limit the use to
* the default hooks only.
*/
return !ehooks_are_default(ehooks);
}
/*
* Some hooks are required to return zeroed memory in certain situations. In
* debug mode, we do some heuristic checks that they did what they were supposed
* to.
*
* This isn't really ehooks-specific (i.e. anyone can check for zeroed memory).
* But incorrect zero information indicates an ehook bug.
*/
static inline void
ehooks_debug_zero_check(void *addr, size_t size) {
assert(((uintptr_t)addr & PAGE_MASK) == 0);
assert((size & PAGE_MASK) == 0);
assert(size > 0);
if (config_debug) {
/* Check the whole first page. */
size_t *p = (size_t *)addr;
for (size_t i = 0; i < PAGE / sizeof(size_t); i++) {
assert(p[i] == 0);
}
/*
* And 4 spots within. There's a tradeoff here; the larger
* this number, the more likely it is that we'll catch a bug
* where ehooks return a sparsely non-zero range. But
* increasing the number of checks also increases the number of
* page faults in debug mode. FreeBSD does much of their
* day-to-day development work in debug mode, so we don't want
* even the debug builds to be too slow.
*/
const size_t nchecks = 4;
assert(PAGE >= sizeof(size_t) * nchecks);
for (size_t i = 0; i < nchecks; ++i) {
assert(p[i * (size / sizeof(size_t) / nchecks)] == 0);
}
}
}
static inline void *
ehooks_alloc(tsdn_t *tsdn, ehooks_t *ehooks, void *new_addr, size_t size,
size_t alignment, bool *zero, bool *commit) {
bool orig_zero = *zero;
void *ret;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ret = ehooks_default_alloc_impl(tsdn, new_addr, size, alignment,
zero, commit, ehooks_ind_get(ehooks));
} else {
ehooks_pre_reentrancy(tsdn);
ret = extent_hooks->alloc(extent_hooks, new_addr, size,
alignment, zero, commit, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
assert(new_addr == NULL || ret == NULL || new_addr == ret);
assert(!orig_zero || *zero);
if (*zero && ret != NULL) {
ehooks_debug_zero_check(ret, size);
}
return ret;
}
static inline bool
ehooks_dalloc(
tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_dalloc_impl(addr, size);
} else if (extent_hooks->dalloc == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->dalloc(extent_hooks, addr, size,
committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline void
ehooks_destroy(
tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_destroy_impl(addr, size);
} else if (extent_hooks->destroy == NULL) {
/* Do nothing. */
} else {
ehooks_pre_reentrancy(tsdn);
extent_hooks->destroy(extent_hooks, addr, size, committed,
ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
}
static inline bool
ehooks_commit(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
bool err;
if (extent_hooks == &ehooks_default_extent_hooks) {
err = ehooks_default_commit_impl(addr, offset, length);
} else if (extent_hooks->commit == NULL) {
err = true;
} else {
ehooks_pre_reentrancy(tsdn);
err = extent_hooks->commit(extent_hooks, addr, size, offset,
length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
if (!err) {
ehooks_debug_zero_check(addr, size);
}
return err;
}
static inline bool
ehooks_decommit(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_decommit_impl(addr, offset, length);
} else if (extent_hooks->decommit == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->decommit(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_purge_lazy(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
#ifdef PAGES_CAN_PURGE_LAZY
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_purge_lazy_impl(addr, offset, length);
}
#endif
if (extent_hooks->purge_lazy == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->purge_lazy(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_purge_forced(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
/*
* It would be correct to have an ehooks_debug_zero_check call at the end
* of this function; purge_forced is required to zero. But checking
* would touch the page in question, which may have performance
* consequences (imagine the hooks are using hugepages, with a global
* zero page off). Even in debug mode, it's usually a good idea to
* avoid cases that can dramatically increase memory consumption.
*/
#ifdef PAGES_CAN_PURGE_FORCED
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_purge_forced_impl(addr, offset, length);
}
#endif
if (extent_hooks->purge_forced == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->purge_forced(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_split(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t size_a, size_t size_b, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (ehooks_are_default(ehooks)) {
return ehooks_default_split_impl();
} else if (extent_hooks->split == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->split(extent_hooks, addr, size, size_a,
size_b, committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_merge(tsdn_t *tsdn, ehooks_t *ehooks, void *addr_a, size_t size_a,
void *addr_b, size_t size_b, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_merge_impl(tsdn, addr_a, addr_b);
} else if (extent_hooks->merge == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->merge(extent_hooks, addr_a, size_a,
addr_b, size_b, committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline void
ehooks_zero(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_zero_impl(addr, size);
} else {
/*
* It would be correct to try using the user-provided purge
* hooks (since they are required to have zeroed the extent if
* they indicate success), but we don't necessarily know their
* cost. We'll be conservative and use memset.
*/
memset(addr, 0, size);
}
}
static inline bool
ehooks_guard(tsdn_t *tsdn, ehooks_t *ehooks, void *guard1, void *guard2) {
bool err;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_guard_impl(guard1, guard2);
err = false;
} else {
err = true;
}
return err;
}
static inline bool
ehooks_unguard(tsdn_t *tsdn, ehooks_t *ehooks, void *guard1, void *guard2) {
bool err;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_unguard_impl(guard1, guard2);
err = false;
} else {
err = true;
}
return err;
}
#endif /* JEMALLOC_INTERNAL_EHOOKS_H */
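A hedged caller-side sketch (not from the tree; maybe_dalloc is an illustrative name) of how the will-fail predicates above are meant to be used: check whether the hook can possibly succeed before doing work that only pays off if it does.

static void
maybe_dalloc(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata) {
    if (ehooks_dalloc_will_fail(ehooks)) {
        /*
         * With default hooks this is the opt_retain case; with user
         * hooks it means no dalloc hook was provided. Either way, the
         * extent should be retained rather than handed to the hook.
         */
        return;
    }
    bool err = ehooks_dalloc(tsdn, ehooks, edata_base_get(edata),
        edata_size_get(edata), edata_committed_get(edata));
    /* On err the hook declined, and the caller still owns the extent. */
    (void)err;
}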

@@ -0,0 +1,397 @@
#ifndef JEMALLOC_INTERNAL_EMAP_H
#define JEMALLOC_INTERNAL_EMAP_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/rtree.h"
/*
* Note: Ends without a semicolon, so that
* EMAP_DECLARE_RTREE_CTX;
* in uses will avoid empty-statement warnings.
*/
#define EMAP_DECLARE_RTREE_CTX \
rtree_ctx_t rtree_ctx_fallback; \
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback)
typedef struct emap_s emap_t;
struct emap_s {
rtree_t rtree;
};
/* Used to pass rtree lookup context down the path. */
typedef struct emap_alloc_ctx_s emap_alloc_ctx_t;
struct emap_alloc_ctx_s {
size_t usize;
szind_t szind;
bool slab;
};
typedef struct emap_full_alloc_ctx_s emap_full_alloc_ctx_t;
struct emap_full_alloc_ctx_s {
szind_t szind;
bool slab;
edata_t *edata;
};
bool emap_init(emap_t *emap, base_t *base, bool zeroed);
void emap_remap(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind, bool slab);
void emap_update_edata_state(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, extent_state_t state);
/*
* The two acquire functions below allow accessing neighbor edatas, if it's safe
* and valid to do so (i.e. from the same arena, of the same state, etc.). This
* is necessary because the ecache locks are state based, and only protect
* edatas with the same state. Therefore the neighbor edata's state needs to be
* verified first, before chasing the edata pointer. The returned edata will be
* in an acquired state, meaning other threads will be prevented from accessing
* it, even if technically the edata can still be discovered from the rtree.
*
* This means, at any moment when holding pointers to edata, either one of the
* state based locks is held (and the edatas are all of the protected state), or
* the edatas are in an acquired state (e.g. in active or merging state). The
* acquire operation itself (changing the edata to an acquired state) is done
* under the state locks.
*/
edata_t *emap_try_acquire_edata_neighbor(tsdn_t *tsdn, emap_t *emap,
edata_t *edata, extent_pai_t pai, extent_state_t expected_state,
bool forward);
edata_t *emap_try_acquire_edata_neighbor_expand(tsdn_t *tsdn, emap_t *emap,
edata_t *edata, extent_pai_t pai, extent_state_t expected_state);
void emap_release_edata(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, extent_state_t new_state);
/*
* Associate the given edata with its beginning and end address, setting the
* szind and slab info appropriately.
* Returns true on error (i.e. resource exhaustion).
*/
bool emap_register_boundary(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind, bool slab);
/*
* Does the same thing, but with the interior of the range, for slab
* allocations.
*
* You might wonder why we don't just have a single emap_register function that
* does both depending on the value of 'slab'. The answer is twofold:
* - As a practical matter, in places like the extract->split->commit pathway,
* we defer the interior operation until we're sure that the commit won't fail
* (but we have to register the split boundaries there).
* - In general, we're trying to move to a world where the page-specific
* allocator doesn't know as much about how the pages it allocates will be
* used, and passing a 'slab' parameter everywhere makes that more
* complicated.
*
* Unlike the boundary version, this function can't fail; this is because slabs
* can't get big enough to touch a new page that neither of the boundaries
* touched, so no allocation is necessary to fill the interior once the boundary
* has been touched.
*/
void emap_register_interior(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind);
void emap_deregister_boundary(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
void emap_deregister_interior(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
typedef struct emap_prepare_s emap_prepare_t;
struct emap_prepare_s {
rtree_leaf_elm_t *lead_elm_a;
rtree_leaf_elm_t *lead_elm_b;
rtree_leaf_elm_t *trail_elm_a;
rtree_leaf_elm_t *trail_elm_b;
};
/*
* These functions do the emap metadata management for merging, splitting, and
* reusing extents. In particular, they set the boundary mappings from
* addresses to edatas. If the result is going to be used as a slab, you
* still need to call emap_register_interior on it, though.
*
* Remap simply changes the szind and slab status of an extent's boundary
* mappings. If the extent is not a slab, it doesn't bother with updating the
* end mapping (since lookups only occur in the interior of an extent for
* slabs). Since the szind and slab status only make sense for active extents,
* this should only be called while activating or deactivating an extent.
*
* Split and merge have a "prepare" and a "commit" portion. The prepare portion
* does the operations that can be done without exclusive access to the extent
* in question, while the commit variant requires exclusive access to maintain
* the emap invariants. The only function that can fail is emap_split_prepare,
* and it returns true on failure (at which point the caller shouldn't commit).
*
* In all cases, "lead" refers to the lower-addressed extent, and trail to the
* higher-addressed one. It's the caller's responsibility to set the edata
* state appropriately.
*/
bool emap_split_prepare(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *edata, size_t size_a, edata_t *trail, size_t size_b);
void emap_split_commit(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, size_t size_a, edata_t *trail, size_t size_b);
void emap_merge_prepare(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, edata_t *trail);
void emap_merge_commit(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, edata_t *trail);
/* Assert that the emap's view of the given edata matches the edata's view. */
void emap_do_assert_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
static inline void
emap_assert_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (config_debug) {
emap_do_assert_mapped(tsdn, emap, edata);
}
}
/* Assert that the given edata isn't in the map. */
void emap_do_assert_not_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
static inline void
emap_assert_not_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (config_debug) {
emap_do_assert_not_mapped(tsdn, emap, edata);
}
}
JEMALLOC_ALWAYS_INLINE bool
emap_edata_in_transition(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
assert(config_debug);
emap_assert_mapped(tsdn, emap, edata);
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)edata_base_get(edata));
return edata_state_in_transition(contents.metadata.state);
}
JEMALLOC_ALWAYS_INLINE bool
emap_edata_is_acquired(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (!config_debug) {
/* For assertions only. */
return false;
}
/*
* The edata is considered acquired if no other threads will attempt to
* read / write any fields from it. This includes a few cases:
*
* 1) edata not hooked into emap yet -- This implies the edata just got
* allocated or initialized.
*
* 2) in an active or transition state -- In both cases, the edata can
* be discovered from the emap, however the state tracked in the rtree
* will prevent other threads from accessing the actual edata.
*/
EMAP_DECLARE_RTREE_CTX;
rtree_leaf_elm_t *elm = rtree_leaf_elm_lookup(tsdn, &emap->rtree,
rtree_ctx, (uintptr_t)edata_base_get(edata), /* dependent */ false,
/* init_missing */ false);
if (elm == NULL) {
return true;
}
rtree_contents_t contents = rtree_leaf_elm_read(tsdn, &emap->rtree, elm,
/* dependent */ false);
if (contents.edata == NULL
|| contents.metadata.state == extent_state_active
|| edata_state_in_transition(contents.metadata.state)) {
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
extent_assert_can_coalesce(const edata_t *inner, const edata_t *outer) {
assert(edata_arena_ind_get(inner) == edata_arena_ind_get(outer));
assert(edata_pai_get(inner) == edata_pai_get(outer));
assert(edata_committed_get(inner) == edata_committed_get(outer));
assert(edata_state_get(inner) == extent_state_active);
assert(edata_state_get(outer) == extent_state_merging);
assert(!edata_guarded_get(inner) && !edata_guarded_get(outer));
assert(edata_base_get(inner) == edata_past_get(outer)
|| edata_base_get(outer) == edata_past_get(inner));
}
JEMALLOC_ALWAYS_INLINE void
extent_assert_can_expand(const edata_t *original, const edata_t *expand) {
assert(edata_arena_ind_get(original) == edata_arena_ind_get(expand));
assert(edata_pai_get(original) == edata_pai_get(expand));
assert(edata_state_get(original) == extent_state_active);
assert(edata_state_get(expand) == extent_state_merging);
assert(edata_past_get(original) == edata_base_get(expand));
}
JEMALLOC_ALWAYS_INLINE edata_t *
emap_edata_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr) {
EMAP_DECLARE_RTREE_CTX;
return rtree_read(tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr).edata;
}
JEMALLOC_ALWAYS_INLINE void
emap_alloc_ctx_init(
emap_alloc_ctx_t *alloc_ctx, szind_t szind, bool slab, size_t usize) {
alloc_ctx->szind = szind;
alloc_ctx->slab = slab;
alloc_ctx->usize = usize;
assert(
sz_large_size_classes_disabled() || usize == sz_index2size(szind));
}
JEMALLOC_ALWAYS_INLINE size_t
emap_alloc_ctx_usize_get(emap_alloc_ctx_t *alloc_ctx) {
assert(alloc_ctx->szind < SC_NSIZES);
if (alloc_ctx->slab) {
assert(alloc_ctx->usize == sz_index2size(alloc_ctx->szind));
return sz_index2size(alloc_ctx->szind);
}
assert(sz_large_size_classes_disabled()
|| alloc_ctx->usize == sz_index2size(alloc_ctx->szind));
assert(alloc_ctx->usize <= SC_LARGE_MAXCLASS);
return alloc_ctx->usize;
}
/* Fills in alloc_ctx with the info in the map. */
JEMALLOC_ALWAYS_INLINE void
emap_alloc_ctx_lookup(
tsdn_t *tsdn, emap_t *emap, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr);
/*
* If the alloc is invalid, do not calculate usize since edata
* could be corrupted.
*/
emap_alloc_ctx_init(alloc_ctx, contents.metadata.szind,
contents.metadata.slab,
(contents.metadata.szind == SC_NSIZES || contents.edata == NULL)
? 0
: edata_usize_get(contents.edata));
}
/* The pointer must be mapped. */
JEMALLOC_ALWAYS_INLINE void
emap_full_alloc_ctx_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr,
emap_full_alloc_ctx_t *full_alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr);
full_alloc_ctx->edata = contents.edata;
full_alloc_ctx->szind = contents.metadata.szind;
full_alloc_ctx->slab = contents.metadata.slab;
}
/*
* The pointer is allowed to not be mapped.
*
* Returns true when the pointer is not present.
*/
JEMALLOC_ALWAYS_INLINE bool
emap_full_alloc_ctx_try_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr,
emap_full_alloc_ctx_t *full_alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents;
bool err = rtree_read_independent(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr, &contents);
if (err) {
return true;
}
full_alloc_ctx->edata = contents.edata;
full_alloc_ctx->szind = contents.metadata.szind;
full_alloc_ctx->slab = contents.metadata.slab;
return false;
}
/*
* Only used on the fastpath of free. Returns true when cannot be fulfilled by
* fast path, e.g. when the metadata key is not cached.
*/
JEMALLOC_ALWAYS_INLINE bool
emap_alloc_ctx_try_lookup_fast(
tsd_t *tsd, emap_t *emap, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
/* Use the unsafe getter since this may get called during exit. */
rtree_ctx_t *rtree_ctx = tsd_rtree_ctxp_get_unsafe(tsd);
rtree_metadata_t metadata;
bool err = rtree_metadata_try_read_fast(
tsd_tsdn(tsd), &emap->rtree, rtree_ctx, (uintptr_t)ptr, &metadata);
if (err) {
return true;
}
/*
* Small allocs using the fastpath can always use index to get the
* usize. Therefore, do not set alloc_ctx->usize here.
*/
alloc_ctx->szind = metadata.szind;
alloc_ctx->slab = metadata.slab;
if (config_debug) {
alloc_ctx->usize = SC_LARGE_MAXCLASS + 1;
}
return false;
}
/*
* We want to do batch lookups out of the cache bins, which use
* cache_bin_ptr_array_get to access the i'th element of the bin (since they
* invert the usual ordering in deciding what to flush). This lets the emap avoid
* caring about its caller's ordering.
*/
typedef const void *(*emap_ptr_getter)(void *ctx, size_t ind);
/*
* This allows size-checking assertions, which we can only do while we're in the
* process of edata lookups.
*/
typedef void (*emap_metadata_visitor)(
void *ctx, emap_full_alloc_ctx_t *alloc_ctx);
typedef union emap_batch_lookup_result_u emap_batch_lookup_result_t;
union emap_batch_lookup_result_u {
edata_t *edata;
rtree_leaf_elm_t *rtree_leaf;
};
JEMALLOC_ALWAYS_INLINE void
emap_edata_lookup_batch(tsd_t *tsd, emap_t *emap, size_t nptrs,
emap_ptr_getter ptr_getter, void *ptr_getter_ctx,
emap_metadata_visitor metadata_visitor, void *metadata_visitor_ctx,
emap_batch_lookup_result_t *result) {
/* Avoids null-checking tsdn in the loop below. */
util_assume(tsd != NULL);
rtree_ctx_t *rtree_ctx = tsd_rtree_ctxp_get(tsd);
for (size_t i = 0; i < nptrs; i++) {
const void *ptr = ptr_getter(ptr_getter_ctx, i);
/*
* Reuse the edatas array as a temp buffer, lying a little about
* the types.
*/
result[i].rtree_leaf = rtree_leaf_elm_lookup(tsd_tsdn(tsd),
&emap->rtree, rtree_ctx, (uintptr_t)ptr,
/* dependent */ true, /* init_missing */ false);
}
for (size_t i = 0; i < nptrs; i++) {
rtree_leaf_elm_t *elm = result[i].rtree_leaf;
rtree_contents_t contents = rtree_leaf_elm_read(
tsd_tsdn(tsd), &emap->rtree, elm, /* dependent */ true);
result[i].edata = contents.edata;
emap_full_alloc_ctx_t alloc_ctx;
/*
* Not all these fields are read in practice by the metadata
* visitor. But the compiler can easily optimize away the ones
* that aren't, so no sense in being incomplete.
*/
alloc_ctx.szind = contents.metadata.szind;
alloc_ctx.slab = contents.metadata.slab;
alloc_ctx.edata = contents.edata;
metadata_visitor(metadata_visitor_ctx, &alloc_ctx);
}
}
#endif /* JEMALLOC_INTERNAL_EMAP_H */
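A hedged sketch (not from the tree; ptrs_getter, size_checker, and lookup_batch are illustrative names) of how the batch-lookup callbacks above fit together: the getter lets the emap walk the caller's container without knowing its layout, and the visitor gives the caller a chance to assert on each pointer's metadata.

static const void *
ptrs_getter(void *ctx, size_t ind) {
    void **ptrs = (void **)ctx;
    return ptrs[ind];
}

static void
size_checker(void *ctx, emap_full_alloc_ctx_t *alloc_ctx) {
    (void)ctx;
    /* For example, insist that every pointer in the batch is a small alloc. */
    assert(alloc_ctx->szind < SC_NBINS);
}

static void
lookup_batch(tsd_t *tsd, emap_t *emap, void **ptrs, size_t nptrs,
    emap_batch_lookup_result_t *result) {
    emap_edata_lookup_batch(tsd, emap, nptrs, ptrs_getter, (void *)ptrs,
        size_checker, NULL, result);
    /* result[i].edata now holds the edata backing ptrs[i]. */
}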

@@ -0,0 +1,530 @@
#ifndef JEMALLOC_INTERNAL_EMITTER_H
#define JEMALLOC_INTERNAL_EMITTER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/ql.h"
typedef enum emitter_output_e emitter_output_t;
enum emitter_output_e {
emitter_output_json,
emitter_output_json_compact,
emitter_output_table
};
typedef enum emitter_justify_e emitter_justify_t;
enum emitter_justify_e {
emitter_justify_left,
emitter_justify_right,
/* Not for users; just to pass to internal functions. */
emitter_justify_none
};
typedef enum emitter_type_e emitter_type_t;
enum emitter_type_e {
emitter_type_bool,
emitter_type_int,
emitter_type_int64,
emitter_type_unsigned,
emitter_type_uint32,
emitter_type_uint64,
emitter_type_size,
emitter_type_ssize,
emitter_type_string,
/*
* A title is a column title in a table; it's just a string, but it's
* not quoted.
*/
emitter_type_title,
};
typedef struct emitter_col_s emitter_col_t;
struct emitter_col_s {
/* Filled in by the user. */
emitter_justify_t justify;
int width;
emitter_type_t type;
union {
bool bool_val;
int int_val;
unsigned unsigned_val;
uint32_t uint32_val;
uint32_t uint32_t_val;
uint64_t uint64_val;
uint64_t uint64_t_val;
size_t size_val;
ssize_t ssize_val;
const char *str_val;
};
/* Filled in by initialization. */
ql_elm(emitter_col_t) link;
};
typedef struct emitter_row_s emitter_row_t;
struct emitter_row_s {
ql_head(emitter_col_t) cols;
};
typedef struct emitter_s emitter_t;
struct emitter_s {
emitter_output_t output;
/* The output information. */
write_cb_t *write_cb;
void *cbopaque;
int nesting_depth;
/* True if we've already emitted a value at the given depth. */
bool item_at_depth;
/* True if we emitted a key and will emit corresponding value next. */
bool emitted_key;
};
static inline bool
emitter_outputs_json(emitter_t *emitter) {
return emitter->output == emitter_output_json
|| emitter->output == emitter_output_json_compact;
}
/* Internal convenience function. Write to the emitter the given string. */
JEMALLOC_FORMAT_PRINTF(2, 3)
static inline void
emitter_printf(emitter_t *emitter, const char *format, ...) {
va_list ap;
va_start(ap, format);
malloc_vcprintf(emitter->write_cb, emitter->cbopaque, format, ap);
va_end(ap);
}
static inline const char *
JEMALLOC_FORMAT_ARG(3) emitter_gen_fmt(char *out_fmt, size_t out_size,
const char *fmt_specifier, emitter_justify_t justify, int width) {
size_t written;
fmt_specifier++;
if (justify == emitter_justify_none) {
written = malloc_snprintf(
out_fmt, out_size, "%%%s", fmt_specifier);
} else if (justify == emitter_justify_left) {
written = malloc_snprintf(
out_fmt, out_size, "%%-%d%s", width, fmt_specifier);
} else {
written = malloc_snprintf(
out_fmt, out_size, "%%%d%s", width, fmt_specifier);
}
/* Only happens in case of bad format string, which *we* choose. */
assert(written < out_size);
return out_fmt;
}
static inline void
emitter_emit_str(emitter_t *emitter, emitter_justify_t justify, int width,
char *fmt, size_t fmt_size, const char *str) {
#define BUF_SIZE 256
char buf[BUF_SIZE];
size_t str_written = malloc_snprintf(buf, BUF_SIZE, "\"%s\"", str);
emitter_printf(
emitter, emitter_gen_fmt(fmt, fmt_size, "%s", justify, width), buf);
if (str_written < BUF_SIZE) {
return;
}
/*
 * There is no support for long string justification at the moment, as
 * we output them piecewise with multiple malloc_snprintf calls and
 * justification works correctly only within one call. Fortunately this
 * is not a big concern, as we don't use justification with long strings
 * right now.
 *
 * We emitted a leading quotation mark and a trailing '\0', hence the
 * need to exclude those extra characters from the str shift.
 */
str += BUF_SIZE - 2;
do {
str_written = malloc_snprintf(buf, BUF_SIZE, "%s\"", str);
str += str_written >= BUF_SIZE ? BUF_SIZE - 1 : str_written;
emitter_printf(emitter,
emitter_gen_fmt(fmt, fmt_size, "%s", justify, width), buf);
} while (str_written >= BUF_SIZE);
#undef BUF_SIZE
}
/*
* Internal. Emit the given value type in the relevant encoding (so that the
* bool true gets mapped to json "true", but the string "true" gets mapped to
* json "\"true\"", for instance.
*
* Width is ignored if justify is emitter_justify_none.
*/
static inline void
emitter_print_value(emitter_t *emitter, emitter_justify_t justify, int width,
emitter_type_t value_type, const void *value) {
#define FMT_SIZE 10
/*
* We dynamically generate a format string to emit, to let us use the
* snprintf machinery. This is kinda hacky, but gets the job done
* quickly without having to think about the various snprintf edge
* cases.
*/
char fmt[FMT_SIZE];
#define EMIT_SIMPLE(type, format) \
emitter_printf(emitter, \
emitter_gen_fmt(fmt, FMT_SIZE, format, justify, width), \
*(const type *)value);
switch (value_type) {
case emitter_type_bool:
emitter_printf(emitter,
emitter_gen_fmt(fmt, FMT_SIZE, "%s", justify, width),
*(const bool *)value ? "true" : "false");
break;
case emitter_type_int:
EMIT_SIMPLE(int, "%d")
break;
case emitter_type_int64:
EMIT_SIMPLE(int64_t, "%" FMTd64)
break;
case emitter_type_unsigned:
EMIT_SIMPLE(unsigned, "%u")
break;
case emitter_type_ssize:
EMIT_SIMPLE(ssize_t, "%zd")
break;
case emitter_type_size:
EMIT_SIMPLE(size_t, "%zu")
break;
case emitter_type_string:
emitter_emit_str(emitter, justify, width, fmt, FMT_SIZE,
*(const char *const *)value);
break;
case emitter_type_uint32:
EMIT_SIMPLE(uint32_t, "%" FMTu32)
break;
case emitter_type_uint64:
EMIT_SIMPLE(uint64_t, "%" FMTu64)
break;
case emitter_type_title:
EMIT_SIMPLE(char *const, "%s");
break;
default:
unreachable();
}
#undef FMT_SIZE
}
/* Internal functions. In json mode, tracks nesting state. */
static inline void
emitter_nest_inc(emitter_t *emitter) {
emitter->nesting_depth++;
emitter->item_at_depth = false;
}
static inline void
emitter_nest_dec(emitter_t *emitter) {
emitter->nesting_depth--;
emitter->item_at_depth = true;
}
static inline void
emitter_indent(emitter_t *emitter) {
int amount = emitter->nesting_depth;
const char *indent_str;
assert(emitter->output != emitter_output_json_compact);
if (emitter->output == emitter_output_json) {
indent_str = "\t";
} else {
amount *= 2;
indent_str = " ";
}
for (int i = 0; i < amount; i++) {
emitter_printf(emitter, "%s", indent_str);
}
}
static inline void
emitter_json_key_prefix(emitter_t *emitter) {
assert(emitter_outputs_json(emitter));
if (emitter->emitted_key) {
emitter->emitted_key = false;
return;
}
if (emitter->item_at_depth) {
emitter_printf(emitter, ",");
}
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
}
/******************************************************************************/
/* Public functions for emitter_t. */
static inline void
emitter_init(emitter_t *emitter, emitter_output_t emitter_output,
write_cb_t *write_cb, void *cbopaque) {
emitter->output = emitter_output;
emitter->write_cb = write_cb;
emitter->cbopaque = cbopaque;
emitter->item_at_depth = false;
emitter->emitted_key = false;
emitter->nesting_depth = 0;
}
/******************************************************************************/
/* JSON public API. */
/*
* Emits a key (e.g. as it appears in an object). The next json entity emitted will
* be the corresponding value.
*/
static inline void
emitter_json_key(emitter_t *emitter, const char *json_key) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "\"%s\":%s", json_key,
emitter->output == emitter_output_json_compact ? "" : " ");
emitter->emitted_key = true;
}
}
static inline void
emitter_json_value(
emitter_t *emitter, emitter_type_t value_type, const void *value) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_print_value(
emitter, emitter_justify_none, -1, value_type, value);
emitter->item_at_depth = true;
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_value. */
static inline void
emitter_json_kv(emitter_t *emitter, const char *json_key,
emitter_type_t value_type, const void *value) {
emitter_json_key(emitter, json_key);
emitter_json_value(emitter, value_type, value);
}
static inline void
emitter_json_array_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "[");
emitter_nest_inc(emitter);
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_array_begin. */
static inline void
emitter_json_array_kv_begin(emitter_t *emitter, const char *json_key) {
emitter_json_key(emitter, json_key);
emitter_json_array_begin(emitter);
}
static inline void
emitter_json_array_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth > 0);
emitter_nest_dec(emitter);
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
emitter_printf(emitter, "]");
}
}
static inline void
emitter_json_object_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "{");
emitter_nest_inc(emitter);
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_object_begin. */
static inline void
emitter_json_object_kv_begin(emitter_t *emitter, const char *json_key) {
emitter_json_key(emitter, json_key);
emitter_json_object_begin(emitter);
}
static inline void
emitter_json_object_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth > 0);
emitter_nest_dec(emitter);
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
emitter_printf(emitter, "}");
}
}
/******************************************************************************/
/* Table public API. */
static inline void
emitter_table_dict_begin(emitter_t *emitter, const char *table_key) {
if (emitter->output == emitter_output_table) {
emitter_indent(emitter);
emitter_printf(emitter, "%s\n", table_key);
emitter_nest_inc(emitter);
}
}
static inline void
emitter_table_dict_end(emitter_t *emitter) {
if (emitter->output == emitter_output_table) {
emitter_nest_dec(emitter);
}
}
static inline void
emitter_table_kv_note(emitter_t *emitter, const char *table_key,
emitter_type_t value_type, const void *value, const char *table_note_key,
emitter_type_t table_note_value_type, const void *table_note_value) {
if (emitter->output == emitter_output_table) {
emitter_indent(emitter);
emitter_printf(emitter, "%s: ", table_key);
emitter_print_value(
emitter, emitter_justify_none, -1, value_type, value);
if (table_note_key != NULL) {
emitter_printf(emitter, " (%s: ", table_note_key);
emitter_print_value(emitter, emitter_justify_none, -1,
table_note_value_type, table_note_value);
emitter_printf(emitter, ")");
}
emitter_printf(emitter, "\n");
}
emitter->item_at_depth = true;
}
static inline void
emitter_table_kv(emitter_t *emitter, const char *table_key,
emitter_type_t value_type, const void *value) {
emitter_table_kv_note(emitter, table_key, value_type, value, NULL,
emitter_type_bool, NULL);
}
/* Write to the emitter the given string, but only in table mode. */
JEMALLOC_FORMAT_PRINTF(2, 3)
static inline void
emitter_table_printf(emitter_t *emitter, const char *format, ...) {
if (emitter->output == emitter_output_table) {
va_list ap;
va_start(ap, format);
malloc_vcprintf(
emitter->write_cb, emitter->cbopaque, format, ap);
va_end(ap);
}
}
static inline void
emitter_table_row(emitter_t *emitter, emitter_row_t *row) {
if (emitter->output != emitter_output_table) {
return;
}
emitter_col_t *col;
ql_foreach (col, &row->cols, link) {
emitter_print_value(emitter, col->justify, col->width,
col->type, (const void *)&col->bool_val);
}
emitter_table_printf(emitter, "\n");
}
static inline void
emitter_row_init(emitter_row_t *row) {
ql_new(&row->cols);
}
static inline void
emitter_col_init(emitter_col_t *col, emitter_row_t *row) {
ql_elm_new(col, link);
ql_tail_insert(&row->cols, col, link);
}
/******************************************************************************/
/*
* Generalized public API. Emits using either JSON or table, according to
* settings in the emitter_t.
*/
/*
* Note emits a different kv pair as well, but only in table mode. Omits the
* note if table_note_key is NULL.
*/
static inline void
emitter_kv_note(emitter_t *emitter, const char *json_key, const char *table_key,
emitter_type_t value_type, const void *value, const char *table_note_key,
emitter_type_t table_note_value_type, const void *table_note_value) {
if (emitter_outputs_json(emitter)) {
emitter_json_key(emitter, json_key);
emitter_json_value(emitter, value_type, value);
} else {
emitter_table_kv_note(emitter, table_key, value_type, value,
table_note_key, table_note_value_type, table_note_value);
}
emitter->item_at_depth = true;
}
static inline void
emitter_kv(emitter_t *emitter, const char *json_key, const char *table_key,
emitter_type_t value_type, const void *value) {
emitter_kv_note(emitter, json_key, table_key, value_type, value, NULL,
emitter_type_bool, NULL);
}
static inline void
emitter_dict_begin(
emitter_t *emitter, const char *json_key, const char *table_header) {
if (emitter_outputs_json(emitter)) {
emitter_json_key(emitter, json_key);
emitter_json_object_begin(emitter);
} else {
emitter_table_dict_begin(emitter, table_header);
}
}
static inline void
emitter_dict_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_object_end(emitter);
} else {
emitter_table_dict_end(emitter);
}
}
static inline void
emitter_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth == 0);
emitter_printf(emitter, "{");
emitter_nest_inc(emitter);
} else {
/*
* This guarantees that we always call write_cb at least once.
* This is useful if some invariant is established by each call
* to write_cb, but doesn't hold initially: e.g., some buffer
* holds a null-terminated string.
*/
emitter_printf(emitter, "%s", "");
}
}
static inline void
emitter_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth == 1);
emitter_nest_dec(emitter);
emitter_printf(emitter, "%s",
emitter->output == emitter_output_json_compact ? "}"
: "\n}\n");
}
}
#endif /* JEMALLOC_INTERNAL_EMITTER_H */
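A hedged end-to-end sketch (not from the tree; write_to_err and emit_small_report are illustrative names, and the sketch assumes malloc_write from malloc_io.h): the same calls produce {"stats": {"allocated": 4096}} in JSON mode and an indented allocated: 4096 line under a Stats header in table mode.

static void
write_to_err(void *cbopaque, const char *s) {
    (void)cbopaque;
    malloc_write(s);
}

static void
emit_small_report(emitter_output_t output) {
    emitter_t emitter;
    size_t allocated = 4096;
    emitter_init(&emitter, output, write_to_err, NULL);
    emitter_begin(&emitter);
    emitter_dict_begin(&emitter, "stats", "Stats");
    emitter_kv(&emitter, "allocated", "allocated", emitter_type_size,
        &allocated);
    emitter_dict_end(&emitter);
    emitter_end(&emitter);
}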

@@ -0,0 +1,78 @@
#ifndef JEMALLOC_INTERNAL_ESET_H
#define JEMALLOC_INTERNAL_ESET_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/fb.h"
#include "jemalloc/internal/mutex.h"
/*
* An eset ("extent set") is a quantized collection of extents, with built-in
* LRU queue.
*
* This class is not thread-safe; synchronization must be done externally if
* there are mutating operations. One exception is the stats counters, which
* may be read without any locking.
*/
typedef struct eset_bin_s eset_bin_t;
struct eset_bin_s {
edata_heap_t heap;
/*
* We do first-fit across multiple size classes. If we compared against
* the min element in each heap directly, we'd take a cache miss per
* extent we looked at. If we co-locate the edata summaries, we only
* take a miss on the edata we're actually going to return (which is
* inevitable anyways).
*/
edata_cmp_summary_t heap_min;
};
typedef struct eset_bin_stats_s eset_bin_stats_t;
struct eset_bin_stats_s {
atomic_zu_t nextents;
atomic_zu_t nbytes;
};
typedef struct eset_s eset_t;
struct eset_s {
/* Bitmap for which set bits correspond to non-empty heaps. */
fb_group_t bitmap[FB_NGROUPS(SC_NPSIZES + 1)];
/* Quantized per size class heaps of extents. */
eset_bin_t bins[SC_NPSIZES + 1];
eset_bin_stats_t bin_stats[SC_NPSIZES + 1];
/* LRU of all extents in heaps. */
edata_list_inactive_t lru;
/* Page sum for all extents in heaps. */
atomic_zu_t npages;
/*
* A duplication of the data in the containing ecache. We use this only
* for assertions on the states of the passed-in extents.
*/
extent_state_t state;
};
void eset_init(eset_t *eset, extent_state_t state);
size_t eset_npages_get(eset_t *eset);
/* Get the number of extents in the given page size index. */
size_t eset_nextents_get(eset_t *eset, pszind_t ind);
/* Get the sum total bytes of the extents in the given page size index. */
size_t eset_nbytes_get(eset_t *eset, pszind_t ind);
void eset_insert(eset_t *eset, edata_t *edata);
void eset_remove(eset_t *eset, edata_t *edata);
/*
* Select an extent from this eset of the given size and alignment. Returns
* null if no such item could be found.
*/
edata_t *eset_fit(eset_t *eset, size_t esize, size_t alignment, bool exact_only,
unsigned lg_max_fit);
#endif /* JEMALLOC_INTERNAL_ESET_H */
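A hedged sketch (not from the tree; take_first_fit is an illustrative name, and the caller is assumed to hold whatever lock protects this eset) of the expected insert/fit/remove pattern; using SC_PTR_BITS as lg_max_fit is meant as an effectively unbounded fit ratio, which is an assumption rather than something this header specifies:

static edata_t *
take_first_fit(eset_t *eset, edata_t *incoming, size_t esize,
    size_t alignment) {
    eset_insert(eset, incoming);
    edata_t *hit = eset_fit(eset, esize, alignment, /* exact_only */ false,
        /* lg_max_fit */ SC_PTR_BITS);
    if (hit != NULL) {
        /* eset_fit only selects; the caller removes what it takes. */
        eset_remove(eset, hit);
    }
    return hit;
}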

@@ -0,0 +1,50 @@
#ifndef JEMALLOC_INTERNAL_EXP_GROW_H
#define JEMALLOC_INTERNAL_EXP_GROW_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sz.h"
typedef struct exp_grow_s exp_grow_t;
struct exp_grow_s {
/*
* Next extent size class in a growing series to use when satisfying a
* request via the extent hooks (only if opt_retain). This limits the
* number of disjoint virtual memory ranges so that extent merging can
* be effective even if multiple arenas' extent allocation requests are
* highly interleaved.
*
* retain_grow_limit is the max allowed size ind to expand (unless the
* required size is greater). Default is no limit, and controlled
* through mallctl only.
*/
pszind_t next;
pszind_t limit;
};
static inline bool
exp_grow_size_prepare(exp_grow_t *exp_grow, size_t alloc_size_min,
size_t *r_alloc_size, pszind_t *r_skip) {
*r_skip = 0;
*r_alloc_size = sz_pind2sz(exp_grow->next + *r_skip);
while (*r_alloc_size < alloc_size_min) {
(*r_skip)++;
if (exp_grow->next + *r_skip >= sz_psz2ind(SC_LARGE_MAXCLASS)) {
/* Outside legal range. */
return true;
}
*r_alloc_size = sz_pind2sz(exp_grow->next + *r_skip);
}
return false;
}
static inline void
exp_grow_size_commit(exp_grow_t *exp_grow, pszind_t skip) {
if (exp_grow->next + skip + 1 <= exp_grow->limit) {
exp_grow->next += skip + 1;
} else {
exp_grow->next = exp_grow->limit;
}
}
void exp_grow_init(exp_grow_t *exp_grow);
#endif /* JEMALLOC_INTERNAL_EXP_GROW_H */
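A hedged sketch of the prepare/commit pairing above (not from the tree; grow_retained_sketch and try_map_pages are hypothetical, the latter standing in for the real extent-hook allocation path): prepare picks the next size in the growth series large enough to cover the request, and commit advances the series only once the allocation has actually succeeded.

static void *
grow_retained_sketch(exp_grow_t *exp_grow, size_t min_size) {
    size_t alloc_size;
    pszind_t skip;
    if (exp_grow_size_prepare(exp_grow, min_size, &alloc_size, &skip)) {
        /* min_size cannot be satisfied by any legal page size class. */
        return NULL;
    }
    void *addr = try_map_pages(alloc_size); /* hypothetical allocation */
    if (addr == NULL) {
        /* No commit on failure, so the series does not advance. */
        return NULL;
    }
    exp_grow_size_commit(exp_grow, skip);
    return addr;
}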

@@ -0,0 +1,148 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_H
#define JEMALLOC_INTERNAL_EXTENT_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/ecache.h"
#include "jemalloc/internal/ehooks.h"
#include "jemalloc/internal/pac.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/rtree.h"
/*
* This module contains the page-level allocator. It chooses the addresses that
* allocations requested by other modules will inhabit, and updates the global
* metadata to reflect allocation/deallocation/purging decisions.
*/
/*
* When reusing (and splitting) an active extent, (1U << opt_lg_extent_max_active_fit)
* is the max ratio between the size of the active extent and the new extent.
*/
#define LG_EXTENT_MAX_ACTIVE_FIT_DEFAULT 6
extern size_t opt_lg_extent_max_active_fit;
#define PROCESS_MADVISE_MAX_BATCH_DEFAULT 0
extern size_t opt_process_madvise_max_batch;
#ifdef JEMALLOC_HAVE_PROCESS_MADVISE
/* The iovec array lives on the stack; limit the max batch to avoid stack overflow. */
# define PROCESS_MADVISE_MAX_BATCH_LIMIT \
(VARIABLE_ARRAY_SIZE_MAX / sizeof(struct iovec))
#else
# define PROCESS_MADVISE_MAX_BATCH_LIMIT 0
#endif
edata_t *ecache_alloc(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, edata_t *expand_edata, size_t size, size_t alignment,
bool zero, bool guarded);
edata_t *ecache_alloc_grow(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, edata_t *expand_edata, size_t size, size_t alignment,
bool zero, bool guarded);
void ecache_dalloc(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, ecache_t *ecache,
edata_t *edata);
edata_t *ecache_evict(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, size_t npages_min);
void extent_gdump_add(tsdn_t *tsdn, const edata_t *edata);
void extent_record(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, ecache_t *ecache,
edata_t *edata);
void extent_dalloc_gap(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
edata_t *extent_alloc_wrapper(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
void *new_addr, size_t size, size_t alignment, bool zero, bool *commit,
bool growing_retained);
void extent_dalloc_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
void extent_dalloc_wrapper_purged(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
void extent_destroy_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
bool extent_purge_lazy_wrapper(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
size_t offset, size_t length);
bool extent_purge_forced_wrapper(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
size_t offset, size_t length);
edata_t *extent_split_wrapper(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
edata_t *edata, size_t size_a, size_t size_b, bool holding_core_locks);
bool extent_merge_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *a, edata_t *b);
bool extent_commit_zero(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
bool commit, bool zero, bool growing_retained);
size_t extent_sn_next(pac_t *pac);
bool extent_boot(void);
JEMALLOC_ALWAYS_INLINE bool
extent_neighbor_head_state_mergeable(
bool edata_is_head, bool neighbor_is_head, bool forward) {
/*
* Head state checking: disallow merging if the higher-addr extent is a
* head extent. This helps preserve first-fit and, more importantly,
* ensures that extents are never merged across arenas.
*/
if (forward) {
if (neighbor_is_head) {
return false;
}
} else {
if (edata_is_head) {
return false;
}
}
return true;
}
JEMALLOC_ALWAYS_INLINE bool
extent_can_acquire_neighbor(edata_t *edata, rtree_contents_t contents,
extent_pai_t pai, extent_state_t expected_state, bool forward,
bool expanding) {
edata_t *neighbor = contents.edata;
if (neighbor == NULL) {
return false;
}
/* It's not safe to access *neighbor yet; must verify states first. */
bool neighbor_is_head = contents.metadata.is_head;
if (!extent_neighbor_head_state_mergeable(
edata_is_head_get(edata), neighbor_is_head, forward)) {
return false;
}
extent_state_t neighbor_state = contents.metadata.state;
if (pai == EXTENT_PAI_PAC) {
if (neighbor_state != expected_state) {
return false;
}
/* From this point, it's safe to access *neighbor. */
if (!expanding
&& (edata_committed_get(edata)
!= edata_committed_get(neighbor))) {
/*
* Some platforms (e.g. Windows) require an explicit
* commit step (and writing to uncommitted memory is not
* allowed).
*/
return false;
}
} else {
if (neighbor_state == extent_state_active) {
return false;
}
/* From this point, it's safe to access *neighbor. */
}
assert(edata_pai_get(edata) == pai);
if (edata_pai_get(neighbor) != pai) {
return false;
}
if (opt_retain) {
assert(edata_arena_ind_get(edata)
== edata_arena_ind_get(neighbor));
} else {
if (edata_arena_ind_get(edata)
!= edata_arena_ind_get(neighbor)) {
return false;
}
}
assert(!edata_guarded_get(edata) && !edata_guarded_get(neighbor));
return true;
}
#endif /* JEMALLOC_INTERNAL_EXTENT_H */
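/*
 * Illustrative sketch (not part of the header): the head-state rule above
 * restated as assertions. The higher-addressed extent of a candidate pair
 * must not be a head extent for the merge to be allowed.
 */
static void
extent_merge_rule_sketch(void) {
    /* Forward merge: rejected when the (higher-addr) neighbor is a head. */
    assert(!extent_neighbor_head_state_mergeable(/* edata_is_head */ false,
        /* neighbor_is_head */ true, /* forward */ true));
    /* Backward merge: rejected when edata itself (higher addr) is a head. */
    assert(!extent_neighbor_head_state_mergeable(/* edata_is_head */ true,
        /* neighbor_is_head */ false, /* forward */ false));
    /* Otherwise the head states permit merging. */
    assert(extent_neighbor_head_state_mergeable(/* edata_is_head */ false,
        /* neighbor_is_head */ false, /* forward */ true));
}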


@ -1,26 +1,30 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_DSS_H
#define JEMALLOC_INTERNAL_EXTENT_DSS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/tsd_types.h"
typedef enum {
dss_prec_disabled = 0,
dss_prec_primary = 1,
dss_prec_disabled = 0,
dss_prec_primary = 1,
dss_prec_secondary = 2,
dss_prec_limit = 3
dss_prec_limit = 3
} dss_prec_t;
#define DSS_PREC_DEFAULT dss_prec_secondary
#define DSS_DEFAULT "secondary"
extern const char *dss_prec_names[];
extern const char *const dss_prec_names[];
extern const char *opt_dss;
dss_prec_t extent_dss_prec_get(void);
bool extent_dss_prec_set(dss_prec_t dss_prec);
void *extent_alloc_dss(tsdn_t *tsdn, arena_t *arena, void *new_addr,
size_t size, size_t alignment, bool *zero, bool *commit);
bool extent_in_dss(void *addr);
bool extent_dss_mergeable(void *addr_a, void *addr_b);
void extent_dss_boot(void);
bool extent_dss_prec_set(dss_prec_t dss_prec);
void *extent_alloc_dss(tsdn_t *tsdn, arena_t *arena, void *new_addr,
size_t size, size_t alignment, bool *zero, bool *commit);
bool extent_in_dss(void *addr);
bool extent_dss_mergeable(void *addr_a, void *addr_b);
void extent_dss_boot(void);
#endif /* JEMALLOC_INTERNAL_EXTENT_DSS_H */
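/*
 * Illustrative sketch (not part of the header): switching the dss precedence
 * at runtime. extent_dss_prec_set follows the usual true-on-error convention
 * (e.g. when dss support is unavailable).
 */
static void
dss_prec_sketch(void) {
    if (!extent_dss_prec_set(dss_prec_primary)) {
        assert(extent_dss_prec_get() == dss_prec_primary);
    }
}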


@ -1,72 +0,0 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_EXTERNS_H
#define JEMALLOC_INTERNAL_EXTENT_EXTERNS_H
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/mutex_pool.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/rb.h"
#include "jemalloc/internal/rtree.h"
extern rtree_t extents_rtree;
extern const extent_hooks_t extent_hooks_default;
extern mutex_pool_t extent_mutex_pool;
extent_t *extent_alloc(tsdn_t *tsdn, arena_t *arena);
void extent_dalloc(tsdn_t *tsdn, arena_t *arena, extent_t *extent);
extent_hooks_t *extent_hooks_get(arena_t *arena);
extent_hooks_t *extent_hooks_set(tsd_t *tsd, arena_t *arena,
extent_hooks_t *extent_hooks);
#ifdef JEMALLOC_JET
size_t extent_size_quantize_floor(size_t size);
size_t extent_size_quantize_ceil(size_t size);
#endif
rb_proto(, extent_avail_, extent_tree_t, extent_t)
ph_proto(, extent_heap_, extent_heap_t, extent_t)
bool extents_init(tsdn_t *tsdn, extents_t *extents, extent_state_t state,
bool delay_coalesce);
extent_state_t extents_state_get(const extents_t *extents);
size_t extents_npages_get(extents_t *extents);
extent_t *extents_alloc(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extents_t *extents, void *new_addr,
size_t size, size_t pad, size_t alignment, bool slab, szind_t szind,
bool *zero, bool *commit);
void extents_dalloc(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extents_t *extents, extent_t *extent);
extent_t *extents_evict(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extents_t *extents, size_t npages_min);
void extents_prefork(tsdn_t *tsdn, extents_t *extents);
void extents_postfork_parent(tsdn_t *tsdn, extents_t *extents);
void extents_postfork_child(tsdn_t *tsdn, extents_t *extents);
extent_t *extent_alloc_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, void *new_addr, size_t size, size_t pad,
size_t alignment, bool slab, szind_t szind, bool *zero, bool *commit);
void extent_dalloc_gap(tsdn_t *tsdn, arena_t *arena, extent_t *extent);
void extent_dalloc_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent);
void extent_destroy_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent);
bool extent_commit_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent, size_t offset,
size_t length);
bool extent_decommit_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent, size_t offset,
size_t length);
bool extent_purge_lazy_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent, size_t offset,
size_t length);
bool extent_purge_forced_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent, size_t offset,
size_t length);
extent_t *extent_split_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *extent, size_t size_a,
szind_t szind_a, bool slab_a, size_t size_b, szind_t szind_b, bool slab_b);
bool extent_merge_wrapper(tsdn_t *tsdn, arena_t *arena,
extent_hooks_t **r_extent_hooks, extent_t *a, extent_t *b);
bool extent_boot(void);
#endif /* JEMALLOC_INTERNAL_EXTENT_EXTERNS_H */


@ -1,407 +0,0 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_INLINES_H
#define JEMALLOC_INTERNAL_EXTENT_INLINES_H
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/mutex_pool.h"
#include "jemalloc/internal/pages.h"
#include "jemalloc/internal/prng.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sz.h"
static inline void
extent_lock(tsdn_t *tsdn, extent_t *extent) {
assert(extent != NULL);
mutex_pool_lock(tsdn, &extent_mutex_pool, (uintptr_t)extent);
}
static inline void
extent_unlock(tsdn_t *tsdn, extent_t *extent) {
assert(extent != NULL);
mutex_pool_unlock(tsdn, &extent_mutex_pool, (uintptr_t)extent);
}
static inline void
extent_lock2(tsdn_t *tsdn, extent_t *extent1, extent_t *extent2) {
assert(extent1 != NULL && extent2 != NULL);
mutex_pool_lock2(tsdn, &extent_mutex_pool, (uintptr_t)extent1,
(uintptr_t)extent2);
}
static inline void
extent_unlock2(tsdn_t *tsdn, extent_t *extent1, extent_t *extent2) {
assert(extent1 != NULL && extent2 != NULL);
mutex_pool_unlock2(tsdn, &extent_mutex_pool, (uintptr_t)extent1,
(uintptr_t)extent2);
}
static inline arena_t *
extent_arena_get(const extent_t *extent) {
unsigned arena_ind = (unsigned)((extent->e_bits &
EXTENT_BITS_ARENA_MASK) >> EXTENT_BITS_ARENA_SHIFT);
/*
* The following check is omitted because we should never actually read
* a NULL arena pointer.
*/
if (false && arena_ind >= MALLOCX_ARENA_LIMIT) {
return NULL;
}
assert(arena_ind < MALLOCX_ARENA_LIMIT);
return (arena_t *)atomic_load_p(&arenas[arena_ind], ATOMIC_ACQUIRE);
}
static inline szind_t
extent_szind_get_maybe_invalid(const extent_t *extent) {
szind_t szind = (szind_t)((extent->e_bits & EXTENT_BITS_SZIND_MASK) >>
EXTENT_BITS_SZIND_SHIFT);
assert(szind <= NSIZES);
return szind;
}
static inline szind_t
extent_szind_get(const extent_t *extent) {
szind_t szind = extent_szind_get_maybe_invalid(extent);
assert(szind < NSIZES); /* Never call when "invalid". */
return szind;
}
static inline size_t
extent_usize_get(const extent_t *extent) {
return sz_index2size(extent_szind_get(extent));
}
static inline size_t
extent_sn_get(const extent_t *extent) {
return (size_t)((extent->e_bits & EXTENT_BITS_SN_MASK) >>
EXTENT_BITS_SN_SHIFT);
}
static inline extent_state_t
extent_state_get(const extent_t *extent) {
return (extent_state_t)((extent->e_bits & EXTENT_BITS_STATE_MASK) >>
EXTENT_BITS_STATE_SHIFT);
}
static inline bool
extent_zeroed_get(const extent_t *extent) {
return (bool)((extent->e_bits & EXTENT_BITS_ZEROED_MASK) >>
EXTENT_BITS_ZEROED_SHIFT);
}
static inline bool
extent_committed_get(const extent_t *extent) {
return (bool)((extent->e_bits & EXTENT_BITS_COMMITTED_MASK) >>
EXTENT_BITS_COMMITTED_SHIFT);
}
static inline bool
extent_slab_get(const extent_t *extent) {
return (bool)((extent->e_bits & EXTENT_BITS_SLAB_MASK) >>
EXTENT_BITS_SLAB_SHIFT);
}
static inline unsigned
extent_nfree_get(const extent_t *extent) {
assert(extent_slab_get(extent));
return (unsigned)((extent->e_bits & EXTENT_BITS_NFREE_MASK) >>
EXTENT_BITS_NFREE_SHIFT);
}
static inline void *
extent_base_get(const extent_t *extent) {
assert(extent->e_addr == PAGE_ADDR2BASE(extent->e_addr) ||
!extent_slab_get(extent));
return PAGE_ADDR2BASE(extent->e_addr);
}
static inline void *
extent_addr_get(const extent_t *extent) {
assert(extent->e_addr == PAGE_ADDR2BASE(extent->e_addr) ||
!extent_slab_get(extent));
return extent->e_addr;
}
static inline size_t
extent_size_get(const extent_t *extent) {
return (extent->e_size_esn & EXTENT_SIZE_MASK);
}
static inline size_t
extent_esn_get(const extent_t *extent) {
return (extent->e_size_esn & EXTENT_ESN_MASK);
}
static inline size_t
extent_bsize_get(const extent_t *extent) {
return extent->e_bsize;
}
static inline void *
extent_before_get(const extent_t *extent) {
return (void *)((uintptr_t)extent_base_get(extent) - PAGE);
}
static inline void *
extent_last_get(const extent_t *extent) {
return (void *)((uintptr_t)extent_base_get(extent) +
extent_size_get(extent) - PAGE);
}
static inline void *
extent_past_get(const extent_t *extent) {
return (void *)((uintptr_t)extent_base_get(extent) +
extent_size_get(extent));
}
static inline arena_slab_data_t *
extent_slab_data_get(extent_t *extent) {
assert(extent_slab_get(extent));
return &extent->e_slab_data;
}
static inline const arena_slab_data_t *
extent_slab_data_get_const(const extent_t *extent) {
assert(extent_slab_get(extent));
return &extent->e_slab_data;
}
static inline prof_tctx_t *
extent_prof_tctx_get(const extent_t *extent) {
return (prof_tctx_t *)atomic_load_p(&extent->e_prof_tctx,
ATOMIC_ACQUIRE);
}
static inline void
extent_arena_set(extent_t *extent, arena_t *arena) {
unsigned arena_ind = (arena != NULL) ? arena_ind_get(arena) : ((1U <<
MALLOCX_ARENA_BITS) - 1);
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_ARENA_MASK) |
((uint64_t)arena_ind << EXTENT_BITS_ARENA_SHIFT);
}
static inline void
extent_addr_set(extent_t *extent, void *addr) {
extent->e_addr = addr;
}
static inline void
extent_addr_randomize(tsdn_t *tsdn, extent_t *extent, size_t alignment) {
assert(extent_base_get(extent) == extent_addr_get(extent));
if (alignment < PAGE) {
unsigned lg_range = LG_PAGE -
lg_floor(CACHELINE_CEILING(alignment));
size_t r =
prng_lg_range_zu(&extent_arena_get(extent)->offset_state,
lg_range, true);
uintptr_t random_offset = ((uintptr_t)r) << (LG_PAGE -
lg_range);
extent->e_addr = (void *)((uintptr_t)extent->e_addr +
random_offset);
assert(ALIGNMENT_ADDR2BASE(extent->e_addr, alignment) ==
extent->e_addr);
}
}
static inline void
extent_size_set(extent_t *extent, size_t size) {
assert((size & ~EXTENT_SIZE_MASK) == 0);
extent->e_size_esn = size | (extent->e_size_esn & ~EXTENT_SIZE_MASK);
}
static inline void
extent_esn_set(extent_t *extent, size_t esn) {
extent->e_size_esn = (extent->e_size_esn & ~EXTENT_ESN_MASK) | (esn &
EXTENT_ESN_MASK);
}
static inline void
extent_bsize_set(extent_t *extent, size_t bsize) {
extent->e_bsize = bsize;
}
static inline void
extent_szind_set(extent_t *extent, szind_t szind) {
assert(szind <= NSIZES); /* NSIZES means "invalid". */
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_SZIND_MASK) |
((uint64_t)szind << EXTENT_BITS_SZIND_SHIFT);
}
static inline void
extent_nfree_set(extent_t *extent, unsigned nfree) {
assert(extent_slab_get(extent));
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_NFREE_MASK) |
((uint64_t)nfree << EXTENT_BITS_NFREE_SHIFT);
}
static inline void
extent_nfree_inc(extent_t *extent) {
assert(extent_slab_get(extent));
extent->e_bits += ((uint64_t)1U << EXTENT_BITS_NFREE_SHIFT);
}
static inline void
extent_nfree_dec(extent_t *extent) {
assert(extent_slab_get(extent));
extent->e_bits -= ((uint64_t)1U << EXTENT_BITS_NFREE_SHIFT);
}
static inline void
extent_sn_set(extent_t *extent, size_t sn) {
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_SN_MASK) |
((uint64_t)sn << EXTENT_BITS_SN_SHIFT);
}
static inline void
extent_state_set(extent_t *extent, extent_state_t state) {
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_STATE_MASK) |
((uint64_t)state << EXTENT_BITS_STATE_SHIFT);
}
static inline void
extent_zeroed_set(extent_t *extent, bool zeroed) {
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_ZEROED_MASK) |
((uint64_t)zeroed << EXTENT_BITS_ZEROED_SHIFT);
}
static inline void
extent_committed_set(extent_t *extent, bool committed) {
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_COMMITTED_MASK) |
((uint64_t)committed << EXTENT_BITS_COMMITTED_SHIFT);
}
static inline void
extent_slab_set(extent_t *extent, bool slab) {
extent->e_bits = (extent->e_bits & ~EXTENT_BITS_SLAB_MASK) |
((uint64_t)slab << EXTENT_BITS_SLAB_SHIFT);
}
static inline void
extent_prof_tctx_set(extent_t *extent, prof_tctx_t *tctx) {
atomic_store_p(&extent->e_prof_tctx, tctx, ATOMIC_RELEASE);
}
static inline void
extent_init(extent_t *extent, arena_t *arena, void *addr, size_t size,
bool slab, szind_t szind, size_t sn, extent_state_t state, bool zeroed,
bool committed) {
assert(addr == PAGE_ADDR2BASE(addr) || !slab);
extent_arena_set(extent, arena);
extent_addr_set(extent, addr);
extent_size_set(extent, size);
extent_slab_set(extent, slab);
extent_szind_set(extent, szind);
extent_sn_set(extent, sn);
extent_state_set(extent, state);
extent_zeroed_set(extent, zeroed);
extent_committed_set(extent, committed);
ql_elm_new(extent, ql_link);
if (config_prof) {
extent_prof_tctx_set(extent, NULL);
}
}
static inline void
extent_binit(extent_t *extent, void *addr, size_t bsize, size_t sn) {
extent_arena_set(extent, NULL);
extent_addr_set(extent, addr);
extent_bsize_set(extent, bsize);
extent_slab_set(extent, false);
extent_szind_set(extent, NSIZES);
extent_sn_set(extent, sn);
extent_state_set(extent, extent_state_active);
extent_zeroed_set(extent, true);
extent_committed_set(extent, true);
}
static inline void
extent_list_init(extent_list_t *list) {
ql_new(list);
}
static inline extent_t *
extent_list_first(const extent_list_t *list) {
return ql_first(list);
}
static inline extent_t *
extent_list_last(const extent_list_t *list) {
return ql_last(list, ql_link);
}
static inline void
extent_list_append(extent_list_t *list, extent_t *extent) {
ql_tail_insert(list, extent, ql_link);
}
static inline void
extent_list_replace(extent_list_t *list, extent_t *to_remove,
extent_t *to_insert) {
ql_after_insert(to_remove, to_insert, ql_link);
ql_remove(list, to_remove, ql_link);
}
static inline void
extent_list_remove(extent_list_t *list, extent_t *extent) {
ql_remove(list, extent, ql_link);
}
static inline int
extent_sn_comp(const extent_t *a, const extent_t *b) {
size_t a_sn = extent_sn_get(a);
size_t b_sn = extent_sn_get(b);
return (a_sn > b_sn) - (a_sn < b_sn);
}
static inline int
extent_esn_comp(const extent_t *a, const extent_t *b) {
size_t a_esn = extent_esn_get(a);
size_t b_esn = extent_esn_get(b);
return (a_esn > b_esn) - (a_esn < b_esn);
}
static inline int
extent_ad_comp(const extent_t *a, const extent_t *b) {
uintptr_t a_addr = (uintptr_t)extent_addr_get(a);
uintptr_t b_addr = (uintptr_t)extent_addr_get(b);
return (a_addr > b_addr) - (a_addr < b_addr);
}
static inline int
extent_ead_comp(const extent_t *a, const extent_t *b) {
uintptr_t a_eaddr = (uintptr_t)a;
uintptr_t b_eaddr = (uintptr_t)b;
return (a_eaddr > b_eaddr) - (a_eaddr < b_eaddr);
}
static inline int
extent_snad_comp(const extent_t *a, const extent_t *b) {
int ret;
ret = extent_sn_comp(a, b);
if (ret != 0) {
return ret;
}
ret = extent_ad_comp(a, b);
return ret;
}
static inline int
extent_esnead_comp(const extent_t *a, const extent_t *b) {
int ret;
ret = extent_esn_comp(a, b);
if (ret != 0) {
return ret;
}
ret = extent_ead_comp(a, b);
return ret;
}
#endif /* JEMALLOC_INTERNAL_EXTENT_INLINES_H */


@ -1,10 +1,12 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H
#define JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H
#include "jemalloc/internal/jemalloc_preamble.h"
extern bool opt_retain;
void *extent_alloc_mmap(void *new_addr, size_t size, size_t alignment,
bool *zero, bool *commit);
void *extent_alloc_mmap(
void *new_addr, size_t size, size_t alignment, bool *zero, bool *commit);
bool extent_dalloc_mmap(void *addr, size_t size);
#endif /* JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H */


@ -1,199 +0,0 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_STRUCTS_H
#define JEMALLOC_INTERNAL_EXTENT_STRUCTS_H
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/rb.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/size_classes.h"
typedef enum {
extent_state_active = 0,
extent_state_dirty = 1,
extent_state_muzzy = 2,
extent_state_retained = 3
} extent_state_t;
/* Extent (span of pages). Use accessor functions for e_* fields. */
struct extent_s {
/*
* Bitfield containing several fields:
*
* a: arena_ind
* b: slab
* c: committed
* z: zeroed
* t: state
* i: szind
* f: nfree
* n: sn
*
* nnnnnnnn ... nnnnnfff fffffffi iiiiiiit tzcbaaaa aaaaaaaa
*
* arena_ind: Arena from which this extent came, or all 1 bits if
* unassociated.
*
* slab: The slab flag indicates whether the extent is used for a slab
* of small regions. This helps differentiate small size classes,
* and it indicates whether interior pointers can be looked up via
* iealloc().
*
* committed: The committed flag indicates whether physical memory is
* committed to the extent, whether explicitly or implicitly
* as on a system that overcommits and satisfies physical
* memory needs on demand via soft page faults.
*
* zeroed: The zeroed flag is used by extent recycling code to track
* whether memory is zero-filled.
*
* state: The state flag is an extent_state_t.
*
* szind: The szind flag indicates usable size class index for
* allocations residing in this extent, regardless of whether the
* extent is a slab. Extent size and usable size often differ
* even for non-slabs, either due to sz_large_pad or promotion of
* sampled small regions.
*
* nfree: Number of free regions in slab.
*
* sn: Serial number (potentially non-unique).
*
* Serial numbers may wrap around if !opt_retain, but as long as
* comparison functions fall back on address comparison for equal
* serial numbers, stable (if imperfect) ordering is maintained.
*
* Serial numbers may not be unique even in the absence of
* wrap-around, e.g. when splitting an extent and assigning the same
* serial number to both resulting adjacent extents.
*/
uint64_t e_bits;
#define EXTENT_BITS_ARENA_SHIFT 0
#define EXTENT_BITS_ARENA_MASK \
(((uint64_t)(1U << MALLOCX_ARENA_BITS) - 1) << EXTENT_BITS_ARENA_SHIFT)
#define EXTENT_BITS_SLAB_SHIFT MALLOCX_ARENA_BITS
#define EXTENT_BITS_SLAB_MASK \
((uint64_t)0x1U << EXTENT_BITS_SLAB_SHIFT)
#define EXTENT_BITS_COMMITTED_SHIFT (MALLOCX_ARENA_BITS + 1)
#define EXTENT_BITS_COMMITTED_MASK \
((uint64_t)0x1U << EXTENT_BITS_COMMITTED_SHIFT)
#define EXTENT_BITS_ZEROED_SHIFT (MALLOCX_ARENA_BITS + 2)
#define EXTENT_BITS_ZEROED_MASK \
((uint64_t)0x1U << EXTENT_BITS_ZEROED_SHIFT)
#define EXTENT_BITS_STATE_SHIFT (MALLOCX_ARENA_BITS + 3)
#define EXTENT_BITS_STATE_MASK \
((uint64_t)0x3U << EXTENT_BITS_STATE_SHIFT)
#define EXTENT_BITS_SZIND_SHIFT (MALLOCX_ARENA_BITS + 5)
#define EXTENT_BITS_SZIND_MASK \
(((uint64_t)(1U << LG_CEIL_NSIZES) - 1) << EXTENT_BITS_SZIND_SHIFT)
#define EXTENT_BITS_NFREE_SHIFT \
(MALLOCX_ARENA_BITS + 5 + LG_CEIL_NSIZES)
#define EXTENT_BITS_NFREE_MASK \
((uint64_t)((1U << (LG_SLAB_MAXREGS + 1)) - 1) << EXTENT_BITS_NFREE_SHIFT)
#define EXTENT_BITS_SN_SHIFT \
(MALLOCX_ARENA_BITS + 5 + LG_CEIL_NSIZES + (LG_SLAB_MAXREGS + 1))
#define EXTENT_BITS_SN_MASK (UINT64_MAX << EXTENT_BITS_SN_SHIFT)
/* Pointer to the extent that this structure is responsible for. */
void *e_addr;
union {
/*
* Extent size and serial number associated with the extent
* structure (different than the serial number for the extent at
* e_addr).
*
* ssssssss [...] ssssssss ssssnnnn nnnnnnnn
*/
size_t e_size_esn;
#define EXTENT_SIZE_MASK ((size_t)~(PAGE-1))
#define EXTENT_ESN_MASK ((size_t)PAGE-1)
/* Base extent size, which may not be a multiple of PAGE. */
size_t e_bsize;
};
union {
/*
* List linkage, used by a variety of lists:
* - arena_bin_t's slabs_full
* - extents_t's LRU
* - stashed dirty extents
* - arena's large allocations
*/
ql_elm(extent_t) ql_link;
/* Red-black tree linkage, used by arena's extent_avail. */
rb_node(extent_t) rb_link;
};
/* Linkage for per size class sn/address-ordered heaps. */
phn(extent_t) ph_link;
union {
/* Small region slab metadata. */
arena_slab_data_t e_slab_data;
/*
* Profile counters, used for large objects. Points to a
* prof_tctx_t.
*/
atomic_p_t e_prof_tctx;
};
};
typedef ql_head(extent_t) extent_list_t;
typedef rb_tree(extent_t) extent_tree_t;
typedef ph(extent_t) extent_heap_t;
/* Quantized collection of extents, with built-in LRU queue. */
struct extents_s {
malloc_mutex_t mtx;
/*
* Quantized per size class heaps of extents.
*
* Synchronization: mtx.
*/
extent_heap_t heaps[NPSIZES+1];
/*
* Bitmap for which set bits correspond to non-empty heaps.
*
* Synchronization: mtx.
*/
bitmap_t bitmap[BITMAP_GROUPS(NPSIZES+1)];
/*
* LRU of all extents in heaps.
*
* Synchronization: mtx.
*/
extent_list_t lru;
/*
* Page sum for all extents in heaps.
*
* The synchronization here is a little tricky. Modifications to npages
* must hold mtx, but reads need not (though, a reader who sees npages
* without holding the mutex can't assume anything about the rest of the
* state of the extents_t).
*/
atomic_zu_t npages;
/* All stored extents must be in the same state. */
extent_state_t state;
/*
* If true, delay coalescing until eviction; otherwise coalesce during
* deallocation.
*/
bool delay_coalesce;
};
#endif /* JEMALLOC_INTERNAL_EXTENT_STRUCTS_H */


@ -1,9 +0,0 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_TYPES_H
#define JEMALLOC_INTERNAL_EXTENT_TYPES_H
typedef struct extent_s extent_t;
typedef struct extents_s extents_t;
#define EXTENT_HOOKS_INITIALIZER NULL
#endif /* JEMALLOC_INTERNAL_EXTENT_TYPES_H */


@ -0,0 +1,378 @@
#ifndef JEMALLOC_INTERNAL_FB_H
#define JEMALLOC_INTERNAL_FB_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#include "jemalloc/internal/bit_util.h"
/*
* The flat bitmap module. This has a larger API relative to the bitmap module
* (supporting things like backwards searches, and searching for both set and
* unset bits), at the cost of slower operations for very large bitmaps.
*
* Initialized flat bitmaps start at all-zeros (all bits unset).
*/
typedef unsigned long fb_group_t;
#define FB_GROUP_BITS (ZU(1) << (LG_SIZEOF_LONG + 3))
#define FB_NGROUPS(nbits) \
((nbits) / FB_GROUP_BITS + ((nbits) % FB_GROUP_BITS == 0 ? 0 : 1))
static inline void
fb_init(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
memset(fb, 0, ngroups * sizeof(fb_group_t));
}
static inline bool
fb_empty(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
if (fb[i] != 0) {
return false;
}
}
return true;
}
static inline bool
fb_full(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
size_t trailing_bits = nbits % FB_GROUP_BITS;
size_t limit = (trailing_bits == 0 ? ngroups : ngroups - 1);
for (size_t i = 0; i < limit; i++) {
if (fb[i] != ~(fb_group_t)0) {
return false;
}
}
if (trailing_bits == 0) {
return true;
}
return fb[ngroups - 1] == ((fb_group_t)1 << trailing_bits) - 1;
}
static inline bool
fb_get(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
return (bool)(fb[group_ind] & ((fb_group_t)1 << bit_ind));
}
static inline void
fb_set(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
fb[group_ind] |= ((fb_group_t)1 << bit_ind);
}
static inline void
fb_unset(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
fb[group_ind] &= ~((fb_group_t)1 << bit_ind);
}
/*
* Some implementation details. This visitation function lets us apply a group
* visitor to each group in the bitmap (potentially modifying it). The mask
* indicates which bits are logically part of the visitation.
*/
typedef void (*fb_group_visitor_t)(void *ctx, fb_group_t *fb, fb_group_t mask);
JEMALLOC_ALWAYS_INLINE void
fb_visit_impl(fb_group_t *fb, size_t nbits, fb_group_visitor_t visit, void *ctx,
size_t start, size_t cnt) {
assert(cnt > 0);
assert(start + cnt <= nbits);
size_t group_ind = start / FB_GROUP_BITS;
size_t start_bit_ind = start % FB_GROUP_BITS;
/*
* The first group is special; it's the only one we don't start writing
* to from bit 0.
*/
size_t first_group_cnt = (start_bit_ind + cnt > FB_GROUP_BITS
? FB_GROUP_BITS - start_bit_ind
: cnt);
/*
* We can basically split affected words into:
* - The first group, where we touch only the high bits
* - The last group, where we touch only the low bits
* - The middle, where we set all the bits to the same thing.
* We treat each case individually. The last two could be merged, but
* this can lead to bad codegen for those middle words.
*/
/* First group */
fb_group_t mask =
((~(fb_group_t)0) >> (FB_GROUP_BITS - first_group_cnt))
<< start_bit_ind;
visit(ctx, &fb[group_ind], mask);
cnt -= first_group_cnt;
group_ind++;
/* Middle groups */
while (cnt > FB_GROUP_BITS) {
visit(ctx, &fb[group_ind], ~(fb_group_t)0);
cnt -= FB_GROUP_BITS;
group_ind++;
}
/* Last group */
if (cnt != 0) {
mask = (~(fb_group_t)0) >> (FB_GROUP_BITS - cnt);
visit(ctx, &fb[group_ind], mask);
}
}
JEMALLOC_ALWAYS_INLINE void
fb_assign_visitor(void *ctx, fb_group_t *fb, fb_group_t mask) {
bool val = *(bool *)ctx;
if (val) {
*fb |= mask;
} else {
*fb &= ~mask;
}
}
/* Sets the cnt bits starting at position start. Must not have a 0 count. */
static inline void
fb_set_range(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
bool val = true;
fb_visit_impl(fb, nbits, &fb_assign_visitor, &val, start, cnt);
}
/* Unsets the cnt bits starting at position start. Must not have a 0 count. */
static inline void
fb_unset_range(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
bool val = false;
fb_visit_impl(fb, nbits, &fb_assign_visitor, &val, start, cnt);
}
JEMALLOC_ALWAYS_INLINE void
fb_scount_visitor(void *ctx, fb_group_t *fb, fb_group_t mask) {
size_t *scount = (size_t *)ctx;
*scount += popcount_lu(*fb & mask);
}
/* Finds the number of set bits in the range of length cnt starting at start. */
JEMALLOC_ALWAYS_INLINE size_t
fb_scount(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
size_t scount = 0;
fb_visit_impl(fb, nbits, &fb_scount_visitor, &scount, start, cnt);
return scount;
}
/* Finds the number of unset bits in the range of length cnt starting at start. */
JEMALLOC_ALWAYS_INLINE size_t
fb_ucount(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
size_t scount = fb_scount(fb, nbits, start, cnt);
return cnt - scount;
}
/*
* An implementation detail; find the first bit at position >= start (or, when
* searching backwards, the last bit at position <= start) with the value val.
*
* Returns nbits (forward search) or -1 (backward search) if no such bit exists.
*/
JEMALLOC_ALWAYS_INLINE ssize_t
fb_find_impl(
fb_group_t *fb, size_t nbits, size_t start, bool val, bool forward) {
assert(start < nbits);
size_t ngroups = FB_NGROUPS(nbits);
ssize_t group_ind = start / FB_GROUP_BITS;
size_t bit_ind = start % FB_GROUP_BITS;
fb_group_t maybe_invert = (val ? 0 : (fb_group_t)-1);
fb_group_t group = fb[group_ind];
group ^= maybe_invert;
if (forward) {
/* Only keep ones in bits bit_ind and above. */
group &= ~((1LU << bit_ind) - 1);
} else {
/*
* Only keep ones in bits bit_ind and below. You might more
* naturally express this as (1 << (bit_ind + 1)) - 1, but
* that shifts by an invalid amount if bit_ind is one less than
* FB_GROUP_BITS.
*/
group &= ((2LU << bit_ind) - 1);
}
ssize_t group_ind_bound = forward ? (ssize_t)ngroups : -1;
while (group == 0) {
group_ind += forward ? 1 : -1;
if (group_ind == group_ind_bound) {
return forward ? (ssize_t)nbits : (ssize_t)-1;
}
group = fb[group_ind];
group ^= maybe_invert;
}
assert(group != 0);
size_t bit = forward ? ffs_lu(group) : fls_lu(group);
size_t pos = group_ind * FB_GROUP_BITS + bit;
/*
* The high bits of a partially filled last group are zeros, so if we're
* looking for zeros we don't want to report an invalid result.
*/
if (forward && !val && pos > nbits) {
return nbits;
}
return pos;
}
/*
* Find the first unset bit in the bitmap with an index >= min_bit. Returns the
* number of bits in the bitmap if no such bit exists.
*/
static inline size_t
fb_ffu(fb_group_t *fb, size_t nbits, size_t min_bit) {
return (size_t)fb_find_impl(fb, nbits, min_bit, /* val */ false,
/* forward */ true);
}
/* The same, but looks for a set bit. */
static inline size_t
fb_ffs(fb_group_t *fb, size_t nbits, size_t min_bit) {
return (size_t)fb_find_impl(fb, nbits, min_bit, /* val */ true,
/* forward */ true);
}
/*
* Find the last unset bit in the bitmap with an index <= max_bit. Returns -1 if
* no such bit exists.
*/
static inline ssize_t
fb_flu(fb_group_t *fb, size_t nbits, size_t max_bit) {
return fb_find_impl(fb, nbits, max_bit, /* val */ false,
/* forward */ false);
}
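/* The same, but looks for a set bit. */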
static inline ssize_t
fb_fls(fb_group_t *fb, size_t nbits, size_t max_bit) {
return fb_find_impl(fb, nbits, max_bit, /* val */ true,
/* forward */ false);
}
/* Returns whether or not we found a range. */
JEMALLOC_ALWAYS_INLINE bool
fb_iter_range_impl(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len, bool val, bool forward) {
assert(start < nbits);
ssize_t next_range_begin = fb_find_impl(fb, nbits, start, val, forward);
if ((forward && next_range_begin == (ssize_t)nbits)
|| (!forward && next_range_begin == (ssize_t)-1)) {
return false;
}
/* Half open range; the set bits are [begin, end). */
ssize_t next_range_end = fb_find_impl(
fb, nbits, next_range_begin, !val, forward);
if (forward) {
*r_begin = next_range_begin;
*r_len = next_range_end - next_range_begin;
} else {
*r_begin = next_range_end + 1;
*r_len = next_range_begin - next_range_end;
}
return true;
}
/*
* Used to iterate through ranges of set bits.
*
* Tries to find the next contiguous sequence of set bits with a first index >=
* start. If one exists, puts the earliest bit of the range in *r_begin, its
* length in *r_len, and returns true. Otherwise, returns false (without
* touching *r_begin or *r_len).
*/
static inline bool
fb_srange_iter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ true, /* forward */ true);
}
/*
* The same as fb_srange_iter, but searches backwards from start rather than
* forwards. (The position returned is still the earliest bit in the range).
*/
static inline bool
fb_srange_riter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ true, /* forward */ false);
}
/* Similar to fb_srange_iter, but searches for unset bits. */
static inline bool
fb_urange_iter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ false, /* forward */ true);
}
/* Similar to fb_srange_riter, but searches for unset bits. */
static inline bool
fb_urange_riter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ false, /* forward */ false);
}
JEMALLOC_ALWAYS_INLINE size_t
fb_range_longest_impl(fb_group_t *fb, size_t nbits, bool val) {
size_t begin = 0;
size_t longest_len = 0;
size_t len = 0;
while (begin < nbits
&& fb_iter_range_impl(
fb, nbits, begin, &begin, &len, val, /* forward */ true)) {
if (len > longest_len) {
longest_len = len;
}
begin += len;
}
return longest_len;
}
static inline size_t
fb_srange_longest(fb_group_t *fb, size_t nbits) {
return fb_range_longest_impl(fb, nbits, /* val */ true);
}
static inline size_t
fb_urange_longest(fb_group_t *fb, size_t nbits) {
return fb_range_longest_impl(fb, nbits, /* val */ false);
}
/*
* Initializes each bit of dst with the bitwise-AND of the corresponding bits of
* src1 and src2. All bitmaps must be the same size.
*/
static inline void
fb_bit_and(fb_group_t *dst, fb_group_t *src1, fb_group_t *src2, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = src1[i] & src2[i];
}
}
/* Like fb_bit_and, but with bitwise-OR. */
static inline void
fb_bit_or(fb_group_t *dst, fb_group_t *src1, fb_group_t *src2, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = src1[i] | src2[i];
}
}
/* Initializes dst bit i to the negation of source bit i. */
static inline void
fb_bit_not(fb_group_t *dst, fb_group_t *src, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = ~src[i];
}
}
#endif /* JEMALLOC_INTERNAL_FB_H */
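/*
 * Illustrative sketch (not part of the header): track 100 bits on the stack,
 * mark a range as used, and query the resulting layout.
 */
static void
fb_sketch(void) {
    enum { NBITS = 100 };
    fb_group_t fb[FB_NGROUPS(NBITS)];
    fb_init(fb, NBITS);
    /* Mark bits 10..34 (25 bits) as set. */
    fb_set_range(fb, NBITS, /* start */ 10, /* cnt */ 25);
    assert(fb_scount(fb, NBITS, 0, NBITS) == 25);
    assert(fb_ffs(fb, NBITS, 0) == 10);   /* First set bit. */
    assert(fb_ffu(fb, NBITS, 10) == 35);  /* First unset bit at or after 10. */
    /* Longest run of unset bits is the tail [35, 100). */
    assert(fb_urange_longest(fb, NBITS) == NBITS - 35);
}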


@ -0,0 +1,129 @@
#ifndef JEMALLOC_INTERNAL_FXP_H
#define JEMALLOC_INTERNAL_FXP_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
* A simple fixed-point math implementation, supporting only unsigned values
* (with overflow being an error).
*
* It's not in general safe to use floating point in core code, because various
* libc implementations we get linked against can assume that malloc won't touch
* floating point state and call it with an unusual calling convention.
*/
/*
* High 16 bits are the integer part, low 16 are the fractional part. Or
* equivalently, repr == 2**16 * val, where we use "val" to refer to the
* (imaginary) fractional representation of the true value.
*
* We pick a uint32_t here since it's convenient in some places to
* double the representation size (i.e. multiplication and division use
* 64-bit integer types), and a uint64_t is the largest type we're
* certain is available.
*/
typedef uint32_t fxp_t;
#define FXP_INIT_INT(x) ((x) << 16)
#define FXP_INIT_PERCENT(pct) (((pct) << 16) / 100)
/*
* Amount of precision used in parsing and printing numbers. The integer bound
* is simply because the integer part of the number gets 16 bits, and so is
* bounded by 65536.
*
* We use a lot of precision for the fractional part, even though most of it
* gets rounded off; this lets us get exact values for the important special
* case where the denominator is a small power of 2 (for instance,
* 1/512 == 0.001953125 is exactly representable even with only 16 bits of
* fractional precision). We need to left-shift by 16 before dividing by
* 10**precision, so we pick precision to be floor(log10(2**48)) = 14.
*/
#define FXP_INTEGER_PART_DIGITS 5
#define FXP_FRACTIONAL_PART_DIGITS 14
/*
* In addition to the integer and fractional parts of the number, we need to
* include a null character and (possibly) a decimal point.
*/
#define FXP_BUF_SIZE (FXP_INTEGER_PART_DIGITS + FXP_FRACTIONAL_PART_DIGITS + 2)
static inline fxp_t
fxp_add(fxp_t a, fxp_t b) {
return a + b;
}
static inline fxp_t
fxp_sub(fxp_t a, fxp_t b) {
assert(a >= b);
return a - b;
}
static inline fxp_t
fxp_mul(fxp_t a, fxp_t b) {
uint64_t unshifted = (uint64_t)a * (uint64_t)b;
/*
* Unshifted is (a.val * 2**16) * (b.val * 2**16)
* == (a.val * b.val) * 2**32, but we want
* (a.val * b.val) * 2 ** 16.
*/
return (uint32_t)(unshifted >> 16);
}
static inline fxp_t
fxp_div(fxp_t a, fxp_t b) {
assert(b != 0);
uint64_t unshifted = ((uint64_t)a << 32) / (uint64_t)b;
/*
* Unshifted is (a.val * 2**16) * (2**32) / (b.val * 2**16)
* == (a.val / b.val) * (2 ** 32), which again corresponds to a right
* shift of 16.
*/
return (uint32_t)(unshifted >> 16);
}
static inline uint32_t
fxp_round_down(fxp_t a) {
return a >> 16;
}
static inline uint32_t
fxp_round_nearest(fxp_t a) {
uint32_t fractional_part = (a & ((1U << 16) - 1));
uint32_t increment = (uint32_t)(fractional_part >= (1U << 15));
return (a >> 16) + increment;
}
/*
* Approximately computes x * frac, without the size limitations that would be
* imposed by converting x to an fxp_t.
*/
static inline size_t
fxp_mul_frac(size_t x_orig, fxp_t frac) {
assert(frac <= (1U << 16));
/*
* Work around an over-enthusiastic warning about type limits below (on
* 32-bit platforms, a size_t is always less than 1ULL << 48).
*/
uint64_t x = (uint64_t)x_orig;
/*
* If we can guarantee no overflow, multiply first before shifting, to
* preserve some precision. Otherwise, shift first and then multiply.
* In the latter case, we only lose the low 16 bits of a 48-bit number,
* so we're still accurate to within 1/2**32.
*/
if (x < (1ULL << 48)) {
return (size_t)((x * frac) >> 16);
} else {
return (size_t)((x >> 16) * (uint64_t)frac);
}
}
/*
* Returns true on error. Otherwise, returns false and updates *end to point to
* the first character not parsed (because it wasn't a digit).
*/
bool fxp_parse(fxp_t *a, const char *ptr, char **end);
void fxp_print(fxp_t a, char buf[FXP_BUF_SIZE]);
#endif /* JEMALLOC_INTERNAL_FXP_H */
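/*
 * Illustrative sketch (not part of the header): basic 16.16 fixed-point
 * arithmetic with fxp_t.
 */
static void
fxp_sketch(void) {
    fxp_t half = FXP_INIT_PERCENT(50);   /* 0.5 == 0x8000 */
    fxp_t three = FXP_INIT_INT(3);       /* 3.0 == 0x30000 */
    /* 3.0 * 0.5 == 1.5, i.e. half of the 3.0 representation. */
    assert(fxp_mul(three, half) == FXP_INIT_INT(3) / 2);
    assert(fxp_round_nearest(fxp_mul(three, half)) == 2); /* 1.5 rounds up. */
    assert(fxp_round_down(fxp_mul(three, half)) == 1);
    /* Scale a byte count by 50% without converting it to an fxp_t. */
    assert(fxp_mul_frac(1 << 20, half) == (1 << 19));
}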


@ -1,6 +1,7 @@
#ifndef JEMALLOC_INTERNAL_HASH_H
#define JEMALLOC_INTERNAL_HASH_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
@ -24,7 +25,7 @@ hash_rotl_64(uint64_t x, int8_t r) {
static inline uint32_t
hash_get_block_32(const uint32_t *p, int i) {
/* Handle unaligned read. */
if (unlikely((uintptr_t)p & (sizeof(uint32_t)-1)) != 0) {
if (unlikely((uintptr_t)p & (sizeof(uint32_t) - 1)) != 0) {
uint32_t ret;
memcpy(&ret, (uint8_t *)(p + i), sizeof(uint32_t));
@ -37,7 +38,7 @@ hash_get_block_32(const uint32_t *p, int i) {
static inline uint64_t
hash_get_block_64(const uint64_t *p, int i) {
/* Handle unaligned read. */
if (unlikely((uintptr_t)p & (sizeof(uint64_t)-1)) != 0) {
if (unlikely((uintptr_t)p & (sizeof(uint64_t) - 1)) != 0) {
uint64_t ret;
memcpy(&ret, (uint8_t *)(p + i), sizeof(uint64_t));
@ -71,8 +72,8 @@ hash_fmix_64(uint64_t k) {
static inline uint32_t
hash_x86_32(const void *key, int len, uint32_t seed) {
const uint8_t *data = (const uint8_t *) key;
const int nblocks = len / 4;
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 4;
uint32_t h1 = seed;
@ -81,8 +82,8 @@ hash_x86_32(const void *key, int len, uint32_t seed) {
/* body */
{
const uint32_t *blocks = (const uint32_t *) (data + nblocks*4);
int i;
const uint32_t *blocks = (const uint32_t *)(data + nblocks * 4);
int i;
for (i = -nblocks; i; i++) {
uint32_t k1 = hash_get_block_32(blocks, i);
@ -93,21 +94,29 @@ hash_x86_32(const void *key, int len, uint32_t seed) {
h1 ^= k1;
h1 = hash_rotl_32(h1, 13);
h1 = h1*5 + 0xe6546b64;
h1 = h1 * 5 + 0xe6546b64;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t *) (data + nblocks*4);
const uint8_t *tail = (const uint8_t *)(data + nblocks * 4);
uint32_t k1 = 0;
switch (len & 3) {
case 3: k1 ^= tail[2] << 16;
case 2: k1 ^= tail[1] << 8;
case 1: k1 ^= tail[0]; k1 *= c1; k1 = hash_rotl_32(k1, 15);
k1 *= c2; h1 ^= k1;
case 3:
k1 ^= tail[2] << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= tail[1] << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= tail[0];
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
}
}
@ -119,11 +128,10 @@ hash_x86_32(const void *key, int len, uint32_t seed) {
return h1;
}
UNUSED static inline void
hash_x86_128(const void *key, const int len, uint32_t seed,
uint64_t r_out[2]) {
const uint8_t * data = (const uint8_t *) key;
const int nblocks = len / 16;
static inline void
hash_x86_128(const void *key, const int len, uint32_t seed, uint64_t r_out[2]) {
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 16;
uint32_t h1 = seed;
uint32_t h2 = seed;
@ -137,94 +145,161 @@ hash_x86_128(const void *key, const int len, uint32_t seed,
/* body */
{
const uint32_t *blocks = (const uint32_t *) (data + nblocks*16);
int i;
const uint32_t *blocks = (const uint32_t *)(data
+ nblocks * 16);
int i;
for (i = -nblocks; i; i++) {
uint32_t k1 = hash_get_block_32(blocks, i*4 + 0);
uint32_t k2 = hash_get_block_32(blocks, i*4 + 1);
uint32_t k3 = hash_get_block_32(blocks, i*4 + 2);
uint32_t k4 = hash_get_block_32(blocks, i*4 + 3);
uint32_t k1 = hash_get_block_32(blocks, i * 4 + 0);
uint32_t k2 = hash_get_block_32(blocks, i * 4 + 1);
uint32_t k3 = hash_get_block_32(blocks, i * 4 + 2);
uint32_t k4 = hash_get_block_32(blocks, i * 4 + 3);
k1 *= c1; k1 = hash_rotl_32(k1, 15); k1 *= c2; h1 ^= k1;
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = hash_rotl_32(h1, 19); h1 += h2;
h1 = h1*5 + 0x561ccd1b;
h1 = hash_rotl_32(h1, 19);
h1 += h2;
h1 = h1 * 5 + 0x561ccd1b;
k2 *= c2; k2 = hash_rotl_32(k2, 16); k2 *= c3; h2 ^= k2;
k2 *= c2;
k2 = hash_rotl_32(k2, 16);
k2 *= c3;
h2 ^= k2;
h2 = hash_rotl_32(h2, 17); h2 += h3;
h2 = h2*5 + 0x0bcaa747;
h2 = hash_rotl_32(h2, 17);
h2 += h3;
h2 = h2 * 5 + 0x0bcaa747;
k3 *= c3; k3 = hash_rotl_32(k3, 17); k3 *= c4; h3 ^= k3;
k3 *= c3;
k3 = hash_rotl_32(k3, 17);
k3 *= c4;
h3 ^= k3;
h3 = hash_rotl_32(h3, 15); h3 += h4;
h3 = h3*5 + 0x96cd1c35;
h3 = hash_rotl_32(h3, 15);
h3 += h4;
h3 = h3 * 5 + 0x96cd1c35;
k4 *= c4; k4 = hash_rotl_32(k4, 18); k4 *= c1; h4 ^= k4;
k4 *= c4;
k4 = hash_rotl_32(k4, 18);
k4 *= c1;
h4 ^= k4;
h4 = hash_rotl_32(h4, 13); h4 += h1;
h4 = h4*5 + 0x32ac3b17;
h4 = hash_rotl_32(h4, 13);
h4 += h1;
h4 = h4 * 5 + 0x32ac3b17;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t *) (data + nblocks*16);
uint32_t k1 = 0;
uint32_t k2 = 0;
uint32_t k3 = 0;
uint32_t k4 = 0;
const uint8_t *tail = (const uint8_t *)(data + nblocks * 16);
uint32_t k1 = 0;
uint32_t k2 = 0;
uint32_t k3 = 0;
uint32_t k4 = 0;
switch (len & 15) {
case 15: k4 ^= tail[14] << 16;
case 14: k4 ^= tail[13] << 8;
case 13: k4 ^= tail[12] << 0;
k4 *= c4; k4 = hash_rotl_32(k4, 18); k4 *= c1; h4 ^= k4;
case 12: k3 ^= tail[11] << 24;
case 11: k3 ^= tail[10] << 16;
case 10: k3 ^= tail[ 9] << 8;
case 9: k3 ^= tail[ 8] << 0;
k3 *= c3; k3 = hash_rotl_32(k3, 17); k3 *= c4; h3 ^= k3;
case 8: k2 ^= tail[ 7] << 24;
case 7: k2 ^= tail[ 6] << 16;
case 6: k2 ^= tail[ 5] << 8;
case 5: k2 ^= tail[ 4] << 0;
k2 *= c2; k2 = hash_rotl_32(k2, 16); k2 *= c3; h2 ^= k2;
case 4: k1 ^= tail[ 3] << 24;
case 3: k1 ^= tail[ 2] << 16;
case 2: k1 ^= tail[ 1] << 8;
case 1: k1 ^= tail[ 0] << 0;
k1 *= c1; k1 = hash_rotl_32(k1, 15); k1 *= c2; h1 ^= k1;
case 15:
k4 ^= tail[14] << 16;
JEMALLOC_FALLTHROUGH;
case 14:
k4 ^= tail[13] << 8;
JEMALLOC_FALLTHROUGH;
case 13:
k4 ^= tail[12] << 0;
k4 *= c4;
k4 = hash_rotl_32(k4, 18);
k4 *= c1;
h4 ^= k4;
JEMALLOC_FALLTHROUGH;
case 12:
k3 ^= (uint32_t)tail[11] << 24;
JEMALLOC_FALLTHROUGH;
case 11:
k3 ^= tail[10] << 16;
JEMALLOC_FALLTHROUGH;
case 10:
k3 ^= tail[9] << 8;
JEMALLOC_FALLTHROUGH;
case 9:
k3 ^= tail[8] << 0;
k3 *= c3;
k3 = hash_rotl_32(k3, 17);
k3 *= c4;
h3 ^= k3;
JEMALLOC_FALLTHROUGH;
case 8:
k2 ^= (uint32_t)tail[7] << 24;
JEMALLOC_FALLTHROUGH;
case 7:
k2 ^= tail[6] << 16;
JEMALLOC_FALLTHROUGH;
case 6:
k2 ^= tail[5] << 8;
JEMALLOC_FALLTHROUGH;
case 5:
k2 ^= tail[4] << 0;
k2 *= c2;
k2 = hash_rotl_32(k2, 16);
k2 *= c3;
h2 ^= k2;
JEMALLOC_FALLTHROUGH;
case 4:
k1 ^= (uint32_t)tail[3] << 24;
JEMALLOC_FALLTHROUGH;
case 3:
k1 ^= tail[2] << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= tail[1] << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= tail[0] << 0;
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
break;
}
}
/* finalization */
h1 ^= len; h2 ^= len; h3 ^= len; h4 ^= len;
h1 ^= len;
h2 ^= len;
h3 ^= len;
h4 ^= len;
h1 += h2; h1 += h3; h1 += h4;
h2 += h1; h3 += h1; h4 += h1;
h1 += h2;
h1 += h3;
h1 += h4;
h2 += h1;
h3 += h1;
h4 += h1;
h1 = hash_fmix_32(h1);
h2 = hash_fmix_32(h2);
h3 = hash_fmix_32(h3);
h4 = hash_fmix_32(h4);
h1 += h2; h1 += h3; h1 += h4;
h2 += h1; h3 += h1; h4 += h1;
h1 += h2;
h1 += h3;
h1 += h4;
h2 += h1;
h3 += h1;
h4 += h1;
r_out[0] = (((uint64_t) h2) << 32) | h1;
r_out[1] = (((uint64_t) h4) << 32) | h3;
r_out[0] = (((uint64_t)h2) << 32) | h1;
r_out[1] = (((uint64_t)h4) << 32) | h3;
}
UNUSED static inline void
hash_x64_128(const void *key, const int len, const uint32_t seed,
uint64_t r_out[2]) {
const uint8_t *data = (const uint8_t *) key;
const int nblocks = len / 16;
static inline void
hash_x64_128(
const void *key, const int len, const uint32_t seed, uint64_t r_out[2]) {
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 16;
uint64_t h1 = seed;
uint64_t h2 = seed;
@ -234,55 +309,99 @@ hash_x64_128(const void *key, const int len, const uint32_t seed,
/* body */
{
const uint64_t *blocks = (const uint64_t *) (data);
int i;
const uint64_t *blocks = (const uint64_t *)(data);
int i;
for (i = 0; i < nblocks; i++) {
uint64_t k1 = hash_get_block_64(blocks, i*2 + 0);
uint64_t k2 = hash_get_block_64(blocks, i*2 + 1);
uint64_t k1 = hash_get_block_64(blocks, i * 2 + 0);
uint64_t k2 = hash_get_block_64(blocks, i * 2 + 1);
k1 *= c1; k1 = hash_rotl_64(k1, 31); k1 *= c2; h1 ^= k1;
k1 *= c1;
k1 = hash_rotl_64(k1, 31);
k1 *= c2;
h1 ^= k1;
h1 = hash_rotl_64(h1, 27); h1 += h2;
h1 = h1*5 + 0x52dce729;
h1 = hash_rotl_64(h1, 27);
h1 += h2;
h1 = h1 * 5 + 0x52dce729;
k2 *= c2; k2 = hash_rotl_64(k2, 33); k2 *= c1; h2 ^= k2;
k2 *= c2;
k2 = hash_rotl_64(k2, 33);
k2 *= c1;
h2 ^= k2;
h2 = hash_rotl_64(h2, 31); h2 += h1;
h2 = h2*5 + 0x38495ab5;
h2 = hash_rotl_64(h2, 31);
h2 += h1;
h2 = h2 * 5 + 0x38495ab5;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t*)(data + nblocks*16);
uint64_t k1 = 0;
uint64_t k2 = 0;
const uint8_t *tail = (const uint8_t *)(data + nblocks * 16);
uint64_t k1 = 0;
uint64_t k2 = 0;
switch (len & 15) {
case 15: k2 ^= ((uint64_t)(tail[14])) << 48;
case 14: k2 ^= ((uint64_t)(tail[13])) << 40;
case 13: k2 ^= ((uint64_t)(tail[12])) << 32;
case 12: k2 ^= ((uint64_t)(tail[11])) << 24;
case 11: k2 ^= ((uint64_t)(tail[10])) << 16;
case 10: k2 ^= ((uint64_t)(tail[ 9])) << 8;
case 9: k2 ^= ((uint64_t)(tail[ 8])) << 0;
k2 *= c2; k2 = hash_rotl_64(k2, 33); k2 *= c1; h2 ^= k2;
case 8: k1 ^= ((uint64_t)(tail[ 7])) << 56;
case 7: k1 ^= ((uint64_t)(tail[ 6])) << 48;
case 6: k1 ^= ((uint64_t)(tail[ 5])) << 40;
case 5: k1 ^= ((uint64_t)(tail[ 4])) << 32;
case 4: k1 ^= ((uint64_t)(tail[ 3])) << 24;
case 3: k1 ^= ((uint64_t)(tail[ 2])) << 16;
case 2: k1 ^= ((uint64_t)(tail[ 1])) << 8;
case 1: k1 ^= ((uint64_t)(tail[ 0])) << 0;
k1 *= c1; k1 = hash_rotl_64(k1, 31); k1 *= c2; h1 ^= k1;
case 15:
k2 ^= ((uint64_t)(tail[14])) << 48;
JEMALLOC_FALLTHROUGH;
case 14:
k2 ^= ((uint64_t)(tail[13])) << 40;
JEMALLOC_FALLTHROUGH;
case 13:
k2 ^= ((uint64_t)(tail[12])) << 32;
JEMALLOC_FALLTHROUGH;
case 12:
k2 ^= ((uint64_t)(tail[11])) << 24;
JEMALLOC_FALLTHROUGH;
case 11:
k2 ^= ((uint64_t)(tail[10])) << 16;
JEMALLOC_FALLTHROUGH;
case 10:
k2 ^= ((uint64_t)(tail[9])) << 8;
JEMALLOC_FALLTHROUGH;
case 9:
k2 ^= ((uint64_t)(tail[8])) << 0;
k2 *= c2;
k2 = hash_rotl_64(k2, 33);
k2 *= c1;
h2 ^= k2;
JEMALLOC_FALLTHROUGH;
case 8:
k1 ^= ((uint64_t)(tail[7])) << 56;
JEMALLOC_FALLTHROUGH;
case 7:
k1 ^= ((uint64_t)(tail[6])) << 48;
JEMALLOC_FALLTHROUGH;
case 6:
k1 ^= ((uint64_t)(tail[5])) << 40;
JEMALLOC_FALLTHROUGH;
case 5:
k1 ^= ((uint64_t)(tail[4])) << 32;
JEMALLOC_FALLTHROUGH;
case 4:
k1 ^= ((uint64_t)(tail[3])) << 24;
JEMALLOC_FALLTHROUGH;
case 3:
k1 ^= ((uint64_t)(tail[2])) << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= ((uint64_t)(tail[1])) << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= ((uint64_t)(tail[0])) << 0;
k1 *= c1;
k1 = hash_rotl_64(k1, 31);
k1 *= c2;
h1 ^= k1;
break;
}
}
/* finalization */
h1 ^= len; h2 ^= len;
h1 ^= len;
h2 ^= len;
h1 += h2;
h2 += h1;


@ -0,0 +1,163 @@
#ifndef JEMALLOC_INTERNAL_HOOK_H
#define JEMALLOC_INTERNAL_HOOK_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd.h"
/*
* This API is *extremely* experimental, and may get ripped out, changed in API-
* and ABI-incompatible ways, be insufficiently or incorrectly documented, etc.
*
* It allows hooking the stateful parts of the API to see changes as they
* happen.
*
* Allocation hooks are called after the allocation is done, free hooks are
* called before the free is done, and expand hooks are called after the
* allocation is expanded.
*
* For realloc and rallocx, if the expansion happens in place, the expansion
* hook is called. If it is moved, then the alloc hook is called on the new
* location, and then the free hook is called on the old location (i.e. both
* hooks are invoked in between the alloc and the dalloc).
*
* If we return NULL due to OOM, then usize might not be trustworthy. Calling
* realloc(NULL, size) only calls the alloc hook, and calling realloc(ptr, 0)
* only calls the free hook. (Calling realloc(NULL, 0) is treated as malloc(0),
* and only calls the alloc hook).
*
* Reentrancy:
* Reentrancy is guarded against from within the hook implementation. If you
* call allocator functions from within a hook, the hooks will not be invoked
* again.
* Threading:
* The installation of a hook synchronizes with all its uses. If you can
* prove the installation of a hook happens-before a jemalloc entry point,
* then the hook will get invoked (unless there's a racing removal).
*
* Hook insertion appears to be atomic at a per-thread level (i.e. if a thread
* allocates and has the alloc hook invoked, then a subsequent free on the
* same thread will also have the free hook invoked).
*
* The *removal* of a hook does *not* block until all threads are done with
* the hook. Hook authors have to be resilient to this, and need some
* out-of-band mechanism for cleaning up any dynamically allocated memory
* associated with their hook.
* Ordering:
* Order of hook execution is unspecified, and may be different than insertion
* order.
*/
#define HOOK_MAX 4
enum hook_alloc_e {
hook_alloc_malloc,
hook_alloc_posix_memalign,
hook_alloc_aligned_alloc,
hook_alloc_calloc,
hook_alloc_memalign,
hook_alloc_valloc,
hook_alloc_pvalloc,
hook_alloc_mallocx,
/* The reallocating functions have both alloc and dalloc variants */
hook_alloc_realloc,
hook_alloc_rallocx,
};
/*
* We put the enum typedef after the enum, since this file may get included by
* jemalloc_cpp.cpp, and C++ disallows enum forward declarations.
*/
typedef enum hook_alloc_e hook_alloc_t;
enum hook_dalloc_e {
hook_dalloc_free,
hook_dalloc_dallocx,
hook_dalloc_sdallocx,
/*
* The dalloc halves of reallocation (not called if in-place expansion
* happens).
*/
hook_dalloc_realloc,
hook_dalloc_rallocx,
};
typedef enum hook_dalloc_e hook_dalloc_t;
enum hook_expand_e {
hook_expand_realloc,
hook_expand_rallocx,
hook_expand_xallocx,
};
typedef enum hook_expand_e hook_expand_t;
typedef void (*hook_alloc)(void *extra, hook_alloc_t type, void *result,
uintptr_t result_raw, uintptr_t args_raw[3]);
typedef void (*hook_dalloc)(
void *extra, hook_dalloc_t type, void *address, uintptr_t args_raw[3]);
typedef void (*hook_expand)(void *extra, hook_expand_t type, void *address,
size_t old_usize, size_t new_usize, uintptr_t result_raw,
uintptr_t args_raw[4]);
typedef struct hooks_s hooks_t;
struct hooks_s {
hook_alloc alloc_hook;
hook_dalloc dalloc_hook;
hook_expand expand_hook;
void *extra;
};
/*
* Begin implementation details; everything above this point might one day live
* in a public API. Everything below this point never will.
*/
/*
* The realloc pathways haven't gotten any refactoring love in a while, and it's
* fairly difficult to pass information from the entry point to the hooks. We
* put the information the hooks will need into a struct to encapsulate
* everything.
*
* Many of these pathways are force-inlined, so that the compiler can avoid
* materializing this struct until we hit an extern arena function. For fairly
* goofy reasons, *many* of the realloc paths hit an extern arena function.
* These paths are cold enough that it doesn't matter; eventually, we should
* rewrite the realloc code to make the expand-in-place and the
* free-then-realloc paths more orthogonal, at which point we don't need to
* spread the hook logic all over the place.
*/
typedef struct hook_ralloc_args_s hook_ralloc_args_t;
struct hook_ralloc_args_s {
/* I.e. as opposed to rallocx. */
bool is_realloc;
/*
* The expand hook takes 4 arguments, even if only 3 are actually used;
* we add an extra one in case the user decides to memcpy without
* looking too closely at the hooked function.
*/
uintptr_t args[4];
};
bool hook_boot(void);
/*
* Returns an opaque handle to be used when removing the hook. NULL means that
* we couldn't install the hook.
*/
void *hook_install(tsdn_t *tsdn, hooks_t *to_install);
/* Uninstalls the hook with the handle previously returned from hook_install. */
void hook_remove(tsdn_t *tsdn, void *opaque);
/* Hooks */
void hook_invoke_alloc(hook_alloc_t type, void *result, uintptr_t result_raw,
uintptr_t args_raw[3]);
void hook_invoke_dalloc(
hook_dalloc_t type, void *address, uintptr_t args_raw[3]);
void hook_invoke_expand(hook_expand_t type, void *address, size_t old_usize,
size_t new_usize, uintptr_t result_raw, uintptr_t args_raw[4]);
#endif /* JEMALLOC_INTERNAL_HOOK_H */
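/*
 * Illustrative sketch (not part of the header): a hooks_t that counts
 * allocation and deallocation events. The counter and callback names are
 * hypothetical; hook_install/hook_remove are the entry points declared above.
 */
static size_t sketch_nallocs;
static size_t sketch_ndallocs;

static void
sketch_alloc_hook(void *extra, hook_alloc_t type, void *result,
    uintptr_t result_raw, uintptr_t args_raw[3]) {
    /* Called after the allocation; result is the newly returned pointer. */
    sketch_nallocs++;
}

static void
sketch_dalloc_hook(void *extra, hook_dalloc_t type, void *address,
    uintptr_t args_raw[3]) {
    /* Called before the memory at address is freed. */
    sketch_ndallocs++;
}

static void
sketch_expand_hook(void *extra, hook_expand_t type, void *address,
    size_t old_usize, size_t new_usize, uintptr_t result_raw,
    uintptr_t args_raw[4]) {
    /* Called after an in-place expansion from old_usize to new_usize. */
}

static hooks_t sketch_hooks = {
    sketch_alloc_hook, sketch_dalloc_hook, sketch_expand_hook,
    /* extra */ NULL,
};
/*
 * void *handle = hook_install(tsdn, &sketch_hooks);
 * ...
 * hook_remove(tsdn, handle);
 */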


@ -1,19 +0,0 @@
#ifndef JEMALLOC_INTERNAL_HOOKS_H
#define JEMALLOC_INTERNAL_HOOKS_H
extern JEMALLOC_EXPORT void (*hooks_arena_new_hook)();
extern JEMALLOC_EXPORT void (*hooks_libc_hook)();
#define JEMALLOC_HOOK(fn, hook) ((void)(hook != NULL && (hook(), 0)), fn)
#define open JEMALLOC_HOOK(open, hooks_libc_hook)
#define read JEMALLOC_HOOK(read, hooks_libc_hook)
#define write JEMALLOC_HOOK(write, hooks_libc_hook)
#define readlink JEMALLOC_HOOK(readlink, hooks_libc_hook)
#define close JEMALLOC_HOOK(close, hooks_libc_hook)
#define creat JEMALLOC_HOOK(creat, hooks_libc_hook)
#define secure_getenv JEMALLOC_HOOK(secure_getenv, hooks_libc_hook)
/* Note that this is undef'd and re-define'd in src/prof.c. */
#define _Unwind_Backtrace JEMALLOC_HOOK(_Unwind_Backtrace, hooks_libc_hook)
#endif /* JEMALLOC_INTERNAL_HOOKS_H */

View file

@ -0,0 +1,185 @@
#ifndef JEMALLOC_INTERNAL_HPA_H
#define JEMALLOC_INTERNAL_HPA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/edata_cache.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/exp_grow.h"
#include "jemalloc/internal/hpa_central.h"
#include "jemalloc/internal/hpa_hooks.h"
#include "jemalloc/internal/hpa_opts.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/pai.h"
#include "jemalloc/internal/psset.h"
#include "jemalloc/internal/sec.h"
typedef struct hpa_shard_nonderived_stats_s hpa_shard_nonderived_stats_t;
struct hpa_shard_nonderived_stats_s {
/*
* The number of times we've purged within a hugepage.
*
* Guarded by mtx.
*/
uint64_t npurge_passes;
/*
* The number of individual purge calls we perform (which should always
* be bigger than npurge_passes, since each pass purges at least one
* extent within a hugepage).
*
* Guarded by mtx.
*/
uint64_t npurges;
/*
* The number of times we've hugified a pageslab.
*
* Guarded by mtx.
*/
uint64_t nhugifies;
/*
* The number of times we've tried to hugify a pageslab, but failed.
*
* Guarded by mtx.
*/
uint64_t nhugify_failures;
/*
* The number of times we've dehugified a pageslab.
*
* Guarded by mtx.
*/
uint64_t ndehugifies;
};
/* Completely derived; only used by CTL. */
typedef struct hpa_shard_stats_s hpa_shard_stats_t;
struct hpa_shard_stats_s {
psset_stats_t psset_stats;
hpa_shard_nonderived_stats_t nonderived_stats;
sec_stats_t secstats;
};
typedef struct hpa_shard_s hpa_shard_t;
struct hpa_shard_s {
/*
* pai must be the first member; we cast from a pointer to it to a
* pointer to the hpa_shard_t.
*/
pai_t pai;
/* The central allocator we get our hugepages from. */
hpa_central_t *central;
/* Protects most of this shard's state. */
malloc_mutex_t mtx;
/*
* Guards the shard's access to the central allocator (preventing
* multiple threads operating on this shard from accessing the central
* allocator).
*/
malloc_mutex_t grow_mtx;
/* The base metadata allocator. */
base_t *base;
/*
* This edata cache is the one we use when allocating a small extent
* from a pageslab. The pageslab itself comes from the centralized
* allocator, and so will use its edata_cache.
*/
edata_cache_fast_t ecf;
/* Small extent cache (not guarded by mtx) */
JEMALLOC_ALIGNED(CACHELINE) sec_t sec;
psset_t psset;
/*
* How many grow operations have occurred.
*
* Guarded by grow_mtx.
*/
uint64_t age_counter;
/* The arena ind we're associated with. */
unsigned ind;
/*
* Our emap. This is just a cache of the emap pointer in the associated
* hpa_central.
*/
emap_t *emap;
/* The configuration choices for this hpa shard. */
hpa_shard_opts_t opts;
/*
* How many pages have we started but not yet finished purging in this
* hpa shard.
*/
size_t npending_purge;
/*
* Those stats which are copied directly into the CTL-centric hpa shard
* stats.
*/
hpa_shard_nonderived_stats_t stats;
/*
* Last time we performed purge on this shard.
*/
nstime_t last_purge;
/*
* The last time we attempted work (purging or hugifying). If deferral
* of the work is allowed (i.e. we have a background thread), this is
* the time when the background thread checked whether purging or
* hugifying needs to be done. If deferral is not allowed, this is the
* time of the last hpa_alloc or hpa_dalloc activity in the shard.
*/
nstime_t last_time_work_attempted;
};
bool hpa_hugepage_size_exceeds_limit(void);
/*
* Whether or not the HPA can be used given the current configuration. This is
* not necessarily a guarantee that it backs its allocations with hugepages,
* just that it can function properly given the system it's running on.
*/
bool hpa_supported(void);
bool hpa_shard_init(tsdn_t *tsdn, hpa_shard_t *shard, hpa_central_t *central,
emap_t *emap, base_t *base, edata_cache_t *edata_cache, unsigned ind,
const hpa_shard_opts_t *opts, const sec_opts_t *sec_opts);
void hpa_shard_stats_accum(hpa_shard_stats_t *dst, hpa_shard_stats_t *src);
void hpa_shard_stats_merge(
tsdn_t *tsdn, hpa_shard_t *shard, hpa_shard_stats_t *dst);
/*
* Notify the shard that we won't use it for allocations much longer. Due to
* the possibility of races, we don't actually prevent allocations; just flush
* and disable the embedded edata_cache_small.
*/
void hpa_shard_disable(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_destroy(tsdn_t *tsdn, hpa_shard_t *shard);
/* Flush caches that the shard may be using. */
void hpa_shard_flush(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_set_deferral_allowed(
tsdn_t *tsdn, hpa_shard_t *shard, bool deferral_allowed);
void hpa_shard_do_deferred_work(tsdn_t *tsdn, hpa_shard_t *shard);
/*
* We share the fork ordering with the PA and arena prefork handling; that's why
* these are 2, 3 and 4 rather than 0, 1 and 2.
*/
void hpa_shard_prefork2(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_prefork3(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_prefork4(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_postfork_parent(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_postfork_child(tsdn_t *tsdn, hpa_shard_t *shard);
#endif /* JEMALLOC_INTERNAL_HPA_H */
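A note on the "pai must be the first member" comment in hpa_shard_s: the conversion it enables is just a pointer cast. The helper name below is hypothetical; the layout guarantee it relies on is the one stated in the struct comment.

static inline hpa_shard_t *
example_shard_from_pai(pai_t *self) {
	/* Valid only because pai is the first member of hpa_shard_t. */
	return (hpa_shard_t *)self;
}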

View file

@ -0,0 +1,41 @@
#ifndef JEMALLOC_INTERNAL_HPA_CENTRAL_H
#define JEMALLOC_INTERNAL_HPA_CENTRAL_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/hpa_hooks.h"
#include "jemalloc/internal/hpdata.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/tsd_types.h"
typedef struct hpa_central_s hpa_central_t;
struct hpa_central_s {
/*
* Guards expansion of eden. We separate this from the regular mutex so
* that cheaper operations can still continue while we're doing the OS
* call.
*/
malloc_mutex_t grow_mtx;
/*
* Either NULL (if empty), or a hugepage-aligned region whose size is
* some integer multiple of the hugepage size. We carve hugepages off
* one at a time to satisfy new pageslab requests.
*
* Guarded by grow_mtx.
*/
void *eden;
size_t eden_len;
/* Source for metadata. */
base_t *base;
/* The HPA hooks. */
hpa_hooks_t hooks;
};
bool hpa_central_init(
hpa_central_t *central, base_t *base, const hpa_hooks_t *hooks);
hpdata_t *hpa_central_extract(tsdn_t *tsdn, hpa_central_t *central, size_t size,
uint64_t age, bool hugify_eager, bool *oom);
#endif /* JEMALLOC_INTERNAL_HPA_CENTRAL_H */

View file

@ -0,0 +1,21 @@
#ifndef JEMALLOC_INTERNAL_HPA_HOOKS_H
#define JEMALLOC_INTERNAL_HPA_HOOKS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/nstime.h"
typedef struct hpa_hooks_s hpa_hooks_t;
struct hpa_hooks_s {
void *(*map)(size_t size);
void (*unmap)(void *ptr, size_t size);
void (*purge)(void *ptr, size_t size);
bool (*hugify)(void *ptr, size_t size, bool sync);
void (*dehugify)(void *ptr, size_t size);
void (*curtime)(nstime_t *r_time, bool first_reading);
uint64_t (*ms_since)(nstime_t *r_time);
bool (*vectorized_purge)(void *vec, size_t vlen, size_t nbytes);
};
extern const hpa_hooks_t hpa_hooks_default;
#endif /* JEMALLOC_INTERNAL_HPA_HOOKS_H */
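As a sketch of how this hook table might be customized (nothing in the header prescribes this), one can copy hpa_hooks_default and replace individual members. The logging wrapper below is hypothetical and assumes jemalloc's internal malloc_printf is available.

static void
example_logging_purge(void *ptr, size_t size) {
	malloc_printf("<jemalloc>: purging %zu bytes at %p\n", size, ptr);
	hpa_hooks_default.purge(ptr, size);
}
static hpa_hooks_t
example_make_hooks(void) {
	hpa_hooks_t hooks = hpa_hooks_default; /* start from the defaults */
	hooks.purge = example_logging_purge;   /* override a single member */
	return hooks;
}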

View file

@ -0,0 +1,190 @@
#ifndef JEMALLOC_INTERNAL_HPA_OPTS_H
#define JEMALLOC_INTERNAL_HPA_OPTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/fxp.h"
/*
* This file is morally part of hpa.h, but is split out for header-ordering
* reasons.
*
* All of the hpa_shard_opts below are experimental. We are exploring more
* efficient packing, hugifying, and purging approaches to make better
* trade-offs between CPU, memory, latency, and usability. This means all of
* them are at risk of being deprecated, and the corresponding configurations
* should be updated once the final version settles.
*/
/*
* This enum controls how jemalloc hugifies/dehugifies pages. Each style may be
* more suitable depending on deployment environments.
*
* hpa_hugify_style_none
* Using this means that jemalloc will not be hugifying or dehugifying pages,
* but will let the kernel make those decisions. This style only makes sense
* when deploying on systems where THP is enabled in 'always' mode. With this
* style, you most likely want to have no purging at all (dirty_mult=-1) or
* purge_threshold=HUGEPAGE bytes (2097152 for a 2 MiB hugepage), although other
* thresholds may work well depending on the kernel settings of your deployment
* targets.
*
* hpa_hugify_style_eager
* This style results in jemalloc giving hugepage advice, if needed, to
* anonymous memory immediately after it is mapped, so huge pages can be backing
* that memory at page-fault time. This is usually more efficient than doing
* it later, and it allows us to benefit from the hugepages from the start.
* The same purging options as for the 'none' style are good starting choices:
* no purging, or purge_threshold=HUGEPAGE, with a min_purge_delay_ms that keeps
* pages from being purged too quickly, etc. This is a good choice if you can
* afford extra memory and your application gets a performance increase from
* transparent hugepages.
*
* hpa_hugify_style_lazy
* This style is suitable when you purge more aggressively (you sacrifice CPU
* performance for less memory). When this style is chosen, jemalloc will
* hugify once hugification_threshold is reached, and dehugify before purging.
* If the kernel is configured to use direct compaction you may experience some
* allocation latency when using this style. It is best to measure what works
* better for your application's needs in the target deployment environment.
* This is a good choice for apps that cannot afford a lot of memory regression,
* but would still like to benefit from backing certain memory regions with
* hugepages.
*/
enum hpa_hugify_style_e {
hpa_hugify_style_auto = 0,
hpa_hugify_style_none = 1,
hpa_hugify_style_eager = 2,
hpa_hugify_style_lazy = 3,
hpa_hugify_style_limit = hpa_hugify_style_lazy + 1
};
typedef enum hpa_hugify_style_e hpa_hugify_style_t;
extern const char *const hpa_hugify_style_names[];
typedef struct hpa_shard_opts_s hpa_shard_opts_t;
struct hpa_shard_opts_s {
/*
* The largest size we'll allocate out of the shard. For those
* allocations refused, the caller (in practice, the PA module) will
* fall back to the more general (for now) PAC, which can always handle
* any allocation request.
*/
size_t slab_max_alloc;
/*
* When the number of active bytes in a hugepage is >=
* hugification_threshold, we force hugify it.
*/
size_t hugification_threshold;
/*
* The HPA purges whenever the number of dirty pages exceeds dirty_mult *
* active_pages. This may be set to (fxp_t)-1 to disable purging.
*/
fxp_t dirty_mult;
/*
* Whether or not the PAI methods are allowed to defer work to a
* subsequent hpa_shard_do_deferred_work() call. Practically, this
* corresponds to background threads being enabled. We track this
* ourselves for encapsulation purposes.
*/
bool deferral_allowed;
/*
* How long a hugepage has to be a hugification candidate before it will
* actually get hugified.
*/
uint64_t hugify_delay_ms;
/*
* Hugify pages synchronously (hugify will happen even if hugify_style
* is not hpa_hugify_style_lazy).
*/
bool hugify_sync;
/*
* Minimum amount of time between purges.
*/
uint64_t min_purge_interval_ms;
/*
* Maximum number of hugepages to purge on each purging attempt.
*/
ssize_t experimental_max_purge_nhp;
/*
* Minimum number of inactive bytes needed for a non-empty page to be
* considered purgable.
*
* When the number of touched inactive bytes on non-empty hugepage is
* >= purge_threshold, the page is purgable. Empty pages are always
* purgable. Setting this to HUGEPAGE bytes would only purge empty
* pages if using hugify_style_eager and the purges would be exactly
* HUGEPAGE bytes. Depending on your kernel settings, this may result
* in better performance.
*
* Please note, when threshold is reached, we will purge all the dirty
* bytes, and not just up to the threshold. If this is PAGE bytes, then
* all the pages that have any dirty bytes are purgable. We treat the
* purge_threshold constraint as stronger than dirty_mult; in other
* words, if no page meets purge_threshold, we will not purge even if
* we are above dirty_mult.
*/
size_t purge_threshold;
/*
* Minimum number of ms that must elapse between a hugepage becoming
* eligible for purging and actually getting purged.
*
* Setting this to a larger number gives a better chance of reusing
* that memory. Setting it to 0 means that a page is eligible for purging
* as soon as it meets the purge_threshold. The clock resets when the
* purgability of the page changes (the page goes from being non-purgable
* to purgable). When using the eager style you probably want to allow
* for some delay, to avoid purging a page too quickly and to give it
* time to be used.
*/
uint64_t min_purge_delay_ms;
/*
* Style of hugification/dehugification (see comment at
* hpa_hugify_style_t for options).
*/
hpa_hugify_style_t hugify_style;
};
/* clang-format off */
#define HPA_SHARD_OPTS_DEFAULT { \
/* slab_max_alloc */ \
64 * 1024, \
/* hugification_threshold */ \
HUGEPAGE * 95 / 100, \
/* dirty_mult */ \
FXP_INIT_PERCENT(25), \
/* \
* deferral_allowed \
* \
* Really, this is always set by the arena during creation \
* or by an hpa_shard_set_deferral_allowed call, so the value \
* we put here doesn't matter. \
*/ \
false, \
/* hugify_delay_ms */ \
10 * 1000, \
/* hugify_sync */ \
false, \
/* min_purge_interval_ms */ \
5 * 1000, \
/* experimental_max_purge_nhp */ \
-1, \
/* purge_threshold */ \
PAGE, \
/* min_purge_delay_ms */ \
0, \
/* hugify_style */ \
hpa_hugify_style_lazy \
}
/* clang-format on */
#endif /* JEMALLOC_INTERNAL_HPA_OPTS_H */
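To make the knobs above concrete, here is an illustrative way to build an opts struct for the 'eager' style described earlier, starting from the default macro. The specific values are assumptions chosen to match the guidance in the comments, not a recommended production configuration.

static hpa_shard_opts_t
example_eager_opts(void) {
	hpa_shard_opts_t opts = HPA_SHARD_OPTS_DEFAULT;
	opts.hugify_style = hpa_hugify_style_eager;
	/* Per the purge_threshold comment: with the eager style this only
	 * purges empty hugepages, and purges are exactly HUGEPAGE bytes. */
	opts.purge_threshold = HUGEPAGE;
	/* Assumed value: give eligible pages some time to be reused. */
	opts.min_purge_delay_ms = 1000;
	return opts;
}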

View file

@ -0,0 +1,161 @@
#ifndef JEMALLOC_INTERNAL_HPA_UTILS_H
#define JEMALLOC_INTERNAL_HPA_UTILS_H
#include "jemalloc/internal/hpa.h"
#include "jemalloc/internal/extent.h"
#define HPA_MIN_VAR_VEC_SIZE 8
/*
* This is used for jemalloc internal tuning and may change in the future based
* on production traffic.
*
* This value protects two things:
* 1. Stack size
* 2. Number of huge pages that are being purged in a batch, since we do
* not allow allocations while making the madvise syscall.
*/
#define HPA_PURGE_BATCH_MAX 16
#ifdef JEMALLOC_HAVE_PROCESS_MADVISE
typedef struct iovec hpa_io_vector_t;
#else
typedef struct {
void *iov_base;
size_t iov_len;
} hpa_io_vector_t;
#endif
static inline size_t
hpa_process_madvise_max_iovec_len(void) {
assert(
opt_process_madvise_max_batch <= PROCESS_MADVISE_MAX_BATCH_LIMIT);
return opt_process_madvise_max_batch == 0
? HPA_MIN_VAR_VEC_SIZE
: opt_process_madvise_max_batch;
}
/* Actually invoke the hooks. If the vectorized purge fails, fall back to single purges. */
static void
hpa_try_vectorized_purge(
hpa_hooks_t *hooks, hpa_io_vector_t *vec, size_t vlen, size_t nbytes) {
bool success = opt_process_madvise_max_batch > 0
&& !hooks->vectorized_purge(vec, vlen, nbytes);
if (!success) {
/* On failure, it is safe to purge again (potential perf
 * penalty). If the kernel could tell exactly which regions
 * failed, we could avoid that penalty.
 */
for (size_t i = 0; i < vlen; ++i) {
hooks->purge(vec[i].iov_base, vec[i].iov_len);
}
}
}
/*
* This structure accumulates the regions for process_madvise. It invokes the
* hook when the batch limit is reached.
*/
typedef struct {
hpa_io_vector_t *vp;
size_t cur;
size_t total_bytes;
size_t capacity;
} hpa_range_accum_t;
static inline void
hpa_range_accum_init(hpa_range_accum_t *ra, hpa_io_vector_t *v, size_t sz) {
ra->vp = v;
ra->capacity = sz;
ra->total_bytes = 0;
ra->cur = 0;
}
static inline void
hpa_range_accum_flush(hpa_range_accum_t *ra, hpa_hooks_t *hooks) {
assert(ra->total_bytes > 0 && ra->cur > 0);
hpa_try_vectorized_purge(hooks, ra->vp, ra->cur, ra->total_bytes);
ra->cur = 0;
ra->total_bytes = 0;
}
static inline void
hpa_range_accum_add(
hpa_range_accum_t *ra, void *addr, size_t sz, hpa_hooks_t *hooks) {
assert(ra->cur < ra->capacity);
ra->vp[ra->cur].iov_base = addr;
ra->vp[ra->cur].iov_len = sz;
ra->total_bytes += sz;
ra->cur++;
if (ra->cur == ra->capacity) {
hpa_range_accum_flush(ra, hooks);
}
}
static inline void
hpa_range_accum_finish(hpa_range_accum_t *ra, hpa_hooks_t *hooks) {
if (ra->cur > 0) {
hpa_range_accum_flush(ra, hooks);
}
}
/*
* For purging more than one page we use a batch of these items.
*/
typedef struct {
hpdata_purge_state_t state;
hpdata_t *hp;
bool dehugify;
} hpa_purge_item_t;
typedef struct hpa_purge_batch_s hpa_purge_batch_t;
struct hpa_purge_batch_s {
hpa_purge_item_t *items;
size_t items_capacity;
/* Number of huge pages to purge in the current batch. */
size_t item_cnt;
/* Number of ranges to purge in the current batch. */
size_t nranges;
/* Total number of dirty pages in the current batch. */
size_t ndirty_in_batch;
/* Max number of huge pages to purge */
size_t max_hp;
/*
* Once we are above this watermark we should not add more pages
* to the same batch. This is because while we want to minimize the
* number of madvise calls, we also do not want to be preventing
* allocations from too many huge pages (which we have to do
* while they are being purged).
*/
size_t range_watermark;
size_t npurged_hp_total;
};
static inline bool
hpa_batch_full(hpa_purge_batch_t *b) {
/* It's okay for nranges to go above the watermark. */
return b->npurged_hp_total == b->max_hp
|| b->item_cnt == b->items_capacity
|| b->nranges >= b->range_watermark;
}
static inline void
hpa_batch_pass_start(hpa_purge_batch_t *b) {
b->item_cnt = 0;
b->nranges = 0;
b->ndirty_in_batch = 0;
}
static inline bool
hpa_batch_empty(hpa_purge_batch_t *b) {
return b->item_cnt == 0;
}
/* Purge pages in a batch using given hooks */
void hpa_purge_batch(
hpa_hooks_t *hooks, hpa_purge_item_t *batch, size_t batch_sz);
#endif /* JEMALLOC_INTERNAL_HPA_UTILS_H */
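A caller-side usage sketch for the range accumulator above. The function and the fixed vector capacity are assumptions for illustration; real callers would size the vector via hpa_process_madvise_max_iovec_len().

static void
example_purge_ranges(hpa_hooks_t *hooks, void **addrs, size_t *lens, size_t n) {
	hpa_io_vector_t vec[HPA_MIN_VAR_VEC_SIZE];
	hpa_range_accum_t ra;
	hpa_range_accum_init(&ra, vec, HPA_MIN_VAR_VEC_SIZE);
	for (size_t i = 0; i < n; i++) {
		/* Flushes automatically once the vector fills up. */
		hpa_range_accum_add(&ra, addrs[i], lens[i], hooks);
	}
	/* Flush any remaining ranges. */
	hpa_range_accum_finish(&ra, hooks);
}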

View file

@ -0,0 +1,486 @@
#ifndef JEMALLOC_INTERNAL_HPDATA_H
#define JEMALLOC_INTERNAL_HPDATA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/fb.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/pages.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/typed_list.h"
/*
* The metadata representation we use for extents in hugepages. While the PAC
* uses the edata_t to represent both active and inactive extents, the HP only
* uses the edata_t for active ones; instead, inactive extent state is tracked
* within hpdata associated with the enclosing hugepage-sized, hugepage-aligned
* region of virtual address space.
*
* An hpdata need not be "truly" backed by a hugepage (which is not necessarily
* an observable property of any given region of address space). It's just
* hugepage-sized and hugepage-aligned; it's *potentially* huge.
*/
/*
* The max enumeration num should not exceed 2^16 - 1, see comments in edata.h
* for ESET_ENUMERATE_MAX_NUM for more details.
*/
#define PSSET_ENUMERATE_MAX_NUM 32
typedef struct hpdata_s hpdata_t;
ph_structs(hpdata_age_heap, hpdata_t, PSSET_ENUMERATE_MAX_NUM);
struct hpdata_s {
/*
* We likewise follow the edata convention of mangling names and forcing
* the use of accessors -- this lets us add some consistency checks on
* access.
*/
/*
* The address of the hugepage in question. This can't be named h_addr,
* since that conflicts with a macro defined in Windows headers.
*/
void *h_address;
/* Its age (measured in psset operations). */
uint64_t h_age;
/* Whether or not we think the hugepage is mapped that way by the OS. */
bool h_huge;
/*
* For some properties, we keep parallel sets of bools; h_foo_allowed
* and h_in_psset_foo_container. This decouples the hpa (which manages
* policies) from the psset (which is the mechanism used to enforce
* those policies). This allows
* all the container management logic to live in one place, without the
* HPA needing to know or care how that happens.
*/
/*
* Whether or not the hpdata is allowed to be used to serve allocations,
* and whether or not the psset is currently tracking it as such.
*/
bool h_alloc_allowed;
bool h_in_psset_alloc_container;
/*
* The same, but with purging. There's no corresponding
* h_in_psset_purge_container, because the psset (currently) always
* removes hpdatas from their containers during updates (to implement
* LRU for purging).
*/
bool h_purge_allowed;
/* And with hugifying. */
bool h_hugify_allowed;
/* When we became a hugification candidate. */
nstime_t h_time_hugify_allowed;
bool h_in_psset_hugify_container;
/* Whether or not a purge or hugify is currently happening. */
bool h_mid_purge;
bool h_mid_hugify;
/*
* Whether or not the hpdata is being updated in the psset (i.e. if
* there has been a psset_update_begin call issued without a matching
* psset_update_end call). Eventually this will expand to other types
* of updates.
*/
bool h_updating;
/* Whether or not the hpdata is in a psset. */
bool h_in_psset;
union {
/* When nonempty (and also nonfull), used by the psset bins. */
hpdata_age_heap_link_t age_link;
/*
* When empty (or not corresponding to any hugepage), list
* linkage.
*/
ql_elm(hpdata_t) ql_link_empty;
};
/*
* Linkage for the psset to track candidates for purging and hugifying.
*/
ql_elm(hpdata_t) ql_link_purge;
ql_elm(hpdata_t) ql_link_hugify;
/* The length of the largest contiguous sequence of inactive pages. */
size_t h_longest_free_range;
/* Number of active pages. */
size_t h_nactive;
/* A bitmap with bits set in the active pages. */
fb_group_t active_pages[FB_NGROUPS(HUGEPAGE_PAGES)];
/*
* Number of dirty or active pages, and a bitmap tracking them. One
* way to think of this is as which pages are dirty from the OS's
* perspective.
*/
size_t h_ntouched;
/* The touched pages (using the same definition as above). */
fb_group_t touched_pages[FB_NGROUPS(HUGEPAGE_PAGES)];
/* Time when this extent (hpdata) becomes eligible for purging */
nstime_t h_time_purge_allowed;
/* True if the extent was huge and empty last time when it was purged */
bool h_purged_when_empty_and_huge;
};
TYPED_LIST(hpdata_empty_list, hpdata_t, ql_link_empty)
TYPED_LIST(hpdata_purge_list, hpdata_t, ql_link_purge)
TYPED_LIST(hpdata_hugify_list, hpdata_t, ql_link_hugify)
ph_proto(, hpdata_age_heap, hpdata_t);
static inline void *
hpdata_addr_get(const hpdata_t *hpdata) {
return hpdata->h_address;
}
static inline void
hpdata_addr_set(hpdata_t *hpdata, void *addr) {
assert(HUGEPAGE_ADDR2BASE(addr) == addr);
hpdata->h_address = addr;
}
static inline uint64_t
hpdata_age_get(const hpdata_t *hpdata) {
return hpdata->h_age;
}
static inline void
hpdata_age_set(hpdata_t *hpdata, uint64_t age) {
hpdata->h_age = age;
}
static inline bool
hpdata_huge_get(const hpdata_t *hpdata) {
return hpdata->h_huge;
}
static inline bool
hpdata_alloc_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_alloc_allowed;
}
static inline void
hpdata_alloc_allowed_set(hpdata_t *hpdata, bool alloc_allowed) {
hpdata->h_alloc_allowed = alloc_allowed;
}
static inline bool
hpdata_in_psset_alloc_container_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset_alloc_container;
}
static inline void
hpdata_in_psset_alloc_container_set(hpdata_t *hpdata, bool in_container) {
assert(in_container != hpdata->h_in_psset_alloc_container);
hpdata->h_in_psset_alloc_container = in_container;
}
static inline bool
hpdata_purge_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_purge_allowed;
}
static inline void
hpdata_purge_allowed_set(hpdata_t *hpdata, bool purge_allowed) {
assert(purge_allowed == false || !hpdata->h_mid_purge);
hpdata->h_purge_allowed = purge_allowed;
}
static inline bool
hpdata_hugify_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_hugify_allowed;
}
static inline void
hpdata_allow_hugify(hpdata_t *hpdata, nstime_t now) {
assert(!hpdata->h_mid_hugify);
hpdata->h_hugify_allowed = true;
hpdata->h_time_hugify_allowed = now;
}
static inline nstime_t
hpdata_time_hugify_allowed(hpdata_t *hpdata) {
return hpdata->h_time_hugify_allowed;
}
static inline void
hpdata_disallow_hugify(hpdata_t *hpdata) {
hpdata->h_hugify_allowed = false;
}
static inline bool
hpdata_in_psset_hugify_container_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset_hugify_container;
}
static inline void
hpdata_in_psset_hugify_container_set(hpdata_t *hpdata, bool in_container) {
assert(in_container != hpdata->h_in_psset_hugify_container);
hpdata->h_in_psset_hugify_container = in_container;
}
static inline bool
hpdata_mid_purge_get(const hpdata_t *hpdata) {
return hpdata->h_mid_purge;
}
static inline void
hpdata_mid_purge_set(hpdata_t *hpdata, bool mid_purge) {
assert(mid_purge != hpdata->h_mid_purge);
hpdata->h_mid_purge = mid_purge;
}
static inline bool
hpdata_mid_hugify_get(const hpdata_t *hpdata) {
return hpdata->h_mid_hugify;
}
static inline void
hpdata_mid_hugify_set(hpdata_t *hpdata, bool mid_hugify) {
assert(mid_hugify != hpdata->h_mid_hugify);
hpdata->h_mid_hugify = mid_hugify;
}
static inline bool
hpdata_changing_state_get(const hpdata_t *hpdata) {
return hpdata->h_mid_purge || hpdata->h_mid_hugify;
}
static inline bool
hpdata_updating_get(const hpdata_t *hpdata) {
return hpdata->h_updating;
}
static inline void
hpdata_updating_set(hpdata_t *hpdata, bool updating) {
assert(updating != hpdata->h_updating);
hpdata->h_updating = updating;
}
static inline bool
hpdata_in_psset_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset;
}
static inline void
hpdata_in_psset_set(hpdata_t *hpdata, bool in_psset) {
assert(in_psset != hpdata->h_in_psset);
hpdata->h_in_psset = in_psset;
}
static inline size_t
hpdata_longest_free_range_get(const hpdata_t *hpdata) {
return hpdata->h_longest_free_range;
}
static inline void
hpdata_longest_free_range_set(hpdata_t *hpdata, size_t longest_free_range) {
assert(longest_free_range <= HUGEPAGE_PAGES);
hpdata->h_longest_free_range = longest_free_range;
}
static inline size_t
hpdata_nactive_get(const hpdata_t *hpdata) {
return hpdata->h_nactive;
}
static inline size_t
hpdata_ntouched_get(const hpdata_t *hpdata) {
return hpdata->h_ntouched;
}
static inline size_t
hpdata_ndirty_get(const hpdata_t *hpdata) {
return hpdata->h_ntouched - hpdata->h_nactive;
}
static inline size_t
hpdata_nretained_get(hpdata_t *hpdata) {
return HUGEPAGE_PAGES - hpdata->h_ntouched;
}
static inline void
hpdata_time_purge_allowed_set(hpdata_t *hpdata, const nstime_t *v) {
nstime_copy(&hpdata->h_time_purge_allowed, v);
}
static inline const nstime_t *
hpdata_time_purge_allowed_get(const hpdata_t *hpdata) {
return &hpdata->h_time_purge_allowed;
}
static inline bool
hpdata_purged_when_empty_and_huge_get(const hpdata_t *hpdata) {
return hpdata->h_purged_when_empty_and_huge;
}
static inline void
hpdata_purged_when_empty_and_huge_set(hpdata_t *hpdata, bool v) {
hpdata->h_purged_when_empty_and_huge = v;
}
static inline void
hpdata_assert_empty(hpdata_t *hpdata) {
assert(fb_empty(hpdata->active_pages, HUGEPAGE_PAGES));
assert(hpdata->h_nactive == 0);
}
/*
* Only used in tests, and in hpdata_assert_consistent, below. Verifies some
* consistency properties of the hpdata (e.g. that cached counts of page stats
* match computed ones).
*/
static inline bool
hpdata_consistent(hpdata_t *hpdata) {
bool res = true;
const size_t active_urange_longest = fb_urange_longest(
hpdata->active_pages, HUGEPAGE_PAGES);
const size_t longest_free_range = hpdata_longest_free_range_get(hpdata);
if (active_urange_longest != longest_free_range) {
malloc_printf(
"<jemalloc>: active_fb_urange_longest=%zu != hpdata_longest_free_range=%zu\n",
active_urange_longest, longest_free_range);
res = false;
}
const size_t active_scount = fb_scount(
hpdata->active_pages, HUGEPAGE_PAGES, 0, HUGEPAGE_PAGES);
if (active_scount != hpdata->h_nactive) {
malloc_printf(
"<jemalloc>: active_fb_scount=%zu != hpdata_nactive=%zu\n",
active_scount, hpdata->h_nactive);
res = false;
}
const size_t touched_scount = fb_scount(
hpdata->touched_pages, HUGEPAGE_PAGES, 0, HUGEPAGE_PAGES);
if (touched_scount != hpdata->h_ntouched) {
malloc_printf(
"<jemalloc>: touched_fb_scount=%zu != hpdata_ntouched=%zu\n",
touched_scount, hpdata->h_ntouched);
res = false;
}
if (hpdata->h_ntouched < hpdata->h_nactive) {
malloc_printf(
"<jemalloc>: hpdata_ntouched=%zu < hpdata_nactive=%zu\n",
hpdata->h_ntouched, hpdata->h_nactive);
res = false;
}
if (hpdata->h_huge && (hpdata->h_ntouched != HUGEPAGE_PAGES)) {
malloc_printf(
"<jemalloc>: hpdata_huge=%d && (hpdata_ntouched=%zu != hugepage_pages=%zu)\n",
hpdata->h_huge, hpdata->h_ntouched, HUGEPAGE_PAGES);
res = false;
}
const bool changing_state = hpdata_changing_state_get(hpdata);
if (changing_state
&& (hpdata->h_purge_allowed || hpdata->h_hugify_allowed)) {
malloc_printf(
"<jemalloc>: hpdata_changing_state=%d && (hpdata_purge_allowed=%d || hpdata_hugify_allowed=%d)\n",
changing_state, hpdata->h_purge_allowed,
hpdata->h_hugify_allowed);
res = false;
}
if (hpdata_hugify_allowed_get(hpdata)
!= hpdata_in_psset_hugify_container_get(hpdata)) {
malloc_printf(
"<jemalloc>: hpdata_hugify_allowed=%d != hpdata_in_psset_hugify_container=%d\n",
hpdata_hugify_allowed_get(hpdata),
hpdata_in_psset_hugify_container_get(hpdata));
res = false;
}
return res;
}
#define hpdata_assert_consistent(hpdata) \
do { \
assert(hpdata_consistent(hpdata)); \
} while (0)
static inline bool
hpdata_empty(const hpdata_t *hpdata) {
return hpdata->h_nactive == 0;
}
static inline bool
hpdata_full(const hpdata_t *hpdata) {
return hpdata->h_nactive == HUGEPAGE_PAGES;
}
void hpdata_init(hpdata_t *hpdata, void *addr, uint64_t age, bool is_huge);
/*
* Given an hpdata which can serve an allocation request, pick and reserve an
* offset within that allocation.
*/
void *hpdata_reserve_alloc(hpdata_t *hpdata, size_t sz);
void hpdata_unreserve(hpdata_t *hpdata, void *addr, size_t sz);
/*
* The hpdata_purge_state_t allows grabbing the metadata required to purge
* subranges of a hugepage while holding a lock, dropping the lock during the
* actual purging, and reacquiring it to update the metadata afterwards.
*/
typedef struct hpdata_purge_state_s hpdata_purge_state_t;
struct hpdata_purge_state_s {
size_t npurged;
size_t ndirty_to_purge;
fb_group_t to_purge[FB_NGROUPS(HUGEPAGE_PAGES)];
size_t next_purge_search_begin;
};
/*
* Initializes purge state. The access to hpdata must be externally
* synchronized with other hpdata_* calls.
*
* You can tell whether or not a thread is purging or hugifying a given hpdata
* via hpdata_changing_state_get(hpdata). Racing hugification or purging
* operations aren't allowed.
*
* Once you begin purging, you have to follow through and call hpdata_purge_next
* until you're done, and then end. Allocating out of an hpdata undergoing
* purging is not allowed.
*
* Returns the number of dirty pages that will be purged and sets *nranges
* to the number of ranges with dirty pages that will be purged.
*/
size_t hpdata_purge_begin(
hpdata_t *hpdata, hpdata_purge_state_t *purge_state, size_t *nranges);
/*
* If there are more extents to purge, sets *r_purge_addr and *r_purge_size to
* the address and size of the next range to purge, and returns true.
* Otherwise, returns false to indicate that we're done.
*
* This requires exclusive access to the purge state, but *not* to the hpdata.
* In particular, unreserve calls are allowed while purging (i.e. you can dalloc
* into one part of the hpdata while purging a different part).
*/
bool hpdata_purge_next(hpdata_t *hpdata, hpdata_purge_state_t *purge_state,
void **r_purge_addr, size_t *r_purge_size);
/*
* Updates the hpdata metadata after all purging is done. Needs external
* synchronization.
*/
void hpdata_purge_end(hpdata_t *hpdata, hpdata_purge_state_t *purge_state);
void hpdata_hugify(hpdata_t *hpdata);
void hpdata_dehugify(hpdata_t *hpdata);
#endif /* JEMALLOC_INTERNAL_HPDATA_H */
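Putting the purge protocol described above together, a caller-side sketch (locking elided; the purge callback parameter stands in for whatever actually releases the pages and is an assumption of this example):

static void
example_hpdata_purge(hpdata_t *hpdata, void (*purge)(void *addr, size_t size)) {
	hpdata_purge_state_t state;
	size_t nranges;
	/* Needs external synchronization with other hpdata_* calls. */
	size_t ndirty = hpdata_purge_begin(hpdata, &state, &nranges);
	(void)ndirty;
	(void)nranges;
	void *addr;
	size_t size;
	/* Safe to run without exclusive access to the hpdata; see above. */
	while (hpdata_purge_next(hpdata, &state, &addr, &size)) {
		purge(addr, size);
	}
	/* Needs external synchronization again. */
	hpdata_purge_end(hpdata, &state);
}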

View file

@ -0,0 +1,43 @@
#ifndef JEMALLOC_INTERNAL_INSPECT_H
#define JEMALLOC_INTERNAL_INSPECT_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd_types.h"
/*
* This module contains the heap introspection capabilities. For now they are
* exposed purely through mallctl APIs in the experimental namespace, but this
* may change over time.
*/
/*
* The following two structs are for experimental purposes. See
* experimental_utilization_query_ctl and
* experimental_utilization_batch_query_ctl in src/ctl.c.
*/
typedef struct inspect_extent_util_stats_s inspect_extent_util_stats_t;
struct inspect_extent_util_stats_s {
size_t nfree;
size_t nregs;
size_t size;
};
typedef struct inspect_extent_util_stats_verbose_s
inspect_extent_util_stats_verbose_t;
struct inspect_extent_util_stats_verbose_s {
void *slabcur_addr;
size_t nfree;
size_t nregs;
size_t size;
size_t bin_nfree;
size_t bin_nregs;
};
void inspect_extent_util_stats_get(
tsdn_t *tsdn, const void *ptr, size_t *nfree, size_t *nregs, size_t *size);
void inspect_extent_util_stats_verbose_get(tsdn_t *tsdn, const void *ptr,
size_t *nfree, size_t *nregs, size_t *size, size_t *bin_nfree,
size_t *bin_nregs, void **slabcur_addr);
#endif /* JEMALLOC_INTERNAL_INSPECT_H */
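A sketch of what an internal caller of the verbose query might look like; the mallctl wrappers in src/ctl.c remain the externally visible route, and the malloc_printf reporting here is purely illustrative.

static void
example_inspect(tsdn_t *tsdn, const void *ptr) {
	size_t nfree, nregs, size, bin_nfree, bin_nregs;
	void *slabcur;
	inspect_extent_util_stats_verbose_get(tsdn, ptr, &nfree, &nregs, &size,
	    &bin_nfree, &bin_nregs, &slabcur);
	malloc_printf("<jemalloc>: %zu of %zu regions free in a %zu-byte extent\n",
	    nfree, nregs, size);
}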

View file

@ -3,46 +3,65 @@
#include <math.h>
#ifdef _WIN32
# include <windows.h>
# include "msvc_compat/windows_extra.h"
# include <windows.h>
# include "msvc_compat/windows_extra.h"
# include "msvc_compat/strings.h"
# ifdef _WIN64
# if LG_VADDR <= 32
# error Generate the headers using x64 vcargs
# endif
# else
# if LG_VADDR > 32
# undef LG_VADDR
# define LG_VADDR 32
# endif
# endif
#else
# include <sys/param.h>
# include <sys/mman.h>
# if !defined(__pnacl__) && !defined(__native_client__)
# include <sys/syscall.h>
# if !defined(SYS_write) && defined(__NR_write)
# define SYS_write __NR_write
# endif
# if defined(SYS_open) && defined(__aarch64__)
/* Android headers may define SYS_open to __NR_open even though
# include <sys/param.h>
# include <sys/mman.h>
# if !defined(__pnacl__) && !defined(__native_client__)
# include <sys/syscall.h>
# if !defined(SYS_write) && defined(__NR_write)
# define SYS_write __NR_write
# endif
# if defined(SYS_open) && defined(__aarch64__)
/* Android headers may define SYS_open to __NR_open even though
* __NR_open may not exist on AArch64 (superseded by __NR_openat). */
# undef SYS_open
# endif
# include <sys/uio.h>
# endif
# include <pthread.h>
# ifdef JEMALLOC_OS_UNFAIR_LOCK
# include <os/lock.h>
# endif
# ifdef JEMALLOC_GLIBC_MALLOC_HOOK
# include <sched.h>
# endif
# include <errno.h>
# include <sys/time.h>
# include <time.h>
# ifdef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
# include <mach/mach_time.h>
# endif
# undef SYS_open
# endif
# include <sys/uio.h>
# endif
# include <pthread.h>
# if defined(__FreeBSD__) || defined(__DragonFly__) \
|| defined(__OpenBSD__)
# include <pthread_np.h>
# include <sched.h>
# if defined(__FreeBSD__)
# define cpu_set_t cpuset_t
# endif
# endif
# include <signal.h>
# ifdef JEMALLOC_OS_UNFAIR_LOCK
# include <os/lock.h>
# endif
# ifdef JEMALLOC_GLIBC_MALLOC_HOOK
# include <sched.h>
# endif
# include <errno.h>
# include <sys/time.h>
# include <time.h>
# ifdef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
# include <mach/mach_time.h>
# endif
#endif
#include <sys/types.h>
#include <limits.h>
#ifndef SIZE_T_MAX
# define SIZE_T_MAX SIZE_MAX
# define SIZE_T_MAX SIZE_MAX
#endif
#ifndef SSIZE_MAX
# define SSIZE_MAX ((ssize_t)(SIZE_T_MAX >> 1))
# define SSIZE_MAX ((ssize_t)(SIZE_T_MAX >> 1))
#endif
#include <stdarg.h>
#include <stdbool.h>
@ -51,31 +70,57 @@
#include <stdint.h>
#include <stddef.h>
#ifndef offsetof
# define offsetof(type, member) ((size_t)&(((type *)NULL)->member))
# define offsetof(type, member) ((size_t) & (((type *)NULL)->member))
#endif
#include <string.h>
#include <strings.h>
#include <ctype.h>
#ifdef _MSC_VER
# include <io.h>
# include <io.h>
typedef intptr_t ssize_t;
# define PATH_MAX 1024
# define STDERR_FILENO 2
# define __func__ __FUNCTION__
# ifdef JEMALLOC_HAS_RESTRICT
# define restrict __restrict
# endif
# define PATH_MAX 1024
# define STDERR_FILENO 2
# define __func__ __FUNCTION__
# ifdef JEMALLOC_HAS_RESTRICT
# define restrict __restrict
# endif
/* Disable warnings about deprecated system functions. */
# pragma warning(disable: 4996)
#if _MSC_VER < 1800
# pragma warning(disable : 4996)
# if _MSC_VER < 1800
static int
isblank(int c) {
return (c == '\t' || c == ' ');
}
#endif
# endif
#else
# include <unistd.h>
# include <unistd.h>
#endif
#include <fcntl.h>
/*
* The Win32 midl compiler has #define small char; we don't use midl, but
* "small" is a nice identifier to have available when talking about size
* classes.
*/
#ifdef small
# undef small
#endif
/*
* Oftentimes we'd like to perform some kind of arithmetic to obtain
* a pointer from another pointer but with some offset or mask applied.
* Naively you would accomplish this by casting the source pointer to
* `uintptr_t`, performing all of the relevant arithmetic, and then casting
* the result to the desired pointer type. However, this has the unfortunate
* side-effect of concealing pointer provenance, hiding useful information for
* optimization from the compiler (see here for details:
* https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html
* )
* Instead what one should do is cast the source pointer to `char *` and perform
* the equivalent arithmetic (since `char` of course represents one byte). But
* because `char *` has the semantic meaning of "string", we define this typedef
* simply to make it clearer where we are performing such pointer arithmetic.
*/
typedef char byte_t;
#endif /* JEMALLOC_INTERNAL_H */
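For illustration of the byte_t convention described above (the helper name is hypothetical):

static inline void *
example_ptr_offset(void *base, size_t offset) {
	/* Offset via byte_t * rather than uintptr_t to preserve pointer
	 * provenance, per the note above. */
	return (void *)((byte_t *)base + offset);
}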

View file

@ -14,10 +14,13 @@
*/
#undef JEMALLOC_OVERRIDE___LIBC_CALLOC
#undef JEMALLOC_OVERRIDE___LIBC_FREE
#undef JEMALLOC_OVERRIDE___LIBC_FREE_SIZED
#undef JEMALLOC_OVERRIDE___LIBC_FREE_ALIGNED_SIZED
#undef JEMALLOC_OVERRIDE___LIBC_MALLOC
#undef JEMALLOC_OVERRIDE___LIBC_MEMALIGN
#undef JEMALLOC_OVERRIDE___LIBC_REALLOC
#undef JEMALLOC_OVERRIDE___LIBC_VALLOC
#undef JEMALLOC_OVERRIDE___LIBC_PVALLOC
#undef JEMALLOC_OVERRIDE___POSIX_MEMALIGN
/*
@ -33,6 +36,8 @@
* order to yield to another virtual CPU.
*/
#undef CPU_SPINWAIT
/* 1 if CPU_SPINWAIT is defined, 0 otherwise. */
#undef HAVE_CPU_SPINWAIT
/*
* Number of significant bits in virtual addresses. This may be less than the
@ -46,25 +51,13 @@
/* Defined if GCC __atomic atomics are available. */
#undef JEMALLOC_GCC_ATOMIC_ATOMICS
/* and the 8-bit variant support. */
#undef JEMALLOC_GCC_U8_ATOMIC_ATOMICS
/* Defined if GCC __sync atomics are available. */
#undef JEMALLOC_GCC_SYNC_ATOMICS
/*
* Defined if __sync_add_and_fetch(uint32_t *, uint32_t) and
* __sync_sub_and_fetch(uint32_t *, uint32_t) are available, despite
* __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 not being defined (which means the
* functions are defined in libgcc instead of being inlines).
*/
#undef JE_FORCE_SYNC_COMPARE_AND_SWAP_4
/*
* Defined if __sync_add_and_fetch(uint64_t *, uint64_t) and
* __sync_sub_and_fetch(uint64_t *, uint64_t) are available, despite
* __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 not being defined (which means the
* functions are defined in libgcc instead of being inlines).
*/
#undef JE_FORCE_SYNC_COMPARE_AND_SWAP_8
/* and the 8-bit variant support. */
#undef JEMALLOC_GCC_U8_SYNC_ATOMICS
/*
* Defined if __builtin_clz() and __builtin_clzl() are available.
@ -76,12 +69,6 @@
*/
#undef JEMALLOC_OS_UNFAIR_LOCK
/*
* Defined if OSSpin*() functions are available, as provided by Darwin, and
* documented in the spinlock(3) manual page.
*/
#undef JEMALLOC_OSSPIN
/* Defined if syscall(2) is usable. */
#undef JEMALLOC_USE_SYSCALL
@ -98,6 +85,18 @@
/* Defined if pthread_atfork(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_ATFORK
/* Defined if pthread_setname_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_SETNAME_NP
/* Defined if pthread_getname_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_GETNAME_NP
/* Defined if pthread_set_name_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_SET_NAME_NP
/* Defined if pthread_get_name_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_GET_NAME_NP
/*
* Defined if clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is available.
*/
@ -113,6 +112,16 @@
*/
#undef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
/*
* Defined if clock_gettime(CLOCK_REALTIME, ...) is available.
*/
#undef JEMALLOC_HAVE_CLOCK_REALTIME
/*
* Defined if clock_gettime_nsec_np(CLOCK_UPTIME_RAW) is available.
*/
#undef JEMALLOC_HAVE_CLOCK_GETTIME_NSEC_NP
/*
* Defined if _malloc_thread_cleanup() exists. At least in the case of
* FreeBSD, pthread_key_create() allocates, which if used during malloc
@ -148,6 +157,9 @@
/* JEMALLOC_STATS enables statistics calculation. */
#undef JEMALLOC_STATS
/* JEMALLOC_EXPERIMENTAL_SMALLOCX_API enables experimental smallocx API. */
#undef JEMALLOC_EXPERIMENTAL_SMALLOCX_API
/* JEMALLOC_PROF enables allocation profiling. */
#undef JEMALLOC_PROF
@ -160,6 +172,15 @@
/* Use gcc intrinsics for profile backtracing if defined. */
#undef JEMALLOC_PROF_GCC
/* Use frame pointer for profile backtracing if defined. Linux only. */
#undef JEMALLOC_PROF_FRAME_POINTER
/* JEMALLOC_PAGEID enabled page id */
#undef JEMALLOC_PAGEID
/* JEMALLOC_HAVE_PRCTL checks prctl */
#undef JEMALLOC_HAVE_PRCTL
/*
* JEMALLOC_DSS enables use of sbrk(2) to allocate extents from the data storage
* segment (DSS).
@ -172,6 +193,9 @@
/* Support utrace(2)-based tracing. */
#undef JEMALLOC_UTRACE
/* Support utrace(2)-based tracing (label based signature). */
#undef JEMALLOC_UTRACE_LABEL
/* Support optional abort() on OOM. */
#undef JEMALLOC_XMALLOC
@ -187,6 +211,9 @@
/* One page is 2^LG_PAGE bytes. */
#undef LG_PAGE
/* Maximum number of regions in a slab. */
#undef CONFIG_LG_SLAB_MAXREGS
/*
* One huge page is 2^LG_HUGEPAGE bytes. Note that this is defined even if the
* system does not explicitly support huge pages; system calls that require
@ -228,12 +255,36 @@
#undef JEMALLOC_INTERNAL_FFSL
#undef JEMALLOC_INTERNAL_FFS
/*
* popcount*() functions to use for bitmapping.
*/
#undef JEMALLOC_INTERNAL_POPCOUNTL
#undef JEMALLOC_INTERNAL_POPCOUNT
/*
* If defined, explicitly attempt to more uniformly distribute large allocation
* pointer alignments across all cache indices.
*/
#undef JEMALLOC_CACHE_OBLIVIOUS
/*
* If defined, enable logging facilities. We make this a configure option to
* avoid taking extra branches everywhere.
*/
#undef JEMALLOC_LOG
/*
* If defined, use readlinkat() (instead of readlink()) to follow
* /etc/malloc_conf.
*/
#undef JEMALLOC_READLINKAT
/*
* If defined, use getenv() (instead of secure_getenv() or
* alternatives) to access MALLOC_CONF.
*/
#undef JEMALLOC_FORCE_GETENV
/*
* Darwin (OS X) uses zones to work around Mach-O symbol override shortcomings.
*/
@ -251,6 +302,19 @@
/* Defined if madvise(2) is available. */
#undef JEMALLOC_HAVE_MADVISE
/*
* Defined if transparent huge pages are supported via the MADV_[NO]HUGEPAGE
* arguments to madvise(2).
*/
#undef JEMALLOC_HAVE_MADVISE_HUGE
/*
* Defined if best-effort synchronous collapse of the native
* pages mapped by the memory range into transparent huge pages is supported
* via MADV_COLLAPSE arguments to madvise(2).
*/
#undef JEMALLOC_HAVE_MADVISE_COLLAPSE
/*
* Methods for purging unused pages differ between operating systems.
*
@ -268,12 +332,63 @@
#undef JEMALLOC_PURGE_MADVISE_DONTNEED
#undef JEMALLOC_PURGE_MADVISE_DONTNEED_ZEROS
/* Defined if madvise(2) is available but MADV_FREE is not (x86 Linux only). */
#undef JEMALLOC_DEFINE_MADVISE_FREE
/*
* Defined if MADV_DO[NT]DUMP is supported as an argument to madvise.
*/
#undef JEMALLOC_MADVISE_DONTDUMP
/*
* Defined if MADV_[NO]CORE is supported as an argument to madvise.
*/
#undef JEMALLOC_MADVISE_NOCORE
/* Defined if process_madvise(2) is available. */
#undef JEMALLOC_HAVE_PROCESS_MADVISE
#undef EXPERIMENTAL_SYS_PROCESS_MADVISE_NR
/* Defined if mprotect(2) is available. */
#undef JEMALLOC_HAVE_MPROTECT
/* Defined if sys/sdt.h is available and sdt tracing enabled */
#undef JEMALLOC_EXPERIMENTAL_USDT_STAP
/*
* Defined if sys/sdt.h is unavailable, sdt tracing enabled, and
* platform is supported
*/
#undef JEMALLOC_EXPERIMENTAL_USDT_CUSTOM
/*
* Defined if transparent huge pages (THPs) are supported via the
* MADV_[NO]HUGEPAGE arguments to madvise(2), and THP support is enabled.
*/
#undef JEMALLOC_THP
/* Defined if posix_madvise is available. */
#undef JEMALLOC_HAVE_POSIX_MADVISE
/*
* Method for purging unused pages using posix_madvise.
*
* posix_madvise(..., POSIX_MADV_DONTNEED)
*/
#undef JEMALLOC_PURGE_POSIX_MADVISE_DONTNEED
#undef JEMALLOC_PURGE_POSIX_MADVISE_DONTNEED_ZEROS
/*
* Defined if memcntl page admin call is supported
*/
#undef JEMALLOC_HAVE_MEMCNTL
/*
* Defined if malloc_size is supported
*/
#undef JEMALLOC_HAVE_MALLOC_SIZE
/* Define if operating system has alloca.h header. */
#undef JEMALLOC_HAS_ALLOCA_H
@ -310,12 +425,18 @@
/* Adaptive mutex support in pthreads. */
#undef JEMALLOC_HAVE_PTHREAD_MUTEX_ADAPTIVE_NP
/* gettid() support */
#undef JEMALLOC_HAVE_GETTID
/* GNU specific sched_getcpu support */
#undef JEMALLOC_HAVE_SCHED_GETCPU
/* GNU specific sched_setaffinity support */
#undef JEMALLOC_HAVE_SCHED_SETAFFINITY
/* pthread_setaffinity_np support */
#undef JEMALLOC_HAVE_PTHREAD_SETAFFINITY_NP
/*
* If defined, all the features necessary for background threads are present.
*/
@ -333,4 +454,41 @@
/* If defined, jemalloc takes the malloc/free/etc. symbol names. */
#undef JEMALLOC_IS_MALLOC
/*
* Defined if strerror_r returns char * if _GNU_SOURCE is defined.
*/
#undef JEMALLOC_STRERROR_R_RETURNS_CHAR_WITH_GNU_SOURCE
/* Performs additional safety checks when defined. */
#undef JEMALLOC_OPT_SAFETY_CHECKS
/* Is C++ support being built? */
#undef JEMALLOC_ENABLE_CXX
/* Performs additional size checks when defined. */
#undef JEMALLOC_OPT_SIZE_CHECKS
/* Allows sampled junk and stash for checking use-after-free when defined. */
#undef JEMALLOC_UAF_DETECTION
/* Darwin VM_MAKE_TAG support */
#undef JEMALLOC_HAVE_VM_MAKE_TAG
/* If defined, realloc(ptr, 0) defaults to "free" instead of "alloc". */
#undef JEMALLOC_ZERO_REALLOC_DEFAULT_FREE
/* If defined, use volatile asm during benchmarks. */
#undef JEMALLOC_HAVE_ASM_VOLATILE
/*
* If defined, support the use of rdtscp to get the time stamp counter
* and the processor ID.
*/
#undef JEMALLOC_HAVE_RDTSCP
/* If defined, use __int128 for optimization. */
#undef JEMALLOC_HAVE_INT128
#include "jemalloc/internal/jemalloc_internal_overrides.h"
#endif /* JEMALLOC_INTERNAL_DEFS_H_ */

View file

@ -1,23 +1,55 @@
#ifndef JEMALLOC_INTERNAL_EXTERNS_H
#define JEMALLOC_INTERNAL_EXTERNS_H
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/fxp.h"
#include "jemalloc/internal/hpa_opts.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/sec_opts.h"
#include "jemalloc/internal/tsd_types.h"
/* TSD checks this to set thread local slow state accordingly. */
extern bool malloc_slow;
/* Run-time options. */
extern bool opt_abort;
extern bool opt_abort_conf;
extern bool opt_abort;
extern bool opt_abort_conf;
extern bool opt_trust_madvise;
extern bool opt_experimental_hpa_start_huge_if_thp_always;
extern bool opt_experimental_hpa_enforce_hugify;
extern bool opt_confirm_conf;
extern bool opt_hpa;
extern hpa_shard_opts_t opt_hpa_opts;
extern sec_opts_t opt_hpa_sec_opts;
extern const char *opt_junk;
extern bool opt_junk_alloc;
extern bool opt_junk_free;
extern bool opt_utrace;
extern bool opt_xmalloc;
extern bool opt_zero;
extern unsigned opt_narenas;
extern bool opt_junk_alloc;
extern bool opt_junk_free;
extern void (*JET_MUTABLE junk_free_callback)(void *ptr, size_t size);
extern void (*JET_MUTABLE junk_alloc_callback)(void *ptr, size_t size);
extern void (*JET_MUTABLE invalid_conf_abort)(void);
extern bool opt_utrace;
extern bool opt_xmalloc;
extern bool opt_experimental_infallible_new;
extern bool opt_experimental_tcache_gc;
extern bool opt_zero;
extern unsigned opt_narenas;
extern fxp_t opt_narenas_ratio;
extern zero_realloc_action_t opt_zero_realloc_action;
extern malloc_init_t malloc_init_state;
extern const char *const zero_realloc_mode_names[];
extern atomic_zu_t zero_realloc_count;
extern bool opt_cache_oblivious;
extern unsigned opt_debug_double_free_max_scan;
extern size_t opt_calloc_madvise_threshold;
extern bool opt_disable_large_size_classes;
extern const char *opt_malloc_conf_symlink;
extern const char *opt_malloc_conf_env_var;
/* Escape free-fastpath when ptr & mask == 0 (for sanitization purpose). */
extern uintptr_t san_cache_bin_nonfast_mask;
/* Number of CPUs. */
extern unsigned ncpus;
@ -25,29 +57,35 @@ extern unsigned ncpus;
/* Number of arenas used for automatic multiplexing of threads and arenas. */
extern unsigned narenas_auto;
/* Base index for manual arenas. */
extern unsigned manual_arena_base;
/*
* Arenas that are used to service external requests. Not all elements of the
* arenas array are necessarily used; arenas are created lazily as needed.
*/
extern atomic_p_t arenas[];
void *a0malloc(size_t size);
void a0dalloc(void *ptr);
void *bootstrap_malloc(size_t size);
void *bootstrap_calloc(size_t num, size_t size);
void bootstrap_free(void *ptr);
void arena_set(unsigned ind, arena_t *arena);
extern unsigned huge_arena_ind;
void *a0malloc(size_t size);
void a0dalloc(void *ptr);
void *bootstrap_malloc(size_t size);
void *bootstrap_calloc(size_t num, size_t size);
void bootstrap_free(void *ptr);
void arena_set(unsigned ind, arena_t *arena);
unsigned narenas_total_get(void);
arena_t *arena_init(tsdn_t *tsdn, unsigned ind, extent_hooks_t *extent_hooks);
arena_tdata_t *arena_tdata_get_hard(tsd_t *tsd, unsigned ind);
arena_t *arena_init(tsdn_t *tsdn, unsigned ind, const arena_config_t *config);
arena_t *arena_choose_hard(tsd_t *tsd, bool internal);
void arena_migrate(tsd_t *tsd, unsigned oldind, unsigned newind);
void iarena_cleanup(tsd_t *tsd);
void arena_cleanup(tsd_t *tsd);
void arenas_tdata_cleanup(tsd_t *tsd);
void jemalloc_prefork(void);
void jemalloc_postfork_parent(void);
void jemalloc_postfork_child(void);
bool malloc_initialized(void);
void arena_migrate(tsd_t *tsd, arena_t *oldarena, arena_t *newarena);
void iarena_cleanup(tsd_t *tsd);
void arena_cleanup(tsd_t *tsd);
size_t batch_alloc(void **ptrs, size_t num, size_t size, int flags);
void jemalloc_prefork(void);
void jemalloc_postfork_parent(void);
void jemalloc_postfork_child(void);
void sdallocx_default(void *ptr, size_t size, int flags);
void free_default(void *ptr);
void *malloc_default(size_t size);
#endif /* JEMALLOC_INTERNAL_EXTERNS_H */
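A caller-side sketch for the batch_alloc entry point declared above. Its semantics are assumed from the prototype here: the return value is taken to be the number of slots actually filled, and flags are taken to be the same kind accepted by mallocx.

static size_t
example_batch_alloc(void *ptrs[8]) {
	/* Try to obtain eight 128-byte allocations in a single call. */
	size_t filled = batch_alloc(ptrs, 8, 128, 0);
	/* Only ptrs[0..filled) are presumed valid; filled may be short of 8. */
	return filled;
}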

View file

@ -10,7 +10,7 @@
* structs, externs, and inlines), and included each header file multiple times
* in this file, picking out the portion we want on each pass using the
* following #defines:
* JEMALLOC_H_TYPES : Preprocessor-defined constants and psuedo-opaque data
* JEMALLOC_H_TYPES : Preprocessor-defined constants and pseudo-opaque data
* types.
* JEMALLOC_H_STRUCTS : Data structures.
* JEMALLOC_H_EXTERNS : Extern data declarations and function prototypes.
@ -40,8 +40,6 @@
/* TYPES */
/******************************************************************************/
#include "jemalloc/internal/extent_types.h"
#include "jemalloc/internal/base_types.h"
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/tcache_types.h"
#include "jemalloc/internal/prof_types.h"
@ -50,11 +48,8 @@
/* STRUCTS */
/******************************************************************************/
#include "jemalloc/internal/arena_structs_a.h"
#include "jemalloc/internal/extent_structs.h"
#include "jemalloc/internal/base_structs.h"
#include "jemalloc/internal/prof_structs.h"
#include "jemalloc/internal/arena_structs_b.h"
#include "jemalloc/internal/arena_structs.h"
#include "jemalloc/internal/tcache_structs.h"
#include "jemalloc/internal/background_thread_structs.h"
@ -63,8 +58,6 @@
/******************************************************************************/
#include "jemalloc/internal/jemalloc_internal_externs.h"
#include "jemalloc/internal/extent_externs.h"
#include "jemalloc/internal/base_externs.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/large_externs.h"
#include "jemalloc/internal/tcache_externs.h"
@ -76,19 +69,16 @@
/******************************************************************************/
#include "jemalloc/internal/jemalloc_internal_inlines_a.h"
#include "jemalloc/internal/base_inlines.h"
/*
* Include portions of arena code interleaved with tcache code in order to
* resolve circular dependencies.
*/
#include "jemalloc/internal/prof_inlines_a.h"
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/extent_inlines.h"
#include "jemalloc/internal/jemalloc_internal_inlines_b.h"
#include "jemalloc/internal/tcache_inlines.h"
#include "jemalloc/internal/arena_inlines_b.h"
#include "jemalloc/internal/jemalloc_internal_inlines_c.h"
#include "jemalloc/internal/prof_inlines_b.h"
#include "jemalloc/internal/prof_inlines.h"
#include "jemalloc/internal/background_thread_inlines.h"
#endif /* JEMALLOC_INTERNAL_INCLUDES_H */

View file

@ -1,17 +1,32 @@
#ifndef JEMALLOC_INTERNAL_INLINES_A_H
#define JEMALLOC_INTERNAL_INLINES_A_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/size_classes.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/tcache_externs.h"
#include "jemalloc/internal/ticker.h"
JEMALLOC_ALWAYS_INLINE malloc_cpuid_t
malloc_getcpu(void) {
assert(have_percpu_arena);
#if defined(JEMALLOC_HAVE_SCHED_GETCPU)
#if defined(_WIN32)
return GetCurrentProcessorNumber();
#elif defined(JEMALLOC_HAVE_SCHED_GETCPU)
return (malloc_cpuid_t)sched_getcpu();
#elif defined(JEMALLOC_HAVE_RDTSCP)
unsigned int ecx;
asm volatile("rdtscp" : "=c"(ecx)::"eax", "edx");
return (malloc_cpuid_t)(ecx & 0xfff);
#elif defined(__aarch64__) && defined(__APPLE__)
/* Other oses most likely use tpidr_el0 instead */
uintptr_t c;
asm volatile("mrs %x0, tpidrro_el0" : "=r"(c)::"memory");
return (malloc_cpuid_t)(c & (1 << 3) - 1);
#else
not_reached();
return -1;
@ -27,8 +42,8 @@ percpu_arena_choose(void) {
assert(cpuid >= 0);
unsigned arena_ind;
if ((opt_percpu_arena == percpu_arena) || ((unsigned)cpuid < ncpus /
2)) {
if ((opt_percpu_arena == percpu_arena)
|| ((unsigned)cpuid < ncpus / 2)) {
arena_ind = cpuid;
} else {
assert(opt_percpu_arena == per_phycpu_arena);
@ -54,31 +69,6 @@ percpu_arena_ind_limit(percpu_arena_mode_t mode) {
}
}
static inline arena_tdata_t *
arena_tdata_get(tsd_t *tsd, unsigned ind, bool refresh_if_missing) {
arena_tdata_t *tdata;
arena_tdata_t *arenas_tdata = tsd_arenas_tdata_get(tsd);
if (unlikely(arenas_tdata == NULL)) {
/* arenas_tdata hasn't been initialized yet. */
return arena_tdata_get_hard(tsd, ind);
}
if (unlikely(ind >= tsd_narenas_tdata_get(tsd))) {
/*
* ind is invalid, cache is old (too small), or tdata to be
* initialized.
*/
return (refresh_if_missing ? arena_tdata_get_hard(tsd, ind) :
NULL);
}
tdata = &arenas_tdata[ind];
if (likely(tdata != NULL) || !refresh_if_missing) {
return tdata;
}
return arena_tdata_get_hard(tsd, ind);
}
static inline arena_t *
arena_get(tsdn_t *tsdn, unsigned ind, bool init_if_missing) {
arena_t *ret;
@ -88,36 +78,12 @@ arena_get(tsdn_t *tsdn, unsigned ind, bool init_if_missing) {
ret = (arena_t *)atomic_load_p(&arenas[ind], ATOMIC_ACQUIRE);
if (unlikely(ret == NULL)) {
if (init_if_missing) {
ret = arena_init(tsdn, ind,
(extent_hooks_t *)&extent_hooks_default);
ret = arena_init(tsdn, ind, &arena_config_default);
}
}
return ret;
}
static inline ticker_t *
decay_ticker_get(tsd_t *tsd, unsigned ind) {
arena_tdata_t *tdata;
tdata = arena_tdata_get(tsd, ind, true);
if (unlikely(tdata == NULL)) {
return NULL;
}
return &tdata->decay_ticker;
}
JEMALLOC_ALWAYS_INLINE tcache_bin_t *
tcache_small_bin_get(tcache_t *tcache, szind_t binind) {
assert(binind < NBINS);
return &tcache->tbins_small[binind];
}
JEMALLOC_ALWAYS_INLINE tcache_bin_t *
tcache_large_bin_get(tcache_t *tcache, szind_t binind) {
assert(binind >= NBINS && binind < nhbins);
return &tcache->tbins_large[binind - NBINS];
}
JEMALLOC_ALWAYS_INLINE bool
tcache_available(tsd_t *tsd) {
/*
@ -127,9 +93,9 @@ tcache_available(tsd_t *tsd) {
*/
if (likely(tsd_tcache_enabled_get(tsd))) {
/* Associated arena == NULL implies tcache init in progress. */
assert(tsd_tcachep_get(tsd)->arena == NULL ||
tcache_small_bin_get(tsd_tcachep_get(tsd), 0)->avail !=
NULL);
if (config_debug && tsd_tcache_slowp_get(tsd)->arena != NULL) {
tcache_assert_initialized(tsd_tcachep_get(tsd));
}
return true;
}
@ -145,24 +111,25 @@ tcache_get(tsd_t *tsd) {
return tsd_tcachep_get(tsd);
}
static inline void
pre_reentrancy(tsd_t *tsd) {
bool fast = tsd_fast(tsd);
++*tsd_reentrancy_levelp_get(tsd);
if (fast) {
/* Prepare slow path for reentrancy. */
tsd_slow_update(tsd);
assert(tsd->state == tsd_state_nominal_slow);
JEMALLOC_ALWAYS_INLINE tcache_slow_t *
tcache_slow_get(tsd_t *tsd) {
if (!tcache_available(tsd)) {
return NULL;
}
return tsd_tcache_slowp_get(tsd);
}
static inline void
pre_reentrancy(tsd_t *tsd, arena_t *arena) {
/* arena is the current context. Reentry from a0 is not allowed. */
assert(arena != arena_get(tsd_tsdn(tsd), 0, false));
tsd_pre_reentrancy_raw(tsd);
}
static inline void
post_reentrancy(tsd_t *tsd) {
int8_t *reentrancy_level = tsd_reentrancy_levelp_get(tsd);
assert(*reentrancy_level > 0);
if (--*reentrancy_level == 0) {
tsd_slow_update(tsd);
}
tsd_post_reentrancy_raw(tsd);
}
#endif /* JEMALLOC_INTERNAL_INLINES_A_H */


@ -1,7 +1,34 @@
#ifndef JEMALLOC_INTERNAL_INLINES_B_H
#define JEMALLOC_INTERNAL_INLINES_B_H
#include "jemalloc/internal/rtree.h"
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/extent.h"
#include "jemalloc/internal/jemalloc_internal_inlines_a.h"
static inline void
percpu_arena_update(tsd_t *tsd, unsigned cpu) {
assert(have_percpu_arena);
arena_t *oldarena = tsd_arena_get(tsd);
assert(oldarena != NULL);
unsigned oldind = arena_ind_get(oldarena);
if (oldind != cpu) {
unsigned newind = cpu;
arena_t *newarena = arena_get(tsd_tsdn(tsd), newind, true);
assert(newarena != NULL);
/* Set new arena/tcache associations. */
arena_migrate(tsd, oldarena, newarena);
tcache_t *tcache = tcache_get(tsd);
if (tcache != NULL) {
tcache_slow_t *tcache_slow = tsd_tcache_slowp_get(tsd);
assert(tcache_slow->arena != NULL);
tcache_arena_reassociate(
tsd_tsdn(tsd), tcache_slow, tcache, newarena);
}
}
}
/* Choose an arena based on a per-thread value. */
static inline arena_t *
@ -22,18 +49,19 @@ arena_choose_impl(tsd_t *tsd, arena_t *arena, bool internal) {
ret = arena_choose_hard(tsd, internal);
assert(ret);
if (tcache_available(tsd)) {
tcache_t *tcache = tcache_get(tsd);
if (tcache->arena != NULL) {
/* See comments in tcache_data_init().*/
assert(tcache->arena ==
arena_get(tsd_tsdn(tsd), 0, false));
if (tcache->arena != ret) {
tcache_slow_t *tcache_slow = tsd_tcache_slowp_get(tsd);
tcache_t *tcache = tsd_tcachep_get(tsd);
if (tcache_slow->arena != NULL) {
/* See comments in tsd_tcache_data_init().*/
assert(tcache_slow->arena
== arena_get(tsd_tsdn(tsd), 0, false));
if (tcache_slow->arena != ret) {
tcache_arena_reassociate(tsd_tsdn(tsd),
tcache, ret);
tcache_slow, tcache, ret);
}
} else {
tcache_arena_associate(tsd_tsdn(tsd), tcache,
ret);
tcache_arena_associate(
tsd_tsdn(tsd), tcache_slow, tcache, ret);
}
}
}
@ -43,10 +71,10 @@ arena_choose_impl(tsd_t *tsd, arena_t *arena, bool internal) {
* auto percpu arena range, (i.e. thread is assigned to a manually
* managed arena), then percpu arena is skipped.
*/
if (have_percpu_arena && PERCPU_ARENA_ENABLED(opt_percpu_arena) &&
!internal && (arena_ind_get(ret) <
percpu_arena_ind_limit(opt_percpu_arena)) && (ret->last_thd !=
tsd_tsdn(tsd))) {
if (have_percpu_arena && PERCPU_ARENA_ENABLED(opt_percpu_arena)
&& !internal
&& (arena_ind_get(ret) < percpu_arena_ind_limit(opt_percpu_arena))
&& (ret->last_thd != tsd_tsdn(tsd))) {
unsigned ind = percpu_arena_choose();
if (arena_ind_get(ret) != ind) {
percpu_arena_update(tsd, ind);
@ -71,16 +99,8 @@ arena_ichoose(tsd_t *tsd, arena_t *arena) {
static inline bool
arena_is_auto(arena_t *arena) {
assert(narenas_auto > 0);
return (arena_ind_get(arena) < narenas_auto);
}
JEMALLOC_ALWAYS_INLINE extent_t *
iealloc(tsdn_t *tsdn, const void *ptr) {
rtree_ctx_t rtree_ctx_fallback;
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback);
return rtree_extent_read(tsdn, &extents_rtree, rtree_ctx,
(uintptr_t)ptr, true);
return (arena_ind_get(arena) < manual_arena_base);
}
#endif /* JEMALLOC_INTERNAL_INLINES_B_H */


@ -1,10 +1,44 @@
#ifndef JEMALLOC_INTERNAL_INLINES_C_H
#define JEMALLOC_INTERNAL_INLINES_C_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_inlines_b.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/hook.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/log.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/thread_event.h"
#include "jemalloc/internal/witness.h"
/*
* These correspond to the macros in jemalloc/jemalloc_macros.h. Broadly, we
* should have one constant here per magic value there. Note however that the
* representations need not be related.
*/
#define TCACHE_IND_NONE ((unsigned)-1)
#define TCACHE_IND_AUTOMATIC ((unsigned)-2)
#define ARENA_IND_AUTOMATIC ((unsigned)-1)
/*
* Translating the names of the 'i' functions:
* Abbreviations used in the first part of the function name (before
* alloc/dalloc) describe what that function accomplishes:
* a: arena (query)
* s: size (query, or sized deallocation)
* e: extent (query)
* p: aligned (allocates)
* vs: size (query, without knowing that the pointer is into the heap)
* r: rallocx implementation
* x: xallocx implementation
* Abbreviations used in the second part of the function name (after
* alloc/dalloc) describe the arguments it takes
* z: whether to return zeroed memory
* t: accepts a tcache_t * parameter
* m: accepts an arena_t * parameter
*/
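/*
 * For example, reading "ipallocztm" with the scheme above: i + p (aligned
 * allocation) + alloc + z (zero flag) + t (tcache argument) + m (arena
 * argument). A minimal usage sketch of the function declared below, assuming
 * the caller already computed a proper usize (the names here are illustrative
 * only, not part of this header):
 */
static inline void *
example_aligned_alloc_sketch(tsdn_t *tsdn, size_t usize, size_t alignment) {
    /* Zeroed memory, no tcache, no explicit arena (automatic choice). */
    return ipallocztm(tsdn, usize, alignment, /* zero */ true,
        /* tcache */ NULL, /* is_internal */ false, /* arena */ NULL);
}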
JEMALLOC_ALWAYS_INLINE arena_t *
iaalloc(tsdn_t *tsdn, const void *ptr) {
assert(ptr != NULL);
@ -20,23 +54,35 @@ isalloc(tsdn_t *tsdn, const void *ptr) {
}
JEMALLOC_ALWAYS_INLINE void *
iallocztm(tsdn_t *tsdn, size_t size, szind_t ind, bool zero, tcache_t *tcache,
bool is_internal, arena_t *arena, bool slow_path) {
iallocztm_explicit_slab(tsdn_t *tsdn, size_t size, szind_t ind, bool zero,
bool slab, tcache_t *tcache, bool is_internal, arena_t *arena,
bool slow_path) {
void *ret;
assert(size != 0);
assert(!slab || sz_can_use_slab(size)); /* slab && large is illegal */
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena == NULL || arena_is_auto(arena));
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
if (!tsdn_null(tsdn) && tsd_reentrancy_level_get(tsdn_tsd(tsdn)) == 0) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
}
ret = arena_malloc(tsdn, arena, size, ind, zero, tcache, slow_path);
ret = arena_malloc(
tsdn, arena, size, ind, zero, slab, tcache, slow_path);
if (config_stats && is_internal && likely(ret != NULL)) {
arena_internal_add(iaalloc(tsdn, ret), isalloc(tsdn, ret));
}
return ret;
}
JEMALLOC_ALWAYS_INLINE void *
iallocztm(tsdn_t *tsdn, size_t size, szind_t ind, bool zero, tcache_t *tcache,
bool is_internal, arena_t *arena, bool slow_path) {
bool slab = sz_can_use_slab(size);
return iallocztm_explicit_slab(
tsdn, size, ind, zero, slab, tcache, is_internal, arena, slow_path);
}
JEMALLOC_ALWAYS_INLINE void *
ialloc(tsd_t *tsd, size_t size, szind_t ind, bool zero, bool slow_path) {
return iallocztm(tsd_tsdn(tsd), size, ind, zero, tcache_get(tsd), false,
@ -44,18 +90,19 @@ ialloc(tsd_t *tsd, size_t size, szind_t ind, bool zero, bool slow_path) {
}
JEMALLOC_ALWAYS_INLINE void *
ipallocztm(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
tcache_t *tcache, bool is_internal, arena_t *arena) {
ipallocztm_explicit_slab(tsdn_t *tsdn, size_t usize, size_t alignment,
bool zero, bool slab, tcache_t *tcache, bool is_internal, arena_t *arena) {
void *ret;
assert(!slab || sz_can_use_slab(usize)); /* slab && large is illegal */
assert(usize != 0);
assert(usize == sz_sa2u(usize, alignment));
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena == NULL || arena_is_auto(arena));
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
ret = arena_palloc(tsdn, arena, usize, alignment, zero, tcache);
ret = arena_palloc(tsdn, arena, usize, alignment, zero, slab, tcache);
assert(ALIGNMENT_ADDR2BASE(ret, alignment) == ret);
if (config_stats && is_internal && likely(ret != NULL)) {
arena_internal_add(iaalloc(tsdn, ret), isalloc(tsdn, ret));
@ -63,12 +110,26 @@ ipallocztm(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
return ret;
}
JEMALLOC_ALWAYS_INLINE void *
ipallocztm(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
tcache_t *tcache, bool is_internal, arena_t *arena) {
return ipallocztm_explicit_slab(tsdn, usize, alignment, zero,
sz_can_use_slab(usize), tcache, is_internal, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloct(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
tcache_t *tcache, arena_t *arena) {
return ipallocztm(tsdn, usize, alignment, zero, tcache, false, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloct_explicit_slab(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
bool slab, tcache_t *tcache, arena_t *arena) {
return ipallocztm_explicit_slab(
tsdn, usize, alignment, zero, slab, tcache, false, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloc(tsd_t *tsd, size_t usize, size_t alignment, bool zero) {
return ipallocztm(tsd_tsdn(tsd), usize, alignment, zero,
@ -81,17 +142,18 @@ ivsalloc(tsdn_t *tsdn, const void *ptr) {
}
JEMALLOC_ALWAYS_INLINE void
idalloctm(tsdn_t *tsdn, void *ptr, tcache_t *tcache, alloc_ctx_t *alloc_ctx,
bool is_internal, bool slow_path) {
idalloctm(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
emap_alloc_ctx_t *alloc_ctx, bool is_internal, bool slow_path) {
assert(ptr != NULL);
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena_is_auto(iaalloc(tsdn, ptr)));
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (config_stats && is_internal) {
arena_internal_sub(iaalloc(tsdn, ptr), isalloc(tsdn, ptr));
}
if (!is_internal && tsd_reentrancy_level_get(tsdn_tsd(tsdn)) != 0) {
if (!is_internal && !tsdn_null(tsdn)
&& tsd_reentrancy_level_get(tsdn_tsd(tsdn)) != 0) {
assert(tcache == NULL);
}
arena_dalloc(tsdn, ptr, tcache, alloc_ctx, slow_path);
@ -104,39 +166,29 @@ idalloc(tsd_t *tsd, void *ptr) {
JEMALLOC_ALWAYS_INLINE void
isdalloct(tsdn_t *tsdn, void *ptr, size_t size, tcache_t *tcache,
alloc_ctx_t *alloc_ctx, bool slow_path) {
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
emap_alloc_ctx_t *alloc_ctx, bool slow_path) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
arena_sdalloc(tsdn, ptr, size, tcache, alloc_ctx, slow_path);
}
JEMALLOC_ALWAYS_INLINE void *
iralloct_realign(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t extra, size_t alignment, bool zero, tcache_t *tcache,
arena_t *arena) {
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
void *p;
size_t alignment, bool zero, bool slab, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
void *p;
size_t usize, copysize;
usize = sz_sa2u(size + extra, alignment);
if (unlikely(usize == 0 || usize > LARGE_MAXCLASS)) {
usize = sz_sa2u(size, alignment);
if (unlikely(usize == 0 || usize > SC_LARGE_MAXCLASS)) {
return NULL;
}
p = ipalloct(tsdn, usize, alignment, zero, tcache, arena);
p = ipalloct_explicit_slab(
tsdn, usize, alignment, zero, slab, tcache, arena);
if (p == NULL) {
if (extra == 0) {
return NULL;
}
/* Try again, without extra this time. */
usize = sz_sa2u(size, alignment);
if (unlikely(usize == 0 || usize > LARGE_MAXCLASS)) {
return NULL;
}
p = ipalloct(tsdn, usize, alignment, zero, tcache, arena);
if (p == NULL) {
return NULL;
}
return NULL;
}
/*
* Copy at most size bytes (not size+extra), since the caller has no
@ -144,54 +196,405 @@ iralloct_realign(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
*/
copysize = (size < oldsize) ? size : oldsize;
memcpy(p, ptr, copysize);
hook_invoke_alloc(
hook_args->is_realloc ? hook_alloc_realloc : hook_alloc_rallocx, p,
(uintptr_t)p, hook_args->args);
hook_invoke_dalloc(
hook_args->is_realloc ? hook_dalloc_realloc : hook_dalloc_rallocx,
ptr, hook_args->args);
isdalloct(tsdn, ptr, oldsize, tcache, NULL, true);
return p;
}
/*
* is_realloc threads through the knowledge of whether or not this call comes
* from je_realloc (as opposed to je_rallocx); this ensures that we pass the
* correct entry point into any hooks.
* Note that these functions are all force-inlined, so no actual bool gets
* passed-around anywhere.
*/
JEMALLOC_ALWAYS_INLINE void *
iralloct(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size, size_t alignment,
bool zero, tcache_t *tcache, arena_t *arena) {
iralloct_explicit_slab(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t alignment, bool zero, bool slab, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
assert(ptr != NULL);
assert(size != 0);
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (alignment != 0 && ((uintptr_t)ptr & ((uintptr_t)alignment-1))
!= 0) {
if (alignment != 0
&& ((uintptr_t)ptr & ((uintptr_t)alignment - 1)) != 0) {
/*
* Existing object alignment is inadequate; allocate new space
* and copy.
*/
return iralloct_realign(tsdn, ptr, oldsize, size, 0, alignment,
zero, tcache, arena);
return iralloct_realign(tsdn, ptr, oldsize, size, alignment,
zero, slab, tcache, arena, hook_args);
}
return arena_ralloc(tsdn, arena, ptr, oldsize, size, alignment, zero,
tcache);
slab, tcache, hook_args);
}
JEMALLOC_ALWAYS_INLINE void *
iralloct(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size, size_t alignment,
size_t usize, bool zero, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
bool slab = sz_can_use_slab(usize);
return iralloct_explicit_slab(tsdn, ptr, oldsize, size, alignment, zero,
slab, tcache, arena, hook_args);
}
JEMALLOC_ALWAYS_INLINE void *
iralloc(tsd_t *tsd, void *ptr, size_t oldsize, size_t size, size_t alignment,
bool zero) {
return iralloct(tsd_tsdn(tsd), ptr, oldsize, size, alignment, zero,
tcache_get(tsd), NULL);
size_t usize, bool zero, hook_ralloc_args_t *hook_args) {
return iralloct(tsd_tsdn(tsd), ptr, oldsize, size, alignment, usize,
zero, tcache_get(tsd), NULL, hook_args);
}
JEMALLOC_ALWAYS_INLINE bool
ixalloc(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size, size_t extra,
size_t alignment, bool zero) {
size_t alignment, bool zero, size_t *newsize) {
assert(ptr != NULL);
assert(size != 0);
witness_assert_depth_to_rank(tsdn_witness_tsdp_get(tsdn),
WITNESS_RANK_CORE, 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (alignment != 0 && ((uintptr_t)ptr & ((uintptr_t)alignment-1))
!= 0) {
if (alignment != 0
&& ((uintptr_t)ptr & ((uintptr_t)alignment - 1)) != 0) {
/* Existing object alignment is inadequate. */
*newsize = oldsize;
return true;
}
return arena_ralloc_no_move(tsdn, ptr, oldsize, size, extra, zero);
return arena_ralloc_no_move(
tsdn, ptr, oldsize, size, extra, zero, newsize);
}
JEMALLOC_ALWAYS_INLINE void
fastpath_success_finish(
tsd_t *tsd, uint64_t allocated_after, cache_bin_t *bin, void *ret) {
thread_allocated_set(tsd, allocated_after);
if (config_stats) {
bin->tstats.nrequests++;
}
}
JEMALLOC_ALWAYS_INLINE bool
malloc_initialized(void) {
return (malloc_init_state == malloc_init_initialized);
}
/*
* malloc() fastpath. Included here so that we can inline it into operator new;
* function call overhead there is non-negligible as a fraction of total CPU in
* allocation-heavy C++ programs. We take the fallback alloc to allow malloc
* (which can return NULL) to differ in its behavior from operator new (which
* can't). It matches the signature of malloc / operator new so that we can
* tail-call the fallback allocator, allowing us to avoid setting up the call
* frame in the common case.
*
* Fastpath assumes size <= SC_LOOKUP_MAXCLASS, and that we hit
* tcache. If either of these is false, we tail-call to the slowpath,
* malloc_default(). Tail-calling is used to avoid any caller-saved
* registers.
*
* fastpath supports ticker and profiling, both of which will also
* tail-call to the slowpath if they fire.
*/
JEMALLOC_ALWAYS_INLINE void *
imalloc_fastpath(size_t size, void *(fallback_alloc)(size_t)) {
if (tsd_get_allocates() && unlikely(!malloc_initialized())) {
return fallback_alloc(size);
}
tsd_t *tsd = tsd_get(false);
if (unlikely((size > SC_LOOKUP_MAXCLASS) || tsd == NULL)) {
return fallback_alloc(size);
}
/*
* The code below till the branch checking the next_event threshold may
* execute before malloc_init(), in which case the threshold is 0 to
* trigger slow path and initialization.
*
* Note that when uninitialized, only the fast-path variants of the sz /
* tsd facilities may be called.
*/
szind_t ind;
/*
* The thread_allocated counter in tsd serves as a general purpose
* accumulator for bytes of allocation to trigger different types of
* events. usize is always needed to advance thread_allocated, though
* it's not always needed in the core allocation logic.
*/
size_t usize;
sz_size2index_usize_fastpath(size, &ind, &usize);
/* Fast path relies on size being a bin. */
assert(ind < SC_NBINS);
assert((SC_LOOKUP_MAXCLASS < SC_SMALL_MAXCLASS)
&& (size <= SC_SMALL_MAXCLASS));
uint64_t allocated, threshold;
te_malloc_fastpath_ctx(tsd, &allocated, &threshold);
uint64_t allocated_after = allocated + usize;
/*
* The ind and usize might be uninitialized (or partially) before
* malloc_init(). The assertions check for: 1) full correctness (usize
* & ind) when initialized; and 2) guaranteed slow-path (threshold == 0)
* when !initialized.
*/
if (!malloc_initialized()) {
assert(threshold == 0);
} else {
assert(ind == sz_size2index(size));
assert(usize > 0 && usize == sz_index2size(ind));
}
/*
* Check for events and tsd non-nominal (fast_threshold will be set to
* 0) in a single branch.
*/
if (unlikely(allocated_after >= threshold)) {
return fallback_alloc(size);
}
assert(tsd_fast(tsd));
tcache_t *tcache = tsd_tcachep_get(tsd);
assert(tcache == tcache_get(tsd));
cache_bin_t *bin = &tcache->bins[ind];
/* Suppress spurious warning from static analysis */
assert(bin != NULL);
bool tcache_success;
void *ret;
/*
* We split up the code this way so that redundant low-water
* computation doesn't happen on the (more common) case in which we
* don't touch the low water mark. The compiler won't do this
* duplication on its own.
*/
ret = cache_bin_alloc_easy(bin, &tcache_success);
if (tcache_success) {
fastpath_success_finish(tsd, allocated_after, bin, ret);
return ret;
}
ret = cache_bin_alloc(bin, &tcache_success);
if (tcache_success) {
fastpath_success_finish(tsd, allocated_after, bin, ret);
return ret;
}
return fallback_alloc(size);
}
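/*
 * Consumption sketch (assumed shape only): the public entry points are
 * expected to pass their slow path as fallback_alloc so the fast path can
 * tail-call it; malloc_default below stands in for that slow-path function
 * defined elsewhere.
 */
void *malloc_default(size_t size);

static inline void *
example_malloc_entry(size_t size) {
    return imalloc_fastpath(size, &malloc_default);
}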
JEMALLOC_ALWAYS_INLINE tcache_t *
tcache_get_from_ind(tsd_t *tsd, unsigned tcache_ind, bool slow, bool is_alloc) {
tcache_t *tcache;
if (tcache_ind == TCACHE_IND_AUTOMATIC) {
if (likely(!slow)) {
/* Getting tcache ptr unconditionally. */
tcache = tsd_tcachep_get(tsd);
assert(tcache == tcache_get(tsd));
} else if (is_alloc
|| likely(tsd_reentrancy_level_get(tsd) == 0)) {
tcache = tcache_get(tsd);
} else {
tcache = NULL;
}
} else {
/*
* Should not specify tcache on deallocation path when being
* reentrant.
*/
assert(is_alloc || tsd_reentrancy_level_get(tsd) == 0
|| tsd_state_nocleanup(tsd));
if (tcache_ind == TCACHE_IND_NONE) {
tcache = NULL;
} else {
tcache = tcaches_get(tsd, tcache_ind);
}
}
return tcache;
}
JEMALLOC_ALWAYS_INLINE bool
maybe_check_alloc_ctx(tsd_t *tsd, void *ptr, emap_alloc_ctx_t *alloc_ctx) {
if (config_opt_size_checks) {
emap_alloc_ctx_t dbg_ctx;
emap_alloc_ctx_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr, &dbg_ctx);
if (alloc_ctx->szind != dbg_ctx.szind) {
safety_check_fail_sized_dealloc(
/* current_dealloc */ true, ptr,
/* true_size */ emap_alloc_ctx_usize_get(&dbg_ctx),
/* input_size */
emap_alloc_ctx_usize_get(alloc_ctx));
return true;
}
if (alloc_ctx->slab != dbg_ctx.slab) {
safety_check_fail(
"Internal heap corruption detected: "
"mismatch in slab bit");
return true;
}
}
return false;
}
JEMALLOC_ALWAYS_INLINE bool
prof_sample_aligned(const void *ptr) {
return ((uintptr_t)ptr & PROF_SAMPLE_ALIGNMENT_MASK) == 0;
}
JEMALLOC_ALWAYS_INLINE bool
free_fastpath_nonfast_aligned(void *ptr, bool check_prof) {
/*
* free_fastpath does not handle two uncommon cases: 1) sampled profiled
* objects and 2) sampled junk & stash for use-after-free detection.
* Both have special alignments which are used to escape the fastpath.
*
* prof_sample is page-aligned, which covers the UAF check when both
* are enabled (the assertion below). Avoiding redundant checks since
* this is on the fastpath -- at most one runtime branch from this.
*/
if (config_debug && cache_bin_nonfast_aligned(ptr)) {
assert(prof_sample_aligned(ptr));
}
if (config_prof && check_prof) {
/* When prof is enabled, the prof_sample alignment is enough. */
if (prof_sample_aligned(ptr)) {
return true;
} else {
return false;
}
}
if (config_uaf_detection) {
if (cache_bin_nonfast_aligned(ptr)) {
return true;
} else {
return false;
}
}
return false;
}
/* Returns whether or not the free attempt was successful. */
JEMALLOC_ALWAYS_INLINE
bool
free_fastpath(void *ptr, size_t size, bool size_hint) {
tsd_t *tsd = tsd_get(false);
/* The branch gets optimized away unless tsd_get_allocates(). */
if (unlikely(tsd == NULL)) {
return false;
}
/*
* The tsd_fast() / initialized checks are folded into the branch
* testing (deallocated_after >= threshold) later in this function.
* The threshold will be set to 0 when !tsd_fast.
*/
assert(tsd_fast(tsd)
|| *tsd_thread_deallocated_next_event_fastp_get_unsafe(tsd) == 0);
emap_alloc_ctx_t alloc_ctx JEMALLOC_CC_SILENCE_INIT({0, 0, false});
size_t usize;
if (!size_hint) {
bool err = emap_alloc_ctx_try_lookup_fast(
tsd, &arena_emap_global, ptr, &alloc_ctx);
/* Note: profiled objects will have alloc_ctx.slab set */
if (unlikely(err || !alloc_ctx.slab
|| free_fastpath_nonfast_aligned(ptr,
/* check_prof */ false))) {
return false;
}
assert(alloc_ctx.szind != SC_NSIZES);
usize = sz_index2size(alloc_ctx.szind);
} else {
/*
* Check for both sizes that are too large, and for sampled /
* special aligned objects. The alignment check will also check
* for null ptr.
*/
if (unlikely(size > SC_LOOKUP_MAXCLASS
|| free_fastpath_nonfast_aligned(ptr,
/* check_prof */ true))) {
return false;
}
sz_size2index_usize_fastpath(size, &alloc_ctx.szind, &usize);
/* Max lookup class must be small. */
assert(alloc_ctx.szind < SC_NBINS);
/* This is a dead store, except when opt size checking is on. */
alloc_ctx.slab = true;
}
/*
* Currently the fastpath only handles small sizes. The branch on
* SC_LOOKUP_MAXCLASS makes sure of it. This lets us avoid checking
* tcache szind upper limit (i.e. tcache_max) as well.
*/
assert(alloc_ctx.slab);
uint64_t deallocated, threshold;
te_free_fastpath_ctx(tsd, &deallocated, &threshold);
uint64_t deallocated_after = deallocated + usize;
/*
* Check for events and tsd non-nominal (fast_threshold will be set to
* 0) in a single branch. Note that this handles the uninitialized case
* as well (TSD init will be triggered on the non-fastpath). Therefore
* anything that depends on a functional TSD (e.g. the alloc_ctx sanity check
* below) needs to be after this branch.
*/
if (unlikely(deallocated_after >= threshold)) {
return false;
}
assert(tsd_fast(tsd));
bool fail = maybe_check_alloc_ctx(tsd, ptr, &alloc_ctx);
if (fail) {
/* See the comment in isfree. */
return true;
}
tcache_t *tcache = tcache_get_from_ind(tsd, TCACHE_IND_AUTOMATIC,
/* slow */ false, /* is_alloc */ false);
cache_bin_t *bin = &tcache->bins[alloc_ctx.szind];
/*
* If junking were enabled, this is where we would do it. It's not
* though, since we ensured above that we're on the fast path. Assert
* that to double-check.
*/
assert(!opt_junk_free);
if (!cache_bin_dalloc_easy(bin, ptr)) {
return false;
}
*tsd_thread_deallocatedp_get(tsd) = deallocated_after;
return true;
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_sdallocx_noflags(void *ptr, size_t size) {
if (!free_fastpath(ptr, size, true)) {
sdallocx_default(ptr, size, 0);
}
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_sdallocx_impl(void *ptr, size_t size, int flags) {
if (flags != 0 || !free_fastpath(ptr, size, true)) {
sdallocx_default(ptr, size, flags);
}
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_free_impl(void *ptr) {
if (!free_fastpath(ptr, 0, false)) {
free_default(ptr);
}
}
#endif /* JEMALLOC_INTERNAL_INLINES_C_H */


@ -2,39 +2,145 @@
#define JEMALLOC_INTERNAL_MACROS_H
#ifdef JEMALLOC_DEBUG
# define JEMALLOC_ALWAYS_INLINE static inline
# define JEMALLOC_ALWAYS_INLINE static inline
#else
# define JEMALLOC_ALWAYS_INLINE JEMALLOC_ATTR(always_inline) static inline
# ifdef _MSC_VER
# define JEMALLOC_ALWAYS_INLINE static __forceinline
# else
# define JEMALLOC_ALWAYS_INLINE \
JEMALLOC_ATTR(always_inline) static inline
# endif
#endif
#ifdef _MSC_VER
# define inline _inline
# define inline _inline
#endif
#define UNUSED JEMALLOC_ATTR(unused)
#define ZU(z) ((size_t)z)
#define ZD(z) ((ssize_t)z)
#define QU(q) ((uint64_t)q)
#define QD(q) ((int64_t)q)
#define ZU(z) ((size_t)z)
#define ZD(z) ((ssize_t)z)
#define QU(q) ((uint64_t)q)
#define QD(q) ((int64_t)q)
#define KZU(z) ZU(z##ULL)
#define KZD(z) ZD(z##LL)
#define KQU(q) QU(q##ULL)
#define KQD(q) QI(q##LL)
#define KZU(z) ZU(z##ULL)
#define KZD(z) ZD(z##LL)
#define KQU(q) QU(q##ULL)
#define KQD(q) QI(q##LL)
#ifndef __DECONST
# define __DECONST(type, var) ((type)(uintptr_t)(const void *)(var))
# define __DECONST(type, var) ((type)(uintptr_t)(const void *)(var))
#endif
#if !defined(JEMALLOC_HAS_RESTRICT) || defined(__cplusplus)
# define restrict
# define restrict
#endif
/* Various function pointers are static and immutable except during testing. */
/* Various function pointers are static and immutable except during testing. */
#ifdef JEMALLOC_JET
# define JET_MUTABLE
# define JET_MUTABLE
# define JET_EXTERN extern
#else
# define JET_MUTABLE const
# define JET_MUTABLE const
# define JET_EXTERN static
#endif
#define JEMALLOC_VA_ARGS_HEAD(head, ...) head
#define JEMALLOC_VA_ARGS_TAIL(head, ...) __VA_ARGS__
/* Diagnostic suppression macros */
#if defined(_MSC_VER) && !defined(__clang__)
# define JEMALLOC_DIAGNOSTIC_PUSH __pragma(warning(push))
# define JEMALLOC_DIAGNOSTIC_POP __pragma(warning(pop))
# define JEMALLOC_DIAGNOSTIC_IGNORE(W) __pragma(warning(disable : W))
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
/* #pragma GCC diagnostic first appeared in gcc 4.6. */
#elif (defined(__GNUC__) \
&& ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ > 5)))) \
|| defined(__clang__)
/*
* The JEMALLOC_PRAGMA__ macro is an implementation detail of the GCC and Clang
* diagnostic suppression macros and should not be used anywhere else.
*/
# define JEMALLOC_PRAGMA__(X) _Pragma(#X)
# define JEMALLOC_DIAGNOSTIC_PUSH JEMALLOC_PRAGMA__(GCC diagnostic push)
# define JEMALLOC_DIAGNOSTIC_POP JEMALLOC_PRAGMA__(GCC diagnostic pop)
# define JEMALLOC_DIAGNOSTIC_IGNORE(W) \
JEMALLOC_PRAGMA__(GCC diagnostic ignored W)
/*
* The -Wmissing-field-initializers warning is buggy in GCC versions < 5.1 and
* all clang versions up to version 7 (currently trunk, unreleased). This macro
* suppresses the warning for the affected compiler versions only.
*/
# if ((defined(__GNUC__) && !defined(__clang__)) && (__GNUC__ < 5)) \
|| defined(__clang__)
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS \
JEMALLOC_DIAGNOSTIC_IGNORE( \
"-Wmissing-field-initializers")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# endif
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wframe-address")
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wtype-limits")
# define JEMALLOC_DIAGNOSTIC_IGNORE_UNUSED_PARAMETER \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wunused-parameter")
# if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 7)
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN \
JEMALLOC_DIAGNOSTIC_IGNORE("-Walloc-size-larger-than=")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# endif
# ifdef JEMALLOC_HAVE_ATTR_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wdeprecated-declarations")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# endif
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS \
JEMALLOC_DIAGNOSTIC_PUSH \
JEMALLOC_DIAGNOSTIC_IGNORE_UNUSED_PARAMETER
#else
# define JEMALLOC_DIAGNOSTIC_PUSH
# define JEMALLOC_DIAGNOSTIC_POP
# define JEMALLOC_DIAGNOSTIC_IGNORE(W)
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
#endif
#ifdef __clang_analyzer__
# define JEMALLOC_CLANG_ANALYZER
#endif
#ifdef JEMALLOC_CLANG_ANALYZER
# define JEMALLOC_CLANG_ANALYZER_SUPPRESS __attribute__((suppress))
# define JEMALLOC_CLANG_ANALYZER_SILENCE_INIT(v) = v
#else
# define JEMALLOC_CLANG_ANALYZER_SUPPRESS
# define JEMALLOC_CLANG_ANALYZER_SILENCE_INIT(v)
#endif
#define JEMALLOC_SUPPRESS_WARN_ON_USAGE(...) \
JEMALLOC_DIAGNOSTIC_PUSH \
JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED \
__VA_ARGS__ \
JEMALLOC_DIAGNOSTIC_POP
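/*
 * Usage sketch (hypothetical example type): silence one known-spurious
 * warning around a single definition without touching diagnostics elsewhere.
 */
typedef struct { int enabled; int verbose; } example_opts_t;
JEMALLOC_DIAGNOSTIC_PUSH
JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
static const example_opts_t example_opts = {1}; /* Remaining fields zeroed. */
JEMALLOC_DIAGNOSTIC_POP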
/*
* Disables spurious diagnostics for all headers. Since these headers are not
* included by users directly, it does not affect their diagnostic settings.
*/
JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
#endif /* JEMALLOC_INTERNAL_MACROS_H */


@ -0,0 +1,22 @@
#ifndef JEMALLOC_INTERNAL_OVERRIDES_H
#define JEMALLOC_INTERNAL_OVERRIDES_H
/*
* Under normal circumstances this header serves no purpose, as these settings
* can be customized via the corresponding autoconf options at configure-time.
* Overriding in this fashion is useful when the header files generated by
* autoconf are used as input for another build system.
*/
#ifdef JEMALLOC_OVERRIDE_LG_PAGE
# undef LG_PAGE
# define LG_PAGE JEMALLOC_OVERRIDE_LG_PAGE
#endif
#ifdef JEMALLOC_OVERRIDE_JEMALLOC_CONFIG_MALLOC_CONF
# undef JEMALLOC_CONFIG_MALLOC_CONF
# define JEMALLOC_CONFIG_MALLOC_CONF \
JEMALLOC_OVERRIDE_JEMALLOC_CONFIG_MALLOC_CONF
#endif
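/*
 * Example (hypothetical build-system flags): targeting 16 KiB pages without
 * re-running autoconf could be done by compiling with
 *     -DJEMALLOC_OVERRIDE_LG_PAGE=14
 * which the block above turns into LG_PAGE == 14.
 */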
#endif /* JEMALLOC_INTERNAL_OVERRIDES_H */


@ -1,15 +1,33 @@
#ifndef JEMALLOC_INTERNAL_TYPES_H
#define JEMALLOC_INTERNAL_TYPES_H
/* Page size index type. */
typedef unsigned pszind_t;
/* Size class index type. */
typedef unsigned szind_t;
#include "jemalloc/internal/quantum.h"
/* Processor / core id type. */
typedef int malloc_cpuid_t;
/* When realloc(non-null-ptr, 0) is called, what happens? */
enum zero_realloc_action_e {
/* Realloc(ptr, 0) is free(ptr); return malloc(0); */
zero_realloc_action_alloc = 0,
/* Realloc(ptr, 0) is free(ptr); */
zero_realloc_action_free = 1,
/* Realloc(ptr, 0) aborts. */
zero_realloc_action_abort = 2
};
typedef enum zero_realloc_action_e zero_realloc_action_t;
/* Signature of write callback. */
typedef void(write_cb_t)(void *, const char *);
enum malloc_init_e {
malloc_init_uninitialized = 3,
malloc_init_a0_initialized = 2,
malloc_init_recursible = 1,
malloc_init_initialized = 0 /* Common case --> jnz. */
};
typedef enum malloc_init_e malloc_init_t;
/*
* Flags bits:
*
@ -21,114 +39,46 @@ typedef int malloc_cpuid_t;
*
* aaaaaaaa aaaatttt tttttttt 0znnnnnn
*/
#define MALLOCX_ARENA_BITS 12
#define MALLOCX_TCACHE_BITS 12
#define MALLOCX_LG_ALIGN_BITS 6
#define MALLOCX_ARENA_SHIFT 20
#define MALLOCX_TCACHE_SHIFT 8
#define MALLOCX_ARENA_MASK \
(((1 << MALLOCX_ARENA_BITS) - 1) << MALLOCX_ARENA_SHIFT)
#define MALLOCX_ARENA_BITS 12
#define MALLOCX_TCACHE_BITS 12
#define MALLOCX_LG_ALIGN_BITS 6
#define MALLOCX_ARENA_SHIFT 20
#define MALLOCX_TCACHE_SHIFT 8
#define MALLOCX_ARENA_MASK \
((unsigned)(((1U << MALLOCX_ARENA_BITS) - 1) << MALLOCX_ARENA_SHIFT))
/* NB: Arena index bias decreases the maximum number of arenas by 1. */
#define MALLOCX_ARENA_LIMIT ((1 << MALLOCX_ARENA_BITS) - 1)
#define MALLOCX_TCACHE_MASK \
(((1 << MALLOCX_TCACHE_BITS) - 1) << MALLOCX_TCACHE_SHIFT)
#define MALLOCX_TCACHE_MAX ((1 << MALLOCX_TCACHE_BITS) - 3)
#define MALLOCX_LG_ALIGN_MASK ((1 << MALLOCX_LG_ALIGN_BITS) - 1)
#define MALLOCX_ARENA_LIMIT ((unsigned)((1U << MALLOCX_ARENA_BITS) - 1))
#define MALLOCX_TCACHE_MASK \
((unsigned)(((1U << MALLOCX_TCACHE_BITS) - 1) << MALLOCX_TCACHE_SHIFT))
#define MALLOCX_TCACHE_MAX ((unsigned)((1U << MALLOCX_TCACHE_BITS) - 3))
#define MALLOCX_LG_ALIGN_MASK ((1 << MALLOCX_LG_ALIGN_BITS) - 1)
/* Use MALLOCX_ALIGN_GET() if alignment may not be specified in flags. */
#define MALLOCX_ALIGN_GET_SPECIFIED(flags) \
(ZU(1) << (flags & MALLOCX_LG_ALIGN_MASK))
#define MALLOCX_ALIGN_GET(flags) \
(MALLOCX_ALIGN_GET_SPECIFIED(flags) & (SIZE_T_MAX-1))
#define MALLOCX_ZERO_GET(flags) \
((bool)(flags & MALLOCX_ZERO))
#define MALLOCX_ALIGN_GET_SPECIFIED(flags) \
(ZU(1) << (flags & MALLOCX_LG_ALIGN_MASK))
#define MALLOCX_ALIGN_GET(flags) \
(MALLOCX_ALIGN_GET_SPECIFIED(flags) & (SIZE_T_MAX - 1))
#define MALLOCX_ZERO_GET(flags) ((bool)(flags & MALLOCX_ZERO))
#define MALLOCX_TCACHE_GET(flags) \
(((unsigned)((flags & MALLOCX_TCACHE_MASK) >> MALLOCX_TCACHE_SHIFT)) - 2)
#define MALLOCX_ARENA_GET(flags) \
(((unsigned)(((unsigned)flags) >> MALLOCX_ARENA_SHIFT)) - 1)
#define MALLOCX_TCACHE_GET(flags) \
(((unsigned)((flags & MALLOCX_TCACHE_MASK) >> MALLOCX_TCACHE_SHIFT)) \
- 2)
#define MALLOCX_ARENA_GET(flags) \
(((unsigned)(((unsigned)flags) >> MALLOCX_ARENA_SHIFT)) - 1)
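/*
 * Worked decode sketch, assuming the public encoders from
 * jemalloc/jemalloc_macros.h (MALLOCX_ARENA(), MALLOCX_TCACHE(),
 * MALLOCX_LG_ALIGN()); the values are illustrative only.
 */
static inline void
example_mallocx_flags_decode(void) {
    int flags = MALLOCX_ARENA(3) | MALLOCX_TCACHE_NONE | MALLOCX_LG_ALIGN(4);
    assert(MALLOCX_ARENA_GET(flags) == 3);
    /* MALLOCX_TCACHE_NONE decodes to (unsigned)-1, i.e. no tcache. */
    assert(MALLOCX_TCACHE_GET(flags) == (unsigned)-1);
    assert(MALLOCX_ALIGN_GET(flags) == 16);
    assert(!MALLOCX_ZERO_GET(flags));
}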
/* Smallest size class to support. */
#define TINY_MIN (1U << LG_TINY_MIN)
#define TINY_MIN (1U << LG_TINY_MIN)
/*
* Minimum allocation alignment is 2^LG_QUANTUM bytes (ignoring tiny size
* classes).
*/
#ifndef LG_QUANTUM
# if (defined(__i386__) || defined(_M_IX86))
# define LG_QUANTUM 4
# endif
# ifdef __ia64__
# define LG_QUANTUM 4
# endif
# ifdef __alpha__
# define LG_QUANTUM 4
# endif
# if (defined(__sparc64__) || defined(__sparcv9) || defined(__sparc_v9__))
# define LG_QUANTUM 4
# endif
# if (defined(__amd64__) || defined(__x86_64__) || defined(_M_X64))
# define LG_QUANTUM 4
# endif
# ifdef __arm__
# define LG_QUANTUM 3
# endif
# ifdef __aarch64__
# define LG_QUANTUM 4
# endif
# ifdef __hppa__
# define LG_QUANTUM 4
# endif
# ifdef __mips__
# define LG_QUANTUM 3
# endif
# ifdef __or1k__
# define LG_QUANTUM 3
# endif
# ifdef __powerpc__
# define LG_QUANTUM 4
# endif
# ifdef __riscv__
# define LG_QUANTUM 4
# endif
# ifdef __s390__
# define LG_QUANTUM 4
# endif
# ifdef __SH4__
# define LG_QUANTUM 4
# endif
# ifdef __tile__
# define LG_QUANTUM 4
# endif
# ifdef __le32__
# define LG_QUANTUM 4
# endif
# ifndef LG_QUANTUM
# error "Unknown minimum alignment for architecture; specify via "
"--with-lg-quantum"
# endif
#endif
#define QUANTUM ((size_t)(1U << LG_QUANTUM))
#define QUANTUM_MASK (QUANTUM - 1)
/* Return the smallest quantum multiple that is >= a. */
#define QUANTUM_CEILING(a) \
(((a) + QUANTUM_MASK) & ~QUANTUM_MASK)
#define LONG ((size_t)(1U << LG_SIZEOF_LONG))
#define LONG_MASK (LONG - 1)
#define LONG ((size_t)(1U << LG_SIZEOF_LONG))
#define LONG_MASK (LONG - 1)
/* Return the smallest long multiple that is >= a. */
#define LONG_CEILING(a) \
(((a) + LONG_MASK) & ~LONG_MASK)
#define LONG_CEILING(a) (((a) + LONG_MASK) & ~LONG_MASK)
#define SIZEOF_PTR (1U << LG_SIZEOF_PTR)
#define PTR_MASK (SIZEOF_PTR - 1)
#define SIZEOF_PTR (1U << LG_SIZEOF_PTR)
#define PTR_MASK (SIZEOF_PTR - 1)
/* Return the smallest (void *) multiple that is >= a. */
#define PTR_CEILING(a) \
(((a) + PTR_MASK) & ~PTR_MASK)
#define PTR_CEILING(a) (((a) + PTR_MASK) & ~PTR_MASK)
/*
* Maximum size of L1 cache line. This is used to avoid cache line aliasing.
@ -137,42 +87,62 @@ typedef int malloc_cpuid_t;
* CACHELINE cannot be based on LG_CACHELINE because __declspec(align()) can
* only handle raw constants.
*/
#define LG_CACHELINE 6
#define CACHELINE 64
#define CACHELINE_MASK (CACHELINE - 1)
#define LG_CACHELINE 6
#define CACHELINE 64
#define CACHELINE_MASK (CACHELINE - 1)
/* Return the smallest cacheline multiple that is >= s. */
#define CACHELINE_CEILING(s) \
(((s) + CACHELINE_MASK) & ~CACHELINE_MASK)
#define CACHELINE_CEILING(s) (((s) + CACHELINE_MASK) & ~CACHELINE_MASK)
/* Return the nearest aligned address at or below a. */
#define ALIGNMENT_ADDR2BASE(a, alignment) \
((void *)((uintptr_t)(a) & ((~(alignment)) + 1)))
#define ALIGNMENT_ADDR2BASE(a, alignment) \
((void *)(((byte_t *)(a)) \
- (((uintptr_t)(a)) - ((uintptr_t)(a) & ((~(alignment)) + 1)))))
/* Return the offset between a and the nearest aligned address at or below a. */
#define ALIGNMENT_ADDR2OFFSET(a, alignment) \
#define ALIGNMENT_ADDR2OFFSET(a, alignment) \
((size_t)((uintptr_t)(a) & (alignment - 1)))
/* Return the smallest alignment multiple that is >= s. */
#define ALIGNMENT_CEILING(s, alignment) \
#define ALIGNMENT_CEILING(s, alignment) \
(((s) + (alignment - 1)) & ((~(alignment)) + 1))
/*
* Return the nearest aligned address at or above a.
*
* While at first glance this would appear to be merely a more complicated
* way to perform the same computation as `ALIGNMENT_CEILING`,
* this has the important additional property of not concealing pointer
* provenance from the compiler. See the block-comment on the
* definition of `byte_t` for more details.
*/
#define ALIGNMENT_ADDR2CEILING(a, alignment) \
((void *)(((byte_t *)(a)) \
+ (((((uintptr_t)(a)) + (alignment - 1)) & ((~(alignment)) + 1)) \
- ((uintptr_t)(a)))))
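/*
 * Worked sketch with arbitrary values (byte_t and assert() assumed
 * available): rounding the address 0x1005 against a 64-byte alignment.
 */
static inline void
example_alignment_macros(void) {
    void *a = (void *)(uintptr_t)0x1005;
    assert(ALIGNMENT_ADDR2BASE(a, 64) == (void *)(uintptr_t)0x1000);
    assert(ALIGNMENT_ADDR2OFFSET(a, 64) == 5);
    assert(ALIGNMENT_CEILING((size_t)0x1005, 64) == 0x1040);
    assert(ALIGNMENT_ADDR2CEILING(a, 64) == (void *)(uintptr_t)0x1040);
}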
/* Declare a variable-length array. */
#if __STDC_VERSION__ < 199901L
# ifdef _MSC_VER
# include <malloc.h>
# define alloca _alloca
# else
# ifdef JEMALLOC_HAS_ALLOCA_H
# include <alloca.h>
# else
# include <stdlib.h>
# endif
# endif
# define VARIABLE_ARRAY(type, name, count) \
type *name = alloca(sizeof(type) * (count))
#if __STDC_VERSION__ < 199901L || defined(__STDC_NO_VLA__)
# ifdef _MSC_VER
# include <malloc.h>
# define alloca _alloca
# else
# ifdef JEMALLOC_HAS_ALLOCA_H
# include <alloca.h>
# else
# include <stdlib.h>
# endif
# endif
# define VARIABLE_ARRAY_UNSAFE(type, name, count) \
type *name = alloca(sizeof(type) * (count))
#else
# define VARIABLE_ARRAY(type, name, count) type name[(count)]
# define VARIABLE_ARRAY_UNSAFE(type, name, count) type name[(count)]
#endif
#define VARIABLE_ARRAY_SIZE_MAX 2048
#define VARIABLE_ARRAY(type, name, count) \
assert(sizeof(type) * (count) <= VARIABLE_ARRAY_SIZE_MAX); \
VARIABLE_ARRAY_UNSAFE(type, name, count)
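/*
 * Usage sketch (hypothetical caller, assuming memcpy is available): a bounded
 * on-stack scratch buffer. The assert baked into VARIABLE_ARRAY keeps the
 * alloca/VLA below VARIABLE_ARRAY_SIZE_MAX bytes.
 */
static inline void
example_reverse(uint64_t *dst, const uint64_t *src, size_t n) {
    VARIABLE_ARRAY(uint64_t, scratch, n);
    for (size_t i = 0; i < n; i++) {
        scratch[i] = src[n - 1 - i];
    }
    memcpy(dst, scratch, n * sizeof(uint64_t)); /* Safe even if dst == src. */
}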
#define CALLOC_MADVISE_THRESHOLD_DEFAULT (((size_t)1) << 23) /* 8 MB */
#endif /* JEMALLOC_INTERNAL_TYPES_H */
