Compare commits

...

2287 commits
4.3.1 ... dev

Author SHA1 Message Date
Guangli Dai
81034ce1f1
Update ChangeLog for release 5.3.1 2026-04-13 17:12:37 -07:00
Ian Ker-Seymer
b8646f4db3 Fix opt.max_background_threads default in docs 2026-04-13 14:46:53 -07:00
Guangli Dai
6515df8cec
Documentation updates (#2869)
* Document new mallctl interfaces added since 5.3.0

Add documentation for the following new mallctl entries:
- opt.debug_double_free_max_scan: double-free detection scan limit
- opt.prof_bt_max: max profiling backtrace depth
- opt.disable_large_size_classes: page-aligned large allocations
- opt.process_madvise_max_batch: batched process_madvise purging
- thread.tcache.max: per-thread tcache_max control
- thread.tcache.ncached_max.read_sizeclass: query ncached_max
- thread.tcache.ncached_max.write: set ncached_max per size range
- arena.<i>.name: get/set arena names
- arenas.hugepage: hugepage size
- approximate_stats.active: lightweight active bytes estimate

Remove config.prof_frameptr since it still needs more development
and is still experimental.

Co-authored-by: lexprfuncall <carl.shapiro@gmail.com>
2026-04-07 10:41:44 -07:00
Slobodan Predolac
f265645d02 Emit retained HPA slab stats in JSON 2026-04-01 23:15:19 -04:00
Slobodan Predolac
db7d99703d Add TODO to benchmark possibly better policy 2026-04-01 23:15:19 -04:00
Slobodan Predolac
6281482c39 Nest HPA SEC stats inside hpa_shard JSON 2026-04-01 23:15:19 -04:00
Slobodan Predolac
3cc56d325c Fix large alloc nrequests under-counting on cache misses 2026-04-01 23:15:19 -04:00
Slobodan Predolac
a47fa33b5a Run clang-format on test/unit/tcache_max.c 2026-04-01 23:15:19 -04:00
Slobodan Predolac
b507644cb0 Fix conf_handle_char_p zero-sized dest and remove unused conf_handle_unsigned 2026-04-01 23:15:19 -04:00
Slobodan Predolac
3ac9f96158 Run clang-format on test/unit/conf_parse.c 2026-04-01 23:15:19 -04:00
Slobodan Predolac
5904a42187 Fix memory leak of old curr_reg on san_bump_grow_locked failure
When san_bump_grow_locked fails, it sets sba->curr_reg to NULL.
The old curr_reg (saved in to_destroy) was never freed or restored,
leaking the virtual memory extent. Restore sba->curr_reg from
to_destroy on failure so the old region remains usable.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
2fceece256 Fix extra size argument in edata_init call in extent_alloc_dss
An extra 'size' argument was passed where 'slab' (false) should be,
shifting all subsequent arguments: slab got size (nonzero=true),
szind got false (0), and sn got SC_NSIZES instead of a proper serial
number from extent_sn_next(). Match the correct pattern used by the
gap edata_init call above.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
234404d324 Fix wrong loop variable for array index in sz_boot_pind2sz_tab
The sentinel fill loop used sz_pind2sz_tab[pind] (constant) instead
of sz_pind2sz_tab[i] (loop variable), writing only to the first
entry repeatedly and leaving subsequent entries uninitialized.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
675ab079e7 Fix missing release of acquired neighbor edata in extent_try_coalesce_impl
When emap_try_acquire_edata_neighbor returned a non-NULL neighbor but
the size check failed, the neighbor was never released from
extent_state_merging, making it permanently invisible to future
allocation and coalescing operations.

Release the neighbor when it doesn't meet the size requirement,
matching the pattern used in extent_recycle_extract.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
3f6e63e86a Fix wrong type for malloc_read_fd return value in prof_stack_range
Used size_t (unsigned) instead of ssize_t for the return value of
malloc_read_fd, which returns -1 on error. With size_t, -1 becomes
a huge positive value, bypassing the error check and corrupting the
remaining byte count.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
dd30c91eaa Fix wrong fallback value in os_page_detect when sysconf fails
Returned LG_PAGE (log2 of page size, e.g. 12) instead of PAGE (actual
page size, e.g. 4096) when sysconf(_SC_PAGESIZE) failed. This would
cause os_page to be set to an absurdly small value, breaking all
page-aligned operations.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
3a8bee81f1 Fix pac_mapped stats inflation on allocation failure
newly_mapped_size was set unconditionally in the ecache_alloc_grow
fallback path, even when the allocation returned NULL. This inflated
pac_mapped stats without a corresponding deallocation to correct them.

Guard the assignment with an edata != NULL check, matching the pattern
used in the batched allocation path above it.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
c2d57040f0 Fix out-of-bounds write in malloc_vsnprintf when size is 0
When called with size==0, the else branch wrote to str[size-1] which
is str[(size_t)-1], a massive out-of-bounds write. Standard vsnprintf
allows size==0 to mean "compute length only, write nothing".

Add unit test for the size==0 case.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
eab2b29736 Fix off-by-one in stats_arenas_i_bins_j and stats_arenas_i_lextents_j bounds checks
Same pattern as arenas_bin_i_index: used > instead of >= allowing
access one past the end of bstats[] and lstats[] arrays.

Add unit tests that verify boundary indices return ENOENT.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
a0f2bdf91d Fix missing negation in large_ralloc_no_move usize_min fallback
The second expansion attempt in large_ralloc_no_move omitted the !
before large_ralloc_no_move_expand(), inverting the return value.
On expansion failure, the function falsely reported success, making
callers believe the allocation was expanded in-place when it was not.
On expansion success, the function falsely reported failure, causing
callers to unnecessarily allocate, copy, and free.

Add unit test that verifies the return value matches actual size change.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
87f9938de5 Fix duplicate "nactive_huge" JSON key in HPA shard stats output
In both the full_slabs and empty_slabs JSON sections of HPA shard
stats, "nactive_huge" was emitted twice instead of emitting
"ndirty_huge" as the second entry. This caused ndirty_huge to be
missing from the JSON output entirely.

Add a unit test that verifies both sections contain "ndirty_huge".
2026-04-01 23:15:19 -04:00
Slobodan Predolac
513778bcb1 Fix off-by-one in arenas_bin_i_index and arenas_lextent_i_index bounds checks
The index validation used > instead of >=, allowing access at index
SC_NBINS (for bins) and SC_NSIZES-SC_NBINS (for lextents), which are
one past the valid range. This caused out-of-bounds reads in bin_infos[]
and sz_index2size_unsafe().

Add unit tests that verify the boundary indices return ENOENT.
2026-04-01 23:15:19 -04:00
Slobodan Predolac
176ea0a801 Remove experimental.thread.activity_callback 2026-04-01 16:23:41 -07:00
Slobodan Predolac
19bbefe136 Remove dead code: extent_commit_wrapper, large_salloc, tcache_gc_dalloc event waits
These functions had zero callers anywhere in the codebase:
- extent_commit_wrapper: wrapper never called, _impl used directly
- large_salloc: trivial wrapper never called
- tcache_gc_dalloc_new_event_wait: no header declaration, no callers
- tcache_gc_dalloc_postponed_event_wait: no header declaration, no callers
2026-04-01 17:48:19 -04:00
Weixie Cui
a87c518bab Fix typo in prof_log_rep_check: use != instead of || for alloc_count
The condition incorrectly used 'alloc_count || 0' which was likely a typo
for 'alloc_count != 0'. While both evaluate similarly for the zero/non-zero
case, the fix ensures consistency with bt_count and thr_count checks and
uses the correct comparison operator.
2026-03-26 10:42:29 -07:00
Slobodan Predolac
d758349ca4 Fix psset_pick_purge when last candidate with index 0 dirtiness is ineligible
psset_pick_purge used max_bit-- after rejecting a time-ineligible
candidate, which caused unnecessary re-scanning of the same bitmap
and makes assert fail in debug mode) and a size_t underflow
when the lowest-index entry was rejected.  Use max_bit = ind - 1
to skip directly past the rejected index.
2026-03-26 10:39:37 -07:00
Tony Printezis
1d018d8fda improve hpdata_assert_consistent()
A few ways this consistency check can be improved:
* Print which conditions fail and associated values.
* Accumulate the result so that we can print all conditions that fail.
* Turn hpdata_assert_consistent() into a macro so, when it fails,
  we can get line number where it's called from.
2026-03-26 10:39:23 -07:00
Carl Shapiro
86b7219213 Add unit tests for conf parsing and its helpers 2026-03-10 18:14:33 -07:00
Carl Shapiro
ad726adf75 Separate out the configuration code from initialization 2026-03-10 18:14:33 -07:00
Carl Shapiro
a056c20d67 Handle tcache init failures gracefully
tsd_tcache_data_init() returns true on failure but its callers ignore
this return value, leaving the per-thread tcache in an uninitialized
state after a failure.

This change disables the tcache on an initialization failure and logs
an error message.  If opt_abort is true, it will also abort.

New unit tests have been added to test tcache initialization failures.
2026-03-10 18:14:33 -07:00
Carl Shapiro
a75655badf Add unit test coverage for bin interfaces 2026-03-10 18:14:33 -07:00
Carl Shapiro
0ac9380cf1 Move bin inline functions from arena_inlines_b.h to bin_inlines.h
This is a continuation of my previous clean-up change, now focusing on
the inline functions defined in header files.
2026-03-10 18:14:33 -07:00
Carl Shapiro
1cc563f531 Move bin functions from arena.c to bin.c
This is a clean-up change that gives the bin functions implemented in
the area code a prefix of bin_ and moves them into the bin code.

To further decouple the bin code from the arena code, bin functions
that had taken an arena_t to check arena_is_auto now take an is_auto
parameter instead.
2026-03-10 18:14:33 -07:00
guangli-dai
c73ab1c2ff Add a test to check the output in JSON-based stats is consistent with mallctl results. 2026-03-10 18:14:33 -07:00
guangli-dai
12b33ed8f1 Fix wrong mutex stats in json-formatted malloc stats
During mutex stats emit, derived counters are not emitted for json.
Yet the array indexing counter should still be increased to skip
derived elements in the output, which was not. This commit fixes it.
2026-03-10 18:14:33 -07:00
Carl Shapiro
79cc7dcc82 Guard os_page_id against a NULL address
While undocumented, the prctl system call will set errno to ENOMEM
when passed NULL as an address.  Under that condition, an assertion
that check for EINVAL as the only possible errno value will fail.  To
avoid the assertion failure, this change skips the call to os_page_id
when address is NULL.  NULL can only occur after mmap fails in which
case there is no mapping to name.
2026-03-10 18:14:33 -07:00
Yuxuan Chen
a10ef3e1f1 configure: add --with-cxx-stdlib option
When C++ support is enabled, configure unconditionally probes
`-lstdc++` and keeps it in LIBS if the link test succeeds. On
platforms using libc++, this probe can succeed at compile time (if
libstdc++ headers/libraries happen to be installed) but then cause
runtime failures when configure tries to execute test binaries
because `libstdc++.so.6` isn't actually available.

Add a `--with-cxx-stdlib=<libstdc++|libcxx>` option that lets the
build system specify which C++ standard library to link. When given,
the probe is skipped and the specified library is linked directly.
When not given, the original probe behavior is preserved.
2026-03-10 18:14:33 -07:00
Tony Printezis
0fa27fd28f Run single subtest from a test file
Add mechanism to be able to select a test to run from a test file. The test harness will read the JEMALLOC_TEST_NAME env and, if set, it will only run subtests with that name.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
34ace9169b Remove prof_threshold built-in event. It is trivial to implement it as user event if needed 2026-03-10 18:14:33 -07:00
Andrei Pechkurov
4d0ffa075b Fix background thread initialization race 2026-03-10 18:14:33 -07:00
Slobodan Predolac
d4908fe44a Revert "Experimental configuration option for fast path prefetch from cache_bin"
This reverts commit f9fae9f1f8.
2026-03-10 18:14:33 -07:00
Carl Shapiro
c51abba131 Determine the page size on Android from NDK header files
The definition of the PAGE_SIZE macro is used as a signal for a 32-bit
target or a 64-bit target with an older NDK.  Otherwise, a 16KiB page
size is assumed.

Closes: #2657
2026-03-10 18:14:33 -07:00
Carl Shapiro
5f353dc283 Remove an incorrect use of the address operator
The address of the local variable created_threads is a different
location than the data it points to.  Incorrectly treating these
values as being the same can cause out-of-bounds writes to the stack.

Closes: facebook/jemalloc#59
2026-03-10 18:14:33 -07:00
Carl Shapiro
365747bc8d Use the BRE construct \{1,\} for one or more consecutive matches
This removes duplication introduced by my earlier commit that
eliminating the use of the non-standard "\+" from BREs in the
configure script.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
6016d86c18 [SEC] Make SEC owned by hpa_shard, simplify the code, add stats, lock per bin 2026-03-10 18:14:33 -07:00
Slobodan Predolac
c7690e92da Remove Cirrus CI 2026-03-10 18:14:33 -07:00
Slobodan Predolac
441e840df7 Add a script to generate github actions instead of Travis CI and Cirrus 2026-03-10 18:14:33 -07:00
Guangli Dai
0988583d7c Add a mallctl for users to get an approximate of active bytes. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
8a06b086f3 [EASY] Extract hpa_central component from hpa source file 2026-03-10 18:14:33 -07:00
Slobodan Predolac
355774270d [EASY] Encapsulate better, do not pass hpa_shard when hooks are enough, move shard independent actions to hpa_utils 2026-03-10 18:14:33 -07:00
Slobodan Predolac
47aeff1d08 Add experimental_enforce_hugify 2026-03-10 18:14:33 -07:00
Shirui Cheng
6d4611197e move fill/flush pointer array out of tcache.c 2026-03-10 18:14:33 -07:00
Slobodan Predolac
3678a57c10 When extracting from central, hugify_eager is different than start_as_huge 2026-03-10 18:14:33 -07:00
guangli-dai
2cfa41913e Refactor init_system_thp_mode and print it in malloc stats. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
87555dfbb2 Do not release the hpa_shard->mtx when inserting newly retrieved page from central before allocating from it 2026-03-10 18:14:33 -07:00
Carl Shapiro
f714cd9249 Inline the value of an always false boolean local variable
Next to its use, which is always as an argument, we include the name
of the parameter in a constant.  This completes a partially
implemented cleanup suggested in an earlier commit.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
5e49c28ef0 [EASY] Spelling in the comments 2026-03-10 18:14:33 -07:00
Slobodan Predolac
7c40be249c Add npurges and npurge_passes to output of pa_benchmark 2026-03-10 18:14:33 -07:00
Slobodan Predolac
707aab0c95 [pa-bench] Add clock to pa benchmark 2026-03-10 18:14:33 -07:00
Slobodan Predolac
a199278f37 [HPA] Add ability to start page as huge and more flexibility for purging 2026-03-10 18:14:33 -07:00
Slobodan Predolac
ace437d26a Running clang-format on two files 2026-03-10 18:14:33 -07:00
Slobodan Predolac
2688047b56 Revert "Do not dehugify when purging"
This reverts commit 16c5abd1cd.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
de886e05d2 Revert "Remove an unused function and global variable"
This reverts commit acd85e5359.
2026-03-10 18:14:33 -07:00
guangli-dai
755735a6bf Remove Travis Windows CI for now since it has infra failures. 2026-03-10 18:14:33 -07:00
Slobodan Predolac
d70882a05d [sdt] Add some tracepoints to sec and hpa modules 2026-03-10 18:14:33 -07:00
Carl Shapiro
67435187d1 Improve the portability of grep patterns in configure.ac
The configure.ac script uses backslash plus in its grep patterns to
match one or more occurrences.  This is a GNU grep extension to the
Basic Regular Expressions syntax that fails on systems with a more
traditional grep.  This changes fixes grep patterns that use backslash
plus to use a star instead.

Closes: #2777
2026-03-10 18:14:33 -07:00
guangli-dai
261591f123 Add a page-allocator microbenchmark. 2026-03-10 18:14:33 -07:00
guangli-dai
56cdce8592 Adding trace analysis in preparation for page allocator microbenchmark. 2026-03-10 18:14:33 -07:00
Carl Shapiro
daf44173c5 Replace an instance of indentation with spaces with tabs 2026-03-10 18:14:33 -07:00
Aurélien Brooke
ce02945070 Add missing thread_event_registry.c to Visual Studio projects
This file was added by b2a35a905f.
2026-03-10 18:14:33 -07:00
lexprfuncall
c51949ea3e Update config.guess and config.sub to the latest versions
These files need to be refreshed periodically to support new platform
types.

The following command was used to retrieve the updates

curl -L -O https://git.savannah.gnu.org/cgit/config.git/plain/config.guess
curl -L -O https://git.savannah.gnu.org/cgit/config.git/plain/config.sub

Closes: #2814
2026-03-10 18:14:33 -07:00
Carl Shapiro
5a634a8d0a Always use pthread_equal to compare thread IDs
This change replaces direct comparisons of Pthread thread IDs with
calls to pthread_equal.  Directly comparing thread IDs is neither
portable nor reliable since a thread ID is defined as an opaque type
that can be implemented using a structure.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
5d5f76ee01 Remove pidfd_open call handling and rely on PIDFD_SELF 2026-03-10 18:14:33 -07:00
lexprfuncall
9442300cc3 Change the default page size to 64KiB on Aarch64 Linux
This updates the configuration script to set the default page size to
64KiB on Aarch64 Linux.  This is motivated by compatibility as a build
configured for a 64KiB page will work on kernels that use the smaller
4KiB or 16KiB pages, whereas the reverse is not true.

To make the configured page size setting more visible, the script now
displays the page size when printing the configuration results.

Users that want to override the page size in to choose a smaller value
can still do so with the --with-lg-pagesize configuration option.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
2a66c0be5a [EASY][BUGFIX] Spelling and format 2026-03-10 18:14:33 -07:00
lexprfuncall
38b12427b7 Define malloc_{write,read}_fd as non-inline global functions
The static inline definition made more sense when these functions just
dispatched to a syscall wrapper.  Since they acquired a retry loop, a
non-inline definition makes more sense.
2026-03-10 18:14:33 -07:00
lexprfuncall
9fdc1160c5 Handle interruptions and retries of read(2) and write(2) 2026-03-10 18:14:33 -07:00
lexprfuncall
48b4ad60a7 Remove an orphaned comment
This was left behind when definitions of malloc_open and malloc_close
were abstracted from code that had followed.
2026-03-10 18:14:33 -07:00
Shirui Cheng
2114349a4e Revert PR #2608: Manually revert commits 70c94d..f9c0b5
Closes: #2707
2026-03-10 18:14:33 -07:00
lexprfuncall
ced8b3cffb Fix the compilation check for process madvise
An include of unistd.h is needed to make the declaration of the
syscall function visible to the compiler.  The include of sys/mman.h
is not used at all.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
5e98585b37 Save and restore errno when calling process_madvise 2026-03-10 18:14:33 -07:00
lexprfuncall
e4fa33148a Remove an unused function and global variable
When the dehugify functionality was retired in an previous commit, a
dehugify-related function and global variable in a test was
accidentally left in-place causing builds that add -Werror to CFLAGS
to fail.
2026-03-10 18:14:33 -07:00
Slobodan Predolac
d73de95f72 Experimental configuration option for fast path prefetch from cache_bin 2026-03-10 18:14:33 -07:00
lexprfuncall
9528a2e2dd Use relaxed atomics to access the process madvise pid fd
Relaxed atomics already provide sequentially consistent access to single
location data structures.
2026-03-10 18:14:33 -07:00
lexprfuncall
a156e997d7 Do not dehugify when purging
Giving the advice MADV_DONTNEED to a range of virtual memory backed by
a transparent huge page already causes that range of virtual memory to
become backed by regular pages.
2026-03-10 18:14:33 -07:00
lexprfuncall
395e63bf7e Fix several spelling errors in comments 2026-03-10 18:14:33 -07:00
Slobodan Predolac
4246475b44 [process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork 2026-03-10 18:14:33 -07:00
Slobodan Predolac
f87bbab22c Add several USDT probes for hpa 2026-03-10 18:14:33 -07:00
Slobodan Predolac
711fff750c Add experimental support for usdt systemtap probes 2026-03-10 18:14:33 -07:00
guangli-dai
5847516692 Ignore the clang-format changes in the git blame. 2026-03-10 18:14:33 -07:00
guangli-dai
6200e8987f Reformat the codebase with the clang-format 18. 2026-03-10 18:14:33 -07:00
Shirui Cheng
a952a3b8b0 Update the default value for opt_experimental_tcache_gc and opt_calloc_madvise_threshold 2026-03-10 18:14:33 -07:00
Guangli Dai
e350c71571 Remove --enable-limit-usize-gap for cirrus CI since the config-time option is removed. 2026-03-10 18:14:33 -07:00
guangli-dai
95fc091b0f Update appveyor settings. 2026-03-10 18:14:33 -07:00
dzhao.ampere
c5547f9e64 test/unit/psset.c: fix SIGSEGV when PAGESIZE is large
When hugepage is enabled and PAGESIZE is large, the test could
ask for a stack size larger than user limit. Allocating the
memory instead can avoid the failure.

Closes: #2408
2026-03-10 18:14:33 -07:00
Slobodan Predolac
015b017973 [thread_event] Add support for user events in thread events when stats are enabled 2026-03-10 18:14:33 -07:00
Slobodan Predolac
e6864c6075 [thread_event] Remove macros from thread_event and replace with dynamic event objects 2026-03-10 18:14:33 -07:00
Qi Wang
1972241cd2 Remove unused options in the batched madvise unit tests. 2025-06-02 11:25:37 -07:00
Jason Evans
27d7960cf9 Revert "Extend purging algorithm with peak demand tracking"
This reverts commit ad108d50f1.
2025-06-02 10:44:37 -07:00
guangli-dai
edaab8b3ad Turn clang-format off for codes with multi-line commands in macros 2025-05-28 19:22:21 -07:00
guangli-dai
4531411abe Modify .clang-format to have declarations aligned 2025-05-28 19:22:21 -07:00
guangli-dai
1818170c8d Fix binshard.sh by specifying bin_shards for all sizes. 2025-05-28 19:21:49 -07:00
guangli-dai
fd60645260 Add one more check to double free validation. 2025-05-28 19:21:49 -07:00
Xin Yang
5e460bfea2 Refactor: use the cache_bin_sz_t typedef instead of direct uint16_t
any future changes to the underlying data type for bin sizes
(such as upgrading from `uint16_t` to `uint32_t`) can be achieved
by modifying only the `cache_bin_sz_t` definition.

Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>
2025-05-22 10:43:33 -07:00
Xin Yang
9169e9272a Fix: Adjust CACHE_BIN_NFLUSH_BATCH_MAX size to prevent assert failures
The maximum allowed value for `nflush_batch` is
`CACHE_BIN_NFLUSH_BATCH_MAX`. However, `tcache_bin_flush_impl_small`
could potentially declare an array of `emap_batch_lookup_result_t`
of size `CACHE_BIN_NFLUSH_BATCH_MAX + 1`. leads to a `VARIABLE_ARRAY`
assertion failure, observed when `tcache_nslots_small_max` is
configured to 2048. This patch ensures the array size does not exceed
the allowed maximum.

Signed-off-by: Xin Yang <yangxin.dev@bytedance.com>
2025-05-22 10:27:09 -07:00
guangli-dai
f19a569216 Ignore formatting commit in blame. 2025-05-20 14:21:08 -07:00
Slobodan Predolac
b6338c4ff6 EASY - be explicit in non-vectorized hpa tests 2025-05-19 16:31:04 -07:00
guangli-dai
554185356b Sample format on tcache_max test 2025-05-19 15:06:13 -07:00
guangli-dai
3cee771cfa Modify .clang-format to make it more aligned with current freebsd style 2025-05-19 15:06:13 -07:00
Jiebin Sun
3c14707b01 To improve reuse efficiency, the maximum coalesced size for large extents
in the dirty ecache has been limited. This patch was tested with real
workloads using ClickHouse (Clickbench Q35) on a system with 2x240 vCPUs.
The results showed a 2X in query per second (QPS) performance and
a reduction in page faults to 29% of the previous rate. Additionally,
microbenchmark testing involved 256 memory reallocations resizing
from 4KB to 16KB in one arena, which demonstrated a 5X performance
improvement.

Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
2025-05-12 15:45:36 -07:00
guangli-dai
37bf846cc3 Fixes to prevent static analysis warnings. 2025-05-06 14:47:35 -07:00
guangli-dai
8347f1045a Renaming limit_usize_gap to disable_large_size_classes 2025-05-06 14:47:35 -07:00
Guangli Dai
01e9ecbeb2 Remove build-time configuration 'config_limit_usize_gap' 2025-05-06 14:47:35 -07:00
Slobodan Predolac
852da1be15 Add experimental option force using SYS_process_madvise 2025-04-28 18:45:30 -07:00
Slobodan Predolac
1956a54a43 [process_madvise] Use process_madvise across multiple huge_pages 2025-04-25 19:19:03 -07:00
Slobodan Predolac
0dfb4a5a1a Add output argument to hpa_purge_begin to count dirty ranges 2025-04-25 19:19:03 -07:00
Slobodan Predolac
cfa90dfd80 Refactor hpa purging to prepare for vectorized call across multiple pages 2025-04-25 19:19:03 -07:00
Qi Wang
a3910b9802 Avoid forced purging during thread-arena migration when bg thd is on. 2025-04-25 19:18:20 -07:00
guangli-dai
c23a6bfdf6 Add opt.limit_usize_gap to stats 2025-04-16 10:38:10 -07:00
guangli-dai
c20a63a765 Silence the uninitialized warning from clang. 2025-04-16 10:38:10 -07:00
Qi Wang
f81fb92a89 Remove Travis CI macOS configs (not supported anymore). 2025-04-14 15:27:38 -07:00
Slobodan Predolac
f19f49ef3e if process_madvise is supported, call it when purging hpa 2025-04-04 13:57:42 -07:00
Kaspar M. Rohrer
80e9001af3 Move `extern "C" specifications for C++ to where they are needed
This should fix errors when compiling C++ code with modules enabled on clang.
2025-03-31 10:41:51 -07:00
Shirui Cheng
3688dfb5c3 fix assertion error in huge_arena_auto_thp_switch() when b0 is deleted in unit test 2025-03-20 12:45:23 -07:00
Jay Lee
a4defdb854 detect false failure of strerror_r
See tikv/jemallocator#108.

In a summary, test on `strerror_r` can fail due to reasons other
than `strerror_r` itself, so add an additional test to determine
the failure is expected.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
2025-03-17 17:50:20 -07:00
Shirui Cheng
e1a77ec558 Support THP with Huge Arena in PAC 2025-03-17 16:06:43 -07:00
Audrey Dutcher
86bbabac32 background_thread: add fallback for pthread_create dlsym
If jemalloc is linked into a shared library, the RTLD_NEXT dlsym call
may fail since RTLD_NEXT is only specified to search all objects after
the current one in the loading order, and the pthread library may be
earlier in the load order. Instead of failing immediately, attempt one
more time to find pthread_create via RTLD_GLOBAL.

Errors cascading from this were observed on FreeBSD 14.1.
2025-03-17 09:41:04 -07:00
Guangli Dai
81f35e0b55 Modify Travis tests to use frameptr when profiling 2025-03-13 17:15:42 -07:00
Guangli Dai
773b5809f9 Fix frame pointer based unwinder to handle changing stack range 2025-03-13 17:15:42 -07:00
Dmitry Ilvokhin
ad108d50f1 Extend purging algorithm with peak demand tracking
Implementation inspired by idea described in "Beyond malloc efficiency
to fleet efficiency: a hugepage-aware memory allocator" paper [1].

Primary idea is to track maximum number (peak) of active pages in use
with sliding window and then use this number to decide how many dirty
pages we would like to keep.

We are trying to estimate maximum amount of active memory we'll need in
the near future. We do so by projecting future active memory demand
(based on peak active memory usage we observed in the past within
sliding window) and adding slack on top of it (an overhead is reasonable
to have in exchange of higher hugepages coverage). When peak demand
tracking is off, projection of future active memory is active memory we
are having right now.

Estimation is essentially the same as `nactive_max * (1 + dirty_mult)`.

Peak demand purging algorithm controlled by two config options. Option
`hpa_peak_demand_window_ms` controls duration of sliding window we track
maximum active memory usage in and option `hpa_dirty_mult` controls
amount of slack we are allowed to have as a percent from maximum active
memory usage. By default `hpa_peak_demand_window_ms == 0` now and we
have same behaviour (ratio based purging) that we had before this
commit.

[1]: https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf
2025-03-13 10:12:22 -07:00
Qi Wang
22440a0207 Implement process_madvise support.
Add opt.process_madvise_max_batch which determines if process_madvise is enabled
(non-zero) and the max # of regions in each batch.  Added another limiting
factor which is the space to reserve on stack, which results in the max batch of
128.
2025-03-07 15:32:32 -08:00
Guangli Dai
70f019cd3a Enable limit-usize-gap in CI tests.
Considering the new usize calculation will be default soon, add the
config option in for Travis, Cirrus and appveyor.
2025-03-06 15:08:13 -08:00
Guangli Dai
6035d4a8d3 Cache extra extents in the dirty pool from ecache_alloc_grow 2025-03-06 15:08:13 -08:00
guangli-dai
c067a55c79 Introducing a new usize calculation policy
Converting size to usize is what jemalloc has been done by ceiling
size to the closest size class. However, this causes lots of memory
wastes with HPA enabled.  This commit changes how usize is calculated so
that the gap between two contiguous usize is no larger than a page.
Specifically, this commit includes the following changes:

1. Adding a build-time config option (--enable-limit-usize-gap) and a
runtime one (limit_usize_gap) to guard the changes.
When build-time
config is enabled, some minor CPU overhead is expected because usize
will be stored and accessed apart from index.  When runtime option is
also enabled (it can only be enabled with the build-time config
enabled). a new usize calculation approach wil be employed.  This new
calculation will ceil size to the closest multiple of PAGE for all sizes
larger than USIZE_GROW_SLOW_THRESHOLD instead of using the size classes.
Note when the build-time config is enabled, the runtime option is
default on.

2. Prepare tcache for size to grow by PAGE over GROUP*PAGE.
To prepare for the upcoming changes where size class grows by PAGE when
larger than NGROUP * PAGE, disable the tcache when it is larger than 2 *
NGROUP * PAGE. The threshold for tcache is set higher to prevent perf
regression as much as possible while usizes between NGROUP * PAGE and 2 *
NGROUP * PAGE happen to grow by PAGE.

3. Prepare pac and hpa psset for size to grow by PAGE over GROUP*PAGE
For PAC, to avoid having too many bins, arena bins still have the same
layout.  This means some extra search is needed for a page-level request that
is not aligned with the orginal size class: it should also search the heap
before the current index since the previous heap might also be able to
have some allocations satisfying it.  The same changes apply to HPA's
psset.
This search relies on the enumeration of the heap because not all allocs in
the previous heap are guaranteed to satisfy the request.  To balance the
memory and CPU overhead, we currently enumerate at most a fixed number
of nodes before concluding none can satisfy the request during an
enumeration.

4. Add bytes counter to arena large stats.
To prepare for the upcoming usize changes, stats collected by
multiplying alive allocations and the bin size is no longer accurate.
Thus, add separate counters to record the bytes malloced and dalloced.

5. Change structs use when freeing to avoid using index2size for large sizes.
  - Change the definition of emap_alloc_ctx_t
  - Change the read of both from edata_t.
  - Change the assignment and usage of emap_alloc_ctx_t.
  - Change other callsites of index2size.
Note for the changes in the data structure, i.e., emap_alloc_ctx_t,
will be used when the build-time config (--enable-limit-usize-gap) is
enabled but they will store the same value as index2size(szind) if the
runtime option (opt_limit_usize_gap) is not enabled.

6. Adapt hpa to the usize changes.
Change the settings in sec to limit is usage for sizes larger than
USIZE_GROW_SLOW_THRESHOLD and modify corresponding tests.

7. Modify usize calculation and corresponding tests.
Change the sz_s2u_compute. Note sz_index2size is not always safe now
while sz_size2index still works as expected.
2025-03-06 15:08:13 -08:00
Guangli Dai
ac279d7e71 Fix profiling sample metadata lookup during xallocx 2025-03-04 14:42:04 -08:00
Qi Wang
f55e0c3f5c Remove unsupported Cirrus CI config 2025-03-03 16:29:04 -08:00
Dmitry Ilvokhin
499f306859 Fix arena 0 deferral_allowed flag init
Arena 0 have a dedicated initialization path, which differs from
initialization path of other arenas. The main difference for the purpose
of this change is that we initialize arena 0 before we initialize
background threads. HPA shard options have `deferral_allowed` flag which
should be equal to `background_thread_enabled()` return value, but it
wasn't the case before this change, because for arena 0
`background_thread_enabled()` was initialized correctly after arena 0
initialization phase already ended.

Below is initialization sequence for arena 0 after this commit to
illustrate everything still should be initialized correctly.

* `hpa_central_init` initializes HPA Central, before we initialize every
  HPA shard (including arena's 0).
* `background_thread_boot1` initializes `background_thread_enabled()`
  return value.
* `pa_shard_enable_hpa` initializes arena 0 HPA shard.

```
                       malloc_init_hard -------------
                      /           /                  \
                     /           /                    \
                    /           /                      \
malloc_init_hard_a0_locked  background_thread_boot1  pa_shard_enable_hpa
        /                     /                          \
       /                     /                            \
      /                     /                              \
arena_boot       background_thread_enabled_seta         hpa_shard_init
     |
     |
pa_central_init
     |
     |
hpa_central_init
```
2025-02-18 12:10:35 -08:00
Dmitry Ilvokhin
421b17a622 Remove age_counter from hpa_central
Before this commit we had two age counters: one global in HPA central
and one local in each HPA shard. We used HPA shard counter, when we are
reused empty pageslab and HPA central counter anywhere else. They
suppose to be comparable, because we use them for allocation placement
decisions, but in reality they are not, there is no ordering guarantees
between them.

At the moment, there is no way for pageslab to migrate between HPA
shards, so we don't actually need HPA central age counter.
2025-02-13 16:00:41 -08:00
roblabla
c17bf8b368 Disable config from file or envvar with build flag
This adds a new autoconf flag, --disable-user-config, which disables
reading the configuration from /etc/malloc.conf or the MALLOC_CONF
environment variable. This can be useful when integrating jemalloc in a
binary that internally handles all aspects of the configuration and
shouldn't be impacted by ambient change in the environment.
2025-02-05 15:01:50 -08:00
Dmitry Ilvokhin
34c823f147 Add autoconf options to enable sanitizers
This commit allows to enable sanitizers with autoconf options, instead
of modifying `CFLAGS`, `CXXFLAGS` and `LDFLAGS` directly.

* `--enable-tsan` option to enable Thread Sanitizer.
* `--enable-ubsan` option to enable Undefined Behaviour Sanitizer.

End goal is to speedup development by finding problems quickly, early
and easier. Eventually, when all current issues will be fixed, we can
enable sanitizers in CI. Fortunately, there are not a lot of problems we
need to fix.

Address Sanitizer is a bit controversial, because it replaces memory
allocator, so we decided to left it out for a while.

Below are couple of examples of how tests look like under different
sanitizers at the moment.

```
$  ../configure --enable-tsan --enable-debug
<...>
asan               : 0
tsan               : 1
ubsan              : 0
$ make -j`nproc` check
<...>
  Thread T13 (tid=332043, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x61748)
    #1 thd_create ../test/src/thd.c:25 (bin_batching+0x5631ca)
    #2 stress_run ../test/unit/bin_batching.c:148
(bin_batching+0x40364c)
    #3 test_races ../test/unit/bin_batching.c:249
(bin_batching+0x403d79)
    #4 p_test_impl ../test/src/test.c:149 (bin_batching+0x562811)
    #5 p_test_no_reentrancy ../test/src/test.c:213
(bin_batching+0x562d35)
    #6 main ../test/unit/bin_batching.c:268 (bin_batching+0x40417e)

SUMMARY: ThreadSanitizer: data race
../include/jemalloc/internal/edata.h:498 in edata_nfree_inc
```

```
$ ../configure --enable-ubsan --enable-debug
<...>
asan               : 0
tsan               : 0
ubsan              : 1
$ make -j`nproc` check
<...>
=== test/unit/hash ===
../test/unit/hash.c:119:16: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
<...>
```
2025-02-05 14:28:28 -08:00
Qi Wang
3bc89cfeca Avoid implicit conversion in test/unit/prof_threshold 2025-01-31 10:18:36 -08:00
Qi Wang
1abeae9ebd Fix test/unit/prof_threshold when !config_stats 2025-01-30 10:39:49 -08:00
Shai Duvdevani
257e64b968 Unlike prof_sample which is supported only with profiling mode active, prof_threshold is intended to be an always-supported allocation callback with much less overhead. The usage of the threshold allows performance critical callers to change program execution based on the callback: e.g. drop caches when memory becomes high or to predict the program is about to OOM ahead of time using peak memory watermarks. 2025-01-29 18:55:52 -08:00
Dmitry Ilvokhin
ef8e512e29 Fix bitmap_ffu out of range read
We tried to load `g` from `bitmap[i]` before checking it is actually a
valid load. Tweaked a loop a bit to `break` early, when we are done
scanning for bits.

Before this commit undefined behaviour sanitizer from GCC 14+ was
unhappy at `test/unit/bitmap` test with following error.

```
../include/jemalloc/internal/bitmap.h:293:5: runtime error: load of
address 0x7bb1c2e08008 with insufficient space for an object of type
'const bitmap_t'
<...>
    #0 0x62671a149954 in bitmap_ffu ../include/jemalloc/internal/bitmap.h:293
    #1 0x62671a149954 in test_bitmap_xfu_body ../test/unit/bitmap.c:275
    #2 0x62671a14b767 in test_bitmap_xfu ../test/unit/bitmap.c:323
    #3 0x62671a376ad1 in p_test_impl ../test/src/test.c:149
    #4 0x62671a377135 in p_test ../test/src/test.c:200
    #5 0x62671a13da06 in main ../test/unit/bitmap.c:336
<...>
```
2025-01-28 10:42:20 -08:00
Qi Wang
607b866035 Check for 0 input when setting max_background_thread through mallctl.
Reported by @nc7s.
2025-01-28 10:38:56 -08:00
Qi Wang
20cc983314 Fix the gettid() detection caught by @mrluanma . 2025-01-22 10:30:53 -08:00
Dmitry Ilvokhin
52fa9577ba Fix integer overflow in test/unit/hash.c
`final[3]` is `uint8_t`. Integer conversion rank of `uint8_t` is lower
than integer conversion rank of `int`, so `uint8_t` got promoted to
`int`, which is signed integer type. Shift `final[3]` value left on 24,
when leftmost bit is set overflows `int` and it is undefined behaviour.

Before this change Undefined Behaviour Sanitizer was unhappy about it
with the following message.

```
../test/unit/hash.c:119:25: runtime error: left shift of 176 by 24
places cannot be represented in type 'int'
```

After this commit problem is gone.
2025-01-17 12:54:22 -08:00
Dan Horák
17881ebbfd Add configure check for gettid() presence
The gettid() function is available on Linux in glibc only since version
2.30. There are supported distributions that still use older glibc
version. Thus add a configure check if the gettid() function is
available and extend the check in src/prof_stack_range.c so it's skipped
also when gettid() isn't available.

Fixes: https://github.com/jemalloc/jemalloc/issues/2740
2024-12-17 12:40:54 -08:00
appujee
4b88bddbca Conditionally remove unreachable for C23+ 2024-12-17 12:39:00 -08:00
appujee
d8486b2653 Remove unreachable() macro as c23 already defines it.
Taken from https://android-review.git.corp.google.com/c/platform/external/jemalloc_new/+/3316478

This might need more cleanups to remove the definition of JEMALLOC_INTERNAL_UNREACHABLE.
2024-12-17 12:39:00 -08:00
Guangli Dai
587676fee8 Disable psset test when hugepage size is too large. 2024-12-17 12:35:35 -08:00
Guangli Dai
a17385a882 Enable large hugepage tests for arm64 on Travis 2024-12-17 12:35:35 -08:00
Guangli Dai
6786934280 Fix ehooks assertion for arena creation 2024-12-11 13:33:32 -08:00
Dmitry Ilvokhin
46690c9ec0 Fix test_retained on boxes with a lot of CPUs
We are trying to create `ncpus * 2` threads for this test and place them
into `VARIABLE_ARRAY`, but `VARIABLE_ARRAY` can not be more than
`VARIABLE_ARRAY_SIZE_MAX` bytes. When there are a lot of threads on the
box test always fails.

```
$ nproc
176

$ make -j`nproc` tests_unit && ./test/unit/retained
<jemalloc>: ../test/unit/retained.c:123: Failed assertion:
"sizeof(thd_t) * (nthreads) <= VARIABLE_ARRAY_SIZE_MAX"
Aborted (core dumped)
```

There is no need for high concurrency for this test as we are only
checking stats there and it's behaviour is quite stable regarding number
of allocating threads.

Limited number of threads to 16 to save compute resources (on CI for
example) and reduce tests running time.

Before the change (`nproc` is 80 on this box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.372s
user    0m14.236s
sys     0m12.338s
```

After the change (same box).

```
$ make -j`nproc` tests_unit && time ./test/unit/retained
<...>
real    0m0.018s
user    0m0.108s
sys     0m0.068s
```
2024-12-02 14:12:26 -08:00
Dmitry Ilvokhin
6092c980a6 Expose psset state stats
When evaluating changes in HPA logic, it is useful to know internal
`hpa_shard` state. Great deal of this state is `psset`. Some of the
`psset` stats was available, but in disaggregated form, which is not
very convenient. This commit exposed `psset` counters to `mallctl`
and malloc stats dumps.

Example of how malloc stats dump will look like after the change.

HPA shard stats:
  Pageslabs: 14899 (4354 huge, 10545 nonhuge)
  Active pages: 6708166 (2228917 huge, 4479249 nonhuge)
  Dirty pages: 233816 (331 huge, 233485 nonhuge)
  Retained pages: 686306
  Purge passes: 8730 (10 / sec)
  Purges: 127501 (146 / sec)
  Hugeifies: 4358 (5 / sec)
  Dehugifies: 4 (0 / sec)

Pageslabs, active pages, dirty pages and retained pages are rows added
by this change.
2024-11-21 09:23:32 -08:00
Dmitry Ilvokhin
3820e38dc1 Remove validation for HPA ratios
Config validation was introduced at 3aae792b with main intention to fix
infinite purging loop, but it didn't actually fix the underlying
problem, just masked it. Later 47d69b4ea was merged to address the same
problem.

Options `hpa_dirty_mult` and `hpa_hugification_threshold` have different
application dimensions: `hpa_dirty_mult` applied to active memory on the
shard, but `hpa_hugification_threshold` is a threshold for single
pageslab (hugepage). It doesn't make much sense to sum them up together.

While it is true that too high value of `hpa_dirty_mult` and too low
value of `hpa_hugification_threshold` can lead to pathological
behaviour, it is true for other options as well. Poor configurations
might lead to suboptimal and sometimes completely unacceptable
behaviour and that's OK, that is exactly the reason why they are called
poor.

There are other mechanism exist to prevent extreme behaviour, when we
hugified and then immediately purged page, see
`hpa_hugify_blocked_by_ndirty` function, which exist to prevent exactly
this case.

Lastly, `hpa_dirty_mult + hpa_hugification_threshold >= 1` constraint is
too tight and prevents a lot of valid configurations.
2024-11-20 18:59:07 -08:00
Dmitry Ilvokhin
0ce13c6fb5 Add opt hpa_hugify_sync to hugify synchronously
Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.

Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore and get an instant feedback if
range wasn't hugified.

If `hpa_hugify_sync` option is on, we'll try to perform synchronously
collapse and if it wasn't successful, we'll fallback to asynchronous
behaviour.
2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin
a361e886e2 Move je_cv_thp logic closer to definition 2024-11-20 10:52:52 -08:00
Dmitry Ilvokhin
b82333fdec Split stats_arena_hpa_shard_print function
Make multiple functions from `stats_arena_hpa_shard_print` for
readability and ease of change in the future.
2024-11-08 12:18:15 -08:00
Dmitry Ilvokhin
b9758afff0 Add nstime_ms_since to get time since in ms
Milliseconds are used a lot in hpa, so it is convenient to have
`nstime_ms_since` function instead of dividing to `MILLION` constantly.

For consistency renamed `nstime_msec` to `nstime_ms` as `ms` abbreviation
is used much more commonly across codebase than `msec`.

```
$ grep -Rn '_msec' include src | wc -l
2

$ grep -RPn '_ms( |,|:)' include src | wc -l
72
```

Function `nstime_msec` wasn't used anywhere in the code yet.
2024-11-08 10:37:28 -08:00
Qi Wang
2a693b83d2 Fix the sized-dealloc safety check abort msg. 2024-10-14 10:34:15 -07:00
Qi Wang
6d625d5e5e Add support for clock_gettime_nsec_np()
Prefer clock_gettime_nsec_np(CLOCK_UPTIME_RAW) to mach_absolute_time().
2024-10-14 10:33:27 -07:00
Guangli Dai
397827a27d Updated jeprof with more symbols to filter. 2024-10-14 10:31:58 -07:00
Qi Wang
02251c0070 Update the configure cache file example in INSTALL.md 2024-10-10 16:41:48 -07:00
Qi Wang
8c2b8bcf24 Update doc to reflect muzzy decay is disabled by default.
It has been disabled since 5.2.0 (in #1421).
2024-10-10 16:41:23 -07:00
Nathan Slingerland
edc1576f03 Add safe frame-pointer backtrace unwinder 2024-10-01 11:01:56 -07:00
Ben Niu
3a0d9cdadb Use MSVC __declspec(thread) for TSD on Windows 2024-09-30 11:33:44 -07:00
Guangli Dai
1c900088c3 Do not support hpa if HUGEPAGE is too large. 2024-09-27 15:34:13 -07:00
Dmitry Ilvokhin
4f4fd42447 Remove strict_min_purge_interval option
Option `experimental_hpa_strict_min_purge_interval` was expected to be
temporary to simplify rollout of a bugfix. Now, when bugfix rollout is
complete it is safe to remove this option.
2024-09-25 11:49:18 -07:00
Qi Wang
6cc42173cb Assert the mutex is locked within malloc_mutex_assert_owner(). 2024-09-23 18:06:07 -07:00
Qi Wang
44db479fad Fix the lock owner sanity checking during background thread boot.
During boot, some mutexes are not initialized yet, plus there's no point taking
many mutexes while everything is covered by the global init lock, so the locking
assumptions in some functions (e.g. background_thread_enabled_set()) can't be
enforced.  Skip the lock owner check in this case.
2024-09-23 18:06:07 -07:00
Guangli Dai
0181aaa495 Optimize edata_cmp_summary_compare when __uint128_t is available 2024-09-23 16:23:42 -07:00
roblabla
734f29ce56 Fix compilation with MSVC 2022
On MSVC, log is an intrinsic that doesn't require libm. However,
AC_SEARCH_LIBS does not successfully detect this, as it will try to
compile a program using the wrong signature for log. Newer versions of
MSVC CL detects this and rejects the program with the following
messages:

conftest.c(40): warning C4391: 'char log()': incorrect return type for intrinsic function, expected 'double'
conftest.c(44): error C2168: 'log': too few actual parameters for intrinsic function

Since log is always available on MSVC (it's been around since the dawn
of time), we simply always assume it's there if MSVC is detected.
2024-09-23 10:42:31 -07:00
Qi Wang
de5606d0d8 Fix a missing init value warning caught by static analysis. 2024-09-20 16:56:07 -07:00
Qi Wang
1960536b61 Add malloc_mutex_is_locked() sanity checks. 2024-09-20 16:56:07 -07:00
Qi Wang
3eb7a4b53d Fix mutex state tracking around pthread_cond_wait().
pthread_cond_wait drops and re-acquires the mutex internally, w/o
going through our wrapper.  Update the locked state explicitly.
2024-09-20 16:56:07 -07:00
Qi Wang
661fb1e672 Fix the locked flag for malloc_mutex_trylock(). 2024-09-20 16:56:07 -07:00
Guangli Dai
db4f0e7182 Add travis tests for arm64. 2024-09-12 15:40:04 -07:00
Nathan Slingerland
8c2e15d1a5 Add malloc_open() / malloc_close() reentrancy safe helpers 2024-09-12 15:38:08 -07:00
Nathan Slingerland
60f472f367 Fix initialization of pop_attempt_results in bin_batching test 2024-09-12 11:36:17 -07:00
Qi Wang
323ed2e3a8 Optimize fast path to allow static size class computation.
After inlining at LTO time, many callsites have input size known which means the
index and usable size can be translated at compile time.  However the size-index
lookup table prevents it -- this commit solves that by switching to the compute
approach when the size is detected to be a known const.
2024-09-12 11:34:09 -07:00
Qi Wang
c1a3ca3755 Adjust the value width in stats output.
Some of the values are accumulative and can reach high after running for long
periods.
2024-09-11 14:29:32 -07:00
Qi Wang
3383b98f1b Check if the huge page size is expected when enabling HPA. 2024-09-04 15:43:59 -07:00
Qi Wang
cd05b19f10 Fix the VM over-reservation on aarch64 w/ larger pages.
HUGEPAGE could be larger on some platforms (e.g. 512M on aarch64 w/ 64K pages),
in which case it would cause grow_retained / exp_grow to over-reserve VMs.

Similarly, make sure the base alloc has a const 2M alignment.
2024-09-04 15:43:59 -07:00
Shirui Cheng
baa5a90cc6 fix nstime_update_mock in arena_decay unit test 2024-08-29 10:50:33 -07:00
Shirui Cheng
7c99686165 Better handle burst allocation on tcache_alloc_small_hard 2024-08-29 10:50:33 -07:00
Shirui Cheng
0c88be9e0a Regulate GC frequency by requiring a time interval between two consecutive GCs 2024-08-29 10:50:33 -07:00
Shirui Cheng
e2c9f3a9ce Take locality into consideration when doing GC flush 2024-08-29 10:50:33 -07:00
Shirui Cheng
14d5dc136a Allow a range for the nfill passed to arena_cache_bin_fill_small 2024-08-29 10:50:33 -07:00
Shirui Cheng
f68effe4ac Add a runtime option opt_experimental_tcache_gc to guard the new design 2024-08-29 10:50:33 -07:00
Ben Niu
9e123a833c Leverage new Windows API TlsGetValue2 for performance 2024-08-28 16:50:33 -07:00
Qi Wang
e29ac61987 Limit Cirrus CI to freebsd 15 and 14 2024-08-28 16:33:36 -07:00
Qi Wang
bd0a5b0f3b Fix static analysis warnings.
Newly reported warnings included several reserved macro identifier, and
false-positive used-uninitialized.
2024-08-28 16:03:53 -07:00
Guangli Dai
5b72ac098a Remove tests for ppc64 on Travic CI. 2024-08-26 09:53:00 -07:00
Shirui Cheng
8c54637f8c Better trigger race condition in bin_batching unit test 2024-08-23 14:10:04 -07:00
Dmitry Ilvokhin
c7ccb8d7e9 Add experimental prefix to hpa_strict_min_purge_interval
Goal is to make it obvious this option is experimental.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
aaa29003ab Limit maximum number of purged slabs with option
Option `experimental_hpa_max_purge_nhp` introduced for backward
compatibility reasons: to make it possible to have behaviour similar
to buggy `hpa_strict_min_purge_interval` implementation.

When `experimental_hpa_max_purge_nhp` is set to -1, there is no limit
to number of slabs we'll purge on each iteration. Otherwise, we'll purge
no more than `experimental_hpa_max_purge_nhp` hugepages (slabs). This in
turn means we might not purge enough dirty pages to satisfy
`hpa_dirty_mult` requirement.

Combination of `hpa_dirty_mult`, `experimental_hpa_max_purge_nhp` and
`hpa_strict_min_purge_interval` options allows us to have steady rate of
pages returned back to the system. This provides a strickier latency
guarantees as number of `madvise` calls is bounded (and hence number of
TLB shootdowns is limited) in exchange to weaker memory usage
guarantees.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
143f458188 Fix hpa_strict_min_purge_interval option logic
We update `shard->last_purge` on each call of `hpa_try_purge` if we
purged something. This means, when `hpa_strict_min_purge_interval`
option is set only one slab will be purged, because on the next
call condition for too frequent purge protection
`since_last_purge_ms < shard->opts.min_purge_interval_ms` will always
be true. This is not an intended behaviour.

Instead, we need to check `min_purge_interval_ms` once and purge as many
pages as needed to satisfy requirements for `hpa_dirty_mult` option.

Make possible to count number of actions performed in unit tests (purge,
hugify, dehugify) instead of binary: called/not called. Extended current
unit tests with cases where we need to purge more than one page for a
purge phase.
2024-08-20 10:02:38 -07:00
Dmitry Ilvokhin
0a9f51d0d8 Simplify hpa_shard_maybe_do_deferred_work
It doesn't make much sense to repeat purging once we done with
hugification, because we can de-hugify pages that were hugified just
moment ago for no good reason. Let them wait next deferred work phase
instead. And if they still meeting purging conditions then, purge them.
2024-08-20 10:02:38 -07:00
Amaury Séchet
a25b9b8ba9 Simplify the logic when bumping lg_fill_div. 2024-08-06 13:31:49 -07:00
Shirui Cheng
8fefabd3a4 increase the ncached_max in fill_flush test case to 1024 2024-08-06 13:16:09 -07:00
Shirui Cheng
47c9bcd402 Use a for-loop to fulfill flush requests that are larger than CACHE_BIN_NFLUSH_BATCH_MAX items 2024-08-06 13:16:09 -07:00
Shirui Cheng
48f66cf4a2 add a size check when declare a stack array to be less than 2048 bytes 2024-08-06 13:16:09 -07:00
Burton Li
8dc97b1108 Fix NSTIME_MONOTONIC for win32 implementation 2024-07-30 10:30:41 -07:00
Nathan Slingerland
bc32ddff2d Add usize to prof_sample_hook_t 2024-07-30 10:29:30 -07:00
Dmitry Ilvokhin
b66f689764 Emit long string values without truncation
There are few long options (`bin_shards` and `slab_sizes` for example)
when they are specified and we emit statistics value gets truncated.

Moved emitting logic for strings into separate `emitter_emit_str`
function. It will try to emit string same way as before and if value is
too long will fallback emiting rest partially with chunks of `BUF_SIZE`.

Justification for long strings (longer than `BUF_SIZE`) is not
supported.
2024-07-29 13:58:31 -07:00
Danny Lin
c893fcd169 Change macOS mmap tag to fix conflict with CoreMedia
Tag 101 is assigned to "CoreMedia Capture Data", which makes for confusing output when debugging.

To avoid conflicts, use a tag in the reserved application-specific range from 240–255 (inclusive).

All assigned tags: 94d3b45284/osfmk/mach/vm_statistics.h (L773-L775)
2024-06-26 14:53:48 -07:00
Shirui Cheng
a1fcbebb18 skip tcache GC for tcache_max unit test 2024-06-25 12:59:45 -07:00
Guangli Dai
8477ec9562 Set dependent as false for all rtree reads without ownership 2024-06-24 10:50:20 -07:00
Guangli Dai
21bcc0a8d4 Make JEMALLOC_CXX_THROW definition compatible with newer C++ versions 2024-06-13 11:03:05 -07:00
Dmitry Ilvokhin
867c6dd7dc Option to guard hpa_min_purge_interval_ms fix
Change in `hpa_min_purge_interval_ms` handling logic is not backward
compatible as it might increase memory usage. Now this logic guarded by
`hpa_strict_min_purge_interval` option.

When `hpa_strict_min_purge_interval` is true, we will purge no more than
`hpa_min_purge_interval_ms`. When `hpa_strict_min_purge_interval` is
false, old purging logic behaviour is preserved.

Long term strategy migrate all users of hpa to new logic and then delete
`hpa_strict_min_purge_interval` option.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
91a6d230db Respect hpa_min_purge_interval_ms option
Currently, hugepages aware allocator backend works together with classic
one as a fallback for not yet supported allocations. When background
threads are enabled wake up time for classic interfere with hpa as there
were no checks inside hpa purging logic to check if we are not purging too
frequently. If background thread is running and `hpa_should_purge`
returns true, then we will purge, even if we purged less than
hpa_min_purge_interval_ms ago.
2024-06-07 10:52:41 -07:00
Dmitry Ilvokhin
90c627edb7 Export hugepage size with arenas.hugepage 2024-06-05 15:37:41 -07:00
David Goldblatt
f9c0b5f7f8 Bin batching: add some stats.
This lets us easily see what fraction of flush load is being taken up by the
bins, and helps guide future optimization approaches (for example: should we
prefetch during cache bin fills? It depends on how many objects the average fill
pops out of the batch).
2024-05-22 10:30:31 -07:00
David Goldblatt
fc615739cb Add batching to arena bins.
This adds a fast-path for threads freeing a small number of allocations to
bins which are not their "home-base" and which encounter lock contention in
attempting to do so. In producer-consumer workflows, such small lock hold times
can cause lock convoying that greatly increases overall bin mutex contention.
2024-05-22 10:30:31 -07:00
David Goldblatt
44d91cf243 Tcache flush: Partition by bin before locking.
This accomplishes two things:
- It avoids a full array scan (and any attendant branch prediction misses, etc.)
  while holding the bin lock.
- It allows us to know the number of items that will be flushed before flushing
  them, which will (in an upcoming commit) let us know if it's safe to use the
  batched flush (in which case we won't acquire the bin mutex).
2024-05-22 10:30:31 -07:00
David Goldblatt
6e56848850 Tcache: Split up small/large handling.
The main bits of shared code are the edata filtering and the stats flushing
logic, both of which are fairly simple to read and not so painful to duplicate.
The shared code comes at the cost of guarding all the subtle logic with
`if (small)`, which doesn't feel worth it.
2024-05-22 10:30:31 -07:00
David Goldblatt
c085530c71 Tcache batching: Plumbing
In the next commit, we'll start using the batcher to eliminate mutex traffic.
To avoid cluttering up that commit with the random bits of busy-work it entails,
we'll centralize them here.  This commit introduces:
- A batched bin type.
- The ability to mix batched and unbatched bins in the arena.
- Conf parsing to set batches per size and a max batched size.
- mallctl access to the corresponding opt-namespace keys.
- Stats output of the above.
2024-05-22 10:30:31 -07:00
David Goldblatt
70c94d7474 Add batcher module.
This can be used to batch up simple operation commands for later use by another
thread.
2024-05-22 10:30:31 -07:00
David Goldblatt
86f4851f5d Add clang static analyzer suppression macro. 2024-05-22 10:30:31 -07:00
Amaury Séchet
5afff2e44e Simplify the logic in tcache_gc_small. 2024-05-02 18:52:19 -07:00
Qi Wang
8d8379da44 Fix background_thread creation for the oversize_arena.
Bypassing background thread creation for the oversize_arena used to be an
optimization since that arena had eager purging.  However #2466 changed the
purging policy for the oversize_arena -- specifically it switched to the default
decay time when background_thread is enabled.

This issue is noticable when the number of arenas is low: whenever the total #
of arenas is <= 4 (which is the default max # of background threads), in which
case the purging will be stalled since no background thread is created for the
oversize_arena.
2024-05-02 14:45:18 -07:00
Dmitry Ilvokhin
47d69b4eab HPA: Fix infinite purging loop
One of the condition to start purging is `hpa_hugify_blocked_by_ndirty`
function call returns true. This can happen in cases where we have no
dirty memory for this shard at all. In this case purging loop will be an
infinite loop.

`hpa_hugify_blocked_by_ndirty` was introduced at 0f6c420, but at that
time purging loop has different form and additional `break` was not
required. Purging loop form was re-written at 6630c5989, but additional
exit condition wasn't added there at the time.

Repo code was shared by Patrik Dokoupil at [1], I stripped it down to
minimum to reproduce issue in jemalloc unit tests.

[1]: https://github.com/jemalloc/jemalloc/pull/2533
2024-04-30 13:46:32 -07:00
Qi Wang
fa451de17f Fix the tcache flush sanity checking around ncached and nstashed.
When there were many items stashed, it's possible that after flushing stashed,
ncached is already lower than the remain, in which case the flush can simply
return at that point.
2024-04-12 16:01:55 -07:00
debing.sun
630434bb0a Fixed type error with allocated that caused incorrect printing on 32bit 2024-04-09 14:44:43 -07:00
Shirui Cheng
4b555c11a5 Enable heap profiling on MacOS 2024-04-09 12:57:01 -07:00
Daniel Hodges
11038ff762 Add support for namespace pids in heap profile names
This change adds support for writing pid namespaces to the filename of a
heap profile. When running with namespaces pids may reused across
namespaces and if mounts are shared where profiles are written there is
not a great way to differentiate profiles between pids.

Signed-off-by: Daniel Hodges <hodges.daniel.scott@gmail.com>
Signed-off-by: Daniel Hodges <hodgesd@fb.com>
2024-04-09 10:27:52 -07:00
Qi Wang
83b075789b rallocx path: only set errno on the realloc case. 2024-04-05 17:41:43 -07:00
Shirui Cheng
5081c16bb4 Experimental calloc implementation with using memset on larger sizes 2024-04-04 15:31:56 -07:00
Juhyung Park
38056fea64 Set errno to ENOMEM on rallocx() OOM failures
realloc() and rallocx() shares path, and realloc() should set errno to
ENOMEM upon OOM failures.

Fixes: ee961c2310 ("Merge realloc and rallocx pathways.")
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
2024-04-04 15:13:22 -07:00
Dmitry Ilvokhin
268e8ee880 Include HPA ndirty into page allocator ndirty stat 2024-04-04 12:17:30 -07:00
Dmitry Ilvokhin
b2e59a96e1 Introduce getters for page allocator shard stats
Access nactive, ndirty and nmuzzy throught getters and not directly.
There are no functional change, but getters are required to propagate
HPA's statistics up to Page Allocator's statitics.
2024-04-04 12:17:30 -07:00
Amaury Séchet
92aa52c062 Reduce nesting in phn_merge_siblings using an early return. 2024-03-14 13:08:17 -07:00
Amaury Séchet
10d713151d Ensure that the root of a heap is always the best element. 2024-03-14 13:07:45 -07:00
Minsoo Choo
1978e5cdac Update acitons/checkout and actions/upload-artifact to v4 2024-03-12 12:59:15 -07:00
XChy
ed9b00a96b Replace unsigned induction variable with size_t in background_threads_enable
This patch avoids unnecessary vectorizations in clang and missed recognition of memset in gcc. See also https://godbolt.org/z/aoeMsjr4c.
2024-03-05 14:54:50 -08:00
Shirui Cheng
373884ab48 print out all malloc_conf settings in stats 2024-02-29 12:12:44 -08:00
Qi Wang
1aba4f41a3 Allow zero sized memalign to pass.
Instead of failing on assertions.  Previously the same change was made for
posix_memalign and aligned_alloc (#1554).  Make memalign behave the same way
even though it's obsolete.
2024-02-16 13:06:07 -08:00
Qi Wang
6d181bc1b7 Fix Cirrus CI.
13.0-RELEASE does not exist anymore.  "The resource
'projects/freebsd-org-cloud-dev/global/images/family/freebsd-13-0' was not
found"
2024-02-16 13:05:40 -08:00
David Goldblatt
f96010b7fa gitignore: Start ignoring clangd dirs. 2024-01-23 17:02:01 -08:00
Qi Wang
a2c5267409 HPA: Allow frequent reused alloc to bypass the slab_max_alloc limit, as long as
it's within the huge page size.  These requests do not concern internal
fragmentation with huge pages, since the entire range is expected to be
accessed.
2024-01-18 14:51:04 -08:00
guangli-dai
b1792c80d2 Add LOGs when entrying and exiting free and sdallocx. 2024-01-11 14:37:20 -08:00
Qi Wang
05160258df When safety_check_fail, also embed hint msg in the abort function name
because there are cases only logging crash stack traces.
2024-01-11 14:19:54 -08:00
Qi Wang
3a6296e1ef Disable FreeBSD on Travis CI since it's not working.
Travis CI currently provides only FreeBSD 12 which is EOL.
2024-01-04 14:47:52 -08:00
Minsoo Choo
d284aad027 Test on more FreeBSD versions
Added 14.0-RELEASE
Added 15-CURRENT
Added 14-STABLE
Added 13-STABLE

13.0-RELEASE will be updated when 13.3-RELEASE comes out.
2024-01-04 12:48:24 -08:00
Connor
dfb3260b97 Fix missing cleanup message for collected profiles.
```
sub cleanup {
  unlink($main::tmpfile_sym);
  unlink(keys %main::tempnames);

  # We leave any collected profiles in $HOME/jeprof in case the user wants
  # to look at them later.  We print a message informing them of this.
  if ((scalar(@main::profile_files) > 0) &&
      defined($main::collected_profile)) {
    if (scalar(@main::profile_files) == 1) {
      print STDERR "Dynamically gathered profile is in $main::collected_profile\n";
    }
    print STDERR "If you want to investigate this profile further, you can do:\n";
    print STDERR "\n";
    print STDERR "  jeprof \\\n";
    print STDERR "    $main::prog \\\n";
    print STDERR "    $main::collected_profile\n";
    print STDERR "\n";
  }
}
```
On cleanup, it would print out a message for the collected profile.
If there is only one collected profile, it would pop by L691, then `scalar(@main::profile_files)` would be 0, and no message would be printed.
2024-01-03 14:24:38 -08:00
Honggyu Kim
f6fe6abdcb build: Make autogen.sh accept quoted extra options
The current autogen.sh script doesn't allow receiving quoted extra
options.

If someone wants to pass extra CFLAGS that is split into multiple
options with a whitespace, then a quote is required.

However, the configure inside autogen.sh fails in this case as follows.

  $ ./autogen.sh CFLAGS="-Dmmap=cxl_mmap -Dmunmap=cxl_munmap"
  autoconf
  ./configure --enable-autogen CFLAGS=-Dmmap=cxl_mmap -Dmunmap=cxl_munmap
  configure: error: unrecognized option: `-Dmunmap=cxl_munmap'
  Try `./configure --help' for more information
  Error 0 in ./configure

It's because the quote discarded unexpectedly when calling configure.

This patch is to fix this problem.

Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
2024-01-03 14:20:34 -08:00
guangli-dai
eda05b3994 Fix static analysis warnings. 2024-01-03 14:18:52 -08:00
Shirui Cheng
e4817c8d89 Cleanup cache_bin_info_t* info input args 2023-10-25 10:27:31 -07:00
Qi Wang
3025b021b9 Optimize mutex and bin alignment / locality. 2023-10-23 20:28:26 -07:00
guangli-dai
e2cd27132a Change stack_size assertion back to the more compatabile one. 2023-10-23 20:28:26 -07:00
guangli-dai
756d4df2fd Add util.c into vs project file. 2023-10-18 22:11:13 -07:00
Qi Wang
04d1a87b78 Fix a zero-initializer warning on macOS. 2023-10-18 14:12:43 -07:00
guangli-dai
d88fa71bbd Fix nfill = 0 bug when ncached_max is 1 2023-10-18 14:11:46 -07:00
guangli-dai
6fb3b6a8e4 Refactor the tcache initiailization
1. Pre-generate all default tcache ncached_max in tcache_boot;
2. Add getters returning default ncached_max and ncached_max_set;
3. Refactor tcache init so that it is always init with a given setting.
2023-10-18 14:11:46 -07:00
guangli-dai
8a22d10b83 Allow setting default ncached_max for each bin through malloc_conf 2023-10-18 14:11:46 -07:00
guangli-dai
867eedfc58 Fix the bug in dalloc promoted allocations.
An allocation small enough will be promoted so that it does not
share an extent with others.  However, when dalloc, such allocations
may not be dalloc as a promoted one if nbins < SC_NBINS.  This
commit fixes the bug.
2023-10-17 14:53:23 -07:00
guangli-dai
630f7de952 Add mallctl to set and get ncached_max of each cache_bin.
1. `thread_tcache_ncached_max_read_sizeclass` allows users to get the
    ncached_max of the bin with the input sizeclass, passed in through
    oldp (will be upper casted if not an exact bin size is given).
2. `thread_tcache_ncached_max_write` takes in a char array
    representing the settings for bins in the tcache.
2023-10-17 14:53:23 -07:00
guangli-dai
6b197fdd46 Pre-generate ncached_max for all bins for better tcache_max tuning experience. 2023-10-17 14:53:23 -07:00
Shirui Cheng
36becb1302 metadata usage breakdowns: tracking edata and rtree usages 2023-10-11 11:56:01 -07:00
Qi Wang
005f20aa7f Fix comments about malloc_conf to enable logging. 2023-10-04 11:49:10 -07:00
guangli-dai
7a9e4c9073 Mark jemalloc.h as system header to resolve header conflicts. 2023-10-04 11:41:30 -07:00
Qi Wang
72cfdce718 Allocate tcache stack from base allocator
When using metadata_thp, allocate tcache bin stacks from base0, which means they
will be placed on huge pages along with other metadata, instead of mixed with
other regular allocations.

In order to do so, modified the base allocator to support limited reuse: freed
tcached stacks (from thread termination) will be returned to base0 and made
available for reuse, but no merging will be attempted since they were bump
allocated out of base blocks. These reused base extents are managed using
separately allocated base edata_t -- they are cached in base->edata_avail when
the extent is all allocated.

One tricky part is, stats updating must be skipped for such reused extents
(since they were accounted for already, and there is no purging for base). This
requires tracking the "if is reused" state explicitly and bypass the stats
updates when allocating from them.
2023-09-18 12:18:32 -07:00
guangli-dai
a442d9b895 Enable per-tcache tcache_max
1. add tcache_max and nhbins into tcache_t so that they are per-tcache,
   with one auto tcache per thread, it's also per-thread;
2. add mallctl for each thread to set its own tcache_max (of its auto tcache);
3. store the maximum number of items in each bin instead of using a global storage;
4. add tests for the modifications above.
5. Rename `nhbins` and `tcache_maxclass` to `global_do_not_change_nhbins` and `global_do_not_change_tcache_maxclass`.
2023-09-06 10:47:14 -07:00
guangli-dai
fbca96c433 Remove unnecessary parameters for cache_bin_postincrement. 2023-09-06 10:47:14 -07:00
Evers Chen
7d9eceaf38 Fix array bounds false warning in gcc 12.3.0
1.error: array subscript 232 is above array bounds of ‘size_t[232]’ in gcc 12.3.0
2.it also optimizer to the code
2023-09-05 14:33:55 -07:00
BtbN
ce8ce99a4a Expose jemalloc_prefix via pkg-config 2023-09-05 14:30:21 -07:00
BtbN
ed7e6fe71a Expose private library dependencies via pkg-config
When linking statically, these need to be included for linking to succeed.
2023-09-05 14:29:33 -07:00
Qi Wang
7d563a8f81 Update safety check message to remove --enable-debug when it's already on. 2023-09-05 14:15:45 -07:00
Qi Wang
b71da25b8a Fix reading CPU id using rdtscp.
As pointed out in #2527, the correct register containing CPU id should be ecx
instead edx.
2023-08-28 11:46:39 -07:00
Qi Wang
87c56c8df8 Fix arenas.i.bins.j.mutex link id in manual. 2023-08-28 11:01:13 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
Kevin Svetlitski
d2c9ed3d1e Ensure short read(2)s/write(2)s are properly handled by IO utilities
`read(2)` and `write(2)` may read or write fewer bytes than were
requested. In order to robustly ensure that all of the requested bytes
are read/written, these edge-cases must be handled.
2023-08-11 13:36:24 -07:00
guangli-dai
254c4847e8 Print colorful reminder for failed tests. 2023-08-08 15:01:07 -07:00
Kevin Svetlitski
4f50f782fa Use compiler-provided assume builtins when available
There are several benefits to this:
1. It's cleaner and more reliable to use the builtin to
   inform the compiler of assumptions instead of hoping that the
   optimizer understands your intentions.
2. `clang` will warn you if any of your assumptions would produce
   side-effects (which the compiler will discard). [This blog post](https://fastcompression.blogspot.com/2019/01/compiler-checked-contracts.html)
   by Yann Collet highlights that a hazard of using the
   `unreachable()`-based method of signaling assumptions is that it
   can sometimes result in additional instructions being generated (see
   [this Godbolt link](https://godbolt.org/z/lKNMs3) from the blog post
   for an example).
2023-08-08 14:59:36 -07:00
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of
configuration parsing and either normalizing them or aborting depending on
if `abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
424dd61d57 Issue a warning upon directly accessing an arena's bins
An arena's bins should normally be accessed via the `arena_get_bin`
function, which properly takes into account bin-shards. To ensure that
we don't accidentally commit code which incorrectly accesses the bins
directly, we mark the field with `__attribute__((deprecated))` with an
appropriate warning message, and suppress the warning in the few places
where directly accessing the bins is allowed.
2023-08-04 15:47:05 -07:00
Kevin Svetlitski
120abd703a Add support for the deprecated attribute
This is useful for enforcing the usage of getter/setter functions to
access fields which are considered private or have unique access constraints.
2023-08-04 15:47:05 -07:00
Kevin Svetlitski
162ff8365d Update the Ubuntu version used by Travis CI
Update from Ubuntu Focal Fossa to Ubuntu Jammy Jellyfish. Staying up to
date is always good, but I'm also hoping that perhaps this newer release
contains fixes so that PowerPC VMs don't randomly hang indefinitely
while booting anymore, stalling our CI pipeline.
2023-08-04 15:32:15 -07:00
Kevin Svetlitski
07a2eab3ed Stop over-reporting memory usage from sampled small allocations
@interwq noticed [while reviewing an earlier PR](https://github.com/jemalloc/jemalloc/pull/2478#discussion_r1256217261)
that I missed modifying this statistics accounting in line with the rest
of the changes from #2459. This is now fixed, such that sampled small
allocations increment the `.nmalloc`/`.ndalloc` of their effective bin
size instead of over-reporting memory usage by attributing all such
allocations to `SC_LARGE_MINCLASS`.
2023-08-03 16:12:22 -07:00
Kevin Svetlitski
ea5b7bea31 Add configuration option controlling DSS support
In many environments, the fallback `sbrk(2)` allocation path is never
used even if the system supports the syscall; if you're at the point
where `mmap(2)` is failing, `sbrk(2)` is unlikely to succeed. Without
changing the default, I've added the ability to disable the usage of DSS
altogether, so that you do not need to pay for the additional code size
and handful of extra runtime branches in such environments.
2023-08-03 11:52:25 -07:00
Qi Wang
6816b23862 Include the unrecognized malloc conf option in the error message.
Previously the option causing trouble will not be printed, unless the option
key:value pair format is found.
2023-08-02 10:44:55 -07:00
Kevin Svetlitski
62648c88e5 Ensured sampled allocations are properly deallocated during arena_reset
Sampled allocations were not being demoted before being deallocated
during an `arena_reset` operation.
2023-08-01 11:35:37 -07:00
Kevin Svetlitski
b01d496646 Add an override for the compile-time malloc_conf to jemalloc_internal_overrides.h 2023-07-31 14:53:15 -07:00
Kevin Svetlitski
9ba1e1cb37 Make ctl_arena_clear slightly more efficient
While this function isn't particularly hot, (accounting for just 0.27% of
time spent inside the allocator on average across the fleet), looking
at the generated assembly and performance profiles does show we're dispatching
to multiple different `memset`s when we could instead be just tail-calling
`memset` once, reducing code size and marginally improving performance.
2023-07-31 14:44:04 -07:00
Kevin Svetlitski
8ff7e7d6c3 Remove errant #includes in public jemalloc.h header
In an attempt to make all headers self-contained, I inadvertently added
`#include`s which refer to intermediate, generated headers that aren't
included in the final install. Closes #2489.
2023-07-25 16:26:50 -07:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
4827bb17bd Remove vestigial TCACHE_STATE_* macros 2023-07-24 14:40:42 -07:00
Kevin Svetlitski
1431153695 Define SBRK_INVALID instead of using a magic number 2023-07-24 14:40:42 -07:00
Kevin Svetlitski
7e54dd1ddb Define PROF_TCTX_SENTINEL instead of using magic numbers
This makes the code more readable on its own, and also sets the stage
for more cleanly handling the pointer provenance lints in a following
commit.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
c49c17f128 Suppress verbose frame address warnings
These warnings are not useful, and make the output of some CI jobs
enormous and difficult to read, so let's suppress them.
2023-07-24 10:44:17 -07:00
Kevin Svetlitski
cdb2c0e02f Implement C23's free_sized and free_aligned_sized
[N2699 - Sized Memory Deallocation](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2699.htm)
introduced two new functions which were incorporated into the C23
standard, `free_sized` and `free_aligned_sized`. Both already have
analogues in Jemalloc, all we are doing here is adding the appropriate
wrappers.
2023-07-20 15:06:41 -07:00
Kevin Svetlitski
41e0b857be Make headers self-contained by fixing #includes
Header files are now self-contained, which makes the relationships
between the files clearer, and crucially allows LSP tools like `clangd`
to function correctly in all of our header files. I have verified that
the headers are self-contained (aside from the various Windows shims) by
compiling them as if they were C files – in a follow-up commit I plan to
add this to CI to ensure we don't regress on this front.
2023-07-14 09:06:32 -07:00
Kevin Svetlitski
856db56f6e Move tsd implementation details into tsd_internals.h
This is a prerequisite to achieving self-contained headers. Previously,
the various tsd implementation headers (`tsd_generic.h`,
`tsd_tls.h`, `tsd_malloc_thread_cleanup.h`, and `tsd_win.h`) relied
implicitly on being included in `tsd.h` after a variety of dependencies
had been defined above them. This commit instead makes these
dependencies explicit by splitting them out into a separate file,
`tsd_internals.h`, which each of the tsd implementation headers includes
directly.
2023-07-14 09:06:32 -07:00
Kevin Svetlitski
36ca0c1b7d Stop concealing pointer provenance in phn_link_get
At least for LLVM, [casting from an integer to a pointer hides provenance information](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
and inhibits optimizations. Here's a [Godbolt link](https://godbolt.org/z/5bYPcKoWT)
showing how this change removes a couple unnecessary branches in
`phn_merge_siblings`, which is a very hot function. Canary profiles show
only minor improvements (since most of the cost of this function is in
cache misses), but there's no reason we shouldn't take it.
2023-07-13 15:12:31 -07:00
Kevin Svetlitski
314c073a38 Print the failed assertion before aborting in test cases
This makes it faster and easier to debug, so that you don't need to fire
up a debugger just to see which assertion triggered in a failing test.
2023-07-13 15:07:17 -07:00
Kevin Svetlitski
65d3b5989b Print test error messages in color when stderr is a terminal
When stderr is a terminal and supports color, print error messages
from tests in red to make them stand out from the surrounding output.
2023-07-13 13:03:23 -07:00
Kevin Svetlitski
1d9e9c2ed6 Fix inconsistent parameter names between definition/declaration pairs
For the sake of consistency, function definitions and their
corresponding declarations should use the same names for parameters.
I've enabled this check in static analysis to prevent this issue from
occurring again in the future.
2023-07-13 12:59:47 -07:00
Kevin Svetlitski
5711dc31d8 Only enable -Wstrict-prototypes in CI to unbreak feature detection
Adding `-Wstrict-prototypes` to the default `CFLAGS` in PR #2473 had the
non-obvious side-effect of breaking configure-time feature detection,
because the [test-program `autoconf` generates for feature
detection](https://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/Generating-Sources.html#:~:text=main%20())
defines `main` as:
```c
int main()
```
Which causes all feature checks to fail, since this triggers
`-Wstrict-prototypes` and the feature checks use `-Werror`.

Resolved by only adding `-Wstrict-prototypes` to
`EXTRA_{CFLAGS,CXXFLAGS}` in CI, since these flags are not used during
feature detection and we control which compiler is used.
2023-07-06 18:03:13 -07:00
Kevin Svetlitski
589c63b424 Make eligible global variables static and/or const
For better or worse, Jemalloc has a significant number of global
variables. Making all eligible global variables `static` and/or `const`
at least makes it slightly easier to reason about them, as these
qualifications communicate to the programmer restrictions on their use
without having to `grep` the whole codebase.
2023-07-06 14:15:12 -07:00
Qi Wang
e249d1a2a1 Remove unreachable code. 2023-07-06 12:06:06 -07:00
Qi Wang
602edd7566 Enabled -Wstrict-prototypes and fixed warnings. 2023-07-06 12:00:02 -07:00
Kevin Svetlitski
ebd7e99f5c Add a test-case for small profiled allocations
Validate that small allocations (i.e. those with `size <= SC_SMALL_MAXCLASS`)
which are sampled for profiling maintain the expected invariants even
though they now take up less space.
2023-07-03 16:19:06 -07:00
Kevin Svetlitski
5a858c64d6 Reduce the memory overhead of sampled small allocations
Previously, small allocations which were sampled as part of heap
profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory
usage becomes problematic when the page size is increased, as noted in #2358.

Small allocations are now rounded up to the nearest multiple of `PAGE`
instead, reducing the memory overhead by a factor of 4 in the most
extreme cases.
2023-07-03 16:19:06 -07:00
Kevin Svetlitski
e1338703ef Address compiler warnings in the unit tests 2023-07-03 16:06:35 -07:00
Qi Wang
d131331310 Avoid eager purging on the dedicated oversize arena when using bg thds.
We have observed new workload patterns (namely ML training type) that cycle
through oversized allocations frequently, because 1) the dataset might be sparse
which is faster to go through, and 2) GPU accelerated.  As a result, the eager
purging from the oversize arena becomes a bottleneck.  To offer an easy
solution, allow normal purging of the oversized extents when background threads
are enabled.
2023-06-27 11:57:41 -07:00
Kevin Svetlitski
46e464a26b Fix downloading LLVM in GitHub Action
It turns out LLVM does not include a build for every platform in the
assets for every release, just some of them. As such, I've pinned us to
the latest release version with a corresponding build.
2023-06-23 14:30:49 -07:00
Kevin Svetlitski
f2e00d2fd3 Remove trailing whitespace
Additionally, added a GitHub Action to ensure no more trailing
whitespace will creep in again in the future.

I'm excluding Markdown files from this check, since trailing whitespace
is significant there, and also excluding `build-aux/install-sh` because
there is significant trailing whitespace on the line that sets
`defaultIFS`.
2023-06-23 11:58:18 -07:00
Kevin Svetlitski
05385191d4 Add GitHub action which runs static analysis
Now that all of the various issues that static analysis uncovered have
been fixed (#2431, #2432, #2433, #2436, #2437, #2446), I've added a
GitHub action which will run static analysis for every PR going forward.
When static analysis detects issues with your code, the GitHub action
provides a link to download its findings in a form tailored for human
consumption.

Take a look at [this demonstration of what it looks like when static
analysis issues are
found](https://github.com/Svetlitski/jemalloc/actions/runs/5010245602)
on my fork for an example (make sure to follow the instructions in the
error message to download and inspect the results).
2023-06-23 11:55:43 -07:00
Kevin Svetlitski
bb0333e745 Fix remaining static analysis warnings
Fix or suppress the remaining warnings generated by static analysis.
This is a necessary step before we can incorporate static analysis into
CI. Where possible, I've preferred to modify the code itself instead of
just disabling the warning with a magic comment, so that if we decide to
use different static analysis tools in the future we will be covered
against them raising similar warnings.
2023-06-23 11:50:29 -07:00
Kevin Svetlitski
210f0d0b2b Fix read of uninitialized data in prof_free
In #2433, I inadvertently introduced a regression which causes the use of
uninitialized data. Namely, the control path I added for the safety
check in `arena_prof_info_get` neglected to set `prof_info->alloc_tctx`
when the check fails, resulting in `prof_info.alloc_tctx` being
uninitialized [when it is read at the end of
`prof_free`](90176f8a87/include/jemalloc/internal/prof_inlines.h (L272)).
2023-06-15 18:30:05 -07:00
Kevin Svetlitski
90176f8a87 Fix segfault in rb *_tree_remove
Static analysis flagged this. It's possible to segfault in the
`*_tree_remove` function generated by `rb_gen`, as `nodep` may
still be `NULL` after the initial for loop. I can confirm from reviewing
the fleetwide coredump data that this was in fact being hit in
production, primarily through `tctx_tree_remove`, and much more rarely
through `gctx_tree_remove`.
2023-06-07 14:48:41 -07:00
Qi Wang
86eb49b478 Fix the arena selection for oversized allocations.
Use the per-arena oversize_threshold, instead of the global setting.
2023-06-06 15:03:13 -07:00
Christos Zoulas
5832ef6589 Use a local variable to set the alignment for this particular allocation
instead of changing mmap_flags which makes the change permanent. This was
enforcing large alignments for allocations that did not need it causing
fragmentation. Reported by Andreas Gustafsson.
2023-05-31 14:44:24 -07:00
Kevin Svetlitski
6d4aa33753 Extract the calculation of psset heap assignment for an hpdata into a common function
This is in preparation for upcoming changes I plan to make to this
logic. Extracting it into a common function will make this easier and
less error-prone, and cleans up the existing code regardless.
2023-05-31 11:44:04 -07:00
Arne Welzel
c1d3ad4674 Prune je_malloc_default and do_rallocx in jeprof
Running a simple Ruby and Python execution je_malloc_default and
do_rallocx() in the resulting SVG / text output. Prune these, too.

    MALLOC_CONF='stats_print:true,lg_prof_sample:8,prof:true,prof_final:true' \
        python3 -c '[x for x in range(10000000)]'

    MALLOC_CONF='stats_print:true,lg_prof_sample:8,prof:true,prof_final:true' \
        ruby -e 'puts (0..1000).map{"0"}.join(" ")'
2023-05-31 11:41:09 -07:00
Arne Welzel
d59e30cbc9 Rename fallback_impl to fallbackNewImpl and prune in jeprof
The existing fallback_impl name seemed a bit generic and given
it's static probably okay to rename.

Closes #2451
2023-05-31 11:41:09 -07:00
Qi Wang
d577e9b588 Explicitly cast to unsigned for MALLOCX_ARENA and _TCACHE defines. 2023-05-26 11:52:42 -07:00
Qi Wang
a2259f9fa6 Fix the include path of "jemalloc_internal_overrides.h". 2023-05-25 15:22:02 -07:00
Kevin Svetlitski
9c32689e57 Fix bug where hpa_shard was not being destroyed
It appears that this was a simple mistake where `hpa_shard_disable` was
being called instead of `hpa_shard_destroy`. At present
`hpa_shard_destroy` is not called anywhere at all outside of test-cases,
which further suggests that this is a bug. @davidtgoldblatt noted
however that since HPA is disabled for manual arenas and we don't
support destruction for auto arenas that presently there is no way to
actually trigger this bug. Nonetheless, it should be fixed.
2023-05-18 14:17:38 -07:00
Kevin Svetlitski
4e6f1e9208 Allow overriding LG_PAGE
This is useful for our internal builds where we override the
configuration in the header files generated by autoconf.
2023-05-17 13:55:38 -07:00
Kevin Svetlitski
3e2ba7a651 Remove dead stores detected by static analysis
None of these are harmful, and they are almost certainly optimized
away by the compiler. The motivation for fixing them anyway is that
we'd like to enable static analysis as part of CI, and the first step
towards that is resolving the warnings it produces at present.
2023-05-11 20:27:49 -07:00
Kevin Svetlitski
0288126d9c Fix possible NULL pointer dereference from mallctl("prof.prefix", ...)
Static analysis flagged this issue. Here is a minimal program which
causes a segfault within Jemalloc:
```
#include <jemalloc/jemalloc.h>

const char *malloc_conf = "prof:true";

int main() {
  mallctl("prof.prefix", NULL, NULL, NULL, 0);
}
```

Fixed by checking if `prefix` is `NULL`.
2023-05-11 14:47:50 -07:00
Qi Wang
d4a2b8bab1 Add the prof_sys_thread_name feature in the prof_recent unit test.
This tests the combination of the prof_recent and thread_name features.
Verified that it catches the issue being fixed in this PR.

Also explicitly set thread name in test/unit/prof_recent.  This fixes the name
testing when no default thread name is set (e.g. FreeBSD).
2023-05-11 09:10:57 -07:00
Qi Wang
94ace05832 Fix the prof thread_name reference in prof_recent dump.
As pointed out in #2434, the thread_name in prof_tdata_t was changed in #2407.
This also requires an update for the prof_recent dump, specifically the emitter
expects a "char **" which is fixed in this commit.
2023-05-11 09:10:57 -07:00
Qi Wang
6ea8a7e928 Add config detection for JEMALLOC_HAVE_PTHREAD_SET_NAME_NP.
and use it on the background thread name setting.
2023-05-11 09:10:57 -07:00
auxten
5bac384970 If ptr present check if alloc_ctx.edata == NULL 2023-05-10 17:18:22 -07:00
auxten
019cccc293 Make arenas_lookup_ctl triable 2023-05-10 17:18:22 -07:00
Kevin Svetlitski
dc0a184f8d Fix possible NULL pointer dereference in VERIFY_READ
Static analysis flagged this. Fixed by simply checking `oldlenp`
before dereferencing it.
2023-05-09 10:57:09 -07:00
Kevin Svetlitski
12311fe6c3 Fix segfault in extent_try_coalesce_impl
Static analysis flagged this. `extent_record` was passing `NULL` as the
value for `coalesced` to `extent_try_coalesce`, which in turn passes
that argument to `extent_try_coalesce_impl`, where it is written to
without checking if it is `NULL`. I can confirm from reviewing the
fleetwide coredump data that this was in fact being hit in production.
2023-05-09 10:55:44 -07:00
Kevin Svetlitski
70344a2d38 Make eligible functions static
The codebase is already very disciplined in making any function which
can be `static`, but there are a few that appear to have slipped through
the cracks.
2023-05-08 15:00:02 -07:00
Kevin Svetlitski
6841110bd6 Make edata_cmp_summary_comp 30% faster
`edata_cmp_summary_comp` is one of the very hottest functions, taking up
3% of all time spent inside Jemalloc. I noticed that all existing
callsites rely only on the sign of the value returned by this function,
so I came up with this equivalent branchless implementation which
preserves this property. After empirical measurement, I have found that
this implementation is 30% faster, therefore representing a 1% speed-up
to the allocator as a whole.

At @interwq's suggestion, I've applied the same optimization to
`edata_esnead_comp` in case this function becomes hotter in the future.
2023-05-04 09:59:17 -07:00
Amaury Séchet
f2b28906e6 Some nits in cache_bin.h 2023-05-01 10:21:17 -07:00
Kevin Svetlitski
fc680128e0 Remove errant assert in arena_extent_alloc_large
This codepath may generate deferred work when the HPA is enabled.
See also [@davidtgoldblatt's relevant comment on the PR which
introduced this](https://github.com/jemalloc/jemalloc/pull/2107#discussion_r699770967)
which prevented a similarly incorrect `assert` from being added elsewhere.
2023-05-01 10:00:30 -07:00
Eric Mueller
521970fb2e Check for equality instead of assigning in asserts in hpa_from_pai.
It appears like a simple typo means we're unconditionally overwriting
some fields in hpa_from_pai when asserts are enabled. From hpa_shard_init,
it looks like these fields have these values anyway, so this shouldn't
cause bugs, but if something is wrong it seems better to have these
asserts in place.

See issue #2412.
2023-04-17 20:57:48 -07:00
guangli-dai
5f64ad60cd Remove locked flag set in malloc_mutex_trylock
As a hint flag of the lock, parameter locked should be set only
when the lock is gained or freed.
2023-04-06 10:57:04 -07:00
Qi Wang
434a68e221 Disallow decay during reentrancy.
Decay should not be triggered during reentrant calls (may cause lock order
reversal / deadlocks).  Added a delay_trigger flag to the tickers to bypass
decay when rentrancy_level is not zero.
2023-04-05 10:16:37 -07:00
Qi Wang
e62aa478c7 Rearrange the bools in prof_tdata_t to save some bytes.
This lowered the sizeof(prof_tdata_t) from 200 to 192 which is a round size
class.  Afterwards the tdata_t size remain unchanged with the last commit, which
effectively inlined the storage of thread names for free.
2023-04-05 10:03:12 -07:00
Qi Wang
ce0b7ab6c8 Inline the storage for thread name in prof_tdata_t.
The previous approach managed the thread name in a separate buffer, which causes
races because the thread name update (triggered by new samples) can happen at
the same time as prof dumping (which reads the thread names) -- these two
operations are under separate locks to avoid blocking each other.  Implemented
the thread name storage as part of the tdata struct, which resolves the lifetime
issue and also avoids internal alloc / dalloc during prof_sample.
2023-04-05 10:03:12 -07:00
Qi Wang
6cab460a45 Add a multithreaded test for prof_sys_thread_name.
Verified that this catches the issue being fixed in 5fd5583.
2023-04-05 10:03:12 -07:00
Amaury Séchet
5266152d79 Simplify the logic in ph_remove 2023-03-31 14:35:31 -07:00
Amaury Séchet
be6da4f663 Do not maintain root->prev in ph_remove. 2023-03-31 14:34:57 -07:00
Amaury Séchet
543e2d61e6 Simplify the logic in ph_insert
Also fixes what looks like an off by one error in the lazy aux list
merge part of the code that previously never touched the last node in
the aux list.
2023-03-31 14:34:24 -07:00
guangli-dai
31e01a98f1 Fix the rdtscp detection bug and add prefix for the macro. 2023-03-23 11:16:19 -07:00
Qi Wang
8b64be3441 Explicit arena assignment in test_tcache_max.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Qi Wang
8e7353a19b Explicit arena assignment in test_thread_idle.
Otherwise the associated arena could change with percpu arena enabled.
2023-03-22 15:16:43 -07:00
Marvin Schmidt
45249cf5a9 Fix exception specification error for hosts using musl libc
It turns out that the previous commit did not suffice since the
JEMALLOC_SYS_NOTHROW definition also causes the same exception specification
errors as JEMALLOC_USE_CXX_THROW did:
```
x86_64-pc-linux-musl-cc -std=gnu11 -Werror=unknown-warning-option -Wall -Wextra -Wshorten-64-to-32 -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-
arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -fvisibility=hidden -Wimplicit-fallthrough -O3 -funroll-loops -march=native -O2 -pipe -c -march=native -O2 -pipe -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/background_thread.o src/background_thread.c
In file included from src/jemalloc_cpp.cpp:9:
In file included from include/jemalloc/internal/jemalloc_preamble.h:27:
include/jemalloc/internal/../jemalloc.h:254:32: error: exception specification in declaration does not match previous declaration
    void JEMALLOC_SYS_NOTHROW   *je_malloc(size_t size)
                                 ^
include/jemalloc/internal/../jemalloc.h:75:21: note: expanded from macro 'je_malloc'
                    ^
/usr/x86_64-pc-linux-musl/include/stdlib.h:40:7: note: previous declaration is here
void *malloc (size_t);
      ^
```

On systems using the musl C library we have to omit the exception specification
on malloc function family like it's done for MacOS, FreeBSD and OpenBSD.
2023-03-16 12:11:40 -07:00
Marvin Schmidt
aba1645f2d configure: Handle *-linux-musl* hosts properly
This is the same as the `*-*-linux*` case with the two exceptions that
we don't set glibc=1 and don't define JEMALLOC_USE_CXX_THROW
2023-03-16 12:11:40 -07:00
Qi Wang
d503d72129 Add the missing descriptions in AC_DEFINE 2023-03-14 16:47:00 -07:00
Qi Wang
71bc1a3d91 Avoid assuming the arena id in test when percpu_arena is used. 2023-03-13 10:50:10 -07:00
Amaury Séchet
f743690739 Remove unused mutex from hpa_central 2023-03-10 11:25:47 -08:00
Chris Seymour
4edea8eb8e switch to https 2023-03-09 11:44:02 -08:00
guangli-dai
09e4b38fb1 Use asm volatile during benchmarks. 2023-02-24 11:17:48 -08:00
Fernando Pelliccioni
e8b28908de [MSVC] support for Visual Studio 2019 and 2022 2023-02-21 13:39:25 -08:00
barracuda156
4422f88d17 Makefile.in: link with g++ when cxx enabled 2023-02-21 13:26:58 -08:00
Qi Wang
c7805f1eb5 Add a header in HPA stats for the nonfull slabs. 2023-02-17 13:31:27 -08:00
Qi Wang
b6125120ac Add an explicit name to the dedicated oversize arena. 2023-02-17 13:31:09 -08:00
Qi Wang
97b313c7d4 More conservative setting for /test/unit/background_thread_enable.
Lower the thread and arena count to avoid resource exhaustion on 32-bit.
2023-02-16 14:42:21 -08:00
Qi Wang
5fd55837bb Fix thread_name updating for heap profiling.
The current thread name reading path updates the name every time, which requires
both alloc and dalloc -- and the temporary NULL value in the middle causes races
where the prof dump read path gets NULLed in the middle.

Minimize the changes in this commit to isolate the bugfix testing; will also
refactor the whole thread name paths later.
2023-02-15 17:49:40 -08:00
Qi Wang
8580c65f81 Implement prof sample hooks "experimental.hooks.prof_sample(_free)".
The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to
allow advanced users to track additional information, to enable new ways of
profiling on top of the jemalloc heap profile and sample features.

The sample hook is invoked after the allocation and backtracing, and forwards
the both the allocation and backtrace to the user hook; the sample_free hook
happens before the actual deallocation, and forwards only the ptr and usz to the
hook.
2022-12-07 16:06:49 -08:00
guangli-dai
a74acb57e8 Fix dividing 0 error in stress/cpp/microbench
Summary:
Per issue #2356, some CXX compilers may optimize away the
new/delete operation in stress/cpp/microbench.cpp.
Thus, this commit (1) bumps the time interval to 1 if it is 0, and
(2) modifies the pointers in the microbench to volatile.
2022-12-06 10:46:14 -08:00
Guangli Dai
e8f9f13811 Inline free and sdallocx into operator delete 2022-11-21 11:14:05 -08:00
guangli-dai
06374d2a6a Benchmark operator delete
Added the microbenchmark for operator delete.
Also modified bench.h so that it can be used in C++.
2022-11-21 11:14:05 -08:00
guangli-dai
14ad8205bf Update the ratio display in benchmark
In bench.h, specify the ratio as the time consumption ratio and
modify the display of the ratio.
2022-11-21 11:14:05 -08:00
Qi Wang
481bbfc990 Add a configure option --enable-force-getenv.
Allows the use of getenv() rather than secure_getenv() to read MALLOC_CONF.
This helps in situations where hosts are under full control, and setting
MALLOC_CONF is needed while also setuid.  Disabled by default.
2022-11-04 13:37:14 -07:00
Qi Wang
143e9c4a2f Enable fast thread locals for dealloc-only threads.
Previously if a thread does only allocations, it stays on the slow path /
minimal initialized state forever.  However, dealloc-only is a valid pattern for
dedicated reclamation threads -- this means thread cache is disabled (no batched
flush) for them, which causes high overhead and contention.

Added the condition to fully initialize TSD when a fair amount of dealloc
activities are observed.
2022-10-25 09:54:38 -07:00
Paul Smith
be65438f20 jemalloc_internal_types.h: Use alloca if __STDC_NO_VLA__ is defined
No currently-available version of Visual Studio C compiler supports
variable length arrays, even if it defines __STDC_VERSION__ >= C99.
As far as I know Microsoft has no plans to ever support VLAs in MSVC.

The C11 standard requires that the __STDC_NO_VLA__ macro be defined if
the compiler doesn't support VLAs, so fall back to alloca() if so.
2022-10-14 15:48:32 -07:00
divanorama
1897f185d2 Fix safety_check segfault in double free test 2022-10-03 10:55:10 -07:00
Jordan Rome
b04e7666f2 update PROFILING_INTERNALS.md
Expand the bad example of summing before unbiasing.
2022-10-03 10:48:29 -07:00
David Carlier
4c95c953e2 fix build for non linux/BSD platforms. 2022-10-03 10:42:09 -07:00
divanorama
3de0c24859 Disable builtin malloc in tests
With `--with-jemalloc-prefix=` and without `-fno-builtin` or `-O1` both clang and gcc may optimize out `malloc` calls
whose result is unused. Comparing result to NULL also doesn't necessarily count as being used.

This won't be a problem in most client programs as this only concerns really unused pointers, but in
tests it's important to actually execute allocations.
`-fno-builtin` should disable this optimization for both gcc and clang, and applying it only to tests code shouldn't hopefully be an issue.
Another alternative is to force "use" of result but that'd require more changes and may miss some other optimization-related issues.

This should resolve https://github.com/jemalloc/jemalloc/issues/2091
2022-10-03 10:39:13 -07:00
Lily Wang
c0c9783ec9 Add vcpkg installation instructions 2022-09-19 15:15:28 -07:00
Guangli Dai
c9ac1f4701 Fix a bug in C++ integration test. 2022-09-16 15:04:59 -07:00
Guangli Dai
ba19d2cb78 Add arena-level name.
An arena-level name can help identify manual arenas.
2022-09-16 15:04:59 -07:00
Guangli Dai
a0734fd6ee Making jemalloc max stack depth a runtime option 2022-09-12 13:56:22 -07:00
Abael He
56ddbea270 error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99
./autogen.sh \
&& ./configure --prefix=/usr/local  --enable-static   --enable-autogen --enable-xmalloc --with-static-libunwind=/usr/local/lib/libunwind.a --enable-lazy-lock --with-jemalloc-prefix='' \
&& make -j16

...
gcc -std=gnu11 -Werror=unknown-warning-option -Wall -Wextra -Wshorten-64-to-32 -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -Wimplicit-fallthrough -O3 -funroll-loops -fPIC -DPIC -c -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/edata_cache.sym.o src/edata_cache.c
src/background_thread.c:768:6: error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
            pthread_create_fptr_init()) {
            ^
src/background_thread.c:768:6: note: did you mean 'pthread_create_wrapper_init'?
src/background_thread.c:34:1: note: 'pthread_create_wrapper_init' declared here
pthread_create_wrapper_init(void) {
^
1 error generated.
make: *** [src/background_thread.sym.o] Error 1
make: *** Waiting for unfinished jobs....
2022-09-07 11:56:41 -07:00
Guangli Dai
ce29b4c3d9 Refactor the remote / cross thread cache bin stats reading
Refactored cache_bin.h so that only one function is racy.
2022-09-06 19:41:19 -07:00
Guangli Dai
42daa1ac44 Add double free detection using slab bitmap for debug build
Add a sanity check for double free issue in the arena in case that the tcache has been flushed.
2022-09-06 12:54:21 -07:00
Ivan Zaitsev
36366f3c4c Add double free detection in thread cache for debug build
Add new runtime option `debug_double_free_max_scan` that specifies the max
number of stack entries to scan in the cache bit when trying to detect the
double free bug (currently debug build only).
2022-08-04 16:58:22 -07:00
David CARLIER
adc70c0511 update travis 2022-07-19 13:23:08 -07:00
David CARLIER
4e12d21c8d enabled percpu_arena settings on macOs.
follow-up on #2280
2022-07-19 13:23:08 -07:00
David Carlier
58478412be OpenBSD build fix. still no cpu affinity.
- enabling pthread_get/pthread_set_name_np api.
- disabling per thread cpu affinity handling, unsupported on this platform.
2022-07-19 13:20:11 -07:00
Qi Wang
a1c7d9c046 Add the missing opt.cache_oblivious handling. 2022-07-14 22:41:27 -07:00
Jasmin Parent
41a859ef73 Remove duplicated words in documentation 2022-07-11 15:30:16 -07:00
Azat Khuzhin
cb578bbe01 Fix possible "nmalloc >= ndalloc" assertion
In arena_stats_merge() first nmalloc was read, and after ndalloc.

However with this order, it is possible for some thread to incement
ndalloc in between, and then nmalloc < ndalloc, and assertion will fail,
like again found by ClickHouse CI [1] (even after #2234).

  [1]: https://github.com/ClickHouse/ClickHouse/issues/31531

Swap the order to avoid possible assertion.

Cc: @interwq
Follow-up for: #2234
2022-07-11 15:27:51 -07:00
David CARLIER
a9215bf18a CI update FreeBSD version. 2022-06-28 11:48:23 -07:00
Alex Lapenkou
3713932836 Update building for Windows instructions
Explain how to build for Windows in INSTALL.md and remove another readme.txt in
an obscure location.
2022-06-14 14:04:48 -07:00
David Carlier
4fc5c4fbac New configure option '--enable-pageid' for Linux
The option makes jemalloc use prctl with PR_SET_VMA to tag memory mappings with
"jemalloc_pg" or "jemalloc_pg_overcommit". This allows to easily identify
jemalloc's mappings in /proc/<pid>/maps. PR_SET_VMA is only available in Linux
5.17 and above.
2022-06-09 18:54:08 -07:00
Qi Wang
b950934916 Enable retain by default on macOS.
High number of mappings result in unusually high fork() cost on macOS.  Retain
fixes the issue, at a small cost of extra VM space reserved.
2022-06-09 11:37:44 -07:00
David Carlier
df8f7d10af Implement malloc_getcpu for amd64 and arm64 macOS
This enables per CPU arena on MacOS
2022-06-08 15:13:55 -07:00
Alex Lapenkou
df7ad8a9b6 Revert "Echo installed files via verbose 'install' command"
This reverts commit f15d8f3b41. "install -v"
turned out to be not portable and not work on NetBSD.
2022-06-07 12:28:45 -07:00
barracuda156
70e3735f3a jemalloc: fix PowerPC definitions in quantum.h 2022-05-26 10:51:10 -07:00
Alex Lapenkou
5b1f2cc5d7 Implement pvalloc replacement
Despite being an obsolete function, pvalloc is still present in GLIBC and should
work correctly when jemalloc replaces libc allocator.
2022-05-18 17:01:09 -07:00
Qi Wang
cd5aaf308a Improve the failure message upon opt_experimental_infallible_new. 2022-05-17 16:07:40 -07:00
Yuriy Chernyshov
70d4102f48 Fix compiling edata.h with MSVC
At the time an attempt to compile jemalloc 5.3.0 with MSVC 2019 results in the followin error message:

> jemalloc/include/jemalloc/internal/edata.h:660: error C4576: a parenthesized type followed by an initializer list is a non-standard explicit type conversion syntax
2022-05-09 14:51:07 -07:00
Qi Wang
54eaed1d8b Merge branch 'dev' 2022-05-06 11:28:25 -07:00
Qi Wang
304c919829 Update ChangeLog for 5.3.0. 2022-05-06 11:24:21 -07:00
Qi Wang
8cb814629a Make the default option of zero realloc match the system allocator. 2022-05-05 17:11:18 -07:00
Qi Wang
66c889500a Make test/unit/background_thread_enable more conservative.
To avoid resource exhaustion on 32-bit platforms.
2022-05-04 15:32:57 -07:00
Qi Wang
a7d73dd4c9 Update TUNING.md to include the new tcache_max option. 2022-05-04 10:59:40 -07:00
Qi Wang
254b011915 Small doc tweak of opt.trust_madvise.
Avoid quoted enabled and disabled because it's a bool type instead of char *.
2022-04-28 21:16:25 -07:00
Qi Wang
f5e840bbf0 Minor typo fix in doc. 2022-04-27 20:25:29 -07:00
Qi Wang
ceca07d2ca Correct the name of stats.mutexes.prof_thds_data in doc. 2022-04-25 20:12:58 -07:00
Qi Wang
391bad4b95 Avoid abort() in test/integration/cpp/infallible_new_true.
Allow setting the safety check abort hook through mallctl, which avoids abort()
and core dumps.
2022-04-25 11:29:32 -07:00
cuishuang
9a242f16d9 fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-25 11:29:00 -07:00
Qi Wang
0e29ad4efa Rename zero_realloc option "strict" to "alloc".
With realloc(ptr, 0) being UB per C23, the option name "strict" makes less sense
now.  Rename to "alloc" which describes the behavior.
2022-04-20 10:27:25 -07:00
Qi Wang
5841b6dbe7 Update FreeBSD image to 12.3 for cirrus ci. 2022-04-19 15:29:30 -07:00
Qi Wang
ed5fc14b28 Use volatile to workaround buffer overflow false positives.
In test/integration/rallocx, full usable size is checked which may confuse
overflow detection.
2022-04-04 12:16:46 -07:00
Alex Lapenkou
25517b852e Reoreder TravisCI jobs to optimize CI time
Sorting jobs by descending expected runtime helps to utilize concurrency
better.
2022-03-29 11:58:27 -07:00
Alex Lapenkou
8a49b62e78 Enable TravisCI for Windows 2022-03-29 11:58:27 -07:00
Alex Lapenkou
fdb6c10162 Add FreeBSD to TravisCI
Implement the generation of Travis jobs for FreeBSD. The generated jobs
replicate the existing CirrusCI config.
2022-03-29 11:58:27 -07:00
Alex Lapenkou
a93931537e Do not disable SEC by default for 64k pages platforms
Default SEC max_alloc option value was 32k, disabling SEC for platforms with
lg-page=16. This change enables SEC for all platforms, making minimum max_alloc
value equal to PAGE.
2022-03-24 22:05:35 -07:00
Charles
eaaa368bab Add comments and use meaningful vars in sz_psz2ind. 2022-03-24 16:56:59 -07:00
Alex Lapenkou
5bf03f8ce5 Implement PAGE_FLOOR macro 2022-03-22 17:45:55 -07:00
Alex Lapenkou
52631c90f6 Fix size class calculation for sec
Due to a bug in sec initialization, the number of cached size classes
was equal to 198. The bug caused the creation of more than a hundred of
unused bins, although it didn't affect the caching logic.
2022-03-22 17:45:55 -07:00
Qi Wang
7ae0f15c59 Add a default page size when cross-compile for Apple M1.
When cross-compile for M1 and no page size specified, use the default 16K and
skip detecting the page size (which is likely incorrect).
2022-03-21 14:30:48 -07:00
Alex Lapenkov
eb65d1b078 Fix FreeBSD system jemalloc TSD cleanup
Before this commit, in case FreeBSD libc jemalloc was overridden by another
jemalloc, proper thread shutdown callback was involved only for the overriding
jemalloc. A call to _malloc_thread_cleanup from libthr would be redirected to
user jemalloc, leaving data about dead threads hanging in system jemalloc. This
change tackles the issue in two ways. First, for current and old system
jemallocs, which we can not modify, the overriding jemalloc would locate and
invoke system cleanup routine. For upcoming jemalloc integrations, the cleanup
registering function will also be redirected to user jemalloc, which means that
system jemalloc's cleanup routine will be registered in user's jemalloc and a
single call to _malloc_thread_cleanup will be sufficient to invoke both
callbacks.
2022-03-02 10:10:27 -08:00
Azat Khuzhin
78b58379c8 Fix possible "nmalloc >= ndalloc" assertion.
It is possible that ndalloc will be updated before nmalloc, in
arena_large_ralloc_stats_update(), fix this by reorder those calls.

It was found by ClickHouse CI, that periodically hits this assertion [1].

  [1]: https://github.com/ClickHouse/ClickHouse/issues/31531

That issue contains lots of examples, with core dump and some gdb output [2].

  [2]: https://s3.amazonaws.com/clickhouse-test-reports/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/stress_test__debug__actions_.html

Here you can find binaries for that particular report [3] you need
clickhouse debug build [4].

  [3]: https://s3.amazonaws.com/clickhouse-builds/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/clickhouse_build_check_(actions)/report.html
  [4]: https://s3.amazonaws.com/clickhouse-builds/34951/96390a9263cb5af3d6e42a84988239c9ae87ce32/package_debug/clickhouse

Brief info from that report:

    2 0x000000002ad6dbfe in arena_stats_merge (tsdn=0x7f2399abdd20, arena=0x7f241ce01080, nthreads=0x7f24e4360958, dss=0x7f24e4360960, dirty_decay_ms=0x7f24e4360968, muzzy_decay_ms=0x7f24e4360970, nactive=0x7f24e4360978, ndirty=0x7f24e43
    e4360988, astats=0x7f24e4360998, bstats=0x7f24e4363310, lstats=0x7f24e4364990, estats=0x7f24e4366e50, hpastats=0x7f24e43693a0, secstats=0x7f24e436a020) at ../contrib/jemalloc/src/arena.c:138
            ndalloc = 226
            nflush = 0
            curlextents = 0
            nmalloc = 225
            nrequests = 0

Here you can see that they differs only by 1.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-03-01 12:28:28 -08:00
Alex Lapenkou
ca709c3139 Fix failed assertion due to racy memory access
While calculating the number of stashed pointers, multiple variables
potentially modified by a concurrent thread were used for the
calculation.  This led to some inconsistencies, correctly detected by
the assertions.  The change eliminates some possible inconsistencies by
using unmodified variables and only once a concurrently modified one.
The assertions are omitted for the cases where we acknowledge potential
inconsistencies too.
2022-02-17 09:35:52 -08:00
Qi Wang
063d134aeb Properly detect background thread support on Darwin.
When cross-compile, the host type / abi should be checked to determine
background thread compatibility.
2022-02-15 10:10:11 -08:00
Alex Lapenkou
a4e81221cc Document 'make uninstall'
Update INSTALL.md, reflecting the addition of 'uninstall' target.
2022-01-31 14:55:00 -08:00
Qi Wang
20f9802e4f Avoid overflow warnings in test/unit/safety_check. 2022-01-27 10:29:54 -08:00
Qi Wang
8c59c44ffa Add a dependency checking step at the end of malloc_conf_init.
Currently only prof_leak_error and prof_final are checked.
2022-01-26 17:17:48 -08:00
Qi Wang
efc539c040 Initialize prof_leak during prof init.
Otherwise, prof_leak may get set after prof_leak_error, and disagree with each
other.
2022-01-26 17:17:48 -08:00
Alex Lapenkou
002f0e9397 Disable TravisCI jobs generation for Windows
These jobs take about 20 minutes to complete. We don't want to enable
them until we switch to unlimited concurrency plan, otherwise the builds
will take way too long.
2022-01-26 10:16:57 -08:00
Alex Lapenkou
01a293fc08 Add Windows to TravisCI
Implement the generation of Travis jobs for Windows. Currently, the
generated jobs replicate Appveyor setup and complete successfully. There
is support for MinGW GCC and MSVC compilers as well as 64 and 32 bit
compilation. Linux and MacOS jobs behave identically, but some
environment variables change - CROSS_COMPILE_32BIT=yes is added for
builds with cross compilation, empty COMPILER_FLAGS are not set anymore.
2022-01-26 10:16:57 -08:00
yunxu
b798fabdf7 Add prof_leak_error option
The option makes the process to exit with error code 1 if a memory leak
is detected. This is useful for implementing automated tools that rely
on leak detection.
2022-01-21 16:24:20 -08:00
Alex Lapenkou
eafd2ac39f Forbid spaces in prefix and exec_prefix
Spaces in these are also not handled correctly by Make, so there's sense
in not allowing that.
2022-01-19 12:28:16 -08:00
Alex Lapenkou
36a09ba2c7 Forbid spaces in install suffix
To avoid potential issues with removing unintended files after 'make
uninstall', spaces are no longer allowed in install suffix. It's worth
mentioning, that with GNU Make on Linux spaces in install suffix didn't
work anyway, leading to errors in the Makefile. But being verbose
about this restriction makes it more transparent for the developers.
2022-01-19 12:28:16 -08:00
Shuduo Sang
640c3c72e6 Add support for 'make uninstall' 2022-01-19 12:28:16 -08:00
Alex Lapenkou
f15d8f3b41 Echo installed files via verbose 'install' command
It's not necessary to manually echo all install commands, similar effect
is achieved via 'install -v'
2022-01-19 12:28:16 -08:00
Charles
eb196815d6 Avoid calculating size of size class twice & delete sc_data_global. 2022-01-18 11:54:12 -08:00
Qi Wang
011449f17b Fix doc build with install-suffix. 2022-01-11 21:15:24 -08:00
Qi Wang
8b49eb132e Fix the HELP_STRING of --enable-doc. 2022-01-11 21:15:24 -08:00
Qi Wang
ddb170b1d9 Simplify arena_migrate() to take arena_t* instead of indices.
This makes debugging slightly easier and avoids the confusion of "should we
create new arenas" here.
2022-01-11 16:59:22 -08:00
Qi Wang
648b3b9f76 Lower the num_threads in the stress test of test/unit/prof_recent
This takes a fair amount of resources.  Under high concurrency it was causing
resource exhaustion such as pthread_create and mmap failures.
2022-01-11 16:58:56 -08:00
Qi Wang
d66162e032 Fix the extent state checking on the merge error path.
With DSS as primary, the default merge impl will (correctly) decline to merge
when one of the extent is non-dss.  The error path should tolerate the
not-merged extent being in a merging state.
2022-01-11 16:58:47 -08:00
Craig Leres
c9946fa7e6 FreeBSD also needs the OS-X "don't declare system functions as
nothrow" fix since it also has jemalloc in the base system
2022-01-11 11:53:25 -08:00
Jonathan Swinney
89fe8ee6bf Use the isb instruction instead of yield for spin locks on arm
isb introduces a small delay which is closer to the x86 pause instruction.
2022-01-10 15:29:56 -08:00
Qi Wang
6230cc88b6 Add background thread sleep retry in test/unit/hpa_background_thread
Under high concurrency / heavy test load (e.g. using run_tests.sh), the
background thread may not get scheduled for a longer period of time.  Retry 100
times max before bailing out.
2022-01-07 10:28:28 -08:00
Qi Wang
61978bbe69 Purge all if the last thread migrated away from an arena. 2022-01-06 19:02:26 -08:00
Yuriy Chernyshov
c91e62dd37 #include <features.h> as requested 2022-01-05 18:45:27 -08:00
Yuriy Chernyshov
18510020e7 Fix symbol conflict with musl libc
`__libc` prefixed functions are used by musl libc as non-replaceable malloc stubs.

Fix this conflict by checking if we are linking against glibc.
2022-01-05 18:45:27 -08:00
Qi Wang
f509703af5 Fix two conversion warnings in tcache. 2022-01-04 13:55:06 -08:00
Qi Wang
067c2da074 Fix unnecessary returns in san_(un)guard_pages_two_sided. 2022-01-04 13:55:06 -08:00
Qi Wang
d660683d3d Fix test config of lg_san_uaf_align.
The option may be configure-disabled, which resulted in the invalid options
output from the tests.
2022-01-04 11:03:51 -08:00
Qi Wang
eabe889162 Rename full_position to low_bound in cache_bin.h. 2021-12-29 14:44:43 -08:00
Qi Wang
dfdd7562f5 Rename san_enabled() to san_guard_enabled(). 2021-12-29 14:44:43 -08:00
Qi Wang
01d61a3c6f Fix a conversion warning. 2021-12-29 14:44:43 -08:00
Qi Wang
8b34a788b5 Fix an used-uninitialized warning (false positive). 2021-12-29 14:44:43 -08:00
Qi Wang
e491cef9ab Add stats for stashed bytes in tcache. 2021-12-29 14:44:43 -08:00
Qi Wang
b75822bc6e Implement use-after-free detection using junk and stash.
On deallocation, sampled pointers (specially aligned) get junked and stashed
into tcache (to prevent immediate reuse).  The expected behavior is to have
read-after-free corrupted and stopped by the junk-filling, while
write-after-free is checked when flushing the stashed pointers.
2021-12-29 14:44:43 -08:00
Qi Wang
06aac61c4b Split the core logic of tcache flush into a separate function.
The core function takes a ptr array as input (containing items to be flushed),
which will be reused to flush sanitizer-stashed items.
2021-12-29 14:44:43 -08:00
Qi Wang
d038160f3b Fix shadowed variable usage.
Verified with EXTRA_CFLAGS=-Wshadow.
2021-12-23 10:55:08 -08:00
Qi Wang
bd70d8fc0f Add the profiling settings for tests explicit.
Many profiling related tests make assumptions on the profiling settings,
e.g. opt_prof is off by default, and prof_active is default on when opt_prof is
on.  However the default settings can be changed via --with-malloc-conf at build
time.  Fixing the tests by adding the assumed settings explicitly.
2021-12-22 20:10:28 -08:00
Joshua Watt
e491df1d2f Fix warnings when using autoheader. 2021-12-22 13:57:41 -08:00
Qi Wang
60b9637cc0 Only invoke malloc_cpu_count_is_deterministic() when necessary.
Also refactor the handling of the non-deterministic case.  Notably allow the
case with narenas set to proceed w/o warnings, to not affect existing valid use
cases.
2021-12-22 13:52:12 -08:00
Qi Wang
837b37c4ce Fix the time-since computation in HPA.
nstime module guarantees monotonic clock update within a single nstime_t.  This
means, if two separate nstime_t variables are read and updated separately,
nstime_subtract between them may result in underflow.  Fixed by switching to the
time since utility provided by nstime.
2021-12-21 23:37:22 -08:00
Qi Wang
310af725b0 Add nstime_ns_since which obtains the duration since the input time. 2021-12-21 23:37:22 -08:00
Azat Khuzhin
cafe9a3158 Disable percpu arena in case of non deterministic CPU count
Determinitic number of CPUs is important for percpu arena to work
correctly, since it uses cpu index - sched_getcpu(), and if it will
greater then number of CPUs bad thing will happen, or assertion will be
failed in debug build:

    <jemalloc>: ../contrib/jemalloc/src/jemalloc.c:321: Failed assertion: "ind <= narenas_total_get()"
    Aborted (core dumped)

Number of CPUs can be obtained from the following places:
- sched_getaffinity()
- sysconf(_SC_NPROCESSORS_ONLN)
- sysconf(_SC_NPROCESSORS_CONF)

For the sched_getaffinity() you may simply use taskset(1) to run program
on a different cpu, and in case it will be not first, percpu will work
incorrectly, i.e.:

    $ taskset --cpu-list $(( $(getconf _NPROCESSORS_ONLN)-1 )) <your_program>

_SC_NPROCESSORS_ONLN uses /sys/devices/system/cpu/online, LXD/LXC
virtualize /sys/devices/system/cpu/online file [1], and so when you run
container with limited limits.cpus it will bind randomly selected CPU to
it

  [1]: https://github.com/lxc/lxcfs/issues/301

_SC_NPROCESSORS_CONF uses /sys/devices/system/cpu/cpu*, and AFAIK nobody
playing with dentries there.

So if all three of these are equal, percpu arenas should work correctly.

And a small note regardless _SC_NPROCESSORS_ONLN/_SC_NPROCESSORS_CONF,
musl uses sched_getaffinity() for both. So this will also increase the
entropy.

Also note, that you can check is percpu arena really applied using
abort_conf:true.

Refs: https://github.com/jemalloc/jemalloc/pull/1939
Refs: https://github.com/ClickHouse/ClickHouse/issues/32806

v2: move malloc_cpu_count_is_deterministic() into
    malloc_init_hard_recursible() since _SC_NPROCESSORS_CONF does
    allocations for readdir()
v3:
- mark cpu_count_is_deterministic static
- check only if percpu arena is enabled
- check narenas
2021-12-21 11:53:09 -08:00
mweisgut
bb5052ce90 Fix base_ehooks_get_for_metadata 2021-12-20 15:37:53 -08:00
Alex Lapenkov
9015e129bd Update visual studio projects
Add relevant source files to the projects.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
d90655390f San: Create a function for committing and zeroing
Committing and zeroing an extent is usually done together, hence a new
function.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
800ce49c19 San: Bump alloc frequently reused guarded allocations
To utilize a separate retained area for guarded extents, use bump alloc
to allocate those extents.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
f56f5b9930 Pass 'frequent_reuse' hint to PAI
Currently used only for guarding purposes, the hint is used to determine
if the allocation is supposed to be frequently reused. For example, it
might urge the allocator to ensure the allocation is cached.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
2c70e8d351 Rename 'arena_decay' to 'arena_util'
While initially this file contained helper functions for one particular
test, now its usage spread across different test files. Purpose has
shifted towards a collection of handy arena ctl wrappers.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
0f6da1257d San: Implement bump alloc
The new allocator will be used to allocate guarded extents used as slabs
for guarded small allocations.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
34b00f8969 San: Avoid running san tests with prof enabled
With prof enabled, number of page aligned allocations doesn't match the
number of slab "ends" because prof allocations skew the addresses. It
leads to 'pages' array overflow and hard to debug failures.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
62f9c54d2a San: Rename 'guard' to 'san'
This prepares the foundation for more sanitizer-related work in the
future.
2021-12-15 10:39:17 -08:00
Alex Lapenkou
d9bbf539ff CI: Refactor gen_travis.py
The CI consolidation project adds more operating systems to Travis. This
refactoring is aimed to decouple the configuration of each individual OS
from the actual job matrix generation and formatting. Otherwise,
format_job function would turn into a huge collection of ad-hoc
conditions.
2021-12-06 15:11:14 -08:00
Qi Wang
7dcf77809c Mark slab as true on sized dealloc fast path.
For sized dealloc, fastpath only handles lookup-able sizes, which must be slabs.
2021-12-06 14:28:34 -08:00
Qi Wang
af6ee27c0d Enforce abort_conf:true when malloc_conf is not fully recognized.
Ensures the malloc_conf "ends with key", "ends with comma" and "malform conf
string" cases abort under abort_conf:true.
2021-12-06 14:27:25 -08:00
David CARLIER
113e8e68e1 freebsd 14 build fix proposal.
seems to have introduced finally more linux api cpu affinity (sched_* family)
compatibility detected at configure time thus adjusting accordingly.
2021-12-06 13:15:21 -08:00
Alex Lapenkou
3b3257a709 Correct opt.prof_leak documentation
The option has been misleading, because it stays disabled unless
prof_final is also specified. In practice it's impossible to detect that
the option is silently disabled, because it just doesn't provide any
output as if there are no memory leaks detected.
2021-11-23 15:10:21 -08:00
Qi Wang
cdabe908d0 Track the initialized state of nstime_t on debug build.
Some nstime_t operations require and assume the input nstime is initialized
(e.g. nstime_update) -- uninitialized input may cause silent failures which is
difficult to reproduce / debug.  Add an explicit flag to track the state
(limited to debug build only).

Also fixed an use case in hpa (time of last_purge).
2021-11-17 15:49:27 -08:00
Qi Wang
400c59895a Fix uninitialized nstime reading / updating on the stack in hpa.
In order for nstime_update to handle non-monotonic clocks, it requires the input
nstime to be initialized -- when reading for the first time, zero init has to be
done.  Otherwise random stack value may be seen as clocks and returned.
2021-11-16 16:54:12 -08:00
Qi Wang
8b81d3f214 Fix the initialization of last_event in thread event init.
The event counters maintain a relationship with the current bytes: last_event <=
current < next_event.  When a reinit happens (e.g. reincarnated tsd), the last
event needs progressing because all events start fresh from the current bytes.
2021-11-16 10:28:00 -08:00
Qi Wang
6bdb4f5ab0 Check prof_active in addtion to opt_prof during batch_alloc(). 2021-11-12 09:20:18 -08:00
Qi Wang
37342a4d32 Add ctl interface for experimental_infallible_new. 2021-11-05 13:20:09 -07:00
Alex Lapenkou
6cb585b13a San: Unguard guarded slabs during arena destruction
When opt_retain is on, slab extents remain guarded in all states, even
retained. This works well if arena is never destroyed, because we
anticipate those slabs will be eventually reused. But if the arena is
destroyed, the slabs must be unguarded to prevent leaking guard pages.
2021-11-03 17:55:50 -07:00
Qi Wang
b6a7a535b3 Optimize away a branch on the free fastpath.
On the rtree metadata lookup fast path, there will never be a NULL returned when
the cache key matches (which is unknown to the compiler).  The previous logic
was checking for NULL return value, resulting in the extra branch (in addition to
the cache key match checking).  Make the lookup_fast return a bool to indicate
cache miss / match, so that the extra branch is avoided.
2021-10-28 16:55:54 -07:00
Qi Wang
4d56aaeca5 Optimize away the tsd_fast() check on free fastpath.
To ensure that the free fastpath can tolerate uninitialized tsd, improved the
static initializer for rtree_ctx in tsd.
2021-10-28 10:05:59 -07:00
Ashutosh Grewal
26f5257b88 Remove declaration of an undefined function 2021-10-18 11:10:22 -07:00
Wang JinLong
2159615419 Add new architecture loongarch.
Signed-off-by: Wang JinLong <wangjinlong@uniontech.com>
2021-10-18 10:57:34 -07:00
Alex Lapenkou
8daac7958f Redefine functions with test hooks only for tests
Android build has issues with these defines, this will allow the build to
succeed if it doesn't need to build the tests.
2021-10-15 15:25:36 -07:00
Alex Lapenkou
c9ebff0fd6 Initialize deferred_work_generated
As the code evolves, some code paths that have previously assigned
deferred_work_generated may cease being reached. This would leave the value
uninitialized. This change initializes the value for safety.
2021-10-07 11:50:38 -07:00
Stan Angelov
912324a1ac Add debug check outside of the loop in hpa_alloc_batch.
This optimizes the whole loop away for non-debug builds.
2021-10-01 14:40:43 -07:00
David CARLIER
cf9724531a Darwin malloc_size override support proposal.
Darwin has similar api than Linux/FreeBSD's malloc_usable_size.
2021-10-01 14:32:40 -07:00
Qi Wang
ab0f1604b4 Delay the atexit call to prof_log_start().
So that atexit() is only done when prof_log is used.
2021-09-29 13:35:50 -07:00
David Carlier
11b6db7448 CPU affinity on BSD platforms support. 2021-09-28 11:40:21 -07:00
Qi Wang
83f3294027 Small refactors around 7bb05e0. 2021-09-27 16:05:13 -07:00
Qi Wang
3c4b717ffc Remove unused header base_structs.h. 2021-09-27 16:05:13 -07:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
Piotr Balcer
7bb05e04be add experimental.arenas_create_ext mallctl
This mallctl accepts an arena_config_t structure which
can be used to customize the behavior of the arena.
Right now it contains extent_hooks and a new option,
metadata_use_hooks, which controls whether the extent
hooks are also used for metadata allocation.

The medata_use_hooks option has two main use cases:

1. In heterogeneous memory systems, to avoid metadata
being placed on potentially slower memory.

2. Avoiding virtual memory from being leaked as a result
of metadata allocation failure originating in an extent hook.
2021-09-24 13:43:18 -07:00
Alex Lapenkou
a9031a0970 Allow setting a dump hook
If users want to be notified when a heap dump occurs, they can set this hook.
2021-09-22 15:04:01 -07:00
Alex Lapenkou
f7d46b8119 Allow setting custom backtrace hook
Existing backtrace implementations skip native stack frames from runtimes like
Python. The hook allows to augment the backtraces to attribute allocations to
native functions in heap profiles.
2021-09-22 15:04:01 -07:00
Qi Wang
523cfa55c5 Guard prof related mallctl with opt_prof.
The prof initialization is done only when opt_prof is true.  This change makes
sure the prof_* mallctls only have limited read access (i.e. no access to prof
internals) when opt_prof is false.

In addition, initialize the global prof mutexes even if opt_prof is false.  This
makes sure the mutex stats are set properly.
2021-09-20 10:42:16 -07:00
Alex Lapenkou
6e848a005e Remove opt_background_thread_hpa_interval_max_ms
Now that HPA can communicate the time until its deferred work should be done,
this option is not used anymore.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
8229cc77c5 Wake up background threads on demand
This change allows every allocator conforming to PAI communicate that it
deferred some work for the future. Without it if a background thread goes into
indefinite sleep, there is no way to notify it about upcoming deferred work.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
97da57c13a HPA: Add min_purge_interval_ms option
This rate limiting option is required to avoid purging too often.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
b8b8027f19 Allow PAI to calculate time until deferred work
Previously the calculation of sleep time between wakeups was implemented within
background_thread. This resulted in some parts of decay and hpa specific
logic mixing with background thread implementation. In this change, background
thread delegates this calculation to arena and it, in turn, delegates it to PAI.
The next step is to implement the actual calculation of time until deferred work
in HPA.
2021-09-17 16:56:41 -07:00
Alex Lapenkou
26140dd246 Reject --enable-prof-libunwind without --enable-prof
Prior to the change you could specify --enable-prof-libunwind without
--enable-prof which would do effectively nothing. This was confusing as I
expected --enable-prof-libunwind to act like --enable-prof, but use libunwind.
2021-09-13 14:02:40 -07:00
Mingli Yu
e5062e9fb9 Makefile.in: make sure doc generated before install
There is a race between the doc generation and the doc installation,
so make the install depend on the build for doc.

Signed-off-by: Mingli Yu <mingli.yu@windriver.com>
2021-09-13 13:40:39 -07:00
Qi Wang
8b24cb8fdf Don't assume initialized arena in the default alloc hook.
Specifically, this change allows the default alloc hook to used during
arenas.create.  One use case is to invoke the default alloc hook in a customized
hook arena, i.e. the default hooks can be read out of a default arena, then
create customized ones based on these hooks.  Note that mixing the default with
customized hooks is not recommended, and should only be considered when the
customization is simple and straightforward.
2021-08-25 14:19:25 -07:00
Alex Lapenkou
c01a885e94 HPA: Correctly calculate retained pages
Retained pages are those which haven't been touched and are unbacked from OS
perspective. For a pageslab their number should equal "total pages in slab"
minus "touched pages".
2021-08-20 18:06:17 -07:00
Alex Lapenkou
2c625d5cd9 Fix warnings when compiled with clang
When clang sees an unknown warning option, unlike gcc it doesn't fail the build
with error. It issues a warning. Hence JE_CFLAGS_ADD with warning options that
didnt't exist in clang would still mark those options as available. This led to
several warnings when built with clang or "gcc" on OSX. This change fixes those
warnings by simply making clang fail builds with non-existent warning options.
2021-08-13 14:14:46 -07:00
Alex Lapenkou
9d02bdc883 Port gen_run_tests.py to python3
Insignificant changes to make the script runnable on python3.
2021-08-13 10:59:32 -07:00
Qi Wang
5884a076fb Rename prof.dump_prefix to prof.prefix
This better aligns with our naming convention.  The option has not been included
in any upstream release yet.
2021-08-12 23:04:29 -07:00
Qi Wang
6a01600712 Add Cirrus CI testing matrix
Contains 16 testing configs -- a mix of debug, prof, -m32
and a few uncommon options.
2021-08-10 09:59:10 -07:00
Alex Lapenkou
f58064b932 Verify that HPA is used before calling its functions
This change eliminates the possibility of PA calling functions of uninitialized
HPA.
2021-08-05 16:43:28 -07:00
David Goldblatt
27f71242b7 Mutex: Tweak internal spin count.
The recent pairing heap optimizations flattened the lock hold time profile.
This was a win for raw cycle counts, but ended up causing us to "just miss"
acquiring the mutex before sleeping more often.  Bump those counts.
2021-08-05 14:33:16 -07:00
David Goldblatt
6f41ba55ee Mutex: Make spin count configurable.
Don't document it since we don't want to support this as a "real" setting, but
it's handy for testing.
2021-08-05 10:13:53 -07:00
David Goldblatt
dae24589bc PH: Insert-below-min fast-path. 2021-08-02 15:02:49 -07:00
David Goldblatt
40d53e007c ph: Add aux-list counting and pre-merging. 2021-08-02 15:02:49 -07:00
David Goldblatt
dcb7b83fac Eset: Cache summary information for heap edatas.
This lets us do a single array scan to find first fits, instead of taking a
cache miss per examined size class.
2021-08-02 15:02:49 -07:00
David Goldblatt
252e0942d0 Eset: Pull per-pszind data into structs.
We currently have one for stats and one for the data.  The data struct is just a
wrapper around the edata_heap_t, but this will change shortly.
2021-08-02 15:02:49 -07:00
David Goldblatt
dc0a4b8b2f Edata: Pull out comparison fields into a summary.
For now, this is a no-op; eventually, it will allow some caching in the eset.
2021-08-02 15:02:49 -07:00
David Goldblatt
0170dd198a Edata: Fix a couple typos.
Some readability-enhancing whitespace, and a spelling error.
2021-08-02 15:02:49 -07:00
David Goldblatt
08a4cc0969 Pairing heap: inline functions instead of macros.
By force-inlining everything that would otherwise be a macro, we get the same
effect (it's not clear in the first place that this is actually a good idea, but
it avoids making any changes to the existing performance profile).

This makes the code more maintainable (in anticipation of subsequent changes),
as well as making performance profiles and debug info more readable (we get
"real" line numbers, instead of making everything point to the macro definition
of all associated functions).
2021-08-02 15:02:49 -07:00
David Goldblatt
92a1e38f52 edata_cache: Allow unbounded fast caching.
The edata_cache_small had a fill/flush heuristic.  In retrospect, this was a
premature optimization; more testing indicates that an unbounded cache is
effectively fine here, and moreover we spend a nontrivial amount of time doing
unnecessary filling/flushing.

As the HPA takes on a larger and larger fraction of all allocations, any
theoretical differences in allocation patterns should shrink.  The HPA is more
efficient with its metadata in general, so it still comes out ahead on metadata
usage anyways.
2021-07-26 15:14:37 -07:00
David Goldblatt
d93eef2f40 HPA: Introduce a redesigned hpa_central_t.
For now, this only handles allocating virtual address space to shards, with no
reuse.  This is framework, though; it will change over time.
2021-07-23 21:59:59 -07:00
David Goldblatt
e09eac1d4e Remove hpa_central.
This is now dead code.
2021-07-23 21:59:59 -07:00
Alex Lapenkou
c88fe355e6 Add unit tests for decay
After slight changes in the interface, it's an opportunity to enhance unit
tests.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
aaea4fd1e6 Add more documentation to decay.c
It took me a while to understand why some things are implemented the way they
are, so hopefully it will help future readers.
2021-07-22 23:19:09 -07:00
Alex Lapenkou
4b633b9a81 Clean up background thread sleep computation
Isolate the computation of purge interval from background thread logic and
move into more suitable file.
2021-07-22 23:19:09 -07:00
David Goldblatt
6630c59896 HPA: Hugification hysteresis.
We wait a while after deciding a huge extent should get hugified to see if it
gets purged before long.  This avoids hugifying extents that might shortly get
dehugified for purging.

Rename and use the hpa_dehugification_threshold option support code for this,
since it's now ignored.
2021-07-12 17:59:18 -07:00
David Goldblatt
113938b6f4 HPA: Pull out a hooks type.
For now, this is a no-op change.  In a subsequent commit, it will be useful for
testing.
2021-07-12 17:59:18 -07:00
David Goldblatt
1d4a7666d5 HPA: Do deferred operations on background threads. 2021-07-12 17:59:18 -07:00
David Goldblatt
583284f2d9 Add HPA deferral functionality. 2021-07-12 17:59:18 -07:00
David Goldblatt
ace329d11b HPA batch dalloc: Just do one deferred work check.
We only need to do one check per batch dalloc, not one check per dalloc in the
batch.
2021-07-12 17:59:18 -07:00
David Goldblatt
47d8a7e6b0 psset: Purge empty slabs first.
These are particularly good candidates for purging (listed in the diff).
2021-07-12 17:59:18 -07:00
David Goldblatt
41fd56605e HPA: Purge across retained extents.
This lets us cut down on the number of expensive system calls we perform.
2021-07-12 17:59:18 -07:00
David Goldblatt
347523517b PAI: Fix a typo. 2021-07-12 17:59:11 -07:00
David Goldblatt
9c42ed2d14 Travis: Don't test "clang" on OS X.
On OS X, "gcc" is really just clang anyways, so this combination gets tested by
the gcc test.  This is purely redundant, and (since it runs early in the output)
increases time to signal for real breakages further down in the list.
2021-07-08 09:53:28 -07:00
David Goldblatt
d202218e86 HPA: Fix typos with big performance implications.
This fixes two simple but significant typos in the HPA:
- The conf string parsing accidentally set a min value of PAGE for
  hpa_sec_batch_fill_extra; i.e. allocating 4096 extra pages every time we
  attempted to allocate a single page.  This puts us over the SEC flush limit,
  so we then immediately flush all but one of them (probably triggering
  purging).
- The HPA was using the default PAI batch alloc implementation, which meant it
  did not actually get any locking advantages.

This snuck by because I did all the performance testing without using the PAI
interface or config settings.  When I cleaned it up and put everything behind
nice interfaces, I only did correctness checks, and didn't try any performance
ones.
2021-06-24 16:26:55 -07:00
David Goldblatt
de033f56c0 mpsc_queue: Add module.
This is a simple multi-producer, single-consumer queue.  The intended use case
is in the HPA, as we begin supporting hpdatas that move between hpa_shards.  We
take just a single CAS as the cost to send a message (or a batch of messages) in
the low-contention case, and lock-freedom lets us avoid some lock-ordering
issues.
2021-06-24 14:55:49 -07:00
David Goldblatt
4452a4812f Add opt.experimental_infallible_new.
This allows a guarantee that operator new never throws.

Fix the .gitignore rules to include test/integration/cpp while we're here.
2021-06-24 12:22:51 -07:00
David Goldblatt
0689448b1e Travis: Unbreak the builds.
In the hopes of future-proofing as much as possible, jump to the latest
distribution Travis supports.
2021-06-24 07:40:28 -07:00
David Carlier
4fb93a18ee extent_can_acquire_neighbor typo fix 2021-06-19 08:13:11 -07:00
Vineet Gupta
2381efab57 ARC: add Minimum allocation alignment
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-06-03 13:43:38 -07:00
Ondřej Surý
2c0f4c2ac3 Fix typo in configure.ac: experimetal -> experimental 2021-05-25 08:20:37 -07:00
David Goldblatt
36c6bfb963 SEC: Allow arbitrarily many shards, cached sizes. 2021-05-22 08:17:41 -07:00
Deanna Gelbart
11beab38bc Added --debug-syms-by-id option 2021-05-17 10:00:40 -07:00
Qi Wang
08089589f7 Fix an interaction between the oversize_threshold test and bgthds.
Also added the shared utility to check if background_thread is enabled.
2021-05-13 16:19:14 -07:00
David Goldblatt
5417938215 Red-black tree: add summarize/filter.
This allows tracking extra information in the nodes of an red-black tree to
filter searches in the tree to just those that match some property.
2021-05-12 11:14:23 -07:00
David Goldblatt
b2c08ef2e6 RB unit tests: don't test reentrantly.
The RB code doesn't do any allocation, and takes a little bit of time to run.
There's no sense in doing everything three times.
2021-05-12 11:14:23 -07:00
David Goldblatt
aea91b8c33 Clean up some minor data structure inconsistencies
Namely, unify the include guard styling with the majority of the project, and do
flat_bitmap -> fb, to match its naming convention.
2021-05-12 11:14:23 -07:00
David Goldblatt
1f688490e1 Stats: Fix a printing bug when hpa_dirty_mult = -1
Missed a layer of indirection.
2021-05-05 19:45:25 -07:00
David Goldblatt
4f7cb3a413 Sized deallocation: fix a typo.
dealloction -> deallocation.
2021-05-04 16:46:15 -07:00
David Goldblatt
12cd13cd41 Fix thread.name/prof_sys_thread_name interaction
When prof_sys_thread_name is true, we don't allow setting the thread name.
Teach the unit test this.
2021-03-31 14:45:12 -07:00
David Goldblatt
304cdbb132 Fix a prof_recent/prof_sys_thread_name interaction
When both of these are enabled, the output format changes slightly.  Teach the
unit test about the interaction.
2021-03-31 14:45:12 -07:00
Qi Wang
9b523c6c15 Refactor the locking in extent_recycle().
Hold the ecache lock across extent_recycle_extract() and extent_recycle_split(),
so that the extent_deactivate after split can avoid re-take the ecache mutex.
2021-03-31 14:42:33 -07:00
Qi Wang
ce68f326b0 Avoid the release & re-acquire of the ecache locks around the merge hook. 2021-03-31 14:42:33 -07:00
Qi Wang
7dc77527ba Delete the mutex_pool module. 2021-03-29 17:19:53 -07:00
Qi Wang
03d95cba88 Remove the unnecessary arena_ind_set in base_alloc_edata().
All edata alloc sites are already followed with proper edata_init().
2021-03-29 17:19:53 -07:00
Qi Wang
3093d9455e Move the edata mergeability related functions to extent.h. 2021-03-29 17:19:53 -07:00
Qi Wang
7c964b0352 Add rtree_write_range(): writing the same content to multiple leaf elements.
Apply to emap_(de)register_interior which became noticeable in perf profiles.
2021-03-29 17:19:53 -07:00
Qi Wang
add636596a Stop checking head state in the merge hook.
Now that all merging go through try_acquire_edata_neighbor, the mergeablility
checks (including head state checking) are done before reaching the merge hook.
In other words, merge hook will never be called if the head state doesn't agree.
2021-03-29 17:19:53 -07:00
Qi Wang
49b7d7f0a4 Passing down the original edata on the expand path.
Instead of passing down the new_addr, pass down the active edata which allows us
to always use a neighbor-acquiring semantic.  In other words, this tells us both
the original edata and neighbor address.  With this change, only neighbors of a
"known" edata can be acquired, i.e. acquiring an edata based on an arbitrary
address isn't possible anymore.
2021-03-29 17:19:53 -07:00
Qi Wang
1784939688 Use rtree tracked states to protect edata outside of ecache locks.
This avoids the addr-based mutexes (i.e. the mutex_pool), and instead relies on
the metadata tracked in rtree leaf: the head state and extent_state.  Before
trying to access the neighbor edata (e.g. for coalescing), the states will be
verified first -- only neighbor edatas from the same arena and with the same
state will be accessed.
2021-03-29 17:19:53 -07:00
Qi Wang
9ea235f8fe Add witness_assert_positive_depth_to_rank(). 2021-03-29 17:19:53 -07:00
Qi Wang
4d8c22f9a5 Store edata->state in rtree leaf and make edata_t 128B aligned.
Verified that this doesn't result in any real increase of edata_t bytes
allocated.
2021-03-29 17:19:53 -07:00
Qi Wang
70d1541c5b Track extent is_head state in rtree leaf. 2021-03-29 17:19:53 -07:00
Qi Wang
862219e461 Add quiescence sync before deleting base during arena_destroy. 2021-03-29 17:19:53 -07:00
Evers Chen
a137a68252 Remove redundant declaration, pac_retain_grow_limit_get_set was declared twice in pac.h 2021-03-29 16:42:46 -07:00
lirui
2ae1ef7dbd Fix doc large size 54 KiB error 2021-03-26 13:34:49 -07:00
Qi Wang
61afb6a405 Fix locking on arena_i_destroy_ctl().
The ctl_mtx should be held to protect against concurrent arenas.create.
2021-03-22 23:18:52 -07:00
David Goldblatt
9193ea2248 Cirrus: fix build.
Remaining on 12.1 has started to break with an m4 error.  Upgrading fixes
things.

Mangle public symbols to work around a public definition error.
2021-03-19 13:42:30 -07:00
Qi Wang
3913077146 Mark head state during dss alloc.
Specifically, the extent_dalloc_gap relies on the correct head state to
coalesce.
2021-03-12 19:17:25 -08:00
Qi Wang
11127240ca Remove redundant enable-debug definition in configure. 2021-03-12 11:30:56 -08:00
Qi Wang
22be724af4 Set is_head in extent_alloc_wrapper w/ retain.
When retain is on, when extent_grow_retained failed (e.g. due to split hook
failures), we'll try extent_alloc_wrapper as the last resort.  Set the is_head
bit in that case to be consistent.  The allocated extent in that case will be
retained properly, but not merged with other extents.
2021-03-12 10:20:08 -08:00
David Goldblatt
73ca4b8ef8 HPA: Use dirtiest-first purging.
This seems to be practically beneficial, despite some pathological corner cases.
2021-02-19 15:10:54 -08:00
David Goldblatt
0f6c420f83 HPA: Make purging/hugifying more principled.
Before this change, purge/hugify decisions had several sharp edges that could
lead to pathological behavior if tuning parameters weren't carefully chosen.
It's the first of a series; this introduces basic "make every hugepage with
dirty pages purgeable" functionality, and the next commit expands that
functionality to have a smarter policy for picking hugepages to purge.

Previously, the dehugify logic would *never* dehugify a hugepage unless it was
dirtier than the dehugification threshold.  This can lead to situations in which
these pages (which themselves could never be purged) would push us above the
maximum allowed dirty pages in the shard.  This forces immediate purging of any
pages deallocated in non-hugified hugepages, which in turn places nonobvious
practical limitations on the relationships between various config settings.

Instead, we make our preference not to dehugify to purge a soft one rather than
a hard one.  We'll avoid purging them, but only so long as we can do so by
purging non-hugified pages.  If we need to purge them to satisfy our dirty page
limits, or to hugify other, more worthy candidates, we'll still do so.
2021-02-19 15:10:54 -08:00
David Goldblatt
6bddb92ad6 psset: Rename "bitmap" to "pageslab_bitmap".
It tracks pageslabs.  Soon, we'll have another bitmap (to track dirty pages)
that we want to disambiguate.

While we're here, fix an out-of-date comment.
2021-02-19 15:10:54 -08:00
David Goldblatt
154aa5fcc1 Use the flat bitmap for eset and psset bitmaps.
This is simpler (note that the eset field comment was actually incorrect!), and
slightly faster.
2021-02-19 15:10:54 -08:00
David Goldblatt
271a676dcd hpdata: early bailout for longest free range.
A number of common special cases allow us to stop iterating through an hpdata's
bitmap earlier rather than later.
2021-02-19 15:10:54 -08:00
David Goldblatt
d21d5b46b6 Edata: Move sn into its own field.
This lets the bins use a fragmentation avoidance policy that matches the HPA's
(without affecting the PAC).
2021-02-19 15:10:54 -08:00
David Goldblatt
fb327368db SEC: Expand option configurability.
This change pulls the SEC options into a struct, which simplifies their handling
across various modules (e.g. PA needs to forward on SEC options from the
malloc_conf string, but it doesn't really need to know their names).  While
we're here, make some of the fixed constants configurable, and unify naming from
the configuration options to the internals.
2021-02-19 15:10:54 -08:00
David Goldblatt
ce9386370a HPA: Implement batch allocation. 2021-02-19 15:10:54 -08:00
David Goldblatt
cdae6706a6 SEC: Use batch fills.
Currently, this doesn't help much, since no PAI implementation supports
flushing.  This will change in subsequent commits.
2021-02-19 15:10:54 -08:00
David Goldblatt
480f3b11cd Add a batch allocation interface to the PAI.
For now, no real allocator actually implements this interface; this will change
in subsequent diffs.
2021-02-19 15:10:54 -08:00
David Goldblatt
bf448d7a5a SEC: Reduce lock hold times.
Only flush a subset of extents during flushing, and drop the lock while doing
so.
2021-02-19 15:10:54 -08:00
David Goldblatt
1944ebbe7f HPA: Implement batch deallocation.
This saves O(n) mutex locks/unlocks during SEC flush.
2021-02-19 15:10:54 -08:00
David Goldblatt
f47b4c2cd8 PAI/SEC: Add a dalloc_batch function.
This lets the SEC flush all of its items in a single call, rather than flushing
everything at once.
2021-02-19 15:10:54 -08:00
David Goldblatt
4b8870c7db SEC: Fix a comment typo. 2021-02-19 15:10:54 -08:00
Jordan Rome
cde7097eca Update INSTALL.md to mention 'autoconf'
'autoconf' needs to be installed for './autogen.sh' to work.
2021-02-16 17:48:46 -08:00
Qi Wang
a11be50332 Implement opt.cache_oblivious.
Keep config.cache_oblivious for now to remain backward-compatible.
2021-02-11 11:32:01 -08:00
Jordan Rome
8c5e5f50a2 Fix stats for "tcache_max" (was "lg_tcache_max")
This opt was changed here: c8209150f9
and looks like this got missed.

Also update the write type to be unsigned.
2021-02-10 23:01:46 -08:00
Qi Wang
041145c272 Report the correct and wrong sizes on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
Qi Wang
f3b2668b32 Report the offending pointer on sized dealloc bug detection. 2021-02-08 14:42:27 -08:00
David Goldblatt
edbfe6912c Inline malloc fastpath into operator new.
This saves a small but non-negligible amount of CPU in C++ programs.
2021-02-08 14:17:47 -08:00
David Goldblatt
79f81a3732 HPA: Make dirty_mult configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
32dd153796 HPA: Make dehugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
4790db15ed HPA: make the hugification threshold configurable. 2021-02-04 20:58:31 -08:00
David Goldblatt
b3df80bc79 Pull HPA options into a containing struct.
Currently that just means max_alloc, but we're about to add more.  While we're
touching these lines anyways, tweak things to be more in line with testing.
2021-02-04 20:58:31 -08:00
David Goldblatt
bdb7307ff2 fxp: Add FXP_INIT_PERCENT
This lets us specify fxp values easily in source.
2021-02-04 20:58:31 -08:00
David Goldblatt
caef4c2868 FXP: add fxp_mul_frac.
This can multiply size_ts by a fraction without the risk of overflow.
2021-02-04 20:58:31 -08:00
David Goldblatt
56e85c0e47 HPA: Use a whole-shard purging heuristic.
Previously, we used only hpdata-local information to decide whether to purge.
2021-02-04 20:58:31 -08:00
David Goldblatt
dc886e5608 hpdata: Return the number of pages to be purged.
We'll use this in the next commit.
2021-02-04 20:58:31 -08:00
David Goldblatt
9fd9c876bb psset: keep aggregate stats.
This will let us quickly query these stats to make purging decisions quickly.
2021-02-04 20:58:31 -08:00
David Goldblatt
da63f23e68 HPA: Track pending purges/hugifies in the psset.
This finishes the refactoring of the HPA/psset interactions the past few commits
have been building towards.

Rather than the HPA removing and then reinserting hpdatas, it simply begins
updates and ends them.  These updates can set flags on the hpdata that prevent
it from being returned for certain types of requests.  For example, it can call
hpdata_alloc_allowed_set(hpdata, false) during an update, at which point the
given hpdata will no longer be returned for psset_pick_alloc requests.

This has various of benefits:
- It maintains stats correctness during purges and hugifies.
- It allows simpler and more explicit concurrency control for the various
  special cases (e.g. allocations are disallowed during purge, but not during
  hugify).
- It lets allocations and deallocations avoid disturbing the purging and
  hugification orderings.  If an hpdata "loses its place" in one of the queues
  just do to an alloc / dalloc, it can result in pathological edge cases where
  very hot, very full hugepages never get hugified  (and cold extents on the
  same hugepage as hot ones never get purged).

The key benefit though is that tracking hpdatas to be purged / hugified in a
principled way will let us do delayed purging and hugification.  Eventually this
will let us move these operations to background threads, but in the short term
the benefit is that it will let us have global purging policies (e.g. purge when
the entire arena has too many dirty pages, rather than any particular hugepage).
2021-02-04 20:58:31 -08:00
David Goldblatt
0ea3d6307c CTL, Stats: report HPA empty slab stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
bf64557ed6 Move empty slab tracking to the psset.
We're moving towards a world in which purging decisions are less rigidly
enforced at a single-hugepage level.  In that world, it makes sense to keep
around some hpdatas which are not completely purged, in which case we'll need to
track them.
2021-02-04 20:58:31 -08:00
David Goldblatt
99fc0717e6 psset: Reconceptualize insertion/removal.
Really, this isn't a functional change, just a naming change.  We start thinking
of pageslabs as being always in the psset.  What we used to think of as removal
is now thought of as being in the psset, but in the process of being updated
(and therefore, unavalable for serving new allocations).

This is in preparation of subsequent changes to support deferred purging;
allocations will still be in the psset for the purposes of choosing when to
purge, but not for purposes of allocation/deallocation.
2021-02-04 20:58:31 -08:00
David Goldblatt
061cabb712 HPA stats: report retained instead of inactive.
This more closely maps to the PAC.
2021-02-04 20:58:31 -08:00
David Goldblatt
d3e5ea03c5 HPA: Track dirty stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
68a1666e91 hpdata: Rename "dirty" to "touched".
This matches the usage in the rest of the codebase.
2021-02-04 20:58:31 -08:00
David Goldblatt
be0d7a53f3 HPA: Don't track inactive pages.
This is really only useful for human consumption.  Correspondingly, emit it only
in the human-readable stats, and let everybody else compute from the hugepage
size and nactive.
2021-02-04 20:58:31 -08:00
David Goldblatt
55e0f60ca1 psset stats: Simplify handling.
We can treat the huge and nonhuge cases uniformly using huge state as an array
index.
2021-02-04 20:58:31 -08:00
David Goldblatt
94cd9444c5 HPA: Some minor reformattings. 2021-02-04 20:58:31 -08:00
David Goldblatt
b25ee5d88e HPA: Add purge stats. 2021-02-04 20:58:31 -08:00
David Goldblatt
746ea3de6f HPA stats: Allow some derived stats.
However, we put them in their own struct, to avoid the messiness that the arena
has (mixing derived and non-derived stats in the arena_stats_t).
2021-02-04 20:58:31 -08:00
David Goldblatt
30b9e8162b HPA: Generalize purging.
Previously, we would purge a hugepage only when it's completely empty.  With
this change, we can purge even when only partially empty.  Although the
heuristic here is still fairly primitive, this infrastructure can scale to
become more advanced.
2021-02-04 20:58:31 -08:00
David Goldblatt
70692cfb13 hpdata: Add state changing helpers.
We're about to allow hugepage subextent purging; get as much of our metadata
handling ready as possible.
2021-02-04 20:58:31 -08:00
David Goldblatt
9b75808be1 flat bitmap: Add a bitwise and/or/not.
We're about to need them.
2021-02-04 20:58:31 -08:00
David Goldblatt
2ae966222f hpdata: track per-page dirty state. 2021-02-04 20:58:31 -08:00
David Goldblatt
ff4086aa6b hpdata: count active pages instead of free ones.
This will be more consistent with later naming choices.
2021-02-04 20:58:31 -08:00
David Goldblatt
3624dd42ff hpdata: Add a comment for hpdata_consistent. 2021-02-04 20:58:31 -08:00
David Goldblatt
20140629b4 Bin: Move stats closer to the mutex.
This is a slight cache locality optimization.
2021-02-04 14:10:43 -08:00
David Goldblatt
c259323ab3 Use ticker_geom_t for arena tcache decay. 2021-02-04 14:10:43 -08:00
David Goldblatt
8edfc5b170 Add ticker_geom_t.
This lets a single ticker object drive events across a large number of different
tick streams while sharing state.
2021-02-04 14:10:43 -08:00
David Goldblatt
3967329813 Arena: share bin offsets in a global.
This saves us a cache miss when lookup up the arena bin offset in a remote
arena during tcache flush.  All arenas share the base offset, and so we don't
need to look it up repeatedly for each arena.  Secondarily, it shaves 288 bytes
off the arena on, e.g., x86-64.
2021-02-04 14:10:43 -08:00
David Goldblatt
2fcbd18115 Cache bin: Don't reverse flush order.
The items we pick to flush matter a lot, but the order in which they get flushed
doesn't; just use forward scans.  This simplifies the accessing code, both in
terms of the C and the generated assembly (i.e. this speeds up the flush
pathways).
2021-02-04 14:10:43 -08:00
David Goldblatt
4c46e11365 Cache an arena's index in the arena.
This saves us a pointer hop down some perf-sensitive paths.
2021-02-04 14:10:43 -08:00
David Goldblatt
229994a204 Tcache flush: keep common path state in registers.
By carefully force-inlining the division constants and the operation sum count,
we can eliminate redundant operations in the arena-level dalloc function.  Do
so.
2021-02-04 14:10:43 -08:00
David Goldblatt
31a629c3de Tcache flush: prefetch edata contents.
This frontloads more of the miss latency.  It also moves it to a pathway where
we have not yet acquired any locks, so that it should (hopefully) reduce hold
times.
2021-02-04 14:10:43 -08:00
David Goldblatt
9f9247a62e Tcache fluhing: increase cache miss parallelism.
In practice, many rtree_leaf_elm accesses are cache misses.  By restructuring,
we can make it more likely that these misses occur without blocking us from
starting later lookups, taking more of those misses in parallel.
2021-02-04 14:10:43 -08:00
David Goldblatt
181ba7fd4d Tcache flush: Add an emap "batch lookup" path.
For now this is a no-op; but the interface is a little more flexible for our
purposes.
2021-02-04 14:10:43 -08:00
David Goldblatt
c007c537ff Tcache flush: Unify edata lookup path. 2021-02-04 14:10:43 -08:00
David CARLIER
35a8552605 Mac OS: Tag mapped pages.
This can be used to help profiling tools (e.g. vmmap) identify the
sources of mappings more specifically.
2021-02-03 15:05:53 -08:00
Yinan Zhang
f6699803e2 Fix duration in prof log 2021-01-25 16:38:38 -08:00
Azat Khuzhin
a943172b73 Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)
qemu does not support this, yet [1], and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

    <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

  [1]: https://patchwork.kernel.org/patch/10576637/

Here is a simple example that shows the problem [2]:

    // Gist to check possible issues with MADV_DONTNEED
    // For example it does not supported by qemu user
    // There is a patch for this [1], but it hasn't been applied.
    //   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>
    #include <assert.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(addr, 'A', 1<<16);

        if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
            puts("MADV_DONTNEED does not return error. Check memory.");
            for (int i = 0; i < 1<<16; ++i) {
                assert(((unsigned char *)addr)[i] == 0);
            }
        } else {
            perror("madvise");
        }

        if (munmap(addr, 1<<16)) {
            perror("munmap");
            return 1;
        }

        return 0;
    }

  ### unpatched qemu

      $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
      MADV_DONTNEED does not return error. Check memory.
      test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
      qemu: uncaught target signal 6 (Aborted) - core dumped
      Aborted (core dumped)

  ### patched qemu (by returning ENOSYS error)

      $ qemu-x86_64 /tmp/test-MADV_DONTNEED
      madvise: Success

  ### patch for qemu to return ENOSYS

      diff --git a/linux-user/syscall.c b/linux-user/syscall.c
      index 897d20c076..5540792e0e 100644
      --- a/linux-user/syscall.c
      +++ b/linux-user/syscall.c
      @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                  turns private file-backed mappings into anonymous mappings.
                  This will break MADV_DONTNEED.
                  This is a hint, so ignoring and returning success is ok.  */
      -        return 0;
      +        return ENOSYS;
       #endif
       #ifdef TARGET_NR_fcntl64
           case TARGET_NR_fcntl64:

  [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece

v2:
- review fixes
- add opt_dont_trust_madvise
v3:
- review fixes
- rename opt_dont_trust_madvise to opt_trust_madvise
2021-01-20 20:08:30 -08:00
Uwe L. Korn
2e3104ba07 Update config.{sub,guess} to support support-aarch64-apple-darwin as a target 2021-01-11 14:00:03 -08:00
David Goldblatt
a011c4c22d cache_bin: Separate out local and remote accesses.
This fixes an incorrect debug-mode assert:
- T1 starts an arena stats update and reads stack_head from another thread's
  cache bin, when that cache bin has 1 item in it.
- T2 allocates from that cache bin.  The cache_bin's stack_head now points to a
  NULL pointer, since the cache bin is empty.
- T1 Re-reads the cache_bin's stack_head to perform an assertion check (since it
  previously saw that the bin was empty, whatever stack_head points to should be
  non-NULL).
2021-01-08 14:18:08 -08:00
Yinan Zhang
14d689c0f9 Add prof stats mutex stats 2021-01-07 20:39:49 -08:00
Yinan Zhang
9f71b5779b Output prof stats in stats print 2021-01-07 20:39:49 -08:00
Yinan Zhang
1f1a0231ed Split macros for initializing stats headers 2021-01-07 20:39:49 -08:00
Yinan Zhang
4352cbc21c Add alignment tests for prof stats 2021-01-07 20:39:49 -08:00
Yinan Zhang
54f3351f1f Add mallctl for prof stats fetching 2021-01-07 20:39:49 -08:00
Yinan Zhang
40fa4d29d3 Track per size class internal fragmentation 2021-01-07 20:39:49 -08:00
Yinan Zhang
afa489c3c5 Record request size in prof info 2021-01-07 20:39:49 -08:00
David Goldblatt
f9bb8dedef Un-force-inline do_rallocx.
The additional overhead of the function-call setup and flags checking is
relatively small, but costs us the replication of the entire realloc pathway in
terms of size.
2021-01-04 14:55:49 -08:00
David Goldblatt
a9fa2defdb Add JEMALLOC_COLD, and mark some functions cold.
This hints to the compiler that it should care more about space than CPU (among
other things).  In cases where the compiler lacks profile-guided information,
this can be a substantial space savings.

For now, we mark the mallctl or atexit driven profiling and stats functions that
take up the most space.
2021-01-04 14:55:49 -08:00
David Goldblatt
5d8e70ab26 prof_recent: cassert(config_prof) more often.
This tells the compiler that these functions are never called, which lets them
be optimized away in builds where profiling is disabled.
2021-01-04 14:55:49 -08:00
David Goldblatt
83cad746ae prof_log: cassert(config_prof) in public functions
This lets the compiler infer that the code is dead in builds where profiling is
enabled, saving on space there.
2021-01-04 14:55:49 -08:00
David Goldblatt
526180b76d Extent.c: Avoid an rtree NULL-check.
The edge case in which pages_map returns (void *)PAGE can trigger an incorrect
assertion failure.  Avoid it.
2021-01-04 14:50:49 -08:00
Yinan Zhang
b35ac00d58 Do not bump to large size for page aligned request 2020-12-29 17:09:58 -08:00
Yinan Zhang
8a56d6b636 Add last-N mutex stats 2020-12-29 09:44:19 -08:00
Yinan Zhang
22d62d8cbd Handle ending gap properly for HPA stats 2020-12-18 16:40:57 -08:00
Yinan Zhang
6c5a3a24dd Omit bin stats rows with no data 2020-12-18 16:40:57 -08:00
Yinan Zhang
ea013d8fa4 Enforce realloc sizing stability 2020-12-18 11:41:52 -08:00
Yinan Zhang
74bd63b203 Optimize stats print using partial name-to-mib 2020-12-18 10:39:58 -08:00
Yinan Zhang
4557c0a67d Enable ctl on partial mib and partial name 2020-12-18 10:39:58 -08:00
Yinan Zhang
006dd0414e Add partial name-to-mib functionality 2020-12-18 10:39:58 -08:00
Yinan Zhang
f2e1a5be77 Do not fail on partial ctl path for ctl_nametomib()
We do not fail on partial ctl path when the given `mib` array is
shorter than the given name, and we should keep the behavior the
same in the reverse case, which I feel is also the more natural way.
2020-12-18 10:39:58 -08:00
Yinan Zhang
6ab181d2b7 Extract node lookup given mib input 2020-12-18 10:39:58 -08:00
Yinan Zhang
3a627b9674 No need to record all nodes in ctl_lookup() 2020-12-18 10:39:58 -08:00
Yinan Zhang
91e006c4c2 Enable ctl_lookup() to start from arbitrary node 2020-12-18 10:39:58 -08:00
Jin Qian
063a767ffe Define JEMALLOC_HAS_ALLOCA_H for QNX
QNX has <alloca.h>
2020-12-18 10:05:59 -08:00
Jin Qian
4e3fe218e9 Use posix_madvise to purge pages when available 2020-12-18 10:05:59 -08:00
Jin Qian
26c1dc5a3a Support AutoConf for posix_madvise and POSIX_MADV_DONTNEED 2020-12-18 10:05:59 -08:00
Jin Qian
96a59c3bb5 Fix recursive malloc during bootstrap on QNX
pthread_key_create on QNX triggers recursive allocation during tsd
bootstrapping. Using tsd_init_check_recursion to detect that.

Before pthread_key_create, the address of tsd_boot_wrapper is returned
from tsd_get_wrapper instead of using TLS to store the pointer.
tsd_set_wrapper becomes a no-op. After that, the address of
tsd_boot_wrapper is written to TLS and bootstrap continues as before.

Signed-off-by: Jin Qian <jqian@aurora.tech>
2020-12-18 10:05:59 -08:00
Jin Qian
986cbe4881 Disable JEMALLOC_TLS for QNX
TLS access triggers recurisive malloc during bootstrapping. Need to use
pthread_getspecific and pthread_setspecific with a follow up fix.
2020-12-18 10:05:59 -08:00
David Goldblatt
1e3b8636ff HPA: Remove unused malloc_conf options. 2020-12-08 12:10:48 -08:00
Yinan Zhang
e82771807e Cache mallctl mib for batch allocation stress test 2020-12-07 09:10:11 -08:00
Yinan Zhang
0dfdd31e0f Add tiny batch size to batch allocation stress test 2020-12-07 09:10:11 -08:00
Aditya Kumar
9522ae41d6 Move n_search outside of assert as reported by static analyzer 2020-12-07 06:49:27 -08:00
David Goldblatt
a559caf74a hpdata: Strengthen assertions.
Now that we have flat bitmap bit counting functions, we can easily assert that
nfree is always correct.  While we're tightening up this code, enforce
consistency on API boundaries as well.
2020-12-07 06:21:08 -08:00
David Goldblatt
f51948d9e1 psset unit test: fix a bug.
The next commit adds assertions that reveal a bug in the test code
(double-free).  Fix it.
2020-12-07 06:21:08 -08:00
David Goldblatt
54c94c1679 flat bitmap: add scount / ucount functions.
These can compute the number or set or unset bits in a subrange of the bitmap.
2020-12-07 06:21:08 -08:00
David Goldblatt
e6c057ad35 fb: implement assign in terms of a visitor.
We'll reuse this visitor in the next commit.
2020-12-07 06:21:08 -08:00
David Goldblatt
734e72ce8f bit_util: Guarantee popcount's presence.
Implement popcount generically, so that we can rely on it being present.
2020-12-07 06:21:08 -08:00
David Goldblatt
d9f7e6c668 hpdata: Add a test.
We're about to make the functionality here more complicated; testing hpdata
directly (rather than relying on user's tests) will make debugging easier.
2020-12-07 06:21:08 -08:00
David Goldblatt
3ed0b4e8a3 HPA: Add an nevictions counter.
I.e. the number of times we've purged a hugepage-sized region.
2020-12-07 06:21:08 -08:00
David Goldblatt
fffcefed33 malloc_conf: Clarify HPA options. 2020-12-07 06:21:08 -08:00
David Goldblatt
f7cf23aa4d psset: Relegate alloc/dalloc to test code.
This is no longer part of the "core" functionality; we only need the stub
implementations as an end-to-end test of hpdata + psset interactions when
metadata is being modified.  Treat them accordingly.
2020-12-07 06:21:08 -08:00
David Goldblatt
f9299ca572 HPA: Use psset fit/insert/remove.
This will let us remove alloc_new and alloc_reuse functions from the psset.
2020-12-07 06:21:08 -08:00
David Goldblatt
0971e1e4e3 hpdata: Use addr/size instead of begin/npages.
This is easier for the users of the hpdata.
2020-12-07 06:21:08 -08:00
David Goldblatt
5228d869ee psset: Use fit/insert/remove as basis functions.
All other functionality can be implemented in terms of these; doing so (while
retaining the same API) will be convenient for subsequent refactors.
2020-12-07 06:21:08 -08:00
David Goldblatt
089f8fa442 Move hpdata bitmap logic out of the psset. 2020-12-07 06:21:08 -08:00
David Goldblatt
ca30b5db2b Introduce hpdata_t.
Using an edata_t both for hugepages and the allocations within those hugepages
was convenient at first, but has outlived its usefulness.  Representing
hugepages explicitly, with their own data structure, will make future
development easier.
2020-12-07 06:21:08 -08:00
David Goldblatt
4a15008cfb HPA unit test: skip if unsupported.
Previously, we replicated the logic in hpa_supported in the test as well.
2020-12-07 06:21:08 -08:00
David Goldblatt
43af63fff4 HPA: Manage whole hugepages at a time.
This redesigns the HPA implementation to allow us to manage hugepages all at
once, locally, without relying on a global fallback.
2020-12-07 06:21:08 -08:00
David Goldblatt
63677dde63 Pages: Statically detect if pages_huge may succeed 2020-12-07 06:21:08 -08:00
David Goldblatt
c1b2a77933 psset: Move in stats.
A later change will benefit from having these functions pulled into a
psset-module set of functions.
2020-12-07 06:21:08 -08:00
David Goldblatt
d0a991d47b psset: Add insert/remove functions.
These will allow us to (for instance) move pageslabs from a psset dedicated to
not-yet-hugeified pages to one dedicated to hugeified ones.
2020-12-07 06:21:08 -08:00
David Goldblatt
d438296b1f narenas_ratio: Accept fractional values.
With recent scalability improvements to the HPA, we're experimenting with much
lower arena counts; this gets annoying when trying to test across different
hardware configurations using only the narenas setting.
2020-12-04 23:48:19 -08:00
David Goldblatt
ecd39418ac Add fxp: A fixed-point math library.
This will be used in the next commit to allow non-integer values for
narenas_ratio.
2020-12-04 23:48:19 -08:00
Igor Wiedler
99c2d6c232 Backport jeprof --collapse for flamegraph generation 2020-12-04 10:48:21 -08:00
David Carlier
520b75fa2d utrace support with label based signature. 2020-11-30 11:43:00 -08:00
Yinan Zhang
92e189be8b Add some comments to the batch allocation logic flow 2020-11-16 20:58:01 -08:00
Yinan Zhang
d96e4525ad Route batch allocation of small batch size to tcache 2020-11-16 20:58:01 -08:00
Yinan Zhang
ac480136d7 Split out locality checking in batch allocation tests 2020-11-16 20:58:01 -08:00
Yinan Zhang
be5e49f4fa Add a batch mode for cache_bin_alloc() 2020-11-16 20:58:01 -08:00
Yinan Zhang
4a65f34930 Fix a cache bin test 2020-11-16 20:58:01 -08:00
Yinan Zhang
566c4a8594 Slight changes to cache bin internal functions 2020-11-16 20:58:01 -08:00
Yinan Zhang
9545c2cd36 Add sample interval to prof last-N dump 2020-11-13 15:33:27 -08:00
David Goldblatt
cf2549a149 Add a per-arena oversize_threshold.
This can let manual arenas trade off memory and CPU the way auto arenas do.
2020-11-13 13:45:35 -08:00
David Goldblatt
4ca3d91e96 Rename geom_grow -> exp_grow.
This was promised in the review of the introduction of geom_grow, but would have
been painful to do there because of the series that introduced it.  Now that
those are comitted, renaming is easier.
2020-11-13 13:42:33 -08:00
David Goldblatt
b4c37a6e81 Rename edata_tree_t -> edata_avail_t.
This isn't a tree any more, and it mildly irritates me any time I see it.
2020-11-13 13:42:11 -08:00
David Carlier
95f0a77fde Detect pthread_getname_np explicitly.
At least one libc (musl) defines pthread_setname_np without defining
pthread_getname_np. Detect the presence of each individually, rather than
inferring both must be defined if set is.
2020-11-11 17:31:22 -08:00
Issam E. Maghni
b3c5690b7e Update config.{guess,sub} to 2020-11-07@77632d9 2020-11-10 13:32:32 -08:00
David Goldblatt
589638182a Use the edata_cache_small_t in the HPA. 2020-11-05 12:34:43 -08:00
David Goldblatt
03a6047111 Edata cache small: rewrite.
In previous designs, this was intended to be a sort of cache that couldn't fail.
In the current design, we want to use it just as a contention reduction
mechanism.  Rewrite it with those goals in mind.
2020-11-05 12:34:43 -08:00
David Goldblatt
c9757d9e3b HPA: Don't disable shards that were never started. 2020-11-05 12:34:43 -08:00
David Goldblatt
1b3ee75667 Add experimental.thread.activity_callback.
This (experimental, undocumented) functionality can be used by users to track
various statistics of interest at a finer level of granularity than the thread.
2020-11-05 12:33:25 -08:00
David Carlier
27ef02ca9a Android build fix proposal.
These are detected at configure time while they are glibc
specifics. the bionic equivalent is not api compatible
and dlopen is restricted in this platform.
2020-11-02 13:38:44 -08:00
David Carlier
d2d941017b MADV_DO[NOT]DUMP support equivalence on FreeBSD. 2020-11-02 09:15:15 -08:00
David Goldblatt
180b843159 Appveyor: fix 404 errors.
It looks like the mirrors we were using no longer carry this package, but that
it is installed by default and so no longer needs a remote mirror.
2020-10-27 15:28:20 -07:00
DC
ef6d51ed44 DragonFlyBSD build support. 2020-10-27 12:35:19 -07:00
Qi Wang
bf72188f80 Allow opt.tcache_max to accept small size classes.
Previously all the small size classes were cached.  However this has downsides
-- particularly when page size is greater than 4K (e.g. iOS), which will result
in much higher SMALL_MAXCLASS.

This change allows tcache_max to be set to lower values, to better control
resources taken by tcache.
2020-10-24 20:43:44 -07:00
David Goldblatt
ea32060f9c SEC: Implement thread affinity.
For now, just have every thread pick a shard once and stick with it.
2020-10-23 11:14:34 -07:00
David Goldblatt
d16849c91d psset: Do first-fit based on slab age.
This functions more like the serial number strategy of the ecache and
hpa_central_t.  Longer-lived slabs are more likely to continue to live for
longer in the future.
2020-10-23 11:14:34 -07:00
David Goldblatt
634ec6f50a Edata: add an "age" field. 2020-10-23 11:14:34 -07:00
David Goldblatt
6599651aee PA: Use an SEC in fron of the HPA shard. 2020-10-23 11:14:34 -07:00
David Goldblatt
ea51e97bb8 Add SEC module: a small extent cache.
This can be used to take pressure off a more centralized, worse-sharded
allocator without requiring a full break of the arena abstraction.
2020-10-23 11:14:34 -07:00
David Goldblatt
1964b08394 HPA: Add stats for the hpa_shard. 2020-10-23 11:14:34 -07:00
David Goldblatt
534504d4a7 HPA: add size-exclusion functionality.
I.e. only allowing allocations under or over certain sizes.
2020-10-23 11:14:34 -07:00
David Goldblatt
484f04733e HPA: Add central mutex contention stats. 2020-10-23 11:14:34 -07:00
David Goldblatt
bf025d2ec8 HPA: Make slab sizes and maxes configurable.
This allows easy experimentation with them as tuning parameters.
2020-10-23 11:14:34 -07:00
David Goldblatt
1c7da33317 HPA: Tie components into a PAI implementation. 2020-10-23 11:14:34 -07:00
Qi Wang
c8209150f9 Switch from opt.lg_tcache_max to opt.tcache_max
Though for convenience, keep parsing lg_tcache_max.
2020-10-22 20:40:41 -07:00
Yinan Zhang
5ba861715a Add thread name in prof last-N records 2020-10-20 15:58:24 -07:00
David Goldblatt
4ef5b8b4df Add a logo to doc_internal.
This is the logo from the jemalloc development team's snazzy windbreakers.  We
don't actually use it in any documentation yet, but there's no reason we
couldn't.  In the meantime, it's probably best if it exists somewhere more
stable than various email inboxes.
2020-10-19 15:32:51 -07:00
Qi Wang
5e41ff9b74 Add a hard limit on tcache max size class.
For locality reasons, tcache bins are integrated in TSD.  Allowing all size
classes to be cached has little benefit, but takes up much thread local storage.
In addition, it complicates the layout which we try hard to optimize.
2020-10-16 13:49:51 -07:00
Qi Wang
3de19ba401 Eagerly detect double free and sized dealloc bugs for large sizes. 2020-10-15 10:03:16 -07:00
David Goldblatt
be9548f2be Tcaches: Fix a subtle race condition.
Without a lock held continuously between checking tcaches_past and incrementing
it, it's possible for two threads to go down manual creation path
simultaneously.  If the number of tcaches is one less than the maximum, it's
possible for both to create a tcache and increment tcaches_past, with the second
thread returning a value larger than TCACHES_MAX.
2020-10-13 15:06:16 -07:00
Qi Wang
a9aa6f6d0f Fix the alloc_ctx check in free_fastpath.
The sanity check requires a functional TSD, which free_fastpath only guarantees
after the threshold branch.  Move the check function to afterwards.
2020-10-12 19:02:27 -07:00
David Goldblatt
b971f7c4dd Add "default" option to slab sizes.
This comes in handy when overriding earlier settings to test alternate ones.  We
don't really include tests for this, but I claim that's OK here:
- It's fairly straightforward
- It's fairly hard to test well
- This entire code path is undocumented and mostly for our internal
  experimentation in the first place.
- I tested manually.
2020-10-07 12:54:29 -07:00
David Goldblatt
21b70cb540 Add hpa_central module
This will be the centralized component of the coming hugepage allocator; the
source of larger chunks of memory from which smaller ones can be obtained.
2020-10-05 19:55:57 -07:00
David Goldblatt
1ed7ec369f Emap: Add emap_assert_not_mapped.
The counterpart to emap_assert_mapped, it lets callers check that some edata is
not already in the emap.
2020-10-05 19:55:57 -07:00
David Goldblatt
2a6ba121b5 PRNG test: cleanups.
Since we no longer have both atomic and non-atomic variants, there's no reason
to try to test both.
2020-10-05 19:55:57 -07:00
David Goldblatt
9e6aa77ab9 PRNG: Remove atomic functionality.
These had no uses and complicated the API.  As a rule we now expect to only use
thread-local randomization for contention-reduction reasons, so we only pay the
API costs and never get the functionality benefits.
2020-10-05 19:55:57 -07:00
David Goldblatt
0513047170 PRNG: Allow a a range argument of 1.
This is convenient when the range argument itself is generated from some
computation whose value we don't know in advance.
2020-10-05 19:55:57 -07:00
David Goldblatt
bdb60a8053 Appveyor: don't update msys2 keyring.
This is no longer required, and the step now fails.
2020-10-05 19:54:21 -07:00
David Goldblatt
025d8c37c9 Add a script to check for clang-formattedness. 2020-10-02 14:49:56 -07:00
David Goldblatt
f6bbfc1e96 Add a .clang-format file. 2020-10-02 14:49:56 -07:00
David Goldblatt
259c5e3e8f psset: Add stats 2020-09-18 12:39:25 -07:00
David Goldblatt
018b162d67 Add psset: a set of pageslabs.
This introduces a new sort of edata_t; a pageslab, and a set to manage them.
This is part of a series of a commits to implement a hugepage allocator; the
pageset will be per-arena, and track small page allocations requests within a
larger extent allocated from a centralized hugepage allocator.
2020-09-18 12:39:25 -07:00
David Goldblatt
ed99d300b9 Flat bitmap: Add longest-range computation.
This will come in handy in the (upcoming) page-slab set assertions.
2020-09-18 12:39:25 -07:00
David Goldblatt
e034500698 Edata: rename "ranged" bit to "pai".
This better represents its intended purpose; the hugepage allocator design
evolved away from needing contiguity of hugepage virtual address space.
2020-09-18 12:39:25 -07:00
David Goldblatt
7ad2f78663 Avoid a -Wundef warning on LG_SLAB_MAXREGS. 2020-09-17 10:05:40 -07:00
David Goldblatt
40cf71a06d Remove --with-slab-maxregs options from INSTALL.md
The variable slab sizes feature is still experimental; we don't want people to
start using it willy-nilly, or document its existence as a guarantee.
2020-09-17 10:05:40 -07:00
ezeeyahoo
36ebb5abe3 CI support for PPC64LE architecture 2020-09-17 10:03:08 -07:00
Hao Liu
1541ffc765 configure: add --with-lg-slab-maxregs configure option.
Specify the maximum number of regions in a slab, which is
(<lg-page> - <lg-tiny-min>) by default. This increases the limit of slab sizes
specified by "slab_sizes" in malloc_conf. This should never be less than
the default value. The max value of this option is related to LG_BITMAP_MAXBITS
(see more in bitmap.h).

For example, on a 4k page size system, if we:
  1) configure jemalloc with with --with-lg-slab-maxregs=12.
  2) export MALLOC_CONF="slab_sizes:9-16:4"
The slab size of 16 bytes is set to 4 pages. Previously, the default
lg-slab-maxregs is 9 (i.e. 12 - 3). The max slab size of 16 bytes is 2 pages
(i.e. (1<<9) * 16 bytes). By increasing the value from 9 to 12, the max slab
size can be set by MALLOC_CONF is 16 pages (i.e. (1<<12) * 16 bytes).
2020-09-16 13:58:38 -07:00
David Goldblatt
d243b4ec48 Add PROFILING_INTERNALS.md
This documents and explains some of the logic behind the profiling
implementation.
2020-09-10 15:56:59 -07:00
Yinan Zhang
09eda2c9b6 Add unit tests for usize in prof recent records 2020-09-09 13:31:35 -07:00
Yinan Zhang
b549389e4a Correct usize in prof last-N record 2020-09-09 13:31:35 -07:00
Yinan Zhang
202f01d4f8 Fix szind computation in profiling 2020-08-27 15:52:25 -07:00
Yinan Zhang
866231fc61 Do not repeat reentrancy test in profiling 2020-08-25 16:49:32 -07:00
Yinan Zhang
20f2479ed7 Do not create size class tables for non-prof builds 2020-08-24 20:10:02 -07:00
Yinan Zhang
8efcdc3f98 Move unbias data to prof_data 2020-08-24 20:10:02 -07:00
David Goldblatt
5e90fd006e Geom_grow: Don't keep the mutex internal.
We're about to use it in ways that will have external synchronization.
2020-08-19 16:53:21 -07:00
David Goldblatt
c57494879f Geom_grow: Don't take tsdn at init.
It's never used.
2020-08-19 16:53:21 -07:00
David Goldblatt
ffe552223c Geom_grow: Move in advancing logic. 2020-08-19 16:53:21 -07:00
David Goldblatt
131b1b5338 Rename ecache_grow -> geom_grow.
We're about to start using it outside of the ecaches, in the HPA central
allocator.
2020-08-19 16:53:21 -07:00
David Goldblatt
b399463fba flat_bitmap unit test: Silence a warning. 2020-08-17 12:50:27 -07:00
David Goldblatt
b0ffa39cac Mallctl stress test: fix a type.
The mallctlbymib_long helper was copy-pasted from mallctlbymib_short, and
incorrectly used its output variable (a char *) rather than the output variable
of the mallctl call it was using (a uint64_t), causing breakages when
sizeof(char *) differed from sizeof(uint64_t).
2020-08-17 12:50:14 -07:00
David Goldblatt
753bbf1849 Benchmarks: Also print ns / iter.
This is often what we really care about.  It's not easy to do the division
mentally in all cases.
2020-08-13 10:03:15 -07:00
David Goldblatt
7b187360e9 IO: Support 0-padding for unsigned numbers. 2020-08-13 10:03:15 -07:00
David Goldblatt
32d4673221 Add a mallctl speed stress test. 2020-08-13 10:03:15 -07:00
David Goldblatt
38867c5c17 Makefile: alphabetize stress/analyze utilities. 2020-08-13 10:03:15 -07:00
David Goldblatt
ab274a23b9 Add narenas_ratio.
This allows setting arenas per cpu dynamically, rather than forcing the user to
know the number of CPUs in advance if they want a particular CPU/space tradeoff.
2020-08-12 16:41:57 -07:00
David Goldblatt
9e18ae639f Config: safety checks don't imply size checks.
The commit introducing size checks accidentally enabled them whenever any safety
checks were on.  This ends up causing the regression that splitting up the
features was intended to avoid.  Fix the issue.
2020-08-12 13:00:19 -07:00
Yinan Zhang
8f9e958e1e Add alignment stress test for rallocx 2020-08-11 11:56:43 -07:00
Yinan Zhang
743021b63f Fix size miscalculation bug in reallocation 2020-08-11 11:56:43 -07:00
David Goldblatt
eaed1e39be Add sized-delete size-checking functionality.
The existing checks are good at finding such issues (on tcache flush), but not
so good at pinpointing them.  Debug mode can find them, but sometimes debug mode
slows down a program so much that hard-to-hit bugs can take a long time to
crash.

This commit adds functionality to keep programs mostly on their fast paths,
while also checking every sized delete argument they get.
2020-08-05 19:34:05 -07:00
David Goldblatt
53084cc5c2 Safety check: Don't directly abort.
The sized dealloc checks called the generic safety_check_fail, and then called
abort.  This means the failure case isn't mockable, hence not testable.  Fix it
in anticipation of a coming diff.
2020-08-05 19:34:05 -07:00
David Goldblatt
60993697d8 Prof: Add prof_unbias.
This gives more accurate attribution of bytes and counts to stack traces,
without introducing backwards incompatibilities in heap-profile parsing tools.
We track the ideal reported (to the end user) number of bytes more carefully
inside core jemalloc.  When dumping heap profiles, insteading of outputting our
counts directly, we output counts that will cause parsing tools to give a result
close to the value we want.

We retain the old version as an opt setting, to let users who are tracking
values on a per-component basis to keep their metrics stable until they decide
to switch.
2020-08-05 18:33:55 -07:00
David Goldblatt
81c2f841e5 Add a simple utility to detect profiling bias. 2020-08-05 18:33:55 -07:00
Yinan Zhang
e032a1a1de Add a stress test for batch allocation 2020-08-03 09:36:40 -07:00
Yinan Zhang
f6cf5eb388 Add mallctl for batch allocation API 2020-07-31 09:16:50 -07:00
Yinan Zhang
978f830ee3 Add batch allocation API 2020-07-31 09:16:50 -07:00
Yinan Zhang
c6f59e9bb4 Add surplus reading API for thread event lookahead 2020-07-31 09:16:50 -07:00
Yinan Zhang
f805468957 Add zero option to arena batch allocation 2020-07-31 09:16:50 -07:00
Yinan Zhang
49e5c2fe7d Add batch allocation from fresh slabs 2020-07-31 09:16:50 -07:00
Yinan Zhang
2bb8060d57 Add empty test and concat for typed list 2020-07-31 09:16:50 -07:00
Yinan Zhang
f28cc2bc87 Extract bin shard selection out of bin locking 2020-07-31 09:16:50 -07:00
David Goldblatt
ddb8dc4ad0 FB: Add range iteration support. 2020-07-30 15:25:23 -07:00
David Goldblatt
ceee823519 Add flat_bitmap.
The flat_bitmap module offers an extended API, at the cost of decreased
performance in the case of very large bitmaps.
2020-07-30 15:25:23 -07:00
David Goldblatt
7fde6ac490 Nbits: Add a couple more interesting sizes.
Previously, all tests with more than two levels came in powers of 2.  It's
usefule to check cases where we have a partially filled group at above the
second level.
2020-07-30 15:25:23 -07:00
David Goldblatt
efeab1f498 bitset test: Pull NBITS_TAB into its own file. 2020-07-30 15:25:23 -07:00
David Goldblatt
22da836094 bit_util: Add fls_ functions; "find last set".
These simplify a lot of the bit_util module, which had grown bits and pieces of
this functionality across a variety of places over the years.

While we're here, kill off BIT_UTIL_INLINE and don't do reentrancy testing for
bit_util.
2020-07-30 15:25:23 -07:00
David Goldblatt
1ed0288d9c bit_util: Change ffs functions indexing.
Making these 0-based instead of 1-based makes calling code simpler and will be
more consistent with functions introduced in subsequent diffs.
2020-07-30 15:25:23 -07:00
David Goldblatt
786a27b9e5 CI: Update keyring. 2020-07-27 15:54:57 -07:00
Yinan Zhang
fb347dc618 Verify output space before doing heavy work in mallctl 2020-07-27 09:48:35 -07:00
Yinan Zhang
f5fb4e5a97 Modify mallctl output length when needed
This is the only reason why `oldlenp` was designed to be in the form
of a pointer.
2020-07-27 09:48:35 -07:00
Yinan Zhang
4258402047 Corrections for prof_log_start() 2020-07-22 13:34:49 -07:00
Yinan Zhang
e6cb7a1c9b Shorten wait time for peak events 2020-07-14 09:00:33 -07:00
David Goldblatt
6107857b7b PA->PAC: Move in PAI implementation. 2020-07-09 13:41:04 -07:00
David Goldblatt
6041aaba97 PA -> PAC: Move in destruction functions. 2020-07-09 13:41:04 -07:00
David Goldblatt
cbf096b05e Arena: remove redundant bg inactivity check. 2020-07-09 13:41:04 -07:00
David Goldblatt
471eb5913c PAC: Move in decay rate setting. 2020-07-09 13:41:04 -07:00
David Goldblatt
6a2774719f PA->PAC: Move in decay functions. 2020-07-09 13:41:04 -07:00
David Goldblatt
4ee75be3a3 PA -> PAC: Move in decay_purge enum. 2020-07-09 13:41:04 -07:00
David Goldblatt
72435b0aba PA->PAC: Make extent.c forget about PA. 2020-07-09 13:41:04 -07:00
David Goldblatt
dee5d1c42d PA->PAC: Move in extent_sn. 2020-07-09 13:41:04 -07:00
David Goldblatt
7391382349 PA->PAC: Move in stats. 2020-07-09 13:41:04 -07:00
David Goldblatt
db211eefbf PAC: Move in decay. 2020-07-09 13:41:04 -07:00
David Goldblatt
c81e389996 PAC: Move in ecache_grow. 2020-07-09 13:41:04 -07:00
David Goldblatt
65803171a7 PAC: move in emap 2020-07-09 13:41:04 -07:00
David Goldblatt
7efcb946c4 PAC: Add an init function. 2020-07-09 13:41:04 -07:00
David Goldblatt
722652222a PAC: Move in edata_cache accesses. 2020-07-09 13:41:04 -07:00
David Goldblatt
777b0ba965 Add PAC: Page allocator classic.
For now, this is just a stub containing the ecaches, with no surrounding code
changed.  Eventually all the core allocator bits will be moved in, in the
subsequent stack of commits.
2020-07-09 13:41:04 -07:00
David Goldblatt
1b5f632e0f Introduce PAI: Page allocator interface 2020-07-09 13:41:04 -07:00
David Goldblatt
3cf19c6e5e atomic: add atomic_load_sub_store 2020-07-09 13:41:04 -07:00
David Goldblatt
f1f4ec315a Tcache: Tweak nslots_max tuning parameter.
In making these settings configurable, 634afc4124
unintentially changed a tuning parameter (reducing the "goal" max by a factor of
4).  This commit undoes that change.
2020-07-09 08:58:05 -07:00
David Goldblatt
ae541d3fab Edata: Reserve some space for hugepages. 2020-07-08 13:20:59 -07:00
David Goldblatt
392f645f4d Edata: split up different list linkage uses. 2020-07-08 13:20:59 -07:00
David Goldblatt
129b727058 Add typed-list module.
This gives some named convenience wrappers.
2020-07-08 13:20:59 -07:00
David Carlier
00f06c9beb enabling mpss on solaris/illumos.
reusing slighty linux configuration as possible, aligning the
 address range to HUGEPAGE.
2020-07-06 09:59:10 -07:00
Yinan Zhang
c2e7a06392 No need to intercept prof_dump_header() in tests 2020-06-29 14:27:50 -07:00
Yinan Zhang
f58ebdff7a Generalize prof_cnt_all() for testing 2020-06-29 14:27:50 -07:00
Yinan Zhang
80d18c18c9 Pass prof dump parameters explicitly in prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
d4259ea53b Simplify signatures for prof dump functions 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d823f3a91 Consolidate struct definitions for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
1f5fe3a3e3 Pass write callback explicitly in prof_data 2020-06-29 14:27:50 -07:00
Yinan Zhang
4556d3c0c8 Define structures for prof dump parameters 2020-06-29 14:27:50 -07:00
Yinan Zhang
1c6742e6a0 Migrate prof dumping to use buffered writer 2020-06-29 14:27:50 -07:00
Yinan Zhang
dad821bb22 Move unwind to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
d128efcb6a Relocate a few prof utilities to the right modules 2020-06-29 14:27:50 -07:00
Yinan Zhang
4736fb4fc9 Move file handling logic in prof_data to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
767a2e1790 Move file handling logic in prof to prof_sys 2020-06-29 14:27:50 -07:00
Yinan Zhang
03ae509f32 Create prof_sys module for reading system thread name 2020-06-29 14:27:50 -07:00
Yinan Zhang
adfd9d7b1d Change tsdn to tsd for thread name allocation 2020-06-29 14:27:50 -07:00
Yinan Zhang
841af2b426 Move thread name handling to prof_data module 2020-06-29 14:27:50 -07:00
Yinan Zhang
8118056c03 Expose prof_data testing internals only in prof tests 2020-06-29 14:27:50 -07:00
Yinan Zhang
f43ac8543e Correct prof header macro namings 2020-06-29 14:27:50 -07:00
Yinan Zhang
c8683bee80 Unify printing for prof counts object 2020-06-29 14:27:50 -07:00
Yinan Zhang
5d292b5660 Push error handling logic out of core dumping logic 2020-06-29 14:27:50 -07:00
Yinan Zhang
f541871f5d Reduce prof dump buffer size in debug build 2020-06-29 14:27:50 -07:00
Yinan Zhang
354183b10d Define prof dump buffer size centrally 2020-06-29 14:27:50 -07:00
Yinan Zhang
7455813e57 Make dump file writing replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
21e44c45d9 Make maps file opening replaceable in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
4bb4037dbe Extract utility function for opening maps file 2020-06-29 14:27:50 -07:00
Yinan Zhang
f307b25804 Only replace the dump file opening function in test 2020-06-29 14:27:50 -07:00
Yinan Zhang
d8cea87562 Move size inspections to test/analyze 2020-06-26 09:45:28 -07:00
Yinan Zhang
537a4bedb4 Add a tool to examine random number distributions 2020-06-26 09:45:28 -07:00
Yinan Zhang
d460333efb Improve naming for prof system thread name option 2020-06-24 14:32:01 -07:00
David T. Goldblatt
25e43c6022 Witness: Make ranks an enum.
This lets us avoid having to increment a bunch of values manually every time we
add a new sort of lock.
2020-06-19 18:05:08 -07:00
Yinan Zhang
092fcac0b4 Remove unnecessary source files 2020-06-19 12:15:44 -07:00
Yinan Zhang
a795b19327 Remove beginning define in source files
```
sed -i "/^#define JEMALLOC_[A-Z_]*_C_$/d" src/*.c;
```
2020-06-19 12:15:44 -07:00
Yinan Zhang
24bbf376ce Unify arena flag reading and selection 2020-06-19 11:06:05 -07:00
Yinan Zhang
e128b170a0 Do not fallback to auto arena when manual arena is requested 2020-06-19 11:06:05 -07:00
Yinan Zhang
95a59d2f72 Unify tcache flag reading and selection 2020-06-19 11:06:05 -07:00
Yinan Zhang
4b0c008489 Unify zero flag reading and setting 2020-06-19 11:06:05 -07:00
Yinan Zhang
2a84f9b8fc Unify alignment flag reading and computation 2020-06-19 11:06:05 -07:00
Yinan Zhang
b7858abfc0 Expose prof testing internal functions 2020-06-19 09:16:51 -07:00
Yinan Zhang
40fa6674a9 Fix prof timestamp conf reading 2020-06-17 16:02:51 -07:00
David Goldblatt
7e09a57b39 stress/sizes: Fix an off-by-one issue.
Algorithmically, a size greater than 1024 ZB could access one-past-the-end of
the sizes array.  This couldn't really happen since SIZE_MAX is less than 1024
ZB on all platforms we support (and we pick the arguments to this function to be
reasonable anyways), but it's not like there's any reason *not* to fix it,
either.
2020-06-16 10:34:19 -07:00
David Goldblatt
dcfa6fd507 stress/sizes: Add a couple more types. 2020-06-16 10:34:19 -07:00
David Goldblatt
40672b0b78 Remove duplicate logging in malloc. 2020-06-16 10:33:55 -07:00
Jon Haslam
4aea743279 High Resolution Timestamps for Profiling 2020-06-15 12:12:49 -07:00
David Goldblatt
d82a164d0d Add thread.peak.[read|reset] mallctls.
These can be used to track net allocator activity on a per-thread basis.
2020-06-11 13:54:22 -07:00
David Goldblatt
fe7108305a Add peak_t, for tracking allocator net max. 2020-06-11 13:54:22 -07:00
David Goldblatt
17a64fe91c Add a small program to print data structure sizes. 2020-06-11 08:13:38 -07:00
Yinan Zhang
3e19ebd2ea Add lock to protect prof last-N dumping 2020-06-09 17:03:05 -07:00
Yinan Zhang
a835d9cf85 Make prof last-N dumping non-blocking 2020-06-09 17:03:05 -07:00
Yinan Zhang
fc8bc4b5c0 Increase dump buffer for prof last-N list 2020-06-09 17:03:05 -07:00
Yinan Zhang
264d89d641 Extract restore and async cleanup functions for prof last-N list 2020-06-09 17:03:05 -07:00
Yinan Zhang
857ebd3daf Make edata pointer on prof recent record an atomic fence 2020-06-09 17:03:05 -07:00
Yinan Zhang
b8bdea6b26 Fix: prof_recent_alloc_max_ctl_read() does not take tsd 2020-06-09 17:03:05 -07:00
Yinan Zhang
730658f72f Extract alloc/dalloc utility for last-N nodes 2020-06-09 17:03:05 -07:00
Yinan Zhang
035be44867 Separate out dumping for each prof recent record 2020-06-09 17:03:05 -07:00
David Goldblatt
8da0896b79 Tcache: Make an integer conversion explicit. 2020-05-28 15:52:40 -07:00
David Goldblatt
cd28e60337 Don't warn on uniform initialization. 2020-05-28 15:52:40 -07:00
David Goldblatt
6cdac3c573 Tcache: Make flush fractions configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
7503b5b33a Stats, CTL: Expose new tcache settings. 2020-05-16 13:34:23 -07:00
David Goldblatt
ee72bf1cfd Tcache: Add tcache gc delay option.
This can reduce flushing frequency for small size classes.
2020-05-16 13:34:23 -07:00
David Goldblatt
d338dd45d7 Tcache: Make incremental gc bytes configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
ec0b579563 Tcache: Privatize opt_lg_tcache_max default. 2020-05-16 13:34:23 -07:00
David Goldblatt
10b96f6351 Tcache: Remove some unused gc constants. 2020-05-16 13:34:23 -07:00
David Goldblatt
181093173d Tcache: make slot sizing configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
b58dea8d1b Cache bin: expose ncached_max publicly. 2020-05-16 13:34:23 -07:00
David Goldblatt
634afc4124 Tcache: Make size computation configurable. 2020-05-16 13:34:23 -07:00
David Goldblatt
97b7a9cf77 Add a fill/flush microbenchmark. 2020-05-16 13:34:23 -07:00
David Carlier
33372cbd40 cpu instruction spin wait for arm32/64 2020-05-14 10:31:20 -07:00
Brooks Davis
27f29e424b LQ_QUANTUM should be 4 on mips64 hardware.
This matches the ABI stack alignment requirements.
2020-05-14 10:30:37 -07:00
David Goldblatt
eda9c2858f Edata: zero stack edatas before initializing.
This avoids some UB. No compilers take advantage of it for now, but no sense in
tempting fate.
2020-05-14 10:30:20 -07:00
David Goldblatt
5dead37a9d Allow narenas:default.
This can be useful when you know you want to override some lower-priority
configuration setting with its default value, but don't know what that value
would be.
2020-05-14 10:30:08 -07:00
Yinan Zhang
dcea2c0f8b Get rid of TSD -> thread event dependency 2020-05-12 09:16:16 -07:00
Yinan Zhang
75dae934a1 Always initialize TE counters in TSD init 2020-05-12 09:16:16 -07:00
Yinan Zhang
b06dfb9ccc Push event handlers to constituent modules 2020-05-12 09:16:16 -07:00
Yinan Zhang
381c97caa4 Treat postponed prof sample event as new event 2020-05-12 09:16:16 -07:00
Yinan Zhang
abd4674931 Extract out per event postponed wait time fetching 2020-05-12 09:16:16 -07:00
Yinan Zhang
f72014d097 Only compute thread event threshold once per trigger 2020-05-12 09:16:16 -07:00
Yinan Zhang
7324c4f85f Break down event init and handler functions 2020-05-12 09:16:16 -07:00
Yinan Zhang
6de77799de Move thread event wait time update to local 2020-05-12 09:16:16 -07:00
Yinan Zhang
733ae918f0 Extract out per event new wait time fetching 2020-05-12 09:16:16 -07:00
Yinan Zhang
1e2524e15a Do not reset sample wait time when re-initing tdata 2020-05-12 09:16:16 -07:00
Yinan Zhang
855d20f6f3 Remove outdated comments in thread event 2020-05-12 09:16:16 -07:00
Yinan Zhang
fc052ff728 Migrate counter to use locked int 2020-05-12 08:23:15 -07:00
Yinan Zhang
b543c20a94 Minor update to locked int 2020-05-12 08:23:15 -07:00
Yinan Zhang
f533ab6da6 Add forking handling for stats 2020-05-11 15:35:06 -07:00
Yinan Zhang
508303077b Add forking handling for prof idump counter 2020-05-11 15:35:06 -07:00
Yinan Zhang
4d970f8bfc Add forking handling for counter module 2020-05-11 15:35:06 -07:00
Yinan Zhang
2097e1945b Unify write callback signature 2020-05-11 14:51:24 -07:00
Yinan Zhang
fef9abdcc0 Cleanup tcache allocation logic
The logic in tcache allocation no longer involves profiling or
filling.
2020-05-11 12:24:56 -07:00
Yinan Zhang
e6cb6919c0 Consolidate prof inline function headers
The prof inline functions are no longer involved in a circular
dependency, so consolidate the two headers into one.
2020-05-11 12:24:56 -07:00
Yinan Zhang
d454af90f1 Remove unused prof_accum field from arena 2020-05-11 12:24:56 -07:00
Yinan Zhang
8be5584494 Initialize prof idump counter once rather than once per arena 2020-05-11 12:24:56 -07:00
Yinan Zhang
e10e5059e8 Make prof_idump_accum() non-inline 2020-05-11 12:24:56 -07:00
Yinan Zhang
039bfd4e30 Do not rollback prof idump counter in arena_prof_promote() 2020-05-11 12:24:56 -07:00
Yinan Zhang
0295aa38a2 Deduplicate entries in witness error message 2020-05-11 12:04:02 -07:00
David Goldblatt
f1f8a75496 Let opt.zero propagate to core allocation.
I.e. set dopts->zero early on if opt.zero is true, rather than leaving it set by
the entry-point function (malloc, calloc, etc.) and then memsetting.  This
avoids situations where we zero once in the large-alloc pathway and then again
via memset.
2020-05-04 12:36:45 -07:00
David Goldblatt
2c09d43494 Add a benchmark of large allocations. 2020-05-04 12:36:45 -07:00
David Goldblatt
46471ea327 SC: Name the max lookup constant. 2020-05-04 12:27:07 -07:00
David Goldblatt
79dd0c04ed SC: Simplify SC_NPSIZES computation.
Rather than taking all the sizes and subtracting out those that don't fit, we
instead just add up all the ones that do.
2020-05-04 12:27:07 -07:00
David Goldblatt
fb6cfffd39 Configure: Get rid of LG_QUANTA.
This is no longer used.
2020-05-04 12:27:07 -07:00
David Goldblatt
4f8efba824 TSD: Make rtree_ctx a slow-path field.
Performance-sensitive users will use sized deallocation facilities, so that
actually touching the rtree_ctx is unnecessary.  We make it the last element of
the slow data, so that it is for practical purposes almost-fast.
2020-04-14 15:20:19 -07:00
David Goldblatt
cd29ebefd0 Tcache: treat small and large cache bins uniformly 2020-04-14 15:20:19 -07:00
David Goldblatt
a13fbad374 Tcache: split up fast and slow path data. 2020-04-14 15:20:19 -07:00
David Goldblatt
7099c66205 Arena: fill in terms of cache_bins. 2020-04-14 15:20:19 -07:00
David Goldblatt
40e7aed59e TSD: Move in some of the tcache fields.
We had put these in the tcache for cache optimization reasons.  After the
previous diff, these no longer apply.
2020-04-14 15:20:19 -07:00
David Goldblatt
58a00df238 TSD: Put all fast-path data together. 2020-04-14 15:20:19 -07:00
David Goldblatt
3589571bfd SC: use SC_LG_NGROUP instead of its value.
This magic constant introduces inconsistencies.  We should be able to change its
value solely by adjusting the definition in the header.
2020-04-13 10:01:30 -07:00
David Goldblatt
877af247a8 QL, QR: Add documentation. 2020-04-11 10:32:11 -07:00
David Goldblatt
79ae7f9211 Rtree: Remove the per-field accessors.
We instead split things into "edata" and "metadata".
2020-04-10 13:12:47 -07:00
David Goldblatt
26e9a3103d PA: Simple decay test. 2020-04-10 13:12:47 -07:00
David Goldblatt
bb6a418523 Emap: Drop szind/slab splitting parameters.
After the previous diff, these are constants.
2020-04-10 13:12:47 -07:00
David Goldblatt
50289750b3 Extent: Remove szind/slab knowledge. 2020-04-10 13:12:47 -07:00
David Goldblatt
dc26b30094 Rtree: Clean up compact/non-compact split. 2020-04-10 13:12:47 -07:00
David Goldblatt
93b99dd140 Extent: Stop passing an edata_cache everywhere.
We already pass the pa_shard_t around everywhere; we can just use that.
2020-04-10 13:12:47 -07:00
David Goldblatt
a4759a1911 Ehooks: avoid touching arena_emap_global in tests.
That breaks our ability to test custom emaps in isolation.
2020-04-10 13:12:47 -07:00
David Goldblatt
11c47cb133 Extent: Take "bool zero" over "bool *zero". 2020-04-10 13:12:47 -07:00
David Goldblatt
1a1124462e PA: Take zero as a bool rather than as a bool *.
Now that we've moved junking to a higher level of the allocation stack, we don't
care about this performance optimization (which only occurred in debug modes).
2020-04-10 13:12:47 -07:00
David Goldblatt
294b276fc7 PA: Parameterize emap. Move emap_global to arena.
This lets us test the PA module without interfering with the global emap used by
the real allocator (the one not under test).
2020-04-10 13:12:47 -07:00
David Goldblatt
f730577277 Eset: Parameterize last globals accesses.
I.e. opt_retain and maps_coalesce.
2020-04-10 13:12:47 -07:00
David Goldblatt
7bb6e2dc0d Eset: take opt_lg_max_active_fit as a parameter.
This breaks its dependence on the global.
2020-04-10 13:12:47 -07:00
David Goldblatt
883ab327cc Emap: Move out last edata state touching. 2020-04-10 13:12:47 -07:00
David Goldblatt
0c96a2f03b Emap: Move out remaining edata modifications. 2020-04-10 13:12:47 -07:00
David Goldblatt
dfef0df71a Emap: Move edata modification out of emap_remap. 2020-04-10 13:12:47 -07:00
David Goldblatt
12eb888e54 Edata: Add a ranged bit.
We steal the dumpable bit, which we ended up not needing.
2020-04-10 13:12:47 -07:00
David Goldblatt
bd4fdf295e Rtree: Pull leaf contents into their own struct. 2020-04-10 13:12:47 -07:00
David Goldblatt
faec7219b2 PA: Move in decay initialization. 2020-04-10 13:12:47 -07:00
David Goldblatt
45671e4a27 PA: Move in retain growth limit setting. 2020-04-10 13:12:47 -07:00
David Goldblatt
daefde88fe PA: Move in mutex stats reading. 2020-04-10 13:12:47 -07:00
David Goldblatt
07675840a5 PA: Move in some more internals accesses. 2020-04-10 13:12:47 -07:00
David Goldblatt
238f3c7430 PA: Move in full stats merging. 2020-04-10 13:12:47 -07:00
David Goldblatt
81c6027592 Arena stats: Give it its own "mapped".
This distinguishes it from the PA mapped stat, which is now named "pa_mapped" to
avoid confusion. The (derived) arena stat includes base memory, and the PA stat
is no longer partially derived.
2020-04-10 13:12:47 -07:00
David Goldblatt
506d907e40 PA: Move in basic stats merging. 2020-04-10 13:12:47 -07:00
David Goldblatt
f29f6090f5 PA: Add pa_extra.c and put PA forking there. 2020-04-10 13:12:47 -07:00
David Goldblatt
8164fad404 Stats: Fix edata_cache size merging.
Previously, we assigned to the output rather than incrementing it.
2020-04-10 13:12:47 -07:00
David Goldblatt
565045ef71 Arena: Make more derived stats non-atomic/locked. 2020-04-10 13:12:47 -07:00
David Goldblatt
d0c43217b5 Arena stats: Move retained to PA, use plain ints.
Retained is a property of the allocated pages.  The derived fields no longer
require any locking; they're computed on demand.
2020-04-10 13:12:47 -07:00
David Goldblatt
e2cf3fb1a3 PA: Move in all modifications of mapped. 2020-04-10 13:12:47 -07:00
David Goldblatt
436789ad96 PA: Make mapped stat atomic.
We always have atomic_zu_t, and mapped/unmapped transitions are always expensive
enough that trying to piggyback on a lock is a waste of time.
2020-04-10 13:12:47 -07:00
David Goldblatt
3c28aa6f17 PA: Move edata_avail stat in, make it non-atomic. 2020-04-10 13:12:47 -07:00
David Goldblatt
f6bfa3dcca Move extent stats to the PA module.
While we're at it, make them non-atomic -- they are purely derived statistics
(and in fact aren't even in the arena_t or pa_shard_t).
2020-04-10 13:12:47 -07:00
David Goldblatt
527dd4cdb8 PA: Move in nactive counter. 2020-04-10 13:12:47 -07:00
David Goldblatt
c075fd0bcb PA: Minor cleanups and comment fixes. 2020-04-10 13:12:47 -07:00
David Goldblatt
46a9d7fc0b PA: Move in rest of purging. 2020-04-10 13:12:47 -07:00
David Goldblatt
2d6eec7b5c PA: Move in decay-all pathway. 2020-04-10 13:12:47 -07:00
David Goldblatt
65698b7f2e PA: Remove public visibility of some internals. 2020-04-10 13:12:47 -07:00
David Goldblatt
f012c43be0 PA: Move in decay_to_limit 2020-04-10 13:12:47 -07:00
David Goldblatt
103f5feda5 Move bg thread activity check out of purging core. 2020-04-10 13:12:47 -07:00
David Goldblatt
3034f4a508 PA: Move in decay_stashed. 2020-04-10 13:12:47 -07:00
David Goldblatt
aef28b2f8f PA: Move in stash_decayed. 2020-04-10 13:12:47 -07:00
David Goldblatt
655a096343 Move bg inactivity check out of purge inner loop.
I.e. do it once per call to arena_decay_stashed instead of once per muzzy purge.
2020-04-10 13:12:47 -07:00
David Goldblatt
71fc0dc968 PA: Move in remaining page allocation functions. 2020-04-10 13:12:47 -07:00
David Goldblatt
74958567a4 PA: have expand take sizes instead of new usize.
This avoids involving usize, which makes some of the stats modifications more
intuitively correct.
2020-04-10 13:12:47 -07:00
David Goldblatt
5bcc2c2ab9 PA: Have expand take szind and slab.
This isn't really necessary, but having a uniform API will help us later.
2020-04-10 13:12:47 -07:00
David Goldblatt
0880c2ab97 PA: Have large expands use it. 2020-04-10 13:12:47 -07:00
David Goldblatt
7be3dea82c PA: Have slab allocations use it. 2020-04-10 13:12:47 -07:00
David Goldblatt
9f93625c14 PA: Move in arena large allocation functionality. 2020-04-10 13:12:47 -07:00
David Goldblatt
7624043a41 PA: Add ehook-getting support. 2020-04-10 13:12:47 -07:00
David Goldblatt
eba35e2e48 Remove extent knowledge of arena. 2020-04-10 13:12:47 -07:00
David Goldblatt
e77f47a85a Move arena decay getters to PA. 2020-04-10 13:12:47 -07:00
David Goldblatt
48a2cd6d79 Decay: Add a (mostly stub) test case. 2020-04-10 13:12:47 -07:00
David Goldblatt
f77cec311e Decay: Take current time as an argument.
This better facilitates testing.
2020-04-10 13:12:47 -07:00
David Goldblatt
bf55e58e63 Rename test/unit/decay -> test/unit/arena_decay.
This is really more of an end-to-end test at the arena level; it's not just of
the decay code in particular any more.
2020-04-10 13:12:47 -07:00
David Goldblatt
d1d7e1076b Decay: move in some background_thread accesses. 2020-04-10 13:12:47 -07:00
David Goldblatt
cdb916ed3f Decay: Add comments for the public API. 2020-04-10 13:12:47 -07:00
David Goldblatt
8f2193dc8d Decay: Move in arena decay functions. 2020-04-10 13:12:47 -07:00
David Goldblatt
4d090d23f1 Decay: Introduce a stub .c file. 2020-04-10 13:12:47 -07:00
David Goldblatt
7b62885476 Introduce decay module and put decay objects in PA 2020-04-10 13:12:47 -07:00
David Goldblatt
497836dbc8 Arena stats: mark edata_avail as derived.
The true number is in the edata_cache itself.
2020-04-10 13:12:47 -07:00
David Goldblatt
3192d6b77d Extents: Have extent_dalloc_gap take ehooks.
We're almost to the point where the extent code doesn't know about arenas at
all.  In that world, we shouldn't pull them out of the arena.
2020-04-10 13:12:47 -07:00
David Goldblatt
22a0a7b93a Move arena_decay_extent to extent module. 2020-04-10 13:12:47 -07:00
David Goldblatt
70d12ffa05 PA: Move mapped into pa stats. 2020-04-10 13:12:47 -07:00
David Goldblatt
6ca918d0cf PA: Add a stats comment. 2020-04-10 13:12:47 -07:00
David Goldblatt
ce8c0d6c09 PA: Move in arena extent_sn counter.
Just another step towards making PA self-contained.
2020-04-10 13:12:47 -07:00
David Goldblatt
1ada4aef84 PA: Get rid of arena_ind_get calls.
This is another step on the path towards breaking the extent reliance on the
arena module.
2020-04-10 13:12:47 -07:00
David Goldblatt
1ad368c8b7 PA: Move in decay stats. 2020-04-10 13:12:47 -07:00
David Goldblatt
356aaa7dc6 Introduce lockedint module.
This pulls out the various abstractions where some stats counter is sometimes an
atomic, sometimes a plain variable, sometimes always protected by a lock,
sometimes protected by reads but not writes, etc.  With this change, these cases
are treated consistently, and access patterns tagged.

In the process, we fix a few missed-update bugs (where one caller assumes
"protected-by-a-lock" semantics and another does not).
2020-04-10 13:12:47 -07:00
David Goldblatt
acd0bf6a26 PA: move in ecache_grow. 2020-04-10 13:12:47 -07:00
David Goldblatt
32cb7c2f0b PA: Add a stats type. 2020-04-10 13:12:47 -07:00
David Goldblatt
688fb3eb89 PA: Move in the arena edata_cache. 2020-04-10 13:12:47 -07:00
David Goldblatt
8433ad84ea PA: move in shard initialization. 2020-04-10 13:12:47 -07:00
David Goldblatt
a24faed569 PA: Move in the ecache_t objects. 2020-04-10 13:12:47 -07:00
David Goldblatt
585f925055 Move cache index randomization out of extent.
This is logically at a higher level of the stack; extent should just allocate
things at the page-level; it shouldn't care exactly why the callers wants a
given number of pages.
2020-04-10 13:12:47 -07:00
David Goldblatt
12be9f5727 Add a stub PA module -- a page allocator. 2020-04-10 13:12:47 -07:00
Yinan Zhang
c4e9ea8cc6 Get rid of locks in prof recent test 2020-04-07 17:22:24 -07:00
Yinan Zhang
2deabac079 Get rid of custom iterator for last-N records 2020-04-07 17:22:24 -07:00
Yinan Zhang
a5ddfa7d91 Use ql for prof last-N list 2020-04-07 17:22:24 -07:00
David Goldblatt
8da6676a02 Don't do reentrant testing in junk tests. 2020-04-07 15:45:40 -07:00
Yinan Zhang
ce17af4221 Better structure ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
4b66297ea0 Add move constructor to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
a62b7ed928 Add emptiness checking to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
1dd24ca6d2 Add rotate functionality to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
0dc95a882f Add concat and split functionality to ql module 2020-04-06 09:50:27 -07:00
Yinan Zhang
1ad06aa53b deduplicate insert and delete logic in qr module 2020-04-06 09:50:27 -07:00
Yinan Zhang
c9d56cddf2 Optimize meld in qr module
The goal of `qr_meld()` is to change the following four fields
`(a->prev, a->prev->next, b->prev, b->prev->next)` from the values
`(a->prev, a, b->prev, b)` to `(b->prev, b, a->prev, a)`.

This commit changes

```
a->prev->next = b;
b->prev->next = a;
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
```

to

```
temp = a->prev;
a->prev = b->prev;
b->prev = temp;
a->prev->next = a;
b->prev->next = b;
```

The benefit is that we can use `b->prev->next` for `temp`, and so
there's no need to pass in `a_type`.

The restriction is that `b` cannot be a `qr_next()` macro, so users
of `qr_meld()` must pay attention.  (Before this change, neither `a`
nor `b` could be a `qr_next()` macro.)
2020-04-06 09:50:27 -07:00
David Goldblatt
0d6d9e8586 configure.ac: Put public symbols on one line. 2020-04-02 13:27:29 -07:00
Yinan Zhang
f9aad7a49b Add piping API to buffered writer 2020-04-01 09:41:20 -07:00
Yinan Zhang
09cd79495f Encapsulate buffer allocation failure in buffered writer 2020-04-01 09:41:20 -07:00
Yinan Zhang
a166c20818 Make prof_tctx_t pointer a true prof atomic fence 2020-03-31 17:43:42 -07:00
David T. Goldblatt
d936b46d3a Add malloc_conf_2_conf_harder
This comes in handy when you're just a user of a canary system who wants to
change settings set by the configuration system itself.
2020-03-31 06:25:08 -07:00
David Goldblatt
3b4a03b92b Mac: don't declare system functions as nothrow.
This contradicts the system headers, which can lead to breakages.
2020-03-26 14:11:24 -07:00
Yinan Zhang
2256ef8961 Add option to fetch system thread name on each prof sample 2020-03-24 21:39:57 -07:00
Yinan Zhang
ccdc70a5ce Fix: assertion could abort on past failures 2020-03-18 20:48:26 -07:00
Yinan Zhang
b30a5c2f90 Reorganize cpp APIs and suppress unused function warnings 2020-03-13 12:16:09 -07:00
David Goldblatt
2e5899c129 Stats: Fix tcache_bytes reporting.
Previously, large allocations in tcaches would have their sizes reduced during
stats estimation.  Added a test, which fails before this change but passes now.

This fixes a bug introduced in 5934846612, which
was itself fixing a bug introduced in 9c0549007d.
2020-03-13 07:53:34 -07:00
Yinan Zhang
a5780598b3 Remove thread_event_rollback() 2020-03-12 13:55:00 -07:00
Yinan Zhang
ba783b3a0f Remove prof -> thread_event dependency 2020-03-12 13:55:00 -07:00
Yinan Zhang
441d88d1c7 Rewrite profiling thread event 2020-03-12 13:55:00 -07:00
David Goldblatt
0dcd576600 Edata cache: atomic fetch-add -> load-store.
The modifications to count are protected by a mutex; there's no need to use the
more costly version.
2020-03-12 11:58:09 -07:00
David Goldblatt
99b1291d17 Edata cache: add edata_cache_small_t.
This can be used to amortize the synchronization costs of edata_cache accesses.
2020-03-12 11:58:09 -07:00
David Goldblatt
734109d9c2 Edata cache: add a unit test. 2020-03-12 11:58:09 -07:00
David Goldblatt
e732344ef1 Inspect test: Reduce checks when profiling is on.
Profiled small allocations don't live in bins, which is contrary to the test
expectation.
2020-03-12 11:58:09 -07:00
David Goldblatt
92485032b2 Cache bin: improve comments. 2020-03-12 11:54:19 -07:00
David Goldblatt
d701a085c2 Fast path: allow low-water mark changes.
This lets us put more allocations on an "almost as fast" path after a flush.
This results in around a 4% reduction in malloc cycles in prod workloads
(corresponding to about a 0.1% reduction in overall cycles).
2020-03-12 11:54:19 -07:00
David Goldblatt
397da03865 Cache bin: rewrite to track more state.
With this, we track all of the empty, full, and low water states together.  This
simplifies a lot of the tracking logic, since we now don't need the
cache_bin_info_t for state queries (except for some debugging).
2020-03-12 11:54:19 -07:00
David Goldblatt
fef0b1ffe4 Cache bin: Remove last internals accesses. 2020-03-12 11:54:19 -07:00
David Goldblatt
0a2fcfac01 Tcache: Hold cache bin allocation explicitly. 2020-03-12 11:54:19 -07:00
David Goldblatt
d498a4bb08 Cache bin: Add an emptiness assertion. 2020-03-12 11:54:19 -07:00
David Goldblatt
6a7aa46ef7 Cache bin: Add a debug method for init checking. 2020-03-12 11:54:19 -07:00
David Goldblatt
370c1ea007 Cache bin: Write the unit test in terms of the API
I.e. stop allowing the unit test to have secret access to implementation
internals.
2020-03-12 11:54:19 -07:00
David Goldblatt
7f5ebd211c Cache bin: set low-water internally. 2020-03-12 11:54:19 -07:00
David Goldblatt
60113dfe3b Cache bin: Move in initialization code. 2020-03-12 11:54:19 -07:00
David Goldblatt
44529da852 Cache-bin: Make flush modifications internal
I.e. the tcache code just calls a cache-bin function to finish flush (and move
pointers around, etc.).  It doesn't directly access the cache-bin's owned memory
any more.
2020-03-12 11:54:19 -07:00
David Goldblatt
ff6acc6ed5 Cache bin: simplify names and argument ordering.
We always start with the cache bin, then its info (if necessary).
2020-03-12 11:54:19 -07:00
David Goldblatt
e1dcc557d6 Cache bin: Only take the relevant cache_bin_info_t
Previously, we took an array of cache_bin_info_ts and an index, and dereferenced
ourselves.  But infos for other cache_bins aren't relevant to any particular
cache bin, so that should be the caller's job.
2020-03-12 11:54:19 -07:00
David Goldblatt
1b00d808d7 cache_bin: Don't let arena see empty position. 2020-03-12 11:54:19 -07:00
David Goldblatt
d303f30796 cache_bin nflush -> n.
We're going to use it on the fill pathway as well.
2020-03-12 11:54:19 -07:00
David Goldblatt
74d36d78ef Cache bin: Make ncached_max a query on the info_t. 2020-03-12 11:54:19 -07:00
David Goldblatt
b66c0973cc cache_bin: Don't allow direct internals access. 2020-03-12 11:54:19 -07:00
David Goldblatt
da68f73296 Move percpu_arena_update.
It's not really part of the API of the arena; it changes which arena we're using
that API on.
2020-03-12 11:54:19 -07:00
David Goldblatt
909c501b07 Cache_bin: Shouldn't know about tcache.
Instead, have it take the cache_bin_info_ts to use by pointer.  While we're
here, add a src file for the cache bin.
2020-03-12 11:54:19 -07:00
David Goldblatt
79f1ee2fc0 Move junking out of arena/tcache code.
This is debug only and we keep it off the fast path.  Moving it here simplifies
the internal logic.

This never tries to junk on regions that were shrunk via xallocx.  I think this
is fine for two reasons:
- The shrunk-with-xallocx case is rare.
- We don't always do that anyway before this diff (it depends on the opt
  settings and extent hooks in effect).
2020-03-12 11:54:19 -07:00
David Goldblatt
b428dceeaf Config: Warn on void * pointer arithmetic.
This is handy while developing, but not portable.
2020-03-12 11:54:19 -07:00
David Goldblatt
22657a5e65 Extents: Silence the "potentially unused" warning. 2020-03-12 11:54:19 -07:00
Yinan Zhang
4a78c6d81b Correct thread event unit test 2020-03-10 09:31:55 -07:00
Yinan Zhang
305b1f6d96 Correction on geometric sampling 2020-03-04 13:55:21 -08:00
David T. Goldblatt
6c3491ad31 Tcache: Unify bin flush logic.
The small and large pathways share most of their logic, even if some of the
individual operations are different.  We pull out the common logic into a
force-inlined function, and then specialize twice, once for each value of
"small".
2020-02-25 10:21:03 -08:00
David T. Goldblatt
9f4fc27389 Ehooks: Fix a build warning.
We wrote `return some_void_func()` in a function returning void, which is
confusing and triggers warnings on MSVC.
2020-02-25 10:21:03 -08:00
Qi Wang
bc31041edb Cirrus-CI: test on new freebsd releases. 2020-02-23 20:43:38 -08:00
Yinan Zhang
51bd147422 Make use of assert_* in test/unit/thread_event.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
9d2cc3b0fa Make use of assert_* in test/unit/prof_recent.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
a88d22ea11 Make use of assert_* in test/unit/inspect.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
0ceb31184d Make use of assert_* in test/unit/buf_writer.c 2020-02-19 16:03:16 -08:00
Yinan Zhang
fa61579382 Add assert_* functionality to tests 2020-02-19 16:03:16 -08:00
Yinan Zhang
21dfa4300d Change assert_* to expect_* in tests
```
grep -Irl assert_ test/ | xargs sed -i \
    's/witness_assert/witness_do_not_replace/g';
grep -Irl assert_ test/ | xargs sed -i \
    's/malloc_mutex_assert_owner/malloc_mutex_do_not_replace_owner/g';

grep -Ir assert_ test/ | grep -o "[_a-zA-Z]*assert_[_a-zA-Z]*" | \
    grep -v "^assert_"; # confirm no output
grep -Irl assert_ test/ | xargs sed -i 's/assert_/expect_/g';

grep -Irl witness_do_not_replace test/ | xargs sed -i \
    's/witness_do_not_replace/witness_assert/g';
grep -Irl malloc_mutex_do_not_replace_owner test/ | xargs sed -i \
    's/malloc_mutex_do_not_replace_owner/malloc_mutex_assert_owner/g';
```
2020-02-19 16:03:16 -08:00
David T. Goldblatt
162c2bcf31 Background thread: take base as a parameter. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
29436fa056 Break prof and tcache knowledge of b0. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
a0c1f4ac57 Rtree: take the base allocator as a parameter.
This facilitates better testing by avoiding mixing of the "real" base with the
base used by the rtree under test.
2020-02-18 11:22:09 -08:00
David T. Goldblatt
7013716aaa Emap: Take (and propagate) a zeroed parameter.
Rtree needs this, and we should really treat them similarly.
2020-02-18 11:22:09 -08:00
David T. Goldblatt
182192f83c Base: Pull into a single header. 2020-02-18 11:22:09 -08:00
David T. Goldblatt
34b7165fde Put szind_t, pszind_t in sz.h. 2020-02-18 11:22:09 -08:00
David Goldblatt
7e6c8a7286 Emap: Standardize naming.
Namespace everything under emap_, always specify what it is we're looking up
(emap_lookup -> emap_edata_lookup), and use "ctx" over "info".
2020-02-17 10:50:51 -08:00
David Goldblatt
ac50c1e44b Emap: Remove direct access to emap internals.
In the process, we do a few local cleanups and optimizations.  In particular,
the size safety check on tcache flush no longer does a redundant load.
2020-02-17 10:50:51 -08:00
David Goldblatt
06e42090f7 Make jemalloc.c use the emap interface.
While we're here, we'll also clean up some style nits.
2020-02-17 10:50:51 -08:00
David Goldblatt
f7d9c6c42d Emap: Move in alloc_ctx lookup functionality. 2020-02-17 10:50:51 -08:00
David Goldblatt
65a54d7714 Emap: Move in szind and slab modifications. 2020-02-17 10:50:51 -08:00
David Goldblatt
9b5d105fc3 Emap: Move in iealloc.
This is logically scoped to the emap.
2020-02-17 10:50:51 -08:00
David Goldblatt
1d449bd9a6 Emap: Internal rtree context setting.
The only time sharing an rtree context saves across extent operations isn't a
no-op is when tsd is unavailable.  But this happens only in situations like
thread death or initialization, and we don't care about shaving off every
possible cycle in such scenarios.
2020-02-17 10:50:51 -08:00
David Goldblatt
08eb1e6c31 Emap: Comments and cleanup
Document some of the public interface, and hide the functions that are no longer
used outside of the emap module.
2020-02-17 10:50:51 -08:00
David Goldblatt
231d1477e5 Rename emap_split_prepare_t -> emap_prepare_t.
Both the split and merge functions use it.
2020-02-17 10:50:51 -08:00
David Goldblatt
0586a56f39 Emap: Move in merge functionality. 2020-02-17 10:50:51 -08:00
David Goldblatt
040eac77cc Tell edatas their creation arena immediately.
This avoids having to pass it in anywhere else.
2020-02-17 10:50:51 -08:00
David Goldblatt
7c7b702064 Emap: Move over metadata splitting logic. 2020-02-17 10:50:51 -08:00
David Goldblatt
44f5f53605 Emap: Move over deregistration functions. 2020-02-17 10:50:51 -08:00
David Goldblatt
6513d9d923 Emap: Move over deregistration boundary functions. 2020-02-17 10:50:51 -08:00
David Goldblatt
9b5ca0b09d Emap: Move in slab interior registration. 2020-02-17 10:50:51 -08:00
David Goldblatt
d05b61db4a Emap: Move extent boundary registration in. 2020-02-17 10:50:51 -08:00
David Goldblatt
ca21ce4071 Emap: Move in write_acquired from extent. 2020-02-17 10:50:51 -08:00
David Goldblatt
01f255161c Add emap, for tracking extent locking. 2020-02-17 10:50:51 -08:00
Qi Wang
0f686e82a3 Avoid variable length array with length 0. 2020-02-16 14:14:07 -08:00
Yinan Zhang
68e8ddcaff Add mallctl for dumping last-N profiling records 2020-02-14 12:46:38 -08:00
Yinan Zhang
bc05ecebf6 Add const qualifier in assert_cmp() 2020-02-14 12:46:38 -08:00
Qi Wang
ba0e35411c Rework the bin locking around tcache refill / flush.
Previously, tcache fill/flush (as well as small alloc/dalloc on the arena) may
potentially drop the bin lock for slab_alloc and slab_dalloc.  This commit
refactors the logic so that the slab calls happen in the same function / level
as the bin lock / unlock.  The main purpose is to be able to use flat combining
without having to keep track of stack state.

In the meantime, this change reduces the locking, especially for slab_dalloc
calls, where nothing happens after the call.
2020-02-13 23:31:54 -08:00
Kamil Rytarowski
7fd22f7b2e Fix Undefined Behavior in hash.h
hash.h:200:27, left shift of 250 by 24 places cannot be represented in type 'int'
2020-02-13 12:25:26 -08:00
Qi Wang
ca1f082251 Disallow merge across mmap regions to preserve SN / first-fit.
Check the is_head state before merging two extents.  Disallow the merge if it's
crossing two separate mmap regions.  This enforces first-fit (by not losing the
SN) at a very small cost.
2020-02-13 12:18:44 -08:00
Yinan Zhang
7014f81e17 Add ASSURED_WRITE in mallctl 2020-02-05 15:29:14 -08:00
Yinan Zhang
2476889195 Add inspect.c to MSVC filters 2020-02-05 10:01:49 -08:00
Yinan Zhang
9cac3fa8f5 Encapsulate buffer allocation in buffered writer 2020-02-04 13:21:58 -08:00
Yinan Zhang
bdc08b5158 Better naming buffered writer 2020-02-04 13:21:58 -08:00
Qi Wang
c6bfe55857 Update the tsd description. 2020-02-04 13:07:05 -08:00
Qi Wang
e896522616 Abbreviate thread-event to te. 2020-02-04 13:07:05 -08:00
Qi Wang
5e500523a0 Remove thread_event_boot(). 2020-02-04 00:18:15 -08:00
Qi Wang
97dd79db6c Implement deallocation events.
Make the event module to accept two event types, and pass around the event
context.  Use bytes-based events to trigger tcache GC on deallocation, and get
rid of the tcache ticker.
2020-02-04 00:18:15 -08:00
zoulasc
536ea6858e NetBSD specific changes:
- NetBSD overcommits
- When mapping pages, use the maximum of the alignment requested and the
  compiled-in PAGE constant which might be greater than the current kernel
  pagesize, since we compile binaries with the maximum page size supported
  by the architecture (so that they work with all kernels).
2020-02-03 15:49:36 -08:00
Qi Wang
974222c626 Add safety check on sdallocx slow / sampled path. 2020-01-31 00:04:22 -08:00
Qi Wang
88d9eca848 Enforce page alignment for sampled allocations.
This allows sampled allocations to be checked through alignment, therefore
enable sized deallocation regardless of cache_oblivious.
2020-01-31 00:04:22 -08:00
Qi Wang
0f552ed673 Don't purge huge extents when decay is off. 2020-01-30 14:40:38 -08:00
Qi Wang
38a48e5741 Set reentrancy to 1 for tsd_state_purgatory.
Reentrancy is already set for other non-nominal tsd states (reincarnated and
minimal_initialized).  Add purgatory to be safe and consistent.
2020-01-30 13:55:20 -08:00
Qi Wang
88b0e03a4e Implement opt.stats_interval and the _opts options.
Add options stats_interval and stats_interval_opts to allow interval based stats
printing.  This provides an easy way to collect stats without code changes,
because opt.stats_print may not work (some binaries never exit).
2020-01-29 09:57:55 -08:00
Qi Wang
d71a145ec1 Chagne prof_accum_t to counter_accum_t for general purpose. 2020-01-29 09:57:55 -08:00
Qi Wang
ea351a7b52 Fix syntax errors in doc for thread.idle. 2020-01-23 16:41:53 -08:00
David Goldblatt
d92f0175c7 Introduce NEITHER_READ_NOR_WRITE in ctl.
This is slightly clearer in meaning.  A function that is both READONLY() and
WRITEONLY() is in fact neither one.
2020-01-22 18:29:13 -08:00
David Goldblatt
6a622867ca Add "thread.idle" mallctl.
This can encapsulate various internal cleaning logic, and can be used to free up
resources before a long sleep.
2020-01-22 18:29:13 -08:00
Yinan Zhang
f81341a48b Fallback to unbuffered printing if OOM 2020-01-21 17:09:44 -08:00
Yinan Zhang
cd6e908241 Add stress test for last-N profiling mode 2020-01-21 16:51:26 -08:00
Yinan Zhang
84b28c6a13 Properly handle tdata deletion race 2020-01-21 16:51:26 -08:00
Yinan Zhang
d331208560 Get rid of redundant logic in prof 2020-01-21 16:51:26 -08:00
Yinan Zhang
a72ea0db60 Restructure and correct sleep utility for testing 2020-01-21 16:51:26 -08:00
Yinan Zhang
7b67ed0b5a Get rid of lock overlap in prof_recent_alloc_reset 2020-01-21 16:51:26 -08:00
David Goldblatt
bd3be8e0b1 Remove commit parameter to ecache functions.
No caller ever wants uncommitted memory.
2020-01-17 10:54:56 -08:00
Yinan Zhang
b8df719d5c No tdata creation for backtracing on dying thread 2020-01-16 21:54:14 -08:00
Qi Wang
dab81bd315 Rework and fix the assertions on malloc fastpath.
The first half of the malloc fastpath may execute before malloc_init.  Make the
assertions work in that case.
2020-01-14 15:00:41 -08:00
Yinan Zhang
ad3f3fc561 Fetch time after tctx and only for samples 2020-01-14 14:36:20 -08:00
Qi Wang
a5d3dd4059 Fix an assertion on extent head state with dss. 2020-01-10 13:29:14 -08:00
Yinan Zhang
2b604a3016 Record request size in prof recent entries 2020-01-10 12:01:01 -08:00
Yinan Zhang
40a391408c Define constructor for buffered writer argument 2020-01-10 11:59:02 -08:00
Yinan Zhang
6d8e616902 Make buffered writer an independent module 2020-01-10 11:59:02 -08:00
Yinan Zhang
6b6b4709b3 Unify buffered writer naming 2020-01-09 14:31:31 -08:00
Yinan Zhang
9a60cf54ec Last-N profiling mode 2019-12-30 15:58:57 -08:00
Yinan Zhang
7a27a05940 Delete tdata states used for cleanup 2019-12-30 15:58:57 -08:00
Yinan Zhang
e98ddf7987 Fix unlikely condition in arena_prof_info_get() 2019-12-30 15:58:57 -08:00
Yinan Zhang
3fa142cf39 Remove _externs from prof internal header names 2019-12-23 11:14:15 -08:00
Yinan Zhang
112dc36dd5 Handle log_mtx during forking 2019-12-20 17:17:48 -08:00
Yinan Zhang
ea42174d07 Refactor profiling headers 2019-12-20 17:17:48 -08:00
David Goldblatt
6342da0970 Ehooks: Further optimize default merge case.
This avoids the cost of an iealloc in cases where the user uses the default
merge hook without using the default extent hooks.
2019-12-20 10:18:40 -08:00
David Goldblatt
f2f2084e79 Ehooks: Assert alloc isn't NULL 2019-12-20 10:18:40 -08:00
David Goldblatt
e210ccc57e Move extent2 -> extent.
Eventually, we may fully break off the extent module; but not for some time.  If
it's going to live on in a non-transitory state, it might as well have the nicer
name.
2019-12-20 10:18:40 -08:00
David Goldblatt
2f4fa80414 Rename extents -> ecache. 2019-12-20 10:18:40 -08:00
David Goldblatt
56cc56b692 Break extent split dependence on arena. 2019-12-20 10:18:40 -08:00
David Goldblatt
0aa9769fb0 Break commit functions' arena dependence 2019-12-20 10:18:40 -08:00
David Goldblatt
48ec5d4355 Break extent_coalesce arena dependence 2019-12-20 10:18:40 -08:00
David Goldblatt
282a382326 Extent: Break [de]activation's arena dependence. 2019-12-20 10:18:40 -08:00
David Goldblatt
576d7047ab Ecache: Should know its arena_ind.
What we call an arena_ind is really the index associated with some particular
set of ehooks; the arena is just the user-visible portion of that.  Making this
explicit, and reframing checks in terms of that, makes the code simpler and
cleaner, and helps us avoid passing the arena itself all throughout extent code.

This lets us put back an arena-specific assert.
2019-12-20 10:18:40 -08:00
David Goldblatt
372042a082 Remove merge dependence on the arena. 2019-12-20 10:18:40 -08:00
David Goldblatt
439219be7e Remove extent_can_coalesce arena dependency. 2019-12-20 10:18:40 -08:00
David Goldblatt
9cad5639ff Ehooks: remove arena_ind parameter.
This lives within the ehooks_t now, so that callers don't need to know it.
2019-12-20 10:18:40 -08:00
David Goldblatt
57fe99d4be Move relevant index into the ehooks_t itself.
It's always passed into the ehooks; keeping it colocated lets us avoid passing
the arena everywhere.
2019-12-20 10:18:40 -08:00
David Goldblatt
c792f3e4ab edata_cache: Remember the associated base_t.
This will save us some trouble down the line when we stop passing arena pointers
everywhere; we won't have to pass around a base_t pointer either.
2019-12-20 10:18:40 -08:00
David Goldblatt
ae23e5f426 Unify extent_alloc_wrapper with the other wrappers.
Previously, it was really more like extents_alloc (it looks in an ecache for an
extent to reuse as its primary allocation pathway).  Make that pathway more
explciitly like extents_alloc, and rename extent_alloc_wrapper_hard accordingly.
2019-12-20 10:18:40 -08:00
David Goldblatt
d8b0b66c6c Put extent_state_t into ecache as well as eset. 2019-12-20 10:18:40 -08:00
David Goldblatt
98eb40e563 Move delay_coalesce from the eset to the ecache. 2019-12-20 10:18:40 -08:00
David Goldblatt
bb70df8e5b Extent refactor: Introduce ecache module.
This will eventually completely wrap the eset, and handle concurrency,
allocation, and deallocation.  For now, we only pull out the mutex from the
eset.
2019-12-20 10:18:40 -08:00
David Goldblatt
0704516245 Ehooks: Add head tracking. 2019-12-20 10:18:40 -08:00
David Goldblatt
09475bf8ac extent_may_dalloc -> ehooks_dalloc_will_fail 2019-12-20 10:18:40 -08:00
David Goldblatt
7859184179 Pull out edata_t caching into its own module. 2019-12-20 10:18:40 -08:00
David Goldblatt
a7862df616 Rename extent_t to edata_t.
This frees us up from the unfortunate extent/extent2 naming collision.
2019-12-20 10:18:40 -08:00
David Goldblatt
865debda22 Rename extent.h -> edata.h.
This name is slightly pithier; a full-on rename will come shortly.
2019-12-20 10:18:40 -08:00
David Goldblatt
a738a66b5c Ehooks: Add some debug zero and addr checks.
These help make sure that the ehooks return properly zeroed memory when required
to.
2019-12-20 10:18:40 -08:00
David Goldblatt
4b2e5ee8b9 Ehooks: Add a "zero" ehook.
This is the first API expansion.  It lets the hooks pick where and how to purge
within themselves.
2019-12-20 10:18:40 -08:00
David Goldblatt
d0f187ad3b Arena: Loosen arena_may_have_muzzy restrictions.
If there are custom extent hooks, pages_can_purge_lazy is not necessarily the
right guard.  We could check ehooks_are_default too, but the case where
purge_lazy is unsupported is rare and getting rarer.  Just checking the decay
interval captures most of the benefit.
2019-12-20 10:18:40 -08:00
David Goldblatt
ebbb973271 Base: Remove some unnecessary reentrancy guards.
The ehooks module will now call these if necessary.
2019-12-20 10:18:40 -08:00
David Goldblatt
403f2d1664 Extents: Split out introspection functionality.
This isn't really part of the core extent allocation facilities.  Especially as
this module grows, having it in its own place may come in handy.
2019-12-20 10:18:40 -08:00
David Goldblatt
92a511d385 Make extent module hermetic.
In the form of extent2.h.  The naming leaves something to be desired, but I'll
leave that for a later diff.
2019-12-20 10:18:40 -08:00
David Goldblatt
e08c581cf1 Extent: Get rid of extent-specific pre/post reentrancy calls.
These are taken care of by the ehook module; the extra increments and
decrements are safe but unnecessary.
2019-12-20 10:18:40 -08:00
David Goldblatt
39fdc690a0 Ehooks comments and cleanup. 2019-12-20 10:18:40 -08:00
David Goldblatt
c8dae890c8 Extent -> Ehooks: Move over default hooks. 2019-12-20 10:18:40 -08:00
David Goldblatt
2fe5108263 Extent -> Ehooks: Move merge hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
1fff4d2ee3 Extent -> Ehooks: Move split hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
a5b42a1a10 Extent -> Ehooks: Move purge_forced hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
368baa42ef Extent -> Ehooks: Move purge_lazy hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
f83fdf5336 Extent: Clean up a comma 2019-12-20 10:18:40 -08:00
David Goldblatt
d78fe241ac Extent -> Ehooks: Move commit and decommit hooks. 2019-12-20 10:18:40 -08:00
David Goldblatt
5459ec9dae Extent -> Ehooks: Move destroy hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
bac8e2e5a6 Extent -> Ehooks: Move dalloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
dc8b4e6e13 Extent -> Ehooks: Move alloc hook. 2019-12-20 10:18:40 -08:00
David Goldblatt
703fbc0ff5 Introduce unsafe reentrancy guards.
We have to work to circumvent the safety checks in pre_reentrancy when going
down extent hook pathways.  Instead, let's explicitly have checked and unchecked
guards.
2019-12-20 10:18:40 -08:00
David Goldblatt
ae0d8e8591 Move extent ehook calls into ehooks 2019-12-20 10:18:40 -08:00
David Goldblatt
ba8b9ecbcb Add ehooks module 2019-12-20 10:18:40 -08:00
David Goldblatt
837119a948 base_structs.h: Remove some mid-line tabs. 2019-12-20 10:18:40 -08:00
David Goldblatt
9f6eb09585 Extents: Eagerly initialize extent hooks.
When deferred initialization was added, initializing required copying
sizeof(extent_hooks_t) bytes after a pointer chase. Today, it's just a single
pointer loaded from the base_t. In subsequent diffs, we'll get rid of even that.
2019-12-20 10:18:40 -08:00
David Goldblatt
4278f84603 Move extent hook getters/setters to arena.c
This is where they're logically scoped; they access arena data.
2019-12-20 10:18:40 -08:00
Wenbo Zhang
9226e1f0d8 fix opt.thp:never still use THP with base_new 2019-12-19 13:27:00 -08:00
Qi Wang
d5031ea824 Allow dallocx and sdallocx after tsd destruction.
After a thread turns into purgatory / reincarnated state, still allow dallocx
and sdallocx to function normally.
2019-12-19 11:17:03 -08:00
Yinan Zhang
4afd709d1f Restructure setters for profiling info
Explicitly define three setters:

- `prof_tctx_reset()`: set `prof_tctx` to `1U`, if we don't know in
advance whether the allocation is large or not;
- `prof_tctx_reset_sampled()`: set `prof_tctx` to `1U`, if we already
know in advance that the allocation is large;
- `prof_info_set()`: set a real `prof_tctx`, and also set other
profiling info e.g. the allocation time.

Code structure wise, the prof level is kept as a thin wrapper, the
large level only provides low level setter APIs, and the arena level
carries out the main logic.
2019-12-17 10:01:28 -08:00
Yinan Zhang
1d01e4c770 Initialization utilities for nstime 2019-12-16 16:08:56 -08:00
Qi Wang
dd649c9485 Optimize away the tsd_fast() check on fastpath.
Fold the tsd_state check onto the event threshold check.  The fast threshold is
set to 0 when tsd switch to non-nominal.

The fast_threshold can be reset by remote threads, to refect the non nominal tsd
state change.
2019-12-11 23:44:20 -08:00
Qi Wang
1decf958d1 Fix incorrect usage of cassert. 2019-12-11 14:02:59 -08:00
Yinan Zhang
45836d7fd3 Pass nstime_t pointer for profiling 2019-12-11 11:38:16 -08:00
Yinan Zhang
7d2bac5a38 Refactor destroy code path for prof_tctx 2019-12-10 16:31:05 -08:00
Yinan Zhang
055478cca8 Threshold is no longer updated before prof_realloc() 2019-12-10 16:31:05 -08:00
Yinan Zhang
7e3671911f Get rid of old indentation style for prof 2019-12-06 09:47:51 -08:00
Yinan Zhang
dfdd46f6c1 Refactor prof_tctx_t creation 2019-12-06 09:47:51 -08:00
Yinan Zhang
aa1d71fb7a Rename prof_tctx to alloc_tctx in prof_info_t 2019-12-06 09:47:51 -08:00
Yinan Zhang
5e0b090992 No need to pass usize to prof_tctx_set() 2019-12-06 09:47:51 -08:00
David Goldblatt
1b1e76acfe Disable some spuriously-triggering warnings 2019-12-04 13:45:17 -08:00
Li-Wen Hsu
a70909b130 Test on all supported release of FreeBSD
Keep 11.2 because 11.3 is temporarily not available for now.
2019-12-02 13:18:12 -08:00
Yinan Zhang
5c47a30227 Guard C++ aligned APIs 2019-11-25 18:02:16 -08:00
Yinan Zhang
6945371778 Change tsdn to tsd for profiling code path 2019-11-22 16:31:56 -08:00
Yinan Zhang
b55419f9b9 Restructure profiling
Develop new data structure and code logic for holding profiling
related information stored in the extent that may be needed after the
extent is released, which in particular is the case for the
reallocation code path (e.g. in `rallocx()` and `xallocx()`).  The
data structure is a generalization of `prof_tctx_t`: we previously
only copy out the `prof_tctx` before the extent is released, but we
may be in need of additional fields. Currently the only additional
field is the allocation time field, but there may be more fields in
the future.

The restructuring also resolved a bug: `prof_realloc()` mistakenly
passed the new `ptr` to `prof_free_sampled_object()`, but passing in
the `old_ptr` would crash because it's already been released.  Now
the essential profiling information is collectively copied out early
and safely passed to `prof_free_sampled_object()` after the extent is
released.
2019-11-22 16:31:56 -08:00
Mark Santaniello
8b2c2a596d Support C++17 over-aligned allocation
Summary:
Add support for C++17 over-aligned allocation:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0035r4.html

Supporting all 10 operators means we avoid thunking thru libstdc++-v3/libsupc++ and just call jemalloc directly.

It's also worth noting that there is now an aligned *and sized* operator delete:
```
void operator delete(void* ptr, std::size_t size, std::align_val_t al) noexcept;
```

If JeMalloc did not provide this, the default implementation would ignore the size parameter entirely:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/libsupc%2B%2B/del_opsa.cc#L30-L33

(I must also update ax_cxx_compile_stdcxx.m4 to a newer version with C++17 support.)

Test Plan:
Wrote a simple test that allocates and then deletes an over-aligned type:
```
struct alignas(32) Foo {};
Foo *f;

int main()
{
  f = new Foo;
  delete f;
}
```

Before this change, both new and delete go thru PLT, and we end up calling regular old free:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x00000000004029b7 <+55>:    call   0x4022d0 <_ZnwmSt11align_val_t@plt>
...
   0x00000000004029d5 <+85>:    call   0x4022e0 <_ZdlPvmSt11align_val_t@plt>
...
(gdb) s
free (ptr=0x7ffff6408020) at /home/engshare/third-party2/jemalloc/master/src/jemalloc.git-trunk/src/jemalloc.c:2842
2842            if (!free_fastpath(ptr, 0, false)) {
```

After this change, we directly call new/delete and ultimately call sdallocx:
```
(gdb) disassemble
Dump of assembler code for function main():
...
   0x0000000000402b77 <+55>:    call   0x496ca0 <operator new(unsigned long, std::align_val_t)>
...
   0x0000000000402b95 <+85>:    call   0x496e60 <operator delete(void*, unsigned long, std::align_val_t)>
...
(gdb) s
116             je_sdallocx_noflags(ptr, size);
```
2019-11-22 10:14:16 -08:00
Qi Wang
9a3c738009 Refactor arena_bin_malloc_hard(). 2019-11-21 11:41:26 -08:00
Qi Wang
9a7ae3c97f Reduce footprint of bin_t.
Avoid storing mutex_prof_data_t in bin_t.  Added bin_stats_data_t which is used
for reporting bin stats.
2019-11-21 11:08:36 -08:00
Qi Wang
cb1a1f4ada Remove the unnecessary alloc_ctx on free_fastpath. 2019-11-16 13:41:13 -08:00
Qi Wang
7160617107 Add branch hints to free_fastpath.
Explicityly mark the non-slab case unlikely.  Previously there were jumps in the
common case.
2019-11-16 13:41:13 -08:00
Qi Wang
a787d2f5b3 Prefer getaffinity() to detect number of CPUs. 2019-11-15 16:24:38 -08:00
Qi Wang
04cb7d4d6b Bail out early for muzzy decay.
This avoids taking the muzzy decay mutex with the default setting.
2019-11-15 16:24:15 -08:00
Yinan Zhang
73510dfd15 Revert "Fix bug in prof_realloc"
This reverts commit 3b5eecf102.
2019-11-15 15:13:39 -08:00
Yinan Zhang
3b5eecf102 Fix bug in prof_realloc
We should pass in `old_ptr` rather than the new `ptr` to
`prof_free_sampled_object()` when `old_ptr` points to a sampled
allocation.
2019-11-15 13:28:33 -08:00
Qi Wang
e4c36a6f30 Emphasize no modification through thread.allocatedp allowed. 2019-11-13 09:12:08 -08:00
Leonardo Santagada
c462753cc8 Use __forceinline for JEMALLOC_ALWAYS_INLINE on msvc 2019-11-12 13:50:25 -08:00
Qi Wang
836d7a7e69 Check for large size first in the uncommon case of malloc.
Larger sizes are not that uncommon comparing to !tsd_fast.
2019-11-11 13:30:20 -08:00
Qi Wang
9c59abe42a Fix a typo in Makefile. 2019-11-11 12:17:08 -08:00
Qi Wang
da50d8ce87 Refactor and optimize prof sampling initialization.
Makes the prof sample prng use the tsd prng_state.  This allows us to properly
initialize the sample interval event, without having to create tdata.  As a
result, tdata will be created on demand (when a thread reaches the sample
interval bytes allocated), instead of on the first allocation.
2019-11-11 10:35:37 -08:00
Qi Wang
bc774a3519 Rename tsd->offset_state to tsd->prng_state. 2019-11-11 10:35:37 -08:00
Qi Wang
19a51abf33 Avoid arena->offset_state when tsd not available for prng.
Use stack locals and remove the offset_state in arena.
2019-11-11 10:35:37 -08:00
Nick Desaulniers
d01b425e5d Add -Wimplicit-fallthrough checks if supported
Clang since r369414 (clang-10) can now check -Wimplicit-fallthrough for
C code, and use the GNU C style attribute to denote fallthrough.

Move the test from header only to autoconf. The previous test used
brittle version detection which did not work for newer clang that
supported this feature.

The attribute has to be its own statement, hence the added `;`. It also
can only precede case statements, so the final cases should be
explicitly terminated with break statements.

Fixes commit 3d29d11ac2 ("Clean compilation -Wextra")
Link: 1e0affb6e5
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
2019-11-08 13:03:03 -08:00
Yinan Zhang
a8b578d538 Remove mallctl test for zero_realloc 2019-11-05 10:09:18 -08:00
Yinan Zhang
43f0ce92d8 Define general purpose tsd_thread_event_init() 2019-11-04 16:07:56 -08:00
Yinan Zhang
97f93fa0f2 Pull tcache GC events into thread event handler 2019-11-04 16:07:56 -08:00
Yinan Zhang
198f02e797 Pull prof_accumbytes into thread event handler 2019-11-04 15:21:16 -08:00
Yinan Zhang
152c0ef954 Build a general purpose thread event handler 2019-11-04 11:15:50 -08:00
RingsC
6924f83cb2 use SYS_openat when available
some architecture like AArch64 may not have the open syscall, but have
openat syscall. so check and use SYS_openat if SYS_openat available if
SYS_open is not supported at init_thp_state.
2019-11-01 13:06:40 -07:00
David T. Goldblatt
de81a4eada Add stats counters for number of zero reallocs 2019-10-29 17:48:44 -07:00
David T. Goldblatt
9cfa805947 Realloc: Make behavior of realloc(ptr, 0) configurable. 2019-10-29 17:48:44 -07:00
David T. Goldblatt
ee961c2310 Merge realloc and rallocx pathways. 2019-10-29 17:48:44 -07:00
Yinan Zhang
bd6e28d6a3 Guard slabcur fetching in extent_util 2019-10-28 17:27:51 -07:00
Yinan Zhang
4786099a3a Increase column width for global malloc/free rate 2019-10-24 14:54:51 -07:00
Yinan Zhang
05681e387a Optimize cache_bin_alloc_easy for malloc fast path
`tcache_bin_info` is not accessed on malloc fast path but the
compiler reserves a register for it, as well as an additional
register for `tcache_bin_info[ind].stack_size`.  The optimization
gets rid of the need for the two registers.
2019-10-21 16:43:45 -07:00
Yinan Zhang
4fe50bc7d0 Fix amd64 MSVC warning 2019-10-18 10:16:29 -07:00
Yinan Zhang
4fbbc817c1 Simplify time setting and getting for prof log 2019-10-16 09:24:52 -07:00
Qi Wang
4094b7c03f Limit # of iters of test_bitmap_xfu.
Otherwise the test is too slow for higher page sizes such as 64k.
2019-10-09 11:15:37 -07:00
Yinan Zhang
66e07f986d Suppress tdata creation in reentrancy
This change suppresses tdata initialization and prof sample threshold
update in interrupting malloc calls.  Interrupting calls have no need
for tdata.  Delaying tdata creation aligns better with our lazy tdata
creation principle, and it also helps us gain control back from
interrupting calls more quickly and reduces any risk of delegating
tdata creation to an interrupting call.
2019-10-04 08:52:50 -07:00
Yinan Zhang
beb7c16e94 Guard prof_active reset by opt_prof
Set `prof_active` to read-only when `opt_prof` is turned off.
2019-10-02 11:42:53 -07:00
Gareth Lloyd
1df9dd3515 Fix je_ prefix issue in test 2019-09-24 11:24:57 -07:00
David T. Goldblatt
3d84bd57f4 Arena: Add helper function arena_get_from_extent. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
c97d255752 Eset: Remove temporary declaration. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
ce5b128f10 Remove the undefined extent_size_quantize declarations. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
821dd53a1d Extent -> Eset: Rename arena members. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e144b21e4b Extent -> Eset: Move fork handling. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
77bbb35a92 Extent -> Eset: Move extent fit functions. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
1210af9a4e Extent -> Eset: Move insertion and removal. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
a42861540e Extents -> Eset: Convert some stats getters. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
820f070c6b Move page quantization to sz module. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
63d1b7a7a7 Extents -> Eset: move extents_state_get. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
b416b96a39 Extents -> Eset: rename/move extents_init. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
e6180fe1b4 Eset: Add a source file.
This will let us move extents_* functions over one by one.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
4e5e43f22e Rename extents_t -> eset_t. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
723ccc6c27 Extents: Split out extent struct. 2019-09-23 23:06:27 -07:00
David T. Goldblatt
41187bdfb0 Extents: Break extent-struct/arena interactions
Specifically, the extent_arena_[g|s]et functions and the address randomization.

These are the only things that tie the extent struct itself to the arena code.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
529cfe2abc Arena: rename arena_structs_b.h -> arena_structs.h
arena_structs_a.h was removed in the previous commit.
2019-09-23 23:06:27 -07:00
David T. Goldblatt
e7cf84a8dd Rearrange slab data and constants
The constants logically belong in the sc module. The slab data bitmap isn't
really scoped to an arena; move it to its own module.
2019-09-23 23:06:27 -07:00
Qi Wang
d1be488cd8 Add --with-lg-page=16 to CI. 2019-09-22 18:51:03 -07:00
Qi Wang
ac5185f73e Fix tcache bin stack alignment.
Set the proper alignment when allocating space for the tcache bin stack.
2019-09-13 12:32:29 -07:00
zhxchen17
b7c7df24ba Add max_per_bg_thd stats for per background thread mutexes.
Added a new stats row to aggregate the maximum value of mutex counters for each
background threads.  Given that the per bg thd mutex is not expected to be
contended, this counter is mainly for sanity check / debugging.
2019-09-13 09:23:57 -07:00
zhxchen17
4b76c684bb Add "prof.dump_prefix" to override filename prefixes for dumps. 2019-09-12 22:26:03 -07:00
zhxchen17
242af439b8 Rename "prof_dump_seq_mtx" to "prof_dump_filename_mtx". 2019-09-12 22:26:03 -07:00
Giridhar Prasath R
e06658cb24 check GNU make exists in path
Signed-off-by: Giridhar Prasath R <cristianoprasath@gmail.com>
2019-09-11 16:36:19 -07:00
Qi Wang
22bc75ee3e Workaround the stringop-overflow check false positives. 2019-09-09 11:35:04 -07:00
Yinan Zhang
93d6151800 Pass tsd down to prof_backtrace() 2019-09-05 10:57:43 -07:00
Yinan Zhang
671f120e26 Fix prof_backtrace() reentrancy level 2019-09-05 10:57:43 -07:00
Qi Wang
785b84e603 Make cache_bin_sz_t unsigned.
The bin size type was made signed only because the low_water could go -1, which
was already removed.
2019-09-04 13:37:07 -07:00
Qi Wang
23dc7a7fba Fix index type for cache_bin_alloc_easy. 2019-09-04 13:37:07 -07:00
Qi Wang
2abb02ecd7 Fix MSVC 2015 build, as proposed by @christianaguilera-foundry. 2019-08-28 23:37:24 -07:00
Qi Wang
719583f14a Fix large.nflushes in the merged stats. 2019-08-28 23:37:00 -07:00
Yinan Zhang
adce29c885 Optimize for prof_active off
Move the handling of `prof_active` off case completely to slow path,
so as to reduce register pressure on malloc fast path.
2019-08-27 14:48:56 -07:00
Yinan Zhang
49e6fbce78 Always adjust thread_(de)allocated 2019-08-26 11:56:41 -07:00
Yinan Zhang
57b81c078e Pull thread_(de)allocated out of config_stats 2019-08-26 11:56:41 -07:00
Yinan Zhang
9e031c1d11 Bug fix for prof_active switch
The bug is subtle but critical: if application performs the following
three actions in sequence: (a) turn `prof_active` off, (b) make at
least one allocation that triggers the malloc slow path via the
`if (unlikely(bytes_until_sample < 0))` path, and (c) turn
`prof_active` back on, then the application would never get another
sample (until a very very long time later).

The fix is to properly reset `bytes_until_sample` rather than
throwing it all the way to `SSIZE_MAX`.

A side minor change is to call `prof_active_get_unlocked()` rather
than directly grabbing the `prof_active` variable - it is the very
reason why we defined the `prof_active_get_unlocked()` function.
2019-08-22 13:00:10 -07:00
Qi Wang
0043e68d4c Track low_water == -1 case explicitly.
The -1 value of low_water indicates if the cache has been depleted and
refilled.  Track the status explicitly in the tcache struct.

This allows the fast path to check if (cur_ptr > low_water), instead of >=,
which avoids reaching slow path when the last item is allocated.
2019-08-21 16:00:38 -07:00
Qi Wang
937ca1db9f Store ncached_max * ptr_size in tcache_bin_info.
With the cache bin metadata switched to pointers, ncached_max is usually
accessed and timed by sizeof(ptr). Store the results in tcache_bin_info for
direct access, and add a helper function for the ncached_max value.
2019-08-19 12:23:24 -07:00
Qi Wang
7599c82d48 Redesign the cache bin metadata for fast path.
Implement the pointer-based metadata for tcache bins --
- 3 pointers are maintained to represent each bin;
- 2 of the pointers are compressed on 64-bit;
- is_full / is_empty done through pointer comparison;

Comparing to the previous counter based design --
- fast-path speed up ~15% in benchmarks
- direct pointer comparison and de-reference
- no need to access tcache_bin_info in common case
2019-08-19 12:21:44 -07:00
Qi Wang
d2dddfb82a Add hint in the bogus version string. 2019-08-16 16:08:18 -07:00
Qi Wang
d6b7995c16 Update INSTALL.md about the default doc build. 2019-08-16 10:03:34 -07:00
Qi Wang
e2c7584361 Simplify / refactor tcache_dalloc_large. 2019-08-14 13:08:23 -07:00
Qi Wang
9c5c2a2c86 Unify the signature of tcache_flush small and large. 2019-08-14 13:08:23 -07:00
Yinan Zhang
28ed9b9a51 Buffer stats printing
Without buffering `malloc_stats_print` would invoke the write back
call (which could mean an expensive `malloc_write_fd` call) for every
single `printf` (including printing each line break and each leading
tab/space for indentation).
2019-08-13 09:40:11 -07:00
Yinan Zhang
eb70fef8ca Make compact json format as default
Saves 20-50% of the output size.
2019-08-12 13:59:50 -07:00
Yinan Zhang
a219cfcda3 Clear tcache prof_accumbytes in tcache_flush_cache
`tcache->prof_accumbytes` should always be cleared after being
transferred to arena; otherwise the allocations would be double
counted, leading to excessive prof dumps.
2019-08-12 09:08:09 -07:00
Yinan Zhang
ad3f7dbfa0 Buffer prof_log_stop
Make use of the new buffered writer for the output of `prof_log_stop`.
2019-08-12 09:06:01 -07:00
Qi Wang
5934846612 Fix large bin index accessed through cache bin descriptor. 2019-08-11 16:31:12 -07:00
Qi Wang
22746d3c9f Properly dalloc prof nodes with idalloctm.
The prof_alloc_node is allocated through ialloc as internal.  Switch to
idalloctm with tcache and is_internal properly set.
2019-08-09 10:29:49 -07:00
Yinan Zhang
8c8466fa6e Add compact json option for emitter
JSON format is largely meant for machine-machine communication, so
adding the option to the emitter.  According to local testing, the
savings in terms of bytes outputted is around 50% for stats printing
and around 25% for prof log printing.
2019-08-09 09:53:41 -07:00
Yinan Zhang
7fc6b1b259 Add buffered writer
The buffered writer adopts a signature identical to `write_cb`,
so that it can be plugged into anywhere `write_cb` appears.
2019-08-09 09:44:29 -07:00
Yinan Zhang
39343555d6 Report stats for tdatas_mtx and prof_dump_mtx 2019-08-09 09:24:16 -07:00
Qi Wang
87e2400cbb Fix tcaches mutex pre- / post-fork handling. 2019-08-08 10:55:32 -07:00
Yinan Zhang
07ce2434bf Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-08-07 19:48:28 -07:00
Yinan Zhang
56126d0d2d Refactor prof log
Prof logging is conceptually seperate from core profiling, so
split it out as a module of its own.  There are a few internal
functions that had to be exposed but I think it is a fair trade-off.
2019-08-07 13:53:45 -07:00
Yinan Zhang
56c8ecffc1 Correct tsd layout graph
Augmented the tsd layout graph so that the two recently added fields,
`offset_state` and `bytes_until_sample`, are properly reflected.
As is shown, the cache footprint is 16 bytes larger than before.
2019-08-05 15:30:20 -07:00
Qi Wang
ea6b3e973b Merge branch 'dev' 2019-08-05 12:59:21 -07:00
Qi Wang
0cfa36a58a Update Changelog for 5.2.1. 2019-08-05 12:52:43 -07:00
Qi Wang
8a94ac25d5 Sanity check on prof dump buffer size. 2019-08-01 17:55:45 -07:00
Yinan Zhang
82b8aaaeb6 Quick fix for prof log printing
The emitter APIs used were incorrect, a side effect of which was
extra lines being printed.
2019-07-30 19:31:28 -07:00
Yinan Zhang
9344d25488 Workaround to address g++ unused variable warnings
g++ 5.5.0+ complained `parameter ‘expected’ set but not used
[-Werror=unused-but-set-parameter]` (despite that `expected` is in
fact used).
2019-07-30 11:37:56 -07:00
Qi Wang
c9cdc1b27f Limit to exact fit on Windows with retain off.
W/o retain, split and merge are disallowed on Windows.  Avoid doing first-fit
which needs splitting almost always.  Instead, try exact fit only and bail out
early.
2019-07-29 16:19:36 -07:00
Qi Wang
5742473cc8 Revert "Refactor prof log"
This reverts commit 7618b0b8e4.
2019-07-29 14:10:15 -07:00
Qi Wang
1a0503367b Revert "Refactor profiling"
This reverts commit 0b462407ae.
2019-07-29 14:10:15 -07:00
Yinan Zhang
0b462407ae Refactor profiling
Refactored core profiling codebase into two logical parts:

(a) `prof_data.c`: core internal data structure managing & dumping;
(b) `prof.c`: mutexes & outward-facing APIs.

Some internal functions had to be exposed out, but there are not
that many of them if the modularization is (hopefully) clean enough.
2019-07-29 13:55:00 -07:00
Yinan Zhang
7618b0b8e4 Refactor prof log
`prof.c` is growing too long, so trying to modularize it.  There are
a few internal functions that had to be exposed but I think it is a
fair trade-off.
2019-07-29 13:55:00 -07:00
Qi Wang
85f0cb2d0c Add indent to individual options for confirm_conf. 2019-07-25 17:00:31 -07:00
Qi Wang
9f6a9f4c1f Update manual for opt.retain (new default on Windows). 2019-07-25 15:25:58 -07:00
Qi Wang
10fcff6c38 Lower nthreads in test/unit/retained on 32-bit to avoid OOM. 2019-07-25 13:10:03 -07:00
Qi Wang
a3fa597921 Refactor arena_dalloc() / _sdalloc(). 2019-07-24 18:30:54 -07:00
Qi Wang
bc0998a905 Invoke arena_dalloc_promoted() properly w/o tcache.
When tcache was disabled, the dalloc promoted case was missing.
2019-07-24 18:30:54 -07:00
Qi Wang
1d148f353a Optimize max_active_fit in first_fit.
Stop scanning once reached the first max_active_fit size.
2019-07-24 11:28:45 -07:00
Qi Wang
4e36ce34c1 Track the leaked VM space via the abandoned_vm counter.
The counter is 0 unless metadata allocation failed (indicates OOM), and is
mainly for sanity checking.
2019-07-24 11:24:22 -07:00
Qi Wang
42807fcd9e extent_dalloc instead of leak when register fails.
extent_register may only fail if the underlying extent and region got stolen /
coalesced before we lock.  Avoid doing extent_leak (which purges the region)
since we don't really own the region.
2019-07-23 22:34:45 -07:00
Qi Wang
57dbab5d6b Avoid leaking extents / VM when split is not supported.
This can only happen on Windows and with opt.retain disabled (which isn't the
default).  The solution is suboptimal, however not a common case as retain is
the long term plan for all platforms anyway.
2019-07-23 22:18:55 -07:00
Qi Wang
badf8d95f1 Enable opt.retain by default on Windows. 2019-07-23 22:18:55 -07:00
Qi Wang
9a86c65abc Implement retain on Windows.
The VirtualAlloc and VirtualFree APIs are different because MEM_DECOMMIT cannot
be used across multiple VirtualAlloc regions.  To properly support decommit,
only allow merge / split within the same region -- this is done by tracking the
"is_head" state of extents and not merging cross-region.

Add a new state is_head (only relevant for retain && !maps_coalesce), which is
true for the first extent in each VirtualAlloc region.  Determine if two extents
can be merged based on the head state, and use serial numbers for sanity checks.
2019-07-23 22:18:55 -07:00
Qi Wang
f32f23d6cc Fix posix_memalign with input size 0.
Return a valid pointer instead of failed assertion.
2019-07-18 00:43:23 -07:00
Yinan Zhang
a2a693e722 Remove prof_accumbytes in arena
`prof_accumbytes` was supposed to be replaced by `prof_accum` in
https://github.com/jemalloc/jemalloc/pull/623.
2019-07-16 15:18:52 -07:00
Yinan Zhang
e0a0c8d4bf Fix a bug in prof_dump_write
The original logic can be disastrous if `PROF_DUMP_BUFSIZE` is less
than `slen` -- `prof_dump_buf_end + slen <= PROF_DUMP_BUFSIZE` would
always be `false`, so `memcpy` would always try to copy
`PROF_DUMP_BUFSIZE - prof_dump_buf_end` chars, which can be
dangerous: in the last round of the `while` loop it would not only
illegally read the memory beyond `s` (which might not always be
disastrous), but it would also illegally overwrite the memory beyond
`prof_dump_buf` (which can be pretty disastrous).  `slen` probably
has never gone beyond `PROF_DUMP_BUFSIZE` so we were just lucky.
2019-07-16 15:15:32 -07:00
Yinan Zhang
d26636d566 Fix logic in printing
`cbopaque` can now be overriden without overriding `write_cb` in
the first place.  (Otherwise there would be no need to have the
`cbopaque` parameter in `malloc_message`.)
2019-07-16 14:54:23 -07:00
Qi Wang
34e75630cc Reorder the configs for AppVeyor.
Enable-debug and 64-bit runs tend to be more relevant. 	Run them first.
2019-07-14 23:06:24 -07:00
Yinan Zhang
7720b6e385 Fix redzone setting and checking 2019-07-11 20:51:29 -07:00
frederik-h
40a3435b8d Add missing safety_check.c to MSBuild projects
The file is included in the list of source files in Makefile.in,
but it is missing from the project files. This causes the
build to fail due to unresolved symbols.
2019-05-24 09:00:19 -07:00
Qi Wang
1a71533511 Avoid blocking on background thread lock for stats.
Background threads may run for a long time, especially when the # of dirty pages
is high.  Avoid blocking stats calls because of this (which may cause latency
spikes).
2019-05-22 14:28:38 -07:00
Qi Wang
e13cf65a5f Add experimental.arenas.i.pactivep.
The new experimental mallctl exposes the arena pactive counter to applications,
which allows fast read w/o going through the mallctl / epoch steps.  This is
particularly useful when frequent balancing is required, e.g. when having
multiple manual arenas, and threads are multiplexed to them based on usage.
2019-05-22 14:27:58 -07:00
Yinan Zhang
c92ac30601 Add confirm_conf option
If the confirm_conf option is set, when the program starts, each of
the four malloc_conf strings will be printed, and each option will
be printed when being set.
2019-05-22 09:38:39 -07:00
Yinan Zhang
4c63b0e76a Improve memory utilization tests
Added tests for large size classes and expanded the tests to
cover wider range of allocation sizes.
2019-05-21 12:57:06 -07:00
Vaibhav Jain
2d6d099fed Fix GCC-9.1 warning with macro GET_ARG_NUMERIC
GCC-9.1 reports following error when trying to compile file
src/malloc_io.c and with CFLAGS='-Werror' :

src/malloc_io.c: In function ‘malloc_vsnprintf’:
src/malloc_io.c:369:2: error: case label value exceeds maximum value for type [-Werror]
  369 |  case '?' | 0x80:      \
      |  ^~~~
src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
  581 |     GET_ARG_NUMERIC(val, 'p');
      |     ^~~~~~~~~~~~~~~
...
<snip>
cc1: all warnings being treated as errors
make: *** [Makefile:388: src/malloc_io.sym.o] Error 1

The warning is reported as by default the type 'char' is 'signed char'
and or-ing 0x80 will turn the case label char negative which will be
beyond the printable ascii range (0 - 127).

The patch fixes this by explicitly casting the 'len' variable as
unsigned char' inside the 'switch' statement so that value of
expression " '?' | 0x80 " falls within the legal values of the
variable 'len'.
2019-05-21 11:20:07 -07:00
Qi Wang
07c44847c2 Track nfills and nflushes for arenas.i.small / large.
Small is added purely for convenience.  Large flushes wasn't tracked before and
can be useful in analysis.  Large fill simply reports nmalloc, since there is no
batch fill for large currently.
2019-05-15 10:05:09 -07:00
Yinan Zhang
13e88ae970 Fix assert in free fastpath
rtree_szind_slab_read_fast() may have not initialized
alloc_ctx.szind, unless after confirming the return is true.
2019-05-15 09:42:52 -07:00
Yinan Zhang
259b15dec5 Improve macro readability in malloc_conf_init
Define more readable macros than yes and no.
2019-05-08 14:15:03 -07:00
Dave Watson
5679751208 Remove best fit
This option saves a few CPU cycles, but potentially adds a lot of
fragmentation - so much so that there are workarounds like
max_active.  Instead, let's just drop it entirely.  It only made
a difference in one service I tested (.3% cpu regression), while
many services saw a memory win (also small, less than 1% mem P99)
2019-05-08 13:15:19 -07:00
Dave Watson
b62d126df8 Add max_active_fit to first_fit
The max_active_fit check is currently only on the best_fit
path, add it to the first_fit path also.
2019-05-08 13:15:19 -07:00
Doron Roberts-Kedes
7fc4f2a32c Add nonfull_slabs to bin_stats_t.
When config_stats is enabled track the size of bin->slabs_nonfull in
the new nonfull_slabs counter in bin_stats_t. This metric should be
useful for establishing an upper ceiling on the savings possible by
meshing.
2019-04-29 13:35:02 -07:00
Yinan Zhang
ae124b8684 Improve size class header
Mainly fixing typos.  The only non-trivial change is in the
computation for SC_NPSIZES, though the result wouldn't be any
different when SC_NGROUP = 4 as is always the case at the moment.
2019-04-24 10:45:12 -07:00
Fabrice Fontaine
702d76dbd0 configure.ac: Add an option to disable doc
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
2019-04-23 15:32:02 -07:00
498f47e1ec Fix typo derived from tcmalloc's pprof
The same pr is submitted into gperftools:

https://github.com/gperftools/gperftools/pull/1105
2019-04-23 15:29:57 -07:00
Qi Wang
1aabab5fdc Enforce TLS_MODEL attribute.
Caught by @zoulasc in #1460.  The attribute needs to be added in the headers as
well.
2019-04-16 11:07:15 -07:00
David Goldblatt
21cfe59ff7 Safety checks: Run tests by default 2019-04-15 16:48:12 -07:00
David Goldblatt
33e1dad680 Safety checks: Add a redzoning feature. 2019-04-15 16:48:12 -07:00
David Goldblatt
b92c9a1a81 Safety checks: Indirect through a function.
This will let us share code on failure pathways.pathways
2019-04-15 16:48:12 -07:00
David Goldblatt
f95a88fcd9 Safety checks: Expose config value via mallctl and stats. 2019-04-15 16:48:12 -07:00
David Goldblatt
f4d24f05e1 Move extra size checks behind a config flag.
This will let us turn that flag into a generic "turn on runtime checks" flag
that guards other functionality we have planned.
2019-04-15 16:48:12 -07:00
zoulasc
7f7935cf78 Add an autoconf feature test for format_arg and a jemalloc-specific
macro for it.
2019-04-15 15:14:46 -07:00
zoulasc
14e4176758 Fix incorrect macro use.
Compiling with warnings produces missing prototype warnings.
2019-04-15 15:14:46 -07:00
zoulasc
020b5dc7ac Convert the format generator function to an annotated format function,
so that the generated formats can be checked by the compiler.
2019-04-15 15:14:46 -07:00
Yinan Zhang
7ee3897740 Separate tests for extent utilization API
As title.
2019-04-10 13:03:20 -07:00
mgrice
d3d7a8ef09 remove compare and branch in fast path for c++ operator delete[]
Summary: sdallocx is checking a flag that will never be set (at least in the provided C++ destructor implementation).  This branch will probably only rarely be mispredicted however it removes two instructions in sdallocx and one at the callsite (to zero out flags).
2019-04-08 10:59:05 -07:00
Qi Wang
c2a3a7cd3f Fix test/unit/prof_log
Compiler optimizations may produce traces more than expected.  Instead verify
the lower bound only.
2019-04-05 13:47:10 -07:00
Qi Wang
93084cdc89 Ensure page alignment on extent_alloc.
This is discovered and suggested by @jasone in #1468.  When custom extent hooks
are in use, we should ensure page alignment on the extent alloc path, instead of
relying on the user hooks to do so.
2019-04-04 13:49:37 -07:00
Yinan Zhang
9aab3f2be0 Add memory utilization analytics to mallctl
The analytics tool is put under experimental.utilization namespace in
mallctl.  Input is one pointer or an array of pointers and the output
is a list of memory utilization statistics.
2019-04-04 13:48:39 -07:00
Qi Wang
b0b3e49a54 Merge branch 'dev' 2019-04-02 17:50:42 -07:00
Qi Wang
f7489dc8f1 Update Changelog for 5.2.0. 2019-04-02 17:40:42 -07:00
Qi Wang
978a7a21ae Use iallocztm instead of ialloc in prof_log functions.
Explicitly use iallocztm for internal allocations.  ialloc could trigger arena
creation, which may cause lock order reversal (narenas_mtx and log_mtx).
2019-04-02 16:53:00 -07:00
Qi Wang
6fe11633b0 Fix the binshard unit test.
The test attempts to trigger usage of multiple sharded bins, which percpu_arena
makes it less reliable.
2019-04-02 16:53:00 -07:00
Qi Wang
064d6e570e Tweak the wording about oversize_threshold. 2019-04-01 10:36:29 -07:00
Qi Wang
0101d5ebef Avoid check_min for opt_lg_extent_max_active_fit.
This fixes a compiler warning.
2019-03-29 15:56:53 -07:00
Qi Wang
59d9891948 Add the missing unlock in the error path of extent_register. 2019-03-29 15:56:53 -07:00
Qi Wang
ce03e4c7b8 Document opt.oversize_threshold. 2019-03-29 11:55:05 -07:00
Qi Wang
788a657cee Allow low values of oversize_threshold to disable the feature.
We should allow a way to easily disable the feature (e.g. not reserving the
arena id at all).
2019-03-29 11:33:00 -07:00
Qi Wang
a4d017f5e5 Output message before aborting on tcache size-matching check. 2019-03-29 11:33:00 -07:00
Qi Wang
fb56766ca9 Eagerly purge oversized merged extents.
This change improves memory usage slightly, at virtually no CPU cost.
2019-03-14 17:34:55 -07:00
Qi Wang
f6c30cbafa Remove some unused comments. 2019-03-14 17:34:55 -07:00
Qi Wang
b804d0f019 Fallback to 32-bit when 8-bit atomics are missing for TSD.
When it happens, this might cause a slowdown on the fast path operations.
However such case is very rare.
2019-03-09 12:52:06 -08:00
Qi Wang
06f0850427 Detect if 8-bit atomics are available.
In some rare cases (older compiler, e.g. gcc 4.2 w/ MIPS), 8-bit atomics might
be unavailable.  Detect such cases so that we can workaround.
2019-03-09 12:52:06 -08:00
Jason Evans
14d3686c9f Do not use #pragma GCC diagnostic with gcc < 4.6.
This regression was introduced by
3d29d11ac2 (Clean compilation -Wextra).
2019-03-09 12:10:30 -08:00
Qi Wang
ac24ffb21e Fix a syntax error in configure.ac
Introduced in e13400c919.
2019-03-04 10:50:17 -08:00
Jason Evans
775fe302a7 Remove JE_FORCE_SYNC_COMPARE_AND_SWAP_[48].
These macros have been unused since
d4ac7582f3 (Introduce a backport of C11
atomics).
2019-02-22 14:22:16 -08:00
Dave Rigby
cbdb1807ce Stringify tls_callback linker directive
Proposed fix for #1444 - ensure that `tls_callback` in the `#pragma comment(linker)`directive gets the same prefix added as it does i the C declaration.
2019-02-22 12:43:35 -08:00
Qi Wang
18450d0abe Guard libgcc unwind init with opt_prof.
Only triggers libgcc unwind init when prof is enabled.  This helps workaround
some bootstrapping issues.
2019-02-21 16:04:47 -08:00
Jason Evans
dca7060d5e Avoid redefining tsd_t.
This fixes a build failure when integrating with FreeBSD's libc.  This
regression was introduced by d1e11d48d4
(Move tsd link and in_hook after tcache.).
2019-02-20 20:27:55 -08:00
Qi Wang
9015deb126 Add build_doc by default.
However, skip building the docs (and output warnings) if XML support is missing.
This allows `make install` to succeed w/o `make dist`.
2019-02-08 14:13:20 -08:00
Qi Wang
23b15e764b Add --disable-libdl to travis. 2019-02-06 21:00:59 -08:00
Qi Wang
2db2d2ef5e Make background_thread not dependent on libdl.
When not using libdl, still allows background_thread to be enabled.
2019-02-06 21:00:59 -08:00
Qi Wang
1f55a15467 Add configure option --disable-libdl.
This makes it possible to build full static binary.
2019-02-06 21:00:59 -08:00
Qi Wang
8e9a613122 Disable muzzy decay by default. 2019-02-04 14:38:54 -08:00
Qi Wang
e13400c919 Sanity check szind on tcache flush.
This adds some overhead to the tcache flush path (which is one of the
popular paths).  Guard it behind a config option.
2019-02-01 12:31:34 -08:00
Qi Wang
b33eb26dee Tweak the spacing for the total_wait_time per second. 2019-01-28 15:37:19 -08:00
Qi Wang
374dc30d3d Update copyright dates. 2019-01-25 13:25:20 -08:00
Qi Wang
e3db480f6f Rename huge_threshold to oversize_threshold.
The keyword huge tend to remind people of huge pages which is not relevent to
the feature.
2019-01-25 13:15:45 -08:00
Qi Wang
350809dc5d Set huge_threshold to 8M by default.
This feature uses an dedicated arena to handle huge requests, which
significantly improves VM fragmentation.  In production workload we tested it
often reduces VM size by >30%.
2019-01-24 13:29:23 -08:00
Qi Wang
d3145014a0 Explicitly use arena 0 in alignment and OOM tests.
This helps us avoid issues with size based routing (i.e. the huge_threshold
feature).
2019-01-24 13:29:23 -08:00
Edward Tomasz Napierala
a7b0a124c3 Mention different mmap(2) behaviour with retain:true. 2019-01-23 18:34:59 -08:00
Qi Wang
522d1e7b4b Tweak the spacing for nrequests in stats output. 2019-01-23 17:42:12 -08:00
Qi Wang
8c9571376e Fix stats output (rate for total # of requests).
The rate calculation for the total row was missing.
2019-01-23 17:42:12 -08:00
Qi Wang
7a815c1b7c Un-experimental the huge_threshold feature. 2019-01-16 12:28:57 -08:00
Qi Wang
bbe8e6a909 Avoid creating bg thds for huge arena lone.
For low arena count settings, the huge threshold feature may trigger an unwanted
bg thd creation.  Given that the huge arena does eager purging by default,
bypass bg thd creation when initializing the huge arena.
2019-01-15 16:00:34 -08:00
Jason Evans
b6f1f2669a Revert "Customize cloning to include tags so that VERSION is valid."
This reverts commit 646af596d8.
2019-01-14 10:35:48 -08:00
Jason Evans
225d89998b Revert "Remove --branch=${CIRRUS_BASE_BRANCH} in git clone command."
This reverts commit fc13a7f1fa.
2019-01-14 10:35:48 -08:00
Qi Wang
f459454afe Avoid potential issues on extent zero-out.
When custom extent_hooks or transparent huge pages are in use, the purging
semantics may change, which means we may not get zeroed pages on repopulating.
Fixing the issue by manually memset for such cases.
2019-01-11 19:16:12 -08:00
Qi Wang
0ecd5addb1 Force purge on thread death only when w/o bg thds. 2019-01-11 19:15:34 -08:00
Jason Evans
fc13a7f1fa Remove --branch=${CIRRUS_BASE_BRANCH} in git clone command.
The --branch parameter is unnecessary, and may avoid problems when
testing directly on the dev branch.
2019-01-11 13:50:56 -08:00
Jason Evans
646af596d8 Customize cloning to include tags so that VERSION is valid. 2019-01-10 15:14:33 -08:00
Li-Wen Hsu
6910fcb208 Add Cirrus-CI config for FreeBSD builds 2019-01-10 15:14:33 -08:00
Faidon Liambotis
471191075d Replace -lpthread with -pthread
This automatically adds -latomic if and when needed, e.g. on riscv64
systems.

Fixes #1401.
2019-01-09 13:43:33 -08:00
Leonardo Santagada
daa0e436ba implement malloc_getcpu for windows 2019-01-08 14:34:45 -08:00
John Ericson
4e920d2c9d Add --{enable,disable}-{static,shared} to configure script
My distro offers a custom toolchain where it's not possible to make
static libs, so it's insufficient to just delete the libs I don't want.
I actually need to avoid building them in the first place.
2018-12-19 13:34:26 -08:00
Qi Wang
7241bf5b74 Only read arena index from extent on the tcache flush path.
Add exten_arena_ind_get() to avoid loading the actual arena ptr in case we just
need to check arena matching.
2018-12-18 15:19:30 -08:00
Qi Wang
441335d924 Add unit test for producer-consumer pattern. 2018-12-18 15:09:53 -08:00
Alexander Zinoviev
36de5189c7 Add rate counters to stats 2018-12-18 09:59:41 -08:00
Qi Wang
99f4eefb61 Fix incorrect stats mreging with sharded bins.
With sharded bins, we may not flush all items from the same arena in one run.
Adjust the stats merging logic accordingly.
2018-12-07 18:16:15 -08:00
Qi Wang
711a61f3b4 Add unit test for sharded bins. 2018-12-03 17:17:03 -08:00
Qi Wang
98b56ab23d Store the bin shard selection in TSD.
This avoids having to choose bin shard on the fly, also will allow flexible bin
binding for each thread.
2018-12-03 17:17:03 -08:00
Qi Wang
45bb4483ba Add stats for arenas.bin.i.nshards. 2018-12-03 17:17:03 -08:00
Qi Wang
3f9f2833f6 Add opt.bin_shards to specify number of bin shards.
The option uses the same format as "slab_sizes" to specify number of shards for
each bin size.
2018-12-03 17:17:03 -08:00
Qi Wang
37b8913925 Add support for sharded bins within an arena.
This makes it possible to have multiple set of bins in an arena, which improves
arena scalability because the bins (especially the small ones) are always the
limiting factor in production workload.

A bin shard is picked on allocation; each extent tracks the bin shard id for
deallocation.  The shard size will be determined using runtime options.
2018-12-03 17:17:03 -08:00
Dave Watson
b23336af96 mutex: fix trylock spin wait contention
If there are 3 or more threads spin-waiting on the same mutex,
there will be excessive exclusive cacheline contention because
pthread_trylock() immediately tries to CAS in a new value, instead
of first checking if the lock is locked.

This diff adds a 'locked' hint flag, and we will only spin wait
without trylock()ing while set.  I don't know of any other portable
way to get the same behavior as pthread_mutex_lock().

This is pretty easy to test via ttest, e.g.

./ttest1 500 3 10000 1 100

Throughput is nearly 3x as fast.

This blames to the mutex profiling changes, however, we almost never
have 3 or more threads contending in properly configured production
workloads, but still worth fixing.
2018-11-28 15:17:02 -08:00
Qi Wang
c4063ce439 Set the default number of background threads to 4.
The setting has been tested in production for a while.  No negative effect while
we were able to reduce number of threads per process.
2018-11-16 09:35:12 -08:00
Qi Wang
43f3b1ad0c Deprecate OSSpinLock. 2018-11-14 08:44:05 -08:00
Dave Watson
13c237c7ef Add a fastpath for arena_slab_reg_alloc_batch
Also adds a configure.ac check for __builtin_popcount, which is used
in the new fastpath.
2018-11-14 07:09:11 -08:00
Dave Watson
17aa470760 add extent_nfree_sub 2018-11-14 07:09:11 -08:00
Dave Watson
4b82872ebf arena: Refactor tcache_fill to batch fill from slab
Refactor tcache_fill, introducing a new function arena_slab_reg_alloc_batch,
which will fill multiple pointers from a slab.

There should be no functional changes here, but allows future optimization
on reg_alloc_batch.
2018-11-14 07:09:11 -08:00
Qi Wang
57553c3b1a Avoid touching all pages in extent_recycle for debug build.
We may have a large number of pages with *zero set (since they are populated on
demand).  Only check the first page to avoid paging in all of them.
2018-11-13 08:54:48 -08:00
Qi Wang
1f56115704 Fix tcache_flush (follow up cd2931a).
Also catch invalid tcache id.
2018-11-13 08:54:09 -08:00
Dave Watson
794e29c0ab Add a free() and sdallocx(where flags=0) fastpath
Add unsized and sized deallocation fastpaths.  Similar to the malloc()
fastpath, this removes all frame manipulation for the majority of
free() calls.  The performance advantages here are less than that
of the malloc() fastpath, but from prod tests seems to still be half
a percent or so of improvement.

Stats and sampling a both supported (sdallocx needs a sampling check,
for rtree lookups slab will only be set for unsampled objects).

We don't support flush, any flush requests go to the slowpath.
2018-11-12 13:20:37 -08:00
Dave Watson
e2ab215324 refactor tcache_dalloc_small
Add a cache_bin_dalloc_easy (to match the alloc_easy function),
and use it in tcache_dalloc_small.  It will also be used in the
new free fastpath.
2018-11-12 13:20:37 -08:00
Dave Watson
5e795297b3 rtree: add rtree_szind_slab_read_fast
For a free fastpath, we want something that will not make additional
calls.  Assume most free() calls will hit the L1 cache, and use
a custom rtree function for this.

Additionally, roll the ptr=NULL check in to the rtree cache check.
2018-11-12 13:20:37 -08:00
Edward Tomasz Napierala
a4c6b9ae01 Restore a FreeBSD-specific getpagesize(3) optimization.
It was removed in 0771ff2cea.
Add a comment explaining its purpose.
2018-11-09 14:14:49 -08:00
Qi Wang
cd2931ad9b Fix tcaches_flush.
The regression was introduced in 3a1363b.
2018-11-09 13:11:37 -08:00
Qi Wang
7ee0b6cc37 Properly trigger decay on tcache destory.
When destroying tcache, decay may not be triggered since tsd is non-nominal.
Explicitly decay to avoid pathological cases.
2018-11-09 11:03:19 -08:00
Qi Wang
d66f976628 Optimize large deallocation.
We eagerly coalesce large buffers when deallocating, however the previous logic
around this introduced extra lock overhead -- when coalescing we always lock the
neighbors even if they are active, while for active extents nothing can be done.

This commit checks if the neighbor extents are potentially active before
locking, and avoids locking if possible.  This speeds up large_dalloc by ~20%.
It also fixes some undesired behavior: we could stop coalescing because a small
buffer was merged, while a large neighbor was ignored on the other side.
2018-11-08 13:35:59 -08:00
Qi Wang
8dabf81df1 Bypass extent_dalloc when retain is enabled.
When retain is enabled, the default dalloc hook does nothing (since we avoid
munmap).  But the overhead preparing the call is high, specifically the extent
de-register and re-register involve locking and extent / rtree modifications.
Bypass the call with retain in this diff.
2018-11-08 11:32:25 -08:00
Qi Wang
50b473c883 Set commit properly for FreeBSD w/ overcommit.
When overcommit is enabled, commit needs to be set when doing mmap().  The
regression was introduced in f80c97e.
2018-11-05 09:47:04 -08:00
Justin Hibbits
be0749f591 Restrict lwsync to powerpc64 only
Nearly all 32-bit powerpc hardware treats lwsync as sync, and some cores
(Freescale e500) trap lwsync as an illegal instruction, which then gets
emulated in the kernel.  To avoid unnecessary traps on the e500, use
sync on all 32-bit powerpc.  This pessimizes 32-bit software running on
64-bit hardware, but those numbers should be slim.
2018-10-24 11:18:55 -07:00
Edward Tomasz Napierala
ceba1dde27 Make use of pthread_set_name_np(3) on FreeBSD. 2018-10-24 10:06:37 -07:00
Dave Watson
936bc2aa15 prof: Fix memory regression
The diff 'refactor prof accum...' moved the bytes_until_sample
subtraction before the load of tdata.  If tdata is null,
tdata_get(true) will overwrite bytes_until_sample, but we
still sample the current allocation.   Instead, do the subtraction
and check logic again, to keep the previous behavior.

blame-rev: 0ac524308d
2018-10-23 12:39:57 -07:00
Dave Watson
0f8313659e malloc: Add a fastpath
This diff adds a fastpath that assumes size <= SC_LOOKUP_MAXCLASS, and
that we hit tcache.  If either of these is false, we fall back to
the previous codepath (renamed 'malloc_default').

Crucially, we only tail call malloc_default, and with the same kind
and number of arguments, so that both clang and gcc tail-calling
will kick in - therefore malloc() gets treated as a leaf function,
and there are *no* caller-saved registers.   Previously malloc() contained
5 caller saved registers on x64, resulting in at least 10 extra
memory-movement instructions.

In microbenchmarks this results in up to ~10% improvement in malloc()
fastpath.  In real programs, this is a ~1% CPU and latency improvement
overall.
2018-10-18 08:32:19 -07:00
Dave Watson
0ec656eb71 ticker: add ticker_trytick
For the fastpath, we want to tick, but undo the tick and jump to the
slowpath if ticker would fire.
2018-10-18 08:32:19 -07:00
Dave Watson
ac34afb403 drop bump_empty_alloc option. Size class lookup support used instead. 2018-10-17 08:50:58 -07:00
Dave Watson
4edbb7c64c sz: Support 0 size in size2index lookup/compute 2018-10-17 08:50:58 -07:00
Dave Watson
2b112ea593 add test for zero-sized alloc and aligned alloc 2018-10-17 08:50:58 -07:00
gnzlbg
01e2a38e5a Make smallocx symbol name depend on the JEMALLOC_VERSION_GID
This comments concatenates the `JEMALLOC_VERSION_GID` to the
`smallocx` symbol name, such that the symbol ends up exported
as `smallocx_{git_hash}`.
2018-10-17 07:12:28 -07:00
gnzlbg
837de32496 Test smallocx on Travis-CI
This commit updates the gen_travis script with a new build bot
that covers the experimental `smallocx` API and updates the
travis CI script to test this API under travis.
2018-10-17 07:12:28 -07:00
gnzlbg
741fca1bb7 Hide smallocx even when enabled from the library API
The experimental `smallocx` API is not exposed via header files,
requiring the users to peek at `jemalloc`'s source code to manually
add the external declarations to their own programs.

This should reinforce that `smallocx` is experimental, and that `jemalloc`
does not offer any kind of backwards compatiblity or ABI gurantees for it.
2018-10-17 07:12:28 -07:00
gnzlbg
730e57b08f Adapts mallocx integration tests for smallocx 2018-10-17 07:12:28 -07:00
gnzlbg
08260a6b94 Add experimental API: smallocx_return_t smallocx(size, flags)
---

Motivation:

This new experimental memory-allocaction API returns a pointer to
the allocation as well as the usable size of the allocated memory
region.

The `s` in `smallocx` stands for `sized`-`mallocx`, attempting to
convey that this API returns the size of the allocated memory region.

It should allow C++ P0901r0 [0] and Rust Alloc::alloc_excess to make
use of it.

The main purpose of these APIs is to improve telemetry. It is more accurate
to register `smallocx(size, flags)` than `smallocx(nallocx(size), flags)`,
for example. The latter will always line up perfectly with the existing
size classes, causing a loss of telemetry information about the internal
fragmentation induced by potentially poor size-classes choices.

Instrumenting `nallocx` does not help much since user code can cache its
result and use it repeatedly.

---

Implementation:

The implementation adds a new `usize` option to `static_opts_s` and an `usize`
variable to `dynamic_opts_s`. These are then used to cache the result of
`sz_index2size` and similar functions in the code paths in which they are
unconditionally invoked. In the code-paths in which these functions are not
unconditionally invoked, `smallocx` calls, as opposed to `mallocx`, these
functions explicitly.

---

[0]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html
2018-10-17 07:12:28 -07:00
Dave Watson
325e3305fc remove malloc_init() off the fastpath 2018-10-15 10:11:08 -07:00
Dave Watson
997d86acc6 restrict bytes_until_sample to int64_t. This allows optimal asm
generation of sub bytes_until_sample, usize; je; for x86 arch.
Subtraction is unconditional, and only flags are checked for the jump,
no extra compare is necessary.  This also reduces register pressure.
2018-10-15 08:24:12 -07:00
Dave Watson
d1a861fa80 add a check for SC_LARGE_MAXCLASS
If we assume SC_LARGE_MAXCLASS will always fit in a SSIZE_T, then we can
optimize some checks by unconditional subtraction, and then checking flags
only, without a compare statement in x86.
2018-10-15 08:24:12 -07:00
Dave Watson
0ac524308d refactor prof accum, so that tdata is not loaded if we aren't going to sample. 2018-10-15 08:24:12 -07:00
Dave Watson
9ed3bdc848 move bytes until sample to tsd. Fastpath allocation does not need
to load tdata now, avoiding several branches.
2018-10-15 08:24:12 -07:00
Dave Watson
09adf18f1a Remove a branch from cache_bin_alloc_easy
Combine the branches for checking for an empty cache_bin, and
checking for the low watermark.
2018-10-15 08:18:15 -07:00
jsteemann
856319dc8a check return value of malloc_read_fd
in case `malloc_read_fd` returns a negative error number, the result
would afterwards be casted to an unsigned size_t, and may have
theoretically caused an out-of-bounds memory access in the following
`strncmp` call.
2018-10-11 17:25:20 -07:00
Edward Tomasz Napierala
f80c97e477 Rework the way jemalloc uses mmap(2) on FreeBSD.
This makes it directly use MAP_EXCL and MAP_ALIGNED() instead
of weird workarounds involving mapping at random places and then
unmapping parts of them.
2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala
676cdd6679 Disable runtime detection of lazy purging support on FreeBSD.
The check doesn't seem to serve any purpose here, and this shaves
off three syscalls on binary startup.
2018-10-06 22:06:56 -07:00
Rajeev Misra
115ce93562 bit_util: Don't use __builtin_clz on s390x
There's an optimizer bug upstream that results in test failures; reported at
https://bugzilla.redhat.com/show_bug.cgi?id=1619354.  This works around the
failure reported at https://github.com/jemalloc/jemalloc/issues/1307.
2018-09-20 11:25:17 -07:00
David Goldblatt
88771fa013 Bootstrapping: don't overwrite opt_prof_prefix. 2018-09-12 17:06:06 -07:00
rustyx
9f43defb6e Add sc.c to the MSVC project 2018-09-04 12:58:05 -07:00
Rajeev Misra
4c548a61c8 Bit_util: Use intrinsics for pow2_ceil, where available. 2018-08-15 19:38:31 -07:00
gnzlbg
36eb0b3d77 Add valgrind build bots to CI
This commit adds two build-bots to CI that test the release builds
of jemalloc on linux and macOS under valgrind.

The macOS build is not enabled because valgrind reports
errors about reads of uninitialized memory in some tests and
segfaults in others.
2018-08-13 10:59:20 -07:00
David Goldblatt
1f71e1ca43 Add hook microbenchmark. 2018-08-09 13:16:54 -07:00
David Carlier
0771ff2cea FreeBSD build changes and allow to run the tests. 2018-08-09 10:41:20 -07:00
David Goldblatt
e8ec9528ab Allow the use of readlinkat over readlink.
This can be useful in situations where readlink is disallowed.
2018-08-03 14:04:32 -07:00
Tyler Etzel
126252a7e6 Add stats for the size of extent_avail heap 2018-08-02 10:16:06 -07:00
Tyler Etzel
c14e6c0819 Add extents information to mallocstats output
- Show number/bytes of extents of each size that are dirty, muzzy, retained.
2018-08-02 10:16:06 -07:00
Tyler Etzel
33f1aa5bad Fix comment on SC_NPSIZES. 2018-08-02 10:16:06 -07:00
Tyler Etzel
5e23f96dd4 Add unit tests for logging 2018-08-01 13:27:11 -07:00
Tyler Etzel
b664bd7935 Add logging for sampled allocations
- prof_opt_log flag starts logging automatically at runtime
- prof_log_{start,stop} mallctl for manual control
2018-08-01 13:27:11 -07:00
Tyler Etzel
eb261e53a6 Small refactoring of emitter
- Make API more clear for using as standalone json emitter
- Support cases that weren't possible before, e.g.
	- emitting primitive values in an array
	- emitting nested arrays
2018-08-01 13:27:11 -07:00
David Goldblatt
41b7372ead TSD: Add fork support to tsd_nominal_tsds.
In case of multithreaded fork, we want to leave the child in a reasonable state,
in which tsd_nominal_tsds is either empty or contains only the forking thread.
2018-07-26 17:22:25 -07:00
David Goldblatt
013ab26c86 TSD: Add a tsd_nominal_list death assertion.
A thread should have had its state transition away from nominal before it dies.
This change adds that to the list of thread death assertions.
2018-07-26 17:22:25 -07:00
David Goldblatt
3aba072cef SC: Remove global data.
The global data is mostly only used at initialization, or for easy access to
values we could compute statically.  Instead of consuming that space (and
risking TLB misses), we can just pass around a pointer to stack data during
bootstrapping.
2018-07-23 13:37:08 -07:00
Qi Wang
4bc48718b2 Tolerate experimental features for abort_conf.
Not aborting with unrecognized experimental options.  This helps us testing
experimental features with abort_conf enabled.
2018-07-17 20:40:32 -07:00
gnzlbg
6deed86deb Test that .travis.yml has been produced by gen_travis.py on CI
This commits checks on Travis-CI that the current `.travis.yml` file
equals the output of the `gen_travis.py` script, and updated
the `.travis.yml` file accordingly.
2018-07-17 17:55:50 -07:00
gnzlbg
0eb0641cac Simplify output of gen_travis.py script
This commit simplifies the output of the
`gen_travis.py` script by reusing addons.

The `.travis.yml` script is updated to
reflect these changes.
2018-07-17 17:55:50 -07:00
David Goldblatt
55e5cc1341 SC: Make some key size classes static.
The largest small class, smallest large class, and largest large class may all
be needed down fast paths; to avoid the risk of touching another cache line, we
can make them available as constants.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
5112d9e5fd Add MALLOC_CONF parsing for dynamic slab sizes.
This actually enables us to change the values.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
4610ffa942 Bootstrapping: Parse MALLOC_CONF before using slab sizes.
I.e., parse before booting the bin module or sz module.  This lets us tweak size
class settings before committing to them by letting them leak into other
modules.

This commit does not actually do any tweaking of the size classes; it *just*
chanchanges bootstrapping order; this may help bisecting any bootstrapping
failures on poorly-tested architectures.
2018-07-12 20:53:06 -07:00
David T. Goldblatt
a7f68aed3e SC: Add page customization functionality. 2018-07-12 20:53:06 -07:00
David T. Goldblatt
017dca198c SC module: Add a note on style. 2018-07-12 20:53:06 -07:00
David Goldblatt
5b7fc9056c Remove the --with-lg-page-sizes configure option.
This appears to be unused.
2018-07-12 20:53:06 -07:00
David Goldblatt
0552aad91b Kill size_classes.sh.
We've moved size class computations to boot time; they were being used only to
check that the computations resulted in equal values.
2018-07-12 20:53:06 -07:00
David Goldblatt
4f55c0ec22 Translate size class computation from bash shell into C.
This is the last big step in making size classes a runtime computation rather
than a configure-time one.

The compile-time computation has been left in, for now, to allow assertion
checking that the results are identical.
2018-07-12 20:53:06 -07:00
David Goldblatt
2f07e92adb Add lg_ceil to bit_util.
Also, add the bit_util test back to the Makefile.
2018-07-12 20:53:06 -07:00
David Goldblatt
07b89c7673 Move quantum detection into its own file.
This is logically fairly independent.
2018-07-12 20:53:06 -07:00
David Goldblatt
e904f813b4 Hide size class computation behind a layer of indirection.
This class removes almost all the dependencies on size_classes.h, accessing the
data there only via the new module sc.h, which does not depend on any
configuration options.

In a subsequent commit, we'll remove the configure-time size class computations,
doing them at boot time, instead.
2018-07-12 20:53:06 -07:00
gnzlbg
fb924dd7bf Suppress -Wmissing-field-initializer warning only for compilers with buggy implementation 2018-07-10 13:13:36 -07:00
gnzlbg
3d29d11ac2 Clean compilation -Wextra
Before this commit jemalloc produced many warnings when compiled with -Wextra
with both Clang and GCC. This commit fixes the issues raised by these warnings
or suppresses them if they were spurious at least for the Clang and GCC
versions covered by CI.

This commit:

* adds `JEMALLOC_DIAGNOSTIC` macros: `JEMALLOC_DIAGNOSTIC_{PUSH,POP}` are
  used to modify the stack of enabled diagnostics. The
  `JEMALLOC_DIAGNOSTIC_IGNORE_...` macros are used to ignore a concrete
  diagnostic.

* adds `JEMALLOC_FALLTHROUGH` macro to explicitly state that falling
  through `case` labels in a `switch` statement is intended

* Removes all UNUSED annotations on function parameters. The warning
  -Wunused-parameter is now disabled globally in
  `jemalloc_internal_macros.h` for all translation units that include
  that header. It is never re-enabled since that header cannot be
  included by users.

* locally suppresses some -Wextra diagnostics:

  * `-Wmissing-field-initializer` is buggy in older Clang and GCC versions,
    where it does not understanding that, in C, `= {0}` is a common C idiom
    to initialize a struct to zero

  * `-Wtype-bounds` is suppressed in a particular situation where a generic
    macro, used in multiple different places, compares an unsigned integer for
    smaller than zero, which is always true.

  * `-Walloc-larger-than-size=` diagnostics warn when an allocation function is
    called with a size that is too large (out-of-range). These are suppressed in
    the parts of the tests where `jemalloc` explicitly does this to test that the
    allocation functions fail properly.

* adds a new CI build bot that runs the log unit test on CI.

Closes #1196 .
2018-07-09 21:40:42 -07:00
Maks Naumov
ce5c073fe5 Fix MSVC build 2018-07-05 13:50:01 -07:00
Qi Wang
cdf15b458a Rename huge_threshold to experimental, and tweak documentation. 2018-06-29 10:35:02 -07:00
Qi Wang
ff622eeab5 Add unit test for opt.huge_threshold. 2018-06-29 10:35:02 -07:00
Qi Wang
1302af4c43 Add ctl and stats for opt.huge_threshold. 2018-06-29 10:35:02 -07:00
Qi Wang
79522b2fc2 Refactor arena_is_auto. 2018-06-29 10:35:02 -07:00
Qi Wang
94a88c26f4 Implement huge arena: opt.huge_threshold.
The feature allows using a dedicated arena for huge allocations.  We want the
addtional arena to separate huge allocation because: 1) mixing small extents
with huge ones causes fragmentation over the long run (this feature reduces VM
size significantly); 2) with many arenas, huge extents rarely get reused across
threads; and 3) huge allocations happen way less frequently, therefore no
concerns for lock contention.
2018-06-29 10:35:02 -07:00
Qi Wang
77a71ef2b7 Fall back to the default pthread_create if RTLD_NEXT fails. 2018-06-28 13:18:21 -07:00
David Goldblatt
d1e11d48d4 Move tsd link and in_hook after tcache.
This can lead to better cache utilization down the common paths where we don't
touch the link.
2018-06-27 13:39:02 -07:00
Qi Wang
50820010fe Add test for remote deallocation. 2018-06-26 23:13:15 -07:00
Qi Wang
fec1ef7c91 Fix arena locking in tcache_bin_flush_large().
This regression was introduced in c834912 (incorrect arena used).
2018-06-26 23:13:15 -07:00
Qi Wang
0ff7ff3ec7 Optimize ixalloc by avoiding a size lookup. 2018-06-05 21:03:51 -07:00
Qi Wang
c834912aa9 Avoid taking large_mtx for auto arenas.
On tcache flush path, we can avoid touching the large_mtx for auto arenas, since
it was only needed for manual arenas where arena_reset is allowed.
2018-06-05 15:16:03 -07:00
Qi Wang
9bd8deb260 Fix stats output for opt.lg_extent_max_active_fit. 2018-06-05 10:23:28 -07:00
Qi Wang
d22e150320 Avoid taking extents_muzzy mutex when muzzy is disabled.
When muzzy decay is disabled, no need to allocate from extents_muzzy.  This
saves us a couple of mutex operations down the extents_alloc path.
2018-05-24 14:40:56 -07:00
David Goldblatt
a7f749c9af Hooks: Protect against reentrancy.
Previously, we made the user deal with this themselves, but that's not good
enough; if hooks may allocate, we should test the allocation pathways down
hooks.  If we're doing that, we might as well actually implement the protection
for the user.
2018-05-18 11:43:03 -07:00
David Goldblatt
0379235f47 Tests: Shouldn't be able to change global slowness.
This can help ensure that we don't leave slowness changes behind in case of
resource exhaustion.
2018-05-18 11:43:03 -07:00
David Goldblatt
59e371f463 Hooks: Add a hook exhaustion test.
When we run out of space in which to store hooks, we should return EAGAIN from
the mallctl, but not otherwise misbehave.
2018-05-18 11:43:03 -07:00
David Goldblatt
bb071db92e Mallctl: Add experimental.hooks.[install|remove]. 2018-05-18 11:43:03 -07:00
David Goldblatt
126e9a84a5 Hooks: move the "extra" pointer into the hook_t itself.
This simplifies the mallctl call to install a hook, which should only take a
single argument.
2018-05-18 11:43:03 -07:00
David Goldblatt
cb0707c0fc Hooks: hook the realloc pathways that move/expand. 2018-05-18 11:43:03 -07:00
David Goldblatt
67270040a5 Hooks: hook the realloc paths that act as pure malloc/free. 2018-05-18 11:43:03 -07:00
David Goldblatt
83e516154c Hooks: hook the pure-expand function. 2018-05-18 11:43:03 -07:00
David Goldblatt
c154f5881b Hooks: hook the pure-deallocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
226327cf66 Hooks: hook the pure-allocation functions. 2018-05-18 11:43:03 -07:00
David Goldblatt
fe0e399385 Hooks: add an early-exit path for the common no-hook case. 2018-05-18 11:43:03 -07:00
David Goldblatt
5ae6e7cbfa Add "hook" module.
The hook module allows a low-reader-overhead way of finding hooks to invoke and
calling them.

For now, none of the allocation pathways are tied into the hooks; this will come
later.
2018-05-18 11:43:03 -07:00
David Goldblatt
06a8c40b36 Add the Seq module, a simple seqlock implementation.
This allows fast reader-writer concurrency in cases where writers are rare.  The
immediate use case is for the hooking implementaiton.
2018-05-18 11:43:03 -07:00
David Goldblatt
c7a87e0e0b Rename hooks module to test_hooks.
"Hooks" is really the best name for the module that will contain the publicly
exposed hooks.  So lets rename the current "hooks" module (that hook external
dependencies, for reentrancy testing) to "test_hooks".
2018-05-18 11:43:03 -07:00
David Goldblatt
e870829e64 TSD: Add the ability to enter a global slow path.
This gives any thread the ability to send other threads down slow paths the next
time they fetch tsd.
2018-05-18 11:43:03 -07:00
David Goldblatt
feff510b9f TSD: Pull name mangling into a macro. 2018-05-18 11:43:03 -07:00
David Goldblatt
39d6420c0c TSD: Make state atomic.
This will let us change the state of another thread remotely, eventually.
2018-05-18 11:43:03 -07:00
David Goldblatt
982c10de35 TSD: Make all state access happen through a function.
Shortly, tsd state will be atomic and have some complicated enough logic down
the state-setting path that we should be aware of it.
2018-05-18 11:43:03 -07:00
David Goldblatt
e74a1a37c8 Atomics: Add atomic_u8_t, force-inline operations.
We're about to need an atomic uint8_t for state operations.

Unfortunately, we're at the point where things won't get inlined into the key
methods unless they're force-inlined.  This is embarassing and we should do
something about it, but in the meantime we'll force-inline a little more when we
need to.
2018-05-18 11:43:03 -07:00
Qi Wang
09edea3f5c Tweak the format of the per arena summary section.
Increase the width to ensure enough space for long running programs.
2018-05-17 12:58:56 -07:00
Qi Wang
b293a3eb86 Fix the max_background_thread test.
We may set number of background threads separately, e.g. through
--with-malloc-conf, so avoid assuming the default number in the test.
2018-05-15 14:00:51 -07:00
Qi Wang
312352faa8 Fix background thread index issues with max_background_threads. 2018-05-15 12:25:23 -07:00
Qi Wang
e8a63b87c3 Fix an incorrect assertion.
When configured with --with-lg-page, it's possible for the configured page size
to be greater than the system page size, in which case the page address may only
be aligned with the system page size.
2018-05-09 23:52:56 -07:00
Qi Wang
61efbda709 Merge branch 'dev' 2018-05-08 12:12:50 -07:00
Qi Wang
1c51381b7c Update ChangeLog for 5.1.0. 2018-05-08 12:06:34 -07:00
David T. Goldblatt
e94ca7f3e2 run_tests.sh: Don't test large vaddr with -m32. 2018-05-08 11:20:25 -07:00
Qi Wang
a308af360c Reformat the version number in jemalloc.pc.in. 2018-05-07 20:12:03 -07:00
Christoph Muellner
b73380bee0 Fix include path order for out-of-tree builds.
When configuring out-of-tree (source directory is not build directory),
the generated include files from the build directory should have higher
priority than those in the source dir.

This is especially helpful when cross-compiling.

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-05-05 10:11:22 -07:00
David Goldblatt
4c8829e692 run_tests.sh: Test --with-lg-vaddr. 2018-05-04 15:50:12 -07:00
David Goldblatt
b001e6e740 INSTALL.md: Clarify --with-lg-vaddr.
The current wording can be taken to imply that we return tagged pointers to the
user, or otherwise rely on architectural support for them.
2018-05-04 15:50:12 -07:00
Christoph Muellner
63712b4c4e configure: Add --with-lg-vaddr configure option.
This patch allows to override the lg-vaddr values, which
are defined by the build machine's CPUID information (x86_64)
or default values (other architectures like aarch64).

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
2018-05-04 10:34:10 -07:00
Qi Wang
95789a24fa Update copyright dates. 2018-05-03 15:31:42 -07:00
Qi Wang
2e7af1af73 Add TUNING.md. 2018-05-03 12:52:52 -07:00
Qi Wang
3bcaedeea2 Remove documentation for --disable-thp which was removed. 2018-05-03 12:52:52 -07:00
Qi Wang
c5b72a92cc Fix a typo in INSTALL.md. 2018-05-02 15:08:49 -07:00
Latchesar Ionkov
a32b7bd567 Mallctl: Add arenas.lookup
Implement a new mallctl operation that allows looking up the arena a
region of memory belongs to.
2018-05-01 13:14:36 -07:00
Christoph Muellner
6df90600a7 aarch64: Add ILP32 support.
Instead of setting a fix value of 48 allowed VA bits,
we distiguish between LP64 and ILP32.

Testsuite result with LP64:
Test suite summary: pass: 13/13, skip: 0/13, fail: 0/13

Testsuit result with ILP32:
Test suite summary: pass: 13/13, skip: 0/13, fail: 0/13

Signed-off-by: Christoph Muellner <christoph.muellner@theobroma-systems.com>
Reviewed-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
2018-04-30 15:04:00 -07:00
Issam Maghni
39b1b20499 Adding install_lib_pc
Related to https://github.com/jemalloc/jemalloc/issues/974
2018-04-22 11:52:47 -07:00
Qi Wang
b8f4c730ef Remove an incorrect assertion.
Background threads are created without holding the global background_thread
lock, which mean paused state is possible (and fine).
2018-04-18 14:17:08 -07:00
Qi Wang
dedfeecc4e Invoke dlsym() on demand.
If no lazy lock or background thread is enabled, avoid dlsym pthread_create on
boot.
2018-04-18 11:20:21 -07:00
David Goldblatt
c95284df1a Avoid a resource leak down extent split failure paths.
Previously, we would leak the extent and memory associated with a salvageable
portion of an extent that we were trying to split in three, in the case where
the first split attempt succeeded and the second failed.
2018-04-18 08:19:41 -07:00
David Goldblatt
a62e42baeb Add the --disable-initial-exec-tls configure option.
Right now we always make our TLS use the initial-exec model if the compiler
supports it.  This change allows configure-time disabling of this setting, which
can be helpful when dynamically loading jemalloc is the only option.
2018-04-17 19:22:01 -07:00
Qi Wang
e40b2f75bd Fix abort_conf processing.
When abort_conf is set, make sure we always error out at the end of the options
processing loop.
2018-04-17 18:23:53 -07:00
Qi Wang
0fadf4a2e3 Add UNUSED to avoid compiler warnings. 2018-04-16 13:50:21 -07:00
Jason Evans
2a80d6f15b Avoid a printf format specifier warning.
This dodges a warning emitted by the FreeBSD system gcc when compiling
libc for architectures which don't use clang as the system compiler.
2018-04-16 11:07:51 -07:00
Qi Wang
3f0dc64c6b Allow setting extent hooks on uninitialized auto arenas.
Setting extent hooks can result in initializing an unused auto arena.  This is
useful to install extent hooks on auto arenas from the beginning.
2018-04-11 21:21:54 -07:00
Qi Wang
02585420c3 Document liveness requirements for extent_hooks_t structures. 2018-04-11 12:35:28 -07:00
Qi Wang
f0b146acc4 Fix a typo. 2018-04-11 10:42:57 -07:00
Jason Evans
cad27a894a Fix a typo. 2018-04-10 17:59:10 -07:00
Jason Evans
4937309620 Silence a compiler warning. 2018-04-10 17:59:00 -07:00
Dave Watson
8b14f3abc0 background_thread: add max thread count config
Looking at the thread counts in our services, jemalloc's background thread
is useful, but mostly idle.  Add a config option to tune down the number of threads.
2018-04-10 14:01:45 -07:00
Qi Wang
4be74d5112 Consolidate the two memory loads in rtree_szind_slab_read().
szind and slab bits are read on fast path, where compiler generated two memory
loads separately for them before this diff.  Manually operate on the bits to
avoid the extra memory load.
2018-04-10 10:18:46 -07:00
Rajeev Misra
5f51882a0a Stack address should not be used for ordering mutexes 2018-04-10 10:16:57 -07:00
Qi Wang
cf2f4aac1c Fix const qualifier warnings. 2018-04-09 16:50:30 -07:00
Qi Wang
d3e0976a2c Fix type warning on Windows.
Add cast since read / write has unsigned return type on windows.
2018-04-09 16:50:30 -07:00
Qi Wang
4df483f0fd Fix arguments passed to extent_init. 2018-04-09 16:35:58 -07:00
Qi Wang
2dccf45640 Control idump and gdump with prof_active. 2018-04-09 16:35:14 -07:00
Dave Watson
6d02421730 extents: Remove preserve_lru feature.
preserve_lru feature adds lots of complication, for little value.
Removing it means merged extents are re-added to the lru list, and may
take longer to madvise away than they otherwise would.

Canaries after removal seem flat for several services (no change).
2018-04-02 12:40:28 -07:00
Qi Wang
21eb0d15a6 Fix a background_thread shutdown issue.
1) make sure background thread 0 is always created; and 2) fix synchronization
between thread 0 and the control thread.
2018-04-02 10:03:47 -07:00
Qi Wang
956c4ad6b5 Change mutable option output in stats to avoid stringify issues. 2018-03-15 14:42:48 -07:00
Qi Wang
baffeb1d0a Fix a typo in stats. 2018-03-15 14:42:48 -07:00
Qi Wang
742416f645 Revert "CI: Remove "catgets" dependency on appveyor."
This reverts commit ae0f5d5c3f.
2018-03-15 13:58:42 -07:00
David Goldblatt
4c36cd2cc5 Stats printing: Convert arena large stats to use emitter.
This completes the conversion; we now have only structured text output.
2018-03-09 11:47:17 -08:00
David Goldblatt
4eed989bbf Stats printing: convert arena bin stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a9f3cedc6e Stats printing: remove a spurious newline.
This was left over from a previous emitter conversion.  It didn't affect the
correctness of the output.
2018-03-09 11:47:17 -08:00
David Goldblatt
a1738f4efd Stats printing: Make arena mutex stats use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
07fb707623 Stats printing: convert most per-arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
8fc850695d Stats printing: convert paging and alloc counts to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
bc6620f73e Stats printing: convert decay stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
a6ef061c43 Stats printing: Move emitter cutoff point into stats_arena_print. 2018-03-09 11:47:17 -08:00
David Goldblatt
cbde666d9a Stats printing: move stats_print_helper to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
86c61d4a57 Stats printing: Move global mutex stats to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
ebe0b5f828 Emitter: Add support for row-based output in table mode.
This is needed for things like mutex stats in table mode.
2018-03-09 11:47:17 -08:00
David Goldblatt
9e1846b004 Stats printing: move non-mutex arena stats to the emitter.
Another step in the conversion process.  The mutex is a little different,
because we we want to emit it as an array.
2018-03-09 11:47:17 -08:00
David Goldblatt
8076b28721 Stats printing: Remove explicit callback passing to stats_print_helper.
This makes the emitter the only source of callback information, which is a step
towards where we want to be.
2018-03-09 11:47:17 -08:00
David Goldblatt
0d20eda127 Stats printing: Move emitter -> manual cutoff point.
This makes it so that the "general" portion of the stats code is completely
agnostic to emitter type.
2018-03-09 11:47:17 -08:00
David Goldblatt
ec31d476ff Stats printing: Convert profiling stats to use the emitter.
While we're at it, print them in table form, too.
2018-03-09 11:47:17 -08:00
David Goldblatt
e5acc35400 Stats printing: Convert general arena stats to use the emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
4a335e0c6f Stats printing: convert config and opt output to use emitter.
This is a step along the path towards using the emitter for all stats output.
2018-03-09 11:47:17 -08:00
David Goldblatt
b646f89173 Stats printing: Convert header and footer to use emitter. 2018-03-09 11:47:17 -08:00
David Goldblatt
27a8fe6780 Introduce the emitter module.
The emitter can be used to produce structured json or tabular output.  For now
it has no uses; in subsequent commits, I'll begin transitioning stats printing
code over.
2018-03-09 11:47:17 -08:00
Qi Wang
e4f090e8df Add opt.thp which allows explicit hugepage usage.
"always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings.  Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
2018-03-08 13:08:06 -08:00
Qi Wang
efa40532dc Remove config.thp which wasn't in use. 2018-03-08 13:08:06 -08:00
Qi Wang
6b35366ef5 Skip test_alignment_and_size if percpu_arena is enabled.
test_alignment_and_size needs a lot of memory.  When percpu_arena is enabled,
multiple arenas may cause the test to OOM.
2018-03-02 14:44:21 -08:00
Qi Wang
548153e789 Remove unused code in test/thread_tcache_enabled. 2018-03-02 14:44:21 -08:00
David Goldblatt
26b1c13982 Background threads: fix an indexing bug.
We have a buffer overrun that manifests in the case where arena indices higher
than the number of CPUs are accessed before arena indices lower than the number
of CPUs.  This fixes the bug and adds a test.
2018-02-27 19:43:05 -08:00
David T. Goldblatt
dd7e283b6f Tweak the ticker paths to help GCC generate better code.
GCC on its own isn't quite able to turn the ticker subtract into a memory
operation followed by a js.
2018-02-21 16:04:23 -08:00
David Goldblatt
ae0f5d5c3f CI: Remove "catgets" dependency on appveyor.
This seems to cause a configuration error with msys2.
2018-02-14 16:21:44 -08:00
Maks Naumov
a3abbb4bdf Fix MSVC build 2018-02-12 10:35:53 -08:00
rustyx
83aa9880b7 Make generated headers usable in both x86 and x64 mode in Visual Studio 2018-01-30 13:11:41 -08:00
rustyx
ed52d24f74 Define JEMALLOC_NO_PRIVATE_NAMESPACE also in Visual Studio x86 targets 2018-01-30 13:11:41 -08:00
Christopher Ferris
f78d4ca3fb Modify configure to determine return value of strerror_r.
On glibc and Android's bionic, strerror_r returns char* when
_GNU_SOURCE is defined.

Add a configure check for this rather than assume glibc is the
only libc that behaves this way.
2018-01-10 21:01:18 -08:00
Qi Wang
ba5992fe9a Improve the fit for aligned allocation.
We compute the max size required to satisfy an alignment.  However this can be
quite pessimistic, especially with frequent reuse (and combined with state-based
fragmentation).  This commit adds one more fit step specific to aligned
allocations, searching in all potential fit size classes.
2018-01-05 14:27:58 -08:00
Qi Wang
41790f4fa4 Check tsdn_null before reading reentrancy level. 2018-01-05 13:05:17 -08:00
Qi Wang
91b247d311 In iallocztm, check lock rank only when not in reentrancy. 2018-01-05 13:05:17 -08:00
Nehal J Wani
78a87e4a80 Make sure JE_CXXFLAGS_ADD uses CPP compiler
All the invocations of AC_COMPILE_IFELSE inside JE_CXXFLAGS_ADD were
running 'the compiler and compilation flags of the current language'
which was always the C compiler and the CXXFLAGS were never being tested
against a C++ compiler. This patch fixes this issue by temporarily
changing the chosen compiler to C++ by pushing it over the stack and
popping it immediately after the compilation check.
2018-01-04 11:14:46 -08:00
marxin
433c2edabc Disable JEMALLOC_HAVE_MADVISE_HUGE for arm* CPUs. 2018-01-04 11:13:32 -08:00
Rajeev Misra
72bdbc35e3 extent_t bitpacking logic refactoring 2018-01-04 11:11:04 -08:00
Rajeev Misra
f47e39d11a handle 32 bit mutex counters 2018-01-04 11:08:17 -08:00
David Goldblatt
d41b19f9c7 Implement arena regind computation using div_info_t.
This eliminates the need to generate an enormous switch statement in
arena_slab_regind.
2017-12-21 14:25:43 -08:00
David Goldblatt
21f7c13d0b Add the div module, which allows fast division by dynamic values. 2017-12-21 14:25:43 -08:00
David T. Goldblatt
7f1b02e3fa Split up and standardize naming of stats code.
The arena-associated stats are now all prefixed with arena_stats_, and live in
their own file.  Likewise, malloc_bin_stats_t -> bin_stats_t, also in its own
file.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
901d94a2b0 Rename cache_alloc_easy to cache_bin_alloc_easy.
This lives in the cache_bin module; just a typo.
2017-12-18 16:29:10 -08:00
David T. Goldblatt
8aafa270fd Move bin stats code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
48bb4a056b Move bin forking code from arena to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
a8dd8876fb Move bin initialization from arena module to bin module. 2017-12-18 16:29:10 -08:00
David T. Goldblatt
4bf4a1c4ea Pull out arena_bin_info_t and arena_bin_t into their own file.
In the process, kill arena_bin_index, which is unused.  To follow are several
diffs continuing this separation.
2017-12-18 16:29:10 -08:00
Qi Wang
740bdd68b1 Over purge by 1 extent always.
When purging, large allocations are usually the ones that cross the npages_limit
threshold, simply because they are "large".  This means we often leave the large
extent around for a while, which has the downsides of: 1) high RSS and 2) more
chance of them getting fragmented.  Given that they are not likely to be reused
very soon (LRU), let's over purge by 1 extent (which is often large and not
reused frequently).
2017-12-18 12:57:07 -08:00
Qi Wang
f70785de91 Skip test/unit/pack when profiling is enabled.
The test assumes no sampled allocations.
2017-12-18 12:47:46 -08:00
Qi Wang
5e0332890f Output opt.lg_extent_max_active_fit in stats. 2017-12-14 15:49:15 -08:00
nicolov
22460cbebd jemalloc_mangle.sh: set sh in strict mode 2017-12-11 23:35:20 -08:00
Ed Schouten
749caf14ae Also use __riscv to detect builds for RISC-V CPUs.
According to the RISC-V toolchain conventions, __riscv__ is the old
spelling of this definition. __riscv should be used going forward.

https://github.com/riscv/riscv-toolchain-conventions#cc-preprocessor-definitions
2017-12-09 10:10:42 -08:00
Qi Wang
955b1d9cc5 Fix extent deregister on the leak path.
On leak path we should not adjust gdump when deregister.
2017-12-08 22:22:03 -08:00
Qi Wang
b5ab3f91ea Fix test/integration/extent.
Should only run the hook tests without background threads.  This was introduced
in 6e841f6.
2017-12-08 22:22:03 -08:00
Qi Wang
6e841f618a Add more tests for extent hooks failure paths. 2017-11-28 21:52:49 -08:00
Qi Wang
26a8f82c48 Add missing deregister before extents_leak.
This fixes an regression introduced by 211b1f3 (refactor extent split).
2017-11-19 21:12:40 -08:00
Qi Wang
e475d03752 Avoid setting zero and commit if split fails in extent_recycle. 2017-11-19 21:12:27 -08:00
Qi Wang
3e64dae802 Eagerly coalesce large extents.
Coalescing is a small price to pay for large allocations since they happen less
frequently.  This reduces fragmentation while also potentially improving
locality.
2017-11-16 15:32:02 -08:00
Qi Wang
eb1b08daae Fix an extent coalesce bug.
When coalescing, we should take both extents off the LRU list; otherwise decay
can grab the existing outer extent through extents_evict.
2017-11-16 15:32:02 -08:00
Qi Wang
fac706836f Add opt.lg_extent_max_active_fit
When allocating from dirty extents (which we always prefer if available), large
active extents can get split even if the new allocation is much smaller, in
which case the introduced fragmentation causes high long term damage.  This new
option controls the threshold to reuse and split an existing active extent.  We
avoid using a large extent for much smaller sizes, in order to reduce
fragmentation.  In some workload, adding the threshold improves virtual memory
usage by >10x.
2017-11-16 15:32:02 -08:00
Qi Wang
282a3faa17 Use extent_heap_first for best fit.
extent_heap_any makes the layout less predictable and as a result incurs more
fragmentation.
2017-11-16 15:32:02 -08:00
Dave Watson
d6feed6e66 Use tsd offset_state instead of atomic
While working on #852, I noticed the prng state is atomic.  This is the only
atomic use of prng in all of jemalloc.  Instead, use a threadlocal prng
state if possible to avoid unnecessary cache line contention.
2017-11-14 08:58:18 -08:00
Qi Wang
cb3b72b975 Fix base allocator THP auto mode locking and stats.
Added proper synchronization for switching to using THP in auto mode.  Also
fixed stats for number of THPs used.
2017-11-09 16:14:12 -08:00
Qi Wang
b5d071c266 Fix unbounded increase in stash_decayed.
Added an upper bound on how many pages we can decay during the current run.
Without this, decay could have unbounded increase in stashed, since other
threads could add new pages into the extents.
2017-11-08 16:33:30 -08:00
Qi Wang
6dd5681ab7 Use hugepage alignment for base allocator.
This gives us an easier way to tell if the allocation is for metadata in the
extent hooks.
2017-11-03 19:37:13 -07:00
Qi Wang
e422fa8e7e Add arena.i.retain_grow_limit
This option controls the max size when grow_retained.  This is useful when we
have customized extent hooks reserving physical memory (e.g. 1G huge pages).
Without this feature, the default increasing sequence could result in fragmented
and wasted physical memory.
2017-11-03 13:53:33 -07:00
Edward Tomasz Napierala
9f455e2786 Try to use sysctl(3) instead of sysctlbyname(3).
This attempts to use VM_OVERCOMMIT OID - newly introduced in -CURRENT
few days ago, specifically for this purpose - instead of querying the
sysctl by its string name.  Due to how syctlbyname(3) works, this means
we do one syscall during binary startup instead of two.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Edward Tomasz Napierala
d591df05c8 Use getpagesize(3) under FreeBSD.
This avoids sysctl(2) syscall during binary startup, using the value
passed in the ELF aux vector instead.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Qi Wang
58eba024c0 metadata_thp: auto mode adjustment for a0.
We observed that arena 0 can have much more metadata allocated comparing to
other arenas.  Tune the auto mode to only switch to huge page on the 5th block
(instead of 3 previously) for a0.
2017-11-01 13:52:06 -07:00
Qi Wang
47203d5f42 Output all counters for bin mutex stats.
The saved space is not worth the trouble of missing counters.
2017-10-19 16:31:54 -07:00
David Goldblatt
d14bbf8d81 Add a "dumpable" bit to the extent state.
Currently, this is unused (i.e. all extents are always marked dumpable).  In the
future, we'll begin using this functionality.
2017-10-16 15:35:49 -07:00
David Goldblatt
bbaa72422b Add pages_dontdump and pages_dodump.
This will, eventually, enable us to avoid dumping eden regions.
2017-10-16 15:35:49 -07:00
David Goldblatt
ccd09050aa Add configure-time detection for madvise(..., MADV_DO[NT]DUMP) 2017-10-16 15:35:49 -07:00
David Goldblatt
211b1f3c7d Factor out extent-splitting core from extent lifetime management.
Before this commit, extent_recycle_split intermingles the splitting of an extent
and the return of parts of that extent to a given extents_t.  After it, that
logic is separated.  This will enable splitting extents that don't live in any
extents_t (as the grow retained region soon will).
2017-10-16 15:35:49 -07:00
David Goldblatt
5bad01c38e Document some of the internal extent functions. 2017-10-16 15:35:49 -07:00
rustyx
33df2fa169 Fix MSVC 2015 project and add a VS 2017 solution 2017-10-16 10:26:54 -07:00
Qi Wang
f4f814cd4c Remove the default value for JEMALLOC_PURGE_MADVISE_DONTNEED_ZEROS. 2017-10-11 15:49:22 -07:00
Qi Wang
31ab38be5f Define MADV_FREE on our own when needed.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected.  This allows the feature to be built in and enabled with
runtime detection.
2017-10-11 15:49:22 -07:00
Qi Wang
fc83de0384 Document the potential issues about opt.background_thread. 2017-10-11 09:52:04 -07:00
Qi Wang
7e74093c96 Set isthreaded manually.
Avoid relying pthread_once which creates dependency during init.
2017-10-05 22:57:56 -07:00
Qi Wang
a2e6eb2c22 Delay background_thread_ctl_init to right before thread creation.
ctl_init sets isthreaded, which means it should be done without holding any
locks.
2017-10-05 22:57:56 -07:00
Qi Wang
79e83451ff Enable a0 metadata thp on the 3rd base block.
Since we allocate rtree nodes from a0's base, it's pushed to over 1 block on
initialization right away, which makes the auto thp mode less effective on a0.
We change a0 to make the switch on the 3rd block instead.
2017-10-05 13:39:03 -07:00
David Goldblatt
1245faae90 Power: disable the CPU_SPINWAIT macro.
Quoting from https://github.com/jemalloc/jemalloc/issues/761 :

[...] reading the Power ISA documentation[1], the assembly in [the CPU_SPINWAIT
macro] isn't correct anyway (as @marxin points out): the setting of the
program-priority register is "sticky", and we never undo the lowering.

We could do something similar, but given that we don't have testing here in the
first place, I'm inclined to simply not try. I'll put something up reverting the
problematic commit tomorrow.

[1] Book II, chapter 3 of the 2.07B or 3.0B ISA documents.
2017-10-04 18:37:23 -07:00
Dave Watson
7c6c99b829 Use ph instead of rb tree for extents_avail_
There does not seem to be any overlap between usage of
extent_avail and extent_heap, so we can use the same hook.

The only remaining usage of rb trees is in the profiling code,
which has some 'interesting' iteration constraints.

Fixes #888
2017-10-04 12:23:03 -07:00
David Goldblatt
8a7ee3014c Logging: capitalize log macro.
Dodge a name-conflict with the math.h logarithm function. D'oh.
2017-10-02 20:44:43 -07:00
David Goldblatt
7a8bc7172b ARM: Don't extend bit LG_VADDR to compute high address bits.
In userspace ARM on Linux, zero-ing the high bits is the correct way to do this.
This doesn't fix the fact that we currently set LG_VADDR to 48 on ARM, when in
fact larger virtual address sizes are coming soon.  We'll cross that bridge when
we come to it.
2017-10-02 14:54:46 -07:00
Qi Wang
0720192a32 Add runtime detection of lazy purging support.
It's possible to build with lazy purge enabled but depoly to systems without
such support.  In this case, rely on the boot time detection instead of keep
making unnecessary madvise calls (which all returns EINVAL).
2017-09-26 17:26:22 -07:00
Qi Wang
3959a9fe19 Avoid left shift by negative values.
Fix warnings on -Wshift-negative-value.
2017-09-25 15:38:58 -07:00
Qi Wang
56f0e57844 Add "falls through" comment explicitly.
Fix warnings by -Wimplicit-fallthrough.
2017-09-25 15:38:58 -07:00
Tamir Duberstein
a545f1804a dumpbin doesn't exist in mingw 2017-09-21 12:18:19 -07:00
Tamir Duberstein
24766ccd5b Allow toolchain to determine nm 2017-09-21 12:18:19 -07:00
Tamir Duberstein
96f1468221 whitespace 2017-09-21 12:18:19 -07:00
Qi Wang
eaa58a5026 Put static keyword first.
Fix a warning by -Wold-style-declaration.
2017-09-21 12:18:10 -07:00
Qi Wang
d60f3bac12 Add missing field in initializer for rtree cache.
Fix a warning by -Wmissing-field-initializers.
2017-09-21 12:18:10 -07:00
David Goldblatt
9e39425bf1 Force Ubuntu "precise" for Travis CI builds.
We've been seeing strange errors in jemalloc_cpp.cpp since Travis upgraded from
precise to trusty as their default CI environment (seeming to stem from some
the new clang version finding the headers for an old version of libstdc++.  In
the long run we'll have to deal with this "for real", but at that point we may
have a better C++ story in general, making it a moot point.
2017-09-20 10:38:26 -07:00
Qi Wang
9b20a4bf70 Clear cache bin ql postfork.
This fixes a regression in 9c05490, which introduced the new cache bin ql.  The
list needs to be cleaned up after fork, same as tcache_ql.
2017-09-12 16:16:12 -07:00
Qi Wang
886053b966 Fix huge page test in test/unit/pages.
Huge pages could be disabled even if the kernel header has MAD_HUGEPAGE
defined.  Guard the huge pagetest with runtime detection.
2017-09-12 14:29:49 -07:00
Qi Wang
cf4738455d Fix a link for dirty_decay_ms in manual. 2017-09-11 13:38:45 -07:00
Qi Wang
a315688be0 Relax constraints on reentrancy for extent hooks.
If we guarantee no malloc activity in extent hooks, it's possible to make
customized hooks working on arena 0.  Remove the non-a0 assertion to enable such
use cases.
2017-08-31 11:03:34 -07:00
Qi Wang
e55c3ca267 Add stats for metadata_thp.
Report number of THPs used in arena and aggregated stats.
2017-08-30 16:47:32 -07:00
Qi Wang
47b20bb654 Change opt.metadata_thp to [disabled,auto,always].
To avoid the high RSS caused by THP + low usage arena (i.e. THP becomes a
significant percentage), added a new "auto" option which will only start using
THP after a base allocator used up the first THP region.  Starting from the
second hugepage (in a single arena), "auto" behaves the same as "always",
i.e. madvise hugepage right away.
2017-08-30 16:47:32 -07:00
David Goldblatt
ea91dfa58e Document the ialloc function abbreviations.
In the jemalloc_internal_inlines files, we have a lot of somewhat terse function
names.  This commit adds some documentation to aid in translation.
2017-08-16 17:48:44 -07:00
David Goldblatt
9c0549007d Make arena stats collection go through cache bins.
This eliminates the need for the arena stats code to "know" about tcaches; all
that it needs is a cache_bin_array_descriptor_t to tell it where to find
cache_bins whose stats it should aggregate.
2017-08-16 17:48:44 -07:00
David Goldblatt
f3170baa30 Pull out caching for a bin into its own file.
This is the first step towards breaking up the tcache and arena (since they
interact primarily at the bin level).  It should also make a future arena
caching implementation more straightforward.
2017-08-16 17:48:44 -07:00
Qi Wang
b0825351d9 Add missing mallctl unit test for abort_conf.
The abort_conf option was missed from test/unit/mallctl.
2017-08-11 22:58:58 -07:00
Faidon Liambotis
82d1a3fb31 Add support for m68k, nios2, SH3 architectures
Add minimum alignment for three more architectures, as requested by
Debian users or porters (see Debian bugs #807554, #816236, #863424).
2017-08-11 16:35:44 -07:00
Faidon Liambotis
8da69b69e6 Fix support for GNU/kFreeBSD
The configure.ac seciton right now is the same for Linux and kFreeBSD,
which results into an incorrect configuration of e.g. defining
JEMALLOC_PROC_SYS_VM_OVERCOMMIT_MEMORY instead of FreeBSD's
JEMALLOC_SYSCTL_VM_OVERCOMMIT.

GNU/kFreeBSD is really a glibc + FreeBSD kernel system, so it needs its
own entry which has a mixture of configuration options from Linux and
FreeBSD.
2017-08-11 16:35:44 -07:00
Qi Wang
3ec279ba1c Fix test/unit/pages.
As part of the metadata_thp support, We now have a separate swtich
(JEMALLOC_HAVE_MADVISE_HUGE) for MADV_HUGEPAGE availability.  Use that instead
of JEMALLOC_THP (which doesn't guard pages_huge anymore) in tests.
2017-08-11 15:57:12 -07:00
Qi Wang
8fdd9a5797 Implement opt.metadata_thp
This option enables transparent huge page for base allocators (require
MADV_HUGEPAGE support).
2017-08-11 14:51:20 -07:00
Qi Wang
d157864027 Filter out "void *newImpl" in prof output. 2017-08-08 12:28:29 -07:00
Ryan Libby
048c6679cd Remove external linkage for spin_adaptive
The external linkage for spin_adaptive was not used, and the inline
declaration of spin_adaptive that was used caused a probem on FreeBSD
where CPU_SPINWAIT is implemented as a call to a static procedure for
x86 architectures.
2017-08-08 10:30:21 -07:00
Qi Wang
1ab2ab294c Only read szind if ptr is not paged aligned in sdallocx.
If ptr is not page aligned, we know the allocation was not sampled. In this case
use the size passed into sdallocx directly w/o accessing rtree.  This improve
sdallocx efficiency in the common case (not sampled && small allocation).
2017-07-31 15:47:48 -07:00
David Goldblatt
9a39b23c9c Remove a redundant '--with-malloc-conf=tcache:false' from gen_run_tests.py
This is already tested via its inclusion in possible_malloc_conf_opts.
2017-07-31 15:36:40 -07:00
Qi Wang
3800e55a2c Bypass extent_alloc_wrapper_hard for no_move_expand.
When retain is enabled, we should not attempt mmap for in-place expansion
(large_ralloc_no_move), because it's virtually impossible to succeed, and causes
unnecessary syscalls (which can cause lock contention under load).
2017-07-31 14:04:17 -07:00
Qi Wang
2d2fa72647 Filter out "newImpl" from profiling output. 2017-07-28 14:08:00 -07:00
David Goldblatt
7c22ea7a93 Only run test/integration/sdallocx non-reentrantly.
This is a temporary workaround until we add some beefier CI machines.  Right
now, we're seeing too many OOMs for this to be useful.
2017-07-24 16:21:24 -07:00
David Goldblatt
e6aeceb606 Logging: log using the log var names directly.
Currently we have to log by writing something like:

  static log_var_t log_a_b_c = LOG_VAR_INIT("a.b.c");
  log (log_a_b_c, "msg");

This is sort of annoying.  Let's just write:

  log("a.b.c", "msg");
2017-07-24 14:55:54 -07:00
Qinfan Wu
b28f31e7ed Split out cold code path in newImpl
I noticed that the whole newImpl is inlined. Since OOM handling code is
rarely executed, we should only inline the hot path.
2017-07-24 13:37:02 -07:00
David Goldblatt
a9f7732d45 Logging: allow logging with empty varargs.
Currently, the log macro requires at least one argument after the format string,
because of the way the preprocessor handles varargs macros.  We can hide some of
that irritation by pushing the extra arguments into a varargs function.
2017-07-22 09:38:19 -07:00
Y. T. Chung
aa6c282137 Validates fd before calling fcntl 2017-07-22 07:46:30 -07:00
David T. Goldblatt
e215a7bc18 Add entry and exit logging to all core functions.
I.e. mallloc, free, the allocx API, the posix extensions.
2017-07-20 17:58:37 -07:00
David T. Goldblatt
9761b449c8 Add a logging facility.
This sets up a hierarchical logging facility, so that we can add logging
statements liberally, and turn them on in a fine-grained manner.
2017-07-20 17:58:37 -07:00
Y. T. Chung
0975b88dfd Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable.
Older Linux systems don't have O_CLOEXEC.  If that's the case, we fcntl
immediately after open, to minimize the length of the racy period in
which an
operation in another thread can leak a file descriptor to a child.
2017-07-20 14:13:33 -07:00
David Goldblatt
fb6787a78c Add a test of behavior under multi-threaded forking.
Forking a multithreaded process is dangerous but allowed, so long as the child
only executes async-signal-safe functions (e.g. exec).  Add a test to ensure
that we don't break this behavior.
2017-07-10 18:17:12 -07:00
David Goldblatt
0a4f5a7eea Fix deadlock in multithreaded fork in OS X.
On OS X, we rely on the zone machinery to call our prefork and postfork
handlers.

In zone_force_unlock, we call jemalloc_postfork_child, reinitializing all our
mutexes regardless of state, since the mutex implementation will assert if the
tid of the unlocker is different from that of the locker.  This has the effect
of unlocking the mutexes, but also fails to wake any threads waiting on them in
the parent.

To fix this, we track whether or not we're the parent or child after the fork,
and unlock or reinit as appropriate.

This resolves #895.
2017-07-10 18:17:12 -07:00
Tamir Duberstein
3f5049340e Allow toolchain to determine nm 2017-07-06 14:46:02 -07:00
Tamir Duberstein
ef55006c1d dumpbin doesn't exist in mingw 2017-07-06 14:46:02 -07:00
Tamir Duberstein
f9dfb8db73 whitespace 2017-07-06 14:46:02 -07:00
Jason Evans
aa44ddbcdd Fix a typo. 2017-07-02 21:05:23 -07:00
Jason Evans
896ed3a8b3 Merge branch 'dev' 2017-07-01 17:44:01 -07:00
Jason Evans
284edf02b0 Update ChangeLog for 5.0.1. 2017-07-01 17:34:34 -07:00
Qi Wang
cb032781bd Add extent_grow_mtx in pre_ / post_fork handlers.
This fixed the issue that could cause the child process to stuck after fork.
2017-06-29 17:01:18 -07:00
Jason Evans
2b31cf5bd2 Enforce minimum autoconf version (currently 2.68).
This resolves #912.
2017-06-29 16:23:35 -07:00
Jason Evans
c99e570a48 Make sure LG_PAGE <= LG_HUGEPAGE.
This resolves #883.
2017-06-28 18:21:47 -07:00
Qi Wang
aa363f9388 Fix pthread_sigmask() usage to block all signals. 2017-06-26 11:27:21 -07:00
Qi Wang
57beeb2fcb Switch ctl to explicitly use tsd instead of tsdn. 2017-06-23 13:27:53 -07:00
Qi Wang
425463a446 Check arena in current context in pre_reentrancy. 2017-06-23 13:27:53 -07:00
Qi Wang
d6eb8ac8f3 Set reentrancy when invoking customized extent hooks.
Customized extent hooks may malloc / free thus trigger reentry.  Support this
behavior by adding reentrancy on hook functions.
2017-06-23 13:27:53 -07:00
Jason Evans
d49ac4c709 Fix assertion typos.
Reported by Conrad Meyer.
2017-06-23 11:48:00 -07:00
Qi Wang
a3f4977217 Add thread name for background threads. 2017-06-23 10:54:54 -07:00
Qi Wang
52fc887b49 Avoid inactivity_check within background threads.
Passing is_background_thread down the decay path, so that background thread
itself won't attempt inactivity_check.  This fixes an issue with background
thread doing trylock on a mutex it already owns.
2017-06-22 16:53:58 -07:00
Jason Evans
37f3fa0941 Mask signals during background thread creation.
This prevents signals from being inadvertently delivered to background
threads.
2017-06-20 17:47:38 -07:00
Qi Wang
d35c037e03 Clear tcache_ql after fork in child. 2017-06-19 21:53:07 -07:00
Qi Wang
9b1befabbb Add minimal initialized TSD.
We use the minimal_initilized tsd (which requires no cleanup) for free()
specifically, if tsd hasn't been initialized yet.

Any other activity will transit the state from minimal to normal.  This is to
workaround the case where a thread has no malloc calls in its lifetime until
during thread termination, free() happens after tls destructors.
2017-06-15 17:55:53 -07:00
Qi Wang
ae93fb08e2 Pass tsd to tcache_flush(). 2017-06-15 17:55:53 -07:00
Qi Wang
84f6c2cae0 Log decay->nunpurged before purging.
During purging, we may unlock decay->mtx.  Therefore we should finish logging
decay related counters before attempt to purge.
2017-06-14 20:18:02 -07:00
Qi Wang
a4d6fe73cf Only abort on dlsym when necessary.
If neither background_thread nor lazy_lock is in use, do not abort on dlsym
errors.
2017-06-14 13:27:41 -07:00
Qi Wang
bdcf40a620 Add alloc hook test in test/integration/extent. 2017-06-14 09:34:29 -07:00
Qi Wang
d955d6f2be Fix extent_hooks in extent_grow_retained().
This issue caused the default extent alloc function to be incorrectly
used even when arena.<i>.extent_hooks is set.  This bug was introduced
by 411697adcd (Use exponential series to
size extents.), which was first released in 5.0.0.
2017-06-14 09:34:29 -07:00
Jason Evans
5018fe3f09 Merge branch 'dev' 2017-06-13 12:51:09 -07:00
Jason Evans
ba29113e5a Update MSVC project files. 2017-06-13 11:22:41 -07:00
Jason Evans
aae8fd95fb Update ChangeLog for 5.0.0. 2017-06-12 23:16:44 -07:00
Jason Evans
bff8db439c Update copyright dates. 2017-06-12 23:16:44 -07:00
Qi Wang
813643c6a7 Prevent background threads from running in post_reset().
We lookup freed extents for testing in post_reset.  Take background_thread lock
so that the extents are not modified at the same time.
2017-06-12 08:56:14 -07:00
Qi Wang
394df9519d Combine background_thread started / paused into state. 2017-06-12 08:56:14 -07:00
Qi Wang
b83b5ad44a Not re-enable background thread after fork.
Avoid calling pthread_create in postfork handlers.
2017-06-12 08:56:14 -07:00
Qi Wang
464cb60490 Move background thread creation to background_thread_0.
To avoid complications, avoid invoking pthread_create "internally", instead rely
on thread0 to launch new threads, and also terminating threads when asked.
2017-06-12 08:56:14 -07:00
Jason Evans
13685ab1b7 Normalize background thread configuration.
Also fix a compilation error #ifndef JEMALLOC_PTHREAD_CREATE_WRAPPER.
2017-06-08 23:01:26 -07:00
Jason Evans
94d655b8bd Update a UTRACE() size argument. 2017-06-08 15:33:52 -07:00
Jason Evans
faaf458bad Remove redundant typedefs.
Pre-C11 compilers do not support typedef redefinition.
2017-06-08 13:28:57 -07:00
Qi Wang
5642f03cae Add internal tsd for background_thread. 2017-06-08 10:02:18 -07:00
Qi Wang
73713fbb27 Drop high rank locks when creating threads.
Avoid holding arenas_lock and background_thread_lock when creating background
threads, because pthread_create may take internal locks, and potentially cause
deadlock with jemalloc internal locks.
2017-06-08 10:02:18 -07:00
Qi Wang
00869e39a3 Make tsd no-cleanup during tsd reincarnation.
Since tsd cleanup isn't guaranteed when reincarnated, we set up tsd in a way
that needs no cleanup, by making it going through slow path instead.
2017-06-07 11:03:49 -07:00
Qi Wang
29c2577ee0 Remove assertions on extent_hooks being default.
It's possible to customize the extent_hooks while still using part of the
default implementation.
2017-06-05 10:56:40 -07:00
Qi Wang
3a813946fb Take background thread lock when setting extent hooks. 2017-06-05 10:56:25 -07:00
Qi Wang
530c07a45b Set reentrancy level to 1 during init.
This makes sure we go down slow path w/ a0 in init.
2017-06-02 12:59:21 -07:00
Qi Wang
340071f0cf Set isthreaded when enabling background_thread. 2017-06-01 17:34:49 -07:00
Qi Wang
c84ec3e9da Fix background thread creation.
The state initialization should be done before pthread_create.
2017-06-01 09:00:07 -07:00
Jason Evans
fd0fa003e1 Test with background_thread:true.
Add testing for background_thread:true, and condition a xallocx() -->
rallocx() escalation assertion to allow for spurious in-place rallocx()
following xallocx() failure.
2017-06-01 08:55:27 -07:00
Jason Evans
b511232fcd Refactor/fix background_thread/percpu_arena bootstrapping.
Refactor bootstrapping such that dlsym() is called during the
bootstrapping phase that can tolerate reentrant allocation.
2017-06-01 08:55:27 -07:00
Jason Evans
596b479d83 Skip default tcache testing if !opt_tcache. 2017-06-01 08:55:27 -07:00
David Goldblatt
fa35463d56 Witness assertions: only assert locklessness when non-reentrant.
Previously we could still hit these assertions down error paths or in the
extended API.
2017-05-31 17:02:54 -07:00
Qi Wang
508f54b02b Use real pthread_create for creating background threads. 2017-05-31 16:48:13 -07:00
Jason Evans
9a86c9bd30 Clean source directory before building tests in object directories. 2017-05-31 15:07:30 -07:00
David Goldblatt
8261e581be Header refactoring: Pull size helpers out of jemalloc module. 2017-05-31 13:08:45 -07:00
David Goldblatt
041e041e1f Header refactoring: unify and de-catchall mutex_pool. 2017-05-31 13:08:45 -07:00
David Goldblatt
98774e64a4 Header refactoring: unify and de-catchall extent_mmap module. 2017-05-31 13:08:45 -07:00
David Goldblatt
93284bb53d Header refactoring: unify and de-catchall extent_dss. 2017-05-31 13:08:45 -07:00
David Goldblatt
44f9bd147a Header refactoring: unify and de-catchall rtree module. 2017-05-31 13:08:45 -07:00
David Goldblatt
b4b4a98bc8 Add /run_tests.out/ to .gitignore. 2017-05-31 10:37:48 -07:00
Jason Evans
10d090aae9 Pass the O_CLOEXEC flag to open(2).
This resolves #528.
2017-05-31 08:50:35 -07:00
Qi Wang
66813916b5 Track background thread status separately at fork.
Use a separate boolean to track the enabled status, instead of leaving the
global background thread status inconsistent.
2017-05-31 08:27:31 -07:00
Qi Wang
2e4d1a4e30 Output total_wait_ns for bin mutexes. 2017-05-30 22:25:11 -07:00
Jason Evans
ff8062a511 Add jemalloc prefix to allocator functions pruned by jeprof.
This resolves #507.
2017-05-30 20:22:00 -07:00
Qi Wang
7578b0e929 Explicitly say so when aborting on opt_abort_conf. 2017-05-30 17:37:35 -07:00
Jason Evans
685c97fc43 More thoroughly document the *.{nmalloc,ndalloc,nrequests} mallctls.
This resolves #412.
2017-05-30 15:58:36 -07:00
Jason Evans
c606a87d2a Add the --disable-thp option to support cross compiling.
This resolves #669.
2017-05-30 11:30:54 -07:00
Qi Wang
bf6673a070 Fix npages during arena_decay_epoch_advance().
We do not lock extents while advancing epoch.  This change makes sure that we
only read npages from extents once in order to avoid any inconsistency.
2017-05-30 10:26:53 -07:00
Jason Evans
4f0963b883 Add test for excessive retained memory. 2017-05-29 17:27:18 -07:00
Jason Evans
168793a1c1 Fix extent_grow_next management.
Fix management of extent_grow_next to serialize operations that may grow
retained memory.  This assures that the sizes of the newly allocated
extents correspond to the size classes in the intended growth sequence.

Fix management of extent_grow_next to skip size classes if a request is
too large to be satisfied by the next size in the growth sequence.  This
avoids the potential for an arbitrary number of requests to bypass
triggering extent_grow_next increases.

This resolves #858.
2017-05-29 17:27:18 -07:00
Jason Evans
a16114866a Fix OOM paths in extent_grow_retained(). 2017-05-29 17:27:18 -07:00
Qi Wang
d5ef5ae934 Add opt.stats_print_opts.
The value is passed to atexit(3)-triggered malloc_stats_print() calls.
2017-05-29 11:54:00 -07:00
Qi Wang
49505e558b Make test/unit/background_thread not flaky. 2017-05-26 21:15:15 -07:00
Qi Wang
b86d271cbf Added opt_abort_conf: abort on invalid config options. 2017-05-26 21:14:28 -07:00
Jason Evans
57aaa53f2b Fix run_tests to avoid percpu_arena on !Linux. 2017-05-26 16:17:01 -07:00
Qi Wang
927239b910 Cleanup smoothstep.sh / .h.
h_step_sum was used to compute moving sum.  Not in use anymore.
2017-05-25 16:52:10 -07:00
Qi Wang
1df18d7c83 Fix stats.mapped during deallocation. 2017-05-24 15:57:46 -07:00
Jason Evans
67c93c332a Refactor run_tests to increase parallelism.
Rather than relying on parallel make to build individual configurations
one at a time, use xargs to build multiple configurations in parallel.
This allows the configure scripts to run in parallel.  On a 14-core
system (28 hyperthreads), this increases average CPU utilization from
~20% to ~90%.
2017-05-24 15:55:19 -07:00
David Goldblatt
18ecbfa89e Header refactoring: unify and de-catchall mutex module 2017-05-24 15:27:30 -07:00
David Goldblatt
9f822a1fd7 Header refactoring: unify and de-catchall witness code. 2017-05-24 15:27:30 -07:00
Jason Evans
36195c8f4d Disable percpu_arena by default. 2017-05-23 15:32:50 -07:00
Jason Evans
196a53c2ae Do not assume dss never decreases.
An sbrk() caller outside jemalloc can decrease the dss, so add a
separate atomic boolean to explicitly track whether jemalloc is
concurrently calling sbrk(), rather than depending on state outside
jemalloc's full control.

This resolves #802.
2017-05-23 15:31:29 -07:00
Jason Evans
067b970130 Add dss:primary testing.
Generalize the run_tests.sh and .travis.yml test generation to handle
combinations of arguments to the --with-malloc-conf configure option,
and merge "dss:primary" into the existing "tcache:false" testing.
2017-05-23 15:31:29 -07:00
Jason Evans
9b1038d19c Do not hold the base mutex while calling extent hooks.
Drop the base mutex while allocating new base blocks, because extent
allocation can enter code that prohibits holding non-core mutexes, e.g.
the extent_[d]alloc() and extent_purge_forced_wrapper() calls in
extent_alloc_dss().

This partially resolves #802.
2017-05-23 15:31:29 -07:00
Qi Wang
eeefdf3ce8 Fix # of unpurged pages in decay algorithm.
When # of dirty pages move below npages_limit (e.g. they are reused), we should
not lower number of unpurged pages because that would cause the reused pages to
be double counted in the backlog (as a result, decay happen slower than it
should).  Instead, set number of unpurged to the greater of current npages and
npages_limit.

Added an assertion: the ceiling # of pages should be greater than npages_limit.
2017-05-23 13:48:30 -07:00
Qi Wang
0eae838b0d Check for background thread inactivity on extents_dalloc.
To avoid background threads sleeping forever with idle arenas, we eagerly check
background threads' sleep time after extents_dalloc, and signal the thread if
necessary.
2017-05-23 12:26:20 -07:00
Qi Wang
2c368284d2 Add tests for background threads. 2017-05-23 12:26:20 -07:00
Qi Wang
44559e7cf1 Add documentation for background_thread related options. 2017-05-23 12:26:20 -07:00
Qi Wang
5f5ed2198e Add profiling for the background thread mutex. 2017-05-23 12:26:20 -07:00
Qi Wang
2bee0c6251 Add background thread related stats. 2017-05-23 12:26:20 -07:00
Qi Wang
b693c7868e Implementing opt.background_thread.
Added opt.background_thread to enable background threads, which handles purging
currently.  When enabled, decay ticks will not trigger purging (which will be
left to the background threads).  We limit the max number of threads to NCPUs.
When percpu arena is enabled, set CPU affinity for the background threads as
well.

The sleep interval of background threads is dynamic and determined by computing
number of pages to purge in the future (based on backlog).
2017-05-23 12:26:20 -07:00
David Goldblatt
3f685e8824 Protect the rtree/extent interactions with a mutex pool.
Instead of embedding a lock bit in rtree leaf elements, we associate extents
with a small set of mutexes.  This gets us two things:

- We can use the system mutexes.  This (hypothetically) protects us from
  priority inversion, and lets us stop doing a backoff/sleep loop, instead
  opting for precise wakeups from the mutex.
- Cuts down on the number of mutex acquisitions we have to do (from 4 in the
  worst case to two).

We end up simplifying most of the rtree code (which no longer has to deal with
locking or concurrency at all), at the cost of additional complexity in the
extent code: since the mutex protecting the rtree leaf elements is determined by
reading the extent out of those elements, the initial read is racy, so that we
may acquire an out of date mutex.  We re-check the extent in the leaf after
acquiring the mutex to protect us from this race.
2017-05-19 14:21:27 -07:00
David Goldblatt
26c792e61a Allow mutexes to take a lock ordering enum at construction.
This lets us specify whether and how mutexes of the same rank are allowed to be
acquired.  Currently, we only allow two polices (only a single mutex at a given
rank at a time, and mutexes acquired in ascending order), but we can plausibly
allow more (e.g. the "release uncontended mutexes before blocking").
2017-05-19 14:21:27 -07:00
Jason Evans
6e62c62862 Refactor *decay_time into *decay_ms.
Support millisecond resolution for decay times.  Among other use cases
this makes it possible to specify a short initial dirty-->muzzy decay
phase, followed by a longer muzzy-->clean decay phase.

This resolves #812.
2017-05-18 11:33:45 -07:00
Qi Wang
baf3e294e0 Add stats: arena uptime. 2017-05-18 10:04:28 -07:00
Jason Evans
04fec5e084 Avoid over-rebuilding due to namespace mangling.
Take care not to touch generated namespace mangling headers unless their
contents would change.

This resolves #838.
2017-05-17 10:06:58 -07:00
Qi Wang
b8ba3c3132 Use srcroot path for private_namespace.sh. 2017-05-16 09:30:33 -07:00
Jason Evans
18a83681cf Refactor (MALLOCX_ARENA_MAX + 1) to be MALLOCX_ARENA_LIMIT.
This resolves #673.
2017-05-14 10:14:23 -07:00
Jason Evans
909f0482e4 Automatically generate private symbol name mangling macros.
Rather than using a manually maintained list of internal symbols to
drive name mangling, add a compilation phase to automatically extract
the list of internal symbols.

This resolves #677.
2017-05-11 23:06:54 -07:00
Jason Evans
a4ae9707da Remove unused private_unnamespace infrastructure. 2017-05-11 23:06:54 -07:00
Jason Evans
a268af5085 Stop depending on JEMALLOC_N() for function interception during testing.
Instead, always define function pointers for interceptable functions,
but mark them const unless testing, so that the compiler can optimize
out the pointer dereferences.
2017-05-11 23:06:54 -07:00
Jason Evans
b3b033eefd Do not build in parallel on AppVeyor.
The compiler database used by MSVC is increasingly becoming corrupt,
presumably due to concurrency-related corruption, despite the -FS
compiler flag being specified as recommended.
2017-05-11 23:06:54 -07:00
Arkady Shapkin
6f58e630b6 Update and rename INSTALL to INSTALL.md 2017-05-11 22:53:34 -07:00
Jason Evans
17ddddee10 Specify -Werror for run_tests builds. 2017-05-11 22:11:35 -07:00
Jason Evans
81ef365622 Avoid compiler warnings on Windows. 2017-05-11 18:06:20 -07:00
Jason Evans
11d2f39d96 Remove mutex_prof_data_t redeclaration.
Redeclaration causes compilations failures with e.g. gcc 4.2.1 on
FreeBSD.  This regression was introduced by
89e2d3c12b (Header refactoring: ctl -
unify and remove from catchall.).
2017-05-11 10:49:43 -07:00
Jason Evans
31baedbbb9 Add --with-version=VERSION .
This simplifies configuration when embedding a jemalloc release into
another project's git repository.

This resolves #811.
2017-05-03 10:45:43 -07:00
Jason Evans
0798fe6e70 Fix rtree_leaf_elm_szind_slab_update().
Re-read the leaf element when atomic CAS fails due to a race with
another thread that has locked the leaf element, since
atomic_compare_exchange_strong_p() overwrites the expected value with
the actual value on failure.  This regression was introduced by
0ee0e0c155 (Implement compact rtree leaf
element representation.).

This resolves #798.
2017-05-03 08:52:33 -07:00
Jason Evans
344dd342dd rtree_leaf_elm_extent_write() --> rtree_leaf_elm_extent_lock_write()
Refactor rtree_leaf_elm_extent_write() as
rtree_leaf_elm_extent_lock_write(), so that whether the leaf element is
currently acquired is separate from what lock state to write.  This
allows for a relaxed atomic read when releasing the lock.
2017-05-03 08:52:33 -07:00
rustyx
1c982c37d9 Make VS2015 project work again 2017-05-02 08:20:29 -07:00
Qi Wang
fc1aaf13fe Revert "Use trylock in tcache_bin_flush when possible."
This reverts commit 8584adc451.  Production
results not favorable.  Will investigate separately.
2017-05-01 14:49:42 -07:00
David Goldblatt
209f2926b8 Header refactoring: tsd - cleanup and dependency breaking.
This removes the tsd macros (which are used only for tsd_t in real builds).  We
break up the circular dependencies involving tsd.

We also move all tsd access through getters and setters.  This allows us to
assert that we only touch data when tsd is in a valid state.

We simplify the usages of the x macro trick, removing all the customizability
(get/set, init, cleanup), moving the lifetime logic to tsd_init and tsd_cleanup.
This lets us make initialization order independent of order within tsd_t.
2017-05-01 10:49:56 -07:00
Jason Evans
c86c8f4ffb Add extent_destroy_t and use it during arena destruction.
Add the extent_destroy_t extent destruction hook to extent_hooks_t, and
use it during arena destruction.  This hook explicitly communicates to
the callee that the extent must be destroyed or tracked for later reuse,
lest it be permanently leaked.  Prior to this change, retained extents
could unintentionally be leaked if extent retention was enabled.

This resolves #560.
2017-04-29 09:24:12 -07:00
Jason Evans
b9ab04a191 Refactor !opt.munmap to opt.retain. 2017-04-29 09:24:12 -07:00
Qi Wang
d901a37775 Revert "Use try_flush first in tcache_dalloc."
This reverts commit b0c2a28280.  Production
benchmark shows this caused significant regression in both CPU and memory
consumption.  Will investigate separately later on.
2017-04-28 10:59:04 -07:00
Qi Wang
5c56603e91 Inline tcache_bin_flush_small_impl / _large_impl. 2017-04-27 17:49:39 -07:00
Qi Wang
b0c2a28280 Use try_flush first in tcache_dalloc.
Only do must_flush if try_flush didn't manage to free anything.
2017-04-25 17:21:33 -07:00
Qi Wang
8584adc451 Use trylock in tcache_bin_flush when possible.
During tcache gc, use tcache_bin_try_flush_small / _large so that we can skip
items with their bins locked already.
2017-04-25 17:21:33 -07:00
Qi Wang
e2aad5e810 Remove redundant extent lookup in tcache_bin_flush_large. 2017-04-25 16:50:12 -07:00
Qi Wang
05775a3736 Avoid prof_dump during reentrancy. 2017-04-25 12:54:36 -07:00
David Goldblatt
268843ac68 Header refactoring: pages.h - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
dab4beb277 Header refactoring: hash - unify and remove from catchall. 2017-04-25 09:51:38 -07:00
David Goldblatt
89e2d3c12b Header refactoring: ctl - unify and remove from catchall.
In order to do this, we introduce the mutex_prof module, which breaks a circular
dependency between ctl and prof.
2017-04-25 09:51:38 -07:00
Jason Evans
c67c3e4a63 Replace --disable-munmap with opt.munmap.
Control use of munmap(2) via a run-time option rather than a
compile-time option (with the same per platform default).  The old
behavior of --disable-munmap can be achieved with
--with-malloc-conf=munmap:false.

This partially resolves #580.
2017-04-24 20:37:16 -07:00
Jason Evans
e2cc6280ed Remove --enable-code-coverage.
This option hasn't been particularly useful since the original pre-3.0.0
push to broaden test coverage.

This partially resolves #580.
2017-04-24 16:33:04 -07:00
Jason Evans
0f63396b23 Remove --disable-cc-silence.
The explicit compiler warning suppression controlled by this option is
universally desirable, so remove the ability to disable suppression.

This partially resolves #580.
2017-04-24 15:02:45 -07:00
Qi Wang
cf6035e1ee Use trylock in arena_decay_impl().
If another thread is working on decay, we don't have to wait for the mutex.
2017-04-24 13:23:55 -07:00
Qi Wang
f970c497dc Implement malloc_mutex_trylock() w/ proper stats update. 2017-04-24 13:23:55 -07:00
Jason Evans
af76f0e5d2 Remove --with-lg-tiny-min.
This option isn't useful in practice.

This partially resolves #580.
2017-04-24 11:48:28 -07:00
Jason Evans
b54530020f Remove --with-lg-size-class-group.
Four size classes per size doubling has proven to be a universally good
choice for the entire 4.x release series, so there's little point to
preserving this configurability.

This partially resolves #580.
2017-04-24 11:28:49 -07:00
David Goldblatt
120c7a747f Header refactoring: bitmap - unify and remove from catchall. 2017-04-24 10:33:21 -07:00
David Goldblatt
d6b5c7e0f6 Header refactoring: stats - unify and remove from catchall 2017-04-24 10:33:21 -07:00
David Goldblatt
36abf78aa9 Header refactoring: move smoothstep.h out of the catchall. 2017-04-24 10:33:21 -07:00
David Goldblatt
31b43219db Header refactoring: size_classes module - remove from the catchall 2017-04-24 10:33:21 -07:00
David Goldblatt
68da2361d2 Header refactoring: ckh module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
bf2dc7e678 Header refactoring: ticker module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
fa3ad730c4 Header refactoring: prng module - remove from the catchall and unify. 2017-04-24 10:33:21 -07:00
David Goldblatt
4d2e4bf5eb Get rid of most of the various inline macros. 2017-04-24 10:33:21 -07:00
Jason Evans
7d86c92c61 Add missing 'test' to LG_SIZEOF_PTR tests.
This fixes a bug/regression introduced by
a01f993077 (Only disable munmap(2) by
default on 64-bit Linux.).
2017-04-24 10:15:52 -07:00
Qi Wang
3aac709029 Output MALLOC_CONF and debug cmd when test failure happens. 2017-04-21 22:52:02 -07:00
David Goldblatt
425253e2cd Enable -Wundef, when supported.
This can catch bugs in which one header defines a numeric constant, and another
uses it without including the defining header. Undefined preprocessor symbols
expand to '0', so that this will compile fine, silently doing the math wrong.
2017-04-21 17:03:56 -07:00
Jason Evans
3823effe12 Remove --enable-ivsalloc.
Continue to use ivsalloc() when --enable-debug is specified (and add
assertions to guard against 0 size), but stop providing a documented
explicit semantics-changing band-aid to dodge undefined behavior in
sallocx() and malloc_usable_size().  ivsalloc() remains compiled in,
unlike when #211 restored --enable-ivsalloc, and if
JEMALLOC_FORCE_IVSALLOC is defined during compilation, sallocx() and
malloc_usable_size() will still use ivsalloc().

This partially resolves #580.
2017-04-21 14:34:35 -07:00
Jason Evans
b2a8453a3f Remove --disable-tls.
This option is no longer useful, because TLS is correctly configured
automatically on all supported platforms.

This partially resolves #580.
2017-04-21 11:12:29 -07:00
Jim Chen
ae248a2160 Use openat syscall if available
Some architectures like AArch64 may not have the open syscall because it
was superseded by the openat syscall, so check and use SYS_openat if
SYS_open is not available.

Additionally, Android headers for AArch64 define SYS_open to __NR_open,
even though __NR_open is undefined. Undefine SYS_open in that case so
SYS_openat is used.
2017-04-21 10:58:42 -07:00
Jason Evans
4403c9ab44 Remove --disable-tcache.
Simplify configuration by removing the --disable-tcache option, but
replace the testing for that configuration with
--with-malloc-conf=tcache:false.

Fix the thread.arena and thread.tcache.flush mallctls to work correctly
if tcache is disabled.

This partially resolves #580.
2017-04-21 10:06:12 -07:00
Qi Wang
5aa46f027d Bypass extent tracking for auto arenas.
Tracking extents is required by arena_reset.  To support this, the extent
linkage was used for tracking 1) large allocations, and 2) full slabs.  However
modifying the extent linkage could be an expensive operation as it likely incurs
cache misses.  Since we forbid arena_reset on auto arenas, let's bypass the
linkage operations for auto arenas.
2017-04-21 00:29:18 -07:00
Jason Evans
fed9a880c8 Trim before commit in extent_recycle().
This avoids creating clean committed pages as a side effect of aligned
allocation.  For configurations that decommit memory, purged pages are
decommitted, and decommitted extents cannot be coalesced with committed
extents.  Unless the clean committed pages happen to be selected during
allocation, they cause unnecessary permanent extent fragmentation.

This resolves #766.
2017-04-19 21:05:12 -07:00
Qi Wang
acf4c8ae33 Output 4 counters for bin mutexes instead of just 2. 2017-04-19 14:53:32 -07:00
Jason Evans
da4cff0279 Support --with-lg-page values larger than system page size.
All mappings continue to be PAGE-aligned, even if the system page size
is smaller.  This change is primarily intended to provide a mechanism
for supporting multiple page sizes with the same binary; smaller page
sizes work better in conjunction with jemalloc's design.

This resolves #467.
2017-04-18 19:01:04 -07:00
Jason Evans
45f087eb03 Revert "Remove BITMAP_USE_TREE."
Some systems use a native 64 KiB page size, which means that the bitmap
for the smallest size class can be 8192 bits, not just 512 bits as when
the page size is 4 KiB.  Linear search in bitmap_{sfu,ffu}() is
unacceptably slow for such large bitmaps.

This reverts commit 7c00f04ff4.
2017-04-18 19:01:04 -07:00
David Goldblatt
38e847c1c5 Header refactoring: unify spin.h and move it out of the catch-all. 2017-04-18 18:35:03 -07:00
David Goldblatt
418d96a86c Header refactoring: unify nstime.h and move it out of the catch-all 2017-04-18 18:35:03 -07:00
David Goldblatt
7ebc83894f Header refactoring: move jemalloc_internal_types.h out of the catch-all 2017-04-18 18:35:03 -07:00
David Goldblatt
d9ec36e22d Header refactoring: move assert.h out of the catch-all 2017-04-18 18:35:03 -07:00
David Goldblatt
f692e6c214 Header refactoring: move util.h out of the catchall 2017-04-18 18:35:03 -07:00
David Goldblatt
54373be084 Header refactoring: move malloc_io.h out of the catchall 2017-04-18 18:35:03 -07:00
David Goldblatt
0b00ffe55f Header refactoring: move bit_util.h out of the catchall 2017-04-18 18:35:03 -07:00
David Goldblatt
22366518b7 Move CPP_PROLOGUE and CPP_EPILOGUE to the .cpp
This lets us avoid having to specify them in every C file.
2017-04-18 18:35:03 -07:00
Jason Evans
a01f993077 Only disable munmap(2) by default on 64-bit Linux.
This reduces the likelihood of address space exhaustion on 32-bit
systems.

This resolves #350.
2017-04-17 16:41:01 -07:00
Jason Evans
c43a83d225 Fix LD_PRELOAD_VAR configuration logic for 64-bit AIX. 2017-04-17 16:41:01 -07:00
Qi Wang
855c127348 Remove the function alignment of prof_backtrace.
This was an attempt to avoid triggering slow path in libunwind, however turns
out to be ineffective.
2017-04-17 16:19:32 -07:00
Jason Evans
881fbf762f Prefer old/low extent_t structures during reuse.
Rather than using a LIFO queue to track available extent_t structures,
use a red-black tree, and always choose the oldest/lowest available
during reuse.
2017-04-17 14:47:45 -07:00
Jason Evans
76b35f4b2f Track extent structure serial number (esn) in extent_t.
This enables stable sorting of extent_t structures.
2017-04-17 14:47:45 -07:00
Jason Evans
69aa552809 Allocate increasingly large base blocks.
Limit the total number of base block by leveraging the exponential
size class sequence, similarly to extent_grow_retained().
2017-04-17 14:47:45 -07:00
Jason Evans
675701660c Update base_unmap() to match extent_dalloc_wrapper().
Reverse the order of forced versus lazy purging attempts in
base_unmap(), in order to match the order in extent_dalloc_wrapper(),
which was reversed by 64e458f5cd
(Implement two-phase decay-based purging.).
2017-04-17 14:47:45 -07:00
Qi Wang
3c9c41edb2 Improve rtree cache with a two-level cache design.
Two levels of rcache is implemented: a direct mapped cache as L1, combined with
a LRU cache as L2.  The L1 cache offers low cost on cache hit, but could suffer
collision under circumstances.  This is complemented by the L2 LRU cache, which
is slower on cache access (overhead from linear search + reordering), but solves
collison of L1 rather well.
2017-04-17 12:05:23 -07:00
Qi Wang
d16f1e53df Skip percpu arena when choosing iarena. 2017-04-16 21:34:44 -07:00
Qi Wang
c2fcf9c2cf Switch to fine-grained reentrancy support.
Previously we had a general detection and support of reentrancy, at the cost of
having branches and inc / dec operations on fast paths.  To avoid taxing fast
paths, we move the reentrancy operations onto tsd slow state, and only modify
reentrancy level around external calls (that might trigger reentrancy).
2017-04-14 19:48:06 -07:00
Qi Wang
b348ba29bb Bundle 3 branches on fast path into tsd_state.
Added tsd_state_nominal_slow, which on fast path malloc() incorporates
tcache_enabled check, and on fast path free() bundles both malloc_slow and
tcache_enabled branches.
2017-04-14 16:58:08 -07:00
Qi Wang
ccfe68a916 Pass alloc_ctx down profiling path.
With this change, when profiling is enabled, we avoid doing redundant rtree
lookups. Also changed dalloc_atx_t to alloc_atx_t, as it's now used on
allocation path as well (to speed up profiling).
2017-04-12 13:55:39 -07:00
Qi Wang
f35213bae4 Pass dalloc_ctx down the sdalloc path.
This avoids redundant rtree lookups.
2017-04-12 13:55:39 -07:00
David Goldblatt
e709fae1d7 Header refactoring: move atomic.h out of the catch-all 2017-04-11 11:52:30 -07:00
David Goldblatt
743d940dc3 Header refactoring: Split up jemalloc_internal.h
This is a biggy.  jemalloc_internal.h has been doing multiple jobs for a while
now:
- The source of system-wide definitions.
- The catch-all include file.
- The module header file for jemalloc.c

This commit splits up this functionality.  The system-wide definitions
responsibility has moved to jemalloc_preamble.h.  The catch-all include file is
now jemalloc_internal_includes.h.  The module headers for jemalloc.c are now in
jemalloc_internal_[externs|inlines|types].h, just as they are for the other
modules.
2017-04-11 11:52:30 -07:00
David Goldblatt
0237870c60 Header refactoring: break out ql.h dependencies 2017-04-11 11:52:30 -07:00
David Goldblatt
610cb83419 Header refactoring: break out qr.h dependencies 2017-04-11 11:52:30 -07:00
David Goldblatt
63a5cd4cc2 Header refactoring: break out rb.h dependencies 2017-04-11 11:52:30 -07:00
David Goldblatt
2f00ce4da7 Header refactoring: break out ph.h dependencies 2017-04-11 11:52:30 -07:00
David Goldblatt
57e36e1a12 Header refactoring: Add CPP_PROLOGUE and CPP_EPILOGUE macros 2017-04-11 11:52:30 -07:00
Qi Wang
bfa530b75b Pass dealloc_ctx down free() fast path.
This gets rid of the redundent rtree lookup down fast path.
2017-04-11 09:58:12 -07:00
David Goldblatt
8209df24ea Turn on -Werror for travis CI builds 2017-04-10 17:12:36 -07:00
Rafael Folco
701daa5298 Port CPU_SPINWAIT to __powerpc64__
Hyper-threaded CPUs may need a special instruction inside spin loops in
order to yield to another virtual CPU. The 'pause' instruction that is
available for x86 is not supported on Power.
Apparently the extended mnemonics like yield, mdoio, and mdoom are not
actually implemented on POWER8, although mentioned in the ISA 2.07
document. The recommended magic bits are an 'or 31,31,31'.
2017-04-10 12:33:02 -07:00
Qi Wang
04ef218d87 Move reentrancy_level to the beginning of TSD. 2017-04-07 16:25:43 -07:00
David Goldblatt
b407a65401 Add basic reentrancy-checking support, and allow arena_new to reenter.
This checks whether or not we're reentrant using thread-local data, and, if we
are, moves certain internal allocations to use arena 0 (which should be properly
initialized after bootstrapping).

The immediate thing this allows is spinning up threads in arena_new, which will
enable spinning up background threads there.
2017-04-07 14:10:27 -07:00
David Goldblatt
0a0fcd3e6a Add hooking functionality
This allows us to hook chosen functions and do interesting things there (in
particular: reentrancy checking).
2017-04-07 14:10:27 -07:00
Qi Wang
36bd90b962 Optimizing TSD and thread cache layout.
1) Re-organize TSD so that frequently accessed fields are closer to the
beginning and more compact.  Assuming 64-bit, the first 2.5 cachelines now
contains everything needed on tcache fast path, expect the tcache struct itself.

2) Re-organize tcache and tbins.  Take lg_fill_div out of tbin, and reduce tbin
to 24 bytes (down from 32). Split tbins into tbins_small and tbins_large, and
place tbins_small close to the beginning.
2017-04-07 14:06:17 -07:00
Qi Wang
4dec507546 Bypass witness_fork in TSD when !config_debug.
With the tcache change, we plan to leave some blank space when !config_debug
(unused tbins, witnesses) at the end of the tsd. Let's not touch the memory.
2017-04-07 14:06:17 -07:00
Qi Wang
0fba57e579 Get rid of tcache_enabled_t as we have runtime init support. 2017-04-07 10:42:29 -07:00
Qi Wang
fde3e20cc0 Integrate auto tcache into TSD.
The embedded tcache is initialized upon tsd initialization.  The avail arrays
for the tbins will be allocated / deallocated accordingly during init / cleanup.

With this change, the pointer to the auto tcache will always be available, as
long as we have access to the TSD.  tcache_available() (called in tcache_get())
is provided to check if we should use tcache.
2017-04-07 09:55:14 -07:00
David Goldblatt
eeabdd2466 Remove the pre-C11-atomics API, which is now unused 2017-04-05 16:25:37 -07:00
David Goldblatt
074f2256ca Make prof's cum_gctx a C11-style atomic 2017-04-05 16:25:37 -07:00
David Goldblatt
5dcc13b342 Make the mutex n_waiting_thds field a C11-style atomic 2017-04-05 16:25:37 -07:00
David Goldblatt
492a941f49 Convert extent module to use C11-style atomcis 2017-04-05 16:25:37 -07:00
David Goldblatt
30d74db08e Convert accumbytes in prof_accum_t to C11 atomics, when possible 2017-04-05 16:25:37 -07:00
David Goldblatt
55d992c48c Make extent_dss use C11-style atomics 2017-04-05 16:25:37 -07:00
David Goldblatt
92aafb0efe Make base_t's extent_hooks field C11-atomic 2017-04-05 16:25:37 -07:00
David Goldblatt
56b72c7b17 Transition arena struct fields to C11 atomics 2017-04-05 16:25:37 -07:00
David Goldblatt
bc32ec3503 Move arena-tracking atomics in jemalloc.c to C11-style 2017-04-05 16:25:37 -07:00
David Goldblatt
864adb7f42 Transition e_prof_tctx in struct extent to C11 atomics 2017-04-04 16:46:04 -07:00
David Goldblatt
7da04a6b09 Convert prng module to use C11-style atomics 2017-04-04 16:45:52 -07:00
Qi Wang
492e9f301e Make the tsd member init functions to take tsd_t * type. 2017-04-04 14:06:07 -07:00
Qi Wang
d3cda3423c Do proper cleanup for tsd_state_reincarnated.
Also enable arena_bind under non-nominal state, as the cleanup will be handled
correctly now.
2017-04-04 00:34:49 -07:00
Qi Wang
51d3682950 Remove the leafkey NULL check in leaf_elm_lookup. 2017-04-04 00:27:35 -07:00
Qi Wang
9ed84b0d45 Add init function support to tsd members.
This will facilitate embedding tcache into tsd, which will require proper
initialization cannot be done via the static initializer.  Make tsd->rtree_ctx
to be initialized via rtree_ctx_data_init().
2017-04-04 00:19:21 -07:00
Aliaksey Kandratsenka
5bf800a542 issue-586: detect main executable even if PIE is active
Previous logic of detecting main program addresses is to assume that
main executable is at least addressess. With PIE (active by default on
Ubuntus) it doesn't work.

In order to deal with that, we're attempting to find main executable
mapping in /proc/[pid]/maps. And old logic is preserved too just in
case.
2017-04-03 19:02:51 -07:00
Qi Wang
d4e98bc0b2 Lookup extent once per time during tcache_flush_small / _large.
Caching the extents on stack to avoid redundant looking up overhead.
2017-03-28 09:58:25 -07:00
Jason Evans
07f4f93434 Move arena_slab_data_t's nfree into extent_t's e_bits.
Compact extent_t to 128 bytes on 64-bit systems by moving
arena_slab_data_t's nfree into extent_t's e_bits.

Cacheline-align extent_t structures so that they always cross the
minimum number of cacheline boundaries.

Re-order extent_t fields such that all fields except the slab bitmap
(and overlaid heap profiling context pointer) are in the first
cacheline.

This resolves #461.
2017-03-27 22:43:39 -07:00
Qi Wang
af3d737a9a Simplify rtree cache replacement policy.
To avoid memmove on free() fast path, simplify the cache replacement policy to
only bubble up the cache hit element by 1.
2017-03-27 13:42:31 -07:00
Jason Evans
c6d1819e48 Simplify rtree_clear() to avoid locking. 2017-03-27 13:22:52 -07:00
Jason Evans
4020523f67 Fix a race in rtree_szind_slab_update() for RTREE_LEAF_COMPACT. 2017-03-27 13:22:36 -07:00
Jason Evans
7c00f04ff4 Remove BITMAP_USE_TREE.
Remove tree-structured bitmap support, in order to reduce complexity and
ease maintenance.  No bitmaps larger than 512 bits have been necessary
since before 4.0.0, and there is no current plan that would increase
maximum bitmap size.  Although tree-structured bitmaps were used on
32-bit platforms prior to this change, the overall benefits were
questionable (higher metadata overhead, higher bitmap modification cost,
marginally lower search cost).
2017-03-27 12:18:40 -07:00
Jason Evans
6258176c87 Fix bitmap_ffu() to work with 3+ levels. 2017-03-27 12:18:40 -07:00
Jason Evans
735ad8210c Pack various extent_t fields into a bitfield.
This reduces sizeof(extent_t) from 160 to 136 on x64.
2017-03-25 23:30:13 -07:00
Jason Evans
0591c204b4 Store arena index rather than (arena_t *) in extent_t. 2017-03-25 23:30:13 -07:00
Jason Evans
5e12223925 Fix BITMAP_USE_TREE version of bitmap_ffu().
This fixes an extent searching regression on 32-bit systems, caused by
the initial bitmap_ffu() implementation in
c8021d01f6 (Implement bitmap_ffu(), which
finds the first unset bit.), as first used in
5d33233a5e (Use a bitmap in extents_t to
speed up search.).
2017-03-25 23:29:32 -07:00
Qi Wang
e6b074472e Force inline ifree to avoid function call costs on fast path.
Without ALWAYS_INLINE, sometimes ifree() gets compiled into its own function,
which adds overhead on the fast path.
2017-03-24 17:54:28 -07:00
Jason Evans
5d33233a5e Use a bitmap in extents_t to speed up search.
Rather than iteratively checking all sufficiently large heaps during
search, maintain and use a bitmap in order to skip empty heaps.
2017-03-24 17:52:46 -07:00
Jason Evans
57e353163f Implement BITMAP_GROUPS(). 2017-03-24 17:52:46 -07:00
Jason Evans
c8021d01f6 Implement bitmap_ffu(), which finds the first unset bit. 2017-03-24 17:52:46 -07:00
Jason Evans
a832ebaee9 Use first fit layout policy instead of best fit.
For extents which do not delay coalescing, use first fit layout policy
rather than first-best fit layout policy.  This packs extents toward
older virtual memory mappings, but at the cost of higher search overhead
in the common case.

This resolves #711.
2017-03-24 17:52:46 -07:00
Qi Wang
bbc16a50f9 Added documentation for mutex profiling related mallctls. 2017-03-23 00:03:28 -07:00
Qi Wang
362e356675 Profile per arena base mutex, instead of just a0. 2017-03-23 00:03:28 -07:00
Qi Wang
d3fde1c124 Refactor mutex profiling code with x-macros. 2017-03-23 00:03:28 -07:00
Qi Wang
f6698ec1e6 Switch to nstime_t for the time related fields in mutex profiling. 2017-03-23 00:03:28 -07:00
Qi Wang
74f78cafda Added custom mutex spin.
A fixed max spin count is used -- with benchmark results showing it
solves almost all problems. As the benchmark used was rather intense,
the upper bound could be a little bit high. However it should offer a
good tradeoff between spinning and blocking.
2017-03-23 00:03:28 -07:00
Qi Wang
20b8c70e9f Added extents_dirty / _muzzy mutexes, as well as decay_dirty / _muzzy. 2017-03-23 00:03:28 -07:00
Qi Wang
64c5f5c174 Added "stats.mutexes.reset" mallctl to reset all mutex stats.
Also switched from the term "lock" to "mutex".
2017-03-23 00:03:28 -07:00
Qi Wang
bd2006a41b Added JSON output for lock stats.
Also added option 'x' to malloc_stats() to bypass lock section.
2017-03-23 00:03:28 -07:00
Qi Wang
ca9074deff Added lock profiling and output for global locks (ctl, prof and base). 2017-03-23 00:03:28 -07:00
Qi Wang
0fb5c0e853 Add arena lock stats output. 2017-03-23 00:03:28 -07:00
Qi Wang
a4f176af57 Output bin lock profiling results to malloc_stats.
Two counters are included for the small bins: lock contention rate, and
max lock waiting time.
2017-03-23 00:03:28 -07:00
Qi Wang
6309df628f First stage of mutex profiling.
Switched to trylock and update counters based on state.
2017-03-23 00:03:28 -07:00
Jason Evans
32e7cf51cd Further specialize arena_[s]dalloc() tcache fast path.
Use tsd_rtree_ctx() rather than tsdn_rtree_ctx() when tcache is
non-NULL, in order to avoid an extra branch (and potentially extra stack
space) in the fast path.
2017-03-22 18:33:32 -07:00
Jason Evans
5e67fbc367 Push down iealloc() calls.
Call iealloc() as deep into call chains as possible without causing
redundant calls.
2017-03-22 18:33:32 -07:00
Jason Evans
51a2ec92a1 Remove extent dereferences from the deallocation fast paths. 2017-03-22 18:33:32 -07:00
Jason Evans
4f341412e5 Remove extent arg from isalloc() and arena_salloc(). 2017-03-22 18:33:32 -07:00
Jason Evans
0ee0e0c155 Implement compact rtree leaf element representation.
If a single virtual adddress pointer has enough unused bits to pack
{szind_t, extent_t *, bool, bool}, use a single pointer-sized field in
each rtree leaf element, rather than using three separate fields.  This
has little impact on access speed (fewer loads/stores, but more bit
twiddling), except that denser representation increases TLB
effectiveness.
2017-03-22 18:33:32 -07:00
Jason Evans
ce41ab0c57 Embed root node into rtree_t.
This avoids one atomic operation per tree access.
2017-03-22 18:33:32 -07:00
Jason Evans
99d68445ef Incorporate szind/slab into rtree leaves.
Expand and restructure the rtree API such that all common operations can
be achieved with minimal work, regardless of whether the rtree leaf
fields are independent versus packed into a single atomic pointer.
2017-03-22 18:33:32 -07:00
Jason Evans
944c8a3383 Split rtree_elm_t into rtree_{node,leaf}_elm_t.
This allows leaf elements to differ in size from internal node elements.

In principle it would be more correct to use a different type for each
level of the tree, but due to implementation details related to atomic
operations, we use casts anyway, thus counteracting the value of
additional type correctness.  Furthermore, such a scheme would require
function code generation (via cpp macros), as well as either unwieldy
type names for leaves or type aliases, e.g.

  typedef struct rtree_elm_d2_s rtree_leaf_elm_t;

This alternate strategy would be more correct, and with less code
duplication, but probably not worth the complexity.
2017-03-22 18:33:32 -07:00
Jason Evans
f50d6009fe Remove binind field from arena_slab_data_t.
binind is now redundant; the containing extent_t's szind field always
provides the same value.
2017-03-22 18:33:32 -07:00
Jason Evans
e8921cf2eb Convert extent_t's usize to szind.
Rather than storing usize only for large (and prof-promoted)
allocations, store the size class index for allocations that reside
within the extent, such that the size class index is valid for all
extents that contain extant allocations, and invalid otherwise (mainly
to make debugging simpler).
2017-03-22 18:33:32 -07:00
Jason Evans
bda12bd925 Clamp LG_VADDR for 32-bit builds on x64. 2017-03-22 18:33:32 -07:00
Qi Wang
ad91762635 Not re-binding iarena when migrate between arenas. 2017-03-21 14:05:20 -07:00
Jason Evans
3a1363bcf8 Refactor tcaches flush/destroy to reduce lock duration.
Drop tcaches_mtx before calling tcache_destroy().
2017-03-16 08:59:58 -07:00
Jason Evans
afb46ce236 Propagate madvise() success/failure from pages_purge_lazy(). 2017-03-16 08:44:57 -07:00
Jason Evans
64e458f5cd Implement two-phase decay-based purging.
Split decay-based purging into two phases, the first of which uses lazy
purging to convert dirty pages to "muzzy", and the second of which uses
forced purging, decommit, or unmapping to convert pages to clean or
destroy them altogether.  Not all operating systems support lazy
purging, yet the application may provide extent hooks that implement
lazy purging, so care must be taken to dynamically omit the first phase
when necessary.

The mallctl interfaces change as follows:
- opt.decay_time --> opt.{dirty,muzzy}_decay_time
- arena.<i>.decay_time --> arena.<i>.{dirty,muzzy}_decay_time
- arenas.decay_time --> arenas.{dirty,muzzy}_decay_time
- stats.arenas.<i>.pdirty --> stats.arenas.<i>.p{dirty,muzzy}
- stats.arenas.<i>.{npurge,nmadvise,purged} -->
  stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}

This resolves #521.
2017-03-15 13:13:47 -07:00
Jason Evans
38a5bfc816 Move arena_t's purging field into arena_decay_t. 2017-03-15 13:13:47 -07:00
Jason Evans
765edd67b4 Refactor decay-related function parametrization.
Refactor most of the decay-related functions to take as parameters the
decay_t and associated extents_t structures to operate on.  This
prepares for supporting both lazy and forced purging on different decay
schedules.
2017-03-15 13:13:47 -07:00
David Goldblatt
ee202efc79 Convert remaining arena_stats_t fields to atomics
These were all size_ts, so we have atomics support for them on all platforms, so
the conversion is straightforward.

Left non-atomic is curlextents, which AFAICT is not used atomically anywhere.
2017-03-13 18:22:33 -07:00
David Goldblatt
4fc2acf5ae Switch atomic uint64_ts in arena_stats_t to C11 atomics
I expect this to be the trickiest conversion we will see, since we want atomics
on 64-bit platforms, but are also always able to piggyback on some sort of
external synchronization on non-64 bit platforms.
2017-03-13 18:22:33 -07:00
Jason Evans
26d23da6cd Prefer pages_purge_forced() over memset().
This has the dual advantages of allowing for sparsely used large
allocations, and relying on the kernel to supply zeroed pages, which
tends to be very fast on modern systems.
2017-03-13 18:19:57 -07:00
Jason Evans
28078274c4 Add alignment/size assertions to pages_*().
These sanity checks prevent what otherwise might result in failed system
calls and unintended fallback execution paths.
2017-03-13 18:19:57 -07:00
Jason Evans
7cbcd2e2b7 Fix pages_purge_forced() to discard pages on non-Linux systems.
madvise(..., MADV_DONTNEED) only causes demand-zeroing on Linux, so fall
back to overlaying a new mapping.
2017-03-13 18:19:57 -07:00
David Goldblatt
21a68e2d22 Convert rtree code to use C11 atomics
In the process, I changed the implementation of rtree_elm_acquire so that it
won't even try to CAS if its initial read (getting the extent + lock bit)
indicates that the CAS is doomed to fail.  This can significantly improve
performance under contention.
2017-03-13 12:05:27 -07:00
Jason Evans
3a2b183d5f Convert arena_t's purging field to non-atomic bool.
The decay mutex already protects all accesses.
2017-03-10 10:14:30 -08:00
Jason Evans
75fddc786c Fix ATOMIC_{ACQUIRE,RELEASE,ACQ_REL} definitions. 2017-03-09 00:57:37 -08:00
Qi Wang
f84471edc3 Add documentation for percpu_arena in jemalloc.xml.in. 2017-03-08 23:19:01 -08:00
Qi Wang
ec532e2c5c Implement per-CPU arena.
The new feature, opt.percpu_arena, determines thread-arena association
dynamically based CPU id. Three modes are supported: "percpu", "phycpu"
and disabled.

"percpu" uses the current core id (with help from sched_getcpu())
directly as the arena index, while "phycpu" will assign threads on the
same physical CPU to the same arena. In other words, "percpu" means # of
arenas == # of CPUs, while "phycpu" has # of arenas == 1/2 * (# of
CPUs). Note that no runtime check on whether hyper threading is enabled
is added yet.

When enabled, threads will be migrated between arenas when a CPU change
is detected. In the current design, to reduce overhead from reading CPU
id, each arena tracks the thread accessed most recently. When a new
thread comes in, we will read CPU id and update arena if necessary.
2017-03-08 23:19:01 -08:00
Qi Wang
8721e19c04 Fix arena_prefork lock rank order for witness.
When witness is enabled, lock rank order needs to be preserved during
prefork, not only for each arena, but also across arenas. This change
breaks arena_prefork into further stages to ensure valid rank order
across arenas. Also changed test/unit/fork to use a manual arena to
catch this case.
2017-03-08 23:07:27 -08:00
David Goldblatt
8adab26972 Convert extents_t's npages field to use C11-style atomics
In the process, we can do some strength reduction, changing the fetch-adds and
fetch-subs to be simple loads followed by stores, since the modifications all
occur while holding the mutex.
2017-03-08 21:27:09 -08:00
David Goldblatt
dafadce622 Reintroduce JEMALLOC_ATOMIC_U64
The C11 atomics backport removed this #define, which degraded atomic 64-bit
reads to require a lock even on platforms that support them.  This commit fixes
that.
2017-03-08 21:26:37 -08:00
Qi Wang
01f47f11a6 Store associated arena in tcache.
This fixes tcache_flush for manual tcaches, which wasn't able to find
the correct arena it associated with. Also changed the decay test to
cover this case (by using manually created arenas).
2017-03-07 12:58:11 -08:00
Jason Evans
cdce93e4a3 Use any-best-fit for cached extent allocation.
This simplifies what would be pairing heap operations to the equivalent
of LIFO queue operations.  This is a complementary optimization in the
context of delayed coalescing for cached extents.
2017-03-07 10:25:33 -08:00
Jason Evans
cc75c35db5 Add any() and remove_any() to ph.
These functions select the easiest-to-remove element in the heap, which
is either the most recently inserted aux list element or the root.  If
no calls are made to first() or remove_first(), the behavior (and time
complexity) is the same as for a LIFO queue.
2017-03-07 10:25:33 -08:00
Jason Evans
e201e24904 Perform delayed coalescing prior to purging.
Rather than purging uncoalesced extents, perform just enough incremental
coalescing to purge only fully coalesced extents.  In the absence of
cached extent reuse, the immediate versus delayed incremental purging
algorithms result in the same purge order.

This resolves #655.
2017-03-07 10:25:12 -08:00
Jason Evans
8547ee11c3 Fix flakiness in test_decay_ticker.
Fix the test_decay_ticker test to carefully control slab
creation/destruction such that the decay backlog reliably reaches zero.
Use an isolated arena so that no extraneous allocation can confuse the
situation.  Speed up time during the latter part of the test so that the
entire decay time can expire in a reasonable amount of wall time.
2017-03-07 10:25:12 -08:00
David Goldblatt
4f1e94658a Change arena to use the atomic functions for ssize_t instead of the union strategy 2017-03-06 18:49:19 -08:00
David Goldblatt
438efede78 Add atomic types for ssize_t 2017-03-06 18:49:19 -08:00
David Goldblatt
424e3428b1 Make type abbreviations consistent: ssize_t is zd everywhere 2017-03-06 18:49:19 -08:00
David Goldblatt
84326c566a Insert not_reached after an exhaustive switch
In the C11 atomics backport, we couldn't use not_reached() in
atomic_enum_to_builtin (in atomic_gcc_atomic.h), since atomic.h was hermetic and
assert.h wasn't; there was a dependency issue.  assert.h is hermetic now, so we
can include it.
2017-03-06 15:08:43 -08:00
David Goldblatt
e9852b5776 Disentangle assert and util
This is the first header refactoring diff, #533.  It splits the assert and util
components into separate, hermetic, header files.  In the process, it splits out
two of the large sub-components of util (the stdio.h replacement, and bit
manipulation routines) into their own components (malloc_io.h and bit_util.h).
This is mostly to break up cyclic dependencies, but it also breaks off a good
chunk of the catch-all-ness of util, which is nice.
2017-03-06 15:08:43 -08:00
Jason Evans
04d8fcb745 Optimize malloc_large_stats_t maintenance.
Convert the nrequests field to be partially derived, and the curlextents
to be fully derived, in order to reduce the number of stats updates
needed during common operations.

This change affects ndalloc stats during arena reset, because it is no
longer possible to cancel out ndalloc effects (curlextents would become
negative).
2017-03-04 08:18:31 -08:00
David Goldblatt
d4ac7582f3 Introduce a backport of C11 atomics
This introduces a backport of C11 atomics.  It has four implementations; ranked
in order of preference, they are:
- GCC/Clang __atomic builtins
- GCC/Clang __sync builtins
- MSVC _Interlocked builtins
- C11 atomics, from <stdatomic.h>

The primary advantages are:
- Close adherence to the standard API gives us a defined memory model.
- Type safety: atomic objects are now separate types from non-atomic ones, so
  that it's impossible to mix up atomic and non-atomic updates (which is
  undefined behavior that compilers are starting to take advantage of).
- Efficiency: we can specify ordering for operations, avoiding fences and
  atomic operations on strongly ordered architectures (example:
  `atomic_write_u32(ptr, val);` involves a CAS loop, whereas
  `atomic_store(ptr, val, ATOMIC_RELEASE);` is a plain store.

This diff leaves in the current atomics API (implementing them in terms of the
backport).  This lets us transition uses over piecemeal.

Testing:
This is by nature hard to test. I've manually tested the first three options on
Linux on gcc by futzing with the #defines manually, on freebsd with gcc and
clang, on MSVC, and on OS X with clang.  All of these were x86 machines though,
and we don't have any test infrastructure set up for non-x86 platforms.
2017-03-03 13:40:59 -08:00
David Goldblatt
957b8c5f21 Stop #define-ining away 'inline'
In the long term, we'll transition to C99-style inline semantics.  In the
short-term, this will allow both styles to coexist without breaking one another.
2017-03-03 13:40:59 -08:00
Jason Evans
fd058f572b Immediately purge cached extents if decay_time is 0.
This fixes a regression caused by
54269dc0ed (Remove obsolete
arena_maybe_purge() call.), as well as providing a general fix.

This resolves #665.
2017-03-02 19:43:06 -08:00
Jason Evans
d61a5f76b2 Convert arena_decay_t's time to be atomically synchronized. 2017-03-02 19:43:06 -08:00
Jason Evans
ff55f07eb6 Fix typos. 2017-03-01 15:31:30 -08:00
Qi Wang
aa1de06e3a Small style fix in ctl.c 2017-03-01 15:21:39 -08:00
charsyam
a8c9e9c651 fix typo sytem -> system 2017-03-01 08:40:05 -08:00
Jason Evans
04380e79f1 Merge branch 'rc-4.5.0' 2017-02-28 19:09:23 -08:00
Jason Evans
379dd44c57 Add casts to CONF_HANDLE_T_U().
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.
2017-02-28 17:18:25 -08:00
Jason Evans
700253e1f2 Update ChangeLog for 4.5.0. 2017-02-28 16:21:05 -08:00
Jason Evans
2406c22f36 Add casts to CONF_HANDLE_T_U().
This avoids signed/unsigned comparison warnings when specifying integer
constants as inputs.

Clean up whitespace and add clarifying parentheses for
CONF_HANDLE_SIZE_T(opt_lg_chunk, ...).
2017-02-28 16:20:44 -08:00
Jason Evans
e723f99dec Alphabetize private symbol names. 2017-02-28 15:06:27 -08:00
Jason Evans
cbb6720861 Update ChangeLog for 4.5.0. 2017-02-28 14:25:26 -08:00
Jason Evans
d84d2909c3 Fix/enhance THP integration.
Detect whether chunks start off as THP-capable by default (according to
the state of /sys/kernel/mm/transparent_hugepage/enabled), and use this
as the basis for whether to call pages_nohuge() once per chunk during
first purge of any of the chunk's page runs.

Add the --disable-thp configure option, as well as the the opt.thp
mallctl.

This resolves #541.
2017-02-28 14:25:06 -08:00
Jason Evans
766ddcd0f2 restructure *CFLAGS configuration.
Convert CFLAGS to be a concatenation:

  CFLAGS := CONFIGURE_CFLAGS SPECIFIED_CFLAGS EXTRA_CFLAGS

This ordering makes it possible to override the flags set by the
configure script both during and after configuration, with CFLAGS and
EXTRA_CFLAGS, respectively.

This resolves #619.
2017-02-28 12:54:40 -08:00
Jason Evans
25d50a943a Dodge 32-bit-clang-specific backtracing failure.
This disables run_tests.sh configurations that use the combination of
32-bit clang and heap profiling.
2017-02-28 10:59:27 -08:00
Jason Evans
4a068644c7 Put -D_REENTRANT in CPPFLAGS rather than CFLAGS.
This regression was introduced by
194d6f9de8 (Restructure *CFLAGS/*CXXFLAGS
configuration.).
2017-02-28 01:21:26 -08:00
Qi Wang
7b53fe928e Handle race in stats_arena_bins_print
When multiple threads calling stats_print, race could happen as we read the
counters in separate mallctl calls; and the removed assertion could fail when
other operations happened in between the mallctl calls. For simplicity, output
"race" in the utilization field in this case.

This resolves #616.
2017-02-27 15:25:38 -08:00
Jason Evans
7c124830a1 Fix lg_chunk clamping for config_cache_oblivious.
Fix lg_chunk clamping to take into account cache-oblivious large
allocation.  This regression only resulted in incorrect behavior if
!config_fill (false unless --disable-fill specified) and
config_cache_oblivious (true unless --disable-cache-oblivious
specified).

This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.), which was first released in
4.0.0.

This resolves #555.
2017-02-27 15:19:41 -08:00
Jason Evans
1027a2682b Add some missing explicit casts.
This resolves #614.
2017-02-27 14:41:01 -08:00
Jason Evans
472fef2e12 Fix {allocated,nmalloc,ndalloc,nrequests}_large stats regression.
This fixes a regression introduced by
d433471f58 (Derive
{allocated,nmalloc,ndalloc,nrequests}_large stats.).
2017-02-27 11:18:07 -08:00
Jason Evans
079b8bee37 Tidy up extent quantization.
Remove obsolete unit test scaffolding for extent quantization.  Remove
redundant assertions.  Add an assertion to
extents_first_best_fit_locked() that should help prevent aligned
allocation regressions.
2017-02-27 11:17:47 -08:00
Jason Evans
1e2c9ef8d6 Fix huge-aligned allocation.
This regression was caused by
b9408d77a6 (Fix/simplify chunk_recycle()
allocation size computations.).

This resolves #647.
2017-02-27 11:08:19 -08:00
Jason Evans
d727596bcb Update a comment. 2017-02-26 11:05:27 -08:00
Jason Evans
ed19a48928 Silence harmless warnings discovered via run_tests.sh. 2017-02-26 11:04:58 -08:00
Jason Evans
54d2d697b2 Test JSON output of malloc_stats_print() and fix bugs.
Implement and test a JSON validation parser.  Use the parser to validate
JSON output from malloc_stats_print(), with a significant subset of
supported output options.

This resolves #583.
2017-02-26 11:04:58 -08:00
Jason Evans
61d26425e5 Fix JSON-mode output for !config_stats and/or !config_prof cases.
These bugs were introduced by b599b32280
(Add "J" (JSON) support to malloc_stats_print().), which was first
released in 4.3.0.

This resolves #615.
2017-02-26 11:04:58 -08:00
Jason Evans
adae7cfc4a Fix chunk_alloc_dss() regression.
Fix chunk_alloc_dss() to account for bytes that are not a multiple of
the chunk size.  This regression was introduced by
e2bcf037d4 (Make dss operations
lockless.), which was first released in 4.3.0.
2017-02-26 10:53:26 -08:00
Qi Wang
c2323e13a5 Get rid of witness in malloc_mutex_t when !(configured w/ debug).
We don't touch witness at all when config_debug == false.  Let's only pay the
memory cost in malloc_mutex_s when needed. Note that when !config_debug, we keep
the field in a union so that we don't have to do #ifdefs in multiple places.
2017-02-24 09:41:29 -08:00
Jason Evans
08c24e7c1a Relax witness assertions related to prof_gdump().
In some cases the prof machinery allocates (in order to modify the
bt2gctx hash table), and such operations are synchronized via
bt2gctx_mtx.  Rather than asserting that no locks are held on entry
into functions that may call prof_gdump(), make the weaker assertion
that no "core" locks are held.  The prof machinery enqueues dumps
triggered by prof_gdump() calls when bt2gctx_mtx is held, so this
weakened assertion avoids false failures in such cases.
2017-02-23 10:08:42 -08:00
Jason Evans
f56cb9a68e Add witness_assert_depth[_to_rank]().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-02-23 10:08:42 -08:00
Jason Evans
7034e6baa1 Enable mutex witnesses even when !isthreaded.
This fixes interactions with witness_assert_depth[_to_rank](), which was
added in dad74bd3c8 (Convert
witness_assert_lockless() to witness_assert_lock_depth().).
2017-02-23 10:08:42 -08:00
David Goldblatt
44e50041dc CI: Run --enable-debug builds on windows
This will hopefully catch some windows-specific bugs.
2017-02-23 10:06:15 -08:00
Jason Evans
e85e588e45 Use MALLOC_CONF rather than malloc_conf for tests.
malloc_conf does not reliably work with MSVC, which complains of
"inconsistent dll linkage", i.e. its inability to support the
application overriding malloc_conf when dynamically linking/loading.
Work around this limitation by adding test harness support for per test
shell script sourcing, and converting all tests to use MALLOC_CONF
instead of malloc_conf.
2017-02-23 10:06:15 -08:00
Jason Evans
3ecc3c8486 Fix/refactor tcaches synchronization.
Synchronize tcaches with tcaches_mtx rather than ctl_mtx.  Add missing
synchronization for tcache flushing.  This bug was introduced by
1cb181ed63 (Implement explicit tcache
support.), which was first released in 4.0.0.
2017-02-23 10:06:15 -08:00
Jason Evans
de49674fbd Use MALLOC_CONF rather than malloc_conf for tests.
malloc_conf does not reliably work with MSVC, which complains of
"inconsistent dll linkage", i.e. its inability to support the
application overriding malloc_conf when dynamically linking/loading.
Work around this limitation by adding test harness support for per test
shell script sourcing, and converting all tests to use MALLOC_CONF
instead of malloc_conf.
2017-02-23 08:57:02 -08:00
Jason Evans
fdba5ad5cc Repair file permissions.
This regression was caused by 8f61fdedb9
(Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).).

This resolves #538.
2017-02-22 00:24:32 -08:00
Jason Evans
8ac7937eb5 Remove remainder of mb (memory barrier).
This complements 94c5d22a4d (Remove mb.h,
which is unused).
2017-02-22 00:24:14 -08:00
Jason Evans
664ef652d9 Avoid -lgcc for heap profiling if unwind.h is missing.
This removes an unneeded library dependency when falling back to
intrinsics-based backtracing (or failing to enable heap profiling at
all).
2017-02-21 12:46:58 -08:00
Jason Evans
54269dc0ed Remove obsolete arena_maybe_purge() call.
Remove a call to arena_maybe_purge() that was necessary for ratio-based
purging, but is obsolete in the context of decay-based purging.
2017-02-21 12:46:41 -08:00
Jason Evans
003ca8717f Move arena_basic_stats_merge() prototype (hygienic cleanup). 2017-02-21 12:46:20 -08:00
David Goldblatt
d4f3f9a03f Beef up travis CI integration testing
Introduces gen_travis.py, which generates .travis.yml, and updates .travis.yml
to be the generated version.

The travis build matrix approach doesn't play well with mixing and matching
various different environment settings, so we generate every build explicitly,
rather than letting them do it for us.

To avoid abusing travis resources (and save us time waiting for CI results), we
don't test every possible combination of options; we only check up to 2 unusual
settings at a time.
2017-02-21 12:45:59 -08:00
Jason Evans
2dfc5b5aac Disable coalescing of cached extents.
Extent splitting and coalescing is a major component of large allocation
overhead, and disabling coalescing of cached extents provides a simple
and effective hysteresis mechanism.  Once two-phase purging is
implemented, it will probably make sense to leave coalescing disabled
for the first phase, but coalesce during the second phase.
2017-02-16 20:11:50 -08:00
Jason Evans
c1ebfaa673 Optimize extent coalescing.
Refactor extent_can_coalesce(), extent_coalesce(), and extent_record()
to avoid needlessly repeating extent [de]activation operations.
2017-02-16 20:11:50 -08:00
Jason Evans
b0654b95ed Fix arena->stats.mapped accounting.
Mapped memory increases when extent_alloc_wrapper() succeeds, and
decreases when extent_dalloc_wrapper() is called (during purging).
2017-02-16 15:52:11 -08:00
Jason Evans
f8fee6908d Synchronize arena->decay with arena->decay.mtx.
This removes the last use of arena->lock.
2017-02-16 09:39:46 -08:00
Jason Evans
d433471f58 Derive {allocated,nmalloc,ndalloc,nrequests}_large stats.
This mildly reduces stats update overhead during normal operation.
2017-02-16 09:39:46 -08:00
Jason Evans
ab25d3c987 Synchronize arena->tcache_ql with arena->tcache_ql_mtx.
This replaces arena->lock synchronization.
2017-02-16 09:39:46 -08:00
Jason Evans
6b5cba4191 Convert arena->stats synchronization to atomics. 2017-02-16 09:39:46 -08:00
Jason Evans
fa2d64c94b Convert arena->prof_accumbytes synchronization to atomics. 2017-02-16 09:39:46 -08:00
Jason Evans
b779522b9b Convert arena->dss_prec synchronization to atomics. 2017-02-16 09:39:46 -08:00
Jason Evans
0721b895ff Do not generate unused tsd_*_[gs]et() functions.
This avoids a gcc diagnostic note:
    note: The ABI for passing parameters with 64-byte alignment has
    changed in GCC 4.6
This note related to the cacheline alignment of rtree_ctx_t, which was
introduced by 4a346f5593 (Replace rtree
path cache with LRU cache.).
2017-02-13 10:47:16 -08:00
Jason Evans
cd2501efd6 Fix extent_alloc_dss() regression.
Fix extent_alloc_dss() to account for bytes that are not a multiple of
the page size.  This regression was introduced by
577d4572b0 (Make dss operations
lockless.), which was first released in 4.3.0.
2017-02-10 14:06:31 -08:00
Jason Evans
6b8ef771a9 Fix rtree_subkey() regression.
Fix rtree_subkey() to use uintptr_t rather than unsigned for key
bitmasking.  This regression was introduced by
4a346f5593 (Replace rtree path cache with
LRU cache.).
2017-02-10 09:05:02 -08:00
Jason Evans
7f55dbef9b Enable mutex witnesses even when !isthreaded.
This fixes interactions with witness_assert_depth[_to_rank](), which was
added in d0e93ada51 (Add
witness_assert_depth[_to_rank]().).
2017-02-09 17:05:47 -08:00
Jason Evans
db7da56359 Spin adaptively in rtree_elm_acquire(). 2017-02-08 18:50:03 -08:00
Jason Evans
de8a68e853 Enhance spin_adaptive() to yield after several iterations.
This avoids worst case behavior if e.g. another thread is preempted
while owning the resource the spinning thread is waiting for.
2017-02-08 18:50:03 -08:00
Jason Evans
5f11830754 Replace spin_init() with SPIN_INITIALIZER. 2017-02-08 18:50:03 -08:00
Jason Evans
650c070e10 Remove rtree support for 0 (NULL) keys.
NULL can never actually be inserted in practice, and removing support
allows a branch to be removed from the fast path.
2017-02-08 18:50:03 -08:00
Jason Evans
f5cf9b19c8 Determine rtree levels at compile time.
Rather than dynamically building a table to aid per level computations,
define a constant table at compile time.  Omit both high and low
insignificant bits.  Use one to three tree levels, depending on the
number of significant bits.
2017-02-08 18:50:03 -08:00
Jason Evans
ff4db5014e Remove rtree leading 0 bit optimization.
A subsequent change instead ignores insignificant high bits.
2017-02-08 18:50:03 -08:00
Jason Evans
cdc240d501 Make non-essential inline rtree functions static functions. 2017-02-08 18:50:03 -08:00
Jason Evans
c511a44e99 Split rtree_elm_lookup_hard() out of rtree_elm_lookup().
Anything but a hit in the first element of the lookup cache is
expensive enough to negate the benefits of inlining.
2017-02-08 18:50:03 -08:00
Jason Evans
4a346f5593 Replace rtree path cache with LRU cache.
Rework rtree_ctx_t to encapsulate an rtree leaf LRU lookup cache rather
than a single-path element lookup cache.  The replacement is logically
much simpler, as well as slightly faster in the fast path case and less
prone to degraded performance during non-trivial sequences of lookups.
2017-02-08 18:50:03 -08:00
Jason Evans
0ecf692726 Optimize a branch out of rtree_read() if !dependent. 2017-02-08 18:50:03 -08:00
Jason Evans
3bd6d8e41d Conditianalize lg_tcache_max use on JEMALLOC_TCACHE. 2017-02-07 12:15:36 -08:00
Jason Evans
5177995530 Fix extent_record().
Read adjacent rtree elements while holding element locks, since the
extents mutex only protects against relevant like-state extent mutation.

Fix management of the 'coalesced' loop state variable to merge
forward/backward results, rather than overwriting the result of forward
coalescing if attempting to coalesce backward.  In practice this caused
no correctness issues, but could cause extra iterations in rare cases.

These regressions were introduced by
d27f29b468 (Disentangle arena and extent
locking.).
2017-02-06 20:05:49 -08:00
Jason Evans
6737d5f61e Fix a race in extent_grow_retained().
Set extent as active prior to registration so that other threads can't
modify it in the absence of locking.

This regression was introduced by
d27f29b468 (Disentangle arena and extent
locking.), via non-obvious means.  Removal of extents_mtx protection
during extent_grow_retained() execution opened up the race, but in the
presence of that locking, the code was safe.

This resolves #599.
2017-02-04 12:15:13 -08:00
Jason Evans
1bac516aaa Optimize compute_size_with_overflow().
Do not check for overflow unless it is actually a possibility.
2017-02-03 19:13:05 -08:00
Jason Evans
767ffa2b5f Fix compute_size_with_overflow().
Fix compute_size_with_overflow() to use a high_bits mask that has the
high bits set, rather than the low bits.  This regression was introduced
by 5154ff32ee (Unify the allocation
paths).
2017-02-03 19:13:05 -08:00
Jason Evans
d27f29b468 Disentangle arena and extent locking.
Refactor arena and extent locking protocols such that arena and
extent locks are never held when calling into the extent_*_wrapper()
API.  This requires extra care during purging since the arena lock no
longer protects the inner purging logic.  It also requires extra care to
protect extents from being merged with adjacent extents.

Convert extent_t's 'active' flag to an enumerated 'state', so that
retained extents are explicitly marked as such, rather than depending on
ring linkage state.

Refactor the extent collections (and their synchronization) for cached
and retained extents into extents_t.  Incorporate LRU functionality to
support purging.  Incorporate page count accounting, which replaces
arena->ndirty and arena->stats.retained.

Assert that no core locks are held when entering any internal
[de]allocation functions.  This is in addition to existing assertions
that no locks are held when entering external [de]allocation functions.

Audit and document synchronization protocols for all arena_t fields.

This fixes a potential deadlock due to recursive allocation during
gdump, in a similar fashion to b49c649bc1
(Fix lock order reversal during gdump.), but with a necessarily much
broader code impact.
2017-02-01 16:43:46 -08:00
Jason Evans
1b6e43507e Fix/refactor tcaches synchronization.
Synchronize tcaches with tcaches_mtx rather than ctl_mtx.  Add missing
synchronization for tcache flushing.  This bug was introduced by
1cb181ed63 (Implement explicit tcache
support.), which was first released in 4.0.0.
2017-02-01 16:43:46 -08:00
Jason Evans
d0e93ada51 Add witness_assert_depth[_to_rank]().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-02-01 16:43:46 -08:00
Jason Evans
ace679ce74 Synchronize extent_grow_next accesses.
This should have been part of 411697adcd
(Use exponential series to size extents.), which introduced
extent_grow_next.
2017-02-01 16:43:46 -08:00
Jason Evans
5033a9176a Call prof_gctx_create() without owing bt2gctx_mtx.
This reduces the probability of allocating (and thereby indirectly
making a system call) while owning bt2gctx_mtx.  Unfortunately it is an
incomplete solution, because ckh insertion/deletion can also
allocate/deallocate, which requires more extensive changes to address.
2017-02-01 16:43:46 -08:00
Jason Evans
397f54aa46 Conditionalize prof fork handling on config_prof.
This allows the compiler to completely remove dead code.
2017-02-01 16:43:46 -08:00
Qi Wang
bbff6ca674 Handle race in stats_arena_bins_print
When multiple threads calling stats_print, race could happen as we read the
counters in separate mallctl calls; and the removed assertion could fail when
other operations happened in between the mallctl calls. For simplicity, output
"race" in the utilization field in this case.
2017-02-01 15:17:39 -08:00
Jason Evans
190f81c6d5 Silence harmless warnings discovered via run_tests.sh. 2017-02-01 11:29:12 -08:00
David Goldblatt
449b7f4867 CI: Run --enable-debug builds on windows
This will hopefully catch some windows-specific bugs.
2017-01-31 17:23:30 -08:00
David Goldblatt
5260d9c12f Introduce scripts to run all possible tests
In 6e7d0890 we added better travis continuous integration tests. This is nice,
but has two problems:
- We run only a subset of interesting tests.
- The travis builds can take hours to give us back results (especially on OS X).

This adds scripts/gen_run_tests.py, and its output, run_tests.sh, which builds
and runs a larger portion of possible configurations on the local machine.

While a travis run takes several hours to complete , I can run these scripts on
my (OS X) latop and (Linux) devserve, and get a more exhaustive set of results
back in around 10 minutes.
2017-01-30 17:51:57 -08:00
David Goldblatt
6e7d0890cb Beef up travis CI integration testing
Introduces gen_travis.py, which generates .travis.yml, and updates .travis.yml
to be the generated version.

The travis build matrix approach doesn't play well with mixing and matching
various different environment settings, so we generate every build explicitly,
rather than letting them do it for us.

To avoid abusing travis resources (and save us time waiting for CI results), we
don't test every possible combination of options; we only check up to 2 unusual
settings at a time.
2017-01-26 21:31:21 -08:00
David Goldblatt
85d2841818 Fix a bug in which a potentially invalid usize replaced size
In the refactoring that unified the allocation paths, usize was substituted for
size. This worked fine under the default test configuration, but triggered
asserts when we started beefing up our CI testing.

This change fixes the issue, and clarifies the comment describing the argument
selection that it got wrong.
2017-01-25 15:50:59 -08:00
Tamir Duberstein
0874b648e0 Avoid redeclaring glibc's secure_getenv
Avoid the name secure_getenv to avoid redeclaring secure_getenv when
secure_getenv is present but its use is manually disabled via
ac_cv_func_secure_getenv=no.
2017-01-25 11:24:32 -08:00
Tamir Duberstein
b973ec7975 Avoid redeclaring glibc's secure_getenv
Avoid the name secure_getenv to avoid redeclaring secure_getenv when
secure_getenv is present but its use is manually disabled via
ac_cv_func_secure_getenv=no.
2017-01-25 11:22:28 -08:00
Jason Evans
b49c649bc1 Fix lock order reversal during gdump. 2017-01-24 12:50:06 -08:00
Jason Evans
dad74bd3c8 Convert witness_assert_lockless() to witness_assert_lock_depth().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-01-24 12:50:06 -08:00
Jason Evans
c0cc5db871 Replace tabs following #define with spaces.
This resolves #564.
2017-01-20 21:45:53 -08:00
Jason Evans
f408643a4c Remove extraneous parens around return arguments.
This resolves #540.
2017-01-20 21:43:07 -08:00
Jason Evans
c4c2592c83 Update brace style.
Add braces around single-line blocks, and remove line breaks before
function-opening braces.

This resolves #537.
2017-01-20 21:43:07 -08:00
David Goldblatt
5154ff32ee Unify the allocation paths
This unifies the allocation paths for malloc, posix_memalign, aligned_alloc,
calloc, memalign, valloc, and mallocx, so that they all share common code where
they can.

There's more work that could be done here, but I think this is the smallest
discrete change in this direction.
2017-01-20 12:15:53 -08:00
Jason Evans
9eb1b1c881 Fix --disable-stats support.
Fix numerous regressions that were exposed by --disable-stats, both in
the core library and in the tests.
2017-01-19 18:31:07 -08:00
Jason Evans
66bf773ef2 Test JSON output of malloc_stats_print() and fix bugs.
Implement and test a JSON validation parser.  Use the parser to validate
JSON output from malloc_stats_print(), with a significant subset of
supported output options.

This resolves #551.
2017-01-19 14:05:00 -08:00
Jason Evans
7a61ebe71f Remove -Werror=declaration-after-statement.
This partially resolves #536.
2017-01-19 11:07:42 -08:00
Qi Wang
58424e679d Added stats about number of bytes cached in tcache currently. 2017-01-18 10:55:21 -08:00
Mike Hommey
12ab4383e9 Add dummy implementations for most remaining OSX zone allocator functions
Some system libraries are using malloc_default_zone() and then using
some of the malloc_zone_* API. Under normal conditions, those functions
check the malloc_zone_t/malloc_introspection_t struct for the values
that are allowed to be NULL, so that a NULL deref doesn't happen.

As of OSX 10.12, malloc_default_zone() doesn't return the actual default
zone anymore, but returns a fake, wrapper zone. The wrapper zone defines
all the possible functions in the malloc_zone_t/malloc_introspection_t
struct (almost), and calls the function from the registered default zone
(jemalloc in our case) on its own. Without checking whether the pointers
are NULL.

This means that a system library that calls e.g.
malloc_zone_batch_malloc(malloc_default_zone(), ...) ends up trying to
call jemalloc_zone.batch_malloc, which is NULL, and crash follows.

So as of OSX 10.12, the default zone is required to have all the
functions available (really, the same as the wrapper zone), even if they
do nothing.

This is arguably a bug in libsystem_malloc in OSX 10.12, but jemalloc
still needs to work in that case.
2017-01-17 20:13:28 -08:00
Mike Hommey
0f7376eb62 Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions
The SDK jemalloc is built against might be not be the latest for various
reasons, but the resulting binary ought to work on newer versions of
OSX.

In order to ensure this, we need the fullest definitions possible, so
copy what we need from the latest version of malloc/malloc.h available
on opensource.apple.com.
2017-01-17 20:13:28 -08:00
Mike Hommey
c6943acb3c Add dummy implementations for most remaining OSX zone allocator functions
Some system libraries are using malloc_default_zone() and then using
some of the malloc_zone_* API. Under normal conditions, those functions
check the malloc_zone_t/malloc_introspection_t struct for the values
that are allowed to be NULL, so that a NULL deref doesn't happen.

As of OSX 10.12, malloc_default_zone() doesn't return the actual default
zone anymore, but returns a fake, wrapper zone. The wrapper zone defines
all the possible functions in the malloc_zone_t/malloc_introspection_t
struct (almost), and calls the function from the registered default zone
(jemalloc in our case) on its own. Without checking whether the pointers
are NULL.

This means that a system library that calls e.g.
malloc_zone_batch_malloc(malloc_default_zone(), ...) ends up trying to
call jemalloc_zone.batch_malloc, which is NULL, and crash follows.

So as of OSX 10.12, the default zone is required to have all the
functions available (really, the same as the wrapper zone), even if they
do nothing.

This is arguably a bug in libsystem_malloc in OSX 10.12, but jemalloc
still needs to work in that case.
2017-01-17 20:12:24 -08:00
Mike Hommey
c68bb41793 Don't rely on OSX SDK malloc/malloc.h for malloc_zone struct definitions
The SDK jemalloc is built against might be not be the latest for various
reasons, but the resulting binary ought to work on newer versions of
OSX.

In order to ensure this, we need the fullest definitions possible, so
copy what we need from the latest version of malloc/malloc.h available
on opensource.apple.com.
2017-01-17 20:12:24 -08:00
Jason Evans
1ff09534b5 Fix prof_realloc() regression.
Mostly revert the prof_realloc() changes in
498856f44a (Move slabs out of chunks.) so
that prof_free_sampled_object() is called when appropriate.  Leave the
prof_tctx_[re]set() optimization in place, but add an assertion to
verify that all eight cases are correctly handled.  Add a comment to
make clear the code ordering, so that the regression originally fixed by
ea8d97b897 (Fix
prof_{malloc,free}_sample_object() call order in prof_realloc().) is not
repeated.

This resolves #499.
2017-01-17 15:16:37 -08:00
Jason Evans
de5e1aff2a Formatting/comment fixes. 2017-01-17 15:16:37 -08:00
Jason Evans
8115f05b26 Add nullptr support to sized delete operators. 2017-01-17 14:30:15 -08:00
Jason Evans
41aa41853c Fix style nits. 2017-01-17 14:30:15 -08:00
Qi Wang
e8990dc7c7 Remove redundent stats-merging logic when destroying tcache.
The removed stats merging logic is already taken care of by tcache_flush.
2017-01-17 09:42:39 -08:00
Jason Evans
ffbb7dac3d Remove leading blank lines from function bodies.
This resolves #535.
2017-01-13 14:49:24 -08:00
Jason Evans
87e81e609b Fix indentation. 2017-01-13 14:49:24 -08:00
John Paul Adrian Glaubitz
9389335b86 Use better pre-processor defines for sparc64
Currently, jemalloc detects sparc64 targets by checking whether
__sparc64__ is defined. However, this definition is used on BSD
targets only. Linux targets define both __sparc__ and __arch64__
for sparc64. Since this also works on BSD, rather use __sparc__
and __arch64__ instead of __sparc64__ to detect sparc64 targets.
2017-01-13 09:01:33 -08:00
David Goldblatt
77cccac8cd Break up headers into constituent parts
This is part of a broader change to make header files better represent the
dependencies between one another (see
https://github.com/jemalloc/jemalloc/issues/533). It breaks up component headers
into smaller parts that can be made to have a simpler dependency graph.

For the autogenerated headers (smoothstep.h and size_classes.h), no splitting
was necessary, so I didn't add support to emit multiple headers.
2017-01-12 15:43:51 -08:00
David Goldblatt
94c5d22a4d Remove mb.h, which is unused 2017-01-11 13:24:30 -08:00
John Paul Adrian Glaubitz
77de5f27d8 Use better pre-processor defines for sparc64
Currently, jemalloc detects sparc64 targets by checking whether
__sparc64__ is defined. However, this definition is used on BSD
targets only. Linux targets define both __sparc__ and __arch64__
for sparc64. Since this also works on BSD, rather use __sparc__
and __arch64__ instead of __sparc64__ to detect sparc64 targets.
2017-01-10 17:39:54 -08:00
Jason Evans
edf1bafb2b Implement arena.<i>.destroy .
Add MALLCTL_ARENAS_DESTROYED for accessing destroyed arena stats as an
analogue to MALLCTL_ARENAS_ALL.

This resolves #382.
2017-01-06 18:58:46 -08:00
Jason Evans
3f291d59ad Refactor test extent hook code to be reusable.
Move test extent hook code from the extent integration test into a
header, and normalize the out-of-band controls and introspection.
Also refactor the base unit test to use the header.
2017-01-06 18:58:46 -08:00
Jason Evans
dc2125cf95 Replace the arenas.initialized mallctl with arena.<i>.initialized . 2017-01-06 18:58:46 -08:00
Jason Evans
6edbedd916 Range-check mib[1] --> arena_ind casts. 2017-01-06 18:58:46 -08:00
Jason Evans
c0a05e6aba Move static ctl_epoch variable into ctl_stats_t (as epoch). 2017-01-06 18:58:45 -08:00
Jason Evans
d778dd2afc Refactor ctl_stats_t.
Refactor ctl_stats_t to be a demand-zeroed non-growing data structure.
To keep the size from being onerous (~60 MiB) on 32-bit systems, convert
the arenas field to contain pointers rather than directly embedded
ctl_arena_stats_t elements.
2017-01-06 18:58:45 -08:00
Jason Evans
0f04bb1d6f Rename the arenas.extend mallctl to arenas.create. 2017-01-06 18:58:45 -08:00
Jason Evans
3dc4e83ccb Add MALLCTL_ARENAS_ALL.
Add the MALLCTL_ARENAS_ALL cpp macro as a fixed index for use
in accessing the arena.<i>.{purge,decay,dss} and stats.arenas.<i>.*
mallctls, and deprecate access via the arenas.narenas index (to be
removed in 6.0.0).
2017-01-06 18:58:45 -08:00
Jason Evans
027ace8519 Reindent. 2017-01-06 18:58:45 -08:00
Jason Evans
d0a3129b88 Fix locking in arena_dirty_count().
This was a latent bug, since the function is (intentionally) not used.
2017-01-06 18:58:45 -08:00
Jason Evans
363629df88 Fix allocated_large stats with respect to sampled small allocations. 2017-01-06 18:58:45 -08:00
Jason Evans
5c5ff8d121 Fix arena_large_reset_stats_cancel().
Decrement ndalloc_large rather than incrementing, in order to cancel out
the increment in arena_large_dalloc_stats_update().
2017-01-04 20:26:30 -08:00
Jason Evans
a0dd3a4483 Implement per arena base allocators.
Add/rename related mallctls:
- Add stats.arenas.<i>.base .
- Rename stats.arenas.<i>.metadata to stats.arenas.<i>.internal .
- Add stats.arenas.<i>.resident .

Modify the arenas.extend mallctl to take an optional (extent_hooks_t *)
argument so that it is possible for all base allocations to be serviced
by the specified extent hooks.

This resolves #463.
2016-12-26 18:08:28 -08:00
Jason Evans
a6e86810d8 Refactor purging and splitting/merging.
Split purging into lazy and forced variants.  Use the forced variant for
zeroing dss.

Add support for NULL function pointers as an opt-out mechanism for the
dalloc, commit, decommit, purge_lazy, purge_forced, split, and merge
fields of extent_hooks_t.

Add short-circuiting checks in large_ralloc_no_move_{shrink,expand}() so
that no attempt is made if splitting/merging is not supported.

This resolves #268.
2016-12-26 18:08:16 -08:00
Jason Evans
884fa22b8c Rename arena_decay_t's ndirty to nunpurged. 2016-12-26 17:59:43 -08:00
Jason Evans
411697adcd Use exponential series to size extents.
If virtual memory is retained, allocate extents such that their sizes
form an exponentially growing series.  This limits the number of
disjoint virtual memory ranges so that extent merging can be effective
even if multiple arenas' extent allocation requests are highly
interleaved.

This resolves #462.
2016-12-26 17:59:42 -08:00
Jason Evans
c1baa0a9b7 Add huge page configuration and pages_[no}huge().
Add the --with-lg-hugepage configure option, but automatically configure
LG_HUGEPAGE even if it isn't specified.

Add the pages_[no]huge() functions, which toggle huge page state via
madvise(..., MADV_[NO]HUGEPAGE) calls.
2016-12-26 17:59:34 -08:00
Jason Evans
eab3b180e5 Fix JSON-mode output for !config_stats and/or !config_prof cases.
These bugs were introduced by 0ba5b9b618
(Add "J" (JSON) support to malloc_stats_print().), which was backported
as b599b32280 (with the same bugs except
the inapplicable "metatata" misspelling) and first released in 4.3.0.
2016-12-23 11:15:44 -08:00
Jason Evans
bacb6afc6c Simplify arena_slab_regind().
Rewrite arena_slab_regind() to provide sufficient constant data for
the compiler to perform division strength reduction.  This replaces
more general manual strength reduction that was implemented before
arena_bin_info was compile-time-constant.  It would be possible to
slightly improve on the compiler-generated division code by taking
advantage of range limits that the compiler doesn't know about.
2016-12-23 10:34:34 -08:00
Jason Evans
194d6f9de8 Restructure *CFLAGS/*CXXFLAGS configuration.
Convert CFLAGS/CXXFLAGS to be concatenations:

  CFLAGS := CONFIGURE_CFLAGS SPECIFIED_CFLAGS EXTRA_CFLAGS
  CXXFLAGS := CONFIGURE_CXXFLAGS SPECIFIED_CXXFLAGS EXTRA_CXXFLAGS

This ordering makes it possible to override the flags set by the
configure script both during and after configuration, with
CFLAGS/CXXFLAGS and EXTRA_CFLAGS/EXTRA_CXXFLAGS, respectively.

This resolves #504.
2016-12-16 07:24:36 -08:00
Jason Evans
a965a9cb12 Re-expand the Travis-CI build matrix. 2016-12-13 16:19:20 -08:00
Jason Evans
590ee2a6e0 Update Travis-CI config for C++ integration. 2016-12-13 14:53:10 -08:00
Jason Evans
69c26cdb01 Add some missing explicit casts. 2016-12-13 13:38:11 -08:00
Dave Watson
2319152d9f jemalloc cpp new/delete bindings
Adds cpp bindings for jemalloc, along with necessary autoconf settings.
This is mostly to add sized deallocation support, which can't be added
from C directly.  Sized deallocation is ~10% microbench improvement.

* Import ax_cxx_compile_stdcxx.m4 from the autoconf repo, seems like the
  easiest way to get c++14 detection.
* Adds various other changes, like CXXFLAGS, to configure.ac.
* Adds new rules to Makefile.in for src/jemalloc-cpp.cpp, and a basic
  unittest.
* Both new and delete are overridden, to ensure jemalloc is used for
  both.
* TODO future enhancement of avoiding extra PLT thunks for new and
  delete - sdallocx and malloc are publicly exported jemalloc symbols,
  using an alias would link them directly.  Unfortunately, was having
  trouble getting it to play nice with jemalloc's namespace support.

Testing:
Tested gcc 4.8, gcc 5, gcc 5.2, clang 4.0.  Only gcc >= 5 has sized
deallocation support, verified that the rest build correctly.

Tested mac osx and Centos.

Tested --with-jemalloc-prefix and --without-export.

This resolves #202.
2016-12-12 18:36:06 -08:00
Jason Evans
d4c5aceb7c Add a_type parameter to qr_{meld,split}(). 2016-12-12 18:16:51 -08:00
Jason Evans
f1f7635731 Merge branch 'rc-4.4.0' 2016-12-03 22:48:43 -08:00
Jason Evans
2d1bb8980f Update ChangeLog for 4.4.0. 2016-12-03 22:44:24 -08:00
Jason Evans
fbe3015818 Update ChangeLog for 4.4.0. 2016-12-03 18:35:23 -08:00
Jason Evans
145f3cd173 Add --disable-syscall.
This resolves #517.
2016-12-03 16:56:19 -08:00
Jason Evans
acb7b1f53e Add --disable-syscall.
This resolves #517.
2016-12-03 16:50:58 -08:00
Jason Evans
e1b2970d28 Update configure cache file example. 2016-12-03 16:09:25 -08:00
Jason Evans
34a7e37a71 Fix pages_purge() when using MADV_DONTNEED.
This fixes a regression caused by
e98a620c59 (Mark partially purged arena
chunks as non-hugepage.).
2016-12-03 16:06:19 -08:00
Jason Evans
7179351a45 Update configure cache file example. 2016-11-30 09:57:12 -08:00
John Szakmeister
a05d4da4d8 Implement a more reliable detection scheme for os_unfair_lock.
The core issue here is the weak linking of the symbol, and in certain
environments--for instance, using the latest Xcode (8.1) with the latest
SDK (10.12)--os_unfair_lock may resolve even though you're compiling on
a host that doesn't support it (10.11).

We can use the availability macros to circumvent this problem, and
detect that we're not compiling for a target that is going to support
them and error out at compile time.  The other alternative is to do a
runtime check, but that presents issues for cross-compiling.
2016-11-28 17:44:29 -08:00
Jason Evans
e98a620c59 Mark partially purged arena chunks as non-hugepage.
Add the pages_[no]huge() functions, which toggle huge page state via
madvise(..., MADV_[NO]HUGEPAGE) calls.

The first time a page run is purged from within an arena chunk, call
pages_nohuge() to tell the kernel to make no further attempts to back
the chunk with huge pages.  Upon arena chunk deletion, restore the
associated virtual memory to its original state via pages_huge().

This resolves #243.
2016-11-24 00:15:55 -08:00
John Szakmeister
eb29d7ec0e Implement a more reliable detection scheme for os_unfair_lock.
The core issue here is the weak linking of the symbol, and in certain
environments--for instance, using the latest Xcode (8.1) with the latest
SDK (10.12)--os_unfair_lock may resolve even though you're compiling on
a host that doesn't support it (10.11).

We can use the availability macros to circumvent this problem, and
detect that we're not compiling for a target that is going to support
them and error out at compile time.  The other alternative is to do a
runtime check, but that presents issues for cross-compiling.
2016-11-23 15:32:35 -05:00
Jason Evans
fc11f3cb84 Enable overriding JEMALLOC_{ALLOC,FREE}_JUNK.
This resolves #509.
2016-11-22 11:02:28 -08:00
Jason Evans
32127949a3 Enable overriding JEMALLOC_{ALLOC,FREE}_JUNK.
This resolves #509.
2016-11-22 10:58:58 -08:00
Jason Evans
c3b85f2585 Style fixes. 2016-11-22 10:58:23 -08:00
Jason Evans
949a27fc32 Add pthread_atfork(3) feature test.
Some versions of Android provide a pthreads library without providing
pthread_atfork(), so in practice a separate feature test is necessary
for the latter.
2016-11-17 15:16:27 -08:00
Jason Evans
5234be2133 Add pthread_atfork(3) feature test.
Some versions of Android provide a pthreads library without providing
pthread_atfork(), so in practice a separate feature test is necessary
for the latter.
2016-11-17 15:14:57 -08:00
Jason Evans
fda60be799 Update a comment. 2016-11-17 11:50:52 -08:00
Jason Evans
62f2d84e7a Refactor madvise(2) configuration.
Add feature tests for the MADV_FREE and MADV_DONTNEED flags to
madvise(2), so that MADV_FREE is detected and used for Linux kernel
versions 4.5 and newer.  Refactor pages_purge() so that on systems which
support both flags, MADV_FREE is preferred over MADV_DONTNEED.

This resolves #387.
2016-11-17 10:37:48 -08:00
Jason Evans
a64123ce13 Refactor madvise(2) configuration.
Add feature tests for the MADV_FREE and MADV_DONTNEED flags to
madvise(2), so that MADV_FREE is detected and used for Linux kernel
versions 4.5 and newer.  Refactor pages_purge() so that on systems which
support both flags, MADV_FREE is preferred over MADV_DONTNEED.

This resolves #387.
2016-11-17 10:31:57 -08:00
Jason Evans
e7ca53bac2 Remove a residual comment. 2016-11-16 19:42:03 -08:00
Jason Evans
f7ca1c9bc3 Remove a residual comment. 2016-11-16 19:41:09 -08:00
Jason Evans
0d6a472db9 Avoid gcc tautological-compare warnings. 2016-11-16 18:53:59 -08:00
Jason Evans
3ea838d2a2 Avoid gcc type-limits warnings. 2016-11-16 18:32:24 -08:00
Jason Evans
aec5a051e8 Avoid gcc type-limits warnings. 2016-11-16 18:28:38 -08:00
Maks Naumov
95974c0440 Remove size_t -> unsigned -> size_t conversion. 2016-11-16 11:23:31 -08:00
Jason Evans
8e3fb7f417 Document how to use --cache configure option.
This resolves #494.
2016-11-16 10:58:32 -08:00
Jason Evans
9b94c015af Document how to use --cache configure option.
This resolves #494.
2016-11-16 10:56:40 -08:00
Jason Evans
2a24dc2476 Revert "Add JE_RUNNABLE() and use it for os_unfair_lock_*() test."
This reverts commit 45f83a2ac6.

JE_RUNNABLE() causes general cross-compilation issues.
2016-11-16 10:40:48 -08:00
Jason Evans
4066b4ef57 Revert "Add JE_RUNNABLE() and use it for os_unfair_lock_*() test."
This reverts commit a2e601a223.

JE_RUNNABLE() causes general cross-compilation issues.
2016-11-16 10:40:00 -08:00
Jason Evans
6468dd52f3 Fix an MSVC compiler warning. 2016-11-15 21:08:28 -08:00
Jason Evans
8a4528bdd1 Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).
This avoids warnings in some cases, and is otherwise generally good
hygiene.
2016-11-15 15:01:03 -08:00
Jason Evans
8f61fdedb9 Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).
This avoids warnings in some cases, and is otherwise generally good
hygiene.
2016-11-15 15:00:28 -08:00
Jason Evans
84ae603577 Explicitly cast negative constants meant for use as unsigned. 2016-11-15 14:05:16 -08:00
Jason Evans
72c587a411 Add cast to silence (harmless) conversion warning. 2016-11-15 14:05:00 -08:00
Jason Evans
87004d238c Avoid negation of unsigned numbers.
Rather than relying on two's complement negation for alignment mask
generation, use bitwise not and addition.  This dodges warnings from
MSVC, and should be strength-reduced by compiler optimization anyway.
2016-11-15 14:04:35 -08:00
Jason Evans
2379479225 Consistently use size_t rather than uint64_t for extent serial numbers. 2016-11-15 13:47:22 -08:00
Jason Evans
6a71d37a75 Add packing test, which verifies stable layout policy. 2016-11-15 13:33:47 -08:00
Jason Evans
5c77af98b1 Add extent serial numbers.
Add extent serial numbers and use them where appropriate as a sort key
that is higher priority than address, so that the allocation policy
prefers older extents.

This resolves #147.
2016-11-15 13:33:40 -08:00
Jason Evans
2c95154501 Add packing test, which verifies stable layout policy. 2016-11-15 13:08:33 -08:00
Jason Evans
a38acf716e Add extent serial numbers.
Add extent serial numbers and use them where appropriate as a sort key
that is higher priority than address, so that the allocation policy
prefers older extents.

This resolves #147.
2016-11-15 13:08:33 -08:00
Jason Evans
c0a667112c Fix arena_reset() crashing bug.
This regression was caused by 498856f44a
(Move slabs out of chunks.).
2016-11-15 10:34:02 -08:00
Jason Evans
a2e601a223 Add JE_RUNNABLE() and use it for os_unfair_lock_*() test.
This resolves #494.
2016-11-12 09:48:06 -08:00
Jason Evans
45f83a2ac6 Add JE_RUNNABLE() and use it for os_unfair_lock_*() test.
This resolves #494.
2016-11-12 09:47:07 -08:00
Jason Evans
c25e711cf9 Reduce memory usage for sdallocx() test_alignment_and_size. 2016-11-11 23:50:35 -08:00
Jason Evans
ded4f38ffd Reduce memory usage for sdallocx() test_alignment_and_size. 2016-11-11 23:49:40 -08:00
Jason Evans
1aeea0f391 Simplify extent_quantize().
2cdf07aba9 (Fix extent_quantize() to
handle greater-than-huge-size extents.) solved a non-problem; the
expression passed in to index2size() was never too large.  However the
expression could in principle underflow, so fix the actual (latent) bug
and remove unnecessary complexity.
2016-11-11 22:46:55 -08:00
Jason Evans
a2af09f025 Remove overly restrictive stats_cactive_{add,sub}() assertions.
This fixes a regression caused by
40ee9aa957 (Fix stats.cactive accounting
regression.) and first released in 4.1.0.
2016-11-11 22:19:10 -08:00
Jason Evans
b9408d77a6 Fix/simplify chunk_recycle() allocation size computations.
Remove outer CHUNK_CEILING(s2u(...)) from alloc_size computation, since
s2u() may overflow (and return 0), and CHUNK_CEILING() is only needed
around the alignment portion of the computation.

This fixes a regression caused by
5707d6f952 (Quantize szad trees by size
class.) and first released in 4.0.0.

This resolves #497.
2016-11-11 22:18:39 -08:00
Jason Evans
2cdf07aba9 Fix extent_quantize() to handle greater-than-huge-size extents.
Allocation requests can't directly create extents that exceed
HUGE_MAXCLASS, but extent merging can create them.

This fixes a regression caused by
8a03cf039c (Implement cache index
randomization for large allocations.) and first released in 4.0.0.

This resolves #497.
2016-11-11 22:17:27 -08:00
Jason Evans
e916d55ba1 Add configure support for *-*-linux-android.
This is tailored to Android, i.e. more specific than the *-*-linux*
configuration.

This resolves #471.
2016-11-10 15:40:23 -08:00
Samuel Moritz
092d760817 Support Debian GNU/kFreeBSD.
Treat it exactly like Linux since they both use GNU libc.
2016-11-10 15:39:33 -08:00
Jason Evans
32d69e967e Add configure support for *-*-linux-android.
This is tailored to Android, i.e. more specific than the *-*-linux*
configuration.

This resolves #471.
2016-11-10 15:36:17 -08:00
Jason Evans
b4486dce24 Update config.{guess,sub} from upstream. 2016-11-10 15:08:32 -08:00
Jason Evans
c233dd5e40 Update config.{guess,sub} from upstream. 2016-11-10 15:02:05 -08:00
Jason Evans
85dae2ff49 Update ChangeLog for 4.3.1. 2016-11-07 16:22:02 -08:00
Jason Evans
5e0373c815 Fix test_prng_lg_range_zu() to work on 32-bit systems. 2016-11-07 11:50:11 -08:00
Jason Evans
cda59f9970 Rename atomic_*_{uint32,uint64,u}() to atomic_*_{u32,u64,zu}().
This change conforms to naming conventions throughout the codebase.
2016-11-07 11:27:48 -08:00
Jason Evans
2e46b13ad5 Revert "Define 64-bits atomics unconditionally"
This reverts commit c2942e2c0e.

This resolves #495.
2016-11-07 10:53:35 -08:00
Jason Evans
04b463546e Refactor prng to not use 64-bit atomics on 32-bit platforms.
This resolves #495.
2016-11-07 10:52:44 -08:00
Jason Evans
e0a9e78374 Update ChangeLog for 4.3.0. 2016-11-04 15:15:24 -07:00
Matthew Parkinson
d30b3ea51a Fixes to Visual Studio Project files 2016-11-04 09:59:05 -07:00
Jason Evans
6d2a57cfbb Use -std=gnu11 if available.
This supersedes -std=gnu99, and enables C11 atomics.
2016-11-03 21:57:17 -07:00
Jason Evans
0760876927 Update ChangeLog for 4.3.0. 2016-11-04 00:02:43 -07:00
Jason Evans
a967fae362 Fix/simplify extent_recycle() allocation size computations.
Do not call s2u() during alloc_size computation, since any necessary
ceiling increase is taken care of later by extent_first_best_fit() -->
extent_size_quantize_ceil(), and the s2u() call may erroneously cause a
higher quantization result.

Remove an overly strict overflow check that was added in
4a7852137d (Fix extent_recycle()'s
cache-oblivious padding support.).
2016-11-03 23:49:21 -07:00
Jason Evans
4a7852137d Fix extent_recycle()'s cache-oblivious padding support.
Add padding *after* computing the size class, so that the optimal size
class isn't skipped during search for a usable extent.  This regression
was caused by b46261d58b (Implement
cache-oblivious support for huge size classes.).
2016-11-03 22:33:35 -07:00
Jason Evans
ea9961acdb Fix psz/pind edge cases.
Add an "over-size" extent heap in which to store extents which exceed
the maximum size class (plus cache-oblivious padding, if enabled).
Remove psz2ind_clamp() and use psz2ind() instead so that trying to
allocate the maximum size class can in principle succeed.  In practice,
this allows assertions to hold so that OOM errors can be successfully
generated.
2016-11-03 22:33:34 -07:00
Jason Evans
8dd5ea87ca Fix extent_alloc_cache[_locked]() to support decommitted allocation.
Fix extent_alloc_cache[_locked]() to support decommitted allocation, and
use this ability in arena_stash_dirty(), so that decommitted extents are
not needlessly committed during purging.  In practice this does not
happen on any currently supported systems, because both extent merging
and decommit must be implemented; all supported systems implement one
xor the other.
2016-11-03 22:33:23 -07:00
Jason Evans
4f7d8c2dee Update symbol mangling. 2016-11-03 15:00:02 -07:00
Jason Evans
04e1328ef1 Update ChangeLog for 4.3.0. 2016-11-02 21:39:24 -07:00
Samuel Moritz
69f027b855 Support Debian GNU/kFreeBSD.
Treat it exactly like Linux since they both use GNU libc.
2016-11-02 20:36:37 -07:00
Dave Watson
25f7bbcf28 Fix long spinning in rtree_node_init
rtree_node_init spinlocks the node, allocates, and then sets the node.
This is under heavy contention at the top of the tree if many threads
start to allocate at the same time.

Instead, take a per-rtree sleeping mutex to reduce spinning.  Tested
both pthreads and osx OSSpinLock, and both reduce spinning adequately

Previous benchmark time:
./ttest1 500 100
~15s

New benchmark time:
./ttest1 500 100
.57s
2016-11-02 20:30:53 -07:00
Dave Watson
712fde79fd Check for existance of CPU_COUNT macro before using it.
This resolves #485.
2016-11-02 20:05:40 -07:00
Jason Evans
83ebf2fda5 Fix sycall(2) configure test for Linux. 2016-11-02 19:50:44 -07:00
Jason Evans
d82f2b3473 Do not use syscall(2) on OS X 10.12 (deprecated). 2016-11-02 19:18:33 -07:00
Jason Evans
795f6689de Add os_unfair_lock support.
OS X 10.12 deprecated OSSpinLock; os_unfair_lock is the recommended
replacement.
2016-11-02 18:09:45 -07:00
Jason Evans
d9f7b2a430 Fix/refactor zone allocator integration code.
Fix zone_force_unlock() to reinitialize, rather than unlocking mutexes,
since OS X 10.12 cannot tolerate a child unlocking mutexes that were
locked by its parent.

Refactor; this was a side effect of experimenting with zone
{de,re}registration during fork(2).
2016-11-02 18:06:40 -07:00
Jason Evans
b54072dfee Call _exit(2) rather than exit(3) in forked child.
_exit(2) is async-signal-safe, whereas exit(3) is not.
2016-11-02 18:05:19 -07:00
Jason Evans
eee1ca655e Force no lazy-lock on Windows.
Monitoring thread creation is unimplemented for Windows, which means
lazy-lock cannot function correctly.

This resolves #310.
2016-11-02 09:14:47 -07:00
Jason Evans
7b0a8b74f0 malloc_stats_print() fixes/cleanups.
Fix and clean up various malloc_stats_print() issues caused by
0ba5b9b618 (Add "J" (JSON) support to
malloc_stats_print().).
2016-11-01 15:26:35 -07:00
Jason Evans
2a2d1b6e86 Use <quote>...</quote> rather than &ldquo;...&rdquo; or "..." in XML. 2016-11-01 13:25:42 -07:00
Jason Evans
0ba5b9b618 Add "J" (JSON) support to malloc_stats_print().
This resolves #474.
2016-10-31 22:30:49 -07:00
Jason Evans
b93f63b3eb Fix extent_rtree acquire() to release element on error.
This resolves #480.
2016-10-31 16:32:33 -07:00
Jason Evans
90b60eeae4 Add an assertion in witness_owner(). 2016-10-31 15:28:22 -07:00
Jason Evans
6a834d94bb Refactor witness_unlock() to fix undefined test behavior.
This resolves #396.
2016-10-31 11:49:12 -07:00
Jason Evans
6c80321aed Use CLOCK_MONOTONIC_COARSE rather than COARSE_MONOTONIC_RAW.
The raw clock variant is slow (even relative to plain CLOCK_MONOTONIC),
whereas the coarse clock variant is faster than CLOCK_MONOTONIC, but
still has resolution (~1ms) that is adequate for our purposes.

This resolves #479.
2016-10-29 22:58:18 -07:00
Jason Evans
d87037a62c Use syscall(2) rather than {open,read,close}(2) during boot.
Some applications wrap various system calls, and if they call the
allocator in their wrappers, unexpected reentry can result.  This is not
a general solution (many other syscalls are spread throughout the code),
but this resolves a bootstrapping issue that is apparently common.

This resolves #443.
2016-10-29 22:41:04 -07:00
Jason Evans
af0e28fd94 Fix EXTRA_CFLAGS to not affect configuration. 2016-10-29 22:14:55 -07:00
Jason Evans
1dcd0aa07f Do not mark malloc_conf as weak on Windows.
This works around malloc_conf not being properly initialized by at least
the cygwin toolchain.  Prior build system changes to use
-Wl,--[no-]whole-archive may be necessary for malloc_conf resolution to
work properly as a non-weak symbol (not tested).
2016-10-29 00:13:11 -07:00
Jason Evans
6ec2d8e279 Do not mark malloc_conf as weak for unit tests.
This is generally correct (no need for weak symbols since no jemalloc
library is involved in the link phase), and avoids linking problems
(apparently unininitialized non-NULL malloc_conf) when using cygwin with
gcc.
2016-10-28 23:03:25 -07:00
Dave Watson
8309388408 Support static linking of jemalloc with glibc
glibc defines its malloc implementation with several weak and strong
symbols:

strong_alias (__libc_calloc, __calloc) weak_alias (__libc_calloc, calloc)
strong_alias (__libc_free, __cfree) weak_alias (__libc_free, cfree)
strong_alias (__libc_free, __free) strong_alias (__libc_free, free)
strong_alias (__libc_malloc, __malloc) strong_alias (__libc_malloc, malloc)

The issue is not with the weak symbols, but that other parts of glibc
depend on __libc_malloc explicitly.  Defining them in terms of jemalloc
API's allows the linker to drop glibc's malloc.o completely from the link,
and static linking no longer results in symbol collisions.

Another wrinkle: jemalloc during initialization calls sysconf to
get the number of CPU's.  GLIBC allocates for the first time before
setting up isspace (and other related) tables, which are used by
sysconf.  Instead, use the pthread API to get the number of
CPUs with GLIBC, which seems to work.

This resolves #442.
2016-10-28 15:08:19 -07:00
Jason Evans
bde815dc40 Reduce memory requirements for regression tests.
This is intended to drop memory usage to a level that AppVeyor test
instances can handle.

This resolves #393.
2016-10-28 11:23:24 -07:00
Jason Evans
970d293257 Periodically purge in memory-intensive integration tests.
This resolves #393.
2016-10-28 11:00:36 -07:00
Jason Evans
963289df13 Periodically purge in memory-intensive integration tests.
This resolves #393.
2016-10-28 10:44:39 -07:00
Jason Evans
68e14c9884 Fix over-sized allocation of rtree leaf nodes.
Use the correct level metadata when allocating child nodes so that leaf
nodes don't end up over-sized (2^16 elements vs 2^4 elements).
2016-10-28 00:16:55 -07:00
Jason Evans
977103c897 Uniformly cast mallctl[bymib]() oldp/newp arguments to (void *).
This avoids warnings in some cases, and is otherwise generally good
hygiene.
2016-10-27 21:31:25 -07:00
Jason Evans
44df4a45cf Explicitly cast negative constants meant for use as unsigned. 2016-10-27 21:29:59 -07:00
Jason Evans
17aa187f6b Add cast to silence (harmless) conversion warning. 2016-10-27 21:29:00 -07:00
Jason Evans
48d4adfbeb Avoid negation of unsigned numbers.
Rather than relying on two's complement negation for alignment mask
generation, use bitwise not and addition.  This dodges warnings from
MSVC, and should be strength-reduced by compiler optimization anyway.
2016-10-27 21:26:33 -07:00
Jason Evans
d76cfec319 Only link with libm (-lm) if necessary.
This fixes warnings when building with MSVC.
2016-10-27 21:23:48 -07:00
Jason Evans
c44fa92db5 Only use --whole-archive with gcc.
Conditionalize use of --whole-archive on the platform plus compiler,
rather than on the ABI.  This fixes a regression caused by
7b24c6e557 (Use --whole-archive when
linking integration tests on MinGW.).
2016-10-27 17:10:56 -07:00
Jason Evans
583c32c305 Do not force lazy lock on Windows.
This reverts 13473c7c66, which was
intended to work around bootstrapping issues when linking statically.
However, this actually causes problems in various other configurations,
so this reversion may force a future fix for the underlying problem, if
it still exists.
2016-10-27 15:41:43 -07:00
Jason Evans
7b24c6e557 Use --whole-archive when linking integration tests on MinGW.
Prior to this change, the malloc_conf weak symbol provided by the
jemalloc dynamic library is always used, even if the application
provides a malloc_conf symbol.  Use the --whole-archive linker option
to allow the weak symbol to be overridden.
2016-10-25 22:03:14 -07:00
Jason Evans
b54d160dc4 Do not (recursively) allocate within tsd_fetch().
Refactor tsd so that tsdn_fetch() does not trigger allocation, since
allocation could cause infinite recursion.

This resolves #458.
2016-10-20 23:59:12 -07:00
Jason Evans
577d4572b0 Make dss operations lockless.
Rather than protecting dss operations with a mutex, use atomic
operations.  This has negligible impact on synchronization overhead
during typical dss allocation, but is a substantial improvement for
extent_in_dss() and the newly added extent_dss_mergeable(), which can be
called multiple times during extent deallocations.

This change also has the advantage of avoiding tsd in deallocation paths
associated with purging, which resolves potential deadlocks during
thread exit due to attempted tsd resurrection.

This resolves #425.
2016-10-13 15:37:00 -07:00
Jason Evans
e5effef428 Add/use adaptive spinning.
Add spin_t and spin_{init,adaptive}(), which provide a simple
abstraction for adaptive spinning.

Adaptively spin during busy waits in bootstrapping and rtree node
initialization.
2016-10-13 14:55:39 -07:00
Jason Evans
9acd5cf178 Remove all vestiges of chunks.
Remove mallctls:
- opt.lg_chunk
- stats.cactive

This resolves #464.
2016-10-12 11:55:43 -07:00
Jason Evans
63b5657aa5 Remove ratio-based purging.
Make decay-based purging the default (and only) mode.

Remove associated mallctls:
- opt.purge
- opt.lg_dirty_mult
- arena.<i>.lg_dirty_mult
- arenas.lg_dirty_mult
- stats.arenas.<i>.lg_dirty_mult

This resolves #385.
2016-10-12 10:40:27 -07:00
Jason Evans
b4b4a77848 Fix and simplify decay-based purging.
Simplify decay-based purging attempts to only be triggered when the
epoch is advanced, rather than every time purgeable memory increases.
In a correctly functioning system (not previously the case; see below),
this only causes a behavior difference if during subsequent purge
attempts the least recently used (LRU) purgeable memory extent is
initially too large to be purged, but that memory is reused between
attempts and one or more of the next LRU purgeable memory extents are
small enough to be purged.  In practice this is an arbitrary behavior
change that is within the set of acceptable behaviors.

As for the purging fix, assure that arena->decay.ndirty is recorded
*after* the epoch advance and associated purging occurs.  Prior to this
fix, it was possible for purging during epoch advance to cause a
substantially underrepresentative (arena->ndirty - arena->decay.ndirty),
i.e. the number of dirty pages attributed to the current epoch was too
low, and a series of unintended purges could result.  This fix is also
relevant in the context of the simplification described above, but the
bug's impact would be limited to over-purging at epoch advances.
2016-10-11 15:30:01 -07:00
Jason Evans
48993ed536 Fix decay tests to all adapt to nstime_monotonic(). 2016-10-11 15:28:43 -07:00
Jason Evans
5f11fb7d43 Do not advance decay epoch when time goes backwards.
Instead, move the epoch backward in time.  Additionally, add
nstime_monotonic() and use it in debug builds to assert that time only
goes backward if nstime_update() is using a non-monotonic time source.
2016-10-10 22:15:10 -07:00
Jason Evans
ee0c74b77a Refactor arena->decay_* into arena->decay.* (arena_decay_t). 2016-10-10 20:32:19 -07:00
Jason Evans
e0164bc63c Refine nstime_update().
Add missing #include <time.h>.  The critical time facilities appear to
have been transitively included via unistd.h and sys/time.h, but in
principle this omission was capable of having caused
clock_gettime(CLOCK_MONOTONIC, ...) to have been overlooked in favor of
gettimeofday(), which in turn could cause spurious non-monotonic time
updates.

Refactor nstime_get() out of nstime_update() and add configure tests for
all variants.

Add CLOCK_MONOTONIC_RAW support (Linux-specific) and
mach_absolute_time() support (OS X-specific).

Do not fall back to clock_gettime(CLOCK_REALTIME, ...).  This was a
fragile Linux-specific workaround, which we're unlikely to use at all
now that clock_gettime(CLOCK_MONOTONIC_RAW, ...) is supported, and if we
have no choice besides non-monotonic clocks, gettimeofday() is only
incrementally worse.
2016-10-10 10:33:59 -07:00
Jason Evans
b6c0867142 Reduce "thread.arena" mallctl contention.
This resolves #460.
2016-10-04 09:54:18 -07:00
Jason Evans
a5a8d7ae8d Remove a size class assertion from extent_size_quantize_floor().
Extent coalescence can result in legitimate calls to
extent_size_quantize_floor() with size larger than LARGE_MAXCLASS.
2016-10-03 14:45:27 -07:00
Jason Evans
871a9498e1 Fix size class overflow bugs.
Avoid calling s2u() on raw extent sizes in extent_recycle().

Clamp psz2ind() (implemented as psz2ind_clamp()) when inserting/removing
into/from size-segregated extent heaps.
2016-10-03 14:18:55 -07:00
Jason Evans
d51139c33c Verify extent hook functions receive correct extent_hooks pointer. 2016-09-29 09:50:35 -07:00
Jason Evans
42e79c58a0 Update extent hook function prototype comments. 2016-09-29 09:49:19 -07:00
Jason Evans
3c8c3e9e9b Close file descriptor after reading "/proc/sys/vm/overcommit_memory".
This bug was introduced by c2f970c32b
(Modify pages_map() to support mapping uncommitted virtual memory.).

This resolves #399.
2016-09-26 15:55:40 -07:00
Thomas Köckerbauer
ea68cd25b6 use install command determined by configure 2016-09-26 15:39:55 -07:00
Bai
020c32859d Readme.txt error for building in the Windows
The command can't work using sh -C sh -c "./autogen.sh CC=cl --enable-lazy-lock=no". 
Change the position of the colon, the command of autogen work.
2016-09-26 15:34:53 -07:00
Eric Le Bihan
df0d273a07 Fix LG_QUANTUM definition for sparc64
GCC 4.9.3 cross-compiled for sparc64 defines __sparc_v9__, not
__sparc64__ nor __sparcv9. This prevents LG_QUANTUM from being defined
properly. Adding this new value to the check solves the issue.
2016-09-26 15:13:07 -07:00
Jason Evans
5ff1839133 Formatting fixes. 2016-09-26 11:00:32 -07:00
Elliot Ronaghan
c096ccfe11 Fix a bug in __builtin_unreachable configure check
In 1167e9e, I accidentally tested je_cv_gcc_builtin_ffsl instead of
je_cv_gcc_builtin_unreachable (copy-paste error), which meant that
JEMALLOC_INTERNAL_UNREACHABLE was always getting defined as abort even if
__builtin_unreachable support was detected.
2016-09-26 10:38:59 -07:00
Jason Evans
61f467e16a Avoid self assignment in tsd_set(). 2016-09-23 12:21:34 -07:00
Jason Evans
0222fb41d1 Add various mutex ownership assertions. 2016-09-23 12:21:34 -07:00
Jason Evans
73868b60f2 Fix extent_{before,last,past}() to return page-aligned results. 2016-09-23 12:21:34 -07:00
Jason Evans
e3187ec6b6 Fix large_dalloc_impl() to always lock large_mtx. 2016-09-23 12:21:34 -07:00
Jason Evans
fd96974040 Add new_addr validation in extent_recycle(). 2016-09-23 12:21:25 -07:00
Jason Evans
f6d01ff4b7 Protect extents_dirty access with extents_mtx.
This fixes race conditions during purging.
2016-09-22 11:57:28 -07:00
Jason Evans
bc49157d21 Fix extent_recycle() to exclude other arenas' extents.
When attempting to recycle an extent at a specified address, check that
the extent belongs to the correct arena.
2016-09-22 11:53:19 -07:00
Qi Wang
1cb399b630 Fix arena_bind().
When tsd is not in nominal state (e.g. during thread termination), we
should not increment nthreads.
2016-09-22 09:13:45 -07:00
Josh Gao
17c4b8de5f Fix -Wundef in _MSC_VER check. 2016-09-15 14:33:28 -07:00
Jason Evans
d4ce47e7fb Change html manual encoding to UTF-8.
This works around GitHub's broken automatic reformatting from ISO-8859-1
to UTF-8 when serving static html.

Remove <parameter/> from e.g. <function>malloc<parameter/></function>,
add a custom template that does not append parentheses, and manually
specify them, e.g. <function>malloc()</function>.  This works around
apparently broken XSL formatting that causes <code/> to be emitted in
html (rather than <code></code>, or better yet, nothing).
2016-09-12 16:31:21 -07:00
Jason Evans
c716c1e531 Update project URL. 2016-09-12 11:56:24 -07:00
Mike Hommey
19c9a3e828 Change how the default zone is found
On OSX 10.12, malloc_default_zone returns a special zone that is not
present in the list of registered zones. That zone uses a "lite zone"
if one is present (apparently enabled when malloc stack logging is
enabled), or the first registered zone otherwise. In practice this
means unless malloc stack logging is enabled, the first registered
zone is the default.

So get the list of zones to get the first one, instead of relying on
malloc_default_zone.
2016-07-08 13:35:35 +09:00
Mike Hommey
4abaee5d13 Avoid getting the same default zone twice in a row.
847ff22 added a call to malloc_default_zone() before the main loop in
register_zone, effectively making malloc_default_zone() called twice
without any different outcome expected in the returned result.

It is also called once at the beginning, and a second time at the end
of the loop block.

Instead, call it only once per iteration.
2016-07-08 13:28:16 +09:00
Elliot Ronaghan
47b34dd398 Disable irrelevant Cray compiler warnings if cc-silence is enabled
Cray is pretty warning-happy, so disable ones that aren't helpful. Each warning
has a numeric value instead of having named flags to disable specific warnings.
Disable warnings 128 and 1357.

128:  Ignore unreachable code warning. Cray warns about `not_reached()` not
      being reachable in a couple of places because it detects that some loops
      will never terminate.

1357: Ignore warning about redefinition of malloc and friends

With this patch, Cray 8.4.0 and 8.5.1 build cleanly and pass `make check`
2016-07-07 15:06:01 -07:00
Elliot Ronaghan
3dee73faf2 Add Cray compiler's equivalent of -Werror before __attribute__ checks
Cray uses -herror_on_warning instead of -Werror. Use it everywhere -Werror is
currently used for __attribute__ checks so configure actually detects they're
not supported.
2016-07-07 13:45:48 -07:00
Elliot Ronaghan
3ef67930e0 Disable automatic dependency generation for the Cray compiler
Cray only supports `-M` for generating dependency files. It does not support
`-MM` or `-MT`, so don't try to use them. I just reused the existing mechanism
for turning auto-dependency generation off (`CC_MM=`), but it might be more
principled to add a configure test to check if the compiler supports `-MM` and
`-MT`, instead of manually tracking which compilers don't support those flags.
2016-07-07 13:45:48 -07:00
Elliot Ronaghan
aec07531bc Add initial support for building with the cray compiler
Get jemalloc building and passing `make check_unit` with cray 8.4. An inlining
bug in 8.4 results in internal errors while trying to build jemalloc. This has
already been reported and fixed for the 8.5 release.

In order to work around the inlining bug, disable gnu compatibility and limit
ipa optimizations.

I copied the msvc compiler check for cray, but note that we perform the test
even if we think we're using gcc because cray pretends to be gcc if `-hgnu`
(which is enabled by default) is used. I couldn't come up with a principled way
to check for the inlining bug, so instead I just checked compiler versions.

The build had lots of warnings I need to address and cray doesn't support -MM
or -MT for dependency tracking, so I had to do `make CC_MM=`.
2016-07-07 13:45:48 -07:00
rustyx
e37720cb4a Fix MSVC project 2016-07-07 13:31:51 -07:00
Elliot Ronaghan
1167e9eff3 Check for __builtin_unreachable at configure time
Add a configure check for __builtin_unreachable instead of basing its
availability on the __GNUC__ version. On OS X using gcc (a real gcc, not the
bundled version that's just a gcc front-end) leads to a linker assertion:

    https://github.com/jemalloc/jemalloc/issues/266

It turns out that this is caused by a gcc bug resulting from the use of
__builtin_unreachable():

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438

To work around this bug, check that __builtin_unreachable() actually works at
configure time, and if it doesn't use abort() instead. The check is based on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57438#c21.

With this `make check` passes with a homebrew installed gcc-5 and gcc-6.
2016-07-07 13:28:44 -07:00
Elliot Ronaghan
ae3314785b Fix librt detection when using a Cray compiler wrapper
The Cray compiler wrappers will often add `-lrt` to the base compiler with
`-static` linking (the default at most sites.) However, `-lrt` isn't
automatically added with `-dynamic`. This means that if jemalloc was built with
`-static`, but then used in a program with `-dynamic` jemalloc won't have
detected that librt is a dependency.

The integration and stress tests use -dynamic, which is causing undefined
references to clock_gettime().

This just adds an extra check for librt (ignoring the autoconf cache) with
`-dynamic` thrown. It also stops filtering librt from the integration tests.

With this `make check` passes for:
 - PrgEnv-gnu
 - PrgEnv-intel
 - PrgEnv-pgi

PrgEnv-cray still needs more work (will be in a separate patch.)
2016-07-07 13:25:01 -07:00
Elliot Ronaghan
ccd6416073 Add -dynamic for integration and stress tests with Cray compiler wrappers
Cray systems come with compiler wrappers to simplify building parallel
applications. CC is the C++ wrapper, and cc is the C wrapper.

The wrappers call the base {Cray, Intel, PGI, or GNU} compiler with vendor
specific flags. The "Programming Environment" (prgenv) that's currently loaded
determines the base compiler. e.g. compiling with gnu looks something like:

    module load PrgEnv-gnu
    cc hello.c

On most systems the wrappers defaults to `-static` mode, which causes them to
only look for static libraries, and not for any dynamic ones (even if the
dynamic version was explicitly listed.)

The integration and stress tests expect to be using the .so, so we have to run
the with -dynamic so that wrapper will find/use the .so.
2016-07-07 13:25:01 -07:00
Mike Hommey
2ea7742e6f Add Travis-CI configuration 2016-07-07 13:16:05 -07:00
Mike Hommey
c2942e2c0e Define 64-bits atomics unconditionally
They are used on all platforms in prng.h.
2016-06-09 23:17:39 +09:00
Mike Hommey
0dad5b7719 Fix extent_*_get to build with MSVC 2016-06-09 22:00:18 +09:00
Mike Hommey
91278fbddf Add an AppVeyor config
This builds jemalloc and runs all checks with:
- MSVC 2015 64-bits
- MSVC 2015 32-bits
- MINGW64 (from msys2)
- MINGW32 (from msys2)

Normally, AppVeyor configs are named appveyor.yml, but it is possible to
configure the .yml file name in the AppVeyor project settings such that
the file stays "hidden", like typical travis configs.
2016-06-09 21:06:22 +09:00
Elliot Ronaghan
8a1a794b0c Don't use compact red-black trees with the pgi compiler
Some bug (either in the red-black tree code, or in the pgi compiler) seems to
cause red-black trees to become unbalanced. This issue seems to go away if we
don't use compact red-black trees. Since red-black trees don't seem to be used
much anymore, I opted for what seems to be an easy fix here instead of digging
in and trying to find the root cause of the bug.

Some context in case it's helpful:

I experienced a ton of segfaults while using pgi as Chapel's target compiler
with jemalloc 4.0.4. The little bit of debugging I did pointed me somewhere
deep in red-black tree manipulation, but I didn't get a chance to investigate
further. It looks like 4.2.0 replaced most uses of red-black trees with
pairing-heaps, which seems to avoid whatever bug I was hitting.

However, `make check_unit` was still failing on the rb test, so I figured the
core issue was just being masked. Here's the `make check_unit` failure:

```sh
=== test/unit/rb ===
test_rb_empty: pass
tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
tree_recurse:test/unit/rb.c:90: Failed assertion: (((_Bool) (((uintptr_t) (left_node)->link.rbn_right_red) & ((size_t)1)))) == (false) --> true != false: Node should be black
test_rb_random:test/unit/rb.c:274: Failed assertion: (imbalances) == (0) --> 1 != 0: Tree is unbalanced
node_remove:test/unit/rb.c:190: Failed assertion: (imbalances) == (0) --> 2 != 0: Tree is unbalanced
<jemalloc>: test/unit/rb.c:43: Failed assertion: "pathp[-1].cmp < 0"
test/test.sh: line 22: 12926 Aborted
Test harness error
```

While starting to debug I saw the RB_COMPACT option and decided to check if
turning that off resolved the bug. It seems to have fixed it (`make check_unit`
passes and the segfaults under Chapel are gone) so it seems like on okay
work-around. I'd imagine this has performance implications for red-black trees
under pgi, but if they're not going to be used much anymore it's probably not a
big deal.
2016-06-08 14:48:55 -07:00
Elliot Ronaghan
fbd7956d45 Work around a weird pgi bug in test/unit/math.c
pgi fails to compile math.c, reporting that `-INFINITY` in `pt_norm_expected[]`
is a "Non-constant" expression. A simplified version of this failure is:

```c
#include <math.h>

static double inf1, inf2 = INFINITY;  // no complaints
static double inf3 = INFINITY;        // suddenly INFINITY is "Non-constant"

int main() { }
```

```sh
PGC-S-0074-Non-constant expression in initializer (t.c: 4)
```

pgi errors on the declaration of inf3, and will compile fine if that line is
removed. I've reported this bug to pgi, but in the meantime I just switched to
using (DBL_MAX + DBL_MAX) to work around this bug.
2016-06-08 14:20:32 -07:00
Jason Evans
b9b3556289 Update ChangeLog for 4.2.1. 2016-06-08 10:19:33 -07:00
Jason Evans
dd752c1ffd Fix potential VM map fragmentation regression.
Revert 245ae6036c (Support --with-lg-page
values larger than actual page size.), because it could cause VM map
fragmentation if the kernel grows mmap()ed memory downward.

This resolves #391.
2016-06-07 14:15:49 -07:00
Elliot Ronaghan
de23f6fce7 Fix mixed decl in nstime.c
Fix mixed decl in the gettimeofday() branch of nstime_update()
2016-06-07 14:03:27 -07:00
Jason Evans
cc289f40b6 Propagate tsdn to default extent hooks.
This avoids bootstrapping issues for configurations that require
allocation during tsd initialization.

This resolves #390.
2016-06-07 13:37:22 -07:00
Jason Evans
02a475d89a Use extent_commit_wrapper() rather than directly calling commit hook.
As a side effect this causes the extent's 'committed' flag to be
updated.
2016-06-06 15:32:01 -07:00
Jason Evans
10b9087b14 Set 'committed' in extent_[de]commit_wrapper(). 2016-06-05 23:24:52 -07:00
Jason Evans
0c5cec833f Relax extent hook tests to work with unsplittable extents. 2016-06-05 22:30:31 -07:00
Jason Evans
487093d999 Fix regressions related extent splitting failures.
Fix a fundamental extent_split_wrapper() bug in an error path.

Fix extent_recycle() to deregister unsplittable extents before leaking
them.

Relax xallocx() test assertions so that unsplittable extents don't cause
test failures.
2016-06-05 22:08:20 -07:00
Jason Evans
9a645c612f Fix an extent [de]allocation/[de]registration race.
Deregister extents before deallocation, so that subsequent
reallocation/registration doesn't race with deregistration.
2016-06-05 21:00:02 -07:00
Jason Evans
4e910fc958 Fix extent_alloc_dss() regressions.
Page-align the gap, if any, and add/use extent_dalloc_gap(), which
registers the gap extent before deallocation.
2016-06-05 21:00:02 -07:00
Jason Evans
c4bb17f891 Fix gdump triggering regression.
Now that extents are not multiples of chunksize, it's necessary to track
pages rather than chunks.
2016-06-05 21:00:02 -07:00
Jason Evans
42faa9e3e0 Work around legitimate xallocx() failures during testing.
With the removal of subchunk size class infrastructure, there are no
large size classes that are guaranteed to be re-expandable in place
unless munmap() is disabled.  Work around these legitimate failures with
rallocx() fallback calls.  If there were no test configuration for which
the xallocx() calls succeeded, it would be important to override the
extent hooks for testing purposes, but by default these tests don't use
the rallocx() fallbacks on Linux, so test coverage is still sufficient.
2016-06-05 21:00:02 -07:00
Jason Evans
04942c3d90 Remove a stray memset(), and fix a junk filling test regression. 2016-06-05 21:00:02 -07:00
Jason Evans
f02fec8839 Silence a bogus compiler warning. 2016-06-05 21:00:02 -07:00
Jason Evans
8835cf3bed Fix locking order reversal in arena_reset(). 2016-06-05 21:00:02 -07:00
Jason Evans
f8f0542194 Modify extent hook functions to take an (extent_t *) argument.
This facilitates the application accessing its own extent allocator
metadata during hook invocations.

This resolves #259.
2016-06-05 21:00:02 -07:00
Jason Evans
6f29a83924 Add rtree lookup path caching.
rtree-based extent lookups remain more expensive than chunk-based run
lookups, but with this optimization the fast path slowdown is ~3 CPU
cycles per metadata lookup (on Intel Core i7-4980HQ), versus ~11 cycles
prior.  The path caching speedup tends to degrade gracefully unless
allocated memory is spread far apart (as is the case when using a
mixture of sbrk() and mmap()).
2016-06-05 20:59:57 -07:00
Jason Evans
7be2ebc23f Make tsd cleanup functions optional, remove noop cleanup functions. 2016-06-05 20:42:24 -07:00
Jason Evans
e28b43a739 Remove some unnecessary locking. 2016-06-05 20:42:24 -07:00
Jason Evans
37f0e34606 Reduce NSZS, since NSIZES (was nsizes) can not be so large. 2016-06-05 20:42:24 -07:00
Jason Evans
819417580e Fix rallocx() sampling code to not eagerly commit sampler update.
rallocx() for an alignment-constrained request may end up with a
smaller-than-worst-case size if in-place reallocation succeeds due to
serendipitous alignment.  In such cases, sampling may not happen.
2016-06-05 20:42:24 -07:00
Jason Evans
b14fdaaca0 Add a missing prof_alloc_rollback() call.
In the case where prof_alloc_prep() is called with an over-estimate of
allocation size, and sampling doesn't end up being triggered, the tctx
must be discarded.
2016-06-05 20:42:24 -07:00
Jason Evans
c8c3cbdf47 Miscellaneous s/chunk/extent/ updates. 2016-06-05 20:42:24 -07:00
Jason Evans
a43db1c608 Relax NBINS constraint (max 255 --> max 256). 2016-06-05 20:42:24 -07:00
Jason Evans
a83a31c1c5 Relax opt_lg_chunk clamping constraints. 2016-06-05 20:42:24 -07:00
Jason Evans
751f2c332d Remove obsolete stats.arenas.<i>.metadata.mapped mallctl.
Rename stats.arenas.<i>.metadata.allocated mallctl to
stats.arenas.<i>.metadata .
2016-06-05 20:42:24 -07:00
Jason Evans
03eea4fb8b Better document --enable-ivsalloc. 2016-06-05 20:42:24 -07:00
Jason Evans
22588dda6e Rename most remaining *chunk* APIs to *extent*. 2016-06-05 20:42:23 -07:00
Jason Evans
0c4932eb1e s/chunk_lookup/extent_lookup/g, s/chunks_rtree/extents_rtree/g 2016-06-05 20:42:23 -07:00
Jason Evans
4a55daa363 s/CHUNK_HOOKS_INITIALIZER/EXTENT_HOOKS_INITIALIZER/g 2016-06-05 20:42:23 -07:00
Jason Evans
c9a76481d8 Rename chunks_{cached,retained,mtx} to extents_{cached,retained,mtx}. 2016-06-05 20:42:23 -07:00
Jason Evans
127026ad98 Rename chunk_*_t hooks to extent_*_t. 2016-06-05 20:42:23 -07:00
Jason Evans
9c305c9e5c s/chunk_hook/extent_hook/g 2016-06-05 20:42:23 -07:00
Jason Evans
7d63fed0fd Rename huge to large. 2016-06-05 20:42:23 -07:00
Jason Evans
714d1640f3 Update private symbols. 2016-06-05 20:42:23 -07:00
Jason Evans
498856f44a Move slabs out of chunks. 2016-06-05 20:42:23 -07:00
Jason Evans
d28e5a6696 Improve interval-based profile dump triggering.
When an allocation is large enough to trigger multiple dumps, use
modular math rather than subtraction to reset the interval counter.
Prior to this change, it was possible for a single allocation to cause
many subsequent allocations to all trigger profile dumps.

When updating usable size for a sampled object, try to cancel out
the difference between LARGE_MINCLASS and usable size from the interval
counter.
2016-06-05 20:42:23 -07:00
Jason Evans
ed2c2427a7 Use huge size class infrastructure for large size classes. 2016-06-05 20:42:18 -07:00
Jason Evans
b46261d58b Implement cache-oblivious support for huge size classes. 2016-06-03 12:27:41 -07:00
Jason Evans
4731cd47f7 Allow chunks to not be naturally aligned.
Precisely size extents for huge size classes that aren't multiples of
chunksize.
2016-06-03 12:27:41 -07:00
Jason Evans
741967e79d Remove CHUNK_ADDR2BASE() and CHUNK_ADDR2OFFSET(). 2016-06-03 12:27:41 -07:00
Jason Evans
23c52c895f Make extent_prof_tctx_[gs]et() atomic. 2016-06-03 12:27:41 -07:00
Jason Evans
760bf11b23 Add extent_dirty_[gs]et(). 2016-06-03 12:27:41 -07:00
Jason Evans
47613afc34 Convert rtree from per chunk to per page.
Refactor [de]registration to maintain interior rtree entries for slabs.
2016-06-03 12:27:41 -07:00
Jason Evans
5c6be2bdd3 Refactor chunk_purge_wrapper() to take extent argument. 2016-06-03 12:27:41 -07:00
Jason Evans
0eb6f08959 Refactor chunk_[de]commit_wrapper() to take extent arguments. 2016-06-03 12:27:41 -07:00
Jason Evans
6c94470822 Refactor chunk_dalloc_{cache,wrapper}() to take extent arguments.
Rename arena_extent_[d]alloc() to extent_[d]alloc().

Move all chunk [de]registration responsibility into chunk.c.
2016-06-03 12:27:41 -07:00
Jason Evans
de0305a7f3 Add/use chunk_split_wrapper().
Remove redundant ptr/oldsize args from huge_*().

Refactor huge/chunk/arena code boundaries.
2016-06-03 12:27:41 -07:00
Jason Evans
1ad060584f Add/use chunk_merge_wrapper(). 2016-06-03 12:27:41 -07:00
Jason Evans
384e88f451 Add/use chunk_commit_wrapper(). 2016-06-03 12:27:41 -07:00
Jason Evans
56e0031d7d Add/use chunk_decommit_wrapper(). 2016-06-03 12:27:41 -07:00
Jason Evans
4d2d9cec5a Merge chunk_alloc_base() into its only caller. 2016-06-03 12:27:41 -07:00
Jason Evans
fc0372a15e Replace extent_tree_szad_* with extent_heap_*. 2016-06-03 12:27:41 -07:00
Jason Evans
ffa45a5331 Use rtree rather than [sz]ad trees for chunk split/coalesce operations. 2016-06-03 12:27:41 -07:00
Jason Evans
25845db7c9 Dodge ivsalloc() assertion in test code. 2016-06-03 12:27:41 -07:00
Jason Evans
93e79c5c3f Remove redundant chunk argument from chunk_{,de,re}register(). 2016-06-03 12:27:41 -07:00
Jason Evans
f442254bdf Fix opt_zero-triggered in-place huge reallocation zeroing.
Fix huge_ralloc_no_move_expand() to update the extent's zeroed attribute
based on the intersection of the previous value and that of the newly
merged trailing extent.
2016-06-03 12:27:41 -07:00
Jason Evans
9aea58d9a2 Add extent_past_get(). 2016-06-03 12:27:41 -07:00
Jason Evans
d78846c989 Replace extent_achunk_[gs]et() with extent_slab_[gs]et(). 2016-06-03 12:27:41 -07:00
Jason Evans
fae8344098 Add extent_active_[gs]et().
Always initialize extents' runs_dirty and chunks_cache linkage.
2016-06-03 12:27:41 -07:00
Jason Evans
6f71844659 Move *PAGE* definitions to pages.h. 2016-06-03 12:27:41 -07:00
Jason Evans
b2a9fae886 Set/unset rtree node for last chunk of extents.
Set/unset rtree node for last chunk of extents, so that the rtree can be
used for chunk coalescing.
2016-06-03 12:27:41 -07:00
Jason Evans
e75e9be130 Add rtree element witnesses. 2016-06-03 12:27:41 -07:00
Jason Evans
8c9be3e837 Refactor rtree to always use base_alloc() for node allocation. 2016-06-03 12:27:41 -07:00
Jason Evans
db72272bef Use rtree-based chunk lookups rather than pointer bit twiddling.
Look up chunk metadata via the radix tree, rather than using
CHUNK_ADDR2BASE().

Propagate pointer's containing extent.

Minimize extent lookups by doing a single lookup (e.g. in free()) and
propagating the pointer's extent into nearly all the functions that may
need it.
2016-06-03 12:27:41 -07:00
Jason Evans
2d2b4e98c9 Add element acquire/release capabilities to rtree.
This makes it possible to acquire short-term "ownership" of rtree
elements so that it is possible to read an extent pointer *and* read the
extent's contents with a guarantee that the element will not be modified
until the ownership is released.  This is intended as a mechanism for
resolving rtree read/write races rather than as a way to lock extents.
2016-06-03 12:27:33 -07:00
Jason Evans
f4a58847d3 Remove obsolete reference to Valgrind and quarantine. 2016-06-03 12:20:02 -07:00
Jason Evans
a7a6f5bc96 Rename extent_node_t to extent_t. 2016-05-16 12:21:28 -07:00
Jason Evans
3aea827f5e Simplify run quantization. 2016-05-16 12:21:27 -07:00
Jason Evans
7bb00ae9d6 Refactor runs_avail.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements.  This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
2016-05-16 12:21:21 -07:00
Jason Evans
226c446979 Implement pz2ind(), pind2sz(), and psz2u().
These compute size classes and indices similarly to size2index(),
index2size() and s2u(), respectively, but using the subset of size
classes that are multiples of the page size.  Note that pszind_t and
szind_t are not interchangeable.
2016-05-13 10:31:54 -07:00
Jason Evans
627372b459 Initialize arena_bin_info at compile time rather than at boot time.
This resolves #370.
2016-05-13 10:31:30 -07:00
Jason Evans
b683734b43 Implement BITMAP_INFO_INITIALIZER(nbits).
This allows static initialization of bitmap_info_t structures.
2016-05-13 10:27:48 -07:00
Jason Evans
17c021c177 Remove redzone support.
This resolves #369.
2016-05-13 10:27:33 -07:00
Jason Evans
ba5c709517 Remove quarantine support. 2016-05-13 10:25:05 -07:00
Jason Evans
9a8add1510 Remove Valgrind support. 2016-05-13 09:56:18 -07:00
Jason Evans
a397045323 Use TSDN_NULL rather than NULL as appropriate. 2016-05-12 21:07:08 -07:00
Jason Evans
dc7ff6306d Fix a typo. 2016-05-12 15:06:50 -07:00
548 changed files with 103673 additions and 36738 deletions

View file

@ -5,24 +5,47 @@ environment:
- MSYSTEM: MINGW64
CPU: x86_64
MSVC: amd64
CONFIG_FLAGS: --enable-debug
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS: --enable-debug
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW32
CPU: i686
MSVC: x86
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS: --enable-debug
- MSYSTEM: MINGW32
CPU: i686
CONFIG_FLAGS: --enable-debug
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW64
CPU: x86_64
MSVC: amd64
CONFIG_FLAGS:
- MSYSTEM: MINGW64
CPU: x86_64
CONFIG_FLAGS:
EXTRA_CFLAGS: "-fcommon"
- MSYSTEM: MINGW32
CPU: i686
MSVC: x86
CONFIG_FLAGS:
- MSYSTEM: MINGW32
CPU: i686
CONFIG_FLAGS:
EXTRA_CFLAGS: "-fcommon"
install:
- set PATH=c:\msys64\%MSYSTEM%\bin;c:\msys64\usr\bin;%PATH%
- if defined MSVC call "c:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" %MSVC%
- if defined MSVC pacman --noconfirm -Rsc mingw-w64-%CPU%-gcc gcc
- pacman --noconfirm -Suy mingw-w64-%CPU%-make
- pacman --noconfirm -Syuu
- pacman --noconfirm -S autoconf
build_script:
- bash -c "autoconf"
- bash -c "./configure"
- mingw32-make -j3
- bash -c "./configure $CONFIG_FLAGS"
- mingw32-make
- file lib/jemalloc.dll
- mingw32-make -j3 tests
- mingw32-make tests
- mingw32-make -k check

122
.clang-format Normal file
View file

@ -0,0 +1,122 @@
# jemalloc targets clang-format version 8. We include every option it supports
# here, but comment out the ones that aren't relevant for us.
---
# AccessModifierOffset: -2
AlignAfterOpenBracket: DontAlign
AlignConsecutiveAssignments: false
AlignConsecutiveDeclarations: true
AlignEscapedNewlines: Right
AlignOperands: false
AlignTrailingComments: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortBlocksOnASingleLine: false
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: Empty
AllowShortIfStatementsOnASingleLine: false
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterReturnType: AllDefinitions
AlwaysBreakBeforeMultilineStrings: true
# AlwaysBreakTemplateDeclarations: Yes
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterClass: true
AfterControlStatement: true
AfterEnum: true
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: true
AfterStruct: true
AfterUnion: true
BeforeCatch: true
BeforeElse: true
IndentBraces: false
# BreakAfterJavaFieldAnnotations: true
BreakBeforeBinaryOperators: NonAssignment
BreakBeforeBraces: Attach
BreakBeforeTernaryOperators: true
# BreakConstructorInitializers: BeforeColon
# BreakInheritanceList: BeforeColon
BreakStringLiterals: false
ColumnLimit: 80
# CommentPragmas: ''
# CompactNamespaces: true
# ConstructorInitializerAllOnOneLineOrOnePerLine: true
# ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DerivePointerAlignment: false
DisableFormat: false
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros: [ ql_foreach, qr_foreach, ]
# IncludeBlocks: Preserve
# IncludeCategories:
# - Regex: '^<.*\.h(pp)?>'
# Priority: 1
# IncludeIsMainRegex: ''
IndentCaseLabels: false
IndentPPDirectives: AfterHash
IndentWidth: 8
IndentWrappedFunctionNames: false
# JavaImportGroups: []
# JavaScriptQuotes: Leave
# JavaScriptWrapImports: True
KeepEmptyLinesAtTheStartOfBlocks: false
Language: Cpp
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
# NamespaceIndentation: None
# ObjCBinPackProtocolList: Auto
# ObjCBlockIndentWidth: 2
# ObjCSpaceAfterProperty: false
# ObjCSpaceBeforeProtocolList: false
PenaltyBreakAssignment: 100
PenaltyBreakBeforeFirstCallParameter: 100
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
# PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 60
PointerAlignment: Right
# RawStringFormats:
# - Language: TextProto
# Delimiters:
# - 'pb'
# - 'proto'
# EnclosingFunctions:
# - 'PARSE_TEXT_PROTO'
# BasedOnStyle: google
# - Language: Cpp
# Delimiters:
# - 'cc'
# - 'cpp'
# BasedOnStyle: llvm
# CanonicalDelimiter: 'cc'
ReflowComments: false
SortIncludes: false
SpaceAfterCStyleCast: false
# SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
# SpaceBeforeCpp11BracedList: false
# SpaceBeforeCtorInitializerColon: true
# SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
# SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInCStyleCastParentheses: false
# SpacesInContainerLiterals: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
# Standard: Cpp11
# This is nominally supported in clang-format version 8, but not in the build
# used by some of the core jemalloc developers.
# StatementMacros: []
TabWidth: 8
UseTab: ForIndentation
...

2
.git-blame-ignore-revs Normal file
View file

@ -0,0 +1,2 @@
554185356bf990155df8d72060c4efe993642baf
34f359e0ca613b5f9d970e9b2152a5203c9df8d6

10
.github/workflows/check_formatting.yaml vendored Normal file
View file

@ -0,0 +1,10 @@
name: 'Check Formatting'
on: [pull_request]
jobs:
check-formatting:
runs-on: ubuntu-latest
steps:
- name: Check out repository
uses: actions/checkout@v4
- name: Check for trailing whitespace
run: scripts/check_trailing_whitespace.sh

66
.github/workflows/freebsd-ci.yml vendored Normal file
View file

@ -0,0 +1,66 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: FreeBSD CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-freebsd:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
debug: ['--enable-debug', '--disable-debug']
prof: ['--enable-prof', '--disable-prof']
arch: ['64-bit', '32-bit']
uncommon:
- ''
- '--with-lg-page=16 --with-malloc-conf=tcache:false'
name: FreeBSD (${{ matrix.arch }}, debug=${{ matrix.debug }}, prof=${{ matrix.prof }}${{ matrix.uncommon && ', uncommon' || '' }})
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 1
- name: Test on FreeBSD
uses: vmactions/freebsd-vm@v1
with:
release: '15.0'
usesh: true
prepare: |
pkg install -y autoconf gmake
run: |
# Verify we're running in FreeBSD
echo "==== System Information ===="
uname -a
freebsd-version
echo "============================"
# Set compiler flags for 32-bit if needed
if [ "${{ matrix.arch }}" = "32-bit" ]; then
export CC="cc -m32"
export CXX="c++ -m32"
fi
# Generate configure script
autoconf
# Configure with matrix options
./configure --with-jemalloc-prefix=ci_ ${{ matrix.debug }} ${{ matrix.prof }} ${{ matrix.uncommon }}
# Get CPU count for parallel builds
export JFLAG=$(sysctl -n kern.smp.cpus)
gmake -j${JFLAG}
gmake -j${JFLAG} tests
gmake check

695
.github/workflows/linux-ci.yml vendored Normal file
View file

@ -0,0 +1,695 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: Linux CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-linux:
runs-on: ubuntu-24.04
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: clang
CXX: clang++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
COMPILER_FLAGS: -m32
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-stats"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --disable-stats"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --disable-libdl"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-stats --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --enable-opt-safety-checks"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--disable-libdl --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-lg-page=16"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-opt-safety-checks --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr --with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary,percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu,background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --disable-cache-oblivious --enable-stats --enable-log --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-debug --enable-experimental-smallocx --enable-stats --enable-prof"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== System Information ==="
uname -a
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== OS Release ==="
cat /etc/os-release || true
echo ""
echo "=== CPU Info ==="
lscpu | grep -E "Architecture|CPU op-mode|Byte Order|CPU\(s\):" || true
- name: Install dependencies (32-bit)
if: matrix.env.CROSS_COMPILE_32BIT == 'yes'
run: |
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install -y gcc-multilib g++-multilib libc6-dev-i386
- name: Build and test
env:
CC: ${{ matrix.env.CC }}
CXX: ${{ matrix.env.CXX }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Verify the script generates the same output
./scripts/gen_gh_actions.py > gh_actions_script.yml
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check
test-linux-arm64:
runs-on: ubuntu-24.04-arm
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: clang
CXX: clang++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-prof
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-lg-hugepage=29"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--enable-prof --enable-prof-frameptr"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=dss:primary"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=background_thread:true"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== System Information ==="
uname -a
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== OS Release ==="
cat /etc/os-release || true
echo ""
echo "=== CPU Info ==="
lscpu | grep -E "Architecture|CPU op-mode|Byte Order|CPU\(s\):" || true
- name: Install dependencies (32-bit)
if: matrix.env.CROSS_COMPILE_32BIT == 'yes'
run: |
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install -y gcc-multilib g++-multilib libc6-dev-i386
- name: Build and test
env:
CC: ${{ matrix.env.CC }}
CXX: ${{ matrix.env.CXX }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Verify the script generates the same output
./scripts/gen_gh_actions.py > gh_actions_script.yml
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check

212
.github/workflows/macos-ci.yml vendored Normal file
View file

@ -0,0 +1,212 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: macOS CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-macos:
runs-on: macos-15-intel
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== macOS Version ==="
sw_vers
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== CPU Info ==="
sysctl -n machdep.cpu.brand_string
sysctl -n hw.machine
- name: Install dependencies
run: |
brew install autoconf
- name: Build and test
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check
test-macos-arm64:
runs-on: macos-15
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-stats
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --disable-libdl
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-opt-safety-checks
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --with-lg-page=16
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-lg-page=16 --with-lg-hugepage=29"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=tcache:false"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: "--with-malloc-conf=percpu_arena:percpu"
EXTRA_CFLAGS: "-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes -Wno-deprecated-declarations"
steps:
- uses: actions/checkout@v4
- name: Show OS version
run: |
echo "=== macOS Version ==="
sw_vers
echo ""
echo "=== Architecture ==="
uname -m
arch
echo ""
echo "=== CPU Info ==="
sysctl -n machdep.cpu.brand_string
sysctl -n hw.machine
- name: Install dependencies
run: |
brew install autoconf
- name: Build and test
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build
make -j3
make -j3 tests
# Run tests
make check

68
.github/workflows/static_analysis.yaml vendored Normal file
View file

@ -0,0 +1,68 @@
name: 'Static Analysis'
on: [pull_request]
jobs:
static-analysis:
runs-on: ubuntu-latest
steps:
# We build libunwind ourselves because sadly the version
# provided by Ubuntu via apt-get is much too old.
- name: Check out libunwind
uses: actions/checkout@v4
with:
repository: libunwind/libunwind
path: libunwind
ref: 'v1.6.2'
github-server-url: 'https://github.com'
- name: Install libunwind
run: |
cd libunwind
autoreconf -i
./configure --prefix=/usr
make -s -j $(nproc) V=0
sudo make -s install V=0
cd ..
rm -rf libunwind
- name: Check out repository
uses: actions/checkout@v4
# We download LLVM directly from the latest stable release
# on GitHub, because this tends to be much newer than the
# version available via apt-get in Ubuntu.
- name: Download LLVM
uses: dsaltares/fetch-gh-release-asset@master
with:
repo: 'llvm/llvm-project'
version: 'tags/llvmorg-16.0.4'
file: 'clang[+]llvm-.*x86_64-linux-gnu.*'
regex: true
target: 'llvm_assets/'
token: ${{ secrets.GITHUB_TOKEN }}
- name: Install prerequisites
id: install_prerequisites
run: |
tar -C llvm_assets -xaf llvm_assets/*.tar* &
sudo apt-get update
sudo apt-get install -y jq bear python3-pip
pip install codechecker
echo "Extracting LLVM from tar" 1>&2
wait
echo "LLVM_BIN_DIR=$(echo llvm_assets/clang*/bin)" >> "$GITHUB_OUTPUT"
- name: Run static analysis
id: run_static_analysis
run: >
PATH="${{ steps.install_prerequisites.outputs.LLVM_BIN_DIR }}:$PATH"
LDFLAGS='-L/usr/lib'
scripts/run_static_analysis.sh static_analysis_results "$GITHUB_OUTPUT"
- name: Upload static analysis results
if: ${{ steps.run_static_analysis.outputs.HAS_STATIC_ANALYSIS_RESULTS }} == '1'
uses: actions/upload-artifact@v4
with:
name: static_analysis_results
path: static_analysis_results
- name: Check static analysis results
run: |
if [[ "${{ steps.run_static_analysis.outputs.HAS_STATIC_ANALYSIS_RESULTS }}" == '1' ]]
then
echo "::error::Static analysis found issues with your code. Download the 'static_analysis_results' artifact from this workflow and view the 'index.html' file contained within it in a web browser locally for detailed results."
exit 1
fi

155
.github/workflows/windows-ci.yml vendored Normal file
View file

@ -0,0 +1,155 @@
# This config file is generated by ./scripts/gen_gh_actions.py.
# Do not edit by hand.
name: Windows CI
on:
push:
branches: [ dev, ci_travis ]
pull_request:
branches: [ dev ]
jobs:
test-windows:
runs-on: windows-latest
strategy:
fail-fast: false
matrix:
include:
- env:
CC: gcc
CXX: g++
EXTRA_CFLAGS: -fcommon
- env:
CC: gcc
CXX: g++
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
CONFIGURE_FLAGS: --enable-debug
- env:
CC: gcc
CXX: g++
CROSS_COMPILE_32BIT: yes
CONFIGURE_FLAGS: --enable-debug
EXTRA_CFLAGS: -fcommon
- env:
CC: cl.exe
CXX: cl.exe
CROSS_COMPILE_32BIT: yes
- env:
CC: cl.exe
CXX: cl.exe
CROSS_COMPILE_32BIT: yes
CONFIGURE_FLAGS: --enable-debug
steps:
- uses: actions/checkout@v4
- name: Show OS version
shell: cmd
run: |
echo === Windows Version ===
systeminfo | findstr /B /C:"OS Name" /C:"OS Version"
ver
echo.
echo === Architecture ===
echo PROCESSOR_ARCHITECTURE=%PROCESSOR_ARCHITECTURE%
echo.
- name: Setup MSYS2
uses: msys2/setup-msys2@v2
with:
msystem: ${{ matrix.env.CROSS_COMPILE_32BIT == 'yes' && 'MINGW32' || 'MINGW64' }}
update: true
install: >-
autotools
git
pacboy: >-
make:p
gcc:p
binutils:p
- name: Build and test (MinGW-GCC)
if: matrix.env.CC != 'cl.exe'
shell: msys2 {0}
env:
CC: ${{ matrix.env.CC || 'gcc' }}
CXX: ${{ matrix.env.CXX || 'g++' }}
COMPILER_FLAGS: ${{ matrix.env.COMPILER_FLAGS }}
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
EXTRA_CFLAGS: ${{ matrix.env.EXTRA_CFLAGS }}
run: |
# Run autoconf
autoconf
# Configure with flags
if [ -n "$COMPILER_FLAGS" ]; then
./configure CC="${CC} ${COMPILER_FLAGS}" CXX="${CXX} ${COMPILER_FLAGS}" $CONFIGURE_FLAGS
else
./configure $CONFIGURE_FLAGS
fi
# Build (mingw32-make is the "make" command in MSYS2)
mingw32-make -j3
mingw32-make tests
# Run tests
mingw32-make -k check
- name: Setup MSVC environment
if: matrix.env.CC == 'cl.exe'
uses: ilammy/msvc-dev-cmd@v1
with:
arch: ${{ matrix.env.CROSS_COMPILE_32BIT == 'yes' && 'x86' || 'x64' }}
- name: Build and test (MSVC)
if: matrix.env.CC == 'cl.exe'
shell: msys2 {0}
env:
CONFIGURE_FLAGS: ${{ matrix.env.CONFIGURE_FLAGS }}
MSYS2_PATH_TYPE: inherit
run: |
# Export MSVC environment variables for configure
export CC=cl.exe
export CXX=cl.exe
export AR=lib.exe
export NM=dumpbin.exe
export RANLIB=:
# Verify cl.exe is accessible (should be in PATH via inherit)
if ! which cl.exe > /dev/null 2>&1; then
echo "cl.exe not found, trying to locate MSVC..."
# Find and add MSVC bin directory to PATH
MSVC_BIN=$(cmd.exe /c "echo %VCToolsInstallDir%" | tr -d '\\r' | sed 's/\\\\\\\\/\//g' | sed 's/C:/\\/c/g')
if [ -n "$MSVC_BIN" ]; then
export PATH="$PATH:$MSVC_BIN/bin/Hostx64/x64:$MSVC_BIN/bin/Hostx86/x86"
fi
fi
# Run autoconf
autoconf
# Configure with MSVC
./configure CC=cl.exe CXX=cl.exe AR=lib.exe $CONFIGURE_FLAGS
# Build (mingw32-make is the "make" command in MSYS2)
mingw32-make -j3
# Build tests sequentially due to PDB file issues
mingw32-make tests
# Run tests
mingw32-make -k check

47
.gitignore vendored
View file

@ -1,5 +1,3 @@
/*.gcov.*
/bin/jemalloc-config
/bin/jemalloc.sh
/bin/jeprof
@ -15,20 +13,25 @@
/doc/jemalloc.html
/doc/jemalloc.3
/doc_internal/PROFILING_INTERNALS.pdf
/jemalloc.pc
/lib/
/Makefile
/include/jemalloc/internal/jemalloc_internal.h
/include/jemalloc/internal/jemalloc_preamble.h
/include/jemalloc/internal/jemalloc_internal_defs.h
/include/jemalloc/internal/private_namespace.gen.h
/include/jemalloc/internal/private_namespace.h
/include/jemalloc/internal/private_unnamespace.h
/include/jemalloc/internal/private_namespace_jet.gen.h
/include/jemalloc/internal/private_namespace_jet.h
/include/jemalloc/internal/private_symbols.awk
/include/jemalloc/internal/private_symbols_jet.awk
/include/jemalloc/internal/public_namespace.h
/include/jemalloc/internal/public_symbols.txt
/include/jemalloc/internal/public_unnamespace.h
/include/jemalloc/internal/size_classes.h
/include/jemalloc/jemalloc.h
/include/jemalloc/jemalloc_defs.h
/include/jemalloc/jemalloc_macros.h
@ -40,49 +43,63 @@
/include/jemalloc/jemalloc_typedefs.h
/src/*.[od]
/src/*.gcda
/src/*.gcno
/src/*.sym
# These are semantically meaningful for clangd and related tooling.
/build/
/.cache/
compile_commands.json
/static_analysis_raw_results
/static_analysis_results
/run_tests.out/
/test/test.sh
test/include/test/jemalloc_test.h
test/include/test/jemalloc_test_defs.h
/test/integration/[A-Za-z]*
!/test/integration/cpp/
!/test/integration/[A-Za-z]*.*
/test/integration/*.[od]
/test/integration/*.gcda
/test/integration/*.gcno
/test/integration/*.out
/test/integration/cpp/[A-Za-z]*
!/test/integration/cpp/[A-Za-z]*.*
/test/integration/cpp/*.[od]
/test/integration/cpp/*.out
/test/src/*.[od]
/test/src/*.gcda
/test/src/*.gcno
/test/stress/[A-Za-z]*
!/test/stress/[A-Za-z]*.*
!/test/stress/pa/
/test/stress/*.[od]
/test/stress/*.gcda
/test/stress/*.gcno
/test/stress/*.out
/test/unit/[A-Za-z]*
!/test/unit/[A-Za-z]*.*
/test/unit/*.[od]
/test/unit/*.gcda
/test/unit/*.gcno
/test/unit/*.out
/test/analyze/[A-Za-z]*
!/test/analyze/[A-Za-z]*.*
/test/analyze/*.[od]
/test/analyze/*.out
/VERSION
*.pdb
*.sdf
*.opendb
*.VC.db
*.opensdf
*.cachefile
*.suo
*.user
*.sln.docstates
*.tmp
.vs/
/msvc/Win32/
/msvc/x64/
/msvc/projects/*/*/Debug*/

View file

@ -1,29 +1,365 @@
language: c
# This config file is generated by ./scripts/gen_travis.py.
# Do not edit by hand.
matrix:
# We use 'minimal', because 'generic' makes Windows VMs hang at startup. Also
# the software provided by 'generic' is simply not needed for our tests.
# Differences are explained here:
# https://docs.travis-ci.com/user/languages/minimal-and-generic/
language: minimal
dist: jammy
jobs:
include:
- os: linux
compiler: gcc
arch: amd64
env: CC=gcc CXX=g++ EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
compiler: gcc
env:
- EXTRA_FLAGS=-m32
addons:
apt:
packages:
- gcc-multilib
- os: osx
compiler: clang
- os: osx
compiler: clang
env:
- EXTRA_FLAGS=-m32
arch: amd64
env: CC=clang CXX=clang++ EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=clang CXX=clang++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CROSS_COMPILE_32BIT=yes COMPILER_FLAGS="-m32" CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr --with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: amd64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu,background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=clang CXX=clang++ EXTRA_CFLAGS="-Werror -Wno-array-bounds -Wno-unknown-warning-option -Wno-ignored-attributes"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-stats" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--disable-libdl" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-opt-safety-checks" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-lg-page=16 --with-lg-hugepage=29" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-prof --enable-prof-frameptr" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=tcache:false" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=dss:primary" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=percpu_arena:percpu" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
- os: linux
arch: arm64
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--with-malloc-conf=background_thread:true" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
# Development build
- os: linux
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --disable-cache-oblivious --enable-stats --enable-log --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
# --enable-expermental-smallocx:
- os: linux
env: CC=gcc CXX=g++ CONFIGURE_FLAGS="--enable-debug --enable-experimental-smallocx --enable-stats --enable-prof" EXTRA_CFLAGS="-Werror -Wno-array-bounds"
before_install:
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/before_install.sh"; then
source ./scripts/$TRAVIS_OS_NAME/before_install.sh
fi
before_script:
- autoconf
- ./configure${EXTRA_FLAGS:+ CC="$CC $EXTRA_FLAGS"}
- make -j3
- make -j3 tests
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/before_script.sh"; then
source ./scripts/$TRAVIS_OS_NAME/before_script.sh
else
scripts/gen_travis.py > travis_script && diff .travis.yml travis_script
autoconf
# If COMPILER_FLAGS are not empty, add them to CC and CXX
./configure ${COMPILER_FLAGS:+ CC="$CC $COMPILER_FLAGS" CXX="$CXX $COMPILER_FLAGS"} $CONFIGURE_FLAGS
make -j3
make -j3 tests
fi
script:
- make check
- |-
if test -f "./scripts/$TRAVIS_OS_NAME/script.sh"; then
source ./scripts/$TRAVIS_OS_NAME/script.sh
else
make check
fi

View file

@ -1,10 +1,10 @@
Unless otherwise specified, files in the jemalloc source distribution are
subject to the following license:
--------------------------------------------------------------------------------
Copyright (C) 2002-2016 Jason Evans <jasone@canonware.com>.
Copyright (C) 2002-present Jason Evans <jasone@canonware.com>.
All rights reserved.
Copyright (C) 2007-2012 Mozilla Foundation. All rights reserved.
Copyright (C) 2009-2016 Facebook, Inc. All rights reserved.
Copyright (C) 2009-present Facebook, Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

787
ChangeLog
View file

@ -4,6 +4,791 @@ brevity. Much more detail can be found in the git revision history:
https://github.com/jemalloc/jemalloc
* 5.3.1 (Apr 13, 2026)
This release includes over 390 commits spanning bug fixes, new features,
performance optimizations, and portability improvements. Multiple percent
of system-level metric improvements were measured in tested production
workloads. The release has gone through large-scale production testing
at Meta.
New features:
- Support pvalloc. (@Lapenkov: 5b1f2cc5)
- Add double free detection for the debug build. (@izaitsevfb:
36366f3c, @guangli-dai: 42daa1ac, @divanorama: 1897f185)
- Add compile-time option `--enable-pageid` to enable memory mapping
annotation. (@devnexen: 4fc5c4fb)
- Add runtime option `prof_bt_max` to control the max stack depth for
profiling. (@guangli-dai: a0734fd6)
- Add compile-time option `--enable-force-getenv` to use `getenv` instead
of `secure_getenv`. (@interwq: 481bbfc9)
- Add compile-time option `--disable-dss` to disable the usage of
`sbrk(2)`. (@Svetlitski: ea5b7bea)
- Add runtime option `tcache_ncached_max` to control the number of items
in each size bin in the thread cache. (@guangli-dai: 8a22d10b)
- Add runtime option `calloc_madvise_threshold` to determine if kernel or
memset is used to zero the allocations for calloc. (@nullptr0-0:
5081c16b)
- Add compile-time option `--disable-user-config` to disable reading the
runtime configurations from `/etc/malloc.conf` or environment variable
`MALLOC_CONF`. (@roblabla: c17bf8b3)
- Add runtime option `disable_large_size_classes` to guard the new usable
size calculation, which minimizes the memory overhead for large
allocations, i.e., >= 4 * PAGE. (@guangli-dai: c067a55c, 8347f104)
- Enable process_madvise usage, add runtime option
`process_madvise_max_batch` to control the max # of regions in each
madvise batch. (@interwq: 22440a02, @spredolac: 4246475b)
- Add mallctl interfaces:
+ `opt.prof_bt_max` (@guangli-dai: a0734fd6)
+ `arena.<i>.name` to set and get arena names. (@guangli-dai: ba19d2cb)
+ `thread.tcache.max` to set and get the `tcache_max` of the current
thread. (@guangli-dai: a442d9b8)
+ `thread.tcache.ncached_max.write` and
`thread.tcache.ncached_max.read_sizeclass` to set and get the
`ncached_max` setup of the current thread. (@guangli-dai: 630f7de9,
6b197fdd)
+ `arenas.hugepage` to return the hugepage size used, also exported to
malloc stats. (@ilvokhin: 90c627ed)
+ `approximate_stats.active` to return an estimate of the current active
bytes, which should not be compared with other stats retrieved.
(@guangli-dai: 0988583d)
Bug fixes:
- Prevent potential deadlocks in decaying during reentrancy. (@interwq:
434a68e2)
- Fix segfault in extent coalescing. (@Svetlitski: 12311fe6)
- Add null pointer detections in mallctl calls. (@Svetlitski: dc0a184f,
0288126d)
- Make mallctl `arenas.lookup` triable without crashing on invalid
pointers. (@auxten: 019cccc2, 5bac3849)
- Demote sampled allocations for proper deallocations during
`arena_reset`. (@Svetlitski: 62648c88)
- Fix jemalloc's `read(2)` and `write(2)`. (@Svetlitski: d2c9ed3d, @lexprfuncall:
9fdc1160)
- Fix the pkg-config metadata file. (@BtbN: ed7e6fe7, ce8ce99a)
- Fix the autogen.sh so that it accepts quoted extra options.
(@honggyukim: f6fe6abd)
- Fix `rallocx()` to set errno to ENOMEM upon OOMing. (@arter97: 38056fea,
@interwq: 83b07578)
- Avoid stack overflow for internal variable array usage. (@nullptr0-0:
47c9bcd4, 48f66cf4, @xinydev: 9169e927)
- Fix background thread initialization race. (@puzpuzpuz: 4d0ffa07)
- Guard os_page_id against a NULL address. (@lexprfuncall: 79cc7dcc)
- Handle tcache init failures gracefully. (@lexprfuncall: a056c20d)
- Fix missing release of acquired neighbor edata in
extent_try_coalesce_impl. (@spredolac: 675ab079)
- Fix memory leak of old curr_reg on san_bump_grow_locked failure.
(@spredolac: 5904a421)
- Fix large alloc nrequests under-counting on cache misses. (@spredolac:
3cc56d32)
Portability improvements:
- Fix the build in C99. (@abaelhe: 56ddbea2)
- Add `pthread_setaffinity_np` detection for non Linux/BSD platforms.
(@devnexen: 4c95c953)
- Make `VARIABLE_ARRAY` compatible with compilers not supporting VLA,
i.e., Visual Studio C compiler in C11 or C17 modes. (@madscientist:
be65438f)
- Fix the build on Linux using musl library. (@marv: aba1645f, 45249cf5)
- Reduce the memory overhead in small allocation sampling for systems
with larger page sizes, e.g., ARM. (@Svetlitski: 5a858c64)
- Add C23's `free_sized` and `free_aligned_sized`. (@Svetlitski:
cdb2c0e0)
- Enable heap profiling on MacOS. (@nullptr0-0: 4b555c11)
- Fix incorrect printing on 32bit. (@sundb: 630434bb)
- Make `JEMALLOC_CXX_THROW` compatible with C++ versions newer than
C++17. (@r-barnes, @guangli-dai: 21bcc0a8)
- Fix mmap tag conflicts on MacOS. (@kdrag0n: c893fcd1)
- Fix monotonic timer assumption for win32. (@burtonli: 8dc97b11)
- Fix VM over-reservation on systems with larger pages, e.g., aarch64.
(@interwq: cd05b19f)
- Remove `unreachable()` macro conditionally to prevent definition
conflicts for C23+. (@appujee: d8486b26, 4b88bddb)
- Fix dlsym failure observed on FreeBSD. (@rhelmot: 86bbabac)
- Change the default page size to 64KB on aarch64 Linux. (@lexprfuncall:
9442300c)
- Update config.guess and config.sub to the latest version.
(@lexprfuncall: c51949ea)
- Determine the page size on Android from NDK header files.
(@lexprfuncall: c51abba1)
- Improve the portability of grep patterns in configure.ac.
(@lexprfuncall: 365747bc)
- Add compile-time option `--with-cxx-stdlib` to specify the C++ standard
library. (@yuxuanchen1997: a10ef3e1)
Optimizations and refactors:
- Enable tcache for deallocation-only threads. (@interwq: 143e9c4a)
- Inline to accelerate operator delete. (@guangli-dai: e8f9f138)
- Optimize pairing heap's performance. (@deadalnix: 5266152d, be6da4f6,
543e2d61, 10d71315, 92aa52c0, @Svetlitski: 36ca0c1b)
- Inline the storage for thread name in the profiling data. (@interwq:
ce0b7ab6, e62aa478)
- Optimize a hot function `edata_cmp_summary_comp` to accelerate it.
(@Svetlitski: 6841110b, @guangli-dai: 0181aaa4)
- Allocate thread cache using the base allocator, which enables thread
cache to use thp when `metadata_thp` is turned on. (@interwq:
72cfdce7)
- Allow oversize arena not to purge immediately when background threads
are enabled, although the default decay time is 0 to be back compatible.
(@interwq: d1313313)
- Optimize thread-local storage implementation on Windows. (@mcfi:
9e123a83, 3a0d9cda)
- Optimize fast path to allow static size class computation. (@interwq:
323ed2e3)
- Redesign tcache GC to regulate the frequency and make it
locality-aware. The new design is default on, guarded by option
`experimental_tcache_gc`. (@nullptr0-0: 0c88be9e, e2c9f3a9,
14d5dc13, @deadalnix: 5afff2e4)
- Reduce the arena switching overhead by avoiding forced purging when
background thread is enabled. (@interwq: a3910b98)
- Improve the reuse efficiency by limiting the maximum coalesced size for
large extents. (@jiebinn: 3c14707b)
- Refactor thread events to allow registration of users' thread events
and remove prof_threshold as the built-in event. (@spredolac: e6864c60,
015b0179, 34ace916)
Documentation:
- Update Windows building instructions. (@Lapenkov: 37139328)
- Add vcpkg installation instructions. (@LilyWangLL: c0c9783e)
- Update profiling internals with an example. (@jordalgo: b04e7666)
* 5.3.0 (May 6, 2022)
This release contains many speed and space optimizations, from micro
optimizations on common paths to rework of internal data structures and
locking schemes, and many more too detailed to list below. Multiple percent
of system level metric improvements were measured in tested production
workloads. The release has gone through large-scale production testing.
New features:
- Add the thread.idle mallctl which hints that the calling thread will be
idle for a nontrivial period of time. (@davidtgoldblatt)
- Allow small size classes to be the maximum size class to cache in the
thread-specific cache, through the opt.[lg_]tcache_max option. (@interwq,
@jordalgo)
- Make the behavior of realloc(ptr, 0) configurable with opt.zero_realloc.
(@davidtgoldblatt)
- Add 'make uninstall' support. (@sangshuduo, @Lapenkov)
- Support C++17 over-aligned allocation. (@marksantaniello)
- Add the thread.peak mallctl for approximate per-thread peak memory tracking.
(@davidtgoldblatt)
- Add interval-based stats output opt.stats_interval. (@interwq)
- Add prof.prefix to override filename prefixes for dumps. (@zhxchen17)
- Add high resolution timestamp support for profiling. (@tyroguru)
- Add the --collapsed flag to jeprof for flamegraph generation.
(@igorwwwwwwwwwwwwwwwwwwww)
- Add the --debug-syms-by-id option to jeprof for debug symbols discovery.
(@DeannaGelbart)
- Add the opt.prof_leak_error option to exit with error code when leak is
detected using opt.prof_final. (@yunxuo)
- Add opt.cache_oblivious as an runtime alternative to config.cache_oblivious.
(@interwq)
- Add mallctl interfaces:
+ opt.zero_realloc (@davidtgoldblatt)
+ opt.cache_oblivious (@interwq)
+ opt.prof_leak_error (@yunxuo)
+ opt.stats_interval (@interwq)
+ opt.stats_interval_opts (@interwq)
+ opt.tcache_max (@interwq)
+ opt.trust_madvise (@azat)
+ prof.prefix (@zhxchen17)
+ stats.zero_reallocs (@davidtgoldblatt)
+ thread.idle (@davidtgoldblatt)
+ thread.peak.{read,reset} (@davidtgoldblatt)
Bug fixes:
- Fix the synchronization around explicit tcache creation which could cause
invalid tcache identifiers. This regression was first released in 5.0.0.
(@yoshinorim, @davidtgoldblatt)
- Fix a profiling biasing issue which could cause incorrect heap usage and
object counts. This issue existed in all previous releases with the heap
profiling feature. (@davidtgoldblatt)
- Fix the order of stats counter updating on large realloc which could cause
failed assertions. This regression was first released in 5.0.0. (@azat)
- Fix the locking on the arena destroy mallctl, which could cause concurrent
arena creations to fail. This functionality was first introduced in 5.0.0.
(@interwq)
Portability improvements:
- Remove nothrow from system function declarations on macOS and FreeBSD.
(@davidtgoldblatt, @fredemmott, @leres)
- Improve overcommit and page alignment settings on NetBSD. (@zoulasc)
- Improve CPU affinity support on BSD platforms. (@devnexen)
- Improve utrace detection and support. (@devnexen)
- Improve QEMU support with MADV_DONTNEED zeroed pages detection. (@azat)
- Add memcntl support on Solaris / illumos. (@devnexen)
- Improve CPU_SPINWAIT on ARM. (@AWSjswinney)
- Improve TSD cleanup on FreeBSD. (@Lapenkov)
- Disable percpu_arena if the CPU count cannot be reliably detected. (@azat)
- Add malloc_size(3) override support. (@devnexen)
- Add mmap VM_MAKE_TAG support. (@devnexen)
- Add support for MADV_[NO]CORE. (@devnexen)
- Add support for DragonFlyBSD. (@devnexen)
- Fix the QUANTUM setting on MIPS64. (@brooksdavis)
- Add the QUANTUM setting for ARC. (@vineetgarc)
- Add the QUANTUM setting for LoongArch. (@wangjl-uos)
- Add QNX support. (@jqian-aurora)
- Avoid atexit(3) calls unless the relevant profiling features are enabled.
(@BusyJay, @laiwei-rice, @interwq)
- Fix unknown option detection when using Clang. (@Lapenkov)
- Fix symbol conflict with musl libc. (@georgthegreat)
- Add -Wimplicit-fallthrough checks. (@nickdesaulniers)
- Add __forceinline support on MSVC. (@santagada)
- Improve FreeBSD and Windows CI support. (@Lapenkov)
- Add CI support for PPC64LE architecture. (@ezeeyahoo)
Incompatible changes:
- Maximum size class allowed in tcache (opt.[lg_]tcache_max) now has an upper
bound of 8MiB. (@interwq)
Optimizations and refactors (@davidtgoldblatt, @Lapenkov, @interwq):
- Optimize the common cases of the thread cache operations.
- Optimize internal data structures, including RB tree and pairing heap.
- Optimize the internal locking on extent management.
- Extract and refactor the internal page allocator and interface modules.
Documentation:
- Fix doc build with --with-install-suffix. (@lawmurray, @interwq)
- Add PROFILING_INTERNALS.md. (@davidtgoldblatt)
- Ensure the proper order of doc building and installation. (@Mingli-Yu)
* 5.2.1 (August 5, 2019)
This release is primarily about Windows. A critical virtual memory leak is
resolved on all Windows platforms. The regression was present in all releases
since 5.0.0.
Bug fixes:
- Fix a severe virtual memory leak on Windows. This regression was first
released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
@interwq)
- Fix size 0 handling in posix_memalign(). This regression was first released
in 5.2.0. (@interwq)
- Fix the prof_log unit test which may observe unexpected backtraces from
compiler optimizations. The test was first added in 5.2.0. (@marxin,
@gnzlbg, @interwq)
- Fix the declaration of the extent_avail tree. This regression was first
released in 5.1.0. (@zoulasc)
- Fix an incorrect reference in jeprof. This functionality was first released
in 3.0.0. (@prehistoric-penguin)
- Fix an assertion on the deallocation fast-path. This regression was first
released in 5.2.0. (@yinan1048576)
- Fix the TLS_MODEL attribute in headers. This regression was first released
in 5.0.0. (@zoulasc, @interwq)
Optimizations and refactors:
- Implement opt.retain on Windows and enable by default on 64-bit. (@interwq,
@davidtgoldblatt)
- Optimize away a branch on the operator delete[] path. (@mgrice)
- Add format annotation to the format generator function. (@zoulasc)
- Refactor and improve the size class header generation. (@yinan1048576)
- Remove best fit. (@djwatson)
- Avoid blocking on background thread locks for stats. (@oranagra, @interwq)
* 5.2.0 (April 2, 2019)
This release includes a few notable improvements, which are summarized below:
1) improved fast-path performance from the optimizations by @djwatson; 2)
reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on
setting the number of background threads. In addition, peak / spike memory
usage is improved with certain allocation patterns. As usual, the release and
prior dev versions have gone through large-scale production testing.
New features:
- Implement oversize_threshold, which uses a dedicated arena for allocations
crossing the specified threshold to reduce fragmentation. (@interwq)
- Add extents usage information to stats. (@tyleretzel)
- Log time information for sampled allocations. (@tyleretzel)
- Support 0 size in sdallocx. (@djwatson)
- Output rate for certain counters in malloc_stats. (@zinoale)
- Add configure option --enable-readlinkat, which allows the use of readlinkat
over readlink. (@davidtgoldblatt)
- Add configure options --{enable,disable}-{static,shared} to allow not
building unwanted libraries. (@Ericson2314)
- Add configure option --disable-libdl to enable fully static builds.
(@interwq)
- Add mallctl interfaces:
+ opt.oversize_threshold (@interwq)
+ stats.arenas.<i>.extent_avail (@tyleretzel)
+ stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
+ stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
(@tyleretzel)
Portability improvements:
- Update MSVC builds. (@maksqwe, @rustyx)
- Workaround a compiler optimizer bug on s390x. (@rkmisra)
- Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
- Implement malloc_getcpu() to enable percpu_arena for windows. (@santagada)
- Link against -pthread instead of -lpthread. (@paravoid)
- Make background_thread not dependent on libdl. (@interwq)
- Add stringify to fix a linker directive issue on MSVC. (@daverigby)
- Detect and fall back when 8-bit atomics are unavailable. (@interwq)
- Fall back to the default pthread_create if dlsym(3) fails. (@interwq)
Optimizations and refactors:
- Refactor the TSD module. (@davidtgoldblatt)
- Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
- Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
- Optimize ixalloc by avoiding a size lookup. (@interwq)
- Implement opt.oversize_threshold which uses a dedicated arena for requests
crossing the threshold, also eagerly purges the oversize extents. Default
the threshold to 8 MiB. (@interwq)
- Clean compilation with -Wextra. (@gnzlbg, @jasone)
- Refactor the size class module. (@davidtgoldblatt)
- Refactor the stats emitter. (@tyleretzel)
- Optimize pow2_ceil. (@rkmisra)
- Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
- Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
- Improve error handling for THP state initialization. (@jsteemann)
- Rework the malloc() fast path. (@djwatson)
- Rework the free() fast path. (@djwatson)
- Refactor and optimize the tcache fill / flush paths. (@djwatson)
- Optimize sync / lwsync on PowerPC. (@chmeeedalf)
- Bypass extent_dalloc() when retain is enabled. (@interwq)
- Optimize the locking on large deallocation. (@interwq)
- Reduce the number of pages committed from sanity checking in debug build.
(@trasz, @interwq)
- Deprecate OSSpinLock. (@interwq)
- Lower the default number of background threads to 4 (when the feature
is enabled). (@interwq)
- Optimize the trylock spin wait. (@djwatson)
- Use arena index for arena-matching checks. (@interwq)
- Avoid forced decay on thread termination when using background threads.
(@interwq)
- Disable muzzy decay by default. (@djwatson, @interwq)
- Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
@interwq)
Bug fixes (all only relevant to jemalloc 5.x):
- Fix background thread index issues with max_background_threads. (@djwatson,
@interwq)
- Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
- Fix opt.prof_prefix initialization. (@davidtgoldblatt)
- Properly trigger decay on tcache destroy. (@interwq, @amosbird)
- Fix tcache.flush. (@interwq)
- Detect whether explicit extent zero out is necessary with huge pages or
custom extent hooks, which may change the purge semantics. (@interwq)
- Fix a side effect caused by extent_max_active_fit combined with decay-based
purging, where freed extents can accumulate and not be reused for an
extended period of time. (@interwq, @mpghf)
- Fix a missing unlock on extent register error handling. (@zoulasc)
Testing:
- Simplify the Travis script output. (@gnzlbg)
- Update the test scripts for FreeBSD. (@devnexen)
- Add unit tests for the producer-consumer pattern. (@interwq)
- Add Cirrus-CI config for FreeBSD builds. (@jasone)
- Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
@interwq)
Incompatible changes:
- Remove --with-lg-page-sizes. (@davidtgoldblatt)
Documentation:
- Attempt to build docs by default, however skip doc building when xsltproc
is missing. (@interwq, @cmuellner)
* 5.1.0 (May 4, 2018)
This release is primarily about fine-tuning, ranging from several new features
to numerous notable performance and portability enhancements. The release and
prior dev versions have been running in multiple large scale applications for
months, and the cumulative improvements are substantial in many cases.
Given the long and successful production runs, this release is likely a good
candidate for applications to upgrade, from both jemalloc 5.0 and before. For
performance-critical applications, the newly added TUNING.md provides
guidelines on jemalloc tuning.
New features:
- Implement transparent huge page support for internal metadata. (@interwq)
- Add opt.thp to allow enabling / disabling transparent huge pages for all
mappings. (@interwq)
- Add maximum background thread count option. (@djwatson)
- Allow prof_active to control opt.lg_prof_interval and prof.gdump.
(@interwq)
- Allow arena index lookup based on allocation addresses via mallctl.
(@lionkov)
- Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
- Add opt.lg_extent_max_active_fit to set the max ratio between the size of
the active extent selected (to split off from) and the size of the requested
allocation. (@interwq, @davidtgoldblatt)
- Add retain_grow_limit to set the max size when growing virtual address
space. (@interwq)
- Add mallctl interfaces:
+ arena.<i>.retain_grow_limit (@interwq)
+ arenas.lookup (@lionkov)
+ max_background_threads (@djwatson)
+ opt.lg_extent_max_active_fit (@interwq)
+ opt.max_background_threads (@djwatson)
+ opt.metadata_thp (@interwq)
+ opt.thp (@interwq)
+ stats.metadata_thp (@interwq)
Portability improvements:
- Support GNU/kFreeBSD configuration. (@paravoid)
- Support m68k, nios2 and SH3 architectures. (@paravoid)
- Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
- Fix symbol listing for cross-compiling. (@tamird)
- Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
- Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
- Fix MSVC 2015 & 2017 builds. (@rustyx)
- Improve RISC-V support. (@EdSchouten)
- Set name mangling script in strict mode. (@nicolov)
- Avoid MADV_HUGEPAGE on ARM. (@marxin)
- Modify configure to determine return value of strerror_r.
(@davidtgoldblatt, @cferris1000)
- Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
- Fix 32-bit build on MSVC. (@rustyx)
- Fix external symbol on MSVC. (@maksqwe)
- Avoid a printf format specifier warning. (@jasone)
- Add configure option --disable-initial-exec-tls which can allow jemalloc to
be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
- AArch64: Add ILP32 support. (@cmuellner)
- Add --with-lg-vaddr configure option to support cross compiling.
(@cmuellner, @davidtgoldblatt)
Optimizations and refactors:
- Improve active extent fit with extent_max_active_fit. This considerably
reduces fragmentation over time and improves virtual memory and metadata
usage. (@davidtgoldblatt, @interwq)
- Eagerly coalesce large extents to reduce fragmentation. (@interwq)
- sdallocx: only read size info when page aligned (i.e. possibly sampled),
which speeds up the sized deallocation path significantly. (@interwq)
- Avoid attempting new mappings for in place expansion with retain, since
it rarely succeeds in practice and causes high overhead. (@interwq)
- Refactor OOM handling in newImpl. (@wqfish)
- Add internal fine-grained logging functionality for debugging use.
(@davidtgoldblatt)
- Refactor arena / tcache interactions. (@davidtgoldblatt)
- Refactor extent management with dumpable flag. (@davidtgoldblatt)
- Add runtime detection of lazy purging. (@interwq)
- Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
- Use sysctl on startup in FreeBSD. (@trasz)
- Use thread local prng state instead of atomic. (@djwatson)
- Make decay to always purge one more extent than before, because in
practice large extents are usually the ones that cross the decay threshold.
Purging the additional extent helps save memory as well as reduce VM
fragmentation. (@interwq)
- Fast division by dynamic values. (@davidtgoldblatt)
- Improve the fit for aligned allocation. (@interwq, @edwinsmith)
- Refactor extent_t bitpacking. (@rkmisra)
- Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
- Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
- Remove preserve_lru feature for extents management. (@djwatson)
- Consolidate two memory loads into one on the fast deallocation path.
(@davidtgoldblatt, @interwq)
Bug fixes (most of the issues are only relevant to jemalloc 5.0):
- Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
- Validate returned file descriptor before use. (@zonyitoo)
- Fix a few background thread initialization and shutdown issues. (@interwq)
- Fix an extent coalesce + decay race by taking both coalescing extents off
the LRU list. (@interwq)
- Fix potentially unbound increase during decay, caused by one thread keep
stashing memory to purge while other threads generating new pages. The
number of pages to purge is checked to prevent this. (@interwq)
- Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
- Handle 32 bit mutex counters. (@rkmisra)
- Fix a indexing bug when creating background threads. (@davidtgoldblatt,
@binliu19)
- Fix arguments passed to extent_init. (@yuleniwo, @interwq)
- Fix addresses used for ordering mutexes. (@rkmisra)
- Fix abort_conf processing during bootstrap. (@interwq)
- Fix include path order for out-of-tree builds. (@cmuellner)
Incompatible changes:
- Remove --disable-thp. (@interwq)
- Remove mallctl interfaces:
+ config.thp (@interwq)
Documentation:
- Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
* 5.0.1 (July 1, 2017)
This bugfix release fixes several issues, most of which are obscure enough
that typical applications are not impacted.
Bug fixes:
- Update decay->nunpurged before purging, in order to avoid potential update
races and subsequent incorrect purging volume. (@interwq)
- Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
locking and/or background threads). This mitigates an initialization
failure bug for which we still do not have a clear reproduction test case.
(@interwq)
- Modify tsd management so that it neither crashes nor leaks if a thread's
only allocation activity is to call free() after TLS destructors have been
executed. This behavior was observed when operating with GNU libc, and is
unlikely to be an issue with other libc implementations. (@interwq)
- Mask signals during background thread creation. This prevents signals from
being inadvertently delivered to background threads. (@jasone,
@davidtgoldblatt, @interwq)
- Avoid inactivity checks within background threads, in order to prevent
recursive mutex acquisition. (@interwq)
- Fix extent_grow_retained() to use the specified hooks when the
arena.<i>.extent_hooks mallctl is used to override the default hooks.
(@interwq)
- Add missing reentrancy support for custom extent hooks which allocate.
(@interwq)
- Post-fork(2), re-initialize the list of tcaches associated with each arena
to contain no tcaches except the forking thread's. (@interwq)
- Add missing post-fork(2) mutex reinitialization for extent_grow_mtx. This
fixes potential deadlocks after fork(2). (@interwq)
- Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
generate corrupt configure scripts. (@jasone)
- Ensure that the configured page size (--with-lg-page) is no larger than the
configured huge page size (--with-lg-hugepage). (@jasone)
* 5.0.0 (June 13, 2017)
Unlike all previous jemalloc releases, this release does not use naturally
aligned "chunks" for virtual memory management, and instead uses page-aligned
"extents". This change has few externally visible effects, but the internal
impacts are... extensive. Many other internal changes combine to make this
the most cohesively designed version of jemalloc so far, with ample
opportunity for further enhancements.
Continuous integration is now an integral aspect of development thanks to the
efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
side effect the official release frequency may decrease over time.
New features:
- Implement optional per-CPU arena support; threads choose which arena to use
based on current CPU rather than on fixed thread-->arena associations.
(@interwq)
- Implement two-phase decay of unused dirty pages. Pages transition from
dirty-->muzzy-->clean, where the first phase transition relies on
madvise(... MADV_FREE) semantics, and the second phase transition discards
pages such that they are replaced with demand-zeroed pages on next access.
(@jasone)
- Increase decay time resolution from seconds to milliseconds. (@jasone)
- Implement opt-in per CPU background threads, and use them for asynchronous
decay-driven unused dirty page purging. (@interwq)
- Add mutex profiling, which collects a variety of statistics useful for
diagnosing overhead/contention issues. (@interwq)
- Add C++ new/delete operator bindings. (@djwatson)
- Support manually created arena destruction, such that all data and metadata
are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
associated with destroyed arenas. (@jasone)
- Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
merged/destroyed arena statistics via mallctl. (@jasone)
- Add opt.abort_conf to optionally abort if invalid configuration options are
detected during initialization. (@interwq)
- Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
stats dumped during exit if opt.stats_print is true. (@jasone)
- Add --with-version=VERSION for use when embedding jemalloc into another
project's git repository. (@jasone)
- Add --disable-thp to support cross compiling. (@jasone)
- Add --with-lg-hugepage to support cross compiling. (@jasone)
- Add mallctl interfaces (various authors):
+ background_thread
+ opt.abort_conf
+ opt.retain
+ opt.percpu_arena
+ opt.background_thread
+ opt.{dirty,muzzy}_decay_ms
+ opt.stats_print_opts
+ arena.<i>.initialized
+ arena.<i>.destroy
+ arena.<i>.{dirty,muzzy}_decay_ms
+ arena.<i>.extent_hooks
+ arenas.{dirty,muzzy}_decay_ms
+ arenas.bin.<i>.slab_size
+ arenas.nlextents
+ arenas.lextent.<i>.size
+ arenas.create
+ stats.background_thread.{num_threads,num_runs,run_interval}
+ stats.mutexes.{ctl,background_thread,prof,reset}.
{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
num_owner_switch}
+ stats.arenas.<i>.{dirty,muzzy}_decay_ms
+ stats.arenas.<i>.uptime
+ stats.arenas.<i>.{pmuzzy,base,internal,resident}
+ stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
+ stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
+ stats.arenas.<i>.bins.<j>.mutex.
{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
num_owner_switch}
+ stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
+ stats.arenas.i.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
{num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
num_owner_switch}
Portability improvements:
- Improve reentrant allocation support, such that deadlock is less likely if
e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
@interwq)
- Support static linking of jemalloc with glibc. (@djwatson)
Optimizations and refactors:
- Organize virtual memory as "extents" of virtual memory pages, rather than as
naturally aligned "chunks", and store all metadata in arbitrarily distant
locations. This reduces virtual memory external fragmentation, and will
interact better with huge pages (not yet explicitly supported). (@jasone)
- Fold large and huge size classes together; only small and large size classes
remain. (@jasone)
- Unify the allocation paths, and merge most fast-path branching decisions.
(@davidtgoldblatt, @interwq)
- Embed per thread automatic tcache into thread-specific data, which reduces
conditional branches and dereferences. Also reorganize tcache to increase
fast-path data locality. (@interwq)
- Rewrite atomics to closely model the C11 API, convert various
synchronization from mutex-based to atomic, and use the explicit memory
ordering control to resolve various hypothetical races without increasing
synchronization overhead. (@davidtgoldblatt)
- Extensively optimize rtree via various methods:
+ Add multiple layers of rtree lookup caching, since rtree lookups are now
part of fast-path deallocation. (@interwq)
+ Determine rtree layout at compile time. (@jasone)
+ Make the tree shallower for common configurations. (@jasone)
+ Embed the root node in the top-level rtree data structure, thus avoiding
one level of indirection. (@jasone)
+ Further specialize leaf elements as compared to internal node elements,
and directly embed extent metadata needed for fast-path deallocation.
(@jasone)
+ Ignore leading always-zero address bits (architecture-specific).
(@jasone)
- Reorganize headers (ongoing work) to make them hermetic, and disentangle
various module dependencies. (@davidtgoldblatt)
- Convert various internal data structures such as size class metadata from
boot-time-initialized to compile-time-initialized. Propagate resulting data
structure simplifications, such as making arena metadata fixed-size.
(@jasone)
- Simplify size class lookups when constrained to size classes that are
multiples of the page size. This speeds lookups, but the primary benefit is
complexity reduction in code that was the source of numerous regressions.
(@jasone)
- Lock individual extents when possible for localized extent operations,
rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
- Use first fit layout policy instead of best fit, in order to improve
packing. (@jasone)
- If munmap(2) is not in use, use an exponential series to grow each arena's
virtual memory, so that the number of disjoint virtual memory mappings
remains low. (@jasone)
- Implement per arena base allocators, so that arenas never share any virtual
memory pages. (@jasone)
- Automatically generate private symbol name mangling macros. (@jasone)
Incompatible changes:
- Replace chunk hooks with an expanded/normalized set of extent hooks.
(@jasone)
- Remove ratio-based purging. (@jasone)
- Remove --disable-tcache. (@jasone)
- Remove --disable-tls. (@jasone)
- Remove --enable-ivsalloc. (@jasone)
- Remove --with-lg-size-class-group. (@jasone)
- Remove --with-lg-tiny-min. (@jasone)
- Remove --disable-cc-silence. (@jasone)
- Remove --enable-code-coverage. (@jasone)
- Remove --disable-munmap (replaced by opt.retain). (@jasone)
- Remove Valgrind support. (@jasone)
- Remove quarantine support. (@jasone)
- Remove redzone support. (@jasone)
- Remove mallctl interfaces (various authors):
+ config.munmap
+ config.tcache
+ config.tls
+ config.valgrind
+ opt.lg_chunk
+ opt.purge
+ opt.lg_dirty_mult
+ opt.decay_time
+ opt.quarantine
+ opt.redzone
+ opt.thp
+ arena.<i>.lg_dirty_mult
+ arena.<i>.decay_time
+ arena.<i>.chunk_hooks
+ arenas.initialized
+ arenas.lg_dirty_mult
+ arenas.decay_time
+ arenas.bin.<i>.run_size
+ arenas.nlruns
+ arenas.lrun.<i>.size
+ arenas.nhchunks
+ arenas.hchunk.<i>.size
+ arenas.extend
+ stats.cactive
+ stats.arenas.<i>.lg_dirty_mult
+ stats.arenas.<i>.decay_time
+ stats.arenas.<i>.metadata.{mapped,allocated}
+ stats.arenas.<i>.{npurge,nmadvise,purged}
+ stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
+ stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
+ stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
+ stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}
Bug fixes:
- Improve interval-based profile dump triggering to dump only one profile when
a single allocation's size exceeds the interval. (@jasone)
- Use prefixed function names (as controlled by --with-jemalloc-prefix) when
pruning backtrace frames in jeprof. (@jasone)
* 4.5.0 (February 28, 2017)
This is the first release to benefit from much broader continuous integration
testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
in place for prior releases, it would have caught all of the most serious
regressions fixed by this release.
New features:
- Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
transparent huge page integration. (@jasone)
- Update zone allocator integration to work with macOS 10.12. (@glandium)
- Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
during configuration. (@jasone, @ronawho)
Bug fixes:
- Fix DSS (sbrk(2)-based) allocation. This regression was first released in
4.3.0. (@jasone)
- Handle race in per size class utilization computation. This functionality
was first released in 4.0.0. (@interwq)
- Fix lock order reversal during gdump. (@jasone)
- Fix/refactor tcache synchronization. This regression was first released in
4.0.0. (@jasone)
- Fix various JSON-formatted malloc_stats_print() bugs. This functionality
was first released in 4.3.0. (@jasone)
- Fix huge-aligned allocation. This regression was first released in 4.4.0.
(@jasone)
- When transparent huge page integration is enabled, detect what state pages
start in according to the kernel's current operating mode, and only convert
arena chunks to non-huge during purging if that is not their initial state.
This functionality was first released in 4.4.0. (@jasone)
- Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
This regression was first released in 4.0.0. (@jasone, @428desmo)
- Properly detect sparc64 when building for Linux. (@glaubitz)
* 4.4.0 (December 3, 2016)
New features:
- Add configure support for *-*-linux-android. (@cferris1000, @jasone)
- Add the --disable-syscall configure option, for use on systems that place
security-motivated limitations on syscall(2). (@jasone)
- Add support for Debian GNU/kFreeBSD. (@thesam)
Optimizations:
- Add extent serial numbers and use them where appropriate as a sort key that
is higher priority than address, so that the allocation policy prefers older
extents. This tends to improve locality (decrease fragmentation) when
memory grows downward. (@jasone)
- Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
on Linux 4.5 and newer. (@jasone)
- Mark partially purged arena chunks as non-huge-page. This improves
interaction with Linux's transparent huge page functionality. (@jasone)
Bug fixes:
- Fix size class computations for edge conditions involving extremely large
allocations. This regression was first released in 4.0.0. (@jasone,
@ingvarha)
- Remove overly restrictive assertions related to the cactive statistic. This
regression was first released in 4.1.0. (@jasone)
- Implement a more reliable detection scheme for os_unfair_lock on macOS.
(@jszakmeister)
* 4.3.1 (November 7, 2016)
Bug fixes:
@ -231,7 +1016,7 @@ brevity. Much more detail can be found in the git revision history:
these fixes, xallocx() now tries harder to partially fulfill requests for
optional extra space. Note that a couple of minor heap profiling
optimizations are included, but these are better thought of as performance
fixes that were integral to disovering most of the other bugs.
fixes that were integral to discovering most of the other bugs.
Optimizations:
- Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the

414
INSTALL
View file

@ -1,414 +0,0 @@
Building and installing a packaged release of jemalloc can be as simple as
typing the following while in the root directory of the source tree:
./configure
make
make install
If building from unpackaged developer sources, the simplest command sequence
that might work is:
./autogen.sh
make dist
make
make install
Note that documentation is not built by the default target because doing so
would create a dependency on xsltproc in packaged releases, hence the
requirement to either run 'make dist' or avoid installing docs via the various
install_* targets documented below.
=== Advanced configuration =====================================================
The 'configure' script supports numerous options that allow control of which
functionality is enabled, where jemalloc is installed, etc. Optionally, pass
any of the following arguments (not a definitive list) to 'configure':
--help
Print a definitive list of options.
--prefix=<install-root-dir>
Set the base directory in which to install. For example:
./configure --prefix=/usr/local
will cause files to be installed into /usr/local/include, /usr/local/lib,
and /usr/local/man.
--with-version=<major>.<minor>.<bugfix>-<nrev>-g<gid>
Use the specified version string rather than trying to generate one (if in
a git repository) or use existing the VERSION file (if present).
--with-rpath=<colon-separated-rpath>
Embed one or more library paths, so that libjemalloc can find the libraries
it is linked to. This works only on ELF-based systems.
--with-mangling=<map>
Mangle public symbols specified in <map> which is a comma-separated list of
name:mangled pairs.
For example, to use ld's --wrap option as an alternative method for
overriding libc's malloc implementation, specify something like:
--with-mangling=malloc:__wrap_malloc,free:__wrap_free[...]
Note that mangling happens prior to application of the prefix specified by
--with-jemalloc-prefix, and mangled symbols are then ignored when applying
the prefix.
--with-jemalloc-prefix=<prefix>
Prefix all public APIs with <prefix>. For example, if <prefix> is
"prefix_", API changes like the following occur:
malloc() --> prefix_malloc()
malloc_conf --> prefix_malloc_conf
/etc/malloc.conf --> /etc/prefix_malloc.conf
MALLOC_CONF --> PREFIX_MALLOC_CONF
This makes it possible to use jemalloc at the same time as the system
allocator, or even to use multiple copies of jemalloc simultaneously.
By default, the prefix is "", except on OS X, where it is "je_". On OS X,
jemalloc overlays the default malloc zone, but makes no attempt to actually
replace the "malloc", "calloc", etc. symbols.
--without-export
Don't export public APIs. This can be useful when building jemalloc as a
static library, or to avoid exporting public APIs when using the zone
allocator on OSX.
--with-private-namespace=<prefix>
Prefix all library-private APIs with <prefix>je_. For shared libraries,
symbol visibility mechanisms prevent these symbols from being exported, but
for static libraries, naming collisions are a real possibility. By
default, <prefix> is empty, which results in a symbol prefix of je_ .
--with-install-suffix=<suffix>
Append <suffix> to the base name of all installed files, such that multiple
versions of jemalloc can coexist in the same installation directory. For
example, libjemalloc.so.0 becomes libjemalloc<suffix>.so.0.
--with-malloc-conf=<malloc_conf>
Embed <malloc_conf> as a run-time options string that is processed prior to
the malloc_conf global variable, the /etc/malloc.conf symlink, and the
MALLOC_CONF environment variable. For example, to change the default chunk
size to 256 KiB:
--with-malloc-conf=lg_chunk:18
--disable-cc-silence
Disable code that silences non-useful compiler warnings. This is mainly
useful during development when auditing the set of warnings that are being
silenced.
--enable-debug
Enable assertions and validation code. This incurs a substantial
performance hit, but is very useful during application development.
Implies --enable-ivsalloc.
--enable-code-coverage
Enable code coverage support, for use during jemalloc test development.
Additional testing targets are available if this option is enabled:
coverage
coverage_unit
coverage_integration
coverage_stress
These targets do not clear code coverage results from previous runs, and
there are interactions between the various coverage targets, so it is
usually advisable to run 'make clean' between repeated code coverage runs.
--disable-stats
Disable statistics gathering functionality. See the "opt.stats_print"
option documentation for usage details.
--enable-ivsalloc
Enable validation code, which verifies that pointers reside within
jemalloc-owned chunks before dereferencing them. This incurs a minor
performance hit.
--enable-prof
Enable heap profiling and leak detection functionality. See the "opt.prof"
option documentation for usage details. When enabled, there are several
approaches to backtracing, and the configure script chooses the first one
in the following list that appears to function correctly:
+ libunwind (requires --enable-prof-libunwind)
+ libgcc (unless --disable-prof-libgcc)
+ gcc intrinsics (unless --disable-prof-gcc)
--enable-prof-libunwind
Use the libunwind library (http://www.nongnu.org/libunwind/) for stack
backtracing.
--disable-prof-libgcc
Disable the use of libgcc's backtracing functionality.
--disable-prof-gcc
Disable the use of gcc intrinsics for backtracing.
--with-static-libunwind=<libunwind.a>
Statically link against the specified libunwind.a rather than dynamically
linking with -lunwind.
--disable-tcache
Disable thread-specific caches for small objects. Objects are cached and
released in bulk, thus reducing the total number of mutex operations. See
the "opt.tcache" option for usage details.
--disable-munmap
Disable virtual memory deallocation via munmap(2); instead keep track of
the virtual memory for later use. munmap() is disabled by default (i.e.
--disable-munmap is implied) on Linux, which has a quirk in its virtual
memory allocation algorithm that causes semi-permanent VM map holes under
normal jemalloc operation.
--disable-fill
Disable support for junk/zero filling of memory, quarantine, and redzones.
See the "opt.junk", "opt.zero", "opt.quarantine", and "opt.redzone" option
documentation for usage details.
--disable-valgrind
Disable support for Valgrind.
--disable-zone-allocator
Disable zone allocator for Darwin. This means jemalloc won't be hooked as
the default allocator on OSX/iOS.
--enable-utrace
Enable utrace(2)-based allocation tracing. This feature is not broadly
portable (FreeBSD has it, but Linux and OS X do not).
--enable-xmalloc
Enable support for optional immediate termination due to out-of-memory
errors, as is commonly implemented by "xmalloc" wrapper function for malloc.
See the "opt.xmalloc" option documentation for usage details.
--enable-lazy-lock
Enable code that wraps pthread_create() to detect when an application
switches from single-threaded to multi-threaded mode, so that it can avoid
mutex locking/unlocking operations while in single-threaded mode. In
practice, this feature usually has little impact on performance unless
thread-specific caching is disabled.
--disable-tls
Disable thread-local storage (TLS), which allows for fast access to
thread-local variables via the __thread keyword. If TLS is available,
jemalloc uses it for several purposes.
--disable-cache-oblivious
Disable cache-oblivious large allocation alignment for large allocation
requests with no alignment constraints. If this feature is disabled, all
large allocations are page-aligned as an implementation artifact, which can
severely harm CPU cache utilization. However, the cache-oblivious layout
comes at the cost of one extra page per large allocation, which in the
most extreme case increases physical memory usage for the 16 KiB size class
to 20 KiB.
--with-xslroot=<path>
Specify where to find DocBook XSL stylesheets when building the
documentation.
--with-lg-page=<lg-page>
Specify the base 2 log of the system page size. This option is only useful
when cross compiling, since the configure script automatically determines
the host's page size by default.
--with-lg-page-sizes=<lg-page-sizes>
Specify the comma-separated base 2 logs of the page sizes to support. This
option may be useful when cross-compiling in combination with
--with-lg-page, but its primary use case is for integration with FreeBSD's
libc, wherein jemalloc is embedded.
--with-lg-size-class-group=<lg-size-class-group>
Specify the base 2 log of how many size classes to use for each doubling in
size. By default jemalloc uses <lg-size-class-group>=2, which results in
e.g. the following size classes:
[...], 64,
80, 96, 112, 128,
160, [...]
<lg-size-class-group>=3 results in e.g. the following size classes:
[...], 64,
72, 80, 88, 96, 104, 112, 120, 128,
144, [...]
The minimal <lg-size-class-group>=0 causes jemalloc to only provide size
classes that are powers of 2:
[...],
64,
128,
256,
[...]
An implementation detail currently limits the total number of small size
classes to 255, and a compilation error will result if the
<lg-size-class-group> you specify cannot be supported. The limit is
roughly <lg-size-class-group>=4, depending on page size.
--with-lg-quantum=<lg-quantum>
Specify the base 2 log of the minimum allocation alignment. jemalloc needs
to know the minimum alignment that meets the following C standard
requirement (quoted from the April 12, 2011 draft of the C11 standard):
The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object with a
fundamental alignment requirement and then used to access such an object
or an array of such objects in the space allocated [...]
This setting is architecture-specific, and although jemalloc includes known
safe values for the most commonly used modern architectures, there is a
wrinkle related to GNU libc (glibc) that may impact your choice of
<lg-quantum>. On most modern architectures, this mandates 16-byte alignment
(<lg-quantum>=4), but the glibc developers chose not to meet this
requirement for performance reasons. An old discussion can be found at
https://sourceware.org/bugzilla/show_bug.cgi?id=206 . Unlike glibc,
jemalloc does follow the C standard by default (caveat: jemalloc
technically cheats if --with-lg-tiny-min is smaller than
--with-lg-quantum), but the fact that Linux systems already work around
this allocator noncompliance means that it is generally safe in practice to
let jemalloc's minimum alignment follow glibc's lead. If you specify
--with-lg-quantum=3 during configuration, jemalloc will provide additional
size classes that are not 16-byte-aligned (24, 40, and 56, assuming
--with-lg-size-class-group=2).
--with-lg-tiny-min=<lg-tiny-min>
Specify the base 2 log of the minimum tiny size class to support. Tiny
size classes are powers of 2 less than the quantum, and are only
incorporated if <lg-tiny-min> is less than <lg-quantum> (see
--with-lg-quantum). Tiny size classes technically violate the C standard
requirement for minimum alignment, and crashes could conceivably result if
the compiler were to generate instructions that made alignment assumptions,
both because illegal instruction traps could result, and because accesses
could straddle page boundaries and cause segmentation faults due to
accessing unmapped addresses.
The default of <lg-tiny-min>=3 works well in practice even on architectures
that technically require 16-byte alignment, probably for the same reason
--with-lg-quantum=3 works. Smaller tiny size classes can, and will, cause
crashes (see https://bugzilla.mozilla.org/show_bug.cgi?id=691003 for an
example).
This option is rarely useful, and is mainly provided as documentation of a
subtle implementation detail. If you do use this option, specify a
value in [3, ..., <lg-quantum>].
The following environment variables (not a definitive list) impact configure's
behavior:
CFLAGS="?"
Pass these flags to the compiler. You probably shouldn't define this unless
you know what you are doing. (Use EXTRA_CFLAGS instead.)
EXTRA_CFLAGS="?"
Append these flags to CFLAGS. This makes it possible to add flags such as
-Werror, while allowing the configure script to determine what other flags
are appropriate for the specified configuration.
The configure script specifically checks whether an optimization flag (-O*)
is specified in EXTRA_CFLAGS, and refrains from specifying an optimization
level if it finds that one has already been specified.
CPPFLAGS="?"
Pass these flags to the C preprocessor. Note that CFLAGS is not passed to
'cpp' when 'configure' is looking for include files, so you must use
CPPFLAGS instead if you need to help 'configure' find header files.
LD_LIBRARY_PATH="?"
'ld' uses this colon-separated list to find libraries.
LDFLAGS="?"
Pass these flags when linking.
PATH="?"
'configure' uses this to find programs.
=== Advanced compilation =======================================================
To build only parts of jemalloc, use the following targets:
build_lib_shared
build_lib_static
build_lib
build_doc_html
build_doc_man
build_doc
To install only parts of jemalloc, use the following targets:
install_bin
install_include
install_lib_shared
install_lib_static
install_lib
install_doc_html
install_doc_man
install_doc
To clean up build results to varying degrees, use the following make targets:
clean
distclean
relclean
=== Advanced installation ======================================================
Optionally, define make variables when invoking make, including (not
exclusively):
INCLUDEDIR="?"
Use this as the installation prefix for header files.
LIBDIR="?"
Use this as the installation prefix for libraries.
MANDIR="?"
Use this as the installation prefix for man pages.
DESTDIR="?"
Prepend DESTDIR to INCLUDEDIR, LIBDIR, DATADIR, and MANDIR. This is useful
when installing to a different path than was specified via --prefix.
CC="?"
Use this to invoke the C compiler.
CFLAGS="?"
Pass these flags to the compiler.
CPPFLAGS="?"
Pass these flags to the C preprocessor.
LDFLAGS="?"
Pass these flags when linking.
PATH="?"
Use this to search for programs used during configuration and building.
=== Development ================================================================
If you intend to make non-trivial changes to jemalloc, use the 'autogen.sh'
script rather than 'configure'. This re-generates 'configure', enables
configuration dependency rules, and enables re-generation of automatically
generated source files.
The build system supports using an object directory separate from the source
tree. For example, you can create an 'obj' directory, and from within that
directory, issue configuration and build commands:
autoconf
mkdir obj
cd obj
../configure --enable-autogen
make
=== Documentation ==============================================================
The manual page is generated in both html and roff formats. Any web browser
can be used to view the html manual. The roff manual page can be formatted
prior to installation via the following command:
nroff -man -t doc/jemalloc.3

527
INSTALL.md Normal file
View file

@ -0,0 +1,527 @@
Building and installing a packaged release of jemalloc can be as simple as
typing the following while in the root directory of the source tree:
./configure
make
make install
If building from unpackaged developer sources, the simplest command sequence
that might work is:
./autogen.sh
make
make install
You can uninstall the installed build artifacts like this:
make uninstall
Notes:
- "autoconf" needs to be installed
- Documentation is built by the default target only when xsltproc is
available. Build will warn but not stop if the dependency is missing.
## Advanced configuration
The 'configure' script supports numerous options that allow control of which
functionality is enabled, where jemalloc is installed, etc. Optionally, pass
any of the following arguments (not a definitive list) to 'configure':
* `--help`
Print a definitive list of options.
* `--prefix=<install-root-dir>`
Set the base directory in which to install. For example:
./configure --prefix=/usr/local
will cause files to be installed into /usr/local/include, /usr/local/lib,
and /usr/local/man.
* `--with-version=(<major>.<minor>.<bugfix>-<nrev>-g<gid>|VERSION)`
The VERSION file is mandatory for successful configuration, and the
following steps are taken to assure its presence:
1) If --with-version=<major>.<minor>.<bugfix>-<nrev>-g<gid> is specified,
generate VERSION using the specified value.
2) If --with-version is not specified in either form and the source
directory is inside a git repository, try to generate VERSION via 'git
describe' invocations that pattern-match release tags.
3) If VERSION is missing, generate it with a bogus version:
0.0.0-0-g0000000000000000000000000000000000000000
Note that --with-version=VERSION bypasses (1) and (2), which simplifies
VERSION configuration when embedding a jemalloc release into another
project's git repository.
* `--with-rpath=<colon-separated-rpath>`
Embed one or more library paths, so that libjemalloc can find the libraries
it is linked to. This works only on ELF-based systems.
* `--with-mangling=<map>`
Mangle public symbols specified in <map> which is a comma-separated list of
name:mangled pairs.
For example, to use ld's --wrap option as an alternative method for
overriding libc's malloc implementation, specify something like:
--with-mangling=malloc:__wrap_malloc,free:__wrap_free[...]
Note that mangling happens prior to application of the prefix specified by
--with-jemalloc-prefix, and mangled symbols are then ignored when applying
the prefix.
* `--with-jemalloc-prefix=<prefix>`
Prefix all public APIs with <prefix>. For example, if <prefix> is
"prefix_", API changes like the following occur:
malloc() --> prefix_malloc()
malloc_conf --> prefix_malloc_conf
/etc/malloc.conf --> /etc/prefix_malloc.conf
MALLOC_CONF --> PREFIX_MALLOC_CONF
This makes it possible to use jemalloc at the same time as the system
allocator, or even to use multiple copies of jemalloc simultaneously.
By default, the prefix is "", except on OS X, where it is "je_". On OS X,
jemalloc overlays the default malloc zone, but makes no attempt to actually
replace the "malloc", "calloc", etc. symbols.
* `--without-export`
Don't export public APIs. This can be useful when building jemalloc as a
static library, or to avoid exporting public APIs when using the zone
allocator on OSX.
* `--with-private-namespace=<prefix>`
Prefix all library-private APIs with <prefix>je_. For shared libraries,
symbol visibility mechanisms prevent these symbols from being exported, but
for static libraries, naming collisions are a real possibility. By
default, <prefix> is empty, which results in a symbol prefix of je_ .
* `--with-install-suffix=<suffix>`
Append <suffix> to the base name of all installed files, such that multiple
versions of jemalloc can coexist in the same installation directory. For
example, libjemalloc.so.0 becomes libjemalloc<suffix>.so.0.
* `--with-malloc-conf=<malloc_conf>`
Embed `<malloc_conf>` as a run-time options string that is processed prior to
the malloc_conf global variable, the /etc/malloc.conf symlink, and the
MALLOC_CONF environment variable. For example, to change the default decay
time to 30 seconds:
--with-malloc-conf=decay_ms:30000
* `--enable-debug`
Enable assertions and validation code. This incurs a substantial
performance hit, but is very useful during application development.
* `--disable-stats`
Disable statistics gathering functionality. See the "opt.stats_print"
option documentation for usage details.
* `--enable-prof`
Enable heap profiling and leak detection functionality. See the "opt.prof"
option documentation for usage details. When enabled, there are several
approaches to backtracing, and the configure script chooses the first one
in the following list that appears to function correctly:
+ libunwind (requires --enable-prof-libunwind)
+ frame pointer (requires --enable-prof-frameptr)
+ libgcc (unless --disable-prof-libgcc)
+ gcc intrinsics (unless --disable-prof-gcc)
* `--enable-prof-libunwind`
Use the libunwind library (http://www.nongnu.org/libunwind/) for stack
backtracing.
* `--enable-prof-frameptr`
Use the optimized frame pointer unwinder for stack backtracing. Safe
to use in mixed code (with and without frame pointers) - but requires
frame pointers to produce meaningful stacks. Linux only.
* `--disable-prof-libgcc`
Disable the use of libgcc's backtracing functionality.
* `--disable-prof-gcc`
Disable the use of gcc intrinsics for backtracing.
* `--with-static-libunwind=<libunwind.a>`
Statically link against the specified libunwind.a rather than dynamically
linking with -lunwind.
* `--disable-fill`
Disable support for junk/zero filling of memory. See the "opt.junk" and
"opt.zero" option documentation for usage details.
* `--disable-zone-allocator`
Disable zone allocator for Darwin. This means jemalloc won't be hooked as
the default allocator on OSX/iOS.
* `--enable-utrace`
Enable utrace(2)-based allocation tracing. This feature is not broadly
portable (FreeBSD has it, but Linux and OS X do not).
* `--enable-xmalloc`
Enable support for optional immediate termination due to out-of-memory
errors, as is commonly implemented by "xmalloc" wrapper function for malloc.
See the "opt.xmalloc" option documentation for usage details.
* `--enable-lazy-lock`
Enable code that wraps pthread_create() to detect when an application
switches from single-threaded to multi-threaded mode, so that it can avoid
mutex locking/unlocking operations while in single-threaded mode. In
practice, this feature usually has little impact on performance unless
thread-specific caching is disabled.
* `--disable-cache-oblivious`
Disable cache-oblivious large allocation alignment by default, for large
allocation requests with no alignment constraints. If this feature is
disabled, all large allocations are page-aligned as an implementation
artifact, which can severely harm CPU cache utilization. However, the
cache-oblivious layout comes at the cost of one extra page per large
allocation, which in the most extreme case increases physical memory usage
for the 16 KiB size class to 20 KiB.
* `--disable-syscall`
Disable use of syscall(2) rather than {open,read,write,close}(2). This is
intended as a workaround for systems that place security limitations on
syscall(2).
* `--disable-cxx`
Disable C++ integration. This will cause new and delete operator
implementations to be omitted.
* `--with-xslroot=<path>`
Specify where to find DocBook XSL stylesheets when building the
documentation.
* `--with-lg-page=<lg-page>`
Specify the base 2 log of the allocator page size, which must in turn be at
least as large as the system page size. By default the configure script
determines the host's page size and sets the allocator page size equal to
the system page size, so this option need not be specified unless the
system page size may change between configuration and execution, e.g. when
cross compiling.
* `--with-lg-hugepage=<lg-hugepage>`
Specify the base 2 log of the system huge page size. This option is useful
when cross compiling, or when overriding the default for systems that do
not explicitly support huge pages.
* `--with-lg-quantum=<lg-quantum>`
Specify the base 2 log of the minimum allocation alignment. jemalloc needs
to know the minimum alignment that meets the following C standard
requirement (quoted from the April 12, 2011 draft of the C11 standard):
> The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object with a
fundamental alignment requirement and then used to access such an object
or an array of such objects in the space allocated [...]
This setting is architecture-specific, and although jemalloc includes known
safe values for the most commonly used modern architectures, there is a
wrinkle related to GNU libc (glibc) that may impact your choice of
<lg-quantum>. On most modern architectures, this mandates 16-byte
alignment (<lg-quantum>=4), but the glibc developers chose not to meet this
requirement for performance reasons. An old discussion can be found at
<https://sourceware.org/bugzilla/show_bug.cgi?id=206> . Unlike glibc,
jemalloc does follow the C standard by default (caveat: jemalloc
technically cheats for size classes smaller than the quantum), but the fact
that Linux systems already work around this allocator noncompliance means
that it is generally safe in practice to let jemalloc's minimum alignment
follow glibc's lead. If you specify `--with-lg-quantum=3` during
configuration, jemalloc will provide additional size classes that are not
16-byte-aligned (24, 40, and 56).
* `--with-lg-vaddr=<lg-vaddr>`
Specify the number of significant virtual address bits. By default, the
configure script attempts to detect virtual address size on those platforms
where it knows how, and picks a default otherwise. This option may be
useful when cross-compiling.
* `--disable-initial-exec-tls`
Disable the initial-exec TLS model for jemalloc's internal thread-local
storage (on those platforms that support explicit settings). This can allow
jemalloc to be dynamically loaded after program startup (e.g. using dlopen).
Note that in this case, there will be two malloc implementations operating
in the same process, which will almost certainly result in confusing runtime
crashes if pointers leak from one implementation to the other.
* `--disable-libdl`
Disable the usage of libdl, namely dlsym(3) which is required by the lazy
lock option. This can allow building static binaries.
The following environment variables (not a definitive list) impact configure's
behavior:
* `CFLAGS="?"`
* `CXXFLAGS="?"`
Pass these flags to the C/C++ compiler. Any flags set by the configure
script are prepended, which means explicitly set flags generally take
precedence. Take care when specifying flags such as -Werror, because
configure tests may be affected in undesirable ways.
* `EXTRA_CFLAGS="?"`
* `EXTRA_CXXFLAGS="?"`
Append these flags to CFLAGS/CXXFLAGS, without passing them to the
compiler(s) during configuration. This makes it possible to add flags such
as -Werror, while allowing the configure script to determine what other
flags are appropriate for the specified configuration.
* `CPPFLAGS="?"`
Pass these flags to the C preprocessor. Note that CFLAGS is not passed to
'cpp' when 'configure' is looking for include files, so you must use
CPPFLAGS instead if you need to help 'configure' find header files.
* `LD_LIBRARY_PATH="?"`
'ld' uses this colon-separated list to find libraries.
* `LDFLAGS="?"`
Pass these flags when linking.
* `PATH="?"`
'configure' uses this to find programs.
In some cases it may be necessary to work around configuration results that do
not match reality. For example, Linux 3.4 added support for the MADV_DONTDUMP
flag to madvise(2), which can cause problems if building on a host with
MADV_DONTDUMP support and deploying to a target without. To work around this,
use a cache file to override the relevant configuration variable defined in
configure.ac, e.g.:
echo "je_cv_madv_dontdump=no" > config.cache && ./configure -C
## Advanced compilation
To build only parts of jemalloc, use the following targets:
build_lib_shared
build_lib_static
build_lib
build_doc_html
build_doc_man
build_doc
To install only parts of jemalloc, use the following targets:
install_bin
install_include
install_lib_shared
install_lib_static
install_lib_pc
install_lib
install_doc_html
install_doc_man
install_doc
To clean up build results to varying degrees, use the following make targets:
clean
distclean
relclean
## Advanced installation
Optionally, define make variables when invoking make, including (not
exclusively):
* `INCLUDEDIR="?"`
Use this as the installation prefix for header files.
* `LIBDIR="?"`
Use this as the installation prefix for libraries.
* `MANDIR="?"`
Use this as the installation prefix for man pages.
* `DESTDIR="?"`
Prepend DESTDIR to INCLUDEDIR, LIBDIR, DATADIR, and MANDIR. This is useful
when installing to a different path than was specified via --prefix.
* `CC="?"`
Use this to invoke the C compiler.
* `CFLAGS="?"`
Pass these flags to the compiler.
* `CPPFLAGS="?"`
Pass these flags to the C preprocessor.
* `LDFLAGS="?"`
Pass these flags when linking.
* `PATH="?"`
Use this to search for programs used during configuration and building.
## Building for Windows
There are at least two ways to build jemalloc's libraries for Windows. They
differ in their ease of use and flexibility.
### With MSVC solutions
This is the easy, but less flexible approach. It doesn't let you specify
arguments to the `configure` script.
1. Install Cygwin with at least the following packages:
* autoconf
* autogen
* gawk
* grep
* sed
2. Install Visual Studio 2015 or 2017 with Visual C++
3. Add Cygwin\bin to the PATH environment variable
4. Open "x64 Native Tools Command Prompt for VS 2017"
(note: x86/x64 doesn't matter at this point)
5. Generate header files:
sh -c "CC=cl ./autogen.sh"
6. Now the project can be opened and built in Visual Studio:
msvc\jemalloc_vc2017.sln
### With MSYS
This is a more involved approach that offers the same configuration flexibility
as Linux builds. We use it for our CI workflow to test different jemalloc
configurations on Windows.
1. Install the prerequisites
1. MSYS2
2. Chocolatey
3. Visual Studio if you want to compile with MSVC compiler
2. Run your bash emulation. It could be MSYS2 or Git Bash (this manual was
tested on both)
3. Manually and selectively follow
[before_install.sh](https://github.com/jemalloc/jemalloc/blob/dev/scripts/windows/before_install.sh)
script.
1. Skip the `TRAVIS_OS_NAME` check, `rm -rf C:/tools/msys64` and `choco
uninstall/upgrade` part.
2. If using `msys2` shell, add path to `RefreshEnv.cmd` to `PATH`:
`PATH="$PATH:/c/ProgramData/chocolatey/bin"`
3. Assign `msys_shell_cmd`, `msys2`, `mingw32` and `mingw64` as in the
script.
4. Pick `CROSS_COMPILE_32BIT` , `CC` and `USE_MSVC` values depending on
your needs. For instance, if you'd like to build for x86_64 Windows
with `gcc`, then `CROSS_COMPILE_32BIT="no"`, `CC="gcc"` and
`USE_MSVC=""`. If you'd like to build for x86 Windows with `cl.exe`,
then `CROSS_COMPILE_32BIT="yes"`, `CC="cl.exe"`, `USE_MSVC="x86"`.
For x86_64 builds with `cl.exe`, assign `USE_MSVC="amd64"` and
`CROSS_COMPILE_32BIT="no"`.
5. Replace the path to `vcvarsall.bat` with the path on your system. For
instance, on my Windows PC with Visual Studio 17, the path is
`C:\Program Files (x86)\Microsoft Visual
Studio\2017\BuildTools\VC\Auxiliary\Build\vcvarsall.bat`.
6. Execute the rest of the script. It will install the required
dependencies and assign the variable `build_env`, which is a function
that executes following commands with the correct environment
variables set.
4. Use `$build_env <command>` as you would in a Linux shell:
1. `$build_env autoconf`
2. `$build_env ./configure CC="<desired compiler>" <configuration flags>`
3. `$build_env mingw32-make`
If you're having any issues with the above, ensure the following:
5. When you run `cmd //C RefreshEnv.cmd`, you get an output line starting with
`Refreshing` . If it errors saying `RefreshEnv.cmd` is not found, then you
need to add it to your `PATH` as described above in item 3.2
6. When you run `cmd //C $vcvarsall`, it prints a bunch of environment
variables. Otherwise, check the path to the `vcvarsall.bat` in `$vcvarsall`
script and fix it.
### Building from vcpkg
The jemalloc port in vcpkg is kept up to date by Microsoft team members and
community contributors. The url of vcpkg is: https://github.com/Microsoft/vcpkg
. You can download and install jemalloc using the vcpkg dependency manager:
```shell
git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh # ./bootstrap-vcpkg.bat for Windows
./vcpkg integrate install
./vcpkg install jemalloc
```
If the version is out of date, please [create an issue or pull
request](https://github.com/Microsoft/vcpkg) on the vcpkg repository.
## Development
If you intend to make non-trivial changes to jemalloc, use the 'autogen.sh'
script rather than 'configure'. This re-generates 'configure', enables
configuration dependency rules, and enables re-generation of automatically
generated source files.
The build system supports using an object directory separate from the source
tree. For example, you can create an 'obj' directory, and from within that
directory, issue configuration and build commands:
autoconf
mkdir obj
cd obj
../configure --enable-autogen
make
## Documentation
The manual page is generated in both html and roff formats. Any web browser
can be used to view the html manual. The roff manual page can be formatted
prior to installation via the following command:
nroff -man -t doc/jemalloc.3

View file

@ -9,6 +9,7 @@ vpath % .
SHELL := /bin/sh
CC := @CC@
CXX := @CXX@
# Configuration parameters.
DESTDIR =
@ -23,9 +24,15 @@ abs_srcroot := @abs_srcroot@
abs_objroot := @abs_objroot@
# Build parameters.
CPPFLAGS := @CPPFLAGS@ -I$(srcroot)include -I$(objroot)include
CPPFLAGS := @CPPFLAGS@ -I$(objroot)include -I$(srcroot)include
CONFIGURE_CFLAGS := @CONFIGURE_CFLAGS@
SPECIFIED_CFLAGS := @SPECIFIED_CFLAGS@
EXTRA_CFLAGS := @EXTRA_CFLAGS@
CFLAGS := @CFLAGS@ $(EXTRA_CFLAGS)
CFLAGS := $(strip $(CONFIGURE_CFLAGS) $(SPECIFIED_CFLAGS) $(EXTRA_CFLAGS))
CONFIGURE_CXXFLAGS := @CONFIGURE_CXXFLAGS@
SPECIFIED_CXXFLAGS := @SPECIFIED_CXXFLAGS@
EXTRA_CXXFLAGS := @EXTRA_CXXFLAGS@
CXXFLAGS := $(strip $(CONFIGURE_CXXFLAGS) $(SPECIFIED_CXXFLAGS) $(EXTRA_CXXFLAGS))
LDFLAGS := @LDFLAGS@
EXTRA_LDFLAGS := @EXTRA_LDFLAGS@
LIBS := @LIBS@
@ -40,6 +47,7 @@ REV := @rev@
install_suffix := @install_suffix@
ABI := @abi@
XSLTPROC := @XSLTPROC@
XSLROOT := @XSLROOT@
AUTOCONF := @AUTOCONF@
_RPATH = @RPATH@
RPATH = $(if $(1),$(call _RPATH,$(1)))
@ -48,10 +56,12 @@ cfghdrs_out := @cfghdrs_out@
cfgoutputs_in := $(addprefix $(srcroot),@cfgoutputs_in@)
cfgoutputs_out := @cfgoutputs_out@
enable_autogen := @enable_autogen@
enable_code_coverage := @enable_code_coverage@
enable_doc := @enable_doc@
enable_shared := @enable_shared@
enable_static := @enable_static@
enable_prof := @enable_prof@
enable_valgrind := @enable_valgrind@
enable_zone_allocator := @enable_zone_allocator@
enable_experimental_smallocx := @enable_experimental_smallocx@
MALLOC_CONF := @JEMALLOC_CPREFIX@MALLOC_CONF
link_whole_archive := @link_whole_archive@
DSO_LDFLAGS = @DSO_LDFLAGS@
@ -63,6 +73,8 @@ TEST_LD_MODE = @TEST_LD_MODE@
MKLIB = @MKLIB@
AR = @AR@
ARFLAGS = @ARFLAGS@
DUMP_SYMS = @DUMP_SYMS@
AWK := @AWK@
CC_MM = @CC_MM@
LM := @LM@
INSTALL = @INSTALL@
@ -84,35 +96,71 @@ BINS := $(objroot)bin/jemalloc-config $(objroot)bin/jemalloc.sh $(objroot)bin/je
C_HDRS := $(objroot)include/jemalloc/jemalloc$(install_suffix).h
C_SRCS := $(srcroot)src/jemalloc.c \
$(srcroot)src/arena.c \
$(srcroot)src/atomic.c \
$(srcroot)src/background_thread.c \
$(srcroot)src/base.c \
$(srcroot)src/bin.c \
$(srcroot)src/bin_info.c \
$(srcroot)src/bitmap.c \
$(srcroot)src/chunk.c \
$(srcroot)src/chunk_dss.c \
$(srcroot)src/chunk_mmap.c \
$(srcroot)src/buf_writer.c \
$(srcroot)src/cache_bin.c \
$(srcroot)src/ckh.c \
$(srcroot)src/counter.c \
$(srcroot)src/ctl.c \
$(srcroot)src/decay.c \
$(srcroot)src/div.c \
$(srcroot)src/ecache.c \
$(srcroot)src/edata.c \
$(srcroot)src/edata_cache.c \
$(srcroot)src/ehooks.c \
$(srcroot)src/emap.c \
$(srcroot)src/eset.c \
$(srcroot)src/exp_grow.c \
$(srcroot)src/extent.c \
$(srcroot)src/hash.c \
$(srcroot)src/huge.c \
$(srcroot)src/mb.c \
$(srcroot)src/extent_dss.c \
$(srcroot)src/extent_mmap.c \
$(srcroot)src/fxp.c \
$(srcroot)src/san.c \
$(srcroot)src/san_bump.c \
$(srcroot)src/hook.c \
$(srcroot)src/hpa.c \
$(srcroot)src/hpa_central.c \
$(srcroot)src/hpa_hooks.c \
$(srcroot)src/hpa_utils.c \
$(srcroot)src/hpdata.c \
$(srcroot)src/inspect.c \
$(srcroot)src/large.c \
$(srcroot)src/log.c \
$(srcroot)src/malloc_io.c \
$(srcroot)src/conf.c \
$(srcroot)src/mutex.c \
$(srcroot)src/nstime.c \
$(srcroot)src/pa.c \
$(srcroot)src/pa_extra.c \
$(srcroot)src/pac.c \
$(srcroot)src/pages.c \
$(srcroot)src/prng.c \
$(srcroot)src/peak_event.c \
$(srcroot)src/prof.c \
$(srcroot)src/quarantine.c \
$(srcroot)src/prof_data.c \
$(srcroot)src/prof_log.c \
$(srcroot)src/prof_recent.c \
$(srcroot)src/prof_stack_range.c \
$(srcroot)src/prof_stats.c \
$(srcroot)src/prof_sys.c \
$(srcroot)src/psset.c \
$(srcroot)src/rtree.c \
$(srcroot)src/safety_check.c \
$(srcroot)src/sc.c \
$(srcroot)src/sec.c \
$(srcroot)src/stats.c \
$(srcroot)src/spin.c \
$(srcroot)src/sz.c \
$(srcroot)src/tcache.c \
$(srcroot)src/test_hooks.c \
$(srcroot)src/thread_event.c \
$(srcroot)src/thread_event_registry.c \
$(srcroot)src/ticker.c \
$(srcroot)src/tsd.c \
$(srcroot)src/util.c \
$(srcroot)src/witness.c
ifeq ($(enable_valgrind), 1)
C_SRCS += $(srcroot)src/valgrind.c
endif
ifeq ($(enable_zone_allocator), 1)
C_SRCS += $(srcroot)src/zone.c
endif
@ -134,108 +182,255 @@ else
LJEMALLOC := $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
endif
PC := $(objroot)jemalloc.pc
MAN3 := $(objroot)doc/jemalloc$(install_suffix).3
DOCS_XML := $(objroot)doc/jemalloc$(install_suffix).xml
DOCS_HTML := $(DOCS_XML:$(objroot)%.xml=$(objroot)%.html)
DOCS_MAN3 := $(DOCS_XML:$(objroot)%.xml=$(objroot)%.3)
DOCS := $(DOCS_HTML) $(DOCS_MAN3)
C_TESTLIB_SRCS := $(srcroot)test/src/btalloc.c $(srcroot)test/src/btalloc_0.c \
$(srcroot)test/src/btalloc_1.c $(srcroot)test/src/math.c \
$(srcroot)test/src/mtx.c $(srcroot)test/src/mq.c \
$(srcroot)test/src/mtx.c $(srcroot)test/src/sleep.c \
$(srcroot)test/src/SFMT.c $(srcroot)test/src/test.c \
$(srcroot)test/src/thd.c $(srcroot)test/src/timer.c
ifeq (1, $(link_whole_archive))
C_UTIL_INTEGRATION_SRCS :=
C_UTIL_CPP_SRCS :=
else
C_UTIL_INTEGRATION_SRCS := $(srcroot)src/nstime.c $(srcroot)src/util.c
C_UTIL_INTEGRATION_SRCS := $(srcroot)src/nstime.c $(srcroot)src/malloc_io.c \
$(srcroot)src/ticker.c
C_UTIL_CPP_SRCS := $(srcroot)src/nstime.c $(srcroot)src/malloc_io.c
endif
TESTS_UNIT := \
$(srcroot)test/unit/a0.c \
$(srcroot)test/unit/arena_decay.c \
$(srcroot)test/unit/arena_reset.c \
$(srcroot)test/unit/atomic.c \
$(srcroot)test/unit/background_thread.c \
$(srcroot)test/unit/background_thread_enable.c \
$(srcroot)test/unit/background_thread_init.c \
$(srcroot)test/unit/base.c \
$(srcroot)test/unit/batch_alloc.c \
$(srcroot)test/unit/bin.c \
$(srcroot)test/unit/binshard.c \
$(srcroot)test/unit/bitmap.c \
$(srcroot)test/unit/bit_util.c \
$(srcroot)test/unit/buf_writer.c \
$(srcroot)test/unit/cache_bin.c \
$(srcroot)test/unit/ckh.c \
$(srcroot)test/unit/conf.c \
$(srcroot)test/unit/conf_init_0.c \
$(srcroot)test/unit/conf_init_1.c \
$(srcroot)test/unit/conf_init_confirm.c \
$(srcroot)test/unit/conf_parse.c \
$(srcroot)test/unit/counter.c \
$(srcroot)test/unit/decay.c \
$(srcroot)test/unit/div.c \
$(srcroot)test/unit/double_free.c \
$(srcroot)test/unit/edata_cache.c \
$(srcroot)test/unit/emitter.c \
$(srcroot)test/unit/extent_quantize.c \
${srcroot}test/unit/fb.c \
$(srcroot)test/unit/fork.c \
${srcroot}test/unit/fxp.c \
${srcroot}test/unit/san.c \
${srcroot}test/unit/san_bump.c \
$(srcroot)test/unit/hash.c \
$(srcroot)test/unit/hook.c \
$(srcroot)test/unit/hpa.c \
$(srcroot)test/unit/hpa_sec_integration.c \
$(srcroot)test/unit/hpa_thp_always.c \
$(srcroot)test/unit/hpa_vectorized_madvise.c \
$(srcroot)test/unit/hpa_vectorized_madvise_large_batch.c \
$(srcroot)test/unit/hpa_background_thread.c \
$(srcroot)test/unit/hpdata.c \
$(srcroot)test/unit/huge.c \
$(srcroot)test/unit/inspect.c \
$(srcroot)test/unit/junk.c \
$(srcroot)test/unit/junk_alloc.c \
$(srcroot)test/unit/junk_free.c \
$(srcroot)test/unit/lg_chunk.c \
$(srcroot)test/unit/json_stats.c \
$(srcroot)test/unit/large_ralloc.c \
$(srcroot)test/unit/log.c \
$(srcroot)test/unit/mallctl.c \
$(srcroot)test/unit/malloc_conf_2.c \
$(srcroot)test/unit/malloc_io.c \
$(srcroot)test/unit/math.c \
$(srcroot)test/unit/mpsc_queue.c \
$(srcroot)test/unit/mq.c \
$(srcroot)test/unit/mtx.c \
$(srcroot)test/unit/nstime.c \
$(srcroot)test/unit/ncached_max.c \
$(srcroot)test/unit/oversize_threshold.c \
$(srcroot)test/unit/pa.c \
$(srcroot)test/unit/pack.c \
$(srcroot)test/unit/pages.c \
$(srcroot)test/unit/peak.c \
$(srcroot)test/unit/ph.c \
$(srcroot)test/unit/prng.c \
$(srcroot)test/unit/prof_accum.c \
$(srcroot)test/unit/prof_active.c \
$(srcroot)test/unit/prof_gdump.c \
$(srcroot)test/unit/prof_hook.c \
$(srcroot)test/unit/prof_idump.c \
$(srcroot)test/unit/prof_log.c \
$(srcroot)test/unit/prof_mdump.c \
$(srcroot)test/unit/prof_recent.c \
$(srcroot)test/unit/prof_reset.c \
$(srcroot)test/unit/prof_small.c \
$(srcroot)test/unit/prof_stats.c \
$(srcroot)test/unit/prof_tctx.c \
$(srcroot)test/unit/prof_thread_name.c \
$(srcroot)test/unit/prof_sys_thread_name.c \
$(srcroot)test/unit/psset.c \
$(srcroot)test/unit/ql.c \
$(srcroot)test/unit/qr.c \
$(srcroot)test/unit/quarantine.c \
$(srcroot)test/unit/rb.c \
$(srcroot)test/unit/retained.c \
$(srcroot)test/unit/rtree.c \
$(srcroot)test/unit/run_quantize.c \
$(srcroot)test/unit/safety_check.c \
$(srcroot)test/unit/sc.c \
$(srcroot)test/unit/sec.c \
$(srcroot)test/unit/seq.c \
$(srcroot)test/unit/SFMT.c \
$(srcroot)test/unit/size_check.c \
$(srcroot)test/unit/size_classes.c \
$(srcroot)test/unit/slab.c \
$(srcroot)test/unit/smoothstep.c \
$(srcroot)test/unit/spin.c \
$(srcroot)test/unit/stats.c \
$(srcroot)test/unit/stats_print.c \
$(srcroot)test/unit/sz.c \
$(srcroot)test/unit/tcache_init.c \
$(srcroot)test/unit/tcache_max.c \
$(srcroot)test/unit/test_hooks.c \
$(srcroot)test/unit/thread_event.c \
$(srcroot)test/unit/ticker.c \
$(srcroot)test/unit/nstime.c \
$(srcroot)test/unit/tsd.c \
$(srcroot)test/unit/util.c \
$(srcroot)test/unit/uaf.c \
$(srcroot)test/unit/witness.c \
$(srcroot)test/unit/zero.c
$(srcroot)test/unit/zero.c \
$(srcroot)test/unit/zero_realloc_abort.c \
$(srcroot)test/unit/zero_realloc_free.c \
$(srcroot)test/unit/zero_realloc_alloc.c \
$(srcroot)test/unit/zero_reallocs.c
ifeq (@enable_prof@, 1)
TESTS_UNIT += \
$(srcroot)test/unit/arena_reset_prof.c \
$(srcroot)test/unit/batch_alloc_prof.c
endif
TESTS_INTEGRATION := $(srcroot)test/integration/aligned_alloc.c \
$(srcroot)test/integration/allocated.c \
$(srcroot)test/integration/sdallocx.c \
$(srcroot)test/integration/extent.c \
$(srcroot)test/integration/malloc.c \
$(srcroot)test/integration/mallocx.c \
$(srcroot)test/integration/MALLOCX_ARENA.c \
$(srcroot)test/integration/overflow.c \
$(srcroot)test/integration/posix_memalign.c \
$(srcroot)test/integration/rallocx.c \
$(srcroot)test/integration/sdallocx.c \
$(srcroot)test/integration/slab_sizes.c \
$(srcroot)test/integration/thread_arena.c \
$(srcroot)test/integration/thread_tcache_enabled.c \
$(srcroot)test/integration/xallocx.c \
$(srcroot)test/integration/chunk.c
TESTS_STRESS := $(srcroot)test/stress/microbench.c
TESTS := $(TESTS_UNIT) $(TESTS_INTEGRATION) $(TESTS_STRESS)
$(srcroot)test/integration/xallocx.c
ifeq (@enable_experimental_smallocx@, 1)
TESTS_INTEGRATION += \
$(srcroot)test/integration/smallocx.c
endif
ifeq (@enable_cxx@, 1)
CPP_SRCS := $(srcroot)src/jemalloc_cpp.cpp
TESTS_INTEGRATION_CPP := $(srcroot)test/integration/cpp/basic.cpp \
$(srcroot)test/integration/cpp/infallible_new_true.cpp \
$(srcroot)test/integration/cpp/infallible_new_false.cpp
else
CPP_SRCS :=
TESTS_INTEGRATION_CPP :=
endif
TESTS_ANALYZE := $(srcroot)test/analyze/prof_bias.c \
$(srcroot)test/analyze/rand.c \
$(srcroot)test/analyze/sizes.c
TESTS_STRESS := $(srcroot)test/stress/batch_alloc.c \
$(srcroot)test/stress/fill_flush.c \
$(srcroot)test/stress/hookbench.c \
$(srcroot)test/stress/large_microbench.c \
$(srcroot)test/stress/mallctl.c \
$(srcroot)test/stress/microbench.c
ifeq (@enable_cxx@, 1)
TESTS_STRESS_CPP := $(srcroot)test/stress/cpp/microbench.cpp
else
TESTS_STRESS_CPP :=
endif
TESTS := $(TESTS_UNIT) $(TESTS_INTEGRATION) $(TESTS_INTEGRATION_CPP) \
$(TESTS_ANALYZE) $(TESTS_STRESS) $(TESTS_STRESS_CPP)
PRIVATE_NAMESPACE_HDRS := $(objroot)include/jemalloc/internal/private_namespace.h $(objroot)include/jemalloc/internal/private_namespace_jet.h
PRIVATE_NAMESPACE_GEN_HDRS := $(PRIVATE_NAMESPACE_HDRS:%.h=%.gen.h)
C_SYM_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.sym.$(O))
C_SYMS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.sym)
C_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.$(O))
CPP_OBJS := $(CPP_SRCS:$(srcroot)%.cpp=$(objroot)%.$(O))
C_PIC_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.pic.$(O))
CPP_PIC_OBJS := $(CPP_SRCS:$(srcroot)%.cpp=$(objroot)%.pic.$(O))
C_JET_SYM_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.jet.sym.$(O))
C_JET_SYMS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.jet.sym)
C_JET_OBJS := $(C_SRCS:$(srcroot)%.c=$(objroot)%.jet.$(O))
C_TESTLIB_UNIT_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.unit.$(O))
C_TESTLIB_INTEGRATION_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.integration.$(O))
C_UTIL_INTEGRATION_OBJS := $(C_UTIL_INTEGRATION_SRCS:$(srcroot)%.c=$(objroot)%.integration.$(O))
C_TESTLIB_ANALYZE_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.analyze.$(O))
C_TESTLIB_STRESS_OBJS := $(C_TESTLIB_SRCS:$(srcroot)%.c=$(objroot)%.stress.$(O))
C_TESTLIB_OBJS := $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(C_TESTLIB_STRESS_OBJS)
C_TESTLIB_OBJS := $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) \
$(C_UTIL_INTEGRATION_OBJS) $(C_TESTLIB_ANALYZE_OBJS) \
$(C_TESTLIB_STRESS_OBJS)
TESTS_UNIT_OBJS := $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_INTEGRATION_OBJS := $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_INTEGRATION_CPP_OBJS := $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%.$(O))
TESTS_ANALYZE_OBJS := $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_STRESS_OBJS := $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%.$(O))
TESTS_OBJS := $(TESTS_UNIT_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_STRESS_OBJS)
TESTS_STRESS_CPP_OBJS := $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%.$(O))
TESTS_OBJS := $(TESTS_UNIT_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_ANALYZE_OBJS) \
$(TESTS_STRESS_OBJS)
TESTS_CPP_OBJS := $(TESTS_INTEGRATION_CPP_OBJS) $(TESTS_STRESS_CPP_OBJS)
.PHONY: all dist build_doc_html build_doc_man build_doc
.PHONY: install_bin install_include install_lib
.PHONY: install_doc_html install_doc_man install_doc install
.PHONY: tests check clean distclean relclean
.SECONDARY : $(TESTS_OBJS)
.SECONDARY : $(PRIVATE_NAMESPACE_GEN_HDRS) $(TESTS_OBJS) $(TESTS_CPP_OBJS)
# Default target.
all: build_lib
dist: build_doc
$(objroot)doc/%.html : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/html.xsl
$(objroot)doc/%$(install_suffix).html : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/html.xsl
ifneq ($(XSLROOT),)
$(XSLTPROC) -o $@ $(objroot)doc/html.xsl $<
else
ifeq ($(wildcard $(DOCS_HTML)),)
@echo "<p>Missing xsltproc. Doc not built.</p>" > $@
endif
@echo "Missing xsltproc. "$@" not (re)built."
endif
$(objroot)doc/%.3 : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/manpages.xsl
$(objroot)doc/%$(install_suffix).3 : $(objroot)doc/%.xml $(srcroot)doc/stylesheet.xsl $(objroot)doc/manpages.xsl
ifneq ($(XSLROOT),)
$(XSLTPROC) -o $@ $(objroot)doc/manpages.xsl $<
# The -o option (output filename) of xsltproc may not work (it uses the
# <refname> in the .xml file). Manually add the suffix if so.
ifneq ($(install_suffix),)
@if [ -f $(objroot)doc/jemalloc.3 ]; then \
mv $(objroot)doc/jemalloc.3 $(objroot)doc/jemalloc$(install_suffix).3 ; \
fi
endif
else
ifeq ($(wildcard $(DOCS_MAN3)),)
@echo "Missing xsltproc. Doc not built." > $@
endif
@echo "Missing xsltproc. "$@" not (re)built."
endif
build_doc_html: $(DOCS_HTML)
build_doc_man: $(DOCS_MAN3)
@ -245,84 +440,173 @@ build_doc: $(DOCS)
# Include generated dependency files.
#
ifdef CC_MM
-include $(C_SYM_OBJS:%.$(O)=%.d)
-include $(C_OBJS:%.$(O)=%.d)
-include $(CPP_OBJS:%.$(O)=%.d)
-include $(C_PIC_OBJS:%.$(O)=%.d)
-include $(CPP_PIC_OBJS:%.$(O)=%.d)
-include $(C_JET_SYM_OBJS:%.$(O)=%.d)
-include $(C_JET_OBJS:%.$(O)=%.d)
-include $(C_TESTLIB_OBJS:%.$(O)=%.d)
-include $(TESTS_OBJS:%.$(O)=%.d)
-include $(TESTS_CPP_OBJS:%.$(O)=%.d)
endif
$(C_SYM_OBJS): $(objroot)src/%.sym.$(O): $(srcroot)src/%.c
$(C_SYM_OBJS): CPPFLAGS += -DJEMALLOC_NO_PRIVATE_NAMESPACE
$(C_SYMS): $(objroot)src/%.sym: $(objroot)src/%.sym.$(O)
$(C_OBJS): $(objroot)src/%.$(O): $(srcroot)src/%.c
$(CPP_OBJS): $(objroot)src/%.$(O): $(srcroot)src/%.cpp
$(C_PIC_OBJS): $(objroot)src/%.pic.$(O): $(srcroot)src/%.c
$(C_PIC_OBJS): CFLAGS += $(PIC_CFLAGS)
$(CPP_PIC_OBJS): $(objroot)src/%.pic.$(O): $(srcroot)src/%.cpp
$(CPP_PIC_OBJS): CXXFLAGS += $(PIC_CFLAGS)
$(C_JET_SYM_OBJS): $(objroot)src/%.jet.sym.$(O): $(srcroot)src/%.c
$(C_JET_SYM_OBJS): CPPFLAGS += -DJEMALLOC_JET -DJEMALLOC_NO_PRIVATE_NAMESPACE
$(C_JET_SYMS): $(objroot)src/%.jet.sym: $(objroot)src/%.jet.sym.$(O)
$(C_JET_OBJS): $(objroot)src/%.jet.$(O): $(srcroot)src/%.c
$(C_JET_OBJS): CFLAGS += -DJEMALLOC_JET
$(C_JET_OBJS): CPPFLAGS += -DJEMALLOC_JET
$(C_TESTLIB_UNIT_OBJS): $(objroot)test/src/%.unit.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_UNIT_OBJS): CPPFLAGS += -DJEMALLOC_UNIT_TEST
$(C_TESTLIB_INTEGRATION_OBJS): $(objroot)test/src/%.integration.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_INTEGRATION_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_TEST
$(C_UTIL_INTEGRATION_OBJS): $(objroot)src/%.integration.$(O): $(srcroot)src/%.c
$(C_TESTLIB_ANALYZE_OBJS): $(objroot)test/src/%.analyze.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_ANALYZE_OBJS): CPPFLAGS += -DJEMALLOC_ANALYZE_TEST
$(C_TESTLIB_STRESS_OBJS): $(objroot)test/src/%.stress.$(O): $(srcroot)test/src/%.c
$(C_TESTLIB_STRESS_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_TEST -DJEMALLOC_STRESS_TESTLIB
$(C_TESTLIB_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_UNIT_OBJS): CPPFLAGS += -DJEMALLOC_UNIT_TEST
$(TESTS_INTEGRATION_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_TEST
$(TESTS_INTEGRATION_CPP_OBJS): CPPFLAGS += -DJEMALLOC_INTEGRATION_CPP_TEST
$(TESTS_ANALYZE_OBJS): CPPFLAGS += -DJEMALLOC_ANALYZE_TEST
$(TESTS_STRESS_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_TEST
$(TESTS_STRESS_CPP_OBJS): CPPFLAGS += -DJEMALLOC_STRESS_CPP_TEST
$(TESTS_OBJS): $(objroot)test/%.$(O): $(srcroot)test/%.c
$(TESTS_CPP_OBJS): $(objroot)test/%.$(O): $(srcroot)test/%.cpp
$(TESTS_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_CPP_OBJS): CPPFLAGS += -I$(srcroot)test/include -I$(objroot)test/include
$(TESTS_OBJS): CFLAGS += -fno-builtin
$(TESTS_CPP_OBJS): CPPFLAGS += -fno-builtin
ifneq ($(IMPORTLIB),$(SO))
$(C_OBJS) $(C_JET_OBJS): CPPFLAGS += -DDLLEXPORT
$(CPP_OBJS) $(C_SYM_OBJS) $(C_OBJS) $(C_JET_SYM_OBJS) $(C_JET_OBJS): CPPFLAGS += -DDLLEXPORT
endif
ifndef CC_MM
# Dependencies.
ifndef CC_MM
HEADER_DIRS = $(srcroot)include/jemalloc/internal \
$(objroot)include/jemalloc $(objroot)include/jemalloc/internal
HEADERS = $(wildcard $(foreach dir,$(HEADER_DIRS),$(dir)/*.h))
$(C_OBJS) $(C_PIC_OBJS) $(C_JET_OBJS) $(C_TESTLIB_OBJS) $(TESTS_OBJS): $(HEADERS)
$(TESTS_OBJS): $(objroot)test/include/test/jemalloc_test.h
HEADERS = $(filter-out $(PRIVATE_NAMESPACE_HDRS),$(wildcard $(foreach dir,$(HEADER_DIRS),$(dir)/*.h)))
$(C_SYM_OBJS) $(C_OBJS) $(CPP_OBJS) $(C_PIC_OBJS) $(CPP_PIC_OBJS) $(C_JET_SYM_OBJS) $(C_JET_OBJS) $(C_TESTLIB_OBJS) $(TESTS_OBJS) $(TESTS_CPP_OBJS): $(HEADERS)
$(TESTS_OBJS) $(TESTS_CPP_OBJS): $(objroot)test/include/test/jemalloc_test.h
endif
$(C_OBJS) $(C_PIC_OBJS) $(C_JET_OBJS) $(C_TESTLIB_OBJS) $(TESTS_OBJS): %.$(O):
$(C_OBJS) $(CPP_OBJS) $(C_PIC_OBJS) $(CPP_PIC_OBJS) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(TESTS_INTEGRATION_OBJS) $(TESTS_INTEGRATION_CPP_OBJS): $(objroot)include/jemalloc/internal/private_namespace.h
$(C_JET_OBJS) $(C_TESTLIB_UNIT_OBJS) $(C_TESTLIB_ANALYZE_OBJS) $(C_TESTLIB_STRESS_OBJS) $(TESTS_UNIT_OBJS) $(TESTS_ANALYZE_OBJS) $(TESTS_STRESS_OBJS) $(TESTS_STRESS_CPP_OBJS): $(objroot)include/jemalloc/internal/private_namespace_jet.h
$(C_SYM_OBJS) $(C_OBJS) $(C_PIC_OBJS) $(C_JET_SYM_OBJS) $(C_JET_OBJS) $(C_TESTLIB_OBJS) $(TESTS_OBJS): %.$(O):
@mkdir -p $(@D)
$(CC) $(CFLAGS) -c $(CPPFLAGS) $(CTARGET) $<
ifdef CC_MM
@$(CC) -MM $(CPPFLAGS) -MT $@ -o $(@:%.$(O)=%.d) $<
endif
$(C_SYMS): %.sym:
@mkdir -p $(@D)
$(DUMP_SYMS) $< | $(AWK) -f $(objroot)include/jemalloc/internal/private_symbols.awk > $@
$(C_JET_SYMS): %.sym:
@mkdir -p $(@D)
$(DUMP_SYMS) $< | $(AWK) -f $(objroot)include/jemalloc/internal/private_symbols_jet.awk > $@
$(objroot)include/jemalloc/internal/private_namespace.gen.h: $(C_SYMS)
$(SHELL) $(srcroot)include/jemalloc/internal/private_namespace.sh $^ > $@
$(objroot)include/jemalloc/internal/private_namespace_jet.gen.h: $(C_JET_SYMS)
$(SHELL) $(srcroot)include/jemalloc/internal/private_namespace.sh $^ > $@
%.h: %.gen.h
@if ! `cmp -s $< $@` ; then echo "cp $< $@"; cp $< $@ ; fi
$(CPP_OBJS) $(CPP_PIC_OBJS) $(TESTS_CPP_OBJS): %.$(O):
@mkdir -p $(@D)
$(CXX) $(CXXFLAGS) -c $(CPPFLAGS) $(CTARGET) $<
ifdef CC_MM
@$(CXX) -MM $(CPPFLAGS) -MT $@ -o $(@:%.$(O)=%.d) $<
endif
ifneq ($(SOREV),$(SO))
%.$(SO) : %.$(SOREV)
@mkdir -p $(@D)
ln -sf $(<F) $@
endif
$(objroot)lib/$(LIBJEMALLOC).$(SOREV) : $(if $(PIC_CFLAGS),$(C_PIC_OBJS),$(C_OBJS))
$(objroot)lib/$(LIBJEMALLOC).$(SOREV) : $(if $(PIC_CFLAGS),$(C_PIC_OBJS),$(C_OBJS)) $(if $(PIC_CFLAGS),$(CPP_PIC_OBJS),$(CPP_OBJS))
@mkdir -p $(@D)
ifeq (@enable_cxx@, 1)
$(CXX) $(DSO_LDFLAGS) $(call RPATH,$(RPATH_EXTRA)) $(LDTARGET) $+ $(LDFLAGS) $(LIBS) $(EXTRA_LDFLAGS)
else
$(CC) $(DSO_LDFLAGS) $(call RPATH,$(RPATH_EXTRA)) $(LDTARGET) $+ $(LDFLAGS) $(LIBS) $(EXTRA_LDFLAGS)
endif
$(objroot)lib/$(LIBJEMALLOC)_pic.$(A) : $(C_PIC_OBJS)
$(objroot)lib/$(LIBJEMALLOC).$(A) : $(C_OBJS)
$(objroot)lib/$(LIBJEMALLOC)_s.$(A) : $(C_OBJS)
$(objroot)lib/$(LIBJEMALLOC)_pic.$(A) : $(C_PIC_OBJS) $(CPP_PIC_OBJS)
$(objroot)lib/$(LIBJEMALLOC).$(A) : $(C_OBJS) $(CPP_OBJS)
$(objroot)lib/$(LIBJEMALLOC)_s.$(A) : $(C_OBJS) $(CPP_OBJS)
$(STATIC_LIBS):
@mkdir -p $(@D)
$(AR) $(ARFLAGS)@AROUT@ $+
$(objroot)test/unit/%$(EXE): $(objroot)test/unit/%.$(O) $(TESTS_UNIT_LINK_OBJS) $(C_JET_OBJS) $(C_TESTLIB_UNIT_OBJS)
$(objroot)test/unit/%$(EXE): $(objroot)test/unit/%.$(O) $(C_JET_OBJS) $(C_TESTLIB_UNIT_OBJS)
@mkdir -p $(@D)
$(CC) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/integration/%$(EXE): $(objroot)test/integration/%.$(O) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LJEMALLOC) $(LDFLAGS) $(filter-out -lm,$(filter -lrt -lpthread,$(LIBS))) $(LM) $(EXTRA_LDFLAGS)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LJEMALLOC) $(LDFLAGS) $(filter-out -lm,$(filter -lrt -pthread -lstdc++,$(LIBS))) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/integration/cpp/%$(EXE): $(objroot)test/integration/cpp/%.$(O) $(C_TESTLIB_INTEGRATION_OBJS) $(C_UTIL_INTEGRATION_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CXX) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB) $(LDFLAGS) $(filter-out -lm,$(LIBS)) -lm $(EXTRA_LDFLAGS)
$(objroot)test/analyze/%$(EXE): $(objroot)test/analyze/%.$(O) $(C_JET_OBJS) $(C_TESTLIB_ANALYZE_OBJS)
@mkdir -p $(@D)
$(CC) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/%$(EXE): $(objroot)test/stress/%.$(O) $(C_JET_OBJS) $(C_TESTLIB_STRESS_OBJS) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB)
@mkdir -p $(@D)
$(CC) $(TEST_LD_MODE) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(objroot)lib/$(LIBJEMALLOC).$(IMPORTLIB) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/pa_data_preprocessor$(EXE): $(objroot)test/stress/pa/pa_data_preprocessor.$(O)
@mkdir -p $(@D)
$(CXX) $(LDTARGET) $(filter %.$(O),$^) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/pa_microbench$(EXE): $(objroot)test/stress/pa/pa_microbench.$(O) $(C_JET_OBJS) $(C_TESTLIB_STRESS_OBJS)
@mkdir -p $(@D)
$(CC) $(LDTARGET) $(filter %.$(O),$^) $(call RPATH,$(objroot)lib) $(LDFLAGS) $(filter-out -lm,$(LIBS)) $(LM) $(EXTRA_LDFLAGS)
$(objroot)test/stress/pa/%.$(O): $(srcroot)test/stress/pa/%.c
@mkdir -p $(@D)
$(CC) $(CFLAGS) -c $(CPPFLAGS) -DJEMALLOC_STRESS_TEST -I$(srcroot)test/include -I$(objroot)test/include $(CTARGET) $<
ifdef CC_MM
@$(CC) -MM $(CPPFLAGS) -DJEMALLOC_STRESS_TEST -I$(srcroot)test/include -I$(objroot)test/include -MT $@ -o $(@:%.$(O)=%.d) $<
endif
$(objroot)test/stress/pa/%.$(O): $(srcroot)test/stress/pa/%.cpp
@mkdir -p $(@D)
$(CXX) $(CXXFLAGS) -c $(CPPFLAGS) -I$(srcroot)test/include -I$(objroot)test/include $(CTARGET) $<
ifdef CC_MM
@$(CXX) -MM $(CPPFLAGS) -I$(srcroot)test/include -I$(objroot)test/include -MT $@ -o $(@:%.$(O)=%.d) $<
endif
build_lib_shared: $(DSOS)
build_lib_static: $(STATIC_LIBS)
build_lib: build_lib_shared build_lib_static
ifeq ($(enable_shared), 1)
build_lib: build_lib_shared
endif
ifeq ($(enable_static), 1)
build_lib: build_lib_static
endif
install_bin:
$(INSTALL) -d $(BINDIR)
@ -359,16 +643,22 @@ install_lib_pc: $(PC)
$(INSTALL) -m 644 $$l $(LIBDIR)/pkgconfig; \
done
install_lib: install_lib_shared install_lib_static install_lib_pc
ifeq ($(enable_shared), 1)
install_lib: install_lib_shared
endif
ifeq ($(enable_static), 1)
install_lib: install_lib_static
endif
install_lib: install_lib_pc
install_doc_html:
install_doc_html: build_doc_html
$(INSTALL) -d $(DATADIR)/doc/jemalloc$(install_suffix)
@for d in $(DOCS_HTML); do \
echo "$(INSTALL) -m 644 $$d $(DATADIR)/doc/jemalloc$(install_suffix)"; \
$(INSTALL) -m 644 $$d $(DATADIR)/doc/jemalloc$(install_suffix); \
done
install_doc_man:
install_doc_man: build_doc_man
$(INSTALL) -d $(MANDIR)/man3
@for d in $(DOCS_MAN3); do \
echo "$(INSTALL) -m 644 $$d $(MANDIR)/man3"; \
@ -377,94 +667,124 @@ done
install_doc: install_doc_html install_doc_man
install: install_bin install_include install_lib install_doc
install: install_bin install_include install_lib
ifeq ($(enable_doc), 1)
install: install_doc
endif
uninstall_bin:
$(RM) -v $(foreach b,$(notdir $(BINS)),$(BINDIR)/$(b))
uninstall_include:
$(RM) -v $(foreach h,$(notdir $(C_HDRS)),$(INCLUDEDIR)/jemalloc/$(h))
rmdir -v $(INCLUDEDIR)/jemalloc
uninstall_lib_shared:
$(RM) -v $(LIBDIR)/$(LIBJEMALLOC).$(SOREV)
ifneq ($(SOREV),$(SO))
$(RM) -v $(LIBDIR)/$(LIBJEMALLOC).$(SO)
endif
uninstall_lib_static:
$(RM) -v $(foreach l,$(notdir $(STATIC_LIBS)),$(LIBDIR)/$(l))
uninstall_lib_pc:
$(RM) -v $(foreach p,$(notdir $(PC)),$(LIBDIR)/pkgconfig/$(p))
ifeq ($(enable_shared), 1)
uninstall_lib: uninstall_lib_shared
endif
ifeq ($(enable_static), 1)
uninstall_lib: uninstall_lib_static
endif
uninstall_lib: uninstall_lib_pc
uninstall_doc_html:
$(RM) -v $(foreach d,$(notdir $(DOCS_HTML)),$(DATADIR)/doc/jemalloc$(install_suffix)/$(d))
rmdir -v $(DATADIR)/doc/jemalloc$(install_suffix)
uninstall_doc_man:
$(RM) -v $(foreach d,$(notdir $(DOCS_MAN3)),$(MANDIR)/man3/$(d))
uninstall_doc: uninstall_doc_html uninstall_doc_man
uninstall: uninstall_bin uninstall_include uninstall_lib
ifeq ($(enable_doc), 1)
uninstall: uninstall_doc
endif
tests_unit: $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%$(EXE))
tests_integration: $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%$(EXE))
tests_stress: $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%$(EXE))
tests: tests_unit tests_integration tests_stress
tests_integration: $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%$(EXE)) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%$(EXE))
tests_analyze: $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%$(EXE))
tests_stress: $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%$(EXE)) $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%$(EXE))
tests_pa: $(objroot)test/stress/pa/pa_data_preprocessor$(EXE) $(objroot)test/stress/pa/pa_microbench$(EXE)
tests: tests_unit tests_integration tests_analyze tests_stress
check_unit_dir:
@mkdir -p $(objroot)test/unit
check_integration_dir:
@mkdir -p $(objroot)test/integration
analyze_dir:
@mkdir -p $(objroot)test/analyze
stress_dir:
@mkdir -p $(objroot)test/stress
check_dir: check_unit_dir check_integration_dir
check_unit: tests_unit check_unit_dir
$(MALLOC_CONF)="purge:ratio" $(SHELL) $(objroot)test/test.sh $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="purge:decay" $(SHELL) $(objroot)test/test.sh $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%)
$(SHELL) $(objroot)test/test.sh $(TESTS_UNIT:$(srcroot)%.c=$(objroot)%)
check_integration_prof: tests_integration check_integration_dir
ifeq ($(enable_prof), 1)
$(MALLOC_CONF)="prof:true" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="prof:true,prof_active:false" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="prof:true" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
$(MALLOC_CONF)="prof:true,prof_active:false" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
endif
check_integration_decay: tests_integration check_integration_dir
$(MALLOC_CONF)="purge:decay,decay_time:-1" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="purge:decay,decay_time:0" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="purge:decay" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(MALLOC_CONF)="dirty_decay_ms:-1,muzzy_decay_ms:-1" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
$(MALLOC_CONF)="dirty_decay_ms:0,muzzy_decay_ms:0" $(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
check_integration: tests_integration check_integration_dir
$(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%)
$(SHELL) $(objroot)test/test.sh $(TESTS_INTEGRATION:$(srcroot)%.c=$(objroot)%) $(TESTS_INTEGRATION_CPP:$(srcroot)%.cpp=$(objroot)%)
analyze: tests_analyze analyze_dir
ifeq ($(enable_prof), 1)
$(MALLOC_CONF)="prof:true" $(SHELL) $(objroot)test/test.sh $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%)
else
$(SHELL) $(objroot)test/test.sh $(TESTS_ANALYZE:$(srcroot)%.c=$(objroot)%)
endif
stress: tests_stress stress_dir
$(SHELL) $(objroot)test/test.sh $(TESTS_STRESS:$(srcroot)%.c=$(objroot)%)
$(SHELL) $(objroot)test/test.sh $(TESTS_STRESS_CPP:$(srcroot)%.cpp=$(objroot)%)
check: check_unit check_integration check_integration_decay check_integration_prof
ifeq ($(enable_code_coverage), 1)
coverage_unit: check_unit
$(SHELL) $(srcroot)coverage.sh $(srcroot)src jet $(C_JET_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src unit $(C_TESTLIB_UNIT_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/unit unit $(TESTS_UNIT_OBJS)
coverage_integration: check_integration
$(SHELL) $(srcroot)coverage.sh $(srcroot)src pic $(C_PIC_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)src integration $(C_UTIL_INTEGRATION_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src integration $(C_TESTLIB_INTEGRATION_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/integration integration $(TESTS_INTEGRATION_OBJS)
coverage_stress: stress
$(SHELL) $(srcroot)coverage.sh $(srcroot)src pic $(C_PIC_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)src jet $(C_JET_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src stress $(C_TESTLIB_STRESS_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/stress stress $(TESTS_STRESS_OBJS)
coverage: check
$(SHELL) $(srcroot)coverage.sh $(srcroot)src pic $(C_PIC_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)src jet $(C_JET_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)src integration $(C_UTIL_INTEGRATION_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src unit $(C_TESTLIB_UNIT_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src integration $(C_TESTLIB_INTEGRATION_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/src stress $(C_TESTLIB_STRESS_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/unit unit $(TESTS_UNIT_OBJS) $(TESTS_UNIT_AUX_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/integration integration $(TESTS_INTEGRATION_OBJS)
$(SHELL) $(srcroot)coverage.sh $(srcroot)test/stress integration $(TESTS_STRESS_OBJS)
endif
clean:
rm -f $(PRIVATE_NAMESPACE_HDRS)
rm -f $(PRIVATE_NAMESPACE_GEN_HDRS)
rm -f $(C_SYM_OBJS)
rm -f $(C_SYMS)
rm -f $(C_OBJS)
rm -f $(CPP_OBJS)
rm -f $(C_PIC_OBJS)
rm -f $(CPP_PIC_OBJS)
rm -f $(C_JET_SYM_OBJS)
rm -f $(C_JET_SYMS)
rm -f $(C_JET_OBJS)
rm -f $(C_TESTLIB_OBJS)
rm -f $(C_SYM_OBJS:%.$(O)=%.d)
rm -f $(C_OBJS:%.$(O)=%.d)
rm -f $(C_OBJS:%.$(O)=%.gcda)
rm -f $(C_OBJS:%.$(O)=%.gcno)
rm -f $(CPP_OBJS:%.$(O)=%.d)
rm -f $(C_PIC_OBJS:%.$(O)=%.d)
rm -f $(C_PIC_OBJS:%.$(O)=%.gcda)
rm -f $(C_PIC_OBJS:%.$(O)=%.gcno)
rm -f $(CPP_PIC_OBJS:%.$(O)=%.d)
rm -f $(C_JET_SYM_OBJS:%.$(O)=%.d)
rm -f $(C_JET_OBJS:%.$(O)=%.d)
rm -f $(C_JET_OBJS:%.$(O)=%.gcda)
rm -f $(C_JET_OBJS:%.$(O)=%.gcno)
rm -f $(C_TESTLIB_OBJS:%.$(O)=%.d)
rm -f $(C_TESTLIB_OBJS:%.$(O)=%.gcda)
rm -f $(C_TESTLIB_OBJS:%.$(O)=%.gcno)
rm -f $(TESTS_OBJS:%.$(O)=%$(EXE))
rm -f $(TESTS_OBJS)
rm -f $(TESTS_OBJS:%.$(O)=%.d)
rm -f $(TESTS_OBJS:%.$(O)=%.gcda)
rm -f $(TESTS_OBJS:%.$(O)=%.gcno)
rm -f $(TESTS_OBJS:%.$(O)=%.out)
rm -f $(TESTS_CPP_OBJS:%.$(O)=%$(EXE))
rm -f $(TESTS_CPP_OBJS)
rm -f $(TESTS_CPP_OBJS:%.$(O)=%.d)
rm -f $(TESTS_CPP_OBJS:%.$(O)=%.out)
rm -f $(DSOS) $(STATIC_LIBS)
rm -f $(objroot)*.gcov.*
distclean: clean
rm -f $(objroot)bin/jemalloc-config

14
README
View file

@ -3,12 +3,12 @@ fragmentation avoidance and scalable concurrency support. jemalloc first came
into use as the FreeBSD libc allocator in 2005, and since then it has found its
way into numerous applications that rely on its predictable behavior. In 2010
jemalloc development efforts broadened to include developer support features
such as heap profiling, Valgrind integration, and extensive monitoring/tuning
hooks. Modern jemalloc releases continue to be integrated back into FreeBSD,
and therefore versatility remains critical. Ongoing development efforts trend
toward making jemalloc among the best allocators for a broad range of demanding
applications, and eliminating/mitigating weaknesses that have practical
repercussions for real world applications.
such as heap profiling and extensive monitoring/tuning hooks. Modern jemalloc
releases continue to be integrated back into FreeBSD, and therefore versatility
remains critical. Ongoing development efforts trend toward making jemalloc
among the best allocators for a broad range of demanding applications, and
eliminating/mitigating weaknesses that have practical repercussions for real
world applications.
The COPYING file contains copyright and licensing information.
@ -17,4 +17,4 @@ jemalloc.
The ChangeLog file contains a brief summary of changes for each release.
URL: http://jemalloc.net/
URL: https://jemalloc.net/

129
TUNING.md Normal file
View file

@ -0,0 +1,129 @@
This document summarizes the common approaches for performance fine tuning with
jemalloc (as of 5.3.0). The default configuration of jemalloc tends to work
reasonably well in practice, and most applications should not have to tune any
options. However, in order to cover a wide range of applications and avoid
pathological cases, the default setting is sometimes kept conservative and
suboptimal, even for many common workloads. When jemalloc is properly tuned for
a specific application / workload, it is common to improve system level metrics
by a few percent, or make favorable trade-offs.
## Notable runtime options for performance tuning
Runtime options can be set via
[malloc_conf](https://jemalloc.net/jemalloc.3.html#tuning).
* [background_thread](https://jemalloc.net/jemalloc.3.html#background_thread)
Enabling jemalloc background threads generally improves the tail latency for
application threads, since unused memory purging is shifted to the dedicated
background threads. In addition, unintended purging delay caused by
application inactivity is avoided with background threads.
Suggested: `background_thread:true` when jemalloc managed threads can be
allowed.
* [metadata_thp](https://jemalloc.net/jemalloc.3.html#opt.metadata_thp)
Allowing jemalloc to utilize transparent huge pages for its internal
metadata usually reduces TLB misses significantly, especially for programs
with large memory footprint and frequent allocation / deallocation
activities. Metadata memory usage may increase due to the use of huge
pages.
Suggested for allocation intensive programs: `metadata_thp:auto` or
`metadata_thp:always`, which is expected to improve CPU utilization at a
small memory cost.
* [dirty_decay_ms](https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms) and
[muzzy_decay_ms](https://jemalloc.net/jemalloc.3.html#opt.muzzy_decay_ms)
Decay time determines how fast jemalloc returns unused pages back to the
operating system, and therefore provides a fairly straightforward trade-off
between CPU and memory usage. Shorter decay time purges unused pages faster
to reduces memory usage (usually at the cost of more CPU cycles spent on
purging), and vice versa.
Suggested: tune the values based on the desired trade-offs.
* [narenas](https://jemalloc.net/jemalloc.3.html#opt.narenas)
By default jemalloc uses multiple arenas to reduce internal lock contention.
However high arena count may also increase overall memory fragmentation,
since arenas manage memory independently. When high degree of parallelism
is not expected at the allocator level, lower number of arenas often
improves memory usage.
Suggested: if low parallelism is expected, try lower arena count while
monitoring CPU and memory usage.
* [percpu_arena](https://jemalloc.net/jemalloc.3.html#opt.percpu_arena)
Enable dynamic thread to arena association based on running CPU. This has
the potential to improve locality, e.g. when thread to CPU affinity is
present.
Suggested: try `percpu_arena:percpu` or `percpu_arena:phycpu` if
thread migration between processors is expected to be infrequent.
Examples:
* High resource consumption application, prioritizing CPU utilization:
`background_thread:true,metadata_thp:auto` combined with relaxed decay time
(increased `dirty_decay_ms` and / or `muzzy_decay_ms`,
e.g. `dirty_decay_ms:30000,muzzy_decay_ms:30000`).
* High resource consumption application, prioritizing memory usage:
`background_thread:true,tcache_max:4096` combined with shorter decay time
(decreased `dirty_decay_ms` and / or `muzzy_decay_ms`,
e.g. `dirty_decay_ms:5000,muzzy_decay_ms:5000`), and lower arena count
(e.g. number of CPUs).
* Low resource consumption application:
`narenas:1,tcache_max:1024` combined with shorter decay time (decreased
`dirty_decay_ms` and / or `muzzy_decay_ms`,e.g.
`dirty_decay_ms:1000,muzzy_decay_ms:0`).
* Extremely conservative -- minimize memory usage at all costs, only suitable when
allocation activity is very rare:
`narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0`
Note that it is recommended to combine the options with `abort_conf:true` which
aborts immediately on illegal options.
## Beyond runtime options
In addition to the runtime options, there are a number of programmatic ways to
improve application performance with jemalloc.
* [Explicit arenas](https://jemalloc.net/jemalloc.3.html#arenas.create)
Manually created arenas can help performance in various ways, e.g. by
managing locality and contention for specific usages. For example,
applications can explicitly allocate frequently accessed objects from a
dedicated arena with
[mallocx()](https://jemalloc.net/jemalloc.3.html#MALLOCX_ARENA) to improve
locality. In addition, explicit arenas often benefit from individually
tuned options, e.g. relaxed [decay
time](https://jemalloc.net/jemalloc.3.html#arena.i.dirty_decay_ms) if
frequent reuse is expected.
* [Extent hooks](https://jemalloc.net/jemalloc.3.html#arena.i.extent_hooks)
Extent hooks allow customization for managing underlying memory. One use
case for performance purpose is to utilize huge pages -- for example,
[HHVM](httpss://github.com/facebook/hhvm/blob/master/hphp/util/alloc.cpp)
uses explicit arenas with customized extent hooks to manage 1GB huge pages
for frequently accessed data, which reduces TLB misses significantly.
* [Explicit thread-to-arena
binding](https://jemalloc.net/jemalloc.3.html#thread.arena)
It is common for some threads in an application to have different memory
access / allocation patterns. Threads with heavy workloads often benefit
from explicit binding, e.g. binding very active threads to dedicated arenas
may reduce contention at the allocator level.

View file

@ -9,8 +9,8 @@ for i in autoconf; do
fi
done
echo "./configure --enable-autogen $@"
./configure --enable-autogen $@
echo "./configure --enable-autogen \"$@\""
./configure --enable-autogen "$@"
if [ $? -ne 0 ]; then
echo "Error $? in ./configure"
exit 1

View file

@ -18,6 +18,7 @@ Options:
--cc : Print compiler used to build jemalloc.
--cflags : Print compiler flags used to build jemalloc.
--cppflags : Print preprocessor flags used to build jemalloc.
--cxxflags : Print C++ compiler flags used to build jemalloc.
--ldflags : Print library flags used to build jemalloc.
--libs : Print libraries jemalloc was linked against.
EOF
@ -67,6 +68,9 @@ case "$1" in
--cppflags)
echo "@CPPFLAGS@"
;;
--cxxflags)
echo "@CXXFLAGS@"
;;
--ldflags)
echo "@LDFLAGS@ @EXTRA_LDFLAGS@"
;;

View file

@ -71,6 +71,7 @@
use strict;
use warnings;
use Getopt::Long;
use Cwd;
my $JEPROF_VERSION = "@jemalloc_version@";
my $PPROF_VERSION = "2.0";
@ -87,6 +88,7 @@ my %obj_tool_map = (
#"nm_pdb" => "nm-pdb", # for reading windows (PDB-format) executables
#"addr2line_pdb" => "addr2line-pdb", # ditto
#"otool" => "otool", # equivalent of objdump on OS X
#"dyld_info" => "dyld_info", # equivalent of otool on OS X for shared cache
);
# NOTE: these are lists, so you can put in commandline flags if you want.
my @DOT = ("dot"); # leave non-absolute, since it may be in /usr/local
@ -204,6 +206,8 @@ Output type:
--svg Generate SVG to stdout
--gif Generate GIF to stdout
--raw Generate symbolized jeprof data (useful with remote fetch)
--collapsed Generate collapsed stacks for building flame graphs
(see http://www.brendangregg.com/flamegraphs.html)
Heap-Profile Options:
--inuse_space Display in-use (mega)bytes [default]
@ -237,6 +241,7 @@ Miscellaneous:
--test Run unit tests
--help This message
--version Version information
--debug-syms-by-id (Linux only) Find debug symbol files by build ID as well as by name
Environment Variables:
JEPROF_TMPDIR Profiles directory. Defaults to \$HOME/jeprof
@ -331,6 +336,7 @@ sub Init() {
$main::opt_gif = 0;
$main::opt_svg = 0;
$main::opt_raw = 0;
$main::opt_collapsed = 0;
$main::opt_nodecount = 80;
$main::opt_nodefraction = 0.005;
@ -361,6 +367,7 @@ sub Init() {
$main::opt_tools = "";
$main::opt_debug = 0;
$main::opt_test = 0;
$main::opt_debug_syms_by_id = 0;
# These are undocumented flags used only by unittests.
$main::opt_test_stride = 0;
@ -404,6 +411,7 @@ sub Init() {
"svg!" => \$main::opt_svg,
"gif!" => \$main::opt_gif,
"raw!" => \$main::opt_raw,
"collapsed!" => \$main::opt_collapsed,
"interactive!" => \$main::opt_interactive,
"nodecount=i" => \$main::opt_nodecount,
"nodefraction=f" => \$main::opt_nodefraction,
@ -428,6 +436,7 @@ sub Init() {
"tools=s" => \$main::opt_tools,
"test!" => \$main::opt_test,
"debug!" => \$main::opt_debug,
"debug-syms-by-id!" => \$main::opt_debug_syms_by_id,
# Undocumented flags used only by unittests:
"test_stride=i" => \$main::opt_test_stride,
) || usage("Invalid option(s)");
@ -489,6 +498,7 @@ sub Init() {
$main::opt_svg +
$main::opt_gif +
$main::opt_raw +
$main::opt_collapsed +
$main::opt_interactive +
0;
if ($modes > 1) {
@ -571,6 +581,11 @@ sub Init() {
foreach (@prefix_list) {
s|/+$||;
}
# Flag to prevent us from trying over and over to use
# elfutils if it's not installed (used only with
# --debug-syms-by-id option).
$main::gave_up_on_elfutils = 0;
}
sub FilterAndPrint {
@ -620,6 +635,8 @@ sub FilterAndPrint {
PrintText($symbols, $flat, $cumulative, -1);
} elsif ($main::opt_raw) {
PrintSymbolizedProfile($symbols, $profile, $main::prog);
} elsif ($main::opt_collapsed) {
PrintCollapsedStacks($symbols, $profile);
} elsif ($main::opt_callgrind) {
PrintCallgrind($calls);
} else {
@ -672,15 +689,15 @@ sub Main() {
my $symbol_map = {};
# Read one profile, pick the last item on the list
my $data = ReadProfile($main::prog, pop(@main::profile_files));
my $data = ReadProfile($main::prog, $main::profile_files[0]);
my $profile = $data->{profile};
my $pcs = $data->{pcs};
my $libs = $data->{libs}; # Info about main program and shared libraries
$symbol_map = MergeSymbols($symbol_map, $data->{symbols});
# Add additional profiles, if available.
if (scalar(@main::profile_files) > 0) {
foreach my $pname (@main::profile_files) {
if (scalar(@main::profile_files) > 1) {
foreach my $pname (@main::profile_files[1..$#main::profile_files]) {
my $data2 = ReadProfile($main::prog, $pname);
$profile = AddProfile($profile, $data2->{profile});
$pcs = AddPcs($pcs, $data2->{pcs});
@ -2809,6 +2826,40 @@ sub IsSecondPcAlwaysTheSame {
return $second_pc;
}
sub ExtractSymbolNameInlineStack {
my $symbols = shift;
my $address = shift;
my @stack = ();
if (exists $symbols->{$address}) {
my @localinlinestack = @{$symbols->{$address}};
for (my $i = $#localinlinestack; $i > 0; $i-=3) {
my $file = $localinlinestack[$i-1];
my $fn = $localinlinestack[$i-0];
if ($file eq "?" || $file eq ":0") {
$file = "??:0";
}
if ($fn eq '??') {
# If we can't get the symbol name, at least use the file information.
$fn = $file;
}
my $suffix = "[inline]";
if ($i == 2) {
$suffix = "";
}
push (@stack, $fn.$suffix);
}
}
else {
# If we can't get a symbol name, at least fill in the address.
push (@stack, $address);
}
return @stack;
}
sub ExtractSymbolLocation {
my $symbols = shift;
my $address = shift;
@ -2883,6 +2934,17 @@ sub FilterFrames {
return $result;
}
sub PrintCollapsedStacks {
my $symbols = shift;
my $profile = shift;
while (my ($stack_trace, $count) = each %$profile) {
my @address = split(/\n/, $stack_trace);
my @names = reverse ( map { ExtractSymbolNameInlineStack($symbols, $_) } @address );
printf("%s %d\n", join(";", @names), $count);
}
}
sub RemoveUninterestingFrames {
my $symbols = shift;
my $profile = shift;
@ -2891,21 +2953,46 @@ sub RemoveUninterestingFrames {
my %skip = ();
my $skip_regexp = 'NOMATCH';
if ($main::profile_type eq 'heap' || $main::profile_type eq 'growth') {
foreach my $name ('calloc',
foreach my $name ('@JEMALLOC_PREFIX@calloc',
'cfree',
'malloc',
'free',
'memalign',
'posix_memalign',
'aligned_alloc',
'@JEMALLOC_PREFIX@malloc',
'je_malloc_default',
'newImpl',
'void* newImpl',
'fallbackNewImpl',
'void* fallbackNewImpl',
'fallback_impl',
'void* fallback_impl',
'imalloc',
'int imalloc',
'imalloc_body',
'int imalloc_body',
'prof_alloc_prep',
'prof_tctx_t *prof_alloc_prep',
'prof_backtrace_impl',
'void prof_backtrace_impl',
'je_prof_backtrace',
'void je_prof_backtrace',
'je_prof_tctx_create',
'prof_tctx_t* prof_tctx_create',
'@JEMALLOC_PREFIX@free',
'@JEMALLOC_PREFIX@memalign',
'@JEMALLOC_PREFIX@posix_memalign',
'@JEMALLOC_PREFIX@aligned_alloc',
'pvalloc',
'valloc',
'realloc',
'mallocx', # jemalloc
'rallocx', # jemalloc
'xallocx', # jemalloc
'dallocx', # jemalloc
'sdallocx', # jemalloc
'@JEMALLOC_PREFIX@valloc',
'@JEMALLOC_PREFIX@realloc',
'@JEMALLOC_PREFIX@mallocx',
'irallocx_prof',
'void *irallocx_prof',
'@JEMALLOC_PREFIX@rallocx',
'do_rallocx',
'ixallocx_prof',
'size_t ixallocx_prof',
'@JEMALLOC_PREFIX@xallocx',
'@JEMALLOC_PREFIX@dallocx',
'@JEMALLOC_PREFIX@sdallocx',
'@JEMALLOC_PREFIX@sdallocx_noflags',
'tc_calloc',
'tc_cfree',
'tc_malloc',
@ -3014,6 +3101,8 @@ sub RemoveUninterestingFrames {
foreach my $a (@addrs) {
if (exists($symbols->{$a})) {
my $func = $symbols->{$a}->[0];
# Remove suffix in the symbols following space when filtering.
$func =~ s/ .*//;
if ($skip{$func} || ($func =~ m/$skip_regexp/)) {
# Throw away the portion of the backtrace seen so far, under the
# assumption that previous frames were for functions internal to the
@ -4436,16 +4525,54 @@ sub FindLibrary {
# For libc libraries, the copy in /usr/lib/debug contains debugging symbols
sub DebuggingLibrary {
my $file = shift;
if ($file =~ m|^/|) {
if (-f "/usr/lib/debug$file") {
return "/usr/lib/debug$file";
} elsif (-f "/usr/lib/debug$file.debug") {
return "/usr/lib/debug$file.debug";
}
if ($file !~ m|^/|) {
return undef;
}
# Find debug symbol file if it's named after the library's name.
if (-f "/usr/lib/debug$file") {
if($main::opt_debug) { print STDERR "found debug info for $file in /usr/lib/debug$file\n"; }
return "/usr/lib/debug$file";
} elsif (-f "/usr/lib/debug$file.debug") {
if($main::opt_debug) { print STDERR "found debug info for $file in /usr/lib/debug$file.debug\n"; }
return "/usr/lib/debug$file.debug";
}
if(!$main::opt_debug_syms_by_id) {
if($main::opt_debug) { print STDERR "no debug symbols found for $file\n" };
return undef;
}
# Find debug file if it's named after the library's build ID.
my $readelf = '';
if (!$main::gave_up_on_elfutils) {
$readelf = qx/eu-readelf -n ${file}/;
if ($?) {
print STDERR "Cannot run eu-readelf. To use --debug-syms-by-id you must be on Linux, with elfutils installed.\n";
$main::gave_up_on_elfutils = 1;
return undef;
}
my $buildID = $1 if $readelf =~ /Build ID: ([A-Fa-f0-9]+)/s;
if (defined $buildID && length $buildID > 0) {
my $symbolFile = '/usr/lib/debug/.build-id/' . substr($buildID, 0, 2) . '/' . substr($buildID, 2) . '.debug';
if (-e $symbolFile) {
if($main::opt_debug) { print STDERR "found debug symbol file $symbolFile for $file\n" };
return $symbolFile;
} else {
if($main::opt_debug) { print STDERR "no debug symbol file found for $file, build ID: $buildID\n" };
return undef;
}
}
}
if($main::opt_debug) { print STDERR "no debug symbols found for $file, build ID unknown\n" };
return undef;
}
# Parse text section header of a library using objdump
sub ParseTextSectionHeaderFromObjdump {
my $lib = shift;
@ -4555,7 +4682,65 @@ sub ParseTextSectionHeaderFromOtool {
return $r;
}
# Parse text section header of a library in OS X shared cache using dyld_info
sub ParseTextSectionHeaderFromDyldInfo {
my $lib = shift;
my $size = undef;
my $vma;
my $file_offset;
# Get dyld_info output from the library file to figure out how to
# map between mapped addresses and addresses in the library.
my $cmd = ShellEscape($obj_tool_map{"dyld_info"}, "-segments", $lib);
open(DYLD, "$cmd |") || error("$cmd: $!\n");
while (<DYLD>) {
s/\r//g; # turn windows-looking lines into unix-looking lines
# -segments:
# load-address segment section sect-size seg-size perm
# 0x1803E0000 __TEXT 112KB r.x
# 0x1803E4F34 __text 80960
# 0x1803F8B74 __auth_stubs 768
# 0x1803F8E74 __init_offsets 4
# 0x1803F8E78 __gcc_except_tab 1180
my @x = split;
if ($#x >= 2) {
if ($x[0] eq 'load-offset') {
# dyld_info should only be used for the shared lib.
return undef;
} elsif ($x[1] eq '__TEXT') {
$file_offset = $x[0];
} elsif ($x[1] eq '__text') {
$size = $x[2];
$vma = $x[0];
$file_offset = AddressSub($x[0], $file_offset);
last;
}
}
}
close(DYLD);
if (!defined($vma) || !defined($size) || !defined($file_offset)) {
return undef;
}
my $r = {};
$r->{size} = $size;
$r->{vma} = $vma;
$r->{file_offset} = $file_offset;
return $r;
}
sub ParseTextSectionHeader {
# obj_tool_map("dyld_info") is only defined if we're in a Mach-O environment
if (defined($obj_tool_map{"dyld_info"})) {
my $r = ParseTextSectionHeaderFromDyldInfo(@_);
if (defined($r)){
return $r;
}
}
# if dyld_info doesn't work, or we don't have it, fall back to otool
# obj_tool_map("otool") is only defined if we're in a Mach-O environment
if (defined($obj_tool_map{"otool"})) {
my $r = ParseTextSectionHeaderFromOtool(@_);
@ -4570,7 +4755,7 @@ sub ParseTextSectionHeader {
# Split /proc/pid/maps dump into a list of libraries
sub ParseLibraries {
return if $main::use_symbol_page; # We don't need libraries info.
my $prog = shift;
my $prog = Cwd::abs_path(shift);
my $map = shift;
my $pcs = shift;
@ -4596,13 +4781,32 @@ sub ParseLibraries {
$offset = HexExtend($3);
$lib = $4;
$lib =~ s|\\|/|g; # turn windows-style paths into unix-style paths
} elsif ($l =~ /^\s*($h)-($h):\s*(\S+\.so(\.\d+)*)/) {
} elsif ($l =~ /^\s*($h)-($h):\s*(\S+\.(so|dll|dylib|bundle)(\.\d+)*)/) {
# Cooked line from DumpAddressMap. Example:
# 40000000-40015000: /lib/ld-2.3.2.so
$start = HexExtend($1);
$finish = HexExtend($2);
$offset = $zero_offset;
$lib = $3;
} elsif (($l =~ /^($h)-($h)\s+..x.\s+($h)\s+\S+:\S+\s+\d+\s+(\S+)$/i) && ($4 eq $prog)) {
# PIEs and address space randomization do not play well with our
# default assumption that main executable is at lowest
# addresses. So we're detecting main executable in
# /proc/self/maps as well.
$start = HexExtend($1);
$finish = HexExtend($2);
$offset = HexExtend($3);
$lib = $4;
$lib =~ s|\\|/|g; # turn windows-style paths into unix-style paths
} elsif (($l =~ /^\s*($h)-($h):\s*(\S+)/) && ($3 eq $prog)) {
# PIEs and address space randomization do not play well with our
# default assumption that main executable is at lowest
# addresses. So we're detecting main executable from
# DumpAddressMap as well.
$start = HexExtend($1);
$finish = HexExtend($2);
$offset = $zero_offset;
$lib = $3;
}
# FreeBSD 10.0 virtual memory map /proc/curproc/map as defined in
# function procfs_doprocmap (sys/fs/procfs/procfs_map.c)
@ -4973,7 +5177,7 @@ sub MapToSymbols {
} else {
# MapSymbolsWithNM tags each routine with its starting address,
# useful in case the image has multiple occurrences of this
# routine. (It uses a syntax that resembles template paramters,
# routine. (It uses a syntax that resembles template parameters,
# that are automatically stripped out by ShortFunctionName().)
# addr2line does not provide the same information. So we check
# if nm disambiguated our symbol, and if so take the annotated
@ -5133,6 +5337,7 @@ sub ConfigureObjTools {
if ($file_type =~ /Mach-O/) {
# OS X uses otool to examine Mach-O files, rather than objdump.
$obj_tool_map{"otool"} = "otool";
$obj_tool_map{"dyld_info"} = "dyld_info";
$obj_tool_map{"addr2line"} = "false"; # no addr2line
$obj_tool_map{"objdump"} = "false"; # no objdump
}
@ -5325,7 +5530,7 @@ sub GetProcedureBoundaries {
# "nm -f $image" is supposed to fail on GNU nm, but if:
#
# a. $image starts with [BbSsPp] (for example, bin/foo/bar), AND
# b. you have a.out in your current directory (a not uncommon occurence)
# b. you have a.out in your current directory (a not uncommon occurrence)
#
# then "nm -f $image" succeeds because -f only looks at the first letter of
# the argument, which looks valid because it's [BbSsPp], and then since
@ -5353,7 +5558,7 @@ sub GetProcedureBoundaries {
my $demangle_flag = "";
my $cppfilt_flag = "";
my $to_devnull = ">$dev_null 2>&1";
if (system(ShellEscape($nm, "--demangle", "image") . $to_devnull) == 0) {
if (system(ShellEscape($nm, "--demangle", $image) . $to_devnull) == 0) {
# In this mode, we do "nm --demangle <foo>"
$demangle_flag = "--demangle";
$cppfilt_flag = "";

1784
build-aux/config.guess vendored

File diff suppressed because it is too large Load diff

3431
build-aux/config.sub vendored

File diff suppressed because it is too large Load diff

View file

@ -115,7 +115,7 @@ fi
if [ x"$dir_arg" != x ]; then
dst=$src
src=""
if [ -d $dst ]; then
instcmd=:
else
@ -124,7 +124,7 @@ if [ x"$dir_arg" != x ]; then
else
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.
if [ -f $src -o -d $src ]
@ -134,7 +134,7 @@ else
echo "install: $src does not exist"
exit 1
fi
if [ x"$dst" = x ]
then
echo "install: no destination specified"
@ -201,17 +201,17 @@ else
# If we're going to rename the final executable, determine the name now.
if [ x"$transformarg" = x ]
if [ x"$transformarg" = x ]
then
dstfile=`basename $dst`
else
dstfile=`basename $dst $transformbasename |
dstfile=`basename $dst $transformbasename |
sed $transformarg`$transformbasename
fi
# don't allow the sed command to completely eliminate the filename
if [ x"$dstfile" = x ]
if [ x"$dstfile" = x ]
then
dstfile=`basename $dst`
else
@ -242,7 +242,7 @@ else
# Now rename the file to the real destination.
$doit $rmcmd -f $dstdir/$dstfile &&
$doit $mvcmd $dsttmp $dstdir/$dstfile
$doit $mvcmd $dsttmp $dstdir/$dstfile
fi &&

File diff suppressed because it is too large Load diff

View file

@ -1,16 +0,0 @@
#!/bin/sh
set -e
objdir=$1
suffix=$2
shift 2
objs=$@
gcov -b -p -f -o "${objdir}" ${objs}
# Move gcov outputs so that subsequent gcov invocations won't clobber results
# for the same sources with different compilation flags.
for f in `find . -maxdepth 1 -type f -name '*.gcov'` ; do
mv "${f}" "${f}.${suffix}"
done

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,145 @@
# jemalloc profiling
This describes the mathematical basis behind jemalloc's profiling implementation, as well as the implementation tricks that make it effective. Historically, the jemalloc profiling design simply copied tcmalloc's. The implementation has since diverged, due to both the desire to record additional information, and to correct some biasing bugs.
Note: this document is markdown with embedded LaTeX; different markdown renderers may not produce the expected output. Viewing with `pandoc -s PROFILING_INTERNALS.md -o PROFILING_INTERNALS.pdf` is recommended.
## Some tricks in our implementation toolbag
### Sampling
Recording our metadata is quite expensive; we need to walk up the stack to get a stack trace. On top of that, we need to allocate storage to record that stack trace, and stick it somewhere where a profile-dumping call can find it. That call might happen on another thread, so we'll probably need to take a lock to do so. These costs are quite large compared to the average cost of an allocation. To manage this, we'll only sample some fraction of allocations. This will miss some of them, so our data will be incomplete, but we'll try to make up for it. We can tune our sampling rate to balance accuracy and performance.
### Fast Bernoulli sampling
Compared to our fast paths, even a `coinflip(p)` function can be quite expensive. Having to do a random-number generation and some floating point operations would be a sizeable relative cost. However (as pointed out in [[Vitter, 1987](https://dl.acm.org/doi/10.1145/23002.23003)]), if we can orchestrate our algorithm so that many of our `coinflip` calls share their parameter value, we can do better. We can sample from the geometric distribution, and initialize a counter with the result. When the counter hits 0, the `coinflip` function returns true (and reinitializes its internal counter).
This can let us do a random-number generation once per (logical) coinflip that comes up heads, rather than once per (logical) coinflip. Since we expect to sample relatively rarely, this can be a large win.
### Fast-path / slow-path thinking
Most programs have a skewed distribution of allocations. Smaller allocations are much more frequent than large ones, but shorter lived and less common as a fraction of program memory. "Small" and "large" are necessarily sort of fuzzy terms, but if we define "small" as "allocations jemalloc puts into slabs" and "large" as the others, then it's not uncommon for small allocations to be hundreds of times more frequent than large ones, but take up around half the amount of heap space as large ones. Moreover, small allocations tend to be much cheaper than large ones (often by a factor of 20-30): they're more likely to hit in thread caches, less likely to have to do an mmap, and cheaper to fill (by the user) once the allocation has been returned.
## An unbiased estimator of space consumption from (almost) arbitrary sampling strategies
Suppose we have a sampling strategy that meets the following criteria:
- One allocation being sampled is independent of other allocations being sampled.
- Each allocation has a non-zero probability of being sampled.
We can then estimate the bytes in live allocations through some particular stack trace as:
$$ \sum_i S_i I_i \frac{1}{\mathrm{E}[I_i]} $$
where the sum ranges over some index variable of live allocations from that stack, $S_i$ is the size of the $i$'th allocation, and $I_i$ is an indicator random variable for whether or not the $i'th$ allocation is sampled. $S_i$ and $\mathrm{E}[I_i]$ are constants (the program allocations are fixed; the random variables are the sampling decisions), so taking the expectation we get
$$ \sum_i S_i \mathrm{E}[I_i] \frac{1}{\mathrm{E}[I_i]}.$$
This is of course $\sum_i S_i$, as we want (and, a similar calculation could be done for allocation counts as well).
This is a fairly general strategy; note that while we require that sampling decisions be independent of one another's outcomes, they don't have to be independent of previous allocations, total bytes allocated, etc. You can imagine strategies that:
- Sample allocations at program startup at a higher rate than subsequent allocations
- Sample even-indexed allocations more frequently than odd-indexed ones (so long as no allocation has zero sampling probability)
- Let threads declare themselves as high-sampling-priority, and sample their allocations at an increased rate.
These can all be fit into this framework to give an unbiased estimator.
## Evaluating sampling strategies
Not all strategies for picking allocations to sample are equally good, of course. Among unbiased estimators, the lower the variance, the lower the mean squared error. Using the estimator above, the variance is:
$$
\begin{aligned}
& \mathrm{Var}[\sum_i S_i I_i \frac{1}{\mathrm{E}[I_i]}] \\
=& \sum_i \mathrm{Var}[S_i I_i \frac{1}{\mathrm{E}[I_i]}] \\
=& \sum_i \frac{S_i^2}{\mathrm{E}[I_i]^2} \mathrm{Var}[I_i] \\
=& \sum_i \frac{S_i^2}{\mathrm{E}[I_i]^2} \mathrm{Var}[I_i] \\
=& \sum_i \frac{S_i^2}{\mathrm{E}[I_i]^2} \mathrm{E}[I_i](1 - \mathrm{E}[I_i]) \\
=& \sum_i S_i^2 \frac{1 - \mathrm{E}[I_i]}{\mathrm{E}[I_i]}.
\end{aligned}
$$
We can use this formula to compare various strategy choices. All else being equal, lower-variance strategies are better.
## Possible sampling strategies
Because of the desire to avoid the fast-path costs, we'd like to use our Bernoulli trick if possible. There are two obvious counters to use: a coinflip per allocation, and a coinflip per byte allocated.
### Bernoulli sampling per-allocation
An obvious strategy is to pick some large $N$, and give each allocation a $1/N$ chance of being sampled. This would let us use our Bernoulli-via-Geometric trick. Using the formula from above, we can compute the variance as:
$$ \sum_i S_i^2 \frac{1 - \frac{1}{N}}{\frac{1}{N}} = (N-1) \sum_i S_i^2.$$
That is, an allocation of size $Z$ contributes a term of $(N-1)Z^2$ to the variance.
### Bernoulli sampling per-byte
Another option we have is to pick some rate $R$, and give each byte a $1/R$ chance of being picked for sampling (at which point we would sample its contained allocation). The chance of an allocation of size $Z$ being sampled, then, is
$$1-(1-\frac{1}{R})^{Z}$$
and an allocation of size $Z$ contributes a term of
$$Z^2 \frac{(1-\frac{1}{R})^{Z}}{1-(1-\frac{1}{R})^{Z}}.$$
In practical settings, $R$ is large, and so this is well-approximated by
$$Z^2 \frac{e^{-Z/R}}{1 - e^{-Z/R}} .$$
Just to get a sense of the dynamics here, let's look at the behavior for various values of $Z$. When $Z$ is small relative to $R$, we can use $e^z \approx 1 + x$, and conclude that the variance contributed by a small-$Z$ allocation is around
$$Z^2 \frac{1-Z/R}{Z/R} \approx RZ.$$
When $Z$ is comparable to $R$, the variance term is near $Z^2$ (we have $\frac{e^{-Z/R}}{1 - e^{-Z/R}} = 1$ when $Z/R = \ln 2 \approx 0.693$). When $Z$ is large relative to $R$, the variance term goes to zero.
## Picking a sampling strategy
The fast-path/slow-path dynamics of allocation patterns point us towards the per-byte sampling approach:
- The quadratic increase in variance per allocation in the first approach is quite costly when heaps have a non-negligible portion of their bytes in those allocations, which is practically often the case.
- The Bernoulli-per-byte approach shifts more of its samples towards large allocations, which are already a slow-path.
- We drive several tickers (e.g. tcache gc) by bytes allocated, and report bytes-allocated as a user-visible statistic, so we have to do all the necessary bookkeeping anyways.
Indeed, this is the approach we use in jemalloc. Our heap dumps record the size of the allocation and the sampling rate $R$, and jeprof unbiases by dividing by $1 - e^{-Z/R}$. The framework above would suggest dividing by $1-(1-1/R)^Z$; instead, we use the fact that $R$ is large in practical situations, and so $e^{-Z/R}$ is a good approximation (and faster to compute). (Equivalently, we may also see this as the factor that falls out from viewing sampling as a Poisson process directly).
## Consequences for heap dump consumers
Using this approach means that there are a few things users need to be aware of.
### Stack counts are not proportional to allocation frequencies
If one stack appears twice as often as another, this by itself does not imply that it allocates twice as often. Consider the case in which there are only two types of allocating call stacks in a program. Stack A allocates 8 bytes, and occurs a million times in a program. Stack B allocates 8 MB, and occurs just once in a program. If our sampling rate $R$ is about 1MB, we expect stack A to show up about 8 times, and stack B to show up once. Stack A isn't 8 times more frequent than stack B, though; it's a million times more frequent.
### Aggregation must be done after unbiasing samples
Some tools manually parse heap dump output, and aggregate across stacks (or across program runs) to provide wider-scale data analyses. When doing this aggregation, though, it's important to unbias-and-then-sum, rather than sum-and-then-unbias. Reusing our example from the previous section: suppose we collect heap dumps of the program from 1 million machines. We then have 8 million samples of stack A (8 per machine, each of 8 bytes), and 1 million samples of stack B (1 per machine, each of 8 MB).
If we sum first then unbias based on this formula: $1 - e^{-Z/R}$ we get:
$$Z = 8,000,000 * 8 bytes = 64MB$$
$$64MB / (1 - e^{-64MB/1MB}) \approx 64MB (Stack A)$$
$$Z = 1,000,000 * 8MB = 8TB$$
$$8TB / (1 - e^{-1TB/1MB}) \approx 8TB (Stack B)$$
Clearly we are unbiasing by an infinitesimal amount, which dramatically underreports the amount of memory allocated by stack A. Whereas if we unbias first and then sum:
$$Z = 8 bytes$$
$$8 bytes / (1 - e^{-8 bytes/1MB}) \approx 1MB$$
$$1MB * 8,000,000 = 8TB (Stack A)$$
$$Z = 8MB$$
$$8MB / (1 - e^{-8MB/1MB}) \approx 8MB$$
$$8MB * 1,000,000 = 8TB (Stack B)$$
## An avenue for future exploration
While the framework we laid out above is pretty general, as an engineering decision we're only interested in fairly simple approaches (i.e. ones for which the chance of an allocation being sampled depends only on its size). Our job is then: for each size class $Z$, pick a probability $p_Z$ that an allocation of that size will be sampled. We made some handwave-y references to statistical distributions to justify our choices, but there's no reason we need to pick them that way. Any set of non-zero probabilities is a valid choice.
The real limiting factor in our ability to reduce estimator variance is that fact that sampling is expensive; we want to make sure we only do it on a small fraction of allocations. Our goal, then, is to pick the $p_Z$ to minimize variance given some maximum sampling rate $P$. If we define $a_Z$ to be the fraction of allocations of size $Z$, and $l_Z$ to be the fraction of allocations of size $Z$ still alive at the time of a heap dump, then we can phrase this as an optimization problem over the choices of $p_Z$:
Minimize
$$ \sum_Z Z^2 l_Z \frac{1-p_Z}{p_Z} $$
subject to
$$ \sum_Z a_Z p_Z \leq P $$
Ignoring a term that doesn't depend on $p_Z$, the objective is minimized whenever
$$ \sum_Z Z^2 l_Z \frac{1}{p_Z} $$
is. For a particular program, $l_Z$ and $a_Z$ are just numbers that can be obtained (exactly) from existing stats introspection facilities, and we have a fairly tractable convex optimization problem (it can be framed as a second-order cone program). It would be interesting to evaluate, for various common allocation patterns, how well our current strategy adapts. Do our actual choices for $p_Z$ closely correspond to the optimal ones? How close is the variance of our choices to the variance of the optimal strategy?
You can imagine an implementation that actually goes all the way, and makes $p_Z$ selections a tuning parameter. I don't think this is a good use of development time for the foreseeable future; but I do wonder about the answers to some of these questions.
## Implementation realities
The nice story above is at least partially a lie. Initially, jeprof (copying its logic from pprof) had the sum-then-unbias error described above. The current version of jemalloc does the unbiasing step on a per-allocation basis internally, so that we're always tracking what the unbiased numbers "should" be. The problem is, actually surfacing those unbiased numbers would require a breaking change to jeprof (and the various already-deployed tools that have copied its logic). Instead, we use a little bit more trickery. Since we know at dump time the numbers we want jeprof to report, we simply choose the values we'll output so that the jeprof numbers will match the true numbers. The math is described in `src/prof_data.c` (where the only cleverness is a change of variables that lets the exponentials fall out).
This has the effect of making the output of jeprof (and related tools) correct, while making its inputs incorrect. This can be annoying to human readers of raw profiling dump output.

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 16 KiB

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,125 @@
#ifndef JEMALLOC_INTERNAL_ARENA_EXTERNS_H
#define JEMALLOC_INTERNAL_ARENA_EXTERNS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/extent_dss.h"
#include "jemalloc/internal/hook.h"
#include "jemalloc/internal/pages.h"
#include "jemalloc/internal/stats.h"
/*
* When the amount of pages to be purged exceeds this amount, deferred purge
* should happen.
*/
#define ARENA_DEFERRED_PURGE_NPAGES_THRESHOLD UINT64_C(1024)
extern ssize_t opt_dirty_decay_ms;
extern ssize_t opt_muzzy_decay_ms;
extern percpu_arena_mode_t opt_percpu_arena;
extern const char *const percpu_arena_mode_names[];
extern div_info_t arena_binind_div_info[SC_NBINS];
extern emap_t arena_emap_global;
extern size_t opt_oversize_threshold;
extern size_t oversize_threshold;
extern bool opt_huge_arena_pac_thp;
extern pac_thp_t huge_arena_pac_thp;
/*
* arena_bin_offsets[binind] is the offset of the first bin shard for size class
* binind.
*/
extern uint32_t arena_bin_offsets[SC_NBINS];
void arena_basic_stats_merge(tsdn_t *tsdn, arena_t *arena, unsigned *nthreads,
const char **dss, ssize_t *dirty_decay_ms, ssize_t *muzzy_decay_ms,
size_t *nactive, size_t *ndirty, size_t *nmuzzy);
void arena_stats_merge(tsdn_t *tsdn, arena_t *arena, unsigned *nthreads,
const char **dss, ssize_t *dirty_decay_ms, ssize_t *muzzy_decay_ms,
size_t *nactive, size_t *ndirty, size_t *nmuzzy, arena_stats_t *astats,
bin_stats_data_t *bstats, arena_stats_large_t *lstats, pac_estats_t *estats,
hpa_shard_stats_t *hpastats);
void arena_handle_deferred_work(tsdn_t *tsdn, arena_t *arena);
edata_t *arena_extent_alloc_large(
tsdn_t *tsdn, arena_t *arena, size_t usize, size_t alignment, bool zero);
void arena_extent_dalloc_large_prep(
tsdn_t *tsdn, arena_t *arena, edata_t *edata);
void arena_extent_ralloc_large_shrink(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t oldusize);
void arena_extent_ralloc_large_expand(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t oldusize);
bool arena_decay_ms_set(
tsdn_t *tsdn, arena_t *arena, extent_state_t state, ssize_t decay_ms);
ssize_t arena_decay_ms_get(arena_t *arena, extent_state_t state);
void arena_decay(
tsdn_t *tsdn, arena_t *arena, bool is_background_thread, bool all);
uint64_t arena_time_until_deferred(tsdn_t *tsdn, arena_t *arena);
void arena_do_deferred_work(tsdn_t *tsdn, arena_t *arena);
void arena_reset(tsd_t *tsd, arena_t *arena);
void arena_destroy(tsd_t *tsd, arena_t *arena);
cache_bin_sz_t arena_ptr_array_fill_small(tsdn_t *tsdn, arena_t *arena,
szind_t binind, cache_bin_ptr_array_t *arr, const cache_bin_sz_t nfill_min,
const cache_bin_sz_t nfill_max, cache_bin_stats_t merge_stats);
void *arena_malloc_hard(tsdn_t *tsdn, arena_t *arena, size_t size, szind_t ind,
bool zero, bool slab);
void *arena_palloc(tsdn_t *tsdn, arena_t *arena, size_t usize, size_t alignment,
bool zero, bool slab, tcache_t *tcache);
void arena_prof_promote(
tsdn_t *tsdn, void *ptr, size_t usize, size_t bumped_usize);
void arena_dalloc_promoted(
tsdn_t *tsdn, void *ptr, tcache_t *tcache, bool slow_path);
void arena_slab_dalloc(tsdn_t *tsdn, arena_t *arena, edata_t *slab);
void arena_dalloc_small(tsdn_t *tsdn, void *ptr);
void arena_ptr_array_flush(tsd_t *tsd, szind_t binind,
cache_bin_ptr_array_t *arr, unsigned nflush, bool small,
arena_t *stats_arena, cache_bin_stats_t merge_stats);
bool arena_ralloc_no_move(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t extra, bool zero, size_t *newsize);
void *arena_ralloc(tsdn_t *tsdn, arena_t *arena, void *ptr, size_t oldsize,
size_t size, size_t alignment, bool zero, bool slab, tcache_t *tcache,
hook_ralloc_args_t *hook_args);
dss_prec_t arena_dss_prec_get(arena_t *arena);
ehooks_t *arena_get_ehooks(arena_t *arena);
extent_hooks_t *arena_set_extent_hooks(
tsd_t *tsd, arena_t *arena, extent_hooks_t *extent_hooks);
bool arena_dss_prec_set(arena_t *arena, dss_prec_t dss_prec);
void arena_name_get(arena_t *arena, char *name);
void arena_name_set(arena_t *arena, const char *name);
ssize_t arena_dirty_decay_ms_default_get(void);
bool arena_dirty_decay_ms_default_set(ssize_t decay_ms);
ssize_t arena_muzzy_decay_ms_default_get(void);
bool arena_muzzy_decay_ms_default_set(ssize_t decay_ms);
bool arena_retain_grow_limit_get_set(
tsd_t *tsd, arena_t *arena, size_t *old_limit, size_t *new_limit);
unsigned arena_nthreads_get(arena_t *arena, bool internal);
void arena_nthreads_inc(arena_t *arena, bool internal);
void arena_nthreads_dec(arena_t *arena, bool internal);
arena_t *arena_new(tsdn_t *tsdn, unsigned ind, const arena_config_t *config);
bool arena_init_huge(tsdn_t *tsdn, arena_t *a0);
arena_t *arena_choose_huge(tsd_t *tsd);
size_t arena_fill_small_fresh(tsdn_t *tsdn, arena_t *arena, szind_t binind,
void **ptrs, size_t nfill, bool zero);
bool arena_boot(sc_data_t *sc_data, base_t *base, bool hpa);
void arena_prefork0(tsdn_t *tsdn, arena_t *arena);
void arena_prefork1(tsdn_t *tsdn, arena_t *arena);
void arena_prefork2(tsdn_t *tsdn, arena_t *arena);
void arena_prefork3(tsdn_t *tsdn, arena_t *arena);
void arena_prefork4(tsdn_t *tsdn, arena_t *arena);
void arena_prefork5(tsdn_t *tsdn, arena_t *arena);
void arena_prefork6(tsdn_t *tsdn, arena_t *arena);
void arena_prefork7(tsdn_t *tsdn, arena_t *arena);
void arena_prefork8(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_parent(tsdn_t *tsdn, arena_t *arena);
void arena_postfork_child(tsdn_t *tsdn, arena_t *arena);
#endif /* JEMALLOC_INTERNAL_ARENA_EXTERNS_H */

View file

@ -0,0 +1,27 @@
#ifndef JEMALLOC_INTERNAL_ARENA_INLINES_A_H
#define JEMALLOC_INTERNAL_ARENA_INLINES_A_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_structs.h"
static inline unsigned
arena_ind_get(const arena_t *arena) {
return arena->ind;
}
static inline void
arena_internal_add(arena_t *arena, size_t size) {
atomic_fetch_add_zu(&arena->stats.internal, size, ATOMIC_RELAXED);
}
static inline void
arena_internal_sub(arena_t *arena, size_t size) {
atomic_fetch_sub_zu(&arena->stats.internal, size, ATOMIC_RELAXED);
}
static inline size_t
arena_internal_get(arena_t *arena) {
return atomic_load_zu(&arena->stats.internal, ATOMIC_RELAXED);
}
#endif /* JEMALLOC_INTERNAL_ARENA_INLINES_A_H */

View file

@ -0,0 +1,538 @@
#ifndef JEMALLOC_INTERNAL_ARENA_INLINES_B_H
#define JEMALLOC_INTERNAL_ARENA_INLINES_B_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_structs.h"
#include "jemalloc/internal/bin_inlines.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/jemalloc_internal_inlines_b.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/large_externs.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/prof_externs.h"
#include "jemalloc/internal/prof_structs.h"
#include "jemalloc/internal/rtree.h"
#include "jemalloc/internal/safety_check.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/tcache_inlines.h"
#include "jemalloc/internal/ticker.h"
static inline arena_t *
arena_get_from_edata(edata_t *edata) {
return (arena_t *)atomic_load_p(
&arenas[edata_arena_ind_get(edata)], ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE arena_t *
arena_choose_maybe_huge(tsd_t *tsd, arena_t *arena, size_t size) {
if (arena != NULL) {
return arena;
}
/*
* For huge allocations, use the dedicated huge arena if both are true:
* 1) is using auto arena selection (i.e. arena == NULL), and 2) the
* thread is not assigned to a manual arena.
*/
arena_t *tsd_arena = tsd_arena_get(tsd);
if (tsd_arena == NULL) {
tsd_arena = arena_choose(tsd, NULL);
}
size_t threshold = atomic_load_zu(
&tsd_arena->pa_shard.pac.oversize_threshold, ATOMIC_RELAXED);
if (unlikely(size >= threshold) && arena_is_auto(tsd_arena)) {
return arena_choose_huge(tsd);
}
return tsd_arena;
}
JEMALLOC_ALWAYS_INLINE bool
large_dalloc_safety_checks(edata_t *edata, const void *ptr, size_t input_size) {
if (!config_opt_safety_checks) {
return false;
}
/*
* Eagerly detect double free and sized dealloc bugs for large sizes.
* The cost is low enough (as edata will be accessed anyway) to be
* enabled all the time.
*/
if (unlikely(edata == NULL
|| edata_state_get(edata) != extent_state_active)) {
safety_check_fail(
"Invalid deallocation detected: "
"pages being freed (%p) not currently active, "
"possibly caused by double free bugs.",
ptr);
return true;
}
if (unlikely(input_size != edata_usize_get(edata)
|| input_size > SC_LARGE_MAXCLASS)) {
safety_check_fail_sized_dealloc(/* current_dealloc */ true, ptr,
/* true_size */ edata_usize_get(edata), input_size);
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_info_get(tsd_t *tsd, const void *ptr, emap_alloc_ctx_t *alloc_ctx,
prof_info_t *prof_info, bool reset_recent) {
cassert(config_prof);
assert(ptr != NULL);
assert(prof_info != NULL);
edata_t *edata = NULL;
bool is_slab;
/* Static check. */
if (alloc_ctx == NULL) {
edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
is_slab = edata_slab_get(edata);
} else if (unlikely(!(is_slab = alloc_ctx->slab))) {
edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
}
if (unlikely(!is_slab)) {
/* edata must have been initialized at this point. */
assert(edata != NULL);
size_t usize = (alloc_ctx == NULL)
? edata_usize_get(edata)
: emap_alloc_ctx_usize_get(alloc_ctx);
if (reset_recent
&& large_dalloc_safety_checks(edata, ptr, usize)) {
prof_info->alloc_tctx = PROF_TCTX_SENTINEL;
return;
}
large_prof_info_get(tsd, edata, prof_info, reset_recent);
} else {
prof_info->alloc_tctx = PROF_TCTX_SENTINEL;
/*
* No need to set other fields in prof_info; they will never be
* accessed if alloc_tctx == PROF_TCTX_SENTINEL.
*/
}
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_tctx_reset(
tsd_t *tsd, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
cassert(config_prof);
assert(ptr != NULL);
/* Static check. */
if (alloc_ctx == NULL) {
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
if (unlikely(!edata_slab_get(edata))) {
large_prof_tctx_reset(edata);
}
} else {
if (unlikely(!alloc_ctx->slab)) {
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
large_prof_tctx_reset(edata);
}
}
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_tctx_reset_sampled(tsd_t *tsd, const void *ptr) {
cassert(config_prof);
assert(ptr != NULL);
edata_t *edata = emap_edata_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr);
assert(!edata_slab_get(edata));
large_prof_tctx_reset(edata);
}
JEMALLOC_ALWAYS_INLINE void
arena_prof_info_set(
tsd_t *tsd, edata_t *edata, prof_tctx_t *tctx, size_t size) {
cassert(config_prof);
assert(!edata_slab_get(edata));
large_prof_info_set(edata, tctx, size);
}
JEMALLOC_ALWAYS_INLINE void
arena_decay_ticks(tsdn_t *tsdn, arena_t *arena, unsigned nticks) {
if (unlikely(tsdn_null(tsdn))) {
return;
}
tsd_t *tsd = tsdn_tsd(tsdn);
/*
* We use the ticker_geom_t to avoid having per-arena state in the tsd.
* Instead of having a countdown-until-decay timer running for every
* arena in every thread, we flip a coin once per tick, whose
* probability of coming up heads is 1/nticks; this is effectively the
* operation of the ticker_geom_t. Each arena has the same chance of a
* coinflip coming up heads (1/ARENA_DECAY_NTICKS_PER_UPDATE), so we can
* use a single ticker for all of them.
*/
ticker_geom_t *decay_ticker = tsd_arena_decay_tickerp_get(tsd);
uint64_t *prng_state = tsd_prng_statep_get(tsd);
if (unlikely(ticker_geom_ticks(decay_ticker, prng_state, nticks,
tsd_reentrancy_level_get(tsd) > 0))) {
arena_decay(tsdn, arena, false, false);
}
}
JEMALLOC_ALWAYS_INLINE void
arena_decay_tick(tsdn_t *tsdn, arena_t *arena) {
arena_decay_ticks(tsdn, arena, 1);
}
JEMALLOC_ALWAYS_INLINE void *
arena_malloc(tsdn_t *tsdn, arena_t *arena, size_t size, szind_t ind, bool zero,
bool slab, tcache_t *tcache, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
if (likely(tcache != NULL)) {
if (likely(slab)) {
assert(sz_can_use_slab(size));
return tcache_alloc_small(tsdn_tsd(tsdn), arena, tcache,
size, ind, zero, slow_path);
} else if (likely(ind < tcache_nbins_get(tcache->tcache_slow)
&& !tcache_bin_disabled(ind, &tcache->bins[ind],
tcache->tcache_slow))) {
return tcache_alloc_large(tsdn_tsd(tsdn), arena, tcache,
size, ind, zero, slow_path);
}
/* (size > tcache_max) case falls through. */
}
return arena_malloc_hard(tsdn, arena, size, ind, zero, slab);
}
JEMALLOC_ALWAYS_INLINE arena_t *
arena_aalloc(tsdn_t *tsdn, const void *ptr) {
edata_t *edata = emap_edata_lookup(tsdn, &arena_emap_global, ptr);
unsigned arena_ind = edata_arena_ind_get(edata);
return (arena_t *)atomic_load_p(&arenas[arena_ind], ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE size_t
arena_salloc(tsdn_t *tsdn, const void *ptr) {
assert(ptr != NULL);
emap_alloc_ctx_t alloc_ctx;
emap_alloc_ctx_lookup(tsdn, &arena_emap_global, ptr, &alloc_ctx);
assert(alloc_ctx.szind != SC_NSIZES);
return emap_alloc_ctx_usize_get(&alloc_ctx);
}
JEMALLOC_ALWAYS_INLINE size_t
arena_vsalloc(tsdn_t *tsdn, const void *ptr) {
/*
* Return 0 if ptr is not within an extent managed by jemalloc. This
* function has two extra costs relative to isalloc():
* - The rtree calls cannot claim to be dependent lookups, which induces
* rtree lookup load dependencies.
* - The lookup may fail, so there is an extra branch to check for
* failure.
*/
emap_full_alloc_ctx_t full_alloc_ctx;
bool missing = emap_full_alloc_ctx_try_lookup(
tsdn, &arena_emap_global, ptr, &full_alloc_ctx);
if (missing) {
return 0;
}
if (full_alloc_ctx.edata == NULL) {
return 0;
}
assert(edata_state_get(full_alloc_ctx.edata) == extent_state_active);
/* Only slab members should be looked up via interior pointers. */
assert(edata_addr_get(full_alloc_ctx.edata) == ptr
|| edata_slab_get(full_alloc_ctx.edata));
assert(full_alloc_ctx.szind != SC_NSIZES);
return edata_usize_get(full_alloc_ctx.edata);
}
static inline void
arena_dalloc_large_no_tcache(
tsdn_t *tsdn, void *ptr, szind_t szind, size_t usize) {
/*
* szind is still needed in this function mainly becuase
* szind < SC_NBINS determines not only if this is a small alloc,
* but also if szind is valid (an inactive extent would have
* szind == SC_NSIZES).
*/
if (config_prof && unlikely(szind < SC_NBINS)) {
arena_dalloc_promoted(tsdn, ptr, NULL, true);
} else {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
if (large_dalloc_safety_checks(edata, ptr, usize)) {
/* See the comment in isfree. */
return;
}
large_dalloc(tsdn, edata);
}
}
static inline void
arena_dalloc_no_tcache(tsdn_t *tsdn, void *ptr) {
assert(ptr != NULL);
emap_alloc_ctx_t alloc_ctx;
emap_alloc_ctx_lookup(tsdn, &arena_emap_global, ptr, &alloc_ctx);
if (config_debug) {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.szind < SC_NSIZES);
assert(alloc_ctx.slab == edata_slab_get(edata));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
arena_dalloc_small(tsdn, ptr);
} else {
arena_dalloc_large_no_tcache(tsdn, ptr, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx));
}
}
JEMALLOC_ALWAYS_INLINE void
arena_dalloc_large(tsdn_t *tsdn, void *ptr, tcache_t *tcache, szind_t szind,
size_t usize, bool slow_path) {
assert(!tsdn_null(tsdn) && tcache != NULL);
bool is_sample_promoted = config_prof && szind < SC_NBINS;
if (unlikely(is_sample_promoted)) {
arena_dalloc_promoted(tsdn, ptr, tcache, slow_path);
} else {
if (szind < tcache_nbins_get(tcache->tcache_slow)
&& !tcache_bin_disabled(
szind, &tcache->bins[szind], tcache->tcache_slow)) {
tcache_dalloc_large(
tsdn_tsd(tsdn), tcache, ptr, szind, slow_path);
} else {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
if (large_dalloc_safety_checks(edata, ptr, usize)) {
/* See the comment in isfree. */
return;
}
large_dalloc(tsdn, edata);
}
}
}
JEMALLOC_ALWAYS_INLINE bool
arena_tcache_dalloc_small_safety_check(tsdn_t *tsdn, void *ptr) {
if (!config_debug) {
return false;
}
edata_t *edata = emap_edata_lookup(tsdn, &arena_emap_global, ptr);
szind_t binind = edata_szind_get(edata);
div_info_t div_info = arena_binind_div_info[binind];
/*
* Calls the internal function bin_slab_regind_impl because the
* safety check does not require a lock.
*/
size_t regind = bin_slab_regind_impl(&div_info, binind, edata, ptr);
slab_data_t *slab_data = edata_slab_data_get(edata);
const bin_info_t *bin_info = &bin_infos[binind];
assert(edata_nfree_get(edata) < bin_info->nregs);
if (unlikely(!bitmap_get(
slab_data->bitmap, &bin_info->bitmap_info, regind))) {
safety_check_fail(
"Invalid deallocation detected: the pointer being freed (%p) not "
"currently active, possibly caused by double free bugs.\n",
ptr);
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
arena_dalloc(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
emap_alloc_ctx_t *caller_alloc_ctx, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
assert(ptr != NULL);
if (unlikely(tcache == NULL)) {
arena_dalloc_no_tcache(tsdn, ptr);
return;
}
emap_alloc_ctx_t alloc_ctx;
if (caller_alloc_ctx != NULL) {
alloc_ctx = *caller_alloc_ctx;
} else {
util_assume(tsdn != NULL);
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
}
if (config_debug) {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.szind < SC_NSIZES);
assert(alloc_ctx.slab == edata_slab_get(edata));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
if (arena_tcache_dalloc_small_safety_check(tsdn, ptr)) {
return;
}
tcache_dalloc_small(
tsdn_tsd(tsdn), tcache, ptr, alloc_ctx.szind, slow_path);
} else {
arena_dalloc_large(tsdn, ptr, tcache, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx), slow_path);
}
}
static inline void
arena_sdalloc_no_tcache(tsdn_t *tsdn, void *ptr, size_t size) {
assert(ptr != NULL);
assert(size <= SC_LARGE_MAXCLASS);
emap_alloc_ctx_t alloc_ctx;
if (!config_prof || !opt_prof) {
/*
* There is no risk of being confused by a promoted sampled
* object, so base szind and slab on the given size.
*/
szind_t szind = sz_size2index(size);
emap_alloc_ctx_init(
&alloc_ctx, szind, (szind < SC_NBINS), size);
}
if ((config_prof && opt_prof) || config_debug) {
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
assert(alloc_ctx.szind == sz_size2index(size));
assert((config_prof && opt_prof)
|| alloc_ctx.slab == (alloc_ctx.szind < SC_NBINS));
if (config_debug) {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.slab == edata_slab_get(edata));
}
}
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
arena_dalloc_small(tsdn, ptr);
} else {
arena_dalloc_large_no_tcache(tsdn, ptr, alloc_ctx.szind,
emap_alloc_ctx_usize_get(&alloc_ctx));
}
}
JEMALLOC_ALWAYS_INLINE void
arena_sdalloc(tsdn_t *tsdn, void *ptr, size_t size, tcache_t *tcache,
emap_alloc_ctx_t *caller_alloc_ctx, bool slow_path) {
assert(!tsdn_null(tsdn) || tcache == NULL);
assert(ptr != NULL);
assert(size <= SC_LARGE_MAXCLASS);
if (unlikely(tcache == NULL)) {
arena_sdalloc_no_tcache(tsdn, ptr, size);
return;
}
emap_alloc_ctx_t alloc_ctx;
if (config_prof && opt_prof) {
if (caller_alloc_ctx == NULL) {
/* Uncommon case and should be a static check. */
emap_alloc_ctx_lookup(
tsdn, &arena_emap_global, ptr, &alloc_ctx);
assert(alloc_ctx.szind == sz_size2index(size));
assert(emap_alloc_ctx_usize_get(&alloc_ctx) == size);
} else {
alloc_ctx = *caller_alloc_ctx;
}
} else {
/*
* There is no risk of being confused by a promoted sampled
* object, so base szind and slab on the given size.
*/
alloc_ctx.szind = sz_size2index(size);
alloc_ctx.slab = (alloc_ctx.szind < SC_NBINS);
}
if (config_debug) {
edata_t *edata = emap_edata_lookup(
tsdn, &arena_emap_global, ptr);
assert(alloc_ctx.szind == edata_szind_get(edata));
assert(alloc_ctx.slab == edata_slab_get(edata));
emap_alloc_ctx_init(
&alloc_ctx, alloc_ctx.szind, alloc_ctx.slab, sz_s2u(size));
assert(emap_alloc_ctx_usize_get(&alloc_ctx)
== edata_usize_get(edata));
}
if (likely(alloc_ctx.slab)) {
/* Small allocation. */
if (arena_tcache_dalloc_small_safety_check(tsdn, ptr)) {
return;
}
tcache_dalloc_small(
tsdn_tsd(tsdn), tcache, ptr, alloc_ctx.szind, slow_path);
} else {
arena_dalloc_large(tsdn, ptr, tcache, alloc_ctx.szind,
sz_s2u(size), slow_path);
}
}
static inline void
arena_cache_oblivious_randomize(
tsdn_t *tsdn, arena_t *arena, edata_t *edata, size_t alignment) {
assert(edata_base_get(edata) == edata_addr_get(edata));
if (alignment < PAGE) {
unsigned lg_range = LG_PAGE
- lg_floor(CACHELINE_CEILING(alignment));
size_t r;
if (!tsdn_null(tsdn)) {
tsd_t *tsd = tsdn_tsd(tsdn);
r = (size_t)prng_lg_range_u64(
tsd_prng_statep_get(tsd), lg_range);
} else {
uint64_t stack_value = (uint64_t)(uintptr_t)&r;
r = (size_t)prng_lg_range_u64(&stack_value, lg_range);
}
uintptr_t random_offset = ((uintptr_t)r)
<< (LG_PAGE - lg_range);
edata->e_addr = (void *)((byte_t *)edata->e_addr
+ random_offset);
assert(ALIGNMENT_ADDR2BASE(edata->e_addr, alignment)
== edata->e_addr);
}
}
static inline bin_t *
arena_get_bin(arena_t *arena, szind_t binind, unsigned binshard) {
bin_t *shard0 = (bin_t *)((byte_t *)arena + arena_bin_offsets[binind]);
return shard0 + binshard;
}
#endif /* JEMALLOC_INTERNAL_ARENA_INLINES_B_H */

View file

@ -0,0 +1,123 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STATS_H
#define JEMALLOC_INTERNAL_ARENA_STATS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/lockedint.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/mutex_prof.h"
#include "jemalloc/internal/pa.h"
#include "jemalloc/internal/sc.h"
JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
typedef struct arena_stats_large_s arena_stats_large_t;
struct arena_stats_large_s {
/*
* Total number of large allocation/deallocation requests served directly
* by the arena.
*/
locked_u64_t nmalloc;
locked_u64_t ndalloc;
/*
* Total large active bytes (allocated - deallocated) served directly
* by the arena.
*/
locked_u64_t active_bytes;
/*
* Number of allocation requests that correspond to this size class.
* This includes requests served by tcache, though tcache only
* periodically merges into this counter.
*/
locked_u64_t nrequests; /* Partially derived. */
/*
* Number of tcache fills / flushes for large (similarly, periodically
* merged). Note that there is no large tcache batch-fill currently
* (i.e. only fill 1 at a time); however flush may be batched.
*/
locked_u64_t nfills; /* Partially derived. */
locked_u64_t nflushes; /* Partially derived. */
/* Current number of allocations of this size class. */
size_t curlextents; /* Derived. */
};
/*
* Arena stats. Note that fields marked "derived" are not directly maintained
* within the arena code; rather their values are derived during stats merge
* requests.
*/
typedef struct arena_stats_s arena_stats_t;
struct arena_stats_s {
LOCKEDINT_MTX_DECLARE(mtx)
/*
* resident includes the base stats -- that's why it lives here and not
* in pa_shard_stats_t.
*/
size_t base; /* Derived. */
size_t metadata_edata; /* Derived. */
size_t metadata_rtree; /* Derived. */
size_t resident; /* Derived. */
size_t metadata_thp; /* Derived. */
size_t mapped; /* Derived. */
atomic_zu_t internal;
size_t allocated_large; /* Derived. */
uint64_t nmalloc_large; /* Derived. */
uint64_t ndalloc_large; /* Derived. */
uint64_t nfills_large; /* Derived. */
uint64_t nflushes_large; /* Derived. */
uint64_t nrequests_large; /* Derived. */
/*
* The stats logically owned by the pa_shard in the same arena. This
* lives here only because it's convenient for the purposes of the ctl
* module -- it only knows about the single arena_stats.
*/
pa_shard_stats_t pa_shard_stats;
/* Number of bytes cached in tcache associated with this arena. */
size_t tcache_bytes; /* Derived. */
size_t tcache_stashed_bytes; /* Derived. */
mutex_prof_data_t mutex_prof_data[mutex_prof_num_arena_mutexes];
/* One element for each large size class. */
arena_stats_large_t lstats[SC_NSIZES - SC_NBINS];
/* Arena uptime. */
nstime_t uptime;
};
static inline bool
arena_stats_init(tsdn_t *tsdn, arena_stats_t *arena_stats) {
if (config_debug) {
for (size_t i = 0; i < sizeof(arena_stats_t); i++) {
assert(((char *)arena_stats)[i] == 0);
}
}
if (LOCKEDINT_MTX_INIT(arena_stats->mtx, "arena_stats",
WITNESS_RANK_ARENA_STATS, malloc_mutex_rank_exclusive)) {
return true;
}
/* Memory is zeroed, so there is no need to clear stats. */
return false;
}
static inline void
arena_stats_large_flush_nrequests_add(tsdn_t *tsdn, arena_stats_t *arena_stats,
szind_t szind, uint64_t nrequests) {
LOCKEDINT_MTX_LOCK(tsdn, arena_stats->mtx);
arena_stats_large_t *lstats = &arena_stats->lstats[szind - SC_NBINS];
locked_inc_u64(tsdn, LOCKEDINT_MTX(arena_stats->mtx),
&lstats->nrequests, nrequests);
locked_inc_u64(
tsdn, LOCKEDINT_MTX(arena_stats->mtx), &lstats->nflushes, 1);
LOCKEDINT_MTX_UNLOCK(tsdn, arena_stats->mtx);
}
#endif /* JEMALLOC_INTERNAL_ARENA_STATS_H */

View file

@ -0,0 +1,111 @@
#ifndef JEMALLOC_INTERNAL_ARENA_STRUCTS_H
#define JEMALLOC_INTERNAL_ARENA_STRUCTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/counter.h"
#include "jemalloc/internal/ecache.h"
#include "jemalloc/internal/edata_cache.h"
#include "jemalloc/internal/extent_dss.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/pa.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/ticker.h"
struct arena_s {
/*
* Number of threads currently assigned to this arena. Each thread has
* two distinct assignments, one for application-serving allocation, and
* the other for internal metadata allocation. Internal metadata must
* not be allocated from arenas explicitly created via the arenas.create
* mallctl, because the arena.<i>.reset mallctl indiscriminately
* discards all allocations for the affected arena.
*
* 0: Application allocation.
* 1: Internal metadata allocation.
*
* Synchronization: atomic.
*/
atomic_u_t nthreads[2];
/* Next bin shard for binding new threads. Synchronization: atomic. */
atomic_u_t binshard_next;
/*
* When percpu_arena is enabled, to amortize the cost of reading /
* updating the current CPU id, track the most recent thread accessing
* this arena, and only read CPU if there is a mismatch.
*/
tsdn_t *last_thd;
/* Synchronization: internal. */
arena_stats_t stats;
/*
* Lists of tcaches and cache_bin_array_descriptors for extant threads
* associated with this arena. Stats from these are merged
* incrementally, and at exit if opt_stats_print is enabled.
*
* Synchronization: tcache_ql_mtx.
*/
ql_head(tcache_slow_t) tcache_ql;
ql_head(cache_bin_array_descriptor_t) cache_bin_array_descriptor_ql;
malloc_mutex_t tcache_ql_mtx;
/*
* Represents a dss_prec_t, but atomically.
*
* Synchronization: atomic.
*/
atomic_u_t dss_prec;
/*
* Extant large allocations.
*
* Synchronization: large_mtx.
*/
edata_list_active_t large;
/* Synchronizes all large allocation/update/deallocation. */
malloc_mutex_t large_mtx;
/* The page-level allocator shard this arena uses. */
pa_shard_t pa_shard;
/*
* A cached copy of base->ind. This can get accessed on hot paths;
* looking it up in base requires an extra pointer hop / cache miss.
*/
unsigned ind;
/*
* Base allocator, from which arena metadata are allocated.
*
* Synchronization: internal.
*/
base_t *base;
/* Used to determine uptime. Read-only after initialization. */
nstime_t create_time;
/* The name of the arena. */
char name[ARENA_NAME_LEN];
/*
* The arena is allocated alongside its bins; really this is a
* dynamically sized array determined by the binshard settings.
* Enforcing cacheline-alignment to minimize the number of cachelines
* touched on the hot paths.
*/
JEMALLOC_WARN_ON_USAGE(
"Do not use this field directly. "
"Use `arena_get_bin` instead.")
JEMALLOC_ALIGNED(CACHELINE)
bin_t all_bins[0];
};
#endif /* JEMALLOC_INTERNAL_ARENA_STRUCTS_H */

View file

@ -0,0 +1,60 @@
#ifndef JEMALLOC_INTERNAL_ARENA_TYPES_H
#define JEMALLOC_INTERNAL_ARENA_TYPES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sc.h"
/* Default decay times in milliseconds. */
#define DIRTY_DECAY_MS_DEFAULT ZD(10 * 1000)
#define MUZZY_DECAY_MS_DEFAULT (0)
/* Number of event ticks between time checks. */
#define ARENA_DECAY_NTICKS_PER_UPDATE 1000
/* Maximum length of the arena name. */
#define ARENA_NAME_LEN 32
typedef struct arena_s arena_t;
typedef enum {
percpu_arena_mode_names_base = 0, /* Used for options processing. */
/*
* *_uninit are used only during bootstrapping, and must correspond
* to initialized variant plus percpu_arena_mode_enabled_base.
*/
percpu_arena_uninit = 0,
per_phycpu_arena_uninit = 1,
/* All non-disabled modes must come after percpu_arena_disabled. */
percpu_arena_disabled = 2,
percpu_arena_mode_names_limit = 3, /* Used for options processing. */
percpu_arena_mode_enabled_base = 3,
percpu_arena = 3,
per_phycpu_arena = 4 /* Hyper threads share arena. */
} percpu_arena_mode_t;
#define PERCPU_ARENA_ENABLED(m) ((m) >= percpu_arena_mode_enabled_base)
#define PERCPU_ARENA_DEFAULT percpu_arena_disabled
/*
* When allocation_size >= oversize_threshold, use the dedicated huge arena
* (unless have explicitly spicified arena index). 0 disables the feature.
*/
#define OVERSIZE_THRESHOLD_DEFAULT (8 << 20)
struct arena_config_s {
/* extent hooks to be used for the arena */
extent_hooks_t *extent_hooks;
/*
* Use extent hooks for metadata (base) allocations when true.
*/
bool metadata_use_hooks;
};
typedef struct arena_config_s arena_config_t;
extern const arena_config_t arena_config_default;
#endif /* JEMALLOC_INTERNAL_ARENA_TYPES_H */

View file

@ -1,45 +1,63 @@
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/util.h"
/*
* Define a custom assert() in order to reduce the chances of deadlock during
* assertion failure.
*/
#ifndef assert
#define assert(e) do { \
if (unlikely(config_debug && !(e))) { \
malloc_printf( \
"<jemalloc>: %s:%d: Failed assertion: \"%s\"\n", \
__FILE__, __LINE__, #e); \
abort(); \
} \
} while (0)
# define assert(e) \
do { \
if (unlikely(config_debug && !(e))) { \
malloc_printf( \
"<jemalloc>: %s:%d: Failed assertion: \"%s\"\n", \
__FILE__, __LINE__, #e); \
abort(); \
} \
} while (0)
#endif
#ifndef not_reached
#define not_reached() do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Unreachable code reached\n", \
__FILE__, __LINE__); \
abort(); \
} \
unreachable(); \
} while (0)
# define not_reached() \
do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Unreachable code reached\n", \
__FILE__, __LINE__); \
abort(); \
} \
unreachable(); \
} while (0)
#endif
#ifndef not_implemented
#define not_implemented() do { \
if (config_debug) { \
malloc_printf("<jemalloc>: %s:%d: Not implemented\n", \
__FILE__, __LINE__); \
abort(); \
} \
} while (0)
# define not_implemented() \
do { \
if (config_debug) { \
malloc_printf( \
"<jemalloc>: %s:%d: Not implemented\n", \
__FILE__, __LINE__); \
abort(); \
} \
} while (0)
#endif
#ifndef assert_not_implemented
#define assert_not_implemented(e) do { \
if (unlikely(config_debug && !(e))) \
not_implemented(); \
} while (0)
# define assert_not_implemented(e) \
do { \
if (unlikely(config_debug && !(e))) { \
not_implemented(); \
} \
} while (0)
#endif
/* Use to assert a particular configuration, e.g., cassert(config_debug). */
#ifndef cassert
# define cassert(c) \
do { \
if (unlikely(!(c))) { \
not_reached(); \
} \
} while (0)
#endif

View file

@ -1,651 +1,108 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#ifndef JEMALLOC_INTERNAL_ATOMIC_H
#define JEMALLOC_INTERNAL_ATOMIC_H
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#include "jemalloc/internal/jemalloc_preamble.h"
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
#define JEMALLOC_U8_ATOMICS
#if defined(JEMALLOC_GCC_ATOMIC_ATOMICS)
# include "jemalloc/internal/atomic_gcc_atomic.h"
# if !defined(JEMALLOC_GCC_U8_ATOMIC_ATOMICS)
# undef JEMALLOC_U8_ATOMICS
# endif
#elif defined(JEMALLOC_GCC_SYNC_ATOMICS)
# include "jemalloc/internal/atomic_gcc_sync.h"
# if !defined(JEMALLOC_GCC_U8_SYNC_ATOMICS)
# undef JEMALLOC_U8_ATOMICS
# endif
#elif defined(_MSC_VER)
# include "jemalloc/internal/atomic_msvc.h"
#elif defined(JEMALLOC_C11_ATOMICS)
# include "jemalloc/internal/atomic_c11.h"
#else
# error "Don't have atomics implemented on this platform."
#endif
#define atomic_read_uint64(p) atomic_add_uint64(p, 0)
#define atomic_read_uint32(p) atomic_add_uint32(p, 0)
#define atomic_read_p(p) atomic_add_p(p, NULL)
#define atomic_read_z(p) atomic_add_z(p, 0)
#define atomic_read_u(p) atomic_add_u(p, 0)
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
/*
* All arithmetic functions return the arithmetic result of the atomic
* operation. Some atomic operation APIs return the value prior to mutation, in
* which case the following functions must redundantly compute the result so
* that it can be returned. These functions are normally inlined, so the extra
* operations can be optimized away if the return values aren't used by the
* callers.
* This header gives more or less a backport of C11 atomics. The user can write
* JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_sizeof_type); to generate
* counterparts of the C11 atomic functions for type, as so:
* JEMALLOC_GENERATE_ATOMICS(int *, pi, 3);
* and then write things like:
* int *some_ptr;
* atomic_pi_t atomic_ptr_to_int;
* atomic_store_pi(&atomic_ptr_to_int, some_ptr, ATOMIC_RELAXED);
* int *prev_value = atomic_exchange_pi(&ptr_to_int, NULL, ATOMIC_ACQ_REL);
* assert(some_ptr == prev_value);
* and expect things to work in the obvious way.
*
* <t> atomic_read_<t>(<t> *p) { return (*p); }
* <t> atomic_add_<t>(<t> *p, <t> x) { return (*p += x); }
* <t> atomic_sub_<t>(<t> *p, <t> x) { return (*p -= x); }
* bool atomic_cas_<t>(<t> *p, <t> c, <t> s)
* {
* if (*p != c)
* return (true);
* *p = s;
* return (false);
* }
* void atomic_write_<t>(<t> *p, <t> x) { *p = x; }
* Also included (with naming differences to avoid conflicts with the standard
* library):
* atomic_fence(atomic_memory_order_t) (mimics C11's atomic_thread_fence).
* ATOMIC_INIT (mimics C11's ATOMIC_VAR_INIT).
*/
#ifndef JEMALLOC_ENABLE_INLINE
uint64_t atomic_add_uint64(uint64_t *p, uint64_t x);
uint64_t atomic_sub_uint64(uint64_t *p, uint64_t x);
bool atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s);
void atomic_write_uint64(uint64_t *p, uint64_t x);
uint32_t atomic_add_uint32(uint32_t *p, uint32_t x);
uint32_t atomic_sub_uint32(uint32_t *p, uint32_t x);
bool atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s);
void atomic_write_uint32(uint32_t *p, uint32_t x);
void *atomic_add_p(void **p, void *x);
void *atomic_sub_p(void **p, void *x);
bool atomic_cas_p(void **p, void *c, void *s);
void atomic_write_p(void **p, const void *x);
size_t atomic_add_z(size_t *p, size_t x);
size_t atomic_sub_z(size_t *p, size_t x);
bool atomic_cas_z(size_t *p, size_t c, size_t s);
void atomic_write_z(size_t *p, size_t x);
unsigned atomic_add_u(unsigned *p, unsigned x);
unsigned atomic_sub_u(unsigned *p, unsigned x);
bool atomic_cas_u(unsigned *p, unsigned c, unsigned s);
void atomic_write_u(unsigned *p, unsigned x);
#endif
/*
* Pure convenience, so that we don't have to type "atomic_memory_order_"
* quite so often.
*/
#define ATOMIC_RELAXED atomic_memory_order_relaxed
#define ATOMIC_ACQUIRE atomic_memory_order_acquire
#define ATOMIC_RELEASE atomic_memory_order_release
#define ATOMIC_ACQ_REL atomic_memory_order_acq_rel
#define ATOMIC_SEQ_CST atomic_memory_order_seq_cst
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_ATOMIC_C_))
/******************************************************************************/
/* 64-bit operations. */
/*
* Another convenience -- simple atomic helper functions.
*/
#define JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(type, short_type, lg_size) \
JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, lg_size) \
ATOMIC_INLINE void atomic_load_add_store_##short_type( \
atomic_##short_type##_t *a, type inc) { \
type oldval = atomic_load_##short_type(a, ATOMIC_RELAXED); \
type newval = oldval + inc; \
atomic_store_##short_type(a, newval, ATOMIC_RELAXED); \
} \
ATOMIC_INLINE void atomic_load_sub_store_##short_type( \
atomic_##short_type##_t *a, type inc) { \
type oldval = atomic_load_##short_type(a, ATOMIC_RELAXED); \
type newval = oldval - inc; \
atomic_store_##short_type(a, newval, ATOMIC_RELAXED); \
}
/*
* Not all platforms have 64-bit atomics. If we do, this #define exposes that
* fact.
*/
#if (LG_SIZEOF_PTR == 3 || LG_SIZEOF_INT == 3)
# if (defined(__amd64__) || defined(__x86_64__))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
uint64_t t = x;
asm volatile (
"lock; xaddq %0, %1;"
: "+r" (t), "=m" (*p) /* Outputs. */
: "m" (*p) /* Inputs. */
);
return (t + x);
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
uint64_t t;
x = (uint64_t)(-(int64_t)x);
t = x;
asm volatile (
"lock; xaddq %0, %1;"
: "+r" (t), "=m" (*p) /* Outputs. */
: "m" (*p) /* Inputs. */
);
return (t + x);
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
uint8_t success;
asm volatile (
"lock; cmpxchgq %4, %0;"
"sete %1;"
: "=m" (*p), "=a" (success) /* Outputs. */
: "m" (*p), "a" (c), "r" (s) /* Inputs. */
: "memory" /* Clobbers. */
);
return (!(bool)success);
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
asm volatile (
"xchgq %1, %0;" /* Lock is implied by xchgq. */
: "=m" (*p), "+r" (x) /* Outputs. */
: "m" (*p) /* Inputs. */
: "memory" /* Clobbers. */
);
}
# elif (defined(JEMALLOC_C11ATOMICS))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
volatile atomic_uint_least64_t *a = (volatile atomic_uint_least64_t *)p;
return (atomic_fetch_add(a, x) + x);
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
volatile atomic_uint_least64_t *a = (volatile atomic_uint_least64_t *)p;
return (atomic_fetch_sub(a, x) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
volatile atomic_uint_least64_t *a = (volatile atomic_uint_least64_t *)p;
return (!atomic_compare_exchange_strong(a, &c, s));
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
volatile atomic_uint_least64_t *a = (volatile atomic_uint_least64_t *)p;
atomic_store(a, x);
}
# elif (defined(JEMALLOC_ATOMIC9))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
/*
* atomic_fetchadd_64() doesn't exist, but we only ever use this
* function on LP64 systems, so atomic_fetchadd_long() will do.
*/
assert(sizeof(uint64_t) == sizeof(unsigned long));
return (atomic_fetchadd_long(p, (unsigned long)x) + x);
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
assert(sizeof(uint64_t) == sizeof(unsigned long));
return (atomic_fetchadd_long(p, (unsigned long)(-(long)x)) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
assert(sizeof(uint64_t) == sizeof(unsigned long));
return (!atomic_cmpset_long(p, (unsigned long)c, (unsigned long)s));
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
assert(sizeof(uint64_t) == sizeof(unsigned long));
atomic_store_rel_long(p, x);
}
# elif (defined(JEMALLOC_OSATOMIC))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
return (OSAtomicAdd64((int64_t)x, (int64_t *)p));
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
return (OSAtomicAdd64(-((int64_t)x), (int64_t *)p));
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
return (!OSAtomicCompareAndSwap64(c, s, (int64_t *)p));
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
uint64_t o;
/*The documented OSAtomic*() API does not expose an atomic exchange. */
do {
o = atomic_read_uint64(p);
} while (atomic_cas_uint64(p, o, x));
}
# elif (defined(_MSC_VER))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
return (InterlockedExchangeAdd64(p, x) + x);
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
return (InterlockedExchangeAdd64(p, -((int64_t)x)) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
uint64_t o;
o = InterlockedCompareExchange64(p, s, c);
return (o != c);
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
InterlockedExchange64(p, x);
}
# elif (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8) || \
defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_8))
JEMALLOC_INLINE uint64_t
atomic_add_uint64(uint64_t *p, uint64_t x)
{
return (__sync_add_and_fetch(p, x));
}
JEMALLOC_INLINE uint64_t
atomic_sub_uint64(uint64_t *p, uint64_t x)
{
return (__sync_sub_and_fetch(p, x));
}
JEMALLOC_INLINE bool
atomic_cas_uint64(uint64_t *p, uint64_t c, uint64_t s)
{
return (!__sync_bool_compare_and_swap(p, c, s));
}
JEMALLOC_INLINE void
atomic_write_uint64(uint64_t *p, uint64_t x)
{
__sync_lock_test_and_set(p, x);
}
# else
# error "Missing implementation for 64-bit atomic operations"
# endif
# define JEMALLOC_ATOMIC_U64
#endif
/******************************************************************************/
/* 32-bit operations. */
#if (defined(__i386__) || defined(__amd64__) || defined(__x86_64__))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
uint32_t t = x;
JEMALLOC_GENERATE_ATOMICS(void *, p, LG_SIZEOF_PTR)
asm volatile (
"lock; xaddl %0, %1;"
: "+r" (t), "=m" (*p) /* Outputs. */
: "m" (*p) /* Inputs. */
);
/*
* There's no actual guarantee that sizeof(bool) == 1, but it's true on the only
* platform that actually needs to know the size, MSVC.
*/
JEMALLOC_GENERATE_ATOMICS(bool, b, 0)
return (t + x);
}
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT)
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
uint32_t t;
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(int, i, LG_SIZEOF_INT)
x = (uint32_t)(-(int32_t)x);
t = x;
asm volatile (
"lock; xaddl %0, %1;"
: "+r" (t), "=m" (*p) /* Outputs. */
: "m" (*p) /* Inputs. */
);
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(size_t, zu, LG_SIZEOF_PTR)
return (t + x);
}
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(ssize_t, zd, LG_SIZEOF_PTR)
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
uint8_t success;
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint8_t, u8, 0)
asm volatile (
"lock; cmpxchgl %4, %0;"
"sete %1;"
: "=m" (*p), "=a" (success) /* Outputs. */
: "m" (*p), "a" (c), "r" (s) /* Inputs. */
: "memory"
);
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint32_t, u32, 2)
return (!(bool)success);
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
asm volatile (
"xchgl %1, %0;" /* Lock is implied by xchgl. */
: "=m" (*p), "+r" (x) /* Outputs. */
: "m" (*p) /* Inputs. */
: "memory" /* Clobbers. */
);
}
# elif (defined(JEMALLOC_C11ATOMICS))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
volatile atomic_uint_least32_t *a = (volatile atomic_uint_least32_t *)p;
return (atomic_fetch_add(a, x) + x);
}
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
volatile atomic_uint_least32_t *a = (volatile atomic_uint_least32_t *)p;
return (atomic_fetch_sub(a, x) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
volatile atomic_uint_least32_t *a = (volatile atomic_uint_least32_t *)p;
return (!atomic_compare_exchange_strong(a, &c, s));
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
volatile atomic_uint_least32_t *a = (volatile atomic_uint_least32_t *)p;
atomic_store(a, x);
}
#elif (defined(JEMALLOC_ATOMIC9))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
return (atomic_fetchadd_32(p, x) + x);
}
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
return (atomic_fetchadd_32(p, (uint32_t)(-(int32_t)x)) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
return (!atomic_cmpset_32(p, c, s));
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
atomic_store_rel_32(p, x);
}
#elif (defined(JEMALLOC_OSATOMIC))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
return (OSAtomicAdd32((int32_t)x, (int32_t *)p));
}
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
return (OSAtomicAdd32(-((int32_t)x), (int32_t *)p));
}
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
return (!OSAtomicCompareAndSwap32(c, s, (int32_t *)p));
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
uint32_t o;
/*The documented OSAtomic*() API does not expose an atomic exchange. */
do {
o = atomic_read_uint32(p);
} while (atomic_cas_uint32(p, o, x));
}
#elif (defined(_MSC_VER))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
return (InterlockedExchangeAdd(p, x) + x);
}
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
return (InterlockedExchangeAdd(p, -((int32_t)x)) - x);
}
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
uint32_t o;
o = InterlockedCompareExchange(p, s, c);
return (o != c);
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
InterlockedExchange(p, x);
}
#elif (defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) || \
defined(JE_FORCE_SYNC_COMPARE_AND_SWAP_4))
JEMALLOC_INLINE uint32_t
atomic_add_uint32(uint32_t *p, uint32_t x)
{
return (__sync_add_and_fetch(p, x));
}
JEMALLOC_INLINE uint32_t
atomic_sub_uint32(uint32_t *p, uint32_t x)
{
return (__sync_sub_and_fetch(p, x));
}
JEMALLOC_INLINE bool
atomic_cas_uint32(uint32_t *p, uint32_t c, uint32_t s)
{
return (!__sync_bool_compare_and_swap(p, c, s));
}
JEMALLOC_INLINE void
atomic_write_uint32(uint32_t *p, uint32_t x)
{
__sync_lock_test_and_set(p, x);
}
#else
# error "Missing implementation for 32-bit atomic operations"
#ifdef JEMALLOC_ATOMIC_U64
JEMALLOC_GENERATE_EXPANDED_INT_ATOMICS(uint64_t, u64, 3)
#endif
/******************************************************************************/
/* Pointer operations. */
JEMALLOC_INLINE void *
atomic_add_p(void **p, void *x)
{
#undef ATOMIC_INLINE
#if (LG_SIZEOF_PTR == 3)
return ((void *)atomic_add_uint64((uint64_t *)p, (uint64_t)x));
#elif (LG_SIZEOF_PTR == 2)
return ((void *)atomic_add_uint32((uint32_t *)p, (uint32_t)x));
#endif
}
JEMALLOC_INLINE void *
atomic_sub_p(void **p, void *x)
{
#if (LG_SIZEOF_PTR == 3)
return ((void *)atomic_add_uint64((uint64_t *)p,
(uint64_t)-((int64_t)x)));
#elif (LG_SIZEOF_PTR == 2)
return ((void *)atomic_add_uint32((uint32_t *)p,
(uint32_t)-((int32_t)x)));
#endif
}
JEMALLOC_INLINE bool
atomic_cas_p(void **p, void *c, void *s)
{
#if (LG_SIZEOF_PTR == 3)
return (atomic_cas_uint64((uint64_t *)p, (uint64_t)c, (uint64_t)s));
#elif (LG_SIZEOF_PTR == 2)
return (atomic_cas_uint32((uint32_t *)p, (uint32_t)c, (uint32_t)s));
#endif
}
JEMALLOC_INLINE void
atomic_write_p(void **p, const void *x)
{
#if (LG_SIZEOF_PTR == 3)
atomic_write_uint64((uint64_t *)p, (uint64_t)x);
#elif (LG_SIZEOF_PTR == 2)
atomic_write_uint32((uint32_t *)p, (uint32_t)x);
#endif
}
/******************************************************************************/
/* size_t operations. */
JEMALLOC_INLINE size_t
atomic_add_z(size_t *p, size_t x)
{
#if (LG_SIZEOF_PTR == 3)
return ((size_t)atomic_add_uint64((uint64_t *)p, (uint64_t)x));
#elif (LG_SIZEOF_PTR == 2)
return ((size_t)atomic_add_uint32((uint32_t *)p, (uint32_t)x));
#endif
}
JEMALLOC_INLINE size_t
atomic_sub_z(size_t *p, size_t x)
{
#if (LG_SIZEOF_PTR == 3)
return ((size_t)atomic_add_uint64((uint64_t *)p,
(uint64_t)-((int64_t)x)));
#elif (LG_SIZEOF_PTR == 2)
return ((size_t)atomic_add_uint32((uint32_t *)p,
(uint32_t)-((int32_t)x)));
#endif
}
JEMALLOC_INLINE bool
atomic_cas_z(size_t *p, size_t c, size_t s)
{
#if (LG_SIZEOF_PTR == 3)
return (atomic_cas_uint64((uint64_t *)p, (uint64_t)c, (uint64_t)s));
#elif (LG_SIZEOF_PTR == 2)
return (atomic_cas_uint32((uint32_t *)p, (uint32_t)c, (uint32_t)s));
#endif
}
JEMALLOC_INLINE void
atomic_write_z(size_t *p, size_t x)
{
#if (LG_SIZEOF_PTR == 3)
atomic_write_uint64((uint64_t *)p, (uint64_t)x);
#elif (LG_SIZEOF_PTR == 2)
atomic_write_uint32((uint32_t *)p, (uint32_t)x);
#endif
}
/******************************************************************************/
/* unsigned operations. */
JEMALLOC_INLINE unsigned
atomic_add_u(unsigned *p, unsigned x)
{
#if (LG_SIZEOF_INT == 3)
return ((unsigned)atomic_add_uint64((uint64_t *)p, (uint64_t)x));
#elif (LG_SIZEOF_INT == 2)
return ((unsigned)atomic_add_uint32((uint32_t *)p, (uint32_t)x));
#endif
}
JEMALLOC_INLINE unsigned
atomic_sub_u(unsigned *p, unsigned x)
{
#if (LG_SIZEOF_INT == 3)
return ((unsigned)atomic_add_uint64((uint64_t *)p,
(uint64_t)-((int64_t)x)));
#elif (LG_SIZEOF_INT == 2)
return ((unsigned)atomic_add_uint32((uint32_t *)p,
(uint32_t)-((int32_t)x)));
#endif
}
JEMALLOC_INLINE bool
atomic_cas_u(unsigned *p, unsigned c, unsigned s)
{
#if (LG_SIZEOF_INT == 3)
return (atomic_cas_uint64((uint64_t *)p, (uint64_t)c, (uint64_t)s));
#elif (LG_SIZEOF_INT == 2)
return (atomic_cas_uint32((uint32_t *)p, (uint32_t)c, (uint32_t)s));
#endif
}
JEMALLOC_INLINE void
atomic_write_u(unsigned *p, unsigned x)
{
#if (LG_SIZEOF_INT == 3)
atomic_write_uint64((uint64_t *)p, (uint64_t)x);
#elif (LG_SIZEOF_INT == 2)
atomic_write_uint32((uint32_t *)p, (uint32_t)x);
#endif
}
/******************************************************************************/
#endif
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#endif /* JEMALLOC_INTERNAL_ATOMIC_H */

View file

@ -0,0 +1,94 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_C11_H
#define JEMALLOC_INTERNAL_ATOMIC_C11_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include <stdatomic.h>
#define ATOMIC_INIT(...) ATOMIC_VAR_INIT(__VA_ARGS__)
#define atomic_memory_order_t memory_order
#define atomic_memory_order_relaxed memory_order_relaxed
#define atomic_memory_order_acquire memory_order_acquire
#define atomic_memory_order_release memory_order_release
#define atomic_memory_order_acq_rel memory_order_acq_rel
#define atomic_memory_order_seq_cst memory_order_seq_cst
#define atomic_fence atomic_thread_fence
/* clang-format off */
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, \
/* unused */ lg_size) \
typedef _Atomic(type) atomic_##short_type##_t; \
\
ATOMIC_INLINE type \
atomic_load_##short_type(const atomic_##short_type##_t *a, \
atomic_memory_order_t mo) { \
/* \
* A strict interpretation of the C standard prevents \
* atomic_load from taking a const argument, but it's \
* convenient for our purposes. This cast is a workaround. \
*/ \
atomic_##short_type##_t* a_nonconst = \
(atomic_##short_type##_t*)a; \
return atomic_load_explicit(a_nonconst, mo); \
} \
\
ATOMIC_INLINE void \
atomic_store_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
atomic_store_explicit(a, val, mo); \
} \
\
ATOMIC_INLINE type \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
return atomic_exchange_explicit(a, val, mo); \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return atomic_compare_exchange_weak_explicit(a, expected, \
desired, success_mo, failure_mo); \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return atomic_compare_exchange_strong_explicit(a, expected, \
desired, success_mo, failure_mo); \
}
/* clang-format on */
/*
* Integral types have some special operations available that non-integral ones
* lack.
*/
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_add_explicit(a, val, mo); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_sub_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_and_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_or_explicit(a, val, mo); \
} \
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return atomic_fetch_xor_explicit(a, val, mo); \
}
#endif /* JEMALLOC_INTERNAL_ATOMIC_C11_H */

View file

@ -0,0 +1,121 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H
#define JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
atomic_memory_order_acquire,
atomic_memory_order_release,
atomic_memory_order_acq_rel,
atomic_memory_order_seq_cst
} atomic_memory_order_t;
ATOMIC_INLINE int
atomic_enum_to_builtin(atomic_memory_order_t mo) {
switch (mo) {
case atomic_memory_order_relaxed:
return __ATOMIC_RELAXED;
case atomic_memory_order_acquire:
return __ATOMIC_ACQUIRE;
case atomic_memory_order_release:
return __ATOMIC_RELEASE;
case atomic_memory_order_acq_rel:
return __ATOMIC_ACQ_REL;
case atomic_memory_order_seq_cst:
return __ATOMIC_SEQ_CST;
}
/* Can't happen; the switch is exhaustive. */
not_reached();
}
ATOMIC_INLINE void
atomic_fence(atomic_memory_order_t mo) {
__atomic_thread_fence(atomic_enum_to_builtin(mo));
}
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
typedef struct { \
type repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type atomic_load_##short_type( \
const atomic_##short_type##_t *a, atomic_memory_order_t mo) { \
type result; \
__atomic_load(&a->repr, &result, atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE void atomic_store_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
__atomic_store(&a->repr, &val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_exchange_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
type result; \
__atomic_exchange( \
&a->repr, &val, &result, atomic_enum_to_builtin(mo)); \
return result; \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_weak_##short_type( \
atomic_##short_type##_t *a, UNUSED type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
true, atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_strong_##short_type( \
atomic_##short_type##_t *a, UNUSED type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
return __atomic_compare_exchange(&a->repr, expected, &desired, \
false, atomic_enum_to_builtin(success_mo), \
atomic_enum_to_builtin(failure_mo)); \
}
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_add( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_sub( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_and( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_or( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
} \
\
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __atomic_fetch_xor( \
&a->repr, val, atomic_enum_to_builtin(mo)); \
}
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_GCC_ATOMIC_H */

View file

@ -0,0 +1,197 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H
#define JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H
#include "jemalloc/internal/jemalloc_preamble.h"
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
atomic_memory_order_acquire,
atomic_memory_order_release,
atomic_memory_order_acq_rel,
atomic_memory_order_seq_cst
} atomic_memory_order_t;
ATOMIC_INLINE void
atomic_fence(atomic_memory_order_t mo) {
/* Easy cases first: no barrier, and full barrier. */
if (mo == atomic_memory_order_relaxed) {
asm volatile("" ::: "memory");
return;
}
if (mo == atomic_memory_order_seq_cst) {
asm volatile("" ::: "memory");
__sync_synchronize();
asm volatile("" ::: "memory");
return;
}
asm volatile("" ::: "memory");
#if defined(__i386__) || defined(__x86_64__)
/* This is implicit on x86. */
#elif defined(__ppc64__)
asm volatile("lwsync");
#elif defined(__ppc__)
asm volatile("sync");
#elif defined(__sparc__) && defined(__arch64__)
if (mo == atomic_memory_order_acquire) {
asm volatile("membar #LoadLoad | #LoadStore");
} else if (mo == atomic_memory_order_release) {
asm volatile("membar #LoadStore | #StoreStore");
} else {
asm volatile("membar #LoadLoad | #LoadStore | #StoreStore");
}
#else
__sync_synchronize();
#endif
asm volatile("" ::: "memory");
}
/*
* A correct implementation of seq_cst loads and stores on weakly ordered
* architectures could do either of the following:
* 1. store() is weak-fence -> store -> strong fence, load() is load ->
* strong-fence.
* 2. store() is strong-fence -> store, load() is strong-fence -> load ->
* weak-fence.
* The tricky thing is, load() and store() above can be the load or store
* portions of a gcc __sync builtin, so we have to follow GCC's lead, which
* means going with strategy 2.
* On strongly ordered architectures, the natural strategy is to stick a strong
* fence after seq_cst stores, and have naked loads. So we want the strong
* fences in different places on different architectures.
* atomic_pre_sc_load_fence and atomic_post_sc_store_fence allow us to
* accomplish this.
*/
ATOMIC_INLINE void
atomic_pre_sc_load_fence() {
#if defined(__i386__) || defined(__x86_64__) \
|| (defined(__sparc__) && defined(__arch64__))
atomic_fence(atomic_memory_order_relaxed);
#else
atomic_fence(atomic_memory_order_seq_cst);
#endif
}
ATOMIC_INLINE void
atomic_post_sc_store_fence() {
#if defined(__i386__) || defined(__x86_64__) \
|| (defined(__sparc__) && defined(__arch64__))
atomic_fence(atomic_memory_order_seq_cst);
#else
atomic_fence(atomic_memory_order_relaxed);
#endif
}
/* clang-format off */
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, \
/* unused */ lg_size) \
typedef struct { \
type volatile repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type \
atomic_load_##short_type(const atomic_##short_type##_t *a, \
atomic_memory_order_t mo) { \
if (mo == atomic_memory_order_seq_cst) { \
atomic_pre_sc_load_fence(); \
} \
type result = a->repr; \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_acquire); \
} \
return result; \
} \
\
ATOMIC_INLINE void \
atomic_store_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_release); \
} \
a->repr = val; \
if (mo == atomic_memory_order_seq_cst) { \
atomic_post_sc_store_fence(); \
} \
} \
\
ATOMIC_INLINE type \
atomic_exchange_##short_type(atomic_##short_type##_t *a, type val, \
atomic_memory_order_t mo) { \
/* \
* Because of FreeBSD, we care about gcc 4.2, which doesn't have\
* an atomic exchange builtin. We fake it with a CAS loop. \
*/ \
while (true) { \
type old = a->repr; \
if (__sync_bool_compare_and_swap(&a->repr, old, val)) { \
return old; \
} \
} \
} \
\
ATOMIC_INLINE bool \
atomic_compare_exchange_weak_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type prev = __sync_val_compare_and_swap(&a->repr, *expected, \
desired); \
if (prev == *expected) { \
return true; \
} else { \
*expected = prev; \
return false; \
} \
} \
ATOMIC_INLINE bool \
atomic_compare_exchange_strong_##short_type(atomic_##short_type##_t *a, \
type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
type prev = __sync_val_compare_and_swap(&a->repr, *expected, \
desired); \
if (prev == *expected) { \
return true; \
} else { \
*expected = prev; \
return false; \
} \
}
/* clang-format on */
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, /* unused */ lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, /* unused */ lg_size) \
\
ATOMIC_INLINE type atomic_fetch_add_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_add(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_sub_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_sub(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_and_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_and(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_or_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_or(&a->repr, val); \
} \
\
ATOMIC_INLINE type atomic_fetch_xor_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return __sync_fetch_and_xor(&a->repr, val); \
}
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_GCC_SYNC_H */

View file

@ -0,0 +1,163 @@
#ifndef JEMALLOC_INTERNAL_ATOMIC_MSVC_H
#define JEMALLOC_INTERNAL_ATOMIC_MSVC_H
#include "jemalloc/internal/jemalloc_preamble.h"
#define ATOMIC_INLINE JEMALLOC_ALWAYS_INLINE
#define ATOMIC_INIT(...) \
{ __VA_ARGS__ }
typedef enum {
atomic_memory_order_relaxed,
atomic_memory_order_acquire,
atomic_memory_order_release,
atomic_memory_order_acq_rel,
atomic_memory_order_seq_cst
} atomic_memory_order_t;
typedef char atomic_repr_0_t;
typedef short atomic_repr_1_t;
typedef long atomic_repr_2_t;
typedef __int64 atomic_repr_3_t;
ATOMIC_INLINE void
atomic_fence(atomic_memory_order_t mo) {
_ReadWriteBarrier();
#if defined(_M_ARM) || defined(_M_ARM64)
/* ARM needs a barrier for everything but relaxed. */
if (mo != atomic_memory_order_relaxed) {
MemoryBarrier();
}
#elif defined(_M_IX86) || defined(_M_X64)
/* x86 needs a barrier only for seq_cst. */
if (mo == atomic_memory_order_seq_cst) {
MemoryBarrier();
}
#else
# error "Don't know how to create atomics for this platform for MSVC."
#endif
_ReadWriteBarrier();
}
#define ATOMIC_INTERLOCKED_REPR(lg_size) atomic_repr_##lg_size##_t
#define ATOMIC_CONCAT(a, b) ATOMIC_RAW_CONCAT(a, b)
#define ATOMIC_RAW_CONCAT(a, b) a##b
#define ATOMIC_INTERLOCKED_NAME(base_name, lg_size) \
ATOMIC_CONCAT(base_name, ATOMIC_INTERLOCKED_SUFFIX(lg_size))
#define ATOMIC_INTERLOCKED_SUFFIX(lg_size) \
ATOMIC_CONCAT(ATOMIC_INTERLOCKED_SUFFIX_, lg_size)
#define ATOMIC_INTERLOCKED_SUFFIX_0 8
#define ATOMIC_INTERLOCKED_SUFFIX_1 16
#define ATOMIC_INTERLOCKED_SUFFIX_2
#define ATOMIC_INTERLOCKED_SUFFIX_3 64
#define JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_size) \
typedef struct { \
ATOMIC_INTERLOCKED_REPR(lg_size) repr; \
} atomic_##short_type##_t; \
\
ATOMIC_INLINE type atomic_load_##short_type( \
const atomic_##short_type##_t *a, atomic_memory_order_t mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) ret = a->repr; \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_acquire); \
} \
return (type)ret; \
} \
\
ATOMIC_INLINE void atomic_store_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
if (mo != atomic_memory_order_relaxed) { \
atomic_fence(atomic_memory_order_release); \
} \
a->repr = (ATOMIC_INTERLOCKED_REPR(lg_size))val; \
if (mo == atomic_memory_order_seq_cst) { \
atomic_fence(atomic_memory_order_seq_cst); \
} \
} \
\
ATOMIC_INLINE type atomic_exchange_##short_type( \
atomic_##short_type##_t *a, type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedExchange, \
lg_size)(&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_weak_##short_type( \
atomic_##short_type##_t *a, type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
ATOMIC_INTERLOCKED_REPR(lg_size) \
e = (ATOMIC_INTERLOCKED_REPR(lg_size)) * expected; \
ATOMIC_INTERLOCKED_REPR(lg_size) \
d = (ATOMIC_INTERLOCKED_REPR(lg_size))desired; \
ATOMIC_INTERLOCKED_REPR(lg_size) \
old = ATOMIC_INTERLOCKED_NAME( \
_InterlockedCompareExchange, lg_size)(&a->repr, d, e); \
if (old == e) { \
return true; \
} else { \
*expected = (type)old; \
return false; \
} \
} \
\
ATOMIC_INLINE bool atomic_compare_exchange_strong_##short_type( \
atomic_##short_type##_t *a, type *expected, type desired, \
atomic_memory_order_t success_mo, \
atomic_memory_order_t failure_mo) { \
/* We implement the weak version with strong semantics. */ \
return atomic_compare_exchange_weak_##short_type( \
a, expected, desired, success_mo, failure_mo); \
}
/* clang-format off */
#define JEMALLOC_GENERATE_INT_ATOMICS(type, short_type, lg_size) \
JEMALLOC_GENERATE_ATOMICS(type, short_type, lg_size) \
\
ATOMIC_INLINE type \
atomic_fetch_add_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedExchangeAdd, \
lg_size)(&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
\
ATOMIC_INLINE type \
atomic_fetch_sub_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
/* \
* MSVC warns on negation of unsigned operands, but for us it \
* gives exactly the right semantics (MAX_TYPE + 1 - operand). \
*/ \
__pragma(warning(push)) \
__pragma(warning(disable: 4146)) \
return atomic_fetch_add_##short_type(a, -val, mo); \
__pragma(warning(pop)) \
} \
ATOMIC_INLINE type \
atomic_fetch_and_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedAnd, lg_size)( \
&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
ATOMIC_INLINE type \
atomic_fetch_or_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedOr, lg_size)( \
&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
} \
ATOMIC_INLINE type \
atomic_fetch_xor_##short_type(atomic_##short_type##_t *a, \
type val, atomic_memory_order_t mo) { \
return (type)ATOMIC_INTERLOCKED_NAME(_InterlockedXor, lg_size)( \
&a->repr, (ATOMIC_INTERLOCKED_REPR(lg_size))val); \
}
/* clang-format on */
#undef ATOMIC_INLINE
#endif /* JEMALLOC_INTERNAL_ATOMIC_MSVC_H */

View file

@ -0,0 +1,38 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/background_thread_structs.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/mutex.h"
extern bool opt_background_thread;
extern size_t opt_max_background_threads;
extern malloc_mutex_t background_thread_lock;
extern atomic_b_t background_thread_enabled_state;
extern size_t n_background_threads;
extern size_t max_background_threads;
extern background_thread_info_t *background_thread_info;
bool background_thread_create(tsd_t *tsd, unsigned arena_ind);
bool background_threads_enable(tsd_t *tsd);
bool background_threads_disable(tsd_t *tsd);
bool background_thread_is_started(background_thread_info_t *info);
void background_thread_wakeup_early(
background_thread_info_t *info, nstime_t *remaining_sleep);
void background_thread_prefork0(tsdn_t *tsdn);
void background_thread_prefork1(tsdn_t *tsdn);
void background_thread_postfork_parent(tsdn_t *tsdn);
void background_thread_postfork_child(tsdn_t *tsdn);
bool background_thread_stats_read(
tsdn_t *tsdn, background_thread_stats_t *stats);
void background_thread_ctl_init(tsdn_t *tsdn);
#ifdef JEMALLOC_PTHREAD_CREATE_WRAPPER
extern int pthread_create_wrapper(pthread_t *__restrict, const pthread_attr_t *,
void *(*)(void *), void *__restrict);
#endif
bool background_thread_boot0(void);
bool background_thread_boot1(tsdn_t *tsdn, base_t *base);
#endif /* JEMALLOC_INTERNAL_BACKGROUND_THREAD_EXTERNS_H */

View file

@ -0,0 +1,58 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/background_thread_externs.h"
JEMALLOC_ALWAYS_INLINE bool
background_thread_enabled(void) {
return atomic_load_b(&background_thread_enabled_state, ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE void
background_thread_enabled_set_impl(bool state) {
atomic_store_b(&background_thread_enabled_state, state, ATOMIC_RELAXED);
}
JEMALLOC_ALWAYS_INLINE void
background_thread_enabled_set(tsdn_t *tsdn, bool state) {
malloc_mutex_assert_owner(tsdn, &background_thread_lock);
background_thread_enabled_set_impl(state);
}
JEMALLOC_ALWAYS_INLINE background_thread_info_t *
arena_background_thread_info_get(arena_t *arena) {
unsigned arena_ind = arena_ind_get(arena);
return &background_thread_info[arena_ind % max_background_threads];
}
JEMALLOC_ALWAYS_INLINE background_thread_info_t *
background_thread_info_get(size_t ind) {
return &background_thread_info[ind % max_background_threads];
}
JEMALLOC_ALWAYS_INLINE uint64_t
background_thread_wakeup_time_get(background_thread_info_t *info) {
uint64_t next_wakeup = nstime_ns(&info->next_wakeup);
assert(atomic_load_b(&info->indefinite_sleep, ATOMIC_ACQUIRE)
== (next_wakeup == BACKGROUND_THREAD_INDEFINITE_SLEEP));
return next_wakeup;
}
JEMALLOC_ALWAYS_INLINE void
background_thread_wakeup_time_set(
tsdn_t *tsdn, background_thread_info_t *info, uint64_t wakeup_time) {
malloc_mutex_assert_owner(tsdn, &info->mtx);
atomic_store_b(&info->indefinite_sleep,
wakeup_time == BACKGROUND_THREAD_INDEFINITE_SLEEP, ATOMIC_RELEASE);
nstime_init(&info->next_wakeup, wakeup_time);
}
JEMALLOC_ALWAYS_INLINE bool
background_thread_indefinite_sleep(background_thread_info_t *info) {
return atomic_load_b(&info->indefinite_sleep, ATOMIC_ACQUIRE);
}
#endif /* JEMALLOC_INTERNAL_BACKGROUND_THREAD_INLINES_H */

View file

@ -0,0 +1,69 @@
#ifndef JEMALLOC_INTERNAL_BACKGROUND_THREAD_STRUCTS_H
#define JEMALLOC_INTERNAL_BACKGROUND_THREAD_STRUCTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex.h"
/* This file really combines "structs" and "types", but only transitionally. */
#if defined(JEMALLOC_BACKGROUND_THREAD) || defined(JEMALLOC_LAZY_LOCK)
# define JEMALLOC_PTHREAD_CREATE_WRAPPER
#endif
#define BACKGROUND_THREAD_INDEFINITE_SLEEP UINT64_MAX
#define MAX_BACKGROUND_THREAD_LIMIT MALLOCX_ARENA_LIMIT
#define DEFAULT_NUM_BACKGROUND_THREAD 4
/*
* These exist only as a transitional state. Eventually, deferral should be
* part of the PAI, and each implementation can indicate wait times with more
* specificity.
*/
#define BACKGROUND_THREAD_HPA_INTERVAL_MAX_UNINITIALIZED (-2)
#define BACKGROUND_THREAD_HPA_INTERVAL_MAX_DEFAULT_WHEN_ENABLED 5000
#define BACKGROUND_THREAD_DEFERRED_MIN UINT64_C(0)
#define BACKGROUND_THREAD_DEFERRED_MAX UINT64_MAX
typedef enum {
background_thread_stopped,
background_thread_started,
/* Thread waits on the global lock when paused (for arena_reset). */
background_thread_paused,
} background_thread_state_t;
struct background_thread_info_s {
#ifdef JEMALLOC_BACKGROUND_THREAD
/* Background thread is pthread specific. */
pthread_t thread;
pthread_cond_t cond;
#endif
malloc_mutex_t mtx;
background_thread_state_t state;
/* When true, it means no wakeup scheduled. */
atomic_b_t indefinite_sleep;
/* Next scheduled wakeup time (absolute time in ns). */
nstime_t next_wakeup;
/*
* Since the last background thread run, newly added number of pages
* that need to be purged by the next wakeup. This is adjusted on
* epoch advance, and is used to determine whether we should signal the
* background thread to wake up earlier.
*/
size_t npages_to_purge_new;
/* Stats: total number of runs since started. */
uint64_t tot_n_runs;
/* Stats: total sleep time since started. */
nstime_t tot_sleep_time;
};
typedef struct background_thread_info_s background_thread_info_t;
struct background_thread_stats_s {
size_t num_threads;
uint64_t num_runs;
nstime_t run_interval;
mutex_prof_data_t max_counter_per_bg_thd;
};
typedef struct background_thread_stats_s background_thread_stats_t;
#endif /* JEMALLOC_INTERNAL_BACKGROUND_THREAD_STRUCTS_H */

View file

@ -1,25 +1,125 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#ifndef JEMALLOC_INTERNAL_BASE_H
#define JEMALLOC_INTERNAL_BASE_H
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/ehooks.h"
#include "jemalloc/internal/mutex.h"
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
/*
* Alignment when THP is not enabled. Set to constant 2M in case the HUGEPAGE
* value is unexpected high (which would cause VM over-reservation).
*/
#define BASE_BLOCK_MIN_ALIGN ((size_t)2 << 20)
void *base_alloc(tsdn_t *tsdn, size_t size);
void base_stats_get(tsdn_t *tsdn, size_t *allocated, size_t *resident,
size_t *mapped);
bool base_boot(void);
void base_prefork(tsdn_t *tsdn);
void base_postfork_parent(tsdn_t *tsdn);
void base_postfork_child(tsdn_t *tsdn);
enum metadata_thp_mode_e {
metadata_thp_disabled = 0,
/*
* Lazily enable hugepage for metadata. To avoid high RSS caused by THP
* + low usage arena (i.e. THP becomes a significant percentage), the
* "auto" option only starts using THP after a base allocator used up
* the first THP region. Starting from the second hugepage (in a single
* arena), "auto" behaves the same as "always", i.e. madvise hugepage
* right away.
*/
metadata_thp_auto = 1,
metadata_thp_always = 2,
metadata_thp_mode_limit = 3
};
typedef enum metadata_thp_mode_e metadata_thp_mode_t;
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#define METADATA_THP_DEFAULT metadata_thp_disabled
extern metadata_thp_mode_t opt_metadata_thp;
extern const char *const metadata_thp_mode_names[];
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
/* Embedded at the beginning of every block of base-managed virtual memory. */
typedef struct base_block_s base_block_t;
struct base_block_s {
/* Total size of block's virtual memory mapping. */
size_t size;
/* Next block in list of base's blocks. */
base_block_t *next;
/* Tracks unused trailing space. */
edata_t edata;
};
typedef struct base_s base_t;
struct base_s {
/*
* User-configurable extent hook functions.
*/
ehooks_t ehooks;
/*
* User-configurable extent hook functions for metadata allocations.
*/
ehooks_t ehooks_base;
/* Protects base_alloc() and base_stats_get() operations. */
malloc_mutex_t mtx;
/* Using THP when true (metadata_thp auto mode). */
bool auto_thp_switched;
/*
* Most recent size class in the series of increasingly large base
* extents. Logarithmic spacing between subsequent allocations ensures
* that the total number of distinct mappings remains small.
*/
pszind_t pind_last;
/* Serial number generation state. */
size_t extent_sn_next;
/* Chain of all blocks associated with base. */
base_block_t *blocks;
/* Heap of extents that track unused trailing space within blocks. */
edata_heap_t avail[SC_NSIZES];
/* Contains reusable base edata (used by tcache_stacks currently). */
edata_avail_t edata_avail;
/* Stats, only maintained if config_stats. */
size_t allocated;
size_t edata_allocated;
size_t rtree_allocated;
size_t resident;
size_t mapped;
/* Number of THP regions touched. */
size_t n_thp;
};
static inline unsigned
base_ind_get(const base_t *base) {
return ehooks_ind_get(&base->ehooks);
}
static inline bool
metadata_thp_enabled(void) {
return (opt_metadata_thp != metadata_thp_disabled);
}
base_t *b0get(void);
base_t *base_new(tsdn_t *tsdn, unsigned ind, const extent_hooks_t *extent_hooks,
bool metadata_use_hooks);
void base_delete(tsdn_t *tsdn, base_t *base);
ehooks_t *base_ehooks_get(base_t *base);
ehooks_t *base_ehooks_get_for_metadata(base_t *base);
extent_hooks_t *base_extent_hooks_set(
base_t *base, extent_hooks_t *extent_hooks);
void *base_alloc(tsdn_t *tsdn, base_t *base, size_t size, size_t alignment);
edata_t *base_alloc_edata(tsdn_t *tsdn, base_t *base);
void *base_alloc_rtree(tsdn_t *tsdn, base_t *base, size_t size);
void *b0_alloc_tcache_stack(tsdn_t *tsdn, size_t size);
void b0_dalloc_tcache_stack(tsdn_t *tsdn, void *tcache_stack);
void base_stats_get(tsdn_t *tsdn, base_t *base, size_t *allocated,
size_t *edata_allocated, size_t *rtree_allocated, size_t *resident,
size_t *mapped, size_t *n_thp);
void base_prefork(tsdn_t *tsdn, base_t *base);
void base_postfork_parent(tsdn_t *tsdn, base_t *base);
void base_postfork_child(tsdn_t *tsdn, base_t *base);
bool base_boot(tsdn_t *tsdn);
#endif /* JEMALLOC_INTERNAL_BASE_H */

View file

@ -0,0 +1,121 @@
#ifndef JEMALLOC_INTERNAL_BIN_H
#define JEMALLOC_INTERNAL_BIN_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bin_stats.h"
#include "jemalloc/internal/bin_types.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/sc.h"
/*
* A bin contains a set of extents that are currently being used for slab
* allocations.
*/
typedef struct bin_s bin_t;
struct bin_s {
/* All operations on bin_t fields require lock ownership. */
malloc_mutex_t lock;
/*
* Bin statistics. These get touched every time the lock is acquired,
* so put them close by in the hopes of getting some cache locality.
*/
bin_stats_t stats;
/*
* Current slab being used to service allocations of this bin's size
* class. slabcur is independent of slabs_{nonfull,full}; whenever
* slabcur is reassigned, the previous slab must be deallocated or
* inserted into slabs_{nonfull,full}.
*/
edata_t *slabcur;
/*
* Heap of non-full slabs. This heap is used to assure that new
* allocations come from the non-full slab that is oldest/lowest in
* memory.
*/
edata_heap_t slabs_nonfull;
/* List used to track full slabs. */
edata_list_active_t slabs_full;
};
/* A set of sharded bins of the same size class. */
typedef struct bins_s bins_t;
struct bins_s {
/* Sharded bins. Dynamically sized. */
bin_t *bin_shards;
};
void bin_shard_sizes_boot(unsigned bin_shard_sizes[SC_NBINS]);
bool bin_update_shard_size(unsigned bin_shards[SC_NBINS], size_t start_size,
size_t end_size, size_t nshards);
/* Initializes a bin to empty. Returns true on error. */
bool bin_init(bin_t *bin);
/* Forking. */
void bin_prefork(tsdn_t *tsdn, bin_t *bin);
void bin_postfork_parent(tsdn_t *tsdn, bin_t *bin);
void bin_postfork_child(tsdn_t *tsdn, bin_t *bin);
/* Slab region allocation. */
void *bin_slab_reg_alloc(edata_t *slab, const bin_info_t *bin_info);
void bin_slab_reg_alloc_batch(
edata_t *slab, const bin_info_t *bin_info, unsigned cnt, void **ptrs);
/* Slab list management. */
void bin_slabs_nonfull_insert(bin_t *bin, edata_t *slab);
void bin_slabs_nonfull_remove(bin_t *bin, edata_t *slab);
edata_t *bin_slabs_nonfull_tryget(bin_t *bin);
void bin_slabs_full_insert(bool is_auto, bin_t *bin, edata_t *slab);
void bin_slabs_full_remove(bool is_auto, bin_t *bin, edata_t *slab);
/* Slab association / demotion. */
void bin_dissociate_slab(bool is_auto, edata_t *slab, bin_t *bin);
void bin_lower_slab(tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
/* Deallocation helpers (called under bin lock). */
void bin_dalloc_slab_prepare(tsdn_t *tsdn, edata_t *slab, bin_t *bin);
void bin_dalloc_locked_handle_newly_empty(
tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
void bin_dalloc_locked_handle_newly_nonempty(
tsdn_t *tsdn, bool is_auto, edata_t *slab, bin_t *bin);
/* Slabcur refill and allocation. */
void bin_refill_slabcur_with_fresh_slab(tsdn_t *tsdn, bin_t *bin,
szind_t binind, edata_t *fresh_slab);
void *bin_malloc_with_fresh_slab(tsdn_t *tsdn, bin_t *bin,
szind_t binind, edata_t *fresh_slab);
bool bin_refill_slabcur_no_fresh_slab(tsdn_t *tsdn, bool is_auto,
bin_t *bin);
void *bin_malloc_no_fresh_slab(tsdn_t *tsdn, bool is_auto, bin_t *bin,
szind_t binind);
/* Bin selection. */
bin_t *bin_choose(tsdn_t *tsdn, arena_t *arena, szind_t binind,
unsigned *binshard_p);
/* Stats. */
static inline void
bin_stats_merge(tsdn_t *tsdn, bin_stats_data_t *dst_bin_stats, bin_t *bin) {
malloc_mutex_lock(tsdn, &bin->lock);
malloc_mutex_prof_accum(tsdn, &dst_bin_stats->mutex_data, &bin->lock);
bin_stats_t *stats = &dst_bin_stats->stats_data;
stats->nmalloc += bin->stats.nmalloc;
stats->ndalloc += bin->stats.ndalloc;
stats->nrequests += bin->stats.nrequests;
stats->curregs += bin->stats.curregs;
stats->nfills += bin->stats.nfills;
stats->nflushes += bin->stats.nflushes;
stats->nslabs += bin->stats.nslabs;
stats->reslabs += bin->stats.reslabs;
stats->curslabs += bin->stats.curslabs;
stats->nonfull_slabs += bin->stats.nonfull_slabs;
malloc_mutex_unlock(tsdn, &bin->lock);
}
#endif /* JEMALLOC_INTERNAL_BIN_H */

View file

@ -0,0 +1,51 @@
#ifndef JEMALLOC_INTERNAL_BIN_INFO_H
#define JEMALLOC_INTERNAL_BIN_INFO_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bitmap.h"
/*
* Read-only information associated with each element of arena_t's bins array
* is stored separately, partly to reduce memory usage (only one copy, rather
* than one per arena), but mainly to avoid false cacheline sharing.
*
* Each slab has the following layout:
*
* /--------------------\
* | region 0 |
* |--------------------|
* | region 1 |
* |--------------------|
* | ... |
* | ... |
* | ... |
* |--------------------|
* | region nregs-1 |
* \--------------------/
*/
typedef struct bin_info_s bin_info_t;
struct bin_info_s {
/* Size of regions in a slab for this bin's size class. */
size_t reg_size;
/* Total size of a slab for this bin's size class. */
size_t slab_size;
/* Total number of regions in a slab for this bin's size class. */
uint32_t nregs;
/* Number of sharded bins in each arena for this size class. */
uint32_t n_shards;
/*
* Metadata used to manipulate bitmaps for slabs associated with this
* bin.
*/
bitmap_info_t bitmap_info;
};
extern bin_info_t bin_infos[SC_NBINS];
void bin_info_boot(sc_data_t *sc_data, unsigned bin_shard_sizes[SC_NBINS]);
#endif /* JEMALLOC_INTERNAL_BIN_INFO_H */

View file

@ -0,0 +1,112 @@
#ifndef JEMALLOC_INTERNAL_BIN_INLINES_H
#define JEMALLOC_INTERNAL_BIN_INLINES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bin.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bitmap.h"
#include "jemalloc/internal/div.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/sc.h"
/*
* The dalloc bin info contains just the information that the common paths need
* during tcache flushes. By force-inlining these paths, and using local copies
* of data (so that the compiler knows it's constant), we avoid a whole bunch of
* redundant loads and stores by leaving this information in registers.
*/
typedef struct bin_dalloc_locked_info_s bin_dalloc_locked_info_t;
struct bin_dalloc_locked_info_s {
div_info_t div_info;
uint32_t nregs;
uint64_t ndalloc;
};
/* Find the region index of a pointer within a slab. */
JEMALLOC_ALWAYS_INLINE size_t
bin_slab_regind_impl(
div_info_t *div_info, szind_t binind, edata_t *slab, const void *ptr) {
size_t diff, regind;
/* Freeing a pointer outside the slab can cause assertion failure. */
assert((uintptr_t)ptr >= (uintptr_t)edata_addr_get(slab));
assert((uintptr_t)ptr < (uintptr_t)edata_past_get(slab));
/* Freeing an interior pointer can cause assertion failure. */
assert(((uintptr_t)ptr - (uintptr_t)edata_addr_get(slab))
% (uintptr_t)bin_infos[binind].reg_size
== 0);
diff = (size_t)((uintptr_t)ptr - (uintptr_t)edata_addr_get(slab));
/* Avoid doing division with a variable divisor. */
regind = div_compute(div_info, diff);
assert(regind < bin_infos[binind].nregs);
return regind;
}
JEMALLOC_ALWAYS_INLINE size_t
bin_slab_regind(bin_dalloc_locked_info_t *info, szind_t binind,
edata_t *slab, const void *ptr) {
size_t regind = bin_slab_regind_impl(
&info->div_info, binind, slab, ptr);
return regind;
}
JEMALLOC_ALWAYS_INLINE void
bin_dalloc_locked_begin(
bin_dalloc_locked_info_t *info, szind_t binind) {
info->div_info = arena_binind_div_info[binind];
info->nregs = bin_infos[binind].nregs;
info->ndalloc = 0;
}
/*
* Does the deallocation work associated with freeing a single pointer (a
* "step") in between a bin_dalloc_locked begin and end call.
*
* Returns true if arena_slab_dalloc must be called on slab. Doesn't do
* stats updates, which happen during finish (this lets running counts get left
* in a register).
*/
JEMALLOC_ALWAYS_INLINE bool
bin_dalloc_locked_step(tsdn_t *tsdn, bool is_auto, bin_t *bin,
bin_dalloc_locked_info_t *info, szind_t binind, edata_t *slab,
void *ptr) {
const bin_info_t *bin_info = &bin_infos[binind];
size_t regind = bin_slab_regind(info, binind, slab, ptr);
slab_data_t *slab_data = edata_slab_data_get(slab);
assert(edata_nfree_get(slab) < bin_info->nregs);
/* Freeing an unallocated pointer can cause assertion failure. */
assert(bitmap_get(slab_data->bitmap, &bin_info->bitmap_info, regind));
bitmap_unset(slab_data->bitmap, &bin_info->bitmap_info, regind);
edata_nfree_inc(slab);
if (config_stats) {
info->ndalloc++;
}
unsigned nfree = edata_nfree_get(slab);
if (nfree == bin_info->nregs) {
bin_dalloc_locked_handle_newly_empty(
tsdn, is_auto, slab, bin);
return true;
} else if (nfree == 1 && slab != bin->slabcur) {
bin_dalloc_locked_handle_newly_nonempty(
tsdn, is_auto, slab, bin);
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
bin_dalloc_locked_finish(tsdn_t *tsdn, bin_t *bin,
bin_dalloc_locked_info_t *info) {
if (config_stats) {
bin->stats.ndalloc += info->ndalloc;
assert(bin->stats.curregs >= (size_t)info->ndalloc);
bin->stats.curregs -= (size_t)info->ndalloc;
}
}
#endif /* JEMALLOC_INTERNAL_BIN_INLINES_H */

View file

@ -0,0 +1,58 @@
#ifndef JEMALLOC_INTERNAL_BIN_STATS_H
#define JEMALLOC_INTERNAL_BIN_STATS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex_prof.h"
typedef struct bin_stats_s bin_stats_t;
struct bin_stats_s {
/*
* Total number of allocation/deallocation requests served directly by
* the bin. Note that tcache may allocate an object, then recycle it
* many times, resulting many increments to nrequests, but only one
* each to nmalloc and ndalloc.
*/
uint64_t nmalloc;
uint64_t ndalloc;
/*
* Number of allocation requests that correspond to the size of this
* bin. This includes requests served by tcache, though tcache only
* periodically merges into this counter.
*/
uint64_t nrequests;
/*
* Current number of regions of this size class, including regions
* currently cached by tcache.
*/
size_t curregs;
/* Number of tcache fills from this bin. */
uint64_t nfills;
/* Number of tcache flushes to this bin. */
uint64_t nflushes;
/* Total number of slabs created for this bin's size class. */
uint64_t nslabs;
/*
* Total number of slabs reused by extracting them from the slabs heap
* for this bin's size class.
*/
uint64_t reslabs;
/* Current number of slabs in this bin. */
size_t curslabs;
/* Current size of nonfull slabs heap in this bin. */
size_t nonfull_slabs;
};
typedef struct bin_stats_data_s bin_stats_data_t;
struct bin_stats_data_s {
bin_stats_t stats_data;
mutex_prof_data_t mutex_data;
};
#endif /* JEMALLOC_INTERNAL_BIN_STATS_H */

View file

@ -0,0 +1,21 @@
#ifndef JEMALLOC_INTERNAL_BIN_TYPES_H
#define JEMALLOC_INTERNAL_BIN_TYPES_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sc.h"
#define BIN_SHARDS_MAX (1 << EDATA_BITS_BINSHARD_WIDTH)
#define N_BIN_SHARDS_DEFAULT 1
/* Used in TSD static initializer only. Real init in arena_bind(). */
#define TSD_BINSHARDS_ZERO_INITIALIZER \
{ \
{ UINT8_MAX } \
}
typedef struct tsd_binshards_s tsd_binshards_t;
struct tsd_binshards_s {
uint8_t binshard[SC_NBINS];
};
#endif /* JEMALLOC_INTERNAL_BIN_TYPES_H */

View file

@ -0,0 +1,431 @@
#ifndef JEMALLOC_INTERNAL_BIT_UTIL_H
#define JEMALLOC_INTERNAL_BIT_UTIL_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/* Sanity check. */
#if !defined(JEMALLOC_INTERNAL_FFSLL) || !defined(JEMALLOC_INTERNAL_FFSL) \
|| !defined(JEMALLOC_INTERNAL_FFS)
# error JEMALLOC_INTERNAL_FFS{,L,LL} should have been defined by configure
#endif
/*
* Unlike the builtins and posix ffs functions, our ffs requires a non-zero
* input, and returns the position of the lowest bit set (as opposed to the
* posix versions, which return 1 larger than that position and use a return
* value of zero as a sentinel. This tends to simplify logic in callers, and
* allows for consistency with the builtins we build fls on top of.
*/
static inline unsigned
ffs_llu(unsigned long long x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFSLL(x) - 1;
}
static inline unsigned
ffs_lu(unsigned long x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFSL(x) - 1;
}
static inline unsigned
ffs_u(unsigned x) {
util_assume(x != 0);
return JEMALLOC_INTERNAL_FFS(x) - 1;
}
/* clang-format off */
#define DO_FLS_SLOW(x, suffix) do { \
util_assume(x != 0); \
x |= (x >> 1); \
x |= (x >> 2); \
x |= (x >> 4); \
x |= (x >> 8); \
x |= (x >> 16); \
if (sizeof(x) > 4) { \
/* \
* If sizeof(x) is 4, then the expression "x >> 32" \
* will generate compiler warnings even if the code \
* never executes. This circumvents the warning, and \
* gets compiled out in optimized builds. \
*/ \
int constant_32 = sizeof(x) * 4; \
x |= (x >> constant_32); \
} \
x++; \
if (x == 0) { \
return 8 * sizeof(x) - 1; \
} \
return ffs_##suffix(x) - 1; \
} while(0)
/* clang-format on */
static inline unsigned
fls_llu_slow(unsigned long long x) {
DO_FLS_SLOW(x, llu);
}
static inline unsigned
fls_lu_slow(unsigned long x) {
DO_FLS_SLOW(x, lu);
}
static inline unsigned
fls_u_slow(unsigned x) {
DO_FLS_SLOW(x, u);
}
#undef DO_FLS_SLOW
#ifdef JEMALLOC_HAVE_BUILTIN_CLZ
static inline unsigned
fls_llu(unsigned long long x) {
util_assume(x != 0);
/*
* Note that the xor here is more naturally written as subtraction; the
* last bit set is the number of bits in the type minus the number of
* leading zero bits. But GCC implements that as:
* bsr edi, edi
* mov eax, 31
* xor edi, 31
* sub eax, edi
* If we write it as xor instead, then we get
* bsr eax, edi
* as desired.
*/
return (8 * sizeof(x) - 1) ^ __builtin_clzll(x);
}
static inline unsigned
fls_lu(unsigned long x) {
util_assume(x != 0);
return (8 * sizeof(x) - 1) ^ __builtin_clzl(x);
}
static inline unsigned
fls_u(unsigned x) {
util_assume(x != 0);
return (8 * sizeof(x) - 1) ^ __builtin_clz(x);
}
#elif defined(_MSC_VER)
# if LG_SIZEOF_PTR == 3
# define DO_BSR64(bit, x) _BitScanReverse64(&bit, x)
# else
/*
* This never actually runs; we're just dodging a compiler error for the
* never-taken branch where sizeof(void *) == 8.
*/
# define DO_BSR64(bit, x) \
bit = 0; \
unreachable()
# endif
/* clang-format off */
#define DO_FLS(x) do { \
if (x == 0) { \
return 8 * sizeof(x); \
} \
unsigned long bit; \
if (sizeof(x) == 4) { \
_BitScanReverse(&bit, (unsigned)x); \
return (unsigned)bit; \
} \
if (sizeof(x) == 8 && sizeof(void *) == 8) { \
DO_BSR64(bit, x); \
return (unsigned)bit; \
} \
if (sizeof(x) == 8 && sizeof(void *) == 4) { \
/* Dodge a compiler warning, as above. */ \
int constant_32 = sizeof(x) * 4; \
if (_BitScanReverse(&bit, \
(unsigned)(x >> constant_32))) { \
return 32 + (unsigned)bit; \
} else { \
_BitScanReverse(&bit, (unsigned)x); \
return (unsigned)bit; \
} \
} \
unreachable(); \
} while (0)
/* clang-format on */
static inline unsigned
fls_llu(unsigned long long x) {
DO_FLS(x);
}
static inline unsigned
fls_lu(unsigned long x) {
DO_FLS(x);
}
static inline unsigned
fls_u(unsigned x) {
DO_FLS(x);
}
# undef DO_FLS
# undef DO_BSR64
#else
static inline unsigned
fls_llu(unsigned long long x) {
return fls_llu_slow(x);
}
static inline unsigned
fls_lu(unsigned long x) {
return fls_lu_slow(x);
}
static inline unsigned
fls_u(unsigned x) {
return fls_u_slow(x);
}
#endif
#if LG_SIZEOF_LONG_LONG > 3
# error "Haven't implemented popcount for 16-byte ints."
#endif
/* clang-format off */
#define DO_POPCOUNT(x, type) do { \
/* \
* Algorithm from an old AMD optimization reference manual. \
* We're putting a little bit more work than you might expect \
* into the no-instrinsic case, since we only support the \
* GCC intrinsics spelling of popcount (for now). Detecting \
* whether or not the popcount builtin is actually useable in \
* MSVC is nontrivial. \
*/ \
\
type bmul = (type)0x0101010101010101ULL; \
\
/* \
* Replace each 2 bits with the sideways sum of the original \
* values. 0x5 = 0b0101. \
* \
* You might expect this to be: \
* x = (x & 0x55...) + ((x >> 1) & 0x55...). \
* That costs an extra mask relative to this, though. \
*/ \
x = x - ((x >> 1) & (0x55U * bmul)); \
/* Replace each 4 bits with their sideays sum. 0x3 = 0b0011. */\
x = (x & (bmul * 0x33U)) + ((x >> 2) & (bmul * 0x33U)); \
/* \
* Replace each 8 bits with their sideways sum. Note that we \
* can't overflow within each 4-bit sum here, so we can skip \
* the initial mask. \
*/ \
x = (x + (x >> 4)) & (bmul * 0x0FU); \
/* \
* None of the partial sums in this multiplication (viewed in \
* base-256) can overflow into the next digit. So the least \
* significant byte of the product will be the least \
* significant byte of the original value, the second least \
* significant byte will be the sum of the two least \
* significant bytes of the original value, and so on. \
* Importantly, the high byte will be the byte-wise sum of all \
* the bytes of the original value. \
*/ \
x = x * bmul; \
x >>= ((sizeof(x) - 1) * 8); \
return (unsigned)x; \
} while(0)
/* clang-format on */
static inline unsigned
popcount_u_slow(unsigned bitmap) {
DO_POPCOUNT(bitmap, unsigned);
}
static inline unsigned
popcount_lu_slow(unsigned long bitmap) {
DO_POPCOUNT(bitmap, unsigned long);
}
static inline unsigned
popcount_llu_slow(unsigned long long bitmap) {
DO_POPCOUNT(bitmap, unsigned long long);
}
#undef DO_POPCOUNT
static inline unsigned
popcount_u(unsigned bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNT
return JEMALLOC_INTERNAL_POPCOUNT(bitmap);
#else
return popcount_u_slow(bitmap);
#endif
}
static inline unsigned
popcount_lu(unsigned long bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNTL
return JEMALLOC_INTERNAL_POPCOUNTL(bitmap);
#else
return popcount_lu_slow(bitmap);
#endif
}
static inline unsigned
popcount_llu(unsigned long long bitmap) {
#ifdef JEMALLOC_INTERNAL_POPCOUNTLL
return JEMALLOC_INTERNAL_POPCOUNTLL(bitmap);
#else
return popcount_llu_slow(bitmap);
#endif
}
/*
* Clears first unset bit in bitmap, and returns
* place of bit. bitmap *must not* be 0.
*/
static inline size_t
cfs_lu(unsigned long *bitmap) {
util_assume(*bitmap != 0);
size_t bit = ffs_lu(*bitmap);
*bitmap ^= ZU(1) << bit;
return bit;
}
static inline unsigned
ffs_zu(size_t x) {
#if LG_SIZEOF_PTR == LG_SIZEOF_INT
return ffs_u(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG
return ffs_lu(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG_LONG
return ffs_llu(x);
#else
# error No implementation for size_t ffs()
#endif
}
static inline unsigned
fls_zu(size_t x) {
#if LG_SIZEOF_PTR == LG_SIZEOF_INT
return fls_u(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG
return fls_lu(x);
#elif LG_SIZEOF_PTR == LG_SIZEOF_LONG_LONG
return fls_llu(x);
#else
# error No implementation for size_t fls()
#endif
}
static inline unsigned
ffs_u64(uint64_t x) {
#if LG_SIZEOF_LONG == 3
return ffs_lu(x);
#elif LG_SIZEOF_LONG_LONG == 3
return ffs_llu(x);
#else
# error No implementation for 64-bit ffs()
#endif
}
static inline unsigned
fls_u64(uint64_t x) {
#if LG_SIZEOF_LONG == 3
return fls_lu(x);
#elif LG_SIZEOF_LONG_LONG == 3
return fls_llu(x);
#else
# error No implementation for 64-bit fls()
#endif
}
static inline unsigned
ffs_u32(uint32_t x) {
#if LG_SIZEOF_INT == 2
return ffs_u(x);
#else
# error No implementation for 32-bit ffs()
#endif
}
static inline unsigned
fls_u32(uint32_t x) {
#if LG_SIZEOF_INT == 2
return fls_u(x);
#else
# error No implementation for 32-bit fls()
#endif
}
static inline uint64_t
pow2_ceil_u64(uint64_t x) {
if (unlikely(x <= 1)) {
return x;
}
size_t msb_on_index = fls_u64(x - 1);
/*
* Range-check; it's on the callers to ensure that the result of this
* call won't overflow.
*/
assert(msb_on_index < 63);
return 1ULL << (msb_on_index + 1);
}
static inline uint32_t
pow2_ceil_u32(uint32_t x) {
if (unlikely(x <= 1)) {
return x;
}
size_t msb_on_index = fls_u32(x - 1);
/* As above. */
assert(msb_on_index < 31);
return 1U << (msb_on_index + 1);
}
/* Compute the smallest power of 2 that is >= x. */
static inline size_t
pow2_ceil_zu(size_t x) {
#if (LG_SIZEOF_PTR == 3)
return pow2_ceil_u64(x);
#else
return pow2_ceil_u32(x);
#endif
}
static inline unsigned
lg_floor(size_t x) {
util_assume(x != 0);
#if (LG_SIZEOF_PTR == 3)
return fls_u64(x);
#else
return fls_u32(x);
#endif
}
static inline unsigned
lg_ceil(size_t x) {
return lg_floor(x) + ((x & (x - 1)) == 0 ? 0 : 1);
}
/* A compile-time version of lg_floor and lg_ceil. */
#define LG_FLOOR_1(x) 0
#define LG_FLOOR_2(x) (x < (1ULL << 1) ? LG_FLOOR_1(x) : 1 + LG_FLOOR_1(x >> 1))
#define LG_FLOOR_4(x) (x < (1ULL << 2) ? LG_FLOOR_2(x) : 2 + LG_FLOOR_2(x >> 2))
#define LG_FLOOR_8(x) (x < (1ULL << 4) ? LG_FLOOR_4(x) : 4 + LG_FLOOR_4(x >> 4))
#define LG_FLOOR_16(x) \
(x < (1ULL << 8) ? LG_FLOOR_8(x) : 8 + LG_FLOOR_8(x >> 8))
#define LG_FLOOR_32(x) \
(x < (1ULL << 16) ? LG_FLOOR_16(x) : 16 + LG_FLOOR_16(x >> 16))
#define LG_FLOOR_64(x) \
(x < (1ULL << 32) ? LG_FLOOR_32(x) : 32 + LG_FLOOR_32(x >> 32))
#if LG_SIZEOF_PTR == 2
# define LG_FLOOR(x) LG_FLOOR_32((x))
#else
# define LG_FLOOR(x) LG_FLOOR_64((x))
#endif
#define LG_CEIL(x) (LG_FLOOR(x) + (((x) & ((x) - 1)) == 0 ? 0 : 1))
#endif /* JEMALLOC_INTERNAL_BIT_UTIL_H */

View file

@ -1,19 +1,27 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#ifndef JEMALLOC_INTERNAL_BITMAP_H
#define JEMALLOC_INTERNAL_BITMAP_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/sc.h"
typedef unsigned long bitmap_t;
#define LG_SIZEOF_BITMAP LG_SIZEOF_LONG
/* Maximum bitmap bit count is 2^LG_BITMAP_MAXBITS. */
#define LG_BITMAP_MAXBITS LG_RUN_MAXREGS
#define BITMAP_MAXBITS (ZU(1) << LG_BITMAP_MAXBITS)
typedef struct bitmap_level_s bitmap_level_t;
typedef struct bitmap_info_s bitmap_info_t;
typedef unsigned long bitmap_t;
#define LG_SIZEOF_BITMAP LG_SIZEOF_LONG
#if SC_LG_SLAB_MAXREGS > LG_CEIL(SC_NSIZES)
/* Maximum bitmap bit count is determined by maximum regions per slab. */
# define LG_BITMAP_MAXBITS SC_LG_SLAB_MAXREGS
#else
/* Maximum bitmap bit count is determined by number of extent size classes. */
# define LG_BITMAP_MAXBITS LG_CEIL(SC_NSIZES)
#endif
#define BITMAP_MAXBITS (ZU(1) << LG_BITMAP_MAXBITS)
/* Number of bits per group. */
#define LG_BITMAP_GROUP_NBITS (LG_SIZEOF_BITMAP + 3)
#define BITMAP_GROUP_NBITS (ZU(1) << LG_BITMAP_GROUP_NBITS)
#define BITMAP_GROUP_NBITS_MASK (BITMAP_GROUP_NBITS-1)
#define LG_BITMAP_GROUP_NBITS (LG_SIZEOF_BITMAP + 3)
#define BITMAP_GROUP_NBITS (1U << LG_BITMAP_GROUP_NBITS)
#define BITMAP_GROUP_NBITS_MASK (BITMAP_GROUP_NBITS - 1)
/*
* Do some analysis on how big the bitmap is before we use a tree. For a brute
@ -21,81 +29,139 @@ typedef unsigned long bitmap_t;
* use a tree instead.
*/
#if LG_BITMAP_MAXBITS - LG_BITMAP_GROUP_NBITS > 3
# define USE_TREE
# define BITMAP_USE_TREE
#endif
/* Number of groups required to store a given number of bits. */
#define BITMAP_BITS2GROUPS(nbits) \
((nbits + BITMAP_GROUP_NBITS_MASK) >> LG_BITMAP_GROUP_NBITS)
#define BITMAP_BITS2GROUPS(nbits) \
(((nbits) + BITMAP_GROUP_NBITS_MASK) >> LG_BITMAP_GROUP_NBITS)
/*
* Number of groups required at a particular level for a given number of bits.
*/
#define BITMAP_GROUPS_L0(nbits) \
BITMAP_BITS2GROUPS(nbits)
#define BITMAP_GROUPS_L1(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(nbits))
#define BITMAP_GROUPS_L2(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))
#define BITMAP_GROUPS_L3(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS((nbits)))))
#define BITMAP_GROUPS_L0(nbits) BITMAP_BITS2GROUPS(nbits)
#define BITMAP_GROUPS_L1(nbits) BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(nbits))
#define BITMAP_GROUPS_L2(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))
#define BITMAP_GROUPS_L3(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits)))))
#define BITMAP_GROUPS_L4(nbits) \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS( \
BITMAP_BITS2GROUPS(BITMAP_BITS2GROUPS((nbits))))))
/*
* Assuming the number of levels, number of groups required for a given number
* of bits.
*/
#define BITMAP_GROUPS_1_LEVEL(nbits) \
BITMAP_GROUPS_L0(nbits)
#define BITMAP_GROUPS_2_LEVEL(nbits) \
(BITMAP_GROUPS_1_LEVEL(nbits) + BITMAP_GROUPS_L1(nbits))
#define BITMAP_GROUPS_3_LEVEL(nbits) \
(BITMAP_GROUPS_2_LEVEL(nbits) + BITMAP_GROUPS_L2(nbits))
#define BITMAP_GROUPS_4_LEVEL(nbits) \
(BITMAP_GROUPS_3_LEVEL(nbits) + BITMAP_GROUPS_L3(nbits))
#define BITMAP_GROUPS_1_LEVEL(nbits) BITMAP_GROUPS_L0(nbits)
#define BITMAP_GROUPS_2_LEVEL(nbits) \
(BITMAP_GROUPS_1_LEVEL(nbits) + BITMAP_GROUPS_L1(nbits))
#define BITMAP_GROUPS_3_LEVEL(nbits) \
(BITMAP_GROUPS_2_LEVEL(nbits) + BITMAP_GROUPS_L2(nbits))
#define BITMAP_GROUPS_4_LEVEL(nbits) \
(BITMAP_GROUPS_3_LEVEL(nbits) + BITMAP_GROUPS_L3(nbits))
#define BITMAP_GROUPS_5_LEVEL(nbits) \
(BITMAP_GROUPS_4_LEVEL(nbits) + BITMAP_GROUPS_L4(nbits))
/*
* Maximum number of groups required to support LG_BITMAP_MAXBITS.
*/
#ifdef USE_TREE
#ifdef BITMAP_USE_TREE
#if LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_1_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 2
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_2_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 3
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_3_LEVEL(BITMAP_MAXBITS)
#elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 4
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_4_LEVEL(BITMAP_MAXBITS)
#else
# error "Unsupported bitmap size"
#endif
# if LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_1_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_1_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 2
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_2_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_2_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 3
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_3_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_3_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 4
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_4_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_4_LEVEL(BITMAP_MAXBITS)
# elif LG_BITMAP_MAXBITS <= LG_BITMAP_GROUP_NBITS * 5
# define BITMAP_GROUPS(nbits) BITMAP_GROUPS_5_LEVEL(nbits)
# define BITMAP_GROUPS_MAX BITMAP_GROUPS_5_LEVEL(BITMAP_MAXBITS)
# else
# error "Unsupported bitmap size"
# endif
/* Maximum number of levels possible. */
#define BITMAP_MAX_LEVELS \
(LG_BITMAP_MAXBITS / LG_SIZEOF_BITMAP) \
+ !!(LG_BITMAP_MAXBITS % LG_SIZEOF_BITMAP)
/*
* Maximum number of levels possible. This could be statically computed based
* on LG_BITMAP_MAXBITS:
*
* #define BITMAP_MAX_LEVELS \
* (LG_BITMAP_MAXBITS / LG_SIZEOF_BITMAP) \
* + !!(LG_BITMAP_MAXBITS % LG_SIZEOF_BITMAP)
*
* However, that would not allow the generic BITMAP_INFO_INITIALIZER() macro, so
* instead hardcode BITMAP_MAX_LEVELS to the largest number supported by the
* various cascading macros. The only additional cost this incurs is some
* unused trailing entries in bitmap_info_t structures; the bitmaps themselves
* are not impacted.
*/
# define BITMAP_MAX_LEVELS 5
#else /* USE_TREE */
# define BITMAP_INFO_INITIALIZER(nbits) \
{ \
/* nbits. */ \
nbits, /* nlevels. */ \
(BITMAP_GROUPS_L0(nbits) \
> BITMAP_GROUPS_L1(nbits)) \
+ (BITMAP_GROUPS_L1(nbits) \
> BITMAP_GROUPS_L2(nbits)) \
+ (BITMAP_GROUPS_L2(nbits) \
> BITMAP_GROUPS_L3(nbits)) \
+ (BITMAP_GROUPS_L3(nbits) \
> BITMAP_GROUPS_L4(nbits)) \
+ 1, /* levels. */ \
{ \
{0}, {BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{BITMAP_GROUPS_L3(nbits) \
+ BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits)}, \
{ \
BITMAP_GROUPS_L4(nbits) \
+ BITMAP_GROUPS_L3(nbits) \
+ BITMAP_GROUPS_L2(nbits) \
+ BITMAP_GROUPS_L1(nbits) \
+ BITMAP_GROUPS_L0(nbits) \
} \
} \
}
#define BITMAP_GROUPS_MAX BITMAP_BITS2GROUPS(BITMAP_MAXBITS)
#else /* BITMAP_USE_TREE */
#endif /* USE_TREE */
# define BITMAP_GROUPS(nbits) BITMAP_BITS2GROUPS(nbits)
# define BITMAP_GROUPS_MAX BITMAP_BITS2GROUPS(BITMAP_MAXBITS)
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
# define BITMAP_INFO_INITIALIZER(nbits) \
{ \
/* nbits. */ \
nbits, /* ngroups. */ \
BITMAP_BITS2GROUPS(nbits) \
}
struct bitmap_level_s {
#endif /* BITMAP_USE_TREE */
typedef struct bitmap_level_s {
/* Offset of this level's groups within the array of groups. */
size_t group_offset;
};
} bitmap_level_t;
struct bitmap_info_s {
typedef struct bitmap_info_s {
/* Logical number of bits in bitmap (stored at bottom level). */
size_t nbits;
#ifdef USE_TREE
#ifdef BITMAP_USE_TREE
/* Number of levels necessary for nbits. */
unsigned nlevels;
@ -103,39 +169,21 @@ struct bitmap_info_s {
* Only the first (nlevels+1) elements are used, and levels are ordered
* bottom to top (e.g. the bottom level is stored in levels[0]).
*/
bitmap_level_t levels[BITMAP_MAX_LEVELS+1];
#else /* USE_TREE */
bitmap_level_t levels[BITMAP_MAX_LEVELS + 1];
#else /* BITMAP_USE_TREE */
/* Number of groups necessary for nbits. */
size_t ngroups;
#endif /* USE_TREE */
};
#endif /* BITMAP_USE_TREE */
} bitmap_info_t;
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
void bitmap_info_init(bitmap_info_t *binfo, size_t nbits);
void bitmap_init(bitmap_t *bitmap, const bitmap_info_t *binfo, bool fill);
size_t bitmap_size(const bitmap_info_t *binfo);
void bitmap_info_init(bitmap_info_t *binfo, size_t nbits);
void bitmap_init(bitmap_t *bitmap, const bitmap_info_t *binfo);
size_t bitmap_size(const bitmap_info_t *binfo);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#ifndef JEMALLOC_ENABLE_INLINE
bool bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo);
bool bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
void bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
size_t bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo);
void bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit);
#endif
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_BITMAP_C_))
JEMALLOC_INLINE bool
bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo)
{
#ifdef USE_TREE
size_t rgoff = binfo->levels[binfo->nlevels].group_offset - 1;
static inline bool
bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo) {
#ifdef BITMAP_USE_TREE
size_t rgoff = binfo->levels[binfo->nlevels].group_offset - 1;
bitmap_t rg = bitmap[rgoff];
/* The bitmap is full iff the root group is 0. */
return (rg == 0);
@ -143,31 +191,30 @@ bitmap_full(bitmap_t *bitmap, const bitmap_info_t *binfo)
size_t i;
for (i = 0; i < binfo->ngroups; i++) {
if (bitmap[i] != 0)
return (false);
if (bitmap[i] != 0) {
return false;
}
}
return (true);
return true;
#endif
}
JEMALLOC_INLINE bool
bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
{
size_t goff;
static inline bool
bitmap_get(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
bitmap_t g;
assert(bit < binfo->nbits);
goff = bit >> LG_BITMAP_GROUP_NBITS;
g = bitmap[goff];
return (!(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK))));
return !(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK)));
}
JEMALLOC_INLINE void
bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
{
size_t goff;
static inline void
bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
bitmap_t *gp;
bitmap_t g;
bitmap_t g;
assert(bit < binfo->nbits);
assert(!bitmap_get(bitmap, binfo, bit));
@ -178,7 +225,7 @@ bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
*gp = g;
assert(bitmap_get(bitmap, binfo, bit));
#ifdef USE_TREE
#ifdef BITMAP_USE_TREE
/* Propagate group state transitions up the tree. */
if (g == 0) {
unsigned i;
@ -190,51 +237,113 @@ bitmap_set(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
assert(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK)));
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
*gp = g;
if (g != 0)
if (g != 0) {
break;
}
}
}
#endif
}
/* sfu: set first unset. */
JEMALLOC_INLINE size_t
bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo)
{
/* ffu: find first unset >= bit. */
static inline size_t
bitmap_ffu(const bitmap_t *bitmap, const bitmap_info_t *binfo, size_t min_bit) {
assert(min_bit < binfo->nbits);
#ifdef BITMAP_USE_TREE
size_t bit = 0;
for (unsigned level = binfo->nlevels; level--;) {
size_t lg_bits_per_group = (LG_BITMAP_GROUP_NBITS
* (level + 1));
bitmap_t group = bitmap[binfo->levels[level].group_offset
+ (bit >> lg_bits_per_group)];
unsigned group_nmask =
(unsigned)(((min_bit > bit) ? (min_bit - bit) : 0)
>> (lg_bits_per_group - LG_BITMAP_GROUP_NBITS));
assert(group_nmask <= BITMAP_GROUP_NBITS);
bitmap_t group_mask = ~((1LU << group_nmask) - 1);
bitmap_t group_masked = group & group_mask;
if (group_masked == 0LU) {
if (group == 0LU) {
return binfo->nbits;
}
/*
* min_bit was preceded by one or more unset bits in
* this group, but there are no other unset bits in this
* group. Try again starting at the first bit of the
* next sibling. This will recurse at most once per
* non-root level.
*/
size_t sib_base = bit + (ZU(1) << lg_bits_per_group);
assert(sib_base > min_bit);
assert(sib_base > bit);
if (sib_base >= binfo->nbits) {
return binfo->nbits;
}
return bitmap_ffu(bitmap, binfo, sib_base);
}
bit += ((size_t)ffs_lu(group_masked))
<< (lg_bits_per_group - LG_BITMAP_GROUP_NBITS);
}
assert(bit >= min_bit);
assert(bit < binfo->nbits);
return bit;
#else
size_t i = min_bit >> LG_BITMAP_GROUP_NBITS;
bitmap_t g = bitmap[i]
& ~((1LU << (min_bit & BITMAP_GROUP_NBITS_MASK)) - 1);
size_t bit;
while (1) {
if (g != 0) {
bit = ffs_lu(g);
return (i << LG_BITMAP_GROUP_NBITS) + bit;
}
i++;
if (i >= binfo->ngroups) {
break;
}
g = bitmap[i];
}
return binfo->nbits;
#endif
}
/* sfu: set first unset. */
static inline size_t
bitmap_sfu(bitmap_t *bitmap, const bitmap_info_t *binfo) {
size_t bit;
bitmap_t g;
unsigned i;
assert(!bitmap_full(bitmap, binfo));
#ifdef USE_TREE
#ifdef BITMAP_USE_TREE
i = binfo->nlevels - 1;
g = bitmap[binfo->levels[i].group_offset];
bit = ffs_lu(g) - 1;
bit = ffs_lu(g);
while (i > 0) {
i--;
g = bitmap[binfo->levels[i].group_offset + bit];
bit = (bit << LG_BITMAP_GROUP_NBITS) + (ffs_lu(g) - 1);
bit = (bit << LG_BITMAP_GROUP_NBITS) + ffs_lu(g);
}
#else
i = 0;
g = bitmap[0];
while ((bit = ffs_lu(g)) == 0) {
while (g == 0) {
i++;
g = bitmap[i];
}
bit = (i << LG_BITMAP_GROUP_NBITS) + (bit - 1);
bit = (i << LG_BITMAP_GROUP_NBITS) + ffs_lu(g);
#endif
bitmap_set(bitmap, binfo, bit);
return (bit);
return bit;
}
JEMALLOC_INLINE void
bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
{
size_t goff;
bitmap_t *gp;
bitmap_t g;
static inline void
bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit) {
size_t goff;
bitmap_t *gp;
bitmap_t g;
UNUSED bool propagate;
assert(bit < binfo->nbits);
@ -247,7 +356,7 @@ bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
*gp = g;
assert(!bitmap_get(bitmap, binfo, bit));
#ifdef USE_TREE
#ifdef BITMAP_USE_TREE
/* Propagate group state transitions up the tree. */
if (propagate) {
unsigned i;
@ -261,14 +370,12 @@ bitmap_unset(bitmap_t *bitmap, const bitmap_info_t *binfo, size_t bit)
== 0);
g ^= ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK);
*gp = g;
if (!propagate)
if (!propagate) {
break;
}
}
}
#endif /* USE_TREE */
#endif /* BITMAP_USE_TREE */
}
#endif
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#endif /* JEMALLOC_INTERNAL_BITMAP_H */

View file

@ -0,0 +1,36 @@
#ifndef JEMALLOC_INTERNAL_BUF_WRITER_H
#define JEMALLOC_INTERNAL_BUF_WRITER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/tsd_types.h"
/*
* Note: when using the buffered writer, cbopaque is passed to write_cb only
* when the buffer is flushed. It would make a difference if cbopaque points
* to something that's changing for each write_cb call, or something that
* affects write_cb in a way dependent on the content of the output string.
* However, the most typical usage case in practice is that cbopaque points to
* some "option like" content for the write_cb, so it doesn't matter.
*/
typedef struct {
write_cb_t *write_cb;
void *cbopaque;
char *buf;
size_t buf_size;
size_t buf_end;
bool internal_buf;
} buf_writer_t;
bool buf_writer_init(tsdn_t *tsdn, buf_writer_t *buf_writer,
write_cb_t *write_cb, void *cbopaque, char *buf, size_t buf_len);
void buf_writer_flush(buf_writer_t *buf_writer);
write_cb_t buf_writer_cb;
void buf_writer_terminate(tsdn_t *tsdn, buf_writer_t *buf_writer);
typedef ssize_t(read_cb_t)(void *read_cbopaque, void *buf, size_t limit);
void buf_writer_pipe(
buf_writer_t *buf_writer, read_cb_t *read_cb, void *read_cbopaque);
#endif /* JEMALLOC_INTERNAL_BUF_WRITER_H */

View file

@ -0,0 +1,777 @@
#ifndef JEMALLOC_INTERNAL_CACHE_BIN_H
#define JEMALLOC_INTERNAL_CACHE_BIN_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/jemalloc_internal_externs.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/safety_check.h"
#include "jemalloc/internal/sz.h"
/*
* The cache_bins are the mechanism that the tcache and the arena use to
* communicate. The tcache fills from and flushes to the arena by passing a
* cache_bin_t to fill/flush. When the arena needs to pull stats from the
* tcaches associated with it, it does so by iterating over its
* cache_bin_array_descriptor_t objects and reading out per-bin stats it
* contains. This makes it so that the arena need not know about the existence
* of the tcache at all.
*/
/*
* The size in bytes of each cache bin stack. We also use this to indicate
* *counts* of individual objects.
*/
typedef uint16_t cache_bin_sz_t;
#define JUNK_ADDR ((uintptr_t)0x7a7a7a7a7a7a7a7aULL)
/*
* Leave a noticeable mark pattern on the cache bin stack boundaries, in case a
* bug starts leaking those. Make it look like the junk pattern but be distinct
* from it.
*/
static const uintptr_t cache_bin_preceding_junk = JUNK_ADDR;
/* Note: JUNK_ADDR vs. JUNK_ADDR + 1 -- this tells you which pointer leaked. */
static const uintptr_t cache_bin_trailing_junk = JUNK_ADDR + 1;
/*
* A pointer used to initialize a fake stack_head for disabled small bins
* so that the enabled/disabled assessment does not rely on ncached_max.
*/
extern const uintptr_t disabled_bin;
/*
* That implies the following value, for the maximum number of items in any
* individual bin. The cache bins track their bounds looking just at the low
* bits of a pointer, compared against a cache_bin_sz_t. So that's
* 1 << (sizeof(cache_bin_sz_t) * 8)
* bytes spread across pointer sized objects to get the maximum.
*/
#define CACHE_BIN_NCACHED_MAX \
(((size_t)1 << sizeof(cache_bin_sz_t) * 8) / sizeof(void *) - 1)
/*
* This lives inside the cache_bin (for locality reasons), and is initialized
* alongside it, but is otherwise not modified by any cache bin operations.
* It's logically public and maintained by its callers.
*/
typedef struct cache_bin_stats_s cache_bin_stats_t;
struct cache_bin_stats_s {
/*
* Number of allocation requests that corresponded to the size of this
* bin.
*/
uint64_t nrequests;
};
/*
* Read-only information associated with each element of tcache_t's tbins array
* is stored separately, mainly to reduce memory usage.
*/
typedef struct cache_bin_info_s cache_bin_info_t;
struct cache_bin_info_s {
cache_bin_sz_t ncached_max;
};
/*
* Responsible for caching allocations associated with a single size.
*
* Several pointers are used to track the stack. To save on metadata bytes,
* only the stack_head is a full sized pointer (which is dereferenced on the
* fastpath), while the others store only the low 16 bits -- this is correct
* because a single stack never takes more space than 2^16 bytes, and at the
* same time only equality checks are performed on the low bits.
*
* (low addr) (high addr)
* |------stashed------|------available------|------cached-----|
* ^ ^ ^ ^
* low_bound(derived) low_bits_full stack_head low_bits_empty
*/
typedef struct cache_bin_s cache_bin_t;
struct cache_bin_s {
/*
* The stack grows down. Whenever the bin is nonempty, the head points
* to an array entry containing a valid allocation. When it is empty,
* the head points to one element past the owned array.
*/
void **stack_head;
/*
* cur_ptr and stats are both modified frequently. Let's keep them
* close so that they have a higher chance of being on the same
* cacheline, thus less write-backs.
*/
cache_bin_stats_t tstats;
/*
* The low bits of the address of the first item in the stack that
* hasn't been used since the last GC, to track the low water mark (min
* # of cached items).
*
* Since the stack grows down, this is a higher address than
* low_bits_full.
*/
cache_bin_sz_t low_bits_low_water;
/*
* The low bits of the value that stack_head will take on when the array
* is full (of cached & stashed items). But remember that stack_head
* always points to a valid item when the array is nonempty -- this is
* in the array.
*
* Recall that since the stack grows down, this is the lowest available
* address in the array for caching. Only adjusted when stashing items.
*/
cache_bin_sz_t low_bits_full;
/*
* The low bits of the value that stack_head will take on when the array
* is empty.
*
* The stack grows down -- this is one past the highest address in the
* array. Immutable after initialization.
*/
cache_bin_sz_t low_bits_empty;
/* The maximum number of cached items in the bin. */
cache_bin_info_t bin_info;
};
/*
* The cache_bins live inside the tcache, but the arena (by design) isn't
* supposed to know much about tcache internals. To let the arena iterate over
* associated bins, we keep (with the tcache) a linked list of
* cache_bin_array_descriptor_ts that tell the arena how to find the bins.
*/
typedef struct cache_bin_array_descriptor_s cache_bin_array_descriptor_t;
struct cache_bin_array_descriptor_s {
/*
* The arena keeps a list of the cache bins associated with it, for
* stats collection.
*/
ql_elm(cache_bin_array_descriptor_t) link;
/* Pointers to the tcache bins. */
cache_bin_t *bins;
};
static inline void
cache_bin_array_descriptor_init(
cache_bin_array_descriptor_t *descriptor, cache_bin_t *bins) {
ql_elm_new(descriptor, link);
descriptor->bins = bins;
}
JEMALLOC_ALWAYS_INLINE bool
cache_bin_nonfast_aligned(const void *ptr) {
if (!config_uaf_detection) {
return false;
}
/*
* Currently we use alignment to decide which pointer to junk & stash on
* dealloc (for catching use-after-free). In some common cases a
* page-aligned check is needed already (sdalloc w/ config_prof), so we
* are getting it more or less for free -- no added instructions on
* free_fastpath.
*
* Another way of deciding which pointer to sample, is adding another
* thread_event to pick one every N bytes. That also adds no cost on
* the fastpath, however it will tend to pick large allocations which is
* not the desired behavior.
*/
return ((uintptr_t)ptr & san_cache_bin_nonfast_mask) == 0;
}
static inline const void *
cache_bin_disabled_bin_stack(void) {
return &disabled_bin;
}
/*
* If a cache bin was zero initialized (either because it lives in static or
* thread-local storage, or was memset to 0), this function indicates whether or
* not cache_bin_init was called on it.
*/
static inline bool
cache_bin_still_zero_initialized(cache_bin_t *bin) {
return bin->stack_head == NULL;
}
static inline bool
cache_bin_disabled(cache_bin_t *bin) {
bool disabled = (bin->stack_head == cache_bin_disabled_bin_stack());
if (disabled) {
assert((uintptr_t)(*bin->stack_head) == JUNK_ADDR);
}
return disabled;
}
/* Gets ncached_max without asserting that the bin is enabled. */
static inline cache_bin_sz_t
cache_bin_ncached_max_get_unsafe(cache_bin_t *bin) {
return bin->bin_info.ncached_max;
}
/* Returns ncached_max: Upper limit on ncached. */
static inline cache_bin_sz_t
cache_bin_ncached_max_get(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
return cache_bin_ncached_max_get_unsafe(bin);
}
/*
* Internal.
*
* Asserts that the pointer associated with earlier is <= the one associated
* with later.
*/
static inline void
cache_bin_assert_earlier(
cache_bin_t *bin, cache_bin_sz_t earlier, cache_bin_sz_t later) {
if (earlier > later) {
assert(bin->low_bits_full > bin->low_bits_empty);
}
}
/*
* Internal.
*
* Does difference calculations that handle wraparound correctly. Earlier must
* be associated with the position earlier in memory.
*/
static inline cache_bin_sz_t
cache_bin_diff(cache_bin_t *bin, cache_bin_sz_t earlier, cache_bin_sz_t later) {
cache_bin_assert_earlier(bin, earlier, later);
return later - earlier;
}
/*
* Number of items currently cached in the bin, without checking ncached_max.
*/
static inline cache_bin_sz_t
cache_bin_ncached_get_internal(cache_bin_t *bin) {
cache_bin_sz_t diff = cache_bin_diff(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head, bin->low_bits_empty);
cache_bin_sz_t n = diff / sizeof(void *);
/*
* We have undefined behavior here; if this function is called from the
* arena stats updating code, then stack_head could change from the
* first line to the next one. Morally, these loads should be atomic,
* but compilers won't currently generate comparisons with in-memory
* operands against atomics, and these variables get accessed on the
* fast paths. This should still be "safe" in the sense of generating
* the correct assembly for the foreseeable future, though.
*/
assert(n == 0 || *(bin->stack_head) != NULL);
return n;
}
/*
* Number of items currently cached in the bin, with checking ncached_max. The
* caller must know that no concurrent modification of the cache_bin is
* possible.
*/
static inline cache_bin_sz_t
cache_bin_ncached_get_local(cache_bin_t *bin) {
cache_bin_sz_t n = cache_bin_ncached_get_internal(bin);
assert(n <= cache_bin_ncached_max_get(bin));
return n;
}
/*
* Internal.
*
* A pointer to the position one past the end of the backing array.
*
* Do not call if racy, because both 'bin->stack_head' and 'bin->low_bits_full'
* are subject to concurrent modifications.
*/
static inline void **
cache_bin_empty_position_get(cache_bin_t *bin) {
cache_bin_sz_t diff = cache_bin_diff(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head, bin->low_bits_empty);
byte_t *empty_bits = (byte_t *)bin->stack_head + diff;
void **ret = (void **)empty_bits;
assert(ret >= bin->stack_head);
return ret;
}
/*
* Internal.
*
* Calculates low bits of the lower bound of the usable cache bin's range (see
* cache_bin_t visual representation above).
*
* No values are concurrently modified, so should be safe to read in a
* multithreaded environment. Currently concurrent access happens only during
* arena statistics collection.
*/
static inline cache_bin_sz_t
cache_bin_low_bits_low_bound_get(cache_bin_t *bin) {
return (cache_bin_sz_t)bin->low_bits_empty
- cache_bin_ncached_max_get(bin) * sizeof(void *);
}
/*
* Internal.
*
* A pointer to the position with the lowest address of the backing array.
*/
static inline void **
cache_bin_low_bound_get(cache_bin_t *bin) {
cache_bin_sz_t ncached_max = cache_bin_ncached_max_get(bin);
void **ret = cache_bin_empty_position_get(bin) - ncached_max;
assert(ret <= bin->stack_head);
return ret;
}
/*
* As the name implies. This is important since it's not correct to try to
* batch fill a nonempty cache bin.
*/
static inline void
cache_bin_assert_empty(cache_bin_t *bin) {
assert(cache_bin_ncached_get_local(bin) == 0);
assert(cache_bin_empty_position_get(bin) == bin->stack_head);
}
/*
* Get low water, but without any of the correctness checking we do for the
* caller-usable version, if we are temporarily breaking invariants (like
* ncached >= low_water during flush).
*/
static inline cache_bin_sz_t
cache_bin_low_water_get_internal(cache_bin_t *bin) {
return cache_bin_diff(bin, bin->low_bits_low_water, bin->low_bits_empty)
/ sizeof(void *);
}
/* Returns the numeric value of low water in [0, ncached]. */
static inline cache_bin_sz_t
cache_bin_low_water_get(cache_bin_t *bin) {
cache_bin_sz_t low_water = cache_bin_low_water_get_internal(bin);
assert(low_water <= cache_bin_ncached_max_get(bin));
assert(low_water <= cache_bin_ncached_get_local(bin));
cache_bin_assert_earlier(bin,
(cache_bin_sz_t)(uintptr_t)bin->stack_head,
bin->low_bits_low_water);
return low_water;
}
/*
* Indicates that the current cache bin position should be the low water mark
* going forward.
*/
static inline void
cache_bin_low_water_set(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
bin->low_bits_low_water = (cache_bin_sz_t)(uintptr_t)bin->stack_head;
}
static inline void
cache_bin_low_water_adjust(cache_bin_t *bin) {
assert(!cache_bin_disabled(bin));
if (cache_bin_ncached_get_internal(bin)
< cache_bin_low_water_get_internal(bin)) {
cache_bin_low_water_set(bin);
}
}
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc_impl(cache_bin_t *bin, bool *success, bool adjust_low_water) {
/*
* success (instead of ret) should be checked upon the return of this
* function. We avoid checking (ret == NULL) because there is never a
* null stored on the avail stack (which is unknown to the compiler),
* and eagerly checking ret would cause pipeline stall (waiting for the
* cacheline).
*/
/*
* This may read from the empty position; however the loaded value won't
* be used. It's safe because the stack has one more slot reserved.
*/
void *ret = *bin->stack_head;
cache_bin_sz_t low_bits = (cache_bin_sz_t)(uintptr_t)bin->stack_head;
void **new_head = bin->stack_head + 1;
/*
* Note that the low water mark is at most empty; if we pass this check,
* we know we're non-empty.
*/
if (likely(low_bits != bin->low_bits_low_water)) {
bin->stack_head = new_head;
*success = true;
return ret;
}
if (!adjust_low_water) {
*success = false;
return NULL;
}
/*
* In the fast-path case where we call alloc_easy and then alloc, the
* previous checking and computation is optimized away -- we didn't
* actually commit any of our operations.
*/
if (likely(low_bits != bin->low_bits_empty)) {
bin->stack_head = new_head;
bin->low_bits_low_water = (cache_bin_sz_t)(uintptr_t)new_head;
*success = true;
return ret;
}
*success = false;
return NULL;
}
/*
* Allocate an item out of the bin, failing if we're at the low-water mark.
*/
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc_easy(cache_bin_t *bin, bool *success) {
/* We don't look at info if we're not adjusting low-water. */
return cache_bin_alloc_impl(bin, success, false);
}
/*
* Allocate an item out of the bin, even if we're currently at the low-water
* mark (and failing only if the bin is empty).
*/
JEMALLOC_ALWAYS_INLINE void *
cache_bin_alloc(cache_bin_t *bin, bool *success) {
return cache_bin_alloc_impl(bin, success, true);
}
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_alloc_batch(cache_bin_t *bin, size_t num, void **out) {
cache_bin_sz_t n = cache_bin_ncached_get_internal(bin);
if (n > num) {
n = (cache_bin_sz_t)num;
}
memcpy(out, bin->stack_head, n * sizeof(void *));
bin->stack_head += n;
cache_bin_low_water_adjust(bin);
return n;
}
JEMALLOC_ALWAYS_INLINE bool
cache_bin_full(cache_bin_t *bin) {
return (
(cache_bin_sz_t)(uintptr_t)bin->stack_head == bin->low_bits_full);
}
/*
* Scans the allocated area of the cache_bin for the given pointer up to limit.
* Fires safety_check_fail if the ptr is found and returns true.
*/
JEMALLOC_ALWAYS_INLINE bool
cache_bin_dalloc_safety_checks(cache_bin_t *bin, void *ptr) {
if (!config_debug || opt_debug_double_free_max_scan == 0) {
return false;
}
cache_bin_sz_t ncached = cache_bin_ncached_get_internal(bin);
unsigned max_scan = opt_debug_double_free_max_scan < ncached
? opt_debug_double_free_max_scan
: ncached;
void **cur = bin->stack_head;
void **limit = cur + max_scan;
for (; cur < limit; cur++) {
if (*cur == ptr) {
safety_check_fail(
"Invalid deallocation detected: double free of "
"pointer %p\n",
ptr);
return true;
}
}
return false;
}
/*
* Free an object into the given bin. Fails only if the bin is full.
*/
JEMALLOC_ALWAYS_INLINE bool
cache_bin_dalloc_easy(cache_bin_t *bin, void *ptr) {
if (unlikely(cache_bin_full(bin))) {
return false;
}
if (unlikely(cache_bin_dalloc_safety_checks(bin, ptr))) {
return true;
}
bin->stack_head--;
*bin->stack_head = ptr;
cache_bin_assert_earlier(bin, bin->low_bits_full,
(cache_bin_sz_t)(uintptr_t)bin->stack_head);
return true;
}
/* Returns false if failed to stash (i.e. bin is full). */
JEMALLOC_ALWAYS_INLINE bool
cache_bin_stash(cache_bin_t *bin, void *ptr) {
if (cache_bin_full(bin)) {
return false;
}
/* Stash at the full position, in the [full, head) range. */
cache_bin_sz_t low_bits_head = (cache_bin_sz_t)(uintptr_t)
bin->stack_head;
/* Wraparound handled as well. */
cache_bin_sz_t diff = cache_bin_diff(
bin, bin->low_bits_full, low_bits_head);
*(void **)((byte_t *)bin->stack_head - diff) = ptr;
assert(!cache_bin_full(bin));
bin->low_bits_full += sizeof(void *);
cache_bin_assert_earlier(bin, bin->low_bits_full, low_bits_head);
return true;
}
/* Get the number of stashed pointers. */
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_nstashed_get_internal(cache_bin_t *bin) {
cache_bin_sz_t ncached_max = cache_bin_ncached_max_get(bin);
cache_bin_sz_t low_bits_low_bound = cache_bin_low_bits_low_bound_get(
bin);
cache_bin_sz_t n = cache_bin_diff(
bin, low_bits_low_bound, bin->low_bits_full)
/ sizeof(void *);
assert(n <= ncached_max);
if (config_debug && n != 0) {
/* Below are for assertions only. */
void **low_bound = cache_bin_low_bound_get(bin);
assert(
(cache_bin_sz_t)(uintptr_t)low_bound == low_bits_low_bound);
void *stashed = *(low_bound + n - 1);
bool aligned = cache_bin_nonfast_aligned(stashed);
#ifdef JEMALLOC_JET
/* Allow arbitrary pointers to be stashed in tests. */
aligned = true;
#endif
assert(stashed != NULL && aligned);
}
return n;
}
JEMALLOC_ALWAYS_INLINE cache_bin_sz_t
cache_bin_nstashed_get_local(cache_bin_t *bin) {
cache_bin_sz_t n = cache_bin_nstashed_get_internal(bin);
assert(n <= cache_bin_ncached_max_get(bin));
return n;
}
/*
* Obtain a racy view of the number of items currently in the cache bin, in the
* presence of possible concurrent modifications.
*
* Note that this is the only racy function in this header. Any other functions
* are assumed to be non-racy. The "racy" term here means accessed from another
* thread (that is not the owner of the specific cache bin). This only happens
* when gathering stats (read-only). The only change because of the racy
* condition is that assertions based on mutable fields are omitted.
*
* It's important to keep in mind that 'bin->stack_head' and
* 'bin->low_bits_full' can be modified concurrently and almost no assertions
* about their values can be made.
*
* This function should not call other utility functions because the racy
* condition may cause unexpected / undefined behaviors in unverified utility
* functions. Currently, this function calls two utility functions
* cache_bin_ncached_max_get and cache_bin_low_bits_low_bound_get because
* they help access values that will not be concurrently modified.
*/
static inline void
cache_bin_nitems_get_remote(
cache_bin_t *bin, cache_bin_sz_t *ncached, cache_bin_sz_t *nstashed) {
/* Racy version of cache_bin_ncached_get_internal. */
cache_bin_sz_t diff = bin->low_bits_empty
- (cache_bin_sz_t)(uintptr_t)bin->stack_head;
cache_bin_sz_t n = diff / sizeof(void *);
*ncached = n;
/* Racy version of cache_bin_nstashed_get_internal. */
cache_bin_sz_t low_bits_low_bound = cache_bin_low_bits_low_bound_get(
bin);
n = (bin->low_bits_full - low_bits_low_bound) / sizeof(void *);
*nstashed = n;
/*
* Note that cannot assert anything regarding ncached_max because
* it can be configured on the fly and is thus racy.
*/
}
/*
* For small bins, used to calculate how many items to fill at a time.
* The final nfill is calculated by (ncached_max >> (base - offset)).
*/
typedef struct cache_bin_fill_ctl_s cache_bin_fill_ctl_t;
struct cache_bin_fill_ctl_s {
uint8_t base;
uint8_t offset;
};
/*
* Limit how many items can be flushed in a batch (Which is the upper bound
* for the nflush parameter in tcache_bin_flush_impl()).
* This is to avoid stack overflow when we do batch edata look up, which
* reserves a nflush * sizeof(emap_batch_lookup_result_t) stack variable.
*/
#define CACHE_BIN_NFLUSH_BATCH_MAX \
((VARIABLE_ARRAY_SIZE_MAX >> LG_SIZEOF_PTR) - 1)
/*
* Filling and flushing are done in batch, on arrays of void *s. For filling,
* the arrays go forward, and can be accessed with ordinary array arithmetic.
* For flushing, we work from the end backwards, and so need to use special
* accessors that invert the usual ordering.
*
* This is important for maintaining first-fit; the arena code fills with
* earliest objects first, and so those are the ones we should return first for
* cache_bin_alloc calls. When flushing, we should flush the objects that we
* wish to return later; those at the end of the array. This is better for the
* first-fit heuristic as well as for cache locality; the most recently freed
* objects are the ones most likely to still be in cache.
*
* This all sounds very hand-wavey and theoretical, but reverting the ordering
* on one or the other pathway leads to measurable slowdowns.
*/
typedef struct cache_bin_ptr_array_s cache_bin_ptr_array_t;
struct cache_bin_ptr_array_s {
cache_bin_sz_t n;
void **ptr;
};
/*
* Declare a cache_bin_ptr_array_t sufficient for nval items.
*
* In the current implementation, this could be just part of a
* cache_bin_ptr_array_init_... call, since we reuse the cache bin stack memory.
* Indirecting behind a macro, though, means experimenting with linked-list
* representations is easy (since they'll require an alloca in the calling
* frame).
*/
#define CACHE_BIN_PTR_ARRAY_DECLARE(name, nval) \
cache_bin_ptr_array_t name; \
name.n = (nval)
/*
* Start a fill. The bin must be empty, and This must be followed by a
* finish_fill call before doing any alloc/dalloc operations on the bin.
*/
static inline void
cache_bin_init_ptr_array_for_fill(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nfill) {
cache_bin_assert_empty(bin);
arr->ptr = cache_bin_empty_position_get(bin) - nfill;
}
/*
* While nfill in cache_bin_init_ptr_array_for_fill is the number we *intend* to
* fill, nfilled here is the number we actually filled (which may be less, in
* case of OOM.
*/
static inline void
cache_bin_finish_fill(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nfilled) {
cache_bin_assert_empty(bin);
void **empty_position = cache_bin_empty_position_get(bin);
if (nfilled < arr->n) {
memmove(empty_position - nfilled, empty_position - arr->n,
nfilled * sizeof(void *));
}
bin->stack_head = empty_position - nfilled;
/* Reset the bin stats as it's merged during fill. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
/*
* Same deal, but with flush. Unlike fill (which can fail), the user must flush
* everything we give them.
*/
static inline void
cache_bin_init_ptr_array_for_flush(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nflush) {
arr->ptr = cache_bin_empty_position_get(bin) - nflush;
assert(cache_bin_ncached_get_local(bin) == 0 || *arr->ptr != NULL);
}
static inline void
cache_bin_finish_flush(
cache_bin_t *bin, cache_bin_ptr_array_t *arr, cache_bin_sz_t nflushed) {
unsigned rem = cache_bin_ncached_get_local(bin) - nflushed;
memmove(
bin->stack_head + nflushed, bin->stack_head, rem * sizeof(void *));
bin->stack_head += nflushed;
cache_bin_low_water_adjust(bin);
/* Reset the bin stats as it's merged during flush. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
static inline void
cache_bin_init_ptr_array_for_stashed(cache_bin_t *bin, szind_t binind,
cache_bin_ptr_array_t *arr, cache_bin_sz_t nstashed) {
assert(nstashed > 0);
assert(cache_bin_nstashed_get_local(bin) == nstashed);
void **low_bound = cache_bin_low_bound_get(bin);
arr->ptr = low_bound;
assert(*arr->ptr != NULL);
}
static inline void
cache_bin_finish_flush_stashed(cache_bin_t *bin) {
void **low_bound = cache_bin_low_bound_get(bin);
/* Reset the bin local full position. */
bin->low_bits_full = (uint16_t)(uintptr_t)low_bound;
assert(cache_bin_nstashed_get_local(bin) == 0);
/* Reset the bin stats as it's merged during flush. */
if (config_stats) {
bin->tstats.nrequests = 0;
}
}
/*
* Initialize a cache_bin_info to represent up to the given number of items in
* the cache_bins it is associated with.
*/
void cache_bin_info_init(
cache_bin_info_t *bin_info, cache_bin_sz_t ncached_max);
/*
* Given an array of initialized cache_bin_info_ts, determine how big an
* allocation is required to initialize a full set of cache_bin_ts.
*/
void cache_bin_info_compute_alloc(const cache_bin_info_t *infos, szind_t ninfos,
size_t *size, size_t *alignment);
/*
* Actually initialize some cache bins. Callers should allocate the backing
* memory indicated by a call to cache_bin_compute_alloc. They should then
* preincrement, call init once for each bin and info, and then call
* cache_bin_postincrement. *alloc_cur will then point immediately past the end
* of the allocation.
*/
void cache_bin_preincrement(const cache_bin_info_t *infos, szind_t ninfos,
void *alloc, size_t *cur_offset);
void cache_bin_postincrement(void *alloc, size_t *cur_offset);
void cache_bin_init(cache_bin_t *bin, const cache_bin_info_t *info, void *alloc,
size_t *cur_offset);
void cache_bin_init_disabled(cache_bin_t *bin, cache_bin_sz_t ncached_max);
bool cache_bin_stack_use_thp(void);
#endif /* JEMALLOC_INTERNAL_CACHE_BIN_H */

View file

@ -1,96 +0,0 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
/*
* Size and alignment of memory chunks that are allocated by the OS's virtual
* memory system.
*/
#define LG_CHUNK_DEFAULT 21
/* Return the chunk address for allocation address a. */
#define CHUNK_ADDR2BASE(a) \
((void *)((uintptr_t)(a) & ~chunksize_mask))
/* Return the chunk offset of address a. */
#define CHUNK_ADDR2OFFSET(a) \
((size_t)((uintptr_t)(a) & chunksize_mask))
/* Return the smallest chunk multiple that is >= s. */
#define CHUNK_CEILING(s) \
(((s) + chunksize_mask) & ~chunksize_mask)
#define CHUNK_HOOKS_INITIALIZER { \
NULL, \
NULL, \
NULL, \
NULL, \
NULL, \
NULL, \
NULL \
}
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
extern size_t opt_lg_chunk;
extern const char *opt_dss;
extern rtree_t chunks_rtree;
extern size_t chunksize;
extern size_t chunksize_mask; /* (chunksize - 1). */
extern size_t chunk_npages;
extern const chunk_hooks_t chunk_hooks_default;
chunk_hooks_t chunk_hooks_get(tsdn_t *tsdn, arena_t *arena);
chunk_hooks_t chunk_hooks_set(tsdn_t *tsdn, arena_t *arena,
const chunk_hooks_t *chunk_hooks);
bool chunk_register(tsdn_t *tsdn, const void *chunk,
const extent_node_t *node);
void chunk_deregister(const void *chunk, const extent_node_t *node);
void *chunk_alloc_base(size_t size);
void *chunk_alloc_cache(tsdn_t *tsdn, arena_t *arena,
chunk_hooks_t *chunk_hooks, void *new_addr, size_t size, size_t alignment,
bool *zero, bool *commit, bool dalloc_node);
void *chunk_alloc_wrapper(tsdn_t *tsdn, arena_t *arena,
chunk_hooks_t *chunk_hooks, void *new_addr, size_t size, size_t alignment,
bool *zero, bool *commit);
void chunk_dalloc_cache(tsdn_t *tsdn, arena_t *arena,
chunk_hooks_t *chunk_hooks, void *chunk, size_t size, bool committed);
void chunk_dalloc_wrapper(tsdn_t *tsdn, arena_t *arena,
chunk_hooks_t *chunk_hooks, void *chunk, size_t size, bool zeroed,
bool committed);
bool chunk_purge_wrapper(tsdn_t *tsdn, arena_t *arena,
chunk_hooks_t *chunk_hooks, void *chunk, size_t size, size_t offset,
size_t length);
bool chunk_boot(void);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#ifndef JEMALLOC_ENABLE_INLINE
extent_node_t *chunk_lookup(const void *chunk, bool dependent);
#endif
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_CHUNK_C_))
JEMALLOC_INLINE extent_node_t *
chunk_lookup(const void *ptr, bool dependent)
{
return (rtree_get(&chunks_rtree, (uintptr_t)ptr, dependent));
}
#endif
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#include "jemalloc/internal/chunk_dss.h"
#include "jemalloc/internal/chunk_mmap.h"

View file

@ -1,37 +0,0 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
typedef enum {
dss_prec_disabled = 0,
dss_prec_primary = 1,
dss_prec_secondary = 2,
dss_prec_limit = 3
} dss_prec_t;
#define DSS_PREC_DEFAULT dss_prec_secondary
#define DSS_DEFAULT "secondary"
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
extern const char *dss_prec_names[];
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
dss_prec_t chunk_dss_prec_get(void);
bool chunk_dss_prec_set(dss_prec_t dss_prec);
void *chunk_alloc_dss(tsdn_t *tsdn, arena_t *arena, void *new_addr,
size_t size, size_t alignment, bool *zero, bool *commit);
bool chunk_in_dss(void *chunk);
bool chunk_dss_mergeable(void *chunk_a, void *chunk_b);
void chunk_dss_boot(void);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/

View file

@ -1,21 +0,0 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
void *chunk_alloc_mmap(void *new_addr, size_t size, size_t alignment,
bool *zero, bool *commit);
bool chunk_dalloc_mmap(void *chunk, size_t size);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/

View file

@ -1,86 +1,102 @@
#ifndef JEMALLOC_INTERNAL_CKH_H
#define JEMALLOC_INTERNAL_CKH_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd.h"
/* Cuckoo hashing implementation. Skip to the end for the interface. */
/******************************************************************************/
/* INTERNAL DEFINITIONS -- IGNORE */
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
typedef struct ckh_s ckh_t;
typedef struct ckhc_s ckhc_t;
/* Typedefs to allow easy function pointer passing. */
typedef void ckh_hash_t (const void *, size_t[2]);
typedef bool ckh_keycomp_t (const void *, const void *);
/* Maintain counters used to get an idea of performance. */
/* #define CKH_COUNT */
/* #define CKH_COUNT */
/* Print counter values in ckh_delete() (requires CKH_COUNT). */
/* #define CKH_VERBOSE */
/* #define CKH_VERBOSE */
/*
* There are 2^LG_CKH_BUCKET_CELLS cells in each hash table bucket. Try to fit
* one bucket per L1 cache line.
*/
#define LG_CKH_BUCKET_CELLS (LG_CACHELINE - LG_SIZEOF_PTR - 1)
#define LG_CKH_BUCKET_CELLS (LG_CACHELINE - LG_SIZEOF_PTR - 1)
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
/* Typedefs to allow easy function pointer passing. */
typedef void ckh_hash_t(const void *, size_t[2]);
typedef bool ckh_keycomp_t(const void *, const void *);
/* Hash table cell. */
struct ckhc_s {
const void *key;
const void *data;
};
typedef struct {
const void *key;
const void *data;
} ckhc_t;
struct ckh_s {
/* The hash table itself. */
typedef struct {
#ifdef CKH_COUNT
/* Counters used to get an idea of performance. */
uint64_t ngrows;
uint64_t nshrinks;
uint64_t nshrinkfails;
uint64_t ninserts;
uint64_t nrelocs;
uint64_t ngrows;
uint64_t nshrinks;
uint64_t nshrinkfails;
uint64_t ninserts;
uint64_t nrelocs;
#endif
/* Used for pseudo-random number generation. */
uint64_t prng_state;
uint64_t prng_state;
/* Total number of items. */
size_t count;
size_t count;
/*
* Minimum and current number of hash table buckets. There are
* 2^LG_CKH_BUCKET_CELLS cells per bucket.
*/
unsigned lg_minbuckets;
unsigned lg_curbuckets;
unsigned lg_minbuckets;
unsigned lg_curbuckets;
/* Hash and comparison functions. */
ckh_hash_t *hash;
ckh_keycomp_t *keycomp;
ckh_hash_t *hash;
ckh_keycomp_t *keycomp;
/* Hash table with 2^lg_curbuckets buckets. */
ckhc_t *tab;
};
ckhc_t *tab;
} ckh_t;
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
/* BEGIN PUBLIC API */
/******************************************************************************/
bool ckh_new(tsd_t *tsd, ckh_t *ckh, size_t minitems, ckh_hash_t *hash,
/* Lifetime management. Minitems is the initial capacity. */
bool ckh_new(tsd_t *tsd, ckh_t *ckh, size_t minitems, ckh_hash_t *hash,
ckh_keycomp_t *keycomp);
void ckh_delete(tsd_t *tsd, ckh_t *ckh);
size_t ckh_count(ckh_t *ckh);
bool ckh_iter(ckh_t *ckh, size_t *tabind, void **key, void **data);
bool ckh_insert(tsd_t *tsd, ckh_t *ckh, const void *key, const void *data);
bool ckh_remove(tsd_t *tsd, ckh_t *ckh, const void *searchkey, void **key,
void **data);
bool ckh_search(ckh_t *ckh, const void *searchkey, void **key, void **data);
void ckh_string_hash(const void *key, size_t r_hash[2]);
bool ckh_string_keycomp(const void *k1, const void *k2);
void ckh_pointer_hash(const void *key, size_t r_hash[2]);
bool ckh_pointer_keycomp(const void *k1, const void *k2);
void ckh_delete(tsd_t *tsd, ckh_t *ckh);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
/* Get the number of elements in the set. */
size_t ckh_count(ckh_t *ckh);
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
/*
* To iterate over the elements in the table, initialize *tabind to 0 and call
* this function until it returns true. Each call that returns false will
* update *key and *data to the next element in the table, assuming the pointers
* are non-NULL.
*/
bool ckh_iter(ckh_t *ckh, size_t *tabind, void **key, void **data);
/*
* Basic hash table operations -- insert, removal, lookup. For ckh_remove and
* ckh_search, key or data can be NULL. The hash-table only stores pointers to
* the key and value, and doesn't do any lifetime management.
*/
bool ckh_insert(tsd_t *tsd, ckh_t *ckh, const void *key, const void *data);
bool ckh_remove(
tsd_t *tsd, ckh_t *ckh, const void *searchkey, void **key, void **data);
bool ckh_search(ckh_t *ckh, const void *searchkey, void **key, void **data);
/* Some useful hash and comparison functions for strings and pointers. */
void ckh_string_hash(const void *key, size_t r_hash[2]);
bool ckh_string_keycomp(const void *k1, const void *k2);
void ckh_pointer_hash(const void *key, size_t r_hash[2]);
bool ckh_pointer_keycomp(const void *k1, const void *k2);
#endif /* JEMALLOC_INTERNAL_CKH_H */

View file

@ -0,0 +1,23 @@
#ifndef JEMALLOC_INTERNAL_CONF_H
#define JEMALLOC_INTERNAL_CONF_H
#include "jemalloc/internal/sc.h"
void malloc_conf_init(sc_data_t *sc_data, unsigned bin_shard_sizes[SC_NBINS],
char readlink_buf[PATH_MAX + 1]);
void malloc_abort_invalid_conf(void);
#ifdef JEMALLOC_JET
extern bool had_conf_error;
bool conf_next(char const **opts_p, char const **k_p, size_t *klen_p,
char const **v_p, size_t *vlen_p);
void conf_error(
const char *msg, const char *k, size_t klen, const char *v, size_t vlen);
bool conf_handle_bool(const char *v, size_t vlen, bool *result);
bool conf_handle_signed(const char *v, size_t vlen, intmax_t min, intmax_t max,
bool check_min, bool check_max, bool clip, intmax_t *result);
bool conf_handle_char_p(const char *v, size_t vlen, char *dest, size_t dest_sz);
#endif
#endif /* JEMALLOC_INTERNAL_CONF_H */

View file

@ -0,0 +1,36 @@
#ifndef JEMALLOC_INTERNAL_COUNTER_H
#define JEMALLOC_INTERNAL_COUNTER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/lockedint.h"
#include "jemalloc/internal/mutex.h"
typedef struct counter_accum_s {
LOCKEDINT_MTX_DECLARE(mtx)
locked_u64_t accumbytes;
uint64_t interval;
} counter_accum_t;
JEMALLOC_ALWAYS_INLINE bool
counter_accum(tsdn_t *tsdn, counter_accum_t *counter, uint64_t bytes) {
uint64_t interval = counter->interval;
assert(interval > 0);
LOCKEDINT_MTX_LOCK(tsdn, counter->mtx);
/*
* If the event moves fast enough (and/or if the event handling is slow
* enough), extreme overflow can cause counter trigger coalescing.
* This is an intentional mechanism that avoids rate-limiting
* allocation.
*/
bool overflow = locked_inc_mod_u64(tsdn, LOCKEDINT_MTX(counter->mtx),
&counter->accumbytes, bytes, interval);
LOCKEDINT_MTX_UNLOCK(tsdn, counter->mtx);
return overflow;
}
bool counter_accum_init(counter_accum_t *counter, uint64_t interval);
void counter_prefork(tsdn_t *tsdn, counter_accum_t *counter);
void counter_postfork_parent(tsdn_t *tsdn, counter_accum_t *counter);
void counter_postfork_child(tsdn_t *tsdn, counter_accum_t *counter);
#endif /* JEMALLOC_INTERNAL_COUNTER_H */

View file

@ -1,118 +1,172 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#ifndef JEMALLOC_INTERNAL_CTL_H
#define JEMALLOC_INTERNAL_CTL_H
typedef struct ctl_node_s ctl_node_t;
typedef struct ctl_named_node_s ctl_named_node_t;
typedef struct ctl_indexed_node_s ctl_indexed_node_t;
typedef struct ctl_arena_stats_s ctl_arena_stats_t;
typedef struct ctl_stats_s ctl_stats_t;
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_stats.h"
#include "jemalloc/internal/background_thread_structs.h"
#include "jemalloc/internal/bin_stats.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/mutex_prof.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/stats.h"
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
/* Maximum ctl tree depth. */
#define CTL_MAX_DEPTH 7
#define CTL_MULTI_SETTING_MAX_LEN 1000
struct ctl_node_s {
bool named;
};
typedef struct ctl_node_s {
bool named;
} ctl_node_t;
struct ctl_named_node_s {
struct ctl_node_s node;
const char *name;
typedef struct ctl_named_node_s {
ctl_node_t node;
const char *name;
/* If (nchildren == 0), this is a terminal node. */
unsigned nchildren;
const ctl_node_t *children;
int (*ctl)(tsd_t *, const size_t *, size_t, void *,
size_t *, void *, size_t);
};
size_t nchildren;
const ctl_node_t *children;
int (*ctl)(
tsd_t *, const size_t *, size_t, void *, size_t *, void *, size_t);
} ctl_named_node_t;
struct ctl_indexed_node_s {
struct ctl_node_s node;
const ctl_named_node_t *(*index)(tsdn_t *, const size_t *, size_t,
size_t);
};
typedef struct ctl_indexed_node_s {
struct ctl_node_s node;
const ctl_named_node_t *(*index)(
tsdn_t *, const size_t *, size_t, size_t);
} ctl_indexed_node_t;
struct ctl_arena_stats_s {
bool initialized;
unsigned nthreads;
const char *dss;
ssize_t lg_dirty_mult;
ssize_t decay_time;
size_t pactive;
size_t pdirty;
/* The remainder are only populated if config_stats is true. */
arena_stats_t astats;
typedef struct ctl_arena_stats_s {
arena_stats_t astats;
/* Aggregate stats for small size classes, based on bin stats. */
size_t allocated_small;
uint64_t nmalloc_small;
uint64_t ndalloc_small;
uint64_t nrequests_small;
size_t allocated_small;
uint64_t nmalloc_small;
uint64_t ndalloc_small;
uint64_t nrequests_small;
uint64_t nfills_small;
uint64_t nflushes_small;
malloc_bin_stats_t bstats[NBINS];
malloc_large_stats_t *lstats; /* nlclasses elements. */
malloc_huge_stats_t *hstats; /* nhclasses elements. */
bin_stats_data_t bstats[SC_NBINS];
arena_stats_large_t lstats[SC_NSIZES - SC_NBINS];
pac_estats_t estats[SC_NPSIZES];
hpa_shard_stats_t hpastats;
} ctl_arena_stats_t;
typedef struct ctl_stats_s {
size_t allocated;
size_t active;
size_t metadata;
size_t metadata_edata;
size_t metadata_rtree;
size_t metadata_thp;
size_t resident;
size_t mapped;
size_t retained;
background_thread_stats_t background_thread;
mutex_prof_data_t mutex_prof_data[mutex_prof_num_global_mutexes];
} ctl_stats_t;
typedef struct ctl_arena_s ctl_arena_t;
struct ctl_arena_s {
unsigned arena_ind;
bool initialized;
ql_elm(ctl_arena_t) destroyed_link;
/* Basic stats, supported even if !config_stats. */
unsigned nthreads;
const char *dss;
ssize_t dirty_decay_ms;
ssize_t muzzy_decay_ms;
size_t pactive;
size_t pdirty;
size_t pmuzzy;
/* NULL if !config_stats. */
ctl_arena_stats_t *astats;
};
struct ctl_stats_s {
size_t allocated;
size_t active;
size_t metadata;
size_t resident;
size_t mapped;
size_t retained;
unsigned narenas;
ctl_arena_stats_t *arenas; /* (narenas + 1) elements. */
};
typedef struct ctl_arenas_s {
uint64_t epoch;
unsigned narenas;
ql_head(ctl_arena_t) destroyed;
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
/*
* Element 0 corresponds to merged stats for extant arenas (accessed via
* MALLCTL_ARENAS_ALL), element 1 corresponds to merged stats for
* destroyed arenas (accessed via MALLCTL_ARENAS_DESTROYED), and the
* remaining MALLOCX_ARENA_LIMIT elements correspond to arenas.
*/
ctl_arena_t *arenas[2 + MALLOCX_ARENA_LIMIT];
} ctl_arenas_t;
int ctl_byname(tsd_t *tsd, const char *name, void *oldp, size_t *oldlenp,
int ctl_byname(tsd_t *tsd, const char *name, void *oldp, size_t *oldlenp,
void *newp, size_t newlen);
int ctl_nametomib(tsdn_t *tsdn, const char *name, size_t *mibp,
size_t *miblenp);
int ctl_bymib(tsd_t *tsd, const size_t *mib, size_t miblen, void *oldp,
int ctl_nametomib(tsd_t *tsd, const char *name, size_t *mibp, size_t *miblenp);
int ctl_bymib(tsd_t *tsd, const size_t *mib, size_t miblen, void *oldp,
size_t *oldlenp, void *newp, size_t newlen);
bool ctl_boot(void);
void ctl_prefork(tsdn_t *tsdn);
void ctl_postfork_parent(tsdn_t *tsdn);
void ctl_postfork_child(tsdn_t *tsdn);
int ctl_mibnametomib(
tsd_t *tsd, size_t *mib, size_t miblen, const char *name, size_t *miblenp);
int ctl_bymibname(tsd_t *tsd, size_t *mib, size_t miblen, const char *name,
size_t *miblenp, void *oldp, size_t *oldlenp, void *newp, size_t newlen);
bool ctl_boot(void);
void ctl_prefork(tsdn_t *tsdn);
void ctl_postfork_parent(tsdn_t *tsdn);
void ctl_postfork_child(tsdn_t *tsdn);
void ctl_mtx_assert_held(tsdn_t *tsdn);
#define xmallctl(name, oldp, oldlenp, newp, newlen) do { \
if (je_mallctl(name, oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_printf( \
"<jemalloc>: Failure in xmallctl(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctl(name, oldp, oldlenp, newp, newlen) \
do { \
if (je_mallctl(name, oldp, oldlenp, newp, newlen) != 0) { \
malloc_printf( \
"<jemalloc>: Failure in xmallctl(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctlnametomib(name, mibp, miblenp) do { \
if (je_mallctlnametomib(name, mibp, miblenp) != 0) { \
malloc_printf("<jemalloc>: Failure in " \
"xmallctlnametomib(\"%s\", ...)\n", name); \
abort(); \
} \
} while (0)
#define xmallctlnametomib(name, mibp, miblenp) \
do { \
if (je_mallctlnametomib(name, mibp, miblenp) != 0) { \
malloc_printf( \
"<jemalloc>: Failure in " \
"xmallctlnametomib(\"%s\", ...)\n", \
name); \
abort(); \
} \
} while (0)
#define xmallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) do { \
if (je_mallctlbymib(mib, miblen, oldp, oldlenp, newp, \
newlen) != 0) { \
malloc_write( \
"<jemalloc>: Failure in xmallctlbymib()\n"); \
abort(); \
} \
} while (0)
#define xmallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) \
do { \
if (je_mallctlbymib(mib, miblen, oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in xmallctlbymib()\n"); \
abort(); \
} \
} while (0)
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#define xmallctlmibnametomib(mib, miblen, name, miblenp) \
do { \
if (ctl_mibnametomib(tsd_fetch(), mib, miblen, name, miblenp) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in ctl_mibnametomib()\n"); \
abort(); \
} \
} while (0)
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#define xmallctlbymibname( \
mib, miblen, name, miblenp, oldp, oldlenp, newp, newlen) \
do { \
if (ctl_bymibname(tsd_fetch(), mib, miblen, name, miblenp, \
oldp, oldlenp, newp, newlen) \
!= 0) { \
malloc_write( \
"<jemalloc>: Failure in ctl_bymibname()\n"); \
abort(); \
} \
} while (0)
#endif /* JEMALLOC_INTERNAL_CTL_H */

View file

@ -0,0 +1,188 @@
#ifndef JEMALLOC_INTERNAL_DECAY_H
#define JEMALLOC_INTERNAL_DECAY_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/smoothstep.h"
#define DECAY_UNBOUNDED_TIME_TO_PURGE ((uint64_t) - 1)
/*
* The decay_t computes the number of pages we should purge at any given time.
* Page allocators inform a decay object when pages enter a decay-able state
* (i.e. dirty or muzzy), and query it to determine how many pages should be
* purged at any given time.
*
* This is mostly a single-threaded data structure and doesn't care about
* synchronization at all; it's the caller's responsibility to manage their
* synchronization on their own. There are two exceptions:
* 1) It's OK to racily call decay_ms_read (i.e. just the simplest state query).
* 2) The mtx and purging fields live (and are initialized) here, but are
* logically owned by the page allocator. This is just a convenience (since
* those fields would be duplicated for both the dirty and muzzy states
* otherwise).
*/
typedef struct decay_s decay_t;
struct decay_s {
/* Synchronizes all non-atomic fields. */
malloc_mutex_t mtx;
/*
* True if a thread is currently purging the extents associated with
* this decay structure.
*/
bool purging;
/*
* Approximate time in milliseconds from the creation of a set of unused
* dirty pages until an equivalent set of unused dirty pages is purged
* and/or reused.
*/
atomic_zd_t time_ms;
/* time / SMOOTHSTEP_NSTEPS. */
nstime_t interval;
/*
* Time at which the current decay interval logically started. We do
* not actually advance to a new epoch until sometime after it starts
* because of scheduling and computation delays, and it is even possible
* to completely skip epochs. In all cases, during epoch advancement we
* merge all relevant activity into the most recently recorded epoch.
*/
nstime_t epoch;
/* Deadline randomness generator. */
uint64_t jitter_state;
/*
* Deadline for current epoch. This is the sum of interval and per
* epoch jitter which is a uniform random variable in [0..interval).
* Epochs always advance by precise multiples of interval, but we
* randomize the deadline to reduce the likelihood of arenas purging in
* lockstep.
*/
nstime_t deadline;
/*
* The number of pages we cap ourselves at in the current epoch, per
* decay policies. Updated on an epoch change. After an epoch change,
* the caller should take steps to try to purge down to this amount.
*/
size_t npages_limit;
/*
* Number of unpurged pages at beginning of current epoch. During epoch
* advancement we use the delta between arena->decay_*.nunpurged and
* ecache_npages_get(&arena->ecache_*) to determine how many dirty pages,
* if any, were generated.
*/
size_t nunpurged;
/*
* Trailing log of how many unused dirty pages were generated during
* each of the past SMOOTHSTEP_NSTEPS decay epochs, where the last
* element is the most recent epoch. Corresponding epoch times are
* relative to epoch.
*
* Updated only on epoch advance, triggered by
* decay_maybe_advance_epoch, below.
*/
size_t backlog[SMOOTHSTEP_NSTEPS];
/* Peak number of pages in associated extents. Used for debug only. */
uint64_t ceil_npages;
};
/*
* The current decay time setting. This is the only public access to a decay_t
* that's allowed without holding mtx.
*/
static inline ssize_t
decay_ms_read(const decay_t *decay) {
return atomic_load_zd(&decay->time_ms, ATOMIC_RELAXED);
}
/*
* See the comment on the struct field -- the limit on pages we should allow in
* this decay state this epoch.
*/
static inline size_t
decay_npages_limit_get(const decay_t *decay) {
return decay->npages_limit;
}
/* How many unused dirty pages were generated during the last epoch. */
static inline size_t
decay_epoch_npages_delta(const decay_t *decay) {
return decay->backlog[SMOOTHSTEP_NSTEPS - 1];
}
/*
* Current epoch duration, in nanoseconds. Given that new epochs are started
* somewhat haphazardly, this is not necessarily exactly the time between any
* two calls to decay_maybe_advance_epoch; see the comments on fields in the
* decay_t.
*/
static inline uint64_t
decay_epoch_duration_ns(const decay_t *decay) {
return nstime_ns(&decay->interval);
}
static inline bool
decay_immediately(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms == 0;
}
static inline bool
decay_disabled(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms < 0;
}
/* Returns true if decay is enabled and done gradually. */
static inline bool
decay_gradually(const decay_t *decay) {
ssize_t decay_ms = decay_ms_read(decay);
return decay_ms > 0;
}
/*
* Returns true if the passed in decay time setting is valid.
* < -1 : invalid
* -1 : never decay
* 0 : decay immediately
* > 0 : some positive decay time, up to a maximum allowed value of
* NSTIME_SEC_MAX * 1000, which corresponds to decaying somewhere in the early
* 27th century. By that time, we expect to have implemented alternate purging
* strategies.
*/
bool decay_ms_valid(ssize_t decay_ms);
/*
* As a precondition, the decay_t must be zeroed out (as if with memset).
*
* Returns true on error.
*/
bool decay_init(decay_t *decay, nstime_t *cur_time, ssize_t decay_ms);
/*
* Given an already-initialized decay_t, reinitialize it with the given decay
* time. The decay_t must have previously been initialized (and should not then
* be zeroed).
*/
void decay_reinit(decay_t *decay, nstime_t *cur_time, ssize_t decay_ms);
/*
* Compute how many of 'npages_new' pages we would need to purge in 'time'.
*/
uint64_t decay_npages_purge_in(
decay_t *decay, nstime_t *time, size_t npages_new);
/* Returns true if the epoch advanced and there are pages to purge. */
bool decay_maybe_advance_epoch(
decay_t *decay, nstime_t *new_time, size_t current_npages);
/*
* Calculates wait time until a number of pages in the interval
* [0.5 * npages_threshold .. 1.5 * npages_threshold] should be purged.
*
* Returns number of nanoseconds or DECAY_UNBOUNDED_TIME_TO_PURGE in case of
* indefinite wait.
*/
uint64_t decay_ns_until_purge(
decay_t *decay, size_t npages_current, uint64_t npages_threshold);
#endif /* JEMALLOC_INTERNAL_DECAY_H */

View file

@ -0,0 +1,42 @@
#ifndef JEMALLOC_INTERNAL_DIV_H
#define JEMALLOC_INTERNAL_DIV_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
* This module does the division that computes the index of a region in a slab,
* given its offset relative to the base.
* That is, given a divisor d, an n = i * d (all integers), we'll return i.
* We do some pre-computation to do this more quickly than a CPU division
* instruction.
* We bound n < 2^32, and don't support dividing by one.
*/
typedef struct div_info_s div_info_t;
struct div_info_s {
uint32_t magic;
#ifdef JEMALLOC_DEBUG
size_t d;
#endif
};
void div_init(div_info_t *div_info, size_t divisor);
static inline size_t
div_compute(div_info_t *div_info, size_t n) {
assert(n <= (uint32_t)-1);
/*
* This generates, e.g. mov; imul; shr on x86-64. On a 32-bit machine,
* the compilers I tried were all smart enough to turn this into the
* appropriate "get the high 32 bits of the result of a multiply" (e.g.
* mul; mov edx eax; on x86, umull on arm, etc.).
*/
size_t i = ((uint64_t)n * (uint64_t)div_info->magic) >> 32;
#ifdef JEMALLOC_DEBUG
assert(i * div_info->d == n);
#endif
return i;
}
#endif /* JEMALLOC_INTERNAL_DIV_H */

View file

@ -0,0 +1,56 @@
#ifndef JEMALLOC_INTERNAL_ECACHE_H
#define JEMALLOC_INTERNAL_ECACHE_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/eset.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/san.h"
typedef struct ecache_s ecache_t;
struct ecache_s {
malloc_mutex_t mtx;
eset_t eset;
eset_t guarded_eset;
/* All stored extents must be in the same state. */
extent_state_t state;
/* The index of the ehooks the ecache is associated with. */
unsigned ind;
/*
* If true, delay coalescing until eviction; otherwise coalesce during
* deallocation.
*/
bool delay_coalesce;
};
static inline size_t
ecache_npages_get(ecache_t *ecache) {
return eset_npages_get(&ecache->eset)
+ eset_npages_get(&ecache->guarded_eset);
}
/* Get the number of extents in the given page size index. */
static inline size_t
ecache_nextents_get(ecache_t *ecache, pszind_t ind) {
return eset_nextents_get(&ecache->eset, ind)
+ eset_nextents_get(&ecache->guarded_eset, ind);
}
/* Get the sum total bytes of the extents in the given page size index. */
static inline size_t
ecache_nbytes_get(ecache_t *ecache, pszind_t ind) {
return eset_nbytes_get(&ecache->eset, ind)
+ eset_nbytes_get(&ecache->guarded_eset, ind);
}
static inline unsigned
ecache_ind_get(ecache_t *ecache) {
return ecache->ind;
}
bool ecache_init(tsdn_t *tsdn, ecache_t *ecache, extent_state_t state,
unsigned ind, bool delay_coalesce);
void ecache_prefork(tsdn_t *tsdn, ecache_t *ecache);
void ecache_postfork_parent(tsdn_t *tsdn, ecache_t *ecache);
void ecache_postfork_child(tsdn_t *tsdn, ecache_t *ecache);
#endif /* JEMALLOC_INTERNAL_ECACHE_H */

View file

@ -0,0 +1,795 @@
#ifndef JEMALLOC_INTERNAL_EDATA_H
#define JEMALLOC_INTERNAL_EDATA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bin_info.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/hpdata.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/prof_types.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/slab_data.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/typed_list.h"
/*
* sizeof(edata_t) is 128 bytes on 64-bit architectures. Ensure the alignment
* to free up the low bits in the rtree leaf.
*/
#define EDATA_ALIGNMENT 128
/*
* Defines how many nodes visited when enumerating the heap to search for
* qualified extents. More nodes visited may result in better choices at
* the cost of longer search time. This size should not exceed 2^16 - 1
* because we use uint16_t for accessing the queue needed for enumeration.
*/
#define ESET_ENUMERATE_MAX_NUM 32
enum extent_state_e {
extent_state_active = 0,
extent_state_dirty = 1,
extent_state_muzzy = 2,
extent_state_retained = 3,
extent_state_transition = 4, /* States below are intermediate. */
extent_state_merging = 5,
extent_state_max = 5 /* Sanity checking only. */
};
typedef enum extent_state_e extent_state_t;
enum extent_head_state_e {
EXTENT_NOT_HEAD,
EXTENT_IS_HEAD /* See comments in ehooks_default_merge_impl(). */
};
typedef enum extent_head_state_e extent_head_state_t;
/*
* Which implementation of the page allocator interface, (PAI, defined in
* pai.h) owns the given extent?
*/
enum extent_pai_e { EXTENT_PAI_PAC = 0, EXTENT_PAI_HPA = 1 };
typedef enum extent_pai_e extent_pai_t;
struct e_prof_info_s {
/* Time when this was allocated. */
nstime_t e_prof_alloc_time;
/* Allocation request size. */
size_t e_prof_alloc_size;
/* Points to a prof_tctx_t. */
atomic_p_t e_prof_tctx;
/*
* Points to a prof_recent_t for the allocation; NULL
* means the recent allocation record no longer exists.
* Protected by prof_recent_alloc_mtx.
*/
atomic_p_t e_prof_recent_alloc;
};
typedef struct e_prof_info_s e_prof_info_t;
/*
* The information about a particular edata that lives in an emap. Space is
* more precious there (the information, plus the edata pointer, has to live in
* a 64-bit word if we want to enable a packed representation.
*
* There are two things that are special about the information here:
* - It's quicker to access. You have one fewer pointer hop, since finding the
* edata_t associated with an item always requires accessing the rtree leaf in
* which this data is stored.
* - It can be read unsynchronized, and without worrying about lifetime issues.
*/
typedef struct edata_map_info_s edata_map_info_t;
struct edata_map_info_s {
bool slab;
szind_t szind;
};
typedef struct edata_cmp_summary_s edata_cmp_summary_t;
struct edata_cmp_summary_s {
uint64_t sn;
uintptr_t addr;
};
/* Extent (span of pages). Use accessor functions for e_* fields. */
typedef struct edata_s edata_t;
ph_structs(edata_avail, edata_t, ESET_ENUMERATE_MAX_NUM);
ph_structs(edata_heap, edata_t, ESET_ENUMERATE_MAX_NUM);
struct edata_s {
/*
* Bitfield containing several fields:
*
* a: arena_ind
* b: slab
* c: committed
* p: pai
* z: zeroed
* g: guarded
* t: state
* i: szind
* f: nfree
* s: bin_shard
*
* 00000000 ... 0000ssss ssffffff ffffiiii iiiitttg zpcbaaaa aaaaaaaa
*
* arena_ind: Arena from which this extent came, or all 1 bits if
* unassociated.
*
* slab: The slab flag indicates whether the extent is used for a slab
* of small regions. This helps differentiate small size classes,
* and it indicates whether interior pointers can be looked up via
* iealloc().
*
* committed: The committed flag indicates whether physical memory is
* committed to the extent, whether explicitly or implicitly
* as on a system that overcommits and satisfies physical
* memory needs on demand via soft page faults.
*
* pai: The pai flag is an extent_pai_t.
*
* zeroed: The zeroed flag is used by extent recycling code to track
* whether memory is zero-filled.
*
* guarded: The guarded flag is use by the sanitizer to track whether
* the extent has page guards around it.
*
* state: The state flag is an extent_state_t.
*
* szind: The szind flag indicates usable size class index for
* allocations residing in this extent, regardless of whether the
* extent is a slab. Extent size and usable size often differ
* even for non-slabs, either due to sz_large_pad or promotion of
* sampled small regions.
*
* nfree: Number of free regions in slab.
*
* bin_shard: the shard of the bin from which this extent came.
*/
uint64_t e_bits;
#define MASK(CURRENT_FIELD_WIDTH, CURRENT_FIELD_SHIFT) \
((((((uint64_t)0x1U) << (CURRENT_FIELD_WIDTH)) - 1)) \
<< (CURRENT_FIELD_SHIFT))
#define EDATA_BITS_ARENA_WIDTH MALLOCX_ARENA_BITS
#define EDATA_BITS_ARENA_SHIFT 0
#define EDATA_BITS_ARENA_MASK \
MASK(EDATA_BITS_ARENA_WIDTH, EDATA_BITS_ARENA_SHIFT)
#define EDATA_BITS_SLAB_WIDTH 1
#define EDATA_BITS_SLAB_SHIFT (EDATA_BITS_ARENA_WIDTH + EDATA_BITS_ARENA_SHIFT)
#define EDATA_BITS_SLAB_MASK MASK(EDATA_BITS_SLAB_WIDTH, EDATA_BITS_SLAB_SHIFT)
#define EDATA_BITS_COMMITTED_WIDTH 1
#define EDATA_BITS_COMMITTED_SHIFT \
(EDATA_BITS_SLAB_WIDTH + EDATA_BITS_SLAB_SHIFT)
#define EDATA_BITS_COMMITTED_MASK \
MASK(EDATA_BITS_COMMITTED_WIDTH, EDATA_BITS_COMMITTED_SHIFT)
#define EDATA_BITS_PAI_WIDTH 1
#define EDATA_BITS_PAI_SHIFT \
(EDATA_BITS_COMMITTED_WIDTH + EDATA_BITS_COMMITTED_SHIFT)
#define EDATA_BITS_PAI_MASK MASK(EDATA_BITS_PAI_WIDTH, EDATA_BITS_PAI_SHIFT)
#define EDATA_BITS_ZEROED_WIDTH 1
#define EDATA_BITS_ZEROED_SHIFT (EDATA_BITS_PAI_WIDTH + EDATA_BITS_PAI_SHIFT)
#define EDATA_BITS_ZEROED_MASK \
MASK(EDATA_BITS_ZEROED_WIDTH, EDATA_BITS_ZEROED_SHIFT)
#define EDATA_BITS_GUARDED_WIDTH 1
#define EDATA_BITS_GUARDED_SHIFT \
(EDATA_BITS_ZEROED_WIDTH + EDATA_BITS_ZEROED_SHIFT)
#define EDATA_BITS_GUARDED_MASK \
MASK(EDATA_BITS_GUARDED_WIDTH, EDATA_BITS_GUARDED_SHIFT)
#define EDATA_BITS_STATE_WIDTH 3
#define EDATA_BITS_STATE_SHIFT \
(EDATA_BITS_GUARDED_WIDTH + EDATA_BITS_GUARDED_SHIFT)
#define EDATA_BITS_STATE_MASK \
MASK(EDATA_BITS_STATE_WIDTH, EDATA_BITS_STATE_SHIFT)
#define EDATA_BITS_SZIND_WIDTH LG_CEIL(SC_NSIZES)
#define EDATA_BITS_SZIND_SHIFT (EDATA_BITS_STATE_WIDTH + EDATA_BITS_STATE_SHIFT)
#define EDATA_BITS_SZIND_MASK \
MASK(EDATA_BITS_SZIND_WIDTH, EDATA_BITS_SZIND_SHIFT)
#define EDATA_BITS_NFREE_WIDTH (SC_LG_SLAB_MAXREGS + 1)
#define EDATA_BITS_NFREE_SHIFT (EDATA_BITS_SZIND_WIDTH + EDATA_BITS_SZIND_SHIFT)
#define EDATA_BITS_NFREE_MASK \
MASK(EDATA_BITS_NFREE_WIDTH, EDATA_BITS_NFREE_SHIFT)
#define EDATA_BITS_BINSHARD_WIDTH 6
#define EDATA_BITS_BINSHARD_SHIFT \
(EDATA_BITS_NFREE_WIDTH + EDATA_BITS_NFREE_SHIFT)
#define EDATA_BITS_BINSHARD_MASK \
MASK(EDATA_BITS_BINSHARD_WIDTH, EDATA_BITS_BINSHARD_SHIFT)
#define EDATA_BITS_IS_HEAD_WIDTH 1
#define EDATA_BITS_IS_HEAD_SHIFT \
(EDATA_BITS_BINSHARD_WIDTH + EDATA_BITS_BINSHARD_SHIFT)
#define EDATA_BITS_IS_HEAD_MASK \
MASK(EDATA_BITS_IS_HEAD_WIDTH, EDATA_BITS_IS_HEAD_SHIFT)
/* Pointer to the extent that this structure is responsible for. */
void *e_addr;
union {
/*
* Extent size and serial number associated with the extent
* structure (different than the serial number for the extent at
* e_addr).
*
* ssssssss [...] ssssssss ssssnnnn nnnnnnnn
*/
size_t e_size_esn;
#define EDATA_SIZE_MASK ((size_t) ~(PAGE - 1))
#define EDATA_ESN_MASK ((size_t)PAGE - 1)
/* Base extent size, which may not be a multiple of PAGE. */
size_t e_bsize;
};
/*
* If this edata is a user allocation from an HPA, it comes out of some
* pageslab (we don't yet support hugepage allocations that don't fit
* into pageslabs). This tracks it.
*/
hpdata_t *e_ps;
/*
* Serial number. These are not necessarily unique; splitting an extent
* results in two extents with the same serial number.
*/
uint64_t e_sn;
union {
/*
* List linkage used when the edata_t is active; either in
* arena's large allocations or bin_t's slabs_full.
*/
ql_elm(edata_t) ql_link_active;
/*
* Pairing heap linkage. Used whenever the extent is inactive
* (in the page allocators), or when it is active and in
* slabs_nonfull, or when the edata_t is unassociated with an
* extent and sitting in an edata_cache.
*/
union {
edata_heap_link_t heap_link;
edata_avail_link_t avail_link;
};
};
union {
/*
* List linkage used when the extent is inactive:
* - Stashed dirty extents
* - Ecache LRU functionality.
*/
ql_elm(edata_t) ql_link_inactive;
/* Small region slab metadata. */
slab_data_t e_slab_data;
/* Profiling data, used for large objects. */
e_prof_info_t e_prof_info;
};
};
TYPED_LIST(edata_list_active, edata_t, ql_link_active)
TYPED_LIST(edata_list_inactive, edata_t, ql_link_inactive)
static inline unsigned
edata_arena_ind_get(const edata_t *edata) {
unsigned arena_ind = (unsigned)((edata->e_bits & EDATA_BITS_ARENA_MASK)
>> EDATA_BITS_ARENA_SHIFT);
assert(arena_ind < MALLOCX_ARENA_LIMIT);
return arena_ind;
}
static inline szind_t
edata_szind_get_maybe_invalid(const edata_t *edata) {
szind_t szind = (szind_t)((edata->e_bits & EDATA_BITS_SZIND_MASK)
>> EDATA_BITS_SZIND_SHIFT);
assert(szind <= SC_NSIZES);
return szind;
}
static inline szind_t
edata_szind_get(const edata_t *edata) {
szind_t szind = edata_szind_get_maybe_invalid(edata);
assert(szind < SC_NSIZES); /* Never call when "invalid". */
return szind;
}
static inline size_t
edata_usize_get(const edata_t *edata) {
assert(edata != NULL);
/*
* When sz_large_size_classes_disabled() is true, two cases:
* 1. if usize_from_ind is not smaller than SC_LARGE_MINCLASS,
* usize_from_size is accurate;
* 2. otherwise, usize_from_ind is accurate.
*
* When sz_large_size_classes_disabled() is not true, the two should be the
* same when usize_from_ind is not smaller than SC_LARGE_MINCLASS.
*
* Note sampled small allocs will be promoted. Their extent size is
* recorded in edata_size_get(edata), while their szind reflects the
* true usize. Thus, usize retrieved here is still accurate for
* sampled small allocs.
*/
szind_t szind = edata_szind_get(edata);
#ifdef JEMALLOC_JET
/*
* Double free is invalid and results in undefined behavior. However,
* for double free tests to end gracefully, return an invalid usize
* when szind shows the edata is not active, i.e., szind == SC_NSIZES.
*/
if (unlikely(szind == SC_NSIZES)) {
return SC_LARGE_MAXCLASS + 1;
}
#endif
if (!sz_large_size_classes_disabled() || szind < SC_NBINS) {
size_t usize_from_ind = sz_index2size(szind);
if (!sz_large_size_classes_disabled()
&& usize_from_ind >= SC_LARGE_MINCLASS) {
size_t size = (edata->e_size_esn & EDATA_SIZE_MASK);
assert(size > sz_large_pad);
size_t usize_from_size = size - sz_large_pad;
assert(usize_from_ind == usize_from_size);
}
return usize_from_ind;
}
size_t size = (edata->e_size_esn & EDATA_SIZE_MASK);
assert(size > sz_large_pad);
size_t usize_from_size = size - sz_large_pad;
/*
* no matter large size classes disabled or not, usize retrieved from
* size is not accurate when smaller than SC_LARGE_MINCLASS.
*/
assert(usize_from_size >= SC_LARGE_MINCLASS);
return usize_from_size;
}
static inline unsigned
edata_binshard_get(const edata_t *edata) {
unsigned binshard = (unsigned)((edata->e_bits
& EDATA_BITS_BINSHARD_MASK)
>> EDATA_BITS_BINSHARD_SHIFT);
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
return binshard;
}
static inline uint64_t
edata_sn_get(const edata_t *edata) {
return edata->e_sn;
}
static inline extent_state_t
edata_state_get(const edata_t *edata) {
return (extent_state_t)((edata->e_bits & EDATA_BITS_STATE_MASK)
>> EDATA_BITS_STATE_SHIFT);
}
static inline bool
edata_guarded_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_GUARDED_MASK)
>> EDATA_BITS_GUARDED_SHIFT);
}
static inline bool
edata_zeroed_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_ZEROED_MASK)
>> EDATA_BITS_ZEROED_SHIFT);
}
static inline bool
edata_committed_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_COMMITTED_MASK)
>> EDATA_BITS_COMMITTED_SHIFT);
}
static inline extent_pai_t
edata_pai_get(const edata_t *edata) {
return (extent_pai_t)((edata->e_bits & EDATA_BITS_PAI_MASK)
>> EDATA_BITS_PAI_SHIFT);
}
static inline bool
edata_slab_get(const edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_SLAB_MASK)
>> EDATA_BITS_SLAB_SHIFT);
}
static inline unsigned
edata_nfree_get(const edata_t *edata) {
assert(edata_slab_get(edata));
return (unsigned)((edata->e_bits & EDATA_BITS_NFREE_MASK)
>> EDATA_BITS_NFREE_SHIFT);
}
static inline void *
edata_base_get(const edata_t *edata) {
assert(edata->e_addr == PAGE_ADDR2BASE(edata->e_addr)
|| !edata_slab_get(edata));
return PAGE_ADDR2BASE(edata->e_addr);
}
static inline void *
edata_addr_get(const edata_t *edata) {
assert(edata->e_addr == PAGE_ADDR2BASE(edata->e_addr)
|| !edata_slab_get(edata));
return edata->e_addr;
}
static inline size_t
edata_size_get(const edata_t *edata) {
return (edata->e_size_esn & EDATA_SIZE_MASK);
}
static inline size_t
edata_esn_get(const edata_t *edata) {
return (edata->e_size_esn & EDATA_ESN_MASK);
}
static inline size_t
edata_bsize_get(const edata_t *edata) {
return edata->e_bsize;
}
static inline hpdata_t *
edata_ps_get(const edata_t *edata) {
assert(edata_pai_get(edata) == EXTENT_PAI_HPA);
return edata->e_ps;
}
static inline void *
edata_before_get(const edata_t *edata) {
return (void *)((byte_t *)edata_base_get(edata) - PAGE);
}
static inline void *
edata_last_get(const edata_t *edata) {
return (void *)((byte_t *)edata_base_get(edata) + edata_size_get(edata)
- PAGE);
}
static inline void *
edata_past_get(const edata_t *edata) {
return (
void *)((byte_t *)edata_base_get(edata) + edata_size_get(edata));
}
static inline slab_data_t *
edata_slab_data_get(edata_t *edata) {
assert(edata_slab_get(edata));
return &edata->e_slab_data;
}
static inline const slab_data_t *
edata_slab_data_get_const(const edata_t *edata) {
assert(edata_slab_get(edata));
return &edata->e_slab_data;
}
static inline prof_tctx_t *
edata_prof_tctx_get(const edata_t *edata) {
return (prof_tctx_t *)atomic_load_p(
&edata->e_prof_info.e_prof_tctx, ATOMIC_ACQUIRE);
}
static inline const nstime_t *
edata_prof_alloc_time_get(const edata_t *edata) {
return &edata->e_prof_info.e_prof_alloc_time;
}
static inline size_t
edata_prof_alloc_size_get(const edata_t *edata) {
return edata->e_prof_info.e_prof_alloc_size;
}
static inline prof_recent_t *
edata_prof_recent_alloc_get_dont_call_directly(const edata_t *edata) {
return (prof_recent_t *)atomic_load_p(
&edata->e_prof_info.e_prof_recent_alloc, ATOMIC_RELAXED);
}
static inline void
edata_arena_ind_set(edata_t *edata, unsigned arena_ind) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_ARENA_MASK)
| ((uint64_t)arena_ind << EDATA_BITS_ARENA_SHIFT);
}
static inline void
edata_binshard_set(edata_t *edata, unsigned binshard) {
/* The assertion assumes szind is set already. */
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
edata->e_bits = (edata->e_bits & ~EDATA_BITS_BINSHARD_MASK)
| ((uint64_t)binshard << EDATA_BITS_BINSHARD_SHIFT);
}
static inline void
edata_addr_set(edata_t *edata, void *addr) {
edata->e_addr = addr;
}
static inline void
edata_size_set(edata_t *edata, size_t size) {
assert((size & ~EDATA_SIZE_MASK) == 0);
edata->e_size_esn = size | (edata->e_size_esn & ~EDATA_SIZE_MASK);
}
static inline void
edata_esn_set(edata_t *edata, size_t esn) {
edata->e_size_esn = (edata->e_size_esn & ~EDATA_ESN_MASK)
| (esn & EDATA_ESN_MASK);
}
static inline void
edata_bsize_set(edata_t *edata, size_t bsize) {
edata->e_bsize = bsize;
}
static inline void
edata_ps_set(edata_t *edata, hpdata_t *ps) {
assert(edata_pai_get(edata) == EXTENT_PAI_HPA);
edata->e_ps = ps;
}
static inline void
edata_szind_set(edata_t *edata, szind_t szind) {
assert(szind <= SC_NSIZES); /* SC_NSIZES means "invalid". */
edata->e_bits = (edata->e_bits & ~EDATA_BITS_SZIND_MASK)
| ((uint64_t)szind << EDATA_BITS_SZIND_SHIFT);
}
static inline void
edata_nfree_set(edata_t *edata, unsigned nfree) {
assert(edata_slab_get(edata));
edata->e_bits = (edata->e_bits & ~EDATA_BITS_NFREE_MASK)
| ((uint64_t)nfree << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_binshard_set(edata_t *edata, unsigned nfree, unsigned binshard) {
/* The assertion assumes szind is set already. */
assert(binshard < bin_infos[edata_szind_get(edata)].n_shards);
edata->e_bits = (edata->e_bits
& (~EDATA_BITS_NFREE_MASK
& ~EDATA_BITS_BINSHARD_MASK))
| ((uint64_t)binshard << EDATA_BITS_BINSHARD_SHIFT)
| ((uint64_t)nfree << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_inc(edata_t *edata) {
assert(edata_slab_get(edata));
edata->e_bits += ((uint64_t)1U << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_dec(edata_t *edata) {
assert(edata_slab_get(edata));
edata->e_bits -= ((uint64_t)1U << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_nfree_sub(edata_t *edata, uint64_t n) {
assert(edata_slab_get(edata));
edata->e_bits -= (n << EDATA_BITS_NFREE_SHIFT);
}
static inline void
edata_sn_set(edata_t *edata, uint64_t sn) {
edata->e_sn = sn;
}
static inline void
edata_state_set(edata_t *edata, extent_state_t state) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_STATE_MASK)
| ((uint64_t)state << EDATA_BITS_STATE_SHIFT);
}
static inline void
edata_guarded_set(edata_t *edata, bool guarded) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_GUARDED_MASK)
| ((uint64_t)guarded << EDATA_BITS_GUARDED_SHIFT);
}
static inline void
edata_zeroed_set(edata_t *edata, bool zeroed) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_ZEROED_MASK)
| ((uint64_t)zeroed << EDATA_BITS_ZEROED_SHIFT);
}
static inline void
edata_committed_set(edata_t *edata, bool committed) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_COMMITTED_MASK)
| ((uint64_t)committed << EDATA_BITS_COMMITTED_SHIFT);
}
static inline void
edata_pai_set(edata_t *edata, extent_pai_t pai) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_PAI_MASK)
| ((uint64_t)pai << EDATA_BITS_PAI_SHIFT);
}
static inline void
edata_slab_set(edata_t *edata, bool slab) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_SLAB_MASK)
| ((uint64_t)slab << EDATA_BITS_SLAB_SHIFT);
}
static inline void
edata_prof_tctx_set(edata_t *edata, prof_tctx_t *tctx) {
atomic_store_p(&edata->e_prof_info.e_prof_tctx, tctx, ATOMIC_RELEASE);
}
static inline void
edata_prof_alloc_time_set(edata_t *edata, nstime_t *t) {
nstime_copy(&edata->e_prof_info.e_prof_alloc_time, t);
}
static inline void
edata_prof_alloc_size_set(edata_t *edata, size_t size) {
edata->e_prof_info.e_prof_alloc_size = size;
}
static inline void
edata_prof_recent_alloc_set_dont_call_directly(
edata_t *edata, prof_recent_t *recent_alloc) {
atomic_store_p(&edata->e_prof_info.e_prof_recent_alloc, recent_alloc,
ATOMIC_RELAXED);
}
static inline bool
edata_is_head_get(edata_t *edata) {
return (bool)((edata->e_bits & EDATA_BITS_IS_HEAD_MASK)
>> EDATA_BITS_IS_HEAD_SHIFT);
}
static inline void
edata_is_head_set(edata_t *edata, bool is_head) {
edata->e_bits = (edata->e_bits & ~EDATA_BITS_IS_HEAD_MASK)
| ((uint64_t)is_head << EDATA_BITS_IS_HEAD_SHIFT);
}
static inline bool
edata_state_in_transition(extent_state_t state) {
return state >= extent_state_transition;
}
/*
* Because this function is implemented as a sequence of bitfield modifications,
* even though each individual bit is properly initialized, we technically read
* uninitialized data within it. This is mostly fine, since most callers get
* their edatas from zeroing sources, but callers who make stack edata_ts need
* to manually zero them.
*/
static inline void
edata_init(edata_t *edata, unsigned arena_ind, void *addr, size_t size,
bool slab, szind_t szind, uint64_t sn, extent_state_t state, bool zeroed,
bool committed, extent_pai_t pai, extent_head_state_t is_head) {
assert(addr == PAGE_ADDR2BASE(addr) || !slab);
edata_arena_ind_set(edata, arena_ind);
edata_addr_set(edata, addr);
edata_size_set(edata, size);
edata_slab_set(edata, slab);
edata_szind_set(edata, szind);
edata_sn_set(edata, sn);
edata_state_set(edata, state);
edata_guarded_set(edata, false);
edata_zeroed_set(edata, zeroed);
edata_committed_set(edata, committed);
edata_pai_set(edata, pai);
edata_is_head_set(edata, is_head == EXTENT_IS_HEAD);
if (config_prof) {
edata_prof_tctx_set(edata, NULL);
}
}
static inline void
edata_binit(
edata_t *edata, void *addr, size_t bsize, uint64_t sn, bool reused) {
edata_arena_ind_set(edata, (1U << MALLOCX_ARENA_BITS) - 1);
edata_addr_set(edata, addr);
edata_bsize_set(edata, bsize);
edata_slab_set(edata, false);
edata_szind_set(edata, SC_NSIZES);
edata_sn_set(edata, sn);
edata_state_set(edata, extent_state_active);
/* See comments in base_edata_is_reused. */
edata_guarded_set(edata, reused);
edata_zeroed_set(edata, true);
edata_committed_set(edata, true);
/*
* This isn't strictly true, but base allocated extents never get
* deallocated and can't be looked up in the emap, but no sense in
* wasting a state bit to encode this fact.
*/
edata_pai_set(edata, EXTENT_PAI_PAC);
}
static inline int
edata_esn_comp(const edata_t *a, const edata_t *b) {
size_t a_esn = edata_esn_get(a);
size_t b_esn = edata_esn_get(b);
return (a_esn > b_esn) - (a_esn < b_esn);
}
static inline int
edata_ead_comp(const edata_t *a, const edata_t *b) {
uintptr_t a_eaddr = (uintptr_t)a;
uintptr_t b_eaddr = (uintptr_t)b;
return (a_eaddr > b_eaddr) - (a_eaddr < b_eaddr);
}
static inline edata_cmp_summary_t
edata_cmp_summary_get(const edata_t *edata) {
edata_cmp_summary_t result;
result.sn = edata_sn_get(edata);
result.addr = (uintptr_t)edata_addr_get(edata);
return result;
}
#ifdef JEMALLOC_HAVE_INT128
JEMALLOC_ALWAYS_INLINE unsigned __int128
edata_cmp_summary_encode(edata_cmp_summary_t src) {
return ((unsigned __int128)src.sn << 64) | src.addr;
}
static inline int
edata_cmp_summary_comp(edata_cmp_summary_t a, edata_cmp_summary_t b) {
unsigned __int128 a_encoded = edata_cmp_summary_encode(a);
unsigned __int128 b_encoded = edata_cmp_summary_encode(b);
if (a_encoded < b_encoded)
return -1;
if (a_encoded == b_encoded)
return 0;
return 1;
}
#else
static inline int
edata_cmp_summary_comp(edata_cmp_summary_t a, edata_cmp_summary_t b) {
/*
* Logically, what we're doing here is comparing based on `.sn`, and
* falling back to comparing on `.addr` in the case that `a.sn == b.sn`.
* We accomplish this by multiplying the result of the `.sn` comparison
* by 2, so that so long as it is not 0, it will dominate the `.addr`
* comparison in determining the sign of the returned result value.
* The justification for doing things this way is that this is
* branchless - all of the branches that would be present in a
* straightforward implementation are common cases, and thus the branch
* prediction accuracy is not great. As a result, this implementation
* is measurably faster (by around 30%).
*/
return (2 * ((a.sn > b.sn) - (a.sn < b.sn)))
+ ((a.addr > b.addr) - (a.addr < b.addr));
}
#endif
static inline int
edata_snad_comp(const edata_t *a, const edata_t *b) {
edata_cmp_summary_t a_cmp = edata_cmp_summary_get(a);
edata_cmp_summary_t b_cmp = edata_cmp_summary_get(b);
return edata_cmp_summary_comp(a_cmp, b_cmp);
}
static inline int
edata_esnead_comp(const edata_t *a, const edata_t *b) {
/*
* Similar to `edata_cmp_summary_comp`, we've opted for a
* branchless implementation for the sake of performance.
*/
return (2 * edata_esn_comp(a, b)) + edata_ead_comp(a, b);
}
ph_proto(, edata_avail, edata_t) ph_proto(, edata_heap, edata_t)
#endif /* JEMALLOC_INTERNAL_EDATA_H */

View file

@ -0,0 +1,50 @@
#ifndef JEMALLOC_INTERNAL_EDATA_CACHE_H
#define JEMALLOC_INTERNAL_EDATA_CACHE_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
/* For tests only. */
#define EDATA_CACHE_FAST_FILL 4
/*
* A cache of edata_t structures allocated via base_alloc_edata (as opposed to
* the underlying extents they describe). The contents of returned edata_t
* objects are garbage and cannot be relied upon.
*/
typedef struct edata_cache_s edata_cache_t;
struct edata_cache_s {
edata_avail_t avail;
atomic_zu_t count;
malloc_mutex_t mtx;
base_t *base;
};
bool edata_cache_init(edata_cache_t *edata_cache, base_t *base);
edata_t *edata_cache_get(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_put(tsdn_t *tsdn, edata_cache_t *edata_cache, edata_t *edata);
void edata_cache_prefork(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_postfork_parent(tsdn_t *tsdn, edata_cache_t *edata_cache);
void edata_cache_postfork_child(tsdn_t *tsdn, edata_cache_t *edata_cache);
/*
* An edata_cache_small is like an edata_cache, but it relies on external
* synchronization and avoids first-fit strategies.
*/
typedef struct edata_cache_fast_s edata_cache_fast_t;
struct edata_cache_fast_s {
edata_list_inactive_t list;
edata_cache_t *fallback;
bool disabled;
};
void edata_cache_fast_init(edata_cache_fast_t *ecs, edata_cache_t *fallback);
edata_t *edata_cache_fast_get(tsdn_t *tsdn, edata_cache_fast_t *ecs);
void edata_cache_fast_put(
tsdn_t *tsdn, edata_cache_fast_t *ecs, edata_t *edata);
void edata_cache_fast_disable(tsdn_t *tsdn, edata_cache_fast_t *ecs);
#endif /* JEMALLOC_INTERNAL_EDATA_CACHE_H */

View file

@ -0,0 +1,414 @@
#ifndef JEMALLOC_INTERNAL_EHOOKS_H
#define JEMALLOC_INTERNAL_EHOOKS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/extent_mmap.h"
#include "jemalloc/internal/tsd.h"
#include "jemalloc/internal/tsd_types.h"
/*
* This module is the internal interface to the extent hooks (both
* user-specified and external). Eventually, this will give us the flexibility
* to use multiple different versions of user-visible extent-hook APIs under a
* single user interface.
*
* Current API expansions (not available to anyone but the default hooks yet):
* - Head state tracking. Hooks can decide whether or not to merge two
* extents based on whether or not one of them is the head (i.e. was
* allocated on its own). The later extent loses its "head" status.
*/
extern const extent_hooks_t ehooks_default_extent_hooks;
typedef struct ehooks_s ehooks_t;
struct ehooks_s {
/*
* The user-visible id that goes with the ehooks (i.e. that of the base
* they're a part of, the associated arena's index within the arenas
* array).
*/
unsigned ind;
/* Logically an extent_hooks_t *. */
atomic_p_t ptr;
};
extern const extent_hooks_t ehooks_default_extent_hooks;
/*
* These are not really part of the public API. Each hook has a fast-path for
* the default-hooks case that can avoid various small inefficiencies:
* - Forgetting tsd and then calling tsd_get within the hook.
* - Getting more state than necessary out of the extent_t.
* - Doing arena_ind -> arena -> arena_ind lookups.
* By making the calls to these functions visible to the compiler, it can move
* those extra bits of computation down below the fast-paths where they get ignored.
*/
void *ehooks_default_alloc_impl(tsdn_t *tsdn, void *new_addr, size_t size,
size_t alignment, bool *zero, bool *commit, unsigned arena_ind);
bool ehooks_default_dalloc_impl(void *addr, size_t size);
void ehooks_default_destroy_impl(void *addr, size_t size);
bool ehooks_default_commit_impl(void *addr, size_t offset, size_t length);
bool ehooks_default_decommit_impl(void *addr, size_t offset, size_t length);
#ifdef PAGES_CAN_PURGE_LAZY
bool ehooks_default_purge_lazy_impl(void *addr, size_t offset, size_t length);
#endif
#ifdef PAGES_CAN_PURGE_FORCED
bool ehooks_default_purge_forced_impl(void *addr, size_t offset, size_t length);
#endif
bool ehooks_default_split_impl(void);
/*
* Merge is the only default extent hook we declare -- see the comment in
* ehooks_merge.
*/
bool ehooks_default_merge(extent_hooks_t *extent_hooks, void *addr_a,
size_t size_a, void *addr_b, size_t size_b, bool committed,
unsigned arena_ind);
bool ehooks_default_merge_impl(tsdn_t *tsdn, void *addr_a, void *addr_b);
void ehooks_default_zero_impl(void *addr, size_t size);
void ehooks_default_guard_impl(void *guard1, void *guard2);
void ehooks_default_unguard_impl(void *guard1, void *guard2);
/*
* We don't officially support reentrancy from wtihin the extent hooks. But
* various people who sit within throwing distance of the jemalloc team want
* that functionality in certain limited cases. The default reentrancy guards
* assert that we're not reentrant from a0 (since it's the bootstrap arena,
* where reentrant allocations would be redirected), which we would incorrectly
* trigger in cases where a0 has extent hooks (those hooks themselves can't be
* reentrant, then, but there are reasonable uses for such functionality, like
* putting internal metadata on hugepages). Therefore, we use the raw
* reentrancy guards.
*
* Eventually, we need to think more carefully about whether and where we
* support allocating from within extent hooks (and what that means for things
* like profiling, stats collection, etc.), and document what the guarantee is.
*/
static inline void
ehooks_pre_reentrancy(tsdn_t *tsdn) {
tsd_t *tsd = tsdn_null(tsdn) ? tsd_fetch() : tsdn_tsd(tsdn);
tsd_pre_reentrancy_raw(tsd);
}
static inline void
ehooks_post_reentrancy(tsdn_t *tsdn) {
tsd_t *tsd = tsdn_null(tsdn) ? tsd_fetch() : tsdn_tsd(tsdn);
tsd_post_reentrancy_raw(tsd);
}
/* Beginning of the public API. */
void ehooks_init(ehooks_t *ehooks, extent_hooks_t *extent_hooks, unsigned ind);
static inline unsigned
ehooks_ind_get(const ehooks_t *ehooks) {
return ehooks->ind;
}
static inline void
ehooks_set_extent_hooks_ptr(ehooks_t *ehooks, extent_hooks_t *extent_hooks) {
atomic_store_p(&ehooks->ptr, extent_hooks, ATOMIC_RELEASE);
}
static inline extent_hooks_t *
ehooks_get_extent_hooks_ptr(ehooks_t *ehooks) {
return (extent_hooks_t *)atomic_load_p(&ehooks->ptr, ATOMIC_ACQUIRE);
}
static inline bool
ehooks_are_default(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)
== &ehooks_default_extent_hooks;
}
/*
* In some cases, a caller needs to allocate resources before attempting to call
* a hook. If that hook is doomed to fail, this is wasteful. We therefore
* include some checks for such cases.
*/
static inline bool
ehooks_dalloc_will_fail(ehooks_t *ehooks) {
if (ehooks_are_default(ehooks)) {
return opt_retain;
} else {
return ehooks_get_extent_hooks_ptr(ehooks)->dalloc == NULL;
}
}
static inline bool
ehooks_split_will_fail(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)->split == NULL;
}
static inline bool
ehooks_merge_will_fail(ehooks_t *ehooks) {
return ehooks_get_extent_hooks_ptr(ehooks)->merge == NULL;
}
static inline bool
ehooks_guard_will_fail(ehooks_t *ehooks) {
/*
* Before the guard hooks are officially introduced, limit the use to
* the default hooks only.
*/
return !ehooks_are_default(ehooks);
}
/*
* Some hooks are required to return zeroed memory in certain situations. In
* debug mode, we do some heuristic checks that they did what they were supposed
* to.
*
* This isn't really ehooks-specific (i.e. anyone can check for zeroed memory).
* But incorrect zero information indicates an ehook bug.
*/
static inline void
ehooks_debug_zero_check(void *addr, size_t size) {
assert(((uintptr_t)addr & PAGE_MASK) == 0);
assert((size & PAGE_MASK) == 0);
assert(size > 0);
if (config_debug) {
/* Check the whole first page. */
size_t *p = (size_t *)addr;
for (size_t i = 0; i < PAGE / sizeof(size_t); i++) {
assert(p[i] == 0);
}
/*
* And 4 spots within. There's a tradeoff here; the larger
* this number, the more likely it is that we'll catch a bug
* where ehooks return a sparsely non-zero range. But
* increasing the number of checks also increases the number of
* page faults in debug mode. FreeBSD does much of their
* day-to-day development work in debug mode, so we don't want
* even the debug builds to be too slow.
*/
const size_t nchecks = 4;
assert(PAGE >= sizeof(size_t) * nchecks);
for (size_t i = 0; i < nchecks; ++i) {
assert(p[i * (size / sizeof(size_t) / nchecks)] == 0);
}
}
}
static inline void *
ehooks_alloc(tsdn_t *tsdn, ehooks_t *ehooks, void *new_addr, size_t size,
size_t alignment, bool *zero, bool *commit) {
bool orig_zero = *zero;
void *ret;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ret = ehooks_default_alloc_impl(tsdn, new_addr, size, alignment,
zero, commit, ehooks_ind_get(ehooks));
} else {
ehooks_pre_reentrancy(tsdn);
ret = extent_hooks->alloc(extent_hooks, new_addr, size,
alignment, zero, commit, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
assert(new_addr == NULL || ret == NULL || new_addr == ret);
assert(!orig_zero || *zero);
if (*zero && ret != NULL) {
ehooks_debug_zero_check(ret, size);
}
return ret;
}
static inline bool
ehooks_dalloc(
tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_dalloc_impl(addr, size);
} else if (extent_hooks->dalloc == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->dalloc(extent_hooks, addr, size,
committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline void
ehooks_destroy(
tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_destroy_impl(addr, size);
} else if (extent_hooks->destroy == NULL) {
/* Do nothing. */
} else {
ehooks_pre_reentrancy(tsdn);
extent_hooks->destroy(extent_hooks, addr, size, committed,
ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
}
static inline bool
ehooks_commit(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
bool err;
if (extent_hooks == &ehooks_default_extent_hooks) {
err = ehooks_default_commit_impl(addr, offset, length);
} else if (extent_hooks->commit == NULL) {
err = true;
} else {
ehooks_pre_reentrancy(tsdn);
err = extent_hooks->commit(extent_hooks, addr, size, offset,
length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
}
if (!err) {
ehooks_debug_zero_check(addr, size);
}
return err;
}
static inline bool
ehooks_decommit(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_decommit_impl(addr, offset, length);
} else if (extent_hooks->decommit == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->decommit(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_purge_lazy(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
#ifdef PAGES_CAN_PURGE_LAZY
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_purge_lazy_impl(addr, offset, length);
}
#endif
if (extent_hooks->purge_lazy == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->purge_lazy(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_purge_forced(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t offset, size_t length) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
/*
* It would be correct to have a ehooks_debug_zero_check call at the end
* of this function; purge_forced is required to zero. But checking
* would touch the page in question, which may have performance
* consequences (imagine the hooks are using hugepages, with a global
* zero page off). Even in debug mode, it's usually a good idea to
* avoid cases that can dramatically increase memory consumption.
*/
#ifdef PAGES_CAN_PURGE_FORCED
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_purge_forced_impl(addr, offset, length);
}
#endif
if (extent_hooks->purge_forced == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->purge_forced(extent_hooks, addr, size,
offset, length, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_split(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size,
size_t size_a, size_t size_b, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (ehooks_are_default(ehooks)) {
return ehooks_default_split_impl();
} else if (extent_hooks->split == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->split(extent_hooks, addr, size, size_a,
size_b, committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline bool
ehooks_merge(tsdn_t *tsdn, ehooks_t *ehooks, void *addr_a, size_t size_a,
void *addr_b, size_t size_b, bool committed) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
return ehooks_default_merge_impl(tsdn, addr_a, addr_b);
} else if (extent_hooks->merge == NULL) {
return true;
} else {
ehooks_pre_reentrancy(tsdn);
bool err = extent_hooks->merge(extent_hooks, addr_a, size_a,
addr_b, size_b, committed, ehooks_ind_get(ehooks));
ehooks_post_reentrancy(tsdn);
return err;
}
}
static inline void
ehooks_zero(tsdn_t *tsdn, ehooks_t *ehooks, void *addr, size_t size) {
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_zero_impl(addr, size);
} else {
/*
* It would be correct to try using the user-provided purge
* hooks (since they are required to have zeroed the extent if
* they indicate success), but we don't necessarily know their
* cost. We'll be conservative and use memset.
*/
memset(addr, 0, size);
}
}
static inline bool
ehooks_guard(tsdn_t *tsdn, ehooks_t *ehooks, void *guard1, void *guard2) {
bool err;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_guard_impl(guard1, guard2);
err = false;
} else {
err = true;
}
return err;
}
static inline bool
ehooks_unguard(tsdn_t *tsdn, ehooks_t *ehooks, void *guard1, void *guard2) {
bool err;
extent_hooks_t *extent_hooks = ehooks_get_extent_hooks_ptr(ehooks);
if (extent_hooks == &ehooks_default_extent_hooks) {
ehooks_default_unguard_impl(guard1, guard2);
err = false;
} else {
err = true;
}
return err;
}
#endif /* JEMALLOC_INTERNAL_EHOOKS_H */

View file

@ -0,0 +1,397 @@
#ifndef JEMALLOC_INTERNAL_EMAP_H
#define JEMALLOC_INTERNAL_EMAP_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/rtree.h"
/*
* Note: Ends without at semicolon, so that
* EMAP_DECLARE_RTREE_CTX;
* in uses will avoid empty-statement warnings.
*/
#define EMAP_DECLARE_RTREE_CTX \
rtree_ctx_t rtree_ctx_fallback; \
rtree_ctx_t *rtree_ctx = tsdn_rtree_ctx(tsdn, &rtree_ctx_fallback)
typedef struct emap_s emap_t;
struct emap_s {
rtree_t rtree;
};
/* Used to pass rtree lookup context down the path. */
typedef struct emap_alloc_ctx_s emap_alloc_ctx_t;
struct emap_alloc_ctx_s {
size_t usize;
szind_t szind;
bool slab;
};
typedef struct emap_full_alloc_ctx_s emap_full_alloc_ctx_t;
struct emap_full_alloc_ctx_s {
szind_t szind;
bool slab;
edata_t *edata;
};
bool emap_init(emap_t *emap, base_t *base, bool zeroed);
void emap_remap(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind, bool slab);
void emap_update_edata_state(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, extent_state_t state);
/*
* The two acquire functions below allow accessing neighbor edatas, if it's safe
* and valid to do so (i.e. from the same arena, of the same state, etc.). This
* is necessary because the ecache locks are state based, and only protect
* edatas with the same state. Therefore the neighbor edata's state needs to be
* verified first, before chasing the edata pointer. The returned edata will be
* in an acquired state, meaning other threads will be prevented from accessing
* it, even if technically the edata can still be discovered from the rtree.
*
* This means, at any moment when holding pointers to edata, either one of the
* state based locks is held (and the edatas are all of the protected state), or
* the edatas are in an acquired state (e.g. in active or merging state). The
* acquire operation itself (changing the edata to an acquired state) is done
* under the state locks.
*/
edata_t *emap_try_acquire_edata_neighbor(tsdn_t *tsdn, emap_t *emap,
edata_t *edata, extent_pai_t pai, extent_state_t expected_state,
bool forward);
edata_t *emap_try_acquire_edata_neighbor_expand(tsdn_t *tsdn, emap_t *emap,
edata_t *edata, extent_pai_t pai, extent_state_t expected_state);
void emap_release_edata(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, extent_state_t new_state);
/*
* Associate the given edata with its beginning and end address, setting the
* szind and slab info appropriately.
* Returns true on error (i.e. resource exhaustion).
*/
bool emap_register_boundary(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind, bool slab);
/*
* Does the same thing, but with the interior of the range, for slab
* allocations.
*
* You might wonder why we don't just have a single emap_register function that
* does both depending on the value of 'slab'. The answer is twofold:
* - As a practical matter, in places like the extract->split->commit pathway,
* we defer the interior operation until we're sure that the commit won't fail
* (but we have to register the split boundaries there).
* - In general, we're trying to move to a world where the page-specific
* allocator doesn't know as much about how the pages it allocates will be
* used, and passing a 'slab' parameter everywhere makes that more
* complicated.
*
* Unlike the boundary version, this function can't fail; this is because slabs
* can't get big enough to touch a new page that neither of the boundaries
* touched, so no allocation is necessary to fill the interior once the boundary
* has been touched.
*/
void emap_register_interior(
tsdn_t *tsdn, emap_t *emap, edata_t *edata, szind_t szind);
void emap_deregister_boundary(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
void emap_deregister_interior(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
typedef struct emap_prepare_s emap_prepare_t;
struct emap_prepare_s {
rtree_leaf_elm_t *lead_elm_a;
rtree_leaf_elm_t *lead_elm_b;
rtree_leaf_elm_t *trail_elm_a;
rtree_leaf_elm_t *trail_elm_b;
};
/**
* These functions the emap metadata management for merging, splitting, and
* reusing extents. In particular, they set the boundary mappings from
* addresses to edatas. If the result is going to be used as a slab, you
* still need to call emap_register_interior on it, though.
*
* Remap simply changes the szind and slab status of an extent's boundary
* mappings. If the extent is not a slab, it doesn't bother with updating the
* end mapping (since lookups only occur in the interior of an extent for
* slabs). Since the szind and slab status only make sense for active extents,
* this should only be called while activating or deactivating an extent.
*
* Split and merge have a "prepare" and a "commit" portion. The prepare portion
* does the operations that can be done without exclusive access to the extent
* in question, while the commit variant requires exclusive access to maintain
* the emap invariants. The only function that can fail is emap_split_prepare,
* and it returns true on failure (at which point the caller shouldn't commit).
*
* In all cases, "lead" refers to the lower-addressed extent, and trail to the
* higher-addressed one. It's the caller's responsibility to set the edata
* state appropriately.
*/
bool emap_split_prepare(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *edata, size_t size_a, edata_t *trail, size_t size_b);
void emap_split_commit(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, size_t size_a, edata_t *trail, size_t size_b);
void emap_merge_prepare(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, edata_t *trail);
void emap_merge_commit(tsdn_t *tsdn, emap_t *emap, emap_prepare_t *prepare,
edata_t *lead, edata_t *trail);
/* Assert that the emap's view of the given edata matches the edata's view. */
void emap_do_assert_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
static inline void
emap_assert_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (config_debug) {
emap_do_assert_mapped(tsdn, emap, edata);
}
}
/* Assert that the given edata isn't in the map. */
void emap_do_assert_not_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata);
static inline void
emap_assert_not_mapped(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (config_debug) {
emap_do_assert_not_mapped(tsdn, emap, edata);
}
}
JEMALLOC_ALWAYS_INLINE bool
emap_edata_in_transition(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
assert(config_debug);
emap_assert_mapped(tsdn, emap, edata);
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)edata_base_get(edata));
return edata_state_in_transition(contents.metadata.state);
}
JEMALLOC_ALWAYS_INLINE bool
emap_edata_is_acquired(tsdn_t *tsdn, emap_t *emap, edata_t *edata) {
if (!config_debug) {
/* For assertions only. */
return false;
}
/*
* The edata is considered acquired if no other threads will attempt to
* read / write any fields from it. This includes a few cases:
*
* 1) edata not hooked into emap yet -- This implies the edata just got
* allocated or initialized.
*
* 2) in an active or transition state -- In both cases, the edata can
* be discovered from the emap, however the state tracked in the rtree
* will prevent other threads from accessing the actual edata.
*/
EMAP_DECLARE_RTREE_CTX;
rtree_leaf_elm_t *elm = rtree_leaf_elm_lookup(tsdn, &emap->rtree,
rtree_ctx, (uintptr_t)edata_base_get(edata), /* dependent */ false,
/* init_missing */ false);
if (elm == NULL) {
return true;
}
rtree_contents_t contents = rtree_leaf_elm_read(tsdn, &emap->rtree, elm,
/* dependent */ false);
if (contents.edata == NULL
|| contents.metadata.state == extent_state_active
|| edata_state_in_transition(contents.metadata.state)) {
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE void
extent_assert_can_coalesce(const edata_t *inner, const edata_t *outer) {
assert(edata_arena_ind_get(inner) == edata_arena_ind_get(outer));
assert(edata_pai_get(inner) == edata_pai_get(outer));
assert(edata_committed_get(inner) == edata_committed_get(outer));
assert(edata_state_get(inner) == extent_state_active);
assert(edata_state_get(outer) == extent_state_merging);
assert(!edata_guarded_get(inner) && !edata_guarded_get(outer));
assert(edata_base_get(inner) == edata_past_get(outer)
|| edata_base_get(outer) == edata_past_get(inner));
}
JEMALLOC_ALWAYS_INLINE void
extent_assert_can_expand(const edata_t *original, const edata_t *expand) {
assert(edata_arena_ind_get(original) == edata_arena_ind_get(expand));
assert(edata_pai_get(original) == edata_pai_get(expand));
assert(edata_state_get(original) == extent_state_active);
assert(edata_state_get(expand) == extent_state_merging);
assert(edata_past_get(original) == edata_base_get(expand));
}
JEMALLOC_ALWAYS_INLINE edata_t *
emap_edata_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr) {
EMAP_DECLARE_RTREE_CTX;
return rtree_read(tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr).edata;
}
JEMALLOC_ALWAYS_INLINE void
emap_alloc_ctx_init(
emap_alloc_ctx_t *alloc_ctx, szind_t szind, bool slab, size_t usize) {
alloc_ctx->szind = szind;
alloc_ctx->slab = slab;
alloc_ctx->usize = usize;
assert(
sz_large_size_classes_disabled() || usize == sz_index2size(szind));
}
JEMALLOC_ALWAYS_INLINE size_t
emap_alloc_ctx_usize_get(emap_alloc_ctx_t *alloc_ctx) {
assert(alloc_ctx->szind < SC_NSIZES);
if (alloc_ctx->slab) {
assert(alloc_ctx->usize == sz_index2size(alloc_ctx->szind));
return sz_index2size(alloc_ctx->szind);
}
assert(sz_large_size_classes_disabled()
|| alloc_ctx->usize == sz_index2size(alloc_ctx->szind));
assert(alloc_ctx->usize <= SC_LARGE_MAXCLASS);
return alloc_ctx->usize;
}
/* Fills in alloc_ctx with the info in the map. */
JEMALLOC_ALWAYS_INLINE void
emap_alloc_ctx_lookup(
tsdn_t *tsdn, emap_t *emap, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr);
/*
* If the alloc is invalid, do not calculate usize since edata
* could be corrupted.
*/
emap_alloc_ctx_init(alloc_ctx, contents.metadata.szind,
contents.metadata.slab,
(contents.metadata.szind == SC_NSIZES || contents.edata == NULL)
? 0
: edata_usize_get(contents.edata));
}
/* The pointer must be mapped. */
JEMALLOC_ALWAYS_INLINE void
emap_full_alloc_ctx_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr,
emap_full_alloc_ctx_t *full_alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents = rtree_read(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr);
full_alloc_ctx->edata = contents.edata;
full_alloc_ctx->szind = contents.metadata.szind;
full_alloc_ctx->slab = contents.metadata.slab;
}
/*
* The pointer is allowed to not be mapped.
*
* Returns true when the pointer is not present.
*/
JEMALLOC_ALWAYS_INLINE bool
emap_full_alloc_ctx_try_lookup(tsdn_t *tsdn, emap_t *emap, const void *ptr,
emap_full_alloc_ctx_t *full_alloc_ctx) {
EMAP_DECLARE_RTREE_CTX;
rtree_contents_t contents;
bool err = rtree_read_independent(
tsdn, &emap->rtree, rtree_ctx, (uintptr_t)ptr, &contents);
if (err) {
return true;
}
full_alloc_ctx->edata = contents.edata;
full_alloc_ctx->szind = contents.metadata.szind;
full_alloc_ctx->slab = contents.metadata.slab;
return false;
}
/*
* Only used on the fastpath of free. Returns true when cannot be fulfilled by
* fast path, e.g. when the metadata key is not cached.
*/
JEMALLOC_ALWAYS_INLINE bool
emap_alloc_ctx_try_lookup_fast(
tsd_t *tsd, emap_t *emap, const void *ptr, emap_alloc_ctx_t *alloc_ctx) {
/* Use the unsafe getter since this may gets called during exit. */
rtree_ctx_t *rtree_ctx = tsd_rtree_ctxp_get_unsafe(tsd);
rtree_metadata_t metadata;
bool err = rtree_metadata_try_read_fast(
tsd_tsdn(tsd), &emap->rtree, rtree_ctx, (uintptr_t)ptr, &metadata);
if (err) {
return true;
}
/*
* Small allocs using the fastpath can always use index to get the
* usize. Therefore, do not set alloc_ctx->usize here.
*/
alloc_ctx->szind = metadata.szind;
alloc_ctx->slab = metadata.slab;
if (config_debug) {
alloc_ctx->usize = SC_LARGE_MAXCLASS + 1;
}
return false;
}
/*
* We want to do batch lookups out of the cache bins, which use
* cache_bin_ptr_array_get to access the i'th element of the bin (since they
* invert usual ordering in deciding what to flush). This lets the emap avoid
* caring about its caller's ordering.
*/
typedef const void *(*emap_ptr_getter)(void *ctx, size_t ind);
/*
* This allows size-checking assertions, which we can only do while we're in the
* process of edata lookups.
*/
typedef void (*emap_metadata_visitor)(
void *ctx, emap_full_alloc_ctx_t *alloc_ctx);
typedef union emap_batch_lookup_result_u emap_batch_lookup_result_t;
union emap_batch_lookup_result_u {
edata_t *edata;
rtree_leaf_elm_t *rtree_leaf;
};
JEMALLOC_ALWAYS_INLINE void
emap_edata_lookup_batch(tsd_t *tsd, emap_t *emap, size_t nptrs,
emap_ptr_getter ptr_getter, void *ptr_getter_ctx,
emap_metadata_visitor metadata_visitor, void *metadata_visitor_ctx,
emap_batch_lookup_result_t *result) {
/* Avoids null-checking tsdn in the loop below. */
util_assume(tsd != NULL);
rtree_ctx_t *rtree_ctx = tsd_rtree_ctxp_get(tsd);
for (size_t i = 0; i < nptrs; i++) {
const void *ptr = ptr_getter(ptr_getter_ctx, i);
/*
* Reuse the edatas array as a temp buffer, lying a little about
* the types.
*/
result[i].rtree_leaf = rtree_leaf_elm_lookup(tsd_tsdn(tsd),
&emap->rtree, rtree_ctx, (uintptr_t)ptr,
/* dependent */ true, /* init_missing */ false);
}
for (size_t i = 0; i < nptrs; i++) {
rtree_leaf_elm_t *elm = result[i].rtree_leaf;
rtree_contents_t contents = rtree_leaf_elm_read(
tsd_tsdn(tsd), &emap->rtree, elm, /* dependent */ true);
result[i].edata = contents.edata;
emap_full_alloc_ctx_t alloc_ctx;
/*
* Not all these fields are read in practice by the metadata
* visitor. But the compiler can easily optimize away the ones
* that aren't, so no sense in being incomplete.
*/
alloc_ctx.szind = contents.metadata.szind;
alloc_ctx.slab = contents.metadata.slab;
alloc_ctx.edata = contents.edata;
metadata_visitor(metadata_visitor_ctx, &alloc_ctx);
}
}
#endif /* JEMALLOC_INTERNAL_EMAP_H */

View file

@ -0,0 +1,530 @@
#ifndef JEMALLOC_INTERNAL_EMITTER_H
#define JEMALLOC_INTERNAL_EMITTER_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/malloc_io.h"
#include "jemalloc/internal/ql.h"
typedef enum emitter_output_e emitter_output_t;
enum emitter_output_e {
emitter_output_json,
emitter_output_json_compact,
emitter_output_table
};
typedef enum emitter_justify_e emitter_justify_t;
enum emitter_justify_e {
emitter_justify_left,
emitter_justify_right,
/* Not for users; just to pass to internal functions. */
emitter_justify_none
};
typedef enum emitter_type_e emitter_type_t;
enum emitter_type_e {
emitter_type_bool,
emitter_type_int,
emitter_type_int64,
emitter_type_unsigned,
emitter_type_uint32,
emitter_type_uint64,
emitter_type_size,
emitter_type_ssize,
emitter_type_string,
/*
* A title is a column title in a table; it's just a string, but it's
* not quoted.
*/
emitter_type_title,
};
typedef struct emitter_col_s emitter_col_t;
struct emitter_col_s {
/* Filled in by the user. */
emitter_justify_t justify;
int width;
emitter_type_t type;
union {
bool bool_val;
int int_val;
unsigned unsigned_val;
uint32_t uint32_val;
uint32_t uint32_t_val;
uint64_t uint64_val;
uint64_t uint64_t_val;
size_t size_val;
ssize_t ssize_val;
const char *str_val;
};
/* Filled in by initialization. */
ql_elm(emitter_col_t) link;
};
typedef struct emitter_row_s emitter_row_t;
struct emitter_row_s {
ql_head(emitter_col_t) cols;
};
typedef struct emitter_s emitter_t;
struct emitter_s {
emitter_output_t output;
/* The output information. */
write_cb_t *write_cb;
void *cbopaque;
int nesting_depth;
/* True if we've already emitted a value at the given depth. */
bool item_at_depth;
/* True if we emitted a key and will emit corresponding value next. */
bool emitted_key;
};
static inline bool
emitter_outputs_json(emitter_t *emitter) {
return emitter->output == emitter_output_json
|| emitter->output == emitter_output_json_compact;
}
/* Internal convenience function. Write to the emitter the given string. */
JEMALLOC_FORMAT_PRINTF(2, 3)
static inline void
emitter_printf(emitter_t *emitter, const char *format, ...) {
va_list ap;
va_start(ap, format);
malloc_vcprintf(emitter->write_cb, emitter->cbopaque, format, ap);
va_end(ap);
}
static inline const char *
JEMALLOC_FORMAT_ARG(3) emitter_gen_fmt(char *out_fmt, size_t out_size,
const char *fmt_specifier, emitter_justify_t justify, int width) {
size_t written;
fmt_specifier++;
if (justify == emitter_justify_none) {
written = malloc_snprintf(
out_fmt, out_size, "%%%s", fmt_specifier);
} else if (justify == emitter_justify_left) {
written = malloc_snprintf(
out_fmt, out_size, "%%-%d%s", width, fmt_specifier);
} else {
written = malloc_snprintf(
out_fmt, out_size, "%%%d%s", width, fmt_specifier);
}
/* Only happens in case of bad format string, which *we* choose. */
assert(written < out_size);
return out_fmt;
}
static inline void
emitter_emit_str(emitter_t *emitter, emitter_justify_t justify, int width,
char *fmt, size_t fmt_size, const char *str) {
#define BUF_SIZE 256
char buf[BUF_SIZE];
size_t str_written = malloc_snprintf(buf, BUF_SIZE, "\"%s\"", str);
emitter_printf(
emitter, emitter_gen_fmt(fmt, fmt_size, "%s", justify, width), buf);
if (str_written < BUF_SIZE) {
return;
}
/*
* There is no support for long string justification at the moment as
* we output them partially with multiple malloc_snprintf calls and
* justufication will work correctly only withing one call.
* Fortunately this is not a big concern as we don't use justufication
* with long strings right now.
*
* We emitted leading quotation mark and trailing '\0', hence need to
* exclude extra characters from str shift.
*/
str += BUF_SIZE - 2;
do {
str_written = malloc_snprintf(buf, BUF_SIZE, "%s\"", str);
str += str_written >= BUF_SIZE ? BUF_SIZE - 1 : str_written;
emitter_printf(emitter,
emitter_gen_fmt(fmt, fmt_size, "%s", justify, width), buf);
} while (str_written >= BUF_SIZE);
#undef BUF_SIZE
}
/*
* Internal. Emit the given value type in the relevant encoding (so that the
* bool true gets mapped to json "true", but the string "true" gets mapped to
* json "\"true\"", for instance.
*
* Width is ignored if justify is emitter_justify_none.
*/
static inline void
emitter_print_value(emitter_t *emitter, emitter_justify_t justify, int width,
emitter_type_t value_type, const void *value) {
#define FMT_SIZE 10
/*
* We dynamically generate a format string to emit, to let us use the
* snprintf machinery. This is kinda hacky, but gets the job done
* quickly without having to think about the various snprintf edge
* cases.
*/
char fmt[FMT_SIZE];
#define EMIT_SIMPLE(type, format) \
emitter_printf(emitter, \
emitter_gen_fmt(fmt, FMT_SIZE, format, justify, width), \
*(const type *)value);
switch (value_type) {
case emitter_type_bool:
emitter_printf(emitter,
emitter_gen_fmt(fmt, FMT_SIZE, "%s", justify, width),
*(const bool *)value ? "true" : "false");
break;
case emitter_type_int:
EMIT_SIMPLE(int, "%d")
break;
case emitter_type_int64:
EMIT_SIMPLE(int64_t, "%" FMTd64)
break;
case emitter_type_unsigned:
EMIT_SIMPLE(unsigned, "%u")
break;
case emitter_type_ssize:
EMIT_SIMPLE(ssize_t, "%zd")
break;
case emitter_type_size:
EMIT_SIMPLE(size_t, "%zu")
break;
case emitter_type_string:
emitter_emit_str(emitter, justify, width, fmt, FMT_SIZE,
*(const char *const *)value);
break;
case emitter_type_uint32:
EMIT_SIMPLE(uint32_t, "%" FMTu32)
break;
case emitter_type_uint64:
EMIT_SIMPLE(uint64_t, "%" FMTu64)
break;
case emitter_type_title:
EMIT_SIMPLE(char *const, "%s");
break;
default:
unreachable();
}
#undef FMT_SIZE
}
/* Internal functions. In json mode, tracks nesting state. */
static inline void
emitter_nest_inc(emitter_t *emitter) {
emitter->nesting_depth++;
emitter->item_at_depth = false;
}
static inline void
emitter_nest_dec(emitter_t *emitter) {
emitter->nesting_depth--;
emitter->item_at_depth = true;
}
static inline void
emitter_indent(emitter_t *emitter) {
int amount = emitter->nesting_depth;
const char *indent_str;
assert(emitter->output != emitter_output_json_compact);
if (emitter->output == emitter_output_json) {
indent_str = "\t";
} else {
amount *= 2;
indent_str = " ";
}
for (int i = 0; i < amount; i++) {
emitter_printf(emitter, "%s", indent_str);
}
}
static inline void
emitter_json_key_prefix(emitter_t *emitter) {
assert(emitter_outputs_json(emitter));
if (emitter->emitted_key) {
emitter->emitted_key = false;
return;
}
if (emitter->item_at_depth) {
emitter_printf(emitter, ",");
}
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
}
/******************************************************************************/
/* Public functions for emitter_t. */
static inline void
emitter_init(emitter_t *emitter, emitter_output_t emitter_output,
write_cb_t *write_cb, void *cbopaque) {
emitter->output = emitter_output;
emitter->write_cb = write_cb;
emitter->cbopaque = cbopaque;
emitter->item_at_depth = false;
emitter->emitted_key = false;
emitter->nesting_depth = 0;
}
/******************************************************************************/
/* JSON public API. */
/*
* Emits a key (e.g. as appears in an object). The next json entity emitted will
* be the corresponding value.
*/
static inline void
emitter_json_key(emitter_t *emitter, const char *json_key) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "\"%s\":%s", json_key,
emitter->output == emitter_output_json_compact ? "" : " ");
emitter->emitted_key = true;
}
}
static inline void
emitter_json_value(
emitter_t *emitter, emitter_type_t value_type, const void *value) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_print_value(
emitter, emitter_justify_none, -1, value_type, value);
emitter->item_at_depth = true;
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_value. */
static inline void
emitter_json_kv(emitter_t *emitter, const char *json_key,
emitter_type_t value_type, const void *value) {
emitter_json_key(emitter, json_key);
emitter_json_value(emitter, value_type, value);
}
static inline void
emitter_json_array_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "[");
emitter_nest_inc(emitter);
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_array_begin. */
static inline void
emitter_json_array_kv_begin(emitter_t *emitter, const char *json_key) {
emitter_json_key(emitter, json_key);
emitter_json_array_begin(emitter);
}
static inline void
emitter_json_array_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth > 0);
emitter_nest_dec(emitter);
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
emitter_printf(emitter, "]");
}
}
static inline void
emitter_json_object_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_key_prefix(emitter);
emitter_printf(emitter, "{");
emitter_nest_inc(emitter);
}
}
/* Shorthand for calling emitter_json_key and then emitter_json_object_begin. */
static inline void
emitter_json_object_kv_begin(emitter_t *emitter, const char *json_key) {
emitter_json_key(emitter, json_key);
emitter_json_object_begin(emitter);
}
static inline void
emitter_json_object_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth > 0);
emitter_nest_dec(emitter);
if (emitter->output != emitter_output_json_compact) {
emitter_printf(emitter, "\n");
emitter_indent(emitter);
}
emitter_printf(emitter, "}");
}
}
/******************************************************************************/
/* Table public API. */
static inline void
emitter_table_dict_begin(emitter_t *emitter, const char *table_key) {
if (emitter->output == emitter_output_table) {
emitter_indent(emitter);
emitter_printf(emitter, "%s\n", table_key);
emitter_nest_inc(emitter);
}
}
static inline void
emitter_table_dict_end(emitter_t *emitter) {
if (emitter->output == emitter_output_table) {
emitter_nest_dec(emitter);
}
}
static inline void
emitter_table_kv_note(emitter_t *emitter, const char *table_key,
emitter_type_t value_type, const void *value, const char *table_note_key,
emitter_type_t table_note_value_type, const void *table_note_value) {
if (emitter->output == emitter_output_table) {
emitter_indent(emitter);
emitter_printf(emitter, "%s: ", table_key);
emitter_print_value(
emitter, emitter_justify_none, -1, value_type, value);
if (table_note_key != NULL) {
emitter_printf(emitter, " (%s: ", table_note_key);
emitter_print_value(emitter, emitter_justify_none, -1,
table_note_value_type, table_note_value);
emitter_printf(emitter, ")");
}
emitter_printf(emitter, "\n");
}
emitter->item_at_depth = true;
}
static inline void
emitter_table_kv(emitter_t *emitter, const char *table_key,
emitter_type_t value_type, const void *value) {
emitter_table_kv_note(emitter, table_key, value_type, value, NULL,
emitter_type_bool, NULL);
}
/* Write to the emitter the given string, but only in table mode. */
JEMALLOC_FORMAT_PRINTF(2, 3)
static inline void
emitter_table_printf(emitter_t *emitter, const char *format, ...) {
if (emitter->output == emitter_output_table) {
va_list ap;
va_start(ap, format);
malloc_vcprintf(
emitter->write_cb, emitter->cbopaque, format, ap);
va_end(ap);
}
}
static inline void
emitter_table_row(emitter_t *emitter, emitter_row_t *row) {
if (emitter->output != emitter_output_table) {
return;
}
emitter_col_t *col;
ql_foreach (col, &row->cols, link) {
emitter_print_value(emitter, col->justify, col->width,
col->type, (const void *)&col->bool_val);
}
emitter_table_printf(emitter, "\n");
}
static inline void
emitter_row_init(emitter_row_t *row) {
ql_new(&row->cols);
}
static inline void
emitter_col_init(emitter_col_t *col, emitter_row_t *row) {
ql_elm_new(col, link);
ql_tail_insert(&row->cols, col, link);
}
/******************************************************************************/
/*
* Generalized public API. Emits using either JSON or table, according to
* settings in the emitter_t. */
/*
* Note emits a different kv pair as well, but only in table mode. Omits the
* note if table_note_key is NULL.
*/
static inline void
emitter_kv_note(emitter_t *emitter, const char *json_key, const char *table_key,
emitter_type_t value_type, const void *value, const char *table_note_key,
emitter_type_t table_note_value_type, const void *table_note_value) {
if (emitter_outputs_json(emitter)) {
emitter_json_key(emitter, json_key);
emitter_json_value(emitter, value_type, value);
} else {
emitter_table_kv_note(emitter, table_key, value_type, value,
table_note_key, table_note_value_type, table_note_value);
}
emitter->item_at_depth = true;
}
static inline void
emitter_kv(emitter_t *emitter, const char *json_key, const char *table_key,
emitter_type_t value_type, const void *value) {
emitter_kv_note(emitter, json_key, table_key, value_type, value, NULL,
emitter_type_bool, NULL);
}
static inline void
emitter_dict_begin(
emitter_t *emitter, const char *json_key, const char *table_header) {
if (emitter_outputs_json(emitter)) {
emitter_json_key(emitter, json_key);
emitter_json_object_begin(emitter);
} else {
emitter_table_dict_begin(emitter, table_header);
}
}
static inline void
emitter_dict_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
emitter_json_object_end(emitter);
} else {
emitter_table_dict_end(emitter);
}
}
static inline void
emitter_begin(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth == 0);
emitter_printf(emitter, "{");
emitter_nest_inc(emitter);
} else {
/*
* This guarantees that we always call write_cb at least once.
* This is useful if some invariant is established by each call
* to write_cb, but doesn't hold initially: e.g., some buffer
* holds a null-terminated string.
*/
emitter_printf(emitter, "%s", "");
}
}
static inline void
emitter_end(emitter_t *emitter) {
if (emitter_outputs_json(emitter)) {
assert(emitter->nesting_depth == 1);
emitter_nest_dec(emitter);
emitter_printf(emitter, "%s",
emitter->output == emitter_output_json_compact ? "}"
: "\n}\n");
}
}
#endif /* JEMALLOC_INTERNAL_EMITTER_H */

View file

@ -0,0 +1,78 @@
#ifndef JEMALLOC_INTERNAL_ESET_H
#define JEMALLOC_INTERNAL_ESET_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/edata.h"
#include "jemalloc/internal/fb.h"
#include "jemalloc/internal/mutex.h"
/*
* An eset ("extent set") is a quantized collection of extents, with built-in
* LRU queue.
*
* This class is not thread-safe; synchronization must be done externally if
* there are mutating operations. One exception is the stats counters, which
* may be read without any locking.
*/
typedef struct eset_bin_s eset_bin_t;
struct eset_bin_s {
edata_heap_t heap;
/*
* We do first-fit across multiple size classes. If we compared against
* the min element in each heap directly, we'd take a cache miss per
* extent we looked at. If we co-locate the edata summaries, we only
* take a miss on the edata we're actually going to return (which is
* inevitable anyways).
*/
edata_cmp_summary_t heap_min;
};
typedef struct eset_bin_stats_s eset_bin_stats_t;
struct eset_bin_stats_s {
atomic_zu_t nextents;
atomic_zu_t nbytes;
};
typedef struct eset_s eset_t;
struct eset_s {
/* Bitmap for which set bits correspond to non-empty heaps. */
fb_group_t bitmap[FB_NGROUPS(SC_NPSIZES + 1)];
/* Quantized per size class heaps of extents. */
eset_bin_t bins[SC_NPSIZES + 1];
eset_bin_stats_t bin_stats[SC_NPSIZES + 1];
/* LRU of all extents in heaps. */
edata_list_inactive_t lru;
/* Page sum for all extents in heaps. */
atomic_zu_t npages;
/*
* A duplication of the data in the containing ecache. We use this only
* for assertions on the states of the passed-in extents.
*/
extent_state_t state;
};
void eset_init(eset_t *eset, extent_state_t state);
size_t eset_npages_get(eset_t *eset);
/* Get the number of extents in the given page size index. */
size_t eset_nextents_get(eset_t *eset, pszind_t ind);
/* Get the sum total bytes of the extents in the given page size index. */
size_t eset_nbytes_get(eset_t *eset, pszind_t ind);
void eset_insert(eset_t *eset, edata_t *edata);
void eset_remove(eset_t *eset, edata_t *edata);
/*
* Select an extent from this eset of the given size and alignment. Returns
* null if no such item could be found.
*/
edata_t *eset_fit(eset_t *eset, size_t esize, size_t alignment, bool exact_only,
unsigned lg_max_fit);
#endif /* JEMALLOC_INTERNAL_ESET_H */

View file

@ -0,0 +1,50 @@
#ifndef JEMALLOC_INTERNAL_EXP_GROW_H
#define JEMALLOC_INTERNAL_EXP_GROW_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/sz.h"
typedef struct exp_grow_s exp_grow_t;
struct exp_grow_s {
/*
* Next extent size class in a growing series to use when satisfying a
* request via the extent hooks (only if opt_retain). This limits the
* number of disjoint virtual memory ranges so that extent merging can
* be effective even if multiple arenas' extent allocation requests are
* highly interleaved.
*
* retain_grow_limit is the max allowed size ind to expand (unless the
* required size is greater). Default is no limit, and controlled
* through mallctl only.
*/
pszind_t next;
pszind_t limit;
};
static inline bool
exp_grow_size_prepare(exp_grow_t *exp_grow, size_t alloc_size_min,
size_t *r_alloc_size, pszind_t *r_skip) {
*r_skip = 0;
*r_alloc_size = sz_pind2sz(exp_grow->next + *r_skip);
while (*r_alloc_size < alloc_size_min) {
(*r_skip)++;
if (exp_grow->next + *r_skip >= sz_psz2ind(SC_LARGE_MAXCLASS)) {
/* Outside legal range. */
return true;
}
*r_alloc_size = sz_pind2sz(exp_grow->next + *r_skip);
}
return false;
}
static inline void
exp_grow_size_commit(exp_grow_t *exp_grow, pszind_t skip) {
if (exp_grow->next + skip + 1 <= exp_grow->limit) {
exp_grow->next += skip + 1;
} else {
exp_grow->next = exp_grow->limit;
}
}
void exp_grow_init(exp_grow_t *exp_grow);
#endif /* JEMALLOC_INTERNAL_EXP_GROW_H */

View file

@ -1,239 +1,148 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#ifndef JEMALLOC_INTERNAL_EXTENT_H
#define JEMALLOC_INTERNAL_EXTENT_H
typedef struct extent_node_s extent_node_t;
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/ecache.h"
#include "jemalloc/internal/ehooks.h"
#include "jemalloc/internal/pac.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/rtree.h"
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
/*
* This module contains the page-level allocator. It chooses the addresses that
* allocations requested by other modules will inhabit, and updates the global
* metadata to reflect allocation/deallocation/purging decisions.
*/
/* Tree of extents. Use accessor functions for en_* fields. */
struct extent_node_s {
/* Arena from which this extent came, if any. */
arena_t *en_arena;
/*
* When reuse (and split) an active extent, (1U << opt_lg_extent_max_active_fit)
* is the max ratio between the size of the active extent and the new extent.
*/
#define LG_EXTENT_MAX_ACTIVE_FIT_DEFAULT 6
extern size_t opt_lg_extent_max_active_fit;
/* Pointer to the extent that this tree node is responsible for. */
void *en_addr;
#define PROCESS_MADVISE_MAX_BATCH_DEFAULT 0
extern size_t opt_process_madvise_max_batch;
/* Total region size. */
size_t en_size;
/*
* The zeroed flag is used by chunk recycling code to track whether
* memory is zero-filled.
*/
bool en_zeroed;
/*
* True if physical memory is committed to the extent, whether
* explicitly or implicitly as on a system that overcommits and
* satisfies physical memory needs on demand via soft page faults.
*/
bool en_committed;
/*
* The achunk flag is used to validate that huge allocation lookups
* don't return arena chunks.
*/
bool en_achunk;
/* Profile counters, used for huge objects. */
prof_tctx_t *en_prof_tctx;
/* Linkage for arena's runs_dirty and chunks_cache rings. */
arena_runs_dirty_link_t rd;
qr(extent_node_t) cc_link;
union {
/* Linkage for the size/address-ordered tree. */
rb_node(extent_node_t) szad_link;
/* Linkage for arena's achunks, huge, and node_cache lists. */
ql_elm(extent_node_t) ql_link;
};
/* Linkage for the address-ordered tree. */
rb_node(extent_node_t) ad_link;
};
typedef rb_tree(extent_node_t) extent_tree_t;
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
rb_proto(, extent_tree_szad_, extent_tree_t, extent_node_t)
rb_proto(, extent_tree_ad_, extent_tree_t, extent_node_t)
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#ifndef JEMALLOC_ENABLE_INLINE
arena_t *extent_node_arena_get(const extent_node_t *node);
void *extent_node_addr_get(const extent_node_t *node);
size_t extent_node_size_get(const extent_node_t *node);
bool extent_node_zeroed_get(const extent_node_t *node);
bool extent_node_committed_get(const extent_node_t *node);
bool extent_node_achunk_get(const extent_node_t *node);
prof_tctx_t *extent_node_prof_tctx_get(const extent_node_t *node);
void extent_node_arena_set(extent_node_t *node, arena_t *arena);
void extent_node_addr_set(extent_node_t *node, void *addr);
void extent_node_size_set(extent_node_t *node, size_t size);
void extent_node_zeroed_set(extent_node_t *node, bool zeroed);
void extent_node_committed_set(extent_node_t *node, bool committed);
void extent_node_achunk_set(extent_node_t *node, bool achunk);
void extent_node_prof_tctx_set(extent_node_t *node, prof_tctx_t *tctx);
void extent_node_init(extent_node_t *node, arena_t *arena, void *addr,
size_t size, bool zeroed, bool committed);
void extent_node_dirty_linkage_init(extent_node_t *node);
void extent_node_dirty_insert(extent_node_t *node,
arena_runs_dirty_link_t *runs_dirty, extent_node_t *chunks_dirty);
void extent_node_dirty_remove(extent_node_t *node);
#ifdef JEMALLOC_HAVE_PROCESS_MADVISE
/* The iovec is on stack. Limit the max batch to avoid stack overflow. */
# define PROCESS_MADVISE_MAX_BATCH_LIMIT \
(VARIABLE_ARRAY_SIZE_MAX / sizeof(struct iovec))
#else
# define PROCESS_MADVISE_MAX_BATCH_LIMIT 0
#endif
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_EXTENT_C_))
JEMALLOC_INLINE arena_t *
extent_node_arena_get(const extent_node_t *node)
{
edata_t *ecache_alloc(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, edata_t *expand_edata, size_t size, size_t alignment,
bool zero, bool guarded);
edata_t *ecache_alloc_grow(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, edata_t *expand_edata, size_t size, size_t alignment,
bool zero, bool guarded);
void ecache_dalloc(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, ecache_t *ecache,
edata_t *edata);
edata_t *ecache_evict(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
ecache_t *ecache, size_t npages_min);
return (node->en_arena);
void extent_gdump_add(tsdn_t *tsdn, const edata_t *edata);
void extent_record(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, ecache_t *ecache,
edata_t *edata);
void extent_dalloc_gap(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
edata_t *extent_alloc_wrapper(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
void *new_addr, size_t size, size_t alignment, bool zero, bool *commit,
bool growing_retained);
void extent_dalloc_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
void extent_dalloc_wrapper_purged(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
void extent_destroy_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *edata);
bool extent_purge_lazy_wrapper(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
size_t offset, size_t length);
bool extent_purge_forced_wrapper(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
size_t offset, size_t length);
edata_t *extent_split_wrapper(tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks,
edata_t *edata, size_t size_a, size_t size_b, bool holding_core_locks);
bool extent_merge_wrapper(
tsdn_t *tsdn, pac_t *pac, ehooks_t *ehooks, edata_t *a, edata_t *b);
bool extent_commit_zero(tsdn_t *tsdn, ehooks_t *ehooks, edata_t *edata,
bool commit, bool zero, bool growing_retained);
size_t extent_sn_next(pac_t *pac);
bool extent_boot(void);
JEMALLOC_ALWAYS_INLINE bool
extent_neighbor_head_state_mergeable(
bool edata_is_head, bool neighbor_is_head, bool forward) {
/*
* Head states checking: disallow merging if the higher addr extent is a
* head extent. This helps preserve first-fit, and more importantly
* makes sure no merge across arenas.
*/
if (forward) {
if (neighbor_is_head) {
return false;
}
} else {
if (edata_is_head) {
return false;
}
}
return true;
}
JEMALLOC_INLINE void *
extent_node_addr_get(const extent_node_t *node)
{
JEMALLOC_ALWAYS_INLINE bool
extent_can_acquire_neighbor(edata_t *edata, rtree_contents_t contents,
extent_pai_t pai, extent_state_t expected_state, bool forward,
bool expanding) {
edata_t *neighbor = contents.edata;
if (neighbor == NULL) {
return false;
}
/* It's not safe to access *neighbor yet; must verify states first. */
bool neighbor_is_head = contents.metadata.is_head;
if (!extent_neighbor_head_state_mergeable(
edata_is_head_get(edata), neighbor_is_head, forward)) {
return false;
}
extent_state_t neighbor_state = contents.metadata.state;
if (pai == EXTENT_PAI_PAC) {
if (neighbor_state != expected_state) {
return false;
}
/* From this point, it's safe to access *neighbor. */
if (!expanding
&& (edata_committed_get(edata)
!= edata_committed_get(neighbor))) {
/*
* Some platforms (e.g. Windows) require an explicit
* commit step (and writing to uncommitted memory is not
* allowed).
*/
return false;
}
} else {
if (neighbor_state == extent_state_active) {
return false;
}
/* From this point, it's safe to access *neighbor. */
}
return (node->en_addr);
assert(edata_pai_get(edata) == pai);
if (edata_pai_get(neighbor) != pai) {
return false;
}
if (opt_retain) {
assert(edata_arena_ind_get(edata)
== edata_arena_ind_get(neighbor));
} else {
if (edata_arena_ind_get(edata)
!= edata_arena_ind_get(neighbor)) {
return false;
}
}
assert(!edata_guarded_get(edata) && !edata_guarded_get(neighbor));
return true;
}
JEMALLOC_INLINE size_t
extent_node_size_get(const extent_node_t *node)
{
return (node->en_size);
}
JEMALLOC_INLINE bool
extent_node_zeroed_get(const extent_node_t *node)
{
return (node->en_zeroed);
}
JEMALLOC_INLINE bool
extent_node_committed_get(const extent_node_t *node)
{
assert(!node->en_achunk);
return (node->en_committed);
}
JEMALLOC_INLINE bool
extent_node_achunk_get(const extent_node_t *node)
{
return (node->en_achunk);
}
JEMALLOC_INLINE prof_tctx_t *
extent_node_prof_tctx_get(const extent_node_t *node)
{
return (node->en_prof_tctx);
}
JEMALLOC_INLINE void
extent_node_arena_set(extent_node_t *node, arena_t *arena)
{
node->en_arena = arena;
}
JEMALLOC_INLINE void
extent_node_addr_set(extent_node_t *node, void *addr)
{
node->en_addr = addr;
}
JEMALLOC_INLINE void
extent_node_size_set(extent_node_t *node, size_t size)
{
node->en_size = size;
}
JEMALLOC_INLINE void
extent_node_zeroed_set(extent_node_t *node, bool zeroed)
{
node->en_zeroed = zeroed;
}
JEMALLOC_INLINE void
extent_node_committed_set(extent_node_t *node, bool committed)
{
node->en_committed = committed;
}
JEMALLOC_INLINE void
extent_node_achunk_set(extent_node_t *node, bool achunk)
{
node->en_achunk = achunk;
}
JEMALLOC_INLINE void
extent_node_prof_tctx_set(extent_node_t *node, prof_tctx_t *tctx)
{
node->en_prof_tctx = tctx;
}
JEMALLOC_INLINE void
extent_node_init(extent_node_t *node, arena_t *arena, void *addr, size_t size,
bool zeroed, bool committed)
{
extent_node_arena_set(node, arena);
extent_node_addr_set(node, addr);
extent_node_size_set(node, size);
extent_node_zeroed_set(node, zeroed);
extent_node_committed_set(node, committed);
extent_node_achunk_set(node, false);
if (config_prof)
extent_node_prof_tctx_set(node, NULL);
}
JEMALLOC_INLINE void
extent_node_dirty_linkage_init(extent_node_t *node)
{
qr_new(&node->rd, rd_link);
qr_new(node, cc_link);
}
JEMALLOC_INLINE void
extent_node_dirty_insert(extent_node_t *node,
arena_runs_dirty_link_t *runs_dirty, extent_node_t *chunks_dirty)
{
qr_meld(runs_dirty, &node->rd, rd_link);
qr_meld(chunks_dirty, node, cc_link);
}
JEMALLOC_INLINE void
extent_node_dirty_remove(extent_node_t *node)
{
qr_remove(&node->rd, rd_link);
qr_remove(node, cc_link);
}
#endif
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#endif /* JEMALLOC_INTERNAL_EXTENT_H */

View file

@ -0,0 +1,30 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_DSS_H
#define JEMALLOC_INTERNAL_EXTENT_DSS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/tsd_types.h"
typedef enum {
dss_prec_disabled = 0,
dss_prec_primary = 1,
dss_prec_secondary = 2,
dss_prec_limit = 3
} dss_prec_t;
#define DSS_PREC_DEFAULT dss_prec_secondary
#define DSS_DEFAULT "secondary"
extern const char *const dss_prec_names[];
extern const char *opt_dss;
dss_prec_t extent_dss_prec_get(void);
bool extent_dss_prec_set(dss_prec_t dss_prec);
void *extent_alloc_dss(tsdn_t *tsdn, arena_t *arena, void *new_addr,
size_t size, size_t alignment, bool *zero, bool *commit);
bool extent_in_dss(void *addr);
bool extent_dss_mergeable(void *addr_a, void *addr_b);
void extent_dss_boot(void);
#endif /* JEMALLOC_INTERNAL_EXTENT_DSS_H */

View file

@ -0,0 +1,12 @@
#ifndef JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H
#define JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H
#include "jemalloc/internal/jemalloc_preamble.h"
extern bool opt_retain;
void *extent_alloc_mmap(
void *new_addr, size_t size, size_t alignment, bool *zero, bool *commit);
bool extent_dalloc_mmap(void *addr, size_t size);
#endif /* JEMALLOC_INTERNAL_EXTENT_MMAP_EXTERNS_H */

View file

@ -0,0 +1,378 @@
#ifndef JEMALLOC_INTERNAL_FB_H
#define JEMALLOC_INTERNAL_FB_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
#include "jemalloc/internal/bit_util.h"
/*
* The flat bitmap module. This has a larger API relative to the bitmap module
* (supporting things like backwards searches, and searching for both set and
* unset bits), at the cost of slower operations for very large bitmaps.
*
* Initialized flat bitmaps start at all-zeros (all bits unset).
*/
typedef unsigned long fb_group_t;
#define FB_GROUP_BITS (ZU(1) << (LG_SIZEOF_LONG + 3))
#define FB_NGROUPS(nbits) \
((nbits) / FB_GROUP_BITS + ((nbits) % FB_GROUP_BITS == 0 ? 0 : 1))
static inline void
fb_init(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
memset(fb, 0, ngroups * sizeof(fb_group_t));
}
static inline bool
fb_empty(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
if (fb[i] != 0) {
return false;
}
}
return true;
}
static inline bool
fb_full(fb_group_t *fb, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
size_t trailing_bits = nbits % FB_GROUP_BITS;
size_t limit = (trailing_bits == 0 ? ngroups : ngroups - 1);
for (size_t i = 0; i < limit; i++) {
if (fb[i] != ~(fb_group_t)0) {
return false;
}
}
if (trailing_bits == 0) {
return true;
}
return fb[ngroups - 1] == ((fb_group_t)1 << trailing_bits) - 1;
}
static inline bool
fb_get(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
return (bool)(fb[group_ind] & ((fb_group_t)1 << bit_ind));
}
static inline void
fb_set(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
fb[group_ind] |= ((fb_group_t)1 << bit_ind);
}
static inline void
fb_unset(fb_group_t *fb, size_t nbits, size_t bit) {
assert(bit < nbits);
size_t group_ind = bit / FB_GROUP_BITS;
size_t bit_ind = bit % FB_GROUP_BITS;
fb[group_ind] &= ~((fb_group_t)1 << bit_ind);
}
/*
* Some implementation details. This visitation function lets us apply a group
* visitor to each group in the bitmap (potentially modifying it). The mask
* indicates which bits are logically part of the visitation.
*/
typedef void (*fb_group_visitor_t)(void *ctx, fb_group_t *fb, fb_group_t mask);
JEMALLOC_ALWAYS_INLINE void
fb_visit_impl(fb_group_t *fb, size_t nbits, fb_group_visitor_t visit, void *ctx,
size_t start, size_t cnt) {
assert(cnt > 0);
assert(start + cnt <= nbits);
size_t group_ind = start / FB_GROUP_BITS;
size_t start_bit_ind = start % FB_GROUP_BITS;
/*
* The first group is special; it's the only one we don't start writing
* to from bit 0.
*/
size_t first_group_cnt = (start_bit_ind + cnt > FB_GROUP_BITS
? FB_GROUP_BITS - start_bit_ind
: cnt);
/*
* We can basically split affected words into:
* - The first group, where we touch only the high bits
* - The last group, where we touch only the low bits
* - The middle, where we set all the bits to the same thing.
* We treat each case individually. The last two could be merged, but
* this can lead to bad codegen for those middle words.
*/
/* First group */
fb_group_t mask =
((~(fb_group_t)0) >> (FB_GROUP_BITS - first_group_cnt))
<< start_bit_ind;
visit(ctx, &fb[group_ind], mask);
cnt -= first_group_cnt;
group_ind++;
/* Middle groups */
while (cnt > FB_GROUP_BITS) {
visit(ctx, &fb[group_ind], ~(fb_group_t)0);
cnt -= FB_GROUP_BITS;
group_ind++;
}
/* Last group */
if (cnt != 0) {
mask = (~(fb_group_t)0) >> (FB_GROUP_BITS - cnt);
visit(ctx, &fb[group_ind], mask);
}
}
JEMALLOC_ALWAYS_INLINE void
fb_assign_visitor(void *ctx, fb_group_t *fb, fb_group_t mask) {
bool val = *(bool *)ctx;
if (val) {
*fb |= mask;
} else {
*fb &= ~mask;
}
}
/* Sets the cnt bits starting at position start. Must not have a 0 count. */
static inline void
fb_set_range(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
bool val = true;
fb_visit_impl(fb, nbits, &fb_assign_visitor, &val, start, cnt);
}
/* Unsets the cnt bits starting at position start. Must not have a 0 count. */
static inline void
fb_unset_range(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
bool val = false;
fb_visit_impl(fb, nbits, &fb_assign_visitor, &val, start, cnt);
}
JEMALLOC_ALWAYS_INLINE void
fb_scount_visitor(void *ctx, fb_group_t *fb, fb_group_t mask) {
size_t *scount = (size_t *)ctx;
*scount += popcount_lu(*fb & mask);
}
/* Finds the number of set bit in the of length cnt starting at start. */
JEMALLOC_ALWAYS_INLINE size_t
fb_scount(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
size_t scount = 0;
fb_visit_impl(fb, nbits, &fb_scount_visitor, &scount, start, cnt);
return scount;
}
/* Finds the number of unset bit in the of length cnt starting at start. */
JEMALLOC_ALWAYS_INLINE size_t
fb_ucount(fb_group_t *fb, size_t nbits, size_t start, size_t cnt) {
size_t scount = fb_scount(fb, nbits, start, cnt);
return cnt - scount;
}
/*
* An implementation detail; find the first bit at position >= min_bit with the
* value val.
*
* Returns the number of bits in the bitmap if no such bit exists.
*/
JEMALLOC_ALWAYS_INLINE ssize_t
fb_find_impl(
fb_group_t *fb, size_t nbits, size_t start, bool val, bool forward) {
assert(start < nbits);
size_t ngroups = FB_NGROUPS(nbits);
ssize_t group_ind = start / FB_GROUP_BITS;
size_t bit_ind = start % FB_GROUP_BITS;
fb_group_t maybe_invert = (val ? 0 : (fb_group_t)-1);
fb_group_t group = fb[group_ind];
group ^= maybe_invert;
if (forward) {
/* Only keep ones in bits bit_ind and above. */
group &= ~((1LU << bit_ind) - 1);
} else {
/*
* Only keep ones in bits bit_ind and below. You might more
* naturally express this as (1 << (bit_ind + 1)) - 1, but
* that shifts by an invalid amount if bit_ind is one less than
* FB_GROUP_BITS.
*/
group &= ((2LU << bit_ind) - 1);
}
ssize_t group_ind_bound = forward ? (ssize_t)ngroups : -1;
while (group == 0) {
group_ind += forward ? 1 : -1;
if (group_ind == group_ind_bound) {
return forward ? (ssize_t)nbits : (ssize_t)-1;
}
group = fb[group_ind];
group ^= maybe_invert;
}
assert(group != 0);
size_t bit = forward ? ffs_lu(group) : fls_lu(group);
size_t pos = group_ind * FB_GROUP_BITS + bit;
/*
* The high bits of a partially filled last group are zeros, so if we're
* looking for zeros we don't want to report an invalid result.
*/
if (forward && !val && pos > nbits) {
return nbits;
}
return pos;
}
/*
* Find the first set bit in the bitmap with an index >= min_bit. Returns the
* number of bits in the bitmap if no such bit exists.
*/
static inline size_t
fb_ffu(fb_group_t *fb, size_t nbits, size_t min_bit) {
return (size_t)fb_find_impl(fb, nbits, min_bit, /* val */ false,
/* forward */ true);
}
/* The same, but looks for an unset bit. */
static inline size_t
fb_ffs(fb_group_t *fb, size_t nbits, size_t min_bit) {
return (size_t)fb_find_impl(fb, nbits, min_bit, /* val */ true,
/* forward */ true);
}
/*
* Find the last set bit in the bitmap with an index <= max_bit. Returns -1 if
* no such bit exists.
*/
static inline ssize_t
fb_flu(fb_group_t *fb, size_t nbits, size_t max_bit) {
return fb_find_impl(fb, nbits, max_bit, /* val */ false,
/* forward */ false);
}
static inline ssize_t
fb_fls(fb_group_t *fb, size_t nbits, size_t max_bit) {
return fb_find_impl(fb, nbits, max_bit, /* val */ true,
/* forward */ false);
}
/* Returns whether or not we found a range. */
JEMALLOC_ALWAYS_INLINE bool
fb_iter_range_impl(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len, bool val, bool forward) {
assert(start < nbits);
ssize_t next_range_begin = fb_find_impl(fb, nbits, start, val, forward);
if ((forward && next_range_begin == (ssize_t)nbits)
|| (!forward && next_range_begin == (ssize_t)-1)) {
return false;
}
/* Half open range; the set bits are [begin, end). */
ssize_t next_range_end = fb_find_impl(
fb, nbits, next_range_begin, !val, forward);
if (forward) {
*r_begin = next_range_begin;
*r_len = next_range_end - next_range_begin;
} else {
*r_begin = next_range_end + 1;
*r_len = next_range_begin - next_range_end;
}
return true;
}
/*
* Used to iterate through ranges of set bits.
*
* Tries to find the next contiguous sequence of set bits with a first index >=
* start. If one exists, puts the earliest bit of the range in *r_begin, its
* length in *r_len, and returns true. Otherwise, returns false (without
* touching *r_begin or *r_end).
*/
static inline bool
fb_srange_iter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ true, /* forward */ true);
}
/*
* The same as fb_srange_iter, but searches backwards from start rather than
* forwards. (The position returned is still the earliest bit in the range).
*/
static inline bool
fb_srange_riter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ true, /* forward */ false);
}
/* Similar to fb_srange_iter, but searches for unset bits. */
static inline bool
fb_urange_iter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ false, /* forward */ true);
}
/* Similar to fb_srange_riter, but searches for unset bits. */
static inline bool
fb_urange_riter(fb_group_t *fb, size_t nbits, size_t start, size_t *r_begin,
size_t *r_len) {
return fb_iter_range_impl(fb, nbits, start, r_begin, r_len,
/* val */ false, /* forward */ false);
}
JEMALLOC_ALWAYS_INLINE size_t
fb_range_longest_impl(fb_group_t *fb, size_t nbits, bool val) {
size_t begin = 0;
size_t longest_len = 0;
size_t len = 0;
while (begin < nbits
&& fb_iter_range_impl(
fb, nbits, begin, &begin, &len, val, /* forward */ true)) {
if (len > longest_len) {
longest_len = len;
}
begin += len;
}
return longest_len;
}
static inline size_t
fb_srange_longest(fb_group_t *fb, size_t nbits) {
return fb_range_longest_impl(fb, nbits, /* val */ true);
}
static inline size_t
fb_urange_longest(fb_group_t *fb, size_t nbits) {
return fb_range_longest_impl(fb, nbits, /* val */ false);
}
/*
* Initializes each bit of dst with the bitwise-AND of the corresponding bits of
* src1 and src2. All bitmaps must be the same size.
*/
static inline void
fb_bit_and(fb_group_t *dst, fb_group_t *src1, fb_group_t *src2, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = src1[i] & src2[i];
}
}
/* Like fb_bit_and, but with bitwise-OR. */
static inline void
fb_bit_or(fb_group_t *dst, fb_group_t *src1, fb_group_t *src2, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = src1[i] | src2[i];
}
}
/* Initializes dst bit i to the negation of source bit i. */
static inline void
fb_bit_not(fb_group_t *dst, fb_group_t *src, size_t nbits) {
size_t ngroups = FB_NGROUPS(nbits);
for (size_t i = 0; i < ngroups; i++) {
dst[i] = ~src[i];
}
}
#endif /* JEMALLOC_INTERNAL_FB_H */

View file

@ -0,0 +1,129 @@
#ifndef JEMALLOC_INTERNAL_FXP_H
#define JEMALLOC_INTERNAL_FXP_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
* A simple fixed-point math implementation, supporting only unsigned values
* (with overflow being an error).
*
* It's not in general safe to use floating point in core code, because various
* libc implementations we get linked against can assume that malloc won't touch
* floating point state and call it with an unusual calling convention.
*/
/*
* High 16 bits are the integer part, low 16 are the fractional part. Or
* equivalently, repr == 2**16 * val, where we use "val" to refer to the
* (imaginary) fractional representation of the true value.
*
* We pick a uint32_t here since it's convenient in some places to
* double the representation size (i.e. multiplication and division use
* 64-bit integer types), and a uint64_t is the largest type we're
* certain is available.
*/
typedef uint32_t fxp_t;
#define FXP_INIT_INT(x) ((x) << 16)
#define FXP_INIT_PERCENT(pct) (((pct) << 16) / 100)
/*
* Amount of precision used in parsing and printing numbers. The integer bound
* is simply because the integer part of the number gets 16 bits, and so is
* bounded by 65536.
*
* We use a lot of precision for the fractional part, even though most of it
* gets rounded off; this lets us get exact values for the important special
* case where the denominator is a small power of 2 (for instance,
* 1/512 == 0.001953125 is exactly representable even with only 16 bits of
* fractional precision). We need to left-shift by 16 before dividing by
* 10**precision, so we pick precision to be floor(log(2**48)) = 14.
*/
#define FXP_INTEGER_PART_DIGITS 5
#define FXP_FRACTIONAL_PART_DIGITS 14
/*
* In addition to the integer and fractional parts of the number, we need to
* include a null character and (possibly) a decimal point.
*/
#define FXP_BUF_SIZE (FXP_INTEGER_PART_DIGITS + FXP_FRACTIONAL_PART_DIGITS + 2)
static inline fxp_t
fxp_add(fxp_t a, fxp_t b) {
return a + b;
}
static inline fxp_t
fxp_sub(fxp_t a, fxp_t b) {
assert(a >= b);
return a - b;
}
static inline fxp_t
fxp_mul(fxp_t a, fxp_t b) {
uint64_t unshifted = (uint64_t)a * (uint64_t)b;
/*
* Unshifted is (a.val * 2**16) * (b.val * 2**16)
* == (a.val * b.val) * 2**32, but we want
* (a.val * b.val) * 2 ** 16.
*/
return (uint32_t)(unshifted >> 16);
}
static inline fxp_t
fxp_div(fxp_t a, fxp_t b) {
assert(b != 0);
uint64_t unshifted = ((uint64_t)a << 32) / (uint64_t)b;
/*
* Unshifted is (a.val * 2**16) * (2**32) / (b.val * 2**16)
* == (a.val / b.val) * (2 ** 32), which again corresponds to a right
* shift of 16.
*/
return (uint32_t)(unshifted >> 16);
}
static inline uint32_t
fxp_round_down(fxp_t a) {
return a >> 16;
}
static inline uint32_t
fxp_round_nearest(fxp_t a) {
uint32_t fractional_part = (a & ((1U << 16) - 1));
uint32_t increment = (uint32_t)(fractional_part >= (1U << 15));
return (a >> 16) + increment;
}
/*
* Approximately computes x * frac, without the size limitations that would be
* imposed by converting u to an fxp_t.
*/
static inline size_t
fxp_mul_frac(size_t x_orig, fxp_t frac) {
assert(frac <= (1U << 16));
/*
* Work around an over-enthusiastic warning about type limits below (on
* 32-bit platforms, a size_t is always less than 1ULL << 48).
*/
uint64_t x = (uint64_t)x_orig;
/*
* If we can guarantee no overflow, multiply first before shifting, to
* preserve some precision. Otherwise, shift first and then multiply.
* In the latter case, we only lose the low 16 bits of a 48-bit number,
* so we're still accurate to within 1/2**32.
*/
if (x < (1ULL << 48)) {
return (size_t)((x * frac) >> 16);
} else {
return (size_t)((x >> 16) * (uint64_t)frac);
}
}
/*
* Returns true on error. Otherwise, returns false and updates *ptr to point to
* the first character not parsed (because it wasn't a digit).
*/
bool fxp_parse(fxp_t *a, const char *ptr, char **end);
void fxp_print(fxp_t a, char buf[FXP_BUF_SIZE]);
#endif /* JEMALLOC_INTERNAL_FXP_H */

View file

@ -1,111 +1,79 @@
#ifndef JEMALLOC_INTERNAL_HASH_H
#define JEMALLOC_INTERNAL_HASH_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/assert.h"
/*
* The following hash function is based on MurmurHash3, placed into the public
* domain by Austin Appleby. See https://github.com/aappleby/smhasher for
* details.
*/
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#ifndef JEMALLOC_ENABLE_INLINE
uint32_t hash_x86_32(const void *key, int len, uint32_t seed);
void hash_x86_128(const void *key, const int len, uint32_t seed,
uint64_t r_out[2]);
void hash_x64_128(const void *key, const int len, const uint32_t seed,
uint64_t r_out[2]);
void hash(const void *key, size_t len, const uint32_t seed,
size_t r_hash[2]);
#endif
#if (defined(JEMALLOC_ENABLE_INLINE) || defined(JEMALLOC_HASH_C_))
/******************************************************************************/
/* Internal implementation. */
JEMALLOC_INLINE uint32_t
hash_rotl_32(uint32_t x, int8_t r)
{
static inline uint32_t
hash_rotl_32(uint32_t x, int8_t r) {
return ((x << r) | (x >> (32 - r)));
}
JEMALLOC_INLINE uint64_t
hash_rotl_64(uint64_t x, int8_t r)
{
static inline uint64_t
hash_rotl_64(uint64_t x, int8_t r) {
return ((x << r) | (x >> (64 - r)));
}
JEMALLOC_INLINE uint32_t
hash_get_block_32(const uint32_t *p, int i)
{
static inline uint32_t
hash_get_block_32(const uint32_t *p, int i) {
/* Handle unaligned read. */
if (unlikely((uintptr_t)p & (sizeof(uint32_t)-1)) != 0) {
if (unlikely((uintptr_t)p & (sizeof(uint32_t) - 1)) != 0) {
uint32_t ret;
memcpy(&ret, (uint8_t *)(p + i), sizeof(uint32_t));
return (ret);
return ret;
}
return (p[i]);
return p[i];
}
JEMALLOC_INLINE uint64_t
hash_get_block_64(const uint64_t *p, int i)
{
static inline uint64_t
hash_get_block_64(const uint64_t *p, int i) {
/* Handle unaligned read. */
if (unlikely((uintptr_t)p & (sizeof(uint64_t)-1)) != 0) {
if (unlikely((uintptr_t)p & (sizeof(uint64_t) - 1)) != 0) {
uint64_t ret;
memcpy(&ret, (uint8_t *)(p + i), sizeof(uint64_t));
return (ret);
return ret;
}
return (p[i]);
return p[i];
}
JEMALLOC_INLINE uint32_t
hash_fmix_32(uint32_t h)
{
static inline uint32_t
hash_fmix_32(uint32_t h) {
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return (h);
return h;
}
JEMALLOC_INLINE uint64_t
hash_fmix_64(uint64_t k)
{
static inline uint64_t
hash_fmix_64(uint64_t k) {
k ^= k >> 33;
k *= KQU(0xff51afd7ed558ccd);
k ^= k >> 33;
k *= KQU(0xc4ceb9fe1a85ec53);
k ^= k >> 33;
return (k);
return k;
}
JEMALLOC_INLINE uint32_t
hash_x86_32(const void *key, int len, uint32_t seed)
{
const uint8_t *data = (const uint8_t *) key;
const int nblocks = len / 4;
static inline uint32_t
hash_x86_32(const void *key, int len, uint32_t seed) {
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 4;
uint32_t h1 = seed;
@ -114,8 +82,8 @@ hash_x86_32(const void *key, int len, uint32_t seed)
/* body */
{
const uint32_t *blocks = (const uint32_t *) (data + nblocks*4);
int i;
const uint32_t *blocks = (const uint32_t *)(data + nblocks * 4);
int i;
for (i = -nblocks; i; i++) {
uint32_t k1 = hash_get_block_32(blocks, i);
@ -126,21 +94,29 @@ hash_x86_32(const void *key, int len, uint32_t seed)
h1 ^= k1;
h1 = hash_rotl_32(h1, 13);
h1 = h1*5 + 0xe6546b64;
h1 = h1 * 5 + 0xe6546b64;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t *) (data + nblocks*4);
const uint8_t *tail = (const uint8_t *)(data + nblocks * 4);
uint32_t k1 = 0;
switch (len & 3) {
case 3: k1 ^= tail[2] << 16;
case 2: k1 ^= tail[1] << 8;
case 1: k1 ^= tail[0]; k1 *= c1; k1 = hash_rotl_32(k1, 15);
k1 *= c2; h1 ^= k1;
case 3:
k1 ^= tail[2] << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= tail[1] << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= tail[0];
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
}
}
@ -149,15 +125,13 @@ hash_x86_32(const void *key, int len, uint32_t seed)
h1 = hash_fmix_32(h1);
return (h1);
return h1;
}
UNUSED JEMALLOC_INLINE void
hash_x86_128(const void *key, const int len, uint32_t seed,
uint64_t r_out[2])
{
const uint8_t * data = (const uint8_t *) key;
const int nblocks = len / 16;
static inline void
hash_x86_128(const void *key, const int len, uint32_t seed, uint64_t r_out[2]) {
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 16;
uint32_t h1 = seed;
uint32_t h2 = seed;
@ -171,95 +145,161 @@ hash_x86_128(const void *key, const int len, uint32_t seed,
/* body */
{
const uint32_t *blocks = (const uint32_t *) (data + nblocks*16);
int i;
const uint32_t *blocks = (const uint32_t *)(data
+ nblocks * 16);
int i;
for (i = -nblocks; i; i++) {
uint32_t k1 = hash_get_block_32(blocks, i*4 + 0);
uint32_t k2 = hash_get_block_32(blocks, i*4 + 1);
uint32_t k3 = hash_get_block_32(blocks, i*4 + 2);
uint32_t k4 = hash_get_block_32(blocks, i*4 + 3);
uint32_t k1 = hash_get_block_32(blocks, i * 4 + 0);
uint32_t k2 = hash_get_block_32(blocks, i * 4 + 1);
uint32_t k3 = hash_get_block_32(blocks, i * 4 + 2);
uint32_t k4 = hash_get_block_32(blocks, i * 4 + 3);
k1 *= c1; k1 = hash_rotl_32(k1, 15); k1 *= c2; h1 ^= k1;
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = hash_rotl_32(h1, 19); h1 += h2;
h1 = h1*5 + 0x561ccd1b;
h1 = hash_rotl_32(h1, 19);
h1 += h2;
h1 = h1 * 5 + 0x561ccd1b;
k2 *= c2; k2 = hash_rotl_32(k2, 16); k2 *= c3; h2 ^= k2;
k2 *= c2;
k2 = hash_rotl_32(k2, 16);
k2 *= c3;
h2 ^= k2;
h2 = hash_rotl_32(h2, 17); h2 += h3;
h2 = h2*5 + 0x0bcaa747;
h2 = hash_rotl_32(h2, 17);
h2 += h3;
h2 = h2 * 5 + 0x0bcaa747;
k3 *= c3; k3 = hash_rotl_32(k3, 17); k3 *= c4; h3 ^= k3;
k3 *= c3;
k3 = hash_rotl_32(k3, 17);
k3 *= c4;
h3 ^= k3;
h3 = hash_rotl_32(h3, 15); h3 += h4;
h3 = h3*5 + 0x96cd1c35;
h3 = hash_rotl_32(h3, 15);
h3 += h4;
h3 = h3 * 5 + 0x96cd1c35;
k4 *= c4; k4 = hash_rotl_32(k4, 18); k4 *= c1; h4 ^= k4;
k4 *= c4;
k4 = hash_rotl_32(k4, 18);
k4 *= c1;
h4 ^= k4;
h4 = hash_rotl_32(h4, 13); h4 += h1;
h4 = h4*5 + 0x32ac3b17;
h4 = hash_rotl_32(h4, 13);
h4 += h1;
h4 = h4 * 5 + 0x32ac3b17;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t *) (data + nblocks*16);
uint32_t k1 = 0;
uint32_t k2 = 0;
uint32_t k3 = 0;
uint32_t k4 = 0;
const uint8_t *tail = (const uint8_t *)(data + nblocks * 16);
uint32_t k1 = 0;
uint32_t k2 = 0;
uint32_t k3 = 0;
uint32_t k4 = 0;
switch (len & 15) {
case 15: k4 ^= tail[14] << 16;
case 14: k4 ^= tail[13] << 8;
case 13: k4 ^= tail[12] << 0;
k4 *= c4; k4 = hash_rotl_32(k4, 18); k4 *= c1; h4 ^= k4;
case 12: k3 ^= tail[11] << 24;
case 11: k3 ^= tail[10] << 16;
case 10: k3 ^= tail[ 9] << 8;
case 9: k3 ^= tail[ 8] << 0;
k3 *= c3; k3 = hash_rotl_32(k3, 17); k3 *= c4; h3 ^= k3;
case 8: k2 ^= tail[ 7] << 24;
case 7: k2 ^= tail[ 6] << 16;
case 6: k2 ^= tail[ 5] << 8;
case 5: k2 ^= tail[ 4] << 0;
k2 *= c2; k2 = hash_rotl_32(k2, 16); k2 *= c3; h2 ^= k2;
case 4: k1 ^= tail[ 3] << 24;
case 3: k1 ^= tail[ 2] << 16;
case 2: k1 ^= tail[ 1] << 8;
case 1: k1 ^= tail[ 0] << 0;
k1 *= c1; k1 = hash_rotl_32(k1, 15); k1 *= c2; h1 ^= k1;
case 15:
k4 ^= tail[14] << 16;
JEMALLOC_FALLTHROUGH;
case 14:
k4 ^= tail[13] << 8;
JEMALLOC_FALLTHROUGH;
case 13:
k4 ^= tail[12] << 0;
k4 *= c4;
k4 = hash_rotl_32(k4, 18);
k4 *= c1;
h4 ^= k4;
JEMALLOC_FALLTHROUGH;
case 12:
k3 ^= (uint32_t)tail[11] << 24;
JEMALLOC_FALLTHROUGH;
case 11:
k3 ^= tail[10] << 16;
JEMALLOC_FALLTHROUGH;
case 10:
k3 ^= tail[9] << 8;
JEMALLOC_FALLTHROUGH;
case 9:
k3 ^= tail[8] << 0;
k3 *= c3;
k3 = hash_rotl_32(k3, 17);
k3 *= c4;
h3 ^= k3;
JEMALLOC_FALLTHROUGH;
case 8:
k2 ^= (uint32_t)tail[7] << 24;
JEMALLOC_FALLTHROUGH;
case 7:
k2 ^= tail[6] << 16;
JEMALLOC_FALLTHROUGH;
case 6:
k2 ^= tail[5] << 8;
JEMALLOC_FALLTHROUGH;
case 5:
k2 ^= tail[4] << 0;
k2 *= c2;
k2 = hash_rotl_32(k2, 16);
k2 *= c3;
h2 ^= k2;
JEMALLOC_FALLTHROUGH;
case 4:
k1 ^= (uint32_t)tail[3] << 24;
JEMALLOC_FALLTHROUGH;
case 3:
k1 ^= tail[2] << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= tail[1] << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= tail[0] << 0;
k1 *= c1;
k1 = hash_rotl_32(k1, 15);
k1 *= c2;
h1 ^= k1;
break;
}
}
/* finalization */
h1 ^= len; h2 ^= len; h3 ^= len; h4 ^= len;
h1 ^= len;
h2 ^= len;
h3 ^= len;
h4 ^= len;
h1 += h2; h1 += h3; h1 += h4;
h2 += h1; h3 += h1; h4 += h1;
h1 += h2;
h1 += h3;
h1 += h4;
h2 += h1;
h3 += h1;
h4 += h1;
h1 = hash_fmix_32(h1);
h2 = hash_fmix_32(h2);
h3 = hash_fmix_32(h3);
h4 = hash_fmix_32(h4);
h1 += h2; h1 += h3; h1 += h4;
h2 += h1; h3 += h1; h4 += h1;
h1 += h2;
h1 += h3;
h1 += h4;
h2 += h1;
h3 += h1;
h4 += h1;
r_out[0] = (((uint64_t) h2) << 32) | h1;
r_out[1] = (((uint64_t) h4) << 32) | h3;
r_out[0] = (((uint64_t)h2) << 32) | h1;
r_out[1] = (((uint64_t)h4) << 32) | h3;
}
UNUSED JEMALLOC_INLINE void
hash_x64_128(const void *key, const int len, const uint32_t seed,
uint64_t r_out[2])
{
const uint8_t *data = (const uint8_t *) key;
const int nblocks = len / 16;
static inline void
hash_x64_128(
const void *key, const int len, const uint32_t seed, uint64_t r_out[2]) {
const uint8_t *data = (const uint8_t *)key;
const int nblocks = len / 16;
uint64_t h1 = seed;
uint64_t h2 = seed;
@ -269,55 +309,99 @@ hash_x64_128(const void *key, const int len, const uint32_t seed,
/* body */
{
const uint64_t *blocks = (const uint64_t *) (data);
int i;
const uint64_t *blocks = (const uint64_t *)(data);
int i;
for (i = 0; i < nblocks; i++) {
uint64_t k1 = hash_get_block_64(blocks, i*2 + 0);
uint64_t k2 = hash_get_block_64(blocks, i*2 + 1);
uint64_t k1 = hash_get_block_64(blocks, i * 2 + 0);
uint64_t k2 = hash_get_block_64(blocks, i * 2 + 1);
k1 *= c1; k1 = hash_rotl_64(k1, 31); k1 *= c2; h1 ^= k1;
k1 *= c1;
k1 = hash_rotl_64(k1, 31);
k1 *= c2;
h1 ^= k1;
h1 = hash_rotl_64(h1, 27); h1 += h2;
h1 = h1*5 + 0x52dce729;
h1 = hash_rotl_64(h1, 27);
h1 += h2;
h1 = h1 * 5 + 0x52dce729;
k2 *= c2; k2 = hash_rotl_64(k2, 33); k2 *= c1; h2 ^= k2;
k2 *= c2;
k2 = hash_rotl_64(k2, 33);
k2 *= c1;
h2 ^= k2;
h2 = hash_rotl_64(h2, 31); h2 += h1;
h2 = h2*5 + 0x38495ab5;
h2 = hash_rotl_64(h2, 31);
h2 += h1;
h2 = h2 * 5 + 0x38495ab5;
}
}
/* tail */
{
const uint8_t *tail = (const uint8_t*)(data + nblocks*16);
uint64_t k1 = 0;
uint64_t k2 = 0;
const uint8_t *tail = (const uint8_t *)(data + nblocks * 16);
uint64_t k1 = 0;
uint64_t k2 = 0;
switch (len & 15) {
case 15: k2 ^= ((uint64_t)(tail[14])) << 48;
case 14: k2 ^= ((uint64_t)(tail[13])) << 40;
case 13: k2 ^= ((uint64_t)(tail[12])) << 32;
case 12: k2 ^= ((uint64_t)(tail[11])) << 24;
case 11: k2 ^= ((uint64_t)(tail[10])) << 16;
case 10: k2 ^= ((uint64_t)(tail[ 9])) << 8;
case 9: k2 ^= ((uint64_t)(tail[ 8])) << 0;
k2 *= c2; k2 = hash_rotl_64(k2, 33); k2 *= c1; h2 ^= k2;
case 8: k1 ^= ((uint64_t)(tail[ 7])) << 56;
case 7: k1 ^= ((uint64_t)(tail[ 6])) << 48;
case 6: k1 ^= ((uint64_t)(tail[ 5])) << 40;
case 5: k1 ^= ((uint64_t)(tail[ 4])) << 32;
case 4: k1 ^= ((uint64_t)(tail[ 3])) << 24;
case 3: k1 ^= ((uint64_t)(tail[ 2])) << 16;
case 2: k1 ^= ((uint64_t)(tail[ 1])) << 8;
case 1: k1 ^= ((uint64_t)(tail[ 0])) << 0;
k1 *= c1; k1 = hash_rotl_64(k1, 31); k1 *= c2; h1 ^= k1;
case 15:
k2 ^= ((uint64_t)(tail[14])) << 48;
JEMALLOC_FALLTHROUGH;
case 14:
k2 ^= ((uint64_t)(tail[13])) << 40;
JEMALLOC_FALLTHROUGH;
case 13:
k2 ^= ((uint64_t)(tail[12])) << 32;
JEMALLOC_FALLTHROUGH;
case 12:
k2 ^= ((uint64_t)(tail[11])) << 24;
JEMALLOC_FALLTHROUGH;
case 11:
k2 ^= ((uint64_t)(tail[10])) << 16;
JEMALLOC_FALLTHROUGH;
case 10:
k2 ^= ((uint64_t)(tail[9])) << 8;
JEMALLOC_FALLTHROUGH;
case 9:
k2 ^= ((uint64_t)(tail[8])) << 0;
k2 *= c2;
k2 = hash_rotl_64(k2, 33);
k2 *= c1;
h2 ^= k2;
JEMALLOC_FALLTHROUGH;
case 8:
k1 ^= ((uint64_t)(tail[7])) << 56;
JEMALLOC_FALLTHROUGH;
case 7:
k1 ^= ((uint64_t)(tail[6])) << 48;
JEMALLOC_FALLTHROUGH;
case 6:
k1 ^= ((uint64_t)(tail[5])) << 40;
JEMALLOC_FALLTHROUGH;
case 5:
k1 ^= ((uint64_t)(tail[4])) << 32;
JEMALLOC_FALLTHROUGH;
case 4:
k1 ^= ((uint64_t)(tail[3])) << 24;
JEMALLOC_FALLTHROUGH;
case 3:
k1 ^= ((uint64_t)(tail[2])) << 16;
JEMALLOC_FALLTHROUGH;
case 2:
k1 ^= ((uint64_t)(tail[1])) << 8;
JEMALLOC_FALLTHROUGH;
case 1:
k1 ^= ((uint64_t)(tail[0])) << 0;
k1 *= c1;
k1 = hash_rotl_64(k1, 31);
k1 *= c2;
h1 ^= k1;
break;
}
}
/* finalization */
h1 ^= len; h2 ^= len;
h1 ^= len;
h2 ^= len;
h1 += h2;
h2 += h1;
@ -334,10 +418,8 @@ hash_x64_128(const void *key, const int len, const uint32_t seed,
/******************************************************************************/
/* API. */
JEMALLOC_INLINE void
hash(const void *key, size_t len, const uint32_t seed, size_t r_hash[2])
{
static inline void
hash(const void *key, size_t len, const uint32_t seed, size_t r_hash[2]) {
assert(len <= INT_MAX); /* Unfortunate implementation limitation. */
#if (LG_SIZEOF_PTR == 3 && !defined(JEMALLOC_BIG_ENDIAN))
@ -351,7 +433,5 @@ hash(const void *key, size_t len, const uint32_t seed, size_t r_hash[2])
}
#endif
}
#endif
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/
#endif /* JEMALLOC_INTERNAL_HASH_H */

View file

@ -0,0 +1,163 @@
#ifndef JEMALLOC_INTERNAL_HOOK_H
#define JEMALLOC_INTERNAL_HOOK_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd.h"
/*
* This API is *extremely* experimental, and may get ripped out, changed in API-
* and ABI-incompatible ways, be insufficiently or incorrectly documented, etc.
*
* It allows hooking the stateful parts of the API to see changes as they
* happen.
*
* Allocation hooks are called after the allocation is done, free hooks are
* called before the free is done, and expand hooks are called after the
* allocation is expanded.
*
* For realloc and rallocx, if the expansion happens in place, the expansion
* hook is called. If it is moved, then the alloc hook is called on the new
* location, and then the free hook is called on the old location (i.e. both
* hooks are invoked in between the alloc and the dalloc).
*
* If we return NULL from OOM, then usize might not be trustworthy. Calling
* realloc(NULL, size) only calls the alloc hook, and calling realloc(ptr, 0)
* only calls the free hook. (Calling realloc(NULL, 0) is treated as malloc(0),
* and only calls the alloc hook).
*
* Reentrancy:
* Reentrancy is guarded against from within the hook implementation. If you
* call allocator functions from within a hook, the hooks will not be invoked
* again.
* Threading:
* The installation of a hook synchronizes with all its uses. If you can
* prove the installation of a hook happens-before a jemalloc entry point,
* then the hook will get invoked (unless there's a racing removal).
*
* Hook insertion appears to be atomic at a per-thread level (i.e. if a thread
* allocates and has the alloc hook invoked, then a subsequent free on the
* same thread will also have the free hook invoked).
*
* The *removal* of a hook does *not* block until all threads are done with
* the hook. Hook authors have to be resilient to this, and need some
* out-of-band mechanism for cleaning up any dynamically allocated memory
* associated with their hook.
* Ordering:
* Order of hook execution is unspecified, and may be different than insertion
* order.
*/
#define HOOK_MAX 4
enum hook_alloc_e {
hook_alloc_malloc,
hook_alloc_posix_memalign,
hook_alloc_aligned_alloc,
hook_alloc_calloc,
hook_alloc_memalign,
hook_alloc_valloc,
hook_alloc_pvalloc,
hook_alloc_mallocx,
/* The reallocating functions have both alloc and dalloc variants */
hook_alloc_realloc,
hook_alloc_rallocx,
};
/*
* We put the enum typedef after the enum, since this file may get included by
* jemalloc_cpp.cpp, and C++ disallows enum forward declarations.
*/
typedef enum hook_alloc_e hook_alloc_t;
enum hook_dalloc_e {
hook_dalloc_free,
hook_dalloc_dallocx,
hook_dalloc_sdallocx,
/*
* The dalloc halves of reallocation (not called if in-place expansion
* happens).
*/
hook_dalloc_realloc,
hook_dalloc_rallocx,
};
typedef enum hook_dalloc_e hook_dalloc_t;
enum hook_expand_e {
hook_expand_realloc,
hook_expand_rallocx,
hook_expand_xallocx,
};
typedef enum hook_expand_e hook_expand_t;
typedef void (*hook_alloc)(void *extra, hook_alloc_t type, void *result,
uintptr_t result_raw, uintptr_t args_raw[3]);
typedef void (*hook_dalloc)(
void *extra, hook_dalloc_t type, void *address, uintptr_t args_raw[3]);
typedef void (*hook_expand)(void *extra, hook_expand_t type, void *address,
size_t old_usize, size_t new_usize, uintptr_t result_raw,
uintptr_t args_raw[4]);
typedef struct hooks_s hooks_t;
struct hooks_s {
hook_alloc alloc_hook;
hook_dalloc dalloc_hook;
hook_expand expand_hook;
void *extra;
};
/*
* Begin implementation details; everything above this point might one day live
* in a public API. Everything below this point never will.
*/
/*
* The realloc pathways haven't gotten any refactoring love in a while, and it's
* fairly difficult to pass information from the entry point to the hooks. We
* put the informaiton the hooks will need into a struct to encapsulate
* everything.
*
* Much of these pathways are force-inlined, so that the compiler can avoid
* materializing this struct until we hit an extern arena function. For fairly
* goofy reasons, *many* of the realloc paths hit an extern arena function.
* These paths are cold enough that it doesn't matter; eventually, we should
* rewrite the realloc code to make the expand-in-place and the
* free-then-realloc paths more orthogonal, at which point we don't need to
* spread the hook logic all over the place.
*/
typedef struct hook_ralloc_args_s hook_ralloc_args_t;
struct hook_ralloc_args_s {
/* I.e. as opposed to rallocx. */
bool is_realloc;
/*
* The expand hook takes 4 arguments, even if only 3 are actually used;
* we add an extra one in case the user decides to memcpy without
* looking too closely at the hooked function.
*/
uintptr_t args[4];
};
/*
* Returns an opaque handle to be used when removing the hook. NULL means that
* we couldn't install the hook.
*/
bool hook_boot(void);
void *hook_install(tsdn_t *tsdn, hooks_t *to_install);
/* Uninstalls the hook with the handle previously returned from hook_install. */
void hook_remove(tsdn_t *tsdn, void *opaque);
/* Hooks */
void hook_invoke_alloc(hook_alloc_t type, void *result, uintptr_t result_raw,
uintptr_t args_raw[3]);
void hook_invoke_dalloc(
hook_dalloc_t type, void *address, uintptr_t args_raw[3]);
void hook_invoke_expand(hook_expand_t type, void *address, size_t old_usize,
size_t new_usize, uintptr_t result_raw, uintptr_t args_raw[4]);
#endif /* JEMALLOC_INTERNAL_HOOK_H */

View file

@ -0,0 +1,185 @@
#ifndef JEMALLOC_INTERNAL_HPA_H
#define JEMALLOC_INTERNAL_HPA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/edata_cache.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/exp_grow.h"
#include "jemalloc/internal/hpa_central.h"
#include "jemalloc/internal/hpa_hooks.h"
#include "jemalloc/internal/hpa_opts.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/pai.h"
#include "jemalloc/internal/psset.h"
#include "jemalloc/internal/sec.h"
typedef struct hpa_shard_nonderived_stats_s hpa_shard_nonderived_stats_t;
struct hpa_shard_nonderived_stats_s {
/*
* The number of times we've purged within a hugepage.
*
* Guarded by mtx.
*/
uint64_t npurge_passes;
/*
* The number of individual purge calls we perform (which should always
* be bigger than npurge_passes, since each pass purges at least one
* extent within a hugepage.
*
* Guarded by mtx.
*/
uint64_t npurges;
/*
* The number of times we've hugified a pageslab.
*
* Guarded by mtx.
*/
uint64_t nhugifies;
/*
* The number of times we've tried to hugify a pageslab, but failed.
*
* Guarded by mtx.
*/
uint64_t nhugify_failures;
/*
* The number of times we've dehugified a pageslab.
*
* Guarded by mtx.
*/
uint64_t ndehugifies;
};
/* Completely derived; only used by CTL. */
typedef struct hpa_shard_stats_s hpa_shard_stats_t;
struct hpa_shard_stats_s {
psset_stats_t psset_stats;
hpa_shard_nonderived_stats_t nonderived_stats;
sec_stats_t secstats;
};
typedef struct hpa_shard_s hpa_shard_t;
struct hpa_shard_s {
/*
* pai must be the first member; we cast from a pointer to it to a
* pointer to the hpa_shard_t.
*/
pai_t pai;
/* The central allocator we get our hugepages from. */
hpa_central_t *central;
/* Protects most of this shard's state. */
malloc_mutex_t mtx;
/*
* Guards the shard's access to the central allocator (preventing
* multiple threads operating on this shard from accessing the central
* allocator).
*/
malloc_mutex_t grow_mtx;
/* The base metadata allocator. */
base_t *base;
/*
* This edata cache is the one we use when allocating a small extent
* from a pageslab. The pageslab itself comes from the centralized
* allocator, and so will use its edata_cache.
*/
edata_cache_fast_t ecf;
/* Small extent cache (not guarded by mtx) */
JEMALLOC_ALIGNED(CACHELINE) sec_t sec;
psset_t psset;
/*
* How many grow operations have occurred.
*
* Guarded by grow_mtx.
*/
uint64_t age_counter;
/* The arena ind we're associated with. */
unsigned ind;
/*
* Our emap. This is just a cache of the emap pointer in the associated
* hpa_central.
*/
emap_t *emap;
/* The configuration choices for this hpa shard. */
hpa_shard_opts_t opts;
/*
* How many pages have we started but not yet finished purging in this
* hpa shard.
*/
size_t npending_purge;
/*
* Those stats which are copied directly into the CTL-centric hpa shard
* stats.
*/
hpa_shard_nonderived_stats_t stats;
/*
* Last time we performed purge on this shard.
*/
nstime_t last_purge;
/*
* Last time when we attempted work (purging or hugifying). If deferral
* of the work is allowed (we have background thread), this is the time
* when background thread checked if purging or hugifying needs to be
* done. If deferral is not allowed, this is the time of (hpa_alloc or
* hpa_dalloc) activity in the shard.
*/
nstime_t last_time_work_attempted;
};
bool hpa_hugepage_size_exceeds_limit(void);
/*
* Whether or not the HPA can be used given the current configuration. This is
* is not necessarily a guarantee that it backs its allocations by hugepages,
* just that it can function properly given the system it's running on.
*/
bool hpa_supported(void);
bool hpa_shard_init(tsdn_t *tsdn, hpa_shard_t *shard, hpa_central_t *central,
emap_t *emap, base_t *base, edata_cache_t *edata_cache, unsigned ind,
const hpa_shard_opts_t *opts, const sec_opts_t *sec_opts);
void hpa_shard_stats_accum(hpa_shard_stats_t *dst, hpa_shard_stats_t *src);
void hpa_shard_stats_merge(
tsdn_t *tsdn, hpa_shard_t *shard, hpa_shard_stats_t *dst);
/*
* Notify the shard that we won't use it for allocations much longer. Due to
* the possibility of races, we don't actually prevent allocations; just flush
* and disable the embedded edata_cache_small.
*/
void hpa_shard_disable(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_destroy(tsdn_t *tsdn, hpa_shard_t *shard);
/* Flush caches that shard may be using */
void hpa_shard_flush(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_set_deferral_allowed(
tsdn_t *tsdn, hpa_shard_t *shard, bool deferral_allowed);
void hpa_shard_do_deferred_work(tsdn_t *tsdn, hpa_shard_t *shard);
/*
* We share the fork ordering with the PA and arena prefork handling; that's why
* these are 2, 3 and 4 rather than 0 and 1.
*/
void hpa_shard_prefork2(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_prefork3(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_prefork4(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_postfork_parent(tsdn_t *tsdn, hpa_shard_t *shard);
void hpa_shard_postfork_child(tsdn_t *tsdn, hpa_shard_t *shard);
#endif /* JEMALLOC_INTERNAL_HPA_H */

View file

@ -0,0 +1,41 @@
#ifndef JEMALLOC_INTERNAL_HPA_CENTRAL_H
#define JEMALLOC_INTERNAL_HPA_CENTRAL_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/base.h"
#include "jemalloc/internal/hpa_hooks.h"
#include "jemalloc/internal/hpdata.h"
#include "jemalloc/internal/mutex.h"
#include "jemalloc/internal/tsd_types.h"
typedef struct hpa_central_s hpa_central_t;
struct hpa_central_s {
/*
* Guards expansion of eden. We separate this from the regular mutex so
* that cheaper operations can still continue while we're doing the OS
* call.
*/
malloc_mutex_t grow_mtx;
/*
* Either NULL (if empty), or some integer multiple of a
* hugepage-aligned number of hugepages. We carve them off one at a
* time to satisfy new pageslab requests.
*
* Guarded by grow_mtx.
*/
void *eden;
size_t eden_len;
/* Source for metadata. */
base_t *base;
/* The HPA hooks. */
hpa_hooks_t hooks;
};
bool hpa_central_init(
hpa_central_t *central, base_t *base, const hpa_hooks_t *hooks);
hpdata_t *hpa_central_extract(tsdn_t *tsdn, hpa_central_t *central, size_t size,
uint64_t age, bool hugify_eager, bool *oom);
#endif /* JEMALLOC_INTERNAL_HPA_CENTRAL_H */

View file

@ -0,0 +1,21 @@
#ifndef JEMALLOC_INTERNAL_HPA_HOOKS_H
#define JEMALLOC_INTERNAL_HPA_HOOKS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/nstime.h"
typedef struct hpa_hooks_s hpa_hooks_t;
struct hpa_hooks_s {
void *(*map)(size_t size);
void (*unmap)(void *ptr, size_t size);
void (*purge)(void *ptr, size_t size);
bool (*hugify)(void *ptr, size_t size, bool sync);
void (*dehugify)(void *ptr, size_t size);
void (*curtime)(nstime_t *r_time, bool first_reading);
uint64_t (*ms_since)(nstime_t *r_time);
bool (*vectorized_purge)(void *vec, size_t vlen, size_t nbytes);
};
extern const hpa_hooks_t hpa_hooks_default;
#endif /* JEMALLOC_INTERNAL_HPA_HOOKS_H */

View file

@ -0,0 +1,190 @@
#ifndef JEMALLOC_INTERNAL_HPA_OPTS_H
#define JEMALLOC_INTERNAL_HPA_OPTS_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/fxp.h"
/*
* This file is morally part of hpa.h, but is split out for header-ordering
* reasons.
*
* All of these hpa_shard_opts below are experimental. We are exploring more
* efficient packing, hugifying, and purging approaches to make efficient
* trade-offs between CPU, memory, latency, and usability. This means all of
* them are at the risk of being deprecated and corresponding configurations
* should be updated once the final version settles.
*/
/*
* This enum controls how jemalloc hugifies/dehugifies pages. Each style may be
* more suitable depending on deployment environments.
*
* hpa_hugify_style_none
* Using this means that jemalloc will not be hugifying or dehugifying pages,
* but will let the kernel make those decisions. This style only makes sense
* when deploying on systems where THP are enabled in 'always' mode. With this
* style, you most likely want to have no purging at all (dirty_mult=-1) or
* purge_threshold=HUGEPAGE bytes (2097152 for 2Mb page), although other
* thresholds may work well depending on kernel settings of your deployment
* targets.
*
* hpa_hugify_style_eager
* This style results in jemalloc giving hugepage advice, if needed, to
* anonymous memory immediately after it is mapped, so huge pages can be backing
* that memory at page-fault time. This is usually more efficient than doing
* it later, and it allows us to benefit from the hugepages from the start.
* Same options for purging as for the style 'none' are good starting choices:
* no purging, or purge_threshold=HUGEPAGE, some min_purge_delay_ms that allows
* for page not to be purged quickly, etc. This is a good choice if you can
* afford extra memory and your application gets performance increase from
* transparent hughepages.
*
* hpa_hugify_style_lazy
* This style is suitable when you purge more aggressively (you sacrifice CPU
* performance for less memory). When this style is chosen, jemalloc will
* hugify once hugification_threshold is reached, and dehugify before purging.
* If the kernel is configured to use direct compaction you may experience some
* allocation latency when using this style. The best is to measure what works
* better for your application needs, and in the target deployment environment.
* This is a good choice for apps that cannot afford a lot of memory regression,
* but would still like to benefit from backing certain memory regions with
* hugepages.
*/
enum hpa_hugify_style_e {
hpa_hugify_style_auto = 0,
hpa_hugify_style_none = 1,
hpa_hugify_style_eager = 2,
hpa_hugify_style_lazy = 3,
hpa_hugify_style_limit = hpa_hugify_style_lazy + 1
};
typedef enum hpa_hugify_style_e hpa_hugify_style_t;
extern const char *const hpa_hugify_style_names[];
typedef struct hpa_shard_opts_s hpa_shard_opts_t;
struct hpa_shard_opts_s {
/*
* The largest size we'll allocate out of the shard. For those
* allocations refused, the caller (in practice, the PA module) will
* fall back to the more general (for now) PAC, which can always handle
* any allocation request.
*/
size_t slab_max_alloc;
/*
* When the number of active bytes in a hugepage is >=
* hugification_threshold, we force hugify it.
*/
size_t hugification_threshold;
/*
* The HPA purges whenever the number of pages exceeds dirty_mult *
* active_pages. This may be set to (fxp_t)-1 to disable purging.
*/
fxp_t dirty_mult;
/*
* Whether or not the PAI methods are allowed to defer work to a
* subsequent hpa_shard_do_deferred_work() call. Practically, this
* corresponds to background threads being enabled. We track this
* ourselves for encapsulation purposes.
*/
bool deferral_allowed;
/*
* How long a hugepage has to be a hugification candidate before it will
* actually get hugified.
*/
uint64_t hugify_delay_ms;
/*
* Hugify pages synchronously (hugify will happen even if hugify_style
* is not hpa_hugify_style_lazy).
*/
bool hugify_sync;
/*
* Minimum amount of time between purges.
*/
uint64_t min_purge_interval_ms;
/*
* Maximum number of hugepages to purge on each purging attempt.
*/
ssize_t experimental_max_purge_nhp;
/*
* Minimum number of inactive bytes needed for a non-empty page to be
* considered purgable.
*
* When the number of touched inactive bytes on non-empty hugepage is
* >= purge_threshold, the page is purgable. Empty pages are always
* purgable. Setting this to HUGEPAGE bytes would only purge empty
* pages if using hugify_style_eager and the purges would be exactly
* HUGEPAGE bytes. Depending on your kernel settings, this may result
* in better performance.
*
* Please note, when threshold is reached, we will purge all the dirty
* bytes, and not just up to the threshold. If this is PAGE bytes, then
* all the pages that have any dirty bytes are purgable. We treat
* purgability constraint for purge_threshold as stronger than
* dirty_mult, IOW, if no page meets purge_threshold, we will not purge
* even if we are above dirty_mult.
*/
size_t purge_threshold;
/*
* Minimum number of ms that needs to elapse between HP page becoming
* eligible for purging and actually getting purged.
*
* Setting this to a larger number would give better chance of reusing
* that memory. Setting it to 0 means that page is eligible for purging
* as soon as it meets the purge_threshold. The clock resets when
* purgability of the page changes (page goes from being non-purgable to
* purgable). When using eager style you probably want to allow for
* some delay, to avoid purging the page too quickly and give it time to
* be used.
*/
uint64_t min_purge_delay_ms;
/*
* Style of hugification/dehugification (see comment at
* hpa_hugify_style_t for options).
*/
hpa_hugify_style_t hugify_style;
};
/* clang-format off */
#define HPA_SHARD_OPTS_DEFAULT { \
/* slab_max_alloc */ \
64 * 1024, \
/* hugification_threshold */ \
HUGEPAGE * 95 / 100, \
/* dirty_mult */ \
FXP_INIT_PERCENT(25), \
/* \
* deferral_allowed \
* \
* Really, this is always set by the arena during creation \
* or by an hpa_shard_set_deferral_allowed call, so the value \
* we put here doesn't matter. \
*/ \
false, \
/* hugify_delay_ms */ \
10 * 1000, \
/* hugify_sync */ \
false, \
/* min_purge_interval_ms */ \
5 * 1000, \
/* experimental_max_purge_nhp */ \
-1, \
/* size_t purge_threshold */ \
PAGE, \
/* min_purge_delay_ms */ \
0, \
/* hugify_style */ \
hpa_hugify_style_lazy \
}
/* clang-format on */
#endif /* JEMALLOC_INTERNAL_HPA_OPTS_H */

View file

@ -0,0 +1,161 @@
#ifndef JEMALLOC_INTERNAL_HPA_UTILS_H
#define JEMALLOC_INTERNAL_HPA_UTILS_H
#include "jemalloc/internal/hpa.h"
#include "jemalloc/internal/extent.h"
#define HPA_MIN_VAR_VEC_SIZE 8
/*
* This is used for jemalloc internal tuning and may change in the future based
* on production traffic.
*
* This value protects two things:
* 1. Stack size
* 2. Number of huge pages that are being purged in a batch as we do not
* allow allocations while making madvise syscall.
*/
#define HPA_PURGE_BATCH_MAX 16
#ifdef JEMALLOC_HAVE_PROCESS_MADVISE
typedef struct iovec hpa_io_vector_t;
#else
typedef struct {
void *iov_base;
size_t iov_len;
} hpa_io_vector_t;
#endif
static inline size_t
hpa_process_madvise_max_iovec_len(void) {
assert(
opt_process_madvise_max_batch <= PROCESS_MADVISE_MAX_BATCH_LIMIT);
return opt_process_madvise_max_batch == 0
? HPA_MIN_VAR_VEC_SIZE
: opt_process_madvise_max_batch;
}
/* Actually invoke hooks. If we fail vectorized, use single purges */
static void
hpa_try_vectorized_purge(
hpa_hooks_t *hooks, hpa_io_vector_t *vec, size_t vlen, size_t nbytes) {
bool success = opt_process_madvise_max_batch > 0
&& !hooks->vectorized_purge(vec, vlen, nbytes);
if (!success) {
/* On failure, it is safe to purge again (potential perf
* penalty) If kernel can tell exactly which regions
* failed, we could avoid that penalty.
*/
for (size_t i = 0; i < vlen; ++i) {
hooks->purge(vec[i].iov_base, vec[i].iov_len);
}
}
}
/*
* This structure accumulates the regions for process_madvise. It invokes the
* hook when batch limit is reached.
*/
typedef struct {
hpa_io_vector_t *vp;
size_t cur;
size_t total_bytes;
size_t capacity;
} hpa_range_accum_t;
static inline void
hpa_range_accum_init(hpa_range_accum_t *ra, hpa_io_vector_t *v, size_t sz) {
ra->vp = v;
ra->capacity = sz;
ra->total_bytes = 0;
ra->cur = 0;
}
static inline void
hpa_range_accum_flush(hpa_range_accum_t *ra, hpa_hooks_t *hooks) {
assert(ra->total_bytes > 0 && ra->cur > 0);
hpa_try_vectorized_purge(hooks, ra->vp, ra->cur, ra->total_bytes);
ra->cur = 0;
ra->total_bytes = 0;
}
static inline void
hpa_range_accum_add(
hpa_range_accum_t *ra, void *addr, size_t sz, hpa_hooks_t *hooks) {
assert(ra->cur < ra->capacity);
ra->vp[ra->cur].iov_base = addr;
ra->vp[ra->cur].iov_len = sz;
ra->total_bytes += sz;
ra->cur++;
if (ra->cur == ra->capacity) {
hpa_range_accum_flush(ra, hooks);
}
}
static inline void
hpa_range_accum_finish(hpa_range_accum_t *ra, hpa_hooks_t *hooks) {
if (ra->cur > 0) {
hpa_range_accum_flush(ra, hooks);
}
}
/*
* For purging more than one page we use batch of these items
*/
typedef struct {
hpdata_purge_state_t state;
hpdata_t *hp;
bool dehugify;
} hpa_purge_item_t;
typedef struct hpa_purge_batch_s hpa_purge_batch_t;
struct hpa_purge_batch_s {
hpa_purge_item_t *items;
size_t items_capacity;
/* Number of huge pages to purge in current batch */
size_t item_cnt;
/* Number of ranges to purge in current batch */
size_t nranges;
/* Total number of dirty pages in current batch*/
size_t ndirty_in_batch;
/* Max number of huge pages to purge */
size_t max_hp;
/*
* Once we are above this watermark we should not add more pages
* to the same batch. This is because while we want to minimize
* number of madvise calls we also do not want to be preventing
* allocations from too many huge pages (which we have to do
* while they are being purged)
*/
size_t range_watermark;
size_t npurged_hp_total;
};
static inline bool
hpa_batch_full(hpa_purge_batch_t *b) {
/* It's okay for ranges to go above */
return b->npurged_hp_total == b->max_hp
|| b->item_cnt == b->items_capacity
|| b->nranges >= b->range_watermark;
}
static inline void
hpa_batch_pass_start(hpa_purge_batch_t *b) {
b->item_cnt = 0;
b->nranges = 0;
b->ndirty_in_batch = 0;
}
static inline bool
hpa_batch_empty(hpa_purge_batch_t *b) {
return b->item_cnt == 0;
}
/* Purge pages in a batch using given hooks */
void hpa_purge_batch(
hpa_hooks_t *hooks, hpa_purge_item_t *batch, size_t batch_sz);
#endif /* JEMALLOC_INTERNAL_HPA_UTILS_H */

View file

@ -0,0 +1,486 @@
#ifndef JEMALLOC_INTERNAL_HPDATA_H
#define JEMALLOC_INTERNAL_HPDATA_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/fb.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/pages.h"
#include "jemalloc/internal/ph.h"
#include "jemalloc/internal/ql.h"
#include "jemalloc/internal/typed_list.h"
/*
* The metadata representation we use for extents in hugepages. While the PAC
* uses the edata_t to represent both active and inactive extents, the HP only
* uses the edata_t for active ones; instead, inactive extent state is tracked
* within hpdata associated with the enclosing hugepage-sized, hugepage-aligned
* region of virtual address space.
*
* An hpdata need not be "truly" backed by a hugepage (which is not necessarily
* an observable property of any given region of address space). It's just
* hugepage-sized and hugepage-aligned; it's *potentially* huge.
*/
/*
* The max enumeration num should not exceed 2^16 - 1, see comments in edata.h
* for ESET_ENUMERATE_MAX_NUM for more details.
*/
#define PSSET_ENUMERATE_MAX_NUM 32
typedef struct hpdata_s hpdata_t;
ph_structs(hpdata_age_heap, hpdata_t, PSSET_ENUMERATE_MAX_NUM);
struct hpdata_s {
/*
* We likewise follow the edata convention of mangling names and forcing
* the use of accessors -- this lets us add some consistency checks on
* access.
*/
/*
* The address of the hugepage in question. This can't be named h_addr,
* since that conflicts with a macro defined in Windows headers.
*/
void *h_address;
/* Its age (measured in psset operations). */
uint64_t h_age;
/* Whether or not we think the hugepage is mapped that way by the OS. */
bool h_huge;
/*
* For some properties, we keep parallel sets of bools; h_foo_allowed
* and h_in_psset_foo_container. This is a decoupling mechanism to
* avoid bothering the hpa (which manages policies) from the psset
* (which is the mechanism used to enforce those policies). This allows
* all the container management logic to live in one place, without the
* HPA needing to know or care how that happens.
*/
/*
* Whether or not the hpdata is allowed to be used to serve allocations,
* and whether or not the psset is currently tracking it as such.
*/
bool h_alloc_allowed;
bool h_in_psset_alloc_container;
/*
* The same, but with purging. There's no corresponding
* h_in_psset_purge_container, because the psset (currently) always
* removes hpdatas from their containers during updates (to implement
* LRU for purging).
*/
bool h_purge_allowed;
/* And with hugifying. */
bool h_hugify_allowed;
/* When we became a hugification candidate. */
nstime_t h_time_hugify_allowed;
bool h_in_psset_hugify_container;
/* Whether or not a purge or hugify is currently happening. */
bool h_mid_purge;
bool h_mid_hugify;
/*
* Whether or not the hpdata is being updated in the psset (i.e. if
* there has been a psset_update_begin call issued without a matching
* psset_update_end call). Eventually this will expand to other types
* of updates.
*/
bool h_updating;
/* Whether or not the hpdata is in a psset. */
bool h_in_psset;
union {
/* When nonempty (and also nonfull), used by the psset bins. */
hpdata_age_heap_link_t age_link;
/*
* When empty (or not corresponding to any hugepage), list
* linkage.
*/
ql_elm(hpdata_t) ql_link_empty;
};
/*
* Linkage for the psset to track candidates for purging and hugifying.
*/
ql_elm(hpdata_t) ql_link_purge;
ql_elm(hpdata_t) ql_link_hugify;
/* The length of the largest contiguous sequence of inactive pages. */
size_t h_longest_free_range;
/* Number of active pages. */
size_t h_nactive;
/* A bitmap with bits set in the active pages. */
fb_group_t active_pages[FB_NGROUPS(HUGEPAGE_PAGES)];
/*
* Number of dirty or active pages, and a bitmap tracking them. One
* way to think of this is as which pages are dirty from the OS's
* perspective.
*/
size_t h_ntouched;
/* The touched pages (using the same definition as above). */
fb_group_t touched_pages[FB_NGROUPS(HUGEPAGE_PAGES)];
/* Time when this extent (hpdata) becomes eligible for purging */
nstime_t h_time_purge_allowed;
/* True if the extent was huge and empty last time when it was purged */
bool h_purged_when_empty_and_huge;
};
TYPED_LIST(hpdata_empty_list, hpdata_t, ql_link_empty)
TYPED_LIST(hpdata_purge_list, hpdata_t, ql_link_purge)
TYPED_LIST(hpdata_hugify_list, hpdata_t, ql_link_hugify)
ph_proto(, hpdata_age_heap, hpdata_t);
static inline void *
hpdata_addr_get(const hpdata_t *hpdata) {
return hpdata->h_address;
}
static inline void
hpdata_addr_set(hpdata_t *hpdata, void *addr) {
assert(HUGEPAGE_ADDR2BASE(addr) == addr);
hpdata->h_address = addr;
}
static inline uint64_t
hpdata_age_get(const hpdata_t *hpdata) {
return hpdata->h_age;
}
static inline void
hpdata_age_set(hpdata_t *hpdata, uint64_t age) {
hpdata->h_age = age;
}
static inline bool
hpdata_huge_get(const hpdata_t *hpdata) {
return hpdata->h_huge;
}
static inline bool
hpdata_alloc_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_alloc_allowed;
}
static inline void
hpdata_alloc_allowed_set(hpdata_t *hpdata, bool alloc_allowed) {
hpdata->h_alloc_allowed = alloc_allowed;
}
static inline bool
hpdata_in_psset_alloc_container_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset_alloc_container;
}
static inline void
hpdata_in_psset_alloc_container_set(hpdata_t *hpdata, bool in_container) {
assert(in_container != hpdata->h_in_psset_alloc_container);
hpdata->h_in_psset_alloc_container = in_container;
}
static inline bool
hpdata_purge_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_purge_allowed;
}
static inline void
hpdata_purge_allowed_set(hpdata_t *hpdata, bool purge_allowed) {
assert(purge_allowed == false || !hpdata->h_mid_purge);
hpdata->h_purge_allowed = purge_allowed;
}
static inline bool
hpdata_hugify_allowed_get(const hpdata_t *hpdata) {
return hpdata->h_hugify_allowed;
}
static inline void
hpdata_allow_hugify(hpdata_t *hpdata, nstime_t now) {
assert(!hpdata->h_mid_hugify);
hpdata->h_hugify_allowed = true;
hpdata->h_time_hugify_allowed = now;
}
static inline nstime_t
hpdata_time_hugify_allowed(hpdata_t *hpdata) {
return hpdata->h_time_hugify_allowed;
}
static inline void
hpdata_disallow_hugify(hpdata_t *hpdata) {
hpdata->h_hugify_allowed = false;
}
static inline bool
hpdata_in_psset_hugify_container_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset_hugify_container;
}
static inline void
hpdata_in_psset_hugify_container_set(hpdata_t *hpdata, bool in_container) {
assert(in_container != hpdata->h_in_psset_hugify_container);
hpdata->h_in_psset_hugify_container = in_container;
}
static inline bool
hpdata_mid_purge_get(const hpdata_t *hpdata) {
return hpdata->h_mid_purge;
}
static inline void
hpdata_mid_purge_set(hpdata_t *hpdata, bool mid_purge) {
assert(mid_purge != hpdata->h_mid_purge);
hpdata->h_mid_purge = mid_purge;
}
static inline bool
hpdata_mid_hugify_get(const hpdata_t *hpdata) {
return hpdata->h_mid_hugify;
}
static inline void
hpdata_mid_hugify_set(hpdata_t *hpdata, bool mid_hugify) {
assert(mid_hugify != hpdata->h_mid_hugify);
hpdata->h_mid_hugify = mid_hugify;
}
static inline bool
hpdata_changing_state_get(const hpdata_t *hpdata) {
return hpdata->h_mid_purge || hpdata->h_mid_hugify;
}
static inline bool
hpdata_updating_get(const hpdata_t *hpdata) {
return hpdata->h_updating;
}
static inline void
hpdata_updating_set(hpdata_t *hpdata, bool updating) {
assert(updating != hpdata->h_updating);
hpdata->h_updating = updating;
}
static inline bool
hpdata_in_psset_get(const hpdata_t *hpdata) {
return hpdata->h_in_psset;
}
static inline void
hpdata_in_psset_set(hpdata_t *hpdata, bool in_psset) {
assert(in_psset != hpdata->h_in_psset);
hpdata->h_in_psset = in_psset;
}
static inline size_t
hpdata_longest_free_range_get(const hpdata_t *hpdata) {
return hpdata->h_longest_free_range;
}
static inline void
hpdata_longest_free_range_set(hpdata_t *hpdata, size_t longest_free_range) {
assert(longest_free_range <= HUGEPAGE_PAGES);
hpdata->h_longest_free_range = longest_free_range;
}
static inline size_t
hpdata_nactive_get(const hpdata_t *hpdata) {
return hpdata->h_nactive;
}
static inline size_t
hpdata_ntouched_get(const hpdata_t *hpdata) {
return hpdata->h_ntouched;
}
static inline size_t
hpdata_ndirty_get(const hpdata_t *hpdata) {
return hpdata->h_ntouched - hpdata->h_nactive;
}
static inline size_t
hpdata_nretained_get(hpdata_t *hpdata) {
return HUGEPAGE_PAGES - hpdata->h_ntouched;
}
static inline void
hpdata_time_purge_allowed_set(hpdata_t *hpdata, const nstime_t *v) {
nstime_copy(&hpdata->h_time_purge_allowed, v);
}
static inline const nstime_t *
hpdata_time_purge_allowed_get(const hpdata_t *hpdata) {
return &hpdata->h_time_purge_allowed;
}
static inline bool
hpdata_purged_when_empty_and_huge_get(const hpdata_t *hpdata) {
return hpdata->h_purged_when_empty_and_huge;
}
static inline void
hpdata_purged_when_empty_and_huge_set(hpdata_t *hpdata, bool v) {
hpdata->h_purged_when_empty_and_huge = v;
}
static inline void
hpdata_assert_empty(hpdata_t *hpdata) {
assert(fb_empty(hpdata->active_pages, HUGEPAGE_PAGES));
assert(hpdata->h_nactive == 0);
}
/*
* Only used in tests, and in hpdata_assert_consistent, below. Verifies some
* consistency properties of the hpdata (e.g. that cached counts of page stats
* match computed ones).
*/
static inline bool
hpdata_consistent(hpdata_t *hpdata) {
bool res = true;
const size_t active_urange_longest = fb_urange_longest(
hpdata->active_pages, HUGEPAGE_PAGES);
const size_t longest_free_range = hpdata_longest_free_range_get(hpdata);
if (active_urange_longest != longest_free_range) {
malloc_printf(
"<jemalloc>: active_fb_urange_longest=%zu != hpdata_longest_free_range=%zu\n",
active_urange_longest, longest_free_range);
res = false;
}
const size_t active_scount = fb_scount(
hpdata->active_pages, HUGEPAGE_PAGES, 0, HUGEPAGE_PAGES);
if (active_scount != hpdata->h_nactive) {
malloc_printf(
"<jemalloc>: active_fb_scount=%zu != hpdata_nactive=%zu\n",
active_scount, hpdata->h_nactive);
res = false;
}
const size_t touched_scount = fb_scount(
hpdata->touched_pages, HUGEPAGE_PAGES, 0, HUGEPAGE_PAGES);
if (touched_scount != hpdata->h_ntouched) {
malloc_printf(
"<jemalloc>: touched_fb_scount=%zu != hpdata_ntouched=%zu\n",
touched_scount, hpdata->h_ntouched);
res = false;
}
if (hpdata->h_ntouched < hpdata->h_nactive) {
malloc_printf(
"<jemalloc>: hpdata_ntouched=%zu < hpdata_nactive=%zu\n",
hpdata->h_ntouched, hpdata->h_nactive);
res = false;
}
if (hpdata->h_huge && (hpdata->h_ntouched != HUGEPAGE_PAGES)) {
malloc_printf(
"<jemalloc>: hpdata_huge=%d && (hpdata_ntouched=%zu != hugepage_pages=%zu)\n",
hpdata->h_huge, hpdata->h_ntouched, HUGEPAGE_PAGES);
res = false;
}
const bool changing_state = hpdata_changing_state_get(hpdata);
if (changing_state
&& (hpdata->h_purge_allowed || hpdata->h_hugify_allowed)) {
malloc_printf(
"<jemalloc>: hpdata_changing_state=%d && (hpdata_purge_allowed=%d || hpdata_hugify_allowed=%d)\n",
changing_state, hpdata->h_purge_allowed,
hpdata->h_hugify_allowed);
res = false;
}
if (hpdata_hugify_allowed_get(hpdata)
!= hpdata_in_psset_hugify_container_get(hpdata)) {
malloc_printf(
"<jemalloc>: hpdata_hugify_allowed=%d != hpdata_in_psset_hugify_container=%d\n",
hpdata_hugify_allowed_get(hpdata),
hpdata_in_psset_hugify_container_get(hpdata));
res = false;
}
return res;
}
#define hpdata_assert_consistent(hpdata) \
do { \
assert(hpdata_consistent(hpdata)); \
} while (0)
static inline bool
hpdata_empty(const hpdata_t *hpdata) {
return hpdata->h_nactive == 0;
}
static inline bool
hpdata_full(const hpdata_t *hpdata) {
return hpdata->h_nactive == HUGEPAGE_PAGES;
}
void hpdata_init(hpdata_t *hpdata, void *addr, uint64_t age, bool is_huge);
/*
* Given an hpdata which can serve an allocation request, pick and reserve an
* offset within that allocation.
*/
void *hpdata_reserve_alloc(hpdata_t *hpdata, size_t sz);
void hpdata_unreserve(hpdata_t *hpdata, void *addr, size_t sz);
/*
* The hpdata_purge_prepare_t allows grabbing the metadata required to purge
* subranges of a hugepage while holding a lock, drop the lock during the actual
* purging of them, and reacquire it to update the metadata again.
*/
typedef struct hpdata_purge_state_s hpdata_purge_state_t;
struct hpdata_purge_state_s {
size_t npurged;
size_t ndirty_to_purge;
fb_group_t to_purge[FB_NGROUPS(HUGEPAGE_PAGES)];
size_t next_purge_search_begin;
};
/*
* Initializes purge state. The access to hpdata must be externally
* synchronized with other hpdata_* calls.
*
* You can tell whether or not a thread is purging or hugifying a given hpdata
* via hpdata_changing_state_get(hpdata). Racing hugification or purging
* operations aren't allowed.
*
* Once you begin purging, you have to follow through and call hpdata_purge_next
* until you're done, and then end. Allocating out of an hpdata undergoing
* purging is not allowed.
*
* Returns the number of dirty pages that will be purged and sets nranges
* to number of ranges with dirty pages that will be purged.
*/
size_t hpdata_purge_begin(
hpdata_t *hpdata, hpdata_purge_state_t *purge_state, size_t *nranges);
/*
* If there are more extents to purge, sets *r_purge_addr and *r_purge_size to
* true, and returns true. Otherwise, returns false to indicate that we're
* done.
*
* This requires exclusive access to the purge state, but *not* to the hpdata.
* In particular, unreserve calls are allowed while purging (i.e. you can dalloc
* into one part of the hpdata while purging a different part).
*/
bool hpdata_purge_next(hpdata_t *hpdata, hpdata_purge_state_t *purge_state,
void **r_purge_addr, size_t *r_purge_size);
/*
* Updates the hpdata metadata after all purging is done. Needs external
* synchronization.
*/
void hpdata_purge_end(hpdata_t *hpdata, hpdata_purge_state_t *purge_state);
void hpdata_hugify(hpdata_t *hpdata);
void hpdata_dehugify(hpdata_t *hpdata);
#endif /* JEMALLOC_INTERNAL_HPDATA_H */

View file

@ -1,35 +0,0 @@
/******************************************************************************/
#ifdef JEMALLOC_H_TYPES
#endif /* JEMALLOC_H_TYPES */
/******************************************************************************/
#ifdef JEMALLOC_H_STRUCTS
#endif /* JEMALLOC_H_STRUCTS */
/******************************************************************************/
#ifdef JEMALLOC_H_EXTERNS
void *huge_malloc(tsdn_t *tsdn, arena_t *arena, size_t usize, bool zero);
void *huge_palloc(tsdn_t *tsdn, arena_t *arena, size_t usize,
size_t alignment, bool zero);
bool huge_ralloc_no_move(tsdn_t *tsdn, void *ptr, size_t oldsize,
size_t usize_min, size_t usize_max, bool zero);
void *huge_ralloc(tsd_t *tsd, arena_t *arena, void *ptr, size_t oldsize,
size_t usize, size_t alignment, bool zero, tcache_t *tcache);
#ifdef JEMALLOC_JET
typedef void (huge_dalloc_junk_t)(void *, size_t);
extern huge_dalloc_junk_t *huge_dalloc_junk;
#endif
void huge_dalloc(tsdn_t *tsdn, void *ptr);
arena_t *huge_aalloc(const void *ptr);
size_t huge_salloc(tsdn_t *tsdn, const void *ptr);
prof_tctx_t *huge_prof_tctx_get(tsdn_t *tsdn, const void *ptr);
void huge_prof_tctx_set(tsdn_t *tsdn, const void *ptr, prof_tctx_t *tctx);
void huge_prof_tctx_reset(tsdn_t *tsdn, const void *ptr);
#endif /* JEMALLOC_H_EXTERNS */
/******************************************************************************/
#ifdef JEMALLOC_H_INLINES
#endif /* JEMALLOC_H_INLINES */
/******************************************************************************/

View file

@ -0,0 +1,43 @@
#ifndef JEMALLOC_INTERNAL_INSPECT_H
#define JEMALLOC_INTERNAL_INSPECT_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/tsd_types.h"
/*
* This module contains the heap introspection capabilities. For now they are
* exposed purely through mallctl APIs in the experimental namespace, but this
* may change over time.
*/
/*
* The following two structs are for experimental purposes. See
* experimental_utilization_query_ctl and
* experimental_utilization_batch_query_ctl in src/ctl.c.
*/
typedef struct inspect_extent_util_stats_s inspect_extent_util_stats_t;
struct inspect_extent_util_stats_s {
size_t nfree;
size_t nregs;
size_t size;
};
typedef struct inspect_extent_util_stats_verbose_s
inspect_extent_util_stats_verbose_t;
struct inspect_extent_util_stats_verbose_s {
void *slabcur_addr;
size_t nfree;
size_t nregs;
size_t size;
size_t bin_nfree;
size_t bin_nregs;
};
void inspect_extent_util_stats_get(
tsdn_t *tsdn, const void *ptr, size_t *nfree, size_t *nregs, size_t *size);
void inspect_extent_util_stats_verbose_get(tsdn_t *tsdn, const void *ptr,
size_t *nfree, size_t *nregs, size_t *size, size_t *bin_nfree,
size_t *bin_nregs, void **slabcur_addr);
#endif /* JEMALLOC_INTERNAL_INSPECT_H */

File diff suppressed because it is too large Load diff

View file

@ -1,40 +1,67 @@
#ifndef JEMALLOC_INTERNAL_DECLS_H
#define JEMALLOC_INTERNAL_DECLS_H
#define JEMALLOC_INTERNAL_DECLS_H
#include <math.h>
#ifdef _WIN32
# include <windows.h>
# include "msvc_compat/windows_extra.h"
# include <windows.h>
# include "msvc_compat/windows_extra.h"
# include "msvc_compat/strings.h"
# ifdef _WIN64
# if LG_VADDR <= 32
# error Generate the headers using x64 vcargs
# endif
# else
# if LG_VADDR > 32
# undef LG_VADDR
# define LG_VADDR 32
# endif
# endif
#else
# include <sys/param.h>
# include <sys/mman.h>
# if !defined(__pnacl__) && !defined(__native_client__)
# include <sys/syscall.h>
# if !defined(SYS_write) && defined(__NR_write)
# define SYS_write __NR_write
# endif
# include <sys/uio.h>
# endif
# include <pthread.h>
# ifdef JEMALLOC_OS_UNFAIR_LOCK
# include <os/lock.h>
# endif
# ifdef JEMALLOC_GLIBC_MALLOC_HOOK
# include <sched.h>
# endif
# include <errno.h>
# include <sys/time.h>
# include <time.h>
# ifdef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
# include <mach/mach_time.h>
# endif
# include <sys/param.h>
# include <sys/mman.h>
# if !defined(__pnacl__) && !defined(__native_client__)
# include <sys/syscall.h>
# if !defined(SYS_write) && defined(__NR_write)
# define SYS_write __NR_write
# endif
# if defined(SYS_open) && defined(__aarch64__)
/* Android headers may define SYS_open to __NR_open even though
* __NR_open may not exist on AArch64 (superseded by __NR_openat). */
# undef SYS_open
# endif
# include <sys/uio.h>
# endif
# include <pthread.h>
# if defined(__FreeBSD__) || defined(__DragonFly__) \
|| defined(__OpenBSD__)
# include <pthread_np.h>
# include <sched.h>
# if defined(__FreeBSD__)
# define cpu_set_t cpuset_t
# endif
# endif
# include <signal.h>
# ifdef JEMALLOC_OS_UNFAIR_LOCK
# include <os/lock.h>
# endif
# ifdef JEMALLOC_GLIBC_MALLOC_HOOK
# include <sched.h>
# endif
# include <errno.h>
# include <sys/time.h>
# include <time.h>
# ifdef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
# include <mach/mach_time.h>
# endif
#endif
#include <sys/types.h>
#include <limits.h>
#ifndef SIZE_T_MAX
# define SIZE_T_MAX SIZE_MAX
# define SIZE_T_MAX SIZE_MAX
#endif
#ifndef SSIZE_MAX
# define SSIZE_MAX ((ssize_t)(SIZE_T_MAX >> 1))
#endif
#include <stdarg.h>
#include <stdbool.h>
@ -43,33 +70,57 @@
#include <stdint.h>
#include <stddef.h>
#ifndef offsetof
# define offsetof(type, member) ((size_t)&(((type *)NULL)->member))
# define offsetof(type, member) ((size_t) & (((type *)NULL)->member))
#endif
#include <string.h>
#include <strings.h>
#include <ctype.h>
#ifdef _MSC_VER
# include <io.h>
# include <io.h>
typedef intptr_t ssize_t;
# define PATH_MAX 1024
# define STDERR_FILENO 2
# define __func__ __FUNCTION__
# ifdef JEMALLOC_HAS_RESTRICT
# define restrict __restrict
# endif
# define PATH_MAX 1024
# define STDERR_FILENO 2
# define __func__ __FUNCTION__
# ifdef JEMALLOC_HAS_RESTRICT
# define restrict __restrict
# endif
/* Disable warnings about deprecated system functions. */
# pragma warning(disable: 4996)
#if _MSC_VER < 1800
# pragma warning(disable : 4996)
# if _MSC_VER < 1800
static int
isblank(int c)
{
isblank(int c) {
return (c == '\t' || c == ' ');
}
#endif
# endif
#else
# include <unistd.h>
# include <unistd.h>
#endif
#include <fcntl.h>
/*
* The Win32 midl compiler has #define small char; we don't use midl, but
* "small" is a nice identifier to have available when talking about size
* classes.
*/
#ifdef small
# undef small
#endif
/*
* Oftentimes we'd like to perform some kind of arithmetic to obtain
* a pointer from another pointer but with some offset or mask applied.
* Naively you would accomplish this by casting the source pointer to
* `uintptr_t`, performing all of the relevant arithmetic, and then casting
* the result to the desired pointer type. However, this has the unfortunate
* side-effect of concealing pointer provenance, hiding useful information for
* optimization from the compiler (see here for details:
* https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html
* )
* Instead what one should do is cast the source pointer to `char *` and perform
* the equivalent arithmetic (since `char` of course represents one byte). But
* because `char *` has the semantic meaning of "string", we define this typedef
* simply to make it clearer where we are performing such pointer arithmetic.
*/
typedef char byte_t;
#endif /* JEMALLOC_INTERNAL_H */

View file

@ -1,5 +1,5 @@
#ifndef JEMALLOC_INTERNAL_DEFS_H_
#define JEMALLOC_INTERNAL_DEFS_H_
#define JEMALLOC_INTERNAL_DEFS_H_
/*
* If JEMALLOC_PREFIX is defined via --with-jemalloc-prefix, it will cause all
* public APIs to be prefixed. This makes it possible, with some care, to use
@ -8,6 +8,21 @@
#undef JEMALLOC_PREFIX
#undef JEMALLOC_CPREFIX
/*
* Define overrides for non-standard allocator-related functions if they are
* present on the system.
*/
#undef JEMALLOC_OVERRIDE___LIBC_CALLOC
#undef JEMALLOC_OVERRIDE___LIBC_FREE
#undef JEMALLOC_OVERRIDE___LIBC_FREE_SIZED
#undef JEMALLOC_OVERRIDE___LIBC_FREE_ALIGNED_SIZED
#undef JEMALLOC_OVERRIDE___LIBC_MALLOC
#undef JEMALLOC_OVERRIDE___LIBC_MEMALIGN
#undef JEMALLOC_OVERRIDE___LIBC_REALLOC
#undef JEMALLOC_OVERRIDE___LIBC_VALLOC
#undef JEMALLOC_OVERRIDE___LIBC_PVALLOC
#undef JEMALLOC_OVERRIDE___POSIX_MEMALIGN
/*
* JEMALLOC_PRIVATE_NAMESPACE is used as a prefix for all library-private APIs.
* For shared libraries, symbol visibility mechanisms prevent these symbols
@ -21,58 +36,41 @@
* order to yield to another virtual CPU.
*/
#undef CPU_SPINWAIT
/* 1 if CPU_SPINWAIT is defined, 0 otherwise. */
#undef HAVE_CPU_SPINWAIT
/*
* Number of significant bits in virtual addresses. This may be less than the
* total number of bits in a pointer, e.g. on x64, for which the uppermost 16
* bits are the same as bit 47.
*/
#undef LG_VADDR
/* Defined if C11 atomics are available. */
#undef JEMALLOC_C11ATOMICS
#undef JEMALLOC_C11_ATOMICS
/* Defined if the equivalent of FreeBSD's atomic(9) functions are available. */
#undef JEMALLOC_ATOMIC9
/* Defined if GCC __atomic atomics are available. */
#undef JEMALLOC_GCC_ATOMIC_ATOMICS
/* and the 8-bit variant support. */
#undef JEMALLOC_GCC_U8_ATOMIC_ATOMICS
/*
* Defined if OSAtomic*() functions are available, as provided by Darwin, and
* documented in the atomic(3) manual page.
*/
#undef JEMALLOC_OSATOMIC
/*
* Defined if __sync_add_and_fetch(uint32_t *, uint32_t) and
* __sync_sub_and_fetch(uint32_t *, uint32_t) are available, despite
* __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 not being defined (which means the
* functions are defined in libgcc instead of being inlines).
*/
#undef JE_FORCE_SYNC_COMPARE_AND_SWAP_4
/*
* Defined if __sync_add_and_fetch(uint64_t *, uint64_t) and
* __sync_sub_and_fetch(uint64_t *, uint64_t) are available, despite
* __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 not being defined (which means the
* functions are defined in libgcc instead of being inlines).
*/
#undef JE_FORCE_SYNC_COMPARE_AND_SWAP_8
/* Defined if GCC __sync atomics are available. */
#undef JEMALLOC_GCC_SYNC_ATOMICS
/* and the 8-bit variant support. */
#undef JEMALLOC_GCC_U8_SYNC_ATOMICS
/*
* Defined if __builtin_clz() and __builtin_clzl() are available.
*/
#undef JEMALLOC_HAVE_BUILTIN_CLZ
/*
* Defined if madvise(2) is available.
*/
#undef JEMALLOC_HAVE_MADVISE
/*
* Defined if os_unfair_lock_*() functions are available, as provided by Darwin.
*/
#undef JEMALLOC_OS_UNFAIR_LOCK
/*
* Defined if OSSpin*() functions are available, as provided by Darwin, and
* documented in the spinlock(3) manual page.
*/
#undef JEMALLOC_OSSPIN
/* Defined if syscall(2) is available. */
#undef JEMALLOC_HAVE_SYSCALL
/* Defined if syscall(2) is usable. */
#undef JEMALLOC_USE_SYSCALL
/*
* Defined if secure_getenv(3) is available.
@ -84,6 +82,21 @@
*/
#undef JEMALLOC_HAVE_ISSETUGID
/* Defined if pthread_atfork(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_ATFORK
/* Defined if pthread_setname_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_SETNAME_NP
/* Defined if pthread_getname_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_GETNAME_NP
/* Defined if pthread_set_name_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_SET_NAME_NP
/* Defined if pthread_get_name_np(3) is available. */
#undef JEMALLOC_HAVE_PTHREAD_GET_NAME_NP
/*
* Defined if clock_gettime(CLOCK_MONOTONIC_COARSE, ...) is available.
*/
@ -99,6 +112,16 @@
*/
#undef JEMALLOC_HAVE_MACH_ABSOLUTE_TIME
/*
* Defined if clock_gettime(CLOCK_REALTIME, ...) is available.
*/
#undef JEMALLOC_HAVE_CLOCK_REALTIME
/*
* Defined if clock_gettime_nsec_np(CLOCK_UPTIME_RAW) is available.
*/
#undef JEMALLOC_HAVE_CLOCK_GETTIME_NSEC_NP
/*
* Defined if _malloc_thread_cleanup() exists. At least in the case of
* FreeBSD, pthread_key_create() allocates, which if used during malloc
@ -125,12 +148,6 @@
/* Non-empty if the tls_model attribute is supported. */
#undef JEMALLOC_TLS_MODEL
/* JEMALLOC_CC_SILENCE enables code that silences unuseful compiler warnings. */
#undef JEMALLOC_CC_SILENCE
/* JEMALLOC_CODE_COVERAGE enables test code coverage analysis. */
#undef JEMALLOC_CODE_COVERAGE
/*
* JEMALLOC_DEBUG enables assertions and other sanity checks, and disables
* inline functions.
@ -140,6 +157,9 @@
/* JEMALLOC_STATS enables statistics calculation. */
#undef JEMALLOC_STATS
/* JEMALLOC_EXPERIMENTAL_SMALLOCX_API enables experimental smallocx API. */
#undef JEMALLOC_EXPERIMENTAL_SMALLOCX_API
/* JEMALLOC_PROF enables allocation profiling. */
#undef JEMALLOC_PROF
@ -152,27 +172,29 @@
/* Use gcc intrinsics for profile backtracing if defined. */
#undef JEMALLOC_PROF_GCC
/*
* JEMALLOC_TCACHE enables a thread-specific caching layer for small objects.
* This makes it possible to allocate/deallocate objects without any locking
* when the cache is in the steady state.
*/
#undef JEMALLOC_TCACHE
/* Use frame pointer for profile backtracing if defined. Linux only. */
#undef JEMALLOC_PROF_FRAME_POINTER
/* JEMALLOC_PAGEID enabled page id */
#undef JEMALLOC_PAGEID
/* JEMALLOC_HAVE_PRCTL checks prctl */
#undef JEMALLOC_HAVE_PRCTL
/*
* JEMALLOC_DSS enables use of sbrk(2) to allocate chunks from the data storage
* JEMALLOC_DSS enables use of sbrk(2) to allocate extents from the data storage
* segment (DSS).
*/
#undef JEMALLOC_DSS
/* Support memory filling (junk/zero/quarantine/redzone). */
/* Support memory filling (junk/zero). */
#undef JEMALLOC_FILL
/* Support utrace(2)-based tracing. */
#undef JEMALLOC_UTRACE
/* Support Valgrind. */
#undef JEMALLOC_VALGRIND
/* Support utrace(2)-based tracing (label based signature). */
#undef JEMALLOC_UTRACE_LABEL
/* Support optional abort() on OOM. */
#undef JEMALLOC_XMALLOC
@ -180,9 +202,6 @@
/* Support lazy locking (avoid locking unless a second thread is launched). */
#undef JEMALLOC_LAZY_LOCK
/* Minimum size class to support is 2^LG_TINY_MIN bytes. */
#undef LG_TINY_MIN
/*
* Minimum allocation alignment is 2^LG_QUANTUM bytes (ignoring tiny size
* classes).
@ -192,6 +211,16 @@
/* One page is 2^LG_PAGE bytes. */
#undef LG_PAGE
/* Maximum number of regions in a slab. */
#undef CONFIG_LG_SLAB_MAXREGS
/*
* One huge page is 2^LG_HUGEPAGE bytes. Note that this is defined even if the
* system does not explicitly support huge pages; system calls that require
* explicit huge page support are separately configured.
*/
#undef LG_HUGEPAGE
/*
* If defined, adjacent virtual memory mappings with identical attributes
* automatically coalesce, and they fragment when changes are made to subranges.
@ -202,11 +231,12 @@
#undef JEMALLOC_MAPS_COALESCE
/*
* If defined, use munmap() to unmap freed chunks, rather than storing them for
* later reuse. This is disabled by default on Linux because common sequences
* of mmap()/munmap() calls will cause virtual memory map holes.
* If defined, retain memory for later reuse by default rather than using e.g.
* munmap() to unmap freed extents. This is enabled on 64-bit Linux because
* common sequences of mmap()/munmap() calls will cause virtual memory map
* holes.
*/
#undef JEMALLOC_MUNMAP
#undef JEMALLOC_RETAIN
/* TLS is used to map arenas and magazine caches to threads. */
#undef JEMALLOC_TLS
@ -226,10 +256,10 @@
#undef JEMALLOC_INTERNAL_FFS
/*
* JEMALLOC_IVSALLOC enables ivsalloc(), which verifies that pointers reside
* within jemalloc-owned chunks before dereferencing them.
* popcount*() functions to use for bitmapping.
*/
#undef JEMALLOC_IVSALLOC
#undef JEMALLOC_INTERNAL_POPCOUNTL
#undef JEMALLOC_INTERNAL_POPCOUNT
/*
* If defined, explicitly attempt to more uniformly distribute large allocation
@ -237,11 +267,28 @@
*/
#undef JEMALLOC_CACHE_OBLIVIOUS
/*
* If defined, enable logging facilities. We make this a configure option to
* avoid taking extra branches everywhere.
*/
#undef JEMALLOC_LOG
/*
* If defined, use readlinkat() (instead of readlink()) to follow
* /etc/malloc_conf.
*/
#undef JEMALLOC_READLINKAT
/*
* If defined, use getenv() (instead of secure_getenv() or
* alternatives) to access MALLOC_CONF.
*/
#undef JEMALLOC_FORCE_GETENV
/*
* Darwin (OS X) uses zones to work around Mach-O symbol override shortcomings.
*/
#undef JEMALLOC_ZONE
#undef JEMALLOC_ZONE_VERSION
/*
* Methods for determining whether the OS overcommits.
@ -252,18 +299,95 @@
#undef JEMALLOC_SYSCTL_VM_OVERCOMMIT
#undef JEMALLOC_PROC_SYS_VM_OVERCOMMIT_MEMORY
/* Defined if madvise(2) is available. */
#undef JEMALLOC_HAVE_MADVISE
/*
* Defined if transparent huge pages are supported via the MADV_[NO]HUGEPAGE
* arguments to madvise(2).
*/
#undef JEMALLOC_HAVE_MADVISE_HUGE
/*
* Defined if best-effort synchronous collapse of the native
* pages mapped by the memory range into transparent huge pages is supported
* via MADV_COLLAPSE arguments to madvise(2).
*/
#undef JEMALLOC_HAVE_MADVISE_COLLAPSE
/*
* Methods for purging unused pages differ between operating systems.
*
* madvise(..., MADV_DONTNEED) : On Linux, this immediately discards pages,
* madvise(..., MADV_FREE) : This marks pages as being unused, such that they
* will be discarded rather than swapped out.
* madvise(..., MADV_DONTNEED) : If JEMALLOC_PURGE_MADVISE_DONTNEED_ZEROS is
* defined, this immediately discards pages,
* such that new pages will be demand-zeroed if
* the address region is later touched.
* madvise(..., MADV_FREE) : On FreeBSD and Darwin, this marks pages as being
* unused, such that they will be discarded rather
* than swapped out.
* the address region is later touched;
* otherwise this behaves similarly to
* MADV_FREE, though typically with higher
* system overhead.
*/
#undef JEMALLOC_PURGE_MADVISE_DONTNEED
#undef JEMALLOC_PURGE_MADVISE_FREE
#undef JEMALLOC_PURGE_MADVISE_DONTNEED
#undef JEMALLOC_PURGE_MADVISE_DONTNEED_ZEROS
/* Defined if madvise(2) is available but MADV_FREE is not (x86 Linux only). */
#undef JEMALLOC_DEFINE_MADVISE_FREE
/*
* Defined if MADV_DO[NT]DUMP is supported as an argument to madvise.
*/
#undef JEMALLOC_MADVISE_DONTDUMP
/*
* Defined if MADV_[NO]CORE is supported as an argument to madvise.
*/
#undef JEMALLOC_MADVISE_NOCORE
/* Defined if process_madvise(2) is available. */
#undef JEMALLOC_HAVE_PROCESS_MADVISE
#undef EXPERIMENTAL_SYS_PROCESS_MADVISE_NR
/* Defined if mprotect(2) is available. */
#undef JEMALLOC_HAVE_MPROTECT
/* Defined if sys/sdt.h is available and sdt tracing enabled */
#undef JEMALLOC_EXPERIMENTAL_USDT_STAP
/*
* Defined if sys/sdt.h is unavailable, sdt tracing enabled, and
* platform is supported
*/
#undef JEMALLOC_EXPERIMENTAL_USDT_CUSTOM
/*
* Defined if transparent huge pages (THPs) are supported via the
* MADV_[NO]HUGEPAGE arguments to madvise(2), and THP support is enabled.
*/
#undef JEMALLOC_THP
/* Defined if posix_madvise is available. */
#undef JEMALLOC_HAVE_POSIX_MADVISE
/*
* Method for purging unused pages using posix_madvise.
*
* posix_madvise(..., POSIX_MADV_DONTNEED)
*/
#undef JEMALLOC_PURGE_POSIX_MADVISE_DONTNEED
#undef JEMALLOC_PURGE_POSIX_MADVISE_DONTNEED_ZEROS
/*
* Defined if memcntl page admin call is supported
*/
#undef JEMALLOC_HAVE_MEMCNTL
/*
* Defined if malloc_size is supported
*/
#undef JEMALLOC_HAVE_MALLOC_SIZE
/* Define if operating system has alloca.h header. */
#undef JEMALLOC_HAS_ALLOCA_H
@ -292,9 +416,32 @@
/* glibc memalign hook. */
#undef JEMALLOC_GLIBC_MEMALIGN_HOOK
/* pthread support */
#undef JEMALLOC_HAVE_PTHREAD
/* dlsym() support */
#undef JEMALLOC_HAVE_DLSYM
/* Adaptive mutex support in pthreads. */
#undef JEMALLOC_HAVE_PTHREAD_MUTEX_ADAPTIVE_NP
/* gettid() support */
#undef JEMALLOC_HAVE_GETTID
/* GNU specific sched_getcpu support */
#undef JEMALLOC_HAVE_SCHED_GETCPU
/* GNU specific sched_setaffinity support */
#undef JEMALLOC_HAVE_SCHED_SETAFFINITY
/* pthread_setaffinity_np support */
#undef JEMALLOC_HAVE_PTHREAD_SETAFFINITY_NP
/*
* If defined, all the features necessary for background threads are present.
*/
#undef JEMALLOC_BACKGROUND_THREAD
/*
* If defined, jemalloc symbols are not exported (doesn't work when
* JEMALLOC_PREFIX is not defined).
@ -304,4 +451,44 @@
/* config.malloc_conf options string. */
#undef JEMALLOC_CONFIG_MALLOC_CONF
/* If defined, jemalloc takes the malloc/free/etc. symbol names. */
#undef JEMALLOC_IS_MALLOC
/*
* Defined if strerror_r returns char * if _GNU_SOURCE is defined.
*/
#undef JEMALLOC_STRERROR_R_RETURNS_CHAR_WITH_GNU_SOURCE
/* Performs additional safety checks when defined. */
#undef JEMALLOC_OPT_SAFETY_CHECKS
/* Is C++ support being built? */
#undef JEMALLOC_ENABLE_CXX
/* Performs additional size checks when defined. */
#undef JEMALLOC_OPT_SIZE_CHECKS
/* Allows sampled junk and stash for checking use-after-free when defined. */
#undef JEMALLOC_UAF_DETECTION
/* Darwin VM_MAKE_TAG support */
#undef JEMALLOC_HAVE_VM_MAKE_TAG
/* If defined, realloc(ptr, 0) defaults to "free" instead of "alloc". */
#undef JEMALLOC_ZERO_REALLOC_DEFAULT_FREE
/* If defined, use volatile asm during benchmarks. */
#undef JEMALLOC_HAVE_ASM_VOLATILE
/*
* If defined, support the use of rdtscp to get the time stamp counter
* and the processor ID.
*/
#undef JEMALLOC_HAVE_RDTSCP
/* If defined, use __int128 for optimization. */
#undef JEMALLOC_HAVE_INT128
#include "jemalloc/internal/jemalloc_internal_overrides.h"
#endif /* JEMALLOC_INTERNAL_DEFS_H_ */

View file

@ -0,0 +1,91 @@
#ifndef JEMALLOC_INTERNAL_EXTERNS_H
#define JEMALLOC_INTERNAL_EXTERNS_H
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/fxp.h"
#include "jemalloc/internal/hpa_opts.h"
#include "jemalloc/internal/nstime.h"
#include "jemalloc/internal/sec_opts.h"
#include "jemalloc/internal/tsd_types.h"
/* TSD checks this to set thread local slow state accordingly. */
extern bool malloc_slow;
/* Run-time options. */
extern bool opt_abort;
extern bool opt_abort_conf;
extern bool opt_trust_madvise;
extern bool opt_experimental_hpa_start_huge_if_thp_always;
extern bool opt_experimental_hpa_enforce_hugify;
extern bool opt_confirm_conf;
extern bool opt_hpa;
extern hpa_shard_opts_t opt_hpa_opts;
extern sec_opts_t opt_hpa_sec_opts;
extern const char *opt_junk;
extern bool opt_junk_alloc;
extern bool opt_junk_free;
extern void (*JET_MUTABLE junk_free_callback)(void *ptr, size_t size);
extern void (*JET_MUTABLE junk_alloc_callback)(void *ptr, size_t size);
extern void (*JET_MUTABLE invalid_conf_abort)(void);
extern bool opt_utrace;
extern bool opt_xmalloc;
extern bool opt_experimental_infallible_new;
extern bool opt_experimental_tcache_gc;
extern bool opt_zero;
extern unsigned opt_narenas;
extern fxp_t opt_narenas_ratio;
extern zero_realloc_action_t opt_zero_realloc_action;
extern malloc_init_t malloc_init_state;
extern const char *const zero_realloc_mode_names[];
extern atomic_zu_t zero_realloc_count;
extern bool opt_cache_oblivious;
extern unsigned opt_debug_double_free_max_scan;
extern size_t opt_calloc_madvise_threshold;
extern bool opt_disable_large_size_classes;
extern const char *opt_malloc_conf_symlink;
extern const char *opt_malloc_conf_env_var;
/* Escape free-fastpath when ptr & mask == 0 (for sanitization purpose). */
extern uintptr_t san_cache_bin_nonfast_mask;
/* Number of CPUs. */
extern unsigned ncpus;
/* Number of arenas used for automatic multiplexing of threads and arenas. */
extern unsigned narenas_auto;
/* Base index for manual arenas. */
extern unsigned manual_arena_base;
/*
* Arenas that are used to service external requests. Not all elements of the
* arenas array are necessarily used; arenas are created lazily as needed.
*/
extern atomic_p_t arenas[];
extern unsigned huge_arena_ind;
void *a0malloc(size_t size);
void a0dalloc(void *ptr);
void *bootstrap_malloc(size_t size);
void *bootstrap_calloc(size_t num, size_t size);
void bootstrap_free(void *ptr);
void arena_set(unsigned ind, arena_t *arena);
unsigned narenas_total_get(void);
arena_t *arena_init(tsdn_t *tsdn, unsigned ind, const arena_config_t *config);
arena_t *arena_choose_hard(tsd_t *tsd, bool internal);
void arena_migrate(tsd_t *tsd, arena_t *oldarena, arena_t *newarena);
void iarena_cleanup(tsd_t *tsd);
void arena_cleanup(tsd_t *tsd);
size_t batch_alloc(void **ptrs, size_t num, size_t size, int flags);
void jemalloc_prefork(void);
void jemalloc_postfork_parent(void);
void jemalloc_postfork_child(void);
void sdallocx_default(void *ptr, size_t size, int flags);
void free_default(void *ptr);
void *malloc_default(size_t size);
#endif /* JEMALLOC_INTERNAL_EXTERNS_H */

View file

@ -0,0 +1,84 @@
#ifndef JEMALLOC_INTERNAL_INCLUDES_H
#define JEMALLOC_INTERNAL_INCLUDES_H
/*
* jemalloc can conceptually be broken into components (arena, tcache, etc.),
* but there are circular dependencies that cannot be broken without
* substantial performance degradation.
*
* Historically, we dealt with this by each header into four sections (types,
* structs, externs, and inlines), and included each header file multiple times
* in this file, picking out the portion we want on each pass using the
* following #defines:
* JEMALLOC_H_TYPES : Preprocessor-defined constants and pseudo-opaque data
* types.
* JEMALLOC_H_STRUCTS : Data structures.
* JEMALLOC_H_EXTERNS : Extern data declarations and function prototypes.
* JEMALLOC_H_INLINES : Inline functions.
*
* We're moving toward a world in which the dependencies are explicit; each file
* will #include the headers it depends on (rather than relying on them being
* implicitly available via this file including every header file in the
* project).
*
* We're now in an intermediate state: we've broken up the header files to avoid
* having to include each one multiple times, but have not yet moved the
* dependency information into the header files (i.e. we still rely on the
* ordering in this file to ensure all a header's dependencies are available in
* its translation unit). Each component is now broken up into multiple header
* files, corresponding to the sections above (e.g. instead of "foo.h", we now
* have "foo_types.h", "foo_structs.h", "foo_externs.h", "foo_inlines.h").
*
* Those files which have been converted to explicitly include their
* inter-component dependencies are now in the initial HERMETIC HEADERS
* section. All headers may still rely on jemalloc_preamble.h (which, by fiat,
* must be included first in every translation unit) for system headers and
* global jemalloc definitions, however.
*/
/******************************************************************************/
/* TYPES */
/******************************************************************************/
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/tcache_types.h"
#include "jemalloc/internal/prof_types.h"
/******************************************************************************/
/* STRUCTS */
/******************************************************************************/
#include "jemalloc/internal/prof_structs.h"
#include "jemalloc/internal/arena_structs.h"
#include "jemalloc/internal/tcache_structs.h"
#include "jemalloc/internal/background_thread_structs.h"
/******************************************************************************/
/* EXTERNS */
/******************************************************************************/
#include "jemalloc/internal/jemalloc_internal_externs.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/large_externs.h"
#include "jemalloc/internal/tcache_externs.h"
#include "jemalloc/internal/prof_externs.h"
#include "jemalloc/internal/background_thread_externs.h"
/******************************************************************************/
/* INLINES */
/******************************************************************************/
#include "jemalloc/internal/jemalloc_internal_inlines_a.h"
/*
* Include portions of arena code interleaved with tcache code in order to
* resolve circular dependencies.
*/
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/jemalloc_internal_inlines_b.h"
#include "jemalloc/internal/tcache_inlines.h"
#include "jemalloc/internal/arena_inlines_b.h"
#include "jemalloc/internal/jemalloc_internal_inlines_c.h"
#include "jemalloc/internal/prof_inlines.h"
#include "jemalloc/internal/background_thread_inlines.h"
#endif /* JEMALLOC_INTERNAL_INCLUDES_H */

View file

@ -0,0 +1,135 @@
#ifndef JEMALLOC_INTERNAL_INLINES_A_H
#define JEMALLOC_INTERNAL_INLINES_A_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_types.h"
#include "jemalloc/internal/atomic.h"
#include "jemalloc/internal/bit_util.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/sc.h"
#include "jemalloc/internal/tcache_externs.h"
#include "jemalloc/internal/ticker.h"
JEMALLOC_ALWAYS_INLINE malloc_cpuid_t
malloc_getcpu(void) {
assert(have_percpu_arena);
#if defined(_WIN32)
return GetCurrentProcessorNumber();
#elif defined(JEMALLOC_HAVE_SCHED_GETCPU)
return (malloc_cpuid_t)sched_getcpu();
#elif defined(JEMALLOC_HAVE_RDTSCP)
unsigned int ecx;
asm volatile("rdtscp" : "=c"(ecx)::"eax", "edx");
return (malloc_cpuid_t)(ecx & 0xfff);
#elif defined(__aarch64__) && defined(__APPLE__)
/* Other oses most likely use tpidr_el0 instead */
uintptr_t c;
asm volatile("mrs %x0, tpidrro_el0" : "=r"(c)::"memory");
return (malloc_cpuid_t)(c & (1 << 3) - 1);
#else
not_reached();
return -1;
#endif
}
/* Return the chosen arena index based on current cpu. */
JEMALLOC_ALWAYS_INLINE unsigned
percpu_arena_choose(void) {
assert(have_percpu_arena && PERCPU_ARENA_ENABLED(opt_percpu_arena));
malloc_cpuid_t cpuid = malloc_getcpu();
assert(cpuid >= 0);
unsigned arena_ind;
if ((opt_percpu_arena == percpu_arena)
|| ((unsigned)cpuid < ncpus / 2)) {
arena_ind = cpuid;
} else {
assert(opt_percpu_arena == per_phycpu_arena);
/* Hyper threads on the same physical CPU share arena. */
arena_ind = cpuid - ncpus / 2;
}
return arena_ind;
}
/* Return the limit of percpu auto arena range, i.e. arenas[0...ind_limit). */
JEMALLOC_ALWAYS_INLINE unsigned
percpu_arena_ind_limit(percpu_arena_mode_t mode) {
assert(have_percpu_arena && PERCPU_ARENA_ENABLED(mode));
if (mode == per_phycpu_arena && ncpus > 1) {
if (ncpus % 2) {
/* This likely means a misconfig. */
return ncpus / 2 + 1;
}
return ncpus / 2;
} else {
return ncpus;
}
}
static inline arena_t *
arena_get(tsdn_t *tsdn, unsigned ind, bool init_if_missing) {
arena_t *ret;
assert(ind < MALLOCX_ARENA_LIMIT);
ret = (arena_t *)atomic_load_p(&arenas[ind], ATOMIC_ACQUIRE);
if (unlikely(ret == NULL)) {
if (init_if_missing) {
ret = arena_init(tsdn, ind, &arena_config_default);
}
}
return ret;
}
JEMALLOC_ALWAYS_INLINE bool
tcache_available(tsd_t *tsd) {
/*
* Thread specific auto tcache might be unavailable if: 1) during tcache
* initialization, or 2) disabled through thread.tcache.enabled mallctl
* or config options. This check covers all cases.
*/
if (likely(tsd_tcache_enabled_get(tsd))) {
/* Associated arena == NULL implies tcache init in progress. */
if (config_debug && tsd_tcache_slowp_get(tsd)->arena != NULL) {
tcache_assert_initialized(tsd_tcachep_get(tsd));
}
return true;
}
return false;
}
JEMALLOC_ALWAYS_INLINE tcache_t *
tcache_get(tsd_t *tsd) {
if (!tcache_available(tsd)) {
return NULL;
}
return tsd_tcachep_get(tsd);
}
JEMALLOC_ALWAYS_INLINE tcache_slow_t *
tcache_slow_get(tsd_t *tsd) {
if (!tcache_available(tsd)) {
return NULL;
}
return tsd_tcache_slowp_get(tsd);
}
static inline void
pre_reentrancy(tsd_t *tsd, arena_t *arena) {
/* arena is the current context. Reentry from a0 is not allowed. */
assert(arena != arena_get(tsd_tsdn(tsd), 0, false));
tsd_pre_reentrancy_raw(tsd);
}
static inline void
post_reentrancy(tsd_t *tsd) {
tsd_post_reentrancy_raw(tsd);
}
#endif /* JEMALLOC_INTERNAL_INLINES_A_H */

View file

@ -0,0 +1,106 @@
#ifndef JEMALLOC_INTERNAL_INLINES_B_H
#define JEMALLOC_INTERNAL_INLINES_B_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_inlines_a.h"
#include "jemalloc/internal/extent.h"
#include "jemalloc/internal/jemalloc_internal_inlines_a.h"
static inline void
percpu_arena_update(tsd_t *tsd, unsigned cpu) {
assert(have_percpu_arena);
arena_t *oldarena = tsd_arena_get(tsd);
assert(oldarena != NULL);
unsigned oldind = arena_ind_get(oldarena);
if (oldind != cpu) {
unsigned newind = cpu;
arena_t *newarena = arena_get(tsd_tsdn(tsd), newind, true);
assert(newarena != NULL);
/* Set new arena/tcache associations. */
arena_migrate(tsd, oldarena, newarena);
tcache_t *tcache = tcache_get(tsd);
if (tcache != NULL) {
tcache_slow_t *tcache_slow = tsd_tcache_slowp_get(tsd);
assert(tcache_slow->arena != NULL);
tcache_arena_reassociate(
tsd_tsdn(tsd), tcache_slow, tcache, newarena);
}
}
}
/* Choose an arena based on a per-thread value. */
static inline arena_t *
arena_choose_impl(tsd_t *tsd, arena_t *arena, bool internal) {
arena_t *ret;
if (arena != NULL) {
return arena;
}
/* During reentrancy, arena 0 is the safest bet. */
if (unlikely(tsd_reentrancy_level_get(tsd) > 0)) {
return arena_get(tsd_tsdn(tsd), 0, true);
}
ret = internal ? tsd_iarena_get(tsd) : tsd_arena_get(tsd);
if (unlikely(ret == NULL)) {
ret = arena_choose_hard(tsd, internal);
assert(ret);
if (tcache_available(tsd)) {
tcache_slow_t *tcache_slow = tsd_tcache_slowp_get(tsd);
tcache_t *tcache = tsd_tcachep_get(tsd);
if (tcache_slow->arena != NULL) {
/* See comments in tsd_tcache_data_init().*/
assert(tcache_slow->arena
== arena_get(tsd_tsdn(tsd), 0, false));
if (tcache_slow->arena != ret) {
tcache_arena_reassociate(tsd_tsdn(tsd),
tcache_slow, tcache, ret);
}
} else {
tcache_arena_associate(
tsd_tsdn(tsd), tcache_slow, tcache, ret);
}
}
}
/*
* Note that for percpu arena, if the current arena is outside of the
* auto percpu arena range, (i.e. thread is assigned to a manually
* managed arena), then percpu arena is skipped.
*/
if (have_percpu_arena && PERCPU_ARENA_ENABLED(opt_percpu_arena)
&& !internal
&& (arena_ind_get(ret) < percpu_arena_ind_limit(opt_percpu_arena))
&& (ret->last_thd != tsd_tsdn(tsd))) {
unsigned ind = percpu_arena_choose();
if (arena_ind_get(ret) != ind) {
percpu_arena_update(tsd, ind);
ret = tsd_arena_get(tsd);
}
ret->last_thd = tsd_tsdn(tsd);
}
return ret;
}
static inline arena_t *
arena_choose(tsd_t *tsd, arena_t *arena) {
return arena_choose_impl(tsd, arena, false);
}
static inline arena_t *
arena_ichoose(tsd_t *tsd, arena_t *arena) {
return arena_choose_impl(tsd, arena, true);
}
static inline bool
arena_is_auto(arena_t *arena) {
assert(narenas_auto > 0);
return (arena_ind_get(arena) < manual_arena_base);
}
#endif /* JEMALLOC_INTERNAL_INLINES_B_H */

View file

@ -0,0 +1,600 @@
#ifndef JEMALLOC_INTERNAL_INLINES_C_H
#define JEMALLOC_INTERNAL_INLINES_C_H
#include "jemalloc/internal/jemalloc_preamble.h"
#include "jemalloc/internal/arena_externs.h"
#include "jemalloc/internal/arena_inlines_b.h"
#include "jemalloc/internal/emap.h"
#include "jemalloc/internal/hook.h"
#include "jemalloc/internal/jemalloc_internal_types.h"
#include "jemalloc/internal/log.h"
#include "jemalloc/internal/sz.h"
#include "jemalloc/internal/thread_event.h"
#include "jemalloc/internal/witness.h"
/*
* These correspond to the macros in jemalloc/jemalloc_macros.h. Broadly, we
* should have one constant here per magic value there. Note however that the
* representations need not be related.
*/
#define TCACHE_IND_NONE ((unsigned)-1)
#define TCACHE_IND_AUTOMATIC ((unsigned)-2)
#define ARENA_IND_AUTOMATIC ((unsigned)-1)
/*
* Translating the names of the 'i' functions:
* Abbreviations used in the first part of the function name (before
* alloc/dalloc) describe what that function accomplishes:
* a: arena (query)
* s: size (query, or sized deallocation)
* e: extent (query)
* p: aligned (allocates)
* vs: size (query, without knowing that the pointer is into the heap)
* r: rallocx implementation
* x: xallocx implementation
* Abbreviations used in the second part of the function name (after
* alloc/dalloc) describe the arguments it takes
* z: whether to return zeroed memory
* t: accepts a tcache_t * parameter
* m: accepts an arena_t * parameter
*/
JEMALLOC_ALWAYS_INLINE arena_t *
iaalloc(tsdn_t *tsdn, const void *ptr) {
assert(ptr != NULL);
return arena_aalloc(tsdn, ptr);
}
JEMALLOC_ALWAYS_INLINE size_t
isalloc(tsdn_t *tsdn, const void *ptr) {
assert(ptr != NULL);
return arena_salloc(tsdn, ptr);
}
JEMALLOC_ALWAYS_INLINE void *
iallocztm_explicit_slab(tsdn_t *tsdn, size_t size, szind_t ind, bool zero,
bool slab, tcache_t *tcache, bool is_internal, arena_t *arena,
bool slow_path) {
void *ret;
assert(!slab || sz_can_use_slab(size)); /* slab && large is illegal */
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena == NULL || arena_is_auto(arena));
if (!tsdn_null(tsdn) && tsd_reentrancy_level_get(tsdn_tsd(tsdn)) == 0) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
}
ret = arena_malloc(
tsdn, arena, size, ind, zero, slab, tcache, slow_path);
if (config_stats && is_internal && likely(ret != NULL)) {
arena_internal_add(iaalloc(tsdn, ret), isalloc(tsdn, ret));
}
return ret;
}
JEMALLOC_ALWAYS_INLINE void *
iallocztm(tsdn_t *tsdn, size_t size, szind_t ind, bool zero, tcache_t *tcache,
bool is_internal, arena_t *arena, bool slow_path) {
bool slab = sz_can_use_slab(size);
return iallocztm_explicit_slab(
tsdn, size, ind, zero, slab, tcache, is_internal, arena, slow_path);
}
JEMALLOC_ALWAYS_INLINE void *
ialloc(tsd_t *tsd, size_t size, szind_t ind, bool zero, bool slow_path) {
return iallocztm(tsd_tsdn(tsd), size, ind, zero, tcache_get(tsd), false,
NULL, slow_path);
}
JEMALLOC_ALWAYS_INLINE void *
ipallocztm_explicit_slab(tsdn_t *tsdn, size_t usize, size_t alignment,
bool zero, bool slab, tcache_t *tcache, bool is_internal, arena_t *arena) {
void *ret;
assert(!slab || sz_can_use_slab(usize)); /* slab && large is illegal */
assert(usize != 0);
assert(usize == sz_sa2u(usize, alignment));
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena == NULL || arena_is_auto(arena));
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
ret = arena_palloc(tsdn, arena, usize, alignment, zero, slab, tcache);
assert(ALIGNMENT_ADDR2BASE(ret, alignment) == ret);
if (config_stats && is_internal && likely(ret != NULL)) {
arena_internal_add(iaalloc(tsdn, ret), isalloc(tsdn, ret));
}
return ret;
}
JEMALLOC_ALWAYS_INLINE void *
ipallocztm(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
tcache_t *tcache, bool is_internal, arena_t *arena) {
return ipallocztm_explicit_slab(tsdn, usize, alignment, zero,
sz_can_use_slab(usize), tcache, is_internal, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloct(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
tcache_t *tcache, arena_t *arena) {
return ipallocztm(tsdn, usize, alignment, zero, tcache, false, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloct_explicit_slab(tsdn_t *tsdn, size_t usize, size_t alignment, bool zero,
bool slab, tcache_t *tcache, arena_t *arena) {
return ipallocztm_explicit_slab(
tsdn, usize, alignment, zero, slab, tcache, false, arena);
}
JEMALLOC_ALWAYS_INLINE void *
ipalloc(tsd_t *tsd, size_t usize, size_t alignment, bool zero) {
return ipallocztm(tsd_tsdn(tsd), usize, alignment, zero,
tcache_get(tsd), false, NULL);
}
JEMALLOC_ALWAYS_INLINE size_t
ivsalloc(tsdn_t *tsdn, const void *ptr) {
return arena_vsalloc(tsdn, ptr);
}
JEMALLOC_ALWAYS_INLINE void
idalloctm(tsdn_t *tsdn, void *ptr, tcache_t *tcache,
emap_alloc_ctx_t *alloc_ctx, bool is_internal, bool slow_path) {
assert(ptr != NULL);
assert(!is_internal || tcache == NULL);
assert(!is_internal || arena_is_auto(iaalloc(tsdn, ptr)));
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (config_stats && is_internal) {
arena_internal_sub(iaalloc(tsdn, ptr), isalloc(tsdn, ptr));
}
if (!is_internal && !tsdn_null(tsdn)
&& tsd_reentrancy_level_get(tsdn_tsd(tsdn)) != 0) {
assert(tcache == NULL);
}
arena_dalloc(tsdn, ptr, tcache, alloc_ctx, slow_path);
}
JEMALLOC_ALWAYS_INLINE void
idalloc(tsd_t *tsd, void *ptr) {
idalloctm(tsd_tsdn(tsd), ptr, tcache_get(tsd), NULL, false, true);
}
JEMALLOC_ALWAYS_INLINE void
isdalloct(tsdn_t *tsdn, void *ptr, size_t size, tcache_t *tcache,
emap_alloc_ctx_t *alloc_ctx, bool slow_path) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
arena_sdalloc(tsdn, ptr, size, tcache, alloc_ctx, slow_path);
}
JEMALLOC_ALWAYS_INLINE void *
iralloct_realign(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t alignment, bool zero, bool slab, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
void *p;
size_t usize, copysize;
usize = sz_sa2u(size, alignment);
if (unlikely(usize == 0 || usize > SC_LARGE_MAXCLASS)) {
return NULL;
}
p = ipalloct_explicit_slab(
tsdn, usize, alignment, zero, slab, tcache, arena);
if (p == NULL) {
return NULL;
}
/*
* Copy at most size bytes (not size+extra), since the caller has no
* expectation that the extra bytes will be reliably preserved.
*/
copysize = (size < oldsize) ? size : oldsize;
memcpy(p, ptr, copysize);
hook_invoke_alloc(
hook_args->is_realloc ? hook_alloc_realloc : hook_alloc_rallocx, p,
(uintptr_t)p, hook_args->args);
hook_invoke_dalloc(
hook_args->is_realloc ? hook_dalloc_realloc : hook_dalloc_rallocx,
ptr, hook_args->args);
isdalloct(tsdn, ptr, oldsize, tcache, NULL, true);
return p;
}
/*
* is_realloc threads through the knowledge of whether or not this call comes
* from je_realloc (as opposed to je_rallocx); this ensures that we pass the
* correct entry point into any hooks.
* Note that these functions are all force-inlined, so no actual bool gets
* passed-around anywhere.
*/
JEMALLOC_ALWAYS_INLINE void *
iralloct_explicit_slab(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size,
size_t alignment, bool zero, bool slab, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
assert(ptr != NULL);
assert(size != 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (alignment != 0
&& ((uintptr_t)ptr & ((uintptr_t)alignment - 1)) != 0) {
/*
* Existing object alignment is inadequate; allocate new space
* and copy.
*/
return iralloct_realign(tsdn, ptr, oldsize, size, alignment,
zero, slab, tcache, arena, hook_args);
}
return arena_ralloc(tsdn, arena, ptr, oldsize, size, alignment, zero,
slab, tcache, hook_args);
}
JEMALLOC_ALWAYS_INLINE void *
iralloct(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size, size_t alignment,
size_t usize, bool zero, tcache_t *tcache, arena_t *arena,
hook_ralloc_args_t *hook_args) {
bool slab = sz_can_use_slab(usize);
return iralloct_explicit_slab(tsdn, ptr, oldsize, size, alignment, zero,
slab, tcache, arena, hook_args);
}
JEMALLOC_ALWAYS_INLINE void *
iralloc(tsd_t *tsd, void *ptr, size_t oldsize, size_t size, size_t alignment,
size_t usize, bool zero, hook_ralloc_args_t *hook_args) {
return iralloct(tsd_tsdn(tsd), ptr, oldsize, size, alignment, usize,
zero, tcache_get(tsd), NULL, hook_args);
}
JEMALLOC_ALWAYS_INLINE bool
ixalloc(tsdn_t *tsdn, void *ptr, size_t oldsize, size_t size, size_t extra,
size_t alignment, bool zero, size_t *newsize) {
assert(ptr != NULL);
assert(size != 0);
witness_assert_depth_to_rank(
tsdn_witness_tsdp_get(tsdn), WITNESS_RANK_CORE, 0);
if (alignment != 0
&& ((uintptr_t)ptr & ((uintptr_t)alignment - 1)) != 0) {
/* Existing object alignment is inadequate. */
*newsize = oldsize;
return true;
}
return arena_ralloc_no_move(
tsdn, ptr, oldsize, size, extra, zero, newsize);
}
JEMALLOC_ALWAYS_INLINE void
fastpath_success_finish(
tsd_t *tsd, uint64_t allocated_after, cache_bin_t *bin, void *ret) {
thread_allocated_set(tsd, allocated_after);
if (config_stats) {
bin->tstats.nrequests++;
}
}
JEMALLOC_ALWAYS_INLINE bool
malloc_initialized(void) {
return (malloc_init_state == malloc_init_initialized);
}
/*
* malloc() fastpath. Included here so that we can inline it into operator new;
* function call overhead there is non-negligible as a fraction of total CPU in
* allocation-heavy C++ programs. We take the fallback alloc to allow malloc
* (which can return NULL) to differ in its behavior from operator new (which
* can't). It matches the signature of malloc / operator new so that we can
* tail-call the fallback allocator, allowing us to avoid setting up the call
* frame in the common case.
*
* Fastpath assumes size <= SC_LOOKUP_MAXCLASS, and that we hit
* tcache. If either of these is false, we tail-call to the slowpath,
* malloc_default(). Tail-calling is used to avoid any caller-saved
* registers.
*
* fastpath supports ticker and profiling, both of which will also
* tail-call to the slowpath if they fire.
*/
JEMALLOC_ALWAYS_INLINE void *
imalloc_fastpath(size_t size, void *(fallback_alloc)(size_t)) {
if (tsd_get_allocates() && unlikely(!malloc_initialized())) {
return fallback_alloc(size);
}
tsd_t *tsd = tsd_get(false);
if (unlikely((size > SC_LOOKUP_MAXCLASS) || tsd == NULL)) {
return fallback_alloc(size);
}
/*
* The code below till the branch checking the next_event threshold may
* execute before malloc_init(), in which case the threshold is 0 to
* trigger slow path and initialization.
*
* Note that when uninitialized, only the fast-path variants of the sz /
* tsd facilities may be called.
*/
szind_t ind;
/*
* The thread_allocated counter in tsd serves as a general purpose
* accumulator for bytes of allocation to trigger different types of
* events. usize is always needed to advance thread_allocated, though
* it's not always needed in the core allocation logic.
*/
size_t usize;
sz_size2index_usize_fastpath(size, &ind, &usize);
/* Fast path relies on size being a bin. */
assert(ind < SC_NBINS);
assert((SC_LOOKUP_MAXCLASS < SC_SMALL_MAXCLASS)
&& (size <= SC_SMALL_MAXCLASS));
uint64_t allocated, threshold;
te_malloc_fastpath_ctx(tsd, &allocated, &threshold);
uint64_t allocated_after = allocated + usize;
/*
* The ind and usize might be uninitialized (or partially) before
* malloc_init(). The assertions check for: 1) full correctness (usize
* & ind) when initialized; and 2) guaranteed slow-path (threshold == 0)
* when !initialized.
*/
if (!malloc_initialized()) {
assert(threshold == 0);
} else {
assert(ind == sz_size2index(size));
assert(usize > 0 && usize == sz_index2size(ind));
}
/*
* Check for events and tsd non-nominal (fast_threshold will be set to
* 0) in a single branch.
*/
if (unlikely(allocated_after >= threshold)) {
return fallback_alloc(size);
}
assert(tsd_fast(tsd));
tcache_t *tcache = tsd_tcachep_get(tsd);
assert(tcache == tcache_get(tsd));
cache_bin_t *bin = &tcache->bins[ind];
/* Suppress spurious warning from static analysis */
assert(bin != NULL);
bool tcache_success;
void *ret;
/*
* We split up the code this way so that redundant low-water
* computation doesn't happen on the (more common) case in which we
* don't touch the low water mark. The compiler won't do this
* duplication on its own.
*/
ret = cache_bin_alloc_easy(bin, &tcache_success);
if (tcache_success) {
fastpath_success_finish(tsd, allocated_after, bin, ret);
return ret;
}
ret = cache_bin_alloc(bin, &tcache_success);
if (tcache_success) {
fastpath_success_finish(tsd, allocated_after, bin, ret);
return ret;
}
return fallback_alloc(size);
}
JEMALLOC_ALWAYS_INLINE tcache_t *
tcache_get_from_ind(tsd_t *tsd, unsigned tcache_ind, bool slow, bool is_alloc) {
tcache_t *tcache;
if (tcache_ind == TCACHE_IND_AUTOMATIC) {
if (likely(!slow)) {
/* Getting tcache ptr unconditionally. */
tcache = tsd_tcachep_get(tsd);
assert(tcache == tcache_get(tsd));
} else if (is_alloc
|| likely(tsd_reentrancy_level_get(tsd) == 0)) {
tcache = tcache_get(tsd);
} else {
tcache = NULL;
}
} else {
/*
* Should not specify tcache on deallocation path when being
* reentrant.
*/
assert(is_alloc || tsd_reentrancy_level_get(tsd) == 0
|| tsd_state_nocleanup(tsd));
if (tcache_ind == TCACHE_IND_NONE) {
tcache = NULL;
} else {
tcache = tcaches_get(tsd, tcache_ind);
}
}
return tcache;
}
JEMALLOC_ALWAYS_INLINE bool
maybe_check_alloc_ctx(tsd_t *tsd, void *ptr, emap_alloc_ctx_t *alloc_ctx) {
if (config_opt_size_checks) {
emap_alloc_ctx_t dbg_ctx;
emap_alloc_ctx_lookup(
tsd_tsdn(tsd), &arena_emap_global, ptr, &dbg_ctx);
if (alloc_ctx->szind != dbg_ctx.szind) {
safety_check_fail_sized_dealloc(
/* current_dealloc */ true, ptr,
/* true_size */ emap_alloc_ctx_usize_get(&dbg_ctx),
/* input_size */
emap_alloc_ctx_usize_get(alloc_ctx));
return true;
}
if (alloc_ctx->slab != dbg_ctx.slab) {
safety_check_fail(
"Internal heap corruption detected: "
"mismatch in slab bit");
return true;
}
}
return false;
}
JEMALLOC_ALWAYS_INLINE bool
prof_sample_aligned(const void *ptr) {
return ((uintptr_t)ptr & PROF_SAMPLE_ALIGNMENT_MASK) == 0;
}
JEMALLOC_ALWAYS_INLINE bool
free_fastpath_nonfast_aligned(void *ptr, bool check_prof) {
/*
* free_fastpath do not handle two uncommon cases: 1) sampled profiled
* objects and 2) sampled junk & stash for use-after-free detection.
* Both have special alignments which are used to escape the fastpath.
*
* prof_sample is page-aligned, which covers the UAF check when both
* are enabled (the assertion below). Avoiding redundant checks since
* this is on the fastpath -- at most one runtime branch from this.
*/
if (config_debug && cache_bin_nonfast_aligned(ptr)) {
assert(prof_sample_aligned(ptr));
}
if (config_prof && check_prof) {
/* When prof is enabled, the prof_sample alignment is enough. */
if (prof_sample_aligned(ptr)) {
return true;
} else {
return false;
}
}
if (config_uaf_detection) {
if (cache_bin_nonfast_aligned(ptr)) {
return true;
} else {
return false;
}
}
return false;
}
/* Returns whether or not the free attempt was successful. */
JEMALLOC_ALWAYS_INLINE
bool
free_fastpath(void *ptr, size_t size, bool size_hint) {
tsd_t *tsd = tsd_get(false);
/* The branch gets optimized away unless tsd_get_allocates(). */
if (unlikely(tsd == NULL)) {
return false;
}
/*
* The tsd_fast() / initialized checks are folded into the branch
* testing (deallocated_after >= threshold) later in this function.
* The threshold will be set to 0 when !tsd_fast.
*/
assert(tsd_fast(tsd)
|| *tsd_thread_deallocated_next_event_fastp_get_unsafe(tsd) == 0);
emap_alloc_ctx_t alloc_ctx JEMALLOC_CC_SILENCE_INIT({0, 0, false});
size_t usize;
if (!size_hint) {
bool err = emap_alloc_ctx_try_lookup_fast(
tsd, &arena_emap_global, ptr, &alloc_ctx);
/* Note: profiled objects will have alloc_ctx.slab set */
if (unlikely(err || !alloc_ctx.slab
|| free_fastpath_nonfast_aligned(ptr,
/* check_prof */ false))) {
return false;
}
assert(alloc_ctx.szind != SC_NSIZES);
usize = sz_index2size(alloc_ctx.szind);
} else {
/*
* Check for both sizes that are too large, and for sampled /
* special aligned objects. The alignment check will also check
* for null ptr.
*/
if (unlikely(size > SC_LOOKUP_MAXCLASS
|| free_fastpath_nonfast_aligned(ptr,
/* check_prof */ true))) {
return false;
}
sz_size2index_usize_fastpath(size, &alloc_ctx.szind, &usize);
/* Max lookup class must be small. */
assert(alloc_ctx.szind < SC_NBINS);
/* This is a dead store, except when opt size checking is on. */
alloc_ctx.slab = true;
}
/*
* Currently the fastpath only handles small sizes. The branch on
* SC_LOOKUP_MAXCLASS makes sure of it. This lets us avoid checking
* tcache szind upper limit (i.e. tcache_max) as well.
*/
assert(alloc_ctx.slab);
uint64_t deallocated, threshold;
te_free_fastpath_ctx(tsd, &deallocated, &threshold);
uint64_t deallocated_after = deallocated + usize;
/*
* Check for events and tsd non-nominal (fast_threshold will be set to
* 0) in a single branch. Note that this handles the uninitialized case
* as well (TSD init will be triggered on the non-fastpath). Therefore
* anything depends on a functional TSD (e.g. the alloc_ctx sanity check
* below) needs to be after this branch.
*/
if (unlikely(deallocated_after >= threshold)) {
return false;
}
assert(tsd_fast(tsd));
bool fail = maybe_check_alloc_ctx(tsd, ptr, &alloc_ctx);
if (fail) {
/* See the comment in isfree. */
return true;
}
tcache_t *tcache = tcache_get_from_ind(tsd, TCACHE_IND_AUTOMATIC,
/* slow */ false, /* is_alloc */ false);
cache_bin_t *bin = &tcache->bins[alloc_ctx.szind];
/*
* If junking were enabled, this is where we would do it. It's not
* though, since we ensured above that we're on the fast path. Assert
* that to double-check.
*/
assert(!opt_junk_free);
if (!cache_bin_dalloc_easy(bin, ptr)) {
return false;
}
*tsd_thread_deallocatedp_get(tsd) = deallocated_after;
return true;
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_sdallocx_noflags(void *ptr, size_t size) {
if (!free_fastpath(ptr, size, true)) {
sdallocx_default(ptr, size, 0);
}
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_sdallocx_impl(void *ptr, size_t size, int flags) {
if (flags != 0 || !free_fastpath(ptr, size, true)) {
sdallocx_default(ptr, size, flags);
}
}
JEMALLOC_ALWAYS_INLINE void JEMALLOC_NOTHROW
je_free_impl(void *ptr) {
if (!free_fastpath(ptr, 0, false)) {
free_default(ptr);
}
}
#endif /* JEMALLOC_INTERNAL_INLINES_C_H */

View file

@ -1,57 +1,146 @@
/*
* JEMALLOC_ALWAYS_INLINE and JEMALLOC_INLINE are used within header files for
* functions that are static inline functions if inlining is enabled, and
* single-definition library-private functions if inlining is disabled.
*
* JEMALLOC_ALWAYS_INLINE_C and JEMALLOC_INLINE_C are for use in .c files, in
* which case the denoted functions are always static, regardless of whether
* inlining is enabled.
*/
#if defined(JEMALLOC_DEBUG) || defined(JEMALLOC_CODE_COVERAGE)
/* Disable inlining to make debugging/profiling easier. */
# define JEMALLOC_ALWAYS_INLINE
# define JEMALLOC_ALWAYS_INLINE_C static
# define JEMALLOC_INLINE
# define JEMALLOC_INLINE_C static
# define inline
#ifndef JEMALLOC_INTERNAL_MACROS_H
#define JEMALLOC_INTERNAL_MACROS_H
#ifdef JEMALLOC_DEBUG
# define JEMALLOC_ALWAYS_INLINE static inline
#else
# define JEMALLOC_ENABLE_INLINE
# ifdef JEMALLOC_HAVE_ATTR
# define JEMALLOC_ALWAYS_INLINE \
static inline JEMALLOC_ATTR(unused) JEMALLOC_ATTR(always_inline)
# define JEMALLOC_ALWAYS_INLINE_C \
static inline JEMALLOC_ATTR(always_inline)
# else
# define JEMALLOC_ALWAYS_INLINE static inline
# define JEMALLOC_ALWAYS_INLINE_C static inline
# endif
# define JEMALLOC_INLINE static inline
# define JEMALLOC_INLINE_C static inline
# ifdef _MSC_VER
# define inline _inline
# endif
# ifdef _MSC_VER
# define JEMALLOC_ALWAYS_INLINE static __forceinline
# else
# define JEMALLOC_ALWAYS_INLINE \
JEMALLOC_ATTR(always_inline) static inline
# endif
#endif
#ifdef _MSC_VER
# define inline _inline
#endif
#ifdef JEMALLOC_CC_SILENCE
# define UNUSED JEMALLOC_ATTR(unused)
#else
# define UNUSED
#endif
#define UNUSED JEMALLOC_ATTR(unused)
#define ZU(z) ((size_t)z)
#define ZI(z) ((ssize_t)z)
#define QU(q) ((uint64_t)q)
#define QI(q) ((int64_t)q)
#define ZU(z) ((size_t)z)
#define ZD(z) ((ssize_t)z)
#define QU(q) ((uint64_t)q)
#define QD(q) ((int64_t)q)
#define KZU(z) ZU(z##ULL)
#define KZI(z) ZI(z##LL)
#define KQU(q) QU(q##ULL)
#define KQI(q) QI(q##LL)
#define KZU(z) ZU(z##ULL)
#define KZD(z) ZD(z##LL)
#define KQU(q) QU(q##ULL)
#define KQD(q) QI(q##LL)
#ifndef __DECONST
# define __DECONST(type, var) ((type)(uintptr_t)(const void *)(var))
# define __DECONST(type, var) ((type)(uintptr_t)(const void *)(var))
#endif
#ifndef JEMALLOC_HAS_RESTRICT
# define restrict
#if !defined(JEMALLOC_HAS_RESTRICT) || defined(__cplusplus)
# define restrict
#endif
/* Various function pointers are static and immutable except during testing. */
#ifdef JEMALLOC_JET
# define JET_MUTABLE
# define JET_EXTERN extern
#else
# define JET_MUTABLE const
# define JET_EXTERN static
#endif
#define JEMALLOC_VA_ARGS_HEAD(head, ...) head
#define JEMALLOC_VA_ARGS_TAIL(head, ...) __VA_ARGS__
/* Diagnostic suppression macros */
#if defined(_MSC_VER) && !defined(__clang__)
# define JEMALLOC_DIAGNOSTIC_PUSH __pragma(warning(push))
# define JEMALLOC_DIAGNOSTIC_POP __pragma(warning(pop))
# define JEMALLOC_DIAGNOSTIC_IGNORE(W) __pragma(warning(disable : W))
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
/* #pragma GCC diagnostic first appeared in gcc 4.6. */
#elif (defined(__GNUC__) \
&& ((__GNUC__ > 4) || ((__GNUC__ == 4) && (__GNUC_MINOR__ > 5)))) \
|| defined(__clang__)
/*
* The JEMALLOC_PRAGMA__ macro is an implementation detail of the GCC and Clang
* diagnostic suppression macros and should not be used anywhere else.
*/
# define JEMALLOC_PRAGMA__(X) _Pragma(#X)
# define JEMALLOC_DIAGNOSTIC_PUSH JEMALLOC_PRAGMA__(GCC diagnostic push)
# define JEMALLOC_DIAGNOSTIC_POP JEMALLOC_PRAGMA__(GCC diagnostic pop)
# define JEMALLOC_DIAGNOSTIC_IGNORE(W) \
JEMALLOC_PRAGMA__(GCC diagnostic ignored W)
/*
* The -Wmissing-field-initializers warning is buggy in GCC versions < 5.1 and
* all clang versions up to version 7 (currently trunk, unreleased). This macro
* suppresses the warning for the affected compiler versions only.
*/
# if ((defined(__GNUC__) && !defined(__clang__)) && (__GNUC__ < 5)) \
|| defined(__clang__)
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS \
JEMALLOC_DIAGNOSTIC_IGNORE( \
"-Wmissing-field-initializers")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# endif
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wframe-address")
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wtype-limits")
# define JEMALLOC_DIAGNOSTIC_IGNORE_UNUSED_PARAMETER \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wunused-parameter")
# if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 7)
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN \
JEMALLOC_DIAGNOSTIC_IGNORE("-Walloc-size-larger-than=")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# endif
# ifdef JEMALLOC_HAVE_ATTR_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED \
JEMALLOC_DIAGNOSTIC_IGNORE("-Wdeprecated-declarations")
# else
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# endif
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS \
JEMALLOC_DIAGNOSTIC_PUSH \
JEMALLOC_DIAGNOSTIC_IGNORE_UNUSED_PARAMETER
#else
# define JEMALLOC_DIAGNOSTIC_PUSH
# define JEMALLOC_DIAGNOSTIC_POP
# define JEMALLOC_DIAGNOSTIC_IGNORE(W)
# define JEMALLOC_DIAGNOSTIC_IGNORE_MISSING_STRUCT_FIELD_INITIALIZERS
# define JEMALLOC_DIAGNOSTIC_IGNORE_FRAME_ADDRESS
# define JEMALLOC_DIAGNOSTIC_IGNORE_TYPE_LIMITS
# define JEMALLOC_DIAGNOSTIC_IGNORE_ALLOC_SIZE_LARGER_THAN
# define JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED
# define JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
#endif
#ifdef __clang_analyzer__
# define JEMALLOC_CLANG_ANALYZER
#endif
#ifdef JEMALLOC_CLANG_ANALYZER
# define JEMALLOC_CLANG_ANALYZER_SUPPRESS __attribute__((suppress))
# define JEMALLOC_CLANG_ANALYZER_SILENCE_INIT(v) = v
#else
# define JEMALLOC_CLANG_ANALYZER_SUPPRESS
# define JEMALLOC_CLANG_ANALYZER_SILENCE_INIT(v)
#endif
#define JEMALLOC_SUPPRESS_WARN_ON_USAGE(...) \
JEMALLOC_DIAGNOSTIC_PUSH \
JEMALLOC_DIAGNOSTIC_IGNORE_DEPRECATED \
__VA_ARGS__ \
JEMALLOC_DIAGNOSTIC_POP
/*
* Disables spurious diagnostics for all headers. Since these headers are not
* included by users directly, it does not affect their diagnostic settings.
*/
JEMALLOC_DIAGNOSTIC_DISABLE_SPURIOUS
#endif /* JEMALLOC_INTERNAL_MACROS_H */

View file

@ -0,0 +1,22 @@
#ifndef JEMALLOC_INTERNAL_OVERRIDES_H
#define JEMALLOC_INTERNAL_OVERRIDES_H
/*
* Under normal circumstances this header serves no purpose, as these settings
* can be customized via the corresponding autoconf options at configure-time.
* Overriding in this fashion is useful when the header files generated by
* autoconf are used as input for another build system.
*/
#ifdef JEMALLOC_OVERRIDE_LG_PAGE
# undef LG_PAGE
# define LG_PAGE JEMALLOC_OVERRIDE_LG_PAGE
#endif
#ifdef JEMALLOC_OVERRIDE_JEMALLOC_CONFIG_MALLOC_CONF
# undef JEMALLOC_CONFIG_MALLOC_CONF
# define JEMALLOC_CONFIG_MALLOC_CONF \
JEMALLOC_OVERRIDE_JEMALLOC_CONFIG_MALLOC_CONF
#endif
#endif /* JEMALLOC_INTERNAL_OVERRIDES_H */

View file

@ -0,0 +1,148 @@
#ifndef JEMALLOC_INTERNAL_TYPES_H
#define JEMALLOC_INTERNAL_TYPES_H
#include "jemalloc/internal/quantum.h"
/* Processor / core id type. */
typedef int malloc_cpuid_t;
/* When realloc(non-null-ptr, 0) is called, what happens? */
enum zero_realloc_action_e {
/* Realloc(ptr, 0) is free(ptr); return malloc(0); */
zero_realloc_action_alloc = 0,
/* Realloc(ptr, 0) is free(ptr); */
zero_realloc_action_free = 1,
/* Realloc(ptr, 0) aborts. */
zero_realloc_action_abort = 2
};
typedef enum zero_realloc_action_e zero_realloc_action_t;
/* Signature of write callback. */
typedef void(write_cb_t)(void *, const char *);
enum malloc_init_e {
malloc_init_uninitialized = 3,
malloc_init_a0_initialized = 2,
malloc_init_recursible = 1,
malloc_init_initialized = 0 /* Common case --> jnz. */
};
typedef enum malloc_init_e malloc_init_t;
/*
* Flags bits:
*
* a: arena
* t: tcache
* 0: unused
* z: zero
* n: alignment
*
* aaaaaaaa aaaatttt tttttttt 0znnnnnn
*/
#define MALLOCX_ARENA_BITS 12
#define MALLOCX_TCACHE_BITS 12
#define MALLOCX_LG_ALIGN_BITS 6
#define MALLOCX_ARENA_SHIFT 20
#define MALLOCX_TCACHE_SHIFT 8
#define MALLOCX_ARENA_MASK \
((unsigned)(((1U << MALLOCX_ARENA_BITS) - 1) << MALLOCX_ARENA_SHIFT))
/* NB: Arena index bias decreases the maximum number of arenas by 1. */
#define MALLOCX_ARENA_LIMIT ((unsigned)((1U << MALLOCX_ARENA_BITS) - 1))
#define MALLOCX_TCACHE_MASK \
((unsigned)(((1U << MALLOCX_TCACHE_BITS) - 1) << MALLOCX_TCACHE_SHIFT))
#define MALLOCX_TCACHE_MAX ((unsigned)((1U << MALLOCX_TCACHE_BITS) - 3))
#define MALLOCX_LG_ALIGN_MASK ((1 << MALLOCX_LG_ALIGN_BITS) - 1)
/* Use MALLOCX_ALIGN_GET() if alignment may not be specified in flags. */
#define MALLOCX_ALIGN_GET_SPECIFIED(flags) \
(ZU(1) << (flags & MALLOCX_LG_ALIGN_MASK))
#define MALLOCX_ALIGN_GET(flags) \
(MALLOCX_ALIGN_GET_SPECIFIED(flags) & (SIZE_T_MAX - 1))
#define MALLOCX_ZERO_GET(flags) ((bool)(flags & MALLOCX_ZERO))
#define MALLOCX_TCACHE_GET(flags) \
(((unsigned)((flags & MALLOCX_TCACHE_MASK) >> MALLOCX_TCACHE_SHIFT)) \
- 2)
#define MALLOCX_ARENA_GET(flags) \
(((unsigned)(((unsigned)flags) >> MALLOCX_ARENA_SHIFT)) - 1)
/* Smallest size class to support. */
#define TINY_MIN (1U << LG_TINY_MIN)
#define LONG ((size_t)(1U << LG_SIZEOF_LONG))
#define LONG_MASK (LONG - 1)
/* Return the smallest long multiple that is >= a. */
#define LONG_CEILING(a) (((a) + LONG_MASK) & ~LONG_MASK)
#define SIZEOF_PTR (1U << LG_SIZEOF_PTR)
#define PTR_MASK (SIZEOF_PTR - 1)
/* Return the smallest (void *) multiple that is >= a. */
#define PTR_CEILING(a) (((a) + PTR_MASK) & ~PTR_MASK)
/*
* Maximum size of L1 cache line. This is used to avoid cache line aliasing.
* In addition, this controls the spacing of cacheline-spaced size classes.
*
* CACHELINE cannot be based on LG_CACHELINE because __declspec(align()) can
* only handle raw constants.
*/
#define LG_CACHELINE 6
#define CACHELINE 64
#define CACHELINE_MASK (CACHELINE - 1)
/* Return the smallest cacheline multiple that is >= s. */
#define CACHELINE_CEILING(s) (((s) + CACHELINE_MASK) & ~CACHELINE_MASK)
/* Return the nearest aligned address at or below a. */
#define ALIGNMENT_ADDR2BASE(a, alignment) \
((void *)(((byte_t *)(a)) \
- (((uintptr_t)(a)) - ((uintptr_t)(a) & ((~(alignment)) + 1)))))
/* Return the offset between a and the nearest aligned address at or below a. */
#define ALIGNMENT_ADDR2OFFSET(a, alignment) \
((size_t)((uintptr_t)(a) & (alignment - 1)))
/* Return the smallest alignment multiple that is >= s. */
#define ALIGNMENT_CEILING(s, alignment) \
(((s) + (alignment - 1)) & ((~(alignment)) + 1))
/*
* Return the nearest aligned address at or above a.
*
* While at first glance this would appear to be merely a more complicated
* way to perform the same computation as `ALIGNMENT_CEILING`,
* this has the important additional property of not concealing pointer
* provenance from the compiler. See the block-comment on the
* definition of `byte_t` for more details.
*/
#define ALIGNMENT_ADDR2CEILING(a, alignment) \
((void *)(((byte_t *)(a)) \
+ (((((uintptr_t)(a)) + (alignment - 1)) & ((~(alignment)) + 1)) \
- ((uintptr_t)(a)))))
/* Declare a variable-length array. */
#if __STDC_VERSION__ < 199901L || defined(__STDC_NO_VLA__)
# ifdef _MSC_VER
# include <malloc.h>
# define alloca _alloca
# else
# ifdef JEMALLOC_HAS_ALLOCA_H
# include <alloca.h>
# else
# include <stdlib.h>
# endif
# endif
# define VARIABLE_ARRAY_UNSAFE(type, name, count) \
type *name = alloca(sizeof(type) * (count))
#else
# define VARIABLE_ARRAY_UNSAFE(type, name, count) type name[(count)]
#endif
#define VARIABLE_ARRAY_SIZE_MAX 2048
#define VARIABLE_ARRAY(type, name, count) \
assert(sizeof(type) * (count) <= VARIABLE_ARRAY_SIZE_MAX); \
VARIABLE_ARRAY_UNSAFE(type, name, count)
#define CALLOC_MADVISE_THRESHOLD_DEFAULT (((size_t)1) << 23) /* 8 MB */
#endif /* JEMALLOC_INTERNAL_TYPES_H */

View file

@ -0,0 +1,286 @@
#ifndef JEMALLOC_PREAMBLE_H
#define JEMALLOC_PREAMBLE_H
#include "jemalloc/internal/jemalloc_internal_defs.h"
#include "jemalloc/internal/jemalloc_internal_decls.h"
#if defined(JEMALLOC_UTRACE) || defined(JEMALLOC_UTRACE_LABEL)
#include <sys/ktrace.h>
# if defined(JEMALLOC_UTRACE)
# define UTRACE_CALL(p, l) utrace(p, l)
# else
# define UTRACE_CALL(p, l) utrace("jemalloc_process", p, l)
# define JEMALLOC_UTRACE
# endif
#endif
#define JEMALLOC_NO_DEMANGLE
#ifdef JEMALLOC_JET
# undef JEMALLOC_IS_MALLOC
# define JEMALLOC_N(n) jet_##n
# include "jemalloc/internal/public_namespace.h"
# define JEMALLOC_NO_RENAME
# include "../jemalloc@install_suffix@.h"
# undef JEMALLOC_NO_RENAME
#else
# define JEMALLOC_N(n) @private_namespace@##n
# include "../jemalloc@install_suffix@.h"
#endif
#if defined(JEMALLOC_OSATOMIC)
#include <libkern/OSAtomic.h>
#endif
#ifdef JEMALLOC_ZONE
#include <mach/mach_error.h>
#include <mach/mach_init.h>
#include <mach/vm_map.h>
#endif
#include "jemalloc/internal/jemalloc_internal_macros.h"
/*
* Note that the ordering matters here; the hook itself is name-mangled. We
* want the inclusion of hooks to happen early, so that we hook as much as
* possible.
*/
#ifndef JEMALLOC_NO_PRIVATE_NAMESPACE
# ifndef JEMALLOC_JET
# include "jemalloc/internal/private_namespace.h"
# else
# include "jemalloc/internal/private_namespace_jet.h"
# endif
#endif
#include "jemalloc/internal/test_hooks.h"
#ifdef JEMALLOC_DEFINE_MADVISE_FREE
# define JEMALLOC_MADV_FREE 8
#endif
/*
* Can be defined at compile time, in cases, when it is known
* madvise(..., MADV_COLLAPSE) feature is supported, but MADV_COLLAPSE
* constant is not defined.
*/
#ifdef JEMALLOC_DEFINE_MADVISE_COLLAPSE
# define JEMALLOC_MADV_COLLAPSE 25
#endif
static const bool config_debug =
#ifdef JEMALLOC_DEBUG
true
#else
false
#endif
;
static const bool have_dss =
#ifdef JEMALLOC_DSS
true
#else
false
#endif
;
static const bool have_madvise_huge =
#ifdef JEMALLOC_HAVE_MADVISE_HUGE
true
#else
false
#endif
;
static const bool have_process_madvise =
#ifdef JEMALLOC_HAVE_PROCESS_MADVISE
true
#else
false
#endif
;
static const bool config_fill =
#ifdef JEMALLOC_FILL
true
#else
false
#endif
;
static const bool config_lazy_lock =
#ifdef JEMALLOC_LAZY_LOCK
true
#else
false
#endif
;
static const char * const config_malloc_conf = JEMALLOC_CONFIG_MALLOC_CONF;
static const bool config_prof =
#ifdef JEMALLOC_PROF
true
#else
false
#endif
;
static const bool config_prof_libgcc =
#ifdef JEMALLOC_PROF_LIBGCC
true
#else
false
#endif
;
static const bool config_prof_libunwind =
#ifdef JEMALLOC_PROF_LIBUNWIND
true
#else
false
#endif
;
static const bool config_prof_frameptr =
#ifdef JEMALLOC_PROF_FRAME_POINTER
true
#else
false
#endif
;
static const bool maps_coalesce =
#ifdef JEMALLOC_MAPS_COALESCE
true
#else
false
#endif
;
static const bool config_stats =
#ifdef JEMALLOC_STATS
true
#else
false
#endif
;
static const bool config_tls =
#ifdef JEMALLOC_TLS
true
#else
false
#endif
;
static const bool config_utrace =
#ifdef JEMALLOC_UTRACE
true
#else
false
#endif
;
static const bool config_xmalloc =
#ifdef JEMALLOC_XMALLOC
true
#else
false
#endif
;
static const bool config_cache_oblivious =
#ifdef JEMALLOC_CACHE_OBLIVIOUS
true
#else
false
#endif
;
/*
* Undocumented, for jemalloc development use only at the moment. See the note
* in jemalloc/internal/log.h.
*/
static const bool config_log =
#ifdef JEMALLOC_LOG
true
#else
false
#endif
;
/*
* Are extra safety checks enabled; things like checking the size of sized
* deallocations, double-frees, etc.
*/
static const bool config_opt_safety_checks =
#ifdef JEMALLOC_OPT_SAFETY_CHECKS
true
#elif defined(JEMALLOC_DEBUG)
/*
* This lets us only guard safety checks by one flag instead of two; fast
* checks can guard solely by config_opt_safety_checks and run in debug mode
* too.
*/
true
#else
false
#endif
;
/*
* Extra debugging of sized deallocations too onerous to be included in the
* general safety checks.
*/
static const bool config_opt_size_checks =
#if defined(JEMALLOC_OPT_SIZE_CHECKS) || defined(JEMALLOC_DEBUG)
true
#else
false
#endif
;
static const bool config_uaf_detection =
#if defined(JEMALLOC_UAF_DETECTION) || defined(JEMALLOC_DEBUG)
true
#else
false
#endif
;
/* Whether or not the C++ extensions are enabled. */
static const bool config_enable_cxx =
#ifdef JEMALLOC_ENABLE_CXX
true
#else
false
#endif
;
#if defined(_WIN32) || defined(__APPLE__) || defined(JEMALLOC_HAVE_SCHED_GETCPU)
/* Currently percpu_arena depends on sched_getcpu. */
#define JEMALLOC_PERCPU_ARENA
#endif
static const bool have_percpu_arena =
#ifdef JEMALLOC_PERCPU_ARENA
true
#else
false
#endif
;
/*
* Undocumented, and not recommended; the application should take full
* responsibility for tracking provenance.
*/
static const bool force_ivsalloc =
#ifdef JEMALLOC_FORCE_IVSALLOC
true
#else
false
#endif
;
static const bool have_background_thread =
#ifdef JEMALLOC_BACKGROUND_THREAD
true
#else
false
#endif
;
static const bool config_high_res_timer =
#ifdef JEMALLOC_HAVE_CLOCK_REALTIME
true
#else
false
#endif
;
static const bool have_memcntl =
#ifdef JEMALLOC_HAVE_MEMCNTL
true
#else
false
#endif
;
#endif /* JEMALLOC_PREAMBLE_H */

View file

@ -0,0 +1,49 @@
#ifndef JEMALLOC_INTERNAL_JEMALLOC_PROBE_H
#define JEMALLOC_INTERNAL_JEMALLOC_PROBE_H
#include <jemalloc/internal/jemalloc_preamble.h>
#ifdef JEMALLOC_EXPERIMENTAL_USDT_STAP
#include <jemalloc/internal/jemalloc_probe_stap.h>
#elif defined(JEMALLOC_EXPERIMENTAL_USDT_CUSTOM)
#include <jemalloc/internal/jemalloc_probe_custom.h>
#elif defined(_MSC_VER)
#define JE_USDT(name, N, ...) /* Nothing */
#else /* no USDT, just check the args */
#define JE_USDT(name, N, ...) _JE_USDT_CHECK_ARG##N(__VA_ARGS__)
#define _JE_USDT_CHECK_ARG1(a) \
do { \
(void)(a); \
} while (0)
#define _JE_USDT_CHECK_ARG2(a, b) \
do { \
(void)(a); \
(void)(b); \
} while (0)
#define _JE_USDT_CHECK_ARG3(a, b, c) \
do { \
(void)(a); \
(void)(b); \
(void)(c); \
} while (0)
#define _JE_USDT_CHECK_ARG4(a, b, c, d) \
do { \
(void)(a); \
(void)(b); \
(void)(c); \
(void)(d); \
} while (0)
#define _JE_USDT_CHECK_ARG5(a, b, c, d, e) \
do { \
(void)(a); \
(void)(b); \
(void)(c); \
(void)(d); \
(void)(e); \
} while (0)
#endif /* JEMALLOC_EXPERIMENTAL_USDT_* */
#endif /* JEMALLOC_INTERNAL_JEMALLOC_PROBE_H */

Some files were not shown because too many files have changed in this diff Show more