Commit graph

2013 commits

Author SHA1 Message Date
guangli-dai
fbca96c433 Remove unnecessary parameters for cache_bin_postincrement. 2023-09-06 10:47:14 -07:00
Qi Wang
7d563a8f81 Update safety check message to remove --enable-debug when it's already on. 2023-09-05 14:15:45 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
Kevin Svetlitski
3aae792b10 Fix infinite purging loop in HPA
As reported in #2449, under certain circumstances it's possible to get
stuck in an infinite loop attempting to purge from the HPA. We now
handle this by validating the HPA settings at the end of
configuration parsing and either normalizing them or aborting depending on
if `abort_conf` is set.
2023-08-08 14:36:19 -07:00
Kevin Svetlitski
424dd61d57 Issue a warning upon directly accessing an arena's bins
An arena's bins should normally be accessed via the `arena_get_bin`
function, which properly takes into account bin-shards. To ensure that
we don't accidentally commit code which incorrectly accesses the bins
directly, we mark the field with `__attribute__((deprecated))` with an
appropriate warning message, and suppress the warning in the few places
where directly accessing the bins is allowed.
2023-08-04 15:47:05 -07:00
Kevin Svetlitski
07a2eab3ed Stop over-reporting memory usage from sampled small allocations
@interwq noticed [while reviewing an earlier PR](https://github.com/jemalloc/jemalloc/pull/2478#discussion_r1256217261)
that I missed modifying this statistics accounting in line with the rest
of the changes from #2459. This is now fixed, such that sampled small
allocations increment the `.nmalloc`/`.ndalloc` of their effective bin
size instead of over-reporting memory usage by attributing all such
allocations to `SC_LARGE_MINCLASS`.
2023-08-03 16:12:22 -07:00
Qi Wang
6816b23862 Include the unrecognized malloc conf option in the error message.
Previously the option causing trouble will not be printed, unless the option
key:value pair format is found.
2023-08-02 10:44:55 -07:00
Kevin Svetlitski
62648c88e5 Ensured sampled allocations are properly deallocated during arena_reset
Sampled allocations were not being demoted before being deallocated
during an `arena_reset` operation.
2023-08-01 11:35:37 -07:00
Kevin Svetlitski
9ba1e1cb37 Make ctl_arena_clear slightly more efficient
While this function isn't particularly hot, (accounting for just 0.27% of
time spent inside the allocator on average across the fleet), looking
at the generated assembly and performance profiles does show we're dispatching
to multiple different `memset`s when we could instead be just tail-calling
`memset` once, reducing code size and marginally improving performance.
2023-07-31 14:44:04 -07:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
1431153695 Define SBRK_INVALID instead of using a magic number 2023-07-24 14:40:42 -07:00
Kevin Svetlitski
7e54dd1ddb Define PROF_TCTX_SENTINEL instead of using magic numbers
This makes the code more readable on its own, and also sets the stage
for more cleanly handling the pointer provenance lints in a following
commit.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
c49c17f128 Suppress verbose frame address warnings
These warnings are not useful, and make the output of some CI jobs
enormous and difficult to read, so let's suppress them.
2023-07-24 10:44:17 -07:00
Kevin Svetlitski
cdb2c0e02f Implement C23's free_sized and free_aligned_sized
[N2699 - Sized Memory Deallocation](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2699.htm)
introduced two new functions which were incorporated into the C23
standard, `free_sized` and `free_aligned_sized`. Both already have
analogues in Jemalloc, all we are doing here is adding the appropriate
wrappers.
2023-07-20 15:06:41 -07:00
Kevin Svetlitski
589c63b424 Make eligible global variables static and/or const
For better or worse, Jemalloc has a significant number of global
variables. Making all eligible global variables `static` and/or `const`
at least makes it slightly easier to reason about them, as these
qualifications communicate to the programmer restrictions on their use
without having to `grep` the whole codebase.
2023-07-06 14:15:12 -07:00
Qi Wang
602edd7566 Enabled -Wstrict-prototypes and fixed warnings. 2023-07-06 12:00:02 -07:00
Kevin Svetlitski
5a858c64d6 Reduce the memory overhead of sampled small allocations
Previously, small allocations which were sampled as part of heap
profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory
usage becomes problematic when the page size is increased, as noted in #2358.

Small allocations are now rounded up to the nearest multiple of `PAGE`
instead, reducing the memory overhead by a factor of 4 in the most
extreme cases.
2023-07-03 16:19:06 -07:00
Qi Wang
d131331310 Avoid eager purging on the dedicated oversize arena when using bg thds.
We have observed new workload patterns (namely ML training type) that cycle
through oversized allocations frequently, because 1) the dataset might be sparse
which is faster to go through, and 2) GPU accelerated.  As a result, the eager
purging from the oversize arena becomes a bottleneck.  To offer an easy
solution, allow normal purging of the oversized extents when background threads
are enabled.
2023-06-27 11:57:41 -07:00
Kevin Svetlitski
bb0333e745 Fix remaining static analysis warnings
Fix or suppress the remaining warnings generated by static analysis.
This is a necessary step before we can incorporate static analysis into
CI. Where possible, I've preferred to modify the code itself instead of
just disabling the warning with a magic comment, so that if we decide to
use different static analysis tools in the future we will be covered
against them raising similar warnings.
2023-06-23 11:50:29 -07:00
Qi Wang
86eb49b478 Fix the arena selection for oversized allocations.
Use the per-arena oversize_threshold, instead of the global setting.
2023-06-06 15:03:13 -07:00
Christos Zoulas
5832ef6589 Use a local variable to set the alignment for this particular allocation
instead of changing mmap_flags which makes the change permanent. This was
enforcing large alignments for allocations that did not need it causing
fragmentation. Reported by Andreas Gustafsson.
2023-05-31 14:44:24 -07:00
Kevin Svetlitski
6d4aa33753 Extract the calculation of psset heap assignment for an hpdata into a common function
This is in preparation for upcoming changes I plan to make to this
logic. Extracting it into a common function will make this easier and
less error-prone, and cleans up the existing code regardless.
2023-05-31 11:44:04 -07:00
Arne Welzel
d59e30cbc9 Rename fallback_impl to fallbackNewImpl and prune in jeprof
The existing fallback_impl name seemed a bit generic and given
it's static probably okay to rename.

Closes #2451
2023-05-31 11:41:09 -07:00
Kevin Svetlitski
9c32689e57 Fix bug where hpa_shard was not being destroyed
It appears that this was a simple mistake where `hpa_shard_disable` was
being called instead of `hpa_shard_destroy`. At present
`hpa_shard_destroy` is not called anywhere at all outside of test-cases,
which further suggests that this is a bug. @davidtgoldblatt noted
however that since HPA is disabled for manual arenas and we don't
support destruction for auto arenas that presently there is no way to
actually trigger this bug. Nonetheless, it should be fixed.
2023-05-18 14:17:38 -07:00
Kevin Svetlitski
3e2ba7a651 Remove dead stores detected by static analysis
None of these are harmful, and they are almost certainly optimized
away by the compiler. The motivation for fixing them anyway is that
we'd like to enable static analysis as part of CI, and the first step
towards that is resolving the warnings it produces at present.
2023-05-11 20:27:49 -07:00
Kevin Svetlitski
0288126d9c Fix possible NULL pointer dereference from mallctl("prof.prefix", ...)
Static analysis flagged this issue. Here is a minimal program which
causes a segfault within Jemalloc:
```
#include <jemalloc/jemalloc.h>

const char *malloc_conf = "prof:true";

int main() {
  mallctl("prof.prefix", NULL, NULL, NULL, 0);
}
```

Fixed by checking if `prefix` is `NULL`.
2023-05-11 14:47:50 -07:00
Qi Wang
94ace05832 Fix the prof thread_name reference in prof_recent dump.
As pointed out in #2434, the thread_name in prof_tdata_t was changed in #2407.
This also requires an update for the prof_recent dump, specifically the emitter
expects a "char **" which is fixed in this commit.
2023-05-11 09:10:57 -07:00
Qi Wang
6ea8a7e928 Add config detection for JEMALLOC_HAVE_PTHREAD_SET_NAME_NP.
and use it on the background thread name setting.
2023-05-11 09:10:57 -07:00
auxten
5bac384970 If ptr present check if alloc_ctx.edata == NULL 2023-05-10 17:18:22 -07:00
auxten
019cccc293 Make arenas_lookup_ctl triable 2023-05-10 17:18:22 -07:00
Kevin Svetlitski
dc0a184f8d Fix possible NULL pointer dereference in VERIFY_READ
Static analysis flagged this. Fixed by simply checking `oldlenp`
before dereferencing it.
2023-05-09 10:57:09 -07:00
Kevin Svetlitski
12311fe6c3 Fix segfault in extent_try_coalesce_impl
Static analysis flagged this. `extent_record` was passing `NULL` as the
value for `coalesced` to `extent_try_coalesce`, which in turn passes
that argument to `extent_try_coalesce_impl`, where it is written to
without checking if it is `NULL`. I can confirm from reviewing the
fleetwide coredump data that this was in fact being hit in production.
2023-05-09 10:55:44 -07:00
Kevin Svetlitski
70344a2d38 Make eligible functions static
The codebase is already very disciplined in making any function which
can be `static`, but there are a few that appear to have slipped through
the cracks.
2023-05-08 15:00:02 -07:00
Kevin Svetlitski
fc680128e0 Remove errant assert in arena_extent_alloc_large
This codepath may generate deferred work when the HPA is enabled.
See also [@davidtgoldblatt's relevant comment on the PR which
introduced this](https://github.com/jemalloc/jemalloc/pull/2107#discussion_r699770967)
which prevented a similarly incorrect `assert` from being added elsewhere.
2023-05-01 10:00:30 -07:00
Eric Mueller
521970fb2e Check for equality instead of assigning in asserts in hpa_from_pai.
It appears like a simple typo means we're unconditionally overwriting
some fields in hpa_from_pai when asserts are enabled. From hpa_shard_init,
it looks like these fields have these values anyway, so this shouldn't
cause bugs, but if something is wrong it seems better to have these
asserts in place.

See issue #2412.
2023-04-17 20:57:48 -07:00
Qi Wang
ce0b7ab6c8 Inline the storage for thread name in prof_tdata_t.
The previous approach managed the thread name in a separate buffer, which causes
races because the thread name update (triggered by new samples) can happen at
the same time as prof dumping (which reads the thread names) -- these two
operations are under separate locks to avoid blocking each other.  Implemented
the thread name storage as part of the tdata struct, which resolves the lifetime
issue and also avoids internal alloc / dalloc during prof_sample.
2023-04-05 10:03:12 -07:00
Amaury Séchet
f743690739 Remove unused mutex from hpa_central 2023-03-10 11:25:47 -08:00
Qi Wang
c7805f1eb5 Add a header in HPA stats for the nonfull slabs. 2023-02-17 13:31:27 -08:00
Qi Wang
b6125120ac Add an explicit name to the dedicated oversize arena. 2023-02-17 13:31:09 -08:00
Qi Wang
5fd55837bb Fix thread_name updating for heap profiling.
The current thread name reading path updates the name every time, which requires
both alloc and dalloc -- and the temporary NULL value in the middle causes races
where the prof dump read path gets NULLed in the middle.

Minimize the changes in this commit to isolate the bugfix testing; will also
refactor the whole thread name paths later.
2023-02-15 17:49:40 -08:00
Qi Wang
8580c65f81 Implement prof sample hooks "experimental.hooks.prof_sample(_free)".
The added hooks hooks.prof_sample and hooks.prof_sample_free are intended to
allow advanced users to track additional information, to enable new ways of
profiling on top of the jemalloc heap profile and sample features.

The sample hook is invoked after the allocation and backtracing, and forwards
the both the allocation and backtrace to the user hook; the sample_free hook
happens before the actual deallocation, and forwards only the ptr and usz to the
hook.
2022-12-07 16:06:49 -08:00
Guangli Dai
e8f9f13811 Inline free and sdallocx into operator delete 2022-11-21 11:14:05 -08:00
Qi Wang
481bbfc990 Add a configure option --enable-force-getenv.
Allows the use of getenv() rather than secure_getenv() to read MALLOC_CONF.
This helps in situations where hosts are under full control, and setting
MALLOC_CONF is needed while also setuid.  Disabled by default.
2022-11-04 13:37:14 -07:00
Qi Wang
143e9c4a2f Enable fast thread locals for dealloc-only threads.
Previously if a thread does only allocations, it stays on the slow path /
minimal initialized state forever.  However, dealloc-only is a valid pattern for
dedicated reclamation threads -- this means thread cache is disabled (no batched
flush) for them, which causes high overhead and contention.

Added the condition to fully initialize TSD when a fair amount of dealloc
activities are observed.
2022-10-25 09:54:38 -07:00
David Carlier
4c95c953e2 fix build for non linux/BSD platforms. 2022-10-03 10:42:09 -07:00
Guangli Dai
ba19d2cb78 Add arena-level name.
An arena-level name can help identify manual arenas.
2022-09-16 15:04:59 -07:00
Guangli Dai
a0734fd6ee Making jemalloc max stack depth a runtime option 2022-09-12 13:56:22 -07:00
Abael He
56ddbea270 error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99
./autogen.sh \
&& ./configure --prefix=/usr/local  --enable-static   --enable-autogen --enable-xmalloc --with-static-libunwind=/usr/local/lib/libunwind.a --enable-lazy-lock --with-jemalloc-prefix='' \
&& make -j16

...
gcc -std=gnu11 -Werror=unknown-warning-option -Wall -Wextra -Wshorten-64-to-32 -Wsign-compare -Wundef -Wno-format-zero-length -Wpointer-arith -Wno-missing-braces -Wno-missing-field-initializers -pipe -g3 -Wimplicit-fallthrough -O3 -funroll-loops -fPIC -DPIC -c -D_REENTRANT -Iinclude -Iinclude -DJEMALLOC_NO_PRIVATE_NAMESPACE -o src/edata_cache.sym.o src/edata_cache.c
src/background_thread.c:768:6: error: implicit declaration of function 'pthread_create_fptr_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
            pthread_create_fptr_init()) {
            ^
src/background_thread.c:768:6: note: did you mean 'pthread_create_wrapper_init'?
src/background_thread.c:34:1: note: 'pthread_create_wrapper_init' declared here
pthread_create_wrapper_init(void) {
^
1 error generated.
make: *** [src/background_thread.sym.o] Error 1
make: *** Waiting for unfinished jobs....
2022-09-07 11:56:41 -07:00
Guangli Dai
ce29b4c3d9 Refactor the remote / cross thread cache bin stats reading
Refactored cache_bin.h so that only one function is racy.
2022-09-06 19:41:19 -07:00
Ivan Zaitsev
36366f3c4c Add double free detection in thread cache for debug build
Add new runtime option `debug_double_free_max_scan` that specifies the max
number of stack entries to scan in the cache bit when trying to detect the
double free bug (currently debug build only).
2022-08-04 16:58:22 -07:00