Converted: jemalloc_init, prof_log, base, pac, malloc_io, prof,
background_thread, pages.
No latent hermeticity bugs in headers this batch. All fixes are
explicit includes of headers (arena.h, background_thread.h,
jemalloc_internal_externs.h, etc.) that the umbrella was supplying
transitively.
Step 6 (Option B) of the cyclical-dep cleanup, batch 6 of N.
Converted: thread_event, emap, sec, eset, tsd, psset, zone, hpdata,
ckh, prof_recent.
No latent hermeticity bugs in headers this batch -- just .c files
that needed previously-transitive includes added (most commonly
arena.h, the various jemalloc_internal_inlines_*, mutex.h, tsd.h,
witness.h, and prof.h).
Step 6 (Option B) of the cyclical-dep cleanup, batch 5 of N.
Converted: prof_stack_range, jemalloc_fork, san, pa_extra, mutex,
thread_event_registry, rtree, ehooks, pa, extent_dss, decay, large,
nstime, bin, arenas_management.
One latent hermeticity bug surfaced: prof_sys.h declares
`void bt_init(prof_bt_t *bt, void **vec);` but didn't include prof.h
where prof_bt_t is defined. Added the include.
Step 6 (Option B) of the cyclical-dep cleanup, batch 4 of N.
Converted: hpa_hooks, san_bump, sz, cache_bin, bitmap, hpa_central,
witness, fxp, buf_writer, edata_cache.
No latent hermeticity bugs in headers this batch -- a few .c files
just needed previously-transitive includes added (e.g. hpa_central.c
needed hpa.h for hpa_supported()).
Step 6 (Option B) of the cyclical-dep cleanup, batch 3 of N.
Converted: hpa_utils, ecache, extent_mmap, util, safety_check,
prof_stats, peak_event, inspect, log.
(src/div.c was already minimal; skipped.)
One latent hermeticity bug surfaced: peak_event.h declared
`extern te_base_cb_t peak_te_handler;` but didn't include
thread_event_registry.h where te_base_cb_t is typedef'd. Added the
include to peak_event.h. peak_event.c also needs thread_event.h
directly for TE_MIN_START_WAIT.
Step 6 (Option B) of the cyclical-dep cleanup, batch 2 of N.
Replaces #include "jemalloc_internal_includes.h" with explicit
per-symbol includes in five small TUs:
src/edata.c -> edata.h
src/exp_grow.c -> exp_grow.h
src/ticker.c -> ticker.h
src/bin_info.c -> assert.h, bin_info.h
src/counter.c -> counter.h, witness.h
One latent hermeticity bug surfaced: sz.h's
sz_large_size_classes_disabled() inline references
opt_disable_large_size_classes (declared in
jemalloc_internal_externs.h) but sz.h didn't include that header.
Worked under the umbrella but breaks once consumers stop including
everything. Added the include to sz.h so it stands on its own.
Step 6 (Option B) of the cyclical-dep cleanup, batch 1 of N.
arena_types.h + arena_structs.h + arena_externs.h merged into arena.h,
keeping the three logical sections (TYPES / STRUCTS / EXTERNS) with
explicit dividers. arena_inlines_a.h and arena_inlines_b.h stay
separate; arena_inlines_b.h now carries a comment explaining why
merging the two would reintroduce a real #include cycle through
tcache_inlines.h -> arena_choose (the asymmetric cycle-breaker).
Two ordering gotchas this consolidation surfaced:
1. tsd_internals.h is included from tsd.h via tsd_generic.h, sometimes
long before arena.h is loaded (e.g. ckh.c includes ckh.h -> tsd.h
before jemalloc_internal_includes.h). TSD_INITIALIZER's expansion
in tsd_generic.h's function bodies references
ARENA_DECAY_NTICKS_PER_UPDATE, so it must already be defined.
Factor the constant into a new minimal header,
arena_decay_constants.h, that pulls nothing but jemalloc_preamble.h,
and include it from both arena.h and tsd_internals.h. arena_t is
still added as a forward decl in tsd_internals.h -- including
arena.h there would trigger arena_stats.h -> mutex.h -> tsd.h ->
re-entry into this very file.
2. extent_dss.h previously included arena_types.h for the arena_t
pointer type, but arena.h now includes extent_dss.h (it was a
STRUCTS-section dep). Forward-decl arena_t in extent_dss.h to
break that cycle.
Additional forward decls in tcache.h and large.h (arena_t *). These
were previously satisfied by the master include order loading
arena_types.h before everything else; with arena.h now in the EXTERNS
section, large.h and tcache.h are parsed earlier than arena.h, so
they need to declare arena_t themselves.
jemalloc_internal_externs.h's #include of arena_types.h was
vestigial -- the file uses no arena symbols. Dropped.
Each of these components had a four-way split (_types, _structs,
_externs, _inlines) that dates back to the old "include each section
multiple times from a master file" pattern. With Step 2's edata <->
prof_types decoupling, merging _types + _structs + _externs into one
header per component no longer risks recreating an include cycle.
- prof.h replaces prof_types.h + prof_structs.h + prof_externs.h.
- tcache.h replaces tcache_types.h + tcache_structs.h + tcache_externs.h.
prof_inlines.h and tcache_inlines.h are kept separate: prof_inlines.h
sits at the bottom of the dependency layering, and tcache_inlines.h's
include of arena_externs.h is the asymmetric cycle-breaker that keeps
the arena<->tcache symbol cycle from becoming an include cycle.
Two surprises required adjustments beyond a straight concatenation:
1. te_prof_sample_event_lookahead was a JEMALLOC_ALWAYS_INLINE function
defined in prof_externs.h, but its body calls tsd_thread_allocated_*
accessors that only exist after tsd inlines are loaded. The original
layering hid this because prof_externs.h was only included near the
bottom of jemalloc_internal_includes.h. After consolidation,
tsd_internals.h's includes pull prof.h in earlier, exposing the
ordering dependency. Moved the inline to prof_inlines.h (where
inline definitions belong anyway) and left only the related extern
in prof.h.
2. base.h was included from prof_externs.h and tcache_externs.h purely
for base_t * pointer arguments on a couple of declarations. Carrying
that include into the merged prof.h / tcache.h would pull ehooks.h
(-> tsd.h) into tsd_internals.h before tsd_internals.h finishes
declaring its tsd accessors. Replaced with a forward declaration of
base_t in each merged file.
Similarly, tsd_internals.h's prior #include of prof_types.h becomes a
forward decl of prof_tdata_t (the only prof symbol it references, and
only as a pointer), and large.h needs a forward decl of prof_info_t
because large.h is loaded before prof.h in the new master ordering.
No inline / static qualifiers are dropped; only the one inline moves
files. #ifdef blocks (JEMALLOC_PROF, JEMALLOC_PROF_LIBGCC,
JEMALLOC_PROF_GCC, JEMALLOC_DEBUG) are kept intact.
Folds several historical *_types/_structs/_externs/_inlines splits where
the layering is no longer load-bearing.
- large_externs.h -> large.h: renamed; it was a single-purpose
function-prototype file.
- background_thread_structs.h + background_thread_externs.h ->
background_thread.h: merged. background_thread_inlines.h is kept
separate because it depends on arena_inlines_a.h.
- bin_inlines.h folded into bin.h, along with BIN_SHARDS_MAX /
N_BIN_SHARDS_DEFAULT from bin_types.h. bin.h carries a forward decl
of arena_binind_div_info (declared in arena_externs.h) so it stays
hermetic without re-introducing the bin.h <-> arena_externs.h cycle.
- tsd_binshards.h (new): houses tsd_binshards_t and its zero
initializer. Keeping these out of bin.h lets tsd_internals.h pull in
just what it needs during X-macro expansion, avoiding bin.h's mutex.h
dependency (mutex.h itself depends on TSD machinery, so routing it
through tsd_internals.h forms a chicken-and-egg).
jemalloc_internal_includes.h: drops the now-redundant references to
the deleted/merged headers.
edata.h only uses prof_tctx_t and prof_recent_t as opaque pointer types
(in two getters, two setters, and two struct fields), so forward
declarations are sufficient. Drop the #include of prof_types.h and
declare the two typedefs locally.
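A minimal sketch of the opaque-pointer pattern described above. The
struct layout, the struct tags, and the sketch_* accessor names are
illustrative assumptions, not the contents of edata.h; the point is only
that pointers to incomplete types compile without the defining header.

```c
#include <stddef.h>

/* Forward declarations: both types stay incomplete in this file. */
typedef struct prof_tctx_s prof_tctx_t;
typedef struct prof_recent_s prof_recent_t;

/* Hypothetical stand-in holding only the two pointer fields. */
typedef struct sketch_edata_s {
	prof_tctx_t *prof_tctx;
	prof_recent_t *prof_recent;
} sketch_edata_t;

/* Getters and setters only copy pointers, so no definition is needed. */
static inline prof_tctx_t *
sketch_edata_prof_tctx_get(const sketch_edata_t *edata) {
	return edata->prof_tctx;
}

static inline void
sketch_edata_prof_tctx_set(sketch_edata_t *edata, prof_tctx_t *tctx) {
	edata->prof_tctx = tctx;
}

static inline prof_recent_t *
sketch_edata_prof_recent_get(const sketch_edata_t *edata) {
	return edata->prof_recent;
}

static inline void
sketch_edata_prof_recent_set(sketch_edata_t *edata, prof_recent_t *recent) {
	edata->prof_recent = recent;
}
```

Any code that dereferences either pointer still has to include the
header that completes the type.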
Pull the tcache-aware allocation routing helpers out of arena into a
layer that sits directly below the public malloc interface:
arena_malloc -> malloc_dispatch_malloc
arena_palloc -> malloc_dispatch_palloc
arena_ralloc -> malloc_dispatch_ralloc
arena_dalloc* -> malloc_dispatch_dalloc*
arena_sdalloc* -> malloc_dispatch_sdalloc*
arena_dalloc_promoted -> malloc_dispatch_dalloc_promoted
These helpers decide whether to route through tcache or fall through to
arena/large fast paths. They are now owned by malloc_dispatch_inlines.h
+ src/malloc_dispatch.c, and the only consumers are the public-front-end
wrappers in jemalloc_internal_inlines_c.h.
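The routing decision these helpers make can be sketched roughly as
follows. This is a deliberately simplified model: the enum, the function
name, and the single tcache_max threshold are assumptions made for
illustration, and the real helpers distinguish more cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route labels; not jemalloc identifiers. */
typedef enum {
	SKETCH_ROUTE_TCACHE,
	SKETCH_ROUTE_ARENA
} sketch_route_t;

/*
 * Use the thread cache when one is available and the request is small
 * enough to be cached; otherwise fall through to the arena/large path.
 */
static sketch_route_t
sketch_dispatch_malloc_route(bool have_tcache, size_t size,
    size_t tcache_max) {
	if (have_tcache && size <= tcache_max) {
		return SKETCH_ROUTE_TCACHE;
	}
	return SKETCH_ROUTE_ARENA;
}
```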
arena keeps a narrower arena_prof_demote() helper for the sampled
allocation demotion + redzone verification it used to perform inline.
arena_inlines_b.h no longer includes tcache_inlines.h -- the
symbol-level arena <-> tcache cycle is gone (the routing that needed
both now lives in malloc_dispatch).
Fix FreeBSD postfork child handler never being called: FreeBSD's libthr
calls _malloc_postfork in both parent and child (see freebsd-src
lib/libthr/thread/thr_fork.c), but jemalloc mapped it to the parent
handler only. Detect the child via getpid() and route to
jemalloc_postfork_child, which resets nthreads and rebuilds the
descriptor queue.
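A minimal sketch of the getpid()-based routing, assuming the pid is
recorded in the prefork handler. The sketch_* names and the counters are
hypothetical stand-ins for the real handlers, which do far more work.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t sketch_prefork_pid;
static int sketch_parent_calls;
static int sketch_child_calls;

/* Record the pid before fork(); refreshed on every fork. */
static void
sketch_prefork(void) {
	sketch_prefork_pid = getpid();
}

/* Single postfork entry point, called in both parent and child. */
static void
sketch_postfork(void) {
	if (getpid() == sketch_prefork_pid) {
		sketch_parent_calls++; /* stands in for the parent handler */
	} else {
		sketch_child_calls++; /* stands in for the child handler */
	}
}

/* Returns 1 if parent and child were each routed to their own handler. */
static int
sketch_fork_and_check(void) {
	sketch_parent_calls = 0;
	sketch_child_calls = 0;
	sketch_prefork();
	pid_t pid = fork();
	if (pid < 0) {
		return 0;
	}
	sketch_postfork();
	if (pid == 0) {
		/* Child reports its routing result through the exit status. */
		_exit(sketch_child_calls == 1 && sketch_parent_calls == 0
		    ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) != pid) {
		return 0;
	}
	return sketch_parent_calls == 1 && sketch_child_calls == 0
	    && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the pid is re-recorded in each prefork call, a child that forks
again compares against its own pid rather than its parent's.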
Remove the child_survivor_bytes vs pre_survivor_bytes comparison: on
macOS, where jemalloc registers as the default zone, internal
allocations during the postfork handler (pthread_mutex_init) can inflate
the surviving thread's tcache, which makes the comparison unreliable.
Add double-fork test to verify prefork pid is refreshed correctly when a
child process forks again.
Zero-sized arrays are not allowed by ISO C.
C99 introduced flexible array members to express this.
Type-checking fails, because all_bins is assigned malloc'ed
storage of length > 0.
Found by GCC and Clang (-Wpedantic).
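For reference, a generic sketch of a C99 flexible array member. The
struct and function names are hypothetical and the element type is a
placeholder; this shows the language feature, not the actual change to
all_bins.

```c
#include <stdlib.h>

typedef struct sketch_bins_s {
	size_t nbins;
	/* C99 flexible array member; `all_bins[0]` is the non-ISO form. */
	int all_bins[];
} sketch_bins_t;

/* Allocate the header plus nbins trailing elements in one block. */
static sketch_bins_t *
sketch_bins_new(size_t nbins) {
	sketch_bins_t *b = malloc(sizeof(*b)
	    + nbins * sizeof(b->all_bins[0]));
	if (b == NULL) {
		return NULL;
	}
	b->nbins = nbins;
	for (size_t i = 0; i < nbins; i++) {
		b->all_bins[i] = 0;
	}
	return b;
}
```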
This change includes the following improvements:
- Remove the hpa_sec_batch_fill_extra parameter.
- Refactor the hpa_alloc() code and helper functions to be able to
allocate more than one extent out of a single pageslab. This way
we can amortize the per-pageslab costs (active bitmap iteration,
pageslab metadata updates) across multiple extents.
- Decide on a min and max number of extents that will be allocated
in hpa_alloc(). The code will try to allocate at least the min
and allocate up to the max as long as we can allocate additional
ones from the pageslab we already have, as additional allocations
are relatively cheap.
- Add extent allocation distribution stats.
- Amend hpa_sec_integration.c unit test.
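The min/max policy can be sketched with a toy model in which each
pageslab is reduced to a count of free extent-sized slots. The types and
names below are hypothetical and none of the real bitmap or metadata
handling is modeled.

```c
#include <stddef.h>

/* Toy pageslab: just the number of extents it can still hand out. */
typedef struct {
	size_t nfree;
} sketch_slab_t;

/*
 * Move to a new slab only while fewer than `min` extents have been
 * allocated. Once a slab is in hand, keep taking extents from it up to
 * `max`, since the extra ones are cheap. Returns the number allocated.
 */
static size_t
sketch_batch_alloc(sketch_slab_t *slabs, size_t nslabs, size_t min,
    size_t max) {
	size_t nalloc = 0;
	for (size_t i = 0; i < nslabs && nalloc < min; i++) {
		while (slabs[i].nfree > 0 && nalloc < max) {
			slabs[i].nfree--;
			nalloc++;
		}
	}
	return nalloc;
}
```

With slabs holding 3 and 5 free slots, min 2 and max 8, the sketch takes
all 3 from the first slab and never touches the second.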
The orchestrator looks up the surviving descriptor via
tcache_postfork_arena_descriptor and threads it into
arena_postfork_child, eliminating arena's call into tcache. Also reset
cache_bin_array_descriptor_ql_mtx right before the queue rebuild it
protects.
tcache.c was reaching into arena->cache_bin_array_descriptor_ql{,_mtx}
directly to register / unregister / postfork-relink its descriptor.
That queue and mutex are owned by arena, so the locking and ql_*
operations belong in arena.c.
After tcache_init runs, tcache_slow->tcache == tcache always holds, so
the tcache_t parameter to the three association helpers is derivable
from tcache_slow.
Drop the duplicate arena->tcache_ql; stats merging walks the
cache_bin_array_descriptor_ql directly. Rename the protecting mutex
from tcache_ql_mtx to cache_bin_array_descriptor_ql_mtx to match. Add
an assertion in test_thread_migrate_arena that the dissociate-time
flush zeros cache_bin->tstats.nrequests.
bin_t is an arena implementation detail; tcache should not reach into
it. Extract the slab-address lookup into bin.c as bin_current_slab_addr,
and expose it to tcache only through arena_locality_hint(tsdn, arena,
szind), which composes bin_choose + bin_current_slab_addr.
After replacing PAI vtable dispatch with direct calls in the previous
commit, the embedded pai_t member in pac_t and hpa_shard_t is dead
weight, and pai.h has no remaining users. Remove them.
Changes:
- Drop pai_t pai member (and "must be first member" comment) from
pac_t and hpa_shard_t.
- Replace #include "jemalloc/internal/pai.h" with the actually-needed
edata.h / tsd_types.h in pac.h, hpa.h, sec.h, pa.h.
- Update extent_pai_t comment in edata.h to no longer reference pai.h.
- Update three remaining test files (hpa_thp_always,
hpa_vectorized_madvise, hpa_vectorized_madvise_large_batch) to call
hpa_*(tsdn, shard, ...) directly instead of pai_*(tsdn, &shard->pai,
...).
- Delete include/jemalloc/internal/pai.h.
No behavioral changes.
The pai_t interface implements C-style polymorphism via function pointers
to abstract over PAC and HPA. This abstraction provides no real benefit:
only two implementations exist, the dispatcher already knows which one to
use, and HPA stubs 2 of 5 operations. Remove the runtime dispatch in
favor of direct calls.
This commit:
- Promotes pac_alloc/expand/shrink/dalloc/time_until_deferred_work to
external linkage and replaces the pai_t *self parameter with pac_t *pac.
- Promotes hpa_alloc/expand/shrink/dalloc/time_until_deferred_work to
external linkage and replaces pai_t *self with hpa_shard_t *shard.
- Updates hpa_dalloc_batch's signature to take hpa_shard_t * directly
and removes the hpa_from_pai container-of helper. Updates internal
callers in hpa_alloc, hpa_dalloc, and hpa_sec_flush_impl.
- Drops the vtable assignments from pac_init() and hpa_shard_init().
- Replaces pai_alloc/dalloc/etc. dispatch in pa.c with direct calls.
HPA expand and shrink (which are unconditional failure stubs) are
skipped entirely for HPA-owned extents.
- Removes the pa_get_pai() helper.
- Updates tests in test/unit/hpa.c and test/unit/hpa_sec_integration.c
to call hpa_alloc/dalloc/etc. directly.
The pai_t struct field stays as dead weight in pac_t and hpa_shard_t;
it is removed in the next commit along with pai.h itself.
No behavioral changes.
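A minimal sketch of dispatch by direct call, assuming the caller already
knows which implementation owns the extent. All sketch_* types and
functions are hypothetical simplifications; the real signatures take
tsdn, sizes, and other arguments.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t nallocs; } sketch_pac_t;
typedef struct { size_t nallocs; } sketch_hpa_shard_t;
typedef enum { SKETCH_OWNER_PAC, SKETCH_OWNER_HPA } sketch_owner_t;

typedef struct {
	sketch_pac_t pac;
	sketch_hpa_shard_t hpa;
} sketch_pa_shard_t;

/* In this sketch every function returns true on failure. */
static bool
sketch_pac_alloc(sketch_pac_t *pac) {
	pac->nallocs++;
	return false;
}

static bool
sketch_pac_expand(sketch_pac_t *pac) {
	(void)pac;
	return false;
}

static bool
sketch_hpa_alloc(sketch_hpa_shard_t *shard) {
	shard->nallocs++;
	return false;
}

/* No function-pointer table: the owner selects the callee directly. */
static bool
sketch_pa_alloc(sketch_pa_shard_t *shard, sketch_owner_t owner) {
	if (owner == SKETCH_OWNER_HPA) {
		return sketch_hpa_alloc(&shard->hpa);
	}
	return sketch_pac_alloc(&shard->pac);
}

static bool
sketch_pa_expand(sketch_pa_shard_t *shard, sketch_owner_t owner) {
	if (owner == SKETCH_OWNER_HPA) {
		/* Expand always fails for HPA, so the call is skipped. */
		return true;
	}
	return sketch_pac_expand(&shard->pac);
}
```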
Some pages (e.g., hugetlb pages) cannot be purged, and should be
prioritized for reuse. A custom extent_alloc hook signals this by
OR'ing EXTENT_ALLOC_FLAG_PINNED into the low bits of the returned
pointer; jemalloc strips the flag bits and caches pinned extents in
a dedicated ecache_pinned, separate from the dirty/muzzy decay
pipeline.
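A minimal sketch of tagging a pointer's low bits and stripping them
again. The flag value 0x1, the mask, and the sketch_* names are
assumptions; the source does not state the actual value of
EXTENT_ALLOC_FLAG_PINNED. The trick relies on returned addresses being
aligned enough that the low bits are otherwise zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_FLAG_PINNED ((uintptr_t)0x1)
#define SKETCH_FLAG_MASK ((uintptr_t)0x1)

/* Hook side: OR the flag into the aligned address before returning. */
static void *
sketch_hook_return(void *addr, bool pinned) {
	uintptr_t bits = (uintptr_t)addr;
	if (pinned) {
		bits |= SKETCH_FLAG_PINNED;
	}
	return (void *)bits;
}

/* Allocator side: read the flag, then clear the flag bits. */
static void *
sketch_strip_flags(void *ret, bool *pinned) {
	uintptr_t bits = (uintptr_t)ret;
	*pinned = (bits & SKETCH_FLAG_PINNED) != 0;
	return (void *)(bits & ~SKETCH_FLAG_MASK);
}
```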
Pinned extents do not coalesce eagerly, except for ones larger than
SC_LARGE_MINCLASS. A prefer-small policy reuses the smallest fitting
pinned extent, to avoid unnecessary splitting and fragmentation.
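The prefer-small policy amounts to a best-fit search. A minimal sketch
over a plain array of extent sizes follows; the function name and the
array representation are hypothetical, and a real ecache would keep
extents in a different structure.

```c
#include <stddef.h>

/*
 * Return the index of the smallest extent that can hold `request`
 * bytes, or -1 if none fits.
 */
static int
sketch_prefer_small(const size_t *sizes, size_t n, size_t request) {
	int best = -1;
	for (size_t i = 0; i < n; i++) {
		if (sizes[i] < request) {
			continue;
		}
		if (best < 0 || sizes[i] < sizes[(size_t)best]) {
			best = (int)i;
		}
	}
	return best;
}
```

Choosing the tightest fit leaves larger pinned extents intact for later
requests that actually need them.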