Two fixes based on review feedback from OmarAzizi:
1. The pacman.conf path was wrong. /etc/pacman.conf does not exist on
the AppVeyor Windows image; the correct path is c:/msys64/etc/pacman.conf.
This explains why the SigLevel rewrite had no effect in previous attempts.
2. Move 'pacman -Rsc mingw-w64-%CPU%-gcc gcc' to after both -Syuu runs.
Removing gcc before the keyring and package database are fully updated
can leave the environment in a broken state if the update fails midway.
The previous version called open(f,'w') before open(f).read() in the
same expression. Python evaluates the object (open for writing) first,
which truncates the file to zero before the read happens -- resulting
in an empty pacman.conf and a broken build.
Fix: read into a variable first, then write back. Also use ^/$ anchors
with re.MULTILINE so the pattern matches only lines that START with
SigLevel, leaving LocalFileSigLevel untouched. Tested locally against
a mock pacman.conf with the exact MSYS2 AppVeyor format.
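A minimal Python sketch of the read-then-write rewrite described above (the mock conf contents and the relax-to-Never replacement value are illustrative):

```python
import re

MOCK_CONF = """\
# mock of c:/msys64/etc/pacman.conf (illustrative contents)
SigLevel    = Required DatabaseOptional
LocalFileSigLevel = Optional
"""

def relax_siglevel(text):
    # ^/$ with re.MULTILINE anchor the match to whole lines, so only a
    # line that STARTS with "SigLevel" is rewritten; the
    # LocalFileSigLevel line is left untouched. \s* tolerates any
    # whitespace between SigLevel and '='.
    return re.sub(r"^SigLevel\s*=.*$", "SigLevel = Never", text,
                  flags=re.MULTILINE)

# In CI: read the whole file into a variable FIRST, then open for
# writing -- opening with 'w' first truncates the file before the read.
new_text = relax_siglevel(MOCK_CONF)
```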
sed -i with a ^SigLevel regex ran without error but had no effect --
pacman was still downloading and failing to verify .sig files, proving
the SigLevel line was not actually replaced. Switch to python3 which
reads and rewrites the entire file, matching any whitespace between
SigLevel and = regardless of how the line is formatted in the installed
pacman.conf on the AppVeyor image.
The previous fix using pacman-key --populate hit a circular dependency:
installing msys2-keyring requires verifying it, but verification requires
the updated keyring. Break the cycle by temporarily setting SigLevel=Never
in pacman.conf to allow the initial sync and keyring install, then restore
the original SigLevel before the second update runs with proper verification.
Note: Windows CI is also covered by .github/workflows/windows-ci.yml which
uses msys2/setup-msys2@v2 and handles keyring setup automatically.
The AppVeyor Windows CI has been failing on all builds because the
MSYS2 keyring on the AppVeyor image does not contain signing key
5F944B027F7FE2091985AA2EFA11531AA0AA7F57, causing pacman -Syuu to
abort with 'invalid or corrupted database (PGP signature)'.
Fix by refreshing the keyring before syncing packages:
pacman-key --init
pacman-key --populate msys2
pacman -Sy msys2-keyring
This unblocks all AppVeyor build matrix entries (MINGW32/MINGW64,
with and without MSVC).
All five call sites in test/unit/bin.c that called bin_slab_reg_alloc()
with the old two-argument signature are updated to pass tsdn and bin.
Three test functions (test_bin_slab_reg_alloc, test_bin_slabs_full,
test_bin_slabs_full_auto) lacked tsdn and bin_t entirely — they now
call tsdn_fetch() and bin_init() and wrap the allocation loops with
malloc_mutex_lock/unlock, matching the locking contract enforced by
the new malloc_mutex_assert_owner() in bin_slab_reg_alloc().
Verified: make check passes 17/17, 0 failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bitmap_set() performs a plain (non-atomic) read-modify-write on every
level of the bitmap tree:
g = *gp; /* READ */
g ^= ZU(1) << bit; /* MODIFY — thread-local copy */
*gp = g; /* WRITE BACK — no barrier, no CAS */
Two threads that reach bitmap_sfu() -> bitmap_set() concurrently on the
same slab bitmap — even for different bits that share a group word —
will clobber each other's write. The clobbered bit still looks free on
the next allocation; bitmap_sfu() selects it again; the second call to
bitmap_set() aborts on:
assert(!bitmap_get(bitmap, binfo, bit)); /* bitmap.h:220 */
or, once tree propagation begins for a newly-full group:
assert(g & (ZU(1) << (bit & BITMAP_GROUP_NBITS_MASK))); /* bitmap.h:237 */
Either assert calls abort() and produces the coredump reported in
issues #2875 and #2772.
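The lost update is easy to reproduce deterministically. A Python simulation of the unsynchronized read-modify-write above (word width and bit positions are illustrative):

```python
# Two "threads" perform bitmap_set()'s plain read-modify-write on the
# same group word for two DIFFERENT bits. Both READ before either
# WRITES BACK -- the interleaving that clobbers one update.
word = 0          # simplified model: set bit = slot marked used

g_a = word        # thread A: READ
g_b = word        # thread B: READ

g_a ^= 1 << 3     # thread A: MODIFY (marks bit 3)
g_b ^= 1 << 5     # thread B: MODIFY (marks bit 5)

word = g_a        # thread A: WRITE BACK
word = g_b        # thread B: WRITE BACK, clobbering A's update

assert word & (1 << 5)
assert not (word & (1 << 3))   # bit 3 still looks free: the lost update
# The next bitmap_sfu() would hand out bit 3 again, and the second
# bitmap_set() on it would trip the assert in bitmap.h.
```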
The immediate callers (bin_malloc_with_fresh_slab,
bin_malloc_no_fresh_slab) already assert lock ownership, but
bin_slab_reg_alloc() itself had no such check, making it easy for new
call sites to silently bypass the requirement.
Fix:
- Thread tsdn_t *tsdn and bin_t *bin through bin_slab_reg_alloc() and
call malloc_mutex_assert_owner() as the first statement.
- Update both internal callers (bin_malloc_with_fresh_slab,
bin_malloc_no_fresh_slab) to pass the context they already hold.
- Document the locking contract in bin.h and the thread-safety
constraint in bitmap.h directly above bitmap_set().
Note: bin_slab_reg_alloc_batch() is left unchanged because it has one
legitimate unlocked caller (arena_fill_small_fresh) which operates on
freshly allocated slabs that are not yet visible to any other thread.
Its locking contract is now documented in bin.h.
Fixes #2875
* Document new mallctl interfaces added since 5.3.0
Add documentation for the following new mallctl entries:
- opt.debug_double_free_max_scan: double-free detection scan limit
- opt.prof_bt_max: max profiling backtrace depth
- opt.disable_large_size_classes: page-aligned large allocations
- opt.process_madvise_max_batch: batched process_madvise purging
- thread.tcache.max: per-thread tcache_max control
- thread.tcache.ncached_max.read_sizeclass: query ncached_max
- thread.tcache.ncached_max.write: set ncached_max per size range
- arena.<i>.name: get/set arena names
- arenas.hugepage: hugepage size
- approximate_stats.active: lightweight active bytes estimate
Remove config.prof_frameptr, since the feature is still experimental
and needs further development.
Co-authored-by: lexprfuncall <carl.shapiro@gmail.com>
When san_bump_grow_locked fails, it sets sba->curr_reg to NULL.
The old curr_reg (saved in to_destroy) was never freed or restored,
leaking the virtual memory extent. Restore sba->curr_reg from
to_destroy on failure so the old region remains usable.
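A sketch of the restore-on-failure pattern (the dict-based state and function names are illustrative, not the real jemalloc API; True-on-failure follows the jemalloc convention):

```python
def san_bump_grow(sba, alloc):
    to_destroy = sba["curr_reg"]       # save the old region
    sba["curr_reg"] = alloc()          # None models a failed allocation
    if sba["curr_reg"] is None:
        # Failure: restore the saved region instead of leaving curr_reg
        # NULL and leaking the old extent.
        sba["curr_reg"] = to_destroy
        return True                    # True = failure
    # Success: the old region in to_destroy can now be retired.
    return False
```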
An extra 'size' argument was passed where 'slab' (false) should be,
shifting all subsequent arguments: slab got size (nonzero=true),
szind got false (0), and sn got SC_NSIZES instead of a proper serial
number from extent_sn_next(). Match the correct pattern used by the
gap edata_init call above.
The sentinel fill loop used sz_pind2sz_tab[pind] (constant) instead
of sz_pind2sz_tab[i] (loop variable), writing only to the first
entry repeatedly and leaving subsequent entries uninitialized.
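The bug and fix reduce to which index the loop writes through. A Python sketch (the table is modeled as a plain list; names are illustrative):

```python
def fill_buggy(tab, pind, n, sentinel):
    out = list(tab)
    for i in range(n):
        out[pind] = sentinel   # constant index: same entry every pass
    return out

def fill_fixed(tab, pind, n, sentinel):
    out = list(tab)
    for i in range(n):
        out[i] = sentinel      # loop variable: every entry initialized
    return out
```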
When emap_try_acquire_edata_neighbor returned a non-NULL neighbor but
the size check failed, the neighbor was never released from
extent_state_merging, making it permanently invisible to future
allocation and coalescing operations.
Release the neighbor when it doesn't meet the size requirement,
matching the pattern used in extent_recycle_extract.
Used size_t (unsigned) instead of ssize_t for the return value of
malloc_read_fd, which returns -1 on error. With size_t, -1 becomes
a huge positive value, bypassing the error check and corrupting the
remaining byte count.
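The failure mode can be shown by modeling C's unsigned conversion (64-bit size_t assumed for illustration):

```python
MASK64 = (1 << 64) - 1

def as_size_t(v):
    # Model storing a C value into a 64-bit size_t.
    return v & MASK64

err = -1                                     # error return from the read
assert as_size_t(err) == 0xFFFFFFFFFFFFFFFF  # becomes a huge positive value

# Buggy check: with size_t, "nread < 0" is never true.
assert not (as_size_t(err) < 0)
# Correct check: ssize_t keeps the sign, so the error is caught.
assert err < 0
```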
Returned LG_PAGE (log2 of page size, e.g. 12) instead of PAGE (actual
page size, e.g. 4096) when sysconf(_SC_PAGESIZE) failed. This would
cause os_page to be set to an absurdly small value, breaking all
page-aligned operations.
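The two constants differ by a power of two, shown here with the common 4 KiB values (illustrative; the real values are configure-time constants):

```python
LG_PAGE = 12           # log2 of the page size
PAGE = 1 << LG_PAGE    # the actual page size: 4096

# Returning LG_PAGE where PAGE is expected sets os_page to 12 bytes,
# breaking every page-aligned computation downstream.
assert PAGE == 4096
assert LG_PAGE != PAGE
```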
newly_mapped_size was set unconditionally in the ecache_alloc_grow
fallback path, even when the allocation returned NULL. This inflated
pac_mapped stats without a corresponding deallocation to correct them.
Guard the assignment with an edata != NULL check, matching the pattern
used in the batched allocation path above it.
When called with size==0, the else branch wrote to str[size-1] which
is str[(size_t)-1], a massive out-of-bounds write. Standard vsnprintf
allows size==0 to mean "compute length only, write nothing".
Add unit test for the size==0 case.
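A sketch of the guarded write with C vsnprintf semantics (the buffer is modeled as a char list; this is an illustration of the contract, not the jemalloc implementation):

```python
def safe_snprintf(buf, size, s):
    """Write at most size-1 chars plus NUL into buf; return the length
    that WOULD have been written (C vsnprintf semantics)."""
    if size > 0:                  # the fix: never touch buf when size==0
        n = min(len(s), size - 1)
        buf[:n] = s[:n]
        buf[n] = "\0"             # NUL terminator always fits: n <= size-1
    return len(s)
```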
Same pattern as arenas_bin_i_index: used > instead of >=, allowing
access one past the end of the bstats[] and lstats[] arrays.
Add unit tests that verify boundary indices return ENOENT.
The second expansion attempt in large_ralloc_no_move omitted the !
before large_ralloc_no_move_expand(), inverting the return value.
On expansion failure, the function falsely reported success, making
callers believe the allocation was expanded in-place when it was not.
On expansion success, the function falsely reported failure, causing
callers to unnecessarily allocate, copy, and free.
Add unit test that verifies the return value matches actual size change.
In both the full_slabs and empty_slabs JSON sections of HPA shard
stats, "nactive_huge" was emitted twice instead of emitting
"ndirty_huge" as the second entry. This caused ndirty_huge to be
missing from the JSON output entirely.
Add a unit test that verifies both sections contain "ndirty_huge".
The index validation used > instead of >=, allowing access at index
SC_NBINS (for bins) and SC_NSIZES-SC_NBINS (for lextents), which are
one past the valid range. This caused out-of-bounds reads in bin_infos[]
and sz_index2size_unsafe().
Add unit tests that verify the boundary indices return ENOENT.
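The off-by-one reduces to the comparison operator in the bounds check. A Python sketch (SC_NBINS value is illustrative, not the real build-time constant):

```python
SC_NBINS = 36  # illustrative

def bin_i_valid_buggy(i):
    return not (i > SC_NBINS)    # accepts i == SC_NBINS: one past the end

def bin_i_valid_fixed(i):
    return not (i >= SC_NBINS)   # valid indices are 0 .. SC_NBINS-1
```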
These functions had zero callers anywhere in the codebase:
- extent_commit_wrapper: wrapper never called, _impl used directly
- large_salloc: trivial wrapper never called
- tcache_gc_dalloc_new_event_wait: no header declaration, no callers
- tcache_gc_dalloc_postponed_event_wait: no header declaration, no callers
The condition incorrectly used 'alloc_count || 0', likely a typo for
'alloc_count != 0'. The two evaluate to the same truth value, but the
fix makes the check consistent with the bt_count and thr_count checks
and uses the intended comparison operator.
psset_pick_purge used max_bit-- after rejecting a time-ineligible
candidate, which caused unnecessary re-scanning of the same bitmap
(and an assertion failure in debug mode) and a size_t underflow
when the lowest-index entry was rejected. Use max_bit = ind - 1
to skip directly past the rejected index.
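A simplified model of the scan (the real code walks a bitmap of psset buckets; eligible/reject and the linear search here are illustrative). It shows why jumping to ind - 1 avoids re-scanning:

```python
def pick(eligible, reject, max_bit, fixed):
    """Return (chosen index, number of bits examined)."""
    scans = 0
    while max_bit >= 0:
        ind = None
        for i in range(max_bit, -1, -1):   # find highest eligible index
            scans += 1
            if eligible[i]:
                ind = i
                break
        if ind is None:
            return None, scans
        if not reject(ind):
            return ind, scans
        # Buggy: max_bit -= 1 re-examines indices between ind and the
        # old max_bit (and wraps a size_t when max_bit is 0).
        # Fixed: jump straight past the rejected index.
        max_bit = ind - 1 if fixed else max_bit - 1
    return None, scans
```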
A few ways this consistency check can be improved:
* Print which conditions fail and associated values.
* Accumulate the result so that we can print all conditions that fail.
* Turn hpdata_assert_consistent() into a macro so that, when it fails,
  we can get the line number of the call site.
tsd_tcache_data_init() returns true on failure but its callers ignore
this return value, leaving the per-thread tcache in an uninitialized
state after a failure.
This change disables the tcache on an initialization failure and logs
an error message. If opt_abort is true, it will also abort.
New unit tests have been added to test tcache initialization failures.
This is a clean-up change that gives the bin functions implemented in
the arena code a prefix of bin_ and moves them into the bin code.
To further decouple the bin code from the arena code, bin functions
that had taken an arena_t to check arena_is_auto now take an is_auto
parameter instead.
During mutex stats emission, derived counters are not emitted for JSON
output. The array index counter should still be incremented to skip the
derived elements, but it was not. This commit fixes that.
While undocumented, the prctl system call will set errno to ENOMEM
when passed NULL as an address. Under that condition, an assertion
that checks for EINVAL as the only possible errno value will fail. To
avoid the assertion failure, this change skips the call to os_page_id
when address is NULL. NULL can only occur after mmap fails in which
case there is no mapping to name.
When C++ support is enabled, configure unconditionally probes
`-lstdc++` and keeps it in LIBS if the link test succeeds. On
platforms using libc++, this probe can succeed at compile time (if
libstdc++ headers/libraries happen to be installed) but then cause
runtime failures when configure tries to execute test binaries
because `libstdc++.so.6` isn't actually available.
Add a `--with-cxx-stdlib=<libstdc++|libcxx>` option that lets the
build system specify which C++ standard library to link. When given,
the probe is skipped and the specified library is linked directly.
When not given, the original probe behavior is preserved.
Add a mechanism to select a single test to run from a test file. The test harness reads the JEMALLOC_TEST_NAME environment variable and, if it is set, runs only the subtests with that name.
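A sketch of the env-based selection (the registry-as-list and function name are illustrative; only the JEMALLOC_TEST_NAME variable comes from the change itself):

```python
import os

def select_tests(all_tests):
    """Filter the subtests to run based on JEMALLOC_TEST_NAME."""
    wanted = os.environ.get("JEMALLOC_TEST_NAME")
    if wanted is None:
        return all_tests                     # default: run everything
    return [t for t in all_tests if t == wanted]
```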
The definition of the PAGE_SIZE macro is used as a signal for a 32-bit
target or a 64-bit target with an older NDK. Otherwise, a 16KiB page
size is assumed.
Closes: #2657
The address of the local variable created_threads is a different
location from the data it points to. Incorrectly treating these two
values as the same can cause out-of-bounds writes to the stack.
Closes: facebook/jemalloc#59