Commit graph

208 commits

Author SHA1 Message Date
Jim Chen
7883c7749f Use openat syscall if available
Some architectures like AArch64 may not have the open syscall because it
was superseded by the openat syscall, so check and use SYS_openat if
SYS_open is not available.

Additionally, Android headers for AArch64 define SYS_open to __NR_open,
even though __NR_open is undefined. Undefine SYS_open in that case so
SYS_openat is used.
2017-05-12 10:34:32 -07:00
Jason Evans
807a9a3e17 Fix decommit-related run fragmentation.
When allocating runs with alignment stricter than one page, commit after
trimming the head/tail from the initial over-sized allocation, rather
than before trimming.  This avoids creating clean-but-committed runs;
such runs do not get purged (and decommitted as a side effect), so they
can cause unnecessary long-term run fragmentation.

Do not commit decommitted memory in chunk_recycle() unless asked to by
the caller.  This allows recycled arena chunks to start in the
decommitted state, and therefore increases the likelihood that purging
after run deallocation will allow the arena chunk to become a single
unused run, thus allowing the chunk as a whole to be discarded.

This resolves #766.
2017-04-18 12:08:28 -07:00
Jason Evans
d84d2909c3 Fix/enhance THP integration.
Detect whether chunks start off as THP-capable by default (according to
the state of /sys/kernel/mm/transparent_hugepage/enabled), and use this
as the basis for whether to call pages_nohuge() once per chunk during
first purge of any of the chunk's page runs.

Add the --disable-thp configure option, as well as the the opt.thp
mallctl.

This resolves #541.
2017-02-28 14:25:06 -08:00
Jason Evans
7c124830a1 Fix lg_chunk clamping for config_cache_oblivious.
Fix lg_chunk clamping to take into account cache-oblivious large
allocation.  This regression only resulted in incorrect behavior if
!config_fill (false unless --disable-fill specified) and
config_cache_oblivious (true unless --disable-cache-oblivious
specified).

This regression was introduced by
8a03cf039c (Implement cache index
randomization for large allocations.), which was first released in
4.0.0.

This resolves #555.
2017-02-27 15:19:41 -08:00
Jason Evans
08c24e7c1a Relax witness assertions related to prof_gdump().
In some cases the prof machinery allocates (in order to modify the
bt2gctx hash table), and such operations are synchronized via
bt2gctx_mtx.  Rather than asserting that no locks are held on entry
into functions that may call prof_gdump(), make the weaker assertion
that no "core" locks are held.  The prof machinery enqueues dumps
triggered by prof_gdump() calls when bt2gctx_mtx is held, so this
weakened assertion avoids false failures in such cases.
2017-02-23 10:08:42 -08:00
Jason Evans
f56cb9a68e Add witness_assert_depth[_to_rank]().
This makes it possible to make lock state assertions about precisely
which locks are held.
2017-02-23 10:08:42 -08:00
Jason Evans
b49c649bc1 Fix lock order reversal during gdump. 2017-01-24 12:50:06 -08:00
Jason Evans
e98a620c59 Mark partially purged arena chunks as non-hugepage.
Add the pages_[no]huge() functions, which toggle huge page state via
madvise(..., MADV_[NO]HUGEPAGE) calls.

The first time a page run is purged from within an arena chunk, call
pages_nohuge() to tell the kernel to make no further attempts to back
the chunk with huge pages.  Upon arena chunk deletion, restore the
associated virtual memory to its original state via pages_huge().

This resolves #243.
2016-11-24 00:15:55 -08:00
Jason Evans
2379479225 Consistently use size_t rather than uint64_t for extent serial numbers. 2016-11-15 13:47:22 -08:00
Jason Evans
5c77af98b1 Add extent serial numbers.
Add extent serial numbers and use them where appropriate as a sort key
that is higher priority than address, so that the allocation policy
prefers older extents.

This resolves #147.
2016-11-15 13:33:40 -08:00
Jason Evans
5d6cb6eb66 Refactor prng to not use 64-bit atomics on 32-bit platforms.
This resolves #495.
2016-11-07 11:50:59 -08:00
Jason Evans
a4e83e8593 Fix run leak.
Fix arena_run_first_best_fit() to search all potentially non-empty
runs_avail heaps, rather than ignoring the heap that contains runs
larger than large_maxclass, but less than chunksize.

This fixes a regression caused by
f193fd80cf (Refactor runs_avail.).

This resolves #493.
2016-11-07 09:43:39 -08:00
Jason Evans
28b7e42e44 Fix arena data structure size calculation.
Fix paren placement so that QUANTUM_CEILING() applies to the correct
portion of the expression that computes how much memory to base_alloc().
In practice this bug had no impact.  This was caused by
5d8db15db9 (Simplify run quantization.),
which in turn fixed an over-allocation regression caused by
3c4d92e82a (Add per size class huge
allocation statistics.).
2016-11-04 15:00:08 -07:00
Jason Evans
32896a902b Fix large allocation to search optimal size class heap.
Fix arena_run_alloc_large_helper() to not convert size to usize when
searching for the first best fit via arena_run_first_best_fit().  This
allows the search to consider the optimal quantized size class, so that
e.g. allocating and deallocating 40 KiB in a tight loop can reuse the
same memory.

This regression was nominally caused by
5707d6f952 (Quantize szad trees by size
class.), but it did not commonly cause problems until
8a03cf039c (Implement cache index
randomization for large allocations.).  These regressions were first
released in 4.0.0.

This resolves #487.
2016-11-03 22:36:30 -07:00
Jason Evans
e9012630ac Fix chunk_alloc_cache() to support decommitted allocation.
Fix chunk_alloc_cache() to support decommitted allocation, and use this
ability in arena_chunk_alloc_internal() and arena_stash_dirty(), so that
chunks don't get permanently stuck in a hybrid state.

This resolves #487.
2016-11-03 22:36:30 -07:00
Jason Evans
e2bcf037d4 Make dss operations lockless.
Rather than protecting dss operations with a mutex, use atomic
operations.  This has negligible impact on synchronization overhead
during typical dss allocation, but is a substantial improvement for
chunk_in_dss() and the newly added chunk_dss_mergeable(), which can be
called multiple times during chunk deallocations.

This change also has the advantage of avoiding tsd in deallocation paths
associated with purging, which resolves potential deadlocks during
thread exit due to attempted tsd resurrection.

This resolves #425.
2016-10-13 15:33:56 -07:00
Jason Evans
d419bb09ef Fix and simplify decay-based purging.
Simplify decay-based purging attempts to only be triggered when the
epoch is advanced, rather than every time purgeable memory increases.
In a correctly functioning system (not previously the case; see below),
this only causes a behavior difference if during subsequent purge
attempts the least recently used (LRU) purgeable memory extent is
initially too large to be purged, but that memory is reused between
attempts and one or more of the next LRU purgeable memory extents are
small enough to be purged.  In practice this is an arbitrary behavior
change that is within the set of acceptable behaviors.

As for the purging fix, assure that arena->decay.ndirty is recorded
*after* the epoch advance and associated purging occurs.  Prior to this
fix, it was possible for purging during epoch advance to cause a
substantially underrepresentative (arena->ndirty - arena->decay.ndirty),
i.e. the number of dirty pages attributed to the current epoch was too
low, and a series of unintended purges could result.  This fix is also
relevant in the context of the simplification described above, but the
bug's impact would be limited to over-purging at epoch advances.
2016-10-11 15:50:05 -07:00
Jason Evans
45a5bf6772 Do not advance decay epoch when time goes backwards.
Instead, move the epoch backward in time.  Additionally, add
nstime_monotonic() and use it in debug builds to assert that time only
goes backward if nstime_update() is using a non-monotonic time source.
2016-10-10 22:31:37 -07:00
Jason Evans
94e7ffa979 Refactor arena->decay_* into arena->decay.* (arena_decay_t). 2016-10-10 22:22:59 -07:00
Jason Evans
5d8db15db9 Simplify run quantization. 2016-10-06 15:58:38 -07:00
Jason Evans
f193fd80cf Refactor runs_avail.
Use pszind_t size classes rather than szind_t size classes, and always
reserve space for NPSIZES elements.  This removes unused heaps that are
not multiples of the page size, and adds (currently) unused heaps for
all huge size classes, with the immediate benefit that the size of
arena_t allocations is constant (no longer dependent on chunk size).
2016-10-04 19:48:50 -07:00
Jason Evans
1abb49f09d Implement pz2ind(), pind2sz(), and psz2u().
These compute size classes and indices similarly to size2index(),
index2size() and s2u(), respectively, but using the subset of size
classes that are multiples of the page size.  Note that pszind_t and
szind_t are not interchangeable.
2016-10-04 16:29:19 -07:00
Jason Evans
05a9e4ac65 Fix potential VM map fragmentation regression.
Revert 245ae6036c (Support --with-lg-page
values larger than actual page size.), because it could cause VM map
fragmentation if the kernel grows mmap()ed memory downward.

This resolves #391.
2016-06-07 14:21:21 -07:00
Jason Evans
c1e00ef2a6 Resolve bootstrapping issues when embedded in FreeBSD libc.
b2c0d6322d (Add witness, a simple online
locking validator.) caused a broad propagation of tsd throughout the
internal API, but tsd_fetch() was designed to fail prior to tsd
bootstrapping.  Fix this by splitting tsd_t into non-nullable tsd_t and
nullable tsdn_t, and modifying all internal APIs that do not critically
rely on tsd to take nullable pointers.  Furthermore, add the
tsd_booted_get() function so that tsdn_fetch() can probe whether tsd
bootstrapping is complete and return NULL if not.  All dangerous
conversions of nullable pointers are tsdn_tsd() calls that assert-fail
on invalid conversion.
2016-05-10 22:51:33 -07:00
Jason Evans
3ef51d7f73 Optimize the fast paths of calloc() and [m,d,sd]allocx().
This is a broader application of optimizations to malloc() and free() in
f4a0f32d34 (Fast-path improvement:
reduce # of branches and unnecessary operations.).

This resolves #321.
2016-05-06 14:37:39 -07:00
Jason Evans
04c3c0f9a0 Add the stats.retained and stats.arenas.<i>.retained statistics.
This resolves #367.
2016-05-03 22:11:35 -07:00
Jason Evans
90827a3f3e Fix huge_palloc() regression.
Split arena_choose() into arena_[i]choose() and use arena_ichoose() for
arena lookup during internal allocation.  This fixes huge_palloc() so
that it always succeeds during extent node allocation.

This regression was introduced by
66cd953514 (Do not allocate metadata via
non-auto arenas, nor tcaches.).
2016-05-03 17:19:15 -07:00
Jason Evans
174c0c3a9c Fix fork()-related lock rank ordering reversals. 2016-04-25 23:16:20 -07:00
Jason Evans
7e6749595a Fix arena reset effects on large/huge stats.
Reset large curruns to 0 during arena reset.

Do not increase huge ndalloc stats during arena reset.
2016-04-25 13:26:54 -07:00
Jason Evans
19ff2cefba Implement the arena.<i>.reset mallctl.
This makes it possible to discard all of an arena's allocations in a
single operation.

This resolves #146.
2016-04-22 15:20:06 -07:00
Jason Evans
66cd953514 Do not allocate metadata via non-auto arenas, nor tcaches.
This assures that all internally allocated metadata come from the
first opt_narenas arenas, i.e. the automatically multiplexed arenas.
2016-04-22 15:19:59 -07:00
Jason Evans
c9a4bf9170 Reduce a variable scope. 2016-04-22 14:56:58 -07:00
Jason Evans
ab0cfe01fa Update private_symbols.txt.
Change test-related mangling to simplify symbol filtering.

The following commands can be used to detect missing/obsolete symbol
mangling, with the caveat that the full set of symbols is based on the
union of symbols generated by all configurations, some of which are
platform-specific:

./autogen.sh --enable-debug --enable-prof --enable-lazy-lock
make all tests
nm -a lib/libjemalloc.a src/*.jet.o \
  |grep " [TDBCR] " \
  |awk '{print $3}' \
  |sed -e 's/^\(je_\|jet_\(n_\)\?\)\([a-zA-Z0-9_]*\)/\3/g' \
  |LC_COLLATE=C sort -u \
  |grep -v \
   -e '^\(malloc\|calloc\|posix_memalign\|aligned_alloc\|realloc\|free\)$' \
   -e '^\(m\|r\|x\|s\|d\|sd\|n\)allocx$' \
   -e '^mallctl\(\|nametomib\|bymib\)$' \
   -e '^malloc_\(stats_print\|usable_size\|message\)$' \
   -e '^\(memalign\|valloc\)$' \
   -e '^__\(malloc\|memalign\|realloc\|free\)_hook$' \
   -e '^pthread_create$' \
  > /tmp/private_symbols.txt
2016-04-18 15:23:35 -07:00
Jason Evans
b2c0d6322d Add witness, a simple online locking validator.
This resolves #358.
2016-04-14 02:09:28 -07:00
rustyx
00432331b8 Fix 64-to-32 conversion warnings in 32-bit mode 2016-04-12 09:34:09 -07:00
Jason Evans
245ae6036c Support --with-lg-page values larger than actual page size.
During over-allocation in preparation for creating aligned mappings,
allocate one more page than necessary if PAGE is the actual page size,
so that trimming still succeeds even if the system returns a mapping
that has less than PAGE alignment.  This allows compiling with e.g. 64
KiB "pages" on systems that actually use 4 KiB pages.

Note that for e.g. --with-lg-page=21, it is also necessary to increase
the chunk size (e.g. --with-malloc-conf=lg_chunk:22) so that there are
at least two "pages" per chunk.  In practice this isn't a particularly
compelling configuration because so much (unusable) virtual memory is
dedicated to chunk headers.
2016-04-11 02:35:00 -07:00
Jason Evans
c6a2c39404 Refactor/fix ph.
Refactor ph to support configurable comparison functions.  Use a cpp
macro code generation form equivalent to the rb macros so that pairing
heaps can be used for both run heaps and chunk heaps.

Remove per node parent pointers, and instead use leftmost siblings' prev
pointers to track parents.

Fix multi-pass sibling merging to iterate over intermediate results
using a FIFO, rather than a LIFO.  Use this fixed sibling merging
implementation for both merge phases of the auxiliary twopass algorithm
(first merging the aux list, then replacing the root with its merged
children).  This fixes both degenerate merge behavior and the potential
for deep recursion.

This regression was introduced by
6bafa6678f (Pairing heap).

This resolves #371.
2016-04-11 02:15:42 -07:00
Chris Peterson
a82070ef5f Add JEMALLOC_ALLOC_JUNK and JEMALLOC_FREE_JUNK macros
Replace hardcoded 0xa5 and 0x5a junk values with JEMALLOC_ALLOC_JUNK and
JEMALLOC_FREE_JUNK macros, respectively.
2016-03-31 11:23:29 -07:00
Jason Evans
f86bc081d6 Update a comment. 2016-03-31 11:19:46 -07:00
Jason Evans
ce7c0f999b Fix potential chunk leaks.
Move chunk_dalloc_arena()'s implementation into chunk_dalloc_wrapper(),
so that if the dalloc hook fails, proper decommit/purge/retain cascading
occurs.  This fixes three potential chunk leaks on OOM paths, one during
dss-based chunk allocation, one during chunk header commit (currently
relevant only on Windows), and one during rtree write (e.g. if rtree
node allocation fails).

Merge chunk_purge_arena() into chunk_purge_default() (refactor, no
change to functionality).
2016-03-30 18:36:04 -07:00
Jason Evans
61a6dfcd5f Constify various internal arena APIs. 2016-03-23 16:15:42 -07:00
Jason Evans
613cdc80f6 Convert arena_bin_t's runs from a tree to a heap. 2016-03-08 13:48:27 -08:00
Dave Watson
4a0dbb5ac8 Use pairing heap for arena->runs_avail
Use pairing heap instead of red black tree in arena runs_avail.  The
extra links are unioned with the bitmap_t, so this change doesn't use
any extra memory.

Canaries show this change to be a 1% cpu win, and 2% latency win.  In
particular, large free()s, and small bin frees are now O(1) (barring
coalescing).

I also tested changing bin->runs to be a pairing heap, but saw a much
smaller win, and it would mean increasing the size of arena_run_s by two
pointers, so I left that as an rb-tree for now.
2016-03-08 13:48:27 -08:00
Jason Evans
022f6891fa Avoid a potential innocuous compiler warning.
Add a cast to avoid comparing a ssize_t value to a uint64_t value that
is always larger than a 32-bit ssize_t.  This silences an innocuous
compiler warning from e.g. gcc 4.2.1 about the comparison always having
the same result.
2016-03-02 22:45:37 -08:00
Dmitri Smirnov
33184bf698 Fix stack corruption and uninitialized var warning
Stack corruption happens in x64 bit

This resolves #347.
2016-02-29 15:22:53 -08:00
Jason Evans
3c07f803aa Fix stats.arenas.<i>.[...] for --disable-stats case.
Add missing stats.arenas.<i>.{dss,lg_dirty_mult,decay_time}
initialization.

Fix stats.arenas.<i>.{pactive,pdirty} to read under the protection of
the arena mutex.
2016-02-27 20:40:13 -08:00
Jason Evans
40ee9aa957 Fix stats.cactive accounting regression.
Fix stats.cactive accounting to always increase/decrease by multiples of
the chunk size, even for huge size classes that are not multiples of the
chunk size, e.g. {2.5, 3, 3.5, 5, 7} MiB with 2 MiB chunk size.  This
regression was introduced by 155bfa7da1
(Normalize size classes.) and first released in 4.0.0.

This resolves #336.
2016-02-27 15:35:52 -08:00
Jason Evans
3763d3b5f9 Refactor arena_cactive_update() into arena_cactive_{add,sub}().
This removes an implicit conversion from size_t to ssize_t.  For cactive
decreases, the size_t value was intentionally underflowed to generate
"negative" values (actually positive values above the positive range of
ssize_t), and the conversion to ssize_t was undefined according to C
language semantics.

This regression was perpetuated by
1522937e9c (Fix the cactive statistic.)
and first release in 4.0.0, which in retrospect only fixed one of two
problems introduced by aa5113b1fd
(Refactor overly large/complex functions) and first released in 3.5.0.
2016-02-26 17:29:35 -08:00
Jason Evans
42ce80e15a Silence miscellaneous 64-to-32-bit data loss warnings.
This resolves #341.
2016-02-25 20:51:00 -08:00
Jason Evans
8282a2ad97 Remove a superfluous comment. 2016-02-25 16:44:48 -08:00