Commit graph

78 commits

Author SHA1 Message Date
guangli-dai
0f8a8d7ebe Refactor init_system_thp_mode and print it in malloc stats. 2025-10-13 16:56:12 -07:00
Slobodan Predolac
f8ef3195e5 [HPA] Add ability to start page as huge and more flexibility for purging 2025-10-06 10:59:43 -04:00
Slobodan Predolac
db943a0d9e Running clang-format on two files 2025-10-06 10:59:43 -04:00
Slobodan Predolac
d4cde60066 Remove pidfd_open call handling and rely on PIDFD_SELF 2025-08-27 12:44:45 -04:00
lexprfuncall
5f06b14d1c Remove an orphaned comment
This was left behind when definitions of malloc_open and malloc_close
were abstracted from code that had followed.
2025-08-23 11:42:23 -04:00
Slobodan Predolac
4f126b47b3 Save and restore errno when calling process_madvise 2025-08-21 16:22:06 -07:00
lexprfuncall
c45b6223e5 Use relaxed atomics to access the process madvise pid fd
Relaxed atomics already provide sequentially consistent access to single
location data structures.
2025-08-13 18:33:27 -07:00
Slobodan Predolac
97d25919c3 [process_madvise] Make init lazy so that python tests pass. Reset the pidfd on fork 2025-07-29 15:47:53 -07:00
guangli-dai
34f359e0ca Reformat the codebase with the clang-format 18. 2025-06-20 14:35:15 -07:00
Slobodan Predolac
852da1be15 Add experimental option force using SYS_process_madvise 2025-04-28 18:45:30 -07:00
Qi Wang
22440a0207 Implement process_madvise support.
Add opt.process_madvise_max_batch which determines if process_madvise is enabled
(non-zero) and the max # of regions in each batch.  Added another limiting
factor which is the space to reserve on stack, which results in the max batch of
128.
2025-03-07 15:32:32 -08:00
Dmitry Ilvokhin
0ce13c6fb5 Add opt hpa_hugify_sync to hugify synchronously
Linux 6.1 introduced `MADV_COLLAPSE` flag to perform a best-effort
synchronous collapse of the native pages mapped by the memory range into
transparent huge pages.

Synchronous hugification might be beneficial for at least two reasons:
we are not relying on khugepaged anymore and get an instant feedback if
range wasn't hugified.

If `hpa_hugify_sync` option is on, we'll try to perform synchronously
collapse and if it wasn't successful, we'll fallback to asynchronous
behaviour.
2024-11-20 10:52:52 -08:00
Nathan Slingerland
8c2e15d1a5 Add malloc_open() / malloc_close() reentrancy safe helpers 2024-09-12 15:38:08 -07:00
Danny Lin
c893fcd169 Change macOS mmap tag to fix conflict with CoreMedia
Tag 101 is assigned to "CoreMedia Capture Data", which makes for confusing output when debugging.

To avoid conflicts, use a tag in the reserved application-specific range from 240–255 (inclusive).

All assigned tags: 94d3b45284/osfmk/mach/vm_statistics.h (L773-L775)
2024-06-26 14:53:48 -07:00
Kevin Svetlitski
da66aa391f Enable a few additional warnings for CI and fix the issues they uncovered
- `-Wmissing-prototypes` and `-Wmissing-variable-declarations` are
  helpful for finding dead code and/or things that should be `static`
  but aren't marked as such.
- `-Wunused-macros` is of similar utility, but for identifying dead macros.
- `-Wunreachable-code` and `-Wunreachable-code-aggressive` do exactly
  what they say: flag unreachable code.
2023-08-11 13:56:23 -07:00
Kevin Svetlitski
3e82f357bb Fix all optimization-inhibiting integer-to-pointer casts
Following from PR #2481, we replace all integer-to-pointer casts [which
hide pointer provenance information (and thus inhibit
optimizations)](https://clang.llvm.org/extra/clang-tidy/checks/performance/no-int-to-ptr.html)
with equivalent operations that preserve this information. I have
enabled the corresponding clang-tidy check in our static analysis CI so
that we do not get bitten by this again in the future.
2023-07-24 14:40:42 -07:00
Kevin Svetlitski
589c63b424 Make eligible global variables static and/or const
For better or worse, Jemalloc has a significant number of global
variables. Making all eligible global variables `static` and/or `const`
at least makes it slightly easier to reason about them, as these
qualifications communicate to the programmer restrictions on their use
without having to `grep` the whole codebase.
2023-07-06 14:15:12 -07:00
Qi Wang
602edd7566 Enabled -Wstrict-prototypes and fixed warnings. 2023-07-06 12:00:02 -07:00
Kevin Svetlitski
5a858c64d6 Reduce the memory overhead of sampled small allocations
Previously, small allocations which were sampled as part of heap
profiling were rounded up to `SC_LARGE_MINCLASS`. This additional memory
usage becomes problematic when the page size is increased, as noted in #2358.

Small allocations are now rounded up to the nearest multiple of `PAGE`
instead, reducing the memory overhead by a factor of 4 in the most
extreme cases.
2023-07-03 16:19:06 -07:00
Christos Zoulas
5832ef6589 Use a local variable to set the alignment for this particular allocation
instead of changing mmap_flags which makes the change permanent. This was
enforcing large alignments for allocations that did not need it causing
fragmentation. Reported by Andreas Gustafsson.
2023-05-31 14:44:24 -07:00
Kevin Svetlitski
3e2ba7a651 Remove dead stores detected by static analysis
None of these are harmful, and they are almost certainly optimized
away by the compiler. The motivation for fixing them anyway is that
we'd like to enable static analysis as part of CI, and the first step
towards that is resolving the warnings it produces at present.
2023-05-11 20:27:49 -07:00
David Carlier
4fc5c4fbac New configure option '--enable-pageid' for Linux
The option makes jemalloc use prctl with PR_SET_VMA to tag memory mappings with
"jemalloc_pg" or "jemalloc_pg_overcommit". This allows to easily identify
jemalloc's mappings in /proc/<pid>/maps. PR_SET_VMA is only available in Linux
5.17 and above.
2022-06-09 18:54:08 -07:00
Alex Lapenkou
0f6da1257d San: Implement bump alloc
The new allocator will be used to allocate guarded extents used as slabs
for guarded small allocations.
2021-12-15 10:39:17 -08:00
Qi Wang
deb8e62a83 Implement guard pages.
Adding guarded extents, which are regular extents surrounded by guard pages
(mprotected).  To reduce syscalls, small guarded extents are cached as a
separate eset in ecache, and decay through the dirty / muzzy / retained pipeline
as usual.
2021-09-26 16:30:15 -07:00
David CARLIER
35a8552605 Mac OS: Tag mapped pages.
This can be used to help profiling tools (e.g. vmmap) identify the
sources of mappings more specifically.
2021-02-03 15:05:53 -08:00
Azat Khuzhin
a943172b73 Add runtime detection for MADV_DONTNEED zeroes pages (mostly for qemu)
qemu does not support this, yet [1], and you can get very tricky assert
if you will run program with jemalloc in use under qemu:

    <jemalloc>: ../contrib/jemalloc/src/extent.c:1195: Failed assertion: "p[i] == 0"

  [1]: https://patchwork.kernel.org/patch/10576637/

Here is a simple example that shows the problem [2]:

    // Gist to check possible issues with MADV_DONTNEED
    // For example it does not supported by qemu user
    // There is a patch for this [1], but it hasn't been applied.
    //   [1]: https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05422.html

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stddef.h>
    #include <assert.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        void *addr = mmap(NULL, 1<<16, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(addr, 'A', 1<<16);

        if (!madvise(addr, 1<<16, MADV_DONTNEED)) {
            puts("MADV_DONTNEED does not return error. Check memory.");
            for (int i = 0; i < 1<<16; ++i) {
                assert(((unsigned char *)addr)[i] == 0);
            }
        } else {
            perror("madvise");
        }

        if (munmap(addr, 1<<16)) {
            perror("munmap");
            return 1;
        }

        return 0;
    }

  ### unpatched qemu

      $ qemu-x86_64-static /tmp/test-MADV_DONTNEED
      MADV_DONTNEED does not return error. Check memory.
      test-MADV_DONTNEED: /tmp/test-MADV_DONTNEED.c:19: main: Assertion `((unsigned char *)addr)[i] == 0' failed.
      qemu: uncaught target signal 6 (Aborted) - core dumped
      Aborted (core dumped)

  ### patched qemu (by returning ENOSYS error)

      $ qemu-x86_64 /tmp/test-MADV_DONTNEED
      madvise: Success

  ### patch for qemu to return ENOSYS

      diff --git a/linux-user/syscall.c b/linux-user/syscall.c
      index 897d20c076..5540792e0e 100644
      --- a/linux-user/syscall.c
      +++ b/linux-user/syscall.c
      @@ -11775,7 +11775,7 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
                  turns private file-backed mappings into anonymous mappings.
                  This will break MADV_DONTNEED.
                  This is a hint, so ignoring and returning success is ok.  */
      -        return 0;
      +        return ENOSYS;
       #endif
       #ifdef TARGET_NR_fcntl64
           case TARGET_NR_fcntl64:

  [2]: https://gist.github.com/azat/12ba2c825b710653ece34dba7f926ece

v2:
- review fixes
- add opt_dont_trust_madvise
v3:
- review fixes
- rename opt_dont_trust_madvise to opt_trust_madvise
2021-01-20 20:08:30 -08:00
Jin Qian
4e3fe218e9 Use posix_madvise to purge pages when available 2020-12-18 10:05:59 -08:00
David Carlier
d2d941017b MADV_DO[NOT]DUMP support equivalence on FreeBSD. 2020-11-02 09:15:15 -08:00
David Goldblatt
1ed0288d9c bit_util: Change ffs functions indexing.
Making these 0-based instead of 1-based makes calling code simpler and will be
more consistent with functions introduced in subsequent diffs.
2020-07-30 15:25:23 -07:00
David Carlier
00f06c9beb enabling mpss on solaris/illumos.
reusing slighty linux configuration as possible, aligning the
 address range to HUGEPAGE.
2020-07-06 09:59:10 -07:00
Yinan Zhang
a795b19327 Remove beginning define in source files
```
sed -i "/^#define JEMALLOC_[A-Z_]*_C_$/d" src/*.c;
```
2020-06-19 12:15:44 -07:00
zoulasc
536ea6858e NetBSD specific changes:
- NetBSD overcommits
- When mapping pages, use the maximum of the alignment requested and the
  compiled-in PAGE constant which might be greater than the current kernel
  pagesize, since we compile binaries with the maximum page size supported
  by the architecture (so that they work with all kernels).
2020-02-03 15:49:36 -08:00
RingsC
6924f83cb2 use SYS_openat when available
some architecture like AArch64 may not have the open syscall, but have
openat syscall. so check and use SYS_openat if SYS_openat available if
SYS_open is not supported at init_thp_state.
2019-11-01 13:06:40 -07:00
Edward Tomasz Napierala
a4c6b9ae01 Restore a FreeBSD-specific getpagesize(3) optimization.
It was removed in 0771ff2cea.
Add a comment explaining its purpose.
2018-11-09 14:14:49 -08:00
Qi Wang
50b473c883 Set commit properly for FreeBSD w/ overcommit.
When overcommit is enabled, commit needs to be set when doing mmap().  The
regression was introduced in f80c97e.
2018-11-05 09:47:04 -08:00
jsteemann
856319dc8a check return value of malloc_read_fd
in case `malloc_read_fd` returns a negative error number, the result
would afterwards be casted to an unsigned size_t, and may have
theoretically caused an out-of-bounds memory access in the following
`strncmp` call.
2018-10-11 17:25:20 -07:00
Edward Tomasz Napierala
f80c97e477 Rework the way jemalloc uses mmap(2) on FreeBSD.
This makes it directly use MAP_EXCL and MAP_ALIGNED() instead
of weird workarounds involving mapping at random places and then
unmapping parts of them.
2018-10-06 22:06:56 -07:00
Edward Tomasz Napierala
676cdd6679 Disable runtime detection of lazy purging support on FreeBSD.
The check doesn't seem to serve any purpose here, and this shaves
off three syscalls on binary startup.
2018-10-06 22:06:56 -07:00
David Carlier
0771ff2cea FreeBSD build changes and allow to run the tests. 2018-08-09 10:41:20 -07:00
Qi Wang
e8a63b87c3 Fix an incorrect assertion.
When configured with --with-lg-page, it's possible for the configured page size
to be greater than the system page size, in which case the page address may only
be aligned with the system page size.
2018-05-09 23:52:56 -07:00
Qi Wang
d3e0976a2c Fix type warning on Windows.
Add cast since read / write has unsigned return type on windows.
2018-04-09 16:50:30 -07:00
Qi Wang
e4f090e8df Add opt.thp which allows explicit hugepage usage.
"always" marks all user mappings as MADV_HUGEPAGE; while "never" marks all
mappings as MADV_NOHUGEPAGE. The default setting "default" does not change any
settings.  Note that all the madvise calls are part of the default extent hooks
by design, so that customized extent hooks have complete control over the
mappings including hugepage settings.
2018-03-08 13:08:06 -08:00
Edward Tomasz Napierala
9f455e2786 Try to use sysctl(3) instead of sysctlbyname(3).
This attempts to use VM_OVERCOMMIT OID - newly introduced in -CURRENT
few days ago, specifically for this purpose - instead of querying the
sysctl by its string name.  Due to how syctlbyname(3) works, this means
we do one syscall during binary startup instead of two.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
Edward Tomasz Napierala
d591df05c8 Use getpagesize(3) under FreeBSD.
This avoids sysctl(2) syscall during binary startup, using the value
passed in the ELF aux vector instead.

Signed-off-by: Edward Tomasz Napierala <trasz@FreeBSD.org>
2017-11-03 08:25:39 -07:00
David Goldblatt
bbaa72422b Add pages_dontdump and pages_dodump.
This will, eventually, enable us to avoid dumping eden regions.
2017-10-16 15:35:49 -07:00
Qi Wang
31ab38be5f Define MADV_FREE on our own when needed.
On x86 Linux, we define our own MADV_FREE if madvise(2) is available, but no
MADV_FREE is detected.  This allows the feature to be built in and enabled with
runtime detection.
2017-10-11 15:49:22 -07:00
Qi Wang
0720192a32 Add runtime detection of lazy purging support.
It's possible to build with lazy purge enabled but depoly to systems without
such support.  In this case, rely on the boot time detection instead of keep
making unnecessary madvise calls (which all returns EINVAL).
2017-09-26 17:26:22 -07:00
Qi Wang
47b20bb654 Change opt.metadata_thp to [disabled,auto,always].
To avoid the high RSS caused by THP + low usage arena (i.e. THP becomes a
significant percentage), added a new "auto" option which will only start using
THP after a base allocator used up the first THP region.  Starting from the
second hugepage (in a single arena), "auto" behaves the same as "always",
i.e. madvise hugepage right away.
2017-08-30 16:47:32 -07:00
Qi Wang
3ec279ba1c Fix test/unit/pages.
As part of the metadata_thp support, We now have a separate swtich
(JEMALLOC_HAVE_MADVISE_HUGE) for MADV_HUGEPAGE availability.  Use that instead
of JEMALLOC_THP (which doesn't guard pages_huge anymore) in tests.
2017-08-11 15:57:12 -07:00
Qi Wang
8fdd9a5797 Implement opt.metadata_thp
This option enables transparent huge page for base allocators (require
MADV_HUGEPAGE support).
2017-08-11 14:51:20 -07:00