jemalloc/test/stress/pa
Guangli Dai 1dfa6f7aa4 Replace PAI vtable dispatch with direct calls
The pai_t interface implements C-style polymorphism via function pointers
to abstract over PAC and HPA. This abstraction provides no real benefit:
only two implementations exist, the dispatcher already knows which one to
use, and HPA stubs 2 of 5 operations. Remove the runtime dispatch in
favor of direct calls.

This commit:
- Promotes pac_alloc/expand/shrink/dalloc/time_until_deferred_work to
  external linkage and replaces the pai_t *self parameter with pac_t *pac.
- Promotes hpa_alloc/expand/shrink/dalloc/time_until_deferred_work to
  external linkage and replaces pai_t *self with hpa_shard_t *shard.
- Updates hpa_dalloc_batch's signature to take hpa_shard_t * directly
  and removes the hpa_from_pai container-of helper. Updates internal
  callers in hpa_alloc, hpa_dalloc, and hpa_sec_flush_impl.
- Drops the vtable assignments from pac_init() and hpa_shard_init().
- Replaces pai_alloc/dalloc/etc. dispatch in pa.c with direct calls.
  HPA expand and shrink (which are unconditional failure stubs) are
  skipped entirely for HPA-owned extents.
- Removes the pa_get_pai() helper.
- Updates tests in test/unit/hpa.c and test/unit/hpa_sec_integration.c
  to call hpa_alloc/dalloc/etc. directly.

The pai_t struct field stays as dead weight in pac_t and hpa_shard_t;
it is removed in the next commit along with pai.h itself.

No behavioral changes.
2026-05-12 13:43:16 -07:00
data Adding trace analysis in preparation for page allocator microbenchmark. 2026-03-10 18:14:33 -07:00
.gitignore Adding trace analysis in preparation for page allocator microbenchmark. 2026-03-10 18:14:33 -07:00
pa_data_preprocessor.cpp [pa-bench] Add clock to pa benchmark 2026-03-10 18:14:33 -07:00
pa_microbench.c Replace PAI vtable dispatch with direct calls 2026-05-12 13:43:16 -07:00
README.md Add a page-allocator microbenchmark. 2026-03-10 18:14:33 -07:00

Page Allocator (PA) Microbenchmark Suite

This directory contains a microbenchmark suite for testing and analyzing jemalloc's Page Allocator (PA), including its Hugepage-aware Page Allocator (HPA) and Small Extent Cache (SEC) components.

Overview

The PA microbenchmark suite consists of two programs: pa_data_preprocessor, which converts raw allocation traces into CSV, and pa_microbench, which replays those traces against jemalloc's internal PA to measure performance, memory usage, and allocation patterns.

As a quick start, suppose test/stress/pa/data/hpa.csv holds a trace collected from a real application using USDT. The benchmark can then be run as follows:

make tests_pa
./test/stress/pa/pa_data_preprocessor hpa test/stress/pa/data/hpa.csv test/stress/pa/data/sample_hpa_output.csv
./test/stress/pa/pa_microbench -p -o test/stress/pa/data/sample_hpa_stats.csv test/stress/pa/data/sample_hpa_output.csv

For an SEC trace, replace the first argument passed to pa_data_preprocessor (hpa) with sec.

Architecture

PA System Components

The Page Allocator sits at the core of jemalloc's memory management hierarchy:

Application
    ↓
Arena (tcache, bins)
    ↓
PA (Page Allocator) ← This is what we benchmark
    ├── HPA (Hugepage-aware Page Allocator)
    └── SEC (Small Extent Cache)
    ↓
Extent Management (emap, edata)
    ↓
Base Allocator
    ↓
OS (mmap/munmap)

Microbenchmark Architecture

Raw Allocation Traces
    ↓
[pa_data_preprocessor] ← Preprocesses and filters traces
    ↓
CSV alloc/dalloc Files
    ↓
[pa_microbench] ← Replays against real PA system
    ↓
Performance Statistics & Analysis

Programs

1. pa_data_preprocessor

A C++ data preprocessing tool that converts raw allocation traces into a standardized CSV format suitable for microbenchmarking.

Purpose:

  • Parse and filter raw allocation trace data
  • Convert various trace formats to standardized CSV
  • Filter by process ID, thread ID, or other criteria
  • Validate and clean allocation/deallocation sequences
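
The "validate and clean" step can be illustrated with a minimal C sketch. It is not the preprocessor's actual code (the tool itself is C++), and the record layout (`trace_ev_t`), the function name `clean_trace`, and the exact cleaning rules are assumptions made for illustration. Here an event is dropped when it is a dalloc of an address with no outstanding alloc, or an alloc of an address that is still live:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical trace record; the real trace columns are not specified here. */
typedef enum { EV_ALLOC, EV_DALLOC } ev_op_t;

typedef struct {
    ev_op_t  op;
    uint64_t addr; /* address returned by alloc / passed to dalloc */
    size_t   size; /* request size; ignored for dalloc */
} trace_ev_t;

/* Index of addr in live[0..nlive), or -1 if absent (linear scan for brevity). */
static ptrdiff_t live_find(const uint64_t *live, size_t nlive, uint64_t addr) {
    for (size_t i = 0; i < nlive; i++) {
        if (live[i] == addr) {
            return (ptrdiff_t)i;
        }
    }
    return -1;
}

/*
 * Copy the consistent subset of in[0..n) to out (capacity >= n), preserving
 * order. Returns the number of events kept, or (size_t)-1 if the scratch
 * buffer cannot be allocated.
 */
size_t clean_trace(const trace_ev_t *in, size_t n, trace_ev_t *out) {
    assert(n == 0 || (in != NULL && out != NULL));
    uint64_t *live = malloc((n ? n : 1) * sizeof(*live));
    if (live == NULL) {
        return (size_t)-1;
    }
    size_t nlive = 0, kept = 0;
    for (size_t i = 0; i < n; i++) {
        ptrdiff_t pos = live_find(live, nlive, in[i].addr);
        if (in[i].op == EV_ALLOC) {
            if (pos >= 0) {
                continue; /* address already live: drop duplicate alloc */
            }
            live[nlive++] = in[i].addr;
        } else {
            if (pos < 0) {
                continue; /* dalloc without a matching alloc: drop */
            }
            live[pos] = live[--nlive]; /* swap-remove from the live set */
        }
        out[kept++] = in[i];
    }
    free(live);
    return kept;
}
```

The linear scan keeps the sketch short but makes it quadratic in the number of live objects; a hash set keyed on the address would be the usual choice for large traces.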

2. pa_microbench

A C microbenchmark that replays allocation traces against jemalloc's actual PA system to measure performance and behavior, collecting HPA statistics along the way.

Purpose:

  • Initialize real PA infrastructure (HPA, SEC, base allocators, emaps)
  • Replay allocation/deallocation sequences from CSV traces
  • Measure allocation latency, memory usage, and fragmentation
  • Test different PA configurations (HPA-only vs HPA+SEC)
  • Generate detailed HPA internal statistics
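
The replay-and-measure idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: `malloc`/`free` stand in for the PA calls that pa_microbench actually exercises, and the event layout (`replay_ev_t`, with a `slot` index linking each alloc to its dalloc), the `replay_stats_t` counters, and the function names are all hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical replay event; not the CSV schema used by the real tool. */
typedef struct {
    int    is_alloc; /* 1 = alloc, 0 = dalloc */
    size_t slot;     /* object id linking an alloc to its later dalloc */
    size_t size;     /* request size in bytes; ignored for dalloc */
} replay_ev_t;

typedef struct {
    uint64_t n_alloc;
    uint64_t n_dalloc;
    uint64_t total_ns; /* time spent inside the allocator calls */
} replay_stats_t;

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Replay evs[0..n) using malloc/free as a stand-in for the page allocator.
 * slots[0..nslots) must be all NULL on entry and tracks live objects.
 * Returns 0 on success, -1 on an inconsistent event or allocation failure.
 */
int replay(const replay_ev_t *evs, size_t n, void **slots, size_t nslots,
    replay_stats_t *st) {
    assert(st != NULL);
    st->n_alloc = st->n_dalloc = st->total_ns = 0;
    for (size_t i = 0; i < n; i++) {
        const replay_ev_t *e = &evs[i];
        if (e->slot >= nslots) {
            return -1;
        }
        if (e->is_alloc) {
            if (slots[e->slot] != NULL) {
                return -1; /* slot already holds a live object */
            }
            uint64_t t0 = now_ns();
            void *p = malloc(e->size);
            st->total_ns += now_ns() - t0;
            if (p == NULL) {
                return -1;
            }
            slots[e->slot] = p;
            st->n_alloc++;
        } else {
            if (slots[e->slot] == NULL) {
                return -1; /* dalloc of an object that is not live */
            }
            uint64_t t0 = now_ns();
            free(slots[e->slot]);
            st->total_ns += now_ns() - t0;
            slots[e->slot] = NULL;
            st->n_dalloc++;
        }
    }
    return 0;
}
```

Timing each call individually adds clock overhead that can dominate for fast operations; timing the whole loop and dividing by the event count is a common alternative when only the mean latency is needed.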

Key Features:

  • Real PA Integration: Uses jemalloc's actual PA implementation, not a simulation
  • Multi-shard Support: Tests allocation patterns across multiple PA shards
  • Configurable Modes: Supports HPA-only mode (-p) and HPA+SEC mode (-s)
  • Statistics Output: Detailed per-shard statistics and timing data
  • Configurable Intervals: Customizable statistics output frequency (-i/--interval)
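
As a rough illustration of how the flags mentioned in this README (-p, -s, -o, -i/--interval, -h) fit together, here is a minimal `getopt_long` sketch. It is not pa_microbench's actual parser: the `bench_opts_t` struct, the `parse_opts` name, and the choice to treat 0 as "interval not given" are assumptions, and only `--interval` is registered as a long option because it is the only long name stated here.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical option record for illustration only. */
typedef struct {
    int         use_sec;    /* 1 = HPA+SEC (-s, default), 0 = HPA-only (-p) */
    const char *stats_path; /* -o <file>, NULL if not given */
    long        interval;   /* -i/--interval <n>; 0 means "not given" */
    const char *trace_path; /* positional trace CSV */
    int         show_help;  /* -h */
} bench_opts_t;

/* Returns 0 on success, -1 on a bad option or a missing trace file. */
int parse_opts(int argc, char **argv, bench_opts_t *o) {
    static const struct option longopts[] = {
        {"interval", required_argument, NULL, 'i'},
        {NULL, 0, NULL, 0},
    };
    assert(o != NULL);
    o->use_sec = 1;
    o->stats_path = NULL;
    o->interval = 0;
    o->trace_path = NULL;
    o->show_help = 0;

    optind = 1; /* restart scanning so the function can be called again */
    opterr = 0; /* the caller reports errors; keep getopt quiet */
    int c;
    while ((c = getopt_long(argc, argv, "pso:i:h", longopts, NULL)) != -1) {
        switch (c) {
        case 'p': o->use_sec = 0; break;
        case 's': o->use_sec = 1; break;
        case 'o': o->stats_path = optarg; break;
        case 'i': {
            char *end;
            long v = strtol(optarg, &end, 10);
            if (*end != '\0' || v <= 0) {
                return -1; /* interval must be a positive integer */
            }
            o->interval = v;
            break;
        }
        case 'h': o->show_help = 1; break;
        default: return -1; /* unknown option or missing argument */
        }
    }
    if (o->show_help) {
        return 0; /* no trace file needed just to print help */
    }
    if (optind >= argc) {
        return -1; /* trace file is required */
    }
    o->trace_path = argv[optind];
    return 0;
}
```

Resetting `optind` to 1 is enough to rescan on glibc; BSD-derived C libraries additionally expect `optreset = 1`.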

Building

Compilation

# Build both PA microbenchmark tools
cd /path/to/jemalloc
make tests_pa

This creates:

  • test/stress/pa/pa_data_preprocessor - Data preprocessing tool
  • test/stress/pa/pa_microbench - PA microbenchmark

Usage

Data Preprocessing

# Basic preprocessing
./test/stress/pa/pa_data_preprocessor <hpa|sec> input_trace.txt output.csv

Microbenchmark Execution

# Run with HPA + SEC (default mode)
./test/stress/pa/pa_microbench -s -o stats.csv trace.csv

# Run with HPA-only (no SEC)
./test/stress/pa/pa_microbench -p -o stats.csv trace.csv

# Show help
./test/stress/pa/pa_microbench -h