sharded fixed-size object allocator with a lock-free hot path
#include <taskflow/utility/object_pool.hpp>
Public Member Functions | |
| ObjectPool ()=default | |
| constructs the allocator with 2^LogSize empty shards | |
| ObjectPool (const ObjectPool &)=delete | |
| disabled copy constructor | |
| ObjectPool & | operator= (const ObjectPool &)=delete |
| disabled copy assignment operator | |
| ~ObjectPool ()=default | |
| destroys the allocator and releases all backing memory to upstream | |
| template<typename... Args> | |
| T * | animate (Args &&... args) |
| constructs an object of type T in the pool and returns a pointer | |
| void | recycle (T *obj) |
| destructs the object and returns its storage to the pool | |
| void | release () |
| returns all recycled blocks and backing memory to the system allocator | |
sharded fixed-size object allocator with a lock-free hot path
| T | object type to allocate |
| H | tagged-pointer policy that controls the free-stack head representation. The type must provide:
|
| LogSize | log2 of the number of shards (default 5, giving 32 shards); must be in [1, 15] to fit the shard index in a uint16_t |
ObjectPool is a high-performance allocator for a single fixed-size type T, designed for concurrent task-parallel workloads where objects are frequently created and destroyed across many threads.
Internally, allocations are distributed across 2^LogSize independent shards. Each shard maintains two independent components (separated by cache lines to prevent false sharing):
Hot Path (99% of operations): A lock-free Treiber stack of recycled blocks. When tf::ObjectPool::animate is called, it tries to pop a recycled block from this stack with a single atomic CAS. On success, the block is reused with no mutex acquisition. Blocks returned by tf::ObjectPool::recycle are pushed back onto this stack without acquiring any mutex.
Cold Path (1% of operations): A std::pmr::synchronized_pool_resource as backing storage for fresh block allocations. This mutex-protected pool is only touched when the shard's hot-path stack is empty. When accessed, it allocates a whole chunk (configured to hold up to 1024 blocks via max_blocks_per_chunk = 1024), amortizing the synchronization cost: one mutex acquisition yields ~1024 blocks for the hot path.
The tagged-pointer policy H attaches a version counter to each free-stack head to prevent the ABA problem. This counter increments on every push and pop, making ABA wrap-around effectively impossible under realistic workloads. Shards are aligned to the cache line size to eliminate false sharing between concurrent threads accessing different shards' hot-path stacks.
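The versioned-head idea can be sketched in a few lines. The sketch below is illustrative only and is not the Taskflow implementation: `TaggedFreeStack` is a hypothetical name, and it stores slot indices instead of raw pointers so that a 32-bit version tag and a 32-bit index fit in one 64-bit atomic word. The actual policy H may pack the tag differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a Treiber stack over slot indices. The head packs a
// 32-bit version tag (high half) with a 32-bit slot index (low half).
class TaggedFreeStack {
 public:
  static constexpr std::uint32_t kNull = 0xFFFFFFFFu;  // "empty" marker

  explicit TaggedFreeStack(std::size_t capacity) : next_(capacity) {}

  // Pushes slot `idx`. The tag is bumped on every successful push.
  void push(std::uint32_t idx) {
    std::uint64_t old_head = head_.load(std::memory_order_relaxed);
    std::uint64_t new_head;
    do {
      next_[idx].store(index_of(old_head), std::memory_order_relaxed);
      new_head = pack(tag_of(old_head) + 1, idx);
    } while (!head_.compare_exchange_weak(old_head, new_head,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Pops a slot index, or returns kNull when the stack is empty.
  // The tag is bumped on every successful pop, so a head that was popped
  // and re-pushed in between never compares equal to a stale snapshot.
  std::uint32_t pop() {
    std::uint64_t old_head = head_.load(std::memory_order_acquire);
    for (;;) {
      const std::uint32_t idx = index_of(old_head);
      if (idx == kNull) return kNull;
      const std::uint32_t next = next_[idx].load(std::memory_order_relaxed);
      const std::uint64_t new_head = pack(tag_of(old_head) + 1, next);
      if (head_.compare_exchange_weak(old_head, new_head,
                                      std::memory_order_acquire,
                                      std::memory_order_acquire)) {
        return idx;
      }
    }
  }

  // Current version tag; exposed only to make the counter observable.
  std::uint32_t version() const { return tag_of(head_.load()); }

 private:
  static std::uint64_t pack(std::uint32_t tag, std::uint32_t idx) {
    return (static_cast<std::uint64_t>(tag) << 32) | idx;
  }
  static std::uint32_t tag_of(std::uint64_t h) {
    return static_cast<std::uint32_t>(h >> 32);
  }
  static std::uint32_t index_of(std::uint64_t h) {
    return static_cast<std::uint32_t>(h);
  }

  std::atomic<std::uint64_t> head_{pack(0, kNull)};
  std::vector<std::atomic<std::uint32_t>> next_;  // per-slot "next" links
};
```

Because the compare-and-swap covers the tag as well as the index, a thread holding a stale head fails its CAS even if the same slot has returned to the top of the stack in the meantime.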
The combination of lock-free and mutex-protected freelists is deliberate: recycled blocks remain on the lock-free stack indefinitely, so reusing them never costs a mutex acquisition. The backing pool's internal freelist is rarely used directly, because recycled blocks are not passed to deallocate() on the normal hot path; they stay on the lock-free stack for immediate reuse. This design trades chunk-level memory reuse efficiency for fast, atomic-only allocation on the hot path, which suits task-parallel workloads where the hot path is hit millions of times.
When the hot-path stack is empty, a single std::pmr::synchronized_pool_resource::allocate call acquires a mutex and either reuses a chunk or allocates a new one from the system allocator. With max_blocks_per_chunk = 1024, one mutex acquisition is amortized over up to ~1024 subsequent lock-free pops, so the mutex overhead is negligible (roughly 1/1024 ≈ 0.001 mutex acquisitions per allocation).
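The chunking behavior of the backing pool can be observed with the standard library alone. The following is a minimal sketch, not Taskflow code: `CountingUpstream` and `upstream_calls_for` are hypothetical helpers, and the 64-byte block size in the usage note is an arbitrary assumption. It configures a std::pmr::synchronized_pool_resource with max_blocks_per_chunk = 1024 and counts how often the pool requests memory from its upstream resource. The exact count is implementation-defined, so the sketch only illustrates that it is far smaller than the number of blocks handed out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical upstream resource that counts requests made by the pool.
class CountingUpstream : public std::pmr::memory_resource {
 public:
  std::size_t calls = 0;  // number of do_allocate invocations

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    ++calls;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
};

// Allocates `n` blocks of `block_size` bytes from a pool configured with
// max_blocks_per_chunk = 1024 and returns how many upstream requests the
// pool needed to serve them.
inline std::size_t upstream_calls_for(std::size_t n, std::size_t block_size) {
  CountingUpstream upstream;  // must outlive the pool declared below
  std::pmr::pool_options opts;
  opts.max_blocks_per_chunk = 1024;
  opts.largest_required_pool_block = block_size;

  std::size_t calls = 0;
  {
    std::pmr::synchronized_pool_resource pool(opts, &upstream);
    std::vector<void*> blocks;
    blocks.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
      blocks.push_back(pool.allocate(block_size, alignof(std::max_align_t)));
    }
    calls = upstream.calls;  // snapshot before anything is returned
    for (void* p : blocks) {
      pool.deallocate(p, block_size, alignof(std::max_align_t));
    }
  }  // pool destructor returns all chunks to `upstream`
  return calls;
}
```

Calling `upstream_calls_for(4096, 64)` returns a value well below 4096 on common standard library implementations, since each upstream request yields a chunk holding many blocks.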
|
default |
constructs the allocator with 2^LogSize empty shards
Each shard is default-constructed with an empty free stack and an uninitialized backing pool. No memory is allocated from the OS until the first call to tf::ObjectPool::animate.
|
delete |
disabled copy constructor
ObjectPool owns its shards and backing memory; copying is not meaningful. Declare the allocator as a global or long-lived member and share it by reference or pointer.
|
default |
destroys the allocator and releases all backing memory to upstream
The destructor of each shard's std::pmr::synchronized_pool_resource returns all allocated chunks to the system allocator, including memory backing blocks that are currently on the free stack. No per-block destructor is called; callers are responsible for recycling all live objects before destroying the allocator.
|
inline nodiscard |
constructs an object of type T in the pool and returns a pointer
| Args | constructor argument types |
| args | arguments forwarded to the constructor of T |
Returns a pointer to the constructed T; never null.
On the hot path, animate pops a previously recycled block from the shard's lock-free free stack and constructs T in it via std::construct_at, with no mutex acquisition. On a miss (empty free stack), a fresh block is carved from the shard's backing std::pmr::synchronized_pool_resource, which amortizes system allocation cost over chunks of up to 1024 blocks.
Allocations are distributed across shards via a per-thread round-robin counter seeded from the thread ID hash, balancing load with zero shared state after initialization.
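One way to realize such a per-thread round-robin counter is sketched below. `next_shard` is a hypothetical free function written for illustration; how the library actually stores and advances its counter is not shown in this page.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical shard picker. LogSize = 5 yields 32 shards, matching the
// documented default.
template <std::size_t LogSize = 5>
std::size_t next_shard() {
  constexpr std::size_t kMask = (std::size_t{1} << LogSize) - 1;
  // Seeded once per thread from the hashed thread id. After this
  // initialization the counter is purely thread-local: no shared state.
  thread_local std::size_t counter =
      std::hash<std::thread::id>{}(std::this_thread::get_id());
  // Unsigned wrap-around is well defined, and the mask keeps the result
  // in [0, 2^LogSize).
  return counter++ & kMask;
}
```

Successive calls on one thread walk the shards in order and wrap around after 2^LogSize calls, while different threads typically start at different offsets because their seeds differ.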
|
inline |
destructs the object and returns its storage to the pool
| obj | pointer to a T previously returned by tf::ObjectPool::animate, or nullptr (no-op) |
recycle calls the destructor of *obj via std::destroy_at, then pushes the underlying block onto its shard's lock-free free stack without acquiring any mutex. The block becomes immediately available for the next call to tf::ObjectPool::animate on any thread.
The correct shard is identified via the pool_id stored in the block header, so recycle may be called from any thread regardless of which thread called animate.
After this call, obj is a dangling pointer and must not be dereferenced.
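The header lookup that makes cross-thread recycling possible can be illustrated as follows. This is a sketch under assumptions: the `Block` layout and the helpers `construct_in`, `block_of`, and `destroy_and_get_shard` are hypothetical, and the real block header may contain other fields. Placement new stands in for std::construct_at so that the sketch also compiles as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <string>
#include <utility>

// Hypothetical block layout: a small header followed by storage for one T.
template <typename T>
struct Block {
  std::uint16_t pool_id;                        // index of the owning shard
  alignas(T) unsigned char storage[sizeof(T)];  // raw storage for the object
};

// Tags the block with its shard index and constructs a T in its storage.
template <typename T, typename... Args>
T* construct_in(Block<T>& block, std::uint16_t shard, Args&&... args) {
  block.pool_id = shard;
  return ::new (static_cast<void*>(block.storage)) T(std::forward<Args>(args)...);
}

// Recovers the enclosing block from an object pointer by stepping back
// over the header (Block<T> is standard-layout, so offsetof is valid).
template <typename T>
Block<T>* block_of(T* obj) {
  auto* bytes = reinterpret_cast<unsigned char*>(obj);
  return reinterpret_cast<Block<T>*>(bytes - offsetof(Block<T>, storage));
}

// Destroys the object and reports which shard its block belongs to. A real
// pool would then push the block onto that shard's free stack.
template <typename T>
std::uint16_t destroy_and_get_shard(T* obj) {
  Block<T>* block = block_of(obj);
  std::destroy_at(obj);
  return block->pool_id;
}
```

Because the shard index travels with the block, the recycling thread needs no knowledge of which thread or shard produced the object.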
|
inline |
returns all recycled blocks and backing memory to the system allocator
release calls std::pmr::synchronized_pool_resource::release on each shard's backing pool, returning all chunks to the upstream system allocator in one shot, then atomically resets each shard's free stack to null. No per-block work is performed: the backing pool owns memory at the chunk level and frees entire chunks regardless of how many individual blocks were returned to it, so the cost per shard depends on the number of chunks rather than the number of blocks.
After this call the allocator is in the same state as after construction: empty free stacks, no memory held from the OS.
This method is optional and is not required before destruction. It is useful for reclaiming pool memory between distinct workload phases without destroying the allocator itself.
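The chunk-level behavior that release relies on can be demonstrated with the standard pool resource by itself. The sketch below is not Taskflow code: `TrackingUpstream`, `PhaseReport`, and `run_two_phases` are hypothetical names, and the block count and size are arbitrary. It tracks how many bytes a std::pmr::synchronized_pool_resource holds from its upstream before and after release(), without deallocating individual blocks first.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical upstream resource that tracks bytes currently held by the pool.
class TrackingUpstream : public std::pmr::memory_resource {
 public:
  std::size_t outstanding = 0;  // bytes allocated and not yet returned

 private:
  void* do_allocate(std::size_t bytes, std::size_t align) override {
    outstanding += bytes;
    return std::pmr::new_delete_resource()->allocate(bytes, align);
  }
  void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
    outstanding -= bytes;
    std::pmr::new_delete_resource()->deallocate(p, bytes, align);
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
    return this == &other;
  }
};

struct PhaseReport {
  std::size_t held_during;         // bytes held while phase 1 blocks exist
  std::size_t held_after_release;  // bytes held right after release()
  std::size_t held_after_reuse;    // bytes held once phase 2 allocates again
};

inline PhaseReport run_two_phases() {
  TrackingUpstream upstream;  // must outlive the pool
  std::pmr::synchronized_pool_resource pool(&upstream);
  PhaseReport report{};

  // Phase 1: allocate blocks and deliberately never deallocate them.
  for (int i = 0; i < 100; ++i) {
    (void)pool.allocate(64);
  }
  report.held_during = upstream.outstanding;

  // release() hands every chunk back to upstream without per-block calls.
  pool.release();
  report.held_after_release = upstream.outstanding;

  // Phase 2: the pool remains usable and acquires memory again on demand.
  void* p = pool.allocate(64);
  report.held_after_reuse = upstream.outstanding;
  pool.deallocate(p, 64);
  return report;
}
```

In this sketch, memory drops to zero after release() and grows again on the next allocation, which mirrors the between-phases use described above.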