SKILL.md
$27
Installation
Copy the header into your project
git clone https://github.com/LaurieWired/tailslayer.git
cp -r tailslayer/include/tailslayer /your/project/include/
Include in your code
#include <tailslayer/hedged_reader.hpp>
Build the provided example
git clone https://github.com/LaurieWired/tailslayer.git
cd tailslayer
make
./tailslayer_example
Key API
tailslayer::HedgedReader
Template parameters:
Parameter
Description
T
Value type stored and read
SignalFn
Function that waits for a trigger and returns the index to read
WorkFn
Function called with the value immediately after read
SignalArgs
(optional) tailslayer::ArgList<...> of compile-time args to signal function
WorkArgs
(optional) tailslayer::ArgList<...> of compile-time args to work function
Constructor optional parameters
HedgedReader(
uint64_t channel_offset = DEFAULT_OFFSET, // undocumented channel scrambling offset
uint64_t channel_bit = DEFAULT_BIT, // bit used for channel selection
std::size_t n_replicas = 2 // number of DRAM channel replicas
)
Methods
reader.insert(T value); // Insert value, replicated across all channels
reader.start_workers(); // Launch per-channel worker threads (blocking)
Utilities
tailslayer::pin_to_core(core_id); // Pin calling thread to a specific core
tailslayer::CORE_MAIN // Constant: recommended core for main thread
Minimal Usage Pattern
#include <tailslayer/hedged_reader.hpp>
#include <cstdint>
#include <cstdio>
// 1. Define your signal function — waits for your event, returns index to read
[[gnu::always_inline]] inline std::size_t my_signal() {
// Example: busy-wait for an external flag, then return the index
extern volatile std::size_t g_index;
extern volatile bool g_trigger;
while (!g_trigger) {}
g_trigger = false;
return g_index;
}
// 2. Define your work function — receives the read value immediately
template <typename T>
[[gnu::always_inline]] inline void my_work(T val) {
// Process val as fast as possible
printf("Read value: %u\n", (unsigned)val);
}
int main() {
using T = uint8_t;
// Pin main thread to recommended core
tailslayer::pin_to_core(tailslayer::CORE_MAIN);
// Construct reader with 2 replicas (default)
tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{};
// Insert data — replicated across both DRAM channels automatically
reader.insert(0x43);
reader.insert(0x44);
// Launch workers — blocks; workers spin until signal fires
reader.start_workers();
return 0;
}
Passing Arguments to Signal and Work Functions
Use tailslayer::ArgList<...> to pass compile-time integer arguments:
#include <tailslayer/hedged_reader.hpp>
// Signal function with args
[[gnu::always_inline]] inline std::size_t my_signal(int threshold, int channel) {
// use threshold and channel...
return 0;
}
// Work function with args
template <typename T>
[[gnu::always_inline]] inline void my_work(T val, int multiplier) {
volatile int result = (int)val * multiplier;
(void)result;
}
int main() {
using T = uint8_t;
tailslayer::pin_to_core(tailslayer::CORE_MAIN);
tailslayer::HedgedReader<
T,
my_signal,
my_work<T>,
tailslayer::ArgList<10, 1>, // args forwarded to my_signal: threshold=10, channel=1
tailslayer::ArgList<2> // args forwarded to my_work: multiplier=2
> reader{};
reader.insert(0xAB);
reader.start_workers();
}
Custom Channel Configuration
Override channel offset, channel bit, and replica count in the constructor:
// Example: 4 replicas, custom channel bit 8 (common for AMD/Intel)
tailslayer::HedgedReader<T, my_signal, my_work<T>> reader{
/* channel_offset */ 0,
/* channel_bit */ 8,
/* n_replicas */ 4
};
Note: N-way (more than 2 replicas) hedging requires using the benchmark code in discovery/benchmark/. The main library header currently exposes 2 channels by default.
Running Benchmarks
Channel-hedged read benchmark (N-way)
cd discovery/benchmark
make
sudo chrt -f 99 ./hedged_read_cpp --all --channel-bit 8
Flags:
Flag
Description
--all
Run all channel configurations
--channel-bit N
Specify the DRAM channel selection bit (try 6, 7, or 8 for your platform)
DRAM refresh spike timing probe
cd discovery
gcc -O2 -o trefi_probe trefi_probe.c
sudo ./trefi_probe
This measures your DRAM's tREFI refresh interval and the worst-case stall duration — useful for calibrating expectations.
Platform Notes
Platform
Typical Channel Bit
Notes
AMD (Zen)
6 or 7
Verify with benchmark
Intel
6, 7, or 8
Run benchmark with --all
AWS Graviton
8
Confirmed working
Use --all in the benchmark to auto-detect the best channel bit for your system.
Common Patterns
Low-latency trading / event-driven read
// Pre-load order book prices into hedged reader
// Signal on market data arrival, process immediately
[[gnu::always_inline]] inline std::size_t await_market_signal() {
extern volatile std::size_t g_book_idx;
extern volatile bool g_tick;
while (!g_tick) { __builtin_ia32_pause(); }
g_tick = false;
return g_book_idx;
}
template <typename T>
[[gnu::always_inline]] inline void process_price(T price) {
// Submit order using price with minimal latency
extern void submit_order(T);
submit_order(price);
}
int main() {
tailslayer::pin_to_core(tailslayer::CORE_MAIN);
tailslayer::HedgedReader<uint64_t, await_market_signal, process_price<uint64_t>> reader{};
for (uint64_t price : preloaded_prices) {
reader.insert(price);
}
reader.start_workers();
}
Preloading a lookup table across channels
// Each insert automatically maps to correct DRAM channel via address calculation
// Access is via logical index — tailslayer manages physical placement
tailslayer::HedgedReader<uint32_t, my_signal, my_work<uint32_t>> reader{};
std::vector<uint32_t> lut = {100, 200, 300, 400};
for (auto v : lut) {
reader.insert(v);
}
reader.start_workers();
Troubleshooting
High latency still observed
- Verify you are using the correct
--channel-bitfor your CPU. Run benchmark with--all.
- Ensure workers are pinned to isolated cores (use
isolcpus=kernel boot parameter).
- Run with real-time scheduling:
sudo chrt -f 99 ./your_binary
Build errors — missing headers
- Confirm
include/tailslayer/hedged_reader.hppis on your include path.
- Requires C++17 or later: add
-std=c++17to your compiler flags.
Workers don't start / deadlock
start_workers()is blocking. It launches threads and waits — your signal function must eventually return.
- Ensure the signal function does not block indefinitely during testing.
Data corruption / wrong values
- Each
insert()replicates the value N times (one per channel). Logical indexing is handled internally — do not attempt to address replicas directly.
- Do not modify inserted data after
insert()is called.
Platform not supported
- Tailslayer uses undocumented DRAM channel scrambling offsets. If your platform is not AMD, Intel, or Graviton, run the trefi_probe and benchmark tools to characterize refresh behavior before using the library in production.
Project Structure
tailslayer/
├── include/tailslayer/
│ └── hedged_reader.hpp # Main library header (copy this)
├── tailslayer_example.cpp # Usage example
├── discovery/
│ ├── trefi_probe.c # DRAM refresh spike timing tool
│ └── benchmark/ # N-way channel hedging benchmark
└── Makefile