how it works
the engine's design.
the architecture of unhurried compression
iris treats data compression as a perception problem. profile first, route second.
the perception pass
Most compressors are blind to what they compress. They apply the same sliding window or entropy coder regardless of what the data actually looks like.
iris takes the opposite approach: read the dataset once, understand its global structural profile, and only then route the data through the stage best suited to eliminate what it finds.
three phases
observe → profile
A single read-only pass calculates byte entropy, autocorrelation at all lags up to 64, and block fingerprints. The profiler builds the context map that every subsequent stage uses.
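A minimal sketch of two of the statistics named above, byte entropy and autocorrelation over lags 1 to 64. The function names (`byte_entropy`, `autocorrelation`, `best_lag`) are illustrative assumptions rather than iris's actual API, and block fingerprinting is left out.

```rust
/// Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).
fn byte_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Normalized autocorrelation of the byte stream at `lag`, in [-1, 1].
/// Returns 0.0 for degenerate inputs (lag of 0, lag too large, constant data).
fn autocorrelation(data: &[u8], lag: usize) -> f64 {
    if lag == 0 || lag >= data.len() {
        return 0.0;
    }
    let n = data.len() as f64;
    let mean = data.iter().map(|&b| b as f64).sum::<f64>() / n;
    let var: f64 = data.iter().map(|&b| (b as f64 - mean).powi(2)).sum();
    if var == 0.0 {
        return 0.0;
    }
    // Pair each byte with the byte `lag` positions before it.
    let cov: f64 = data[lag..]
        .iter()
        .zip(data)
        .map(|(&a, &b)| (a as f64 - mean) * (b as f64 - mean))
        .sum();
    cov / var
}

/// Scans lags 1..=max_lag and returns the lag with the highest score.
fn best_lag(data: &[u8], max_lag: usize) -> Option<(usize, f64)> {
    (1..=max_lag)
        .map(|lag| (lag, autocorrelation(data, lag)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Calling `best_lag(&data, 64)` on a buffer that repeats every four bytes should report a peak at lag 4, which is the kind of signal the router can act on.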
decompose → router
Structured text is routed to the Column Grammar engine. Periodic binary data hits the Resonance stage. Large-scale structural similarity is handled by the Prediction Graph.
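The routing decision can be pictured as a small match over the profile. This is only a sketch: the `Profile` fields, the thresholds, and the `Direct` fallback are hypothetical placeholders, since the actual routing criteria are not described here.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    ColumnGrammar,   // structured text
    Resonance,       // periodic binary data
    PredictionGraph, // large-scale structural similarity
    Direct,          // hypothetical fallback: straight to the coder
}

/// Hypothetical summary of the profile pass; field names are assumptions.
struct Profile {
    text_fraction: f64,         // share of bytes that look like text
    peak_autocorrelation: f64,  // best score over lags 1..=64
    peak_lag: usize,            // lag at which that score occurs (0 = none)
    similar_block_pairs: usize, // fingerprint matches found across the file
}

/// Picks a stage from the profile. All thresholds are illustrative only.
fn route(p: &Profile) -> Route {
    if p.text_fraction > 0.95 {
        Route::ColumnGrammar
    } else if p.peak_lag > 0 && p.peak_autocorrelation > 0.8 {
        Route::Resonance
    } else if p.similar_block_pairs > 0 {
        Route::PredictionGraph
    } else {
        Route::Direct
    }
}
```

The point of the sketch is the ordering: the decision reads only the profile, never the data itself, so routing costs nothing beyond the single profiling pass.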
pack → coder
The final residuals, stripped of all identifiable structure, are packed by our internal pure-Rust rANS and range-coding engines. In tensor mode, per-block Salience scores decide which sub-coder handles each block. Zero external libraries, just pure information theory.
nearsorted.bin
the pipeline in practice
phase i (profile): The read-only profiler builds the context map. Autocorrelation peaks at lag=4. The running_surprise metric stays near 0.02, which marks the file as a high-resonance target.
phase ii (eliminate): The Resonance extraction engine fires at lag=4. Each residual byte is data[i].wrapping_sub(data[i-4]), a byte-wise difference modulo 256. The structural periodicity is stripped, leaving a near-zero residual stream.
phase iii (pack): The final 4 MB of residuals are squashed into 9,857 bytes by the internal rANS engine. Final ratio: 405x.
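The lag subtraction in phase ii is lossless because wrapping subtraction has an exact inverse. A minimal sketch, assuming the first `lag` bytes are stored verbatim; the function names `lag_residual` and `lag_restore` are illustrative, not the engine's own.

```rust
/// Replaces each byte with its wrapping difference from the byte `lag`
/// positions earlier. The first `lag` bytes have no predecessor and are
/// copied through unchanged (an assumption of this sketch).
fn lag_residual(data: &[u8], lag: usize) -> Vec<u8> {
    assert!(lag > 0, "lag must be at least 1");
    data.iter()
        .enumerate()
        .map(|(i, &b)| if i < lag { b } else { b.wrapping_sub(data[i - lag]) })
        .collect()
}

/// Inverse of `lag_residual`: rebuilds the original bytes left to right,
/// adding each residual back onto the already-restored byte `lag` earlier.
fn lag_restore(residual: &[u8], lag: usize) -> Vec<u8> {
    assert!(lag > 0, "lag must be at least 1");
    let mut out = Vec::with_capacity(residual.len());
    for (i, &r) in residual.iter().enumerate() {
        let b = if i < lag { r } else { r.wrapping_add(out[i - lag]) };
        out.push(b);
    }
    out
}
```

For data that repeats exactly every four bytes, every residual after the first four is zero, which is why the packed output can be so small.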
prediction graph
the architecture
Traditional LZ77 compressors are trapped by a small sliding window (usually 32 KB to 1 MB). If a block repeats 10 MB away, they can't "see" it.
iris removes this limit. Using SimHash fingerprints and LSH bands during the initial profile pass, the **Prediction Graph** identifies similar blocks regardless of their physical distance in the file.
Similar blocks are XOR-delta'd against their nearest neighbor, leaving a near-zero residual for the final coder.
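A rough sketch of the two ingredients described above: a 64-bit SimHash fingerprint per block and an XOR delta against a chosen reference block. The 4-byte shingle size, the FNV-1a hash, and all function names are assumptions made for illustration. LSH banding and the graph itself are omitted.

```rust
/// FNV-1a, 64-bit: a small deterministic hash, used here only for illustration.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// 64-bit SimHash over overlapping 4-byte shingles of a block.
/// Each shingle votes +1 or -1 on every bit; the sign of the tally is the bit.
fn simhash(block: &[u8]) -> u64 {
    let mut votes = [0i32; 64];
    for shingle in block.windows(4) {
        let h = fnv1a64(shingle);
        for (bit, vote) in votes.iter_mut().enumerate() {
            if (h >> bit) & 1 == 1 {
                *vote += 1;
            } else {
                *vote -= 1;
            }
        }
    }
    votes
        .iter()
        .enumerate()
        .fold(0u64, |acc, (bit, &v)| if v > 0 { acc | (1u64 << bit) } else { acc })
}

/// Number of differing fingerprint bits; smaller means more similar blocks.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// XOR of a block against its reference. Applying it twice with the same
/// reference returns the original block, so no separate decoder is needed.
fn xor_delta(block: &[u8], reference: &[u8]) -> Vec<u8> {
    assert_eq!(block.len(), reference.len(), "blocks must be the same length");
    block.iter().zip(reference).map(|(&a, &b)| a ^ b).collect()
}
```

Because fingerprints are compared instead of raw bytes, two blocks can be matched no matter how far apart they sit in the file. The delta is then zero wherever the blocks agree.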
shared architecture
By moving the perception of the data to the very beginning, iris eliminates the need for expensive re-reads or complex back-tracking during compression.
The entire pipeline is built in pure Rust 1.75+, leveraging context-packing and range asymmetric numeral systems for maximum density.
pipeline flow
- → phase 1: profile table
- → phase 1.5: salience map (tensor block routing)
- → phase 2: structural elimination
- → phase 3: rANS pack
- → result: .iris standalone