how it works

the engine's design.

the anatomy of perception

the architecture of unhurried compression

iris treats data compression as a perception problem. profile first, route second.

the perception pass

Most compressors are blind to their input: they apply the same sliding window or entropy coder regardless of what the data actually looks like.

iris takes the opposite approach: read the dataset once, understand its global structural profile, and only then route the data through the stage best suited to eliminate what it finds.

three phases

i

observe → profile

A single read-only pass calculates byte entropy, autocorrelation at all lags up to 64, and block fingerprints. The profiler builds the context map that every subsequent stage uses.
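As a rough illustration of what such a pass measures, here is a minimal Rust sketch of two of the statistics named above: byte entropy and autocorrelation over lags 1 to 64. The function names (`byte_entropy`, `autocorrelation`, `strongest_lag`) and the normalization are assumptions made for this example, not iris's actual internals, and block fingerprinting is left out.

```rust
/// Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).
pub fn byte_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Autocorrelation at `lag`, normalized by the total variance of the input.
/// Returns 0.0 for degenerate inputs (lag 0, lag past the end, constant data).
pub fn autocorrelation(data: &[u8], lag: usize) -> f64 {
    if lag == 0 || lag >= data.len() {
        return 0.0;
    }
    let n = data.len() as f64;
    let mean = data.iter().map(|&b| b as f64).sum::<f64>() / n;
    let var: f64 = data.iter().map(|&b| (b as f64 - mean).powi(2)).sum();
    if var == 0.0 {
        return 0.0;
    }
    // Pair each byte with the byte `lag` positions before it.
    let cov: f64 = data[lag..]
        .iter()
        .zip(data)
        .map(|(&a, &b)| (a as f64 - mean) * (b as f64 - mean))
        .sum();
    cov / var
}

/// Lag in 1..=max_lag with the highest autocorrelation, if any.
pub fn strongest_lag(data: &[u8], max_lag: usize) -> Option<(usize, f64)> {
    (1..=max_lag)
        .map(|lag| (lag, autocorrelation(data, lag)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

For a buffer that repeats a four-byte pattern, `strongest_lag(&data, 64)` picks lag 4, because shorter lags line up fewer matching bytes and longer multiples of 4 cover fewer pairs.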

ii

decompose → router

Structured text is routed to the Column Grammar engine. Periodic binary data hits the Resonance stage. Large-scale structural similarity is handled by the Prediction Graph.
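The routing decision can be pictured as a small function from profile to stage. The sketch below is hypothetical throughout: the `Profile` fields, the thresholds, the priority order, and the `CoderOnly` fallback are all assumptions for illustration. Only the three stage names come from the description above.

```rust
/// Stages named in the text, plus an assumed fallback that skips straight to the coder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    ColumnGrammar,
    Resonance,
    PredictionGraph,
    CoderOnly, // hypothetical: no structure found
}

/// Hypothetical summary of the profile pass; field names are illustrative.
pub struct Profile {
    pub looks_like_structured_text: bool,
    pub peak_autocorrelation: f64,
    pub similar_block_fraction: f64,
}

/// Pick one stage from the profile. Thresholds and ordering are assumed values.
pub fn route(p: &Profile) -> Stage {
    const PERIODIC_THRESHOLD: f64 = 0.9; // assumed cutoff for "periodic"
    const SIMILARITY_THRESHOLD: f64 = 0.25; // assumed share of blocks with a near match

    if p.looks_like_structured_text {
        Stage::ColumnGrammar
    } else if p.peak_autocorrelation >= PERIODIC_THRESHOLD {
        Stage::Resonance
    } else if p.similar_block_fraction >= SIMILARITY_THRESHOLD {
        Stage::PredictionGraph
    } else {
        Stage::CoderOnly
    }
}
```

A real router may well apply more than one stage to the same input. The single-choice form here just keeps the idea visible.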

iii

pack → coder

The final residuals, stripped of all identifiable structure, are packed by our internal pure-Rust rANS and range coding engines. In tensor mode, Salience scores select which sub-coder to use per block. Zero external libraries, just pure information theory.

nearsorted.bin

the pipeline in practice

phase i (profile): The read-only profiler builds the context map. Autocorrelation peaks at lag=4. The running_surprise metric remains near 0.02, identifying a high-resonance target.

phase ii (eliminate): The Resonance extraction engine fires at lag=4. The residual is data[i].wrapping_sub(data[i - 4]). The structural periodicity is stripped, leaving a near-zero residual stream.
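The lag subtraction is easy to sketch, along with the inverse a decoder would need. The function names `lag_residual` and `lag_restore` are made up for this example, and copying the first `lag` bytes through unchanged is an assumption about how the stream is seeded.

```rust
/// Residual of each byte against the byte `lag` positions earlier, modulo 256.
/// The first `lag` bytes have no predecessor and are copied through unchanged (assumed).
pub fn lag_residual(data: &[u8], lag: usize) -> Vec<u8> {
    assert!(lag > 0, "lag must be at least 1");
    data.iter()
        .enumerate()
        .map(|(i, &b)| if i < lag { b } else { b.wrapping_sub(data[i - lag]) })
        .collect()
}

/// Inverse of `lag_residual`: rebuild the original bytes from the residual stream.
pub fn lag_restore(residual: &[u8], lag: usize) -> Vec<u8> {
    assert!(lag > 0, "lag must be at least 1");
    let mut out: Vec<u8> = Vec::with_capacity(residual.len());
    for (i, &r) in residual.iter().enumerate() {
        // Each restored byte depends only on bytes already written to `out`.
        let v = if i < lag { r } else { r.wrapping_add(out[i - lag]) };
        out.push(v);
    }
    out
}
```

Wrapping arithmetic keeps every residual in a single byte and makes the transform exactly reversible. On data with a clean period of 4, everything after the first four bytes becomes zero, which is what the final coder wants to see.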

phase iii (pack): The final 4 MB of residuals are squashed into 9,857 bytes by the internal rANS engine. Final ratio: 405x.

prediction graph

the architecture

Traditional LZ77 compressors are limited by a small sliding window (usually 32 KB to 1 MB). If a block repeats 10 MB away, they can't "see" it.

iris removes this limit. Using SimHash fingerprints and LSH bands during the initial profile pass, the **Prediction Graph** identifies similar blocks regardless of their physical distance in the file.

Similar blocks are XOR-delta'd against their nearest neighbor, leaving a nearly-zero residual for the final coder.
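To make the fingerprint-then-delta idea concrete, here is a minimal Rust sketch of a 64-bit SimHash over byte shingles, a split of the fingerprint into LSH bands, and an XOR delta against a reference block. The shingle width, the FNV-1a stand-in hash, the four 16-bit bands, and all function names are assumptions for illustration. The source does not say which parameters iris uses.

```rust
/// FNV-1a (64-bit), used here only as a simple stand-in hash for shingles.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// 64-bit SimHash: every shingle votes on every bit; the majority sets the bit.
pub fn simhash(block: &[u8]) -> u64 {
    const SHINGLE: usize = 4; // assumed shingle width in bytes
    let mut votes = [0i64; 64];
    for window in block.windows(SHINGLE) {
        let h = fnv1a(window);
        for (bit, vote) in votes.iter_mut().enumerate() {
            if (h >> bit) & 1 == 1 {
                *vote += 1;
            } else {
                *vote -= 1;
            }
        }
    }
    votes
        .iter()
        .enumerate()
        .fold(0u64, |fp, (bit, &v)| if v > 0 { fp | (1u64 << bit) } else { fp })
}

/// Split a fingerprint into four 16-bit bands, lowest bits first.
/// Blocks that share any band value become match candidates.
pub fn bands(fp: u64) -> [u16; 4] {
    [fp as u16, (fp >> 16) as u16, (fp >> 32) as u16, (fp >> 48) as u16]
}

/// Number of differing bits between two fingerprints.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// XOR a block against its reference. Applying it twice restores the block.
pub fn xor_delta(block: &[u8], reference: &[u8]) -> Vec<u8> {
    assert_eq!(block.len(), reference.len(), "blocks must be the same length");
    block.iter().zip(reference).map(|(&a, &b)| a ^ b).collect()
}
```

Because XOR is its own inverse, `xor_delta(&delta, &reference)` returns the original block, so the decoder needs only the reference and the delta. Two blocks that differ in a single byte produce a delta with a single non-zero byte, whatever their distance in the file.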

profile anchor
[block 0] raw
block 1 (Δ) residue: 0.04%
block 3 (Δ) residue: 0.01%
block 4 (Δ)
block 5 (Δ) cascading
profile anchor
[block 2] raw
block 6 (Δ)
global simhash match (+12MB)
[block 7] raw
block 8 (Δ)

shared architecture

By moving the perception of the data to the very beginning, iris eliminates the need for expensive re-reads or complex back-tracking during compression.

The entire pipeline is built in pure Rust (1.75+), using context packing and range asymmetric numeral systems (rANS) for dense output.

pipeline flow

  • → phase 1: profile table
  • → phase 1.5: salience map (tensor block routing)
  • → phase 2: structural elimination
  • → phase 3: rANS pack
  • → result: .iris standalone