Compression progress update — Q2 2026
An informal update on where our compression research stands this quarter — what is working, what is not, and what we are trying next.
What is working
Our nonlinear compression pipeline continues to hold quality at compression ratios that quantisation-only approaches cannot reach. The headline number we have been quoting publicly — a 7B-quality model in roughly 750 MB — is reproducible across the open-source bases we have tried.
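To put the headline number in perspective, here is a rough back-of-the-envelope calculation. It assumes that "7B" means about 7 billion parameters and that "MB" means 10^6 bytes; neither is stated above, and the helper names are purely illustrative. Under those assumptions, 750 MB works out to less than one bit per parameter, which is below what even a 1-bit-per-weight quantisation would need before any overhead.

```python
# Assumptions (not stated in the post): 7e9 parameters, 1 MB = 1e6 bytes.
N_PARAMS = 7e9
COMPRESSED_BYTES = 750e6


def bits_per_parameter(size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter."""
    return size_bytes * 8 / n_params


def size_mb_at(bits: float, n_params: float = N_PARAMS) -> float:
    """Size in MB of a model stored at a fixed bit width, ignoring overhead."""
    return n_params * bits / 8 / 1e6


compressed_bpp = bits_per_parameter(COMPRESSED_BYTES, N_PARAMS)
print(f"compressed: {compressed_bpp:.2f} bits per parameter")

# Fixed-width storage at common bit widths, for comparison.
for bits in (16, 8, 4, 2, 1):
    print(f"{bits:>2}-bit weights: {size_mb_at(bits):,.0f} MB")
```

Even the 1-bit row comes to 875 MB, so a fixed-width scheme with at least one bit per weight cannot reach 750 MB under these assumptions.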
What is not
Long-context behaviour past 32k tokens degrades faster than we would like. We have a hypothesis about where the loss is concentrated, and we are testing a fix this month.
What we are trying next
- A revised reconstruction objective that accounts for attention-head importance.
- A faster calibration loop so that compressing a new base model takes hours, not days.
- Better evals on tool-use and structured-output tasks, which we think are underrepresented in standard benchmarks.
More to come.