Compression progress update — Q2 2026

An informal update on where our compression research stands this quarter — what is working, what is not, and what we are trying next.

Gaurav Gandhi | 1 min read | Research

What is working

Our nonlinear compression pipeline continues to hold quality at compression ratios that quantisation-only approaches cannot reach. The headline number we have been quoting publicly — a 7B-quality model in roughly 750 MB — is reproducible across the open-source bases we have tried.
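
To put the headline number in context, here is a back-of-the-envelope sketch of the storage budget it implies. It assumes exactly 7 billion parameters, takes 1 MB as 10^6 bytes, and ignores any per-tensor metadata such as scales or codebooks; none of those details come from our measurements, and the function names are made up for illustration.

```python
def bits_per_parameter(size_bytes: float, n_params: float) -> float:
    """Average number of bits available per parameter for a given file size."""
    return size_bytes * 8 / n_params


def quantised_size_mb(n_params: float, bits: float) -> float:
    """Size in MB (10^6 bytes) of n_params weights stored at a fixed bit width."""
    return n_params * bits / 8 / 1e6


N_PARAMS = 7e9      # assumption: a "7B" model has exactly 7 billion parameters
SIZE_BYTES = 750e6  # assumption: "roughly 750 MB" taken as 750 * 10^6 bytes

budget = bits_per_parameter(SIZE_BYTES, N_PARAMS)
print(f"Budget: {budget:.2f} bits per parameter")

# Fixed-width quantisation with no overhead, for comparison.
for bits in (4, 2, 1):
    print(f"{bits}-bit weights: {quantised_size_mb(N_PARAMS, bits):.0f} MB")
```

Under these assumptions the budget is a little under one bit per parameter, while even a 1-bit-per-weight format would need about 875 MB. That is the sense in which a fixed-width quantisation scheme alone cannot reach this size.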

What is not

Long-context behaviour past 32k tokens degrades faster than we would like. We have a hypothesis about where the loss is concentrated, and we are testing a fix this month.

What we are trying next

  • A revised reconstruction objective that weighs attention-head importance.
  • A faster calibration loop so that compressing a new base model takes hours, not days.
  • Better evals on tool-use and structured-output tasks, which we think are underrepresented in standard benchmarks.
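
As a rough illustration of the first item, an importance-weighted reconstruction loss could look something like the sketch below. This is not our actual objective. It assumes a per-head mean squared error between original and reconstructed head outputs, combined using non-negative importance weights; the function name, the inputs, and the way importance is obtained are all hypothetical.

```python
from typing import Sequence


def head_weighted_mse(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
    importance: Sequence[float],
) -> float:
    """Importance-weighted average of per-head mean squared errors.

    original / reconstructed: one output vector per attention head.
    importance: one non-negative weight per head (hypothetical input).
    """
    if not (len(original) == len(reconstructed) == len(importance)):
        raise ValueError("expected one entry per head in every argument")
    total_weight = sum(importance)
    if total_weight <= 0:
        raise ValueError("importance weights must sum to a positive value")

    loss = 0.0
    for ref, rec, weight in zip(original, reconstructed, importance):
        # Plain MSE for this head, then scaled by the head's importance.
        mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
        loss += weight * mse
    return loss / total_weight


# Two toy heads: the first is reconstructed exactly, the second is not.
original = [[1.0, 0.0], [0.0, 1.0]]
reconstructed = [[1.0, 0.0], [0.0, 0.0]]

uniform = head_weighted_mse(original, reconstructed, [1.0, 1.0])
skewed = head_weighted_mse(original, reconstructed, [3.0, 1.0])
print(uniform, skewed)
```

In this toy case, shifting weight towards the well-reconstructed head lowers the loss, so an objective of this shape would push reconstruction error away from the heads marked as important.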

More to come.

Tags: compression, research-update