Live Demo

See the difference.
In real time.

Chat with a Fern compressed model and its commercial counterpart side by side. Watch tokens stream and costs accumulate, then decide for yourself.

10-minute sandbox · 30-minute cooldown between sessions · No account required

Methodology

How metrics are calculated

Both models are queried simultaneously and stream tokens directly to your browser.

Time to first token

Wall-clock milliseconds from when the request is sent until the first content token arrives in the browser.

Tokens / sec

Output tokens counted as they stream, divided by elapsed wall-clock time from request start to final token.
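
As a rough illustration of the two timing metrics above, here is a minimal Python sketch. It is not the demo's actual code: the function name `stream_metrics`, the `StreamMetrics` type, and the assumption that exactly one timestamp is recorded per output token are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_ms: float          # time to first token, in milliseconds
    tokens_per_sec: float   # output tokens divided by total elapsed seconds


def stream_metrics(request_sent_ms: float, token_arrivals_ms: list[float]) -> StreamMetrics:
    """Derive both metrics from wall-clock timestamps (all in milliseconds).

    Assumption: token_arrivals_ms holds one arrival time per output token,
    in the order the tokens reached the client.
    """
    if not token_arrivals_ms:
        raise ValueError("no tokens arrived")

    # Time to first token: request sent -> first content token received.
    ttft_ms = token_arrivals_ms[0] - request_sent_ms

    # Tokens / sec: token count over the span from request start to final token.
    elapsed_s = (token_arrivals_ms[-1] - request_sent_ms) / 1000.0
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")

    return StreamMetrics(ttft_ms=ttft_ms, tokens_per_sec=len(token_arrivals_ms) / elapsed_s)
```

Because the elapsed time starts at the request rather than at the first token, a slow time to first token also lowers the reported tokens / sec.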

Cost / token (Fern)

Based on an AWS p3.2xlarge instance ($3.06/hr, 1× NVIDIA Tesla V100 16 GB) shared across 50 concurrent users, a conservative production baseline for a 0.6B model. Formula: ($/hr ÷ 3600) ÷ (tokens/sec × 50), where tokens/sec is the throughput measured for a single session.
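
The formula above can be sketched in a few lines of Python. The function name `fern_cost_per_token` is hypothetical, and the 100 tokens/sec used in the usage note is an illustrative input, not a measured result.

```python
def fern_cost_per_token(
    tokens_per_sec: float,
    hourly_rate_usd: float = 3.06,   # instance price quoted above
    concurrent_users: int = 50,      # sharing assumption quoted above
) -> float:
    """Cost per output token in USD: ($/hr / 3600) / (tokens/sec * users)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")

    cost_per_sec = hourly_rate_usd / 3600.0            # instance cost per second
    total_tokens_per_sec = tokens_per_sec * concurrent_users  # aggregate throughput
    return cost_per_sec / total_tokens_per_sec
```

For example, `fern_cost_per_token(100.0)` works out to about 1.7e-7 USD per token, or roughly $0.17 per million tokens, under the stated sharing assumption.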

Cost / token (GPT-4o mini)

Based on OpenAI's published pricing: $0.15/M input tokens and $0.60/M output tokens, applied to actual usage.
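
A minimal sketch of how these rates could be applied to one request's usage. The function name `gpt4o_mini_cost` and the token counts in the usage note are hypothetical, and the page does not specify how the total is reduced to a per-token figure, so only the total is computed here.

```python
INPUT_USD_PER_MILLION = 0.15    # input-token rate quoted above
OUTPUT_USD_PER_MILLION = 0.60   # output-token rate quoted above


def gpt4o_mini_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one request at the quoted per-million-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost
```

For example, a request with 1,000 input tokens and 500 output tokens costs $0.00015 + $0.00030 = $0.00045.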
