Live Demo
Chat with a Fern compressed model and its commercial counterpart side by side. Watch tokens stream, costs accumulate, and decide for yourself.
Methodology
Both models are queried simultaneously and stream tokens directly to your browser.
Time to first token
Wall-clock milliseconds from when the request is sent until the first content token arrives in the browser.
Tokens / sec
Output tokens counted as they stream, divided by elapsed wall-clock time from request start to final token, so the rate includes the time to first token.
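As a rough illustration of the two timing metrics above, the sketch below derives both from a request timestamp and a list of token arrival timestamps. The names `StreamMetrics` and `stream_metrics` are hypothetical and are not taken from the demo's code; timestamps are assumed to be in seconds from a monotonic clock.

```python
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    ttft_ms: float         # time to first token, in milliseconds
    tokens_per_sec: float  # output tokens per second of wall-clock time


def stream_metrics(request_sent_s: float, token_arrivals_s: list[float]) -> StreamMetrics:
    """Compute timing metrics for one streamed response (illustrative only).

    request_sent_s:   time the request was sent, in seconds.
    token_arrivals_s: arrival time of each content token, in seconds, in order.
    """
    if not token_arrivals_s:
        raise ValueError("no tokens received")

    # Time to first token: request sent -> first content token, in ms.
    ttft_ms = (token_arrivals_s[0] - request_sent_s) * 1000.0

    # Throughput: token count over the whole window, request start -> final token.
    elapsed_s = token_arrivals_s[-1] - request_sent_s
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")

    return StreamMetrics(ttft_ms=ttft_ms, tokens_per_sec=len(token_arrivals_s) / elapsed_s)


# Hypothetical stream: four tokens arriving 0.25 s apart.
metrics = stream_metrics(0.0, [0.25, 0.5, 0.75, 1.0])
print(metrics)
```

Because the throughput window starts at the request rather than at the first token, a slow first token lowers the reported rate even when the rest of the stream is fast.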
Cost / token (Fern)
Based on AWS p3.2xlarge ($3.06/hr, 1× NVIDIA Tesla V100 16 GB) shared across 50 concurrent users, a conservative production baseline for a 0.6B model. Formula: ($/hr ÷ 3600) ÷ (tokens/sec × 50), where tokens/sec is the measured rate of a single stream.
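The formula above can be sketched in a few lines. The function name `fern_cost_per_token` is hypothetical, and the 100 tokens/sec figure in the usage line is an illustrative input, not a measured result.

```python
def fern_cost_per_token(
    tokens_per_sec: float,
    hourly_rate_usd: float = 3.06,  # instance price from the text above
    concurrent_users: int = 50,     # sharing assumption from the text above
) -> float:
    """Cost in USD per output token: ($/hr / 3600) / (tokens/sec * users)."""
    if tokens_per_sec <= 0 or concurrent_users <= 0:
        raise ValueError("tokens_per_sec and concurrent_users must be positive")

    cost_per_second = hourly_rate_usd / 3600.0
    # Aggregate throughput assumes every user streams at the measured rate.
    aggregate_tokens_per_sec = tokens_per_sec * concurrent_users
    return cost_per_second / aggregate_tokens_per_sec


# Illustrative input only: a stream measured at 100 tokens/sec.
cost = fern_cost_per_token(100.0)
print(f"${cost * 1_000_000:.2f} per million tokens")
```

The result scales inversely with both the measured rate and the assumed number of concurrent users, so halving either one doubles the per-token cost.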
Cost / token (GPT-4o mini)
Based on OpenAI published pricing: $0.15/M input tokens and $0.60/M output tokens, applied to the input and output token counts of each request.
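A minimal sketch of this calculation follows. The function names are hypothetical, and dividing the total by input plus output tokens to get a blended per-token figure is an assumption; the page does not say how the per-token number is derived from the request cost.

```python
INPUT_USD_PER_TOKEN = 0.15 / 1_000_000   # $0.15 per million input tokens
OUTPUT_USD_PER_TOKEN = 0.60 / 1_000_000  # $0.60 per million output tokens


def gpt4o_mini_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD of one request at the published rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return input_tokens * INPUT_USD_PER_TOKEN + output_tokens * OUTPUT_USD_PER_TOKEN


def blended_cost_per_token(input_tokens: int, output_tokens: int) -> float:
    """Assumed definition: total request cost divided by all tokens used."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        raise ValueError("request used no tokens")
    return gpt4o_mini_request_cost(input_tokens, output_tokens) / total_tokens


# Illustrative request: 1,000 input tokens and 500 output tokens.
print(f"request cost: ${gpt4o_mini_request_cost(1000, 500):.5f}")
```

Output tokens are priced at four times the input rate, so longer responses raise the blended figure even when the prompt stays the same.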