Ollama benchmark: 3-way server comparison

Three Ollama servers ran the same prompts against the same models. Higher numbers = faster. Each chart presents the same results from a different angle.

Per-model breakdown

Bar length shows each server's median tokens/sec for that model (across all test types). Longer bar = faster.

Headline: tokens per second by model

Tokens per second is how fast the model produces output (a token is a word or a piece of a word). Higher is better. Bars show the median across all test types.
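As a rough illustration of how such a number can be derived: Ollama's generate and chat responses report eval_count (tokens generated) and eval_duration (in nanoseconds), so a per-request speed is their ratio. The sketch below assumes the median is taken over all runs pooled across test types; the report does not say exactly how its median was computed, and the sample data is made up.

```python
from statistics import median


def tokens_per_second(response: dict) -> float:
    """Generation speed for one response.

    Uses the eval_count (tokens generated) and eval_duration
    (nanoseconds) fields that Ollama includes in its responses.
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def median_speed(responses_by_test: dict) -> float:
    """Median tokens/sec over every run, pooled across test types.

    Pooling is an assumption made for this sketch.
    """
    speeds = [
        tokens_per_second(r)
        for runs in responses_by_test.values()
        for r in runs
    ]
    return median(speeds)


# Hypothetical sample data, not taken from the benchmark.
sample = {
    "sequential": [{"eval_count": 200, "eval_duration": 4_000_000_000}],
    "concurrent": [{"eval_count": 200, "eval_duration": 8_000_000_000}],
    "queued": [{"eval_count": 200, "eval_duration": 5_000_000_000}],
}

print(median_speed(sample))
```

The median is less sensitive than the mean to a single unusually slow or fast run, which is presumably why the charts use it.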

Per test type: speed each user sees

Same models, four different load patterns: sequential = one at a time, concurrent = several at once, queued = a stream of requests, mixed = all models hit simultaneously.
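The difference between the sequential and concurrent patterns can be sketched as follows. This is only an illustration under stated assumptions: send is a hypothetical stand-in for whatever function actually calls the server, fake_send and the model names are invented for the example, and the benchmark's real harness is not shown in this report. The mixed pattern would correspond to a concurrent run whose jobs span different models.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(send, jobs):
    """Send one request at a time, waiting for each to finish."""
    return [send(model, prompt) for model, prompt in jobs]


def run_concurrent(send, jobs, workers=4):
    """Send several requests at once using a thread pool.

    pool.map returns results in the same order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: send(*job), jobs))


def fake_send(model, prompt):
    """Hypothetical stand-in for a real request to a server."""
    return f"{model}:{len(prompt)}"


# Hypothetical jobs: (model name, prompt) pairs.
jobs = [("model-a", "hello"), ("model-b", "hi")]

print(run_sequential(fake_send, jobs))
print(run_concurrent(fake_send, jobs))
```

Both functions return the same results for the same jobs; what changes is how many requests the server has to handle at the same moment.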

Throughput under load (batch tokens/sec)

When many requests run at once, what's the total tokens/sec the server produces? Think of this as the kitchen output rate when the restaurant is busy.
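One plausible way to compute a batch figure like this is to divide all tokens produced by the wall-clock time from the first request starting to the last one finishing. The sketch below assumes that definition; the report does not state its exact formula, and the timings are invented.

```python
def batch_tokens_per_second(requests):
    """Total tokens divided by the wall-clock window of the batch.

    requests: list of (start_seconds, end_seconds, tokens) tuples.
    """
    if not requests:
        return 0.0
    window_start = min(start for start, _, _ in requests)
    window_end = max(end for _, end, _ in requests)
    total_tokens = sum(tokens for _, _, tokens in requests)
    elapsed = window_end - window_start
    return total_tokens / elapsed if elapsed > 0 else 0.0


# Hypothetical batch: four overlapping requests, each producing
# 300 tokens over the same 10-second window.
batch = [(0.0, 10.0, 300)] * 4

print(batch_tokens_per_second(batch))
```

In this made-up batch each user sees 30 tokens/sec, while the server as a whole produces 120 tokens/sec. That gap is the difference between per-user speed and throughput under load.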

The raw numbers (median tokens/sec)

How to read this: