Ollama benchmark: 3-way server comparison

Three Ollama servers ran the same prompts and the same models. Higher numbers = faster. Every chart shows the same answer in a different way.

Per-model breakdown

Bar length shows each server's median tokens/sec for that model (across all test types). Longer bar = faster.

Headline: tokens per second by model

Tokens per second measures how fast the model generates output text. Higher is better. Bars show the median across all test types.
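As a rough illustration of where a tokens/sec figure can come from: Ollama's generate and chat responses report `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds). The sketch below divides one by the other. It is an assumption that this benchmark derives its numbers this way; the helper name `tokens_per_second` and the sample values are hypothetical.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from Ollama-style response fields.

    eval_count       -- number of tokens generated
    eval_duration_ns -- time spent generating, in nanoseconds
    """
    if eval_count <= 0 or eval_duration_ns <= 0:
        # Treat failed or empty responses as zero speed.
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative values only, not benchmark results.
example = {"eval_count": 120, "eval_duration": 2_000_000_000}
rate = tokens_per_second(example["eval_count"], example["eval_duration"])
print(f"{rate:.1f} tokens/sec")  # 120 tokens in 2 s -> 60.0 tokens/sec
```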

Per test type — speed each user sees

Same models, four different load patterns: sequential = one at a time, concurrent = several at once, queued = a stream of requests, mixed = all models hit simultaneously.

Throughput under load (batch tokens/sec)

When many requests run at once, what's the total tokens/sec the server produces? Think of this as the kitchen output rate when the restaurant is busy.
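One plausible way to compute this batch figure is to add up the tokens from every request in the batch and divide by the wall-clock time from the first request starting to the last one finishing. The sketch below assumes that definition; the `RequestResult` type, its field names, and the sample timings are hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    start_s: float  # when the request was sent (seconds)
    end_s: float    # when the last token arrived (seconds)
    tokens: int     # tokens generated for this request


def batch_tokens_per_second(results: list[RequestResult]) -> float:
    """Total tokens produced divided by the wall-clock span of the batch."""
    ok = [r for r in results if r.tokens > 0]  # skip failed/empty requests
    if not ok:
        return 0.0
    wall = max(r.end_s for r in ok) - min(r.start_s for r in ok)
    if wall <= 0:
        return 0.0
    return sum(r.tokens for r in ok) / wall


# Three overlapping requests; illustrative values only.
batch = [
    RequestResult(start_s=0.0, end_s=4.0, tokens=100),
    RequestResult(start_s=0.5, end_s=5.0, tokens=150),
    RequestResult(start_s=1.0, end_s=5.0, tokens=50),
]
print(batch_tokens_per_second(batch))  # 300 tokens over 5 s -> 60.0
```

Each individual request here is slower than 60 tokens/sec, yet the server as a whole produces 60 tokens/sec because the requests overlap. That is the difference between per-user speed and batch throughput.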

The raw numbers (median tokens/sec)

How to read this:

A token is a word or a word fragment, so tokens per second is roughly "words per second": higher means the answer arrives faster.
We use the median (middle value) across all runs to avoid one weird run skewing the picture.
Failed or zero-token requests are filtered out before the median is computed.