Three Ollama servers ran the same prompts and the same models. Higher numbers = faster. Each chart presents the same results from a different angle.
Bar length shows each server's median tokens/sec for that model (across all test types). Longer bar = faster.
Tokens per second measures how fast the model generates output tokens (word pieces, not whole words). Higher is better. Bars show the median across all test types.
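A minimal sketch of how a per-model median could be computed, assuming each result is a raw Ollama `/api/generate` response. Ollama reports `eval_count` (tokens generated) and `eval_duration` (generation time in nanoseconds). The `(test_type, response)` layout and the function names are hypothetical; the benchmark behind these charts may compute its numbers differently.

```python
from statistics import median

def tokens_per_sec(response: dict) -> float:
    # eval_count = generated tokens; eval_duration = generation time in nanoseconds.
    # Multiply before dividing to keep the arithmetic exact for whole-second durations.
    return response["eval_count"] * 1e9 / response["eval_duration"]

def median_tps(results: list[tuple[str, dict]]) -> float:
    # results: hypothetical list of (test_type, response) pairs for one server + model.
    return median(tokens_per_sec(resp) for _, resp in results)

results = [
    ("short", {"eval_count": 200, "eval_duration": 4_000_000_000}),  # 50 tok/s
    ("long", {"eval_count": 120, "eval_duration": 3_000_000_000}),   # 40 tok/s
    ("code", {"eval_count": 300, "eval_duration": 5_000_000_000}),   # 60 tok/s
]
print(median_tps(results))  # 50.0
```

The median is used rather than the mean so one unusually slow or fast test type does not drag a bar's length.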
Same models, four different load patterns: sequential = one at a time, concurrent = several at once, queued = a stream of requests, mixed = all models hit simultaneously.
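The four load patterns can be sketched with `asyncio`. This is an illustration only: `fake_request` is a hypothetical stand-in that sleeps instead of calling Ollama, and the exact meaning of "queued" (here assumed to be a new request started every `interval` seconds without waiting for earlier ones to finish) is an assumption, not the benchmark's documented behavior.

```python
import asyncio
import time

async def fake_request(model: str, seconds: float = 0.1) -> str:
    # Hypothetical stand-in for one generation request; just sleeps.
    await asyncio.sleep(seconds)
    return model

async def sequential(model: str, n: int) -> list[str]:
    # One at a time: each request waits for the previous one to finish.
    return [await fake_request(model) for _ in range(n)]

async def concurrent(model: str, n: int) -> list[str]:
    # Several at once: all n requests are in flight together.
    return list(await asyncio.gather(*(fake_request(model) for _ in range(n))))

async def queued(model: str, n: int, interval: float = 0.02) -> list[str]:
    # Assumed: a stream of requests, one started every `interval` seconds.
    tasks = []
    for _ in range(n):
        tasks.append(asyncio.create_task(fake_request(model)))
        await asyncio.sleep(interval)
    return list(await asyncio.gather(*tasks))

async def mixed(models: list[str]) -> list[str]:
    # Every model hit at the same moment.
    return list(await asyncio.gather(*(fake_request(m) for m in models)))

async def timed(coro):
    # Returns (result, wall-clock seconds) for any of the patterns above.
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

_, seq_t = asyncio.run(timed(sequential("model-a", 3)))
_, conc_t = asyncio.run(timed(concurrent("model-a", 3)))
print(f"sequential {seq_t:.2f}s, concurrent {conc_t:.2f}s")
```

With three 0.1 s requests, the sequential run takes roughly 0.3 s while the concurrent run takes roughly 0.1 s; a real server would not scale this perfectly, which is what the load-pattern charts measure.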
When many requests run at once, what's the total tokens/sec the server produces, summed across every request in flight? Think of this as the kitchen output rate when the restaurant is busy.
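A small worked example of why aggregate throughput differs from per-request speed. It assumes aggregate throughput is total tokens divided by the wall-clock span of the run; the `(start_s, end_s, tokens)` record layout and `aggregate_tps` name are hypothetical, and the benchmark may define the window differently.

```python
def aggregate_tps(requests: list[tuple[float, float, int]]) -> float:
    # requests: hypothetical (start_s, end_s, tokens) records from one busy period.
    start = min(s for s, _, _ in requests)
    end = max(e for _, e, _ in requests)
    total_tokens = sum(t for _, _, t in requests)
    # Total output divided by the wall-clock window, not averaged per request.
    return total_tokens / (end - start)

# Four requests running side by side for 10 s, 100 tokens each.
batch = [(0.0, 10.0, 100)] * 4
print(aggregate_tps(batch))  # 40.0
```

Each individual request only reaches 10 tokens/sec, yet the server as a whole turns out 40: a single dish is slow, but the kitchen's total output is high.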
How to read this: