From John Crickett.
Want to know the secret to reducing latency in any software system? One that works at every level.
It’s simple: move things close to each other.
Latency too high when serving data from a server in New York to users in San Francisco - move the data to a server in San Francisco.
Latency too high when doing high-frequency trading from the City of London, 1km from the stock exchange - move the trading engine into the stock exchange’s own data centre, just metres from the exchange’s own servers.
Latency too high when reading data from a hard drive - move the data closer to the CPU by read-ahead buffering it into memory (aka caching).
Latency too high when reading from main memory in a high-performance compute engine - move the data closer to the CPU by keeping it in the L1 and L2 caches.
Latency too high when reading from the cache - move the data closer still, into a register.
In every case, we reduce latency by moving the data nearer to the processing. Sometimes we move it permanently; other times we move it temporarily, using caches.
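
To make the "move it temporarily" case concrete, here is a minimal sketch in Python. The `fetch_from_origin` function is a hypothetical stand-in for any slow, distant read (a cross-country network hop, a disk seek); the read-through cache keeps a copy of the result next to the code that uses it, so only the first request pays the full latency.

```python
import time
from functools import lru_cache


# Hypothetical stand-in for a slow, remote read - e.g. a request from
# San Francisco that has to reach a server in New York and come back.
def fetch_from_origin(key: str) -> str:
    time.sleep(0.1)  # simulate ~100 ms of round-trip latency
    return f"value-for-{key}"


# Read-through cache: the first call moves a copy of the data into local
# memory; repeat calls for the same key are served from that nearby copy.
@lru_cache(maxsize=1024)
def fetch(key: str) -> str:
    return fetch_from_origin(key)


if __name__ == "__main__":
    start = time.perf_counter()
    fetch("user:42")  # cold read: goes all the way to the origin
    cold_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    fetch("user:42")  # warm read: served from the in-process cache
    warm_ms = (time.perf_counter() - start) * 1000

    print(f"cold read: {cold_ms:.1f} ms, warm read: {warm_ms:.3f} ms")
```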