Attention: There are no Delays Expected on the Network
Recently I had an interesting query from a customer that I thought warranted a blog post. So here goes…
A Long Fat Network (LFN) is a feature of modern networks and solutions that we are seeing more and more. On the one hand, bandwidths are increasing. Gigabit connections to head offices are common, and almost mandatory into a data centre (DC). Businesses are expanding internationally, with more branches opening across the globe, and we also see the push (at least from suppliers) of cloud-based workloads. On the other hand, we still can’t beat the laws of physics just yet, and from a networking perspective this manifests itself as latency.
An LFN is therefore a link with plenty of bandwidth (Fat) but lots of latency (Long). Latency is actually the result of a number of factors: not just distance, but also buffering and serialisation delay in routers and other devices along the path. We also need to be careful with the definition of latency. Some define latency as the one-way time from source to destination, while others include the return trip and call it round trip time (RTT). For the purposes of this post, latency means RTT (the time you see when you ping something).
So, with definitions out of the way, what is the problem? Well, many customers have the perception that the only consideration is bandwidth. “If things are slow, then more bandwidth will solve it.”
Unfortunately, of course, nothing is ever that simple. Most applications use TCP as the transport protocol, and for good reason. It’s a reliable, connection-oriented protocol, which means that a sender must receive regular acknowledgements in order to continue. The acknowledgement confirms to the sender that everything so far has been received OK and the next part can be sent. All good so far, but the longer the sender has to wait for the acknowledgement, the less data throughput you get, and here is where latency comes in. The more latency, the longer the sender has to wait for acknowledgements, and the lower the throughput. And here is the rub. You get to a point where it doesn’t matter how much bandwidth you throw at the network: your throughput just doesn’t increase, because latency, not bandwidth, has become the limiting factor.
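To put some rough numbers on this, here is a minimal Python sketch. It assumes a single TCP connection that can have at most one "window" of data in flight per round trip (the TCP window is covered further down), and it ignores packet loss, slow start and protocol overhead. The function name and the 65,535-byte window (the largest value the 16-bit TCP window field can carry without window scaling) are my own choices for illustration, not measurements from any customer network.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Rough upper bound on throughput for one TCP connection.

    Assumes at most one window of unacknowledged data per round trip;
    ignores loss, slow start and header overhead.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Assumed window: 65,535 bytes (16-bit window field, no window scaling).
WINDOW_BYTES = 65_535

# Illustrative RTT values only: LAN, metro, national, intercontinental.
for rtt_ms in (1, 10, 50, 150):
    mbps = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:,.1f} Mbit/s")
```

Notice that the link bandwidth does not appear anywhere in the calculation. With these assumptions, a 100 ms round trip caps the connection at roughly 5 Mbit/s whether the link underneath is 10 Mbit/s or 10 Gbit/s.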
This is something that we have seen numerous times with customers trying to transmit large files or move unsuitable applications to cloud providers or replicate large amounts of data, SAN replication being a prime example. There are a number of answers to this issue depending on the scenario, but here is one possibility.
The sender can only send the amount of data defined by the receiver’s TCP Window Size. This parameter lives within the end system itself and is invisible to most network components such as routers. A larger TCP Window Size means more data can be transmitted before an acknowledgement is needed, so more throughput. The answer therefore could be a change on the host, such as a registry change, to increase its TCP Window Size. Be careful though. While the obvious answer might seem to be to change this value to the maximum supported by the OS, that could be a bad thing. A lost packet could take longer to detect and result in more data having to be retransmitted, which could end up reducing throughput.
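A common way to size the window, rather than jumping straight to the maximum, is to work out the bandwidth-delay product (BDP): the amount of data needed in flight to keep the link full for one round trip. The sketch below is only a back-of-the-envelope aid under idealised assumptions (one connection, no loss, no other traffic); the function names and the example link figures are hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to fill the link for one round trip."""
    if bandwidth_bps <= 0 or rtt_seconds <= 0:
        raise ValueError("bandwidth_bps and rtt_seconds must be positive")
    return bandwidth_bps * rtt_seconds / 8


def link_utilisation(window_bytes: float, bandwidth_bps: float, rtt_seconds: float) -> float:
    """Fraction of the link (0.0 to 1.0) one connection can fill with this window."""
    bdp = bandwidth_delay_product_bytes(bandwidth_bps, rtt_seconds)
    return min(1.0, window_bytes / bdp)


# Hypothetical example link: 1 Gbit/s with an 80 ms RTT.
BANDWIDTH_BPS = 1e9
RTT_SECONDS = 0.080

bdp = bandwidth_delay_product_bytes(BANDWIDTH_BPS, RTT_SECONDS)
print(f"Window needed to fill the link: {bdp / 1e6:.1f} MB")

# Compare an unscaled 65,535-byte window with progressively larger ones.
for window in (65_535, 1_000_000, 10_000_000):
    used = link_utilisation(window, BANDWIDTH_BPS, RTT_SECONDS)
    print(f"Window {window:>10,} bytes -> {used:.1%} of the link")
```

On that example link the BDP works out to about 10 MB, so a window much smaller than that leaves most of the bandwidth idle, while a window much larger buys nothing extra and simply increases the amount of data at risk when a packet is lost.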
So what would I suggest?
Well, if you are considering moving applications to the cloud or upgrading bandwidth in the expectation that responsiveness will improve, then you could be disappointed with the results. Before taking the plunge, make sure that you understand your network and most importantly your applications so you can accurately predict what the result will be. Simulate the network in advance if possible to get an accurate view of user experience and throughput.