Why Latency – not Bandwidth – is Your Page Load Time Enemy

Preface

If you’re one of my non-nerd friends reading this, you might want to bail now. What follows is likely to not be your cup of tea.

tl;dr

If you are a nerd, yet don’t want to wade through the information below, here is a summary. Assuming all other variables (bandwidth, RTT, browser, etc) remain constant, you can use the following rules-of-thumb:
  • Page load times (PLT) grow linearly as Round Trip Time (RTT) increases
    • In the case of this study, it grows roughly according to this formula: PLT = 10 x RTT + 2 seconds – YMMV depending on the page in question
    • The exact formula depends on the web app in question and the browser used, but the (near) linearity is universal*
  • For modern web apps – that is, ones built on various, typically large JavaScript libraries and/or serving large, non-cacheable assets like frequently changing images – clients should have a bare minimum of 512kbps of bandwidth to/from the servers hosting the app/assets
    • Preferably, 1Mbps or greater for the best experience

* Note: Nothing is universal. I’m painting with a relatively broad brush, so please hold the “Yeah, but…” comments. I know there are always other variables to consider and corner cases to be found. I’m trying to keep this as “simple” as I can :)

Overview

The intent of this post is to get some high-level insight into how latency (Round Trip Time, or RTT) and bandwidth affect page load performance for modern web apps.
This is not a new topic, per se. I’m not claiming to plant any flags here. The extra piece I have to add – which I haven’t had luck in finding, at least not all in one place – is understanding why we see these results… not just that we see them, period.
Also, I plan to follow this up with a study of how these two variables – RTT and bandwidth – interact. In the real world, there are various “profiles” of these combinations, such as:
  • Branch Office: Low RTT but limited – and contended – bandwidth
  • Teleworker: Low/Mid RTT and limited bandwidth (often without any QoS)
  • Coffee Shop WiFi: Mid RTT and highly contended, relatively low bandwidth
  • Hotel WiFi (AKA: damn near a 56k modem): Mid RTT and significantly oversubscribed bandwidth
  • Overseas worker: Plenty of bandwidth, but 150ms+ latency

Testing Methodology

I took measurements of various load times and subtracted server-side delay.
The items measured were:
  • Server-side response time (server-side delay)
  • Time to fully download the framework HTML
    • This particular web app uses Facebook’s BigPipe framework, which helps improve perceived performance on “slower” browsers (read: IE 8 and lower)
  • Time to the DOMContentLoaded event
  • Time to the Page Load event
Note that the Page Load event may or may not correlate with the end-user’s actual perceived page load time, but should be pretty close. To measure perceived load time with real accuracy, we would need to use an automation framework like Selenium, which is outside the scope of this exercise.
These times were obtained via the built-in Dev Tools in Chrome and Safari, and via Firebug in Firefox.
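If you would rather pull the same numbers programmatically than read them off the Dev Tools timeline, the Navigation Timing API exposes the same milestones. A minimal sketch, pasted into the browser console after the page has finished loading (not how the numbers in this study were collected, just an alternative):

```typescript
// Read DOMContentLoaded and Page Load timings via the Navigation Timing API.
// All values are epoch milliseconds; subtract navigationStart to get durations.
const t = performance.timing;

const domContentLoadedMs = t.domContentLoadedEventStart - t.navigationStart;
const pageLoadMs = t.loadEventStart - t.navigationStart;

console.log(`DOMContentLoaded: ${domContentLoadedMs} ms`);
console.log(`Page Load:        ${pageLoadMs} ms`);
```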
The variables changed to observe their effect were:
  • Round Trip Time (RTT): From 0 – 300ms added delay
  • Bandwidth: From 64k – unrestricted WiFi (again, exact bandwidth won’t matter, as you’ll read later)
These variables were manipulated using Charles Proxy. Unfortunately, Charles Proxy does not have a facility to add random packet loss, so this was not tested.
Also, I opted to use WiFi, as this is the connection medium of choice for most end users. Short of adding a touch of latency, it is not expected that WiFi would introduce any significant delays. I did measure the OS-level RTT times, which were ~59ms to the server(s). Keep that in mind when looking at the charts below. In other words, where you see 50ms in the RTT table below, that is actually closer to 110ms RTT in real life. Baseline (shown as 0), in this case, is ~59ms.
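If you want to sanity-check a baseline like that yourself and can’t run a plain ping, timing a bare TCP connect is a decent stand-in, since the connect completes after roughly one round trip. A minimal Node sketch; the hostname is a placeholder for whatever host serves your app:

```typescript
// Rough RTT check: time how long a bare TCP connect takes.
// One connect ≈ one round trip (SYN -> SYN/ACK), so this approximates a ping.
import { connect } from "net";

const host = "app.example.com"; // placeholder: the host serving your web app
const start = Date.now();

const socket = connect(443, host, () => {
  console.log(`TCP connect to ${host}: ~${Date.now() - start} ms`);
  socket.end();
});
socket.on("error", (err) => console.error(`connect failed: ${err.message}`));
```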

Summary

Unsurprisingly, the results mimicked prior studies on this subject. Similar findings from Googler Ilya Grigorik are posted here, which themselves are similar to this post by Stuart Cheshire back in 1996.
So, we’re not talking about a recent discovery. In the age of modern web applications, however, it has become an increasingly important one.
At the risk of spoiling the rest of this post, in general:
  • Page load times increase linearly when RTT is increased
  • Page load times increase on a curve when bandwidth is reduced
What I hope to add to the conversation is to detail some of the reasons why these “truths” are, well, truths.
I will also see what rules-of-thumb we might be able to come up with regarding how these variables affect page load times.
Finally, I will also look at different browsers to see how they affect page load times.
Let’s get to it.

RTT

Here is a graph of the page load times in Chrome, as RTT was increased:
[Chart: Chrome page load times vs. RTT]
You can see that the longer RTT affects the speed at which we get the initial page, but not very much. It does, however, affect times to load the DOM and – most importantly – the page as a whole.
The question is: Why?
This is because the first page you receive has references to tons of other items such as JavaScript and Images. As RTT increases, the browser spends more of its time waiting for each request to be fulfilled. This, in turn, delays when other resources can be downloaded.
This screenshot from Firebug shows the times needed to request and receive various items on the page:
[Screenshot: Firebug waterfall of the page’s requests]
Notice how it looks like a Gantt Chart, where you see roughly 6 items being downloaded at once. This goes back to our old friend, “The (typically) 6 TCP Session Limit.” For those new to this limitation, nearly all modern browsers will only keep 6 requests to a given server outstanding at any one time.
[True, these limits can be tweaked, but I am trying to stick to a “stock” config here. Telling users to hack their registry or fiddle about in “about:config” isn’t something non-geeks will be prone to try.]
To elaborate, let’s say that we need to download items 1 – 100 to fully load a given page. To do this, we consume all 6 TCP sessions getting the first 6 items. We can’t get the 7th item (or any other items) until one of those 6 requests is fulfilled. If one of those first 6 items takes a while to get, it holds up the other 94 behind it.
This is why you see the staggered waterfall in the image above. Most of these requests come back after roughly the same amount of delay, hence this “stair step”/Gantt Chart-like effect. Items 1-6 come back, then 7-12, then 13-18, and so on.
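To see why this turns into near-linear growth with RTT, here is a toy model of that waterfall: N requests pulled through six connections, where each request costs roughly one RTT plus its transfer time. The request count and per-request transfer time below are made-up, illustrative numbers, not measurements from the test app:

```typescript
// Toy model of the 6-connection waterfall: each request goes to the first
// free connection and occupies it for one RTT plus its transfer time.
function simulatePageLoad(
  requests: number,
  rttSec: number,
  transferSec = 0.05, // assumed per-request transfer time
  connections = 6
): number {
  const freeAt = new Array<number>(connections).fill(0); // when each connection frees up
  for (let r = 0; r < requests; r++) {
    const i = freeAt.indexOf(Math.min(...freeAt)); // earliest-free connection
    freeAt[i] += rttSec + transferSec;
  }
  return Math.max(...freeAt); // time the last response lands = "page load"
}

for (const rtt of [0, 0.1, 0.2, 0.3]) {
  console.log(`RTT ${rtt}s -> ~${simulatePageLoad(100, rtt).toFixed(1)}s`);
}
// Every extra 100 ms of RTT adds roughly (100 requests / 6 connections) * 0.1s ≈ 1.7s.
```

The toy model overstates the slope, since real pages spread requests across hosts and discover some of them late, but the linear shape is the point: more RTT means every “stair step” in the waterfall gets taller.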
To help illustrate how the combination of longer RTT and the 6 TCP session limit affect page load times, take note of the brownish color in the image above. That indicates the amount of time that the browser was “blocked” before it could send that particular request. This blocking comes from the TCP session limit; it is the time a given request spends “waiting” for one of those 6 sessions to free up.
Here is an example breakdown for one of these affected requests at 300ms RTT. Again, note the brown bar, indicating how long the request is blocked:
[Screenshot: Firebug timing breakdown for a single request blocked at 300ms RTT]
That’s almost half a second (425ms)… just waiting to send the request. There were others later on in the page load waiting > 1.5 seconds at 300ms RTT.
When you consider that many modern web applications will average close to 100 requests (my Gmail page involves ~95 such requests, for example), this is not insignificant – which is what the RTT data shows us.

RTT Rule-of-Thumb

OK, great. How do we get to a simple formula to help understand how RTT affects page load time? Simple: We use a trend line.
Looking at the trend line in the RTT chart above, we see the formula for that line is:
  • y = 9.119x + 2.124
Translated, this means we can estimate page load times for my test app in Chrome by:
  • Multiplying the observed RTT by 9 (with RTT in fractions of a second, e.g.: 100ms = 0.1s)
    • Run a ping to get the current RTT from a given client
  • Adding 2 seconds to the result
Again, we are not looking to be 100% accurate; it is a rule-of-thumb, not an absolute. Also, remember that we are talking about changing RTT only. Once you start playing with multiple variables (RTT and bandwidth, as would be the case for a telecommuter), the story is not so straightforward.
The rules-of-thumb for Safari and Firefox are similar, except that the multipliers for each are roughly 11 and 10, respectively.
To make things simple, we can split the difference and say that:
  • Page load time = (10 x RTT) + 2
    • This assumes other variables like bandwidth, browser used, etc. remain constant
Again: This ROT is based on one page in a modern web application. Pages with more or fewer HTTP requests will be affected correspondingly more or less, and hence will have different values in their ROT.
The point is that for any given page, it is a reasonable assumption that page load times increase roughly linearly as RTT increases.
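As a quick helper for that rule-of-thumb (the multiplier and constant are just the rounded values derived above; swap in your own if you fit a trend line for your page):

```typescript
// Rule-of-thumb from the trend lines above: PLT ≈ (10 x RTT) + 2, with RTT in seconds.
function estimatePageLoadSec(rttSec: number, multiplier = 10, baseSec = 2): number {
  return multiplier * rttSec + baseSec;
}

console.log(estimatePageLoadSec(0.1)); // 100 ms RTT -> ~3 s
console.log(estimatePageLoadSec(0.3)); // 300 ms RTT -> ~5 s
```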

Bandwidth

Developing a formula for how reduced bandwidth affects page load times is a bit more tricky.
Here are page load times for my test web app starting at 64k, working up to T1 speed and through to WiFi connectivity:
[Chart: Chrome page load times vs. bandwidth]
An aside: The Y axis is seconds and the X axis is kbps. Note that the “1600 kbps” entry is actually the data for unrestricted WiFi connectivity. If I plotted that point at true 802.11n speeds, it would squish everything to the “left” such that the chart becomes just a cliff. OK, back to the post…
The trend line for this one is not so simple – nor conducive to a simple rule-of-thumb. What we can see is that bandwidth significantly affects page load times once you get below 512kbps. At 256kbps, we see a small uptick, but at 128kbps and 64kbps, page load times jump drastically.
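A quick back-of-envelope calculation shows why the curve bends where it does. The payload figure below is an assumption (a couple hundred KB actually crossing the wire, since much of the page validates out of cache), not a measurement from this test:

```typescript
// Back-of-envelope: raw transfer time for an assumed ~200 KB of bytes on the wire.
// Ignores RTT, TCP slow start, and the 6-connection limit entirely.
const payloadBytes = 200_000; // assumption, not measured from the test app

for (const kbps of [64, 128, 256, 512, 1024]) {
  const seconds = (payloadBytes * 8) / (kbps * 1000);
  console.log(`${kbps} kbps -> ~${seconds.toFixed(1)} s just moving the bytes`);
}
// 64 kbps -> ~25 s, 128 -> ~12.5 s, 256 -> ~6.3 s, 512 -> ~3.1 s, 1024 -> ~1.6 s
```

Below 512kbps the raw transfer time alone swamps everything else; at 512kbps and above it drops into the couple-of-seconds range that the RTT-driven costs already occupy, which is why the curve flattens out.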
Given that most of the target audience for modern web apps is on connections faster than 512k, bandwidth will typically not be a significant contributor to page load times for most users.
But, keep this in mind the next time you go to load GMail while on Gogo or at crappy coffee house WiFi (crappy can define the coffee house or their WiFi, or both – your call :) ).
In conclusion, our rule-of-thumb for bandwidth is:
  • Minimum bandwidth: 512kbps
  • Preferred bandwidth: 1Mbps or greater

Comparing Browsers

We saw that the browsers track similarly when bandwidth is constrained, and we know that bandwidth is not a significant issue if available bandwidth is >=512k.
We also saw that page load time tracks linearly with RTT. What about different browsers? How do they change things?
Comparing Chrome, Safari, and Firefox, there was not much to differentiate any of them. The numbers speak for themselves.

An Aside: Internet Explorer

This chart will become more interesting when I add Internet Explorer in a future update, as Internet Explorer (at least through IE9) has some well-known limitations for “modern” websites. Namely, it downloads JavaScript serially whereas other browsers download them in parallel.
True, JavaScript still has to be processed “in order,” but that doesn’t mean you need to wait to download each .js file until the one before it has been downloaded and processed. In many cases, one of the first chunks of JavaScript to be downloaded will be a large library like Dojo or jQuery. This means that in IE the user will sit for a long time, with no apparent page load progress, while that JavaScript downloads. It logjams everything behind it.
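To put a rough number on that logjam, compare fetching a batch of scripts one at a time versus six at a time. The script count, sizes, link speed, and RTT below are hypothetical:

```typescript
// Hypothetical: 12 script files of 40 KB each over a 1 Mbps link with 100 ms RTT.
const scripts = 12, scriptKB = 40, rttSec = 0.1, linkMbps = 1;
const perScriptSec = rttSec + (scriptKB * 8) / (linkMbps * 1000);

const serialSec = scripts * perScriptSec;                  // one-after-another downloads
const parallelSec = Math.ceil(scripts / 6) * perScriptSec; // 6-at-a-time waterfall

console.log(`serial:   ~${serialSec.toFixed(1)} s`);   // ~5.0 s of no visible progress
console.log(`parallel: ~${parallelSec.toFixed(1)} s`); // ~0.8 s
```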

RTT

[Chart: page load times vs. RTT, by browser]

Bandwidth

Interestingly, there was a bit more of a delta between the browsers when bandwidth is seriously constrained. That said, I’m not sure that the difference between a 21-second (Chrome) and a 27-second (Firefox) page load matters. You’ve likely lost the user to some other distraction at that point.
[Chart: page load times vs. bandwidth, by browser]

OK, So What Can We Do About This?

Again, none of the above is newly discovered. Because of this, a few different schools of thought have come about to mitigate the effects of RTT and bandwidth constraints, including:

  1. Split content across multiple hosts to get around the 6 TCP session limit
  2. Content Delivery Networks
  3. Minifying
  4. Lazy loading
  5. The oldest of them all: Caching

The Internet is full of posts covering these topics, so I won’t rehash them here, but the gist is that they all look to do one or more of the following:

  1. Minimize the size of content needed to load a given page
  2. Minimize the number of requests needed to load a given page
  3. Parallelize the requests needed to load a given page

While smaller payloads obviously help bandwidth-constrained clients, a key benefit of shrinking a given payload is that it frees up that TCP session sooner, thus unblocking later requests sooner.

If we look at the bigger picture, there is really only one answer to “how do we make pages load faster?”: reduce the number of requests. That ultimately has the biggest bang for the buck.

CDNs, caching, etc. all help, but if I still have to wait for the server to tell me that a given piece of content is unchanged (a 304), I still have to wait for that response, which holds up later requests, and so on.

As a case in point, for the RTT tests shown above, about 2/3 of the requests got a 304 back. The same number of requests – and the same mix of 200s vs. 304s – came back in all tests. So, it’s not a matter of needing a bigger pipe. The only solution is to send fewer requests to begin with. This is one (of the many) things that minification and JavaScript bundling intend to solve. Without them, the story would be significantly worse.
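To put a number on why those 304s still hurt, here is a rough cost estimate. The 100-request count and 100 ms RTT are illustrative; the two-thirds revalidation ratio is the one observed above:

```typescript
// Rough cost of cache revalidation: every 304 still costs ~1 RTT, paid 6 at a time.
const totalRequests = 100;                                 // ballpark for a modern web app page
const revalidations = Math.round((totalRequests * 2) / 3); // ~2/3 came back as 304s above
const rttSec = 0.1;                                        // illustrative 100 ms RTT

const revalidationCostSec = Math.ceil(revalidations / 6) * rttSec;
console.log(`~${revalidationCostSec.toFixed(1)} s spent learning that nothing changed`);
// With bundling (fewer files) or long-lived caching of fingerprinted assets,
// most of those requests never need to be sent at all.
```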

Conclusion

If we take the info above and apply it to where things are going – namely mobility, both apps and the mobile web – we can see that while 3G vs LTE makes a difference, our biggest enemy is latency. Yes, LTE definitely gives us a bigger “pipe” but you’d be surprised at how much of the speed you perceive is related to the lower latency from your device to the carrier on LTE vs 3G technologies.

In fact, Ars Technica recently posted about how ViaSat – a satellite-based ISP – has speeds that rank well with the big boys. But ask anyone who has been forced to use the high-latency technology and you will hear tales of woe. My Dad, for example, ditched satellite Internet the second he could get DSL in his area. They love their low(er)-latency 1.5Mbps DSL way better than their satellite link, which would top out around 12Mbps on SpeedTest.net.

To crib from SNL’s “Lazy Sunday” skit, “It’s all about the latency, baby!”

… and I’m Ghost like Swayze.
