layout: true
class: center, middle
background-image: url(media/mechanism.jpg)

---

# On Preloads and Preloaders

.center[Yoav Weiss | @yoavweiss]
.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making our CDN, as well as browsers, faster, and I'm here today to answer two fundamental questions:

---

# How do browsers load resources?

???

Something we rarely think about in our day-to-day. We just add resources to our pages and the browser just loads them! But how does the browser know which resources to load?

---

# How can we make it faster?

???

'Cause we're here to talk about performance, right?

---

# Start from the beginning

???

In order to understand how browsers load resources today, let's start by taking a trip down memory lane and examine how they used to load resources.

---

.bright[![](media/dom_creation.svg)]

???

In the early days of the Web, users clicked on a link or typed in a URL, and then the browser would fetch the relevant HTML file, tokenize it, parse it and construct a DOM from it. Then each DOM node would decide which other resources it may need: scripts, stylesheets, images or other HTML files. And once a DOM node figured that out, it would trigger a download for those resources. In short, each DOM node, when created, downloads the resources that it needs. Anyone spot a problem with that?

---

## scripts ruin everything!!!

.bright[![](media/scripts_halt_parser.svg)]

???

Synchronous scripts (the only kind of scripts in the early days) can have all kinds of interactions with the DOM tree and its styles. They can modify the DOM tree, and they can return different results depending on the current DOM tree and on the styles that were applied to it. All that means that once a script is encountered, DOM creation has to block there and then.
So DOM creation could not continue unless all the scripts that block it had finished running, and those scripts couldn't run until all their resources were downloaded, and until all the styles that preceded them had finished downloading and were applied to the DOM.

---

.center[![](media/no-pre-loader-waterfall-ie7.png)]

???

And since resource download depended on DOM node creation, having a sync script in your HTML (which almost everyone had) meant that your *other* resources were blocked on it too. While scripts were being downloaded, nothing else would be. That resulted in a staircase-shaped network waterfall.

---

## Decoupling

???

So how did browsers solve this problem? By decoupling resource download from DOM node creation. That way, they could start downloading resources earlier while not breaking Web pages that relied on the fact that sync scripts can alter the DOM.

---

background-image: url(media/james_brown.jpg)

.blockquote[The greatest browser optimization of all times. Steve Souders]

???

That, in and of itself, was a huge improvement! To the point where Steve Souders, the Godfather of Web performance, once called it the greatest performance improvement of all time.

---

background-image: url(media/rose.jpg)

# By any other name

???

Different browsers called it different names: the speculative parser, the look-ahead scanner, the preload scanner. I like to call it the preloader, as a generic, vendor-neutral term.

---

## How does it do its magic?

???

To dive into that, we'll need a quick overview of HTML processing.

---

## `some text`

???

The browser downloads the HTML file as bytes on the wire. Then it decodes them using the appropriate charset. Does anyone spot problems with that HTML?

---

## Tokenization

.bright[![](media/tokenization.svg)]

???

The next phase after that is tokenization: breaking the stream of characters apart into tags and attributes. At that point, no HTML rules are applied yet.

---

## Parsing

.bright[![](media/dom_tree.svg)]

???

Then parsing takes place, which applies HTML rules to the tokens and creates a DOM tree from them.

---

## Heuristic download in the tokenization phase

???

The preloader replaces (or adds to) the previous behavior of downloading in the parsing phase, by heuristically downloading resources based on the HTML tokens alone. There's a certain risk that these resources won't actually be needed, in cases where JS breaks everything. But since that almost never happens, downloading these resources in advance is a pretty safe bet.

---

## All done??

???

So, the preloader is awesome, and browsers continue to develop it further to also catch resources that are not currently caught. But that doesn't solve all of the browser's problems when it comes to resource discoverability. Nowadays, many of our pages are not quite flat in their structure.

---

![](media/tree_html_only.svg)

???

The browser starts out by downloading just the HTML. That's the only URL it's aware of.

---

![](media/tree_html_first_subresources.svg)

???

Then, as the HTML is processed, CSS, images and scripts are discovered.

---

![](media/tree_html_second_subresources.svg)

???

But in many cases, CSS and scripts load other resources.

---

![](media/tree_html_third_subresources.svg)

???

Which can then load even more resources. CSS background images are declared in CSS, so they're discoverable only when the browser calculates styles. Fonts were later added to the mix, and fonts are even more complicated, as they're only downloaded when the browser knows they will be used.
So it has to calculate the style, and cross those styles with the DOM to see if the resource is really needed.

---

background-image: url(media/dependency_tree.jpg)

## Discoverability problem

???

A lot of resources are downloaded by scripts, which are downloaded by scripts. The dependency trees for many sites - the graphs that show which resources depend on which - are extremely deep in some cases. Generally speaking, the browser can only discover and download the resources that are on the next level of the dependency graph. Everything below that is just a big unknown. This is an example of a (sloppily) hand-drawn dependency graph of a not-untypical site. Only the very first level of resources is discoverable by the browser. Everything else requires JS to run in order for the browser to know that it needs these resources.

---

## Bandwidth utilization problem

![](media/espn_bandwidth.png)

???

https://www.webpagetest.org/video/compare.php?tests=161027_M4_2a5c527b9d0bcc489be71e38971f5619-r:1-c:0

---

# How can we fill the gaps?

???

How can we solve that? How can we fill those bandwidth gaps? I like to divide the bandwidth gap problem into two periods, with two different solutions.

---

# Pre-HTML

???

Let's start by looking at the "pre-HTML" period. During that period, the browser is not aware of any resources that it needs to download beyond the HTML. What are the typical reasons our HTML gets stalled, which extend that period?

---

# Server "think-time"

---

# Redirects

---

.center.large[![](media/espn_bandwidth_pre_html.png)]

???

---

.center.large[![](media/espn_bandwidth_pre_html_potential.png)]

???

---

# H2 to the rescue

---

layout: true
class: center, middle
background-image: url(media/push.jpg)

---

# H2 push

???

H2 push enables us to take advantage of this idle period and push down resources that we know the browser is going to need. That can have a double advantage:

---

# Using idle bandwidth

---

# Warming up connection

???
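To get a feel for why warming up the connection matters, here's a rough back-of-envelope sketch. It assumes textbook TCP slow start with an initial window of 10 segments, a 1460-byte MSS, and the congestion window doubling every round trip; real TCP stacks differ, so treat the numbers as illustrative only.

```javascript
// Rough model: how many RTTs does TCP slow start need to deliver
// `bytes`? Assumes initcwnd = 10 segments, MSS = 1460 bytes, and the
// congestion window doubling every round trip. Illustrative only.
function slowStartRounds(bytes, initcwnd, mss) {
  initcwnd = initcwnd || 10;
  mss = mss || 1460;
  var delivered = 0;
  var cwnd = initcwnd;
  var rounds = 0;
  while (delivered < bytes) {
    delivered += cwnd * mss;
    cwnd *= 2;
    rounds += 1;
  }
  return rounds;
}
```

With this model and the example numbers above (98 KB of CSS+JS at a 100 ms RTT), delivering the critical resources takes three slow-start round trips: roughly the whole 300 ms of server think-time that push could put to work.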
Explain slow start.

---

![](media/page_loading_nopush.svg)

???

* 300 ms think time
* 100 ms RTT
* 98 KB CSS+JS
* 120 KB HTML

---

![](media/page_loading_push.svg)

???

---

.center.large[![](media/push_all_the_things.png)]

---

# DON'T

.center.large[![](media/push_all_the_things.png)]

---

# Priorities

???

When pushing, you're taking away the browser's ability to tell you about the resources and their priorities. Browsers have put years of research into resource prioritization, and pushing everything takes that away from them. If you do that, you'd better be pretty confident that you're doing a better job, but chances are, you are not.

---

# Caching

???

The second point where push is not ideal is browser caching. Unless you've built a mechanism that keeps track of the browser's cache state (and no such mechanism is ideal), you don't know whether the resources you're pushing are already in the browser's cache. That means that for repeat visitors, you're likely to push content that the user already has, wasting bandwidth and time. The browser should RST push streams for resources it already has in cache, but it turns out no browser reliably does that today. There's work underway in Chrome, but it's not there yet.

---

background-image: url(media/cache_digests.png)

# Cache digests

???

There's a proposal from Kazuho (the H2O developer, as well as the guy speaking here right after me) to enable browser-based cache digests, so that the server is notified about the user's cache state when the connection starts. That proposal is great and will alleviate most of the caching concerns, but it's not there yet.

---

layout: true
class: center, middle
background-image: url(media/pull.jpg)

---

background-image: url(media/mechanism.jpg)

# Post-HTML

.center[![](media/espn_bandwidth_post_html.png)]

???

This is what we can do before the HTML arrives. What can we do after it has?
Well, recently a set of new standards was specified and added to browsers that can help you tell the browser about those deep dependency-graph resources, making sure the browser is aware of them before they're required and can download them ahead of time. These standards are a reincarnation of older, de facto additions to browsers that tried to do the same, with varying levels of success. Let's take a look at what they do.

---

.big.center[`<link rel="preconnect" href="https://example.com">`]

???

Preconnect enables us to tell the browser that it's going to need to connect to a certain host, so that it can perform a DNS lookup, establish a TCP connection, and set up a TLS connection if needed. That takes a lot of RTTs out of the way.

---

.big.center[`<link rel="preconnect" href="https://fonts.example.com" crossorigin>`]

???

Where it gets slightly more complicated is that for privacy reasons, CORS-enabled, credential-less connections are performed using a separate socket pool. That means that the browser needs to know if a connection will be used for CORS-enabled fetches. In practice, if you have a host you'll be fetching fonts or XHR resources from, you need to add the crossorigin attribute if you want these connections to be put to good use.

---

.big.center[`<link rel="preload" href="resource.js" as="script">`]

???

Now preload is where things get serious. Preload enables devs to tell the browser that a specific resource will be needed later on. And more than that, it enables them to tell it what those resources *are*, using the...

---

.big.center[`as` attribute]

???

That's one of the biggest differentiators of preload from previous attempts at getting the same thing done. It doesn't seem like much, but it enables a few things in the browser.

--

* Download priority

--

* Proper `Accept` headers

--

* Content-Security-Policy

--

* Resource matching

---

## Use cases

---

## Load late-discovered resources

---

.left[
```html
<link rel="preload" href="late_discovered_thing.js" as="script">
```
]

---

## Early font loading

---

.left[
```html
<link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>
```
]

---

## Decouple load from execution

---

.left[
```javascript
function downloadScript(src) {
  var el = document.createElement("link");
  el.as = "script";
  el.rel = "preload";
  el.href = src;
  document.body.appendChild(el);
}

function runScript(src) {
  var el = document.createElement("script");
  el.src = src;
  // The element must be appended, or the script never executes.
  document.body.appendChild(el);
}
```
]

---

## Markup based async loading

---

.left[
```html
<link rel="preload" href="async_style.css" as="style"
      onload="this.rel='stylesheet'">
```
]

---

## Responsive preloading

---

.left[
```html
<link rel="preload" href="map.png" as="image"
      media="(max-width: 600px)">
```
]

---

.big.center[`Link:`]

```http
Link: </late_discovered_thing.js>;rel=preload;as=script
```

---

## Feature detection

.left[
```javascript
var preloadSupported = function() {
  var link = document.createElement('link');
  var relList = link.relList;
  if (!relList || !relList.supports)
    return false;
  return relList.supports('preload');
};
```
]

---

# Push hint

.left[
```http
Link: </thing_to_push.js>;rel=preload;as=script
Link: </thing_not_to_push.js>;rel=preload;as=script;nopush
```
]

---

# Preload > push?

--

* Cross origin

--

* Cache & cookies

--

* load/error events

--

* Content negotiation

---

# But can't #pre-HTML

???

But it's important to remember that preload's big disadvantage is that it can't help during the "pre-HTML" period.

---

background-image: url(media/early_hints.png)

# Or can it?

---

layout: true
class: center, middle
background-image: url(media/takeaway.jpg)

---

---

## Resource loading is hard

---

## Bandwidth is often underutilized

---

## Push critical resources during HTML think-time

---

## Preload late-discovered resources

---

## Preload enables new loading patterns

---

background-image: url(media/thanks.jpg)

# Thank you!

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

background-image: url(media/thanks.jpg)

# Questions?

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

background-image: url(media/thanks.jpg)

## Acknowledgements

* Resources - https://www.flickr.com/photos/36742159@N02/16973927197
* James Brown mural - Jay Galvin https://www.flickr.com/photos/36957368@N00/8254471580
* Mechanism - Vlastimil Koutecký https://www.flickr.com/photos/vlastimil_koutecky/9205853995/
* Rose - Susanne Nilsson https://www.flickr.com/photos/infomastern/9916290543/
* Push - The US Army https://www.flickr.com/photos/soldiersmediacenter/14789290273/
* Pull - Clemens v. Vogelsang https://www.flickr.com/photos/vauvau/3507527748/
* Takeaway - Jeremy Segrott https://www.flickr.com/photos/126337928@N05/23964405150/
* Thanks - Alessio Maffeis https://www.flickr.com/photos/imaffo/4736628110/