Resource (pre)loading

layout: true
class: center, middle
background-image: url(media/links.jpg)
---

# `<link>`

.center[Yoav Weiss  |     @yoavweiss]
.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making both browsers and CDNs awesomer, and I'm here today to talk to you about the `<link>`
element, how browsers load resources, the connection between the two, and how *you* can help browsers load resources faster using the magic of
the link element.

When we first look at the link element, the obvious problem with it is that it has nothing to do with what we usually refer to as links.

If we were to teach someone new to the platform about `<link>` for the first time, they'd think that this is the element for creating links
between pages. And they'd be wrong...

So if `<link>` is not about links, what is it about?
---
background-image: url(media/doves.jpg)

## All about relationships

???

The `<link>` element is about
relationships between the current page you're on and... other things on the internet.

It can be a relationship to other pages, to resources, or theoretically to the online representation of "objects".

---

![](media/link_definition.png)

???

Link has been part of HTML for a long while. It was there in the first Internet draft that defined HTML, as the hidden and underappreciated little brother of `<a>` tags.

Both establish a relationship to an external document or resource, but `<a>` has an immediate action. It creates a link that lets the
user hop over to the document it references. `<link>` has no such action, and the processing of the element varies wildly according to
its relationship.

// https://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

---

![](media/relationships.png)

???

The original draft included a bunch of relationship types, but stated loud and clear
that none of these relationships is part of the spec.

Most of the relationships in the original spec are long gone and no one ever cared much about them.

But let's take a look at the relationships defined in a more recent spec, shall we?

---
![](media/link_types.png)

???

A lot of items on that list, but if we look at the relations that actually do something...

---
![](media/link_types_annotated.png)

???

We get a totally different picture.

We have `stylesheet` for CSS, `alternate` for RSS, `icon` for icons.
---
![](media/link_types_annotated2.png)

???

We also have prefetch and next, which we'll discuss later.

But, the other relations are either only allowed on anchors or don't actually
change the way browsers behave in any way.

Some of them have benefits in terms of SEO or other non-browser HTML consumers.

But since they have no impact on UI, I suspect most developers are not really aware of these link relations beyond cargo-cult copy/pasting
them around.

In practice the first time most developers encounter a `<link>` tag is when they're googling the syntax for...
---
### `<link rel=stylesheet href="awesome.css">`

???

which, let's be honest with ourselves, probably should've been something like...
---
### `<style src="awesome.css"></style>`

???

Am I right? That could've been simpler. But `<link>` was there, external stylesheets were defined as their own documents that style the
HTML document, and the rest is history. Going back on that decision wouldn't be Web compatible, as the relative URLs would break.

---
## `<link>` is a mechanism

???

We have all these different rel values, and each one of them *does* something different.
Which means that `<link>` is a generic mechanism to create relations. And as a mechanism it is pretty rad, and it has a lot of nifty nuts and bolts that tend to make it very helpful.

Let's go over them, OK?

---
# `rel`

???

rel is used to define the relations we just discussed. We've seen the list of rel values in the spec, but the beautiful thing about that
list is that it is extensible. Different specifications can define new link values that define new relations. And over the years browsers
have implemented a few link-based mechanisms that rely on that extensibility. We'll talk about them in a while.

---
## body-ok

???
One curious thing is that if we have a `<link>` element outside of the HTML's head, whether that element is valid or not depends
on its `rel` value. Some rel values are defined as "body-ok", while others are not.
That was introduced in order to support microdata links (which actually don't have rel at all, but an itemprop attribute).

And even though in-body styles have been something developers have been doing for a long while, they only recently moved from the naughty
list to being "body-ok".

One more recent change is that Chrome stopped treating in-body styles as render blocking for the entire document, making them render
blocking only for the parts of the document that are below them. That means that if you're mashing together multiple page components on the
server side, you can now place in-body styles at the beginning of the "component" and in most browsers, they
would only block the rendering of the parts that are below them.

---
# `relList`

???
One of the nice things about rel is that it can include more than one relationship. rel's value is a space-separated list of relations, so
developers can define multiple relations between one document and another document/resource.

And in order to access that list of relations from JavaScript, we have...

relList is the way to access the various values of rel from script without having to parse the damn thing yourself.
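
To give a feel for what "parsing the thing yourself" would involve, here is a minimal sketch. `parseRel` and `hasRelation` are hypothetical helper names, not platform APIs, and the sketch assumes relations are compared case-insensitively and separated by ASCII whitespace.

```javascript
// Hand-rolled handling of a rel value: roughly the chore relList spares you.
function parseRel(value) {
  var tokens = [];
  // Split on ASCII whitespace, drop empty pieces and duplicates.
  value.split(/[ \t\n\f\r]+/).forEach(function (token) {
    var relation = token.toLowerCase();
    if (relation !== "" && tokens.indexOf(relation) === -1) {
      tokens.push(relation);
    }
  });
  return tokens;
}

// True if the rel value contains the given relation.
function hasRelation(value, relation) {
  return parseRel(value).indexOf(relation.toLowerCase()) !== -1;
}
```

With a value such as `"alternate stylesheet"`, `hasRelation(value, "stylesheet")` would report the relation as present.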

---
## `add()`
## `remove()`
## `contains()`

???

It is a DOMTokenList (just like `classList`) and contains cool methods such as `add()`, `remove()`, `contains()` etc.
That lets you easily manipulate the relationship values.

---
# `supports()`

???

Another method recently introduced to DOMTokenList is the `supports()` method, which tells you whether a certain token is supported or not.

---
# `href`

???

Stands for Hypertext Reference, which means that it contains a URL to the external document or resource that this link relationship is with.

---
# `media`

???
You may know that one, and Simon Pieters will most probably talk about it in detail when he talks about source later today, but that
attribute enables the browser to avoid downloading (or delay downloading) a resource if it is not required for the current user's
conditions (as defined by the media query that is the value of the attribute).

That can be handy.
---
## `<link rel=stylesheet media="not all">`

???

If we look at this example, the browser could have avoided downloading this stylesheet altogether. However, when Opera, back in its
Presto days, tried to do that, a ton of content broke as a result. Websites were relying on this stylesheet being downloaded anyway. As a result, what
some browsers did was to download such stylesheets, but with lower priority.
---
# `type`

???
Another useful conditional is the type attribute, which enables the browser to avoid downloading resources whose MIME type it doesn't
support.

---
## `<link rel=stylesheet type="foo/bar">`

???

Contrary to the previous example, this kind of stylesheet will not get downloaded, at least in most browsers, with no content breaking as a
consequence. The Blink rendering engine recently stopped downloading and applying such styles, and the sky didn't fall. WebKit is still downloading them,
last time I looked.
---
## `onload`/`onerror`

???
Load and error events enable us to know when a resource was downloaded (or failed to do so) and act on it.
---
## `crossorigin`

???
Determines the CORS state of the element. I won't get into what CORS is beyond the absolutely necessary, but it determines if the resource
would be requested with or without credentials and with or without CORS mode.
---
## `referrerpolicy`
## `sizes`
## `title`
## `hreflang`

???
---
# `Link:`
???

One extremely cool feature of link is that it has a header equivalent. That means you can replace the HTML tags with HTTP headers.
In some cases it may not matter much, but in other cases it makes it significantly easier to automate certain links (which we'll talk about
later).
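
As a rough illustration of the header form, the sketch below builds a `Link:` header value from a URL and a set of parameters. `formatLinkHeader` is a hypothetical helper, the font path is a placeholder, and the sketch assumes parameter values contain no double quotes that would need escaping. It uses the `preload` relation that comes up later in the talk.

```javascript
// Build the value of a Link header: <url>; name="value"; flag
function formatLinkHeader(url, params) {
  var parts = ["<" + url + ">"];
  Object.keys(params).forEach(function (name) {
    var value = params[name];
    if (value === true) {
      // Valueless parameter, such as crossorigin.
      parts.push(name);
    } else {
      parts.push(name + '="' + value + '"');
    }
  });
  return parts.join("; ");
}

// Placeholder resource, for illustration only.
var headerValue = formatLinkHeader("/fonts/font.woff2", {
  rel: "preload",
  as: "font",
  crossorigin: true
});
```

A server would then send that value in a `Link` response header instead of (or in addition to) the equivalent tag in the markup.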
---
# `</link>`
---
background-image: url(media/resources_gray.jpg)
layout: true
class: center, middle
---

## Loading resources ain't easy

???

Downloading a Web page is a complex process. Each web page is composed of tens or even hundreds of different resources, some resources
load others, and in general, loading all the resources that a certain page needs is a complex problem.

Doing that in a reasonable time frame is even more so.

So, today I'll explain why a browser's job in loading resources is not an easy one. And I'm gonna try to illustrate that using a story.

---
## Once upon a time

![](media/story.jpg)

???

Once upon a time in a land far far away, browsers were loading Web pages.

---
.bright[![](media/dom_creation.svg)]

???
Users clicked on a link or typed in a URL, and then the browser would fetch the relevant HTML file,
tokenize it, parse it and construct a DOM from it.

Then each DOM node would decide which other resources it may need: scripts, stylesheets, images or other HTML files.

And once the DOM node figured that out, it would trigger a download for these resources.

---
##  scripts ruin everything!!!

.bright[![](media/scripts_halt_parser.svg)]

???

That process had one glaring problem with it: Scripts.

Synchronous scripts (which were the only kind of scripts in that far away land) can have all kinds of interaction with the DOM tree and its
styles.

They can modify the DOM tree, they can return different results depending on the current DOM tree and depending on styles that were applied to the DOM.

All that means that once a script is encountered, the DOM creation has to block there and then. So, DOM creation could not continue unless
all scripts that block it finished running, and those scripts couldn't run until all their resources were downloaded, and until all the
styles that preceded them finished downloading and were applied to the DOM.

---

![](media/no-pre-loader-waterfall-ie7.png)

???

And since resource download depended on DOM node creation, having a sync script in your HTML (which almost everyone had) meant that your
*other* resources were blocked on it too. While scripts were downloading, nothing else would.

---
## Decoupling

???

So how did browsers in that far away land solve this problem?

By decoupling resource download from DOM node creation. That way, they could start downloading the resources earlier while not breaking Web
pages that relied on the fact that sync scripts alter the DOM.

So, browsers started to look into the HTML tokenization products, and speculatively download resources based on that.

When an `<img>` tag was encountered, for example, they assumed that in most cases sync scripts wouldn't override it in weird ways and that the
resource in that img's src attribute would be required, so they added it to their download lists.

---

.blockquote[The greatest browser optimization of all times.

Steve Souders]

???
That, in and of itself, was a huge improvement, to the point where Steve Souders, the godfather of Web performance, once called it the
greatest browser optimization of all times.

---
## By any other name

???
Different browsers called it different names: the speculative parser, the look ahead scanner, the preload scanner. I like to call it the
preloader, as a generic, vendor neutral term.

---
## How does it do its magic?

???

To dive into that we'd need to overview HTML processing a little bit.

---
## `<img src=bla><div>some text</div></img>`

???

The browser downloads the HTML file, as bits on the wire. Then it decodes them into characters, according to the appropriate charset.

---

## Tokenization

.bright[![](media/tokenization.svg)]

???
The next phase after that is tokenization. Breaking apart the single characters into tags and attributes.
At that point, no HTML rules are yet applied.
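
To make that concrete, here is a toy tokenizer, assuming heavily simplified rules. It is a sketch, not how any browser implements tokenization: real tokenizers also handle comments, character references, script content and many error cases. `tokenize` and `parseAttrs` are hypothetical names.

```javascript
// Toy tokenizer: splits markup into start tags, end tags and text.
// It applies no HTML tree-building rules at all.
function tokenize(html) {
  var tokens = [];
  var re = /<(\/?)([a-zA-Z][^\s\/>]*)([^>]*)>|([^<]+)/g;
  var match;
  while ((match = re.exec(html)) !== null) {
    if (match[4] !== undefined) {
      tokens.push({ type: "text", data: match[4] });
    } else if (match[1] === "/") {
      tokens.push({ type: "endTag", name: match[2].toLowerCase() });
    } else {
      tokens.push({
        type: "startTag",
        name: match[2].toLowerCase(),
        attrs: parseAttrs(match[3])
      });
    }
  }
  return tokens;
}

// Collect name=value pairs (quoted or unquoted) from the inside of a tag.
function parseAttrs(source) {
  var attrs = {};
  var re = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  var match;
  while ((match = re.exec(source)) !== null) {
    var value = match[2] !== undefined ? match[2]
      : match[3] !== undefined ? match[3] : match[4];
    attrs[match[1].toLowerCase()] = value === undefined ? "" : value;
  }
  return attrs;
}

var tokens = tokenize("<img src=bla><div>some text</div></img>");
```

Note that the stray `</img>` end tag simply comes out as one more token. Deciding what to do with it is the parser's job.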

---
## Parsing

.bright[![](media/dom_tree.svg)]

---

## Heuristic download in the tokenization phase

???

The preloader replaces (or adds on) the previous behavior of download at the parsing phase, by heuristically downloading resources just
based on the HTML tokens. There's a certain risk there, that these resources won't actually be needed, in cases where JS breaks everything.

But since that almost never happens, downloading these resources in advance is a pretty safe bet.
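
A deliberately crude sketch of the idea: look at tags alone, with no DOM, and list the URLs that are probably worth fetching early. The function names are hypothetical and the matching rules are assumptions made for illustration. Real preloaders work on the tokenizer's output and know far more about the markup.

```javascript
// Toy "preloader": lists URLs worth fetching early, based on tags alone.
function scanForResources(html) {
  var found = [];
  var tagRe = /<(img|script|link)\b([^>]*)>/gi;
  var match;
  while ((match = tagRe.exec(html)) !== null) {
    var name = match[1].toLowerCase();
    var attrs = match[2];
    if (name === "link") {
      // In this sketch, only stylesheet links are fetched eagerly.
      if (/\bstylesheet\b/i.test(getAttr(attrs, "rel") || "")) {
        addResource(found, "style", getAttr(attrs, "href"));
      }
    } else {
      addResource(found, name === "img" ? "image" : "script", getAttr(attrs, "src"));
    }
  }
  return found;
}

// Read one attribute value (quoted or unquoted) from the inside of a tag.
function getAttr(attrs, name) {
  var re = new RegExp(
    "(?:^|\\s)" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))", "i");
  var m = re.exec(attrs);
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2] !== undefined ? m[2] : m[3];
}

// Skip tags that carry no URL, such as inline scripts.
function addResource(list, type, url) {
  if (url) list.push({ type: type, url: url });
}
```

Anything a script adds to the page later never shows up in this list, which is the gap the rest of the talk is about.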

---

## All done??

???
So, the preloader is awesome, and browsers continue to develop it further to also catch resources that are not currently caught,
such as CSS-based resources (or at least their hosts) and `document.write`-based HTML tokens.

But, that doesn't solve all of the browser's problems when it comes to resource discoverability.

---
background-image: url(media/dependency_tree.jpg)
## Discoverability problem

???
A lot of resources are downloaded by scripts, which are downloaded by scripts.

The dependency trees for many sites - the graphs that show which resources depend on which - are extremely deep in some cases.

Generally speaking, the browser can only discover and download the resources that are on the next level of the dependency graph.

Everything below that is just a big unknown.

This is an example of a (sloppily) hand-drawn dependency graph of a not-untypical site. Only the very first level of resources is
discoverable by the browser. Everything else requires JS to run in order for the browser to know that it needs these resources.
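
One way to picture that depth is to compute, for each resource, how many hops it sits away from the HTML. The sketch below does that with a breadth-first walk. The graph is made-up data for illustration, and `discoveryDepths` is a hypothetical helper.

```javascript
// For every resource reachable from the root, count the hops needed to discover it.
function discoveryDepths(graph, root) {
  var depths = {};
  depths[root] = 0;
  var queue = [root];
  while (queue.length > 0) {
    var current = queue.shift();
    var children = graph[current] || [];
    for (var i = 0; i < children.length; i++) {
      var child = children[i];
      if (!Object.prototype.hasOwnProperty.call(depths, child)) {
        depths[child] = depths[current] + 1;
        queue.push(child);
      }
    }
  }
  return depths;
}

// Hypothetical site: each key lists the resources it requests.
var graph = {
  "index.html": ["app.js", "main.css"],
  "app.js": ["widgets.js"],
  "widgets.js": ["data.json"],
  "main.css": ["font.woff2"]
};
var depths = discoveryDepths(graph, "index.html");
```

In this made-up graph only `app.js` and `main.css` sit at depth 1, where the browser can see them from the markup. Everything deeper has to wait for its parent to arrive and be processed.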

---
## How can we help?

???

What can you do to help the browser there?
Well, recently a set of new standards were specified and added to browsers that can help you tell the browser about those deep dependency
graph resources, and make sure that the browser is aware of them before they are required and downloads them ahead of time.

These standards are a reincarnation of older, de facto additions to browsers that tried to do the same, with varying levels of success.

Let's take a look at what they do.

---

.big.center[`<link rel=dns-prefetch>`]

???

DNS prefetch is a link relation that enables developers to tell the browser "you're gonna need to go to this host, so better resolve that
DNS now".

It's a no-brainer, in case you know certain hosts would be required. The cost in terms of bandwidth is negligible.
The gain OTOH, is not huge, as you're only getting DNS off your critical path.

---

.big.center[`<link rel=preconnect>`]

???

Preconnect is a step up, telling the browser the same "you're gonna need that", but taking it to the next level by making sure that it also
establishes a TCP connection, and a TLS connection if needed. That's a lot more RTTs out of the way.
---

.big.center[`<link rel=preconnect crossorigin>`]

???
Where it gets slightly more complicated is that for privacy reasons, CORS-enabled, credential-less connections are performed using a
separate socket pool. That means that the browser needs to know if a connection would be used for CORS-enabled fetches.

In practice that means that if you have a host where you'd be getting font resources from, you need to add the crossorigin attribute if you
want these connections to be put to good use.
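
If the hosts are only known at runtime, the same hints can be added from script. This is a minimal sketch: `addConnectionHint` is a hypothetical helper, the font host in the usage note is a placeholder, and the document is passed in as a parameter so the helper isn't tied to a global.

```javascript
// Add a dns-prefetch or preconnect hint for an origin to the given document.
function addConnectionHint(doc, rel, origin, useCors) {
  var el = doc.createElement("link");
  el.rel = rel; // "dns-prefetch" or "preconnect"
  el.href = origin;
  if (useCors) {
    // Needed when the connection will serve CORS-enabled fetches, such as fonts.
    el.crossOrigin = "anonymous";
  }
  doc.head.appendChild(el);
  return el;
}
```

In a page, a call could look like `addConnectionHint(document, "preconnect", "https://fonts.example.com", true);`.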

---

.big.center[`<link rel=prefetch>`]

???

Prefetch is one of the relations that we saw earlier in the big table. Its role is to tell the browser about specific resources that would
be required for the next navigation! As a result, these resources are downloaded with low priority and their download doesn't get
interrupted if we're navigating away from the current page. (because that would be silly)
---

.big.center[`<link rel=prerender>`]
.big.center[`<link rel=next>`]

???

next is another relation that we saw earlier, which started as an indication of what would likely be the next page the user is going to go to
(e.g. we're on page 2 of the article and next is pointing towards page 3).

Some browsers started using that as an indication to prerender said page, in order to speed things up.
Later on, prerender joined next as an indication of a page that should be prerendered.

That's a high gain/high risk kind of optimization though, since if you get it wrong, the browser has downloaded a full page, including all
CSS, scripts and images, for nothing.

The AMP project has implemented its own prerendering, which only downloads the resources required to create that initial viewport
rendering, and it might be a nice idea to improve the way browsers prerender to accommodate that.

---

.big.center[`<link rel=preload>`]

???

Now preload is where things get serious. Preload enables devs to tell the browser that a specific resource would be needed later on.
And more than that, it enables them to tell it what that resource *is*, using the...

---

.big.center[`as` attribute]

???

That's one of the biggest differentiators between preload and previous attempts at getting the same thing done.
It doesn't seem like much, but it enables the browser to do a few things.

--
* Download priority

--
* Proper `Accept` headers

--
* Content-Security-Policy

--
* Resource matching

---
## Use cases
---
## Load late-discovered resources
---
.left[
```html
<link rel=preload href=script.js as=script>
```
]
---
## Early font loading
---
.left[
```html
<link rel=preload href=font.woff2 as=font
      crossorigin type="font/woff2">
```
]
---
## Decouple load from execution
---
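.left[
```javascript
function downloadScript(src) {
  // Fetch the script ahead of time, without executing it.
  var el = document.createElement("link");
  el.as = "script";
  el.rel = "preload";
  el.href = src;
  document.body.appendChild(el);
}

function runScript(src) {
  // Inserting the script element is what triggers execution.
  var el = document.createElement("script");
  el.src = src;
  document.body.appendChild(el);
}
```
]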
---
## Markup based async loading
---
.left[
```html
<link rel=preload as=style href="async_style.css"
      onload="this.rel='stylesheet'">
```
]
---
## Responsive loading
---
.left[
```html
<link rel=preload as=image href="someimage.jpg"
      media="(max-width: 600px)">
```
]
---
.big.center[`Link:`]
---
## Feature detection

.left[
```javascript
var preloadSupported = function() {
  var link = document.createElement('link');
  var relList = link.relList;
  if (!relList || !relList.supports)
    return false;

  return relList.supports('preload');
};
```
]
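???

One way that check could be put to use is sketched below, under some assumptions: `loadStyle` is a hypothetical helper, the document is passed in as a parameter, and browsers without preload support simply fall back to a regular (render blocking) stylesheet link.

```javascript
// Load a stylesheet without blocking rendering where preload is supported.
function loadStyle(doc, href) {
  var link = doc.createElement("link");
  var canPreload = !!(link.relList && link.relList.supports &&
                      link.relList.supports("preload"));
  link.href = href;
  if (canPreload) {
    link.as = "style";
    link.rel = "preload";
    link.onload = function () {
      // Apply the stylesheet once it has been fetched.
      link.onload = null;
      link.rel = "stylesheet";
    };
  } else {
    // No preload support: fall back to a plain stylesheet link.
    link.rel = "stylesheet";
  }
  doc.head.appendChild(link);
  return link;
}
```

In a page this would be called as `loadStyle(document, "async_style.css")`.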

---
## Better than H2 push?

--
* Cross origin

--
* Cache & cookies

--
* load/error events

--
* Content negotiation

---
## `rel=serviceworker`
---
# Takeaways

---
## Links are cool

--
### Link's built-in mechanisms are even cooler
---
## Resource loading is hard

--
### Make it easier, flatten your dependency trees

---
## If that's complex, use preconnect and preload

---
# Thank you!

.center[Yoav Weiss  |     @yoavweiss]
![](media/Akamai-Logo-RGB.png)
---
# Questions?
---
## Acknowledgements

* Link - https://www.flickr.com/photos/32547150@N00/2844476751
* Resources - https://www.flickr.com/photos/36742159@N02/16973927197
* Story - https://www.flickr.com/photos/soldiersmediacenter/3351707140
* Doves - https://www.flickr.com/photos/54942754@N02/19665030619