layout: true
class: center, middle
background-image: url(media/links.jpg)

---

# `<link>`

.center[Yoav Weiss | @yoavweiss]

.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making both browsers and CDNs awesomer, and I'm here today to talk to you about the `<link>` element, browsers' resource loading, the connection between the two, and how *you* can help browsers load resources faster using the magic of the link element.

When we first look at the link element, the obvious problem with it is that it has nothing to do with what we usually refer to as links. If we were to teach someone new to the platform about `<link>` for the first time, they'd think that this is the element for creating links between pages. And they'd be wrong...

So if `<link>` is not about links, what is it about?

---

background-image: url(media/doves.jpg)

## All about relationships

???

The `<link>` element is about relationships between the page you're on and... other things on the internet. It can be a relationship to other pages, to resources, or theoretically to the online representation of "objects".

---

![](media/link_definition.png)

???

Link has been part of HTML for a long while. It was there in the first Internet Draft that defined HTML, as the hidden and underappreciated little brother of `<a>` tags. Both establish a relationship to an external document or resource, but `<a>` has an immediate action: it creates a link that enables the user to hop over to the document it references. `<link>` has no such action, and the processing of the element varies wildly according to its relationship.

// https://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

---

![](media/relationships.png)

???

The original draft included a bunch of relationship types, but stated loud and clear that none of these relationships were part of the spec. Most of the relationships in the original spec are long gone, and no one ever cared much about them. But let's take a look at the relationships defined in a more recent spec, shall we?

---

![](media/link_types.png)

???

A lot of items on that list, but if we look at the relations that actually do something...

---

![](media/link_types_annotated.png)

???

We get a totally different picture. We have `stylesheet` for CSS, `alternate` for RSS and `icon` for icons.

---

![](media/link_types_annotated2.png)

???

We also have `prefetch` and `next`, which we'll discuss later. But the other relations are either only allowed on anchors or don't change the way browsers behave in any way. Some of them have benefits in terms of SEO or other non-browser HTML consumers. But since they have no impact on the UI, I suspect most developers are not really aware of these link relations beyond cargo-cult copy/pasting them around.

In practice, the first time most developers encounter a `<link>` tag is when they're googling the syntax for...

---

### `<link rel="stylesheet" href="...">`

???

which, let's be honest with ourselves, probably should've been something like...

---

### `<style src="...">`

???

Am I right? That could've been simpler. But `<link>` was there, external stylesheets were defined as their own documents that style the HTML document, and the rest is history. Going back on that decision wouldn't be Web compatible, as relative URLs inside those stylesheets would break.

---

## `<link>` is a mechanism

???

We have all these different rel values, and each one of them *does* something different. Which means that `<link>` is a generic mechanism for creating relations. And as a mechanism it is pretty rad, with a lot of nifty nuts and bolts that tend to make it very helpful. Let's go over them, OK?

---

# `rel`

???

`rel` is used to define the relations we just discussed. We've seen the list of rel values in the spec, but the beautiful thing about that list is that it is extensible. Different specifications can define new rel values that define new relations. And over the years browsers have implemented a few link-based mechanisms that rely on that extensibility. We'll talk about them in a while.

---

## body-ok

???

One curious thing is that if we have a `<link>` element outside of the HTML's head, whether that element is valid or not depends on its `rel` value. Some rel values are defined as "body-ok", while others are not. That was introduced in order to support microdata links (which actually don't have rel at all, but an itemprop attribute).

And even though developers have been using in-body styles for a long while, they only recently moved from the naughty list to being "body-ok".

One more recent change is that Chrome stopped treating in-body styles as render-blocking for the entire document, making them render-blocking only for the parts of the document that are below them. That means that if you're mashing together multiple page components on the server side, you can now place in-body styles at the beginning of each "component", and in most browsers they would only block the rendering of the parts that are below them.

---

# `relList`

???

One of the nice things about rel is that it can include more than one relationship. rel's value is a space-separated list of relations, so developers can define multiple relations between one document and another document or resource. And in order to access that list of relations from JavaScript, we have... `relList`, the way to access the various values of rel from script without having to parse the damn thing yourself.

---

## `add()`
## `remove()`
## `contains()`

???

It is a DOMTokenList (just like `classList`) and has cool methods such as `add()`, `remove()`, `contains()`, etc. That makes it easy to manipulate the relationship values.

---

# `supports()`

???

Another method recently introduced to DOMTokenList is `supports()`, which tells you whether a certain token is supported or not.

---

# `href`

???

Stands for Hypertext Reference, which means that it contains the URL of the external document or resource that this link relationship is with.

---

# `media`

???
You may know that one, and Simon Pieters will most probably talk about it in detail when he talks about source later today. That attribute enables the browser to avoid downloading (or to delay downloading) a resource if it is not required for the current user's conditions, as defined by the media query that is the value of the attribute. That can be handy.

---

## `<link rel="stylesheet" href="..." media="...">`

???

If we look at this example, the browser could have avoided downloading this stylesheet altogether. However, when Opera, back in its Presto days, tried to do that, a ton of content broke as a result: websites were relying on these styles being downloaded anyway. So what some browsers did instead was to download such stylesheets, but with lower priority.

---

# `type`

???

Another useful conditional is the `type` attribute, which enables the browser to avoid downloading resources whose MIME type it doesn't support.

---

## `<link rel="stylesheet" href="..." type="...">`

???

Contrary to the previous example, this kind of stylesheet will not get downloaded, at least in most browsers, and no content breaks as a consequence. The Blink rendering engine recently stopped downloading and applying such styles, and the sky didn't fall. WebKit was still downloading them last time I looked.

---

## `onload`/`onerror`

???

Load and error events enable us to know when a resource has been downloaded (or has failed to download) and act on it.

---

## `crossorigin`

???

Determines the CORS state of the element. I won't get into what CORS is beyond the absolutely necessary, but it determines whether the resource is requested with or without credentials, and with or without CORS mode.

---

## `referrerpolicy`
## `sizes`
## `title`
## `hreflang`

???

---

# `Link:`

???

One extremely cool feature of link is that it has an HTTP header equivalent. That means you can replace the HTML tags with HTTP headers. In some cases it may not matter much, but in other cases it makes it significantly easier to automate certain links (which we'll talk about later).

---

#``

---

background-image: url(media/resources_gray.jpg)
layout: true
class: center, middle

---

## Loading resources ain't easy

???

Downloading a Web page is a complex process. Each web page is composed of tens or even hundreds of different resources, some of which load other resources, and loading all the resources that a certain page needs is a complex problem. Doing that in a reasonable time frame is even more so.

So today I'll explain why a browser's job of loading resources is not an easy one, and I'm gonna try to illustrate that using a story.

---

## Once upon a time

![](media/story.jpg)

???

Once upon a time, in a land far, far away, browsers were loading Web pages.

---

.bright[![](media/dom_creation.svg)]

???

Users clicked on a link or typed in a URL, and then the browser would fetch the relevant HTML file, tokenize it, parse it and construct a DOM from it.
Then each DOM node would decide which other resources it may need: scripts, stylesheets, images or other HTML files. And once the DOM node figured that out, it would trigger a download for these resources.

---

## scripts ruin everything!!!

.bright[![](media/scripts_halt_parser.svg)]

???

That process had one glaring problem: scripts. Synchronous scripts (which were the only kind of scripts in that far away land) can interact with the DOM tree and its styles in all kinds of ways. They can modify the DOM tree, and they can return different results depending on the current DOM tree and on the styles that were applied to it. All that means that once a script is encountered, DOM creation has to block there and then.

So DOM creation could not continue until all the scripts blocking it finished running. Those scripts couldn't run until they were downloaded, and until all the styles that preceded them finished downloading and were applied to the DOM.

---

![](media/no-pre-loader-waterfall-ie7.png)

???

And since resource download depended on DOM node creation, having a sync script in your HTML (which almost everyone had) meant that your *other* resources were blocked on it too. While scripts were being downloaded, nothing else was.

---

## Decoupling

???

So how did browsers in that far away land solve this problem? By decoupling resource download from DOM node creation. That way, they could start downloading resources earlier without breaking Web pages that relied on sync scripts altering the DOM.

So browsers started looking at the products of HTML tokenization and speculatively downloading resources based on them. When an `<img>` tag was encountered, for example, they assumed that in most cases sync scripts won't override it in weird ways, so the resource in that img's src attribute would be required, and they added it to their download lists.

---

.blockquote[The greatest browser optimization of all times. Steve Souders]

???

That, in and of itself, was a huge improvement! To the point where Steve Souders, the Godfather of Web performance, once called it the greatest performance improvement of all time.

---

## By any other name

???

Different browsers gave it different names: the speculative parser, the lookahead scanner, the preload scanner. I like to call it the preloader, as a generic, vendor-neutral term.

---

## How does it do its magic?

???

To dive into that, we need to go over HTML processing a little bit.

---

## `
some text
`

???

The browser downloads the HTML file as bits on the wire. Then it decodes them into characters using the appropriate charset.

---

## Tokenization

.bright[![](media/tokenization.svg)]

???

The next phase after that is tokenization: breaking the stream of characters apart into tags and attributes. At that point, no HTML parsing rules are applied yet.

---

## Parsing

.bright[![](media/dom_tree.svg)]

---

## Heuristic download in the tokenization phase

???

The preloader replaces (or complements) the previous behavior of downloading at the parsing phase, by heuristically downloading resources based only on the HTML tokens. There's a certain risk there: in cases where JS breaks everything, these resources won't actually be needed. But since that almost never happens, downloading these resources in advance is a pretty safe bet.

---

## All done??

???

So the preloader is awesome, and browsers continue to develop it further to also catch resources it currently misses, such as CSS-based resources (or at least their hosts) and `document.write()`-based HTML tokens. But that doesn't solve all of the browser's problems when it comes to resource discoverability.

---

background-image: url(media/dependency_tree.jpg)

## Discoverability problem

???

A lot of resources are downloaded by scripts, which are themselves downloaded by scripts. The dependency trees of many sites (the graphs that show which resources depend on which) are extremely deep in some cases. Generally speaking, the browser can only discover and download the resources that are on the next level of the dependency graph. Everything below that is just a big unknown.

This is an example of a (sloppily) hand-drawn dependency graph of a not-untypical site. Only the very first level of resources is discoverable by the browser. Everything else requires JS to run in order for the browser to know that it needs these resources.

---

## How can we help?

???

What can you do to help the browser there?
Well, recently a set of new standards was specified and added to browsers. They let you tell the browser about those deep dependency graph resources, make sure that the browser is aware of them before they are required, and have it download them ahead of time.

These standards are a reincarnation of older, de facto additions to browsers that tried to do the same, with varying levels of success. Let's take a look at what they do.

---

.big.center[`<link rel="dns-prefetch" href="...">`]

???

DNS prefetch is a link relation that enables developers to tell the browser "you're gonna need to go to this host, so better resolve its DNS now". It's a no-brainer if you know certain hosts will be required. The cost in terms of bandwidth is negligible. The gain, OTOH, is not huge, as you're only taking the DNS lookup off your critical path.

---

.big.center[`<link rel="preconnect" href="...">`]

???

Preconnect is a step up. It tells the browser the same "you're gonna need that", but takes it to the next level by making sure that it also establishes a TCP connection, and a TLS connection if needed. That takes a lot more RTTs out of the way.

---

.big.center[`<link rel="preconnect" href="..." crossorigin>`]

???

Where it gets slightly more complicated is that, for privacy reasons, CORS-enabled, credential-less connections are made using a separate socket pool. That means the browser needs to know if a connection will be used for CORS-enabled fetches. In practice, if you have a host you'll be getting font resources from, you need to add the `crossorigin` attribute if you want these connections to be put to good use.

---

.big.center[`<link rel="prefetch" href="...">`]

???

Prefetch is one of the relations we saw earlier in the big table. Its role is to tell the browser about specific resources that will be required for the next navigation! As a result, these resources are downloaded with low priority, and their download doesn't get interrupted if we navigate away from the current page. (Because that would be silly.)

---

.big.center[`<link rel="next" href="...">`]

.big.center[`<link rel="prerender" href="...">`]

???

`next` is another relation we saw earlier. It started as an indication of the page the user is likely to go to next (e.g. we're on page 2 of an article and next points to page 3). Some browsers started using it as a hint to prerender said page, in order to speed things up. Later on, `prerender` joined `next` as an indication of a page that should be prerendered. That's a high-gain/high-risk kind of optimization, though: if you get it wrong, the browser has downloaded a full page, including all its CSS, scripts and images, for nothing.

The AMP project has implemented its own prerendering, which only downloads the resources required to render the initial viewport, and it might be a nice idea to improve the way browsers prerender to accommodate that.

---

.big.center[`<link rel="preload" href="...">`]

???

Now preload is where things get serious. Preload enables devs to tell the browser that a specific resource will be needed later on. And more than that, it enables them to tell the browser what those resources *are*, using the...

---

.big.center[`as` attribute]

???

That's one of the biggest differentiators between preload and previous attempts at getting the same thing done. It doesn't seem like much, but it lets the browser get a few things right:

--

* Download priority

--

* Proper `Accept` headers

--

* Content-Security-Policy

--

* Resource matching

---

## Use cases

---

## Load late-discovered resources

---

.left[
```html
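<!-- Hypothetical sketch, not the original slide's markup: "late-discovered.js"
     is an assumed name for a resource the browser would otherwise only find
     after other scripts run. The preload starts its download as soon as the
     tag is seen, and `as` tells the browser what kind of resource it is. -->
<link rel="preload" href="late-discovered.js" as="script">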
```
]

---

## Early font loading

---

.left[
```html
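<!-- Hypothetical sketch; the font URL is an assumption. Fonts are fetched in
     CORS mode, so the preload needs crossorigin (even for same-origin fonts)
     for the later font request to reuse it. The type attribute lets browsers
     that can't decode WOFF2 skip the download. -->
<link rel="preload" href="fonts/example.woff2" as="font" type="font/woff2" crossorigin>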
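```
]

---

## Decouple load from execution

---

.left[
```javascript
// Fetch a script without executing it: the preload link
// downloads it and keeps the response around for later use.
function downloadScript(src) {
  var el = document.createElement("link");
  el.as = "script";
  el.rel = "preload";
  el.href = src;
  document.body.appendChild(el);
}

// Execute it when needed: a <script> with the same URL picks up
// the preloaded response instead of fetching it again.
function runScript(src) {
  var el = document.createElement("script");
  el.src = src;
  // The script is only fetched and run once it's inserted into the document.
  document.body.appendChild(el);
}
```
]

---

## Markup based async loading

---

.left[
```html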
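<!-- Hypothetical sketch; the stylesheet URL is an assumption. The stylesheet
     downloads without blocking rendering. Once it has loaded, the handler
     clears itself and applies the styles by turning the preload into a
     regular stylesheet link. -->
<link rel="preload" href="async-styles.css" as="style"
      onload="this.onload=null; this.rel='stylesheet'">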
```
]

---

## Responsive loading

---

.left[
```html
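<!-- Hypothetical sketch; URLs and breakpoints are assumptions. Only the
     preload whose media query matches the current viewport is fetched, so
     small screens get a static image and larger ones get the interactive script. -->
<link rel="preload" href="map-static.png" as="image" media="(max-width: 600px)">
<link rel="preload" href="map-interactive.js" as="script" media="(min-width: 601px)">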
```
]

---

.big.center[`Link:`]

---

## Feature detection

.left[
```javascript
// Returns true if the browser supports <link rel="preload">.
var preloadSupported = function() {
  var link = document.createElement('link');
  var relList = link.relList;
  // Older browsers lack relList or DOMTokenList.supports();
  // treat that as "no preload support".
  if (!relList || !relList.supports) return false;
  return relList.supports('preload');
};
```
]

---

## Better than H2 push?

--

* Cross origin

--

* Cache & cookies

--

* load/error events

--

* Content negotiation

---

## `rel=serviceworker`

---

# Takeaways

---

## Links are cool

--

### Link's built-in mechanisms are even cooler

---

## Resource loading is hard

--

### Make it easier: flatten your dependency trees

---

## If that's too complex, use preconnect and preload

---

# Thank you!

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

# Questions?

---

## Acknowledgements

* Link - https://www.flickr.com/photos/32547150@N00/2844476751
* Resources - https://www.flickr.com/photos/36742159@N02/16973927197
* Story - https://www.flickr.com/photos/soldiersmediacenter/3351707140
* Doves - https://www.flickr.com/photos/54942754@N02/19665030619