layout: true
class: center, middle
background-image: url(media/mechanism.jpg)

---

# On Preloads and Preloaders

.center[Yoav Weiss | @yoavweiss]
.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making our CDN, as well as browsers, faster, and I'm here today to answer two fundamental questions:

---

# How do browsers load resources?

???

Something we rarely think about in our day-to-day. We just add resources to our pages and the browser just loads them! But how does the browser know which resources to load?

---

# How can we make it faster?

???

'Cause we're here to talk about performance, right?

---

# Start from the beginning

???

In order to understand how browsers load resources today, let's start by taking a trip down memory lane and examine how they used to load resources.

---

.bright[![](media/dom_creation.svg)]

???

In the early days of the Web, users clicked on a link or typed in a URL, and then the browser would fetch the relevant HTML file, tokenize it, parse it and construct a DOM from it. Then each DOM node would decide which other resources it may need: scripts, stylesheets, images or other HTML files. And once a DOM node figured that out, it would trigger a download for those resources. In short, each DOM node, when created, downloads the resources that it needs. Anyone spot a problem with that?

---

## scripts ruin everything!!!

.bright[![](media/scripts_halt_parser.svg)]

???

Synchronous scripts (the only kind of scripts in the early days) can have all kinds of interactions with the DOM tree and its styles. They can modify the DOM tree, and they can return different results depending on the current DOM tree and on the styles that were applied to it. All that means that once a script is encountered, DOM creation has to block there and then.
So DOM creation could not continue unless all the scripts that block it had finished running, and those scripts couldn't run until all their resources were downloaded, and until all the styles that preceded them had finished downloading and were applied to the DOM.

---

.center[![](media/no-pre-loader-waterfall-ie7.png)]

???

And since resource download depended on DOM node creation, having a sync script in your HTML (which almost everyone had) meant that your *other* resources were blocked on it too. While scripts were being downloaded, nothing else would be. That resulted in a staircase-shaped network waterfall.

---

## Decoupling

???

So how did browsers solve this problem? By decoupling resource download from DOM node creation. That way, they could start downloading resources earlier while not breaking Web pages that relied on the fact that sync scripts can alter the DOM.

---

background-image: url(media/james_brown.jpg)

.blockquote[The greatest browser optimization of all times. Steve Souders]

???

That, in and of itself, was a huge improvement! To the point where Steve Souders, the Godfather of Web performance, once called it the greatest performance improvement of all time.

---

background-image: url(media/rose.jpg)

# By any other name

???

Different browsers called it different names: the speculative parser, the look-ahead scanner, the preload scanner. I like to call it the preloader, as a generic, vendor-neutral term.

---

## How does it do its magic?

???

To dive into that, we'll need a quick overview of HTML processing.

---

## `some text`

???

The browser downloads the HTML file as bytes on the wire. Then it decodes them using the appropriate charset. Does anyone spot problems with that HTML?

---

## Tokenization

.bright[![](media/tokenization.svg)]

???

The next phase after that is tokenization: breaking the stream of characters apart into tags and attributes. At that point, no HTML rules are applied yet.

---

## Parsing

.bright[![](media/dom_tree.svg)]

???

Then parsing takes place, which applies HTML rules to the tokens and creates a DOM tree from them.

---

## Heuristic download in the tokenization phase

???

The preloader replaces (or adds to) the previous behavior of downloading in the parsing phase, by heuristically downloading resources based on the HTML tokens alone. There's a certain risk that these resources won't actually be needed, in cases where JS breaks everything. But since that almost never happens, downloading these resources in advance is a pretty safe bet.

---

## All done??

???

So, the preloader is awesome, and browsers continue to develop it further to also catch resources that are not currently caught. But that doesn't solve all of the browser's problems when it comes to resource discoverability. Nowadays, many of our pages are not quite flat in their structure.

---

![](media/tree_html_only.svg)

???

The browser starts out by downloading just the HTML. That's the only URL it's aware of.

---

![](media/tree_html_first_subresources.svg)

???

Then, as the HTML is processed, CSS, images and scripts are discovered.

---

![](media/tree_html_second_subresources.svg)

???

But in many cases, CSS and scripts load other resources.

---

![](media/tree_html_third_subresources.svg)

???

Which can then load even more resources. CSS background images are declared in CSS, so they're discoverable only when the browser calculates styles. Fonts were later added to the mix, and fonts are even more complicated, as they're only downloaded when the browser knows they will be used.
So it has to calculate the style, and cross those styles with the DOM to see if the resource is really needed.

---

background-image: url(media/dependency_tree.jpg)

## Discoverability problem

???

A lot of resources are downloaded by scripts, which are downloaded by scripts. The dependency trees for many sites - the graphs that show which resources depend on which - are extremely deep in some cases. Generally speaking, the browser can only discover and download the resources that are on the next level of the dependency graph. Everything below that is just a big unknown. This is an example of a (sloppily) hand-drawn dependency graph of a not-untypical site. Only the very first level of resources is discoverable by the browser. Everything else requires JS to run in order for the browser to know that it needs these resources.

---

## Bandwidth utilization problem

![](media/espn_bandwidth.png)

???

https://www.webpagetest.org/video/compare.php?tests=161027_M4_2a5c527b9d0bcc489be71e38971f5619-r:1-c:0

---

# How can we fill the gaps?

???

How can we solve that? How can we fill those bandwidth gaps? I like to divide the bandwidth gap problem into two periods, with two different solutions.

---

# Pre-HTML

???

Let's start by looking at the "pre-HTML" period. During that period, the browser is not aware of any resources that it needs to download beyond the HTML. What are the typical reasons our HTML gets stalled, which extend that period?

---

# Server "think-time"

---

# Redirects

---

.center.large[![](media/espn_bandwidth_pre_html.png)]

???

---

.center.large[![](media/espn_bandwidth_pre_html_potential.png)]

???

---

# H2 to the rescue

---

layout: true
class: center, middle
background-image: url(media/push.jpg)

---

# H2 push

???

H2 push enables us to take advantage of this idle period and push down resources that we know the browser is going to need. That can have a double advantage:

---

# Using idle bandwidth

---

# Warming up connection

???
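To get a feel for why warming up the connection matters, here's a rough back-of-envelope sketch. It assumes textbook TCP slow start with an initial window of 10 segments, a 1460-byte MSS, and the congestion window doubling every round trip; real TCP stacks differ, so treat the numbers as illustrative only.

```javascript
// Rough model: how many RTTs does TCP slow start need to deliver
// `bytes`? Assumes initcwnd = 10 segments, MSS = 1460 bytes, and the
// congestion window doubling every round trip. Illustrative only.
function slowStartRounds(bytes, initcwnd, mss) {
  initcwnd = initcwnd || 10;
  mss = mss || 1460;
  var delivered = 0;
  var cwnd = initcwnd;
  var rounds = 0;
  while (delivered < bytes) {
    delivered += cwnd * mss;
    cwnd *= 2;
    rounds += 1;
  }
  return rounds;
}
```

With this model and the example numbers above (98 KB of CSS+JS at a 100 ms RTT), delivering the critical resources takes three slow-start round trips: roughly the whole 300 ms of server think-time that push could put to work.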
Explain slow start.

---

![](media/page_loading_nopush.svg)

???

* 300 ms think time
* 100 ms RTT
* 98 KB CSS+JS
* 120 KB HTML

---

![](media/page_loading_push.svg)

???

---

.center.large[![](media/push_all_the_things.png)]

---

# DON'T

.center.large[![](media/push_all_the_things.png)]

---

# Priorities

???

When pushing, you're taking away the browser's ability to tell you about the resources and their priorities. Browsers have put years of research into resource prioritization, and pushing everything takes that away from them. If you do that, you'd better be pretty confident that you're doing a better job, but chances are, you are not.

---

# Caching

???

The second point where push is not ideal is browser caching. Unless you've built a mechanism that keeps track of the browser's cache state (and no such mechanism is ideal), you don't know whether the resources you're pushing are already in the browser's cache. That means that for repeat visitors, you're likely to push content that the user already has, wasting bandwidth and time. The browser should RST push streams for resources it already has in cache, but it turns out no browser reliably does that today. There's work underway in Chrome, but it's not there yet.

---

background-image: url(media/cache_digests.png)

# Cache digests

???

There's a proposal from Kazuho (the H2O developer, as well as the guy speaking here right after me) to enable browser-based cache digests, so that the server is notified about the user's cache state when the connection starts. That proposal is great and will alleviate most of the caching concerns, but it's not there yet.

---

layout: true
class: center, middle
background-image: url(media/pull.jpg)

---

background-image: url(media/mechanism.jpg)

# Post-HTML

.center[![](media/espn_bandwidth_post_html.png)]

???

This is what we can do before the HTML arrives. What can we do after it has?
Well, recently a set of new standards was specified and added to browsers that can help you tell the browser about those deep dependency-graph resources, making sure the browser is aware of them before they're required and can download them ahead of time. These standards are a reincarnation of older, de facto additions to browsers that tried to do the same, with varying levels of success. Let's take a look at what they do.

---

.big.center[`<link rel="preconnect" href="https://example.com">`]

???

Preconnect enables us to tell the browser that it's going to need to connect to a certain host, so that it can perform a DNS lookup, establish a TCP connection, and set up a TLS connection if needed. That takes a lot of RTTs out of the way.

---

.big.center[`<link rel="preconnect" href="https://fonts.example.com" crossorigin>`]

???

Where it gets slightly more complicated is that for privacy reasons, CORS-enabled, credential-less connections are performed using a separate socket pool. That means that the browser needs to know if a connection will be used for CORS-enabled fetches. In practice, if you have a host you'll be fetching fonts or XHR resources from, you need to add the crossorigin attribute if you want these connections to be put to good use.

---

.big.center[`<link rel="preload" href="resource.js" as="script">`]

???

Now preload is where things get serious. Preload enables devs to tell the browser that a specific resource will be needed later on. And more than that, it enables them to tell it what those resources *are*, using the...

---

.big.center[`as` attribute]

???

That's one of the biggest differentiators of preload from previous attempts at getting the same thing done. It doesn't seem like much, but it enables a few things in the browser.

--

* Download priority

--

* Proper `Accept` headers

--

* Content-Security-Policy

--

* Resource matching

---

## Use cases

---

## Load late-discovered resources

---

.left[
```html
<link rel="preload" href="late_discovered_thing.js" as="script">
```
]

---

## Early font loading

---

.left[
```html
<link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>
```
]

---

## Decouple load from execution

---

.left[
```javascript
function downloadScript(src) {
  var el = document.createElement("link");
  el.as = "script";
  el.rel = "preload";
  el.href = src;
  document.body.appendChild(el);
}

function runScript(src) {
  var el = document.createElement("script");
  el.src = src;
  // The element must be appended, or the script never executes.
  document.body.appendChild(el);
}
```
]

---

## Markup based async loading

---

.left[
```html
<link rel="preload" href="async_style.css" as="style"
      onload="this.rel='stylesheet'">
```
]

---

## Responsive preloading

---

.left[
```html
<link rel="preload" href="map.png" as="image"
      media="(max-width: 600px)">
```
]

---

.big.center[`Link:`]

```http
Link: </late_discovered_thing.js>;rel=preload;as=script
```

---

## Feature detection

.left[
```javascript
var preloadSupported = function() {
  var link = document.createElement('link');
  var relList = link.relList;
  if (!relList || !relList.supports)
    return false;
  return relList.supports('preload');
};
```
]

---

# Push hint

.left[
```http
Link: </thing_to_push.js>;rel=preload;as=script
Link: </thing_not_to_push.js>;rel=preload;as=script;nopush
```
]

---

# Preload > push?

--

* Cross origin

--

* Cache & cookies

--

* load/error events

--

* Content negotiation

---

# But can't #pre-HTML

???

But it's important to remember that preload's big disadvantage is that it can't help during the "pre-HTML" period.

---

background-image: url(media/early_hints.png)

# Or can it?

---

layout: true
class: center, middle
background-image: url(media/takeaway.jpg)

---

---

## Resource loading is hard

---

## Bandwidth is often underutilized

---

## Push critical resources during HTML think-time

---

## Preload late-discovered resources

---

## Preload enables new loading patterns

---

background-image: url(media/thanks.jpg)

# Thank you!

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

background-image: url(media/thanks.jpg)

# Questions?

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

background-image: url(media/thanks.jpg)

## Acknowledgements

* Resources - https://www.flickr.com/photos/36742159@N02/16973927197
* James Brown mural - Jay Galvin https://www.flickr.com/photos/36957368@N00/8254471580
* Mechanism - Vlastimil Koutecký https://www.flickr.com/photos/vlastimil_koutecky/9205853995/
* Rose - Susanne Nilsson https://www.flickr.com/photos/infomastern/9916290543/
* Push - The US Army https://www.flickr.com/photos/soldiersmediacenter/14789290273/
* Pull - Clemens v. Vogelsang https://www.flickr.com/photos/vauvau/3507527748/
* Takeaway - Jeremy Segrott https://www.flickr.com/photos/126337928@N05/23964405150/
* Thanks - Alessio Maffeis https://www.flickr.com/photos/imaffo/4736628110/