layout: true
class: center, middle
background-image: url(media/treasure.jpg)

---

# Caches All The Way Down

.center[Yoav Weiss | @yoavweiss]
.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making our CDN, as well as browsers, faster, and I'm here today to talk a bit about caching.

---

## 2 Hard Things in Computer Science

???

You've most probably heard that there are 2 hard things in computer science.

---

# Naming Things

???

Naming things is the obvious one, because giving something a short yet meaningful name is a hard cognitive exercise. But the other one,

---

# Cache Invalidation

???

, is less obvious. If you've never had to deal with caches, you may not grasp at first why cache invalidation is extremely hard.

---

.center[![](media/cache_definition.png)]

???

But what is a cache? If we look up the word's origins, it comes from the French verb "cacher", "to hide", so it's basically a hiding place for computer programs, where they can stash data and keep it around until they need it at a later time. So, you're a program, deciding to keep data around for later use. Why is invalidating that data so hard that it's right up there in the computer science hall of shame, right alongside *this*?

---

.center[![](media/awful_name.png)]

???

First, when you're deciding to store something in the cache, you have to guess whether this data is valuable enough to keep around. Why can't we just keep everything? Because we live in a world of finite resources: after your cache has been running for a short while, putting a resource in also means throwing something out. In other words,

---

# Eviction

???

...means you have to be fairly certain that the resource you put in the cache is more valuable than the one you're throwing out as a result.
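(Speaker note aside: eviction policies are a topic of their own. A common heuristic is LRU, evicting the least recently used entry. Here's a minimal sketch of the idea in JavaScript — an illustration, not how any particular browser implements it.)

```javascript
// Minimal LRU cache sketch: a Map remembers insertion order, so
// re-inserting a key on access moves it to the "most recent" end.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }
  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Refresh recency by re-inserting the entry at the end.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}
```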
Second, when you're serving some piece of data out of the cache, you have to be pretty damn sure this piece of data is still valid, and not stale, yesterday's news. You have to be sure you're serving

---

# Fresh

???

content. Which means that more often than not, you'd prefer to play it safe and revalidate that data, even when that's unnecessary. And when you look at these two points together, you realize that cache invalidation is hard because it requires you to

---

background-image: url(media/8ball.jpg)

???

predict the future. Only time will tell if you made the right decision. That resource you just evicted in order to put a new and shiny one in its place? It may be needed a second from now, while the one you stored instead is something you'll never see again. That article you revalidated? 90% of the time you'll find that the resource didn't change, so you paid the extra latency of the revalidation for nothing. But you couldn't know that ahead of time. Because we can't predict the future.

(3 minutes)

---

background-image: url(media/everywhere.png)

# Caches Everywhere!

???

But despite the fact that caching is hard, despite the fact that you can't always get it right, we have caches in computers all the way down to the CPU. CPUs have built-in caches, multiple layers of them, called L1, L2 and L3. Each layer has larger storage and is less expensive, but is ultimately also slower to access. When the CPU is trying to get some data, it will go to RAM only after it has failed to find it in its caches. There are also operating system caches, which avoid reading from disk by keeping popular disk pages in RAM. And many programs keep cached info in RAM in order to avoid fetching data from disk, which is significantly slower, or from the network, which is slower still, as well as unpredictable!

---
| Storage  | Latency      | Relative cost |
|----------|--------------|---------------|
| L1       | 0.5 nanosec  | x1            |
| L2       | 7 nanosec    | x14           |
| L3       | 30 nanosec   | x60           |
| RAM      | 100 nanosec  | x200          |
| SSD      | 150 microsec | x300,000      |
| HDD Seek | 10 millisec  | x20,000,000   |
| Network  | 150 millisec | x300,000,000  |
???

A few figures, just so you get the orders of magnitude we're discussing: *walk through table*

This table right here is why we bother with caches. Yes, they are not perfect and require complex logic, but the alternative is so much worse. The alternative could be 300 million times slower!!!

---

## Caching on the Web

???

OK, so caching is awesome, but what does caching on the Web look like? What are the different caches that a request hits on its way to your server?

---

class: contain
background-image: url(media/questy_1304.png)

???

At first a request object is created inside the rendering engine. Its sole purpose is to find a matching resource and bring it back to the rendering engine so that the resource can be used as part of the rendered page. That resource could be an image, a script or any other external resource. There could also be many reasons for the request to be created: the user clicked on a link, HTML was parsed, or a JS API created the request. And each one of these requests is a little different: it may have a different type, different credential settings, or other internal differences, beyond the different URL. So, the created request is looking for its resource, and the first place to look is in the closest cache:

---

background-image: url(media/memorycache_1377.png)

# MemoryCache

???

Or as it should be called, the short term memory cache. That cache is part of the renderer and keeps in RAM resources that the renderer has seen before, but it disappears when the renderer is destroyed because the user clicked away. So if the resource we're looking for was previously loaded on that page by the preloadScanner,

---

background-image: url(media/memorycache_1377.png)

## `<link rel="preload" href="..." as="...">`

???

a preload link,

---

background-image: url(media/memorycache_1377.png)

## `<img src="...">`
## `<script src="..."></script>`
## `<link rel="stylesheet" href="...">`

???

or multiple tags (show examples), then the resource would be in the MemoryCache, and we can use it and stop our quest for a resource. You'll sometimes hear people refer to it as the Image Cache or the Preload Cache. At least in Chrome, this is all done by a single cache. There are efforts underway to separate preload from other resources, but all in all this entire area is totally under-specced and implementations can vary. Not great :/

---

background-image: url(media/mismatch_2114.png)

???

The MemoryCache also has a bunch of rules regarding which resources can be a match for which requests. Obviously, URL matching is a prerequisite, but there's also type matching, so that an image request can't get, for example, a script resource as a response. There's also credential checking and other conditions.

---

background-image: url(media/non_cacheable_1600.png)

???

At the same time, HTTP caching semantics are not part of that, and the MemoryCache will happily serve non-cacheable resources to requests, since it is volatile by nature. The one exception is that the MemoryCache will not serve no-store responses. Like I said, all this is underspecced, and we should totally do a better job and spec that whole thing, probably as part of the Fetch spec.

---

class: contain
background-image: url(media/resource_timing.png)

???

If our request didn't find the right resource in the MemoryCache, it continues on its way. At that point it gets registered as a network request in both Resource Timing and DevTools. That means that if a request was served from the MemoryCache, it will not appear in your DevTools network tab, nor in your Resource Timing timeline. After that, the request continues to the Service Worker.

---

class: contain
background-image: url(media/service_worker_975.png)

# Service Worker

???

Service Workers, as I'm sure you've heard, are extremely powerful, in-the-browser JS proxies which enable you to manipulate requests and responses.
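(Speaker note aside: the classic cache-falling-back-to-network decision flow can be sketched outside the browser. The simulation below uses a plain Map standing in for the Cache API, and the URL and responses are made up.)

```javascript
// Sketch of the cache-falling-back-to-network pattern, simulated
// with a plain Map standing in for the Service Worker Cache API.
const swCache = new Map();

// Stand-in for fetch(): pretend every network round trip succeeds.
async function fakeNetworkFetch(url) {
  return `network response for ${url}`;
}

async function handleRequest(url) {
  // 1. Try the Service Worker's own cache first.
  if (swCache.has(url)) {
    return swCache.get(url);
  }
  // 2. Cache miss: go further down, to the network stack
  //    (HTTP cache, push cache, and eventually the network).
  const response = await fakeNetworkFetch(url);
  // 3. Optionally store the response for future requests.
  swCache.set(url, response);
  return response;
}
```

In an actual Service Worker, the same flow would live inside a `fetch` event listener, using `caches.match()` and `event.respondWith()`.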
As such, they have their own separate cache with an API of its own. From the request's perspective, the Service Worker is totally unpredictable and can return anything as its response. It can return a response to a totally different request, a made-up response, or a previous response it stored for this request. The logic is not baked into the browser, but created by the Web developer. And by default, the cache is not bound to HTTP semantics. If the Service Worker has no matching resource for our request, it uses `fetch()` to send it further down, to the network stack (which is often in a different process, which means some extra latency).

---

class: contain
background-image: url(media/http_cache_765.png)

# HTTP cache

???

At the network stack, *the* place to look for resources is the HTTP cache! The HTTP cache is rather strict, and follows all the HTTP caching semantics to the letter. We'll soon discuss what that means exactly. At the same time, it ignores many of the restrictions placed on the MemoryCache, and does allow mixed-type matches (so image requests can get a script response there, for example). The HTTP cache is a persistent cache, which means that it also has to evict resources (and has some eviction scheme), and since it uses persistent storage, it can be significantly slower than the MemoryCache. But if the HTTP cache doesn't have a resource for our request, we'd now have to go to the network, right? Wrong! When working with HTTP/2, there's one more cache on our request's way before it hits the network.

---

class: contain
background-image: url(media/push_cache_890.png)

# Push cache
### AKA: The Unclaimed Push Stream Container

???

The push cache is a non-persistent container on the H2 connection which keeps around resources that were pushed by the server. H2 push is a feature that enables the server to send resources to the browser before it has requested them.
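(Speaker note aside: how a server initiates a push varies by implementation. As an illustration — this is an assumption about your stack, not part of the talk — nginx 1.13.9+ can push a resource alongside a response; the `/styles.css` path is made up.)

```nginx
server {
    listen 443 ssl http2;

    location = /index.html {
        # Push the stylesheet alongside the HTML response,
        # before the browser has asked for it.
        http2_push /styles.css;
    }
}
```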
When that happens, these resources are stored in the push cache, waiting for matching requests to come along. Once a request matches the resource, it is taken out of the push cache, and often then goes into the HTTP cache. Because the push cache is owned by the H2 connection, if the connection is closed, those pushed resources are gone. It also means that if you pushed a resource on one connection, and the request for it comes in on a separate connection, the pushed resource won't be used. Request and resource matching in the push cache is also underspecced, so the rules there may vary between implementations. But if our resource is not waiting for us in the push cache, the next stop is the network.

---

background-image: url()
???

The network can be an extremely unpredictable medium: latencies vary as queues fill up along the network; packets can get lost due to such queues overflowing, collisions at the radio layer, data corruption of in-flight packets, and more. It's a pretty scary place. Latencies also vary by the type of network we're dealing with and the distance the data has to travel. Wifi networks have relatively low latency, but high packet loss rates when they get congested, which results in jittery latency. Cellular networks have relatively high latencies, even though those have gotten better over the generations: 2G (GPRS) had a latency of ~700ms, while 4G has a theoretical one of 50ms. And wired networks usually have low latencies, but still, connecting two continents with fiber requires us to send light beams from one place to the other. The speed of light is an inherent lower bound on that latency.

---

background-image: url()
???

So, if latency is such a dominant factor in our web apps performing well, what can we do to lower it? That's where CDNs come in. Sitting as close as possible to your ISP's gateway are the CDN edge servers, which are likely to serve you the content from their internal cache rather than send the request all the way up to the origin server. They also often terminate your TLS connection, significantly lowering the cost of TLS connection establishment (again, by fighting latency). At Akamai we see a median latency of 80ms between the user and the edge server on mobile networks, but that can vary based on the network type. So, if we found our resource here, that's awesome. And if we didn't? The CDN edge server needs to forward the request to the origin (this kind of origin-facing caching proxy is what's known as a reverse proxy).

---

background-image: url()
???

At the origin server's network, you're likely to hit another proxy, another cache. It might be an independent reverse proxy, or a software component such as Redis inside your server architecture. Why do we need another cache server as part of the origin's network? Latency between that server and the origin is likely to be very low... On the other hand, when the server is creating dynamic content, it has to talk to databases and potentially fetch data from different places across its network, or even from external APIs. That can take a while. And if the same request hits the server multiple times in a row, having caches in place can save that time and processing power by serving the response of the first request to future identical requests. These caches may or may not follow HTTP caching semantics, based on their configured logic, which is in the Web developer's control, at least in theory. (In practice they can come preconfigured, or be controlled by a different group from the one handling the content.) And if that last line of caching cannot serve our request:

---

background-image: url()
???

We'd have to find it at the origin server. The origin server must either have our resource, generate it, or declare failure (a 400 or a 500 response). And once it serves us that response

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

???

Now our request and resource have found each other and can start their way back to the browser.

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)
---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)
---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

# HTTP cache

???

The HTTP cache stores a copy of the resource if it's cacheable.

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

# Service Worker cache

???

The SW sees a stream of the resource passing by and may or may not decide to store it for future use.

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

# ResourceTiming

---

class: contain
background-image: url(media/happilly_ever_after_1600.png)

# MemoryCache

???

And finally, the MemoryCache stores a pointer to the resource, so it can direct future requests to it, but without creating extra memory allocations.

(20 minutes)

---

background-image: url(media/http:1.0.png)

???

So we talked a lot about HTTP caching semantics while going over the request's lifetime, but what is that exactly? Well, the good folks who standardized the HTTP/1.0 protocol realized that caching is important, and included caching directives as part of the protocol.

---

background-image: url(media/http:1.1.png)

???

A few years later, with some more implementation experience and understanding of what people need, the HTTP/1.1 protocol revamped those caching directives and improved them. Let's go over those HTTP mechanisms, shall we? First of all, the cache key for HTTP is the resource URL. If you request the same URL twice, the first response could be cached and used to serve the second one. Other key concepts in HTTP caching are "freshness" and "validators".

---

# Freshness

???

The freshness of a resource determines how long you can use it without revalidating it. If you remember the "predicting the future" part I talked about a few minutes ago, determining the right freshness for a resource is it.

---

### `Cache-Control: max-age=3600`

???
If you include a "max-age" directive of 3600 seconds, you're basically telling the browser and any other cache along the way that you guarantee that this resource will not change in the next hour, but after that, it might. That means that if you shipped a thing, found a bug a minute later, fixed it and deployed it, your users may continue to see that bug for almost an hour after the fix was deployed. Not great... Once the freshness lifetime of the resource has run out, that doesn't mean that the resource has changed. It just means that you need to revalidate it. That revalidation happens with "conditional requests" that use

---

# Validators

???

HTTP responses may include headers such as `Last-Modified` and `ETag`. What do these headers do? They tell the cache "this resource was last modified on this date" (duh!) or provide a signature of the resource. That enables the cache to revalidate the resource at a relatively low cost.

---

class: left

### `If-None-Match: badbaaadbeef`
### `If-Modified-Since: Mon, 29 May 2017 15:32:00 GMT`

???

It can send out a request with an `If-Modified-Since` or `If-None-Match` header, basically asking the server "Did this thing change?", and the server can reply with "Nah", or in HTTP, a

---

## `304 Not Modified`

???

response. Or, if the resource has changed, the server can reply with a 200 OK response, just like it normally would. So validators give us the ability to revalidate a response without downloading its payload if it hasn't changed. That's awesome, but revalidation still has a cost, as it still takes a full RTT (round trip time) to get the response back and know we can use the resource at hand.

---

# Scope

???

Another aspect of HTTP's caching directives is their scope, and *who* can cache said resources.

---

background-image: url(media/amazon_private_info.png)

???
For some resources it's perfectly fine to cache them in the browser for a particular user, but it would be an awful privacy breach to cache them on the network as publicly cached resources. In other cases it's fine for an edge cache to cache a resource, but not for the user's browser, as the resource may change, and we don't want it in the user's cache, where it's out of our control.

---

# Cache Directives

???

How do we define freshness, validators and scope in HTTP? Freshness is time based, so we have a bunch of directives that indicate that time.

---

# `Expires:`

???

In HTTP/1.0 we had the `Expires` header, which means you have to set a date in the future at which that resource will expire.

---

## `Pragma: no-cache`
???

HTTP/1.0 included that as a request header, indicating the client doesn't want cached content. For a long while, people tended to use it as a response header. It's not one. It never was. Don't.

---

# `Cache-Control:`

???

HTTP/1.1 gave us `Cache-Control`, which has multiple directives influencing freshness, scope, and how cached resources can be used.

---

# Freshness
## `max-age`
## `s-maxage`
## `immutable`

???

`Cache-Control` has a `max-age` value which gives us a way to say "this is fresh for the next hour" without doing time-based math at the server. `s-maxage` is applicable only to shared caches, which means that we can define different lifetimes for shared caches and for private caches. `immutable` means "this resource will never change, so don't bother revalidating it". It's a recent addition, only supported by Firefox at the moment.

---

# Usage
### `must-revalidate`
### `no-cache`
### `no-store`
### `proxy-revalidate`
### `no-transform`
### `stale-while-revalidate`

???

Now, we talked earlier about the 2 hardest things in computer science. must-revalidate and no-cache are the 2 biggest lies in computer science.

---

## `must-revalidate`

--

## (may not revalidate)

???

must-revalidate means that your content will not be revalidated as long as it is fresh, but cannot be served stale once its freshness has run out. That's something you should use only when the content you serve will really be invalid after its freshness has run out.

---

## `no-cache`

--

## (will cache)

???

no-cache means that your content will be cached, but won't be served without revalidation.

---

## `no-cache=set-cookie`

???

no-cache can also name a specific header in its definition, which changes its meaning further: "do not cache that particular response header, but everything else can be cached (and be served without revalidation) just fine" ... I know.

---

## `no-store`

???
Will actually do what it says: avoid storing the resource on disk, and evict it from memory as soon as possible (which can lead to issues).

---

### `proxy-revalidate`

???

proxy-revalidate is the same as must-revalidate (which doesn't have to revalidate), but only applicable to public caches, not to private ones.

---

### `no-transform`

???

no-transform means that a cache serving this response must not change it in any way. One example of caches that alter responses is optimization proxies, which may compress images or minify CSS and JS. That directive tells them not to.

---

### `stale-while-revalidate`

???

stale-while-revalidate lets a cache serve a stale response immediately, while it revalidates it in the background.

---

# Scope
## `public`
## `private`

???

Then we have Cache-Control directives that define the scope of the resource's cacheability. `Cache-Control: private` means that a resource is not cacheable as a public resource, and can only be cached on the client, in the browser's cache. `Cache-Control: public` means exactly the opposite.

---

# And by default?

???

By default, a cache server can serve content which doesn't indicate otherwise, for most HTTP status codes, using heuristic freshness lifetimes. It also can (although that's rarely used) serve stale content, unless the content indicates otherwise. And some caching implementations treat URLs with query parameters differently, and avoid caching them unless they carry explicit caching directives.

---

# `Surrogate-Control`

???

is a caching header equivalent to `Cache-Control`, but destined for your surrogate cache. That's either your internal reverse proxy or your CDN's edge server. It enables you to serve separate caching instructions to those proxies vs. other proxies along the network.

---

# `Age`

???

Another related HTTP response header, set by proxies, is `Age`. Using it, a caching proxy tells anyone consuming its response that the resource it is serving has been in the cache for that number of seconds.

---

# `Vary`

???
is used in content negotiation scenarios, where the response generated by the server changes (or varies) based on a certain request HTTP header. `Vary` enables the server to tell downstream caches that this response can be cached, but that its cache key can no longer be tied only to the URL; it must also take the header in question into account.

---

class: bright

![](media/vary.svg)

???

That enables the server to tell caches along the way that the content was adapted to a particular request header. For example, if we're using Client Hints, and the browser sends up the width of the image it is requesting along with the request, the server can adapt to that and tell any cache that this response can be served to requests with an identical "Width" value, but not to ones with a different "Width" value.

---

background-image: url(media/key_rfc.png)

# `Key`

???

And an upcoming HTTP header enables us to do more than that and have more granular control. The `Key` header enables us to define more complex cache keys.

---

class: bright

![](media/key.svg)

???

If we look at the same Client Hints example from before, maybe our server only serves image resources in fixed increments. In that case, what we want to tell caches is "this response is good for requests with a Width value between 300 and 600". Key enables us to do that and much more. I won't go into more details, as it's not yet implemented anywhere, but it's a very powerful new proposal.

---

## Caching use cases

???

OK, so we just went over a large number of possible header values, but you're just trying to make your content properly cacheable. What should you do?

---

background-image: url(media/jake_caching.png)

???

I'm gonna steal Jake Archibald's advice here. In order to avoid the "predicting the future" issues that we talked about earlier, there are 2 common patterns that you can follow without too many headaches.

---

# Immutable content

???
---

# Always revalidate

---

## Everything else is a gamble

---

# Or is it????

---

# Hold Till Told!

???

In the CDN world, a common pattern is to treat rarely changing content as "immutable" at the edge, and use explicit purge instructions whenever it changes. So your content is cacheable, but if you pushed a JS bug, a false statement or the wrong price to production, getting rid of it is just a button click or an API call away. Wouldn't it be cool if we could bring that same pattern to the browser?

---

# Service Workers to the rescue!

???

With Service Workers we can have browser caching controlled by our own logic, implemented in JS. That means we can implement such a pattern in the browser, and have such resources cached in the SW cache, while communicating "purge instructions" all the way up to the browser. One example of such purge instructions could be keeping a text file on the server which contains a list of purged URLs, which the SW periodically checks and evicts from its cache as necessary.

---

# Takeaways

---

# Caching is important

???

Don't neglect caching, as it can make a huge difference to your site's performance. HTTP caching can be a bit complex and daunting, but don't let that discourage you.

---

# Browser internal caches

???

There are lots of them. Not all are specced. And if you're using H2 push, preload or prefetch, knowing these caches can come in handy.

---

# HTTP Caching patterns

???

immutable and always revalidate

---

# Service Worker FTW

???

Service Workers enable new and exciting caching patterns, such as "hold till told", in the browser.

---

# Thank you!

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

---

# Questions?

.center[Yoav Weiss | @yoavweiss]

![](media/Akamai-Logo-RGB.png)

### Credits

* https://www.flickr.com/photos/68784095@N00/7883688474 - magic 8ball
* https://www.flickr.com/photos/puuikibeach/24657325864/ - treasure chest
* https://gist.github.com/hellerbarde/2843375 - Access speed numbers