Caches All The Way Down

layout: true
class: center, middle
background-image: url(media/treasure.jpg)
---

# Caches All The Way Down

.center[Yoav Weiss  |     @yoavweiss]
.right[![](media/Akamai-Logo-RGB.png)]

???

Hi, I'm Yoav Weiss. I work for Akamai on making our CDN as well as browsers faster, and I'm here today to
talk a bit about caching.
---

## 2 Hard Things in Computer Science

???
You've most probably heard before that there are 2 hard things in computer
science.

---
# Naming Things

???
Naming things is the obvious one, because giving something a
short yet meaningful name is a hard cognitive exercise. But the other
one,

---
# Cache Invalidation

???
, is less obvious. If you've never had to deal with
caches, you may not grasp at first why cache invalidation is extremely
hard.

---
.center[![](media/cache_definition.png)]

???
But what is a cache? If we look up the word's origins, it comes from
the French verb "cacher", or "to hide", so it's basically a hiding place
for computer programmes, where they can stash data and keep it around
until they may need it at a later time.

So, let's say you're a programme, deciding to keep data around for later use. Why
is invalidating that data so hard that it's right up there in the
computer science hall of shame, right alongside *this*?
---

.center[![](media/awful_name.png)]

???

First, when you're deciding whether to store something in the cache, you
have to guess whether this data is something that would be valuable for you to keep around.
Why can't we just keep everything around?

Because we live in a world of finite resources, after a short while of
your cache running, putting resources in also means throwing something
out.
In other words,

---
# Eviction

???
...means you have to be fairly certain that the resource you put in the cache is more valuable than the one you're throwing out as a result.
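
To make that trade-off concrete, here's a minimal sketch of a fixed-size cache. It assumes a least-recently-used (LRU) policy, which is just one common guess at "least valuable"; the `LRUCache` name, the capacity of 2 and the URLs are made up for illustration, not taken from any particular browser or CDN.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Putting something in means throwing something else out.
            self.entries.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("/a.css", "body {}")
cache.put("/b.js", "let x;")
cache.get("/a.css")             # /a.css is now the most recently used
cache.put("/c.png", "<bytes>")  # evicts /b.js, the least recently used
```

Whether evicting `/b.js` was the right call depends entirely on what gets requested next, which is exactly the guess the policy is making.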

Second, when you're serving some piece of data out of the cache, you
have to be pretty damn sure this piece of data is still valid info, and
not some stale, yesterday's news.
You have to be sure you're serving
---
# Fresh

???
content.
Which means that more often than not,
you'd prefer to play it safe and revalidate that data, even
when that's unnecessary.

And when you look at these two points together, you realize that cache
invalidation is hard because it requires you to

---
background-image: url(media/cristal.jpg)

???
predict the future.
Only time will tell if you made the right decision.

That resource you just evicted in order to put in a new and shiny one
instead? It may be needed a second from now, while the one you stored in
cache instead is something you'll never see again.

That article you revalidated? 90% of the time you'll find that the
resource didn't change, so you paid the extra latency for the
revalidation for nothing. But you couldn't know that ahead of time.
Because we can't predict the future.

(3 minutes)

---
# Caches, Caches Everywhere!
???

But despite the fact that caching is hard, despite
the fact that you can't always get it right, we have caches in computers
all the way down to the CPU. CPUs have built-in caches, multiple layers
of them, called L1, L2 and L3. Each successive layer has more storage and is less
expensive, but is also slower to access. When the CPU is trying to get some data, it goes to RAM only after failing to find
it in its caches.
There are also operating system caches that enable the OS
to avoid reading from disk by keeping popular disk pages around in RAM.
And many programmes keep cached info in
RAM, in order to avoid having to fetch data from disk, which is
significantly slower, or from the network, which is even slower as well as
unpredictable!

---
<table style="margin: auto; font-size: 2em; border: 2px solid black;">
    <tr>
        <td>L1</td>
        <td> 0.5 nanosec</td>
        <td>x1</td>
    </tr>
    <tr>
        <td>L2</td>
        <td> 7 nanosec</td>
        <td>x14</td>
    </tr>
    <tr>
        <td>L3</td>
        <td>30 nanosec</td>
        <td>x60</td>
    </tr>
    <tr>
        <td>RAM</td>
        <td>100 nanosec</td>
        <td>x200</td>
    </tr>
    <tr>
        <td>SSD</td>
        <td>150 microsec</td>
        <td>x300,000</td>
    </tr>
    <tr>
        <td>HDD Seek</td>
        <td>10 millisec</td>
        <td>x20,000,000</td>
    </tr>
    <tr>
        <td>Network</td>
        <td>150 millisec</td>
        <td>x300,000,000</td>
    </tr>
</table>

???

A few figures just so that you'd get the orders of magnitude we're
discussing:
*walk through table*

This table right here is why we bother with caches. Yes, they are not
perfect and require complex logic, but the alternative is so much worse.
The alternative could be 300 million times slower!!!
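
The multipliers in the table are simply each latency divided by the L1 figure. A quick sketch of that arithmetic, using the table's own approximate numbers (the `LATENCY_NS` and `slowdown_vs_l1` names are made up for this example):

```python
# Approximate access latencies from the table, in nanoseconds.
LATENCY_NS = {
    "L1": 0.5,
    "L2": 7,
    "L3": 30,
    "RAM": 100,
    "SSD": 150_000,           # 150 microseconds
    "HDD seek": 10_000_000,   # 10 milliseconds
    "Network": 150_000_000,   # 150 milliseconds
}


def slowdown_vs_l1(layer):
    """How many times slower a layer is than an L1 cache hit."""
    return LATENCY_NS[layer] / LATENCY_NS["L1"]


for layer in LATENCY_NS:
    print(f"{layer:<9} x{slowdown_vs_l1(layer):,.0f}")
```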

---
## Caching on the Web

???

OK, so caching is awesome, but what does caching on the Web look like?
What are the different caches that a request hits on its way to your server?

---
class: contain
background-image: url(media/questy_1304.png)

???
At first a request object is created inside the rendering engine. Its sole purpose is to find a matching resource and bring it back to the
rendering engine so that the resource can be used as part of the rendered page. That resource could be an image, a script or any other
external resource. There could also be many reasons for the request to be created: the user clicked on a link, HTML was parsed, or a JS API
created the request. And each one of these requests is a little different: it may have a different type, different credential settings, or
other internal differences between different requests, beyond the different URL.

So, the created request is looking for its resource, and the first place to look is in the closest cache:
---
background-image: url(media/memorycache_1377.png)
# MemoryCache

???

Or as it should be called, the short term memory cache.
That cache is part of the renderer and keeps in RAM resources that the renderer has seen before, but it disappears when the renderer is destroyed
because the user clicked away. So if the resource we're looking for
was previously loaded on that page by the preloadScanner,

---
background-image: url(media/memorycache_1377.png)
## `<link rel=preload href=foo>`

???
a preload link, 
---
background-image: url(media/memorycache_1377.png)
## `<img src=foo>`<br>``<br> `<img src=foo>`

???
or multiple tags (show examples), then the resource would be in
the MemoryCache, and we can use it and stop our quest for a resource.

---
background-image: url(media/mismatch_2114.png)

???

The MemoryCache also has a bunch of rules regarding which resources can be a match for which requests. Obviously, URL matching is a
prerequisite, but there's also type matching, so that an image request can't get, for example, a script resource as a response.
There's also credential checking and other conditions.

---
background-image: url(media/non_cacheable_1600.png)

???

At the same time, HTTP caching semantics are not part of that, and the MemoryCache
will happily serve non-cacheable resources to requests, since it is volatile by nature.

The one exception here is that the MemoryCache will not serve no-store responses.

All this is underspecced, and we should totally do a better job and spec that whole thing, probably as part of the Fetch spec.
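
Since none of this is specified, what follows is only a rough model of the rules just described, not how any browser actually implements them. The `Request` and `CachedResource` types, their field names and the `memory_cache_matches` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    resource_type: str   # e.g. "image", "script"
    credentials: str     # e.g. "include", "omit"


@dataclass
class CachedResource:
    url: str
    resource_type: str
    credentials: str
    cache_control: str = ""   # raw Cache-Control value of the response


def memory_cache_matches(request, resource):
    """Rough model of the MemoryCache matching rules described above."""
    if request.url != resource.url:
        return False   # URL matching is a prerequisite
    if request.resource_type != resource.resource_type:
        return False   # an image request can't get a script resource
    if request.credentials != resource.credentials:
        return False   # credential settings have to agree
    directives = [d.strip().lower() for d in resource.cache_control.split(",")]
    # HTTP freshness is ignored entirely; no-store is the one exception.
    return "no-store" not in directives
```

Note that a `max-age=0` resource still matches here, which is the "happily serves non-cacheable resources" behaviour.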

---
class: contain
background-image: url(media/resource_timing.png)

???

If our request didn't find the right resource in the MemoryCache, it continues on its way.

At that point it gets registered as a network request in both resource timing and devtools. That means that if a request was served from the
MemoryCache, it will not appear in your dev tools network tab, nor in your resource timing timeline.

After that, the request continues to the Service Worker.

---
class: contain, down
background-image: url(media/service_worker_975.png)
# Service Worker

???

Service Workers, as I'm sure you've heard, are extremely powerful in-the-browser JS proxies which enable you to manipulate requests and responses.
As such, they have their own separate cache with an API of its own.

From the request's perspective, Service Worker is totally unpredictable and can return anything as its response. It can return a response to
a totally different request, a generated response or a previous response it hit for this request. The logic is not baked into the browser, but
created by the Web developer.

And by default, the cache is not bound to HTTP semantics.

If the Service Worker has no matching resource for our request, it uses `fetch()` to send it further down, to the network stack (which is
often in a different process, which means some extra latency).

---
class: contain, down
background-image: url(media/http_cache_765.png)

# HTTP cache

???

At the network stack, *the* place to look for resources is the HTTP cache! The HTTP cache is rather strict,
and follows all the HTTP caching semantics to the letter. We'll soon discuss what that means exactly. At the same time, it ignores many of the restrictions
placed on the MemoryCache, and does allow mixed type matches (so image requests can get a script response there, for example).

The HTTP cache is a persistent cache storage, which means that it also has to evict resources (and has some eviction scheme), and that
since it uses persistent storage, it might be significantly slower than the MemoryCache.

But, if the HTTP cache doesn't have a resource for our request, we'd now have to go to the network, right?

Wrong!
When working with HTTP/2, there's one more cache on our request's way before it hits the network.

---
class: contain, down
background-image: url(media/push_cache_890.png)

# Push cache
### AKA: The Unclaimed Push Stream Container

???

H2 push is a feature that enables the server to send resources to the browser before the browser requests them. When that happens, these resources
are stored in the push cache, waiting for matching requests to come along. Once a request matches the resource, it gets taken out of the
push cache, but often then goes into the HTTP cache.

The push cache is a non-persistent container on the H2 connection which keeps around resources which were pushed by the server.

Because the push cache is owned by the H2 connection, if the connection is closed, those pushed resources are gone. It also means that if you pushed a resource on one connection, and
the request for it comes off on a separate connection, the pushed resource won't be used.

Request and resource matching in the push cache is also underspecced, so the rules there may vary between implementations.

But if our response resource is not waiting for us at the push cache, the next stop is the network.

---
background-image: url() 
<iframe width="1120" height="315" src="media/user.svg" frameborder="0" ></iframe>

???
The network can be an extremely unpredictable medium: latencies vary based on queues filling up along the network, and packets can get lost due
to such queues overflowing, collisions at the radio layer, data corruption of in-flight packets and more. It's a pretty scary place.

Latencies can also vary by the type of network that we're dealing with and the distance that the data has to go through.
Wifi networks have relatively low latency, but high packet loss rates when they get congested, which results in jittery latency.
Cellular networks have relatively high latencies, even though those have gotten better over the generations. 2G GPRS had a latency of ~700ms.
4G has a theoretical one of 50ms.
And wired networks usually have low latencies, but still, connecting two continents with fiber requires us to send light beams from one
place to the other. Speed of light is an inherent lower bound on that latency.

---
background-image: url() 
<iframe width="1120" height="315" src="media/edge.svg" frameborder="0" ></iframe>

???
So, if latency is such a dominant factor in our web apps performing well, what can we do to lower it?
That's where CDNs come in. Sitting as close as possible to your ISP's gateway are the CDN edge servers, which are likely to serve you the
content from their internal cache rather than send it all the way up to the origin server.

They also often terminate your TLS connection, significantly lowering the cost of TLS connection establishment. (again, by fighting latency)

At Akamai we see a median latency of 80ms between the user and the edge server on mobile networks, but that can vary based on the network
type.

So, if we found our resource here, that's awesome. And if we didn't? The CDN edge server needs to forward the request to the origin. (It does
so on the origin's behalf, which in HTTP terms makes it a surrogate, or reverse proxy, rather than a forward proxy.)

---
background-image: url() 
<iframe width="1120" height="315" src="media/reverse.svg" frameborder="0" ></iframe>

???

At the origin server's network, you're likely to hit another proxy, another cache. It might be an independent reverse proxy or a software
component such as Redis inside your server architecture. Why do we need another cache server as part of the server's network? Latency
between that server and the origin is likely to be very low...

OTOH, when the server is creating dynamic content, it has to talk to databases and potentially fetch data from different places across its
network, or even external APIs. That can take a while. And if the same request is hitting the server multiple times in a row, having caches
in place can save that time and processing power and serve the response for the first request to future identical requests.

These caches may or may not follow HTTP cache semantics, based on their configured logic, which is in the Web developer's control, at least
in theory. (In practice they can come preconfigured, or be controlled by a different group from the ones handling the content)

And if that last line of caching cannot serve our request:
---
background-image: url() 
<iframe width="1120" height="315" src="media/origin.svg" frameborder="0" ></iframe>

???
We'd have to find it at the origin server. The origin server must have our resource, generate it, or declare failure (so a 4xx response or a
5xx response). And once it serves us that response

---
class: contain
background-image: url(media/happilly_ever_after_1600.png)

???

Now our request and resource have found each other and can start their way back to the browser, leaving cached copies in caches along the way.

(20 minutes)

---
background-image: url(media/http:1.0.png)

???

So we talked a lot about HTTP caching semantics while going over the request lifetime, but what is that exactly?

Well, the good folks that standardized the HTTP/1.0 protocol realized
that caching is important and included Caching directives as part of the
protocol.
---
background-image: url(media/http:1.1.png)

???
A few years later with some more implementation experience and
understanding of what people need to do, the HTTP/1.1 protocol revamped
those caching directives and improved them.

Let's go over those HTTP mechanisms, shall we?

---
# URL is key

???
First of all the cache key for HTTP is the resource URL. If you request the same URL twice, the first response could be cached and used to serve the second
one.

Other key concepts in HTTP caching are "freshness" and "validators".

---
# Freshness

???

Freshness
of a resource determines how long you can use that resource without
revalidating it. If you remember the "predicting the future" part that I
talked about a few minutes ago, determining the right "freshness" for a
resource is it.

---
### `Cache-Control: max-age=3600`

???
If you include a "max-age" directive of 3600 seconds,
you're basically telling the browser and any other cache along the way
that you guarantee that this resource will not change in the next hour, but after that, it might. That means that if you shipped a
thing, and found a bug a minute later, fixed it and deployed it, your
users may continue to see that bug for almost an hour after the fix was
deployed. Not great...
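
A minimal sketch of that freshness check, assuming the cache only knows `max-age` and how long ago it stored the response. Real caches also account for the `Age` header and time spent in transit, which this deliberately ignores; `is_fresh` and the variable names are made up for the example.

```python
def is_fresh(max_age, seconds_since_stored):
    """True while a stored response can be reused without revalidation."""
    return seconds_since_stored < max_age


MAX_AGE = 3600  # from "Cache-Control: max-age=3600"

# A user cached the buggy version when it shipped. The fix went out a minute
# later, but that user's cache still considers the buggy copy fresh.
buggy_copy_used_after_1_minute = is_fresh(MAX_AGE, 60)
buggy_copy_used_after_1_hour = is_fresh(MAX_AGE, 3600)
```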

Then once the freshness lifetime of the resource ran out, that doesn't
mean that the resource has changed. It just means that you need to
revalidate it. That revalidation happens with "Conditional requests"
that are using
---

# Validators
### `Last-Modified: Mon, 29 May 2017 15:32:00 GMT`
### `ETag: "badbaaadbeef"`
???

HTTP responses may include headers such as
"Last-Modified" and "ETag". What do these headers do? They tell the
cache: "This resource was last modified at this date" (duh!) or provide
a signature of the resource.
That enables the cache to revalidate the resource at a relatively low
cost.

---
class: left

### `If-None-Match: "badbaaadbeef"`
### `If-Modified-Since: Mon, 29 May 2017 15:32:00 GMT`

???
It can send out a request with an "If-Modified-Since" or
"If-None-Match" header, basically asking the server "Did this thing
change?" and the server can reply with "Nah", or in HTTP, a

---
## `304 Not Modified`

???
response. Or if the resource has changed, the server can reply
with a 200 OK status response, just like it would normally have.
So validators give us the ability to revalidate a response, without
downloading its payload if it hasn't changed. That's awesome, but
revalidation still has a cost, as it still takes a full RTT (or round
trip time) to get the response back and know we can use the resource at
hand.
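
Here's a rough sketch of the server's side of that exchange, assuming exact `ETag` comparison only. Weak validators (the `W/` prefix) and `If-Modified-Since` are left out to keep it short, and the `respond` function and its tuple return shape are invented for illustration.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    headers = {"ETag": current_etag}
    if_none_match = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if current_etag in candidates or "*" in candidates:
        return 304, headers, b""   # "Nah": no payload to download
    return 200, headers, body      # changed, or nothing to compare against


etag = '"badbaaadbeef"'
revalidation = respond({"If-None-Match": etag}, etag, b"<html>...</html>")
first_visit = respond({}, etag, b"<html>...</html>")
```

The 304 path saves the payload, but the client still waited a full round trip to learn that.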

---
# Scope

???
Another aspect of HTTP's caching directives is their scope, and *who*
can cache said resources.

---
background-image: url(media/amazon_private_info.png)

???
For some resources it's perfectly fine to
cache them in the browser for a particular user, but would be an awful
privacy breach if cached on the network as a publicly cached resource.

---
## `Cache-Control: public`
## `Cache-Control: private`

???
We can control that using cache control directives that define the scope of the resource's cacheability.
`Cache-Control: private` means that a resource is not cacheable as a
public resource, and can only be cached on the client, in the browser's
cache.
`Cache-Control: public` means the opposite: any cache along the way, shared ones included, may store it.

---
# Cacheability

???

Cacheability directives tell the cache if and how this resource is cacheable.
We talked earlier about the 2 hardest things in computer science. Now we'll talk about the 2 biggest lies in computer
science.

---
## `Cache-Control:`
## `must-revalidate`
--

## (may not revalidate)

???
must-revalidate means that your content will not be revalidated as long as it is fresh, but cannot be served stale, once its freshness ran out. That's something you should use only when the
content you serve will really be invalid after the freshness has run out.

---
## `Cache-Control:`
## `no-cache`
--

## will cache

???
no-cache means that your content will be cached, but won't be served without revalidation.

---
## `Cache-Control:`
## `no-cache=set-cookie`

???

no-cache can also have a specific header value in its definition, which changes its meaning further, and then means
"do not cache that particular response header, but everything else can be cached (and be served without validation) just fine"

...
I know

---
## `Cache-Control:`
## `no-store`

???
Will actually do what it says and avoid storing the resource on disk and will evict from memory as soon as possible (which can lead to
issues).
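
To keep these directives straight, here's a small sketch of how a cache might act on them. It's a simplification under stated assumptions: the parser splits naively on commas (so a quoted list of header names would trip it up), the cache never serves stale content, and the function names are invented for the example.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def may_store(directives):
    """no-store is the only directive here that forbids storing."""
    return "no-store" not in directives


def may_reuse_without_revalidation(directives, is_fresh):
    if "no-store" in directives:
        return False
    # Bare no-cache: the response is stored, but revalidated before every reuse.
    if "no-cache" in directives and directives["no-cache"] is None:
        return False
    # no-cache=<header> only restricts that header, so freshness decides.
    # must-revalidate only matters once stale, and this sketch never serves stale.
    return is_fresh
```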

---
## `Cache-Control:`
## `stale-while-revalidate`
---
background-image: url() 
<iframe width="950" height="210" src="media/stale_while.svg" frameborder="0" ></iframe>
---
# And by default?

???

By default a cache server can serve content which doesn't indicate otherwise, for most HTTP status codes, using heuristic freshness times.
It also can (although that's rarely used) serve stale content unless the content indicates otherwise.
And some caching implementations treat URLs with parameters in them differently, and avoid caching them unless they have explicit caching
directives.

---
## `Cache-Control:`
## `proxy-revalidate`
## `s-maxage`
---
## `Surrogate-Control:`

???
is a caching header equivalent to `Cache-Control` but aimed at your surrogate cache. That's either your internal reverse proxy, or your
CDN's edge server. It enables you to serve separate caching instructions to those proxies, vs. other proxies along the network.

---
# `Age:`

???

Another related HTTP response header, set by proxies, is `Age`. Using that, a caching proxy tells anyone using its response that the resource it is serving has been in
the cache for that number of seconds.

---
# `Vary:`

???
is used in content negotiation scenarios, where the response generated by the server has changed (or varied) based on a certain request HTTP
header. `Vary` enables the server to tell downstream caches that this response can be cached, but its cache key can now no longer be tied
only to the URL, but must also be related to the header in question.

---
class: bright
![](media/vary.svg)

???
That enables the server to tell caches along the way that the content was adapted to a particular request header.
For example, if we're using client-hints, and the browser sends up the width of the image it is requesting along with the request,
the server can adapt to that and tell any cache that this response can be served to requests with identical "Width" value, but not to ones
with a different "Width" value.
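
One way to picture that is as a cache key built from the URL plus the value of every request header named in `Vary`. The sketch below is a simplified model (it doesn't handle `Vary: *`, and real caches compare against the stored request rather than precomputing a key); `cache_key` and the `/hero.jpg` URL are made up.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus every header named in Vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


narrow = cache_key("/hero.jpg", {"Width": "300"}, "Width")
wide = cache_key("/hero.jpg", {"Width": "600"}, "Width")
```

Two requests with the same "Width" get the same key and can share a cached response; `narrow` and `wide` differ, so they can't.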

---
background-image: url(media/key_rfc.png)
# `Key:`

???

And an upcoming HTTP header value enables us to do more than that and have more granular control. The `Key` header enables us to define more
complex cache keys.

---
class: bright
![](media/key.svg)

???
If we look at the same client-hints example from before, maybe our server only serves image resources in fixed increments.
In that case what we want to tell caches is "the response is good for requests with a Width value between 300 and 600". Key enables us to do
that and much more. I won't go into more details, as it's not yet implemented anywhere, but it's a very powerful new proposal.

---
## Caching use cases

???

OK, so we just went over a large amount of possible header values, but you're just trying to make your content properly cacheable. What
should you do?

---
background-image: url(media/jake_caching.png)

???

I'm gonna steal Jake Archibald's advice here. In order to avoid the "predicting the future" issues that we've talked about earlier, there
are 2 common patterns that you can follow without too many headaches.

---
# Immutable content
---
### `Cache-Control: max-age=315360000, immutable`

???

---
# Always revalidate
---
## `Cache-Control: no-cache`

---

## Everything else is a gamble

---
# Or is it????

---
# Hold Till Told!

???
In the CDN world a common pattern is to have rarely changed content considered as "immutable" on the edge, and use explicit purge instructions whenever
it changes. So your content is cacheable, but if you pushed to production a JS bug, a false statement or the wrong price, getting rid of it
is just a button click or an API call away.

Wouldn't it be cool if we could bring that same pattern to the browser?
---
# H2 push for invalidation

---
# Service Workers to the rescue!

???
With Service Workers we can have browser caching controlled by our own logic, implemented in JS. That means we can implement such a pattern
in the browser, and have such resources cached in the SW cache, while communicating "purge instructions" all the way up to the browser.

One example of such purge instructions could be keeping around a text file on the server which contains a list of purged URLs, which the SW
periodically checks and evicts from its cache as necessary.
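
The eviction step of that pattern is tiny. Here it is modelled in Python with a plain dict standing in for the SW cache. In a real Service Worker this would be JS against the Cache API, and the one-URL-per-line file format is purely an assumption of this sketch, as is the `apply_purge_list` name.

```python
def apply_purge_list(cache, purge_list_text):
    """Evict every cached URL named in the purge list; return what was evicted."""
    evicted = []
    for line in purge_list_text.splitlines():
        url = line.strip()
        if url and url in cache:
            del cache[url]
            evicted.append(url)
    return evicted


sw_cache = {"/app.js": b"buggy build", "/logo.png": b"<bytes>"}
purge_list = "/app.js\n\n/missing.css\n"   # as fetched from the server
purged = apply_purge_list(sw_cache, purge_list)
```

Everything not on the list stays cached indefinitely, which is the "hold" half of hold till told.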

---
# Takeaways

---
# Caching is important

???

Don't neglect caching as it can make a huge difference to your site's performance.
HTTP caching can be a bit complex and daunting, but don't let it discourage you.

---
# Browser internal caches

???
There are lots of them. Not all are specced. And if you're using H2 push, preload or prefetch, knowing these caches can come in handy.

---
# HTTP Caching patterns

???
immutable and always revalidate

---
# Service Worker FTW

???

Service Workers enable new and exciting caching patterns, such as "hold till told", in the browser.

---
# Thank you!

.center[Yoav Weiss  |     @yoavweiss]
![](media/Akamai-Logo-RGB.png)
---
# Questions?
.center[Yoav Weiss  |     @yoavweiss]
![](media/Akamai-Logo-RGB.png)

### Credits
* https://www.flickr.com/photos/puuikibeach/24657325864/ - Treasure chest
* https://www.flickr.com/photos/art_es_anna/456980063/ - Crystal ball
* https://gist.github.com/hellerbarde/2843375 - Access speed numbers

???