Hi, my name is Nolan Lawson and today I'd like to talk to you about CSS runtime performance.
Quick note: these slides will be available online.
The speaker notes have lots of links.
So who am I?
If you know me from the internet, it's probably from my blog where I talk about performance, accessibility, web components, etc.
I was on the Microsoft Edge performance team for a couple years, then moved to the performance team at Salesforce, and now I work on our JavaScript framework, Lightning Web Components.
I'd like to start off with a story. When I was learning to drive as a teenager, my car was a stick shift (manual transmission). These are much more common in Europe than in the U.S., but growing up in the Seattle area, this is what I had.
I kinda like stick shifts. You feel more in tune with what the car is doing. By listening to the sounds of the engine, I developed a feel for when to shift from one gear to another, or how to do things like "engine braking," which is actually an efficient use of fuel.
Of course, I have no idea how an internal combustion engine actually works. It's a hugely complex thing.
But by listening to the engine and seeing how it reacted to my actions, I learned how to use the engine efficiently.
This is kind of what I like about web performance. A browser engine is a hugely complex thing, and I'm not a C or C++ developer.
But through observation of how the engine responds to my inputs, I can become a better web developer, and write more efficient web apps.
And if you know just a bit about how the engine works, you can be an even better web developer.
Let's look at a browser perf trace like this one (from the Chrome DevTools). This is the main thread.
There are two main parts: the yellow (JavaScript) part, and the purple (style/layout) part.
If you're an experienced web dev, you might look at the JavaScript side and feel pretty comfortable with it.
We see names of functions we wrote. We see libraries we installed from npm. We see frameworks like React.
But many people look at the purple part and see a black box. "That's just the browser doing browser things. I couldn't possibly understand that."
Or we say "the purple part doesn't matter." As if the user cares whether their click was delayed by the yellow part or purple part!
This is kind of a "learned helplessness." We feel helpless, so we try to tell ourselves it doesn't matter.
In this talk, I'd like to convince you that you can understand what's going on in there, if you know a little bit about how browsers work.
To put the purple part in context, I tested three news sites. I used WebPageTest, simulating a low-end Android phone.
Then I categorized the time spent on the main thread.
As you can see, the purple part is not the most important part, but it can be quite big. For the third site in particular, it's worth looking into.
And even for the others, in absolute terms, 2.6s and 3.5s are pretty big! If we can find some quick wins here, that would be great.
To understand the purple part, we first need to start with how browsers render content. This process is called "updating the rendering" in the HTML spec. Let's call it "the render loop."
The first step, JavaScript, is where we run some JavaScript that modifies the DOM. Typically, this will be your JavaScript framework rendering, such as React doing its virtual DOM diffing and then eventually putting elements into the DOM.
The next two steps, style and layout, involve applying your CSS to those DOM elements and then laying them out on the page. This is my focus.
The last two steps, paint and composite, are about actually writing pixels to the screen. This is where we have layers, GPU animations, opacity, etc. This is out of scope for this talk.
Let's focus on the style/layout part.
So let's break down style and layout calculation first. These are two separate steps.
With style, we're figuring out which CSS rules apply to which elements and computing styles.
With layout, we're figuring out how to place those elements geometrically on the page.
```css
h1 { padding: 5px; }
h2 { padding: 10px; }
```

```html
<h1>Hello</h1>
<h2>World</h2>
```

So style calculation is about figuring out which elements have which CSS rules. The output of this is called the "layout tree" (or "render tree").

Let's take a simple example. Here we have a 5px-padding h1 and a 10px-padding h2. Style calculation is (basically) the process of applying the CSS selectors: figuring out that the `h1` rule applies to the `<h1>` element, and the `h2` rule applies to the `<h2>` element.
So in a sense, it's almost as if the browser is taking this page, and turning it into this one:
```html
<h1 style="padding: 5px;">Hello</h1>
<h2 style="padding: 10px;">World</h2>
```
Conceptually, this is what style calculation is: it's giving us the same page we would have had if we had used inline styles.
It's also computing the `em`s, `rem`s, etc. and turning them into `px`, as well as resolving custom properties, the cascade, inheritance, and so on, but in my experience this has less of an impact on perf.
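If you want to see these computed values yourself, `getComputedStyle` returns them. A minimal sketch, assuming the `h1 { padding: 5px }` rule from the example above:

```js
// Computed styles resolve relative units (em, rem, etc.) into absolute px values.
const h1 = document.querySelector('h1');
console.log(getComputedStyle(h1).paddingTop); // "5px"
```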
Now let's move on to layout. Conceptually, the output of style is the "inline styles," and that's the input to layout.
Now we finally get to the geometry of the page. Layout calculation is where the styles, which have been associated with each element, actually get applied.
In this case, the browser takes the margin, padding, text wrapping, widths, heights, positions, etc., and works out the geometry of each box.
Now first off, when you're looking at a perf trace, it's important to understand whether you primarily have a problem with style calculation, layout calculation, or both. Because these two traces are not the same!
These two look similar because they're both purple. But in one trace, we have huge style costs, and in the other, we have huge layout costs. The causes of slowness in these two cases are very different!
If you don't remember anything else from my talk, please remember this: style and layout are not the same thing!
The biggest mistake I see people make is seeing one and thinking it's the other. For instance, they have a lot of style costs, but they think it has something to do with the geometry of the page.
Just to give an idea of what the style vs layout breakdown might look like, here are those three news sites from earlier. Layout is usually the biggest part, but style is occasionally pretty big. In fact, for the first site, it's spending slightly more time in style than in layout.
I've seen traces where it's almost all style, and where it's almost all layout, and everything in between. It depends.
| | Style | Layout |
|---|---|---|
| Complexity of CSS | 🐢 | |
| Complexity of layout | | 🐢 |
| DOM size | 🐢 | 🐢 |
| Repeated re-renders (thrashing) | 🐢 | 🐢 |

At a high level, if you're seeing a large amount of time spent in style or layout, it usually comes down to one of these things.

Typically either your CSS selectors are too complex, or there are a lot of them, which slows down style calculation. Note this has no effect on layout calculation.

Or your layout itself, i.e. the geometry of the page, is very large or complex, which slows down layout calculation. Note this has no effect on style calculation.

Or your DOM is very large. A bigger DOM just means more work for the browser to do. This affects both style and layout.

Or you are doing repeated re-renders over time, also called thrashing, which slows down both style and layout.
To understand style vs layout performance a bit more, we need to go into detail on how each one works. Let's go into style.
Remember, this is about matching up CSS rules with DOM nodes, and computing styles.
"For most websites I would posit that selector performance is not the best area to spend your time trying to find performance optimizations."
– Greg Whitworth, via Enduring CSS by Ben Frain (2016)
When we talk about style performance, we're mostly talking about selector performance. This needs some clarification.
There's a common refrain in the web perf community that CSS selector performance "doesn't matter" or that you shouldn't worry about it. Here is one representative quote from Greg Whitworth. I'm picking on Greg because I work with him, but if you Google "CSS performance," you'll see this repeated in multiple places.
I think this is true at the micro level: if you try to micro-optimize your CSS selectors, you're probably wasting your time. But it's less true at the macro level: if you're building a framework or design system, or if you're writing selector patterns that are repeated multiple times on a page, then selector performance can really matter.
I've seen perf regressions that were almost entirely driven by selector performance. And if you see high style calculation costs, then you very likely have a selector perf problem too. So we need to discover the "lost art" of selector performance.
To understand style performance, first it's important to note how browsers actually implement their style engines, and what they've already optimized.

To illustrate, let's imagine we're building a browser. Here is a naive implementation of style calculation that we might have:

```js
for (const element of page) {
  for (const rule of cssRules) {
    if (rule.matches(element)) {
      /* ... */
    }
  }
}
```

Unfortunately this naive implementation has a big problem: it's an O(n * m) operation, where n is the number of elements and m is the number of CSS rules. On any reasonably-sized page, the browser would slow to a crawl. So browsers try to avoid this naive case wherever possible.
For example, let's look at a simple DOM tree. In this case, we have multiple selectors in our stylesheet, and we need to match them with DOM nodes.
If we were doing the naive approach, then the browser would have to walk through the entire DOM, checking every selector against every DOM node.
You can see how this would be inefficient, especially if it runs every time the DOM changes!
- `span` → `span`
- `a` → `a:last-child`
- `bar` → `#bar`
- `foo` → `.foo`
So let's add a simple optimization to our toy browser. For every tag name, ID, and class, we'll create a hashmap mapping those strings to the list of selectors for that string.
This is pretty reasonable, because tag names for an element never change, and IDs and classes are pretty small and simple most of the time.
As you can see, this has a big impact on the efficiency of our algorithm. Rather than checking all selectors, we can short-circuit to only those selectors that could possibly match.
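To make that concrete, here's a rough sketch of the idea in JavaScript (toy code, not how any real engine is written): we bucket each rule by the rightmost ID, class, or tag of its selector, and only run the full match against those buckets.

```js
const rulesById = new Map();    // 'bar'  -> rules whose rightmost key is #bar
const rulesByClass = new Map(); // 'foo'  -> rules whose rightmost key is .foo
const rulesByTag = new Map();   // 'span' -> rules whose rightmost key is span

function candidateRulesFor(element) {
  const candidates = [];
  if (element.id) candidates.push(...(rulesById.get(element.id) ?? []));
  for (const cls of element.classList) {
    candidates.push(...(rulesByClass.get(cls) ?? []));
  }
  candidates.push(...(rulesByTag.get(element.tagName.toLowerCase()) ?? []));
  return candidates; // only these need a full rule.matches() check
}
```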
Now, there's still a problem with our algorithm. What about descendant selectors? In this case, we need to find all `.bar` elements inside of a `.foo`. So we have to traverse the descendants of `.foo` to try to find all the `.bar` elements.

The hashmap doesn't really help us here, because it's more about the relationship between the two nodes. So this is pretty inefficient. We're walking a lot of DOM nodes just to find the `.bar` elements.
So here's another optimization we can do. How about instead of walking from the left to the right, we evaluate the selector from right to left? So instead of going `foo` then `bar`, we would go `bar` then `foo`.
Here is our same DOM tree from before.
It turns out we check a lot fewer DOM nodes this way.
You may have heard that browser engines evaluate CSS selectors from right to left. If you've ever wondered why, this is the reason! Any given node in the DOM tree tends to have fewer ancestors than descendants, so this optimization works out really well for most DOM trees.
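As a sketch, here's what right-to-left matching for `.foo .bar` might look like (again, toy code rather than real engine internals):

```js
function matchesFooBar(el) {
  // Check the rightmost compound first: most elements fail cheaply here.
  if (!el.classList.contains('bar')) return false;
  // Only then walk the (usually short) ancestor chain looking for .foo.
  for (let a = el.parentElement; a !== null; a = a.parentElement) {
    if (a.classList.contains('foo')) return true;
  }
  return false;
}
```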
This right-to-left technique works out pretty well. But we have another problem. What about selectors like this one?

```css
.foo div
```

The right-hand side (i.e. the descendant) is pretty generic. Most DOM trees have a lot of `div`s. Consider this DOM tree, where we have a lot of divs and want to find `.foo div`. With the right-to-left technique, we have to traverse a lot of ancestor chains just to find `.foo`.
So how can we solve this?
Enter the Bloom filter. WebKit began using this technique in 2011. All browsers have it now.
You can think of a Bloom filter as a Hash Set that may give false positives, but never gives false negatives. It's fast and has low memory overhead.
Here, we hash the strings x, y, and z, and insert them into the Bloom filter. Then we check if w is in there by hashing it.
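Here's a toy Bloom filter in JavaScript, just to illustrate the data structure (real engines use carefully tuned hash functions and sizes; this is only a sketch):

```js
class ToyBloomFilter {
  constructor(size = 256) {
    this.bits = new Uint8Array(size);
  }
  // Two cheap string hashes stand in for the real hash functions.
  *indexes(str) {
    let h1 = 0, h2 = 5381;
    for (const ch of str) {
      h1 = (h1 * 31 + ch.charCodeAt(0)) >>> 0;
      h2 = ((h2 << 5) + h2 + ch.charCodeAt(0)) >>> 0;
    }
    yield h1 % this.bits.length;
    yield h2 % this.bits.length;
  }
  add(str) {
    for (const i of this.indexes(str)) this.bits[i] = 1;
  }
  mayContain(str) {
    // true may be a false positive; false is always correct.
    for (const i of this.indexes(str)) if (!this.bits[i]) return false;
    return true;
  }
}

const filter = new ToyBloomFilter();
['x', 'y', 'z'].forEach(s => filter.add(s));
filter.mayContain('w'); // almost certainly false: a fast rejection
```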
How does this work in the DOM tree? Well, basically, the browser keeps a little Bloom filter on each node, containing hashes of its ancestors' tag names, IDs, and classes.

This means that if we're on a `div`, and we want to figure out if `.foo` is an ancestor, then we don't have to walk up the tree: we know instantly, because `.foo` is in the Bloom filter.
So now we can quickly filter all the `div`s using the Bloom filter, "fast rejecting" any that couldn't possibly have `.foo` as an ancestor. Note that, because we could have false positives, we still need to walk the ancestor chain to check that it really has `.foo` as an ancestor, but we're still eliminating a lot of work.
| Supported? | Type | Example |
|---|---|---|
| ✅ | ID | `#id div` |
| ✅ | Class | `.foo div` |
| ✅ | Tag | `main div` |
| ✅ | Attribute (new) | `[foo] div` |
| ⚠️ | Attribute value (new) | `[foo="bar"] div` |
| ❌ | Other stuff | `:nth-child(2) div` |

So what's in the Bloom filter?

Originally it was only IDs, classes, and tags. WebKit invented this in 2011.

In 2018 WebKit added attributes, and Firefox and Chrome added them in 2021 after I filed bugs on them.

Note that the attribute optimization only applies to attribute names, not values, but attribute value selectors can kind of piggyback off of them, because the browser will quickly check whether any ancestors even have the attribute name before checking the value.

Other stuff could be optimized in theory, but last I checked (late 2022), no browsers have expanded the Bloom filter to anything else.
Now, there are many more browser style optimizations than what I've mentioned here. Here are a few more.

One recent example is `:has()` (2022), which can be thought of as an ancestor selector. How did they make this fast? You guessed it... another Bloom filter. Like the other one, this one has classes, IDs, tags, and attributes, but it also adds `:hover`, and they hint that they may add other pseudo-classes later.
My goal in telling you this is not to say that you need to memorize whatever optimizations browsers have.
I just want to give you an appreciation for all the work a browser has to do so that, 9 times out of 10, you don't have to worry about selector performance.
So now, knowing a bit more about how browsers work under the hood, what can we as web developers do if we see high style calculation costs?
Well, one thing you can do to reduce style calculation costs is to remove unused CSS. (The example shows a screenshot from Chrome Dev Tools "Coverage" tool.)
After all, the browser doesn't know your CSS is unused until it runs the algorithm to determine that it's unused. And it may have to run this algorithm every time something on the page changes. So unused CSS can cost you on the main thread throughout the lifetime of your web app.
So trim that unused CSS!
```css
[class*="foo"] :nth-child(2) > p ~ *
```
These kinds of selectors are cute, but if we imagine matching right-to-left, we can see why they might be expensive.
Now, you can probably have a few of these on your page and you'll be fine. It's more of a problem at the macro level, e.g. if you're building a framework or design system that repeats these selectors multiple times.
Some folks use preprocessors like Sass or Less, and it's easy to generate something like this in a for-loop.
| ~Cost | Type | Examples |
|---|---|---|
| ✅ | ID, class, tag | `#id`, `.cls`, `a` |
| ⚠️ | Descendant | `.foo *`, `.foo > *` |
| ⚠️ | Attribute name | `[foo]` |
| 🌶️ | Attribute value | `[foo="bar"]`, `[foo*="bar"]` |
| 🌶️ | Sibling | `.foo ~ *`, `.foo + *` |
| 🌶️ | Pseudo-class | `:nth-child()`, `:nth-of-type()` |

This is my extremely rough estimate of selector costs.

Attribute names have historically been less optimized. They're still typically slower than classes, but WebKit and Firefox made optimizations recently (in addition to the Bloom filter optimizations), after I began writing about this on my blog.

Sibling selectors tend to be less optimized. A non-adjacent sibling combinator with a generic right-hand side can cause a lot of matching work.

Pseudo-classes like `:nth-child()` and `:nth-of-type()` tend to be less optimized, although browsers have specific optimizations for common ones like `:hover` and `:focus`.
To actually understand which selectors are slow, there's a new tool released in Chromium earlier this year. If you enable `blink.debug` when using Chrome tracing...
Then you can get this view of the "selector stats." If you sort by elapsed time, you can actually see your most expensive CSS rules ranked from most to least expensive.
The times shown are in microseconds. We also have match attempts, match count, and fast reject count; the fast reject count is the number of Bloom filter rejections.
Note this tool may be more useful at the macro level than the micro level. Look for repeated patterns coming from a framework or CSS processor.
This tool may land in Dev Tools soon!
```html
<my-component>
  #shadow-root
    <style>
      div { color: red }
    </style>
    <div>Hello!</div>
</my-component>
```

Shadow DOM is interesting because it encapsulates styles. They don't bleed in and out of this component.

Because of this scoping, it reduces both the n and the m in our O(n * m) algorithm from earlier. The browser doesn't need to check as many rules against as many elements; it's clear which ones can apply to each other.
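For reference, here's one way to produce that structure from JavaScript using the standard shadow DOM APIs (a minimal sketch):

```js
const host = document.createElement('my-component');
const shadowRoot = host.attachShadow({ mode: 'open' });
shadowRoot.innerHTML = `
  <style>div { color: red }</style>
  <div>Hello!</div>
`;
document.body.append(host);
// The div rule can only ever match the one <div> inside this shadow root.
```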
```css
:nth-child(2) *        /* Input */
.xyz:nth-child(2) .xyz /* Svelte output */
```

An alternative to shadow DOM is to use style scoping from frameworks like Vue or Svelte. In a sense, these "polyfill" shadow DOM style scoping by modifying the selectors with unique classes, tags, or attributes.

In this case, you can see how Svelte effectively turns an inefficient selector into an efficient one that benefits from both the hashmap and the Bloom filter optimizations.
This chart runs my benchmark, median of 25 iterations, on a 2014 Mac Mini. The benchmark generates some random CSS, and compares running it through various scoping algorithms vs using shadow DOM vs just leaving the styles unscoped. It uses 1,000 generated "components," with 10 styles per component.
Note though that this is a microbenchmark, and this is not exactly comparing apples-to-apples. But the point is to show that shadow DOM and scoped styles do have a perf impact.
If you're writing a framework yourself, you may be interested in the differences between the scoping strategies, which is covered in my blog post.
I hesitated to mention this one, because it only applies to Chromium, but it's a big optimization.
It turns out that in Chromium, one big stylesheet is faster than multiple small stylesheets for style calculation. It has no impact on Safari or Firefox, and the Chromium devs may fix it eventually, but it's something to be aware of.
In this benchmark (median of 25 iterations), the styles are exactly the same; it's just the number of individual `<style>`s that's different. The fastest is to concatenate everything into one big stylesheet.
So the advice would be to concatenate stylesheets as much as possible across pages, similar to how we do concatenation and chunking for JavaScript modules in bundlers like Webpack. There are implications for cache performance here as well, though, so don't over-concatenate. Also try to avoid modifying the stylesheets after inserting them; if you do modify them, it's better to keep them separate according to the Chromium devs.
OK, so now that I've covered all the bases on style performance, I want to move on to layout performance.
Remember, this is about the geometry of the page.
So let's say we have a simple layout like this. We've got a header, a sidebar, and the main content.
Each of these boxes contains other boxes, but the browser already knows which elements have which styles, so it's just a matter of laying them out.
This can get very complicated, because some of these boxes may have absolute/relative positioning, others may use flexbox, others may use grid, etc.
And the browser has to calculate all the boxes for these things relative to each other.
So we have our simple layout.
But let's say our main content suddenly takes up a bit more space, so now the sidebar has to shrink.
So now the browser has to recalculate the geometry of all these boxes.
Why might this happen? Well maybe we ended up with some super long text that pushed the size of the box out. If that happens, then the browser might have to recalculate the layout for everything inside the sidebar.
Now as the author of the page, you might know this is never going to happen. And even if it did, you'd want to clip the text or something anyway. But is there some simpler way we can reassure the browser that these boxes are never going to change size?
Is there a way we can reassure the browser that a change in one box won't affect the other boxes? Yes, it's called CSS containment.
If we apply `contain: strict` to each of these boxes, then the browser can calculate their sizes independently of each other. This has the potential to speed up layout performance.
Now, this has some downsides. If there's a dropdown or something that you want to peek out of one box and into another one...
...then it'll get cut off if we use CSS containment, because we promised the browser that one box wouldn't bleed into another.
Now there are two (main) values: `contain: content` and `contain: strict`. `content` is a little laxer than `strict`, and is easier to apply broadly, but it doesn't size the element independently of its descendants. So `strict` is a bit harder to pull off.

My recommendation would be to try applying these to logically separate parts of your page (sidebars, modals, individual items in a list, etc.), and then measure and see if it improves layout performance.
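For example, something like this (a sketch; the class names are hypothetical):

```css
/* Logically separate regions of the page. */
.header,
.sidebar,
.main-content {
  contain: content; /* try strict instead if these boxes have fixed sizes */
}
```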
It's a bit hard to predict what optimizations a browser will apply, but here is a benchmark I put together based on one built by Manuel Rego Casasnovas. It renders 100 items, changes the text 100 times, and measures the result. (Median of 25 iterations, 100 items.)
As you can see, in Chrome you get most of the benefit with `contain: content` and don't need to go as far as `contain: strict`. With Firefox, the benefit only comes with `contain: strict`. In Safari there isn't any effect.
Keep in mind that browsers may implement multiple different optimizations for CSS containment, or none at all. The spec doesn't require that a browser implement any optimizations β only that the observable effects of containment apply. Also note that this is just one benchmark, and you may see different results in your own page.
(Also note that I tested in Firefox Nightly, and they seem to have the same perf optimization as Chrome now.)
| | Shadow DOM | CSS containment |
|---|---|---|
| Encapsulates | Style | Layout |
| Improves first calc? | Yes | No |

Now, CSS containment is a form of encapsulation. You might recall I also referred to shadow DOM as a form of encapsulation. What's the difference?

Well, shadow DOM encapsulates your styles and improves style calculation, whereas CSS containment encapsulates your layout and improves layout performance.

So if you have high style costs, CSS containment can't help you. And if you have high layout costs, shadow DOM can't help.
Also note that CSS containment can only make subsequent layouts faster. It provides a hint that if one part of the DOM changed, then another part doesn't need to be invalidated.
Some browsers like Firefox have experimented with doing layouts in parallel, but since no browser actually implements parallel layout, CSS containment cannot speed up the first layout pass. Shadow DOM, on the other hand, can improve even the first style calculation, because it's just about reducing the n and m in that O(n * m) algorithm I mentioned.
Other than CSS containment, I can only share a few general tips on improving layout performance.

First off, explicitly telling the browser the sizes of things will always be less work than asking it to run its layout algorithm. If you know the exact width/height of something, you can set the explicit size rather than letting the browser calculate it.

Also, of course, use fewer DOM nodes. If you have an infinite-scrolling list, use virtualization so that you're not rendering a bunch of DOM nodes that are off-screen.

If you use something like `display: none`, it will also avoid paying the layout cost for everything that is currently being hidden.
There is also a new property called `content-visibility` that allows the browser to skip rendering large portions of the page, while still allowing them to be searchable with Cmd-F/Ctrl-F.
So I'd say if you have high layout costs, try CSS containment first, then try these techniques.
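A minimal sketch, assuming a hypothetical `.below-fold-section` class for content far down the page:

```css
.below-fold-section {
  content-visibility: auto;           /* skip rendering while off-screen */
  contain-intrinsic-size: auto 500px; /* estimated size, so scrollbars stay stable */
}
```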
OK, so now that I've covered the principles of style and layout, and how they're different, I want to move on to topics that affect both style and layout calculation.
Up to now, we've mostly talked about what happens to a page that calculates style/layout once. But of course, a lot of us are building very dynamic pages that are constantly changing, so the browser calculates style/layout more than once. This process is called "invalidation."
Basically it means we are going from one layout state to another state.
This can typically happen for two different reasons: either 1) the DOM changed, and/or 2) the CSS rules changed.
```js
element.style.width = '200px';
```

Invalidation could be as simple as this: changing the width on an element.

When the browser detects this, it will automatically re-render during the next style/layout pass, which happens on the next frame. That's why, when you call `requestAnimationFrame`, you get the point in time right before the next style/layout operation.
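In other words (a minimal sketch):

```js
element.style.width = '200px'; // invalidate; no style/layout work happens yet
requestAnimationFrame(() => {
  // This runs right before the next style/layout pass, so DOM writes made
  // here get batched into the same rendering update.
});
```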
```js
element.style.width = '200px';   // Invalidate
element.getBoundingClientRect(); // Force style/layout
```

Now, normally this is fine. But it gets dangerous if you're explicitly telling the browser that you want style and layout to be calculated immediately, rather than waiting for the next frame.

Here we're invalidating by setting the width on the element to 200px. This doesn't actually cause the browser to do any style/layout work yet; normally that would happen in the next frame. But instead, we immediately call `element.getBoundingClientRect()`. This forces the browser to immediately and synchronously calculate both style and layout.
Some APIs that force style/layout recalc:

- `getBoundingClientRect`
- `offsetWidth`
- `getComputedStyle`
- `innerText`

🤯 Paul Irish has a complete list of APIs that force style/layout recalc.

Some APIs are a bit surprising (like `innerText`). Some force both style and layout, whereas others only force style (e.g. `getComputedStyle`).
```js
for (const el of elements) {
  const width = el.parentElement.offsetWidth; // Read from the DOM (force style/layout)
  el.style.width = width + 'px';              // Write to the DOM (invalidate)
}
```

This leads us to layout thrashing. Layout thrashing is a situation where, in a loop, you're both writing to the DOM (invalidating) and reading from the DOM (forcing style/layout). This forces the browser to re-run style and layout repeatedly.

In this case, reading `offsetWidth` is the DOM read, and setting `el.style.width` is the DOM write.

The telltale sign of layout thrashing is these repeated sections of purple in the DevTools, and the warning about "forced reflow." (Reflow is another name for layout.)
```js
const widths = elements.map(el => el.parentElement.offsetWidth); // All reads first...
elements.forEach((el, i) => {
  el.style.width = widths[i] + 'px';                             // ...then all writes
});
```

In these cases, it's better to batch your reads and writes together, so that you do all the reads at once, followed by all the writes. This ensures you pay for style calculation at most twice: once during the reads, and again during the writes.

If you do this correctly, then you should see one big style/layout cost (or at most two) rather than multiple. This allows the browser to be more efficient, because it's doing all the calculations at once rather than piece by piece.
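If your reads and writes are scattered across a codebase, a small scheduler can enforce this ordering. Here's a minimal fastdom-style sketch (`measure`/`mutate` are hypothetical helpers, not a built-in API; `element` is assumed to exist):

```js
const reads = [];
const writes = [];
let scheduled = false;

function schedule() {
  if (scheduled) return;
  scheduled = true;
  requestAnimationFrame(() => {
    scheduled = false;
    reads.splice(0).forEach(fn => fn());  // all reads first: at most one forced style/layout
    writes.splice(0).forEach(fn => fn()); // then all writes: one invalidation for next frame
  });
}

function measure(fn) { reads.push(fn); schedule(); }
function mutate(fn) { writes.push(fn); schedule(); }

// Usage: the read and write are written separately, but execute in batched order.
measure(() => {
  const width = element.parentElement.offsetWidth;
  mutate(() => { element.style.width = width + 'px'; });
});
```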
Note that the DevTools can be misleading. They warn you about "forced reflow" anytime you use one of the APIs that force style/layout, such as `getBoundingClientRect` or `offsetWidth`. But if you're only reading from the DOM once, then eliminating that call won't save you anything; you're just moving the cost around.
Here we've gone through a lot of effort to remove that `getBoundingClientRect` call. And the Chrome DevTools have rewarded us! Our "Recalculate style" doesn't have a little red triangle with a warning anymore. But the result is the same: all we did was move the style/layout costs from the `getBoundingClientRect` to the next style/layout calc on the next frame. The total time spent is the same. So this DevTools warning can be very misleading.
Browsers have a lot of invalidation optimizations. They try to avoid unnecessary work.
You change a class, and they try to only check rules related to that class. You change one part of the DOM, and they try to only redo layout for that part of the DOM (although containment can help).
This is why, in the typical flow for a web app, we have a lot of high upfront style/layout costs, and very small residual style/layout costs on every interaction. This is a good thing β this is what we want.
Sometimes though, these residual costs are surprisingly high. And it can be really tricky to figure out exactly why something was invalidated and is causing these high residual costs.
One tool you can use to inspect this is "invalidation tracking" in the Chrome DevTools.
Note that this only really works for Chrome β other browsers have different heuristics and different performance characteristics when it comes to invalidation.
If you do this, and then you click on the "Recalculate style" or "Recalculate layout" slice in the DevTools, it will show you which CSS rules were invalidated for which elements (in the case of style recalc), or which elements needed layout (in the case of layout recalc). This can be invaluable in debugging high invalidation costs!
For instance, you might use this to find out that you have an expensive animation that you're paying for, even though it's off-screen or otherwise invisible in the DOM.
Another thing to be aware of with invalidation: not all invalidations are created equal. In general, invalidating CSS rules is more expensive than invalidating DOM elements. In particular, when you insert new CSS rules at the global level, the browser may recalculate all styles for every existing element on the page.
In this benchmark, I'm inserting `<style>` tags in each `rAF`, and you can see the style costs steadily increasing, even though I'm not inserting any new DOM nodes.
So the best practice is to batch your CSS rule insertions.
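For example, instead of inserting one `<style>` tag per component or per rule, you could collect rules and flush them in one batch per frame (a sketch; `addRule` is a hypothetical helper):

```js
const pendingRules = [];
let flushQueued = false;

function addRule(ruleText) {
  pendingRules.push(ruleText);
  if (flushQueued) return;
  flushQueued = true;
  requestAnimationFrame(() => {
    flushQueued = false;
    const style = document.createElement('style');
    // One stylesheet insertion (and one invalidation) for the whole batch.
    style.textContent = pendingRules.join('\n');
    document.head.append(style);
    pendingRules.length = 0;
  });
}
```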
CSS has been getting a lot of new features recently, such as the `:has` selector. Here are some new and draft specs.
Layout has been getting new features too.
All of these features are cool, and you should be using them. CSS is great.
JavaScript (yellow part)
Style/Layout (purple part)
The more work you do in the purple part and the less in the yellow part, the more performant your page will tend to be.
But the more we lean on CSS, and the more ambitious apps we try to build, I worry that we'll see more time spent in the purple part. And it's not good enough to treat it as a black box.
Unfortunately it's harder to understand the purple part than the yellow part, because JavaScript is imperative, whereas CSS is declarative.
With JavaScript, there's a 1-to-1 mapping between the algorithm we write and the perf trace.
Whereas with CSS, we give a big declarative blob to the browser and tell the browser to implement the algorithm.
```sql
SELECT height, weight FROM pokemon
INNER JOIN pokemon_types ON pokemon.id = pokemon_types.id;
```
I think there's an interesting analogy here with SQL. Like CSS, SQL is declarative.
And like CSS, SQL has performance considerations. A slow query can result in time spent waiting on the database.
```sql
EXPLAIN
SELECT height, weight FROM pokemon
INNER JOIN pokemon_types ON pokemon.id = pokemon_types.id;
```
But unlike CSS, SQL has an EXPLAIN.
If we ask Postgres to EXPLAIN, it'll tell us how many rows it scanned, which indexes it used, how it joined the tables, etc.
So we can map this back to the declarative SQL we wrote.
Going back to my original car metaphor...
Sure, I can listen to the engine's noises and rely on intuition. But what I'd really like is a dashboard, to give me insight into what the engine is doing.
So my pitch to browser vendors is: please give us a CSS EXPLAIN! SelectorStats and Invalidation Tracking are great, but I want so much more.
Here is a mockup:

| Task | ms |
|---|---|
| Style | 400 |
| ├── Bloom filter misses | 200 |
| ├── `:nth-child` selectors | 130 |
| ├── Sibling selectors | 50 |
| └── Custom properties | 20 |
| Layout | 600 |
| ├── `<nav>` (grid) | 300 |
| ├── `.sidebar` (flexbox) | 200 |
| └── `<main>` (normal flow) | 100 |
So that's my talk on style/layout performance. I hope you enjoyed it and learned something about how browsers work and how to optimize style/layout calculation.
Thank you to all the people who helped with research for this talk.
These slides are available online, and you can pop open the speaker notes to find links.
nolanlawson.com