Originally published at https://www.nerdwallet.com/blog/engineering/speeding-up-nerdwallet/
Our recently spun-up Frontend Infrastructure team now has more than 6 months under our belt, and site performance has been one of our main focuses. This post summarizes a handful of the macro-level things we've done to improve and maintain page load performance of nerdwallet.com. Huge shout-out to the product teams that did the brunt of the work to consume these changes as well as implement their own optimizations on individual parts of the site.
Removing Unused Global CSS/JS
NerdWallet was originally created as a PHP monolith. During my time here, one of the largest overarching initiatives has been to move from this monolithic stack to microservices mostly built in Python and micro-apps built with Node/React.
The large amount of CSS we used to serve globally was intertwined with these micro-apps and even with our base React components. This effort was tedious tech debt cleanup, but here's the net impact of removing the unused code that was being served globally:
- 50kb (gzipped) of JS (190kb parsed) removed
- 30kb (gzipped) of CSS (200kb parsed) removed
Web Font Optimization
I wrote a full blog post about this, so you can read more about it there.
TL;DR: Font optimization is quite complicated nowadays, but a concerted effort can yield a meaningful reduction in time-to-first-paint (in our case, as much as a 30% drop). We subset critical fonts and load them as soon as possible via <link rel="preload">.
Server Side React Render Cache
When we first migrated to React, one of the must-haves was that we couldn't risk losing search ranking by client-side rendering our applications. We had to use server-side rendering to produce the HTML needed for easiest Googlebot consumption.
However, the React render can take a non-trivial amount of time. If you have a really large tree, it's not uncommon for the render function to approach 100ms. And 100ms per request spent on a synchronous, blocking call means a maximum of 10 requests per second per instance. That's not good.
Since React's render is a pure function, we can memoize based on all the inputs (e.g. Redux's store state and the react-router location). On a cache hit this can reduce time to render by an order of magnitude or more, improving performance for that page as well as throughput.
Nearly 50% reduction in overall server-side render time, before vs. after enabling the cache
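To make the memoization idea concrete, here is a minimal sketch rather than our actual implementation. The names createRenderCache and fakeRender are hypothetical, the render function is injected so the example has no dependencies, and the small LRU eviction is an assumption added to keep memory bounded.

```javascript
// Hypothetical sketch: memoize a pure server-side render function.
// `render` stands in for something like a renderToString call wrapped
// around the app; it is passed in so this sketch runs on its own.
function createRenderCache(render, maxEntries = 100) {
  const cache = new Map(); // Map keeps insertion order, used here as a simple LRU

  return function cachedRender(storeState, location) {
    // The key must cover every input that affects the rendered output.
    // JSON.stringify is order-sensitive, so this assumes stable key order.
    const key = JSON.stringify([location, storeState]);

    if (cache.has(key)) {
      const html = cache.get(key);
      // Refresh recency by moving the entry to the end of the Map.
      cache.delete(key);
      cache.set(key, html);
      return html;
    }

    const html = render(storeState, location);
    cache.set(key, html);

    // Evict the least recently used entry once the cache is over capacity.
    if (cache.size > maxEntries) {
      const oldestKey = cache.keys().next().value;
      cache.delete(oldestKey);
    }
    return html;
  };
}

// Usage with a stand-in render function that counts its invocations.
let renderCalls = 0;
const fakeRender = (state, location) => {
  renderCalls += 1;
  return `<h1>${state.title} at ${location}</h1>`;
};

const cachedRender = createRenderCache(fakeRender, 2);
const first = cachedRender({ title: "Cards" }, "/credit-cards");
const second = cachedRender({ title: "Cards" }, "/credit-cards"); // cache hit
```

The second call returns the stored HTML without invoking the render function again, which is where the savings on a hit come from. In a real app the key would need to include anything else the render depends on (for example, request-specific data).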
Centralized Build Tool
Webpack configuration can be fairly complex. Since we have dozens of individual web apps that all run webpack themselves, we devised a solution to share webpack configs across all of these applications so they can be centrally managed. We landed on a light wrapper around Electrode's webpack-config-composer because the configuration can be expressed as a plain object.
With this tool in place, whenever we make a performance optimization via our webpack config, we just update in one place and the benefits are propagated out to all apps when they upgrade / redeploy.
A config built this way has sane defaults for consumption in a browser while allowing complete customization, for example specifying a custom value for the minChunks option passed into CommonsChunkPlugin.
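As a rough illustration of the "plain object with overridable defaults" idea, here is a small sketch. It is not the webpack-config-composer API and not our wrapper: baseBrowserConfig, composeConfig, and the commonsChunk options bag are all hypothetical names invented for this example.

```javascript
// Hypothetical sketch: NOT the webpack-config-composer API.
// Centrally managed defaults, expressed as a plain object.
const baseBrowserConfig = {
  target: "web",
  devtool: "source-map",
  // Hypothetical options bag that a wrapper could hand to CommonsChunkPlugin.
  commonsChunk: { name: "vendor", minChunks: 2 },
};

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Recursively merge plain objects without mutating either input.
// Arrays and primitives in `overrides` replace the defaults outright.
function composeConfig(defaults, overrides) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value)
        ? composeConfig(current, value)
        : value;
  }
  return result;
}

// An individual app overrides only what it needs.
const appConfig = composeConfig(baseBrowserConfig, {
  commonsChunk: { minChunks: 3 },
});
```

Here the app keeps every shared default and changes only minChunks, so an optimization made in the shared defaults reaches the app the next time it upgrades.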
Upgrading Webpack
As part of creating this centralized webpack interface, we upgraded our version of webpack. One challenge we ran into when upgrading from webpack 1 to webpack 3 (this was prior to webpack 4 being released) is that the dedupe plugin was removed, and thus our bundles got bigger.
This was unexpected since we assumed the later version would be better for performance. We ended up rolling our own webpack dedupe plugin to produce the same functionality.
Babel
When transpiling with Babel, we use babel-preset-env, browserslist, and our site's Google Analytics data to compile our JavaScript for supported browsers based on traffic usage.
When we update our traffic data from Google Analytics, the JavaScript is automatically transpiled for the updated set of supported browsers as apps rebuild and redeploy.
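One way to picture the traffic-driven targeting is the sketch below. The trafficShare numbers are made up, supportedBrowsers is a hypothetical helper, and the 1% cutoff is an assumption. The only part taken from babel-preset-env itself is that it accepts browserslist queries under targets.browsers.

```javascript
// Hypothetical analytics export: browser + major version -> share of traffic (%).
// These numbers are invented for illustration.
const trafficShare = {
  "chrome 64": 41.2,
  "safari 11": 22.5,
  "ie 11": 3.1,
  "firefox 58": 6.4,
  "ie 9": 0.05,
};

// Keep browsers at or above a minimum traffic share, most used first.
function supportedBrowsers(usage, minSharePercent) {
  return Object.entries(usage)
    .filter(([, share]) => share >= minSharePercent)
    .sort(([, a], [, b]) => b - a)
    .map(([browser]) => browser);
}

// Each entry is a browserslist query; babel-preset-env reads them
// from `targets.browsers` to decide which transforms are needed.
const babelConfig = {
  presets: [
    ["env", { targets: { browsers: supportedBrowsers(trafficShare, 1) } }],
  ],
};
```

Regenerating the usage data and rebuilding is then enough to change what gets transpiled, with no per-app config edits.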
Images
Lazyloading an image with a low-res placeholder inlined.
We built a React Image component to codify best practices (e.g. <picture>, srcset, sizes) and support lazy loading to improve perceived performance.
By lazy loading we can ensure we only download images when the user will see them, and by using an aspect ratio box we can avoid image reflows.
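The two non-obvious pieces here are the aspect ratio box and the srcset string, so here is a small sketch of both. The helper names and the ?w= resize parameter are assumptions for illustration, not our component's actual interface.

```javascript
// Hypothetical helpers; names and the `?w=` resize parameter are assumptions.

// Aspect ratio box: a wrapper with zero height whose padding-bottom is a
// percentage of its width, so the space is reserved before the image loads
// and the page does not reflow afterwards.
function aspectRatioBoxStyle(width, height) {
  const percent = (height / width) * 100;
  return {
    position: "relative",
    height: 0,
    paddingBottom: `${percent.toFixed(2)}%`,
  };
}

// Build a srcset value listing one candidate URL per target width.
function buildSrcSet(baseUrl, widths) {
  return widths.map((w) => `${baseUrl}?w=${w} ${w}w`).join(", ");
}

const boxStyle = aspectRatioBoxStyle(1600, 900);
const srcSet = buildSrcSet("/cdn/hero.jpg", [400, 800, 1600]);
```

The lazy-loading trigger itself is left out; in the browser it would typically be something like an IntersectionObserver that swaps the inlined placeholder for the real source once the box nears the viewport.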
Codesplitting
Godspeed those who attempt the server-rendered, code-split apps
This is a real quote from react-router that was live on their site until very recently. Codesplitting a server side rendered app is not as battle tested and only recently has there been solid tooling to do such a thing.
We ended up leveraging the suite of react-universal-component packages to handle CSS and JS codesplitting that works both server side and client side.
We've had some challenges around setting this up, codesplitting from within a nested package, and codesplitting and CSS modules not playing nicely together. However, this has allowed us to do route-based or component-based codesplitting, and to split out things like large visualization libraries into their own lazy-loaded bundles.
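The lazy-loaded bundle idea boils down to deferring a fetch until first use and reusing the result afterwards. The sketch below shows only that core, with a hypothetical lazyOnce helper and a stand-in loader; it is not the react-universal-component API, which also handles the server-side and CSS parts.

```javascript
// Hypothetical helper, not the react-universal-component API.
// Wraps a loader so the underlying fetch happens at most once.
function lazyOnce(loader) {
  let pending = null;
  return function load() {
    if (pending === null) {
      // In a real app the loader would be a dynamic import,
      // e.g. () => import("./big-chart-library")
      pending = loader();
    }
    return pending;
  };
}

// Stand-in loader that counts how many times the "bundle" is fetched.
let fetchCount = 0;
const loadCharts = lazyOnce(() => {
  fetchCount += 1;
  return Promise.resolve({ draw: () => "chart drawn" });
});

const firstRequest = loadCharts();
const secondRequest = loadCharts(); // reuses the same promise
```

Nothing is fetched until loadCharts is first called, and every later caller shares the same promise, so a large library costs nothing on routes that never use it.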
CDN in front of nerdwallet.com
This was probably the single biggest optimization that improved our page load performance site-wide.
Thanks to a large effort from our DevOps team, we put Amazon's CloudFront in front of all traffic on our entire website. This was a huge win because CloudFront operates from many data centers, and our customers now open connections with the CloudFront location closest to them rather than going all the way to nerdwallet.com, which is not hosted in nearly as many locations. Additionally, even after the connection has been established, since nerdwallet.com is hosted in Amazon's S3/ECS, we are able to leverage Amazon's internal routing rather than the public internet to get content to users as fast as possible. Overall, this helped drop our site-wide time to first byte by ~20%.
The second large change associated with this is that we were able to consolidate all of our assets onto our top-level domain. For example, instead of referencing assets on the CDN at cdn.nerdwallet.com, we can now use www.nerdwallet.com/cdn. This results in faster downloads of assets in the critical render path over HTTP/2, as we don't need a new DNS lookup, TCP connection, and SSL handshake; the browser can reuse the existing connection opened to the top-level domain.
All critical assets are downloaded faster
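For a sense of what the consolidation means for asset references, here is a small sketch. The toSameOriginAssetUrl helper is hypothetical, and it assumes the asset path is preserved unchanged under /cdn, which is not something spelled out above.

```javascript
// Hypothetical helper. Assumes an asset keeps the same path under /cdn
// on the top-level domain; that mapping is an assumption for illustration.
function toSameOriginAssetUrl(assetUrl) {
  const url = new URL(assetUrl);
  if (url.hostname !== "cdn.nerdwallet.com") {
    return assetUrl; // leave anything not on the old CDN host untouched
  }
  return `https://www.nerdwallet.com/cdn${url.pathname}${url.search}`;
}

const rewritten = toSameOriginAssetUrl("https://cdn.nerdwallet.com/js/app.js?v=2");
const untouched = toSameOriginAssetUrl("https://example.com/js/app.js");
```

Because the rewritten URL shares an origin with the page, the browser can request it over the HTTP/2 connection it already has open.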
Asset Prioritization
CSS / Fonts / images
<link rel="preload" as="font" type="font/woff2" href="..."/>
For assets in the critical render path impacting SpeedIndex, we use preload where we can and have moved our JavaScript out of the <head> so the browser can potentially paint as much of the page as possible without having to make many requests in serial.
JavaScript
<script src="..." defer />
We use defer for our core JS bundles over async because defer guarantees execution order and guarantees the scripts won't run, and so won't block the main thread, until the document has been parsed, just before the DOMContentLoaded event. In practice we saw async scripts sometimes being evaluated much earlier than we'd like, ahead of image paint or similar.
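Putting the two pieces together, a server-rendered page could emit its asset tags along these lines. This is a sketch under assumptions: renderAssetTags and the shape of the assets object are hypothetical, URLs are assumed to be trusted (no escaping), and the crossorigin attribute is added because font preloads are fetched in CORS mode even for same-origin files.

```javascript
// Hypothetical helper for a server-rendered page; the `assets` shape is an
// assumption. URLs are assumed trusted, so no HTML escaping is done here.
function renderAssetTags(assets) {
  const head = [];
  const bodyEnd = [];

  for (const font of assets.fonts) {
    // Fonts are fetched in CORS mode, so the preload needs `crossorigin`
    // to match the later font request and avoid a second download.
    head.push(
      `<link rel="preload" as="font" type="font/woff2" href="${font}" crossorigin>`
    );
  }

  for (const script of assets.scripts) {
    // `defer` scripts download in parallel with parsing and execute in
    // document order after parsing finishes. They live outside the head.
    bodyEnd.push(`<script src="${script}" defer></script>`);
  }

  return { head: head.join("\n"), bodyEnd: bodyEnd.join("\n") };
}

const tags = renderAssetTags({
  fonts: ["/cdn/fonts/body.woff2"],
  scripts: ["/cdn/js/vendor.js", "/cdn/js/app.js"],
});
```

The scripts are listed in dependency order, and defer preserves that order at execution time, which is the property async does not give.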
Challenges
We have seen some unexpected behavior with the HTTP/2 implementation of Chrome and/or CloudFront. In particular, Chrome places lower-priority assets in a linear dependency chain. What this means is that, for example, tiny image files might wait for much larger JavaScript files to completely finish downloading; these assets download in serial, not in parallel.
This required us to not preload our JavaScript bundles: they can be large, Chrome gives a higher priority to preloaded JS assets than to images, and we didn't want them to block the loading of much smaller images.
Building a Culture of Performance
In order to maintain and improve our site performance, we needed to build a culture that values site performance and measures it on a regular basis.
This is probably the biggest change we've made to ensure we stay on top of our performance. We've adopted SpeedCurve and made SpeedIndex one of the key metrics we monitor across all of our frontend teams. If you don't use SpeedCurve, we highly recommend it.
Automated Analysis of Performance Impacts of PRs
Vlad Silin, an intern on the FEI team for the past quarter, built an awesome tool that provides performance-related feedback in the form of a GitHub comment when developers open PRs. Read more about this in Vlad's post.
Culture
We've held performance-related workshops and talks, captured performance data in our data warehouse to correlate page performance with business performance, and in general advocated wherever possible to make performance a core part of frontend engineering at NerdWallet.
Upcoming
We still have a long list of things we want to do. Some things we plan to work on in the near future:
- Inline CSS under a minimum threshold via Google's Nginx PageSpeed module
- Improve our image processing pipeline: support requiring an image from within JS, e.g. require('../my-image.png'), and generate responsive image sizes either at build time or on the fly
- Service Workers for better offline support
- Upgrade versions of some of our key packages (webpack 4, React 16)