It is named so, since you can typically carry out the calculations on the back of an envelope or a napkin. Note: for interviews, these should be called Side of the Whiteboard Calculations.
Sometimes we face a choice between alternative architectures. At simplest, we can ask if a single server is sufficient? If not, how many servers would we need? A quick calculation gives us a rough estimate.
Generally, the calculation tells us
First, we need to recognize a limited resource in our system, then approximate the actual usage. For example, our servers are capped by 2GHz CPUs, and we would like to know if we can serve all user requests using a single server.
So how to approximate the actual usage? We need to break down the usage to its constituting factors, make a rough estimate of those factors (if needed, further breaking them down), and combining them.
For example, we might expect to have 1K active users, each issuing 15 requests per day. That’s 15K requests per day, or 15K/86400 requests per second.
When combining the parts, a trick is to round aggressively. Noone wants to divide by 86400. So let’s round to 20K/100K, leaving 0.2 seconds time available to serve a single request. If we know that a single request roughly takes 0.7 seconds to serve, we need to bring up at least 4 machines. Of course you don’t want to live on the edge, so let’s add some buffer and make that 10 machines (which is also a nicer number).
Prefer using small numbers along with abbreviation for magnitude (or, if needed, exponents) rather than writing out a full number. 10K instead of 10000.
If given a large and overly-precise number, for example 7432, instantly convert and write it down as 7K. You are approximating anyway.
Having numbers in this form makes multiplication and division fast. K*K is M. G/M is K. 4K*7M=28G. To work with larger numbers, round both of them towards a small multiple of a power of 2 or 10.
Find typical limited dimensions along with exercises below.
Assuming 1Gbps link per machine, if we want to crawl 70TB of websites every day, how many machines would a crawler system need?
How much space would it take to store the contents of 100M web pages? What if we substitute each word by an integer index? How many machines of 64GB SSD would it fit?
You store fetched web pages on a mechanical hard drive, with limited random access speed. Users issue requests with 100 query per sec (qps), each request typically returning the content of 20 pages. How many hard drives would you need to keep request latency low?
You need to deliver a new feature. There are 5 programmers and 40 tasks. How many weeks until possible launch?
A user pays $10 a month for your image store service, storing all their photos, each downsized to 3MB. During a month a user fetches 1K photos. Find the pricing page of your favorite cloud provider, and calculate the cost associated with each user. How much is your revenue per user? Check for different assumed photo counts.
Others include CPU time, RAM size, latencies of various kinds (disk access, RAM access, network), thread count.
Enumerate typical use-cases of the system and determine the most critical resources they need. A document store will need lots of storage. Guesstimating document sizes and counts is a good start, but further details will depend on usage. How often are new documents added? Are the documents searchable? Do we need any indices? Is it a write-heavy or read-heavy store?
Sometimes different use-cases will need very different shapes of resources. For example, the serving the documents might need lots of RAM but not CPU, while preprocessing of new documents the other way around. Hosting all those services on homogeneous machines would waste both CPU and RAM, since you need to get machines which are maxed on both dimensions.
Such differences indicate those features should be split to different services, hosted on independent sets of machines.
Outside of system design interviews, you can reach out to actual data. Spend some time mining the monitoring dashboards to get usual CPU usage. Perform a load test to measure peak RAM consumption. Run a SQL query to get the average number of photos stored by user.
It doesn’t need to be an either-or. Complement assumptions with data-backed facts if needed.
A summary of numbers every engineer should know. Or at least know where to look up ;)