What is Two hard things in computer science?
Engineering culture and software practice
Two hard things in computer science is a long-running developer saying, usually quoted as "There are only two hard things in computer science: cache invalidation and naming things." It points to an awkward truth. Some of the trickiest parts of software are not flashy algorithms but deciding what to call things, and keeping cached data fresh without making the system slow or wrong. Engineers use the phrase as a joke, but also as a warning that meaning, timing, and stale information can make ordinary work unexpectedly difficult.
What this means
On the surface, the saying is funny because it makes software engineering sound absurdly specific. Out of everything in computing, are those really the two hardest jobs? That is exactly the point. The phrase picks two chores that look simple until you are the person responsible for them.
Naming matters because names are how humans understand a system. Cache invalidation matters because computers are always trying to keep copies of data nearby to save time. The moment you keep a copy, you have a fresh question: when does that copy stop being trustworthy?
Why it matters
This saying matters because it gives non-specialists a quick way to understand why software teams sometimes struggle with work that looks small from the outside. Renaming a field, an API endpoint, or a product concept can ripple through code, documentation, reporting, and team conversations. Changing cache rules can affect speed, cost, and correctness all at once.
It also captures something important about engineering culture. A lot of the difficulty in software does not come from pure computation. It comes from representing a messy world clearly, then keeping many moving parts in sync. If you understand this saying, you understand why engineers care so much about vocabulary, ownership, and data freshness. You also understand why yak shaving sometimes starts with "this name is a bit wrong" or "we will just add a cache".
How it works
Where the saying came from
The line is usually credited to Phil Karlton and was later popularised by Martin Fowler, who wrote it up after hearing and using it himself. Like a lot of engineering folklore, the attribution is widely repeated but the early trail is patchy. That uncertainty is part of the charm. The phrase behaves like many good engineering jokes: it spreads because people recognise the truth in it immediately.
Over time people started adding a bonus punchline about off-by-one errors, so that the list of two hard things ends up with three items. That add-on is itself a tiny demonstration of the original point. Engineers cannot resist refining a line if it fits the pattern of the field.
Why naming is so hard
Naming is difficult because a name is never just a label. It carries assumptions. If you call something a customer, are you talking about a paying company, a single user, a billing account, or a prospect? If you call a feature a report, does it mean a saved dashboard, a downloadable file, or a summary email? Good names help people reason. Bad names quietly push people into the wrong mental model.
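One way to make the "customer" question concrete is to give each meaning its own type, so that code cannot quietly mix them up. The sketch below is only an illustration in Python. The class names, fields, and the count_paying_companies helper are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """An organisation, whether or not it is paying yet (hypothetical model)."""
    company_id: str
    legal_name: str


@dataclass(frozen=True)
class User:
    """One person who can log in on behalf of a company."""
    user_id: str
    company_id: str
    email: str


@dataclass(frozen=True)
class BillingAccount:
    """The record that invoices are issued against."""
    billing_account_id: str
    company_id: str
    is_paying: bool


def count_paying_companies(accounts: list[BillingAccount]) -> int:
    """Count distinct companies that have at least one paying billing account."""
    return len({a.company_id for a in accounts if a.is_paying})


accounts = [
    BillingAccount("b1", "c1", True),
    BillingAccount("b2", "c1", True),   # same company, second billing account
    BillingAccount("b3", "c2", False),  # a prospect that is not paying yet
]
print(count_paying_companies(accounts))  # 1
```

With a single "customer" type, the same data could plausibly be reported as one, two, or three customers. Splitting the concept forces the question of which meaning is intended to be answered in the function name rather than in a meeting six months later.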
In software, names also have to survive change. A name that is perfect for the first version of a product can turn misleading a year later. Teams often discover that the real work was not typing the identifier but deciding what the thing actually is. That is why strong engineering teams obsess over domain language, schema names, API names, and event names. They are not being precious. They are trying to stop future confusion before it starts.
There is also a social side to naming. A name has to make sense to more than the person who invented it. It must work for the next developer, the analyst, the support team, and sometimes the customer reading a screen. When teams argue for half an hour over a field name, they are often really arguing about product meaning, ownership, and hidden edge cases.
Why cache invalidation is so hard
A cache is a stored copy of information kept close to where it will be used. The idea is wonderfully practical. If a system can reuse a recent answer instead of recalculating it or fetching it from a slower place, everything gets quicker. The difficulty starts the moment the original data changes.
Now the team has to answer a messy set of questions. Which copies exist? Who owns them? How quickly do they need to reflect the new truth? Is it safe to wait a few seconds? Should the cache expire after a timer, or should the system actively purge it when something changes? What happens if one layer updates and another one does not? Good cache behaviour is a game of trade-offs between freshness, simplicity, traffic, and cost.
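The choice between timer-based expiry and active purging can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production cache. The TtlCache class, its method names, and the injected clock are all hypothetical, and the sketch ignores concurrency, memory limits, and multiple layers.

```python
import time
from typing import Any, Callable


class TtlCache:
    """Tiny in-memory cache with timer-based expiry and explicit purge."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can control time
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        """Return the cached value if it is still fresh, otherwise reload it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at < self._ttl:
                return value  # fresh enough by the timer rule
        value = load()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Actively purge a key, for example right after the source changes."""
        self._entries.pop(key, None)


source = {"price": 10}
fake_now = [0.0]  # a fake clock so the example does not need to sleep
cache = TtlCache(ttl_seconds=30, clock=lambda: fake_now[0])

print(cache.get("price", lambda: source["price"]))  # 10, loaded from the source
source["price"] = 12
print(cache.get("price", lambda: source["price"]))  # 10, stale but within the timer
cache.invalidate("price")
print(cache.get("price", lambda: source["price"]))  # 12, the purge forced a reload
```

The timer alone is simple but accepts up to 30 seconds of staleness in this sketch. The explicit purge is fresher but only works if every code path that changes the source remembers to call it.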
The reason this becomes notorious is that caches appear everywhere. Browsers cache pages. Content delivery networks cache images and files. Application servers cache query results. Databases cache recent reads. Even people carry caches, in the form of assumptions in their heads and dashboards that have not yet refreshed. When an engineer says cache invalidation is hard, they usually mean that copied truth is convenient right up until the day it is not.
How the saying shows up in real work
Teams use the phrase in two slightly different ways. Sometimes it means "this seemingly small task hides a lot of complexity". Sometimes it means "be careful, we are about to make our lives harder with vague names or clever caching". In both cases the saying works as a small safety alarm.
It is also a reminder that speed and clarity are rarely separate concerns. A bad name slows people down. A stale cache produces wrong behaviour and forces detective work. In other words, both "hard things" punish overconfidence. They reward teams that make meaning explicit and build sensible rules for staleness.
Examples
A product team introduces a new concept called an "account". Six months later they discover that sales means company, finance means billing record, and the application means one login owner. No code is broken in the obvious sense, but reporting, permissions, and support tickets all become muddled. The expensive part is not the rename. It is untangling the meaning.
An ecommerce site caches stock levels to keep pages fast. A shopper buys the last item, but another user still sees it as available because a copy is sitting in a fast layer that has not yet expired. The bug is not dramatic in code review, yet it is dramatic for customers. Suddenly the team is tracing refresh rules through half a dozen systems.
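The stock scenario can be reproduced in miniature. The Python sketch below assumes two hypothetical cache layers, an application cache and an edge cache, sitting in front of the real stock count. The StockService class and its method names are invented for illustration, and real layers would also have expiry timers.

```python
class StockService:
    """Source of truth plus two hypothetical cache layers."""

    def __init__(self, stock: dict[str, int]):
        self._stock = dict(stock)            # the real stock levels
        self.app_cache: dict[str, int] = {}  # copy kept by the application
        self.edge_cache: dict[str, int] = {} # copy kept closer to the shopper

    def view_from_edge(self, sku: str) -> int:
        """What a shopper's page shows: edge copy, then app copy, then the source."""
        if sku not in self.edge_cache:
            if sku not in self.app_cache:
                self.app_cache[sku] = self._stock[sku]
            self.edge_cache[sku] = self.app_cache[sku]
        return self.edge_cache[sku]

    def buy(self, sku: str, purge_edge: bool) -> None:
        """Sell one unit and purge the app cache; optionally purge the edge too."""
        if self._stock[sku] <= 0:
            raise ValueError("out of stock")
        self._stock[sku] -= 1
        self.app_cache.pop(sku, None)
        if purge_edge:
            self.edge_cache.pop(sku, None)


shop = StockService({"mug": 1})
print(shop.view_from_edge("mug"))  # 1
shop.buy("mug", purge_edge=False)  # only one layer was told about the sale
print(shop.view_from_edge("mug"))  # 1, stale: the last mug is already gone

shop = StockService({"mug": 1})
print(shop.view_from_edge("mug"))  # 1
shop.buy("mug", purge_edge=True)   # every layer holding a copy is purged
print(shop.view_from_edge("mug"))  # 0
```

Nothing in the first run looks wrong in isolation. The purchase code did purge a cache, just not every cache that held a copy, which is why this kind of bug tends to be found by tracing layers rather than by reading one function.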
A data platform creates a dashboard metric called "active user". It ships quickly, then every department uses it differently. Growth counts any visit, product counts a meaningful action, and finance counts paying usage. The argument looks like semantics, but it is really about whether the company is even measuring the same thing.
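The disagreement is easy to show with a few lines of Python. The sketch below is an illustration only. The Event type, the three event kinds, and the function names are assumptions made for the example, not definitions taken from any real analytics system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str  # assumed kinds for this sketch: "visit", "action", "payment"


def users_with(events: list[Event], kinds: set[str]) -> set[str]:
    """Return the users who have at least one event of the given kinds."""
    return {e.user_id for e in events if e.kind in kinds}


def active_by_visit(events: list[Event]) -> set[str]:
    """Growth's reading: any visit counts."""
    return users_with(events, {"visit"})


def active_by_action(events: list[Event]) -> set[str]:
    """Product's reading: a meaningful action is required."""
    return users_with(events, {"action"})


def active_by_payment(events: list[Event]) -> set[str]:
    """Finance's reading: only paying usage counts."""
    return users_with(events, {"payment"})


events = [
    Event("u1", "visit"),
    Event("u2", "visit"),
    Event("u2", "action"),
    Event("u3", "visit"),
    Event("u3", "action"),
    Event("u3", "payment"),
]
print(len(active_by_visit(events)))    # 3
print(len(active_by_action(events)))   # 2
print(len(active_by_payment(events)))  # 1
```

All three numbers are correct answers to some question. The trouble starts when all three are published under the single name "active user".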
Common misunderstandings
A common misunderstanding is that this is only a joke. It is a joke, but it survives because it compresses a lot of real experience into one line.
Another is that naming means picking neat variable names. That is the shallow version. The deeper version is shaping a shared language for the whole system and the people around it.
People also hear cache invalidation and assume it only matters in giant distributed platforms. In reality, any system with saved copies, local state, browser storage, report snapshots, or CDN files can hit the same problem.
There is also a temptation to treat naming debates as bikeshedding. Sometimes they are. Often they are a sign that the underlying concept is still blurry, and the team is wisely noticing that before the blur hardens into code.
Risks and boundaries
The bad version of this saying turns it into an excuse. "Naming is hard" can become a reason to keep muddy terminology. "Cache invalidation is hard" can become a reason to bolt on time-based expiry and hope for the best.
The good version is humbler. It says these areas deserve deliberate thinking. If a team cannot explain what a thing is called and why, or cannot explain when a cached value becomes stale, they probably have not finished the design.
It is also worth saying that not every project needs heroic naming workshops or elaborate cache rules. Some internal tools can live with rough edges. The trick is to know when the edge is harmless and when it will multiply confusion later.
What to do next
If you recognise this pattern in your team, start by asking two plain questions. What do we call this thing, and what exactly do we mean by that name? Then ask the matching systems question. Where are copies of this data kept, and what makes them trustworthy or stale?
Encourage teams to keep a small shared vocabulary for important concepts. Review names early, when change is cheap. For caches, make freshness rules explicit rather than implied. Decide who owns invalidation, what delay is acceptable, and how the team will observe stale behaviour in production.
Most of all, treat clarity as real engineering work. It is not admin. It is one of the ways teams avoid weeks of confusion later.
FAQs
Is this phrase really about only two things?
No. It is playful exaggeration. The point is that these two tasks are surprisingly rich sources of pain.
Why do engineers care so much about names?
Because names shape how people think about the system. When the name is wrong, the reasoning that follows is often wrong too.
What counts as a cache?
Any stored copy kept to save time can act like a cache. That includes browser state, CDN files, application memory, query results, and more.
Why is stale data such a big deal?
Because fast wrong answers are still wrong. A stale cache can create customer confusion, bad decisions, or difficult production bugs.
Why do people sometimes add off-by-one errors to the saying?
It is a later joke layered on top of the original. Engineers enjoy adding one more self-referential punchline when the topic invites it.
Is the answer to avoid caching?
Not usually. Caches are often very useful. The real lesson is that a cache needs clear freshness rules, not blind optimism.
