What are premature optimizations? - Johnny's Software Lab

This short post is the personal opinion of the author, which comes from experience. Other people may have different opinions stemming from their experiences, and that is fine. The ability to change opinions when new evidence is presented is what makes us better engineers and better people.

I recently had a debate on Twitter about premature optimizations. It turns out there are two extremes when it comes to performance optimizations. At one extreme are people who think that performance optimizations are completely irrelevant on modern systems, since we now have faster and faster processors and more and more memory. They often use the term premature optimization to dismiss any idea of optimizing before profiling. At the other extreme are people who are very performance aware, for whom every comma matters if it has the capacity to improve speed.

The truth is probably somewhere in the middle, and here I will try to give my reasoning about this.

The crude reality of software performance

Let’s take a machine learning workload as an example: suppose it spends 80% of its runtime doing matrix multiplication and the remaining 20% doing other things. Matrix multiplication is such a common operation that CPU vendors provide matrix multiplication libraries heavily tuned for their CPUs. So, let’s assume there is nothing more we can do to speed up the matrix multiplication itself.

Now, imagine we find a perfect compiler that optimizes the rest of our machine learning program so well that the part which used to take 20% of the runtime now takes 0%. This means that if our program used to run for 60 minutes, it now runs for 48 minutes. Even if we get rid of the non-matrix-multiplication part completely, the runtime shrinks by only 20%, which corresponds to a 1.25x speedup.

The conclusion is simple: performance optimizations pay off only on hot code. You cannot expect faster programs when optimizing a cold part of the code. This conclusion makes complete sense. But the reality of software development is much more complex than this, as we will see later.

The complexity wall

Now let’s move on to another topic: why software projects fail. There are many reasons why software projects fail: engineering, financial or managerial. Sometimes software projects fail because they cannot deliver performance. For example, a game can sell poorly because it has performance issues. But most of the time that is not the reason; the problem is complexity.

As the software grows and becomes more and more complex, there is a chance that the following happens: we add a new feature, but we break an existing one. Or we fix a bug, but our fix introduces a new bug. The development has reached a standstill: we invest time and effort into the project, but the project isn’t moving anymore. We have hit a complexity wall: because of the software’s complexity, it is not possible to fix bugs or introduce new features without breaking something.

The absolute number one priority of all software projects is keeping the development as far away as possible from the complexity wall. For this we use all sorts of techniques, mostly revolving around simplicity: simple APIs, simple components, readable and maintainable code. This is where the recommendation to write simple, readable and maintainable code stems from.

Premature Optimizations

So, knowing that optimizations don’t pay off unless done on hot code, and also that complexity management has a much higher priority than performance concerns, the question arises: is there a need to do any optimizations at all? The answer to that question is multifaceted:

Shifting hot spot

As we said earlier, optimizing anything but the hot loop doesn’t bring the expected speed-ups. But more often than not, the hot spot will shift from one place to another depending on the usage scenario. This is especially true for popular programs with many features and a large userbase. The developers might not be aware of what the hot spots in their code actually are until somebody outside the team tells them. So, avoiding crude performance errors everywhere actually does make sense.

Industries where performance matters

There are industries, like the gaming industry or the high-frequency trading industry, where being slow means not shipping. In these industries, the software is developed with constant performance in mind. For example, game developers think that the classic object-oriented approach is hardware-unfriendly, and have devised a completely new paradigm called Entity-Component-System that uses hardware resources more efficiently. From the perspective of non-game developers, this approach is overkill, but for game developers, this approach is right since it works and delivers what they need.
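To give a rough feel for the idea, here is a heavily simplified sketch of an Entity-Component-System layout. The `World`, `Position`, `Velocity` and `movement_system` names are hypothetical, and real ECS frameworks additionally handle entities that lack some components, entity removal, and much more.

```cpp
#include <cstddef>
#include <vector>

struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Hypothetical, minimal world: an entity is just an index, and each
// component type lives in its own contiguous array.
struct World {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;

    // Adds one entity with both components and returns its index.
    std::size_t create_entity(Position p, Velocity v) {
        positions.push_back(p);
        velocities.push_back(v);
        return positions.size() - 1;
    }
};

// A "system" is a plain function that walks the component arrays linearly,
// an access pattern that tends to suit caches and hardware prefetchers.
void movement_system(World& world, float dt) {
    for (std::size_t i = 0; i < world.positions.size(); ++i) {
        world.positions[i].x += world.velocities[i].dx * dt;
        world.positions[i].y += world.velocities[i].dy * dt;
    }
}
```

Compared with a classic design where each game object is a separately allocated instance with virtual methods, the data that a system touches sits next to each other in memory.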

There is no clear-cut line between the industries where performance matters and those where it doesn’t. All projects can suffer from performance problems, and sometimes performance issues will require a serious rewrite to fix. We talked about those in the earlier post about why programs get slower over time. But most of the time, for most engineers, performance is not a critical issue.

Good engineering practices

There are certain patterns for writing code that don’t introduce bugs or ambiguous behavior, don’t increase the software complexity and don’t make the code more difficult to read and maintain. For example, clang-tidy can be used to enforce some of these coding practices. Here is a list of a few of them:

Recommendation	Description
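performance-unnecessary-value-param	If a function parameter is expensive to copy and is only read inside the function, pass it as a const reference instead of passing it by value. E.g. `void my_function(const std::string& v)` instead of `void my_function(std::string v)`.
performance-type-promotion-in-math-fn	If passing `float` to a mathematical function, avoid implicit type conversion. E.g. the C function `cos` accepts doubles, so calling it with a `float` implicitly promotes the argument to `double`; `std::cos` from `<cmath>` has a `float` overload.
performance-inefficient-algorithm	Warns not to use generic algorithms such as `std::find` on associative STL containers; use the container's own member function (e.g. `find`) instead.
performance-implicit-conversion-in-loop	In the case of range `for` loops, warns if the loop variable is a const reference whose type differs from the element type, so that every iteration performs an implicit conversion and may create a copy. E.g. `for (const std::pair<int, std::string>& p: my_map)` converts each element, because a `std::map<int, std::string>` stores `std::pair<const int, std::string>`; `for (const auto& p: my_map)` avoids this. The related check `performance-for-range-copy` warns about plain copies such as `for (object o: object_vector)`, where `for (const object& o: object_vector)` would suffice.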
performance-inefficient-vector-operation	Warns if not reserving space in a vector when using `push_back` or `emplace_back` inside a loop, if the size of the vector can be calculated before entering the loop.

clang-tidy recommendations related to performance
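As an illustration, here is a short sketch of code written in the style these recommendations encourage. The function names are invented for this example, and the comments describe the intent of each pattern rather than the exact conditions under which clang-tidy emits a warning.

```cpp
#include <cmath>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// The string is only read, so take it by const reference instead of by value.
std::size_t count_vowels(const std::string& text) {
    std::size_t count = 0;
    for (char c : text) {
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            ++count;
        }
    }
    return count;
}

// std::cos has a float overload, so the argument is not promoted to double.
float cosine_of(float angle) {
    return std::cos(angle);
}

// The member function find uses the tree structure of std::set, whereas
// std::find(ids.begin(), ids.end(), id) would scan the elements one by one.
bool set_contains(const std::set<int>& ids, int id) {
    return ids.find(id) != ids.end();
}

// Iterating by const reference avoids copying each std::string.
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& word : words) {
        total += word.size();
    }
    return total;
}

// The final size is known before the loop, so reserve once up front
// instead of letting the vector reallocate as it grows.
std::vector<std::size_t> squares(std::size_t n) {
    std::vector<std::size_t> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(i * i);
    }
    return result;
}
```

None of these versions is harder to read than the slower alternative, which is the point of this category of recommendations.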

There are some other recommendations that would fall into this category that clang-tidy doesn’t detect, for example using a binary search on a sorted vector instead of a regular search.
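For instance, a lookup in a sorted vector could be sketched as follows. The helper name `sorted_contains` is made up for this example, and the function assumes the caller guarantees that the vector is sorted in ascending order.

```cpp
#include <algorithm>
#include <vector>

// Precondition: `sorted_values` is sorted in ascending order.
// std::lower_bound needs O(log n) comparisons, std::find would need O(n).
bool sorted_contains(const std::vector<int>& sorted_values, int value) {
    auto it = std::lower_bound(sorted_values.begin(), sorted_values.end(), value);
    return it != sorted_values.end() && *it == value;
}
```

When only a yes/no answer is needed, `std::binary_search` does the same job in one call; `std::lower_bound` is shown here because it also yields the position of the element.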

These “performance optimizations” do not obscure the meaning of the code, do not increase complexity and do not introduce bugs. Engineers who know them have a better grasp of the C++ language and how things work under the hood, which is important in other domains as well. Personally, I prefer to call them good engineering practices rather than performance optimizations.

Bottom line

So, what’s the bottom line? The main focus of all engineers should be reducing complexity by writing easily understandable and maintainable code. But there are some simple engineering practices that neither increase complexity nor hurt maintainability, yet help keep the software reasonably fast. Those “optimizations” are worth doing in all phases of software development, even when the code in question is not hot and they bring no measurable benefit. Once they become a habit, they will come to mind automatically when the code you are writing does turn out to be part of the hot code.