We investigate how memory consumption, dataset size and software performance correlate…
In this post we introduce the essentials of programming for systems with several CPU cores. We start with an explanation of software threads and synchronization, two fundamental building blocks of multithreaded programming. We explain how these are implemented in hardware, and finally, we present several multithreading APIs you can use for parallel programming.
As I already mentioned in earlier posts, vectorization is the holy grail of software optimizations: if your hot loop is efficiently vectorized, it is pretty much running at fastest possible speed. So, it is definitely a goal worth pursuing, under two assumptions: (1) that your code has a hardware-friendly memory access pattern1 and (2) that…
In this post we present key concepts of software performance engineering.
We try to answer the question of why is quicksort faster than heapsort and then we dig deeper into these algorithms’ hardware efficiency. The goal: making them faster.
For all the engineers who like to tinker with software performance, vectorization is the holy grail: if it vectorizes, this means that it runs faster. Unfortunately, many times this is not the case, and the results of forcing vectorization by any means can mean lower performance. This happens when vectorization hits the memory wall: although…
In this post we try to answer the questions “what are premature optimizations?”
We investigate why software gets slower as new features are added or data set grows and what can you do about it.
We try to answer two questions related to compiler optimizations: how can you help the compiler do a better job and when does it make sense to do the compiler optimizations manually.
We investigate what are the techniques your compiler employs to make your loop run faster.