As I already mentioned in earlier posts, vectorization is the holy grail of software optimizations: if your hot loop is efficiently vectorized, it is pretty much running at the fastest possible speed. So it is definitely a goal worth pursuing, under two assumptions: (1) that your code has a hardware-friendly memory access pattern and (2) that…
We try to answer two questions about compiler optimizations: how can you help the compiler do a better job, and when does it make sense to apply those optimizations manually?
We investigate the techniques your compiler employs to make your loops run faster.
This is the first article about hardware support for parallelization. We talk about SIMD, an extension found in almost every modern processor that lets a single instruction operate on several data elements at once, which can speed up your program.
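
To make the SIMD idea concrete, here is a minimal sketch in C of the kind of loop that benefits from it. The function name `add_arrays` and its parameters are made up for illustration and do not come from the article. With optimizations enabled, many compilers can turn a loop like this into SIMD instructions that add several floats at once, but whether that actually happens depends on the compiler, the flags, and the target processor.

```c
#include <stddef.h>

/* Hypothetical helper (not from the article): element-wise sum of two
 * float arrays. Every iteration is independent of the others, and the
 * arrays are read and written sequentially, which is the pattern SIMD
 * hardware handles well. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    /* `restrict` promises the compiler that the three arrays do not
     * overlap, removing one common obstacle to vectorization. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The source contains no explicit SIMD instructions: the loop is written in plain scalar C, and the job of mapping it onto vector instructions is left to the compiler.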