 # 2-minute read: the Magic Touch of Parallel Algorithms

In the previous article we made our program run faster by replacing `std::endl` with `'\n'` when writing to a file. Our flamegraph for the same program, after this modification, looks like this:

As you can see, what takes most of the time in our program now is the function `sort_lines`. Let’s have a look at the source code of this function:

``` void sort_lines(std::vector<std::string>& lines) {
std::sort(lines.begin(), lines.end());
}```

This is a simple call to `std::sort`, so we could naively conclude there is nothing possible to optimize. However, this is not true.

Almost all modern CPUs have more than one core nowadays but the standard C++ library is somewhat stuck in the past. Sorting can be done on more than one CPU simultaneously, and there exist parallel implementations of `std::sort` that can take advantage of more than one core.

### Parallel algorithms in the standard library

If you are C++17, the `std::sort` algorithm takes an additional parameter called `ExecutionPolicy`. Developers can use it to specify what flavor of the algorithm they prefer. Here is our modified implementation:

```void sort_lines(std::vector<std::string>& lines) {
std::sort(std::execution::par_unseq, lines.begin(), lines.end());
}```

With `std::execution::par_unseq` we are instructing the standard C++ library to use the parallel version of `std::sort` algorithm. Many algorithms of the standard C++ library accept this parameter, e.g. `std::sort`, `std::find`, `std::find_if`, `std::count`, `std::for_each`, `std::transform`, `std::transform_reduce` and many other.

If you are using an older C++ version, GCC’s standard C++ library offers Parallel Mode. Parallel mode is not enabled by default, and you can enable it in two ways:

• Recompile your program with `-D_GLIBCXX_PARALLEL`. This will instruct the standard library to use parallel algorithms whenever available or
• Use the algorithms from `__gnu_parallel` namespace, in our case `__gnu_parallel::sort`.

Our current implementation looks like this:

```void sort_lines(std::vector<std::string>& lines) {
#if __cplusplus >= 201703L
std::sort(std::execution::par_unseq, lines.begin(), lines.end());
#else
__gnu_parallel::sort(lines.begin(), lines.end());
#endif
}
```

We are choosing the `std::sort` implementation depending on the version of C++ version.

The flamegraph (with GCC’s parallel mode libstdc++) looks like this:

The runtime of `sort_lines` function went down from 440 ms to 150 ms. The whole program’s runtime went down from 650 ms to 390 ms with a very simple change!
1. Rama says: