├── .github
│   └── FUNDING.yml
├── .gitignore
├── CMakeLists.txt
├── README.md
├── cpp
│   ├── pcl_conditional_removal.cpp
│   ├── prefer_references.cpp
│   ├── reserve.cpp
│   ├── small_strings.cpp
│   ├── strings_are_vectors.cpp
│   └── strings_concatenation.cpp
├── docs
│   ├── 2d_matrix_iteration.md
│   ├── 2d_transforms.md
│   ├── about.md
│   ├── boost_flatmap.md
│   ├── dont_need_map.md
│   ├── img
│   │   ├── beautiful.jpg
│   │   ├── boom.gif
│   │   ├── const_reference.png
│   │   ├── cpp.png
│   │   ├── davide_yells_at_PCL.jpg
│   │   ├── feel_bad.jpg
│   │   ├── fizzbuzz.jpg
│   │   ├── growing_vector.png
│   │   ├── hotspot_heaptrack.jpg
│   │   ├── inconceivably.jpg
│   │   ├── laser_scan_matcher.png
│   │   ├── linked_list.png
│   │   ├── modifystring.png
│   │   ├── mordor.jpg
│   │   ├── motor_profile1.png
│   │   ├── motor_profile2.png
│   │   ├── multiply_vector.png
│   │   ├── palindrome_benchmark.png
│   │   ├── pcl.jpg
│   │   ├── pcl_fromros.png
│   │   ├── quick-bench.png
│   │   ├── quote.png
│   │   ├── really.jpg
│   │   ├── realsense.png
│   │   ├── relax_sso.jpg
│   │   ├── spider_senses.png
│   │   ├── sso_in_action.png
│   │   ├── string_concatenation.png
│   │   ├── that_is_logn.jpg
│   │   ├── think_about_it.jpg
│   │   ├── tostring.png
│   │   ├── twitter_unordered.png
│   │   ├── two_lookups.jpg
│   │   ├── vector_reserve.png
│   │   ├── velodyne.png
│   │   └── why_copy.jpg
│   ├── index.md
│   ├── no_lists.md
│   ├── palindrome.md
│   ├── pcl_filter.md
│   ├── pcl_fromROS.md
│   ├── prefer_references.md
│   ├── reserve.md
│   ├── small_strings.md
│   ├── small_vectors.md
│   ├── strings_are_vectors.md
│   └── strings_concatenation.md
├── mkdocs.yml
└── overrides
    └── main.html
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: facontidavide
4 | custom: https://www.paypal.me/facontidavide
5 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | site
2 |
3 | */CMakeLists.txt.user
4 |
5 | build-*
6 |
7 | # Prerequisites
8 | *.d
9 |
10 | # Compiled Object files
11 | *.slo
12 | *.lo
13 | *.o
14 | *.obj
15 |
16 | # Precompiled Headers
17 | *.gch
18 | *.pch
19 |
20 | # Compiled Dynamic libraries
21 | *.so
22 | *.dylib
23 | *.dll
24 |
25 | # Fortran module files
26 | *.mod
27 | *.smod
28 |
29 | # Compiled Static libraries
30 | *.lai
31 | *.la
32 | *.a
33 | *.lib
34 |
35 | # Executables
36 | *.exe
37 | *.out
38 | *.app
39 |
--------------------------------------------------------------------------------
/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 3.5) # CMAKE_CXX_STANDARD requires CMake >= 3.1
2 |
3 | project(cpp_optimizations)
4 |
5 | set(CMAKE_CXX_STANDARD 11)
6 | set(CMAKE_CXX_STANDARD_REQUIRED ON)
7 |
8 | # On ubuntu, install with: sudo apt install libbenchmark-dev
9 | # https://github.com/google/benchmark
10 | find_package(benchmark REQUIRED)
11 | find_package(Threads)
12 |
13 | function(compile_example target file)
14 | add_executable(${target} ${file} )
15 |     target_link_libraries(${target} benchmark::benchmark Threads::Threads)
16 | endfunction()
17 |
18 | compile_example(prefer_references basics/prefer_references/prefer_references.cpp)
19 |
20 | compile_example(vector_reserve vectors_everywhere/reserve/reserve.cpp)
21 |
22 | compile_example(strings_are_vectors just_a_string/strings_are_vectors/strings_are_vectors.cpp)
23 | compile_example(small_string_optimization just_a_string/small_strings/small_strings.cpp)
24 |
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Preface: about me
2 |
3 | My name is [Davide Faconti](https://twitter.com/facontidavide) and my job is one of the best in the world: I work in **robotics**.
4 |
5 | This blog/repository is maintained in **my spare time** and it is not related to my day job. Therefore *opinions (and memes) are all mine and don't represent my employer in any way*.
6 |
7 | I love C++ programming and Open Source and this "diary" is my small contribution to the OSS community.
8 |
9 | # CPP Optimizations Diary
10 |
11 | Optimizing code in C++ is something that no one can resist. You can have fun
12 | and pretend that you are doing something useful for your organization at the same time!
13 |
14 | In this repository, I will record some simple design patterns to improve your code
15 | and remove unnecessary overhead in **C++**.
16 |
17 | If you are a seasoned C++ expert, you probably have your own set of rules already.
18 |
19 | These rules help you look like a bad-ass/rockstar/10X engineer to your colleagues.
20 |
21 | You are the kind of person who casually drops a [std::vector<>::reserve](docs/reserve.md) before a loop and
22 | nods, smiling, looking at the performance improvement and the astonishment of your team members.
23 |
24 |
25 |
26 |
27 | Hopefully, the examples in this repository will help you achieve this guru status
28 | and, as a side effect, save the planet from global warming, sparing useless CPU
29 | cycles from being wasted.
30 |
31 | Then, unfortunately, someone on the other side of the planet will start mining Bitcoins or write her/his
32 | application in **Python**, and all your effort to save electricity will have been for nothing.
33 |
34 | I am kidding, Python developers, we love you!
35 |
36 | > Narrator: "he was not kidding..."
37 |
38 | ## Rule 1: measure first (using _good_ tools)
39 |
40 | The very first thing any person concerned about performance should do is:
41 |
42 | - **Measure first** and **make hypothesis later**.
43 |
44 | My colleagues and I are almost always wrong about why a piece of code is slow.
45 |
46 | Sometimes we are right, but it is really hard to know in advance how refactoring will
47 | improve performance. Good profiling tools show in minutes the "low hanging fruits": minimum work, maximum benefit!
48 |
49 | Summarizing: 10 minutes of profiling can save you hours of guessing and refactoring.
50 |
51 | My "goto" tools in Linux are [Hotspot](https://github.com/KDAB/hotspot) and
52 | [Heaptrack](https://github.com/KDE/heaptrack). I understand Windows has similar
53 | tools too.
54 |
55 | 
56 |
57 | In the benchmark war, if you are the soldier, these are your rifle and hand grenades.
58 |
59 | Once you know which part of the code deserves to be optimized, you might want to use
60 | [Google Benchmark](https://github.com/google/benchmark) to measure the time spent in a very specific
61 | class or function.
62 |
63 | You can even run Google Benchmark online here: [quick-bench.com](http://quick-bench.com/G7B2w0xPUWgOVvuzI7unES6cU4w).
64 |
65 | 
66 |
67 | ## Rule 2: learn good design patterns, use them by default
68 |
69 | Writing good code is like brushing your teeth: you should do it without thinking too much about it.
70 |
71 | It is a muscle that you need to train and that becomes stronger over time. But don't worry:
72 | once you start, you will begin to see recurring patterns that
73 | are surprisingly simple and work in many different use cases.
74 |
75 | **Spoiler alert**: one of my most beloved tricks is to _minimize the number of heap allocations_.
76 | You have no idea how much that helps.
77 |
78 | But let's make something absolutely clear:
79 |
80 | - Your **first goal** as a developer (software engineer?) is to create code that is **correct** and fulfils the requirements.
81 | - The **second** most important thing is to make your code **maintainable and readable** for other people.
82 | - In many cases, you also want to make code faster, because [faster code is better code](https://craigmod.com/essays/fast_software/).
83 |
84 | In other words, think twice before making any change to your code that makes it less readable or harder to debug,
85 | just because you believe it may run 2.5% faster.
86 |
87 | # Get started
88 |
89 | For a more comfortable reading experience, visit: https://cpp-optimizations.netlify.app
90 |
91 | ## Optimization examples
92 |
93 | ### "If you pass that by value one more time..."
94 |
95 | - [Use Const reference by default](docs/prefer_references.md).
96 |
97 | - Move semantic (TODO).
98 |
99 | - Return value optimization (TODO).
100 |
101 |
102 | ### std::vector<> is your best friend
103 |
104 |
105 | - [Use std::vector<>::reserve by default](docs/reserve.md)
106 |
107 | - ["I have learnt linked-lists at university, should I use them?" Nooope](docs/no_lists.md).
108 |
109 | - [You don't need a `std::map<>` for that](docs/dont_need_map.md).
110 |
111 | - [Small vector optimization](docs/small_vectors.md)
112 |
113 |
114 | ### "It is just a string, how bad could that be?"
115 |
116 | - [Strings are (almost) vectors](docs/strings_are_vectors.md)
117 |
118 | - [When not to worry: small string optimization](docs/small_strings.md).
119 |
120 | - [String concatenation: the false sense of security of `operator+`](docs/strings_concatenation.md).
121 |
122 | - `std::string_view`: love at first sight (TODO).
123 |
124 | ### Don't compute things twice.
125 |
126 | - [Example: 2D/3D transforms the right way](docs/2d_transforms.md).
127 |
128 | - [Iterating over a 2D matrix: less elegant, more performant](docs/2d_matrix_iteration.md).
129 |
130 | ### Fantastic data structures and where to find them.
131 |
132 | - [I tried `boost::container::flat_map`. You won't imagine what happened next](docs/boost_flatmap.md).
133 |
134 | ### Case studies
135 |
136 | - [Simpler and faster way to filter Point Clouds in PCL.](docs/pcl_filter.md)
137 |
138 | - [Fast Palindrome: the cost of conditional branches](docs/palindrome.md)
139 |
140 |
141 | # License
142 |
143 | This work is licensed under CC BY-SA 4.0
144 |
--------------------------------------------------------------------------------
/cpp/pcl_conditional_removal.cpp:
--------------------------------------------------------------------------------
1 | #include <functional>
2 | #include <stdexcept>
3 | #include <benchmark/benchmark.h>
4 | #include <boost/make_shared.hpp>
5 | #include <pcl/point_types.h>
6 | #include <pcl/io/pcd_io.h>
7 | #include <pcl/common/common.h>
8 | #include <pcl/filters/conditional_removal.h>
9 |
10 | using namespace pcl;
11 |
12 | const char filename[] = "test_pcd.pcd";
13 |
14 | PointCloud<PointXYZ>::Ptr LoadFromFile(const char *filename)
15 | {
16 | PointCloud<PointXYZ>::Ptr cloud_in(new PointCloud<PointXYZ>);
17 | if (io::loadPCDFile(filename, *cloud_in) == -1) //* load the file
18 | {
19 | std::cerr << "Couldn't read file " << filename << std::endl;
20 | throw std::runtime_error("file not found");
21 | }
22 | return cloud_in;
23 | }
24 |
25 | static void PCL_Filter(benchmark::State & state)
26 | {
27 | auto cloud = LoadFromFile(filename);
28 |
29 | PointXYZ minPt, maxPt;
30 | pcl::getMinMax3D(*cloud, minPt, maxPt);
31 | const float mid_point_x = (maxPt.x+minPt.x)*0.5 ;
32 | const float mid_point_y = (maxPt.y+minPt.y)*0.5 ;
33 |
34 | PointCloud<PointXYZ>::Ptr outCloud(new PointCloud<PointXYZ>);
35 |
36 | for (auto _ : state) {
37 | outCloud->clear();
38 | pcl::ConditionalRemoval<PointXYZ> range_filt;
39 |
40 | auto range_cond = boost::make_shared<ConditionAnd<PointXYZ>>();
41 | auto comparison_x = boost::make_shared<FieldComparison<PointXYZ>>("x", ComparisonOps::GT, mid_point_x);
42 | auto comparison_y = boost::make_shared<FieldComparison<PointXYZ>>("y", ComparisonOps::GT, mid_point_y);
43 | range_cond->addComparison(comparison_x);
44 | range_cond->addComparison(comparison_y);
45 |
46 | range_filt.setInputCloud(cloud);
47 | range_filt.setCondition(range_cond);
48 | range_filt.filter(*outCloud);
49 | }
50 | }
51 |
52 | static void Naive_Filter(benchmark::State & state)
53 | {
54 | auto cloud = LoadFromFile(filename);
55 |
56 | PointXYZ minPt, maxPt;
57 | pcl::getMinMax3D(*cloud, minPt, maxPt);
58 | const float mid_point_x = (maxPt.x+minPt.x)*0.5 ;
59 | const float mid_point_y = (maxPt.y+minPt.y)*0.5 ;
60 |
61 | PointCloud<PointXYZ>::Ptr outCloud(new PointCloud<PointXYZ>);
62 |
63 | for (auto _ : state) {
64 | outCloud->clear();
65 | for (const auto& point: cloud->points) {
66 | if( point.x > mid_point_x && point.y > mid_point_y ){
67 | outCloud->push_back( point );
68 | }
69 | }
70 | }
71 | }
72 |
73 | BENCHMARK(PCL_Filter);
74 | BENCHMARK(Naive_Filter);
75 |
76 | //----------------------------------------------------------------
77 | template <typename PointT>
78 | class GenericCondition : public ConditionBase<PointT>
79 | {
80 | public:
81 |   typedef boost::shared_ptr<GenericCondition<PointT>> Ptr;
82 |   typedef boost::shared_ptr<const GenericCondition<PointT>> ConstPtr;
83 |   typedef std::function<bool(const PointT&)> FunctorT;
84 |
85 | GenericCondition(FunctorT evaluator): ConditionBase(),_evaluator(evaluator) {}
86 |
87 | virtual bool evaluate (const PointT &point) const {
88 | return _evaluator(point);
89 | }
90 | private:
91 | FunctorT _evaluator;
92 | };
93 |
94 | static void PCL_Filter_Generic(benchmark::State & state)
95 | {
96 | auto cloud = LoadFromFile(filename);
97 |
98 | PointXYZ minPt, maxPt;
99 | pcl::getMinMax3D(*cloud, minPt, maxPt);
100 | const float mid_point_x = (maxPt.x+minPt.x)*0.5 ;
101 | const float mid_point_y = (maxPt.y+minPt.y)*0.5 ;
102 |
103 | PointCloud<PointXYZ>::Ptr outCloud(new PointCloud<PointXYZ>);
104 |
105 | for (auto _ : state) {
106 | outCloud->clear();
107 | pcl::ConditionalRemoval<PointXYZ> range_filt;
108 |
109 | auto range_cond = boost::make_shared<GenericCondition<PointXYZ>>(
110 | [=](const PointXYZ& point){
111 | return point.x > mid_point_x && point.y > mid_point_y;
112 | });
113 |
114 | range_filt.setInputCloud(cloud);
115 | range_filt.setCondition(range_cond);
116 | range_filt.filter(*outCloud);
117 | }
118 | }
119 |
120 | BENCHMARK(PCL_Filter_Generic);
121 |
122 | BENCHMARK_MAIN();
123 |
--------------------------------------------------------------------------------
/cpp/prefer_references.cpp:
--------------------------------------------------------------------------------
1 | #include <string>
2 | #include <vector>
3 | #include <benchmark/benchmark.h>
4 | struct Vector3D{
5 | double x;
6 | double y;
7 | double z;
8 | };
9 |
10 | Vector3D MultiplyByTwo_Value(Vector3D p){
11 | return { p.x*2, p.y*2, p.z*2 };
12 | }
13 |
14 | Vector3D MultiplyByTwo_Ref(const Vector3D& p){
15 | return { p.x*2, p.y*2, p.z*2 };
16 | }
17 |
18 | void MultiplyVector_Value(benchmark::State& state) {
19 | Vector3D in = {1,2,3};
20 | for (auto _ : state) {
21 | Vector3D out = MultiplyByTwo_Value(in);
22 | benchmark::DoNotOptimize(out);
23 | }
24 | }
25 | BENCHMARK(MultiplyVector_Value);
26 |
27 | void MultiplyVector_Ref(benchmark::State& state) {
28 | Vector3D in = {1,2,3};
29 | for (auto _ : state) {
30 | Vector3D out = MultiplyByTwo_Ref(in);
31 | benchmark::DoNotOptimize(out);
32 | }
33 | }
34 | BENCHMARK(MultiplyVector_Ref);
35 | //----------------------------------
36 | size_t GetSpaces_Value(std::string str)
37 | {
38 | size_t spaces = 0;
39 | for(const char c: str){
40 | if( c == ' ') spaces++;
41 | }
42 | return spaces;
43 | }
44 |
45 | size_t GetSpaces_Ref(const std::string& str)
46 | {
47 | size_t spaces = 0;
48 | for(const char c: str){
49 | if( c == ' ') spaces++;
50 | }
51 | return spaces;
52 | }
53 |
54 | const std::string LONG_STR("a long string that can't use Small String Optimization");
55 |
56 | void PassStringByValue(benchmark::State& state) {
57 | for (auto _ : state) {
58 | size_t n = GetSpaces_Value(LONG_STR);
59 | benchmark::DoNotOptimize(n);
60 | }
61 | }
62 | BENCHMARK(PassStringByValue);
63 |
64 | void PassStringByRef(benchmark::State& state) {
65 | for (auto _ : state) {
66 | size_t n = GetSpaces_Ref(LONG_STR);
67 | benchmark::DoNotOptimize(n);
68 | }
69 | }
70 | BENCHMARK(PassStringByRef);
71 |
72 | //----------------------------------
73 |
74 | size_t Sum_Value(std::vector<unsigned> vect)
75 | {
76 | size_t sum = 0;
77 | for(unsigned val: vect) { sum += val; }
78 | return sum;
79 | }
80 |
81 | size_t Sum_Ref(const std::vector<unsigned>& vect)
82 | {
83 | size_t sum = 0;
84 | for(unsigned val: vect) { sum += val; }
85 | return sum;
86 | }
87 |
88 | const std::vector<unsigned> vect_in = { 1, 2, 3, 4, 5 };
89 |
90 | void PassVectorByValue(benchmark::State& state) {
91 | for (auto _ : state) {
92 | size_t n = Sum_Value(vect_in);
93 | benchmark::DoNotOptimize(n);
94 | }
95 | }
96 | BENCHMARK(PassVectorByValue);
97 |
98 | void PassVectorByRef(benchmark::State& state) {
99 | for (auto _ : state) {
100 | size_t n = Sum_Ref(vect_in);
101 | benchmark::DoNotOptimize(n);
102 | }
103 | }
104 | BENCHMARK(PassVectorByRef);
105 |
106 | //----------------------------------
107 |
108 | BENCHMARK_MAIN();
109 |
--------------------------------------------------------------------------------
/cpp/reserve.cpp:
--------------------------------------------------------------------------------
1 | #include <vector>
2 | #include <benchmark/benchmark.h>
3 |
4 |
5 | static void NoReserve(benchmark::State& state) {
6 |
7 | for (auto _ : state) {
8 | std::vector<size_t> v;
9 | for(size_t i=0; i<100; i++){
10 | v.push_back(i);
11 | }
12 | benchmark::DoNotOptimize( v );
13 | }
14 | }
15 | BENCHMARK(NoReserve);
16 |
17 | static void WithReserve(benchmark::State& state) {
18 |
19 | for (auto _ : state) {
20 | std::vector<size_t> v;
21 | v.reserve(100);
22 | for(size_t i=0; i<100; i++){
23 | v.push_back(i);
24 | }
25 | benchmark::DoNotOptimize( v );
26 | }
27 | }
28 | BENCHMARK(WithReserve);
29 |
30 |
31 | static void ObsessiveRecycling(benchmark::State& state) {
32 |
33 | std::vector<size_t> v;
34 | for (auto _ : state) {
35 | v.clear();
36 | for(size_t i=0; i<100; i++){
37 | v.push_back(i);
38 | }
39 | benchmark::DoNotOptimize( v );
40 | }
41 | }
42 | BENCHMARK(ObsessiveRecycling);
43 |
44 | BENCHMARK_MAIN();
45 |
--------------------------------------------------------------------------------
/cpp/small_strings.cpp:
--------------------------------------------------------------------------------
1 | #include <string>
2 | #include <benchmark/benchmark.h>
3 |
4 | static const char* SHORT_STR = "hello world";
5 |
6 | static void ShortStringCreation(benchmark::State& state) {
7 | // Create a string over and over again.
8 | // Since the "small string optimization" is active,
9 | // no memory allocation is performed
10 | for (auto _ : state) {
11 | std::string created_string(SHORT_STR);
12 | benchmark::DoNotOptimize(created_string);
13 | }
14 | }
15 | BENCHMARK(ShortStringCreation);
16 |
17 | static void ShortStringCopy(benchmark::State& state) {
18 |
19 | // Here we create the string only once, but copy it repeatedly.
20 | // Why is it much slower than ShortStringCreation?
21 | // The compiler, apparently, outsmarted me
22 |
23 | std::string x; // create once
24 | for (auto _ : state) {
25 | x = SHORT_STR; // copy
26 | }
27 | }
28 | BENCHMARK(ShortStringCopy);
29 |
30 | static const char* LONG_STR = "this will not fit into small string optimization";
31 |
32 | static void LongStringCreation(benchmark::State& state) {
33 | // The long string will trigger a memory allocation for sure
34 |
35 | for (auto _ : state) {
36 | std::string created_string(LONG_STR);
37 | benchmark::DoNotOptimize(created_string);
38 | }
39 | }
40 | BENCHMARK(LongStringCreation);
41 |
42 | static void LongStringCopy(benchmark::State& state) {
43 | // Now we do see an actual advantage: recycling the same string
44 | // multiple times
45 | std::string x;
46 | for (auto _ : state) {
47 | x = LONG_STR;
48 | }
49 | }
50 | BENCHMARK(LongStringCopy);
51 |
52 |
53 | BENCHMARK_MAIN();
54 |
--------------------------------------------------------------------------------
/cpp/strings_are_vectors.cpp:
--------------------------------------------------------------------------------
1 | #include <string>
2 | #include <benchmark/benchmark.h>
3 |
4 | enum Color{
5 | BLUE,
6 | RED,
7 | YELLOW
8 | };
9 |
10 | std::string ToStringBad(Color c)
11 | {
12 | switch(c)
13 | {
14 | case BLUE: return "BLUE";
15 | case RED: return "RED";
16 | case YELLOW: return "YELLOW";
17 | }
18 | }
19 |
20 | const std::string& ToStringBetter(Color c)
21 | {
22 | static const std::string color_name[3] ={"BLUE", "RED", "YELLOW"};
23 | switch(c)
24 | {
25 | case BLUE: return color_name[0];
26 | case RED: return color_name[1];
27 | case YELLOW: return color_name[2];
28 | }
29 | }
30 |
31 | static void ToStringByValue(benchmark::State& state) {
32 |
33 |
34 | for (auto _ : state) {
35 | std::string x = ToStringBad(BLUE);
36 | benchmark::DoNotOptimize( x );
37 | }
38 | }
39 | BENCHMARK(ToStringByValue);
40 |
41 | static void ToStringByReference(benchmark::State& state) {
42 |
43 |
44 | for (auto _ : state) {
45 | const std::string& x = ToStringBetter(BLUE);
46 | benchmark::DoNotOptimize( x );
47 | }
48 | }
49 | BENCHMARK(ToStringByReference);
50 |
51 | //---------------------------------------------
52 |
53 |
54 | // Create a new string every time (even if return value optimization may help)
55 | static std::string ModifyString(const std::string& input)
56 | {
57 | std::string output = input;
58 | output.append("... indeed");
59 | return output;
60 | }
61 | // Reuse an existing string that MAY already have enough space reserved
62 | // (or maybe not...)
63 | static void ModifyStringBetter(const std::string& input, std::string& output)
64 | {
65 | output = input;
66 | output.append("... indeed");
67 | }
68 |
69 | static const char* LONG_STR = "this will not fit into small string optimization";
70 |
71 |
72 | static void ModifyByValue(benchmark::State& state) {
73 |
74 | std::string in(LONG_STR);
75 | std::string out;
76 | // Memory must be allocated every time
77 | for (auto _ : state) {
78 | out = ModifyString(in);
79 | }
80 | }
81 | BENCHMARK(ModifyByValue);
82 |
83 | static void ModifyByReference(benchmark::State& state) {
84 |
85 | std::string in(LONG_STR);
86 | std::string out;
87 | // ModifyStringBetter should be at least as fast as ModifyString, and
88 | // occasionally faster, because "out" may already have the memory allocated
89 | // from previous calls
90 | for (auto _ : state) {
91 | ModifyStringBetter(in, out);
92 | }
93 | }
94 | BENCHMARK(ModifyByReference);
95 |
96 |
97 | BENCHMARK_MAIN();
98 |
--------------------------------------------------------------------------------
/cpp/strings_concatenation.cpp:
--------------------------------------------------------------------------------
1 | #include <string>
2 | #include <cstring>
3 | #include <benchmark/benchmark.h>
4 | std::string first_str("This is my first string.");
5 | std::string second_str("This is the second string I want to append.");
6 | std::string third_str("This is the third and last string to append.");
7 |
8 | inline size_t StrSize(const char* str) {
9 | return strlen(str);
10 | }
11 |
12 | inline size_t StrSize(const std::string& str) {
13 | return str.size();
14 | }
15 |
16 | template <typename Head, typename... Tail>
17 | size_t StrSize(const Head& head, Tail const&... tail){
18 | return StrSize(head) + StrSize(tail...);
19 | }
20 |
21 | template <typename Head>
22 | void StrAppend(std::string& out, const Head& head){
23 | out += head;
24 | }
25 |
26 | template <typename Head, typename... Args>
27 | void StrAppend(std::string& out, const Head& head, Args const&... args){
28 | out += head;
29 | StrAppend(out, args...);
30 | }
31 |
32 | template <typename... Args> inline
33 | std::string StrCat(Args const&... args){
34 | size_t tot_size = StrSize(args...);
35 | std::string out;
36 | out.reserve(tot_size);
37 |
38 | StrAppend(out, args...);
39 | return out;
40 | }
41 |
42 | static void DefaultConcatenation(benchmark::State& state) {
43 |
44 | for (auto _ : state) {
45 | std::string big_one = first_str + " " + second_str + " " + third_str;
46 | benchmark::DoNotOptimize(big_one);
47 | }
48 | }
49 |
50 | static void ManualConcatenation(benchmark::State& state) {
51 |
52 | for (auto _ : state) {
53 | std::string big_one;
54 | big_one.reserve(first_str.size() +
55 | second_str.size() +
56 | third_str.size() +
57 | strlen(" ")*2 );
58 |
59 | big_one += first_str;
60 | big_one += " ";
61 | big_one += second_str;
62 | big_one += " ";
63 | big_one += third_str;
64 | benchmark::DoNotOptimize(big_one);
65 | }
66 | }
67 |
68 | static void VariadicConcatenation(benchmark::State& state) {
69 |
70 | for (auto _ : state) {
71 | std::string big_one = StrCat(first_str, " ", second_str, " ", third_str);
72 | benchmark::DoNotOptimize(big_one);
73 | }
74 | }
75 |
76 | // Register the function as a benchmark
77 | BENCHMARK(DefaultConcatenation);
78 | BENCHMARK(ManualConcatenation);
79 | BENCHMARK(VariadicConcatenation);
80 |
81 | BENCHMARK_MAIN();
--------------------------------------------------------------------------------
/docs/2d_matrix_iteration.md:
--------------------------------------------------------------------------------
1 | # Iterating over a 2D matrix
2 |
3 | 2D matrices are very common in my domain (robotics).
4 |
5 | We use them to represent images, gridmaps/costmaps, etc.
6 |
7 | Recently, I realized that one of our algorithms dealing with costmaps was quite slow,
8 | and I profiled it using **Hotspot**.
9 |
10 | I realized that one of the bottlenecks was a function that combined two costmaps to create
11 | a third one. The function looked qualitatively like this:
12 |
13 | ```C++
14 | // this is over simplified code, do not argue about it with me
15 | for( size_t y = y_min; y < y_max; y++ )
16 | {
17 | for( size_t x = x_min; x < x_max; x++ )
18 | {
19 | matrix_out( x,y ) = std::max( mat_a( x,y ), mat_b( x,y ) );
20 | }
21 | }
22 | ```
23 |
24 | Pretty straightforward, right? Elegantly written, as it should be.
25 | But since my measurements revealed that it was using too much CPU, I decided to
26 | follow the white rabbit inside the optimization hole.
27 |
28 | ## How do you write a good 2D matrix in C++?
29 |
30 | Have you ever written code like this?
31 |
32 | ```C++
33 | // My computer science professor did it like this
34 | float** matrix = (float**) malloc( rows_count * sizeof(float*) );
35 | for(int r=0; r<rows_count; r++){
36 |     matrix[r] = (float*) malloc( columns_count * sizeof(float) );
37 | }
38 | ```
39 |
40 | Please don't do that!
41 |
42 | Each row requires a separate heap allocation, and the rows end up scattered
43 | around the heap: iterating over such a "matrix" means chasing pointers and
44 | missing the cache all the time.
45 |
46 | A much better alternative is a thin wrapper around a single `std::vector`,
47 | where all the elements are stored in one contiguous block of memory and a
48 | row/column pair is converted into a linear index.
49 |
50 | Something like this:
51 |
52 | ```C++
53 | // simplified implementation
54 |
55 | template <typename T> class Matrix2D
56 | {
57 | public:
58 | Matrix2D(size_t rows, size_t columns): _num_rows(rows)
59 | {
60 | _data.resize( rows * columns );
61 | }
62 |
63 | size_t rows() const
64 | {
65 | return _num_rows;
66 | }
67 |
68 | T& operator()(size_t row, size_t col)
69 | {
70 | size_t index = col*_num_rows + row;
71 | return _data[index];
72 | }
73 |
74 | T& operator[](size_t index)
75 | {
76 | return _data[index];
77 | }
78 |
79 | // all the other methods omitted
80 | private:
81 | std::vector _data;
82 | size_t _num_rows;
83 | };
84 |
85 | // access an element of the matrix like this:
86 | matrix(row, col) = 42;
87 |
88 | ```
89 |
90 | This is the most cache-friendly way to build a matrix, with a single memory allocation and
91 | the data stored in a way known as [column-wise](https://www.geeksforgeeks.org/row-wise-vs-column-wise-traversal-matrix/).
92 |
93 | To convert a row/column pair into an index in the vector, we need a single multiplication and addition.
94 |
95 | ## Back to my problem
96 |
97 | Do you remember the code we wrote at the beginning?
98 |
99 | We have a number of iterations that is equal to `(x_max-x_min)*(y_max-y_min)`.
100 | Often, that is a lot of pixels/cells.
101 |
102 | In each iteration, we calculate the index 3 times using the formula:
103 |
104 | size_t index = col*_num_rows + row;
105 |
106 | Holy moly, that is a lot of multiplications!
107 |
108 | It turned out that it was worth rewriting the code like this:
109 |
110 | ```C++
111 | // calculating the index "by hand"
112 | for(size_t y = y_min; y < y_max; y++)
113 | {
114 | size_t offset_out = y * matrix_out.rows();
115 | size_t offset_a = y * mat_a.rows();
116 | size_t offset_b = y * mat_b.rows();
117 | for(size_t x = x_min; x < x_max; x++)
118 | {
119 | size_t index_out = offset_out + x;
120 | size_t index_a = offset_a + x;
121 | size_t index_b = offset_b + x;
122 | matrix_out[ index_out ] = std::max( mat_a[ index_a ], mat_b[ index_b ] );
123 | }
124 | }
125 | ```
126 |
127 | So, I know what you are thinking, **my eyes are hurting too**. It is ugly. But the performance boost was too much to ignore.
128 |
129 | That is not surprising, considering that the number of multiplications in the inner loop is dramatically reduced, roughly by a factor of `(x_max-x_min)`.
130 |
131 |
132 |
--------------------------------------------------------------------------------
/docs/2d_transforms.md:
--------------------------------------------------------------------------------
1 | # Don't compute it twice
2 |
3 | The example I am going to present will make some of you react like this:
4 |
5 | 
6 |
7 | I say this because it will be absolutely obvious... in retrospect.
8 |
9 | On the other hand, I have seen **this same code** being used in multiple open source projects.
10 |
11 | Projects with hundreds of Github stars missed this (apparently obvious) opportunity for optimization.
12 |
13 | A notable example is: [Speed up improvement for laserOdometry and scanRegister (20%)](https://github.com/laboshinl/loam_velodyne/pull/20)
14 |
15 | ## 2D transforms
16 |
17 | Let's consider this code:
18 |
19 | ```c++
20 | double x1 = x*cos(ang) - y*sin(ang) + tx;
21 | double y1 = x*sin(ang) + y*cos(ang) + ty;
22 | ```
23 |
24 | People with a trained eye (and a little trigonometric background) will immediately recognize the [affine transform of a 2D point], commonly used in computer graphics and robotics.
25 |
26 | Don't you see anything we can do better? Of course:
27 |
28 | ```c++
29 | const double Cos = cos(angle);
30 | const double Sin = sin(angle);
31 | double x1 = x*Cos - y*Sin + tx;
32 | double y1 = x*Sin + y*Cos + ty;
33 | ```
34 |
35 | The cost of trigonometric functions is relatively high and there is absolutely no reason to compute the same value twice.
36 | The latter code is about twice as fast as the former, because the cost of a multiplication and an addition is very low compared with `sin()` and `cos()`.
37 |
38 | In general, if the number of potential angles you need is not extremely high, consider using a look-up table where you store pre-computed values.
39 |
40 | This is the case, for instance, of laser scan data, which needs to be converted from polar coordinates to Cartesian ones.
41 |
42 | 
43 |
44 | A naive implementation would invoke the trigonometric functions for each point (in the order of thousands of points per second).
45 |
46 | ```c++
47 | // Conceptual operation (inefficient)
48 | // Data is usually stored in a vector of distances
49 | std::vector<double> scan_distance; // the input
50 | std::vector<Pos2D> cartesian_points; // the output
51 |
52 | cartesian_points.reserve( scan_distance.size() );
53 |
54 | for(int i=0; i<scan_distance.size(); i++)
55 | {
56 |     const double angle = angle_min + ( angle_increment * i );
57 |     const double dist  = scan_distance[i];
58 |     double x = dist * cos(angle);
59 |     double y = dist * sin(angle);
60 |     cartesian_points.push_back( Pos2D(x, y) );
61 | }
62 | ```
63 |
64 | A better solution: the scan angles never change from one scan to the next,
65 | so we can compute `sin()` and `cos()` for each angle only once, store the
66 | results in two look-up tables, and reuse them for every subsequent scan.
67 |
68 | ```c++
69 | // Conceptual operation (more efficient)
70 | // LUT_size is the (fixed) number of angle increments in a scan
71 |
72 | //------ To do only ONCE ------
73 |
74 | std::vector<double> LUT_cos;
75 | std::vector<double> LUT_sin;
76 |
77 | for(int i=0; i<LUT_size; i++)
78 | {
79 |     const double angle = angle_min + ( angle_increment * i );
80 |     LUT_cos.push_back( cos(angle) );
81 |     LUT_sin.push_back( sin(angle) );
82 | }
83 |
84 | //------ For each scan ------
85 | std::vector<double> scan_distance;
86 | std::vector<Pos2D> cartesian_points;
87 |
88 | cartesian_points.reserve( scan_distance.size() );
89 |
90 | for(int i=0; i<scan_distance.size(); i++)
91 | {
92 |     const double dist = scan_distance[i];
93 |     double x = dist * LUT_cos[i];
94 |     double y = dist * LUT_sin[i];
95 |     cartesian_points.push_back( Pos2D(x, y) );
96 | }
97 | ```
98 |
--------------------------------------------------------------------------------
/docs/boost_flatmap.md:
--------------------------------------------------------------------------------
1 | # I tried `boost::container::flat_map`. You won't imagine what happened next
2 |
25 | Profiling the application with **Hotspot**, I saw that a surprisingly large amount of CPU time was spent inside **std::unordered_map<>::operator[]**.
26 |
27 | 
28 |
29 | There is a big block on the right side, and then many smaller calls here and there in the code on the left.
30 |
31 | The problematic container looks more or less like this:
32 |
33 | ```C++
34 | // simplified code
35 | std::unordered_map<Address, Entry*> m_dictionary;
36 |
37 | //where
38 | struct Address{
39 | int16_t index;
40 | uint8_t subindex;
41 | };
42 | ```
43 |
44 | # The solution
45 |
46 | I inspected the code and I found horrible things. Like this:
47 |
48 | ```C++
49 | // simplified code
50 | bool hasEntry(const Address& address)
51 | {
52 | return m_dictionary.count(address) != 0;
53 | }
54 |
55 | Value getEntry(const Address& address)
56 | {
57 | if( !hasEntry(address) ) {
58 | throw ...
59 | }
60 | Entry* entry = m_dictionary[address];
61 | // rest of the code
62 | }
63 | ```
64 |
65 | 
66 |
67 | The correct way to do it is, instead:
68 |
69 | ```C++
70 | // simplified code. Only one lookup
71 | Value getEntry(const Address& address)
72 | {
73 | auto it = m_dictionary.find(address);
74 | if( it == m_dictionary.end() ) {
75 | throw ...
76 | }
77 | Entry* entry = it->second;
78 | // rest of the code
79 | }
80 | ```
81 |
82 | This change alone cut in half the overhead of `std::unordered_map` observed in the flamegraph.
83 |
84 | ## Embrace caching
85 |
86 | It is very important to note that, once created, the dictionary is never changed at run-time.
87 |
88 | This allows us, as mentioned in the picture, to optimize the large block on the right side using caching.
89 |
90 | In fact, the map lookup is executed inside a callback that has access to the relevant address. But if the **[Address, Entry*]** pair never changes, why don't we directly store the `Entry*`?
91 |
92 | As expected, this completely erased the overhead of that big block, which was using 15% of the CPU.
93 |
94 | If you haven't already, have a look at similar examples of caching [in one of my previous articles about 2D transformations](2d_transforms.md).
95 |
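A minimal sketch of that caching pattern, with hypothetical names (the real code stores **[Address, Entry*]** pairs; here a string-keyed map stands in for it): the callback resolves the `Entry*` once, then reuses the pointer on every later invocation instead of paying for a map lookup each time.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct Entry { int value = 0; };

class Callback {
public:
  Callback(std::unordered_map<std::string, Entry*>& dict, std::string address)
    : dict_(dict), address_(std::move(address)) {}

  // Before: one map lookup on EVERY invocation.
  int runWithLookup() { return dict_.at(address_)->value; }

  // After: the [address, Entry*] pair never changes after startup,
  // so we resolve the pointer once and cache it.
  int runCached() {
    if (cached_ == nullptr) {
      cached_ = dict_.at(address_);  // single lookup, then cached forever
    }
    return cached_->value;
  }

private:
  std::unordered_map<std::string, Entry*>& dict_;
  std::string address_;
  Entry* cached_ = nullptr;
};
```

Note that the cached pointer still observes changes to the `Entry` itself; the only assumption is that the key/pointer association is immutable after startup.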
96 | ## Winning big with `boost::container::flat_map`
97 |
98 | Using caching in the rest of the code would have been a pain. It is the typical example of "death by a thousand papercuts".
99 |
100 | Furthermore, there was a voice in the back of my head telling me that something was off.
101 |
102 | Why does `std::hash` take so long? Is this, maybe, **one of those rare cases** where I should not look at the big O() notation?
103 |
104 | `std::unordered_map` lookup is O(1), that is a Good Thing, isn't it?
105 |
106 | Shall we try some lateral thinking and use another container with O(log n) lookup complexity, but without the cost of the hash function?
107 |
108 | Drum roll, please, and welcome [boost::container::flat_map](https://www.boost.org/doc/libs/1_74_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx).
109 |
110 | I am not going to repeat what the documentation explains about the implementation, which is essentially a sorted vector, similar to what I discuss [at the end of this article about std::map](dont_need_map.md).
111 |
112 | The results surprised me: I haven't just "reduced" the overhead, it was completely gone. The cost of `flat_map<>::operator[]` was barely measurable.
113 |
114 | Basically, just switching to `flat_map` solved the entire problem by changing one line of code!
115 |
116 |
117 |
118 |
119 |
--------------------------------------------------------------------------------
/docs/dont_need_map.md:
--------------------------------------------------------------------------------
1 | # Do you actually need to use std::map?
2 |
3 | `std::map` is one of the best-known data structures in C++, the default associative container for most of us, but its popularity has been decreasing over the years.
4 |
5 | Associative containers are used when you have **pairs of key/value** and you want to find a value given its key.
6 |
7 | But, because of the way the nodes of the [red-black tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) are created, `std::map` is
8 | not much different from a `std::list`: an unwelcome memory allocation at every insertion and a very cache-unfriendly memory layout.
9 |
10 | Before selecting this data structure, ask yourself these questions:
11 |
12 | - do I need all the pairs to be **ordered** by their keys?
13 | - do I need to iterate often through all the items of the container?
14 |
15 | If the answer to the first question is "no", you may want to switch by default to `std::unordered_map`.
16 |
17 | In all my benchmarks, this was always a win. Maybe I was lucky and maybe there are situations in which `std::map` would perform better, but I haven't found those cases yet.
18 |
19 | If you answer "yes" to the second question... this will be interesting.
20 |
21 | Sit on my lap, my child, and join my optimization adventures.
22 |
23 | ## Optimizing the Velodyne driver
24 |
25 | This is a Pull Request I am particularly proud of:
26 |
27 | [Avoid unnecessary computation in RawData::unpack](https://github.com/ros-drivers/velodyne/pull/194)
28 |
29 | To understand how a small change can make a huge difference, think about what this driver is doing.
30 |
31 | 
32 |
33 | The Velodyne is a sensor that measures hundreds of thousands of points per second (distance from obstacles); it is the most important sensor in most autonomous cars.
34 |
35 | The Velodyne driver converts measurements in polar coordinates to 3D cartesian coordinates (the so-called "PointCloud").
36 |
37 | I profiled the Velodyne driver using **Hotspot** and, surprise surprise, I found that something related to `std::map::operator[]` was using a lot of CPU.
38 |
39 | So I explored the code and I found this:
40 |
41 | ```C++
42 | std::map<int, LaserCorrection> laser_corrections;
43 | ```
44 | `LaserCorrection` contains some calibration information that is needed to adjust the measurements.
45 |
46 | The `int` key of the map is a number in the range [0, N-1], where N could be 16, 32 or 64. Maybe one day it will reach 128!
47 |
48 | Furthermore `laser_corrections` was created once (no further insertions)
49 | and used over and over again in a loop like this:
50 |
51 | ```C++
52 | // code simplified
53 | for (int i = 0; i < BLOCKS_PER_PACKET; i++) {
54 | //some code
55 | for (int j = 0; j < NUM_SCANS; j++)
56 | {
57 | int laser_number = // omitted for simplicity
58 | const LaserCorrection &corrections = laser_corrections[laser_number];
59 | // some code
60 | }
61 | }
62 | ```
63 |
64 | 
65 |
66 | Indeed, behind this innocent line of code:
67 |
68 | laser_corrections[laser_number];
69 |
70 | There is a search in a red-black tree!
71 |
72 | Remember: the index is **not** a random number, its value is always between 0 and N-1, where N is very small.
73 |
74 | So, I proposed this change and you can not imagine what happened next:
75 |
76 | ```C++
77 | std::vector<LaserCorrection> laser_corrections;
78 | ```
79 |
80 | 
81 |
82 | Summarizing, there wasn't any need for an associative container, because the position in the vector (its index) works just fine as the key.
83 |
84 | I don't blame in any way the developers of the Velodyne driver, because changes like these make sense only in retrospect: until you profile your application and do some actual measurements, it is hard to find an unnecessary overhead hidden in plain sight.
85 |
86 | When you think that the rest of the function does **a lot** of mathematical operations, you can understand how counter-intuitive it is that the actual bottleneck was a tiny `std::map`.
87 |
88 | ## Going a step further: vector of [key,value] pairs
89 |
90 | This example was quite "extreme", because of its very convenient integer key, a small number between 0 and N.
91 |
92 | Nevertheless, in my code I use often a structure like this, instead of a "real" associative container:
93 |
94 | ```C++
95 | std::vector< std::pair<Key, Value> > my_map;
96 | ```
97 |
98 | This is the best data structure when what you need is to **iterate frequently over all the elements**.
99 |
100 | Most of the time, you cannot beat it!
101 |
102 | > "But Davide, I need to have those elements ordered, that is the reason why I used `std::map`"!
103 |
104 | Well, if you need them ordered... order them!
105 |
106 | ```C++
107 | std::sort( my_map.begin(), my_map.end() ) ;
108 | ```
109 | > "But Davide, sometimes I need to search an element in my map"
110 |
111 | In that case, you can find your element by its key searching in an **ordered** vector with the function
112 | [std::lower_bound](http://www.cplusplus.com/reference/algorithm/lower_bound/).
113 |
114 | The complexity of `lower_bound` / `upper_bound` is **O(log n)**, the same as `std::map`, but iteration through all the elements is much, much faster.
115 |
116 | ## Summarizing
117 |
118 | - Think about the way you want to access your data.
119 | - Ask yourself if you have frequent or infrequent insertion/deletion.
120 | - Do not underestimate the cost of an associative container.
121 | - Use `std::unordered_map` by default... or `std::vector`, of course !
122 |
--------------------------------------------------------------------------------
/docs/img/beautiful.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/beautiful.jpg
--------------------------------------------------------------------------------
/docs/img/boom.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/boom.gif
--------------------------------------------------------------------------------
/docs/img/const_reference.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/const_reference.png
--------------------------------------------------------------------------------
/docs/img/cpp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/cpp.png
--------------------------------------------------------------------------------
/docs/img/davide_yells_at_PCL.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/davide_yells_at_PCL.jpg
--------------------------------------------------------------------------------
/docs/img/feel_bad.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/feel_bad.jpg
--------------------------------------------------------------------------------
/docs/img/fizzbuzz.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/fizzbuzz.jpg
--------------------------------------------------------------------------------
/docs/img/growing_vector.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/growing_vector.png
--------------------------------------------------------------------------------
/docs/img/hotspot_heaptrack.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/hotspot_heaptrack.jpg
--------------------------------------------------------------------------------
/docs/img/inconceivably.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/inconceivably.jpg
--------------------------------------------------------------------------------
/docs/img/laser_scan_matcher.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/laser_scan_matcher.png
--------------------------------------------------------------------------------
/docs/img/linked_list.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/linked_list.png
--------------------------------------------------------------------------------
/docs/img/modifystring.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/modifystring.png
--------------------------------------------------------------------------------
/docs/img/mordor.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/mordor.jpg
--------------------------------------------------------------------------------
/docs/img/motor_profile1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/motor_profile1.png
--------------------------------------------------------------------------------
/docs/img/motor_profile2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/motor_profile2.png
--------------------------------------------------------------------------------
/docs/img/multiply_vector.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/multiply_vector.png
--------------------------------------------------------------------------------
/docs/img/palindrome_benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/palindrome_benchmark.png
--------------------------------------------------------------------------------
/docs/img/pcl.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/pcl.jpg
--------------------------------------------------------------------------------
/docs/img/pcl_fromros.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/pcl_fromros.png
--------------------------------------------------------------------------------
/docs/img/quick-bench.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/quick-bench.png
--------------------------------------------------------------------------------
/docs/img/quote.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/quote.png
--------------------------------------------------------------------------------
/docs/img/really.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/really.jpg
--------------------------------------------------------------------------------
/docs/img/realsense.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/realsense.png
--------------------------------------------------------------------------------
/docs/img/relax_sso.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/relax_sso.jpg
--------------------------------------------------------------------------------
/docs/img/spider_senses.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/spider_senses.png
--------------------------------------------------------------------------------
/docs/img/sso_in_action.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/sso_in_action.png
--------------------------------------------------------------------------------
/docs/img/string_concatenation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/string_concatenation.png
--------------------------------------------------------------------------------
/docs/img/that_is_logn.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/that_is_logn.jpg
--------------------------------------------------------------------------------
/docs/img/think_about_it.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/think_about_it.jpg
--------------------------------------------------------------------------------
/docs/img/tostring.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/tostring.png
--------------------------------------------------------------------------------
/docs/img/twitter_unordered.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/twitter_unordered.png
--------------------------------------------------------------------------------
/docs/img/two_lookups.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/two_lookups.jpg
--------------------------------------------------------------------------------
/docs/img/vector_reserve.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/vector_reserve.png
--------------------------------------------------------------------------------
/docs/img/velodyne.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/velodyne.png
--------------------------------------------------------------------------------
/docs/img/why_copy.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/facontidavide/CPP_Optimizations_Diary/78054adffdc77ae3f2b8a4424a436aeb7195d4c7/docs/img/why_copy.jpg
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # CPP Optimizations Diary
2 |
3 | Optimizing code in C++ is something that no one can resist. You can have fun
4 | and pretend that you are doing something useful for your organization at the same time!
5 |
6 | In this repository, I will record some simple design patterns to improve your code
7 | and remove unnecessary overhead in **C++**.
8 |
9 | If you are a seasoned C++ expert, you probably have already a set of rules in your head
10 | that you always follow.
11 |
12 | These rules help you look like a bad-ass/rockstar/10X engineer to your colleagues.
13 |
14 | You are the kind of person that casually drops a [std::vector<>::reserve](reserve.md) before a loop and
15 | nods, smiling, looking at the performance improvement and the astonishment of your team member.
16 |
17 |
18 | 
19 |
20 | Hopefully, the examples in this repository will help you achieve this status of guru
21 | and, as a side effect, save the planet from global warming, sparing useless CPU
22 | cycles from being wasted.
23 |
24 | Then, of course, someone on the other side of the planet will start mining Bitcoins or write her/his
25 | application in **Python** and all your effort to save electricity was for nothing.
26 |
27 | I am kidding, Python developers, we love you!
28 |
29 | > Narrator: "he was not kidding..."
30 |
31 | ## Rule 1: measure first (using _good_ tools)
32 |
33 | The very first thing any person concerned about performance should do is:
34 |
35 | - **Measure first** and **make hypothesis later**.
36 |
37 | My colleagues and I are almost always wrong about the reasons a piece of code
38 | is slow.
39 |
40 | Sometimes we are right, but it is really hard to know in advance how refactoring will
41 | improve performance. Good profiling tools show in minutes the "low hanging fruits": minimum work, maximum benefit!
42 |
43 | Summarizing: 10 minutes of profiling can save you hours of guessing and refactoring.
44 |
45 | My "goto" tools in Linux are [Hotspot](https://github.com/KDAB/hotspot) and
46 | [Heaptrack](https://github.com/KDE/heaptrack). I understand Windows has similar
47 | tools too.
48 |
49 | 
50 |
51 | In the benchmark war, if you are the soldier, these are your rifle and hand grenades.
52 |
53 | Once you know which part of the code deserves to be optimized, you might want to use
54 | [Google Benchmark](https://github.com/google/benchmark) to measure the time spent in a very specific
55 | class or function.
56 |
57 | You can even run Google Benchmark online here: [quick-bench.com](http://quick-bench.com/G7B2w0xPUWgOVvuzI7unES6cU4w).
58 |
59 | 
60 |
61 | ## Rule 2: learn good design patterns, use them by default
62 |
63 | Writing good code is like brushing your teeth: you should do it without thinking too much about it.
64 |
65 | It is a muscle that you need to train, that will become stronger over time. But don't worry:
66 | once you start, you will start seeing recurring patterns that
67 | are surprisingly simple and work in many different use cases.
68 |
69 | **Spoiler alert**: one of my most beloved tricks is to _minimize the number of heap allocations_.
70 | You have no idea how much that helps.
71 |
72 | But let's make something absolutely clear:
73 |
74 | - Your **first goal** as a developer (software engineer?) is to create code that is **correct** and fulfil the requirements.
75 | - The **second** most important thing is to make your code **maintainable and readable** for other people.
76 | - In many cases, you also want to make code faster, because [faster code is better code](https://craigmod.com/essays/fast_software/).
77 |
78 | In other words, think twice before making any change that leaves your code less readable or harder to debug,
79 | just because you believe it may run 2.5% faster.
80 |
81 | ## Optimization examples
82 |
83 | ### "If you pass that by value one more time..."
84 |
85 | - [Use Const reference by default](prefer_references.md).
86 |
87 | - Move semantic (TODO).
88 |
89 | - Return value optimization (TODO).
90 |
91 |
92 | ### std::vector<> is your best friend
93 |
94 |
95 | - [Use std::vector<>::reserve by default](reserve.md)
96 |
97 | - ["I have learnt linked-lists at university, should I use them?" Nooope](no_lists.md).
98 |
99 | - [You don't need a `std::map<>` for that](dont_need_map.md).
100 |
101 | - [Small vector optimization](small_vectors.md)
102 |
103 |
104 | ### "It is just a string, how bad could that be?"
105 |
106 | - [Strings are (almost) vectors](strings_are_vectors.md)
107 |
108 | - [When not to worry: small string optimization](small_strings.md).
109 |
110 | - [String concatenation: the false sense of security of `operator+`](strings_concatenation.md).
111 |
112 | - `std::string_view`: love at first sight (TODO).
113 |
114 | ### Don't compute things twice.
115 |
116 | - [Example: 2D/3D transforms the smart way.](2d_transforms.md )
117 |
118 | - [Iterating over a 2D matrix: less elegant, more performant](2d_matrix_iteration.md).
119 |
120 | ### Fantastic data structures and where to find them.
121 |
122 | - [I tried `boost::container::flat_map`. You won't imagine what happened next](boost_flatmap.md).
123 |
124 | ### Case studies
125 |
126 | - [Simpler and faster way to filter Point Clouds in PCL.](pcl_filter.md)
127 |
128 | - [More PCL optimizations: conversion from ROS messages](pcl_fromROS.md)
129 |
130 | - [Fast Palindrome: the cost of conditional branches](palindrome.md)
131 |
--------------------------------------------------------------------------------
/docs/no_lists.md:
--------------------------------------------------------------------------------
1 | # If you are using std::list<>, you are doing it wrong
2 |
3 |
4 | 
5 |
6 | I am not going to waste time here repeating benchmarks that a lot of people have already done.
7 |
8 | - [std::vector vs std::list benchmark](https://baptiste-wicht.com/posts/2012/11/cpp-benchmark-vector-vs-list.html)
9 |
10 | - [Are lists evil? Bjarne Stroustrup](https://isocpp.org/blog/2014/06/stroustrup-lists)
11 |
12 | - [Video from Bjarne Stroustrup keynote](https://www.youtube.com/watch?v=YQs6IC-vgmo)
13 |
14 | You think your case is special, a unique snowflake. **It is not**.
15 |
16 | There is another STL data structure that is better than `std::list`
17 | almost 99% of the time: [std::deque<>](https://es.cppreference.com/w/cpp/container/deque).
18 |
19 | In some cases, even the humble `std::vector` is better than a list.
20 |
21 | If you like very exotic alternatives have a look at [plf::colony](https://plflib.org/colony.htm).
22 |
23 | But seriously, just use `vector` or `deque`.
24 |
25 | ## Real world example: improving the Intel RealSense driver
26 |
27 | This is a practical example of a Pull Request I sent to the [RealSense](https://github.com/IntelRealSense)
28 | repository a while ago.
29 |
30 | 
31 |
32 | They were using that abomination called `std::list<>` for a reason that I cannot understand.
33 |
34 | Just kidding, Intel Developers, we love you!
35 |
36 | Here you can find the link to the Pull Request:
37 | - [Considerable CPU saving in BaseRealSenseNode::publishPointCloud()](https://github.com/IntelRealSense/realsense-ros/pull/1097)
38 |
39 | In a nutshell, the whole PR contains only two tiny changes:
40 |
41 | ```C++
42 | // We changed this list, created at each camera frame
43 | std::list<unsigned int> valid_indices;
44 |
45 | // With this vector: a class member, that is cleared before reusing
46 | // (but allocated memory is still there)
47 | std::vector<unsigned int> _valid_indices;
48 | ```
49 |
50 | Additionally, a quite large object called `sensor_msgs::PointCloud2 msg_pointcloud`
51 | was converted into a class member that is reused over and over again at each frame.
52 |
53 | The reported speed improvement is 20%-30%, which is huge, if you think about it.
54 |
55 |
56 |
--------------------------------------------------------------------------------
/docs/palindrome.md:
--------------------------------------------------------------------------------
1 | # Having fun with Palindrome words
2 |
3 | This article is not as interesting and reusable as the others, but I think it is still a nice example of how we can speed up code by reducing the number of branches.
4 |
5 | The `if` statement is very fast and usually we should not worry about its runtime cost, but there are a few cases in which avoiding it can provide a visible improvement.
6 |
7 | ## Coding interviews...
8 |
9 | At the time of writing this article, I find myself in the wonderful world of job searching. You probably know what that implies: coding interviews!
10 |
11 | I am fine with them, but the other day a person interviewing me asked the following question:
12 |
13 | > Can you write a function to find if a string is a [palindrome](https://en.wikipedia.org/wiki/Palindrome)?
14 |
15 | And I was thinking...
16 |
17 | 
18 |
19 | To be fair, I am sure he is a great guy and he was just breaking the ice. He certainly had the best intentions, but I think it could have been more productive for both of us to look at some real-world code I wrote in production instead!
20 |
21 | Nevertheless, this is the answer:
22 |
23 | ```C++
24 |
25 | bool IsPalindrome(const std::string& str)
26 | {
27 | const size_t N = str.size();
28 | const size_t N_half = N / 2;
29 |     for(size_t i=0; i < N_half; i++)
30 |     {
31 |         if( str[i] != str[N-1-i] )
32 |         {
33 |             return false;
34 |         }
35 |     }
36 |     return true;
37 | }
38 | ```
39 |
62 | ```C++
63 |
64 | inline bool IsPalindromeWord(const std::string& str)
65 | {
66 | const size_t N = str.size();
67 | const size_t N_half = (N/2);
68 | const size_t S = sizeof(uint32_t);
69 | // number of words of size S in N_half
70 | const size_t N_words = (N_half / S);
71 |
72 | // example: if N = 18, half string is 9 bytes and
73 | // we need to compare 2 pairs of words and 1 pair of chars
74 |
75 | size_t index = 0;
76 |
77 |     for(size_t i=0; i < N_words; i++)
78 |     {
79 |         uint32_t word_left, word_right;
80 |         memcpy(&word_left, &str[index], S);
81 |         memcpy(&word_right, &str[N - index - S], S);
82 |         // Swap() reverses the byte order of the word
83 |         if( word_left != Swap(word_right) )
84 |         {
85 |             return false;
86 |         }
87 |         index += S;
88 |     }
89 |     // compare the remaining chars one by one
90 |     for(size_t i=index; i < N_half; i++)
91 |     {
92 |         if( str[i] != str[N-1-i] )
93 |         {
94 |             return false;
95 |         }
96 |     }
97 |     return true;
98 | }
99 | ```
--------------------------------------------------------------------------------
/docs/pcl_filter.md:
--------------------------------------------------------------------------------
27 | auto range_cond = std::make_shared<ConditionAnd<PointXYZ>> ();
28 | range_cond->addComparison (
29 |     std::make_shared<FieldComparison<PointXYZ>>("z", ComparisonOps::GT, 0.0));
30 | range_cond->addComparison (
31 |     std::make_shared<FieldComparison<PointXYZ>>("z", ComparisonOps::LT, 1.0));
32 |
33 | // build the filter
34 | ConditionalRemoval condition_removal;
35 | condition_removal.setCondition (range_cond);
36 | condition_removal.setInputCloud (input_cloud);
37 | // apply filter
38 | condition_removal.filter (*cloud_filtered);
39 | ```
40 |
41 | > Basically, we create the condition which a given point must satisfy for it to remain in our PointCloud.
42 | > In this example, we add two comparisons to the condition: greater than (GT) 0.0 and less than (LT) 1.0.
43 | > This condition is then used to build the filter.
44 |
45 | Let me rephrase it for people that are not familiar with PCL:
46 |
47 | - An object is created that says: "the Z value of the point must be greater than 0.0".
48 | - Another object is created saying: "the Z value of the point must be less than 1.0".
49 | - They are both added to a `ConditionAnd`.
50 | - We tell `ConditionalRemoval` to use this combined condition.
51 | - Apply the filter to an input cloud to create a filtered one.
52 |
53 | Now think that the usual Point Cloud has a number of points in the order of
54 | **tens of thousands** or more.
55 |
56 | Think about it:
57 |
58 | 
59 |
60 | Seriously, take a few minutes to think about how, given a **vector** of points that looks like this:
61 |
62 | ```C++
63 | // oversimplified, not the actual implementation
64 | struct PointXYZ{
65 | float x;
66 | float y;
67 | float z;
68 | };
69 | ```
70 |
71 | You want to create another point cloud with all the points that pass this condition:
72 |
73 | 0.0 < point.z < 1.0
74 |
75 | I mean, if you ask **me**, this is what I would do, because I am not that smart:
76 |
77 | ```C++
78 | auto cloud_filtered = std::make_shared<PointCloud<PointXYZ>>();
79 |
80 | for (const auto& point: input_cloud->points)
81 | {
82 | if( point.z > 0.0 && point.z < 1.0 )
83 | {
84 | cloud_filtered->push_back( point );
85 | }
86 | }
87 | ```
88 | This is what we will call the **"naive filter"**.
89 |
90 | Before showing you a benchmark that will knock your socks off (don't worry, I will),
91 | I must admit that it is an **unfair comparison** because the `pcl` filters
92 | do many more checks when processing your data to prevent weird corner cases.
93 |
94 |
95 | But do not forget that we expressed our conditions like this:
96 |
97 | ```C++
98 | pcl::FieldComparison<PointXYZ>("z", pcl::ComparisonOps::GT, 0.0)
99 | pcl::FieldComparison<PointXYZ>("z", pcl::ComparisonOps::LT, 1.0)
100 | ```
101 |
102 | If you think about it, there **must be** some kind of parser "somewhere".
103 |
104 | The easiest implementation of a parser is of course a `switch`
105 | statement, but no one would ever do that for **each** of these trillion points...
106 |
107 | [Oh, snap!](https://github.com/PointCloudLibrary/pcl/blob/pcl-1.11.0/filters/include/pcl/filters/impl/conditional_removal.hpp#L98-L127)
108 |
109 | Indeed, two `switch` statements are called for each point of the cloud.
110 |
111 | Summarizing: the very fact that these functions try to be "too clever"
112 | with these "composable rules" means that the implementation is **inherently slow**.
113 |
115 | There is nothing we can do to save them. Nevertheless, we can replace them ;)
115 |
116 | ## Davide, give me speed AND expressive code.
117 |
118 | Sure thing, my friend!
119 |
120 | Since `pcl::FieldComparison` is intrinsically broken (so are all the other Conditions in the library),
121 | because of their `switch` statements, let me write my own `pcl::Condition` (it must be derived from `pcl::ConditionBase`) like this:
122 |
123 | ```C++
124 | template <typename PointT>
125 | class GenericCondition : public pcl::ConditionBase<PointT>
126 | {
127 | public:
128 |   typedef std::shared_ptr<GenericCondition<PointT>> Ptr;
129 |   typedef std::shared_ptr<const GenericCondition<PointT>> ConstPtr;
130 |   typedef std::function<bool(const PointT&)> FunctorT;
131 |
132 |   GenericCondition(FunctorT evaluator):
133 |     pcl::ConditionBase<PointT>(), _evaluator(evaluator)
134 | {}
135 |
136 | virtual bool evaluate (const PointT &point) const {
137 | // just delegate ALL the work to the injected std::function
138 | return _evaluator(point);
139 | }
140 | private:
141 | FunctorT _evaluator;
142 | };
143 | ```
144 |
145 | That is literally **all** the code you need, no omissions.
146 |
147 | I am simply wrapping a `std::function` inside `pcl::ConditionBase`. Nothing else.
148 |
149 | This is the **old** code:
150 |
151 |
152 | ```C++
153 | auto range_cond = std::make_shared<pcl::ConditionAnd<pcl::PointXYZ>>();
154 | range_cond->addComparison(
155 |     std::make_shared<pcl::FieldComparison<pcl::PointXYZ>>("z", pcl::ComparisonOps::GT, 0.0));
156 | range_cond->addComparison(
157 |     std::make_shared<pcl::FieldComparison<pcl::PointXYZ>>("z", pcl::ComparisonOps::LT, 1.0));
158 | ```
159 |
160 | And this is the **new** one, where my condition is expressed in plain old code:
161 |
162 | ```C++
163 | auto range_cond = std::make_shared<GenericCondition<pcl::PointXYZ>>(
164 |   [](const pcl::PointXYZ& point){
165 | return point.z > 0.0 && point.z < 1.0;
166 | });
167 | ```
168 |
169 | The rest of the code is unchanged!!!
170 |
171 | 
172 |
173 | ## Let's talk about speed
174 |
175 | You may find the code to replicate my tests [here](https://github.com/facontidavide/CPP_Optimizations_Diary/tree/master/cpp/pcl_conditional_removal.cpp).
176 |
177 | These are the benchmarks based on my sample cloud and 4 filters (upper and lower bound in X and Y):
178 |
179 | ```
180 | -------------------------------------------------------------
181 | Benchmark Time CPU Iterations
182 | -------------------------------------------------------------
183 | PCL_Filter 1403083 ns 1403084 ns 498
184 | Naive_Filter 107418 ns 107417 ns 6586
185 | PCL_Filter_Generic 668223 ns 668191 ns 1069
186 | ```
187 | Your results may change a lot depending on the number of conditions and the size of the point cloud.
188 |
189 | But the lessons to learn are:
190 |
191 | - The "naive" filter might be an option in many cases and it is blazing fast.
192 | - The "safe" `pcl::ConditionalRemoval` can still be used if you just ditch the builtin `pcl::Conditions` and use instead the much more concise and readable `GenericCondition`.
193 |
194 |
--------------------------------------------------------------------------------
/docs/pcl_fromROS.md:
--------------------------------------------------------------------------------
1 | # Case study: convert ROS message to PCL
2 |
3 | [Point Cloud Library (PCL)](https://pointclouds.org/)
4 | seems to be a cornucopia of opportunities for optimizations.
5 |
6 | Even if this sounds like gratuitous criticism, let's remember what
7 | **Bjarne Stroustrup** said:
8 |
9 | > "There are only two kinds of languages: the ones people complain about and the ones nobody uses"
10 |
11 | So, let's keep in mind that PCL has a **huge** role in
12 | allowing everyone to process pointclouds easily.
13 | The developers and maintainers deserve all my respect for that!
14 |
15 | That said, let's jump into my next rant.
16 |
17 | 
18 |
19 |
20 | ## Using pcl::fromROSMsg()
21 |
22 | If you use PCL in ROS, the following code is your bread and butter:
23 |
24 | ```c++
25 | void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& msg)
26 | {
27 | pcl::PointCloud<pcl::PointXYZ> cloud;
28 | pcl::fromROSMsg(*msg, cloud);
29 |
30 | //...
31 | }
32 | ```
33 |
34 | Now, I cannot count the number of people complaining that
35 | this conversion alone uses a lot of CPU!
36 |
37 | I looked at its implementation and at the results of Hotspot
38 | (perf profiling), and a problem became immediately apparent:
39 |
40 | ```c++
41 | template <typename T>
42 | void fromROSMsg(const sensor_msgs::msg::PointCloud2 &cloud,
43 |                 pcl::PointCloud<T> &pcl_cloud)
44 | {
45 | pcl::PCLPointCloud2 pcl_pc2;
46 | pcl_conversions::toPCL(cloud, pcl_pc2);
47 | pcl::fromPCLPointCloud2(pcl_pc2, pcl_cloud);
48 | }
49 | ```
50 |
51 | We are transforming/copying the data twice:
52 |
53 | - first, we convert from `sensor_msgs::msg::PointCloud2` to
54 | `pcl::PCLPointCloud2`
55 | - then, from `pcl::PCLPointCloud2` to `pcl::PointCloud<T>`.
56 |
57 | Digging into the implementation of `pcl_conversions::toPCL`, I found this:
58 |
59 | ```c++
60 | void toPCL(const sensor_msgs::msg::PointCloud2 &pc2,
61 | pcl::PCLPointCloud2 &pcl_pc2)
62 | {
63 | copyPointCloud2MetaData(pc2, pcl_pc2);
64 | pcl_pc2.data = pc2.data;
65 | }
66 | ```
67 |
68 | Copying that raw data from one type to the other is an overhead that can be easily avoided
69 | with some refactoring.
70 |
71 | This refactoring is not particularly interesting, because I basically "copied and pasted"
72 | the code of `pcl::fromPCLPointCloud2` to use a different input type.
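The idea can be illustrated with a toy, std-only sketch (this is **not** the real `pcl_conversions` code): deserialize the raw message buffer straight into the typed point vector, instead of copying it into an intermediate container first. It assumes a dense cloud whose layout is exactly three packed floats; the real implementation must honor `point_step`, `row_step` and the field list of the message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified point type, matching the oversimplified PointXYZ above.
struct PointXYZ { float x, y, z; };

// One copy of the payload, no intermediate container.
// ASSUMPTION: the buffer holds tightly packed x,y,z floats, nothing else.
std::vector<PointXYZ> FromRawBuffer(const std::vector<uint8_t>& data)
{
  std::vector<PointXYZ> cloud(data.size() / sizeof(PointXYZ));
  std::memcpy(cloud.data(), data.data(), cloud.size() * sizeof(PointXYZ));
  return cloud;
}
```

The refactored `fromROSMsg` does essentially this, reading directly from `msg.data` instead of going through `pcl::PCLPointCloud2` first.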
73 |
74 | Fast-forwarding to the solution, let's have a look at the results:
75 |
76 | 
77 |
78 | ## What is the takeaway of this story?
79 |
80 | **Measure, measure, measure**! Don't assume that the "smart people" implemented the best solution and that you can't actively do anything about it.
81 |
82 | In this case, code clarity and reusing existing functions were preferred over performance.
83 |
84 | But the impact of this decision is definitely too large to be ignored.
85 |
86 |
87 |
88 |
89 |
--------------------------------------------------------------------------------
/docs/prefer_references.md:
--------------------------------------------------------------------------------
1 | # Value semantic vs references
2 |
3 | What I am going to say here is so trivial that probably any seasoned developer
4 | knows it already.
5 |
6 | Nevertheless, I keep seeing people doing stuff like this:
7 |
8 | ```C++
9 | bool OpenFile(std::string filename);
10 |
11 | void DrawPath(std::vector<Point> path);
12 |
13 | Pose DetectFace(Image image);
14 |
15 | Matrix3D Rotate(Matrix3D mat, AxisAngle axis_angle);
16 |
17 | ```
18 |
19 | I made these functions up, but **I do see** code like this in production sometimes.
20 |
21 | What do these functions have in common? You are passing the argument **by value**.
22 |
23 | In other words, whenever you call one of these functions, a copy of the input object is created,
24 | and the function operates on that **copy**.
25 |
26 | 
27 |
28 | Copies may or may not be an expensive operation, depending on the size of the object and on whether
29 | it requires dynamic heap memory allocation.
30 |
31 | In these examples, the objects that probably have a negligible overhead,
32 | when passed by value, are `Matrix3D` and `AxisAngle`, because we may assume that they don't require
33 | heap allocations.
34 |
35 | But, even if the overhead is small, is there any reason to waste CPU cycles, if we can avoid it?
36 |
37 | This is a better API:
38 |
39 |
40 | ```C++
41 | bool OpenFile(const std::string& filename); // string_view is even better
42 |
43 | void DrawPath(const std::vector<Point>& path);
44 |
45 | Pose DetectFace(const Image& image);
46 |
47 | Matrix3D Rotate(const Matrix3D& mat, const AxisAngle& axis_angle);
48 |
49 | ```
50 |
51 | In the latter version, we are using what is called **"reference semantics"**.
52 |
53 | You may use **C-style** (non-owning) pointers instead of references and get the same performance
54 | benefits, but here we are telling the compiler that the arguments are:
55 |
56 | - Constant: we won't change them inside the called function.
57 | - References: the argument *refers* to an existing object, while a raw pointer might be `nullptr`.
58 | - Not pointers: we are sure that we are not transferring the ownership of the object.
59 |
60 | The cost can be dramatically different, as you may see here:
61 |
62 | ```C++
63 | size_t GetSpaces_Value(std::string str)
64 | {
65 | size_t spaces = 0;
66 | for(const char c: str){
67 | if( c == ' ') spaces++;
68 | }
69 | return spaces;
70 | }
71 |
72 | size_t GetSpaces_Ref(const std::string& str)
73 | {
74 | size_t spaces = 0;
75 | for(const char c: str){
76 | if( c == ' ') spaces++;
77 | }
78 | return spaces;
79 | }
80 |
81 | const std::string LONG_STR("a long string that can't use Small String Optimization");
82 |
83 | void PassStringByValue(benchmark::State& state) {
84 | for (auto _ : state) {
85 | size_t n = GetSpaces_Value(LONG_STR);
86 | }
87 | }
88 |
89 | void PassStringByRef(benchmark::State& state) {
90 | for (auto _ : state) {
91 | size_t n = GetSpaces_Ref(LONG_STR);
92 | }
93 | }
94 |
95 | //----------------------------------
96 | size_t Sum_Value(std::vector<unsigned> vect)
97 | {
98 | size_t sum = 0;
99 | for(unsigned val: vect) { sum += val; }
100 | return sum;
101 | }
102 |
103 | size_t Sum_Ref(const std::vector<unsigned>& vect)
104 | {
105 | size_t sum = 0;
106 | for(unsigned val: vect) { sum += val; }
107 | return sum;
108 | }
109 |
110 | const std::vector<unsigned> vect_in = { 1, 2, 3, 4, 5 };
111 |
112 | void PassVectorByValue(benchmark::State& state) {
113 | for (auto _ : state) {
114 | size_t n = Sum_Value(vect_in);
115 | }
116 | }
117 |
118 | void PassVectorByRef(benchmark::State& state) {
119 | for (auto _ : state) {
120 | size_t n = Sum_Ref(vect_in);
121 | benchmark::DoNotOptimize(n);
122 | }
123 | }
124 |
125 | ```
126 |
127 | 
128 |
129 |
130 | Clearly, passing by reference wins hands down.
131 |
132 | ## Exceptions to the rule
133 |
134 | > "That is cool Davide, I will use `const&` everywhere".
135 |
136 | Let's have a look at another example, first.
137 |
138 | ```C++
139 | struct Vector3D{
140 | double x;
141 | double y;
142 | double z;
143 | };
144 |
145 | Vector3D MultiplyByTwo_Value(Vector3D p){
146 | return { p.x*2, p.y*2, p.z*2 };
147 | }
148 |
149 | Vector3D MultiplyByTwo_Ref(const Vector3D& p){
150 | return { p.x*2, p.y*2, p.z*2 };
151 | }
152 |
153 | void MultiplyVector_Value(benchmark::State& state) {
154 | Vector3D in = {1,2,3};
155 | for (auto _ : state) {
156 | Vector3D out = MultiplyByTwo_Value(in);
157 | }
158 | }
159 |
160 | void MultiplyVector_Ref(benchmark::State& state) {
161 | Vector3D in = {1,2,3};
162 | for (auto _ : state) {
163 | Vector3D out = MultiplyByTwo_Ref(in);
164 | }
165 | }
166 | ```
167 |
168 | 
169 |
170 |
171 | Interesting! Using `const&` has no benefit at all, this time.
172 |
173 | When you copy an object that doesn't require heap allocation and is smaller than a few dozen bytes,
174 | you won't notice any benefit from passing it by reference.
175 |
176 | On the other hand, it will never be slower, so if you are in doubt, using `const&` is always a "safe bet". Passing primitive types by const reference can be shown to generate an extra instruction (see https://godbolt.org/z/-rusab), but that gets optimized out when compiling with `-O3`.
177 |
178 | My rule of thumb is: never pass by reference any argument with size 8 bytes or less (int, double, char, long, etc.).
179 |
180 | Since we know for sure that there is 0% benefit, writing something like this **makes no sense** and is "ugly":
181 |
182 | ```C++
183 | void YouAreTryingTooHardDude(const int& a, const double& b);
184 | ```
185 |
--------------------------------------------------------------------------------
/docs/reserve.md:
--------------------------------------------------------------------------------
1 | # Vectors are awesome...
2 |
3 | `std::vector<>`s have a huge advantage when compared to other data structures:
4 | their elements are packed in memory one next to the other.
5 |
6 | We might have a long discussion about how this may affect performance, based on how memory
7 | works in modern processors.
8 |
9 | If you want to know more about it, just Google "C++ cache aware programming". For instance:
10 |
11 | - [CPU Caches and why you Care](https://www.aristeia.com/TalkNotes/codedive-CPUCachesHandouts.pdf)
12 | - [Writing cache friendly C++ (video)](https://www.youtube.com/watch?v=Nz9SiF0QVKY)
13 |
14 | Iterating through all the elements of a vector is very fast and they work really really well when we have to
15 | append or remove an element from the back of the structure.
16 |
17 | # ... when you use `reserve`
18 |
19 | We need to understand how vectors work under the hood.
20 | When you push an element into a vector that is empty or full (size equal to capacity), we need to:
21 |
22 | - allocate a new, larger block of memory;
23 | - move all the elements already stored in the previous block into the new one.
24 |
25 | Both these operations are expensive and we want to avoid them when we can;
26 | sometimes, though, we just accept things the way they are.
27 |
28 | The size of the new block is typically **2X the capacity** (the exact growth factor is implementation-defined). Therefore, if you have
29 | a vector where both `size()` and `capacity()` are 100 elements and you `push_back()` the 101st element,
30 | the block of memory (and the capacity) will jump to 200.
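You can watch this happen directly. The sketch below counts how many times a vector reallocates while growing element by element: every time `capacity()` changes, a new block was allocated and the existing elements were moved over (the growth factor is implementation-defined: libstdc++ doubles, MSVC grows by roughly 1.5x).

```cpp
#include <cstddef>
#include <vector>

// Count the reallocations performed while pushing n_elements one by one.
// Optionally reserve some capacity up front.
size_t CountReallocations(size_t n_elements, size_t reserved = 0)
{
  std::vector<size_t> v;
  v.reserve(reserved);
  size_t reallocations = 0;
  size_t last_capacity = v.capacity();
  for (size_t i = 0; i < n_elements; i++) {
    v.push_back(i);
    if (v.capacity() != last_capacity) {  // a new block was allocated
      reallocations++;
      last_capacity = v.capacity();
    }
  }
  return reallocations;
}
```

Growing to 100 elements triggers several reallocations; reserving 100 up front brings the count down to zero.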
31 |
32 | To prevent these allocations, that may happen multiple times, we can **reserve** the capacity that
33 | we know (or believe) the vector needs.
34 |
35 | Let's have a look at a micro-benchmark.
36 |
37 | ```C++
38 | static void NoReserve(benchmark::State& state)
39 | {
40 | for (auto _ : state) {
41 | // create a vector and add 100 elements
42 | std::vector<size_t> v;
43 | for(size_t i=0; i<100; i++){ v.push_back(i); }
44 | }
45 | }
46 |
47 | static void WithReserve(benchmark::State& state)
48 | {
49 | for (auto _ : state) {
50 | // create a vector and add 100 elements, but reserve first
51 | std::vector<size_t> v;
52 | v.reserve(100);
53 | for(size_t i=0; i<100; i++){ v.push_back(i); }
54 | }
55 | }
56 |
57 |
58 | static void ObsessiveRecycling(benchmark::State& state) {
59 | // create the vector only once
60 | std::vector<size_t> v;
61 | for (auto _ : state) {
62 | // clear it. Capacity is still 100+ from previous run
63 | v.clear();
64 | for(size_t i=0; i<100; i++){ v.push_back(i); }
65 | }
66 | }
67 | ```
68 |
69 | 
70 |
71 | Look at the difference! And these are only 100 elements.
72 |
73 | The number of elements influences the final performance gain a lot, but one thing is sure: it **will** be faster.
74 |
75 | Note also that `ObsessiveRecycling` brings a performance gain that is probably visible for small vectors, but negligible with bigger ones.
76 |
77 | Don't get me wrong, though: `ObsessiveRecycling` will always be faster, even if, depending on the size of the object you are storing,
78 | you may or may not notice the difference.
79 |
80 |
81 | ## Recognizing a vector at first sight
82 |
83 | This is the amount of memory an application of mine was using over time (image obtained with **Heaptrack**):
84 |
85 | 
86 |
87 | Look at that! Something is doubling the amount of memory it uses every few seconds...
88 |
89 | I wonder what it could be? A vector, of course, because other data structures would have a more "linear" growth.
90 |
91 | That, by the way, **is a bug in the code that was found thanks to memory profiling**: that vector was not supposed to grow at all.
92 |
93 |
94 |
95 |
--------------------------------------------------------------------------------
/docs/small_strings.md:
--------------------------------------------------------------------------------
1 | # Small String Optimizations
2 |
3 | Remember when I said that "strings are `std::vector<char>` in disguise"?
4 |
5 | In practice, very smart folks realized that you can store
6 | small strings directly inside the string object, instead of allocating memory for them.
7 |
8 | Given that the size of a `std::string` is **24 bytes** on a 64-bit
9 | platform (to store the data pointer, the size and the capacity), some
10 | very cool tricks allow us to store **statically** up to 23 bytes
11 | before you need to allocate memory.
12 |
13 | That has a huge impact in terms of performance!
14 |
15 | 
16 |
17 | For the curious minds, here are some details about the implementation:
18 |
19 | - [SSO-23](https://github.com/elliotgoodrich/SSO-23)
20 | - [CppCon 2016: “The strange details of std::string at Facebook"](https://www.youtube.com/watch?v=kPR8h4-qZdk)
21 |
22 | Depending on your version of the compiler, you may get fewer than 23 bytes, which is
23 | the theoretical limit.
24 |
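You can check whether SSO is active for a given string on your own toolchain: when it is, `data()` points inside the string object itself rather than to a heap block. A small diagnostic sketch (strictly speaking, comparing pointers into unrelated objects is not sanctioned by the standard, but this works fine as a diagnostic on mainstream implementations):

```cpp
#include <string>

// True if the character buffer lives inside the std::string object itself,
// i.e. the small string optimization is active for this string.
bool IsUsingSSO(const std::string& str)
{
  const char* buffer = str.data();
  const char* obj_begin = reinterpret_cast<const char*>(&str);
  const char* obj_end = obj_begin + sizeof(std::string);
  return buffer >= obj_begin && buffer < obj_end;
}
```

On libstdc++ and libc++, a 2-character string reports SSO, while a 50+ character one does not.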
25 | ## Example
26 |
27 | ```C++
28 | const char* SHORT_STR = "hello world";
29 |
30 | void ShortStringCreation(benchmark::State& state) {
31 | // Create a string over and over again.
32 | // No memory allocations occur, because
33 | // the "small strings optimization" kicks in
34 | for (auto _ : state) {
35 | std::string created_string(SHORT_STR);
36 | }
37 | }
38 |
39 | void ShortStringCopy(benchmark::State& state) {
40 | // Here we create the string only once, but copy repeatedly.
41 | // Why is it much slower than ShortStringCreation?
42 | // The compiler, apparently, outsmarted me
43 | std::string x; // create once
44 | for (auto _ : state) {
45 | x = SHORT_STR; // copy
46 | }
47 | }
48 |
49 | const char* LONG_STR = "this will not fit into small string optimization";
50 |
51 | void LongStringCreation(benchmark::State& state) {
52 | // The long string will trigger memory allocation for sure
53 | for (auto _ : state) {
54 | std::string created_string(LONG_STR);
55 | }
56 | }
57 |
58 | void LongStringCopy(benchmark::State& state) {
59 | // Now we do see an actual speed-up, when recycling
60 | // the same string multiple times
61 | std::string x;
62 | for (auto _ : state) {
63 | x = LONG_STR;
64 | }
65 | }
66 | ```
67 |
68 | As you may notice, my attempt to be clever and say "I will not create a new string
69 | every time" fails miserably if the string is short, but has a huge impact if the string
70 | is allocating memory.
71 |
72 | 
73 |
74 |
75 |
76 |
--------------------------------------------------------------------------------
/docs/small_vectors.md:
--------------------------------------------------------------------------------
1 | # Small vector optimization
2 |
3 | By now, I hope I convinced you that `std::vector` is the first data structure that you should consider to use unless you need an associative container.
4 |
5 | But even when we cleverly use `reserve` to prevent superfluous heap allocations and copies, there will be at least **one** heap allocation at the beginning. Can we do better?
6 |
7 | Sure we can! If you have read already about the [small string optimization](../../small_strings) you know where this is going.
8 |
9 | # "Static" vectors and "Small" vectors
10 |
11 | When you are sure that your vector is small and will remain small-ish even in the worst-case scenario, you can allocate the entire array of elements on the stack, and skip the expensive heap allocation.
12 |
13 | You may think that this is unlikely, but you would be surprised to know that it happens much more often than you might expect. Just 2 weeks ago, I identified this very same pattern in one of our libraries, where the size of some vectors could be any number between 0 and 8 at most.
14 |
15 | A 30-minute refactoring improved the overall speed of our software by 20%!
16 |
17 | Summarizing, you want the familiar API of this guy:
18 | ```C++
19 | std::vector<double> my_data; // at least one heap allocation, unless size is 0
20 | ```
21 | When in fact, under the hood, you want this:
22 | ```C++
23 | double my_data[MAX_SIZE]; // no heap allocations
24 | int size_my_data;
25 | ```
26 |
27 | Let's see a simple and naive implementation of `StaticVector`:
28 |
29 | ```C++
30 | #include <algorithm>
31 | #include <array>
32 | #include <cstdint>
33 | #include <stdexcept>
34 |
35 | template <typename T, size_t N>
36 | class StaticVector
37 | {
38 | public:
39 |   using iterator = typename std::array<T, N>::iterator;
40 |   using const_iterator = typename std::array<T, N>::const_iterator;
41 |
42 |   StaticVector(uint8_t n = 0): _size(n) {
43 |     if( _size > N ){
44 |       throw std::runtime_error("StaticVector overflow");
45 |     }
46 |   }
47 |
48 |   StaticVector(const StaticVector& other) = default;
49 |   StaticVector(StaticVector&& other) = default;
50 |
51 |   StaticVector(std::initializer_list<T> init)
52 |   {
53 |     if( init.size() > N ){
54 |       throw std::runtime_error("StaticVector overflow");
55 |     }
56 |     _size = static_cast<uint8_t>(init.size());
57 |     std::copy(init.begin(), init.end(), _storage.begin());
58 |   }
59 |
60 |   void push_back(T val){
61 |     if( _size >= N ){
62 |       throw std::runtime_error("StaticVector overflow");
63 |     }
64 |     _storage[_size++] = val;
65 |   }
66 |
67 |   void pop_back(){
68 |     if( _size == 0 ){
69 |       throw std::runtime_error("StaticVector underflow");
70 |     }
71 |     _size--; // elements of std::array stay constructed; no destructor call
72 |   }
73 |
74 |   size_t size() const { return _size; }
75 |
76 |   void clear(){ _size = 0; }
77 |
78 |   T& front() { return _storage.front(); }
79 |   const T& front() const { return _storage.front(); }
80 |
81 |   T& back() { return _storage[_size-1]; }
82 |   const T& back() const { return _storage[_size-1]; }
83 |
84 |   iterator begin() { return _storage.begin(); }
85 |   const_iterator begin() const { return _storage.begin(); }
86 |
87 |   iterator end() { return begin() + _size; }
88 |   const_iterator end() const { return begin() + _size; }
89 |
90 |   T& operator[](uint8_t index) { return _storage[index]; }
91 |   const T& operator[](uint8_t index) const { return _storage[index]; }
92 |
93 |   T* data() { return _storage.data(); }
94 |   const T* data() const { return _storage.data(); }
95 |
96 | private:
97 |   std::array<T, N> _storage;
98 |   uint8_t _size = 0;
99 | };
100 | ```
98 |
99 | **StaticVector** looks like a `std::vector` but is...
100 |
101 | 
102 |
103 | In some cases, there is a very high probability that a vector-like container will have at most **N** elements, but we are not "absolutely sure".
104 |
105 | We can still use a container, generally known as **SmallVector**, that uses pre-allocated memory from the stack for its first N elements and, **only** when the container needs to grow further, creates a new storage block with a heap allocation.
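The fallback idea can be illustrated with a deliberately simplified, std-only sketch (real implementations, like the ones listed below, use raw storage and placement-new instead of two separate members, and work for non-default-constructible types):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Toy illustration of the small-buffer fallback: the first N elements
// live in a stack array; element N+1 triggers a one-time spill to a
// heap-allocated std::vector. Assumes T is default-constructible.
template <typename T, size_t N>
class ToySmallVector
{
public:
  void push_back(const T& val) {
    if (_heap.empty() && _size < N) {
      _inline[_size++] = val;          // no heap allocation yet
    } else {
      spill();                         // one-time move to the heap
      _heap.push_back(val);
      _size++;
    }
  }
  size_t size() const { return _size; }
  bool on_heap() const { return !_heap.empty(); }
  const T& operator[](size_t i) const {
    return _heap.empty() ? _inline[i] : _heap[i];
  }
private:
  void spill() {
    if (_heap.empty()) {
      _heap.assign(_inline.begin(), _inline.begin() + _size);
    }
  }
  std::array<T, N> _inline;
  std::vector<T> _heap;
  size_t _size = 0;
};
```

As long as the container never exceeds N elements, no allocation happens; past that point it behaves like a normal vector.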
106 |
107 | ## StaticVector and SmallVector in the wild
108 |
109 | It turns out that these tricks are well known and can be found, implemented and ready to use, in many popular libraries:
110 |
111 | - [Boost::container](https://www.boost.org/doc/libs/1_73_0/doc/html/container.html). If it exists, Boost has it of course.
112 | - [Abseil](https://github.com/abseil/abseil-cpp/tree/master/absl/container). They are called `fixed_array` and `inlined_vector`.
113 | - For didactic purposes, you may have a look at the [SmallVector used internally by LLVM](https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/SmallVector.h)
114 |
115 |
--------------------------------------------------------------------------------
/docs/strings_are_vectors.md:
--------------------------------------------------------------------------------
1 | # It is just a string: should I worry?
2 |
3 | `std::string` is a wonderful abstraction, when compared to the awful mess of
4 | raw pointers and lengths that you have to deal with in **C**.
5 |
6 | I am kidding, C developers, we love you!
7 | > Or we sympathize for you, depends on how you want to look at it.
8 |
9 | If you think about it, it should be no more than a `std::vector<char>` in disguise,
10 | with some useful utilities that make sense for text, but not much more.
11 |
12 | To a large extent, **it is**, but then there is what is called **Small String Optimization (SSO)**.
13 |
14 | [Read more about SSO here](small_strings.md).
15 |
16 | What I want to show you here is that, like any object that **might**
17 | require memory allocation, strings call for the same best practices you would use
18 | with similar containers (even if, arguably, you often need to worry less).
19 |
20 | ## ToString
21 |
22 | ```c++
23 | enum Color{
24 | BLUE,
25 | RED,
26 | YELLOW
27 | };
28 |
29 | std::string ToStringBad(Color c)
30 | {
31 | switch(c) {
32 | case BLUE: return "BLUE";
33 | case RED: return "RED";
34 | case YELLOW: return "YELLOW";
35 | }
36 | }
37 |
38 | const std::string& ToStringBetter(Color c)
39 | {
40 | static const std::string color_name[3] ={"BLUE", "RED", "YELLOW"};
41 | switch(c) {
42 | case BLUE: return color_name[0];
43 | case RED: return color_name[1];
44 | case YELLOW: return color_name[2];
45 | }
46 | }
47 | ```
48 |
49 | This is just an example of how, if you can, you should avoid creating the
50 | same string over and over. Of course, I can hear you arguing:
51 |
52 | "Davide, you are forgetting Return Value Optimization?"
53 |
54 | I am not. But a `const&` is **always** guaranteed to be the most
55 | performant option, so why try your luck?
56 |
57 |
58 | 
59 |
60 | ## Reuse temporary strings
61 |
62 | Here comes a similar example, in which we **potentially**
63 | recycle the memory already allocated in the past.
64 |
65 | You are not guaranteed to be faster with the latter version, but you
66 | might be.
67 |
68 | ```c++
69 | // Create a new string every time (even if return value optimization may help)
70 | static std::string ModifyString(const std::string& input)
71 | {
72 | std::string output = input;
73 | output.append("... indeed");
74 | return output;
75 | }
76 | // Reuse an existing string that MAYBE has the space already reserved
77 | // (or maybe not...)
78 | static void ModifyStringBetter(const std::string& input, std::string& output)
79 | {
80 | output = input;
81 | output.append("... indeed");
82 | }
83 | ```
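The calling pattern is what makes the difference: declare the output string **outside** the loop, so each iteration reuses whatever capacity the previous one left behind. A self-contained sketch (the function from above is repeated so the snippet compiles on its own):

```cpp
#include <string>
#include <vector>

static void ModifyStringBetter(const std::string& input, std::string& output)
{
  output = input;
  output.append("... indeed");
}

std::vector<size_t> ProcessAll(const std::vector<std::string>& inputs)
{
  std::vector<size_t> lengths;
  std::string buffer;  // declared ONCE, outside the loop
  for (const auto& in : inputs) {
    ModifyStringBetter(in, buffer);  // may reuse buffer's old capacity
    lengths.push_back(buffer.size());
  }
  return lengths;
}
```

Had `buffer` been declared inside the loop, every iteration would start from an empty string with no reusable capacity.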
84 |
85 | And, as expected...
86 |
87 | 
88 |
89 |
--------------------------------------------------------------------------------
/docs/strings_concatenation.md:
--------------------------------------------------------------------------------
1 | # String concatenation
2 |
3 | Warning: before you read this, remember the rule #1 we mentioned at the beginning.
4 |
5 | **Optimize your code only if you can observe a visible overhead
6 | with your profiling tools**. That said...
7 |
8 | As we said, strings are a little more than vectors of characters, therefore
9 | they may need heap allocations to store all their elements.
10 |
11 | Concatenating strings in C++ is very easy, but there is something we should
12 | be aware of.
13 |
14 | ## "Default" concatenation
15 |
16 | Look at this familiar line of code.
17 |
18 | ```C++
19 | std::string big_string = first + " " + second + " " + third;
20 |
21 | // Where...
22 | // std::string first("This is my first string.");
23 | // std::string second("This is the second string I want to append.");
24 | // std::string third("This is the third and last string to append.");
25 | ```
26 |
27 | Noticing anything suspicious? Think about heap allocations...
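To make those allocations visible, one can count calls to the global `operator new`. This is a diagnostic sketch only: it assumes the strings are long enough to defeat SSO and that your standard library routes string storage through the global `operator new`.

```cpp
#include <cstdlib>
#include <new>
#include <string>

// Replace the global allocator just to count heap allocations.
static int alloc_count = 0;

void* operator new(std::size_t size)
{
  alloc_count++;
  void* ptr = std::malloc(size);
  if (!ptr) { throw std::bad_alloc(); }
  return ptr;
}
void operator delete(void* ptr) noexcept { std::free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { std::free(ptr); }

int CountAllocations(const std::string& first,
                     const std::string& second,
                     const std::string& third)
{
  alloc_count = 0;
  std::string big_string = first + " " + second + " " + third;
  return alloc_count;
}
```

With the three strings above, the chained `+` performs at least two heap allocations: one for the first temporary and at least one more as it grows past that temporary's capacity.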
28 |
29 | 
30 |
31 | Let me rewrite it like this:
32 |
33 | ```C++
34 | std::string big_string = (((first + " ") + second) + " ") + third;
35 | ```
36 |
37 | Hopefully you got it. To concatenate strings of this length, you will
38 | need multiple heap allocations and copies from the old memory block to the new one.
39 |
40 | If only `std::string` had a method similar to `std::vector::reserve()` :(
41 |
42 | Hey, wait... [what is this](https://en.cppreference.com/w/cpp/string/basic_string/reserve)?
43 |
44 |
45 | ## "Manual" concatenation
46 |
47 | Let's use `reserve` to reduce the number of heap allocations to exactly one.
48 |
49 | We can calculate the total amount of characters needed by `big_string` in
50 | advance and reserve it like this:
51 |
52 | ```C++
53 | std::string big_one;
54 | big_one.reserve(first.size() +
55 |                 second.size() +
56 |                 third.size() +
57 |                 strlen(" ") * 2);
58 |
59 | big_one += first;
60 | big_one += " ";
61 | big_one += second;
62 | big_one += " ";
63 | big_one += third;
64 | ```
65 |
66 | I know what you are thinking and you are 100% right.
67 |
68 | 
69 |
70 | That is a horrible piece of code... that is **2.5 times faster** than the
71 | default string concatenation!
72 |
73 | ## Variadic concatenation
74 |
75 | Can we create a string concatenation function that is fast, reusable **and**
76 | readable?
77 |
78 | We can, but we need to use one of the heavy weapons of Modern C++: **variadic templates**.
79 |
80 | There is a very [nice article about variadic templates here](https://arne-mertz.de/2016/11/more-variadic-templates/),
81 | that you should probably read if you are not familiar with them.
82 |
83 |
84 | ```C++
85 | //--- functions to calculate the total size ---
86 | size_t StrSize(const char* str) {
87 | return strlen(str);
88 | }
89 |
90 | size_t StrSize(const std::string& str) {
91 | return str.size();
92 | }
93 |
94 | template <class Head, class... Tail>
95 | size_t StrSize(const Head& head, Tail const&... tail) {
96 | return StrSize(head) + StrSize(tail...);
97 | }
98 |
99 | //--- functions to append strings together ---
100 | template <class Head>
101 | void StrAppend(std::string& out, const Head& head) {
102 | out += head;
103 | }
104 |
105 | template <class Head, class... Args>
106 | void StrAppend(std::string& out, const Head& head, Args const&... args) {
107 | out += head;
108 | StrAppend(out, args...);
109 | }
110 |
111 | //--- Finally, the function to concatenate strings ---
112 | template <class... Args>
113 | std::string StrCat(Args const&... args) {
114 | size_t tot_size = StrSize(args...);
115 | std::string out;
116 | out.reserve(tot_size);
117 |
118 | StrAppend(out, args...);
119 | return out;
120 | }
121 | ```
122 |
123 | That was a lot of complex code, even for a trained eye. But the good news is
124 | that it is very easy to use:
125 |
126 | ```C++
127 | std::string big_string = StrCat(first, " ", second, " ", third);
128 | ```
129 |
130 | So, how fast is that?
131 |
132 | 
133 |
134 | The reason why the version with variadic templates is slightly slower than
135 | the "ugly" manual concatenation is...
136 |
137 | I have no idea!
138 |
139 | What I **do** know is that it is twice as fast as the default one and
140 | it is not an unreadable mess.
141 |
142 | ## Before you copy and paste my code...
143 |
144 | My implementation of `StrCat` is very limited and I just wanted to make a point:
145 | beware of string concatenations in C++.
146 |
147 | Nevertheless, don't think twice: use [{fmt}](https://github.com/fmtlib/fmt) instead.
148 |
149 | Not only is it an easy-to-integrate, well-documented and **very**
150 | fast library to format strings.
151 |
152 | It is also an implementation of [C++20 std::format](https://en.cppreference.com/w/cpp/utility/format).
153 |
154 | This means that you can write code that is readable, performant and future proof!
155 |
156 |
157 |
158 |
159 |
160 |
161 |
--------------------------------------------------------------------------------
/mkdocs.yml:
--------------------------------------------------------------------------------
1 | site_name: CPP Optimizations diary
2 |
3 | site_description: Tip and tricks to optimize your C++ code.
4 | site_author: Davide Faconti
5 |
6 | copyright: 'Copyright © 2020 Davide Faconti'
7 |
8 | theme:
9 | name: 'material'
10 | custom_dir: overrides
11 | language: en
12 | logo: 'img/cpp.png'
13 | palette:
14 | primary: blue grey
15 | accent: purple
16 | font:
17 | text: Ubuntu
18 | code: Roboto Mono
19 | icon:
20 | logo: material/library
21 | repo: fontawesome/brands/git-alt
22 |
23 | repo_name: 'CPP_Optimizations_Diary'
24 | repo_url: 'https://github.com/facontidavide/CPP_Optimizations_Diary'
25 |
26 | extra:
27 | social:
28 | - icon: fontawesome/brands/twitter
29 | link: https://twitter.com/facontidavide
30 |
31 | markdown_extensions:
32 | - admonition
33 | - codehilite
34 | - pymdownx.highlight:
35 | anchor_linenums: true
36 | - pymdownx.inlinehilite
37 | - pymdownx.snippets
38 | - pymdownx.superfences
39 |
40 |
41 | extrahead:
42 |
43 |
44 | nav:
45 | - Home: index.md
46 |
47 | - Reference and move semantic:
48 | - Use references: prefer_references.md
49 |
50 | - Vectors are your best friend:
51 | - Use reserve: reserve.md
52 | - Avoid std::list : no_lists.md
53 | - You may not need std::map : dont_need_map.md
54 | - Small vector optimization: small_vectors.md
55 |
56 | - "It's just a string...":
57 | - Strings are vectors: strings_are_vectors.md
58 | - Small string optimization: small_strings.md
59 | - String concatenation: strings_concatenation.md
60 |
61 | - Don't compute it twice:
62 | - More efficient 2D transforms: 2d_transforms.md
63 | - Iterating over a 2D matrix: 2d_matrix_iteration.md
64 |
65 | - Fantastic data structures:
66 | - Boost flat_map to the rescue: boost_flatmap.md
67 |
68 | - Case studies:
69 | - Faster and simpler PCL filter: pcl_filter.md
70 | - More PCL optimizations: pcl_fromROS.md
71 | - Fast palindrome: palindrome.md
72 |
73 | - About me: about.md
74 |
75 |
--------------------------------------------------------------------------------
/overrides/main.html:
--------------------------------------------------------------------------------
1 | {% extends "base.html" %}
2 |
3 | {% block extrahead %}
4 |
5 |
6 |
7 | {% endblock %}
8 |
--------------------------------------------------------------------------------