Fastest `finally` for C++

C++ still (unfortunately) does not support a finally clause for try statements. This raises the question of how to release resources. I researched the question on the Internet and found some solutions, but I could not get a clear picture of their performance (and if performance did not matter, I would just use Java). So I had to investigate myself.

Possible options:

  • The Finally functor class proposed on CodeProject. It is powerful, but slow. The disassembly also suggests that the local variables of the outer function are captured very inefficiently: they are pushed onto the stack one after another, instead of just passing a pointer to the inner (lambda) function.

  • RAII: a hand-written cleanup object on the stack. The drawback is that it has to be written by hand and tailored to every place it is used. Another disadvantage is the need to copy into it all the variables required to release the resources.

  • The MSVC++-specific __try / __finally statement. The disadvantage is that it is obviously not portable.

I created this small benchmark to compare the runtime performance of these approaches:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <utility>

    // Option 1: type-erased functor (std::function), as in the CodeProject article
    class Finally1 {
        std::function<void(void)> _functor;
    public:
        Finally1(const std::function<void(void)> &functor) : _functor(functor) {}
        ~Finally1() { _functor(); }
    };

    void BenchmarkFunctor() {
        volatile int64_t var = 0;
        const int64_t nIterations = 234567890;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 0; i < nIterations; i++) {
            Finally1 doFinally([&] { var++; });
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Functor: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
    }

    // Option 2: hand-written RAII cleanup object
    void BenchmarkObject() {
        volatile int64_t var = 0;
        const int64_t nIterations = 234567890;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 0; i < nIterations; i++) {
            class Cleaner {
                volatile int64_t* _pVar;
            public:
                Cleaner(volatile int64_t& var) : _pVar(&var) { }
                ~Cleaner() { (*_pVar)++; }
            } c(var);
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Object: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
    }

    // Option 3: MSVC-specific structured exception handling (not portable)
    void BenchmarkMSVCpp() {
        volatile int64_t var = 0;
        const int64_t nIterations = 234567890;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 0; i < nIterations; i++) {
            __try {
            }
            __finally {
                var++;
            }
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("__finally: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
    }

    // Option 4: template wrapper that stores the lambda directly (no type erasure)
    template <typename Func>
    class Finally4 {
        Func f;
    public:
        Finally4(Func&& func) : f(std::forward<Func>(func)) {}
        ~Finally4() { f(); }
    };

    template <typename F>
    Finally4<F> MakeFinally4(F&& f) {
        return Finally4<F>(std::forward<F>(f));
    }

    void BenchmarkTemplate() {
        volatile int64_t var = 0;
        const int64_t nIterations = 234567890;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 0; i < nIterations; i++) {
            auto doFinally = MakeFinally4([&] { var++; });
            //Finally4 doFinally{ [&] { var++; } };
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Template: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
    }

    // Baseline: the same increment with no cleanup mechanism at all
    void BenchmarkEmpty() {
        volatile int64_t var = 0;
        const int64_t nIterations = 234567890;
        auto start = std::chrono::high_resolution_clock::now();
        for (int64_t i = 0; i < nIterations; i++) {
            var++;
        }
        auto elapsed = std::chrono::high_resolution_clock::now() - start;
        double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        printf("Empty: %.3lf Ops/sec, var=%lld\n", nIterations / nSec, (long long)var);
    }

    int __cdecl main() {
        BenchmarkFunctor();
        BenchmarkObject();
        BenchmarkMSVCpp();
        BenchmarkTemplate();
        BenchmarkEmpty();
        return 0;
    }

The results on my Ryzen 1800X @ 3.9 GHz with DDR4 @ 2.6 GHz CL13 were:

    Functor: 175148825.946 Ops/sec, var=234567890
    Object: 553446751.181 Ops/sec, var=234567890
    __finally: 553832236.221 Ops/sec, var=234567890
    Template: 554964345.876 Ops/sec, var=234567890
    Empty: 554468478.903 Ops/sec, var=234567890

Apparently, all options except the functor-based one (#1) are as fast as an empty loop.

So, is there a fast and powerful C++ alternative to finally that is portable and requires minimal copying from the stack of the outer function?

UPDATE: I tested @Jarod42's solution, so the code and output in the question have been updated. However, as @Sopel mentioned, it may break if copy elision is not performed.
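The copy-elision caveat can be illustrated with a sketch. Before C++17, `return Finally4<F>(...)` and `auto doFinally = MakeFinally4(...)` are only allowed, not required, to elide the copy; and because `Finally4` declares a destructor, it gets no implicit move constructor, so an unelided copy would leave two objects that each call `f()`. One common remedy, assumed here rather than taken from the thread, is a guard whose move constructor disarms the source. `ScopeExit`, `makeScopeExit` and `cleanupCountAfterMove` are hypothetical names:

```cpp
#include <utility>

// Hypothetical ScopeExit: a move-aware variant of Finally4 from the question.
// A moved-from guard is disarmed, so the cleanup runs exactly once even when
// the compiler does not elide the move out of the factory function.
template <typename F>
class ScopeExit {
    F f_;
    bool active_ = true;  // becomes false once another guard has taken over

public:
    explicit ScopeExit(F f) : f_(std::move(f)) {}

    ScopeExit(ScopeExit&& other) : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;  // the source must not run the cleanup again
    }

    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;
    ScopeExit& operator=(ScopeExit&&) = delete;

    ~ScopeExit() {
        if (active_) f_();
    }
};

// Hypothetical factory, analogous to MakeFinally4.
template <typename F>
ScopeExit<F> makeScopeExit(F f) {
    return ScopeExit<F>(std::move(f));
}

// Usage sketch: force a real move and count how often the cleanup runs.
int cleanupCountAfterMove() {
    int calls = 0;
    {
        auto guard = makeScopeExit([&] { ++calls; });
        auto moved = std::move(guard);  // explicit move, never elided
    }  // moved runs the cleanup; guard is disarmed and does nothing
    return calls;
}
```

Since C++17 the factory's return is guaranteed to be elided, so the `active_` flag only matters for older standards or for code that moves guards around explicitly, as `cleanupCountAfterMove()` does.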

UPDATE2: To clarify what I am asking: I am looking for a convenient, fast way in C++ to execute a block of code even if an exception is thrown. For the reasons stated in the question, some approaches are slow or inconvenient.
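As a sketch of the behavior being asked for: a destructor-based guard runs its cleanup both on normal scope exit and during stack unwinding, provided the exception is eventually caught (if it never is, whether the stack is unwound before `std::terminate` is called is implementation-defined). `FinallyGuard`, `doWork` and `cleanupsWithAndWithoutThrow` are illustrative names, not from the thread:

```cpp
#include <stdexcept>
#include <utility>

// Illustrative guard: the destructor runs the stored callable when the scope
// is left, whether normally or during stack unwinding.
template <typename F>
class FinallyGuard {
    F f_;
public:
    explicit FinallyGuard(F f) : f_(std::move(f)) {}
    FinallyGuard(const FinallyGuard&) = delete;
    FinallyGuard& operator=(const FinallyGuard&) = delete;
    ~FinallyGuard() { f_(); }
};

// Hypothetical unit of work that may fail after acquiring a "resource".
void doWork(int& cleanups, bool fail) {
    auto onExit = [&] { ++cleanups; };  // plays the role of the finally block
    FinallyGuard<decltype(onExit)> guard(onExit);
    if (fail) {
        throw std::runtime_error("work failed");
    }
}

// Calls doWork once normally and once with an exception; returns how many
// times the cleanup ran.
int cleanupsWithAndWithoutThrow() {
    int cleanups = 0;
    doWork(cleanups, false);
    try {
        doWork(cleanups, true);
    } catch (const std::runtime_error&) {
        // the guard's destructor already ran during unwinding
    }
    return cleanups;
}
```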

2 answers

You can implement Finally without the type erasure and overhead of std::function:

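    #include <utility>

    template <typename F>
    class Finally {
        F f;
    public:
        template <typename Func>
        Finally(Func&& func) : f(std::forward<Func>(func)) {}
        ~Finally() { f(); }

        // Non-copyable and non-movable, so the cleanup can never run twice
        Finally(const Finally&) = delete;
        Finally(Finally&&) = delete;
        Finally& operator =(const Finally&) = delete;
        Finally& operator =(Finally&&) = delete;
    };

    // Returning a braced-init-list initializes the result directly,
    // so no copy or move constructor is needed
    template <typename F>
    Finally<F> make_finally(F&& f) {
        return { std::forward<F>(f) };
    }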

And use it like:

    auto&& doFinally = make_finally([&] { var++; });

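The commented-out line `//Finally4 doFinally{ [&] { var++; } };` in the question's benchmark points at another option: with C++17 class template argument deduction, a guard can be declared without a factory function. The sketch below assumes a C++17 compiler; `ScopeGuard` and `cleanupOrder` are hypothetical names. Taking the callable by value in a non-template constructor lets the implicit deduction guide deduce `F` as the lambda's type; a templated constructor like the one in this answer would need an explicit deduction guide instead. The sketch also shows that several guards in one scope run in reverse order of declaration:

```cpp
#include <string>
#include <utility>

// C++17 sketch (ScopeGuard is an illustrative name). The non-template
// constructor lets CTAD deduce F, so no make_* helper is needed.
template <typename F>
class ScopeGuard {
    F f_;
public:
    explicit ScopeGuard(F f) : f_(std::move(f)) {}
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() { f_(); }
};

// Records the order in which the body and the two guards run.
std::string cleanupOrder() {
    std::string log;
    {
        ScopeGuard first{[&] { log += "first;"; }};
        ScopeGuard second{[&] { log += "second;"; }};
        log += "body;";
    }  // second is destroyed before first
    return log;
}
```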


Well, it's your test that is broken: it never actually throws, so you only see the non-exceptional path. This is quite bad, because the optimizer can prove that nothing throws, so it can discard all the code that actually performs the cleanup with an exception in flight.

I think you should repeat your test with a call to exceptionThrower() or nonthrowingThrower() inside your try{} block. These two functions must be compiled in a separate translation unit and only linked with the benchmark code. This forces the compiler to actually generate exception-handling code, regardless of whether you call exceptionThrower() or nonthrowingThrower(). (Make sure you do not enable link-time optimization, which can defeat the effect.)

It will also let you easily compare the performance impact of the throwing and non-throwing execution paths.
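A minimal sketch of the suggested benchmark structure. To keep it in one file, it swaps the separate translation unit for a different trick: calling through a `volatile` function pointer, so the optimizer cannot see which function is called or prove that it does not throw. This substitution is an assumption made for illustration and is weaker than a genuinely separate, non-LTO translation unit. `Finally`, `runGuarded` and `opsPerSecond` are hypothetical names; `exceptionThrower()` and `nonthrowingThrower()` follow the answer's naming:

```cpp
#include <chrono>
#include <stdexcept>
#include <utility>

// Stand-ins named after the answer's suggestion. In a real benchmark they would
// live in a separate translation unit compiled without link-time optimization.
void exceptionThrower() { throw std::runtime_error("boom"); }
void nonthrowingThrower() {}

// Minimal finally guard (illustrative), storing the lambda without type erasure.
template <typename F>
class Finally {
    F f_;
public:
    explicit Finally(F f) : f_(std::move(f)) {}
    Finally(const Finally&) = delete;
    Finally& operator=(const Finally&) = delete;
    ~Finally() { f_(); }
};

// Runs `iterations` guarded calls and returns how many times the cleanup ran.
// `caught` receives the number of exceptions that reached the handler.
long long runGuarded(void (*fn)(), long long iterations, long long& caught) {
    // Reading the target through a volatile pointer hides the callee from the
    // optimizer, so it has to keep the unwinding path for the guard.
    void (*volatile opaque)() = fn;
    long long cleanups = 0;
    caught = 0;
    for (long long i = 0; i < iterations; ++i) {
        try {
            auto onExit = [&] { ++cleanups; };
            Finally<decltype(onExit)> guard(onExit);
            opaque();
        } catch (const std::runtime_error&) {
            ++caught;
        }
    }
    return cleanups;
}

// Times runGuarded() and returns guarded calls per second.
double opsPerSecond(void (*fn)(), long long iterations) {
    long long caught = 0;
    auto start = std::chrono::steady_clock::now();
    runGuarded(fn, iterations, caught);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return iterations / elapsed.count();
}
```

Calling `opsPerSecond(nonthrowingThrower, n)` and `opsPerSecond(exceptionThrower, n)` then puts the non-throwing and throwing costs side by side.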


Beyond the benchmarking issue, exceptions in C++ are slow. You will never get hundreds of millions of exceptions thrown per second; single-digit millions is about the best you can hope for, and most likely less. I expect any performance differences between the various finally implementations to be completely irrelevant on the throwing path. What you can optimize is the non-throwing path, where your cost is just the construction/destruction of your finally implementation object, whatever that is.


Source: https://habr.com/ru/post/1268806/

