Disclaimer: I do not know any theory of error handling. I have, however, thought about the subject repeatedly as I explored various programming languages and paradigms, and as I toyed with programming language designs (and discussed them). What follows is a summary of my experience so far, with objective arguments.
Note: this should cover all the questions, but I did not try to address them in order, preferring a structured presentation instead. At the end of each section, I give a succinct answer to the questions that section covers, for clarity.
Introduction
As a premise, I would like to note that, whatever the topic under discussion, some parameters must be kept in mind when designing a library (or reusable code).
The author cannot hope to foresee how the library will be used, and should thus avoid strategies that make integration more difficult than it needs to be. The most glaring defect would be relying on globally shared state; thread-local shared state can also be a nightmare when interacting with coroutines / green threads. The use of such coroutines and threads also highlights that synchronization is best left to the user: in single-threaded code it means none at all (best performance), while with coroutines and green threads the user is best placed to implement (or reuse existing implementations of) dedicated synchronization mechanisms.
That being said, when a library is for internal use only, global or thread-local variables can be convenient; if used, they should be clearly documented as a technical limitation.
Logging
There are many ways to record messages:
- with additional information such as timestamp, process ID, thread ID, server name / IP, ...
- via synchronous calls or with an asynchronous mechanism (and an overflow handling mechanism)
- in files, databases, distributed databases, dedicated log servers, ...
As the author of a library, you should make sure the logs integrate with the client's infrastructure (or can be turned off). This is best achieved by letting the client provide hooks and deal with the logs themselves. My recommendation is:
- to provide 2 hooks: one to decide whether to log or not, and one to actually log (the message is formatted and the second hook is called only when the client has decided to log)
- to provide, on top of the message: a severity (or level), the file name, line and function name if the library is open source, or otherwise the logical module (if there are several)
- to write, by default, to stdout and stderr (depending on severity) until the client explicitly says not to log
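The recommendation above could be sketched as follows. This is only a minimal illustration, assuming C++17; the namespace `mylib` and the names `Logger`, `Severity`, `LogSite`, `setHooks` and `loggerUsage` are all hypothetical. The logger is a plain object handed around by the client (no global state) and does no synchronization of its own.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

namespace mylib {  // hypothetical library namespace

enum class Severity { Debug, Info, Warning, Error };

// Where the message comes from (illustrative fields).
struct LogSite {
    const char* file;
    int line;
    const char* function;
};

using ShouldLogHook = std::function<bool(Severity)>;
using WriteLogHook =
    std::function<void(Severity, const LogSite&, const std::string&)>;

class Logger {
public:
    // The client replaces both hooks to plug in its own infrastructure.
    void setHooks(ShouldLogHook shouldLog, WriteLogHook write) {
        shouldLog_ = std::move(shouldLog);
        write_ = std::move(write);
    }

    // `format` is a callable returning the message; it is invoked only
    // after the first hook has accepted the severity.
    template <typename Format>
    void log(Severity s, const LogSite& site, Format&& format) const {
        if (!shouldLog_(s)) return;
        write_(s, site, format());
    }

private:
    // Defaults: log everything, warnings and errors to stderr, rest to stdout.
    ShouldLogHook shouldLog_ = [](Severity) { return true; };
    WriteLogHook write_ = [](Severity s, const LogSite& site,
                             const std::string& msg) {
        std::ostream& out = (s >= Severity::Warning) ? std::cerr : std::cout;
        out << site.file << ':' << site.line << " (" << site.function << ") "
            << msg << '\n';
    };
};

}  // namespace mylib

// Usage: the client only wants warnings and above, and captures the text.
inline void loggerUsage() {
    mylib::Logger logger;
    int formatted = 0;
    std::string captured;
    logger.setHooks(
        [](mylib::Severity s) { return s >= mylib::Severity::Warning; },
        [&](mylib::Severity, const mylib::LogSite&, const std::string& m) {
            captured = m;
        });
    auto message = [&] { ++formatted; return std::string("disk almost full"); };

    logger.log(mylib::Severity::Debug, {__FILE__, __LINE__, __func__}, message);
    assert(formatted == 0 && captured.empty());  // rejected: never formatted

    logger.log(mylib::Severity::Warning, {__FILE__, __LINE__, __func__}, message);
    assert(formatted == 1 && captured == "disk almost full");
}
```

Passing the message as a callable is one way to make sure the formatting cost is paid only when the client has decided to log.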
I would note that, in keeping with the guidelines laid out in the introduction, synchronization is left to the client.
Regarding whether to log errors: do not log (as errors) what you already report through your API; you can, however, still log the details at a lower severity. The client can decide whether or not to report the error when handling it, and may for example choose not to report it if it was just a speculative call.
Note: some information should never make it into the logs, and some other pieces are best obfuscated. For example, passwords should not be logged, and credit card numbers or passport / social security numbers are best obfuscated (at least in part). In a library designed for such sensitive information, this can be done during logging; otherwise the application should take care of it.
Is logging something that should only be done in application code? Or is it OK to do some logging from library code?
Application code should decide the policy. Whether a library logs or not depends on whether it needs to.
Going on after an error?
Before talking about reporting errors, the first question we should ask is whether the error should be reported (for handling) or whether things are so wrong that aborting the current process is clearly the best policy.
This is certainly a tricky topic. In general, I would advise designing so that going on is an option, with a purge / reset if necessary. If this cannot be achieved in certain cases, then those cases should abort the process.
Note: on some systems, it is possible to get a memory dump of the process. If the application handles sensitive data (passwords, credit cards, passports, ...), dumps are best deactivated in production (but can be used during development).
Note: it can be interesting to have a debug switch that turns some of the error reports into aborts with a memory dump, to assist debugging during development.
Reporting an error
The occurrence of an error means that the contract of a function / interface could not be fulfilled. This has several consequences:
- the client must be warned, which is why the error must be reported
- no partially correct data should escape into the wild
The latter point will be treated later; for now, let's focus on reporting the error. The client should never be able to accidentally ignore this report. This is why using error codes is such an abomination (in languages where return values can be ignored):
ErrorStatus_t doit(Input const* input, Output* output);
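To make the problem concrete, here is a small sketch; the definitions of `ErrorStatus_t`, `Input` and `Output`, and the functions `careless` and `ignoredStatusDemo`, are assumptions made up for the illustration. Nothing in the language forces the caller to look at the returned status, so a failure can end up looking like a legitimate result.

```cpp
#include <cassert>

// Hypothetical definitions, only to make the declaration above compilable.
enum ErrorStatus_t { ERROR_NONE, ERROR_INVALID_INPUT };
struct Input { int value; };
struct Output { int doubled; };

ErrorStatus_t doit(Input const* input, Output* output) {
    if (input == nullptr || input->value < 0) return ERROR_INVALID_INPUT;
    output->doubled = input->value * 2;
    return ERROR_NONE;
}

// A caller that forgets the check: the status is silently dropped.
int careless(Input const& in) {
    Output out{0};
    doit(&in, &out);     // return value ignored, yet this is valid code
    return out.doubled;  // on failure this is just the initial 0
}

inline void ignoredStatusDemo() {
    assert(careless(Input{3}) == 6);
    // The failure (negative input) cannot be told apart from a valid
    // computation on 0: both yield 0.
    assert(careless(Input{-5}) == careless(Input{0}));
}
```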
I know of two schemes that require explicit action on the client's part:
- exceptions
- result types (optional<T>, either<T, U>, ...)
The former is well known; the latter is used a lot in functional languages and was introduced in C++11 under the guise of std::future<T>, although other implementations exist.
I advise preferring the latter when possible, as it is easier to reason about, but falling back to exceptions when no result is expected. Contrast:
Option<Value&> find(Key const&);
void updateName(Client::Id id, Client::Name name);
In the case of write-only operations such as updateName, the client has no use for a result. One could be introduced, but it would be easy to forget to check it.
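A possible rendering of this contrast, as a sketch only: `ClientRegistry`, its members and `registryUsage` are hypothetical names, and `std::optional` (C++17) stands in for the `Option` type above. Since `std::optional` cannot hold a reference in C++17, the lookup returns a copy here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

class ClientRegistry {  // hypothetical class, for illustration
public:
    void add(int id, std::string name) { names_[id] = std::move(name); }

    // Query: the possible absence of a result is part of the return type,
    // so the caller has to unwrap it explicitly.
    std::optional<std::string> find(int id) const {
        auto it = names_.find(id);
        if (it == names_.end()) return std::nullopt;
        return it->second;
    }

    // Write-only operation: there is no result to carry the failure,
    // so it is reported through an exception.
    void updateName(int id, std::string name) {
        auto it = names_.find(id);
        if (it == names_.end()) throw std::out_of_range("unknown client id");
        it->second = std::move(name);
    }

private:
    std::map<int, std::string> names_;
};

inline void registryUsage() {
    ClientRegistry registry;
    registry.add(1, "Ann");

    assert(registry.find(1).value_or("?") == "Ann");
    assert(!registry.find(2).has_value());  // absence must be handled

    bool reported = false;
    try {
        registry.updateName(2, "Bob");      // cannot be silently ignored
    } catch (const std::out_of_range&) {
        reported = true;
    }
    assert(reported);
}
```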
Falling back to exceptions also makes sense when a result type is impractical, or insufficient to convey the details:
Option<Value&> compute(RepositoryInterface&, Details...);
With such an externally defined callback, the list of potential failures is almost infinite. The implementation could use the network, a database, the file system, ... In this case, in order to report errors accurately:
- the externally defined callback should be expected to report errors through exceptions when the interface is insufficient (or impractical) to convey the full details of the error
- the functions built on this abstract callback should be transparent to those exceptions (let them pass, unmodified)
The goal is to let the exception bubble up (at least) to the layer where the implementation of the interface was chosen, since only at that level is there a chance to interpret the exception correctly.
Note: the externally defined callback is not forced to use exceptions; we should simply expect that it might.
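The two points above might look like this in practice. This is a sketch under assumptions: the `load` member of `RepositoryInterface`, the simplified `compute` signature, and the client-side types `NetworkDown` and `FlakyRepository` are invented for the example.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Library side: an abstract callback whose implementation is unknown.
struct RepositoryInterface {
    virtual ~RepositoryInterface() = default;
    virtual std::optional<int> load(const std::string& key) = 0;
};

// Library function: no try/catch around the callback, so whatever the
// implementation throws passes through unmodified.
std::optional<int> compute(RepositoryInterface& repo, const std::string& key) {
    std::optional<int> raw = repo.load(key);
    if (!raw) return std::nullopt;  // expected failure: encoded in the result
    return *raw * 2;
}

// Client side: the implementation and its own exception type.
struct NetworkDown : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct FlakyRepository : RepositoryInterface {
    std::optional<int> load(const std::string&) override {
        throw NetworkDown("connection reset");
    }
};

// The layer that chose FlakyRepository knows what NetworkDown means.
inline void transparencyUsage() {
    FlakyRepository repo;
    bool sawOriginal = false;
    try {
        compute(repo, "price");
    } catch (const NetworkDown& e) {
        sawOriginal = std::string(e.what()) == "connection reset";
    }
    assert(sawOriginal);
}
```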
Using an error
To act on an error report, the client needs enough information to make a decision. Structured information, such as error codes or exception types, should be preferred (for automatic actions), and additional information (message, stack, ...) can be provided in an unstructured way (for humans to investigate).
Ideally, a function would clearly document all its possible failure modes: when they occur and how they are reported. However, especially when arbitrary code is executed, the client should be prepared to deal with unknown codes / exceptions.
A notable exception is, of course, result types: boost::variant<Output, Error0, Error1, ...> provides a compiler-checked exhaustive list of known failure modes ... although a function returning this type can still throw.
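As an illustration of the compiler-checked list, here is a sketch that uses `std::variant` and `std::visit` from C++17 in place of `boost::variant`; the error types `NotFound` and `PermissionDenied` and the functions `lookup`, `describe` and `variantUsage` are hypothetical. If one of the `operator()` overloads in `Describe` is removed, the call to `std::visit` no longer compiles.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical failure modes, listed explicitly in the result type.
struct NotFound { std::string key; };
struct PermissionDenied { std::string user; };

using LookupResult = std::variant<int, NotFound, PermissionDenied>;

LookupResult lookup(const std::string& user, const std::string& key) {
    if (user != "admin") return PermissionDenied{user};
    if (key != "answer") return NotFound{key};
    return 42;
}

// One overload per alternative; there is deliberately no catch-all.
struct Describe {
    std::string operator()(int value) const {
        return "value " + std::to_string(value);
    }
    std::string operator()(const NotFound& e) const {
        return "no entry for " + e.key;
    }
    std::string operator()(const PermissionDenied& e) const {
        return e.user + " may not read";
    }
};

std::string describe(const LookupResult& result) {
    return std::visit(Describe{}, result);
}

inline void variantUsage() {
    assert(describe(lookup("admin", "answer")) == "value 42");
    assert(describe(lookup("admin", "other")) == "no entry for other");
    assert(describe(lookup("bob", "answer")) == "bob may not read");
}
```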
How to decide whether to log an error, or show it as an error message to the user?
The user should always be warned when their request could not be fulfilled; however, a user-friendly (understandable) message should be displayed. If possible, advice or workarounds should be presented too. The details are for the teams investigating.
Error recovery?
Last, but certainly not least, comes the truly frightening part about errors: recovery.
This is something (real) databases are very good at: transaction-like semantics. If anything unexpected occurs, the transaction is aborted as if nothing had happened.
In the real world, things are not so simple. The simple example of cancelling an e-mail that has already been sent comes to mind: too late. Protocols may exist, depending on your application domain, but they are outside the scope of this discussion. The first step, though, is the ability to restore a sane in-memory state; and that is far from simple in most languages (and STM can only do so much today).
First of all, an illustration of the problem:
void update(Client& client, Client::Name name, Client::Address address) {
    client.update(std::move(name));
    client.update(std::move(address)); // suppose this call fails
}
Now, if updating the address fails, I am left with a half-updated client. What can I do?
- attempting to undo all the updates that did occur is close to impossible (the undo itself might fail)
- copying the state before performing any single update is a performance hog (supposing we can even swap it back reliably)
In any case, the bookkeeping required is such that mistakes will creep in.
And worst of all: no safe assumption can be made about the extent of the corruption (except that client is now corrupted). Or, at least, no assumption that will withstand time (and code changes).
As is often the case, the only way to win is not to play.
Possible Solution: Transactions
Whenever possible, the key idea is to define macro functions that either fail or produce the expected result. Those are our transactions. And their form is invariant:
Either<Output, Error> doit(Input const&);
// or
Output doit(Input const&); // throws in case of error
A transaction does not modify any external state, therefore, if it fails to produce a result:
- the outside world has not changed (nothing to roll back)
- there is no partial result to observe
Any function that is not a transaction should be considered to have corrupted everything it touched, and thus the only sane way of dealing with an error from a non-transactional function is to let it bubble up until a transaction layer is reached. Any attempt to deal with the error before that point is, in the end, doomed to fail.
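One way to give the earlier update example this transactional shape is sketched below. The simplified `Client` struct, the validation rules and the names `withUpdates` and `transactionUsage` are assumptions for the illustration, not a prescribed implementation. The new state is built on the side from a const input, and the caller's object is touched only by a final step that is not expected to fail.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for the Client type used in the text.
struct Client {
    std::string name;
    std::string address;
};

// Transaction-shaped function: reads its input through a const reference
// and either returns a complete new value or throws.
Client withUpdates(const Client& current, std::string name,
                   std::string address) {
    if (name.empty()) throw std::invalid_argument("empty name");
    if (address.empty()) throw std::invalid_argument("empty address");
    Client next = current;
    next.name = std::move(name);
    next.address = std::move(address);
    return next;
}

void update(Client& client, std::string name, std::string address) {
    // Everything that can fail happens before `client` is modified.
    Client next = withUpdates(client, std::move(name), std::move(address));
    // Commit: moving the two strings into place should not throw.
    client = std::move(next);
}

inline void transactionUsage() {
    Client client{"Ann", "1 Main St"};

    bool failed = false;
    try {
        update(client, "Bob", "");  // second field is invalid
    } catch (const std::invalid_argument&) {
        failed = true;
    }
    // The failed update left no partial result behind.
    assert(failed && client.name == "Ann" && client.address == "1 Main St");

    update(client, "Bob", "2 Oak Ave");
    assert(client.name == "Bob" && client.address == "2 Oak Ave");
}
```

The cost is the copy mentioned earlier; what it buys is that a failure leaves nothing to undo.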
How to decide whether to handle the error locally or propagate to higher-level code?
In case of exceptions, where do you usually catch them? In the code of the lower level or higher level?
Deal with them wherever it is safe to do so and there is value in doing so. Most notably, it is fine to catch an error, check whether it can be dealt with locally, and then either deal with it or pass it up.
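A sketch of this catch, check, then handle-or-pass-up pattern; the `StorageError` type with its `transient` flag and the functions `readPrimary`, `readReplica`, `fetch` and `fetchUsage` are hypothetical. Handling locally is reasonable here because the failed call is read-only and left nothing half-modified.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical structured error: the flag lets code decide automatically.
struct StorageError : std::runtime_error {
    bool transient;
    StorageError(const std::string& what, bool isTransient)
        : std::runtime_error(what), transient(isTransient) {}
};

std::string readPrimary(const std::string& key) {
    if (key == "busy") throw StorageError("primary timed out", true);
    if (key == "corrupt") throw StorageError("checksum mismatch", false);
    return "primary:" + key;
}

std::string readReplica(const std::string& key) { return "replica:" + key; }

std::string fetch(const std::string& key) {
    try {
        return readPrimary(key);
    } catch (const StorageError& e) {
        if (!e.transient) throw;  // cannot be handled here: pass it up as-is
        return readReplica(key);  // safe local handling: nothing was modified
    }
}

inline void fetchUsage() {
    assert(fetch("ok") == "primary:ok");
    assert(fetch("busy") == "replica:busy");  // handled locally

    bool propagated = false;
    try {
        fetch("corrupt");
    } catch (const StorageError& e) {
        propagated = !e.transient;            // same exception, untouched
    }
    assert(propagated);
}
```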
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from third-party libraries)?
I did not address this question earlier, but I believe it is clear that the approach I outlined is already dual, since it consists of both result types and exceptions. As such, dealing with third-party libraries should be a cinch, though I do advise wrapping them anyway for other reasons (third-party code is better kept isolated behind a business-oriented interface tasked with handling the impedance mismatch).