Error Handling Theory?

Most error handling advice boils down to a handful of tips and tricks (like this post). These tips are useful, but I do not think they answer all the questions. I feel that I should design my application according to a certain philosophy, a school of thought that provides a solid foundation to build on. Is there such a theory of error handling?

Here are some practical questions:

  • How do you decide whether an error should be handled locally or propagated to higher-level code?
  • How do you decide between logging an error and showing it to the user as an error message?
  • Is logging something that should only be done in application code? Or is it OK to do some logging from library code?
  • In the case of exceptions, where should you generally catch them? In lower-level or higher-level code?
  • Should you strive for a unified error handling strategy across all layers of code, or try to develop a system that can adapt to a variety of error handling strategies (in order to be able to deal with errors from third-party libraries)?
  • Does it make sense to create a list of error codes? Or is that old-fashioned these days?

In many cases, common sense is enough to come up with a good-enough strategy for dealing with error conditions. However, I would like to know whether there is a more formal / "scientific" approach.

PS: this is a general question, but C++-specific answers are welcome too (C++ is my main programming language at work).

+49
c++ error-handling
Jan 01 '09 at 22:10
14 answers

A couple of years ago I was thinking exactly about the same question :)

After searching and reading a few things, I think the most interesting reference I found was Patterns for Generation, Handling and Management of Errors by Andy Longshaw and Eoin Woods. It is a short and systematic attempt to cover the basic idioms you mention, and some others.

The answers to these questions are rather controversial, but the authors above were brave enough to present their ideas at a conference and then put their thoughts on paper.

+6
May 03 '14 at 19:51

Is logging something that should only be done in application code? Or is it OK to do some logging from library code?

Just wanted to comment on this. My opinion is never to log directly in library code, but to provide hooks or callbacks so that logging is implemented in the application code; the application can then decide what to do with the log output (if anything at all).

+14
Jan 01 '09 at 22:20

Introduction

To understand what needs to be done to handle errors, I believe you need a clear understanding of the types of errors you encounter and the contexts in which you encounter them.

I have found it extremely useful to consider two main types of errors:

  • Errors that should never occur, and that are usually due to bugs in the code.

  • Errors that are expected and cannot be prevented during normal operation, for example a database connection going down because of a database problem over which the application has no control.

The way an error should be handled depends largely on its type.

The various contexts that affect error handling are as follows:

  • Application code

  • Library code

Error handling in the library code is somewhat different from processing in the application code.

The following describes a philosophy for handling the two main types of errors. Special considerations for library code are also discussed. Finally, the specific practical questions in the original post are examined in the context of the philosophy presented.

Types of errors

Programming errors (bugs) and other errors that should never occur

Many errors are the result of programming mistakes. As a rule, these errors cannot be recovered from, since a specific programming mistake cannot be anticipated. This means that we cannot know in advance what state the error leaves the application in, so we cannot recover from that state and should not try.

The ultimate fix for this kind of error is to fix the programming mistake. To facilitate this, the error should be surfaced as quickly as possible. Ideally, the program should exit immediately after detecting such an error and reporting the relevant information. A quick and obvious exit shortens the debug-and-retest cycle, allowing more bugs to be fixed in the same amount of testing time, which in turn leads to a more robust application with fewer bugs when the time comes to deploy.

The other important goal in handling this type of error is to provide enough information to make the bug easy to identify. In Java, for example, throwing a RuntimeException often provides enough information in the stack trace to pinpoint the bug immediately; in clean code, the fix can often be identified just by studying the stack trace. In other languages, you can log the call stack or otherwise preserve the necessary information. It is imperative not to suppress information in the interest of brevity; do not worry about how much log space you take up when this type of error occurs. The more information is provided, the faster the bugs can be fixed, and the fewer errors will remain to pollute the logs when the application makes it to production.

Server applications

Now, in some server applications it is important that the server is fault tolerant enough to keep running even in the face of occasional programming errors. In this case, the best approach is a very clear separation between the server code, which must keep running, and the task processing code, which can be allowed to fail. For example, tasks can be assigned to threads or subprocesses, as is done in many web servers.

In such a server architecture, the thread or subprocess that handles a task can then be treated as an application that is allowed to fail. All of the above considerations apply to such a task: the error should be surfaced as quickly as possible by a clean exit from the task, and enough information should be logged that the bug can easily be found and fixed. When such a task exits in Java, for example, the entire stack trace of any RuntimeException that causes the exit should usually be logged.

As much code as possible should run in the threads or processes that handle tasks, rather than in the main thread or server process. This is because any bug in the main thread or server process will still bring the entire server down. It is better to push the code, together with the bugs it contains, into the task processing code, where a bug will not cause a server failure when it manifests.
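To make the separation concrete, here is a minimal C++ sketch of a task boundary. The names `TaskReport`, `runIsolated` and `serve` are hypothetical, and the sketch assumes tasks signal failure by throwing; a `try`/`catch` boundary does not protect against memory corruption or crashes, for which a subprocess gives stronger isolation.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Outcome of one task as seen by the server loop (hypothetical type).
struct TaskReport {
    bool ok;
    std::string diagnostic;  // empty on success
};

// Run one task inside a boundary: an exception ends the task, not the server.
TaskReport runIsolated(const std::function<void()>& task) {
    try {
        task();
        return {true, ""};
    } catch (const std::exception& e) {
        return {false, e.what()};
    } catch (...) {
        return {false, "unknown exception"};
    }
}

// The server loop stays small: it only dispatches tasks and logs failures.
std::vector<TaskReport> serve(const std::vector<std::function<void()>>& tasks) {
    std::vector<TaskReport> reports;
    for (const auto& task : tasks) {
        TaskReport report = runIsolated(task);
        if (!report.ok) std::cerr << "task failed: " << report.diagnostic << '\n';
        reports.push_back(report);
    }
    return reports;
}
```

The less code lives in `serve` itself, the less code can take the whole server down.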

Errors that are expected and cannot be prevented during normal operation

Errors that are expected and cannot be prevented during normal operation, such as an exception from a database or another service separate from the application, require very different handling. In these cases, the goal is not to fix the code, but to have the code handle the error where that makes sense, and otherwise to inform the users or operators who might be able to fix the problem.

In these cases, for example, the application may want to discard all the results accumulated so far and retry the operation. When accessing a database, using transactions can help with discarding the accumulated data. In other cases, it is useful to write the code with such retries in mind. The concept of idempotency may also be useful here.
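A retry of this kind could be sketched in C++ roughly as follows. `TransientError` and `retry` are hypothetical names, and the sketch assumes the operation is idempotent, so that repeating it after a partial failure is safe.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical exception type for failures expected during normal operation.
struct TransientError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Retry an idempotent operation up to maxAttempts times.
// Returns the result, or std::nullopt if every attempt failed transiently.
// Any other exception type (a likely bug) is deliberately not caught here.
template <typename T>
std::optional<T> retry(const std::function<T()>& op, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            return op();
        } catch (const TransientError&) {
            // Results of the failed attempt are discarded; try again.
        }
    }
    return std::nullopt;
}
```

When `retry` returns `std::nullopt`, the automatic attempts are exhausted and the decision passes to a person.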

When automatic retries are not enough to solve the problem, people should be informed. The user should be told that the operation failed; often the user can be given the option to retry. The user can then judge whether a retry is worthwhile, and can also change the input in a way that may help the retry succeed.

For this type of error, logging and possibly email notifications can be used to inform system operators. Unlike the logging of programming errors, the logging of errors expected during normal operation should be more concise, since such an error may occur many times and appear many times in the logs; operators will often analyze the pattern of many errors rather than focus on one single error.

Libraries and Applications

The above discussion of error types applies directly to application code. The other major context for error handling is library code. Library code still has the same two main types of errors, but it usually cannot or should not communicate directly with the user, and it knows less than the application code about the application context, including whether an immediate exit is acceptable.

As a result, there are differences in how libraries should handle logging, how they should handle errors that may be expected during normal operation, and how they should handle programming errors and other errors that should never occur.

With respect to logging, the library should, if possible, support logging in the format desired by the client application code. One valid approach is not to log at all and to let the application code do all the logging based on the error information provided by the library. Another approach is to use a configurable logging interface that allows the client application to supply the logging implementation, for example when the library is first loaded. In Java, for example, a library can use a logging interface and let the application worry about which logging implementation to configure.

For bugs and other errors that should never occur, libraries still cannot simply exit the application, as this may not be acceptable to the application. Rather, libraries should exit the library call, providing the caller with enough information to help diagnose the problem. The information can be provided in the form of an exception with a stack trace, or the library can log the information if a configurable logging approach is used. The application can then treat this like any other error of this type, usually by exiting or, in a server, by allowing the task's process or thread to exit, with the same logging or error reporting that would be done for programming errors in the application code.

Errors expected during normal operation should also be reported to the client code. In this case, as with this type of error when it is encountered in client code, the information associated with the error can be more concise. Typically, libraries should do less local handling of this type of error, relying more on the client code to decide things like whether to retry and how many times. The client code may then pass the retry decision on to the user.

Practical issues

Now that we have a philosophy, let us apply it to the practical questions you raised.

  • How do you decide whether an error should be handled locally or propagated to higher-level code?

If it is an error expected during normal operation, retry or possibly consult the user locally. Otherwise, propagate it to higher-level code.

  • How do you decide between logging an error and showing it to the user as an error message?

If it is an error expected during normal operation, and user input would help determine what action to take, get the user's input and log a concise message; if it is a programming error, give the user a brief notification and log more extensive information.

  • Is logging something that should only be done in application code? Or is it OK to do some logging from library code?

Logging from library code should be under the control of the client code. At best, the library should log to an interface for which the client provides the implementation.

  • In the case of exceptions, where should you generally catch them? In lower-level or higher-level code?

Exceptions that are expected during normal operation can be caught locally, and the operation retried or otherwise handled. In all other cases, exceptions should be allowed to propagate.

  • Should you strive for a unified error handling strategy across all layers of code, or try to develop a system that can adapt to a variety of error handling strategies (in order to be able to deal with errors from third-party libraries)?

The types of errors in third-party libraries are the same types of errors that occur in application code. Errors should be handled primarily according to which type of error they represent, with the appropriate adjustments for library code.

  • Does it make sense to create a list of error codes? Or is that old-fashioned these days?

Application code should provide a full description of the error in the case of programming errors and a concise description in the case of errors that can occur during normal operation; in either case, a description is usually more appropriate than an error code. Libraries can provide an error code as a way of indicating whether an error is a programming or other internal error, or an error that can occur during normal operation, with the latter type perhaps subdivided more finely; however, an exception hierarchy may be more useful than an error code in languages where one is possible. Note that applications launched from the command line can act as libraries for shell scripts.
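In C++, the exception hierarchy mentioned above could be sketched along the following lines. All type names and the `decide` helper are hypothetical; the sketch assumes the standard split between `std::logic_error` (bugs) and `std::runtime_error` (failures expected in operation).

```cpp
#include <stdexcept>

// Bugs and other internal errors that should never occur.
struct InternalError : std::logic_error {
    using std::logic_error::logic_error;
};

// Errors expected during normal operation ...
struct OperationalError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
// ... subdivided more finely where the client can act on the difference.
struct ConnectionLost : OperationalError {
    using OperationalError::OperationalError;
};
struct Timeout : OperationalError {
    using OperationalError::OperationalError;
};

enum class Action { Proceed, Retry, AskUser, Abort };

// Client-side policy: the reaction is chosen from the exception's type.
template <typename Call>
Action decide(Call&& call) {
    try {
        call();
        return Action::Proceed;
    } catch (const ConnectionLost&) {
        return Action::Retry;    // most specific handler first
    } catch (const OperationalError&) {
        return Action::AskUser;  // any other expected failure
    } catch (const std::logic_error&) {
        return Action::Abort;    // a bug: surface it, do not recover
    }
}
```

The catch clauses play the role that a switch over error codes would play, with the compiler resolving the subdivision.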

+8
May 04 '14 at 17:29

Disclaimer: I do not know of any theory of error handling. I have, however, thought about the subject repeatedly as I explored various programming languages and paradigms, and as I experimented with programming language designs (and discussed them). What follows is a summary of my experience so far, with objective arguments.

Note: this should cover all the questions, but I did not try to address them in order, preferring a structured presentation instead. At the end of each section, I give a brief answer to the questions that section addressed, for clarity.




Introduction

As a preamble, I would like to note that, whatever is being discussed, some constraints have to be taken into account when designing a library (or reusable code).

The author cannot hope to foresee how the library will be used, and should therefore avoid strategies that make integration harder than it needs to be. The most egregious defect would be relying on globally shared state; thread-local shared state can also be a nightmare when interacting with coroutines / green threads. The use of such coroutines and threads also highlights that synchronization is best left to the user: in single-threaded code it means none at all (best performance), while with coroutines and green threads the user is best placed to implement (or reuse existing implementations of) dedicated synchronization mechanisms.

That said, when a library is for internal use only, global or thread-local variables can be convenient; if used, they should be clearly documented as a technical limitation.




Logging

There are many ways to log messages:

  • with additional information such as timestamp, process ID, thread ID, server / IP name, ...
  • via synchronous calls or with an asynchronous mechanism (and an overflow handling mechanism)
  • in files, databases, distributed databases, dedicated log servers, ...

As a library author, you should make sure the logs can be integrated into the client's infrastructure (or turned off). This is best achieved by letting the client provide hooks so that it handles the logs itself. My recommendation is:

  • to provide 2 hooks: one to decide whether to log or not, and one to actually log (the message is formatted and the second hook is called only when the client has decided to log)
  • to provide, on top of the message: the severity (or level), the file name, line number, and function name if the library is open source, or otherwise the logical module (if there are several)
  • to write, by default, to stdout and stderr (depending on severity) until the client explicitly says otherwise

I would note that, following the guidelines laid out in the introduction, synchronization is left to the client.
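A minimal sketch of the two-hook recommendation might look as follows in C++. `Severity`, `LogHooks` and `DigitParser` are hypothetical names invented for illustration; the hooks are stored per object rather than globally, and no synchronization is attempted.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

enum class Severity { Debug, Info, Warning, Error };

struct LogHooks {
    // Hook 1: should a message of this severity be logged at all?
    std::function<bool(Severity)> shouldLog = [](Severity) { return true; };
    // Hook 2: the sink. Default: stderr for warnings and errors, stdout otherwise.
    std::function<void(Severity, const char*, int, const std::string&)> write =
        [](Severity s, const char* file, int line, const std::string& msg) {
            std::ostream& out = (s >= Severity::Warning) ? std::cerr : std::cout;
            out << file << ':' << line << ": " << msg << '\n';
        };
};

// Hypothetical library class: parses digits and skips everything else.
class DigitParser {
public:
    explicit DigitParser(LogHooks hooks = {}) : hooks_(std::move(hooks)) {}

    int parse(const std::string& text) const {
        int value = 0;
        for (char c : text) {
            if (c < '0' || c > '9') {
                log(Severity::Warning, __FILE__, __LINE__, [&] {
                    std::ostringstream os;
                    os << "non-digit character '" << c << "' skipped";
                    return os.str();
                });
                continue;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

private:
    // The message is formatted only after the first hook has said yes.
    template <typename MakeMessage>
    void log(Severity s, const char* file, int line, MakeMessage&& make) const {
        if (hooks_.shouldLog && hooks_.shouldLog(s) && hooks_.write) {
            hooks_.write(s, file, line, make());
        }
    }

    LogHooks hooks_;
};
```

A client that wants silence passes a `shouldLog` hook that returns `false`, in which case the formatting cost is never paid.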

Regarding whether to log errors: do not log (as errors) what you already report through your API; you may, however, still log the details at a lower severity. The client can decide whether or not to report the error when handling it, and may, for example, choose not to report it if it was just a speculative call.

Note: some information should never make it into the logs, and some other items are best obfuscated. For example, passwords should not be logged, and credit card numbers or passport / social security numbers are best obfuscated (at least in part). In a library designed for such sensitive information, this can be done during logging; otherwise the application should take care of it.

Is logging something that should only be done in application code? Or is it OK to do some logging from library code?

Application code should decide the policy. Whether a library logs or not depends on it.




Continue after an error?

Before talking about reporting errors, the first question we should ask is whether to report the error at all (so that it can be handled), or whether things are so wrong that aborting the current process is clearly the best policy.

This is, of course, a complex topic. In general, I would advise designing so that continuing is an option, with a purge/reset if necessary. If this cannot be achieved in certain cases, then those cases should cause the process to abort.

Note: on some systems, a memory dump of the process can be obtained. If the application handles sensitive data (passwords, credit cards, passports, ...), it is best to deactivate this in production (it can still be used during development).

Note: it can be interesting to have a debug switch that turns some of the error reports into aborts with a memory dump, to help debugging during development.




Reporting an error

The occurrence of an error means that the contract of a function/interface could not be fulfilled. This has several consequences:

  • the client must be warned, so the error must be reported
  • no partially incorrect data should escape into the wild

The last point is covered later; for now let's focus on reporting the error. Under no circumstances should the client be able to ignore this report accidentally. This is why using error codes is such an abomination (in languages where return values can be ignored):

 ErrorStatus_t doit(Input const* input, Output* output); 

I know of two schemes that require explicit action on the client side:

  • exceptions
  • result types (optional<T>, either<T, U>, ...)

The former is well known; the latter is very often used in functional languages and was introduced in C++11 under the guise of std::future<T>, although other implementations exist.

I advise preferring the latter when possible, as it is easier to reason about, but falling back to exceptions when no result is expected. Contrast:

 Option<Value&> find(Key const&);
 void updateName(Client::Id id, Client::Name name);

In the case of write-only operations such as updateName, the client has no use for a result. One could be introduced, but it would be easy to forget to check it.
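The contrast can be made runnable with `std::optional` (C++17) standing in for the `Option` type above. `ClientDirectory` and its members are hypothetical, and integer ids and string names replace `Client::Id` and `Client::Name` to keep the sketch short.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

class ClientDirectory {
public:
    void add(int id, std::string name) { names_[id] = std::move(name); }

    // Query: absence is an ordinary outcome, so it is encoded in the return
    // type and the caller has to unwrap it to reach the value.
    std::optional<std::string> findName(int id) const {
        auto it = names_.find(id);
        if (it == names_.end()) return std::nullopt;
        return it->second;
    }

    // Write-only operation: there is no result to carry a failure,
    // so the failure is reported with an exception.
    void updateName(int id, std::string name) {
        auto it = names_.find(id);
        if (it == names_.end()) throw std::out_of_range("unknown client id");
        it->second = std::move(name);
    }

private:
    std::map<int, std::string> names_;
};
```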

Falling back to exceptions is also appropriate when a result type is impractical or insufficient to convey the details:

 Option<Value&> compute(RepositoryInterface&, Details...);

In this case, the externally defined callback has a nearly infinite list of potential failures. An implementation may use the network, a database, the file system, ... so, in order to report errors accurately:

  • the externally defined callback should be expected to report errors via exceptions when the interface is insufficient (or impractical) to convey the full error information
  • functions based on this abstract callback should be transparent to these exceptions (let them pass through, unmodified)

The goal is to let the exception bubble up to the layer where the implementation of the interface was chosen (at least), since only at that level is there a chance of interpreting the thrown exception correctly.

Note: the externally defined callback is not forced to use exceptions; we should just expect that it might.




Using an error

To make use of an error report, the client needs enough information to make a decision. Structured information, such as error codes or exception types, should be preferred (for automatic actions), and additional information (message, stack, ...) can be provided in an unstructured way (for humans to investigate).

It would be best if a function clearly documented all its possible failure modes: when they occur and how they are reported. However, especially when arbitrary code is executed, the client should be prepared to deal with unknown codes / exceptions.

A notable exception is, of course, result types: boost::variant<Output, Error0, Error1, ...> provides a compiler-checked exhaustive list of known failure modes ... although a function returning this type can still throw.
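As an illustration of the compiler-checked list, here is a small sketch using `std::variant` (C++17) as a stand-in for `boost::variant`. The error types, `lookup` and `Describe` are hypothetical.

```cpp
#include <string>
#include <variant>

struct NotFound { std::string key; };
struct PermissionDenied { std::string user; };

// Either an int result or one of the two known failure modes.
using LookupResult = std::variant<int, NotFound, PermissionDenied>;

LookupResult lookup(const std::string& user, const std::string& key) {
    if (user != "admin") return PermissionDenied{user};
    if (key != "answer") return NotFound{key};
    return 42;
}

// One overload per alternative: removing any of them makes the
// std::visit call below fail to compile.
struct Describe {
    std::string operator()(int v) const { return "value " + std::to_string(v); }
    std::string operator()(const NotFound& e) const { return "no such key: " + e.key; }
    std::string operator()(const PermissionDenied& e) const {
        return "access denied for " + e.user;
    }
};

std::string describe(const LookupResult& r) { return std::visit(Describe{}, r); }
```

Adding a new error alternative to `LookupResult` forces every visitor to be updated, which is the exhaustiveness the answer refers to.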

How to decide how to register an error or show it as an error message to the user?

The user should always be warned when their request could not be fulfilled; however, the message displayed should be user-friendly (understandable). If possible, advice or workarounds should also be offered. The details are for the teams that investigate.




Error recovery?

Last, but certainly not least, comes the truly scary part about errors: recovery.

This is what (real) databases are so good at: transactional semantics. If anything unexpected happens, the transaction is aborted as if nothing had happened.

In the real world, things are not so simple. The simple example of cancelling an email that has already been sent: too late. Protocols may exist, depending on your application domain, but that is outside the scope of this discussion. The first step, however, is the ability to restore a sane in-memory state; and this is far from simple in most languages (and STM can only do this today).

First of all, an illustration of the problem:

 void update(Client& client, Client::Name name, Client::Address address) {
     client.update(std::move(name));
     client.update(std::move(address)); // Throws
 }

Now, after the address update has failed, I am left with a half-updated client. What can I do?

  • attempting to undo all the updates that did occur is close to impossible (the undo itself may fail)
  • copying the state before performing any single update is a performance hog (assuming we can even swap it back in reliably)

In any case, the bookkeeping required is such that bugs will creep in.

And worst of all: no safe assumption can be made about the extent of the corruption (except that the client is now corrupted). Or, at least, no assumption that will withstand the test of time (and code changes).

As is often the case, the only way to win is not to play.




Possible Solution: Transactions

Wherever possible, the key idea is to define macro functions that either fail or produce the expected result. These are our transactions. And their form is invariant:

 Either<Output, Error> doit(Input const&);

 // or

 Output doit(Input const&); // throws in case of error

A transaction does not modify any external state, so if it fails to produce a result:

  • the outside world has not changed (there is nothing to roll back)
  • there is no partial result to observe

Any function that is not a transaction should be regarded as having corrupted everything it touched, and thus the only sane way to deal with an error from a non-transactional function is to let it bubble up until a transaction layer is reached. Any attempt to deal with the error earlier is, in the end, doomed to fail.
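One possible shape for such a transaction in C++ is sketched below. `Client`, `validateAddress`, `updated` and `tryUpdate` are hypothetical, and the sketch assumes the state is cheap enough to copy, which the previous section warns is not always the case.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

struct Client {
    std::string name;
    std::string address;
};

// Hypothetical validation step that may throw.
inline void validateAddress(const std::string& address) {
    if (address.empty()) throw std::invalid_argument("empty address");
}

// Transaction-shaped function: it reads its input, touches no external
// state, and either returns a complete new Client or throws.
Client updated(const Client& current, std::string name, std::string address) {
    validateAddress(address);  // may throw; nothing has been modified yet
    Client next = current;
    next.name = std::move(name);
    next.address = std::move(address);
    return next;
}

// The caller commits only a complete result.
bool tryUpdate(Client& client, std::string name, std::string address) {
    try {
        client = updated(client, std::move(name), std::move(address));
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // client was never touched
    }
}
```

The commit is a single assignment of an already complete value, so an observer sees either the old client or the new one, never a mix.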

How to decide whether to handle the error locally or propagate to higher-level code?

In case of exceptions, where do you usually catch them? In the code of the lower level or higher level?

Deal with them whenever it is safe to do so and there is value in doing so. Most notably, it is fine to catch an error, check whether it can be dealt with locally, and then either deal with it or pass it on.




Should you strive for a unified error handling strategy across all layers of code, or try to develop a system that can adapt to a variety of error handling strategies (in order to be able to deal with errors from third-party libraries)?

I did not address this question earlier, but I believe it is clear that the approach I outlined is already dual, since it consists of both result types and exceptions. As such, dealing with third-party libraries should be a cinch, though I do advise wrapping them anyway for other reasons (third-party code is better insulated behind a business-oriented interface tasked with dealing with the impedance mismatch).

+6
May 04 '14 at 1:18

How to decide whether to handle the error locally or propagate to higher-level code?

If the exception breaks the operation of a method, it is a good approach to throw it to a higher level. If you are familiar with MVC, exceptions should be evaluated in the controller.

How do you decide between logging an error and showing it to the user as an error message?

Logging errors together with all the information about them is a good approach. If the error aborts the operation, or the user needs to know that an error has occurred, you must display it to the user. Note that for a Windows service, logs are very important.

Is logging something that should only be done in application code? Or is it OK to do some logging from library code?

I see no reason to log errors in a DLL. It should only throw errors. Of course, there may be a specific reason to do so. In our company, a DLL logs information about the process (not only errors).

In case of exceptions, where do you usually catch them? In the code of the lower level or higher level? A similar question: at what point should you stop spreading the error and deal with it?

In the controller.

Edit: I need to explain this a bit in case you are not familiar with MVC. Model View Controller is a design pattern. In the model, you develop the application logic. In the view, you display content to the user. In the controller, you receive user events and call the model for the relevant function, then call the view to display the result to the user.

Suppose you have a form with two text boxes, a label, and a button named Add. As you might have guessed, this is your view. The Button_Click event is defined in the controller, and the add method is defined in the model. When the user clicks, the Button_Click event is fired and the controller calls the add method. Here, the text box values may be empty, or they may be letters instead of numbers. An exception occurs in the add function, and this exception is thrown. The controller handles it and displays an error message in the label.

Should you strive for a unified error handling strategy across all layers of code, or try to develop a system that can adapt to a variety of error handling strategies (in order to be able to deal with errors from third-party libraries)?

I prefer the second. It would be easier. And I do not think you can build something generic for error handling, especially across different libraries.

Does it make sense to create a list of error codes? Or is it old fashioned these days?

It depends on how you use it. In a single application (a website, a desktop application), I do not think it is necessary. But if you are developing a web service, how will you inform users of errors? There it is always important to provide an error code.

 if (error.Message == "User Login Failed")
 {
     // do something.
 }

 if (error.Code == "102")
 {
     // do something.
 }

Which one do you prefer?

And these days there is another style of error code:

 if (error.Code == "LOGIN_ERROR_102") // wrong password
 {
     // do something.
 }

Others might be: LOGIN_ERROR_103 (for example, the user account has expired), etc.

These are also human-readable.

+4
Jan 01 '09 at 22:29

My view on logging (or taking other actions) from library code is: NEVER.

A library should not impose a policy on its user, and the user may have INTENDED the error to occur. Perhaps the program deliberately provoked a specific error, expecting it to occur, in order to test some condition. Logging this error would be misleading.

Logging (or anything else) imposes a policy on the caller, which is bad. Moreover, if a harmless error condition (one that would, for example, be ignored or safely handled by the caller) were to occur at a high frequency, the volume of logs could mask any legitimate errors or cause stability problems (filling disks, excessive I/O, etc.).

+4
Jan 01 '09 at 22:53
  1. Always handle an error as soon as possible. The closer you are to where it occurred, the more likely you are to be able to do something meaningful, or at least to find out where and why it happened. In C++, this is not just a matter of context; in many cases it cannot be determined at all.

  2. In general, you should always stop the application if an error occurs that is a real bug (not something like failing to find a file, which should not really be considered an error but is reported as one). It is not going to sort itself out, and once the application is broken it will cause errors that cannot be debugged, because they have nothing to do with the area in which they occur.

  3. Why not?

  4. See 1.

  5. See 1.

  6. You need to keep things simple, or you will regret it. More important than handling errors at run time is testing to avoid them.

  7. It is like asking whether it is better to centralize or not to centralize. In some cases it can make a lot of sense, but be a waste of time in others. For something like a loadable lib/module that may have data errors (garbage in, garbage out), it makes tons of sense. For more general error handling or catastrophic errors, less so.

+2
Jan 02 '09 at 7:05

Error handling is not accompanied by a formal theory. It is too "implementation-specific" a topic to be regarded as a scientific field (to be fair, there is a great debate as to whether programming is a science in its own right).

Nevertheless, it makes up a good part of a developer's work (and therefore of his or her life), so practical approaches and technical guidance on the topic have been developed.

A good view on the topic is presented by A. Alexandrescu in his talk Systematic Error Handling in C++.

I have a repository on GitHub where the techniques presented are implemented.

Basically, what AA does is implement the class

 template<class T> class Expected { /* Implementation in the GitHub link */ }; 

which is intended to be used as a return value. The class can hold either a return value of type T or a (pointer to an) exception. The exception can be thrown either explicitly or on request, but the rich error information is always available. A usage example would look like this:

 int foo();
 // ....
 Expected<int> ret = foo();
 if (ret.valid()) {
     // do the work
 } else {
     // either use the info of the exception
     // or throw the exception (e.g. in an exception-"friendly" codebase)
 }
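For readers without access to the repository, a class of this kind could be sketched as below. This is not the implementation from the talk or the repository; it is a minimal stand-in built on `std::variant` (C++17), and `fromException`, `get` and `parsePositive` are names assumed here for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <utility>
#include <variant>

template <class T>
class Expected {
public:
    // Success: store the value (implicit, so a function can `return value;`).
    Expected(T value) : state_(std::in_place_index<0>, std::move(value)) {}

    // Failure: capture any exception object without throwing it.
    template <class E>
    static Expected fromException(E error) {
        return Expected(std::make_exception_ptr(std::move(error)));
    }

    bool valid() const { return state_.index() == 0; }

    // Returns the value, or rethrows the stored exception on request.
    T& get() {
        if (!valid()) std::rethrow_exception(std::get<1>(state_));
        return std::get<0>(state_);
    }

private:
    explicit Expected(std::exception_ptr error)
        : state_(std::in_place_index<1>, std::move(error)) {}

    std::variant<T, std::exception_ptr> state_;
};

// Hypothetical function using Expected as its return type.
Expected<int> parsePositive(int raw) {
    if (raw <= 0) {
        return Expected<int>::fromException(std::invalid_argument("not positive"));
    }
    return raw;
}
```

A caller that prefers error values checks `valid()`; a caller in an exception-friendly codebase simply calls `get()` and lets the stored exception propagate.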

While building this framework for error handling, AA walks us through the techniques and designs that produce successful or poor error handling, and through what works and what does not. He also gives his definitions of "error" and "error handling".

+2
May 2 '14 at 16:13

Here's a terrific blog post that explains how error handling should be done. http://damienkatz.net/2006/04/error_code_vs_e.html

How to decide whether to handle the error locally or propagate to higher-level code? As Martin Beckett says in another answer, it is about whether it is possible to fix the error here.

How do you decide between logging an error and showing it to the user as an error message? You should probably never show the raw error to the user, if you think about it. Rather, show them a well-formed message that explains the situation without giving too much technical information. Then log the technical information, especially if the error occurred while processing input. If your code does not know how to handle erroneous input, then that MUST be fixed.

Is logging something that should only be done in application code? Or is it OK to do some logging from library code? Logging inside library code is not useful, because you may not even have written that code. However, the application can log its interactions with library code and even detect errors through statistics.

In case of exceptions, where do you usually catch them? In the code of the lower level or higher level? See Question 1.


+1
01 . '09 22:35

My two cents.


+1
01 . '09 22:57


+1
02 '14 19:58


0
01 . '09 22:16


0
02 . '09 1:04

" : , .NET" . . 7 "". , .

-Krip

0
02 . '09 1:12


