Erlang's let-it-crash philosophy - applicable elsewhere?

Erlang's (or Joe Armstrong's?) advice NOT to use defensive programming and to let processes crash (rather than pollute your code with needless guards trying to keep track of the wreckage) makes so much sense to me now that I wonder why I wasted so much effort handling errors over the years!

What I wonder is whether this approach applies only to platforms like Erlang. Erlang has a VM with simple native support for process supervision trees, and restarting processes is really fast. Should I spend my development effort (when not in the Erlang world) on recreating supervision trees, rather than bogging myself down with top-level exception handlers, error codes, null results, etc. etc. etc.?

Do you think this change of approach would work well in (say) the .NET or Java space?

+49
java erlang defensive-programming
Dec 08 '10 at
6 answers

It is applicable everywhere. Whether or not you write your software in a "let it crash" pattern, it will crash anyway, e.g., when hardware fails. "Let it crash" applies anywhere you need to withstand reality. Quoth James Hamilton:

If a hardware failure requires any immediate administrative action, the service simply won't scale cost-effectively and reliably. The entire service must be capable of surviving failure without human administrative interaction. Failure recovery must be a very simple path and that path must be tested frequently. Armando Fox of Stanford has argued that the best way to test the failure path is never to shut the service down normally. Just hard-fail it. This sounds counter-intuitive, but if the failure paths aren't frequently used, they won't work when needed.

This does not mean "never use guards," though. But don't be afraid to crash!

+29
Dec 08 '10 at 23:01

Yes, it is applicable everywhere, but it is important to note the context in which it is meant to be used. It does not mean that the application as a whole crashes, which, as @PeterM pointed out, can be catastrophic in many cases. The goal is to build a system which as a whole never crashes but can handle errors internally. In our case it was telecom systems, which are expected to have downtime on the order of minutes per year.

The basic design is to layer the system and isolate central parts of it to monitor and control the other parts, which do the work. In OTP terminology we have supervisor and worker processes. Supervisors have the job of monitoring the workers, and other supervisors, with the goal of restarting them in the correct way when they crash, while the workers do all the actual work. Structuring the system properly in layers, using this principle of strictly separating functionality, allows you to move most of the error handling out of the workers and into the supervisors. You try to end up with a small, fail-safe error kernel which, if it is correct, can handle errors anywhere in the rest of the system. It is in this context that the let-it-crash philosophy is meant to be used.
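To make the supervisor/worker split concrete outside Erlang, here is a minimal Python sketch. It is not OTP and offers none of the isolation real Erlang processes give you; the names `supervise`, `make_flaky_worker` and `RestartLimitExceeded` are made up for illustration. The point is only that the worker has no error handling at all, while the restart policy lives in one small place.

```python
class RestartLimitExceeded(Exception):
    """Raised when a worker keeps crashing beyond the allowed restarts."""


def supervise(worker, max_restarts=3):
    """Run worker(); restart it from scratch whenever it crashes.

    The worker contains no error handling of its own. All recovery
    policy lives here, in the supervisor.
    """
    crashes = 0
    while True:
        try:
            return worker()
        except Exception as exc:  # every crash is handled the same way
            crashes += 1
            if crashes > max_restarts:
                # Give up and escalate to whoever supervises the supervisor.
                raise RestartLimitExceeded(
                    f"gave up after {crashes} crashes"
                ) from exc


def make_flaky_worker(failures):
    """Hypothetical worker that crashes `failures` times, then succeeds."""
    state = {"calls": 0}

    def worker():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("simulated crash")
        return f"done after {state['calls']} attempts"

    return worker
```

Calling `supervise(make_flaky_worker(2))` restarts the worker twice and then returns its result; a worker that never recovers ends in `RestartLimitExceeded`, which is the in-process analogue of a supervisor giving up and letting its own parent decide.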

You get the paradox where you think about errors and failures everywhere, with the goal of actually handling them in as few places as possible.

How best to handle an error depends, of course, on the error and the system. Sometimes it is best to try to catch errors locally within a process and handle them there, with the option of failing again if that does not work. If you have a number of worker processes cooperating, it is often best to crash them all and restart them all again. It is a supervisor which does this.
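The "crash them all and restart them all" case can be sketched roughly like this in Python. This is only an illustration under simplifying assumptions (workers run one after another in a single thread, and `run_group`, `Producer` and `FlakyConsumer` are hypothetical names), but it shows the key property: after any crash, every worker in the group is rebuilt with fresh state.

```python
def run_group(factories, max_restarts=3):
    """Run a group of cooperating workers, restarting all of them on any crash.

    `factories` are zero-argument callables that build fresh worker objects.
    Returns (results, restarts).
    """
    restarts = 0
    while True:
        workers = [make() for make in factories]  # fresh state for everyone
        try:
            return [w.run() for w in workers], restarts
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # escalate instead of looping forever


class Producer:
    """Hypothetical worker that accumulates state while it runs."""

    def __init__(self):
        self.items = []

    def run(self):
        self.items.extend([1, 2, 3])
        return sum(self.items)


class FlakyConsumer:
    """Hypothetical worker that crashes on the first attempt only."""

    attempts = 0  # shared across instances so it survives restarts

    def run(self):
        FlakyConsumer.attempts += 1
        if FlakyConsumer.attempts == 1:
            raise RuntimeError("simulated crash")
        return "consumed"
```

With `run_group([Producer, FlakyConsumer])` the consumer's crash also throws away the producer, so the producer's half-finished state from the failed attempt never leaks into the retry.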

You do need a language which generates errors/exceptions when something goes wrong, so that you can trap them or have them crash the process. Just ignoring error return values is not the same thing.
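A small Python illustration of that last distinction (both function names are invented for the example): with an error return value, nothing stops the caller from carrying on with garbage, whereas an exception either gets trapped or takes the process down.

```python
def parse_port_with_code(text):
    """Error-code style: returns -1 on bad input.

    Nothing forces the caller to check for -1.
    """
    return int(text) if text.isdecimal() else -1


def parse_port(text):
    """Exception style: bad input raises ValueError.

    The caller must either trap it or crash; it cannot be silently ignored.
    """
    return int(text)


# A caller that forgets to check the code keeps running with a bogus value:
bogus_port = parse_port_with_code("80a") + 1000

# The same mistake with exceptions stops right at the point of failure:
try:
    parse_port("80a")
    crashed = False
except ValueError:
    crashed = True
```

Here `bogus_port` quietly becomes a meaningless number, while `parse_port("80a")` fails loudly at the exact line where things went wrong, which is what a supervisor needs in order to react.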

+22
Dec 09 '10 at 22:29

It's called fail-fast. It's a good paradigm provided you have a team of people who can respond to the failure (and do so quickly).

In the NAVY, all pipes and electrical lines are mounted on the exterior of a wall (preferably on the more public side of the wall). That way, if there is a leak or a problem, it is more likely to be detected quickly. In the NAVY, people are punished for failing to respond to a failure, so it works very well: failures are detected quickly and acted upon quickly.

In a scenario where nobody can act on a failure quickly, it becomes a matter of judgment whether it is more beneficial to let the failure stop the system or to swallow the failure and try to continue onward.

+5
Dec 08 2018-10-12T00:

I write programs that rely on data from real-world situations, and if they crash they can do big $$ in physical damage (not to mention big $$ in lost revenue). I would be out of a job if I did not program defensively.

With that said, I think Erlang must be a special case, in that not only can it restart things instantly, but a restarted program can pop up, look around and say "ahhh .. that's what I was doing!"
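Outside Erlang, one common way to get the "that's what I was doing" effect is to checkpoint progress and read the checkpoint on every start. The Python sketch below is an assumption about how that might look, not a description of what Erlang/OTP does; the function names and the JSON file layout are made up.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next item to process (0 if starting fresh)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file, then swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)  # rename over the old checkpoint


def process(items, path, handle):
    """Process items in order, resuming from the last checkpoint.

    There is no separate recovery mode: a fresh start and a restart
    after a crash run exactly the same code.
    """
    for i in range(load_checkpoint(path), len(items)):
        handle(items[i])  # may crash; no guard here on purpose
        save_checkpoint(path, i + 1)
```

If `handle` blows up on the third item, the checkpoint still says "next is index 2", so whoever restarts `process` picks up at the failed item instead of redoing the first two. Note that an item can be handled twice if the crash lands between `handle` and `save_checkpoint`, so `handle` should be safe to repeat.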

+5
Dec 08 '10 at 23:03

My colleagues and I thought about the topic not so much technology-wise but more from a domain perspective and with a safety focus.

The question is "Is it safe to let it crash?" or, better, "Is it even possible to apply a robustness paradigm like Erlang's "let it crash" to safety-related software projects?"

To find an answer, we did a small research project using a close-to-reality scenario with an industrial and especially medical background. Take a look here ( http://bit.ly/Z-Blog_let-it-crash ). There is even a paper to download. Tell me what you think!

Personally, I think it is applicable in many cases and even desirable, especially when there is a lot of error handling to do (safety-related systems). You cannot always use Erlang (missing real-time features, no real embedded support, customer wishes ...), but I am pretty sure you can implement it otherwise (e.g., using threads, exceptions, message passing). I have not tried it yet, but I would like to.
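For what it's worth, a rough Python sketch of the "threads, exceptions, message passing" idea could look like the following. It is untested against any real safety requirement, and `run_supervised` and its message tuples are invented for the example. The worker thread never handles task errors itself: it dies and tells the supervisor by message, and the supervisor decides whether to start a replacement.

```python
import queue
import threading


def run_supervised(task, inputs, max_restarts=3):
    """Apply `task` to each input in a worker thread, restarting it on crashes.

    The input that caused a crash is dropped rather than retried.
    Returns (results, restarts).
    """
    inbox = queue.Queue()    # jobs waiting for the worker
    mailbox = queue.Queue()  # messages from the worker to the supervisor
    for item in inputs:
        inbox.put(item)

    def worker():
        try:
            while True:
                try:
                    item = inbox.get_nowait()
                except queue.Empty:
                    mailbox.put(("exit", None))  # normal termination
                    return
                mailbox.put(("result", task(item)))
        except Exception as exc:
            # The thread ends here; the supervisor learns why by message.
            mailbox.put(("crash", exc))

    results, restarts = [], 0
    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, payload = mailbox.get()
        if kind == "result":
            results.append(payload)
        elif kind == "exit":
            return results, restarts
        else:  # "crash"
            restarts += 1
            if restarts > max_restarts:
                raise payload  # escalate the original exception
            threading.Thread(target=worker, daemon=True).start()
```

For example, `run_supervised(lambda x: 10 // x, [1, 0, 5])` loses the job that divides by zero, restarts the worker once, and still delivers the other two results. Threads share memory, so this gives far weaker isolation than Erlang processes; a crashed worker can still leave shared data in a bad state.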

+4
Sep 14 '13 at 12:10

IMHO some developers handle/wrap checked exceptions with code that adds little value. It is often simpler to let a method throw the original exception unless you are going to handle it and add some value.

+2
Dec 08 2018-10-12T00:


