Isabelle: Sledgehammer finds evidence, but it fails

Question

Isabelle: Sledgehammer finds evidence, but it fails

Often there is a problem that the sledgehammer finds evidence, but when I insert it, it does not end. I think sledgehammer is one of Isabelle's most important parts, but then it gets annoying if the proof fails.

In the Sledgehammer study guide, there is a small chapter, “Why Can't Metis Recover Proof?”

It lists:

Try the isar_proofs option to get a step-by-step Isar proof where each step is justified by metis . Since the steps are quite small, metis more likely to be able to reproduce them.
Try the smt proof method instead of metis . This is usually stronger, but you need to either have Z3 to play the evidence, trust the SMT solver, or use certificates.
Try blast or auto proof methods, passing the necessary facts through unfolding , using , intro: elim: dest: or simp: as needed.

The problem is that the first option makes the proof more detailed, and also involves manual intervention. The second option rarely works.

What about the third option. Is there any easy-to-use heuristic I can apply?

What is the difference between unfolding and using ? Also, are there any recommendations for using intro: elim: and dest: from the failed metis proof?

Partial example

 proof- have "(det (?lm)) = (det (transpose ?lm))" by (smt det_transpose) then have "(det (?lm)) = [...][not shown]" unfolding det_transpose transpose_mat_factor_col by auto then show ?thesis [...][not shown] qed

I would like to get rid of the first line of the proof, since the line seems trivial. If I delete the first line, sledgehammer will still find the proof, but this proof found fails (does not complete).

+4

theorem-proving isabelle

mrsteve Jun 13 '13 at 16:02

source share

3 answers

You ask a few questions, but I will take your headline and the second paragraph as the essence of your main complaint, where I end up giving a long answer that can be summarized,

Sledgehammer is part of a tripartite arsenal,
you become more experienced, with endless experiments, as well as trial and error - this is a heuristic,
not using many of the evidence that Sledgehammer returns is a big part of using Sledgehammer, and
The minimize and preplay_timeout can save you some time and frustration by automatically reproducing evidence that gives you time information, and sometimes showing that the evidence found fails.

Starting in the second paragraph, you say:

Often I have a problem that the Sledgehammer finds evidence. But then I will try, but the proof does not stop. I think Sledgehammer is one of the most important parts of Isabel, ...

The sledgehammer is important, but I consider it part of the tripartite arsenal, where there are three parts:

Detailed validation steps using natural inference.
Automatic verification methods such as auto , simp , rule , etc. Most of this will create your own rules for rewriting simp text and learn to use theorems with rule and many other methods of automatic proofing.
Sledgehammer causing automatic theoretical predictors (ATP). Using steps 1 and 2, with experience, are used to configure Sledgehammer. Experience a lot. You can use auto to simplify so that Sledgehammer succeeds, but you cannot use auto because it will extend the formulas before Sledgehammer has no chance of success.

... but then it becomes annoying if the proof fails.

So, your expectations and my expectations for Sledgehammer diverge. These days, if it annoys me, it annoys me that I will have to work more than 30 seconds to prove the theorem. If I am very disappointed that a certain proof of Sledgehammer fails, it is because I tried to prove the theorem for several hours or days without success.

Using a sledgehammer does not find evidence, but find good evidence

Automation can sometimes alleviate frustration. Clicking on the Sledgehammer's proof, only to find out that this fails, would be unpleasant. Here's how I'm currently using Sledgehammer if I don't start desperately needing proof:

 sledgehammer_params[minimize=smart,preplay_timeout=10,timeout=60,verbose=true, max_relevant=smart,provers=" remote_vampire metis remote_satallax z3_tptp remote_e remote_e_tofof spass remote_e_sine e z3 yices "]

The parameters minimize=smart and preplay_timeout=10 are associated with the reproduction of Sledgehammer evidence once it is discovered. Not the use of the many evidence that Sledgehammer finds is a large part of using Sledgehammer, but the evidence of reproduction is a large part of the rejection of evidence.

I myself am not very versed in Sledgehammer evidence, which does not end there, but probably because of what I choose to start with.

My first criteria for proving Sledgehammer is that it will be fast enough, and therefore, when Sledgehammer reports that he found evidence that it is more than 3 seconds, I do not even try to use it, if I do not desperately want to find whether it is possible to prove the theorem .

Using Sledgehammer for me usually happens as follows:

State the theorem and see if I succeed with the Sledgehammer.
If Sledgehammer gives me proof that 30 milliseconds or less, then I think it's good proof, but I'm still experimenting with try and the automatic proof methods in section 9.4.4, p. 208, isar-ref.pdf, Many times I can get proof up to 5 ms or less.
A metis proof of a total time of more than 100 ms, I am ready to work 30 minutes or more to try to get a faster proof.
A metis proof from 200 ms to 500 ms, I will resort to what I know to try to bring it to a level below 100 ms, which many times means converting to detailed proof.
A smt or metis proof for more than 1 second. I consider it good only as temporary evidence.
The proof in the output panel is that Sledgehammer reports that he is more than 3 seconds, I usually don’t even try, because even if he finishes work, I still have to work to find another proof, so I would rather spend time trying to find good evidence.

Heuristic option

You speak,

What about the third option. Is there any easy-to-use heuristic I can apply?

Heuristic:

"as needed" ,

which means that heuristic "uses a sledgehammer as part of a three-target arsenal."

Heuristics also "reads a lot of tutorials and documentation so you have a lot of other things to use with Sledgehammer." The sledgehammer is strong, but it is not infinitely powerful, and for some theorems you can use your own simp rules to prove in 0ms with apply(simp) or apply(auto) that Sledgehammer will never prove.

For myself, I have up to 150-200 theorems, so “as needed” makes much more sense to me than before. Basically, you are trying to configure Sledgehammer the way it should be configured.

A way to create a sledgehammer will sometimes mean starting auto or simp first, but sometimes not, because repeatedly running auto or simp will result in a Sledgehammer error.

But sometimes you don’t even need Sledgehammer's metis proof, except for the preliminary proof, until you find better proof, which for me usually means faster proof using automatic proof methods.

I am not an authority on Sledgehammer, but it seems that Sledgehammer is good for comparing hypotheses and conclusions from old theorems, and hypotheses and conclusions are used for the new theorem. That this is not good proves formulas that I have greatly expanded using simp and auto .

I continue with a lengthy heuristic that focuses on a sledgehammer:

Use the Sledgehammer to begin the proof process by proving some theorems with the Sledgehammer that you otherwise do not know how to prove.
Turn your equivalent theorems into rewrite rules for simp for use with automatic simp methods such as simp , auto , fastforce , etc., as described in chapter 9 of the .pdf tutorial.
Use some of your theorems for conditional rewrite rules for use with intro and rule .
The last two steps are used to completely solve the proof step or to use the Sledgehammer “as needed” to configure. A sledgehammer never ceases to be useful, no matter how much you know, and it is extremely useful when you don’t know much, but only a sledgehammer is not the path to success.
If Sledgehammer cannot prove the theorem, resort to a detailed proof, starting with a detailed proof with bare bones. Sometimes breaking the if-and-only-if into two conditions allows Sledgehammer to easily prove two conditions if he could not prove the if-and-only-if.
After you have proven many things, go back and optimize your evidence. Sometimes, with all the rewriting rules you created, simp and auto will magically prove things, and you will get rid of some of the metis evidence that Sledgehammer has found for you. Sometimes you use Sledgehammer to find metis evidence that is even faster.

Use this command to optimize synchronization:

 ML_command "Toplevel.timing := true"

There is another SO post, which is described in more detail.

+3

user2503085 Jun 19 '13 at 10:10

source share

I can answer your subquery "What is the difference between unfolding and using ?". Roughly speaking, it works like this.

Suppose lemma foo has the form x = a+b+c . If you write

 unfolding foo

in your proof, then all occurrences of x will be replaced by a+b+c . On the other hand, if you write

 using foo

then x=a+b+c will be added to your list of assumptions.

+1

John wickerson Jun 19 '13 at 6:43

source share

chris · Accepted Answer · 2013-06-20T02:06:23+0000

Regarding your statement, the sledgehammer is one of Isabelle's most important parts: You will never need a sledgehammer to succeed with proof. But, of course, the sledgehammer is very convenient and can save a lot of tedious reasoning. Thus, this is definitely a very important part in order to make Isabelle more suitable for people who have not spent many years using it (and even for these sledgehammers every day is more productive).

Getting to your question

Try blast or auto proof methods, passing the necessary facts through unfolding , using , intro: elim: dest: or simp: if necessary.
[...]
So what about [this] option. Is there any easy-to-use heuristic I can apply?

Indeed, there are:

unfolding : This (recursively) expands the equations, i.e. very similar to apply (simp only: ...) . Heuristic is when you don't get the expected result with simp: ... try unfolding ... instead (maybe this affects other equations).

using : used to add additional assumptions to the current sub-target. Heuristics is when a fact does not correspond to one of the patterns below, try using .

intro: This is used for introduction rules, i.e. forms, when whenever certain assumptions are fulfilled, some connecting (or more general constant) can be introduced. Example: A ==> B ==> A & B (where the entered constant is op & ).

elim: This is used for exception rules, i.e. from the form that from the presence of some connective (or more constant) some facts can be concluded as additional assumptions. Example: A & B ==> (A ==> B ==> P) ==> P (where the constant op & eliminated in favor of explicitly having A and B as assumptions). Pay attention to the general form of the conclusion (which is not connected with the main point of A & B ), it is important not to lose provability (see also dest: .

dest: This is used for destruction rules, i.e. form, which, due to the presence of a certain constant, can be directly concluded. Example: A & B ==> B (Note that the information contained in A is lost as a result, unlike the elim: example.)

simp: This is used for simplification rules, i.e. (conditional) equations, which are always applied from left to right (therefore, it is sometimes useful to add [symmetric] to the fact in order to apply it from right to left, but be careful with inactivation, since looping differentiations are easy to introduce).

Having said that, often it’s just an experience that allows you to decide how best to use this fact inside the evidence. What I usually do when I have sledgehammer evidence that is too slow in Isar is to verify the facts that use the evidence found. Then classify them as described above, call auto accordingly, and if that doesn’t completely solve the problem, apply sledgehammer again (I hope this time we will provide an “easier” proof).

Isabelle: Sledgehammer finds evidence, but it fails

Using a sledgehammer does not find evidence, but find good evidence

Heuristic option

More articles: