OCaml: how to test my own regex library

I created a simple regex engine that supports concatenation, interleaving, closing, and char a .. z .

The way I represent nfa and dfa is to use a notation:

 type state = int with sexp, compare type alphabet = char with sexp, compare type transaction = state * alphabet option * state with sexp, compare type d_transaction = state * alphabet * state with sexp, compare type state_set = State_set.t type states_set = States_set.t type nfa = { states : State_set.t ; alphabets : Alphabet_set.t ; transactions : Transaction_set.t; start_state : state; final_states : State_set.t; } type dfa = { d_states : State_set.t ; d_alphabets : Alphabet_set.t; d_transactions : D_Transaction_set.t ; d_start_state : state ; d_final_states : State_set.t; } 

For example, the string "a *" will be analyzed on Closure (Char 'a') , and then nfa will be converted: states: 0 1 2 3 alphabets: a transactions: 0->e->1, 1->a>2, 2->e->3, 2->e->1, 0->e->3 start_state: 0 final_states: 3

then dfa :

states: 0 1 alphabets: a transactions: 0->a->1, 1->a->1 start_state: 0 final_states: 0 1

However, I use a lot of recursion in my code. The way my program generates a status number for each node in nfa and dfa is really unpredictable. I do not know how to verify the correctness of the created dfa without using a pen and a sheet of paper to check it myself.

I am trying to find the best way to test my code so that I can add additional functions to the program in the future.

Can someone please give me some suggestions?

+5
source share
3 answers

One rather complicated plan would be to convert the DFA back to a regular expression and then check if the result is equivalent to your original regular expression. Here is an SO page that provides several ways to test for RE equivalence: Regex: determine if two regular expressions can match for a single input?

We hope that the two inverse transformations will help to debug each other :-)

+2
source

With the exception of the formal check, you could:

  • Use a unit test library such as OUnit or alcotest to test your engine with a lot of examples. Here is a good blog post comparing some other testing libraries.
  • Combine this with a coverage tool like Bisect_ppx , which serves two purposes: it directly helps make sure that your examples check the various branches in your generator, and also indirectly makes you look more closely at the generator and think about how to write test examples different code codes. Here's another blog post that briefly compares coverage tools.

EDIT: I think if you want to directly test your DFAs, you can write some small specialized “coverage tool” for your specific types that tells you what percentage of states and / or states / transition pairs you reached during testing each DFA, and which ones. This will be some kind of modified form of the function that you are currently using to walk through the DFA along the input lines.

Disclaimer: I am currently contributing to the improvement of Bisect_ppx (which is the “modern” Bisect plug). However, I am not associated with anything else mentioned here.

+2
source

One property-based test for your regex library should write

  • Regular Expression Generator Generating Random Regular Expressions
  • A string generator that, with a given regular expression, creates random strings in this language.
  • A property that, given a regular expression, matches your regular expression expression, displays a string generator.

You can use QCheck to test properties in OCaml.

+2
source

Source: https://habr.com/ru/post/1241635/


All Articles