I created a simple regex engine that supports concatenation, interleaving, closing, and char a .. z .
The way I represent nfa and dfa is to use a notation:
type state = int with sexp, compare type alphabet = char with sexp, compare type transaction = state * alphabet option * state with sexp, compare type d_transaction = state * alphabet * state with sexp, compare type state_set = State_set.t type states_set = States_set.t type nfa = { states : State_set.t ; alphabets : Alphabet_set.t ; transactions : Transaction_set.t; start_state : state; final_states : State_set.t; } type dfa = { d_states : State_set.t ; d_alphabets : Alphabet_set.t; d_transactions : D_Transaction_set.t ; d_start_state : state ; d_final_states : State_set.t; }
For example, the string "a *" will be analyzed on Closure (Char 'a') , and then nfa will be converted: states: 0 1 2 3 alphabets: a transactions: 0->e->1, 1->a>2, 2->e->3, 2->e->1, 0->e->3 start_state: 0 final_states: 3
then dfa :
states: 0 1 alphabets: a transactions: 0->a->1, 1->a->1 start_state: 0 final_states: 0 1
However, I use a lot of recursion in my code. The way my program generates a status number for each node in nfa and dfa is really unpredictable. I do not know how to verify the correctness of the created dfa without using a pen and a sheet of paper to check it myself.
I am trying to find the best way to test my code so that I can add additional functions to the program in the future.
Can someone please give me some suggestions?
齐天 大圣 source share