How to safely translate tree data structures between C++ and OCaml?

I have a legacy data structure written in C++ and a new tool in OCaml that is expected to work with this legacy data. So I need to import/translate data from the former into the latter. The data is tree-shaped and is usually processed by visitors.

As a simple example, consider this minimal DSL:

    #include <memory>
    using namespace std;

    class intnode;
    class addnode;

    struct visitor {
        virtual void visit(const intnode& n) = 0;
        virtual void visit(const addnode& n) = 0;
    };

    struct node {
        virtual void accept(visitor& v) = 0;
    };

    struct intnode : public node {
        int x;
        virtual void accept(visitor& v) { v.visit(*this); }
    };

    struct addnode : public node {
        shared_ptr<node> l;
        shared_ptr<node> r;
        virtual void accept(visitor& v) { v.visit(*this); }
    };
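To make "processed by visitors" concrete, here is a minimal sketch of a visitor over this tree that simply evaluates it. The evaluator, eval, and the make_int_node / make_add_node helpers are hypothetical names introduced for illustration; the virtual destructors are also an addition.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct intnode;
struct addnode;

struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const intnode& n) = 0;
    virtual void visit(const addnode& n) = 0;
};

struct node {
    virtual ~node() = default;
    virtual void accept(visitor& v) = 0;
};

struct intnode : node {
    int x = 0;
    void accept(visitor& v) override { v.visit(*this); }
};

struct addnode : node {
    std::shared_ptr<node> l, r;
    void accept(visitor& v) override { v.visit(*this); }
};

// Hypothetical helpers for building trees (not part of the original code).
std::shared_ptr<node> make_int_node(int x) {
    auto n = std::make_shared<intnode>();
    n->x = x;
    return n;
}

std::shared_ptr<node> make_add_node(std::shared_ptr<node> l,
                                    std::shared_ptr<node> r) {
    auto n = std::make_shared<addnode>();
    n->l = std::move(l);
    n->r = std::move(r);
    return n;
}

// Hypothetical visitor: since visit() returns void, the result of each
// subtree is passed back through a member variable.
struct evaluator : visitor {
    int result = 0;

    void visit(const intnode& n) override { result = n.x; }

    void visit(const addnode& n) override {
        assert(n.l && n.r);
        n.l->accept(*this);
        int lhs = result;      // save the left result before visiting the right
        n.r->accept(*this);
        result = lhs + result;
    }
};

int eval(const std::shared_ptr<node>& n) {
    assert(n);
    evaluator e;
    n->accept(e);
    return e.result;
}
```

The same "result in a member" pattern is what makes the translation to OCaml delicate: a plain int can sit in a member safely, whereas an OCaml value stored there has to be known to the garbage collector.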

Its representation in OCaml is:

    type node = Int of int | Plus of node * node

    let make_int x = Int x
    let make_plus l r = Plus (l, r)

The question is: how can I safely and efficiently convert a C++ tree into its OCaml representation?

So far I have come up with two approaches:

Approach 1

Write a visitor that calls the OCaml constructors and yields a value, e.g. something like this:

    value translate(shared_ptr<node> n);

    struct translator : public visitor {
        value retval;

        virtual void visit(const intnode& n) {
            retval = call(make_int, Val_int(n.x));
        }

        virtual void visit(const addnode& n) {
            value l = translate(n.l);  // not registered with the GC
            value r = translate(n.r);  // may trigger a collection that invalidates l
            retval = call(make_plus, l, r);
        }
    };

    value translate(shared_ptr<node> n) {
        translator t;
        n->accept(t);
        return t.retval;
    }

Just assume that call does all the scaffolding needed to call back into OCaml and invokes the right constructor.

The problem with this approach is OCaml's garbage collector. If the GC runs while the C++ side holds a value on its stack, that value (which is ultimately a pointer into the OCaml heap) may be invalidated. So I need some way to tell OCaml that the values are still needed. Normally this is done with the CAML* macros, but how do I do that in this case? Can I use these macros inside the visit methods?

Approach 2

The second approach is more complicated. If there is no way to safely store intermediate references, I could turn the whole thing around and push the C++ pointers into the OCaml heap:

    type cppnode (* C++ pointer *)

    type functions = {
      transl_plus : cppnode -> cppnode -> node;
      transl_int : int -> node;
    }

    external dispatch : functions -> cppnode -> node = "dispatch_transl"

    let rec translate n = dispatch {transl_plus; transl_int = make_int} n
    and transl_plus a b = make_plus (translate a) (translate b)

The idea here is that the dispatch function wraps all child nodes in custom blocks (CustomVal) and passes them to OCaml without holding on to any intermediate values. The corresponding visitor only does the pattern matching. This should clearly work with the GC, but it has the drawback of being slightly less efficient (because of the pointer wrapping) and potentially less readable (because of the split between dispatch and reconstruction).
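The control flow of this approach can be modelled in plain C++, as a rough sketch only. Here std::function callbacks stand in for the OCaml closures in the functions record, a std::string stands in for the OCaml node type, and no OCaml runtime (and therefore no custom-block wrapping) is involved. The dispatcher class and the string output format are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

struct intnode;
struct addnode;

struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const intnode& n) = 0;
    virtual void visit(const addnode& n) = 0;
};

struct node {
    virtual ~node() = default;
    virtual void accept(visitor& v) = 0;
};

struct intnode : node {
    int x = 0;
    void accept(visitor& v) override { v.visit(*this); }
};

struct addnode : node {
    std::shared_ptr<node> l, r;
    void accept(visitor& v) override { v.visit(*this); }
};

// Stand-in for the OCaml record `functions`; R plays the role of the
// OCaml type `node`.
template <typename R>
struct functions {
    std::function<R(const std::shared_ptr<node>&,
                    const std::shared_ptr<node>&)> transl_plus;
    std::function<R(int)> transl_int;
};

// The visitor only "pattern-matches" on the node kind and hands the
// still-untranslated children to the matching callback.
template <typename R>
struct dispatcher : visitor {
    const functions<R>& fs;
    R result{};

    explicit dispatcher(const functions<R>& callbacks) : fs(callbacks) {}

    void visit(const intnode& n) override { result = fs.transl_int(n.x); }
    void visit(const addnode& n) override { result = fs.transl_plus(n.l, n.r); }
};

template <typename R>
R dispatch(const functions<R>& fs, const std::shared_ptr<node>& n) {
    assert(n);
    dispatcher<R> d(fs);
    n->accept(d);
    return d.result;
}

// The recursion lives on the callback side, mirroring the OCaml
// `translate` / `transl_plus` pair.
std::string translate(const std::shared_ptr<node>& n) {
    functions<std::string> fs;
    fs.transl_int = [](int x) { return "Int " + std::to_string(x); };
    fs.transl_plus = [](const std::shared_ptr<node>& a,
                        const std::shared_ptr<node>& b) {
        return "Plus (" + translate(a) + ", " + translate(b) + ")";
    };
    return dispatch(fs, n);
}
```

Note that dispatch never holds a translated child while another one is being built; the callbacks receive raw children and decide when to recurse. That is the property that keeps the real version safe with respect to the GC.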

Is there a way to get the safety of approach 2 with the elegance of approach 1?

+5
2 answers

I see no problem with building OCaml values on the C stack, even in the recursive case. In your example, you use a struct member to store the OCaml heap value. That is also possible, but then you need to register it with caml_register_global_root or caml_register_generational_global_root and release it with caml_remove_global_root or caml_remove_generational_global_root, respectively. In fact, you can even write a smart pointer that holds OCaml values.

That said, I still don't see a reason (at least for the simplified example you showed) why you would need class members for this, so this is how I would solve it:

    struct translator : public visitor {
        virtual value visit(const intnode& n) {
            CAMLparam0();
            CAMLlocal1(x);
            x = call(make_int, Val_int(n.x));
            CAMLreturn(x);
        }

        virtual value visit(const addnode& n) {
            CAMLparam0();
            CAMLlocal3(l, r, x);
            // assumes accept() also returns a value in this variant
            l = n.l->accept(*this);
            r = n.r->accept(*this);
            x = call(make_plus, l, r);
            CAMLreturn(x);
        }
    };

This, of course, assumes that you have a visitor that can return values of arbitrary types. If you don't have one and don't want to implement one, you can build up your value incrementally:

    value translate(shared_ptr<node> n);

    class builder : public visitor {
    public:
        value result;  // registered as a GC root for the lifetime of the builder

        builder() {
            result = Val_unit; // or any better default
            caml_register_generational_global_root(&result);
        }

        virtual ~builder() {
            caml_remove_generational_global_root(&result);
        }

        virtual void visit(const intnode& n) {
            CAMLparam0();
            CAMLlocal1(x);
            x = call(make_int, Val_int(n.x));
            caml_modify_generational_global_root(&result, x);
            CAMLreturn0;
        }

        virtual void visit(const addnode& n) {
            CAMLparam0();
            CAMLlocal3(l, r, x);
            l = translate(n.l);
            r = translate(n.r);
            x = call(make_plus, l, r);
            caml_modify_generational_global_root(&result, x);
            CAMLreturn0;
        }
    };

    value translate(shared_ptr<node> n) {
        CAMLparam0();
        CAMLlocal1(x);
        builder b;
        n->accept(b);
        x = b.result;
        CAMLreturn(x);
    }

You can also look at Berke Durak's Aurochs project, which builds parse trees directly in C.

+2

Personally, I would write a dumper in C++ and a parser for that dump in OCaml. If you are not afraid of a more complicated route, you could take a look at this tool: https://github.com/Antique-team/clangml
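A minimal sketch of what such a dumper could look like, assuming a simple S-expression text format whose constructor names mirror the OCaml type (Int, Plus). The dumper class, the dump function, and the format itself are assumptions made for illustration, not something prescribed above.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

struct intnode;
struct addnode;

struct visitor {
    virtual ~visitor() = default;
    virtual void visit(const intnode& n) = 0;
    virtual void visit(const addnode& n) = 0;
};

struct node {
    virtual ~node() = default;
    virtual void accept(visitor& v) = 0;
};

struct intnode : node {
    int x = 0;
    void accept(visitor& v) override { v.visit(*this); }
};

struct addnode : node {
    std::shared_ptr<node> l, r;
    void accept(visitor& v) override { v.visit(*this); }
};

// Hypothetical dumper: writes the tree as text such as
// "(Plus (Int 1) (Int 2))". No OCaml value is ever created on the C++
// side, so there is nothing for the OCaml GC to invalidate.
struct dumper : visitor {
    std::ostringstream out;

    void visit(const intnode& n) override {
        out << "(Int " << n.x << ")";
    }

    void visit(const addnode& n) override {
        assert(n.l && n.r);
        out << "(Plus ";
        n.l->accept(*this);
        out << " ";
        n.r->accept(*this);
        out << ")";
    }
};

std::string dump(node& root) {
    dumper d;
    root.accept(d);
    return d.out.str();
}
```

The OCaml side then only needs a small parser for this format that rebuilds the tree with make_int and make_plus. The cost is an extra serialization pass; the benefit is that the two runtimes never share live pointers.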

0

Source: https://habr.com/ru/post/1272332/

