Designing Large Projects in OCaml

Question

What are the best practices for writing large software projects in OCaml?

How do you structure your projects?

What features of OCaml should and should not be used to ease code management? Exceptions? First-class modules? GADTs? Object types?

Build system? Testing framework? Library stack?

I found great recommendations

+44

design functional-programming ocaml

John Rivers Oct 13 '13 at 21:38

3 answers

OASIS

To add to Pavel's answer:

Disclaimer: I am the author of OASIS.

OASIS also has oasis2opam, which can help you quickly create an OPAM package, and oasis2debian, which creates Debian packages. This is extremely useful if you want a "release" target that automates most of the tasks involved in uploading a package.

OASIS also comes with a script called oasis-dist.ml, which automatically creates a tarball for upload.

See how it all works at https://github.com/ocaml.org.

Testing

I use OUnit to run all my tests. It is simple and quite effective if you are used to xUnit-style testing.

Source management

Disclaimer: I am the owner / maintainer of forge.ocamlcore.org (aka forge.oo)

If you want to use git, I recommend GitHub. It is really effective for code review.

If you use darcs or subversion, you can create an account on forge.oo.

In either case, a shared mailing list that receives all commit notifications is mandatory, so that everyone can see and review them. You can use either Google Groups or a mailing list on forge.oo.

I recommend having a good web page (GitHub or forge.oo) with the OCamldoc documentation rebuilt on every commit. If you have a huge code base, this lets you use the OCamldoc-generated documentation from the very beginning (and fix it quickly).

I recommend creating tarballs when you reach a stable stage. Don't just rely on checking out the latest git / svn version. This advice has saved me work in the past. As Martin said, keep all your tarballs in a central place (a git repository is a good idea for this).

+10

gildor Oct 14 '13 at 22:51

This probably doesn't fully answer your question, but here is my experience regarding the build environment:

I really appreciate OASIS. It has a good set of features that help not only to build a project, but also to write documentation and support a test environment.

Build system

OASIS generates the setup.ml file from the specification (the _oasis file); it works essentially as a build script. It accepts the flags -configure, -build, -test and -distclean. I am quite used to these from working with GNU and other projects that usually use Makefiles, and I find it convenient that all of them are available here automatically.
Makefiles. Instead of generating setup.ml, you can also generate a Makefile with all the options described above.

Structure

Usually my OASIS project has at least four directories: src, _build, scripts and tests.

In the src directory, all source files are stored together: implementation (.ml) and interface (.mli) files sit side by side. If the project grows too large, it may be worth adding subdirectories.
The _build directory is managed by the OASIS build system. It holds both source and object files, and I like that build files do not mix with the source files, so I can easily delete them if something goes wrong.
I store several shell scripts in the scripts directory. Some of them run tests and generate interface files.
I store all input and output files for tests in the separate tests directory.

Interfaces / Documentation

Using interface files (.mli) has both advantages and disadvantages for me. They really help to find type errors, but once you have them, you must update them whenever you change or improve your code. Forgetting to do so sometimes causes unpleasant errors.

But the main reason I like interface files is the documentation. I use ocamldoc to generate HTML documentation pages automatically (OASIS supports this with the -doc flag). In my opinion, it is enough to write comments describing each function in the interface rather than inserting comments in the middle of the code. In OCaml, functions are usually short and concise, and if you feel the need to add extra comments inside one, it might be better to split the function.

Also note the -i flag for ocamlc. With it, the compiler prints the inferred interface of a module, which you can use as a starting point for the interface file.
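
As a rough illustration of documenting functions in the interface rather than in the implementation, here is a minimal sketch. The module name Int_lists and its functions are hypothetical and not taken from the answer. In a real project the signature would live in int_lists.mli and the body in int_lists.ml; they are inlined in one file here only to keep the example self-contained.

```ocaml
(* Hypothetical module, for illustration only. The signature part plays the
   role of int_lists.mli, the struct part the role of int_lists.ml. *)
module Int_lists : sig
  (** Small helpers over integer lists. *)

  val sum : int list -> int
  (** [sum xs] is the sum of the elements of [xs]; [sum []] is [0]. *)

  val max_opt : int list -> int option
  (** [max_opt xs] is [Some m] where [m] is the largest element of [xs],
      or [None] when [xs] is empty. *)
end = struct
  (* The implementation stays short and carries no inline comments:
     the contract is stated once, in the interface. *)
  let sum xs = List.fold_left ( + ) 0 xs

  let max_opt = function
    | [] -> None
    | x :: rest -> Some (List.fold_left max x rest)
end

let () =
  Printf.printf "sum = %d\n" (Int_lists.sum [ 1; 2; 3 ])
```

The (** ... *) comments are the ones ocamldoc picks up, so the generated pages describe exactly what the interface exposes.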

Test

I did not find a reasonable solution for supporting tests (I would like to have some ocamltest application), so I use my own scripts to execute and check the tests. Fortunately, OASIS supports running custom commands when setup.ml is executed with the -test flag.

I have not been using OASIS for long, so if anyone knows of other interesting features, I would also like to hear about them.

Also, if you do not know OPAM, it is definitely worth a look. Without it, installing and managing new packages is a nightmare.

+5

Pavel Zaichenkov Oct 14 '13 at 0:45

Martin Jambon · Accepted Answer · 2013-10-14 07:16

I am going to answer for a medium-sized project under the conditions I am familiar with, that is, between 100K and 1M lines of source code and up to 10 developers. This is what we are using now for a project started two months ago, in August 2013.

Build system and code organization:

one sourceable shell script defines PATH and other variables for our project
one .ocamlinit file at the root of our project loads a bunch of libraries when a toplevel session starts
omake, which is fast (with the -j option for parallel builds); but we avoid writing crazy custom omake plugins
one root Makefile contains all the main targets (setup, build, test, clean, etc.)
one level of subdirectories, not two
most subdirectories are each built into an OCaml library
some subdirectories contain other things (setup, scripts, etc.)
OCAMLPATH contains the root of the project; each library subdirectory produces a META file, which makes all OCaml parts of the project accessible from the toplevel using #require
only one OCaml executable is built for the entire project (this saves a lot of linking time; still not sure why)
libraries are installed by a setup script using opam
local opam packages are created for software that is not in the official opam repository
we use an opam switch that is an alias named after our project, which avoids conflicts with other projects on the same machine

Source Code Editing:

emacs with the opam packages ocp-indent and ocp-index

Source control and management:

we use git and GitHub
all new code is reviewed through GitHub pull requests
tarballs for non-opam, non-GitHub libraries are stored in a separate git repository (which can be thrown away if its history gets too big)
bleeding-edge libraries hosted on GitHub are forked into our GitHub account and installed through our own local opam package

Using OCaml:

OCaml will not compensate for poor programming practices; teaching good taste is beyond the scope of this answer. http://ocaml.org/learn/tutorials/guidelines.html is a good starting point.
OCaml 4.01.0 makes it much easier than before to reuse record field labels and variant constructors (e.g. type t1 = {x:int} type t2 = {x:int;y:int} let t1_of_t2 ({x}:t2) : t1 = {x} now works)
we try not to use camlp4 syntax extensions in our own code
we do not use classes and objects unless an external library requires them
in theory, since OCaml 4.01.0, we should prefer classic variants over polymorphic variants
we use exceptions to indicate errors, and let them propagate happily until our main server loop catches them and interprets them as an "internal error" (the default), a "bad request" or something else
exceptions such as Exit or Not_found can be used locally when it makes sense, but in module interfaces we prefer to use options
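
The last two points can be sketched roughly as follows. This is only an illustration under assumed names: Bad_request, find_user, handle and respond are hypothetical and do not come from the project described above, and the integer status codes merely stand in for whatever the real server loop returns.

```ocaml
(* All names here are hypothetical, chosen for illustration. *)

(* An error that the main loop knows how to classify. *)
exception Bad_request of string

(* Interface-level function: Not_found is used locally, but callers only
   ever see an option. *)
let find_user (users : (int * string) list) (id : int) : string option =
  try Some (List.assoc id users) with Not_found -> None

(* A request handler raises freely and does no error handling of its own. *)
let handle users request =
  match int_of_string_opt request with
  | None -> raise (Bad_request ("not a user id: " ^ request))
  | Some id ->
    (match find_user users id with
     | Some name -> "hello " ^ name
     | None -> raise (Bad_request "unknown user"))

(* What a main loop would do for each request: let the handler run, catch
   whatever escapes, and classify it. Anything unrecognised is treated as
   an internal error, which is the default. *)
let respond handler request =
  try (200, handler request) with
  | Bad_request msg -> (400, "bad request: " ^ msg)
  | _ -> (500, "internal error")

let () =
  let users = [ (1, "ada") ] in
  let code, body = respond (handle users) "1" in
  Printf.printf "%d %s\n" code body
```

The catch-all case is deliberate in this sketch: the handlers stay free of error plumbing, and only one place decides how a failure is reported.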

Libraries, protocols, frameworks:

we use Batteries for all commodity functions that are missing from the OCaml standard library; for the rest we have a "util" library
we use Lwt for asynchronous programming, without the syntax extensions, and the bind operator (>>=) is the only operator we use (if you must know, we do, reluctantly, use camlp4 preprocessing to get better exception tracking at bind points)
we use HTTP and JSON to communicate with third-party software, and we expect every modern service to provide such APIs
to serve HTTP, we run our own SCGI server (ocaml-scgi) behind nginx
as an HTTP client, we use Cohttp
for JSON serialization, we use atdgen

Cloud Services:

we use quite a lot of them because they are usually cheap, easy to interact with, and solve scalability and maintenance problems for us

Testing:

we have one make / omake target for fast tests and one for slow tests
fast tests are unit tests; each module can provide a "test" function, and a file test.ml runs the full list of tests
slow tests are those that involve running several services; they are crafted specifically for our project, but they cover as much of a production service as possible. Everything runs locally, on either Linux or MacOS, except for cloud services, for which we find ways not to interfere with production.

This is quite a lot to set up, especially for those who are not familiar with OCaml. There is no framework yet that takes care of all this, but at least you get to choose your tools.
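
The fast-test arrangement could look roughly like the sketch below. The modules Slug and Counter, the boolean return convention and the run_tests helper are all assumptions made for the example; the answer only says that each module may provide a "test" function and that test.ml runs a list of them.

```ocaml
(* Hypothetical modules; in a real project each would be its own file. *)
module Slug = struct
  let of_title s =
    String.map (fun c -> if c = ' ' then '-' else c) (String.lowercase_ascii s)

  (* Assumed convention: [test ()] returns true on success. *)
  let test () = of_title "Hello World" = "hello-world"
end

module Counter = struct
  let count_char c s =
    let n = ref 0 in
    String.iter (fun c' -> if c' = c then incr n) s;
    !n

  let test () = count_char 'a' "banana" = 3
end

(* What a test.ml file might contain: the list of named tests... *)
let tests = [ ("Slug", Slug.test); ("Counter", Counter.test) ]

(* ...and a runner that prints one line per test, counts an exception as a
   failure, and returns the names of the failing tests. *)
let run_tests tests =
  List.filter_map
    (fun (name, test) ->
       let ok = try test () with _ -> false in
       Printf.printf "%-10s %s\n" name (if ok then "OK" else "FAILED");
       if ok then None else Some name)
    tests

(* Exit with a non-zero status if anything failed, so that a make / omake
   target can detect the failure. *)
let () =
  match run_tests tests with
  | [] -> ()
  | _ -> exit 1
```

Keeping the test next to the code it exercises means adding a module to the list in test.ml is the only wiring needed.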
