Should I put regression tests with large test data into the source code repository?

I have a set of scripts and modules totaling a few megabytes. The regression tests and their necessary data amount to hundreds of megabytes because of the nature of the data we work with. Is it "best practice" to store regression tests and large test data alongside the actual source code?

Note that there is a separate set of unit tests, which are much smaller and test individual modules. But to be useful, the main pipelines need to run against the real (large) data.

1 answer

I think you should weigh the different forces at play here.

  • A specific version of the tests (and their data) exercises a specific version of the code, so it is desirable to be able to commit changes to the tests and the code together.

  • Keeping large test suites under source control can hurt performance for those who don't always need them: "svn checkout" (or "cleartool mkview -snapshot", or whatever you have) copies a lot of files, test runs take longer, and so on. Therefore it is desirable to separate the large integration tests from the small unit tests.

My conclusion is to keep them together in the same repository, but make sure there is a way to work with everything except the large tests and their big data. For example, in Subversion you might have the folders /code/src , /code/test/unit , /code/test/integration and /testdata . That way most developers could simply "svn checkout .../code" and ignore the large test suites, while your continuous build tool checks out the whole tree so that it can run the integration tests.
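As a minimal sketch of how that could look in practice, assuming a Subversion 1.5+ server (which supports sparse checkouts) and a hypothetical repository URL http://svn.example.com/repo/trunk :

    # Developer workflow: check out only the code tree, skipping /testdata
    # (the repository URL here is hypothetical)
    svn checkout http://svn.example.com/repo/trunk/code my-checkout

    # Alternatively, a sparse checkout of the whole trunk that leaves
    # the big integration-test data out until explicitly requested:
    svn checkout --depth immediates http://svn.example.com/repo/trunk wc
    svn update --set-depth infinity wc/code   # pull the full code tree
    # wc/testdata stays at a shallow depth and downloads nothing big

    # CI server: a full checkout so integration tests see the real data
    svn checkout http://svn.example.com/repo/trunk ci-workspace

The sparse-checkout variant has the advantage that a developer can later pull down the test data in the same working copy ( svn update --set-depth infinity wc/testdata ) without a separate checkout.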

