How to organize code with random numbers in version control systems for analysis?

Question

This is basically a question about organizing files in a VCS such as git, but to give a clear picture of my problem, here is a quick introduction to the subject:

Project Overview

I am working on a project in which probabilistic models with a neural network are implemented and tested for different sets of parameters. We are currently implementing it in Python, although the problem may be relevant for other programming languages too. The results are usually error measurements, graphs, or something similar. At the moment, our project looks as follows:

Several people work on the project code base and implement new features.
Some other people are already studying the behavior of the model for different sets of parameters, that is, trying to find out for which parameter ranges the model shows qualitatively different behavior.

We currently use git with GitHub as our VCS, with one main branch (master) for the current stable version and one branch per team member for active development. We exchange code by merging between branches, and we merge into master whatever seems to be a stable new feature.

One big overall problem is that this is a research project without a clear project design. Sometimes we fix specific bugs or implement something planned using feature branches. But sometimes it is not clear what the next feature will be, or whether it is even possible to implement what we have in mind. Some of us mainly study the behavior of our model in a more or less structured way. I know, but that is the way it is.

Probabilistic Behavior Management

Our model is probabilistic at many levels. Different parts are initialized with random numbers, and random numbers are also used while the model is run.

To keep results reproducible, we fix the random seed, for example with numpy in Python:

import numpy as np
np.random.seed(42)
a = np.random.rand() # -> will always be 0.3745401188473625
b = np.random.rand() # -> will always be 0.9507143064099162

1) ?

* master
|
*---------------
|\              \
* * experiment1  * experiment2
| |              |
. * tag setting1 * tag setting1
. |              |
. * tag setting2 * tag setting2

2)

import numpy as np
np.random.seed(42)
a = np.random.rand() # -> will always be 0.3745401188473625

# fixing some stuff here
c = np.random.rand()
# -> will be 0.9507143064099162 as was previously 'b'

b = np.random.rand()
# -> will now be 0.7319939418114051 and not anymore 0.9507143064099162

# ...
# code using 'b' will behave differently

git version-control random

PaFa 26 . '16 7:57

Martin Nyolt · Accepted Answer · 2016-08-26T08:48:24+0000

With git, one possible layout for such an experiment repository is:

X-<date>-<title>/   # repository for experiment
|___ models/
|    |___ M1/       # submodule for M1 repository
|    |___ M2/       # submodule for M2 repository
|    |
|     ...
|___ code/          # submodule for your core algorithms
|___ data/          # a copy or link to your data sets
|___ experiment1.sh # script to run your experiment
|___ experiment2.sh # possibly some more sub-experiments

Each experiment is started by a script (for example, experiment1.sh). To return to the state of an earlier experiment, run git checkout && git submodule update.

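The random seed is passed explicitly. For example, experiment1.sh could call code/my-tool --seed=42 models/M1/model.ann < input.dat, so that the seed is recorded in experiment1.sh together with the rest of the experiment.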
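
As a rough illustration of the code/my-tool --seed=42 models/M1/model.ann call, a Python tool could take the seed from the command line instead of hard-coding it. This is only a sketch under assumptions: the functions parse_args and main are hypothetical, the model file is not actually opened, reading input.dat from standard input is left out, and a single random draw stands in for the real work.

```python
import argparse

import numpy as np


def parse_args(argv=None):
    # Hypothetical stand-in for the command-line interface of code/my-tool.
    parser = argparse.ArgumentParser(description="Run the model with an explicit seed")
    parser.add_argument("model", help="path to the model file, e.g. models/M1/model.ann")
    parser.add_argument("--seed", type=int, required=True,
                        help="seed for the random number generator")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # All randomness comes from a generator built from the given seed,
    # so the same command line reproduces the same numbers.
    rng = np.random.default_rng(args.seed)
    sample = rng.random()  # placeholder for the real computation
    print(f"model={args.model} seed={args.seed} sample={sample}")
    return sample


# Example call with the arguments used in the experiment script.
main(["--seed=42", "models/M1/model.ann"])
```

Because the seed is a required argument, every run has to state it, and the script that starts the experiment documents which seed was used.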

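
The shift shown in the second example of the question, where an added call changes the value later assigned to b, can be limited by giving each part of the program its own random generator. The following is only a sketch, assuming NumPy 1.17 or newer; the component names init and simulation and the helpers make_streams and run are hypothetical and not part of the original project.

```python
import numpy as np


def make_streams(seed, names):
    # Derive one independent generator per named component from a single seed.
    # The order of `names` matters: it decides which child seed each name gets.
    root = np.random.SeedSequence(seed)
    children = root.spawn(len(names))
    return {name: np.random.default_rng(child)
            for name, child in zip(names, children)}


def run(extra_draw):
    rngs = make_streams(42, ["init", "simulation"])
    a = rngs["init"].random()
    if extra_draw:
        # A draw added later in the "init" component (like 'c' in the question).
        c = rngs["init"].random()
    # 'b' comes from a different stream, so the extra draw does not affect it.
    b = rngs["simulation"].random()
    return a, b


# Same values for a and b with and without the extra draw.
print(run(extra_draw=False) == run(extra_draw=True))
```

This only isolates components from each other. Adding a draw inside one component still shifts the later numbers of that same component, so results can only be compared across code versions for the streams that were left untouched.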
