If you are using pandas DataFrames, one approach is to use pandas.DataFrame.to_csv()
and pandas.read_csv()
to save and load the cleaned data between each step.
- Notebook1 loads input1 and saves result1.
- Notebook2 loads result1 and saves result2.
- Notebook3 loads result2 and saves result3.
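The chain above can be sketched as one small helper that each notebook would call with its own file names. This is only an illustration under assumptions: `run_step` and the two placeholder transformations are hypothetical names, not part of pandas, and the files are written to a temporary directory so the sketch does not touch real data.

```python
import os
import tempfile

import pandas as pd


def run_step(input_path, output_path, transform):
    """Load a CSV, apply one transformation, save the result (hypothetical helper)."""
    df = pd.read_csv(input_path, index_col=0)
    result = transform(df)
    result.to_csv(output_path)
    return result


# Work in a temporary directory so the sketch leaves no files behind in the project.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.csv")
result1_path = os.path.join(workdir, "result1.csv")
result2_path = os.path.join(workdir, "result2.csv")

# Stand-in for the original input data.
pd.DataFrame({"id": [10, 20, 30], "name": ["foo", "bar", "baz"]}).to_csv(input_path)

# Step 1 (what notebook1 would do): placeholder transformation, upper-case the names.
step1 = run_step(
    input_path, result1_path, lambda df: df.assign(name=df["name"].str.upper())
)

# Step 2 (what notebook2 would do): placeholder transformation, keep rows with id > 10.
step2 = run_step(result1_path, result2_path, lambda df: df[df["id"] > 10])
```

Each step reads only the file written by the previous one, so any notebook can be re-run on its own as long as its input file exists.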
If this is your data:
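import pandas as pd

raw_data = {
    'id': [10, 20, 30],
    'name': ['foo', 'bar', 'baz'],
}
# Named input_df to avoid shadowing the built-in input().
input_df = pd.DataFrame(raw_data, columns=['id', 'name'])
# Save the data so that notebook1 can load it from input.csv.
input_df.to_csv('input.csv')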
Then, in notebook1.ipynb, process it like this:
import pandas as pd
df = pd.read_csv('input.csv', index_col=0)
# ... apply this notebook's processing to df here ...
df.to_csv('result1.csv')
Repeat this process for each step in the chain. For example, in notebook2.ipynb:
import pandas as pd
df = pd.read_csv('result1.csv', index_col=0)
# ... apply this notebook's processing to df here ...
df.to_csv('result2.csv')
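The index_col=0 argument matters because to_csv() writes the DataFrame index as the first column by default. The following minimal sketch shows the difference, using an in-memory buffer instead of a real file (an assumption made only to keep the example self-contained):

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [10, 20, 30], "name": ["foo", "bar", "baz"]})

# Write the CSV into an in-memory buffer; the index becomes the first column.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()

# With index_col=0 the first column is restored as the index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, the saved index comes back as an extra, unnamed data column.
without_index = pd.read_csv(io.StringIO(csv_text))
```

Here with_index has the original two columns, while without_index has three. Without index_col=0, every save and load cycle in the chain would add another such column.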
In the end, your notebook collection will look like this:
- input.csv
- notebook1.ipynb
- notebook2.ipynb
- notebook3.ipynb
- result1.csv
- result2.csv
- result3.csv
Documentation: pandas.DataFrame.to_csv() and pandas.read_csv()