I am working on a scientific project aimed at studying the behavior of people.
The project will be divided into three parts:
- A program that reads data from remote sources and builds a local data pool from it.
- A program that checks the data pool and ensures its integrity.
- A web interface that lets people read and manipulate the data.
The data consists of a list of people, each with an ID number and several characteristics: height, weight, age, and so on. I need to easily form subgroups of this data (for example, everyone within a given age or height range). The full data set is several TB in size, but it can be reduced to smaller subsets of 2-3 GB.
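To make the kind of subgroup selection concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and sample rows are purely hypothetical; it only illustrates the range queries I have in mind, not a proposed design:

```python
import sqlite3

# Hypothetical schema: one row per person, columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, height REAL, weight REAL, age INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [(1, 170.0, 65.0, 30), (2, 182.5, 80.0, 45), (3, 160.2, 55.0, 30)],
)

# Select a subgroup: everyone aged 30 whose height falls in a given range.
rows = conn.execute(
    "SELECT id, height FROM people WHERE age = ? AND height BETWEEN ? AND ?",
    (30, 150.0, 175.0),
).fetchall()
print(rows)  # [(1, 170.0), (3, 160.2)]
conn.close()
```

An on-disk SQLite file would work the same way; whether SQLite scales to the full multi-TB pool is exactly the kind of question I am asking.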
I have strong experience with the theoretical material behind the project, but I am not a computer scientist. I know Java, C, and MATLAB, and I am now learning Python. I would like to use Python, since it seems quite simple and greatly reduces Java's verbosity. The problem is that I am not sure how to handle the data pool.
I'm not a database expert, but I probably need a database here. Which tools do you think I should use?
Remember that the goal is to implement very advanced mathematical functions on the data sets, so we want to keep the source code as simple as possible. Speed is not a concern.