I am using scikit-learn's Random Forest:
sklearn.ensemble.RandomForestClassifier(n_estimators=100, max_features="auto", max_depth=10)
After calling rf.fit(...), the process memory usage increases by 80 MB, i.e. 0.8 MB per tree. (I also tried many other settings with similar results. I used top and psutil to monitor the memory usage.)
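For reference, a sketch of the psutil measurement (the synthetic dataset is just a stand-in for my data; note that recent scikit-learn versions spell the old max_features="auto" classifier default as "sqrt"):

    import os

    import psutil
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", max_depth=10)

    proc = psutil.Process(os.getpid())
    rss_before = proc.memory_info().rss  # resident set size, in bytes
    rf.fit(X, y)
    rss_after = proc.memory_info().rss

    print(f"RSS grew by {(rss_after - rss_before) / 2**20:.1f} MB during fit()")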
A binary tree of depth 10 contains at most 2^11 - 1 = 2047 nodes, which can be stored in a single dense array, making it easy for the programmer to find the parent and children of any node (see the sketch below).
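For concreteness, this is the usual implicit array layout I have in mind (a sketch of the idea, not scikit-learn's actual storage scheme):

    # Implicit array layout for a complete binary tree:
    # node i has children at 2*i + 1 and 2*i + 2, and its parent at (i - 1) // 2.
    def children(i: int) -> tuple[int, int]:
        return 2 * i + 1, 2 * i + 2

    def parent(i: int) -> int:
        return (i - 1) // 2

    print(children(0))  # (1, 2)
    print(parent(2))    # 0
    print(2**11 - 1)    # 2047 slots suffice for depth 10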
Each node needs the index of the feature used for the split and the cut-off threshold, i.e. 6-16 bytes depending on how economical the programmer is. In my case, this means 0.01-0.03 MB per tree.
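To put numbers on this, here is a sketch that prints both my estimate and what a fitted scikit-learn tree actually stores per node (the tree_ array attributes below are public scikit-learn API; the synthetic data is again a stand-in):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Back-of-envelope: ~2047 nodes per tree at 6-16 bytes per node.
    nodes = 2**11 - 1
    print(f"estimate: {nodes * 6 / 2**20:.3f}-{nodes * 16 / 2**20:.3f} MB per tree")

    # Inspect the per-node arrays a fitted tree actually holds.
    X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
    rf = RandomForestClassifier(n_estimators=1, max_depth=10, random_state=0).fit(X, y)
    t = rf.estimators_[0].tree_
    print("node_count:", t.node_count)
    total = 0
    for name in ("children_left", "children_right", "feature", "threshold",
                 "impurity", "n_node_samples", "weighted_n_node_samples", "value"):
        arr = getattr(t, name)
        total += arr.nbytes
        print(f"{name:26s} dtype={arr.dtype} nbytes={arr.nbytes}")
    print("total bytes in these arrays:", total)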
Why does the scikit-learn implementation use 20-60 times that much memory to store a random forest tree?