I'm not quite sure if you mean this:
strats = []
for k in range(11):
    y_val = k * 0.1
    # compare with a tolerance: k * 0.1 is not always exactly equal to the stored Y value
    dummy_df = your_df[np.isclose(your_df['Y'], y_val)]
    strats.append(dummy_df.sample(200))
This builds a dummy DataFrame containing only the rows with the Y value you want, and then takes a sample of 200 rows from it.
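For reference, here is a self-contained version of that loop that you can run as is. The toy `your_df`, its size, and the `random_state` values are made up for illustration; I am also assuming that `Y` only takes the 11 values 0.0, 0.1, ..., 1.0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 11 distinct Y values, 1,000 rows for each
your_df = pd.DataFrame({
    "X1": rng.normal(size=11_000),
    "Y": np.repeat(np.arange(11) / 10, 1_000),
})

strats = []
for k in range(11):
    y_val = k / 10
    # np.isclose compares with a tolerance instead of exact float equality
    dummy_df = your_df[np.isclose(your_df["Y"], y_val)]
    strats.append(dummy_df.sample(200, random_state=0))

# Stack the 11 samples into a single DataFrame
sample_df = pd.concat(strats)
```

Note that `sample(200)` raises an error if a Y value has fewer than 200 rows, so real data may need a check on `len(dummy_df)` first.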
OK, so you need the different pieces to have the same structure. I think this is a little more complicated; here is how I would do it:
First of all, I would get a histogram of what X1 looks like:
hist, edges = np.histogram(your_df['X1'], bins=np.linspace(min_x, max_x, nbins + 1))
Now we have a histogram with nbins bins (np.linspace needs nbins + 1 edges to produce nbins bins).
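As a quick check of the binning, here is a small sketch on made-up data. The array `x1` stands in for `your_df['X1']`, and I am assuming `min_x` and `max_x` are simply the minimum and maximum of that column. Keep in mind that `np.histogram` returns one fewer bin than the number of edges you pass in.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1_000)  # hypothetical stand-in for your_df['X1']

nbins = 5
min_x, max_x = x1.min(), x1.max()

# nbins + 1 equally spaced edges give nbins bins
hist, edges = np.histogram(x1, bins=np.linspace(min_x, max_x, nbins + 1))

print(len(hist), len(edges))  # number of bins and number of edges
print(hist.sum())             # how many observations were counted
```

Because the edges span the full range of the data and the last bin includes its right edge, every observation is counted.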
Now the strategy is to draw a certain number of rows depending on their X1 value. We will draw more from the bins with more observations and fewer from the bins with fewer, so that the structure of X1 is preserved.
In particular, the relative contribution of each bin should be:
rel = [float(i) / sum(hist) for i in hist]
It will be something like [0.1, 0.2, 0.1, 0.3, 0.3]
If we need 200 samples, we need to draw:
draws_in_bin = [int(round(i * 200)) for i in rel]  # round first: int() alone truncates, e.g. 57.999... becomes 57
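Converting each bin to a whole number separately does not guarantee that the counts add up to exactly 200. If the total has to be exact, one option (a sketch of largest-remainder rounding, which is my suggestion and not the only way to do it; the `hist` values are made up) is:

```python
import numpy as np

hist = np.array([13, 27, 9, 31, 23])  # hypothetical bin counts
n_samples = 200

rel = hist / hist.sum()
exact = rel * n_samples                     # ideal, non-integer draws per bin
draws_in_bin = np.floor(exact).astype(int)  # start by rounding everything down

# Hand the leftover draws to the bins with the largest fractional parts
leftover = n_samples - draws_in_bin.sum()
order = np.argsort(exact - draws_in_bin)[::-1]
draws_in_bin[order[:leftover]] += 1

print(draws_in_bin, draws_in_bin.sum())
```

This keeps each bin within one draw of its ideal share while the total stays at `n_samples`.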
Now we know how many observations need to be drawn from each bin:
strats = []
for k in range(11):
    y_val = k*0.1
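The snippet above only sets up the loop. One possible way to fill in the body, as a sketch under assumptions: for each Y value, label the rows with their X1 bin using `pd.cut`, then draw `draws_in_bin[b]` rows from bin `b`. The toy data, the bin count, and the decision to cap each draw at the number of rows actually available in a bin are my own choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 11 distinct Y values, 2,000 rows for each
your_df = pd.DataFrame({
    "X1": rng.normal(size=22_000),
    "Y": np.repeat(np.arange(11) / 10, 2_000),
})

nbins, n_samples = 5, 200
edges = np.linspace(your_df["X1"].min(), your_df["X1"].max(), nbins + 1)
hist, _ = np.histogram(your_df["X1"], bins=edges)
draws_in_bin = [int(round(h / hist.sum() * n_samples)) for h in hist]

strats = []
for k in range(11):
    y_val = k / 10
    dummy_df = your_df[np.isclose(your_df["Y"], y_val)]
    # Label each row with the index of its X1 bin (0 .. nbins - 1)
    bin_idx = pd.cut(dummy_df["X1"], bins=edges, labels=False,
                     include_lowest=True)
    for b, n_draw in enumerate(draws_in_bin):
        in_bin = dummy_df[bin_idx == b]
        # Never ask for more rows than this bin holds for this Y value
        n = min(n_draw, len(in_bin))
        if n > 0:
            strats.append(in_bin.sample(n, random_state=0))

sample_df = pd.concat(strats)
```

With this capping, a sparse bin can leave a Y value with slightly fewer than 200 rows; sampling with `replace=True` would be an alternative if the exact count matters more than avoiding duplicates.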