A mean, median, and standard deviation alone are unlikely to be enough, especially if you have outliers.
If exact percentiles are required, this is a parallel computation problem. Some work has been done in this direction, for example in the parallel mode of the C++ standard library (STL).
If only approximate percentiles are required, the Cross Validated question "Estimating quantiles of given quantiles of a subset" suggests a subsampling approach. You would take some (but not all) of the data from each data set, combine the samples into a new data set small enough to fit on one machine, and calculate its percentiles.
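A minimal sketch of the subsampling idea, using only the Python standard library. The function names (`subsample`, `percentile`, `approximate_percentiles`) and the choice of sampling the same fraction from every segment are assumptions made for illustration; using one common fraction keeps each segment's share of the combined sample roughly equal to its share of the full data.

```python
import random

def subsample(segment, fraction, rng):
    """Simple random sample (without replacement) of roughly `fraction` of a list."""
    k = max(1, round(len(segment) * fraction))
    return rng.sample(segment, k)

def percentile(sorted_values, p):
    """Linearly interpolated percentile of an already sorted list, p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no data")
    pos = (len(sorted_values) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def approximate_percentiles(segments, fraction, percents, seed=0):
    """Sample every segment, pool the samples, and compute percentiles of the pool."""
    rng = random.Random(seed)
    combined = []
    for segment in segments:
        # In a real setting each segment would be sampled on its own machine
        # and only the sample would be shipped to the combining machine.
        combined.extend(subsample(segment, fraction, rng))
    combined.sort()
    return {p: percentile(combined, p) for p in percents}
```

The result is only an estimate: its accuracy depends on the sample size, and extreme percentiles (say the 99.9th) need a larger sample than the median does.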
Another approximate approach, efficient if the percentiles of each segment are already available, is to approximate the cumulative distribution function of each segment as a step function of its percentiles. The overall distribution is then a finite mixture of the segment distributions, and its cumulative distribution function is the weighted sum of the segments' cumulative distribution functions, with weights proportional to the segment sizes. The quantile function (i.e., the percentiles) can be calculated by numerically inverting that cumulative distribution function.
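The mixture approach can be sketched as follows, again in plain Python. The data layout is an assumption for illustration: each segment is a dict with its size `n`, a list of `levels` (cumulative probabilities in [0, 1]) and the matching sorted quantile `values`. Because the approximated CDF is a step function that only jumps at the known quantile values, the inversion here simply scans those values instead of using a general root finder.

```python
from bisect import bisect_right

def step_cdf(levels, values, x):
    """Step-function CDF: the highest known level whose quantile value is <= x."""
    idx = bisect_right(values, x)
    return levels[idx - 1] if idx > 0 else 0.0

def mixture_cdf(segments, x):
    """Weighted sum of the segment CDFs, weights proportional to segment size."""
    total = sum(seg["n"] for seg in segments)
    return sum(
        seg["n"] / total * step_cdf(seg["levels"], seg["values"], x)
        for seg in segments
    )

def mixture_quantile(segments, p):
    """Smallest known quantile value x with mixture CDF(x) >= p, for p in (0, 1]."""
    candidates = sorted({v for seg in segments for v in seg["values"]})
    for x in candidates:
        if mixture_cdf(segments, x) >= p:
            return x
    # The CDF never reaches p when the segments lack a level of 1.0;
    # fall back to the largest known value.
    return candidates[-1]
```

For example, with two equally sized segments whose quartiles are 1, 2, 3, 4 and 5, 6, 7, 8, the estimated overall median is 4 and the estimated 75th percentile is 6. The resolution of the estimate is limited by how many percentiles each segment reports.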