Although the accepted answer fully meets the requirements, it carries some overhead. First, since we are talking about the TPL, the arrays are probably large, so allocating a separate array for every chunk is expensive. In addition, the solution offered by @viveknuna does not guarantee the order in which the chunks are processed. If that is acceptable, you should probably use the answer from @DmitryBychenko with a small update:
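using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

const int chunkSize = 3;
var array = Enumerable.Range(1, 9).ToArray();
// Split the index range [0, array.Length) into ranges of chunkSize elements.
var partitioner = Partitioner.Create(0, array.Length, chunkSize);
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(partitioner, parallelOptions, part =>
{
    // part.Item1 is the inclusive start index of this chunk; Skip/Take enumerate lazily.
    Parallel.ForEach(array.Skip(part.Item1).Take(chunkSize), parallelOptions, value =>
    {
        // Process a single value here.
    });
});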
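Note that setting ParallelOptions.MaxDegreeOfParallelism to 1 makes the loops run on one thread at a time, so the work is then effectively sequential.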
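If the order of the results matters more than the order of execution, one possible variation is to keep the range partitioner but write each result into a preallocated array by index. This is only a sketch under my own assumptions: the SquareInChunks helper and the squaring operation are hypothetical stand-ins for whatever the real per-element work is, and it replaces the nested Parallel.ForEach with a plain for loop over each range.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkDemo
{
    // Hypothetical helper (not from the original answers): squares every
    // element and stores the result at the same index as the input, so the
    // output order matches the input no matter how the chunks are scheduled.
    public static int[] SquareInChunks(int[] source, int chunkSize)
    {
        var result = new int[source.Length];

        // Partitioner.Create throws if the range is empty, so return early.
        if (source.Length == 0)
        {
            return result;
        }

        // Each range is a Tuple<int, int>: inclusive start, exclusive end.
        var partitioner = Partitioner.Create(0, source.Length, chunkSize);

        Parallel.ForEach(partitioner, range =>
        {
            // Chunks run in parallel; elements inside a chunk run sequentially.
            for (int i = range.Item1; i < range.Item2; i++)
            {
                result[i] = source[i] * source[i];
            }
        });

        return result;
    }

    public static void Main()
    {
        var array = Enumerable.Range(1, 9).ToArray();
        // prints 1,4,9,16,25,36,49,64,81
        Console.WriteLine(string.Join(",", SquareInChunks(array, 3)));
    }
}
```

Because every index is written by exactly one range, no locking is needed, and no per-chunk arrays are allocated. The chunks themselves may still execute in any order.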