What is the best way to efficiently extract a small random subset of a large enumerable?

What is the best way to take n elements from an IEnumerable<T> in random order?

I am writing a storage API that needs to return a small set of random elements from what is sometimes a huge enumeration. The underlying enumerable is sometimes an array and sometimes a lazily evaluated filter over that array.

Since I only take a proportionally small number of elements from the enumeration, it should be better to pick random indices into the enumeration and fetch just those elements, rather than randomly sorting the entire list with an existing algorithm and taking the top x, right?

Any better ideas?

+3
4 answers

If you know the number of elements in advance, it is fairly simple to generate n distinct random numbers in that range and then take the elements at those indices.
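A minimal sketch of that idea, assuming the source supports indexing (an array or a List<T>) and that n is small relative to the total, so repeated indices are rare. The names RandomIndexSampler and Sample are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class RandomIndexSampler
{
    // Returns count elements of source taken at distinct, uniformly chosen indices.
    // Intended for count much smaller than source.Count, where the retry loop
    // below rarely hits an index it has already used.
    public static List<T> Sample<T>(IReadOnlyList<T> source, int count, Random random)
    {
        if (count < 0 || count > source.Count)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        var chosen = new HashSet<int>();
        var result = new List<T>(count);

        while (result.Count < count)
        {
            var index = random.Next(source.Count); // 0 <= index < source.Count
            if (chosen.Add(index))                 // Add returns false for a repeated index
            {
                result.Add(source[index]);
            }
        }

        return result;
    }
}
```

Calling RandomIndexSampler.Sample(items, 10, new Random()) returns 10 elements from 10 different positions, in the order the indices were drawn. For a lazily evaluated filter this does not apply directly, because the filter has no indexer and its length is not known without enumerating it.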

0

Here is another idea:

using System;
using System.Collections.Generic;
using System.Linq;

namespace RandomElements
{
    class Program
    {
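        static IEnumerable<int> GetRandomElements(IEnumerable<int> source, int count)
        {
            var random = new Random();
            var length = source.Count(); // Costs an extra pass unless source is an ICollection.

            if (length < count)
            {
                throw new InvalidOperationException("Seriously?");
            }

            // Dispose the enumerator once the iterator finishes or is abandoned.
            using (var enumerator = source.GetEnumerator())
            {
                while (count > 0)
                {
                    const int bias = 5;

                    // Skip ahead by 1..maxSkip elements. Capping at length - count + 1
                    // leaves enough elements for the remaining picks, so we never starve
                    // and never pass a negative bound to Random.Next.
                    var maxSkip = Math.Max(1, Math.Min(length / bias, length - count + 1));
                    var next = random.Next(maxSkip) + 1;
                    length -= next;

                    while (next > 0)
                    {
                        if (!enumerator.MoveNext())
                        {
                            throw new InvalidOperationException("What, we starved out?");
                        }

                        --next;
                    }

                    // Picks come out in source order and are not uniformly distributed.
                    yield return enumerator.Current;

                    --count;
                }
            }
        }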

        static void Main(string[] args)
        {
            var sequence = Enumerable.Range(1, 100);
            var random = GetRandomElements(sequence, 10);

            random.ToList().ForEach(Console.WriteLine);
        }
    }
}

This walks the enumeration only once, provided you pass an ICollection; otherwise Count() needs an extra pass to determine the length. That can be useful when it is expensive to enumerate the sequence or to copy all of its elements.
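When the length is not known up front, as with a lazily evaluated filter, a different technique from the one in this answer is reservoir sampling, which also needs only a single pass. The sketch below is an illustration under that assumption, not the answer's method; ReservoirSampler and Sample are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ReservoirSampler
{
    // Returns up to count elements chosen uniformly at random from source,
    // enumerating it exactly once and without knowing its length.
    // Assumes the source yields fewer than int.MaxValue elements.
    public static List<T> Sample<T>(IEnumerable<T> source, int count, Random random)
    {
        if (count < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        var reservoir = new List<T>(count);
        var seen = 0;

        foreach (var item in source)
        {
            seen++;

            if (reservoir.Count < count)
            {
                // The first count items fill the reservoir directly.
                reservoir.Add(item);
            }
            else
            {
                // Keep the new item with probability count / seen by
                // overwriting a uniformly chosen slot.
                var slot = random.Next(seen); // 0 <= slot < seen
                if (slot < count)
                {
                    reservoir[slot] = item;
                }
            }
        }

        return reservoir;
    }
}
```

Every element ends up in the result with the same probability, but the positions inside the returned list are not themselves uniformly shuffled, so shuffle the result afterwards if the order matters. If the source has fewer than count elements, all of them are returned.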

+1

Knuth shuffle, [...] n [...]
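One common way to apply a Knuth (Fisher-Yates) shuffle to this problem is to stop after n swaps instead of shuffling the whole list. The sketch below assumes that reading and assumes the data sits in an indexable, mutable list; PartialShuffle and TakeRandom are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class PartialShuffle
{
    // Moves count randomly chosen elements to the front of items (in place)
    // and returns them in random order. Only count swaps are performed,
    // so the cost does not depend on how large items is.
    public static List<T> TakeRandom<T>(IList<T> items, int count, Random random)
    {
        if (count < 0 || count > items.Count)
        {
            throw new ArgumentOutOfRangeException(nameof(count));
        }

        for (var i = 0; i < count; i++)
        {
            // Pick a position from the not-yet-fixed tail [i, items.Count).
            var j = random.Next(i, items.Count);
            var tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }

        var result = new List<T>(count);
        for (var i = 0; i < count; i++)
        {
            result.Add(items[i]);
        }

        return result;
    }
}
```

Note that this reorders the list passed in; work on a copy if the caller's order has to be preserved, at the cost of copying every element once.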

0

Source: https://habr.com/ru/post/1705570/

