Limited Weighted Choice

I have a weighted selection algorithm that works, but I would like to improve it in two aspects (in order of importance):

  • Ensure that the minimum quantity is selected from each possible choice.
  • Reduce the computational complexity from O(nm) to O(n) or O(m), where n is the requested number of randomly selected elements and m is the number of available element types.

Edit: The number of requested items is usually small (less than 100) for my purposes. Thus, algorithms with complexity O(t) or O(t + n), where t is the total number of elements, are usually worse than O(nm), since t > nm in my case.

Simplified code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public class Program
{
    static void Main(string[] args)
    {
        // List of items with discrete availability.
        // In this example there is a total of 244 discrete items and 3 types,
        // but there could be millions of items and hundreds of types.
        List<Stock<string>> list = new List<Stock<string>>();
        list.Add(new Stock<string>("Apple", 200));
        list.Add(new Stock<string>("Coconut", 2));
        list.Add(new Stock<string>("Banana", 42));

        // Pick 10 random items,
        // chosen with equal weight across all types of items.
        foreach (var item in Picker<string>.PickRandom(10, list))
        {
            // Do stuff with item
            Console.WriteLine(item);
        }
    }
}

// Can be thought of as a weighted choice,
// where (Item Available) / (Sum of all Available) is the weight.
public class Stock<T>
{
    public Stock(T item, int available)
    {
        Item = item;
        Available = available;
    }

    public T Item { get; set; }
    public int Available { get; set; }
}

public static class Picker<T>
{
    // Randomly choose the requested number of items from across all stock types.
    // Currently O(nm), where n is the requested number of items and m is the number of stock types.
    // O(n) or O(m) would be nice, which I believe is possible but am not sure how.
    // O(1) would be awesome, but I don't believe it is possible.
    public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
    {
        // O(m): calculate the total number of items, which implicitly gives
        // the per-item weight -> (Item Available) / (Total Items)
        int sumAll = list.Sum(x => x.Available);

        // O(1)
        if (sumAll < requested)
            throw new ArgumentException("Requested amount must not exceed total available");

        Random rng = new Random(Seed());

        // O(n) for the loop alone: O(nm) total
        for (int n = 0; n < requested; n++)
        {
            // O(1) to choose an item: uses the implicit ordering.
            // Note: the upper bound of Random.Next is exclusive, so this must
            // be sumAll + 1; rng.Next(1, sumAll) could never pick the last item.
            int choice = rng.Next(1, sumAll + 1);
            int current = 0;

            // O(m) to find the chosen item
            foreach (Stock<T> stock in list)
            {
                current += stock.Available;
                if (current >= choice)
                {
                    yield return stock.Item;
                    // O(1) to re-calculate the weights once the item is found
                    stock.Available -= 1;
                    sumAll--;
                    break;
                }
            }
        }
    }

    // Sufficiently random seed
    private static int Seed()
    {
        byte[] bytes = new byte[4];
        using (var rng = new RNGCryptoServiceProvider())
            rng.GetBytes(bytes);
        return bytes[0] << 24 | bytes[1] << 16 | bytes[2] << 8 | bytes[3];
    }
}

The PickRandom() function uses yield return and IEnumerable, but that is not required. I was just trying to be clever when I first wrote the function, so that it could stream through anything (even, say, an enumerable from a LINQ to SQL query). In practice, despite the flexibility, I never needed it.

My first thought for solving point 1 (guaranteeing that the minimum number of each possible choice is selected) was to select the required minimum from each type completely non-randomly, use my existing algorithm to select the remaining unconstrained portion, and then shuffle the results together. It seemed the most natural and intuitive approach, since it is what I would do in real life, but I suspect it is not the most efficient way.
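That first idea can be sketched as follows. This is only an illustration under the assumption that the Stock<T> class and Picker<T>.PickRandom from the code above are available; the method name PickRandomWithMinimum and the minimumPerType parameter are my own inventions:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not in the original code): take the guaranteed
// minimum of each type non-randomly, pick the unconstrained remainder with
// the existing weighted algorithm, then shuffle so the guaranteed items are
// not clustered at the front of the result.
public static class MinimumPicker
{
    public static IList<T> PickRandomWithMinimum<T>(
        int requested, int minimumPerType, List<Stock<T>> list, Random rng)
    {
        var results = new List<T>(requested);

        // 1. Non-random part: the required minimum from each type,
        //    removed from the pool as it is taken. O(m).
        foreach (var stock in list)
        {
            for (int i = 0; i < minimumPerType; i++)
                results.Add(stock.Item);
            stock.Available -= minimumPerType;
        }

        // 2. Fill the rest using the existing weighted picker.
        results.AddRange(Picker<T>.PickRandom(requested - results.Count, list));

        // 3. Fisher-Yates shuffle so the final order is uniformly random. O(n).
        for (int i = results.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (results[i], results[j]) = (results[j], results[i]);
        }
        return results;
    }
}
```

The extra shuffle is only O(n), so this approach does not change the overall O(nm) cost; it is correct but, as suspected above, not obviously faster.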

My second idea was to create a results array first, randomly select indexes to fill with the required minimums, and then fill in the rest using my existing algorithm. In all my attempts, though, this either increased the big-O complexity or turned into a mess of index bookkeeping. I still think such an approach is possible; I just have not worked it out yet.

Then I decided to come here, since this problem seems abstractable to a fairly general algorithm. But every keyword I search for points me to weighted random number generation (as opposed to choosing individual elements grouped by type with specific availability), and none of the results addresses the constraint of a guaranteed minimum for each element type while still remaining randomized. So I am hoping someone bright can see a simple, efficient solution, or that someone who has seen this problem before recognizes better keywords than mine and can point me in the right direction.

+6
2 answers

Here is a rough idea; I am sure this can be further improved:

  • Suppose that each available item has a unique identifier in the range [0..sumAll), where sumAll is the number of available items. So the first apple has ID 0, the last apple has ID 199, the first coconut has ID 200, and so on. Determining sumAll and the sub-range of each type is O(m), where m is the number of types.

  • Choose a random identifier (with all identifiers having the same weight). Repeat until you have a set of 10 distinct identifiers. This is O(n) in expectation as long as n is small relative to sumAll, where n is the number of items to select.

  • Determine the type of each selected identifier using a binary search. This is O(n log m).

  • Remove the selected items from the available items. This is O(m).

To guarantee the minimum number of items for each type, select them and remove them from the available items before step 1. This is O(m).
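Under the assumption that Stock<T> is the class from the question, the steps above might look like this (the name PickById and the explicit array of cumulative bounds are mine):

```csharp
using System;
using System.Collections.Generic;

public static class IdPicker
{
    // Sketch of the ID-based selection: every discrete item has an implicit
    // ID in [0, sumAll); distinct IDs are drawn uniformly, then each ID is
    // mapped back to its type by binary search over cumulative upper bounds.
    public static IEnumerable<T> PickById<T>(int requested, List<Stock<T>> list, Random rng)
    {
        // O(m): cumulative upper bounds, e.g. [200, 202, 244] for the
        // Apple/Coconut/Banana example.
        int[] bounds = new int[list.Count];
        int sumAll = 0;
        for (int i = 0; i < list.Count; i++)
            bounds[i] = sumAll += list[i].Available;

        // Expected O(n) while requested is small relative to sumAll:
        // redraw until the IDs are distinct.
        var ids = new HashSet<int>();
        while (ids.Count < requested)
            ids.Add(rng.Next(sumAll));

        // O(n log m): binary search each ID to its type.
        foreach (int id in ids)
        {
            int index = Array.BinarySearch(bounds, id);
            index = index < 0
                ? ~index      // insertion point = first bound greater than id
                : index + 1;  // an exact hit on a bound belongs to the next range
            list[index].Available--;   // O(1) per selected item
            yield return list[index].Item;
        }
    }
}
```

Because the IDs within one batch are distinct, the bounds never need rebuilding mid-batch; only the Available counters are adjusted as items are yielded.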

+2

Great question. I think O(mn) is the baseline cost, because for each of the n selections you need to re-evaluate the weighting, which is O(m).

The Picker class always seems to return a single type; you are not mixing stock types here (in your example, Stock<string>). So perhaps the Picker class can flatten all your stock into one list: less memory-efficient, but more computationally efficient per pick.

 public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
 {
     // Flatten the stock into one entry per discrete item: O(t) memory.
     var allStock = list
         .SelectMany(item => Enumerable.Range(0, item.Available).Select(r => item.Item))
         .ToList();

     Random rng = new Random();
     for (int n = 0; n < requested; n++)
     {
         // Random.Next's upper bound is exclusive, so Next(Count) covers every
         // index; Next(0, Count - 1) would never pick the last item.
         int choice = rng.Next(allStock.Count);
         var result = allStock[choice];
         allStock.RemoveAt(choice);
         yield return result;
     }
 }

The loss here is that you no longer modify the original Stock objects, but that may be an acceptable implementation detail (your sample shows the Stock objects being created inline, purely as arguments to Picker).

Edit:

Here is another example that stays very close to your existing code. It builds a dictionary in which the choice can be looked up. But again, the dictionary (which encodes the weighting) needs to be re-evaluated after each selection, which leads to O(mn).

 public static IEnumerable<T> PickRandom(int requested, IEnumerable<Stock<T>> list)
 {
     Random rng = new Random();
     for (int n = 0; n < requested; n++)
     {
         // Rebuild the cumulative ranges after every pick: O(m) each time.
         int cumulative = 0;
         var items = list.ToDictionary(item => new { Start = cumulative, End = cumulative += item.Available });

         // Exclusive upper bound: Next(cumulative) covers [0, cumulative - 1];
         // Next(0, cumulative - 1) would never pick the last item.
         int choice = rng.Next(cumulative);
         var foundItem = items.Single(i => i.Key.Start <= choice && choice < i.Key.End).Value;
         foundItem.Available--;
         yield return foundItem.Item;
     }
 }

Logically, is it possible to re-evaluate the weighting without scanning all your categories?
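One possible answer to that closing question, offered purely as a sketch of my own and not something from the answer above: keep the per-type availabilities in a Fenwick (binary indexed) tree. Then each weighted pick and each availability update costs O(log m), for O(n log m) overall instead of O(nm). The class name FenwickPicker and all details below are assumptions; Stock<T> is the class from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: Pick() finds the type whose cumulative range contains
// a uniform target and decrements its availability, both in O(log m), so the
// weighting never has to be re-scanned in full after a selection.
public class FenwickPicker<T>
{
    private readonly int[] tree;            // 1-indexed Fenwick tree of availabilities
    private readonly List<Stock<T>> stocks;
    private int sumAll;

    public FenwickPicker(List<Stock<T>> stocks)
    {
        this.stocks = stocks;
        tree = new int[stocks.Count + 1];
        for (int i = 0; i < stocks.Count; i++)
            Add(i, stocks[i].Available);    // O(m log m) build
    }

    private void Add(int index, int delta)
    {
        sumAll += delta;
        for (int i = index + 1; i < tree.Length; i += i & -i)
            tree[i] += delta;
    }

    public T Pick(Random rng)
    {
        int target = rng.Next(sumAll);      // uniform over all remaining items
        int index = 0;

        // Classic Fenwick descent: find the first position whose prefix sum
        // exceeds target, in O(log m).
        int step = 1;
        while (step * 2 < tree.Length) step *= 2;
        for (; step > 0; step >>= 1)
        {
            int next = index + step;
            if (next < tree.Length && tree[next] <= target)
            {
                target -= tree[next];
                index = next;
            }
        }

        Add(index, -1);                     // O(log m) weight update
        stocks[index].Available--;
        return stocks[index].Item;
    }
}
```

Whether the extra bookkeeping is worth it depends on how large m is; for a handful of types the simple O(m) scan will usually win in practice.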

0

Source: https://habr.com/ru/post/895723/

