Rule-Based Subset Generation

Let's say we have 5,000 users in a database. Each user row has a column for sex, a column for place of birth, and a column for marital status (married or unmarried).

How can I create a random subset (say, 100 users) that satisfies these conditions:

  • 40% should be men and 60% women.
  • 50% should be born in the USA, 20% in the UK, 20% in Canada, and 10% in Australia.
  • 70% should be married and 30% unmarried.

These conditions are independent; that is, we cannot simply do this:

  • (0.4 * 0.5 * 0.7) * 100 = 14 users who are US-born men and married.
  • (0.4 * 0.5 * 0.3) * 100 = 6 users who are US-born men and unmarried.

Is there an algorithm for generating such a subset?

+3
5 answers

Do you need an exact breakdown or an approximate one? Usually, if you are creating a sample like this, you are doing a statistical study, so a representative sample is enough.

Here's how to do it:

Write a genRandomIndividual() function.

Every time you create a person, use a random function to select the sex: male with 40% probability, female otherwise.

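Pick the place of birth the same way (draw a random number in 0-1: 0-.5 is the USA, .5-.7 the UK, .7-.9 Canada, .9-1 Australia).

Then pick the marital status (draw a random number in 0-1: 0-.7 is married, otherwise unmarried).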
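The interval scheme above can be sketched in Python as follows. This is only an illustration under assumptions: gen_random_individual is a Python rendition of the genRandomIndividual() name used in this answer, and the pick helper and the SEX, BIRTHPLACE and STATUS tables are hypothetical names introduced here, filled with the percentages from the question.

```python
import random

# Probabilities taken from the question; the table names are hypothetical.
SEX = [("male", 0.4), ("female", 0.6)]
BIRTHPLACE = [("USA", 0.5), ("UK", 0.2), ("Canada", 0.2), ("Australia", 0.1)]
STATUS = [("married", 0.7), ("unmarried", 0.3)]


def pick(weighted_values, rng):
    """Draw a uniform number in [0, 1) and return the value whose interval contains it."""
    r = rng.random()
    upper = 0.0
    for value, probability in weighted_values:
        upper += probability  # upper bound of this value's interval
        if r < upper:
            return value
    return weighted_values[-1][0]  # guard against floating-point round-off


def gen_random_individual(rng=random):
    """Generate one individual; each attribute is drawn independently."""
    return {
        "sex": pick(SEX, rng),
        "birthplace": pick(BIRTHPLACE, rng),
        "status": pick(STATUS, rng),
    }


if __name__ == "__main__":
    sample = [gen_random_individual() for _ in range(100)]
    print(sample[:3])
```

Because every attribute is drawn at random, a sample of 100 will only approximate the 40/60, 50/20/20/10 and 70/30 splits; it will not hit them exactly.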


+2





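First, convert each percentage constraint (for example, 40% men, 60% women) into a target count for the subset size (for example, for a subset of 100: 40 men, 60 women).

Then select rows one at a time, as follows: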

- Randomly select a row that has not been examined yet.
- Mark the row as examined.
- For each column constraint:
    * Get the value of the relevant column from the row.
    * Test for selectability:
        If there is a target for this value,
        and we have not yet selected the target number of occurrences of this value,
        then the row is selectable with respect to this column.
    * Otherwise, the row fails.
- If the row did not fail, select it: add it to the subset and count its values toward their targets.
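A minimal Python sketch of the steps above, under stated assumptions: rows are dictionaries keyed by column name, and the function names to_counts and select_subset, the targets layout, and the sample data in the usage part are all hypothetical, not part of the original answer. Visiting a shuffled copy of the rows stands in for "randomly select an unexamined row".

```python
import random
from collections import Counter


def to_counts(shares, size):
    """Convert fractional shares (e.g. 0.4) into target counts for a subset of `size`."""
    # Assumption: the shares divide `size` evenly, as in the question's example.
    return {value: round(share * size) for value, share in shares.items()}


def select_subset(rows, targets, rng=random):
    """Accept a row only if every constrained column still has room for its value.

    targets: {column: {value: target_count}}; assumed to sum to the same
    subset size for every column.
    """
    remaining = {col: Counter(counts) for col, counts in targets.items()}
    size = sum(next(iter(targets.values())).values())
    order = list(rows)
    rng.shuffle(order)  # each row is examined at most once, in random order
    subset = []
    for row in order:
        if len(subset) == size:
            break
        # A value with no target has a remaining count of 0, so the row fails.
        if all(remaining[col][row[col]] > 0 for col in remaining):
            for col in remaining:
                remaining[col][row[col]] -= 1
            subset.append(row)
    return subset


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for the 5,000-user table.
    users = [
        {
            "sex": rng.choice(["male", "female"]),
            "birthplace": rng.choice(["USA", "UK", "Canada", "Australia"]),
            "status": rng.choice(["married", "unmarried"]),
        }
        for _ in range(5000)
    ]
    targets = {
        "sex": to_counts({"male": 0.4, "female": 0.6}, 100),
        "birthplace": to_counts(
            {"USA": 0.5, "UK": 0.2, "Canada": 0.2, "Australia": 0.1}, 100
        ),
        "status": to_counts({"married": 0.7, "unmarried": 0.3}, 100),
    }
    chosen = select_subset(users, targets, rng)
    print(len(chosen), Counter(u["sex"] for u in chosen))
```

This greedy pass can run out of rows before the subset is full, for instance when the only open slots require a combination of values that no unexamined row has. In this sketch the function then returns a shorter list, so the caller should check the length and, if needed, retry with a different shuffle.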


For the OP's question, this row-by-row selection is possible: we can select rows at random and test them individually, because each row has a weight of one.

0

Source: https://habr.com/ru/post/1726149/

