MySQL query to get the "intersection" of multiple queries with restrictions

Suppose I have one mySQL table (users) with the following fields:

userid gender region age ethnicity income 

I want to be able to return the number of complete records based on the number the user enters. In addition, they will also provide additional criteria.

In the simplest example, they can request 1000 entries, in which 600 entries should have gender = "Male" and 400 entries, where gender = "Female". It is quite simple to do.

Now take one more step. Suppose now they want to specify Region:

 GENDER Male: 600 records Female: 400 records REGION North: 100 records South: 200 records East: 300 records West: 400 records 

Again, only 1000 records need to be returned, but in the end there should be 600 men, 400 women, 100 northerners, 200 southerners, 300 eastern residents and 400 westerners.

I know this is not valid syntax, but using pseudo-mySQL code, I hope it illustrates what I'm trying to do:

 (SELECT * FROM users WHERE gender = 'Male' LIMIT 600 UNION SELECT * FROM users WHERE gender = 'Female' LIMIT 400) INTERSECT (SELECT * FROM users WHERE region = 'North' LIMIT 100 UNION SELECT * FROM users WHERE region = 'South' LIMIT 200 UNION SELECT * FROM users WHERE region = 'East' LIMIT 300 UNION SELECT * FROM users WHERE region = 'West' LIMIT 400) 

Please note that I am not looking for a one-time request. The total number of records and the number of records in each criterion will constantly change based on user input. So, I'm trying to come up with a general solution that can be reused over and over again, rather than a hard-coded solution.

To complicate the situation, add even more criteria. There may also be age, ethnicity and income, each with its own number of entries for each group, additional code added above:

 INTERSECT (SELECT * FROM users WHERE age >= 18 and age <= 24 LIMIT 300 UNION SELECT * FROM users WHERE age >= 25 and age <= 36 LIMIT 200 UNION SELECT * FROM users WHERE age >= 37 and age <= 54 LIMIT 200 UNION SELECT * FROM users WHERE age >= 55 LIMIT 300) INTERSECT etc. 

I'm not sure if this is possible to write in a single query or it requires several statements and iterations.

+43
sql database inner-join mysql select
Nov 26 '14 at 22:40
source share
10 answers

Flatten your criteria




You can smooth your multidimensional criteria into single-level criteria.

enter image description here

Now these criteria can be achieved in one request, as follows

 (SELECT * FROM users WHERE gender = 'Male' AND region = 'North' LIMIT 40) UNION ALL (SELECT * FROM users WHERE gender = 'Male' AND region = 'South' LIMIT 80) UNION ALL (SELECT * FROM users WHERE gender = 'Male' AND region = 'East' LIMIT 120) UNION ALL (SELECT * FROM users WHERE gender = 'Male' AND region = 'West' LIMIT 160) UNION ALL (SELECT * FROM users WHERE gender = 'Female' AND region = 'North' LIMIT 60) UNION ALL (SELECT * FROM users WHERE gender = 'Female' AND region = 'South' LIMIT 120) UNION ALL (SELECT * FROM users WHERE gender = 'Female' AND region = 'East' LIMIT 180) UNION ALL (SELECT * FROM users WHERE gender = 'Female' AND region = 'West' LIMIT 240) 

Problem

  • It does not always return the correct result. For example, if there are less than 40 users who are male and from the north, then the query will return less than 1000 entries.



Adjust criteria




Let's say that there are less than 40 users who are men and from the north. Then you need to adjust the number of other criteria to cover the missing amount from “Male” and “North”. I believe that this cannot be done using bare SQL. This is the pseudo code that I have in mind. To simplify, I think we will only query men, women, north and south.

 conditions.add({ gender: 'Male', region: 'North', limit: 40 }) conditions.add({ gender: 'Male', region: 'South', limit: 80 }) conditions.add({ gender: 'Female', region: 'North', limit: 60 }) conditions.add({ gender: 'Female', region: 'South', limit: 120 }) foreach(conditions as condition) { temp = getResultFromDatabaseByCondition(condition) conditions.remove(condition) // there is not enough result for this condition, // increase other condition quantity if (temp.length < condition.limit) { adjust(...); } } 

Let's say that there are only 30 men from the northern countries. Therefore, we need to configure +10 male and +10 northener.

 To Adjust --------------------------------------------------- Male +10 North +10 Remain Conditions ---------------------------------------------------- { gender: 'Male', region: 'South', limit: 80 } { gender: 'Female', region: 'North', limit: 60 } { gender: 'Female', region: 'South', limit: 120 } 

“Male” + “South” is the first condition that matches the “Male” setting condition. Increase it by +10 and remove it from the "remain in state" list. As we increase the South, we need to reduce it under other conditions. So add the condition "South" to the "Configure" list.

 To Adjust --------------------------------------------------- South -10 North +10 Remain Conditions ---------------------------------------------------- { gender: 'Female', region: 'North', limit: 60 } { gender: 'Female', region: 'South', limit: 120 } Final Conditions ---------------------------------------------------- { gender: 'Male', region: 'South', limit: 90 } 

Find the condition corresponding to "South" and repeat the same process.

 To Adjust --------------------------------------------------- Female +10 North +10 Remain Conditions ---------------------------------------------------- { gender: 'Female', region: 'North', limit: 60 } Final Conditions ---------------------------------------------------- { gender: 'Female', region: 'South', limit: 110 } { gender: 'Male', region: 'South', limit: 90 } 

And finally

 { gender: 'Female', region: 'North', limit: 70 } { gender: 'Female', region: 'South', limit: 110 } { gender: 'Male', region: 'South', limit: 90 } 

I have not yet come up with the exact implementation of the setup. This is harder than I expected. I will update as soon as I can understand how to implement it.

+16
Dec 11 '14 at 2:42
source share

The problem you are describing is a multidimensional modeling problem. In particular, you are trying to get a stratified sample from several dimensions at the same time. The key to this is the descent to the lowest level of detail and the accumulation of the sample from there.

I also suggest that you want the sample to be representative at all levels. That is, you do not want all users from Severny to be women. Or all “men” must be from the “West”, even if it meets the final criteria.

Start by thinking in terms of the total number of records, dimensions, and distributions along each dimension. For example, for the first sample, think of it as:

  • 1000 entries
  • 2 measurements: floor, area
  • gender split: 60%, 40%
  • division of the region: 10%, 20%, 30%, 40%

Then you want to highlight these numbers for each gender / region combination. Figures:

  • North, Male: 60
  • North, Female: 40
  • South, Man: 120
  • South, Woman: 80
  • East, Man: 180
  • East, Woman: 120
  • West, Man: 240
  • West, Female: 160

You will see that they fold in size.

Calculating the numbers in each cell is quite simple. This is a percentage of the total. So, "East, woman" is 30% * 40% * 1000., Voila! The value is 120.

Here is the solution:

  • Take the input for each dimension as a percentage of the total. And make sure they add up to 100% for each dimension.
  • Create a table of expected percentages for each cell. This is the product of interest for each dimension.
  • A few expected percentages in the total amount.
  • The final request is described below.

Suppose you have a cells table with the expected counter and raw data ( users ).

 select enumerated.* from (select u.*, (@rn := if(@dims = concat_ws(':', dim1, dim2, dim3), @rn + 1, if(@dims := concat_ws(':', dim1, dim2, dim3), 1, 1) ) ) as seqnum from users u cross join (select @dims = '', @rn := '') vars order by dim1, dim2, dim3, rand() ) enumerated join cells on enumerated.dims = cells.dims where enuemrated.seqnum <= cells.expectedcount; 

Please note that this is a sketch of the solution. You must fill in the size details.

This will work as long as you have enough data for all the cells.

In practice, when performing this type of multidimensional stratified sampling, you risk that the cells will be empty or too small. When this happens, you can often fix it with a subsequent extra skip. Take what you can from large enough cells. Usually they make up the majority of the required data. Then add entries to meet the final score. The entries to be added are those whose values ​​correspond to what is needed for the most necessary measurements. However, this decision simply assumes that there is enough data to meet your criteria.

+10
Dec 11 '14 at 1:07
source share

The problem with your request is that there are a huge number of options that you can use to achieve the proposed numbers:

  Male Female Sum ----------------------------- North: 100 0 100 South: 200 0 200 East: 300 0 300 West: 0 400 400 Sum: 600 400 ----------------------------- North: 99 1 100 South: 200 0 200 East: 300 0 300 West: 1 399 400 Sum: 600 400 ----------------------------- .... ----------------------------- North: 0 100 100 South: 200 0 200 East: 0 300 300 West: 400 0 400 Sum: 600 400 

Just combining the North, East and West (always with a south man: 200), you will get 400 opportunities to achieve the proposed numbers. And it gets even more complicated when you only have a limited number of entries for each "class" ("Male / Northern =" class ").

You may need up to MIN(COUNT(gender), COUNT(location)) records for each cell in the table above (for the case when it will be zero).

It depends on the:

  Male Female --------------------- North: 100 100 South: 200 200 East: 300 300 West: 400 400 

So, you need to calculate the available records for each gender / location pair AVAILABLE(gender, location) .

The search for a concrete correspondence seems close to semi-magic squares [1] [2] .

And there are several questions on math.stackexchange.com about this [3] [4] .

I ended up reading some articles on how to build them, and I doubt it can be done with just one choice.

If you have enough records and will fail in this situation:

  Male Female --------------------- North: 100 0 South: 200 200 East: 300 0 West: 200 200 

I would go with iterative places of the trough and add at each step a proportional number of men / women:

  • M: 100 (16%); F: 0 (0%)
  • M: 100 (16%); F: 200 (50%)
  • M: 400 (66%); F: 200 (50%)
  • M: 600 (100%); F: 400 (100%)

But this will give you only rough results and after checking those that you may want to repeat the result several times and adjust the calculations in each category to be "good enough".

+3
Dec 08 '14 at 16:24
source share

This can be solved in two stages. I will describe how to do this for an example where gender and region are dimensions. Then I will describe a more general case. At the first stage, we solve a system of equations of 8 variables, then we take a disconnected union of 8 selection operators, limited by the solutions found in the first step. Please note that there are only 8 possibilities for any row. They can be men or women, and then the region is one of the northern, southern, eastern or western. Let now

 X1 equal the number of rows that are male and from the north, X2 equal the number of rows that are male and from the south, X3 equal the number of rows that are male and from the east, X4 equal then number that are male and from the west X5 equal the number of rows that are female and from the north, X6 equal the number of rows that are female and from the south, X7 equal the number of rows that are female and from the east, X8 equal then number that are female and from the west 

Equations

  X1+X2+X3+X4=600 X5+X6+X7+X8=400 X1+X5=100 X2+X6=200 X3+X7=300 X4+X8=400 

Now decide for X1, X2, ... X8 in the above. There are many solutions (I will tell you how to solve in an instant). Here is the solution:

 X1=60, X2=120, X3=180,X4=240,X5=40,X6=80,X7=120,X8=160. 

Now we can get the result by a simple connection of 8 elements:

 (select * from user where gender='m' and region="north" limit 60) union distinct(select * from user where gender='m' and region='south' limit 120) union distinct(select * from user where gender='m' and region='east' limit 180) union distinct(select * from user where gender='m' and region='west' limit 240) union distinct(select * from user where gender='f' and region='north' limit 40) union distinct(select * from user where gender='f' and region='south' limit 80) union distinct(select * from user where gender='f' and region='east' limit 120) union distinct(select * from user where gender='f' and region='west' limit 160); 

Please note that if the database does not contain 60 rows, satisfy the first one selected above, then the specific given solution will not work. Therefore, we need to add other restrictions, LT:

 0<X1 <= (select count(*) from user where from user where gender='m' and region="north") 0<X2 <= (select count(*) from user where gender='m' and region='south') 0<X3 <= (select count(*) from user where gender='m' and region='east' ) 0<X4 <= (select count(*) from user where gender='m' and region='west') 0<X5 <= (select count(*) from user where gender='f' and region='north' ) 0<X6 <= (select count(*) from user where gender='f' and region='south') 0<X7 <= (select count(*) from user where gender='f' and region='east' ) 0<X8 <= (select count(*) from user where gender='f' and region='west'); 

Now let's summarize the permissible splits for this case. Equations E:

  X1+X2+X3+X4=n1 X5+X6+X7+X8=n2 X1+X5=m1 X2+X6=m2 X3+X7=m3 X4+X8=m4 

The numbers n1, n2, m1, m2, m3, m4 are given and satisfy n1 + n2 = (m1 + m2 + m3 + m4). Thus, we have led the problem to solving the equations of LT and E above. This is simply a linear programming problem and can be solved using the simplex method or other methods. Another possibility is to consider this as a System of linear Diophantine equations and use the methods for this to find solutions. In any case, I solved the problem of finding a solution to the above equations. (Given that the equations have a special form, there may be a faster way than using the simplex method or solving a system of linear Diophantine equations). Once we decide for Xi the final solution:

 (select * from user where gender='m' and region="north" limit :X1) union distinct(select * from user where gender='m' and region='south' limit :X2) union distinct(select * from user where gender='m' and region='east' limit :X3) union distinct(select * from user where gender='m' and region='west' limit :X4) union distinct(select * from user where gender='f' and region='north' limit :X5) union distinct(select * from user where gender='f' and region='south' limit :X6) union distinct(select * from user where gender='f' and region='east' limit :X7) union distinct(select * from user where gender='f' and region='west' limit :X8); 

Denote the dimension D with n possibilities by D: n. Suppose you have dimensions D1: n1, D2: n2, ... DM: nM. It will generate the variables n1 * n2 * ... nM. The number of generated equations is n1 + n2 + ... nM. Instead, we define a general method to take another case of three dimensions, 4 dimensions and 2 dimensions; Allows you to call the possible values ​​for D1 equal to d11, d12, d13, D2 - the values ​​of d21, d22, d23, d24 and D3 are equal to d31, d32. We will have 24 variables, and the equations:

  X1 + X2 + ...X8=n11 X9 + X10 + ..X16=n12 X17+X18 + ...X24=n13 X1+X2+X9+x10+x17+x18=n21 X3+X4+X11+x12+x19+x20=n22 X5+X6+X13+x14+x21+x22=n23 X7+X8+X15+x116+x23+x24=n24 X1+X3+X5+...X23=n31 X2+X4+......X24=n32 

Where

 X1 equals number with D1=d11 and D2=d21 and D3=d31 X2 equals number with D1=d11 and D2=d21 and D3 = d31 .... X24 equals number with D1=D13 and D2=d24, and D3=d32. 

Add fewer restrictions. Then decide for X1, X2, ... X24. Create 24 select statements and take a disconnected union. We can solve similarly for any measurements.

So, in short: given the sizes D1: n1, D2: n2, ... DM: nM, we can solve the corresponding linear programming problem, as described above for n1 * n2 * ... nM variables, and then generate the solution by taking a disconnected union over the operators n1 * n2 * ... nM select. So, yes, we can generate a solution using select statements, but first we need to solve the equations and determine the limits, getting the calculations for each of the variables n1 * n2 * ... nM.

Despite the fact that the generosity is over, I'm going to add a little more for those you are interested in. I affirm that I have fully shown how to solve this, if there is a solution.

To clarify my approach. In the case of 3 measurements, let's say we divided the age into one of three possibilities. Then make good use of gender and region, as in the question. For each user, there are 24 different options corresponding to where they fall into these categories. Let Xi be the number of each of these possibilities in the final result. Let me write a matrix where each row represents one of the possibilities. Each user will contribute a maximum of 1 to m or f, 1 to the north, south, east or west, and 1 to the age category. And there are only 24 possibilities for the user. Let's show the matrix: (abc) 3 ages, (nsew) regions and (mf) male or female: a age less than or equal to 10, b - age from 11 to 30 years, c - age from 31 to 50 years.

  abc nsew mf X1 100 1000 10 X2 100 1000 01 X3 100 0100 10 X4 100 0100 01 X5 100 0010 10 X6 100 0010 01 X7 100 0001 10 X8 100 0001 01 X9 010 1000 10 X10 010 1000 01 X11 010 0100 10 X12 010 0100 01 X13 010 0010 10 X14 010 0010 01 X15 010 0001 10 X16 010 0001 01 X17 001 1000 10 X18 001 1000 01 X19 001 0100 10 X20 001 0100 01 X21 001 0010 10 X22 001 0010 01 X23 001 0001 10 X24 001 0001 01 

Each row represents a user in which there is 1 in the column if he contributes to the result. For example, the first line shows 1 for a, 1 for n, and 1 for m. This means that the user's age is less than or equal to 10, from the north and is a man. Xi represents how many of this number of rows are in the final result. So, let's say that X1 is 10, which means that we say that the end result has 10 results, all of which are from the north, are males and less than or equal to 10. Okay, now we just need to add something. Please note that the first 8 X1+X2+X3+X4+X5+X6+X7+X8 are all lines whose age is less than or equal to 10. They must correspond to all that we have chosen for this category. Similarly for the next 2 sets of 8.

So, until now we get the equations: (na is a number with an age of less than 10, nb is an age from 10 to 20, nc is a number whose age is less than 50

 X1+X2+X3+X4+X5+X6+X7+X8 = na X9+X10+X11 + .... X16 = nb X17+X18+X19+... X24=nc 

These are age-related cleavages. Now let's look at the division of the area. Just add the variables to the "n" column,

 X1+X2+X9+X10+X17+X18 = nn X3+X4+X11+X12+X19+20=ns ... 

etc .. you see how i get these equations just by looking down the columns? We continue for ew and mf. giving 3 + 4 + 2 equations in general. So what I did here is pretty simple. I reasoned that any row you select contributes to each of the three dimensions, and there are only 24 possibilities. Then let Xi be the number for each opportunity, and you will get the equations that need to be solved. It seems to me that any method that you came up with should be a solution to these equations. In other words, I simply reformulated the problem in terms of solving these equations.

Now we need an integer solution, since we cannot have a fractional string. Please note that these are all linear equations. But we want an integer solution. Here is a link to an article that describes how to solve them: https://www.math.uwaterloo.ca/~wgilbert/Research/GilbertPathria.pdf

+1
Dec 14 '14 at 11:29
source share

I would build a database distribution map and use it to implement sampling logic. Bonuses include the ability to add quick demographic feedback to the user and additional server load. On the console side, you need to implement a mechanism to synchronize the database and the map.

It might look like this using JSON:

 {"gender":{ "Male":{ "amount":35600, "region":{ "North":{ "amount":25000, "age":{ "18":{ "amount":2400, "ethnicity":{ ... "income":{ ... } }, "income":{ ... "ethnicity":{ ... } } }, "19":{ ... }, ... "120":{ ... } }, "ethnicity":{ ... }, "income":{ ... } }, "South":{ ... }, ... } "age":{ ... } "ethnicity":{ ... }, "income":{ ... } }, "Female":{ ... } }, "region":{ ... }, "age":{ ... }, "ethnicity":{ ... }, "income":{ ... }} 

Thus, the user selects

 total 1000 600 Male 400 Female 100 North 200 South 300 East 400 West 300 <20 years old 300 21-29 years old 400 >=30 years old 

Calculate the linear distribution:

 male-north-u20: 1000*0.6*0.1*0.3=18 male-north-21to29: 18 male-north-o29: 24 (keep a track of rounding errors) etc 

then we will check the map:

 tmp.male.north.u20=getSumUnder(JSON.gender.Male.region.North.age,20) // == 10 tmp.male.north.f21to29=getSumBetween(JSON.gender.Male.region.North.age,21,29) // == 29 tmp.male.north.o29=getSumOver(JSON.gender.Male.region.north.age,29) // == 200 etc 

Check everything that matches the linear distribution as well, and track the excess. If something (e.g. male.north.u20) below is first set up in the parent (to make sure male.north, for example, matches the criteria), you get 8 for u20 and overuse 8 for f21to29. After the first run, configure each missing criterion in other regions. So tmp.male.south.u20+=8;tmp.male.south.f21to29-=8; .

It is quite tiring to figure it out.

, SQL-.

+1
15 . '14 12:16
source share

, .

, , - .
, , - ,

:

{F1, F2, ... Fn} , , . , F1 , (F1V1 = : 60%, F1V2 = : 40%). , ( X ). , , . ( )

EG: , 2 , F1: , {F1V1 = : 60%, F1V2 = : 40%}, F2: , {F2V1 = : 50%, F2V2 = : 50%} , X = 10 .
, 6 , 4 , 5 - 5 - .

  • sql- F1 - (.
    • WHERE gender = 'Male' : 0,6,
    • WHERE gender = 'Female' : 0.4)
  • F2 - sql- - , F1 F2, . , 2 x 2 = 4
    • WHERE gender = 'Male' AND region = 'North' : 0,6 * 0,5 = 0,3,
    • WHERE gender = 'Female' AND region = 'North' : 0,4 * 0,5 = 0,2,
    • WHERE gender = 'Male' AND region = 'South' : 0,6 * 0,5 = 0,3,
    • WHERE gender = 'Female' AND region = 'South' : 0,4 * 0,5 = 0,2
  • 2 F3 - Fn. ( 2 , )
  • SQL- [, ] * X = ( 0.3 * 10 = 3 Male/North, 0.2 * 10 = 2 Female/North ..)
  • , sql- - SQL

# , . SQL

, - , , , - , .

- .

 // This is an example of a public class you could use to hold one of your filters // For example - if you wanted 60% male / 40% female, you could have an item with // item1 = {Fraction: 0.6, ValueExact: 'Male', RangeStart: null, RangeEnd: null} // & item2 = {Fraction: 0.4, ValueExact: 'Female', RangeStart: null, RangeEnd: null} public class FilterItem{ public decimal Fraction {get; set;} public string ValueExact {get; set;} public int? RangeStart {get; set;} public int? RangeEnd {get; set;} } // This is an example of a public method you could call to build your SQL // - passing in a generic list of desired filter // for example the dictionary entry for the above filter would be // {Key: "gender", Value: new List<FilterItem>(){item1, item2}} public string BuildSQL(Dictionary<string, List<FilterItem>> filters, int TotalItems) { // we want to build up a list of SQL stubs that can be unioned together. var sqlStubItems = new List<SqlItem>(); foreach(var entry in filters) { AddFilter(entry.Key, entry.Value, sqlStubItems); } // ok - now just combine all of the sql stubs into one big union. var result = ""; // Id use a stringbuilder for this normally, // but this is probably more cross-language readable. int limitSum = 0; for(int i = 0; i < sqlStubItems.Count; i++) // string.Join() would be more succinct! { var item = sqlStubItems[i]; if (i > 0) { result += " UNION "; } int limit = (int)Math.Round(TotalItems * item.Fraction, 0); limitSum+= limit; if (i == sqlStubItems.Count - 1 && limitSum != TotalItems) { //may need to adjust one of the rounded items to account //for rounding errors making a total that is not the //originally required total limit. limit += (TotalItems - limitSum); } result += item.Sql + " LIMIT " + Convert.ToString(limit); } return result; } // This method expands the number of SQL stubs for every filter that has been added. // each existing filter is split by the number of items in the newly added filter. private void AddFilter(string filterType, List<FilterItem> filterValues, List<SqlItem> SqlItems) { var newItems = new List<SqlItem>(); foreach(var filterItem in filterValues) { string filterAddon; if (filterItem.RangeStart.HasValue && filterItem.RangeEnd.HasValue){ filterAddon = filterType + " >= " + filterItem.RangeStart.ToString() + " AND " + filterType + " <= " + filterItem.RangeEnd.ToString(); } else { filterAddon = filterType + " = '" + filterItem.ValueExact.Replace("'","''") + "'"; //beware of SQL injection. (hence the .Replace() above) } if(SqlItems.Count() == 0) { newItems.Add(new SqlItem(){Sql = "Select * FROM users WHERE " + filterAddon, Fraction = filterItem.Fraction}); } else { foreach(var existingItem in SqlItems) { newItems.Add(new SqlItem() { Sql = existingItem + " AND " + filterAddon, Fraction = existingItem.Fraction * filterItem.Fraction }); } } } SqlItems.Clear(); SqlItems.AddRange(newItems); } // this class is for part-built SQL strings, with the fraction private class SqlItem{ public string Sql { get; set;} public decimal Fraction{get; set;} } 

( )

  • , 600/400, , .
  • , , . (, 10 , 6 , 4 , 5 , 5 3 , 3 , 2 2 .)
  • - . - ORDER BY RAND() ( , ), .
  • SQL. , ' .

( )? , ?

, , , . , sql . n- SQL- ( F1-n). , 4 ( , SQL-), , ,

  • SQL, WHERE.
  • - , , ( )
    • ( ).
    • , , , , , . , .
    • .
    • n ( )

, - :
/ = 3, / = 2, / = 3, / = 2, 2 ( , )

  • Male/North 2 (-1)
  • Female/North 3 (+1) Male/South 4 (+1)
  • / 1 (-1). Voila! ( , 2 /)

( 4- , , !) stub SQL n- ( ). (, gender = male). , ! ... 2 - , , 2 , 1 - NE, .

adjustments

0
08 . '14 16:58
source share

- SQL , .

, ORM - SQL.

, Django :

:

 class User(models.Model): GENDER_CHOICES = ( ('M', 'Male'), ('F','Female') ) gender = models.CharField(max_length=1, choices=GENDER_CHOICES) REGION_CHOICES = ( ('E', 'East'), ('W','West'), ('N','North'), ('S','South') ) region = models.CharField(max_length=1, choices=REGION_CHOICES) age = models.IntegerField() ETHNICITY_CHOICES = ( ....... ) ethnicity = models.CharField(max_length=1, choices=ETHNICITY_CHOICES) income = models.FloatField() 

:

 # gender_limits is a dict like {'M':400, 'F':600} # region_limits is a dict like {'N':100, 'E':200, 'W':300, 'S':400} def get_users_by_gender_and_region(gender_limits,region_limits): for gender in gender_limits: gender_queryset = gender_queryset | User.objects.filter(gender=gender)[:gender_limits[gender]] for region in region_limits: region_queryset = region_queryset | User.objects.filter(region=region)[:region_limits[region]] return gender_queryset & region_queryset 

, , .

ORM, , ORM .

0
08 . '14 17:34
source share

SQL, mySQL. : , (, , , ?).

. . , . MySQL , . . , .

.

 set @male=600; set @female=400; set @north=100; set @south=200; set @east=300; set @west=400; set @north_male=@north*(@male/(@male+@female)); set @south_male=@south*(@male/(@male+@female)); set @east_male =@east *(@male/(@male+@female)); set @west_male =@west *(@male/(@male+@female)); set @north_female=@north*(@female/(@male+@female)); set @south_female=@south*(@female/(@male+@female)); set @east_female =@east *(@female/(@male+@female)); set @west_female =@west *(@female/(@male+@female)); select gender, region, count(*) from ( select * from (select @north_male :=@north_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'North' ) mn where row>=0 union all select * from (select @south_male :=@south_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'South' ) ms where row>=0 union all select * from (select @east_male :=@east_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'East' ) me where row>=0 union all select * from (select @west_male :=@west_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'West' ) mw where row>=0 union all select * from (select @north_female:=@north_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'North' ) fn where row>=0 union all select * from (select @south_female:=@south_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'South' ) fs where row>=0 union all select * from (select @east_female :=@east_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'East' ) fe where row>=0 union all select * from (select @west_female :=@west_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'West' ) fw where row>=0 ) a group by gender, region order by gender, region; 

Output:

 Female East 120 Female North 40 Female South 80 Female West 160 Male East 180 Male North 60 Male South 120 Male West 240 

, :

 set @male=600; set @female=400; set @north=100; set @south=200; set @east=300; set @west=400; set @north_male=@north*(@male/(@male+@female)); set @south_male=@south*(@male/(@male+@female)); set @east_male =@east *(@male/(@male+@female)); set @west_male =@west *(@male/(@male+@female)); set @north_female=@north*(@female/(@male+@female)); set @south_female=@south*(@female/(@male+@female)); set @east_female =@east *(@female/(@male+@female)); set @west_female =@west *(@female/(@male+@female)); select * from (select @north_male :=@north_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'North' ) mn where row>=0 union all select * from (select @south_male :=@south_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'South' ) ms where row>=0 union all select * from (select @east_male :=@east_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'East' ) me where row>=0 union all select * from (select @west_male :=@west_male-1 as row, userid, gender, region from users where gender = 'Male' and region = 'West' ) mw where row>=0 union all select * from (select @north_female:=@north_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'North' ) fn where row>=0 union all select * from (select @south_female:=@south_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'South' ) fs where row>=0 union all select * from (select @east_female :=@east_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'East' ) fe where row>=0 union all select * from (select @west_female :=@west_female-1 as row, userid, gender, region from users where gender = 'Female' and region = 'West' ) fw where row>=0 ; 

, 10000 :

 use test; drop table if exists users; create table users (userid int not null auto_increment, gender VARCHAR (20), region varchar(20), primary key (userid) ); drop procedure if exists load_users_table; delimiter # create procedure load_users_table() begin declare l_max int unsigned default 10000; declare l_cnt int unsigned default 0; declare l_gender varchar(20); declare l_region varchar(20); declare l_rnd smallint; truncate table users; start transaction; WHILE l_cnt < l_max DO set l_rnd = floor( 0 + (rand()*2) ); if l_rnd = 0 then set l_gender = 'Male'; else set l_gender = 'Female'; end if; set l_rnd=floor(0+(rand()*4)); if l_rnd = 0 then set l_region = 'North'; elseif l_rnd=1 then set l_region = 'South'; elseif l_rnd=2 then set l_region = 'East'; elseif l_rnd=3 then set l_region = 'West'; end if; insert into users (gender, region) values (l_gender, l_region); set l_cnt=l_cnt+1; end while; commit; end # delimiter ; call load_users_table(); select gender, region, count(*) from users group by gender, region order by gender, region; 

, . : UNION ALL LIMIT .

0
11 . '14 11:04
source share

, , , 60/40 . . , , .

  Declare @Mlimit bigint Declare @Flimit bigint Declare @Northlimit bigint Declare @Southlimit bigint Declare @Eastlimit bigint Declare @Westlimit bigint Set @Mlimit= 600 Set @Flimit=400 Set @Northlimit= 100 Set @Southlimit=200 Set @Eastlimit=300 Set @Westlimit=400 CREATE TABLE #Users( [UserId] [int] NOT NULL, [gender] [varchar](10) NULL, [region] [varchar](10) NULL, [age] [int] NULL, [ethnicity] [varchar](50) NULL, [income] [bigint] NULL ) Declare @MnorthCnt bigint Declare @MsouthCnt bigint Declare @MeastCnt bigint Declare @MwestCnt bigint Declare @FnorthCnt bigint Declare @FsouthCnt bigint Declare @FeastCnt bigint Declare @FwestCnt bigint Select @MnorthCnt=COUNT(*) from users where gender='male' and region='north' Select @FnorthCnt=COUNT(*) from users where gender='female' and region='north' Select @MsouthCnt=COUNT(*) from users where gender='male' and region='south' Select @FsouthCnt=COUNT(*) from users where gender='female' and region='south' Select @MeastCnt=COUNT(*) from users where gender='male' and region='east' Select @FeastCnt=COUNT(*) from users where gender='female' and region='east' Select @MwestCnt=COUNT(*) from users where gender='male' and region='west' Select @FwestCnt=COUNT(*) from users where gender='female' and region='west' If (@Northlimit=@MnorthCnt+@FnorthCnt) begin Insert into #Users select * from Users where region='north' set @Northlimit=0 set @Mlimit-=@MnorthCnt set @Flimit-=@FnorthCnt set @MnorthCnt=0 set @FnorthCnt=0 end If (@Southlimit=@MSouthCnt+@FSouthCnt) begin Insert into #Users select * from Users where region='South' set @Southlimit=0 set @Mlimit-=@MSouthCnt set @Flimit-=@FSouthCnt set @MsouthCnt=0 set @FsouthCnt=0 end If (@Eastlimit=@MEastCnt+@FEastCnt) begin Insert into #Users select * from Users where region='East' set @Eastlimit=0 set @Mlimit-=@MEastCnt set @Flimit-=@FEastCnt set @MeastCnt=0 set @FeastCnt=0 end If (@Westlimit=@MWestCnt+@FWestCnt) begin Insert into #Users select * from Users where region='West' set @Westlimit=0 set @Mlimit-=@MWestCnt set @Flimit-=@FWestCnt set @MwestCnt=0 set @FwestCnt=0 end If @MnorthCnt<@Northlimit Begin insert into #Users select top (@Northlimit-@MnorthCnt) * from Users where gender='female' and region='north' and userid not in (select userid from #users) set @Flimit-=(@Northlimit-@MnorthCnt) set @FNorthCnt-=(@Northlimit-@MnorthCnt) set @Northlimit-=(@Northlimit-@MnorthCnt) End If @FnorthCnt<@Northlimit Begin insert into #Users select top (@Northlimit-@FnorthCnt) * from Users where gender='male' and region='north' and userid not in (select userid from #users) set @Mlimit-=(@Northlimit-@FnorthCnt) set @MNorthCnt-=(@Northlimit-@FnorthCnt) set @Northlimit-=(@Northlimit-@FnorthCnt) End if @MsouthCnt<@southlimit Begin insert into #Users select top (@southlimit-@MsouthCnt) * from Users where gender='female' and region='south' and userid not in (select userid from #users) set @Flimit-=(@southlimit-@MsouthCnt) set @FSouthCnt-=(@southlimit-@MsouthCnt) set @southlimit-=(@southlimit-@MsouthCnt) End if @FsouthCnt<@southlimit Begin insert into #Users select top (@southlimit-@FsouthCnt) * from Users where gender='male' and region='south' and userid not in (select userid from #users) set @Mlimit-=(@southlimit-@FsouthCnt) set @MSouthCnt-=(@southlimit-@FsouthCnt) set @southlimit-=(@southlimit-@FsouthCnt) End if @MeastCnt<@eastlimit Begin insert into #Users select top (@eastlimit-@MeastCnt) * from Users where gender='female' and region='east' and userid not in (select userid from #users) set @Flimit-=(@eastlimit-@MeastCnt) set @FEastCnt-=(@eastlimit-@MeastCnt) set @eastlimit-=(@eastlimit-@MeastCnt) End if @FeastCnt<@eastlimit Begin insert into #Users select top (@eastlimit-@FeastCnt) * from Users where gender='male' and region='east' and userid not in (select userid from #users) set @Mlimit-=(@eastlimit-@FeastCnt) set @MEastCnt-=(@eastlimit-@FeastCnt) set @eastlimit-=(@eastlimit-@FeastCnt) End if @MwestCnt<@westlimit Begin insert into #Users select top (@westlimit-@MwestCnt) * from Users where gender='female' and region='west' and userid not in (select userid from #users) set @Flimit-=(@westlimit-@MwestCnt) set @FWestCnt-=(@westlimit-@MwestCnt) set @westlimit-=(@westlimit-@MwestCnt) End if @FwestCnt<@westlimit Begin insert into #Users select top (@westlimit-@FwestCnt) * from Users where gender='male' and region='west' and userid not in (select userid from #users) set @Mlimit-=(@westlimit-@FwestCnt) set @MWestCnt-=(@westlimit-@FwestCnt) set @westlimit-=(@westlimit-@FwestCnt) End IF (@MnorthCnt>=@Northlimit and @FnorthCnt>=@Northlimit and @MsouthCnt>=@southlimit and @FsouthCnt>=@southlimit and @MeastCnt>=@eastlimit and @FeastCnt>=@eastlimit and @MwestCnt>=@westlimit and @FwestCnt>=@westlimit and not(@Mlimit=0 and @Flimit=0)) Begin ---Create Cursor DECLARE UC CURSOR FAST_forward FOR SELECT * FROM Users where userid not in (select userid from #users) Declare @UserId [int] , @gender [varchar](10) , @region [varchar](10) , @age [int] , @ethnicity [varchar](50) , @income [bigint] OPEN UC FETCH NEXT FROM UC INTO @UserId ,@gender, @region, @age, @ethnicity, @income WHILE @@FETCH_STATUS = 0 and not (@Mlimit=0 and @Flimit=0) BEGIN If @gender='male' and @region='north' and @Northlimit>0 AND @Mlimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Mlimit-=1 set @MNorthCnt-=1 set @Northlimit-=1 end If @gender='male' and @region='south' and @southlimit>0 AND @Mlimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Mlimit-=1 set @MsouthCnt-=1 set @Southlimit-=1 end If @gender='male' and @region='east' and @eastlimit>0 AND @Mlimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Mlimit-=1 set @MeastCnt-=1 set @eastlimit-=1 end If @gender='male' and @region='west' and @westlimit>0 AND @Mlimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Mlimit-=1 set @MwestCnt-=1 set @westlimit-=1 end If @gender='female' and @region='north' and @Northlimit>0 AND @flimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Flimit-=1 set @FNorthCnt-=1 set @Northlimit-=1 end If @gender='female' and @region='south' and @southlimit>0 AND @flimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @Flimit-=1 set @FsouthCnt-=1 set @Southlimit-=1 end If @gender='female' and @region='east' and @eastlimit>0 AND @flimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @flimit-=1 set @feastCnt-=1 set @eastlimit-=1 end If @gender='female' and @region='west' and @westlimit>0 AND @flimit>0 begin insert into #Users values (@UserId ,@gender, @region, @age, @ethnicity, @income) set @flimit-=1 set @fwestCnt-=1 set @westlimit-=1 end FETCH NEXT FROM UC INTO @UserId ,@gender, @region, @age, @ethnicity, @income END CLOSE UC DEALLOCATE UC end Select * from #Users SELECT GENDER, REGION, COUNT(*) AS COUNT FROM #USERS GROUP BY GENDER, REGION DROP TABLE #Users 
0
11 . '14 20:50
source share

GROUP BY :

SELECT gender,region,count(*) FROM users GROUP BY gender,region

 +----------------------+ |gender|region|count(*)| +----------------------+ |f |E | 129| |f |N | 43| |f |S | 84| |f |W | 144| |m |E | 171| |m |N | 57| |m |S | 116| |m |W | 256| +----------------------+ 

, 600 , 400 , 100 , 200 , 300 400 .

.

, , :

 SELECT gender, region, case when age < 30 then 'Young' when age between 30 and 59 then 'Middle aged' else 'Old' end as age_range, count(*) FROM users GROUP BY gender,region, age_range 

, :

 +----------------------------------+ |gender|region|age |count(*)| +----------------------------------+ |f |E |Middle aged| 56| |f |E |Old | 31| |f |E |Young | 42| |f |N |Middle aged| 14| |f |N |Old | 11| |f |N |Young | 18| |f |S |Middle aged| 40| |f |S |Old | 23| |f |S |Young | 21| |f |W |Middle aged| 67| |f |W |Old | 42| |f |W |Young | 35| |m |E |Middle aged| 77| |m |E |Old | 56| |m |E |Young | 38| |m |N |Middle aged| 13| |m |N |Old | 25| |m |N |Young | 19| |m |S |Middle aged| 46| |m |S |Old | 39| |m |S |Young | 31| |m |W |Middle aged| 103| |m |W |Old | 66| |m |W |Young | 87| +----------------------------------+ 
0
15 . '14 12:29
source share



All Articles