First, a simple random sample should already reflect the market sizes quite well. What you are asking for is a stratified sample.
One way to get such a sample is to order the data randomly and assign a sequence number within each group. Then normalize the sequence number to a value between 0 and 1 and, finally, order by the normalized value and select the top "n" rows:
    select top 100000 c.*
    from (select c.*,
                 row_number() over (partition by market
                                    order by rand(checksum(newid()))) as seqnum,
                 count(*) over (partition by market) as cnt
          from customers c
         ) c
    order by cast(seqnum as float) / cnt
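For readers who want to try the idea outside the database, here is a minimal Python sketch of the same logic. It is an illustration, not part of the query above: the function name `stratified_sample`, its parameters, and the made-up market sizes (400, 100, 500) are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, n, rng=None):
    """Hypothetical helper: pick n rows so each group keeps roughly its share."""
    if rng is None:
        rng = random.Random()
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    ranked = []
    for members in groups.values():
        rng.shuffle(members)                      # random order within the group
        cnt = len(members)                        # count(*) over (partition by ...)
        for seqnum, row in enumerate(members, start=1):
            ranked.append((seqnum / cnt, row))    # normalized position in (0, 1]

    ranked.sort(key=lambda pair: pair[0])         # order by seqnum / cnt
    return [row for _, row in ranked[:n]]         # top n

# Made-up data: three markets holding 40%, 10% and 50% of 1000 customers.
customers = ([(i, "A") for i in range(400)]
             + [(i, "B") for i in range(400, 500)]
             + [(i, "C") for i in range(500, 1000)])
sample = stratified_sample(customers, key=lambda r: r[1], n=100)
```

With these sizes, a sample of 100 should contain 40, 10 and 50 rows from markets A, B and C. Which rows are picked within each market varies from run to run.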
It may be clearer what happens if you look at the data. Consider taking a sample of 5 from:
    1   A
    2   B
    3   C
    4   D
    5   D
    6   D
    7   B
    8   A
    9   D
    10  C
The first step assigns a sequence number within each market, in random order:
    1   A  1
    2   B  1
    3   C  1
    4   D  1
    5   D  2
    6   D  3
    7   B  2
    8   A  2
    9   D  4
    10  C  2
Then normalize these values by dividing each by the number of rows in its market:
    1   A  1  0.50
    2   B  1  0.50
    3   C  1  0.50
    4   D  1  0.25
    5   D  2  0.50
    6   D  3  0.75
    7   B  2  1.00
    8   A  2  1.00
    9   D  4  1.00
    10  C  2  1.00
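Now, if you order by the normalized value and take the top five, you get rows 1 through 5: one row each from A, B and C, and two from D. D holds 4 of the 10 rows and 2 of the 5 sampled rows, so this is a stratified sample.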
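The worked example can be checked with a few lines of Python. This is only a sketch: the sequence numbers are copied from the table above in place of a real random ordering, and the variable names are assumptions made for the illustration.

```python
# (id, market, seqnum) copied from the table above; seqnum stands in
# for the random ordering that the database would generate.
rows = [
    (1, "A", 1), (2, "B", 1), (3, "C", 1), (4, "D", 1), (5, "D", 2),
    (6, "D", 3), (7, "B", 2), (8, "A", 2), (9, "D", 4), (10, "C", 2),
]

# Rows per market, the equivalent of count(*) over (partition by market).
cnt = {}
for _, market, _ in rows:
    cnt[market] = cnt.get(market, 0) + 1

# Order by the normalized sequence number and keep the top five.
ranked = sorted(rows, key=lambda r: r[2] / cnt[r[1]])
top5 = ranked[:5]

print(sorted(r[0] for r in top5))   # [1, 2, 3, 4, 5]
print(sorted(r[1] for r in top5))   # ['A', 'B', 'C', 'D', 'D']
```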