How to count every digit in a range of integers?

Question

How to count every digit in a range of integers?

Imagine that you are selling these metal numbers used to indicate homes, locker doors, hotel rooms, etc. You need to find how many of each digit should be sent when your client needs to indicate the door / house:

from 1 to 100
51 to 300
1 to 2000 with zeros on the left

The obvious solution is to loop from the first to the last number, convert the counter to a string with or without zeros on the left, extract each digit and use it as an index to increase the array of 10 integers.

I wonder if there is a better way to solve this problem without having to quote the entire range of integers.

Permissions in any language or pseudo-code are welcome.

Edit:

Review Overview
John at CashCommons and Wayne Conrad note that my current approach is good and fast enough. Let me use a stupid analogy: if you were given the task of counting squares on a chessboard in less than 1 minute, you could finish the task by counting the squares one by one, but the best solution would be to count the sides and do the multiplication, because you may be asked later Count the tiles in the building.
Alex Reisner points to a very interesting mathematical law, which, unfortunately, does not seem to be relevant to this problem.
Andres suggests using the same algorithm that Im uses, but it extracts digits using% 10 instead of substrings.
John at CashCommons and phord offer a preliminary calculation of the required numbers and their storage in the look-up table or for raw speed - an array. This could be a good solution if we had an absolute, immutable, set in stone, maximum integer value. I have never seen one of them. The high performance mark and strainer have calculated the necessary numbers for different ranges. A result for one million seems to indicate that there is a fraction, but results for a different number show different proportions.
the filter found some formulas that can be used to count a digit for a number that is ten. Robert Harvey had a very interesting experience publishing a question in MathOverflow. One of the mathematicians wrote the solution using mathematical notation.
Aaronach developed and tested the solution using math. After publishing, he looked at the formulas that emerged from Math Overflow, and found a flaw in it (point to Stackoverflow :).
Noaklavin developed the algorithm and presented it in pseudo-code.

New solution
After reading all the answers and doing some experiments, I found that for a range of integers from 1 to 10 ⁿ -1:

Digits 1 through 9 require fragments n * 10 ^(n-1)
For the digit 0, if you do not use leading zeros, you need n * 10 ^n-1 - ((10 ⁿ -1) / 9)
For digit 0, if leading zeros are used, n * 10 ^n-1 - n is necessary

The first formula was found by the retina (and probably by others), and I found the other two by trial and error (but they can be included in other answers).

For example, if n = 6, the range is from 1 to 999,999:

For numbers 1 through 9, we need 6 * 10 ⁵ = 600,000 of each
For the digit 0 without leading zeros, we need 6 * 10 ⁵ - (10 ⁶ -1) / 9 = 600 000 - 111 111 = 488 889
For the digit 0 with leading zeros, we need 6 * 10 ⁵ - 6 = 599.994

These numbers can be verified using highly effective marking results.

Using these formulas, I improved the original algorithm. It is still in the range from the first to the last number in the range of integers, but if it finds a number that is a power of ten, it uses formulas to add to the numbers, counting the number for the full range from 1 to 9 or from 1 to 99 or from 1 to 999, etc. Here's the algorithm in pseudo code:

  integer First, Last // First and last number in the range
 integer Number // Current number in the loop
 integer Power // Power is the n in 10 ^ n in the formulas
 integer Nines // Nines is the resut of 10 ^ n - 1, 10 ^ 5 - 1 = 99999
 integer Prefix // First digits in a number.  For 14,200, prefix is 142
 array 0..9 Digits // Will hold the count for all the digits

 FOR Number = First TO Last
   CALL TallyDigitsForOneNumber WITH Number, 1 // Tally the count of each digit 
                                               // in the number, increment by 1
   // Start of optimization.  Comments are for Number = 1,000 and Last = 8,000.
   Power = Zeros at the end of number // For 1,000, Power = 3
   IF Power> 0 // The number ends in 0 00 000 etc 
     Nines = 10 ^ Power-1 // Nines = 10 ^ 3 - 1 = 1000 - 1 = 999
     IF Number + Nines <= Last // If 1,000 + 999 <8,000, add a full set
       Digits [0-9] + = Power * 10 ^ (Power-1) // Add 3 * 10 ^ (3-1) = 300 to digits 0 to 9
       Digits [0] - = -Power // Adjust digit 0 (leading zeros formula)
       Prefix = First digits of Number // For 1000, prefix is 1
       CALL TallyDigitsForOneNumber WITH Prefix, Nines // Tally the count of each 
                                                      // digit in prefix,
                                                      // increment by 999
       Number + = Nines // Increment the loop counter 999 cycles
     Endif
   Endif 
   // End of optimization
 Endfor  

 SUBROUTINE TallyDigitsForOneNumber PARAMS Number, Count
   REPEAT
     Digits [Number% 10] + = Count
     Number = Number / 10
   UNTIL Number = 0

For example, for a range from 786 to 3.021, the counter will be increased:

In 1 from 786 to 790 (5 cycles)
K 9 from 790 to 799 (1 cycle)
In 1 from 799 to 800
K 99 from 800 to 899
In 1 from 899 to 900
K 99 from 900 to 999
In 1 from 999 to 1000
K 999 from 1000 to 1999
By 1 year from 1999 to 2000
By 999, from 2000 to 2999.
In 1 from 2999 to 3000
In 1 from 3000 to 3010 (10 cycles)
In 9 from 3010 to 3019 (1 cycle)
In 1 from 3019 to 3021 (2 cycles)

Total: 28 cycles. Without optimization: 2,235 cycles.

Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:

If you need a range from 700 to 1000 with leading zeros, use the algorithm for 10 700-11 000, and then subtract 1000 - 700 = 300 from the number 1.

Source code and source code

I checked the original approach, the same approach using% 10 and a new solution for some large ranges, with these results:

 Original 104.78 seconds
 With% 10 83.66
 With Powers of Ten 0.07

Screenshot of the test application:
alt text http://clarion.sca.mx/images/stories/digitsbench.png

If you want to see the full source code or run the test, use the following links:

Full source code (in Clarion ): http://sca.mx/ftp/countdigits.txt
Compiled project and win32 exe: http://sca.mx/ftp/countdigits.zip

Accepted answer

the noahlavine solution may be correct, but I just could not follow the pseudo-code, I think some details are missing or not fully explained.

Aaronaught's solution seems to be correct, but the code is too complicated for my taste.

I accepted the answers from the filters because his thought prompted me to develop this new solution.

+49

language-agnostic algorithm count clarion

Carlos Gutiérrez Jan 13 '10 at 19:42

source share

11 answers

There is a clear mathematical solution to such a problem. Suppose the value is zero with the maximum number of digits (this is not the case, but we will compensate for this later), and explain this via:

0 to 9, each digit appears once
From 0 to 99, each digit occurs 20 times (10x at position 1 and 10x at position 2)
Starting from 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)

An obvious template for any given digit, if the range is from 0 to degree 10, N * 10 ^N-1 , where N is power from 10.

What if the range is not equal to 10? Start with the lowest power of 10, then process. The simplest case is a maximum of 399. We know that for every multiple of 100, each digit has at least 20 times, but we must compensate for the number of times that it appears in the most significant digit, which will be 100 for digits 0-3 and exactly equal zero for all other digits. In particular, the additional amount to add is 10 ^N for the corresponding digits.

Introducing this into the formula, for upper bounds that are 1 less than a multiple of 10 (i.e., 399, 6999, etc.), it becomes: M * N * 10 ^N-1 + iif (d <= M, 10 ^N 0)

Now you need to deal with the remainder (which we will call R ). Take 445 as an example. This is what the result is for 399, plus a range of 400-445. In this range, MSD takes place R more times, and all digits (including MSD) also occur at the same frequencies that they will be from the range [0 - R ].

Now we only need to compensate for the leading zeros. This template is simple - simple:

10 ^N + 10 ^N-1 + 10 ^N-2 + ... + ** 10 ⁰

Update: This version correctly takes into account "zero roots", i.e. zeros in the middle positions when working with the remainder ([4 0 0, 4 0 1, 4 0 2, ...]). Calculating padding zeros is a bit ugly, but a revised code (C-style pseudo-code) handles it:

 function countdigits(int d, int low, int high) { return countdigits(d, low, high, false); } function countdigits(int d, int low, int high, bool inner) { if (high == 0) return (d == 0) ? 1 : 0; if (low > 0) return countdigits(d, 0, high) - countdigits(d, 0, low); int n = floor(log10(high)); int m = floor((high + 1) / pow(10, n)); int r = high - m * pow(10, n); return (max(m, 1) * n * pow(10, n-1)) + // (1) ((d < m) ? pow(10, n) : 0) + // (2) (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) + // (3) (((r >= 0) && (d == m)) ? (r + 1) : 0) + // (4) (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) - // (5) (((d == 0) && !inner) ? countleadingzeros(n) : 0); // (6) } function countleadingzeros(int n) { int tmp= 0; do{ tmp= pow(10, n)+tmp; --n; }while(n>0); return tmp; } function countpaddingzeros(int n, int r) { return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1); }

As you can see, this got a little ugly, but it still works in O (log n), so if you need to process numbers in billions, it will still give you instant results. :-) And if you run it on a range [0 - 1,000,000], you will get the same distribution as the message posted by the high-performance tag, so I'm pretty sure that is correct.

FYI, the reason for the inner variable is that the initial zero function is already recursive, so it can only be recalculated the first time countdigits are countdigits .

Update 2: In case the code is difficult to read, here is a link to what each line of the return countdigits expression countdigits (I tried the inline comments, but they made the code even more difficult to read):

The frequency of any digit up to a maximum power of 10 (0-99, etc.).
MSD frequency above any multiple of maximum power 10 (100-399)
The frequency of any digits in the remainder (400-445, R = 45)
Additional MSD frequency remaining
The number of zeros in the middle position for the range of residues (404, 405 ...)
Subtract leading zeros only once (from the outermost loop)

+10

Aaronaught Jan 14 '10 at 3:52

source share

I assume that you need a solution in which the numbers are in a range, and you have a start and end number. Imagine starting with the start number and counting until you reach the final number - it will work, but it will be slow. I think the trick with the fast algorithm is to understand that in order to go up one digit at 10 ^ x and keep everything else the same, you need to use all the numbers in front of it 10 ^ x times plus all the numbers 0 -9 10 ^ (x-1) times. (Except that your calculation may be related to moving past the xth digit - I will fix it below).

Here is an example. Let's say you count from 523 to 1004.

First you count from 523 to 524. This uses the numbers 5, 2, and 4 times.
Secondly, count from 524 to 604. The rightmost digit performs 6 cycles for all digits, so you need 6 copies of each digit. The second digit goes through the numbers from 2 to 0, 10 times. The third digit is 6 5 times and 5 100-24 times.
Thirdly, the count is from 604 to 1004. The rightmost digit is 40 cycles, so add 40 copies of each digit. The second of the right digits performs 4 cycles, so add 4 copies of each digit. The leftmost digit is 100 each of 7, 8 and 9, plus 5 of 0 and 100 is 5 of 6. The last digit is 1 time.

To speed up the last bit, look at the part on the rightmost two places. It uses each digit 10 + 1 times. In the general case, 1 + 10 + ... + 10 ^ n = (10 ^ (n + 1) - 1) / 9, which we can use to speed up the calculation even more.

My algorithm should count from the starting number to the ending number (using base-10 counting), but use the above fact to do it quickly. You repeat the numbers of the start number from the smallest to the most significant, and in each place you count so that this number is the same as the number in the final number. At each point, n is the number of counts that must be completed before you move on to the carryover, and m is the number you need to do after that.

Now suppose the pseudo-code counts as a language. Here is what I will do:

 convert start and end numbers to digit arrays start [] and end []
 create an array counts [] with 10 elements which stores the number of copies of
      each digit that you need

 iterate through start number from right to left.  at the i-th digit,
     let d be the number of digits you must count up to get from this digit
         to the i-th digit in the ending number.  (ie subtract the equivalent
         digits mod 10)
     add d * (10 ^ i - 1) / 9 to each entry in count.
     let m be the numerical value of all the digits to the right of this digit,
         n be 10 ^ i - m.
     for each digit e from the left of the starting number up to and including the
         i-th digit, add n to the count for that digit.
     for j in 1 to d
         increment the i-th digit by one, including doing any carries
         for each digit e from the left of the starting number up to and including
             the i-th digit, add 10 ^ i to the count for that digit
     for each digit e from the left of the starting number up to and including the
         i-th digit, add m to the count for that digit.
     set the i-th digit of the starting number to be the i-th digit of the ending
         number.

Oh, and since the value of i increases by one each time, track your old 10 ^ i and just multiply it by 10 to get a new one, instead of increasing the level each time.

+8

Noah Lavine Jan 13 '10 at 20:01

source share

Here is a very bad answer, I am ashamed to publish it. I asked Mathematica to count the numbers used in all numbers from 1 to 1,000,000, without leading 0. Here's what I got:

 0 488895 1 600001 2 600000 3 600000 4 600000 5 600000 6 600000 7 600000 8 600000 9 600000

The next time you order sticky numbers for sale in your store, order in these proportions, you will not go wrong.

+6

High Performance Mark Jan 14 '10 at 3:00

source share

I asked this question on Math Overflow and got an excuse by asking such a simple question. One user felt sorry for me and said that if I posted it on “The Art of Solving Problems,” he would answer it; that's why I did.

Here is the answer he posted:
http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600

Vaguely, my math is inadequate to understand what he published (the guy is 19 years old ... it's so depressing). I really need to take some math classes.

On the bright side, the equation is recursive, so it should be simple to turn it into a recursive function with several lines of code, for those who understand math.

+5

Robert Harvey Jan 14 2018-10-14T00: 00Z

source share

Your approach is wonderful. I'm not sure why you will need anything faster than what you described.

Or it will give you an instant solution: before you really need it, figure out what you need from 1 to some maximum number. You can store the numbers you need at every step. If you have a range similar to your second example, this will be what you need from 1 to 300, minus what you need for 1-50.

Now you have a lookup table that you can call up as you wish. Running up to 10,000 will take only a few MB and, what, a few minutes to calculate, once?

+3

John Jan 13 '10 at 20:01

source share

I know this question has an accepted answer, but I was tasked with writing this code for an interview, and I think I came up with an alternative solution that is fast, does not require cycles and can use or discard leading zeros as necessary.

This is actually quite simple, but not easy to explain.

If you pick the first n numbers

  1 2 3 . . . 9 10 11

Usually, the counting of the numbers necessary from the number of the initial room to the number of the final room starts in the order from left to right, so for the above we have one 1, one 2, one 3 ... one 9, two 1 one zero, four, etc. . Most of the solutions I've seen have used this approach with some optimization to speed it up.

What I did was count vertically in columns, as in hundreds, tens, and units. You know the highest room number, so we can calculate how many of each digit are in the hundreds column through a single whole, and then recount and calculate how many dozens in the column, etc. Then we can subtract leading zeros if we like.

It’s easier to visualize if you use Excel to write numbers, but use a separate column for each digit of the number

  ABC - - - 0 0 1 (assuming room numbers do not start at zero) 0 0 2 0 0 3 . . . 3 6 4 3 6 5 . . . 6 6 9 6 7 0 6 7 1 ^ sum in columns not rows

So, if the highest room number is 671, the hundreds column will have 100 zeros vertically, and then 100 units, etc. up to 71 sixes, if necessary, ignore 100 zeros, as we know that they are all leading.

Then we go to dozens and perform the same operation, we know that there will be 10 zeros, followed by 10, etc., repeated six times, and then the final time up to 2 semesters. Again, you can ignore the first 10 zeros, since we know what they lead. Finally, of course, units, ignoring the first zero as required.

So, there are no cycles, everything is calculated using division. I use recursion to move the "up" columns until the maximum (in this case, hundreds) is reached, and then returns down, communicating as it goes through.

I wrote this in C # and I can publish the code if someone is interested, has not completed any deadlines, but it is essentially instant for values up to 10 ^ 18 numbers.

I could not find such an approach, mentioned here or elsewhere, so I thought that it could be useful for someone.

+3

Simon Hawes May 28 '15 at 2:59

source share

This does not answer your exact question, but it is interesting to note the distribution of the first digits according to Benford Law . For example, if you select a set of numbers at random, 30% of them will start with "1", which is somewhat contrary to intuition.

I don’t know of any distributions describing the next digits, but you could define it empirically and create a simple formula to calculate the approximate number of digits required for any range of numbers.

+1

Alex Reisner Jan 13 '10 at 20:15

source share

If “better” means “clearer,” then I doubt it. If it means “faster,” then yes, but I would not use a faster algorithm instead of a clearer one, unnecessarily.

 #!/usr/bin/ruby1.8 def digits_for_range(min, max, leading_zeros) bins = [0] * 10 format = [ '%', ('0' if leading_zeros), max.to_s.size, 'd', ].compact.join (min..max).each do |i| s = format % i for digit in s.scan(/./) bins[digit.to_i] +=1 unless digit == ' ' end end bins end p digits_for_range(1, 49, false) # => [4, 15, 15, 15, 15, 5, 5, 5, 5, 5] p digits_for_range(1, 49, true) # => [13, 15, 15, 15, 15, 5, 5, 5, 5, 5] p digits_for_range(1, 10000, false) # => [2893, 4001, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]

Ruby 1.8, a language known as “slow dog,” runs code above 0.135 seconds. This includes loading the interpreter. Do not give up the obvious algorithm if you do not need a higher speed.

+1

Wayne Conrad Jan 13 '10 at 21:39

source share

If you need raw speed for many iterations, try a lookup table:

Create an array with two sizes: 10 x max-house-number

 int nDigits[10000][10] ; // Don't try this on the stack, kids!

Fill each line with the numbers necessary to get this number from zero.
Hint: use the previous line as the beginning:

 n=0..9999: if (n>0) nDigits[n] = nDigits[n-1] d=0..9: nDigits[n][d] += countOccurrencesOf(n,d) //

Number of digits "between" two numbers becomes simple subtraction.

        For range = 51 to 300, take the counts for 300 and subtract the counts for 50.
        0 = nDigits [300] [0] - nDigits [50] [0]
        1 = nDigits [300] [1] - nDigits [50] [1]
        2 = nDigits [300] [2] - nDigits [50] [2]
        3 = nDigits [300] [3] - nDigits [50] [3]
        etc.

+1

phord Jan 15 '10 at 18:01

source share

You can divide each digit ( see here for an example ), create a histogram with entries from 0..9 (which will count how many digits appear in the number) and multiply by the number of requested "numbers".

But if this is not what you are looking for, can you give a better example?

Edited by:

Now I think I have a problem. I think you can take this (pseudo C):

 int histogram[10]; memset(histogram, 0, sizeof(histogram)); for(i = startNumber; i <= endNumber; ++i) { array = separateDigits(i); for(j = 0; k < array.length; ++j) { histogram[k]++; } }

Individual numbers implement a function in a link.

Each histogram position will contain the amount of each digit. for example

 histogram[0] == total of zeros histogram[1] == total of ones

...

Hello

0

Andres Jan 13 '10 at 19:47

source share

strainer · Accepted Answer · 2010-01-14 02:35

To reel in numbers from a number, we would only need an expensive string conversion, if we could not make a mod, numbers can be quickly copied from a number like this:

feed=number; do { digit=feed%10; feed/=10; //use digit... eg. digitTally[digit]++; } while(feed>0)

this loop should be very fast and can simply be placed inside the loop from start to finish for the easiest way to count numbers.

To move faster, for a larger range of numbers, im looking for an optimized method of counting all digits from 0 to the number * 10 ^ value (from beginning to end bazzogles me)

here is a table showing the digits of the digits of some individual significant digits. they include 0, but not the highest value, it was an oversight but it may be a little easier to see patterns (there are no numbers here) These labels do not include trailing zeros,

  1 10 100 1000 10000 2 20 30 40 60 90 200 600 2000 6000 0 1 1 10 190 2890 1 2 3 4 6 9 30 110 490 1690 1 0 1 20 300 4000 1 12 13 14 16 19 140 220 1600 2800 2 0 1 20 300 4000 0 2 13 14 16 19 40 220 600 2800 3 0 1 20 300 4000 0 2 3 14 16 19 40 220 600 2800 4 0 1 20 300 4000 0 2 3 4 16 19 40 220 600 2800 5 0 1 20 300 4000 0 2 3 4 16 19 40 220 600 2800 6 0 1 20 300 4000 0 2 3 4 6 19 40 120 600 1800 7 0 1 20 300 4000 0 2 3 4 6 19 40 120 600 1800 8 0 1 20 300 4000 0 2 3 4 6 19 40 120 600 1800 9 0 1 20 300 4000 0 2 3 4 6 9 40 120 600 1800

edit: clear my original thought:
from the brute force table showing from 0 (inclusive) to poweroTen (notinc) it can be seen that majordigit of tenpower:

 increments tally[0 to 9] by md*tp*10^(tp-1) increments tally[1 to md-1] by 10^tp decrements tally[0] by (10^tp - 10) (to remove leading 0s if tp>leadingzeros) can increment tally[moresignificantdigits] by self(md*10^tp) (to complete an effect)

if these coupon adjustments were applied for each significant digit, the bill should be changed as if it were counted from 0 to the end-1

settings can be inverted to delete the previous range (starting number)

Thanks to Aaronaught for your complete and verified answer.

How to count every digit in a range of integers?

Edit:

More articles: