Converting a very large number to base line 10

Question

Converting a very large number to base line 10

An integer measures 4 bytes. In my example, I have 1 MB digits. How can I convert them to a number readable in decimal? fast ?

The number is present in the uint[] array containing the Size elements.

+6

math c #

bytecode77 Mar 01 '12 at 18:19

source share

4 answers

I was thinking about your problem. I don't have a solution encoded, but here is the approach:

First of all, suppose without loss of generality that you have a set of 2 ⁿ bits. (If you have less than 2 ⁿ bits, drop the bit array with leading zeros until you do this. Obviously this does not double the size of the array.) In your case, you say you have a million uints, so bit 2 ²⁵ .

Suppose also that each set of 2 ^{k + 1} bits can be divided equally into two sets of bits, the left and right collections, each with 2 ^k bits.

Thus, each bit collection or subsection has a “size”, which is an exact power of two. The smallest possible collection contains one bit and cannot be further divided.

Secondly, suppose you likewise have an unchanged representation of a number in decimal form, and that again, without loss of generality, there are 2 ^d decimal digits in the string. If there are less than exactly 2 ^d, again, with leading zeros. Again, each decimal collection larger than one can be divided into two collections in half size.

Now we will draw a recursive algorithm:

 DigitCollection Convert(BitCollection bits) { if (bits.Size == 1) return bits[0] ? DigitCollection.SingleOne : DigitCollection.SingleZero; DigitCollection left = Convert(bits.Left); DigitCollection right = Convert(bits.Right); DigitCollection power = MakePowerOfTwo(bits.Right.Size); return Add(Multiply(left, power), right); }

Now we need the Add, Multiply, and MakePowerOfTwo methods. As we will see now, we also need a Subtract and two Shift operators to quickly multiply by powers of ten.

Addition and subtraction are simple. It is clear that if a longer collection contains n digits, then the methods of adding and subtracting can be implemented to take O (n) time.

The FullShift and HalfShift operators create new collections of numbers from the old ones to facilitate quick multiplication by ten. If a set of digits of size 2 ^{d + 1} consists of subcategories (X1, X2), each of which has size 2 ^d, then the “half-shifted” collection contains 2 ^{d + 2} and consists of ((2 ^d leading zeros, X1), (X2 , 2 ^d finite zeros)). A collection with a complete shift consists of ((X1, X2), (2 ^{d + 1} trailing zeros)).

It is obviously very cheap to build.

Multiplication is where we run into big problems . Assume without loss of generality that we multiply together two DigitCollections, each of which has exactly 2 ^d digits. We can use the Karatsuba algorithm:

 DigitCollection Multiply(DigitCollection x, DigitCollection y) { // Assume x and y are the same digit size. if (x and y are both single digits) return the trivial solution; // Assume that z2, z1 and z0 are all the same digit size as x and y. DigitCollection z2 = Multiply(x.Left, y.Left); DigitCollection z0 = Multiply(x.Right, y.Right); DigitCollection z1 = Subtract( Multiply(Add(x.Left, x.Right), Add(y.Left, y.Right)), Add(z2, z0)); return Add(Add(FullShift(z2), HalfShift(z1)), z0); }

What is the order of this algorithm? Suppose n = 2 ^d digits. What is O (Multiply (n))? We repeat three times, each of which has a problem with half the number of digits. The remaining operations of addition, subtraction, and shift do not exceed O (n). So we have repeatability:

 T(n) = 3 T(n/2) + O(n)

which has an easy solution through the main theorem: this algorithm is O (n ^{1 / log 3} ), which has a value of O (n ^1,58 ).

What about MakePowerOfTwo? This is easy, given what we already have. We use the identity:

2 ^{2n + 1} = 2 ⁿ x 2 ⁿ + 2 ⁿ x 2 ⁿ

and write the algorithm:

 DigitCollection MakePowerOfTwo(int p) { if (p == 0) return DigitCollection.SingleOne; DigitCollection power = MakePowerOfTwo(p / 2); power = Multiply(power, power); if (p % 2 != 0) power = Add(power, power); return power; }

It is dominated by the calculation of multiplication, as well as O (n ^1.58 ).

And now we see that multiplication also prevails in the original Convert call.

So, using this algorithm, if you have 2 ^d binary digits for conversion, you can expect that it will take about O (2 ^{1,58 d} ) steps. In your case, you have 2 ²⁵ bits to convert, so it will take about 777 billion calculations.

The key fact here is that the multiplication cost prevails in this algorithm . If you can reduce the cost of multiplication to less than O (n ^1.58 ), you will get huge wins. If I were you, I would study improvements to decimal multiplication algorithms over Karatsuba.

+10

Eric Lippert Mar 2 '12 at 21:25

source share

You may save some time by completing more than one digit at a time. if you do, say, 100,000 at a time, it will probably go at least a little faster than 10 at a time.

Keep in mind that this can still be quite painfully slow, but it will save you some time.

We can assume that you can make it recursive and speed it up much more - get the rough square root of the number, rounded to the nearest indicator 10. div and mod by this number and send the results back to the same function. Keep in mind, I'm not sure how you would work effectively with a div or mod of that size, but if you can figure it out (and not run out of memory), it should still be more time-efficient than dividing it by a digit at a time.

Alternative version: refuse decimal places - since the number will be too large to make sense for any real readers anyway - and do the thing in hexadecimal form. Still a technically readable person, but you can make it a byte at the same time and save a lot of suffering.

+3

Ben barden Mar 01 '12 at 20:41

source share

Thanks to all of you, I understood the way, mainly based on the idea of J ..., which suggested converting a number into 10 based numbers, adding a power of 2 each time there is 1 . But instead of a decimal system based on 10 (human decimal system) I use a system based on 1000000000000000000 (10 ^ 18). Thus, each digit has not only 10 possibilities (0 ... 9), but actually 10 ^ 18! This fits into a 64-bit number, which then converts .ToString()

This is the most effective way.

0

bytecode77 Mar 2 '12 at 21:34

source share

J ... · Accepted Answer · 2012-03-01T20:22:58+0000

I don’t know if this happens faster, but here is an example in delphi that I wrote a long time ago to treat large ints as strings (VERY fast and dirty) - this was for a 128-bit uint, but you could expand it indefinitely

 Function HexToBinShort(hex:integer):string; begin case hex of 0: result:='0000'; //convert each hex digit to binary string 1: result:='0001'; //could do this with high-nybble and low nybble 2: result:='0010'; //of each sequential byte in the array (mask and bit shift) 3: result:='0011'; //ie: binstring:=binstring + HexToBinShort(nybble[i]) 4: result:='0100'; //but must count DOWN for i (start with MSB!) 5: result:='0101'; 6: result:='0110'; 7: result:='0111'; 8: result:='1000'; 9: result:='1001'; 10: result:='1010'; 11: result:='1011'; 12: result:='1100'; 13: result:='1101'; 14: result:='1110'; 15: result:='1111'; end; end;

Then take a concatenated binary string and add two power each time you see "1"

 Function BinToIntString(binstring:string):string; var i, j : integer; var calcHold, calc2 :string; begin calc2:=binstring[Length(binstring)]; // first bit is easy 0 or 1 for i := (Length(binstring) - 1) downto 1 do begin if binstring[i] = '1' then begin calcHold:=generateCard(Length(binstring)-i); calc2 := AddDecimalStrings(calcHold, calc2); end; end; result:=calc2; end;

generateCard is used to create a decimal string representation of 2 ^ i (for i> 0)

 Function generateCard(i:integer):string; var j : integer; var outVal : string; begin outVal := '2'; if i > 1 then begin for j := 2 to i do begin outVal:= MulByTwo(outVal); end; end; result := outVal; end;

and MulByTwo multiplies the decimal string by two

 Function MulByTwo(val:string):string; var i : integer; var carry, hold : integer; var outHold : string; var outString :string; var outString2 : string; begin outString:= StringOfChar('0', Length(val) + 1); outString2:= StringOfChar('0', Length(val)); carry :=0; for i := Length(val) downto 1 do begin hold := StrToInt(val[i]) * 2 + carry; if hold >= 10 then begin carry := 1; hold := hold - 10; end else begin carry := 0; end; outHold := IntToStr(hold); outString[i+1] := outHold[1]; end; if carry = 1 then begin outString[1] := '1'; result := outString; end else begin for i := 1 to length(outString2) do outString2[i]:=outString[i+1]; result := outString2; end; end;

And finally AddDecimalStrings ... well, it adds two decimal strings:

 Function AddDecimalStrings(val1, val2:string):string; var i,j :integer; var carry, hold, largest: integer; var outString, outString2, bigVal, smVal, outHold:string; begin if Length(val1) > Length(val2) then begin largest:= Length(val1); bigVal := val1; smVal := StringOfChar('0', largest); j:=1; for i := (largest - length(val2) +1) to largest do begin smVal[i] := val2[j]; j:=j+1; end; end else begin if length(val2) > Length(val1) then begin largest:=Length(val2); bigVal:=val2; smVal := StringOfChar('0', largest); j:=1; for i := (largest - length(val1) +1) to largest do begin smVal[i] := val1[j]; j:=j+1; end; end else begin largest:=length(val1); bigVal:=val1; smVal:=val2; end; end; carry:=0; outString:=StringOfChar('0', largest +1); outString2:=StringOfChar('0', largest); for i := largest downto 1 do begin hold := StrToInt(bigVal[i]) + StrToInt(smVal[i]) + carry; if hold >=10 then begin carry:=1; hold := hold - 10; end else begin carry:=0; end; outHold:= IntToStr(hold); outString[i+1]:=outHold[1]; end; if carry = 1 then begin outString[1] := '1'; result := outString; end else begin for i := 1 to length(outString2) do outString2[i]:=outString[i+1]; result := outString2; end; end;

These functions allow you to perform basic arithmetic on almost arbitrarily large integers in the form of strings. You find yourself in another wall when the number of digits is too large to, of course, index the array.

Here, the division into two, by the way (useful for going the other way ...). Here I do not process odd numbers.

 Function DivByTwo(val:string):string; var i : integer; var hold : integer; var outHold : string; var outString, outString2 :string; begin outString:=StringOfChar('0',Length(val)); for i := Length(val) downto 1 do begin if StrToInt(val[i]) mod 2 = 0 then begin hold:= Math.Floor(StrToInt(val[i]) / 2); outHold:= IntToStr(hold); outString[i]:=outHold[1]; end else begin hold:= Math.Floor((StrToInt(val[i]) - 1) / 2); outHold:=IntToStr(hold); outString[i]:= outHold[1]; if i <> Length(val) then begin hold:= StrToInt(outString[i+1]) + 5; outHold:= IntToStr(hold); outString[i+1] := outHold[1]; end; end; end; outString2:=StringOfChar('0',Length(val)-1); if (outString[1] = '0') and (length(outString) > 1) then begin for i := 1 to length(outString2) do outString2[i]:=outString[i+1]; result:=outString2; end else begin result:=outString; end; end;

EDIT: I just tried this with a 9 million bit binary string, and it's ridiculously slow! No wonder, really. This is a completely non-optimized code that has a lot of low hanging fruit to pick for speed. However, I cannot help but feel that this is the kind (or scale) of the problem that you probably want to write, at least in part, in a fully optimized assembly. Individual operations are small, but they must be performed many times - these are spells that require assembly. Of course, multithreading can be used here.

Converting a very large number to base line 10

More articles: