How can I improve the performance of my Julia program for great numbers?

I played with some Perl programs to compute great numbers . Although the runtime was acceptable for my decisions, I thought that a different language, especially one that is designed for digital things, could be faster. A friend suggested Julia , but the performance that I see is so poor that I have to do something wrong. I looked at Performance Recommendations and don’t see what I should improve:

digits = int( ARGS[1] ) const k = div( digits, 2 ) for a = ( 10 ^ (k - 1) ) : ( 10 ^ (k) - 1 ) front = a * (10 ^ k + a) root = floor( front ^ 0.5 ) for b = ( root - 1 ): ( root + 1 ) back = b * (b - 1); if back > front break end if log(10,b) > k continue end if front == back @printf "%d%d\n" ab end end end 

I have an equivalent C program that is an order of magnitude faster, instead of the coefficient 2 noted on the Julia page (although most of Stackoverflow's questions about Julia speed seem to indicate error tests on this page):

And the non-optimized pure Perl that I wrote takes half the length:

 use v5.20; my $digits = $ARGV[0] // 2; die "Number of digits must be even and non-zero! You said [$digits]\n" unless( $digits > 0 and $digits % 2 == 0 and int($digits) eq $digits ); my $k = ( $digits / 2 ); foreach my $n ( 10**($k-1) .. 10**($k) - 1 ) { my $front = $n*(10**$k + $n); my $root = int( sqrt( $front ) ); foreach my $try ( $root - 2 .. $root + 2 ) { my $back = $try * ($try - 1); last if length($try) > $k; last if $back > $front; # say "\tn: $n back: $back try: $try front: $front"; if( $back == $front ) { say "$n$try"; last; } } } 

I use the pre-compiled Julia for Mac OS X, since I could not get the source code to compile (but I did not try to explode it for the first time). I believe that part of it.

In addition, I see about 0.7 seconds of startup time for any Julia program (see Slow Julia startup time ), which means that an equivalent compiled C program can run 200 times before Julia ends once. As the execution time increases (large digits ) and the startup time means less, my Julia program is still very slow.

I did not get the part for very large numbers (20+ digits of excellent numbers), which I did not understand that Julia could not cope with them better than most other languages.


Here is my C code, which is a little different from when I started this. My rusty, inelegant C skills are essentially the same as my Perl.

 #include <math.h> #include <stdio.h> #include <stdlib.h> int main( int argc, char *argv[] ) { long k, digits, start, end, a, b, front, back, root ; digits = atoi( argv[1] ); k = digits / 2; start = (long) pow(10, k - 1); end = (long) pow(10, k); for( a = start; a < end; a++ ) { front = (long) a * ( pow(10,k) + a ); root = (long) floor( sqrt( front ) ); for( b = root - 1; b <= root + 1; b++ ) { back = (long) b * ( b - 1 ); if( back > front ) { break; } if( log10(b) > k ) { continue; } if( front == back ) { printf( "%ld%ld\n", a, b ); } } } return 0; } 
+6
source share
4 answers

I checked your code ( brian.jl ) for the following code, which tries to make minimal changes to your code and follows the Julian style:

 function excellent(digits) k = div(digits, 2) l = 10 ^ (k - 1) u = (10 ^ k) - 1 for a in l:u front = a * (10 ^ k + a) root = isqrt(front) for b = (root - 1):(root + 1) back = b * (b - 1) back > front && break log10(b) > k && continue front == back && println(a,b) end end end excellent(int(ARGS[1])) 

The separation of u and l was a personal preference for readability. As a baseline, Julia startup time on my machine:

 $ time julia -e '' real 0m0.248s user 0m0.306s sys 0m0.091s 

So, if the calculation that you perform for Julia's cold start is about 0.3 seconds, then Julia might not be a great choice for you at this point. I went through 16 to the scripts and got:

 $ time julia brian.jl 16 1045751633986928 1140820035650625 3333333466666668 real 0m15.973s user 0m15.691s sys 0m0.586s 

and

 $ time julia iain.jl 16 1045751633986928 1140820035650625 3333333466666668 real 0m9.691s user 0m9.839s sys 0m0.155s 

The limitation of this code, as it is written, is that if digits>=20 , we will exceed the Int64 storage. Julia, for performance reasons, does not support the automatic promotion of integer types to integer integers. We can use our knowledge of the problem to solve this problem by changing the last line to:

 digits = int(ARGS[1]) excellent(digits >= 20 ? BigInt(digits) : digits) 

We get the version of BigInt excellent for free, which is nice. Ignoring this at the moment, while profiling my version, I found that ~ 74% of the time was spent calculating log10 , following ~ 19% on isqrt . I did this profiling, replacing the last line with

 excellent(4) # Warm up to avoid effects of JIT @profile excellent(int(ARGS[1])) Profile.print() 

Now, if we wanted to use minor algorithmic changes, given what we now know from the profiler, we can replace the log10 line (which simply checks the number of valid digits) with ndigits(b) > k && continue , which gives us

 $ time julia iain.jl 16 1045751633986928 1140820035650625 3333333466666668 real 0m3.634s user 0m3.785s sys 0m0.153s 

This changes the balance by approximately ~ 56% of isqrt and ~ 28% of ndigits . Digging further, at 56%, about half was spent on executing this line , which seems like a pretty reasonable algorithm, so any improvement is likely to change the spirit of comparison, since that would really be a completely different approach. Learning about machine code with @code_native tends to suggest that there is nothing strange, although I did not delve into it.

If I allow myself to tackle some more minor algorithmic improvements, I can start with root+1 and only do ndigits checks once, i.e.

 for a in l:u front = a * (10^k + a) root = isqrt(front) b = root + 1 ndigits(b) > k && continue front == b*(b-1) && println(a,b) b = root front == b*(b-1) && println(a,b) b = root - 1 front == b*(b-1) && println(a,b) end 

which brings me to

 real 0m2.901s user 0m3.050s sys 0m0.154s 

(I'm not convinced that we need two equality checks, but I'm trying to minimize the differences!). In the end, I thought that I would allocate some additional speed, having previously calculated 10^k , i.e. k10 = 10^k , which seems to be calculated only every iteration. In doing so, I get

 real 0m2.518s user 0m2.670s sys 0m0.153s 

This is a good 20-fold improvement over the source code.

+17
source

I was curious how Perl was getting such good performance from this code, so I felt like I needed to do a comparison. Since there are some seemingly useless differences in the control flow and operations between the versions of Perl and Julia code in this question, I ported each version to a different language and compared all four. I also wrote the fifth version of Julia using more idiomatic numerical functions, but with the same control flow structure as the Perl version question.

The first option is essentially the Perl code from the question, but wrapped in a function:

 sub perl1 { my $k = $_[0]; foreach my $n (10**($k-1) .. 10**($k)-1) { my $front = $n * (10**$k + $n); my $root = int(sqrt($front)); foreach my $t ($root-2 .. $root+2) { my $back = $t * ($t - 1); last if length($t) > $k; last if $back > $front; if ($back == $front) { print STDERR "$n$t\n"; last; } } } } 

Then I transferred it to Julia, preserving the same control flow and using the same operations - it takes the integer floor of the square root from front in the outer loop and takes the length of the “structure” t in the inner loop:

 function julia1(k) for n = 10^(k-1):10^k-1 front = n*(10^k + n) root = floor(Int,sqrt(front)) for t = root-2:root+2 back = t * (t - 1) length(string(t)) > k && break back > front && break if back == front println(STDERR,n,t) break end end end end 

Here's the question, Julia's code with some minor formatting techs wrapped in a function:

 function julia2(k) for a = 10^(k-1):10^k-1 front = a * (10^k + a) root = floor(front^0.5) for b = root-1:root+1 back = b * (b - 1); back > front && break log(10,b) > k && continue if front == back @printf STDERR "%d%d\n" ab # missing break? end end end end 

I translated this to Perl, preserving the same structure of the control flow and using the same operations as the Perl code occupying the root floor, raised to 0.5 in the outer loop and taking the logarithmic base 10 into the inner loop:

 sub perl2 { my $k = $_[0]; foreach my $a (10**($k-1) .. 10**($k)-1) { my $front = $a * (10**$k + $a); my $root = int($front**0.5); foreach my $b ($root-1 .. $root+1) { my $back = $b * ($b - 1); last if $back > $front; next if log($b)/log(10) > $k; if ($front == $back) { print STDERR "$a$b\n" } } } } 

Finally, I wrote a version of Julia with the same control flow structure as the Perl question, but it uses more idiomatic numerical operations - the isqrt and ndigits :

 function julia3(k) for n = 10^(k-1):10^k-1 front = n*(10^k + n) root = isqrt(front) for t = root-2:root+2 back = t * (t - 1) ndigits(t) > k && break back > front && break if back == front println(STDERR,n,t) break end end end end 

As far as I know (I used a lot of Perl programming, but it was a while), there are no versions of Perl for any of these operations, so there is no corresponding version of perl3 .

I completed all five options with Perl 5.18.2 and Julia 0.3.9 respectively, ten times each for 2, 4, 6, 8, 10, 12, and 14 digits. Here are the sync results:

median execution time in seconds vs. digits

The x axis is the number of digits requested. Y-axis - the average time in seconds required to calculate each function. The Y axis is plotted on a log scale (there is some rendering error in the Cairo backend Gadfly , so the superscripts don't appear very raised). We can see that with the exception of the smallest number of digits (2), all three Julia variants are faster than both Perl variants, and julia3 much faster than everything else. How much faster? Here's a comparison of the other four options regarding julia3 (not a logarithmic scale):

time relative to julia3 vs. digits

The x axis is the number of digits requested, and the y axis is how many times slower than each option than julia3 . As you can see here, I was unable to reproduce the Perl performance stated in the question - Perl code was not 2 times faster than Julia - it was 7-40 times slower than julia3 and at least 2 times slower than the slowest Julia version for any non-trivial number of digits. I have not tested Perl 5.20 - maybe someone could follow up on these tests with the new Perl and see if this explains the different results? The code to run the tests can be found here: excellent.pl , excellent.jl . I ran them as follows:

 cat /dev/null >excellent.csv for d in 2 4 6 8 10 12 14; do perl excellent.pl $d >>excellent.csv julia excellent.jl $d >>excellent.csv done 

I analyzed the resulting excellent.csv file using this Julia script .

Finally, as mentioned in the comments, using BigInt or Int128 is an option for learning great great numbers in Julia. However, this requires a little caution when writing the algorithm as a whole. Here is the fourth option that works as a whole:

 function julia4(k) ten = oftype(k,10) for n = ten^(k-1):ten^k-1 front = n*(ten^k + n) root = isqrt(front) for t = root-2:root+2 back = t * (t - 1) ndigits(t) > k && break back > front && break if back == front println(STDERR,n,t) break end end end end 

This is the same as julia3 , but works for universal integer types by converting 10 to the type of its argument. Since the algorithm scales exponentially, however, it takes a lot of time to calculate the number of digits larger than 14.

 julia> @time julia4(int128(10)) # digits = 20 21733880705143685100 22847252005297850625 23037747345324014028 23921499005444619376 24981063345587629068 26396551105776186476 31698125906461101900 33333333346666666668 34683468346834683468 35020266906876369525 36160444847016852753 36412684107047802476 46399675808241903600 46401324208242096401 48179452108449381525 elapsed time: 2260.27479767 seconds (5144 bytes allocated) 

It works, but 37 minutes is a long wait. Using a faster programming language gives you a constant acceleration factor - 40 times in this case - but it only buys a couple of extra digits. To really explore great great numbers, you need to learn the best algorithms.

+11
source

I get a lot of speedup when I use ifloor (which is not specified in Mathematical operations and elementary functions ) instead of floor . Getting rid of floor in favor of isqrt shows the same acceleration. I do not see where this is documented.

Now I see the performance that I would expect, although on my Mac, it seems that Julia could not start k = 10. BigInt may help, but then the performance is rotten. Part of the reason I looked at Julia was my hope that she could handle large numbers easily, so I would have to keep looking at her.

The rest of the expected speed can be hidden when implementing the logarithm algorithms, as Colin noted in the comments.

I had fun checking out this language, and maybe I'll try again when it ripens.

+4
source

You can try to put your code in a function.

 function excellent(k) for a = ( 10 ^ (k - 1) ) : ( 10 ^ (k) - 1 ) front = a * (10 ^ k + a) root = ifloor( sqrt(front) ) # floor() returns a double for b = ( root - 1 ): ( root + 1 ) back = b * (b - 1); if back > front break end if log(10,b) > k continue end if front == back @printf "%d%d\n" ab end end end end @time excellent(7) ## 33333346666668 ## 48484848484848 ## elapsed time: 1.451842881 seconds (14680 bytes allocated) 

For arbitrary precision numbers, you can use BigInt (for example, ten = BigInt("10") , but performance drops ...

+2
source

Source: https://habr.com/ru/post/987887/


All Articles