Google Code Jam 2010: large datasets take too long to solve and submit

I competed in Code Jam 2010 and solved both problems for the small datasets, but I wasn't even close to solving the large datasets within the 8-minute submission window.

For those of you who managed to solve a large dataset:

  • What hardware did you run on?
  • What language did you work in?
  • What performance tuning did you do to make your code run as fast as possible?

I wrote my solutions in Ruby, which is not my everyday language, and ran them on my MacBook Pro.

My solutions for problems A and C are on GitHub at http://github.com/tjboudreaux/codejam2010 .

I would appreciate any suggestions you may have.

FWIW, I have a lot of experience with C++ from college, my primary language is PHP, and Ruby is my sandbox language.

Was I a bit too ambitious attempting this in Ruby, not knowing where the language struggles with performance? Or does anyone see a red flag as to why I couldn't complete a large dataset in time to submit?

+4
source share
6 answers

I made the same mistake. I wrote in Python, which is inherently slower than many other languages. I tried compiling it to C++ with ShedSkin, but my algorithm was still too slow.

My solution for Snapper Chain just brute-forced the simulation. Pseudocode:

    set first snapper to powered and off      // because it is always powered
    set all other snappers to unpowered and off
    repeat K times:
        for each snapper in chain:            // process changes in on-off-ness
            if powered and on: turn off
            elif powered and off: turn on
        for each snapper except the first:    // process changes in powered-ness
            if previous snapper is powered and on:
                set to powered
            else:
                set to unpowered

Later, I realized a solution based on the fact that (2^n) - 1 is a binary number consisting of n 1-bits, but by that time I had run out of time on the large set.
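That observation gives a constant-time check: after K snaps the snapper states spell out K in binary, so the light is on exactly when the lowest N bits of K are all 1. A minimal Python sketch of the idea (the function name `snapper_on` is mine, not from the contest):

```python
def snapper_on(n, k):
    """Light is ON iff the low n bits of k are all set,
    i.e. k ends in n consecutive 1-bits."""
    mask = (1 << n) - 1        # (2^n) - 1: a binary number of n ones
    return k & mask == mask

# Sample cases from the 2010 qualification round:
# n=1, k=0 -> OFF;  n=1, k=1 -> ON;  n=4, k=47 -> ON;  n=4, k=48 -> OFF
```

This runs in O(1) per test case, versus O(N * K) for the brute-force simulation above.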

EDIT: A better solution can be found on the Contest Analysis page of the Code Jam dashboard.

+1
source

Typically, the small datasets are designed to be solvable with straightforward algorithms, while the large datasets require some clever insight to cut the computation time. The language shouldn't be a big problem here if you have the right algorithm.

+15
source

First, consider the algorithm you are using and try to work out its complexity. Then look at the language you use to implement it. Ruby is known to be slower than other languages, and that can matter, especially if the input is very large and the time limits are tight.

Take a look at the Computer Language Benchmarks Game site. It compares different languages in terms of speed and memory consumption.

+3
source

For GCJ, you need the right algorithm. Beyond that, I would say that how quickly you can code a program in a language matters far more than how fast that language runs, given the limited coding time.

I used Python for GCJ and never hit a case where the language's speed "failed" me. You could say Python is about 2x faster than Ruby (per the language benchmarks shootout), and when I used Psyco (a JIT compiler module) I got about a 5x speedup - but that's small beer: choosing a language can only give you a linear speedup. Say 10x, big deal.

GCJ problems, on the other hand, are designed with combinatorial explosion in mind, and the large inputs lead to a dramatic increase in time (or memory). Take, for example, GCJ 2010 Round 1C, "Making Chess Boards". Assuming a square board for simplicity, a naive implementation has complexity O(n^4). The judges' fast but complex implementation was described as O(n^2 log(n^2)). My simpler solution, which unfortunately came to me only after the round ended, was O(n^3). The difference between power 3 and power 4 may not seem very significant, but the large input had a 512x512 board to process, for which the three algorithms must iterate on the order of:

    naive   68,719,476,736
    mine       134,217,728
    judge        4,718,592
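These counts are just n^4, n^3, and n^2 * log2(n^2) evaluated at n = 512, which is easy to check (variable names are mine):

```python
import math

n = 512  # board side for the large input

naive = n ** 4                           # O(n^4) brute force
mine = n ** 3                            # O(n^3) solution
judge = n ** 2 * int(math.log2(n ** 2))  # O(n^2 log n^2); log2(512^2) = 18

print(naive, mine, judge)  # 68719476736 134217728 4718592
print(naive // mine)       # 512: the n^3 solution does 512x fewer iterations
```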

Thus, at this input size my implementation is roughly 30x slower than the judges' solution, and ~500x faster than the naive code. On my old desktop (1.2 GHz Athlon), my Python code ran the large input in just under 4 minutes. I assume an optimal solution would finish in under 10 seconds, but who cares, as long as you fit inside the 8 minutes?

The n^4 algorithm, on the other hand, would take ~500 * 4 min = 33 hours. That is hopelessly unacceptable, and no optimizing compiler or higher-clocked processor can rescue us from that swamp.

Of course, some optimization is possible - just adding psyco.full() cut my time 5x, to 46 seconds. Also, running the code on my faster laptop (2 GHz dual-core) sped it up another 3x. That's "only" 15x - but never mind, say we sped it up 50x: that is still 10x too slow for the naive algorithm to handle the large input.

So, if you have a bad algorithm, no optimizer/compiler/hardware will help. On the other hand, with the better algorithm you could run on a computer 30 times slower than my nine-year-old PC and still get the results in time with a scripting language like Python or Ruby. Which, by the way, is the main goal of the GCJ problem setters: contestants should be distinguished by programming skill, not by compiler/hardware/network connection.

+3
source

The language is not the problem; the problem lies in the algorithm itself. You need a different approach than brute force, such as dynamic programming or divide and conquer.

0
source

Since you are using Ruby, you will need to find the most efficient algorithms. I would also try using a profiler to find the hot spots where your code spends most of its time. Several are available; one open-source option is ruby-prof at http://ruby-prof.rubyforge.org/ .

-1
source

Source: https://habr.com/ru/post/1309207/

