Increasing the loading speed of large files

My program uses two large text files (millions of lines). These files are parsed and loaded into hashes so that the data can be accessed quickly. The problem is that parsing and loading are currently the slowest part of the program. Below is the code that does this.

database = extractDatabase(@type).chomp("fasta") + "yml"
revDatabase = extractDatabase(@type + "-r").chomp("fasta.reverse") + "yml"
@proteins = Hash.new
@decoyProteins = Hash.new

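# File.foreach closes the file when it is done; chomp drops the trailing
# newline, and the limit of 2 keeps any later ": " inside the value.
File.foreach(database) do |line|
  key, value = line.chomp.split(": ", 2)
  @proteins[key] = value
end

File.foreach(revDatabase) do |line|
  key, value = line.chomp.split(": ", 2)
  @decoyProteins[key] = value
end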

The files look like the example below. They started out as YAML files, but the format was changed to speed up parsing.

MTMDK: P31946   Q14624  Q14624-2    B5BU24  B7ZKJ8  B7Z545  Q4VY19  B2RMS9  B7Z544  Q4VY20
MTMDKSELVQK: P31946 B5BU24  Q4VY19  Q4VY20
....

I have tried several different file layouts and parsing approaches, and this is the fastest so far, but it is still very slow.

Is there a way to improve the speed of this, or is there a whole other approach I can take?

List of things that did not work:

  • YAML.
  • Ruby.
  • , .

One option is to read the whole file into memory in a single call and then iterate over the lines:

buffer = File.readlines(database)
buffer.each do |line|
    ...
end
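A minimal sketch of the "read everything first, then parse" idea, assuming the `KEY: VALUE` format from the question and that the file fits comfortably in RAM. `load_table_slurp` is a hypothetical name; it uses `File.read` plus `String#each_line(chomp: true)` (Ruby 2.4+) instead of `File.readlines`, which avoids building an intermediate array of lines.

```ruby
# Hypothetical loader: read the whole file with one call, then build the
# hash from the in-memory string. Assumes each line is "KEY: VALUE".
def load_table_slurp(path)
  table = {}
  File.read(path).each_line(chomp: true) do |line|
    key, value = line.split(": ", 2) # limit 2 keeps ": " inside the value
    table[key] = value
  end
  table
end
```

In the question's code this would be used as `@proteins = load_table_slurp(database)`. Whether it beats `File.foreach` depends on the Ruby version and file size, since it trades memory for fewer read calls, so it is worth timing both on the real files.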



Have you considered storing the data in a database instead, for example SQLite3?


(Alternatively, a Ruby binding for BDB (Berkeley DB) or another "NoSQL" key-value store.)

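The underlying problem is that every run rebuilds the entire hash from text. Memory-mapping the file (mmap) avoids some copying, but a persistent on-disk index (for example, the B-tree behind an SQL database, or a key-value store such as BDB) lets you look up individual entries without loading everything up front.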
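As a much simpler stand-in for the B-tree/BDB idea, a file sorted by key can be searched on disk without loading it. This sketch swaps in a different technique (binary search over byte offsets in a sorted text file), not a real index. It assumes every line is `KEY: VALUE` and that the file is sorted by key in byte order, for example with `LC_ALL=C sort -t: -k1,1`; `sorted_file_lookup` is a hypothetical name.

```ruby
# Binary search over a text file sorted by KEY (byte order). Each lookup
# reads O(log n) lines; the file is never loaded into memory as a whole.
# Assumes every line is well-formed "KEY: VALUE".
def sorted_file_lookup(path, key)
  target = key.b # compare raw bytes, matching a byte-order sort
  File.open(path, "rb") do |f|
    # Returns [key, value] for the first line starting at byte offset >= pos,
    # or nil when no such line exists.
    first_line_from = lambda do |pos|
      if pos.zero?
        f.seek(0)
      else
        f.seek(pos - 1)
        f.gets # discard the rest of the line containing byte pos - 1
      end
      line = f.gets
      line && line.chomp.split(": ", 2)
    end

    lo = 0
    hi = f.size
    # Find the smallest offset whose next line has a key >= target.
    while lo < hi
      mid = (lo + hi) / 2
      entry = first_line_from.call(mid)
      if entry.nil? || entry[0] >= target
        hi = mid
      else
        lo = mid + 1
      end
    end

    entry = first_line_from.call(lo)
    entry && entry[0] == target ? entry[1] : nil
  end
end
```

Usage would look like `sorted_file_lookup(database, "MTMDK")`. The one-time cost is sorting the file; after that, start-up does no parsing at all, at the price of a few disk reads per lookup, which only pays off if the program looks up far fewer keys than the file contains.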

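See also:

  • Discussions of IO performance in Ruby.

The Widefinder Project (Tim Bray), which explored fast processing of large text files in various languages.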


Source: https://habr.com/ru/post/1759405/

