Access speed: binary Perl hashes vs MySQL

I currently use a large number of binary Perl hashes, stored in several file locations, to serve data to this CGI website. I am debating whether MySQL would be faster or slower if I stored my data there instead.

Any ideas? I understand that Perl hashes are loaded entirely into memory.
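Roughly, a lookup today looks something like this sketch (I am assuming Storable-style binary files here; the file path and key are only placeholders):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(retrieve);

# Load one of the binary hash files; the whole hash ends up in memory.
# (Placeholder path; assumes the file was written with Storable's store/nstore.)
my $href = retrieve('/path/to/lookup.stor');

# After that, a lookup is just a plain hash access.
my $value = $href->{'some_key'};
print defined $value ? "$value\n" : "not found\n";
```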

Gordon

+4
4 answers

Using a database means your lookups will be slower, but your script will use less memory.

Using hashes in memory means your lookups will be faster, but your script will use more memory (both paths are sketched after this list).

If you have no memory problems and your hashes will never grow, keep using them.

If you have no memory problems but your hashes will keep growing, look into a database.

If you do have memory problems, use the database.

If you want to use a database for the sake of using a database (that is, to learn new skills), use the database.
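To make the trade-off concrete, here is a rough sketch of the two access paths; the database name, credentials, table, and keys are hypothetical:

```perl
use strict;
use warnings;
use DBI;

# In-memory hash: lookups are immediate, but the whole structure lives in RAM.
my %cache = ( alice => 42, bob => 17 );
my $fast  = $cache{alice};

# MySQL via DBI: each lookup is a query, so memory stays low but latency is
# higher.
my $dbh = DBI->connect( 'DBI:mysql:database=app;host=localhost',
                        'user', 'password', { RaiseError => 1 } );
my $sth = $dbh->prepare('SELECT value FROM lookup WHERE name = ?');
$sth->execute('alice');
my ($leaner_but_slower) = $sth->fetchrow_array;
$dbh->disconnect;
```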

+8

If a Perl hash can handle your data, you probably don't need the overhead of a full SQL database. There are plenty of lighter alternatives for storing key → value data, such as Berkeley DB and the whole NoSQL movement. A quick search will turn up a lot of information, and Perl interfaces for many of them exist on CPAN.
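For example, a hash can be tied to an on-disk Berkeley DB file with the DB_File module from CPAN; a minimal sketch (the file path is made up):

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Tie a hash to an on-disk Berkeley DB file: lookups are served from the file
# page by page, so the whole data set never has to sit in memory at once.
tie my %data, 'DB_File', '/path/to/data.db', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot open /path/to/data.db: $!";

$data{some_key} = 'some value';    # written through to the file
print $data{some_key}, "\n";       # read back like an ordinary hash access

untie %data;
```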

+2

Strictly from a speed perspective, looking up single, exact-match keys in a plain in-memory hash is about as fast as you can get, unless your data can fit into an array (i.e., it is accessed only by numeric keys that form a mostly contiguous range starting at 0).

If there are several possible keys you may need to search on (for example, name and employee ID), or if you need to perform searches that are not strict equality matches (for example, "find all employees whose last name is Smith"), then you slow down significantly because you have to scan across the hash keys, and a database starts to look much better.
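For instance, a "last name is Smith" search against a hash keyed by employee ID means walking every entry, while a database can answer it from an index; a small sketch with made-up data:

```perl
use strict;
use warnings;

# Hash keyed by employee ID: a search by last name has to scan every entry.
my %employees = (
    1001 => { name => 'Alice Smith', dept => 'IT' },
    1002 => { name => 'Bob Jones',   dept => 'HR' },
);
my @smiths = grep { $employees{$_}{name} =~ /\bSmith$/ } keys %employees;

# The equivalent database query can use an index on last_name, e.g.:
#   SELECT id, name, dept FROM employees WHERE last_name = 'Smith'
```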

Another factor in overall performance is that you mentioned your hashes are "stored in multiple file locations". If each run makes only one or a few lookups, the time spent reading the hashes into memory from those files also counts, which again favors a database, since it minimizes the amount of unneeded data read from disk.

So it depends on how you need to access your data and what your access patterns look like.

+1

In addition to what has already been mentioned, a database gives you more room to scale, since it can be moved onto another server. MySQL has spent many years making complex search queries fast, and that is code you don't have to write. With a binary hash, you have to worry about flushing to disk without slowing down the application, write atomicity, on-disk maintenance and optimization, and coordinating access when several processes hit the data at once. A database handles all of that for you.

On the other hand, a database does mean extra I/O latency: queries and results have to travel over a network connection or a local socket. Don't underestimate how much time can be spent there, especially as your data set grows.

One common recommendation is to write a generic API on top of a hash-backed driver. Then, when scalability or concurrency becomes a problem, you can "just" write a MySQL driver and migrate the data. That is a big "just", of course, but it is an approach that limits the impact on the rest of your software if a change becomes necessary.
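A minimal sketch of what such a wrapper might look like (package names, file path, and key are invented; the MySQL side is only indicated in a comment):

```perl
package DataStore::Hash;
use strict;
use warnings;
use Storable qw(retrieve);

# Hash-backed implementation of a tiny lookup API.
sub new {
    my ($class, $file) = @_;
    return bless { data => retrieve($file) }, $class;
}

sub get {
    my ($self, $key) = @_;
    return $self->{data}{$key};
}

# A DataStore::MySQL package exposing the same new/get interface (built on
# DBI) could be dropped in later without touching the calling code.

package main;

my $store = DataStore::Hash->new('/path/to/lookup.stor');
print $store->get('some_key') // 'not found', "\n";
```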

0
