- Suppose a key value table has at least 10 s of millions of rows.
- Define an operation that takes a large number of identifiers (again, 10 million), finds the corresponding values and sums them up.
Using a database, this operation seems to be approaching (disk seek time) * (number of lookups)
.
Using a flat file and reading all the contents, this operation will approach (file size)/(drive transfer rate)
.
Inclusion of some (rough) values (from Wikipedia and / or experiments):
seek time = 0.5ms
transfer rate = 64MByte/s
file size = 800M
(for 70 million int / double key / values)
65 million value lookups
DB time = 0.5ms * 65000000
= 32500s
= 9 hours
Flat file = 800M/(64MB/s)
=12s
Experimental results are not so bad for MySQL, but a flat file still wins.
:
id/value InniDB MyISAM.
CREATE TABLE `ivi` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`val` double DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
32 . :
select sum(val) from ivm where id not in (1,12,121,1121); //be sure to change the numbers each time or clear the query cache
/ java.
private static void writeData() throws IOException {
long t = -System.currentTimeMillis();
File dat = new File("/home/mark/dat2");
if (dat.exists()){
dat.delete();
}
FileOutputStream fos = new FileOutputStream(dat);
ObjectOutputStream os = new ObjectOutputStream(new BufferedOutputStream(fos));
for (int i=0; i< 32000000; i++){
os.writeInt(i);
os.writeDouble(i / 2.0);
}
os.flush();
os.close();
t += System.currentTimeMillis();
System.out.println("time ms = " + t);
}
private static void performSummationQuery() throws IOException{
long t = -System.currentTimeMillis();
File dat = new File("/home/mark/dat2");
FileInputStream fin = new FileInputStream(dat);
ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(fin));
HashSet<Integer> set = new HashSet<Integer>(Arrays.asList(11, 101, 1001, 10001, 100001));
int i;
double d;
double sum = 0;
try {
while (true){
i = in.readInt();
d = in.readDouble();
if (!set.contains(i)){
sum += d;
}
}
} catch (EOFException e) {
}
System.out.println("sum = " + sum);
t += System.currentTimeMillis();
System.out.println("time ms = " + t);
}
:
InnoDB 8.0-8.1s
MyISAM 3.1-16.5s
Stored proc 80-90s
FlatFile 1.6-2.4s (even after: echo 3 > /proc/sys/vm/drop_caches)
, . , "" CRUD . , .
, MySQL , , ?
EDIT:
:
1. , , .
2. , , - , . , , , - ad-hoc. . " ".
.