SQLite Suitability / Performance for Large Time Series Data

I have time series data that I would like to store in a database with the following format:

  • group: string
  • date: date
  • val1: number
  • val2: number
  • ... valN

This database will be almost entirely read-only. Queries will look up the rows belonging to a group within a date range (for example, group = XXX AND date >= START AND date <= END).

The data set is large: hundreds of millions of rows. Will SQLite handle this kind of data easily? The attractive thing about SQLite is that it is serverless, and I would like to use it if I can.

+4
2 answers

Updated Answer

I put the 100-million-record database on a RAM drive. Read from RAM, the query below takes 11 seconds rather than 147 seconds!!!

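I created databases with 1 million, 10 million and 100 million records using the Perl script below. Each group is named "GROUP-" followed by a number, and the dates are random between 1900 and 2000.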

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Open (or create) test.db; AutoCommit is off, so commits are issued by hand
my $dsn      = "dbi:SQLite:dbname=test.db";
my $user     = '';
my $password = '';
my %attr     = ( RaiseError => 1, AutoCommit => 0 );

my $dbh = DBI->connect($dsn, $user, $password, \%attr)
    or die "Can't connect to database: $DBI::errstr";

$dbh->do("DROP TABLE IF EXISTS TimeSeries");
$dbh->do("CREATE TABLE TimeSeries (grp TEXT, date TEXT, val1 INTEGER, val2 INTEGER, val3 INTEGER, val4 INTEGER, PRIMARY KEY(grp,date))");

my $sql = qq{ INSERT INTO TimeSeries VALUES ( ?, ?, ?, ?, ?, ? ) };
my $sth = $dbh->prepare($sql);

# Number of records to generate (100 million here)
for (my $i = 0; $i < 100000000; $i++) {
    # Synthesize a group
    my $group = sprintf("GROUP-%d", $i);
    $sth->bind_param(1, $group);

    # Generate random date between 1900-2000 (days 1-28 only, so every date is valid)
    my $year  = int(rand(100)) + 1900;
    my $month = int(rand(12)) + 1;
    my $day   = int(rand(28)) + 1;
    my $date  = sprintf("%d-%02d-%02d 00:00:00.0", $year, $month, $day);
    $sth->bind_param(2, $date);

    # Four random integer values
    $sth->bind_param(3, int(rand(1000000)));
    $sth->bind_param(4, int(rand(1000000)));
    $sth->bind_param(5, int(rand(1000000)));
    $sth->bind_param(6, int(rand(1000000)));
    $sth->execute();

    # Report progress and commit every 1000 rows
    if (($i % 1000) == 0) { print "$i\n"; $dbh->commit(); }
}
$dbh->commit();
$sth->finish();
$dbh->disconnect();

The resulting file sizes for 1m, 10m and 100m records:

-rw-r--r--  1 mark  staff   103M  4 Feb 14:16 1m.db
-rw-r--r--  1 mark  staff   1.0G  4 Feb 14:18 10m.db
-rw-r--r--  1 mark  staff    11G  4 Feb 15:10 100m.db

Sample records:

GROUP-794|1927-12-14 00:00:00.0|233545|700623|848770|61504
GROUP-797|1927-06-13 00:00:00.0|315357|246334|276825|799325
GROUP-840|1927-09-28 00:00:00.0|682335|5651|879688|247996
GROUP-907|1927-05-19 00:00:00.0|148547|595716|516884|820007
GROUP-1011|1927-06-01 00:00:00.0|793543|479096|433073|786200

To select all records from the year 1927, I ran:

time sqlite3 1m.db "select * from timeseries where date between '1927-01-01' and '1927-12-31'"

Query times:

all records in year 1927 from 1m record database => 2.7 seconds
all records in year 1927 from 10m record database => 14 seconds
all records in year 1927 from 100m record database => 147 seconds
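Note that the benchmark query filters on date alone, while the question asks for a group plus a date range. A minimal sketch of how the two differ, using Python's standard sqlite3 module rather than the Perl script above (the helper name `query_plan` and the sample parameter values are illustrative assumptions): it asks SQLite for the query plan of each form against the same schema. Because the primary key is (grp, date), I would expect the date-only query to be reported as a full scan and the group-plus-date query as a search on the primary-key index, though the exact plan text varies between SQLite versions.

```python
import sqlite3

# Same schema as the benchmark script; the table stays empty because
# EXPLAIN QUERY PLAN only needs the schema, not the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TimeSeries (grp TEXT, date TEXT, val1 INTEGER, val2 INTEGER, "
    "val3 INTEGER, val4 INTEGER, PRIMARY KEY(grp, date))"
)


def query_plan(sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # 'detail' is the last column


# The benchmark's query: date range only
date_only_plan = query_plan(
    "SELECT * FROM TimeSeries WHERE date BETWEEN ? AND ?",
    ("1927-01-01", "1927-12-31"),
)

# The question's query: one group within a date range
group_and_date_plan = query_plan(
    "SELECT * FROM TimeSeries WHERE grp = ? AND date >= ? AND date <= ?",
    ("GROUP-794", "1927-01-01", "1927-12-31"),
)

print("date only:   ", date_only_plan)
print("group + date:", group_and_date_plan)
```

If that holds on your build, the timings above are closer to a worst case for this schema than to the lookup described in the question.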


P.S. These timings are from an iMac with an SSD.

+8

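This will work efficiently only if the data is stored so that the lookups can be done efficiently, i.e. in a WITHOUT ROWID table with the group and date columns as the primary key.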
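As a rough sketch of that layout, assuming a table keyed on (grp, date) and declared WITHOUT ROWID, here it is in Python's standard sqlite3 module. The rows and the two-value schema are made up for illustration. In a WITHOUT ROWID table the rows themselves are stored in primary-key order, so all dates for one group sit next to each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rows are stored in (grp, date) order; WITHOUT ROWID needs an explicit primary key
conn.execute(
    "CREATE TABLE TimeSeries (grp TEXT NOT NULL, date TEXT NOT NULL, "
    "val1 INTEGER, val2 INTEGER, PRIMARY KEY(grp, date)) WITHOUT ROWID"
)

# Hypothetical sample data, not taken from the benchmark
rows = [
    ("GROUP-1", "1927-05-19 00:00:00.0", 1, 2),
    ("GROUP-1", "1927-06-01 00:00:00.0", 3, 4),
    ("GROUP-1", "1930-01-01 00:00:00.0", 5, 6),
    ("GROUP-2", "1927-06-13 00:00:00.0", 7, 8),
]
conn.executemany("INSERT INTO TimeSeries VALUES (?, ?, ?, ?)", rows)

# One group, half-open date range covering the year 1927
query = (
    "SELECT date, val1, val2 FROM TimeSeries "
    "WHERE grp = ? AND date >= ? AND date < ? ORDER BY date"
)
params = ("GROUP-1", "1927-01-01", "1928-01-01")

result = conn.execute(query, params).fetchall()
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]

print(result)
print(plan)
```

The half-open range (date < next year's first day) is a design choice: the dates are stored as text with a time part, so an upper bound of '1927-12-31' would exclude rows dated on 31 December.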

0

Source: https://habr.com/ru/post/1627083/

