It works fine for me using the latest version of data.table from GitHub. Perhaps two recent fixes listed in the README have resolved it:
fread():
* . Clayton Stanley : fread
* . 2970844 : fread
Here is the output on my slow netbook (4GB RAM):
$ file avito_train.tsv
avito_train.tsv: UTF-8 Unicode text, with very long lines
> DT = fread("Downloads/avito_train.tsv",verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 2.915 GB
File is opened and mapped ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep='\t'
Found 13 columns
First row with 13 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 3995804
Subtracted 1 for last eol and any trailing empty lines, leaving 3995803 data rows
Type codes ( first 5 rows): 1444441111113
Type codes (+ middle 5 rows): 1444441111113
Type codes (+ last 5 rows): 1444441111113
Type codes: 1444441111113 (after applying colClasses and integer64)
Type codes: 1444441111113 (after applying drop or select (if supplied)
Allocating 13 column slots (13 - 0 dropped)
Read 3995803 rows and 13 (of 13) columns from 2.915 GB file in 00:10:49
82.590s ( 13%) Memory map (rerun may be quicker)
2.930s ( 0%) sep and header detection
68.290s ( 11%) Count rows (wc -l)
0.000s ( 0%) Column type detection (first, middle and last 5 rows)
3.550s ( 1%) Allocation of 3995803x13 result (xMB) in RAM
491.590s ( 76%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.080s ( 0%) Changing na.strings to NA
649.030s Total
.
> head(DT)
itemid category subcategory title
1: 10000010 Toyota Sera, 1991
2: 10000025
3: 10000094 , , Steilmann
4: 10000101 Ford Focus, 2011
5: 10000132 3.0 Bar
6: 10000152 2115 Samara, 2005
description
1: (, ), , 16- , , . ^p ! ! ^p , !!!
2: ^p :8@@PHONE@@
3: . . V . . (+3-4 ). 40
4: , , , .. , . / . !!! .
5: V-6 . V-8 16 .....
6: 8 @@PHONE@@
attrs
1: {"" "":""1991"", "" "":"""", """":""10 000 - 14 999"", "" "":"""", "" "":""1.5"", "" "":"""", """":""Toyota"", """":""Sera"", """":"""", """":"""", """":"""", """":"" ""}
2: {"" "":"", ""}
3: {"" "":"" "", "" "":"" "", """":""46–48 (L)""}
4: {"""":""Ford"", """":""Focus"", "" "":""2011"", """":""80 000 - 84 999"", "" "":"""", """":"""", "" "":""1.6"", "" "":"""", "" "":"""", """":"""", """":"""", """":"" ""}
5: {"" "":"""", "" "":"" ""}
6: {"""":"" (LADA)"", """":""2115 Samara"", "" "":""2005"", """":""140 000 - 149 999"", "" "":"""", """":"""", "" "":""1.5"", "" "":"""", "" "":"""", """":"""", """":"""", """":"" ""}
price is_proved is_blocked phones_cnt emails_cnt urls_cnt close_hours
1: 150000 NA 0 0 0 0 0.03
2: 0 NA 0 1 0 0 22.38
3: 1500 NA 0 0 0 0 0.41
4: 365000 NA 0 0 0 0 8.87
5: 5000 NA 0 0 0 0 11.82
6: 0 NA 0 1 0 0 22.55
.
> tail(DT)
itemid category subcategory title
1: 99999929
2: 99999962 Bridgestone-Blizzak WS-60-225/50 R17--
3: 99999973 1- , 39 ²
4: 99999974 ,
5: 99999977 Nokia
6: 99999982
description
1: 2 1560()*1050() , 2 ,,. . ( , ) 4000 , 7000.
2: 4 . 5-6 , . ^p 16 000 ^p ^p 8-@@PHONE@@
3: . .
4: . , . ^p - , ^p - ^p - ^p - ^p - ^p -
5:
6: . . , , ().
attrs price is_proved is_blocked phones_cnt emails_cnt urls_cnt close_hours
1: {"" "":"" ""} 4000 NA 0 0 0 0 0.69
2: {"" "":"", "", "" "":""""} 16000 NA 0 1 0 0 0.04
3: {"" "":"""", "" "":""1"", "" "":"" "", """":""""} 11000 NA 0 0 0 0 0.20
4: {"" "":"", ""} 0 NA 0 0 0 0 23.50
5: {"" "":""""} 300 NA 0 0 0 0 5.72
6: {"" "":""""} 300 NA 0 0 0 0 19.08
.
> dim(DT)
[1] 3995803 13
.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 20
Model: 2
Stepping: 0
CPU MHz: 800.000 # i.e. my slow netbook (4GB RAM)
BogoMIPS: 1995.01
Virtualisation: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
NUMA node0 CPU(s): 0,1
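
To check whether your own data.table build behaves the same way without waiting ten minutes on the full file, a small reproduction along these lines may help. This is only a sketch: the toy columns and the temporary file are made up for illustration and stand in for avito_train.tsv; only `fread()` with `sep` and `verbose=TRUE` comes from the session above.

```r
library(data.table)

# Hypothetical toy data standing in for the 2.9 GB avito_train.tsv
toy <- data.frame(
  itemid = c(10000010L, 10000101L, 99999977L),
  title  = c("Toyota Sera, 1991", "Ford Focus, 2011", "Nokia"),
  price  = c(150000L, 365000L, 300L),
  stringsAsFactors = FALSE
)

# Write it as an unquoted tab-separated file, like the original .tsv
path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# verbose=TRUE prints the same kind of diagnostics as shown above
# (detected separator, column count, type codes, timings)
DT <- fread(path, sep = "\t", verbose = TRUE)
dim(DT)
```

If the verbose output reports a different separator or column count than you expect, that is usually the first thing to compare against the log above.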