I have the following code:
raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip)
Which causes the following error:
Error in fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip + :
Expected sep (',') but new line, EOF (or other non printing character) ends field 14 on line 1003 when detecting types: 10066652 Транспорт Автомобили с пробегом Nissan R Nessa, 1998 Тарантас в отличном состоянии. на прошлой неделе возили на тех. Обслуживание. В дорожных неприятностях не был участником. Детали кузова без коцок и терок. Предназначалась для поездок на природу, Отдам только в добрые руки. В салон не поставлю не звоните "{""Марка"":""Nissan"", ""Модель"":""R Nessa"", ""Год выпуска"":""1998"", ""Пробег"":""180 000 - 189 999"", ""Тип кузова"":""Минивэн"", ""Цвет"":""Оранжевый"", ""Объём двигателя"":""2.4"", ""Коробка передач"":""Механическая
I tried changing it to this:
raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip + 2))
This is based on what I read in a similar question, "skip and autostart in fread".
However, it produces an error similar to the one above.
How can I skip the first 1000 lines and read the next 1000? My expected result is 1000 rows: the second thousand from my file, with the first thousand skipped.
Note: Reading the file with raw_test <- fread("avito_test.tsv", nrows = 1000, skip = -1) works well and gives me the first thousand rows, but I'm trying to get only the second thousand.
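For reference, here is a minimal sketch of the fallback I could use in the meantime. It relies only on the fact noted above that reading from the top of the file with nrows works: read the first n_skip + n_rows rows, then drop the leading n_skip rows. The helper name read_row_window, its arguments, and the toy file are hypothetical and only for illustration; I have not verified this against the full avito_test.tsv.

```r
library(data.table)

# Hypothetical helper (not part of data.table): return rows
# (n_skip + 1) .. (n_skip + n_rows) by reading from the top of the file
# and discarding the leading rows, instead of using skip.
read_row_window <- function(path, n_skip, n_rows) {
  dt <- fread(path, nrows = n_skip + n_rows)
  # If the file has no rows beyond the skipped ones, return an empty table
  if (nrow(dt) <= n_skip) return(dt[0])
  dt[(n_skip + 1):nrow(dt)]
}

# Toy stand-in for avito_test.tsv: 30 rows, tab-separated, with a header
toy_path <- tempfile(fileext = ".tsv")
toy <- data.frame(id = 1:30, title = paste0("item", 1:30))
write.table(toy, toy_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Skip the first 10 rows and keep the next 10
second_ten <- read_row_window(toy_path, n_skip = 10, n_rows = 10)
print(second_ten)
```

With the real file the call would be read_row_window("avito_test.tsv", intSkip, intNrows). The obvious cost is that the skipped rows are parsed again on every call, so I would still prefer a solution that uses skip directly.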
Edit: Data is publicly available at http://www.kaggle.com/c/avito-prohibited-content/data
> packageVersion("data.table")
[1] ‘1.9.3’
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)