Problem with skip and autostart in fread

Question

I have the following code:

raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip)

Which causes the following error:

Error in fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip +  : 
  Expected sep (',') but new line, EOF (or other non printing character) ends field 14 on line 1003 when detecting types: 10066652  Ð¢Ñ€Ð°Ð½ÑÐ¿Ð¾Ñ€Ñ‚   ÐÐ²Ñ‚Ð¾Ð¼Ð¾Ð±Ð¸Ð»Ð¸ Ñ Ð¿Ñ€Ð¾Ð±ÐµÐ³Ð¾Ð¼  Nissan R Nessa, 1998    Ð¢Ð°Ñ€Ð°Ð½Ñ‚Ð°Ñ Ð² Ð¾Ñ‚Ð»Ð¸Ñ‡Ð½Ð¾Ð¼ ÑÐ¾ÑÑ‚Ð¾ÑÐ½Ð¸Ð¸. Ð½Ð° Ð¿Ñ€Ð¾ÑˆÐ»Ð¾Ð¹ Ð½ÐµÐ´ÐµÐ»Ðµ Ð²Ð¾Ð·Ð¸Ð»Ð¸ Ð½Ð° Ñ‚ÐµÑ…. ÐžÐ±ÑÐ»ÑƒÐ¶Ð¸Ð²Ð°Ð½Ð¸Ðµ. Ð’ Ð´Ð¾Ñ€Ð¾Ð¶Ð½Ñ‹Ñ… Ð½ÐµÐ¿Ñ€Ð¸ÑÑ‚Ð½Ð¾ÑÑ‚ÑÑ… Ð½Ðµ Ð±Ñ‹Ð» ÑƒÑ‡Ð°ÑÑ‚Ð½Ð¸ÐºÐ¾Ð¼. Ð"ÐµÑ‚Ð°Ð»Ð¸ ÐºÑƒÐ·Ð¾Ð²Ð° Ð±ÐµÐ· ÐºÐ¾Ñ†Ð¾Ðº Ð¸ Ñ‚ÐµÑ€Ð¾Ðº. ÐŸÑ€ÐµÐ´Ð½Ð°Ð·Ð½Ð°Ñ‡Ð°Ð»Ð°ÑÑŒ Ð´Ð»Ñ Ð¿Ð¾ÐµÐ·Ð´Ð¾Ðº Ð½Ð° Ð¿Ñ€Ð¸Ñ€Ð¾Ð´Ñƒ, ÐžÑ‚Ð´Ð°Ð¼ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ Ð² Ð´Ð¾Ð±Ñ€Ñ‹Ðµ Ñ€ÑƒÐºÐ¸. Ð’ ÑÐ°Ð»Ð¾Ð½ Ð½Ðµ Ð¿Ð¾ÑÑ‚Ð°Ð²Ð»ÑŽ Ð½Ðµ Ð·Ð²Ð¾Ð½Ð¸Ñ‚Ðµ    "{""ÐœÐ°Ñ€ÐºÐ°"":""Nissan"", ""ÐœÐ¾Ð´ÐµÐ»ÑŒ"":""R Nessa"", ""Ð"Ð¾Ð´ Ð²Ñ‹Ð¿ÑƒÑÐºÐ°"":""1998"", ""ÐŸÑ€Ð¾Ð±ÐµÐ³"":""180 000 - 189 999"", ""Ð¢Ð¸Ð¿ ÐºÑƒÐ·Ð¾Ð²Ð°"":""ÐœÐ¸Ð½Ð¸Ð²ÑÐ½"", ""Ð¦Ð²ÐµÑ‚"":""ÐžÑ€Ð°Ð½Ð¶ÐµÐ²Ñ‹Ð¹"", ""ÐžÐ±ÑŠÑ‘Ð¼ Ð´Ð²Ð¸Ð³Ð°Ñ‚ÐµÐ»Ñ"":""2.4"", ""ÐšÐ¾Ñ€Ð¾Ð±ÐºÐ° Ð¿ÐµÑ€ÐµÐ´Ð°Ñ‡"":""ÐœÐµÑ…Ð°Ð½Ð¸Ñ‡ÐµÑÐºÐ°Ñ

I tried changing it to this:

raw_test <- fread("avito_test.tsv", nrows = intNrows, skip = intSkip, autostart = (intSkip + 2))

This is based on what I read in a similar question, "skip and autostart in fread".

However, it produces an error similar to the one shown above.

How can I skip the first 1000 lines and read the next 1000? The expected result is 1000 rows: the second thousand from my file, with the first thousand skipped.

Note: Reading the file with raw_test <- fread("avito_test.tsv", nrows = 1000, skip = -1) works well and gives me the first thousand rows, but I am trying to get only the second thousand.

Edit: Data is publicly available at http://www.kaggle.com/c/avito-prohibited-content/data

> packageVersion("data.table")
[1] ‘1.9.3’
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

r data.table fread

user1477388 15 . '14 13:20
