I'm very new to Spark, Hive, big data, and Scala. I'm trying to write a simple function that takes a sqlContext, loads a CSV file from S3, and returns a DataFrame. The problem is that this particular CSV uses the ^A character (i.e. \001) as its delimiter, and the dataset is huge, so I can't just run s/\001/,/g over it. Besides, the fields themselves may contain commas or other characters I might otherwise use as a separator.
I know the spark-csv package I'm using has a delimiter option, but I don't know how to set it so that it reads \001 as a single character rather than as the escaped characters 0, 0, and 1. Should I perhaps use a hiveContext or something else?
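For reference, this is the kind of function I'm trying to write. It's only a sketch, assuming the spark-csv `delimiter` option accepts a single-character string such as Scala's `"\u0001"` escape for ^A; the function name and the S3 path are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper: load a Ctrl-A (\u0001) delimited file from S3
// with the spark-csv package and return it as a DataFrame.
def loadCtrlADelimited(sqlContext: SQLContext, path: String): DataFrame = {
  sqlContext.read
    .format("com.databricks.spark.csv")
    // "\u0001" is the Scala escape for the ^A character, so the option
    // receives a single-character string, not the literal text \001
    .option("delimiter", "\u0001")
    .option("header", "false")
    .load(path)
}

// Example call; the bucket and key are made up.
val df = loadCtrlADelimited(sqlContext, "s3n://my-bucket/path/to/data.csv")
```

Is this roughly the right approach, or does the delimiter need to be passed some other way?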