Convert a pipe-delimited file into a single-column CSV file

I have a single-column CSV file, and the rows look like this:

 123 || food || fruit
 123 || food || fruit || orange
 123 || food || fruit || apple

I want to create a CSV file with one column whose row values are:

 orange
 apple

I tried using the following code:

  val data = sc.textFile("fruits.csv")
  val rows = data.map(_.split("||"))
  val rddnew = rows.flatMap(arr => {
    val text = arr(0)
    val words = text.split("||")
    words.map(word => (word, text))
  })

But this code does not give me the result I need.
Can anyone help me with this?

2 answers

You need to escape the special characters when splitting, since `split` accepts a regular expression:

 .split("\\|\\|") 
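A minimal sketch of why the escaping matters: in a regex, `||` means "empty alternative or empty alternative", which matches at every position and splits the string into individual characters, while `\\|\\|` matches the literal `||` delimiter. The sample line here is taken from the question:

```scala
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val line = "123 || food || fruit || orange"

    // Unescaped: "||" is a regex matching the empty string,
    // so the line is split into single characters.
    println(line.split("||").length) // far more pieces than expected

    // Escaped: matches the literal "||" delimiter.
    val fields = line.split("\\|\\|").map(_.trim)
    println(fields.mkString(",")) // 123,food,fruit,orange
  }
}
```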

Converting to CSV is tricky because fields can contain delimiters (inside quotation marks), newlines, or other syntax characters, so I would recommend using spark-csv:

  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "||")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("words.csv")

and

  words.write
    .format("com.databricks.spark.csv")
    .option("delimiter", "||")
    .option("header", "true")
    .save("words.csv")

You can solve this problem with code similar to this:

 val text = sc.textFile("fruit.csv")
 // split on the escaped delimiter; trimming is added here so that
 // padded values like " orange" compare equal under distinct
 val word = text.map(l => l.split("\\|\\|").map(_.trim))
 val last = word.map(w => w(w.size - 1))
 last.distinct.collect
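The same transformation can be sketched with plain Scala collections, which is handy for checking the logic without a Spark cluster. The sample lines below stand in for the contents of `fruit.csv`:

```scala
object LastFieldDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical in-memory stand-in for fruit.csv
    val lines = Seq(
      "123 || food || fruit",
      "123 || food || fruit || orange",
      "123 || food || fruit || apple"
    )

    // Split each row on the escaped delimiter and keep the last field,
    // mirroring the RDD version above
    val last = lines
      .map(_.split("\\|\\|").map(_.trim))
      .map(w => w(w.length - 1))

    println(last.distinct.mkString("\n"))
  }
}
```

Note that, like the RDD version, this keeps the last field of every row, so short rows contribute "fruit" as well; filter on the number of fields first if you only want the fourth column.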

Source: https://habr.com/ru/post/1013397/

