In Spark, I understand how to use wholeTextFiles and textFile, but I'm not sure which to use when. Here is what I know so far:
- When working with files that are not split into lines, you should use wholeTextFiles; otherwise use textFile.
I would think that, by default, wholeTextFiles and textFile partition by file content and by lines, respectively. But both of them allow you to change the minPartitions parameter.
So how does changing partitions affect how they are processed?
For example, let's say I have one very large file with 100 lines. What is the difference between processing it with wholeTextFiles with 100 partitions and processing it with textFile (which partitions it line by line) using the default of 100 partitions?
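To make the comparison concrete, here is a minimal PySpark sketch of the two calls I have in mind. The file name big_file.txt and the helper names partition_summary and run_demo are hypothetical, and I am assuming a local SparkContext; it only reports what each call produces and does not claim any particular result.

```python
def partition_summary(sc, path, min_partitions):
    """Read the same path both ways and report (partitions, records).

    `sc` is expected to be a pyspark.SparkContext; `path` is a
    hypothetical input file used only for illustration.
    """
    # wholeTextFiles yields (file_path, file_content) pairs.
    whole = sc.wholeTextFiles(path, minPartitions=min_partitions)
    # textFile yields one record per line of the input.
    lines = sc.textFile(path, minPartitions=min_partitions)
    return {
        "wholeTextFiles": (whole.getNumPartitions(), whole.count()),
        "textFile": (lines.getNumPartitions(), lines.count()),
    }


def run_demo(path="big_file.txt", min_partitions=100):
    # Imported here so the helper above can be read without Spark installed.
    from pyspark import SparkContext

    # Assumption: a local master is enough for this experiment.
    sc = SparkContext("local[*]", "partition-demo")
    try:
        summary = partition_summary(sc, path, min_partitions)
        for api, (parts, records) in summary.items():
            print(f"{api}: {parts} partitions, {records} records")
    finally:
        sc.stop()
```

Calling run_demo() against the 100-line file would print the partition count and record count for each API side by side, which is the difference I am trying to understand.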
What is the difference between the two?