How does Spark Streaming's fileStream identify new files in the monitored directory from one batch interval to the next?
Does it rely on new file names, a file creation timestamp, or some other approach?
What does the newFilesOnly argument mean in the following signature?
fileStream(String directory, Class<K> kClass, Class<V> vClass, Class<F> fClass, Function<org.apache.hadoop.fs.Path,Boolean> filter, boolean newFilesOnly, org.apache.hadoop.conf.Configuration conf)