I am using Apache Spark and want to read multiple JSON files by date. How can I select a range of files, that is, everything from the files ending in 1034.json up to the files ending in 1434.json? This is what I am trying:
DataFrame df = sql.read().json("s3://..../..../.....-.....[1034*-1434*]");
But I get the following error:
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.range(Pattern.java:2594)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:156)
at org.apache.hadoop.fs.GlobPattern.<init>(GlobPattern.java:42)
at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:67)
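Specify the path as a valid glob pattern. The stack trace shows that Hadoop translates the path into a Java regular expression, and in glob syntax square brackets denote a character class that matches a single character, so [1034*-1434*] is not read as a numeric range.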
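One possible way to express such a range is brace alternation, which Hadoop globs support ({ab,cd} matches either string). The following is only a minimal sketch, not a definitive solution: the class RangeGlob, the method build, and its parameters are hypothetical names, and it assumes the numeric part of the file name is a zero-padded four-digit integer and that every value between the two bounds should be included.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark or Hadoop.
class RangeGlob {

    // Builds a glob such as "events-{1034,1035,1036,1037}.json".
    // Assumption: the numbers are written with four digits in the file names.
    public static String build(String prefix, int from, int to, String suffix) {
        if (from > to) {
            throw new IllegalArgumentException("from must not exceed to");
        }
        // Collect every value in the range as one comma-separated brace group.
        StringJoiner alternatives = new StringJoiner(",", "{", "}");
        for (int n = from; n <= to; n++) {
            alternatives.add(String.format("%04d", n));
        }
        return prefix + alternatives + suffix;
    }

    public static void main(String[] args) {
        // "events-" is a made-up prefix used only for this demonstration.
        System.out.println(build("events-", 1034, 1037, ".json"));
        // prints: events-{1034,1035,1036,1037}.json
    }
}
```

The resulting string could then be passed to the same call as in the question, for example sql.read().json(RangeGlob.build(prefix, 1034, 1434, ".json")), where prefix stands for the part of the S3 path that precedes the number. If other characters can appear between the prefix and the number, a * can be appended to the prefix.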