How to use Spark's newAPIHadoopRDD() from Java

I am trying to port an example written in Scala (from the Apache Spark project) to Java and am running into some problems.

The code

val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
  classOf[CqlPagingInputFormat],
  classOf[java.util.Map[String,ByteBuffer]],
  classOf[java.util.Map[String,ByteBuffer]])

from the original example builds and works just fine in Scala, but the Java equivalent

JavaPairRDD rdd = jsc.newAPIHadoopRDD(job.getConfiguration(),
  CqlPagingInputFormat.class,
  java.util.Map<String, ByteBuffer>.class,
  java.util.Map<String, ByteBuffer>.class);

does not compile: Java reports "Cannot select from parameterized type".

Replacing

java.util.Map<String, ByteBuffer>.class

with

Class.forName("java.util.Map<String, ByteBuffer>")

produces a new error:

Error:(42, 30) java: method newAPIHadoopRDD in class org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
required: org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
found: org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat>,java.lang.Class<capture#1 of ?>,java.lang.Class<capture#2 of ?>
reason: inferred type does not conform to declared bound(s)
inferred: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
bound(s): org.apache.hadoop.mapreduce.InputFormat<capture#1 of ?,capture#2 of ?>

Changing it to just java.util.Map.class gives a similar error:

Error:(44, 30) java: method newAPIHadoopRDD in class org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
required: org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
found: org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat>,java.lang.Class<java.util.Map>,java.lang.Class<java.util.Map>
reason: inferred type does not conform to declared bound(s)
inferred: org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
bound(s): org.apache.hadoop.mapreduce.InputFormat<java.util.Map,java.util.Map>

What am I doing wrong? How should newAPIHadoopRDD(), which works fine from Scala, be called from Java? The Java signature is documented at http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD (org.apache.hadoop.conf.Configuration, java.lang.Class, java.lang.Class, java.lang.Class).
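For context: Java has no syntax for a parameterized class literal, which is exactly the "Cannot select from parameterized type" error above. A common workaround (not part of the original question; the class and method names below are hypothetical) is an unchecked double cast on the raw class token. A minimal JDK-only sketch:

```java
import java.nio.ByteBuffer;
import java.util.Map;

public class MapClassToken {
    // Map<String, ByteBuffer>.class is not legal Java, but the raw token
    // Map.class can be re-typed with an unchecked double cast through Class<?>.
    @SuppressWarnings("unchecked")
    static Class<Map<String, ByteBuffer>> mapToken() {
        return (Class<Map<String, ByteBuffer>>) (Class<?>) Map.class;
    }

    public static void main(String[] args) {
        // Generics are erased, so at runtime this is still plain java.util.Map.
        System.out.println(mapToken().getName());
    }
}
```

Whether such a token then satisfies newAPIHadoopRDD's bounds depends on the InputFormat's own type parameters, which is what the errors above are complaining about.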

The declaration of CqlPagingInputFormat is:

public class CqlPagingInputFormat extends org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat<java.util.Map<java.lang.String,java.nio.ByteBuffer>,java.util.Map<java.lang.String,java.nio.ByteBuffer>> {

As it turned out, the problem was with type inference. newHadoopAPI requires a class that extends org.apache.hadoop.mapreduce.InputFormat, and org.apache.cassandra.hadoop.cql3.CqlInputFormat does not extend InputFormat directly; it does so only through org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat, which in turn extends InputFormat.
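The bound failure can be reproduced without Spark or Cassandra on the classpath. In the sketch below, InputFormat, AbstractFormat, FakeCqlFormat, and call are hypothetical stand-ins that mirror only the generic signatures involved; with raw Map.class the compiler infers K = raw Map and rejects the call, while a token typed as Class<Map<String, ByteBuffer>> matches the bound:

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Stand-in for org.apache.hadoop.mapreduce.InputFormat<K, V> (hypothetical).
abstract class InputFormat<K, V> {}

// Intermediate class, playing the role of AbstractColumnFamilyInputFormat.
abstract class AbstractFormat<K, V> extends InputFormat<K, V> {}

// Stand-in for CqlPagingInputFormat: K and V are fixed to parameterized Maps.
class FakeCqlFormat
        extends AbstractFormat<Map<String, ByteBuffer>, Map<String, ByteBuffer>> {}

public class BoundDemo {
    // Mirrors the generic shape of JavaSparkContext.newAPIHadoopRDD.
    static <K, V, F extends InputFormat<K, V>> String call(
            Class<F> f, Class<K> k, Class<V> v) {
        return f.getSimpleName();
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Class<Map<String, ByteBuffer>> token =
                (Class<Map<String, ByteBuffer>>) (Class<?>) Map.class;
        // call(FakeCqlFormat.class, Map.class, Map.class) would NOT compile:
        // the inferred raw K = Map does not conform to the bound on F,
        // just like the "inferred type does not conform" errors above.
        System.out.println(call(FakeCqlFormat.class, token, token));
    }
}
```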

I solved it with the Groovy Eclipse compiler, which lets Groovy and Java classes be mixed in one project. Groovy handles the K, V generics that the Java compiler chokes on.

The Groovy-related additions to pom.xml:

<properties>
    <groovy-version>1.8.6</groovy-version>
    <maven-comipler-plugin-version>2.5.1</maven-comipler-plugin-version>
    <groovy-eclipse-compiler-version>2.7.0-01</groovy-eclipse-compiler-version>
    <maven-clover2-plugin-version>3.1.7</maven-clover2-plugin-version>
    <groovy-eclipse-batch-version>1.8.6-01</groovy-eclipse-batch-version>
</properties>
  • add the Groovy dependency:

    <dependencies>
        <dependency>
            <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy-all</artifactId>
            <version>${groovy-version}</version>
        </dependency>
    </dependencies>
    
  • set up the Groovy Eclipse compiler in the build:

    <build>
        <pluginManagement>
            <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${maven-comipler-plugin-version}</version>
                <configuration>
                    <!-- Bind Groovy Eclipse Compiler -->
                    <compilerId>groovy-eclipse-compiler</compilerId>
                    <source>${jdk-version}</source>
                    <target>${jdk-version}</target>
                </configuration>
                <dependencies>
                    <!-- Define which Groovy version will be used for build (default is 
                        2.0) -->
                    <dependency>
                        <groupId>org.codehaus.groovy</groupId>
                        <artifactId>groovy-eclipse-batch</artifactId>
                        <version>${groovy-eclipse-batch-version}</version>
                    </dependency>
                    <!-- Define dependency to Groovy Eclipse Compiler (as it referred 
                        in compilerId) -->
                    <dependency>
                        <groupId>org.codehaus.groovy</groupId>
                        <artifactId>groovy-eclipse-compiler</artifactId>
                        <version>${groovy-eclipse-compiler-version}</version>
                    </dependency>
                </dependencies>
            </plugin>
            <!-- Define Groovy Eclipse Compiler again and set extensions=true. Thanks 
                to this, plugin will -->
            <!-- enhance default build life cycle with an extra phase which adds 
                additional Groovy source folders -->
            <!-- It works fine under Maven 3.x, but we've encountered problems with 
                Maven 2.x -->
            <plugin>
                <groupId>org.codehaus.groovy</groupId>
                <artifactId>groovy-eclipse-compiler</artifactId>
                <version>${groovy-eclipse-compiler-version}</version>
                <extensions>true</extensions>
            </plugin>
            <!-- Configure Clover for Maven plug-in. Please note that it not bound 
                to any execution phase, --> 
            <!-- so you'll have to call Clover goals from command line. -->
            <plugin>
                <groupId>com.atlassian.maven.plugins</groupId>
                <artifactId>maven-clover2-plugin</artifactId>
                <version>${maven-clover2-plugin-version}</version>
                <configuration>
                    <generateHtml>true</generateHtml>
                    <historyDir>.cloverhistory</historyDir>
                </configuration>
            </plugin>
            </plugins>
        </pluginManagement>
    </build>
    



Source: https://habr.com/ru/post/1546292/

