NoSuchMethodError: Sets.newConcurrentHashSet() while running a jar with Hadoop

I am using the cassandra-all 2.0.7 API with Hadoop 2.2.0. Here is my pom.xml:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>zazzercode</groupId>
      <artifactId>doctorhere-engine-writer</artifactId>
      <version>1.0</version>
      <packaging>jar</packaging>
      <name>DoctorhereEngineWriter</name>
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <cassandra.version>2.0.7</cassandra.version>
        <hector.version>1.0-2</hector.version>
        <guava.version>15.0</guava.version>
        <hadoop.version>2.2.0</hadoop.version>
      </properties>
      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
              <source>1.6</source>
              <target>1.6</target>
            </configuration>
          </plugin>
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <archive>
                <manifest>
                  <mainClass>zazzercode.DiseaseCountJob</mainClass>
                </manifest>
              </archive>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
            </configuration>
          </plugin>
        </plugins>
      </build>
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>me.prettyprint</groupId>
          <artifactId>hector-core</artifactId>
          <version>${hector.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-all</artifactId>
          <version>${cassandra.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.cassandra</groupId>
          <artifactId>cassandra-thrift</artifactId>
          <version>${cassandra.version}</version>
          <exclusions>
            <exclusion>
              <groupId>org.apache.thrift</groupId>
              <artifactId>libthrift</artifactId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
        <dependency>
          <groupId>org.apache.thrift</groupId>
          <artifactId>libthrift</artifactId>
          <version>0.7.0</version>
        </dependency>
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
          <version>${guava.version}</version>
        </dependency>
        <dependency>
          <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
          <artifactId>concurrentlinkedhashmap-lru</artifactId>
          <version>1.3</version>
        </dependency>
      </dependencies>
    </project>

When I run the jar (built with mvn assembly:assembly as the regular user prayagupd) as the hduser user, as shown below,

 hduser@prayagupd $ hadoop jar target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar /user/hduser/shakespeare 

I get the following Guava collections error from the Cassandra API:

    14/11/23 17:51:04 WARN mapred.LocalJobRunner: job_local800673408_0001
    java.lang.NoSuchMethodError: com.google.common.collect.Sets.newConcurrentHashSet()Ljava/util/Set;
        at org.apache.cassandra.config.Config.<init>(Config.java:53)
        at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:105)
        at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:105)
        at org.apache.cassandra.hadoop.BulkRecordWriter.<init>(BulkRecordWriter.java:90)
        at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:69)
        at org.apache.cassandra.hadoop.BulkOutputFormat.getRecordWriter(BulkOutputFormat.java:29)
        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:558)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:632)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:445)
    14/11/23 17:51:04 INFO mapreduce.Job:  map 100% reduce 0%

Line 53 of the Cassandra API's Config.java has this code:

 public Set<String> hinted_handoff_enabled_by_dc = Sets.newConcurrentHashSet(); 
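
For what it's worth, Sets.newConcurrentHashSet() only exists in Guava 15.0 and later. If I understand Guava correctly, it is essentially shorthand for a JDK 6+ construction like the following sketch (hypothetical class name, not Cassandra's actual code):

    import java.util.Collections;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class ConcurrentSetSketch {
        public static void main(String[] args) {
            // Pre-Guava-15 equivalent of Sets.newConcurrentHashSet():
            // a concurrent Set view backed by a ConcurrentHashMap.
            Set<String> hintedHandoffEnabledByDc =
                    Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
            hintedHandoffEnabledByDc.add("dc1");
            System.out.println(hintedHandoffEnabledByDc); // [dc1]
        }
    }

So any Guava older than 15.0 on the classpath cannot satisfy this call.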

Yet I do find the Sets class inside the jar itself:

    hduser@prayagupd $ jar tvf target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar | grep com/google/common/collect/Sets
     2358 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$1.class
     2019 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$2.class
     1705 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$3.class
     1327 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet$1.class
     4224 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$CartesianSet.class
     5677 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$DescendingSet.class
     4187 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredNavigableSet.class
     1567 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSet.class
     2614 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$FilteredSortedSet.class
     1174 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$ImprovedAbstractSet.class
     1361 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet$1.class
     3727 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$PowerSet.class
     1398 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SetView.class
     1950 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet$1.class
     2058 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$SubSet.class
     4159 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets$UnmodifiableNavigableSet.class
    17349 Fri Sep 06 15:52:24 NPT 2013 com/google/common/collect/Sets.class

The method is also present when I inspect the jar with javap, as shown below:

    hduser@prayagupd $ javap -classpath target/doctorhere-engine-writer-1.0-jar-with-dependencies.jar com.google.common.collect.Sets | grep newConcurrentHashSet
      public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet();
      public static <E extends java/lang/Object> java.util.Set<E> newConcurrentHashSet(java.lang.Iterable<? extends E>);

I can also see com.google.guava under META-INF/maven when navigating the jar file.

I have the following Guava artifacts in ~/.m2 (on the local filesystem, not in HDFS):

    $ ll ~/.m2/repository/com/google/guava/guava
    total 20
    drwxrwxr-x 5 prayagupd prayagupd 4096 Nov 23 20:05 ./
    drwxrwxr-x 4 prayagupd prayagupd 4096 Nov 23 20:05 ../
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 11.0.2/
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:06 15.0/
    drwxrwxr-x 2 prayagupd prayagupd 4096 Nov 23 20:05 r09/

And the hadoop classpath:

    $ hadoop classpath
    /usr/local/hadoop-2.2.0/etc/hadoop:
    /usr/local/hadoop-2.2.0/share/hadoop/common/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/common/*:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/hdfs/*:
    /usr/local/hadoop-2.2.0/share/hadoop/yarn/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/yarn/*:
    /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
    /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/*:
    /usr/local/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
Dependency tree

As shown below, me.prettyprint:hector-core:jar:1.0-2:compile pulls in com.google.guava:guava:jar:r09:compile, while hadoop-2.2.0 (and hadoop-2.6.0) ships guava-11.0.2.jar and cassandra-2.0.6 ships guava-15.0.jar:

    $ find /usr/local/apache-cassandra-2.0.6/ -name "guava*"
    /usr/local/apache-cassandra-2.0.6/lib/guava-15.0.jar
    /usr/local/apache-cassandra-2.0.6/lib/licenses/guava-15.0.txt

    $ mvn dependency:tree
    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building DoctorhereEngineWriter 1.0
    [INFO] ------------------------------------------------------------------------
    [INFO]
    [INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ doctorhere-engine-writer ---
    [INFO] zazzercode:doctorhere-engine-writer:jar:1.0
    [INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
    [INFO] +- me.prettyprint:hector-core:jar:1.0-2:compile
    [INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
    [INFO] |  +- commons-pool:commons-pool:jar:1.5.3:compile
    [INFO] |  +- com.google.guava:guava:jar:r09:compile
    [INFO] |  +- org.slf4j:slf4j-api:jar:1.6.1:compile
    [INFO] |  +- com.github.stephenc.eaio-uuid:uuid:jar:3.2.0:compile
    [INFO] |  \- com.ecyrd.speed4j:speed4j:jar:0.9:compile
    [INFO] +- org.apache.cassandra:cassandra-all:jar:2.0.7:compile
    [INFO] |  +- org.xerial.snappy:snappy-java:jar:1.0.5:compile
    [INFO] |  +- net.jpountz.lz4:lz4:jar:1.2.0:compile
    [INFO] |  +- com.ning:compress-lzf:jar:0.8.4:compile
    [INFO] |  +- commons-cli:commons-cli:jar:1.1:compile
    [INFO] |  +- commons-codec:commons-codec:jar:1.2:compile
    [INFO] |  +- org.apache.commons:commons-lang3:jar:3.1:compile
    [INFO] |  +- com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3:compile
    [INFO] |  +- org.antlr:antlr:jar:3.2:compile
    [INFO] |  |  \- org.antlr:antlr-runtime:jar:3.2:compile
    [INFO] |  |     \- org.antlr:stringtemplate:jar:3.2:compile
    [INFO] |  |        \- antlr:antlr:jar:2.7.7:compile
    [INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.2:compile
    [INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.2:compile
    [INFO] |  +- jline:jline:jar:1.0:compile
    [INFO] |  +- com.googlecode.json-simple:json-simple:jar:1.1:compile
    [INFO] |  +- com.github.stephenc.high-scale-lib:high-scale-lib:jar:1.1.2:compile
    [INFO] |  +- org.yaml:snakeyaml:jar:1.11:compile
    [INFO] |  +- edu.stanford.ppl:snaptree:jar:0.1:compile
    [INFO] |  +- org.mindrot:jbcrypt:jar:0.3m:compile
    [INFO] |  +- com.yammer.metrics:metrics-core:jar:2.2.0:compile
    [INFO] |  +- com.addthis.metrics:reporter-config:jar:2.1.0:compile
    [INFO] |  |  \- org.hibernate:hibernate-validator:jar:4.3.0.Final:compile
    [INFO] |  |     +- javax.validation:validation-api:jar:1.0.0.GA:compile
    [INFO] |  |     \- org.jboss.logging:jboss-logging:jar:3.1.0.CR2:compile
    [INFO] |  +- com.thinkaurelius.thrift:thrift-server:jar:0.3.3:compile
    [INFO] |  |  \- com.lmax:disruptor:jar:3.0.1:compile
    [INFO] |  +- net.sf.supercsv:super-csv:jar:2.1.0:compile
    [INFO] |  +- log4j:log4j:jar:1.2.16:compile
    [INFO] |  +- com.github.stephenc:jamm:jar:0.2.5:compile
    [INFO] |  \- io.netty:netty:jar:3.6.6.Final:compile
    [INFO] +- org.apache.cassandra:cassandra-thrift:jar:2.0.7:compile
    [INFO] +- org.apache.hadoop:hadoop-client:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
    [INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
    [INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
    [INFO] |  |  +- commons-io:commons-io:jar:2.1:compile
    [INFO] |  |  +- commons-net:commons-net:jar:3.1:compile
    [INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
    [INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
    [INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
    [INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
    [INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
    [INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
    [INFO] |  |  +- org.apache.avro:avro:jar:1.7.4:compile
    [INFO] |  |  |  \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
    [INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
    [INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
    [INFO] |  |     \- org.tukaani:xz:jar:1.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
    [INFO] |  |  \- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.2.0:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.2.0:compile
    [INFO] |  |  |  +- org.apache.hadoop:hadoop-yarn-client:jar:2.2.0:compile
    [INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.2.0:compile
    [INFO] |  |  \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.2.0:compile
    [INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.2.0:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.2.0:compile
    [INFO] |  \- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
    [INFO] \- org.apache.thrift:libthrift:jar:0.7.0:compile
    [INFO]    +- javax.servlet:servlet-api:jar:2.5:compile
    [INFO]    \- org.apache.httpcomponents:httpclient:jar:4.0.1:compile
    [INFO]       \- org.apache.httpcomponents:httpcore:jar:4.0.1:compile
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 27.124s
    [INFO] Finished at: Wed Mar 18 01:39:42 CDT 2015
    [INFO] Final Memory: 15M/982M
    [INFO] ------------------------------------------------------------------------

Here is the hadoop launcher script from Hadoop 2.2.0 (note that it already exports HADOOP_USER_CLASSPATH_FIRST=true):

    $ cat /usr/local/hadoop-2.2.0/bin/hadoop
    #!/usr/bin/env bash

    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.

    # This script runs the hadoop core commands.

    bin=`which $0`
    bin=`dirname ${bin}`
    bin=`cd "$bin"; pwd`

    DEFAULT_LIBEXEC_DIR="$bin"/../libexec
    HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
    . $HADOOP_LIBEXEC_DIR/hadoop-config.sh

    export HADOOP_USER_CLASSPATH_FIRST=true

    function print_usage(){
      echo "Usage: hadoop [--config confdir] COMMAND"
      echo "       where COMMAND is one of:"
      echo "  fs                   run a generic filesystem user client"
      echo "  version              print the version"
      echo "  jar <jar>            run a jar file"
      echo "  checknative [-a|-h]  check native hadoop and compression libraries availability"
      echo "  distcp <srcurl> <desturl> copy file or directories recursively"
      echo "  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive"
      echo "  classpath            prints the class path needed to get the"
      echo "                       Hadoop jar and the required libraries"
      echo "  daemonlog            get/set the log level for each daemon"
      echo " or"
      echo "  CLASSNAME            run the class named CLASSNAME"
      echo ""
      echo "Most commands print help when invoked w/o parameters."
    }

    if [ $# = 0 ]; then
      print_usage
      exit
    fi

    COMMAND=$1
    case $COMMAND in
      # usage flags
      --help|-help|-h)
        print_usage
        exit
        ;;

      #hdfs commands
      namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
        echo "DEPRECATED: Use of this script to execute hdfs command is deprecated." 1>&2
        echo "Instead use the hdfs command for it." 1>&2
        echo "" 1>&2
        #try to locate hdfs and if present, delegate to it.
        shift
        if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
          exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
        elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
          exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
        else
          echo "HADOOP_HDFS_HOME not found!"
          exit 1
        fi
        ;;

      #mapred commands for backwards compatibility
      pipes|job|queue|mrgroups|mradmin|jobtracker|tasktracker)
        echo "DEPRECATED: Use of this script to execute mapred command is deprecated." 1>&2
        echo "Instead use the mapred command for it." 1>&2
        echo "" 1>&2
        #try to locate mapred and if present, delegate to it.
        shift
        if [ -f "${HADOOP_MAPRED_HOME}"/bin/mapred ]; then
          exec "${HADOOP_MAPRED_HOME}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
        elif [ -f "${HADOOP_PREFIX}"/bin/mapred ]; then
          exec "${HADOOP_PREFIX}"/bin/mapred ${COMMAND/mrgroups/groups} "$@"
        else
          echo "HADOOP_MAPRED_HOME not found!"
          exit 1
        fi
        ;;

      classpath)
        echo $CLASSPATH
        exit
        ;;

      #core commands
      *)
        # the core commands
        if [ "$COMMAND" = "fs" ] ; then
          CLASS=org.apache.hadoop.fs.FsShell
        elif [ "$COMMAND" = "version" ] ; then
          CLASS=org.apache.hadoop.util.VersionInfo
        elif [ "$COMMAND" = "jar" ] ; then
          CLASS=org.apache.hadoop.util.RunJar
        elif [ "$COMMAND" = "checknative" ] ; then
          CLASS=org.apache.hadoop.util.NativeLibraryChecker
        elif [ "$COMMAND" = "distcp" ] ; then
          CLASS=org.apache.hadoop.tools.DistCp
          CLASSPATH=${CLASSPATH}:${TOOL_PATH}
        elif [ "$COMMAND" = "daemonlog" ] ; then
          CLASS=org.apache.hadoop.log.LogLevel
        elif [ "$COMMAND" = "archive" ] ; then
          CLASS=org.apache.hadoop.tools.HadoopArchives
          CLASSPATH=${CLASSPATH}:${TOOL_PATH}
        elif [[ "$COMMAND" = -* ]] ; then
          # class and package names cannot begin with a -
          echo "Error: No command named \`$COMMAND' was found. Perhaps you meant \`hadoop ${COMMAND#-}'"
          exit 1
        else
          CLASS=$COMMAND
        fi
        shift

        # Always respect HADOOP_OPTS and HADOOP_CLIENT_OPTS
        HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

        #make sure security appender is turned off
        HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

        export CLASSPATH=$CLASSPATH
        exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
        ;;
    esac

How can I fix this Guava version conflict?

The actual code is here:

    git clone --branch doctor-engine-writer https://github.com/prayagupd/doctorhere
    cd doctorhere/doctorhere-engine-writer

References

Hadoop library conflict at the time of map creation

1 answer

Basically, you are facing a version conflict. The problem is this:

  • Both Hadoop's own runtime libraries and the Cassandra libraries use Google Guava.
  • But your Hadoop version ships an older Guava (11.x), while your Cassandra is newer and uses Guava 15.0. It is not common for enterprise Hadoop deployments to upgrade their environment with every new release.
  • Cassandra's config loader calls the newConcurrentHashSet() method, which does not exist in that older version.
  • The jars used by Hadoop itself are always loaded ahead of any third-party jars. So although the correct version of Guava is present in your jar-with-dependencies, the older Guava jar from the Hadoop classpath was loaded first and distributed to your mappers/reducers. You can verify this at runtime with the sketch below.
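
To confirm which Guava actually wins at runtime, here is a minimal diagnostic sketch (the class name is hypothetical; the same one-liner can be dropped into a mapper's or reducer's setup() method):

    import com.google.common.collect.Sets;

    public class GuavaProbe {
        public static void main(String[] args) {
            // Prints the location of the jar that the Sets class was loaded from.
            // In an affected map/reduce task this points at Hadoop's
            // guava-11.0.2.jar rather than your jar-with-dependencies.
            System.out.println(
                    Sets.class.getProtectionDomain().getCodeSource().getLocation());
        }
    }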

Solution:

  • Set the configuration parameter "mapreduce.job.user.classpath.first" to true in your job driver before submitting the job:

     job.getConfiguration().set("mapreduce.job.user.classpath.first", "true"); 
  • Alternatively, add the following line to your bin/hadoop script; it tells Hadoop to load user-defined libraries first:

        export HADOOP_USER_CLASSPATH_FIRST=true
  • Make sure the newer version of the library appears on your hadoop classpath ahead of the older one. A driver sketch combining these settings is shown after this list.
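
For illustration, a minimal driver sketch with the property set before submission; only the main class name zazzercode.DiseaseCountJob comes from the POM, the job name and the commented-out mapper/reducer wiring are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class DiseaseCountJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "disease-count");
            job.setJarByClass(DiseaseCountJob.class);

            // Prefer the Guava 15.0 bundled in the jar-with-dependencies over
            // Hadoop's own guava-11.0.2.jar inside the task JVMs.
            job.getConfiguration().set("mapreduce.job.user.classpath.first", "true");

            // Hypothetical wiring -- replace with the real mapper/reducer and the
            // Cassandra BulkOutputFormat configuration from the project.
            // job.setMapperClass(DiseaseCountMapper.class);
            // job.setReducerClass(DiseaseCountReducer.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }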