Hadoop Maven Dependencies: MiniDFSCluster & MiniMRCluster

I want to set up a Maven project that lets me unit test Hadoop MapReduce jobs. My biggest problem is defining the Maven dependencies needed to use the test classes MiniDFSCluster and MiniMRCluster.

I am using Hadoop 2.4.1. Any ideas?

2 answers

I think I figured it out. In the Maven pom.xml file, first add a new repository:

<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

Then add the following dependencies to the project:

<dependency>
  <groupId>commons-io</groupId>
  <artifactId>commons-io</artifactId>
  <version>2.1</version>
</dependency>
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-auth</artifactId>
  <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-test</artifactId>
  <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.0.0-cdh4.3.0</version>
  <classifier>tests</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.0.0-cdh4.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.0.0-cdh4.3.0</version>
  <classifier>tests</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>2.0.0-mr1-cdh4.3.0</version>
</dependency>

If anyone is interested in the whole project (a unit test for the famous WordCount MapReduce job), I'm happy to share it.


If someone else is looking for an answer:

MiniMRCluster is now deprecated.

You can get both MiniDFSCluster and MiniMRCluster through a single dependency (shown here for Gradle):

 compile group: 'org.apache.hadoop', name: 'hadoop-minicluster', version: '2.7.2' 

That dependency is basically just a POM file that lists the dependencies in this package. For those who want to see where the class actually lives, MiniDFSCluster is in the hadoop-hdfs:tests artifact.

You do not need to use dependencies from the Cloudera repository.
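
Since the question asks about Maven rather than Gradle, here is a rough sketch of how the same hadoop-minicluster coordinates might look in a pom.xml. The `<scope>test</scope>` element is my assumption (the mini clusters are normally only needed by tests), and the version is simply the one from the Gradle line; you would probably want to match it to the Hadoop version you actually run.

```xml
<!-- Goes inside the <dependencies> section of pom.xml -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <!-- Version taken from the Gradle example; adjust to your Hadoop release -->
  <version>2.7.2</version>
  <!-- Assumption: only test code uses MiniDFSCluster / MiniMRCluster -->
  <scope>test</scope>
</dependency>
```

This artifact is published under the org.apache.hadoop group on Maven Central, so no extra `<repositories>` entry should be required for it.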


Source: https://habr.com/ru/post/971707/