MongoDB Connector for Hadoop

The MongoDB Connector for Hadoop is a plugin for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.

License	The Apache Software License, Version 2.0
Categories	MongoDB Data Databases
GroupId	org.mongodb
ArtifactId	mongo-hadoop-core
Last Version	1.3.0
Release Date	27-Jun-2014
Type	jar
Description	The MongoDB Connector for Hadoop is a plugin for Hadoop that provides the ability to use MongoDB as an input source and/or an output destination.
Project URL	https://github.com/mongodb/mongo-hadoop.git
Source Code Management	https://github.com/mongodb/mongo-hadoop.git

Download mongo-hadoop-core

Filename	Size
mongo-hadoop-core-1.3.0.pom
mongo-hadoop-core-1.3.0.jar	33 KB
mongo-hadoop-core-1.3.0-sources.jar	29 KB
mongo-hadoop-core-1.3.0-javadoc.jar	296 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.mongodb/mongo-hadoop-core/ -->
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>1.3.0</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.mongodb/mongo-hadoop-core/
implementation 'org.mongodb:mongo-hadoop-core:1.3.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.mongodb/mongo-hadoop-core/
implementation ("org.mongodb:mongo-hadoop-core:1.3.0")

Apache Buildr

'org.mongodb:mongo-hadoop-core:jar:1.3.0'

Apache Ivy

<dependency org="org.mongodb" name="mongo-hadoop-core" rev="1.3.0">
  <artifact name="mongo-hadoop-core" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='org.mongodb', module='mongo-hadoop-core', version='1.3.0')
)

Scala SBT

libraryDependencies += "org.mongodb" % "mongo-hadoop-core" % "1.3.0"

Leiningen

[org.mongodb/mongo-hadoop-core "1.3.0"]

Dependencies

compile (7)

Group / Artifact	Type	Version
org.apache.hadoop : hadoop-mapreduce-client-app	jar	2.4.0
org.apache.hadoop : hadoop-mapreduce-client-core	jar	2.4.0
org.apache.hadoop : hadoop-mapreduce-client-jobclient	jar	2.4.0
org.mongodb : mongo-java-driver	jar	2.12.2
org.apache.hadoop : hadoop-mapreduce-client-shuffle	jar	2.4.0
org.apache.hadoop : hadoop-common	jar	2.4.0
org.apache.hadoop : hadoop-mapreduce-client-common	jar	2.4.0

test (10)

Group / Artifact	Type	Version
org.apache.hadoop : hadoop-hdfs	jar	2.4.0
org.apache.hadoop : hadoop-hdfs	jar	2.4.0
org.apache.hadoop : hadoop-yarn-server-tests	jar	2.4.0
org.zeroturnaround : zt-exec	jar	1.6
com.jayway.awaitility : awaitility	jar	1.6.0
org.apache.hadoop : hadoop-mapreduce-client-jobclient	jar	2.4.0
commons-daemon : commons-daemon	jar	1.0.15
org.apache.hadoop : hadoop-common	jar	2.4.0
junit : junit	jar	4.11
org.hamcrest : hamcrest-all	jar	1.3

Project Modules

There are no modules declared in this project.

MongoDB Connector for Hadoop

Purpose

The MongoDB Connector for Hadoop is a library that allows MongoDB (or backup files in its data format, BSON) to be used as an input source or output destination for Hadoop MapReduce tasks. It is designed to allow greater flexibility and performance, and to make it easy to integrate data in MongoDB with other parts of the Hadoop ecosystem, such as Pig, Hive, Spark, and Hadoop Streaming.

Check out the releases page for the latest stable release.

Features

Can create data splits to read from standalone, replica set, or sharded configurations
Source data can be filtered with queries using the MongoDB query language
Supports Hadoop Streaming, which allows job code to be written in other languages (Python, Ruby, and Node.js are currently supported)
Can read data from MongoDB backup files residing on S3, HDFS, or local filesystems
Can write data out in .bson format, which can then be imported to any MongoDB database with mongorestore
Works with BSON/MongoDB documents in other Hadoop tools such as Pig and Hive.
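As a rough illustration of the input, output, and query-filter features above, the sketch below assembles -D options for a Hadoop job. The property names mongo.input.uri, mongo.input.query, and mongo.output.uri are assumptions based on the connector's commonly documented configuration keys, so check them against the wiki for your version. The host, database, and collection names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: assemble -D options for a Hadoop job that reads from and writes to MongoDB.
class MongoJobOptions {

    // Render each property as a "-Dkey=value" argument, in insertion order.
    static String toHadoopArgs(Map<String, String> props) {
        StringJoiner args = new StringJoiner(" ");
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args.toString();
    }

    static Map<String, String> exampleProps() {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumed property names; the URIs below point at hypothetical collections.
        props.put("mongo.input.uri", "mongodb://localhost:27017/demo.events");
        // Filter the source data with a query in the MongoDB query language.
        props.put("mongo.input.query", "{\"status\":\"active\"}");
        props.put("mongo.output.uri", "mongodb://localhost:27017/demo.results");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(toHadoopArgs(exampleProps()));
    }
}
```

When these options are passed on a real command line, the JSON query needs to be quoted for the shell.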

Download

The best way to install the Hadoop connector is through a dependency management system like Maven:

<dependency>
    <groupId>org.mongodb.mongo-hadoop</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>1.5.1</version>
</dependency>

or Gradle:

compile 'org.mongodb.mongo-hadoop:mongo-hadoop-core:1.5.1'

You can also download the jar files yourself from the Maven Central Repository.

New releases are announced on the releases page.

Requirements

Version Compatibility

These are the minimum versions tested with the Hadoop connector. Earlier versions may work, but haven't been tested.

Hadoop 1.X: 1.2
Hadoop 2.X: 2.4
Hive: 1.1
Pig: 0.11
Spark: 1.4
MongoDB: 2.2

Dependencies

You must have at least version 3.0.0 of the MongoDB Java Driver installed in order to use the Hadoop connector.

Building

Run ./gradlew jar to build the jars. The jars for each module are placed in that module's build/libs directory. For example, the jar for the core module is generated in core/build/libs.

The Hadoop connector will build against the versions of Hadoop, Hive, Pig, etc. as specified in build.gradle.

After a successful build, you must copy the jars to the lib directory on each node in your Hadoop cluster. This is usually one of the following locations, depending on which Hadoop release you are using:

$HADOOP_PREFIX/lib/
$HADOOP_PREFIX/share/hadoop/mapreduce/
$HADOOP_PREFIX/share/hadoop/lib/
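The copy step for a single node can be sketched as follows. This is an illustrative helper, not part of the connector: the class name and arguments are hypothetical, and it assumes you have already chosen the lib directory that applies to your Hadoop release. It would need to be run on, or against, every node in the cluster.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch: copy the built connector jars into a Hadoop lib directory on one node.
class CopyConnectorJars {

    // Copies every *.jar in buildLibs into hadoopLib, replacing older copies.
    // Returns the number of jars copied.
    static int copyJars(Path buildLibs, Path hadoopLib) throws IOException {
        // Fail early rather than silently creating a directory Hadoop never reads.
        if (!Files.isDirectory(hadoopLib)) {
            throw new NotDirectoryException(hadoopLib.toString());
        }
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(buildLibs, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, hadoopLib.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CopyConnectorJars <build/libs dir> <hadoop lib dir>");
            return;
        }
        int n = copyJars(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(n + " jar(s) copied");
    }
}
```

For example, the first argument could be core/build/libs and the second whichever of the directories listed above exists on the node.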

mongo-hadoop should work on any distribution of Hadoop. If you run into an issue, please file a Jira ticket.

Documentation

For full documentation, please check out the Hadoop Connector Wiki. The documentation includes installation instructions, configuration options, as well as specific instructions and examples for each Hadoop application the connector supports.

Usage with Amazon Elastic MapReduce

Amazon Elastic MapReduce is a managed Hadoop framework that allows you to submit jobs to a cluster of customizable size and configuration, without needing to deal with provisioning nodes and installing software.

Using EMR with the MongoDB Connector for Hadoop allows you to run MapReduce jobs against MongoDB backup files stored in S3.

To submit jobs that use the MongoDB Connector for Hadoop to EMR, the bootstrap actions must fetch the dependencies (the MongoDB Java driver, the mongo-hadoop-core libraries, etc.) and place them into the Hadoop distribution's lib folders.

For a full example (running the Enron example on Elastic MapReduce), please see here.

Notes for Contributors

If your code introduces new features, add tests that cover them if possible, and make sure that ./gradlew check still passes. For instructions on how to run the tests, see the Running the Tests section in the wiki. If you're not sure how to write a test for a feature, or have trouble with a test failure, please post on the Google Group with details and we will try to help. Note: Until FindBugs updates its dependencies, running ./gradlew check on Java 8 will fail.

mongodb

Versions

Version
1.3.0 27-Jun-2014
1.0.0 10-Apr-2012
1.0.0-rc0 13-Feb-2012
