Spark-Redis
A library for reading and writing data in Redis using Apache Spark.
Spark-Redis provides access to all of Redis' data structures (String, Hash, List, Set and Sorted Set) from Spark as RDDs. It also supports reading and writing with DataFrames and Spark SQL syntax.
The library works with both stand-alone and clustered Redis databases. When used with Redis Cluster, Spark-Redis is aware of the cluster's partitioning scheme and adjusts in response to resharding and node failure events.
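For context on the partitioning scheme: the Redis Cluster specification assigns every key to one of 16384 hash slots using CRC16(key) mod 16384. If the key contains a non-empty "hash tag" (the text between the first { and the next }), only that tag is hashed. Each node owns a set of slots, and resharding moves slots between nodes. The Python sketch below illustrates only that rule; it is not Spark-Redis code, the names crc16_xmodem and hash_slot are hypothetical, and the library's actual implementation may differ.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str, num_slots: int = 16384) -> int:
    """Map a key to a Redis Cluster hash slot, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the tag only if a closing brace exists and the tag is non-empty;
        # otherwise the whole key is hashed.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % num_slots


print(hash_slot("123456789"))  # prints 12739 (CRC16 check value 0x31C3)
```

Because "{user1000}.following" and "{user1000}.followers" share the hash tag "user1000", both keys map to the same slot and therefore live on the same node.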
Spark-Redis also supports Spark Streaming (DStreams) and Structured Streaming.
Version compatibility and branching
The library has several branches, each corresponding to a supported Spark version. For example, 'branch-2.3' works with any Spark 2.3.x version. The master branch contains the latest development work for the next release.
Spark-Redis | Spark | Redis | Supported Scala Versions |
---|---|---|---|
2.4 | 2.4.x | >=2.9.0 | 2.11, 2.12 |
2.3 | 2.3.x | >=2.9.0 | 2.11 |
1.4 | 1.4.x | | 2.10 |
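The table can be read as a lookup from your Spark version (major.minor) and Scala version to a Spark-Redis branch. The sketch below is a hypothetical helper, not part of Spark-Redis; it encodes only the rows of the table and assumes every branch follows the 'branch-X.Y' naming shown for 'branch-2.3'.

```python
# Rows transcribed from the compatibility table:
# Spark-Redis line -> supported Scala versions.
# Each Spark-Redis line targets the Spark line with the same major.minor number.
SCALA_VERSIONS = {
    "2.4": ("2.11", "2.12"),
    "2.3": ("2.11",),
    "1.4": ("2.10",),
}


def pick_branch(spark_version: str, scala_version: str) -> str:
    """Return the assumed branch name for a Spark x.y.z version and Scala version."""
    spark_line = ".".join(spark_version.split(".")[:2])  # "2.4.5" -> "2.4"
    if spark_line not in SCALA_VERSIONS:
        raise ValueError(f"no Spark-Redis branch listed for Spark {spark_version}")
    if scala_version not in SCALA_VERSIONS[spark_line]:
        raise ValueError(
            f"Spark-Redis {spark_line} is not listed for Scala {scala_version}"
        )
    # Assumed naming pattern, based on the 'branch-2.3' example.
    return "branch-" + spark_line


print(pick_branch("2.4.5", "2.12"))  # prints branch-2.4
```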
Known limitations
- Java, Python and R API bindings are not provided at this time
Additional considerations
This library is a work in progress, so the API may change before the official release.
Documentation
Please make sure you use the documentation from the branch that matches your Spark-Redis version (2.4, 2.3, etc.).
- Getting Started
- RDD
- DataFrame
- Streaming
- Structured Streaming
- Cluster
- Java
- Python
- Configuration
- Dev environment
Contributing
You're encouraged to contribute to the Spark-Redis project.
There are two ways you can do so:
Submit Issues
If you encounter an issue while using the library, please report it via the project's issue tracker.
Author Pull Requests
Code contributions to the Spark-Redis project can be made using pull requests. To submit a pull request:
- Fork this project.
- Make and commit your changes.
- Submit your changes as a pull request.