śiva format शिव for the JVM 
This library is a Java implementation of the siva format. It is intended to be used with any JVM language. The reference implementation is written in Go.
This Java library offers an API to read and unpack siva files, but it cannot write them yet.
Usage
siva-java is available on Maven Central. To include it as a dependency in a project managed by sbt, add the following to your build.sbt file:
libraryDependencies += "tech.sourced" % "siva-java" % "[version]"
If you use Maven to manage your dependencies, add the dependency to your pom.xml:
<dependency>
    <groupId>tech.sourced</groupId>
    <artifactId>siva-java</artifactId>
    <version>[version]</version>
</dependency>
If you use gradle to manage your dependencies, add the following to your build.gradle file in the dependencies section:
compile 'tech.sourced:siva-java:[version]'
In all cases, replace [version] with the latest siva-java version.
Example of Usage
package com.github.mcarmonaa.sivaexample;

import org.apache.commons.io.FileUtils;
import tech.sourced.siva.IndexEntry;
import tech.sourced.siva.SivaReader;

import java.io.File;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main {
    private static final String SIVA_DIR = "/tmp/siva-files/";
    private static final String SIVA_UNPACKED_DIR = "/tmp/siva-unpacked/";
    private static final String DEFAULT_SIVA_FILE =
            SIVA_DIR + "/aac052c42c501abf6aa8c3509424e837bb27e188.siva";
    private static final Logger LOGGER = Logger.getLogger(Main.class.getName());

    public static void main(String[] args) {
        LOGGER.log(Level.INFO, "unpacking siva-file");

        // SivaReader is used in try-with-resources so it is closed on exit.
        try (SivaReader sivaReader = new SivaReader(new File(DEFAULT_SIVA_FILE))) {
            List<IndexEntry> index = sivaReader.getIndex().getFilteredIndex().getEntries();
            for (IndexEntry indexEntry : index) {
                // Copy each entry's content to a file under SIVA_UNPACKED_DIR.
                InputStream entry = sivaReader.getEntry(indexEntry);
                Path outPath = Paths.get(SIVA_UNPACKED_DIR.concat(indexEntry.getName()));
                FileUtils.copyInputStreamToFile(entry, new File(outPath.toString()));
            }
        } catch (Exception ex) {
            LOGGER.log(Level.SEVERE, ex.toString(), ex);
        }
    }
}
Development
Build
To build the project and generate a jar file:
make build
The jar file is written to ./target/siva-java-[version].jar, where [version] is the version specified in build.sbt.
Tests
Just run:
make test
Clean
To clean the project:
make clean
Limitations
This section lists known limitations and implementation divergences with respect to the siva reference specification.
All the issues described below relate to the index part of the blocks, since that is where siva places the metadata. Most of the meta-information is encoded as unsigned values, and most of the problems come from the lack of unsigned types in the JVM.
In some cases, casting to a larger number type and applying a binary AND with a mask works around the problem. The trick works as follows:
unsigned int8 (byte in Go): 255
If you read this byte in Java, the value is interpreted as signed, so the same bits in Java result in:
signed int8 (byte in Java): -1
Casting this value to a Java int keeps the value as -1, so we apply a binary mask with the least significant byte set to all ones and the remaining bytes set to zeros:
byte b = readByte();  // readByte() stands for any byte read; the bits for 255 are read, but in Java the value is -1
int mask = 0x000000FF;
int n = b & mask;     // b is widened to int first, so n is an int storing the value 255
This procedure works because the JVM encodes integer values using two's complement, and it applies to every type that can be cast to a larger number type.
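The same masking idea can be written once per width. The following is a minimal sketch, not code from siva-java: the class name UnsignedWidening and its method names are made up for illustration. Since Java 8 the standard library offers equivalent methods (Byte.toUnsignedInt, Short.toUnsignedInt, Integer.toUnsignedLong).

```java
// Illustrative helper; the class and method names are NOT part of the siva-java API.
final class UnsignedWidening {

    // uint8 stored in a Java byte -> int in [0, 255]
    static int uint8(byte b) {
        return b & 0xFF;
    }

    // uint16 stored in a Java short -> int in [0, 65535]
    static int uint16(short s) {
        return s & 0xFFFF;
    }

    // uint32 stored in a Java int -> long in [0, 4294967295]
    static long uint32(int i) {
        // The mask must be a long literal, otherwise the AND is done in 32 bits.
        return i & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(b);          // -1
        System.out.println(uint8(b));   // 255
        System.out.println(uint32(-1)); // 4294967295
        System.out.println(Byte.toUnsignedInt(b) == uint8(b)); // true
    }
}
```

Note the L suffix on the 32-bit mask: without it the literal is an int equal to -1, and the AND would leave the negative value unchanged.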
Unsigned Integer 64 Limitation!: in a siva file, the fields that the specification encodes as uint64 can hold values in the range [0, 2^64-1], while the Java implementation only supports values in the range [0, 2^63-1]. Java has no primitive integer type larger than a long (int64), so this implementation cannot avoid the limitation.
The parts of the index affected by these issues are listed below:
- Index Signature: The reference specification says that a sequence of three bytes (IBA) is used as the signature, but in the reference implementation in Go a byte is a uint8 while in Java a byte is an int8. The current Java implementation does not need to handle this, since all three bytes have values less than 127 and are therefore read properly.
- Index Entry:
  - UNIX mode: encoded as a uint32, so the Java implementation casts it to a long.
  - The offset of the file content, relative to the beginning of the block: this is a uint64 value, so the implementation just reads it as a long and checks that it is not negative. Unsigned Integer 64 Limitation!
  - Size of the file content: encoded as a uint64; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - CRC32: uint32 value cast to a Java long.
  - Flags: uint32 value; it is read without a cast since it can only contain the values 0 (No Flags) or 1 (Deleted).
- Index Footer:
  - Number of entries in the block: uint32 value cast to a Java long.
  - Index size in bytes: uint64 value that can't be cast; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - Block size in bytes: uint64 value that can't be cast; the implementation checks that it is not negative. Unsigned Integer 64 Limitation!
  - CRC32: uint32 value cast to a Java long.
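As an illustration of how the two techniques combine, here is a rough sketch that decodes the four Index Footer fields from a byte array. It is not the siva-java implementation: the class FooterSketch and its members are hypothetical, and the field order (taken from the list above) and big-endian byte order are assumptions to be checked against the siva specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical holder for the footer fields; NOT the class used by siva-java.
final class FooterSketch {
    final long entries;   // uint32 widened to long
    final long indexSize; // uint64, only [0, 2^63-1] is representable
    final long blockSize; // uint64, only [0, 2^63-1] is representable
    final long crc32;     // uint32 widened to long

    private FooterSketch(long entries, long indexSize, long blockSize, long crc32) {
        this.entries = entries;
        this.indexSize = indexSize;
        this.blockSize = blockSize;
        this.crc32 = crc32;
    }

    // Assumption: 4 + 8 + 8 + 4 bytes, big-endian, in the order listed above.
    // A shorter array makes ByteBuffer throw BufferUnderflowException.
    static FooterSketch parse(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.BIG_ENDIAN);
        long entries = buf.getInt() & 0xFFFFFFFFL;
        long indexSize = requireNonNegative(buf.getLong(), "index size");
        long blockSize = requireNonNegative(buf.getLong(), "block size");
        long crc32 = buf.getInt() & 0xFFFFFFFFL;
        return new FooterSketch(entries, indexSize, blockSize, crc32);
    }

    // A negative long here means the stored uint64 was larger than 2^63-1.
    private static long requireNonNegative(long value, String field) {
        if (value < 0) {
            throw new IllegalArgumentException(field + " does not fit in a signed 64-bit long");
        }
        return value;
    }
}
```

The uint32 fields never lose information because a long can hold every value up to 2^32-1. The uint64 fields are only validated, which is why values above 2^63-1 are rejected rather than converted.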
Other comments: This Java implementation verifies the integrity of the index using the CRC in the Index Footer. Checking the integrity of the files themselves is optional and is left to the clients of this library, using the CRC kept in each Index Entry.
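A client-side check of an entry's content could look roughly like the sketch below. It assumes that the CRC stored in the Index Entry uses the same CRC-32 variant as java.util.zip.CRC32, and it takes the expected value as a plain long because the accessor for it on IndexEntry is not shown in this document. The class name EntryChecksum is illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative helper; NOT part of the siva-java API.
final class EntryChecksum {

    // Reads the stream to its end and returns its CRC-32 as a value in [0, 2^32-1].
    static long crc32Of(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            crc.update(buffer, 0, read);
        }
        return crc.getValue();
    }

    // expectedCrc is the uint32 from the Index Entry, already widened to a long.
    static boolean matches(InputStream in, long expectedCrc) throws IOException {
        return crc32Of(in) == expectedCrc;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        long crc = crc32Of(new ByteArrayInputStream(data));
        System.out.println(Long.toHexString(crc)); // cbf43926
    }
}
```

In the usage example, the InputStream returned by sivaReader.getEntry(indexEntry) could be passed to matches. The check consumes the stream, so the entry would have to be read again to unpack it.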
License
See LICENSE.