Notice

OTBV file format specification

Table of contents


1. Rationale

The OTBV file format is designed to solve a specific problem: the I/O bottleneck in cloud computing pipelines, especially those that use synthetic datasets, which often exceed hundreds of gigabytes in size.
OTBV compresses the volume losslessly by encoding it as an octree. The file size, and consequently the encoding and decoding speed, depend on the underlying structure of the volume.
Because the main goal is to optimise I/O operations, the encoder and decoder specifications include recommendations for minimising I/O calls.
While encoding/decoding performance was taken into consideration, the computational power of an average modern cloud environment reduces the benefit of optimising speed. Thus, this document does not give recommendations on the implementation of the encoding/decoding algorithms.

2. Data representation

Numeric values are stored with bytes in network order (MSB first).
All integers are unsigned, unless specified otherwise.
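
As an illustration of the byte order, the following minimal Python sketch converts an unsigned integer to and from network order. The helper names encode_u32 and decode_u32 are hypothetical, and the 4-byte width is chosen only because the Metadata fields described later are 4 bytes long.

```python
def encode_u32(value: int) -> bytes:
    """Serialise an unsigned 32-bit integer with the most significant byte first."""
    return value.to_bytes(4, "big")


def decode_u32(raw: bytes) -> int:
    """Read an unsigned 32-bit integer stored in network order."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(raw, "big")
```

For example, the value 256 is stored as the bytes 00 00 01 00.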

2.1 Octree encoding

The data part that represents the octree is encoded with the following pattern.
If there are no Internal Nodes, the volume is homogeneous.
If the volume is not homogeneous, the first node is always an Internal Node. The termination of the last child of the root Internal Node marks the end of the Data chunk.

2.2 Volume encoding ordering


3. File structure

The OTBV file format contains the following chunks, in order:
  1. Signature
  2. Metadata
  3. Data
The lengths of the Signature and the Metadata chunks are constant. The length of the Data chunk is stored in the Metadata chunk.

3.1 Signature

The first 5 bytes of the file make up the signature, which identifies the OTBV file format. All OTBV files must have a valid signature.
The signature bytes are the same for all files:
      HEX    4f 54 42 56 96
      ASCII  O  T  B  V  \226
    
The first 4 bytes name the file type. The 5th byte is a non-ASCII character, which prevents text files that start with the letters "OTBV" from being misidentified as OTBV files.
If a decoder fails to validate the signature, the file should not be read further and the user should be notified that the file is malformed.
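
The signature check can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the names OTBV_SIGNATURE, MalformedFileError, and check_signature are hypothetical and not defined by this specification.

```python
# The five signature bytes listed above: "OTBV" followed by 0x96.
OTBV_SIGNATURE = bytes([0x4F, 0x54, 0x42, 0x56, 0x96])


class MalformedFileError(ValueError):
    """Raised when the input does not start with the OTBV signature."""


def check_signature(data: bytes) -> None:
    """Raise MalformedFileError unless data begins with the OTBV signature."""
    if data[:len(OTBV_SIGNATURE)] != OTBV_SIGNATURE:
        raise MalformedFileError("not an OTBV file: signature mismatch")
```

A decoder would call check_signature on the first 5 bytes it reads and stop reading if the error is raised.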

3.2 Metadata

The Metadata chunk stores additional data needed to read the Data chunk.
Bits are numbered from the most significant bit, so bit 1 has the value 128 and bit 8 has the value 1.
Bits 1-3 of the first byte (values 128, 64, and 32) store the number of padding bits at the start of the Data chunk.
Bit 4 of the first byte (value 16) indicates whether the volume was padded to a cube when encoding.
Bits 5-8 of the first byte are reserved for custom flags.
Bytes 2-5 store the edge length (resolution) of the volume. If bit 4 of the first byte is set, this is the X resolution.
Bytes 6-9 and 10-13 store the Y and Z resolution respectively (if bit 4 of the first byte is not set, these should be 0).
Bytes 14-17 store the length (in bytes) of the Data chunk.
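
The layout above amounts to a 17-byte chunk: one flag byte followed by four 32-bit integers. A minimal Python sketch of parsing it is shown below. The names Metadata and parse_metadata are hypothetical, and the sketch assumes that bit 4 corresponds to the value 16, which follows from the first 3 bits having the values 128, 64, and 32.

```python
import struct
from typing import NamedTuple

METADATA_LENGTH = 17  # 1 flag byte + four 4-byte integers


class Metadata(NamedTuple):
    padding_bits: int    # bits 1-3 of the first byte
    padded_to_cube: bool  # bit 4 of the first byte (assumed value 16)
    custom_flags: int    # bits 5-8 of the first byte
    x: int               # bytes 2-5
    y: int               # bytes 6-9
    z: int               # bytes 10-13
    data_length: int     # bytes 14-17, length of the Data chunk in bytes


def parse_metadata(chunk: bytes) -> Metadata:
    """Split a 17-byte Metadata chunk into its fields."""
    if len(chunk) != METADATA_LENGTH:
        raise ValueError("Metadata chunk must be exactly 17 bytes")
    # ">" selects network order with no alignment padding.
    flags, x, y, z, data_length = struct.unpack(">BIIII", chunk)
    return Metadata(
        padding_bits=flags >> 5,
        padded_to_cube=bool(flags & 0x10),
        custom_flags=flags & 0x0F,
        x=x,
        y=y,
        z=z,
        data_length=data_length,
    )
```

Since the Signature and Metadata chunks have constant lengths, a decoder could read both (22 bytes in total) with a single I/O call before reading the Data chunk.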

3.3 Data

The Data chunk stores the binary representation of the octree that encodes the volume.

4. Encoder implementation guidelines

5. Decoder implementation guidelines

5.1 Decoding the resolution

If bit 4 of the first byte of the Metadata chunk is not set, the volume should be read as cubic. In this case, bytes 2-5 of the Metadata chunk store the edge length of the volume. If that number is X, the volume should be interpreted as having a resolution of X*X*X.
If bit 4 is set, bytes 2-5 store the real resolution in the X dimension, and bytes 6-9 and 10-13 store the real Y and Z resolution respectively. The real resolution of the volume is X*Y*Z. The encoded volume has a resolution of N*N*N, where N is the smallest power of 2 that is larger than X, Y, and Z. The decoder should interpret the data in the file as having a resolution of N*N*N, then trim the resulting volume to X*Y*Z.
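
The resolution rules above can be sketched in Python as follows. The function names are hypothetical. The sketch also makes one assumption that the text does not settle: by default it reads "larger than" as "at least as large as", so a volume whose longest edge is already a power of 2 is not enlarged. Passing strict=True gives the literal reading instead.

```python
def real_resolution(bit4_set: bool, x: int, y: int, z: int) -> tuple:
    """Return the real (X, Y, Z) resolution described by the Metadata fields."""
    if bit4_set:
        return (x, y, z)
    # Cubic volume: only the first field is meaningful.
    return (x, x, x)


def encoded_edge_length(x: int, y: int, z: int, strict: bool = False) -> int:
    """Return N, the edge length of the padded cube holding an X*Y*Z volume.

    strict=False (assumption): smallest power of 2 that is >= max(X, Y, Z).
    strict=True (literal reading): smallest power of 2 that is > max(X, Y, Z).
    """
    longest = max(x, y, z)
    if longest < 1:
        raise ValueError("resolution must be positive")
    if strict:
        return 1 << longest.bit_length()
    return 1 << (longest - 1).bit_length()
```

For example, a volume of 100*60*30 is decoded as a 128*128*128 cube and then trimmed back to 100*60*30.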