OTBV file format specification
Table of contents
- 1. Rationale
- 2. Data representation
- 3. File structure
- 4. Encoder implementation guidelines
- 5. Decoder implementation guidelines
1. Rationale
The OTBV file format is designed to solve a highly specific problem: the I/O bottleneck in cloud computing pipelines, especially those that use synthetic datasets, which often exceed hundreds of gigabytes in size. OTBV utilises lossless compression by encoding the volume into an octree. The file's size, and consequently the encoding and decoding speed, depend on the underlying structure of the volume.
Because the main goal is to optimise I/O operations, the encoder and decoder guidelines include recommendations for minimising I/O calls.
While encoding/decoding performance was taken into consideration, the computational power of an average modern cloud environment reduces the benefit of optimising speed. Thus, this document does not outline recommendations on the implementation of encoding/decoding algorithms.
2. Data representation
Numeric values are stored with bytes in network order (MSB first). All integers are unsigned, unless specified otherwise.
2.1 Octree encoding
The data part that represents the octree is encoded with the following pattern.
- A Leaf Node is represented by 2 bits. The first bit is always "0", and the second bit represents the value of the Leaf Node (either "0" or "1").
- An Internal Node is represented by a single "1" bit. The data following an Internal Node represents the first child (see Volume encoding ordering), until the child's termination, after which follows the second child's data, and so on, until the last child terminates.
If the volume is not homogeneous, the first node is always an Internal Node. The termination of the last child of the root Internal Node marks the end of the Data chunk.
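The pattern above can be sketched as a pair of recursive routines. This is only an illustration, not a reference implementation: the names `encode_octree` and `decode_octree`, the in-memory representation (a leaf is the integer 0 or 1, an Internal Node is a list of 8 children), and the use of a plain list of bits instead of packed bytes are all assumptions made for the example. The spatial meaning of each child position is not modelled here.

```python
from typing import Iterable, Iterator, List, Union

# Hypothetical in-memory form: a leaf is the int 0 or 1,
# an Internal Node is a list of exactly 8 child nodes.
Node = Union[int, List["Node"]]


def encode_octree(node: Node) -> List[int]:
    """Serialise a node into a list of bits (each 0 or 1)."""
    if isinstance(node, int):
        return [0, node]          # Leaf Node: "0" marker, then the value bit
    if len(node) != 8:
        raise ValueError("an Internal Node must have exactly 8 children")
    bits = [1]                    # Internal Node: a single "1" marker
    for child in node:            # children follow one after another
        bits.extend(encode_octree(child))
    return bits


def decode_octree(bits: Iterable[int]) -> Node:
    """Rebuild the nested structure from a stream of bits."""
    stream: Iterator[int] = iter(bits)

    def read_node() -> Node:
        if next(stream) == 0:
            return next(stream)   # Leaf Node: the next bit is its value
        return [read_node() for _ in range(8)]

    return read_node()


# A root with seven leaf children and one nested Internal Node.
tree = [0, 1, 0, 0, 0, 0, 0, [1, 1, 1, 1, 0, 0, 0, 0]]
bits = encode_octree(tree)
assert decode_octree(bits) == tree
```

A homogeneous volume is the degenerate case: the whole octree is a single Leaf Node, so `encode_octree(1)` yields just the two bits `[0, 1]`.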
2.2 Volume encoding ordering
3. File structure
The OTBV file format contains the following chunks, in order:
- Signature
- Metadata
- Data
The lengths of the Signature and the Metadata chunks are constant. The length of the Data chunk is stored in the Metadata chunk.
3.1 Signature
The first 5 bytes of the file compose the signature, which identifies the OTBV file format. All OTBV files must have a valid signature. The signature bytes are the same for all files:
HEX 4f 54 42 56 96
ASCII O T B V \226
The first 4 bytes name the file type. The 5th byte is a non-ASCII character,
to prevent misidentification of text files that start with letters "OTBV" as
OTBV files.
If a decoder fails to validate the signature, the file should not be read further and the user should be notified that the file is malformed.
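A signature check along these lines could look as follows. This is a minimal sketch: the function name `read_signature` and the choice of raising `ValueError` to notify the caller are assumptions for the example, not part of the format.

```python
import io

# The 5 signature bytes: "O", "T", "B", "V", then the non-ASCII byte 0x96.
OTBV_SIGNATURE = bytes([0x4F, 0x54, 0x42, 0x56, 0x96])


def read_signature(stream) -> None:
    """Consume the signature from a binary stream, or raise if it is invalid."""
    header = stream.read(len(OTBV_SIGNATURE))
    if header != OTBV_SIGNATURE:
        # Stop here: nothing after an invalid signature should be read.
        raise ValueError("malformed OTBV file: invalid signature")


# A stream that starts with the signature passes the check silently.
read_signature(io.BytesIO(OTBV_SIGNATURE + b"\x00"))
```

A text file that begins with the letters "OTBV" followed by, say, a newline is rejected, because its 5th byte differs from 0x96.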
3.2 Metadata
The Metadata chunk stores additional data needed to read the Data chunk.
The first 3 bits (bit values 128, 64, and 32) of the first byte identify the number of padding bits at the start of the Data chunk.
Bit 4 (bit value 16) of the first byte denotes if the volume was padded to a cube when encoding.
- If this bit is "0", the decoder should only read the X resolution and assume the volume is cubic.
- If this bit is "1", the decoder should read all dimensions. See the algorithm in 5. Decoder implementation guidelines.
Bits 5-8 of the first byte are reserved for custom flags.
Bytes 2-5 store the edge length (resolution) of the volume. If bit 4 of the first byte is set, this is the X resolution.
Bytes 6-9 and 10-13 store the Y and Z resolutions, respectively (if bit 4 of the first byte is not set, these should be 0).
Bytes 14-17 store the length (in bytes) of the Data portion of the file.
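The 17-byte layout described above can be unpacked in a single call. The sketch below is illustrative only: the `Metadata` type, its field names, and `parse_metadata` are hypothetical names chosen for the example.

```python
import struct
from typing import NamedTuple

# ">" = network byte order, no alignment; B = 1 byte, I = 4-byte unsigned int.
METADATA_FORMAT = ">BIIII"
METADATA_SIZE = struct.calcsize(METADATA_FORMAT)  # 1 + 4 * 4 = 17 bytes


class Metadata(NamedTuple):
    padding_bits: int      # bits 1-3 of the first byte
    padded_to_cube: bool   # bit 4 of the first byte
    custom_flags: int      # bits 5-8 of the first byte
    x: int                 # bytes 2-5 (edge length if the volume is cubic)
    y: int                 # bytes 6-9
    z: int                 # bytes 10-13
    data_length: int       # bytes 14-17, length of the Data chunk in bytes


def parse_metadata(chunk: bytes) -> Metadata:
    """Split the Metadata chunk into its fields."""
    if len(chunk) != METADATA_SIZE:
        raise ValueError("the Metadata chunk must be exactly 17 bytes long")
    flags, x, y, z, data_length = struct.unpack(METADATA_FORMAT, chunk)
    return Metadata(
        padding_bits=flags >> 5,            # top 3 bits (128, 64, 32)
        padded_to_cube=bool(flags & 0x10),  # bit value 16
        custom_flags=flags & 0x0F,          # low 4 bits
        x=x, y=y, z=z,
        data_length=data_length,
    )


# First byte 0b011_1_0101: 3 padding bits, padded to a cube, custom flags 5.
example = bytes([0b01110101]) + struct.pack(">IIII", 100, 80, 60, 1234)
meta = parse_metadata(example)
```

Reading the Metadata chunk as one fixed-size block keeps the number of I/O calls low, and `data_length` then tells the decoder how many bytes to request for the Data chunk.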
3.3 Data
The Data chunk stores the binary representation of the octree that encodes the volume.
4. Encoder implementation guidelines
5. Decoder implementation guidelines
Decoding the resolution
If bit 4 of the first byte of the Metadata chunk is not set, the volume should be read as cubic. In this case, bytes 2-5 of the Metadata chunk store the edge length of the volume. If that number is X, the volume should be interpreted as having a resolution of X*X*X.
If bit 4 is set, bytes 2-5 store the real resolution in the X dimension. Bytes 6-9 and 10-13 store the Y and Z resolutions, respectively. The real resolution of the volume is X*Y*Z. The encoded volume has a resolution of N*N*N, where N is the smallest power of 2 that is larger than X, Y, and Z. The decoder should interpret the data in the file as being of resolution N*N*N, then trim the resulting volume to X*Y*Z.
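The padded case can be sketched as below. Several points are assumptions made for the example rather than statements of the format: "larger than" is read as "at least as large as" (so an 8*8*4 volume is padded to N = 8, not 16), the decoded volume is a nested list indexed as `cube[x][y][z]`, and the padding is taken to sit at the high-index end of each axis. The names `padded_edge_length` and `trim_volume` are hypothetical.

```python
from typing import List

Volume = List[List[List[int]]]


def padded_edge_length(x: int, y: int, z: int) -> int:
    """Smallest power of 2 that is >= max(x, y, z) (assumed reading of N)."""
    largest = max(x, y, z)
    n = 1
    while n < largest:
        n *= 2
    return n


def trim_volume(cube: Volume, x: int, y: int, z: int) -> Volume:
    """Cut an N*N*N volume, indexed as cube[x][y][z], down to X*Y*Z."""
    # Assumes the padding was appended at the high-index end of each axis.
    return [[row[:z] for row in plane[:y]] for plane in cube[:x]]


# A 3*2*1 volume would be stored as a 4*4*4 cube under these assumptions.
n = padded_edge_length(3, 2, 1)
cube = [[[0] * n for _ in range(n)] for _ in range(n)]
trimmed = trim_volume(cube, 3, 2, 1)
```

If the strict reading of "larger than" is the intended one, only the loop condition in `padded_edge_length` changes (`n <= largest`).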