Notice

OTBV file format specification

1. Introduction

1.1 Rationale

The Octree Binary Volume (OTBV) format specifies an algorithm for compressing and storing binary volumetric data. The compression is lossless. The format was developed specifically to minimise the storage size of large datasets.

The OTBV format was initially created as part of a research project concerning large amounts of structural data. The size of the datasets used (more than 1M samples at 128^3 resolution) made it impractical to transfer them between the computers working on different parts of the project. Given the known properties of the data (exactly 3 dimensions, binary values), the OTBV file format was developed to minimise the file size of the dataset.

1.2 Scope

This specification defines:

  1. The algorithms for data representation (Section 2)
  2. The algorithms for data encoding and decoding
  3. The structure of an .otbv file
  4. The expected encoder and decoder software behaviour

The format does not specify any computational optimisations for processing volumetric data. Any such details are left to the implementation of a specific encoder or decoder.

1.3 Possible future extensions

It is understood that the current specification limits the applicability of the format. For now this is by design, as the specificity allows for maximum data density. In the future, the format may be extended to support other data types. The other limitation is the requirement that the data be exactly 3-dimensional. While this could potentially be generalised, such an extension would be better suited to a separate format.

2. Data representation

2.1 Conventions

Multi-byte numeric values are stored in network byte order (big-endian: most significant byte first).
All integers are unsigned, unless specified otherwise.

2.2 Octree encoding

The part of the Data chunk that represents the octree is encoded with the following pattern.
If there are no Internal Nodes, the volume is homogeneous (every voxel has the same value) and the octree consists of a single leaf.
If the volume is not homogeneous, the first node is always an Internal Node. The end of the encoding of the last child of the root Internal Node marks the end of the Data chunk.

2.3 Volume encoding ordering

3. File structure

The OTBV file format contains the following chunks, in order:
  1. Signature
  2. Metadata
  3. Data
The lengths of the Signature (5 bytes) and Metadata (17 bytes) chunks are constant, so the Data chunk always starts 22 bytes into the file. The length of the Data chunk is stored in the Metadata chunk.

3.1 Signature

The first 5 bytes of the file compose the signature which identifies the OTBV file format. All OTBV files must have a valid signature.
The signature bytes are the same for all OTBV files:
HEX:    4f 54 42 56 96
ASCII:  O  T  B  V  \226
The first 4 bytes spell the file type. The 5th byte (0x96, written \226 in octal) lies outside the 7-bit ASCII range, which prevents text files that happen to start with the letters "OTBV" from being misidentified as OTBV files.
If a decoder fails to validate the signature, the file should not be read further and the user should be notified that the file is malformed.
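As a non-normative illustration, the signature check can be sketched in Python as below. The names `OTBV_SIGNATURE`, `MalformedOTBVError`, and `check_signature` are hypothetical and are not taken from libotbv; the sketch assumes the start of the file is already in memory.

```python
# The 5-byte OTBV signature: "OTBV" followed by the non-ASCII byte 0x96.
OTBV_SIGNATURE = bytes([0x4F, 0x54, 0x42, 0x56, 0x96])


class MalformedOTBVError(ValueError):
    """Raised when a file does not start with a valid OTBV signature."""


def check_signature(data: bytes) -> None:
    """Validate the first 5 bytes of an OTBV file.

    Raising an exception lets the caller stop reading the file and notify
    the user that it is malformed. Inputs shorter than 5 bytes also fail,
    because the slice is then shorter than the signature.
    """
    if data[:5] != OTBV_SIGNATURE:
        raise MalformedOTBVError("not an OTBV file: invalid signature")
```

A decoder would call this before touching the Metadata chunk and report the exception message to the user.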

3.2 Metadata

The Metadata chunk stores additional data needed to read the Data chunk.
Byte and bit positions below are counted from 1, starting at the beginning of the Metadata chunk; bit 1 is the most significant bit.
The three most significant bits of the first byte (bits 1-3, with values 128, 64, and 32) store the number of padding bits (0-7) at the start of the Data chunk.
Bit 4 of the first byte (value 16) is set if the volume was padded to a cube when encoding.
Bits 5-8 of the first byte (values 8, 4, 2, and 1) are reserved for custom flags.
Bytes 2-5 store the edge length (resolution) of the volume. If bit 4 of the first byte is set, this is the X resolution.
Bytes 6-9 and 10-13 store the Y and Z resolution respectively; if bit 4 of the first byte is not set, these should be 0.
Bytes 14-17 store the length (in bytes) of the Data chunk.
The four fields in bytes 2-17 are 32-bit unsigned integers in network byte order.
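The following Python sketch reads the Metadata chunk from a file held in memory. It assumes byte positions are counted from 1 at the start of the Metadata chunk and that bit 1 of a byte is its most significant bit (value 128). `Metadata`, `parse_metadata`, and the two constants are hypothetical names, not part of libotbv's API.

```python
import struct
from dataclasses import dataclass

METADATA_OFFSET = 5   # the Metadata chunk follows the 5-byte signature
METADATA_LENGTH = 17  # 1 flag byte + four 4-byte unsigned integers


@dataclass
class Metadata:
    padding_bits: int     # bits 1-3 of byte 1 (values 128, 64, 32)
    padded_to_cube: bool  # bit 4 of byte 1 (value 16)
    custom_flags: int     # bits 5-8 of byte 1 (values 8, 4, 2, 1)
    x: int                # bytes 2-5: edge length, or X resolution if padded
    y: int                # bytes 6-9: Y resolution (0 if not padded)
    z: int                # bytes 10-13: Z resolution (0 if not padded)
    data_length: int      # bytes 14-17: length of the Data chunk in bytes


def parse_metadata(data: bytes) -> Metadata:
    """Parse the Metadata chunk from the full file contents."""
    chunk = data[METADATA_OFFSET:METADATA_OFFSET + METADATA_LENGTH]
    if len(chunk) != METADATA_LENGTH:
        raise ValueError("file too short for the Metadata chunk")
    flags = chunk[0]
    # ">IIII": four big-endian (network order) unsigned 32-bit integers.
    x, y, z, data_length = struct.unpack(">IIII", chunk[1:])
    return Metadata(
        padding_bits=(flags >> 5) & 0b111,
        padded_to_cube=bool(flags & 0b0001_0000),
        custom_flags=flags & 0b0000_1111,
        x=x,
        y=y,
        z=z,
        data_length=data_length,
    )
```

Because both preceding chunks have constant lengths, the Data chunk can then be sliced as `data[22:22 + meta.data_length]`.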

3.3 Data

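The Data chunk stores the binary representation of the octree that encodes the volume, as described in Section 2. It begins with the number of padding bits given in bits 1-3 of the first byte of the Metadata chunk.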
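The following Python sketch shows one way to build such an octree: a cubic binary volume whose edge is a power of 2 is recursively split into eight octants until every cube is homogeneous. It does not cover the bit-level serialisation. The node classes, the `volume[x][y][z]` layout, and the octant order (x offset slowest, z offset fastest) are assumptions made for illustration; they are not the normative OTBV volume ordering.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class Leaf:
    value: bool  # every voxel in this cube has this value


@dataclass
class Internal:
    children: List["Node"]  # the 8 octants, in the assumed order below


Node = Union[Leaf, Internal]

# Assumed volume layout: volume[x][y][z] -> bool, edge length a power of 2.
Volume = Sequence[Sequence[Sequence[bool]]]


def build_octree(volume: Volume, x: int = 0, y: int = 0, z: int = 0,
                 size: Optional[int] = None) -> Node:
    """Build the octree for the cube of edge `size` whose low corner is (x, y, z)."""
    if size is None:
        size = len(volume)
    first = volume[x][y][z]
    if all(volume[i][j][k] == first
           for i in range(x, x + size)
           for j in range(y, y + size)
           for k in range(z, z + size)):
        return Leaf(first)  # homogeneous cube: no further subdivision
    half = size // 2
    # Assumed octant order: x offset varies slowest, z offset fastest.
    return Internal([
        build_octree(volume, x + dx, y + dy, z + dz, half)
        for dx in (0, half)
        for dy in (0, half)
        for dz in (0, half)
    ])
```

For a homogeneous volume the sketch returns a single `Leaf`, i.e. an octree with no Internal Nodes; otherwise the root is an `Internal` node.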

4. Encoder implementation guidelines

5. Decoder implementation guidelines

5.1 Decoding the resolution

If bit 4 of the first byte of the Metadata chunk is not set, the volume should be read as cubic. In this case, bytes 2-5 of the Metadata chunk store the edge length of the volume. If that number is X, the volume should be interpreted as having resolution of X*X*X.
If bit 4 is set, bytes 2-5 store the real resolution in the X dimension, and bytes 6-9 and 10-13 store the Y and Z resolution respectively. The real resolution of the volume is X*Y*Z. The encoded volume has the resolution N*N*N, where N is the smallest power of 2 that is greater than or equal to the largest of X, Y, and Z. The decoder should interpret the data in the file as being of resolution N*N*N, then trim the resulting volume to X*Y*Z.
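A minimal Python sketch of this resolution logic follows. It assumes N is the smallest power of 2 that is at least max(X, Y, Z), that volumes are indexed as `volume[x][y][z]`, and that the real data occupy the low-index corner of the padded cube, so trimming keeps indices 0..X-1, 0..Y-1, and 0..Z-1. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Shape = Tuple[int, int, int]


def padded_edge(x: int, y: int, z: int) -> int:
    """Return the smallest power of 2 that is at least max(x, y, z)."""
    n = 1
    while n < max(x, y, z):
        n *= 2
    return n


def decoded_shapes(padded_to_cube: bool, x: int, y: int, z: int) -> Tuple[Shape, Shape]:
    """Return (encoded shape, real shape) from the Metadata resolution fields."""
    if not padded_to_cube:
        # Bit 4 not set: bytes 2-5 hold the edge length of a cubic volume.
        return (x, x, x), (x, x, x)
    n = padded_edge(x, y, z)
    return (n, n, n), (x, y, z)


def trim(volume: Sequence[Sequence[Sequence[T]]],
         x: int, y: int, z: int) -> List[List[List[T]]]:
    """Keep the low-index x*y*z corner of a decoded N*N*N volume[x][y][z]."""
    return [[list(line[:z]) for line in sheet[:y]] for sheet in volume[:x]]
```

For example, a volume with real resolution 100*60*30 would be decoded as 128*128*128 and then trimmed back to 100*60*30.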

Source code

The source code for a compliant library implementing the specified algorithms is available at https://github.com/eceannmor/libotbv.