Binary DataArray in XML using python/numpy: what are the leading int32 values?

I think I’ve cracked the thing, finally.

Suppose we have an array arr of n 64-bit integers (i.e. arr.size is n). Its size in bytes is then 8*n.

First, we have to count how many full chunks of 2**15 bytes fit inside arr. The total number of chunks is this number plus 1, the extra one being the final, partial chunk (this assumes arr.nbytes is not an exact multiple of 2**15; otherwise the last chunk would be empty). Note that with numpy the total size in bytes of the array is available directly as arr.nbytes (which in this example is equal to 8*n). Therefore the number of chunks m is:

m = arr.nbytes//2**15 + 1

where // represents integer division.
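To make the formula concrete, here is a small sketch that evaluates it for a few array sizes. The helper name count_chunks and the sample sizes are my own choices for illustration, not part of any VTK or numpy API.

```python
import numpy as np

BLOCK_SIZE = 2**15  # uncompressed bytes per chunk, as assumed in this post


def count_chunks(arr, block_size=BLOCK_SIZE):
    """Number of chunks m, using the formula from the text (hypothetical helper)."""
    return arr.nbytes // block_size + 1


# sample sizes chosen only for illustration
chunk_counts = {n: count_chunks(np.zeros(n, dtype='int64'))
                for n in (100, 4097, 16000)}
print(chunk_counts)
```

An array of 100 int64 values fits in a single chunk, 4097 values spill one element into a second chunk, and 16000 values need four chunks.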

Now we need the size in bytes of the last chunk. In python this can be obtained indirectly, as follows.

Each of the first m-1 chunks holds 2**15//8 = 4096 elements, since each must have a byte size of 2**15 and holds int64 elements (8 bytes each). Therefore, the first m-1 chunks are arr[0:4096], arr[4096:2*4096], …, arr[(m-2)*4096:(m-1)*4096]. Finally, the last chunk is arr[(m-1)*4096::], whose size is size_last_chunk = arr[(m-1)*4096::].nbytes.
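The slicing described above can be sketched as a loop instead of writing each slice by hand. This is only an illustration: the names ELEMS_PER_CHUNK and chunks are mine, and the array content is arbitrary.

```python
import numpy as np

ELEMS_PER_CHUNK = 2**15 // 8  # 4096 int64 elements per full chunk

arr = np.arange(16000, dtype='int64')  # arbitrary content, 16000 elements
m = arr.nbytes // 2**15 + 1            # number of chunks

# chunk i covers arr[i*4096:(i+1)*4096]; the last slice is simply shorter
chunks = [arr[i * ELEMS_PER_CHUNK:(i + 1) * ELEMS_PER_CHUNK] for i in range(m)]
size_last_chunk = chunks[-1].nbytes

print([c.size for c in chunks], size_last_chunk)
```

For 16000 elements this gives three chunks of 4096 elements and a last chunk of 3712 elements, i.e. 3712*8 = 29696 bytes.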

Now we need to apply the compression, which in this case is zlib, and get the size of the byte arrays of each compressed chunk. With this information we can finally write the header_array, base64 encode it, concatenate the base64 encodings of this array and of all the compressed chunks, and finally write the XML file.

As an example, for a random int64 array of 16000 elements, which gives m=4 chunks, we could do:

import numpy as np
import zlib
from base64 import b64encode
# making a random int64 array here for the example
rng = np.random.default_rng(seed=0)  # random number generator
arr = rng.integers(low=0, high=3, size=4096*3+3712, dtype='int64')
# this array was chosen to have 16000 elements, equal to 3 chunks of
# 4096 elements plus one chunk of 3712. It matches the number of cells
# of a vti image of extent 32, 25 and 20 (32*25*20 = 16000)

# number of chunks
m = arr.nbytes//2**15 + 1  # equal to 4 here
# compressed chunk 0
compr_chunk0 = zlib.compress(arr[0:4096])
# compressed chunk 1
compr_chunk1 = zlib.compress(arr[4096:2*4096])
# compressed chunk 2
compr_chunk2 = zlib.compress(arr[2*4096:3*4096])
# compressed chunk 3
size_last_chunk = arr[3*4096::].nbytes
compr_chunk3 = zlib.compress(arr[3*4096::])

# header array, assuming uint32 headers in the XML file
# (header_type="UInt32"), hence dtype='uint32'
head_arr = np.array([m, 2**15, size_last_chunk,
                     len(compr_chunk0), len(compr_chunk1),
                     len(compr_chunk2), len(compr_chunk3)], dtype='uint32')

# base64 encoding of the header array
b64_head_arr = b64encode(head_arr.tobytes())
# base64 encoding of the concatenation of the compressed chunks
b64_arr = b64encode(compr_chunk0 + compr_chunk1 + compr_chunk2 + compr_chunk3)

# print to XML file (or to sys.stdout, in this case)
print((b64_head_arr + b64_arr).decode('utf-8'))
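The example above hardcodes four chunks. Below is a sketch of the same steps for any number of chunks, together with a decoder used as a round-trip check. Both encode_payload and decode_payload are hypothetical helpers I wrote for this post, not VTK functions, and they assume little-endian uint32 headers and zlib compression as in the XML file below.

```python
import zlib
from base64 import b64decode, b64encode

import numpy as np

BLOCK_SIZE = 2**15  # uncompressed bytes per chunk, as in the example above


def encode_payload(arr, block_size=BLOCK_SIZE):
    """Hypothetical helper: base64(header) + base64(compressed chunks)."""
    raw = arr.tobytes()
    m = len(raw) // block_size + 1
    blocks = [raw[i * block_size:(i + 1) * block_size] for i in range(m)]
    compressed = [zlib.compress(b) for b in blocks]
    header = np.array([m, block_size, len(blocks[-1])]
                      + [len(c) for c in compressed], dtype='<u4')
    return b64encode(header.tobytes()) + b64encode(b''.join(compressed))


def decode_payload(payload, dtype='int64'):
    """Hypothetical helper: invert encode_payload (uint32 headers assumed)."""
    # the first uint32 is m; the full header holds 3 + m uint32 values
    m = int(np.frombuffer(b64decode(payload[:8])[:4], dtype='<u4')[0])
    header_bytes = 4 * (3 + m)
    header_chars = 4 * ((header_bytes + 2) // 3)  # base64 length with padding
    header = np.frombuffer(b64decode(payload[:header_chars]), dtype='<u4')
    data = b64decode(payload[header_chars:])
    out, pos = [], 0
    for csize in header[3:]:
        out.append(zlib.decompress(data[pos:pos + int(csize)]))
        pos += int(csize)
    return np.frombuffer(b''.join(out), dtype=dtype)


rng = np.random.default_rng(seed=0)
arr = rng.integers(low=0, high=3, size=16000, dtype='int64')
payload = encode_payload(arr)
restored = decode_payload(payload)
print(np.array_equal(restored, arr))
```

The decoder relies on the header being base64 encoded separately from the data, so its encoded length can be computed from m alone.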

We can then write the .vti file found at the end of this post considering a 32x25x20 mesh and obtain the following figure. The binary CellData was generated using the above python code.

[Figure: rendering of the .vti file below]

<VTKFile type="ImageData" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <ImageData WholeExtent="0 32 0 25 0 20" Origin="0 0 0" Spacing="0.05 0.05 0.05" Direction="1 0 0 0 1 0 0 0 1">
  <Piece Extent="0 32 0 25 0 20">
    <PointData>
    </PointData>
    <CellData Scalars="colorsArray">
      <DataArray type="Int64" Name="colorsArray" format="binary" RangeMin="0" RangeMax="2">
        BAAAAACAAAAAdAAAAQcAAAEHAAAFBwAAWgYAAA==
      </DataArray>
    </CellData>
  </Piece>
  </ImageData>
</VTKFile>
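As a sanity check, the leading values in the DataArray can be decoded back into integers. This sketch assumes little-endian uint32 headers, as declared by byte_order and header_type in the file; the variable names are mine. Note that the string shown in the DataArray above decodes to exactly 28 bytes, so it covers only the seven header values; the base64 of the compressed chunks would follow it.

```python
from base64 import b64decode

import numpy as np

# header string copied from the DataArray above
header_b64 = "BAAAAACAAAAAdAAAAQcAAAEHAAAFBwAAWgYAAA=="

header = np.frombuffer(b64decode(header_b64), dtype='<u4')
m, block_size, size_last_chunk = (int(v) for v in header[:3])
compressed_sizes = [int(v) for v in header[3:]]

print(m, block_size, size_last_chunk)  # 4 32768 29696
print(compressed_sizes)                # one compressed size per chunk
```

The first three values are the number of chunks, the uncompressed chunk size and the uncompressed size of the last chunk (3712*8 = 29696 bytes), followed by the compressed size of each of the m chunks.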