I think I’ve cracked the thing, finally.
Suppose we have a random array `arr` of 64-bit integers with `n` values (i.e. `arr.size` is `n`). We know that its size in bytes will be `8*n`.
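Firstly, we have to see how many full chunks of `2**15` bytes fit inside `arr`. If the size of `arr` is not an exact multiple of `2**15` bytes, the total number of chunks is this number plus 1, the extra one being the partial last chunk. Note that a more general way to get the total memory in bytes of a numpy array in Python is `arr.nbytes` (which in this example is equal to `8*n`). Therefore the number of chunks `m` is:

`m = arr.nbytes//2**15 + 1`

where `//` represents integer division. If `arr.nbytes` is an exact multiple of `2**15`, there is no partial chunk, so `m = arr.nbytes//2**15`, and, as far as I can tell from the VTK file-format documentation, the last-chunk size stored in the header is then 0.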
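The `+ 1` in the formula assumes the last chunk is partial. A minimal sketch of the same arithmetic that also covers arrays whose byte size is an exact multiple of `2**15` uses ceiling division for the chunk count and a modulo for the trailing partial chunk. The helper names `num_chunks` and `last_partial_size` are hypothetical, and returning 0 for a full last chunk reflects my reading of the VTK XML format documentation rather than something verified against VTK here.

```python
BLOCK_SIZE = 2**15  # uncompressed chunk size in bytes, as in the text


def num_chunks(nbytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of chunks needed for nbytes bytes (ceiling division)."""
    return -(-nbytes // block_size)


def last_partial_size(nbytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes in the trailing partial chunk; 0 when the last chunk is full."""
    return nbytes % block_size


# 16000 int64 values -> 3 full chunks plus a partial one of 29696 bytes
print(num_chunks(8 * 16000), last_partial_size(8 * 16000))  # 4 29696
# exactly 3 * 4096 int64 values -> 3 full chunks, no partial chunk
print(num_chunks(8 * 3 * 4096), last_partial_size(8 * 3 * 4096))  # 3 0
```

For comparison, `arr.nbytes//2**15 + 1` gives 4 in the second case, i.e. one empty chunk too many.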
Now we need the size in bytes of the last chunk, which in Python can be obtained indirectly, as follows.
Each of the first `m-1` chunks holds `2**15/8 = 4096` elements, since each must have a byte size of `2**15` and holds 8-byte int64 elements. Therefore the first `m-1` chunks are `arr[0:4096]`, `arr[4096:2*4096]`, …, `arr[(m-2)*4096:(m-1)*4096]`. Finally, the last chunk is `arr[(m-1)*4096::]`, whose size is `size_last_chunk = arr[(m-1)*4096::].nbytes`.
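The slicing pattern above generalizes to a short list comprehension. A minimal sketch, assuming a 1-D array whose item size divides `2**15` (true for int64); `split_chunks` is a hypothetical helper name:

```python
import numpy as np


def split_chunks(arr: np.ndarray, block_size: int = 2**15) -> list:
    """Split a 1-D array into consecutive slices of block_size bytes; the last may be shorter."""
    per_chunk = block_size // arr.itemsize  # 4096 elements for int64
    return [arr[i:i + per_chunk] for i in range(0, arr.size, per_chunk)]


arr = np.zeros(16000, dtype='int64')
chunks = split_chunks(arr)
print(len(chunks))                   # 4
print([c.nbytes for c in chunks])    # [32768, 32768, 32768, 29696]
size_last_chunk = chunks[-1].nbytes  # 29696, i.e. arr[3*4096::].nbytes
```

Because the comprehension steps through the array, an array of exactly `3*4096` elements yields 3 chunks rather than an empty fourth one.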
Now we need to compress each chunk separately, in this case with `zlib`, and record the size in bytes of each compressed chunk. With this information we can finally write the header array `[m, 2**15, size_last_chunk, <compressed size of each of the m chunks>]`, base64-encode it, append the base64 encoding of all the compressed chunks concatenated together, and finally write the XML file.
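Putting the compression, header and base64 steps together for an arbitrary number of chunks, a minimal sketch could look like the following. It assumes `header_type="UInt32"` and `byte_order="LittleEndian"` in the XML file, sets the last-chunk field to 0 when the last chunk is full (my reading of the VTK format documentation), and `encode_vtk_zlib` is a hypothetical name, not a VTK or numpy function.

```python
import zlib
from base64 import b64decode, b64encode

import numpy as np


def encode_vtk_zlib(arr, block_size=2**15):
    """Base64 payload for a zlib-compressed VTK XML DataArray (UInt32 header).

    Header layout: [m, block_size, size_last_chunk, len(c_1), ..., len(c_m)],
    where size_last_chunk is 0 when the last chunk is full (assumption).
    """
    # Force little-endian element order to match byte_order="LittleEndian".
    raw = np.asarray(arr, dtype=arr.dtype.newbyteorder('<')).tobytes()
    nbytes = len(raw)
    m = -(-nbytes // block_size)           # ceiling division
    size_last_chunk = nbytes % block_size  # 0 if nbytes is a multiple of block_size
    # Compress each uncompressed chunk independently.
    compressed = [zlib.compress(raw[i:i + block_size])
                  for i in range(0, nbytes, block_size)]
    header = np.array([m, block_size, size_last_chunk] + [len(c) for c in compressed],
                      dtype='<u4')
    # Header and data are base64-encoded separately, then concatenated.
    return (b64encode(header.tobytes()) + b64encode(b''.join(compressed))).decode('ascii')


rng = np.random.default_rng(seed=0)
arr = rng.integers(low=0, high=3, size=16000, dtype='int64')
payload = encode_vtk_zlib(arr)
# The 7-entry UInt32 header is 28 bytes, i.e. the first 40 base64 characters.
header = np.frombuffer(b64decode(payload[:40]), dtype='<u4')
print(header[:3].tolist())  # [4, 32768, 29696]
```

Encoding the header separately matters: its base64 padding (`==` here) ends the header string, so a reader can decode it without knowing anything about the compressed data that follows.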
As an example, for a random int64 array of 16000 elements, which splits into `m=4` chunks, we could do:
```python
import numpy as np
import zlib
from base64 import b64encode

# making a random int64 array here for the example
rng = np.random.default_rng(seed=0)  # random number generator
arr = rng.integers(low=0, high=3, size=4096*3+3712, dtype='int64')
# this array was chosen to have 16000 elements, equal to 3 chunks of
# 4096 elements plus one chunk of 3712. It fills the cells of a vti
# image of extent 32, 25 and 20 (32*25*20 = 16000)

# number of chunks
m = arr.nbytes//2**15 + 1  # equal to 4 here (nbytes is not a multiple of 2**15)
# compressed chunk 0
compr_chunk0 = zlib.compress(arr[0:4096])
# compressed chunk 1
compr_chunk1 = zlib.compress(arr[4096:2*4096])
# compressed chunk 2
compr_chunk2 = zlib.compress(arr[2*4096:3*4096])
# compressed chunk 3 (the partial last chunk)
size_last_chunk = arr[3*4096::].nbytes
compr_chunk3 = zlib.compress(arr[3*4096::])
# header array, matching header_type="UInt32" in the XML file
head_arr = np.array([m, 2**15, size_last_chunk,
                     len(compr_chunk0), len(compr_chunk1),
                     len(compr_chunk2), len(compr_chunk3)], dtype='uint32')
# base64 encoding of the header array
b64_head_arr = b64encode(head_arr.tobytes())
# base64 encoding of the concatenation of the compressed chunks
b64_arr = b64encode(compr_chunk0 + compr_chunk1 + compr_chunk2 + compr_chunk3)
# print to XML file (or to sys.stdout, in this case)
print((b64_head_arr + b64_arr).decode('utf-8'))
```
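To check that the printed string is self-consistent, one can decode it again: read `m` from the start of the header, work out how many base64 characters the `(3 + m)`-entry UInt32 header occupies, then decompress each chunk using the recorded compressed sizes. A minimal sketch, assuming a little-endian machine; `decode_vtk_zlib` is a hypothetical helper, and the payload is rebuilt with a loop equivalent to the code above so the block runs on its own:

```python
import zlib
from base64 import b64decode, b64encode

import numpy as np


def decode_vtk_zlib(payload, dtype='<i8'):
    """Invert the encoding: base64(UInt32 header) + base64(zlib-compressed chunks)."""
    # The first 8 base64 characters decode to 6 bytes, enough to read m.
    m = int(np.frombuffer(b64decode(payload[:8])[:4], dtype='<u4')[0])
    header_bytes = 4 * (3 + m)
    header_chars = 4 * -(-header_bytes // 3)  # base64 length of the header, padding included
    header = np.frombuffer(b64decode(payload[:header_chars]), dtype='<u4')
    block_size, size_last_chunk = int(header[1]), int(header[2])
    data = b64decode(payload[header_chars:])
    chunks, offset = [], 0
    for c_len in header[3:].tolist():  # compressed size of each chunk
        chunks.append(zlib.decompress(data[offset:offset + c_len]))
        offset += c_len
    raw = b''.join(chunks)
    full = m if size_last_chunk == 0 else m - 1
    if len(raw) != full * block_size + size_last_chunk:
        raise ValueError('decompressed size does not match the header')
    return np.frombuffer(raw, dtype=dtype)


# Rebuild the example payload (same steps as the code above, written as a loop).
rng = np.random.default_rng(seed=0)
arr = rng.integers(low=0, high=3, size=16000, dtype='int64')
compr = [zlib.compress(arr[i:i + 4096].tobytes()) for i in range(0, arr.size, 4096)]
head_arr = np.array([len(compr), 2**15, arr[3*4096:].nbytes] + [len(c) for c in compr],
                    dtype='uint32')
payload = (b64encode(head_arr.tobytes()) + b64encode(b''.join(compr))).decode('utf-8')
print(np.array_equal(decode_vtk_zlib(payload), arr))  # True
```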
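We can then write the .vti file found at the end of this post for a 32x25x20 mesh of cells and obtain the following figure. The binary CellData was generated using the above Python code. In the listing, only the base64-encoded header is shown inside the DataArray; in the actual file it is followed directly, on the same line, by the base64 encoding of the compressed chunks (`b64_arr`).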
```xml
<VTKFile type="ImageData" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <ImageData WholeExtent="0 32 0 25 0 20" Origin="0 0 0" Spacing="0.05 0.05 0.05" Direction="1 0 0 0 1 0 0 0 1">
    <Piece Extent="0 32 0 25 0 20">
      <PointData>
      </PointData>
      <CellData Scalars="colorsArray">
        <DataArray type="Int64" Name="colorsArray" format="binary" RangeMin="0" RangeMax="2">
          BAAAAACAAAAAdAAAAQcAAAEHAAAFBwAAWgYAAA==
        </DataArray>
      </CellData>
    </Piece>
  </ImageData>
</VTKFile>
```
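As a quick consistency check, the header string inside the DataArray above can be decoded with numpy. Reading it as little-endian UInt32 entries (as declared by `header_type` and `byte_order`) gives `m = 4`, a chunk size of 32768 bytes, a last chunk of 29696 bytes (3712 int64 values) and the four compressed chunk sizes:

```python
from base64 import b64decode

import numpy as np

header_b64 = 'BAAAAACAAAAAdAAAAQcAAAEHAAAFBwAAWgYAAA=='
header = np.frombuffer(b64decode(header_b64), dtype='<u4')
print(header.tolist())  # [4, 32768, 29696, 1793, 1793, 1797, 1626]

m, block_size, size_last_chunk = header[:3].tolist()
# total uncompressed size: (m - 1) full chunks plus the partial last one
print((m - 1) * block_size + size_last_chunk)  # 128000, i.e. 16000 * 8 bytes
```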