Binary DataArray in XML using python/numpy: what are the leading int32 values?

I am writing a python function that saves the CellArrays of an ImageData representation from a numpy array to a binary XML .vti file. I have read this VTK support question, but I don’t believe a solution was actually found.

I have found a way to decode the base64 binaries and perform zlib compression in python by following this SO post.

Taking the examples of the first link, we have corresponding ascii and binary files, shown below.

Notice that the float32 data array called “Points” was saved in binary as “AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=”

The thing is, the first portion of the string, i.e. “AQAAAACAAAAwAAAAEQAAAA==”, is an uncompressed array of 4 uint32 elements (because the header_type is “UInt32”). The rest is the zlib-compressed representation of the float32 data.

So, in python:

import numpy as np
import zlib
import base64

head = b"AQAAAACAAAAwAAAAEQAAAA=="
binary_array = b"eJxjYEAGDfaobEw+ADwjA7w="
data_type = 'float32'

head_array = np.frombuffer(base64.b64decode(head), dtype='uint32')

array = np.frombuffer(zlib.decompress(base64.b64decode(binary_array)), dtype=data_type)

print(head_array)
print(array)


[    1 32768    48    17]
[0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 1.]

The question, then, is: what is the head_array?
From reading the zlib python docs I believe the second value in the array is the wbits parameter, i.e. the window size. Whatever that is…
The third value in the array seems to be the complete size of the uncompressed array in bytes.
I have no idea what the first value represents. The fourth might be the size of the compressed buffer, but I am not sure.
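One way to test the size hypotheses directly is to compare the header values against the actual byte counts of the example data. This is a standalone check using only the strings from the file above:

```python
import base64
import zlib

import numpy as np

# header and data portions of the "Points" DataArray from the file above
head = np.frombuffer(base64.b64decode(b"AQAAAACAAAAwAAAAEQAAAA=="), dtype="uint32")
comp = base64.b64decode(b"eJxjYEAGDfaobEw+ADwjA7w=")
raw = zlib.decompress(comp)

print(head)                  # [    1 32768    48    17]
print(head[2] == len(raw))   # True: third value is the uncompressed size in bytes
print(head[3] == len(comp))  # True: fourth value is the compressed size in bytes
```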

For completeness, here I show what the above python code generates for the other binaries:

head_array: [ 1, 32768, 32, 19]
data_type: ‘int64’
array: [0, 1, 2, 3]

head_array: [ 1, 32768, 8, 11]
data_type: ‘int64’
array: [4]

head_array: [ 1, 32768, 1, 9]
data_type: ‘uint8’
array: [10]

Ascii file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
    <Piece NumberOfPoints="4" NumberOfCells="1">
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
          0 0 0 1 0 0
          1 1 0 0 1 1
        <DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
          0 1 2 3
        <DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
        <DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">

Binary file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
    <Piece NumberOfPoints="4" NumberOfCells="1">
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
        <DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">

I still do not know what the first element of the header is, but I have found out what the last three are: 2**15 (almost always; I don’t know when it isn’t), the size of the uncompressed array in bytes, and the size of the compressed binary array after compression, but without considering base64 encoding!

The following example shows how to obtain the XML binary for the “Points” data array of the example, consisting of floats, i.e. float32, assuming that the header specifies header_type="UInt32":

import numpy as np
import base64
import zlib

arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')
arr_comp = zlib.compress(arr)

header = np.array([ 1,  # apparently this is always the case, I think
                2**15,  # from what I have read, this is true in general
           arr.nbytes,  # the size of the array `arr` in bytes
       len(arr_comp)],  # the size of the compressed array
                  dtype='uint32')

# only use base64 encoding when writing to file
print((base64.b64encode(header.tobytes()) + base64.b64encode(arr_comp)).decode('utf-8'))

The output is as expected:

AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
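As a sanity check, a payload built this way can be decoded back. A standalone sketch; the 24-character offset follows from the fact that the 16-byte uint32 header occupies ceil(16/3)*4 = 24 base64 characters:

```python
import base64
import zlib

import numpy as np

arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')
arr_comp = zlib.compress(arr.tobytes())
header = np.array([1, 2**15, arr.nbytes, len(arr_comp)], dtype='uint32')
payload = base64.b64encode(header.tobytes()) + base64.b64encode(arr_comp)

# the uint32 header is 16 bytes, i.e. 24 base64 characters (including padding)
back = np.frombuffer(zlib.decompress(base64.b64decode(payload[24:])),
                     dtype='float32')
print(np.array_equal(back, arr))  # True
```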
I have re-opened the issue. The previous code does not work if the size of the uncompressed array is larger than 2**15: the header becomes different. Reading the zlib docs in Python makes me think that I have to use a zlib.compressobj() object somehow.

I think I’ve cracked the thing, finally.

Suppose we have a random array arr of 64 bit integers of n values (i.e. arr.size is n). We know that its size in bytes will be 8*n.

Firstly, we have to see how many chunks of 2**15 bytes there are inside arr. The total number of chunks will be equal to this number plus 1. Note that a more general way to get the total memory in bytes of the array using numpy arrays in python is arr.nbytes (which in this example is equal to 8*n). Therefore the number of chunks m is:

m = arr.nbytes//2**15 + 1

where // represents integer division.

Now we need to know the size in bytes of the last chunk. But in python this may be obtained indirectly, as follows.

Each of the first m-1 chunks will have 2**15/8=4096 elements, since we know that they must have a byte size of 2**15 and hold int64 elements. Therefore, each of the first m-1 chunks are known to be arr[0:4096], arr[4096:2*4096], …, arr[(m-2)*4096:(m-1)*4096]. Finally, the last chunk will be arr[(m-1)*4096::], whose size is size_last_chunk = arr[(m-1)*4096::].nbytes.
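For the 16000-element int64 array used in the example below, these formulas give the following (note that the `m` formula assumes `arr.nbytes` is not an exact multiple of 2**15):

```python
import numpy as np

# the int64 example: 16000 elements -> 128000 bytes
arr = np.zeros(16000, dtype='int64')

m = arr.nbytes // 2**15 + 1                    # 128000 // 32768 + 1 = 4 chunks
size_last_chunk = arr[(m - 1) * 4096:].nbytes  # 3712 elements -> 29696 bytes

print(m, size_last_chunk)  # 4 29696
```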

Now we need to apply the compression, which in this case is zlib, and get the size of the byte arrays of each compressed chunk. With this information we can finally write the header_array, base64 encode it, concatenate the base64 encodings of this array and of all the compressed chunks, and finally write the XML file.

As an example, if we have m=4 chunks of a 16000 random int64 array, then we could do:

import numpy as np
import zlib
from base64 import b64encode
# making a random int64 array here for the example
rng = np.random.default_rng(seed=0)  # random number generator
arr = rng.integers(low=0, high=3, size=4096*3+3712, dtype='int64')
# this array was chosen to have 16000 elements, equal to 3 chunks of
# 4096 elements plus one chunk of 3712. This can be plotted in a vti
# array of extent 32, 25 and 20

# number of chunks
m = arr.nbytes//2**15 + 1  # equal to 4 here
# compressed chunk 0
compr_chunk0 = zlib.compress(arr[0:4096])
# compressed chunk 1
compr_chunk1 = zlib.compress(arr[4096:2*4096])
# compressed chunk 2
compr_chunk2 = zlib.compress(arr[2*4096:3*4096])
# compressed chunk 3
size_last_chunk = arr[3*4096::].nbytes
compr_chunk3 = zlib.compress(arr[3*4096::])

# header array, assuming uint32 headers in the XML file
head_arr = np.array([m, 2**15, size_last_chunk,
                     len(compr_chunk0), len(compr_chunk1),
                     len(compr_chunk2), len(compr_chunk3)], dtype='u4')

# base64 encoding of the header array
b64_head_arr = b64encode(head_arr.tobytes())
# base64 encoding of the concatenation of the compressed chunks
b64_arr = b64encode(compr_chunk0 + compr_chunk1 + compr_chunk2 + compr_chunk3)

# print to XML file (or to sys.stdout, in this case)
print((b64_head_arr + b64_arr).decode('utf-8'))
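The four hard-coded chunks above can be generalized into a loop. A minimal sketch (the function name and `block_size` parameter are my own, and UInt32 headers are assumed; note that VTK writes 0 for the partial-block size when the array is an exact multiple of the block size, which this sketch does not handle):

```python
import zlib

import numpy as np
from base64 import b64encode, b64decode

def encode_data_array(arr, block_size=2**15):
    """Compress `arr` block by block and return the base64 payload
    (header + data), assuming UInt32 headers."""
    raw = arr.tobytes()
    # split the raw bytes into blocks of at most block_size bytes
    blocks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    comp = [zlib.compress(b) for b in blocks]
    # header: [#blocks, block size, size of last block, compressed sizes...]
    header = np.array([len(blocks), block_size, len(blocks[-1])]
                      + [len(c) for c in comp], dtype='uint32')
    return b64encode(header.tobytes()) + b64encode(b"".join(comp))
```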

We can then write the .vti file found at the end of this post considering a 32x25x20 mesh and obtain the following figure. The binary CellData was generated using the above python code.


<VTKFile type="ImageData" version="1.0" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <ImageData WholeExtent="0 32 0 25 0 20" Origin="0 0 0" Spacing="0.05 0.05 0.05" Direction="1 0 0 0 1 0 0 0 1">
  <Piece Extent="0 32 0 25 0 20">
    <CellData Scalars="colorsArray">
      <DataArray type="Int64" Name="colorsArray" format="binary" RangeMin="0" RangeMax="2">


I’m currently working on something similar, except that in my case I’m trying to extract data from a binary (zlib-compressed) XML file.

The problem I’m facing is that I’m not able to determine where the header block ends and where the data blocks start. Based on what you wrote, I’m running tests to figure out how it works, but it’s not clear to me so far.

I’m wondering if a Python library exists for this (the format is not that recent)?

Thanks for any advice


Just an example including several blocks:
Structure =[ 4 65536 4992 11657 8550 8336 685]

I found some bugs, but cannot edit the answer.
1- change the head_arr dtype from 'int32' to 'u4'
2- change the last line to print((b64_head_arr + b64_arr).decode('utf-8'))

Still digging to understand how it works :slight_smile:

Hello Paul,

Sorry I did not address you before; I was going to do so shortly afterwards, but this is taking quite a bit longer than I had anticipated. I am also stuck!

I was going to suggest base64 decoding the whole line, then creating an array from the buffer of just the first 4 bytes if the header_type is “UInt32”, or the first 8 bytes if the header_type is “UInt64”. This way you can get the number of chunks. With the number of chunks and the header_type, you will know the size of the header in bytes.

But something weird is happening in Python. If you base64 decode the whole XML string, you only get the header array. It seems that the base64 decoder ignores the rest of the concatenated binaries…

There is a python library called meshio that converts meshes to .vtu format, but I’m not sure if this is helpful.

Example in Python:

import numpy as np
import zlib  # note: max wbits here is 2**15=32768, not 2**16=65536
from base64 import b64decode, b64encode

# header_type UInt32
head_arr_4 = np.array([4, 65536, 4992, 11657, 8550, 8336, 685], dtype='=u4')
# header_type UInt64
head_arr_8 = np.array([4, 65536, 4992, 11657, 8550, 8336, 685], dtype='=u8')

# simulating the encoded string obtained from the file
# two small zlib compressed int64 arrays are used as an example,
# but note that they do not obey the header rule
arr0 = zlib.compress(np.arange(10, dtype='=i8'))
arr1 = zlib.compress(np.arange(20, 30, dtype='=i8'))
xml_b64_4 = b64encode(head_arr_4) + b64encode(arr0 + arr1)
xml_b64_8 = b64encode(head_arr_8) + b64encode(arr0 + arr1)

# base64 decoded string
decoded_xml_4 = b64decode(xml_b64_4)
decoded_xml_8 = b64decode(xml_b64_8)

# get the number of chunks
m_4 = int(np.frombuffer(decoded_xml_4[0:4], dtype='=u4')[0])
# 4
m_8 = int(np.frombuffer(decoded_xml_8[0:8], dtype='=u8')[0])
# 4

# retrieve the whole header
retrieved_header_4 = np.frombuffer(decoded_xml_4[0:4*(3+m_4)], dtype='=u4')
retrieved_header_8 = np.frombuffer(decoded_xml_8[0:8*(3+m_8)], dtype='=u8')

# PROBLEM: the rest of xml_b64_4 and xml_b64_8 is not retrieved!
# decoded_xml_4[4*(3+m_4)::] is b'' (empty) ???
# decoded_xml_8[8*(3+m_8)::] is b'' (empty) ???
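One workaround (Python’s base64 decoder appears to stop at the first “=” padding characters, which is why the concatenated data after the header vanishes) is to slice the base64 text before decoding, using the known header length. A standalone sketch with a hypothetical two-block header:

```python
import zlib

import numpy as np
from base64 import b64encode, b64decode

# rebuild an example payload (UInt32 headers, two hypothetical blocks)
arr0 = zlib.compress(np.arange(10, dtype='=i8').tobytes())
arr1 = zlib.compress(np.arange(20, 30, dtype='=i8').tobytes())
head = np.array([2, 2**15, 80, len(arr0), len(arr1)], dtype='=u4')
xml_b64 = b64encode(head.tobytes()) + b64encode(arr0 + arr1)

# the 20-byte header occupies ceil(20/3)*4 = 28 base64 characters, so
# slice the base64 text *before* decoding instead of decoding everything
n_head_chars = -(-head.nbytes // 3) * 4
header = np.frombuffer(b64decode(xml_b64[:n_head_chars]), dtype='=u4')
data = b64decode(xml_b64[n_head_chars:])

block0 = np.frombuffer(zlib.decompress(data[:int(header[3])]), dtype='=i8')
print(block0)  # [0 1 2 3 4 5 6 7 8 9]
```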


Here is where I am so far; the following example represents the vertex coordinates of a unit hexahedron:

  • Note that it starts with an “_” that must be ignored (see the VTK doc)
  • By hand, I’ve noticed where the header finishes and where the data chunk starts
  • I’m trying to figure out the purpose of the header and how to retrieve data from the GetInformations array, but I’ve not made the link yet


import numpy as np
import zlib
import base64

BinaryPart = base64.b64decode(Xml_file_extract[1:])  # the "_" at the beginning has to be ignored
GetInformations = np.frombuffer(BinaryPart, dtype=np.uint32)  # from the doc, when nothing is specified for the header (TBC)

RawData =  base64.b64decode('eAFjYCAGfLBHVQXjw2iYLDofJo5Oo6uD8YmlYeZ9sAcA4DEONQ==')   ##  [24:]
UncompressedData = zlib.decompress(RawData)                                                  
NodesCoordinates = np.frombuffer(UncompressedData, dtype=np.float64).reshape(8,3) 

leading to:

GetInformations = [    1 65536   192    37]
NodesCoordinates=[[0. 0. 0.]
 [0. 1. 0.]
 [1. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 1.]
 [1. 1. 1.]
 [1. 0. 1.]]

Let me dig into your last post


In addition, from the doc:


Each token is an integer value whose type is specified by “header_type” at the top of the file (UInt32 if no type specified). The token meanings are:
[#blocks] = Number of blocks
[#u-size] = Block size before compression
[#p-size] = Size of last partial block (zero if it not needed)
[#c-size-i] = Size in bytes of block i after compression
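As a concrete check, these tokens can be mapped onto the single-block header from the top of the thread (a standalone sketch):

```python
import numpy as np
from base64 import b64decode

# header of the "Points" array from the first post
header = np.frombuffer(b64decode(b"AQAAAACAAAAwAAAAEQAAAA=="), dtype="uint32")
n_blocks, u_size, p_size, c_size_0 = header
print(n_blocks)  # 1     -> [#blocks]: one block
print(u_size)    # 32768 -> [#u-size]: block size before compression
print(p_size)    # 48    -> [#p-size]: size of the last (partial) block
print(c_size_0)  # 17    -> [#c-size-0]: compressed size of block 0
```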

Okay, I think I’ve got it now. Not sure if this is the best option though.

I am assuming UInt32 headers here, for simplicity. As an example I used np.arange(1_000, dtype=np.float64) with blocks of size 1024 bytes, to keep the XML “small”.

import numpy as np
import zlib
from base64 import b64encode, b64decode

# the example array here is np.arange(1_000, dtype=np.float64) with same endianness
# of the system, i.e. dtype='=f8'
# Information (or "header") is:
# [   8, 1024,  832,  210,  172,  188,  188,  204,  205,  204,  177]
# I used 1024 byte blocks to keep the XML example here "small"

Xml_file_extract = '_CAAAAAAEAABAAwAA0gAAAKwAAAC8AAAAvAAAAMwAAADNAAAAzAAAALEAAAA=eJxNyTdOHQAQRdFXuqSgcEGBLIQQQghMTv6fnLHJGbYyS5sleQkU/xRMc+bpJt/v/8AzHPGDYxznT05wkr84xWnOcJZznOcCF/mbS1zmCle5xnVucJNb3OYO/3DA4chidm1mz2b2bebAZg5t5shmjm3mxGZObebMZs5t5sJmLm3mymb+2sy/kUMWm7nWWWzmRmexmVudxWbudBabuddZbOZBZ7GZR53FZp50Fpt51lls5kVnsZlXncVm3nQWm3nXWWzmQ2exmU+dxf4cfgGsFmSQeJwty0tRJQAMAMFIQUqkhP0Bu7DPQqRESqRECgXVc+nTRHxV+c0Tk8XmcHmMZz+TxeZweYwffiaLzeHyGD/9TBabw+UxfvmZLDaHy2P89jNZbA6Xx/jjZ7LYHC6P8eJnstgcLo/x6mey2Bwuj/HmZ7LYHC6P8dfPZLE5XB7jn5/JYnO4PMa7n8lic7g8xoefyWJzuDzGfz+TxeZweYyHn8lic7i8R34CTH+LwXicLc1JcUUhAABBJEQCEpCABCQggaz/igQkPAlIQAISkICEVFI9lz5OCH+1/M8bIxMzCysbOwcfTi5uHl6Gd39GJmYWVjZ2Dj6cXNw8vAwf/oxMzCysbOwcfDi5uHl4GT79GZmYWVjZ2Dn4cHJx8/AyfPkzMjGzsLKxc/Dh5OLm4WX49mdkYmZhZWPn4MPJxc3Dy/Djz8jEzMLKxs7Bh5OLm4eX4eXPyMTMwsrGzsGHk4ubh/eVfwHljZXBeJwtzTtxQCEAAEEkRAISkIAEJCCBMv8g4UlAAhKQgAQkICGTzF6z5YXw12v+54WRiZmFlY2dDwcnFzcPL8ObPyMTMwsrGzsfDk4ubh5ehnd/RiZmFlY2dj4cnFzcPLwMH/6MTMwsrGzsfDg4ubh5eBk+/RmZmFlY2dj5cHBycfPwMnz5MzIxs7CysfPh4OTi5uFl+PZnZGJmYWVj58PBycXNw8vw48/IxMzCysbOh4OTi5uH9yf/At4FmcF4nC3FEXSEAAAA0PAwPAzDMAzDMAzDMGybhGF4eBiGYRiGYRiGe9v9Lz8I/nTZfw+Hfjpy7MSpM+cuXLpy7catO/cePPrltyfPXrx68+7Dpy/fDr4+Pxz66cixE6fOnLtw6cq1G7fu3Hvw6Jffnjx78erNuw+fvnw7+P78cOinI8dOnDpz7sKlK9du3Lpz78GjX3578uzFqzfvPnz68u3g5/PDoZ+OHDtx6sy5C5euXLtx6869B49++e3Jsxev3rz78OnL90/2Cy5fnsF4nC3FEZSDAAAA0DAchmEYhuEwDIdhGAZ3tzAMw3AYhmEYhmE4vHe3/+UHwZ+v/L/QN0eOnTh15tx3Fy79cOXajVt37j149OSXZy9evXn34dOX3w6+P4e+OXLsxKkz5767cOmHK9du3Lpz78GjJ788e/HqzbsPn778dvDzOfTNkWMnTp05992FSz9cuXbj1p17Dx49+eXZi1dv3n349OW3g+fn0DdHjp04debcdxcu/XDl2o1bd+49ePTkl2cvXr159+HTl9/P/Bcqm6DBeJwtxRF0hAAAANAwDA/DMAzDwzAMwzAM2yZhGIZheHgYhmEYhnvb/S8/CP70+X+hIz8cO3HqzLmfLly6cu3GrTv3Hjx68uzFq19+e/Puw6cv3w6+PoeO/HDsxKkz5366cOnKtRu37tx78OjJsxevfvntzbsPn758O/j+HDryw7ETp86c++nCpSvXbty6c+/BoyfPXrz65bc37z58+vLt4Odz6MgPx06cOnPupwuXrly7cevOvQePnjx78eqX3968+/Dpy/dP/gsm16LBeJwtxRGUgwAAANAwDMMwDMMwDIdhGIZhcHcbDofDYRiGYRiGYXjvrv/lB8Gfr+K/0JFjJ06dOXfh0pVvrt24defegx9
++uW3Px49efbi1Zt3Hz4dfF+Hjhw7cerMuQuXrnxz7catO/ce/PDTL7/98ejJsxev3rz78Ong5zp05NiJU2fOXbh05ZtrN27duffgh59++e2PR0+evXj15t2HTwf369CRYydOnTm/F7+p0YK5'

header_bin = b64decode(Xml_file_extract[1::])
header_arr = np.frombuffer(header_bin, dtype='=u4')
header_b64_len = len(b64encode(header_bin))

# compressed array
arr_comp = b64decode(Xml_file_extract[1+header_b64_len::])

float64_size = 8  # in bytes

nb = header_arr[0]  # number of blocks
bs = header_arr[1]  # block size in bytes
ls = header_arr[2]  # size of last block in bytes

# pointer array for each block in arr_comp
ptr_arr = np.insert(header_arr[3::], 0, 0)
ptr_arr = np.cumsum(ptr_arr)

sl_s = bs//float64_size  # slice size (in quantity, not bytes)

# retrieve array
complete_arr = np.empty(((nb-1)*bs+ls)//float64_size, dtype=np.float64)
for i in range(nb-1):
    complete_arr[i*sl_s:(i+1)*sl_s] = np.frombuffer(zlib.decompress(arr_comp[ptr_arr[i]:ptr_arr[i+1]]), dtype='=f8')

complete_arr[(nb-1)*sl_s::] = np.frombuffer(zlib.decompress(arr_comp[ptr_arr[-2]:ptr_arr[-1]]), dtype='=f8')

# complete_arr should be the same as np.arange(1_000, dtype=np.float64)

I’ll have a look at your code tomorrow (too late for today); in the meantime I found some things in the Meshio project (see file) that, in addition to your file, might give additional information.

Some code I need to have a look.

Thanks for your support, remain in touch


# -*- coding: utf-8 -*-

import base64
import lzma
import zlib

import numpy as np

vtu_to_numpy_type = {
    "Float32": np.dtype(np.float32),
    "Float64": np.dtype(np.float64),
    "Int8": np.dtype(np.int8),
    "Int16": np.dtype(np.int16),
    "Int32": np.dtype(np.int32),
    "Int64": np.dtype(np.int64),
    "UInt8": np.dtype(np.uint8),
    "UInt16": np.dtype(np.uint16),
    "UInt32": np.dtype(np.uint32),
    "UInt64": np.dtype(np.uint64),
}
numpy_to_vtu_type = {v: k for k, v in vtu_to_numpy_type.items()}

def num_bytes_to_num_base64_chars(num_bytes):
    # Rounding up in integer division works by double negation since Python
    # always rounds down.
    return -(-num_bytes // 3) * 4

def read_data(self, c):
    fmt = c.attrib["format"] if "format" in c.attrib else "ascii"

    data_type = c.attrib["type"]
    try:
        dtype = vtu_to_numpy_type[data_type]
    except KeyError:
        raise ValueError(f"Illegal data type '{data_type}'.")

    if fmt == "ascii":
        # ascii
        if c.text.strip() == "":
            data = np.empty((0,), dtype=dtype)
        else:
            data = np.fromstring(c.text, dtype=dtype, sep=" ")
    elif fmt == "binary":
        reader = (
            self.read_uncompressed_binary
            if self.compression is None
            else self.read_compressed_binary
        )
        data = reader(c.text.strip(), dtype)
    elif fmt == "appended":
        offset = int(c.attrib["offset"])
        reader = (
            self.read_uncompressed_binary
            if self.compression is None
            else self.read_compressed_binary
        )
        assert self.appended_data is not None
        data = reader(self.appended_data[offset:], dtype)
    else:
        raise ValueError(f"Unknown data format '{fmt}'.")

    if "NumberOfComponents" in c.attrib:
        nc = int(c.attrib["NumberOfComponents"])
        try:
            data = data.reshape(-1, nc)
        except ValueError:
            name = c.attrib["Name"]
            raise ValueError(
                f"VTU file corrupt. The size of the data array '{name}' is "
                f"{data.size} which doesn't fit the number of components {nc}."
            )
    return data

def read_compressed_binary(self, data, dtype):
    # first read the block size; it determines the size of the header
    header_dtype = vtu_to_numpy_type[self.header_type]
    if self.byte_order is not None:
        header_dtype = header_dtype.newbyteorder(
            "<" if self.byte_order == "LittleEndian" else ">"
        )
    num_bytes_per_item = np.dtype(header_dtype).itemsize
    num_chars = num_bytes_to_num_base64_chars(num_bytes_per_item)
    byte_string = base64.b64decode(data[:num_chars])[:num_bytes_per_item]
    num_blocks = np.frombuffer(byte_string, header_dtype)[0]

    # read the entire header
    num_header_items = 3 + int(num_blocks)
    num_header_bytes = num_bytes_per_item * num_header_items
    num_header_chars = num_bytes_to_num_base64_chars(num_header_bytes)
    byte_string = base64.b64decode(data[:num_header_chars])
    header = np.frombuffer(byte_string, header_dtype)

    # num_blocks = header[0]
    # max_uncompressed_block_size = header[1]
    # last_compressed_block_size = header[2]
    block_sizes = header[3:]

    # Read the block data
    byte_array = base64.b64decode(data[num_header_chars:])
    if self.byte_order is not None:
        dtype = dtype.newbyteorder(
            "<" if self.byte_order == "LittleEndian" else ">"
        )

    byte_offsets = np.empty(block_sizes.shape[0] + 1, dtype=block_sizes.dtype)
    byte_offsets[0] = 0
    np.cumsum(block_sizes, out=byte_offsets[1:])

    assert self.compression is not None
    c = {"vtkLZMADataCompressor": lzma, "vtkZLibDataCompressor": zlib}[
        self.compression
    ]

    # process the compressed data
    block_data = np.concatenate(
        [
            np.frombuffer(
                c.decompress(byte_array[byte_offsets[k] : byte_offsets[k + 1]]),
                dtype=dtype,
            )
            for k in range(num_blocks)
        ]
    )

    return block_data

This is great! The snippet I showed above does something very similar to what you found in meshio. My ptr_array is the same as their byte_offsets.

But the meshio version is much more general, for all data types and endianness.

Plus, their num_bytes_to_num_base64_chars() function is very interesting. It is a much cheaper approach than header_b64_len = len(b64encode(header_bin)) from my code snippet.

Though I believe their np.concatenate is slower than preallocating the whole array and then populating it in a for loop as I did. A time.perf_counter() comparison would be interesting.
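For anyone curious, the two approaches for computing the base64 length can be checked against each other for a few sizes (a standalone sketch):

```python
from base64 import b64encode

def num_bytes_to_num_base64_chars(num_bytes):
    # ceil(num_bytes / 3) * 4, via double negation in integer division
    return -(-num_bytes // 3) * 4

# compare the arithmetic against actually encoding n zero bytes
for n in (1, 2, 3, 4, 16, 44, 1000):
    assert num_bytes_to_num_base64_chars(n) == len(b64encode(bytes(n)))
print("ok")
```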

Thanks Breno for your contribution.

Your snippet seems to work (tested on both floats and integers) and I think I should be able to generalize it for my purpose (using the Meshio snippet as well).

Most importantly, thanks to your help, I’ve filled in the gaps.

Remaining in touch


Sorry, wrong copy/paste

Just to say that the topic is not so easy, mainly for big data, as illustrated hereafter (look at the header_arr1 and header_arr2 arrays):


import numpy as np
import zlib, sys
from base64 import b64encode, b64decode

### case 1
n1 = 1_000
MyArray1 = np.arange(n1, dtype=np.float64)
print(f"Size Of The Array1 = {sys.getsizeof(MyArray1)}")
Xml_file_extract1 = b64encode(MyArray1)
Xml_file_extract1 = zlib.compress(Xml_file_extract1)

# from breno
header_bin1 = b64decode(Xml_file_extract1)
header_arr1 = np.frombuffer(header_bin1, dtype=np.uint32)
header_b64_len1 = len(b64encode(header_bin1))

### case 2
n2 = 1_000_000
MyArray2 = np.arange(n2, dtype=np.float64)
print(f"Size Of The Array2 = {sys.getsizeof(MyArray2)}")
Xml_file_extract2 = b64encode(MyArray2)
Xml_file_extract2 = zlib.compress(Xml_file_extract2)

header_bin2 = b64decode(Xml_file_extract2)
header_arr2 = np.frombuffer(header_bin2, dtype=np.uint32)
header_b64_len2 = len(b64encode(header_bin2))

(time to stop for today)

Based on the snippet coming from the Meshio project (see code above), I think the issue has been fixed (nevertheless I need to perform additional tests to be sure); it has been tested on meshes of up to 1.2 million quadratic hexes without trouble, and the first results seem consistent (TBC).
Actually, the most disturbing thing for me has been to notice that the number of retrieved characters for the header is larger than the header itself, but since we know the number of blocks, the correct data are obtained.

Hi Paul,

I do not understand what that code snippet has to do with the VTK XML files.

You used Xml_file_extract1 = b64encode(MyArray1), but this is not how the XML is generated. You did not generate any headers, and you compressed the base64 text instead of compressing the raw array before encoding it.

To generate the XML_file_extract1 you have to follow the steps of the solution of this topic.

If you follow those steps, then you will obtain something similar to the Xml_file_extract from this other post I sent you and be able to apply the XML extraction methods we have been discussing.


Hi Breno

Sorry for being unclear (and you’re right, my previous code is wrong); in your snippet, if you change the size of the array (from 1_000 to a larger number), you’ll notice that above a certain value the size of the retrieved array does not change, leading to an error.

In the Meshio snippet, only the header is retrieved


Hi Paul,

Oh, sorry, I did not understand!

I see, so this part of my snippet:

header_bin = b64decode(Xml_file_extract[1::])
header_arr = np.frombuffer(header_bin, dtype='=u4')
header_b64_len = len(b64encode(header_bin))

does not work for larger XML’s, right?

Thank you! I understand now.

Indeed, the meshio snippet is much better, initially only getting the number of blocks and then reading just the header. The num_bytes_to_num_base64_chars() function is brilliant.

Actually the most disturbing for me has been to notice that the number of retrieved characters for the header is larger than the header itself

Yeah, I think this happens because we do not compress the header before writing to the XML file. But in comparison to the whole array, it is still very small, isn’t it? I haven’t tested this.


“does not work for larger XMLs, right?” → Indeed

I was facing an error, and after digging I noticed that decoding the whole binary part does not work above a certain length/size; in other words, blocks are missing and the data cannot be retrieved. Thus it’s necessary to separate the header part from the data part prior to decompressing/decoding the data.

The 2nd problem I had is the size of the header (calculated blocks = 57 but 114 provided, in one example): values suddenly jumped to abnormal ones. If I only retrieve the number of blocks originally calculated, it’s fine. I share your point of view: more characters than needed are retrieved, but we now know how to proceed. :slight_smile:

I analysed the Meshio snippet and, frankly, 1 or 2 lines remain unclear to me, but it works.

Probably the code can be improved.