How to interpret a binary DataArray in VTK XML output

Hello everyone, I have trouble understanding the binary DataArray format.

The manual says that if the format of a DataArray is binary:

The data are encoded in base64 and listed contiguously inside the DataArray element. Data may also be compressed before encoding in base64. The byte-order of the data matches that specified by the byte_order attribute of the VTKFile element.

I cannot fully understand that, so I generated an ASCII file and a binary file for the same model.

ASCII file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
          0 0 0 1 0 0
          1 1 0 0 1 1
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
          0 1 2 3
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
          4
        </DataArray>
        <DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">
          10
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

Binary file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
  <UnstructuredGrid>
    <Piece NumberOfPoints="4" NumberOfCells="1">
      <PointData>
      </PointData>
      <CellData>
      </CellData>
      <Points>
        <DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
          AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
        </DataArray>
      </Points>
      <Cells>
        <DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
          AQAAAACAAAAgAAAAEwAAAA==eJxjYIAARijNBKWZoTQAAHAABw==
        </DataArray>
        <DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
          AQAAAACAAAAIAAAACwAAAA==eJxjYYAAAAAoAAU=
        </DataArray>
        <DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">
          AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL
        </DataArray>
      </Cells>
    </Piece>
  </UnstructuredGrid>
</VTKFile>

Looking at the DataArray elements, and taking the last one as an example, I cannot establish the relationship between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10.

My understanding can be expressed by the following code, but it produces CggAAA==.

#include "base64.h" // https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
#include <cstdlib>  // free
#include <iostream>
int main()
{
    int x = 10;
    int len;

    // first arg: binary buffer
    // second arg: length of the binary buffer in bytes
    // third arg (output): length of the resulting base64 string
    char *ascii = base64((char *)&x, sizeof(int), &len);

    std::cout << ascii << std::endl;
    std::cout << len << std::endl;
    free(ascii);
    return 0;
}

Can someone explain how to convert between the two?
A related topic can be seen in

Thanks for your time.


You will find your answer in IO/Core/vtkBase64Utilities.cxx. Bytes are encoded in groups of three (triplets), two (pairs), or one (singletons): each triplet becomes four characters, and a trailing pair or singleton is padded. A single = at the end of a group means it encodes a pair; the pattern == means it encodes a singleton. Everywhere else, the group encodes a full triplet.

I suppose the reason for this encoding is to be able to store binary data in a text (XML) file that still opens cleanly in a text editor: it uses a set of 64 printable symbols, so no newline, tab, or similar control characters can pop up.

The file I pointed you to allows you to encode and decode data.


I suspect that you see a lot of what looks like extra data because you are using a compressor. In that case, a compression header has to be written at the beginning of each data array. If you call SetCompressor(nullptr) before writing your file, you should get raw binary data, following the encoding rules of vtkBase64Utilities.cxx.


Thanks for sharing, sir.

Now I have a better understanding.

It is a complicated process. I think I should call the VTK API directly in my code rather than writing my own base64 encoder, compressor, and so on.

@qzxuhui If you want to create a custom XML reader/writer, I highly recommend writing classes that inherit from vtkXMLReader and vtkXMLWriter. XML elements are easy to handle, and there are ready-to-use methods that write/read arrays to/from an XML file, including the possibility to read only a given range of indices of an array.

Edit: Plus, all this encoding is hidden under the rug: you don't have to worry about compressed vs. uncompressed data. The only tedious part is that the writer has to handle Appended mode and the inline modes (Binary/ASCII) explicitly.
