vtkHyperTreeGridSource: HTG construction from a large descriptor?

Hello!

I’ve been prototyping some implementations of a custom reader for our in-house TB-AMR data format to expose the data as a vtkHyperTreeGrid. Construction of the vtkHyperTreeGrid looks to work fine, albeit a bit slow (using Python as shown in the tutorial implementation). There was an old suggestion for constructing a vtkHyperTreeGrid quickly by using vtkHyperTreeGridSource to build the grid from a descriptor. This works fine to create a grid with a descriptor length of ~300k: the descriptor is decently fast to build when walking the grid, and the vtkHyperTreeGridSource also builds the grid much faster than just walking the tree.
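For context, the string descriptor lists cells level by level: `R` marks a refined cell, `.` marks a leaf, and `|` separates levels. Below is a minimal sketch of building such a string breadth-first. The nested-tuple toy tree and the `children_of` callback are illustrative assumptions, not part of VTK or of the reader discussed here. The children of a refined cell are assumed to be listed with x varying fastest.

```python
def build_descriptor(roots, children_of):
    """Build a level-by-level 'R'/'.' descriptor string.

    roots       -- iterable of level-0 cells, in tree order
    children_of -- hypothetical callback returning the children of a
                   cell (an empty sequence for a leaf)
    """
    levels = []
    current = list(roots)
    while current:
        chars = []
        next_level = []
        for node in current:
            kids = list(children_of(node))
            if kids:
                chars.append("R")        # refined cell
                next_level.extend(kids)  # visit its children on the next level
            else:
                chars.append(".")        # leaf cell
        levels.append("".join(chars))
        current = next_level
    return "|".join(levels)


# Toy 2D example with branch factor 2 (4 children per refined cell).
# A leaf is an empty tuple; a refined cell is a tuple of its children.
leaf = ()
tree0 = (leaf, (leaf, leaf, leaf, leaf), leaf, leaf)
tree1 = leaf

descriptor = build_descriptor([tree0, tree1], lambda node: node)
print(descriptor)
```

The sketch emits no spaces inside a level, so the string stays valid regardless of how the parser treats separators.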

But, when I try to use the same method with a full-scale descriptor of some 11M characters, the pipeline will hang. Walking the tree works for the same data (… and takes some 90s, which is not great, but not terrible). Is there some innate issue with passing large descriptors to vtkHyperTreeGridSource from Python? I’m using vtk 9.4.0.

@Charles_Gueunet @Louis_Gombert wdyt ?

Dear @alhom ,

The string descriptor mode of the vtkHyperTreeGridSource was initially introduced for testing, to be able to quickly create an HTG with a given structure. It is not meant to serialize a large data structure.
If you want to create a full-fledged HTG, you may want to use a bit array directly (see SetDescriptorBits for more information).
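A rough sketch of the bit-array route, for illustration only. It assumes the bit descriptor uses the same breadth-first order as the string form, with 1 for a refined cell and 0 for a leaf. The helper names are hypothetical. The identity `SetLevelZeroMaterialIndex` array follows the pattern used in VTK's own tests; whether it is strictly required has not been verified here.

```python
def descriptor_to_bits(descriptor):
    """Map 'R' -> 1 and '.' -> 0, skipping '|' and space separators."""
    bits = []
    for ch in descriptor:
        if ch == "R":
            bits.append(1)
        elif ch == ".":
            bits.append(0)
        elif ch in "| ":
            continue
        else:
            raise ValueError(f"unexpected descriptor character: {ch!r}")
    return bits


def count_trees(dimensions):
    """Number of level-0 cells; dimensions are point counts per axis."""
    n = 1
    for d in dimensions:
        n *= max(d - 1, 1)
    return n


def make_htg_source(bits, dimensions, max_depth, branch_factor=2):
    """Configure a vtkHyperTreeGridSource from a list of 0/1 values."""
    import vtk  # imported lazily so the helpers above work without VTK

    desc = vtk.vtkBitArray()
    desc.SetNumberOfValues(len(bits))
    for i, b in enumerate(bits):
        desc.SetValue(i, b)

    # Assumption: identity mapping of level-0 cells, as in VTK's tests.
    zero = vtk.vtkIdTypeArray()
    for i in range(count_trees(dimensions)):
        zero.InsertNextValue(i)

    source = vtk.vtkHyperTreeGridSource()
    source.SetDimensions(*dimensions)
    source.SetBranchFactor(branch_factor)
    source.SetMaxDepth(max_depth)
    source.SetUseDescriptor(False)  # use DescriptorBits, not the string
    source.SetLevelZeroMaterialIndex(zero)
    source.SetDescriptorBits(desc)
    return source


bits = descriptor_to_bits("R.|.R..|....")
# With VTK installed, a 2x1 grid of 2D trees could then be built with:
#   source = make_htg_source(bits, (3, 2, 1), max_depth=3)
#   source.Update()
#   htg = source.GetOutput()
```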

Can you clarify why you are using such a string to serialize an HTG? With more context, maybe we could help you find a better way to feed the data to VTK.

Hi!

That’s what I kind of suspected.

What I am trying to do is add a simple, decently fast Python wrapper to expose our home-grown file format as a VTK HTG object (instead of a somewhat clunky VisIt plugin, which is another story). Working from the aforementioned tutorials, I have working skeleton code (all the way down to a ParaView plugin), but it is rather slow to ingest production-scale files (say, ~10M leaf nodes at ~four refinement levels). As above, generating the descriptor and using it seems decently fast on small files, and I would have been pretty glad to have it working quickly even with a bit of a hacky solution :). Any help to better ingest the structure would indeed be appreciated!

Specifically, the context is the VLSV file format, in which the mesh structure is stored as a list of unique, global CellIDs, one for each leaf. I would rather construct a mapping of these CellIDs to the HTG global indices, but I haven’t yet delved deep enough into the HTG spec to do that, nor do I know of a method for constructing the HTG without walking the trees with the VTK wrapper (rather slow) or building the descriptor (surprisingly fast even with just a Python loop).
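To make the leaf-list idea concrete, here is a sketch of deriving descriptor bits from a list of leaf CellIDs. The ID layout is a hypothetical assumption, not taken from the VLSV specification: IDs start at 1, each refinement level occupies a contiguous ID range, and cells within a level are numbered x-fastest on that level's uniform grid, with a branch factor of 2 in 3D. The returned `leaf_order` only records the order in which leaves appear in the descriptor; how that relates to HTG global indices would still need to be checked.

```python
def level_offsets(n0, max_level):
    """First CellID of each level under the assumed scheme."""
    offsets = [1]
    for level in range(max_level):
        offsets.append(offsets[-1] + n0 * 8 ** level)
    return offsets


def decode(cid, dims, offsets):
    """CellID -> (level, i, j, k) on that level's uniform grid."""
    level = max(l for l, off in enumerate(offsets) if off <= cid)
    sx, sy = dims[0] << level, dims[1] << level
    local = cid - offsets[level]
    return level, local % sx, (local // sx) % sy, local // (sx * sy)


def encode(level, i, j, k, dims, offsets):
    """(level, i, j, k) -> CellID, x varying fastest."""
    sx, sy = dims[0] << level, dims[1] << level
    return offsets[level] + i + sx * (j + sy * k)


def parent(cid, dims, offsets):
    level, i, j, k = decode(cid, dims, offsets)
    if level == 0:
        return None
    return encode(level - 1, i // 2, j // 2, k // 2, dims, offsets)


def children(cid, dims, offsets):
    level, i, j, k = decode(cid, dims, offsets)
    return [encode(level + 1, 2 * i + dx, 2 * j + dy, 2 * k + dz, dims, offsets)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]


def bits_from_leaves(leaf_ids, dims, max_level):
    """Return (descriptor bits, leaves in descriptor order)."""
    n0 = dims[0] * dims[1] * dims[2]
    offsets = level_offsets(n0, max_level)

    # Every ancestor of a leaf is a refined cell.
    refined = set()
    for cid in leaf_ids:
        p = parent(cid, dims, offsets)
        while p is not None and p not in refined:
            refined.add(p)
            p = parent(p, dims, offsets)

    # Breadth-first sweep starting from all level-0 cells.
    bits, leaf_order = [], []
    current = list(range(1, n0 + 1))
    while current:
        next_level = []
        for cid in current:
            if cid in refined:
                bits.append(1)
                next_level.extend(children(cid, dims, offsets))
            else:
                bits.append(0)
                leaf_order.append(cid)
        current = next_level
    return bits, leaf_order


# Toy mesh: 2x1x1 level-0 cells, with cell 1 refined once.
leaves = [2, 3, 4, 7, 8, 11, 12, 15, 16]
bits, leaf_order = bits_from_leaves(leaves, (2, 1, 1), max_level=1)
```

Here `dims` counts level-0 cells per axis, not points, and the sweep assumes the leaf list describes a complete, consistent mesh.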

I think for efficiency the best way would be to replace the description + reconstruction method with a direct assignment. This can be done in several ways:

  1. You can update the code on your side to directly export the data array corresponding to the HTG structure (I would recommend this article to get a better understanding of the internal memory layout).
  2. A more generic approach would be to use a conduit descriptor, which is a way to describe your own memory layout in a generic way and let VTK do the translation to its data model. Unfortunately, in its current state VTK handles some AMR data sets, but not the Hyper Tree Grid, so some work would be needed on the VTK side. Do not hesitate to reach out to us about this.

Looking quickly at the VLSV format, it looks like it is meant to describe a regular grid. HTG is mainly relevant when you need adaptive resolution. Is that the case for you?

Yes, VLSV also supports adaptive meshes, and we do use AMR. The file format is optimized for parallel I/O, which it does well… with a bit of a compromise on how easy it is to handle in post :).

Direct export from runtime to HTG would require rewriting the whole MPI-I/O workflow, so I doubt we would be doing that. Saving a descriptor and a file layout mapping for the HTG from runtime is feasible, though, and I’ll likely do just this in post for now. This begins to sound like the conduit descriptors, and they look very promising!

A quick try with SetDescriptorBits indeed worked like a charm. This is very nice; I believe I can now start caching the descriptor and a mapping to the data layout.
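One possible way to cache the two pieces, sketched with NumPy. The file name, the archive keys, and the `save_cache`/`load_cache` helpers are all hypothetical. The descriptor bits are packed eight per byte, and the original bit count is stored so the padding can be trimmed on load.

```python
import os
import tempfile

import numpy as np


def save_cache(path, bits, leaf_order):
    """Store packed descriptor bits and the leaf mapping in one .npz file."""
    bits = np.asarray(bits, dtype=np.uint8)
    np.savez_compressed(
        path,
        packed=np.packbits(bits),  # 8 descriptor bits per byte
        n_bits=bits.size,          # needed to drop the padding bits on load
        leaf_order=np.asarray(leaf_order, dtype=np.int64),
    )


def load_cache(path):
    """Return (bits as a uint8 array of 0/1, leaf_order as int64 array)."""
    with np.load(path) as data:
        n_bits = int(data["n_bits"])
        bits = np.unpackbits(data["packed"])[:n_bits]
        leaf_order = data["leaf_order"]
    return bits, leaf_order


# Round trip with a toy descriptor: one refined cell, nine leaves.
bits = [1, 0] + [0] * 8
leaf_order = [2, 3, 4, 7, 8, 11, 12, 15, 16]

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "htg_cache.npz")
    save_cache(cache_path, bits, leaf_order)
    cached_bits, cached_order = load_cache(cache_path)
```

At 11M descriptor entries the packed array is under 1.5 MB before compression, so re-reading the cache should cost far less than re-walking the mesh.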