Read or split large vtk file for paraview post-processing

ShineSun · July 2, 2024, 3:13am

Hello community and everyone,

Thank you for VTK, a powerful tool for scientific data processing. I use ParaView to display my computational fluid dynamics (CFD) data and it helps me a lot. Now I have run into a problem with a large dataset, and since I am very new to VTK I would appreciate your help.

My software currently generates only legacy ASCII VTK files of the computational data, and these files become very large for complex computations. I now have a data file of around 50 GB to post-process. I reserved 1 TB of RAM on the cluster to avoid running out of memory on large datasets. When I open the data in ParaView and press the “Apply” button, it fails shortly afterwards with the following error:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Loguru caught a signal: SIGABRT
Stack trace:
…
error: exception occurred: Child aborted

This looks like a memory-allocation problem, even though I checked that 800 GB of RAM was still available on the cluster. My VTK file contains 308075091 points, which may be too many for ParaView's legacy VTK reader.
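A back-of-the-envelope estimate helps put that number in perspective. The sketch below assumes the coordinates are stored as 32-bit floats (three per point, as a `POINTS ... float` header declares) and counts nothing else: the cell connectivity, the flow-field arrays and the reader's ASCII parsing overhead all come on top and are not estimated here.

```python
# Rough size of the raw point coordinates only, assuming 3 x float32 per point.
NUM_POINTS = 308_075_091
BYTES_PER_FLOAT32 = 4
INT32_MAX = 2**31 - 1

coord_values = 3 * NUM_POINTS                   # x, y, z for every point
coord_bytes = coord_values * BYTES_PER_FLOAT32  # bytes for the coordinate array

print(f"coordinate values: {coord_values:,}")
print(f"coordinate bytes:  {coord_bytes:,} (~{coord_bytes / 2**30:.2f} GiB)")
print(f"value count fits in a signed 32-bit index: {coord_values <= INT32_MAX}")
```

The coordinates alone come to roughly 3.4 GiB, and the number of coordinate values stays below 2**31 - 1, so the point array by itself is unlikely to exhaust a 1 TB node.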

I then learned that I could split the data file into smaller pieces using Python and the VTK package. Since I have never used VTK on its own, I asked ChatGPT for help and it gave me the following splitting code:

import vtk

def read_vtk_file(filename):
    # Note: this loads the entire file into memory before anything is split.
    reader = vtk.vtkDataSetReader()
    reader.SetFileName(filename)
    reader.Update()
    return reader.GetOutput()

def write_vtk_file(data, filename):
    writer = vtk.vtkDataSetWriter()
    writer.SetFileName(filename)
    writer.SetInputData(data)
    writer.Write()

def split_vtk_file(input_filename, output_filenames):
    data = read_vtk_file(input_filename)
    bounds = data.GetBounds()  # (xmin, xmax, ymin, ymax, zmin, zmax)
    for i, output_filename in enumerate(output_filenames):
        clipper = vtk.vtkClipDataSet()
        clipper.SetInputData(data)

        # Quadrant i of the x-y plane: i % 2 selects the x half,
        # i // 2 selects the y half; the full z range is kept.
        box = vtk.vtkBox()
        box.SetBounds(
            bounds[0] if i % 2 == 0 else (bounds[0] + bounds[1]) / 2,
            (bounds[0] + bounds[1]) / 2 if i % 2 == 0 else bounds[1],
            bounds[2] if i // 2 == 0 else (bounds[2] + bounds[3]) / 2,
            (bounds[2] + bounds[3]) / 2 if i // 2 == 0 else bounds[3],
            bounds[4],
            bounds[5],
        )
        clipper.SetClipFunction(box)
        # vtkBox is negative inside the box and the clipper keeps the positive
        # (outside) region by default, so invert it to keep the quadrant itself.
        clipper.InsideOutOn()
        clipper.Update()

        write_vtk_file(clipper.GetOutput(), output_filename)

input_filename = "…/snapdat_0.010.vtk"
output_filenames = ["part1.vtk", "part2.vtk", "part3.vtk", "part4.vtk"]
split_vtk_file(input_filename, output_filenames)

Running the split code also gives the error:

vtkExecutive.cxx:729 ERR| vtkCompositeDataPipeline (0x1b0b560): Algorithm vtkUnstructuredGridReader (0x1b0a240) returned failure for request: vtkInformation (0x1b0bdf0)
Debug: Off
Modified Time: 308
Reference Count: 1
Registered Events: (none)
Request: REQUEST_DATA
FROM_OUTPUT_PORT: 0
ALGORITHM_AFTER_FORWARD: 1
FORWARD_DIRECTION: 0

This shows that the file is not read successfully, apparently because the dataset is too large to handle. Since the computation takes a very long time and I now only have the VTK data files, I would like to know whether I can split these files further or convert them to another format with VTK. I am currently using ParaView 5.9.0 (64-bit) and VTK 9.3.1. Thank you for your kind help!
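If the full file can be read at all (for example on a machine or build where the reader succeeds), one alternative to vtkClipDataSet is vtkExtractGeometry, which keeps whole original cells instead of cutting them, so no new geometry is created along the cut planes. The sketch below reuses the quadrant layout of the script above; `quadrant_bounds` and `split_by_extraction` are hypothetical helper names, and writing binary .vtu parts instead of legacy ASCII is a choice of this sketch, not something the original script does. It does not avoid the initial full read.

```python
def quadrant_bounds(bounds, i):
    """Return (xmin, xmax, ymin, ymax, zmin, zmax) of x/y quadrant i (0-3).

    i % 2 picks the x half, i // 2 picks the y half; the full z range is kept.
    """
    xmin, xmax, ymin, ymax, zmin, zmax = bounds
    xmid = (xmin + xmax) / 2
    ymid = (ymin + ymax) / 2
    x0, x1 = (xmin, xmid) if i % 2 == 0 else (xmid, xmax)
    y0, y1 = (ymin, ymid) if i // 2 == 0 else (ymid, ymax)
    return (x0, x1, y0, y1, zmin, zmax)


def split_by_extraction(data, output_filenames):
    """Write each x/y quadrant of `data` to its own binary .vtu file."""
    # Imported here so quadrant_bounds() can be used without VTK installed.
    import vtk

    bounds = data.GetBounds()
    for i, output_filename in enumerate(output_filenames):
        box = vtk.vtkBox()
        box.SetBounds(*quadrant_bounds(bounds, i))

        extractor = vtk.vtkExtractGeometry()
        extractor.SetInputData(data)
        extractor.SetImplicitFunction(box)
        extractor.ExtractInsideOn()          # keep cells inside the box
        extractor.ExtractBoundaryCellsOn()   # also keep cells straddling it

        writer = vtk.vtkXMLUnstructuredGridWriter()
        writer.SetFileName(output_filename)
        writer.SetInputConnection(extractor.GetOutputPort())
        writer.SetDataModeToBinary()
        writer.Write()
```

Usage, assuming `data` comes from the `read_vtk_file` function above: `split_by_extraction(data, ["part1.vtu", "part2.vtu", "part3.vtu", "part4.vtu"])`. With ExtractBoundaryCellsOn, cells that straddle a cut appear in both neighbouring parts; turning it off drops them from every part instead.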

Best regards,
Shine Sun

rexthor · July 2, 2024, 7:43pm

Hi and welcome to the forum.

I’m not sure you can break the ASCII VTK files apart into pieces. You might be able to break your structure into pieces and write them out as separate ASCII VTK files, though? You might also look into another easy-to-write data format (like OBJ, PLY, STL). The last two can be binary as well, which cuts down on size a lot.

However, if your software already has enough understanding of VTK to write out legacy VTK ASCII files … perhaps try using the library to write a modern VTK format? If you describe your data, we can probably help you out. Your code snippet appears to be Python, which makes that even easier.
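As a sketch of that suggestion, assuming the legacy file can be read: the snippet below reads a legacy unstructured-grid file with vtkUnstructuredGridReader and rewrites it as an XML .vtu file with raw appended binary data, 64-bit block headers and zlib compression, which ParaView opens directly. `convert_legacy_to_vtu` and `vtu_name` are illustrative names. It still needs the full read to succeed, so it mainly helps for files that can already be loaded, or as a model for having the simulation write this format directly.

```python
from pathlib import Path


def vtu_name(legacy_path):
    """Map e.g. 'snapdat_0.010.vtk' to 'snapdat_0.010.vtu' in the same directory."""
    return str(Path(legacy_path).with_suffix(".vtu"))


def convert_legacy_to_vtu(legacy_path, vtu_path=None):
    """Read a legacy unstructured-grid .vtk file and rewrite it as XML .vtu."""
    import vtk  # imported lazily so vtu_name() works without VTK

    reader = vtk.vtkUnstructuredGridReader()
    reader.SetFileName(str(legacy_path))
    reader.ReadAllScalarsOn()  # read every SCALARS array, not only the first
    reader.ReadAllVectorsOn()  # likewise for VECTORS arrays
    reader.Update()

    writer = vtk.vtkXMLUnstructuredGridWriter()
    writer.SetFileName(vtu_path or vtu_name(legacy_path))
    writer.SetInputConnection(reader.GetOutputPort())
    writer.SetDataModeToAppended()  # binary data block after the XML header
    writer.EncodeAppendedDataOff()  # raw bytes instead of base64
    writer.SetHeaderTypeToUInt64()  # 64-bit block sizes for very large arrays
    writer.SetCompressorTypeToZLib()
    if writer.Write() != 1:
        raise RuntimeError(f"failed to write {writer.GetFileName()}")
    return writer.GetFileName()
```

For example, `convert_legacy_to_vtu("snapdat_0.010.vtk")` would write `snapdat_0.010.vtu` next to the input.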

ShineSun · July 3, 2024, 11:44am

Thank you rexthor for your reply!

I will consider your suggestions. I use open-source software to generate the VTK data and post-process it with ParaView. The software writes ASCII VTK files, which become difficult to handle when the grid gets large. I currently have a data file containing 308075091 points plus the flow-field data. Is the VTK Python package able to read such a large single file? When I read it with

import vtk

def read_vtk_file(filename):
    reader=vtk.vtkDataSetReader()
    reader.SetFileName(filename)
    reader.Update()
    return reader.GetOutput()

it gives the error:

vtkExecutive.cxx:729 ERR| vtkCompositeDataPipeline (0x1b0b560): Algorithm vtkUnstructuredGridReader (0x1b0a240) returned failure for request: vtkInformation (0x1b0bdf0)
Debug: Off
Modified Time: 308
Reference Count: 1
Registered Events: (none)
Request: REQUEST_DATA
FROM_OUTPUT_PORT: 0
ALGORITHM_AFTER_FORWARD: 1
FORWARD_DIRECTION: 0

My VTK data file begins with:

# vtk DataFile Version 2.0
Gerris simulation version 1.3.2 (131206-155120)
ASCII
DATASET UNSTRUCTURED_GRID

POINTS 308075091 float
-0.1 0.075 0.075
......

I have no problem reading smaller files, so it seems that the 308075091 points are what cause the code to abort. Thank you again for your help.
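To see how the big file is laid out without loading it into VTK, a plain-Python scan can stream through it one line at a time and list the section headers (POINTS, CELLS, CELL_TYPES, POINT_DATA, SCALARS and so on) with their counts. The sketch below assumes the version 2.0 legacy layout shown in the header above; the function names are hypothetical. It also flags a `CELLS n size` header whose total integer count exceeds 2**31 - 1, on the unverified assumption that a 32-bit integer somewhere in the reading path could overflow on such a block; a hit is a lead to investigate, not a diagnosis.

```python
from typing import Iterable, List, Tuple

# Legacy-format keywords that start a section header line (data lines are numeric).
SECTION_KEYWORDS = {
    "DATASET", "POINTS", "CELLS", "CELL_TYPES", "POINT_DATA", "CELL_DATA",
    "SCALARS", "VECTORS", "NORMALS", "TENSORS", "FIELD", "LOOKUP_TABLE",
}
INT32_MAX = 2**31 - 1


def scan_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, header) for every section header in `lines`."""
    sections = []
    for lineno, line in enumerate(lines, start=1):
        tokens = line.split()
        if tokens and tokens[0] in SECTION_KEYWORDS:
            sections.append((lineno, line.strip()))
    return sections


def scan_legacy_vtk(path: str) -> List[Tuple[int, str]]:
    """Stream a legacy ASCII .vtk file; only one line is held in memory at a time."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return scan_lines(f)


def oversized_cells(sections: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Return 'CELLS n size' headers whose total integer count exceeds INT32_MAX."""
    flagged = []
    for lineno, header in sections:
        tokens = header.split()
        if tokens[0] == "CELLS" and len(tokens) >= 3 and int(tokens[2]) > INT32_MAX:
            flagged.append((lineno, header))
    return flagged
```

On a 50 GB file the scan is slow but uses almost no memory, e.g. `for lineno, header in scan_legacy_vtk("snapdat_0.010.vtk"): print(lineno, header)`.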
Cheers,
Shine Sun

rexthor · July 3, 2024, 5:03pm

That’s a rather large dataset. It’s hard for me to tell if this is simply your computer running out of memory or some other problem.

ShineSun · July 4, 2024, 12:39am

Thank you, rexthor. I am sure I have plenty of memory to read this dataset. Perhaps a single process is not allowed to allocate that much memory. I will investigate further.
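One quick way to check whether a per-process limit is involved is to print the resource limits the job actually runs under (in the same job and environment as ParaView); `ulimit -v` in the job script reports the same address-space limit. This is a sketch using Python's standard `resource` module (Unix only). Limits that a batch scheduler may enforce through cgroups, such as a job's memory request, are not rlimits and will not show up here, so they need to be checked separately.

```python
import resource


def describe_limit(value):
    """Human-readable form of an rlimit value."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def print_memory_limits():
    # RLIMIT_AS caps one process's virtual address space (what `ulimit -v` sets);
    # RLIMIT_DATA caps its data segment/heap. Either can make allocations fail
    # with std::bad_alloc even when the node still has free RAM.
    for name in ("RLIMIT_AS", "RLIMIT_DATA"):
        limit = getattr(resource, name, None)
        if limit is None:
            continue  # not available on this platform
        soft, hard = resource.getrlimit(limit)
        print(f"{name}: soft={describe_limit(soft)}, hard={describe_limit(hard)}")


if __name__ == "__main__":
    print_memory_limits()
```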