Working with an HDF file in VTK

I have a 3D image stack, saved on disk as a set of individual 2D image files. I have around 3000 such image files, each about 10 MB, so the total dataset is about 30 GB. Datasets can be larger than that and will not fit in system memory.

I’m currently using the vtkPNGReader class to read these images. I rename my 2D image files using a consistent pattern, so that I can use the SetFilePattern method in that class and the appropriate image is read in as requested.

I want to improve this workflow and manage my dataset better. I’m wondering if the HDF5 format will help me. I can use the h5py library to convert the individual 2D image files into a single large HDF5 file, but I’m not sure how to read it into VTK. I couldn’t find any examples for the vtkHDFReader class. If there is a better file format I should use for this, please let me know.

My current PNG reader code:

reader = vtk.vtkPNGReader()
reader.SetFilePrefix(imagesPath + '\\')
reader.SetFilePattern('%sImage_%05d.png')
reader.SetDataExtent(0, self.extent_x, 0, self.extent_y, 0, self.extent_z)
reader.SetDataSpacing(pixel_size_xy, pixel_size_xy, section_thickness)
reader.SetDataScalarTypeToUnsignedChar()
reader.SetNumberOfScalarComponents(1)

My code to generate the HDF5 file:

with h5py.File(output_file,'w') as h5f:
    
    ds = h5f.create_dataset('images',dtype=np.uint8,shape=(len(image_files),image_height,image_width)) 

    # Iterate over the PNG image files and add them to the dataset
    for i, filename in enumerate(tqdm(image_files)):
        filepath = os.path.join(image_dir, filename)
        with Image.open(filepath) as img:
            # Convert grayscale image to numpy array and save it to the HDF5 dataset
            img = img.convert("L")
            img = np.asarray(img)
            ds[i,:,:] = img

Any help would be appreciated. Thanks!

Hello,

Better for what? Fetch speed? Memory footprint? Disk usage? …

HDF5 is a file format designed for efficient I/O of large data sets like yours. It’s widely used in scientific applications, but it won’t optimize memory usage during rendering, if I understood your goal correctly.

take care,

PC

Hi Paulo,

Thanks for your response. Instead of handling individual 2D image files, I want to work with a file format that can hold this data in a single file or significantly fewer files. This is not a cloud-based application, so I don’t think the Zarr format is a good fit, and I don’t think VTK’s Python bindings support Zarr yet.

The rendering in my application only displays a single 2D slice from the stack at any time. The PNGReader I have now works fine in terms of speed when used along with an ExtractVOI filter. I just want to see if there is a way to manage the files more easily and move to a newer file format.

It looks like the vtkHDFReader implementation works with VTKHDF format files; I’m trying to figure out how to create such a file sequentially, like I’ve shown in my question.

Thanks!

Hi @Chaitanya_Kolluru,

Thanks for the question!

Indeed, VTKHDF was designed just for these types of applications. You can take a look at the documentation here: https://kitware.github.io/vtk-examples/site/VTKFileFormats/#hdf-file-formats.

It sounds like what you may be looking for is the ImageData format. If you can put your data into an HDF5 file respecting the VTKHDF format, you should be able to read it using the vtkHDFReader and execute a rendering pipeline.
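
For reference, here is a minimal sketch of the group/attribute layout the VTKHDF ImageData documentation describes, written with h5py. The dimensions and the array name "Intensity" are placeholders; treat this as a starting point rather than a verified writer:

import h5py
import numpy as np

nx, ny, nz = 64, 64, 16  # placeholder dimensions

with h5py.File("example.hdf", "w") as f:
    root = f.create_group("VTKHDF")
    root.attrs.create("Version", [1, 0])
    root.attrs.create("Type", np.string_("ImageData"))
    root.attrs.create("WholeExtent", (0, nx - 1, 0, ny - 1, 0, nz - 1))
    root.attrs.create("Origin", (0.0, 0.0, 0.0))
    root.attrs.create("Spacing", (1.0, 1.0, 1.0))
    root.attrs.create("Direction", (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0))

    point_data = root.create_group("PointData")
    point_data.attrs.create("Scalars", np.string_("Intensity"))
    # the dataset is laid out (z, y, x), matching the WholeExtent above
    point_data.create_dataset("Intensity", data=np.zeros((nz, ny, nx), dtype=np.uint8))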

Hope that helps!

Best,

Julien

Hi,

Please try to solve one problem at a time. First, get HDF5 I/O working by following @jfausty’s directions. Then work on how to render a stack of 2D images as a volume.

regards,

PC

Hi Julien,

Thanks for your response. From the link you’ve provided, I was able to find an example VTKHDF dataset, mandelbrot-vti.hdf, in the VTK build directory. The Type attribute in its VTKHDF group is ImageData, so I think this is a good example file for my scenario.

The PointData group within the VTKHDF group has 8 datasets in this case. I’m attaching a screenshot from the HDFView app listing all of these datasets. What I’d like to do is set up a vtkHDFReader for this file, select the “Iterations” dataset, connect it to a vtkExtractVOI filter, and extract a single z-slice for rendering with a vtkImageActor.

The code I have is below.

from vtkmodules.vtkCommonColor import vtkNamedColors
from vtkmodules.vtkRenderingCore import (
    vtkRenderWindow,
    vtkRenderWindowInteractor,
    vtkRenderer
)
import vtk

colors = vtkNamedColors()

file_name = 'path/to/mandelbrot-vti.hdf'

# Read the source file.
reader = vtk.vtkHDFReader()
reader.SetFileName(file_name)
reader.Update()    
  
imageXY = vtk.vtkExtractVOI()
imageXY.SetInputData(reader.GetOutput())
imageXY.SetVOI(0, 19, 0, 20, 0, 0) # select slice 0
imageXY.Update()

XYSliceActor = vtk.vtkImageActor()
XYSliceActor.SetPosition(-19 * 1/2, -20 * 1/2, 0)
XYSliceActor.GetMapper().SetInputConnection(imageXY.GetOutputPort())
XYSliceActor.Update()
    
ip = vtk.vtkImageProperty()
ip.SetColorWindow(255)
ip.SetColorLevel(128)
ip.SetAmbient(0.0)
ip.SetDiffuse(1.0)
ip.SetOpacity(1.0)
ip.SetInterpolationTypeToLinear()

XYSliceActor.SetProperty(ip)
XYSliceActor.Update()
  
# Create the Renderer
renderer = vtkRenderer()
renderer.AddActor(XYSliceActor)
renderer.ResetCamera()
renderer.SetBackground(colors.GetColor3d('Silver'))

# Create the RendererWindow
renderer_window = vtkRenderWindow()
renderer_window.AddRenderer(renderer)
renderer_window.SetWindowName('ReadImageData')

# Create the RendererWindowInteractor and display the vti file
interactor = vtkRenderWindowInteractor()
interactor.SetRenderWindow(renderer_window)
interactor.Initialize()
interactor.Start()

The render window closes almost instantaneously, and this error is printed to the console.

2023-04-19 11:18:46.424 ( 2.130s) [ ] vtkImageData.cxx:1412 ERR| vtkImageData (000001F402E83290): No Scalar Field has been specified - assuming 1 component!

Could you please let me know what I’m missing here? How do I select the “Iterations” dataset as the output of reader?

Thank you,
Chaitanya

Hi Paulo,

Thanks for the suggestion; it is helpful. I found an example VTKHDF file in the VTK build directory. I’m trying to use that first to see if the rendering I need is achievable. I’ve posted my code above.

Thanks,
Chaitanya

Hi @Chaitanya_Kolluru,

It looks like you might be missing a line like this after your VOI extraction:

outDS = imageXY.GetOutput()
outDS.GetPointData().SetScalars(outDS.GetPointData().GetArray("Iterations"))

That should do the trick.

Also, do not hesitate to look at the examples provided here: https://kitware.github.io/vtk-examples/site/. They are very good quality VTK snippets that you can modify to do your bidding.

Best of luck!

Julien

Hi Julien,

Thanks for that tip. I ran into the same error when I copied both of the lines you provided, but when I instead took reader.GetOutput() and assigned it to outDS, that seemed to work.

However, I’m now running into another issue. When I create the VTKHDF file with 100 PNG image slices as a 3D stack, the rendering works as expected. If I use 500 PNG image slices instead, I get an error from the HDF5 library. My datasets are much larger than that, so this is a concern.

I wrote the script below, which creates a VTKHDF file from a folder containing PNG images and tries to render one slice. If I set the num_images variable to 100, it works; however, I get an error if I change it to 500.

import h5py
from PIL import Image
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
import glob
import vtk

def create_vtkhdf_dataset(output_file, image_dir, image_height, image_width, num_images, pixel_size_xy, pixel_size_z):
    
    with h5py.File(output_file,'w') as hdffile:
        
        # write support data
        vtkhdf_group = hdffile.create_group("VTKHDF")
        vtkhdf_group.attrs.create("Version", [1, 0])
            
        vtkhdf_group.attrs.create("Type", np.string_("ImageData"))
        whole_extent = (0, image_width-1, 0, image_height-1, 0, num_images-1)
        vtkhdf_group.attrs.create("WholeExtent", whole_extent)

        vtkhdf_group.attrs.create("Origin", (0.0, 0.0, 0.0))
        vtkhdf_group.attrs.create("Spacing", (pixel_size_xy, pixel_size_xy, pixel_size_z))
        vtkhdf_group.attrs.create("Direction", (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0))
        
        # create the pointdata group and the dataset inside it
        field_data_group = vtkhdf_group.create_group("PointData")
        field_data_group.attrs.create('Scalars', np.string_("PNGImage"))            
        dset = field_data_group.create_dataset('PNGImage',dtype=np.uint8,shape=(num_images,image_height,image_width)) 

        image_filenames = glob.glob(image_dir + '\\*.png')
        
        # Iterate over the PNG image files and add them to the dataset
        for i, filename in enumerate(tqdm(image_filenames[:num_images])):
            
            with Image.open(filename) as img:
                
                img = np.asarray(img)
                dset[i] = img

# To access individual 2D image slices from the HDF5 file, we can define a method like this:
def get_image_slice(hdf5_file, index):
    
    with h5py.File(hdf5_file, "r") as f:
        img_data = f["VTKHDF"]["PointData"]["PNGImage"][index, :, :]
        return img_data

def render_data(file):
    
    reader = vtk.vtkHDFReader()
    reader.SetFileName(file)
    reader.Update()    
    
    # Instead, this worked when I took the reader output, but only when num_images = 100, not 500
    outDS = reader.GetOutput()
    outDS.GetPointData().SetScalars(outDS.GetPointData().GetArray("PNGImage"))
  
    imageXY = vtk.vtkExtractVOI()
    imageXY.SetInputConnection(reader.GetOutputPort())
    imageXY.SetVOI(0, 2999, 0, 2999, 0, 0)
    
    # This did not work - got the same error 'No scalar Field has been specified - assuming 1 component!'
    # outDS = imageXY.GetOutput()
    # outDS.GetPointData().SetScalars(outDS.GetPointData().GetArray("PNGImage"))

    imageXY.Update()

    XYSliceActor = vtk.vtkImageActor()
    XYSliceActor.SetPosition(-1500, -1500, -500)
    XYSliceActor.GetMapper().SetInputConnection(imageXY.GetOutputPort())
        
    ip = vtk.vtkImageProperty()
    ip.SetColorWindow(255)
    ip.SetColorLevel(128)
    ip.SetAmbient(0.0)
    ip.SetDiffuse(1.0)
    ip.SetOpacity(1.0)
    ip.SetInterpolationTypeToLinear()

    XYSliceActor.SetProperty(ip)
    XYSliceActor.Update()
    
    colors = vtk.vtkNamedColors()
    # Create the Renderer
    renderer = vtk.vtkRenderer()
    renderer.AddActor(XYSliceActor)
    renderer.ResetCamera()
    renderer.SetBackground(colors.GetColor3d('Silver'))

    # Create the RendererWindow
    renderer_window = vtk.vtkRenderWindow()
    renderer_window.AddRenderer(renderer)
    renderer_window.SetWindowName('ReadImageData')

    # Create the RendererWindowInteractor and display the VTKHDF file
    interactor = vtk.vtkRenderWindowInteractor()
    interactor.SetRenderWindow(renderer_window)
    interactor.Initialize()
    interactor.Start()
    
if __name__ == '__main__':
    
    # Define the paths to the PNG images and the output HDF5 file
    # Specify path to the PNG images
    folder_path = r'path\to\data'
    
    image_dir = folder_path
    output_file = folder_path + '\\stack.hdf'  # escape the backslash to avoid an invalid escape sequence
    
    image_height = 3000
    image_width = 3000
    
    # Works when this is 100, but not 500
    num_images = 500
    
    pixel_size = 1
    pixel_size_z = 1
    
    # Create the dataset
    create_vtkhdf_dataset(output_file, image_dir, image_height, image_width, num_images, pixel_size, pixel_size_z)
    
    # The method takes the path to the HDF5 file and the index of the image slice to retrieve. 
    # It returns the image data as a numpy array.
    # Example usage:
    img_slice = get_image_slice(output_file, num_images - 10)
    plt.imshow(img_slice)
    plt.show()
    
    render_data(output_file)

I get the following error when num_images is set to 500.

HDF5-DIAG: Error detected in HDF5 (1.13.1) thread 0:
  #000: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5D.c line 1021 in vtkhdf5_H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5D.c line 970 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLcallback.c line 2079 in vtkhdf5_H5VL_dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLcallback.c line 2046 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLnative_dataset.c line 294 in vtkhdf5_H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dio.c line 262 in vtkhdf5_H5D__read(): can't read data
    major: Dataset
    minor: Read failed
  #006: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dcontig.c line 610 in vtkhdf5_H5D__contig_read(): contiguous read failed
    major: Dataset
    minor: Read failed
  #007: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dselect.c line 465 in vtkhdf5_H5D__select_read(): read error
    major: Dataspace
    minor: Read failed
  #008: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dselect.c line 220 in H5D__select_io(): read error
    major: Dataspace
    minor: Read failed
  #009: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dcontig.c line 934 in H5D__contig_readvv(): can't perform vectorized sieve buffer read 
    major: Dataset
    minor: Can't operate on object
  #010: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VM.c line 1401 in vtkhdf5_H5VM_opvv(): can't perform operation
    major: Internal error (too specific to document in detail)
    minor: Can't operate on object
  #011: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dcontig.c line 739 in H5D__contig_readvv_sieve_cb(): block read failed
    major: Dataset
    minor: Read failed
  #012: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Fio.c line 105 in vtkhdf5_H5F_shared_block_read(): read through page buffer failed
    major: Low-level I/O
    minor: Read failed
  #013: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5PB.c line 718 in vtkhdf5_H5PB_read(): read through metadata accumulator failed
    major: Page Buffering
    minor: Read failed
  #014: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Faccum.c line 252 in vtkhdf5_H5F__accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #015: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5FDint.c line 205 in vtkhdf5_H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #016: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5FDsec2.c line 743 in H5FD__sec2_read(): file read failed: time = Wed Apr 19 20:06:35 2023
, filename = 'D:\data\example\stack.hdf', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 00000237C0218040, total read size = 4500000000, bytes this sub-read = 2147483647, bytes actually read = 18446744073709551615, offset = 4112
    major: Low-level I/O
    minor: Read failed
2023-04-19 20:06:35.254 (  21.489s) [                ]vtkHDFReaderImplementat:942    ERR| vtkHDFReader (00000237BBAD7EA0): Error H5Dread start: 0, 0, 0 count: 500, 3000, 3000
2023-04-19 20:06:35.313 (  21.549s) [                ]       vtkHDFReader.cxx:391    ERR| vtkHDFReader (00000237BBAD7EA0): Error reading array PNGImage
2023-04-19 20:06:35.322 (  21.558s) [                ]       vtkExecutive.cxx:741    ERR| vtkCompositeDataPipeline (00000237B8F5C6B0): Algorithm vtkHDFReader (00000237BBAD7EA0) returned failure for request: vtkInformation (00000237B8D7E810)
  Debug: Off
  Modified Time: 171
  Reference Count: 1
  Registered Events: (none)
  Request: REQUEST_DATA
  FORWARD_DIRECTION: 0
  ALGORITHM_AFTER_FORWARD: 1
  FROM_OUTPUT_PORT: 0


2023-04-19 20:06:35.879 (  22.114s) [                ]       vtkImageData.cxx:1412   ERR| vtkImageData (00000237B3692970): No Scalar Field has been specified - assuming 1 component!

I’m also attaching an example PNG file here, which can be duplicated using shutil.copy2() to quickly create as large an image stack as desired. The folder_path variable needs to be modified to point to that directory if you’d like to test it on your end.

Do you have any thoughts on why this is happening? I really appreciate all your help so far.

Thanks,
Chaitanya

Hi @Chaitanya_Kolluru,

Everything looks in order to me. The only thing I would try is to change this snippet:

        for i, filename in enumerate(tqdm(image_filenames[:num_images])):
            
            with Image.open(filename) as img:
                
                img = np.asarray(img)
                dset[i] = img

with this:

        for i, filename in enumerate(tqdm(image_filenames[:num_images])):
            
            with Image.open(filename) as img:
                
                img = np.asarray(img)
                dset[i, :, :] = img.reshape((image_height, image_width))

Hope that is helpful and good luck with the debugging!

Best regards,

Julien

Hi Julien,

I made those changes but the issue still persists.

If I put 239 or more PNG images into the VTKHDF file, I get the error below; it works fine with up to 238 images. Coincidentally, that is the point at which the created VTKHDF file grows from 1.99 GB to 2 GB. My guess is that a different code path is taken for files larger than 2 GB, and that is what is causing this issue.

HDF5-DIAG: Error detected in HDF5 (1.13.1) thread 0:
  #000: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5D.c line 1021 in vtkhdf5_H5Dread(): can't synchronously read data
    major: Dataset
    minor: Read failed
  #001: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5D.c line 970 in H5D__read_api_common(): can't read data
    major: Dataset
    minor: Read failed
  #002: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLcallback.c line 2079 in vtkhdf5_H5VL_dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #003: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLcallback.c line 2046 in H5VL__dataset_read(): dataset read failed
    major: Virtual Object Layer
    minor: Read failed
  #004: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5VLnative_dataset.c line 294 in vtkhdf5_H5VL__native_dataset_read(): can't read data
    major: Dataset
    minor: Read failed
  #005: C:\glr\builds\vtk\vtk-ci-ext\0\ThirdParty\hdf5\vtkhdf5\src\H5Dio.c line 147 in vtkhdf5_H5D__read(): no output buffer
    major: Invalid arguments to routine
    minor: Bad value
2023-04-20 11:30:01.558 (   8.121s) [                ]vtkHDFReaderImplementat:942    ERR| vtkHDFReader (0000024169171F40): Error H5Dread start: 0, 0, 0 count: 239, 3000, 3000        
2023-04-20 11:30:01.595 (   8.158s) [                ]       vtkHDFReader.cxx:391    ERR| vtkHDFReader (0000024169171F40): Error reading array PNGImage
2023-04-20 11:30:01.606 (   8.169s) [                ]       vtkExecutive.cxx:741    ERR| vtkCompositeDataPipeline (0000024168BB31F0): Algorithm vtkHDFReader (0000024169171F40) returned failure for request: vtkInformation (00000241691E1C80)
  Debug: Off
  Modified Time: 171
  Reference Count: 1
  Registered Events: (none)
  Request: REQUEST_DATA
  FORWARD_DIRECTION: 0
  ALGORITHM_AFTER_FORWARD: 1
  FROM_OUTPUT_PORT: 0


2023-04-20 11:30:02.138 (   8.701s) [                ]       vtkImageData.cxx:1412   ERR| vtkImageData (0000024168FD4650): No Scalar Field has been specified - assuming 1 component!

Can you let me know what you think? If there is another file format for image data that wouldn’t have this issue, I can try that. Since the full stack exceeds my system memory, converting from PNG into the new format has to be done sequentially, one image at a time, as we did here. The format should also support loading a single slice for rendering without reading in the full stack.
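
For reference, loading a single slice without touching the rest of the stack is already possible by reading the hyperslab directly with h5py and wrapping it in a vtkImageData, bypassing vtkHDFReader. A rough sketch, where the dataset path matches the script above and the helper name and spacing values are just illustrative:

import h5py
import numpy as np
import vtk
from vtkmodules.util.numpy_support import numpy_to_vtk

def read_single_slice(hdf5_file, z_index, spacing=(1.0, 1.0, 1.0)):
    with h5py.File(hdf5_file, "r") as f:
        # h5py reads only the requested hyperslab, so memory use stays at one slice
        slice_2d = f["VTKHDF"]["PointData"]["PNGImage"][z_index, :, :]

    height, width = slice_2d.shape
    image = vtk.vtkImageData()
    image.SetDimensions(width, height, 1)
    image.SetSpacing(*spacing)

    # VTK expects x to vary fastest; a C-ordered (row, column) array flattens correctly
    scalars = numpy_to_vtk(slice_2d.ravel(), deep=True, array_type=vtk.VTK_UNSIGNED_CHAR)
    scalars.SetName("PNGImage")
    image.GetPointData().SetScalars(scalars)
    return image

The returned image could then be handed to the existing actor with XYSliceActor.GetMapper().SetInputData(...), but I’d prefer to stay on the vtkHDFReader path if possible.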

Thanks,
Chaitanya

Hi @Chaitanya_Kolluru,

Some OSes and file systems have a 2 GB file size limit. This could be part of the issue, but I am surprised you don’t get an error when writing the file in that case.

It could also be that the reader tries to read all the data in a single call and the OS or underlying C library has a 2 GB cap on one read. I would be surprised if HDF5 did not already have a solution for this, though.
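
If that single-read theory is right, one thing that might be worth testing (a guess on my part, not a confirmed fix) is writing the dataset chunked, one z-slice per chunk, so that HDF5 stores and reads the data in small pieces rather than one contiguous block. Only the create_dataset call in your script would change:

        dset = field_data_group.create_dataset(
            'PNGImage',
            dtype=np.uint8,
            shape=(num_images, image_height, image_width),
            chunks=(1, image_height, image_width),  # one 3000 x 3000 slice (~9 MB) per chunk
        )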

Best regards,

Julien

Thanks, Julien. I’m on Windows 10 (64-bit) currently and can create a VTKHDF file as large as 8 GB (and possibly larger) without an issue. Do you have any thoughts on how I can move forward? I’m assuming this may be a common issue for others working with large data in this format as well.

Should I create an issue on the VTK GitLab site, or should this go to an HDF file format discussion board?

Thanks,
Chaitanya

Hi @Chaitanya_Kolluru,

Please do create an issue on the VTK GitLab repository. If you can share the data and the scripts to reproduce the failure, that would be great.

Don’t hesitate to tag me on it and I will try to look into it when I have some time.

Best regards,

Julien

Sounds good, I’ve created https://gitlab.kitware.com/vtk/vtk/-/issues/18956.

Thanks for taking the time to look into this, really appreciate it.
