Support for pathlib

Hi all,

Are there any plans to add pathlib support for the Python wrappers?

(I was advised to ping @dgobbi about this.)

No, there are no plans. I’m not sure what pathlib has to do with VTK? What VTK-related problems does it solve?

Hi @dgobbi, thanks for your reply.

It’s a built-in Python module that helps with path handling. It was introduced in Python 3.4, for which VTK has support, so it’s indirectly related to VTK through the Python wrappers. It wouldn’t solve VTK-related problems, but it would improve Python support.

This is an example from the Slicer forum before and after using pathlib:

# Before: os.walk and os.path
import os
myDir = '/tmp/images'
for root, dirs, files in os.walk(myDir, topdown=False):
    for name in files:
        _, ext = os.path.splitext(name)
        if ext == '.nrrd':
            loadVolume(os.path.join(root, name))

# After: pathlib
from pathlib import Path
myDir = Path('/tmp/images')
for file in myDir.glob('**/*.nrrd'):
    loadVolume(str(file))  # need str until pathlib is supported

Here are some large Python libraries getting adapted to pathlib:

nibabel (23/10/2019)
pandas (10/9/2015)
numpy (6/10/2015)
Pillow (3/8/2015)
matplotlib (18/7/2016)

Related Python Enhancement Proposals (PEP):
PEP 428 – The pathlib module – object-oriented filesystem paths
PEP 519 – Adding a file system path protocol

Some more context on the advantages of pathlib: https://realpython.com/python-pathlib/#the-problem-with-python-file-path-handling

I hope that makes sense and is a bit more clear than my previous post.

I believe you have a bug in your example, as str() will format a bytes-containing object into a nonsensical file name, e.g., "b'/some/where'". That last line should read

    loadVolume(os.fspath(file))

Given that, I’m guessing what you’re looking for is a change to the generated wrappers that automatically calls os.fspath for all path-like arguments as described in PEP 519?

If so, have you made a list of the touch points within the VTK source code where this would be applied? That seems like it would be a good first step towards seeing if your idea is feasible.

Taking a cursory look at some of the code that opens files, I see a lot of generic const char * declarations for file name parameters, meaning that this couldn’t be automated substantially, so would entail reading through all of VTK and somehow building a table of signatures to indicate where to call os.fspath and where not to (which seems like a huge amount of work).

Hi @efahl, thanks for replying.

Path.glob returns Path objects, so I’m not sure what you mean. That code works fine on my computer.

To be clear, I’m a very basic VTK user. I know very little C++ and don’t know how the wrappers work. I was just curious about whether this had ever been discussed, or whether some of the main developers thought it would be a good feature to add. I don’t think this is an essential feature or that it solves any VTK problems. I asked something similar on the Slicer forum, where it was suggested that I extend my question to VTK and PythonQt. And here I am, because I thought it would be helpful for the three libraries to have pathlib support. I understand that if this would need a big effort, it’s just not worth it.

So to answer your question, I haven’t made the list of touch points. I wouldn’t know where to start! I just searched for SetFileName but got overwhelmed by the results. I think there’s really not much I can do to help here, unfortunately.

Correct for your specific use, but generic os.PathLike objects can contain str or bytes, depending on how you got your path object, so if you just do str(path), it could produce an unexpected result. The point of os.fspath() is to avoid this issue by looking at the type of path and “doing the right thing”.
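To make the difference concrete, here is a minimal sketch. The `BytesPath` class is a hypothetical path-like object invented for illustration; it stands in for any object whose `__fspath__` returns bytes.

```python
import os
from pathlib import PurePosixPath


class BytesPath:
    """Hypothetical path-like object whose __fspath__ returns bytes."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw


raw = b"/some/where"

# str() on bytes yields the repr, not a usable file name.
mangled = str(raw)

# os.fspath() returns the underlying str or bytes unchanged.
from_bytes_obj = os.fspath(BytesPath(raw))
from_pure_path = os.fspath(PurePosixPath("/some/where"))

print(mangled)         # b'/some/where'  (a str that includes the b'' wrapper)
print(from_bytes_obj)  # b'/some/where'  (a real bytes object)
print(from_pure_path)  # /some/where
```

Note that `os.fspath()` does not decode bytes; it only unwraps the object, so the caller still has to decide how to handle a bytes result.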

So, if this goes anywhere, it would be best to do it once in a way that handles not just pathlib.Path, but anything that implements the __fspath__ protocol (I’m thinking specifically of os.scandir and its os.DirEntry objects).
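As a rough illustration of why the protocol matters more than the class, the sketch below passes both `os.DirEntry` and `pathlib.Path` objects through `os.fspath()`. The temporary directory and the file name `volume.nrrd` are made up for the example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "volume.nrrd").touch()

    # os.scandir yields os.DirEntry objects, which implement __fspath__.
    with os.scandir(tmp) as entries:
        from_scandir = [os.fspath(entry) for entry in entries]

    # Path.glob yields pathlib.Path objects, which implement it too.
    from_glob = [os.fspath(p) for p in Path(tmp).glob("*.nrrd")]

# Both lists hold plain str paths to the same file.
print([os.path.basename(p) for p in from_scandir])
print([os.path.basename(p) for p in from_glob])
```

Code that calls `os.fspath()` never needs to import pathlib or check for `os.DirEntry` explicitly.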

Sort of an aside…
Back when PEP 519 was being finalized, I converted a large path handling library to support it, and found that since we dealt with VTK on the back end, I had to further transform the output of os.fspath() with something like this, as we were getting some wacky bytes paths for files that had names in, if I remember right, Chinese. This ensured that the C++ code only ever got UTF-8 and made everything happy.

import os
from os import PathLike

def _fspath(path: PathLike) -> str:
    _path = os.fspath(path)
    return _path.decode('UTF-8') if isinstance(_path, bytes) else _path

Pathlib support would make Python-wrapped VTK a bit more modern. Personally, I’m not fond of pathlib, because it brings in a little convenience, but also makes things a little opaque and causes friction with libraries that do not support it. However, pathlib seems to have a growing user base that is increasingly demanding support for it everywhere (and I admit that if path objects were universally used in all Python libraries then it would be better than using strings). Since VTK is an actively maintained and widely used library, it will have to implement path object support at some point.

I assume that it could be implemented similarly to other wrapper hints. For example this is an existing hint:

  double* GetTuple4(vtkIdType tupleIdx)
    VTK_EXPECTS(0 <= tupleIdx && tupleIdx < GetNumberOfTuples())
    VTK_SIZEHINT(4); 

and this could be a new hint that would tell the VTK wrapper to add some extra code to allow conversion to/from path-like objects:

  void SetFileName(const char* fname) VTK_EXPECTS_PATHLIKE(fname);
  const char* GetFileName() VTK_PATHLIKE();

@fepegar When these libraries return a path, do they return it as a string or as a path object? If they return a path object, how did they make the transition from returning a string (wasn’t it a breaking change)?


I can’t really think of an example of any of these libraries ever returning a path. The closest could be for example the filename property of a Pillow image, which returns a string. I suspect most of these libraries convert the path to a string as soon as it is passed to them. For example, NiBabel has a stringify function which they use for loading and saving.

It would help if you could do some more investigation into what the best practices are for this, because VTK clearly needs to store and return paths in many cases. If a library only accepts path objects but then returns paths as strings, then I would consider the library only to “tolerate” path objects, not really support them.

Since the SetFileName() methods are used so much more often than the GetFileName(), I think it’s reasonable to take a pragmatic approach and only modify SetFileName(). Basically the idea is just to make VTK a little more convenient to use.

In any case, the wrappers are not going to import pathlib at the very low level of individual class wrapping. That would be an invasive change, and there’s no precedent for it. On the other hand, it’s straightforward to make SetFileName() look for an __fspath__ attribute, either with manual or automatic hinting.
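The idea of checking for `__fspath__` without importing pathlib can be sketched in pure Python. This is only an illustration of the behavior, not how the generated wrappers are written: `FakeReader` and `to_file_name` are hypothetical names, and decoding bytes as UTF-8 is an assumption borrowed from the helper posted earlier in this thread.

```python
import os
from pathlib import PurePosixPath


def to_file_name(path) -> str:
    """Return a str file name for a str, bytes, or path-like argument."""
    # Duck-typed check: no dependency on pathlib itself.
    if hasattr(path, "__fspath__"):
        path = os.fspath(path)
    # Assumption: bytes paths are UTF-8 encoded.
    if isinstance(path, bytes):
        path = path.decode("utf-8")
    if not isinstance(path, str):
        raise TypeError(
            f"expected str, bytes, or os.PathLike, not {type(path).__name__}"
        )
    return path


class FakeReader:
    """Hypothetical stand-in for a wrapped VTK reader; not a real VTK class."""

    def __init__(self) -> None:
        self._file_name = None

    def SetFileName(self, fname) -> None:
        self._file_name = to_file_name(fname)

    def GetFileName(self):
        # Always returns a str, matching the "setter only" approach.
        return self._file_name


reader = FakeReader()
reader.SetFileName(PurePosixPath("/tmp/images/head.nrrd"))
print(reader.GetFileName())  # /tmp/images/head.nrrd
```

With this approach existing code that passes strings keeps working, and `GetFileName()` still returns a string, so nothing changes for callers that never use path objects.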


It seems to be the case that these libraries just tolerate pathlib then, as all they do is convert to a string as soon as they can. I’m not sure it makes sense to modify current implementations to be based on pathlib. But if I had to, for example, write a library to read DICOM, I think it would be very handy. But that’s a different issue.

Here are the aforementioned libraries tolerating a path-like object:

I agree, I think modifying SetFileName would be the change we are after.


What about vtkStringArray, which is often used to store a list of filenames? Have you (or other users you are aware of) run into situations where you tried putting a Path object into a vtkStringArray?

No. But I think in that case it would make sense that only strings are accepted as input, as it’s not a vtkPathArray.

I’ve submitted #18120 to get this on the tracker. No timeline, though.
