What is the state of the art for Unicode file names on Windows?

The only place I’m aware of where “char *” is used for data is the interface to vtkCharArray. I don’t know if vtkCharArray is used much outside of legacy code.

The Python wrappers check UTF8 validity via PyUnicode_FromStringAndSize(), which raises a Python exception if the input isn’t valid.
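As a rough Python-side analogue of that check (not the wrapper's actual C code), strict UTF-8 decoding raises `UnicodeDecodeError` on invalid input. The helper name `is_valid_utf8` is hypothetical:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8.

    Hypothetical helper: it mirrors the kind of check that
    PyUnicode_FromStringAndSize() performs when building a str from a
    C char buffer (that call raises UnicodeDecodeError on bad input).
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("三维图片.png".encode("utf-8")))  # True
print(is_valid_utf8("café.png".encode("cp1252")))      # False: lone 0xE9 byte is not valid UTF-8
```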

Python bytes objects are how Python handles strings in other encodings. For example, if someone is using VTK-Python on a Windows box with CP1252 as the active code page, then right now the only way to specify a filename with accented characters is to use Python to generate a bytes() object with CP1252 encoding and pass that bytes() object to SetFileName().
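A minimal sketch of that legacy workaround, assuming a machine whose active code page is CP1252; the filename is hypothetical and the VTK call is left as a comment so the snippet runs without VTK installed:

```python
# Hypothetical filename containing accented characters.
filename = "Résumé_scan.vti"

legacy_bytes = filename.encode("cp1252")  # b'R\xe9sum\xe9_scan.vti'
utf8_bytes = filename.encode("utf-8")     # b'R\xc3\xa9sum\xc3\xa9_scan.vti'

# With a VTK build that hands char* straight to the ANSI C runtime on a
# CP1252 system, only the CP1252 bytes name the intended file, e.g.:
#   reader.SetFileName(legacy_bytes)  # reader: any vtk*Reader (not run here)
print(legacy_bytes)
print(utf8_bytes)
```

The two byte sequences differ, which is why the same Python str cannot work for both conventions at once.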

If VTK goes full UTF8, then this will no longer be necessary, since VTK will always use the Unicode versions of the Windows APIs.
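A library-agnostic sketch of what "using the Unicode versions of the Windows APIs" amounts to: decode the UTF-8 name and pass UTF-16 code units to a wide-character function such as _wfopen or CreateFileW. The helper `to_wide` is hypothetical and only models the conversion step, not VTK's actual implementation:

```python
def to_wide(utf8_name: bytes) -> bytes:
    """Hypothetical stand-in for the UTF-8 -> UTF-16 step performed before
    calling a wide-character Windows API. Returns little-endian UTF-16
    code units without a BOM, which is the layout Windows uses."""
    return utf8_name.decode("utf-8").encode("utf-16-le")


name = "三维图片.png"

# Any Unicode filename survives the UTF-8 -> UTF-16 round trip ...
wide = to_wide(name.encode("utf-8"))
assert wide.decode("utf-16-le") == name

# ... whereas an ANSI code page such as CP1252 cannot represent it at all.
try:
    name.encode("cp1252")
except UnicodeEncodeError as err:
    print("not representable in CP1252:", err.reason)
```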

Great. So there is no impediment to switching to a requirement that all string type parameters are UTF-8 encoded.

@efahl Finally I can invite you to try this again using utf-8 text for file/folder names and building VTK from master. If you can think of any other test cases please try them. Let us know how it goes. :smile:


@toddy, tried it with several files, works great, thanks!

vtkPNGReader (000002508818E8E0)
...
  FileName: 三维图片.png
...
  Internal File Name: 三维图片.png

FYI, since the Windows 10 May 2019 Update (version 1903), the process code page can be set to UTF-8 in the application manifest.

We can use the plain fopen API with UTF-8 file names on Windows the same way as on Mac and Linux. There is no longer any need to convert filenames to UTF-16 (WCHAR) and call _wfopen, or to use vtksys::SystemTools::Fopen. This is convenient because we don’t need to make hundreds of changes in VTK or invasive patches in third-party libraries.

I’ve tested it and it works well. I only had to add the manifest file to the list of source files in CMake's add_executable (in CTK, I’ve added an add_executable_utf8 macro for this).
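For reference, a sketch of a helper that writes such a manifest, using the content from Microsoft's documented activeCodePage example (asm.v1 schema). The function name `write_utf8_manifest` and the file name `app.manifest` are hypothetical; this is not the CTK macro itself:

```python
from pathlib import Path

# Manifest opting the process into the UTF-8 active code page
# (honored on Windows 10 version 1903 and later).
UTF8_MANIFEST = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
"""


def write_utf8_manifest(path: Path) -> Path:
    """Hypothetical build helper: write the manifest so that it can be
    listed among the sources passed to CMake's add_executable()."""
    path.write_text(UTF8_MANIFEST, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_utf8_manifest(Path("app.manifest")))
```

The generated file would then be listed like any other source, e.g. add_executable(MyApp main.cxx app.manifest), where MyApp and main.cxx are placeholders.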


@lassoan The necessary changes have already been made in VTK.

How does the manifest code page setting work for Windows < 10?

The code page setting in the manifest has no effect on Windows versions earlier than Windows 10 version 1903. The application still works there, except that it cannot read or write files that have special characters in their names. Fortunately for us, Microsoft took on the task of migrating users to current Windows versions, so we don’t need to worry about supporting earlier versions.

Thanks for your efforts. Since UTF-8 is not yet the default code page for Windows applications, your changes are still useful, as they make things a bit simpler for VTK users. Maybe in a couple of years UTF-8 will become the default code page on Windows, and then we won’t need a custom compatibility layer anymore (similar to how vtkStdString has served its time and is now ready to be retired).

Has everything been merged into VTK master? I still see lots of plain fopen in VTK master.

Everything has been merged into master. No changes were made to 3rd party libraries.

It is great that MS are finally addressing this issue, but it’s not a silver bullet for legacy systems. 26% of users are still on Windows 7, and I note that the UK National Health Service was substantially affected by the WannaCry ransomware in 2017 because so many of its machines were running Windows XP.

Hello Mr A. Lasso,
I managed to find the macro and the actual manifest file. It seems identical to the MSDN example you linked, except for the schema version: yours uses xmlns="urn:schemas-microsoft-com:asm.v3", while the MSDN one uses xmlns="urn:schemas-microsoft-com:asm.v1".

I'm not sure whether that is causing the following warning I get in Visual Studio:

Warning 81010002: Unrecognized Element "activeCodePage" in namespace "http://schemas.microsoft.com/SMI/2019/WindowsSettings".

Any idea why I am getting this? I am using VS 2017 with the latest CMake and VTK 9.0.1.

P.S. Slicer3D does not support Unicode on Windows when I use the installer.

You need to use a more recent Windows SDK and toolchain version.

The “UTF-8 everywhere” switch happened in Slicer 4.11, so you need to use a Slicer Preview Release.

If you’re using VTK 9 you shouldn’t need the macro at all.

Good point, this may still be useful for applications that must run on older Windows versions.

On current Windows versions, using the UTF-8 code page allows you to make system calls or use any library (not just VTK) without worrying about encodings.

I’m using VTK 9.0.1, but vtkSTLReader still cannot handle Chinese file names on Windows 10 (version 1903).

I’ve just tested whether vtkSTLReader works with arbitrary characters in the filename, and I confirm that it works well. You have probably forgotten to enable the UTF-8 code page in your application manifest. You can check that the code page is set correctly as shown in this test.
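A small Python sketch of the same kind of check: it calls kernel32's GetACP() through ctypes and compares the result with 65001 (CP_UTF8). The function names are hypothetical; note that from a Python script this reports python.exe's code page, not your application's, so in a C++ application you would call GetACP() directly:

```python
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def uses_utf8_code_page(acp: int) -> bool:
    """True if the given active code page (as returned by GetACP) is UTF-8."""
    return acp == CP_UTF8


def current_acp():
    """Return this process's active ANSI code page on Windows, else None.
    Hypothetical helper that calls kernel32's GetACP() via ctypes."""
    if sys.platform != "win32":
        return None
    import ctypes  # ctypes.windll only exists on Windows
    return ctypes.windll.kernel32.GetACP()


acp = current_acp()
if acp is not None:
    print("active code page:", acp, "(UTF-8)" if uses_utf8_code_page(acp) else "(not UTF-8)")
```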

@albert.liu Please post a sample of your code.

VTK 9 can read non-ASCII filenames on any version of Windows. Simply provide the filename as a UTF-8 encoded string.

@lassoan In VTK 9 the code page setting is irrelevant and not required.

UINT activeCodePage = GetACP();
I traced this line in my project, and activeCodePage is 936.

For the VTK 9 library itself the application code page does not matter, but for the application it still does. Since an application needs to use the C runtime and other APIs, not just VTK (for example, to get the filename from a command-line argument, console input, or GUI), you still need to set the application’s code page to UTF-8 to avoid manual conversions between various string encodings.

This means that you are not using the UTF-8 code page. You either need to convert the filename from Simplified Chinese (CP936) to UTF-8 before you pass it to VTK, or change your application manifest to use the UTF-8 code page (because then you already get the filename as UTF-8).
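The first option boils down to decoding with the active code page and re-encoding as UTF-8. A minimal Python sketch, assuming the filename arrives as CP936 bytes; `cp936_to_utf8` is a hypothetical helper name:

```python
def cp936_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode a filename received as CP936 (GBK)
    bytes into the UTF-8 bytes that VTK 9 expects."""
    return raw.decode("cp936").encode("utf-8")


# A filename as an application running with code page 936 would receive it.
raw_name = "三维图片.png".encode("cp936")
utf8_name = cp936_to_utf8(raw_name)

print(raw_name)   # GBK: 2 bytes per Chinese character
print(utf8_name)  # UTF-8: 3 bytes per Chinese character
```

In a C++ application the equivalent is MultiByteToWideChar with code page 936 followed by WideCharToMultiByte with CP_UTF8, which is exactly the manual conversion the UTF-8 manifest setting avoids.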


I am testing Activiz.NET 9.x and found that objReader cannot open a file with a non-ASCII path. After setting a UTF-8 activeCodePage in the manifest, the problem is gone.

However, the same OBJ file can be opened with an old version of Activiz.NET without the activeCodePage setting.

Can you be more specific? What is the path/file name? Which version is the “old” version?

VTK 8.2 and later expect UTF-8 encoded text.
For Activiz, that means setting the UTF-8 code page so that the UTF-16 to UTF-8 conversion happens on Windows.
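To make the encoding mismatch concrete, here is a Python sketch showing that a single path with Chinese characters becomes three different byte sequences depending on the encoding; the sample path is illustrative, and the snippet does not model how Activiz marshals strings internally:

```python
# Sample path with two Chinese characters (backslashes escaped for Python).
path = "d:\\test\\保留\\objs\\test.obj"

utf16 = path.encode("utf-16-le")  # in-memory layout of a .NET/Windows string
utf8 = path.encode("utf-8")       # what VTK 8.2+ expects to receive
cp936 = path.encode("cp936")      # the same path in the CP936 ANSI code page

for label, data in (("UTF-16LE", utf16), ("UTF-8", utf8), ("CP936", cp936)):
    print(f"{label:8} {len(data):3} bytes")
```

Only the UTF-8 sequence is interpreted correctly by a VTK that assumes UTF-8, which is why the conversion from UTF-16 must target UTF-8 rather than the ANSI code page.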

The path name is “d:\test\保留\objs\test.obj”.

When using Activiz.NET 5.8, objReader can open the file without setting activeCodePage.