Replace unsafe C functions Proposal

There are a couple of categories of unsafe functions that are used throughout VTK/ParaView/kwsys and third-party libraries. For the third-party libraries we can’t do anything, but for VTK/ParaView/kwsys we should replace them with safer alternatives. The discussion about this topic was initiated in the ParaView issue.

The unsafe C functions categories that should be replaced are the following

  1. ato*, e.g. atoi, atof e.t.c.
  2. mem*, e.g. memcmp/memcpy/memset e.t.c.
  3. str*. e.g. strcpy, strcmp, strcat, strlen, strtok e.t.c.
  4. printf, sprintf
  5. fprintf
  6. snprintf

The safe replacements that could be used are:

  1. strto*, e.g. strtol, strtof
  2. mem*_s from the safestringlib
  3. str*_s from the safestringlib, or replace all char* class members with vtkStdString which can be implicited converted to char*.
  4. print (also 35% faster) from the fmt
  5. output_file & print function (also faster by 5 - 9 times) from the fmt
  6. format (also 30% faster) from the fmt

To implement these changes in the existing code base, we need to decide which project will include fmt and/or safestringlib. Since ParaView inherits everything from VTK, we don’t need to add anything to it. As of now VTK has fmt as a third-party library.

Based on my understanding we have the following options:

  1. Add safestringlib to VTK and let kwsys as it is (with the unsafe functions)
  2. Add safestrinblib to kwsys and let kwsys’s printf/sprintf/fprintf.snprintf functions be. This way VTK will inherit safestringlib functionality
  3. Move fmt to kwsys and add safestringlib too to kwsys. This way VTK will inherit fmt and safestringlib functionality

Any suggestions or comments would be more than welcome!

1 Like

memcmp and memset seem fine to me; they have explicit bounds passed in. memcpy has the standard buffer juggling issues though.

std::string and fmt should handle all of the use cases here.

I don’t think the half-way things here are any use; just use std::string (or std::vector<char> for raw bytes) and fmt instead. I don’t think safestringlib is of any interest to C++ codebases.

What about kwsys?

kwsys is fine for string-as-path functions I think. It is, however, still C++ and therefore safestringlib is of limited utility.

What alternative do you propose for memcpy?

std::copy should suffice I imagine.

Careful, memcpy is highly-optimized intrinsic on all compilers. So a blanket conversion to std::copy will impact performance.

While true, I’d rather things be correct and then pointed out “this needs to be faster” than the reverse after a debugging session.

Note there are multiple pitfalls with memcpy:

  • it calculates in bytes, not objects (overflow on the manual multiplication is…a thing)
  • only works on POD types with trivial constructors (other types do not “exist” without their constructor being called; custom copy ctors may also block the destination from being valid objects)
  • does not verify that the source and destination types are compatible
  • alignment fun is probably lurking around here

So after some discussion of how this can go, some things that came up:

Compatibility

So vtkStdString is on my list of things to go, but we can use its drawback to help bridge the compatibility here. It has an implicit const char* conversion operator (very dangerous, but no worse than C pointers anyways). We can use this to our advantage here. Note that classes that use nullptr as some magic value would need something like vtkOptionalString with convenience methods to convert to nullptr if it is none. Converting a char* member to vtkStdString can be kept compatible with:

Old code:

vtkSetStringMacro(Member);
// virtual void SetMember(const char* _arg) { … }
vtkGetStringMacro(Member);
// virtual char* GetMember() { … }

char* Member;

New code:

nullptr is invalid:

void SetMember(const char* _arg) { // Need to handle `nullptr` callers
    if (_arg) {
        this->SetMember(vtkStdString(_arg));
    }
}
void SetMember(vtkStdString _arg); // impl in C++ code (see below)
vtkStdString const& GetMember() const { return this->Member; }

vtkStdString Member;

// --------------

void SetMember(vtkStdString _arg) {
    if (_arg != this->Member) {
        this->Member = std::move(_arg);
        this->Modified();
    }
}

nullptr is valid:

void SetMember(vtkOptionalString _arg); // impl in C++ code (same body as above)
vtkOptionalString const& GetMember() const { return this->Member; }

vtkOptionalString would be another interim class to help bridge the gap here which just turns nullptr into std::none_t. It’s implicit const char* conversion would similarly be deprecated someday (just like vtkStdString's).

vtkSetGet macros

I find these macros to be more noise than useful. About the only thing they do is:

  • avoid typos
  • ensure Modified() is called properly in setters

Other than that, I see no benefit. I think that unit tests are far better for ensuring such things. The benefits:

  • virtual would no longer be used for these things
  • getters could be const
  • getters could return const& instead of forcing a copy
  • setters could std::move for non-POD types (allowing the caller to decide if memory is reused instead of pessimizing and forcing a copy even when the caller doesn’t need the string anymore)
  • headers lose a lot of unnecessary code that triggers a full rebuild when vtkSetGet is changed
  • the member name is no longer pinned to the method name (allowing for more PIMPL usage)

The inline-ness is largely useless for the setters because VTK doesn’t support devirtualization of Modified calls (the use of factory constructors hides necessary knowledge even for vtkNew<vtkObject> obj; obj->SetSomething();), so the code is the same for every single caller and any actual inlining is of no benefit (it just inflates the caller’s code size).

If you are testing for a nullptr why do you need to do an implicit conversion? Why use vtkStdString at all?

There are three things here:

Why test for nullptr

Existing callers can pass nullptr; we need to guard against that.

why do you need to do an implicit conversion

So that the member can be automatically managed instead of being manual (with macro support)

Why use vtkStdString at all?

Because we need to preserve const char* access to the getter (it is currently char*, but actually modifying it bypasses Modified() calls to everyone’s great enjoyment I’m sure; anyone actually doing that is just setting themselves up for failure). Making the return type vtkStdString const& preserves that conversion (auto* s = obj->GetString(); is messed up though…but there’s an easy compatible way to write it too with an explicit const char*).

Because char* management is finicky. Common faults:

  • not releasing the buffer before reuse
  • using the wrong allocation mechanism (vtkSetGet uses new[], so delete[] is needed)
  • forgetting to set it back to nullptr in the destructor

In order to remove our use of C string library functions, not using char* for strings is basically required unless we’re going to pay for a strlen per string_view constructed from one for further manipulations. You may as well just bundle the allocation and length memorization in std::string (of which, vtkStdString is a better compatibility story).

I think that this work needs to be split into two parts.

  1. Replace:
    1. ato*, e.g. atoi, atof e.t.c.
    2. printf, sprintf, snprintf
    3. fprintf, fwrite
    4. scanf
    5. fscanf
  2. Replace
    1. mem*, e.g. memcmp/memcpy/memset e.t.c.
    2. str*. e.g. strcpy, strcmp, strcat, strlen, strtok e.t.c.

The first part will fix half the safety issues and give us greater performance in all vtk writers.
The second part will help us fix the remaining safety issues and completely remove the char* usages, after figuring out the best cost/effective approach (which we are close to do so).

Is it not better to just deprecate the existing Getter and add one with slightly different syntax?
e.g.

std::string const& Member() const { return this->Member; }

Well later on removal of deprecation would require again a lot of work if we add an extra function. That’s one of the downsides.

If the intention is to eliminate vtkStdString then I don’t see how extending its use improves the situation.

The priority here is to remove unsafe C functions. If vtkStdString can help us minimize that cost. And later on, just do using vtkStdString = std::string, it would be ideal.

Sure, but the inconsistency of “some members use Get prefixes, others don’t” would persist without another round of deprecation and renaming on everyone’s plate.

vtkStdString itself wouldn’t change; it provides a nice ramp between the status quo and a better world (without either naming inconsistencies or double renames in the long run).

We should also ban all uses of functions in this manpage (the is* family of ASCII classification functions) because they are UB if the argument is not in [0,127] or EOF itself.

Brought to my attention in this thread.

Cc: @dgobbi

Any suggested replacements?

The “C” locale overloads in C++ are apparently well-behaved. std::isblank(std::locale) - cppreference.com

std::locale c_locale("C");
std::isdigit(c, c_locale);