How to return std::string

While reviewing a merge request I noticed that while VTK generally returns std::string by value. This is good, as it allows to keep a stable API while implementation details can change. In contrast, returning const std::string& would require the implementation to always store a std::string internally.

Could this added to the VTK coding standard to ensure that all developers are aware of this?

There are currently 5 classes that uses const std::string& return value in a public interface (vtkAMReXParticlesReader, vtkGDALRasterReader, vtkPLYWriter, vtkCompositeDataDisplayAttributes, vtkOpenVRModel). Should we change these to std::string or at least comment that this is non-compliant API to avoid this practice spreading in VTK via copy-paste.

In VTK, you cant have std::string in the API, use the macros (vtkSet/GetSTDStringMacro) if you want to store a std::string and have a setter/getter.

STL usage is indeed generally discouraged on public interface by the VTK coding standard. The only exception is std::string, which is the currently preferred way for character data and it is used widely in modern VTK classes (and the deprecated vtkStdString variant is used in many older classes):

Exception: std::string should be used as the container for all 8-bit character data, and is permitted throughout VTK.

(see : Specific C++ Language Guidelines / C++ Standard Library section)

vtkSetMacro/vtkGetMacro is used with std::string and vtkStdString at a number of places, but vtkSetStringMacro/vtkGetStringMacro remains much more widely used. I guess because it is somewhat risky to switch over from const char* input to const std::string& input (as passing a nullptr as const std::string& input crashes the application).

That seems quite out-of-date because at some point, VTK started assuming that strings are all encoded using UTF-8 (at least pays lip service to it; filepaths are generally not looked at too hard I think).

For the MR in question, this is already true because SetObjectName exists. Where else is a class going to store the name if that is intended to be effectful and visible by GetObjectName?

Those macros don’t change anything; std::string is still in the API and there’s nothing wrong with that.

That said, the API in the MR that triggered this thread should not use those macros because it should not call Modified() in the setter.

I’m pretty sure they do change something:

//
// Get character string.  Creates member Get"name"()
// (e.g., char *GetFilename());
//
#define vtkGetCharFromStdStringMacro(name)                                                         \
  virtual const char* Get##name()                                                                  \
  {                                                                                                \
    vtkDebugMacro(<< this->GetClassName() << " (" << this << "): returning " << #name " of "       \
                  << this->name);                                                                  \
    return this->name.c_str();                                                                     \
  }

My understanding is that std::string should not be present in the API as ParaView StringVectorProperty cant create ClientServer warping for it, but I may be mistaken.

That sounds like a reason to enhance CS wrapping to me. Python and Java both handle it. Python also handles std::vector and std::map too for that matter.

1 Like

I’m not sure VTK signed up for UTF8 everywhere but I would fully support it. It would reduce maintenance efforts, at the cost of slight extra effort from developers to ensure that UTF8 code page is used in the application. This is could be a separate topic to discuss.

The coding standard indeed did not get much attention. This topic just addresses a small improvement in it. I agree that it would deserve an update, and probably moving it to a revision controller repository, probably into VTK repository itself. This could be a separate discussion.

Method arguments are not an issue at all. They can be const std::string&. Only return values are problematic.

It is all about making sure that the implementation details can change while the API remains the same. We just would not want to tie hands of future VTK developers by telling them that they internally must store the string as a std::string. That’s why Qt and other libraries that take API compatibility seriously do this, too.

I already have an example of why we would immediately benefit from returning std::string for GetObjectName. In MRML library, we would probably override GetObjectName to return the MRML node name, which is stored as a char[] for historical reasons. So, returning std::string would be much simpler, because we would not need to copy the string to a member variable just to be able to return it.

In general, it is easy to find examples for why returning std::string is better for future compatibility. For example, it is quite possible that at some point VTK will switch to storing file paths in some kind of FilePath or URL object (it may be implemented as a VTK object, or come from STL or other library). It can be beneficial to change the internal implementation so that the path is stored in this specialized object, without impacting the public API. If you return std::string then it is not a problem at all. If you return const std::string& then you need to add an additional string member variable just to be able to return that string.

Those macros don’t change anything; std::string is still in the API and there’s nothing wrong with that. That said, the API in the MR that triggered this thread should not use those macros because it should not call Modified() in the setter.

Agreed. This topic is not about GetObjectName/SetObjectName but about how to return strings in VTK in general. It is probably better to keep the discussion about GetObjectName/SetObjectName in the merge request.

I’m saying that if the SetObjectName method argument is going to be stored somewhere, don’t we already have a std::string to return a reference to? Why is forcing a copy of this name useful if the result is only ever used by reference? The caller can force a copy if needed.

So what would SetObjectName do here? There’s no way to know that it will be ignored because it returns void, so anything trying to remember “I named this object” will lose track of it for this case.

I think this can make sense for virtual methods that query for information. Anything with setters is probably going to have a store for that information, so returning a const& in that case seems fine to me. Since not all methods are virtual, I think any guidelines should be a nuanced on this point. I’ll note that it’s probably a mistake that vtkSetGet.h macros make setters and getters virtual (and getters are not const either), so const& can still be done there.

I guess this makes me think that we really want std::string_view instead so that even the char[] case doesn’t need to allocate heap memory (except that std::string_view is not as easily usable as std::string because many operations do not exist for the type).

1 Like

This is a more general discussion, just about how methods in VTK should return strings. There may not even be a Set... method.

The assumption is that due to return value optimization actual string copy does not actually happen. But even if it does, the impact is generally negligible (except maybe some very rare special cases).

When library developers need to choose theoretical performance improvements in edge cases (but most likely just premature optimization) and a more future-proof API then they typically choose the latter.

VTK has already chosen to return std::string by value: there are hundreds of examples for that, while only 5 examples for returning const ref. You can also have a look at any other C++ libraries and let us know if you find any that promotes the use const std::string& for returning strings.

Yes, the need for flexibility is more obvious for virtual methods, but returning by value gives library developers flexibility in all cases when the internal implementation of a method may ever change in VTK itself.

Hrm. It sounds to me like we really want Rust’s Cow<'static, str> which is either (in C++ terms) std::string_view or a std::string, but can be used as much as the intersection of their APIs (basically the const part) as well as a method to say “I want to own this now” that would return a std::string. Alas, C++ does not have the core mechanisms which makes such a type ergonomic. I suppose returning std::string is probably the best we can do today.

return this->Member; returning a value will never have RVO applied and will always at least be a copy because RVO is for returning local variables (at least that’s my understanding).

My understanding is that std::string is returned by value when ownership of the character data should be returned to the function caller so it seems appropriate that const std::string& be used where the data is a field/property of an object.
Java, Python and C# wrappers always make a copy though as they translate utf-8 into a native string form.

UTF8 is used everywhere for char*/std::string in VTK (including paths) and UTF8 is specified as the application code page in KWSYS.


https://gitlab.kitware.com/vtk/vtk/-/merge_requests/6122/diffs?view=inline#3c9eb8e4774d56dbbed69f3ecf45b6b638828015

I did a very rough timing test of returning a string member variable by value and const reference.

If the code does not do anything else in a loop than returning a string member variable then with full compiler optimization returning a std::string takes about 1e-8s, while returning const std::string& takes about 1e-13s. So, return by value is indeed much slower. However, both is fast enough that is highly unlikely to impact performance in a real scenario, when we do some real work in that loop (e.g., store the result in a local std::string, do some floating-point math, etc.).

Code usedfor testing
#include <chrono>
#include <iostream>
#include <string>

class MyClass
{
public:
    std::string GetNameByValue() { return this->Name; };
    const std::string& GetNameByReference() { return this->Name; }
    std::string GetLocalNameByValue() { std::string name  { "This is a name" }; return name; };

    std::string Name { "This is a name" };
};

int main()
{
    unsigned int repeat = 1e6;
    
    MyClass tester;
    char someResult = 0;

    // ---------------------------------------------------------
    auto start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < repeat; i++)
    {
        someResult += tester.GetLocalNameByValue()[5];
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << someResult <<" Get local by reference: " << elapsed.count()/repeat << " s\n";
    
    // ---------------------------------------------------------
    start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < repeat; i++)
    {
        someResult += tester.GetNameByValue()[5];
    }
    finish = std::chrono::high_resolution_clock::now();
    elapsed = finish - start;
    std::cout << someResult <<" Get member by value: " << elapsed.count()/repeat << " s\n";

    // ---------------------------------------------------------
    start = std::chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < repeat; i++)
    {
        someResult += tester.GetNameByReference()[5];
    }
    finish = std::chrono::high_resolution_clock::now();
    elapsed = finish - start;
    std::cout << someResult <<" Get member by reference: " << elapsed.count()/repeat << " s\n";

    return 0;
}

In case of a complex, large container I agree that it is more appropriate to return by const ref because the performance overhead may be very significant. For simple strings the burden of making a long-term commitment to use a std::string member variable for internal storage may overweigh the performance gain.