Clang format discussion

It’s not the macros themselves; it’s the usage. And even that doesn’t hide the fact that there’s a missing semicolon; clang-format still indents afterwards (it just doesn’t edit the bracked code code). Luckily, VTK is very good about putting Macro in macro names, so it was easy to find all the usages (though some may have escaped my search, their telltale diff pattern also doesn’t show up in a scan of the patch).

Status

The rest may be TL;DR for some, so if you just want the diff, I’ve updated the clang-format branch on my fork. I have the buildbots testing it overnight (kwrobot needs to be told to check formatting before it can be merged and it will likely need a rebase for any other changes anyways). To those interested in some things that were done to make formatting work for VTK and notes for improvements I noticed, feel free to continue.

Manual hinting

I’ve perused the diff and gotten it down to a ~1.7M line patch that removes a net 50k lines (the initial, naïve run added a net 400k lines with a 2.5M line patch) by fixing some things. Things I found:

  • imported or generated code (fixed by excluding them from formatting)
  • data arrays; I added hints to clang-format to format them the way they probably should be formatted and/or moved them to separate, ignored files.
  • hiding manually formatted code from the formatter (usually horizontally aligned line ups for mathematical formulas; usually after applying sensible diffs clang-format wanted to make)

“Bad” patterns seen while looking over the diffs

This isn’t too common, but something to watch out for.

  • Lots of tests which do SOA-style storage for test data rather than AOS.
  • Some data storage does int foo[][8] = {} and then has comments for what each element is for. Instead, an anonymous struct should be used to give each value set a name. Filters/* and Common/DataModel are the main culprits here.
  • Lots of point storage all over the place doing arr[npts * 3] rather than arr[npts][3]. Cell types and such do this a lot.
  • Lots of arrays that aren’t marked static and/or const. Most should probably become constexpr anyways. Happens ~everywhere in Filters/* and Common/DataModel.
  • Lots of the mathematical formula code doesn’t factor out common subexpressions like (1 + z) * (1 - z) and instead recalculate it a dozen times (just search for “derivative” case insensitively and you’ll find most of them). These sometimes, but not always, needed manual formatting hints if you want to look at that commit for a place to start.
  • There are many places where assignments were formatted in a matrix-style block. clang-format has split these into an assignment-per-line; proper matrix data structures should probably just be preferred rather than treating double[9] as a 3x3 matrix.

I saw lots of code that does:

expr1; expr2; // comment

which became:

expr1;
expr2; // comment

when it should probably be:

// comment
expr1;
expr2;

These are too hard to find mechanically, so I let clang-format do its thing.

Getting good formatting for data arrays

Things I learned about array formatting doing this:

int arr[] = { 0, 1 };

is well-formatted, but this is the proper format with a trailing comma:

int arr[] {
  0,
  1,
};

So if you want your arrays to be one element per line, add a trailing comma, otherwise don’t. This is how I hinted to clang-format how to format various data arrays (the hinting is a separate commit for anyone curious).

1 Like

@ben.boeckel , thanks for clarifying this, obviously my view was quite simplistic, I didn’t realise that it was the usage. I like your comments on making formatting work and notes for improvements.

The MR has now been merged.

1 Like

Now the doxygen comments in the header files wrap at around 70 or 80 columns (as per our old standard), but the code itself wraps at 100 columns. It’s readable but it looks kind of funny.

It might be too late, but would it be possible to change the current pointer/ref syntax from double* ptr; to double *ptr?

A lot of code in VTK still follows the C-ism of listing variables at the top of the function, often with multiple declarations per line. This can lead to misleading-looking code such as

vtkIdType* cells, npts, ptId;

which, at a glace, looks like all three variables are pointers, while

vtkIdType *cells, npts, ptId;

is (a bit) clearer. It also looks like discourse’s markdown syntax highlighter handles it better :slight_smile:

Ideally, we’d only declare one var per line and only declare them when they’re needed to keep scopes manageable, but there’s a lot of legacy code that doesn’t follow those practices.

@ben.boeckel I did a build in Windows and it failed at:

...\VTK\Common\Core\vtkWin32OutputWindow.cxx
...VTK\Common\Core\vtkWin32OutputWindow.cxx(351): error C2143: syntax error: missing ';' before 'this'
...\VTK\Common\Core\vtkWin32OutputWindow.cxx(358): error C2143: syntax error: missing ';' before 'this'

I turns out that for some reason I had VTK_LEGACY_REMOVE:BOOL=OFF in the CMakeCache.txt file. Once I set it to ON everything built OK.

Good point @allison.vacanti, it would be a bit clearer.

A bit of history

If you want to see some really old code look at ./Common/DataModel/Testing/Cxx/otherFieldData.cxx .
Way back in 2003 Ken Martin had to modify BerkGeveci’s original test function: int otherFieldData(int, char*[])):

  for(int i=0; i<5; i++)
    {
    // ...
    }

to:

  int i;
  // ...
  for(i=0; i<5; i++)
    {
    // ...
    }

This was a Visual Studio 6 requirement! This code has remained the same ever since. You’ll see this pattern in a lot of the older code.

gcc 2.7, way back in 1995, was the last gcc compiler to require this. Unfortunately VS6 remained in use for a long time so a lot of coders used this pattern. VS6.0 was introduced in 1998 however I can assure you that one of my ex-colleagues was still coding in it in 2012 when I retired. I could not get him to update!!!

Old habits die hard, for instance it pops up in Common/Core/vtkMersenneTwister_Private.cxx.

The pattern still pops up, see: [ What is “ Variable ’ i ’ was not declared in scope ” in c++?] note that one of the respondents is still using VS6. People got around this issue by declaring i outside the first loop.

1 Like

-1 for @allison.vacanti suggestion. Let’s not enforce these C-ism, it should instead be converted to :

 vtkIdType* cells;
 vtkIdType npts, ptId;

Thanks for the background, @amaclean. It’s an especially unfortunate issue for maintainers: trying to read a lengthy function and trace back a variable hundreds of lines when it could have just been scoped to 10-15 lines…things get complicated quickly!

@mwestphal That snippet still doesn’t follow most C++ style guidelines, which would put every variable on its own line. I’m certainly not in favor of enforcing these c-ism, but since our code is already filled with them, I would prefer a syntax that doesn’t complicate reading them. This issue is a common stumbling point, especially for beginners.

Even without these multivar decls, the &/* modifiers are associated with the variable’s identifier, not the type, and it makes more sense to group them this way IMO.

Indeed a matter of personal preference.

It’s preference as far as they both compile, though type *id; represents how the compiler parses the expression and is more readable in certain situations (which VTK is chock full of).

Meanwhile, type* id; …still compiles :slight_smile: I’ve honestly never heard a good reason to write it this way other than “it works too.”

Over in Qt, this age old question was quite recently brought up on the Qt developer mailing list, if you want some bike shedding inspiration from another camp :slight_smile: https://lists.qt-project.org/pipermail/development/2019-October/037701.html

That thread was started with someone wanting to go the opposite way though - make sure the star is always with the type - as they felt it was the more in line with what C++ outside of Qt does. Anyway, as expected, the thread ended with no action.

@allison.vacanti agree with you, I use ˋtype *id;ˋ it is much clearer IMHO. However it is not worth the effort of changing all the VTK code. @mwestphal your idea would make things much clearer.

However it is not worth the effort of changing all the VTK code

Normally I’d agree and I wouldn’t bring something like this up, but we already changed all of the code to be type* id; two days ago . We could change a parameter in a config file, rerun a script, and commit the result and most developers wouldn’t notice that there were two major style changes instead of just the one :slight_smile:

But I’ll stop advocating for this now. If we want to change it, great! If not, meh. Using this opportunity to fix the syntax with minimal additional disruption makes sense to me, but it’s ultimately not a major issue.

Note Google style guide allows attaching the modifier to either the variable or the type,
but does not allow declaring pointers and values in the same declaration.

https://google.github.io/styleguide/cppguide.html#Pointer_and_Reference_Expressions

Well, that’s a place I missed a semicolon. I missed the VTK_LEGACY_REPLACED_BODY calls that were split over 2 lines and didn’t see the error since buildbots build with LEGACY_REMOVE=ON.

That file just needs to die. https://gitlab.kitware.com/vtk/vtk/issues/17730

As for the pointer thing. The “attach to the variable” is a stupid C-ism. The pointer belongs to the type that is declared.

Is this something we can get clang-tidy to tell us about?

FTFY :slight_smile: It should be that way, but it isn’t.

However, my preferred solution is to avoid/prevent the confusing case, not to bow down to it :slight_smile: .

1 Like

My preferred solution is to make my mental model match what’s actually happening. Which means:

is 100% correct – I admittedly have Stockholm Syndrome with C++ :stuck_out_tongue:

Embrace the ugliness!

Well, my 2¢: I greatly prefer type* var.

I’m happy to see this formatting work, but personally I think it got merged way too fast. I first heard of it at the recent hackathon and it was merged just 7 business days later. For people like me that don’t work on VTK every day, that left no time to review it before.

Anyway… Now that this mega change has already occurred, I’d vote against another mega change of asterisk position.

There is this clang-tidy check:
https://clang.llvm.org/extra/clang-tidy/checks/readability-isolate-declaration.html

It seems to have done at least one weird thing to Objective-C++ code like:

https://gitlab.kitware.com/vtk/vtk/blob/master/Examples/GUI/Cocoa/MyWindowController.mm#L64

where -> got spaces added before and after. I couldn’t find any clang-format option related to this, but don’t know the tool well.

BTW is it normal that gitlab is not colouring Obj-C++ code at all?

Also, you might all find this interesting:
https://travisdowns.github.io/blog/2019/11/19/toupper.html

Sean