Clang format discussion

At the hackathon this week, there was discussion of using clang-format on VTK to enforce coding style guidelines. This allows formatting of a merge request using Do: reformat so that local formatting isn’t necessary (and that various versions of clang-format don’t keep disagreeing on minor details in the formatting). ParaView, CMake, VTK-m, and the SMTK projects have been doing this for quite a while with great success.

However, to do this effectively, a sweep of the entire codebase to normalize the formatting is required (otherwise any edit to a file is going to force a reformat of the entire file leading to very confounding MR reviews). It is best to do this before the 9.0 release so that backports can actually be applied between the release and master branches with the same code.

For those wanting a way to easily rebase your branches across the reformat, a procedure will be provided (I know how it should work, but need to write down the steps and test them out for robustness). Any incremental (e.g., directory-at-a-time) reformatting would require this process for each formatting “sweep” your merge requests touch. Due to this reason, a single reformatting sweep is preferred.

By using whatstyle (thanks @seanm!) on ParaView (which has consistent style), it has determined that ParaView’s clang-format-8 configuration file based on the contents of ParaViewCore/ClientServerCore/Core with some hand-tuning for parameters that whatstyle can’t figure out on its own (namely ColumnLimit):

BasedOnStyle: Mozilla
AlignAfterOpenBracket: DontAlign
AlignOperands: false
BinPackArguments: true
BinPackParameters: true
BreakBeforeBraces: Allman
ColumnLimit: 100
Standard: Cpp03

Which is very similar to what is used today.

An initial MR to sort include directives is here. This is done separately because it is highly likely to change behavior compared to the rest of the formatting configuration parameters. I’ll work through its failures until the dashboards are happy, though I suspect the nightly builders may have some followup thoughts on the sorting as well.

Cc: @utkarshayachit @Dave_DeMarle @RobertMaynard @brad.king @seanm @dgobbi @mwestphal

5 Likes

I think this will be great.

Removing the pedantic formatting barrier-to-entry for new developers is worth the git blame history wall in my opinion. There is always git log --follow to peer behind the wall when we need to.

For reference, CMake and ParaView transitions both left empty commits just before the transition commit that provides instructions for rebasing work-in-progress topics across the transition. See CMake’s commit and ParaView’s commit.

This is a really good idea.

FYI, ages ago, I dropped a .clang-format file into VTKExamples/src/Cxx which (I think) was based on the CMake one. It works quite well in both KDevelop and Visual Studio. Have a look at it if you are interested.

The point here is that it doesn’t matter what clang-format version developers have laying around or whether their editors support it. It also does not apply to third party code (which we should not be reformatting; the robot also checks for “improper” changes to third party code as well). The robot will check all MRs for proper formatting and reject them if they don’t adhere to what we’re using. It can, however, also fix your MR automatically via a Do: reformat command.

1 Like

I like it! This willl be so useful.

Andrew Maclean

The header sorting has been merged. I’ll try and get the full reformat up by the end of the week. An initial run had a 2.5M line patch for anyone curious.

The branch is available here. I need to write the docs and inject a commit with instructions for handling rebasing over the commit yet.

Hello,

Thanks for that effort, it will make the code/PR much more readable !

About the includes, some class uses either “” or <> and there are some space between some group of includes (vtkOpenGLGPUVolumeRayCastMapper is a good example), will clang format handle that properly ?

It sorts within groups. I don’t think it knows to make sure certain headers use "" or <>. There is support via IncludeCategories (https://clang.llvm.org/docs/ClangFormatStyleOptions.html) for doing that logic. CMake just moved over to using it.

I’d like to merge the reformat later today. I’ll have a preview of the branch up in a merge request around noon after I’ve tested that things still build on my machine. At the end of the day (around 4:30 or so), I’ll do a final rebase, run the build locally again, and let the robot sanity check it and then merge it. I don’t know that the buildbots will get back fast enough to not have something else go in before it and conflict with it (plus, if we have tests dependent on the formatting of the code…).

Any objections? @Dave_DeMarle @ken-martin @allison.vacanti @seanm (sorry, this will inevitably make const madness need yet another rebase) @utkarshayachit @brad.king @RobertMaynard

1 Like

Fine with me. I’d like to go over the branch together IRL when you prepare it.

This sounds like a nice change. We have something similar at work, but where the build will fail if it doesn’t adhere to the clang-format style we use. Having a reformatter bot would be even nicer, though we’ve still saved a lot of time during reviews.

The only thing I can recommend is picking the very latest clang-format version you can get your hands on for the bot. That way its longevity in terms of staying packaged in distros, being supported upstream etc will be maximized.

We are in the situation now that we set up our CI format check back when clang-format-4.0 was recent, and have recommended our devs to auto-format on save using that version, to avoid a yelling CI bot, but now clang-format-4.0 is ancient and we’re considering bumping the version on our CI, which will bring with it some behavioral changes (even given the same config).

So just a recommendation to go with the latest, to postpone that day for you as much as possible. Our situation is a little different of course, and you guys may be able to run the same version on the bot for eternity.

I have a pending change into the robot to be able to just do the sweeping reformat, so updating in the future will be as easy as:

  • make new version available on the bot
  • prep a topic which changes format.clang-format=8 to =9
  • make an empty commit for the robot to reformat the repo into
  • run the reformatter until it hits a fix point
1 Like

So a wrench in the process. HeaderTests had some failures. Looking into it, a lot of macro calls do not have a trailing semicolon. clang-format goes a little greedy with this and starts indenting things. I’m currently going through and adding semicolons everywhere it looks like they’re missing and then I’ll redo the format. I’ll try to get in early Monday to merge it, otherwise it will have to wait for some other end-of-day to conflict with just about every MR.

Which macros?

You might be able to get the compiler to warn you if you reformulate the macros as do-while loops, then the compiler will error if there’s no semi-colon.

See also:
https://wiki.sei.cmu.edu/confluence/display/c/PRE10-C.+Wrap+multistatement+macros+in+a+do-while+loop

https://wiki.sei.cmu.edu/confluence/display/c/PRE11-C.+Do+not+conclude+macro+definitions+with+a+semicolon

Some, yes. But other synthesize method implementations, so technically it will trigger clang-tidy's “unnecessary” semicolon lint, but if clang-format is going to wreck the code without it, I’m going to call it “necessary”. I have 426 files edited right now for the semicolons, so I’m just going to wrap it up in the format change. I’ll have to do a buildbot run to get more modules than I can build locally looked at by a compiler though.

Or you could really start confusing people with more obscure semicolon swallowers :slight_smile:

#define SOME_MACRO(x) \
  if (x) \
  { \
    foo(); \
  } \
  [](){}()

But I’m guessing that most of those are for Set/Get macros, and I can’t think of any good way to require a semicolon on them.

Edit: appending struct {} or enum {} work to silence -Wextra-semi, but throw new warnings under -Wpedantic :frowning:

clang-format making the code look like a cascade of indentation is an effective, if unfortunate, way of “requiring” the semicolon.

1 Like

This is just a thought. You could bracket the macros in a:

// clang-format off

// Group of macro definitions ....

// clang-format on

Then there is no need to add semicolons. However the macros will need to be manually formatted in new code.