External dashboards and CDash emails

Situation:

Some of you may know this, some may not, but VTK has a few external dashboards that run nightly. There used to be many, but now only the RogueResearch dashboards (managed by @seanm) remain in this section.

Every night they build the latest VTK master with their configurations and report any errors they encounter to the VTK dashboard, visible here:

https://open.cdash.org/index.php?project=VTK#!#Nightly+Expected

They also look up the email addresses of the authors of that day's new commits and send them an email informing them that their commits may have broken something. Here is what these emails look like:

The “Nightly Expected” group has errors, warnings, or test failures.
You have been identified as one of the authors who have checked in changes that are part of this submission or you are listed in the default contact list.

Details on the submission can be found at https://open.cdash.org/index.php?project=VTK&date=2022-03-11.

Project: VTK
Site: RogueResearch19
Build Name: Mac12.x-AppleClang-dbg-Rosetta
Build Time: 2022-03-11 04:17:54
Type: Nightly
Total Failing Tests: 10

Failing Tests (first 5 included)
VTK::ViewsInfovisCxx-TestTreeMapView | Completed (Failed) | (https://open.cdash.org/test/638754713)
VTK::IOOggTheoraCxx-TestOggTheoraWriter | Completed (ILLEGAL) | (https://open.cdash.org/test/638754784)
VTK::RenderingAnnotationCxx-TestCubeAxes3 | Completed (Failed) | (https://open.cdash.org/test/638755314)
VTK::RenderingAnnotationCxx-TestCubeAxesWithYLines | Completed (Failed) | (https://open.cdash.org/test/638755335)
VTK::FiltersSelectionCxx-TestLinearSelector3D | Completed (Failed) | (https://open.cdash.org/test/638755368)

-CDash on open.cdash.org

Problem:

There is an issue with this situation.

As @seanm has pointed out multiple times, here and by email, developers often break these dashboards and ignore the email, usually without even realizing it. This reduces the quality and usefulness of these dashboards.

This issue is caused by many factors:

  1. This external dashboard process is not documented at all in develop.md: https://gitlab.kitware.com/vtk/vtk/-/blob/master/Documentation/dev/git/develop.md#testing. New developers have no way to learn about it short of digging through old Discourse topics or asking around. This can cause confusion and discourage new contributors from contributing again.

  2. The email is way too spammy: it reports all broken tests, not only newly broken tests, which increases the chance of it being ignored (I know I ignore it).

  3. The email is not specific enough. VTK's pace of development is too fast for a nightly approach: with around 20 commits a day, the chance that an email actually concerns you is pretty low, and finding out requires a deep investigation.

  4. This process is just too disconnected from our current approach to CI. What made sense in 2010 does not make sense in 2020: the CI environment has evolved considerably, and for the better.

Possible Solutions:

I see a few possible solutions to this issue:

  1. External dashboards are considered not part of VTK's CI, but the responsibility of the external buildbot manager.

This means that when something breaks, it is @seanm's responsibility to test, investigate, and track down the commit that is the source of the issue, and to ask the developers to fix it if it is an actual issue.

This is what we have been doing for the past few months, but it has strong limitations. First, it can be a lot of work that can land on Sean any given morning, and delaying the work only increases the associated workload. Moreover, if the issue cannot be reproduced locally, the developer has no way to verify that a potential fix actually fixes the issue.

  2. Improve the nightly testing process of the external dashboards

One of the main issues is that the emails are not specific enough and are spammy. By improving the process, we could send the email only to the right person, with the associated commit SHA or MR. Something like rerunning each failing test on each merge commit individually would probably narrow it down (see the sketch after the example message below). The message could also be posted on the MR directly instead of being sent by email:

This MR is breaking the following test on this cdash: [URL]
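
As an illustration only, here is a rough sketch of how the culprit merge could be narrowed down automatically. Everything below is a placeholder, not an existing script: the test name is taken from the email above, and the good/bad revisions would have to come from the nightly machinery (it also assumes a reasonably recent git for `bisect --first-parent`).

```sh
#!/bin/sh
# Hypothetical sketch: bisect over today's merge commits to find the one
# that broke a given nightly test. GOOD/BAD/TEST_NAME are placeholders;
# master@{1.day.ago} relies on the local reflog, a recorded SHA would be
# more robust.
GOOD=$(git rev-parse 'master@{1.day.ago}')
BAD=$(git rev-parse master)
TEST_NAME='VTK::RenderingAnnotationCxx-TestCubeAxes3'

git bisect start --first-parent "$BAD" "$GOOD"
# At each step, rebuild incrementally and run only the failing test.
# (A build failure would also count as "bad" here; acceptable for a sketch.)
git bisect run sh -c "
  cmake --build build -j8 &&
  (cd build && ctest -R \"$TEST_NAME\" --output-on-failure)
"
git bisect reset
```

The culprit commit, and therefore the MR and author, is then known, and the message could be posted on that MR instead of going out as a broad email.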

That does not solve the issue mentioned earlier of developers being unable to test their fixes, and the documentation would still need to be improved.

  3. Fully integrate external dashboards into the CI process

If we care about certain configurations being tested, they should just be integrated into the CI process proper. It is not 2010 anymore; let's keep modernizing our process and integrate it all into GitLab, either by retiring RogueResearch and deploying new machines on the shared runner network for this task, or by using RogueResearch as a dedicated GitLab runner for the VTK project.

I am personally more in favor of 3, as unifying processes leads to more stability and less work for contributors, but I think this is very much open to discussion, and input from @seanm should definitely be considered!

@seanm @ben.boeckel @berk.geveci @Francois_Mazen


I think getting CI to cover most of the cases handled by RogueResearch machines is the best course. Of note:

  • older macOS targets (note that we tend to update our machines regularly due to EOL deadlines, so actually testing that things work on something like 10.10 is difficult)
  • warning flag sets (can be built on top of https://gitlab.kitware.com/vtk/vtk/-/merge_requests/8843)
  • some external deps (possible, requires some work)

I think just compiling while targeting macOS 10.10 (or whatever we deem to be the oldest supported version at least) should be sufficient for that part of this; Apple is usually decent about the mechanisms at least. I’ll also note that our upgrade path tends to make older Xcode versions unrunnable, so testing “minvers” (something we need for Linux and Windows too) there is pretty difficult.
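
For the "still builds for the old target" part, something like this in a configure step would probably be enough. This is purely a sketch: 10.10 is just the example version from above, and we would substitute whatever we actually deem to be the oldest supported release.

```sh
# Hypothetical sketch: build with the current SDK/Xcode but target an older
# macOS release; clang's availability warnings should then flag APIs that
# are newer than the target.
cmake -S vtk -B build-old-target \
  -DCMAKE_OSX_DEPLOYMENT_TARGET=10.10 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-old-target -j8
```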

Is there anything else RogueResearch is doing that I’m glossing over here?

Without much feedback from the community, I will keep doing option 1:

External dashboards are considered not part of VTK's CI, but the responsibility of the external buildbot manager.

Thanks for raising this issue, Mathieu. I'm sorry I didn't see it before (I have yet to figure out how to make @mentions stand out when using Discourse's mailing-list mode).

Some of you may know this, some may not, but VTK has a few external dashboards that run nightly. There used to be many, but now only the RogueResearch dashboards (managed by @seanm) remain in this section.

Last I checked @dgobbi also submits builds.

But it is indeed a shame that there is no longer the wide and rich variety there used to be years ago. It would be nice to reinvigorate that, IMHO.

This external dashboard process is not documented at all in develop.md

It predates that document of course. Perhaps no one ever thought to write up anything. I could add a few sentences…

The email is way too spammy: it reports all broken tests, not only newly broken tests, which increases the chance of it being ignored (I know I ignore it).

True, those emails could certainly be improved. That would require changes to cdash I guess. I made one proposal here suggesting that cdash use git blame on lines with errors/warnings.
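
To illustrate what I mean, mapping a warning back to a commit and author is roughly the following (the file and line number are made up for the example):

```sh
# Hypothetical sketch: suppose a nightly log contains
#   Rendering/Annotation/vtkCubeAxesActor.cxx:123: warning: ...
# Blaming that line gives the last commit/author to touch it, which is what
# cdash would need in order to email only the relevant person.
git blame -L 123,123 --line-porcelain Rendering/Annotation/vtkCubeAxesActor.cxx \
  | grep -E '^(author|author-mail|summary) '
```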

Another solution of course is to fix those tests. :slight_smile:

Another “solution” is that I give up on them ever being fixed and just suppress any test currently failing. If we could get all the Rogue bots green, maybe they could be kept green.

External dashboards are considered not part of VTK's CI, but the responsibility of the external buildbot manager.

Certainly part of the responsibility falls to the external buildbot manager, and although I don't have time for VTK every day, I do think I'm pretty attentive to and responsible for my bot submissions.

However, I do think the “you break it, you fix it” policy should also be applied. If one makes a code change to VTK, one must be prepared to do a little cleanup promptly after.

Fully integrate external dashboards into the CI process

I don’t know anything about gitlab CI, but maybe there is some way to better integrate my bots with gitlab? My bots are not dedicated to VTK though, they also build other projects, both internal and external, so they are only available for VTK between certain hours of the day (overnight really); during the day they are mostly busy building my own stuff.

Also, let's not pretend that it's only the Nightly submissions on cdash that are messy. The gitlab CI is never green either. I don't think I've ever created an MR that doesn't have some warning or test failure because of some issue that already exists in master. I'd favour a policy of "no merges unless green".

Another “easy” thing that would help is to actually turn on some warning flags on the gitlab CI. Currently the gitlab CI uses only compiler default warnings, which is a very low bar.

Is there anything else RogueResearch is doing that I’m glossing over here?

Address Sanitizer, Thread Sanitizer, Rosetta, Universal Binaries, GuardMalloc, pre-release versions of macOS, pre-release versions of clang, various compiler flag permutations (LTO, -fstack-protector-all, -ftrapv, -ftrivial-auto-var-init=pattern, -std=c++11, -std=c++14, -std=c++17, -std=c++20).
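
For anyone who wants to reproduce one of these configurations locally, most of them are just flag variations on a plain build, roughly like the following (a sketch in that spirit, not my bots' actual scripts):

```sh
# Hypothetical sketch of an Address Sanitizer + extra-warnings build,
# similar in spirit to one of the nightly configurations.
cmake -S vtk -B build-asan \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_C_FLAGS="-fsanitize=address -Wall -Wextra" \
  -DCMAKE_CXX_FLAGS="-fsanitize=address -Wall -Wextra" \
  -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fsanitize=address"
cmake --build build-asan -j8
(cd build-asan && ctest --output-on-failure)
```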

Testing is an infinitely large problem space, and pre-merge testing is great and covers a lot, but there’s so much more, and there’s only time to do some of it overnight.

As a VTK developer, and I know other VTK devs feel the same, I expect the CI we care about to be in the MR, not outside of it.

I don’t know anything about gitlab CI, but maybe there is some way to better integrate my bots with gitlab?

I'm afraid not. For security reasons, all the GitLab CI runners have to be on Kitware's internal network.

The gitlab CI is never green either.

I beg to differ. The CI is green (apart from this one lone test, which is currently being fixed by @spyridon97), and we are crafting guidelines and enforcement to ensure that it stays green.

Another “easy” thing that would help is to actually turn on some warning flags on the gitlab CI. Currently the gitlab CI uses only compiler default warnings, which is a very low bar.

The CI process is open: anyone can enable any warnings they feel like. I know that @ben.boeckel and I will be happy to review such an MR.

Testing is an infinitely large problem space, and pre-merge testing is great and covers a lot, but there’s so much more, and there’s only time to do some of it overnight.

In 2022, pre-merge testing is what matters, IMO. I'm not against somebody else testing what matters to them; I have my own VTK build that tests some things I care about. But what the project cares about should be tested in the merge request, not after.

I don’t know anything about gitlab CI, but maybe there is some way to better integrate my bots with gitlab?

I'm afraid not. For security reasons, all the GitLab CI runners have to be on Kitware's internal network.

What security reasons are those?

The gitlab CI is never green either.

I beg to differ. The CI is green (apart from this one lone test, which is currently being fixed by @spyridon97), and we are crafting guidelines and enforcement to ensure that it stays green.

One lone test not green == not green. :slight_smile: But, OK, hopefully it's greener than it used to be; it used to be very non-green.

Another “easy” thing that would help is to actually turn on some warning flags on the gitlab CI. Currently the gitlab CI uses only compiler default warnings, which is a very low bar.

The CI process is open: anyone can enable any warnings they feel like. I know that @ben.boeckel and I will be happy to review such an MR.

As you know, there is a WIP: https://gitlab.kitware.com/vtk/vtk/-/merge_requests/8843

Testing is an infinitely large problem space, and pre-merge testing is great and covers a lot, but there’s so much more, and there’s only time to do some of it overnight.

In 2022, pre-merge testing is what matters, IMO. I'm not against somebody else testing what matters to them; I have my own VTK build that tests some things I care about. But what the project cares about should be tested in the merge request, not after.

Premerge testing is certainly preferable to post-merge testing, I agree. I’m saying it’s unrealistic to cover everything with premerge testing. There’s literally not enough time, and there are too many variations.

Please explain how we can have premerge testing with address sanitizer, thread sanitizer, valgrind, beta versions of operating systems, etc. etc. Absent some way I’m not seeing, better to have them nightly than not at all, no?

Sean

Variations just don't matter. If it is tested outside the MR CI, just put it inside the MR CI.
Everything that can be tested in a reasonable timeframe should be included in the MR CI.

Things that can't be tested in a reasonable timeframe should not be the responsibility of the devs, but the responsibility of the maintainer of that specific CI.

Variations just don't matter.

I must say I’m shocked that you think that. :frowning: More testing is good, not bad.

If it is tested outside the MR CI, just put it inside the MR CI.

You said yesterday that all gitlab runners have to be on the Kitware LAN, so how can I do that?

Everything that can be tested in a reasonable timeframe should be included in the MR CI.

Agreed. And this is not currently the case. The current CI has very low warning levels and easy bugs are being missed.

Things that can't be tested in a reasonable timeframe should not be the responsibility of the devs, but the responsibility of the maintainer of that specific CI.

Strong disagree. IMNSHO, it should be the shared responsibility of the bot maintainer and the person that changed/broke the code. I think of it this way: Just because I find a bug you caused, doesn’t mean it’s my job to fix it, but I’m happy to help get it fixed together.

A concrete example: one of my bots has been showing a warning about an infinite recursion for over a month. And since apparently few people look at cdash, this code has been buggy for a month. Now, sure, we both agree that it would be good if this could be caught pre-merge. We should work to improve pre-merge testing. But some things are always going to slip through, and if we can catch them post-merge, that’s better than shipping the bugs and letting our customers find them.

When I see a new warning appear today in foo.cxx, it’s hard for me to know: who touched that file yesterday? which MR was the change related to? (it would be nice if cdash could be improved to make this easier.) But if the person who merged MR #123 gets an email showing a new warning in foo.cxx the day after, it’s easy for him to remember that he touched that file yesterday, and to fix his mistake.
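
(For what it's worth, the manual version of that lookup is something like the following; the path and MR are of course placeholders.)

```sh
# Hypothetical sketch: list yesterday's merge commits that touched the file
# a new warning appeared in, to work out which MR/author to ping.
git log --merges --first-parent --since='1 day ago' \
  --pretty='%h %an %s' -- Some/Module/foo.cxx
```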

Hopefully once the pre-merge bots are improved to catch more, there will be fewer issues for the post-merge bots to find, and thus the cdash emails will be fewer and smaller.

Sean

To clarify:

  • developer: anyone who contributes to VTK, regardless of Kitware affiliation or merge rights
  • maintainer: someone who uploads CDash results from a nightly runner to VTK's CDash (regardless of Kitware affiliation; not to be confused with the actual VTK maintainers)

Miscommunication here. What I'm trying to say is that variations don't matter in the context of our discussion, as any external runner could be turned into an internal, pre-merge runner.

The issue is not the many variations, but the fact that certain variations can take too long to come back.

That is the only kind of runner that makes sense to run outside of the pre-merge CI, IMO.

You said yesterday that all gitlab runners have to be on the Kitware LAN, so how can I do that?

Not possible afaik, but @ben.boeckel or @berk.geveci can give more insight here.

It is possible to add your own flags to existing jobs, add new jobs, and so on in GitLab CI, though; see below.

Strong disagree. IMNSHO, it should be the shared responsibility of the bot maintainer and the person that changed/broke the code. I think of it this way: Just because I find a bug you caused, doesn’t mean it’s my job to fix it, but I’m happy to help get it fixed together.

Sure, a warning on someone else's runner can be considered an issue and fixed by the developer. But I strongly believe that:

  1. Tracking down the issue is not the responsibility of the developer, but of the maintainer.
  2. The actual correction can be done by the developer, but if the fix is complex, requires a redesign, or introduces additional complexity, then the developer can definitely choose NOT to fix the issue and leave it open.

Let’s imagine the following scenarios:

  1. A developer adds a feature.
  2. A few years later, a maintainer adds an external nightly runner to CDash; new warnings on the feature pop up.
  3. In that case, it is definitely the maintainer's responsibility to track down and fix any warnings.

Another one:

  1. A maintainer adds an external nightly runner to CDash to check VS 2012, which is not officially supported (because it is not tested in the pre-merge CI, by the way).
  2. A developer adds a feature that breaks on VS 2012.
  3. In that case, it is definitely the maintainer's responsibility to track down and fix any warnings.

Another one:

  1. A maintainer adds a clang-tidy external nightly runner to CDash to check for tidy warnings that are not yet checked in the pre-merge CI.
  2. Warnings pop up, of course.
  3. A developer adds a feature that adds more warnings.
  4. In that case, it is definitely the maintainer's responsibility to track down and fix any warnings.

In all these scenarios, the problem is that anyone can add a nightly runner without consulting the community or being reviewed by the developers, or even by other maintainers for that matter.
Kitwareans can't access these runners for on-site debugging.
Developers can't run these runners in the CI to check that their fix is working.

On the other hand, GitLab CI is an open CI system, where any developer can add more CI configurations or jobs. Kitware's GitLab provides runners for the VTK project, which can be used.

Nightly runners external to the repo should be deprecated in the VTK project, and maintainers should contribute their configurations to VTK's GitLab CI jobs, relying on VTK's runners.

Of course, if that is not possible, then external nightly runners reporting to CDash should be the responsibility of their maintainers, and warnings and errors on these runners can be treated like any other VTK issue and fixed in the same way.

(On a side note, @seanm you may be interested by this: Documentation about reading and understanding CI results)

CI has been green for most of the past week (at least before looking at CDash; I get direct emails for the nightly pipelines).

Also relevant: https://gitlab.kitware.com/vtk/vtk/-/merge_requests/8409

These were never fully working, and I disabled them since the sanitizers weren't actually running. Turning them on was a bloodbath, so until some time can be invested in making them happy, they'll stay off.

Meh. AFAICT, there’s nothing for us to do if there are problems other than “tell Apple to fix their emulator”.

VTK doesn’t release binaries.

All macOS builds should be happening under this (buildbot did it; it still needs to be enabled for CI).

More important are Xcode releases IMO. And testing the oldest supported (though given how Apple’s treadmill works, the oldest Xcode we can run is Xcode 13.0 because macOS 12 refuses to run older Xcode versions).

It's highly unlikely that this is feasible. UBSan should handle -ftrivial-auto-var-init=pattern and -ftrapv. -fstack-protector-all is not a new configuration: you should be able to toss it into any build and still expect success. C++ standards… meh. I think we've avoided what gets deprecated, and doing too much "if C++14" conditional code is too much maintenance. We'll move when we need to.

More specifically: we have our CI machines set up to avoid damage from a malicious MR. I don’t want to ask external machines to do the same thing.

Agreed. This is the biggest benefit over buildbot: the configurations are editable by anyone, not just those with access to a Kitware-internal repo.

Yes. 9.2 has been waiting for greener pastures^WCI processes.

I don’t think this will happen even post-merge as it is just too much maintenance on these machines. I’m happy that our machines are basically hands-off except when new IDE versions are released. Having to also do daily OS upgrade trains is far too much. It’s more on these vendors to not break things IMO (though we all know how fond Apple is of doing so anyways).

That's not tenable. There are way too many things to test inside the MR. We already run pre-3.10 wheel tests only upon merging. MR authors can ungate them for their MR if they want to check, though.

We just don’t have the horsepower to do everything. That’s not going to change anytime soon. We can enhance what we have, but new configurations do need resource considerations. One of the root causes here is that VTK has a combinatorial explosion of configurations. I try to stuff as much as I can into each job, but sometimes “not X” is interesting to test as well (of note, render-less, MPI-less, and Python-less right now).

I do have an MR up to bump what we’re using in CI, but every time it ends up in a morass of new clang-tidy lints (some of which are interesting). The biggest thing I think is missing is the “minimum versions” testing (which should test as many minvers as possible including Qt and such).

That's fair, but we can't expect the VTK devs to care about the "everything" either.

That’s true. However, we can certainly leverage the tools we have, but aren’t using.

This includes warning flags, getting the sanitizers to be green, runtime checks (Guard Malloc on macOS and MALLOC_CHECK_=3 with glibc), etc.
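
As a rough sketch, the runtime-check part is mostly just environment variables wrapped around the existing test run, something like the following (illustrative only, not existing CI jobs):

```sh
# Hypothetical sketch: run the existing test suite (from a VTK build tree)
# under the allocator checks mentioned above.

# glibc (Linux): abort with a diagnostic on common heap corruption.
MALLOC_CHECK_=3 ctest --output-on-failure

# macOS: Guard Malloc catches buffer overruns / use-after-free at page level.
DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib ctest --output-on-failure
```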