Struggling with geographic coordinates

cobo · March 22, 2019, 8:14pm

In my models I am working with geographic coordinates, which are meters in a range between -10’000 and +10’000, and some applications require a precision down to cm or mm - which is not covered any more with “float” data type coordinates.

However I see that many filters in the VTK class hierarchy fail with these conditions, and more and more I realize that it is often enough the fact that generating vtkPoints data structures are generated deep inside these classes, without consideration of data type, and with the consequence that it is always VTK_FLOAT by default.

A bit desperate I was considering to use a special “patch” version of VTK and ParaView in my project where this default is changed to being VTK_DOUBLE, so avoiding the need to more or less rewrite entire classes - only to find again others that are still making trouble.

But now I saw that I might get into even bigger trouble with this: In vtkExtractSelection::ExtractSelectedCells I see code like this:

  vtkNew<vtkPoints> newPts;
  newPts->Allocate(numPts/4,numPts);

which basically means: The VTK_FLOAT assumption is even hardcoded here in the memory allocation call!

Question: Is there any “global solution” that I can apply to get out of this problem, or is this more or less an open construction site where I will have to find my way case by case? Like either indeed copying and slightly changing the vtkExtractSelection class completely in my code, or patching my version of VTK/ParaView - making the maintenance of my project harder than normally required?

Any recommendation? How are others solving this problem - if they have it?

lassoan · March 22, 2019, 9:02pm

Don’t worry, this is handled quite nicely in many places in VTK. You should definitely not need to fork VTK but maybe just contribute a few fixes.

You can set any default scalar type in vtkPoints by registering a custom object factory (see for example this test).

Filters try to guess the best scalar type for their output and/or you may set it explicitly (see for example this smoothing filter).

If you come across a filter with hardcoded single-precision computations then fix them (by applying techniques used in other filters) and send a merge request.

Note that even double-precision has its limits and many algorithms may not work well for highly non-normalized data (when some data points are millimeters from each other, while others are thousands of kilometers away). You may need to represent your objects differently when you work at different scales.

cobo · March 22, 2019, 10:10pm

Ok, I have seen the method in that smoothing filter already in other filters, so it looks like some kind of “standard” for vtkAlgorithm derived classes.

Which basically means that you propose that I should go and implement it in classes like vtkExtractSelection or vtkCleanUntructuredGrid where I am missing such a thing specifically. Merging such updates into the main branch is of course better because I don’t have to take care of it in the future.

Regarding the vtkCleanUnstructuredGrid there is however still slightly more to consider: If you just switch from FLOAT to DOUBLE, the point merging will not merge lots of “merge candidates” because it looks like that the last bits of DOUBLE variables are often a bit “damaged” (switching between 80-bit and 64-bit representation??), so the plain x==y comparison gets wrong results. So for that class an additional precision parameter will be required, like 0.001 or the like, and then every point needs to be explicitly rounded to that precision before it is “fed” into the “Locator” variable that is supposed to merge equal points.

lassoan · March 22, 2019, 10:41pm

Floating-point math is hard. Simple x==y is rarely usable for assessing equality.

vtkCleanPolydata has a number of ways to specify tolerance (you can set absolute tolerance, specify custom locator, etc.). In contrast, ParaView’s vtkCleanUntructuredGrid filter is very limited in this regard. Ask ParaView developers why is this (it is just not implemented yet, or there is a better alternative to this filter, etc.) and if they think it is a good idea to implement vtkCleanPolydata-like point comparison tolerance.

Sebastien_Jourdain · March 23, 2019, 12:23am

What about calling SetDataTypeToDouble() on the vtkPoints instance?

cobo · March 23, 2019, 9:39am

Right, vtkPoints has the option to work with either FLOAT or DOUBLE. But the vtkPoints instances causing trouble for me are generated inside other filters, so it’s about these filters that need to be adapted in order to not always only go for the default - which is FLOAT.

Now I realize that the way how this is handled in different classes is not completely homogeneous in VTK and Paraview, and also inside the projects, so I see that probably the suggestion of Andras Lasso is the best: look at the specific classes that you need to use, fix them, and ideally finish with a merge request, contributing thus to the improvement of the two projects. In this sense I have to talk about two specific filters where I am struggling with this problem of geographic coordinates:

vtkExtractSelection. This is part of the VTK project, and it generates vtkPoints instances for the extracted data items - not taking care that these are not always FLOAT. And even more, the memory allocation for the point data is even hardcoded to be 4 bytes per value, so even if I would start a “global attack” on vtkPoints in my specific project, just defining the default type to be DOUBLE there, I would still run into trouble in that class. Solution: Fix these issues by adapting the strategy in vtkExtractSelection to the way how it is handled in the smoothing filter that Andras mentioned.
vtkCleanUnstructuredGrid. This is part of the Paraview project, and again inside that filter vtkPoints are allocated without respecting the data type of the input or giving the opportunity to the programmer to specify it explicitly. Again that “smothing filter strategy” could easily fix this issue. So far in my project I am already using a subclass of vtkCleanUnstructuredGrid that copies most of the functions 1:1, only changing the vtkPoints allocation to be DOUBLE instead of FLOAT. This solves the problem for my own project (but I just saw: only partially - see 3. below). But then people also want to use the filter directly through the Paraview GUI - and again the coordinates are damaged. So again it would be the better option to follow the advice of Andras: fix it in the Paraview project and send a merge request.
vtkCleanUnstructuredGrid faces still another problem, and that is with checking points for equality: It is done by simply comparing the values with ==, and this is always a bad idea. And it is even worse in a geometric context where the difference between “larger - equal - smaller” may translate into the topological differences “inside - onborder - outside” which may have a huge impact on any further computations, like generating topological structures that do not make sense at all. That bad comparison does not directly happen in vtkCleanUnstructuredGrid, but in the vtkMergePoints::IsInsertedPoint function that is called from there - and which brings us back into the VTK project.

So far in my programming career I tried to never do any double == comparisons, but always solved such problems with the “eps comparison” (“equal” is epx>::fabs(x-y)) - which is possible as long as I know exactly what I need: “meters down to mm precision” -> eps = 0.001 etc. It is of course not a fully valid solution for a project like VTK that has to remain generic, i.e. no assumptions can be made regarding absolute eps values, and sometimes relative eps may even be better. So for my own project, again some kind of “brute force approach” may work - simply copy entire classes and adapt them - with all the disadvantages of such an approach. I did not further investigate this yet, but at the moment I intend to take the advice of Andras and study the vtkCleanPolyData class and see how they are doing it better, then see how it can be applied to the vtkCleanUnstructuredClass as well.

So this is what I am taking from the input in this thread so far! The only thing that lets me always hesitate regarding merge requests is simply the fact that I feel not so “fluent” yet with the means of git - and VTK even appears as a “submodule” of Paraview here, which for me makes it “look harder”. However, I think I will have to take the time and try to find my way through this “git world” - in order to have that option also in the future - for the mutual progress of both my own project and also VTK and Paraview!

Sebastien_Jourdain · March 24, 2019, 12:48am

Hi Cornelis,

Those are great comments. On a side note, you can contribute to VTK and ParaView in an independent manner. In any case, we tend to pull newer VTK into ParaView anyway. So if you are patient, it will naturally happen.

Looking forward for your contribution,

Seb

lassoan · March 24, 2019, 1:17am

You might have assumed that Allocate() method of a VTK array takes input in bytes. The input is actually number of values or tuples. The division by 4 is for something else (maybe pre-allocation with approximate number of expected values for performance improvement).

In this filter, points are allocated using vtkPoints::New(), so you do have the option of specifying scalar type (by registering a custom object factory, as I wrote above). It would be more convenient for users if scalar type would be set from the input (and maybe also could be requested explicitly in the filter), but it is just a nice to have.

As described in vtkMergePoints documentation, there is a tradeoff: “vtkMergePoints is a locator object to quickly locate points in 3D. The primary difference between vtkMergePoints and its superclass vtkPointLocator is that vtkMergePoints merges precisely coincident points and is therefore much faster.”

I think the only limitation is in the vtkCleanUnstructuredGrid that it does not allow setting a locator, but vtkMergePoints is hardcoded, so you cannot adapt this filter to your requirements. It should be very easy to address this (similarly to vtkCleanPolyData).

Don’t worry, you cannot break anything, since you only have permission to modify your own VTK or Paraview forks. Feel free to experiment. Proposed changes in your merge requests will be reviewed and tested, so if you made mistakes then most likely they would not slip through.

cobo · March 24, 2019, 8:57am

Thanks for the hint: This looks indeed like the more “clean” solution!

However, I am of course not patient at all: I want to deliver the improvements to customers as quick as possible. But this is of course not a problem: Just apply the changes also locally and make sure that my “superbuild” does not download another version of PV and VTK than what I have locally on my system.

cobo · March 24, 2019, 9:13am

Great, so I am learning again - thank you!

Regarding the hard-coded allocation: You are right, I only assumed that the hard coded 4 implies a hard-coded “float size”, but it even does not make sense: it should be a multiplication then, not a division…

The hint with the custom object factory sounds like the elegant (or “legal”) alternative to just “hack” my local copy of vtkPoints to always deliver “double” by default. I am not sitting at my development computer here now, so I cannot check myself now (only tomorrow…). But if that factory is coming between if the vtk::New() function is used for object creation, this would mean that I can register such a factory globally in my PV based custom application - and ALL the filters, i.e VTK, Paraview and my own, will start to go for “double” by default - as long as they are using the ::New() function!?

With this I would already solve my two problems 1) and 2) in one elegant step.

So what remains would be to have a look into vtkCleanUnstructuredGrid and see if that cannot be changed to use another locator, at least optionally. This would then solve my problem number 3) as well - and that would then also be my only “contribution” to PV (merge request) - except if I would do the extra step and also implement that other interface (like in the smooth filter), but it would actually be for the same class.

However, from what I found out I can say that at least in my cases the combination of “double” and “vtkMergePoints” inside vtkCleanUnstructuredGrid more or less eliminates the “cleaning effect”: So far this is mostly based on the fact that by entering “double” points they are “rounded” to “float”, and then only compared, so implicitly introducing some kind of “relative eps”, being the precision limit of “float”. If you take that out (replacing float by double), the filter merged almost no points any more. So theoretically the only case where this procedure would work is where the points are actually just 1:1 copies of each other, but if any one of them has another “calculation history”, chances are 99% that they are different at the bit-by-bit level.

Bottom line: Adding more locator options to that class would certainly be a benefit for the vtkCleanUnstructuredGrid class, not only for my project!

cobo · March 25, 2019, 3:46pm

Sorry for getting off-topic, but this is where “git newbies” (since two years) are losing all the time of the world: Since several hours I just want to get a clean starting point for the above operation: update the vtkCleanUnstructuredGrid class. But this seems to be more or less impossible:

a local “git pull” fails with obscure error messages about “The following untracked working tree files would be overwritten by merge:” - and then some Google files that I certainly never ever touched… And no obvious way to get out of this…
on the website I already have a fork of the PV project, and a “branch”. However, that is all obsolete, so I rather simply want to kill that branch - but again: I do not find a way how to achieve this…
so why not simply “kill” the entire fork, delete also the entire PV work tree locally and start over from scratch: “fork” - clone - etc. Just get rid of the mess and start over! But no, again: I cannot find any way to do even this!

Sorry if I am wasting the time of people who have better things to do than reading the “rants” of a frustrated git/gitlab user - but I really see no way how I could contribute even one line of code if I am losing such amount of time for such nonsense… (One problem is that all “explanations” in the “git world” are so full of jargon that I sometimes think that learning Chinese would be the far easier thing to achieve: “just stash the committed fork in order go blame the master merge request of the head branch…” )

mwestphal · March 25, 2019, 3:52pm

You want to update to master ?
if all local changes have been commited you can do the following.

git checkout master
git pull

and if it fails

git reset --hard FETCH_HEAD

lassoan · March 25, 2019, 4:12pm

You might find it easier to use a GUI client for git (such as TortoiseGit or Github desktop). You don’t need to memorize commands (but you can always see what commands that are executed so you can learn them if you want to) and you are automatically offered actions that makes sense at the particular state of the repository. There are tons of git tutorials out there, but if you are interested you can have a look at the basic version control tutorial that we give to all of our new students.

cobo · March 25, 2019, 4:27pm

Thanks!

The last line from Mathieu helped me to get at least locally out of the jungle, and I am sure I should give another tutorial a try! Yours or another one. Not that I have not been working with version control for years (CVS and SVN), but git is “different” - and even TortoiseGit (which I am using) does not save me from the jargon. Which in itself may not even be so bad, but I still lack the basic understanding of concepts in git - even after already a number of tutorials…

And then it’s once again Mathieu who helps me out with just one or two lines of code that do the job - and I am already almost afraid of asking him such “stupid” questions because - because one day the “children” should finally start to learn something…

cobo · March 30, 2019, 4:44pm

Just to finalize this thread.

First of all a big thank you who gave me hints and tipps here - first of all to Andras because his input was really the missing piece of info that changed my solution from some patch to a really also conceptually sound and safe solution.

Basically I did two things:

Patched the vtkCleanUnstructuredGrid class in such a way that it supports now the same interface logic as vtkCleanPolyData: you can specify data type, and you can specify absolute or relative tolerance for the point merging. The default is still what it was before, where merging basically worked due to the fact that coordinates were rounded down to “float”, and whatever coincided after that operation was merged. With this I can tell now exactly that I want to “clean” to mm precision, for example.
Added an “object factory” to my project that always generates vtkPoints with “double” precision by default in my software - even in classes that I did not touch otherwise. With this I make sure that data are not clipped due to conversion into an insufficient data type.

The first of the two could actually be integrated back into ParaView! However, I did last week a few frustrating steps, trying to get the setup ready that would allow me to integrate into that well developed update process - with merge request, quality control and all. However, I did not manage to get it working, lost my nerves a bit and decided to go back to the official 5.6.0 release and apply the patches only locally here (with a “local git” so I can at least easily track what I added).

In other words: If anybody is interested or ready to take the code of the updated class and integrate it into Paraview, I would happily send it - because I know that it would be part of the official “development stream” from that moment! But of course I know also that people have a lot of other things to do. Otherwise I may do another attempt at a later time, maybe after 5.7 is out or whenever I find a few days of extra time for getting into it.

PS: During that excursion into the brand new current developments of Paraview I also got a glimpse of the major refactoring that is going on regarding plugin development. Mathieu had given me a hint already, but now I saw it happening - and I have the impression that I like it because it seems to add more clarity about “what does what” and “what depends on what”. In other words: clear modularity.

I also saw that the model seems to be the “module concept” inside VTK - and I wondered if there is maybe a good introduction document somewhere into these concepts and principles? So if somebody could give me a hint I would take it up - in preparation for the major changes I will have to apply to my project if I want to eventually migrate to 5.7

lassoan · March 30, 2019, 9:47pm

Thanks for the update. Getting something integrated into VTK (especially the first time) could be significant effort, but I would definitely recommend to go through this exercise - it is a good learning opportunity for you and the VTK community would benefit from your contributions.

cobo · March 31, 2019, 8:58am

I will eventually give it another try, but at this moment I am struggling with some other tasks as well, so I simply do not have the patience, and I am wasting the time of helping hands if I would try it again right now. Because at the end I know of course that my failure is basically due to lack of patience on my side, but there is also still a kind of ambition like “why shouldn’t I be able to manage if so many others have done it already?”

In that specific case it is also about contributing to ParaView, and as far as I can see it is currently in a heavy refactoring phase: Overall I like refactoring, but jumping in just in the middle of it is maybe also not the best thing if your own product needs to have a stable working base.

cory.quammen · March 31, 2019, 1:09pm

Just to clarify, the major refactoring has happened in ParaView’s CMake-based build system. While code changes are occurring, the number of changes in the C++ source code itself is consistent with typical ParaView development. That said, it will no doubt be easier for you to jump in once you update your base ParaView to something using the new build system.