Designing a CSS-styled representation for vtkDataAssembly

This started as a ParaView discussion on visual properties for partitioned datasets, but seems more appropriate here.

Problem statement

Complex composite datasets do not get fine-grained control of their visual style. Partitioned-dataset collections can hold heterogeneous data (i.e., some datasets may be surfaces, some may be polylines, some might be images or volumetric meshes). The controls that existing mappers – such as the vtkCompositePolyDataMapper – provide do not include per-dataset style.

ParaView – and presumably in some cases VTK – has scalability issues with large numbers of representations in a view. (In ParaView, each representation has client-server proxies that hold significant metadata.) This drives several applications under development to present a single dataset (previously a multiblock, now a partitioned-dataset collection) for rendering rather than a large number of individual datasets. Because of the heterogeneity, rendering and interacting with these datasets is challenging.

For example, typical CAD models can present thousands of surfaces, edges, and model vertices, each stored as its own polydata; even splitting these into a single collection per dimension does not solve the issue since rendering styles for subsets of these datasets necessarily change frequently with user interaction.

Proposed solution

We propose a vtkRepresentation or vtkMapper for vtkDataAssembly that accepts cascading style sheets (CSS) to control the visual style and interaction (e.g., pickability).

  • Separate content from presentation. The data assembly serves as a document-object model (DOM) much the same as a web-page’s HTML elements. CSS styles HTML for the same reasons that our applications wish to control mapping of composite datasets: separating presentation from content.
  • Group data using selectors. The data assembly provides a hierarchy that can be exposed for use by CSS selectors. The names assigned to data-assembly entries are non-unique; they could be treated as tag names (3-D data equivalents of HTML tags such as p, div, h1) or as CSS classes.
  • Apply properties to groups/classes rather than data. CSS properties have a significant overlap with the visual properties VTK exposes through mappers and actors.
  • Interaction and animation. CSS provides pseudo-classes for interaction (like :hover) as well as simple transition animations as the classes assigned to an element are changed. Separating temporal updates to property values from the underlying rendering code modularizes the design.

In the longer term, one can imagine an entire scene (collection of renderers, render-windows, cameras, framebuffers, and render-passes) as having a DOM that might be marked up with CSS. In the short term, we’ll simply have a mapper or representation that accepts CSS as a configuration parameter.
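For illustration, here is a rough sketch of what handing CSS to such a mapper might look like from application code. The class name vtkCSSDataAssemblyMapper and its SetStyleSheet() method are placeholders invented for this sketch, not existing VTK API; only vtkNew and vtkPartitionedDataSetCollection are real.

    #include "vtkNew.h"
    #include "vtkPartitionedDataSetCollection.h"

    // Hypothetical mapper; neither the class nor SetStyleSheet() exists in VTK yet.
    #include "vtkCSSDataAssemblyMapper.h"

    void ConfigureMapper(vtkPartitionedDataSetCollection* collection)
    {
      vtkNew<vtkCSSDataAssemblyMapper> mapper; // placeholder class name
      mapper->SetInputDataObject(collection);  // assumed vtkMapper-style input
      // The style sheet is just a configuration string; the selectors here are
      // data-assembly node names, as discussed below.
      mapper->SetStyleSheet(R"css(
        surfaces { color: #c0c0ff; opacity: 1.0; }
        edges    { color: #000000; visibility: hidden; }
      )css");
    }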

Design decisions and rationale

There are many different ways we could expose CSS on a data assembly. This section discusses some alternatives and reasons for choosing between them.

Mapping a vtkDataAssembly into a DOM

A vtkDataAssembly is a tree whose nodes are identified by integer values. Each node may have a list of 0 or more child nodes, and each may be assigned a non-unique string “name”. The tree structure is amenable to the hierarchy of an HTML DOM, but the exact mapping between DOM elements, attributes, and classes is arbitrary.

  • We could make each data-assembly entry a div element and treat the name assigned to the entry as a class. This maximizes similarity to HTML by forcing all but the leaves of the tree to be valid HTML. However, HTML elements may have any number of string class names assigned to them; this mapping would provide only one (the name) or the name would have to be split using some separator (which could slow processing). Attributes (if any are allowed) would be stored separately.
  • We could treat each entry in the collection as an element whose type is its name. Classes and attributes would have to be stored as separate maps from integers to sets of strings (for classes) or maps from strings to variants (for attributes). It might also be necessary to index class and attribute names so that quick reverse lookups could be performed by selectors (e.g., generating a list of integer data-object IDs that are assigned the highlighted class).

Since vtkDataAssembly names are non-unique, the latter approach seems preferable.

And, in fact, vtkDataAssembly uses pugixml internally to store attributes and perform XPath searches – so the DOM already exists. Some API for adding/removing classes needs to be added (we already have a branch for this).
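As a concrete (if simplified) sketch of that mapping, the snippet below builds a tiny assembly, stores a class list in an ordinary node attribute (standing in for the add/remove-class API on the branch), and uses the XPath-backed query API for a selector-style lookup. The vtkDataAssembly calls are the existing ones as best I recall them; double-check signatures before relying on this.

    #include "vtkDataAssembly.h"
    #include "vtkNew.h"

    #include <vector>

    int main()
    {
      vtkNew<vtkDataAssembly> assembly;

      // Each entry becomes an element whose type is its (non-unique) name.
      const int surfaces = assembly->AddNode("surfaces");
      const int edges = assembly->AddNode("edges");
      (void)edges;

      // Until the add/remove-class API lands, a class list can live in a
      // plain "class" attribute on the node.
      assembly->SetAttribute(surfaces, "class", "highlighted");

      // Reverse lookups for selectors can lean on the XPath queries that
      // pugixml already provides, e.g. every node named "surfaces":
      std::vector<int> hits = assembly->SelectNodes({ "//surfaces" });
      return hits.empty() ? 1 : 0;
    }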

Data assembly leaf-nodes

Beyond the hierarchy provided by a vtkDataAssembly, most VTK data objects provide structure that might be useful to expose. In particular, the points, connectivity, and attributes (point/cell/node/arc/row/field data) could be exposed. Some options include:

  • Use a simple transformation of each vtkObject’s class name to obtain an “element” name. For example, lowercase(vtkObject::GetClassName()).substr(3) would map “vtkPolyData” to “polydata”; “vtkUnstructuredGrid” to “unstructuredgrid”; and so on (a small helper sketch follows this list).

  • For subclasses of vtkDataSetAttributes, the “active” attribute arrays might be marked with a pseudo-class (e.g., polydata pointdata:scalars would select the active point scalar array). NB: Because the concept of “active” arrays is being slowly deprecated, this may not be wise to expose.

  • For subclasses of vtkAbstractArray, the array type and number of components might be presented as attributes (e.g. polydata celldata[type='uint64', components=1] would select cell arrays storing 1 unsigned 64-bit integer per cell).

  • It is possible that information keys held in the map returned by vtkDataObject::GetInformation() could also be presented in the DOM (perhaps as attributes?), although this will not be done in the first pass.

  • For data objects that have cells (with either implicit or explicit connectivity), it may be useful to present that structure in the DOM. Specifically, this would cover use cases where a “decorator” of some sort is used to subset or transform the cells of a dataset before mapping. This supports a major use-case driving a CSS mapper: assigning separate visual styles to individual datasets.

    • Mapping unstructured grids: extracting the external surface, all edges, or running other pipeline operations (such as a shrink filter) would allow datasets not typically mapped directly to be presented.
    • Glyphing polydata points.
    • Rendering polydata in wireframe vs surface mode.

    It makes sense to apply any “decoration” property to the cell connectivity of a dataset. The decoration might refer to pseudo-objects representing pipelines that can be configured with CSS properties, e.g.:

    /* extract the external surface of the mesh: */
    unstructuredgrid cells { transform: '#extract-surface'; }
    #extract-surface { merge-points: true; }
    /* glyph vertices with spheres */
    polydata verts { transform: '#glyph'; }
    #glyph { shape: 'sphere'; resolution: 32; }
    

    This would need some fleshing out in order to support things like rendering both surfaces and edges, using the tube/sphere shaders for lines/points, and so on.
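Returning to the first bullet above, the element-name transform is small enough to sketch directly; this is illustrative only and assumes leaf elements are named after their data-object class:

    #include "vtkDataObject.h"

    #include <algorithm>
    #include <cctype>
    #include <string>

    // "vtkPolyData" -> "polydata", "vtkUnstructuredGrid" -> "unstructuredgrid", etc.
    std::string ElementName(vtkDataObject* data)
    {
      std::string name = data->GetClassName();
      std::transform(name.begin(), name.end(), name.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
      return name.substr(3); // drop the "vtk" prefix
    }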

Supported properties

From the standard list of CSS properties, we anticipate supporting:

  • color: a fixed color and optionally opacity.
  • cursor: a cursor to use when the mouse is over the given data
  • height: scale a dataset to fit device coordinates (projected bounding box height constrained)
  • opacity: a single opacity that modulates an entire dataset.
  • outline: whether to render a bounding box around the object. The top/bottom/left/right keywords would not be used.
  • rotate: accepts a 3-d rotational transform
  • scale: accepts 1 or 3 scale factors
  • translate: accepts a 3-d translation
  • visibility: turn rendering on or off
  • width: scale a dataset to fit device coordinates (projected bounding box width constrained)

Furthermore, we would add the following properties not present in CSS:

  • color-by: one of solid (default), array, texture, shader
  • color-array: a selector string identifying an array to color by
  • color-map: specification of a vtkScalarsToColors object
  • line-width
  • line-color: when rendering surfaces, if line-width is non-zero then bounding lines of surface cells should be rendered with the given (solid) color.
  • ambient-color: ambient color for surface shading (overrides color if present).
  • diffuse-color: diffuse color for surface shading (overrides color if present).
  • specular-color: specular color for surface shading (overrides color if present).
  • ambient-coefficient: weighting of the ambient component relative to the diffuse and specular components
  • diffuse-coefficient: weighting of the diffuse component relative to the ambient and specular components
  • specular-coefficient: weighting of the specular component relative to the ambient and diffuse components
  • specular-power: the exponent used in Phong shading
  • point-size
  • point-color: when rendering lines and surfaces, if the point size is non-zero then corner points of cells should be rendered with the given (solid) color.
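To give a feel for how these properties might read together, here is a hypothetical rule set, written as the string an application would hand to the mapper. The node names (surfaces, edges), the array name Temperature, and the #cool-to-warm pseudo-object reference are all made up for illustration, and none of this syntax is final.

    // Hypothetical rule set exercising the properties above.
    static const char* exampleStyle = R"css(
      /* color surfaces by a point-data array through a lookup table */
      surfaces {
        color-by: array;
        color-array: pointdata[name='Temperature'];
        color-map: '#cool-to-warm';
        opacity: 0.8;
      }
      /* draw edges as thick black lines; highlight surfaces under the cursor */
      edges {
        color: #000000;
        line-width: 3;
      }
      surfaces:hover {
        color-by: solid;
        color: #ffcc00;
      }
    )css";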

Unrelated to what it will be used for, has there been investigation into suitable libraries for parsing and applying/querying the CSS?

I agree that treating the node names as a type seems to be preferable to treating the name as a class. It maps well to the idea that the DOM represents the structure of the page (even though display often gets mixed in).

I’m concerned that applying the CSS selectors will be complex or slow - we should think about what needs to happen when:

  • the data assembly changes (likely to happen often)
  • the css changes (infrequent - when do we allow it?)

In particular, from our client app, we are currently using a partitioned dataset, and I’d like to understand how that needs to be translated into a DataAssembly, and what selectors might be used to achieve what we are doing now by extracting partitions.

That is a good point. We will need some code to benchmark

  • large assemblies,
  • large CSS rulesets,
  • changes to assembly datasets (connectivity, color arrays, etc),
  • large changes to an assembly (location/name changes of entries in the tree).

As far as what needs to happen when:

CSS changes

Each time the CSS changes (or is provided the first time), we must invoke the parser, passing it

  • the CSS source
  • a “state” object that holds the parsed set of rules
  • optional additional arguments (which we might use to track or apply the CSS to a DOM – but I think it would be wise to avoid this for the time being).

The main output is the state object. The first time CSS is provided, if we already have a data assembly, we will then need to continue on to the next step:

Data assembly changes

At this point, we have parsed the CSS and need only apply it. We will keep a map from each “unique style” to the list of datasets which share that style. A “unique style” is a resolved set of CSS property values; datasets that end up with the same property values, however they came by them, share a unique style. (This means one dataset can have a class that provides a set of properties while another dataset might have explicitly-specified properties. As long as the properties are the same (and the datasets are compatible with one another), they should be rendered together.) Thus each “unique style” can provide 3 functions for rendering:

  1. prepare() to begin rendering with the style
  2. render(vtkDataObject*) to render a dataset entry
  3. finalize() to finish rendering with the style
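A minimal sketch of that three-phase interface, plus the per-frame traversal described in step 4 of the list below; the type and member names are placeholders, not existing VTK classes.

    #include "vtkDataObject.h"

    #include <map>
    #include <string>
    #include <vector>

    // One "unique style": a resolved set of CSS property values plus the
    // datasets that ended up with exactly those values.
    struct UniqueStyle
    {
      std::map<std::string, std::string> Properties; // resolved property -> value
      std::vector<vtkDataObject*> Members;           // datasets sharing this style

      void Prepare() { /* bind GPU state derived from Properties */ }
      void Render(vtkDataObject* /*data*/) { /* issue draw calls for one entry */ }
      void Finalize() { /* restore state */ }
    };

    // Per-frame rendering is then a loop over the unique styles.
    void RenderAll(std::vector<UniqueStyle>& styles)
    {
      for (auto& style : styles)
      {
        style.Prepare();
        for (vtkDataObject* data : style.Members)
        {
          style.Render(data);
        }
        style.Finalize();
      }
    }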

When the assembly changes, we will need to

  1. Identify which entries in the hierarchy have changed (which may be the entire assembly).
  2. Apply the state object to each entry to determine properties, accumulating a set of entries for each unique set of properties (i.e., aggregate like datasets that share the same style).
  3. Prepare OpenGL vertex, index, and command buffers for each modified dataset entry in the DOM. For datasets that share the same style, the command buffer should not need to be modified, just called while targeting different vertex and index buffer objects.
  4. Render. This simply traverses the list of unique styles, invoking the functions above on each dataset marked with that style.

Depending on the changes that have occurred:

  1. If the style for a dataset entry changes (or the dataset entry itself changes due to upstream filters), then buffers may need to be re-uploaded.
  2. In some cases, a new “unique style” will come into existence and command buffers set up for rendering it.
  3. Finally, in some cases, the dataset may just need to be moved to a new “unique style” entry (no buffer re-uploads required) so that re-rendering just involves changing the order in which buffers are queued.

I’ve been fiddling around with PEGTL and think that, for the subset of CSS we need, we can parse it ourselves. I have not found a good, open CSS parser in C/C++ that can be easily extracted from its consuming project. The netsurf parser identified in the other thread (1) looks unmaintained and (2) was not fun to build because of the other dependencies it includes. Other potential sources (e.g., webkit) have similar issues.

@ben.boeckel @Aron_Helser Here is a first pass at a PEGTL CSS parser. It parses some non-trivial-but-by-no-means-exhaustive CSS files. There are no PEGTL actions yet, so it doesn’t do anything but match grammar elements. However, the other parsers I’ve found would require significantly more work to use than this (since VTK already provides PEGTL 2.x).

It can parse this CSS from the vtk.js website in 2ms in Release mode (or 34ms in Debug mode).
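For anyone curious, the parse-call pattern looks roughly like the toy below; this is not the grammar from the branch, just a minimal “selector { property: value; }” matcher showing the shape of PEGTL usage as best I recall the 2.x API (the include path may differ for the PEGTL that VTK bundles).

    #include <tao/pegtl.hpp>

    namespace pegtl = tao::pegtl;

    namespace toycss
    {
      // Whitespace and identifiers (letters, digits, '-' and '_').
      struct ws : pegtl::star<pegtl::space> {};
      struct ident : pegtl::plus<pegtl::sor<pegtl::alnum, pegtl::one<'-', '_'>>> {};
      // "property: value;"
      struct declaration
        : pegtl::seq<ident, ws, pegtl::one<':'>, ws, ident, ws, pegtl::one<';'>> {};
      // "{ declaration* }"
      struct block
        : pegtl::seq<pegtl::one<'{'>, ws, pegtl::star<pegtl::seq<declaration, ws>>, pegtl::one<'}'>> {};
      // A single rule followed by end-of-input.
      struct rule : pegtl::seq<ws, ident, ws, block, ws, pegtl::eof> {};
    }

    int main()
    {
      pegtl::string_input<> input("polydata { color: red; opacity: 1; }", "inline-css");
      return pegtl::parse<toycss::rule>(input) ? 0 : 1;
    }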

Parsing CSS is not my main concern. It is applying CSS’s rules, overrides, and other semantics that are of more concern to me. That is what I want an existing CSS engine for, not the parsing.

For the properties we need to deal with, the cascade is not that complex. If we had to do box layout, it would be another thing.

To make the data-flow and storage more understandable, here is a figure that summarizes how the data-assembly mapper would work:

[figure: css-mapper]

The mapper holds the following objects:

  • A Styles object that holds
    • a CSS string that serves as Markup
    • a collection of Rule instances parsed from the CSS markup (a rule is a selector plus a set of properties with values)
    • a collection of Style instances computed using the Rules plus the mapper’s DOM
  • A vtkDataAssembly that, through pugixml, already provides a DOM with attributes (including the class attribute).
  • Sub-mappers (plus any applicable input filters) for each DOM entry. Eventually, these sub-mappers should be replaced with finer-grained components that manage single command-buffer and data-buffer objects. For the first pass, we will simply use existing mappers.
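In terms of storage, the Styles box from the figure might reduce to something like the sketch below; these struct names are placeholders for the figure’s boxes, not existing classes, and the real implementation may differ.

    #include <map>
    #include <string>
    #include <vector>

    // Placeholder shapes for the boxes in the figure above.
    struct Rule
    {
      std::string Selector;                          // e.g. "unstructuredgrid.highlighted"
      std::map<std::string, std::string> Properties; // declared property -> value
    };

    struct Style
    {
      int NodeId = -1;                               // data-assembly (DOM) entry it applies to
      std::map<std::string, std::string> Computed;   // cascaded property -> value
    };

    struct Styles
    {
      std::string Markup;          // the CSS source string
      std::vector<Rule> Rules;     // parsed from Markup
      std::vector<Style> Computed; // Rules applied against the mapper's DOM
    };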

Organizing things this way (which is the way libcss works and very similar to webkit/blink except they do JIT – which would greatly increase VTK’s 3rd-party dependencies) ensures:

  1. If the CSS changes, the DOM does not need to be revisited (the rules will be updated and from the rules the properties of the style instances present in the DOM will be updated).
  2. If the DOM changes, the CSS does not need to be revisited (the styles will be updated with the existing rules, then the sub-mappers updated using the computed styles).
  3. If entries in the input partitioned-dataset collection change, then only the mappers that use those datasets will re-upload data to the GPU.
  4. CSS animations just update property values in the computed styles, update the sub-mapper configurations, and re-render. In fact, each time the data-assembly mapper renders it can simply update properties from any active animation before deciding whether to reconfigure any sub-mappers. The mapper itself could queue a rate-limiting timer at the end of the render if any animations were active. The application could re-render when the mapper’s timer fires.