About using SPDX to handle license and copyright in VTK

We recently improved a bit how licenses are handled in ParaView and VTK, and now third party licenses are easily listed, module by module, in the install directory of VTK.

While this is a great improvement, we need to look further, into the use of SPDX license identifiers, not only to remove the header in every single VTK source file but also to be able to generate an SBOM associated with a compiled version of VTK.

The idea is to (optionally) integrate spdx generation into the build system of VTK.
When building a VTK module, the spdx headers of each file used to build the module will be parsed in order to create a .spdx file for the module.

Once all the .spdx files are created, tools to merge sbom could be used to merge all .spdx into one.

In essence, here is what a .spdx file looks like:

And here is what I think we should be able to produce:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: VTK
DocumentNamespace: https://gitlab.kitware.com/vtk/vtk/
Creator: Person: ?
Creator: Tool: ?
Creator: Tool: ?
Created: ?

##### Package: IOEnsight

PackageName: IOEnsight
SPDXID: SPDXRef-Package-IOEnsight
PackageDownloadLocation: git+https://gitlab.kitware.com/vtk/vtk/#IO/Ensight
FilesAnalyzed: ?
PackageVerificationCode: ?
PackageLicenseConcluded: $computed from info below$
PackageLicenseInfoFromFiles: $computed from licenses declared in files$
PackageLicenseDeclared: $provided by vtk.module$
PackageCopyrightText: $provided by vtk.module$

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-IOEnsight

FileName: /build/libvtkIOEnsight.so
SPDXID: SPDXRef-Package-IOEnsight-binary
FileType: BINARY
FileChecksum: SHA1: ?
FileChecksum: SHA256: ?
FileChecksum: MD5: ?
LicenseConcluded: $Same as PackageLicenseConcluded above$
LicenseInfoInFile: NOASSERTION
FileCopyrightText: NOASSERTION

FileName: /src/IO/Ensight/vtkEnsightReader.cxx
SPDXID: SPDXRef-Package-IOEnsight-src
FileType: SOURCE
FileChecksum: SHA1: ?
FileChecksum: SHA256: ?
FileChecksum: MD5: ?
LicenseConcluded: $from file spdx header$
LicenseInfoInFile: $from file spdx header$
FileCopyrightText: $from file spdx header$

FileName: /src/IO/Ensight/vtkEnsightReader.h
SPDXID: SPDXRef-Package-IOEnsight-header
FileType: SOURCE
FileChecksum: SHA1: ?
FileChecksum: SHA256: ?
FileChecksum: MD5: ?
LicenseConcluded: $from file spdx header$
LicenseInfoInFile: $from file spdx header$
FileCopyrightText: $from file spdx header$

Relationship: SPDXRef-Package-IOEnsight-binary GENERATED_FROM SPDXRef-Package-IOEnsight-header
Relationship: SPDXRef-Package-IOEnsight-binary GENERATED_FROM SPDXRef-Package-IOEnsight-src

I did not include information about the build tool (CMakeLists, vtk.module), but this would still be possible I suppose.
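
For the FileChecksum fields left as "?" above, here is a minimal Python sketch of how the three values could be computed for one file. The function name file_checksums is hypothetical and this is not what the VTK build system does; it only illustrates the expected "FileChecksum: ALGORITHM: hex" shape using the standard hashlib module.

```python
import hashlib


def file_checksums(path, chunk_size=65536):
    """Return SPDX FileChecksum lines (SHA1, SHA256, MD5) for one file.

    Hypothetical helper for illustration; the file is read in chunks so
    large binaries do not need to fit in memory.
    """
    hashers = {
        "SHA1": hashlib.sha1(),
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
    }
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return [
        f"FileChecksum: {name}: {hasher.hexdigest()}"
        for name, hasher in hashers.items()
    ]
```

The returned lines can be pasted directly under the matching FileName entry.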

This is very rough for now, but let me know your thoughts.

Standardizing on SPDX makes sense. Thanks for helping with this :pray:

VTK

I suggest including the recommended header in each file. For most of the files, we would then have something like this:

# Copyright (c) Ken Martin, Will Schroeder, Bill Lorensen
# SPDX-License-Identifier: BSD-3-Clause

or

// Copyright (c) Ken Martin, Will Schroeder, Bill Lorensen
// SPDX-License-Identifier: BSD-3-Clause

VTK ThirdParty

We will also need to look into how to integrate this information in the context of the vendorized ThirdParty sources.

VTK Module System

Since there is standard tooling to extract license information, it may be worth considering deprecating the LICENSE property currently specified in vtk.module.

This would give us a single source of truth.

Other projects

To help, here are references to other projects that have adopted it:


I’m mostly in line with your suggestions.

I’m afraid this will have to stay in, as the license file will still need to be installed.

Moving forward with the following formalism.

For each module, when the CMake option to generate spdx is set, a .spdx file will be generated at build time with the following content:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: ${module name}
DocumentNamespace: ${global cmake property}
Creator: Tool: CMake
Created: ${computed at build time}

##### Package: ${module name}

PackageName: ${module name}
SPDXID: SPDXRef-Package-${module name}
PackageDownloadLocation: ${module location if set, otherwise global property}
FilesAnalyzed: false
PackageLicenseConcluded: ${based on the fields below}
PackageLicenseInfoFromFiles: ${parsed from source files in module}
PackageLicenseDeclared: ${module declared license}
PackageCopyrightText: ${parsed from source files in module}

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-${module name}
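
To make the template above concrete, here is a rough Python sketch that fills it in for one module. The real generation happens in CMake; the function name render_module_spdx and its parameters are hypothetical, and the values are assumed to have been computed beforehand.

```python
from datetime import datetime, timezone


def render_module_spdx(name, namespace, download_location, concluded,
                       from_files, declared, copyright_text, created=None):
    """Render the per-module SPDX 2.2 tag-value document as a string.

    Hypothetical helper: `created` is expected to be a UTC datetime and
    defaults to the current time.
    """
    created = created or datetime.now(timezone.utc)
    stamp = created.strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "SPDXVersion: SPDX-2.2",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {name}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: CMake",
        f"Created: {stamp}",
        "",
        f"##### Package: {name}",
        "",
        f"PackageName: {name}",
        f"SPDXID: SPDXRef-Package-{name}",
        f"PackageDownloadLocation: {download_location}",
        "FilesAnalyzed: false",
        f"PackageLicenseConcluded: {concluded}",
        f"PackageLicenseInfoFromFiles: {from_files}",
        f"PackageLicenseDeclared: {declared}",
        f"PackageCopyrightText: {copyright_text}",
        "",
        f"Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-{name}",
    ]
    return "\n".join(lines) + "\n"
```

Passing an explicit `created` value keeps the output reproducible, which may matter if the generated files are compared between builds.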

Update about SPDX for the curious: it will soon be integrated in https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10200

SPDX Generation in module

It is now possible to generate SPDX files in the VTK module system.
An SPDX file is a standardized file containing all license and copyright information about a software package, usually used
to generate Software Bills Of Materials.

(see below for technical resources)

The generation system relies on three components.

  1. SPDX arguments in vtk_module_build
  2. SPDX variables in each vtk.module
  3. SPDX Tags in the sources files

Assuming all this information is provided, VTK will be able to generate a complete SPDX file for each module and third-party module.
If some information is missing, VTK will warn during configuration or during the build, but the SPDX file will still
be generated, with some fields set to NOASSERTION or another default value.
See below for multiple examples.

In the details below, an “expected” field emits a warning when it is not present, but it is not required and may be replaced by NOASSERTION.

  1. SPDX arguments in vtk_module_build

Any project using SPDX generation may want to set certain SPDX arguments in its vtk_module_build call:

  • GENERATE_SPDX
  • SPDX_DOCUMENT_NAMESPACE
  • SPDX_DOWNLOAD_LOCATION

GENERATE_SPDX is the trigger that starts the whole process. Set this to ON.

SPDX_DOCUMENT_NAMESPACE is used as a basename for the DocumentNamespace spdx field. The name of the module is simply appended to the basename. If not provided, https://vtk.org/spdx is used, which is also the value the VTK project itself uses. Please note that the namespace does not need to be an actual live URL, just a unique URI. If VTK decides to host SPDX files in the future, the namespace used for the VTK spdx files may change accordingly.

SPDX_DOWNLOAD_LOCATION is used as a basename for the PackageDownloadLocation when it is not provided at module level. The relative path to the module is simply appended to it in order to generate the actual PackageDownloadLocation spdx field. If it is provided neither at module level nor in vtk_module_build, NOASSERTION is used.
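
The resolution rules for these two fields can be sketched in Python as follows. This is only an illustration of the fallbacks described above, not the CMake implementation; the function names are hypothetical, and joining the basename and the suffix with a single "/" is an assumption about what "appended" means.

```python
def document_namespace(module_name, base="https://vtk.org/spdx"):
    """Build DocumentNamespace from the basename and the module name.

    Assumption: basename and module name are joined with one "/".
    """
    return f"{base.rstrip('/')}/{module_name}"


def package_download_location(module_relative_path,
                              module_location=None,
                              base_location=None):
    """Resolve PackageDownloadLocation with the documented fallbacks.

    1. A location set at module level is used as is.
    2. Otherwise the vtk_module_build basename plus the relative path.
    3. Otherwise NOASSERTION.
    """
    if module_location:
        return module_location
    if base_location:
        relative = module_relative_path.lstrip("/")
        return f"{base_location.rstrip('/')}/{relative}"
    return "NOASSERTION"
```

For example, package_download_location("IO/PLY", base_location="https://gitlab.kitware.com/vtk/vtk/-/tree/master") yields the location used in the IOPLY example further down.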

  2. SPDX module variables

There are three variables to declare in the vtk.module file (or in the third party declaration).

  • SPDX_LICENSE_IDENTIFIER
  • SPDX_COPYRIGHT_TEXT
  • SPDX_DOWNLOAD_LOCATION

SPDX_LICENSE_IDENTIFIER is an expected field that corresponds to the PackageLicenseDeclared spdx field. It is also treated as the license of all files of the module that are not parsed during generation, and is taken into account when generating the PackageLicenseConcluded spdx field.

SPDX_COPYRIGHT_TEXT is an expected field that corresponds to the copyright applying to all files that are not parsed during generation. It is used to generate PackageCopyrightText.

SPDX_DOWNLOAD_LOCATION is an optional field for modules (see above for setting this in vtk_module_build) and an expected field for third parties. If provided, it is used as is for the PackageDownloadLocation spdx field.
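
As an illustration of the "expected" versus "optional" distinction, here is a small Python sketch. The function resolve_module_fields is hypothetical and only mimics the behaviour described here: a missing expected field triggers a warning and becomes NOASSERTION, while a missing optional field is left as None so the caller can fall back to the vtk_module_build value.

```python
import warnings

MODULE_FIELDS = (
    "SPDX_LICENSE_IDENTIFIER",
    "SPDX_COPYRIGHT_TEXT",
    "SPDX_DOWNLOAD_LOCATION",
)


def resolve_module_fields(module_name, declared, third_party=False):
    """Apply the expected/optional rules to the declared module variables.

    `declared` is a dict of the variables found in vtk.module (or in the
    third party declaration). Hypothetical helper, not the CMake code.
    """
    expected = {"SPDX_LICENSE_IDENTIFIER", "SPDX_COPYRIGHT_TEXT"}
    if third_party:
        # The download location is only expected for third parties.
        expected.add("SPDX_DOWNLOAD_LOCATION")

    resolved = {}
    for field in MODULE_FIELDS:
        value = declared.get(field)
        if value:
            resolved[field] = value
        elif field in expected:
            warnings.warn(f"{module_name}: {field} is missing, using NOASSERTION")
            resolved[field] = "NOASSERTION"
        else:
            resolved[field] = None  # optional: caller uses the build-level value
    return resolved
```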

  3. SPDX Tags in the sources files

For modules (not for third parties), source files are parsed for specific SPDX tags in a specific order:
first N lines of copyright text, then one line with the license tag. Like this:

// SPDX-FileCopyrightText: Copyright (c) Ken Martin, Will Schroeder, Bill Lorensen
// SPDX-FileCopyrightText: Copyright (c) Awesome contributor
// SPDX-License-Identifier: BSD-3-CLAUSE

The copyright text and license identifier are not validated in any way and are used as is.
If a source file does not contain the expected SPDX tags, a warning is emitted and the file is not parsed.
Please note that some generated files are automatically excluded from parsing.
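
For readers who want to check their own headers, here is a minimal Python sketch of the tag order described above. It is not the parser used by VTK; the name parse_spdx_tags is hypothetical, and treating any non-tag line inside the tag block as a failure is an assumption.

```python
import re

# Matches either tag anywhere on the line, so any comment prefix works.
_TAG = re.compile(r"SPDX-(FileCopyrightText|License-Identifier):\s*(.+?)\s*$")


def parse_spdx_tags(source_text):
    """Return (copyright_texts, license_identifier), or None if the
    expected block (N copyright lines, then one license line) is absent.

    Values are returned as is, without any validation.
    """
    copyrights = []
    for line in source_text.splitlines():
        match = _TAG.search(line)
        if match is None:
            if copyrights:
                return None  # tag block interrupted before the license tag
            continue  # still looking for the start of the block
        kind, value = match.groups()
        if kind == "FileCopyrightText":
            copyrights.append(value)
        else:
            # The license tag must come after at least one copyright line.
            return (copyrights, value) if copyrights else None
    return None
```

A caller would emit a warning and skip the file whenever None is returned.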

Note: It would be possible to add a dedicated tag to identify that a file should NOT be parsed for SPDX tags.

  4. Conclusion

All provided license information and copyrights are used to generate the document.
Regarding licenses, different identifiers are concatenated using the AND keyword.
Regarding copyrights, different copyright texts are appended to the section.
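
The combination step can be sketched in Python like this. The helper names are hypothetical, and dropping duplicates while keeping first-seen order is an assumption, chosen because it matches the "BSD-3-CLAUSE AND MIT" example further down.

```python
def combine_licenses(identifiers):
    """Join distinct license identifiers with the SPDX AND keyword.

    Assumption: duplicates and NOASSERTION entries are dropped, and
    NOASSERTION is returned when nothing usable remains.
    """
    unique = []
    for identifier in identifiers:
        if identifier and identifier != "NOASSERTION" and identifier not in unique:
            unique.append(identifier)
    return " AND ".join(unique) if unique else "NOASSERTION"


def combine_copyrights(texts):
    """Append distinct copyright texts into one <text> section."""
    unique = []
    for text in texts:
        if text and text != "NOASSERTION" and text not in unique:
            unique.append(text)
    body = "\n".join(unique) if unique else "NOASSERTION"
    return f"<text>\n{body}\n</text>"
```

Passing the module-declared identifier first, followed by the identifiers parsed from the files, gives the PackageLicenseConcluded value in this sketch.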

  5. Limitations and caveats

According to the SPDX spec, we should list every single parsed file in the SPDX document.
I have not done this yet as it was not specifically required. However, adding this feature should be fairly simple and may be done in the future.

Validation of copyright texts and license identifiers could also be integrated, but may require external python modules to perform the evaluation.

The SPDX generation system does not and cannot replace the LICENSE_FILES mechanism.
Certain licenses require the license text to be shipped with the code/binaries, which SPDX does not provide.

The SPDX 2.2 specification was used because it is widely adopted and well documented, but switching to 2.3 will probably be needed in the long term.

  6. Examples

Example of a complete SPDX file for a module in VTK (once the module has been ported to the system):

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: IOPLY
DocumentNamespace: https://vtk.org/vtkIOPly
Creator: Tool: CMake
Created: 2023-05-16T16:08:29Z

##### Package: IOPLY

PackageName: IOPLY
SPDXID: SPDXRef-Package-IOPLY
PackageDownloadLocation: https://gitlab.kitware.com/vtk/vtk/-/tree/master/IO/PLY
FilesAnalyzed: true
PackageLicenseConcluded: BSD-3-CLAUSE
PackageLicenseDeclared: BSD-3-CLAUSE
PackageLicenseInfoFromFiles: BSD-3-CLAUSE
PackageCopyrightText: <text>
Copyright (c) Ken Martin, Will Schroeder, Bill Lorensen
</text>

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-IOPLY

Example of an SPDX file generated without any information, for a module that has not been ported to the system:

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: vtkFiltersVerdict
DocumentNamespace: https://vtk.org/vtkFiltersVerdict
Creator: Tool: CMake
Created: 2023-05-25T15:16:20Z

##### Package: vtkFiltersVerdict

PackageName: vtkFiltersVerdict
SPDXID: SPDXRef-Package-vtkFiltersVerdict
PackageDownloadLocation: https://gitlab.kitware.com/vtk/vtk/-/tree/master/Filters/Verdict
FilesAnalyzed: false
PackageLicenseConcluded: NOASSERTION
PackageLicenseDeclared: NOASSERTION
PackageLicenseInfoFromFiles: NOASSERTION
PackageCopyrightText: <text>
NOASSERTION
</text>

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-vtkFiltersVerdict

Example of a complete SPDX file for a 3rd party in VTK (once the 3rd party has been ported to the system):

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: VTK::loguru
DocumentNamespace: https://vtk.org/vtkloguru
Creator: Tool: CMake
Created: 2023-05-22T15:56:52Z

##### Package: VTK::loguru

PackageName: VTK::loguru
SPDXID: SPDXRef-Package-VTK::loguru
PackageDownloadLocation: https://github.com/Delgan/loguru
FilesAnalyzed: false
PackageLicenseConcluded: BSD-3-Clause
PackageLicenseDeclared: BSD-3-Clause
PackageLicenseInfoFromFiles: NOASSERTION
PackageCopyrightText: <text>
LOGURU Team
</text>

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-VTK::loguru

Example of a complete SPDX file for a VTK module from outside of VTK (once the module has been ported to the system):

SPDXVersion: SPDX-2.2
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: MyModule
DocumentNamespace: https://my-website/MyModule
Creator: Tool: CMake
Created: 2023-05-16T16:08:29Z

##### Package: MyModule

PackageName: MyModule
SPDXID: SPDXRef-Package-MyModule
PackageDownloadLocation: https://github/myorg/mymodule
FilesAnalyzed: true
PackageLicenseConcluded: BSD-3-CLAUSE AND MIT
PackageLicenseDeclared: BSD-3-CLAUSE
PackageLicenseInfoFromFiles: BSD-3-CLAUSE AND MIT
PackageCopyrightText: <text>
Copyright (c) 2023 Popeye
Copyright (c) 2023 Wayne "The Dock" Sonjhon
</text>

Relationship: SPDXRef-DOCUMENT DESCRIBES SPDXRef-Package-MyModule

  7. Resources:

The SPDX generation system has just been merged into VTK. You can read more about it here:

https://docs.vtk.org/en/latest/advanced/spdx_and_sbom.html

SPDX information has been added to all files where it matters and all “standard” VTK headers have been replaced by SPDX headers:

This concludes my work on SPDX in VTK for now, although many issues remain open and the system could be improved much further:
https://gitlab.kitware.com/vtk/vtk/-/issues/?sort=created_date&state=opened&search=SPDX&first_page_size=20

Please make sure to rebase any of your ongoing MRs and branches on the latest master before merging anything.
