Data Model
The data model of the Open Delivery Gear is designed to correlate typed metadata from multiple sources with the Artefacts that the metadata relates to. Artefacts can be either OCM Artefacts (i.e. Designtime Artefacts) or Runtime Artefacts. They are referenced using OCM coordinates with optional extensions.
At its core, the Open Delivery Gear’s data model consists of the
ArtefactMetadata meta-type, which allows describing such metadata and
correlating it to an Artefact. An ArtefactMetadata entry is the output of an
extension. It is uploaded to the Delivery-Database via the Delivery-Service and
may then be used for further processing and reporting. In its most basic form,
it consists of an Artefact, some Metadata and an extension-specific
Payload (see Fig. 1). The model is defined in the odg.model module of
the delivery-service (ref).
Fig. 1: Artefact Metadata Model
The Artefact is used as a correlation-id to identify what the Payload belongs to, e.g. an OCI image, some source code or a Kubernetes cluster. It may also be used to group multiple Payloads together. The Payload in turn holds the actual content the extension has created, for example a finding, some informational data or some metadata (see Fig. 2).
Fig. 2: General Overview
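The three parts described above can be sketched as plain Python dataclasses. This is only an illustration of the structure: the class names are hypothetical, the field names mirror the YAML examples in this document, and the actual definitions in odg.model may differ.

```python
import dataclasses
import typing


@dataclasses.dataclass
class LocalArtefact:
    # identifies the artefact within its component (hypothetical class name)
    artefact_name: str
    artefact_type: str
    artefact_version: typing.Optional[str] = None
    artefact_extra_id: dict = dataclasses.field(default_factory=dict)


@dataclasses.dataclass
class Artefact:
    # correlation-id: says what the payload belongs to
    component_name: str
    artefact_kind: str  # 'resource', 'source' or 'runtime'
    artefact: LocalArtefact
    component_version: typing.Optional[str] = None


@dataclasses.dataclass
class Meta:
    datasource: str  # where the payload comes from
    type: str        # what kind of payload it is


@dataclasses.dataclass
class ArtefactMetadataSketch:
    artefact: Artefact
    meta: Meta
    data: dict  # extension-specific payload


entry = ArtefactMetadataSketch(
    artefact=Artefact(
        component_name='example.org/my-component',
        component_version='0.1.0',
        artefact_kind='resource',
        artefact=LocalArtefact(
            artefact_name='my-image',
            artefact_version='0.1.0',
            artefact_type='ociImage',
        ),
    ),
    meta=Meta(datasource='bdba', type='meta/artefact_scan_info'),
    data={},
)

# nested dataclasses are converted recursively into plain dicts
as_dict = dataclasses.asdict(entry)
```

Converting the entry with dataclasses.asdict yields the same nesting as the YAML examples, which makes it easy to compare a sketch like this against real entries.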
Artefact
The Artefact identifies what the Payload belongs to. Artefacts can generally be divided into two groups:
Designtime Artefacts (e.g. OCI images, Helm charts, source code)
Designtime artefacts include those artefacts which are statically available right after the build. Commonly, these artefacts are already modelled via OCM as resources or sources and can be directly translated into the artefact model of the ArtefactMetadata. The supported artefact_kinds are therefore resource and source.

OCM component descriptor (excerpt)

    meta:
      schemaVersion: v2
    component:
      name: example.org/my-component
      version: 0.1.0
      resources: # might be `sources` as well
        - name: my-image
          version: 0.1.0
          type: ociImage
          extraIdentity:
            version: 0.1.0

Derived artefact of ArtefactMetadata

    artefact:
      component_name: example.org/my-component
      component_version: 0.1.0
      artefact_kind: resource # might be `source` as well
      artefact:
        artefact_name: my-image
        artefact_version: 0.1.0
        artefact_type: ociImage
        artefact_extra_id:
          version: 0.1.0
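The translation from an OCM component descriptor to the artefact model can be sketched as a small function. This is a minimal illustration operating on plain dicts (the parsed descriptor); derive_artefacts is a hypothetical helper and not part of odg.model.

```python
def derive_artefacts(component_descriptor: dict) -> list:
    '''
    Translates all resources and sources of a parsed OCM component descriptor
    into the `artefact` structure used by ArtefactMetadata.
    '''
    component = component_descriptor['component']
    artefacts = []

    # `resources` map to artefact_kind `resource`, `sources` to `source`
    for section, artefact_kind in (('resources', 'resource'), ('sources', 'source')):
        for ocm_artefact in component.get(section, []):
            artefacts.append({
                'component_name': component['name'],
                'component_version': component['version'],
                'artefact_kind': artefact_kind,
                'artefact': {
                    'artefact_name': ocm_artefact['name'],
                    'artefact_version': ocm_artefact['version'],
                    'artefact_type': ocm_artefact['type'],
                    'artefact_extra_id': ocm_artefact.get('extraIdentity', {}),
                },
            })

    return artefacts


descriptor = {
    'meta': {'schemaVersion': 'v2'},
    'component': {
        'name': 'example.org/my-component',
        'version': '0.1.0',
        'resources': [{
            'name': 'my-image',
            'version': '0.1.0',
            'type': 'ociImage',
            'extraIdentity': {'version': '0.1.0'},
        }],
    },
}

derived = derive_artefacts(descriptor)
```

For the descriptor excerpt shown above, the function returns a single entry that matches the derived artefact listed in this section.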
Runtime Artefacts (e.g. Kubernetes clusters, hyperscaler resources)
Runtime artefacts cannot be statically modelled via OCM as they are ephemeral in nature and not related to the build process. Hence, those kinds of artefacts have to be modelled more individually. When defining the model, it is important that an artefact can be identified unambiguously and that related artefacts can be grouped together (i.e. there must be some shared properties, e.g. the artefact_type). Some already existing examples:

artefact as modelled by the Diki extension

    artefact:
      component_name: example.org/my-landscape-component # OCM component name of the landscape
      component_version: 0.1.0 # current version of the landscape
      artefact_kind: runtime
      artefact:
        artefact_name: managed-seeds # group of Kubernetes clusters, might also be a project etc.
        artefact_version: diki # Diki does not specify an actual version here
        artefact_type: dikiReport # Diki does not specify multiple artefact types

artefact as modelled by the Inventory extension

    artefact:
      component_name: example.org/my-landscape-component # OCM component name of the landscape
      artefact_kind: runtime
      artefact:
        artefact_name: instance-abc # instance-id of a hyperscale resource
        artefact_type: aws/virtual-machine # Inventory uses different artefact types here
        artefact_extra_id:
          account_id: 0123456789
          region_name: eu-west-1
          vpc_id: vpc-0123456789
When defining how to set the artefact properties, it is important to consider
that this correlation-id is used to find related data or to create logical
groups, for example to group items into the same issue as part of the GitHub
issue reporting. The attributes used for this kind of grouping can be
configured freely, but the content of the included properties must be
“stable”. That means it might not be beneficial to include a version property
or a temporary instance-id as a grouping-relevant property, as this would
prevent correlating the same Payload across multiple versions or instances.
As a result, initial discovery dates would, for example, be re-written, or new
GitHub issues would be created instead of existing ones being updated. In the
examples above, grouping constellations which proved to be favorable are
highlighted.
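The effect of choosing stable versus unstable grouping attributes can be illustrated with a short sketch. The helper group_key and the dotted attribute paths are assumptions made for this example; they do not reflect how grouping is actually configured in the Open Delivery Gear.

```python
def lookup(artefact: dict, dotted_path: str):
    # resolves e.g. 'artefact.artefact_name' within the nested artefact dict
    value = artefact
    for part in dotted_path.split('.'):
        value = value.get(part) if isinstance(value, dict) else None
    return value


def group_key(artefact: dict, attributes: tuple) -> tuple:
    # artefacts with the same key end up in the same logical group
    return tuple(lookup(artefact, attribute) for attribute in attributes)


def image(version: str) -> dict:
    return {
        'component_name': 'example.org/my-component',
        'component_version': version,
        'artefact_kind': 'resource',
        'artefact': {
            'artefact_name': 'my-image',
            'artefact_version': version,
            'artefact_type': 'ociImage',
        },
    }


STABLE = (
    'component_name',
    'artefact_kind',
    'artefact.artefact_name',
    'artefact.artefact_type',
)
UNSTABLE = STABLE + ('artefact.artefact_version',)

# the same image in two consecutive versions
old, new = image('0.1.0'), image('0.2.0')

same_group_stable = group_key(old, STABLE) == group_key(new, STABLE)
same_group_unstable = group_key(old, UNSTABLE) == group_key(new, UNSTABLE)
```

With the stable attributes, both versions of the image fall into the same group, so an existing issue would be updated. As soon as the version is part of the key, every new version forms a new group.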
Metadata
In general, the meta field holds information on where the Payload
comes from (datasource) and what type of Payload it is (type). In
most cases, the datasource is equivalent to the name of the extension. Both
the datasource and the type share a global namespace. Regarding the type,
three kinds of datatypes can be distinguished:
Meta Types
Those datatypes are not directly related to any type of finding or a single extension, but are rather used internally by the Open Delivery Gear. Most likely, a new extension does not have to define any of those datatypes. The most prominent one is meta/artefact_scan_info, which must be emitted by an extension for every processed Artefact to indicate that it has been successfully processed. It also contains general information on the last execution (e.g. a timestamp or a reference) (see Artefact Scan Info for an example). The relationship of a meta type and an Artefact is usually 1:1.
Examples: meta/artefact_scan_info, meta/responsibles
Finding Types
Finding types describe deviations from a desired state defined by a ruleset, for example the presence of a known vulnerability. Findings can also be assigned a certain “severity”. As findings usually have to be resolved within a certain timeframe, those ArtefactMetadata entries also have to provide an initial Discovery Date together with their allowed_processing_time. To have more control over the assignees when reporting via GitHub issues, the responsibles detected by the extension can also be added to the meta field to overwrite the default fallback (see issue-replicator extension). The relationship of findings and an Artefact is typically n:1.
Informational Types
If an extension collects data for a certain Artefact which is not considered to be a finding, it should be modelled as an informational datatype. The information might be used to enrich the reported findings. For example, in the context of vulnerabilities, an additional informational type holds the detected file paths so that the package location can be added to the reporting afterwards. In this case, the information is not part of the Payload of the finding type because the relationship of file paths to vulnerability findings is n:n.
To create a mapping between the Datasource and the Datatypes it emits (and
vice-versa), the respective util functions datasource() and datatypes()
must be updated as well.
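The idea of a two-way mapping between datasources and datatypes can be sketched as follows. This is not the implementation of the datasource() and datatypes() functions mentioned above, whose signatures are not shown in this document. The helper names, the datasource my-extension and the datatypes finding/example and info/example are hypothetical.

```python
# hypothetical registry: which datasource emits which datatypes
DATATYPES_BY_DATASOURCE = {
    'bdba': ('meta/artefact_scan_info', 'finding/example'),
    'my-extension': ('meta/artefact_scan_info', 'info/example'),
}


def datatypes_for(datasource: str) -> tuple:
    # datasource -> datatypes it emits
    return DATATYPES_BY_DATASOURCE[datasource]


def datasources_for(datatype: str) -> tuple:
    # datatype -> datasources emitting it (the reverse direction)
    return tuple(
        datasource
        for datasource, datatypes in DATATYPES_BY_DATASOURCE.items()
        if datatype in datatypes
    )
```

Keeping both directions derived from a single registry avoids the two lookups drifting apart when a new extension is added.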
Payload
The schema of the Payload, model-wise referred to as data, can be
individually defined by the extension to store the actual content. Therefore,
it is necessary to add a new dataclass with the desired structure for each
Datatype. However, type-definitions must be consistent for each model-element
of the same Datatype. Afterwards, this new dataclass must be added to the
list of allowed types for the data property of the ArtefactMetadata model
class.
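As a rough illustration of these two steps, the sketch below defines one dataclass per Datatype and registers both in a union of allowed types. All class and field names are hypothetical, and the way odg.model actually declares the allowed types of the data property may differ.

```python
import dataclasses
import typing


@dataclasses.dataclass
class ExampleFinding:
    # hypothetical payload of a finding datatype
    rule_id: str
    severity: str
    summary: str


@dataclasses.dataclass
class ExampleInfo:
    # hypothetical payload of an informational datatype
    file_paths: typing.List[str]


# hypothetical list of allowed types for the `data` property; a new
# payload dataclass has to be added here to be accepted
AllowedData = typing.Union[ExampleFinding, ExampleInfo, dict]


def validate_data(data):
    # rejects payloads whose type has not been registered
    if not isinstance(data, typing.get_args(AllowedData)):
        raise TypeError(f'unsupported payload type: {type(data).__name__}')
    return data


finding = validate_data(ExampleFinding(
    rule_id='rule-1',
    severity='HIGH',
    summary='example deviation',
))
```

Using one dataclass per Datatype keeps the type definitions consistent across all entries of that Datatype, as required above.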
Key
To be able to unambiguously identify already existing database entries, it is
required for each ArtefactMetadata instance to define a unique key
property. This key always consists of the artefact, Datasource,
Datatype as well as the key defined by the data class (if there is any).
This means, in case it is expected that there may be multiple entries per tuple
of artefact, Datasource and Datatype, the new class must define a unique
key property as well.
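How such a key could be composed is sketched below. The hashing scheme, the function metadata_key and the ExampleFinding class are assumptions made for illustration; the actual key derivation in the delivery-service may be different.

```python
import dataclasses
import hashlib
import json


@dataclasses.dataclass
class ExampleFinding:
    # hypothetical payload; several findings may exist per artefact
    rule_id: str
    resource_name: str
    summary: str

    @property
    def key(self) -> str:
        # only properties which identify the finding belong into the key
        return f'{self.rule_id}|{self.resource_name}'


def metadata_key(artefact: dict, datasource: str, datatype: str, data) -> str:
    # payloads without a `key` property contribute an empty string
    data_key = getattr(data, 'key', '')
    raw = json.dumps([artefact, datasource, datatype, data_key], sort_keys=True)
    return hashlib.sha1(raw.encode('utf-8')).hexdigest()


artefact = {'component_name': 'example.org/my-component', 'artefact_kind': 'resource'}

key_a = metadata_key(
    artefact, 'my-extension', 'finding/example',
    ExampleFinding('rule-1', 'pod-a', 'first summary'),
)
key_a_updated = metadata_key(
    artefact, 'my-extension', 'finding/example',
    ExampleFinding('rule-1', 'pod-a', 'reworded summary'),
)
key_b = metadata_key(
    artefact, 'my-extension', 'finding/example',
    ExampleFinding('rule-2', 'pod-a', 'first summary'),
)
```

In this sketch, rewording the summary leaves the key unchanged so that the existing entry is updated, whereas a different rule yields a separate entry for the same artefact.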
Note
See gardener/cc-utils#1166
as an example for this chapter. Please note that the dso.model module in
the pull request has been replaced by the odg.model module in the
delivery-service.
Discovery Date
Findings (deviations from rulesets) typically have to be processed within an
allowed timeframe. Hence, the date of first discovery is stored to allow the
latest due date to be calculated. For this to work, the initial discovery_date
must be retained during subsequent updates, which is why the discovery_date is
part of the ArtefactMetadata model. To re-use the initial discovery_date of
a finding instead of resetting it with every new scan, it must be defined
when two findings are to be interpreted as equal, so that the discovery_date
of the existing finding is re-used.
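The due-date calculation can be illustrated with a short sketch. It assumes that the allowed processing time is available as a number of days; the actual representation of allowed_processing_time is not specified in this document, and latest_due_date is a hypothetical helper.

```python
import datetime


def latest_due_date(
    discovery_date: datetime.date,
    allowed_processing_days: int,
) -> datetime.date:
    # the due date is always based on the date of *first* discovery
    return discovery_date + datetime.timedelta(days=allowed_processing_days)


first_discovery = datetime.date(2024, 1, 1)
rescan = datetime.date(2024, 1, 20)

due = latest_due_date(first_discovery, allowed_processing_days=30)

# if the discovery date were reset on every scan, the due date would move
due_if_reset = latest_due_date(rescan, allowed_processing_days=30)
```

The second calculation shows why the initial date has to be retained: resetting it on a re-scan would silently postpone the due date.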
Considerations
In the most trivial case, two findings are equal when their data key is equal.
However, there might be cases where this is not enough. For vulnerability
findings, for example, the discovery_date must be re-used if the CVE
and the package are the same, even if the package-version (which is part of the
data key) changes. Therefore, the behaviour must be defined in the
PUT /artefacts/metadata route
(see open-component-model/delivery-service@6697e50
as an example of how to define this behaviour). If it is not defined, the
discovery_date will always be taken as it is defined in the new
ArtefactMetadata entry.
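The following sketch illustrates the kind of equality rule described here. It is not the logic of the PUT /artefacts/metadata route: the flat entry layout, the datatype name finding/vulnerability and the field names cve and package_name are assumptions, and the entries are assumed to belong to the same artefact already.

```python
import datetime


def equality_key(entry: dict) -> tuple:
    # vulnerability findings: ignore the package version (assumed field names)
    if entry['type'] == 'finding/vulnerability':
        return (entry['type'], entry['data']['cve'], entry['data']['package_name'])
    # default: findings are equal if their data key is equal
    return (entry['type'], entry['data_key'])


def with_retained_discovery_date(new_entry: dict, existing_entries: list) -> dict:
    dates = [
        existing['discovery_date']
        for existing in existing_entries
        if equality_key(existing) == equality_key(new_entry)
    ]
    if not dates:
        # no equal finding known yet, keep the date of the new entry
        return new_entry
    return {**new_entry, 'discovery_date': min(dates)}


existing = [{
    'type': 'finding/vulnerability',
    'data_key': 'CVE-0000-0001|libfoo|1.0.0',
    'data': {'cve': 'CVE-0000-0001', 'package_name': 'libfoo', 'package_version': '1.0.0'},
    'discovery_date': datetime.date(2024, 1, 1),
}]

# same CVE and package, but the package version (and thus the data key) changed
rescanned = {
    'type': 'finding/vulnerability',
    'data_key': 'CVE-0000-0001|libfoo|1.1.0',
    'data': {'cve': 'CVE-0000-0001', 'package_name': 'libfoo', 'package_version': '1.1.0'},
    'discovery_date': datetime.date(2024, 3, 1),
}

updated = with_retained_discovery_date(rescanned, existing)
```

Although the data key differs between the two entries, the re-scanned finding keeps the original discovery date because the CVE and the package are unchanged.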
Examples
Artefact Scan Info
    artefact:
      component_name: example.org/my-component
      component_version: 0.1.0
      artefact_kind: resource
      artefact:
        artefact_name: my-image
        artefact_version: 0.1.0
        artefact_type: ociImage
        artefact_extra_id:
          version: 0.1.0
    meta:
      type: meta/artefact_scan_info
      datasource: bdba # name of the new extension
    data: {} # optional properties describing the scan