Data Model
The data model of the Open Delivery Gear correlates typed metadata from multiple sources with the Artefacts that metadata relates to. Artefacts are either OCM Artefacts (i.e. Designtime Artefacts) or Runtime Artefacts. They are referenced using OCM coordinates with optional extensions.
At its core, the Open Delivery Gear’s data model consists of the ArtefactMetadata meta-type, which allows describing such metadata and correlating it to an Artefact. It is the output of an extension, which is uploaded to the Delivery-Database via the Delivery-Service and may then be used for further processing and reporting. In its most basic form, it consists of an Artefact, some Metadata and an extension-specific Payload (see Fig. 1). The model is defined in the odg.model module of the delivery-service (ref).
Fig. 1: Artefact Metadata Model
The Artefact is used as a correlation-id to identify what the Payload belongs to, e.g. an OCI image, some source code or a Kubernetes cluster. It may also be used to group multiple Payloads together. The Payload in turn holds the actual content the extension has created, for example a finding, some informational data or some metadata (see Fig. 2).
Fig. 2: General Overview
Artefact
The Artefact identifies what the Payload belongs to. Artefacts can generally be divided into two groups:
Designtime Artefacts (e.g. OCI images, Helm charts, source code)
Designtime artefacts include those artefacts which are statically available right after the build. Commonly, these artefacts are already modelled via OCM as resources or sources and can be directly translated into the artefact model of the ArtefactMetadata. The supported artefact_kinds are therefore resource and source.
OCM component descriptor (excerpt)
meta:
  schemaVersion: v2
component:
  name: example.org/my-component
  version: 0.1.0
  resources: # might be `sources` as well
    - name: my-image
      version: 0.1.0
      type: ociImage
      extraIdentity:
        version: 0.1.0
Derived artefact of ArtefactMetadata
artefact:
  component_name: example.org/my-component
  component_version: 0.1.0
  artefact_kind: resource # might be `source` as well
  artefact:
    artefact_name: my-image
    artefact_version: 0.1.0
    artefact_type: ociImage
    artefact_extra_id:
      version: 0.1.0
Runtime Artefacts (e.g. Kubernetes clusters, hyperscaler resources)
Runtime artefacts cannot be statically modelled via OCM as they are ephemeral in nature and not related to the build process. Hence, those kinds of artefacts have to be modelled more individually. When defining the model, it must be possible to unambiguously identify an artefact and to group related artefacts together (i.e. there must be some shared properties, e.g. the artefact_type). Some existing examples:
artefact as modelled by the Diki extension
artefact:
  component_name: example.org/my-landscape-component # OCM component name of the landscape
  component_version: 0.1.0 # current version of the landscape
  artefact_kind: runtime
  artefact:
    artefact_name: managed-seeds # group of Kubernetes clusters, might also be a project etc.
    artefact_version: diki # Diki does not specify an actual version here
    artefact_type: dikiReport # Diki does not specify multiple artefact types
artefact as modelled by the Inventory extension
artefact:
  component_name: example.org/my-landscape-component # OCM component name of the landscape
  artefact_kind: runtime
  artefact:
    artefact_name: instance-abc # instance-id of a hyperscaler resource
    artefact_type: aws/virtual-machine # Inventory uses different artefact types here
    artefact_extra_id:
      account_id: 0123456789
      region_name: eu-west-1
      vpc_id: vpc-0123456789
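For designtime artefacts, the translation from an OCM `resources` or `sources` entry into the `artefact` structure shown above can be sketched roughly as follows. This is a minimal illustration working on plain dictionaries. The function name `artefact_from_ocm` is hypothetical and not part of odg.model, and the real model uses its own classes.

```python
def artefact_from_ocm(component: dict, entry: dict, artefact_kind: str = "resource") -> dict:
    """Translate one OCM `resources`/`sources` entry into the `artefact` structure.

    Hypothetical helper for illustration only (not part of odg.model).
    """
    if artefact_kind not in ("resource", "source"):
        raise ValueError(f"unsupported artefact_kind for designtime artefacts: {artefact_kind}")

    return {
        "component_name": component["name"],
        "component_version": component["version"],
        "artefact_kind": artefact_kind,
        "artefact": {
            "artefact_name": entry["name"],
            "artefact_version": entry["version"],
            "artefact_type": entry["type"],
            # extraIdentity is optional in OCM, so fall back to an empty mapping
            "artefact_extra_id": dict(entry.get("extraIdentity", {})),
        },
    }


# The `component` part of the descriptor excerpt from above
component = {
    "name": "example.org/my-component",
    "version": "0.1.0",
    "resources": [
        {
            "name": "my-image",
            "version": "0.1.0",
            "type": "ociImage",
            "extraIdentity": {"version": "0.1.0"},
        },
    ],
}

artefact = artefact_from_ocm(component, component["resources"][0])
```

Runtime artefacts have no such one-to-one source, which is why their properties have to be chosen by the extension itself.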
When defining how to set the artefact properties, it is important to consider that this correlation-id is used to find related data or to create logical groups, which may be used, for example, to group items into the same issue as part of the GitHub issue reporting. The attributes used for this kind of grouping can be configured freely, but the content of the included properties must be “stable”. That is, it may not be beneficial to include a version property or a temporary instance-id as a grouping-relevant property, as this would prevent correlating the same Payload across multiple versions or instances, ultimately causing, for example, initial discovery dates to be rewritten or new GitHub issues to be created instead of existing ones being updated. In the examples above, grouping constellations which proved to be favorable are highlighted.
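The effect of choosing stable versus unstable grouping attributes can be illustrated with a small sketch. The helper names `group_key` and `group_artefacts` are hypothetical, as is the way the attributes are passed. The actual grouping configuration of the Open Delivery Gear is not shown here.

```python
from collections import defaultdict


def group_key(artefact: dict, attributes: tuple) -> tuple:
    """Build a grouping key from the selected attributes (hypothetical helper)."""
    # Flatten the outer and the nested `artefact` level into one lookup table
    flat = {k: v for k, v in artefact.items() if k != "artefact"}
    flat.update(artefact.get("artefact", {}))
    return tuple(flat.get(attribute) for attribute in attributes)


def group_artefacts(artefacts: list, attributes: tuple) -> dict:
    """Group artefacts which share the same values for the selected attributes."""
    groups = defaultdict(list)
    for artefact in artefacts:
        groups[group_key(artefact, attributes)].append(artefact)
    return dict(groups)


def make_artefact(version: str) -> dict:
    return {
        "component_name": "example.org/my-component",
        "component_version": version,
        "artefact_kind": "resource",
        "artefact": {
            "artefact_name": "my-image",
            "artefact_version": version,
            "artefact_type": "ociImage",
        },
    }


artefacts = [make_artefact("0.1.0"), make_artefact("0.2.0")]

# Stable attributes: both versions of the image end up in the same group
stable = ("component_name", "artefact_kind", "artefact_name", "artefact_type")
stable_groups = group_artefacts(artefacts, stable)

# Including the version splits the group, e.g. leading to a new GitHub issue per version
unstable_groups = group_artefacts(artefacts, stable + ("artefact_version",))
```

With the stable attributes, both versions share one group. Adding the version yields one group per version, which is the situation the paragraph above warns about.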
Metadata
In general, the meta field holds information on where the Payload comes from (datasource) and what type of Payload it is (type). In most cases, the datasource is equivalent to the name of the extension. Both the datasource and the type share a global namespace. Regarding the type, three kinds of datatypes can be distinguished:
Meta Types
Those datatypes are not directly related to any type of finding or a single extension, but are rather used internally by the Open Delivery Gear. Most likely, a new extension does not have to define any of those datatypes. The most prominent one is meta/artefact_scan_info, which must be emitted by an extension for every processed Artefact to indicate that it has been successfully processed. It also contains general information on the last execution (e.g. a timestamp or a reference) (see Artefact Scan Info for an example). The relationship of a meta type and an Artefact is usually 1:1.
Examples: meta/artefact_scan_info, meta/responsibles
Finding Types
Finding types describe deviations from a desired state defined by a ruleset, for example the presence of a known vulnerability. Finding types can also be assigned a certain “severity”. As findings usually have to be resolved within a certain timeframe, those ArtefactMetadata entries also have to provide an initial Discovery Date together with their allowed_processing_time. To have more control over the assignees when reporting via GitHub issues, the responsibles detected by the extension can also be added to the meta field to overwrite the default fallback (see issue-replicator extension). The relationship of findings and an Artefact is typically n:1.
Informational Types
If an extension collects data for a certain Artefact which is not considered to be a finding, it should be modelled as an informational datatype. The information might be used to enrich the reported findings. For example, in the context of vulnerabilities, an additional informational type holds information on the detected file paths to add the package location to the reporting afterwards. In this case, the information is not already part of the Payload of the finding type because the relationship of file paths to vulnerability findings is n:n.
To create a mapping between the Datasource and the Datatypes it emits (and vice-versa), the respective util functions datasource() and datatypes() must be updated as well.
Payload
The schema of the Payload, model-wise referred to as data, can be individually defined by the extension to store the actual content. Therefore, it is necessary to add a new dataclass with the desired structure for each Datatype. However, type definitions must be consistent for each model element of the same Datatype. Afterwards, this new dataclass must be added to the list of allowed types for the data property of the ArtefactMetadata model class.
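As a rough sketch of what this could look like, the following defines a payload dataclass for a made-up datatype and registers it in the type annotation of a simplified stand-in for the model class. All names here (`ExampleFinding`, `ExampleArtefactMetadata`, the datatype `finding/example`, the datasource `my-extension`) are hypothetical. The real ArtefactMetadata class in odg.model has more fields and may be structured differently.

```python
import dataclasses
import typing


@dataclasses.dataclass
class ExampleFinding:
    """Hypothetical payload for the made-up datatype `finding/example`."""
    package_name: str
    package_version: str
    rule_id: str


@dataclasses.dataclass
class ExampleArtefactMetadata:
    """Simplified stand-in for the model class, NOT the real odg.model class."""
    artefact: dict
    meta: dict
    # The "list of allowed types": every new payload dataclass is added here
    data: typing.Union[ExampleFinding, dict]


entry = ExampleArtefactMetadata(
    artefact={
        "component_name": "example.org/my-component",
        "artefact_kind": "resource",
        "artefact": {"artefact_name": "my-image", "artefact_type": "ociImage"},
    },
    meta={"type": "finding/example", "datasource": "my-extension"},
    data=ExampleFinding(
        package_name="some-package",
        package_version="1.0.0",
        rule_id="rule-1",
    ),
)

# asdict() recurses into the nested payload dataclass
serialised = dataclasses.asdict(entry)
```

Keeping one dataclass per Datatype is what keeps the type definitions consistent across all entries of that Datatype.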
Key
To be able to unambiguously identify already existing database entries, each ArtefactMetadata instance is required to define a unique key property. This key always consists of the artefact, Datasource and Datatype, as well as the key defined by the data class (if there is any). This means that, if multiple entries per tuple of artefact, Datasource and Datatype are expected, the new class must define a unique key property as well.
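How such a composite key might be assembled is sketched below. The separator, the string representation of the artefact and the names `metadata_key`, `ExampleFinding` and `ExampleScanInfo` are assumptions made for illustration. The actual key format used by the delivery-service is not described here.

```python
import dataclasses


@dataclasses.dataclass
class ExampleFinding:
    """Hypothetical payload with n:1 relationship to an artefact, so it needs a key."""
    package_name: str
    package_version: str
    rule_id: str

    @property
    def key(self) -> str:
        return "|".join((self.package_name, self.package_version, self.rule_id))


@dataclasses.dataclass
class ExampleScanInfo:
    """Hypothetical payload with 1:1 relationship to an artefact, so no key is needed."""
    report_url: str


def metadata_key(artefact_id: str, datasource: str, datatype: str, data: object) -> str:
    """Combine artefact, datasource, datatype and the optional data key (assumed format)."""
    parts = [artefact_id, datasource, datatype]
    data_key = getattr(data, "key", None)
    if data_key is not None:
        parts.append(data_key)
    return "|".join(parts)


artefact_id = "example.org/my-component:0.1.0/my-image"  # assumed representation

key_a = metadata_key(
    artefact_id, "my-extension", "finding/example",
    ExampleFinding("some-package", "1.0.0", "rule-1"),
)
key_b = metadata_key(
    artefact_id, "my-extension", "finding/example",
    ExampleFinding("other-package", "2.0.0", "rule-1"),
)
scan_key = metadata_key(
    artefact_id, "my-extension", "meta/artefact_scan_info",
    ExampleScanInfo("https://example.org/report"),
)
```

Without the `key` property on `ExampleFinding`, `key_a` and `key_b` would collide, and the second finding would be treated as an update of the first.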
Note
See gardener/cc-utils#1166 as an example for this chapter. Please note that the dso.model module in the pull request has been replaced by the odg.model module in the delivery-service.
Discovery Date
Findings (deviations from rulesets) typically have to be processed within an allowed timeframe. Hence, the date of first discovery is stored to allow the latest due date to be calculated. The initial discovery_date must therefore be retained during subsequent updates, which is why the discovery_date is part of the ArtefactMetadata model. To re-use the initial discovery_date of a finding instead of resetting it as part of every new scan, it must be defined when two findings are to be interpreted as equal, so that the discovery_date is re-used.
Considerations
In the most trivial case, findings are equal when the data key is equal. However, there might be cases where this is not enough. For example, for vulnerability findings, the discovery_date must be re-used if the CVE and the package are the same, even if the package version (which is part of the data key) changes. Therefore, the behaviour must be defined in the PUT /artefacts/metadata route (see open-component-model/delivery-service@6697e50 as an example of how to define this behaviour). If it is not defined, the discovery_date will always be taken as defined in the new ArtefactMetadata entry.
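The vulnerability case can be sketched as follows. This is not the code of the referenced commit. It is a simplified illustration on plain dictionaries, and the field names `cve` and `package_name`, the placeholder CVE identifiers and the helper names are assumptions.

```python
import datetime


def vulnerability_identity(data: dict) -> tuple:
    """Properties which make two vulnerability findings "equal" (assumed field names).

    The package version is deliberately left out, although it is part of the data key.
    """
    return (data["cve"], data["package_name"])


def resolve_discovery_date(new_entry: dict, existing_entries: list) -> datetime.date:
    """Re-use the earliest discovery_date of an equal finding, if there is one."""
    new_identity = vulnerability_identity(new_entry["data"])
    matching_dates = [
        existing["discovery_date"]
        for existing in existing_entries
        if vulnerability_identity(existing["data"]) == new_identity
    ]
    if matching_dates:
        return min(matching_dates)
    # No equal finding known: take the date as defined in the new entry
    return new_entry["discovery_date"]


existing_entries = [
    {
        "data": {"cve": "CVE-0000-0001", "package_name": "some-package", "package_version": "1.0.0"},
        "discovery_date": datetime.date(2024, 1, 10),
    },
]

# Same CVE and package, new package version: the initial date is retained
updated = {
    "data": {"cve": "CVE-0000-0001", "package_name": "some-package", "package_version": "1.1.0"},
    "discovery_date": datetime.date(2024, 3, 1),
}
retained_date = resolve_discovery_date(updated, existing_entries)

# Different CVE: this is a new finding with its own discovery date
unrelated = {
    "data": {"cve": "CVE-0000-0002", "package_name": "some-package", "package_version": "1.1.0"},
    "discovery_date": datetime.date(2024, 3, 1),
}
fresh_date = resolve_discovery_date(unrelated, existing_entries)
```

Here `retained_date` stays at the date of the first discovery, so the due date does not move when only the package version changes.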
Examples
Artefact Scan Info
artefact:
  component_name: example.org/my-component
  component_version: 0.1.0
  artefact_kind: resource
  artefact:
    artefact_name: my-image
    artefact_version: 0.1.0
    artefact_type: ociImage
    artefact_extra_id:
      version: 0.1.0
meta:
  type: meta/artefact_scan_info
  datasource: bdba # name of the new extension
data: {} # optional properties describing the scan