Version: 2025-02-27

Metadata

Metadata supports direct queries and data visualization of the VKG. It can be added to nodes to provide context beyond the text of the node. Metadata fields do not affect the node's text data or its embedding.

For formatting metadata, please refer to Formatting.

Example

The following is an example that can be sourced from an open music dataset.

Node ID | text | artist | year | genre
50c8a661-dcc4-4e31-a179-1aeea196d8de | care karma care space cadet gonna tell swim jacuzzi settle kick jump bite | "weird al" yankovic | 1983 | pop
1394ce98-d125-42da-a6dd-f5ef5f6d421b | yeah shady aftermath guess know everybody floor goin goin outta control | 50 cent | 2005 | pop
6eee81ee-8be1-4ad7-9e51-e8986db9ea7b | remember walk airport leave past travel beg want want piece piece collect grind | kelly clarkson | 2015 | pop
98c4c5f8-a440-43f9-916d-392f77d902ad | touch taste feel hate distrust save lyric commercial | pink floyd | 1973 | rock

Breaking this down, the Node ID and Text are required for all nodes. Artist, Year, Genre are metadata fields that provide more information on the Text field which, in this case, represents lyric excerpts.

With metadata, you can:

  • Filter returned nodes when searching through the VKG.
  • Include string data other than the text field.

Metadata Defaults

Each metadata field can be provided with a default value that will be used if any subsequent node insertions do not define that field.

If the default value is not provided, the API will default it to null.

Metadata Types

Each metadata field can be given a type, which is used to ensure data consistency: every node's value for that field must conform to the type. To get around mixing types in one field, try the one-hot encoding technique.

Currently, the supported types are:

  • number
  • datetime
  • string
  • boolean - "TRUE", "FALSE", or null
  • any

If the type is not specified, the API will default it to any.

For example, the JSON input

    {
      "metadata": {
        "field_one": {}
      }
    }

is defaulted to

    {
      "metadata": {
        "field_one": {
          "default": null,
          "type": "any"
        }
      }
    }
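The defaulting rule can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the API's implementation; the `normalize_schema` name is hypothetical.

```python
import json


def normalize_schema(schema):
    """Hypothetical helper: fill in the documented fallbacks for each field.

    A missing "default" becomes None (JSON null) and a missing "type" becomes "any".
    """
    normalized = {}
    for name, spec in schema.get("metadata", {}).items():
        normalized[name] = {
            "default": spec.get("default"),
            "type": spec.get("type", "any"),
        }
    return {"metadata": normalized}


print(json.dumps(normalize_schema({"metadata": {"field_one": {}}}), indent=2))
```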

number

Numbers are parsed as decimals/floats. Expect any integers to be "saved" as a decimal.

Examples:

Input | Output
0 | 0.0
42 | 42.0
1337.0 | 1337.0
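Python's built-in `float` behaves the same way on these inputs, so it serves as a convenient stand-in for checking values before sending them. The API's own parser may accept or reject other inputs differently; `coerce_number` is a hypothetical name.

```python
def coerce_number(value):
    """Hypothetical helper: parse a value as a float, so integers gain a decimal part."""
    return float(value)


for raw in ["0", "42", "1337.0"]:
    print(raw, "->", coerce_number(raw))
```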

datetime

Datetimes are parsed as Python datetime objects. This means the stored value will contain the year, month, day, hour, minute, second, and millisecond.

Datetimes should be provided in the ISO 8601 extended format: YYYY-MM-DDThh:mm:ss

Input | Output
0 | Invalid
01-01-2000 | Invalid
2000- | Invalid
2000 | 2000-01-01T00:00:00
2000-02 | 2000-02-01T00:00:00
2000-02-03 | 2000-02-03T00:00:00
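One way to pre-validate datetime strings locally is to try progressively shorter formats and pad the missing parts. The sketch below uses only the standard library; the list of accepted formats and the `parse_datetime` name are assumptions, not the API's actual parser.

```python
from datetime import datetime

# Formats tried from most to least specific; this list is an assumption.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def parse_datetime(value):
    """Hypothetical helper: return an ISO 8601 string, or None if the input is invalid.

    Missing month/day default to 01 and a missing time defaults to 00:00:00.
    """
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt).isoformat()
        except ValueError:
            continue
    return None


for raw in ["0", "01-01-2000", "2000-", "2000", "2000-02", "2000-02-03"]:
    print(raw, "->", parse_datetime(raw) or "Invalid")
```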

string

Strings are just text. If the type is not specified, the API will attempt to infer the type of the metadata field and will fall back to string when no other type fits.

Input | Output
0 | 0
Hello world | Hello world
{} | {}
!@#$%^&*() | !@#$%^&*()
(th!s_br{}k3 | (th!s_br{}k3

any

Type any treats all values as strings. It is recommended that any be used only as a fallback for metadata that doesn't fit into the current schema. Metadata of type any cannot be used to filter search results.

Data Type Inference

If the type is specified, inference will not happen and the default value will be coerced into the specified type. If the coercion fails, an error will be thrown.

If the type is not specified, the API will infer the data type and coerce the default value to the inferred type. Specify the type whenever you initialize or update a metadata schema to prevent data loss or inconsistency.

Input | Inferred Type
0 | number
2000 | datetime
True | string
{} | string
(th!s_br{}k3 | string
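The first four rows of the table are consistent with an inference order of datetime first, then number, then string as the fallback. The sketch below implements that guessed order; it is an assumption drawn from the table, not the API's documented algorithm, and `infer_type` is a hypothetical name. Booleans are not inferred here because the table shows True inferred as string.

```python
from datetime import datetime

# Accepted datetime shapes are an assumption for this sketch.
_DATETIME_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def _is_datetime(value):
    for fmt in _DATETIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_type(value):
    """Hypothetical helper: guess a metadata type from a raw string value."""
    if _is_datetime(value):   # "2000" parses as a year, so it wins over number
        return "datetime"
    if _is_number(value):
        return "number"
    return "string"           # fallback when nothing else fits


for raw in ["0", "2000", "True", "{}"]:
    print(raw, "->", infer_type(raw))
```

Because a bare four-digit value such as 2000 is read as a year, a numeric field holding such values is a case where declaring the type explicitly matters.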

VKG Metadata

The following explains how to incorporate metadata into the VKG.

Each VKG contains a metadata schema that defines the structure of its nodes' metadata. The schema can be initialized on VKG creation and modified at any time. Whenever nodes are added, they are coerced into the VKG's metadata schema in the following steps:

  1. Ignore all input fields that are undeclared in the schema.
  2. Coerce each node individually into the schema, in particular each field's type. If this coercion fails, use the default value.
  3. All valid or successfully coerced nodes will be written to the VKG (your VKG will now contain nodes with metadata).
  4. All invalid/unable-to-be-coerced nodes will be returned as failed_nodes for manual reformatting.

Note that metadata field names must be alphanumeric and must not contain spaces (use dashes or underscores in place of spaces).
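The insertion steps can be sketched as a small in-memory routine. This is an illustration under stated assumptions, not the API's implementation: `add_nodes` and `coerce_value` are hypothetical names, only the number and string/any types are handled, and a node is treated as failed when it lacks the required id or text.

```python
def coerce_value(value, type_name):
    """Hypothetical helper: coerce one value, raising ValueError/TypeError on failure."""
    if value is None:
        return None
    if type_name == "number":
        return float(value)
    return str(value)  # "string" and "any" are both kept as text in this sketch


def add_nodes(nodes, schema):
    """Return (written, failed_nodes) after coercing nodes into the schema."""
    written, failed_nodes = [], []
    for node in nodes:
        # Assumption: a node missing its required id or text cannot be written.
        if not node.get("id") or not node.get("text"):
            failed_nodes.append(node)
            continue
        metadata = {}
        # Iterating over the schema ignores any undeclared input fields (step 1).
        for name, spec in schema.items():
            raw = node.get("metadata", {}).get(name, spec["default"])
            try:
                metadata[name] = coerce_value(raw, spec["type"])
            except (ValueError, TypeError):
                metadata[name] = spec["default"]  # coercion failed (step 2)
        written.append({"id": node["id"], "text": node["text"], "metadata": metadata})
    return written, failed_nodes


schema = {
    "year": {"default": None, "type": "number"},
    "genre": {"default": "unknown", "type": "string"},
}
incoming = [
    {"id": "a", "text": "first", "metadata": {"year": "1983", "mood": "happy"}},
    {"id": "b", "text": "second", "metadata": {"year": "n/a", "genre": "rock"}},
    {"id": "c", "metadata": {"year": "2005"}},  # no text
]
written, failed_nodes = add_nodes(incoming, schema)
print(written)
print(failed_nodes)
```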

Metadata Schema Updates

When you update the VKG metadata schema, one of three cases applies to each field:

1. Inserting a field

All existing nodes will gain that new field and take on the default value/type that is specified.

2. Deleting a field

All existing nodes will have that field removed.

3. Editing a field

If the default value changes, none of the existing nodes will be modified. Nodes that were created under the old schema and took on its default keep that old default.

If the type changes, all existing nodes will be coerced into that type. If they fail to be coerced, they will take on the default value and those nodes will be returned as warning_nodes.
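The three cases can be sketched together as one in-memory update routine. As before, this is a hedged illustration rather than the API's implementation: `update_schema` and `coerce_value` are hypothetical names, and only the number and string/any types are handled.

```python
def coerce_value(value, type_name):
    """Hypothetical helper: coerce one value, raising ValueError/TypeError on failure."""
    if value is None:
        return None
    if type_name == "number":
        return float(value)
    return str(value)


def update_schema(nodes, old_schema, new_schema):
    """Apply a schema change to existing nodes in place; return warning_nodes."""
    warning_nodes = []
    for node in nodes:
        meta = node["metadata"]
        warned = False
        # Case 2: fields removed from the schema are removed from every node.
        for name in list(meta):
            if name not in new_schema:
                del meta[name]
        for name, spec in new_schema.items():
            if name not in old_schema:
                # Case 1: a new field takes the specified default.
                meta[name] = spec["default"]
            elif spec["type"] != old_schema[name]["type"]:
                # Case 3 (type change): coerce, falling back to the default.
                try:
                    meta[name] = coerce_value(meta.get(name), spec["type"])
                except (ValueError, TypeError):
                    meta[name] = spec["default"]
                    warned = True
            # Case 3 (default change only): existing values are left untouched.
        if warned:
            warning_nodes.append(node)
    return warning_nodes


existing = [
    {"id": "a", "metadata": {"year": "1983", "genre": "pop"}},
    {"id": "b", "metadata": {"year": "unknown", "genre": "rock"}},
]
old_schema = {
    "year": {"default": None, "type": "string"},
    "genre": {"default": None, "type": "string"},
}
new_schema = {
    "year": {"default": None, "type": "number"},     # type edited
    "label": {"default": "none", "type": "string"},  # inserted; genre deleted
}
warning_nodes = update_schema(existing, old_schema, new_schema)
print(existing)
print(warning_nodes)
```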

To reiterate:

  • If the default value is not specified, the API will default it to null.
  • If the type is not specified, the API will default it to any.
  • If neither the default value nor the type is specified, the API will default the value to null and the type to any.