Metadata
Metadata is a tool for direct queries and data visualization of the VKG. You can add metadata to nodes to provide context beyond the text of the node. Metadata fields do not affect the node's text data or its embedding.
For formatting metadata, please refer to Formatting.
Example
The following is an example that can be sourced from an open music dataset.
| Node ID | text | artist | year | genre |
|---|---|---|---|---|
| 50c8a661-dcc4-4e31-a179-1aeea196d8de | care karma care space cadet gonna tell swim jacuzzi settle kick jump bite | "weird al" yankovic | 1983 | pop |
| 1394ce98-d125-42da-a6dd-f5ef5f6d421b | yeah shady aftermath guess know everybody floor goin goin outta control | 50 cent | 2005 | pop |
| 6eee81ee-8be1-4ad7-9e51-e8986db9ea7b | remember walk airport leave past travel beg want want piece piece collect grind | kelly clarkson | 2015 | pop |
| 98c4c5f8-a440-43f9-916d-392f77d902ad | touch taste feel hate distrust save lyric commercial | pink floyd | 1973 | rock |
Breaking this down, the Node ID and text are required for all nodes. The artist, year, and genre columns are metadata fields that provide more information about the text field, which in this case holds lyric excerpts.
With metadata, you can:
- Filter returned nodes when searching through the VKG.
- Include string data other than the text field.
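To make the filtering idea concrete, here is a minimal sketch in plain Python over in-memory dictionaries. It is not the VKG API: the node layout and the `filter_nodes` helper are assumptions made for illustration, using the rows from the table above.

```python
# Hypothetical in-memory copies of the example nodes (not the VKG API).
nodes = [
    {"id": "50c8a661-dcc4-4e31-a179-1aeea196d8de",
     "text": "care karma care space cadet gonna tell swim jacuzzi settle kick jump bite",
     "metadata": {"artist": '"weird al" yankovic', "year": 1983, "genre": "pop"}},
    {"id": "1394ce98-d125-42da-a6dd-f5ef5f6d421b",
     "text": "yeah shady aftermath guess know everybody floor goin goin outta control",
     "metadata": {"artist": "50 cent", "year": 2005, "genre": "pop"}},
    {"id": "6eee81ee-8be1-4ad7-9e51-e8986db9ea7b",
     "text": "remember walk airport leave past travel beg want want piece piece collect grind",
     "metadata": {"artist": "kelly clarkson", "year": 2015, "genre": "pop"}},
    {"id": "98c4c5f8-a440-43f9-916d-392f77d902ad",
     "text": "touch taste feel hate distrust save lyric commercial",
     "metadata": {"artist": "pink floyd", "year": 1973, "genre": "rock"}},
]


def filter_nodes(nodes, **conditions):
    """Return nodes whose metadata equals every given field=value condition."""
    return [
        node for node in nodes
        if all(node["metadata"].get(k) == v for k, v in conditions.items())
    ]


pop_nodes = filter_nodes(nodes, genre="pop")
print([node["metadata"]["artist"] for node in pop_nodes])
```

The text field is never inspected here, which mirrors the point that metadata carries information separate from the node text.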
Metadata Defaults
Each metadata field can be provided with a default value that will be used if any subsequent node insertions do not define that field.
If the default value is not provided, the API will default it to null.
Metadata Types
Each metadata field can be given a type that enforces data consistency: every value stored in that field, across all nodes, must conform to the type. To get around mixing types in one field, try the one-hot encoding technique.
Currently, the supported types are:
- number
- datetime
- string
- any
If the type is not specified, the API will default it to any. Type any metadata cannot be used for filters.
For example, the JSON input
{
    "metadata": {
        "field_one": {}
    }
}
gets defaulted to
{
    "metadata": {
        "field_one": {
            "default": null,
            "type": "any"
        }
    }
}
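The fallback rules above can be sketched in a few lines of Python. This is an illustration only: `normalize_schema` is a hypothetical helper, not part of the API, and it deliberately ignores type inference.

```python
import json


def normalize_schema(schema: dict) -> dict:
    """Fill in the documented fallbacks: default -> None (JSON null), type -> "any"."""
    return {
        name: {"default": spec.get("default"), "type": spec.get("type", "any")}
        for name, spec in schema.items()
    }


# "field_one" declares nothing, so both fallbacks apply.
normalized = normalize_schema({"field_one": {}})
print(json.dumps({"metadata": normalized}, indent=4))
```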
number
Numbers are parsed as decimals/floats. Expect any integers to be "saved" as a decimal.
Examples:
| Input | Output |
|---|---|
| 0 | 0.0 |
| 42 | 42.0 |
| 1337.0 | 1337.0 |
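As a rough illustration, Python's built-in `float` reproduces this table. The `coerce_number` name is an assumption for this sketch; the API's actual parser may accept or reject other inputs.

```python
def coerce_number(value) -> float:
    """Coerce an input to a float; raises ValueError or TypeError if it is not numeric."""
    return float(value)


for raw in ("0", "42", "1337.0"):
    print(raw, "->", coerce_number(raw))
```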
datetime
Datetimes are parsed as Python datetime objects. This means the stored value will contain the year, month, day, hour, minute, second, and millisecond.
Datetimes should be provided in the ISO 8601 extended format: YYYY-MM-DDThh:mm:ss
| Input | Output |
|---|---|
| 0 | Invalid |
| 01-01-2000 | Invalid |
| 2000- | Invalid |
| 2000 | 2000-01-01T00:00:00 |
| 2000-02 | 2000-02-01T00:00:00 |
| 2000-02-03 | 2000-02-03T00:00:00 |
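One way to accept partial dates like those in the table is to try progressively shorter formats with `datetime.strptime`. This is a sketch under that assumption, not the API's actual parser; `parse_datetime` and the format list are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed set of accepted layouts, from most to least specific.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def parse_datetime(value: str) -> Optional[datetime]:
    """Return a datetime for a full or partial ISO 8601 string, or None if invalid."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, less specific format
    return None


for raw in ("0", "01-01-2000", "2000-", "2000", "2000-02", "2000-02-03"):
    parsed = parse_datetime(raw)
    print(raw, "->", parsed.isoformat() if parsed else "Invalid")
```

Missing components fall back to the first month, first day, and midnight, which matches the padded outputs shown for partial dates.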
string
Strings are just text. We recommend using type string over type any because string can be used in metadata filters.
| Input | Output |
|---|---|
| 0 | 0 |
| Hello world | Hello world |
| {} | {} |
| !@#$%^&*() | !@#$%^&*() |
| (th!s_br{}k3 | (th!s_br{}k3 |
any
Type any treats all values as strings. We recommend using any only as a fallback for metadata that doesn't fit into the current schema. Metadata of type any cannot be used to filter search results.
Data Type Inference
If type is specified, inference will not happen and the default value will be coerced into the specified type. If the coercion fails, an error message will be thrown.
If type is not specified, the API will infer the data type and coerce the default value to the inferred type. Specify the type whenever you initialize or update the metadata schema to prevent data loss or data inconsistency.
| Input | Inferred Type |
|---|---|
| 0 | number |
| 2000 | datetime |
| True | string |
| {} | string |
| (th!s_br{}k3 | string |
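The table suggests an order of checks: a value such as 2000 is inferred as a datetime rather than a number, so the datetime check appears to run first. The sketch below assumes that order, and the helper names are hypothetical; the real inference logic is not documented here.

```python
from datetime import datetime

# Assumed datetime layouts, from most to least specific.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d", "%Y-%m", "%Y")


def _is_datetime(text: str) -> bool:
    for fmt in _FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_type(value) -> str:
    """Infer 'datetime', 'number', or 'string' from the textual form of a value."""
    text = str(value)
    if _is_datetime(text):  # checked first, so "2000" becomes a datetime
        return "datetime"
    if _is_number(text):
        return "number"
    return "string"


for raw in (0, 2000, True, {}, "(th!s_br{}k3"):
    print(repr(raw), "->", infer_type(raw))
```

Because a four-digit number is read as a year, declaring the type explicitly is the safer choice for numeric fields.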
VKG Metadata
The following explains how to incorporate metadata into the VKG.
As a rule of thumb, each VKG contains a metadata schema. This schema defines the structure of the metadata of its nodes. This schema can be initialized on VKG creation and modified at any time. Whenever nodes get added, those nodes will be coerced into the VKG's metadata schema in the following steps:
- Ignore all input fields that are undeclared in the schema.
- Coerce each node individually into the schema, most particularly, the type. If this coercion fails, use the default value.
- All successfully valid/coerced nodes will be written to the VKG (your VKG will contain nodes with metadata!).
- All invalid/unable-to-be-coerced nodes will be returned as failed_nodes for manual reformatting.
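The steps above can be sketched roughly as follows. Several details are assumptions, since the text does not specify them: a node counts as invalid here only when it lacks the required id or text, only the number and string types are handled, and the node ids and the `coerce_nodes` helper are hypothetical.

```python
def coerce_value(value, type_name):
    """Coerce to float for 'number', to text for 'string'/'any'; None passes through."""
    if value is None:
        return None
    return float(value) if type_name == "number" else str(value)


def coerce_nodes(nodes, schema):
    """Return (written, failed_nodes) after coercing node metadata into the schema."""
    written, failed_nodes = [], []
    for node in nodes:
        # Assumption: a node is invalid when a required field is missing.
        if "id" not in node or "text" not in node:
            failed_nodes.append(node)
            continue
        raw_meta = node.get("metadata", {})
        metadata = {}
        # Only declared fields are visited, so undeclared inputs are ignored.
        for name, spec in schema.items():
            try:
                metadata[name] = coerce_value(raw_meta.get(name, spec["default"]), spec["type"])
            except (TypeError, ValueError):
                metadata[name] = spec["default"]  # coercion failed: use the default
        written.append({"id": node["id"], "text": node["text"], "metadata": metadata})
    return written, failed_nodes


schema = {
    "year": {"type": "number", "default": None},
    "genre": {"type": "string", "default": "unknown"},
}
nodes = [
    {"id": "a", "text": "t1", "metadata": {"year": "1983", "genre": "pop", "label": "x"}},
    {"id": "b", "text": "t2", "metadata": {"year": "nineteen"}},
    {"text": "no id"},
]
written, failed_nodes = coerce_nodes(nodes, schema)
print(written)
print(failed_nodes)
```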
Note that metadata field names must be alphanumeric without spaces (use dashes or underscores as a substitution).
Metadata Schema Updates
When you update the VKG metadata schema, one of three cases applies to each field:
1. Inserting a field
All existing nodes will gain that new field and take on the default value/type that is specified.
2. Deleting a field
All existing nodes will have that field removed.
3. Editing a field
If the default value changes, none of the existing nodes will be modified. Nodes that were created with the old schema's default keep that old value.
If the type changes, all existing nodes will be coerced into that type. If they fail to be coerced, they will take on the default value and those nodes will be returned as warning_nodes.
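The three cases can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the node layout, the `apply_schema_update` helper, and the restriction to number and string types are not taken from the API.

```python
def coerce(value, type_name):
    """Coerce to float for 'number', to text for 'string'/'any'; None passes through."""
    if value is None:
        return None
    return float(value) if type_name == "number" else str(value)


def apply_schema_update(nodes, old_schema, new_schema):
    """Update node metadata in place; return nodes that fell back to a default."""
    warning_nodes = []
    for node in nodes:
        meta = node["metadata"]
        # Case 2: fields deleted from the schema are removed from every node.
        for name in [n for n in meta if n not in new_schema]:
            del meta[name]
        fell_back = False
        for name, spec in new_schema.items():
            if name not in old_schema:
                # Case 1: a newly inserted field takes the specified default.
                meta[name] = spec["default"]
            elif spec["type"] != old_schema[name]["type"]:
                # Case 3 (type change): coerce, or fall back to the default.
                try:
                    meta[name] = coerce(meta.get(name), spec["type"])
                except (TypeError, ValueError):
                    meta[name] = spec["default"]
                    fell_back = True
            # Case 3 (default change only): existing values stay untouched.
        if fell_back:
            warning_nodes.append(node)
    return warning_nodes


old_schema = {"year": {"type": "string", "default": None},
              "genre": {"type": "string", "default": "pop"}}
new_schema = {"year": {"type": "number", "default": 0.0},
              "artist": {"type": "string", "default": "unknown"}}
nodes = [
    {"id": "a", "metadata": {"year": "1983", "genre": "pop"}},
    {"id": "b", "metadata": {"year": "unknown", "genre": "rock"}},
]
warning_nodes = apply_schema_update(nodes, old_schema, new_schema)
print(nodes)
print([node["id"] for node in warning_nodes])
```

In this run, genre is deleted, artist is inserted with its default, and year changes type; the node whose year cannot be read as a number falls back to the default and is reported.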
To reiterate:
- If the default value is not specified, the API will default it to null.
- If the type is not specified, the API will default it to any. Type any metadata cannot be used for filters.
- If neither the default value nor the type is specified, the API will default the value to null and the type to any.