Metadata
Metadata supports direct queries and data visualization of the VKG. It can be added to nodes to provide context beyond the node's text. Its benefit is that metadata fields do not affect the node's text data or the node's embedding.
For formatting metadata, please refer to Formatting.
Example
The following is an example that can be sourced from an open music dataset.
Node ID | text | artist | year | genre |
---|---|---|---|---|
50c8a661-dcc4-4e31-a179-1aeea196d8de | care karma care space cadet gonna tell swim jacuzzi settle kick jump bite | "weird al" yankovic | 1983 | pop |
1394ce98-d125-42da-a6dd-f5ef5f6d421b | yeah shady aftermath guess know everybody floor goin goin outta control | 50 cent | 2005 | pop |
6eee81ee-8be1-4ad7-9e51-e8986db9ea7b | remember walk airport leave past travel beg want want piece piece collect grind | kelly clarkson | 2015 | pop |
98c4c5f8-a440-43f9-916d-392f77d902ad | touch taste feel hate distrust save lyric commercial | pink floyd | 1973 | rock |
Breaking this down, the Node ID and Text are required for all nodes. Artist, Year, and Genre are metadata fields that provide more information on the Text field which, in this case, represents lyric excerpts.
With metadata, you can:
- Filter returned nodes when searching through the VKG.
- Include string data other than the text field.
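As a rough illustration of filtering on metadata, the sketch below holds the example rows as plain Python dictionaries and filters them locally. The dictionary layout and the `filter_nodes` helper are assumptions made for this example; they are not the VKG API's query syntax.

```python
# Hypothetical in-memory copy of the rows in the example table.
nodes = [
    {"id": "50c8a661-dcc4-4e31-a179-1aeea196d8de",
     "text": "care karma care space cadet gonna tell swim jacuzzi settle kick jump bite",
     "metadata": {"artist": '"weird al" yankovic', "year": 1983, "genre": "pop"}},
    {"id": "1394ce98-d125-42da-a6dd-f5ef5f6d421b",
     "text": "yeah shady aftermath guess know everybody floor goin goin outta control",
     "metadata": {"artist": "50 cent", "year": 2005, "genre": "pop"}},
    {"id": "6eee81ee-8be1-4ad7-9e51-e8986db9ea7b",
     "text": "remember walk airport leave past travel beg want want piece piece collect grind",
     "metadata": {"artist": "kelly clarkson", "year": 2015, "genre": "pop"}},
    {"id": "98c4c5f8-a440-43f9-916d-392f77d902ad",
     "text": "touch taste feel hate distrust save lyric commercial",
     "metadata": {"artist": "pink floyd", "year": 1973, "genre": "rock"}},
]


def filter_nodes(nodes, **conditions):
    """Return nodes whose metadata equals every given field=value condition."""
    return [
        node for node in nodes
        if all(node["metadata"].get(field) == value
               for field, value in conditions.items())
    ]


# The text of each node is untouched; only metadata decides what is returned.
pop_nodes = filter_nodes(nodes, genre="pop")
print([node["metadata"]["artist"] for node in pop_nodes])
```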
Metadata Defaults
Each metadata field can be provided with a default value, which is used whenever a subsequently inserted node does not define that field.
If the default value is not provided, the API will default it to null.
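A minimal sketch of this behavior, assuming the schema is available as a Python dictionary that maps field names to their settings. The `apply_defaults` helper is hypothetical, and `None` stands in for JSON null.

```python
def apply_defaults(node_metadata, schema):
    """Fill in fields the node omitted, using each field's default (None if unset)."""
    filled = dict(node_metadata)
    for field, spec in schema.items():
        if field not in filled:
            filled[field] = spec.get("default")  # no default given -> None (null)
    return filled


schema = {
    "artist": {},
    "genre": {"default": "unknown"},
    "year": {},
}
print(apply_defaults({"artist": "pink floyd"}, schema))
```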
Metadata Types
Each metadata field can be provided with a type that is used to ensure data consistency. This means every node's value in a given metadata field must abide by that field's type. To get around mixing types, try the one-hot encoding technique.
Currently, the supported types are:
- number
- datetime
- string
- boolean: "TRUE", "FALSE", or null
- any
If the type is not specified, the API will default it to any.
For example, the JSON input
{
"metadata": {
"field_one": {}
}
}
gets defaulted to
{
"metadata": {
"field_one": {
"default": null,
"type": "any"
}
}
}
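The defaulting shown above can be sketched in a few lines of Python. The `normalize_schema` helper is hypothetical and only mirrors the documented fallbacks; it is not the API's implementation.

```python
import json


def normalize_schema(schema):
    """Fill in the documented fallbacks: default -> null, type -> "any"."""
    fields = schema.get("metadata", {})
    return {
        "metadata": {
            name: {
                "default": spec.get("default"),   # missing default becomes None (null)
                "type": spec.get("type", "any"),  # missing type becomes "any"
            }
            for name, spec in fields.items()
        }
    }


print(json.dumps(normalize_schema({"metadata": {"field_one": {}}}), indent=2))
```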
number
Numbers are parsed as decimals/floats. Expect any integers to be "saved" as a decimal.
Examples:
Input | Output |
---|---|
0 | 0.0 |
42 | 42.0 |
1337.0 | 1337.0 |
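In Python terms, this behaves like converting every value with `float`. The `coerce_number` name below is an assumption for illustration only.

```python
def coerce_number(value):
    """Coerce an input to float; float() raises ValueError or TypeError otherwise."""
    return float(value)


for raw in (0, 42, "1337.0"):
    print(raw, "->", coerce_number(raw))
```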
datetime
Datetimes are parsed as Python datetime objects. That means that they will be converted such that the value will contain the year, month, day, hour, minute, second, and millisecond.
Datetimes should be provided in the ISO 8601 extended format: YYYY-MM-DDThh:mm:ss
Input | Output |
---|---|
0 | Invalid |
01-01-2000 | Invalid |
2000- | Invalid |
2000 | 2000-01-01T00:00:00 |
2000-02 | 2000-02-01T00:00:00 |
2000-02-03 | 2000-02-03T00:00:00 |
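One way to accept reduced-precision inputs such as 2000 or 2000-02 while rejecting the other shapes is to try a fixed list of layouts with `datetime.strptime`. This is a sketch under that assumption; the `parse_datetime` helper and its list of formats are illustrative and not taken from the API.

```python
from datetime import datetime

# Accepted layouts, from least to most precise (an assumption for this sketch).
_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def parse_datetime(value):
    """Return a datetime for an ISO 8601 extended-format string, or None if invalid."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next, more precise layout
    return None


for raw in ("0", "01-01-2000", "2000-", "2000", "2000-02", "2000-02-03"):
    parsed = parse_datetime(raw)
    print(raw, "->", parsed.isoformat() if parsed else "Invalid")
```

Missing components fall back to the earliest value (January, day 1, midnight), which is how `strptime` fills fields that the format does not mention.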
string
Strings are just text. If the type is not specified, the metadata field will attempt to infer a type and eventually default to string.
Input | Output |
---|---|
0 | 0 |
Hello world | Hello world |
{} | {} |
!@#$%^&*() | !@#$%^&*() |
(th!s_br{}k3 | (th!s_br{}k3 |
any
Type any treats all values as strings. It is recommended that any is only used as a fallback for metadata that doesn't fit into the current schema. Metadata of type any cannot be used to filter search results.
Data Type Inference
If type is specified, inference will not happen and the default value will be coerced into the specified type. If the coercion fails, an error message will be thrown.
If type is not specified, the API will infer the data type and coerce the default value to the inferred type. To prevent loss of data or data inconsistency, specify the type whenever you initialize or update the metadata schema.
Input | Inferred Type |
---|---|
0 | number |
2000 | datetime |
True | string |
{} | string |
(th!s_br{}k3 | string |
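The following sketch shows one inference order that is consistent with the rows above: try datetime first, then number, then fall back to string. The ordering, the accepted datetime layouts, and the `infer_type` helper are all assumptions; the API's actual inference rules are not documented here.

```python
from datetime import datetime

# Datetime layouts tried before falling back to number (assumed for this sketch).
_DATETIME_FORMATS = ("%Y", "%Y-%m", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def infer_type(value):
    """Guess "datetime", "number", or "string" for a raw string value."""
    for fmt in _DATETIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "datetime"
        except ValueError:
            pass
    try:
        float(value)
        return "number"
    except ValueError:
        return "string"  # anything unparseable is kept as text


for raw in ("0", "2000", "True", "{}"):
    print(raw, "->", infer_type(raw))
```

Because a four-digit value such as 2000 is claimed by the datetime check before the number check runs, specifying the type explicitly avoids this kind of surprise.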
VKG Metadata
The following explains how to incorporate metadata into the VKG.
Each VKG contains a metadata schema that defines the structure of the metadata of its nodes. This schema can be initialized on VKG creation and modified at any time. Whenever nodes are added, they are coerced into the VKG's metadata schema in the following steps:
- Ignore all input fields that are undeclared in the schema.
- Coerce each node individually into the schema, most particularly, the type. If this coercion fails, use the default value.
- All successfully valid/coerced nodes will be written to the VKG (your VKG will contain nodes with metadata!).
- All invalid/unable-to-be-coerced nodes will be returned as failed_nodes for manual reformatting.
Note that metadata field names must be alphanumeric without spaces (use dashes or underscores as a substitution).
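A quick local check for field names might look like the following. The pattern assumes ASCII letters and digits plus dashes and underscores; the `is_valid_field_name` helper is hypothetical, and the API may apply different or stricter rules.

```python
import re

# Letters, digits, dashes, and underscores only; no spaces (assumed ASCII).
_FIELD_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_field_name(name):
    """Return True if the whole name consists only of allowed characters."""
    return _FIELD_NAME.fullmatch(name) is not None


for name in ("release_year", "release-year", "release year"):
    print(repr(name), is_valid_field_name(name))
```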
Metadata Schema Updates
When you update the VKG metadata schema, one of three cases applies to each field:
1. Inserting a field
All existing nodes will gain that new field and take on the default value/type that is specified.
2. Deleting a field
All existing nodes will have that field removed.
3. Editing a field
If the default value changes, none of the existing nodes will be modified. Nodes that were created under the old schema and took on its defaults will keep those old defaults.
If the type changes, all existing nodes will be coerced into that type. Nodes that fail to be coerced will take on the default value and will be returned as warning_nodes.
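A type change can be pictured as re-coercing the field on every existing node and collecting the ones that fall back to the default. The sketch below assumes nodes are plain dictionaries and returns node IDs as the warning list; both choices, and the `change_field_type` helper itself, are illustrative rather than the API's actual response format.

```python
def change_field_type(nodes, field, new_type, default=None):
    """Coerce `field` on every node in place; return IDs that fell back to `default`."""
    coerce = {"number": float, "string": str, "any": str}[new_type]
    warning_nodes = []
    for node in nodes:
        value = node["metadata"].get(field)
        try:
            node["metadata"][field] = None if value is None else coerce(value)
        except (ValueError, TypeError):
            node["metadata"][field] = default  # could not coerce: take the default
            warning_nodes.append(node["id"])
    return warning_nodes


existing = [
    {"id": "a", "metadata": {"year": "1983"}},
    {"id": "b", "metadata": {"year": "unknown"}},
]
warnings = change_field_type(existing, "year", "number")
print(existing)
print(warnings)
```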
To reiterate:
- If the default value is not specified, the API will default it to null.
- If the type is not specified, the API will default it to any.
- If neither the default value nor the type is specified, the API will default the value to null and the type to any.