Version: 2025-02-27

RikAI2-Extract Prompting

RikAI2-Extract is designed to excel at extraction tasks, particularly where you may need to extract many fields or return information across many pages of a longer document.

Basic Principles

This model performs best when prompted with the JSON structure you ultimately want returned and filled out. There is also a specific format to follow if you want to set returnConfidence = true to return confidence scores and bounding boxes.

Our general prompting tips apply to Extract. It is very important to be direct and clear in your prompt or descriptions.

Prompt Structure

The model is prompted with a JSON schema, and the response returned by the model reflects that same prompt schema. You have two options for your schema, based on whether you want to use returnConfidence = true to return field-level confidence scores and bounding boxes.

returnConfidence = false

"Key": "Value"

Replace "Key" with the key you want in the returned JSON: a name for what you are extracting. For a very straightforward field, the key alone may be all you need.

Replace "Value" with more verbose instructions or a question.

Example:

"PatientName": ""

"PatientName": "What is the patient's name?"

"PatientName": "Full patient name"

On a straightforward field and form, these should all return the same answer. For more complex requests, use the value to give more precise instructions.
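As a minimal sketch, assuming you assemble the prompt in Python before sending it, the three variants above can be written as dictionaries and serialized to JSON. The variable names are illustrative only, and how the prompt is submitted to the model is not shown here.

```python
import json

# Three ways to ask for the same field; each dict is a complete prompt schema.
key_only = {"PatientName": ""}
as_question = {"PatientName": "What is the patient's name?"}
as_description = {"PatientName": "Full patient name"}

# json.dumps emits straight double quotes, so the prompt is valid JSON.
prompt = json.dumps(as_question, indent=2)
print(prompt)
```

Building the schema as a dictionary avoids hand-typed quoting mistakes, such as curly quotes or trailing commas, that would make the prompt invalid JSON.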

returnConfidence = true

{
  "Key": {
    "data": "instructions here",
    "page_number": 0
  }
}

Use this structure when you want to return bounding boxes and confidence scores. Do not change the key "data" or the key-value pair "page_number": 0.

Example:

{
  "PatientName": {
    "data": "What is the name of the patient?",
    "page_number": 0
  }
}
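If you already have a simple key-to-instruction schema, one way to convert it is sketched below. The helper name with_confidence is hypothetical and not part of any SDK; it only wraps each instruction in the fixed "data" and "page_number": 0 structure described above.

```python
import json

def with_confidence(fields):
    """Wrap each {key: instruction} pair in the returnConfidence = true structure."""
    return {
        key: {"data": instruction, "page_number": 0}
        for key, instruction in fields.items()
    }

simple = {"PatientName": "What is the name of the patient?"}
print(json.dumps(with_confidence(simple), indent=2))
```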

You can nest the JSON up to 2 levels deep and still use returnConfidence = true.

Example:

{
  "PatientInfo": {
    "FirstName": {
      "data": "Patient first name",
      "page_number": 0
    },
    "LastName": {
      "data": "Patient last name",
      "page_number": 0
    }
  }
}
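A quick local check can catch schemas that nest too deeply before you send them. The sketch below assumes that "levels" counts the layers of keys above each "data"/"page_number" leaf, so the example above has a depth of 2. That interpretation and the helper names are assumptions, not documented behavior.

```python
def is_leaf(node):
    """A leaf is the fixed {"data": ..., "page_number": 0} structure."""
    return isinstance(node, dict) and set(node) == {"data", "page_number"}

def key_depth(schema):
    """Count the layers of keys above the leaves of a prompt schema."""
    if is_leaf(schema):
        return 0
    if not isinstance(schema, dict) or not schema:
        raise ValueError("every branch must end in a data/page_number leaf")
    return 1 + max(key_depth(child) for child in schema.values())

schema = {
    "PatientInfo": {
        "FirstName": {"data": "Patient first name", "page_number": 0},
        "LastName": {"data": "Patient last name", "page_number": 0},
    }
}

# Assumed limit of 2, taken from the sentence above.
assert key_depth(schema) <= 2
```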

Capabilities

Checkboxes

Extract is equipped to handle checkboxes.

You can specify the options that might be checked, or ask a natural language question about the checkbox.

Example:

"Accident Type": ""

"Accident Type 2": "Work/Auto/Other"

"Accident Type 3": "What option is checked for Type of Accident?"

Tables

Asking for table information can return table columns or information delimited by the " | " character.
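If a value comes back pipe-delimited, it can be split into cells on the client side. This is a minimal sketch: the sample string is invented for illustration, and it assumes a single row per string, which may not match every response.

```python
def split_cells(value):
    """Split one pipe-delimited string into trimmed cell values."""
    return [cell.strip() for cell in value.split("|")]

# Hypothetical returned value, invented for illustration.
row = "Acetone | 500 | mL"
print(split_cells(row))
```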

You can prompt the model to build a JSON from the table information with separate keys and values. Even if you don't know the total number of rows, you can use a prompt like the example below and the model will create the nested JSON needed to return the requested table information.

Example:

"Table name": {
  "chemicals": "extract each row of content with the chemical as the key and the rest of the information as the value"
}

Self-Expanding JSON Schema

Similar to the prompt in the Tables section above, you can prompt the model to expand the JSON schema to match however many times a type of information needs to be extracted, even if that number varies per document.

Example:

"procedure_#": "For every procedure named in this document, return the procedure name as the value and number the key accordingly"

This will return a list of procedures:

"procedure_1": "appendectomy",

"procedure_2": "wisdom teeth removal",

"procedure_3": "EKG"

Please Note:

  • By default, the response returns your keys in alphabetical order.

  • When using returnConfidence = true, the model will not return an answer if the resulting confidence score is low and the bounding boxes have impossible coordinates. With returnConfidence = false, the model may still return an answer.
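Because keys come back alphabetized, two small client-side helpers can be useful. The first restores the key order of your prompt. The second collects numbered keys such as procedure_1, procedure_2 in numeric order, since a plain string sort would place procedure_10 before procedure_2. This is a sketch that assumes the response has been parsed into a flat Python dict; the helper names and the sample values are hypothetical.

```python
import re

def in_prompt_order(prompt, response):
    """Reorder response keys to follow the prompt; unanswered keys are skipped."""
    ordered = {key: response[key] for key in prompt if key in response}
    # Keys the prompt did not name (for example, self-expanded ones) go last.
    ordered.update({k: v for k, v in response.items() if k not in ordered})
    return ordered

def numbered_values(response, prefix):
    """Return the values of prefix_1, prefix_2, ... sorted by their number."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    found = []
    for key, value in response.items():
        match = pattern.match(key)
        if match:
            found.append((int(match.group(1)), value))
    return [value for _, value in sorted(found, key=lambda pair: pair[0])]

response = {
    "procedure_1": "appendectomy",
    "procedure_2": "wisdom teeth removal",
    "procedure_3": "EKG",
}
print(numbered_values(response, "procedure"))
```

Keys that are missing from the response are skipped by both helpers, which matters when returnConfidence = true causes the model to return no answer for a field.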