
OpenAIJob

meganno_client.llm_jobs.OpenAIJob

The OpenAIJob class handles calls to OpenAI APIs.

__init__(label_schema={}, label_names=[], records=[], model_config={}, prompt_template=None)

Init function

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `label_schema` | `list` | List of label objects | `{}` |
| `label_names` | `list` | List of label names to be used for annotation | `[]` |
| `records` | `list` | List of records in `[{'data': , 'uuid': }]` format | `[]` |
| `model_config` | `dict` | Parameters for the OpenAI model | `{}` |
| `prompt_template` | `str` | Template from which the prompt to OpenAI is prepared for each record | `None` |
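A minimal sketch of the `[{'data': , 'uuid': }]` record format expected by the `records` parameter. The example values are invented for illustration; only the two-key dictionary shape comes from the documentation above.

```python
# Illustrative records in the [{'data': ..., 'uuid': ...}] shape that
# OpenAIJob's `records` parameter expects. Values are made up.
records = [
    {"data": "The movie was fantastic!", "uuid": "rec-001"},
    {"data": "Terrible service, would not recommend.", "uuid": "rec-002"},
]

def looks_like_record(r):
    # A record must be a dict carrying at least the 'data' and 'uuid' keys.
    return isinstance(r, dict) and {"data", "uuid"} <= set(r)
```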

set_openai_api_key(openai_api_key, openai_organization)

Set the API keys necessary for calls to the OpenAI API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `openai_api_key` | `str` | OpenAI API key provided by the user | required |
| `openai_organization` | `str`, optional | OpenAI organization key provided by the user | required |

validate_openai_api_key(openai_api_key, openai_organization) staticmethod

Validate the OpenAI API and organization keys provided by the user.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `openai_api_key` | `str` | OpenAI API key provided by the user | required |
| `openai_organization` | `str`, optional | OpenAI organization key provided by the user | required |

Raises:

| Type | Description |
| --- | --- |
| `Exception` | If the API keys provided by the user are invalid, or if any error occurs while calling the OpenAI API |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `openai_api_key` | `str` | OpenAI API key |
| `openai_organization` | `str` | OpenAI organization key |

validate_model_config(model_config, api_name='chat') staticmethod

Validate the LLM model config provided by the user. The model should be among those allowed on MEGAnno, and the parameters should match the format specified by OpenAI.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_config` | `dict` | Model specifications such as model name and other parameters (e.g. temperature), as provided by the user | required |
| `api_name` | `str` | Name of the OpenAI API, e.g. `"chat"` or `"completion"` | `'chat'` |

Raises:

| Type | Description |
| --- | --- |
| `Exception` | If the model is not among the ones provided by MEGAnno, or if the configuration format is incorrect |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `model_config` | `dict` | Model configurations |
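The kind of check this method performs can be sketched as follows. The allowed model names and parameter keys below are assumptions for illustration; the actual lists live inside MEGAnno.

```python
# Hypothetical allow-lists -- NOT the values MEGAnno actually uses.
ALLOWED_MODELS = {"gpt-3.5-turbo", "gpt-4"}
ALLOWED_PARAMS = {"model", "temperature", "n", "logprobs"}

def check_model_config(model_config):
    # Reject models outside the allow-list.
    if model_config.get("model") not in ALLOWED_MODELS:
        raise Exception(f"Model {model_config.get('model')!r} not supported")
    # Reject configuration keys OpenAI would not accept.
    unknown = set(model_config) - ALLOWED_PARAMS
    if unknown:
        raise Exception(f"Unknown parameters: {sorted(unknown)}")
    return model_config
```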

is_valid_prompt(prompt)

Validate the generated prompt. It should not exceed the maximum token limit specified by OpenAI. We use the approximation 1 word ≈ 1.33 tokens.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `prompt` | `str` | Prompt generated for OpenAI based on the template and the record data | required |

Returns:

| Type | Description |
| --- | --- |
| `bool` | `True` if the prompt is valid, `False` otherwise |
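The 1 word ≈ 1.33 tokens heuristic mentioned above can be written as a standalone check. The token limit used here is an assumption, not the value MEGAnno applies.

```python
# Assumed context limit for illustration; the real limit depends on the model.
MAX_TOKENS = 4096

def approx_token_count(prompt):
    # Approximate tokens from the word count: 1 word ~ 1.33 tokens.
    return int(len(prompt.split()) * 1.33)

def prompt_fits(prompt, limit=MAX_TOKENS):
    return approx_token_count(prompt) <= limit
```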

generate_prompts()

Helper function. Given a prompt template and a list of records, generate a prompt for each record.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `prompts` | `list` | List of `(uuid, generated prompt)` tuples, one per record in the given subset |
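The step above amounts to substituting each record's text into the template and pairing the result with the record's uuid. The `{data}` placeholder syntax below is an assumption for illustration; MEGAnno's actual template syntax may differ.

```python
# Hypothetical template; `{data}` is filled from each record's 'data' field.
template = "Classify the sentiment of this sentence: {data}\nLabel:"

records = [
    {"data": "I loved it.", "uuid": "u1"},
    {"data": "I hated it.", "uuid": "u2"},
]

# One (uuid, prompt) tuple per record, as described above.
prompts = [(r["uuid"], template.format(data=r["data"])) for r in records]
```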

get_response_length()

Return the length of the OpenAI response.

get_openai_conf_score()

Return the confidence score of the label, calculated as the average of the logit scores.

preprocess()

Generate the list of prompts for each record based on the subset and template.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `prompts` | `list` | List of prompts |

get_llm_annotations(batch_size=1, num_retrials=2, api_name='chat', label_meta_names=[])

Call OpenAI using the generated prompts to obtain valid and invalid responses.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `batch_size` | `int` | Size of the batch sent in each OpenAI prompt | `1` |
| `num_retrials` | `int` | Number of retries to OpenAI in case of a failed response | `2` |
| `api_name` | `str` | Name of the OpenAI API, e.g. `"chat"` or `"completion"` | `'chat'` |
| `label_meta_names` | `list` | List of label metadata names to be set | `[]` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `responses` | `list` | List of valid responses from OpenAI |
| `invalid_responses` | `list` | List of invalid responses from OpenAI |
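The batching-and-retry behaviour described above can be sketched as follows, with a stub in place of the real OpenAI request. `call_model` is a hypothetical stand-in, not MEGAnno's actual request function.

```python
def call_model(batch):
    # Stand-in for the OpenAI API call; returns one response per prompt.
    return [f"label for {uuid}" for uuid, _prompt in batch]

def annotate(prompts, batch_size=1, num_retrials=2):
    responses, invalid_responses = [], []
    # Walk the (uuid, prompt) list in chunks of batch_size.
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i : i + batch_size]
        for attempt in range(num_retrials + 1):
            try:
                responses.extend(call_model(batch))
                break
            except Exception:
                # After the final retry, record the batch as invalid.
                if attempt == num_retrials:
                    invalid_responses.extend(batch)
    return responses, invalid_responses
```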

extract(uuid, response, fuzzy_extraction)

Helper function for post-processing. Extract the label (name and value) from the OpenAI response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `uuid` | `str` | Record uuid | required |
| `response` | `str` | Output from OpenAI | required |
| `fuzzy_extraction` | `bool` | Set to `True` if fuzzy extraction is desired in post-processing | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `ret` | `dict` | The label name and label value |
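One plausible shape for this extraction step is sketched below: an exact match against the known label options, with substring matching as a simple fuzzy fallback. The label name, options, and the fuzzy strategy are all assumptions; MEGAnno's actual implementation may differ.

```python
# Hypothetical label schema for illustration only.
LABEL_NAME = "sentiment"
LABEL_OPTIONS = ["positive", "negative", "neutral"]

def extract_label(uuid, response, fuzzy_extraction=False):
    text = response.strip().lower()
    # Exact match: the response is exactly one of the label options.
    if text in LABEL_OPTIONS:
        return {"uuid": uuid, "label_name": LABEL_NAME, "label_value": text}
    # Fuzzy fallback (assumed here to be substring matching).
    if fuzzy_extraction:
        for option in LABEL_OPTIONS:
            if option in text:
                return {"uuid": uuid, "label_name": LABEL_NAME, "label_value": option}
    return None
```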

post_process_annotations(fuzzy_extraction=False)

Perform output extraction from the responses generated by the LLM, and format them according to the MEGAnno data model.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `fuzzy_extraction` | `bool` | Set to `True` if fuzzy extraction is desired in post-processing | `False` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `annotations` | `list` | List of annotations `(uuid, label)` in the format required by MEGAnno |
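The post-processing loop above can be sketched as: run extraction over every (uuid, response) pair and keep only the successful extractions. `extract_label` here is a hypothetical stand-in for the class's `extract()` helper, with an invented two-option schema.

```python
def extract_label(uuid, response):
    # Hypothetical stand-in for OpenAIJob.extract(); options are invented.
    options = ["positive", "negative"]
    text = response.strip().lower()
    return {"uuid": uuid, "label": text} if text in options else None

def post_process(responses):
    annotations = []
    for uuid, response in responses:
        ret = extract_label(uuid, response)
        if ret is not None:
            # Keep (uuid, label) pairs -- the shape MEGAnno expects.
            annotations.append((ret["uuid"], ret["label"]))
    return annotations
```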