Subset

meganno_client.subset.Subset

The `Subset` class represents a group of data records.

Attributes:

| Name | Type | Description |
|---|---|---|
| `__data_uuids` | `list` | List of unique identifiers of data records in the subset. |
| `__service` | `Service` | Connected backend service. |
| `__my_annotation_list` | `list` | Local cache of the record and annotation view of the subset owned by `service.annotator_id`, with all available metadata. |

__init__(service, data_uuids=[], job_id=None)

Initialize a subset.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `service` | `Service` | Service-class object identifying the connected backend service and corresponding data storage. | required |
| `data_uuids` | `list` | List of data UUIDs to include in the subset. | `[]` |

get_uuid_list()

Get the list of unique identifiers for all records in the subset.

Returns:

| Name | Type | Description |
|---|---|---|
| `__data_uuids` | `list` | List of data UUIDs included in the subset. |

value(annotator_list: list = None)

Return the cached data and annotations of the service owner, or fetch live annotations for other annotators (these are not cached).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `annotator_list` | `list` | If `None`, return the cached annotations of the service owner. Otherwise, fetch live annotations from the listed annotators. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `subset_annotation_list` | `list` | See `__get_annotation_list` for a description and example. |

get_annotation_by_uuid(uuid)

Return the annotation for a particular data record, specified by its UUID.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uuid` | `str` | The UUID of the data record specified by the user. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `annotation` | `dict` | Annotation for the specified data record if it exists, else `None`. |

show(config={})

Visualize the current subset in an in-notebook annotation widget.

Development note: this initializes an Annotation widget and creates a unique reference to the associated subset and service.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `dict` | Configuration for the default view of the widget (supported keys listed below). | `{}` |

Supported `config` keys:

- `view`: `"single"` | `"table"`, default `"single"`
- `mode`: `"annotating"` | `"reconciling"`, default `"annotating"`
- `title`: default `"Annotation"`
- `height`: default 300 (pixels)
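The `config` keys above can be assembled with a small helper so that typos in `view` or `mode` fail early instead of inside the widget. This is a minimal sketch, assuming `show()` accepts a plain dict with exactly these keys; `make_show_config`, `DEFAULT_SHOW_CONFIG`, and `ALLOWED_VALUES` are hypothetical names, not part of `meganno_client`.

```python
# Hypothetical helper, not part of meganno_client: it fills in the defaults
# listed above for Subset.show() and rejects unsupported view/mode/height values.
DEFAULT_SHOW_CONFIG = {
    "view": "single",      # "single" | "table"
    "mode": "annotating",  # "annotating" | "reconciling"
    "title": "Annotation",
    "height": 300,         # widget height in pixels
}

ALLOWED_VALUES = {
    "view": {"single", "table"},
    "mode": {"annotating", "reconciling"},
}


def make_show_config(**overrides):
    """Return a new config dict for Subset.show() with the documented defaults filled in."""
    config = {**DEFAULT_SHOW_CONFIG, **overrides}
    for key, allowed in ALLOWED_VALUES.items():
        if config[key] not in allowed:
            raise ValueError(
                f"{key!r} must be one of {sorted(allowed)}, got {config[key]!r}"
            )
    if not isinstance(config["height"], int) or config["height"] <= 0:
        raise ValueError("'height' must be a positive number of pixels")
    return config
```

Assuming `subset` is an existing `Subset`, `subset.show(make_show_config(view="table", mode="reconciling"))` would request the table view in reconciling mode while keeping the default title and height.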

set_annotations(uuid=None, labels=None)

Set the annotation for a particular data record with the specified labels.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uuid` | `str` | The UUID of the data record specified by the user. | `None` |
| `labels` | `dict` | The record-level and span-level labels for the data record (structure below). | `None` |

Structure of `labels`:

- `"labels_record"` (list): a list of record-level labels
- `"labels_span"` (list): a list of span-level labels

Examples:

Example of setting an annotation with the desired record-level and span-level labels:
```json
{
    "labels_record": [
        {
            "label_name": "sentiment",
            "label_value": ["neu"]
        }
    ],
    "labels_span": [
        {
            "label_name": "sentiment",
            "label_value": ["neu"],
            "start_idx": 10,
            "end_idx": 20
        }
    ]
}
```

Raises:

| Type | Description |
|---|---|
| `Exception` | If `uuid` or `labels` is `None`. |

Returns:

| Name | Type | Description |
|---|---|---|
| `labels` | `dict` | Updated labels for `uuid` annotated by the user. |
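Building the `labels` dict by hand is error-prone, so a small helper can produce the structure shown in the example above from simpler Python values. This is a sketch assuming only the keys documented here; `build_labels` and `_as_list` are hypothetical helpers, and the `0 <= start_idx <= end_idx` check is an assumption, since this page does not say whether `end_idx` is inclusive.

```python
# Hypothetical helper, not part of meganno_client: it assembles the `labels`
# dict expected by Subset.set_annotations() from simpler Python values.
def _as_list(values):
    # Accept a single string such as "neu" as shorthand for ["neu"].
    return [values] if isinstance(values, str) else list(values)


def build_labels(record_labels=None, span_labels=None):
    """Build a labels dict with "labels_record" and "labels_span" lists.

    record_labels: mapping of label_name -> label value(s)
    span_labels: iterable of (label_name, label_value(s), start_idx, end_idx)
    """
    labels = {"labels_record": [], "labels_span": []}
    for name, values in (record_labels or {}).items():
        labels["labels_record"].append(
            {"label_name": name, "label_value": _as_list(values)}
        )
    for name, values, start_idx, end_idx in span_labels or []:
        # Assumed sanity check; the exact span convention is not documented here.
        if not 0 <= start_idx <= end_idx:
            raise ValueError(f"invalid span ({start_idx}, {end_idx}) for {name!r}")
        labels["labels_span"].append(
            {
                "label_name": name,
                "label_value": _as_list(values),
                "start_idx": start_idx,
                "end_idx": end_idx,
            }
        )
    return labels
```

For example, `subset.set_annotations(uuid=record_uuid, labels=build_labels({"sentiment": "neu"}))` would set a record-level label, where `record_uuid` is a hypothetical variable holding one of the UUIDs returned by `subset.get_uuid_list()`.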

get_reconciliation_data(uuid_list=None)

Return the list of reconciliation data for all data records specified by the user. The reconciliation data for one data record consists of the annotations for it by all annotators.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uuid_list` | `list` | List of UUIDs provided by the user. If `None`, use all records in the subset. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `reconciliation_data_list` | `list` | List of reconciliation data, one entry per UUID. |

Each entry has the following keys: `annotation_list`, which holds all the annotations for the UUID; `data`, which contains the raw data of the record; `metadata`, which stores additional information about the data; `tokens`; and `uuid`, the UUID of the data record.

Full example:

```json
{
    "annotation_list": [
        {
            "annotator": "pwOA1N9RKZVJM8VZZ7w8VcT8lp22",
            "labels_record": [],
            "labels_span": []
        },
        {
            "annotator": "IAzgHOxyeLQBi5QVo7dQR0p2DpA2",
            "labels_record": [
                {
                    "label_name": "sentiment",
                    "label_value": ["pos"]
                }
            ],
            "labels_span": []
        }
    ],
    "data": "@united obviously",
    "metadata": [],
    "tokens": [],
    "uuid": "ee408271-df5d-435c-af25-72df58a21bfe"
}
```
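Because each reconciliation entry groups every annotator's labels for one record, agreement on a record-level label can be checked locally. A minimal sketch, assuming entries shaped like the example above; `record_label_votes` and `majority_value` are hypothetical helpers, and treating a tie as "no majority" is a choice made for this sketch, not behavior of the library.

```python
from collections import Counter


# Hypothetical helpers, not part of meganno_client. They read one entry
# returned by Subset.get_reconciliation_data().
def record_label_votes(entry, label_name):
    """Count the votes for each value of a record-level label across all annotators."""
    votes = Counter()
    for annotation in entry.get("annotation_list", []):
        for label in annotation.get("labels_record", []):
            if label["label_name"] == label_name:
                votes.update(label["label_value"])
    return votes


def majority_value(entry, label_name):
    """Return the most frequent value, or None if there are no votes or a tie."""
    ranked = record_label_votes(entry, label_name).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top values
    return ranked[0][0]
```

Looping `for entry in subset.get_reconciliation_data():` and calling `majority_value(entry, "sentiment")` would flag records with no clear majority (a `None` result) as candidates for reconciliation.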

suggest_similar(record_meta_name, limit=3)

For each data record in the subset, suggest similar data records by retrieving the most similar records from the pool, based on metadata (e.g., embedding) distance.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `record_meta_name` | `str` | The metadata name (e.g., `"bert-embedding"`) on which the similarity is computed. | required |
| `limit` | `int` | The number of similar records to return. | `3` |

Raises:

| Type | Description |
|---|---|
| `Exception` | If the response code is not successful. |

Returns:

| Name | Type | Description |
|---|---|---|
| `subset` | `Subset` | A subset of similar data records. |
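The retrieval runs on the backend, and this page does not specify its distance measure. To illustrate the idea only, the sketch below ranks pool records by cosine similarity between embedding vectors; `most_similar`, the plain-list embeddings, excluding the query record from its own results, and the use of cosine similarity are all assumptions, not the `meganno_client` implementation.

```python
import heapq
import math


# Illustrative sketch, not the meganno_client implementation: rank pool
# records by cosine similarity between embedding vectors (plain lists).
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query_uuid, embeddings, limit=3):
    """Return up to `limit` uuids whose embeddings are closest to the query's.

    embeddings: mapping of uuid -> embedding vector; the query itself is excluded.
    """
    query = embeddings[query_uuid]
    candidates = (
        (cosine_similarity(query, vector), uuid)
        for uuid, vector in embeddings.items()
        if uuid != query_uuid
    )
    # nlargest keeps the `limit` highest-scoring (score, uuid) pairs.
    return [uuid for _, uuid in heapq.nlargest(limit, candidates)]
```

Calling `most_similar` once per UUID in the subset mirrors the per-record behavior described above; `heapq.nlargest` avoids sorting the whole pool when `limit` is small.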

assign(annotator)

Assign the current subset as a payload to an annotator.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `annotator` | `str` | Annotator ID. | required |