Basic Usage

For a running example of the basic MEGAnno pipeline, please also refer to this notebook.

Setting Schema

A schema defines the annotation task. The example below sets the schema for a sentiment analysis task with positive and negative options:

demo.get_schemas().set_schemas({
    "label_schema": [
        {
            "name": "sentiment",
            "level": "record", 
            "options": [
                { "value": "pos", "text": "positive" },
                { "value": "neg", "text": "negative" },
            ]
        }
    ]
})
demo.get_schemas().value(active=True)

A label can be defined to have level record or span. Record-level labels correspond to the entire data record, while span-level labels are associated with a text span in the record. See Updating Schema for an example of a more complex schema.
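As a minimal sketch of how the two levels can sit side by side, the snippet below builds a schema dictionary with one record-level and one span-level label. The helper make_label and the span-level label "complaint_topic" with its options are hypothetical names used only for illustration; they are not part of MEGAnno. The resulting dictionary has the same shape as the one passed to set_schemas above.

```python
# Hypothetical helper for illustration; not part of MEGAnno's API.
VALID_LEVELS = {"record", "span"}


def make_label(name, level, options):
    """Build one label_schema entry from a name, a level, and {value: text} options."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}, got {level!r}")
    return {
        "name": name,
        "level": level,
        "options": [{"value": v, "text": t} for v, t in options.items()],
    }


schema = {
    "label_schema": [
        # Record-level label: applies to the entire data record.
        make_label("sentiment", "record", {"pos": "positive", "neg": "negative"}),
        # Span-level label: attached to a text span within the record.
        # The name and options here are made up for this sketch.
        make_label("complaint_topic", "span", {"svc": "service", "pay": "payment"}),
    ]
}
```

Building entries through a small helper catches a mistyped level before the schema is sent to the project.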

Importing Data

Given a pandas dataframe like this (example generated from this Twitter US Airline Sentiment dataset):

id	tweet
0	@united how else would I know it was denied?
1	@JetBlue my SIL bought tix for us to NYC. We were told at the gate that her cc was declined. Supervisor accused us of illegal activity.
2	@JetBlue dispatcher keeps yelling and hung up on me!

To import the data, provide the column names for id, a unique identifier for each imported record, and content, the raw text field.

demo.import_data_df(df, column_mapping={
    "id": "id",
    "content": "tweet"
})
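Because id is meant to be unique per record, it can help to check for duplicates before importing. The sketch below is one way to do that in plain Python; find_duplicate_ids is a hypothetical helper, and the third record is invented to trigger the check. How MEGAnno itself handles duplicate ids is not covered here, so treat this as a precaution rather than a requirement. For a pandas dataframe, df.to_dict("records") yields a list of dictionaries in the shape used below.

```python
from collections import Counter


def find_duplicate_ids(records, id_key="id"):
    """Return the sorted id values that appear more than once in `records`."""
    counts = Counter(record[id_key] for record in records)
    return sorted(value for value, n in counts.items() if n > 1)


records = [
    {"id": 0, "tweet": "@united how else would I know it was denied?"},
    {"id": 1, "tweet": "@JetBlue dispatcher keeps yelling and hung up on me!"},
    # Invented row that reuses id 1, for illustration only.
    {"id": 1, "tweet": "duplicate id, for illustration only"},
]

duplicates = find_duplicate_ids(records)
if duplicates:
    print(f"Resolve duplicate ids before importing: {duplicates}")
```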

Note: To import a new dataset, we recommend doing so within a new project environment.

Exploratory Labeling

Not all data points are equally important for downstream models and applications. Users often want to prioritize a particular batch (e.g., to achieve better class or domain coverage, or to focus on the data points that the downstream model cannot predict well). MEGAnno provides a flexible and controllable way of organizing annotation projects through exploratory labeling: first identify an interesting subset, then assign labels to the data in that subset. We provide a set of “power tools” to help identify valuable subsets.

The script below searches for data records containing the keyword "delay" and brings up a widget for annotating them. More examples here.

# search results => subset s1
s1 = demo.search(keyword="delay", limit=10, skip=0)
# bring up a widget 
s1.show()
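To work through more matches than fit in one subset, the limit and skip arguments suggest a paging pattern. The sketch below assumes that skip is an offset into the search results and limit caps how many are returned; this is the conventional meaning but is not confirmed here. page_windows is a hypothetical helper, and the total of 25 is an arbitrary figure chosen for illustration.

```python
def page_windows(total, page_size):
    """Yield (skip, limit) pairs that cover `total` records in order."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for skip in range(0, total, page_size):
        # The last window may be shorter than page_size.
        yield skip, min(page_size, total - skip)


windows = list(page_windows(total=25, page_size=10))

for skip, limit in windows:
    # Under the assumption above, each pair could be passed on as:
    #   demo.search(keyword="delay", limit=limit, skip=skip)
    print(f"skip={skip}, limit={limit}")
```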

Column Filters

To view all column filters, click the "Filters" button; to reset them, click the "Reset filters" button.

Column Order & Visibility

1. To reorder or resize a column, hover over it and drag its handle (the grip on the left to reorder, the right column edge to resize).
2. To toggle column visibility, click "Columns", then toggle the column to show or hide it.
3. To reset column order and visibility, click the "Reset columns" button.

Metadata Focus-view

To focus on a single metadata value, click the "Settings" button, then choose a metadata name from the list.

Exporting

Although iterations can happen within a single notebook, it's easy to export the data and the collected annotations:

# collect the annotations generated by all annotators
demo.export()