Advanced features
This notebook provides examples of some of the advanced features.
Updating Schema
Annotation requirements can change as projects evolve. To update the schema for a project, call set_schemas with the new schema object.
For example, to expand the schema we set in the basic notebook:
demo.get_schemas().set_schemas({
    "label_schema": [
        {
            "name": "sentiment",
            "level": "record",
            "options": [
                {"value": "pos", "text": "positive"},
                {"value": "neg", "text": "negative"},
                {"value": "neu", "text": "neutral"},  # adding a new option
            ]
        },
        # adding a span-level label
        {
            "name": "sp",
            "level": "span",
            "options": [
                {"value": "pos", "text": "positive"},
                {"value": "neg", "text": "negative"},
            ]
        }
    ]
})
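Before sending an updated schema, it can help to sanity-check the dictionary locally. The sketch below is plain Python and does not call MEGAnno. The helper name `check_label_schema` and the rules it enforces (unique label names, a level of "record" or "span", unique option values within a label) are assumptions made for illustration, not MEGAnno's own validation.

```python
def check_label_schema(schema):
    """Return a list of problems found in a schema dict (empty if none).

    The rules are illustrative assumptions, not MEGAnno's validation logic.
    """
    problems = []
    seen_names = set()
    for label in schema.get("label_schema", []):
        name = label.get("name")
        if name in seen_names:
            problems.append(f"duplicate label name: {name}")
        seen_names.add(name)
        # Only the two levels that appear in this notebook are accepted here.
        if label.get("level") not in ("record", "span"):
            problems.append(f"{name}: unexpected level {label.get('level')!r}")
        values = [opt["value"] for opt in label.get("options", [])]
        if len(values) != len(set(values)):
            problems.append(f"{name}: duplicate option values")
    return problems


new_schema = {
    "label_schema": [
        {
            "name": "sentiment",
            "level": "record",
            "options": [
                {"value": "pos", "text": "positive"},
                {"value": "neg", "text": "negative"},
                {"value": "neu", "text": "neutral"},
            ],
        },
        {
            "name": "sp",
            "level": "span",
            "options": [
                {"value": "pos", "text": "positive"},
                {"value": "neg", "text": "negative"},
            ],
        },
    ]
}

print(check_label_schema(new_schema))  # prints []
```

An empty list means the dictionary passed these local checks; the server may still apply rules of its own.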
Metadata
In MEGAnno, metadata refers to auxiliary information associated with data records. MEGAnno takes user-defined functions to generate metadata and uses it to find important subsets and assist human annotators. Here we show two examples.
Example 1: Adding Sentence-BERT embeddings for data records. The embeddings can later be used to compute similarity between records.
# Example 1, adding sentence-bert embedding.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
# set metadata generation function
demo.set_metadata("bert-embedding", lambda x: list(model.encode(x).astype(float)), 500)
Example 2: Extracting hashtags as annotation context.
# user-defined function to extract hashtags
def extract_hashtags(text):
    hashtag_list = []
    for word in text.split():
        if word.startswith("#"):
            hashtag_list.append(word)
    # the widget can render markdown text, so return a markdown bullet list
    return "".join(["- {}\n".format(x) for x in hashtag_list])

# apply metadata to the project
demo.set_metadata("hashtag", lambda x: extract_hashtags(x), 500)
With hashtag metadata, the MEGAnno widget can show it as context at annotation time.
Advanced Subset Generation
In addition to exact keyword matches, MEGAnno also provides more advanced ways to generate subsets.
Regex-based Searches
MEGAnno supports searches based on regular expressions:
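As a rough illustration of what a regex-based search selects, the following sketch filters a small list of records with Python's standard `re` module. It does not call MEGAnno; the sample records, the pattern, and the helper name `regex_subset` are all hypothetical.

```python
import re

# Hypothetical sample records, standing in for a project's data.
records = [
    "Flight delayed for three hours #delay",
    "Great crew and smooth landing",
    "Another delay at the gate",
]


def regex_subset(texts, pattern):
    """Return the texts that contain at least one match for the pattern."""
    compiled = re.compile(pattern)
    return [t for t in texts if compiled.search(t)]


# Match "delay", "delayed", or "delays" as whole words.
matches = regex_subset(records, r"\bdelay(ed|s)?\b")
print(matches)
```

Unlike an exact keyword match on "delay", the pattern also picks up inflected forms such as "delayed" while still respecting word boundaries.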
Subset Suggestion
Searches initiated by users can help them explore the dataset in a controlled way. Still, the quality of searches is only as good as users’ knowledge about the data and domain. MEGAnno provides an automated subset suggestion engine to assist with exploration. Embedding-based suggestions rely on data-embedding vectors provided by the user (as metadata).
For example, suggest_similar suggests neighbors (based on distance in the embedding space) of the records in the query subset:
s3 = demo.search(keyword="delay", limit=3, skip=0) # source subset
s4 = s3.suggest_similar("bert-embedding", limit=4) # needs to provide a valid meta_name
s4.show()
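The idea behind neighbor suggestion can be sketched in plain Python. The distance measure MEGAnno uses is not stated here, so this sketch assumes cosine similarity; the helper names `cosine_similarity` and `suggest_neighbors` and the toy two-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def suggest_neighbors(query_vectors, candidates, limit):
    """Rank candidate ids by their best similarity to any query vector."""
    scored = []
    for record_id, vec in candidates.items():
        best = max(cosine_similarity(q, vec) for q in query_vectors)
        scored.append((best, record_id))
    scored.sort(reverse=True)  # highest similarity first
    return [record_id for _, record_id in scored[:limit]]


# Toy 2-d "embeddings": one query record and three candidate records.
query = [[1.0, 0.0]]
pool = {"a": [0.9, 0.1], "b": [0.0, 1.0], "c": [0.7, 0.7]}

neighbors = suggest_neighbors(query, pool, limit=2)
print(neighbors)  # prints ['a', 'c']
```

Record "b" is orthogonal to the query and is therefore ranked last and dropped by the limit.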
Subset Operations
MEGAnno supports set operations to build more subsets from others:
# intersection
s_intersection = s1 & s2 # or s1.intersection(s2)
# union
s_union = s1 | s2 # or s1.union(s2)
# difference
s_diff = s1 - s2 # or s1.difference(s2)
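These operators mirror Python's built-in set operators. The sketch below uses plain sets of hypothetical record ids to show the semantics one would expect, assuming MEGAnno subsets behave like sets of records; it does not call MEGAnno.

```python
# Hypothetical record ids returned by two separate searches.
s1_ids = {1, 2, 3, 4}
s2_ids = {3, 4, 5}

both = s1_ids & s2_ids     # records found by both searches
either = s1_ids | s2_ids   # records found by at least one search
only_s1 = s1_ids - s2_ids  # records found by the first search only

print(sorted(both))     # prints [3, 4]
print(sorted(either))   # prints [1, 2, 3, 4, 5]
print(sorted(only_s1))  # prints [1, 2]
```

Note that difference is not symmetric: `s2_ids - s1_ids` would yield `{5}` instead.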
Dashboard (administrator-only)
MEGAnno provides a built-in visual monitoring dashboard that gives users the real-time status of an annotation project. As projects evolve, users often need to understand the project’s status to decide on next steps, such as collecting more data points with certain characteristics or adding a new class to the task definition. To aid such analysis, the dashboard widget packs common statistics and analytical visualizations (e.g., annotation progress, distribution of labels, and annotator agreement), selected based on a survey of our pilot users.
To bring up the project dashboard:
Other features
- Assignment and dispatch: You may assign a subset to a particular annotator.
- Multiple annotators and reconciliation: You are also able to view a reconciled list of annotations from multiple annotators.