This page is mostly an overview of what's possible for NER with Prodigy. For more details and API docs, check out the PRODIGY_README.html, which is available for download with Prodigy. The README also covers the input formats and how to represent entity spans etc. as JSON.
Here's an example of a task with a highlighted entity span:
{
"text": "Hello Apple",
"spans": [{ "start": 6, "end": 11, "label": "ORG"}]
}
At a minimum, you need the start and end character offsets into the text, and the label. The end offset is exclusive: characters 6 to 11 of "Hello Apple" are "Apple". If you already have that annotated, it should be fairly straightforward to write a script that converts your data to a list of dictionaries with the keys "text" and "spans".
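A minimal sketch of such a conversion script. It assumes your existing annotations are a list of `(text, entities)` tuples where each entity is a `(start, end, label)` triple. That input format, and the helper names `to_prodigy_tasks` and `write_jsonl`, are hypothetical, so adapt the loop to whatever your data actually looks like. The output is one JSON object per line (JSONL).

```python
import json
from typing import List, Tuple

# Hypothetical input format: (text, [(start, end, label), ...])
Annotation = Tuple[str, List[Tuple[int, int, str]]]


def to_prodigy_tasks(annotations: List[Annotation]) -> List[dict]:
    """Convert (text, entities) pairs into dicts with "text" and "spans"."""
    tasks = []
    for text, entities in annotations:
        spans = []
        for start, end, label in entities:
            # Sanity check: offsets must lie inside the text (end is exclusive)
            if not (0 <= start < end <= len(text)):
                raise ValueError(f"Invalid offsets ({start}, {end}) for {text!r}")
            spans.append({"start": start, "end": end, "label": label})
        tasks.append({"text": text, "spans": spans})
    return tasks


def write_jsonl(tasks: List[dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


tasks = to_prodigy_tasks([("Hello Apple", [(6, 11, "ORG")])])
print(tasks)
# [{'text': 'Hello Apple', 'spans': [{'start': 6, 'end': 11, 'label': 'ORG'}]}]
```

Calling `write_jsonl(tasks, "input.jsonl")` (the filename is just a placeholder) would then give you a file with one task per line. Raising on out-of-range offsets catches off-by-one errors early, before they show up as misplaced highlights in the UI.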
You can then add the aspect options to the examples as described above, which will create input examples that look something like this:
{
"text": "Hello Apple",
"spans": [{ "start": 6, "end": 11, "label": "ORG"}],
"options": [
{"id": 0, "text": "Aspect 1"},
{"id": 1, "text": "Aspect 2"}
]
}
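A minimal sketch of attaching the options, assuming the aspects are a plain list of strings ("Aspect 1" and "Aspect 2" are placeholders, and `add_options` is a hypothetical helper name). Each option gets an integer "id" based on its position. The function is a generator that copies each task, so the caller's data isn't modified and tasks can be produced lazily one at a time.

```python
import copy
from typing import Iterable, Iterator, List


def add_options(tasks: Iterable[dict], aspects: List[str]) -> Iterator[dict]:
    """Yield a copy of each task with multiple-choice "options" attached."""
    options = [{"id": i, "text": aspect} for i, aspect in enumerate(aspects)]
    for task in tasks:
        task = copy.deepcopy(task)  # don't modify the caller's data
        task["options"] = copy.deepcopy(options)  # each task gets its own list
        yield task


tasks = [{"text": "Hello Apple", "spans": [{"start": 6, "end": 11, "label": "ORG"}]}]
for task in add_options(tasks, ["Aspect 1", "Aspect 2"]):
    print(task["options"])
# [{'id': 0, 'text': 'Aspect 1'}, {'id': 1, 'text': 'Aspect 2'}]
```

Using the list position as the ID keeps the IDs stable as long as the aspect list order doesn't change between annotation sessions.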
In the UI, the example above will be displayed as text with "Apple" highlighted as "ORG", plus two multiple-choice options to choose from. When you select an option, its ID is added to the task under the "accept" key. For example:
{
"text": "Hello Apple",
"spans": [{ "start": 6, "end": 11, "label": "ORG"}],
"options": [
{"id": 0, "text": "Aspect 1"},
{"id": 1, "text": "Aspect 2"}
],
"accept": [1],
"answer": "accept"
}
After annotation, you can then export the dataset using db-out, and for each entity highlighted in the text, you'll have its offsets into the text, as well as the selected aspect option(s).
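A minimal sketch of reading the export back, assuming the db-out output was saved as a JSONL file with one task per line, shaped like the annotated example above. `entity_aspects` is a hypothetical helper that maps the IDs in "accept" back to their option texts and yields one row per highlighted span. Keeping only tasks whose "answer" is "accept" is a design choice of this sketch; drop that check if you also want rejected or ignored tasks.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def entity_aspects(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, str, List[str]]]:
    """Yield (entity text, start, end, label, selected aspect texts) per span."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        # Only keep accepted tasks (a choice made for this sketch)
        if task.get("answer") != "accept":
            continue
        id_to_text = {opt["id"]: opt["text"] for opt in task.get("options", [])}
        selected = [id_to_text[i] for i in task.get("accept", []) if i in id_to_text]
        # The selected options apply to the whole task, so every span gets them
        for span in task.get("spans", []):
            start, end = span["start"], span["end"]
            yield task["text"][start:end], start, end, span["label"], selected


exported = [
    '{"text": "Hello Apple", "spans": [{"start": 6, "end": 11, "label": "ORG"}], '
    '"options": [{"id": 0, "text": "Aspect 1"}, {"id": 1, "text": "Aspect 2"}], '
    '"accept": [1], "answer": "accept"}'
]
for row in entity_aspects(exported):
    print(row)
# ('Apple', 6, 11, 'ORG', ['Aspect 2'])
```

Because the function accepts any iterable of lines, you can pass an open file handle directly, e.g. `with open("annotations.jsonl", encoding="utf-8") as f:` followed by `for row in entity_aspects(f): ...` (the filename is a placeholder).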