Thanks for your questions!
Yes, absolutely. Prodigy is centered around "recipes", which are simple Python functions that return a dictionary of components – e.g. the stream of examples, plus optional functions to update the model and customise other behaviours. You can find more details and examples of this in the custom recipes usage workflow. The source of the built-in recipes is also shipped with Prodigy, so you can look at the code and take inspiration from it.
To make use of the active learning, all you need is a function that assigns scores to an incoming stream of examples and yields (score, example) tuples, and a function that takes a list of annotated examples and updates the model. A custom recipe to integrate a custom model could look something like this:
import prodigy
from prodigy.components.loaders import JSONL             # file format loader
from prodigy.components.sorters import prefer_uncertain  # or other sorter

@prodigy.recipe('custom-ner')
def custom_ner(dataset, source):
    model = load_your_custom_model()   # load your model however you want
    stream = JSONL(source)             # assuming your data source is a JSONL file
    stream = model(stream)             # assign scores to examples via model
    stream = prefer_uncertain(stream)  # sort to prefer uncertain scores
    return {
        'dataset': dataset,      # ID of the dataset to store annotations
        'stream': stream,        # stream of examples
        'update': model.update,  # update model with annotations
        'view_id': 'ner'         # use NER annotation interface
    }
You could then use the recipe as follows:
prodigy custom-ner my_dataset my_data.jsonl -F recipe.py
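What the model object looks like is entirely up to you. The only contract is the one described above: calling it on the stream should yield (score, example) tuples, and its update method should take a list of annotated examples. Here's a minimal sketch, assuming hypothetical predict and train methods on your own model:

class CustomModel(object):
    def __call__(self, stream):
        for eg in stream:
            score = self.predict(eg['text'])  # hypothetical: score the example text
            yield (score, eg)

    def update(self, examples):
        # each annotated example comes back with an added 'answer' key,
        # i.e. 'accept', 'reject' or 'ignore'
        texts = [eg['text'] for eg in examples]
        answers = [eg['answer'] for eg in examples]
        self.train(texts, answers)  # hypothetical: update your model's weights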
Sure! To make the most out of Prodigy's intuitive interface, you could for example extract only the rejected examples from your dataset and reannotate them. This keeps the annotator focused on one task at a time. If you have a clearly defined label set of, say, 5-10 labels, you could use the choice interface and only stream in the examples that were previously rejected. The annotator would then see the text with the highlighted entity, and would be able to select one of the available labels (to correct it) or "no label" if the span is not an entity.
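For example, a reannotation task for the choice interface could look something like this. The text, span and label set here are just made up for illustration; the "options" are what the choice interface renders as the selectable labels:

{
    "text": "Apple updated its operating system.",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
    "options": [
        {"id": "ORG", "text": "Organisation"},
        {"id": "PRODUCT", "text": "Product"},
        {"id": "NONE", "text": "No label"}
    ]
}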
There's also a boundaries interface that lets you create entity (or other span) annotations by selecting the individual tokens. We're also working on more "traditional" interfaces for use cases that require manual labeling.
Yes – I've actually outlined a few solutions and ideas for relationship annotation in this thread.
In general, you can freely mix and match the different annotation UI components and build your own interface. For example, if your annotation task contains spans, those will be rendered as entities within the text. If a span contains a label, it will be rendered next to the entity. If the task itself contains a label, it will be displayed as the headline above the task. So let's say you need to annotate whether two entities that are part of a longer text are related. Your annotation task could look like this:
{
    "text": "entity A, some text in between, entity B",
    "spans": [{"start": 0, "end": 8, "label": "A"}, {"start": 32, "end": 40, "label": "B"}],
    "label": "RELATED"
}
This would show an annotation card with the headline "RELATED" and the two entities highlighted within the text. The task will be immediately intuitive to the annotator, and you'll be able to collect one-click annotations on whether the two entities are "related" (of course, in a real-world example, you'd probably want to use a more descriptive relationship here).
If the built-in options are not enough, you can always use custom HTML templates. You can either specify the HTML directly in the task, e.g.:
{"html": "This is text. <strong>This is bold.</strong>"}
... or provide an 'html_template' string within the config returned by your recipe. The template will have access to all properties specified in the task as Mustache variables (which also allow nested objects!). For example:
{"relation": "SIBLINGS", "data": {"text1": "foo", "text2": "bar"}}
<strong style="color: red">{{relation}}</strong><br />
{{data.text1}} → {{data.text2}}
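Wired into a recipe, that could look roughly like this (a sketch only, assuming your stream produces tasks with the structure above):

return {
    'dataset': dataset,
    'stream': stream,
    'view_id': 'html',  # render the tasks with the HTML interface
    'config': {
        'html_template': '<strong style="color: red">{{relation}}</strong><br />'
                         '{{data.text1}} → {{data.text2}}'
    }
}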