Here's a character-based example with English text: Named Entity Recognition · Prodigy · An annotation tool for AI, Machine Learning & NLP
The annotations you can export include the start and end character offset of the span, as well as the start and end token index the span refers to. You can also convert character offsets to BILUO/IOB tags programmatically; see here for an example.
However, since those tags always refer to tokens, you'd have to decide what the tokens are in your data. For instance, if you want "ZnT" in "ZnT-SP" to be an entity span, your tokenization needs to produce a separate token for it. Otherwise, you won't be able to create a tag for that token and your model won't be able to learn from it.
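To make the tokenization point concrete, here's a rough sketch in plain Python. It's not what Prodigy or spaCy do internally; the two regex tokenizers, the function names and the "PROTEIN" label are all made up for illustration. It converts character-offset spans to BILUO tags and marks tokens as "-" when the span doesn't line up with token boundaries.

```python
import re


def tokenize(text, pattern):
    """Return (start, end) character offsets for every regex match."""
    return [match.span() for match in re.finditer(pattern, text)]


def char_spans_to_biluo(token_offsets, spans):
    """Convert (start, end, label) character spans to token-level BILUO tags.

    Tokens overlapping a span that doesn't align with token boundaries
    are tagged "-", meaning no valid tag can be created for them.
    """
    tags = ["O"] * len(token_offsets)
    starts = {start: i for i, (start, _) in enumerate(token_offsets)}
    ends = {end: i for i, (_, end) in enumerate(token_offsets)}
    for span_start, span_end, label in spans:
        first = starts.get(span_start)
        last = ends.get(span_end)
        if first is None or last is None or last < first:
            # Span boundaries fall inside a token: mark the affected tokens.
            for i, (tok_start, tok_end) in enumerate(token_offsets):
                if tok_start < span_end and tok_end > span_start:
                    tags[i] = "-"
            continue
        if first == last:
            tags[first] = f"U-{label}"
        else:
            tags[first] = f"B-{label}"
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"
            tags[last] = f"L-{label}"
    return tags


text = "ZnT-SP is expressed"
spans = [(0, 3, "PROTEIN")]  # hypothetical label for "ZnT"

# Whitespace tokenization keeps "ZnT-SP" as a single token.
whitespace_tokens = tokenize(text, r"\S+")
# Splitting punctuation off produces "ZnT", "-" and "SP".
split_tokens = tokenize(text, r"\w+|[^\w\s]")

print(char_spans_to_biluo(whitespace_tokens, spans))
# ['-', 'O', 'O']
print(char_spans_to_biluo(split_tokens, spans))
# ['U-PROTEIN', 'O', 'O', 'O', 'O']
```

With the whitespace tokenizer there's no token for "ZnT", so the span can't be expressed as a tag. Once the hyphen is split off, it becomes a regular single-token entity.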
Prodigy expects you to provide the label scheme upfront and it's typically not something you'd want the annotator to decide at runtime. The presence and absence of a label can make a big difference for the model and if the labels change during annotation, this can easily lead to very inconsistent data.
You'll also need to know the label scheme if you want to take advantage of a model in the loop, because the model should be initialized with all available labels that it's going to predict.
Do you mean the labels displayed at the top of the annotation card? In theory, that's possible: you could look at the top X possible analyses for the given text, take all entity labels of the predicted spans and then override the "labels" via the "config" setting of each individual annotation task that gets sent out.
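As a rough sketch of what that could look like: the wrapper below adds a per-task "config" with "labels" to each example in a stream. The function name, the `predict_top_analyses` callback and its return format (a list of analyses, each a list of span dicts with a "label") are assumptions for illustration, so you'd need to plug in whatever your model actually provides.

```python
def add_task_labels(stream, predict_top_analyses, all_labels, top_n=3):
    """Yield tasks whose "config" lists only the labels the model predicted.

    predict_top_analyses(text, top_n) is a hypothetical callback returning
    up to top_n analyses, each a list of span dicts with a "label" key.
    """
    for task in stream:
        analyses = predict_top_analyses(task["text"], top_n)
        predicted = {span["label"] for spans in analyses for span in spans}
        # Keep the order of the full label scheme so labels move as little as possible.
        labels = [label for label in all_labels if label in predicted]
        config = dict(task.get("config", {}))
        # Fall back to the full scheme if the model predicted nothing.
        config["labels"] = labels or list(all_labels)
        yield {**task, "config": config}
```

Keeping the original order of the label scheme at least avoids reshuffling the labels on every example, even though the set of labels shown still changes.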
I'm not sure this will really help with efficiency, though, since it means that the order of labels can change with every example and the annotator has to reorient themselves constantly. To me it seems more useful to just have the model pre-highlight the most confident predictions in the text, as the ner.correct recipe does. Changing the order of labels is something I can see working better for text classification, where you have labels for the whole text and could pre-select the most confident predictions and move the more uncertain labels further up.
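For the text classification case, a minimal sketch could look like this. It assumes your model gives you one score between 0 and 1 per label; the function name and the threshold are made up, and the "options" and "accept" keys follow the multiple-choice task format as I understand it, so double-check them against the interface you're using.

```python
def order_options(scores, preselect_threshold=0.9):
    """Sort labels so the most uncertain come first and pre-select confident ones.

    scores: dict mapping each label to a probability in [0, 1] (assumed input).
    Returns the "options" list and the pre-selected "accept" list for a task.
    """
    # Scores closest to 0.5 are treated as the most uncertain.
    ranked = sorted(scores, key=lambda label: abs(scores[label] - 0.5))
    options = [{"id": label, "text": label} for label in ranked]
    accept = [label for label in ranked if scores[label] >= preselect_threshold]
    return options, accept


options, accept = order_options({"SPORTS": 0.95, "POLITICS": 0.55, "TECH": 0.1})
print([option["id"] for option in options])  # ['POLITICS', 'TECH', 'SPORTS']
print(accept)  # ['SPORTS']
```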
Streams that queue up data for annotation are just Python generators under the hood that yield dictionaries, so you can implement any custom logic for selecting the examples, pre-highlighting entities using your model, sorting examples, skipping texts etc. Here's an example for active learning with a custom model.
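Just to illustrate the generator idea, here's a small sketch of a stream that skips empty texts and pre-highlights only the confident predictions. `predict_spans` stands in for your own model and is assumed to return dicts with "start", "end", "label" and "score"; the function name and the 0.8 cutoff are arbitrary.

```python
def confident_stream(texts, predict_spans, min_score=0.8):
    """Yield annotation tasks with confident predictions pre-highlighted.

    predict_spans(text) is a hypothetical callback returning span dicts
    with "start", "end", "label" and "score" keys.
    """
    for text in texts:
        if not text.strip():
            # Custom skipping logic: ignore empty or whitespace-only texts.
            continue
        spans = [
            {"start": span["start"], "end": span["end"], "label": span["label"]}
            for span in predict_spans(text)
            if span["score"] >= min_score
        ]
        yield {"text": text, "spans": spans}
```

Because it's a generator, examples are only processed as they're requested, so the same pattern works if the model is being updated in the loop.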