hi @shensmobile!
Prodigy does have character-level highlighting which essentially turns off tokenization:
As the link mentions:
When using character-based highlighting, annotation may be slower and there’s no guarantee that the spans you annotate map to actual tokens later on. If your goal is to train a named entity recognizer, you should consider using the same tokenizer during annotation, to make sure that your data can be used. Also see the section on efficient annotation for transformers if you’re training a transformer-based model (e.g. BERT) with subword tokenization.
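To make the "no guarantee that the spans you annotate map to actual tokens" part concrete, here's a rough sketch. It uses a plain whitespace split as a stand-in tokenizer, which is an assumption for illustration only (it is not the tokenizer Prodigy or spaCy uses), and `aligns_with_tokens` is a hypothetical helper, not a Prodigy function.

```python
import re


def token_offsets(text):
    """Character (start, end) offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def aligns_with_tokens(text, start, end):
    """True if the character span starts and ends on token boundaries."""
    offsets = token_offsets(text)
    starts = {s for s, _ in offsets}
    ends = {e for _, e in offsets}
    return start in starts and end in ends


text = "Contact AcmeCorp-EU today"

# A character-level highlight of "AcmeCorp" stops in the middle of the
# token "AcmeCorp-EU", so it has no matching token boundary at its end.
partial = aligns_with_tokens(text, 8, 16)

# Highlighting the whole token "AcmeCorp-EU" does line up.
whole = aligns_with_tokens(text, 8, 19)

print(partial, whole)
```

With a real tokenizer the boundaries will differ, but the idea is the same: a span that ends mid-token can't be mapped back to tokens later on.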
The only problem is these are more for span annotation (`ner` or `spancat`) and `--highlight-chars` isn't available for `rel.manual`. If you wanted, you could look at the `ner` recipes and try to modify your `rel.manual` to get something like it.
To view the recipes, find your Prodigy location: run `python -m prodigy stats` and look for the path listed under `Location:`. Then look for `recipes/rel.py` (relations recipes) and `recipes/ner.py` (NER recipes). Perhaps you can see how `--highlight-chars` is used in the `ner` recipes and try to replicate something similar for `rel.manual`.
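If it helps, here's a small sketch for locating those recipe files from Python and listing the lines that mention the highlighting option. `find_in_package` is a hypothetical helper I'm making up here, and searching for the substring `"highlight"` is an assumption, since I haven't checked exactly how the option is named inside the recipe source.

```python
import importlib.util
from pathlib import Path


def find_in_package(package, relative_path, needle):
    """Return (line_number, line) pairs in package/relative_path containing needle."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []  # package is not installed in this environment
    target = Path(spec.origin).parent / relative_path
    if not target.is_file():
        return []  # file is not where we expected it
    hits = []
    with target.open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Paths are relative to the Prodigy package directory (the "Location:" path).
    for recipe in ("recipes/ner.py", "recipes/rel.py"):
        print(recipe)
        for number, line in find_in_package("prodigy", recipe, "highlight"):
            print(f"  {number}: {line}")
```

Comparing the hits in `recipes/ner.py` against `recipes/rel.py` should give a starting point for what would need to be carried over.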
Since you're only visualizing the relations, it seems like it could work, but I'd need more time to test and confirm that. Let us know if you find anything!