Can I use rel.manual without tokenization?

ryanwesslen · December 2, 2022, 7:20pm

Prodigy does have character-level highlighting, which essentially turns off tokenization:

As the link mentions:

When using character-based highlighting, annotation may be slower and there’s no guarantee that the spans you annotate map to actual tokens later on. If your goal is to train a named entity recognizer, you should consider using the same tokenizer during annotation, to make sure that your data can be used. Also see the section on efficient annotation for transformers if you’re training a transformer-based model (e.g. BERT) with subword tokenization.
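To make the "no guarantee that the spans map to actual tokens" point concrete, here is a minimal sketch in plain Python. It assumes a simple whitespace tokenizer as a stand-in (this is not spaCy's or Prodigy's tokenizer), and the helper names token_offsets and span_aligns are made up for illustration. It checks whether a character span starts and ends on token boundaries.

```python
import re


def token_offsets(text):
    """Return (start, end) character offsets for each whitespace-delimited token.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def span_aligns(text, start, end):
    """True if the character span starts and ends exactly on token boundaries."""
    offsets = token_offsets(text)
    starts = {s for s, _ in offsets}
    ends = {e for _, e in offsets}
    return start in starts and end in ends


text = "Prodigy supports character-level highlighting"

# "Prodigy" covers a whole token, so the span aligns.
print(span_aligns(text, 0, 7))

# "character" is only part of the token "character-level", so it does not.
print(span_aligns(text, 17, 26))
```

A span like the second one is easy to create with character-level highlighting, and it is the kind of annotation that a token-based model could not use directly.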

The only problem is that this is meant for span annotation (ner or spancat), and --highlight-chars isn't available for rel.manual. If you want, you could look at the ner recipes and try to modify rel.manual to get something similar.

To view the recipes, find your Prodigy installation (run python -m prodigy stats and look for the path listed under Location:). Then look for recipes/rel.py (relations recipes) and recipes/ner.py (ner recipes). Perhaps you can see how --highlight-chars is used in the ner recipes and try to replicate something similar for rel.manual.
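If you'd rather locate the files from Python than read the stats output, something like the sketch below might help. It only uses the standard library, and it assumes the recipes ship as plain .py files in a recipes folder inside the installed package, as described above. The helper name find_recipe_files is made up for this example and is not part of Prodigy.

```python
import importlib.util
from pathlib import Path


def find_recipe_files(package="prodigy", subdir="recipes", names=("rel", "ner")):
    """Map each recipe module name to its source file, if one exists on disk.

    Hypothetical helper: returns an empty dict if the package is not installed.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        spec = None
    if spec is None or not spec.submodule_search_locations:
        return {}

    found = {}
    for root in spec.submodule_search_locations:
        for name in names:
            candidate = Path(root) / subdir / f"{name}.py"
            if candidate.is_file():
                found.setdefault(name, candidate)
    return found


# Print whatever recipe sources were found (nothing if Prodigy isn't installed).
for name, path in find_recipe_files().items():
    print(name, path)
```

From there you can open the ner file, search for highlight_chars, and compare it with how the rel.manual recipe builds its config.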

Since you're only visualizing the relations, it seems like it could work, but I'd need more time to test it before I can confirm. Let us know if you find anything!
