Can I use rel.manual without tokenization?

shensmobile · December 2, 2022, 5:02pm

Hi, sorry for the re-post. I accidentally deleted the previous one.

I have used rel.manual to label some data successfully and trained a model directly in HuggingFace. I was looking for a good visualization tool to display the predictions and couldn't find anything as good as displaCy is for entities, so I thought "hey, maybe I can just use Prodigy!"

Unfortunately, I am unable to get any of my predictions back into Prodigy. I have done this in the past with rel.manual by labelling the data with my NER models from HuggingFace and creating a .jsonl file in the following format:

{
"text": text,
"spans": [{"start": 1,
"end": 5,
"label": label}]
}

I tried doing the same thing, except with relations:

{
"text": text,
"spans": [{"start": 1,
"end": 5,
"label": "label"},
{"start": 6,
"end": 10,
"label": "label"}],
"relations": [{"head_span": {"start": 1,
"end": 5,
"label": "label"},
"child_span": {"start": 6,
"end": 10,
"label": "label"},
"label": "related"}]
}

Does rel.manual require tokens to function? If so, is there any way for me to skip the tokenization step? I am using a custom HuggingFace tokenizer and don't need to adjust the relations, just need to visualize them for now.
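For context, here is a minimal sketch of what a token-aligned version of the task above might look like. It assumes, based on my reading of Prodigy's relations data format (worth checking against the docs), that each span carries token_start and token_end, and that each relation has head and child token indices pointing at the last token of its span. The helper names align_span and make_relation are made up for illustration, and the tokens are written by hand.

```python
from typing import Optional


def align_span(span: dict, tokens: list) -> Optional[dict]:
    """Return a copy of `span` with token_start/token_end added, or None
    if its character offsets do not fall on token boundaries."""
    starts = {tok["start"]: tok["id"] for tok in tokens}
    ends = {tok["end"]: tok["id"] for tok in tokens}
    if span["start"] not in starts or span["end"] not in ends:
        return None
    token_start, token_end = starts[span["start"]], ends[span["end"]]
    if token_start > token_end:
        return None
    return {**span, "token_start": token_start, "token_end": token_end}


def make_relation(head_span: dict, child_span: dict, label: str) -> dict:
    # Assumption: "head" and "child" are the index of the LAST token of each span.
    return {
        "head": head_span["token_end"],
        "child": child_span["token_end"],
        "head_span": head_span,
        "child_span": child_span,
        "label": label,
    }


text = "Anna met Bob"
# Hand-written tokens; in practice these would come from your own tokenizer.
tokens = [
    {"text": "Anna", "start": 0, "end": 4, "id": 0, "ws": True},
    {"text": "met", "start": 5, "end": 8, "id": 1, "ws": True},
    {"text": "Bob", "start": 9, "end": 12, "id": 2, "ws": False},
]
head = align_span({"start": 0, "end": 4, "label": "PERSON"}, tokens)
child = align_span({"start": 9, "end": 12, "label": "PERSON"}, tokens)
task = {
    "text": text,
    "tokens": tokens,
    "spans": [head, child],
    "relations": [make_relation(head, child, "related")],
}
```

Spans whose offsets do not line up with a token boundary come back as None here, which is one way to spot predictions that cannot be shown in a token-based interface.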

ryanwesslen · December 2, 2022, 7:20pm

hi @shensmobile!

Prodigy does have character-level highlighting which essentially turns off tokenization:

As the link mentions:

When using character-based highlighting, annotation may be slower and there’s no guarantee that the spans you annotate map to actual tokens later on. If your goal is to train a named entity recognizer, you should consider using the same tokenizer during annotation, to make sure that your data can be used. Also see the section on efficient annotation for transformers if you’re training a transformer-based model (e.g. BERT) with subword tokenization.

The only problem is that character-level highlighting is meant for span annotation (ner or spancat), and --highlight-chars isn't available for rel.manual. If you wanted, you could look at the ner recipes and try to modify rel.manual to get something like it.

To view the recipes, find your Prodigy location (run python -m prodigy stats and look for the path under Location:). Then look for recipes/rel.py (relations recipes) and recipes/ner.py (ner recipes). Perhaps you can see how --highlight-chars is used in the ner recipes and try to replicate something similar for rel.manual.

Since you're only visualizing the relations it seems like it could work -- but I'd need to take more time to test to confirm if it would. Let us know if you find anything!

shensmobile · December 7, 2022, 12:52am

Hi Ryan,

I ended up getting bored over the weekend and wrote code to port the BERT tokens into my .jsonl and used the bert.ner.manual.py as inspiration to make my own bert.rel.manual.py.

This was a great exercise in learning about how recipes work though! Next I'll try to port over --highlight-chars for future projects that use various different tokenizers!
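The token-porting step described above could look roughly like the sketch below. This is not the code from the post. It assumes the character offsets come from a fast Hugging Face tokenizer called with return_offsets_mapping=True, where special tokens such as [CLS] and [SEP] show up with empty (0, 0) offsets. The function name to_prodigy_tokens is hypothetical, and the offsets are hard-coded so the example runs without transformers installed.

```python
import json


def to_prodigy_tokens(text, offsets):
    """Convert (start, end) character offsets into Prodigy-style token dicts."""
    tokens = []
    for start, end in offsets:
        if start == end:
            # Empty offsets: treated here as special tokens and skipped.
            continue
        tokens.append({
            "text": text[start:end],
            "start": start,
            "end": end,
            "id": len(tokens),
            # True if the token is directly followed by whitespace in the text.
            "ws": end < len(text) and text[end].isspace(),
        })
    return tokens


text = "Prodigy rocks"
# Stand-in for tokenizer(text, return_offsets_mapping=True)["offset_mapping"];
# these particular wordpiece boundaries are invented for the example.
offsets = [(0, 0), (0, 3), (3, 7), (8, 13), (0, 0)]
tokens = to_prodigy_tokens(text, offsets)

# One JSON object per line is what a .jsonl file expects.
line = json.dumps({"text": text, "tokens": tokens})
```

Taking the token text from the original string rather than from the wordpiece vocabulary avoids "##" prefixes in the display, at the cost of no longer showing which pieces are word continuations.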
