Hi, I'm using Prodigy to annotate data for fine-tuning BERT models on NER and relation extraction (RE). I've annotated the entities with the bert.ner.manual recipe. Is there a recipe for relation annotation that can reuse the dataset I've already annotated? I tried running rel.manual for this (prodigy rel.manual ner_rels_dep blank:en dataset:ner_rels_ent --label SUBJECT,LOCATION --wrap), but the original tokenizer doesn't recognize the tokens in my labeled data.
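One plausible way to see the mismatch: spans created with bert.ner.manual sit on subword boundaries, which often don't line up with the word-level tokens a blank:en spaCy pipeline produces. The sketch below checks this, assuming Prodigy-style task dicts where "tokens" and "spans" carry character offsets "start"/"end". The helper name misaligned_spans and the example labels are hypothetical, not part of Prodigy.

```python
from typing import Dict, List


def misaligned_spans(example: Dict) -> List[Dict]:
    """Return spans whose character offsets don't fall on token boundaries.

    Assumes example["tokens"] is a list of {"start": int, "end": int, ...}
    and example["spans"] a list of {"start": int, "end": int, "label": str}.
    """
    starts = {tok["start"] for tok in example.get("tokens", [])}
    ends = {tok["end"] for tok in example.get("tokens", [])}
    return [
        span
        for span in example.get("spans", [])
        if span["start"] not in starts or span["end"] not in ends
    ]


# Word-level tokens (as a spaCy-style tokenizer would split the text),
# plus one span that was placed on the subword "Jia".
example = {
    "text": "Jiachen lives in Berlin",
    "tokens": [
        {"start": 0, "end": 7},
        {"start": 8, "end": 13},
        {"start": 14, "end": 16},
        {"start": 17, "end": 23},
    ],
    "spans": [
        {"start": 0, "end": 3, "label": "SUBJECT"},
        {"start": 17, "end": 23, "label": "LOCATION"},
    ],
}
print([s["label"] for s in misaligned_spans(example)])
```

Any span this reports can't be mapped onto the new tokenization, which is why re-tokenizing the text is the wrong move here.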
Hi @Jiachen!
Thanks for your question, and welcome to the Prodigy community!
Have you seen this post?
The idea is to pass in the pre-tokenized text and avoid spaCy's tokenizer. You'll need to create a custom recipe, since rel.manual assumes you're using a spaCy tokenizer. But as the post mentions, you can use bert.ner.manual as an example of how to use the BERT (WordPiece) tokenizer. You can also find the rel.manual recipe (for reference) in your installed Prodigy library: run prodigy stats, find the Location: path (i.e. where Prodigy is installed), and then look for the recipes/rel.py script.
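For the "pass in pre-tokenized text" part, a custom recipe has to supply its own "tokens" for each task instead of letting spaCy split the text. A minimal sketch, assuming your tokenizer gives (start, end) character offsets per token (for example a Hugging Face fast tokenizer's offset mapping, where special tokens like [CLS]/[SEP] get (0, 0)); tokens_from_offsets is a hypothetical helper, and the exact token fields your recipe needs should be checked against recipes/rel.py.

```python
from typing import Dict, List, Tuple


def tokens_from_offsets(text: str, offsets: List[Tuple[int, int]]) -> List[Dict]:
    """Build Prodigy-style token dicts from (start, end) character offsets.

    Offsets are assumed sorted and non-overlapping. Zero-length offsets
    (special tokens such as [CLS]/[SEP]) are skipped, and token ids are
    renumbered from 0 over the remaining tokens.
    """
    kept = [(start, end) for start, end in offsets if end > start]
    tokens = []
    for i, (start, end) in enumerate(kept):
        next_start = kept[i + 1][0] if i + 1 < len(kept) else len(text)
        tokens.append({
            # Surface text, so a subword like "##chen" is shown as "chen"
            "text": text[start:end],
            "start": start,
            "end": end,
            "id": i,
            # True if only whitespace separates this token from the next
            "ws": next_start > end and text[end:next_start].isspace(),
        })
    return tokens


text = "Jiachen lives in Berlin"
offsets = [(0, 0), (0, 3), (3, 7), (8, 13), (14, 16), (17, 23), (0, 0)]
for tok in tokens_from_offsets(text, offsets):
    print(tok)
```

Because the tokens come straight from the same offsets bert.ner.manual used, existing spans keep lining up when the stream is shown in the relations interface.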
Hope this helps!
Thank you for your reply! I think Prodigy can now use an annotated dataset as input for rel.manual. I looked into the data format used by the built-in recipes and compared it with the output of bert.ner.manual, so I can now process the annotations in my IDE to train a BERT-based model. Also, since spaCy now supports transformers, I can use a BERT-based model with it.
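For processing the annotations outside Prodigy, the dataset can be exported as JSONL with prodigy db-out and turned into relation triples for training. A minimal sketch, assuming relation tasks store "relations" entries with "label", "head_span" and "child_span" (each with character "start"/"end"), as in rel.manual output; please verify these keys against your own export. The function names and the LIVES_IN label are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def load_jsonl(lines: Iterable[str]) -> List[Dict]:
    """Parse JSONL lines (e.g. an open file from `prodigy db-out`) into dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def relation_triples(examples: Iterable[Dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (head_text, relation_label, child_text) from accepted tasks."""
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # skip rejected / ignored annotations
        text = eg["text"]
        for rel in eg.get("relations", []):
            head, child = rel["head_span"], rel["child_span"]
            yield (
                text[head["start"]:head["end"]],
                rel["label"],
                text[child["start"]:child["end"]],
            )


example = {
    "text": "Jiachen lives in Berlin",
    "answer": "accept",
    "relations": [{
        "label": "LIVES_IN",
        "head_span": {"start": 0, "end": 7, "label": "SUBJECT"},
        "child_span": {"start": 17, "end": 23, "label": "LOCATION"},
    }],
}
print(list(relation_triples([example])))
```

With a real export you would open the file written by db-out and pass the file object to load_jsonl.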