hi @Jiachen!
Thanks for your question and welcome to the Prodigy community!
Have you seen this post?
The idea is to pass in the pre-tokenized text and avoid spaCy's tokenizer. You'll need to create a custom recipe, because `rel.manual` will assume you're using a spaCy tokenizer. As the post mentions, though, you can use the `bert.ner.manual` recipe as an example of how to use the BPE tokenizer. Also, you can find the `rel.manual` recipe (for reference) in your installed Prodigy library. Run `prodigy stats`, find the `Location:` path (aka where Prodigy is installed), and then look for the `recipes/rel.py` script.
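To make "pass in the pre-tokenized text" a bit more concrete, here is a rough sketch of turning subword pieces into a task dict with a `"tokens"` list. The token keys (`text`, `start`, `end`, `id`, `ws`) follow Prodigy's pre-tokenized input format as I understand it, so please double-check them against the docs. `make_task` is just a hypothetical helper name, and the sketch assumes your pieces have already had any tokenizer markers (like `##` or `Ġ`) stripped so they appear verbatim in the text.

```python
def make_task(text, pieces):
    """Align pre-tokenized pieces to character offsets in `text`.

    Hypothetical helper: assumes each piece occurs verbatim in `text`,
    in order, with tokenizer markers already removed.
    """
    tokens = []
    cursor = 0
    for i, piece in enumerate(pieces):
        start = text.find(piece, cursor)
        if start == -1:
            raise ValueError(f"Piece {piece!r} not found after offset {cursor}")
        end = start + len(piece)
        # "ws" records whether the token is followed by whitespace
        ws = end < len(text) and text[end].isspace()
        tokens.append(
            {"text": piece, "start": start, "end": end, "id": i, "ws": ws}
        )
        cursor = end
    return {"text": text, "tokens": tokens}


text = "Prodigy supports tokenization"
pieces = ["Prod", "igy", "supports", "token", "ization"]
task = make_task(text, pieces)
for token in task["tokens"]:
    print(token)
```

In a custom recipe, you would yield dicts like this from your stream instead of letting the recipe tokenize the text with spaCy.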
Hope this helps!