Continue bert.ner.manual annotating where I left off

thondeboer · April 24, 2023, 6:49pm

I am trying to annotate some text with the custom recipe bert.ner.manual. After I do a couple of annotations, save them, stop the Prodigy server and restart it, it does not resume from where I left off but starts from scratch again.

I looked at the ner.manual recipe and can't find anything special about how it does that, so how could I make the custom recipe "bert.ner.manual" behave the same way?

thondeboer · April 24, 2023, 6:54pm

I tried the solution from the thread "--exclude is not working for ner.make-gold on same dataset", but it did not work and seems old anyway.

ryanwesslen · April 24, 2023, 8:33pm

hi @thondeboer!

Thanks for your message and welcome to the Prodigy community!

It's a bit hard to know without seeing the recipe. However, it sounds like hashing isn't working correctly.

One simple way to test would be to set "exclude_by": "input" in your configuration (either in prodigy.json, as an override, or in the config returned by your recipe). Already-annotated examples will then be excluded by input hash instead of task hash.
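For illustration, here is a minimal sketch of where that setting could go if you return it from the recipe. The helper name build_components is hypothetical and only exists to keep the example self-contained; in a real recipe this dictionary would be the return value of your recipe function, and the other keys are assumed to match what your recipe already returns.

```python
def build_components(dataset, stream, labels):
    """Hypothetical helper: builds the dict a custom recipe would return."""
    return {
        "dataset": dataset,        # dataset the annotations are saved to
        "stream": stream,          # iterable of incoming tasks
        "view_id": "ner_manual",   # annotation interface
        "config": {
            "labels": labels,
            # Exclude already-annotated examples by input hash
            # instead of the default task hash.
            "exclude_by": "input",
        },
    }


components = build_components("my_dataset", [], ["PERSON", "ORG"])
```

The same key can also go into prodigy.json, which is probably the quicker way to test it.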

But another possibility is that you're not even hashing to begin with.

As in bert.ner.manual, are you loading your input source like this:

stream = get_stream(source, loader=loader, input_key="text")

Can you instead try adding rehash=True and dedup=True (not needed, but default behavior):

stream = get_stream(source, loader=loader, input_key="text", rehash=True, dedup=True)

This may be a typo in that recipe. Typically, built-in recipes call get_stream with the arguments rehash=True and dedup=True.

Alternatively, you could add the hashes yourself using set_hashes. This is what the rehash=True flag does.
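To make the idea concrete, here is a rough, self-contained sketch of why hashes matter for resuming. It is not Prodigy's actual hashing algorithm or filtering code: toy_hash, add_hashes and exclude_seen are made-up names, and the sample texts are invented. The point is only that each task gets a stable "_input_hash" and "_task_hash", and anything whose hash is already in the dataset is skipped after a restart.

```python
import hashlib
import json


def toy_hash(value):
    """Stable hash of a JSON-serializable value (illustrative only)."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def add_hashes(task):
    """Return a copy of the task with input and task hashes attached."""
    task = dict(task)
    task["_input_hash"] = toy_hash(task["text"])
    task["_task_hash"] = toy_hash([task["_input_hash"], task.get("spans", [])])
    return task


def exclude_seen(stream, seen_hashes, by="task"):
    """Yield only tasks whose hash is not already in the saved dataset."""
    key = "_task_hash" if by == "task" else "_input_hash"
    for task in stream:
        if task[key] not in seen_hashes:
            yield task


source = [
    {"text": "Apple hires a new engineer"},
    {"text": "Berlin is a large city"},
    {"text": "Hello world"},
]

# First session: the first two examples were annotated and saved.
saved = [add_hashes(t) for t in source[:2]]
seen = {t["_task_hash"] for t in saved}

# After a restart: hash the stream again and skip what was already saved.
remaining = list(exclude_seen((add_hashes(t) for t in source), seen))
print([t["text"] for t in remaining])
```

If the incoming tasks carry no hashes, or hashes that don't match what was saved, there is nothing to compare against the dataset, which would be consistent with the stream starting from scratch.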

thondeboer · April 25, 2023, 3:21pm

Hi, I am using the recipe as provided in the prodigy-recipes repo on GitHub (prodigy-recipes/transformers_tokenizers.py at master · explosion/prodigy-recipes · GitHub).

It does not seem to use the rehash=True and dedup=True options, and adding them indeed solved the issue. It now correctly starts where I left off.

I did not see those options in the default ner.manual recipe in the GitHub repo, since it uses stream = JSONL(source), which I am guessing already does the filtering itself.
