I am trying to create a dataset for NER training. My pipeline is the following: I cold-start by labeling manually with ner.manual, then train a model on the resulting gold data. I would now like to use ner.correct to speed up the manual labeling.
Since I am using the same dataset name and source for both the ner.manual and ner.correct recipes, could this create duplicates in my final dataset? More generally, is it safe to run several train-and-correct cycles so that the suggestions keep improving during annotation?
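
To check for duplicates myself, I could export the dataset with prodigy db-out and count repeated inputs. Below is a minimal sketch, assuming each exported line is a JSON object carrying the `_input_hash` field that Prodigy attaches to examples. The function name and the sample records are made up for illustration and are not my real data.

```python
import json
from collections import Counter


def count_duplicate_inputs(lines):
    """Return {_input_hash: count} for every input that appears more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        example = json.loads(line)
        counts[example.get("_input_hash")] += 1
    return {h: n for h, n in counts.items() if n > 1}


# Hypothetical records standing in for lines of a JSONL export.
sample = [
    '{"text": "Apple hired Tim.", "_input_hash": 1, "_task_hash": 10, "answer": "accept"}',
    '{"text": "Berlin is cold.", "_input_hash": 2, "_task_hash": 20, "answer": "accept"}',
    '{"text": "Apple hired Tim.", "_input_hash": 1, "_task_hash": 11, "answer": "accept"}',
]

duplicates = count_duplicate_inputs(sample)
print(duplicates)
```

With a real export, the same function could be called on an open file handle instead of the sample list. Any hash it reports would be a text that was saved more than once across the two recipes.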