Hi,
Sorry I missed this thread earlier. I've been answering the same sort of question here: Remarkable Difference Between Prodigy and Custom Training Times - #5 by wpm
There can be a problem here, yes, but there are steps you can take to solve it. For a start, you can use the prodigy.models.ner.merge_spans() function to group the annotations onto the same sentence. Concatenate your datasets and pass them through this function, then use the ner.print-dataset recipe to check that the results are correct. Next, you can pass your annotations through the ner.make-gold recipe, so that you can manually correct any missing entities. This should give you a dataset you can use in spaCy or another NER tool.
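
To make the grouping step a bit more concrete, here is a rough sketch in plain Python of what "grouping the annotations onto the same sentence" means. It is not Prodigy's merge_spans() implementation, and it doesn't import Prodigy at all. The helper name group_spans_by_text is made up for illustration, and I'm assuming the usual task layout with "text", "spans" (each with "start", "end", "label") and "answer" keys.

```python
from collections import defaultdict


def group_spans_by_text(examples):
    """Collect the accepted spans of all examples that share the same text.

    Hypothetical helper for illustration only; not part of Prodigy.
    """
    grouped = defaultdict(set)
    for eg in examples:
        # Only accepted annotations contribute spans. A text that only has
        # rejected annotations is dropped from the output entirely.
        if eg.get("answer", "accept") != "accept":
            continue
        for span in eg.get("spans", []):
            # Using a set removes exact duplicates across datasets.
            grouped[eg["text"]].add((span["start"], span["end"], span["label"]))

    merged = []
    for text, spans in grouped.items():
        merged.append({
            "text": text,
            "spans": [
                {"start": start, "end": end, "label": label}
                for start, end, label in sorted(spans)
            ],
        })
    return merged


# Two small made-up datasets annotating the same sentence.
dataset_a = [
    {"text": "Apple hired Tim Cook.",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}],
     "answer": "accept"},
]
dataset_b = [
    {"text": "Apple hired Tim Cook.",
     "spans": [{"start": 12, "end": 20, "label": "PERSON"}],
     "answer": "accept"},
    {"text": "Apple hired Tim Cook.",
     "spans": [{"start": 0, "end": 5, "label": "PERSON"}],
     "answer": "reject"},
]

# Concatenate the datasets first, then group.
merged = group_spans_by_text(dataset_a + dataset_b)
```

After this, merged holds a single entry for the sentence with both accepted spans on it. Entities that nobody annotated are still missing at this point, which is why the manual correction pass afterwards is worth doing.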