Using the output of ner.gold-to-spacy to train a new model

damiano · March 28, 2018, 10:23am

Hello,
As mentioned in the thread "ner.batch-train is really slow", I am not able to train the model via ner.batch-train.

For the moment I have exported my dataset via ner.gold-to-spacy, but the output does not seem compatible with the format described in "Data formats · spaCy API Documentation".

I have two questions:

1. How can I use this file to train my model? Can I not use it via a CLI command?
2. I am using the same training corpus to train my custom entities, and each dataset has its own custom NER entity. Example:

I was born in Rome on 1985-01-01

I annotate 1985-01-01 as DATE in the DATE dataset and Rome in the CITY dataset.

Is this a problem? I mean, spaCy will see the same sentence once with one annotation and then again with a different one.

Thanks

honnibal · April 4, 2018, 1:18am

Hi,

Sorry I missed this thread before. I've been writing about the same sort of question in this thread: Remarkable Difference Between Prodigy and Custom Training Times - #5 by wpm

There can be a problem here, yes, but we can take steps to solve it. For a start, you can use the prodigy.models.ner.merge_spans() function to group the annotations onto the same sentence. You should concatenate your datasets and pass them through this function, and then use the ner.print-dataset function to check that the results are correct. Next, you can pass your annotations through the ner.make-gold recipe, so that you can manually correct any missing entities. This should let you create a dataset you can use in spaCy or another NER tool.
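
To illustrate the merging step, here is a minimal plain-Python sketch. It is not the implementation of prodigy.models.ner.merge_spans(); the helper name merge_spans_by_text and the record layout (a "text" key plus a "spans" list with "start", "end" and "label") are assumptions made for the example. It groups records from the concatenated datasets by their text and combines the spans, so the sentence from the question ends up as one example carrying both CITY and DATE.

```python
def merge_spans_by_text(examples):
    """Combine the spans of all records that share the same text.

    Hypothetical helper for illustration only; it assumes each record
    looks like {"text": ..., "spans": [{"start", "end", "label"}, ...]}.
    """
    merged = {}
    for eg in examples:
        # One set of (start, end, label) tuples per distinct text,
        # which also drops exact duplicate spans.
        spans = merged.setdefault(eg["text"], set())
        for span in eg.get("spans", []):
            spans.add((span["start"], span["end"], span["label"]))
    return [
        {
            "text": text,
            "spans": [
                {"start": start, "end": end, "label": label}
                for start, end, label in sorted(spans)
            ],
        }
        for text, spans in merged.items()
    ]


def make_span(text, substring, label):
    """Build a span dict by locating a substring in the text."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


text = "I was born in Rome on 1985-01-01"
date_dataset = [{"text": text, "spans": [make_span(text, "1985-01-01", "DATE")]}]
city_dataset = [{"text": text, "spans": [make_span(text, "Rome", "CITY")]}]

# Concatenate the datasets, then merge the annotations per sentence.
merged = merge_spans_by_text(date_dataset + city_dataset)
for eg in merged:
    print(eg["text"], [(s["label"], s["start"], s["end"]) for s in eg["spans"]])
```

A merged example still only contains the entities that were annotated somewhere, so a manual correction pass is still needed for anything that was missed in every dataset.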

ines · April 4, 2018, 9:01am

The ner.gold-to-spacy recipe currently only exports the annotations to character offset or BILUO format – it doesn't yet reconcile spans referring to the same input hash. Adding an option to make it output a full JSON file for training with spacy train is a good idea, though!
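
As a rough sketch of what the character-offset format looks like on the spaCy side: spaCy's simple training style takes (text, {"entities": [(start, end, label), ...]}) pairs. The snippet below converts already-merged records into that shape. The function name to_offset_tuples and the input layout are assumptions for the example, and the result is meant for a custom training loop rather than for the spacy train command, which expects the full JSON format.

```python
def to_offset_tuples(examples):
    """Convert records with a "spans" list into (text, annotations) pairs.

    Hypothetical helper; it assumes the spans for each text have already
    been merged and do not overlap.
    """
    train_data = []
    for eg in examples:
        entities = [
            (span["start"], span["end"], span["label"])
            for span in eg.get("spans", [])
        ]
        train_data.append((eg["text"], {"entities": sorted(entities)}))
    return train_data


merged_examples = [
    {
        "text": "I was born in Rome on 1985-01-01",
        "spans": [
            {"start": 22, "end": 32, "label": "DATE"},
            {"start": 14, "end": 18, "label": "CITY"},
        ],
    }
]

train_data = to_offset_tuples(merged_examples)
for text, annotations in train_data:
    for start, end, label in annotations["entities"]:
        # Show each entity's surface text next to its label.
        print(label, repr(text[start:end]))
```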

wpm · April 4, 2018, 8:58pm

An option to create the full JSON file for training with spacy train would be excellent. I’ve been really confused about how to generate that format.
