questions on Multi NERs Annotation & Training at Once in a Sentence

hi @ruiyeNLP!

The command you're looking for is spacy evaluate, which evaluates a trained pipeline. Since spaCy is open source, you can see the code for it:

The important part is the function handle_scores_per_type. This is what gets called when you use --label-stats. As you can see, it's called by default for spacy evaluate.
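If you'd like to look at the per-label numbers outside the terminal, one option is to save the metrics to JSON and read them back. Here's a minimal sketch, assuming the file was written with the --output option of spacy evaluate and contains an ents_per_type mapping from each label to its p/r/f values. The file name metrics.json and the helper format_per_type are made up for illustration:

```python
import json
from pathlib import Path


def format_per_type(metrics: dict) -> list[str]:
    """Return one formatted line per entity label, sorted by label."""
    # Assumption: per-label scores live under "ents_per_type".
    per_type = metrics.get("ents_per_type") or {}
    lines = []
    for label in sorted(per_type):
        scores = per_type[label]
        lines.append(
            f"{label:<12} P={scores['p']:.3f} R={scores['r']:.3f} F={scores['f']:.3f}"
        )
    return lines


if __name__ == "__main__":
    path = Path("metrics.json")  # hypothetical output path
    if path.exists():
        metrics = json.loads(path.read_text(encoding="utf8"))
        for line in format_per_type(metrics):
            print(line)
```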

One thing you may want to do is create a dedicated hold-out (evaluation) dataset. Prodigy can do the splitting for you with --eval-split 0.2, but the problem is that each run may hold back a different set of examples, so scores aren't directly comparable between runs. Ideally, you should split the data yourself once and reuse the same evaluation set.
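A fixed split can be as simple as shuffling once with a set seed. This is only a sketch: split_examples is a hypothetical helper, and it assumes your annotations are already loaded as a list of examples:

```python
import random


def split_examples(examples, eval_fraction=0.2, seed=0):
    """Shuffle with a fixed seed and split into (train, eval) lists."""
    shuffled = list(examples)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


if __name__ == "__main__":
    examples = [{"text": f"example {i}"} for i in range(10)]
    train, dev = split_examples(examples)
    print(len(train), len(dev))
```

If you save both parts once (for example as two separate datasets or files), every later training run is scored against the same examples.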

If you're going to use spacy train, the data-to-spacy recipe can help. It partitions your data into training and evaluation sets and converts them to the binary .spacy format, which is what spacy train works with.

Unfortunately, there isn't a spaCy version of train-curve, but you can write your own. See this for details:
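As a rough idea of what a hand-rolled version involves: train on increasing portions of the training data and score each model on the same evaluation set. In this sketch, train_curve and train_and_score are hypothetical names, and the actual training and evaluation step is left to a function you supply:

```python
def train_curve(train_examples, fractions, train_and_score):
    """Score a model trained on increasing portions of the data.

    train_and_score is a caller-supplied (hypothetical) function that
    takes a list of examples, trains a model and returns one score.
    """
    results = []
    for fraction in fractions:
        # Always train on at least one example.
        n = max(1, int(len(train_examples) * fraction))
        score = train_and_score(train_examples[:n])
        results.append((fraction, n, score))
    return results


if __name__ == "__main__":
    # Stand-in scorer so the sketch runs without spaCy installed.
    def fake_score(subset):
        return len(subset) / 100

    data = list(range(100))
    for fraction, n, score in train_curve(data, [0.25, 0.5, 0.75, 1.0], fake_score):
        print(f"{fraction:>5.0%} {n:>5} {score:.2f}")
```

If the score is still climbing at the last step, that usually suggests more annotations would help.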

Also, you can export your prodigy train-curve results to a .txt file by redirecting the output with "> train_curve.txt", like so:

prodigy train-curve ... > train_curve.txt