Create baseline metrics based on manual NER annotations

I'm taking my first steps with Prodigy and have annotated a test set with ORG labels (only).

My intention is to see how well the existing spaCy models, and eventually other NER models, perform on this labelled data set out of the box, before I start any training. I haven't been able to find a simple way to do this. In essence, I'm looking for an evaluate.model recipe where I can pass in a test/validation set and a model, and get evaluation metrics back.

I tried passing no training data to the train recipe, but it didn't want to play ball.

Currently I'm trying to transform the output from the data-to-spacy recipe to fit the GoldParse input, so I can get some scores. This seems like it could be a standard use case, so I wanted to check: is there a simpler way to achieve what I want?

It seems like the following post is dealing with the same question:

I'm trying this now.

Hi! I think what you're looking for is spacy evaluate?

This takes data in spaCy's format and will perform an evaluation. Prodigy's train recipe is really designed for quick training experiments with Prodigy datasets, not to replace the training or evaluation process of whichever library you're using (e.g. spaCy).
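For example, assuming a spaCy v3 / recent Prodigy setup, the flow could look something like this. The dataset name, output directory, and pipeline name are placeholders, and the exact flags and output file names may differ between Prodigy versions:

```shell
# Export annotations from the Prodigy dataset (name is a placeholder)
# to spaCy's binary format; data-to-spacy writes train/dev files.
prodigy data-to-spacy ./corpus --ner my_org_dataset

# Score an off-the-shelf pipeline on the held-out split,
# writing the metrics to a JSON file.
python -m spacy evaluate en_core_web_sm ./corpus/dev.spacy --output metrics.json
```

The resulting metrics.json contains the usual precision/recall/F-scores per component, so it's easy to pick out the NER numbers.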

The above post is a bit more abstract and about a custom evaluation, including evaluation of binary data (I think), so I'm not sure that's the right approach if you just want to output scores.
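If you do want the scores programmatically rather than via the CLI (e.g. to log them somewhere), here's a minimal sketch using spaCy v3's Scorer. The text, gold spans, and the deliberately wrong extra prediction are made up for illustration; in practice the predicted doc would come from running your model over the evaluation texts:

```python
from spacy.lang.en import English
from spacy.scorer import Scorer
from spacy.training import Example

nlp = English()  # blank tokenizer; stands in for a real pipeline

text = "Apple is opening a new office in Berlin."
gold_spans = [(0, 5, "ORG")]                   # manual ORG annotation
pred_spans = [(0, 5, "ORG"), (33, 39, "ORG")]  # made-up model output with one false positive

# Build the predicted doc and attach the predicted entities
pred_doc = nlp.make_doc(text)
pred_doc.ents = [pred_doc.char_span(s, e, label=lab) for s, e, lab in pred_spans]

# Pair the predictions with the gold annotations and score them
example = Example.from_dict(pred_doc, {"entities": gold_spans})
scores = Scorer().score([example])

# For this toy pair: precision 0.5 (one false positive), recall 1.0
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

The returned dict also has "ents_per_type" with per-label breakdowns, which is handy if you later add labels beyond ORG.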

Thank you for the prompt reply, I didn't know about spacy evaluate! However, since I will be logging the results to a database, the code from the example above fits the bill quite well.