I am training a custom spaCy model using Prodigy annotations. This new model has five labels. However, when the results summary is displayed after n iterations of training, it is always missing one label, i.e. it only shows results for four labels. I also made sure that my data is randomized so that one set of labels isn't clustered together (in case the splitting is done sequentially).
Any idea why this might happen? Also, is there any way of ensuring that every label is represented in both the training and the evaluation data?
Hi! If a label doesn't appear in the summary, this indicates that it didn't end up in the evaluation data.
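One way to confirm this is to count the labels in the evaluation examples directly. The snippet below is only a minimal sketch: it assumes the annotations are available as a list of dicts in the usual Prodigy NER task shape (a "spans" list whose entries have a "label" key). The helper names `count_labels` and `missing_labels` and the toy data are made up for illustration.

```python
from collections import Counter


def count_labels(examples):
    """Count how often each span label occurs in a list of task dicts."""
    counts = Counter()
    for example in examples:
        for span in example.get("spans", []):
            counts[span["label"]] += 1
    return counts


def missing_labels(examples, expected_labels):
    """Return the expected labels that never occur in the examples."""
    seen = count_labels(examples)
    return sorted(set(expected_labels) - set(seen))


# Toy evaluation set (hypothetical data, not from a real dataset)
eval_examples = [
    {
        "text": "Apple hired Tim",
        "spans": [
            {"start": 0, "end": 5, "label": "ORG"},
            {"start": 12, "end": 15, "label": "PERSON"},
        ],
    },
    {"text": "Nothing to label here", "spans": []},
]

print(count_labels(eval_examples))
print(missing_labels(eval_examples, ["ORG", "PERSON", "GPE"]))  # ['GPE']
```

Any label reported as missing here cannot show up in the per-label results, because there is nothing to score it against.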
Are you using a dedicated evaluation set or are you letting Prodigy shuffle the data and hold back a percentage? If you're letting Prodigy hold back some data for evaluation, the data will be shuffled before splitting. Depending on the number of examples you have, you could try setting a higher --eval-split value to hold back more examples. Alternatively, you can also create a separate evaluation dataset and provide that via the --eval-id option. Once you're getting more serious about evaluating the model and comparing the results, this is definitely the solution I'd recommend, as it also makes it much easier to compare your results across runs, and as you annotate more data.
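If you build the separate evaluation set yourself, a split along these lines can help keep every label on both sides. This is a rough, best-effort sketch and not how Prodigy splits data internally: `split_with_label_coverage`, `move_one` and `labels_of` are hypothetical helpers, and the examples are assumed to be Prodigy-style task dicts with a "spans" list. A label with only a single example can still end up on one side only, and with multi-label examples, moving one example can affect other labels.

```python
import random


def labels_of(example):
    """Return the set of span labels in one task dict."""
    return {span["label"] for span in example.get("spans", [])}


def move_one(label, source, target):
    """Move one example containing `label` from source to target, but only
    if source holds at least two such examples, so it keeps one of them."""
    hits = [i for i, eg in enumerate(source) if label in labels_of(eg)]
    if len(hits) >= 2:
        target.append(source.pop(hits[0]))


def split_with_label_coverage(examples, eval_split=0.2, seed=0):
    """Shuffle, hold back a fraction for evaluation, then patch up labels
    that ended up on only one side of the split."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_eval = max(1, int(len(examples) * eval_split))
    eval_set, train_set = examples[:n_eval], examples[n_eval:]
    all_labels = set().union(*(labels_of(eg) for eg in examples))
    for label in sorted(all_labels):
        if not any(label in labels_of(eg) for eg in eval_set):
            move_one(label, train_set, eval_set)
        elif not any(label in labels_of(eg) for eg in train_set):
            move_one(label, eval_set, train_set)
    return train_set, eval_set


# Toy data: five hypothetical labels with four single-label examples each
labels = ["A", "B", "C", "D", "E"]
data = [
    {"text": f"{label} {i}", "spans": [{"start": 0, "end": 1, "label": label}]}
    for label in labels
    for i in range(4)
]
train_set, eval_set = split_with_label_coverage(data, eval_split=0.2)
print(len(train_set), len(eval_set))
```

The two resulting lists could then be stored as separate datasets, with the evaluation one passed in via --eval-id.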
Hey Ines, I'm training NER with 11 labels, but only 3 are showing. I chose a dataset for evaluation like you said above, which contains all the labels, so all of them should appear after training, but only two appear. How do I fix this?
(The data was manually labelled using Prodigy and en_core_web_sm. I'm training en vectors lg with pretrained weights. I'm also getting the misalignment warning, if that helps.)
Perhaps the misalignments are causing instances to be dropped, leaving you with only the two labels in the remaining set of (aligned) instances? If you fix the misalignments, do you still only get the two labels?
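To test that theory, you could count which labels survive once misaligned spans are set aside. The sketch below rests on an assumption: that each example carries a "tokens" list with "start" and "end" character offsets reflecting the tokenization of the pipeline you are training, and that a span counts as aligned only if it starts and ends on token boundaries. The helper names and the toy example are hypothetical. In spaCy itself, `doc.char_span(start, end)` returns `None` for offsets that don't match token boundaries, which is a similar check.

```python
from collections import Counter


def is_aligned(span, tokens):
    """True if the span starts and ends exactly on token boundaries."""
    starts = {token["start"] for token in tokens}
    ends = {token["end"] for token in tokens}
    return span["start"] in starts and span["end"] in ends


def count_by_alignment(examples):
    """Return per-label counts of aligned and misaligned spans."""
    kept, dropped = Counter(), Counter()
    for example in examples:
        for span in example.get("spans", []):
            target = kept if is_aligned(span, example["tokens"]) else dropped
            target[span["label"]] += 1
    return kept, dropped


# Toy example: "50" was annotated, but the tokenizer kept "50USD" as one token
examples = [
    {
        "text": "Pay 50USD to Bob",
        "tokens": [
            {"text": "Pay", "start": 0, "end": 3},
            {"text": "50USD", "start": 4, "end": 9},
            {"text": "to", "start": 10, "end": 12},
            {"text": "Bob", "start": 13, "end": 16},
        ],
        "spans": [
            {"start": 4, "end": 6, "label": "MONEY"},
            {"start": 13, "end": 16, "label": "PERSON"},
        ],
    }
]

kept, dropped = count_by_alignment(examples)
print("aligned:", dict(kept))        # aligned: {'PERSON': 1}
print("misaligned:", dict(dropped))  # misaligned: {'MONEY': 1}
```

If the labels missing from your results are exactly the ones that only show up in the misaligned counts, that would support the explanation above.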
I do believe you are correct. Now it makes sense why they don't show: when I print the number of misalignments, I get 4131 misalignments in one database.
Here is a screenshot of the misalignments. If you have any idea how I can correct them, that would be great. Thanks a lot!