Hi! I already replied to your other post and it sounds like one problem might be this:
If you're rejecting suggestions that are actually correct, the model will be updated with the information that "this span is not an entity", and it can get confused and try to come up with interpretations that match this new information. So it's definitely possible that after a few batches, the model ends up in a weird state where it starts suggesting completely arbitrary spans.
Also, moving the discussion from the previous thread over:
Thank you, Ines! When I use my own model trained with "prodigy train", using ner.teach only shows scores of 1.00 from the very beginning.
How are you evaluating your model? Do you have a dedicated evaluation set, or are you just evaluating against a held-back percentage of the binary annotations? If you're evaluating against binary annotations and you only have a small set of them, you can easily end up with fairly unreliable evaluation scores: you're only evaluating against the sparse binary information, so you won't know whether any of the other predictions are correct or not. And if some of the evaluation examples contain no entities and the model gets those right very reliably, you may end up with an accuracy of 100% that isn't actually very representative of the overall performance. So it's usually much better to evaluate against a stable set of gold-standard annotations.
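If you want to sanity-check the numbers yourself, here's a minimal sketch of evaluating a trained pipeline against a small gold-standard set with spaCy directly (assuming spaCy v3; the model path, texts and entity offsets are just placeholders):

```python
import spacy
from spacy.training import Example

# Load the pipeline produced by `prodigy train` (placeholder path)
nlp = spacy.load("./output/model-best")

# Gold-standard examples: full text plus *all* correct entity spans
gold_data = [
    ("Apple is opening a store in San Francisco",
     {"entities": [(0, 5, "ORG"), (28, 41, "GPE")]}),
    ("There are no entities in this sentence", {"entities": []}),
]

# The reference side holds the gold annotations; nlp.evaluate() runs
# the pipeline over the texts and compares the predictions against them
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in gold_data
]

scores = nlp.evaluate(examples)
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

Within Prodigy itself, the easiest approach is usually to keep a separate dataset of gold-standard annotations (e.g. created with ner.manual or ner.correct) and use that as your dedicated evaluation data when you run prodigy train, instead of holding back a percentage of the binary annotations.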