I’m guessing this is a German model? If so, and you’re starting from the German spaCy model, it’s not too surprising that it makes some odd errors, because it was trained on fairly unrepresentative data from Wikipedia. You might try running ner.batch-train from a blank or vectors-only model instead, to avoid starting off with weights that are poorly suited to your task.
Regardless of the initialization, to answer your question: there are a few ways to deal with this, and all of them are valid. The simplest is to hard-code a rule. You can see an example here: https://github.com/explosion/spaCy/blob/master/examples/pipeline/fix_space_entities.py . The code in that example implements a rule that prevents whitespace tokens from being tagged as entities. To adapt it to your situation, all you have to do is change the conditional on line 19 to something like:
if token.text == "ein"
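To make the idea concrete, here is a rough sketch of such a rule written as a standalone function. It is not the code from the linked example: the names BLOCKED_TEXTS and remove_blocked_entities are made up for illustration, and the spaCy call in the comment assumes the v2-style API where a plain function can be added to the pipeline. The stand-in objects at the bottom are only there so the snippet runs without a model installed.

```python
from types import SimpleNamespace

# Hypothetical blocklist: entity texts that should never be kept.
BLOCKED_TEXTS = {"ein"}


def remove_blocked_entities(doc):
    """Drop every entity whose full text is in BLOCKED_TEXTS."""
    doc.ents = [ent for ent in doc.ents if ent.text not in BLOCKED_TEXTS]
    return doc


# Assuming spaCy v2, the function could run right after the entity recognizer:
#   nlp.add_pipe(remove_blocked_entities, after="ner")

# Stand-in for a Doc (only .ents and .text are used), so this runs anywhere.
doc = SimpleNamespace(
    ents=[SimpleNamespace(text="ein"), SimpleNamespace(text="Berlin")]
)
doc = remove_blocked_entities(doc)
print([ent.text for ent in doc.ents])
```

Filtering whole entity spans rather than single tokens keeps the rule short. If "ein" can also show up inside a longer entity that you want to keep, this version leaves that entity alone.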
If you’d rather have the behaviour integrated into the weights, you can indeed just add negative examples. I think the problem will probably resolve itself as you add more data, so I wouldn’t personally worry about it for now. The rule-based approach will probably be quicker, and it will let you implement other ad-hoc fixes too. In the meantime, a better prediction model will let you annotate faster, as you’ll be able to run ner.make-gold with the model’s predictions as a starting point.
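On the negative examples mentioned above: a rejected suggestion can be written as a JSON record with the text, the offending span, and a "reject" answer. This is a sketch only. The sentence and the ORG label are placeholders, and it assumes the usual Prodigy task layout of "text", "spans" and "answer", so check it against a record exported from your own dataset before importing anything.

```python
import json

text = "Das ist ein Test."  # placeholder sentence
start = text.index("ein")   # character offset of the unwanted entity

negative_example = {
    "text": text,
    "spans": [
        # "ORG" is a placeholder; use whichever label the model wrongly predicts.
        {"start": start, "end": start + len("ein"), "label": "ORG"}
    ],
    "answer": "reject",  # marks the span as a wrong prediction
}

# One JSON object per line (JSONL) is convenient for bulk import.
print(json.dumps(negative_example, ensure_ascii=False))
```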