I made a holdout set with ner.manual to evaluate the model. When I run ner.batch-train with --eval-id pointing to this set, it reaches a maximum of 20% accuracy. However, if I run ner.print-stream on the holdout set with my trained model, the results look really strong (something like 80-90%).
Any ideas why this might be happening?