NER for long string

Hi @jiebei!

Yes, you can move to ner.teach (i.e., active learning) if you want to improve the model.

However, if you're happy with your model's overall performance, you could try other extensions like spacy-streamlit. It lets you build a Streamlit app to demo your model, which can be extremely helpful for showing your model to non-data scientists. You can run it locally or deploy it to a cloud environment.

You'll want to use the model that you previously trained on your manual (and/or correct) annotations. When you run prodigy train, you specify an output_path for your model. Inside that directory, it saves a model-last (the last version of your model) and a model-best (the best-performing version of your model). You can then use output_path/model-last (e.g., if you want your last run) as the model you pass to the recipe.
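As a rough sketch of picking that directory, here's a small Python helper that assumes the model-last / model-best layout inside your output_path. The function name resolve_model_dir and its fallback behaviour are my own illustration, not part of Prodigy's API.

```python
from pathlib import Path


def resolve_model_dir(output_path: str, prefer: str = "last") -> Path:
    """Return output_path/model-last or output_path/model-best.

    Hypothetical helper: `prefer` is "last" or "best". If the preferred
    directory doesn't exist, it falls back to the other one.
    """
    if prefer not in ("last", "best"):
        raise ValueError("prefer must be 'last' or 'best'")
    other = "best" if prefer == "last" else "last"
    # Try the preferred directory first, then the fallback.
    for name in (f"model-{prefer}", f"model-{other}"):
        candidate = Path(output_path) / name
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(f"No model-last or model-best found in {output_path}")
```

Whatever path it returns is what you'd pass to spacy.load() or as the model argument of ner.teach / ner.correct.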

That example covers a case where you want to use spaCy's built-in ner component, which already has several entity types like PERSON or ORG. Since you have custom entities, you need a model that has been trained on those entities.
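Before running ner.teach, it can be worth confirming that the model you point it at actually knows your custom labels. A minimal sketch, assuming a spaCy v3 model directory whose meta.json records component labels under "labels" and then "ner" (if you already have spaCy loaded, nlp.get_pipe("ner").labels gives you the same information). PRODUCT_CODE in the usage below is a hypothetical custom label.

```python
import json
from pathlib import Path
from typing import Iterable, List


def missing_ner_labels(model_dir: str, expected_labels: Iterable[str]) -> List[str]:
    """Return the expected labels that the model's NER component doesn't know.

    Assumes meta.json in `model_dir` has the shape
    {"labels": {"ner": ["LABEL_A", "LABEL_B", ...]}, ...}.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    known = set(meta.get("labels", {}).get("ner", []))
    # Anything left over was never trained into this model's NER.
    return sorted(set(expected_labels) - known)
```

For example, missing_ner_labels("output_path/model-last", ["PRODUCT_CODE"]) should return an empty list if the model was trained on that label, and ["PRODUCT_CODE"] if you're accidentally pointing at a pretrained pipeline instead.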

You'd likely want to use new text data. If your model has done a good job, it has probably already learned from the examples you annotated with the manual recipes, so showing it those same texts again won't teach it much. What you want from ner.teach is to find your model's blind spots. The ner.teach recipe performs active learning: it changes the order in which examples are shown to you, so that you label the ones the model is most uncertain about.
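To make the reordering idea concrete, here's a toy Python sketch of uncertainty sampling: each candidate carries a hypothetical model score in [0, 1], and the ones closest to 0.5 are shown first. This is a simplification, not Prodigy's implementation. Prodigy's sorters (e.g. prefer_uncertain) work incrementally on a stream rather than sorting a whole list, and in ner.teach the model is also updated as you accept or reject suggestions.

```python
from typing import List, Tuple

Example = Tuple[str, float]  # (text, hypothetical model score in [0, 1])


def uncertainty(score: float) -> float:
    """1.0 when the model is maximally unsure (score 0.5), 0.0 when fully confident."""
    return 1.0 - abs(score - 0.5) * 2


def order_by_uncertainty(examples: List[Example]) -> List[Example]:
    """Sort examples so the ones the model is least sure about come first."""
    return sorted(examples, key=lambda ex: uncertainty(ex[1]), reverse=True)
```

With scores of 0.95, 0.52, 0.10 and 0.40, the 0.52 example would be shown first and the 0.95 one last, since the model is already fairly confident about it.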

It's important to note that while active learning (ner.teach) makes sense in theory, it doesn't always work in practice. As an alternative, you could keep using the ner.correct recipe. Like ner.teach, it runs your model on new examples, but it only shows you its predictions on the new text to correct; it doesn't reorder the examples.
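For contrast, here's a conceptual sketch of a ner.correct-style stream: every text gets the model's predicted spans attached and is passed through in its original order, with nothing skipped or reordered. The predict callable is a hypothetical stand-in for your trained model, and the task dicts loosely follow Prodigy's {"text": ..., "spans": [...]} JSON format. This illustrates the idea; it isn't the recipe's actual code.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def correct_stream(
    texts: Iterable[str], predict: Callable[[str], List[Span]]
) -> Iterator[Dict]:
    """Yield one task per text, in the original order, with predicted spans attached."""
    for text in texts:
        spans = [{"start": s, "end": e, "label": label} for s, e, label in predict(text)]
        # No scoring or sorting here: the annotator sees texts exactly as they arrive.
        yield {"text": text, "spans": spans}
```

You would then fix or confirm the pre-filled spans, rather than accept or reject one suggestion at a time as in ner.teach.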

Here's a great discussion by Matt about active learning:

Hope this helps!