Start with a New Model When Starting a New Session

Hi all,

When I use prodigy textcat.teach dataset spacy_model source to annotate with active learning, which model is used if I exit the current process and restart it with the same command? Is it the same model I started the first session with, without any of the updates from the first annotation task? And what if I want to continue with the updated model?

The model you pass in on the command line is only updated in memory and is never overwritten on disk. So if you restart with the en_core_web_sm model, you'll get the same initial model again, not the one you updated in the loop.

If you want to start the next session with an updated model, it's usually recommended to run the textcat.batch-train command first. It updates the model with your annotations (just like the active learning recipe), but it trains for multiple iterations and uses other training tricks, so you usually end up with a more accurate model.

Here’s an example:

```shell
# first session
prodigy textcat.teach dataset en_core_web_sm source.jsonl --label SOME_LABEL

# train model from annotations
prodigy textcat.batch-train dataset en_core_web_sm /path/to/new-model --label SOME_LABEL

# next session
prodigy textcat.teach dataset /path/to/new-model source.jsonl --label SOME_LABEL
```
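If you'd like each new session to pick up the trained model automatically, a small shell helper can choose which model to pass to textcat.teach. This is only a minimal sketch, not a Prodigy feature: pick_model is a hypothetical function name, and it assumes that /path/to/new-model only exists once textcat.batch-train has written a model there.

```shell
# Hypothetical helper (not part of Prodigy): print the model to start
# textcat.teach with. Uses the trained model directory if it exists,
# otherwise falls back to the base spaCy model name.
pick_model() {
  trained_model="$1"
  base_model="$2"
  if [ -d "$trained_model" ]; then
    printf '%s\n' "$trained_model"
  else
    printf '%s\n' "$base_model"
  fi
}

# Example usage (requires Prodigy to be installed):
#   prodigy textcat.teach dataset "$(pick_model /path/to/new-model en_core_web_sm)" source.jsonl --label SOME_LABEL
```

On the first run the directory doesn't exist yet, so the base en_core_web_sm model is used; after running textcat.batch-train, the same command starts from the updated model.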