Hi,
I have run ner.batch-train to create a base model and am now trying to improve it with ner.teach, keeping the model in the loop with a dataset. I used this command:
Here, sample_text is my dataset and corrected_adjusted is my batch-trained model.
It opens in the interface, but I only get a ‘Loading…’ message and nothing ever loads. What is going wrong? (Screenshot attached.)
It looks like you forgot to pass in the third argument (source), the path to the input data. If no source is specified, Prodigy defaults to reading from stdin (so you can pipe data forward from other processes). So in your case, Prodigy is waiting to receive piped input, but nothing is coming, so it keeps displaying the “Loading…” message.
(Btw, for a future version of Prodigy, we’re thinking about making this behaviour more explicit, like other command-line applications do: for example, explicitly setting the source to - to use standard input.)
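To make the stdin fallback concrete, here is a minimal Python sketch of the general pattern. It is not Prodigy's actual loader: the function name read_lines and its stdin parameter are made up for illustration, and treating "-" as standard input follows the possible future convention mentioned above rather than any current behaviour.

```python
import sys


def read_lines(source=None, stdin=None):
    """Yield non-empty lines from a file path, or from stdin if no path is given.

    Hypothetical helper for illustration only, not part of Prodigy.
    """
    if source is None or source == "-":
        # No source given: fall back to standard input. If nothing is piped
        # in, this loop just blocks and waits, which is why an interface
        # built on top of it would sit on a loading message.
        stream = sys.stdin if stdin is None else stdin
        for line in stream:
            if line.strip():
                yield line.strip()
    else:
        # A real path was given: read the examples from the file instead.
        with open(source, encoding="utf8") as f:
            for line in f:
                if line.strip():
                    yield line.strip()
```

Calling read_lines() with no arguments in a terminal, with nothing piped in, waits for input in the same way.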
Thank you.
Does the sequence of input arguments matter?
Also, I have the trained model “corrected_adjusted” in the loop. After finishing a round of annotation with ner.teach, can I see where the model has been updated? After annotating and saving, I checked the model directory and none of the files had changed. After a good chunk of ner.teach annotation I can run batch-train again with the added dataset, but what happens to the model in the loop? I read in the prodigy_readme that the model in the loop gets updated, but I do not see any change in the file modification times. It's a bit unclear to me.
For the positional arguments, yes. It's really like any other command line tool (or function). For the option arguments like --label, the order doesn't matter because they're prefixed with the name of the argument.
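The difference between positional and option arguments can be sketched with Python's argparse. This is only an illustration of the general rule, not Prodigy's own argument parser: the program name teach-sketch, the file name data.jsonl and the label PERSON are placeholders.

```python
import argparse


def build_parser():
    # Illustrative parser only: three positional slots plus one named option.
    parser = argparse.ArgumentParser(prog="teach-sketch")
    parser.add_argument("dataset")                           # 1st positional
    parser.add_argument("model")                             # 2nd positional
    parser.add_argument("source", nargs="?", default=None)   # 3rd, optional
    parser.add_argument("--label", default=None)             # named option
    return parser


parser = build_parser()

# The option can go before or after the positionals: same result.
a = parser.parse_args(["sample_text", "corrected_adjusted", "data.jsonl", "--label", "PERSON"])
b = parser.parse_args(["--label", "PERSON", "sample_text", "corrected_adjusted", "data.jsonl"])

# Swapping two positionals silently changes their meaning.
swapped = parser.parse_args(["corrected_adjusted", "sample_text"])
```

In the swapped call, the model name ends up in the dataset slot and source stays empty, because positional values are assigned purely by their position.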
That's correct – Prodigy won't just silently overwrite your files. The model is kept in memory and updated there, and then discarded afterwards. The main point of updating a model in the loop is to make it suggest better examples and collect better data.
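As a rough picture of why the file timestamps don't change, here is a toy Python sketch. ToyModel and run_session are invented for illustration and have nothing to do with how Prodigy or spaCy store a model: the point is only that a model loaded from disk and updated in memory leaves the files alone unless something explicitly saves it.

```python
import json


class ToyModel:
    """Stand-in for a model: just a dict of per-label scores (hypothetical)."""

    def __init__(self, weights):
        self.weights = dict(weights)

    @classmethod
    def from_disk(cls, path):
        with open(path, encoding="utf8") as f:
            return cls(json.load(f))

    def update(self, label, accepted):
        # Nudge the in-memory score up on accept, down on reject.
        delta = 1.0 if accepted else -1.0
        self.weights[label] = self.weights.get(label, 0.0) + delta


def run_session(path, answers):
    """Load the model, apply updates in memory, and never write back."""
    model = ToyModel.from_disk(path)
    for label, accepted in answers:
        model.update(label, accepted)
    # No save step here: once the caller drops `model`, the updates are gone.
    return model
```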
If you actually want to use the model, you should always batch train from the collected annotations using ner.batch-train. This gives you the same model, but better, because you get to train with several iterations and shuffled data, instead of just performing simple updates.
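The structural difference between the two kinds of training can be sketched like this. It is a schematic in plain Python, not the implementation of ner.teach or ner.batch-train: the update callback, the n_iter default and the seed parameter are all assumptions made for the example.

```python
import random


def online_updates(examples, update):
    """One pass, in arrival order: roughly what happens while annotating."""
    for eg in examples:
        update(eg)


def batch_train(examples, update, n_iter=10, seed=0):
    """Several passes over all collected examples, reshuffled each time."""
    rng = random.Random(seed)
    examples = list(examples)  # copy so the caller's list is not reordered
    for _ in range(n_iter):
        rng.shuffle(examples)
        for eg in examples:
            update(eg)
```

With the same annotations, the batch version sees every example n_iter times and in a different order on each pass, whereas the online version sees each example exactly once.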
Thank you @ines. The explanation is helpful. Since it says ‘improving the model’, I thought it wrote to the model in the loop. Great, I will run batch-train after a good chunk of annotations.