I’m having issues with Prodigy. I’m annotating data using the ner.teach recipe, and everything goes well until the progress percentage reaches about 98%. At that point, CPU usage rises and eventually the process exits without giving any error message about what happened. I already configured logging, but nothing appears in the output.
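For reference, this is roughly how I have logging set up, via the PRODIGY_LOGGING environment variable (a sketch; the dataset and file names are placeholders, not my real ones):

```python
# Sketch: enabling Prodigy's logging through the PRODIGY_LOGGING
# environment variable. From the shell it would look like:
#
#   PRODIGY_LOGGING=verbose prodigy ner.teach tweets_ner es_core_news_md tweets_es.jsonl
#
# or it can be set from Python before Prodigy starts:
import os

os.environ["PRODIGY_LOGGING"] = "verbose"  # "basic" for less detail
```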
These are all the details about the running configuration and environment:
Running on a dual-core machine on AWS with 4 GB of RAM; the CPU load average rises to about 1.8 and RAM usage stays at around 50%.
Python version: 3.6.7
Prodigy version: 1.7.1
Input: a JSONL file with about 165K lines of Spanish text from Twitter (see the quick format check below)
Recipe: ner.teach
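Since a malformed record late in a large file would only surface near the end of a session, here is the kind of quick sanity check I can run over the input (a sketch; tweets_es.jsonl stands in for my real file name):

```python
import json

def check_jsonl(path):
    """Return (line_number, reason) pairs for records Prodigy may choke on."""
    bad = []
    with open(path, encoding="utf8") as f:
        for i, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are harmless, skip them
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append((i, "invalid JSON"))
                continue
            if "text" not in record:
                bad.append((i, "missing 'text' field"))
    return bad

for lineno, reason in check_jsonl("tweets_es.jsonl")[:20]:
    print(f"line {lineno}: {reason}")
```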
I’m also curious about how the percentage shown in the interface is computed, because we never actually reach that 98%: our annotation count is close to 30K, which is far from being 98% of 165K.
Which base model are you using? If you're starting with a larger model, you might actually be running out of memory. Prodigy will make a copy of the base model and keep it in memory, so if that's large, 4 GB might actually be too little.
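To get a rough sense of whether two copies of the model fit comfortably in 4 GB, you could load the base model twice in a plain Python session and watch the resident memory (a sketch using psutil; this only approximates what Prodigy does, it's not Prodigy's internals):

```python
import os

import psutil
import spacy

proc = psutil.Process(os.getpid())

def rss_mb():
    # Resident set size of this Python process, in megabytes.
    return proc.memory_info().rss / 1024 ** 2

print(f"baseline:          {rss_mb():.0f} MB")
nlp = spacy.load("es_core_news_md")
print(f"after first load:  {rss_mb():.0f} MB")
nlp_copy = spacy.load("es_core_news_md")  # rough stand-in for Prodigy's copy
print(f"after second load: {rss_mb():.0f} MB")
```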
In the active learning-powered recipes, the progress is a rough estimate of when the loss will hit 0 – so basically, when there's "nothing left to learn anymore". This can help you decide when to stop, which is not always easy if your objective is to annotate the best possible analyses rather than every single example in the stream.
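As a rough illustration of the idea (this is not Prodigy's actual implementation, just a sketch of "progress as how far the smoothed loss has fallen toward zero"):

```python
def estimate_progress(losses, smoothing=0.9):
    """Map a stream of batch losses to a rough 0..1 'progress' value.

    Illustration only: progress here is how far an exponential moving
    average of the loss has dropped from its peak toward zero, so it
    tracks "nothing left to learn" rather than the share of the stream
    that has been annotated.
    """
    if not losses:
        return 0.0
    ema = peak = losses[0]
    for loss in losses[1:]:
        ema = smoothing * ema + (1 - smoothing) * loss
        peak = max(peak, ema)
    return 1.0 - ema / peak if peak else 0.0

# Rises toward 1.0 as the smoothed loss keeps falling, no matter how
# many of the 165K examples have actually been annotated.
print(estimate_progress([4.0, 3.0, 2.0, 1.0, 0.5, 0.25]))
```

That's also why a count of 30K annotations can sit at 98%: the number reflects the model's learning curve, not your position in the 165K-line file.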
My base model is es_core_news_md. I had memory issues on a smaller instance and already solved those, and 4 GB seems to be enough now: memory usage stays around 50% and I’m not seeing memory errors in the system log. The process is more stable now, but it occasionally crashes without reporting any error. Thanks
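To narrow down the crash, one thing I can do is run a small watchdog alongside Prodigy that logs its memory use over time, so I can see whether it climbs right before the process disappears (a sketch; it assumes psutil is installed and a single prodigy process is running):

```python
import time

import psutil

def find_prodigy():
    # Look for a running process whose command line mentions "prodigy".
    for proc in psutil.process_iter(["pid", "cmdline"]):
        cmdline = proc.info["cmdline"] or []
        if any("prodigy" in part for part in cmdline):
            return proc
    return None

proc = find_prodigy()
while proc is not None and proc.is_running():
    mem = proc.memory_info().rss / 1024 ** 2
    print(f"{time.strftime('%H:%M:%S')} pid={proc.pid} rss={mem:.0f} MB")
    time.sleep(30)
```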