Prodigy with Jupyter Notebook

Hi, I'm new to Prodigy, so my questions may be trivial, but I can't move forward without answers :pensive:

I use Jupyter Notebook to work with Prodigy, but running simple recipes such as textcat.teach or ner.manual can take 20-40 minutes. Previously, when I worked with machine learning models on millions of lines, processing took 4-5 minutes at most.

Do you have any tips on how to speed this up, or on which settings to change? Or on which environment to use to work effectively with Prodigy on Windows?

Thank you!

Hi Vadym,

Could you share some more details about your situation? Are you dealing with a very large dataset or a dataset with very large documents? When you say "processing can take 20-40" are you referring to starting the Prodigy server or something else? Was it fast before and did something change that made it slow? Is there a reason why you're using Prodigy from Jupyter? Does it not run outside of Jupyter?

Could you also share your Prodigy/Python versions?

Hi KOANING,
Thank you for your quick response!

The dataset consists of text articles, about 10 thousand of them. But the slow processing occurs even at the stage of creating text patterns.

"When you say "processing can take 20-40" are you referring to starting the Prodigy server or something else?" - Starting the Prodigy server.

"Was it fast before and did something change that made it slow?" - In Prodigy, processing has taken a very long time ever since the first launch of a recipe, and sometimes it does not happen at all.

"Was it fast before and did something change that made it slow?" - Earlier, before working with Prodigy, I worked with various machine learning models and processed large amounts of data, and Jupyter handled it quickly.

"Is there a reason why you're using Prodigy from Jupyter? Does it not run outside of Jupyter?" - I work in Jupyter because I always use it for data science models and it is quite convenient for me. If I run the same commands via Anaconda Prompt, the processing takes seconds, but Anaconda Prompt is very inconvenient to use, and I keep running into errors that are hard to find solutions for. So I'm wondering whether there are environments that are more intuitive and convenient, or whether you have detailed guides for working with Anaconda Prompt or Jupyter that would speed up work with Prodigy.

Prodigy version - 1.11.7
Platform Windows-10-10.0.19044-SP0
Python Version 3.9.7

Could you try running Prodigy with a very small example? Maybe just an example.jsonl file with:

{"text": "this is really just one example"}
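If creating the file by hand is awkward from a notebook, a minimal sketch like the following should write it from Python. It assumes the file should land in the current working directory under the name example.jsonl; the variable names are only illustrative.

```python
import json
from pathlib import Path

# The single task from the post above; a .jsonl file holds one JSON object per line.
examples = [{"text": "this is really just one example"}]

# Assumption: the file is written to the current working directory.
path = Path("example.jsonl")
with path.open("w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Read the file back to confirm that every line parses as JSON.
loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
print(loaded)
```

Reading the file back is just a sanity check that it contains exactly one valid JSON object per line before it is handed to a recipe.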

When you run this with ner.manual like so:

!python -m prodigy ner.manual demodataset example.jsonl --label A,B

Is it still slow? If not, could you share the commands that you use to start Prodigy?

Hi, sorry for the late reply. I tried it, but even a single-line file took more than 20 minutes to process, so I just turned it off. At this stage, I work in Prodigy entirely through Anaconda Prompt, but it is not very convenient, because if you make different versions of the code or many recipes, they need to be stored in a separate document.


That's indeed very strange. Let's try to debug this. Could you try to add some logging? Maybe if you start with basic logging we'll be able to spot an irregularity.

Maybe I specified something wrong, but I got this error.

I'm afraid you've switched the order around there. PRODIGY_LOGGING=basic needs to come at the start of the command, per the docs here.
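If getting the order right in the shell is fiddly, one shell-independent option is to pass the variable through the process environment from Python. This is only a sketch: the helper names build_prodigy_launch and launch are made up for illustration, and the recipe arguments are the ones from the small ner.manual test earlier in this thread.

```python
import os
import subprocess
import sys


def build_prodigy_launch(recipe_args, logging_level="basic"):
    """Return (command, env) for starting Prodigy with PRODIGY_LOGGING set."""
    # Copy the current environment and add the logging variable to the copy.
    env = dict(os.environ)
    env["PRODIGY_LOGGING"] = logging_level
    # Same invocation as "python -m prodigy ...", using the current interpreter.
    command = [sys.executable, "-m", "prodigy", *recipe_args]
    return command, env


def launch(recipe_args, logging_level="basic"):
    """Start Prodigy; this call blocks until the server is stopped."""
    command, env = build_prodigy_launch(recipe_args, logging_level)
    return subprocess.run(command, env=env)


# Build (but do not start) the command for the small ner.manual test.
command, env = build_prodigy_launch(
    ["ner.manual", "demodataset", "example.jsonl", "--label", "A,B"]
)
print(" ".join(command[1:]))
print("PRODIGY_LOGGING =", env["PRODIGY_LOGGING"])
```

Calling launch(["ner.manual", "demodataset", "example.jsonl", "--label", "A,B"]) would then start the server with basic logging enabled, so the log output shows up in the same place as the rest of Prodigy's output.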