Stream textcat.eval from stdin?

After a long batch-training session I was all ready to evaluate my newly-trained model on some new data with this:

bin/dump_recent_content.py  | prodigy textcat.eval news_classification ./models/news_classification1 -l SPORTS,POLITICS,BUSINESS,WEATHER,ARTENT

but when I did, I got this error:

Using 5 labels: SPORTS, POLITICS, BUSINESS, WEATHER, ARTENT
usage: prodigy textcat.eval [-h] [-l LABEL] [-a None] [-lo None] [-e None]
                            dataset spacy_model source
prodigy textcat.eval: error: the following arguments are required: source
Traceback (most recent call last):
  File "bin/dump_recent_content.py", line 92, in <module>
    main(parse_opts())
  File "bin/dump_recent_content.py", line 64, in main
    'title': source['title']
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe

The dump_recent_content.py script is something I built to grab content from an Elasticsearch instance and convert it to the JSONL format Prodigy expects. The same piped command works fine with textcat.teach. So I went digging into the source, and it looks like source has a default value of None in the teach() recipe, but there is no default for source in the evaluate() recipe.
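For context, the script just writes one JSON object per line to stdout, roughly like this (simplified, the real script pulls from Elasticsearch and emits more fields):

import json
import sys

def emit_tasks(articles):
    # One task per line: Prodigy's textcat recipes expect a "text" field,
    # and anything under "meta" is displayed alongside the task in the UI
    for article in articles:
        task = {
            "text": article["body"],
            "meta": {"title": article["title"]},
        }
        sys.stdout.write(json.dumps(task) + "\n")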

So for kicks, I added a default value for source (as well as label) and the web server finally launched. I was able to quickly classify several new articles until I ran out of content (only 50 articles). So I got the “No tasks available” screen. I returned to the terminal and issued a ^C to stop the server, which gave me this message:

Saved 250 annotations to database SQLite
Dataset: news_classification
Session ID: 2019-04-19_09-23-06

And now it’s just hanging there chewing on CPU. I’ve left it running for over five minutes and it still hasn’t finished. It must be up to something!

Here are the stats for this dataset:

prodigy stats news_classification

  ✨  Prodigy stats

  Version            1.7.1              
  Location           /Users/avollmer/Development/spacy-ner/.venv/lib/python3.7/site-packages/prodigy 
  Prodigy Home       /Users/avollmer/.prodigy 
  Platform           Darwin-17.7.0-x86_64-i386-64bit 
  Python Version     3.7.2              
  Database Name      SQLite             
  Database Id        sqlite             
  Total Datasets     12                 
  Total Sessions     86                  


  ✨  Dataset 'news_classification'

  Dataset            news_classification 
  Created            2019-04-17 15:14:17 
  Description        Coarse-grained classification of news into broad sections 
  Author             Alex Vollmer       
  Annotations        9390               
  Accept             6736               
  Reject             2582               
  Ignore             72

This seems like a bug, but I’m always very hesitant to jump to that conclusion straight-away. Am I doing this right?

Thanks!

Follow-up: wouldn’t you know it? Thirty seconds after posting, the process finally exited and printed some stats at the end.

One follow-up question I have is about how textcat.eval labels articles. It appears that it asks the user to accept or reject each label for each piece of content. So for 50 articles and 5 labels, I have a total of 250 decisions to make. Is that correct?

Thanks for the detailed report and analysis! The source argument not defaulting to None is definitely wrong – it should default to None so you can stream in from stdin. I've already fixed it, so we can ship the fix with the next release.
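That's essentially the same fallback you'd write in a custom recipe: if the source argument is None, read newline-delimited JSON from sys.stdin instead of a file. A rough sketch of that pattern (the load_tasks helper here is just illustrative, not the built-in loader):

import json
import sys

def load_tasks(source=None):
    # If no source path is given, assume tasks are piped in on stdin
    lines = sys.stdin if source is None else open(source, encoding="utf8")
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)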

Hmm, I wonder if this could be related to your dataset being very large? When you exit the server after running textcat.eval, it will grab the contents of the entire dataset and evaluate the model on that. Now that I think about it, I can see how this is kinda unintuitive – the recipe should probably have a setting that lets you toggle between only evaluating on the current session annotations vs. the whole evaluation set.

Also, is news_classification your main dataset that you also trained on? If so, adding the evaluation annotations to that set might not have been what you wanted. Instead, it's probably better to create a separate set like news_classification_eval to store your evaluation examples.

If you only want to evaluate on the 250 annotations, you could export the session dataset to a file and then re-import it into a new dataset. Each session also gets its own dedicated dataset named after the session ID, so you can do something like this and then re-import the data into a new dataset to start fresh:

prodigy db-out 2019-04-19_09-23-06 > eval_session.jsonl
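
And then load the exported file into a fresh dataset (the name news_classification_eval is just an example):

prodigy dataset news_classification_eval "Evaluation set for news classification"
prodigy db-in news_classification_eval eval_session.jsonl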

Yes, exactly. For each label, Prodigy will copy the task and create a new example. You can also see this if you check out the recipe source in prodigy/recipes/textcat.py.
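
If it helps to see the idea in code, the logic boils down to this kind of pattern (a simplified sketch, not the literal recipe source):

def add_labels_to_stream(stream, labels):
    # Create one copy of each incoming task per label, so every
    # (text, label) pair becomes its own accept/reject decision
    for task in stream:
        for label in labels:
            example = dict(task)
            example["label"] = label
            yield example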