After a long batch-training session, I was ready to evaluate my newly trained model on some new data with this:
bin/dump_recent_content.py | prodigy textcat.eval news_classification ./models/news_classification1 -l SPORTS,POLITICS,BUSINESS,WEATHER,ARTENT
but when I did, I got this error:
Using 5 labels: SPORTS, POLITICS, BUSINESS, WEATHER, ARTENT
usage: prodigy textcat.eval [-h] [-l LABEL] [-a None] [-lo None] [-e None]
dataset spacy_model source
prodigy textcat.eval: error: the following arguments are required: source
Traceback (most recent call last):
File "bin/dump_recent_content.py", line 92, in <module>
main(parse_opts())
File "bin/dump_recent_content.py", line 64, in main
'title': source['title']
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe
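As an aside, the BrokenPipeError in that traceback is just fallout from prodigy exiting before it read anything: my script was still writing JSON lines to stdout when the read end of the pipe closed. A minimal sketch (hypothetical, not my actual script) of writing tasks to stdout while exiting quietly if the consumer disappears:

```python
import json
import os
import sys

def emit_tasks(docs):
    """Write one JSON task per line for a downstream consumer on stdin."""
    try:
        for doc in docs:
            print(json.dumps(doc))
        sys.stdout.flush()
    except BrokenPipeError:
        # The reader (here, prodigy) exited early, so nobody is listening.
        # Point stdout at devnull so the interpreter's shutdown flush
        # doesn't raise a second BrokenPipeError ("Exception ignored in...").
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, sys.stdout.fileno())
        sys.exit(1)

if __name__ == "__main__":
    # Illustrative task shape only; field names are assumptions.
    emit_tasks([{"text": "Storm front moving in tonight", "label": "WEATHER"}])
```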
The dump_recent_content.py script is something I built to grab content from an Elasticsearch instance and convert it to the proper JSON format for Prodigy. I used the same command for textcat.teach. So I went digging into the source, and it looks like source has a default value of None in the teach() recipe, but there is no default for source in the evaluate() method.
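To illustrate why that difference trips up piped input, here is a toy argparse sketch (just an analogy, not Prodigy's actual recipe plumbing): a positional parameter with no default is mandatory, while one with a default can be omitted, which is presumably how teach() falls back to reading stdin.

```python
import argparse

def build_parser(stdin_fallback):
    """Toy model of the two recipes: with a default, `source` is
    optional; without one, argparse makes it mandatory."""
    parser = argparse.ArgumentParser(prog="demo")
    if stdin_fallback:
        # Optional positional: omitting it yields the default (None),
        # which the recipe could treat as "read from stdin".
        parser.add_argument("source", nargs="?", default=None)
    else:
        # Required positional: omitting it aborts with
        # "the following arguments are required: source".
        parser.add_argument("source")
    return parser

print(build_parser(stdin_fallback=True).parse_args([]).source)  # -> None
```

Calling build_parser(stdin_fallback=False).parse_args([]) dies with the same "arguments are required: source" message I got above.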
So for kicks, I added a default value for source (as well as label) and the web server finally launched. I was able to quickly classify several new articles until I ran out of content (only 50 articles), so I got the “No tasks available” screen. I returned to the terminal and issued a ^C to stop the server, which gave me this message:
Saved 250 annotations to database SQLite
Dataset: news_classification
Session ID: 2019-04-19_09-23-06
And now it’s just hanging there chewing on CPU. I’ve left it running for over five minutes and it still hasn’t finished. It must be up to something!
Here are the stats for this dataset:
prodigy stats news_classification
✨ Prodigy stats
Version 1.7.1
Location /Users/avollmer/Development/spacy-ner/.venv/lib/python3.7/site-packages/prodigy
Prodigy Home /Users/avollmer/.prodigy
Platform Darwin-17.7.0-x86_64-i386-64bit
Python Version 3.7.2
Database Name SQLite
Database Id sqlite
Total Datasets 12
Total Sessions 86
✨ Dataset 'news_classification'
Dataset news_classification
Created 2019-04-17 15:14:17
Description Coarse-grained classification of news into broad sections
Author Alex Vollmer
Annotations 9390
Accept 6736
Reject 2582
Ignore 72
This seems like a bug, but I’m always very hesitant to jump to that conclusion straightaway. Am I doing this right?
Thanks!