Hi All,
When I use the `--exclude` flag in `ner.correct`, I see the same 26 examples over and over. When I shut down the server and restart, I get a new set of 26 examples and can continue annotating, but restarting the annotation process every 26 examples doesn't seem like a workflow in the spirit of Prodigy. Am I doing something wrong?
In the spirit of Ines' "ingredients" NER model, I'm running a command like this (names changed to protect the innocent):
```
(prodigy-1.10.5) [~/work]$ prodigy ner.correct dataset_2 ./tmp_model prepped-data.jsonl --label FRUIT,VEG,MEAT,DAIRY,GRAIN --exclude dataset_1
```
The same command without the `--exclude` argument serves all the examples in `prepped-data.jsonl` (but of course doesn't exclude the examples in `dataset_1`).
Here's my `prodigy.json` config (most are default settings):
```json
{
  "theme": "basic",
  "custom_theme": {},
  "buttons": ["accept", "reject", "ignore", "undo"],
  "batch_size": 10,
  "history_size": 10,
  "port": 8080,
  "host": "0.0.0.0",
  "cors": true,
  "db": "sqlite",
  "db_settings": {},
  "api_keys": {},
  "validate": true,
  "auto_exclude_current": true,
  "instant_submit": false,
  "feed_overlap": false,
  "ui_lang": "en",
  "project_info": ["dataset", "session", "lang", "recipe_name", "view_id", "label"],
  "show_stats": false,
  "hide_meta": false,
  "show_flag": false,
  "instructions": false,
  "swipe": false,
  "split_sents_threshold": false,
  "html_template": false,
  "global_css": null,
  "javascript": null,
  "writing_dir": "ltr",
  "show_whitespace": false,
  "exclude_by": "input"
}
```
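For context, the behavior I'd expect from `exclude_by: "input"` is that examples are skipped when a hash of their input text already appears in the excluded dataset. Here's a rough sketch of that idea (the `md5`-based hashing here is my own simplification for illustration, not Prodigy's actual implementation):

```python
import hashlib


def input_hash(example):
    # Hash only the raw input text, the way exclude_by="input" keys off
    # the input rather than the full task (text + spans + labels).
    return hashlib.md5(example["text"].encode("utf-8")).hexdigest()


def exclude_seen(stream, annotated):
    """Yield examples from stream whose input isn't already annotated."""
    seen = {input_hash(ex) for ex in annotated}
    for ex in stream:
        if input_hash(ex) not in seen:
            yield ex
```

With this kind of filtering I'd expect the full remainder of `prepped-data.jsonl` to be served, not a fixed window of 26.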
I'm running Prodigy 1.10.5 on Linux, with Python 3.6.9.
I'm working around this by piping my data in with bash `tail`, like so:
```
(prodigy-1.10.5) [~/work]$ tail -n +100 prepped-data.jsonl | prodigy ner.correct dataset_2 ./tmp_model - -l FRUIT,VEG,MEAT,DAIRY,GRAIN
```
I increment the `-n` argument for each batch of data to correct, but this still breaks the flow a little. Is there a way to use `--exclude` and see a full stream of data using Prodigy alone?
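In case it helps anyone else, the manual incrementing can at least be scripted. A rough sketch (`demo.jsonl`, `offset.txt`, and `BATCH` are placeholder names of my own, and the batch size is just an assumption based on the 26 examples I see per session):

```shell
# Stand-in data so the sketch is self-contained; in practice this is
# the real prepped-data.jsonl.
printf '%s\n' '{"text":"a"}' '{"text":"b"}' '{"text":"c"}' > demo.jsonl

OFFSET_FILE=offset.txt
BATCH=2
# Read the saved offset, defaulting to line 1 on the first run.
OFFSET=$(cat "$OFFSET_FILE" 2>/dev/null || echo 1)
# In practice, pipe this into: prodigy ner.correct dataset_2 ./tmp_model - ...
tail -n "+$OFFSET" demo.jsonl
# Save the next start line for the following session.
echo $((OFFSET + BATCH)) > "$OFFSET_FILE"
```

Each run resumes where the last one left off, but it's still a workaround rather than a fix.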
Thanks for your help!