Also, three coders have already worked on the annotations, and we currently have more than 700 of them. I do see these annotations when viewing prodigy.db in a SQLite viewer. However, Prodigy states that there are 0 datasets:
root@077935fff106:/prodigy/home# prodigy stats -ls
? Prodigy stats
Version 1.8.4
Location /usr/local/lib/python3.6/site-packages/prodigy
Prodigy Home /root/.prodigy
Platform Linux-4.15.0-64-generic-x86_64-with-debian-10.1
Python Version 3.6.9
Database Name SQLite
Database Id sqlite
Total Datasets 0
Total Sessions 0
Here's what one of the annotation tasks looks like in the DB (note that the ID from before, polnewstargetsentiment, occurs here, too, so I guess the command from above worked well?).
{"targetphrase":"sometext","text":"sometext,","html":"somehtml,","options":[{"id":"positive","text":"\ud83d\ude0a positive"},{"id":"neutral","text":"\ud83d\ude36 neutral"},{"id":"negative","text":"\ud83d\ude41 negative"},{"id":"posneg","text":"\ud83d\ude0a+\ud83d\ude41 pos. and neg."}],"_input_hash":-624773216,"_task_hash":868511941,"_session_id":"polnewstargetsentiment-timo","_view_id":"choice","accept":["neutral"],"answer":"accept"}
Does your recipe in newstsarecipe.py pass the dataset name polnewstargetsentiment through and return it as the "dataset" setting? This is how Prodigy knows where to save the annotations. (When you first start the server, the dataset is created if it doesn't exist – but it's often a good idea to explicitly run prodigy dataset to add a new set, to make sure everything is set up correctly.)
When you start up the server for your annotators, does it always run under the same user account / write to the same DB? By default, a database prodigy.db is created in the Prodigy home directory (.prodigy in the user home) and used. But if you're starting the server under different user accounts, for instance, it may create a separate database for each user. In that case, you probably want to configure the database settings to make sure you're always writing to the same DB.
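To make the default lookup described above concrete, here is a minimal sketch of how the database path depends on the environment. It assumes the home directory is overridden via the PRODIGY_HOME environment variable and that no custom database settings are in place; resolve_prodigy_db is a hypothetical helper for illustration, not part of Prodigy.

```python
import os
from pathlib import Path


def resolve_prodigy_db(env=None):
    """Return the default SQLite path for a given environment mapping.

    Assumption: PRODIGY_HOME overrides the home directory, which otherwise
    falls back to ".prodigy" in the user's home. Custom database settings
    are not taken into account here.
    """
    env = os.environ if env is None else env
    home = env.get("PRODIGY_HOME")
    home_dir = Path(home) if home else Path.home() / ".prodigy"
    return home_dir / "prodigy.db"
```

Calling resolve_prodigy_db() once in the shell that starts the server and once in the shell that runs the other commands shows quickly whether both point at the same file.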
import prodigy
from prodigy.components.loaders import JSONL

# add_options and on_exit are helper functions not shown in this snippet


@prodigy.recipe('newstsa',
    dataset=prodigy.recipe_args['dataset'],
    file_path=("Path to texts", "positional", None, str))
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)     # load in the JSONL file
    stream = add_options(stream)  # add options to each task
    return {
        'dataset': dataset,    # save annotations in this dataset
        'view_id': 'choice',   # use the choice interface
        'config': {
            'choice_auto_accept': True,  # auto-accept the example once the user selects an option
            'instructions': '/prodigy/manual.html'
        },
        'on_exit': on_exit,
        'stream': stream,
    }
Regarding the second question: I think so, too. On the server, there is only one user (root) and the Prodigy home dir is set via an environment variable.
Thanks for the update – the recipe definitely looks correct.
When you're running prodigy stats and prodigy db-out, are you setting that environment variable, too? And is the location shown in the stats (/root/.prodigy) correct?
Since you can see the tables and data in the SQLite browser, I think the most likely explanation is that the database you're loading here is not the same one that the annotations were saved to. Under the hood, the database commands only really do something like this:
from prodigy.components.db import connect
import srsly
db = connect()
examples = db.get_dataset("dataset_name")
srsly.write_jsonl("/path/to/data.jsonl", examples)
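If you want to double-check which file actually holds the annotations, independently of Prodigy, something like the following could help. It is only a sketch using Python's built-in sqlite3 module: it makes no assumptions about Prodigy's table names and simply reports the row count of every table it finds. table_row_counts is a hypothetical helper name.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        return {
            name: conn.execute('SELECT COUNT(*) FROM "{}"'.format(name)).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Running this on the file you opened in the SQLite viewer and on the prodigy.db inside the location that prodigy stats reports should show whether they are the same database: a freshly created one will have empty tables.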
Alright, that's what I got wrong: when running Prodigy in "server mode" so that people can annotate our data, I set the environment variables, but when running prodigy stats and the like, I didn't. Hence the difference in the output of the stats command: it shows 0 datasets without the environment variable but, as expected, 1 dataset when the variable is set properly. I'm sorry for the confusion; I totally missed that the stats command already showed a different home path. And thank you for the help!
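For anyone running into the same thing: one way to avoid the mismatch is to launch every Prodigy command through a small wrapper that always sets the same home directory. The sketch below assumes the variable in question is PRODIGY_HOME; run_with_prodigy_home is a hypothetical helper, and the path in the usage comment is a placeholder.

```python
import os
import subprocess
import sys


def run_with_prodigy_home(args, prodigy_home):
    """Run a command with PRODIGY_HOME set and return its combined output."""
    env = dict(os.environ, PRODIGY_HOME=str(prodigy_home))
    result = subprocess.run(
        args,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=False,
    )
    return result.stdout


# Example usage (placeholder path):
# print(run_with_prodigy_home(
#     [sys.executable, "-m", "prodigy", "stats", "-ls"], "/path/to/prodigy/home"))
```

That way the server, stats and db-out all read the home directory from one place instead of relying on whatever happens to be exported in the current shell.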