Manual annotations are saved, but not in a dataset? How to export them?

fhamborg · October 14, 2019, 9:21am

Hi!

I think I basically have two main questions:

Why does prodigy show that we have 0 datasets (should be 1, I guess)? (And: Did we do something wrong or is this by design?)
How can we export the annotations? prodigy db-out datasetname does not work, since there is no dataset.

Some background info: We created the annotation tasks following the tutorial by issuing:

prodigy newstsa polnewstargetsentiment -F path/newstsarecipe.py path/anno.jsonl

Three coders have already worked on the annotations, and we currently have more than 700 of them. I do see these annotations when viewing prodigy.db in a SQLite viewer. However, prodigy states that there are 0 datasets:

root@077935fff106:/prodigy/home# prodigy stats -ls

  ✨  Prodigy stats

Version          1.8.4
Location         /usr/local/lib/python3.6/site-packages/prodigy
Prodigy Home     /root/.prodigy
Platform         Linux-4.15.0-64-generic-x86_64-with-debian-10.1
Python Version   3.6.9
Database Name    SQLite
Database Id      sqlite
Total Datasets   0
Total Sessions   0

Here's what one of the annotation tasks looks like in the DB (note that the dataset id polnewstargetsentiment from the command above occurs here, too, so I guess the command worked well?).

{"targetphrase":"sometext","text":"sometext,","html":"somehtml,","options":[{"id":"positive","text":"\ud83d\ude0a positive"},{"id":"neutral","text":"\ud83d\ude36 neutral"},{"id":"negative","text":"\ud83d\ude41 negative"},{"id":"posneg","text":"\ud83d\ude0a+\ud83d\ude41 pos. and neg."}],"_input_hash":-624773216,"_task_hash":868511941,"_session_id":"polnewstargetsentiment-timo","_view_id":"choice","accept":["neutral"],"answer":"accept"}

Thank you in advance!

Cheers,
Felix

ines · October 14, 2019, 9:29am

Hi! There are two things to check here:

Does your recipe in newstsarecipe.py pass through the name of the dataset, polnewstargetsentiment, and return it as the "dataset" setting? This is how Prodigy knows where to save the annotations. (When you first start the server, the dataset is created if it doesn't exist, but it's often a good idea to explicitly run prodigy dataset to add a new set, to make sure everything is set up correctly.)
When you start up the server for your annotators, does it always run under the same user account / write to the same DB? By default, a database prodigy.db in the Prodigy home directory (.prodigy in the user home) is created and used. But if you're starting the server under different user accounts, for instance, it may create a separate database for each user. In that case, you probably want to configure the database settings to make sure you're always writing to the same DB.
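To see which database file a given shell would end up using, a minimal sketch like the following can help. It assumes the default SQLite setup with no custom database settings in prodigy.json, and that the home directory is overridden via the PRODIGY_HOME environment variable. The function name resolve_prodigy_db is hypothetical and not part of Prodigy.

```python
import os
from pathlib import Path


def resolve_prodigy_db(env=None):
    """Return the default SQLite path for the given environment mapping.

    Assumption: default SQLite settings, no custom database config.
    """
    env = os.environ if env is None else env
    # PRODIGY_HOME, if set, replaces the default ~/.prodigy home directory.
    home = env.get("PRODIGY_HOME")
    home_dir = Path(home) if home else Path.home() / ".prodigy"
    return home_dir / "prodigy.db"


db_path = resolve_prodigy_db()
print(db_path, "(exists)" if db_path.exists() else "(missing)")
```

Running this once in the shell that starts the server and once in the shell used for the database commands shows whether both resolve to the same file.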

fhamborg · October 14, 2019, 10:00am

Hi Ines!

Regarding your first question: I guess so, see:

import prodigy
from prodigy.components.loaders import JSONL

# add_options and on_exit are assumed to be defined elsewhere in newstsarecipe.py

@prodigy.recipe('newstsa',
                dataset=prodigy.recipe_args['dataset'],
                file_path=("Path to texts", "positional", None, str))
def sentiment(dataset, file_path):
    """Annotate the sentiment of texts using different mood options."""
    stream = JSONL(file_path)  # load in the JSONL file
    stream = add_options(stream)  # add options to each task

    return {
        'dataset': dataset,  # save annotations in this dataset
        'view_id': 'choice',  # use the choice interface
        "config": {
            "choice_auto_accept": True,  # auto-accept example once the user selects an option
            "instructions": "/prodigy/manual.html"
        },
        'on_exit': on_exit,
        'stream': stream,
    }

Regarding the second question: I think so, too. On the server, there is only one user (root) and the Prodigy home dir is set via an environment variable.

ines · October 14, 2019, 10:10am

Thanks for the update – the recipe definitely looks correct.

When you're running prodigy stats and prodigy db-out, are you setting that environment variable, too? And is the location shown in the stats (/root/.prodigy) correct?

Since you can see the tables and data in the SQLite browser, I think the most likely explanation is that the database you're loading here is not the same one that the annotations were saved to. Under the hood, the database commands only really do something like this:

from prodigy.components.db import connect
import srsly

db = connect()
examples = db.get_dataset("dataset_name")
srsly.write-jsonl("/path/to/data.jsonl", examples)
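One way to test the "different database" explanation is to look at the candidate SQLite files directly and compare what they contain. The sketch below uses only the standard library and makes no assumption about Prodigy's table layout: it lists whatever tables exist and counts their rows. The function name table_row_counts is hypothetical.

```python
import os
import sqlite3


def table_row_counts(db_path):
    """Map each table in an SQLite file to its number of rows."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling it on /root/.prodigy/prodigy.db and on the file opened in the SQLite viewer would show whether the annotated file and the one the stats command reads are the same; an empty or missing file at one of the paths points to a mismatched home directory.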

fhamborg · October 14, 2019, 10:31am

Alright, that's what I got wrong. When running Prodigy in "server mode" so that people can annotate our data, I set the environment variables, but when issuing prodigy stats and the like, I did not. Hence the difference in the output of the stats command: it shows 0 datasets without the environment variable and, as expected, 1 dataset when the variable is set properly. I'm sorry for the confusion; I totally missed that the stats command already showed a different home path. And thank you for the help!

ines · October 14, 2019, 10:33am

No worries, glad it's working now!

DSLituiev · October 7, 2020, 11:59pm

srsly.write-jsonl doesn't seem like a valid function name

DSLituiev · October 8, 2020, 12:13am

this works

with open(filename, "w") as fh:
    for ex in examples:
        fh.write(srsly.json_dumps(ex) + "\n")
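The same loop can also be written with only the standard library, which avoids depending on a particular srsly version. This is a minimal sketch, assuming examples is a list of JSON-serializable dicts as returned by db.get_dataset; the helper names export_jsonl and load_jsonl are hypothetical.

```python
import json


def export_jsonl(path, examples):
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for ex in examples:
            # ensure_ascii=False keeps emoji and other non-ASCII text readable.
            fh.write(json.dumps(ex, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_jsonl(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Reading the file back with load_jsonl and comparing the length against the number of annotations expected is a quick sanity check that the export is complete.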

ines · October 8, 2020, 7:12am

Ah, sorry, that was of course supposed to be write_jsonl! Also see the srsly repository on GitHub: explosion/srsly (modern high-performance serialization utilities for Python: JSON, MessagePack, Pickle).
