/get_questions in classification task not returning id?


I am trying to run an active learning classification task (textcat.teach) via the Prodigy API, but contrary to what the documentation states in Prodigy's excellent README.html, I don't see an id field in the payload returned when hitting /get_questions:

    {
        "tasks": [
            {
                "text": "database designer",
                "_input_hash": -1510625000,
                "_task_hash": -1222882742,
                "label": "information_technology",
                "score": 0.10009049624204636,
                "priority": 0.10009049624204636,
                "spans": [],
                "meta": {
                    "score": 0.10009049624204636
                },
                "_session_id": null,
                "_view_id": "classification"
            }
        ]
    }

Am I missing something? I am trying to understand what the request payload for /give_answers needs to be in this case, when no id is provided by /get_questions.

Also, in the /give_answers example in README.html, there is a "score" field shown. What is that score supposed to be? (A simple echo of the one provided in /get_questions?)

Thank you in advance!

Ah, sorry if the docs were a bit unclear here! The main thing here is that Prodigy makes very few assumptions about the dictionaries / objects that get passed around. There are some basic conventions (the raw text goes in "text") and some reserved properties like "_task_hash". But aside from that, whatever is loaded in gets passed around from the recipe to the REST API to the app and back to the server and database.

So the properties in the requests and the responses are whatever is in the data. The examples in the PRODIGY_README are just arbitrary examples of what the data may look like. It could be pretty much anything, really.

For example, your incoming stream of data may contain examples that look like this:

{"text": "hello", "label": "LABEL", "foo": "bar"}

When you execute the recipe, Prodigy may add some meta to the examples, e.g. the hashes. When you request a new batch of examples, /get_questions may return something like this:

    {
        "tasks": [
            {"text": "hello", "label": "LABEL", "foo": "bar", "_input_hash": 123, "_task_hash": 456}
        ]
    }

You then see the example in the web app, annotate it by clicking "accept" and it gets sent back to the server. /give_answers would then post something like this:

    {
        "answers": [
            {"answer": "accept", "text": "hello", "label": "LABEL", "foo": "bar", "_input_hash": 123, "_task_hash": 456}
        ]
    }

These examples will then be passed to the recipe's update callback and stored in the database. As you can see, the custom field "foo": "bar" is just passed through – maybe it's some internal meta data, or maybe you're using a custom interface in the app that renders it. Prodigy itself doesn't care.
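
For anyone driving the API from a script instead of the web app, the round trip described above can be sketched in Python. This is only an illustration under a few assumptions: build_answers is a hypothetical helper (not part of Prodigy), and the sample payload simply mirrors the example above. It copies each task unchanged and adds the "answer" key, so custom fields like "foo" are passed through.

```python
import copy
import json


def build_answers(questions_response, decision="accept"):
    """Turn a /get_questions-style payload into a /give_answers-style payload.

    Hypothetical helper: every task is copied as-is and only "answer" is added.
    """
    answers = []
    for task in questions_response.get("tasks", []):
        answered = copy.deepcopy(task)  # keep all fields, including custom ones
        answered["answer"] = decision
        answers.append(answered)
    return {"answers": answers}


# Sample payload mirroring the example above (hash values are illustrative).
sample_response = {
    "tasks": [
        {"text": "hello", "label": "LABEL", "foo": "bar",
         "_input_hash": 123, "_task_hash": 456}
    ]
}

payload = build_answers(sample_response)
print(json.dumps(payload, indent=2))
```

The resulting dictionary is what you would serialize as JSON and POST to /give_answers on whatever host and port your Prodigy server is running on.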

TL;DR: The id and score fields in the request/response examples are just example properties to illustrate whatever is in the example data. Maybe we should use a different example here, because I can see how id especially makes it look like it's some special built-in field. If you want something like a unique ID, you can use the _task_hash.
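
As a rough sketch of using the _task_hash as an identifier on the client side, you could key the tasks you receive by it. The helper name index_by_task_hash and the hash values below are made up for illustration.

```python
def index_by_task_hash(tasks):
    """Map each task's _task_hash to the task dictionary itself."""
    return {task["_task_hash"]: task for task in tasks}


# Illustrative tasks: same text, different labels, different task hashes.
tasks = [
    {"text": "hello", "label": "LABEL", "_input_hash": 123, "_task_hash": 456},
    {"text": "hello", "label": "OTHER", "_input_hash": 123, "_task_hash": 789},
]

by_hash = index_by_task_hash(tasks)
print(by_hash[789]["label"])
```

Note that the _input_hash alone would not work as a key here, since both tasks share the same input text.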

Thank you @ines - that makes a lot more sense. I am able to pull questions, and give answers thanks to your explanation!

Glad it works! :+1: Btw, in case you haven't seen it yet, you can also set the env variable PRODIGY_LOGGING=basic or PRODIGY_LOGGING=verbose. This should log everything that's going on under the hood, including the requests and responses. Verbose logging also outputs the individual examples that pass through Prodigy (so you probably want to write that to a file).
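
If you start Prodigy from a Python script, one way to set the variable and write the log to a file is sketched below. The helper run_with_prodigy_logging is hypothetical, and the demo uses a stand-in command that only echoes the variable so the sketch runs without Prodigy installed; in practice you would pass your own prodigy command line instead. Both stdout and stderr are redirected, since I'm not assuming which stream the log ends up on.

```python
import os
import subprocess
import sys
import tempfile


def run_with_prodigy_logging(command, log_path, level="verbose"):
    """Run `command` with PRODIGY_LOGGING set, capturing its output in a file."""
    env = dict(os.environ)           # copy, so the parent environment is untouched
    env["PRODIGY_LOGGING"] = level   # "basic" or "verbose"
    with open(log_path, "w") as log_file:
        completed = subprocess.run(
            command, env=env, stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode


# Stand-in command (assumption): prints the variable instead of starting Prodigy.
demo_command = [
    sys.executable, "-c", "import os; print(os.environ['PRODIGY_LOGGING'])"
]
demo_log = os.path.join(tempfile.mkdtemp(), "prodigy.log")
exit_code = run_with_prodigy_logging(demo_command, demo_log, level="basic")

with open(demo_log) as f:
    print(exit_code, f.read().strip())
```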