db-in successful but not reflected in annotation interface

Hi!

I started an annotation interface that writes to a Postgres database but reads its input from a local file:

python -m prodigy custom_recipe dataset_name inputfile.csv -F recipes/custom_recipe.py

I can see the dataset is created on postgres and the interface seems to run fine. I then add data using a local file:

python -m prodigy db-in dataset_name annotations.json

Checking with

python -m prodigy progress dataset_name

confirms the successful import

However, when I open the interface with the same annotator session as the imported annotations, annotation seems to start from scratch, and the total number of annotations shown for this annotator is 0. Any idea why this might be the case?

prodigy.json

{
  "buttons": ["accept", "undo"],
  "feed_overlap": true,
  "host":"0.0.0.0",
  "port":8080,
  "db":"postgresql",
  "db_settings":{
    "postgresql":{
      "host":"placeholder",
      "dbname":"placeholder",
      "user":"placeholder",
      "password":"placeholder",
      "port":"placeholder"
    }
  }
}

custom_recipe.py

import prodigy
from prodigy.components.loaders import CSV

@prodigy.recipe(
    "custom_recipe",
    dataset=("Dataset to save annotations into", "positional", None, str),
    file_in=("Path to input CSV file", "positional", None, str)
    )
def custom_recipe(dataset, file_in):
    blocks = [
        {"view_id": "text"},
        {"view_id": "text_input","field_rows": 1, "field_id": "user_input_a","field_label": "Question 1? 0 (no), 1 (yes)"},
        {"view_id": "text_input","field_rows": 1, "field_id": "user_input_b","field_label": "Question 2? 0 (no), 1 (yes)"}
    ]
               
    stream = CSV(file_in)

    return {
        "view_id": "blocks",  # Annotation interface to use
        "dataset": dataset,  # Name of dataset to save annotations
        "stream": stream,  # Incoming stream of examples
        "config": {  # Additional config settings, mostly for app UI
            "blocks": blocks,
            "choice_style": "multiple",
            "custom_theme": {
                "bgCardTitle": "#801414",
                "colorHighlightLabel": "#801414"
            },
            "global_css_dir": "./recipes/style",
        }
    }

annotations.json

{"text":"This is the text","_input_hash":-346693553,"_task_hash":-132505627,"_view_id":"blocks","user_input_a":"1","user_input_b":"1","answer":"accept","_timestamp":1721748477,"_annotator_id":"dataset_name-annotator","_session_id":"dataset_name-annotator"}

Edit: running fully locally, without Postgres, did not change this behaviour. The issue persists: the imported annotations do not seem to be taken into account in the interface.

Welcome to the forum @nicolaiberk :wave:

With the legacy loaders, such as the CSV loader, the progress bar might not be reliable. You might want to switch to the refactored get_stream function, which loads files in different formats. Please see the "Recommended use" note in the loaders docs.

In any case, if the annotations from a given annotator are stored in the dataset, and you restart the Prodigy server with the same dataset as the target and access it with the same session name (as you say you do), you shouldn't be seeing the same questions in the UI.
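To rule out a session mismatch, it can help to count the imported records per session ID. Below is a minimal sketch using only the standard library. The helper names count_by_session and expected_session_id are made up for illustration, and the "<dataset>-<session name>" format is an assumption based on the _session_id value in the annotations you posted.

```python
import json
from collections import Counter


def count_by_session(lines):
    """Count annotations per _session_id in an iterable of JSONL lines."""
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            # A malformed line would not import cleanly, so fail loudly
            raise ValueError(f"line {line_no} is not valid JSON: {err}") from err
        counts[record.get("_session_id")] += 1
    return counts


def expected_session_id(dataset, session_name):
    # Assumption: named sessions are stored as "<dataset>-<session name>"
    return f"{dataset}-{session_name}"


# Sample record shaped like the one in the question
sample = [
    '{"text": "This is the text", "answer": "accept", '
    '"_session_id": "dataset_name-annotator"}',
]
counts = count_by_session(sample)
target = expected_session_id("dataset_name", "annotator")
print(f"{target}: {counts.get(target, 0)} annotation(s)")
```

For a real check, you could pass an open file handle for annotations.json instead of the sample list. If the count for the session you open in the browser is 0, the session name in the URL and the one stored in the file do not match.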

When you restart Prodigy after importing the annotations manually from the JSONL file, what exactly does your command look like?

Also, by default Prodigy excludes already annotated examples based on the task_hash, so the annotations you loaded with db-in must have been hashed with the same function that the custom recipe uses. If they were created with the same custom recipe, that should be fine, since the recipe uses the default hashing. For further reference, you can read more about hashing here.
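One way to check the hashes is to annotate a single example with the recipe, export it, and compare its hashes with those in the imported file. The sketch below is standard-library only and is not part of Prodigy. The function names are hypothetical, the reference hash values are invented for the demo, and it assumes both files are JSONL with a "text" field.

```python
import json


def hashes_by_text(lines):
    """Map each example text to the set of (_input_hash, _task_hash) pairs seen."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = record.get("text")
        if text is None:
            continue  # records without text cannot be matched up
        pair = (record.get("_input_hash"), record.get("_task_hash"))
        table.setdefault(text, set()).add(pair)
    return table


def mismatched_texts(imported, reference):
    """Return texts present in both tables that share no hash pair."""
    shared = imported.keys() & reference.keys()
    return sorted(t for t in shared if imported[t].isdisjoint(reference[t]))


# Imported record uses the hashes from the question; the reference
# task hash (111) is a made-up value standing in for a fresh export.
imported = hashes_by_text([
    '{"text": "This is the text", "_input_hash": -346693553, "_task_hash": -132505627}',
])
reference = hashes_by_text([
    '{"text": "This is the text", "_input_hash": -346693553, "_task_hash": 111}',
])
print(mismatched_texts(imported, reference))
```

Any text listed in the output has different hashes in the two files, which would explain why the imported annotations are not excluded from the stream.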
