Hi,
I am working on a sentence annotation task to identify whether a sentence is about climate change or not.
I am hosting Prodigy (using nginx) on my home computer so that my master's student can also access the UI and annotate sentences. However, we are running into some weird behaviour, and I don't understand where it is coming from. Currently the UI says that we have annotated a total of 1940 sentences in the dataset climate_klimatonly_v1. This number is the same for both him and me.
However, when I run:
from prodigy.components.db import connect
db = connect()
all_dataset_names = db.datasets
examples = db.get_dataset("climate_klimatonly_v1")
print(len(examples))
I get 1612 examples. Hence, some of our annotation progress has disappeared. I have looked through prodigy.db.
The strange behaviour is this:
When my master's student annotates, we can both see the total number in the UI increase, but the number of examples in the DB does not increase.
When I annotate, both the total progress in the UI and the number of examples in the DB increase.
Note that we are using sessions so he annotates with his name and I use my own. I have double checked with him that he is saving the progress. I have also double checked that the dataset in the UI is climate_klimatonly_v1 .
Any ideas about what might be causing this? At first, I was thinking that it might be a network connectivity issue. Perhaps he loses the connection when trying to save?
Hi! It's definitely strange that it works for you and not for your student. Prodigy will auto-save annotations under the hood as batches of annotations are collected. You can also hit the save button or press cmd+s manually to save. If you're not seeing an explicit error message, this indicates that the annotations were saved to the database.
Just to confirm, your student is definitely connecting to the web app and Prodigy on your machine with your database, and not running Prodigy locally themselves?
Another thing to keep in mind is that annotations are saved in batches, and that the progress percentage you see is calculated on the server as annotations are sent back (which allows you to also factor in other signals to calculate the progress, like the loss if you're updating a model in the loop). So it'll take a batch to be submitted for the progress percentage to update.
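One way to narrow this down is to count the saved examples per session, which shows directly whose annotations are reaching the database. The following is only a minimal sketch: it assumes that each saved example carries a `_session_id` field when named sessions are used, and the session names and stand-in data below are hypothetical. The helper `count_by_session` is not part of Prodigy.

```python
from collections import Counter


def count_by_session(examples):
    """Count saved examples per session ID; examples without one go under '(none)'."""
    return Counter(eg.get("_session_id") or "(none)" for eg in examples)


# Against the real database (requires Prodigy, so left commented out here):
# from prodigy.components.db import connect
# examples = connect().get_dataset("climate_klimatonly_v1")

# Hypothetical stand-in data so the sketch runs on its own.
examples = [
    {"text": "a", "answer": "accept", "_session_id": "climate_klimatonly_v1-teacher"},
    {"text": "b", "answer": "reject", "_session_id": "climate_klimatonly_v1-student"},
    {"text": "c", "answer": "accept", "_session_id": "climate_klimatonly_v1-teacher"},
    {"text": "d", "answer": "accept"},  # no session field (assumed possible for older data)
]

counts = count_by_session(examples)
for session, n in counts.most_common():
    print(session, n)
```

If the student's session is missing or its count stops growing while the UI total keeps rising, that would point to his batches not being written rather than to a second database.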
Thanks! Everything seems to be working fine again, but he claims that he has annotated at least 2000 sentences that were lost (while he was connecting to my machine - he does not have access to Prodigy himself). I cannot speak to that specific number, but I did notice a discrepancy between the total number of annotated examples displayed in the web UI and what I saw in the database. Now everything is working fine, but I don't understand how these numbers could differ after saving the progress.
This is definitely strange and I've never seen this happen before. By default, Prodigy will auto-save in the background for each batch of batch_size, so if something goes wrong with the database connection or similar, he would have seen an error message in the UI that would have kept popping up for each submitted annotation.
Did you specify a custom DB by any chance? Maybe his instance used the default DB, while yours used the custom DB? In that case, you might have a prodigy.db in your user home directory that contains his annotations.
It seems to be happening again: there is a discrepancy between the UI and the DB. My student reported seeing an error message in the UI, but he could not recall what it said. So the question remains: why are more examples reported in the UI than I get when querying the DB?
I checked that we only have one single db. I attach the prodigy.json file below.
Ah, this would be really helpful to know! If the problem was that the annotations couldn't be saved, e.g. because of an error on the back-end, your machine not accepting the connection etc., this could definitely explain what's going on. Were there any error messages in your terminal? If not, you can check the browser's network console if the error occurs and look for a failed API request. This should give you more info about why the request failed. Since your student is connecting to your machine remotely, the problem could be related to how the API endpoints are exposed over the internet.
He just received another one this morning. The error message was "connection to server failed."
He started off annotating 100 sentences, then received the error message, and then did 10 more.
When checking the DB I found 10 new examples added... So it seems the last 10 were ok.
Did not find any error messages on my end when checking in the terminal...
Okay, in that case, it does sound like the problem is caused by how the app is served via your machine. How are you serving and exposing Prodigy via the internet from your machine? Maybe the connection isn't stable and the back-end becomes temporarily unavailable, or the connection/tunnel goes down?
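To test whether the back-end really becomes temporarily unreachable, a small probe run from the remote annotator's side can log when connections fail. This is a rough sketch using only the Python standard library; `probe`, `outage_windows` and `monitor` are hypothetical helper names, and the polling interval is an arbitrary choice.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if the server answers at all, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the connection itself works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, reset, etc.
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) runs of consecutive failures."""
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:
        windows.append((start, last))
    return windows


def monitor(url, interval=30.0, rounds=10):
    """Probe the URL `rounds` times, `interval` seconds apart, and return outage windows."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), probe(url)))
        time.sleep(interval)
    return outage_windows(samples)


# Hypothetical samples (seconds, reachable?) to show the output shape without network access.
demo = [(0, True), (30, False), (60, False), (90, True), (120, False)]
print(outage_windows(demo))
```

Running `monitor` against the public URL during an annotation session and comparing the reported windows with the times the "connection to server failed" message appears would show whether the two line up.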
Prodigy is served using Nginx installed on an Ubuntu machine at my home. The connection is via fiber, but I have indeed experienced connection interruptions with my provider in the past. I attach the Nginx config FYI.
The question is what can be done to prevent this regardless of connection interruptions. Do you have any recommendations for how best to serve Prodigy to allow for a reliable annotation process away from my machine?
server {
    server_name riksdagen.davcon.se www.riksdagen.davcon.se;
    proxy_buffering off;
    location / {
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_pass http://192.168.1.144:8051/;
        proxy_redirect off;
    }
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/admin.reportall.se/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/admin.reportall.se/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
server {
    if ($host = www.riksdagen.davcon.se) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    if ($host = riksdagen.davcon.se) {
        return 301 https://$host$request_uri;
    } # managed by Certbot
    server_name riksdagen.davcon.se www.riksdagen.davcon.se;
    listen 80;
    return 404; # managed by Certbot
}