Mismatch between the db-out count and the UI count

Hello Team,
While working on image tagging in Prodigy, I noticed that the number of examples exported from the database does not match the count shown in the user interface. I've posted the screenshots here; the difference in counts is visible. For the export, I used the following command: prodigy db-out databasename "/filepath". I'm also using two sessions.
Version 1.11.0a8 of Prodigy


Thanks and Regards,
Juhi

Hi,

I am facing the same problem: the count from prodigy db-out differs from the count shown in the Prodigy UI. FYI, the tagging process was split into two sessions.
Version: v1.11

Thanks

Hi @Juhi and @Vinoth ,

Could you please run prodigy db-stats {dataset_name} and share the output? In principle, the TOTAL in the UI should correspond to the number of examples exported with db-out.
Was this annotation done with multiple annotators?
@Vinoth are you also running the image.manual recipe?
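
To check the exported number independently of the UI, a minimal sketch like the one below counts the records in the JSONL file written by db-out and breaks them down per session. It assumes each exported record carries a `_session_id` key (records without one are grouped under "unknown"); the function name `count_exported` and the file path in the usage note are placeholders.

```python
import json
from collections import Counter
from pathlib import Path


def count_exported(path):
    """Return (total, per_session) for a JSONL export written by db-out."""
    per_session = Counter()
    total = 0
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            record = json.loads(line)
            total += 1
            # Assumption: the session name is stored under "_session_id".
            per_session[record.get("_session_id", "unknown")] += 1
    return total, per_session
```

For example, `total, per_session = count_exported("path/to/export.jsonl")` gives one number to compare with the TOTAL in the UI and one count per annotator session.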

Hi @magdaaniol ,

Yes, I am running the image.manual recipe.

I think there is no such command as "prodigy db-stats <dataset_name>". When we run it, we get the error "Can't find recipe or command 'db-stats'."

Thanks

Ooops sorry! That was prodigy stats <dataset_name> (thanks for the info!)

Hi @magdaaniol
Yes, the annotation was done by multiple annotators, each using a session URL like this:
http://xxx.xx.xxx.xxx:7879/?session=username
This is the output of the command prodigy stats <dataset_name>; the count shown in the UI was 700.

Thanks

Thanks @Juhi. Is there any chance that your annotators reported seeing the same example more than once? If so, I think you hit an issue where Prodigy serves duplicate examples to the UI (we've seen that happen, especially in multi-user sessions). The UI then makes it look like you've gone through X examples, but the duplicates are removed when the answers are saved to the DB, which is why the stored number is lower.
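
As a toy illustration of how that gap can arise (not Prodigy's actual code), the sketch below counts every submitted answer for the UI figure but keeps only one stored entry per session and task. The deduplication key used here, the pair of `_session_id` and `_task_hash`, is an assumption made for the example, and `ui_and_db_counts` is a hypothetical helper.

```python
def ui_and_db_counts(answers):
    """Return (ui_count, db_count) for a list of submitted answers."""
    # The UI counter goes up once per submitted answer, duplicates included.
    ui_count = len(answers)
    # Assumed dedup key: one stored entry per (session, task hash) pair.
    stored = {(a["_session_id"], a["_task_hash"]) for a in answers}
    return ui_count, len(stored)


answers = [
    {"_session_id": "alice", "_task_hash": 1},
    {"_session_id": "alice", "_task_hash": 2},
    {"_session_id": "alice", "_task_hash": 2},  # same task served twice
    {"_session_id": "bob", "_task_hash": 1},
]
print(ui_and_db_counts(answers))  # (4, 3)
```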

This was fixed in later versions of Prodigy, so I can only recommend upgrading, at least to 1.12+ and ideally to the latest version.

If there are no reports of duplicates in the stream, you would need to reproduce the annotation run with more verbose logging turned on, by running Prodigy with the environment variable PRODIGY_LOGGING=basic, to see the log statements for the /give_answers endpoint.
It could also be worth trying with a fresh dataset.
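
If you start Prodigy from a Python script rather than a shell, one way to set that variable is sketched below. The helper `prodigy_env` is hypothetical, and the dataset name, image folder and labels in the commented launch line are placeholders.

```python
import os


def prodigy_env(level="basic", base=None):
    """Copy an environment mapping and set PRODIGY_LOGGING on the copy."""
    env = dict(os.environ if base is None else base)
    env["PRODIGY_LOGGING"] = level
    return env


# Hypothetical launch; replace the placeholders with your own values:
# import subprocess
# subprocess.run(
#     ["prodigy", "image.manual", "my_dataset", "./images", "--label", "LABEL_A,LABEL_B"],
#     env=prodigy_env(),
# )
```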

In any case, this is just to understand whether there's some oddity about your input dataset, as we won't be patching this version of Prodigy. 1.11.0 is a really old version, and I also see that you're on an experimental nightly alpha (which was itself superseded by later alphas). Since Prodigy 1.12 we've comprehensively refactored the task streaming mechanism to address the issues with duplicates and multi-user sessions that previous versions were known for.