Getting a mismatch between DB count and UI count

Juhi · April 16, 2024, 12:38pm

Hello Team,
While working on image tagging with Prodigy, I ran into a problem: the database output does not match the count shown in the user interface. I've posted the screenshots here; the difference in counts is visible. For db-out, I used the following command: prodigy db-out databasename "/filepath". I'm also using two sessions.
Version 1.11.0a8 of Prodigy

Thanks and Regards,
Juhi

Vinoth · April 17, 2024, 7:51am

Hi,

I am also facing this problem: I see different counts between prodigy db-out and the Prodigy UI. FYI, the tagging process was split into two sessions.
Version: v1.11

Thanks

magdaaniol · April 17, 2024, 12:08pm

Hi @Juhi and @Vinoth ,

Could you please run prodigy db-stats {dataset_name} and share the output? In principle, the TOTAL in the UI should correspond to the number of examples exported with db-out.
Was this annotation done with multiple annotators?
@Vinoth are you also running the image.manual recipe?
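
To compare the exported count against the UI total, a small script along these lines may help. It is only a sketch: it assumes the db-out export is a JSONL file with one example per line, and that examples carry the `_session_id` and `_task_hash` keys Prodigy normally adds. The function name `summarize_export` and the file path are made up for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_export(path):
    """Summarize a JSONL export: total rows, rows per session, unique task hashes."""
    total = 0
    per_session = Counter()
    task_hashes = set()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            eg = json.loads(line)
            total += 1
            # Assumption: multi-user examples carry a "_session_id" key.
            per_session[eg.get("_session_id", "<no session>")] += 1
            # Assumption: examples carry a "_task_hash" key.
            if "_task_hash" in eg:
                task_hashes.add(eg["_task_hash"])
    return {
        "total": total,
        "per_session": dict(per_session),
        "unique_task_hashes": len(task_hashes),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_export.py /filepath/dataset_name.jsonl
    if len(sys.argv) > 1:
        print(summarize_export(sys.argv[1]))
```

If the per-session numbers add up to the exported total but not to what each annotator saw in the UI, that would point towards examples being shown more than once.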

Vinoth · April 17, 2024, 3:20pm

Hi @magdaaniol ,

Yes, I am running the image.manual recipe.

It seems there is no such command as "prodigy db-stats <dataset_name>". When we run it, we get the error "Can't find recipe or command 'db-stats'."

Thanks

magdaaniol · April 18, 2024, 7:40am

Ooops sorry! That was prodigy stats <dataset_name> (thanks for the info!)

Juhi · April 18, 2024, 11:07am

Hi @magdaaniol
Yes, the annotation was done with multiple annotators, by creating sessions like this:
http://xxx.xx.xxx.xxx:7879/?session=username
This is the output of the command prodigy stats <dataset_name>, and the count shown in the UI was 700.

Thanks

magdaaniol · April 19, 2024, 8:04am

Thanks @Juhi. Is there any chance that your annotators reported seeing the same example more than once? If so, I think you hit an issue related to Prodigy serving duplicate examples to the UI (we've seen that happen, especially in multi-user sessions). That makes it look in the UI like you've gone through X examples, but upon saving to the DB they get deduplicated, which is why the number is lower.
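
To make the effect concrete, here is a toy model of it. This is not Prodigy's actual code, just an illustration under the assumption that the UI counter goes up for every answer submitted while the store keeps one record per task hash. The function name `simulate_session` is hypothetical.

```python
def simulate_session(served_task_hashes):
    """Return (ui_count, saved_count) for a stream that may contain repeats."""
    ui_count = 0
    saved = {}
    for task_hash in served_task_hashes:
        # The annotator answers every example they are shown.
        ui_count += 1
        # Toy assumption: the store keeps only the first answer per task hash.
        saved.setdefault(task_hash, {"_task_hash": task_hash, "answer": "accept"})
    return ui_count, len(saved)


# Five examples shown, two of them repeats: UI shows 5, store holds 3.
ui_count, saved_count = simulate_session([101, 102, 102, 103, 101])
print(ui_count, saved_count)
```

The gap between the two numbers is exactly the number of repeated examples that were served.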

This was fixed in later versions of Prodigy, so I can only recommend upgrading, at least to 1.12+ and ideally to the latest version.

If there are no reports of duplicates in the stream, you would need to reproduce the annotation with more verbose logging turned on, by running Prodigy with the environment variable PRODIGY_LOGGING=basic, to see the log statements for the /give_answers endpoint.
It could also be worth trying with a fresh dataset.

In any case, this is just to understand whether there's some oddity about your input dataset, as we won't be patching this version of Prodigy. 1.11.0 is a really old version, and I also see that you're on a nightly experimental alpha (which itself was superseded by later alphas). Since Prodigy 1.12 we've done a comprehensive refactoring of the task streaming mechanism to address the issues with duplicates and multi-user sessions that previous versions were known for.
