prodigy drop <dataset> not working

I am using the latest Prodigy version, 1.12.4, with MySQL as the database. When I use the prodigy drop command to drop a dataset, I get the following error:

peewee.IntegrityError: (1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`prodigy_stt`.`link`, CONSTRAINT `link_ibfk_1` FOREIGN KEY (`example_id`) REFERENCES `example` (`id`))')

That shouldn't happen. I'll dive into this right away to see if I can reproduce it.

In the meantime, is there anything special about your MySQL setup? Could you share the version that you're using? Also, did this issue occur in earlier versions?

Is there anything you could share about the dataset that you're trying to drop? Was it annotated by multiple users? Was it a dataset from a custom recipe?

Another thing that comes to mind: are you dropping the dataset while somebody is still annotating?

Could you share the output of the logs? You should be able to run something like:

PRODIGY_LOGGING=verbose prodigy drop <name-of-dataset>

Could you also share the stats of the dataset?

prodigy stats <name-of-dataset>

I just confirmed that the unit tests in our testing suite run fine with MySQL, including with a dataset that has multiple annotators. That makes me wonder whether something specific to your setup is causing this error.

I'll wait for your answers to these questions for now, since they may give me the hint I need to search further. Once you reply, I'll gladly dive back in.

Hi @koaning

When I tried to reproduce the error with a dummy dataset, it worked fine. Unfortunately, I can no longer run the drop command on the dataset that threw the error: we decided to keep using the dataset and filter out the unwanted examples from the JSONL file.

  • The MySQL version is:
    mysql  Ver 8.0.33 for Linux on x86_64 (MySQL Community Server - GPL)
  • Multiple annotators are currently using the dataset, each in a different session. We wanted to drop it because our testing stage was over and the annotators were ready to start working in earnest.

  • The recipe I am using is the built-in audio.transcribe recipe.

  • Before the version update, we were using SQLite and did not have any issues with drop.

  • The output of prodigy stats was:

============================== ✨  Prodigy Stats ==============================

Version          1.12.4                        
Location         /usr/local/lib/python3.9/dist-packages/prodigy
Prodigy Home     /usr/local/prodigy/.prodigy   
Platform         Linux-5.10.0-19-cloud-amd64-x86_64-with-glibc2.31
Python Version   3.9.2                         
Spacy Version    3.6.0                         
Database Name    MySQL                         
Database Id      mysql                         
Total Datasets   45                            
Total Sessions   34985                         


============================== ✨  Dataset Stats ==============================

Dataset       stt_second_review  
Created       2023-07-09 12:37:53
Description   None               
Author        None               
Annotations   3107               
Accept        3047               
Reject        8                  
Ignore        52              

I see.

If I had to guess, somebody may have been annotating while the prodigy drop command was running, but it's hard to know for sure without the logs.
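To illustrate how a drop that overlaps with an active annotation session could hit this kind of foreign key error, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The two-table schema is a hypothetical simplification that only borrows the example/link table names and the example_id column from the error message. The idea that a drop deletes link rows first and example rows second, as separate steps, is an assumption made for illustration, not a description of how Prodigy actually implements prodigy drop.

```python
import sqlite3

# Hypothetical, simplified schema that mirrors only the table and column names
# from the MySQL error message; Prodigy's real schema is not reproduced here.
SCHEMA = """
CREATE TABLE example (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE link (
    id INTEGER PRIMARY KEY,
    example_id INTEGER NOT NULL REFERENCES example (id)
);
"""


def drop_with_concurrent_insert(interleave: bool) -> str:
    """Delete link rows, then example rows. If interleave is True, a simulated
    annotator saves a new link row between the two steps."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO example (id, content) VALUES (1, 'clip_001.wav')")
    conn.execute("INSERT INTO link (example_id) VALUES (1)")
    conn.commit()

    # Assumed step 1 of the drop: remove child rows that reference examples.
    conn.execute("DELETE FROM link")
    conn.commit()

    if interleave:
        # Simulated annotator: a new answer links to the same example again.
        conn.execute("INSERT INTO link (example_id) VALUES (1)")
        conn.commit()

    # Assumed step 2 of the drop: remove the parent rows.
    try:
        conn.execute("DELETE FROM example")
        conn.commit()
        return "dropped"
    except sqlite3.IntegrityError as err:
        conn.rollback()
        return f"IntegrityError: {err}"
    finally:
        conn.close()


print(drop_with_concurrent_insert(False))  # dropped
print(drop_with_concurrent_insert(True))   # IntegrityError: FOREIGN KEY constraint failed
```

Without the interleaved insert, both deletes succeed; with it, the parent delete is rejected because a child row still points at the example. MySQL's InnoDB engine enforces foreign keys by default, so the same ordering there would surface as error 1451, "Cannot delete or update a parent row".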

If this ever happens again, please run the command with verbose logs turned on and share the results here. We'd be eager to dive into this issue if there's another hint for us to chase.