Prodigy Frequently Erroring

Hello!

I am a data engineer, and while I am very new to Prodigy, I did manage to get it set up and running on a Windows EC2.

The use case is that we have sensitive data that our SMEs are looking through to establish how much sensitive information is actually in the data that we are receiving, so the EC2 is secure with restricted access. At the moment, we have SMEs RDP into the instance (only one at a time can review data).

What is happening is that while the SMEs are looking through the data, Prodigy throws an "Oops, something went wrong :(" error. We've been getting around this by killing the Prodigy process and restarting it, which I've been doing through SSM in the EC2 console. The problem is that this happens every 15 minutes or so, which is, understandably, very disruptive while the data is being reviewed.

Does anyone have any ideas?

Thanks in advance!

hi @csdata!

Thanks for your question and welcome to the Prodigy community :wave:

Unfortunately, the "Oops, something went wrong" error is a generic error that could have many different causes (per the documentation):

The “Oops, something went wrong” message and the red popup are the app’s way of telling you that the Prodigy server raised an error. If you navigate to the terminal, it will show you more details about what went wrong. Some potential problems are:

- Can’t find file: Check that you’ve downloaded the news_headlines.jsonl file or any other file you want to use, and that you’ve placed it into the current working directory (or that you’re using the correct path if you’ve put it somewhere else).
- Invalid JSON: This might happen if you load in your own data and it means that there’s something wrong with this line of JSON and Python can’t load it. If you’re using a .jsonl file, make sure that each line contains one valid JSON object. See the input data formats for examples. You can also use a JSON linter to check your data.
- Error while validating stream: no first example: This error occurs if Prodigy couldn’t find any examples to send out for annotation. Make sure that your data has the correct format – if none of your records have a "text", Prodigy will skip them because it doesn’t know where to find the text, resulting in an empty stream.
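
If you want to rule out the data-format causes quickly, something like the sketch below can scan a .jsonl file and report lines that aren't valid JSON objects or that lack a "text" key. This is only an illustration, not part of Prodigy: the `check_jsonl` name and the `required_key` parameter are made up for this example, and it assumes a text-based task (other task types may not need "text" at all).

```python
import json
from pathlib import Path


def check_jsonl(path, required_key="text"):
    """Return a list of (line_number, problem) tuples for a JSONL file.

    Hypothetical helper for illustration; it is not a Prodigy function.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            # Blank lines are skipped here; that is a choice of this sketch.
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((lineno, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((lineno, "not a JSON object"))
            elif required_key not in record:
                # Assumes a text-based task where each record needs this key.
                problems.append((lineno, f'missing "{required_key}" key'))
    return problems
```

Calling `check_jsonl("your_data.jsonl")` (with your own file path) returns an empty list if every non-blank line parses as a JSON object containing the key.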

I suspect, though, that these common issues aren't the cause here, since with any of them you likely wouldn't have been able to annotate successfully at all.

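To find out more, you'll need Prodigy's debugging/logging features: set the PRODIGY_LOGGING environment variable to basic or verbose when you run the command (PRODIGY_LOGGING=basic or PRODIGY_LOGGING=verbose). Since you're on Windows, where the inline VAR=value prefix isn't available, set the variable first (for example, set PRODIGY_LOGGING=basic in Command Prompt) and then start Prodigy from the same session. Hopefully that gives a better idea of the problem.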
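
Because the error only shows up every 15 minutes or so and the process gets killed afterwards, it may help to keep the terminal output in a file. Here's a minimal sketch of a wrapper that sets PRODIGY_LOGGING for a child process and copies everything it prints to a log file. The `run_with_logging` name and the log path are hypothetical; the only Prodigy-specific part is the environment variable itself.

```python
import os
import subprocess
import sys


def run_with_logging(command, log_path, level="basic"):
    """Run `command` with PRODIGY_LOGGING set and append its output to a file.

    Hypothetical helper for illustration; `level` should be "basic" or "verbose".
    """
    # Copy the current environment and add the logging variable for the child.
    env = dict(os.environ, PRODIGY_LOGGING=level)
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            command,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so tracebacks are captured
            text=True,
        )
        # Echo each line to the console and persist it immediately.
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            log.flush()
        return proc.wait()
```

You'd pass your usual Prodigy command as a list of arguments in `command`, plus a log file path of your choice. After the next "Oops" error, the last lines of that file should contain the traceback that the server printed.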

Also, once you've found more details, be sure to look at some of the past Prodigy support threads on related issues such as running on EC2 and, especially, handling multiple simultaneous users, like this post:

I hope this helps and let us know if you're able to make any more progress or able to resolve the issue!