"No tasks available" even though there are still samples left in "mark" mode

Hi,

First, thanks for Prodigy. According to recent feedback from my teammates, they are happy with this new tool compared to the previous tagging tools we have tried, such as brat or our self-designed ones.

One issue we've run into is that Prodigy doesn't iterate through all of the data we provide to prodigy mark. For example, our .jsonl file contains 1000 records, but we can only tag about 800 before the screen shows "No tasks available". I saw this post, and the solution recommended by Ines is to use mark. Am I misunderstanding something about how it works?

Another question is where can I dow
Thanks,
Zhenshan

Thanks for the report – and I'm glad you're finding Prodigy useful so far :blush:

The issue you describe definitely shouldn't happen.

What I meant in my reply was: The teach recipes will select the most relevant examples and thus will skip examples with very high or low predictions, to help you focus on the most important ones. Ideally, this means you'll have to annotate fewer examples in total, while still getting similar results after training.

The mark recipe should go through your examples in order and just ask questions, without making any predictions, selecting examples or modifying the stream. So if you don't want to use the "active learning component" and just want to annotate a fixed set of examples in order, it's generally recommended to use mark instead of teach.

Two possible explanations and solutions I can think of:

  • Can you check and make sure that your examples do not contain any duplicates? Prodigy will assign a unique input hash to each example that comes in, based on the properties (text, spans etc.) and filter out duplicates, to make sure you're not annotating an example twice. So is it possible that your data contains 200 duplicate tasks?
  • Are you setting the --memorize flag when using the mark recipe? Setting the flag will exclude all examples that were already annotated in the same dataset. For example, if you've already annotated 200 examples and stored them in a dataset, and then restart prodigy mark using the same dataset ID, the tasks you've already annotated will be skipped.
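
To check the first possibility, here is a rough sketch that counts repeated records in a JSONL stream. It does not use Prodigy's own hashing: the fingerprint is just a JSON dump of the "text" and "spans" fields, which is an assumption standing in for the input hash, and the function name find_duplicates is made up for illustration. If your tasks use other input fields, you would need to adjust the keys.

```python
import json
from collections import Counter


def find_duplicates(lines, keys=("text", "spans")):
    """Return (total, unique, repeated) for an iterable of JSONL lines.

    `keys` lists the fields treated as the task input. This is only an
    approximation of an input hash, not Prodigy's actual implementation.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Fingerprint built from the chosen input fields only, so that
        # differences in other fields (e.g. "meta") are ignored.
        fingerprint = json.dumps({k: record.get(k) for k in keys}, sort_keys=True)
        counts[fingerprint] += 1
    total = sum(counts.values())
    unique = len(counts)
    repeated = {fp: n for fp, n in counts.items() if n > 1}
    return total, unique, repeated


# Tiny in-memory sample: the third record repeats the first one's text.
sample = [
    '{"text": "first example"}',
    '{"text": "second example"}',
    '{"text": "first example", "meta": {"id": 3}}',
]
total, unique, repeated = find_duplicates(sample)
print(f"{total} records, {unique} unique, {total - unique} would be filtered")
```

To run this on your own data, pass an open file handle instead of the sample list, since iterating over a file yields one line at a time. If the number of unique records comes out at around 800, duplicates are the likely cause.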

Looks like your sentence was somehow cut off?