Only 25 lines loading from my .jsonl stream

I’m trying to train a classifier with my own list of sentences:

$ prodigy textcat.teach discharge_classifier en_core_web_lg sentences_discharge.jsonl --label DISCHARGE

Prodigy loads correctly and lets me classify 25 sentences. After 25 sentences, the interface displays “No Task Available”.

However, there are 99 sentences in sentences_discharge.jsonl

Why doesn’t Prodigy load my complete list of sentences?

I have checked my .jsonl file - all the line endings are consistent (\n), there are no commas at the end of lines, and there are no wrapping [ and ]
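The format checks described above can be sketched in a few lines of Python. This is only a minimal illustration: the helper name `check_jsonl` and the sample lines are made up, and it tests only that each line is a JSON object with a "text" key, which may not cover everything Prodigy's own loader expects.

```python
import json


def check_jsonl(lines):
    """Return (valid_count, problems) for an iterable of JSONL lines.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    valid = 0
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no task
        try:
            record = json.loads(stripped)
        except json.JSONDecodeError as err:
            # Catches trailing commas, unbalanced brackets, and similar issues
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict) or "text" not in record:
            problems.append((number, "not an object with a 'text' key"))
            continue
        valid += 1
    return valid, problems


# Made-up sample lines: one good line, one with a trailing comma, one JSON array
sample = [
    '{"text": "Patient was discharged home."}',
    '{"text": "Follow up in two weeks."},',
    '["not", "a", "task"]',
]
valid, problems = check_jsonl(sample)
print(valid, problems)
```

To run it on the real file, pass an open file handle instead of the sample list, for example `check_jsonl(open("sentences_discharge.jsonl", encoding="utf8"))`. If it reports 99 valid lines and no problems, the file format is unlikely to be the cause.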

Edit: It stops at 10 sentences if I use a .txt file
This appears to be a bit non-deterministic - it has stopped at 26 sentences and 8 sentences as well, for the same file. Am I somehow misunderstanding how the annotator works with the text stream?

Hi! One thing to keep in mind when using the active learning-powered recipes like textcat.teach is that they don’t necessarily show you all examples you’re loading in. The main concept behind the active learning approach is to show you the most relevant examples for annotation using the model’s predictions. Under the hood, Prodigy uses an exponential moving average of the scores to decide whether to send an example out for annotation or not.

So based on the annotation decisions you make, the model state and the annotations that are already in the database, you’ll see different suggestions. That’s also why the active learning recipes typically work best if you have very large volumes of raw data and want to find the best possible examples.
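To give a rough idea of why only some examples come through, here is a simplified sketch of filtering a scored stream against an exponential moving average. It is not Prodigy's actual implementation: the function name `filter_by_uncertainty`, the `alpha` smoothing value, the uncertainty formula and the scores are all assumptions chosen for illustration.

```python
def filter_by_uncertainty(scored_stream, alpha=0.5):
    """Yield examples whose uncertainty is at least the running average.

    Illustrative sketch only; the names and constants are assumptions,
    not Prodigy's internal code.
    """
    ema = 0.0  # exponential moving average of the uncertainty seen so far
    for score, example in scored_stream:
        # Uncertainty is 1.0 when the score is 0.5 and 0.0 when it is 0 or 1
        uncertainty = 1.0 - abs(score - 0.5) * 2.0
        if uncertainty >= ema:
            yield example
        # Update the average with every example, shown or skipped
        ema = alpha * uncertainty + (1.0 - alpha) * ema


# Made-up model scores for six sentences
stream = [(0.5, "a"), (0.9, "b"), (0.95, "c"), (0.45, "d"), (0.99, "e"), (0.6, "f")]
shown = list(filter_by_uncertainty(stream))
print(shown)
```

With these made-up scores, only three of the six sentences are sent out. In the real recipe the model is also updated as you annotate, so the scores and the set of examples you see can differ from one session to the next.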

If you’re looking to just label all examples in your dataset as they come in, you probably want to be using a recipe like textcat.manual instead.

If I use textcat.manual, I get an error:

$ prodigy textcat.manual sentences_discharge en_core_web_lg sentences_discharge.jsonl --label "#Discharge"

 Using 1 labels: #Discharge
 ERROR: Invalid task format for view ID 'classification'
'label' is a required property 

Each line of my file sentences_discharge.jsonl has the form {"text":"sometext"}. This is the list of data I want to classify. I do have other files with the label property, which I load using --patterns with textcat.teach, but they don't work with textcat.manual, as the --patterns argument does not exist on the manual recipe. What am I doing wrong?

It looks like you might be running into the same bug mentioned here:

@ines has a temporary fix explained in her reply.
