textcat.teach uncertain sorter shows options with score 0

I understood that the uncertain sorter filters examples with scores around 0.50 (for example, from 0.45 to 0.55) (prefer_uncertain: how does it use the stream to pick examples to score?),
but my textcat.teach is showing predictions with a score of 0.

This is happening with the "ema" algorithm.

With the "probability" algorithm, it shows 2-3 predictions per example, but some have scores below 0.05.

Why could this be happening?

Isn't it inefficient to classify the same text 24+ times? (I have 24 labels, but should I have to review the same example-label pair multiple times?)

Sorry for the late reply! I only just spotted this one.

If the question is still relevant, could you share some of your annotation examples together with your custom recipe? That might make it easier for me to provide some extra context.

The issue of the same samples being offered for labeling multiple times continues, but before calling it a bug, some questions:
- How are repeated samples filtered within a dataset?
- If two samples are the "same text" but sample B has one extra space compared to sample A, are they treated as the same sample and deduplicated?
- If I start Prodigy with a file of 100 samples, label them, and then restart with another file of 50 samples, 30 of which were also included in the initial file, are those 30 samples removed/avoided?

Hi @info2000!

For your three new questions, I wasn't sure whether you're asking only about textcat.teach or about Prodigy in general. The answers below should be the same for both.

I'm sorry, I don't understand the question. Can you rephrase?

If you're interested in the default behavior for deduplication, you may find the docs for the filter_duplicates function helpful, as well as the answers below.

Two texts whose only difference is one space would have different values for their input_hash.

Since they have different input_hash values (and therefore also different task_hash values), they would be treated as two different samples.
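
Here's a minimal sketch of what I mean, assuming Prodigy's set_hashes helper with its default settings (the example texts are made up):

```python
from prodigy import set_hashes

# Two tasks whose texts differ only by a single extra space.
eg_a = set_hashes({"text": "the quick brown fox"})
eg_b = set_hashes({"text": "the quick  brown fox"})  # one extra space

# The extra space changes the text, so the input hashes differ and the
# two tasks are treated as different samples rather than deduplicated.
print(eg_a["_input_hash"] == eg_b["_input_hash"])  # False
```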

Let's assume that for both sessions you save annotations to the same Prodigy dataset.

For annotation, records are deduplicated by task_hash by default; see the "exclude_by" field in the Configuration docs. Therefore, if you're doing the same task (e.g., the textcat recipe) that you did for the first 100 samples, only 20 samples would be shown (the 30 repeated tasks would be excluded).

But if the second annotation session used a different task (e.g., a different recipe like ner), then all 50 samples would be shown.
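
To make the exclusion behavior concrete, here's a rough sketch of the kind of filtering that happens, assuming the Database methods described in the docs (the dataset name is hypothetical, and this is not the exact code Prodigy runs internally):

```python
from prodigy import set_hashes
from prodigy.components.db import connect

db = connect()
# Task hashes already stored from the first session's annotations.
seen_task_hashes = set(db.get_task_hashes("my_textcat_dataset"))

def filter_seen(stream):
    for eg in stream:
        eg = set_hashes(eg)
        # With "exclude_by": "task" (the default), a task whose _task_hash
        # already exists in the dataset is skipped, so the 30 repeated
        # examples would not be shown again for the same task.
        if eg["_task_hash"] not in seen_task_hashes:
            yield eg
```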

Sorry we haven't responded to this yet. Are you still interested in an explanation? As Vincent recommended, a reproducible example would make it a lot easier for us to debug.

I've found some past posts that may shed some light on this. While uncommon, it's not impossible to see low scores when using the default ema algorithm:

if you get a long sequence of low-scoring examples, the probability sorter will ask you fewer questions from them, while the exponential moving average sorter will get impatient, and start asking you questions even if the scores are low.
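
To illustrate that "impatience" (this is a purely illustrative sketch, not Prodigy's actual implementation): an exponential-moving-average sorter compares each example against an average of the scores it has recently seen, so a long run of low scores drags the threshold down and low-scoring examples start to pass it.

```python
def ema_sorter(scored_stream, smoothing=0.1):
    """Illustrative EMA-style sorter. `scored_stream` yields (score, example)."""
    avg_uncertainty = None
    for score, example in scored_stream:
        # Uncertainty is highest (1.0) at score 0.5 and lowest (0.0) at 0 or 1.
        uncertainty = 1.0 - abs(score - 0.5) * 2
        if avg_uncertainty is None:
            avg_uncertainty = uncertainty
        # Emit the example if it's at least as uncertain as the recent average.
        if uncertainty >= avg_uncertainty:
            yield example
        # Update the moving average: a long run of confident (low-uncertainty)
        # predictions lowers the bar, so the sorter eventually asks about
        # low-scoring examples anyway.
        avg_uncertainty = (1 - smoothing) * avg_uncertainty + smoothing * uncertainty
```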

As an alternative, you may want to consider using algorithm = 'probability' as the post you originally cited mentioned:

On the other hand, if you know the target class is rare, you want the sorter to “believe” the scores much more. In this case the probability sorter is better.
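
If you want to try that in a custom textcat.teach-style recipe, the change is roughly as sketched below. It assumes prefer_uncertain accepts an algorithm argument ("ema" or "probability") and that the model's predict function yields (score, example) tuples, as in the built-in recipe; please check the docs for your Prodigy version for the exact signature.

```python
from prodigy.components.sorters import prefer_uncertain

def make_stream(predict, stream):
    scored = predict(stream)  # yields (score, example) tuples
    # Switch from the default "ema" sorter to the probability sorter.
    return prefer_uncertain(scored, algorithm="probability")
```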

I also found the discussion here to be helpful: