I’m using textcat.teach and it seems that the active learning is not working. I went over a couple of hundred examples and they are still served in the original file order.
I do see a different score for each example so I assume that there is some kind of learning, maybe it is just the serving?
That’s very strange. The same prefer_uncertain sort function is used in both the textcat.teach and ner.teach recipes, so it’s hard to see at which step things could be going wrong here. The sequence of operations is:
- Stream data through spaCy. The spacy.pipeline.TextCategorizer pipeline component attaches scores in doc.cats.
- Pass through the prodigy.models.textcat.TextClassifier wrapper. This simply translates the doc objects back into a stream of (score, example) tuples (a rough sketch of these two steps follows below).
- The scored stream is then reranked using the prefer_uncertain function.
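To make those first two steps concrete, here is a minimal sketch of the scoring stage in plain Python. It assumes an nlp pipeline that already has a trained textcat component and a stream of example dicts with a "text" field; the helper name score_stream is just for illustration and is not Prodigy's actual code.

def score_stream(nlp, stream, label):
    # The TextCategorizer writes its predictions to doc.cats, a dict
    # mapping each label to a score between 0 and 1.
    for eg in stream:
        doc = nlp(eg["text"])
        score = doc.cats.get(label, 0.0)
        yield score, eg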
prefer_uncertain accepts two strategies for finding “good” examples in a (potentially infinite) stream. The default strategy tracks the exponential moving average of the scores, and emits examples that are better than the average by some amount of variance. In the initial examples the estimates of the average and variance aren’t confident, so it doesn’t do much manipulation of the earliest examples.
The second strategy is simpler: it uses the score as an emission probability, and draws a random number to decide whether to emit the example or filter it out.
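If it helps to see the two ideas side by side, here is a simplified sketch of both strategies. These are not Prodigy's actual implementations (the real prefer_uncertain also tracks the variance of the scores, and the names sort_by_probability and sort_by_ema are made up), but they capture the logic described above.

import random

def sort_by_probability(scored_stream):
    # Use the uncertainty of each score as the probability of emitting it.
    for score, eg in scored_stream:
        uncertainty = 1.0 - abs(score - 0.5) * 2
        if random.random() < uncertainty:
            yield eg

def sort_by_ema(scored_stream, alpha=0.1):
    # Track an exponential moving average of the uncertainty and emit
    # examples that beat the running average. (Simplified: no variance.)
    avg = None
    for score, eg in scored_stream:
        uncertainty = 1.0 - abs(score - 0.5) * 2
        avg = uncertainty if avg is None else alpha * uncertainty + (1 - alpha) * avg
        if uncertainty >= avg:
            yield eg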
It would be helpful if you could try the second strategy by changing stream = prefer_uncertain(stream) to stream = prefer_uncertain(stream, algorithm='probability'). This will tell us whether the problem seems to be in the sorter, or somewhere upstream.
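For reference, that change lives where the recipe wraps the scored stream, roughly like this. The prefer_uncertain import path is the documented one; model here stands for the TextClassifier wrapper mentioned above, and the surrounding recipe code is omitted.

from prodigy.components.sorters import prefer_uncertain

# Default strategy (exponential moving average):
stream = prefer_uncertain(model(stream))

# For debugging, switch to the simpler probability-based strategy:
stream = prefer_uncertain(model(stream), algorithm='probability')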
Another useful thing to check is whether the scores being displayed to you (which are read from eg['meta']['score']) actually match the scores the sorter sees, which come in as a tuple on the stream. That is, the model produces a sequence of (score, example) tuples, and that score is what the sorter responds to. I guess a possible bug would be that the actual scores being used to sort are all 0.5, and the ones being displayed are a lie. I’ve read the code again and that doesn’t seem possible, but hey, it could be the thing.
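One way to check this without digging through Prodigy's internals is to slot a small pass-through generator between the model and the sorter. This is just a debugging sketch: it assumes the (score, example) tuples described above, and that eg['meta']['score'] has already been set by the time the tuple reaches it (if not, only the tuple score will be shown).

def check_scores(scored_stream):
    # Print the score the sorter will see next to the score shown in the UI.
    for score, eg in scored_stream:
        meta_score = eg.get('meta', {}).get('score')
        print('sorter score:', score, '| displayed score:', meta_score)
        yield score, eg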
If the sorter isn’t the problem and the scores really do vary, then I’m truly flummoxed: it doesn’t seem like there’s a third place for the bug to be. Could you also tell me whether you’re using patterns, and whether you’re using the long text mode?
I changed it and the problem still exists.
I'm using patterns and I'm not in long text mode.
I will check the scores and get back with more results.
It must be the patterns interaction — that’s the newest thing we added to the recipe too. Are all the early matches coming off the pattern matcher? They should tell you a pattern number along with the score.
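One quick way to check is to look at the examples you’ve already annotated (for instance, a JSONL export of your dataset) and count how many carry a pattern reference. The file name below is a placeholder, and it assumes pattern matches record the pattern under the example’s "meta", as the teach recipes do; adjust the key if your output looks different.

import json

with open("annotations.jsonl", encoding="utf8") as f:  # placeholder path
    examples = [json.loads(line) for line in f if line.strip()]

from_patterns = [eg for eg in examples if "pattern" in eg.get("meta", {})]
print(f"{len(from_patterns)} of {len(examples)} examples came from patterns")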
No. That's why I suspected the active learning.
Hmm. This doesn’t add up. So, all of the following are true?
- The examples receive different scores from the model
- No examples are skipped
- You’ve tried both the default sorter and the algorithm="probability" sorter
The probability sorter is super simple; if the scores are different, I don’t understand how it could be failing to drop examples. Let’s make this even simpler. Instead of the prefer_uncertain function, wrap the scored stream in this:
import random

def drop_examples(scored_stream):
    for score, example in scored_stream:
        # Always give examples some chance of being emitted
        score = min(0.999, max(0.0001, score))
        certainty = abs(score - 0.5) * 2
        if random.random() >= certainty:
            yield example
If the scores differ, this can’t fail to drop examples, right? So we should mostly get examples with scores close to 0.5.
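To wire that in, the replacement would look roughly like this inside a copy of the recipe, with model being the TextClassifier wrapper described earlier, which yields the (score, example) tuples:

stream = drop_examples(model(stream))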
Another suggestion: if you're not seeing any pattern matches, you might also want to test your patterns against spaCy's tokenization to make sure they match as expected. If your patterns don't actually match and you're doing a textcat.teach cold start with a new category, your model likely won't learn anything meaningful, and Prodigy will keep suggesting fairly random examples. This still doesn't explain the behaviour you describe, but it can easily make the whole thing more confusing.
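A quick local check along those lines is to run one of your patterns through spaCy's Matcher directly. The model name, pattern and text below are placeholders (spaCy v3 Matcher API shown):

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # placeholder model
matcher = Matcher(nlp.vocab)
# Placeholder token pattern; paste an entry from your patterns file instead.
matcher.add("CHECK_PATTERN", [[{"LOWER": "machine"}, {"LOWER": "learning"}]])

doc = nlp("I'm really interested in machine learning.")
matches = matcher(doc)
print([doc[start:end].text for _, start, end in matches])
# An empty list means the pattern doesn't match under spaCy's tokenization.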
We've actually just released an experimental tool that should help with testing patterns:
I might have just found my problem; sorry if I wasted your time for nothing.
I created a different input data file with ordered example_id values to test it again, and I see that some examples are skipped (1-2 examples each time). The old example_id values made it hard to notice such small skips.
Does this behavior sound correct?
@WeinstockShahar No problem! Glad it seems to be working. Now I just hope it does useful things as well!
The default sorter (based on the exponential moving average) takes a while to get confident in its estimates of the first and second moments. So, at first it might not skip many. The interaction with the pattern matcher also complicates things. Once the model warms up, it should be fairly selective in what it asks you.
Thanks a lot