Is it possible to combine prefer_high_scores and prefer_uncertain so that a combination of high score and mid score documents can be batched out

mrxiaohe · June 24, 2020, 12:38am

I would like to batch out a combination of high-score documents and mid-score documents to annotators (roughly a 9:1 ratio), but I am not sure how it can be done. Currently, my relevant setup is as follows:

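import spacy
from prodigy.components.preprocess import add_tokens
from prodigy.components.sorters import prefer_high_scores

nlp = spacy.load(MODEL_PATH, disable=["parser", "ner"])
stream = get_stream_filter(source, db, dataset)
stream = add_tokens(nlp, prefer_high_scores(model_score(stream, nlp)))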

where the model_score function is defined as follows:

def model_score(stream, nlp):
    # Yield (score, example) tuples, which is the format the sorters consume
    for eg in stream:
        yield (nlp(eg["text"]).cats["COMPLIMENT"], eg)

Any pointers would be greatly appreciated! Thanks.

ines · June 24, 2020, 12:55pm

Hi! I think for a use case like that, you probably just want to write your own sorter function. Under the hood, sorters like prefer_high_scores and prefer_uncertain are just functions that take a stream of (score, example) tuples and decide whether to yield an example. For instance:

def custom_sorter(scored_stream):
    for score, eg in scored_stream:
        # TODO: your conditional logic that decides whether to send 
        # out the example or not
        yield eg
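To make the TODO above concrete, here is a minimal sketch of a sorter that only lets through confidently high-scoring examples and uncertain ones near 0.5. The name band_sorter and the cutoffs (0.8 and above for "high", within 0.1 of 0.5 for "mid") are illustrative assumptions, not anything built into Prodigy, and this version does not try to keep a ratio yet. It just shows the "(score, example) tuples in, examples out" shape described above.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Example = Dict[str, Any]


def band_sorter(
    scored_stream: Iterable[Tuple[float, Example]],
    high_threshold: float = 0.8,  # hypothetical cutoff for "high score"
    mid_width: float = 0.1,       # hypothetical: "mid" means within 0.1 of 0.5
) -> Iterator[Example]:
    """Yield only examples that score high or sit close to 0.5 (uncertain)."""
    for score, eg in scored_stream:
        is_high = score >= high_threshold
        is_mid = abs(score - 0.5) <= mid_width
        if is_high or is_mid:
            yield eg
        # Everything else (low scores, scores between the two bands) is skipped.
```

Because it's a generator, it stays lazy like the rest of the stream and only pulls scored examples as annotators need them.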

Within that function, you can keep any state, like a counter of the high/uncertain scores you sent out previously, to make sure you keep the same ratio. Depending on your data, you might also want to include logic to ensure that you don't get stuck in a suboptimal state and stop sending out examples: for instance, if your model somehow only ends up producing super low scores. (In the built-in sorters, Prodigy uses an exponential moving average.)
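Here is a sketch of that stateful approach for the 9:1 mix from the question. It assumes the same hypothetical thresholds as a simple band filter. It counts how many high and mid examples it has already sent, asks for whichever kind keeps the ratio on track, and skips the rest. To avoid stalling, it uses a plain patience counter instead of the exponential moving average mentioned above. That is a swapped-in, simpler technique, not Prodigy's own logic: after `patience` consecutive skips, it sends the next example regardless of its score. The names mixed_sorter, highs_per_mid and patience are made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Example = Dict[str, Any]


def mixed_sorter(
    scored_stream: Iterable[Tuple[float, Example]],
    highs_per_mid: int = 9,       # hypothetical: target of 9 high per 1 mid
    high_threshold: float = 0.8,  # hypothetical cutoff for "high score"
    mid_width: float = 0.1,       # hypothetical: "mid" means within 0.1 of 0.5
    patience: int = 50,           # hypothetical: max consecutive skips
) -> Iterator[Example]:
    """Send high- and mid-score examples at roughly highs_per_mid : 1."""
    n_high = 0   # high-score examples sent so far
    n_mid = 0    # mid-score (uncertain) examples sent so far
    skipped = 0  # consecutive examples skipped while waiting for the wanted kind
    for score, eg in scored_stream:
        is_high = score >= high_threshold
        is_mid = abs(score - 0.5) <= mid_width
        # Ask for a high example until we have highs_per_mid of them
        # for every mid example (integer arithmetic, no float rounding).
        want_high = n_high < highs_per_mid * (n_mid + 1)
        wanted = is_high if want_high else is_mid
        if wanted or skipped >= patience:
            # Either this is the kind we need, or we've waited long enough
            # and send it anyway so the stream never stalls (e.g. when the
            # model only produces low scores). Out-of-band examples sent via
            # this fallback are not counted towards either bucket.
            if is_high:
                n_high += 1
            elif is_mid:
                n_mid += 1
            skipped = 0
            yield eg
        else:
            skipped += 1
```

In the recipe from the question, this would take the place of prefer_high_scores, e.g. stream = add_tokens(nlp, mixed_sorter(model_score(stream, nlp))). Examples that arrive while the other kind is wanted are dropped, so the output only tracks 9:1 if both kinds keep showing up. With fixed thresholds, it's worth checking that your model's scores actually land in both bands. Making the thresholds follow a running average of the scores would bring it closer to how the built-in sorters adapt.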
