I would like to send annotators a mix of high-scoring and mid-scoring documents (roughly a 9:1 ratio), but I am not sure how to do this. My current setup is as follows:
    nlp = spacy.load(MODEL_PATH, disable=["parser", "ner"])
    stream = get_stream_filter(source, db, dataset)
    stream = add_tokens(nlp, prefer_high_scores(model_score(stream, nlp)))
The model_score function is defined as follows:
    def model_score(stream, nlp):
        for eg in stream:
            yield (nlp(eg["text"]).cats["COMPLIMENT"], eg)
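To make the goal concrete, here is a rough sketch of the kind of mixing I have in mind. It is plain Python, not part of Prodigy or spaCy: the name mix_by_score, the thresholds (0.8 and 0.4), and the ratio parameter are all made up for illustration. It takes the same (score, example) tuples that model_score yields, so it could presumably stand in for prefer_high_scores in the setup above, though I have not verified that this is the right approach.

```python
from collections import deque


def mix_by_score(scored_stream, high_threshold=0.8, mid_threshold=0.4, ratio=9):
    """Yield examples in a repeating pattern of `ratio` high-scoring
    examples followed by one mid-scoring example.

    All names and default values here are hypothetical placeholders.
    `scored_stream` is expected to yield (score, example) tuples.
    """
    high, mid = deque(), deque()
    high_in_a_row = 0  # high-scoring examples emitted since the last mid one

    for score, eg in scored_stream:
        # Sort the incoming example into a bucket; low scores are dropped.
        if score >= high_threshold:
            high.append(eg)
        elif score >= mid_threshold:
            mid.append(eg)

        # Emit as much as the pattern allows with what is buffered so far.
        while True:
            if high_in_a_row < ratio:
                if not high:
                    break
                yield high.popleft()
                high_in_a_row += 1
            else:
                if not mid:
                    break
                yield mid.popleft()
                high_in_a_row = 0
    # Note: anything still buffered when the stream ends is not emitted,
    # and the buffers are unbounded if one bucket fills much faster.
```

The usage would then be something like stream = add_tokens(nlp, mix_by_score(model_score(stream, nlp))). The ratio is exact only while both buckets keep receiving examples; if mid-scoring documents are rare, the stream stalls after nine high-scoring ones until the next mid-scoring one arrives.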
Any pointers would be greatly appreciated! Thanks.