Compare view problem with progress

Hi,

When trying to use the compare view, I see that the progress never updates.
I tried to take ner.eval-ab and just add a constant progress, but I can’t see it reflected in the UI.
Can you please check this out?

Thanks,
Beka

This is the (very slightly modified, just adding a constant progress) version of ner.eval-ab I tried:

@recipe(
    "ner.eval-ab",
    dataset=recipe_args["dataset"],
    before_model=recipe_args["spacy_model"],
    after_model=recipe_args["spacy_model"],
    source=recipe_args["source"],
    api=recipe_args["api"],
    loader=recipe_args["loader"],
    label=recipe_args["label_set"],
    exclude=recipe_args["exclude"],
    unsegmented=recipe_args["unsegmented"],
)
def ab_evaluate(
    dataset,
    before_model,
    after_model,
    source=None,
    api=None,
    loader=None,
    label=None,
    exclude=None,
    unsegmented=False,
):
    """
    Evaluate an NER model and build an evaluation set from a stream.
    """
    print("RECIPE: Starting CUSTOM recipe ner.eval-ab", locals())

    def get_task(i, text, ents, name):
        spans = [{"start": s, "end": e, "label": L} for s, e, L in ents]
        task = {
            "id": i,
            "input": {"text": text},
            "output": {"text": text, "spans": spans},
        }
        task[INPUT_HASH_ATTR] = murmurhash.hash(name + str(i))
        task[TASK_HASH_ATTR] = murmurhash.hash(name + str(i))
        return task

    def get_tasks(model, stream, name):
        tuples = ((eg["text"], eg) for eg in stream)
        for i, (doc, eg) in enumerate(model.nlp.pipe(tuples, as_tuples=True)):
            ents = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
            if model.labels:
                ents = [seL for seL in ents if seL[2] in model.labels]
            task = get_task(i, eg["text"], ents, name)
            yield task

    before_model = EntityRecognizer(spacy.load(before_model), label=label)
    after_model = EntityRecognizer(spacy.load(after_model), label=label)
    stream = list(
        get_stream(
            source, api=api, loader=loader, rehash=True, dedup=True, input_key="text"
        )
    )
    if not unsegmented:
        stream = list(split_sentences(before_model.nlp, stream))
    before_stream = list(get_tasks(before_model, stream, "before"))
    after_stream = list(get_tasks(after_model, stream, "after"))
    stream = list(get_compare_questions(before_stream, after_stream, True))
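    # Constant progress for testing: with this in use, the UI should always show 50%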
    progress = lambda session, total: 0.5

    return {
        "view_id": "compare",
        "dataset": dataset,
        "stream": stream,
        "on_exit": printers.get_compare_printer("Before", "After"),
        "exclude": exclude,
        "progress": progress,
    }

So just to make sure I understand this correctly: When you just run the recipe normally (without a custom progress), the progress just stays at 0?

When observing the progress, one thing to keep in mind is that it's calculated on the server and updated whenever new answers are sent back. This is so that it can be based on things like the loss reported as the model is updated. It will always take at least one batch of answers for the progress to update. If you want the updates to be sent more quickly, you can set a lower batch size.
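For example, a minimal sketch of that, assuming everything else in the recipe above stays the same (the batch size of 5 is just an arbitrary example value), would be to return a "config" entry alongside the other components:

    return {
        "view_id": "compare",
        "dataset": dataset,
        "stream": stream,
        "on_exit": printers.get_compare_printer("Before", "After"),
        "exclude": exclude,
        "progress": progress,
        # Smaller batches mean answers are sent back (and progress is
        # recalculated) more often; 5 is just an example value.
        "config": {"batch_size": 5},
    }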

Ah, I think there's an interesting edge case here: If a stream exposes a __len__ attribute (e.g. if it's a list that has a length, as opposed to a generator), Prodigy will use that to calculate the progress based on the stream length vs. the number of annotations. Otherwise, it falls back to the progress function. This is probably not ideal, because in your case, it looks like it's using the stream length and not the custom function.

A workaround for now could be to explicitly return a generator instead of a list, e.g. by adding stream = (eg for eg in stream).
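Applied to the recipe above, that could look roughly like this (only the lines around the return statement change):

    # Build the compare questions as before...
    stream = list(get_compare_questions(before_stream, after_stream, True))
    # ...but hand Prodigy a plain generator: it has no __len__, so Prodigy
    # falls back to the custom progress function instead of the stream length.
    stream = (eg for eg in stream)
    progress = lambda session, total: 0.5

    return {
        "view_id": "compare",
        "dataset": dataset,
        "stream": stream,
        "on_exit": printers.get_compare_printer("Before", "After"),
        "exclude": exclude,
        "progress": progress,
    }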

Thanks a lot Ines! I didn't realize the progress is only updated when hitting save, so I thought it never updates :man_facepalming:
Regarding __len__ vs. progress, I can just rely on __len__ for now, but your trick is useful nonetheless.
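For reference, relying on __len__ just means keeping the stream as a list and dropping the custom progress entry (a sketch based on the recipe above):

    stream = list(get_compare_questions(before_stream, after_stream, True))

    return {
        "view_id": "compare",
        "dataset": dataset,
        "stream": stream,  # a list with __len__, so progress = annotations / len(stream)
        "on_exit": printers.get_compare_printer("Before", "After"),
        "exclude": exclude,
        # no "progress" key needed in this case
    }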
