Stream examples from JSONL


I'm trying to stream different examples to my annotators as follows.

First they should get examples to reannotate, and only when none are available should they get a stream of new, unannotated data.

The first part works but then new examples don't stream - Prodigy is saying no tasks are available.

Please could you look at the code below and advise where I'm going wrong?

The source is a JSONL file.

    def get_stream(source):
        curated_examples = get_curated_examples()
        for eg in curated_examples:
            if eg["answer"] == "reject" and eg["meta"].get("racid", 'Unk') == annotator_id:
                yield eg
        stream = prodigy.get_stream(source)
        return stream 

Many thanks

Hi! I think the return might be the problem here? What happens if you write yield from stream? Also, you probably checked that the call to get_stream there correctly produces new examples, right?

Hi Ines,

Thank you for replying.

I did try 'yield stream' and that didn't work. I have also tried:

    stream = JSONL(source)

I know the jsonl is properly constructed and works with your standard manual ner streaming.

I will try systematically again. I know it's something simple most likely.

Did you try yield from? If you just yield stream, that will yield a generator, which is not what you want. But if you yield from it, it should yield the individual examples. It's the equivalent of:

    for eg in stream:
        yield eg
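
To make the difference concrete, here is a small self-contained sketch that doesn't need Prodigy. The function names and sample data are made up for illustration; `curated()` and `fresh()` just stand in for the reannotation examples and the new stream.

```python
def curated():
    # Stand-in for the curated examples to reannotate (hypothetical data).
    yield {"text": "old 1"}
    yield {"text": "old 2"}


def fresh():
    # Stand-in for the stream of new, unannotated examples (hypothetical data).
    yield {"text": "new 1"}


def stream_with_return():
    yield from curated()
    # Inside a generator function, `return` ends the iteration. The returned
    # value is attached to StopIteration and never reaches the consumer.
    return fresh()


def stream_with_yield():
    yield from curated()
    # Yields the generator object itself as a single item, not its examples.
    yield fresh()


def stream_with_yield_from():
    yield from curated()
    # Delegates to the second generator and yields its examples one by one.
    yield from fresh()


returned = [eg["text"] for eg in stream_with_return()]
yielded = list(stream_with_yield())
delegated = [eg["text"] for eg in stream_with_yield_from()]

print(returned)           # only the curated examples
print(type(yielded[-1]))  # the last item is a generator, not a task dict
print(delegated)          # curated examples followed by the new ones
```

With `return`, the consumer only ever sees the curated examples, which would match the "no tasks available" message once those run out.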

I haven't tried yielding from. I'll give that go. Many thanks Ines

Hi @ines,

I have tried your suggested 'yield from' and it has worked, but there is another part to my challenge.

Checking what get_stream produces I get the following:

stream = prodigy.get_stream(source) 

✨  ERROR: Invalid task format for view ID 'ner_manual'
'tokens' is a required property

{'text': '<reducted>', '_input_hash': -2092322675, '_task_hash': 1274619420, '_session_id': 'reannotated-<reducted>', '_view_id': 'ner_manual'}

It looks to me like the issue is related to the recipe function's arguments, where manual defaults to True:

def reannotate_and_move_on(source, labels_source, annotator_id, n_examples=-1, manual=True):

If I set manual to False, then examples are streamed. The crux is that I need them to be able to annotate.

Could you advise how best to fix this, please?

Thank you


I've solved it with add_tokens like so...

stream = prodigy.get_stream(source)
stream = add_tokens(nlp, stream) # nlp is my custom ner model 
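
For anyone hitting the same error, here is a rough, dependency-free sketch of what that step adds to each task. It is a simplified stand-in, not Prodigy's implementation: `add_whitespace_tokens` is a hypothetical helper that splits on whitespace, whereas add_tokens uses the tokenizer of the nlp object. The token keys shown (text, start, end, id) are my understanding of what 'ner_manual' expects, so treat them as an assumption.

```python
import re


def add_whitespace_tokens(stream):
    """Attach a 'tokens' list to each task, splitting on whitespace.

    Hypothetical helper for illustration only; it is not part of Prodigy.
    """
    for eg in stream:
        eg = dict(eg)  # shallow copy so the incoming task is not mutated
        eg["tokens"] = [
            # Character offsets let the UI map highlighted tokens to spans.
            {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
            for i, m in enumerate(re.finditer(r"\S+", eg["text"]))
        ]
        yield eg


tasks = list(add_whitespace_tokens([{"text": "Hello big world"}]))
for token in tasks[0]["tokens"]:
    print(token)
```

Tasks without a 'tokens' key, like the one in the error message above, fail validation for the manual interface, which is why adding tokens to the new stream fixes it.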

Many thanks

