AnnaAnia
(Anna Ania)
June 25, 2020, 12:06pm
1
Hi,
I'm trying to stream different examples to my annotators as follows.
First they should get examples to re-annotate, and only when none are available should they get a stream of new, unannotated data.
The first part works but then new examples don't stream - Prodigy is saying no tasks are available.
Please could you look at the code below and advise where I'm going wrong?
Source is a JSONL
    def get_stream(source):
        curated_examples = get_curated_examples()
        for eg in curated_examples:
            if eg["answer"] == "reject" and eg["meta"].get("racid", 'Unk') == annotator_id:
                yield eg
        stream = prodigy.get_stream(source)
        return stream
Many thanks
ines
(Ines Montani)
June 26, 2020, 3:16pm
2
Hi! I think the `return` might be the problem here? What happens if you write `yield from stream`? Also, you probably checked that the call to `get_stream` there correctly produces new examples, right?
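One way to check that a loader produces examples is to peek at the first few items and then put them back, so nothing is lost from the stream. This is a minimal sketch in plain Python: `peek` and `fake_stream` are hypothetical helpers, and `fake_stream` only stands in for whatever the real `get_stream(source)` call returns.

```python
from itertools import chain, islice


def fake_stream():
    # Hypothetical stand-in for the real loader; yields dicts with a "text" key.
    for text in ["first", "second", "third"]:
        yield {"text": text}


def peek(stream, n=2):
    # Take the first n items, then stitch them back onto the rest of the stream.
    stream = iter(stream)
    head = list(islice(stream, n))
    return head, chain(head, stream)


head, stream = peek(fake_stream())
print(head)  # an empty list here would mean the loader produced nothing
remaining = list(stream)  # the peeked items are still part of the stream
```

Printing the generator itself only shows a generator object, so looking at the first items like this tends to be more informative.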
AnnaAnia
(Anna Ania)
June 26, 2020, 3:26pm
3
Hi Ines,
Thank you for replying.
I did try 'yield stream' and that didn't work. I have also tried
stream = JSONL(source)
I know the jsonl is properly constructed and works with your standard manual ner streaming.
I will try systematically again. I know it's something simple most likely.
ines
(Ines Montani)
June 26, 2020, 3:52pm
4
Did you try `yield from`? If you just `yield stream`, that will yield a generator, which is not what you want. But if you `yield from` it, it should yield the individual examples. It's the equivalent of:
    for eg in stream:
        yield eg
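The difference between the three variants can be seen in a small plain-Python sketch. All function names here are hypothetical stand-ins: `curated` plays the role of the examples to re-annotate and `new_examples` the role of the stream of unannotated data.

```python
def curated():
    # Hypothetical stand-in for the examples queued for re-annotation.
    yield {"text": "redo 1"}


def new_examples():
    # Hypothetical stand-in for the stream of new, unannotated examples.
    yield {"text": "new 1"}
    yield {"text": "new 2"}


def with_return():
    yield from curated()
    # Inside a generator, a returned value is not iterated over by the consumer.
    return new_examples()


def with_yield():
    yield from curated()
    # Yields the generator object itself as a single item.
    yield new_examples()


def with_yield_from():
    yield from curated()
    # Yields each new example in turn.
    yield from new_examples()


returned = list(with_return())
yielded = list(with_yield())
delegated = list(with_yield_from())
```

Only `with_yield_from` passes the new examples through one by one. With `return`, the consumer sees the curated examples and then the stream simply ends, which would match the "no tasks available" behaviour described above.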
AnnaAnia
(Anna Ania)
June 26, 2020, 3:58pm
5
I haven't tried yielding from. I'll give that a go. Many thanks Ines
AnnaAnia
(Anna Ania)
June 29, 2020, 7:53am
6
Hi @ines ,
I have tried your suggested 'yield from' and it has worked, but there is another part to my challenge.
Checking what get_stream produces I get the following:
    stream = prodigy.get_stream(source)
    print(stream)

    ✨ ERROR: Invalid task format for view ID 'ner_manual'
    'tokens' is a required property
    {'text': '<reducted>', '_input_hash': -2092322675, '_task_hash': 1274619420, '_session_id': 'reannotated-<reducted>', '_view_id': 'ner_manual'}
This looks to me like the issue is in the function arguments, where manual is True:
    def reannotate_and_move_on(source, labels_source, annotator_id, n_examples=-1, manual=True):
If I set manual to false, then examples are streamed. The crux is, I need them to be able to annotate.
Could you advise how best to fix this, please?
Thank you
Anna
AnnaAnia
(Anna Ania)
June 29, 2020, 8:09am
7
I've solved it with add_tokens like so...
    stream = prodigy.get_stream(source)
    stream = add_tokens(nlp, stream) # nlp is my custom ner model
Many thanks
Anna
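For readers wondering what the missing 'tokens' property looks like, here is a rough plain-Python sketch. It does not use Prodigy or spaCy: `add_simple_tokens` is a hypothetical whitespace tokenizer standing in for `add_tokens`, and the token keys used here (`text`, `start`, `end`, `id`) are an assumption about the expected format rather than something confirmed in this thread.

```python
import re


def add_simple_tokens(stream):
    # Hypothetical stand-in for add_tokens: splits on whitespace instead of
    # using a spaCy tokenizer, and records character offsets for each token.
    for eg in stream:
        eg = dict(eg)  # copy so the incoming example is not modified
        eg["tokens"] = [
            {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
            for i, m in enumerate(re.finditer(r"\S+", eg["text"]))
        ]
        yield eg


examples = list(add_simple_tokens([{"text": "Anna lives in Leeds"}]))
tokens = examples[0]["tokens"]
print(tokens[0])
```

In a real recipe the tokenization should come from the same `nlp` object that is used elsewhere, as in the post above, so that token boundaries stay consistent with the model.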