Custom loader and Stream not compatible?

Hi there,

I am trying to set up a custom interface together with a custom CSV loader. I am following the example recipe for the custom UI quite closely, since it is very similar to what I am trying to achieve: I want my reviewers to tell me whether a piece of text is relevant to them, with some potentially interesting entities highlighted to help them review (for which I think I'll need to use ner.correct instead of ner.manual, but I'm not there yet).

Here is the code I am using:

import prodigy
import csv
from prodigy.components.preprocess import add_tokens
import spacy

@prodigy.recipe("retrieval-validation")
def retrieval(dataset, source):
    # We can use the blocks to override certain config and content, and set
    # "text": None for the choice interface so it doesn't also render the text
    blocks = [
        {"view_id": "ner_manual"},
        {"view_id": "choice", "text":None},
        {"view_id": "text_input", "field_rows": 3, "field_label": "If ambiguous, why?"}
    ]
    options = [
        {"id": 2, "text": "😺 Relevant"},
        {"id": 1, "text": "🙀 Ambiguous"},
        {"id": 0, "text": "😾 Not relevant"}
    ]

    
    def custom_csv_loader(source): 
        with open(source) as csvfile: 
            reader = csv.DictReader(csvfile) 
            for row in reader: 
                r_score = row.get('r_score') 
                text = row.get('query') + '\n\n' + row.get('text')
                s_score = row.get('s_score') 
                pdf = row.get('pdf') 
                yield {'text': text, "options": options,'r_score':r_score,'s_score':s_score, 'meta': {'pdf':pdf}}



    stream = custom_csv_loader(source)  # load stream with custom loader
    nlp = spacy.load("en_ner_jnlpba_md")  # load model
    stream.apply(add_tokens, nlp=nlp, stream=stream)  # tokenize the stream for ner_manual

    return {
        "dataset": dataset,          # the dataset to save annotations to
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "labels": ["DNA", "PROTEIN"],
            "blocks": blocks         # add the blocks to the config
        }
    }


When running this with my csv file I get:

AttributeError: 'generator' object has no attribute 'apply'

When I remove the ner_manual view_id and just load the stream with my custom loader, I have no issue, but as soon as I need to modify the tasks, it breaks.
I am using Prodigy 1.15.

Is there something I am missing to make this work?

Welcome to the forum @MarieCo :wave:

Apologies for the slightly delayed response; for some reason I missed your post until now!

The apply method from the example doesn't work here because it belongs to Prodigy's Stream data structure, while your custom loader returns a plain Python generator object, which has no such method.
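You can reproduce the same error in plain Python, without Prodigy: calling a generator function gives you a generator object, which supports iteration but has no apply attribute. The toy_loader function below is a hypothetical stand-in for your custom_csv_loader.

```python
def toy_loader(texts):
    """Hypothetical stand-in for custom_csv_loader: yields one task per text."""
    for text in texts:
        yield {"text": text}


stream = toy_loader(["first text", "second text"])
message = None

try:
    # The attribute lookup fails before anything is called
    stream.apply(print)
except AttributeError as err:
    message = str(err)

print(type(stream).__name__)  # generator
print(message)                # 'generator' object has no attribute 'apply'
```

The failed lookup does not consume the generator, so next(stream) still returns the first task.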
I admit it might be a bit confusing, because Prodigy's built-in loaders returned generators until version 1.12. With that version we made a complete refactor and introduced the Stream data structure for better tractability of the stream of tasks (some more context). We've updated most of the examples in the docs to use the new Stream API, but we didn't state that explicitly in the custom loaders section (note taken!).
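For comparison, here is a minimal plain-Python sketch of the generator style: each preprocessing step takes a stream and returns a new generator, and you reassign the result instead of calling a method on the stream (older Prodigy examples typically wrote this as stream = add_tokens(nlp, stream)). Both functions below are hypothetical; add_whitespace_tokens only splits on whitespace and is not Prodigy's add_tokens, which tokenizes with the spaCy pipeline you pass in.

```python
import re
from typing import Dict, Iterable, Iterator


def toy_loader(texts: Iterable[str]) -> Iterator[Dict]:
    """Hypothetical stand-in for custom_csv_loader."""
    for text in texts:
        yield {"text": text}


def add_whitespace_tokens(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Hypothetical preprocessing step: lazily adds whitespace-split tokens
    with character offsets. NOT Prodigy's add_tokens (no spaCy involved)."""
    for task in stream:
        tokens = [
            {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
            for i, m in enumerate(re.finditer(r"\S+", task["text"]))
        ]
        yield {**task, "tokens": tokens}


# Generator style: wrap one generator in another and reassign the name
stream = toy_loader(["IL-2 binds the receptor"])
stream = add_whitespace_tokens(stream)

first = next(stream)
print([t["text"] for t in first["tokens"]])  # ['IL-2', 'binds', 'the', 'receptor']
```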

You can still use the "old" generator approach (and skip the apply method), but I'll show you how to edit your code to make it compatible with the new Stream data structure.
The easiest and smallest edit is to convert your generator into a Stream using Stream's .from_iterable method:

from prodigy.components.stream import Stream
stream_as_generator = custom_csv_loader(source) #load stream with custom loader
stream = Stream.from_iterable(stream_as_generator) # convert it into Stream

That should make the rest of your code run as expected - let me know if it doesn't :slight_smile:
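Conceptually, from_iterable wraps the generator in an object that carries methods such as apply. The ToyStream class below is a hypothetical, heavily simplified illustration of that idea, not Prodigy's actual Stream implementation (which does a lot more). It also uses a simpler apply signature than Prodigy's: here the wrapped tasks are passed to func as its first argument, whereas your recipe passes nlp=nlp, stream=stream as keywords.

```python
from typing import Callable, Dict, Iterable, Iterator


class ToyStream:
    """Hypothetical, simplified wrapper around an iterator of task dicts.
    Only illustrates from_iterable + in-place apply; not Prodigy's Stream."""

    def __init__(self, iterator: Iterator[Dict]):
        self._iterator = iterator

    @classmethod
    def from_iterable(cls, iterable: Iterable[Dict]) -> "ToyStream":
        # Accept any iterable (list, generator, ...) and wrap its iterator
        return cls(iter(iterable))

    def apply(self, func: Callable[..., Iterable[Dict]], *args, **kwargs) -> None:
        # Pass the current tasks to func and keep its output, mutating in place
        self._iterator = iter(func(self._iterator, *args, **kwargs))

    def __iter__(self) -> Iterator[Dict]:
        return self._iterator


def toy_loader(texts):
    # Hypothetical stand-in for custom_csv_loader
    for text in texts:
        yield {"text": text}


def add_char_count(stream, field="n_chars"):
    # Hypothetical preprocessing step that adds one field per task
    for task in stream:
        yield {**task, field: len(task["text"])}


stream = ToyStream.from_iterable(toy_loader(["DNA", "PROTEIN"]))
stream.apply(add_char_count)  # no reassignment needed
tasks = list(stream)
print(tasks)  # [{'text': 'DNA', 'n_chars': 3}, {'text': 'PROTEIN', 'n_chars': 7}]
```

In this sketch apply mutates the wrapper in place, mirroring how the recipe calls stream.apply(...) without reassigning stream.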

Hi @magdaaniol , thanks a lot for your answer (and for the welcome)!

Yes, I had seen some elements from your answer in the documentation but wasn't quite sure how to fix it, so thanks for that!
Maybe a stupid question, but I can't figure it out: do I need to import Stream from somewhere? I get a

NameError: name 'Stream' is not defined

with the code you gave me.

Thanks!

Oh, my bad! I should have added it. Yes, it needs to be imported from the prodigy.components.stream module, just like the get_stream function, i.e.:

from prodigy.components.stream import Stream

Thanks! It is what I would have thought, but...

AttributeError: type object 'Stream' has no attribute 'from_iterator'

Maybe I am missing something, but I'm not sure where to look.

Hi @MarieCo ,

Apologies, I made a mistake when typing the snippet. The Stream method name is from_iterable, not from_iterator. I've updated my original response above.

Woop woop, that works!
Thanks again for all your help @magdaaniol , have a great day!
