Adding column from CSV into meta with custom loader

I am trying to add a column from a CSV file to the meta field so that the annotator can see the “id” of each item. I have a custom loader, based on your discussion in the thread “Template for Prodigy corpus and API loaders”.

I am trying the following:

class CustomCsvLoader(object):

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        df = pd.read_csv(self.filename)
        for _, row in df.iterrows():
            text = row['text']
            id_text = row['id']
            yield {"text": text, "meta": {"id": id_text}}

@recipe('ner.custom-gold',
        dataset=recipe_args['dataset'],
        spacy_model=recipe_args['spacy_model'],
        source=recipe_args['source'],
        api=recipe_args['api'],
        label=recipe_args['label_set'],
        exclude=recipe_args['exclude'],
        unsegmented=recipe_args['unsegmented'])
def custom_gold(dataset, spacy_model, source=None, api=None,
              label=None, exclude=None, unsegmented=False):
    result = make_gold(dataset, spacy_model, source=None, api=None, loader=CustomCsvLoader(source),
        label=label, exclude=exclude, unsegmented=unsegmented)

    return result

It doesn’t seem that I am using the loader correctly. The following error is returned:

  File "/venvs/dev3.6/lib/python3.6/site-packages/prodigy/recipes/ner.py", line 199, in make_gold
    dedup=True, input_key='text')
  File "cython_src/prodigy/components/loaders.pyx", line 51, in prodigy.components.loaders.get_stream
  File "cython_src/prodigy/components/loaders.pyx", line 104, in prodigy.components.loaders.get_loader
ValueError: No loader found for '<recipe.CustomCsvLoader object at 0x7f4938369710>'

Any help you can offer would be much appreciated!!

Your loader code looks good, but I think the problem here is this part:

loader=CustomCsvLoader(source)

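The loader argument expects a loader ID, i.e. a string that Prodigy resolves to one of the existing loaders (or to a custom loader registered via entry points). For example, 'csv' selects the default CSV loader. An instance of your class is not an ID, which is why the lookup fails with "No loader found".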
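To make the ID lookup concrete, here is a toy model of that kind of resolution. It is not Prodigy's actual code: the LOADERS registry, load_csv and this get_loader are hypothetical names written only to show why a string resolves and a loader instance does not.

```python
import csv


def load_csv(path):
    """Yield one dict per CSV row."""
    with open(path, newline="", encoding="utf8") as f:
        yield from csv.DictReader(f)


# Hypothetical registry: loader ID (a string) -> loader function.
LOADERS = {"csv": load_csv}


def get_loader(loader_id):
    """Look up a loader by its string ID (toy version, not Prodigy's)."""
    if not isinstance(loader_id, str) or loader_id not in LOADERS:
        raise ValueError(f"No loader found for '{loader_id}'")
    return LOADERS[loader_id]


class CustomCsvLoader:
    """Placeholder for the loader class from the question."""


get_loader("csv")  # a known ID resolves to load_csv
try:
    get_loader(CustomCsvLoader())  # an instance is not an ID
except ValueError as err:
    print(err)  # same kind of message as in the traceback
```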

In your case, you already have the loader, and CustomCsvLoader(source) is the loaded source, i.e. the data stream itself. So you can pass it in as the source argument, or overwrite the stream on the components returned by the recipe: result['stream'] = CustomCsvLoader(source).
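A minimal sketch of the second option, written so it runs without Prodigy installed. The function fake_make_gold is a hypothetical stand-in for the built-in recipe function and only returns a components dict with a 'stream' key. The loader uses the standard library's csv module instead of pandas, which is a substitution made here to keep the example self-contained.

```python
import csv
import os
import tempfile


class CustomCsvLoader:
    """Iterable that yields one task per CSV row, with the id in 'meta'."""

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, newline="", encoding="utf8") as f:
            for row in csv.DictReader(f):
                yield {"text": row["text"], "meta": {"id": row["id"]}}


def fake_make_gold(dataset):
    # Hypothetical stand-in for the built-in recipe: it returns the
    # recipe components as a dict that includes a 'stream'.
    return {"dataset": dataset, "stream": iter([])}


def custom_gold(dataset, source):
    result = fake_make_gold(dataset)
    # The loader instance already is the stream, so assign it directly
    # instead of passing it as the loader argument.
    result["stream"] = CustomCsvLoader(source)
    return result


# Usage: write a one-row CSV to a temporary file and read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline="", encoding="utf8"
) as tmp:
    tmp.write("id,text\n17,Apple is opening an office in Berlin\n")

components = custom_gold("my_dataset", tmp.name)
tasks = list(components["stream"])
os.remove(tmp.name)
print(tasks)
```

Each task carries the CSV id under "meta", which is where the question wants it to appear for the annotator.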

This worked like a charm. Thank you. Also, I just wanted to express my appreciation for how well the custom recipe hooks in Prodigy are designed. Thank you for such a great product.

@mikeross Thank you so much! It definitely took a few iterations to get the recipe API right and find the best balance of customisability and “just works out-of-the-box”. So it’s nice to hear that the design resonates :smiley: