Passing array into prodigy.serve()

iero · March 21, 2019, 4:25am

Hello

I'm trying to automate a complete training pipeline in Python (without having to launch a bunch of command lines each time).

If I use this function, it works well:

prodigy.serve('ner.teach', 'fr_PRODUCT', 'fr_core_news_sm', 'test.jsonl',
                  None, None, ['PRODUCT'], None, None)

But is it possible to pass an object (i.e. an array of strings) instead of a JSONL file?

I have a function like this that I could use if possible.

import json

def create_jsonl(sentences):
    results = []
    for s in sentences:
        results.append({'text': s, 'meta': {'source': 'My database'}})

    # One JSON object per line (JSONL), returned as a single string
    return '\n'.join([json.dumps(line, ensure_ascii=False) for line in results])

Thanks again for your help

iero · March 21, 2019, 4:45am

I think I found it:

prodigy.serve('custom.ner.teach', 'fr_PRODUCT', 'fr_core_news_sm', sentences, ['PRODUCT'])

@prodigy.recipe('custom.ner.teach',
                dataset=prodigy.recipe_args['dataset'],
                spacy_model=prodigy.recipe_args['spacy_model'],
                database=("Database to connect to", "positional", None, str),
                label=prodigy.recipe_args['label_set'])
def custom_ner_teach(dataset, spacy_model, database, label=None):
    stream = ({'text': row} for row in database)
    components = teach(dataset=dataset, spacy_model=spacy_model,
                       source=stream, label=label)
    return components

Is this correct?

ines · March 21, 2019, 9:21am

Yes, that’s the first thing I would have suggested, too
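For readers following along, here is a minimal sketch of turning an in-memory list into the kind of stream the recipe above builds. The helper name `sentences_to_stream` and the blank-line filtering are assumptions for illustration; only the `text` and `meta` keys come from this thread.

```python
def sentences_to_stream(sentences, source="My database"):
    """Yield one task dict per non-empty sentence (hypothetical helper)."""
    for sentence in sentences:
        text = sentence.strip()
        if text:  # skip blank strings so they never reach the annotation UI
            yield {"text": text, "meta": {"source": source}}


sentences = ["Apple makes iPhones", "   ", "This is a text about David Bowie"]
stream = list(sentences_to_stream(sentences))
print(len(stream))
```

Because the helper is a generator, nothing is written to disk and examples are produced only as they are requested.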

iero · March 21, 2019, 3:02pm

Thanks Ines, I’ll be ready to do the Prodigy support soon

iero · March 21, 2019, 4:25pm

I’m digging into this and I would like to add patterns to my recipe.

I managed to do that using:

@prodigy.recipe('custom.ner.teach',
                dataset=prodigy.recipe_args['dataset'],
                spacy_model=prodigy.recipe_args['spacy_model'],
                database=("Database to connect to", "positional", None, str),
                label=prodigy.recipe_args['label_set'],
                patterns=prodigy.recipe_args['patterns'])
def custom_ner_teach(dataset, spacy_model, database, label, patterns):
    stream = ({'text': row} for row in database)
    components = teach(dataset=dataset, spacy_model=spacy_model,
                       source=stream, label=label, patterns=patterns)
    return components

But is there a way to pass an object (an array) to patterns instead of a file?

Thanks

iero · March 21, 2019, 11:15pm

I dug a little more and it seems there is no way to pass patterns directly, so I will export them to a temporary file.

Maybe this could be an option to add in a future release!
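A minimal sketch of the temporary-file workaround, using only the standard library. The helper name `write_patterns_file` is hypothetical, and the sketch assumes the patterns file is JSONL with one pattern object per line, matching the pattern dicts shown later in this thread.

```python
import json
import os
import tempfile


def write_patterns_file(patterns):
    """Write patterns as JSONL to a temp file and return its path (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", delete=False, encoding="utf-8"
    ) as f:
        for pattern in patterns:
            f.write(json.dumps(pattern, ensure_ascii=False) + "\n")
        return f.name


patterns = [
    {"label": "PRODUCT", "pattern": [{"lower": "apple"}, {"lower": "pay"}]},
    {"label": "ORG", "pattern": [{"lower": "lydia"}]},
]
patterns_path = write_patterns_file(patterns)

# patterns_path can now be passed wherever a patterns file is expected.
# Here we only read it back to show the round trip.
with open(patterns_path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

os.remove(patterns_path)  # clean up once the file is no longer needed
```

`delete=False` keeps the file on disk after the `with` block closes it, so the path stays valid until it is removed explicitly.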

ines · March 22, 2019, 8:58am

Yep, at the moment, it expects to load those from a file.

It's possible that you're at a point now where you find it easier to write your own recipe based on the built-in ner.teach. Then you can load things however you want and also add some of your own logic.

See this page for more details on custom recipes. Also, if you haven't seen it yet, this repo has various recipe scripts, including slightly simplified versions of the built-in recipes with explanations, so you can see what's going on:

The ner.teach example:

github.com/explosion/prodigy-recipes/blob/master/ner/ner_teach.py

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.models.ner import EntityRecognizer
from prodigy.models.matcher import PatternMatcher
from prodigy.components.preprocess import split_sentences
from prodigy.components.sorters import prefer_uncertain
from prodigy.util import combine_models, split_string
import spacy
from typing import List, Optional


# Recipe decorator with argument annotations: (description, argument type,
# shortcut, type / converter function called on value before it's passed to
# the function). Descriptions are also shown when typing --help.
@prodigy.recipe(
    "ner.teach",
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    source=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),

(This file has been truncated.)

iero · March 22, 2019, 1:25pm

Great idea ! Thank you Ines

iero · March 29, 2019, 7:02pm

Hi

I’m a little confused about this decorator thing.

I created a custom ner_teach.py file:

@prodigy.recipe('my.ner.teach',
                dataset=("The dataset to use", "positional", None, str),
                spacy_model=("The base model", "positional", None, str),
                source=("The source data as a JSONL file", "positional", None, str),
                label=("One or more comma-separated labels", "option", "l", split_string),
                patterns=("Optional match patterns", "option", "p", str),
                exclude=("Names of datasets to exclude", "option", "e", split_string),
                unsegmented=("Don't split sentences", "flag", "U", bool)
                )
def my_ner_teach(dataset, spacy_model, database, label, patterns, exclude=None):

    stream = ({'text': row} for row in database)

    nlp = spacy.load(spacy_model)
    model = EntityRecognizer(nlp, label=label)

    matcher = PatternMatcher(nlp).add_patterns(patterns)
    # Combine the NER model and the matcher and interleave their
    # suggestions and update both at the same time
    predict, update = combine_models(model, matcher)

    # Use the prefer_uncertain sorter to focus on suggestions that the model
    # is most uncertain about (i.e. with a score closest to 0.5). The model
    # yields (score, example) tuples and the sorter yields just the example
    stream = prefer_uncertain(predict(stream))

    return {
        'view_id': 'ner',  # Annotation interface to use
        'dataset': dataset,  # Name of dataset to save annotations
        'stream': stream,  # Incoming stream of examples
        'update': update,  # Update callback, called with batch of answers
        'exclude': exclude,  # List of dataset names to exclude
        'config': {  # Additional config settings, mostly for app UI
            'lang': nlp.lang,
            'label': ', '.join(label) if label is not None else 'all'
        }
    }

In this file, I add a list of patterns using PatternMatcher(nlp).add_patterns(patterns). I hope it will work…

But how do I call this function from my main function?

I have this, but of course prodigy.serve() cannot find 'my.ner.teach':

import prodigy
import ner_teach
      
prodigy.serve('my.ner.teach', lang + '_' + label, 'fr_core_news_sm',
              sentences, [label], patterns)

Thanks

ines · March 29, 2019, 9:32pm

The @prodigy.recipe decorator registers the recipe so you can refer to it by its string name my.ner.teach. But in order to do that, it needs to be run.

In the file you’re calling prodigy.serve, are you actually importing the my_ner_teach recipe function?
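To illustrate why the import matters, here is a toy registry written from scratch. It is not Prodigy's implementation; `RECIPES`, `recipe` and `serve` are hypothetical stand-ins that only mimic the behaviour described above: the name becomes known at the moment the decorator runs, and that happens when the module containing it is imported.

```python
RECIPES = {}  # toy registry, standing in for whatever the library keeps internally


def recipe(name):
    """Toy decorator: store the function under `name` when the decorator runs."""
    def register(func):
        RECIPES[name] = func
        return func
    return register


def serve(name, *args):
    """Toy lookup by string name; fails if the decorator never ran."""
    if name not in RECIPES:
        raise KeyError(f"Can't find recipe {name!r}")
    return RECIPES[name](*args)


# In a real project this definition would live in ner_teach.py, and
# `import ner_teach` is what would execute the decorator.
@recipe("my.ner.teach")
def my_ner_teach(dataset):
    return {"dataset": dataset}


result = serve("my.ner.teach", "fr_PRODUCT")
print(result)
```

If the decorated function were never defined or imported, the lookup by name in `serve` would raise instead of returning the recipe's components.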

iero · March 30, 2019, 12:14am

Ok, I’m not far away…

I have the ner_teach.py file:

import prodigy
import spacy

from prodigy.components.sorters import prefer_uncertain
from prodigy.models.matcher import PatternMatcher
from prodigy.models.ner import EntityRecognizer
from prodigy.util import combine_models, split_string


@prodigy.recipe('sentinel.ner.teach',
                dataset=("The dataset to use", "positional", None, str),
                spacy_model=("The base model", "positional", None, str),
                database=("The source data as a JSONL file", "positional", None, str),
                label=("One or more comma-separated labels", "option", "l", split_string),
                patterns=("Optional match patterns", "option", "p", str)
                )
def sentinel_ner_teach(dataset, spacy_model, database, label, patterns):

    print(database)
    stream = ({'text': row} for row in database)

    nlp = spacy.load(spacy_model)
    model = EntityRecognizer(nlp, label=label)

    print(patterns)
    matcher = PatternMatcher(nlp).add_patterns(patterns)
    predict, update = combine_models(model, matcher)

    # predict = model
    # update = model.update
    stream = prefer_uncertain(predict(stream))

    return {
        'view_id': 'ner',
        'dataset': dataset,
        'stream': stream,
        'update': update,
        'config': {
            'lang': nlp.lang,
            'label': ', '.join(label) if label is not None else 'all'
        }
    }

And I call it from another file:

from sentinel.ml.ner_teach import sentinel_ner_teach
(...)
prodigy.serve('sentinel.ner.teach', my_dataset, 'fr_core_news_sm', sentences, [label], patterns)

First problem: my IDE (PyCharm) says that the import is unused and wants to remove it. Any idea how to fix that?

Second problem: I get this error:

 Exception when serving /get_questions
(...)
ValueError: Error while validating stream: no first example. This likely means that your stream is empty.

As you can see in the first code block, if I print patterns, I get:

[{'label': 'ORG', 'pattern': [{'lower': 'lydia'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'google'}, {'lower': 'hangouts'}, {'lower': 'chat'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'watchos'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'amazon'}, {'lower': 'fresh'}]}, {'label': 'ORG', 'pattern': [{'lower': 'bain'}, {'lower': 'capital'}, {'lower': 'ventures'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'apple'}, {'lower': 'news'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'microsoft'}, {'lower': 'windows'}, {'lower': '10'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'apple'}, {'lower': 'pay'}]}]

And database contains a list of sentences:

['La liaison entre la Ami One et les smartphones est annoncée comme important','Un point important à retenir est que la Ami One est une voiture sans permis.', 'Citroën affirme cependant que son véhicule\xa0«\xa0dispose de sa propre signature sonore', 'Citroën a tout de même eu la bonne idée d’installer des sièges légèrement décalés l’un de l’autre, afin d’éviter de gêner les mouvements du conducteur.']

I’ll keep searching, but if you have an idea…

ines · March 30, 2019, 12:10pm

I mean, this is just a warning and your editor trying to help you write better code. In this case, the unused import is intentional. I don't know much about PyCharm, but linters usually let you add a comment to the line to specify that what you did there was intentional and that it should stop warning you about it. The standard Python way of doing this is adding # noqa, but it might be # noinspection in PyCharm?

This is trying to tell you that the stream of incoming examples is invalid. So that's whatever is in sentences (and later used in your recipe as database).

So for some reason, what the recipe returns as the stream is empty. If the data you're passing in isn't empty, it's likely that the model and patterns with the given labels do not produce any candidates to send out.
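One way to narrow this down is to check whether the stream yields anything at each stage (raw input, then after the model and patterns). The sketch below is plain Python and independent of Prodigy; the helper name `peek` is hypothetical. It looks at the first item of a generator without losing it.

```python
from itertools import chain


def peek(stream):
    """Return (first_item, stream) without dropping the first item.

    first_item is None if the stream is empty.
    """
    iterator = iter(stream)
    try:
        first = next(iterator)
    except StopIteration:
        return None, iter(())
    # Put the consumed item back in front of the remaining items
    return first, chain([first], iterator)


database = ["Apple makes iPhones", "This is a text about David Bowie"]
stream = ({"text": row} for row in database)
first, stream = peek(stream)
print("first example:", first)
```

Checking once on the raw stream and once on the filtered stream would show whether the input itself is empty or whether the filtering step produces no candidates.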

iero · March 30, 2019, 3:26pm

Thanks Ines

For the import, I added # noinspection PyUnresolvedReferences and it works.

I checked the examples and inserted this code to test

texts = ['This is a text about David Bowie', 'Apple makes iPhones']
stream = [{'text': text} for text in texts]

patterns = [{'label': 'PERSON', 'pattern': 'David Bowie'},
             {'label': 'ORG', 'pattern': [{'lower': 'apple'}]}]

But same result.
I tried to set the recipe before calling it :

prodigy.set_recipe('sentinel.ner.teach', ner_teach.sentinel_ner_teach)
prodigy.serve('sentinel.ner.teach', my_dataset, 'fr_core_news_sm', sentences, [label], patterns)

Same error…

I think I'm not importing the decorated function the right way, and maybe I have another problem somewhere…

iero · April 1, 2019, 2:25pm

OK,

If I use a file for patterns, it’s fine:

matcher = PatternMatcher(nlp).from_disk(patterns_file)

But using a list doesn't seem to work:

patterns_list = [{'label': 'PERSON', 'pattern': 'David Bowie'},
             {'label': 'ORG', 'pattern': [{'lower': 'apple'}]}]
matcher = PatternMatcher(nlp).add_patterns(patterns_list)

Any idea how to use the add_patterns() method (or another method)?

Thanks

ines · April 1, 2019, 2:51pm

I think the problem with your code here is that your variable matcher now becomes the return value of PatternMatcher.add_patterns. Unlike from_disk (which always returns the object itself), add_patterns is just a regular method that adds the patterns and returns nothing. So you want to be doing something like this:

matcher = PatternMatcher(nlp)  # this is now the actual matcher
matcher.add_patterns(patterns_list)
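The same pitfall can be reproduced with a toy class. `ToyMatcher` below is a hypothetical stand-in, not Prodigy's PatternMatcher; it only mirrors the behaviour described above, where one method returns nothing and another returns the object itself.

```python
class ToyMatcher:
    """Toy stand-in for a matcher; not Prodigy's PatternMatcher."""

    def __init__(self):
        self.patterns = []

    def add_patterns(self, patterns):
        # Mutates the object and implicitly returns None
        self.patterns.extend(patterns)

    def from_list(self, patterns):
        # Hypothetical chainable variant: returns the object itself
        self.patterns.extend(patterns)
        return self


patterns_list = [{"label": "ORG", "pattern": [{"lower": "apple"}]}]

# Chaining on a method that returns nothing leaves you holding None
chained = ToyMatcher().add_patterns(patterns_list)

# Creating the object first keeps a reference to the matcher itself
matcher = ToyMatcher()
matcher.add_patterns(patterns_list)

# Chaining is only safe when the method returns the object
chainable = ToyMatcher().from_list(patterns_list)
```

Passing `None` on as if it were a matcher would explain a downstream stream that never produces examples.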

iero · April 1, 2019, 6:46pm

Thanks a lot Ines, it seems to work now!

I'll continue with my implementation.
