OK, I'm not far away…
I have the ner_teach.py file:
import prodigy
import spacy
from prodigy.components.sorters import prefer_uncertain
from prodigy.models.matcher import PatternMatcher
from prodigy.models.ner import EntityRecognizer
from prodigy.util import combine_models, split_string


@prodigy.recipe('sentinel.ner.teach',
    dataset=("The dataset to use", "positional", None, str),
    spacy_model=("The base model", "positional", None, str),
    database=("The source data as a JSONL file", "positional", None, str),
    label=("One or more comma-separated labels", "option", "l", split_string),
    patterns=("Optional match patterns", "option", "p", str)
)
def sentinel_ner_teach(dataset, spacy_model, database, label, patterns):
    print(database)
    stream = ({'text': row} for row in database)
    nlp = spacy.load(spacy_model)
    model = EntityRecognizer(nlp, label=label)
    print(patterns)
    matcher = PatternMatcher(nlp).add_patterns(patterns)
    predict, update = combine_models(model, matcher)
    # predict = model
    # update = model.update
    stream = prefer_uncertain(predict(stream))
    return {
        'view_id': 'ner',
        'dataset': dataset,
        'stream': stream,
        'update': update,
        'config': {
            'lang': nlp.lang,
            'label': ', '.join(label) if label is not None else 'all'
        }
    }
And I call it from another file:
from sentinel.ml.ner_teach import sentinel_ner_teach
(...)
prodigy.serve('sentinel.ner.teach', my_dataset, 'fr_core_news_sm', sentences, [label], patterns)
First problem: my IDE (PyCharm) says that the import is unused and wants to remove it. Any idea how to fix that?
Second problem: I get this error:
Exception when serving /get_questions
(...)
ValueError: Error while validating stream: no first example. This likely means that your stream is empty.
As you can see in the first code block, if I print `patterns`, I get:
[{'label': 'ORG', 'pattern': [{'lower': 'lydia'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'google'}, {'lower': 'hangouts'}, {'lower': 'chat'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'watchos'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'amazon'}, {'lower': 'fresh'}]}, {'label': 'ORG', 'pattern': [{'lower': 'bain'}, {'lower': 'capital'}, {'lower': 'ventures'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'apple'}, {'lower': 'news'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'microsoft'}, {'lower': 'windows'}, {'lower': '10'}]}, {'label': 'PRODUCT', 'pattern': [{'lower': 'apple'}, {'lower': 'pay'}]}]
And dataset contains a list of sentences:
['La liaison entre la Ami One et les smartphones est annoncée comme important','Un point important à retenir est que la Ami One est une voiture sans permis.', 'Citroën affirme cependant que son véhicule\xa0«\xa0dispose de sa propre signature sonore', 'Citroën a tout de même eu la bonne idée d’installer des sièges légèrement décalés l’un de l’autre, afin d’éviter de gêner les mouvements du conducteur.']
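To rule out the input side of the "stream is empty" error, here is a minimal sketch in plain Python that checks whether the generator built from the sentences yields anything before it reaches the model and sorter. The `peek` helper is hypothetical (it is not part of Prodigy), and the two sentences are shortened stand-ins for the list above. This does not identify the cause; it only shows whether the raw stream has a first example.

```python
from itertools import chain


def peek(stream):
    """Return (first_item, stream) without consuming the first item.

    Hypothetical helper, not a Prodigy function. first_item is None
    when the stream is empty.
    """
    iterator = iter(stream)
    try:
        first = next(iterator)
    except StopIteration:
        return None, iter(())
    # Put the first item back in front of the remaining items.
    return first, chain([first], iterator)


# Shortened stand-ins for the real sentences.
sentences = [
    "Un point important à retenir est que la Ami One est une voiture sans permis.",
    "Citroën affirme cependant que son véhicule dispose de sa propre signature sonore",
]

# Same construction as in the recipe: one {'text': ...} dict per row.
stream = ({"text": row} for row in sentences)
first, stream = peek(stream)
print(first)

# Note: if a plain string were passed instead of a list, the same
# generator would yield one dict per character, not per sentence.
first_from_str, _ = peek({"text": row} for row in "abc")
print(first_from_str)
```

If `first` is a proper `{'text': ...}` dict here, the raw input is fine and the examples are presumably being lost later, somewhere between `predict(stream)` and `prefer_uncertain`.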
I'll continue to search, but if you have an idea…