Greetings,
I trained a multi-label NER Prodigy model for semantic analysis purposes. It seems the entity ruler isn't reading anything from the Prodigy pre-trained model. I am using the code below to load the Prodigy-trained model and the patterns:
import json
import spacy
from spacy.tokenizer import Tokenizer
from spacy.pipeline import EntityRuler
# load the Prodigy-trained pipeline and reset the tokenizer
nlp = spacy.load('./model-best')
nlp.tokenizer = Tokenizer(nlp.vocab)
# read the patterns from the JSONL file
with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]
# create a ruler, add the patterns, then add the component to the pipeline
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)
However, I am getting "UserWarning: [W036] The component 'entity_ruler' does not have any patterns defined" when I try to train the model. To resolve this issue, I tried to manually add the pattern file to the config.cfg file:
[initialize]
components = ["entity_ruler"]

[initialize.components.entity_ruler]
patterns = ["pattern.jsonl"]
And I simply removed this part from the code:
with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)
However, it did not work. I would really appreciate it if you could let me know where I am going wrong in loading the Prodigy model.
My config file currently looks like this:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["tok2ves","ner"]
batch_size = 1000
disable = [ ]
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = "incorrect_spans"
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
Thanks.