Resolving UserWarning: [W036] The component 'entity_ruler' does not have any patterns defined

Greetings,

I trained a multi-label NER Prodigy model for semantic analysis. It seems the entity ruler isn't reading anything from the Prodigy pre-trained model. I am using the commands below to load the Prodigy-trained model and the patterns:

import json

import spacy
from spacy.tokenizer import Tokenizer
from spacy.pipeline import EntityRuler

nlp = spacy.load('./model-best')
nlp.tokenizer = Tokenizer(nlp.vocab)

with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]

ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)

However, I am getting "UserWarning: [W036] The component 'entity_ruler' does not have any patterns defined" when I train the model. To resolve this issue, I tried to add the patterns file manually in the config.cfg file:

[initialize]
components = ["entity_ruler"]

[initialize.components.entity_ruler]
patterns = ["pattern.jsonl"]

And simply removed this part from the code:

with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]

ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)

However, it did not work. I would appreciate it if you could let me know where I am going wrong in loading the Prodigy model.
My config file currently looks like this:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000
disable = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = "incorrect_spans"
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

Thanks.

Hi @AmirNickkar,

I think the problem is that your current training config is not initializing the entity ruler with your patterns.
I see you tried to do just that, but the syntax you used is incorrect. The right way would be:

[initialize.components.entity_ruler]

[initialize.components.entity_ruler.patterns]
@readers = "srsly.read_jsonl.v1"
path = "corpus/entity_ruler_patterns.jsonl"

Alternatively, you can add it to your pipeline before training:

entity_ruler = nlp.add_pipe("entity_ruler")
entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)

But then in the config you should explicitly source entity_ruler from your custom pipeline.
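A minimal sketch of that sourcing, assuming you saved the pipeline to disk after adding and initializing the ruler (for example with `nlp.to_disk("./pipeline_with_ruler")`; the directory name here is illustrative, not something from your setup):

```ini
[components.entity_ruler]
source = "./pipeline_with_ruler"
```

With this, training loads the already-initialized entity_ruler (patterns included) from your saved pipeline instead of creating a fresh, empty one.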

Finally, one thing to double-check is whether your patterns file follows the right syntax.
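As a quick stdlib-only sanity check along those lines (the helper name and messages below are my own, not spaCy API): each line of the JSONL should be a JSON object with a "label" and a "pattern" key, where "pattern" is either a phrase string or a list of token-attribute dicts.

```python
import json

def validate_patterns(lines):
    """Return a list of human-readable problems found in JSONL pattern lines."""
    problems = []
    for i, line in enumerate(lines, start=1):
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append(f"line {i}: invalid JSON ({e})")
            continue
        if "label" not in obj or "pattern" not in obj:
            problems.append(f"line {i}: missing 'label' or 'pattern' key")
        elif not isinstance(obj["pattern"], (str, list)):
            problems.append(f"line {i}: 'pattern' must be a string or a list of dicts")
    return problems

good = '{"label": "ORG", "pattern": [{"LOWER": "openai"}]}'
bad = '{"label": "ORG"}'
print(validate_patterns([good, bad]))  # flags only line 2
```

Running this over your pattern.jsonl before training should surface malformed lines early, before the ruler silently ends up with no patterns.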
