Greetings,
I trained a multi-label NER Prodigy model for semantic analysis purposes. It seems the entity ruler isn't reading anything from the Prodigy pre-trained model. I am using the code below to load the Prodigy-trained model and the patterns:
import json
import spacy
from spacy.tokenizer import Tokenizer
from spacy.pipeline import EntityRuler
# load the Prodigy-trained pipeline and reset the tokenizer
nlp = spacy.load('./model-best')
nlp.tokenizer = Tokenizer(nlp.vocab)
# read the patterns from the JSONL file
with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]
# create a ruler, add the patterns, then add the component to the pipeline
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)
However, I am getting "UserWarning: [W036] The component 'entity_ruler' does not have any patterns defined" when I try to train the model. To resolve this issue, I tried to manually add the pattern file to the config.cfg file:
[initialize]
components = ["entity_ruler"]

[initialize.components.entity_ruler]
patterns = ["pattern.jsonl"]
And I simply removed this part from the code:
with open('./pattern.jsonl', 'r', encoding='utf-8') as f:
    patterns = [json.loads(line) for line in f]
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(patterns)
nlp.add_pipe("entity_ruler", last=True)
However, it did not work. I would really appreciate it if you could let me know where I am going wrong in loading the Prodigy model.
My config file currently looks like this:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["tok2ves","ner"]
batch_size = 1000
disable = [ ]
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = "incorrect_spans"
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
Thanks.