ner.manual does not highlight patterns

Hi prodigy team,

I am trying to train a new entity named skill. I would like to start by using the recipe ner.manual. But somehow it is not highlighting predefined skills from my pattern jsonl. I created a small example:

File: Test-matching.txt

My Name is Horst
I know Eclipse
I do not know Excel
and I am good in r
My Name is Karl
I know nothing
I do not know 
and I am good in guitar playing
My Name is Rondo
I know Abab
I do not know Tensorflow
and I am good in python
My Name is Elisa
I know coding
I do not know programming languages
and I am good in C#
What are all the sentences about
About skills
Who has the most skills?

File: patterns.jsonl

{"label": "skill", "pattern": [{"lower": "Eclipse"}]}
{"label": "skill", "pattern": [{"lower": "Excel"}]}
{"label": "skill", "pattern": [{"lower": "Office"}]}
{"label": "skill", "pattern": [{"lower": "Linux"}]}
{"label": "skill", "pattern": [{"lower": "Windows"}]}
{"label": "skill", "pattern": [{"lower": "Jenkins"}]}
{"label": "skill", "pattern": [{"lower": "API"}]}
{"label": "skill", "pattern": [{"lower": "SQL"}]}
{"label": "skill", "pattern": [{"lower": "TensorFlow"}]}
{"label": "skill", "pattern": [{"lower": "AWS"}]}
{"label": "skill", "pattern": [{"lower": "Docker"}]}
{"label": "skill", "pattern": [{"lower": "pandas"}]}
{"label": "skill", "pattern": [{"lower": "shell"}]}
{"label": "skill", "pattern": [{"lower": "ABAP"}]}
{"label": "skill", "pattern": [{"lower": "python"}]}
{"label": "skill", "pattern": [{"lower": "C#"}]}
{"label": "skill", "pattern": [{"lower": "C++"}]}
{"label": "skill", "pattern": [{"lower": "Java"}]}
{"label": "skill", "pattern": [{"lower": "JavaScript"}]}

Here is the command I run:

prodigy ner.manual skills-db blank:en data/Test-matching.txt -l skill --patterns data/patterns.jsonl

Why is Eclipse not being highlighted? Could you please help me?

Best,

Paul

Hi! I think the problem here is that your pattern can't match: [{"lower": "Eclipse"}] means you're looking for a token whose lowercase form equals "Eclipse". A lowercased string never contains a capital letter, so this can never be true and the pattern will never match. You can either match on the "text" attribute for an exact, case-sensitive match, or match on "lower": "eclipse".
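If you'd rather keep the case-insensitive "lower" matching, one way to repair a whole patterns file at once is to lowercase every "lower" value. Here's a minimal sketch using only the standard library. The helper name lowercase_lower_values is made up for illustration, and it assumes each line is a JSON object with a "pattern" key like in your file:

```python
import json


def lowercase_lower_values(jsonl_text):
    """Return the JSONL text with every "lower" value lowercased.

    Hypothetical helper, not part of Prodigy or spaCy.
    """
    fixed_lines = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        pattern = entry.get("pattern")
        # Token patterns are lists of dicts; plain string patterns are left alone.
        if isinstance(pattern, list):
            for token in pattern:
                if isinstance(token, dict) and isinstance(token.get("lower"), str):
                    token["lower"] = token["lower"].lower()
        fixed_lines.append(json.dumps(entry))
    return "\n".join(fixed_lines)


original = '{"label": "skill", "pattern": [{"lower": "Eclipse"}]}'
print(lowercase_lower_values(original))
```

You could read data/patterns.jsonl, pass its contents through this function, and write the result to a new file before starting ner.manual again.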

You can also test the patterns with spaCy's Matcher or the interactive demo to check whether they match and to find potential problems.
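For example, here's a rough sketch of checking a single pattern with the Matcher. It assumes spaCy v3 (where matcher.add takes a list of patterns) and uses a blank English pipeline like the blank:en in your command. The helper matched_spans is just a name I made up, and the attribute names are written in upper case the way spaCy's docs do:

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained components needed.
nlp = spacy.blank("en")


def matched_spans(pattern, text):
    """Return the text of every span in `text` matched by one token pattern."""
    matcher = Matcher(nlp.vocab)
    matcher.add("skill", [pattern])  # spaCy v3 signature (assumed)
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


text = "I know Eclipse"
# Lowercase form compared against a capitalized value: cannot match.
print(matched_spans([{"LOWER": "Eclipse"}], text))
# Lowercase form compared against a lowercase value: case-insensitive match.
print(matched_spans([{"LOWER": "eclipse"}], text))
# Exact, case-sensitive match on the token text.
print(matched_spans([{"TEXT": "Eclipse"}], text))
```

The first call should come back empty, while the other two should find "Eclipse".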

Thanks Ines, that fixed my problem!
