ner.match

Hi

I have generated the patterns as displayed in the above image

"prodigy ner.match fruits_ner en_core_web_sm /tmp/food_texts.jsonl --patterns /tmp/fruits_patterns.jsonl"

This command gives the error below.

I am new to Prodigy, can you please help?

What does /tmp/food_texts.jsonl look like? Each line needs to be a JSON object with a "text" key.
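If it helps, here is a minimal standard-library sketch for checking that format before handing the file to Prodigy. The function name find_bad_lines and the sample lines are made up for illustration, and skipping blank lines is my own assumption, not documented Prodigy behaviour.

```python
import json


def find_bad_lines(lines):
    """Return (line_number, problem) pairs for lines that are not
    JSON objects with a string "text" key."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are simply ignored here
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            problems.append((number, 'no string "text" key'))
    return problems


# Hypothetical sample lines; the last two are deliberately broken.
sample = [
    '{"text": "Patient reports a peanut allergy."}',
    '{"sentence": "Wrong key name"}',
    "{'text': 'single quotes are not valid JSON'}",
]
problems = find_bad_lines(sample)
for number, problem in problems:
    print(f"line {number}: {problem}")
```

To check a real file, you could pass an open file handle instead of the list, e.g. find_bad_lines(open(path, encoding="utf-8")).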

thank you for your fast reply

this is the actual command i am using

python3 -m prodigy ner.match med_ner en_core_web_sm /home/centos/med_test5.jsonl --pattern /home/centos/disease_only_lower.jsonl

my med_test5 file looks something like this

Ok so /home/centos/med_test5.jsonl looks fine

However, are you sure that your /home/centos/disease_only_lower.jsonl is fine and that your patterns actually produce matches? Looking at your patterns, it seems that the same patterns occur repeatedly.
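A quick way to spot repeated entries is to count them. This is only a sketch using the standard library; duplicate_patterns is a hypothetical helper name and the sample lines are invented, not taken from your file.

```python
import json
from collections import Counter


def duplicate_patterns(lines):
    """Count identical pattern entries, ignoring key order and blank lines."""
    seen = Counter()
    for line in lines:
        if line.strip():
            entry = json.loads(line)
            # Serialise with sorted keys so key order does not matter.
            seen[json.dumps(entry, sort_keys=True)] += 1
    return {entry: count for entry, count in seen.items() if count > 1}


# Hypothetical sample: the first two lines describe the same pattern.
sample = [
    '{"label": "MEDICAL", "pattern": [{"LOWER": "allergy"}]}',
    '{"pattern": [{"LOWER": "allergy"}], "label": "MEDICAL"}',
    '{"label": "MEDICAL", "pattern": [{"LOWER": "cancer"}]}',
]
dups = duplicate_patterns(sample)
for entry, count in dups.items():
    print(f"{count}x {entry}")
```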

I created the pattern file in Notepad with a .jsonl extension, for example:

File name: patterns_for_disease_new_latest2

Can I run this?
python3 -m prodigy ner.match med_ner en_core_web_sm /home/centos/med_test5.jsonl --patterns /home/centos/patterns_for_disease_new_latest2.jsonl

Apart from "allergy" in the first sentence, nothing is definitely there.

What if you create two very simple files using srsly in Python:

import srsly
srsly.write_jsonl('text-inputs.jsonl', [{'text': 'Minimal example'}])
srsly.write_jsonl('pattern-inputs.jsonl', [{'label': 'EX', 'pattern': [{'LOWER': 'example'}]}])

and then run

prodigy ner.match test-dataset en_core_web_sm text-inputs.jsonl --patterns pattern-inputs.jsonl

At least I suspect that there is an issue with your JSONL files. But maybe the Explosion AI team knows better.


I think if the JSON was broken, it'd fail much earlier, so the fact that it loads the stream is a good sign. And that "empty stream" message typically means that there's nothing to suggest for annotation. In the case of ner.match, which only shows matches, this would mean that there are no matches.

The files also look fine to me at first glance, and the patterns make sense. So I agree with @nix411's suggestion: maybe try it with a few texts and a few patterns that you know definitely match? Also make sure to use a new dataset, in case you've already annotated something before (because Prodigy will skip examples you've already annotated by default).
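As a rough pre-check for "no matches", here is a sketch that counts how often each pattern could occur in the texts. It makes loud assumptions: it uses a crude regex tokenizer rather than spaCy's, and it only understands token patterns made up entirely of LOWER keys, so treat a zero as a hint to investigate, not as proof. All function names and sample data are invented for illustration.

```python
import re


def lower_tokens(text):
    # Crude stand-in for spaCy's tokenizer: words and single punctuation marks.
    return [tok.lower() for tok in re.findall(r"\w+|[^\w\s]", text)]


def pattern_to_tokens(pattern):
    """Return the lowercase token sequence for a LOWER-only token pattern,
    or None if the pattern uses anything this sketch does not handle."""
    if not isinstance(pattern, list):
        return None
    tokens = []
    for tok in pattern:
        if not isinstance(tok, dict) or len(tok) != 1:
            return None
        value = tok.get("LOWER", tok.get("lower"))
        if not isinstance(value, str):
            return None
        tokens.append(value)
    return tokens


def count_matches(texts, patterns):
    counts = {}
    for entry in patterns:
        wanted = pattern_to_tokens(entry["pattern"])
        if not wanted:
            continue  # unsupported pattern type, skipped by this sketch
        key = " ".join(wanted)
        counts[key] = 0
        for text in texts:
            toks = lower_tokens(text)
            n = len(wanted)
            counts[key] += sum(toks[i:i + n] == wanted for i in range(len(toks) - n + 1))
    return counts


# Hypothetical texts and patterns.
texts = ["Minimal example", "Patient has a peanut allergy."]
patterns = [
    {"label": "EX", "pattern": [{"LOWER": "example"}]},
    {"label": "MED", "pattern": [{"LOWER": "heat"}, {"LOWER": "shock"}]},
]
counts = count_matches(texts, patterns)
print(counts)
```

Patterns that report zero here are the ones worth looking at first when ner.match says the stream is empty.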

thank you so much, I think there was some issue with the json file creation

spaCy identifies this for me. Is it possible for me to manually mark "heat shock" as the med_rcd entity if spaCy only identifies "shock" as the med_rcd entity?

If you upgrade to the latest Prodigy v1.9, you can also use ner.manual with --patterns instead of ner.match. This will pre-highlight the suggestions from your patterns and let you manually edit them. See here for more details and an example: https://prodi.gy/docs/named-entity-recognition#manual-patterns

Is it possible for me to do both, i.e. use the match, and if the match is wrong, then correct it manually?

Also, can you please tell me: if this is my regex for postal codes (postal_pattern = r"\b([Gg][Ii][Rr] 0[Aa]{2})\b|\b((([A-Za-z][0-9]{1,2})|(([A-Za-z][A-Ha-hJ-Yj-y][0-9]{1,2})|(([A-Za-z][0-9][A-Za-z])|([A-Za-z][A-Ha-hJ-Yj-y][0-9][A-Za-z]?))))\s?[0-9][A-Za-z]{2})\b") for a particular entity, what would the command be in Prodigy?

If you want to use your own custom rules for matching, you can write a custom recipe that adds "spans" with the matched "start" and "end" character offsets to each incoming example. See the documentation for the expected format.

You can use the ner_manual.py example recipe from here as a template, and then add your matching logic to the stream: https://github.com/explosion/prodigy-recipes/blob/master/ner/ner_manual.py Also see the custom recipes docs for more info.
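To make the "spans" idea concrete, here is a minimal sketch of the matching step on its own, using only the standard library. The regex is a simplified, hypothetical postcode pattern (swap in your own), and add_regex_spans and the POSTAL label are names I made up for illustration. It is not a complete recipe.

```python
import re

# Hypothetical, simplified postcode pattern for illustration only.
POSTAL_RE = re.compile(r"\b[A-Za-z]{1,2}[0-9]{1,2}[A-Za-z]?\s?[0-9][A-Za-z]{2}\b")


def add_regex_spans(stream, pattern=POSTAL_RE, label="POSTAL"):
    """Yield each example with a "spans" entry for every regex match,
    using character offsets into the example's "text"."""
    for eg in stream:
        spans = list(eg.get("spans", []))  # keep any spans already present
        for match in pattern.finditer(eg["text"]):
            spans.append({"start": match.start(), "end": match.end(), "label": label})
        eg["spans"] = spans
        yield eg


# Hypothetical input examples.
examples = [
    {"text": "Send the results to SW1A 1AA by Friday."},
    {"text": "No postcode here."},
]
tagged = list(add_regex_spans(examples))
for eg in tagged:
    print(eg["text"], eg["spans"])
```

In a custom recipe based on the template, the idea would be to wrap the stream with something like stream = add_regex_spans(stream) before it is returned. Whether the spans also need token information for the manual interface depends on the recipe, so please check the documentation for the expected format.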

I have a doubt: after training the model for the medical entity, the pretrained spaCy entities like PERSON and ORG all disappear. For example, in "Bill's is suffering from cancer", cancer gets identified as the medical entity but the PERSON entity does not. Why is that, and how can I resolve it?

I get the above error for the following command: "python3 -m prodigy ner.manual med-terms en_core_web_sm ./text-inputs.jsonl --label MEDICAL --patterns ./pattern-inputs.jsonl"

The --patterns argument was only added in v1.9. If you're running an older version (you can run python3 -m prodigy stats to check), you'll need to upgrade to use the new features :slightly_smiling_face:

thank you so much works fine
