Hello. I’m new to Prodigy (and NER) and relatively new to Python as well. I’m mostly an R programmer, so don’t hold that against me!
In short, I’m reading in some safety-related text descriptions. The tab-delimited input file contains a column labeled “text” that holds the recorded safety event descriptions. From this, I want to manually highlight terms associated with the entity BUGBITE. I couldn’t find much information on how to do this, so this is what I did:
- create a SQLite table called “my_table”
- prodigy ner.manual my_table en_core_web_lg bug_testing.txt --label BUGBITE
- prodigy terms.to-patterns my_table my_patterns.jsonl --label BUGBITE
I looked at the JSON file (excerpt below) and it does not look how I would expect. Each pattern contains the entire record, not the individual terms I highlighted manually. I assume it is user error on my part, but I can’t tell what I did wrong.
{"label":"BUGBITE","pattern":[{"lower":"While heating up bolts on GT9 in preparation to break loose with a Hytorc, the employee noticed irritation on his right knee. The next day the employee reported the potential of receiving a spider or insect bite the previous night."}]}
{"label":"BUGBITE","pattern":[{"lower":"2011-04-05T00:00:00Z\t\"While performing a task in the Lube Oil Shed. Technician needed a hand tool from his tool bag. While reaching into his tool bag, he noticed a large black widow nesting in the tools. The tool bag was used the day before and the insect was not there. The Black Widow had entered the tool bag with in a 14 hour span. Not sure if spider came from laying the tool bag on the lube shed floor for a short time or the locker room, were the tool bag is daily stored.\""}]}
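One thing I notice in the second record is that the date column and the tab are still part of the text, so the file may be getting read line by line rather than by column. Here is a minimal sketch of how I could pull only the “text” column into JSONL first. It assumes the file has a header row and uses Python’s csv module; the `date` column name and the sample rows are made up for illustration.

```python
import csv
import io
import json


def tsv_to_jsonl_lines(tsv_file, text_column="text"):
    """Yield one JSON string per row, keeping only the text column."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if text:  # skip rows that have no description
            yield json.dumps({"text": text})


# In-memory stand-in for the real tab-delimited file (hypothetical contents).
sample = io.StringIO(
    "date\ttext\n"
    '2011-04-05T00:00:00Z\t"Technician noticed a large black widow in his tool bag."\n'
    "2011-04-06T00:00:00Z\tEmployee reported a possible insect bite.\n"
)

jsonl_lines = list(tsv_to_jsonl_lines(sample))
for line in jsonl_lines:
    print(line)
```

As far as I understand, each printed line could be written to a .jsonl file and used as the input source in place of bug_testing.txt.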
Note that when I run ‘’ then I get the below, which is what I would expect from the annotating process (at least based on your tutorials / videos):
{"label":"BUGBITE","pattern":[{"lower":"bite"}]}
{"label":"BUGBITE","pattern":[{"lower":"bug"}]}
{"label":"BUGBITE","pattern":[{"lower":"insect"}]}
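To make the expected result concrete, here is a rough sketch of what I was hoping the conversion would do: build patterns from the highlighted spans instead of the whole record. It assumes the annotations have been exported to JSONL (for example with `prodigy db-out my_table`) and that each accepted record carries a `text` plus `spans` with `start`, `end` and `label`. The function name `spans_to_patterns` and the sample record are hypothetical, and splitting on whitespace is only a stand-in for real tokenization.

```python
import json


def spans_to_patterns(examples, label="BUGBITE"):
    """Build one token pattern per distinct highlighted span."""
    seen = set()
    patterns = []
    for eg in examples:
        if eg.get("answer") != "accept":
            continue  # ignore rejected or skipped records
        for span in eg.get("spans", []):
            if span.get("label") != label:
                continue
            phrase = eg["text"][span["start"]:span["end"]].strip().lower()
            if not phrase or phrase in seen:
                continue
            seen.add(phrase)
            # Assumption: whitespace splitting approximates the tokenizer.
            tokens = [{"lower": tok} for tok in phrase.split()]
            patterns.append({"label": label, "pattern": tokens})
    return patterns


# Hypothetical annotated record, shaped like an exported annotation.
examples = [
    {
        "text": "Employee reported a possible insect bite.",
        "spans": [
            {"start": 29, "end": 35, "label": "BUGBITE"},
            {"start": 36, "end": 40, "label": "BUGBITE"},
        ],
        "answer": "accept",
    },
]

patterns = spans_to_patterns(examples)
for pattern in patterns:
    print(json.dumps(pattern))
```

Deduplicating on the lowercased phrase keeps the patterns file short when the same term is highlighted in many records.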
Needless to say, nothing seems to work right after these steps, probably due to the incorrect formatting of the JSON patterns…
Any help is much appreciated!
Thanks,
-rich