Compatibility of versions

GregSilverman · October 1, 2018, 2:09am

Hi,
I started digging into PhraseMatcher and have a question.

We are bringing in annotations for our project (annotating EMS motor vehicle crash reports) from another program (brat) into Prodigy. As a test, I converted all our entity annotations from a single report into seed terms, as in these examples:

{"label": "EMSRunNumber", "text": "AC71231"}
{"label": "Age", "text": "52 Years"}
{"label": "Gender", "text": "Male"}
{"label": "InsuranceStatus", "text": "UNKNOWN"}
{"label": "Subject", "text": "PT"}
{"label": "DriverPassengerStatus", "text": "DRIVING"}
{"label": "VehicleSpeed", "text": "HWY SPEEDS"}
{"label": "Negation", "text": "NOT"}
{"label": "SeatbeltPresence", "text": "BELTED"}
{"label": "OtherSevere", "text": "UNKNOWN WHAT HAPPENED BUT PT WENT INTO A YARD ENDED UP ELEVATED ON DEBRIB"}
{"label": "Rollover", "text": "DRIVERSIDE TOWARDS GROUND ON ITS SIDE AGAINST A TREE"}
{"label": "SeverityIntrusion", "text": "MAJOR DAMAGE TO VEHICLE"}
{"label": "LocIntrusion", "text": "ESPECIALLY DRIVERS SIDE"}
{"label": "SeverityIntrusion", "text": "LARGE AMOUNT"}
{"label": "LocIntrusion", "text": "COMPARTMENT INTRUSION ON DRIVERS SIDE"}
{"label": "Negation", "text": "NO"}
{"label": "AirbagPresence", "text": "AIRBAG DEPLOYMENT"}
{"label": "ProvidersScene", "text": "EMS ON SCENE"}

As you can see, many of these terms span multiple words, so I followed the thread train-a-new-ner-entity-with-multi-word-tokens. As @ines suggested, I read them in using db-in and then used terms.to-patterns to write the data out to a JSONL file, which looks like:

{"label":null,"pattern":[{"lower":"AC71231"}]}
{"label":null,"pattern":[{"lower":"52 Years"}]}
{"label":null,"pattern":[{"lower":"Male"}]}
{"label":null,"pattern":[{"lower":"UNKNOWN"}]}
{"label":null,"pattern":[{"lower":"PT"}]}
{"label":null,"pattern":[{"lower":"DRIVING"}]}
{"label":null,"pattern":[{"lower":"HWY SPEEDS"}]}
{"label":null,"pattern":[{"lower":"NOT"}]}
{"label":null,"pattern":[{"lower":"BELTED"}]}
{"label":null,"pattern":[{"lower":"UNKNOWN WHAT HAPPENED BUT PT WENT INTO A YARD ENDED UP ELEVATED ON DEBRIB"}]}
{"label":null,"pattern":[{"lower":"DRIVERSIDE TOWARDS GROUND ON ITS SIDE AGAINST A TREE"}]}
{"label":null,"pattern":[{"lower":"MAJOR DAMAGE TO VEHICLE"}]}
{"label":null,"pattern":[{"lower":"ESPECIALLY DRIVERS SIDE"}]}
{"label":null,"pattern":[{"lower":"LARGE AMOUNT"}]}
{"label":null,"pattern":[{"lower":"COMPARTMENT INTRUSION ON DRIVERS SIDE"}]}
{"label":null,"pattern":[{"lower":"NO"}]}
{"label":null,"pattern":[{"lower":"AIRBAG DEPLOYMENT"}]}
{"label":null,"pattern":[{"lower":"EMS ON SCENE"}]}

I understand that these won't have labels, since I did not specify the --label switch when using terms.to-patterns.

I guess my question is, when I have multiple labels like this, is there a hack I can do to just pull the label from the database? The labels are there, as per the output from db-out:

{"label":"EMSRunNumber","text":"AC71231","_input_hash":728392859,"_task_hash":2944968,"answer":"accept"}
{"label":"Age","text":"52 Years","_input_hash":286403082,"_task_hash":-1910933207,"answer":"accept"}
{"label":"Gender","text":"Male","_input_hash":1541676315,"_task_hash":540860021,"answer":"accept"}
{"label":"InsuranceStatus","text":"UNKNOWN","_input_hash":1398767958,"_task_hash":-2052141552,"answer":"accept"}

So wanting terms.to-patterns to carry over multiple labels from the data does not seem like such an edge case.

Is the above one of the use cases that EntityRuler will cover?

I have a slight time crunch to get this part of my experiment done by mid-October (specifically, extracting patterns from a hundred or so annotated reports, then refining them and testing them in Prodigy/spaCy; I'll then compare the results to those from several other NLP engines), so I am looking for the easiest route to get results.

So far, Prodigy has been fairly straightforward to use, but if circumventing this by scripting out my own pattern files and then using them in spaCy 2.1.x to take advantage of the EntityRuler would yield quicker results, then I will certainly do that.
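For reference, here is a minimal sketch of what scripting out my own pattern file might look like. It assumes the input lines have the "label", "text" and "answer" fields shown in the db-out output above. The function name terms_to_patterns is made up for illustration, and splitting on whitespace is only an approximation of how spaCy would tokenize the text, so terms containing punctuation may need different handling.

```python
import json


def terms_to_patterns(lines):
    """Turn db-out style JSONL lines into labelled token patterns.

    Hypothetical helper: whitespace splitting stands in for real
    tokenization, and only tasks answered "accept" are kept.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        # Assumption: skip anything that was not accepted.
        if task.get("answer", "accept") != "accept":
            continue
        # One token dict per whitespace-separated word, lowercased so the
        # value can actually match against the "lower" attribute.
        tokens = [{"lower": word.lower()} for word in task["text"].split()]
        patterns.append({"label": task["label"], "pattern": tokens})
    return patterns


# Two lines copied from the db-out output above.
example = [
    '{"label":"Age","text":"52 Years","_input_hash":286403082,"_task_hash":-1910933207,"answer":"accept"}',
    '{"label":"Gender","text":"Male","_input_hash":1541676315,"_task_hash":540860021,"answer":"accept"}',
]

patterns = terms_to_patterns(example)
for pattern in patterns:
    # Print one JSON object per line, i.e. JSONL.
    print(json.dumps(pattern))
```

Each output line keeps its own label and has one token entry per word, in the same label/pattern layout as the terms.to-patterns output above.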

Thank you for your input!

Greg--
