I am annotating medical articles (https://github.com/chopeen/CORD-19/blob/master/data/raw/cord_19_rf_sentences.jsonl). The RISK_FACTOR names I am highlighting are sometimes compound phrases that contain multiple entities in a single span of text:
- "chromosomal and other anomalies"
- "substandard housing and living conditions"
- "previous use of carbapenems and quinolones"
- "originating from high ( 20 % ) or medium ( 18 % ) endemic area"
Ideally, they should translate to the following entities:
- substandard housing conditions + substandard living conditions
- previous use of carbapenems + previous use of quinolones
- originating from high endemic area + originating from medium endemic area
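To make the target concrete, here is a minimal sketch in plain Python of what those entities look like at the token level for one of the phrases. The token-index representation and the names `desired`, `is_contiguous` and `overlap` are only my own illustration; this is not a Prodigy data format or API.

```python
# Hypothetical illustration only: token-index lists are NOT a Prodigy format.
text = "substandard housing and living conditions"
tokens = text.split()  # naive whitespace tokenization, enough for this example

# Each desired entity is described by the indices of the tokens it uses.
desired = {
    "substandard housing conditions": [0, 1, 4],
    "substandard living conditions": [0, 3, 4],
}


def is_contiguous(indices):
    """Return True if the token indices form one unbroken run."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


def overlap(a, b):
    """Return the sorted token indices shared by two entities."""
    return sorted(set(a) & set(b))


for name, idx in desired.items():
    # Sanity check: the indices really spell out the entity name.
    assert " ".join(tokens[i] for i in idx) == name
    kind = "contiguous" if is_contiguous(idx) else "non-contiguous"
    print(f"{name}: tokens {idx} ({kind})")

shared = overlap(*desired.values())
print("shared tokens:", [tokens[i] for i in shared])
```

Both entities skip over tokens in the middle, and they share "substandard" and "conditions", so a single start/end span per entity cannot represent either of them.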
I think I would need a feature for highlighting overlapping entities whose words are not always consecutive.
That's not possible in Prodigy, right?
What's the best practice?
Should I highlight only the first entity or the entire compound phrase?