I am in the process of setting up Prodigy to annotate some entities. In one case we have 25 entities, but I get an error when running this command:
prodigy ner.manual symbility_sample /model-ner-en_blank /Symbility_Comm_cleansed_sample_PRODIGY_VERSION.csv --label "Glass,Undg,InsProp,CompMach,Aircraft,EarthQ,PipeFz,PipeNoFz,OverflowTank,LeakSprinkler,MechFailure,FaultyWork,WearTear,FloodDrainage,FloodRiver,FloodCoast,ImpactNoVehi,ImpactOVeh,Impact3Veh,TreeStorm,HailStorm,SnowStorm,WindStorm,IngressStorm,ImpactOther"
OSError: [Errno 36] File name too long: '
I've reduced the length of the entity names to help with other use cases, but I can't reduce them any further without making things difficult for annotators.
Thanks in advance.
You've sort of hit an edge-case here that we didn't expect. You can provide the labels as either a text file, or on the command line. To figure out which is which, we check whether the file exists. Normally this works fine, but in your case, the is-a-file check fails because the string is too long.
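To make the failure mode concrete, here is a rough sketch of a "file or comma-separated list" check that tolerates over-long strings. It is not Prodigy's actual code; the function name `resolve_labels` is made up for illustration. The idea is simply that the filesystem can raise `OSError` (errno 36 on Linux) when asked about a very long name, so the check has to catch that and fall back to treating the string as a label list.

```python
from pathlib import Path


def resolve_labels(value):
    """Hypothetical helper: read labels from a file if `value` is an
    existing path, otherwise split `value` on commas."""
    try:
        is_file = Path(value).is_file()
    except OSError:
        # For example errno 36 (file name too long) on Linux: the string
        # cannot be a real path, so treat it as a comma-separated list.
        is_file = False

    if is_file:
        lines = Path(value).read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    return [label.strip() for label in value.split(",") if label.strip()]
```

With a guard like this, a long `--label` string is parsed as labels rather than crashing the is-a-file check.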
We can fix this in Prodigy, but you can work around it in the meantime by putting the labels into a text file (one label per line), and then pointing Prodigy at that.
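As a minimal sketch of that workaround, the snippet below writes the 25 labels from the original command to a text file, one per line. The file name `labels.txt` and the helper `write_labels_file` are just placeholders chosen for this example.

```python
from pathlib import Path

# The labels from the original command, split on commas.
LABELS = (
    "Glass,Undg,InsProp,CompMach,Aircraft,EarthQ,PipeFz,PipeNoFz,"
    "OverflowTank,LeakSprinkler,MechFailure,FaultyWork,WearTear,"
    "FloodDrainage,FloodRiver,FloodCoast,ImpactNoVehi,ImpactOVeh,"
    "Impact3Veh,TreeStorm,HailStorm,SnowStorm,WindStorm,IngressStorm,"
    "ImpactOther"
).split(",")


def write_labels_file(labels, path="labels.txt"):
    """Write one label per line and return the path that was written."""
    path = Path(path)
    path.write_text("\n".join(labels) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_labels_file(LABELS))
```

The command then becomes the same as before, with the file path in place of the long string: prodigy ner.manual symbility_sample /model-ner-en_blank /Symbility_Comm_cleansed_sample_PRODIGY_VERSION.csv --label labels.txt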
Taking a step back though, I think you might want to consider annotating with far fewer labels initially. Annotating with a lot of labels will give the annotators a much more difficult experience, because they have to select the label from a long list, and remember a lot of detailed definitions. There are a few solutions:
- You could have one generic category for the span-annotation phase, and divide the extracted texts into types later.
- You could have one annotation task per entity type, and merge the annotations later.
- You could make the label scheme hierarchical, and do the top-level categories first.
- You could make the annotation sentence-based, rather than span-based.
I think option 3 is one you should especially consider. You could at least have a sentence-level task where you annotate whether the sentence contains any entities. If your data has a low density of entities, this will allow you to move through the data much more quickly. You could then annotate only the "has at least one entity" sentences in a subsequent pass, probably working on one entity type at a time.
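The two-pass idea can be sketched roughly as follows. This assumes the first pass was a binary accept/reject task and that each exported record is a dict with `"text"` and `"answer"` fields; the function name and the sample sentences are invented for illustration.

```python
def sentences_with_entities(records):
    """Keep only the sentences that were accepted in the first pass,
    i.e. marked as containing at least one entity."""
    return [
        {"text": record["text"]}
        for record in records
        if record.get("answer") == "accept"
    ]


# Hypothetical first-pass annotations.
first_pass = [
    {"text": "A pipe froze and burst in the basement.", "answer": "accept"},
    {"text": "Customer called to confirm the appointment.", "answer": "reject"},
    {"text": "Hail damaged the roof and the car.", "answer": "accept"},
    {"text": "Unclear note.", "answer": "ignore"},
]

second_pass_input = sentences_with_entities(first_pass)
```

The filtered list is what you would feed into the span-annotation pass, ideally with one entity type (or a small group of types) per session.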
Amazing, thanks! I wasn't aware we could use a text file for the labels, my bad!
I 100% agree re the number of entities. We have way too many, and I know that some will not get annotated as no data will exist for those... I'll be able to prove this to "management" within a few days thanks to your tool!
For the time being we have gone down the hierarchical route: what, why, how and when. Then I think we will look at the sentence-based approach, as that sounds perfect for when we go into prod with this!
Thanks so much!