Hi,
I'm attempting to use Prodigy to quickly come up with patterns for later use with a spaCy PhraseMatcher with attr="SHAPE".
I labeled a couple hundred examples, hoping the resulting highlights would be stored in a retrievable way. It appears that they are, but I haven't been able to get them back out.
To reproduce:
I ran the following command:
prodigy ner.manual test_animal_fruit_example en_core_web_sm ./data/test.jsonl --label "Fruit Weight, Animal Weight"
where test.jsonl is like so:
{"text": "Have you ever seen a 5kg apple in a tree with a cat"}
{"text": "A whole bunch of text is going on here about a apple of 5oz size in a tree with a cat"}
{"text": "Dancing horses and a 30lb mango in a tree with a cat"}
{"text": "We ate bananas so that the 500 kilo elephant would stay pleased"}
{"text": "Don't feed your cat too much or it will end up as a 60 pound kitty"}
I highlighted 5kg apple, apple of 5oz, 30lb mango, 500 kilo elephant, and 60 pound kitty with their respective labels (the first 3 are Fruit Weight and the final 2 are Animal Weight).
The resulting annotations file, after using db-out, is like so:
{"text":"Have you ever seen a 5kg apple in a tree with a cat","_input_hash":1015986919,"_task_hash":2059168639,"tokens":[{"text":"Have","start":0,"end":4,"id":0},{"text":"you","start":5,"end":8,"id":1},{"text":"ever","start":9,"end":13,"id":2},{"text":"seen","start":14,"end":18,"id":3},{"text":"a","start":19,"end":20,"id":4},{"text":"5","start":21,"end":22,"id":5},{"text":"kg","start":22,"end":24,"id":6},{"text":"apple","start":25,"end":30,"id":7},{"text":"in","start":31,"end":33,"id":8},{"text":"a","start":34,"end":35,"id":9},{"text":"tree","start":36,"end":40,"id":10},{"text":"with","start":41,"end":45,"id":11},{"text":"a","start":46,"end":47,"id":12},{"text":"cat","start":48,"end":51,"id":13}],"_session_id":"test_animal_fruit_example-default","_view_id":"ner_manual","spans":[{"start":21,"end":30,"token_start":5,"token_end":7,"label":"Fruit Weight"}],"answer":"accept"}
{"text":"A whole bunch of text is going on here about a apple of 5oz size in a tree with a cat","_input_hash":-1205931032,"_task_hash":552599497,"tokens":[{"text":"A","start":0,"end":1,"id":0},{"text":"whole","start":2,"end":7,"id":1},{"text":"bunch","start":8,"end":13,"id":2},{"text":"of","start":14,"end":16,"id":3},{"text":"text","start":17,"end":21,"id":4},{"text":"is","start":22,"end":24,"id":5},{"text":"going","start":25,"end":30,"id":6},{"text":"on","start":31,"end":33,"id":7},{"text":"here","start":34,"end":38,"id":8},{"text":"about","start":39,"end":44,"id":9},{"text":"a","start":45,"end":46,"id":10},{"text":"apple","start":47,"end":52,"id":11},{"text":"of","start":53,"end":55,"id":12},{"text":"5","start":56,"end":57,"id":13},{"text":"oz","start":57,"end":59,"id":14},{"text":"size","start":60,"end":64,"id":15},{"text":"in","start":65,"end":67,"id":16},{"text":"a","start":68,"end":69,"id":17},{"text":"tree","start":70,"end":74,"id":18},{"text":"with","start":75,"end":79,"id":19},{"text":"a","start":80,"end":81,"id":20},{"text":"cat","start":82,"end":85,"id":21}],"_session_id":"test_animal_fruit_example-default","_view_id":"ner_manual","spans":[{"start":47,"end":59,"token_start":11,"token_end":14,"label":"Fruit Weight"}],"answer":"accept"}
{"text":"Dancing horses and a 30lb mango in a tree with a cat","_input_hash":-843711838,"_task_hash":903131301,"tokens":[{"text":"Dancing","start":0,"end":7,"id":0},{"text":"horses","start":8,"end":14,"id":1},{"text":"and","start":15,"end":18,"id":2},{"text":"a","start":19,"end":20,"id":3},{"text":"30","start":21,"end":23,"id":4},{"text":"lb","start":23,"end":25,"id":5},{"text":"mango","start":26,"end":31,"id":6},{"text":"in","start":32,"end":34,"id":7},{"text":"a","start":35,"end":36,"id":8},{"text":"tree","start":37,"end":41,"id":9},{"text":"with","start":42,"end":46,"id":10},{"text":"a","start":47,"end":48,"id":11},{"text":"cat","start":49,"end":52,"id":12}],"_session_id":"test_animal_fruit_example-default","_view_id":"ner_manual","spans":[{"start":21,"end":31,"token_start":4,"token_end":6,"label":"Fruit Weight"}],"answer":"accept"}
{"text":"We ate bananas so that the 500 kilo elephant would stay pleased","_input_hash":-263246995,"_task_hash":1265699488,"tokens":[{"text":"We","start":0,"end":2,"id":0},{"text":"ate","start":3,"end":6,"id":1},{"text":"bananas","start":7,"end":14,"id":2},{"text":"so","start":15,"end":17,"id":3},{"text":"that","start":18,"end":22,"id":4},{"text":"the","start":23,"end":26,"id":5},{"text":"500","start":27,"end":30,"id":6},{"text":"kilo","start":31,"end":35,"id":7},{"text":"elephant","start":36,"end":44,"id":8},{"text":"would","start":45,"end":50,"id":9},{"text":"stay","start":51,"end":55,"id":10},{"text":"pleased","start":56,"end":63,"id":11}],"_session_id":"test_animal_fruit_example-default","_view_id":"ner_manual","spans":[{"start":27,"end":44,"token_start":6,"token_end":8,"label":"Animal Weight"}],"answer":"accept"}
{"text":"Don't feed your cat too much or it will end up as a 60 pound kitty","_input_hash":1571593311,"_task_hash":-532519224,"tokens":[{"text":"Do","start":0,"end":2,"id":0},{"text":"n't","start":2,"end":5,"id":1},{"text":"feed","start":6,"end":10,"id":2},{"text":"your","start":11,"end":15,"id":3},{"text":"cat","start":16,"end":19,"id":4},{"text":"too","start":20,"end":23,"id":5},{"text":"much","start":24,"end":28,"id":6},{"text":"or","start":29,"end":31,"id":7},{"text":"it","start":32,"end":34,"id":8},{"text":"will","start":35,"end":39,"id":9},{"text":"end","start":40,"end":43,"id":10},{"text":"up","start":44,"end":46,"id":11},{"text":"as","start":47,"end":49,"id":12},{"text":"a","start":50,"end":51,"id":13},{"text":"60","start":52,"end":54,"id":14},{"text":"pound","start":55,"end":60,"id":15},{"text":"kitty","start":61,"end":66,"id":16}],"_session_id":"test_animal_fruit_example-default","_view_id":"ner_manual","spans":[{"start":52,"end":66,"token_start":14,"token_end":16,"label":"Animal Weight"}],"answer":"accept"}
Now, I'm trying to figure out how to get only the text I highlighted, with the associated label back out. This is proving difficult. I have something like this:
for annotation in data:
    if annotation['answer'] == 'accept':
        doc = nlp(annotation['text'])
        try:
            if len(annotation['spans']) > 0:
                highlighted_span_start = int(annotation['spans'][0]['start'])
                highlighted_span_end = int(annotation['spans'][0]['end'])
                if annotation['spans'][0] == 'Fruit Weight':
                    print('Fruit Weight', doc[highlighted_span_start:highlighted_span_end])
                    print('*'*30)
                elif annotation['spans'][0] == 'Animal Weight':
                    print('Animal Weight', doc[highlighted_span_start:highlighted_span_end])
                    print('*'*30)
        except KeyError as e:
            pass
This does not work, and using token_start and token_end as the highlighted_span_ values does not work either.
I assume there's a simple method here for getting the highlighted strings back out from ner.manual alongside their associated labels, but I haven't figured it out.
Thanks for any assistance.
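For anyone landing here with the same question, here is a minimal sketch of one way to read the highlights back out. It is not an official Prodigy method. Two things in the loop above look like the likely culprits: annotation['spans'][0] is a dict, so comparing it to a label string never matches (the label lives under the 'label' key), and 'start'/'end' are character offsets into the text, not token indices, so they should slice annotation['text'] rather than the Doc. In the export above, token_end also appears to be inclusive (tokens 5 to 7 for "5kg apple"), so a token-based slice would need token_end + 1. The function name extract_highlights and the file name annotations.jsonl are hypothetical, and the sample records are trimmed-down copies of the db-out lines shown earlier.

```python
import json


def extract_highlights(lines):
    """Yield (label, highlighted_text) pairs from db-out style JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        annotation = json.loads(line)
        # Only keep examples that were accepted during annotation.
        if annotation.get("answer") != "accept":
            continue
        text = annotation["text"]
        # An example may carry zero, one, or several highlighted spans.
        for span in annotation.get("spans", []):
            # "start" and "end" are character offsets into the raw text.
            yield span["label"], text[span["start"]:span["end"]]


# Trimmed-down copies of two exported records (tokens and hashes omitted).
sample = [
    '{"text": "Have you ever seen a 5kg apple in a tree with a cat", '
    '"spans": [{"start": 21, "end": 30, "token_start": 5, "token_end": 7, '
    '"label": "Fruit Weight"}], "answer": "accept"}',
    '{"text": "We ate bananas so that the 500 kilo elephant would stay pleased", '
    '"spans": [{"start": 27, "end": 44, "token_start": 6, "token_end": 8, '
    '"label": "Animal Weight"}], "answer": "accept"}',
]

highlights = list(extract_highlights(sample))
for label, phrase in highlights:
    print(label, "->", phrase)

# With a real export (hypothetical file name), the same function could be used as:
# with open("annotations.jsonl", encoding="utf8") as f:
#     highlights = list(extract_highlights(f))
```

If spaCy Span objects are needed rather than plain strings (for example to feed a PhraseMatcher), doc.char_span(span["start"], span["end"], label=span["label"]) should give the equivalent span, provided the Doc is tokenized the same way as during annotation.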