Hello,
I'm trying to import a model I trained using the following command:
prodigy train textcat clinical_order_types en_core_web_lg --textcat-exclusive
Then when I try loading the model:
mod = spacy.load(model_location)
I'm getting: KeyError: 'PUNCTSIDE_FIN'
In the past, I've been able to use this same workflow in other Prodigy projects with an earlier version of Prodigy (1.8?).
The prodigy train command appears to work, and these are the contents of the output directory:
os.listdir(model_location)
['textcat', 'ner', 'tagger', 'tokenizer', 'meta.json', 'vocab', 'parser']
Given that this is a text classification model, I don't know why there is a directory called "ner".
I read GitHub issue #4945 in explosion/spaCy ("nlp=spacy.load('en_core_web_sm') KeyError: 'PUNCTSIDE_FIN'"), which describes a similar error with spacy.load(), but it doesn't seem to apply.
I'm using Linux with:
prodigy 1.9.5
spacy 2.2.3
python 3.8
Thanks a lot,
JoAnn
Here are the contents of the meta.json produced by prodigy train:
{"lang":"en","name":"core_web_lg","license":"MIT","author":"Explosion","url":"https://explosion.ai","email":"contact@explosion.ai","description":"English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, context-specific token vectors, POS tags, dependency parse and named entities.","sources":[{"name":"OntoNotes 5","url":"https://catalog.ldc.upenn.edu/LDC2013T19","license":"commercial (licensed by Explosion)"},{"name":"Common Crawl"}],"pipeline":["tagger","parser","ner","textcat"],"version":"2.2.5","spacy_version":">=2.2.2","parent_package":"spacy","accuracy":{"las":90.1734441725,"uas":92.0132337105,"token_acc":99.7579930934,"tags_acc":97.2200800054,"ents_f":86.5464321721,"ents_p":86.7358967163,"ents_r":86.3577935506},"speed":{"cpu":6257.754029418,"gpu":null,"nwords":291314},"labels":{"tagger":["$","''",",","-LRB-","-RRB-",".",":","ADD","AFX","CC","CD","DT","EX","FW","HYPH","IN","JJ","JJR","JJS","LS","MD","NFP","NN","NNP","NNPS","NNS","PDT","POS","PRP","PRP$","RB","RBR","RBS","RP","SYM","TO","UH","VB","VBD","VBG","VBN","VBP","VBZ","WDT","WP","WP$","WRB","XX","_SP","``"],"parser":["ROOT","acl","acomp","advcl","advmod","agent","amod","appos","attr","aux","auxpass","case","cc","ccomp","compound","conj","csubj","csubjpass","dative","dep","det","dobj","expl","intj","mark","meta","neg","nmod","npadvmod","nsubj","nsubjpass","nummod","oprd","parataxis","pcomp","pobj","poss","preconj","predet","prep","prt","punct","quantmod","relcl","xcomp"],"ner":["CARDINAL","DATE","EVENT","FAC","GPE","LANGUAGE","LAW","LOC","MONEY","NORP","ORDINAL","ORG","PERCENT","PERSON","PRODUCT","QUANTITY","TIME","WORK_OF_ART"],"textcat":["vent_stop","iabp","vent","transfuse_blood_products","noninvasive_vent","other","chest_tube_suction"]},"vectors":{"width":300,"vectors":684831,"keys":684830,"name":"en_core_web_lg.vectors"},"factories":{"tagger":"tagger","parser":"parser","ner":"ner","textcat":"textcat"}}
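For anyone trying to reproduce this, here is a minimal sketch of how the relevant fields in a meta.json like the one above could be compared with the installed spaCy version. It is not part of Prodigy or spaCy: the helper names installed_version and summarize_model_meta are made up for illustration, it uses only the standard library, and it assumes the model directory contains a meta.json. Whether the gap between the model version reported there (2.2.5) and the installed spaCy (2.2.3) has anything to do with the KeyError is a guess, not something confirmed.

```python
import json
from importlib import metadata  # in the standard library from Python 3.8
from pathlib import Path


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def summarize_model_meta(model_dir):
    """Read meta.json from a saved model directory and return the fields of interest.

    Hypothetical helper: the keys it reads ("lang", "name", "version",
    "spacy_version", "pipeline") are the ones present in the meta.json above.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return {
        # e.g. "en" + "core_web_lg" -> "en_core_web_lg"
        "name": "{}_{}".format(meta.get("lang"), meta.get("name")),
        "model_version": meta.get("version"),
        "required_spacy": meta.get("spacy_version"),
        # Components inherited from the base model show up here too,
        # which is why directories such as "ner" appear in the output.
        "pipeline": meta.get("pipeline", []),
        "installed_spacy": installed_version("spacy"),
    }
```

Calling summarize_model_meta(model_location) and printing the result puts the pipeline list, the model version, and the installed spaCy version side by side, which makes a mismatch easier to spot before calling spacy.load().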