Hello!
What I did:
I took three files (the text, the BERT tokens, and the IOB labels) and converted them into Prodigy's format, like here:
These files contain both correct and incorrect predictions from a PyTorch model.
The resulting format (text, tokens, spans):
{"text":"U Profile BIKARAEROSPACE GmbH fast bayrische Verh\u00e4ltnisse Und am Wittgensteiner Land rollt der Verkehr vorbei Was die Verkehrsanbindung Wittgenstein anbetrifft werden wir nicht nachlassen verspricht der Hauptgesch\u00e4ftsf\u00fchrer Lesen Sie mehr Dabei zeichnet sich die Dann gibt es die M\u00f6glichkeit im Betrieb zu arbeiten und an den Wochenenden die Uni zu besuchen Platten Bleche Zuschnitte Ronden Ringe Stangen Rohre und Profile Dazu kommt die Stangenware also Stangen Rohre und Profile in allen m\u00f6glichen Abmessungen","tokens":[{"text":"U","start":0,"end":1,"id":0},{"text":"Profil","start":2,"end":8,"id":1},{"text":"##e","start":8,"end":11,"id":2},{"text":"B","start":12,"end":13,"id":3},{"text":"##IK","start":13,"end":17,"id":4},{"text":"##AR","start":17,"end":21,"id":5},{"text":"##AE","start":21,"end":25,"id":6},{"text":"##RO","start":25,"end":29,"id":7},{"text":"##SP","start":29,"end":33,"id":8},{"text":"##ACE","start":33,"end":38,"id":9},{"text":"GmbH","start":39,"end":43,"id":10},{"text":"fast","start":44,"end":48,"id":11},{"text":"ba","start":49,"end":51,"id":12},{"text":"##yr","start":51,"end":55,"id":13},{"text":"##ische","start":55,"end":62,"id":14},{"text":"Verh","start":63,"end":67,"id":15},{"text":"##alt","start":67,"end":72,"id":16},{"text":"##nisse","start":72,"end":79,"id":17},{"text":"Und","start":80,"end":83,"id":18},{"text":"am","start":84,"end":86,"id":19},{"text":"Witt","start":87,"end":91,"id":20},{"text":"##gens","start":91,"end":97,"id":21},{"text":"##te","start":97,"end":101,"id":22},{"text":"##iner","start":101,"end":107,"id":23},{"text":"Land","start":108,"end":112,"id":24},{"text":"ro","start":113,"end":115,"id":25},{"text":"##llt","start":115,"end":120,"id":26},{"text":"der","start":121,"end":124,"id":27},{"text":"Verkehr","start":125,"end":132,"id":28},{"text":"vorbei","start":133,"end":139,"id":29},{"text":"Was","start":140,"end":143,"id":30},{"text":"die","start":144,"end":147,"id":31},{"text":"Verkehrs","start":148,"end":156,"id":32},{"text":"##an","start":156,"end":160,"id":33},{"text":"##bindung","start":160,"end":169,"id":34},{"text":"Witt","start":170,"end":174,"id":35},{"text":"##gens","start":174,"end":180,"id":36},{"text":"##te","start":180,"end":184,"id":37},{"text":"##in","start":184,"end":188,"id":38},{"text":"an","start":189,"end":191,"id":39},{"text":"##bet","start":191,"end":196,"id":40},{"text":"##rifft","start":196,"end":203,"id":41},{"text":"werden","start":204,"end":210,"id":42},{"text":"wir","start":211,"end":214,"id":43},{"text":"nicht","start":215,"end":220,"id":44},{"text":"nach","start":221,"end":225,"id":45},{"text":"##lassen","start":225,"end":233,"id":46},{"text":"verspricht","start":234,"end":244,"id":47},{"text":"der","start":245,"end":248,"id":48},{"text":"Haupt","start":249,"end":254,"id":49},{"text":"##gesch","start":254,"end":261,"id":50},{"text":"##aft","start":261,"end":266,"id":51},{"text":"##sf","start":266,"end":270,"id":52},{"text":"##uhr","start":270,"end":275,"id":53},{"text":"##er","start":275,"end":279,"id":54},{"text":"Lesen","start":280,"end":285,"id":55},{"text":"Sie","start":286,"end":289,"id":56},{"text":"mehr","start":290,"end":294,"id":57},{"text":"Dabei","start":295,"end":300,"id":58},{"text":"zeichnet","start":301,"end":309,"id":59},{"text":"sich","start":310,"end":314,"id":60},{"text":"die","start":315,"end":318,"id":61},{"text":"Dann","start":319,"end":323,"id":62},{"text":"gibt","start":324,"end":328,"id":63},{"text":"es","start":329,"end":331,"id":64},{"text":"die","start":332,"end":335,"id":65},{"text":"Mog","start":336,"end":339,"id":66},{"text":"##lichkeit","start":339,"end":349,"id":67},{"text":"im","start":350,"end":352,"id":68},{"text":"Betrieb","start":353,"end":360,"id":69},{"text":"zu","start":361,"end":363,"id":70},{"text":"arbeiten","start":364,"end":372,"id":71},{"text":"und","start":373,"end":376,"id":72},{"text":"an","start":377,"end":379,"id":73},{"text":"den","start":380,"end":383,"id":74},{"text":"Wochenenden","start":384,"end":395,"id":75},{"text":"die","start":396,"end":399,"id":76},{"text":"Uni","start":400,"end":403,"id":77},{"text":"zu","start":404,"end":406,"id":78},{"text":"besuchen","start":407,"end":415,"id":79},{"text":"Platten","start":416,"end":423,"id":80},{"text":"Blech","start":424,"end":429,"id":81},{"text":"##e","start":429,"end":432,"id":82},{"text":"Zusch","start":433,"end":438,"id":83},{"text":"##nitt","start":438,"end":444,"id":84},{"text":"##e","start":444,"end":447,"id":85},{"text":"Ron","start":448,"end":451,"id":86},{"text":"##den","start":451,"end":456,"id":87},{"text":"Ringe","start":457,"end":462,"id":88},{"text":"Stan","start":463,"end":467,"id":89},{"text":"##gen","start":467,"end":472,"id":90},{"text":"Rohr","start":473,"end":477,"id":91},{"text":"##e","start":477,"end":480,"id":92},{"text":"und","start":481,"end":484,"id":93},{"text":"Profil","start":485,"end":491,"id":94},{"text":"##e","start":491,"end":494,"id":95},{"text":"Dazu","start":495,"end":499,"id":96},{"text":"kommt","start":500,"end":505,"id":97},{"text":"die","start":506,"end":509,"id":98},{"text":"Stan","start":510,"end":514,"id":99},{"text":"##gen","start":514,"end":519,"id":100},{"text":"##ware","start":519,"end":525,"id":101},{"text":"also","start":526,"end":530,"id":102},{"text":"Stan","start":531,"end":535,"id":103},{"text":"##gen","start":535,"end":540,"id":104},{"text":"Rohr","start":541,"end":545,"id":105},{"text":"##e","start":545,"end":548,"id":106},{"text":"und","start":549,"end":552,"id":107},{"text":"Profil","start":553,"end":559,"id":108},{"text":"##e","start":559,"end":562,"id":109},{"text":"in","start":563,"end":565,"id":110},{"text":"allen","start":566,"end":571,"id":111},{"text":"mo","start":572,"end":574,"id":112},{"text":"##glichen","start":574,"end":583,"id":113},{"text":"Abmessungen","start":584,"end":595,"id":114}],"spans":[{"start":2,"end":11,"label":"PRODNAME"},{"start":404,"end":411,"label":"PRODNAME"},{"start":412,"end":420,"label":"PRODNAME"},{"start":421,"end":435,"label":"PRODNAME"},{"start":436,"end":444,"label":"PRODNAME"},{"start":445,"end":450,"label":"PRODNAME"},{"start":451,"end":460,"label":"PRODNAME"},{"start":461,"end":468,"label":"PRODNAME"},{"start":473,"end":482,"label":"PRODNAME"},{"start":498,"end":502,"label":"PRODNAME"},{"start":519,"end":528,"label":"PRODNAME"},{"start":529,"end":536,"label":"PRODNAME"},{"start":541,"end":550,"label":"PRODNAME"}]}
{"text":"70 x 70 x 25 60 x 60 x 25 Kupfer Sechs##kant##sta##ng##en Aluminium##Gu##ss##platten FOR##MO##DA##L 02##3 Eben##heit mm##m FOR##MO##DA##L BM##50##83 \u2013 Pra##zi##si##ons##Wa##l##z##platte EN AW 50##83 Univers##ell einsetz##bare Aluminium##platten fur erh##oh##te Anforderungen im Werkzeug Formen und Modell##bau Hoch##feste Aluminium##Wa##l##z##platten FOR##MO##DA##L BM##400","tokens":[{"text":"70","start":0,"end":2,"id":0},{"text":"x","start":3,"end":4,"id":1},{"text":"70","start":5,"end":7,"id":2},{"text":"x","start":8,"end":9,"id":3},{"text":"25","start":10,"end":12,"id":4},{"text":"60","start":13,"end":15,"id":5},{"text":"x","start":16,"end":17,"id":6},{"text":"60","start":18,"end":20,"id":7},{"text":"x","start":21,"end":22,"id":8},{"text":"25","start":23,"end":25,"id":9},{"text":"Kupfer","start":26,"end":32,"id":10},{"text":"Sechs","start":33,"end":38,"id":11},{"text":"##kant","start":38,"end":44,"id":12},{"text":"##sta","start":44,"end":49,"id":13},{"text":"##ng","start":49,"end":53,"id":14},{"text":"##en","start":53,"end":57,"id":15},{"text":"Aluminium","start":58,"end":67,"id":16},{"text":"##Gu","start":67,"end":71,"id":17},{"text":"##ss","start":71,"end":75,"id":18},{"text":"##platten","start":75,"end":84,"id":19},{"text":"FOR","start":85,"end":88,"id":20},{"text":"##MO","start":88,"end":92,"id":21},{"text":"##DA","start":92,"end":96,"id":22},{"text":"##L","start":96,"end":99,"id":23},{"text":"02","start":100,"end":102,"id":24},{"text":"##3","start":102,"end":105,"id":25},{"text":"Eben","start":106,"end":110,"id":26},{"text":"##heit","start":110,"end":116,"id":27},{"text":"mm","start":117,"end":119,"id":28},{"text":"##m","start":119,"end":122,"id":29},{"text":"FOR","start":123,"end":126,"id":30},{"text":"##MO","start":126,"end":130,"id":31},{"text":"##DA","start":130,"end":134,"id":32},{"text":"##L","start":134,"end":137,"id":33},{"text":"BM","start":138,"end":140,"id":34},{"text":"##50","start":140,"end":144,"id":35},{"text":"##83","start":144,"end":148,"id":36},{"text":"\u2013","start":149,"end":150,"id":37},{"text":"Pra","start":151,"end":154,"id":38},{"text":"##zi","start":154,"end":158,"id":39},{"text":"##si","start":158,"end":162,"id":40},{"text":"##ons","start":162,"end":167,"id":41},{"text":"##Wa","start":167,"end":171,"id":42},{"text":"##l","start":171,"end":174,"id":43},{"text":"##z","start":174,"end":177,"id":44},{"text":"##platte","start":177,"end":185,"id":45},{"text":"EN","start":186,"end":188,"id":46},{"text":"AW","start":189,"end":191,"id":47},{"text":"50","start":192,"end":194,"id":48},{"text":"##83","start":194,"end":198,"id":49},{"text":"Univers","start":199,"end":206,"id":50},{"text":"##ell","start":206,"end":211,"id":51},{"text":"einsetz","start":212,"end":219,"id":52},{"text":"##bare","start":219,"end":225,"id":53},{"text":"Aluminium","start":226,"end":235,"id":54},{"text":"##platten","start":235,"end":244,"id":55},{"text":"fur","start":245,"end":248,"id":56},{"text":"erh","start":249,"end":252,"id":57},{"text":"##oh","start":252,"end":256,"id":58},{"text":"##te","start":256,"end":260,"id":59},{"text":"Anforderungen","start":261,"end":274,"id":60},{"text":"im","start":275,"end":277,"id":61},{"text":"Werkzeug","start":278,"end":286,"id":62},{"text":"Formen","start":287,"end":293,"id":63},{"text":"und","start":294,"end":297,"id":64},{"text":"Modell","start":298,"end":304,"id":65},{"text":"##bau","start":304,"end":309,"id":66},{"text":"Hoch","start":310,"end":314,"id":67},{"text":"##feste","start":314,"end":321,"id":68},{"text":"Aluminium","start":322,"end":331,"id":69},{"text":"##Wa","start":331,"end":335,"id":70},{"text":"##l","start":335,"end":338,"id":71},{"text":"##z","start":338,"end":341,"id":72},{"text":"##platten","start":341,"end":350,"id":73},{"text":"FOR","start":351,"end":354,"id":74},{"text":"##MO","start":354,"end":358,"id":75},{"text":"##DA","start":358,"end":362,"id":76},{"text":"##L","start":362,"end":365,"id":77},{"text":"BM","start":366,"end":368,"id":78},{"text":"##400","start":368,"end":373,"id":79}],"spans":[{"start":2,"end":11,"label":"PRODNAME"},{"start":404,"end":411,"label":"PRODNAME"},{"start":412,"end":420,"label":"PRODNAME"},{"start":421,"end":435,"label":"PRODNAME"},{"start":436,"end":444,"label":"PRODNAME"},{"start":445,"end":450,"label":"PRODNAME"},{"start":451,"end":460,"label":"PRODNAME"},{"start":461,"end":468,"label":"PRODNAME"},{"start":473,"end":482,"label":"PRODNAME"},{"start":498,"end":502,"label":"PRODNAME"},{"start":519,"end":528,"label":"PRODNAME"},{"start":529,"end":536,"label":"PRODNAME"},{"start":541,"end":550,"label":"PRODNAME"}]}
{"text":"30 x 10 x 15 30 x 20 x 20 mm a x b x a x s FOR##MO##DA##L 07 sehr geehrt##er Gesch##aft##spartner Andere un##ed##le Metall##e einschlie\u00dflich Stan##gen Nickel##matt##e Nickel##oxid##sin##ter und andere Hoch##feste Pra##zi##si##ons##wal##z##platten","tokens":[{"text":"30","start":0,"end":2,"id":0},{"text":"x","start":3,"end":4,"id":1},{"text":"10","start":5,"end":7,"id":2},{"text":"x","start":8,"end":9,"id":3},{"text":"15","start":10,"end":12,"id":4},{"text":"30","start":13,"end":15,"id":5},{"text":"x","start":16,"end":17,"id":6},{"text":"20","start":18,"end":20,"id":7},{"text":"x","start":21,"end":22,"id":8},{"text":"20","start":23,"end":25,"id":9},{"text":"mm","start":26,"end":28,"id":10},{"text":"a","start":29,"end":30,"id":11},{"text":"x","start":31,"end":32,"id":12},{"text":"b","start":33,"end":34,"id":13},{"text":"x","start":35,"end":36,"id":14},{"text":"a","start":37,"end":38,"id":15},{"text":"x","start":39,"end":40,"id":16},{"text":"s","start":41,"end":42,"id":17},{"text":"FOR","start":43,"end":46,"id":18},{"text":"##MO","start":46,"end":50,"id":19},{"text":"##DA","start":50,"end":54,"id":20},{"text":"##L","start":54,"end":57,"id":21},{"text":"07","start":58,"end":60,"id":22},{"text":"sehr","start":61,"end":65,"id":23},{"text":"geehrt","start":66,"end":72,"id":24},{"text":"##er","start":72,"end":76,"id":25},{"text":"Gesch","start":77,"end":82,"id":26},{"text":"##aft","start":82,"end":87,"id":27},{"text":"##spartner","start":87,"end":97,"id":28},{"text":"Andere","start":98,"end":104,"id":29},{"text":"un","start":105,"end":107,"id":30},{"text":"##ed","start":107,"end":111,"id":31},{"text":"##le","start":111,"end":115,"id":32},{"text":"Metall","start":116,"end":122,"id":33},{"text":"##e","start":122,"end":125,"id":34},{"text":"einschlie\u00dflich","start":126,"end":140,"id":35},{"text":"Stan","start":141,"end":145,"id":36},{"text":"##gen","start":145,"end":150,"id":37},{"text":"Nickel","start":151,"end":157,"id":38},{"text":"##matt","start":157,"end":163,"id":39},{"text":"##e","start":163,"end":166,"id":40},{"text":"Nickel","start":167,"end":173,"id":41},{"text":"##oxid","start":173,"end":179,"id":42},{"text":"##sin","start":179,"end":184,"id":43},{"text":"##ter","start":184,"end":189,"id":44},{"text":"und","start":190,"end":193,"id":45},{"text":"andere","start":194,"end":200,"id":46},{"text":"Hoch","start":201,"end":205,"id":47},{"text":"##feste","start":205,"end":212,"id":48},{"text":"Pra","start":213,"end":216,"id":49},{"text":"##zi","start":216,"end":220,"id":50},{"text":"##si","start":220,"end":224,"id":51},{"text":"##ons","start":224,"end":229,"id":52},{"text":"##wal","start":229,"end":234,"id":53},{"text":"##z","start":234,"end":237,"id":54},{"text":"##platten","start":237,"end":246,"id":55}],"spans":[{"start":2,"end":11,"label":"PRODNAME"},{"start":404,"end":411,"label":"PRODNAME"},{"start":412,"end":420,"label":"PRODNAME"},{"start":421,"end":435,"label":"PRODNAME"},{"start":436,"end":444,"label":"PRODNAME"},{"start":445,"end":450,"label":"PRODNAME"},{"start":451,"end":460,"label":"PRODNAME"},{"start":461,"end":468,"label":"PRODNAME"},{"start":473,"end":482,"label":"PRODNAME"},{"start":498,"end":502,"label":"PRODNAME"},{"start":519,"end":528,"label":"PRODNAME"},{"start":529,"end":536,"label":"PRODNAME"},{"start":541,"end":550,"label":"PRODNAME"}]}
{"text":"42##5 mm 42##9 mm 115 mm 117 mm 60 mm 61 mm 200 x 5 Vier##kant##sta##ng##en Blech##e und Platten aus Kunststoff Poly##vin##yl##iden##flu##ori##d P##VD##F 25 x 20 27 x 18 100 x 80 102 x 78","tokens":[{"text":"42","start":0,"end":2,"id":0},{"text":"##5","start":2,"end":5,"id":1},{"text":"mm","start":6,"end":8,"id":2},{"text":"42","start":9,"end":11,"id":3},{"text":"##9","start":11,"end":14,"id":4},{"text":"mm","start":15,"end":17,"id":5},{"text":"115","start":18,"end":21,"id":6},{"text":"mm","start":22,"end":24,"id":7},{"text":"117","start":25,"end":28,"id":8},{"text":"mm","start":29,"end":31,"id":9},{"text":"60","start":32,"end":34,"id":10},{"text":"mm","start":35,"end":37,"id":11},{"text":"61","start":38,"end":40,"id":12},{"text":"mm","start":41,"end":43,"id":13},{"text":"200","start":44,"end":47,"id":14},{"text":"x","start":48,"end":49,"id":15},{"text":"5","start":50,"end":51,"id":16},{"text":"Vier","start":52,"end":56,"id":17},{"text":"##kant","start":56,"end":62,"id":18},{"text":"##sta","start":62,"end":67,"id":19},{"text":"##ng","start":67,"end":71,"id":20},{"text":"##en","start":71,"end":75,"id":21},{"text":"Blech","start":76,"end":81,"id":22},{"text":"##e","start":81,"end":84,"id":23},{"text":"und","start":85,"end":88,"id":24},{"text":"Platten","start":89,"end":96,"id":25},{"text":"aus","start":97,"end":100,"id":26},{"text":"Kunststoff","start":101,"end":111,"id":27},{"text":"Poly","start":112,"end":116,"id":28},{"text":"##vin","start":116,"end":121,"id":29},{"text":"##yl","start":121,"end":125,"id":30},{"text":"##iden","start":125,"end":131,"id":31},{"text":"##flu","start":131,"end":136,"id":32},{"text":"##ori","start":136,"end":141,"id":33},{"text":"##d","start":141,"end":144,"id":34},{"text":"P","start":145,"end":146,"id":35},{"text":"##VD","start":146,"end":150,"id":36},{"text":"##F","start":150,"end":153,"id":37},{"text":"25","start":154,"end":156,"id":38},{"text":"x","start":157,"end":158,"id":39},{"text":"20","start":159,"end":161,"id":40},{"text":"27","start":162,"end":164,"id":41},{"text":"x","start":165,"end":166,"id":42},{"text":"18","start":167,"end":169,"id":43},{"text":"100","start":170,"end":173,"id":44},{"text":"x","start":174,"end":175,"id":45},{"text":"80","start":176,"end":178,"id":46},{"text":"102","start":179,"end":182,"id":47},{"text":"x","start":183,"end":184,"id":48},{"text":"78","start":185,"end":187,"id":49}],"spans":[{"start":2,"end":11,"label":"PRODNAME"},{"start":404,"end":411,"label":"PRODNAME"},{"start":412,"end":420,"label":"PRODNAME"},{"start":421,"end":435,"label":"PRODNAME"},{"start":436,"end":444,"label":"PRODNAME"},{"start":445,"end":450,"label":"PRODNAME"},{"start":451,"end":460,"label":"PRODNAME"},{"start":461,"end":468,"label":"PRODNAME"},{"start":473,"end":482,"label":"PRODNAME"},{"start":498,"end":502,"label":"PRODNAME"},{"start":519,"end":528,"label":"PRODNAME"},{"start":529,"end":536,"label":"PRODNAME"},{"start":541,"end":550,"label":"PRODNAME"}]}
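For comparison, here is a minimal sketch of a conversion in which the tokens and the spans are both derived from the same character offsets into the raw text, so every span boundary is guaranteed to be a token boundary. It assumes you already have one `(start, end)` pair per subword token (for example from a Hugging Face fast tokenizer called with `return_offsets_mapping=True`, with special tokens such as `[CLS]`/`[SEP]` dropped) and one IOB tag per subword token. The offsets are taken from the tokenizer rather than recomputed from the `##` pieces, because accent-stripped pieces such as `Mog` no longer match `Möglichkeit` in the text. The function name `iob_to_prodigy` is hypothetical, and slicing each token's text from the raw string, the `ws` flag, and the inclusive `token_end` are assumptions about what Prodigy expects, not something confirmed here.

```python
import json
from typing import Dict, List, Optional, Tuple


def iob_to_prodigy(text: str, offsets: List[Tuple[int, int]], labels: List[str]) -> Dict:
    """Build a Prodigy-style task whose tokens and spans share the same offsets.

    `offsets`: one (start, end) character pair per subword token, indexed into
    `text`, with special tokens already removed (assumption).
    `labels`: one IOB tag per subword token, e.g. "B-PRODNAME", "I-PRODNAME", "O".
    """
    tokens = [
        {
            "text": text[start:end],  # slice the raw text instead of keeping "##" pieces
            "start": start,
            "end": end,
            "id": i,
            # True if the token is followed by whitespace, so subword pieces render joined
            "ws": end < len(text) and text[end].isspace(),
        }
        for i, (start, end) in enumerate(offsets)
    ]

    spans: List[Dict] = []
    current: Optional[Dict] = None
    for i, ((start, end), tag) in enumerate(zip(offsets, labels)):
        prefix, _, label = tag.partition("-")  # "B-PRODNAME" -> ("B", "-", "PRODNAME")
        if prefix == "I" and current is not None and current["label"] == label:
            # Continuation of the open entity: extend it to this token.
            current["end"] = end
            current["token_end"] = i  # inclusive index of the last token
            continue
        if current is not None:
            spans.append(current)
            current = None
        if prefix in ("B", "I"):
            # A stray "I-" without a matching open entity is treated like "B-".
            current = {"start": start, "end": end, "token_start": i, "token_end": i, "label": label}
    if current is not None:
        spans.append(current)

    return {"text": text, "tokens": tokens, "spans": spans}


# Toy input modelled on the start of the first task above.
example = iob_to_prodigy(
    "U Profile GmbH",
    [(0, 1), (2, 8), (8, 9), (10, 14)],
    ["O", "B-PRODNAME", "I-PRODNAME", "O"],
)
print(json.dumps(example, ensure_ascii=False))
```

Because spans are built only from token offsets, a span can never end in the middle of a token, which is the condition the error further down complains about.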
What I'm trying to do:
I'd like to run this command to see the entities highlighted and then accept or correct them:
python -m prodigy ner.manual output_db de_core_news_sm danil_test.jsonl -l PRODNAME,MTRL,ENNUM,TEMPER
But when I run it, I get this error:
ValueError: Mismatched tokenization. Can't resolve span to token index 411. This can happen if your data contains pre-set spans. Make sure that the spans match spaCy's tokenization or add a 'tokens' property to your task.
{'start': 404, 'end': 411, 'label': 'PRODNAME', 'token_start': 78}
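The error says Prodigy could not resolve a span's character offsets to token indices. A quick way to list every such span before starting the server is to check, task by task, that each span starts where some token starts and ends where some token ends. This is a standalone, stdlib-only sketch; `misaligned_spans` and `check_file` are hypothetical helper names, not part of Prodigy.

```python
import json
from typing import Dict, Iterator, List, Tuple


def misaligned_spans(task: Dict) -> List[Dict]:
    """Return the spans whose start or end does not coincide with a token boundary."""
    token_starts = {tok["start"] for tok in task.get("tokens", [])}
    token_ends = {tok["end"] for tok in task.get("tokens", [])}
    return [
        span
        for span in task.get("spans", [])
        if span["start"] not in token_starts or span["end"] not in token_ends
    ]


def check_file(path: str) -> Iterator[Tuple[int, Dict]]:
    """Yield (line number, span) for every misaligned span in a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():  # skip blank lines
                continue
            for span in misaligned_spans(json.loads(line)):
                yield line_no, span
```

Looping over `check_file("danil_test.jsonl")` and printing each result would report, among others, the `{'start': 404, 'end': 411, ...}` span from the first task: token 78 ("zu") ends at 406 and token 79 ("besuchen") ends at 415, so no token ends at 411.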
Question: do I understand correctly that Prodigy won't show me the text with highlighted entities until all of them are labeled correctly? My goal is to see visually what the model predicted (even where it's wrong) and then correct it in Prodigy.