output and weird format

I’m using Prodigy 1.10.3 (previously 1.9.5 and 1.10.2) with Python 3.7 on Ubuntu 18.04. I have a problem with the output and the coloring feature, whichever version I use.

A first example with the print-dataset recipe: I manually annotated a corpus with the ner.manual recipe, then ran the print-dataset recipe on the annotated dataset. Sentences that contain more than one named entity are not displayed correctly. A raw sentence like this:

La voix d’Ariadne se fit entendre ; elle appelait Olga dans le jardin.

... becomes this with the print-dataset recipe:

A second example with the print-stream recipe: I annotated 8000 sentences and then trained a model (model_8000). I applied the model to a new corpus, like this:

prodigy print-stream model_8000/ new.jsonl

I have the same problem with the output. A raw sentence like this:

Vers la fin de l’été, le lendemain du jour où le petit Louis est rentré au lycée, mademoiselle Zozo, au retour d’une réunion chez les Chaduis, se met à table, un soir, toute brillante de plaisir.

... becomes this with the print-stream recipe:

So it’s a big problem for me. I can apply my model, but I can’t use the result even though it seems to be good. So I’m frustrated.

I saw a similar discussion on the support forum (***/prodigy-print-dataset-shows-weird-format-output-no-coloring/2586), but I didn’t manage to resolve my problem. Sorry if the answer is somewhere else; I can’t find it.

Thanks a lot for your help.

Hi! I just had a look and it seems like this is an offset issue that must have been introduced at some point in v1.10 :thinking: Sorry about that. I've already fixed this and the fix will be included in the next release!

(In the meantime, you could just use the displaCy visualizer in a notebook if you want to visualize all entities predicted by a model in a stream of examples. Or spacy-streamlit, which gives you an interactive Streamlit app :slightly_smiling_face:)
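As a rough illustration of the displaCy workaround, here is a minimal sketch. It assumes each line of the input file is a JSON object with a "text" key (the usual Prodigy input format) and that the model directory can be loaded with spacy.load. The helper names read_texts and render_entities are made up for this example and are not part of Prodigy or spaCy.

```python
import json
from itertools import islice
from pathlib import Path


def read_texts(path):
    """Yield the "text" field of each non-empty line in a JSONL file."""
    with Path(path).open(encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)["text"]


def render_entities(model_path, texts, limit=20):
    """Return a displaCy HTML page with the entities predicted on the first `limit` texts."""
    # Imported here so the JSONL helper above works without spaCy installed.
    import spacy
    from spacy import displacy

    nlp = spacy.load(model_path)
    docs = list(nlp.pipe(islice(texts, limit)))
    # jupyter=False makes render() return the HTML string instead of displaying it.
    return displacy.render(docs, style="ent", page=True, jupyter=False)


# Example usage (model and file names taken from the post above):
# html = render_entities("model_8000", read_texts("new.jsonl"))
# Path("entities.html").write_text(html, encoding="utf8")
```

Opening the resulting entities.html in a browser shows the highlighted entities. In a notebook you could instead pass jupyter=True to displacy.render to display them inline.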

Thanks a lot for this answer!

It works!


Update: Just released Prodigy v1.10.4, which should fix the underlying problem in the span CLI printer :slightly_smiling_face: