Ah, damn, looks like your terminal doesn’t support ANSI escape sequences, which are used to set the colours. Just checked and I didn’t know that Windows 10 actually started disabling them by default. So our color helper should definitely check for that and not apply them if they’re not supported – just like our print helper does for emoji. Especially in this case, where it has an impact on usability / readability.
If you do want to see colours and make the output prettier, I found this thread. (But of course, you shouldn’t have to.)
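For anyone curious what such a check could look like, here's a minimal sketch. It's not Prodigy's actual helper: `supports_ansi` and `color` are hypothetical names, and the environment variables used for Windows are just an assumption about which terminals announce ANSI support.

```python
import os
import sys


def supports_ansi(stream=sys.stdout, env=os.environ, platform=sys.platform):
    """Best-effort guess at whether ANSI colour codes will be rendered."""
    # Piped or redirected output should never receive escape sequences.
    if not hasattr(stream, "isatty") or not stream.isatty():
        return False
    if platform == "win32":
        # Assumption: only terminals that set one of these variables
        # (ANSICON, ConEmu, Windows Terminal) are treated as ANSI-capable.
        return (
            "ANSICON" in env
            or env.get("ConEmuANSI") == "ON"
            or "WT_SESSION" in env
        )
    return env.get("TERM", "") != "dumb"


def color(text, code, enabled=None):
    """Wrap text in an ANSI colour code, or return it unchanged."""
    if enabled is None:
        enabled = supports_ansi()
    return "\x1b[{}m{}\x1b[0m".format(code, text) if enabled else text
```

With something like this, `color("PERSON", 32)` would fall back to plain text on a console that can't display the sequences, instead of printing the raw escape codes.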
When I use ner.print-stream, I have a similar problem. The entities are added to the stream, but without colors (on Windows 10). I followed the instructions from the thread Ines posted (creating the registry entry, and I also tried ConEmu), but it still won’t display the colors.
Is there another way to review my results that is also easy to read? Have other users reported similar issues?
@Ben Thanks for the report – just to confirm, you’re on the latest version of Prodigy, right? (If I remember correctly, there was a problem in an older version of Prodigy that would always disable the colouring, even if it was supported by the system).
As an alternative solution, the input format expected for spaCy’s built-in entity visualizer is very similar – it only calls the "spans" property "ents". (To be honest, we should probably unify that for consistency.) So you could write a simple script that takes your dataset and outputs a displaCy visualization instead. I haven’t tested this in detail, but something like this should work:
from spacy import displacy
from prodigy.components.db import connect
db = connect() # connect to database with settings from prodigy.json
dataset = db.get_dataset('your_dataset') # get examples
# displaCy expects "ents" instead of "spans"; use an empty list if an example has no spans
examples = [{'text': eg['text'], 'ents': eg.get('spans', [])} for eg in dataset]
# start displaCy server
displacy.serve(examples, style='ent', manual=True)
Yes, I am using the latest version. Using displaCy also works great, though.
To be able to distinguish between accepted and rejected samples, I exported the corresponding JSONL files and read them with displaCy.
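In case it helps anyone else, the split by accepted and rejected examples can be sketched roughly like this. It assumes each exported line is a JSON object with "text", an optional "spans" list and an "answer" value; `group_by_answer` is just a hypothetical helper name.

```python
import json
from collections import defaultdict


def group_by_answer(jsonl_lines):
    """Convert exported examples to displaCy's manual format, grouped by answer."""
    groups = defaultdict(list)
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        eg = json.loads(line)
        # Keep only the keys displaCy needs for each entity.
        ents = [
            {"start": s["start"], "end": s["end"], "label": s["label"]}
            for s in eg.get("spans", [])
        ]
        # Sort by start offset so the entities are in document order.
        ents.sort(key=lambda e: e["start"])
        answer = eg.get("answer", "unknown")
        groups[answer].append({"text": eg["text"], "ents": ents})
    return dict(groups)
```

After opening the exported file and passing it in, something like `displacy.serve(groups["accept"], style="ent", manual=True)` should then show only the accepted examples.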
One suggestion for a future recipe: allowing users to review their annotations.
I figured out that I made quite a few wrong annotations in the beginning. To solve this, I might just delete the first half of my annotations.
Ah cool, glad to hear the displaCy solution is working! (Btw, in case you haven't seen it: if you call displacy.render and set page=True, it will return the markup of a full HTML page instead of starting the server. So you could even write a script that automatically generates a .html document for each dataset. With a custom colour theme, this is actually a really cool feature, so maybe we should turn this into a built-in recipe.)
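A rough, untested sketch of what such a script could look like is below. `write_html_report` is a hypothetical helper (not a built-in recipe), and it assumes the examples are already in displaCy's manual format, i.e. dicts with "text" and "ents".

```python
from pathlib import Path


def write_html_report(examples, output_path, colors=None):
    """Render displaCy 'manual' entity dicts to a standalone .html file."""
    # Imported inside the function so this module can be loaded without spaCy.
    from spacy import displacy

    # Optional custom colour theme, e.g. {"PERSON": "#ffd966"}
    options = {"colors": colors} if colors else {}
    html = displacy.render(
        examples,
        style="ent",
        manual=True,
        page=True,  # return a full HTML page, not just the entity markup
        jupyter=False,
        options=options,
    )
    output_path = Path(output_path)
    output_path.write_text(html, encoding="utf8")
    return output_path
```

Calling this once per dataset, with a file name derived from the dataset name, would give you one .html document for each.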
I hope I understand the use case correctly – but the dataset format is identical to the stream format, so you can always just export your dataset, use it as the input to ner.manual and then correct the annotations.
Even if you only re-annotate half of the dataset, you can later reconcile the annotations by comparing the _task_hash of the individual examples (e.g. replace all examples in your old set with the corrected version, if the task hash is available in the new set).
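The reconciliation could look something like this. It's only a sketch, assuming both sets are lists of dicts that each carry a "_task_hash"; `reconcile` is a hypothetical helper name.

```python
def reconcile(old_examples, corrected_examples):
    """Replace old examples with their corrected version, matched by task hash."""
    corrected = {eg["_task_hash"]: eg for eg in corrected_examples}
    # Keep the original order; fall back to the old example if no
    # corrected version with the same task hash exists.
    return [corrected.get(eg["_task_hash"], eg) for eg in old_examples]
```

Examples you never re-annotated stay exactly as they were, so it doesn't matter whether you correct half of the dataset or all of it.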