Console colors don't show properly on Windows

When I run ner.train-curve, I get raw escape codes such as e[38;5;77m+0.08e[0m instead of the coloured output shown in the tutorials.

I’m running on Windows. The same error happens in both PowerShell and cmd.

It’s not blocking, but thought you should know.

Please don’t judge the pitiful sample size and accuracy of my example :wink:

Ah, damn, looks like your terminal doesn’t support ANSI escape sequences, which are used to set the colours. Just checked and I didn’t know that Windows 10 actually started disabling them by default. So our color helper should definitely check for that and not apply them if they’re not supported – just like our print helper does for emoji. Especially in this case, where it has an impact on usability / readability.
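For what it’s worth, a check like that could look roughly like this – a sketch only, not Prodigy’s actual helper, and the Windows detection heuristics (the ANSICON / ConEmuANSI environment variables) are assumptions:

```python
import os
import sys


def supports_ansi():
    """Rough heuristic for whether the terminal handles ANSI escapes."""
    if not sys.stdout.isatty():
        return False
    if sys.platform == "win32":
        # Plain cmd / PowerShell may have ANSI disabled; wrappers like
        # ANSICON or ConEmu advertise support via environment variables
        return "ANSICON" in os.environ or os.environ.get("ConEmuANSI") == "ON"
    return True


def color(text, code=77, enabled=None):
    """Wrap text in a 256-colour foreground escape, or return it unchanged."""
    if enabled is None:
        enabled = supports_ansi()
    if not enabled:
        return text
    return "\x1b[38;5;{}m{}\x1b[0m".format(code, text)
```

With `enabled=False`, the helper simply returns the plain text, which is what you’d want on a terminal that prints the raw escape codes.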

If you do want to see colours and make the output prettier, I found this thread. (But of course, you shouldn’t have to.)

Good to know. Thanks for the link!

When I use ner.print-stream, I have a similar problem. The entities are added to the stream, but without colors (in Windows 10). I followed the instructions from the thread Ines posted (creating registry entry, and also tried ConEmu) but it still won’t display the colors.
Is there another way to review my results that’s also easily readable? Have other users reported similar issues?

@Ben Thanks for the report – just to confirm, you’re on the latest version of Prodigy, right? (If I remember correctly, there was a problem in an older version of Prodigy that would always disable the colouring, even if it was supported by the system).

As an alternative solution, the input format expected for spaCy’s built-in entity visualizer is very similar – it only calls the "spans" property "ents". (To be honest, we should probably unify that for consistency.) So you could write a simple script that takes your dataset and outputs a displaCy visualization instead. I haven’t tested this in detail, but something like this should work:

from spacy import displacy
from prodigy.components.db import connect

db = connect()  # connect to database with settings from prodigy.json
dataset = db.get_dataset('your_dataset')  # get examples
examples = [{'text': eg['text'], 'ents': eg['spans']} for eg in dataset]

# start displaCy server
displacy.serve(examples, style='ent', manual=True)

Yes, I am using the latest version. That said, using displaCy also works great.
To distinguish between accepted and rejected samples, I exported the corresponding JSONL files and read them with displaCy.
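Splitting the export is straightforward, since every Prodigy example carries an "answer" field ("accept" / "reject"). Something like this should do it (the helper name is just for illustration):

```python
import json


def split_by_answer(jsonl_lines):
    """Split exported Prodigy examples into accepted and rejected lists,
    based on each example's "answer" field."""
    accepted, rejected = [], []
    for line in jsonl_lines:
        eg = json.loads(line)
        (accepted if eg.get("answer") == "accept" else rejected).append(eg)
    return accepted, rejected
```

You can then pass only the accepted examples on to the displaCy snippet above.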

One suggestion for a future recipe: allowing users to review their annotations.
I realized that I made quite a few wrong annotations in the beginning. To fix this, I might just delete the first half of my annotations.

Thanks for your quick and super helpful replies!

Ah cool, glad to hear the displaCy solution is working! (Btw, in case you haven’t seen it: If you call displacy.render and set page=True, it will return the markup of a full HTML page instead of starting the server. So you could even write a script that automatically generates a .html document for each dataset. With a custom colour theme, this is actually a really cool feature – maybe we should turn this into a built-in recipe :thinking:)

I hope I understand the use case correctly – but the dataset format is identical to the stream format, so you can always just export your dataset, use it as the input to ner.manual and then correct the annotations.

prodigy db-out your_dataset > your_dataset.jsonl
prodigy ner.manual your_new_dataset en_core_web_sm your_dataset.jsonl --label SOME_LABEL

In theory, you should even be able to pipe it forward:

prodigy db-out your_dataset | prodigy ner.manual your_new_dataset en_core_web_sm --label SOME_LABEL

Even if you only re-annotate half of the dataset, you can later reconcile the annotations by comparing the _task_hash of the individual examples (e.g. replace all examples in your old set with the corrected version, if the task hash is available in the new set).
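That reconciliation step could be as simple as this sketch – build a lookup of corrected examples keyed on "_task_hash" and swap them in wherever a hash matches (the helper name is hypothetical):

```python
def merge_corrected(old_examples, new_examples):
    """Replace examples in the old dataset with their corrected versions,
    matched on Prodigy's "_task_hash". Examples without a matching hash
    in the new set are kept as-is."""
    corrected = {eg["_task_hash"]: eg for eg in new_examples if "_task_hash" in eg}
    return [corrected.get(eg.get("_task_hash"), eg) for eg in old_examples]
```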
