prodigy print-dataset shows weird format output/ no coloring

Volodymyr · February 25, 2020, 11:40am

I updated Prodigy to version 1.9.7. Since then, the coloring in the print-dataset output has disappeared.

$ python3 -m prodigy print-dataset some-dataset | less -r
('Add a reminder for \x1b[38;5;16;48;5;222m my wife \x1b[0m\x1b[38;5;16;48;5;2m REMINDED_PERSON \x1b[0m to go a hospital. to \x1b[38;5;16;48;5;222m go a hospital \x1b[0m\x1b[38;5;16;48;5;2m REMINDER_SUBJECT \x1b[0m.', '\n')

MacOS 10.14.6 (18G103)

$ uname -a
Darwin MacBookPro.local 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64

$ pip3 --version
pip 19.1.1 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)

$ python3 --version
Python 3.6.5

$ help
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)

Thanks a lot.

ines · February 25, 2020, 12:50pm

Thanks for the report! This was tricky to debug, because I initially couldn't reproduce it and it seemed so basic. Turned out that Cython was messing up print statements with more than one positional argument and turning the arguments into tuples (basically printing like Python 2 where print("a", "b") becomes ('a', 'b')). I've already fixed this and will push a new release with the fix.
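To illustrate why the escape codes show up as literal text, here is a minimal sketch in plain Python. It is not Prodigy code: the names `highlighted`, `as_text` and `as_tuple` are made up for this example, and the string is shortened from the output posted above. Printing the string itself sends the real ESC character to the terminal, while printing a tuple shows the repr of each element, so the ESC character appears as the four characters `\x1b`.

```python
# Hypothetical reproduction of the symptom; none of this is Prodigy's own code.
ESC = "\x1b"
highlighted = f"Add a reminder for {ESC}[38;5;16;48;5;222m my wife {ESC}[0m"

# What print(a, b) is meant to emit: the arguments joined by a space.
as_text = " ".join([highlighted, "\n"])

# What the miscompiled call emitted instead: the string form of a tuple,
# which uses repr() for each element and therefore escapes control characters.
as_tuple = str((highlighted, "\n"))

print(as_text)   # a terminal interprets the color codes
print(as_tuple)  # the color codes are shown as literal backslash sequences
```

This also explains the trailing `'\n'` in the reported output: it is simply the second element of the tuple.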

In the meantime, you can patch it by editing the recipe in prodigy/recipes/generic.py and using your own pretty_print function, like this:

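# Patch for prodigy/recipes/generic.py; `printers` is assumed to be imported there already.
def pretty_print(stream, views=["spans", "textcat"]):
    for task in stream:
        result = []
        if "textcat" in views:
            result.append(printers.format_label(task))
        if "spans" in views:
            result.append(printers.format_spans(task))
        else:
            # No span rendering requested: fall back to the raw text
            result.append(task.get("text", ""))
        # Pass print() a single string so the arguments can't be turned into a tuple
        print(f"{' '.join(result)}\n")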

Volodymyr · February 25, 2020, 1:08pm

Thanks a lot for your answer.

einarbmag · March 2, 2020, 1:34pm

Just FYI, this fix gives me some strange behaviour for examples with multiple tagged entities. It prints out the full example and highlights only the first-occurring entity, then prints out the example again, but starting from the first token after the first entity, and highlights the second entity. This repeats until there are no entities left.

ines · March 2, 2020, 3:09pm

That's strange, thanks for the report! Was the data you're rendering created with Prodigy? I've seen this problem happen if the "spans" are out of order, but that obviously shouldn't happen if the spans come directly from annotation.

einarbmag · March 2, 2020, 3:13pm

Actually that might be it - I have a custom loader which pre-labels a couple of entities based on some structured data accompanying the unlabelled text examples, and then I add further annotations in the Prodigy view. I'll have a look at how I do that pre-labelling, specifically the ordering... thanks for the hint!

ines · March 2, 2020, 3:31pm

@einarbmag Yes, that's a likely explanation! I guess Prodigy might as well just sort spans by default before printing or rendering them... The logic is pretty simple and it helps prevent confusion.
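For anyone pre-labelling in a custom loader, sorting the spans before the task is printed or rendered could look roughly like the sketch below. It assumes the usual task layout with a "spans" list of dicts carrying "start" and "end" character offsets; `sort_spans` is a hypothetical helper written for this example, not a Prodigy function, and it is not necessarily how Prodigy would sort internally.

```python
def sort_spans(task):
    """Return a copy of the task with its spans ordered by start, then end offset.

    Hypothetical helper, not part of Prodigy. A task without a "spans" key
    comes back with an empty "spans" list.
    """
    spans = sorted(
        task.get("spans", []),
        key=lambda span: (span["start"], span["end"]),
    )
    return {**task, "spans": spans}


# Example task with spans added out of order, e.g. by a pre-labelling loader.
example = {
    "text": "Add a reminder for my wife to go a hospital.",
    "spans": [
        {"start": 30, "end": 43, "label": "REMINDER_SUBJECT"},
        {"start": 19, "end": 26, "label": "REMINDED_PERSON"},
    ],
}

fixed = sort_spans(example)
print([span["label"] for span in fixed["spans"]])
```

The helper returns a new dict instead of sorting in place, so the original task stays untouched if it is needed elsewhere in the loader.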

ines · March 14, 2020, 6:54pm

Just released v1.9.8, which should fix the underlying problem.
