prodigy print-dataset shows oddly formatted output / no coloring

I updated Prodigy to version 1.9.7. After that, the coloring feature disappeared.

$ python3 -m prodigy print-dataset some-dataset | less -r
('Add a reminder for \x1b[38;5;16;48;5;222m my wife \x1b[0m\x1b[38;5;16;48;5;2m REMINDED_PERSON \x1b[0m to go a hospital. to \x1b[38;5;16;48;5;222m go a hospital \x1b[0m\x1b[38;5;16;48;5;2m REMINDER_SUBJECT \x1b[0m.', '\n')

MacOS 10.14.6 (18G103)

$ uname -a
Darwin MacBookPro.local 18.7.0 Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64 x86_64

$ pip3 --version
pip 19.1.1 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)

$ python3 --version
Python 3.6.5

$ help
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin18)

Thanks a lot.

Thanks for the report! This was tricky to debug, because I initially couldn't reproduce it and it seemed so basic. Turned out that Cython was messing up print statements with more than one positional argument and turning the arguments into tuples (basically printing like Python 2 where print("a", "b") becomes ('a', 'b')). I've already fixed this and will push a new release with the fix.
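
To illustrate the symptom, here is a minimal sketch in plain Python 3 (it does not involve Cython or Prodigy itself). It compares a normal two-argument print call with printing a single tuple, which is roughly what the broken call ended up doing. The helper name `captured` and the sample text are made up for the illustration. Because a tuple is printed via its repr, the escape character shows up as the literal text \x1b instead of reaching the terminal as a color code, which matches the output in the original report.

```python
import io
from contextlib import redirect_stdout

# ANSI color codes copied from the output in the report above.
GREEN = "\x1b[38;5;16;48;5;2m"
RESET = "\x1b[0m"
text = f"Remind {GREEN} my wife {RESET}"


def captured(*args):
    """Hypothetical helper: return exactly what print(*args) writes to stdout."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        print(*args)
    return buf.getvalue()


# Python 3 semantics: two arguments joined by a space, escape codes intact.
as_two_args = captured(text, "\n")

# Python 2 style result: one tuple is printed, so its repr is written
# and the escape character appears as the four characters \x1b.
as_one_tuple = captured((text, "\n"))
```

Printing `as_one_tuple` in a terminal shows the same parenthesized, uncolored shape as the report, while `as_two_args` renders with colors.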

In the meantime, you can patch it by editing the recipe in prodigy/recipes/generic.py and using your own pretty_print function, like this:

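# Assumes `printers` is already imported in prodigy/recipes/generic.py.
def pretty_print(stream, views=("spans", "textcat")):
    for task in stream:
        result = []
        if "textcat" in views:
            result.append(printers.format_label(task))
        if "spans" in views:
            result.append(printers.format_spans(task))
        else:
            result.append(task.get("text", ""))
        # Build one string and pass a single argument to print(), so the
        # multi-argument print bug cannot turn the output into a tuple.
        print(f"{' '.join(result)}\n")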

Thanks a lot for your answer.

Just FYI, this fix gives me some strange behaviour for examples with multiple tagged entities. It prints out the full example and highlights only the first-occurring entity, then prints out the example again, but starting from the first token after the first entity, and highlights the second entity. This repeats until there are no entities left.

That's strange, thanks for the report! Was the data you're rendering created with Prodigy? I've seen this problem happen if the "spans" are out-of-order, but that obviously shouldn't happen if the spans come directly from annotation :thinking:

Actually that might be it - I have a custom loader which pre-labels a couple of entities based on some structured data accompanying the unlabelled text examples, and then I add further annotations in the Prodigy view. I'll have a look at how I do that pre-labelling, specifically the ordering... thanks for the hint!

@einarbmag Yes, that's a likely explanation! I guess Prodigy might as well just sort spans by default before printing or rendering them... The logic is pretty simple and it helps prevent confusion.
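
For anyone with a custom loader who wants to guard against this in the meantime, here is a minimal sketch of sorting spans before printing. It is not Prodigy's actual implementation: the helper name `sort_spans` is made up, and it only assumes the usual task format where each span is a dict with "start", "end" and "label" keys.

```python
def sort_spans(task):
    """Return a copy of the task with spans ordered by start, then end offset.

    Hypothetical helper for illustration; the input task is not modified.
    """
    spans = sorted(task.get("spans", []), key=lambda s: (s["start"], s["end"]))
    return {**task, "spans": spans}


# Example task whose pre-labelled spans were added out of order.
example = {
    "text": "Add a reminder for my wife to go a hospital.",
    "spans": [
        {"start": 30, "end": 43, "label": "REMINDER_SUBJECT"},
        {"start": 19, "end": 26, "label": "REMINDED_PERSON"},
    ],
}

fixed = sort_spans(example)
```

Applying this to each task in the stream (for example `stream = (sort_spans(task) for task in stream)`) before it reaches the printer or the annotation UI should keep the spans in document order.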

Just released v1.9.8, which should fix the underlying problem :slightly_smiling_face: