Extending UI to display additional fields for textcat.teach

carMartinez · January 30, 2019, 4:46pm

My goal is to use the textcat.teach recipe to annotate sentences which I have stored in a jsonl file. Further, each sentence is associated with a file, and I want to display the file path alongside the text in the annotation interface.

My approach is to wrap textcat.teach in a custom recipe and use the html interface to display additional info.

So far, I have essentially reproduced the textcat.teach recipe with the html interface and introduced a placeholder for the extra field to display, but I’m not sure of the best way to actually include the extra field.

Custom Recipe

import prodigy
from prodigy.recipes.textcat import teach

@prodigy.recipe('custom.textcat.teach',
    dataset=prodigy.recipe_args['dataset'],
    spacy_model=prodigy.recipe_args['spacy_model'],
    source=prodigy.recipe_args['source'],
    label=prodigy.recipe_args['label_set'])
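def custom_textcat_teach(dataset, spacy_model, source, label=None):
    # Reuse the built-in recipe and get back its dictionary of components
    components = teach(dataset=dataset, spacy_model=spacy_model,
                       source=source, label=label)

    # Load the custom HTML template (path is relative to the working directory)
    with open('extension/template.html', 'r') as f:
        template = f.read()
    # Switch to the html interface and render each task with the template
    components['config']['html_template'] = template
    components['view_id'] = 'html'
    return components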

template.html

<strong>{{text}}</strong>
<span style="background: #ffe184">File path will go here.</span>

Input Data File (jsonl)

{"text": "sentence number one"}
{"text": "sentence number two"}

My naive approach was to modify the input data file to something like

{"text": "sentence number one", "file_path": "path/to/example_one"}
{"text": "sentence number two", "file_path": "path/to/example_two"}

and then reference {{file_path}} in the html template, but this throws an error: ValueError: Failed to load task (invalid JSON)..

So my questions are:

I assume the error is because the input jsonl to textcat is not expecting the ‘file_path’ key - correct?

Are there other valid fields that I can include in the jsonl input and then reference in the html template?

Is there another recommended approach to do this? I could create a generator for the modified jsonl format that will extract just the ‘text’ part and pass it along to textcat.teach, but I’m unsure of how I make the ‘file_path’ values referable in the html template.

ines · January 30, 2019, 6:35pm

Yes, your approach sounds good. It's really exactly what I would have recommended: using the HTML view with a custom template and additional properties in the task.

ValueError: Failed to load task (invalid JSON)..

This error usually really only occurs if a line can't be loaded by json.loads. The example you pasted looks fine, but maybe you could double-check that there's nothing weird in the file you're loading? An accidental unescaped quotation mark in one of the strings? A trailing comma? You could also write a script that opens the file and calls json.loads on each line to see where it fails.
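
A minimal sketch of such a check, using only the standard library. The function name find_invalid_lines and the file name in the usage note are illustrative assumptions, not part of Prodigy:

```python
import json


def find_invalid_lines(path):
    """Return a list of (line_number, error_message) for lines json.loads rejects."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                # Record where parsing failed and why
                problems.append((line_number, str(err)))
    return problems
```

Calling find_invalid_lines("data.jsonl") (a hypothetical file name) and printing the result should point to the offending lines, and the error message includes the character position where parsing stopped.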

(The incoming data will be validated against a JSON schema, too, to make sure it has everything it needs – but I just had a look at the schema again and it allows additional properties. So this shouldn't be an issue. Btw, if you're into JSON schemas, you can check it out via prodigy.get_schema('classification').)

Btw, one quick note, also in case others come across this thread later: When you train the model (assuming you're training with textcat.batch-train and spaCy), it will only get to see the "text". So if you do want the model to take the file path into account, you could generate data that looks like this:

{
    "orig_text": "Some text",
    "file_path": "some/path",
    "text": "Some text some/path"
}

Your template would only use the orig_text and file_path, but the model would see the text. Of course, when using this approach, it's important to make sure that what the model sees really matches what the annotator saw – otherwise, you can end up with weird results.
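
As a rough sketch of how such records could be generated: the helper name build_task is hypothetical, and joining the sentence and path with a single space is just one possible choice that mirrors the example above.

```python
def build_task(sentence, file_path):
    """Build one task dict: separate fields for display, combined "text" for the model."""
    return {
        "orig_text": sentence,               # shown to the annotator via the template
        "file_path": file_path,              # shown to the annotator via the template
        "text": f"{sentence} {file_path}",   # what the model sees during training
    }
```

Because the combined "text" is built from the same two values the template displays, the annotator and the model are looking at the same information.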

carMartinez · January 30, 2019, 7:23pm

So I did get an error when looping through the input and calling json.loads() on each line. I’m not able to pinpoint exactly which character was giving issues, but it was fixed by using json.dumps() to create the json instead of piecing it together manually. Seems to be working fine now.
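
For anyone hitting the same problem, here is a minimal sketch of building the JSONL with json.dumps instead of string concatenation. The helper name to_jsonl and the sample records are illustrative assumptions:

```python
import json


def to_jsonl(records):
    """Serialize a list of dicts to JSONL, one JSON object per line."""
    # json.dumps escapes quotes, backslashes and control characters,
    # which is where hand-built JSON strings tend to break.
    return "".join(json.dumps(record) + "\n" for record in records)


records = [
    {"text": 'sentence with a "quoted" word', "file_path": "path/to/example_one"},
    {"text": "sentence number two", "file_path": "path/to/example_two"},
]
jsonl = to_jsonl(records)
```

The resulting string can be written to a file and loaded line by line with json.loads without errors, even when the text contains quotation marks.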

Thanks for your expertise!
