Keep case in annotation UI, but model case-insensitive

simon.gurcke · December 24, 2019, 5:33am

What does it take to make the TextClassifier model in the loop case-insensitive while still showing texts with intact case to the annotator?

ines · December 24, 2019, 11:45am

I think the easiest way would be to add an "html" key to your data that contains the regular-cased version and make the "text" lowercase. Prodigy will then show you the HTML in the interface, but the model in the loop (and any model you train from the data) will still be updated with the "text" value.
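A minimal sketch of that data preparation in plain Python. It assumes the source data is a JSONL file with one {"text": ...} object per line. The helper names `make_task`, `check_task` and `convert_jsonl` are hypothetical and not part of Prodigy. The original text is HTML-escaped so characters like "<" or "&" show up literally in the interface instead of being parsed as markup. A small check confirms that the displayed version and the model version differ only in case.

```python
import html
import json


def make_task(original_text: str) -> dict:
    """Build one task: lowercased "text" for the model, original-case "html" for display."""
    return {
        # The model in the loop is updated with this value.
        "text": original_text.lower(),
        # Escaped so "<", ">" and "&" render as text, not markup.
        "html": html.escape(original_text),
    }


def check_task(task: dict) -> bool:
    """True if what the annotator sees and what the model gets differ only in case."""
    return html.unescape(task["html"]).lower() == task["text"]


def convert_jsonl(in_path: str, out_path: str) -> None:
    """Read {"text": ...} lines (hypothetical input layout) and write tasks with "text" and "html"."""
    with open(in_path, encoding="utf-8") as f_in, open(out_path, "w", encoding="utf-8") as f_out:
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            task = make_task(json.loads(line)["text"])
            if not check_task(task):
                raise ValueError(f"Display and model text diverge: {task!r}")
            f_out.write(json.dumps(task, ensure_ascii=False) + "\n")
```

Escaping matters because a raw "<b>" in a product review would otherwise be rendered as bold text. The annotator would then see something different from what the model is updated with.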

Some things to note (you're probably aware of this but just putting it here in case others come across this thread later):

When using this solution, it's of course especially important to make sure the texts match and what the annotator sees is the same content the model is updated with. There's always a small risk in showing the annotators something different, especially if that version contains strong signals that influence the annotation decision.
When training the final model, it can still make sense to update it with both versions: lowercase and the original text. This will make it truly insensitive to case. If the model is only trained on lowercase text, it'd require all runtime inputs to be lowercased, too – otherwise, any capitalisation can throw it off completely.
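The two training notes can be sketched in plain Python. This assumes each annotated task keeps the escaped original text under "html" and has binary-style "label" and "answer" fields ("accept", "reject" or "ignore"). That is an assumption about the export format, so check it against your own data. The (text, {label: bool}) pair format and all helper names are hypothetical. Adapt them to whatever training API you actually use.

```python
import html
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical example format: (text, {label: is_positive})
Example = Tuple[str, Dict[str, bool]]


def examples_from_tasks(tasks: Iterable[dict]) -> Iterator[Example]:
    """Recover the original-case text from "html"; assumes "label"/"answer" keys."""
    for task in tasks:
        if task.get("answer") == "ignore":
            continue  # ignored tasks carry no label information
        original = html.unescape(task["html"])
        yield original, {task["label"]: task["answer"] == "accept"}


def augment_with_case_variants(examples: Iterable[Example]) -> List[Example]:
    """Keep each example as-is and add a lowercased copy, unless it is already lowercase."""
    out: List[Example] = []
    for text, cats in examples:
        out.append((text, dict(cats)))
        lowered = text.lower()
        if lowered != text:
            out.append((lowered, dict(cats)))
    return out


def lowercasing_predictor(
    predict: Callable[[str], Dict[str, float]]
) -> Callable[[str], Dict[str, float]]:
    """For a model trained on lowercase text only: lowercase every runtime input first."""
    def wrapped(text: str) -> Dict[str, float]:
        return predict(text.lower())
    return wrapped
```

Training on the augmented list covers the first option, where the model sees both versions. Wrapping the prediction function covers the lowercase-only case, so that capitalised input never reaches the model unchanged.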
