Keep case in annotation UI, but make the model case-insensitive

What does it take to make the TextClassifier model in the loop case-insensitive while still showing texts with intact case to the annotator?

I think the easiest way would be to add an "html" key to your data that contains the regular-cased version and make the "text" lowercase. Prodigy will then show you the HTML in the interface, but the model in the loop (and any model you train from the data) will still be updated with the "text" value.
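In case it helps, here's a minimal sketch of that preprocessing step in plain Python. The function name `add_display_html` and the sample texts are just made up for illustration, and it assumes your stream is an iterable of dicts with a "text" key. Since the "html" value is rendered as HTML, the sketch also escapes it so characters like `<` and `&` show up literally.

```python
import html


def add_display_html(examples):
    """Yield a copy of each task with the original-cased text in "html"
    and a lowercased "text" for the model. (Hypothetical helper.)"""
    for eg in examples:
        original = eg["text"]
        task = dict(eg)  # shallow copy, so the input dicts stay untouched
        # Escape the display version so markup-like characters are shown as-is.
        task["html"] = html.escape(original)
        # This is the value the model in the loop is updated with.
        task["text"] = original.lower()
        yield task


# Made-up sample data, only to show the shape of the tasks.
stream = [
    {"text": "Apple releases new iPhone", "meta": {"source": "demo"}},
    {"text": "R&D costs < $5M"},
]
tasks = list(add_display_html(stream))
```

You'd then wrap your stream with something like this before it's passed to the recipe, or run it once over your source file and save the result.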

Some things to note (you're probably aware of this but just putting it here in case others come across this thread later):

  • When using this solution, it's especially important to make sure the two versions match apart from case, so that what the annotator sees is the same content the model is updated with. There's always a small risk in showing the annotators something different from what the model gets, especially if the displayed version contains strong signals that influence the annotation decision.
  • When training the final model, it can still make sense to update it with both versions: the lowercase text and the original text. This should make it much more robust to differences in case. If the model is only trained on lowercase text, all runtime inputs would need to be lowercased, too. Otherwise, any capitalisation can throw it off completely.
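For the second point, a rough sketch of how the training data could be expanded to include both versions. This assumes each annotated example has the lowercased version in "text" and the original-cased version in "html" (HTML-escaped, hence the `html.unescape`), and `with_both_cases` is a hypothetical helper name, not something built in.

```python
import html


def with_both_cases(annotated):
    """Yield each annotation once with the lowercased text and, if it
    differs, once more with the original-cased text. (Hypothetical helper.)"""
    for eg in annotated:
        lowered = eg["text"]
        # Recover the original text from the display version, if present.
        original = html.unescape(eg.get("html", lowered))
        variants = [lowered] if original == lowered else [lowered, original]
        for text in variants:
            # Drop the display-only key and keep label, answer etc. as they are.
            out = {k: v for k, v in eg.items() if k != "html"}
            out["text"] = text
            yield out


# Made-up annotations, only to show the shape of the data.
annotated = [
    {"text": "apple releases new iphone", "html": "Apple releases new iPhone",
     "label": "TECH", "answer": "accept"},
    {"text": "r&d costs < $5m", "html": "R&amp;D costs &lt; $5M",
     "label": "TECH", "answer": "reject"},
    {"text": "all lowercase already", "html": "all lowercase already",
     "label": "TECH", "answer": "reject"},
]
train_examples = list(with_both_cases(annotated))
```

Texts that were already all lowercase are only emitted once, so they don't end up duplicated in the training set.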