Exporting NER annotations for HF datasets

Just a quick question in case someone has already done this: I've exported my NER annotations to JSONL and want to train a model with the transformers library, loading the data through the HF datasets package. Has anybody converted the Prodigy span format into something that loads cleanly into a dataset and then into a transformers NER model?

I've done it! With some very slow pandas along the way, unfortunately. I'll try to optimise and then share at some point.

Awesome! If there's something you want to share, that'd be great, and we'd be happy to help you tidy it up so we can maybe make it a proper integration. That'd definitely be super cool to have :raised_hands:

If anyone is interested, this is part of an upcoming paper using transformers for adverse drug reaction detection. We're using Prodigy for all our annotation. Here is the repo (AustinMOS/adr-nlp on GitHub) for training an NER model using a JSONL file of annotations exported from Prodigy.
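
For anyone who just wants the gist of the conversion, here is a minimal sketch in plain Python (no pandas). It is not the code from the repo above, only an illustration under a few assumptions: each JSONL line has Prodigy's usual "tokens" (with "text", "start", "end") and "spans" (with "start", "end", "label") fields, spans line up with token boundaries, and a BIO tagging scheme is wanted. The function names `prodigy_to_bio` and `load_prodigy_jsonl` are made up for this example.

```python
import json


def prodigy_to_bio(example):
    """Convert one Prodigy NER task (a dict) into tokens plus BIO tags.

    Assumes character offsets on tokens and spans, and that every span
    starts and ends on a token boundary.
    """
    tokens = example["tokens"]
    tags = ["O"] * len(tokens)
    for span in example.get("spans", []):
        inside = False
        for i, tok in enumerate(tokens):
            # A token belongs to the span if it lies fully within it.
            if tok["start"] >= span["start"] and tok["end"] <= span["end"]:
                tags[i] = ("I-" if inside else "B-") + span["label"]
                inside = True
    return {"tokens": [t["text"] for t in tokens], "ner_tags": tags}


def load_prodigy_jsonl(path):
    """Read a Prodigy JSONL export and keep only accepted examples."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            example = json.loads(line)
            # Examples without an "answer" key are treated as accepted here.
            if example.get("answer", "accept") != "accept":
                continue
            records.append(prodigy_to_bio(example))
    return records
```

The resulting list of dicts should be usable with `datasets.Dataset.from_list(records)` in recent versions of datasets. The string tags still need mapping to integer label ids, and the labels need aligning to the model's subword tokens, before training with transformers.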
