Sequence labelling with Prodigy?

Hi,
I’ve got a list of phrases with entities separated by commas, as shown below.
Phrase 1: “Entity 1, Entity 2, Entity 3, Entity 4”
Phrase 2: “Entity 5, Entity 6, Entity 7, Entity 8, Entity 9”

I have 5 categories, and each entity belongs to exactly one category. I could use a text classifier, but the position of each entity and the labels of the previous entities matter, so it’s closer to a sequence labelling problem.
Can I use Prodigy’s NER feature for this? I’ve tried to create a recipe similar to ner.teach with a custom tokenizer, but it doesn’t seem to do the trick.

Do you already have the data labelled? If so, you might want to work with spaCy directly. I’d guess the NER model could learn your data. Check that it can memorise a small sample first – train it on a few examples, and evaluate it on the same ones to make sure it’s learning them.
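For the memorisation check, you need a score comparing the model’s predictions with your gold labels on those same few examples. Below is a minimal sketch of an exact-match, micro-averaged F1 over (start, end, label) triples. `span_f1` is a hypothetical helper, not a spaCy function. In spaCy, the predicted set for one text could be built as `{(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents}`. The training loop itself is left out.

```python
from typing import Iterable, Set, Tuple

# One entity: (start_char, end_char, label)
Span = Tuple[int, int, str]


def span_f1(gold: Iterable[Set[Span]], predicted: Iterable[Set[Span]]) -> float:
    """Micro-averaged F1 over exact (start, end, label) matches.

    gold and predicted are parallel sequences: one set of spans per text.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, predicted):
        tp += len(gold_spans & pred_spans)  # right boundaries and right label
        fp += len(pred_spans - gold_spans)  # predicted but not in the gold data
        fn += len(gold_spans - pred_spans)  # in the gold data but missed
    if tp + fp + fn == 0:
        return 1.0  # nothing to find and nothing predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with gold `[{(0, 8, "CAT_A"), (10, 18, "CAT_B")}]` and predictions `[{(0, 8, "CAT_A")}]` (hypothetical category names), the score is 2/3. If the model still scores well below 1.0 on the examples it was trained on after many iterations, that suggests it isn’t learning them.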

Actually, I think the dependency parser would be able to learn your data as well, if you make a little tree out of each phrase instead of a flat list. You could make each entity depend on the one immediately after it. This might perform better, because the parser conditions more carefully on its current state than the NER model does.
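One possible reading of the chain idea, as a sketch only: treat each comma-separated entity as a single token. Each entity’s head is the next entity, its arc label is its category, and the last entity is the root (head = itself). Each comma hangs off the entity before it with a hypothetical "punct" label. The output keys (`words`, `heads`, `deps`, with absolute head indices) follow the annotation dict that spaCy v3’s `Example.from_dict` accepts. However, `chain_parse` is a hypothetical helper, a matching one-token-per-entity tokenizer is assumed and not shown, and whether the parser keeps a category label on the root arc (rather than "ROOT") is an assumption to verify.

```python
import re
from typing import Dict, List

# A token is either a comma or one whole comma-separated item (outer spaces trimmed).
TOKEN_RE = re.compile(r",|[^,\s](?:[^,]*[^,\s])?")


def chain_parse(text: str, labels: List[str]) -> Dict[str, list]:
    """Build a chain-shaped dependency tree for one comma-separated phrase.

    Each item's head is the item immediately after it and its arc label is
    the item's category; the last item is the root (its head is itself).
    Each comma attaches to the nearest item before it with the label "punct".
    Heads are absolute token indices.
    """
    words = [m.group() for m in TOKEN_RE.finditer(text)]
    item_idx = [i for i, w in enumerate(words) if w != ","]
    if not item_idx or len(item_idx) != len(labels):
        raise ValueError("need at least one item and exactly one label per item")
    heads = list(range(len(words)))
    deps = [""] * len(words)
    for k, i in enumerate(item_idx):
        is_last = k + 1 == len(item_idx)
        heads[i] = i if is_last else item_idx[k + 1]  # point at the next item
        deps[i] = labels[k]
    for i, w in enumerate(words):
        if w == ",":
            before = [j for j in item_idx if j < i]
            # a leading comma has no item before it, so attach it to the first item
            heads[i] = before[-1] if before else item_idx[0]
            deps[i] = "punct"
    return {"words": words, "heads": heads, "deps": deps}
```

For "Entity 1, Entity 2, Entity 3" with labels A, B, C, the heads come out as [2, 0, 4, 2, 4]. The resulting tree is projective, so the parser can learn it. When the parser predicts the label of an entity, the entities before it have already been attached, which is the "condition on the current state" point above.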

Thx for the quick reply.
The data is not labelled yet. I was planning on using Prodigy for the labelling (since Prodigy helps minimise the amount of data to be labelled), but I was looking for a simple way to make Prodigy understand that the entity boundaries are always commas. If that’s not possible, I’ll label the data another way.
I understand the spaCy NER option, but not the parser option. Could you elaborate?
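This thread doesn’t establish whether Prodigy can enforce comma boundaries on its own. One workaround, sketched under assumptions, is to compute the entity offsets from the commas yourself and write them into each task, so that only the labels need annotating. The sketch assumes Prodigy’s JSONL task shape (`text` plus `spans`, each span with `start`, `end` and, when known, `label`). It also assumes entities never contain commas themselves. `entity_spans` and `to_task` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional, Tuple

# One entity = a run of non-comma text with the surrounding spaces trimmed.
ENTITY_RE = re.compile(r"[^,\s](?:[^,]*[^,\s])?")


def entity_spans(text: str) -> List[Tuple[int, int]]:
    """Character offsets (start, end) of each comma-separated entity."""
    return [(m.start(), m.end()) for m in ENTITY_RE.finditer(text)]


def to_task(text: str, labels: Optional[List[str]] = None) -> Dict:
    """Build a task dict with one span per entity, boundaries fixed at the commas."""
    offsets = entity_spans(text)
    if labels is not None and len(labels) != len(offsets):
        raise ValueError("need exactly one label per entity")
    spans = []
    for i, (start, end) in enumerate(offsets):
        span = {"start": start, "end": end}
        if labels is not None:
            span["label"] = labels[i]  # attach the label only when one is known
        spans.append(span)
    return {"text": text, "spans": spans}
```

For example, `to_task("Entity 5, Entity 6")` gives spans at offsets (0, 8) and (10, 18). Once the data is labelled, passing one label per entity produces span annotations whose boundaries always fall on the commas.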