BILUO or IOB?

areversat · November 14, 2018, 8:02pm

Hi,

I’m looking into adding a few entities. I’m in the process of creating a dataset that contains both the existing entities and my new ones.

To do this, I created a tool that runs NER using the en_core_web_lg model and then lets me edit the entities. One problem is that I can't find a way to get the BILUO “value” from the token. token.ent_iob_ gives me the IOB “value”, but I'd like to be able to have tags such as B-ORG, I-ORG, L-ORG.

Is there an easy way to get those from the tokens? Is it important? Is it a problem if I mix IOB entities and BILUO entities?

honnibal · November 15, 2018, 11:49am

Hi @areversat,

There are some helper methods in the spacy.gold module that I think will help you, specifically the functions spacy.gold.iob_to_biluo, and perhaps also the function spacy.gold.biluo_tags_from_offsets.
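To illustrate what this kind of conversion does, here is a minimal pure-Python sketch. It is not spaCy's implementation; the function name iob2_to_biluo_sketch is made up for this example, and it assumes the input tags are already in the "B-ORG" / "I-ORG" / "O" form, with every entity starting on a B tag.

```python
def iob2_to_biluo_sketch(tags):
    """Convert IOB tags (entities always start with B-) to BILUO tags.

    Hypothetical helper for illustration only, not part of spaCy.
    """
    biluo = []
    for i, tag in enumerate(tags):
        if tag == "O":
            biluo.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The entity continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            # B followed by nothing becomes a single-token entity (U).
            biluo.append(("B-" if continues else "U-") + label)
        elif prefix == "I":
            # The final I of an entity becomes L.
            biluo.append(("I-" if continues else "L-") + label)
        else:
            raise ValueError("Unexpected tag: " + tag)
    return biluo


tags = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "B-GPE"]
print(iob2_to_biluo_sketch(tags))
# ['O', 'B-ORG', 'I-ORG', 'L-ORG', 'O', 'U-GPE']
```

The only information BILUO adds over IOB is whether a token ends an entity, which is why a one-token lookahead is enough here.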

I would usually recommend storing annotations in a stand-off format, like Prodigy does. Specifically, this means recording the start and end character offsets of each entity, along with its label. The problem with BILUO is that it ties the entity annotation to the tokens, when really the token boundaries are also an annotation: they might be incorrect, and they don't preserve all of the information in the document.

As for mixing the BILUO and IOB encodings, potentially this would be a problem, yes! In the IOB scheme, the tag for a single-word entity is B. This would be an invalid sequence in BILUO, since in BILUO all B tags must be followed by I or L.
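A rough way to see the problem is to check whether a tag sequence is well-formed BILUO. The sketch below is an assumption-laden illustration, not a spaCy function: is_valid_biluo is a hypothetical name, and the rules it encodes are just the ones described above (B must be followed by I or L with the same label, and single-token entities use U).

```python
def is_valid_biluo(tags):
    """Return True if the tags form a well-formed BILUO sequence.

    Hypothetical checker for illustration only, not part of spaCy.
    """
    open_label = None  # label of the entity currently being built, if any
    for tag in tags:
        prefix, _, label = tag.partition("-")
        if tag == "O":
            if open_label is not None:
                return False  # an entity was opened but never closed with L
        elif prefix in ("B", "U"):
            if open_label is not None:
                return False  # a new entity starts inside an unfinished one
            open_label = label if prefix == "B" else None
        elif prefix in ("I", "L"):
            if open_label is None or open_label != label:
                return False  # I/L without a matching B before it
            if prefix == "L":
                open_label = None
        else:
            return False  # unknown tag prefix
    return open_label is None


# A single-word entity tagged IOB-style (B only) is not valid BILUO.
print(is_valid_biluo(["B-ORG", "O"]))           # False
print(is_valid_biluo(["U-ORG", "O"]))           # True
print(is_valid_biluo(["B-ORG", "L-ORG", "O"]))  # True
```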

areversat · November 15, 2018, 8:07pm

First of all thanks for the advice and for the tools you build.

So if I understand correctly, I would have something like the following (as in https://github.com/explosion/spacy/blob/master/examples/training/train_new_entity_type.py):

{'text': 'According to an estimate by Bank of America, something or other', 'entities': [(28, 43, 'ORG')]}

That would be enough, and I would let spaCy figure out what it needs to understand about the tokens in order to make an accurate prediction.
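As a quick sanity check on an annotation like the one above, the character offsets can be sliced straight out of the text. This is only a sketch in plain Python; entity_spans is a name I made up for it, and it assumes each entity is a (start, end, label) tuple with an exclusive end offset.

```python
example = {
    "text": "According to an estimate by Bank of America, something or other",
    "entities": [(28, 43, "ORG")],
}


def entity_spans(example):
    """Return (surface text, label) pairs for each stand-off annotation.

    Hypothetical helper for illustration only.
    """
    text = example["text"]
    return [(text[start:end], label) for start, end, label in example["entities"]]


print(entity_spans(example))
# [('Bank of America', 'ORG')]
```

Because the offsets refer to characters rather than tokens, the annotation stays valid even if the tokenization changes later.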

areversat · November 15, 2018, 8:38pm

As it turns out, Prodigy's ner.make-gold should work for my use case.
