Fully manual NER annotations without tokeniser

juhai · May 7, 2020, 2:43pm

Hello, I am new to Prodigy.

My task is annotating specific characters in a text with categories. I don't want to use spaCy for modelling, so I'm struggling with the requirement that ner.manual tokenizes the text.

I would like annotations defined relative to the original text rather than the token-based annotations that ner.manual produces. I've searched the forum for an answer but haven't found one so far.

Any ideas if what I want is possible?

ines · May 8, 2020, 7:45pm

Hi! Your question is well timed: v1.10 will have an out-of-the-box mode for this that lets you highlight individual characters.

In the meantime, you could achieve something similar by making each character its own entry in the "tokens". They're called "tokens", but in practice they're just the units you can highlight. You'll then probably also want to reduce the margin on the .prodigy-content span elements (the characters) so they're not spaced so far apart.
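A minimal sketch of that workaround, assuming Prodigy's pre-tokenised JSONL input format (a "text" string plus a "tokens" list of dicts with "text", "start", "end", "id" and "ws" keys) and assuming ner.manual keeps tokens already present in the input instead of re-tokenising. The helper names char_tokens and make_task are hypothetical, and the CSS margin tweak is not covered here.

```python
import json


def char_tokens(text):
    """Return one token dict per character of `text`.

    Keys follow Prodigy's pre-tokenised JSONL format (assumption):
    "text", "start", "end", "id", "ws". "ws" is False for every token
    because spaces become tokens of their own rather than trailing whitespace.
    """
    return [
        {"text": ch, "start": i, "end": i + 1, "id": i, "ws": False}
        for i, ch in enumerate(text)
    ]


def make_task(text):
    """Build one input record that carries its own character-level tokens."""
    return {"text": text, "tokens": char_tokens(text)}


if __name__ == "__main__":
    # Print JSONL to stdout; redirect it into a file to use as the recipe
    # source, e.g. python make_chars.py > chars.jsonl (hypothetical names).
    for line in ["Order AB-12 now", "Ref: X9"]:
        print(json.dumps(make_task(line)))
```

Since every token spans exactly one character, a highlighted span's character offsets map one-to-one onto token positions, so nothing is lost to tokenizer boundaries.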

juhai · May 11, 2020, 12:12pm

Hi Ines. Thanks for the response, and I'm happy to hear this is already in the works. I tried splitting by character but didn't know about adjusting the span styling. I'll try that out while you finalise the 1.10 release.
Cheers!

ines · June 17, 2020, 5:45pm

Just released Prodigy v1.10, which includes a --highlight-chars flag that lets you highlight individual characters instead of tokens. Also see here for details and examples.
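With either approach, the saved spans should point into the original text via character offsets. A minimal sketch for reading them back, assuming records exported with "prodigy db-out" follow Prodigy's usual JSONL shape ("text", "answer", and a "spans" list whose items carry "start", "end" and "label"); the function names char_spans and read_annotations are hypothetical.

```python
import json


def char_spans(task):
    """Return (start, end, label, covered_text) for each span in a record.

    Assumes "start"/"end" are character offsets into task["text"].
    """
    text = task["text"]
    return [
        (s["start"], s["end"], s["label"], text[s["start"]:s["end"]])
        for s in task.get("spans", [])
    ]


def read_annotations(lines):
    """Yield (text, spans) for every accepted record in JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        if task.get("answer") != "accept":
            continue  # ignore rejected or skipped examples
        yield task["text"], char_spans(task)


if __name__ == "__main__":
    # Hypothetical exported record, for illustration only.
    sample = {
        "text": "Order AB-12 now",
        "spans": [{"start": 6, "end": 11, "label": "CODE"}],
        "answer": "accept",
    }
    for text, spans in read_annotations([json.dumps(sample)]):
        print(text, spans)
```

Because the offsets index the original string, text[start:end] recovers exactly the highlighted characters, independent of how the text was split for display.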

Related topics:
- Can I use rel.manual without tokenization? (December 7, 2022)
- NER with commas in the word through ner.correct (September 12, 2022)
- Working at the character level (June 26, 2019)
- What would be the way/task to highlight/annotate text from a variable span of characters to be decided by the annotator? (November 17, 2020)
- Character-based annotation issues with Arabic (April 22, 2022)
