Manual Input of Entities to a Prodigy database

a.arranz · July 7, 2021, 7:39pm

Hi, I was wondering if there is a way to manually input a dictionary of company names into the Prodigy database and label them, for example as ORG. If not, is it possible to create my own custom recipe for this?

Thanks

example
{"text": "WMT"}
{"text": "Walmart Inc."}
{"text": "Walmart"}
{"text": "PTR"}
{"text": "PetroChina Company Limited"}
{"text": "VWAGY"}
{"text": "Volkswagen AG ADR Repstg 1/10th Sh"}
{"text": "Volkswagen"}
{"text": "AMZN"}
{"text": "Amazon.com Inc."}
{"text": "Amazon"}
{"text": "Amazon.com"}
{"text": "KELYB"}
{"text": "Kelly Services Inc. Class B Common Stock"}
{"text": "Kellyservices"}
{"text": "Kelly Services"}
{"text": "KELYA"}
{"text": "Kelly Services Inc. Class A Common Stock"}
{"text": "CHL"}
{"text": "China Mobile Limited"}
{"text": "Chinamobileltd"}
{"text": "China Mobile"}

ines · July 8, 2021, 3:43am

Hi! In that case, you could just load the patterns with spaCy directly to label all matches automatically and then use that data to pretrain your model. My comment here explains how to do this:

Using the EntityRuler has the advantage that it takes patterns in the same format as Prodigy and takes care of filtering out overlaps (which can theoretically occur with multiple patterns).
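For illustration, here is a minimal sketch of that idea, assuming spaCy v3 and a blank English pipeline. The pattern list and the helper `to_prodigy_example` are hypothetical, and the `"answer": "accept"` field is an assumption about how the resulting records would be marked as accepted annotations; the linked comment may do this differently.

```python
import spacy

# Hypothetical patterns: one entry per company name, all labelled ORG.
patterns = [
    {"label": "ORG", "pattern": "Walmart"},
    {"label": "ORG", "pattern": "Walmart Inc."},
    {"label": "ORG", "pattern": "Amazon"},
]

# A blank pipeline is enough here: only the tokenizer and the ruler are needed.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)


def to_prodigy_example(text):
    """Run the ruler over a text and return a dict with character-offset spans."""
    doc = nlp(text)
    spans = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    # Assumption: the matches are treated as accepted annotations.
    return {"text": doc.text, "spans": spans, "answer": "accept"}


example = to_prodigy_example("Walmart Inc. and Amazon both reported earnings.")
print(example["spans"])
```

Because "Walmart" and "Walmart Inc." overlap, the ruler keeps only the longer match. Records like this, written out one per line as JSONL, could then be imported into a dataset with `prodigy db-in`.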

a.arranz · July 8, 2021, 7:01pm

All right, thank you very much.

a.arranz · July 8, 2021, 7:05pm

Sorry, another question: I was wondering if Prodigy would be able to pick up misspellings by commenters in social media groups, for example when someone writes Volksvagen instead of Volkswagen. Is there a way to compare an actual Prodigy database entity with the misspelled 'entity', so that Prodigy identifies it as an entity if the similarity is high enough? Or would I have to manually curate all the misspellings and slang people use for companies?

a.arranz · July 8, 2021, 8:09pm

Yes, I already have a set of curated labels in a Prodigy model. What I forgot to ask in my question is: when I upload the JSON file directly to the Prodigy database, is there a way I can set an ORG label for all of the companies in that file?

ines · July 10, 2021, 6:52am

This is something that a trained model would be able to do, and one of the advantages of training a model to predict similar entities in similar contexts (as opposed to just exact pattern matching). So if your training data is good and representative, your model will also be able to pick up on similar entities, including misspellings.

The approach I linked above lets you create Prodigy annotations based on a patterns file, so if you include patterns with the label ORG, the matches will be labelled as ORG in the data.
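As a rough sketch of how the name list from the first post could be turned into such a patterns file: the snippet below uses only the standard library, and the helper name `names_to_patterns` and the inline sample data are hypothetical.

```python
import json

# A few lines in the same shape as the example posted above.
names_jsonl = """{"text": "WMT"}
{"text": "Walmart Inc."}
{"text": "Walmart"}"""


def names_to_patterns(lines, label="ORG"):
    """Turn {"text": name} records into {"label": ..., "pattern": name} entries."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name = json.loads(line)["text"]
        patterns.append({"label": label, "pattern": name})
    return patterns


patterns = names_to_patterns(names_jsonl.splitlines())

# One JSON object per line, ready to be written to a .jsonl patterns file.
patterns_jsonl = "\n".join(json.dumps(p) for p in patterns)
print(patterns_jsonl)
```

String patterns like these match the exact text, case-sensitively by default, so variants such as "Kellyservices" still need their own entries.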
