named entity extraction wrong

pathapatisivayya · July 22, 2019, 12:19pm

I manually trained a few entities such as Skill, Role, Employer …

After training, I ran the model on the sample text below:


text="""s rocket eNgine combustion planner system using Python with the usage of restapi 
written in,Flask,with computation package written in, C++, and maintained database in,Amazon Redshift,with the help of,PostgreSQl,"""

doc = nlp(text)
# Group the entity texts by their label
label_list = [str(ent.label_) for ent in doc.ents]
data = {}
for label in label_list:
    data[label] = [str(e) for e in doc.ents if e.label_ == label]

skills = ""

if 'SKILL' in data:
    skills = ','.join(data["SKILL"])
    print(skills)

It works properly and gives the correct output:

Output: Python,Flask,C,database,Amazon Redshift,PostgreSQL

But when I tried extracting skills from a sentence in which the skills are not separated by commas, the model failed to extract the correct skills:

Output: flask,PostgreSQL,Amazon Redshift

text = """I worked on NASA's rocket engine planner system using Python with the usage of restapi written in Flask with computation package written in C++ and maintained database in Amazon Redshift with the help of PostgreSQL"""

Please help me extract skills from text that has no commas.

Thanks in advance.
Cheers,
Shiva

honnibal · July 22, 2019, 12:31pm

If all of your training examples have commas around the entities, then it makes sense that the model would learn that.

The best solution would be to do some text pre-processing to clean up the commas in your data. If you do think they’ll be useful, you can add two copies of the text to your training data: one copy where the commas are present, and one copy where the commas are absent. You’ll just have to adjust the span annotations so that the offsets remain correct after you’ve done the string processing.
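As a rough sketch of that string processing (not an official spaCy or Prodigy utility): the function below removes commas and remaps character offsets so the annotations still point at the same words. The name `remove_commas` and the `(start, end, label)` tuple layout are assumptions made for illustration; adapt them to whatever format your annotations use.

```python
def remove_commas(text, spans):
    """Drop commas from `text` and shift (start, end, label) spans to match."""
    out = []        # characters of the cleaned text
    index_map = []  # index_map[i] = offset in cleaned text for offset i in `text`
    for i, ch in enumerate(text):
        index_map.append(len(out))
        if ch == ",":
            prev_is_space = not out or out[-1].isspace()
            next_is_space = i + 1 >= len(text) or text[i + 1].isspace()
            # Keep one space if dropping the comma would glue two words together
            if not prev_is_space and not next_is_space:
                out.append(" ")
            continue
        out.append(ch)
    index_map.append(len(out))  # lets a span end at the very end of the text
    cleaned = "".join(out)
    new_spans = [(index_map[s], index_map[e], label) for s, e, label in spans]
    return cleaned, new_spans


text = "written in,Flask,with C++, and PostgreSQL"

def span_of(word):
    # Hypothetical helper: locate the first occurrence of `word` in `text`
    start = text.index(word)
    return (start, start + len(word), "SKILL")

spans = [span_of("Flask"), span_of("C++"), span_of("PostgreSQL")]
cleaned, new_spans = remove_commas(text, spans)
print(cleaned)
print([cleaned[s:e] for s, e, _ in new_spans])
```

To keep both variants in the training data, you would add `(text, spans)` and `(cleaned, new_spans)` as two separate examples.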

pathapatisivayya · July 22, 2019, 12:35pm

How do I clean up the commas in the data?

Could you please send me some Python code for cleaning up the commas?

Thanks in advance.
Cheers,
Shiva

honnibal · July 22, 2019, 12:37pm

I’m sorry, but general Python programming support is outside the scope of the help we can offer. You could try looking for a consultant here: spaCy/prodigy consultants?, or perhaps more generally on a site like freelancer.com.

pathapatisivayya · July 22, 2019, 12:38pm

Hi Matthew,

Thanks for the response.

In fact, I was able to find the skill 'Flask' in the pre-trained JSONL file.

{"label":"SKILL","pattern":[{"lower":"database"}]}
{"label":"SKILL","pattern":[{"lower":"restapi"}]}
{"label":"SKILL","pattern":[{"lower":"Flask"}]}
{"label":"SKILL","pattern":[{"lower":"PostgreSQl"}]}
{"label":"SKILL","pattern":[{"lower":"Amazon Redshift"}]}
{"label":"SKILL","pattern":[{"lower":"amzon redshifit"}]}
{"label":"SKILL","pattern":[{"lower":"amazonredshift"}]}

But the model does not identify the keyword Flask in the given phrase.

> text = """I worked on NASA's rocket eNgine combustion planner system using Python with the usage of restapi
> written in Flask with computation package written in C++ and maintained database in Amazon Redshift with
> the help of PostgreSQl"""
> skill_set = nlp(text)
> for skill in skill_set.ents:
>     print(skill)

It successfully identifies 4 skills:

NASA
Python
C++
database

But it is unable to identify skills like Flask, Restapi, PostgreSQl and Amazon Redshift, even though these skills are in both the trained JSON and the given input. And yes, as you mentioned, we pre-processed the text before training: we removed extra spaces, commas, etc.

Can you please tell me what I'm doing wrong?

Thanks in Advance.
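
One thing that may be worth checking in the patterns listed above, offered as a sketch rather than a confirmed diagnosis: in spaCy's token-based matching, each dictionary in a pattern describes a single token, and the "lower" attribute is compared against the lowercased token text. A value such as "Flask" (mixed case) or "Amazon Redshift" (two words in one token) would therefore not be expected to match. The helper `check_patterns` below is hypothetical and only flags such values in a patterns file; it does not call spaCy or Prodigy.

```python
import json

def check_patterns(lines):
    """Return (line_no, value, problem) for each suspicious "lower" value."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        entry = json.loads(line)
        pattern = entry["pattern"]
        if isinstance(pattern, str):
            continue  # plain string patterns are not checked in this sketch
        for token in pattern:
            value = token.get("lower")
            if value is None:
                continue
            if value != value.lower():
                problems.append((line_no, value, "not lowercase"))
            if " " in value:
                problems.append((line_no, value, "spans several tokens"))
    return problems

patterns = [
    '{"label":"SKILL","pattern":[{"lower":"database"}]}',
    '{"label":"SKILL","pattern":[{"lower":"Flask"}]}',
    '{"label":"SKILL","pattern":[{"lower":"Amazon Redshift"}]}',
    # One dictionary per token, all lowercase:
    '{"label":"SKILL","pattern":[{"lower":"amazon"},{"lower":"redshift"}]}',
]
problems = check_patterns(patterns)
for problem in problems:
    print(problem)
```

Under that reading, the last entry in `patterns` shows the shape a two-word skill would take: one lowercase dictionary per token.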

