Complex entities capabilities

Can I define complex entities that not only contain combinations of a few words each, but sometimes share only contextual similarity? It’s for a chatbot.
An example: suppose I set an entity like “@feeling good”. People can report “I’m feeling good” or “I have a good feeling”. I want to be able to associate “I had a great day today” or “I’m full of energy” with the same entity “@feeling good” (with a certain score, of course).

My question is: if I train the model as described, can I expect it to extract entities even when they are phrased totally differently? I’d like to extract entities even if they’re not exactly as they appear on the list.
(Some people advised me to set it up as an “intent”, but that won’t work in my case.)
Thank you

I think what you’re describing isn’t really an “entity”, and I think the entity recogniser is probably not the best choice for what you want.

The entity recogniser is designed to recognise phrases that have distinct beginnings and ends. It reads the text left-to-right, and keeps track of a small amount of state as it goes. Specifically, it tracks whether an entity is open, and what type of entity it is. If an entity is currently open, it chooses between the actions of continuing the entity, or marking its end on the current word. If no entity is open, the actions available are begin new entity, mark a single-word entity, or continue with no entities open.
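As a rough illustration of that state tracking, here is a minimal sketch in plain Python. The action names (`BEGIN`, `IN`, `LAST`, `UNIT`, `OUT`) and both functions are made up for this example and are not the library's actual internals; the point is only that the set of legal actions depends on whether an entity is currently open.

```python
from typing import List, Optional, Tuple

def valid_actions(open_label: Optional[str]) -> List[str]:
    """Return the actions available given the currently open entity (if any)."""
    if open_label is not None:
        # An entity is open: continue it, or end it on the current word.
        return ["IN", "LAST"]
    # No entity is open: begin one, mark a single-word entity, or move on.
    return ["BEGIN", "UNIT", "OUT"]

def apply_actions(actions: List[Tuple[str, Optional[str]]]) -> List[Tuple[int, int, str]]:
    """Turn one (action, label) pair per word into (start, end, label) spans."""
    spans = []
    open_label: Optional[str] = None
    start = 0
    for i, (action, label) in enumerate(actions):
        if action not in valid_actions(open_label):
            raise ValueError(f"action {action!r} is not valid at word {i}")
        if action == "BEGIN":
            open_label, start = label, i
        elif action == "LAST":
            spans.append((start, i + 1, open_label))
            open_label = None
        elif action == "UNIT":
            spans.append((i, i + 1, label))
        # "IN" and "OUT" leave the state unchanged.
    return spans

words = ["I", "saw", "Pink", "Floyd", "in", "London"]
actions = [("OUT", None), ("OUT", None), ("BEGIN", "BAND"),
           ("LAST", "BAND"), ("OUT", None), ("UNIT", "GPE")]
print(apply_actions(actions))  # [(2, 4, 'BAND'), (5, 6, 'GPE')]
```

Every span needs an explicit first and last word, which is exactly what is hard to pin down for something like “feeling good”.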

The categories you’re interested in don’t really have distinct beginnings and ends. Even a human annotator will have trouble deciding whether to mark “I’m feeling good” or just “feeling good” as your phrase of interest. The model also only has a very limited amount of context available — 4 words on either side of the current word.

I think you’ll be much better off cutting your text into sentences, or possibly even shorter units, and then applying labels to the text with the text categoriser.
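To make the shape of that approach concrete, here is a minimal sketch in plain Python. The sentence splitter is deliberately naive, and `toy_categoriser` is a hypothetical keyword scorer standing in for a trained text categoriser, which would learn its scores from annotated examples instead. The `FEELING_GOOD` label and the cue words are assumptions for illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def toy_categoriser(sentence: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model: score 1.0 if a cue word appears."""
    cues = {"great", "good", "energy"}
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {"FEELING_GOOD": 1.0 if tokens & cues else 0.0}

def label_sentences(
    text: str, categorise: Callable[[str], Dict[str, float]]
) -> List[Tuple[str, Dict[str, float]]]:
    """Score each sentence as a whole, with no span boundaries to decide."""
    return [(sent, categorise(sent)) for sent in split_sentences(text)]

for sent, scores in label_sentences(
    "I had a great day today. The bus was late.", toy_categoriser
):
    print(sent, scores)
# I had a great day today. {'FEELING_GOOD': 1.0}
# The bus was late. {'FEELING_GOOD': 0.0}
```

The label applies to the whole unit of text, so nobody has to decide where the phrase starts or ends, and you still get a score per category.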

Thank you for your insight.
I will try to implement it in my project.
My plan is to use Prodigy for entity extraction, and a platform like Dialogflow for the intents.
Of course, I will have to try that and see which option works best for my ambitious (everybody around me says: “Way, way too ambitious”) project.

Well, it’s okay to have an ambitious project, but not every component needs to have an ambitious scope :) To make the technologies work well, you need to define problems the models can actually solve.

Like, let’s say you wanted to mine social media text to find band recommendations. You don’t want to go and tag all the bands you like with an entity label like BAND_I_LIKE. Whether you like the band or not probably isn’t in the local context of that example, so how’s the model supposed to predict that? Instead you want to just tag them BAND, and then separately have a table that lists which bands you like, and which bands you don’t.
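A minimal sketch of that split, in plain Python. The band names, the `LIKED_BANDS` table and the `liked_mentions` helper are all made up for illustration, and the `(text, label)` pairs stand in for whatever your entity recogniser outputs.

```python
from typing import Dict, List, Tuple

# Hypothetical preference table, maintained separately from the model.
LIKED_BANDS: Dict[str, bool] = {
    "blue herons": True,
    "static owls": False,
}

def liked_mentions(entities: List[Tuple[str, str]]) -> List[str]:
    """Keep BAND entities whose (lower-cased) name is marked as liked."""
    return [
        text
        for text, label in entities
        if label == "BAND" and LIKED_BANDS.get(text.lower(), False)
    ]

# (text, label) pairs as an entity recogniser might produce them.
found = [("Blue Herons", "BAND"), ("Static Owls", "BAND"), ("London", "GPE")]
print(liked_mentions(found))  # ['Blue Herons']
```

The model only has to answer “is this a band?”, which the local context can support. Whether you like the band is a plain lookup that you can change without retraining anything.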