Hi,
I am hoping for some advice or creative solutions to our challenge:
We are using Prodigy to train a text classifier and an entity recogniser for incoming request emails: the classifier identifies each email's intent and then, depending on that intent, the entity recogniser extracts the entities we need to fulfil the request.
Emails obviously have their challenges, but we have managed to train a classifier that identifies intents corresponding to the services we offer. Here I will focus on one of them: a request to send an asset (e.g. a video file) to a destination.
We have also trained an NER model that works well for identifying the asset's ID, TITLE, DURATION, etc.
The problem we now have is when people ask for multiple assets to be sent to multiple destinations, for example:
“Hi,\n\nPlease send ASSET1 and ASSET2 to DESTINATION 1, DESTINATION 2”
“Good morning,\n\nPlease distribute ASSET1 to DESTINATION1 and ASSET2 to DESTINATION2”
“Hello,\n\nPlease send to DESTINATION1 the following videos:\n\nASSET1 TITLE1 DURATION1, ASSET2 TITLE2 DURATION2”
We think we should train the relationships between ASSET and DESTINATION. We can see how we would create the training data if each email had just one asset going to one or more destinations, and then train with dep.batch-train. But that is clearly not the case in the examples above.
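To make the ambiguity concrete: with a single asset, the relation pairs follow mechanically (pair the asset with every destination), but with several assets the same two entity lists can mean "every asset to every destination" (presumably the first email) or "each asset to its own destination" (the second email). A minimal sketch in plain Python, assuming the NER output has already been reduced to lists of entity strings; the helper names are hypothetical:

```python
from __future__ import annotations

from itertools import product


def pairs_for_single_asset(assets: list[str], destinations: list[str]) -> list[tuple[str, str]]:
    """Hypothetical helper: (asset, destination) pairs when exactly one asset is mentioned."""
    if len(assets) != 1:
        raise ValueError("pairing is only unambiguous for exactly one asset")
    return [(assets[0], dest) for dest in destinations]


def all_to_all(assets: list[str], destinations: list[str]) -> list[tuple[str, str]]:
    """Reading 1: every asset goes to every destination."""
    return list(product(assets, destinations))


def one_to_one(assets: list[str], destinations: list[str]) -> list[tuple[str, str]]:
    """Reading 2: assets and destinations are paired in order of appearance."""
    if len(assets) != len(destinations):
        raise ValueError("in-order pairing needs equal counts")
    return list(zip(assets, destinations))


if __name__ == "__main__":
    assets, dests = ["ASSET1", "ASSET2"], ["DESTINATION1", "DESTINATION2"]
    print(all_to_all(assets, dests))  # 4 pairs
    print(one_to_one(assets, dests))  # 2 pairs
```

Both readings are consistent with the same entity lists, so the entities alone can't decide between them; a relation model (or a rule) has to use the sentence structure.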
My plan is to create new entity types, “ASSET_SET” and “DESTINATION_SET”, that group neighbouring entities of the same type, and then train the dependencies between these sets instead.
We think we could use a patterns file that matches on our existing NER entity types to build the sets, for example:
pattern = [{'ENT_TYPE': 'ASSET'},{'ORTH': ','},{'ENT_TYPE': 'ASSET'}]
pattern = [{'ENT_TYPE': 'ASSET'},{'IS_SPACE': True},{'ENT_TYPE': 'ASSET'}]
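For comparison, the same grouping idea can be sketched as a plain-Python post-processing step over the NER output instead of a patterns file. This is only a sketch under assumptions: entities are (start_char, end_char, label) tuples, as you could build from spaCy's ent.start_char, ent.end_char and ent.label_; the separator regex, the `attached` labels and the function names are hypothetical choices. A run of any length counts as one set, and a lone entity becomes a set of one:

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical entity format: (start_char, end_char, label).
Entity = Tuple[int, int, str]

# Assumed separators allowed between two members of a set:
# whitespace (including newlines), commas, "&" and the word "and".
SEPARATOR = re.compile(r"(?:\s|,|&|\band\b)*", re.IGNORECASE)


def group_entities(
    text: str,
    ents: Sequence[Entity],
    label: str,
    set_label: str,
    attached: Sequence[str] = (),
) -> List[Entity]:
    """Merge runs of neighbouring `label` entities into `set_label` spans.

    Entities whose label is in `attached` (e.g. TITLE, DURATION) are
    blanked out of the gap, so "A1 T1 D1, A2 T2 D2" still forms one set.
    """
    ordered = sorted(ents)
    attached_spans = [(s, e) for s, e, lab in ordered if lab in attached]
    targets = [(s, e) for s, e, lab in ordered if lab == label]

    def gap_is_separator(start: int, end: int) -> bool:
        # Replace characters covered by attached entities with spaces,
        # then check that only separators remain between the two members.
        chars = list(text[start:end])
        for s, e in attached_spans:
            for i in range(max(s, start), min(e, end)):
                chars[i - start] = " "
        return SEPARATOR.fullmatch("".join(chars)) is not None

    sets: List[Entity] = []
    for start, end in targets:
        if sets and gap_is_separator(sets[-1][1], start):
            # Neighbouring member: stretch the current set to cover it.
            sets[-1] = (sets[-1][0], end, set_label)
        else:
            # Anything else in between starts a new set (possibly of one).
            sets.append((start, end, set_label))
    return sets


def ent(text: str, surface: str, label: str) -> Entity:
    """Example helper: locate the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return (start, start + len(surface), label)


if __name__ == "__main__":
    email = ("Hello,\n\nPlease send to DESTINATION1 the following videos:\n\n"
             "ASSET1 TITLE1 DURATION1, ASSET2 TITLE2 DURATION2")
    ents = [ent(email, w, lab) for w, lab in [
        ("DESTINATION1", "DESTINATION"),
        ("ASSET1", "ASSET"), ("TITLE1", "TITLE"), ("DURATION1", "DURATION"),
        ("ASSET2", "ASSET"), ("TITLE2", "TITLE"), ("DURATION2", "DURATION"),
    ]]
    for s, e, lab in group_entities(email, ents, "ASSET", "ASSET_SET",
                                    attached=("TITLE", "DURATION")):
        print(lab, repr(email[s:e]))
```

On the second example email this yields two separate ASSET_SETs and two separate DESTINATION_SETs, so proximity still hints at which asset goes where. The set span ends at its last member, so a trailing TITLE/DURATION stays outside it. Since each set overlaps the entities it contains, the sets probably can't sit in doc.ents alongside the ASSET entities; keeping them in a separate list (or, in spaCy v3, a doc.spans group) avoids that clash.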
- Does this strategy make sense? Or is there a better approach here?
- How can we write the patterns so that each set can contain any number of assets or destinations?
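If the patterns-file route is kept, one workaround for "any number" is to generate the patterns programmatically up to some maximum set size, since each pattern describes a fixed sequence of tokens. A sketch under assumptions: it prints Prodigy-style JSONL pattern lines ({"label": ..., "pattern": [...]}), uses spaCy token attributes (ENT_TYPE, LOWER, IS_SPACE) with the "IN" and "OP" keys, and the separator list and max_members limit are hypothetical. Whether ENT_TYPE patterns see the entities predicted by your model depends on how the patterns are applied, so this is unverified against Prodigy itself:

```python
import json
from typing import Dict, List

# Assumed separator tokens between set members; extend for your own emails.
SEPARATORS = [",", "and", "&"]


def set_patterns(ent_label: str, set_label: str, max_members: int = 10) -> List[Dict]:
    """Return one pattern per set size, from 2 to max_members inclusive."""
    # A member is one or more tokens tagged with ent_label, so multi-token
    # entities such as "DESTINATION 1" still match.
    member = {"ENT_TYPE": ent_label, "OP": "+"}
    # Between members: an optional separator word, then an optional
    # whitespace token (e.g. a newline).
    gap = [{"LOWER": {"IN": SEPARATORS}, "OP": "?"}, {"IS_SPACE": True, "OP": "?"}]
    patterns = []
    for n in range(2, max_members + 1):
        tokens = [member]
        for _ in range(n - 1):
            tokens += gap + [member]
        patterns.append({"label": set_label, "pattern": tokens})
    return patterns


if __name__ == "__main__":
    # One JSON object per line, i.e. the JSONL layout of a patterns file.
    for label, set_label in [("ASSET", "ASSET_SET"), ("DESTINATION", "DESTINATION_SET")]:
        for p in set_patterns(label, set_label, max_members=10):
            print(json.dumps(p))
```

Sets longer than max_members are still missed, and the shorter patterns also match inside longer runs, so overlapping matches need filtering (e.g. keeping the longest).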
Thanks in advance for any help you can provide! Happy to give more detail if you need it.