Hi all, I've just read this post on the AWS Machine Learning Blog about the upcoming Amazon Comprehend custom entities feature.
A short description:
Customers can now train state-of-the-art entity recognition models to extract their specific terms, completely automatically [...] Training the service to learn custom entity types is as easy as providing a set of those entities and a set of real-world documents that contain them. To get started, put together a list of entities. [...] Next, collect a set of documents that contain those entities in the context of how they are used. The service needs a minimum of 1,000 documents containing at least one or more of the entities from our list.
I think Prodigy is a really great platform, so we can do almost anything, including reproducing this AWS feature: with a combination of ner.match and some custom code we could get the same job done, couldn't we?
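As a rough sketch of what I mean: ner.match takes a JSONL patterns file, so the "list of entities" AWS describes could be turned into match patterns with a few lines of Python. The entity terms and the `PRODUCT` label below are just hypothetical examples, and `to_patterns` is a helper name I made up:

```python
import json

# hypothetical list of custom entity terms (stand-in for a real one)
entities = ["Amazon Comprehend", "Prodigy", "spaCy"]

def to_patterns(terms, label):
    """Convert plain terms into token-based match patterns (JSONL-ready dicts)."""
    patterns = []
    for term in terms:
        # one token dict per whitespace-separated word, matched case-insensitively
        tokens = [{"lower": word.lower()} for word in term.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns

patterns = to_patterns(entities, "PRODUCT")

# write one JSON object per line, the JSONL format that patterns files use
with open("patterns.jsonl", "w") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

Then the patterns file could be fed to ner.match to surface candidate matches for human review, which is exactly the step I'm not sure AWS has.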
But what I don't understand is how AWS can create training data from patterns that produce false positives without letting humans explore the patterns interactively. How does it handle disambiguation?
Happy to know your ideas/feedback! Prodigy rules!