Amazon Comprehend custom entities / Prodigy

Hi all, I've just read this blog post on the AWS Machine Learning Blog about the upcoming custom entities feature in Amazon Comprehend.

A short description:

Customers can now train state-of-the-art entity recognition models to extract their specific terms, completely automatically [...] Training the service to learn custom entity types is as easy as providing a set of those entities and a set of real-world documents that contain them. To get started, put together a list of entities. [...] Next, collect a set of documents that contain those entities in the context of how they are used. The service needs a minimum of 1,000 documents containing at least one or more of the entities from our list.

I think Prodigy is a really great platform that lets us do almost anything :top:, including reproducing this AWS feature: with a combination of ner.match and some custom code we could get the same job done, couldn't we?
But what I don't understand is how AWS can create training data from patterns that produce false positives without letting humans explore the patterns interactively. How does it handle the disambiguation?

Happy to know your ideas/feedback! Prodigy rules!

Yes, I think you're right.
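
To make the pattern-matching idea concrete, here's a rough sketch of what matching a term list against raw text produces. It's plain Python, not Prodigy's actual ner.match implementation, and the function name `match_terms`, the `COMPANY` label and the example sentences are all made up for illustration.

```python
import re


def match_terms(text, terms, label):
    """Return a candidate span for every case-insensitive whole-word match."""
    spans = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(
                {"start": m.start(), "end": m.end(), "label": label, "text": m.group()}
            )
    # Sort by position so the spans read left to right
    return sorted(spans, key=lambda s: s["start"])


# Hypothetical entity list and documents
terms = ["Apple", "Amazon"]
docs = [
    "Apple released a new laptop.",
    "I ate an apple on the way to the Amazon river.",
]

# One task per document, each carrying its candidate spans
tasks = [{"text": d, "spans": match_terms(d, terms, "COMPANY")} for d in docs]

for task in tasks:
    print(task["text"], [s["text"] for s in task["spans"]])
```

The second sentence matches both terms even though neither refers to a company there, which is exactly the false-positive problem raised in the question.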

Good question! I suppose it could try to do something unsupervised? Maybe it uses a pre-trained model, and assumes there's some sort of agreement between what you're looking for and the model's output? It could also do some clustering.

Either way I think it's much better to do some labelling. It doesn't take very long to give the binary feedback, and even if they have some automated wizardry, it's hard to imagine that being better than explicit supervision.
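
As a rough illustration of how little machinery the binary feedback needs: assuming each annotated example carries an "answer" field set to "accept" or "reject", sorting the pattern matches into positive and negative examples takes only a few lines. The records below are invented, and `split_by_answer` is a hypothetical helper, not a Prodigy function.

```python
def split_by_answer(annotated):
    """Separate annotated examples into accepted and rejected lists."""
    accepted, rejected = [], []
    for example in annotated:
        answer = example.get("answer")
        if answer == "accept":
            accepted.append(example)
        elif answer == "reject":
            rejected.append(example)
        # Anything else (e.g. skipped examples) is left out of both lists
    return accepted, rejected


# Invented annotations: one highlighted span per example plus a yes/no answer
annotated = [
    {"text": "Apple released a new laptop.",
     "spans": [{"start": 0, "end": 5, "label": "COMPANY"}],
     "answer": "accept"},
    {"text": "I ate an apple.",
     "spans": [{"start": 9, "end": 14, "label": "COMPANY"}],
     "answer": "reject"},
    {"text": "Amazon river cruise",
     "spans": [{"start": 0, "end": 6, "label": "COMPANY"}],
     "answer": "ignore"},
]

accepted, rejected = split_by_answer(annotated)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The rejected examples are the useful part: they tell a model explicitly which pattern matches are not the entity, which a pure term list can't express.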

I wonder what the privacy policy is like. If it doesn't prevent a human from manually reading the text and labelling it behind the scenes, maybe that's what's actually happening?

Thanks for the link! Will be interesting to watch this. If anyone has experience with the platform, I'd be eager to hear how you've found it.
