Hi, we are total newbies in NLP and are just starting to learn on a real, live project. We need to ingest hundreds of thousands of survey question-response pairs. The set contains surprisingly many different formulations of the same core questions.
An example would be:
(a) How satisfied are you with access to water quality data for the northeastern part of the City.
(b) Indicate your level of agreement with the following statement: I have easy access to water quality data for the northeastern part of the City.
(c) Rate the accessibility of water quality information for the northeastern part of the City.
These are essentially the same question.
We need to boil down the questions into their generic equivalents.
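To make the goal concrete, here is a rough sketch of the kind of lookup we would like to end up with. The generic wording, the variable names, and the helper function are all placeholders made up for illustration; none of it is a settled design.

```python
from typing import Optional

# Hypothetical generic form; the wording is a placeholder, not a final category.
GENERIC_WATER_ACCESS = "Access to water quality data (northeastern part of the City)"

# Each variant wording from the survey set maps to one generic equivalent.
variant_to_generic = {
    "How satisfied are you with access to water quality data "
    "for the northeastern part of the City.": GENERIC_WATER_ACCESS,
    "Indicate your level of agreement with the following statement: "
    "I have easy access to water quality data "
    "for the northeastern part of the City.": GENERIC_WATER_ACCESS,
    "Rate the accessibility of water quality information "
    "for the northeastern part of the City.": GENERIC_WATER_ACCESS,
}


def generic_equivalent(question: str) -> Optional[str]:
    """Return the generic form of a question, or None if it is not mapped yet."""
    return variant_to_generic.get(question.strip())
```

Building this table by hand obviously does not scale to hundreds of thousands of pairs, which is why we are looking at annotation tooling.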
So far we have been doing this by hand, like this:
We are hoping to use Prodigy's tagging or something similar to build our training set. Instead of the POS tags shown in the picture below (adjective, noun, etc.), we would like to use our own categories, which we are still defining.
Is Prodigy's interface capable of this, or are we on the wrong path? Any help would be greatly appreciated, thanks!