I’ve been using the dependency parser quite extensively in English. I wanted to extend my code to handle French, but I noticed that noun chunks behave quite differently:
“He gave an apple to Paul” gives 2 noun chunks: “an apple” and “Paul”.
“Il donna une pomme à Paul” gives a single noun chunk: “une pomme à Paul”. I would expect 2 here as well (“pomme” and “Paul”).
I suspect this comes from the dependency tree: in English, “to” sits between “apple” and “Paul” in the tree, while in French “à” is a child of “Paul”.
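To make the suspicion concrete, here is a toy sketch in plain Python (no spaCy involved). Everything in it is an assumption made for illustration: the `Tok` class, the hand-written head indices, and the two chunking rules (`full_subtree=False` takes the span from the leftmost descendant up to the noun, `full_subtree=True` takes the noun's whole subtree). It is not the library's actual noun chunk code, and the French trees are my guesses at what the parser might produce, not its real output.

```python
from dataclasses import dataclass

@dataclass
class Tok:
    text: str
    head: int  # index of the head token; the root points to itself
    pos: str

def subtree(tokens, i):
    """Return the indices of token i and all of its descendants."""
    out = {i}
    changed = True
    while changed:
        changed = False
        for j, t in enumerate(tokens):
            if j not in out and t.head in out:
                out.add(j)
                changed = True
    return out

def noun_chunks(tokens, full_subtree):
    """Toy chunker (hypothetical rule, not the library's implementation)."""
    chunks, seen = [], set()
    for i, t in enumerate(tokens):
        if t.pos not in ("NOUN", "PROPN") or i in seen:
            continue
        sub = subtree(tokens, i)
        start = min(sub)
        end = max(sub) if full_subtree else i
        seen.update(range(start, end + 1))
        chunks.append(" ".join(tok.text for tok in tokens[start:end + 1]))
    return chunks

# English, preposition as head of "Paul" (assumed tree).
en = [Tok("He", 1, "PRON"), Tok("gave", 1, "VERB"), Tok("an", 3, "DET"),
      Tok("apple", 1, "NOUN"), Tok("to", 1, "ADP"), Tok("Paul", 4, "PROPN")]

# French, "à" as child of "Paul", and "Paul" attached to "pomme" (assumed tree).
fr_noun_attach = [Tok("Il", 1, "PRON"), Tok("donna", 1, "VERB"), Tok("une", 3, "DET"),
                  Tok("pomme", 1, "NOUN"), Tok("à", 5, "ADP"), Tok("Paul", 3, "PROPN")]

# Same sentence, but "Paul" attached to the verb "donna" instead (assumed tree).
fr_verb_attach = [Tok("Il", 1, "PRON"), Tok("donna", 1, "VERB"), Tok("une", 3, "DET"),
                  Tok("pomme", 1, "NOUN"), Tok("à", 5, "ADP"), Tok("Paul", 1, "PROPN")]

en_chunks = noun_chunks(en, full_subtree=False)
fr_noun_chunks = noun_chunks(fr_noun_attach, full_subtree=True)
fr_verb_chunks = noun_chunks(fr_verb_attach, full_subtree=True)

print(en_chunks)       # ['an apple', 'Paul']
print(fr_noun_chunks)  # ['une pomme à Paul']
print(fr_verb_chunks)  # ['une pomme', 'à Paul']
```

In this toy setup the single French chunk only appears when “Paul” is attached to “pomme” rather than to the verb, so comparing `token.head` and `token.dep_` for each token in the real parses should show whether the attachment or the chunk rule is responsible.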
Does this mean that the English ClearNLP dependency scheme and the Universal Dependencies (UD) scheme are defined so differently that the definition of a noun chunk should be changed? Or is it just a training problem (i.e., this dependency is not parsed properly and more training will solve it)?
Any insight appreciated.