Which one to use? NER or dependency linking?

I want to extract data from real estate newspaper ads. Here are two examples:

Oberstreu: Haus zu verkaufen Oberstreu: Haus zu verkaufen, 230 m², mit Scheune und 2 Waldgrundstücken, voll renoviert,

3-Zi.-ETW, Gemünden, 65 m2 3-Zi.-ETW, Gemünden, 65 m2, gute Wohnlage, Balkon, ETW

For the first one, I would love to extract the following:

Address: Oberstreu
SquareMeter: 230
Condition: Voll renoviert

Second one:
Rooms: 3-Zi.-ETW
SquareMeter: 65
Address: Gemünden
Balcony: Balkon
ApartmentType: ETW

Which of Prodigy's tools is best for annotating this data?

If each ad is small and contains only information about a single object, I would model this as a named entity tagging task.
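As a rough illustration, here is a minimal sketch of what span annotations for the second ad might look like. The label names (ROOMS, TYPE, LOCATION, AREA, FEATURE) and the helper `make_task` are assumptions made up for this example, not a fixed scheme. The `text`/`spans` layout with character offsets is the usual shape of a NER annotation task; offsets are computed with `str.find`, which only picks up the first occurrence of each phrase.

```python
import json


def make_task(text, labelled_phrases):
    """Build a NER-style task dict with character-offset spans.

    Hypothetical helper: finds the first occurrence of each phrase.
    """
    spans = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found: {phrase!r}")
        spans.append({"start": start, "end": start + len(phrase), "label": label})
    # Keep spans in reading order.
    spans.sort(key=lambda s: s["start"])
    return {"text": text, "spans": spans}


ad = "3-Zi.-ETW, Gemünden, 65 m2, gute Wohnlage, Balkon, ETW"
task = make_task(
    ad,
    [
        ("3-Zi.", "ROOMS"),        # assumed label names, for illustration only
        ("ETW", "TYPE"),
        ("Gemünden", "LOCATION"),
        ("65 m2", "AREA"),
        ("Balkon", "FEATURE"),
    ],
)
print(json.dumps(task, ensure_ascii=False))
```

Each property from the wish list becomes one labelled span, so no linking step is needed as long as the ad describes a single object.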

If you have ads which refer to several items to be sold, you would have to disambiguate which property belongs to which object. Dependencies would be one way of modelling that.
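To make the disambiguation problem concrete, here is a small sketch of attribute-to-object links. The token list is a made-up ad with two objects, and the `(head, child, label)` tuples are a generic representation chosen for this example, not necessarily the exact format any tool exports.

```python
from collections import defaultdict

# Made-up ad that offers two objects at once (illustration only).
tokens = ["Haus", "230", "m²", "und", "Wohnung", "65", "m²"]

# Each relation links an object token (head) to one of its attributes (child).
# Tuples are (head index, child index, label); the label name is an assumption.
relations = [
    (0, 1, "AREA"),  # Haus -> 230
    (4, 5, "AREA"),  # Wohnung -> 65
]


def group_by_object(tokens, relations):
    """Collect the attributes attached to each object token."""
    grouped = defaultdict(dict)
    for head, child, label in relations:
        grouped[tokens[head]][label] = tokens[child]
    return dict(grouped)


objects = group_by_object(tokens, relations)
print(objects)
```

With plain entity spans alone, both "230" and "65" would just be AREA; the relations are what tell you which area belongs to which object.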

In my experience, annotating with the NER setup is much faster.

You could also ask annotators to reject ads which contain multiple objects and then funnel the rejected examples into the more complicated dependency formalism.
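A sketch of that routing step, assuming each annotated example is a dict carrying an `"answer"` field with the value `"accept"`, `"reject"` or `"ignore"`. The function name `split_by_answer` and the sample records are hypothetical.

```python
def split_by_answer(examples):
    """Route accepted ads to the NER queue and rejected ones to the relation queue."""
    single_object, multi_object = [], []
    for eg in examples:
        answer = eg.get("answer")
        if answer == "accept":
            single_object.append(eg)
        elif answer == "reject":
            # Rejected here means "contains several objects", not "bad data".
            multi_object.append(eg)
        # Anything else (e.g. "ignore") is dropped from both queues.
    return single_object, multi_object


annotated = [
    {"text": "3-Zi.-ETW, Gemünden, 65 m2, gute Wohnlage, Balkon, ETW", "answer": "accept"},
    {"text": "Haus 230 m² und Wohnung 65 m²", "answer": "reject"},  # made-up ad
    {"text": "unreadable scan", "answer": "ignore"},                # made-up ad
]
ner_queue, relation_queue = split_by_answer(annotated)
print(len(ner_queue), len(relation_queue))
```

The rejected queue can then be written out and loaded as the input for the slower relation annotation pass.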
