Is it possible to handle partial words?

Hi

I get some text from an external system that does not concatenate lines correctly, so I end up with things like:

Reception: give the same room if we have the roomingFrontOffice: book a parking lot

Those are really 2 lines:
Reception: give the same room if we have the rooming
FrontOffice: book a parking lot

I wanted to annotate only the GROUP (Reception, FrontOffice), but FrontOffice is part of the word roomingFrontOffice:
Reception: give the same room if we have the roomingFrontOffice: book a parking lot

Should I first find a way to clean it, or could training help?
thx
Jo

Hi @jweizman ,

I suggest cleaning it out first. This will help you in two ways:

  1. It will help you discover other incorrect formatting the external system has.
  2. It will be much easier to label the texts in Prodigy afterwards, since it automatically highlights the tokens. Because "rooming" and "FrontOffice" are glued together, Prodigy will likely treat them as a single token, so you can't select "FrontOffice" on its own.
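
As a starting point for the cleaning step, here is a minimal sketch in plain Python. It assumes you know the group labels in advance and that each one is directly followed by a colon; `GROUP_LABELS` and `split_glued_lines` are hypothetical names for illustration, not part of Prodigy.

```python
import re

# Hypothetical list of group labels; replace with the ones in your data.
GROUP_LABELS = ["Reception", "FrontOffice"]

# Zero-width match: a known label followed by ":" that is glued to the
# preceding non-space character (e.g. "roomingFrontOffice:").
_alternatives = "|".join(re.escape(label) for label in GROUP_LABELS)
GLUED_LABEL = re.compile(rf"(?<=\S)(?=(?:{_alternatives}):)")


def split_glued_lines(text):
    """Insert a line break before each glued label and return the lines."""
    return GLUED_LABEL.sub("\n", text).splitlines()


raw = ("Reception: give the same room if we have the rooming"
       "FrontOffice: book a parking lot")
for line in split_glued_lines(raw):
    print(line)
```

This only splits where a label is glued to the previous word, and it would misfire if one label is a suffix of another (say "Office" and "FrontOffice"), so it is worth checking the output on a sample before trusting it.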

Perhaps you can even use Prodigy to help with the cleaning? You could create a custom interface to correct your texts, save them to the database, export them afterwards, and then label the cleaned data.

Interesting about the custom interface,
I'll have a look.
thx !