I have a corpus of text in which the following phrases can occur:
inside forgery and theft
forgery or theft premise inside
theft given forgery and (premise inside)
premise inside forgery and theft
Is there some pattern I can create so that, if I see any permutation of the above built from
"inside", "forgery", "theft" and "premise" in a phrase, I can tag it as one entity?
There are a few ways you might do that, depending on the exact boundaries of the phrases you’re interested in. It’s hard to give a specific pattern without knowing what sort of phrases should be excluded.
Are there any instances of the words “forgery” or “theft” that you don’t want the pattern to match?
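If nothing needs to be excluded, a plain keyword baseline might be enough to start with. The following is only a rough sketch, not a spaCy feature: the function name `find_key_phrases` and the `min_terms` and `max_gap` thresholds are made up for illustration and would need tuning on your corpus. It groups key terms that sit close together, in any order, and returns the text from the first to the last term of each group.

```python
import re

# Key terms taken from the example phrases in the question.
KEY_TERMS = {"inside", "forgery", "theft", "premise"}


def find_key_phrases(text, key_terms=KEY_TERMS, min_terms=3, max_gap=2):
    """Return substrings that run from the first to the last key term of a
    cluster, regardless of the order the terms appear in.

    min_terms and max_gap are illustrative assumptions, not recommended values.
    """
    # Lowercased words with their character offsets.
    words = [(m.group().lower(), m.start(), m.end())
             for m in re.finditer(r"\w+", text)]
    hits = [(i, start, end, word)
            for i, (word, start, end) in enumerate(words)
            if word in key_terms]

    # Cluster hits separated by at most max_gap non-key words.
    groups, current = [], []
    for hit in hits:
        if current and hit[0] - current[-1][0] - 1 > max_gap:
            groups.append(current)
            current = []
        current.append(hit)
    if current:
        groups.append(current)

    # Keep clusters that contain enough distinct key terms.
    phrases = []
    for group in groups:
        distinct = {word for _, _, _, word in group}
        if len(distinct) >= min_terms:
            phrases.append(text[group[0][1]:group[-1][2]])
    return phrases


print(find_key_phrases("inside forgery and theft"))
print(find_key_phrases("a report of theft last week"))
```

The weakness is exactly the one raised above: the baseline has no notion of which occurrences of "forgery" or "theft" should be skipped, so it will over-match unless the thresholds or term list are tightened.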
One option to consider is to use the dependency parse, after you’ve identified the key term. This can help you move from a single word to the longer phrase you’re interested in. You can plug different sentences or fragments into the parser here: https://explosion.ai/demos/displacy?text=suspicion%20of%20forgery%20or%20theft%20inside%20the%20premise&model=en_core_web_sm&cpu=0&cph=0 . You can read more about how to use the parser in the spaCy docs: https://spacy.io/usage/linguistic-features#section-dependency-parse