Spancat or NER for email signatures

jordandavis · November 30, 2021, 2:22pm

I'm new to spaCy and Prodigy and have a dataset of email correspondence. To extract the latest message body from a chain of correspondence, I want to train a model to recognize the end of a message ('EOM'). This could be a simple phrase like "Thanks," or "Best Regards,", a person's name, or various forms of an email signature ("Company Name", "Company Name + Tel:", etc.).

My assumption is that I should break these various EOMs into their respective types. For email signatures, I should train a span categorizer and then combine that with a phrase matcher for the simpler EOMs such as "Thanks" or "Best, John".

Does this sound like the best method for my objective or am I making a rookie mistake?

ljvmiranda921 · December 3, 2021, 9:07am

Hi @jordandavis, welcome to Prodigy!

You have two options here:

1. It is usually worth checking first how well a naive, non-ML solution works. Try implementing a simple rule-based function to find the end of each message and check its accuracy. You can use regular expressions, checks on the tokens in a Doc, hand-built business rules, or just the PhraseMatcher.
2. For a more machine-learning-based approach, I'd recommend trying out the SpanCategorizer. When doing this, always make sure your annotation scheme is consistent.
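
To make the rule-based option concrete, here is a minimal sketch of a regex baseline using only the Python standard library. The phrase lists, the function names (`find_eom_line`, `latest_body`, `accuracy`), and the idea of scoring against a hand-labelled EOM line index are all assumptions for illustration, not something from Prodigy or spaCy; you would adapt the patterns to what actually appears in your emails.

```python
import re

# Hypothetical closing phrases: a sign-off, optionally followed by
# punctuation and a short name ("Thanks,", "Best, John").
CLOSING_RE = re.compile(
    r"^\s*(?:thank you|thanks|best regards|kind regards|regards|best|cheers|sincerely)"
    r"\s*(?:[,!.]\s*(?:\w+(?:\s\w+)?)?)?\s*$",
    re.IGNORECASE,
)

# Hypothetical signature cue: a line starting with a phone label ("Tel: ...").
SIGNATURE_RE = re.compile(r"^\s*(?:tel|phone|mobile)\s*[:.]", re.IGNORECASE)


def find_eom_line(lines):
    """Return the index of the first line that looks like an EOM marker, or None."""
    for i, line in enumerate(lines):
        if CLOSING_RE.match(line) or SIGNATURE_RE.match(line):
            return i
    return None


def latest_body(email_text):
    """Return the text before the first detected EOM line (whole text if none)."""
    lines = email_text.splitlines()
    idx = find_eom_line(lines)
    if idx is None:
        return email_text.strip()
    return "\n".join(lines[:idx]).strip()


def accuracy(examples):
    """Share of (email_text, gold_eom_line_index_or_None) pairs predicted exactly."""
    correct = sum(find_eom_line(text.splitlines()) == gold for text, gold in examples)
    return correct / len(examples)
```

Running `accuracy` over a few dozen hand-labelled emails gives a number to beat; if the rules already do well, the trained span categorizer may only be needed for the messier signature blocks.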
