Discovering associated words/phrases using NLP

I agree with @SofieVL that framing this as a pure span extraction task will likely be inefficient: the boundaries of the spans are very ambiguous and potentially disjoint, so it'll be very difficult to create consistent data for it. The information about the season may also be spread across multiple keywords and indicators, including how the sentence is phrased, and that's something a text classifier is much better suited to pick up on.

This thread might be relevant to you, and it actually deals with a very similar task (extracting season compatibility for garments from reviews):

I actually just did a talk where I walked through a similar problem (span extraction vs. labels over the whole text), which might be interesting as well. The example starts at around 13:00: https://www.youtube.com/watch?v=mJqFI7vhqdA

So for your use case, you could start by classifying the texts by season, which should be pretty straightforward to label and predict. You can still use keywords to pre-select and pre-label the examples, e.g. if a text contains "warm weather", it'll likely be SUMMER. If you have a text classifier that gives you the attention scores for the individual tokens, this would give you insights into what the strongest signals were that led to the prediction.
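
To make the keyword pre-labelling idea a bit more concrete, here's a minimal sketch in plain Python. The keyword lists, the label names other than SUMMER and the `prelabel` helper are all made up for illustration, so you'd swap in terms from your own data. It only suggests a label when one season clearly wins and leaves everything else for manual annotation:

```python
import re
from collections import Counter

# Hypothetical keyword lists (assumption): replace with terms from your own reviews.
SEASON_KEYWORDS = {
    "SUMMER": ["warm weather", "hot days", "beach"],
    "WINTER": ["cold weather", "snow", "freezing"],
}


def prelabel(text):
    """Suggest a season label from keyword matches, or None if unclear."""
    lowered = text.lower()
    counts = Counter()
    for season, keywords in SEASON_KEYWORDS.items():
        for keyword in keywords:
            # Match whole words/phrases only, so "snow" doesn't fire on "snowboard".
            pattern = r"\b" + re.escape(keyword) + r"\b"
            counts[season] += len(re.findall(pattern, lowered))
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None  # no keyword matched at all
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between seasons: leave it to the annotator
    return ranked[0][0]


reviews = [
    "Perfect for warm weather and trips to the beach.",
    "Kept me cosy in the snow.",
    "Nice fabric, fits well.",
]
for review in reviews:
    print(prelabel(review), "|", review)
```

The suggested labels are only a starting point that you'd accept or correct while annotating. They shouldn't be treated as gold data, since a keyword match says nothing about how the sentence is actually phrased.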

However, what you do next depends on your end goal and what you're actually looking to do with those tokens/phrases. If those will be consumed by some other downstream application, you might want to consider how useful those will actually be: you'll likely end up with arbitrary tokens that don't necessarily have anything in common or represent the same type of syntactic constituents. So it might be difficult to actually do anything useful with them.
