Training a sentiment analysis model

I am working on a corpus to build a model that can extract observational sentences.

As I wrote before, I have run this:

python -m prodigy textcat.manual dfobsv02 en_core_web_sm dfObsV02.jsonl --label Observational,Nonobservational --exclusive

Now an annotator is labelling each sentence as observational or non-observational.

Moreover, I have a custom NER model which can extract LONG, DATE, TIME...

What would you recommend as the next step towards a good sentiment analysis model?

How can I save the data after annotation? Probably with db-out?

Any ideas would be appreciated.

Your workflow sounds good – how well it's going to work will depend on the data. You could just try and train a text classifier, and see how you go.

Yes, that's correct: db-out is the command for exporting your annotations.


Thank you for your prompt response. So basically, after finishing the annotation, I can run:

prodigy db-out dfobsv02 /annotations

If I want to train a classifier myself, would it be easy to extract the annotations and put them into a data frame, with the labels in a column y?

If I do not want to do that, I guess I should use:

textcat.batch-train

textcat.eval

to see the result?

Any idea how I can connect this to my NER model? I mean, there should be a way to use NER to improve the classification, since sentences that include "LONG", "DATE" or "TIME" entities are more likely to be observations. Any ideas would be appreciated. Many thanks.

Yes, the data you get out will look something like this:

{"text": "Some text", "options": [...], "accept": ["LABEL1", "LABEL2"]}

So it should be easy to load that into a dataframe or any other format you need. Typically you want to use the text and the label(s).
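As a rough sketch of that, here is one way to turn the exported lines into text/label records. It assumes each line is a JSON object shaped like the example above, and that every task also carries an "answer" field ("accept", "reject" or "ignore"); the function name load_annotations and the sample sentences are made up for illustration.

```python
import json


def load_annotations(lines):
    """Turn exported JSONL lines into {"text", "y"} records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        task = json.loads(line)
        # Assumption: tasks that were skipped or rejected have an "answer"
        # other than "accept" and should not become training examples.
        if task.get("answer", "accept") != "accept":
            continue
        accepted = task.get("accept", [])
        if not accepted:
            continue
        # With --exclusive there is one selected label per task.
        records.append({"text": task["text"], "y": accepted[0]})
    return records


# Hypothetical sample lines standing in for the exported file.
sample = [
    '{"text": "Sentence one", "accept": ["Observational"], "answer": "accept"}',
    '{"text": "Sentence two", "accept": ["Nonobservational"], "answer": "accept"}',
    '{"text": "Sentence three", "accept": [], "answer": "ignore"}',
]
records = load_annotations(sample)
```

For the real data you would pass an open file handle on the exported .jsonl file instead of the sample list. If you use pandas, pandas.DataFrame(records) should then give you a "text" column and a "y" column.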

If you're training a spaCy model, both components will be independent. However, they will still process and work on the same text – so even if you're not using the NER labels as features in the text classifier, it will still see the same tokens. And the fact that there are dates or times in the text can impact its predictions. So I wouldn't worry too much about incorporating the NER predictions as features and just treat it as an independent text classification task. Just see how you go.
