Hi, here are the commands I used after downloading the insults_reddit.jsonl file referenced in the video:
prodigy dataset insults
prodigy db-in insults insults_reddit.jsonl
prodigy textcat.batch-train insults en_vectors_web_lg --output insults-model --eval-split 0.2
Then I loaded the model and tested it, but it failed a simple sanity test:
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
import spacy
nlp = spacy.load('insults-model')
doc = nlp('you are a fucking asshole')
print(doc.cats)
{'INSULT': 0.9630451798439026}
doc = nlp('you are lovely')
print(doc.cats)
{'INSULT': 0.9774250984191895}
doc = nlp('hello')
print(doc.cats)
{'INSULT': 0.0003795093798544258}
doc = nlp('you are a nice person')
print(doc.cats)
{'INSULT': 0.9930683374404907}
doc = nlp('he is such a nice person')
print(doc.cats)
{'INSULT': 0.050295885652303696}
doc = nlp('he is such an asshole')
print(doc.cats)
{'INSULT': 0.01430890429764986}
doc = nlp('you are so kind')
print(doc.cats)
{'INSULT': 0.28676915168762207}
It seems like the model has learned to treat 'you are' itself as a signal for an insult. Is this expected behavior at this stage of training? Please advise.
I spent a fair amount of time tinkering with a harassment classifier, so I would say that what you see makes sense. The insult dataset only has ~500 examples, and I saw the same problem with my classifier when it was trained on a similarly small number of examples.
What I ended up doing was taking a few template strings and generating correctly labeled synthetic examples to help the model distinguish the important part from the irrelevant bits. So I had a couple hundred examples like the ones below (a rough sketch of the generation script follows them):
{"text":"You shit head","label":"HARASSMENT","answer":"accept"}
{"text":"You stupid ass","label":"HARASSMENT","answer":"accept"}
{"text":"You wonderful human being","label":"HARASSMENT","answer":"reject"}
{"text":"You're such a nice guy","label":"HARASSMENT","answer":"reject"}
{"text":"You're such a complete jerk","label":"HARASSMENT","answer":"accept"}
{"text":"You're such a total asshole","label":"HARASSMENT","answer":"accept"}
This mostly resolved the "You are [something nice]" vs. "You are [something mean]" issue, but insulting/harassing language has lots of other tricky aspects that are hard for a model to learn (e.g. culturally acceptable uses of the word "bitch").
I gathered ~4000 examples for my demo model (https://stopharassing.me/) and you can see that it makes fewer of these dumb mistakes, but it can still get tripped up pretty easily.
I don’t think you’ve done anything wrong, but insults are hard to capture in a comprehensive way with only 500 examples.
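For what it's worth, folding a synthetic file like that back into the workflow would look roughly like the commands at the top of this thread (the file name here is just a placeholder):
prodigy db-in insults synthetic_insults.jsonl
prodigy textcat.batch-train insults en_vectors_web_lg --output insults-model --eval-split 0.2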