What I also don't understand is that we iterate through the sentences but never use the sent object inside the loop. I suspect this is where the problem comes from.
For example, I had a sample file with the text below:
testing the sentences. there might be a memory leak. This will repeat three times for each sentences.
The resulting sentence array contained the full token list three times, once for each of the three sentences:
[
['testing', 'the', 'sentences', '.', 'there', 'might', 'be', 'a', 'memory', 'leak', '.', 'This', 'will', 'repeat', 'there', 'times', 'for', 'each', 'sentences', '.'],
['testing', 'the', 'sentences', '.', 'there', 'might', 'be', 'a', 'memory', 'leak', '.', 'This', 'will', 'repeat', 'there', 'times', 'for', 'each', 'sentences', '.'],
['testing', 'the', 'sentences', '.', 'there', 'might', 'be', 'a', 'memory', 'leak', '.', 'This', 'will', 'repeat', 'there', 'times', 'for', 'each', 'sentences', '.']
]
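Since every entry holds the tokens of the whole file rather than of a single sentence, the array grows with the number of sentences times the total token count. This might be the root cause of the memory issue.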
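
To make the suspicion concrete, here is a minimal sketch of the pattern I think we are hitting. It is not the actual code: split_sentences, tokenize, buggy_tokens and fixed_tokens are hypothetical names, and the regex-based splitter and tokenizer are naive stand-ins for whatever the real library does. The only point is to show what happens when the loop variable is ignored, and what the loop might look like if it used sent instead.

```python
import re

def split_sentences(text):
    # Hypothetical stand-in: split after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(text):
    # Hypothetical stand-in: words, plus each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def buggy_tokens(text):
    result = []
    for sent in split_sentences(text):
        # Suspected bug: 'sent' is never used, so the whole text is
        # tokenized again on every iteration.
        result.append(tokenize(text))
    return result

def fixed_tokens(text):
    # Possible fix: tokenize only the current sentence.
    return [tokenize(sent) for sent in split_sentences(text)]

text = ("testing the sentences. there might be a memory leak. "
        "This will repeat three times for each sentences.")

buggy = buggy_tokens(text)
fixed = fixed_tokens(text)

print(sum(len(s) for s in buggy))  # total tokens stored by the buggy loop
print(sum(len(s) for s in fixed))  # total tokens stored by the fixed loop
```

With the sample text, the buggy version stores three identical copies of the full token list, while the fixed version stores each token once. On a large file the gap between the two would widen with every additional sentence, which is consistent with the memory growth we are seeing, assuming the real loop behaves like this sketch.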