I know this issue has already been reported, but I just installed the new release 1.4.2 and it looks like this one slipped through the cracks:
    def __iter__(self):
        stream = self.get_stream()
        self.n = 0
        for doc in self.nlp.pipe((eg['text'] for eg in stream)):
            for sent in doc.sents:
                yield [w.text for w in doc]
                self.n += 1
It should be:
    def __iter__(self):
        stream = self.get_stream()
        self.n = 0
        for doc in self.nlp.pipe((eg['text'] for eg in stream)):
            for sent in doc.sents:
                yield [w.text for w in sent]
                self.n += 1
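
To illustrate the difference, here is a minimal sketch that uses plain lists as a stand-in for a parsed document, so it runs without spaCy. The names doc_sents, doc_tokens, buggy and fixed are made up for this example and are not part of the library. With the current code, each sentence yields the tokens of the whole document, so a document with N sentences is emitted N times in full; iterating over sent yields each sentence once.

```python
# Stand-in for a parsed document: a list of sentences, each a list of token strings.
# (Hypothetical data; a real spaCy Doc exposes its sentences via doc.sents.)
doc_sents = [["Hello", "world", "."], ["How", "are", "you", "?"]]
doc_tokens = [tok for sent in doc_sents for tok in sent]


def buggy(sents, tokens):
    # Mirrors `yield [w.text for w in doc]`: ignores `sent`, yields the whole document.
    for sent in sents:
        yield list(tokens)


def fixed(sents, tokens):
    # Mirrors `yield [w.text for w in sent]`: yields only the current sentence.
    for sent in sents:
        yield list(sent)


buggy_out = list(buggy(doc_sents, doc_tokens))
fixed_out = list(fixed(doc_sents, doc_tokens))

print(buggy_out)  # the full document, repeated once per sentence
print(fixed_out)  # one token list per sentence
```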