Hi @ines - it would be great if you could update the video on training an insults classifier, or at least put a comment above it indicating that it’s outdated. I ran into the seeds issue and needed to track down these posts:
in order to figure it out.
It seems it has been quite a while (~1 yr) since the seeds method shown in the video would have worked.
Incidentally, have there been some recent updates to the online docs?
There are a few details I can no longer find which I’m sure I saw earlier (before I bought Prodigy), for instance details of the loaders and the APIs. Looking here: https://prodi.gy/docs/cookbook#loaders there’s a link to https://prodi.gy/docs/#files but that page no longer has the loader info (although it used to, according to the Wayback Machine: https://web.archive.org/web/20171224102353/https://prodi.gy/docs/ )
It looks like there may be comments in the README.html that aren’t on the online docs - I’ve since found some detail on the loaders in the README.html, but it wasn’t initially obvious things would be out of sync.
And one last question (sorry!): can the Reddit corpus loader be made to work with .xz files, since I see that the dumps are now compressed that way (https://files.pushshift.io/reddit/comments/ shows the last .bz2 format file being 2017-11)? Plus, any way to enable it to split long comments into sentences would be awesome too. The loader component is compiled (loaders.cpython-35m-x86_64-linux-gnu.so), so I couldn’t look into it myself to check what was needed.
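For reference, here is roughly what I have in mind, as a minimal sketch rather than anything based on the built-in loader (which I can't read). It assumes the .xz dumps are newline-delimited JSON with `body` and `subreddit` fields, like the .bz2 ones. The names `reddit_xz_stream` and `split_sentences` are made up for illustration, and the regex split is only a crude stand-in for a proper sentence segmenter.

```python
import json
import lzma
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace.

    Only a placeholder; a real sentence segmenter would do better.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def reddit_xz_stream(path):
    """Yield one {"text": ...} dict per sentence from an .xz comments dump.

    Assumes one JSON object per line with "body" and "subreddit" keys.
    """
    with lzma.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            comment = json.loads(line)
            body = comment.get("body", "")
            # Skip placeholder bodies left behind by deleted/removed comments
            if body in ("[deleted]", "[removed]"):
                continue
            for sentence in split_sentences(body):
                yield {
                    "text": sentence,
                    "meta": {"subreddit": comment.get("subreddit")},
                }
```

Each yielded dict has a `text` key, so the output could be written out as JSONL (one `json.dumps(task)` per line) and loaded from there, assuming that is an acceptable route into a recipe.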