s2v_reddit_2019_lg - error in unzipped file

I am trying to use the sense2vec Reddit vectors posted on the GitHub page. However, when I merged the three files and tried to unzip the result, WinZip reported an error in the file. When I attempted to unzip the parts individually, it reported an error in 'part 1'.

Has anyone been able to unzip these and get them working? Is there a way I can repair them before merging them? Any insights will be appreciated.

Hi Ernest,

that's strange. I've just downloaded everything on my Linux machine and it worked just fine.

That said, I recall once receiving a corrupted file from GitHub. When there's a hiccup on their end, a download can appear to finish normally even though the file is incomplete or damaged.

Just to check, did redownloading help? I don't have a Windows machine so I can't try WinZip, but this command did do the trick for me:

# Move it all into a single file
cat s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_lg.tar.gz
# Untar it
tar -xvf s2v_reddit_lg.tar.gz 
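
For anyone without a Unix-style shell, the same two steps can be sketched in Python. This is only a sketch under some assumptions: the helper name `join_and_extract` is made up for illustration, the parts are assumed to sit in one directory, and their filenames are assumed to sort in part order. It copies the parts byte for byte, then checks for the gzip magic bytes before extracting, so a bad merge is reported early instead of as a vague archive error.

```python
import glob
import shutil
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def join_and_extract(parts_pattern, archive_path, out_dir):
    """Concatenate split archive parts in binary mode, then untar them.

    `join_and_extract` is a hypothetical helper, not part of sense2vec.
    """
    # Assumption: sorting the filenames gives the correct part order.
    parts = sorted(glob.glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {parts_pattern!r}")

    # Binary mode ("wb"/"rb") avoids any newline or encoding translation.
    with open(archive_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)

    # Cheap sanity check before handing the file to tarfile.
    with open(archive_path, "rb") as merged:
        if merged.read(2) != GZIP_MAGIC:
            raise ValueError(f"{archive_path} does not look like a gzip file")

    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return parts
```

With the file names from the command above, a call might look like `join_and_extract("s2v_reddit_2019_lg.tar.gz.*", "s2v_reddit_lg.tar.gz", ".")`. If the magic-byte check fails, the first part itself is probably damaged and worth downloading again.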

Hi Vincent,
Thanks for your reply. I tried using the tar command and I get the following error:

PS C:\Workspace\WILEE\env_prodigy\reddit_sense_vectors> tar -xvf s2v_reddit_2019_lg.tar.gz
tar : tar.exe: Error opening archive: Unrecognized archive format
At line:1 char:1
+ tar -xvf s2v_reddit_2019_lg.tar.gz
    + CategoryInfo          : NotSpecified: (tar.exe: Error ... archive format:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

I have downloaded the files a few times already but I can try again and see.

It seems the problem has to do with the Windows environment. When I moved the files to a Unix environment, I was able to join the files and get it working. Unfortunately, the Reddit sense vectors don't quite cover my use case, so I will have to generate my own sense vectors.
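
One rough way to gauge how well the pretrained vectors cover a domain is to count how many domain terms have no key in the table. The sketch below assumes keys follow sense2vec's `phrase|TAG` form with underscores in place of spaces; `missing_keys` is a made-up helper name, not part of the library.

```python
def missing_keys(keys, vectors):
    """Return the keys that are absent from `vectors`, in input order.

    `vectors` can be any container that supports `in`, such as a loaded
    Sense2Vec object. `missing_keys` is a hypothetical helper.
    """
    return [key for key in keys if key not in vectors]
```

With sense2vec installed, the vectors can be loaded with `Sense2Vec().from_disk(...)`, pointing at the folder the archive was extracted to, and the result passed in as `vectors`. A long list of missing keys would suggest the pretrained table is a poor fit for the domain.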

Beware that training your own sense vectors can require a lot of data. If you are interested in training your own, be sure to check the example training scripts here: