Other coding languages...

Hi all,

(Hope this is not too off-topic)

What are good programming languages to know in order to get up to speed with spaCy/Prodigy/Thinc?

Knowing Python (with all the relevant software suites) is obvious, and PyTorch is desirable, but as a “data hacker” (not a clean coder) I don’t have a good feel for the other languages: C/C++ (for Cython), Java/JavaScript (for building a good customer UX), etc. The same goes for big data frameworks.

I don’t intend to start a big “war of the languages”; I’m more interested in your input on the what and why.



spaCy and Thinc are written entirely in Python and Cython. Prodigy’s server and recipes are in Python and Cython, and the front-end is in HTML/CSS/JavaScript. So, those should definitely be plenty :).

I wrote up some thoughts about Cython here: https://explosion.ai/blog/writing-c-in-cython . You might also find my talk from PyCon Israel interesting: https://www.youtube.com/watch?v=yJR3qCUB27I

Hi Matthew,

thanks for the input. Gives me a feel for what to write in the “requirement” section of the job ad.

This is an interesting question! :smiley: In general, I think it’s definitely good to focus on Python and the features of the language itself, rather than on some specific framework or API. This is also part of our philosophy: in our libraries, we try not to steal the control flow, and we let the user write regular Python code. For instance, instead of just providing you with a fit() or a train() method with a bunch of arguments, we let you write your own training loop. Instead of a method doc.get_all_verbs, we encourage you to write [token for token in doc if token.pos_ == "VERB"].
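To make the list comprehension above concrete, here is a minimal sketch in plain Python. The Token class and the hand-built doc list are hypothetical stand-ins so that the snippet runs without spaCy installed. They are not spaCy's real classes; the only thing borrowed from the text is the pos_ attribute name.

```python
from typing import NamedTuple


class Token(NamedTuple):
    # Hypothetical stand-in for a token object; only the attributes
    # needed for this illustration are modelled.
    text: str
    pos_: str


# Hand-written tags for illustration only (not the output of a real model).
doc = [
    Token("She", "PRON"),
    Token("writes", "VERB"),
    Token("and", "CCONJ"),
    Token("tests", "VERB"),
    Token("code", "NOUN"),
]

# Plain Python filtering: no dedicated "get all verbs" method is needed.
verbs = [token for token in doc if token.pos_ == "VERB"]
verb_texts = [token.text for token in verbs]
print(verb_texts)
```

Any other filter (nouns only, tokens longer than three characters, and so on) is just a different condition in the same comprehension, which is the point of not hiding it behind a method.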

Prodigy implements a similar philosophy: Instead of writing recipes in a Prodigy-specific config language, you can write pretty straightforward Python functions. And instead of making the user implement a specific “Prodigy stream” object, you can pass in a regular Python generator that can do whatever you like, as long as it yields dictionaries.
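As a rough sketch of the "regular Python generator that yields dictionaries" idea, the function below turns lines of text into one dictionary each. The function name and the "text" key are assumptions made for this illustration, and the snippet does not import or call Prodigy itself.

```python
from typing import Dict, Iterable, Iterator


def stream_from_lines(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dictionary per non-empty line (hypothetical helper)."""
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            # The "text" key is an assumption for this example.
            yield {"text": text}


raw_lines = ["First example.\n", "\n", "  Second example.  \n"]
examples = list(stream_from_lines(raw_lines))
print(examples)
```

Because it is an ordinary generator, the same function works on a list, an open file handle, or any other iterable of strings, and extra logic (filtering, deduplication, calling a model) can go inside the loop.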

Front-end technologies like HTML/CSS or JavaScript aren’t strictly required to use our stack: you can totally be productive in spaCy and Prodigy without them. But they’re definitely a nice-to-have, especially when it comes to delivering end-to-end projects, even if it’s just for showcasing a trained model or building a demo or proof of concept.