Ever wonder which programming languages are most used in machine learning? Or which artificial intelligence (AI) and data science packages developers import more often than any others? GitHub answered some of those questions today in a follow-up to the 2018 Octoverse report it published in October.
The Microsoft-owned platform pulled data on contributions (e.g., pushing code, opening an issue or pull request, commenting on an issue or pull request, or reviewing a pull request) made between January 1, 2018 and December 31, 2018. For the most-imported packages, it used data from GitHub's dependency graph, which covers all public repositories and any private repositories that have opted in.

Above: The most popular programming languages in machine learning projects on GitHub.
Among contributors to repositories tagged with the "machine-learning" topic, Python is the most common language. That's not surprising: it's the third-most-used language on GitHub overall. C++ is a close second, followed by JavaScript, Java, C#, Julia, Shell, R, TypeScript, and Scala.

Above: The most popular machine learning packages on GitHub.
As for the top packages, NumPy, a package that supports mathematical operations on multidimensional data, is far and away the leader by volume: three-quarters of AI projects on GitHub use it. The next three most-imported packages (scientific computing toolkit SciPy, data analysis library pandas, and visualization library Matplotlib) are each used in over 40 percent of projects, as is scikit-learn, the fifth-most-imported package.
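For readers who haven't used NumPy, here is a minimal sketch of what "mathematical operations on multidimensional data" looks like in practice. The array values are arbitrary illustrations, not drawn from GitHub's data, and the snippet shows only a few of the library's core operations.

```python
import numpy as np

# A 2x3 matrix: a two-dimensional array with 2 rows and 3 columns
a = np.array([[1, 2, 3],
              [4, 5, 6]])

# Element-wise arithmetic applies to every entry at once, no loops needed
doubled = a * 2

# Reductions along an axis: axis=0 collapses rows (per-column sums),
# axis=1 collapses columns (per-row means)
col_sums = a.sum(axis=0)
row_means = a.mean(axis=1)

# Matrix multiplication of a (2x3) with its transpose (3x2) yields a 2x2 result
gram = a @ a.T

print(doubled)    # [[ 2  4  6] [ 8 10 12]]
print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
print(gram)       # [[14 32] [32 77]]
```

Vectorized operations like these, which run in compiled code rather than Python loops, are a large part of why so many machine learning packages, including SciPy, pandas, and scikit-learn, build on NumPy arrays.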

Above: The most popular machine learning projects on GitHub.
So what about the most popular open source machine learning projects? Google’s open source TensorFlow framework topped the list, followed by scikit-learn and two natural language processing projects, explosion/spaCy and RasaHQ/rasa_nlu. The next four top projects are focused on image processing: CMU-Perceptual-Computing-Lab/openpose, thtrieu/darkflow, ageitgey/face_recognition, and tesseract-ocr/tesseract.