The tools of choice for data scientists are machine learning (ML) and artificial intelligence (AI) software. There is a wealth of articles listing reliable ML and AI tools with in-depth descriptions of their functionality. We have tried to add some insight by including feedback from our own experience. Below, we answer five major questions in brief points, with more detailed descriptions later in the article. Take a look, and happy learning!

 

  1. Which are the most popular machine learning languages?
    1. Python
    2. R
    3. C++
  2. What are the top data analytics and visualization tools for AI?
    1. Pandas
    2. Matplotlib
    3. Jupyter notebook
    4. Tableau
  3. What are the best frameworks for general machine learning?
    1. NLTK
    2. SciKit-Learn
    3. NumPy
  4. Which are the best Machine Learning frameworks for Neural Network modelling?
    1. TensorFlow
    2. TensorBoard
    3. PyTorch
    4. Keras
    5. Caffe2
  5. What are the top Big Data tools?
    1. MemSQL
    2. Apache Spark

 

Detailed descriptions of the questions follow:

 

1. The most popular machine learning languages:

Python: a very popular language with high-quality machine learning and data analysis libraries: Python is a general-purpose language favored for its readability, good structure, and relatively mild learning curve, and it continues to gain popularity. According to Stack Overflow’s Annual Developer Survey, conducted in January 2018, Python can be called the fastest-growing major programming language. It ranked as the seventh most popular language (38.8 percent), one step ahead of C# (34.4 percent).

Grant Reaber, head of research at Respeecher, who specializes in deep learning applied to speech recognition, uses Python because “almost everyone currently uses it for deep learning.”

Illia Polosukhin, co-founder of the NEAR.AI startup, who previously managed a deep learning team at Google Research, also sticks with Python: “Python was always a language of data analysis, and, over the time, became a de-facto language for deep learning with all modern libraries built for it.”

One use case for Python in machine learning is model development, particularly prototyping.

Facebook AI researcher Denis Yarats notes that the language has an amazing toolset for deep learning, such as the PyTorch framework and the NumPy library.
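
As a rough illustration of the prototyping use case, the sketch below fits a simple linear model using NumPy alone. The synthetic data, the variable names, and the choice of ordinary least squares are all assumptions made for this example; they are not something the quoted practitioners describe.

```python
import numpy as np

# Synthetic data, assumed for illustration: y = 3*x + 2 plus small Gaussian noise
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)

# Design matrix: one column for x, one column of ones for the intercept
X = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares fit; lstsq returns the coefficients first
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coeffs

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The recovered slope and intercept should land close to the values used to generate the data, which is the kind of quick sanity check a prototype is good for.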

C++: a middle-level language used for parallel computing on CUDA: C++ is a flexible, object-oriented, statically-typed language based on the C programming language. The language remains popular among developers thanks to its reliability, performance, and the large number of application domains it supports. Another application of this language is the development of drivers and software that can directly interact with hardware under real-time constraints. And since C++ is clean enough for the explanation of basic concepts, it’s used for research and teaching.

Data scientists use this language for diverse yet specific tasks.

Andrii Babii, a senior lecturer at the Kharkiv National University of Radioelectronics (NURE), uses C++ for parallel implementations of algorithms on CUDA, an Nvidia GPU compute platform, to speed up applications based on those algorithms.

R: A language for statistical computing and graphics: R, a language and environment for statistics, visualizations, and data analysis, is a top pick for data scientists. It’s another implementation of the S programming language.

R and the libraries written in it provide numerous graphical and statistical techniques, such as classical statistical tests, linear and nonlinear modeling, time-series analysis, classification, and clustering. We can easily extend the language with R machine learning packages. The language allows for creating high-quality plots, including formulae and mathematical symbols.

 

2. Data analytics and visualization tools:

Pandas: a Python data analysis library that enhances analytics and modeling: Wes McKinney, a data science expert, developed this library to make data analysis and modeling convenient in Python. Prior to pandas, Python worked well only for data preparation and munging.

pandas simplifies analysis by loading CSV, JSON, and TSV data files or SQL database tables into a data frame, a Python object that looks like an Excel or SPSS table with rows and columns.
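
A minimal sketch of loading tabular data into a data frame might look like the following. The CSV content, column names, and values are hypothetical and exist only for illustration; in practice `pd.read_csv` would usually be given a file path instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk
csv_text = """item,category,price
apple,fruit,2.0
carrot,vegetable,4.0
pear,fruit,6.0
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# The data frame has named columns and indexed rows, much like a spreadsheet
fruit_rows = df[df["category"] == "fruit"]
mean_price = df["price"].mean()

print(df)
print(fruit_rows)
print(mean_price)
```

Once the data is in a data frame, filtering rows and aggregating columns are one-line operations, which is most of what makes the table-like object convenient.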

Matplotlib: a Python plotting library for quality visualizations: Matplotlib is a Python 2D plotting library modeled on MATLAB: its developer, John D. Hunter, emulated the plotting commands of MathWorks’ MATLAB software.

While written mostly in Python, the library makes heavy use of NumPy and other extension code, so it performs well even with large arrays.

It allows for generating production-quality visualizations with a few lines of code.
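
To give a sense of how few lines a basic figure takes, here is a small sketch that plots two curves and saves them to a PNG file. The output filename, the labels, and the choice of the non-interactive `Agg` backend are assumptions made so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import numpy as np

# Sample points over one full period
x = np.linspace(0.0, 2.0 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Hypothetical output filename; dpi controls the resolution of the saved image
fig.savefig("sine_cosine.png", dpi=150)
```

In a Jupyter notebook the backend line and the `savefig` call are usually unnecessary, since the figure is rendered inline.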

 

 

Data science practitioners note that Matplotlib’s flexibility and integration capabilities set it apart from other plotting libraries.

Denis Yarats (Facebook AI Research) says he chooses Matplotlib mostly because it’s well integrated into the Python toolset and can be used with the NumPy library or the PyTorch machine learning framework.

Jupyter notebook: collaborative work capabilities: The Jupyter Notebook is a free web application for interactive computing. With it, users can create and share documents with live code, develop and execute code, as well as present and discuss task results. A document can be shared via Dropbox, email, GitHub, and Jupyter Notebook Viewer, and it can contain graphics and narrative text.

The notebook is rich in functionality and supports a variety of usage scenarios. It can be integrated with numerous tools, such as Apache Spark, pandas, and TensorFlow. It supports more than 40 languages, including R, Scala, Python, and Julia. Besides these capabilities, Jupyter Notebook can be deployed on container platforms such as Docker and Kubernetes.

Illia Polosukhin from NEAR.AI shares that he uses Jupyter Notebook mostly for custom ad-hoc analysis: “The application allows for doing any data or model analysis quickly, with the ability to connect to a kernel on a remote server. You can also share a resulting notebook with colleagues.”

Tableau: powerful data exploration capabilities and interactive visualization: Tableau is a data visualization tool used in data science and business intelligence. A number of specific features make this software efficient for solving problems in various industries and data environments.

Through data exploration and discovery, Tableau software quickly extracts insights from data and presents them in understandable formats. It doesn’t require advanced programming skills and can be installed on a wide range of devices. While a small amount of scripting may be needed, most operations are done by drag and drop.