Intel's distribution of Python, created in collaboration with Anaconda, is billed by the hardware giant as a distribution built for performance, specifically the numerical performance demanded by scientific research and machine learning. Claims are bandied about of Python that executes at machine-language speeds, but does Intel Python really deliver?
I won't use the word "benchmark," which, to my mind, suggests a scientific and carefully engineered set of performance measurements. However, there is nothing wrong with taking Intel Python for a quick spin and seeing what happens.
Conventional Python Code
For many data scientists, manipulating Pandas data frames is a daily routine. So I was disturbed to read a blog post suggesting that Pandas code ran slower in the Intel distribution. After all, this seems not to make sense. Intel's enhancements rely heavily on having Python code call fully compiled libraries for numerical algorithms. Code such as Pandas would not be expected to take advantage of these enhancements, but we certainly wouldn't expect it to be any worse. In my experiments, Pandas data manipulation using the Intel distribution was not noticeably different from that done with the Anaconda distribution.
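A minimal sketch of the kind of day-to-day Pandas workload I mean; the column names, data sizes, and the group-by aggregation are illustrative choices of mine, not the specific tests from the original comparison.

```python
import numpy as np
import pandas as pd
import timeit

# Build a synthetic data frame: one million rows, one thousand distinct keys.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1_000, size=1_000_000),
    "value": rng.random(1_000_000),
})

def workload():
    # A group-by aggregation, a staple of everyday data-frame manipulation.
    return df.groupby("key")["value"].mean()

# Time the same workload under each distribution and compare.
elapsed = timeit.timeit(workload, number=10)
print(f"10 runs of groupby-mean: {elapsed:.3f} s")
```

Running a script like this under both distributions is enough to spot any large regression, even if it falls short of a formal benchmark.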
Matrix math constitutes a large proportion of machine learning code, and numpy is the "go-to" library used by most Python programmers for array manipulation. Running a numpy matrix multiplication routine on the Intel platform yielded a performance improvement averaging about 20%. This improvement is more impressive than it might seem initially since the numpy modules are already highly optimized.
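A sketch of the matrix-multiplication timing; the matrix size and repeat count here are my own illustrative parameters, not Intel's test configuration.

```python
import numpy as np
import timeit

# Two random square matrices; 512x512 is large enough for BLAS to dominate.
n = 512
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

# NumPy dispatches the @ operator to whatever BLAS it was built against,
# so this one line is where Intel's MKL-backed build can pull ahead.
elapsed = timeit.timeit(lambda: a @ b, number=20)
print(f"{n}x{n} matmul, 20 runs: {elapsed:.3f} s")

# np.show_config() reports which BLAS/LAPACK backend is actually linked.
np.show_config()
```

The `np.show_config()` call is worth running first: it confirms whether the distribution you are timing is really using MKL or a generic BLAS.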
Numba is not an Intel library but is included if you install the full Intel Python distribution instead of the core distribution. Numba provides a just-in-time (jit) compiler to convert Python into machine-language code. In the past, I have encountered challenges installing and correctly configuring numba. So I was delighted to discover that the Intel Python distribution installs numba seamlessly. When Intel's distribution is done installing, numba is there, and it works.
We shouldn't expect to see numba yield a performance improvement in NumPy's matrix multiplication since, in this case, most of NumPy's work is being done in efficient compiled libraries, not in Python itself. We would, however, expect to see improvement in code where mathematical tasks are repeatedly performed within Python loops. The popular and perhaps overdone Mandelbrot set is an example of such code. Test code running with the numba "jitter" ran seven to eight times faster than the plain Python code.
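A minimal sketch of a numba-accelerated Mandelbrot kernel; this is the standard escape-time formulation, not the exact code I timed. The fallback decorator is an assumption of mine so the same function still runs (slowly) as plain Python when numba is absent.

```python
try:
    from numba import jit
except ImportError:
    # No-op stand-in for numba.jit: the function runs as ordinary Python.
    def jit(**kwargs):
        def wrap(func):
            return func
        return wrap

@jit(nopython=True)
def mandel(cr, ci, max_iter):
    """Return the iteration count before |z| exceeds 2, up to max_iter."""
    zr = zi = 0.0
    for n in range(max_iter):
        # z = z**2 + c, expanded into real arithmetic for the jit compiler.
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return n
    return max_iter

# This tight per-pixel loop is exactly the shape of code numba speeds up.
print(mandel(-0.5, 0.0, 1000))
```

The first call pays a one-time compilation cost; any timing comparison should discard that first call and measure subsequent ones.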
Intel's Data Analytics Acceleration Library (DAAL) is available as a separate library and does not require the Intel Python distribution. However, if you fully install Intel Python, DAAL comes along for the ride. This library implements many of the functions commonly used in data analytics. It is not a Python library, but Intel provides a Python interface called DAAL4Py, which replaces the now-deprecated pyDAAL.
Rebirth of the Plaidypus
OK, PlaidML is not directly related to Intel Python, but since we are speaking of Intel offerings of interest to data scientists and machine learning aficionados, plaidML should be mentioned. PlaidML is a tensor compiler initially introduced by a company called Vertex.AI, and it has found new life now that Intel has acquired Vertex.AI. Briefly put, plaidML takes high-level instructions from machine learning platforms such as Keras and ONNX and translates them into low-level instructions for different hardware platforms such as CPUs, NVidia CUDA GPUs, and GPUs that support OpenCL.
In contrast with my experience with Google's TensorFlow, the plaidML installation process seems to quickly and reliably discover and use CUDA drivers on both Windows and Linux systems.
Intel is ensuring that data scientists and machine learning researchers continue to see performance improvements. Will you train a neural network over your coffee break? Probably not. However, the latest Intel offerings and upgrades seem to be squeezing more performance out of the current generation of hardware and to be well positioned to take advantage of the next.
This piece was originally posted on August 22, 2019, and has been refreshed with updated styling.