R Server is dead! Long live Machine Learning Server!
The introduction of Microsoft R Server with SQL Server 2016 was tremendously exciting for data scientists and analysts working with "Big Data." When SQL Server 2017 was released, however, many were shocked to see that R Server was gone. They were relieved to discover that it had not disappeared at all. Instead, it had become "Machine Learning Server," a name that emphasizes the fact that the server now supports Python and is no longer limited to R.
Why Machine Learning Server?
Both R and Python were initially conceived in an era when nobody had multicore CPUs, and 1 gigabyte of RAM seemed unimaginably large. Today, however, both hardware and the demands of data analysis have outstripped these core languages. Machine Learning Server, or just "ML Server," provides tools to help R and Python address the expanding needs of modern data science.
ML Server is, well, a server. Analysts sitting at their workstations are not limited by the speed and available memory of their local computers; they start a session on the server. Furthermore, when Microsoft acquired Revolution Analytics, it acquired a set of tools that facilitate the analysis of data volumes too large to fit into memory all at once.
R and Python are fundamentally single-threaded languages, meaning their primary processing engines cannot take advantage of today's multicore processors. However, newer software packages from third-party vendors and open-source developers have retrofitted some multithreading capabilities onto R and Python. As a result, ML Server makes multithreading available largely transparently to analysts. All work done by ML Server executes in some compute context, and parallel-processing-enabled compute contexts know how to divvy up tasks across multiple cores. Indeed, tasks can be divided among multiple ML Servers deployed on the local area network. These machines can communicate using HTTP or the faster but more technically complex MPI (Message-Passing Interface).
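To make the idea of divvying up a task concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not ML Server's API or its compute-context mechanism; the function names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) are hypothetical and chosen purely for illustration. It shows the general pattern: split the data, hand each piece to a worker, and combine the partial results.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(chunk):
    """The work each worker performs on its own piece of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, n_workers=None,
                            executor_cls=ProcessPoolExecutor):
    """Divide the task among workers, then combine the partial results.

    ProcessPoolExecutor spreads the chunks across CPU cores. Passing
    ThreadPoolExecutor instead runs the same pattern with threads, which
    is handy for a quick check but does not add cores for pure-Python work.
    """
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_into_chunks(values, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    # Illustrative data only; a real workload would come from a data source.
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

The analyst-facing call stays a single function, and the splitting and recombining happen out of sight. That is roughly the sense in which a parallel-enabled compute context can be "largely transparent."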
ML Server also provides some substantial "turn-key" data science solutions that allow you to perform tasks such as sentiment analysis or neural network classification by feeding your data to Microsoft's packages and not having to write any program code yourself.
Deploying ML Servers
Deploying Machine Learning Servers is surprisingly easy. Microsoft provides a PowerShell application that sets up the infrastructure enabling an ML Server to communicate with the workstations of analysts and other ML Servers on the LAN. This PowerShell script, runAdminUtils.ps1, appears in the Windows menu system as the shortcut "Machine-Learning-Admin-Util." This application can also be used to monitor and test ML Servers. In my experience, monitoring server health and occasionally restarting a moribund ML Server is necessary.
The RxInSqlServer compute context, more often just referred to by its alias "sqlserver," caught the analysts' eye when R Server was first released in 2016. While this context enables SQL Server to execute R and Python scripts and allows R and Python code to run from within T-SQL stored procedures, it should be emphasized that this is most definitely not how one would typically approach the analysis of SQL Server data. Instead, the analysis is done in what now might be called the "classic" way: import data from SQL Server and analyze it elsewhere.
The sqlserver compute context is best suited for tasks that need to be done locally, that is to say, on the same physical machine running SQL Server. There are several situations in which this might arise. For example, a fraud-detection algorithm might need to be executed as new rows are added. Similarly, a retail recommender algorithm might be applied to new purchase items being recorded in a database. Another potential use for the sqlserver context might be constructing XDF files.
XDF files were introduced so that data could be processed in "chunks," piece by piece, rather than having to read the entire file all at once. For particularly large amounts of data, it might be desirable to copy data from SQL Server into an XDF file locally rather than consume network bandwidth by attempting to create the file on a different computer.
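The chunking idea can be illustrated with a short sketch, again in plain Python. This is not the XDF format or any ML Server function; it reads an ordinary CSV file, and the names iter_chunks and chunked_mean are hypothetical. It simply shows how a statistic can be accumulated while only one chunk is held in memory at a time.

```python
import csv
from itertools import islice


def iter_chunks(path, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def chunked_mean(path, column, chunk_size=10_000):
    """Compute a column mean while holding only one chunk in memory."""
    total, count = 0.0, 0
    for chunk in iter_chunks(path, chunk_size):
        # Keep only the running total and row count between chunks.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")
```

Because only a running total and a row count survive from one chunk to the next, the memory needed depends on the chunk size rather than on the size of the file.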
Microsoft Machine Learning Server is a substantial component of Microsoft's commitment to enterprise-grade analytics. Analysts' workstation tools, like RStudio, Excel, Visual Studio, and Power BI Desktop, can utilize ML Server to cope with big data's ever-increasing volume and velocity demands.
This piece was originally posted on March 23, 2018, and has been refreshed with updated styling.