
Loading New R Packages into AzureML

18/06/2019

Microsoft Azure ML provides over 500 individual R packages for use in R scripts. It is almost certain, however, that at some point you will wish to use an R package not available by default.

Several years ago, before Revolution Analytics was acquired by Microsoft, Andrie de Vries created a very useful package called miniCRAN. This package makes it easy to create local repositories from which users can install R packages. Often, this is necessary for security reasons, because users are prevented from downloading executable files from external repositories.

However, miniCRAN also serves another valuable purpose that suits our needs precisely. It is easy to upload a single independent R package into Microsoft Azure ML. Many R packages, however, have multiple dependencies, and keeping track of all the requirements can become a headache. A very practical solution is to write an R script on a local machine that creates a repository of the desired packages. This repository will automatically include the required dependencies. The entire repository can then be zipped and loaded into your Azure ML workspace, and the desired packages can be installed as needed, just as easily as if you were using RStudio. Furthermore, the zipped repository can become a shared resource for all the R developers on the team.


Creating the Local Repository


In this example, we will install and load the package "sn", which is helpful for working with skewed probability distributions.

The following script is run in your local RStudio to generate a local CRAN-like repository:

library(miniCRAN)

# cf https://blog.revolutionanalytics.com/2014/10/introducing-minicran.html

options(repos = c(CRAN = "https://cran.at.r-project.org/"))

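# we create a vector of all the CRAN packages we would like to include

# in our local repository; pkgDep() adds their dependencies

pkgs <- c("numDeriv", "sn")

pkgList <- pkgDep(pkgs)

localCRAN <- "~/localMiniCRAN"

# showWarnings = FALSE avoids a warning if the folder already exists

dir.create(localCRAN, showWarnings = FALSE)

# download the packages both as source and as Windows binaries

makeRepo(pkgList, path = localCRAN, type = "source")

makeRepo(pkgList, path = localCRAN, type = "win.binary")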

When we are finished, we can view the resulting repository files in the Windows File Explorer. Note that in this example we explicitly distinguish between packages distributed as source code and those distributed as Windows binaries.
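The same inspection can also be done from R rather than from File Explorer. The following is a minimal sketch and not part of the original script: the helper name list_repo_packages is hypothetical, and it relies only on base R's available.packages() and contrib.url(), pointed at the local folder through a file:// URL. It assumes the repository was created at ~/localMiniCRAN as in the script above.

```r
# Hypothetical helper: list the package names found in one section of a
# local CRAN-like repository ("source" or "win.binary").
list_repo_packages <- function(repo_dir, type = "source") {
  # Build a file:// URL that works on both Windows and Unix-like systems
  repo_path <- normalizePath(repo_dir, winslash = "/", mustWork = TRUE)
  if (!startsWith(repo_path, "/")) repo_path <- paste0("/", repo_path)
  repo_url <- paste0("file://", repo_path)

  # contrib.url() appends the sub-folder used for the requested package type
  pkgs <- available.packages(contriburl = contrib.url(repo_url, type = type))
  sort(rownames(pkgs))
}

# Assumes the repository from the script above exists at this location
localCRAN <- "~/localMiniCRAN"
if (dir.exists(localCRAN)) {
  print(list_repo_packages(localCRAN, type = "source"))
}
```

Calling the helper with type = "win.binary" looks in the binary section for the R version that is currently running, so it is only informative when that matches the version used to build the repository.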



Note that the local repository includes mnormt, which is a requirement of sn but which we did not explicitly mention in the repository script.
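The repository folder then has to be zipped before it can be uploaded. Any zip tool will do. As one possibility, the sketch below uses base R's utils::zip(), which relies on an external zip program being available on the system (for example, through Rtools on Windows). The helper name zip_repo and the output location are assumptions made for illustration. The folder itself is kept as the top-level entry of the archive so that it unpacks as localMiniCRAN, matching the path used in the Azure ML script later on.

```r
# Hypothetical helper: zip a repository folder so that the folder name is
# the top-level entry inside the archive. Requires an external zip program.
zip_repo <- function(repo_dir, zip_file) {
  repo_dir <- normalizePath(repo_dir, mustWork = TRUE)
  zip_file <- file.path(
    normalizePath(dirname(zip_file), mustWork = TRUE),
    basename(zip_file)
  )

  # Work from the parent folder so archive paths start with the folder name
  old_wd <- setwd(dirname(repo_dir))
  on.exit(setwd(old_wd), add = TRUE)

  # utils::zip() recurses into folders by default and returns the exit status
  status <- utils::zip(zipfile = zip_file, files = basename(repo_dir))
  invisible(status == 0)
}

# Assumes the repository from the script above exists at this location
localCRAN <- path.expand("~/localMiniCRAN")
if (dir.exists(localCRAN)) {
  zip_repo(localCRAN, file.path(dirname(localCRAN), "localMiniCRAN.zip"))
}
```

The zip file is written next to the repository folder. If a file of the same name already exists, the external zip program updates it rather than starting from scratch, so deleting a stale archive first is the safer habit.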

Loading the Zipped Repository into AzureML


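The zipped repository is uploaded just as you would upload any data file. The resulting "dataset" is then connected to an Execute R Script step in Azure ML Studio. Connecting it to the step's Script Bundle (zip) input makes the unzipped contents available under the src directory, which is why the script below points at src/localMiniCRAN.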



The following code installs the required package(s) for use by the R script. In this example, R code to view a table of packages is included, but this is solely to observe the repository packages in a test environment. It is not necessary in the deployed script.

# setting-up the repository

uri_repo <- "file:///C:/src/localMiniCRAN/"

options(repos = uri_repo)

# extracting the list of available packages

table_packages <- data.frame(package = rownames(available.packages()))

# installing a required package

install.packages("sn")

library(sn)

# do something with the newly loaded libraries

# outputting the list of packages

maml.mapOutputPort("table_packages")
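Because an Execute R Script step may be run many times, it can be convenient to install a package only when it is not already present. The following is a sketch under that assumption and is not part of the original script: the helper name ensure_package is hypothetical, and the repository location is the one used in the script above.

```r
# Hypothetical helper: install a package from the given repository only if
# it cannot already be loaded, then report whether it is available.
ensure_package <- function(pkg, repo_uri) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repo_uri)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Repository location taken from the script above; the guard keeps this
# from attempting an install where the unzipped repository is absent.
repo_dir <- "C:/src/localMiniCRAN"
if (dir.exists(repo_dir)) {
  if (ensure_package("sn", paste0("file:///", repo_dir))) {
    library(sn)
  }
}
```

The helper returns TRUE or FALSE instead of stopping, so the calling script can decide how to react when a package is missing from the repository.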

Conclusion


The miniCRAN package is an excellent tool for managing package dependencies for Azure ML projects involving R scripts. The example here includes only a single package with two dependencies, but a real miniCRAN repository will include all the packages required by a multitude of Azure ML experiments. The miniCRAN repository then becomes a single easy-to-use and easy-to-manage resource for Azure ML scripts that require additional packages not supplied on Azure.

Written by Dan Buskirk

“The pleasures of the table belong to all ages.” Actually, Brillat-Savarin was talking about the dinner table, but the quote applies equally well to Dan’s other big interest, tables of data. Dan has worked with Microsoft Excel since the Dark Ages and has utilized SQL Server since Windows NT first became available to developers as a beta (it was 32 bits! wow!). Since then, Dan has helped corporations and government agencies gather, store, and analyze data and has also taught and mentored their teams using the Microsoft Business Intelligence Stack to impose order on chaos. Dan has taught in Learning Tree’s SQL Server & Microsoft Office curriculums for over 14 years. In addition to his professional data and analysis work, Dan is a proponent of functional programming techniques in general, especially Microsoft’s new .NET functional language F#. Dan enjoys speaking at .NET and F# user groups on these topics.