Accessing Artifacts from Within a UDF¶
Using the Exasol MLflow Plugin significantly speeds up loading MLflow models in Exasol UDFs.
There are a few things to keep in mind, though.
MLflow Tracking URI¶
Loading the MLflow model directly from the BucketFS mounted into the local file system of the UDF is the fastest option. It also does not require any communication with the MLflow server, so the MLflow Tracking URI is not needed in this case.
If you cannot guarantee that the model is accessible in the local file system of the UDF, the plugin's utility functions help you automatically choose the fastest loading option.
Setting the MLflow Tracking URI¶
In all cases that (potentially) access the MLflow server, the UDF needs to set the MLflow Tracking URI. This can be done by:
Setting the environment variable MLFLOW_TRACKING_URI, or
Calling mlflow.set_tracking_uri() within the UDF implementation.
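Both approaches can be sketched in plain Python. The URI http://localhost:5000 below is only a placeholder; use the Tracking URI that is valid in your environment:

```python
import os

# Option 1: set the environment variable before MLflow reads it.
# The URI is a placeholder for your actual MLflow server address.
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"

# Option 2: call the MLflow API directly (requires the mlflow package):
# import mlflow
# mlflow.set_tracking_uri("http://localhost:5000")

print(os.environ["MLFLOW_TRACKING_URI"])
```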
Depending on the environment your Exasol instance is running in, the MLflow Tracking URI might differ from the one you can use on your local machine. This applies in particular when running an Exasol DockerDB instance inside a virtual machine.
Creating the UDF¶
After having built, deployed, and activated your SLC, you can use Exasol SQL to define a UDF like this:
UDF using local_path_or_uri() to read the model from the local file
system if possible. The MLflow Tracking URI is passed via
environment variable MLFLOW_TRACKING_URI.¶
--/
CREATE OR REPLACE MLFLOW_SLC SCALAR SCRIPT
"<SCHEMA>"."<UDF_NAME>"(uri VARCHAR(2000))
RETURNS BOOL AS
%env MLFLOW_TRACKING_URI=http://localhost:5000;
import mlflow
from exasol.mlflow_plugin.artifacts.bucketfs_connector import (
    local_path_or_uri
)

def run(ctx):
    locator = local_path_or_uri(ctx.uri)
    model = mlflow.sklearn.load_model(locator)
    #--
    #-- your implementation using the model goes here
    #--
    return True
/
Running the UDF¶
Now you can run the UDF via the following SQL statement:
SELECT "<SCHEMA>"."<UDF_NAME>"('exa+bfs://...');
Function local_path_or_uri()¶
The function checks if:
The URI points to the BucketFS artifact store and
The associated path is mounted into the local file system of the UDF.
If both conditions are true, the function returns a path in the local
file system that can be passed to one of the load_model() functions of
the MLflow API, e.g. mlflow.models.Model.load() or
mlflow.sklearn.load_model().
Otherwise, the function returns the original URI unchanged, and the model is loaded via the MLflow server, which can be significantly slower.
Function load_model_with_fallback()¶
Another option is load_model_with_fallback(), which accepts the URI and the actual load function as arguments.
UDF using load_model_with_fallback(). The MLflow Tracking URI is set via
mlflow.set_tracking_uri() within the implementation of the
UDF.¶
--/
CREATE OR REPLACE MLFLOW_SLC SCALAR SCRIPT
"<SCHEMA>"."<UDF_NAME>"(uri VARCHAR(2000))
RETURNS BOOL AS
import mlflow
from exasol.mlflow_plugin.artifacts.bucketfs_connector import (
    load_model_with_fallback
)

def run(ctx):
    mlflow.set_tracking_uri("http://localhost:5000")
    model = load_model_with_fallback(ctx.uri, mlflow.sklearn.load_model)
    return True
/
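Conceptually, load_model_with_fallback() combines the local-path resolution with the load call. The sketch below illustrates that idea under assumed semantics (try the fast local locator first, fall back to loading via the original URI); it is not the plugin's actual implementation, and toy_loader is a made-up stand-in for an MLflow loader:

```python
def load_model_with_fallback_sketch(uri, load_fn, to_local=lambda u: u):
    """Illustrative only: `to_local` stands in for the local-path resolution
    (e.g. local_path_or_uri()); `load_fn` is a loader such as
    mlflow.sklearn.load_model."""
    locator = to_local(uri)
    try:
        return load_fn(locator)   # fast path, if the locator is a local path
    except Exception:
        if locator == uri:
            raise                 # no distinct fallback available
        return load_fn(uri)       # slower: load via the MLflow server

# Demo with a toy loader that fails for local paths and succeeds for URIs:
def toy_loader(locator):
    if locator.startswith("exa+bfs://"):
        return f"model via server from {locator}"
    raise IOError("not found locally")

result = load_model_with_fallback_sketch(
    "exa+bfs://x", toy_loader, to_local=lambda u: "/buckets/x")
print(result)
```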