Accessing Artifacts from Within a UDF

Using the Exasol MLflow Plugin can significantly speed up loading MLflow models in Exasol UDFs.

There are a few things to keep in mind, though.

MLflow Tracking URI

Loading the MLflow model directly from BucketFS, mounted into the local file system of the UDF, is the fastest option. It also does not require any communication with the MLflow server, so in this case the UDF does not need the MLflow Tracking URI.

When you cannot guarantee that the model is accessible in the local file system of the UDF, the utility functions described below help you automatically choose the fastest loading option.

Setting the MLflow Tracking URI

Whenever the UDF may need to access the MLflow server, it must set the MLflow Tracking URI. You can do this by:

  • Setting the environment variable MLFLOW_TRACKING_URI, or

  • Calling mlflow.set_tracking_uri() within the UDF implementation.

Depending on the environment your Exasol instance is running in, the MLflow Tracking URI might differ from the one you can use on your local machine. This applies in particular when running an Exasol DockerDB instance inside a virtual machine.
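If a UDF should honor a URI supplied via %env but fall back to a default otherwise, a small helper along the following lines could be called at the start of run(). This is a minimal sketch, not part of the Exasol MLflow Plugin: the name configure_tracking_uri and the default URL are hypothetical. It only touches the environment variable MLFLOW_TRACKING_URI, which MLflow consults when no URI has been set explicitly via mlflow.set_tracking_uri().

```python
import os

# Hypothetical default. From inside a DockerDB running in a VM, "localhost"
# may refer to the VM itself rather than to the host running the MLflow server.
DEFAULT_TRACKING_URI = "http://localhost:5000"


def configure_tracking_uri(default: str = DEFAULT_TRACKING_URI) -> str:
    """Hypothetical helper: keep MLFLOW_TRACKING_URI if it is already set
    (e.g. via %env in the UDF definition), otherwise set it to `default`.
    Returns the effective tracking URI."""
    uri = os.environ.get("MLFLOW_TRACKING_URI")
    if not uri:
        uri = default
        os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri
```

Note that an explicit mlflow.set_tracking_uri() call takes precedence over the environment variable, so a UDF should use one mechanism consistently.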

Creating the UDF

After building, deploying, and activating your Script Language Container (SLC), you can use Exasol SQL to define a UDF like this:

Sample UDF loading an MLflow model using the function local_path_or_uri() to read the model from the local file system if possible. The MLflow Tracking URI is passed via the environment variable MLFLOW_TRACKING_URI.
--/
CREATE OR REPLACE MLFLOW_SLC SCALAR SCRIPT
   "<SCHEMA>"."<UDF_NAME>"(uri VARCHAR(2000))
   RETURNS BOOL AS
%env MLFLOW_TRACKING_URI=http://localhost:5000;
import mlflow
from exasol.mlflow_plugin.artifacts.bucketfs_connector import (
    local_path_or_uri
)
def run(ctx):
    # Local BucketFS path if the model is mounted, otherwise the original URI
    locator = local_path_or_uri(ctx.uri)
    model = mlflow.sklearn.load_model(locator)
    #--
    #-- your implementation using the model goes here
    #--
    return True
/

Running the UDF

Now you can run the UDF via the following SQL statement:

SELECT "<SCHEMA>"."<UDF_NAME>"('exa+bfs://...');

Function local_path_or_uri()

The function checks whether:

  • The URI points to the BucketFS artifact store and

  • The associated path is mounted into the local file system of the UDF.

If both conditions are met, the function returns a path in the local file system that can be passed to one of the load functions of the MLflow API, e.g. mlflow.models.Model.load() or mlflow.sklearn.load_model().

Otherwise, the function returns the original URI unchanged, so the model is loaded via the MLflow server, which can be significantly slower.
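The decision described above can be illustrated with a small stand-alone sketch. It is not the plugin's code: the name local_path_or_uri_sketch, the mount point, and the way a URI path maps to a directory under the mount are all assumptions made for illustration; the real mapping is handled inside the plugin and may differ.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed BucketFS mount point inside the UDF; the actual path may differ.
BUCKETFS_MOUNT = "/buckets/bfsdefault/default"


def local_path_or_uri_sketch(uri: str, mount_root: str = BUCKETFS_MOUNT) -> str:
    """Hypothetical re-implementation of the two checks, for illustration."""
    parsed = urlparse(uri)
    # Condition 1: the URI points to the BucketFS artifact store.
    if parsed.scheme != "exa+bfs" or not parsed.path.strip("/"):
        return uri
    # Condition 2 (assumed mapping): the URI path exists below the mount root.
    candidate = Path(mount_root) / parsed.path.lstrip("/")
    if candidate.exists():
        return str(candidate)
    # Not mounted: hand the URI to MLflow, which loads via the tracking server.
    return uri
```

Either return value can be passed straight to a load function such as mlflow.sklearn.load_model(), which is what makes this pattern convenient inside a UDF.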

Function load_model_with_fallback()

Alternatively, you can use the function load_model_with_fallback(), which accepts the URI and the actual load function as arguments.
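The description here does not spell out exactly when load_model_with_fallback() falls back. One plausible behavior, sketched below purely as an assumption, is to resolve a local path first, try loading from it, and retry with the original URI if the local load fails. All names are hypothetical, and the resolver is passed in as a parameter so the sketch runs without MLflow or the plugin installed.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_fallback_sketch(
    uri: str,
    load: Callable[[str], T],
    resolve: Callable[[str], str],
) -> T:
    """Hypothetical fallback loader; not the plugin's implementation."""
    locator = resolve(uri)  # local path if available, else the URI itself
    try:
        return load(locator)
    except Exception:
        # Already using the original URI: nothing left to fall back to.
        if locator == uri:
            raise
        # Local copy could not be loaded: retry via the MLflow server.
        return load(uri)
```

In a UDF, load would be something like mlflow.sklearn.load_model and resolve the plugin's local_path_or_uri.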

Sample UDF loading an MLflow model via load_model_with_fallback(). The MLflow Tracking URI is set via mlflow.set_tracking_uri() within the implementation of the UDF.
--/
CREATE OR REPLACE MLFLOW_SLC SCALAR SCRIPT
   "<SCHEMA>"."<UDF_NAME>"(uri VARCHAR(2000))
   RETURNS BOOL AS
import mlflow
from exasol.mlflow_plugin.artifacts.bucketfs_connector import (
    load_model_with_fallback
)
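def run(ctx):
    # Needed whenever the model may have to be loaded via the MLflow server
    mlflow.set_tracking_uri("http://localhost:5000")
    model = load_model_with_fallback(ctx.uri, mlflow.sklearn.load_model)
    #--
    #-- your implementation using the model goes here
    #--
    return True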
/