
An easy and secure way to deploy the mlflow tracking service


In this blog post we present an easy and secure way to deploy the mlflow tracking service in the cloud. As an MLOps tool, mlflow simplifies the process of developing, deploying and managing AI/ML models. We find it very useful and show how small development teams can set it up and share their experimental results with very little effort.



Written by Senior Data Scientist, Jacek Cala


 

1. Introduction

Whether you are trying to build and finetune your Deep Neural Network (DNN) or Machine Learning (ML) models, or run any kind of in-silico experiments, mlflow is a very useful tool to help you manage your workflow. Briefly, mlflow is a platform that simplifies the development, deployment and management of AI/ML models, and one of its key strengths is experiment tracking. Tracking allows you to keep a record of the inputs and outputs related to each experimental run. You can log model hyperparameters, for example, a dropout value in your DNN, and its performance metrics, for instance, accuracy. You can also store various artifacts such as configuration files with all the details of your run, as well as the output model itself. In addition, you do not need to remember which model worked best and under which conditions -- you can easily search for this using the mlflow web interface.
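
As a flavour of the tracking API, below is a minimal Python sketch that logs the dropout hyperparameter and accuracy metric mentioned above. The values and the run name are made up for illustration; without further configuration, mlflow records runs in a local ./mlruns directory rather than the shared setup described later in this post.

import mlflow

# Illustrative values only; by default runs are recorded in a local ./mlruns folder.
with mlflow.start_run(run_name='dropout-sweep'):
    mlflow.log_param('dropout', 0.2)      # a model hyperparameter
    mlflow.log_metric('accuracy', 0.93)   # a performance metric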

There is much more to mlflow than experiment tracking, but these other areas are out of the scope of this blog post. In this post we will focus on the tracking service and on one specific way to run it -- an easy and secure deployment which can help you collaborate with colleagues from your team.

The mlflow documentation portal describes three distinct ways in which the tracking service can be deployed:

  1. Localhost (default) -- for solo development
  2. Local Tracking with Local Database -- for solo development
  3. Remote Tracking with MLflow Tracking Server -- for team development

In this blog post we combine the ease of deployment of methods (1) and (2) with the ability to collaborate with colleagues that comes with (3). This can all be done in a secure setup useful for small teams, though perhaps best suited to those which do not store highly sensitive experimental data (see the limitations section for further details).

2. Architecture

The key idea behind the proposed solution is to run a local mlflow tracking server which can securely access a remote backend database and artifact store in the cloud. The architecture resembles option (2) of the Common Setups in the mlflow documentation, but with a little twist: the artifact and metadata stores are located in the cloud; see the figure below.

 


Fig. 1. Architecture diagram of the proposed mlflow tracking solution.

There are three main benefits of this approach (and not too many limitations):

  • Firstly, you can avoid developing a user management and authentication solution, which sometimes needs to be integrated with an SSO solution in your company.
  • Secondly, once you set up the security credentials, you can forget about the problem of logging in/out of your tracking server.
  • Thirdly, because both the database and the artifact store are managed cloud services that mlflow supports out of the box, the setup is straightforward to run and remains secure (for example, connections to the database use SSL by default).

3. Configuration

To configure the experiment tracking system you will need to follow these three steps:

  1. Create a database engine in the cloud.
  2. Create a file/blob store in the cloud.
  3. Configure your local mlflow environment to access the systems created in steps (1) and (2).

For the sake of simplicity, we do (1) and (2) in one selected cloud platform, namely Azure, but you should be able to use any cloud provider, such as AWS or GCP, that offers secure database deployment and a secure file store supported by mlflow.

In the remainder of this post, we assume that you have access to an Azure subscription via the portal and that you are authorised to create the relevant services.

3.1 Create a database engine and database

In the Azure portal we are going to create an Azure Database for PostgreSQL Flexible Server.

Of all the options, you need to configure at least the following:

  1. Resource group: pick or create a group in which the server will be placed.
  2. Server name: whatever you prefer, e.g. mlflowdb.
  3. Region: whatever you prefer, e.g. select one which is close to your location.
  4. Workload type: start with Development as you can easily adjust it later once you or your colleagues hit performance issues.
  5. Authentication method: leave PostgreSQL authentication only.
  6. Admin username: whatever you prefer, e.g. mlflow_db_admin.
  7. Password: pick a long, secure password and store it for later. It will be used by mlflow internally, so you will not need to remember or type it manually. However, you may want to use only letters and numbers to avoid issues with password encoding later.

It is important to note that Azure Database for PostgreSQL provides a postgres engine instance which by default uses SSL to secure connections to the database. However, you may want to further restrict access to your database engine, e.g. by creating a non-admin database account or adding firewall rules in the Networking section. These steps are not mandatory but can improve security. For example, if you know which public IP addresses you are going to connect from, you can add them to the firewall to strengthen the protection of your database. As a rule of thumb, we typically add our office static IP range and a few IP addresses used when working from home. Another security measure you may wish to take is changing the password regularly.

The above is the minimal setup needed to create the database service in Azure; depending on your exact requirements, you may want to adjust other configuration options as well.

Once you are done, you can hit Review + create and then the Create button.

Now, wait for the database engine to become ready, navigate to Settings and select Databases from the side panel. We need to create a database dedicated to storing experiment tracking information (cf. the metadata store on the architecture diagram). Add a new database named, for example, mlflow_tracking.
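
Before moving on, it is worth checking that the new database is reachable from your machine. Below is a minimal Python sketch using the psycopg2 driver (which mlflow will also need later, see Section 3.3) and the example names from this section; substitute your own server name, admin user and password.

import psycopg2

# Example values from this section; replace them with your own.
# Azure Database for PostgreSQL uses SSL by default, hence sslmode='require'.
conn = psycopg2.connect(
    host='mlflowdb.postgres.database.azure.com',
    user='mlflow_db_admin',
    password='<YOUR DATABASE PASSWORD>',
    dbname='mlflow_tracking',
    sslmode='require',
)
with conn.cursor() as cur:
    cur.execute('SELECT version();')
    print(cur.fetchone()[0])   # prints the PostgreSQL server version if the connection works
conn.close()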

3.2 Create a blob storage container

The next step is to create a blob storage container. For this we first need to create a Storage account. If you prefer, you may reuse an existing storage account, as mlflow will store artifacts in a dedicated container anyway. But, for the sake of service isolation, it is best to create a new storage account in the dedicated resource group used in Section 3.1, so you can more easily manage both the blob storage and the database engine in the future.

Of all the options, you need to configure at least:

  • Resource group: pick the same one as in step 1 of Section 3.1, unless you have a good reason not to.
  • Storage account name: whatever you prefer and is allowed by Azure, e.g. mlflowblobstore.
  • Region: pick the same one as in step 3 of Section 3.1.
  • Redundancy: you can downgrade to Locally-redundant storage (LRS) to save on costs. Or you can keep it at a higher redundancy, but remember to increase the redundancy of the database engine as well.

Again, that is the minimal setup you need to apply and, depending on your exact requirements, you may want to set other options, too. Note that if you applied network access restrictions to the database engine, you should apply a similar configuration here once the blob storage account is created.

For now, you can hit Create.

Once the storage account is created, we need to create a container for our experiment artifacts. Navigate to the Data storage section and pick Containers. Create a new container named, e.g., mlflow-artifacts.

3.3 Connect the dots

As soon as the two services are ready and running, you can prepare your local configuration for mlflow to use. These steps need to be done by anyone in your team who would like to use the same mlflow tracking deployment.

Prepare credentials

Firstly, you need to prepare credentials for the services created in the cloud. As all team members will need them, we suggest storing them in a file which you can easily distribute. For example, you can create a file named .env with the following contents:

MLFLOW_DB_HOSTNAME=<YOUR DATABASE SERVER NAME>
MLFLOW_DB_USERNAME=<YOUR DATABASE ADMIN NAME>
MLFLOW_DB_PASSWORD=<YOUR DATABASE PASSWORD>
MLFLOW_DB_DBNAME=<YOUR DATABASE NAME>

MLFLOW_ARTIFACT_STORE_URI=<YOUR CONTAINER URI, e.g. wasbs://mlflow-artifacts@mlflowblobstore.blob.core.windows.net/>

AZURE_STORAGE_CONNECTION_STRING=<CONNECTION STRING>
AZURE_STORAGE_ACCESS_KEY=<ACCESS KEY>

If you followed the suggested settings, your .env file would look similar to this:

MLFLOW_DB_HOSTNAME=mlflowdb.postgres.database.azure.com
MLFLOW_DB_USERNAME=mlflow_db_admin
MLFLOW_DB_PASSWORD=***
MLFLOW_DB_DBNAME=mlflow_tracking

MLFLOW_ARTIFACT_STORE_URI=wasbs://mlflow-artifacts@mlflowblobstore.blob.core.windows.net/

AZURE_STORAGE_ACCESS_KEY=***
AZURE_STORAGE_CONNECTION_STRING=***

Remember to replace the fields marked *** with values from your own setup. Use the MLFLOW_DB_PASSWORD you set in Section 3.1, whilst AZURE_STORAGE_ACCESS_KEY and AZURE_STORAGE_CONNECTION_STRING can be copied from the Azure portal by navigating to Security + networking and then Access keys within your storage account; simply pick one of the two keys listed there.

Of course, the .env file holds all your security credentials to the cloud storage you created, so you should never commit it to your source version control. If you use git, it is best to add the .env to your .gitignore file straightaway.
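
With the credentials in place, you can optionally verify access to the artifact container before starting the server. Below is a sketch using the azure-storage-blob library (listed among the dependencies later in this section); the small .env parser is only for illustration.

from azure.storage.blob import BlobServiceClient

# Minimal .env parser for illustration only; skips blank lines and comments.
env = {}
with open('.env') as f:
    for line in f:
        if '=' in line and not line.lstrip().startswith('#'):
            key, _, value = line.strip().partition('=')
            env[key] = value

client = BlobServiceClient.from_connection_string(env['AZURE_STORAGE_CONNECTION_STRING'])
container = client.get_container_client('mlflow-artifacts')
props = container.get_container_properties()   # raises an error if the container is missing
print(f"Container '{container.container_name}' is reachable (last modified {props.last_modified}).")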

Configure your mlflow server instance

The final preparation step is to run the mlflow server. As the architecture diagram shows, every mlflow user will need to start their own server instance. Thus, it is good to have a script to do it:

  • using bash on Linux/Mac:
#!/bin/bash

# Load the credentials prepared in the .env file.
source .env

# Start a local tracking server backed by the remote database and artifact store.
mlflow server --backend-store-uri "postgresql://$MLFLOW_DB_USERNAME:$MLFLOW_DB_PASSWORD@$MLFLOW_DB_HOSTNAME/$MLFLOW_DB_DBNAME" --artifacts-destination "$MLFLOW_ARTIFACT_STORE_URI"
  • or using powershell on Windows:
# Load the credentials from .env into environment variables, skipping blank lines and comments.
Get-Content .env | ForEach-Object {
  $name, $value = $_.split('=', 2)
  if (![string]::IsNullOrWhiteSpace($name) -and !$name.StartsWith('#')) {
    Set-Item -Path env:$name -Value $value
  }
}

# Start a local tracking server backed by the remote database and artifact store.
mlflow server --backend-store-uri "postgresql://$env:MLFLOW_DB_USERNAME`:$env:MLFLOW_DB_PASSWORD@$env:MLFLOW_DB_HOSTNAME/$env:MLFLOW_DB_DBNAME" --artifacts-destination "$env:MLFLOW_ARTIFACT_STORE_URI"

Note, however, that before the mlflow server can start, you will also need to install extra dependencies which allow it to communicate with the database and artifact store. For the configuration with PostgreSQL and Azure blob storage, you will need three entries in your requirements.txt file: mlflow itself and two additional libraries.

mlflow
psycopg2
azure-storage-blob

Libraries for other database engines (like MySQL, Oracle and Microsoft SQL Server) and storage services are listed on the SQLAlchemy backends and mlflow artifacts stores pages.

Enjoy!

Once the server is started, you can navigate to http://localhost:5000 in your browser and check if it is available.

From now on, you can use this mlflow server in code by setting the tracking server URI to point to your local instance. Either set the MLFLOW_TRACKING_URI environment variable to http://localhost:5000 or do it directly in Python using the mlflow.set_tracking_uri function.

An example Python code which creates a new experimental run looks as follows:

import mlflow

mlflow.set_tracking_uri(uri="http://127.0.0.1:5000")

mlflow.set_experiment('test-experiment')

# Create an empty test file to store it as an artifact.
open('test-file.txt', 'a').close()

with mlflow.start_run(run_name='test-run'):
    mlflow.log_param('test parameter', 1)
    mlflow.log_artifact('test-file.txt')
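
To confirm that the run was recorded in the shared backend, you can also query the tracking server from Python rather than via the web UI. Below is a sketch using mlflow.search_runs (assuming a recent mlflow version that supports the experiment_names argument) with the experiment name from the example above; the artifact_uri column should point at your wasbs:// container.

import mlflow

mlflow.set_tracking_uri(uri='http://127.0.0.1:5000')

# Returns a pandas DataFrame with one row per run in the experiment.
runs = mlflow.search_runs(experiment_names=['test-experiment'])
print(runs[['run_id', 'status', 'params.test parameter', 'artifact_uri']])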

4. Limitations

Clearly, the presented method is not without its limitations. The two main ones are:

  • the lack of an authorisation mechanism, and
  • additional configuration steps required to use the tracking system.

The former means that all users have full access to the service: they can see all experiments and runs, add new ones, and even delete any stored experiments. The latter means that anyone who wants to use the shared mlflow tracking service will need to follow the steps in Section 3.3, which may not always be desirable.

If either of these feels like a blocker and you still need to collaborate within a team, you should follow the Remote Experiment Tracking tutorial in the mlflow documentation instead.

Another slight disadvantage of the proposed deployment method is that you will not be able to easily view the progress of your experiments on your phone (this stems directly from the need for the additional configuration steps described in Section 3.3). Although the mlflow UI requires a large screen to fully appreciate all the tracking functions, plots and tables, the ability to check on a phone whether your recent runs have finished is a nice-to-have that is not available in the proposed setup.

5. Conclusions

In this post we presented a simple and secure way to deploy the mlflow tracking service. It combines ease of deployment with the ability to share access to the same tracking system with colleagues in your team. Although it is not the most flexible and secure configuration possible, we think the proposed option is well suited to many scenarios. This is especially true if you do not want to manage user authentication or authorisation, and do not need to access the tracking UI from a restricted device like a smartphone.


Dr Jacek Cala is a Senior Data Scientist at the National Innovation Centre for Data. His technical background is in scalable computing, with broad experience in workflows, application deployment, recomputation and programming. His PhD focused on adaptive deployment of component-based applications in distributed systems.


 

To find out more about working with us, get in touch.
We'd love to hear from you.