Data Deep Dive: Serving machine learning models using AWS Lambda


Introducing the latest instalment in our Data Deep Dive series, Serving machine learning models using AWS Lambda, written by NICD Data Scientist Fergus McClean


As data scientists, we like building models. But models are of no use to anyone just sitting on our laptops. To serve up some insightful predictions to our eager audience, we need to make models accessible to the outside world. Of course, there are many ways to do this. We could wrap our model inside a web application with a user interface and run it on a web server, either one we host ourselves or one that sits in the cloud. This sounds nice, but what happens if you have multiple applications that all need to use the same model? And what if you want to update the model without fiddling with your web application code? The solution to these issues is to serve the model independently of your web application using something called a REST API (Representational State Transfer Application Programming Interface). A REST API has no user interface as it's designed to let computer programs (not humans) speak to each other. In this case it will be our model and our web application doing the talking. The advantages of using a REST API over simply embedding your model inside your application are that:

- You can update your model without updating your application code
- You can access the same model from different applications without having to embed it multiple times
- Testing your deployed model is easier if it is independent of a larger web application

So we've decided we want to serve our model using a REST API. There are various tools available that can do this. In Python, there are Flask, Django and FastAPI, while in R, Plumber seems to be the primary option. These tools will give you the ability to easily create a REST API and serve up predictions from your model. But what happens if your model requires a high performance CPU or GPU to make these predictions? Running an expensive server all the time, regardless of how many requests are made, does not sound ideal. Likewise, what happens if your model becomes a supermodel and everyone wants to make requests? Can a single server still support this? For scalability, let's look at some cloud-based solutions.
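To give a flavour of what these tools look like, here is a minimal Flask sketch. The endpoint name, toy model and file name are illustrative, not from a real deployment; in practice you would load your own trained model.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Train and pickle a toy two-feature model so the sketch is self-contained;
# in practice you would load a model trained elsewhere
model = RandomForestClassifier(n_estimators=5, random_state=0)
model.fit([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 0, 1])
pickle.dump(model, open("model.pkl", "wb"))

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Unpickle the model and predict from the posted list of feature values
    loaded_model = pickle.load(open("model.pkl", "rb"))
    features = request.get_json()["features"]
    prediction = loaded_model.predict([features])
    return jsonify({"prediction": int(prediction[0])})
```

Starting this with app.run() gives you a /predict endpoint that accepts a JSON body like {"features": [0, 1]} and returns the predicted class.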

You could simply deploy your REST API into a scalable service like AWS Elastic Beanstalk, Azure App Service or Google App Engine. However, the billing model of these services is based on virtual machine usage rather than number of API requests. Therefore, it's quite tricky to match the amount of compute you are paying for to the number of requests being made. What if you could be charged based on the number of requests being made, rather than the number of computers you are using? Well, you can, and this growing area of cloud computing is known as serverless. This is a perfect solution for our model serving problem and there are plenty of options available, including Azure Functions, GCP Cloud Functions and AWS Lambda. In this article, we will focus on Lambda.

There is one more option which we will not be covering here. Increasingly, machine learning platforms are offering ways to serve your model with minimal effort as part of their easy-to-use ecosystem. Examples include Databricks, Azure Machine Learning, GCP Vertex AI and AWS Sagemaker. These tools provide great solutions but you do have to pay a premium for the privilege. Depending on the number of data scientists at your organisation and the trade-off between cost and ease of use, these platforms may provide a better solution than setting up your own serverless function. With that caveat in mind, let's look at Lambda.

AWS Lambda "lets you run code without provisioning or managing servers". Code can be packaged either as a zip archive or as a Docker image; we will use a Docker image, which must be based on a Lambda base image. Lambda provides images for Node.js, Python, Java, .NET, Go and Ruby. We will be using Python, which works out of the box. If you are using R, life might be a bit trickier but it's still possible using the provided base image for custom runtimes. The AWS Lambda free tier includes one million free requests per month and 400,000 GB-seconds of compute time per month.

Building a model

The first step is to train your model. We will use the popular Wine Quality dataset and scikit-learn to train a Random Forest Classifier to identify the quality of wines based on their characteristics. The Wine Quality dataset contains two files related to red and white variants of the Portuguese "Vinho Verde" wine. For more information, you can have a look at the paper by Cortez et al. (2009). Scikit-learn is the most popular machine learning library in Python. There are plenty of guides available to get started if you have not come across the package before.

The code below reads the two files and combines them, including a flag for is_red. This combined DataFrame is then cleaned by removing spaces from column names and used to train the model. We are not interested in evaluating the model here, so there is no need for a train/test split. Importantly, the final step is to dump our model to a pickle file. This could also be logged to a model tracking service such as MLFlow, but we are not going to include that functionality here.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import pickle

# Read the two files from the UCI Machine Learning Repository
# (both are semicolon-separated)
white_wine = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv",
    sep=";",
)
red_wine = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
    sep=";",
)
# Add a new column to describe the colour of the wine
red_wine["is_red"] = 1
white_wine["is_red"] = 0
# Add a new column to describe the colour of the wine
red_wine["is_red"] = 1
white_wine["is_red"] = 0

# Concatenate the two datasets
data = pd.concat([red_wine, white_wine], axis=0)

# Remove spaces from column names
data.rename(columns=lambda x: x.replace(" ", "_"), inplace=True)

# Extract the feature and target columns
X = data.drop(["quality"], axis=1)
y = data.quality

# Train a random forest classifier
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)

# Export the model to a pickle file
pickle.dump(model, open("model.pkl", "wb"))

Writing the handler function

The next step is to write a function which will handle requests to our model and return predictions. Put the following code inside a file called lambda_function.py. You can actually call the file and the function whatever you like, as long as you specify them in the CMD of the Dockerfile, which we will get to next.

import json
import sklearn
import pickle

def lambda_handler(event, context):

    # Read in the model pickled earlier
    loaded_model = pickle.load(open("model.pkl", "rb"))

    # Extract the body of the request
    data = json.loads(event['body'])

    # Create a list of the features, in the order the model was trained on
    features = [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "free_sulfur_dioxide",
        "total_sulfur_dioxide",
        "density",
        "pH",
        "sulphates",
        "alcohol",
        "is_red",
    ]

    # Use the list of features to extract the features from the body of the request
    # in the correct order. You could alternatively design your function to accept 
    # a list instead of a dictionary. This function is also designed to only make
    # a single prediction at a time, but you may wish to accept multiple.
    prediction = loaded_model.predict(
        [[data["values"][feature] for feature in features]]
    )

    # Return a status code and the prediction in the body of the response
    return {
        "statusCode": 200,
        "body": json.dumps({"quality": int(prediction[0])}),
    }

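Since the handler is plain Python, you can sanity-check the expected event shape locally before building any images. The sketch below uses a toy two-feature model and hypothetical feature names in place of the real model, and mimics how API Gateway wraps the request body as a JSON string:

```python
import json
import pickle

from sklearn.ensemble import RandomForestClassifier

# Stand-in for model.pkl: a toy two-feature model so the check is self-contained
toy_model = RandomForestClassifier(n_estimators=5, random_state=0)
toy_model.fit([[0, 0], [1, 1]], [3, 7])
pickle.dump(toy_model, open("toy_model.pkl", "wb"))

def toy_handler(event, context):
    # Same structure as the handler above, with the toy model and feature names
    loaded_model = pickle.load(open("toy_model.pkl", "rb"))
    data = json.loads(event["body"])
    features = ["alcohol", "is_red"]  # illustrative names only
    prediction = loaded_model.predict(
        [[data["values"][feature] for feature in features]]
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"quality": int(prediction[0])}),
    }

# API Gateway delivers the request body as a JSON *string* under the "body" key
event = {"body": json.dumps({"values": {"alcohol": 1, "is_red": 1}})}
response = toy_handler(event, None)
print(response)
```

The key detail is that event["body"] is a string, not a dictionary, which is why the handler calls json.loads on it.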

Building a Docker image

Create a Dockerfile using the Lambda Python base image and a combination of the pickled model and the handler function. If you are not already familiar with Docker, the documentation is a good place to get started. The only additional library we need to install is scikit-learn.


FROM public.ecr.aws/lambda/python:3.9

# Install dependencies
RUN pip3 install scikit-learn --target "${LAMBDA_TASK_ROOT}"

# Copy function code
COPY lambda_function.py ${LAMBDA_TASK_ROOT}

# Copy the model
COPY model.pkl ${LAMBDA_TASK_ROOT}
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "lambda_function.lambda_handler" ]

Then build and tag an image using the Dockerfile.

docker build -t wine-quality .


Creating a repo in ECR

The next step is to create a repo on AWS Elastic Container Registry (ECR) and push the image there. You will first need to log in to ECR from the terminal. Where you see 749926501328, this will need to be replaced with your account ID. The region I am using is eu-west-2; if yours is different you will also need to change this. It is also worth noting that all AWS configuration can also be done using the console (AWS web interface). To be able to use the AWS CLI, you will need to configure it with your credentials if you have not already done so. There are instructions on how to do that in the AWS CLI documentation.

aws ecr get-login-password --region eu-west-2 | \
    docker login \
        --username AWS \
        --password-stdin 749926501328.dkr.ecr.eu-west-2.amazonaws.com
Then, create a private container repository. I am calling mine wine-quality but you can give yours a different name if you like.

aws ecr create-repository --repository-name wine-quality --region eu-west-2

Push the image to this repository. Before pushing, the image needs to be tagged with the ECR repository that we created above.

docker tag wine-quality 749926501328.dkr.ecr.eu-west-2.amazonaws.com/wine-quality

docker push 749926501328.dkr.ecr.eu-west-2.amazonaws.com/wine-quality

The pricing of ECR is $0.10 per GB-month for storage, and as long as your Lambda function and ECR repo are in the same region, data transfer is free.

Creating the Lambda function

Once the image exists in your ECR repo, you can use it to create a Lambda function. Lambda functions require execution roles. These roles give functions the permissions they require when interacting with other AWS services. All execution roles need basic permissions to write logs, which can be granted by adding the AWSLambdaBasicExecutionRole policy. We don't need any other permissions for our function, but you may want to add more policies to yours if you want to, for example, interact with cloud storage. For more information about AWS IAM roles and policies, see the documentation.

aws iam create-role \
    --role-name lambda-ex \
    --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'

aws iam attach-role-policy \
    --role-name lambda-ex \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

Once you have created a role and attached the appropriate policy, you can then create a function using that role. We are using the package type Image to indicate that we are using a Docker image instead of a zip file containing code. The code argument points to the URI of our image in ECR. I called my function wine-quality but you can give yours a different name. The default timeout is 3 seconds, which is not enough time to pull and run our image, so I have set it to 60 seconds.

aws lambda create-function \
    --package-type Image \
    --code ImageUri=749926501328.dkr.ecr.eu-west-2.amazonaws.com/wine-quality:latest \
    --function-name wine-quality \
    --region eu-west-2 \
    --role arn:aws:iam::749926501328:role/lambda-ex \
    --timeout 60

Calling the Lambda function to get predictions

Once you have successfully created the Lambda function, you will probably want to use it to get predictions based on new, unseen data. There are a number of ways to do this. Lambda provides the ability to add a range of triggers which call the function when events happen. These events can be things like files being uploaded, URLs being requested, logs matching a pattern or messages being added to a queue. There are also a range of integrations with non-AWS services using EventBridge. Depending on your trigger, the event JSON that your function receives will be different. I have designed the function here to be callable from a URL - it looks for a body key in the JSON, which is what API Gateway provides when the function is triggered from a URL. We are not going to cover triggers here. Instead, you can call your function directly using the Python SDK for AWS, boto3.

import boto3
import json

# Instantiate a client object
client = boto3.client('lambda')

# Use the client to invoke your function with a dictionary of features.
# The body is a JSON string, mirroring how API Gateway delivers requests.
response = client.invoke(
    FunctionName='wine-quality',
    Payload=json.dumps({
        'body': json.dumps({
            'values': {
                'fixed_acidity': 7.0, 
                'volatile_acidity': 0.27, 
                'citric_acid': 0.36,
                'residual_sugar': 20.7, 
                'chlorides': 0.045, 
                'free_sulfur_dioxide': 45.0, 
                'total_sulfur_dioxide': 170.0, 
                'density': 1.001, 
                'pH': 3.0, 
                'sulphates': 0.45, 
                'alcohol': 8.8, 
                'is_red': False
            }
        })
    })
)

# Read the response payload
print(json.load(response['Payload']))


If everything works, this will return a response like the one below. You can have a play around by changing the values for each of the features to see how this affects the predictions.

{
    "statusCode": 200, 
    "body": "{\"quality\": 6}"
}

Here, we included the model pickle file directly within the image; however, this may not be the best solution for your use case. If you have a large model or want to be able to update it frequently, it is probably a better idea to store the model pickle file somewhere like S3 and access it from within the function. This means that instead of having to rebuild the image every time your model changes, you can simply replace the file in S3. It may also be helpful to implement a versioning system or use a model tracking service such as MLFlow, as mentioned earlier.


We have trained a Random Forest classifier to identify wine quality based on a range of characteristics. This model was then published to AWS Lambda using a Docker image, and we made requests to it using boto3. Publishing your model as a self-contained microservice, as we have just done, is a good idea as it means that multiple applications can use it and it can be tested and maintained as an isolated piece of software. You may find it easier to use a more managed platform, such as Databricks or Sagemaker, to publish your models, but the price of using these services will be higher. Writing your own code and using Lambda also gives you a lot of flexibility, although there is the obvious trade-off that writing and maintaining code does require effort. There are a range of alternative cloud function services from other providers, listed above, but I am personally a big fan of Lambda and would strongly recommend it to anyone, especially if you are already using AWS. And if you have not tried out cloud functions as part of your workflow already, what are you waiting for?

Dr Fergus McClean is a Data Scientist at the National Innovation Centre for Data specialising in, among other things, data engineering and visualisation. He has a background in environmental modelling and his PhD investigated the impact of using global datasets for flood inundation modelling and involved designing a cloud-based framework for running simulations.

To find out more about working with us, get in touch.
We'd love to hear from you.