Blog Post: GitLab Experiments

by Adam Knapp

Posted on March 25, 2024, for Machine Learning in Production (Carnegie Mellon University)

Outline

Here's a quick overview of what is covered:

- GitLab Background
- GitLab MLOps
- GitLab Experiments (Overview and Setup)

GitLab Background

GitLab, founded in 2014, quickly emerged as a frontrunner in the DevOps toolchain, offering a single application for the entire software development lifecycle. From project planning and source code management to CI/CD, monitoring, and security, GitLab has provided comprehensive solutions that streamline productivity and foster a collaborative environment for software development teams. Its ability to support both the development and operations sides of projects makes it particularly appealing for managing machine learning (ML) projects, which inherently require cross-disciplinary collaboration between software engineers and data scientists.

GitLab MLOps

MLOps, merging machine learning with operations, seeks to unify ML system development and deployment. The aim is to expedite the lifecycle of deploying machine learning models and ensure continuous improvement through feedback loops between model performance and development efforts. GitLab, with its robust DevOps toolchain including CI/CD, automated testing, and monitoring, has embraced MLOps, introducing specific tools to bridge the gap between data scientists and software developers.

Figure: Overview of an MLOps workflow (diagram source: https://polyaxon.com/)

Model exploration, often hindered by non-standardized Jupyter notebooks, poses a significant challenge to collaboration. To address this, tools like MLflow and Weights & Biases (W&B) were developed, enabling teams to track how models evolve and to share them, ensuring reproducibility across the organization.

GitLab's integration of MLflow through its Experiments and Model Registry features exemplifies its commitment to enhancing the MLOps workflow. Rather than requiring a separately hosted MLflow tracking server, GitLab itself serves as the tracking backend, so the standard MLflow client can point directly at a GitLab project and every team member gets access to model tracking without additional infrastructure. GitLab Experiments leverage MLflow for detailed logging and information storage about significant models, aiding internal sharing. Furthermore, the Model Registry supports model versioning for deployment, allowing the consolidation of experiments into a single, deployment-ready location. Leveraging GitLab's DevOps tools, model deployment can seamlessly integrate into existing pipelines, streamlining the process.
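To make the "GitLab as tracking backend" idea concrete, here is a minimal sketch of how an MLflow client can be pointed at a GitLab project. The host, project ID, and token value below are placeholders you would replace with your own; the tracking-URI path shape follows GitLab's MLflow-compatible API, and the actual `mlflow` calls are shown only as comments.

```python
import os

# Placeholders: substitute your own instance, numeric project ID, and token.
GITLAB_HOST = "https://gitlab.com"
PROJECT_ID = "12345678"

# GitLab exposes an MLflow-compatible endpoint under the project's API path.
tracking_uri = f"{GITLAB_HOST}/api/v4/projects/{PROJECT_ID}/ml/mlflow"

# The standard MLflow client reads these environment variables.
os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
os.environ["MLFLOW_TRACKING_TOKEN"] = "<your-personal-access-token>"

# From here, ordinary MLflow code logs straight into the GitLab project, e.g.:
# import mlflow
# with mlflow.start_run():
#     mlflow.log_param("learning_rate", 0.01)
```

Because the client is unchanged, existing MLflow scripts can be redirected to GitLab simply by setting these two environment variables.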

While Experiments and Model Registry serve similar purposes, this blog will primarily focus on the Experiments feature, highlighting its role in facilitating efficient, collaborative MLOps practices within GitLab.

GitLab Experiments

Overview

Experimentation is at the heart of machine learning. With GitLab Experiments, ML developers can log their models in GitLab with the information needed to share their findings and make each model run reproducible. It is commonly held that the code, data, and environment must all be tracked alongside a model to make it reproducible. We'll highlight these aspects in our example.

GitLab Experiments do not force you to log your model in any particular way, so reproducibility is not guaranteed automatically; in my example below, I will highlight some basic logs. Keep in mind that every team is different, and you will need to tailor how you use Experiments to yours.
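As a rough illustration of the code/data/environment idea, a team might gather metadata like the following before logging a run. The function name and the `data_version` parameter are hypothetical conveniences, not part of any GitLab or MLflow API; the `mlflow` call that would consume the result is shown only as a comment.

```python
import platform
import subprocess
import sys

def reproducibility_metadata(data_version: str) -> dict:
    """Collect the code/data/environment facts worth logging with a run."""
    try:
        # Code: the exact commit the run was produced from.
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except Exception:
        commit = "unknown"
    return {
        "git_commit": commit,                      # code
        "data_version": data_version,              # data (e.g. a dataset tag or hash)
        "python_version": sys.version.split()[0],  # environment
        "platform": platform.platform(),           # environment
    }

meta = reproducibility_metadata("v1.0")
# These values could then be attached to a run in GitLab, e.g.:
# mlflow.log_params(meta)
```

Logging even this small dictionary with every run makes it far easier for a teammate to reconstruct the conditions under which a model was trained.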

Setup

This feature is still in beta and has very limited documentation, so I will provide a step-by-step tutorial on how to implement it, helping you avoid the issues I ran into. To follow along, you should already have a GitLab account with at least one project.

Set Up GitLab API Key

Configuration