├── .gitignore
├── Dockerfile
├── README.md
├── archtiecture.jpg
├── docker-compose.yaml
├── hello-mlflow.py
└── mlflow-google-auth.gif


/.gitignore:
--------------------------------------------------------------------------------
.venv

postgres-data

outputs

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.10-slim-bullseye
RUN pip install --no-cache mlflow==2.10.0
RUN pip install --no-cache psycopg2-binary boto3

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# mlflow-oauth-sidecar
How to authenticate in MLflow using an external IdP such as Google or GitHub

## Background
There is an [official guide](https://mlflow.org/docs/latest/auth/index.html) for basic HTTP authentication (username and password); however, authenticating through external providers (Google, GitHub, and others) to validate accounts is not yet supported in MLflow. Databricks Managed MLflow might offer that functionality, but the OSS version does not.

## User Story
- As a: data scientist involved in ML model development and experiment tracking through MLflow
- I want to: authenticate to MLflow with an external IdP
- So that: I can sign in with my personal account and simplify the login process in MLflow

## Architecture
oauth2-proxy runs as a sidecar for the MLflow workload and provides authentication through providers that validate accounts by email, domain, or group.

![architecture](./archtiecture.jpg)

## Implementation

In this [complete example](./docker-compose.yaml), docker-compose sets up the essential MLflow components in your environment.
Additionally, it spawns an oauth2-proxy to facilitate authentication with your preferred external Identity Provider (IdP).

The example only demonstrates authentication with Google, but you can choose any other external IdP. For additional details, please refer to the [oauth2-proxy documentation](https://oauth2-proxy.github.io/oauth2-proxy/configuration/oauth_provider).

Before moving on, [generate a cookie secret](https://oauth2-proxy.github.io/oauth2-proxy/configuration/overview#generating-a-cookie-secret) and fill in the corresponding [OAuth client ID and secret](https://oauth2-proxy.github.io/oauth2-proxy/configuration/oauth_provider#google-auth-provider) as environment variables.

```bash
export OAUTH2_PROXY_CLIENT_ID=
export OAUTH2_PROXY_CLIENT_SECRET=
export OAUTH2_PROXY_COOKIE_SECRET=

docker-compose up -d
```

It might take a couple of minutes to start all the containers. If you get any error messages, check the logs of the MLflow and oauth2-proxy containers.

Once everything is up, open [127.0.0.1:3000](http://127.0.0.1:3000) to log in to the MLflow UI with your email. You can also log runs to the tracking server.

Your Bearer token can be retrieved from the browser (Chrome: Developer Tools > Network tab > Response Headers).
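
A token that starts with `ey` is most likely a JWT (with Google as the provider, an OIDC ID token), so its claims can be inspected locally before it is used. The following is a minimal sketch under that assumption. `decode_jwt_claims` and `seconds_until_expiry` are hypothetical helper names, not part of MLflow or oauth2-proxy, and the signature is deliberately not verified, so treat this as a debugging aid only.

```python
import base64
import json
import os
import time


def decode_jwt_claims(token):
    """Return the unverified claims from the payload segment of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded without padding; restore it first.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token, now=None):
    """Return the remaining lifetime in seconds, based on the `exp` claim."""
    claims = decode_jwt_claims(token)
    current = time.time() if now is None else now
    return claims["exp"] - current


if __name__ == "__main__":
    # Assumption: the token copied from the browser is exported in this variable.
    token = os.environ.get("MLFLOW_TRACKING_TOKEN")
    if token:
        print(f"token expires in {seconds_until_expiry(token):.0f}s")
    else:
        print("MLFLOW_TRACKING_TOKEN is not set")
```

A negative remaining lifetime means the copied token has already expired, and a fresh one is needed before exporting it as `MLFLOW_TRACKING_TOKEN`.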

```bash
export MLFLOW_TRACKING_URI=http://127.0.0.1:3000 # OAuth2-proxy
export MLFLOW_TRACKING_TOKEN="ey..7w" # Bearer token

python hello-mlflow.py
```

## DEMO
![mlflow-oauth-proxy-demo](./mlflow-google-auth.gif)

--------------------------------------------------------------------------------
/archtiecture.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cloudacode/mlflow-oauth-sidecar/4a96b3fcfc0392103ec74f90be12c074da8aaebc/archtiecture.jpg
--------------------------------------------------------------------------------
/docker-compose.yaml:
--------------------------------------------------------------------------------
# piggyback on
# https://mlflow.org/docs/latest/tracking/tutorials/remote-server.html
version: '3.7'
services:
  # PostgreSQL database
  postgres:
    image: postgres:latest
    container_name: postgres
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: mlflowdb
    volumes:
      - ./postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-q", "-U", "user", "-d", "mlflowdb"]
      interval: 5s
      timeout: 5s
      retries: 10

  # MinIO server
  minio:
    image: minio/minio
    container_name: minio
    expose:
      - "9000"
    ports:
      - "9000:9000"
      # MinIO Console is available at http://localhost:9001
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: minio_user
      MINIO_ROOT_PASSWORD: minio_password
    healthcheck:
      test: timeout 5s bash -c ':> /dev/tcp/127.0.0.1/9000' || exit 1
      interval: 5s
      timeout: 10s
      retries: 5
    command: server /data --console-address ":9001"
  minio-create-bucket:
    image: minio/mc
    depends_on:
      minio:
        condition: service_healthy
    entrypoint: >
      bash -c "
      mc alias set minio
      http://minio:9000 minio_user minio_password &&
      if ! mc ls minio | grep --quiet bucket; then
        mc mb minio/bucket
      else
        echo 'bucket already exists'
      fi
      "

  # MLflow server
  mlflow:
    depends_on:
      minio:
        condition: service_healthy
      postgres:
        condition: service_healthy
    image: cloudacode/mlflow:2.10.0
    container_name: mlflow
    expose:
      - "5000"
    ports:
      - "5000:5000"
    environment:
      MLFLOW_BACKEND_STORE_URI: postgresql://user:password@postgres:5432/mlflowdb
      MLFLOW_ARTIFACTS_DESTINATION: s3://bucket
      MLFLOW_S3_ENDPOINT_URL: "http://minio:9000"
      MLFLOW_S3_IGNORE_TLS: "true"
      AWS_ACCESS_KEY_ID: "minio_user"
      AWS_SECRET_ACCESS_KEY: "minio_password"
    command: mlflow server --host 0.0.0.0 --port 5000

  # OAuth2-proxy
  oauth2-proxy:
    depends_on:
      - mlflow
    image: quay.io/oauth2-proxy/oauth2-proxy:v7.1.3
    container_name: oauth2-proxy
    expose:
      - "3000"
    ports:
      - "3000:3000"
    environment:
      OAUTH2_PROXY_PROVIDER: google
      OAUTH2_PROXY_OIDC_ISSUER_URL: https://accounts.google.com
      OAUTH2_PROXY_EMAIL_DOMAINS: "*"
      OAUTH2_PROXY_CLIENT_ID: $OAUTH2_PROXY_CLIENT_ID
      OAUTH2_PROXY_CLIENT_SECRET: $OAUTH2_PROXY_CLIENT_SECRET
      OAUTH2_PROXY_COOKIE_SECRET: $OAUTH2_PROXY_COOKIE_SECRET
      OAUTH2_PROXY_COOKIE_EXPIRE: 3h
      OAUTH2_PROXY_COOKIE_REFRESH: 1h
      OAUTH2_PROXY_UPSTREAMS: http://mlflow:5000
      OAUTH2_PROXY_HTTP_ADDRESS: 0.0.0.0:3000
      OAUTH2_PROXY_REDIRECT_URL: http://127.0.0.1:3000/oauth2/callback
      OAUTH2_PROXY_COOKIE_SECURE: "false"
      OAUTH2_PROXY_SKIP_JWT_BEARER_TOKENS: "true"
      OAUTH2_PROXY_PASS_AUTHORIZATION_HEADER: "true"
      OAUTH2_PROXY_PASS_ACCESS_TOKEN: "true"
      OAUTH2_PROXY_PASS_USER_HEADERS: "true"
      OAUTH2_PROXY_SET_XAUTHREQUEST: "true"
      OAUTH2_PROXY_SET_AUTHORIZATION_HEADER: "true"
      OAUTH2_PROXY_SKIP_PROVIDER_BUTTON: "true"

--------------------------------------------------------------------------------
/hello-mlflow.py:
--------------------------------------------------------------------------------
# https://mlflow.org/docs/latest/tracking/tutorials/remote-server.html

import mlflow

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train models.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

--------------------------------------------------------------------------------
/mlflow-google-auth.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cloudacode/mlflow-oauth-sidecar/4a96b3fcfc0392103ec74f90be12c074da8aaebc/mlflow-google-auth.gif
--------------------------------------------------------------------------------