You've trained a model. Accuracy is great. You show it to stakeholders and everyone is excited. Then you try to deploy it and everything falls apart. The model works on your laptop but breaks in the cloud. Data drifts and accuracy silently degrades. There's no way to roll back to the previous version. Retraining takes a weekend of manual effort. This is the reality of ML without MLOps.
MLOps (Machine Learning Operations) is the set of practices, tools, and culture that bridges the gap between model development and reliable production deployment. In 2025, any serious ML team — whether at a Kathmandu startup or a global tech company — needs MLOps discipline. This guide walks you through the complete modern MLOps stack, with real code you can use today.
The MLOps Lifecycle
MLOps is not a one-time setup — it's a continuous cycle. Unlike traditional software, ML systems have an extra challenge: they degrade over time as the world changes (data drift), and improving them requires retraining on new data.
Common problems in ML systems run without MLOps:
- Works on my machine: Model trained on Python 3.9 breaks on the server running Python 3.11
- Silent accuracy degradation: Input data distribution shifts; no one notices until customers complain
- No experiment tracking: You don't remember which hyperparameters gave the best result last month
- Manual deployments: Every update requires SSHing into servers and running scripts by hand
- No rollback: If the new model breaks something, there's no quick way to revert
- Data versioning chaos: Different team members train on different data slices
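Silent accuracy degradation is easier to catch when input distributions are compared against a reference sample on a schedule. The sketch below computes a Population Stability Index (PSI) for one numeric feature using only the standard library. The function name, the bucket count, and the 0.25 alert threshold are illustrative assumptions (0.25 is a common rule of thumb, not a standard), so treat it as a starting point rather than a complete drift monitor.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def bucket_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge buckets.
            idx = max(0, min(bins - 1, int((v - lo) / width)))
            counts[idx] += 1
        # A small floor avoids log(0) when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_fractions(expected)
    a = bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: the live sample is the reference shifted by +5.
reference = [i / 100 for i in range(1000)]
live = [x + 5 for x in reference]
score = psi(reference, live)
print(f"PSI = {score:.2f}", "-> investigate" if score > 0.25 else "-> stable")
```

In practice a check like this would run per feature on a recent window of production inputs, with the training set as the reference sample.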
The Modern MLOps Stack
| Category | Tool | Purpose | Free Tier? |
|---|---|---|---|
| Experiment Tracking | MLflow | Log metrics, params, models, artifacts | Yes — self-hosted |
| Experiment Tracking | Weights & Biases | Cloud-based tracking with rich visualisations | Yes — 100GB storage |
| Data Versioning | DVC | Git for large datasets and model files | Yes — open source |
| Model Serving | FastAPI + Uvicorn | Lightweight, async Python API server | Yes — open source |
| Containerisation | Docker | Package app + dependencies into portable images | Yes — free for public |
| Container Orchestration | Kubernetes | Scale and manage containerised ML services | GKE Autopilot free tier |
| Cloud Deployment | Railway / Render | Simple PaaS for deploying Docker containers | Limited free or trial tiers (check current pricing) |
| CI/CD | GitHub Actions | Automate build, test, deploy on git push | Yes — 2000 min/month |
| Model Registry | MLflow Registry | Version and stage models (Staging/Production) | Yes — self-hosted |
| Monitoring | Prometheus + Grafana | Metrics collection and dashboards | Yes — self-hosted |
| Feature Store | Feast | Manage and serve ML features consistently | Yes — open source |
Step 1: Train and Track with MLflow
MLflow experiment tracking is the foundation of MLOps. Every training run should log its parameters, metrics, and artifacts so you can compare experiments and reproduce results.
# train.py: model training with MLflow experiment tracking
import mlflow
import mlflow.sklearn
import pandas as pd
from mlflow.models import infer_signature
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score,
)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# ──────────────────────────────────────────────────────────
# Configuration: all hyperparameters in one place
# ──────────────────────────────────────────────────────────
CONFIG = {
    "n_estimators": 200,
    "max_depth": 5,
    "learning_rate": 0.05,
    "subsample": 0.8,
    "min_samples_split": 10,
    "test_size": 0.2,
    "random_state": 42,
}


def load_data():
    """Load your dataset here. Replace with actual data loading."""
    from sklearn.datasets import make_classification
    X, y = make_classification(
        n_samples=5000, n_features=20, n_informative=15,
        n_redundant=5, random_state=42
    )
    return pd.DataFrame(X), pd.Series(y)


def train_model():
    # Set MLflow tracking URI: use a local server or a remote one
    mlflow.set_tracking_uri("http://localhost:5000")
    mlflow.set_experiment("fraud-detection-v2")

    with mlflow.start_run(run_name="GBM-experiment-1") as run:
        print(f"MLflow Run ID: {run.info.run_id}")

        # ── Load & Split Data ──
        X, y = load_data()
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=CONFIG["test_size"],
            random_state=CONFIG["random_state"], stratify=y
        )

        # ── Log configuration ──
        mlflow.log_params(CONFIG)
        mlflow.log_param("train_samples", len(X_train))
        mlflow.log_param("test_samples", len(X_test))
        mlflow.log_param("n_features", X.shape[1])

        # ── Build pipeline (scaler + model) ──
        pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("model", GradientBoostingClassifier(
                n_estimators=CONFIG["n_estimators"],
                max_depth=CONFIG["max_depth"],
                learning_rate=CONFIG["learning_rate"],
                subsample=CONFIG["subsample"],
                min_samples_split=CONFIG["min_samples_split"],
                random_state=CONFIG["random_state"],
            ))
        ])

        # ── Train ──
        print("Training model...")
        pipeline.fit(X_train, y_train)

        # ── Evaluate on the held-out test split ──
        y_pred = pipeline.predict(X_test)
        y_proba = pipeline.predict_proba(X_test)[:, 1]
        metrics = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
            "roc_auc": roc_auc_score(y_test, y_proba),
        }

        # ── Log metrics ──
        mlflow.log_metrics(metrics)

        # ── Cross-validation (refits clones of the pipeline on each fold) ──
        cv_scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
        mlflow.log_metric("cv_roc_auc_mean", cv_scores.mean())
        mlflow.log_metric("cv_roc_auc_std", cv_scores.std())

        print("\nResults:")
        for k, v in metrics.items():
            print(f"  {k}: {v:.4f}")
        print(f"  CV ROC-AUC: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")

        # ── Save and log model (also registers a new model version) ──
        mlflow.sklearn.log_model(
            pipeline,
            "model",
            registered_model_name="fraud-detector",
            input_example=X_test.iloc[:5],
            signature=infer_signature(X_test, y_pred),
        )
        print(f"\nModel saved to MLflow. Run ID: {run.info.run_id}")
        return run.info.run_id


if __name__ == "__main__":
    run_id = train_model()
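Logged metrics become more useful when something acts on them. One option is a small quality gate that refuses to promote a model whose metrics fall below agreed minimums. The sketch below is a hypothetical helper: `check_metrics`, the metric values, and the thresholds are assumptions for illustration and are not part of MLflow.

```python
from typing import Dict, List


def check_metrics(metrics: Dict[str, float], minimums: Dict[str, float]) -> List[str]:
    """Return one human-readable failure per metric that is missing or too low."""
    failures = []
    for name, floor in minimums.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < floor:
            failures.append(f"{name}: {value:.4f} < required {floor:.4f}")
    return failures


# Hypothetical values; in practice these would come from the training run.
candidate = {"roc_auc": 0.93, "recall": 0.71, "f1": 0.80}
required = {"roc_auc": 0.90, "recall": 0.75}

problems = check_metrics(candidate, required)
for problem in problems:
    print("FAIL", problem)
print("gate passed" if not problems else "gate failed")
```

A script built around a check like this could exit with a non-zero status when `problems` is non-empty, which is enough for a CI job to block a deployment.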
Step 2: Serve with FastAPI
# serve.py: production FastAPI model server
import logging
import os
import time
from contextlib import asynccontextmanager

import mlflow.sklearn
import numpy as np
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import Response
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
from pydantic import BaseModel, Field

# ── Logging ──
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# ── Prometheus metrics ──
REQUEST_COUNT = Counter("predictions_total", "Total predictions", ["status"])
REQUEST_LATENCY = Histogram("prediction_latency_seconds", "Prediction latency")
INPUT_ERRORS = Counter("input_errors_total", "Invalid input count")

# ── Global model ──
model = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load the model once at startup; fail fast if it cannot be loaded."""
    global model
    # The default URI assumes a registered version has been moved to the
    # "Production" stage in the MLflow Model Registry.
    model_uri = os.getenv("MODEL_URI", "models:/fraud-detector/Production")
    logger.info(f"Loading model from: {model_uri}")
    try:
        model = mlflow.sklearn.load_model(model_uri)
        logger.info("Model loaded successfully")
    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise
    yield


app = FastAPI(
    title="Fraud Detection API",
    description="ML model serving with MLflow + FastAPI",
    version="2.0.0",
    lifespan=lifespan,
)

# Wide-open CORS is convenient for demos; restrict origins in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


class PredictionRequest(BaseModel):
    features: list[float] = Field(..., description="Model input features", min_length=1)


class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    confidence: str
    latency_ms: float


@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    if model is None:
        raise HTTPException(status_code=503, detail="Model not loaded")

    start = time.perf_counter()
    try:
        # One row, as many columns as features were sent
        X = np.array(request.features).reshape(1, -1)
        pred = int(model.predict(X)[0])
        proba = float(model.predict_proba(X)[0, 1])
    except Exception as e:
        INPUT_ERRORS.inc()
        REQUEST_COUNT.labels(status="error").inc()
        logger.error(f"Prediction error: {e}")
        raise HTTPException(status_code=422, detail=f"Prediction failed: {e}")

    latency = (time.perf_counter() - start) * 1000  # milliseconds
    REQUEST_COUNT.labels(status="success").inc()
    REQUEST_LATENCY.observe(latency / 1000)  # histogram is in seconds

    # Confidence reflects how far the probability is from the 0.5 boundary
    if proba > 0.8 or proba < 0.2:
        confidence = "high"
    elif proba > 0.6 or proba < 0.4:
        confidence = "medium"
    else:
        confidence = "low"

    return PredictionResponse(
        prediction=pred,
        probability=round(proba, 4),
        confidence=confidence,
        latency_ms=round(latency, 2),
    )


@app.get("/metrics")
def metrics():
    """Prometheus metrics endpoint."""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)


@app.get("/health")
def health():
    return {"status": "healthy", "model_loaded": model is not None}
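The confidence label returned by `/predict` is derived from how far the predicted probability sits from 0.5. Factoring that rule into its own function makes the boundary behaviour easy to test without starting the server. This is a sketch that mirrors the thresholds used in `serve.py`; the function name `confidence_band` and the range check are additions for illustration.

```python
def confidence_band(proba: float) -> str:
    """Map a positive-class probability to 'high', 'medium', or 'low' confidence."""
    if not 0.0 <= proba <= 1.0:
        raise ValueError(f"probability out of range: {proba}")
    if proba > 0.8 or proba < 0.2:
        return "high"
    if proba > 0.6 or proba < 0.4:
        return "medium"
    return "low"


for p in (0.95, 0.70, 0.50, 0.30, 0.05):
    print(p, confidence_band(p))
```

Note that the comparisons are strict, so a probability of exactly 0.8 is labelled "medium" and exactly 0.6 is labelled "low".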
Step 3: Dockerise the Service
# Dockerfile — Multi-stage build for production ML API
# ──────────────────────────────────────────────────────────
# Stage 1: Build stage (installs all deps including build tools)
# ──────────────────────────────────────────────────────────
FROM python:3.11-slim AS builder
WORKDIR /build
# Install system build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc g++ libgomp1 && \
    rm -rf /var/lib/apt/lists/*
# Install Python dependencies into a virtual env
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir --upgrade pip && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt
# ──────────────────────────────────────────────────────────
# Stage 2: Runtime stage (lean production image)
# ──────────────────────────────────────────────────────────
FROM python:3.11-slim AS runtime
# Create non-root user for security
RUN useradd -m -u 1000 appuser
WORKDIR /app
# Copy virtual env from builder
COPY --from=builder /opt/venv /opt/venv
# Activate venv
ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# Copy application code
COPY --chown=appuser:appuser serve.py .
USER appuser
EXPOSE 8000
# Health check: Docker marks the container as unhealthy when this fails.
# Restarting it is left to the orchestrator or hosting platform.
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"
# Run with Gunicorn as the process manager and Uvicorn (ASGI) workers.
# Both gunicorn and uvicorn must be listed in requirements.txt.
CMD ["gunicorn", "serve:app", "--workers", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "120"]
Step 4: Deploy to Railway or Render
For small to medium ML APIs, Railway and Render are convenient options for Nepali developers: there is no Kubernetes cluster to manage, entry-level plans are inexpensive (check current pricing for free or trial tiers), and HTTPS is set up automatically.
# ──────────────────────────────────────────────────────────
# Option A: Deploy to Railway
# ──────────────────────────────────────────────────────────
# 1. Install Railway CLI
npm install -g @railway/cli
# 2. Login and initialise project
railway login
railway init # Creates railway.json
# 3. Add environment variables
railway variables set OPENAI_API_KEY=sk-xxx
railway variables set MODEL_URI=models:/fraud-detector/Production
railway variables set MLFLOW_TRACKING_URI=https://your-mlflow-server.com
# 4. Deploy (Railway auto-detects Dockerfile)
railway up
# Your API is live at: https://your-app.railway.app
# ──────────────────────────────────────────────────────────
# Option B: Deploy to Render (via render.yaml)
# ──────────────────────────────────────────────────────────
cat > render.yaml << 'EOF'
services:
  - type: web
    name: fraud-detection-api
    runtime: docker
    dockerfilePath: ./Dockerfile
    region: singapore  # closest to Nepal
    plan: starter      # $7/month: 512MB RAM, 0.5 CPU
    healthCheckPath: /health
    envVars:
      - key: MODEL_URI
        value: models:/fraud-detector/Production
      - key: OPENAI_API_KEY
        sync: false  # Set in Render dashboard (secret)
EOF
# Push to GitHub and connect repo in Render dashboard
git add render.yaml && git commit -m "Add Render config"
git push origin main
Step 5: CI/CD with GitHub Actions
# .github/workflows/deploy.yml
# Runs on every push to main — tests, builds, and deploys
name: MLOps CI/CD Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
env:
  REGISTRY: ghcr.io
  # Image names must be lowercase; adjust if the repository name has capitals.
  IMAGE_NAME: ${{ github.repository }}/fraud-detection-api
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-asyncio httpx
      - name: Run unit tests
        run: pytest tests/ -v --tb=short
      - name: Run model validation
        run: python scripts/validate_model.py
        env:
          MODEL_URI: ${{ secrets.MODEL_URI }}
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Trigger Render deployment
        run: |
          curl -sf -X POST \
            -H "Authorization: Bearer ${{ secrets.RENDER_API_KEY }}" \
            -H "Content-Type: application/json" \
            -d '{"clearCache": "do_not_clear"}' \
            "https://api.render.com/v1/services/${{ secrets.RENDER_SERVICE_ID }}/deploys"
      - name: Wait for deployment
        run: |
          echo "Waiting for Render to deploy..."
          sleep 60
          curl -sf https://your-app.onrender.com/health || exit 1
          echo "Deployment successful!"
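A fixed `sleep 60` followed by a single request can fail when a deploy takes longer than expected, and it wastes time when the deploy is fast. A more forgiving approach polls the health endpoint with retries and a growing delay. The sketch below uses only the standard library; `http_ok`, `wait_until_healthy`, and their default values are hypothetical names and settings chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 5.0,
    backoff: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, sleeping longer after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(delay_s)
            delay_s *= backoff
    return False


# Example wiring (placeholder URL, not executed here):
# healthy = wait_until_healthy(lambda: http_ok("https://your-app.onrender.com/health"))
# raise SystemExit(0 if healthy else 1)
```

Passing `check` and `sleep` as arguments keeps the retry logic testable without a network connection or real waiting.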
Data Versioning with DVC
DVC (Data Version Control) works like Git for large files such as datasets, model checkpoints, and feature files. It keeps small metadata files in Git and stores the actual data in inexpensive object storage (S3, GCS, Azure Blob).
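The idea of keeping metadata in Git and data elsewhere can be illustrated with a few lines of Python. This is a conceptual sketch only, not DVC's implementation or file format: the `track` function, the `.pointer.json` suffix, and the cache layout are invented for illustration, and it uses SHA-256 where DVC records MD5 hashes.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy the data into a content-addressed cache and write a small pointer file.

    The pointer file is what would be committed to Git; the cache directory
    stands in for remote object storage.
    """
    digest = file_digest(data_file)
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps(
        {"path": data_file.name, "sha256": digest, "size": data_file.stat().st_size},
        indent=2,
    ))
    return pointer


# Usage: track a tiny CSV inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_bytes(b"amount,label\n10.5,0\n")
    pointer_file = track(data, Path(tmp) / "cache")
    print(pointer_file.read_text())
```

Because the pointer records a content hash, two team members who check out the same Git commit can verify they are training on byte-identical data.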
Building a Reliable ML System
The stack we've covered — MLflow for experiment tracking, FastAPI for serving, Docker for packaging, GitHub Actions for CI/CD, and Railway/Render for deployment — gives you a production-grade MLOps foundation without requiring a dedicated DevOps team.
Start small: add MLflow tracking to your existing training scripts this week. Then containerise your best model. Then automate the deployment. Each step independently adds value and you can stop at any point.
In Nepal's growing AI ecosystem, the engineers who can both build models and reliably deploy them are the rarest and most valuable. MLOps is your competitive advantage.