Intermediate Course · Part of AI Builder

AI Builder: Small Models (SLMs) & Optimization

Deploy and fine-tune small language models entirely on your own hardware. Over five weeks, go from ML baselines through the SLM ecosystem, local quantized deployment, LoRA fine-tuning, and containerized serving — ending with an air-gapped support bot.


What You'll Learn

Evaluate ML models using ROUGE, BLEU, Precision, Recall, and F1 metrics
Navigate the Hugging Face Hub and compare SLM architectures (Llama, Mistral, Phi)
Quantize models to GGUF and AWQ formats for CPU and GPU inference
Fine-tune models with LoRA and QLoRA for memory-efficient training
Serve a quantized model locally via Ollama inside a Docker container

Course Content

W1
Week 1: Machine Learning Baseline
Establish the ML foundations every AI engineer needs before touching LLMs.
1
Supervised Learning with Scikit-Learn
Training classification and regression models end-to-end using Scikit-Learn pipelines.
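For instance, a minimal end-to-end pipeline might look like the sketch below (illustrative dataset and hyperparameters, assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in dataset; the course exercises use their own data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# A Pipeline chains preprocessing and the estimator into one object,
# so fit/predict apply every step in order with no train/test leakage.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```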
2
Tree-Based Models
Building and tuning Random Forests and XGBoost models for tabular prediction tasks.
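A sketch of the Random Forest workflow (XGBoost follows the same fit/predict API; hyperparameters here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Key knobs: number of trees, max depth (None = grow fully),
# and min samples per leaf to control over-fitting.
forest = RandomForestClassifier(
    n_estimators=200, max_depth=None, min_samples_leaf=1, random_state=0
)
scores = cross_val_score(forest, X, y, cv=5)
mean_accuracy = scores.mean()
```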
3
Data Preprocessing & Feature Encoding
Handling missing values, scaling numerical features, and encoding categoricals for ML pipelines.
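These three steps compose into a single `ColumnTransformer`; a minimal sketch with a hypothetical mini-dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one missing numeric value, one categorical column.
df = pd.DataFrame({
    "age": [25.0, None, 47.0, 33.0],
    "income": [40_000, 55_000, 82_000, 61_000],
    "city": ["pune", "delhi", "pune", "mumbai"],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X = preprocess.fit_transform(df)  # 2 scaled numeric cols + 3 one-hot cols
```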
4
Model Evaluation Metrics
Measuring model quality with ROUGE, BLEU, Precision, Recall, and F1 — including when each is appropriate.
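ROUGE and BLEU score generated text against references, while Precision, Recall, and F1 score classifiers. The classification metrics reduce to simple counts, sketched here from scratch:

```python
def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics, treating label 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how trustworthy are positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # how many positives were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1

p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```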
Weekly Win
Unsupervised Learning Baselines
Train a K-Means clustering model and compare its groupings against a supervised classifier on the same dataset.
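One way to sketch that comparison (illustrative dataset; cluster ids are arbitrary, so each cluster is mapped to its majority class before scoring):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Map each cluster to its majority true class, then count agreements.
agreement = 0
for c in range(3):
    members = y[labels == c]
    agreement += np.bincount(members).max()
agreement_rate = agreement / len(y)
```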
W2
Week 2: The SLM Ecosystem
Navigate the landscape of open-weight small models and choose the right one.
1
SLM Parameter Topologies
Analyzing the architectural differences between Llama, Mistral, and Phi families and their implications for task performance.
2
Navigating the Hugging Face Hub
Finding, filtering, and downloading models from the Hub — including reading model cards and license constraints.
3
Open Weights vs. Proprietary Models
Evaluating trade-offs in capability, cost, data privacy, and customizability between open and closed models.
4
Model Evaluation & Benchmarking
Running standardized benchmarks to objectively compare SLM candidates before committing to fine-tuning.
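The core of any benchmark harness is a scoring loop over a fixed eval set. A toy exact-match sketch (the "models" here are stand-in lookup tables; real runs would call an SLM):

```python
def exact_match_benchmark(model_fn, eval_set):
    """Score a model callable on (question, reference_answer) pairs."""
    hits = sum(
        1 for question, reference in eval_set
        if model_fn(question).strip().lower() == reference.strip().lower()
    )
    return hits / len(eval_set)

# Hypothetical eval set and two stand-in candidates.
EVAL = [("2+2?", "4"), ("capital of France?", "Paris"), ("3*3?", "9")]
model_a = {"2+2?": "4", "capital of France?": "Paris", "3*3?": "6"}
model_b = {"2+2?": "5", "capital of France?": "Paris"}

score_a = exact_match_benchmark(lambda q: model_a.get(q, ""), EVAL)
score_b = exact_match_benchmark(lambda q: model_b.get(q, ""), EVAL)
```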
Weekly Win
Dynamic SLM Routing
Build a router that classifies incoming queries by complexity and dispatches them to the most cost-efficient SLM.
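A toy version of such a router (the heuristic and model names are illustrative placeholders; a production router would use a classifier):

```python
def route(query, simple_model="phi-3-mini", complex_model="mistral-7b"):
    """Send long or reasoning-heavy queries to the larger model,
    everything else to the cheaper one."""
    keywords = ("explain", "compare", "why", "derive", "analyze")
    is_complex = (len(query.split()) > 30
                  or any(k in query.lower() for k in keywords))
    return complex_model if is_complex else simple_model
```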
W3
Week 3: Local Deployment & Quantization
Run full LLMs on consumer hardware by reducing precision without sacrificing quality.
1
Quantization Theory & Precision Scaling
How reducing weights from FP32 to INT8 or INT4 compresses models and the quality trade-offs at each precision level.
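The core idea fits in a few lines: symmetric INT8 quantization maps the weight range onto signed 8-bit integers, and the rounding error per weight is bounded by half the scale. A from-scratch sketch:

```python
def quantize_int8(weights):
    """Symmetric INT8: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.30, 0.07, 0.91]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The reconstruction error is at most scale/2 per weight — the quality
# cost paid for storing 1 byte instead of 4 (FP32).
max_error = max(abs(a - b) for a, b in zip(w, w_hat))
```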
2
GGUF Format for CPU/Hybrid Inference
Converting models to GGUF format for efficient CPU and hybrid CPU/GPU inference with llama.cpp.
3
AWQ Format for GPU-Bound Inference
Applying Activation-aware Weight Quantization for faster GPU inference with minimal perplexity loss.
4
Inference Engines: llama.cpp & vLLM
Setting up and benchmarking llama.cpp and vLLM for local serving, comparing throughput and latency profiles.
Weekly Win
Memory Profiling & VRAM Calculation
Profile a quantized model's VRAM footprint and calculate the maximum batch size before GPU OOM for a given hardware spec.
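The back-of-envelope arithmetic behind that calculation is quantized weight bytes plus the KV cache. A sketch with hypothetical Llama-2-7B-like shape assumptions:

```python
def estimate_vram_gb(n_params, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_len, batch_size, kv_bits=16):
    """Rough floor on VRAM: quantized weights + KV cache.
    Ignores activations and framework overhead."""
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per token, per sequence.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * batch_size * kv_bits / 8)
    return (weight_bytes + kv_bytes) / 1024**3

# Hypothetical 7B model at 4-bit precision, 4K context, batch size 1.
vram = estimate_vram_gb(
    n_params=7_000_000_000, bits_per_weight=4,
    n_layers=32, n_kv_heads=32, head_dim=128,
    context_len=4096, batch_size=1,
)
```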
W4
Week 4: Parameter-Efficient Fine-Tuning (PEFT)
Adapt a pre-trained model to your domain without retraining from scratch.
1
Fine-Tuning Paradigms & Transfer Learning
Full fine-tuning, instruction tuning, and PEFT compared — when each is appropriate and what data each requires.
2
Low-Rank Adaptation (LoRA) Mechanics
How LoRA decomposes weight updates into low-rank matrices to dramatically reduce the number of trainable parameters.
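The parameter savings follow directly from the decomposition: a frozen d×k weight matrix gets a learned update ΔW = B·A with B of shape (d, r) and A of shape (r, k), so only r·(d + k) values train instead of d·k. A small numeric sketch (dimensions illustrative):

```python
import numpy as np

d, k, r = 1024, 1024, 8   # layer dims and LoRA rank (illustrative)

full_params = d * k            # what full fine-tuning would update
lora_params = r * (d + k)      # what LoRA actually trains

rng = np.random.default_rng(0)
A = rng.standard_normal((r, k)) * 0.01
B = np.zeros((d, r))           # B starts at zero, so delta_W = 0 at init
delta_W = B @ A                # rank <= r by construction
```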
3
QLoRA for Memory-Efficient Training
Combining 4-bit quantization with LoRA adapters to fine-tune 7B+ parameter models on a single consumer GPU.
4
Training Hyperparameters & Epochs
Configuring learning rate, batch size, gradient accumulation, and epoch count for stable, non-divergent fine-tuning runs.
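The batch-size arithmetic is worth making explicit: gradient accumulation multiplies the micro-batch into a larger effective batch at no extra memory cost, which in turn sets the optimizer-step count. A sketch (numbers illustrative):

```python
def training_schedule(dataset_size, micro_batch, grad_accum_steps, epochs):
    """Effective batch size and total optimizer steps for a run.
    Accumulation trades extra forward passes for a bigger batch
    without extra VRAM."""
    effective_batch = micro_batch * grad_accum_steps
    steps_per_epoch = dataset_size // effective_batch
    return effective_batch, steps_per_epoch * epochs

eff, total_steps = training_schedule(
    dataset_size=10_000, micro_batch=4, grad_accum_steps=8, epochs=3
)
```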
Weekly Win
Monitoring Cross-Entropy Loss
Run a QLoRA fine-tuning job and produce a training curve showing Cross-Entropy loss converging over epochs.
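The quantity being plotted is just the mean negative log-likelihood of the target tokens; as training shifts probability mass onto the right tokens, it falls toward zero. A from-scratch sketch with made-up probabilities:

```python
import math

def cross_entropy(probs_of_correct_tokens):
    """Mean negative log-likelihood over target tokens."""
    return -sum(math.log(p) for p in probs_of_correct_tokens) / len(
        probs_of_correct_tokens
    )

# Illustrative: the model assigns more probability to correct tokens
# as training progresses, so the loss drops.
loss_epoch_1 = cross_entropy([0.2, 0.1, 0.3])
loss_epoch_3 = cross_entropy([0.6, 0.5, 0.7])
```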
W5
Week 5: Execution & Capstone
Package and ship a fine-tuned model with zero cloud dependency.
1
Local Serving via Ollama API
Exposing a locally running quantized model through the Ollama REST API for programmatic access.
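A minimal sketch of the client side, assuming an Ollama server at its default address and a model named `llama3` (only the request is built here; the commented lines show the actual call):

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3",
                           host="http://localhost:11434"):
    """Request for Ollama's /api/generate endpoint.
    stream=False asks for one JSON response instead of chunks."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With a local Ollama server running, the call would be:
# with urllib.request.urlopen(build_generate_request("Hello")) as resp:
#     print(json.loads(resp.read())["response"])
```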
2
Docker Networking & Zero-Egress Environments
Configuring Docker networks that isolate containers from the internet to enforce data sovereignty.
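A minimal Compose sketch of the idea (service and network names are illustrative): marking a network `internal` removes its route to the host's external interfaces, so containers on it cannot reach the internet.

```yaml
# docker-compose.yml sketch — zero-egress network
services:
  ollama:
    image: ollama/ollama
    networks: [airgap]

networks:
  airgap:
    internal: true   # no gateway to the outside world
```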
3
Volume Management for Cached LLMs
Persisting large model weights in Docker volumes to avoid re-downloading on container restart.
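For example, a named volume mounted at Ollama's model directory keeps multi-gigabyte weights across container restarts (a sketch; the mount path assumes Ollama's default storage location):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama   # weights survive container restarts

volumes:
  ollama-models:
```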
4
Open WebUI Integration
Connecting Open WebUI to a locally running model to provide a chat interface without any external API calls.
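A Compose sketch of the wiring (image tag and ports are assumptions to verify against the Open WebUI docs): the UI reaches Ollama by Compose service-name DNS, so no external API is involved.

```yaml
services:
  ollama:
    image: ollama/ollama
  webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # service-name DNS
    ports:
      - "3000:8080"   # chat UI on http://localhost:3000
    depends_on: [ollama]
```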
Weekly Win
Capstone: Air-Gapped Local Support Bot
Deploy a fine-tuned SLM in a fully air-gapped Docker environment with Open WebUI — zero internet egress, all inference local.

Prerequisites

Python programming
Basic statistics
📚 Intermediate Level
Course Price
₹9,999
India
$199
International · One-time payment
Next cohort starts Mar 30
Duration: 5 weeks
Level: Intermediate
Format: Cohort-based
Modules: 5

What's included:

Live cohort sessions
Hands-on projects
Certificate of completion
Lifetime access
Career support

Part of Learning Track

🛠️ AI Builder
6 courses in track