Intermediate Course · Part of AI Builder

AI Builder: Small Models (SLMs) & Optimization

Deploy and fine-tune small language models entirely on your own hardware. Over five weeks, go from ML baselines through the SLM ecosystem, local quantized deployment, LoRA fine-tuning, and containerized serving — ending with an air-gapped support bot.


What You'll Learn

Evaluate ML models using ROUGE, BLEU, Precision, Recall, and F1 metrics
Navigate the Hugging Face Hub and compare SLM architectures (Llama, Mistral, Phi)
Quantize models to GGUF and AWQ formats for CPU and GPU inference
Fine-tune models with LoRA and QLoRA for memory-efficient training
Serve a quantized model locally via Ollama inside a Docker container

Course Content

W1
Week 1: Machine Learning Baseline
Establish the ML foundations every AI engineer needs before touching LLMs.
1
Supervised Learning with Scikit-Learn
Training classification and regression models end-to-end using Scikit-Learn pipelines.
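For instance, a minimal end-to-end pipeline might look like the sketch below (illustrative dataset and hyperparameters, assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in dataset; the course exercises use their own data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# A Pipeline chains preprocessing and the estimator into one object,
# so fit/predict apply every step in order with no train/test leakage.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```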
2
Tree-Based Models
Building and tuning Random Forests and XGBoost models for tabular prediction tasks.
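A sketch of the Random Forest workflow (XGBoost follows the same fit/predict API; hyperparameters here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Key knobs: number of trees, max depth (None = grow fully),
# and min samples per leaf to control over-fitting.
forest = RandomForestClassifier(
    n_estimators=200, max_depth=None, min_samples_leaf=1, random_state=0
)
scores = cross_val_score(forest, X, y, cv=5)
mean_accuracy = scores.mean()
```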
3
Data Preprocessing & Feature Encoding
Handling missing values, scaling numerical features, and encoding categoricals for ML pipelines.
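These three steps compose into a single `ColumnTransformer`; a minimal sketch with a hypothetical mini-dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one missing numeric value, one categorical column.
df = pd.DataFrame({
    "age": [25.0, None, 47.0, 33.0],
    "income": [40_000, 55_000, 82_000, 61_000],
    "city": ["pune", "delhi", "pune", "mumbai"],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X = preprocess.fit_transform(df)  # 2 scaled numeric cols + 3 one-hot cols
```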
4
Model Evaluation Metrics
Measuring model quality with ROUGE, BLEU, Precision, Recall, and F1 — including when each is appropriate.
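ROUGE and BLEU score generated text against references, while Precision, Recall, and F1 score classifiers. The classification metrics reduce to simple counts, sketched here from scratch:

```python
def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics, treating label 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how trustworthy are positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # how many positives were found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return precision, recall, f1

p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```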
Weekly Win
Unsupervised Learning Baselines
Train a K-Means clustering model and compare its groupings against a supervised classifier on the same dataset.
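One way to sketch that comparison (illustrative dataset; cluster ids are arbitrary, so each cluster is mapped to its majority class before scoring):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Map each cluster to its majority true class, then count agreements.
agreement = 0
for c in range(3):
    members = y[labels == c]
    agreement += np.bincount(members).max()
agreement_rate = agreement / len(y)
```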
W2
Week 2: The SLM Ecosystem
Navigate the landscape of open-weight small models and choose the right one.
1
SLM Parameter Topologies
Analyzing the architectural differences between Llama, Mistral, and Phi families and their implications for task performance.
2
Navigating the Hugging Face Hub
Finding, filtering, and downloading models from the Hub — including reading model cards and license constraints.
3
Open Weights vs. Proprietary Models
Evaluating trade-offs in capability, cost, data privacy, and customizability between open and closed models.
4
Model Evaluation & Benchmarking
Running standardized benchmarks to objectively compare SLM candidates before committing to fine-tuning.
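The core of any benchmark harness is a scoring loop over a fixed eval set. A toy exact-match sketch (the "models" here are stand-in lookup tables; real runs would call an SLM):

```python
def exact_match_benchmark(model_fn, eval_set):
    """Score a model callable on (question, reference_answer) pairs."""
    hits = sum(
        1 for question, reference in eval_set
        if model_fn(question).strip().lower() == reference.strip().lower()
    )
    return hits / len(eval_set)

# Hypothetical eval set and two stand-in candidates.
EVAL = [("2+2?", "4"), ("capital of France?", "Paris"), ("3*3?", "9")]
model_a = {"2+2?": "4", "capital of France?": "Paris", "3*3?": "6"}
model_b = {"2+2?": "5", "capital of France?": "Paris"}

score_a = exact_match_benchmark(lambda q: model_a.get(q, ""), EVAL)
score_b = exact_match_benchmark(lambda q: model_b.get(q, ""), EVAL)
```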
Weekly Win
Dynamic SLM Routing
Build a router that classifies incoming queries by complexity and dispatches them to the most cost-efficient SLM.
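A toy version of such a router (the heuristic and model names are illustrative placeholders; a production router would use a classifier):

```python
def route(query, simple_model="phi-3-mini", complex_model="mistral-7b"):
    """Send long or reasoning-heavy queries to the larger model,
    everything else to the cheaper one."""
    keywords = ("explain", "compare", "why", "derive", "analyze")
    is_complex = (len(query.split()) > 30
                  or any(k in query.lower() for k in keywords))
    return complex_model if is_complex else simple_model
```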
W3
Week 3: Local Deployment & Quantization
Run full LLMs on consumer hardware by reducing precision without sacrificing quality.
1
Quantization Theory & Precision Scaling
How reducing weights from FP32 to INT8 or INT4 compresses models and the quality trade-offs at each precision level.
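The core idea fits in a few lines: symmetric INT8 quantization maps the weight range onto signed 8-bit integers, and the rounding error per weight is bounded by half the scale. A from-scratch sketch:

```python
def quantize_int8(weights):
    """Symmetric INT8: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.30, 0.07, 0.91]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The reconstruction error is at most scale/2 per weight — the quality
# cost paid for storing 1 byte instead of 4 (FP32).
max_error = max(abs(a - b) for a, b in zip(w, w_hat))
```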
2
GGUF Format for CPU/Hybrid Inference
Converting models to GGUF format for efficient CPU and hybrid CPU/GPU inference with llama.cpp.
3
AWQ Format for GPU-Bound Inference
Applying Activation-aware Weight Quantization for faster GPU inference with minimal perplexity loss.
4
Inference Engines: llama.cpp & vLLM
Setting up and benchmarking llama.cpp and vLLM for local serving, comparing throughput and latency profiles.
Weekly Win
Memory Profiling & VRAM Calculation
Profile a quantized model's VRAM footprint and calculate the maximum batch size before GPU OOM for a given hardware spec.
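The back-of-envelope arithmetic behind that calculation is quantized weight bytes plus the KV cache. A sketch with hypothetical Llama-2-7B-like shape assumptions:

```python
def estimate_vram_gb(n_params, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, context_len, batch_size, kv_bits=16):
    """Rough floor on VRAM: quantized weights + KV cache.
    Ignores activations and framework overhead."""
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache: 2 tensors (K and V) per layer, per token, per sequence.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * batch_size * kv_bits / 8)
    return (weight_bytes + kv_bytes) / 1024**3

# Hypothetical 7B model at 4-bit precision, 4K context, batch size 1.
vram = estimate_vram_gb(
    n_params=7_000_000_000, bits_per_weight=4,
    n_layers=32, n_kv_heads=32, head_dim=128,
    context_len=4096, batch_size=1,
)
```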
W4
Week 4: Parameter-Efficient Fine-Tuning (PEFT)
Adapt a pre-trained model to your domain without retraining from scratch.
1
Fine-Tuning Paradigms & Transfer Learning
Full fine-tuning, instruction tuning, and PEFT compared — when each is appropriate and what data each requires.
2
Low-Rank Adaptation (LoRA) Mechanics
How LoRA decomposes weight updates into low-rank matrices to dramatically reduce the number of trainable parameters.
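The parameter savings follow directly from the decomposition: a frozen d×k weight matrix gets a learned update ΔW = B·A with B of shape (d, r) and A of shape (r, k), so only r·(d + k) values train instead of d·k. A small numeric sketch (dimensions illustrative):

```python
import numpy as np

d, k, r = 1024, 1024, 8   # layer dims and LoRA rank (illustrative)

full_params = d * k            # what full fine-tuning would update
lora_params = r * (d + k)      # what LoRA actually trains

rng = np.random.default_rng(0)
A = rng.standard_normal((r, k)) * 0.01
B = np.zeros((d, r))           # B starts at zero, so delta_W = 0 at init
delta_W = B @ A                # rank <= r by construction
```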
3
QLoRA for Memory-Efficient Training
Combining 4-bit quantization with LoRA adapters to fine-tune 7B+ parameter models on a single consumer GPU.
4
Training Hyperparameters & Epochs
Configuring learning rate, batch size, gradient accumulation, and epoch count for stable, non-divergent fine-tuning runs.
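The batch-size arithmetic is worth making explicit: gradient accumulation multiplies the micro-batch into a larger effective batch at no extra memory cost, which in turn sets the optimizer-step count. A sketch (numbers illustrative):

```python
def training_schedule(dataset_size, micro_batch, grad_accum_steps, epochs):
    """Effective batch size and total optimizer steps for a run.
    Accumulation trades extra forward passes for a bigger batch
    without extra VRAM."""
    effective_batch = micro_batch * grad_accum_steps
    steps_per_epoch = dataset_size // effective_batch
    return effective_batch, steps_per_epoch * epochs

eff, total_steps = training_schedule(
    dataset_size=10_000, micro_batch=4, grad_accum_steps=8, epochs=3
)
```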
Weekly Win
Monitoring Cross-Entropy Loss
Run a QLoRA fine-tuning job and produce a training curve showing Cross-Entropy loss converging over epochs.
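The quantity being plotted is just the mean negative log-likelihood of the target tokens; as training shifts probability mass onto the right tokens, it falls toward zero. A from-scratch sketch with made-up probabilities:

```python
import math

def cross_entropy(probs_of_correct_tokens):
    """Mean negative log-likelihood over target tokens."""
    return -sum(math.log(p) for p in probs_of_correct_tokens) / len(
        probs_of_correct_tokens
    )

# Illustrative: the model assigns more probability to correct tokens
# as training progresses, so the loss drops.
loss_epoch_1 = cross_entropy([0.2, 0.1, 0.3])
loss_epoch_3 = cross_entropy([0.6, 0.5, 0.7])
```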
W5
Week 5: Execution & Capstone
Package and ship a fine-tuned model with zero cloud dependency.
1
Local Serving via Ollama API
Exposing a locally running quantized model through the Ollama REST API for programmatic access.
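A minimal sketch of the client side, assuming an Ollama server at its default address and a model named `llama3` (only the request is built here; the commented lines show the actual call):

```python
import json
import urllib.request

def build_generate_request(prompt, model="llama3",
                           host="http://localhost:11434"):
    """Request for Ollama's /api/generate endpoint.
    stream=False asks for one JSON response instead of chunks."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With a local Ollama server running, the call would be:
# with urllib.request.urlopen(build_generate_request("Hello")) as resp:
#     print(json.loads(resp.read())["response"])
```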
2
Docker Networking & Zero-Egress Environments
Configuring Docker networks that isolate containers from the internet to enforce data sovereignty.
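A minimal Compose sketch of the idea (service and network names are illustrative): marking a network `internal` removes its route to the host's external interfaces, so containers on it cannot reach the internet.

```yaml
# docker-compose.yml sketch — zero-egress network
services:
  ollama:
    image: ollama/ollama
    networks: [airgap]

networks:
  airgap:
    internal: true   # no gateway to the outside world
```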
3
Volume Management for Cached LLMs
Persisting large model weights in Docker volumes to avoid re-downloading on container restart.
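For example, a named volume mounted at Ollama's model directory keeps multi-gigabyte weights across container restarts (a sketch; the mount path assumes Ollama's default storage location):

```yaml
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama-models:/root/.ollama   # weights survive container restarts

volumes:
  ollama-models:
```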
4
Open WebUI Integration
Connecting Open WebUI to a locally running model to provide a chat interface without any external API calls.
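A Compose sketch of the wiring (image tag and ports are assumptions to verify against the Open WebUI docs): the UI reaches Ollama by Compose service-name DNS, so no external API is involved.

```yaml
services:
  ollama:
    image: ollama/ollama
  webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # service-name DNS
    ports:
      - "3000:8080"   # chat UI on http://localhost:3000
    depends_on: [ollama]
```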
Weekly Win
Capstone: Air-Gapped Local Support Bot
Deploy a fine-tuned SLM in a fully air-gapped Docker environment with Open WebUI — zero internet egress, all inference local.

Prerequisites

Python programming
Basic statistics
📚 Intermediate Level
Course Price
₹9,999
India
$199
International · One-time payment
Next cohort starts Mar 30
Duration: 5 weeks
Level: Intermediate
Format: Cohort-based
Modules: 5

What's included:

Live cohort sessions
Hands-on projects
Certificate of completion
Lifetime access
Career support

Part of Learning Track

🛠️ AI Builder
6 courses in track