AI Model Compendium
Complete AI Model Reference
Concise descriptions, difficulty levels, typical uses, and example projects for major AI, ML, and deep learning models (a comprehensive list as of 2025).
Classical Machine Learning Models
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Linear Regression | Beginner | Predict continuous targets via linear combination of features; teaches OLS and gradients. | House price prediction; sales/time-series forecasting; energy consumption modeling; baseline regression experiments; feature selection studies. |
| Logistic Regression | Beginner | Binary classification using sigmoid; outputs probabilities and interpretable coefficients. | Spam detection; medical screening; churn prediction; credit default classification; simple NLP classification with bag‑of‑words. |
| Decision Tree | Beginner | Hierarchical splits on features producing human‑readable rules; easy to visualize. | Credit scoring rules; diagnostic flowcharts; interpretable classification demos; feature importance visualizer; teaching decision logic. |
| Random Forest | Intermediate | Ensemble of randomized trees; reduces variance and overfitting via averaging. | Tabular baseline for industry problems; feature importance reports; anomaly detection; ecology / bioinformatics classification; model stacking component (see the sketch below this table). |
| Gradient Boosting (XGBoost / LightGBM / CatBoost) | Intermediate | Sequentially built trees that focus on correcting prior errors; frequently state‑of‑the‑art on tabular tasks. | Kaggle‑style tabular pipelines; credit risk scoring; demand forecasting; click‑through rate prediction; categorical-heavy datasets. |
| Support Vector Machine (SVM) | Intermediate | Finds maximum margin hyperplane; kernel trick enables non-linear boundaries. | Text categorization with TF‑IDF; face detection; small-sample classification; kernel comparison experiments. |
| K-Nearest Neighbors (KNN) | Beginner | Instance-based classifier that votes among the nearest labeled neighbors; simple but effective on small, low-dimensional datasets. | Recommender prototypes; handwriting recognition demos; local anomaly detection; content-based filtering. |
| K‑Means | Beginner | Partitions data into k clusters by minimizing within‑cluster variance. | Customer segmentation; color quantization for images; document clustering; music‑phrase grouping; prototyping for active learning. |
| DBSCAN | Intermediate | Density‑based clustering that finds arbitrary shapes and marks noise/outliers. | Geospatial clustering; anomaly detection on sensor streams; clustering noisy audio features; density-based segmentation. |
| PCA (Principal Component Analysis) | Intermediate | Linear dimensionality reduction projecting data onto principal variance directions. | Feature compression; visualization; denoising; pre-processing for downstream models; exploratory analysis of embeddings. |
| t‑SNE / UMAP | Intermediate | Nonlinear projection methods for visualizing high‑dimensional data in 2D/3D. | Visualize raag/phrase embeddings; cluster structure exploration; embedding space debugging; model representation comparisons. |
| Gaussian Mixture Model (GMM) | Intermediate | Probabilistic clustering using mixtures of Gaussians; gives soft cluster assignments. | Speaker diarization prototypes; density estimation; unsupervised phoneme modeling; music phrase probabilistic modeling. |
| Ensembles (Bagging, Boosting, Stacking) | Advanced | Combine multiple models to improve accuracy/robustness; stacking trains a meta‑learner on base predictions. | Production ML pipelines; AutoML backbones; competition-winning ensembles; robust risk models; hybrid model deployment. |
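
As a quick illustration of the Random Forest row above, here is a minimal scikit-learn sketch. It is a toy under stated assumptions: the synthetic dataset, split, and hyperparameters are placeholders, not a tuned recipe.

```python
# Minimal tabular-baseline sketch with scikit-learn.
# The synthetic data and hyperparameters below are illustrative placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy tabular dataset: 1,000 rows, 20 numeric features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ensemble of randomized trees; averaging their votes reduces variance.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
# Per-feature importances support the "feature importance reports" use case.
print("three largest importances:", sorted(model.feature_importances_)[-3:])
```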
Neural Network Models
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Perceptron / MLP | Beginner | Fully connected networks; MLPs learn non‑linear mappings using dense layers and activations. | Tabular prediction with neural nets; baseline classifier/regressor; basic autoencoder; educational walkthrough of backpropagation (see the training‑loop sketch below this table); deploy a simple MLP API. |
| Convolutional Neural Network (CNN) | Intermediate | Convolutions detect local spatial patterns; pooling and hierarchical features make them ideal for images and spectrograms. | Image classification (ImageNet, custom), audio spectrogram classification (raga/instrument), OCR for notation, transfer learning for small datasets, feature visualization. |
| ResNet / VGG / EfficientNet | Advanced | Variants of CNNs addressing depth, efficiency and training stability (residual connections, scaling). | Medical imaging pipelines, object detection backbones, model compression experiments, transfer learning for niche image tasks, performance benchmarking. |
| RNN / LSTM / GRU | Intermediate | Recurrent networks capture temporal dependencies; LSTM/GRU improve long‑term memory and training stability. | Melody generation, next‑note prediction, sequence labeling of musical events, BPM/time‑series modeling, compare RNN vs Transformer for sequences. |
| Seq2Seq (Encoder‑Decoder) | Advanced | Maps input sequences to output sequences; often augmented with attention for alignment (used in translation, summarization). | Notation‑to‑audio pipelines, music transcription, melody harmonization, automated lyric translation, guided sequence transformation. |
| Autoencoder (AE) | Intermediate | Compresses inputs to a latent representation and reconstructs them—useful for denoising and feature learning. | Denoise audio, compress notation, anomaly detection in recordings, pretrain encoders for downstream tasks, latent space visualization. |
| Variational Autoencoder (VAE) | Advanced | Probabilistic autoencoder enabling sampling from latent distributions for generative tasks. | Generate melody variations, interpolate between motifs, conditional generation by raga tag, data augmentation, latent‑space exploration tools. |
| GAN (Generative Adversarial Network) | Advanced | Adversarial training of a generator and a discriminator to produce realistic samples. | Create new audio textures, style transfer between genres, album-art generation, augment small datasets, timbre conversion experiments. |
| Transformer (Self‑Attention) | Advanced | Self‑attention models that capture pairwise interactions across sequences in parallel; foundation of modern LLMs. | Implement toy transformer, pretrain on music tokens, melody autocomplete, attention analysis for musical motifs, fine‑tune for chord prediction. |
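
To ground the MLP row above, here is a minimal PyTorch training-loop sketch; the layer widths, toy random batch, and step count are assumptions made purely for illustration.

```python
# Minimal MLP + backpropagation sketch in PyTorch (toy data, illustrative sizes).
import torch
import torch.nn as nn

# Dense network: two hidden layers with ReLU activations.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),  # logits for a 2-class problem
)

X = torch.randn(256, 20)         # toy batch of tabular features
y = torch.randint(0, 2, (256,))  # toy integer labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)  # forward pass + loss
    loss.backward()              # backpropagation
    optimizer.step()             # parameter update

print("final loss:", loss.item())
```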
Transformers & Large Language Models (LLMs)
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| BERT / RoBERTa / ALBERT | Advanced | Encoder‑only transformers pretrained with masked language modeling for contextual understanding. | NER for music corpora, semantic search over bandishes, classification of lyrics, embedding extraction for similarity (see the sketch below this table), fine‑tune for metadata tagging. |
| GPT family (GPT‑1 → GPT‑5) & open decoder LLMs (LLaMA, Mistral, Falcon) | Expert | Decoder‑only autoregressive transformers trained to predict next tokens; excel at generation and instruction following. | Interactive composition assistant, practice-plan generator, notation-to-text converter, domain‑fine‑tuned tutor for Hindustani music, creative lyric/melody co‑authoring. |
| T5 / BART | Advanced | Encoder‑decoder text‑to‑text frameworks useful for any sequence transformation task. | Summarization of lecture notes, paraphrasing bandish descriptions, automated notation normalization, question generation, lyric rewriting. |
| Vision Transformer (ViT) / CLIP / BLIP | Advanced | Applies self‑attention to image patch tokens; CLIP aligns images and text for cross‑modal tasks. | Sheet‑image captioning, cross‑modal search (audio ↔ sheet), visual notation classification, image‑based dataset indexing, caption generation for concerts. |
| Audio & Music Models (Whisper, Wav2Vec2, MusicLM, Jukebox) | Advanced | Transformers tailored for audio tasks: ASR, speech embeddings, and music generation/synthesis. | Transcribe bandish recordings, extract melody embeddings, generate accompaniment, timbre conversion, build practice feedback systems. |
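
The "embedding extraction for similarity" use in the BERT row can be sketched with Hugging Face Transformers as below. This assumes transformers and torch are installed and downloads bert-base-uncased on first run; mean pooling is one common choice, not the only one.

```python
# Sketch: extract sentence embeddings from an encoder-only transformer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

texts = ["a slow evening raga", "an energetic morning composition"]
batch = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# Mean-pool the token embeddings into one vector per text.
emb = out.last_hidden_state.mean(dim=1)
sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
print("cosine similarity:", sim.item())
```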
Graph & Relational Models
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Graph Neural Network (GNN) / GCN | Advanced | Message‑passing networks that learn from node/edge structure and propagate features across graphs. | Raag knowledge graph embeddings (Neo4j + GNN), phrase link prediction, recommendation by graph proximity, molecular property prediction, artist collaboration networks; see the message‑passing sketch below this table. |
| GraphSAGE / GAT | Advanced | Scalable neighborhood sampling (GraphSAGE) and attention‑based graph message weighting (GAT). | Inductive node classification, influencer detection, weighted relation modeling, music community detection, graph-based retrieval. |
| Knowledge Graph Embeddings (TransE, RotatE, ComplEx) | Advanced | Embed entities & relations into vector spaces preserving relational structure for reasoning and retrieval. | Semantic search over bandishes, QA over structured music facts, link completion for missing relations, hybrid RAG pipelines, ontology alignment. |
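
A toy sketch of the message passing described in the GNN/GCN row, using PyTorch Geometric; the 4-node graph, feature sizes, and two-layer design are invented for illustration.

```python
# Two-layer GCN forward pass on a tiny invented graph (PyTorch Geometric).
import torch
from torch_geometric.nn import GCNConv

# 4 nodes with 8 features each; edge_index stores directed edges as (src, dst) columns.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]], dtype=torch.long)

conv1 = GCNConv(8, 16)  # message passing: aggregate transformed neighbor features
conv2 = GCNConv(16, 2)  # project to 2 per-node output classes

h = conv1(x, edge_index).relu()
logits = conv2(h, edge_index)
print(logits.shape)  # torch.Size([4, 2]), one score vector per node
```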
Reinforcement Learning
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Q‑Learning | Intermediate | Value‑based RL that learns state–action values for discrete action problems. | Grid‑world agents, discrete music game (hit correct beat), pathfinding, policy visualization, basic RL education (see the FrozenLake sketch below this table). |
| Deep Q‑Network (DQN) | Advanced | Neural network approximates Q function for high‑dimensional observations (images, spectrograms). | Atari benchmark agents, rhythm game agent, simulated instrument control, RL curriculum experiments, replay buffer studies. |
| Policy Gradient / Actor‑Critic / PPO / SAC | Advanced | Directly optimize a stochastic policy, often paired with a value‑function critic for stability; PPO is widely used in practice. | Continuous instrument control, expressive performance optimization, simulated conductor, robotics finger control, reward shaping experiments. |
| AlphaZero / MuZero | Expert | Combine deep RL with MCTS and learned dynamics for planning and strategy learning. | Game-playing agents (chess/go), planning in music composition search, high‑level strategy simulators, research on sample‑efficiency and planning. |
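
To make the Q-Learning row concrete, here is a tabular sketch on Gymnasium's FrozenLake; the learning rate, discount, exploration rate, and episode count are illustrative defaults rather than tuned values.

```python
# Tabular Q-learning on FrozenLake (requires the gymnasium package).
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # illustrative hyperparameters

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability eps, else act greedily.
        action = env.action_space.sample() if np.random.rand() < eps else int(Q[state].argmax())
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print("greedy policy per state:\n", Q.argmax(axis=1).reshape(4, 4))
```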
Generative & Diffusion Models
| Model | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Variational Autoencoder (VAE) | Advanced | Latent probabilistic model enabling sampling and interpolation between encoded inputs. | Generate melody variants, interpolate between bandishes, conditional VAE by raga, data augmentation, latent visualization tools (see the reparameterization sketch below this table). |
| GANs (DCGAN / StyleGAN / CycleGAN) | Advanced | Adversarial generator/discriminator pairs for realistic sample synthesis or domain translation. | Style transfer for musical art, generate album art, audio texture synthesis, convert folk ↔ classical styles, GAN augmentation pipelines. |
| Diffusion Models (DDPM / DDIM / Latent Diffusion / Stable Diffusion) | Expert | Iterative denoising from noise to data; excels at high-fidelity image, audio and video generation. | Text‑to‑image concert visuals, spectrogram diffusion for audio gen, conditional diffusion for melody→audio, research on controllable synthesis, multimodal diffusion experiments. |
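
To illustrate the VAE row, here is a minimal PyTorch encoder showing the reparameterization trick and KL term that make sampling possible; the sizes are invented, and a full model would add a decoder and a reconstruction loss.

```python
# Minimal VAE encoder sketch: reparameterization trick + KL regularizer.
import torch
import torch.nn as nn

class TinyVAEEncoder(nn.Module):
    def __init__(self, in_dim=128, latent_dim=16):  # invented sizes
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(64, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.body(x)
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterize: z = mu + sigma * eps
        # KL(q(z|x) || N(0, I)): the regularizer that keeps the latent space sampleable.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return z, kl

enc = TinyVAEEncoder()
z, kl = enc(torch.randn(8, 128))
print(z.shape, kl.item())
```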
Hybrid, Symbolic & Agentic Systems
| Model / Pattern | Level | Description | Common Uses / Example Projects |
|---|---|---|---|
| Retrieval‑Augmented Generation (RAG) | Advanced | Combine LLMs with external retrievers (vector DBs, knowledge graphs) to ground responses and reduce hallucination. | Bandish QA chatbot with citations, tutor system linked to raag KG, enterprise document assistants, hybrid RAG (vectors + Neo4j), provenance tracking demos (see the retrieval sketch below this table). |
| Mixture of Experts (MoE) | Expert | Large-scale models with sparse routing to specialized experts to scale capacity efficiently. | Experiment with expert routing for musical domains, MoE toy for prompt routing, study efficiency vs dense baselines, adapt experts to ragas, research in MoE stability. |
| Neuro‑Symbolic & Knowledge Graph Integration | Expert | Combines symbolic reasoning (graphs, logic) with neural networks for explainable, grounded AI. | Hybrid QA systems, rule + NN compositional models, knowledge‑grounded tutoring, graph reasoning pipelines for music ontology, explainable recommendation systems. |
| Agentic Multi‑Agent Systems | Expert | Orchestration of multiple agents (LLMs) to plan, act, and collaborate for complex tasks and workflows. | Practice coach agents, multi‑agent composition systems, dataset curation agents, long‑term tutoring with memory, autonomous research assistants. |
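
The retrieval half of the RAG row can be sketched with FAISS as below. The toy documents and random vectors stand in for a real embedding model; in practice you would embed documents and queries with the same encoder and prepend the retrieved passages to the LLM prompt.

```python
# Skeleton of the retrieval step in a RAG pipeline (requires faiss-cpu and numpy).
import faiss
import numpy as np

docs = ["Raag Yaman uses a sharp Ma.",
        "Raag Bhairav is a morning raga.",
        "A bandish is a fixed composition."]
dim = 32
rng = np.random.default_rng(0)
doc_vecs = rng.standard_normal((len(docs), dim)).astype("float32")  # placeholder embeddings

index = faiss.IndexFlatL2(dim)  # exact nearest-neighbor search over L2 distance
index.add(doc_vecs)

query_vec = doc_vecs[1:2] + 0.01  # pretend query embedding, near document 1
distances, ids = index.search(query_vec, 2)
context = [docs[i] for i in ids[0]]
print(context)  # these passages would be stuffed into the LLM prompt
```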
Tools, Libraries & Platforms
Common tools you’ll use across models:
| Area | Tools / Libraries |
|---|---|
| General ML | scikit‑learn, pandas, numpy, scipy |
| Deep Learning | PyTorch, TensorFlow, Keras, JAX |
| Transformers & LLMs | Hugging Face Transformers, 🤗 Datasets, PEFT (LoRA and other adapters), Hugging Face Hub |
| GNN | PyTorch Geometric, DGL |
| Retrieval & Vectors | FAISS, Milvus, Annoy, Pinecone, Weaviate, LlamaIndex, LangChain |
| RL | Gymnasium, Stable Baselines3, RLlib |
| Generative | diffusers, Magenta, torchaudio, plus reference implementations of Jukebox and WaveNet |
| Deployment | Docker, FastAPI, TorchServe, BentoML, Kubernetes (see the FastAPI sketch below this table) |
| MLOps | Weights & Biases, MLflow, TensorBoard |
| Graph DB | Neo4j, TigerGraph |
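
As a closing example for the Deployment row, here is a minimal FastAPI serving sketch; the /predict endpoint and its stand-in scoring logic are hypothetical placeholders for a real loaded model.

```python
# Minimal model-serving sketch with FastAPI (requires fastapi and uvicorn).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]  # request body: one feature vector

@app.post("/predict")
def predict(features: Features):
    # Stand-in for model.predict(...); swap in a loaded scikit-learn or PyTorch model.
    score = sum(features.values) / max(len(features.values), 1)
    return {"score": score}

# Run with: uvicorn app:app --reload   (assuming this file is saved as app.py)
```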