Comprehensive machine learning cheatsheet

Key concepts, techniques, and best practices for each stage of a typical machine learning workflow, from problem definition through deployment and monitoring.

Stage	Task/Concept	Description
1. Problem Definition	Define the Problem	Clearly articulate the problem to be solved.
	Understand Objectives	Specify the goals and objectives of the machine learning project.
	Formulate as ML Problem	Determine if the problem is suitable for machine learning and identify the type of ML problem (classification, regression, clustering, etc.).
	Data Availability	Assess the availability and quality of data needed for the project.
	Data-driven vs. Model-driven	Decide whether the problem requires a data-driven or model-driven approach.
	Define Success Criteria	Establish how success will be measured. Specify relevant evaluation metrics (accuracy, precision, recall, etc.).
	Consider Constraints	Identify any constraints or limitations in the project, such as budget, time, or resource constraints.
	Stakeholder Involvement	Involve stakeholders and domain experts to gain insights into the problem domain. Understand the business context and requirements.
	Ethical Considerations	Consider ethical implications, fairness, and potential biases in the data. Ensure compliance with regulations and ethical standards.
	Iterative Refinement	Problem definition is often an iterative process. Refine the problem definition as you gain more insights and data.
2. Data Collection	Identify Data Sources	Identify and gather relevant data sources for your machine learning project.
	Data Exploration	Explore and visualize the data to gain insights.
	Data Cleaning	Handle missing values, outliers, and other data quality issues.
3. Data Preprocessing	Feature Engineering	Create relevant features that contribute to model performance.
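	Data Scaling	Standardize or normalize numerical features. Fit the scaler on the training set only and apply it unchanged to the test set, so that test statistics do not leak into training.
	Categorical Encoding	Convert categorical variables into a numerical format: one-hot encoding for unordered categories, ordinal (label) encoding for categories with a natural order.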
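4. Data Splitting	Train-Test Split	Split the dataset into training and testing sets, and keep the test set untouched until final evaluation. Use a separate validation set or cross-validation for model tuning.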
5. Model Selection	Choose Model	Select a model based on the nature of the problem (classification, regression).
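	Hyperparameter Tuning	Tune hyperparameters (settings fixed before training, such as regularization strength or tree depth) with a validation set or cross-validation, never on the test set.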
	Baseline Model	Establish a simple baseline model for comparison.
6. Model Training	Fit Model	Train the chosen model on the training data.
	Cross-Validation	Estimate how well the model generalizes by training and scoring it on several train/validation folds (for example, k-fold cross-validation).
7. Model Evaluation	Metrics	Choose evaluation metrics that match the task: accuracy, precision, recall, and F1-score for classification; MAE, RMSE, and R² for regression.
	Confusion Matrix	Analyze model performance with a confusion matrix.
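	ROC Curve (if applicable)	For binary classification, plot the true positive rate against the false positive rate across decision thresholds; summarize the curve with the area under it (AUC).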
8. Model Interpretability	Feature Importance	Understand the impact of different features on model predictions.
	Explainability	Explain individual predictions with techniques such as LIME or SHAP.
9. Model Deployment	Deploy Model	Package the trained model together with its preprocessing steps and dependencies for use in a production environment.
	API Integration	Create APIs for integrating the model into applications.
10. Monitoring and Maintenance	Monitoring	Track model performance and shifts in the input data distribution once the model is in production.
	Update Model (if necessary)	Revisit and update the model periodically based on new data and insights.
11. Common Libraries	NumPy, Pandas	Data manipulation and analysis.
	Scikit-Learn	Machine learning models, preprocessing, and evaluation.
	Matplotlib, Seaborn	Data visualization.
	TensorFlow, PyTorch	Deep learning frameworks.
12. Additional Resources	Books, Courses	Invest time in learning from reputable books and online courses.
	Community and Forums	Engage with the machine learning community for support and knowledge sharing.
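
Worked example (stages 3 to 7). The following is a minimal sketch of how the preprocessing, splitting, baseline, tuning, and evaluation rows above can fit together in Scikit-Learn. It is one possible arrangement, not a prescribed workflow. The dataset (Scikit-Learn's bundled breast cancer data), the logistic regression model, the grid of C values, and the F1 tuning metric are all assumptions chosen for illustration. The data is entirely numeric, so categorical encoding is not shown.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): binary classification, numeric features only.
X, y = load_breast_cancer(return_X_y=True)

# Stage 4: hold out a stratified test set before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Stage 5: baseline that always predicts the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Stage 3: scaling sits inside the pipeline, so the scaler is fit only on
# the training portion of each cross-validation fold (no leakage).
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stages 5 and 6: tune the regularization strength C with 5-fold
# cross-validation on the training data; the best setting is then
# refit on the full training set.
search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

# Stage 7: evaluate once on the untouched test set.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]  # probability of class 1

model_acc = accuracy_score(y_test, y_pred)
model_f1 = f1_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted
auc = roc_auc_score(y_test, y_score)

print("Best C:", search.best_params_["clf__C"])
print("Baseline accuracy:", round(baseline_acc, 3))
print("Model accuracy:", round(model_acc, 3))
print("Model F1:", round(model_f1, 3))
print("ROC AUC:", round(auc, 3))
print("Confusion matrix:")
print(cm)
```

Keeping the scaler inside the pipeline is a design choice: it lets cross-validation refit the scaler per fold, which is harder to get right when scaling is applied to the whole dataset up front. Comparing against the baseline shows how much of the accuracy comes from the model rather than from class imbalance.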