Omar Hosney
Scikit-learn Cheatsheet
Supervised Learning 📊
- Linear Regression: Fit a linear model.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X, y)
- Logistic Regression: Classification using the logistic function.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
- Decision Tree: Non-linear decision boundaries.
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X, y)
- Support Vector Machine: Classification using hyperplanes.
from sklearn.svm import SVC
model = SVC()
model.fit(X, y)
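The snippets above stop at fit. A minimal sketch of the full fit, predict, and evaluate loop follows. It assumes synthetic data from make_classification in place of a real X and y, and the 25% hold-out split is an illustrative choice, not something the cheatsheet prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data stands in for the X and y used above (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 25% of the rows so the score is measured on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict returns one class label per test row.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"held-out accuracy: {accuracy:.2f}")
```

The same fit/predict pattern should apply to DecisionTreeClassifier and SVC. For LinearRegression, a regression metric such as r2_score would replace accuracy_score.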
Unsupervised Learning 🌐
- K-Means: Clustering data into K groups.
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
- PCA: Dimensionality reduction technique.
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
- DBSCAN: Density-based clustering.
from sklearn.cluster import DBSCAN
dbscan = DBSCAN()
dbscan.fit(X)
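After fitting, the cluster assignments live on the estimator itself. The sketch below shows where to read them, assuming synthetic blobs from make_blobs. The eps, min_samples, n_init, and random_state values are illustrative choices.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three blobs (an assumption for illustration).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)
print(kmeans.labels_[:10])            # cluster index for the first 10 rows
print(kmeans.cluster_centers_.shape)  # (3, 2): one centroid per cluster

dbscan = DBSCAN(eps=0.5, min_samples=5)
dbscan.fit(X)
# DBSCAN marks points that belong to no dense region with the label -1.
n_noise = int((dbscan.labels_ == -1).sum())
print(f"noise points: {n_noise}")
```

Unlike KMeans, DBSCAN does not take a cluster count. The number of clusters it finds depends on eps and min_samples.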
Data Preprocessing 🔄
- Standardization: Scale features to zero mean and unit variance.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
- Normalization: Rescale each feature to the range [0, 1] (min-max scaling).
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
- Encoding Categorical: Convert categories to one-hot (binary indicator) columns; the result is a sparse matrix by default.
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(X)
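To make the effect of each transformer concrete, here is a small sketch on hand-made arrays. The names X_num and X_cat and their values are hypothetical, chosen so the results are easy to check by eye.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical numeric features: two columns on different scales.
X_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Each column ends up with mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X_num)

# Each column becomes [0, 0.5, 1].
X_normalized = MinMaxScaler().fit_transform(X_num)

# Hypothetical categorical feature with two distinct values.
X_cat = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder()
X_encoded = encoder.fit_transform(X_cat)  # sparse matrix by default

print(encoder.categories_)   # categories are sorted: 'green', then 'red'
print(X_encoded.toarray())   # one indicator column per category
```

In practice the transformer is usually fitted on training data only and then applied to new data with transform, so that test rows do not influence the learned statistics.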
Ensemble Methods 🧩
- Random Forest: Ensemble of decision trees.
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)
- Gradient Boosting: Sequential tree boosting.
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X, y)
- AdaBoost: Adaptive boosting method.
from sklearn.ensemble import AdaBoostClassifier
model = AdaBoostClassifier()
model.fit(X, y)
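One way to compare ensemble models is cross-validation, and tree ensembles also expose per-feature importances after fitting. The sketch below assumes synthetic data. The n_estimators, cv, and random_state values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for the X and y used above (an assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation: one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.2f}")

# cross_val_score fits clones, so fit the model itself before inspecting it.
model.fit(X, y)
importances = model.feature_importances_  # one value per feature, sums to 1
print(importances)
```

GradientBoostingClassifier can be swapped in for the same comparison, since it follows the same estimator interface.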
Export Model 💾
- Save Model: Save trained model to a file.
import joblib
joblib.dump(model, 'model.pkl')
- Load Model: Load trained model from a file.
import joblib
model = joblib.load('model.pkl')
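A quick round-trip check can confirm that a saved model behaves the same after loading. This sketch assumes the standalone joblib package (installed alongside recent scikit-learn releases), synthetic data, and a temporary directory so no file is left behind.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a small model to persist (assumptions for illustration).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(model, path)       # serialize the fitted estimator
    restored = joblib.load(path)   # read it back as a new object

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Saved models are pickles under the hood, so they are best loaded with the same scikit-learn version that wrote them, and only from trusted sources.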
Pipeline ⏩
- Pipeline: Chain preprocessing and modeling steps.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
pipeline = Pipeline([('scaler', StandardScaler()), ('model', SVC())])
pipeline.fit(X, y)
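A pipeline behaves like a single estimator: fit learns every step on the training data, and score or predict pushes new data through the same fitted steps. The sketch below assumes synthetic data and a default train/test split.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data stands in for the X and y used above (an assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([("scaler", StandardScaler()), ("model", SVC())])

# The scaler is fitted on the training rows only, then the SVC is trained
# on the scaled output.
pipeline.fit(X_train, y_train)

# score scales X_test with the training statistics before predicting.
test_accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")

# Individual fitted steps stay accessible by name.
scaler_mean = pipeline.named_steps["scaler"].mean_
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which is easy to do by accident when scaling the full dataset up front.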
Dimensionality Reduction 🔻
- PCA: Principal Component Analysis.
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
- LDA: Linear Discriminant Analysis (supervised; n_components can be at most min(n_features, n_classes - 1)).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
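The two reducers can be compared side by side on a dataset with at least three classes, which LDA needs for n_components=2. This sketch assumes the iris dataset bundled with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris: 150 samples, 4 features, 3 classes (bundled, no download needed).
X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it ignores y and keeps the directions of most variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # share of variance kept per component

# LDA is supervised: it uses y to find directions that separate the classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

With only two classes, LDA could produce at most one component, so n_components=2 would raise an error on binary data.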