Omar Hosney
CatBoost Python Library Cheat Sheet 🐱🚀
1. Getting Started
- Installation:
pip install catboost
- Importing:
import catboost as cb
- Basic usage:
- Classifier:
model = cb.CatBoostClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
- Regressor:
model = cb.CatBoostRegressor()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
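For probability estimates instead of hard class labels, the classifier also exposes predict_proba:
probas = model.predict_proba(X_test)  # one column of probabilities per class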
2. Basic Concepts
- What is CatBoost? A gradient boosting library for machine learning, developed by Yandex.
- Differences: Handles categorical features natively, with no manual encoding (see the sketch after this list), and uses ordered boosting to reduce target leakage.
- Key features: Fast training, GPU support, built-in feature importance.
- Pros:
- Excellent performance on categorical data
- Less prone to overfitting
- Handles missing values automatically
- Fast prediction time
- Built-in GPU acceleration
- Cons:
- Can be slower to train than other boosting algorithms
- May require more memory for large datasets
- Less community support compared to some alternatives
- Fewer customization options for advanced users
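A minimal sketch of the categorical-handling pro above, with made-up column names: CatBoost consumes raw string categories directly, with no one-hot or label encoding required.
import pandas as pd
import catboost as cb
df = pd.DataFrame({
    'city': ['london', 'paris', 'paris', 'berlin'],  # raw strings, left unencoded
    'income': [30, 45, 50, 38],
    'target': [0, 1, 1, 0],
})
model = cb.CatBoostClassifier(iterations=50, verbose=0)  # task_type='GPU' would enable GPU training
model.fit(df[['city', 'income']], df['target'], cat_features=['city'])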
3. Model Training
- Training:
model = cb.CatBoostClassifier(iterations=1000, learning_rate=0.1)
model.fit(X_train, y_train, cat_features=['category1', 'category2'])
- Parameters: Set learning_rate, iterations, and depth.
- Categorical features: Specify with the cat_features parameter.
- Missing values: Handled automatically for numerical features (see the sketch below).
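A sketch tying the bullets above together (X_val/y_val are an assumed held-out split): numeric NaNs flow through untouched, while an eval_set plus early stopping reins in overfitting.
model = cb.CatBoostClassifier(iterations=1000, learning_rate=0.1)
model.fit(
    X_train, y_train,                        # numeric columns may contain np.nan
    cat_features=['category1', 'category2'],
    eval_set=(X_val, y_val),                 # validation data for the overfitting detector
    early_stopping_rounds=50,                # stop after 50 rounds without improvement
)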
4. Model Evaluation
- Metrics (see also the eval_metrics sketch after this section):
from sklearn.metrics import accuracy_score, f1_score
accuracy = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, average='weighted')
- Cross-validation: CatBoost models have no cross_validate method; use scikit-learn's helper (or catboost.cv):
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
- Hyperparameter tuning:
from sklearn.model_selection import GridSearchCV
params = {'depth': [4, 6, 8], 'learning_rate': [0.01, 0.1]}
grid_search = GridSearchCV(model, params, cv=3)
grid_search.fit(X, y)
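As referenced under Metrics: a fitted model can also score itself with CatBoost's own metric implementations via eval_metrics; the Pool below reuses the hypothetical categorical columns from section 3.
from catboost import Pool
test_pool = Pool(X_test, y_test, cat_features=['category1', 'category2'])
scores = model.eval_metrics(test_pool, metrics=['Accuracy', 'AUC'])  # per-iteration values
print(scores['Accuracy'][-1])  # metric at the final iteration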
5. Model Interpretation
- Feature importance:
importances = model.get_feature_importance()
- Partial dependence: calc_partial_dependence is not a CatBoost method; newer releases provide plot_partial_dependence for one or two features:
from catboost import Pool
model.plot_partial_dependence(Pool(X, y, cat_features=['category1', 'category2']), features=['feature1'])
- SHAP values:
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
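Two quick follow-ups for this section: shap.summary_plot gives a global view of the SHAP values, and CatBoost itself can produce SHAP values and name-labelled importances.
shap.summary_plot(shap_values, X)  # global per-feature impact plot
from catboost import Pool
shap_builtin = model.get_feature_importance(Pool(X, y), type='ShapValues')  # last column = expected value
importance_df = model.get_feature_importance(prettified=True)  # DataFrame with feature names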
6. Hyperparameter Tuning
- Grid search: See Model Evaluation section.
- Random search:
from sklearn.model_selection import RandomizedSearchCV
param_distributions = {'depth': [4, 6, 8, 10], 'learning_rate': [0.01, 0.05, 0.1]}
random_search = RandomizedSearchCV(model, param_distributions, n_iter=10, cv=3)
random_search.fit(X, y)
- Bayesian optimization:
import optuna
from sklearn.model_selection import cross_val_score

def objective(trial):
    params = {
        'depth': trial.suggest_int('depth', 4, 10),
        # suggest_loguniform is deprecated; suggest_float(..., log=True) replaces it
        'learning_rate': trial.suggest_float('learning_rate', 1e-3, 1.0, log=True),
        'verbose': 0,  # silence per-iteration logging inside the search
    }
    model = cb.CatBoostClassifier(**params)
    # CatBoost models have no cross_validate method; score with scikit-learn instead
    return cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
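Once the study finishes, study.best_params holds the winning configuration and can be passed straight back into the constructor:
best_model = cb.CatBoostClassifier(**study.best_params)
best_model.fit(X_train, y_train)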