This code demonstrates how to use Optuna for hyperparameter optimization of machine learning models, covering SVM, RandomForest, and GradientBoosting. The objective is to jointly select a classifier and tune its hyperparameters within a single study so as to maximize model accuracy.
Steps covered:

Data Preprocessing: The code loads the Pima Indians Diabetes dataset, replaces zeros with NaN in columns where zero is not a physiologically valid value, and fills the missing values with column-wise means. It then splits the dataset into training and testing sets and applies standard scaling to the features.
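The preprocessing step can be sketched as follows. This is an illustrative version using a tiny synthetic DataFrame in place of the full Pima dataset; the column names `Glucose` and `BMI` stand in for the columns the original code cleans:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for the Pima dataset: zeros in "Glucose"/"BMI" are
# physiologically impossible and are treated as missing values.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89, 0, 137],
    "BMI":     [33.6, 26.6, 0.0, 28.1, 43.1, 30.5],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

cols_with_missing = ["Glucose", "BMI"]
# Replace invalid zeros with NaN, then impute with column-wise means.
df[cols_with_missing] = df[cols_with_missing].replace(0, np.nan)
df[cols_with_missing] = df[cols_with_missing].fillna(df[cols_with_missing].mean())

X = df.drop(columns="Outcome")
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler only on the training split avoids leaking test-set statistics into the model.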
Objective Function: An objective function is defined for the Optuna optimization. It selects one of the three models (SVM, RandomForest, or GradientBoosting) and tunes their respective hyperparameters (like C, kernel, gamma for SVM; n_estimators, max_depth, and others for RandomForest; and n_estimators, learning_rate, and others for GradientBoosting). The function returns the cross-validated accuracy score.
Optuna Study: The Optuna study is created with the direction set to maximize accuracy. The study runs 100 trials, and the best trial records both the winning classifier and its hyperparameter values.
Model Evaluation: After finding the best hyperparameters, the model is retrained using the optimal settings and evaluated on the test set to determine its accuracy.
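The retrain-and-evaluate step can be sketched like this. The synthetic data and the placeholder `best_params` dictionary are illustrative; in the original, the parameters come from `study.best_params`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Placeholder for study.best_params (assuming RandomForest won the search).
best_params = {"n_estimators": 150, "max_depth": 6}

# Retrain on the full training split with the optimal settings, then
# score once on the held-out test set.
model = RandomForestClassifier(**best_params, random_state=0)
model.fit(X_train, y_train)
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_acc:.3f}")
```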
Visualization: Multiple Optuna visualizations, such as optimization history, parallel coordinate plots, and parameter importance, are generated to help visualize the optimization process and model performance.
Comparison: The code also allows for the comparison of classifier performance, including grouping by classifier type and calculating the mean accuracy for each model.