This repository demonstrates how to use GitHub Actions to automate the process of training a machine learning model, storing the model, and versioning it. This allows you to easily update and improve your model in a collaborative environment.
Prerequisites
- GitHub account
- Basic knowledge of Python and machine learning
- Git command-line tool (optional)
Getting Started
- Fork this Repository: Click the “Fork” button at the top right of this repository to create your own copy.
- Clone Your Repository:
  git clone https://github.com/your-username/your-forked-repo.git
  cd your-forked-repo
Running the Workflow
Customize Model Training
- Modify the `train_model.py` script in the `src/` directory according to your dataset and model requirements. This script generates synthetic data for demonstration purposes.
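For reference, a minimal `train_model.py` along these lines might look like the sketch below. The synthetic-data parameters, file names, and the joblib serialization format are assumptions; adapt them to the actual script in `src/`.

```python
# Hypothetical sketch of src/train_model.py: train a random forest on
# synthetic data and serialize it. Names and parameters are illustrative.
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generate synthetic classification data for demonstration purposes.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a random forest classifier on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Persist the trained model so the workflow can evaluate and version it.
Path("models").mkdir(exist_ok=True)
joblib.dump(model, "models/model.joblib")
```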
Push Your Changes:
- Commit your changes and push them to your forked repository.
GitHub Actions Workflow:
- Once you push changes to the main branch, the GitHub Actions workflow will be triggered automatically.
View Workflow Progress:
- You can track the progress of the workflow by going to the “Actions” tab in your GitHub repository.
Retrieve the Trained Model:
- After the workflow completes successfully, the trained model will be stored in the `models/` directory.
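For example, you could load the newest artifact locally with a snippet like the following. This assumes the models are serialized with joblib and named so that lexicographic order matches chronological order; adjust it to the actual file naming used by the workflow.

```python
# Hypothetical example of loading the most recent model from models/.
from pathlib import Path

import joblib

model_files = sorted(Path("models").glob("*.joblib"))
latest_path = model_files[-1]          # newest file, given timestamped names
latest_model = joblib.load(latest_path)
print(f"Loaded {latest_path}")
```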
Model Evaluation
The model evaluation is performed automatically within the GitHub Actions workflow. The evaluation results (e.g., F1 Score) are stored in the `metrics/` directory.
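A minimal `evaluate_model.py` along these lines might look like the sketch below; the data generation, file names, and JSON metrics format are assumptions rather than the repository's exact implementation.

```python
# Hypothetical sketch of src/evaluate_model.py: score the stored model on
# held-out synthetic data and write the F1 Score to metrics/.
import json
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Recreate the same synthetic data and keep only the held-out split.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
_, X_test, _, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = joblib.load("models/model.joblib")
f1 = f1_score(y_test, model.predict(X_test))

# Store the metric so the workflow can commit it to the repository.
Path("metrics").mkdir(exist_ok=True)
Path("metrics/scores.json").write_text(json.dumps({"f1_score": float(f1)}))
```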
Versioning the Model
Each time you run the workflow, a new version of the model is created and stored. You can access and use these models for your projects.
GitHub Actions Workflow Details
The workflow consists of the following steps:
- Generate and Store Timestamp: A timestamp is generated and stored in a file for versioning.
- Model Training: The `train_model.py` script is executed, which trains a random forest classifier on synthetic data and stores the model in the `models/` directory.
- Model Evaluation: The `evaluate_model.py` script is executed to evaluate the model’s F1 Score on synthetic data, and the results are stored in the `metrics/` directory.
- Store and Version the New Model: The trained model is moved to the `models/` directory with a timestamp-based version (a sketch follows this list).
- Commit and Push Changes: The metrics and updated model are committed to the repository, allowing you to track changes.
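The versioning step itself lives in the workflow definition; as a rough Python equivalent of the timestamp-based versioning (file names are assumed, not taken from the workflow), it boils down to:

```python
# Hypothetical sketch of the "store and version" step: copy the freshly
# trained model to a timestamped filename so every workflow run is kept.
import shutil
from datetime import datetime, timezone
from pathlib import Path

timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
Path("models").mkdir(exist_ok=True)
shutil.copy("models/model.joblib", f"models/model_{timestamp}.joblib")
```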
Model Calibration Workflow
Overview
The `model_calibration_on_push.yml` workflow automates machine learning model calibration in this repository. It ensures that the model’s predicted probabilities are accurate and well calibrated, a critical step in many machine learning applications.
Workflow Purpose
This workflow’s primary purpose is to calibrate a trained machine learning model after each push to the main branch of the repository. Calibration is a crucial step to align model predictions with reality, particularly when dealing with classification tasks. In simple terms, calibration ensures that a model’s predicted probabilities match the actual likelihood of an event happening.
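To make that concrete, a reliability curve compares predicted probabilities against observed outcome frequencies. The sketch below uses scikit-learn’s `calibration_curve` on placeholder data and a placeholder model, not the data or model from this repository:

```python
# Hypothetical check of how well predicted probabilities match reality.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# Each pair compares the observed positive rate in a bin with the mean
# predicted probability in that bin; a well-calibrated model keeps them close.
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)
for fp, mp in zip(frac_pos, mean_pred):
    print(f"predicted {mp:.2f} vs observed {fp:.2f}")
```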
Workflow Execution
Let’s break down how this workflow operates step by step:
Step 1: Trigger on Push to Main Branch
- This workflow is automatically initiated when changes are pushed to the main branch of the repository. It ensures that the model remains calibrated and up-to-date with the latest data and adjustments.
Step 2: Prepare Environment
- The workflow begins by setting up a Python environment and installing the necessary Python libraries and dependencies. This is crucial to ensure that the model calibration process can be executed without any issues.
Step 3: Load Trained Model
- The trained machine learning model, which was previously saved in the `models/` directory, is loaded into memory. This model should be the most recent version, as trained by the `train_model.py` script.
Step 4: Calibrate Model Probabilities
- In this step, the model’s predicted probabilities are calibrated. Calibration methods, such as Platt scaling or isotonic regression, are applied. These methods adjust the model’s predicted probabilities to match the actual likelihood of an event occurring. This calibration step is critical for reliable decision-making based on the model’s predictions.
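A minimal sketch of this step with scikit-learn’s `CalibratedClassifierCV` is shown below; the calibration data, file names, and the choice of `"sigmoid"` (Platt scaling) versus `"isotonic"` are assumptions, not the repository’s exact configuration.

```python
# Hypothetical sketch of the calibration step: wrap the trained model in
# CalibratedClassifierCV and save the result under a distinct name.
import joblib
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder calibration data; in practice use held-out real data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
_, X_cal, _, y_cal = train_test_split(X, y, test_size=0.3, random_state=42)

model = joblib.load("models/model.joblib")

# cv="prefit" calibrates an already-fitted model on held-out data
# (on scikit-learn 1.6+ you can wrap the model in FrozenEstimator instead).
calibrated = CalibratedClassifierCV(model, method="sigmoid", cv="prefit")
calibrated.fit(X_cal, y_cal)

joblib.dump(calibrated, "models/model_calibrated.joblib")
```

As a rule of thumb, Platt scaling (`"sigmoid"`) works well with little calibration data, while isotonic regression needs more data but makes fewer assumptions about the shape of the miscalibration.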
Step 5: Save Calibrated Model
- The calibrated model is saved back to the `models/` directory. It is given a distinct identifier to differentiate it from the original, uncalibrated models. This ensures that both the original model and the calibrated model are available for comparison and use.
Step 6: Commit and Push Changes
- This final step involves committing the calibrated model and any other relevant files to the repository. It is essential to keep track of the changes made during the calibration process and to store the calibrated model in the repository for future applications and reference.
Customization
The `model_calibration_on_push.yml` workflow can be customized to align with your specific project requirements. You can modify the calibration method, the directory where the calibrated model is saved, or any other aspect of the calibration process to meet your project’s unique needs.
Integration with Model Training
This workflow is designed to work seamlessly with the main model training workflow, `model_retraining_on_push.yml`. The first workflow trains the model; this one produces the calibrated model. The calibrated model can then be used in applications where precise, well-calibrated probabilities are essential.
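For instance, a downstream application might load the calibrated model and apply a probability threshold directly; a hypothetical usage sketch (file name, placeholder features, and the 0.8 threshold are illustrative only):

```python
# Hypothetical usage of the calibrated model in an application.
import joblib
import numpy as np

calibrated = joblib.load("models/model_calibrated.joblib")

new_samples = np.random.default_rng(0).normal(size=(5, 20))  # placeholder features
proba = calibrated.predict_proba(new_samples)[:, 1]

# With well-calibrated probabilities, a fixed threshold has a clear meaning.
flagged = proba >= 0.8
print(list(zip(proba.round(2), flagged)))
```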
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- This project uses GitHub Actions for continuous integration and deployment.
- Model training and evaluation are powered by Python and scikit-learn.
Questions or Issues
If you have any questions or encounter issues while using this GitHub Actions workflow, please open an issue in the Issues section of your repository.