# $REPO_NAME

$REPO_DESCRIPTION

## Getting Started

These instructions will get a copy of the project up and running on your local machine for development and testing.

### Prerequisites

Requirements for the software and tools needed to build, test, and push:

If you have an NVIDIA GPU, make sure the CUDA Toolkit is installed so your GPU can be used.
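
To check that the driver and toolkit are visible before building the container, the standard NVIDIA utilities can be used (assuming they are on your `PATH`):

```sh
# Shows the driver version and attached GPUs
nvidia-smi
# Shows the installed CUDA compiler version
nvcc --version
```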

This repository is built as a VS Code Dev Container: when you open the folder in Visual Studio Code and run the **Dev Containers: Reopen in Container** command, a Docker container is started, and all installation and development happens inside it. This abstraction layer gives everyone a standardized environment.

Make sure to set the MinIO alias in `.devcontainer/.env`, using the following format:

```
MC_HOST_origin=https://USERNAME:PASSWORD@s3.lukas-gysin.ch
```
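
Since the MinIO client reads `MC_HOST_<alias>` environment variables directly, you can confirm the alias resolves by listing the buckets it can see:

```sh
# Lists all buckets reachable through the `origin` alias
mc ls origin
```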

> 📌 **Dataset selection**
> Make sure the correct dataset is selected. By default, each ML project has its own dataset. If you want to use a shared dataset, edit the `DATASET` variable in the `Makefile`, as shown below. This only needs to be configured once, directly after cloning this template.
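
For example, pointing the project at a hypothetical shared dataset would mean changing the variable to something like `DATASET = dataset-shared` (the name here is a placeholder, not a real bucket).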

> 📌 **MinIO Bucket**
> If you are not working with an existing dataset, make sure the MinIO bucket exists:

```sh
mc mb origin/dataset-${REPO_NAME_KEBAB}
```
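
The Makefile's sync logic keeps the bucket and the local `data/` directory in step; done by hand with the MinIO client, a pull would look roughly like this (the local target directory `data/raw` is an assumption about where raw data should land):

```sh
# Mirror the remote dataset bucket into the local raw-data folder
mc mirror origin/dataset-${REPO_NAME_KEBAB} data/raw
```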

After cloning the repository, update pip, install the Python requirements, and configure the Git filter:

```sh
make requirements
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
```
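
The filter only applies to files that `.gitattributes` routes through it; based on the repository history, notebooks are already mapped there with a line along the lines of `*.ipynb filter=strip-notebook-output`. With that in place, notebook outputs and metadata are stripped on commit, which keeps diffs reviewable.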

## Project Organization

```
├── LICENSE            <- MIT Open-source license
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         src and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── src                <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes `src` a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    │
    └── plots.py                <- Code to create visualizations
```

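The `src` modules are meant to be run as steps of a pipeline. Assuming each module exposes a command-line entry point (an assumption about this template, not something the tree guarantees), a full run could look like:

```sh
python -m src.dataset          # download or generate the raw data
python -m src.features         # turn raw data into model features
python -m src.modeling.train   # train a model
python -m src.modeling.predict # run inference with the trained model
```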
## Authors

## License

This project is licensed under the MIT License; see the `LICENSE` file for details.

## Acknowledgments