# $REPO_NAME

$REPO_DESCRIPTION

## Getting Started

These instructions will get a copy of the project up and running on your local machine for development and testing.

### Prerequisites

Requirements for the software and tools needed to build, test, and push:

If you have an NVIDIA GPU, make sure the CUDA Toolkit is installed so your GPU can be used.
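
To check that the driver and toolkit are visible before building the container, the standard NVIDIA utilities can be used (assuming they are on your `PATH`):

```sh
# Shows the driver version and attached GPUs
nvidia-smi
# Shows the installed CUDA compiler version
nvcc --version
```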

This repository is built as a VS Code Dev Container: when you open the folder in Visual Studio Code and run the **Dev Containers: Reopen in Container** command, a Docker container is started, and all installation and development happens inside it. This abstraction layer gives everyone a standardized environment.

Make sure to set the MinIO alias in `.devcontainer/.env`, using the following format:

```
MC_HOST_origin=https://USERNAME:PASSWORD@s3.lukas-gysin.ch
```
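
Since the MinIO client reads `MC_HOST_<alias>` environment variables directly, you can confirm the alias resolves by listing the buckets it can see:

```sh
# Lists all buckets reachable through the `origin` alias
mc ls origin
```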

> 📌 **Dataset selection**
> Make sure the correct dataset is selected. By default, each ML project has its own dataset. If you want to use a shared dataset, edit the `DATASET` variable in the `Makefile`, as shown below. This only needs to be configured once, directly after cloning this template.
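
For example, pointing the project at a hypothetical shared dataset would mean changing the variable to something like `DATASET = dataset-shared` (the name here is a placeholder, not a real bucket).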

> 📌 **MinIO Bucket**
> If you are not working with an existing dataset, make sure the MinIO bucket exists:

```sh
mc mb origin/dataset-${REPO_NAME_KEBAB}
```
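
The Makefile's sync logic keeps the bucket and the local `data/` directory in step; done by hand with the MinIO client, a pull would look roughly like this (the local target directory `data/raw` is an assumption about where raw data should land):

```sh
# Mirror the remote dataset bucket into the local raw-data folder
mc mirror origin/dataset-${REPO_NAME_KEBAB} data/raw
```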

After cloning the repository, update pip, install the Python requirements, and configure the Git filter:

```sh
make requirements
git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'
```
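
The filter only applies to files that `.gitattributes` routes through it; based on the repository history, notebooks are already mapped there with a line along the lines of `*.ipynb filter=strip-notebook-output`. With that in place, notebook outputs and metadata are stripped on commit, which keeps diffs reviewable.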

## Project Organization

```
├── LICENSE            <- MIT Open-source license
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         src and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── src                <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes `src` a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    │
    └── plots.py                <- Code to create visualizations
```

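The `src` modules are meant to be run as steps of a pipeline. Assuming each module exposes a command-line entry point (an assumption about this template, not something the tree guarantees), a full run could look like:

```sh
python -m src.dataset          # download or generate the raw data
python -m src.features         # turn raw data into model features
python -m src.modeling.train   # train a model
python -m src.modeling.predict # run inference with the trained model
```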
## Authors

## License

This project is licensed under the MIT License; see the `LICENSE` file for details.

## Acknowledgments