A powerful desktop application to scrape, generate synthetic data from documents, merge, analyze, build datasets, and fine-tune models with an Unsloth-based LoRA training stack. Train on RunPod or locally, run fully local inference, then ship to Hugging Face Hub.
A complete desktop studio for ML dataset curation and model fine-tuning
Recent updates:
- The SQLite database (finefoundry.db) is now the sole storage mechanism for all application data, with an app_logs table for queryable access
- ff_settings.json — settings now in database
- src/saved_configs/ — training configs now in database
- logs/ directory — logs now in database
- --verbose flag for detailed debug output during generation
- --config flag to load options from YAML config files
- --keep-server flag for model caching between batch runs (10x faster subsequent runs)

FineFoundry is a native desktop application built with Flet that streamlines the entire machine learning data workflow. From scraping raw data to fine-tuning models, everything happens in one intuitive interface.
Everything you need for dataset curation and model fine-tuning
Scrape from 4chan, Reddit, Stack Exchange, or generate synthetic data from PDFs and documents using local LLMs.
Combine multiple database sessions (and optionally Hugging Face datasets when online) into unified training sets with automatic column mapping.
Analyze datasets with sentiment, toxicity, duplicates, class balance, and data leakage detection modules.
Create train/val/test splits and push to Hugging Face Hub with auto-generated dataset cards.
Train models on RunPod or locally via Docker using an Unsloth-based LoRA fine-tuning stack with shared configs and outputs.
Parameter-efficient fine-tuning with LoRA via Unsloth, packing support, and automatic checkpoint resumption.
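The train/val/test split mentioned above can be sketched in plain Python. This is an illustrative sketch only, not FineFoundry's implementation (the function name, fractions, and seed are all assumptions; the app may instead use a library such as Hugging Face datasets):

```python
import random

def split_dataset(rows, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle rows deterministically, then slice into train/val/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return {
        "test": rows[:n_test],
        "val": rows[n_test:n_test + n_val],
        "train": rows[n_test + n_val:],
    }

splits = split_dataset(range(100))
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # 80 10 10
```

A fixed seed keeps splits reproducible across runs, which matters when a dataset is re-pushed to the Hub.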
Built with Flet for cross-platform native UI
LoRA fine-tuning with PyTorch, Transformers, PEFT, bitsandbytes
Seamless Hugging Face Hub push and pull
Automated pod and network volume management
All processing happens locally on your machine
Quote-chain, cumulative, and adjacent modes
Use as GUI or programmatic API
Open source and free to use
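The adjacent and cumulative pairing modes listed above can be sketched as follows. This is a simplified illustration, not FineFoundry's code (quote-chain mode is omitted because it requires parsing in-thread quote references; function names are hypothetical):

```python
def pair_adjacent(posts):
    """Each consecutive (post, reply) becomes one input/output pair."""
    return [(posts[i], posts[i + 1]) for i in range(len(posts) - 1)]

def pair_cumulative(posts):
    """The thread so far is the input; the next post is the output."""
    return [("\n".join(posts[:i]), posts[i]) for i in range(1, len(posts))]

thread = ["first post", "first reply", "second reply"]
print(pair_adjacent(thread))
# [('first post', 'first reply'), ('first reply', 'second reply')]
print(pair_cumulative(thread))
# [('first post', 'first reply'), ('first post\nfirst reply', 'second reply')]
```

Cumulative pairing yields longer, context-rich inputs; adjacent pairing yields more, shorter pairs from the same thread.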
Everything you need to get started with FineFoundry
Clone the repo and run with uv, or use a classic virtualenv.
# Recommended: uv (matches project docs)
git clone https://github.com/SourceBox-LLC/FineFoundry.git FineFoundry-Core
cd FineFoundry-Core
# Install uv if needed
pip install uv
# Run the app (creates an isolated env and installs deps)
uv run src/main.py
# Alternative: classic venv + pip
python -m venv venv
# Activate (Windows PowerShell)
./venv/Scripts/Activate.ps1
# Activate (macOS/Linux)
source venv/bin/activate
# Install dependencies
pip install -e .
Start the desktop application
# If using uv (recommended)
uv run src/main.py
# If using a virtualenv + pip
python src/main.py
# Or use Flet directly
flet run src/main.py
The desktop window will open with tabs for Data Sources, Dataset Analysis, Merge Datasets, Training, Inference, Publish, and Settings.
Collect conversational training data from multiple sources with configurable pairing modes.
Multi-board scraping with quote-chain and cumulative pairing
Subreddits or single posts with parent-child threading
Q&A pairs from accepted answers
Generate Q&A, CoT, or summaries from PDFs/docs using local LLMs
Publish datasets and LoRA adapters (Phase 1) directly to Hugging Face Hub.
Specify a target repo ID (e.g., username/my-dataset).

Fine-tune LLMs using an Unsloth-based LoRA training stack on RunPod or locally via Docker.
Cloud GPU training with automated pod and network volume management
Train on your local GPU using the same Unsloth trainer image
Both targets use docker.io/sbussiso/unsloth-trainer:latest with shared training configs and outputs.
Run local inference against adapters from completed training runs with prompt history and Full Chat View.
Powered by the same stack as training:
Transformers (AutoModelForCausalLM, AutoTokenizer) for model loading
PEFT (PeftModel) for adapter loading

Combine multiple database sessions (and optionally Hugging Face datasets when online) into a unified training set.
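The automatic column mapping behind the merge step can be sketched as below. This is a minimal illustration under assumed names (the real merger's schema detection and Hub integration are more involved):

```python
def merge_sessions(sessions, column_map):
    """Rename each session's columns to a shared schema, then concatenate.

    column_map maps source column names to target names,
    e.g. {"question": "input", "answer": "output"}.
    """
    merged = []
    for rows in sessions:
        for row in rows:
            merged.append({column_map.get(k, k): v for k, v in row.items()})
    return merged

session_a = [{"question": "hi?", "answer": "hello"}]   # one schema
session_b = [{"input": "bye?", "output": "bye"}]       # already in target schema
merged = merge_sessions([session_a, session_b],
                        {"question": "input", "answer": "output"})
print(merged)
# [{'input': 'hi?', 'output': 'hello'}, {'input': 'bye?', 'output': 'bye'}]
```

Columns not listed in the map pass through unchanged, so sessions that already use the target schema merge cleanly.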
Comprehensive quality analysis with togglable modules for different metrics.
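One of the togglable checks, exact-duplicate detection after light normalization, might look like this sketch (field names and normalization rules are assumptions; the shipped module may use fuzzier matching):

```python
import hashlib

def find_duplicates(rows):
    """Return indices of rows whose normalized input/output was seen before."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        text = row["input"].strip().lower() + "\x00" + row["output"].strip().lower()
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes

rows = [
    {"input": "Hi?", "output": "Hello"},
    {"input": "hi? ", "output": "hello"},   # normalizes to a duplicate of row 0
    {"input": "Bye", "output": "Later"},
]
print(find_duplicates(rows))  # [1]
```

Hashing keeps memory bounded on large datasets, since only digests are retained rather than full pair texts.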
Automate FineFoundry workflows with command-line tools and Python APIs.
uv run src/scrapers/reddit_scraper.py \
--url https://www.reddit.com/r/AskReddit/ \
--max-posts 50 \
--mode contextual \
--pairs-path reddit_pairs.json
uv run src/save_dataset.py
from src.scrapers.fourchan_scraper import scrape
pairs = scrape(
board="pol",
max_threads=150,
max_pairs=5000,
mode="contextual",
strategy="cumulative"
)
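The returned pairs can then be persisted for the merge step. The exact shape of scrape()'s return value is not shown above, so the record below is a stand-in; only the JSON-dumping pattern is the point:

```python
import json

# Stand-in for the list returned by scrape(); the real record shape may differ.
pairs = [{"input": "example question", "output": "example answer"}]

with open("pol_pairs.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
```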
Run training jobs in containers on RunPod or locally.
Default image: docker.io/sbussiso/unsloth-trainer:latest
Cloud GPU training with automated infrastructure
Outputs are written under /data/outputs/...

Configure authentication, proxies, integrations, and run a built-in System Check diagnostics panel.
All settings and credentials are stored locally (finefoundry.db) and never sent to external servers.
Join the FineFoundry community
FineFoundry is open source and welcomes contributions! Whether you're adding new scrapers, improving analysis modules, enhancing the UI, or fixing bugs, your input is valuable.
Python 3.10+
Flet
Datasets (HF)
Docker
PyTorch
RunPod
Hugging Face Hub
REST APIs
Unsloth
Transformers
PEFT / LoRA
bitsandbytes
SyntheticDataKit
vLLM
SQLite