
⚠️ Unstable: Project Still Under Development ⚠️
Sonnet Scripts is a collection of pre-built data architecture patterns that you can quickly spin up on a local machine, along with examples of real-world data that you can use with it.
One of the challenges of making content and tutorials on data is the lack of established data infrastructure and real-world datasets. We found ourselves repeating the same setup process over and over, so we created an open-source repo to expedite it.
According to the Academy of American Poets, a "...sonnet is a fourteen-line poem written in iambic pentameter, employing one of several rhyme schemes, and adhering to a tightly structured thematic organization." Through the constraints of a particular sonnet format, poets have pushed their creativity for centuries, with William Shakespeare being one of the most well-known. Data architectures fill a similar role, where specific patterns push data practitioners to think of creative ways to solve business problems.
Welcome to Sonnet Scripts, a fully containerized environment designed for data analysts, analytics engineers, and data engineers to experiment with databases, queries, and ETL pipelines. This repository provides a pre-configured sandbox where users can ingest data, transform it using SQL/Python, and test integrations with PostgreSQL, DuckDB, MinIO, and more!
This project is ideal for anyone who wants to prototype data pipelines without building infrastructure from scratch.
Before setting up the environment, ensure you have the following installed:
Docker & Docker Compose
Make (for automation; on Windows: choco install make)
Python (3.12+)
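As a quick sanity check for the Python requirement, a short stdlib-only snippet (nothing project-specific) can confirm your interpreter version:

```python
# Check that the local interpreter satisfies the Python 3.12+ requirement.
import sys

ok = sys.version_info >= (3, 12)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if ok else 'upgrade needed'}")
```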
git clone https://github.com/onthemarkdata/sonnet-scripts.git
cd sonnet-scripts
make setup
This will build the Docker images and prepare the local environment.
make load-db
make verify-db
make test
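As a rough illustration of what a verification step like make verify-db typically checks (the actual target may differ), the sketch below confirms that expected tables exist and are non-empty. sqlite3 stands in for the project's PostgreSQL instance, and the events table is a made-up example:

```python
import sqlite3

def verify_tables(conn, expected_tables):
    """Return {table: row_count}, raising if any expected table is empty."""
    counts = {}
    for table in expected_tables:
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if n == 0:
            raise RuntimeError(f"table {table} loaded no rows")
        counts[table] = n
    return counts

# Stand-in database with one toy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "signup"), (2, "login")])
print(verify_tables(conn, ["events"]))  # {'events': 2}
```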
make exec-pythonbase
make exec-postgres
make exec-duckdb
make exec-pipelinebase
make load-db-postgres-to-minio
This command exports data from PostgreSQL and loads it into MinIO.
make load-db-minio-to-duckdb
make check-minio
make check-duckdb
make run-all-data-pipelines
This runs the entire ETL process from PostgreSQL to MinIO to DuckDB.
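The shape of that pipeline (extract from a relational source, land in object storage, load into an analytics database) can be sketched in a self-contained way. Here local objects and sqlite3 stand in for PostgreSQL, MinIO, and DuckDB, which are assumed services of the stack, not part of this snippet:

```python
import csv
import io
import sqlite3

def extract(rows):
    """Extract: serialize source rows to CSV, as if exporting from PostgreSQL."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def land(bucket, key, payload):
    """Land in 'object storage': a dict keyed like s3://bucket/key stands in for MinIO."""
    bucket[key] = payload

def load_warehouse(conn, payload):
    """Load: parse the landed CSV and insert it into the analytics database."""
    rows = list(csv.reader(io.StringIO(payload)))
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

bucket = {}
land(bucket, "raw/events.csv", extract([(1, "signup"), (2, "login")]))
conn = sqlite3.connect(":memory:")
load_warehouse(conn, bucket["raw/events.csv"])
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

Each stage only sees the previous stage's artifact, which is what makes the steps independently re-runnable.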
make stop
make rebuild
make rebuild-clean
This removes all containers, volumes, and images before rebuilding from scratch.
make status
make logs
For a specific container: make logs c=container_name
📁 sonnet-scripts
├── 📁 pythonbase/         # Python-based processing container
├── 📁 pipelinebase/       # ETL pipeline and data ingest container
├── 📁 linuxbase/          # Base container for Linux dependencies
├── 📁 jupyterbase/        # Jupyter container for analytics and data science
├── 📁 cli/                # Sonnet CLI tool
├── 🐳 docker-compose.yml  # Container orchestration
├── 📄 Makefile            # Automation commands
└── 📄 README.md           # You are here!
The Sonnet CLI lets you scaffold and run your own local Modern Data Stack projects anywhere on your machine. Zero to running SQL in under 5 minutes.
make install-cli
# Create a project with default services (pgduckdb + pgadmin)
sonnet init myproject
# Or interactively select which services to include
sonnet init myproject --interactive
cd myproject
# Start all services
sonnet up
# Check status
sonnet status
# Stop all services
sonnet down
| Service | Description | Port |
|---------|-------------|------|
| pgduckdb | PostgreSQL with DuckDB extension | 5432 |
| pgadmin | pgAdmin 4 web interface | 8080 |
| cloudbeaver | CloudBeaver web interface | 8978 |
| minio | S3-compatible object storage | 9000, 9001 |
| jupyterbase | Jupyter Lab for Python/SQL | 8888 |
| pipelinebase | ETL pipelines and data loading | - |
| dbtbase | dbt Core for transformations | - |
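The pgduckdb service above is addressable with a standard PostgreSQL connection URL. The stdlib can split such a URL into driver parameters without any database driver; the postgres:postgres credentials below are the README's example defaults, not something the table guarantees:

```python
from urllib.parse import urlparse

url = urlparse("postgresql://postgres:postgres@localhost:5432/postgres")
params = {
    "user": url.username,
    "password": url.password,
    "host": url.hostname,
    "port": url.port,
    "dbname": url.path.lstrip("/"),
}
print(params)
# {'user': 'postgres', 'password': 'postgres', 'host': 'localhost',
#  'port': 5432, 'dbname': 'postgres'}
```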
After running sonnet up, access your stack:
postgresql://postgres:postgres@localhost:5432/postgres

GitHub Actions automates builds, tests, and environment validation. The pipeline:
Builds the base images (pythonbase, linuxbase)
Brings up the stack with docker compose
Runs the test suite (make test)
Triggers on pushes to main or feature/* branches, with deployments from main

Want to improve Sonnet Scripts? Here's how:
For major changes, please open an issue first to discuss your proposal.
We follow Conventional Commits for all commit messages.
Maintained by:
Juan Pablo Urrutia (GitHub: jpurrutia, LinkedIn: Juan Pablo Urrutia)
Mark Freeman (GitHub: onthemarkdata, LinkedIn: Mark Freeman II)
If you have questions or encounter issues, feel free to open an issue on GitHub.
Happy data wrangling!