Metadata-Version: 2.4
Name: rae-cli
Version: 0.2.6
Summary: CLI tool for scaffolding analytics engineering data stacks.
Home-page: https://github.com/CodeBroJim/rae
Author: James Reeves
Author-email: james@codebrojim.com
License: Proprietary
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Classifier: License :: Other/Proprietary License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: asciimatics==1.15.0
Requires-Dist: click==8.1.7
Requires-Dist: PyYAML==6.0.2
Requires-Dist: pyfiglet==1.0.2
Requires-Dist: questionary==2.0.1
Requires-Dist: rich==14.0.0
Provides-Extra: dev
Requires-Dist: black==24.10.0; extra == "dev"
Requires-Dist: coverage==7.6.10; extra == "dev"
Requires-Dist: flake8==7.1.1; extra == "dev"
Requires-Dist: isort==5.13.2; extra == "dev"
Requires-Dist: mypy==1.10.0; extra == "dev"
Requires-Dist: nodeenv==1.9.1; extra == "dev"
Requires-Dist: pre_commit==4.2.0; extra == "dev"
Requires-Dist: Pygments==2.19.1; extra == "dev"
Requires-Dist: pytest==8.3.4; extra == "dev"
Requires-Dist: pytest-cov==6.0.0; extra == "dev"
Requires-Dist: pytest-mock==3.14.0; extra == "dev"
Requires-Dist: virtualenv==20.30.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# RAE - Rapid Analytics Engineering

*Perfectly imperfect, for she was a Wildflower*

**RAE** is the first *opinionated* framework that is purpose-built for the **Analytics Engineering** community — inspired by the likes of backend web frameworks like **Django**, **Flask** and **NestJS**, but with a Data Engineering twist!

**RAE:**
  - Empowers teams, solo devs, students and individual engineers to **rapidly scaffold a modern analytics engineering stack** with nothing more than a few responses to CLI prompts. From zero to fully containerized infrastructure in minutes.
    - **Users can also opt to only scaffold 1 or 2 tools versus an entire stack**
  - Abstracts away the infrastructure, container and server knowledge required to set most tools up.
  
All so you can focus on **what matters most: modeling, orchestrating, and delivering data.**

---

## What RAE Does

### Scaffold Tool Docker Configurations
Spin up a project with plug-and-play support for essential data tools:

- **Data Storage**:
  - PostgreSQL
  - MySQL  
- **Data Modeling**:
  - dbt
  - SQL Mesh  
- **Orchestration**:
  - Airflow
  - Dagster

### Auto-Generate `settings.py`
A clean and extensible settings file inspired by Django — making it easy to pass environment-specific values (ports, credentials, container names, etc.) to every component of your stack.

### Auto-Generate `docker-compose.yml`
- Connect all services via a shared Docker network.

---

Frameworks aren't just for web and mobile engineers anymore.
**RAE** gives Analytics Engineers the tools to build, connect and orchestrate their data stack with ease.

Build like a developer. Deploy like an engineer. Let **RAE** compose your analytics stack.

---

## Who Is RAE For?

- Analytics Engineers who want to quickly scaffold their required infrastructure.
- Data Engineers who need to tie various tools together.
- Data Scientists who have a need for a data tool stack.
- Individual developers and anyone learning to use analytics/data engineering tools.
- Teams that want standardization and clarity across their data stack.

---

## How to use RAE

### System Dependencies

| Tool            | Required Version | Notes                                                                 |
|-----------------|------------------|-----------------------------------------------------------------------|
| **Python**      | 3.8+             | Required for the RAE CLI tool                                         |
| **Docker Desktop**      | Latest           | Docker Desktop (macOS/Windows) or Docker Engine (Linux)               |
| **Shell**       | bash / zsh / PowerShell | Used to run CLI and Docker commands                             |
| **Web Browser** | Any modern browser | `Google Chrome` recommended for container-based UIs (e.g. Airflow)    |

---

## CLI Setup Steps

### 1. Create a Virtual Environment

| OS            | Command                          |
|---------------|----------------------------------|
| macOS/Linux   | `python3 -m venv local-env`      |
| Windows       | `py -m venv local-env`           |

### 2. Activate the Virtual Environment

| OS                  | Command                                 |
|---------------------|-----------------------------------------|
| macOS/Linux (bash/zsh) | `source local-env/bin/activate`        |
| Windows (CMD)       | `local-env\Scripts\activate.bat`        |
| Windows (PowerShell)| `local-env\Scripts\Activate.ps1`        |

### 3. Install RAE CLI:
```bash
pip install rae-cli
```

### 4. Initialize your project:
```bash
rae init
```

This will take you through a series of prompts and then generate the `project_config.json` and `settings.py` files.

After this command completes you will be left with the following project structure:
```bash
├── rae
│   └── src
│       ├── airflow
│       │   └── airflow-init.sh
│       ├── dbt
│       │   ├── analyses
│       │   ├── macros
│       │   ├── models
│       │   ├── seeds
│       │   ├── snapshots
│       │   ├── tests
│       │   ├── dbt-init.sh
│       │   ├── dbt.sh
│       │   ├── dbt_project.yml
│       │   └── Dockerfile
│       ├── docker-compose.yml
│       ├── postgres
│       │   └── postgres-init.sh
│       └── settings
│           ├── project_config.json
│           └── settings.py
```

*The above is just an example and assumes you selected postgres as your data storage, dbt as your data modeling and airflow as your orchestration with postgres as the metastore.*

### 5. Open your settings file - `{project_name}/src/settings/settings.py`
    - You need to populate this file with your specific credentials
        - `data_storage` (PostgreSQL or MySQL)
        - `data_modeling` (dbt or SQL Mesh)
        - `data_orchestration` (Airflow or Dagster)
    - If you do not do this, the project will be usable, but the project's containers will be built with default values and will NOT BE production ready nor secure.
    
**You are responsible for ensuring your project is secure, is setup properly and is ready for deployment!**

### 6. Generate your docker compose file:
1. cd into your project directory:
```bash
cd {project_name}
```
2. generate your compose file:
```bash
rae generate-compose-file
```

3. generate docker-compose file without changing directories:
```bash
rae generate-compose-file --project-name {project_name}
```

### 7. Run your project's Docker containers:
*Docker must be installed AND running on your host machine or this command will fail*
*So make sure you have Docker Desktop installed and actively running on your machine!*
```bash
cd {project_name}/src
```
Then simply start the containers:
```bash
docker-compose up -d
```

This will run the docker containers for each service and link them via a docker network. The process allows for each container to communicate with one another while still ensuring all tools operate in an isolated state.

---

# Current State of Project

## Future Implementations
```bash
1. Add secondary test coverage to the project:
  - src/cli.py
  - src/data_modeling/dbt_modeling
  - src/data_modeling/sql_mesh_modeling
  - src/data_orchestration/airflow_orchestration.py
  - src/data_orchestration/dagster_orchestration.py
  - src/data_storage/mysql_storage.py
  - src/data_storage/postgresql_storage.py
  - src/generators/docker_compose_generator.py
  
2. Continue iterating on test coverage
  - src/managers/data_modeling_manager.py
  - src/managers/data_orchestration_manager.py
  - src/managers/data_storage_manager.py
  - src/managers/settings_manager.py
  - src/utility/base_manager.py
  - src/utility/base_tool.py
  - src/utility/dockerfile_writer.py
  - src/utility/indented_dumper.py
  - src/utility/shell_script_writer.py
  - src/utility/supported_tools.py
  - src/main.py

3. Add support for additional data storage tools:
  - Snowflake
  - DuckDB?
  - SQL Server?
  - Databricks
    - AWS S3
    - Google Cloud Storage
    - Azure Blob Storage

4. Add support to allow users to scaffold single applications or custom combinations of tool stacks
  - Scenarios:
    - user only needs a data modeling tool
    - user only needs a data modeling tool and a data storage tool
    - user only needs an orchestration tool
    - etc
  - Intent:
    - To allow greater flexibility and provide a wider use-case for the CLI
```
