Metadata-Version: 2.4
Name: outboxml
Version: 0.10.1
Summary: Framework for ML and DS: Automating pipelines from training to deployment
Author: Semyon Semyonov, Dmitry Bochkarev, Maxim Matcera, Dmitry Zotov
Author-email: Vladimir Suvorov <justsuvorov@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Framework :: FastAPI
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: catboost==1.2.8
Requires-Dist: shap==0.50.0
Requires-Dist: loguru==0.7.2
Requires-Dist: scikit-learn==1.7.2
Requires-Dist: pydantic==2.12.4
Requires-Dist: statsmodels==0.14.5
Requires-Dist: pyarrow==22.0.0
Requires-Dist: SQLAlchemy>=2.0.31
Requires-Dist: plotly==5.22.0
Requires-Dist: kaleido==0.2.1
Requires-Dist: mlflow==3.6.0
Requires-Dist: pandas==2.3.3
Requires-Dist: polars==1.35.0
Requires-Dist: numpy==2.3.5
Requires-Dist: environs>=11.0.0
Requires-Dist: pretty_html_table==0.9.16
Requires-Dist: optbinning==0.20.0
Requires-Dist: ortools==9.15.6755
Requires-Dist: optuna==4.6.0
Requires-Dist: openpyxl==3.1.5
Requires-Dist: nbformat==5.10.4
Requires-Dist: phik==0.12.5
Requires-Dist: fastapi==0.121.2
Requires-Dist: uvicorn==0.38.0
Requires-Dist: schedule==1.2.2
Requires-Dist: psycopg2-binary
Requires-Dist: scipy>=1.15.3
Requires-Dist: xgboost==3.1.1
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Dynamic: license-file

# README
OutBoxML is an open-source framework for automating machine learning pipelines from model training to deployment. The toolkit combines Python for model development, Grafana for monitoring, FastAPI for model serving, and MLflow for experiment tracking and management. It aims to give ML practitioners a robust, user-friendly platform for building, deploying, and monitoring ML solutions.

The key components include:
- **AutoML**: Use the AutoML algorithm with boosting, or implement custom models with a low-code approach.
- **MLflow**: Track experiments, parameters, and outputs with MLflow.
- **Grafana Monitoring**: Use Grafana dashboards to monitor ML model performance in real time.
- **FastAPI**: Host models with FastAPI for quick deployment and testing via RESTful APIs.
- **PostgreSQL**: Use an open-source database to store and update data for AutoML processes.

The components are connected through Docker, so the framework requires an OS with Docker and Docker Compose installed.

## Communications between the containers
All containers share one Docker network, by default `<project>_default`:
- **MLflow** communicates with PostgreSQL using `postgre`.
- **Prometheus** collects metrics from `node-exporter`.
- **FastAPI** sends metrics to MLflow via the REST API.
 

## Ports
By default, the containers are mapped to the following ports (`host:container`):
- **MLflow**: `5000:5000`
- **Grafana**: `3000:3000`
- **Prometheus**: `9090:9090`
- **Node Exporter**: `9100:9100`
- **Jupyter Notebook**: `8889:8888`
- **FastAPI**: `8000:8000`
- **Minio**: `9001:9001`
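
Before starting the stack, it can help to confirm that none of these host ports is already taken. The following is a minimal sketch using only the Python standard library. The dictionary repeats the host-side ports listed above; the function names are illustrative and not part of OutBoxML.

```python
import socket

# Host-side ports from the list above (left-hand side of each mapping).
DEFAULT_HOST_PORTS = {
    "MLflow": 5000,
    "Grafana": 3000,
    "Prometheus": 9090,
    "Node Exporter": 9100,
    "Jupyter Notebook": 8889,
    "FastAPI": 8000,
    "Minio": 9001,
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepts the connection.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports: dict[str, int]) -> dict[str, int]:
    """Return the subset of ports that already have a listener."""
    return {name: port for name, port in ports.items() if not port_is_free(port)}


if __name__ == "__main__":
    for name, port in busy_ports(DEFAULT_HOST_PORTS).items():
        print(f"{name}: port {port} is already in use")
```

If the script prints nothing, no listener was found on any of the default ports at the time of the check.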
  
## Getting Started
- Change the directory to `outboxml/app`.
- Run `create-folder.bat` (on Windows) or `create-folder.sh` (on Linux) before any other step.

1. To start the project, run the following from `outboxml/app`:
   ```bash
   docker compose up
   ```
   or, to launch in the background:
   ```bash
   docker compose up -d
   ```

- To restart:
  ```bash
  docker compose down && docker compose up --build
  ```
- To stop the project:
  ```bash
  docker compose down
  ```

2. Check availability:
   - MLflow: [http://localhost:5000](http://localhost:5000)
   - Grafana: [http://localhost:3000](http://localhost:3000) (default login/password: `admin/admin`)
   - Prometheus: [http://localhost:9090](http://localhost:9090)
   - Jupyter Notebook: [http://localhost:8889](http://localhost:8889)
   - FastAPI: [http://localhost:8000](http://localhost:8000)

3. Ensure that all containers are up:
   ```bash
   docker ps
   ```

4. To test FastAPI, use the Swagger docs: [http://localhost:8000/docs](http://localhost:8000/docs).

5. Set up Minio:
   - Open http://localhost:9001 (login: `minio`, password: `Strong#Pass#2022`).
   - Click "Create Bucket" and name the bucket "mlflow".
   - Open the bucket and edit "Access Policy".
   - Set Access Policy to Public and click Set.
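
The availability check in step 2 can also be scripted. The sketch below probes the URLs listed there with the Python standard library. It assumes the default ports and treats any HTTP response, including an error status, as a sign that the service is up; the function name is illustrative.

```python
import http.client
import urllib.error
import urllib.request

# URLs from step 2; the keys are labels for the report only.
SERVICE_URLS = {
    "MLflow": "http://localhost:5000",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
    "Jupyter Notebook": "http://localhost:8889",
    "FastAPI": "http://localhost:8000",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a 4xx/5xx status, so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


if __name__ == "__main__":
    for name, url in SERVICE_URLS.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {url} is {status}")
```

A service reported as not reachable may simply still be starting, so it is worth repeating the check after a short wait.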

## Network restrictions and security concerns
- The containers are isolated.
- Use a firewall on the host machine for extra security.

1. **Jupyter Notebook**
   - By default, it is open without a password or token.
2. **Prometheus and Grafana**
   - Connect Prometheus to Grafana manually.

## Possible issues and solutions 
1. **The ports are in use**:
   - Find and free the necessary ports:
     ```bash
     sudo lsof -i:<port>
     ```
   - Alternatively change the ports in `docker-compose.yml`.

2. **No connection between containers**:
   - Inspect the Docker network and check which containers are attached to it:
     ```bash
     docker network inspect <project>_default
     ```

3. **No connection between FastAPI and MLflow**:
   - Check the connection to the MLflow API from inside a container:
     ```bash
     curl -X POST http://mlflow:5000/api/2.0/mlflow/experiments/search -H "Content-Type: application/json" -d '{"max_results": 1}'
     ```
4. **Packages fail to install via `pip install`**:
   - Set a DNS server for the Docker daemon:
```
1. Log in as a user with sudo privileges.
2. Open the file /etc/default/docker:
    $ sudo nano /etc/default/docker
3. Find and uncomment or add the following line:
    DOCKER_OPTS="--dns 8.8.8.8"
4. Save and close the file.
5. Restart the Docker daemon service:
    $ sudo systemctl restart docker
```
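
If `curl` is not installed in a container, the MLflow connection check from issue 3 can be done in Python instead. This is a minimal sketch, assuming an MLflow 2.x or later tracking server, where experiments are listed through the `experiments/search` REST endpoint, and assuming the service is reachable as `http://mlflow:5000` inside the Docker network. The function names are illustrative.

```python
import json
import urllib.request


def build_experiments_request(base_url: str, max_results: int = 1) -> urllib.request.Request:
    """Build a POST request for MLflow's experiments/search REST endpoint."""
    url = base_url.rstrip("/") + "/api/2.0/mlflow/experiments/search"
    body = json.dumps({"max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mlflow_answers(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the tracking server replies with a JSON object."""
    request = build_experiments_request(base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return isinstance(json.load(response), dict)
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers a reply that is not valid JSON.
        return False


if __name__ == "__main__":
    # Run this inside a container attached to the same Docker network.
    print("MLflow reachable:", mlflow_answers("http://mlflow:5000"))
```

From the host machine, the same check would use `http://localhost:5000`, since the service name `mlflow` is only resolvable inside the Docker network.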

## Contributing
We welcome contributions from the community! If you'd like to contribute, please follow the contributing guidelines outlined in docs/CONTRIBUTING.md.

## License
This project is licensed under the MIT License - see the LICENSE.md file for details.

## Support
For support, please open an issue on GitHub or contact the maintainers directly.

## Acknowledgements
We would like to thank VSK, whose support has been pivotal to the success of the project.
Special thanks to Vladimir Nikulin, who not only supervised the business aspects of the project but also provided invaluable insights and guidance on its integration with business workflows.
We appreciate the support of our Data Science department for integrating the framework into ML processes, and extend special thanks to the MLOps team, especially Aleksey Makeev and Dmitry Zotov, for their contributions to testing and DevOps integration.

## Current contributors
- Semyon Semyonov - Original codebase development, system design and product management
- Vladimir Suvorov - Core code development and software architecture
- Dmitry Bochkarev - Code development and data science model implementation
- Maxim Matcera - Development of specific modules
   
   

