Metadata-Version: 2.4
Name: thermoverse
Version: 0.1.0
Summary: ML models for materials property prediction
Author-email: Rashid Ali <tumhari@email.com>
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: lightgbm
Requires-Dist: xgboost
Dynamic: license-file

# Research Find

Materials property prediction models and training scripts.

## Overview
- A collection of training and evaluation scripts for materials datasets of different sizes (181600, 50000, and 5000 entries).

## Quick start
1. Create a Python virtual environment and activate it.
2. Install the dependencies (`numpy`, `scikit-learn`, `lightgbm`, `xgboost`), for example by listing them in `requirements.txt` and running `pip install -r requirements.txt`.
3. Run training or evaluation scripts, e.g.:

```powershell
python train_181600_models.py
```
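
Before starting a long training run, it can be useful to confirm that the dependencies are importable in the active environment. The following is a minimal sketch, not part of the project itself: the helper name `missing_modules` is hypothetical, and the module list assumes the packages named in this package's `Requires-Dist` metadata (note that `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the packages listed under Requires-Dist.
# Assumption: scikit-learn is installed under its usual import name "sklearn".
REQUIRED_MODULES = ["numpy", "sklearn", "lightgbm", "xgboost"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

An empty result means every listed module was found; anything reported as missing can be installed with `pip` before running the scripts.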

## Files of interest
- `train_181600_models.py`: training script for the 181600 dataset.
- `models/`: saved models (excluded from version control via `.gitignore`).
- `outputs/`: predictions and reports (excluded from version control via `.gitignore`).

## How to publish to GitHub
1. Create a new repo on GitHub (choose a name like `research-find`).
2. From this folder run:

```powershell
git init
git add .
git commit -m "Initial commit"
git branch -M main
git remote add origin https://github.com/your-username/repo-name.git
git push -u origin main
```

Replace `https://github.com/your-username/repo-name.git` with your repository's URL.
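
GitHub rejects pushes that contain files above its 100 MB per-file limit, so a quick scan of the working tree before the first push can save a failed upload. The sketch below is an illustration only: the function name `find_large_files` is hypothetical, and the limit is taken as 100 * 1024 * 1024 bytes, which is an assumption about how the limit is measured.

```python
from pathlib import Path

# Assumption: the 100 MB limit is treated as 100 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit_bytes=SIZE_LIMIT_BYTES):
    """Return (path, size) pairs for files under root that exceed limit_bytes."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        # Skip Git's internal directory; only working-tree files matter here.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                large.append((path, size))
    # Largest files first.
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / 1024 / 1024:8.1f} MiB  {path}")
```

Any file this reports should either be covered by `.gitignore` or tracked with Git LFS before pushing.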


## Model description

- Purpose: Build accurate, generalizable machine-learning predictors for thermodynamic and mechanical materials properties to accelerate screening and discovery.
- Data: Trained on curated datasets (`Materials_Dataset_181600.csv`, `Materials_Dataset_50000.csv`, `Materials_Dataset_FIXED.csv`).
- Model types: Ensemble models (XGBoost / CatBoost / RandomForest-style) trained per-property with cross-validation and ensembling.
- Inputs: Composition-based features and engineered descriptors produced by preprocessing scripts.
- Outputs: Per-property CSV predictions under `outputs/` and evaluation summaries (R², RMSE) stored in `outputs/` and `models/`.
- Usage: Create a Python environment, install the dependencies, then run the training and evaluation scripts, for example `python train_181600_models.py`.
- Notes: Large model files and outputs are excluded via `.gitignore`. If datasets or model artifacts exceed GitHub's 100 MB per-file limit, enable Git LFS for those paths.
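
For reference, the two metrics named in the evaluation summaries can be written out directly from their definitions. This is a plain-Python sketch for illustration, not the project's evaluation code (which may well rely on a library implementation); the function names and the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up values, for illustration only.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.0, 2.0, 3.0, 5.0]
    print("RMSE:", rmse(y_true, y_pred))
    print("R^2:", r2_score(y_true, y_pred))
```

RMSE is expressed in the units of the predicted property, while R² is unitless, which is why the two are usually reported together for each property.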

