Metadata-Version: 2.4
Name: datagenix
Version: 0.1.1
Summary: A robust and simple library for generating synthetic datasets for ML/DL projects.
Author-email: Your Name <your.email@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/datagenix
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.3.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: Faker>=8.0.0

# data_genix

A robust and simple library for generating synthetic datasets for machine learning and deep learning projects. Avoid the hassle of downloading and managing data files for testing and prototyping.

## Installation

Clone the repository and install using pip:

```bash
git clone [https://github.com/yourusername/data_genix.git](https://github.com/yourusername/data_genix.git)
cd data_genix
pip install .
```

## Quick Start

Generate a DataFrame with a variety of data types with a single function call.

```python
from data_genix import DataGenerator

# Initialize the generator
generator = DataGenerator()

# Generate a dataset with 1000 rows
df = generator.generate(
    num_rows=1000,
    numerical_whole=3,
    decimal=2,
    categorical=2,
    ordinal=1,
    boolean=1,
    datetime=1,
    text=1,
    uuid=1,
    object_types=['name', 'country', 'email', 'job']
)

print(df.head())
print(df.info())
```

## Features

- **Numerical Data**: Generate columns of whole numbers (integers) or decimals (floats).
- **Categorical Data**: Generate columns with a predefined set of unordered categories.
- **Ordinal Data**: Generate columns with a predefined set of *ordered* categories.
- **Boolean Data**: Generate columns of `True`/`False` values.
- **Datetime Data**: Generate columns with `datetime` objects.
- **Text Data**: Generate columns with random sentences.
- **ID Data**: Generate columns with unique identifiers (UUIDs).
- **Coordinates**: Generate paired latitude and longitude columns.
- **Web Data**: Generate columns for IP addresses, URLs, and phone numbers.
- **Nested Data**: Generate columns containing JSON-formatted strings.
- **Object/Text Data**: Leverage the power of the `Faker` library to generate realistic text data like names, addresses, emails, and much more.

### Supported `object_types`

You can use any standard `Faker` provider method name as a string in the `object_types` list. Common examples include:

- `name`
- `email`
- `address`
- `country`
- `city`
- `job`
- `text`
- `datetime`
- `phone_number`
- `company`
- `url`
- `credit_card_number`

