Metadata-Version: 2.1
Name: duckdb-spark
Version: 1.0.4
Summary: A custom PySpark extension for writing data to DuckDB
Home-page: https://github.com/ChannabasavAngadi/spark_duckDB_extension.git
Author: Channabasav Angadi
Author-email: channuangadi077@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

# DuckDB Extension for PySpark

Since DuckDB supports only a **single writer at a time**, writing directly from PySpark can lead to **locking errors** due to Spark's multi-worker write process.  

This custom **PySpark extension** provides a reliable way to write **DataFrames to DuckDB**, ensuring smooth data transfer without concurrency issues.

## Features

- ✅ **Seamlessly write PySpark DataFrames to DuckDB**  
- ✅ **Supports `overwrite` and `append` modes**  
- ✅ **Automatically detects and adds new columns when appending data**  
- ✅ **Simple integration with PySpark's `DataFrameWriter` API**  

## Installation

You can install the package using `pip`:

```bash
pip install duckdb-spark


## Usage

```bash
from pyspark.sql import SparkSession
from duckdb_extension import register_duckdb_extension

spark = SparkSession.builder.appName("DuckDB Example").getOrCreate()

# Register the DuckDB extension
register_duckdb_extension(spark)

df=spark.read.csv("employe.csv",header=True)

# Use the custom extension to write the DataFrame to DuckDB and specify the table name
df.write.duckdb_extension(database="./company_database.duckdb", table_name="employe_tbl", mode="append")


