Metadata-Version: 2.1
Name: pyavrio
Version: 20.0.3
Summary: Python client library for Avrio
Home-page: https://github.com/avrioproductionsupport/pyavrio
Author: Avrio Team
Author-email: avrioproductionsupport@trianz.com
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database :: Front-Ends
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: backports.zoneinfo; python_version < "3.9"
Requires-Dist: python-dateutil
Requires-Dist: pytz
Requires-Dist: requests>=2.31.0
Requires-Dist: tzlocal
Requires-Dist: sqlalchemy>=1.3
Requires-Dist: sql_metadata==2.11.0
Requires-Dist: modin==0.32.0
Requires-Dist: ray==2.42.1
Provides-Extra: all
Requires-Dist: requests_kerberos; extra == "all"
Requires-Dist: sqlalchemy>=1.3; extra == "all"
Provides-Extra: kerberos
Requires-Dist: requests_kerberos; extra == "kerberos"
Provides-Extra: sqlalchemy
Requires-Dist: sqlalchemy>=1.3; extra == "sqlalchemy"
Provides-Extra: tests
Requires-Dist: requests_kerberos; extra == "tests"
Requires-Dist: sqlalchemy>=1.3; extra == "tests"
Requires-Dist: httpretty<1.1; extra == "tests"
Requires-Dist: pytest; extra == "tests"
Requires-Dist: pytest-runner; extra == "tests"
Requires-Dist: pre-commit; extra == "tests"
Requires-Dist: black; extra == "tests"
Requires-Dist: isort; extra == "tests"
Provides-Extra: external-authentication-token-cache
Requires-Dist: keyring; extra == "external-authentication-token-cache"

# PyAvrio Library for the Avrio Platform



PyAvrio lets you query and transform data on the [Avrio](https://avriodata.ai/) (Data to AI) platform directly, without downloading the data locally. The library provides access to Avrio for executing SQL queries and retrieving metadata such as catalog names, schema names, table names, and column information.



## Getting Started



### Installation

You can install PyAvrio via pip:



```bash
pip install pyavrio
```



### Usage

To start using PyAvrio, first import the `PyAvrioFunctions` module:



```python
from pyavrio import PyAvrioFunctions
```



### Connecting to Avrio

To connect to the Avrio platform, use the `avrio_engine` method:



```python
from pyavrio import PyAvrioFunctions

# Define connection parameters
user_email = "your_email@example.com"
password = "your_password"
host = "host"
port = 1234
catalog = "your_catalog"
platform = "data_sources"  # either "data_products" or "data_sources"

# Establish a connection to Avrio
engine = PyAvrioFunctions.avrio_engine(
    f"pyavrio://{user_email}:{password}@{host}:{port}/{catalog}?platform={platform}"
)
```



### Using SQL

You can execute SQL queries using the `execute_sql_query` method:



```python
sql_query = """
    SELECT column1, column2 FROM table_name LIMIT 10
"""

result = PyAvrioFunctions.execute_sql_query(engine, sql_query)
```

Replace `sql_query` with your own SQL query string.



### Querying Data

```python
import pandas as pd

# Load the query result into a DataFrame
df = pd.DataFrame(result, columns=['column1', 'column2'])
print(df.head())

# Example: filter the DataFrame
filtered_df = df[df['column1'] > 100]
print(filtered_df.head())
```

### DataFrame Aggregation

```python
# Example: aggregate the DataFrame
aggregated_df = df.groupby('column1').agg({'column2': 'sum'}).reset_index()
print(aggregated_df.head())
```



### DataFrame Join

```python
sql_query2 = """
    SELECT column3, column4 FROM second_table LIMIT 10
"""
result2 = PyAvrioFunctions.execute_sql_query(engine, sql_query2)
df2 = pd.DataFrame(result2, columns=['column3', 'column4'])

# Join the DataFrames; df has column1/column2 and df2 has column3/column4,
# so the join keys must be named explicitly
joined_df = df.merge(df2, left_on='column1', right_on='column3')
print(joined_df.head())
```



### Available Methods

PyAvrio provides the following methods for interacting with the Avrio platform:



- `avrio_engine`: Connects to the Avrio platform.
- `execute_sql_query`: Executes SQL queries.
- `get_catalog_names`: Retrieves catalog names.
- `get_schema_names`: Retrieves schema names.
- `get_table_names`: Retrieves table names.
- `get_table_columns`: Retrieves column information for a specified table.

The metadata methods require the connection to use `platform=data_products` for data products or `platform=data_sources` for data sources. For data products, the catalog name represents the domain name and the schema name represents the subdomain name; for data sources, these correspond to the Trino catalog and schema.



```python
# Retrieve catalog names
catalogs = PyAvrioFunctions.get_catalog_names(engine)
print("Catalogs:", catalogs)

# Retrieve schema names
schemas = PyAvrioFunctions.get_schema_names(engine)
print("Schemas:", schemas)

# Retrieve table names
tables = PyAvrioFunctions.get_table_names(engine, schema='schema_name')
print("Tables:", tables)

# Retrieve column information for a table
columns_info = PyAvrioFunctions.get_table_columns(engine, schema='schema_name', table_name='table_name')
print("Columns Information:", columns_info)
```

### Supported Operations



PyAvrio supports DML operations only for Data Sources, not for Data Products; DDL operations are supported for both Data Sources and Data Products.



### Example of DML Query

```python
# Example of executing a DML query
dml_query = """
    INSERT INTO table_name (column1, column2) VALUES (value1, value2)
"""

result = PyAvrioFunctions.execute_sql_query(engine, dml_query)
print(result)
```
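
### Example of DDL Query

Since DDL is supported for both Data Sources and Data Products, a DDL statement can be run through `execute_sql_query` the same way. Below is a minimal sketch; the table and column names are placeholders, `engine` is the connection created earlier, and the call is guarded so the snippet is harmless to copy-paste without a live connection:

```python
# Example of a DDL query; table and column names are placeholders
ddl_query = """
    CREATE TABLE table_name (
        column1 VARCHAR,
        column2 INTEGER
    )
"""

try:
    # Reuse the engine created with PyAvrioFunctions.avrio_engine(...)
    result = PyAvrioFunctions.execute_sql_query(engine, ddl_query)
    print(result)
except NameError:
    # engine / PyAvrioFunctions come from the connection step above
    pass
```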
