Metadata-Version: 2.4
Name: dv-core-asta
Version: 0.1.0
Summary: DataVoyager Core with Asta Integration - AI agent for data analysis
Requires-Python: <3.14,>=3.12
Requires-Dist: a2a-sdk
Requires-Dist: aiosqlite
Requires-Dist: asta-agent>=1.0.13
Requires-Dist: asta-artifact==1.0.0
Requires-Dist: asta-sandbox[modal-kernel]>=0.1.1
Requires-Dist: asta-types==1.0.0
Requires-Dist: asyncpg
Requires-Dist: autogen-agentchat==0.4.4
Requires-Dist: autogen-ext[azure,diskcache,openai]==0.4.4
Requires-Dist: beautifulsoup4==4.13.4
Requires-Dist: boto3>=1.34
Requires-Dist: click==8.1.8
Requires-Dist: docker
Requires-Dist: google-cloud-logging~=3.12.1
Requires-Dist: greenlet
Requires-Dist: ipython
Requires-Dist: jupyter-server==2.17.0
Requires-Dist: matplotlib==3.10.0
Requires-Dist: modal~=1.4.0
Requires-Dist: nest-asyncio==1.6.0
Requires-Dist: nora-lib-impl==1.8.1
Requires-Dist: numpy==2.2.2
Requires-Dist: pandas==2.2.3
Requires-Dist: pdfminer==20191125
Requires-Dist: pydantic-core>=2.29.0
Requires-Dist: pydantic==2.11.3
Requires-Dist: pyjwt>=2.8
Requires-Dist: pyyaml==6.0.2
Requires-Dist: requests>=2.32.0
Requires-Dist: scikit-learn==1.6.1
Requires-Dist: scipy==1.15.1
Requires-Dist: seaborn==0.13.2
Requires-Dist: sqlalchemy
Requires-Dist: statsmodels==0.14.4
Requires-Dist: tenacity~=9.0.0
Requires-Dist: termcolor==2.5.0
Requires-Dist: websocket-client>=1.8.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio==1.2.*; extra == 'dev'
Requires-Dist: pytest-timeout==2.4.*; extra == 'dev'
Requires-Dist: pytest==8.4.*; extra == 'dev'
Description-Content-Type: text/markdown

This is the Asta flavor of DataVoyager. The original, standalone DataVoyager is
[datavoyager-core](https://github.com/allenai/datavoyager-core/).
Its documentation may still be relevant for the parts of the core that are unchanged here.

# Development Setup
If you only need to work on the agent itself (this repo), without the
Asta webapp or ecosystem, follow this guide. For end-to-end development of the DV-Asta integration,
see https://github.com/allenai/nora/wiki/DataVoyager-development, which details all
the processes you need to control.

This project uses [uv](https://docs.astral.sh/uv/) for dependency management
and assumes Python 3.12. Install uv, then run:

    uv sync

Go to 1Password and get "datavoyager agent secrets." You will need to export
these variables to your environment. If you use something like direnv, you can
put those exports directly in your `.envrc` file. Don't forget to run `direnv allow`.

Copy `.env.template` to `.env`. If the secrets are already exported to your
environment, no further changes are needed. Otherwise, to get those secrets
into Docker containers, edit `.env` and fill in all the secret values.
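
Before starting the agent, it can help to confirm that every variable named in `.env.template` is actually set. The following is a minimal sketch, not part of this repo. It assumes the template consists of shell-style `KEY=VALUE` lines (optionally prefixed with `export`), and the helper names `template_keys` and `missing_secrets` are hypothetical.

```python
import os
from pathlib import Path


def template_keys(template_text: str) -> list[str]:
    """Return variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = []
    for raw in template_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate an optional leading "export " as used in shell-style files.
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_secrets(template_text: str, env: dict[str, str]) -> list[str]:
    """List template keys that are unset or empty in the given environment."""
    return [key for key in template_keys(template_text) if not env.get(key)]


if __name__ == "__main__":
    # Assumption: run from the repo root, where .env.template lives.
    template = Path(".env.template")
    if template.exists():
        missing = missing_secrets(template.read_text(), dict(os.environ))
        print("missing:", ", ".join(missing) if missing else "none")
```

If the template also lists non-secret settings with defaults, those would be reported too unless they are exported, so treat the output as a hint rather than a strict check.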

Check that you can start the agent in local mode. You should see something like this:

    $> uv run python local.py
    * config_file=config/datavoyager_modal_deployment_magentic_one_config_20250617.yaml
    * json_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.jsonl
    * md_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.md
    * asta_logs.log_file=threads/local-2025-08-05T10-28-36.981698/log.istore.json
    * console_log=threads/local-2025-08-05T10-28-36.981698/console.log 
    ===============================================================================================
    You: (enter two blank lines to submit)

Here you can interact with the agent as you would in the Asta webapp. A number of
Asta integrations have been swapped for plaintext versions. DataVoyager's reasoning
streams to the console, and along the way you will see evidence of various
Asta-specific features, rendered as plaintext. A few are called out in the excerpts
from an example run below.

Here I pasted one of the stock queries from https://datavoyager.allen.ai into the terminal.

    Out of age and gender, which factor affects the survival of titanic passengers the most? Use 
    the following dataset.
    
    • s3://ai2-asta-workspaces/sampledata/titanic.csv
    
    
    -----------------------------------------------------------------------------------------------

Step progress events get special treatment in the Asta UX;
here they are simply printed as plaintext.

    >>> entering step >>> Checking for datasets
    <<< exited step <<< Loaded 1 new dataset(s)
    >>> entering step >>> Pre-processing request
    <<< exited step <<< Pre-processing request
    >>> entering step >>> Investigating
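
When reading a saved console log, these plaintext markers can be pulled back out into structured events. The sketch below is illustrative only: the marker strings are copied from the excerpt above, while `parse_step_events` is a hypothetical helper and not part of DataVoyager.

```python
import re

# Marker formats copied from the console excerpt above.
_STEP_LINE = re.compile(r"^\s*(>>> entering step >>>|<<< exited step <<<)\s*(.*)$")


def parse_step_events(console_text: str) -> list[tuple[str, str]]:
    """Return (kind, message) pairs, where kind is 'enter' or 'exit'."""
    events = []
    for line in console_text.splitlines():
        match = _STEP_LINE.match(line)
        if match:
            kind = "enter" if match.group(1).startswith(">>>") else "exit"
            events.append((kind, match.group(2).strip()))
    return events
```

Lines that do not carry a step marker are ignored, so the function can be run over a whole console log.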

`<dvtext>` marks spans that will not be forwarded to the Asta UX
but are important to DV's internal reasoning.

    ==================================================
    MagenticOneOrchestrator
    <dvtext>
    We are working to address the following REQUEST:
    
    Question: Out of age and gender, which factor affects the survival of titanic passengers the 
    most? Use the following dataset.
    
    • s3://ai2-asta-workspaces/sampledata/titanic.csv

    Python variable name for dataset: data_0
    Dataset preview:
    Dataset head:
    PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
    0            1         0       3  ...   7.2500   NaN         S
    1            2         1       1  ...  71.2833   C85         C
    2            3         1       3  ...   7.9250   NaN         S
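
To illustrate what "not forwarded" means in practice, here is a minimal sketch of filtering such spans out of a message. It assumes each span is closed by a matching `</dvtext>` tag (the excerpt above is cut off before any closing tag) and that spans do not nest. `strip_dvtext` is a hypothetical name, not the function this codebase uses.

```python
import re

# Assumption: spans are delimited by literal <dvtext> ... </dvtext> tags and do not nest.
_DVTEXT = re.compile(r"<dvtext>.*?</dvtext>", re.DOTALL)


def strip_dvtext(message: str) -> str:
    """Remove <dvtext> spans, leaving only the text meant for the UX."""
    return _DVTEXT.sub("", message).strip()
```

A message made up entirely of `<dvtext>` content reduces to an empty string, which a caller could use as a signal to skip forwarding it.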

Eventually the agent finishes with what we consider the answer to the user query, which
is annotated with `<dvoutput>`. The Asta UX picks up these tags to power specific UX behaviors.

    <<< exited step <<< Investigation complete.
    
    Asta: <dvoutput cell_id="10">
    Based on the analysis of the Titanic dataset, the factor that has the most significant impact 
    on passenger survival is gender. The logistic regression results indicate that being female, 
    compared to being male, increases the odds of survival by approximately 1124.74%. In contrast, 
    age has a negligible effect on the odds of survival, with the regression suggesting only a 
    minor change of -0.47% per additional year. Thus, gender is the more impactful factor affecting
    survival rates among Titanic passengers.
    </dvoutput>
    
    ===============================================================================================
    You: (enter two blank lines to submit)
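
For scripting against local runs, the final answer can be extracted from the console text by matching the tag shown above. This is a rough sketch under the assumption that the tag always has the form `<dvoutput cell_id="...">` as in the excerpt. `extract_dvoutputs` is a hypothetical helper, and the Asta UX may parse these tags differently.

```python
import re

# Assumption: the opening tag carries a single quoted cell_id attribute, as in the excerpt.
_DVOUTPUT = re.compile(
    r'<dvoutput\s+cell_id="([^"]*)">\s*(.*?)\s*</dvoutput>', re.DOTALL
)


def extract_dvoutputs(text: str) -> list[tuple[str, str]]:
    """Return (cell_id, body) pairs for each <dvoutput> span found in text."""
    return [(m.group(1), m.group(2)) for m in _DVOUTPUT.finditer(text)]
```

The `cell_id` is returned as a string because the excerpt does not say whether it is always numeric.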


## Deployment

Make sure you have set up the necessary secrets in your environment, as described above.
Set up the `.env` file so that underlying docker containers will have those secrets.

The Modal app name is computed as `f"dv-core.{ENV}"`. Here is the
[modal.com dashboard](https://modal.com/apps/nora-prod/main) with all existing environments.
The `dv-core.rc` environment is **automatically updated** with the code in the **main branch** via a GitHub action,
so long as it passes tests. Other environments must be updated manually. For example, to deploy to my own personal
development deployment called `dv-core.jasond`:

    ENV=jasond make deploy
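
The naming rule can be written out as a small function. This is only a sketch of the `f"dv-core.{ENV}"` rule described in this section. The function name `modal_app_name` and the empty-value check are assumptions, not code from this repo.

```python
def modal_app_name(env: str) -> str:
    """Build the Modal app name for a deployment environment, e.g. 'dv-core.rc'."""
    if not env:
        # Assumption: an empty ENV is treated as a mistake rather than given a default.
        raise ValueError("ENV must be set to a non-empty environment name")
    return f"dv-core.{env}"


if __name__ == "__main__":
    # Environment names mentioned in this README.
    for env in ("rc", "prod", "jasond"):
        print(modal_app_name(env))
```

So `ENV=jasond` corresponds to the app `dv-core.jasond`, and `ENV=prod` to `dv-core.prod`.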

The prod Modal environment `dv-core.prod` is the last stable version of this codebase, and the asta.allen.ai Asta environment depends on it for code stability. When it is time to release this codebase's main branch to the prod Modal environment, run whatever automated and manual tests are needed to gain confidence that the current version of the code is good before deploying it.

To deploy to the prod Modal environment, you can run the deploy command above locally (changing the `ENV` value), or you can use the `deploy-prod` GitHub workflow (under the Actions tab). The latter is preferred. Please note: your changes will *not* be reflected in the asta.allen.ai environment unless they are in the prod Modal environment.

## Logs

Logs from the main DV process in Modal are forwarded to the ai2-reviz project in GCP,
where they can be correlated with other activity.
See https://github.com/allenai/nora/wiki/Log-Filtering-and-Navigation

## Remote File Sharing

Remote executions (Docker or Modal) use a shared `FileShareSpec` abstraction to declare which
paths should exist inside the runtime. See `docs/remote_file_sharing.md` for the full guide.
