Auxiliary tools

LLMs and data
Author

Cody Peterson

Published

October 16, 2023

Introduction

As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. Most technical jobs involve many auxiliary tasks. These include:

  • searching the Internet for information
  • reading, summarizing, and synthesizing information
  • performing boring computer tasks
  • translating between different languages (e.g. SQL and Python; English and Spanish)
  • copying and modifying existing code
  • querying basic information from data platforms

What if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?

We’re using Python, let’s use Python

We’re already using Python for Ibis and Marvin, so let’s use it for auxiliary tools too. We’ll set up our data and AI platform connections and some simple example data to work with.

import ibis
import marvin

from dotenv import load_dotenv

load_dotenv()

con = ibis.connect("duckdb://penguins.ddb")
t = ibis.examples.penguins.fetch()
t = con.create_table("penguins", t.to_pyarrow(), overwrite=True)

  1. Import the libraries we need.
  2. Load the environment variables so Marvin can call our OpenAI account.
  3. Set up the demo data in an Ibis backend.

import ibis
import marvin

from ibis.expr.schema import Schema
from ibis.expr.types.relations import Table

ibis.options.interactive = True
marvin.settings.llm_model = "openai/gpt-4"

con = ibis.connect("duckdb://penguins.ddb")
t = con.table("penguins")

  1. Import Ibis and Marvin.
  2. Configure Ibis (interactive) and Marvin (GPT-4).
  3. Connect to the data and load a table into a variable.

Filesystem tools
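
A filesystem tool can be as small as a plain Python function that a bot is allowed to call on our behalf, which covers some of the "boring computer tasks" listed in the introduction. The following is a minimal sketch under that assumption, using only the standard library. The function names `list_files`, `read_file`, and `write_file` are hypothetical and are not taken from Ibis Birdbrain.

```python
from pathlib import Path


def list_files(directory: str = ".", pattern: str = "*") -> list[str]:
    """List the names of files (not directories) in `directory` matching a glob pattern."""
    return sorted(p.name for p in Path(directory).glob(pattern) if p.is_file())


def read_file(path: str, max_chars: int = 2000) -> str:
    """Read a text file, truncating the result so it stays small enough for a prompt."""
    return Path(path).read_text(encoding="utf-8")[:max_chars]


def write_file(path: str, content: str) -> str:
    """Write text to a file and return a short confirmation message."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"
```

Returning short strings rather than raw objects keeps each result easy to pass back to a language model, and the `max_chars` cutoff is one simple way to keep large files from overflowing a prompt.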

Internet tools
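
Searching and reading the Internet were the first auxiliary tasks listed in the introduction. As a rough sketch of what an Internet tool might look like, the functions below fetch a page and reduce it to plain text using only the standard library. The names `html_to_text` and `fetch_text` are hypothetical, and a real tool would likely need better error handling and a more robust HTML parser.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Convert an HTML document to a single line of whitespace-joined text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_text(url: str, max_chars: int = 2000) -> str:
    """Download a page and return its visible text, truncated to `max_chars`."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return html_to_text(html)[:max_chars]
```

Keeping the parsing step separate from the network call means `html_to_text` can be exercised on a literal string without any network access.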

AI-powered tools

Introducing Ibis Birdbrain

TODO: introduce the bot and give an overview of its tools, skipping most details.

A comparison with MLOps

TODO: point on how most of the work is not ML

Before “MLOps” was a standard term, the 2015 paper by Sculley et al., “Hidden Technical Debt in Machine Learning Systems”, described the key issues with building real-world ML systems.

Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.

You can substitute “LLM” for “ML” in the above figure.

The “toy problem” problem

ML and LLMs are cool! They’re fun to play with, and it’s easy to get distracted by fun applications. Often, ML is learned through solving toy problems, and …

The application landscape is vast

…and thus requires modular, interoperable, customizable, and extensible tools. TODO: more comparison to MLOps.

Next steps

You can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.
