As a product manager, I don’t spend most of my time managing products. I suspect most data developers (analysts, engineers, scientists, etc.) don’t spend most of their time writing data code. Most technical jobs involve many auxiliary tasks. These include:
searching the Internet for information
reading, summarizing, and synthesizing information
performing boring computer tasks
translating between different languages (e.g. SQL and Python; English and Spanish)
copying and modifying existing code
querying some basic information from data platforms
What if we could, through natural language, have a bot perform many of these tasks (in addition to basic data analysis) on our behalf?
We’re using Python, let’s use Python
We’re already using Python for Ibis and Marvin. Let’s use it for auxiliary tools, too. We’ll set up our data and AI platform connections and some simple example data to work with.
Connect to the data and load a table into a variable.
Filesystem tools
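As a rough illustration of what filesystem tools for a bot could look like, here is a sketch built only on the standard library. The function names `list_files`, `read_file`, and `write_file` are hypothetical and chosen for this example; they are not necessarily the tools the bot ships with.

```python
from pathlib import Path


def list_files(directory: str = ".", pattern: str = "*") -> list[str]:
    """Return the sorted paths of files in `directory` that match `pattern`."""
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())


def read_file(path: str) -> str:
    """Return the contents of a UTF-8 text file."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Write `content` to `path`, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return str(target)
```

Keeping each tool a small, typed, documented function is a deliberate choice in this sketch: the signature and docstring are what a language model would likely see when deciding which tool to call.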
Internet tools
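A sketch of a simple Internet tool, again using only the standard library: fetch a page and reduce it to plain text that a language model can read. The names `html_to_text` and `fetch_text`, and the `example-bot/0.1` user agent, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download `url` and return its visible text (requires network access)."""
    request = Request(url, headers={"User-Agent": "example-bot/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

Separating parsing from fetching keeps `html_to_text` testable offline, while `fetch_text` is the only part that touches the network.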
AI-powered tools
Introducing Ibis Birdbrain
Introduce the bot. Need to overview the tools here I think, but should probably skip most details.
A comparison with MLOps
TODO: point on how most of the work is not ML
Before “MLOps” was a standard term, the 2015 paper by Sculley et al. described the key issues with building real-world ML systems.
Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
You can substitute “LLM” for “ML” in the above figure.
The “toy problem” problem
ML and LLMs are cool! They’re fun to play with and it’s easy to get distracted with fun applications. Often, ML is learned through solving toy problems, and …
The application landscape is vast
…and thus requires modular, interoperable, customizable, and extensible tools. TODO: more comparison to MLOps.
Next steps
You can get involved with Ibis Birdbrain, our open-source data & AI project for building next-generation natural language interfaces to data.