Computations and control flow: it’s just programming
LLMs and data
Author
Cody Peterson
Published
October 14, 2023
Introduction
The recent Generative AI hype cycle has led to a lot of new terminology to understand. In this post, we’ll cover some key concepts from the ground up and explain the basics of working with LLMs in the context of data.
Connect to the data and load a table into a variable.
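A minimal sketch of that setup, using Ibis’s built-in penguins example dataset (the data source is an assumption; any backend Ibis supports would work the same way):

import ibis

ibis.options.interactive = True  # execute and display expressions eagerly

# load the example penguins table into a variable
# (assumption: the original used this or a similar dataset)
t = ibis.examples.penguins.fetch()
t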
Context
Context is a fancy way of talking about the input to an LLM.
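Concretely, context can be as simple as a string we plan to send to the model (a hypothetical example):

context = "describe the penguins table in one sentence"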
Calls
We make calls with inputs to functions or systems and get outputs. We can think of calling the LLM with our input (context) and getting an output (text).
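A minimal sketch of such a call, using Marvin’s ai_fn decorator (the same library used in the code below; the answer function here is illustrative, and an OpenAI API key is assumed to be configured):

import marvin

@marvin.ai_fn
def answer(text: str) -> str:
    """answers the question in the text"""

# call the LLM with our input (context) and get an output (text)
answer(context)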
Computations
A function or system often computes something. We can be pedantic about calls versus computations, but in general the connotation is that a computation is more time- and resource-intensive than a call. At the end of the day, both take some compute cycles.
Retrieval augmented generation (RAG)
Instead of you typing out context for the bot, we can retrieve context from somewhere, augment the strings we send to the bot with that context, and then generate a response from the bot.
from ibis.expr.schema import Schema
from ibis.expr.types.relations import Table


@marvin.ai_fn
def sql_select(
    text: str,
    table_name: str = t.get_name(),
    schema: Schema = t.schema(),
) -> str:
    """writes the SQL SELECT statement to query the table according to the text"""


query = "the unique combination of species and islands"
sql = sql_select(query).strip(";")
sql
'SELECT DISTINCT species, island FROM penguins'
Notice that we retrieved the table name and schema with calls to the Ibis table (t.get_name() and t.schema()). We then augmented our context (the query in natural language) with this information and generated a response from the bot.
This works reasonably well for simple SQL queries:
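For instance, we can run the generated statement back against the table’s backend (a sketch: ibis.get_backend grabs the connection behind t, and the backend’s sql method executes raw SQL as a new table expression):

con = ibis.get_backend(t)  # the connection behind the table
con.sql(sql)  # execute the generated SELECT and display the result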
I would argue that in this case there wasn’t any real computation done by our calls to the Ibis table; we were just retrieving some relatively static metadata. But we could have done more complex computations (on any of the 18+ data platforms Ibis supports).
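To make “more complex computations” concrete, here is an illustrative sketch (not from the original post) of an aggregation Ibis would execute on the backend, whose result could itself be stringified and fed to the bot as context:

# a real computation: aggregate the table on the backend
stats = t.group_by("species").agg(
    count=t.count(),
    avg_body_mass_g=t["body_mass_g"].mean(),
)

# hypothetical: stringify the result to augment the bot's context
stats_context = str(stats.to_pandas())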
Thought leadership
In the realm of Generative AI, particularly when working with large language models (LLMs), understanding the concept of ‘context’ is crucial. Context, in this domain, refers to the input fed into an LLM, from which it generates an output. This post breaks down the complexities of this process into understandable fragments, including retrieval of context, its augmentation, and, thereafter, the generation of a response.
An illustrative example is provided, showcasing a database interaction. It demonstrates how retrieved data can be used to augment the context before the bot generates a response. This practical application reinforces the theory for readers.
We also venture into the difference between simple static metadata retrieval and more intricate computations. This distinction reflects the breadth and depth of the processes involved in Generative AI.
As we continue to explore and unravel the potential of Generative AI and LLMs, this post serves as a fundamental building block. It creates a pathway for enthusiasts and professionals alike to delve deeper into this exciting field. By breaking down complex concepts into comprehensible segments, it fosters an environment of learning and growth.
This marks just the beginning of our journey into the world of Generative AI. As we dig deeper, we will continue to explore, learn and share with our readers. Stay tuned for more insightful content in this series. [1]