Core Concepts
The Agent is the atom. Everything else — teams, workflows, database agents, API
agents — is a composition or specialization of Agent, never a parallel system.
┌─────────────────────────────┐
Your code ─────► │ Agent │ (public API, sync + async)
└──────────────┬──────────────┘
│ orchestrates
┌───────────┬───────────┬────────┼────────┬───────────┬───────────┐
▼ ▼ ▼ ▼ ▼ ▼ ▼
Providers Tools Memory RAG Context Cost Tracing
(LLM calls) (functions) (history) (retrieval)(prompt) (tokens→$) (spans)
Higher-order compositions, all built ON TOP of Agent (never around it):
Workflow ── DAG of steps Team ── collaborating agents Server ── agent.serve()
DatabaseAgent ── NL→SQL APIAgent ── call HTTP APIs Integrations ── Shopify/SAP/…
The agent run loop
When you call agent.run(prompt) (sync) or await agent.arun(prompt) (async):
1. Assemble context → system prompt + memory history + (optional) retrieved RAG context + prompt
2. Call the provider → normalized LLM request (with your tools' JSON schemas)
3. Tool calls? → validate args, execute the tool, feed the result back ──┐
(loop, bounded by max_steps) ◄────────────────────────┘
4. Final answer → return AgentResponse(content, messages, usage, steps, trace_id)
5. Along the way → record a span tree (trace) + token usage → cost; persist to memory
Tools auto-generate their JSON schema from your function's signature, type hints, and docstring — you never hand-author schemas. Every run is traced and costed.
1. Agents
from neuroagent import Agent
agent = Agent(provider="openai", model="gpt-4.1", system_prompt="Answer in one sentence.")
resp = agent.run("Who wrote Pride and Prejudice?")
print(resp.content) # final text
print(resp.usage) # Usage(prompt_tokens=..., completion_tokens=...)
print(resp.steps) # number of model/tool iterations
print(resp.trace_id) # links to this run's trace
# Async + streaming variants
import asyncio
asyncio.run(agent.arun("...")) # await agent.arun(...)
# agent.ask(...) / agent.aask(...) are aliases that read naturally with RAG
2. Providers
One API, many models. Switch with a string:
Agent(provider="openai", model="gpt-4.1")
Agent(provider="anthropic", model="claude-opus-4-8")
Agent(provider="gemini", model="gemini-2.0-flash")
Agent(provider="groq", model="llama-3.3-70b-versatile")
Agent(provider="echo") # offline, no key — for tests & demos
OpenAI-compatible endpoints (Azure OpenAI, OpenRouter, Together, vLLM, …) work via
base_url. You can also register a custom provider:
from neuroagent.providers import register_provider, LLMProvider
# class MyProvider(LLMProvider): ... (implement complete/stream/embed)
register_provider("myprovider", MyProvider)
Agent(provider="myprovider")
3. Tools
Decorate any function with @tool. The agent decides when to call it, runs it, and
continues reasoning with the result. Sync and async tools are both supported.
from neuroagent import Agent, tool
@tool
def get_weather(city: str) -> str:
"""Return the current weather for a city."""
return f"It's 22°C and sunny in {city}."
@tool(read_only=True)
async def search_docs(query: str) -> str:
"""Search internal documentation."""
...
agent = Agent(provider="openai", tools=[get_weather, search_docs])
print(agent.run("What should I wear in Paris today?").content)
# Add tools after construction:
agent.add_tool(get_weather)
4. Streaming
Stream tokens as they arrive with astream:
import asyncio
from neuroagent import Agent
async def main():
agent = Agent(provider="openai")
async for token in agent.astream("Tell me a short story about a robot."):
print(token, end="", flush=True)
asyncio.run(main())
NeuroAgent AI