Overview
Prompt playgrounds are too simple
Prompt playgrounds work for single prompts. You edit text, test it, deploy it.
Real applications are not single prompts. A RAG pipeline retrieves documents, ranks them, generates a response. An agent picks tools and chains steps. A classifier runs multiple prompts in parallel and combines results.
These applications have many parameters: retrieval settings, model choices, thresholds, aggregation rules. A prompt playground cannot represent them.
The problem
When subject matter experts want to improve prompts, they need engineers. Every change requires editing code, deploying, testing. The back-and-forth is slow.
Evaluations run on isolated prompts. But a prompt that scores well alone might fail when combined with retrieval or aggregation. You do not know until production.
When something finally works, someone copies the configuration to production code. There is no shared source of truth. Things drift.
Custom workflows
Custom workflows let you build a playground for your actual application.
You wrap your existing code with a thin interface. Agenta calls your code and shows a UI for the configuration. Subject matter experts edit prompts and parameters. They see real outputs. Engineers own the business logic.
This gives you:
- SMEs iterate on the real workflow, not an approximation
- Evaluations test the complete pipeline
- Configuration is versioned
- Production uses the same configuration that was tested
How it works
You run your own server with the Agenta SDK. Agenta calls your server with the current configuration. Your code runs and returns the result.
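A minimal sketch of such a service, assuming the `@ag.route` decorator and `ag.ConfigManager.get_from_route` helper from the Agenta SDK docs (exact names may differ by SDK version), with the gateway call stubbed out:

```python
import agenta as ag
from pydantic import BaseModel, Field

ag.init()  # reads Agenta credentials (API key, host) from the environment


class Config(BaseModel):
    prompt_template: str = Field(default="Summarize: {text}")
    temperature: float = Field(default=0.2, ge=0.0, le=1.0)


def call_llm(prompt: str, temperature: float) -> str:
    """Stand-in for a call to your own LLM gateway."""
    raise NotImplementedError


# Agenta invokes this route with whatever configuration is currently
# set in the playground. The entry point must return a str.
@ag.route("/", config_schema=Config)
def generate(text: str) -> str:
    config = ag.ConfigManager.get_from_route(schema=Config)
    prompt = config.prompt_template.format(text=text)
    return call_llm(prompt, temperature=config.temperature)
```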
In development, you run the service on your infrastructure and expose it (using ngrok or similar) so that subject matter experts can use the playground. Agenta sends requests to your service with the configuration. Your service runs the business logic and calls your LLM gateway.
In production, you fetch the configuration from Agenta and run independently. Agenta is not in your request path. See Integration for details.
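The production side, in sketch form. The `ConfigManager.get_from_registry` helper follows the pattern in the Agenta docs, and the app and environment slugs here are placeholders:

```python
import agenta as ag

ag.init()

# Fetch the deployed configuration, e.g. at startup or per request.
# "my-app" and "production" are placeholder slugs.
config = ag.ConfigManager.get_from_registry(
    app_slug="my-app",
    environment_slug="production",
)

# From here on, Agenta is out of the request path: your code runs the
# pipeline with the fetched parameters and calls your own LLM gateway.
```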
What you can configure
The SDK provides types for common parameters:
- PromptTemplate: Messages, model, temperature, and other LLM settings
- Floats and integers: Thresholds and limits, with range validation
- Booleans: Flags and toggles
- Choices: Dropdowns for model selection or algorithm variants
You define configuration as a Pydantic model. Agenta renders UI controls for each field.
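A sketch of a config model that combines these types. The import path for `PromptTemplate` and the `MCField` choices helper are assumptions taken from the Agenta docs; check your SDK version:

```python
import agenta as ag
from pydantic import BaseModel, Field

# Assumed import path; may differ by SDK version.
from agenta.sdk.types import MCField, PromptTemplate


class RAGConfig(BaseModel):
    # Rendered as a prompt editor (messages, model, temperature, ...)
    prompt: PromptTemplate = PromptTemplate(
        system_prompt="Answer using only the provided context."
    )
    # Floats and integers with bounds become validated numeric inputs
    top_k: int = Field(default=5, ge=1, le=20)
    score_threshold: float = Field(default=0.7, ge=0.0, le=1.0)
    # Booleans become toggles
    rerank: bool = Field(default=True)
    # Choices become dropdowns
    retriever: str = MCField(default="bm25", choices=["bm25", "dense", "hybrid"])
```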
FAQ
What languages are supported?
Python only for now.
Does my function need to return a specific type?
Yes. The entry point function must return a str.
Does Agenta host my workflow?
No. You run your own server. Agenta connects to it via the URL you provide.
How long does setup take?
About an hour with existing code. Less with a template.
Can I use my existing framework?
Yes. LangChain, LlamaIndex, Haystack, or plain Python. Any model provider.
What about my LLM gateway?
Your code calls your gateway directly. Agenta manages configuration. It does not proxy LLM calls.