If you've ever wondered what would happen if you gave an AI a goal and let it figure out how to break the work down, assemble its own team of specialists, and coordinate everything end to end -- that's exactly what OrganAIze is about. This is an experimental side project. It's not production-ready, and it's not trying to be a framework. It's a playground for one question: "How well can reasoning models orchestrate themselves, given nothing but a goal?"
The problem I wanted to explore
Most agentic AI systems today follow a familiar pattern: you design the workflow, you decide which agents exist, you wire the communication between them. That works great for well-defined pipelines.
But what if you don't know the shape of the problem upfront?
OrganAIze flips the approach. You hand it a goal -- something like:
"Build a project plan: research Python web frameworks, design a REST API
for a todo app, write a security checklist"
A single Genesis Agent (the orchestrator) reads the goal, reasons about it, and decides on its own:
- How many specialist agents to spawn
- What role and expertise each one needs
- Whether they can run in parallel or need to run sequentially
- What tools each agent is allowed to use
The specialists do their work and report back. The Genesis Agent synthesizes everything into a final output. The whole thing runs, produces results, and writes them to disk -- no human in the loop after the initial prompt.
Here's what a real session looks like:
OrganAIze - Starting session
Goal: Create a comprehensive report on quantum computing
Model: azure/gpt-4o
11:35:31 | genesis | Spawning child QuantumHardwareResearcher (researcher, depth=1)
11:35:31 | genesis | Spawning child QuantumAlgorithmsExplainer (summarizer, depth=1)
11:35:31 | genesis | Spawning child QuantumCybersecurityAnalyst (critic, depth=1)
11:35:31 | genesis | Spawning child QuantumAdvantagePredictor (pm, depth=1)
^ All 4 spawned in parallel
--- Agent Tree ---
[done] Genesis-Orchestrator (role=orchestrator, tokens=26,350)
|-- [done] QuantumHardwareResearcher (role=researcher, tokens=1,094)
|-- [done] QuantumAlgorithmsExplainer (role=summarizer, tokens=922)
|-- [done] QuantumCybersecurityAnalyst (role=critic, tokens=1,179)
+-- [done] QuantumAdvantagePredictor (role=pm, tokens=918)
Total time: 38 seconds (4 agents ran in parallel)
The LLM decided to spawn four parallel agents, gave each a distinct role and name, and then synthesized the results. I didn't tell it to do any of that -- it figured it out from the goal.
How it works - Architecture
Under the hood, OrganAIze is deliberately simple. Everything runs in-memory within a single Python process.
1. The Agent Loop
Every agent (including Genesis) runs as a LangGraph StateGraph with one core cycle:
reason --> tools --> reason --> tools --> ... --> done
The LLM reasons about its task, optionally calls tools (including spawn_agent to create child agents), receives results, reasons again, and eventually produces a final answer. This is the standard ReAct pattern, extended with the ability to spawn other agents as a tool call.
Internally, the graph has three nodes:
- reason_node: calls the LLM with the agent's system prompt and conversation history
- tool_node: executes any tool calls the LLM emitted (including spawn_agent)
- should_continue: a conditional edge that routes back to reason or to END
LLM concurrency is capped at ORGANAIZE_MAX_CONCURRENT_LLM (default: 3) via an asyncio.Semaphore, so even parallel agents don't overwhelm the API.
(See: core/agent.py)
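To make the shape concrete, here is a minimal sketch of that graph. It's illustrative, not the project's actual code: the state type, the `ChatOpenAI` stand-in, and the `execute_tool` placeholder are all my assumptions based on the description above.

```python
import asyncio
from typing import Annotated, TypedDict

from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    # The only state is the growing conversation history.
    messages: Annotated[list, add_messages]

# Cap concurrent LLM calls across ALL agents (ORGANAIZE_MAX_CONCURRENT_LLM).
llm_semaphore = asyncio.Semaphore(3)

# Stand-in model; in practice it would be bound to tools (including
# spawn_agent) via .bind_tools([...]) and routed per role through LiteLLM.
llm = ChatOpenAI(model="gpt-4o")

async def execute_tool(name: str, args: dict) -> str:
    # Placeholder for the project's tool-registry dispatch.
    return f"[{name} ran with {args}]"

async def reason_node(state: AgentState) -> dict:
    async with llm_semaphore:  # parallel agents share this cap
        response = await llm.ainvoke(state["messages"])
    return {"messages": [response]}

async def tool_node(state: AgentState) -> dict:
    results = []
    for call in state["messages"][-1].tool_calls:  # tool calls the LLM emitted
        output = await execute_tool(call["name"], call["args"])
        results.append(ToolMessage(content=output, tool_call_id=call["id"]))
    return {"messages": results}

def should_continue(state: AgentState) -> str:
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END

graph = StateGraph(AgentState)
graph.add_node("reason", reason_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("reason")
graph.add_conditional_edges("reason", should_continue, {"tools": "tools", END: END})
graph.add_edge("tools", "reason")
agent = graph.compile()
```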
2. Agent DNA -- The Blueprint
Every agent is born from an AgentBlueprint -- a Python dataclass that defines everything about the agent:
- Identity: agent_id (UUID), name, lineage (ancestry chain)
- Persona: role (researcher / engineer / critic / ...), persona description,
expertise keywords, verbosity, risk_tolerance
- Capabilities: tools_allowed, tools_denied, model (auto-selected by role if not set)
- Limits: max_steps, spawn_budget (SpawnBudget dataclass)
- Task: task description, success_criteria, output_format
The parent agent fills in the blueprint fields via the spawn_agent tool call. A prompt compiler (compile_system_prompt()) turns the blueprint into a system prompt string that gets injected into the child agent's LLM context. The spawned agent never knows it was "designed" -- it just wakes up with a persona and a job to do.
(See: core/blueprint.py, prompts/agent.py)
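To give a feel for the blueprint's shape, here is a minimal dataclass sketch assembled from the field list above. Field names follow the prose; the defaults, types, and the compile_system_prompt body are assumptions, not the project's actual code.

```python
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class AgentBlueprint:
    # Identity
    agent_id: str = field(default_factory=lambda: str(uuid4()))
    name: str = "unnamed-agent"
    lineage: list = field(default_factory=list)    # ancestry chain of agent ids
    # Persona
    role: str = "researcher"                       # researcher / engineer / critic / ...
    persona: str = ""
    expertise: list = field(default_factory=list)  # expertise keywords
    verbosity: str = "normal"
    risk_tolerance: str = "low"
    # Capabilities
    tools_allowed: list = field(default_factory=list)
    tools_denied: list = field(default_factory=list)
    model: str = ""                                # auto-selected by role if empty
    # Limits
    max_steps: int = 10
    spawn_budget: "SpawnBudget" = None             # sketched in the guardrails section below
    # Task
    task: str = ""
    success_criteria: str = ""
    output_format: str = "markdown"

def compile_system_prompt(bp: AgentBlueprint) -> str:
    # Hypothetical compiler; the real one lives in prompts/agent.py.
    return (
        f"You are {bp.name}, a {bp.role}. {bp.persona}\n"
        f"Expertise: {', '.join(bp.expertise)}\n"
        f"Task: {bp.task}\nSuccess criteria: {bp.success_criteria}\n"
        f"Respond in {bp.output_format}."
    )
```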
3. Parallel Execution
When the LLM decides subtasks are independent, it marks them parallel: true. The system fans them out via asyncio.gather so they run concurrently -- four agents in 38 seconds instead of 2+ minutes sequentially. The LLM can also choose sequential execution when tasks depend on each other.
(See: core/agent.py, _handle_spawn())
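The fan-out itself is a small amount of code. This is a hypothetical sketch of the pattern, with run_child standing in for building a blueprint and running one child's graph to completion:

```python
import asyncio

async def run_child(spec: dict) -> str:
    # Placeholder for spawning and running one child agent to completion.
    await asyncio.sleep(0)  # the real call runs the child's LangGraph
    return f"[result for: {spec['task']}]"

async def handle_spawn(specs: list, parallel: bool) -> list:
    # Hypothetical shape of _handle_spawn in core/agent.py.
    if parallel:
        # Independent subtasks: fan out and await all results at once.
        return list(await asyncio.gather(*(run_child(s) for s in specs)))
    results = []
    for spec in specs:  # dependent subtasks: strict order
        results.append(await run_child(spec))
    return results
```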
4. Communication Model -- Direct Output Flow
No shared memory. No pub/sub. No blackboard.
Child agents return their results to their parent as ToolMessage objects in the LangGraph conversation. The parent's LLM sees these outputs and can:
- Synthesize them into a final response
- Spawn additional agents based on what it learned
- Request more detail by spawning follow-up agents
Each child output is truncated to 3,000 characters before being returned, preventing context window blow-up in the parent agent. Information flows DOWN (parent to child via spawning) and UP (child to parent via tool results). No lateral communication between siblings.
(See: core/agent.py, _handle_spawn())
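The truncation step is worth seeing in isolation, since it's what keeps a chatty child from eating the parent's context. A minimal sketch, assuming a helper like this exists (the function name is mine):

```python
from langchain_core.messages import ToolMessage

MAX_CHILD_OUTPUT = 3_000  # characters, per the description above

def child_result_to_tool_message(output: str, tool_call_id: str) -> ToolMessage:
    # Truncate the child's answer so it can't blow up the parent's context.
    if len(output) > MAX_CHILD_OUTPUT:
        output = output[:MAX_CHILD_OUTPUT] + "\n[truncated]"
    return ToolMessage(content=output, tool_call_id=tool_call_id)
```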
Preventing runaway agents - The cost and safety guardrails
Without guardrails, a self-spawning system could create infinite agents and burn through your API budget in seconds. I addressed this with three independent layers of spawn control plus token-level budget enforcement.
Layer 1: Depth Limit (Vertical Control)
Agents can only nest so deep. The default max depth is 4 (configurable via ORGANAIZE_MAX_DEPTH). An agent at depth 4 doesn't even see the spawn_agent tool -- it's added to tools_denied, so the LLM never has the option.
Internally, each child's SpawnBudget gets max_depth_remaining = parent's value - 1. The can_spawn property checks max_children > 0, max_depth_remaining > 0, and remaining_global > 0 -- all three must pass.
(See: core/blueprint.py SpawnBudget, core/spawner.py)
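As a sketch, the budget check described above fits in a small dataclass. Fields and the can_spawn logic are taken from the prose; everything else is an assumption:

```python
from dataclasses import dataclass

@dataclass
class SpawnBudget:
    max_children: int         # how many children this agent may still spawn
    max_depth_remaining: int  # levels of nesting still allowed below this agent
    remaining_global: int     # session-wide agent slots still available

    @property
    def can_spawn(self) -> bool:
        # All three limits must have headroom; any one at zero blocks spawning.
        return (self.max_children > 0
                and self.max_depth_remaining > 0
                and self.remaining_global > 0)
```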
Layer 2: Budget Halving (Horizontal Control)
Each agent gets a spawn budget that's half its parent's. This naturally creates a tree that tapers:
Depth 0 (Genesis): 8 children (ORGANAIZE_GENESIS_BUDGET)
Depth 1: 4 children (8 / 2)
Depth 2: 2 children (4 / 2)
Depth 3: 1 child (2 / 2)
Depth 4: 0 (cannot spawn)
Each successful spawn decrements the parent's budget in memory.
(See: core/spawner.py)
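Reusing the SpawnBudget sketch above, the halving rule is one small derivation. derive_child_budget is a hypothetical helper name, but the arithmetic matches the table:

```python
def derive_child_budget(parent: SpawnBudget, remaining_global: int) -> SpawnBudget:
    # Each child gets half the parent's fan-out and one less level of
    # depth, so the tree tapers on its own: 8 -> 4 -> 2 -> 1 -> 0.
    return SpawnBudget(
        max_children=parent.max_children // 2,
        max_depth_remaining=parent.max_depth_remaining - 1,
        remaining_global=remaining_global,
    )
```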
Layer 3: Global Agent Cap (Absolute Control)
Regardless of individual budgets, once 20 agents (configurable via ORGANAIZE_MAX_AGENTS) have been created in a session, all spawning stops. The Spawner tracks agents_created as a simple counter. This is the absolute backstop.
(See: core/spawner.py)
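The backstop amounts to a counter check before every spawn. A sketch of the idea (the method name is hypothetical; only the agents_created counter is from the source):

```python
class Spawner:
    """Sketch of the absolute backstop described above."""

    def __init__(self, max_agents: int = 20):  # ORGANAIZE_MAX_AGENTS
        self.max_agents = max_agents
        self.agents_created = 0

    def try_reserve_slot(self) -> bool:
        # Once the session-wide cap is hit, every spawn request is refused,
        # regardless of what individual budgets would still allow.
        if self.agents_created >= self.max_agents:
            return False
        self.agents_created += 1
        return True
```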
Token Budget
Every session has a token cap (default: 500,000 tokens, configurable via ORGANAIZE_MAX_SESSION_TOKENS). A shared CostTracker monitors every LLM call across all agents. When the session total hits the cap, a TokenCapExceeded exception halts the run gracefully.
You can also set this per run from the CLI:
python main.py --max-tokens 100000 "Your goal here"
(See: core/cost_tracker.py)
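Conceptually, the tracker is a shared accumulator that every LLM call reports into. This sketch also keeps the per-agent breakdown described in the next subsection; the record() signature is my assumption, while CostTracker and TokenCapExceeded are named in the source:

```python
from collections import defaultdict

class TokenCapExceeded(Exception):
    """Raised when the session's token budget is exhausted."""

class CostTracker:
    def __init__(self, max_session_tokens: int = 500_000):
        self.max_session_tokens = max_session_tokens
        self.total_tokens = 0
        self.per_agent = defaultdict(lambda: {"input": 0, "output": 0, "calls": 0})

    def record(self, agent_id: str, input_tokens: int, output_tokens: int) -> None:
        # Every LLM call, from every agent, flows through this one method.
        stats = self.per_agent[agent_id]
        stats["input"] += input_tokens
        stats["output"] += output_tokens
        stats["calls"] += 1
        self.total_tokens += input_tokens + output_tokens
        if self.total_tokens >= self.max_session_tokens:
            raise TokenCapExceeded(
                f"Session hit token cap of {self.max_session_tokens:,}")
```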
Per-Agent Accounting
Every LLM call is tracked per agent: input tokens, output tokens, total tokens, number of calls. At the end of a session, you get a full breakdown:
--- Token Summary ---
Total input tokens: 23,839
Total output tokens: 6,624
Total tokens: 30,463
Total LLM calls: 11
Tokens remaining: 69,537
The session.json output file contains per-agent token data so you can audit exactly where costs went. No surprises on your bill.
Keeping costs down - Model tiering
Not every agent needs a heavyweight reasoning model. A summarizer doesn't need the same reasoning power as an orchestrator. OrganAIze supports role-based model routing via the ROLE_MODEL_DEFAULTS dictionary in config.py:
| Role | Suggested Model | Rationale |
|------|-----------------|-----------|
| Orchestrator / Genesis | gpt-4o | Needs strong reasoning for decomposition |
| Researcher | gpt-4o | Needs good synthesis of search results |
| Engineer | gpt-4o | Code generation benefits from better models |
| Critic / QA | gpt-4o-mini | Checking is cheaper than creating |
| Summarizer | gpt-4o-mini | Compression is a lightweight task |
| PM / Tracker | gpt-4o-mini | Status tracking, minimal reasoning needed |
The parent agent can override model selection per child in the spawn_agent call. The system uses LiteLLM under the hood, so you can point it at OpenAI, Azure OpenAI, Anthropic, Ollama, or any LiteLLM-compatible provider -- just change the model prefix in your environment variable.
(See: config.py ROLE_MODEL_DEFAULTS, get_llm_kwargs())
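The routing table could look something like this. The role keys mirror the roles in the session log above, and the azure/ prefix matches the model shown there, but the exact contents and the resolve_model helper are assumptions:

```python
# Hypothetical shape of ROLE_MODEL_DEFAULTS in config.py. With LiteLLM,
# switching providers is just a prefix change, e.g. "azure/gpt-4o",
# "gpt-4o" (OpenAI), or "ollama/llama3".
ROLE_MODEL_DEFAULTS = {
    "orchestrator": "azure/gpt-4o",
    "researcher":   "azure/gpt-4o",
    "engineer":     "azure/gpt-4o",
    "critic":       "azure/gpt-4o-mini",
    "summarizer":   "azure/gpt-4o-mini",
    "pm":           "azure/gpt-4o-mini",
}

def resolve_model(role: str, override: str = "") -> str:
    # A blueprint's explicit model wins; otherwise fall back to the role default.
    return override or ROLE_MODEL_DEFAULTS.get(role, "azure/gpt-4o-mini")
```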
Tool Safety
Agents only get the tools their blueprint specifies -- it's an allowlist, not a blocklist. Three checks run before every tool execution:
- Deny list: if the tool is in tools_denied, it's rejected immediately (e.g., spawn_agent for depth-4 agents).
- Allow list: if the tool isn't in tools_allowed, it's rejected. Agents can't discover tools outside their blueprint.
- Restricted tools: dangerous tools (shell_exec, file_delete, network_request_external, code_execute) are only available to agents at depth <= 1 by default. Even if a parent blueprint lists them, deeper agents are blocked.
(See: tools/tool_registry.py, config.py RESTRICTED_TOOLS)
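Composed, the three checks are a short gate that runs before every tool execution. A sketch, assuming a blueprint with the tools_allowed / tools_denied fields from earlier (is_tool_allowed is a hypothetical name; the tool names come from the source):

```python
RESTRICTED_TOOLS = {
    "shell_exec", "file_delete", "network_request_external", "code_execute",
}
RESTRICTED_MAX_DEPTH = 1  # restricted tools only at depth <= 1 by default

def is_tool_allowed(tool: str, blueprint, depth: int) -> bool:
    # Check 1: explicit deny list always wins.
    if tool in blueprint.tools_denied:
        return False
    # Check 2: allowlist, not blocklist. Anything not granted is rejected.
    if tool not in blueprint.tools_allowed:
        return False
    # Check 3: dangerous tools are blocked for deep agents,
    # even if a parent blueprint listed them.
    if tool in RESTRICTED_TOOLS and depth > RESTRICTED_MAX_DEPTH:
        return False
    return True
```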
Session Output
Every session writes to output/<session_id>/:
- session.json: Full metadata including agent tree, per-agent token breakdown,
timing, model used, configuration, and the original goal.
- output.md: The final synthesized markdown output from the Genesis Agent.
You get a complete audit trail of what happened, which agents were created, how much each one consumed, and the final deliverable.
Try it yourself
Check out the project page and grab the code:
Website: OrganAIze
GitHub: OrganAIze
This is an experiment -- and I want your thoughts and ideas
This is a side project born out of curiosity. It works, it's fun to watch, and it produces surprisingly useful results for the amount of code involved. But there's a lot more to explore:
- Memory across sessions: Could agents learn from previous runs and improve
their decomposition strategies over time?
- Human-in-the-loop checkpoints: Let users approve spawn plans before
execution, or intervene when the tree grows unexpectedly.
- Multi-model strategies: Use different providers for different agent roles
to optimize cost and quality simultaneously.
- Evaluation frameworks: How do you measure whether a self-organizing system
is actually doing a good job? What does "good decomposition" even mean?
- Real-time visualization: A UI that shows the agent tree growing in real
time, with token counts and status updates per node.
- Streaming output: Stream partial results as agents complete, instead of
waiting for the full tree to finish.
If any of this interests you -- or if you have ideas I haven't thought of -- I'd love to hear from you. Submit a PR, or just drop a message. This is an open experiment and every idea makes it better.
Closing thoughts
OrganAIze is really about one bet: reasoning models are good enough now to decompose problems, assign work, and synthesize results -- all on their own. The role of the developer shifts from designing agent workflows to setting constraints and guardrails, then letting the model figure out the execution plan.
Is it perfect? No. Is it interesting? I think so.
The goal isn't to replace carefully designed multi-agent systems. It's to explore the boundary of what's possible when you trust the model with orchestration decisions and focus your engineering effort on safety, cost control, and observability instead.
Give it a try, break it, improve it, and let me know what you find.