
Microsoft Foundry Blog
7 MIN READ

OrganAIze - What Happens When You Let AI Agents Organize Themselves?

akashchekka
Microsoft
Apr 28, 2026

If you've ever wondered what would happen if you gave an AI a goal and let it figure out how to break the work down, assemble its own team of specialists, and coordinate everything on its own - that's exactly what OrganAIze is about. This is an experimental side project. It's not production-ready, and it's not trying to be a framework. It's a playground for one question: "How well can reasoning models orchestrate themselves, given nothing but a goal?"

The problem I wanted to explore

Most agentic AI systems today follow a familiar pattern: you design the workflow, you decide which agents exist, you wire the communication between them. That works great for well-defined pipelines.

 

But what if you don't know the shape of the problem upfront?

 

OrganAIze flips the approach. You hand it a goal -- something like:

  "Build a project plan: research Python web frameworks, design a REST API

   for a todo app, write a security checklist"

 

A single Genesis Agent (the orchestrator) reads the goal, reasons about it, and decides on its own:

  - How many specialist agents to spawn

  - What role and expertise each one needs

  - Whether they can run in parallel or need to run sequentially

  - What tools each agent is allowed to use

 

The specialists do their work and report back. The Genesis Agent synthesizes everything into a final output. The whole thing runs, produces results, and writes them to disk -- no human in the loop after the initial prompt.

Here's what a real session looks like:

  OrganAIze - Starting session

     Goal:       Create a comprehensive report on quantum computing
     Model:      azure/gpt-4o

  11:35:31 | genesis  | Spawning child QuantumHardwareResearcher (researcher, depth=1)
  11:35:31 | genesis  | Spawning child QuantumAlgorithmsExplainer (summarizer, depth=1)
  11:35:31 | genesis  | Spawning child QuantumCybersecurityAnalyst (critic, depth=1)
  11:35:31 | genesis  | Spawning child QuantumAdvantagePredictor (pm, depth=1)
                        ^ All 4 spawned in parallel

  --- Agent Tree ---

  [done] Genesis-Orchestrator (role=orchestrator, tokens=26,350)
    |-- [done] QuantumHardwareResearcher (role=researcher, tokens=1,094)
    |-- [done] QuantumAlgorithmsExplainer (role=summarizer, tokens=922)
    |-- [done] QuantumCybersecurityAnalyst (role=critic, tokens=1,179)
    +-- [done] QuantumAdvantagePredictor (role=pm, tokens=918)

  Total time: 38 seconds (4 agents ran in parallel)

The LLM decided to spawn four parallel agents, gave each a distinct role and name, and then synthesized the results. I didn't tell it to do any of that -- it figured it out from the goal.

How it works: Architecture

Under the hood, OrganAIze is deliberately simple. Everything runs in-memory within a single Python process.

1. The Agent Loop

Every agent (including Genesis) runs as a LangGraph StateGraph with one core cycle:

  reason --> tools --> reason --> tools --> ... --> done

The LLM reasons about its task, optionally calls tools (including spawn_agent to create child agents), receives results, reasons again, and eventually produces a final answer. This is the standard ReAct pattern, extended with the ability to spawn other agents as a tool call.

 

Internally, the graph has three nodes:

  - reason_node: calls the LLM with the agent's system prompt and conversation history

  - tool_node: executes any tool calls the LLM emitted (including spawn_agent)

  - should_continue: a conditional edge that routes back to reason or to END
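
The cycle can be sketched without LangGraph in a few lines -- a minimal, hand-rolled version of the same routing logic. The stub LLM and the `search` tool below are hypothetical stand-ins, not OrganAIze code:

```python
# Hand-rolled sketch of the reason -> tools -> reason cycle. The real
# project uses a LangGraph StateGraph; this mimics the routing only.

def run_agent(task, llm, tools, max_steps=10):
    """Reason, execute any tool calls, repeat until a final answer."""
    history = [("user", task)]
    for _ in range(max_steps):
        reply = llm(history)                      # reason_node
        if reply.get("tool_calls"):               # tool_node
            for call in reply["tool_calls"]:
                result = tools[call["name"]](**call["args"])
                history.append(("tool", result))
        else:                                     # should_continue -> END
            return reply["content"]
    return None                                   # step budget exhausted

# Stub LLM: asks for one tool call, then answers once it sees the result.
def stub_llm(history):
    if not any(role == "tool" for role, _ in history):
        return {"tool_calls": [{"name": "search", "args": {"q": "todo apps"}}]}
    return {"content": "final answer", "tool_calls": None}

print(run_agent("demo task", stub_llm, {"search": lambda q: f"results for {q}"}))
```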

 

LLM concurrency is capped at ORGANAIZE_MAX_CONCURRENT_LLM (default: 3) via an asyncio.Semaphore, so even parallel agents don't overwhelm the API.

  (See: core/agent.py)

2. Agent DNA -- The Blueprint

Every agent is born from an AgentBlueprint -- a Python dataclass that defines everything about the agent:

 

  - Identity:      agent_id (UUID), name, lineage (ancestry chain)

  - Persona:       role (researcher / engineer / critic / ...), persona description,

                   expertise keywords, verbosity, risk_tolerance

  - Capabilities:  tools_allowed, tools_denied, model (auto-selected by role if not set)

  - Limits:        max_steps, spawn_budget (SpawnBudget dataclass)

  - Task:          task description, success_criteria, output_format

 

The parent agent fills in the blueprint fields via the spawn_agent tool call. A prompt compiler (compile_system_prompt()) turns the blueprint into a system prompt string that gets injected into the child agent's LLM context. The spawned agent never knows it was "designed" -- it just wakes up with a persona and a job to do.
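
A stripped-down sketch of what the blueprint and prompt compiler could look like -- the field names follow the post, but the defaults and prompt wording here are illustrative, not the real core/blueprint.py:

```python
from dataclasses import dataclass, field

@dataclass
class AgentBlueprint:
    """Illustrative subset of the blueprint fields described above."""
    name: str
    role: str
    task: str
    expertise: list = field(default_factory=list)
    tools_allowed: list = field(default_factory=list)
    tools_denied: list = field(default_factory=list)
    max_steps: int = 10

def compile_system_prompt(bp):
    """Turn a blueprint into the child agent's system prompt."""
    return (
        f"You are {bp.name}, a {bp.role}.\n"
        f"Expertise: {', '.join(bp.expertise) or 'general'}.\n"
        f"Task: {bp.task}\n"
        f"Allowed tools: {', '.join(bp.tools_allowed) or 'none'}."
    )

bp = AgentBlueprint(
    name="QuantumHardwareResearcher",
    role="researcher",
    task="Survey current quantum hardware platforms.",
    expertise=["quantum computing"],
    tools_allowed=["web_search"],
)
print(compile_system_prompt(bp))
```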

  (See: core/blueprint.py, prompts/agent.py)

3. Parallel Execution

When the LLM decides subtasks are independent, it marks them parallel: true. The system fans them out via asyncio.gather so they run concurrently -- four agents in 38 seconds instead of 2+ minutes sequentially. The LLM can also choose sequential execution when tasks depend on each other.
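
The fan-out decision can be sketched like this; `run_child` is a stand-in for executing a real child agent:

```python
import asyncio
import time

async def run_child(name):
    """Stand-in for a child agent doing real work."""
    await asyncio.sleep(0.05)
    return f"{name}: done"

async def run_children(names, parallel):
    if parallel:
        # Independent subtasks fan out concurrently.
        return await asyncio.gather(*(run_child(n) for n in names))
    results = []
    for n in names:               # dependent tasks run in order
        results.append(await run_child(n))
    return results

names = ["hardware", "algorithms", "security", "advantage"]

start = time.perf_counter()
par = asyncio.run(run_children(names, parallel=True))
par_time = time.perf_counter() - start

start = time.perf_counter()
seq = asyncio.run(run_children(names, parallel=False))
seq_time = time.perf_counter() - start

print(f"parallel {par_time:.2f}s vs sequential {seq_time:.2f}s")
```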

  (See: core/agent.py, _handle_spawn())

4. Communication Model -- Direct Output Flow

No shared memory. No pub/sub. No blackboard.

 

Child agents return their results to their parent as ToolMessage objects in the LangGraph conversation. The parent's LLM sees these outputs and can:

  - Synthesize them into a final response

  - Spawn additional agents based on what it learned

  - Request more detail by spawning follow-up agents

 

Each child output is truncated to 3,000 characters before being returned, preventing context window blow-up in the parent agent. Information flows DOWN (parent to child via spawning) and UP (child to parent via tool results). No lateral communication between siblings.
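
A sketch of that cap -- the 3,000-character limit matches the post; the truncation marker is illustrative:

```python
MAX_CHILD_OUTPUT = 3_000  # characters returned to the parent, per the post

def truncate_child_output(text, limit=MAX_CHILD_OUTPUT):
    """Cap a child's result before it becomes a ToolMessage for the parent."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[...output truncated...]"

print(len(truncate_child_output("x" * 10_000)))    # capped near 3,000
print(truncate_child_output("all checks passed"))  # short outputs pass through
```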

  (See: core/agent.py, _handle_spawn())

Preventing runaway agents - The cost and safety guardrails

Without guardrails, a self-spawning system could create infinite agents and burn through your API budget in seconds. I addressed this with three independent layers of spawn control plus token-level budget enforcement.

Layer 1: Depth Limit (Vertical Control)

Agents can only nest so deep. The default max depth is 4 (configurable via ORGANAIZE_MAX_DEPTH). An agent at depth 4 doesn't even see the spawn_agent tool -- it's added to tools_denied, so the LLM never has the option.

Internally, each child's SpawnBudget gets max_depth_remaining = parent's value - 1. The can_spawn property checks max_children > 0, max_depth_remaining > 0, and remaining_global > 0 -- all three must pass.
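
The three-way check can be sketched as follows; the field defaults here are illustrative:

```python
from dataclasses import dataclass

@dataclass
class SpawnBudget:
    """Illustrative version of the three-way spawn check described above."""
    max_children: int = 4
    max_depth_remaining: int = 3
    remaining_global: int = 20

    @property
    def can_spawn(self):
        # All three limits must still have headroom.
        return (self.max_children > 0
                and self.max_depth_remaining > 0
                and self.remaining_global > 0)

print(SpawnBudget().can_spawn)                       # True
print(SpawnBudget(max_depth_remaining=0).can_spawn)  # False: nested too deep
print(SpawnBudget(remaining_global=0).can_spawn)     # False: global cap hit
```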

  (See: core/blueprint.py SpawnBudget, core/spawner.py)

Layer 2: Budget Halving (Horizontal Control)

Each agent gets a spawn budget that's half its parent's. This naturally creates a tree that tapers:

 

  Depth 0 (Genesis):  8 children   (ORGANAIZE_GENESIS_BUDGET)

  Depth 1:            4 children   (8 / 2)

  Depth 2:            2 children   (4 / 2)

  Depth 3:            1 child      (2 / 2)

  Depth 4:            0            (cannot spawn)

Each successful spawn decrements the parent's budget in memory.
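
The halving rule is simple integer division; sketching the taper reproduces the table above:

```python
def child_budget(parent_budget):
    """Each child gets half its parent's spawn budget."""
    return parent_budget // 2

budget, depth = 8, 0  # Genesis starts at ORGANAIZE_GENESIS_BUDGET = 8
taper = []
while budget > 0:
    taper.append((depth, budget))
    budget = child_budget(budget)
    depth += 1

print(taper)  # [(0, 8), (1, 4), (2, 2), (3, 1)]
```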

  (See: core/spawner.py)

Layer 3: Global Agent Cap (Absolute Control)

Regardless of individual budgets, once 20 agents (configurable via ORGANAIZE_MAX_AGENTS) have been created in a session, all spawning stops. The Spawner tracks agents_created as a simple counter. This is the absolute backstop.

  (See: core/spawner.py)

Token Budget

Every session has a token cap (default: 500,000 tokens, configurable via ORGANAIZE_MAX_SESSION_TOKENS). A shared CostTracker monitors every LLM call across all agents. When the session total hits the cap, a TokenCapExceeded exception halts the run gracefully.
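
A sketch of how such a tracker could work -- the class and exception names follow the post, but the accounting logic here is illustrative:

```python
class TokenCapExceeded(Exception):
    """Raised when the shared session cap is hit, halting the run."""

class CostTracker:
    def __init__(self, max_session_tokens=500_000):
        self.max_session_tokens = max_session_tokens
        self.total_tokens = 0
        self.per_agent = {}        # agent_id -> tokens consumed

    def record(self, agent_id, input_tokens, output_tokens):
        used = input_tokens + output_tokens
        self.total_tokens += used
        self.per_agent[agent_id] = self.per_agent.get(agent_id, 0) + used
        if self.total_tokens > self.max_session_tokens:
            raise TokenCapExceeded(
                f"session used {self.total_tokens} of "
                f"{self.max_session_tokens} tokens")

tracker = CostTracker(max_session_tokens=1_000)
tracker.record("genesis", 400, 100)        # fine: 500 total
try:
    tracker.record("child-1", 600, 100)    # pushes past the cap
except TokenCapExceeded as exc:
    print(exc)
```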

You can also set this per run from the CLI:

python main.py --max-tokens 100000 "Your goal here"

  (See: core/cost_tracker.py)

Per-Agent Accounting

Every LLM call is tracked per agent: input tokens, output tokens, total tokens, number of calls. At the end of a session, you get a full breakdown:

 

  --- Token Summary ---
  Total input tokens:  23,839
  Total output tokens: 6,624
  Total tokens:        30,463
  Total LLM calls:     11
  Tokens remaining:    69,537

 

The session.json output file contains per-agent token data so you can audit exactly where costs went. No surprises on your bill.

Keeping costs down - Model tiering

Not every agent needs a heavyweight reasoning model. A summarizer doesn't need the same reasoning power as an orchestrator. OrganAIze supports role-based model routing via the ROLE_MODEL_DEFAULTS dictionary in config.py:

  Role                    Suggested Model   Rationale
  Orchestrator / Genesis  gpt-4o            Needs strong reasoning for decomposition
  Researcher              gpt-4o            Needs good synthesis of search results
  Engineer                gpt-4o            Code generation benefits from better models
  Critic / QA             gpt-4o-mini       Checking is cheaper than creating
  Summarizer              gpt-4o-mini       Compression is a lightweight task
  PM / Tracker            gpt-4o-mini       Status tracking, minimal reasoning needed

 

The parent agent can override model selection per child in the spawn_agent call. The system uses LiteLLM under the hood, so you can point it at OpenAI, Azure OpenAI, Anthropic, Ollama, or any LiteLLM-compatible provider -- just change the model prefix in your environment variable.
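
As a sketch, the mapping mirrors the table above; the lookup helper and its fallback are illustrative, not the real config.py:

```python
# Role -> default model, mirroring the tiering table above.
ROLE_MODEL_DEFAULTS = {
    "orchestrator": "azure/gpt-4o",
    "researcher":   "azure/gpt-4o",
    "engineer":     "azure/gpt-4o",
    "critic":       "azure/gpt-4o-mini",
    "summarizer":   "azure/gpt-4o-mini",
    "pm":           "azure/gpt-4o-mini",
}

def get_model_for_role(role, override=None):
    """A per-child override wins; otherwise fall back to the role default."""
    return override or ROLE_MODEL_DEFAULTS.get(role, "azure/gpt-4o-mini")

print(get_model_for_role("summarizer"))                       # azure/gpt-4o-mini
print(get_model_for_role("critic", override="azure/gpt-4o"))  # azure/gpt-4o
```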

  (See: config.py ROLE_MODEL_DEFAULTS, get_llm_kwargs())

Tool Safety

Agents only get the tools their blueprint specifies -- it's an allowlist, not a blocklist. Three checks run before every tool execution:

  - Deny list: if the tool is in tools_denied, it's rejected immediately
    (e.g., spawn_agent for depth-4 agents).

  - Allow list: if the tool isn't in tools_allowed, it's rejected. Agents
    can't discover tools outside their blueprint.

  - Restricted tools: dangerous tools (shell_exec, file_delete,
    network_request_external, code_execute) are only available to agents at
    depth <= 1 by default. Even if a parent blueprint lists them, deeper
    agents are blocked.
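
The three checks can be sketched as one gate function -- the names follow the post; the exact check order and return values here are illustrative:

```python
RESTRICTED_TOOLS = {"shell_exec", "file_delete",
                    "network_request_external", "code_execute"}
MAX_RESTRICTED_DEPTH = 1   # dangerous tools stay near the root

def tool_permitted(tool, tools_allowed, tools_denied, depth):
    """Run the three checks in order; any failure rejects the call."""
    if tool in tools_denied:               # 1. deny list wins outright
        return False
    if tool not in tools_allowed:          # 2. allowlist, not blocklist
        return False
    if tool in RESTRICTED_TOOLS and depth > MAX_RESTRICTED_DEPTH:
        return False                       # 3. depth gate on dangerous tools
    return True

print(tool_permitted("web_search", ["web_search"], [], depth=3))                 # True
print(tool_permitted("spawn_agent", ["spawn_agent"], ["spawn_agent"], depth=4))  # False
print(tool_permitted("shell_exec", ["shell_exec"], [], depth=2))                 # False
```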

  (See: tools/tool_registry.py, config.py RESTRICTED_TOOLS)

Session Output

Every session writes to output/<session_id>/:

  - session.json: Full metadata including agent tree, per-agent token breakdown,

                  timing, model used, configuration, and the original goal.

  - output.md:    The final synthesized markdown output from the Genesis Agent.

You get a complete audit trail of what happened, which agents were created, how much each one consumed, and the final deliverable.
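
Writing that layout is straightforward; the session dict below is a hypothetical subset of the real session.json fields, written under a temp dir for the demo:

```python
import json
import pathlib
import tempfile

# Hypothetical session record; the real session.json carries more fields.
session = {
    "session_id": "demo-session",
    "goal": "Create a comprehensive report on quantum computing",
    "agents": [
        {"name": "Genesis-Orchestrator", "role": "orchestrator", "tokens": 26350},
        {"name": "QuantumHardwareResearcher", "role": "researcher", "tokens": 1094},
    ],
}
final_markdown = "# Quantum Computing Report\n\n(Genesis synthesis goes here.)"

# output/<session_id>/ layout from the post, rooted in a temp dir here.
out_dir = pathlib.Path(tempfile.mkdtemp()) / session["session_id"]
out_dir.mkdir(parents=True)
(out_dir / "session.json").write_text(json.dumps(session, indent=2))
(out_dir / "output.md").write_text(final_markdown)

print(sorted(p.name for p in out_dir.iterdir()))  # ['output.md', 'session.json']
```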

Try it yourself

Check out the project page and grab the code:

  Website: OrganAIze

  GitHub:  OrganAIze

This is an experiment -- and I want your thoughts and ideas

This is a side project born out of curiosity. It works, it's fun to watch, and it produces surprisingly useful results for the amount of code involved. But there's a lot more to explore:

  - Memory across sessions: Could agents learn from previous runs and improve

    their decomposition strategies over time?

  - Human-in-the-loop checkpoints: Let users approve spawn plans before

    execution, or intervene when the tree grows unexpectedly.

  - Multi-model strategies: Use different providers for different agent roles

    to optimize cost and quality simultaneously.

  - Evaluation frameworks: How do you measure whether a self-organizing system

    is actually doing a good job? What does "good decomposition" even mean?

  - Real-time visualization: A UI that shows the agent tree growing in real

    time, with token counts and status updates per node.

  - Streaming output: Stream partial results as agents complete, instead of

    waiting for the full tree to finish.

If any of this interests you -- or if you have ideas I haven't thought of -- I'd love to hear from you. Submit a PR, or just drop a message. This is an open experiment and every idea makes it better.

Closing thoughts

OrganAIze is really about one bet: reasoning models are good enough now to decompose problems, assign work, and synthesize results -- all on their own. The role of the developer shifts from designing agent workflows to setting constraints and guardrails, then letting the model figure out the execution plan.

Is it perfect? No. Is it interesting? I think so.

The goal isn't to replace carefully designed multi-agent systems. It's to explore the boundary of what's possible when you trust the model with orchestration decisions and focus your engineering effort on safety, cost control, and observability instead.

Give it a try, break it, improve it, and let me know what you find.

Updated Apr 08, 2026
Version 1.0