Capability for Subagent Creation for runbear_file_search (and other tasks)

Runbear Feature Request: Agent-to-Agent Invocation & KB Search Subagents

Title: Enable Subagent Creation for Knowledge Base Search Delegation

Priority: High
Use Case: Token Budget Management & Scalability
Submitted by: Minted Analytics Team (Ask_Sam Agent)

***

Problem Statement

Our Ask_Sam agent (ff1379fe-51d1-4bd4-a870-e5ef6dc11d88) frequently hits the 1M token context limit due to:

1.  144 active tools consuming ~2,400 tokens per turn just for tool definitions
2.  runbear_file_search results returning 5,000-15,000 tokens per query
3.  Long-term memory (LTM) accumulating ~30,000 tokens of historical learnings
4.  Multi-turn conversations requiring full context retention

Current Impact:

•   Conversations terminated prematurely with "token limit exceeded" errors
•   User experience disrupted mid-analysis
•   Complex requests cannot be completed

***

Requested Feature: Agent-to-Agent Invocation

Core Capability:

runbear_invoke_agent(
    agent_id: str,              # Specialized subagent ID
    task: str,                  # Scoped instruction
    max_context_return: int,    # Token limit for summary response
    pass_context: bool = False  # Whether to share parent context
)



Returns:

•   Compressed summary (user-defined token limit)
•   Full results remain in subagent's context
•   Parent agent receives only the synthesized response
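The intended call semantics can be sketched in plain Python. Everything below is a stand-in for illustration only: `runbear_invoke_agent` does not exist in Runbear today, the specialist ID is hypothetical, and word count is used as a crude proxy for tokens.

```python
# Sketch of the proposed runbear_invoke_agent semantics (hypothetical API).
def runbear_invoke_agent(agent_id: str, task: str,
                         max_context_return: int = 500,
                         pass_context: bool = False) -> str:
    # The subagent's full working context stays on its side of the call...
    raw_result = f"[{agent_id} raw output for: {task}] " * 2000
    # ...and only a summary within the requested token budget crosses back.
    return " ".join(raw_result.split()[:max_context_return])

summary = runbear_invoke_agent(
    agent_id="kb_search_specialist",       # hypothetical specialist ID
    task="Summarize KB hits for mm_status",
    max_context_return=300,
)
```

The key property this models is the boundary: however large the subagent's raw output grows, the parent never receives more than `max_context_return` tokens.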

***

Proposed Architecture

Pattern 1: Knowledge Base Search Specialist

Ask_Sam (Parent Agent)
↓ Invokes
KB_Search_Specialist (Subagent)
- Has runbear_file_search access
- Executes 3-5 searches
- Receives 15,000 tokens of raw results
- Summarizes to 300 tokens
- Returns: "Summary: [key findings]"
↓ Returns to
Ask_Sam (receives 300 tokens, not 15,000)
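The flow above can be made concrete with a small runnable sketch. The `kb_search_specialist` function and the one-word-per-token estimate are assumptions for illustration, not Runbear behavior:

```python
# Pattern 1 as a sketch: the specialist holds the raw results,
# the parent receives only the summary.
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one word ~ one token.
    return len(text.split())

def kb_search_specialist(field: str) -> str:
    # The subagent accumulates ~15,000+ tokens of raw search results
    # in its own isolated context...
    raw_results = " ".join(f"chunk {i} mentioning {field}" for i in range(4000))
    assert estimate_tokens(raw_results) >= 15_000
    # ...but hands back only a short synthesized summary.
    return f"Summary: {field} is derived in the customer dbt models."

parent_receives = kb_search_specialist("mm_status")
```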



Pattern 2: Code Analysis Specialist

Ask_Sam
↓ Invokes
GitLab_Code_Agent (Subagent)
- Has GitLab MCP access
- Searches 5 repositories
- Analyzes data lineage
- Returns: "Field X defined in file Y:Z, depends on tables A, B"
↓ Returns concise answer
Ask_Sam



***

Implementation Options

Option A: Dedicated Subagent Templates (Recommended)
Runbear provides pre-built specialist agents:

•   kb_search_specialist - KB search + summarization
•   code_analysis_specialist - GitLab/GitHub code inspection
•   data_query_specialist - SQL/Snowflake/Hex analysis

Option B: Generic Agent Invocation
Allow any Runbear agent to invoke any other agent in the same workspace, with:

•   Configurable result compression
•   Context isolation (subagent context doesn't pollute parent)
•   Timeout controls

Option C: Built-in Search Compression
Enhance runbear_file_search itself:

runbear_file_search(
    query: List[str],
    max_num_results: int = 5,
    compress_results: bool = True,   # NEW
    compression_prompt: str = None   # NEW: "Summarize in <200 words"
)
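Until such flags exist, Option C can be approximated client-side. In this sketch the `runbear_file_search` stub and the word-budget "compression" are stand-ins for the real tool call and an LLM summarization pass, respectively:

```python
# Client-side approximation of Option C (all names here are stand-ins).
from typing import List

def runbear_file_search(query: List[str], max_num_results: int = 5) -> List[str]:
    # Stub for the existing tool: verbose result chunks per query term.
    return [f"result chunk for {q} " * 300
            for q in query for _ in range(max_num_results)]

def file_search_compressed(query: List[str],
                           max_num_results: int = 5,
                           compress_results: bool = True,
                           max_words: int = 200) -> str:
    chunks = runbear_file_search(query, max_num_results)
    text = " ".join(chunks)
    if not compress_results:
        return text
    # Placeholder for "Summarize in <200 words": enforce the budget by
    # truncation so the parent never sees the full payload.
    return " ".join(text.split()[:max_words])

out = file_search_compressed(["mm_status"])
```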



***

Expected Benefits

Token Savings:

•   Current: 15,000 tokens per KB search
•   With subagent: 300 tokens per delegated search
•   Savings: ~14,700 tokens per search (~98% reduction)

Scalability:

•   Parent agent can handle longer conversations (50+ turns vs 15-20 currently)
•   Complex multi-step analysis becomes feasible
•   Parallel specialist invocations possible

User Experience:

•   No more "token limit exceeded" mid-conversation
•   More sophisticated analysis without manual conversation splitting
•   Better separation of concerns (parent = orchestration, subagents = execution)

***

Similar Patterns in Industry

•   Hex Threads Agent: Uses create_thread / get_thread for analysis delegation
•   Cursor Agent: Uses CURSOR_LAUNCH_AGENT for code tasks
•   OpenAI Assistants API: Supports agent-to-agent tool calling
•   LangChain: Multi-agent orchestration with context isolation

***

Proposed Pilot

Test Case: Minted Analytics Ask_Sam
Scenario: "Explain data lineage for bi_customers.mm_status"

Current Flow (45,000 tokens):

1.  Search KB for "mm_status" → 8,000 tokens
2.  Search GitLab refs → 10,000 tokens
3.  Search Slack discussions → 9,000 tokens
4.  Synthesize answer → 3,000 tokens
5.  Tool defs (144 tools) → 2,400 tokens
6.  LTM → 12,000 tokens
    Total: ~45,000 tokens

With Subagents (~8,000 tokens):

1.  Invoke KB_Search_Specialist("mm_status") → 300 tokens returned
2.  Invoke GitLab_Code_Specialist("mm_status") → 400 tokens returned
3.  Synthesize answer → 3,000 tokens
4.  Tool defs (60 tools, reduced) → 1,000 tokens
5.  LTM → 3,000 tokens
    Total: ~8,000 tokens (82% reduction)

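The budgets above can be recomputed from the line items as a quick sanity check (the exact totals are 44,400 and 7,700, which round to the stated ~45,000 and ~8,000):

```python
# The pilot's token budgets, recomputed from the itemized lists above.
current = {"kb_search": 8_000, "gitlab_refs": 10_000, "slack": 9_000,
           "synthesis": 3_000, "tool_defs": 2_400, "ltm": 12_000}
with_subagents = {"kb_specialist": 300, "gitlab_specialist": 400,
                  "synthesis": 3_000, "tool_defs": 1_000, "ltm": 3_000}

current_total = sum(current.values())           # 44,400 -> "~45,000"
subagent_total = sum(with_subagents.values())   # 7,700  -> "~8,000"
# ~0.83 on exact totals; 82% on the rounded 8,000 / 45,000 figures.
reduction = 1 - subagent_total / current_total
```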
***

Alternative Workarounds (If Feature Delayed)

1.  Disable unused tools (144 → 80) - saves ~1,000 tokens/turn
2.  Aggressive LTM pruning - archive entries older than 90 days
3.  External MCP server - custom KB search compression service
4.  Manual conversation splits - user restarts every 15 turns (poor UX)
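A back-of-envelope check on workaround 1, assuming each tool definition costs roughly the same share of the ~2,400 tokens consumed by 144 tools:

```python
# Estimated per-turn savings from trimming 144 tools down to 80.
tokens_per_tool = 2_400 / 144           # ~16.7 tokens per definition
saved = (144 - 80) * tokens_per_tool    # ~1,067 tokens per turn
```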

***

Contact for Follow-up

Organization: Minted.com
Primary Contact: Patrick Codrington (patrick.codrington@minted.com)
Agent ID: ff1379fe-51d1-4bd4-a870-e5ef6dc11d88
Slack Workspace: minted.slack.com (proj_ant_nothing_to_see_here)

Willing to participate in beta testing: Yes


Status: In Review
Board: 💡 Feature Request
Date: 8 days ago
Author: Patrick Codrington
