c = RSACandidate(id='c1', loop_id=0, prompt='Hi')
c.response = 'Hey'
test_eq(c.id, 'c1')
test_eq(c.prompt, 'Hi')
core
RSACandidate
def RSACandidate(
id:str, loop_id:int, prompt:str, response:str=None, parent_ids:list=None
):
A candidate response in the RSA algorithm
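Conceptually, RSACandidate is a simple record; a minimal stand-in (illustrative only, not the library's actual implementation) shows how parent_ids records the lineage of an aggregated candidate:

```python
from dataclasses import dataclass

@dataclass
class CandidateSketch:
    # Same fields as RSACandidate above; this stand-in is for illustration.
    id: str
    loop_id: int
    prompt: str
    response: str = None
    parent_ids: list = None

# parent_ids links an aggregated candidate back to the candidates it was
# synthesized from, so the full lineage can be traced through the history.
agg = CandidateSketch(id='c3', loop_id=1, prompt='...', parent_ids=['c1', 'c2'])
print(agg.parent_ids)  # ['c1', 'c2']
```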
RSA
def RSA(
task_prompt:str, # The main task/question to solve
agg_prompt:str=None, # Custom aggregation prompt
model:str='openrouter/google/gemini-3-flash-preview', # LLM model to use
N:int=4, # Population size (candidates per loop)
K:int=3, # Number of candidates to aggregate
loops:int=2, # Number of aggregation loops
history:list=None, # History of all candidates
temperature:float=1.0, # LLM temperature
n_workers:int=4, # Parallel workers
):
Recursive Self-Aggregation algorithm for LLM response aggregation
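The parameters above imply a simple cost model. A back-of-the-envelope sketch, assuming one LLM call per candidate per loop (counting the initial generation as a loop) plus the single final call made by aggregate — adjust if the implementation differs:

```python
def approx_llm_calls(N=4, loops=2):
    # Assumption: each loop produces N candidates with one call apiece,
    # and the final aggregation adds one more call.
    return N * loops + 1

print(approx_llm_calls())  # 9 with the defaults above
print(approx_llm_calls(N=8, loops=3))
```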
a = RSA(task_prompt='A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?')
print(a)
a._call_llm(a.task_prompt)
Configuration
RSA uses litellm for LLM calls, which automatically reads API keys from environment variables:
- OPENAI_API_KEY for OpenAI models
- ANTHROPIC_API_KEY for Anthropic models
- OPENROUTER_API_KEY for OpenRouter models
- etc.
You can also set a custom endpoint globally:
import litellm
litellm.api_base = "https://your-endpoint.com/v1"
See litellm’s provider docs for the full list of supported providers and their environment variables.
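Putting the two together, a minimal setup sketch (the key value and endpoint URL are placeholders, not real credentials):

```python
import os
import litellm

# Provider keys are read from the environment; set yours before calling RSA.
os.environ["OPENROUTER_API_KEY"] = "sk-or-..."  # placeholder

# Optional: route calls through a custom OpenAI-compatible endpoint.
litellm.api_base = "https://your-endpoint.com/v1"
```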
c1 = RSACandidate(id='c1', loop_id=0, prompt='test', response='Answer A')
c2 = RSACandidate(id='c2', loop_id=0, prompt='test', response='Answer B')
print(a._build_agg_prompt([c1, c2]))
RSA.get_prompts
def get_prompts(
loop_id, cands:NoneType=None
):
Generate candidate prompts for a given loop: N initial candidates, or N randomly sampled K-combinations of prior candidates for aggregation
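For intuition on the combination space behind get_prompts: with n prior candidates and K=3 there are C(n,K) possible aggregation groups, far more than the N used per loop. A quick stdlib check (candidate ids here are illustrative):

```python
from itertools import combinations
from math import comb

prior_ids = [f'c{i}' for i in range(8)]  # 8 prior candidates, as in the test below
K = 3
groups = list(combinations(prior_ids, K))
print(len(groups), comb(8, K))  # 56 56 -- only a small subset becomes prompts
```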
# Test loop 0
cands = a.get_prompts(loop_id=0)
test_eq(len(cands), a.N)
test_eq(cands[0].prompt, a.task_prompt)
# Test loop 1+ (with prior candidates)
prior = L(RSACandidate(id=str(uuid.uuid4()), loop_id=0, prompt='test', response=f'Answer {i}') for i in range(8))
cands = a.get_prompts(loop_id=1, cands=prior)
test_eq(len(cands), a.N)
print(cands[0].prompt)
cands = a._run_loop(loop_id=0)
test_eq(len(cands), a.N)
assert all(c.response is not None for c in cands)
assert cands[0].response != cands[1].response
RSA.run
def run(
):
Run the full RSA algorithm for the configured number of loops and return the final candidate pool
a = RSA(task_prompt='A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?', loops=2)
result = a.run()
print(f"Final pool: {len(result)}, History: {len(a.history)}")
RSA.aggregate
def aggregate(
method:str='llm', final_agg_prompt:NoneType=None, response_format:NoneType=None
):
Final aggregation: one LLM call to aggregate all final loop candidates, with optional structured output
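When response_format is supplied, litellm returns the structured output as a JSON string rather than a model instance, so a stdlib parse step is usually needed afterwards (the payload below is hypothetical):

```python
import json

# Hypothetical structured-output string matching the Answer schema used below
raw = '{"answer": "5 cents", "confidence": 0.9}'
parsed = json.loads(raw)
print(parsed["answer"], parsed["confidence"])  # 5 cents 0.9
```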
# Test with custom aggregation prompt
custom_prompt = "Combine these answers into one final answer:"
prompt, result = a.aggregate(final_agg_prompt=custom_prompt)
assert custom_prompt in prompt
assert isinstance(result, str)
assert len(result) > 0
# Test with response_format (structured output)
from pydantic import BaseModel
class Answer(BaseModel):
answer: str
confidence: float
prompt, result = a.aggregate(response_format=Answer)
assert isinstance(result, str) # litellm returns JSON string
print(prompt, result)
# Test 'llm' aggregation
prompt, result = a.aggregate(method='llm')
assert isinstance(prompt, str)
assert isinstance(result, str)
assert len(result) > 0
assert 'CANDIDATE ANSWERS' in prompt
# Test 'random' aggregation
prompt, result = a.aggregate(method='random')
test_eq(prompt, '')
assert isinstance(result, str)
assert len(result) > 0
How can I improve mine?
🤖Reply🤖
Here are the key improvements to consider, roughly in order of impact:
Different temperatures per phase — Use higher temperature (e.g., 1.0-1.2) for initial generation to maximize diversity, then lower temperature (e.g., 0.7) for aggregation steps where you want more focused synthesis.
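A hypothetical schedule implementing this (the function name and default values are illustrative, not part of the library):

```python
def temperature_for(loop_id, t_init=1.1, t_agg=0.7):
    # High temperature for the initial generation loop (loop 0) to maximize
    # diversity; lower temperature for aggregation loops for focused synthesis.
    return t_init if loop_id == 0 else t_agg

print([temperature_for(i) for i in range(3)])  # [1.1, 0.7, 0.7]
```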
Tracking diversity — The paper notes diversity should decrease over loops as errors get pruned. You could add a simple diversity metric (e.g., embedding similarity between candidates) to monitor this and potentially adjust parameters.
Smarter candidate selection — Instead of random shuffle()[:N] for selecting which K-combinations to use, you could prioritize combinations that maximize diversity (picking candidates that are more different from each other).
Configurable aggregation prompt per loop — The paper suggests the aggregation task may benefit from different prompting strategies at different stages. You could allow agg_prompt to be a callable that takes loop_id.
Early stopping — If all candidates converge to very similar answers before loops is reached, you could stop early.
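A cheap way to prototype the diversity-tracking and early-stopping ideas without embeddings is word-set Jaccard distance; all names and thresholds here are illustrative:

```python
from itertools import combinations

def jaccard_distance(a, b):
    # 1 - |A∩B| / |A∪B| over lowercase word sets: a rough text-diversity proxy.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return 1 - len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def mean_pairwise_diversity(responses):
    pairs = list(combinations(responses, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

def should_stop_early(responses, threshold=0.1):
    # Stop once the pool has collapsed to near-identical answers.
    return mean_pairwise_diversity(responses) < threshold

diverse = ['the ball costs 5 cents', 'ten dollars total minus overlap', 'it is five']
converged = ['the ball costs 5 cents'] * 3
print(should_stop_early(diverse), should_stop_early(converged))  # False True
```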
Which of these interests you most? I can help you think through the implementation details.