core

Recursive Self-Aggregation (RSA) - a general-purpose LLM aggregation algorithm built on litellm, based on the paper https://rsa-llm.github.io/

source

RSACandidate


def RSACandidate(
    id:str, loop_id:int, prompt:str, response:str=None, parent_ids:list=None
):

A candidate response in the RSA algorithm

c = RSACandidate(id='c1', loop_id=0, prompt='Hi')
c.response = 'Hey'
test_eq(c.id, 'c1')
test_eq(c.prompt, 'Hi')
c

source

RSA


def RSA(
    task_prompt:str, # The main task/question to solve
    agg_prompt:str=None, # Custom aggregation prompt
    model:str='openrouter/google/gemini-3-flash-preview', # LLM model to use
    N:int=4, # Population size (candidates per loop)
    K:int=3, # Number of candidates to aggregate
    loops:int=2, # Number of aggregation loops
    history:list=None, # History of all candidates
    temperature:float=1.0, # LLM temperature
    n_workers:int=4, # Parallel workers
):

Recursive Self-Aggregation algorithm for LLM response aggregation

a = RSA(task_prompt='A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?')
print(a)
a._call_llm(a.task_prompt)
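The control flow is easy to state independently of the LLM plumbing. Below is a minimal sketch (not the library's implementation) with a deterministic stub in place of the real litellm call; `fake_llm` and `rsa_sketch` are hypothetical names for illustration:

```python
import random
from itertools import combinations

def fake_llm(prompt):
    # Stand-in for a real LLM completion call
    return f"response to: {prompt[:20]}"

def rsa_sketch(task_prompt, N=4, K=3, loops=2):
    # Loop 0: N independent candidates for the raw task
    pool = [fake_llm(task_prompt) for _ in range(N)]
    for _ in range(loops - 1):
        # Build all C(len(pool), K) subsets, shuffle, keep N aggregation prompts
        combos = list(combinations(pool, K))
        random.shuffle(combos)
        pool = [fake_llm("Aggregate:\n" + "\n".join(c)) for c in combos[:N]]
    return pool

final = rsa_sketch("What is 2+2?")
print(len(final))  # population size stays N across loops
```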

Configuration

RSA uses litellm for LLM calls, which automatically reads API keys from environment variables:

  • OPENAI_API_KEY for OpenAI models
  • ANTHROPIC_API_KEY for Anthropic models
  • OPENROUTER_API_KEY for OpenRouter models
  • etc.

You can also set a custom endpoint globally:

import litellm
litellm.api_base = "https://your-endpoint.com/v1"

See litellm’s provider docs for the full list of supported providers and their environment variables.
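Keys can also be set programmatically before constructing an `RSA` instance, which is handy in notebooks (the value below is a placeholder, not a real key):

```python
import os

# Placeholder key: substitute your real credentials
os.environ["OPENROUTER_API_KEY"] = "sk-or-..."

# litellm reads this environment variable on the next completion call
```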

c1 = RSACandidate(id='c1', loop_id=0, prompt='test', response='Answer A')
c2 = RSACandidate(id='c2', loop_id=0, prompt='test', response='Answer B')

print(a._build_agg_prompt([c1, c2]))

source

RSA.get_prompts


def get_prompts(
    loop_id:int, cands:list=None
):

Generate candidate prompts for a given loop: N initial candidates, or all C(n,K) combinations for aggregation

# Test loop 0
cands = a.get_prompts(loop_id=0)
test_eq(len(cands), a.N)
test_eq(cands[0].prompt, a.task_prompt)
# Test loop 1+ (with prior candidates)
prior = L(RSACandidate(id=str(uuid.uuid4()), loop_id=0, prompt='test', response=f'Answer {i}') for i in range(8))
cands = a.get_prompts(loop_id=1, cands=prior)
test_eq(len(cands), a.N)
print(cands[0].prompt)
cands = a._run_loop(loop_id=0)
test_eq(len(cands), a.N)
assert all(c.response is not None for c in cands)
assert cands[0].response != cands[1].response
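The combinatorics here matter for cost: with n prior candidates there are C(n, K) possible aggregation prompts, of which only N are actually sent to the LLM. A quick check of the numbers used in the test above:

```python
from itertools import combinations
from math import comb

n, K, N = 8, 3, 4  # 8 prior candidates, aggregate 3 at a time, keep 4
all_combos = list(combinations(range(n), K))
assert len(all_combos) == comb(n, K)  # C(8, 3) = 56
print(f"{len(all_combos)} possible aggregation prompts, {N} kept per loop")
```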

source

RSA.run


def run(
    
):

Run the full RSA algorithm for the configured number of loops and return the final candidate pool

a = RSA(task_prompt='A bat and ball cost $1.10 total. The bat costs $1 more than the ball. How much does the ball cost?', loops=2)
result = a.run()
print(f"Final pool: {len(result)}, History: {len(a.history)}")

source

RSA.aggregate


def aggregate(
    method:str='llm', final_agg_prompt:str=None, response_format:type=None
):

Final aggregation: one LLM call to aggregate all final loop candidates, with optional structured output

# Test with custom aggregation prompt
custom_prompt = "Combine these answers into one final answer:"
prompt, result = a.aggregate(final_agg_prompt=custom_prompt)
assert custom_prompt in prompt
assert isinstance(result, str)
assert len(result) > 0

# Test with response_format (structured output)
from pydantic import BaseModel

class Answer(BaseModel):
    answer: str
    confidence: float

prompt, result = a.aggregate(response_format=Answer)
assert isinstance(result, str)  # litellm returns JSON string
print(prompt, result)
# Test 'llm' aggregation
prompt, result = a.aggregate(method='llm')
assert isinstance(prompt, str)
assert isinstance(result, str)
assert len(result) > 0
assert 'CANDIDATE ANSWERS' in prompt

# Test 'random' aggregation
prompt, result = a.aggregate(method='random')
test_eq(prompt, '')
assert isinstance(result, str)
assert len(result) > 0

how can i improve mine?

🤖Reply🤖


Here are the key improvements to consider, roughly in order of impact:

  1. Different temperatures per phase — Use higher temperature (e.g., 1.0-1.2) for initial generation to maximize diversity, then lower temperature (e.g., 0.7) for aggregation steps where you want more focused synthesis.

  2. Tracking diversity — The paper notes diversity should decrease over loops as errors get pruned. You could add a simple diversity metric (e.g., embedding similarity between candidates) to monitor this and potentially adjust parameters.

  3. Smarter candidate selection — Instead of random shuffle()[:N] for selecting which K-combinations to use, you could prioritize combinations that maximize diversity (picking candidates that are more different from each other).

  4. Configurable aggregation prompt per loop — The paper suggests the aggregation task may benefit from different prompting strategies at different stages. You could allow agg_prompt to be a callable that takes loop_id.

  5. Early stopping — If all candidates converge to very similar answers before loops is reached, you could stop early.
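For instance, points 1 and 4 could be combined by letting both the temperature and the aggregation prompt depend on the loop index. A sketch of what that might look like (these helpers are hypothetical, not part of the current API):

```python
def temperature_for(loop_id, loops, hi=1.2, lo=0.7):
    """Linearly anneal from hi (diverse generation) to lo (focused aggregation)."""
    if loops <= 1: return hi
    frac = loop_id / (loops - 1)
    return hi + (lo - hi) * frac

def agg_prompt_for(loop_id):
    """Per-loop aggregation prompt: exploratory early, strict synthesis late."""
    if loop_id == 0: return None  # loop 0 is plain generation, no aggregation
    return ("Compare the candidates and explore their differences."
            if loop_id == 1 else
            "Synthesize the candidates into a single best answer.")

print(round(temperature_for(0, 3), 2), round(temperature_for(2, 3), 2))  # 1.2 0.7
```

`RSA` would then accept a callable for `agg_prompt` and call it with the current `loop_id` inside `get_prompts`.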

Which of these interests you most? I can help you think through the implementation details.