Smart Recall

AI agents don't need all their memories at once — they need the most relevant ones. UAML's Smart Recall engine uses policy-driven retrieval, context budgeting, and proactive surfacing to deliver exactly the right knowledge when it's needed.

Context Budgeting

📊 Token-Aware Retrieval

Every LLM has a context window limit. UAML manages a "context budget" — allocating tokens across different memory types based on the current task. High-priority memories get more space; low-relevance context is summarized or deferred.

📐 Tiered Recall

Memories are retrieved in tiers: first summaries, then details if needed. A question about last week's decisions gets a compact summary first. If the agent needs more detail, it can request the full episodic records. This prevents context flooding.

Policy-Driven Retrieval

📜 Recall Policies

Define rules for what gets recalled and when. Policies can filter by memory type, topic, age, confidence score, or data classification. A policy might say: "For customer-facing tasks, never include internal decision traces" or "Prefer recent memories over older ones."

🔄 Proactive Surfacing

UAML doesn't just wait to be asked. When it detects a relevant context — a related past decision, a conflicting fact, a relevant procedure — it proactively surfaces it. Like a colleague who says "Hey, we tried this before and here's what happened."

Relevance Scoring

Every retrieved memory is scored for relevance using multiple signals:

Semantic similarity — how close is the memory to the current query?
Recency — newer memories get a configurable boost
Frequency — memories accessed often may be more important
Confidence — higher-confidence facts rank above uncertain ones
Context match — does the memory's topic match the current task?

Configuration

from uaml.recall import RecallPolicy, ContextBudget

# Define a context budget
budget = ContextBudget(
    max_tokens=4000,
    allocation={
        "episodic": 0.3,    # 30% for recent events
        "semantic": 0.4,    # 40% for facts
        "procedural": 0.2,  # 20% for how-to
        "reasoning": 0.1,   # 10% for decision traces
    }
)

# Create a recall policy
policy = RecallPolicy(
    name="customer-facing",
    exclude_types=["reasoning"],  # Hide internal traces
    max_age_days=90,              # Only recent memories
    min_confidence=0.7,           # High-confidence only
    prefer_summaries=True,        # Compact format first
)

# Recall with policy and budget
results = uaml.recall("customer deployment history",
                       policy=policy, budget=budget)
            

Why It Matters

Better answers — relevant context produces better AI responses
Lower costs — fewer tokens per request means lower API costs
Faster responses — compact context processes faster
Privacy control — policies prevent sensitive data from leaking into wrong contexts

← Back to UAML

🔍 Smart Recall