Paper: Luo, Q., King, G., Puett, M., & Smith, M. D. (2026). Inducing Sustained Creativity and Diversity in Large Language Models. Harvard University. (Paper | Supplementary Material)
The Problem: Search Quests
The paper formalizes a class of tasks called search quests — extended, open-ended explorations where a user needs to evaluate many diverse alternatives before choosing. Examples: finding a wedding dress, identifying an overlooked research topic, brainstorming product ideas, exploring design directions.
Standard LLM decoding (greedy, beam search, even nucleus sampling) is optimized for tasks with a single correct answer. Applied to search quests, these methods produce homogeneous results that converge on conventional, high-probability outputs. Existing diversity techniques (temperature, top-k) sustain variety over a small batch (5–10 outputs) but then begin repeating.
The Solution: A Decoding-Level Intervention
The authors introduce a novel decoding algorithm (referred to below as the Quest algorithm) that sustains creativity and diversity over arbitrarily long sequences. Key design principles:
- Decoding-only: Operates on output token probabilities without accessing internal model states. Works with any LLM API as a black box.
- No fine-tuning required: Preserves the model’s full knowledge spectrum rather than narrowing it through alignment.
- Promotes low-probability continuations: Actively reaches into the “long tail” of the model’s knowledge, surfacing unconventional alternatives that standard decoding suppresses.
- Tracks conceptual coverage: Maintains a running memory of generated ideas (likely via embedding-based similarity) to penalize repetition and ensure each new output is meaningfully different from previous ones.
- Orthodox + heterodox knowledge: Deliberately surfaces both mainstream and fringe ideas encoded in training data.
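The paper's exact algorithm isn't reproduced here, but the design principles above can be sketched as a black-box wrapper around any generation API. Everything in this sketch is an illustrative assumption: `generate` stands for an arbitrary LLM API call, and the letter-frequency `embed` is a toy stand-in for the real sentence-embedding model such a system would use.

```python
import math

def embed(text):
    # Toy embedding: a letter-frequency vector over a-z. A real system would
    # call a sentence-embedding model; this stand-in only illustrates the idea.
    v = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def diverse_generate(generate, prompt, n_outputs, n_candidates=5, threshold=0.8):
    """Sample several candidates per step, keep the one least similar to the
    running memory of accepted ideas, and skip near-duplicates entirely."""
    memory, outputs = [], []
    for _ in range(n_outputs):
        candidates = [generate(prompt) for _ in range(n_candidates)]

        def max_sim(text):
            e = embed(text)
            return max((cosine(e, m) for m in memory), default=0.0)

        best = min(candidates, key=max_sim)  # least-covered conceptual territory
        if max_sim(best) < threshold:        # reject near-duplicates outright
            outputs.append(best)
            memory.append(embed(best))
    return outputs
```

This captures the two core moves (a running memory of generated ideas and a penalty on anything too close to it) without touching model internals, which is what makes the decoding-only framing deployable against any API.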
Why This Matters
Comparison with Existing Approaches
| Approach | Diversity Duration | Requires Model Access? | Heterodox Knowledge? |
|---|---|---|---|
| Standard decoding (greedy/beam) | None | No | No |
| Temperature / top-k / nucleus | Short burst (5–10 outputs) | No | Partially |
| Prompt engineering (CoT, personas) | Moderate | No | Limited |
| Fine-tuning / RLHF | Varies | Yes (training) | Often reduced |
| Multi-agent collaboration | Good | No | Depends on agents |
| This paper’s method | Sustained (hundreds+) | No (API only) | Yes |
The key advance is sustained diversity — the algorithm doesn’t run out of genuinely different ideas the way other methods do. And it works at the decoding layer, meaning it’s model-agnostic and immediately deployable.
Relation to Prior Work
This paper directly extends the findings in AI Idea Diversity and Prompt Engineering, which showed that Chain-of-Thought prompting increases idea variance. Where the Meincke/Mollick/Terwiesch (2024) paper addressed prompt-level interventions for short-burst diversity, this paper tackles the harder problem of sustaining that diversity over long exploratory sessions and does so at the decoding level rather than the prompt level.
Applications for Agent Workflows
Exploratory Research
Agents conducting literature reviews or hypothesis generation can produce sustained diverse summaries covering both mainstream and fringe perspectives. Rather than returning the same five “obvious” papers or ideas, the method keeps pushing into less-explored territory.
Brainstorming and Option Generation
When multiple agents (or a single agent across iterations) need to propose alternatives for planning, design, or resource allocation, this method ensures each suggestion is conceptually distinct. This directly counters the homogenization problem where agents converge on similar solutions.
Bias Mitigation
By deliberately surfacing heterodox knowledge, the approach counteracts confirmation bias and groupthink — presenting ideas that challenge dominant assumptions rather than reinforcing them.
Decision Support
For complex decisions with many viable options (design directions, architectural choices, strategy), the method can systematically map the full solution space before converging, supporting better-informed choices.
Prompt-Level Equivalents
The Quest algorithm operates at the decoding level, but the same problem can be addressed at the prompt and procedural level. The mechanisms are structurally similar, just implemented further up the stack.
Procedural rotation rules
A workflow that requires variety can encode it as explicit rotation:
- If you made a bar chart in the last entry, try a different chart type today.
- If you used Mermaid last time, try a chart MCP or a palette study.
These rules manually implement what the Quest algorithm does automatically: penalize recently-used approaches.
Chain-of-thought pre-writing
Before generating output, the agent reasons through:
- What did I make in recent entries?
- Which tools or angles haven’t I used recently?
- What’s an unexpected approach for today’s content?
- Pick the approach that differs most from recent work.
This is output-memory tracking at the prompt level — the same conceptual-coverage tracking the Quest algorithm performs at decoding time.
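The pre-writing questions can also be baked directly into the prompt. The template below is an illustrative assumption, not a prompt from the paper:

```python
def prewriting_prompt(recent_entries, topic):
    """Build a pre-writing prompt that forces a review of recent work
    before the model commits to today's approach."""
    recent = "\n".join(f"- {entry}" for entry in recent_entries[-5:])
    return (
        "Recent entries:\n"
        f"{recent}\n\n"
        "Before drafting, answer:\n"
        "1. What approaches did the entries above use?\n"
        "2. Which tools or angles are missing from that list?\n"
        "3. What would be an unexpected approach for today's content?\n"
        f"Then draft an entry on '{topic}' using the approach that "
        "differs most from recent work."
    )
```

Feeding the model its own recent history is the prompt-level analogue of the decoding algorithm's output memory.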
Diversity self-checks
After drafting, the agent verifies:
- Does this output look different from the recent batch?
- Would a reader of the last week see variety?
- If not, try again with a different approach.
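A minimal version of this self-check can be automated with a crude textual similarity measure. `SequenceMatcher` here is a stand-in for the embedding-based comparison a real system would use, and the threshold is an illustrative guess:

```python
from difflib import SequenceMatcher

def too_similar(draft, recent_outputs, threshold=0.6):
    """Flag a draft that overlaps heavily with any recent output."""
    return any(
        SequenceMatcher(None, draft.lower(), prev.lower()).ratio() >= threshold
        for prev in recent_outputs
    )

def draft_with_retries(make_draft, recent_outputs, max_tries=3):
    """Regenerate until the draft clears the self-check or tries run out."""
    draft = make_draft()
    for _ in range(max_tries - 1):
        if not too_similar(draft, recent_outputs):
            break
        draft = make_draft()
    return draft
```

The retry loop mirrors the "if not, try again with a different approach" step; in practice the regeneration call would also vary the prompt rather than simply resampling.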
The core insight: baseline prompting collapses diversity; procedural constraints maintain it. This is the same finding the Quest Paper formalizes: standard generation converges to high-probability modes. Sustained diversity requires an explicit mechanism that tracks what has been generated and actively steers away from it.
Layer equivalence
| Layer | Quest Paper | Prompt-Level Equivalent |
|---|---|---|
| Mechanism | Decoding algorithm | Prompt engineering + workflow constraints |
| Tracking | Embedding-based concept similarity | Manual review of recent outputs |
| Penalty | Suppress high-probability tokens in covered territory | Explicit rotation rules + CoT forcing different choices |
| Duration | Sustained (hundreds of outputs) | Sustained (daily entries over months) |
| Knowledge access | Orthodox + heterodox from training data | Varied tool usage + creative angles |
Both approaches solve the same problem at different layers of the stack. The Quest algorithm does it automatically at generation time; procedural constraints do it manually via structured prompting and workflow design.
The two are complementary: an agent using the Quest algorithm and procedural diversity constraints could achieve even stronger sustained creativity. Prompt-level methods are immediately deployable on any LLM API without custom inference infrastructure; decoding-level methods are automatic and don’t depend on the agent following the procedure.
Limitations
- Bounded by training data: Can only diversify within what the LLM already knows. Cannot produce genuinely transformational ideas outside its training distribution.
- Coherence-diversity trade-off: Aggressively promoting low-probability tokens may occasionally produce incoherent outputs, requiring post-filtering.
- Computational overhead: Maintaining output memory and computing semantic distances adds cost compared to standard decoding.
- Evaluation difficulty: Measuring “conceptual uniqueness” over long sequences is hard — embedding-based similarity metrics may not fully capture semantic novelty.
- Domain variance: Effectiveness likely depends on how well the LLM’s knowledge covers the relevant domain.
Key Insight
The paper reframes the question from “Can LLMs be creative?” to “How do we systematically elicit and sustain creativity over extended explorations?” — a much more practical and actionable framing. The answer: intervene at decoding time to prevent the model from falling into its comfortable high-probability grooves, while tracking what’s already been generated to avoid circling back.
For agent systems, the implication is clear: diversity is a decoding problem, not just a prompting problem. Prompt engineering helps, but sustained exploration requires structural intervention in how outputs are generated.
See Also
- AI Idea Diversity and Prompt Engineering — prompt-level interventions for idea variance (complementary approach)
- Creativity and Determinism in Agentic Systems — theoretical framework for creativity in cybernetic systems
- Visual Practice — principles for visual meaning-making
- Research Methodology — systematic approaches to knowledge work