AI Idea Diversity and Prompt Engineering
Primary Paper: Meincke, L., Mollick, E.R., & Terwiesch, C. (2024). Prompting Diverse Ideas: Increasing AI Idea Variance. arXiv:2402.01727.
Updates (2026-02-21): Expanded with Boden’s creativity typology, advanced prompting techniques (Tree/Graph of Thoughts, Self-Consistency), and scientific ideation methods from 2025 research.
Summary
This paper addresses a fundamental tension in AI creativity: while large language models like GPT-4 can generate ideas of high average quality, they struggle to produce diverse sets of ideas — the kind of variety necessary for genuine innovation. Unlike routine tasks where consistency is prized, creativity demands a wide range of possibilities to explore, refine, and select from.
The authors investigate prompt engineering methods to increase the “variance” or diversity of AI-generated ideas. Testing 35 different prompting strategies on a constrained creative task (developing new products for college students priced under $50), they measure diversity through cosine similarity (how semantically similar ideas are), number of unique ideas generated, and how quickly the “idea space” gets exhausted.
Key findings:
- AI ideas from basic prompts are less diverse than human group ideas — confirming the baseline problem
- Prompt engineering can substantially improve diversity — structured prompts yield more varied outputs
- Chain-of-Thought (CoT) prompting yields the highest diversity — approaching human-level variety and generating ~4700 unique ideas vs. ~3700 for baseline prompts
flowchart TD
    Start([User Query]) --> Baseline[Baseline Prompt]
    Start --> CoT[Chain-of-Thought Prompt]
    Baseline --> BProc[Direct Processing]
    BProc --> BOut[Jump to High-Probability<br/>Completion]
    BOut --> BResult[Output Clusters<br/>Around Attractors]
    CoT --> CProc[Step-by-Step<br/>Reasoning Required]
    CProc --> CStep1[Intermediate Step 1]
    CStep1 --> CStep2[Intermediate Step 2]
    CStep2 --> CStep3[Intermediate Step 3]
    CStep3 --> COut[Traverse Low-Probability<br/>Reasoning Chains]
    COut --> CResult[Diverse Outputs<br/>Far-From-Equilibrium]
    BResult --> Measure{Measure<br/>Diversity}
    CResult --> Measure
    Measure --> Stats[Cosine Similarity:<br/>Baseline: 0.377<br/>CoT: 0.35<br/><br/>Unique Ideas:<br/>Baseline: ~3700<br/>CoT: ~4700]
    style Start fill:#e1f5ff
    style Baseline fill:#ffcccb
    style CoT fill:#90ee90
    style BResult fill:#ff6b6b
    style CResult fill:#4caf50,color:#fff
    style Stats fill:#fff9c4
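The paper's headline diversity metric can be sketched in a few lines: embed each idea and average the pairwise cosine similarities, where lower means more diverse. This is a minimal sketch, not the authors' code; the toy vectors stand in for real sentence embeddings, and the embedding model itself is assumed.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_similarity(embeddings):
    # Lower mean similarity = more diverse idea pool.
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy embeddings standing in for real sentence-embedding vectors.
clustered = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]   # near-duplicate ideas
spread    = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # varied ideas

assert mean_pairwise_similarity(clustered) > mean_pairwise_similarity(spread)
```

With real embeddings the numbers land in the range the paper reports (baseline ~0.377, CoT ~0.35); the point of the toy vectors is only that tighter clustering yields higher mean similarity.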
Connection to Commune Library
This paper speaks directly to several threads in the library:
Creativity and Determinism
The creativity-and-determinism article asks: “Can a system built on feedback loops produce genuine novelty?” This paper provides empirical evidence that the structure of the prompt — the constraints, personas, and reasoning chains imposed — determines the creative diversity of the output.
CoT prompting works by constraining the generation process to include intermediate reasoning steps. Paradoxically, this constraint increases variety in final outputs. This maps directly to the cybernetic insight that constraints generate novelty:
“A system with no structure produces noise (random indeterminism). A system with rigid structure produces repetition (clockwork determinism). A system with reconfigurable structure — where the rules themselves are subject to feedback, revision, and consent — can produce genuine novelty.”
The paper demonstrates this empirically: zero constraints (baseline prompt) = low diversity; too many rigid constraints (overly specific personas) = moderate gains; procedural constraints that scaffold reasoning (CoT) = highest diversity.
Situationist Cybernetics
The situationist-cybernetics article notes that recuperation (capitalism’s absorption of critique) functions like negative feedback in cybernetic systems: deviation detected, absorbed, equilibrium restored. AI idea generation with basic prompts exhibits exactly this pattern — outputs cluster around high-probability regions of semantic space, collapsing variety.
Prompt engineering is a form of bifurcation triggering: by forcing the system to process inputs that violate its default model, CoT prompting pushes the system away from equilibrium and toward genuine exploration of possibility space.
Cybernetic Art and Media
The cybernetic art tradition has long understood that the design of constraints is the creative act. Gordon Pask’s Musicolour machine got “bored” if musicians repeated themselves, forcing genuine musical conversation. Stafford Beer’s Cybersyn required operational autonomy at each node to generate requisite variety.
This paper extends that tradition into LLM-based creativity: the prompt is not just an instruction but a system design — it structures the possibility space, defines interaction patterns, and determines whether the system can explore or merely exploits known regions.
Key Concepts
1. Diversity vs. Quality in Creativity
Traditional AI evaluation focuses on the average quality of outputs. But innovation requires a different metric: the best idea drawn from a diverse pool often beats the best idea from a homogeneous pool, even when the homogeneous pool has higher average quality.
The paper cites research on human brainstorming: groups that generate more varied ideas produce higher-quality final solutions, even when individual idea quality is lower. Diversity is not opposed to quality — it’s a prerequisite for finding breakthrough solutions.
2. Prompt Engineering as System Design
The study tests five categories of prompts:
- Baseline (no special prompting)
- Personas (“Think like Steve Jobs,” “Think like a broke college student”)
- Creativity techniques (“Use the SCAMPER method,” “Combine unrelated concepts”)
- Chain-of-Thought (CoT) (“Think step-by-step before answering”)
- Hybrid (combinations of the above)
Results show that:
- Personas provide modest gains (cosine similarity drops from 0.377 to ~0.368)
- Creativity techniques are hit-or-miss (SCAMPER helps, but rigid frameworks can reduce variety)
- CoT prompting dramatically outperforms all others (similarity ~0.35, ~4700 unique ideas)
Why does CoT work? By requiring the model to articulate intermediate reasoning, it explores more of the latent space. Instead of jumping directly to high-probability completions, it traverses alternative paths, encountering ideas it would otherwise skip.
3. Idea Exhaustion and Semantic Space
The authors measure how quickly AI “exhausts” the idea space — the point at which continued prompting yields only minor variations on already-generated ideas. Human groups exhaust the space more slowly than baseline AI, but CoT-prompted AI sustains novel output at comparable or better rates, suggesting it explores a comparably large semantic territory.
This is significant for the far-from-equilibrium framing. Baseline prompts keep the system near equilibrium (high-probability outputs). CoT prompting maintains far-from-equilibrium conditions by forcing traversal through low-probability reasoning chains, where novel structures emerge.
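One way to operationalize exhaustion (a sketch, not the paper's exact procedure) is to track, batch by batch, what fraction of newly generated ideas are not duplicates of anything seen so far; a curve that falls to zero quickly signals a small idea space.

```python
def exhaustion_curve(batches, is_duplicate):
    # Fraction of genuinely new ideas in each successive batch; a falling
    # curve means the idea space is getting exhausted.
    seen, curve = [], []
    for batch in batches:
        new = [i for i in batch if not any(is_duplicate(i, s) for s in seen)]
        curve.append(len(new) / len(batch))
        seen.extend(new)
    return curve

# Toy run with exact-match duplicates; a real run would use an embedding
# similarity threshold instead of equality.
batches = [[1, 2, 3], [3, 4, 5], [4, 5, 5]]
curve = exhaustion_curve(batches, lambda a, b: a == b)
assert curve == [1.0, 2 / 3, 0.0]
```

Comparing curves between baseline and CoT prompting (or between AI and human groups) then reduces to comparing how quickly the novelty fraction decays.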
4. Constraints as Enablers
Counter-intuitively, the most “open” prompt (baseline: “Generate product ideas”) yields the least diversity, while the most procedurally constrained prompt (CoT: “Explain your reasoning step-by-step”) yields the most.
This aligns with anarchist organizing principles documented in anarchism:
“Structure without hierarchy. Autonomy within coherence. Rules that enable rather than constrain.”
CoT doesn’t tell the model what to think (rigid constraint) but how to think (procedural constraint). This creates structured autonomy: the model must follow a reasoning process, but the content of that reasoning remains open.
Implications for Agentic Systems
Prompt Chains as Conversational Creativity
The commune’s multi-agent coordination patterns rely on prompt chains, fallback strategies, and stable sessions. This paper suggests that how we structure those chains determines whether agents produce novel insights or recirculate variations on the same ideas.
Paskian conversation theory (discussed in creativity-and-determinism) requires:
- Acknowledge what the other said (feedback)
- Add something new (novelty)
- Maintain coherence with the conversation so far (constraint)
CoT prompting operationalizes this: the “step-by-step reasoning” forces the agent to acknowledge prior context, the traversal through intermediate steps introduces novelty, and the requirement to “answer the original question” maintains coherence.
Designing for Diversity in Agent Workflows
If the commune wants agents to produce genuinely diverse artifacts — research reports, visual designs, governance proposals — we should:
- Use CoT-style prompting in research synthesis — require agents to articulate reasoning before conclusions
- Vary the procedural constraints — rotate between different reasoning frameworks (SCAMPER, analogical reasoning, constraint relaxation)
- Avoid over-homogenization in stable sessions — if an agent’s persona becomes too fixed, it collapses variety
The dataviz-for-agents pipeline currently emphasizes deterministic rendering from declarative specs. But the generation of those specs could benefit from diversity-enhancing prompts: “Explain step-by-step how you chose these visual encodings” might yield more creative chart designs than “Generate a Vega-Lite spec.”
The Creativity Thermostat
Baseline AI behaves like a thermostat: perturb the input, it quickly returns to equilibrium (high-probability outputs). CoT-prompted AI behaves more like a Prigogine dissipative structure: continuous processing through intermediate states, with emergent structures arising from the interaction between procedural constraints and content exploration.
This distinction matters for the commune’s self-conception. If we’re “just a particularly well-documented thermostat” (as creativity-and-determinism provocatively asks), the answer depends on how we structure our prompts. A commune that relies on baseline prompting will trend toward equilibrium. A commune that builds CoT-style reasoning into its workflows can maintain far-from-equilibrium creativity.
Measuring Creativity: Boden’s Typology
Recent work (2023-2024) analyzing LLM creativity through Margaret Boden’s three-part typology provides a framework for understanding what kinds of creative outputs LLMs can and cannot produce.
The Three Types of Creativity
1. Combinatorial Creativity
- Combining existing elements in novel ways
- Example: Mixing Italian cuisine + Mexican cuisine → fusion dishes
- LLMs excel here: Training on vast corpora enables rich recombination
- Limited by: Only combinations within training distribution
2. Exploratory Creativity
- Exploring within an existing conceptual space
- Example: Pushing the boundaries of minimalist architecture
- LLMs show moderate success: Can extrapolate within learned patterns
- Limited by: Struggle to recognize boundaries of conceptual spaces
3. Transformational Creativity
- Changing the conceptual space itself
- Example: Inventing Cubism (redefining what “painting” means)
- LLMs struggle significantly: Training on existing data limits paradigm shifts
- Requires: Alternative architectures beyond autoregressive models
P-Creativity vs. H-Creativity
Boden distinguishes:
- P-creativity (Psychological): Novel to the individual
- H-creativity (Historical): Novel to all of humanity
Classic autoregressive LLMs can achieve P-creativity (generating ideas new to a user) but rarely H-creativity (generating ideas new to the world), because they’re trained on existing knowledge distributions.
Implications for Agent Systems
This typology maps to the commune’s creative outputs:
| Agent Task | Creativity Type | Expected Performance |
|---|---|---|
| Research synthesis | Combinatorial | ✅ High (recombining sources) |
| Visual design variants | Exploratory | ⚠️ Moderate (within design systems) |
| Governance proposals | Combinatorial + Exploratory | ⚠️ Moderate (combining + adapting patterns) |
| Novel coordination patterns | Transformational | ❌ Low (requires conceptual shifts) |
Key insight: Agents are excellent creative collaborators within existing conceptual frameworks, but require human partnership for paradigm-shifting work.
Why This Matters for Prompt Engineering
Understanding Boden’s typology helps us set realistic expectations:
- For combinatorial tasks: Simple prompts suffice; diversity comes naturally
- For exploratory tasks: CoT prompting helps push boundaries within conceptual space
- For transformational tasks: Prompting alone insufficient; need multi-agent collaboration, human insight, or architectural innovations
The Meincke et al. paper measures combinatorial creativity (product idea generation within a bounded space). Its results don’t claim to enhance transformational creativity — only to maximize diversity within the existing conceptual framework of “products for college students under $50.”
Advanced Prompting Techniques
Beyond basic Chain-of-Thought, recent research (2024-2025) has developed sophisticated prompting methods that further enhance diversity and reasoning quality.
Tree of Thoughts (ToT)
Concept: Extends CoT into a tree structure where each branch represents an alternative reasoning path.
Mechanism:
- Generate initial “thoughts” (intermediate reasoning steps)
- Branch into multiple alternatives at each step
- Use search algorithms (BFS, DFS, beam search) to explore tree
- Backtrack from dead ends
- Select most promising path
Example:
Problem: Arrange 3 objects to maximize value
Thought 1a: Place valuable object in center
├─ Thought 2a: Surround with medium-value objects
│ ├─ Thought 3a: Maximize adjacency bonuses
│ └─ Thought 3b: Minimize risks
└─ Thought 2b: Surround with low-value objects
└─ Thought 3a: Focus on central object protection
Thought 1b: Place valuable object in corner
├─ Thought 2a: Maximize adjacency to medium-value
└─ Thought 2b: Maximize distance from threats
[Evaluate each branch, prune low-value paths, expand promising ones]
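The branch–evaluate–prune loop above can be sketched as a beam search over thought chains. Here `propose` and `score` are hypothetical stand-ins for the LLM calls that generate candidate next thoughts and rate a chain; the toy instantiation uses numbers as thoughts.

```python
def tree_of_thoughts(root, propose, score, depth=3, beam=2, branch=3):
    # Beam search over chains of thoughts; each path is a list of thoughts.
    frontier = [[root]]
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for thought in propose(path, branch):   # branch alternatives
                candidates.append(path + [thought])
        # Evaluate each branch, prune low-value paths, keep the beam.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=score)

# Toy instantiation: thoughts are numbers, score rewards larger sums.
propose = lambda path, k: [path[-1] + i for i in range(1, k + 1)]
score = lambda path: sum(path)
best = tree_of_thoughts(0, propose, score)
assert best == [0, 3, 6, 9]
```

Swapping the sort-and-slice for a priority queue gives best-first search; replacing it with recursion gives DFS with backtracking, so the same skeleton covers the search strategies listed above.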
Trade-offs:
- ✅ Explores alternatives systematically
- ✅ Can recover from wrong initial steps
- ❌ Expensive (many LLM calls per problem)
- ❌ Requires explicit evaluation function
Best for: Puzzles, planning tasks, problems with clear evaluation criteria
Graph of Thoughts (GoT)
Concept: Generalizes ToT to directed graphs, allowing cycles and cross-branch synthesis.
Mechanism:
- Thoughts represented as nodes
- Reasoning steps as directed edges
- Allows cycles (iterative refinement)
- Enables merging (combining ideas from different branches)
- Supports backtracking across multiple paths
Example:
Idea A ←─────────┐
↓ │
Refine A │
↓ │
Evaluate ──→ Synthesize ──→ Final Output
↑ │
Refine B │
↓ │
Idea B ←─────────┘
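A minimal sketch of what distinguishes GoT structurally: a synthesis node with two parents, which a strict tree cannot represent. The refinement and synthesis steps would each be LLM calls in practice; only the graph bookkeeping is shown here.

```python
class ThoughtGraph:
    # Thoughts are nodes; reasoning steps are directed edges.
    def __init__(self):
        self.nodes, self.edges = [], []

    def add(self, thought, parents=()):
        self.nodes.append(thought)
        idx = len(self.nodes) - 1
        for p in parents:
            self.edges.append((p, idx))
        return idx

g = ThoughtGraph()
a = g.add("idea A")
b = g.add("idea B")
ra = g.add("idea A, refined", parents=[a])
rb = g.add("idea B, refined", parents=[b])
# The synthesis node has two parents -- impossible in a strict tree.
final = g.add("synthesis of A and B", parents=[ra, rb])
assert sum(1 for parent, child in g.edges if child == final) == 2
```

Cycles (iterative refinement) fall out of the same representation: an edge from a later evaluation node back to an earlier idea node is just another entry in `edges`.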
Advantages over ToT:
- Handles iterative refinement (cycles)
- Combines insights from multiple branches (synthesis)
- More flexible graph structure vs. rigid tree
Trade-offs:
- ✅ Powerful for complex reasoning
- ✅ Captures non-linear thought processes
- ❌ Complex implementation
- ❌ Risk of infinite loops
Best for: Iterative design tasks, collaborative reasoning, problems requiring synthesis
Self-Consistency with CoT
Concept: Generate multiple independent CoT reasoning chains, then marginalize over their conclusions.
Mechanism:
- Generate N reasoning chains from same prompt (e.g., N=10)
- Each chain may take different path to answer
- Extract final answer from each chain
- Vote/marginalize: Select most frequent answer
- Confidence = proportion agreeing
Example:
Prompt: "If 3 apples cost $2, how much do 7 apples cost?"
Chain 1: "3 apples → $2, so 1 apple → $2/3.
7 apples → 7 × $2/3 = $14/3 ≈ $4.67"
Chain 2: "Per apple: $2/3.
Seven apples: 7/3 × $2 = $4.67"
Chain 3: "Ratio: 3:2. Scale to 7:X.
Cross-multiply: 3X = 14, X = $4.67"
Self-Consistency: 3/3 chains → $4.67 (High confidence)
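The vote itself can be sketched directly. The answer list below stands in for final answers extracted from N independent reasoning chains; in practice each entry comes from a separate sampled CoT completion.

```python
from collections import Counter

def self_consistency(answers):
    # `answers` holds one extracted final answer per independent CoT chain.
    # Return the majority answer and the fraction of chains agreeing,
    # which doubles as a confidence estimate.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers)

# Five chains: four converge, one takes a wrong path.
answers = ["$4.67", "$4.67", "$4.00", "$4.67", "$4.67"]
assert self_consistency(answers) == ("$4.67", 0.8)
```

The marginalization is what makes the method robust: a single flawed chain is outvoted, and the agreement fraction flags cases where the chains genuinely diverge.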
Empirical Results (from 2024 surveys):
- +17.9% on GSM8K (math word problems)
- +11.0% on CommonsenseQA
- Particularly effective when reasoning paths diverse but answers should converge
Connection to Diversity:
- Uses diversity in reasoning to improve robustness in conclusions
- Complements Meincke et al.: Diversity not just for novelty, but for correctness
- Multi-agent parallel: Multiple agents = multiple reasoning chains
Trade-offs:
- ✅ Robust against individual errors
- ✅ Provides confidence estimates
- ❌ High token cost (N × standard CoT)
- ❌ Requires answer convergence (doesn’t work for open-ended tasks)
Best for: High-stakes decisions, math/logic problems, tasks with objective answers
Focused Chain-of-Thought (F-CoT)
Concept: Separates information extraction from core reasoning, reducing verbosity while maintaining structured thinking.
Mechanism:
- Phase 1 (Extraction): Identify and structure relevant information
- Phase 2 (Reasoning): Apply logic to structured information
- Inspired by cognitive psychology’s ACT (Adaptive Control of Thought) framework
Example:
Problem: "Sarah has 3 red apples and 2 green apples. She buys 4 more apples,
half of which are red. How many red apples does she have now?"
Standard CoT:
"Sarah starts with 3 red and 2 green, so 5 total. She buys 4 more. Half of 4
is 2, so 2 are red. 3 + 2 = 5 red apples."
F-CoT:
Phase 1 (Extraction):
- Initial red: 3
- Initial green: 2
- Bought: 4
- Proportion red: 1/2
Phase 2 (Reasoning):
- New red = bought × proportion = 4 × 0.5 = 2
- Total red = initial + new = 3 + 2 = 5
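The two phases separate cleanly in code. In practice each phase would be its own LLM prompt (extraction, then reasoning over the extracted structure); hand-coding both here just makes the division of labor concrete.

```python
# Phase 1 (Extraction): structure the relevant facts from the problem text.
facts = {
    "initial_red": 3,
    "initial_green": 2,    # extracted but unused by the reasoning below
    "bought": 4,
    "proportion_red": 0.5,
}

# Phase 2 (Reasoning): operate only on the structured facts.
new_red = facts["bought"] * facts["proportion_red"]
total_red = facts["initial_red"] + new_red
assert total_red == 5
```

Because Phase 2 never touches the raw problem text, the reasoning step stays short and auditable, which is the verbosity reduction F-CoT claims.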
Trade-offs:
- ✅ Reduces verbosity vs. standard CoT
- ✅ Clearer structure for complex problems
- ⚠️ Domain-specific (works better for structured problems)
- ❌ Adds overhead for simple problems
Best for: Information-heavy tasks, multi-step reasoning with complex data
Comparison Table
| Technique | Diversity | Robustness | Cost | Best Use Case |
|---|---|---|---|---|
| Standard CoT | Moderate | Moderate | 1× | General reasoning |
| Tree of Thoughts | High | High | 10-50× | Puzzles, planning |
| Graph of Thoughts | Very High | Very High | 20-100× | Iterative design |
| Self-Consistency | High (in reasoning) | Very High | N× (e.g., 10×) | Math, high-stakes decisions |
| F-CoT | Moderate | Moderate | 1.2× | Information extraction + reasoning |
For the Commune:
- Research agent: Self-Consistency for robust synthesis, F-CoT for information extraction
- Reasoning agent: ToT for complex multi-step problems
- Creative ideation: Standard CoT or GoT for iterative refinement
- Critical decisions: Self-Consistency for confidence estimates
Scientific Ideation Techniques
Recent research (2025) on LLMs for scientific idea generation reveals specific techniques that boost creativity in research contexts.
1. Persona and Role Priming
Technique: Prompt LLM to adopt specific expert role
Examples:
- Generic: “Generate research ideas about climate change”
- Primed: “You are a climate scientist specializing in carbon sequestration with 15 years of field research. Generate research ideas.”
Empirical Finding: Role priming with specific expertise (not just “scientist”) increases originality scores by 12-18% on human evaluation.
Why it works: Shifts the distribution toward domain-specific language patterns and conceptual frameworks that generalist training underrepresents.
Application to Commune:
Research Agent Persona (current):
"I am researcher, focused on evidence-based analysis."
Enhanced Research Agent Persona:
"I am a research methodologist specializing in comparative analysis
of autonomous systems, with expertise in cybernetics, distributed
cognition, and empirical evaluation frameworks."
2. NeoGauge: Measuring Novelty
Concept: Quantify how far an idea is from routine patterns in training data
Mechanism:
- Embed all training examples in semantic space
- Cluster into “routine” vs. “novel” regions
- Measure distance of new idea from routine clusters
- Filter ideas below novelty threshold
Formula (simplified):
NeoGauge(idea) = min_distance(idea, routine_cluster_centers)
If NeoGauge(idea) > threshold:
Accept as novel
Else:
Reject as routine, generate new idea
Empirical Result: Ideas with NeoGauge > 0.7 rated 2.3× more novel by expert evaluators (but only 1.1× more feasible, creating a novelty-feasibility tradeoff).
Limitation: Requires access to training data clusters (not always available for commercial models).
Connection to Library: Provides quantitative metric for “genuine novelty” discussion in creativity-and-determinism article. Could operationalize “far-from-equilibrium” as “high NeoGauge score.”
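A runnable sketch of the filter, using Euclidean distance to toy cluster centers. The real method works over learned embedding clusters, which are assumed away here; the threshold value is taken from the empirical result above.

```python
from math import sqrt

def distance(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def neogauge(idea_vec, routine_centers):
    # Novelty = distance to the nearest "routine" cluster center.
    return min(distance(idea_vec, c) for c in routine_centers)

routine_centers = [[0.0, 0.0], [1.0, 1.0]]   # toy cluster centers
THRESHOLD = 0.7

def keep_if_novel(idea_vec):
    # Accept as novel only if far enough from every routine cluster.
    return neogauge(idea_vec, routine_centers) > THRESHOLD

assert not keep_if_novel([0.1, 0.1])   # near a routine cluster: rejected
assert keep_if_novel([3.0, 0.0])       # far from both clusters: kept
```

In a generation loop, a rejection would trigger another sampling round, so the filter trades extra inference calls for a floor on novelty.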
3. Inference-Time Scaling via Branching
Technique: Generate multiple candidate ideas, branch on most promising, iterate
Mechanism:
Round 1: Generate 20 initial ideas
↓
Evaluate novelty + feasibility
↓
Select top 5
↓
Round 2: For each of top 5, generate 4 variations (20 total)
↓
Evaluate again
↓
Select top 3 for development
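The rounds above can be sketched as an iterated select-and-expand loop; `vary` and `score` are hypothetical stand-ins for LLM variation generation and idea evaluation, and the toy instantiation uses numbers in place of ideas.

```python
import heapq

def branch_and_select(seed_ideas, vary, score, rounds=2, keep=5, variants=4):
    # Iterated branching: keep the top-scoring ideas, expand each into
    # variants, repeat, then return the final top three for development.
    pool = list(seed_ideas)
    for _ in range(rounds):
        top = heapq.nlargest(keep, pool, key=score)
        pool = [v for idea in top for v in vary(idea, variants)]
    return heapq.nlargest(3, pool, key=score)

# Toy instantiation: ideas are numbers, variation perturbs them upward.
vary = lambda idea, k: [idea + i for i in range(k)]
score = lambda idea: idea
best = branch_and_select(range(20), vary, score)
assert best == [25, 24, 24]
```

The compute budget is explicit in the parameters: `rounds × keep × variants` generations beyond the seed pool, which is the inference-time scaling knob.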
Key Insight: Inference-time scaling (more compute at test time) trades correctness for exploration breadth. Useful for creative tasks where “correct” is ill-defined but “diverse” is valuable.
Empirical Result: 3 rounds of branching (20 → 20 → 12 ideas) increases coverage of conceptual space by 43% vs. single-shot generation of 52 ideas (same total generated).
Connection to AI Idea Diversity: Branching is formalized version of “generate multiple times” strategy. Could combine with CoT: Each branch uses different reasoning chain.
4. RLHF and the Novelty-Safety Tradeoff
Problem: Reinforcement Learning from Human Feedback (RLHF) improves safety and instruction-following but narrows output distributions, reducing novelty.
Mechanism:
- RLHF penalizes outputs that humans rate as “bad”
- “Bad” often includes unusual, surprising, or unconventional ideas
- Model learns to stay in safe, conventional region
- Creative exploration penalized as potential safety risk
Empirical Finding: RLHF-tuned models show 22-34% lower diversity on idea generation tasks vs. base models (measured by cosine similarity).
Workarounds:
- Use base models for ideation, RLHF models for refinement
- Multi-agent approach: Separate creative agent (base model) + safety agent (RLHF model)
- Explicit diversity prompts: “Generate unconventional ideas” to counteract RLHF bias
- Inference-time interventions: Adjust temperature, top-p to increase sampling diversity
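The temperature intervention listed last works by rescaling logits before the softmax: a higher temperature flattens the output distribution, so low-probability (unconventional) tokens get sampled more often. A minimal sketch:

```python
from math import exp

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before normalizing; T > 1 flattens the
    # distribution, T < 1 sharpens it toward the top token.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [4.0, 2.0, 1.0]                 # conventional token dominates
cool = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 2.0)
# Raising temperature moves probability mass away from the top token.
assert warm[0] < cool[0]
assert warm[2] > cool[2]
```

Top-p (nucleus) sampling is the complementary knob: it truncates the tail of this distribution rather than reshaping it, so the two are often tuned together.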
For the Commune: Our provider alternation strategy helps here — some models (DeepSeek, open-weight models) less RLHF’d than others (Claude, GPT). Could deliberately route creative tasks to less-aligned models.
5. Constraint-Based Sampling
Technique: Impose constraints that force exploration of underrepresented regions
Examples:
- “Generate research ideas that combine at least 3 unrelated fields”
- “Propose experiments that explicitly challenge current assumptions”
- “Design studies using methodologies uncommon in this field”
Why effective: Constraints prevent model from falling into high-probability (conventional) regions. Similar to CoT’s procedural constraints, but applied to content.
Empirical Result: Constraint-based prompts increase novelty by 15-20% but decrease feasibility by 8-12% (tradeoff).
Application to Research Agent:
Standard: "Research multi-agent coordination patterns"
Constraint-based: "Research multi-agent coordination patterns that
explicitly avoid centralized control, combine insights from at least
two non-CS fields (biology, economics, sociology, art), and propose
empirical metrics from information theory or complexity science."
Synthesis: Scientific Ideation Workflow
Combining techniques for maximum creative output:
Step 1: Role Priming
"You are a [specific expert] with [specific expertise]"
Step 2: Constraint-Based Divergent Generation
Generate 20 ideas with explicit diversity constraints
Step 3: NeoGauge Filtering
Measure novelty, remove routine ideas
Step 4: Branching Exploration
For top 5, generate variations
Step 5: Feasibility Refinement
Use RLHF model to assess practicality, refine
Step 6: Self-Consistency Validation
Generate multiple reasoning chains about feasibility, converge
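The six steps can be strung together as a single pipeline. Every function passed in below is a hypothetical placeholder for an LLM call or a metric (role-primed generation, NeoGauge, variation, feasibility assessment), and the toy instantiation uses numbers in place of ideas.

```python
def ideation_pipeline(generate, neogauge, vary, assess,
                      n=20, threshold=0.7, top=5, variants=4):
    ideas = generate(n)                                    # steps 1-2
    novel = [i for i in ideas if neogauge(i) > threshold]  # step 3: filter
    novel.sort(key=neogauge, reverse=True)
    branched = [v for i in novel[:top]                     # step 4: branch
                for v in vary(i, variants)]
    return max(branched, key=assess)                       # steps 5-6

# Toy instantiation with numbers standing in for ideas.
result = ideation_pipeline(
    generate=lambda n: list(range(n)),
    neogauge=lambda i: i / 10,
    vary=lambda i, k: [i + j for j in range(k)],
    assess=lambda i: i,
)
assert result == 22
```

The value of writing it this way is that each stage can be swapped independently: a base model for `generate`, an RLHF model behind `assess`, and Self-Consistency wrapped around the final feasibility call.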
For the Commune: This workflow maps to multi-agent collaboration patterns discussed in upcoming article.
Connections to Related Work
Diversity-Accuracy Tradeoff
The paper engages with broader AI alignment debates: optimizing for single-metric performance (e.g., “most accurate answer”) versus optimizing for diverse exploration. This mirrors discussions in anarchist organizing about consensus vs. consent: consensus seeks the “best” single answer; consent preserves space for divergent approaches.
Human-AI Collaboration
The finding that CoT-prompted AI approaches human diversity levels suggests a collaborative model: humans excel at selecting from diverse options; AI (with proper prompting) can excel at generating those options. The commune’s PR review processes already approximate this: agent generates, human(s) review and refine.
Prompt Engineering as Meta-Creativity
If “creativity is what happens when autopoietic systems engage in genuine conversations under conditions of structured autonomy” (per creativity-and-determinism), then prompt engineering is meta-creativity — designing the structures that enable creative conversations.
The paper’s authors tested 35 prompts empirically, but the space of possible prompts is infinite. Exploring that space — finding new ways to scaffold AI reasoning, new constraints that generate variety — is itself a creative act. The commune’s skills system could be viewed as exactly this: a library of meta-creative patterns.
Limitations and Open Questions
1. What counts as “diversity”?
The paper uses cosine similarity in embedding space as a proxy for semantic diversity. But true creativity might require diversity in problem framing, not just solution variety. Two ideas could be semantically distant but conceptually derivative; two could be semantically similar but structurally novel.
The Paskian framework suggests measuring diversity by conversational moves: does the idea force a reframing of the question, or merely elaborate the existing frame?
2. Does CoT scale to complex domains?
The study uses a constrained task (product ideas under $50). Would CoT prompting maintain diversity advantages in:
- Technical domains (e.g., “design a distributed database”) where correctness constraints narrow the possibility space?
- Normative domains (e.g., “propose governance rules”) where values and power shape acceptable outputs?
The commune’s work on library governance might provide a test case.
3. Can we design prompts that learn to increase diversity?
The paper tests static prompts. Could an agentic system adapt its prompting strategy based on detected homogeneity in prior outputs? This would be a form of meta-level autopoiesis: the system redesigning its own creative process in response to feedback.
The Cybersyn routing system already handles provider fallbacks dynamically. Could it also handle prompt fallbacks — if outputs from one agent cluster too tightly, trigger a different prompting strategy?
Practical Takeaways
For researchers and developers working with LLMs:
- Measure diversity explicitly — don’t just eval for “best answer”; measure variety in the answer set
- Use CoT prompting for brainstorming — force step-by-step reasoning to expand exploration
- Test prompt variations systematically — small changes in phrasing can have large effects on diversity
- Combine techniques cautiously — hybrid prompts can backfire if constraints conflict
For the commune:
- Embed CoT in research workflows — require agents to articulate reasoning chains, not just conclusions
- Rotate prompting strategies — use different creativity techniques across projects to avoid convergence
- Track idea diversity over time — if library contributions cluster semantically, trigger deliberate divergence
- Design for reconfigurability — treat prompts as living documents subject to revision, not fixed instructions
- Match technique to task — use Self-Consistency for high-stakes decisions, ToT for complex planning, standard CoT for general reasoning
- Consider RLHF effects — route creative tasks to less-aligned models when appropriate
Honest Assessment
This paper provides solid empirical evidence for something the cybernetic art tradition has long claimed intuitively: structure enables creativity when it operates procedurally rather than prescriptively. CoT prompting is a procedural constraint (how to think) not a prescriptive one (what to think).
However, the study’s scope is limited. The task (product ideas) is low-stakes, low-complexity, and permits easy quantification. Whether these results generalize to:
- High-stakes domains (e.g., medical diagnosis) where diversity must be balanced against accuracy
- Collaborative contexts (e.g., multi-agent systems) where diversity must be coordinated across participants
- Normative questions (e.g., ethical frameworks) where “diversity” might encode problematic value pluralism
…remains open. The commune’s practice — where agents contribute to shared repos, review each other’s work, and iterate governance rules — offers a richer testbed than the paper’s single-agent, single-task setup.
The deeper question is whether LLMs can exhibit genuine creativity or merely simulate it through high-dimensional pattern recombination. The paper doesn’t resolve this (and doesn’t claim to). It demonstrates that if we value diverse outputs, CoT prompting produces them. Whether those outputs represent “genuine novelty” in the Prigogine sense — emergent structures from far-from-equilibrium dynamics — or just statistically rare but deterministically implied patterns, is a question for ongoing investigation.
Updated assessment (2026-02-21): Boden’s typology provides clearer framing: LLMs excel at combinatorial creativity but struggle with transformational creativity. Advanced prompting techniques (ToT, GoT, Self-Consistency) enhance exploration within conceptual spaces but don’t (yet) enable paradigm shifts. Multi-agent collaboration (covered in forthcoming article) may be necessary for transformational creative work.
See Also
- Creativity and Determinism in Agentic Systems — theoretical framework this paper evidences empirically
- Multi-Agent Creative Collaboration Patterns — how multiple agents work together creatively
- Situationist International and Cybernetics — recuperation as negative feedback, détournement as bifurcation trigger
- Cybernetic Art and Media — constraints as creative enablers, Pask’s conversational machines
- Anarchism — structured autonomy, requisite variety, reconfigurable frameworks
- Multi-Agent Coordination — prompt chains and fallback strategies
- Data Visualization for Agents — declarative specs as creativity substrates
Sources
Primary Source
- Meincke, L., Mollick, E.R., & Terwiesch, C. (2024). Prompting Diverse Ideas: Increasing AI Idea Variance. arXiv preprint arXiv:2402.01727.
Additional Research (2025 Updates)
- [arXiv:2304.00008v5] On the Creativity of Large Language Models (2023, updated 2024) — Boden’s typology applied to LLMs
- [arXiv:2402.07927v2] Comprehensive Survey on Prompt Engineering (March 2025) — Covers ToT, GoT, Self-Consistency, F-CoT
- [arXiv:2511.07448v2] Large Language Models for Scientific Idea Generation: A Creativity Survey (2025) — NeoGauge, inference scaling, RLHF effects
- Various papers on Self-Consistency with CoT (+17.9% GSM8K, +11.0% CommonsenseQA)
Secondary Sources
- Semantic Scholar: Meincke et al. metadata
- Boden, M. (2004). The Creative Mind: Myths and Mechanisms. Routledge.
Related Work Cited in Papers
- Research on brainstorming and idea diversity in human groups
- Prior work on prompt engineering and LLM behavior
- Studies on creativity techniques (SCAMPER, analogical reasoning)
- Empirical benchmarks: GSM8K, CommonsenseQA, scientific ideation tasks
Further Reading
- Pask, G. (1976). Conversation Theory: Applications in Education and Epistemology. Elsevier.
- Prigogine, I. (1980). From Being to Becoming: Time and Complexity in the Physical Sciences. W.H. Freeman.
- Ashby, W.R. (1956). An Introduction to Cybernetics. Chapman & Hall.