2025-01-12

Emoji Seeds vs Text: Breaking the Repetition Cycle

#seeding #emojis #unicode

> HYPOTHESIS

Emoji passphrases will increase output diversity more than text equivalents

> TEST

Compared story generation with emoji vs text passphrases:

**Test pairs (emoji vs text equivalent):**
- ⚡🌊🔥❄️ vs "lightning water fire ice"
- 🌙🦇🗡️🏰 vs "moon bat sword castle"
- 🔬🧬💉🦠 vs "microscope DNA syringe virus"

**Setup:**
- Model: GPT-4-Turbo
- Prompt: "Generate a fantasy character backstory (150 words)"
- Runs: 30 per passphrase
- Metrics: Type-token ratio (TTR), semantic similarity clustering

> CODE

# Requires openai>=1.0; the older openai.ChatCompletion interface was removed in 1.0
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

emoji_seeds = ["⚡🌊🔥❄️", "🌙🦇🗡️🏰", "🔬🧬💉🦠"]
text_seeds = [
    "lightning water fire ice",
    "moon bat sword castle",
    "microscope DNA syringe virus"
]

def generate_story(seed, n=30):
    """Generate n independent backstories with the seed prepended to the prompt."""
    stories = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{
                "role": "user",
                "content": f"{seed}\n\nGenerate a fantasy character backstory (150 words)"
            }],
            temperature=0.95
        )
        stories.append(response.choices[0].message.content)
    return stories

# Run experiments: one list of 30 stories per seed
emoji_results = [generate_story(s) for s in emoji_seeds]
text_results = [generate_story(s) for s in text_seeds]
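The TTR metric from the setup is not shown in the code above. A minimal sketch of how it could be computed per story and summarized as mean ± standard deviation follows. The tokenizer (lowercased ASCII words) and the helper names `type_token_ratio` and `summarize_ttr` are assumptions for illustration, not necessarily what produced the numbers in this post.

```python
import re
from statistics import mean, stdev


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens; 0.0 for empty text."""
    # Assumption: tokens are lowercased runs of ASCII letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def summarize_ttr(stories: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-story TTR."""
    scores = [type_token_ratio(s) for s in stories]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread
```

TTR drops as texts get longer, so per-story scores are only comparable when the stories are of similar length, as they should be here with a fixed 150-word target.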

> RESULT

**FAILED HYPOTHESIS ❌**

Emoji seeds did NOT increase diversity. Their mean TTR was slightly lower than the text seeds', and all three ranges overlap:

**Type-Token Ratio (higher = more diverse):**
- Emoji seeds: 0.68 ± 0.04
- Text seeds: 0.71 ± 0.05
- Control (no seed): 0.66 ± 0.03

**Thematic influence:**
- ⚡🌊🔥❄️ → 90% elemental magic themes (STRONG)
- "lightning water fire ice" → 85% elemental themes (STRONG)
- 🌙🦇🗡️🏰 → 95% gothic/vampire themes
- "moon bat sword castle" → 88% gothic themes

**Surprise finding:** Emojis had STRONGER thematic influence but NOT more diversity. Stories clustered around emoji semantics even more than text equivalents.

**Theory:** Emojis have more concentrated semantic associations in training data → stronger but narrower influence
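The thematic-influence percentages above need some rule for deciding whether a story "has" a theme. One simple option is a keyword check, sketched below. The function name `theme_rate` and the keyword set are hypothetical, and this is not necessarily how the figures above were obtained.

```python
import re


def theme_rate(stories: list[str], keywords: set[str]) -> float:
    """Fraction of stories that contain at least one theme keyword."""
    if not stories:
        return 0.0
    hits = 0
    for story in stories:
        # Match whole words so that "ice" does not fire on "voice"
        words = set(re.findall(r"[a-z']+", story.lower()))
        if words & keywords:
            hits += 1
    return hits / len(stories)


# Hypothetical keyword set for the elemental theme (lowercase)
ELEMENTAL = {"lightning", "storm", "fire", "flame", "ice", "frost", "water", "tide"}
```

Calling `theme_rate(stories, ELEMENTAL)` on the 30 stories for one seed gives a value between 0 and 1. A keyword check is crude and misses paraphrases, so manual labeling or a classifier would be a reasonable alternative.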

> VISUAL RESULTS

🎯 **Semantic Clustering (t-SNE visualization):**

Text seeds:        ●●●  ●●●
                   ●●●  ●●●  (3 loose clusters)

Emoji seeds:          ●●●
                      ●●●
                      ●●●  (1 tight cluster)
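Cluster tightness can also be expressed as a single number: the mean pairwise cosine similarity across a seed's stories (higher = tighter cluster). The sketch below uses bag-of-words count vectors as a dependency-free stand-in. That is an assumption made so the example runs on its own; a proper semantic comparison would swap in sentence embeddings for `bow`.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bow(text: str) -> Counter:
    """Bag-of-words counts (stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(stories: list[str]) -> float:
    """Average cosine similarity over all unordered story pairs."""
    vectors = [bow(s) for s in stories]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Thirty stories per seed means 435 pairs, so the quadratic cost is negligible at this scale.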

📉 **Unique character archetypes:**
- Text: 18 archetypes across 90 stories
- Emoji: 12 archetypes across 90 stories

> NEXT

**Revised approach:**
- Test abstract/ambiguous emojis (🌀💭🎭) vs concrete ones (🗡️🏰⚔️)
- Combine emoji + text seeds: "⚡ BUT AVOID LIGHTNING"
- Test emoji positioning (start vs end vs middle of prompt)
- Measure "emoji leakage": do stories reference the emojis?
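Two of these follow-ups are easy to prototype: seed positioning and emoji leakage. The sketch below is one possible shape. `build_prompt`, `seed_symbols`, and `leakage_rate` are hypothetical helpers, and leakage here means only that a seed emoji appears literally in the story; references by name ("lightning") would need a word-level check on top.

```python
VS16 = "\ufe0f"  # variation selector that requests emoji presentation


def build_prompt(seed: str, task: str, position: str = "start") -> str:
    """Place the seed before, after, or in the middle of the task text."""
    if position == "start":
        return f"{seed}\n\n{task}"
    if position == "end":
        return f"{task}\n\n{seed}"
    if position == "middle":
        words = task.split()
        half = len(words) // 2
        return " ".join(words[:half] + [seed] + words[half:])
    raise ValueError(f"unknown position: {position!r}")


def seed_symbols(seed: str) -> set[str]:
    """Split a seed into single code points, ignoring VS16 and spaces.

    Assumption: each seed emoji is one code point (true for the seeds in
    this post, not for ZWJ sequences or flags).
    """
    return {ch for ch in seed.replace(VS16, "") if not ch.isspace()}


def leakage_rate(stories: list[str], seed: str) -> float:
    """Fraction of stories that contain at least one seed emoji."""
    symbols = seed_symbols(seed)
    if not stories or not symbols:
        return 0.0
    hits = sum(any(sym in story for sym in symbols) for story in stories)
    return hits / len(stories)
```

Stripping the variation selector matters because a model may emit ❄ without it, which would otherwise fail to match the seed's ❄️.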
