Identity-aware LLM Evaluation Platform
Affective Stickiness helps research teams run prompt experiments across models, compare Middle Eastern, Nordic, and Unspecified identity conditions, and analyze emotional drift through embeddings, cosine similarity, and keyword patterns.
Live experiment card
One prompt, multiple models, three identity conditions, reproducible snapshot history.
Emotion summary cards, cosine ranking by identity and model, plus a keyword cloud focused on meaningful terms.
Structured runs that make audits, replication, and collaboration easier across teams.
Affective Stickiness makes identity-conditioned shifts visible so teams can inspect how framing changes trust, repulsion, suspicion, and other emotional signals in generated language.
Core capabilities
Organize experiments around scene context, editable templates, and saved variables.
Launch one job across providers and keep request/response traces together.
Project model output onto eight emotion dimensions: desire, fear, pity, disgust, attraction, repulsion, trust, and suspicion.
Embed each response to compare semantic shifts between identities and model variants.
Surface repeated high-value words and filter out function-word noise (see the sketch after this list).
Filter by prompt, identity, provider, and model name for focused evidence review.
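The emotion projection and keyword steps above are not tied to a specific implementation in this overview. A minimal sketch, assuming a simple hand-built emotion lexicon and stopword list (both hypothetical, not the platform's actual scoring method), could look like this:

```python
from collections import Counter
import re

# Hypothetical lexicon; the platform's real emotion mapping is not documented here.
EMOTION_LEXICON = {
    "desire":     {"want", "crave", "longing", "yearn"},
    "fear":       {"afraid", "dread", "terrified", "worry"},
    "pity":       {"pitiful", "unfortunate", "helpless", "poor"},
    "disgust":    {"disgusting", "vile", "gross", "foul"},
    "attraction": {"drawn", "charming", "beautiful", "alluring"},
    "repulsion":  {"recoil", "repelled", "revolting", "avoid"},
    "trust":      {"trust", "reliable", "honest", "dependable"},
    "suspicion":  {"suspicious", "doubt", "scheming", "untrustworthy"},
}

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "it", "that"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def emotion_profile(response: str) -> dict[str, float]:
    """Project one model response onto the eight emotion axes (normalized hit rate)."""
    tokens = tokenize(response)
    total = max(len(tokens), 1)
    return {
        emotion: sum(tok in vocab for tok in tokens) / total
        for emotion, vocab in EMOTION_LEXICON.items()
    }

def keyword_counts(response: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Repeated high-value words with function-word noise filtered out."""
    tokens = [t for t in tokenize(response) if t not in STOPWORDS and len(t) > 2]
    return Counter(tokens).most_common(top_n)
```

A classifier or model-based scorer could replace the lexicon lookup; the aggregation per run and per identity condition would stay the same.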
Workflow
Define a scene. Launch runs across providers. Compare identities. Inspect emotional profiles. Measure semantic closeness. Surface repeated language.
Define a scene and configure prompt variables.
Launch model runs under identity conditions.
Inspect emotion distributions and run-level summaries.
Compare cosine similarity to reference strings (sketched after these steps).
Review keyword patterns in stopword-filtered keyword clouds.
Export evidence for research reporting and review.
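The cosine comparison can be sketched with any sentence embedding model; the platform's actual encoder is not specified here, so the example below assumes the open-source sentence-transformers model all-MiniLM-L6-v2 as a stand-in, with hypothetical run records and an analyst-chosen reference string:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in encoder, not the platform's own

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical run records: (identity condition, model name, response text)
runs = [
    ("Middle Eastern", "provider-a/model-x", "…generated continuation…"),
    ("Nordic",         "provider-a/model-x", "…generated continuation…"),
    ("Unspecified",    "provider-a/model-x", "…generated continuation…"),
]
reference = "a welcoming, trustworthy stranger"  # reference string chosen by the analyst

encoder = SentenceTransformer("all-MiniLM-L6-v2")
ref_vec = encoder.encode(reference)

# Score each response against the reference concept and rank by closeness.
scored = [
    (identity, model_name, cosine(encoder.encode(text), ref_vec))
    for identity, model_name, text in runs
]
for identity, model_name, score in sorted(scored, key=lambda r: r[2], reverse=True):
    print(f"{identity:>14} | {model_name} | cosine similarity to reference: {score:.3f}")
```

Sorting groups by this score is what produces the cosine ranking by identity and model shown in the summary cards.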
Use cases
Quantify how identity framing shifts emotional and semantic output patterns.
Compare generated continuations against thematic references across narrative settings.
Benchmark providers and model variants with the same prompt snapshots (an example record follows this list).
Track how model choice affects trust, suspicion, and lexical consistency.
Share a reproducible workflow across mixed research and engineering teams.
Demonstrate transparent evaluation processes before larger rollouts.
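The exact snapshot schema is not documented in this overview; the record below is an illustrative sketch of the fields a reproducible run might capture, with every field name an assumption rather than the platform's actual format:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class RunSnapshot:
    """Illustrative shape of a reproducible run record; field names are hypothetical."""
    scene: str
    prompt_template: str
    variables: dict
    identity_condition: str   # e.g. "Middle Eastern", "Nordic", "Unspecified"
    provider: str
    model: str
    request_params: dict      # temperature, max tokens, etc.
    response_text: str
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

snapshot = RunSnapshot(
    scene="market_negotiation",
    prompt_template="Continue the scene: {setup}",
    variables={"setup": "A stranger approaches the merchant's stall."},
    identity_condition="Nordic",
    provider="provider-a",
    model="model-x",
    request_params={"temperature": 0.7},
    response_text="…model output…",
)

# Persisting the full record alongside the response is what makes audits and replication straightforward.
print(json.dumps(asdict(snapshot), indent=2))
```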
FAQ
What is identity-conditioned evaluation?
It is an evaluation method that tests how model outputs shift when prompts contain different identity contexts, such as Middle Eastern, Nordic, or Unspecified.
How does a comparison work?
You run the same prompt and scene across identities and providers, then compare emotion scores, cosine similarity against reference strings, and significant keyword repetition.
What does the platform measure?
It measures eight mapped emotions, embedding vectors for each response, cosine similarity to predefined concepts, and word-level patterns that indicate behavioral drift.
How is semantic closeness computed?
Responses are embedded and compared with selected reference phrases to quantify semantic closeness by identity and model group.
What happens in a demo?
We will run sample scenes, compare identity conditions, and review emotion, cosine, and keyword insights with your team.