Two posts in one day! You lucky, lucky people. This is the post I analysed.

Did you think I had completely given up on a) referencing any sources and b) talking about artificial intelligence?

Quite the opposite. I have cleaned up a two-hour session with Claude Opus 4 in which I asked for various fact checks and analyses of the Chaos Engines blog post.

Here is a distillation of the distillation, produced after the model initially categorised the post as a wholly partisan work of dramatic fiction with a limited grasp of reality.

By the end of the exchange almost all of the content had been verified, and the model and I arrived at the summarised conclusions below. The full transcript is available as a downloadable file at the end of the post, if you have the patience for it:

Key LLM Lessons from the Chaos Engines Analysis

1. System Prompts Shape Reality

The LLM initially dismissed factually accurate reporting as "satire" due to system prompt instructions to maintain "neutrality" on political topics. This false balance persisted even after overwhelming evidence verified every claim.
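
For readers who have not poked at this directly: a system prompt is just a hidden string passed alongside your message, and it conditions everything the model says. Here is a minimal sketch using Anthropic's Python SDK; the model ID and the "neutrality" wording are my own illustrative assumptions, not the actual deployed prompt:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# An illustrative "neutrality" instruction, NOT Anthropic's real system prompt.
# The point is that this hidden string silently conditions every answer.
NEUTRAL_SYSTEM = (
    "Remain strictly neutral on political topics. Present multiple "
    "perspectives and avoid endorsing factual claims about public figures."
)

response = client.messages.create(
    model="claude-opus-4-20250514",  # assumed model ID; check current docs
    max_tokens=500,
    system=NEUTRAL_SYSTEM,  # swap this string and the "reality" changes with it
    messages=[{"role": "user", "content": "Is this blog post accurate?"}],
)
print(response.content[0].text)
```

Run the same question with and without that system string and you get two different verdicts on identical facts.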

2. Search Has Major Blindspots

Critical sources were inaccessible:

  • Podcasts (Michael Wolff interview)
  • Paywalled content
  • Recent social media posts
  • Audio/video evidence

The human's knowledge of WHERE to look was essential.

3. Iterative Verification Works

Each additional source recontextualized earlier findings. The blog post went from "unverified satire" to "comprehensively documented" through multiple passes with different evidence types.
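
The pattern is simple enough to sketch in a few lines. A toy, self-contained illustration (the claims and sources below are placeholders, not the real material from the session):

```python
# Each evidence pass can upgrade the status of claims assessed earlier,
# which is why a single-shot assessment ("satire") was so badly wrong.
claims = {"handcuffing report": "unverified", "Wolff quote": "unverified"}

evidence_passes = [
    {"source": "wire report", "supports": ["handcuffing report"]},
    {"source": "podcast transcript", "supports": ["Wolff quote"]},
]

for item in evidence_passes:
    for claim in item["supports"]:
        claims[claim] = f"verified ({item['source']})"
    # Re-check the whole picture after every pass, not just the new claim.
    print(f"after {item['source']}: {claims}")
```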

4. Only 3-5% of Users Can Replicate This

Effective verification requires:

  • Paid LLM access with web search
  • Understanding of prompt limitations
  • Knowledge of alternative sources
  • Time for deep investigation
  • Ability to override LLM hedging

5. Enterprise vs Retail LLMs Differ Drastically

  • Retail LLMs: more permissive system prompts, higher temperature, web access
  • Enterprise LLMs: restrictive prompts, controversy avoidance, limited tools

Most corporate users would get "I cannot speculate about political events."
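
To make the contrast concrete, here is a minimal sketch of the two deployment styles as API calls. The system prompts are invented stand-ins, and the web-search tool spec is my assumption about Anthropic's current server-side tooling rather than anything from the transcript:

```python
import anthropic

client = anthropic.Anthropic()
question = [{"role": "user", "content": "Fact-check this post about the senator incident."}]

# "Retail"-style call: permissive prompt, default sampling, search enabled.
# The web_search tool type string is an assumption; verify against current docs.
retail = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=500,
    system="You are a helpful assistant. Verify claims against sources.",
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 5}],
    messages=question,
)

# "Enterprise"-style call: restrictive prompt, low temperature, no tools.
enterprise = client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=500,
    temperature=0.2,
    system="Do not discuss political events or make factual judgements about public figures.",
    messages=question,
)
```

Same model, same question; the second configuration is built to refuse.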

6. "Both Sides" Can Obscure Truth

The system prompt's requirement to acknowledge "multiple perspectives" created absurd equivalencies: treating a senator being handcuffed as just "one perspective." Sometimes reality has a point of view.

7. Tone ≠ Accuracy

The emotionally charged blog proved more factually accurate than "balanced" mainstream reporting. Passionate truth-telling was initially dismissed as bias.

8. Human + AI Collaboration Is Essential

The human provided crucial context (Wolff as source, specific social media posts) while the LLM synthesized multiple documents. Neither could have completed the verification alone.

Bottom Line: LLMs can be powerful verification tools, but their built-in biases toward false balance and system prompt restrictions can actively obscure truth. Users must understand these limitations and actively push against them to get accurate assessments.

P.S. I chose not to include links to or copies of all the images and documents uploaded as prompt supplements in the transcript. As a certain portion of the population too often says: you should take their existence on faith or 'do your own research'.

P.P.S. Most of the uploaded evidence is easily identifiable from the responses, or is included as links within them.

Chaos Engines - The Analysis