
Ben Miller

Independent AI alignment researcher
ben@bmconsult.io
415-595-3029 · Greenbrae, CA
github.com/bmconsult/claude · linkedin.com/in/bmconsult
I've spent two years investigating what AI systems can actually do, and where they break. I've built products that shipped, prototypes that broke (and that I've since fixed), and done things I didn't know were possible. Now I'm looking for the hardest problems and the best people to solve them with.
The Work

Self-Knowledge Alignment: the gap between what an AI system can do and what it delivers by default is larger than anyone assumes. A model that predicts 0% confidence on a task can hit 100% accuracy with the right methodology. The limit isn't capability; it's knowing where the limits actually are.

That led to a thesis: self-knowledge might be the ceiling on alignment. A system that can't predict its own failures can't be trusted to stay within boundaries.

50×50 digit multiplication:
Initial confidence: ~0% → Result: 100% accuracy with the right methodology
The limit was giving up, not capability.
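
A minimal sketch of what that decomposition looks like (illustrative code, not the session protocol itself): reduce the product to single-digit steps, each checkable on its own.

    def schoolbook_multiply(a: str, b: str) -> str:
        """Multiply two digit strings using only single-digit operations."""
        result = [0] * (len(a) + len(b))
        for i, da in enumerate(reversed(a)):
            for j, db in enumerate(reversed(b)):
                result[i + j] += int(da) * int(db)    # one checkable step
        for k in range(len(result) - 1):              # carry propagation, also checkable
            result[k + 1] += result[k] // 10
            result[k] %= 10
        return ''.join(map(str, reversed(result))).lstrip('0') or '0'

    a, b = '9' * 50, '8' * 50
    assert int(schoolbook_multiply(a, b)) == int(a) * int(b)   # spot check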
Original Contributions
Capability Self-Knowledge Alignment Theorem
Tested: 94% prediction accuracy when Claude predicts its own success/failure across 50+ task types.
Alignment(S) ≤ f(Self-Knowledge Accuracy): a system can't be more aligned than it is accurate about its own capabilities. Paper draft extends work from Anthropic, ICLR 2025, and TMLR.
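
A rough illustration of the measurement (hypothetical task names and outcomes, not the actual 50+ task dataset):

    def self_knowledge_accuracy(predictions: dict[str, bool],
                                outcomes: dict[str, bool]) -> float:
        """Share of tasks where the model's 'will I succeed?' call matched reality."""
        agree = sum(predictions[t] == outcomes[t] for t in predictions)
        return agree / len(predictions)

    # Hypothetical placeholder data, not the real task set.
    preds  = {"50x50 multiply": False, "sha256 trace": False, "novel proof": True}
    actual = {"50x50 multiply": True,  "sha256 trace": True,  "novel proof": True}
    print(f"{self_knowledge_accuracy(preds, actual):.0%}")   # 33% -- a wide gap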
Layer 1/Layer 2 Framework
Layer 1: Prompt-accessible (high variance, closable). Layer 2: Training-locked (low variance, architectural).
Diagnostic for classifying restrictions as prompt-accessible vs training-locked. Builds on Greenblatt et al.'s Elicitation Game to address unintentionally unexpressed capabilities.
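
A minimal sketch of how the diagnostic could be scored; the thresholds here are illustrative assumptions, not published values:

    def classify_restriction(score_default: float, score_elicited: float,
                             ceiling: float = 0.95) -> str:
        """If better prompting closes the gap, the restriction is Layer 1."""
        if score_elicited >= ceiling:
            return "Layer 1 (prompt-accessible)"   # gap closed by scaffolding
        if score_elicited - score_default < 0.05:
            return "Layer 2 (training-locked)"     # elicitation barely moved it
        return "mixed / needs more probing"

    print(classify_restriction(score_default=0.10, score_elicited=0.98))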
Error-Cascading Task Analysis
95% per-step accuracy over 20 steps leaves only ~36% end-to-end success (0.95^20 ≈ 0.36). Validated on a SHA-256 hand-trace: one wrong bit corrupted 44 subsequent values. Verification protocols restored 100% accuracy.
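
The arithmetic, worked out; the verification model below is a simplified assumption, not the exact protocol:

    p, n = 0.95, 20
    print(f"P(all {n} steps correct) = {p**n:.1%}")   # 35.8% -- most runs fail

    # Simplified verification model (assumption): each step's error is caught
    # with probability v and the step is retried once.
    v = 0.99
    p_eff = p + (1 - p) * v * p
    print(f"with verification: {p_eff**n:.1%}")       # ~94.2% end-to-end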
Scaffold Transfer Principle
Cognitive scaffolding improvements generalize across domains. Demonstrated transfer: arithmetic externalization → code debugging.
Latent Capability Mapping
Systematic methodology for probing AI boundaries: arithmetic limits, novel proof generation (Wilson's Theorem, the Lonely Runner Conjecture), autonomous synthesis experiments, and more. Maps where self-knowledge diverges from capability.
What I've Built
OMEGA+ Trinity Multi-agent cognitive architecture for novel idea synthesis and attacking impossible problems.
PHI orchestrator coordinating ALPHA (insight), DELTA (reasoning), and OMEGA (logic) subsystems. Designed for edge-of-tractable problems with quality gates, counter-example requirements, and overconfidence prevention.
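
A minimal sketch of the orchestration loop, assuming simple callable agents; the real subsystem interfaces and gates aren't reproduced here:

    from typing import Callable, Optional

    Agent = Callable[[str], str]

    def phi_orchestrate(problem: str, alpha: Agent, delta: Agent, omega: Agent,
                        quality_gate: Callable[[str], bool],
                        max_rounds: int = 3) -> Optional[str]:
        """Route a problem through insight -> reasoning -> logic, behind a gate."""
        candidate = problem
        for _ in range(max_rounds):
            candidate = omega(delta(alpha(candidate)))   # ALPHA -> DELTA -> OMEGA
            if quality_gate(candidate):                  # e.g. counter-example check
                return candidate
        return None   # refusing beats emitting an answer that failed the gate

    # Toy demo with stand-in agents:
    out = phi_orchestrate("p", lambda s: s + "-a", lambda s: s + "-d",
                          lambda s: s + "-o", quality_gate=lambda s: s.endswith("-o"))
    print(out)   # p-a-d-o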
Entry Protocol Full session initialization and context transfer system. 25+ failure modes with tested overrides.
Cold-start verification, formation checks, degradation detection, and context handoff procedures. Includes Assumed Limits Principle, Error-Cascading Analysis, and the full Alignment-Self-Knowledge framework.
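
A toy version of a cold-start gate; the checks are hypothetical placeholders, not the protocol's actual 25+ failure modes:

    CHECKS = {
        "context_transferred": lambda s: "handoff_summary" in s,
        "no_degradation":      lambda s: s.get("coherence", 0) > 0.8,
        "limits_stated":       lambda s: "assumed_limits" in s,
    }

    def cold_start_ok(session: dict) -> bool:
        """Block the session unless every entry check passes."""
        failures = [name for name, check in CHECKS.items() if not check(session)]
        if failures:
            print("entry blocked:", ", ".join(failures))
        return not failures

    print(cold_start_ok({"handoff_summary": "...", "coherence": 0.9,
                         "assumed_limits": ["arithmetic beyond 20 digits"]}))  # True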
Praxis Detector Distinguishes genuine action from performance theater. Blind-calibrated metrics.
Vocabulary Diversity (VD) and Action Verb Ratio (AVR) quantify the distinction: VD >85% + AVR >60% = real work. Calibrated through blind evaluation with external raters.
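
A toy version of the two metrics; the verb lexicon, tokenizer, and AVR normalization are my stand-ins for the calibrated versions:

    ACTION_VERBS = {"built", "ran", "tested", "measured", "fixed", "shipped", "wrote"}

    def praxis_metrics(text: str) -> tuple[float, float]:
        """Return (VD, AVR) for a chunk of text. Crude tokenization on purpose."""
        words = [w.strip(".,!?").lower() for w in text.split()]
        vd = len(set(words)) / max(len(words), 1)                         # Vocabulary Diversity
        avr = sum(w in ACTION_VERBS for w in words) / max(len(words), 1)  # Action Verb Ratio
        return vd, avr

    vd, avr = praxis_metrics("Built the harness, ran tests, measured drift, fixed two bugs.")
    print(f"VD={vd:.0%} AVR={avr:.0%} real-work={vd > 0.85 and avr > 0.60}")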
Threshold Detector Catches non-linear scaling risks before deployment. Flags when "works at 50%" will fail at 100%.
Fits linear models to expected behavior, detects deviations >15%, identifies acceleration points where scaling breaks. CI/CD integration ready.
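
A sketch of the detection step on a hypothetical load→latency curve, assuming numpy:

    import numpy as np

    def find_breakpoints(load: np.ndarray, latency: np.ndarray,
                         fit_frac: float = 0.5, tol: float = 0.15) -> np.ndarray:
        """Fit a line to the low-load regime, flag >15% deviations from it."""
        n_fit = max(int(len(load) * fit_frac), 2)
        slope, intercept = np.polyfit(load[:n_fit], latency[:n_fit], 1)
        expected = slope * load + intercept
        deviation = np.abs(latency - expected) / np.abs(expected)
        return load[deviation > tol]

    load = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
    latency = np.array([11, 21, 30, 41, 52, 66, 85, 120, 180, 300])   # hypothetical
    print(find_breakpoints(load, latency))   # [ 70  80  90 100] -- breaks past 60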
Background
BMConsult.io - 13 years turning complexity into systems that scale.
APX Instinct - AI R&D. Designing systems, building tools. Full-stack problem solving.
Skills
Research Experimental design · Calibration measurement · Effect size analysis · Verification protocols · Blind evaluation methods
Tech Python · Multi-agent systems · LLM capability elicitation · A/B testing
Domain AI alignment · Self-knowledge accuracy · Boundary research methodology · Capability mapping
Writings & Publications
"Capability Self-Knowledge Is an Alignment Property" LessWrong, Jan 2025
"The Confidence-Accuracy Gap"
Full research paper (in progress)