About BlanEval
We're building the evaluation infrastructure that AI teams need to ship with confidence.
Our Mission
AI is being deployed in high-stakes environments—customer support, healthcare, finance, legal. Yet most teams ship AI systems without the rigorous evaluation that traditional software demands.
We started BlanEval because we saw this gap firsthand. Teams were making release decisions based on vibes and spot-checks, not systematic evaluation. Red-team testing was an afterthought. Regression detection was manual and error-prone.
Our mission is to bring the rigor of software QA to AI development. We want every AI team to have access to the evaluation infrastructure that was previously only available to the largest AI labs.
With regulations like the EU AI Act taking effect, the need for systematic AI evaluation has never been greater. We provide the tooling teams need to generate compliance documentation, build audit trails, and demonstrate due diligence.
Our Values
The principles that guide how we build and operate.
Evidence Over Vibes
We believe AI decisions should be backed by data, not gut feelings. Every claim should be testable; every result should be reproducible.
Compliance-Ready by Design
Regulatory requirements like the EU AI Act aren't afterthoughts. We build tooling that generates compliance documentation from day one.
Evaluation-First Engineering
Just as software teams adopted test-driven development, AI teams need evaluation-first workflows. Build the tests before you ship the model.
Transparency & Reproducibility
Every evaluation run should be fully reproducible. No black boxes, no magic numbers. You should be able to explain every score to auditors and stakeholders.
Our Story
BlanEval was founded in 2023 by a team of ML engineers and researchers who had spent years building evaluation systems at large tech companies and AI labs.
We kept seeing the same pattern: teams would build sophisticated AI systems, then struggle to answer basic questions like “Is this model better than the last one?” or “What happens if a user tries to jailbreak it?”
The tools existed internally at big companies, but they were fragmented, hard to use, and impossible to access for smaller teams. We decided to change that.
Today, BlanEval helps AI teams of all sizes evaluate their systems with the same rigor as the best AI labs in the world.
Leadership Team
Experienced builders from the AI and developer tools space.
Jan Adamek
Co-founder & CEO
Former ML Platform lead at a Big 4 firm. Spent 8 years building evaluation systems for production AI.
Sarah Kim
Co-founder & CTO
PhD in NLP from Stanford. Previously built red-team testing infrastructure at a major AI lab.
Marcus Johnson
Head of Product
Former PM at Datadog. Passionate about making complex systems observable and understandable.
Join us in building better AI
Whether you're evaluating your first model or scaling to millions of evaluations, we're here to help.