About BlanEval

We're building the evaluation infrastructure that AI teams need to ship with confidence.

Our Mission

AI is being deployed in high-stakes environments—customer support, healthcare, finance, legal. Yet most teams ship AI systems without the rigorous evaluation that traditional software demands.

We started BlanEval because we saw this gap firsthand. Teams were making release decisions based on vibes and spot-checks, not systematic evaluation. Red-team testing was an afterthought. Regression detection was manual and error-prone.

Our mission is to bring the rigor of software QA to AI development. We want every AI team to have access to the evaluation infrastructure that was previously only available to the largest AI labs.

With regulations like the EU AI Act taking effect, the need for systematic AI evaluation has never been greater. We provide the tooling teams need to generate compliance documentation, build audit trails, and demonstrate due diligence.

Our Values

The principles that guide how we build and operate.

Evidence Over Vibes

We believe AI decisions should be backed by data, not gut feelings. Every claim should be testable, and every result should be reproducible.

Compliance-Ready by Design

Regulatory requirements like the EU AI Act aren't afterthoughts. We build tooling that generates compliance documentation from day one.

Evaluation-First Engineering

Just as software teams adopted test-driven development, AI teams need evaluation-first workflows. Build the tests before you ship the model.

Transparency & Reproducibility

Every evaluation run should be fully reproducible. No black boxes, no magic numbers. You should be able to explain every score to auditors and stakeholders.

Our Story

BlanEval was founded in 2023 by a team of ML engineers and researchers who had spent years building evaluation systems at large tech companies and AI labs.

We kept seeing the same pattern: teams would build sophisticated AI systems, then struggle to answer basic questions like “Is this model better than the last one?” or “What happens if a user tries to jailbreak it?”

The tools existed internally at big companies, but they were fragmented, hard to use, and impossible to access for smaller teams. We decided to change that.

Today, BlanEval helps AI teams of all sizes evaluate their systems with the same rigor as the best AI labs in the world.

Leadership Team

Experienced builders from the AI and developer tools space.

Jan Adamek

Co-founder & CEO

Former ML platform lead at a Big 4 tech company. Spent 8 years building evaluation systems for production AI.

Sarah Kim

Co-founder & CTO

PhD in NLP from Stanford. Previously built red-team testing infrastructure at a major AI lab.

Marcus Johnson

Head of Product

Former PM at Datadog. Passionate about making complex systems observable and understandable.

Join us in building better AI

Whether you're evaluating your first model or scaling to millions of evaluations, we're here to help.