xBenchmark: Privacy-Preserving AI Model Evaluation
Evaluate and red-team AI models without exposing test datasets or model weights. xCompute enables computation on XorIDA shares so neither party sees the other’s data.
The Problem
AI model evaluation requires access to sensitive test datasets and model internals, but sharing either creates intellectual property and privacy risks.
Red-teaming and benchmarking require running sensitive prompts against models, but test datasets contain proprietary evaluation criteria, adversarial examples, and competitive intelligence. Sharing them with model providers defeats the purpose.
Model providers resist sharing weights or internal metrics for evaluation, creating a trust deadlock where neither party can verify the other’s claims.
The PRIVATE.ME Solution
xBenchmark uses xCompute to evaluate models on split data. Test datasets and model responses are XorIDA-split so neither the evaluator nor the model provider sees the other’s complete data.
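To make the idea of splitting concrete, here is a minimal sketch of plain XOR secret splitting. It is an illustration under stated assumptions, not XorIDA itself: the helper names `xorSplit` and `xorCombine` are hypothetical, and this simple scheme needs all n shares to reconstruct, whereas the integration example on this page configures a k-of-n threshold.

```typescript
import { randomBytes } from 'node:crypto';

// Split `secret` into `n` shares whose byte-wise XOR equals the secret.
// Illustrative n-of-n scheme only; not the XorIDA construction.
function xorSplit(secret: Uint8Array, n: number): Uint8Array[] {
  if (n < 2) throw new Error('need at least two shares');
  const shares: Uint8Array[] = [];
  const last = Uint8Array.from(secret); // becomes secret XOR all random masks
  for (let i = 0; i < n - 1; i++) {
    const mask = new Uint8Array(randomBytes(secret.length));
    shares.push(mask);
    for (let j = 0; j < last.length; j++) last[j] ^= mask[j];
  }
  shares.push(last);
  return shares;
}

// Recombine by XOR-ing every share together.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(shares[0].length);
  for (const s of shares) {
    for (let j = 0; j < out.length; j++) out[j] ^= s[j];
  }
  return out;
}
```

In this sketch every share except the last is uniformly random, and the last is the secret masked by all of them, so any incomplete set of shares is statistically independent of the secret.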
Evaluation metrics are computed directly on XorIDA shares using xCompute’s Boolean circuit engine. XOR gates are free (zero communication); AND gates use Beaver triples. The result is a score that both parties can verify without either seeing the raw data.
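The gate costs described above can be sketched for two parties holding XOR shares of single bits. This is a textbook-style illustration, not xCompute's engine: the names `share`, `xorGate`, `andGate`, and `dealTriple` are hypothetical, and the Beaver triple comes from a trusted dealer here, which is an assumption made to keep the example short.

```typescript
import { randomInt } from 'node:crypto';

// A secret bit held as two XOR shares: s[0] ^ s[1] === secret.
type SharedBit = [number, number];

const randBit = (): number => randomInt(2);

function share(bit: number): SharedBit {
  const r = randBit();
  return [r, bit ^ r];
}

// Opening a value means both parties reveal their share of it.
const open = (s: SharedBit): number => s[0] ^ s[1];

// XOR gate: each party XORs its own shares locally; nothing is exchanged.
function xorGate(x: SharedBit, y: SharedBit): SharedBit {
  return [x[0] ^ y[0], x[1] ^ y[1]];
}

// Beaver triple: shares of random bits a, b and of c = a AND b.
// Assumption: generated by a trusted dealer in this sketch.
interface Triple { a: SharedBit; b: SharedBit; c: SharedBit }

function dealTriple(): Triple {
  const a = randBit();
  const b = randBit();
  return { a: share(a), b: share(b), c: share(a & b) };
}

// AND gate: the only communication is opening the masked bits d and e.
function andGate(x: SharedBit, y: SharedBit, t: Triple): SharedBit {
  const d = open(xorGate(x, t.a)); // d = x XOR a, reveals nothing about x
  const e = open(xorGate(y, t.b)); // e = y XOR b, reveals nothing about y
  // x AND y = c ^ (d & b) ^ (e & a) ^ (d & e); party 0 adds the public d & e term.
  const z0 = t.c[0] ^ (d & t.b[0]) ^ (e & t.a[0]) ^ (d & e);
  const z1 = t.c[1] ^ (d & t.b[1]) ^ (e & t.a[1]);
  return [z0, z1];
}
```

Each AND gate consumes one fresh triple, which is why AND gates dominate the cost of a circuit while XOR gates add none.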
All evaluation runs are recorded in an HMAC-chained audit trail with DID-signed attestations. Results are reproducible and tamper-evident.
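A minimal sketch of an HMAC-chained log shows why such a trail is tamper-evident. The entry shape and the helpers `appendEntry` and `verifyChain` are hypothetical names chosen for illustration; the DID-signed attestations are left out, and the actual xBenchmark record format is not specified here.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical entry shape: each tag covers the payload and the previous tag.
interface AuditEntry { payload: string; prevTag: string; tag: string }

const GENESIS = '0'.repeat(64); // placeholder "previous tag" for the first entry

function tagFor(key: Buffer, prevTag: string, payload: string): string {
  return createHmac('sha256', key)
    .update(prevTag)
    .update('\n')
    .update(payload)
    .digest('hex');
}

// Returns a new log with one more entry chained to the last tag.
function appendEntry(log: AuditEntry[], key: Buffer, payload: string): AuditEntry[] {
  const prevTag = log.length > 0 ? log[log.length - 1].tag : GENESIS;
  return [...log, { payload, prevTag, tag: tagFor(key, prevTag, payload) }];
}

// Recomputes every tag in order; any edited, dropped, or reordered entry fails.
function verifyChain(log: AuditEntry[], key: Buffer): boolean {
  let prev = GENESIS;
  for (const entry of log) {
    if (entry.prevTag !== prev) return false;
    const expected = Buffer.from(tagFor(key, prev, entry.payload), 'hex');
    const actual = Buffer.from(entry.tag, 'hex');
    if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return false;
    prev = entry.tag;
  }
  return true;
}
```

Because each tag depends on the one before it, changing any earlier entry invalidates every later tag unless the attacker also holds the HMAC key.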
How It Works
xBenchmark orchestrates multi-party evaluation where test data and model outputs are XorIDA-split and scored via xCompute without reconstruction.
Use Cases
- Safety: Red-team models without exposing adversarial test datasets to the model provider.
- Benchmark: Run competitive benchmarks where neither model provider sees the test set.
- Procurement: Evaluate AI vendors against proprietary criteria without sharing your evaluation framework.
- Compliance: Third-party audits of high-risk AI systems without exposing model internals.
Integration
import { EvalSession } from '@private.me/xbenchmark';

const session = await EvalSession.create({
  evaluator: evaluatorDid,
  modelProvider: providerDid,
  metrics: ['accuracy', 'toxicity', 'bias'],
  threshold: { k: 2, n: 3 }
});

const result = await session.evaluate(testSuite);
Security Properties
| Property | Mechanism | Guarantee |
|---|---|---|
| Test data privacy | XorIDA split datasets | ✓ Information-theoretic |
| Model privacy | Split model outputs | ✓ No weight exposure |
| Result integrity | HMAC-chained audit | ✓ Tamper-evident |
| Computation | xCompute MPC | ✓ No reconstruction |
Verifiable Data Protection
Every operation in this ACI produces a verifiable audit trail via xProve. HMAC-chained integrity proofs let auditors confirm that data was split, stored, and reconstructed correctly — without accessing the data itself.
Ready to deploy xBenchmark?
Talk to Ren, our AI sales engineer, or book a live demo with our team.
Ship Proofs, Not Source
xBenchmark generates cryptographic proofs of correct execution without exposing proprietary algorithms. Verify integrity using zero-knowledge proofs — no source code required.
- Tier 1 HMAC (~0.7KB)
- Tier 2 Commit-Reveal (~0.5KB)
- Tier 3 IT-MAC (~0.3KB)
- Tier 4 KKW ZK (~0.4KB)
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
npm install @private.me/xbenchmark
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While xBenchmark is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.