private.me Docs
PRIVATE.ME PLATFORM

xBenchmark: Privacy-Preserving AI Model Evaluation

Evaluate and red-team AI models without exposing test datasets or model weights. xCompute enables computation on XorIDA shares so neither party sees the other’s data.

AI/ML | Coming Soon | XorIDA Powered
Section 01

The Problem

AI model evaluation requires access to sensitive test datasets and model internals, but sharing either creates intellectual property and privacy risks.

Red-teaming and benchmarking require running sensitive prompts against models, but test datasets contain proprietary evaluation criteria, adversarial examples, and competitive intelligence. Sharing them with model providers defeats the purpose.

Model providers resist sharing weights or internal metrics for evaluation, creating a trust deadlock where neither party can verify the other’s claims.

The Old Way

[Diagram: the AI model or agent sits behind a single trust boundary, unprotected. A single provider has full data access and is a single point of failure; a breach exposes 100% of the data.]
Section 02

The PRIVATE.ME Solution

xBenchmark uses xCompute to evaluate models on split data. Test datasets and model responses are XorIDA-split so neither the evaluator nor the model provider sees the other’s complete data.

Evaluation metrics are computed directly on XorIDA shares using xCompute’s Boolean circuit engine. XOR gates are free (zero communication); AND gates use Beaver triples. The result is a score that both parties can verify without either seeing the raw data.
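
To make the gate costs concrete, here is a minimal two-party sketch. It assumes plain XOR secret sharing of single bits as a stand-in for XorIDA shares, and a trusted dealer that hands out the Beaver triple. The names `share`, `open`, `xorGate`, `andGate`, and `dealTriple` are illustrative only and are not part of the xCompute API.

```typescript
import { randomInt } from "node:crypto";

type Bit = 0 | 1;
type Shared = [Bit, Bit]; // [party 0's share, party 1's share]

const randBit = (): Bit => randomInt(2) as Bit;

// Split a bit into two XOR shares; each share alone is a uniform random bit.
function share(x: Bit): Shared {
  const r = randBit();
  return [r, (x ^ r) as Bit];
}

// Combine both shares to reveal the underlying bit.
function open(s: Shared): Bit {
  return (s[0] ^ s[1]) as Bit;
}

// XOR gate: each party XORs its own shares. No messages are exchanged.
function xorGate(x: Shared, y: Shared): Shared {
  return [(x[0] ^ y[0]) as Bit, (x[1] ^ y[1]) as Bit];
}

// Assumption: a trusted dealer produces shares of a, b, and c = a AND b.
function dealTriple(): { a: Shared; b: Shared; c: Shared } {
  const a = randBit();
  const b = randBit();
  return { a: share(a), b: share(b), c: share((a & b) as Bit) };
}

// AND gate: the parties open the masked values d = x XOR a and e = y XOR b
// (the only communication), then finish locally.
function andGate(x: Shared, y: Shared): Shared {
  const t = dealTriple();
  const d = open(xorGate(x, t.a));
  const e = open(xorGate(y, t.b));
  // Only party 0 adds the public term d AND e, so it is counted once.
  const z0 = (t.c[0] ^ (d & t.b[0]) ^ (e & t.a[0]) ^ (d & e)) as Bit;
  const z1 = (t.c[1] ^ (d & t.b[1]) ^ (e & t.a[1])) as Bit;
  return [z0, z1];
}

console.log(open(andGate(share(1), share(1)))); // prints 1
```

Because `d` and `e` are masked by the random bits `a` and `b`, opening them reveals nothing about `x` or `y`. Each triple must be used for one AND gate only.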

All evaluation runs are recorded in an HMAC-chained audit trail with DID-signed attestations. Results are reproducible and tamper-evident.
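
One way to picture an HMAC-chained trail is the sketch below, where each entry's tag covers its sequence number, its payload, and the previous entry's tag. This is an assumption about the general technique, not xBenchmark's actual record format; `AuditEntry`, `appendEntry`, and `verifyChain` are hypothetical names, and the DID-signed attestation step is left out.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface AuditEntry {
  seq: number;     // position in the chain
  payload: string; // what happened in this step
  prevTag: string; // tag of the previous entry (hex)
  tag: string;     // HMAC-SHA256 over seq, prevTag, and payload (hex)
}

const GENESIS = "0".repeat(64); // placeholder "previous tag" for the first entry

function tagFor(key: Buffer, seq: number, payload: string, prevTag: string): string {
  return createHmac("sha256", key)
    .update(`${seq}\n${prevTag}\n${payload}`)
    .digest("hex");
}

// Return a new log with one more entry linked to the current tail.
function appendEntry(log: AuditEntry[], key: Buffer, payload: string): AuditEntry[] {
  const prevTag = log.length > 0 ? log[log.length - 1].tag : GENESIS;
  const seq = log.length;
  return [...log, { seq, payload, prevTag, tag: tagFor(key, seq, payload, prevTag) }];
}

// Recompute every tag from the start; any edit, reorder, or deletion breaks the chain.
function verifyChain(log: AuditEntry[], key: Buffer): boolean {
  let prevTag = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const entry = log[i];
    if (entry.seq !== i || entry.prevTag !== prevTag) return false;
    const expected = Buffer.from(tagFor(key, i, entry.payload, prevTag), "hex");
    const actual = Buffer.from(entry.tag, "hex");
    if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return false;
    prevTag = entry.tag;
  }
  return true;
}
```

Truncating entries from the end of the log is not detected by the chain alone; a signed attestation over the latest tag is one common way to close that gap.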

The New Way

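[Diagram: data input from the agent or model is XorIDA-split into K-of-N shares and distributed across nodes (Share 1 to Node A, Share 2 to Node B, ... Share N to Node N). Reconstruction requires a threshold of K shares.]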
Section 03

How It Works

xBenchmark orchestrates multi-party evaluation where test data and model outputs are XorIDA-split and scored via xCompute without reconstruction.

Ingest (validate) → XorIDA Split (K-of-N) → Distribute (multi-node) → HMAC Verify (per-share) → Reconstruct (threshold OK)
Key Security Properties
Neither party sees the other’s data. Evaluation happens on XorIDA shares via xCompute. Results are HMAC-chained and DID-signed for tamper-evident reproducibility.
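
The split, per-share HMAC verify, and reconstruct stages can be sketched as follows. This is a simplified illustration, not XorIDA itself: it uses an n-of-n XOR split, so all shares are required rather than any K of N, and `split`, `reconstruct`, and `TaggedShare` are hypothetical names.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

interface TaggedShare {
  index: number; // which node the share is destined for
  data: Buffer;  // share bytes; individually indistinguishable from random
  tag: Buffer;   // HMAC-SHA256 over the index and the share bytes
}

function xorInto(acc: Buffer, other: Buffer): void {
  for (let i = 0; i < acc.length; i++) acc[i] ^= other[i];
}

function tagShare(key: Buffer, index: number, data: Buffer): Buffer {
  return createHmac("sha256", key).update(`${index}:`).update(data).digest();
}

// Split: n - 1 random pads, plus a final share equal to the secret XOR all pads.
function split(secret: Buffer, n: number, key: Buffer): TaggedShare[] {
  if (n < 2) throw new RangeError("need at least two shares");
  const last = Buffer.from(secret); // copy, so the caller's buffer is untouched
  const parts: Buffer[] = [];
  for (let i = 0; i < n - 1; i++) {
    const pad = randomBytes(secret.length);
    xorInto(last, pad);
    parts.push(pad);
  }
  parts.push(last);
  return parts.map((data, index) => ({ index, data, tag: tagShare(key, index, data) }));
}

// Verify each share's tag first, then XOR all shares together.
function reconstruct(shares: TaggedShare[], key: Buffer): Buffer {
  if (shares.length === 0) throw new RangeError("no shares supplied");
  const out = Buffer.alloc(shares[0].data.length);
  for (const s of shares) {
    const expected = tagShare(key, s.index, s.data);
    if (s.tag.length !== expected.length || !timingSafeEqual(s.tag, expected)) {
      throw new Error(`share ${s.index} failed its HMAC check`);
    }
    if (s.data.length !== out.length) throw new Error(`share ${s.index} has the wrong length`);
    xorInto(out, s.data);
  }
  return out;
}
```

Checking the tag before combining means a corrupted share is rejected up front instead of silently producing a wrong reconstruction. A real threshold scheme would additionally tolerate missing shares, which this sketch does not.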
Section 04

Use Cases

🧪
AI Safety
Blind Red-Teaming

Red-team models without exposing adversarial test datasets to the model provider.

Safety
📊
Benchmarking
Private Benchmarks

Run competitive benchmarks where no model provider sees the test set.

Benchmark
🏢
Enterprise
Vendor Evaluation

Evaluate AI vendors against proprietary criteria without sharing your evaluation framework.

Procurement
⚖️
Regulation
EU AI Act Audits

Third-party audits of high-risk AI systems without exposing model internals.

Compliance
Section 05

Integration

Quick Start
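import { EvalSession } from '@private.me/xbenchmark';

// evaluatorDid, providerDid, and testSuite are supplied by the caller.
// create() resolves to a Result<EvalSession, EvalError>; error handling is omitted here for brevity.
const session = await EvalSession.create({
  evaluator: evaluatorDid,      // DID of the evaluating party
  modelProvider: providerDid,   // DID of the model provider
  metrics: ['accuracy', 'toxicity', 'bias'],
  threshold: { k: 2, n: 3 }     // K-of-N: any 2 of the 3 shares meet the threshold
});
const result = await session.evaluate(testSuite);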
EvalSession.create(opts): Promise<Result<EvalSession, EvalError>>
Creates a privacy-preserving evaluation session between an evaluator and model provider. All computation happens on XorIDA shares via xCompute.
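
Since the signature returns a `Result` rather than throwing, callers would check the outcome before using the session. The sketch below shows that pattern against a local stand-in. The `Result` and `EvalError` shapes, the `BAD_THRESHOLD` code, and `createSession` are all assumptions made for illustration; the SDK's real types are not shown on this page.

```typescript
// Hypothetical shapes, assumed for this sketch only.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface EvalError {
  code: string;
  message: string;
}

interface SessionOptions {
  evaluator: string;
  modelProvider: string;
  metrics: string[];
  threshold: { k: number; n: number };
}

interface Session {
  opts: SessionOptions;
}

// Stand-in for EvalSession.create: it only validates the K-of-N threshold.
async function createSession(opts: SessionOptions): Promise<Result<Session, EvalError>> {
  const { k, n } = opts.threshold;
  if (!Number.isInteger(k) || !Number.isInteger(n) || k < 1 || k > n) {
    return {
      ok: false,
      error: { code: "BAD_THRESHOLD", message: `need 1 <= k <= n, got k=${k}, n=${n}` },
    };
  }
  return { ok: true, value: { opts } };
}

// Branch on the Result before touching the session.
async function openSession(opts: SessionOptions): Promise<string> {
  const created = await createSession(opts);
  if (created.ok === false) {
    return `create failed: ${created.error.code}`;
  }
  return `session ready with ${created.value.opts.metrics.length} metrics`;
}
```

Returning errors as values keeps a failed setup from being mistaken for a session, which matters when a bad threshold would otherwise surface only during evaluation.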
Section 06

Security Properties

Property | Mechanism | Guarantee
Test data privacy | XorIDA split datasets | Information-theoretic
Model privacy | Split model outputs | No weight exposure
Result integrity | HMAC-chained audit | Tamper-evident
Computation | xCompute MPC | No reconstruction
$3.8B AI eval TAM
MPC on shares
0 data exposure
VERIFIED BY XPROVE

Verifiable Data Protection

Every operation in this ACI produces a verifiable audit trail via xProve. HMAC-chained integrity proofs let auditors confirm that data was split, stored, and reconstructed correctly — without accessing the data itself.

XPROVE AUDIT TRAIL
Every XorIDA split generates HMAC-SHA256 integrity tags. xProve chains these into a tamper-evident audit trail that proves data was handled correctly at every step. Upgrade to zero-knowledge proofs when regulators or counterparties need public verification.

Read the xProve white paper →
GET STARTED

Ready to deploy xBenchmark?

Talk to Ren, our AI sales engineer, or book a live demo with our team.

Book a Demo

© 2026 StandardClouds Inc. dba PRIVATE.ME. All rights reserved.

VERIFIABLE WITHOUT CODE EXPOSURE

Ship Proofs, Not Source

xBenchmark generates cryptographic proofs of correct execution without exposing proprietary algorithms. Verify integrity using zero-knowledge proofs — no source code required.

XPROVE CRYPTOGRAPHIC PROOF

Verify proofs online →

Use Cases

🏛️
REGULATORY
FDA / SEC Submissions
Prove algorithm correctness for distributed systems without exposing trade secrets or IP.
Zero IP Exposure
🏦
FINANCIAL
Audit Without Access
External auditors verify secure operations without accessing source code or production systems.
FINRA / SOX Compliant
🛡️
DEFENSE
Classified Verification
Security clearance holders verify the correctness of distributed systems without needing clearance to view the source code.
CMMC / NIST Ready
🏢
ENTERPRISE
Procurement Due Diligence
Prove security and correctness during RFP evaluation without an NDA or code escrow.
No NDA Required

Deployment Options

📦

SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

  • npm install @private.me/xbenchmark
  • TypeScript/JavaScript SDK
  • Full source access
  • Enterprise support available
Get Started →
🏢

On-Premise Upon Request

Enterprise CLI for compliance, air-gap, or data residency requirements.

  • Complete data sovereignty
  • Air-gap capable deployment
  • Custom SLA + dedicated support
  • Professional services included
Request Quote →

Enterprise On-Premise Deployment

While xBenchmark is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

  • Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
  • Air-gapped environments — SCIF, classified networks, offline operations
  • Data residency requirements — EU GDPR, China data laws, government mandates
  • Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

Contact sales for assessment and pricing →