xOrigin: Training Data Provenance
Establish immutable provenance chains for AI training data. Every dataset operation is recorded with SHA-256 hashes, and data is split across independent custodians via XorIDA to prevent unauthorized access.
The Problem
AI training data origins are untraceable. Poisoned datasets can compromise model behavior with no audit trail. Organizations training on third-party data have no way to verify what was included, when it was modified, or whether it was tampered with.
Data poisoning attacks are cheap and effective. An adversary who injects a small percentage of crafted samples into a training corpus can plant backdoors, bias outputs, or degrade accuracy, and the manipulation can go undetected. Current pipelines offer no chain of custody for training data.
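Regulatory pressure compounds the problem. The EU AI Act requires providers of high-risk AI systems to document their training data, including its origin and how it was prepared. Penalties under the Act reach up to 7% of global annual turnover for the most serious violations and up to 3% for breaches of most other obligations. Organizations that cannot demonstrate a clear audit trail therefore face substantial fines.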
The PRIVATE.ME Solution
xOrigin records every dataset operation in an immutable provenance chain. Each entry is SHA-256 hashed and linked to its predecessor. The dataset itself is split across custodians via XorIDA, so no single custodian can access or tamper with the training data.
Every transformation (ingestion, cleaning, augmentation, sampling, merging) is logged with operator identity, timestamp, input hash, and output hash. The chain is cryptographically linked: tampering with any historical entry breaks all subsequent hashes.
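The linking rule can be sketched with Node's built-in `crypto` module. The entry shape, the fixed field order, and the `GENESIS` sentinel are illustrative assumptions, not xOrigin's record format. In the usage at the end, entry 0 is edited and its own hash is resealed. Verification still fails at entry 1, because entry 1's `prevHash` points at the old value.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical entry shape for illustration; not xOrigin's actual record format.
interface ProvenanceEntry {
  operation: string;  // e.g. 'ingest', 'clean', 'augment', 'sample', 'merge'
  operator: string;   // operator identity
  timestamp: string;  // ISO-8601 time of the operation
  inputHash: string;  // SHA-256 of the dataset before the operation
  outputHash: string; // SHA-256 of the dataset after the operation
  prevHash: string;   // entryHash of the predecessor (GENESIS for the first entry)
  entryHash: string;  // SHA-256 over all fields above
}

type EntryFields = Omit<ProvenanceEntry, 'prevHash' | 'entryHash'>;

const GENESIS = '0'.repeat(64);
const sha256 = (data: string): string => createHash('sha256').update(data).digest('hex');

// Hash the fields in a fixed order so the digest does not depend on object key order.
function hashEntry(e: Omit<ProvenanceEntry, 'entryHash'>): string {
  return sha256(JSON.stringify([e.operation, e.operator, e.timestamp, e.inputHash, e.outputHash, e.prevHash]));
}

// Append a new entry that links to the current head of the chain.
function appendEntry(chain: ProvenanceEntry[], fields: EntryFields): ProvenanceEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].entryHash : GENESIS;
  const unsealed = { ...fields, prevHash };
  return [...chain, { ...unsealed, entryHash: hashEntry(unsealed) }];
}

// Return the index of the first entry that fails verification, or -1 if the chain is intact.
function firstBrokenEntry(chain: ProvenanceEntry[]): number {
  let expectedPrev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const { entryHash, ...rest } = chain[i];
    if (rest.prevHash !== expectedPrev || hashEntry(rest) !== entryHash) return i;
    expectedPrev = entryHash;
  }
  return -1;
}

// Usage: build a three-step chain, then forge entry 0 and reseal its own hash.
const op = (operation: string, inputHash: string, outputHash: string): EntryFields => ({
  operation, operator: 'pipeline@org.com', timestamp: new Date().toISOString(), inputHash, outputHash,
});
let chain: ProvenanceEntry[] = [];
chain = appendEntry(chain, op('ingest', sha256('raw'), sha256('raw')));
chain = appendEntry(chain, op('clean', sha256('raw'), sha256('cleaned')));
chain = appendEntry(chain, op('sample', sha256('cleaned'), sha256('sampled')));
console.log(firstBrokenEntry(chain)); // -1

const forged = chain.map((e) => ({ ...e }));
forged[0].operator = 'attacker@example.com';
forged[0].entryHash = hashEntry(forged[0]);
console.log(firstBrokenEntry(forged)); // 1: entry 1 still points at entry 0's old hash
```

An attacker who can rewrite every later entry could reseal the whole chain. Detection therefore assumes the latest hash (the chain head) is also recorded somewhere the attacker cannot modify.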
Data custody is distributed: the training data is split into N shares via XorIDA. Each custodian holds one share. K shares are required for reconstruction. No single custodian can read, modify, or leak the dataset.
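This document does not specify XorIDA. The sketch below therefore substitutes a different, well-known K-of-N scheme, Shamir secret sharing over GF(2^8), to illustrate the threshold property described above: any K shares reconstruct the data. With Shamir's scheme, fewer than K shares also reveal nothing about it. This is not XorIDA, and `split`, `combine`, and the share format are hypothetical.

```typescript
import { randomBytes } from 'node:crypto';

// NOTE: Shamir secret sharing, used here as a stand-in; this is not XorIDA.

// Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).
function gfMul(a: number, b: number): number {
  let product = 0;
  while (b > 0) {
    if (b & 1) product ^= a;
    a <<= 1;
    if (a & 0x100) a ^= 0x11b;
    b >>= 1;
  }
  return product;
}

// Inverse of a non-zero element: a^254, because a^255 = 1 in GF(2^8).
function gfInv(a: number): number {
  let result = 1;
  for (let i = 0; i < 254; i++) result = gfMul(result, a);
  return result;
}

interface Share { x: number; y: Uint8Array }

// Split `secret` into n shares; any k of them reconstruct it.
function split(secret: Uint8Array, k: number, n: number): Share[] {
  if (k < 1 || k > n || n > 255) throw new RangeError('require 1 <= k <= n <= 255');
  const shares: Share[] = [];
  for (let x = 1; x <= n; x++) shares.push({ x, y: new Uint8Array(secret.length) });
  for (let i = 0; i < secret.length; i++) {
    // Random degree-(k-1) polynomial whose constant term is the secret byte.
    const coeffs = [secret[i], ...Array.from(randomBytes(k - 1))];
    for (const share of shares) {
      let y = 0; // Horner evaluation at share.x; addition in GF(2^8) is XOR
      for (let j = coeffs.length - 1; j >= 0; j--) y = gfMul(y, share.x) ^ coeffs[j];
      share.y[i] = y;
    }
  }
  return shares;
}

// Reconstruct by Lagrange interpolation at x = 0 (needs >= k shares with distinct x).
function combine(shares: Share[]): Uint8Array {
  const out = new Uint8Array(shares[0].y.length);
  for (let i = 0; i < shares.length; i++) {
    let num = 1;
    let den = 1;
    for (let j = 0; j < shares.length; j++) {
      if (j === i) continue;
      num = gfMul(num, shares[j].x);
      den = gfMul(den, shares[j].x ^ shares[i].x); // x_j - x_i is XOR in GF(2^8)
    }
    const basis = gfMul(num, gfInv(den));
    for (let b = 0; b < out.length; b++) out[b] ^= gfMul(shares[i].y[b], basis);
  }
  return out;
}

// Usage: a 2-of-3 split; any two shares recover the bytes.
const data = new TextEncoder().encode('training-shard-0001');
const shares = split(data, 2, 3);
console.log(new TextDecoder().decode(combine([shares[0], shares[2]]))); // training-shard-0001
```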
Fast Onboarding: 3 Acceleration Levels
Traditional supply chain provenance requires manual certificate setup, custodian coordination, and share distribution infrastructure. xOrigin reduces this to about 15 seconds with zero-click accept, 90 seconds with the one-line CLI, or 10 minutes with a deploy button.
Zero-click accept (~15 seconds):
```typescript
// .env file:
// XORIGIN_INVITE_CODE=XOR-abc123

// Auto-accept on first use
import { createProvenanceManager } from '@private.me/xorigin';

const manager = createProvenanceManager();
// `certificate` is an origin certificate object, as in Example: Zero-Click Accept
const result = await manager.storeCertificate(certificate, 2, 3);
// ✅ Invite auto-accepted, ready to track provenance
```
One-line CLI (~90 seconds): generates a custodian DID, saves it to .env, and creates the first certificate.
```bash
# Install and initialize
npx @private.me/xorigin init
# Output:
# ✅ Custodian DID generated
# ✅ Saved to .env
# ✅ Share storage configured
# Ready to create origin certificates

# Create your first certificate
npx @private.me/xorigin create \
  --product "Organic Coffee Beans" \
  --batch "BATCH-2024-03-15-001" \
  --origin "CO:Huila:Pitalito"
```
Deploy button (~10 minutes) provisions:
- ✓ Share storage (AES-256-GCM)
- ✓ Verification API (cryptographic proofs)
- ✓ Custody transfer dashboard
- ✓ Counterfeit detection endpoint
Example: Zero-Click Accept
Set the invite code in the environment; the origin certificate is created on first use. No manual setup required.
```typescript
// 1. Set environment variable
// .env file:
// XORIGIN_INVITE_CODE=https://xorigin.private.me/invite/XOR-abc123

// 2. Create origin certificate (auto-accepts invite)
import { createProvenanceManager } from '@private.me/xorigin';

const manager = createProvenanceManager();

const certificate = {
  id: 'cert-001',
  productId: 'SKU-12345',
  productName: 'Organic Coffee Beans',
  manufacturer: 'did:key:z6Mk...',
  origin: {
    country: 'CO',
    region: 'Huila',
    city: 'Pitalito',
  },
  manufacturedAt: new Date('2024-03-15'),
  batchNumber: 'BATCH-2024-03-15-001',
  metadata: {
    category: 'agricultural',
    certifications: ['USDA Organic', 'Fair Trade'],
  },
};

const result = await manager.storeCertificate(certificate, 2, 3, {
  custodians: [
    'did:key:manufacturer',
    'did:key:distributor',
    'did:key:retailer',
  ],
  onProgress: (status, percent) => console.log(`${status} (${percent}%)`),
});

if (result.ok) {
  console.log('✅ Certificate protected');
  console.log('✅ Shares distributed to custodians');
  console.log('✅ Custody chain initialized');
}

// What happened:
// 1. Invite auto-accepted from XORIGIN_INVITE_CODE env var
// 2. Custodian DID generated and saved to .env
// 3. Certificate split via XorIDA (2-of-3)
// 4. Shares distributed to custodians
// 5. Custody chain initialized
// Total time: ~15 seconds
```
Example: CLI Setup
One command generates custodian DID, saves credentials, and creates your first origin certificate.
```bash
# Step 1: Install CLI globally
npm install -g @private.me/xorigin

# Step 2: Initialize (generates custodian DID, saves to .env)
xorigin init
# Output:
# Generating custodian DID...
# ✅ Custodian DID: did:key:z6Mk...
# ✅ Saved to .env
# ✅ Share storage configured: https://xorigin.private.me
# Ready to create origin certificates

# Step 3: Create your first certificate
xorigin create \
  --product "Organic Coffee Beans" \
  --batch "BATCH-2024-03-15-001" \
  --origin "CO:Huila:Pitalito" \
  --threshold 2 --total 3
# Output:
# ✅ Certificate created: cert-001
# ✅ Split via XorIDA (2-of-3)
# ✅ Shares distributed to custodians
# ✅ Custody chain initialized
# Ready for custody transfers
```
Example: Deploy Button
One-click deployment provisions complete infrastructure for supply chain provenance tracking.
```bash
# 1. Click "Deploy to Vercel" button
# 2. Authenticate with Vercel/Netlify/Railway
# 3. Configure environment variables:
#    - XORIGIN_ADMIN_DID (auto-generated)
#    - SHARE_STORAGE_BACKEND (S3/R2/GCS)
#    - VERIFICATION_API_KEY (auto-generated)
# 4. Deploy completes (~10 minutes)
# 5. Infrastructure ready:
#    ✅ Share storage (AES-256-GCM encrypted at rest)
#    ✅ Verification API (cryptographic proof generation)
#    ✅ Custody transfer dashboard
#    ✅ Counterfeit detection endpoint
# 6. Create first certificate via dashboard or API
#    (the Content-Type header tells the API the body is JSON; curl -d defaults to form encoding)
curl -X POST https://your-deployment.vercel.app/api/certificates \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "product": "Organic Coffee Beans",
    "batch": "BATCH-2024-03-15-001",
    "origin": "CO:Huila:Pitalito",
    "threshold": 2,
    "total": 3
  }'
```
How It Works
xOrigin wraps every dataset operation in a provenance record: input hash, output hash, operator identity, timestamp, and operation type. Records form a linked chain where each entry references its predecessor's hash.
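A minimal sketch of such a wrapper follows, assuming Node's `crypto` module. It hashes the dataset bytes before and after a transform and emits a record linked to its predecessor. `withProvenance`, the record shape, and the fixed-order JSON serialization are hypothetical and are not xOrigin's API; the SDK entry point shown in this document is `trackProvenance`.

```typescript
import { createHash } from 'node:crypto';

type Operation = 'ingest' | 'clean' | 'augment' | 'sample' | 'merge';

// Hypothetical record shape mirroring the fields listed above.
interface ProvenanceRecord {
  operation: Operation;
  operator: string;
  timestamp: string;
  inputHash: string;  // SHA-256 of the input bytes
  outputHash: string; // SHA-256 of the output bytes
  prevHash: string;   // hash of the predecessor record
  hash: string;       // hash of this record, referenced by the next one
}

const sha256Hex = (data: Uint8Array | string): string =>
  createHash('sha256').update(data).digest('hex');

// Apply `transform` and return its output together with a linked provenance record.
function withProvenance(
  prevHash: string,
  operation: Operation,
  operator: string,
  input: Uint8Array,
  transform: (data: Uint8Array) => Uint8Array,
): { output: Uint8Array; record: ProvenanceRecord } {
  const inputHash = sha256Hex(input); // hash first, in case the transform mutates in place
  const output = transform(input);
  const body = {
    operation,
    operator,
    timestamp: new Date().toISOString(),
    inputHash,
    outputHash: sha256Hex(output),
    prevHash,
  };
  // Serialize fields in a fixed order so the record hash is reproducible.
  const hash = sha256Hex(JSON.stringify([
    body.operation, body.operator, body.timestamp, body.inputHash, body.outputHash, body.prevHash,
  ]));
  return { output, record: { ...body, hash } };
}

// Usage: ingest raw bytes, then trim whitespace as a cleaning step.
const GENESIS = '0'.repeat(64);
const enc = new TextEncoder();
const dec = new TextDecoder();
const step1 = withProvenance(GENESIS, 'ingest', 'pipeline@org.com', enc.encode('  Hello World \n'), (d) => d);
const step2 = withProvenance(step1.record.hash, 'clean', 'pipeline@org.com', step1.output,
  (d) => enc.encode(dec.decode(d).trim()));
// step2 links to step1 by record hash, and step2's inputHash equals step1's outputHash.
```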
Use Cases
- Provide regulators with a complete, tamper-evident record of every dataset used in model training. Demonstrate data provenance from ingestion through final training run. (audit-ready)
- Prove that only licensed datasets were used in training. Provenance chain records every data source with timestamps and license references. (license-chain)
- Detect unauthorized modifications to training pipelines. Every transformation is recorded with operator identity and input/output hashes. (tamper-evident)
- Identify when training data was modified post-ingestion. Hash chain breaks indicate unauthorized changes, enabling rapid incident response. (chain-integrity)
Integration
```typescript
import { trackProvenance, auditChain } from '@private.me/xorigin';

// datasetBuffer: raw bytes of the dataset being recorded
// Record a dataset operation with provenance metadata
const entry = await trackProvenance(datasetBuffer, {
  operation: 'ingest',
  source: 'licensed-corpus-v3',
  operator: 'pipeline@org.com',
  license: 'CC-BY-4.0',
});

// Audit the entire provenance chain for integrity
// chainId: identifier of the provenance chain to audit
const audit = await auditChain(chainId);
if (audit.ok) {
  // audit.value.entries: verified chain, all hashes intact
  // audit.value.datasetHash: SHA-256 of current dataset
}
```
Security Properties
| Property | Mechanism | Guarantee |
|---|---|---|
| Confidentiality | XorIDA threshold sharing | Information-theoretic |
| Integrity | HMAC-SHA256 per share | Tamper-evident |
| Availability | K-of-N reconstruction | Fault tolerant |
| Provenance | SHA-256 hash chain | Immutable audit trail |
| Non-repudiation | Operator identity binding | Attributable operations |
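Per-share tagging with HMAC-SHA256 can be sketched with Node's `crypto` module. The tag layout (a 4-byte big-endian share index followed by the share bytes) and the single shared key are assumptions for illustration. The table does not specify how xOrigin derives or distributes its MAC keys. A MAC makes tampering evident to anyone holding the key but does not prevent it, which matches the "Tamper-evident" guarantee in the table.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from 'node:crypto';

// Tag = HMAC-SHA256(key, index || share). Binding the share index means a valid
// share cannot be silently moved to another position.
function tagShare(key: Uint8Array, index: number, share: Uint8Array): Uint8Array {
  const idx = Buffer.alloc(4);
  idx.writeUInt32BE(index);
  return createHmac('sha256', key).update(idx).update(share).digest();
}

function verifyShare(key: Uint8Array, index: number, share: Uint8Array, tag: Uint8Array): boolean {
  const expected = tagShare(key, index, share);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return tag.length === expected.length && timingSafeEqual(expected, tag);
}

// Usage
const key = randomBytes(32);   // hypothetical: who holds this key is a deployment choice
const share = randomBytes(64); // stand-in for one custodian's share
const tag = tagShare(key, 1, share);

const corrupted = Buffer.from(share);
corrupted[0] ^= 0x01; // flip one bit

console.log(verifyShare(key, 1, share, tag));     // true
console.log(verifyShare(key, 1, corrupted, tag)); // false
console.log(verifyShare(key, 2, share, tag));     // false
```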
Verifiable Data Protection
Every operation in this ACI produces a verifiable audit trail via xProve. HMAC-chained integrity proofs let auditors confirm that data was split, stored, and reconstructed correctly — without accessing the data itself.
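The HMAC-chaining idea can be sketched as follows. This is an illustration under stated assumptions, not xProve's construction. Each tag covers the previous tag plus a description of one event, so altering, reordering, or dropping any event changes every later tag. The events carry SHA-256 digests rather than dataset bytes, so an auditor holding the key can check the chain without seeing the data. The event strings and the `chainTags`/`auditTags` helpers are hypothetical.

```typescript
import { createHash, createHmac } from 'node:crypto';

const sha256Hex = (s: string): string => createHash('sha256').update(s).digest('hex');

// tag_i = HMAC-SHA256(key, tag_{i-1} || event_i), starting from 32 zero bytes.
function chainTags(key: Uint8Array, events: string[]): string[] {
  const tags: string[] = [];
  let prev: Uint8Array = Buffer.alloc(32);
  for (const event of events) {
    const tag = createHmac('sha256', key).update(prev).update(event).digest();
    tags.push(tag.toString('hex'));
    prev = tag;
  }
  return tags;
}

// An auditor with the key recomputes every tag from the event log. Plain string
// comparison is acceptable for an offline audit; online verifiers should prefer timingSafeEqual.
function auditTags(key: Uint8Array, events: string[], tags: string[]): boolean {
  const expected = chainTags(key, events);
  return expected.length === tags.length && expected.every((t, i) => t === tags[i]);
}

// Usage: events record dataset digests, never the dataset itself.
const datasetDigest = sha256Hex('example training records');
const events = [`split:${datasetDigest}`, 'store:share-1', `reconstruct:${datasetDigest}`];
const key = Buffer.from('demo-audit-key'); // hypothetical; use a random secret in practice
const tags = chainTags(key, events);

console.log(auditTags(key, events, tags));                                  // true
console.log(auditTags(key, [events[0], 'store:share-9', events[2]], tags)); // false
```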
Ship Proofs, Not Source
xOrigin generates cryptographic proofs of correct execution without exposing proprietary algorithms. Integrity can be verified at one of four proof tiers, from HMAC tags up to KKW zero-knowledge proofs, with no source code required.
- Tier 1 HMAC (~0.7KB)
- Tier 2 Commit-Reveal (~0.5KB)
- Tier 3 IT-MAC (~0.3KB)
- Tier 4 KKW ZK (~0.4KB)
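Of the four tiers, commit-reveal is the simplest to illustrate. The sketch below is a generic hash commitment, not xOrigin's Tier 2 format, and it does not reproduce the proof sizes listed above. The committer publishes SHA-256(nonce || value) and later reveals both. The random 32-byte nonce hides the value until the reveal, and SHA-256's collision resistance binds the committer to it. A plain commit-reveal discloses the committed value at reveal time, so on its own it is not zero-knowledge.

```typescript
import { createHash, randomBytes } from 'node:crypto';

interface Opening { nonce: Uint8Array; value: string }

const digest = (nonce: Uint8Array, value: string): string =>
  createHash('sha256').update(nonce).update(value).digest('hex');

// Commit: publish `commitment`, keep `opening` private until the reveal.
function commit(value: string): { commitment: string; opening: Opening } {
  const nonce = randomBytes(32); // fixed length, so nonce || value parses unambiguously
  return { commitment: digest(nonce, value), opening: { nonce, value } };
}

// Reveal: anyone holding the commitment checks the opened value against it.
function verifyOpening(commitment: string, opening: Opening): boolean {
  return digest(opening.nonce, opening.value) === commitment;
}

// Usage (the committed string is a hypothetical example)
const { commitment, opening } = commit('pipeline-output-v1');
console.log(verifyOpening(commitment, opening));                                     // true
console.log(verifyOpening(commitment, { ...opening, value: 'pipeline-output-v2' })); // false
```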
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
```bash
npm install @private.me/xorigin
```
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While xOrigin is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.