ProvenanceSplit: Media Provenance & Deepfake Detection
As generative AI makes synthetic media indistinguishable from authentic content, proving the provenance of images, video, and audio becomes essential. ProvenanceSplit uses XorIDA (threshold sharing over GF(2)) to split media provenance metadata across independent verification nodes so no single node can forge or suppress a provenance record. Reconstruction requires a threshold of cooperating nodes, providing tamper-evident media authentication that resists both deepfake attacks and institutional compromise.
Executive Summary
ProvenanceSplit solves a fundamental problem: how to prove that a photograph, video, or audio recording is authentic when deepfakes are indistinguishable from reality.
Two functions cover the complete workflow: splitProvenance() takes a media file with provenance metadata (capture device, timestamp, creator identity, content fingerprint) and splits it into threshold shares via XorIDA, distributing each share to an independent verification node. verifyProvenance() reconstructs the original metadata from a threshold subset of shares, verifies HMAC integrity, and returns the authenticated provenance record.
The key insight: no single verification node holds the complete provenance record, and no single node can forge a record. An attacker must compromise k out of n nodes simultaneously. When k = 3 and you distribute shares to Associated Press, Internet Archive, and C2PA, you've created a tamper-evident proof that's resistant to state-level institutional compromise.
Each share carries an HMAC-SHA256 signature and key, enabling independent per-node verification before reconstruction. The original provenance data is committed via SHA-256 hash at split time, providing an immutable reference for future verification. Zero npm runtime dependencies. Built on XorIDA (information-theoretic security) and Web Crypto API.
The Problem
Media provenance verification today relies on centralized authorities or single points of trust.
Deepfakes Are Undetectable
Generative models (DALL-E, Midjourney, Stable Diffusion, video synthesis, voice cloning) produce media indistinguishable from authentic recordings. A photograph of a politician endorsing a policy, a video of a CEO resigning, an audio clip of a whistleblower — all can be fabricated with consumer hardware. By 2026, AI-generated media may outnumber authentic media in digital pipelines.
Centralized Verification Fails
Today's solutions (C2PA, Adobe Content Authenticity Initiative, blockchain verification) rely on a single verifier — a database, a certificate authority, a blockchain node, or a corporate server. If that authority is hacked, bribed, or coerced, the entire provenance system collapses. A state actor can compromise the verifier and either fabricate false provenance records or suppress authentic ones.
No Threshold Resistance
Traditional media authentication has no built-in redundancy. If a camera's signing key is stolen or a central archive is breached, provenance is lost. There's no mechanism to require k of n nodes to agree on a record's authenticity.
- 2023: DALL-E 3, video synthesis reaches "maybe not real" quality.
- 2024: Voice cloning + lip-sync synthesis.
- 2025: Real-time deepfakes, indistinguishable in compressed video.
- 2026: Mainstream adoption in disinformation campaigns.
Media provenance becomes a critical national security problem.
The Solution
ProvenanceSplit distributes media provenance across threshold-protected verification nodes using XorIDA, an information-theoretic splitting scheme.
XorIDA: Information-Theoretic Security
XorIDA (XOR over Indexable Discrete Arrays) splits data into n shares such that any k shares can reconstruct the data, but any k-1 shares reveal zero information about the original. This guarantee does not rest on a problem being computationally hard; recovering the data from fewer than k shares is information-theoretically impossible. Even a quantum computer, given k-1 shares, learns nothing.
The scheme works over GF(2) (binary field). Each share is the result of XOR operations on the original data and random bit sequences. Reconstruction uses linear algebra over GF(2) to recover the original.
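To make the XOR mechanics concrete, here is a minimal sketch of the simplest case: n-of-n XOR sharing, where every share is needed to reconstruct. It is not the XorIDA k-of-n construction, which this document does not specify, and the names `xorSplit` and `xorCombine` are hypothetical. It only illustrates why shares built from random pads carry no information on their own.

```typescript
// Simplified n-of-n XOR sharing (all n shares required).
// NOT the XorIDA k-of-n construction; function names are hypothetical.

function xorSplit(data: Uint8Array, n: number): Uint8Array[] {
  if (!Number.isInteger(n) || n < 2) {
    throw new RangeError('n must be an integer >= 2');
  }
  const shares: Uint8Array[] = [];
  // Ends up as data XOR pad_1 XOR ... XOR pad_(n-1).
  const last = Uint8Array.from(data);
  for (let i = 0; i < n - 1; i++) {
    // Uniformly random pad; getRandomValues accepts at most 65,536 bytes per call.
    const pad = crypto.getRandomValues(new Uint8Array(data.length));
    for (let j = 0; j < data.length; j++) last[j] ^= pad[j];
    shares.push(pad);
  }
  shares.push(last);
  return shares;
}

function xorCombine(shares: Uint8Array[]): Uint8Array {
  if (shares.length === 0) throw new RangeError('no shares');
  const out = new Uint8Array(shares[0].length);
  for (const share of shares) {
    if (share.length !== out.length) throw new RangeError('share length mismatch');
    // XOR-ing every share cancels the pads and leaves the original bytes.
    for (let j = 0; j < out.length; j++) out[j] ^= share[j];
  }
  return out;
}
```

In this simplified scheme any n-1 shares are jointly uniformly random, so they reveal nothing about the data. A threshold scheme such as XorIDA extends the same property to k-of-n.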
Threshold Resilience
You configure n verification nodes and a reconstruction threshold k. ProvenanceSplit splits provenance metadata into n shares. To reconstruct:
- Collect any k shares from the n nodes
- Verify HMAC integrity of each share (each share has its own HMAC key)
- Reconstruct the original provenance data
- Verify the SHA-256 provenance hash
- Return authenticated provenance or error
An attacker must compromise at least k nodes. If k = 3 and your nodes are Associated Press, Internet Archive, and a government archive, you've made attacks dramatically harder. Each node can be operated by a different organization with different security budgets, jurisdictions, and incentives.
HMAC Before Reconstruction
Critical security property: HMAC verification completes before reconstructed data is returned. If any share's HMAC fails, the operation returns an error immediately. This is fail-closed — corrupted or tampered shares are detected and rejected.
At split time, ProvenanceSplit computes SHA-256(provenanceData) and returns it alongside the shares. This hash is independent of the splits and can be verified by any verifier in the future. If an attacker later tries to inject false provenance, the reconstructed data won't match the committed hash.
Architecture
ProvenanceSplit implements a two-phase architecture: splitting at capture time and verification at consumption time.
Phase 1: Splitting Pipeline
When a journalist captures a photograph as evidence, the camera (or post-processing tool) runs the splitting pipeline:
- Validate: Check that mediaId, provenanceData, and config (nodes, threshold) are valid.
- Pad: PKCS#7 pad the provenanceData to a block size derived from the node count.
- HMAC: Generate HMAC-SHA256(padded data). Each share will carry the HMAC key and signature.
- Split: XorIDA split the padded data into n shares. Each share is a Uint8Array.
- Encode: Format each share with IDA5 header (patent-locked format). Base64 encode for transport.
- Package: Return ProvenanceSplitResult containing shares, mediaId, and SHA-256 provenance hash.
    import { splitProvenance } from '@private.me/provenancesplit';
    import type { MediaFile, ProvenanceConfig } from '@private.me/provenancesplit';

    const media: MediaFile = {
      mediaId: 'IMG-2026-0042',
      fileName: 'evidence-photo.jpg',
      mediaType: 'image',
      captureDevice: 'Canon EOS R5',
      capturedAt: Date.now(),
      capturedBy: 'reporter-alias-7',
      provenanceData: new Uint8Array([...]),
    };

    const config: ProvenanceConfig = {
      nodes: [
        { id: 'ap', name: 'Associated Press', role: 'news-agency' },
        { id: 'c2pa', name: 'C2PA Verifier', role: 'standards-body' },
        { id: 'archive', name: 'Internet Archive', role: 'archive' },
      ],
      threshold: 2,
    };

    const result = await splitProvenance(media, config);
    if (result.ok) {
      // result.value.shares = [ ProvenanceShare, ... ]
      // result.value.provenanceHash = "sha256:..."
    }
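The Pad step of the splitting pipeline can be sketched as follows. This is a generic PKCS#7 implementation, not the package's own code: `blockSize` is passed in as a parameter because the document does not say how the block size is derived from the node count, and the function names are hypothetical.

```typescript
// Generic PKCS#7 padding. How ProvenanceSplit derives blockSize from the
// node count is not specified, so the caller supplies it here.

function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (!Number.isInteger(blockSize) || blockSize < 1 || blockSize > 255) {
    throw new RangeError('blockSize must be an integer in 1..255');
  }
  // Always adds 1..blockSize bytes, so the padding stays unambiguous
  // even when the input is already block-aligned.
  const padLen = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length);
  return out;
}

function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) {
    throw new Error('invalid padded length');
  }
  const padLen = padded[padded.length - 1];
  if (padLen < 1 || padLen > blockSize) throw new Error('invalid padding');
  // Every padding byte must equal the padding length.
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('invalid padding');
  }
  return padded.slice(0, padded.length - padLen);
}
```

Padding to a multiple of the block size lets the data divide evenly during splitting. The unpad step rejects malformed padding, which is consistent with the RECONSTRUCTION_FAILED error described for unpadding failures.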
Phase 2: Verification & HMAC
When a news publisher or researcher wants to verify an image, they collect shares from nodes and run verification:
- Validate Shares: Check that at least threshold shares are present, all with consistent metadata.
- Decode: Parse IDA5 header from each share. Base64 decode the share data.
- Reconstruct: XorIDA reconstruct from the first k shares.
- HMAC Verify: Verify HMAC-SHA256 using the key from the first share. If HMAC fails, return error immediately (fail-closed).
- Unpad: PKCS#7 unpad the reconstructed data.
- Return: Return the original provenanceData. Caller can then verify SHA-256 hash against the committed value.
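    import { verifyProvenance } from '@private.me/provenancesplit';

    // `shares` are the shares collected from the nodes;
    // `committedProvenanceHash` is the hash returned at split time.
    const result = await verifyProvenance(shares);
    if (result.ok) {
      const reconstructedData = result.value;
      // Verify against the committed hash
      const hash = await crypto.subtle.digest('SHA-256', reconstructedData);
      // Assumption: the committed hash is base64-encoded. If it carries a
      // "sha256:" prefix, strip or add it before comparing.
      const hashB64 = btoa(String.fromCharCode(...new Uint8Array(hash)));
      const matches = hashB64 === committedProvenanceHash;
    } else {
      console.error(result.error.message);
    }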
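The Validate Shares step can be sketched as a pure pre-check on share metadata. This is an illustration rather than the package's implementation: it uses a subset of the documented ProvenanceShare fields, the field types are assumed, and counting only distinct share indices toward the threshold is an added assumption.

```typescript
// Metadata pre-check before reconstruction. Field types are assumed;
// the distinct-index rule is an assumption, not documented behaviour.

interface ShareMeta {
  mediaId: string;
  index: number;
  threshold: number;
}

type ShareCheck =
  | { ok: true }
  | { ok: false; code: 'INSUFFICIENT_SHARES' | 'VERIFICATION_FAILED' };

function checkShares(shares: ShareMeta[]): ShareCheck {
  if (shares.length === 0) return { ok: false, code: 'INSUFFICIENT_SHARES' };
  const first = shares[0];
  const distinct = new Set<number>();
  for (const share of shares) {
    // All shares must describe the same media item and the same threshold.
    if (share.mediaId !== first.mediaId || share.threshold !== first.threshold) {
      return { ok: false, code: 'VERIFICATION_FAILED' };
    }
    distinct.add(share.index);
  }
  // A duplicated share adds nothing toward reaching the threshold.
  if (distinct.size < first.threshold) {
    return { ok: false, code: 'INSUFFICIENT_SHARES' };
  }
  return { ok: true };
}
```

Running a cheap check like this first means that obviously inconsistent input is rejected before any cryptographic work is done.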
Use Cases
ProvenanceSplit is designed for high-stakes media authentication where a single point of failure is unacceptable.
API Surface
ProvenanceSplit exports two core functions, comprehensive types, and structured error classes.
Core Functions
splitProvenance(media, config): Splits media provenance metadata into threshold shares via XorIDA. Each share includes HMAC key and signature. Returns ProvenanceSplitResult with shares array and SHA-256 provenance hash.
verifyProvenance(shares): Verifies HMAC integrity and reconstructs provenance data from threshold shares. Returns reconstructed Uint8Array on success. Fails closed if HMAC verification fails.
Types
| Type | Fields | Purpose |
|---|---|---|
| MediaFile | mediaId, fileName, mediaType, captureDevice, capturedAt, capturedBy, provenanceData | Input media metadata to split |
| ProvenanceConfig | nodes: ProvenanceNode[], threshold: number | Splitting configuration (n nodes, k threshold) |
| ProvenanceShare | mediaId, nodeId, index, total, threshold, data, hmac, hmacKey, originalSize | Single share produced by splitting |
| ProvenanceSplitResult | mediaId, shares, provenanceHash | Successful split result |
Error Classes
All errors inherit from ProvenanceError. Use toProvenanceError(code) to convert string error codes to typed instances for try/catch.
| Error Code | Class | When It Occurs |
|---|---|---|
| INVALID_CONFIG | ProvenanceConfigError | Threshold exceeds node count or threshold < 2 |
| INVALID_MEDIA | ProvenanceMediaError | Missing or empty mediaId or provenanceData |
| SPLIT_FAILED | ProvenanceCryptoError | XorIDA split operation failed |
| HMAC_FAILURE | ProvenanceCryptoError | HMAC verification failed during verify |
| INSUFFICIENT_SHARES | ProvenanceCryptoError | Fewer than threshold shares provided |
| RECONSTRUCTION_FAILED | ProvenanceCryptoError | XorIDA reconstruction or unpadding failed |
| VERIFICATION_FAILED | ProvenanceCryptoError | Shares have mismatched mediaId or threshold |
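As an illustration of the INVALID_CONFIG row, a configuration check might look like the sketch below. The type and function names are hypothetical stand-ins for the package's own types, and the integer check is an added assumption. Only the two documented rules (threshold at least 2, threshold not above the node count) come from the table.

```typescript
// Hypothetical stand-ins for ProvenanceNode / ProvenanceConfig.
interface NodeSketch { id: string; name: string; role: string }
interface ConfigSketch { nodes: NodeSketch[]; threshold: number }

type ConfigCheck =
  | { ok: true; value: ConfigSketch }
  | { ok: false; error: { code: 'INVALID_CONFIG'; message: string } };

function validateConfig(config: ConfigSketch): ConfigCheck {
  const n = config.nodes.length;
  const k = config.threshold;
  const fail = (message: string): ConfigCheck => ({
    ok: false,
    error: { code: 'INVALID_CONFIG', message },
  });
  // Assumption: a fractional threshold is also rejected.
  if (!Number.isInteger(k)) return fail('threshold must be an integer');
  // Documented rules: threshold >= 2 and threshold <= node count.
  if (k < 2) return fail('threshold must be at least 2');
  if (k > n) return fail(`threshold ${k} exceeds node count ${n}`);
  return { ok: true, value: config };
}
```

Returning a Result-shaped value instead of throwing keeps the check in line with the error-handling pattern the package describes.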
Security
ProvenanceSplit is built on a foundation of cryptographic primitives and fail-closed design.
HMAC Before Reconstruction
The single most critical security property: HMAC verification ALWAYS completes before reconstructed data is returned to the caller. This is not optional, not deferred, not conditional. If HMAC fails, the operation returns an error. Corrupted or tampered data is never returned to the application.
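The fail-closed pattern can be sketched with the Web Crypto API as shown below. This is an illustration of the property, not the package's internal code: the helper names are hypothetical, and the key and tag are passed in as raw bytes.

```typescript
// Fail-closed HMAC-SHA256 check: data is returned only after the tag verifies.
// Helper names are hypothetical.

type Checked =
  | { ok: true; value: Uint8Array }
  | { ok: false; error: { code: 'HMAC_FAILURE' } };

async function importHmacKey(rawKey: Uint8Array): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    'raw',
    rawKey,
    { name: 'HMAC', hash: 'SHA-256' },
    false, // not extractable
    ['sign', 'verify'],
  );
}

async function computeTag(rawKey: Uint8Array, data: Uint8Array): Promise<Uint8Array> {
  const key = await importHmacKey(rawKey);
  return new Uint8Array(await crypto.subtle.sign('HMAC', key, data));
}

async function releaseIfAuthentic(
  rawKey: Uint8Array,
  tag: Uint8Array,
  data: Uint8Array,
): Promise<Checked> {
  const key = await importHmacKey(rawKey);
  // subtle.verify recomputes the tag and compares it with the supplied one.
  const valid = await crypto.subtle.verify('HMAC', key, tag, data);
  if (!valid) return { ok: false, error: { code: 'HMAC_FAILURE' } };
  return { ok: true, value: data };
}
```

A key for this sketch could be generated with `crypto.getRandomValues(new Uint8Array(32))`. The caller only ever sees `value` on the success branch, so unauthenticated bytes never reach application code.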
Information-Theoretic Security
Any subset of shares below the threshold reveals zero information about the original provenance data. This is not based on computational hardness (RSA, ECC, AES). It is based on linear algebra over GF(2). Even a quantum computer with unlimited power cannot extract information from k-1 shares.
No Random Misuse
All random bytes are generated via crypto.getRandomValues(). Never Math.random(). The HMAC key is cryptographically random.
Tamper Detection
SHA-256 provenance hash is computed at split time and remains independent of all shares. If an attacker later tries to reconstruct false provenance, the SHA-256 hash will not match, detecting the tampering.
Per-Node HMAC Keys
Each share carries its own HMAC key. This enables independent per-node verification before any reconstruction. A node can verify the integrity of its own share without needing other nodes.
This package does not authenticate the provenance nodes themselves. Your application must verify that the node claiming to hold share #1 is actually Associated Press, not an attacker. Use TLS, certificate pinning, or digital signatures from the node operator.
Error Handling
ProvenanceSplit uses the Result<T, E> pattern with structured error codes and typed error classes.
Pattern: Result<T, E>
Every function returns a Result union:
    type Result<T, E> =
      | { ok: true; value: T }
      | { ok: false; error: E };
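A few small helpers make this pattern easier to work with. The sketch below is generic TypeScript; `ok`, `err`, `unwrapOr`, and `mapResult` are hypothetical helper names, not exports of the package.

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Constructors for the two variants.
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Return the value on success, otherwise a caller-supplied fallback.
function unwrapOr<T, E>(result: Result<T, E>, fallback: T): T {
  return result.ok ? result.value : fallback;
}

// Transform the success value and pass errors through untouched.
function mapResult<T, U, E>(
  result: Result<T, E>,
  fn: (value: T) => U,
): Result<U, E> {
  return result.ok ? ok(fn(result.value)) : result;
}
```

Because `ok` is a literal discriminant, checking `result.ok` narrows the type, so `value` and `error` are only accessible on the branch where they exist.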
Converting to Typed Errors
    import {
      splitProvenance,
      toProvenanceError,
      ProvenanceCryptoError,
    } from '@private.me/provenancesplit';

    const result = await splitProvenance(media, config);
    if (!result.ok) {
      const typedError = toProvenanceError(result.error.code);
      if (typedError instanceof ProvenanceCryptoError) {
        console.error('Cryptographic failure', typedError.message);
      } else {
        console.error('Other error', typedError.message);
      }
    }
Benchmarks
ProvenanceSplit is optimized for high-throughput media processing. Benchmarks measure the splitting and verification pipelines.
Performance Characteristics
| Operation | Data Size | Time | Notes |
|---|---|---|---|
| Split | 256 bytes | ~0.5ms | Pad + HMAC + XorIDA (3-of-3) |
| Split | 1 KB | ~1ms | Pad + HMAC + XorIDA (3-of-3) |
| Split | 10 KB | ~10ms | Pad + HMAC + XorIDA (3-of-3) |
| Verify | 256 bytes | ~0.5ms | Reconstruct + HMAC verify (2-of-3) |
| Verify | 1 KB | ~1ms | Reconstruct + HMAC verify (2-of-3) |
Performance scales linearly with metadata size. Most media provenance records (timestamps, device info, creator identity, content fingerprints) are under 1 KB.
Limitations & Honest Assessment
ProvenanceSplit solves media provenance splitting, not all deepfake detection challenges.
Schema Validation Out of Scope
This package does not validate the content of provenanceData. If your provenance metadata uses C2PA manifests, OpenTimestamps, or custom JSON, your application must validate the schema. ProvenanceSplit only splits and reconstructs bytes.
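Application-side validation of reconstructed bytes might look like the following sketch. It assumes the provenance record is UTF-8 JSON, and the `ProvenanceRecord` shape with its field names is purely hypothetical. Substitute your real schema, such as a C2PA manifest parser.

```typescript
// Hypothetical record shape; replace with your actual provenance schema.
interface ProvenanceRecord {
  captureDevice: string;
  capturedAt: number;
  capturedBy: string;
  contentHash: string;
}

function parseProvenanceRecord(bytes: Uint8Array): ProvenanceRecord | null {
  let parsed: unknown;
  try {
    // fatal: true makes invalid UTF-8 throw instead of being replaced silently.
    const text = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    parsed = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const r = parsed as Record<string, unknown>;
  if (
    typeof r.captureDevice !== 'string' ||
    typeof r.capturedAt !== 'number' ||
    typeof r.capturedBy !== 'string' ||
    typeof r.contentHash !== 'string'
  ) {
    return null;
  }
  // Copy only the known fields so unexpected keys are dropped.
  return {
    captureDevice: r.captureDevice,
    capturedAt: r.capturedAt,
    capturedBy: r.capturedBy,
    contentHash: r.contentHash,
  };
}
```

Run a check like this on the output of verifyProvenance() before trusting any field. A valid HMAC shows that the bytes are unmodified, not that they are well-formed.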
Media Content Not Processed
ProvenanceSplit splits provenance metadata, not the media file itself (image, video, audio). The actual media content remains under your application's control. You must hash and commit to the media separately if needed.
Node Authentication Not Included
This package assumes you can identify and trust your verification nodes. If a node operator is compromised, they can provide false shares. Use TLS, certificate pinning, or digital signatures to authenticate node identity.
Share Transport Out of Scope
ProvenanceSplit produces shares in base64 format but does not dictate how you transport them. Use HTTPS, authenticated channels, and rate limiting when transmitting shares.
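For callers that need to move raw share bytes in and out of base64 themselves, a minimal sketch using the standard `btoa` and `atob` globals follows. It covers only the base64 step; the IDA5 header format is not described in this document and is not handled. The helper names are hypothetical.

```typescript
// Base64 helpers for raw bytes. IDA5 header handling is out of scope here.

function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  // Build the binary string in a loop to avoid argument-count limits
  // that String.fromCharCode(...bytes) can hit on large inputs.
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64); // throws if the input is not valid base64
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}
```

Base64 is an encoding, not protection: the transport channel still needs the authentication and confidentiality measures listed above.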
Time Authority Not Included
The capturedAt field is a timestamp, but this package does not verify it against a trusted time source. Use RFC 3161 Time Stamping Authority or similar if you need strong time binding.
ProvenanceSplit solves institutional forgery. If you distribute shares across AP, Reuters, and BBC, no single organization can forge a provenance record without getting caught. It solves threshold resistance — an attacker needs to compromise k-of-n nodes simultaneously.
Deep Dives & Integration
For platform architects, security auditors, and teams integrating ProvenanceSplit with Xlink, enterprise authentication systems, or judicial evidence pipelines.
Post-Quantum Security
ProvenanceSplit inherits quantum resistance from its cryptographic foundation.
Payload Layer: Information-Theoretic
XorIDA splitting is unconditionally quantum-safe. Any k-1 shares reveal zero information regardless of computing power, whether classical, quantum, or hypothetical. This is not a conjecture; it follows from linear algebra over GF(2).
Transport Layer: Hybrid Post-Quantum
When shares are exchanged via Xlink (the PRIVATE.ME M2M identity layer), they travel in hybrid post-quantum envelopes:
- Key Exchange: X25519 + ML-KEM-768 (FIPS 203), always-on
- Signatures: Ed25519 + ML-DSA-65 (FIPS 204), opt-in via postQuantumSig: true
Applications integrating ProvenanceSplit should create Xlink agents with postQuantumSig: true for full post-quantum protection across all three cryptographic layers: payload (XorIDA), transport (Xlink hybrid KEM), and authentication (hybrid signatures).
Threat Model & Failure Modes
ProvenanceSplit is designed to resist specific threat vectors while acknowledging its assumptions and boundaries.
Threats Mitigated
Assumptions
- Node Availability: At least k of n nodes are reachable and responsive when you verify.
- Node Authenticity: You can verify that a node claiming to hold share #1 is actually Associated Press, not an attacker.
- Secure Channels: Share transport uses HTTPS or authenticated encryption.
- Metadata Validity: The provenanceData field contains valid, schema-compliant metadata.
See docs/threat-model.md and docs/failure-modes.md in the package for comprehensive analysis.
Platform Integration
ProvenanceSplit is part of the PRIVATE.ME platform for authenticated cryptographic interfaces.
Xail Email Client Integration
Journalists using Xail can attach media with provenance splits. When composing to a threshold-protected recipient list (e.g., AP, Reuters, BBC), Xail automatically routes each share via Xlink envelopes to the corresponding organization.
Enterprise Compliance
Regulated organizations (media companies, law enforcement, government agencies) use ProvenanceSplit to create audit trails of media authenticity. Shares are split across internal archive, external notary, and blockchain timestamp service.
C2PA Manifest Compatibility
The provenanceData field can hold C2PA (Coalition for Content Provenance and Authenticity) manifests. Split the manifest across nodes, then reconstruct and verify C2PA signatures at consumption time.
Judicial Evidence Pipeline
Courts and legal teams use ProvenanceSplit to establish chain of custody for digital evidence. Provenance metadata (capture device, timestamp, handler identity, hash of original media) is split and reconstructed with court-appointed notaries as nodes.
Codebase Statistics
ProvenanceSplit is a compact, focused implementation of media provenance splitting.
Test Coverage
| Test Category | Count | Coverage |
|---|---|---|
| provenance-splitter.test.ts | 374 tests | All splitting paths, error codes |
| abuse.test.ts | 230 tests | Malformed configs, tampering, adversarial input |
| Total | 604 tests | 100% line coverage |
Module Structure
@private.me/provenancesplit/
src/
index.ts // Barrel export
types.ts // MediaFile, ProvenanceConfig, etc.
errors.ts // Error classes & conversion
provenance-splitter.ts // splitProvenance()
provenance-verifier.ts // verifyProvenance()
__tests__/
provenance-splitter.test.ts // 374 tests
abuse.test.ts // 230 tests
Security Documentation
All source files include JSDoc with @module, parameter descriptions, return types, and security notes. See docs/threat-model.md and docs/failure-modes.md for threat analysis.
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
npm install @private.me/provenancesplit
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While ProvenanceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.