PRIVATE.ME · Technical White Paper

ProvenanceSplit: Media Provenance & Deepfake Detection

As generative AI makes synthetic media indistinguishable from authentic content, proving the provenance of images, video, and audio becomes essential. ProvenanceSplit uses XorIDA (threshold sharing over GF(2)) to split media provenance metadata across independent verification nodes so no single node can forge or suppress a provenance record. Reconstruction requires a threshold of cooperating nodes, providing tamper-evident media authentication that resists both deepfake attacks and institutional compromise.

v0.1.0 · 604 tests passing · 2 core functions · 0 npm deps · Information-theoretic · C2PA-compatible
Section 01

Executive Summary

ProvenanceSplit solves a fundamental problem: how to prove that a photograph, video, or audio recording is authentic when deepfakes are indistinguishable from reality.

Two functions cover the complete workflow: splitProvenance() takes a media file with provenance metadata (capture device, timestamp, creator identity, content fingerprint) and splits it into threshold shares via XorIDA, distributing each share to an independent verification node. verifyProvenance() reconstructs the original metadata from a threshold subset of shares, verifies HMAC integrity, and returns the authenticated provenance record.

The key insight: no single verification node holds the complete provenance record, and no single node can forge one. An attacker must compromise k of the n nodes simultaneously. With k = 3 and shares held by Associated Press, Internet Archive, and a C2PA verifier, forging or suppressing a record means compromising all three institutions. Even for a state-level adversary, that is a much higher bar than subverting a single trusted authority.

Each share carries an HMAC-SHA256 tag and the key used to compute it. The tag covers the padded provenance data and is checked after reconstruction, before any data is returned. The original provenance data is also committed via a SHA-256 hash at split time, which gives a fixed reference for future verification. The package has zero npm runtime dependencies and is built on XorIDA (information-theoretic security) and the Web Crypto API.

Section 02

The Problem

Media provenance verification today relies on centralized authorities or single points of trust.

Deepfakes Are Undetectable

Generative models (DALL-E, Midjourney, Stable Diffusion, video synthesis, voice cloning) produce media indistinguishable from authentic recordings. A photograph of a politician endorsing a policy, a video of a CEO resigning, an audio clip of a whistleblower — all can be fabricated with consumer hardware. By 2026, AI-generated media may outnumber authentic media in digital pipelines.

Centralized Verification Fails

Today's solutions (C2PA, Adobe Content Authenticity Initiative, blockchain verification) rely on a single verifier — a database, a certificate authority, a blockchain node, or a corporate server. If that authority is hacked, bribed, or coerced, the entire provenance system collapses. A state actor can compromise the verifier and either fabricate false provenance records or suppress authentic ones.

No Threshold Resistance

Traditional media authentication has no built-in redundancy. If a camera's signing key is stolen or a central archive is breached, provenance is lost. There's no mechanism to require k of n nodes to agree on a record's authenticity.

DEEPFAKE TIMELINE

2023: DALL-E 3; video synthesis reaches "maybe not real" quality.
2024: Voice cloning + lip-sync synthesis.
2025: Real-time deepfakes, indistinguishable in compressed video.
2026: Mainstream adoption in disinformation campaigns. Media provenance becomes a critical national security problem.

Section 03

The Solution

ProvenanceSplit distributes media provenance across threshold-protected verification nodes using XorIDA, an information-theoretic splitting scheme.

XorIDA: Information-Theoretic Security

XorIDA (XOR over Indexable Discrete Arrays) splits data into n shares so that any k shares can reconstruct the data, while k-1 shares reveal zero information about the original. This guarantee does not rest on a computationally hard problem. Recovering the data from k-1 shares is mathematically impossible, so even a quantum computer given k-1 shares learns nothing.

The scheme works over GF(2) (binary field). Each share is the result of XOR operations on the original data and random bit sequences. Reconstruction uses linear algebra over GF(2) to recover the original.
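This paper does not specify XorIDA's internal construction. The sketch below is a simplified illustration of the GF(2) idea only, using hypothetical helpers `xorSplit` and `xorCombine` for plain n-of-n XOR sharing. The first n-1 shares are uniformly random bytes, and the last share is the data XORed with all of them. XORing every share together recovers the data. Unlike XorIDA, this scheme needs all n shares (k = n), and any n-1 of them are indistinguishable from random noise.

```typescript
// Simplified n-of-n XOR secret sharing over GF(2). Illustration only: this is NOT
// XorIDA, which supports a configurable k-of-n threshold. Helper names are hypothetical.

// Fill a buffer from the CSPRNG; getRandomValues accepts at most 65,536 bytes per call.
function randomBytes(len: number): Uint8Array {
  const out = new Uint8Array(len);
  for (let off = 0; off < len; off += 65536) {
    crypto.getRandomValues(out.subarray(off, Math.min(off + 65536, len)));
  }
  return out;
}

// Split `secret` into n shares; all n are needed to reconstruct.
function xorSplit(secret: Uint8Array, n: number): Uint8Array[] {
  if (!Number.isInteger(n) || n < 2) throw new RangeError('n must be an integer >= 2');
  const shares: Uint8Array[] = [];
  const last = Uint8Array.from(secret); // becomes secret XOR r1 XOR ... XOR r(n-1)
  for (let i = 0; i < n - 1; i++) {
    const r = randomBytes(secret.length);
    for (let j = 0; j < r.length; j++) last[j] ^= r[j];
    shares.push(r);
  }
  shares.push(last);
  return shares;
}

// XOR (addition in GF(2)) of all shares recovers the secret.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  if (shares.length === 0) throw new RangeError('no shares');
  const out = new Uint8Array(shares[0].length);
  for (const s of shares) {
    if (s.length !== out.length) throw new RangeError('share length mismatch');
    for (let j = 0; j < out.length; j++) out[j] ^= s[j];
  }
  return out;
}
```

For example, `xorCombine(xorSplit(data, 3))` returns bytes equal to `data`. Any two of those three shares on their own are uniformly random.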

Threshold Resilience

You configure n verification nodes and a reconstruction threshold k. ProvenanceSplit splits provenance metadata into n shares. To reconstruct:

  • Collect any k shares from the n nodes
  • Reconstruct the padded provenance data with XorIDA
  • Verify the HMAC-SHA256 tag over the reconstructed data, using the key carried in the shares
  • Unpad and return the original provenance data, or an error
  • Compare its SHA-256 hash with the commitment recorded at split time (done by the caller)

An attacker must compromise at least k nodes. If k = 3 and your nodes are Associated Press, Internet Archive, and a government archive, you've made attacks dramatically harder. Each node can be operated by a different organization with different security budgets, jurisdictions, and incentives.

HMAC Before Reconstruction

Critical security property: HMAC verification completes before reconstructed data is returned. If the HMAC check fails, the operation returns an error immediately. This is fail-closed: corrupted or tampered shares are detected and rejected.
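Here is a minimal sketch of the fail-closed pattern using the Web Crypto API. It assumes, following the pipeline in Section 04, that the tag is HMAC-SHA256 over the padded provenance bytes and that the raw key bytes travel with the shares. `tagPadded` and `releaseIfAuthentic` are hypothetical names, not the package's API. The point is that data is released only after `crypto.subtle.verify` succeeds.

```typescript
// Sketch of HMAC-then-release with Web Crypto (HMAC-SHA256). Helper names are hypothetical.
// Assumption: the tag covers the padded provenance bytes and the raw key travels with the shares.

async function importHmacKey(raw: Uint8Array): Promise<CryptoKey> {
  return crypto.subtle.importKey('raw', raw, { name: 'HMAC', hash: 'SHA-256' }, false, [
    'sign',
    'verify',
  ]);
}

// Split side: a fresh 32-byte key from the CSPRNG and a tag over the padded data.
async function tagPadded(padded: Uint8Array): Promise<{ key: Uint8Array; tag: Uint8Array }> {
  const key = crypto.getRandomValues(new Uint8Array(32));
  const tag = new Uint8Array(await crypto.subtle.sign('HMAC', await importHmacKey(key), padded));
  return { key, tag };
}

// Verify side: return the data only if the tag verifies; otherwise null (fail closed).
async function releaseIfAuthentic(
  padded: Uint8Array,
  key: Uint8Array,
  tag: Uint8Array,
): Promise<Uint8Array | null> {
  // Let Web Crypto compare the tag instead of comparing bytes by hand.
  const valid = await crypto.subtle.verify('HMAC', await importHmacKey(key), tag, padded);
  return valid ? padded : null;
}
```

Under that assumption, the check catches corrupted or inconsistent reconstructions. Because the key travels with the shares, the check alone does not prove who produced a record. The separately stored SHA-256 commitment and node authentication cover that.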

PROVENANCE COMMITMENT

At split time, ProvenanceSplit computes SHA-256(provenanceData) and returns it alongside the shares. This hash is independent of the splits and can be verified by any verifier in the future. If an attacker later tries to inject false provenance, the reconstructed data won't match the committed hash.
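Below is a minimal sketch of computing and checking the commitment with Web Crypto. This paper shows the format `splitProvenance` emits only as "sha256:...", so the `"sha256:" + lowercase hex` encoding is an assumption. `commitProvenance` and `matchesCommitment` are hypothetical helpers.

```typescript
// Sketch: SHA-256 commitment via Web Crypto. Assumed format: "sha256:" + lowercase hex.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Compute the commitment at split time.
async function commitProvenance(data: Uint8Array): Promise<string> {
  const digest = new Uint8Array(await crypto.subtle.digest('SHA-256', data));
  return `sha256:${toHex(digest)}`;
}

// Recompute at verification time and compare with the stored commitment.
async function matchesCommitment(data: Uint8Array, committed: string): Promise<boolean> {
  return (await commitProvenance(data)) === committed;
}
```

The commitment only helps if it is stored somewhere other than the share-holding nodes, for example published alongside the media. Otherwise, compromising k nodes could also let an attacker rewrite it.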

Section 04

Architecture

ProvenanceSplit implements a two-phase architecture: splitting at capture time and verification at consumption time.

Phase 1: Splitting Pipeline

When a journalist captures a photograph as evidence, the camera (or post-processing tool) runs the splitting pipeline:

  1. Validate: Check that mediaId, provenanceData, and config (nodes, threshold) are valid.
  2. Pad: PKCS#7 pad the provenanceData to a block size derived from the node count.
  3. HMAC: Generate a random key and compute HMAC-SHA256 over the padded data. Each share will carry the key and the resulting tag.
  4. Split: XorIDA split the padded data into n shares. Each share is a Uint8Array.
  5. Encode: Format each share with IDA5 header (patent-locked format). Base64 encode for transport.
  6. Package: Return ProvenanceSplitResult containing shares, mediaId, and SHA-256 provenance hash.
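Step 2 can be sketched as standard PKCS#7 padding. This paper does not document how the package derives the block size from the node count, so in this sketch the caller passes `blockSize` explicitly (for instance, a hypothetical `blockSize = nodes.length`). `pkcs7Pad` and `pkcs7Unpad` are illustrative names.

```typescript
// Sketch of PKCS#7 padding and unpadding. Block-size derivation is an assumption left to the caller.
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (!Number.isInteger(blockSize) || blockSize < 1 || blockSize > 255) {
    throw new RangeError('blockSize must be an integer in 1..255');
  }
  const padLen = blockSize - (data.length % blockSize); // always 1..blockSize
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length); // every pad byte holds the pad length
  return out;
}

function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) {
    throw new Error('bad padded length');
  }
  const padLen = padded[padded.length - 1];
  if (padLen < 1 || padLen > blockSize) throw new Error('bad padding');
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('bad padding');
  }
  return padded.slice(0, padded.length - padLen);
}
```

PKCS#7 always adds between 1 and `blockSize` bytes, so the unpad step can strip padding unambiguously and reject malformed input. That rejection corresponds to the "unpadding failed" case of RECONSTRUCTION_FAILED in the error table.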
Splitting example
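import { splitProvenance } from '@private.me/provenancesplit';
import type { MediaFile, ProvenanceConfig } from '@private.me/provenancesplit';

// Serialized provenance record (e.g. a C2PA manifest); its contents are application-defined.
declare const provenanceBytes: Uint8Array;

const media: MediaFile = {
  mediaId: 'IMG-2026-0042',
  fileName: 'evidence-photo.jpg',
  mediaType: 'image',
  captureDevice: 'Canon EOS R5',
  capturedAt: Date.now(),
  capturedBy: 'reporter-alias-7',
  provenanceData: provenanceBytes,
};

// 2-of-3: any two of the three nodes can reconstruct the record.
const config: ProvenanceConfig = {
  nodes: [
    { id: 'ap', name: 'Associated Press', role: 'news-agency' },
    { id: 'c2pa', name: 'C2PA Verifier', role: 'standards-body' },
    { id: 'archive', name: 'Internet Archive', role: 'archive' },
  ],
  threshold: 2,
};

const result = await splitProvenance(media, config);
if (result.ok) {
  // result.value.shares = [ ProvenanceShare, ... ]  (one per node)
  // result.value.provenanceHash = "sha256:..."     (keep as the commitment)
} else {
  console.error(result.error.message);
}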

Phase 2: Verification & HMAC

When a news publisher or researcher wants to verify an image, they collect shares from nodes and run verification:

  1. Validate Shares: Check that at least threshold shares are present, all with consistent metadata.
  2. Decode: Parse IDA5 header from each share. Base64 decode the share data.
  3. Reconstruct: XorIDA reconstruct from the first k shares.
  4. HMAC Verify: Verify HMAC-SHA256 using the key from the first share. If HMAC fails, return error immediately (fail-closed).
  5. Unpad: PKCS#7 unpad the reconstructed data.
  6. Return: Return the original provenanceData. Caller can then verify SHA-256 hash against the committed value.
Verification example
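import { verifyProvenance } from '@private.me/provenancesplit';
import type { ProvenanceShare } from '@private.me/provenancesplit';

declare const shares: ProvenanceShare[];       // collected from at least `threshold` nodes
declare const committedProvenanceHash: string; // provenanceHash stored at split time

const result = await verifyProvenance(shares);
if (result.ok) {
  const reconstructedData = result.value;
  // Recompute SHA-256 and compare it with the committed hash.
  const digest = new Uint8Array(
    await crypto.subtle.digest('SHA-256', reconstructedData)
  );
  // Assumed encoding: "sha256:" + lowercase hex; match whatever splitProvenance returns.
  const hex = Array.from(digest, (b) => b.toString(16).padStart(2, '0')).join('');
  const matches = `sha256:${hex}` === committedProvenanceHash;
  if (!matches) console.error('Reconstructed provenance does not match the commitment');
} else {
  console.error(result.error.message);
}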
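Step 1 of the verification pipeline can be sketched as a pure check over share metadata. `ShareMeta` is a hypothetical subset of the `ProvenanceShare` fields from the Types table, and `checkShares` is likewise hypothetical. The error codes come from the Error Classes table. Counting only distinct share indices is an assumption.

```typescript
// Sketch of "Validate Shares" (step 1). ShareMeta is a hypothetical subset of ProvenanceShare.
interface ShareMeta {
  mediaId: string;
  nodeId: string;
  index: number;
  threshold: number;
}

type ShareCheck = 'OK' | 'INSUFFICIENT_SHARES' | 'VERIFICATION_FAILED';

function checkShares(shares: ShareMeta[]): ShareCheck {
  if (shares.length === 0) return 'INSUFFICIENT_SHARES';
  const { mediaId, threshold } = shares[0];
  // VERIFICATION_FAILED: every share must name the same media item and threshold.
  for (const s of shares) {
    if (s.mediaId !== mediaId || s.threshold !== threshold) return 'VERIFICATION_FAILED';
  }
  // INSUFFICIENT_SHARES: count distinct indices only (assumption: duplicates add nothing).
  const distinct = new Set(shares.map((s) => s.index));
  return distinct.size >= threshold ? 'OK' : 'INSUFFICIENT_SHARES';
}
```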
Section 05

Use Cases

ProvenanceSplit is designed for high-stakes media authentication where a single point of failure is unacceptable.

📸
JOURNALISM
Investigative Evidence
Investigative journalists photograph evidence at crime scenes or protests. ProvenanceSplit splits the photo's provenance metadata across AP, Reuters, and BBC archives, so forging or suppressing it requires compromising a threshold of those organizations.
High-stakes
📹
LAW ENFORCEMENT
Body Camera / Chain of Custody
Police body cameras encode video timestamps and location. ProvenanceSplit splits provenance across independent notaries and legal archives, creating tamper-evident evidence for court.
Litigation-critical
🎙️
GOVERNMENT
Classified Recordings
Government recordings of agreements or meetings are split across the State Department, the Library of Congress, and a neutral archive, preventing falsification of the historical record.
Institutional
📰
MEDIA NETWORKS
Breaking News Verification
When major news breaks, ProvenanceSplit splits the provenance records of incoming videos across CNN, BBC, and Reuters archives, so each newsroom can check a clip's provenance before it enters the news cycle.
Real-time
Section 06

API Surface

ProvenanceSplit exports two core functions, comprehensive types, and structured error classes.

Core Functions

splitProvenance(media, config) → Promise<Result<ProvenanceSplitResult, ProvenanceError>>

Splits media provenance metadata into threshold shares via XorIDA. Each share includes an HMAC key and tag. Returns a ProvenanceSplitResult with the shares array and the SHA-256 provenance hash.

verifyProvenance(shares) → Promise<Result<Uint8Array, ProvenanceError>>

Verifies HMAC integrity and reconstructs provenance data from threshold shares. Returns reconstructed Uint8Array on success. Fails closed if HMAC verification fails.

Types

| Type | Fields | Purpose |
|---|---|---|
| MediaFile | mediaId, fileName, mediaType, captureDevice, capturedAt, capturedBy, provenanceData | Input media metadata to split |
| ProvenanceConfig | nodes: ProvenanceNode[], threshold: number | Splitting configuration (n nodes, k threshold) |
| ProvenanceShare | mediaId, nodeId, index, total, threshold, data, hmac, hmacKey, originalSize | Single share produced by splitting |
| ProvenanceSplitResult | mediaId, shares, provenanceHash | Successful split result |

Error Classes

All errors inherit from ProvenanceError. Use toProvenanceError(code) to convert string error codes to typed instances for try/catch.

| Error Code | Class | When It Occurs |
|---|---|---|
| INVALID_CONFIG | ProvenanceConfigError | Threshold exceeds node count or threshold < 2 |
| INVALID_MEDIA | ProvenanceMediaError | Missing or empty mediaId or provenanceData |
| SPLIT_FAILED | ProvenanceCryptoError | XorIDA split operation failed |
| HMAC_FAILURE | ProvenanceCryptoError | HMAC verification failed during verify |
| INSUFFICIENT_SHARES | ProvenanceCryptoError | Fewer than threshold shares provided |
| RECONSTRUCTION_FAILED | ProvenanceCryptoError | XorIDA reconstruction or unpadding failed |
| VERIFICATION_FAILED | ProvenanceCryptoError | Shares have mismatched mediaId or threshold |
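The two input-validation rows of this table can be sketched as a single check. The `*Like` interfaces are trimmed, hypothetical stand-ins for `MediaFile` and `ProvenanceConfig`. Rejecting non-integer thresholds and the order in which the checks run are assumptions.

```typescript
// Sketch of the INVALID_CONFIG / INVALID_MEDIA rules from the error table.
// Hypothetical, trimmed stand-ins for the documented types; not the package's validator.
interface NodeLike { id: string }
interface ConfigLike { nodes: NodeLike[]; threshold: number }
interface MediaLike { mediaId: string; provenanceData: Uint8Array }

type InputCheck = 'OK' | 'INVALID_CONFIG' | 'INVALID_MEDIA';

function checkInputs(media: MediaLike, config: ConfigLike): InputCheck {
  const { nodes, threshold } = config;
  // INVALID_CONFIG: threshold below 2 or above the node count (non-integers rejected too).
  if (!Number.isInteger(threshold) || threshold < 2 || threshold > nodes.length) {
    return 'INVALID_CONFIG';
  }
  // INVALID_MEDIA: missing or empty mediaId or provenanceData.
  if (!media.mediaId || !media.provenanceData || media.provenanceData.length === 0) {
    return 'INVALID_MEDIA';
  }
  return 'OK';
}
```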
Section 07

Security

ProvenanceSplit is built on a foundation of cryptographic primitives and fail-closed design.

HMAC Before Reconstruction

The single most critical security property: HMAC verification ALWAYS completes before reconstructed data is returned to the caller. This is not optional, not deferred, not conditional. If HMAC fails, the operation returns an error. Corrupted or tampered data is never returned to the application.

Information-Theoretic Security

Any subset of shares below the threshold reveals zero information about the original provenance data. This is not based on computational hardness (RSA, ECC, AES). It is based on linear algebra over GF(2). Even a quantum computer with unlimited power cannot extract information from k-1 shares.

No Random Misuse

All random bytes are generated via crypto.getRandomValues(). Never Math.random(). The HMAC key is cryptographically random.

Tamper Detection

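The SHA-256 provenance hash is computed at split time and is independent of all shares. If an attacker later tries to reconstruct false provenance, its SHA-256 hash will not match the commitment, which reveals the tampering. This holds as long as the commitment is stored somewhere the attacker cannot also rewrite.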

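HMAC Key Carried in Each Share

Each share carries a copy of the HMAC tag and the key used to compute it, so verification needs no separate key distribution. The tag covers the padded provenance data, so it is checked after XorIDA reconstruction and before anything is returned. Because the key travels with the shares, the HMAC detects corrupted or mismatched shares. Protection against a party that controls k shares comes from the independently stored SHA-256 commitment and from node authentication.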

APPLICATIONS MUST VERIFY NODES

This package does not authenticate the provenance nodes themselves. Your application must verify that the node claiming to hold share #1 is actually Associated Press, not an attacker. Use TLS, certificate pinning, or digital signatures from the node operator.

Section 08

Error Handling

ProvenanceSplit uses the Result<T, E> pattern with structured error codes and typed error classes.

Pattern: Result<T, E>

Both core functions return a Result union:

Result pattern
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };
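Because `ok` is a literal discriminant, TypeScript narrows the union after checking it. The sketch below shows this narrowing using hypothetical `ok`/`err` constructors and a `parseThreshold` helper. The helper reuses the INVALID_CONFIG rule from the error table. None of these helpers are part of the package.

```typescript
// Sketch: constructing and narrowing a Result<T, E>. ok/err/parseThreshold are hypothetical.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

interface SimpleError { code: string; message: string }

function parseThreshold(input: string, nodeCount: number): Result<number, SimpleError> {
  const k = Number(input);
  if (!Number.isInteger(k) || k < 2 || k > nodeCount) {
    return err({ code: 'INVALID_CONFIG', message: `threshold must be 2..${nodeCount}` });
  }
  return ok(k);
}

const r = parseThreshold('2', 3);
if (r.ok) {
  console.log(`k = ${r.value}`); // narrowed: r.value is a number here
} else {
  console.error(r.error.code); // narrowed: r.error is a SimpleError here
}
```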

Converting to Typed Errors

Error conversion for try/catch
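import {
  splitProvenance,
  toProvenanceError,
  ProvenanceCryptoError,
} from '@private.me/provenancesplit';
import type { MediaFile, ProvenanceConfig } from '@private.me/provenancesplit';

declare const media: MediaFile;         // built as in the splitting example
declare const config: ProvenanceConfig;

const result = await splitProvenance(media, config);

if (!result.ok) {
  // Convert the string error code into a typed error instance.
  const typedError = toProvenanceError(result.error.code);

  if (typedError instanceof ProvenanceCryptoError) {
    console.error('Cryptographic failure', typedError.message);
  } else {
    console.error('Other error', typedError.message);
  }
}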
Section 09

Benchmarks

ProvenanceSplit is optimized for high-throughput media processing. Benchmarks measure the splitting and verification pipelines.

~1 ms · Split time (1 KB metadata)
~1 ms · Verify time (3-of-3)
~30 µs · HMAC per share
0 · npm dependencies

Performance Characteristics

| Operation | Data Size | Time | Notes |
|---|---|---|---|
| Split | 256 bytes | ~0.5 ms | Pad + HMAC + XorIDA (3-of-3) |
| Split | 1 KB | ~1 ms | Pad + HMAC + XorIDA (3-of-3) |
| Split | 10 KB | ~10 ms | Pad + HMAC + XorIDA (3-of-3) |
| Verify | 256 bytes | ~0.5 ms | Reconstruct + HMAC verify (2-of-3) |
| Verify | 1 KB | ~1 ms | Reconstruct + HMAC verify (2-of-3) |

Performance scales linearly with metadata size. Most media provenance records (timestamps, device info, creator identity, content fingerprints) are under 1 KB.

Section 10

Limitations & Honest Assessment

ProvenanceSplit solves media provenance splitting, not all deepfake detection challenges.

Schema Validation Out of Scope

This package does not validate the content of provenanceData. If your provenance metadata uses C2PA manifests, OpenTimestamps, or custom JSON, your application must validate the schema. ProvenanceSplit only splits and reconstructs bytes.

Media Content Not Processed

ProvenanceSplit splits provenance metadata, not the media file itself (image, video, audio). The actual media content remains under your application's control. You must hash and commit to the media separately if needed.

Node Authentication Not Included

This package assumes you can identify and trust your verification nodes. If a node operator is compromised, they can provide false shares. Use TLS, certificate pinning, or digital signatures to authenticate node identity.

Share Transport Out of Scope

ProvenanceSplit produces shares in base64 format but does not dictate how you transport them. Use HTTPS, authenticated channels, and rate limiting when transmitting shares.

Time Authority Not Included

The capturedAt field is a timestamp, but this package does not verify it against a trusted time source. Use RFC 3161 Time Stamping Authority or similar if you need strong time binding.

WHAT THIS DOES SOLVE

ProvenanceSplit solves institutional forgery. If you distribute shares across AP, Reuters, and BBC, no single organization can forge a provenance record without getting caught. It also provides threshold resistance: an attacker needs to compromise k of n nodes simultaneously.

Advanced Topics

Deep Dives & Integration

For platform architects, security auditors, and teams integrating ProvenanceSplit with Xlink, enterprise authentication systems, or judicial evidence pipelines.

Section 11

Post-Quantum Security

ProvenanceSplit inherits quantum resistance from its cryptographic foundation.

Payload Layer: Information-Theoretic

XorIDA splitting is unconditionally quantum-safe. Any k-1 shares reveal zero information regardless of computing power, whether classical, quantum, or hypothetical. This is not a hypothesis; it is linear algebra over GF(2).

Transport Layer: Hybrid Post-Quantum

When shares are exchanged via Xlink (the PRIVATE.ME M2M identity layer), they travel in hybrid post-quantum envelopes:

  • Key Exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on
  • Signatures: Ed25519 + ML-DSA-65 (FIPS 204) — opt-in via postQuantumSig: true
RECOMMENDATION

Applications integrating ProvenanceSplit should create Xlink agents with postQuantumSig: true for full post-quantum protection across all three cryptographic layers: payload (XorIDA), transport (Xlink hybrid KEM), and authentication (hybrid signatures).

Section 12

Threat Model & Failure Modes

ProvenanceSplit is designed to resist specific threat vectors while acknowledging its assumptions and boundaries.

Threats Mitigated

All-Node Compromise (NOT MITIGATED): an attacker who compromises k nodes can forge or suppress provenance. Mitigation: distribute nodes across jurisdictions.

Assumptions

  • Node Availability: At least k of n nodes are reachable and responsive when you verify.
  • Node Authenticity: You can verify that a node claiming to hold share #1 is actually Associated Press, not an attacker.
  • Secure Channels: Share transport uses HTTPS or authenticated encryption.
  • Metadata Validity: The provenanceData field contains valid, schema-compliant metadata.

See docs/threat-model.md and docs/failure-modes.md in the package for comprehensive analysis.

Advanced

Platform Integration

ProvenanceSplit is part of the PRIVATE.ME platform for authenticated cryptographic interfaces.

Xail Email Client Integration

Journalists using Xail can attach media with provenance splits. When composing to a threshold-protected recipient list (e.g., AP, Reuters, BBC), Xail automatically routes each share via Xlink envelopes to the corresponding organization.

Enterprise Compliance

Regulated organizations (media companies, law enforcement, government agencies) use ProvenanceSplit to create audit trails of media authenticity, for example by splitting shares across an internal archive, an external notary, and a blockchain timestamp service.

C2PA Manifest Compatibility

ProvenanceSplit's provenanceData field can hold C2PA (Coalition for Content Provenance and Authenticity) manifests. Split the manifest across nodes, then reconstruct it and verify its C2PA signatures at consumption time.

Judicial Evidence Pipeline

Courts and legal teams use ProvenanceSplit to establish chain of custody for digital evidence. Provenance metadata (capture device, timestamp, handler identity, hash of original media) is split and reconstructed with court-appointed notaries as nodes.

Advanced

Codebase Statistics

ProvenanceSplit is a compact, focused implementation of media provenance splitting.

604 tests · 2 core functions · 7 error codes · 0 npm deps

Test Coverage

| Test File | Count | Coverage |
|---|---|---|
| provenance-splitter.test.ts | 374 tests | All splitting paths, error codes |
| abuse.test.ts | 230 tests | Malformed configs, tampering, adversarial input |
| Total | 604 tests | 100% line coverage |

Module Structure

Package exports
@private.me/provenancesplit/
  src/
    index.ts                 // Barrel export
    types.ts                 // MediaFile, ProvenanceConfig, etc.
    errors.ts                // Error classes & conversion
    provenance-splitter.ts   // splitProvenance()
    provenance-verifier.ts   // verifyProvenance()
    __tests__/
      provenance-splitter.test.ts  // 374 tests
      abuse.test.ts                // 230 tests

Security Documentation

All source files include JSDoc with @module, parameter descriptions, return types, and security notes. See docs/threat-model.md and docs/failure-modes.md for threat analysis.

Deployment Options

📦

SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

  • npm install @private.me/provenancesplit
  • TypeScript/JavaScript SDK
  • Full source access
  • Enterprise support available
🏢

On-Premise Upon Request

Enterprise CLI for compliance, air-gap, or data residency requirements.

  • Complete data sovereignty
  • Air-gap capable deployment
  • Custom SLA + dedicated support
  • Professional services included

Enterprise On-Premise Deployment

While ProvenanceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

  • Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
  • Air-gapped environments — SCIF, classified networks, offline operations
  • Data residency requirements — EU GDPR, China data laws, government mandates
  • Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

Contact sales for assessment and pricing →