PRIVATE.ME · Technical White Paper

ProvenanceSplit: Media Provenance & Deepfake Detection

As generative AI makes synthetic media indistinguishable from authentic content, proving the provenance of images, video, and audio becomes essential. ProvenanceSplit uses XorIDA (threshold sharing over GF(2)) to split media provenance metadata across independent verification nodes so no single node can forge or suppress a provenance record. Reconstruction requires a threshold of cooperating nodes, providing tamper-evident media authentication that resists both deepfake attacks and institutional compromise.

v0.1.0 · 604 tests passing · 2 core functions · 0 npm deps · Information-theoretic · C2PA-compatible

Section 01

Executive Summary

ProvenanceSplit solves a fundamental problem: how to prove that a photograph, video, or audio recording is authentic when deepfakes are indistinguishable from reality.

Two functions cover the complete workflow: splitProvenance() takes a media file with provenance metadata (capture device, timestamp, creator identity, content fingerprint) and splits it into threshold shares via XorIDA, distributing each share to an independent verification node. verifyProvenance() reconstructs the original metadata from a threshold subset of shares, verifies HMAC integrity, and returns the authenticated provenance record.

The key insight: no single verification node holds the complete provenance record, and no single node can forge a record. To read or forge a record, an attacker must compromise k of the n nodes at the same time. With k = 3 and shares held by Associated Press, Internet Archive, and a C2PA verifier, forging a record requires compromising three independent organizations, which yields a tamper-evident proof that is far harder to subvert through pressure on any one institution, including pressure from a state actor.

Each share carries an HMAC-SHA256 signature and key, enabling independent per-node verification before reconstruction. The original provenance data is committed via SHA-256 hash at split time, providing an immutable reference for future verification. Zero npm runtime dependencies. Built on XorIDA (information-theoretic security) and Web Crypto API.

Section 02

The Problem

Media provenance verification today relies on centralized authorities or single points of trust.

Deepfakes Are Undetectable

Generative models (DALL-E, Midjourney, Stable Diffusion, video synthesis, voice cloning) produce media indistinguishable from authentic recordings. A photograph of a politician endorsing a policy, a video of a CEO resigning, an audio clip of a whistleblower — all can be fabricated with consumer hardware. By 2026, AI-generated media may outnumber authentic media in digital pipelines.

Centralized Verification Fails

Today's solutions (C2PA, Adobe Content Authenticity Initiative, blockchain verification) typically anchor trust in a single verifier: a database, a certificate authority, a blockchain node, or a corporate server. If that authority is hacked, bribed, or coerced, the entire provenance system collapses. A state actor can compromise the verifier and either fabricate false provenance records or suppress authentic ones.

No Threshold Resistance

Traditional media authentication has no built-in redundancy. If a camera's signing key is stolen or a central archive is breached, provenance is lost. There's no mechanism to require k of n nodes to agree on a record's authenticity.

DEEPFAKE TIMELINE

2023: DALL-E 3, video synthesis reaches "maybe not real" quality. 2024: Voice cloning + lip-sync synthesis. 2025: Real-time deepfakes, indistinguishable in compressed video. 2026: Mainstream adoption in disinformation campaigns. Media provenance becomes a critical national security problem.

Section 03

The Solution

ProvenanceSplit distributes media provenance across threshold-protected verification nodes using XorIDA, an information-theoretic splitting scheme.

XorIDA: Information-Theoretic Security

XorIDA (XOR over Indexable Discrete Arrays) splits data into n shares such that any k shares can reconstruct the data, but k-1 shares reveal zero information about the original. This guarantee does not rest on computational hardness: recovering the data from k-1 shares is mathematically impossible, not merely expensive. Even a quantum computer, given k-1 shares, learns nothing.

The scheme works over GF(2) (binary field). Each share is the result of XOR operations on the original data and random bit sequences. Reconstruction uses linear algebra over GF(2) to recover the original.
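
To make the GF(2) idea concrete, the sketch below shows the simplest XOR-based scheme, in which all n shares are required to reconstruct. It is an illustration only and is not the XorIDA algorithm, which supports a threshold k below n. The function names xorSplit and xorCombine are hypothetical and do not come from the package.

```typescript
// Illustrative n-of-n XOR sharing over GF(2). NOT the XorIDA algorithm:
// this toy scheme needs every share, whereas XorIDA supports k <= n.
function xorSplit(data: Uint8Array, n: number): Uint8Array[] {
  if (!Number.isInteger(n) || n < 2) throw new RangeError('n must be an integer >= 2');
  const shares: Uint8Array[] = [];
  const last = Uint8Array.from(data); // running XOR of the data and all random shares
  for (let i = 0; i < n - 1; i++) {
    const r = new Uint8Array(data.length);
    // CSPRNG; accepts at most 65,536 bytes per call, so larger inputs need chunking.
    crypto.getRandomValues(r);
    for (let j = 0; j < data.length; j++) last[j] ^= r[j];
    shares.push(r);
  }
  shares.push(last);
  return shares;
}

// XOR-ing all shares together cancels the random masks and recovers the data.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  if (shares.length === 0) throw new RangeError('at least one share is required');
  const out = new Uint8Array(shares[0].length);
  for (const s of shares) {
    for (let j = 0; j < out.length; j++) out[j] ^= s[j];
  }
  return out;
}
```

In this toy scheme any n-1 shares are uniformly random bytes and carry no information about the input, which is the same property the paragraph above attributes to k-1 XorIDA shares.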

Threshold Resilience

You configure n verification nodes and a reconstruction threshold k. ProvenanceSplit splits provenance metadata into n shares. To reconstruct:

Collect any k shares from the n nodes
Verify HMAC integrity of each share (each share has its own HMAC key)
Reconstruct the original provenance data
Verify the SHA-256 provenance hash
Return authenticated provenance or error

An attacker must compromise at least k nodes. If k = 3 and your nodes are Associated Press, Internet Archive, and a government archive, you've made attacks dramatically harder. Each node can be operated by a different organization with different security budgets, jurisdictions, and incentives.
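
The relationship between n and k can be checked before any splitting happens. The following is a minimal sketch of such a check, assuming the rule stated in the error table (threshold of at least 2 and no greater than the node count). The function name validateConfig, the sketch interfaces, and the duplicate-id check are assumptions for illustration, not the package's implementation.

```typescript
// Hypothetical shapes mirroring the node and config fields used in this paper.
interface NodeSketch { id: string; name: string; role: string }
interface ConfigSketch { nodes: NodeSketch[]; threshold: number }

// Returns null when the config is acceptable, otherwise a reason string.
function validateConfig(config: ConfigSketch): string | null {
  const n = config.nodes.length;
  const k = config.threshold;
  if (!Number.isInteger(k) || k < 2) {
    return 'INVALID_CONFIG: threshold must be an integer >= 2';
  }
  if (k > n) {
    return 'INVALID_CONFIG: threshold exceeds node count';
  }
  // Assumption: node ids should be unique so each share maps to one operator.
  const ids = new Set(config.nodes.map((node) => node.id));
  if (ids.size !== n) {
    return 'INVALID_CONFIG: duplicate node id';
  }
  return null;
}
```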

HMAC Before Reconstruction

Critical security property: HMAC verification completes before reconstructed data is returned. If any share's HMAC fails, the operation returns an error immediately. This is fail-closed — corrupted or tampered shares are detected and rejected.

PROVENANCE COMMITMENT

At split time, ProvenanceSplit computes SHA-256(provenanceData) and returns it alongside the shares. This hash is independent of the splits and can be verified by any verifier in the future. If an attacker later tries to inject false provenance, the reconstructed data won't match the committed hash.
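
A commitment check of this kind can be sketched with the Web Crypto API. The helper names sha256Hex and matchesCommitment are hypothetical, and the hex encoding is an assumption: the package's actual hash string format (shown elsewhere as "sha256:...") is not specified here.

```typescript
// Compute SHA-256 over the bytes and render it as lowercase hex.
async function sha256Hex(data: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', data);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Compare reconstructed bytes against a previously committed hex digest.
// Assumption: the commitment is stored as bare hex without a prefix.
async function matchesCommitment(data: Uint8Array, committedHex: string): Promise<boolean> {
  return (await sha256Hex(data)) === committedHex.toLowerCase();
}
```

Because the commitment is computed over the original bytes, any change to the reconstructed data, even a single bit, produces a different digest and the comparison fails.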

Section 04

Architecture

ProvenanceSplit implements a two-phase architecture: splitting at capture time and verification at consumption time.

Phase 1: Splitting Pipeline

When a journalist captures a photograph as evidence, the camera (or post-processing tool) runs the splitting pipeline:

Validate: Check that mediaId, provenanceData, and config (nodes, threshold) are valid.
Pad: PKCS#7 pad the provenanceData to a block size derived from the node count.
HMAC: Generate HMAC-SHA256(padded data). Each share will carry the HMAC key and signature.
Split: XorIDA split the padded data into n shares. Each share is a Uint8Array.
Encode: Format each share with IDA5 header (patent-locked format). Base64 encode for transport.
Package: Return ProvenanceSplitResult containing shares, mediaId, and SHA-256 provenance hash.
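
The Pad step above can be illustrated with a standard PKCS#7 implementation. This is a sketch under assumptions: the block size is passed in as a parameter because the way the package derives it from the node count is not described here, and the names pkcs7Pad and pkcs7Unpad are hypothetical.

```typescript
// PKCS#7: append padLen bytes, each with value padLen (1..blockSize).
// A full block of padding is added when the input is already aligned.
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (!Number.isInteger(blockSize) || blockSize < 1 || blockSize > 255) {
    throw new RangeError('blockSize must be an integer in 1..255');
  }
  const padLen = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length);
  return out;
}

// Validate and strip PKCS#7 padding; throws on malformed input.
function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) {
    throw new Error('invalid padded length');
  }
  const padLen = padded[padded.length - 1];
  if (padLen < 1 || padLen > blockSize) throw new Error('invalid padding byte');
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('inconsistent padding');
  }
  return padded.slice(0, padded.length - padLen);
}
```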

Splitting example

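import { splitProvenance } from '@private.me/provenancesplit';
// Assumes the types are re-exported from the package root (see the Types table)
import type { MediaFile, ProvenanceConfig } from '@private.me/provenancesplit';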

const media: MediaFile = {
  mediaId: 'IMG-2026-0042',
  fileName: 'evidence-photo.jpg',
  mediaType: 'image',
  captureDevice: 'Canon EOS R5',
  capturedAt: Date.now(),
  capturedBy: 'reporter-alias-7',
  provenanceData: new Uint8Array([...]),
};

const config: ProvenanceConfig = {
  nodes: [
    { id: 'ap', name: 'Associated Press', role: 'news-agency' },
    { id: 'c2pa', name: 'C2PA Verifier', role: 'standards-body' },
    { id: 'archive', name: 'Internet Archive', role: 'archive' },
  ],
  threshold: 2,
};

const result = await splitProvenance(media, config);
if (result.ok) {
  // result.value.shares = [ ProvenanceShare, ... ]
  // result.value.provenanceHash = "sha256:..."
}

Phase 2: Verification & HMAC

When a news publisher or researcher wants to verify an image, they collect shares from nodes and run verification:

Validate Shares: Check that at least threshold shares are present, all with consistent metadata.
Decode: Parse IDA5 header from each share. Base64 decode the share data.
Reconstruct: XorIDA reconstruct from the first k shares.
HMAC Verify: Verify HMAC-SHA256 using the key from the first share. If HMAC fails, return error immediately (fail-closed).
Unpad: PKCS#7 unpad the reconstructed data.
Return: Return the original provenanceData. Caller can then verify SHA-256 hash against the committed value.
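
The Validate Shares step can be sketched as a pure function over share metadata. The error codes come from the error table in this paper; the interface ShareMetaSketch, the function name checkShares, and the extra checks on total and distinct indices are assumptions for illustration.

```typescript
// Subset of the ProvenanceShare fields relevant to pre-reconstruction checks.
interface ShareMetaSketch {
  mediaId: string;
  nodeId: string;
  index: number;
  total: number;
  threshold: number;
}

type ShareCheck =
  | { ok: true }
  | { ok: false; code: 'INSUFFICIENT_SHARES' | 'VERIFICATION_FAILED' };

function checkShares(shares: ShareMetaSketch[]): ShareCheck {
  if (shares.length === 0) return { ok: false, code: 'INSUFFICIENT_SHARES' };
  const first = shares[0];
  // All shares must describe the same split of the same media item.
  const consistent = shares.every(
    (s) => s.mediaId === first.mediaId && s.threshold === first.threshold && s.total === first.total,
  );
  if (!consistent) return { ok: false, code: 'VERIFICATION_FAILED' };
  // Assumption: duplicate indices do not count toward the threshold.
  const distinct = new Set(shares.map((s) => s.index)).size;
  if (distinct < first.threshold) return { ok: false, code: 'INSUFFICIENT_SHARES' };
  return { ok: true };
}
```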

Verification example

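import { verifyProvenance } from '@private.me/provenancesplit';

const result = await verifyProvenance(shares);
if (result.ok) {
  const reconstructedData = result.value;
  // Recompute the SHA-256 digest of the reconstructed bytes
  const hash = await crypto.subtle.digest(
    'SHA-256',
    reconstructedData
  );
  // Base64-encode the digest. Assumes committedProvenanceHash uses the same
  // encoding; adjust if the committed value carries a prefix or is hex.
  const hashB64 = btoa(String.fromCharCode(...new Uint8Array(hash)));
  const matches = hashB64 === committedProvenanceHash;
} else {
  console.error(result.error.message);
}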

Section 05

Use Cases

ProvenanceSplit is designed for high-stakes media authentication where a single point of failure is unacceptable.

JOURNALISM

Investigative Evidence

Investigative journalists photograph evidence at crime scenes or protests. ProvenanceSplit splits the photo metadata across AP, Reuters, and BBC archives, so that no single archive can forge or suppress the record.

High-stakes

LAW ENFORCEMENT

Body Camera / Chain of Custody

Police body cameras encode video timestamps and location. ProvenanceSplit splits provenance across independent notaries and legal archives, creating tamper-evident evidence for court.

Litigation-critical

GOVERNMENT

Classified Recordings

Government recordings of agreements or meetings are split across State Department, Library of Congress, and a neutral archive, preventing falsification of historical record.

Institutional

MEDIA NETWORKS

Breaking News Verification

When major news breaks, ProvenanceSplit timestamps and authenticates videos across CNN, BBC, and Reuters archives simultaneously, making it harder for deepfakes to enter the news cycle.

Real-time

Section 06

ACI Interface

ProvenanceSplit exports two core functions, comprehensive types, and structured error classes.

Core Functions

splitProvenance(media, config) → Promise<Result<ProvenanceSplitResult, ProvenanceError>>

Splits media provenance metadata into threshold shares via XorIDA. Each share includes HMAC key and signature. Returns ProvenanceSplitResult with shares array and SHA-256 provenance hash.

verifyProvenance(shares) → Promise<Result<Uint8Array, ProvenanceError>>

Verifies HMAC integrity and reconstructs provenance data from threshold shares. Returns reconstructed Uint8Array on success. Fails closed if HMAC verification fails.

Types

Type	Fields	Purpose
MediaFile	mediaId, fileName, mediaType, captureDevice, capturedAt, capturedBy, provenanceData	Input media metadata to split
ProvenanceConfig	nodes: ProvenanceNode[], threshold: number	Splitting configuration (n nodes, k threshold)
ProvenanceShare	mediaId, nodeId, index, total, threshold, data, hmac, hmacKey, originalSize	Single share produced by splitting
ProvenanceSplitResult	mediaId, shares, provenanceHash	Successful split result
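
For orientation, the table can be read as the following TypeScript interfaces. The field names come from the table and the examples in this paper; the field types are assumptions inferred from those examples (for instance, base64 strings for share data), and the package's actual declarations may differ.

```typescript
// Sketch of the exported shapes. Field types are assumed, not confirmed.
interface ProvenanceNode {
  id: string;
  name: string;
  role: string;
}

interface MediaFile {
  mediaId: string;
  fileName: string;
  mediaType: string;        // e.g. 'image' in the splitting example
  captureDevice: string;
  capturedAt: number;       // epoch milliseconds, as produced by Date.now()
  capturedBy: string;
  provenanceData: Uint8Array;
}

interface ProvenanceConfig {
  nodes: ProvenanceNode[];  // n verification nodes
  threshold: number;        // k shares required to reconstruct
}

interface ProvenanceShare {
  mediaId: string;
  nodeId: string;
  index: number;
  total: number;
  threshold: number;
  data: string;             // assumed base64, per the Encode step
  hmac: string;             // assumed encoding
  hmacKey: string;          // assumed encoding
  originalSize: number;
}

interface ProvenanceSplitResult {
  mediaId: string;
  shares: ProvenanceShare[];
  provenanceHash: string;
}
```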

Error Classes

All errors inherit from ProvenanceError. Use toProvenanceError(code) to convert string error codes to typed instances for try/catch.

Error Code	Class	When It Occurs
INVALID_CONFIG	ProvenanceConfigError	Threshold exceeds node count or threshold < 2
INVALID_MEDIA	ProvenanceMediaError	Missing or empty mediaId or provenanceData
SPLIT_FAILED	ProvenanceCryptoError	XorIDA split operation failed
HMAC_FAILURE	ProvenanceCryptoError	HMAC verification failed during verify
INSUFFICIENT_SHARES	ProvenanceCryptoError	Fewer than threshold shares provided
RECONSTRUCTION_FAILED	ProvenanceCryptoError	XorIDA reconstruction or unpadding failed
VERIFICATION_FAILED	ProvenanceCryptoError	Shares have mismatched mediaId or threshold
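
The mapping in the table can be sketched as a small class hierarchy. This is a hypothetical re-implementation for illustration: the class and function names match those listed above, but the constructor signatures and internals are assumptions.

```typescript
type ProvenanceErrorCode =
  | 'INVALID_CONFIG'
  | 'INVALID_MEDIA'
  | 'SPLIT_FAILED'
  | 'HMAC_FAILURE'
  | 'INSUFFICIENT_SHARES'
  | 'RECONSTRUCTION_FAILED'
  | 'VERIFICATION_FAILED';

class ProvenanceError extends Error {
  readonly code: ProvenanceErrorCode;
  constructor(code: ProvenanceErrorCode, message: string = code) {
    super(message);
    this.code = code;
    this.name = new.target.name; // reports the concrete subclass name
  }
}

class ProvenanceConfigError extends ProvenanceError {}
class ProvenanceMediaError extends ProvenanceError {}
class ProvenanceCryptoError extends ProvenanceError {}

// Map a string code to the class listed for it in the table.
function toProvenanceError(code: ProvenanceErrorCode): ProvenanceError {
  switch (code) {
    case 'INVALID_CONFIG':
      return new ProvenanceConfigError(code);
    case 'INVALID_MEDIA':
      return new ProvenanceMediaError(code);
    default:
      return new ProvenanceCryptoError(code); // all remaining codes are crypto errors
  }
}
```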

Section 07

Security

ProvenanceSplit is built on a foundation of cryptographic primitives and fail-closed design.

HMAC Before Reconstruction

The single most critical security property: HMAC verification ALWAYS completes before reconstructed data is returned to the caller. This is not optional, not deferred, not conditional. If HMAC fails, the operation returns an error. Corrupted or tampered data is never returned to the application.
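
The fail-closed behavior can be sketched with Web Crypto HMAC primitives. The function names, the 32-byte key length, and the return shape are assumptions; the point illustrated is only that data is released to the caller after, and only after, a successful HMAC check.

```typescript
type Released =
  | { ok: true; value: Uint8Array }
  | { ok: false; error: 'HMAC_FAILURE' };

// Random HMAC key from the platform CSPRNG (key length is an assumption).
function generateHmacKey(byteLength = 32): Uint8Array {
  const key = new Uint8Array(byteLength);
  crypto.getRandomValues(key);
  return key;
}

// Compute an HMAC-SHA256 tag over the data.
async function hmacTag(key: Uint8Array, data: Uint8Array): Promise<Uint8Array> {
  const k = await crypto.subtle.importKey('raw', key, { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']);
  return new Uint8Array(await crypto.subtle.sign('HMAC', k, data));
}

// Fail closed: the data is returned only when the tag verifies.
async function releaseIfAuthentic(key: Uint8Array, tag: Uint8Array, data: Uint8Array): Promise<Released> {
  const k = await crypto.subtle.importKey('raw', key, { name: 'HMAC', hash: 'SHA-256' }, false, ['verify']);
  const valid = await crypto.subtle.verify('HMAC', k, tag, data);
  return valid ? { ok: true, value: data } : { ok: false, error: 'HMAC_FAILURE' };
}
```

Using crypto.subtle.verify rather than comparing tags by hand leaves the comparison to the platform implementation.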

Information-Theoretic Security

Any subset of shares below the threshold reveals zero information about the original provenance data. This is not based on computational hardness (RSA, ECC, AES). It is based on linear algebra over GF(2). Even a quantum computer with unlimited power cannot extract information from k-1 shares.

No Random Misuse

All random bytes are generated via crypto.getRandomValues(). Never Math.random(). The HMAC key is cryptographically random.

Tamper Detection

SHA-256 provenance hash is computed at split time and remains independent of all shares. If an attacker later tries to reconstruct false provenance, the SHA-256 hash will not match, detecting the tampering.

Per-Node HMAC Keys

Each share carries its own HMAC key. This enables independent per-node verification before any reconstruction. A node can verify the integrity of its own share without needing other nodes.

APPLICATIONS MUST VERIFY NODES

This package does not authenticate the provenance nodes themselves. Your application must verify that the node claiming to hold share #1 is actually Associated Press, not an attacker. Use TLS, certificate pinning, or digital signatures from the node operator.

Section 08

Error Handling

ProvenanceSplit uses the Result<T, E> pattern with structured error codes and typed error classes.

Pattern: Result<T, E>

Every function returns a Result union:

Result pattern

type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };
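
Small helpers make the union easier to consume. The following is a sketch; unwrapOr and mapResult are hypothetical helper names and are not stated to be part of the package.

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Return the success value, or a fallback when the result is an error.
function unwrapOr<T, E>(result: Result<T, E>, fallback: T): T {
  return result.ok ? result.value : fallback;
}

// Transform the success value while passing errors through unchanged.
function mapResult<T, U, E>(result: Result<T, E>, fn: (value: T) => U): Result<U, E> {
  return result.ok ? { ok: true, value: fn(result.value) } : result;
}
```

Checking result.ok narrows the type, so the compiler only allows access to value on the success branch and to error on the failure branch.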

Converting to Typed Errors

Error conversion for try/catch

import {
  splitProvenance,
  toProvenanceError,
  ProvenanceCryptoError,
} from '@private.me/provenancesplit';

const result = await splitProvenance(media, config);

if (!result.ok) {
  const typedError = toProvenanceError(result.error.code);

  if (typedError instanceof ProvenanceCryptoError) {
    console.error('Cryptographic failure', typedError.message);
  } else {
    console.error('Other error', typedError.message);
  }
}

Section 09

Benchmarks

ProvenanceSplit is optimized for high-throughput media processing. Benchmarks measure the splitting and verification pipelines.

Split time (1KB metadata): ~1ms

Verify time (3-of-3): ~1ms

HMAC per share: ~30µs

npm dependencies: 0

Performance Characteristics

Operation	Data Size	Time	Notes
Split	256 bytes	~0.5ms	Pad + HMAC + XorIDA (3-of-3)
Split	1 KB	~1ms	Pad + HMAC + XorIDA (3-of-3)
Split	10 KB	~10ms	Pad + HMAC + XorIDA (3-of-3)
Verify	256 bytes	~0.5ms	Reconstruct + HMAC verify (2-of-3)
Verify	1 KB	~1ms	Reconstruct + HMAC verify (2-of-3)

Performance scales linearly with metadata size. Most media provenance records (timestamps, device info, creator identity, content fingerprints) are under 1 KB.
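
Readers who want to check the scaling claim on their own hardware could use a timing harness along these lines. This is a sketch only: medianMs and xorWorkload are hypothetical names, the XOR loop is a stand-in workload rather than the package's split function, and it is not the harness that produced the figures above.

```typescript
// Median wall-clock time of fn over several runs, in milliseconds.
function medianMs(fn: () => void, runs = 25): number {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Stand-in workload: XOR two buffers of the given size in place.
function xorWorkload(size: number): () => void {
  const a = new Uint8Array(size).fill(0xa5);
  const b = new Uint8Array(size).fill(0x5a);
  return () => {
    for (let i = 0; i < size; i++) a[i] ^= b[i];
  };
}

// Compare sizes matching the table rows (256 bytes, 1 KB, 10 KB).
for (const size of [256, 1024, 10240]) {
  console.log(`${size} bytes: ${medianMs(xorWorkload(size)).toFixed(4)} ms`);
}
```

Replacing xorWorkload with a call into the real split or verify function would give numbers comparable to the table, subject to runtime and hardware differences.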

Section 10

Limitations & Honest Assessment

ProvenanceSplit solves media provenance splitting, not all deepfake detection challenges.

Schema Validation Out of Scope

This package does not validate the content of provenanceData. If your provenance metadata uses C2PA manifests, OpenTimestamps, or custom JSON, your application must validate the schema. ProvenanceSplit only splits and reconstructs bytes.
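
Application-side validation of reconstructed bytes might look like the sketch below. It assumes a custom JSON layout with four fields drawn from the metadata described in the Executive Summary; the interface, field names, and function name are hypothetical, and C2PA manifests would need a dedicated validator instead.

```typescript
// Hypothetical JSON layout for a custom provenance record.
interface ProvenanceRecordSketch {
  captureDevice: string;
  capturedAt: number;
  capturedBy: string;
  contentFingerprint: string;
}

// Decode and validate reconstructed bytes; returns null on any failure.
function parseProvenance(bytes: Uint8Array): ProvenanceRecordSketch | null {
  let parsed: unknown;
  try {
    // fatal: true makes invalid UTF-8 throw instead of being silently replaced.
    parsed = JSON.parse(new TextDecoder('utf-8', { fatal: true }).decode(bytes));
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const r = parsed as Record<string, unknown>;
  if (
    typeof r.captureDevice !== 'string' ||
    typeof r.capturedAt !== 'number' ||
    typeof r.capturedBy !== 'string' ||
    typeof r.contentFingerprint !== 'string'
  ) {
    return null;
  }
  return {
    captureDevice: r.captureDevice,
    capturedAt: r.capturedAt,
    capturedBy: r.capturedBy,
    contentFingerprint: r.contentFingerprint,
  };
}
```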

Media Content Not Processed

ProvenanceSplit splits provenance metadata, not the media file itself (image, video, audio). The actual media content remains under your application's control. You must hash and commit to the media separately if needed.

Node Authentication Not Included

This package assumes you can identify and trust your verification nodes. If a node operator is compromised, they can provide false shares. Use TLS, certificate pinning, or digital signatures to authenticate node identity.

Share Transport Out of Scope

ProvenanceSplit produces shares in base64 format but does not dictate how you transport them. Use HTTPS, authenticated channels, and rate limiting when transmitting shares.

Time Authority Not Included

The capturedAt field is a timestamp, but this package does not verify it against a trusted time source. Use RFC 3161 Time Stamping Authority or similar if you need strong time binding.

WHAT THIS DOES SOLVE

ProvenanceSplit solves institutional forgery. If you distribute shares across AP, Reuters, and BBC, no single organization can forge a provenance record without getting caught. It solves threshold resistance — an attacker needs to compromise k-of-n nodes simultaneously.

Advanced Topics

Deep Dives & Integration

For platform architects, security auditors, and teams integrating ProvenanceSplit with xBind, enterprise authentication systems, or judicial evidence pipelines.

Section 11

Post-Quantum Security

ProvenanceSplit inherits quantum resistance from its cryptographic foundation.

Payload Layer: Information-Theoretic

XorIDA splitting is unconditionally quantum-safe. Any k-1 shares reveal zero information regardless of computing power, whether classical, quantum, or hypothetical. This is not a hypothesis; it is linear algebra over GF(2).

Transport Layer: Hybrid Post-Quantum

When shares are exchanged via xBind (the PRIVATE.ME M2M identity layer), they travel in hybrid post-quantum envelopes:

Key Exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on
Signatures: Ed25519 + ML-DSA-65 (FIPS 204) — opt-in via postQuantumSig: true

RECOMMENDATION

Applications integrating ProvenanceSplit should create xBind agents with postQuantumSig: true for full post-quantum protection across all three cryptographic layers: payload (XorIDA), transport (xBind hybrid KEM), and authentication (hybrid signatures).

Section 12

Threat Model & Failure Modes

ProvenanceSplit is designed to resist specific threat vectors while acknowledging its assumptions and boundaries.

Threats Mitigated

Threat	Status	Attack Scenario	Outcome	Notes
Single Node Compromise	MITIGATED	Attacker steals one node's database	Cannot reconstruct or forge	k-1 shares = zero information
Institutional Coercion	MITIGATED	Government coerces one archive	Cannot suppress proof without k-1 others	Requires conspiracy across k organizations
Share Tampering	DETECTED	Attacker modifies one share	HMAC verification catches it	Fail-closed, error returned
All-Node Compromise	NOT MITIGATED	Attacker compromises k nodes	Can forge/suppress provenance	Mitigation: distribute across jurisdictions

Assumptions

Node Availability: At least k of n nodes are reachable and responsive when you verify.
Node Authenticity: You can verify that a node claiming to hold share #1 is actually Associated Press, not an attacker.
Secure Channels: Share transport uses HTTPS or authenticated encryption.
Metadata Validity: The provenanceData field contains valid, schema-compliant metadata.

See docs/threat-model.md and docs/failure-modes.md in the package for comprehensive analysis.

Advanced

Platform Integration

ProvenanceSplit is part of the PRIVATE.ME platform for authenticated cryptographic interfaces.

Xail Email Client Integration

Journalists using Xail can attach media with provenance splits. When composing to a threshold-protected recipient list (e.g., AP, Reuters, BBC), Xail automatically routes each share via xBind envelopes to the corresponding organization.

Enterprise Compliance

Regulated organizations (media companies, law enforcement, government agencies) use ProvenanceSplit to create audit trails of media authenticity. Shares are split across internal archive, external notary, and blockchain timestamp service.

C2PA Manifest Compatibility

The ProvenanceSplit provenanceData field can hold C2PA (Coalition for Content Provenance and Authenticity) manifests. Split the manifest across nodes, then reconstruct and verify C2PA signatures at consumption time.

Judicial Evidence Pipeline

Courts and legal teams use ProvenanceSplit to establish chain of custody for digital evidence. Provenance metadata (capture device, timestamp, handler identity, hash of original media) is split and reconstructed with court-appointed notaries as nodes.

Advanced

Codebase Statistics

ProvenanceSplit is a compact, focused implementation of media provenance splitting.

Tests: 604

Core functions: 2

Error codes: 7

npm deps: 0

Test Coverage

Test Category	Count	Coverage
provenance-splitter.test.ts	374 tests	All splitting paths, error codes
abuse.test.ts	230 tests	Malformed configs, tampering, adversarial input
Total	604 tests	100% line coverage

Module Structure

Package exports

@private.me/provenancesplit/
  src/
    index.ts                 // Barrel export
    types.ts                 // MediaFile, ProvenanceConfig, etc.
    errors.ts                // Error classes & conversion
    provenance-splitter.ts   // splitProvenance()
    provenance-verifier.ts   // verifyProvenance()
    __tests__/
      provenance-splitter.test.ts  // 374 tests
      abuse.test.ts                // 230 tests

Security Documentation

All source files include JSDoc with @module, parameter descriptions, return types, and security notes. See docs/threat-model.md and docs/failure-modes.md for threat analysis.

Pricing

Coming Soon

Pricing details will be available when this ACI launches. Subscribe to updates to be notified.

Deployment Options

SaaS (Recommended)

Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.

Zero infrastructure setup
Automatic updates
99.9% uptime SLA
Enterprise SLA available

SDK Integration

Integrate into your applications with full programmatic control. Perfect for developers building provenance verification into their products.

npm install @private.me/provenancesplit
TypeScript/JavaScript SDK
Full source access
Use in your applications

On-Premise (Upon Request)

Enterprise CLI for compliance, air-gap, or data residency requirements.

Complete data sovereignty
Air-gap capable deployment
Custom SLA + dedicated support
Professional services included

Enterprise On-Premise Deployment

While ProvenanceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
Air-gapped environments — SCIF, classified networks, offline operations
Data residency requirements — EU GDPR, China data laws, government mandates
Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

