PRIVATE.ME · Technical White Paper

SourceSplit: Journalist Source Protection

Confidential sources are the backbone of investigative journalism. SourceSplit uses XorIDA (threshold sharing over GF(2)) to split source documents, identity records, and communications across independent press organizations so that no single newsroom holds the complete record. Reconstruction requires K-of-N organizations to cooperate, ensuring source protection survives the compromise of any individual institution.

v0.1.0 · 31 tests passing · 5 modules · 0 npm deps · Information-theoretic · Post-quantum ready
Section 01

Executive Summary

SourceSplit protects journalist sources from single-institution breach via cryptographic splitting. No single press organization holds the entire document — reconstruction requires threshold collaboration.

Two functions cover the complete workflow: splitSourceDocument() takes a source document, a list of press organizations, and a reconstruction threshold (K-of-N), then splits it into shares using XorIDA. Each organization receives exactly one share. reconstructSourceDocument() reassembles the original document from any K shares.

Every share carries cryptographic proof of integrity via HMAC-SHA256, verified before reconstruction. The package generates a SHA-256 hash of the original document, enabling post-reconstruction verification that no data was corrupted during transport or storage.

Documents are classified as source-identity, evidence, or communication to enable policy-driven handling. Source names are pseudonymous — the mapping between real identity and source alias lives outside this package, in journalistic workflows.

Zero npm dependencies. Runs on Node.js 20+, Tauri, modern browsers, Deno, and Bun via Web Crypto API. Dual ESM/CJS builds included.

Section 02

The Problem: Newsroom Vulnerability

A single newsroom breach — whether from a subpoena, a hack, or an insider threat — can expose every source who ever trusted that organization.

Subpoena risk. Governments demand source documents. A newsroom with centralized source records faces legal pressure to disclose. Protection depends entirely on the newsroom's legal team and the courts.

Cyberattack exposure. A breach of one newsroom's database exposes all sources in that system. Encryption at rest helps, but a sophisticated attacker inside the newsroom can access live keys.

Insider threat. A rogue employee, a disgruntled contractor, or a compromised admin account can exfiltrate source records in seconds.

Metadata leakage. Even if documents are encrypted, the metadata — when sources submitted material, from which regions, how many documents — can narrow down identity.

Threat Model | Single Newsroom | SourceSplit (K-of-N)
Newsroom #1 breached | All sources exposed | Sources safe (need K orgs)
Subpoena newsroom #1 | Forced disclosure | Newsroom #1 has 1 share only
Insider threat | Full compromise | Only 1 of the K required shares stolen
K-1 newsrooms compromised | All sources known | Information-theoretic security

* SourceSplit guarantees: no single-organization compromise exposes a source, no computational attack breaks the split, and shares below the threshold leak zero information.

The Gap in Journalism

Modern newsrooms use encrypted email, encrypted databases, and secure communication platforms. But they still centralize source records. The journalist in the newsroom is the only person who knows which encrypted email is from which source. The document is encrypted on disk, but the key is in memory on the journalist's workstation or in a secrets manager.

SourceSplit shifts the breach boundary. Compromise of one newsroom no longer means compromise of all sources. Reconstruction requires K independent organizations to agree. An attacker must breach K newsrooms simultaneously — a much higher cost.

Section 03

Real-World Use Cases

Three scenarios where SourceSplit protects sources against institutional risk.

Journalism: Whistleblower Network

Three international newsrooms (NYT, Guardian, Der Spiegel) collaborate on an investigation. A whistleblower's documents are split 2-of-3. No single newsroom can be forced or hacked into revealing the source.

splitSourceDocument() + 3 orgs, threshold 2

Privacy: Classified Leak Defense

A journalist receives classified documents from a government insider. SourceSplit splits them across 4 regional news outlets (US, UK, Germany, Canada) with 3-of-4 threshold. A single leaked share reveals nothing about the document.

4 organizations, 3-of-4 reconstruction

Investigation: Longevity Archive

A news organization archives sources long-term. Using SourceSplit with a 2-of-2 split between a primary archive and a backup ensures that neither archive alone can expose sources, even if physically stolen or digitally breached.

splitSourceDocument() + dual storage
Section 04

Architecture & Design

SourceSplit follows a strict pipeline: validate → pad → HMAC → split → hash → package. Every step runs in a fixed order and is auditable.

XorIDA Threshold Sharing

SourceSplit uses XorIDA (threshold sharing over GF(2)) to split documents. XorIDA is information-theoretically secure: any K-1 shares reveal zero information about the plaintext, regardless of computational power.

The splitting process:

  1. Input: document data (Uint8Array), organization count N, threshold K
  2. Find odd prime P ≥ N+1, block size = P-1
  3. Pad document data with PKCS7 to block size multiple
  4. XorIDA split produces N shares, each same size as padded input
  5. Each organization receives share[i]
  6. Reconstruction requires K or more shares
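
Steps 2 and 3 above can be sketched as follows. This is an illustrative sketch rather than the package's code: the helper names findOddPrime, pkcs7Pad, and pkcs7Unpad are hypothetical, and the sketch assumes the block size stays within the 1 to 255 byte range that PKCS7 can encode.

```typescript
// Hypothetical helpers illustrating steps 2-3; not the package's internals.

// Smallest odd prime P with P >= n + 1. Trial division is enough for small n.
function findOddPrime(n: number): number {
  const isPrime = (x: number): boolean => {
    if (x < 2) return false;
    for (let d = 2; d * d <= x; d++) {
      if (x % d === 0) return false;
    }
    return true;
  };
  let p = Math.max(3, n + 1);
  while (!isPrime(p)) p++;
  return p;
}

// PKCS7: append padLen bytes, each equal to padLen (always at least one byte).
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (!Number.isInteger(blockSize) || blockSize < 1 || blockSize > 255) {
    throw new RangeError('PKCS7 block size must be an integer in 1..255');
  }
  const padLen = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length);
  return out;
}

// Validate the trailing pad bytes and return the original data.
function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  const padLen = padded[padded.length - 1] ?? 0;
  if (padded.length % blockSize !== 0 || padLen < 1 || padLen > blockSize) {
    throw new Error('invalid PKCS7 padding');
  }
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('invalid PKCS7 padding');
  }
  return padded.slice(0, padded.length - padLen);
}

// N = 3 organizations: P = 5, so the block size is 4 bytes.
const exampleBlockSize = findOddPrime(3) - 1;
const examplePadded = pkcs7Pad(new TextEncoder().encode('hello'), exampleBlockSize);
```

A 5-byte input with a 4-byte block pads to 8 bytes. An input that is already a multiple of the block size gains one full block of padding, which keeps unpadding unambiguous.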

HMAC-SHA256 Integrity Verification

Before reconstruction, every share is verified with an HMAC. This prevents tampering during transport or storage.

The integrity pipeline:

  1. During split: generate HMAC-SHA256 over padded document
  2. Store HMAC signature and HMAC key with each share
  3. During reconstruction: verify HMAC before merging shares
  4. HMAC failure = reject reconstruction, raise HMAC_FAILED error
  5. If HMAC passes, combine shares via XorIDA
  6. Verify reconstructed document hash vs. original
HMAC BEFORE RECONSTRUCTION
HMAC verification happens before share data is processed. This prevents an attacker from injecting corrupted shares. Tampered shares are detected and rejected immediately.
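
The tag-then-verify steps can be sketched with the Web Crypto API. This is a minimal sketch under assumptions: the function names hmacTag and hmacVerify are hypothetical, the hex encoding follows the SourceShare fields described in this paper, and exactly which bytes the package authenticates is not specified here beyond "the padded document".

```typescript
// Hypothetical helpers; they show HMAC-SHA256 with Web Crypto, not the package's code.

const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');

const fromHex = (hex: string): Uint8Array =>
  Uint8Array.from(hex.match(/.{2}/g) ?? [], (h) => parseInt(h, 16));

// Compute an HMAC-SHA256 tag over `data` and return it as lowercase hex.
async function hmacTag(keyBytes: Uint8Array, data: Uint8Array): Promise<string> {
  const key = await crypto.subtle.importKey(
    'raw', keyBytes, { name: 'HMAC', hash: 'SHA-256' }, false, ['sign'],
  );
  const tag = await crypto.subtle.sign('HMAC', key, data);
  return toHex(new Uint8Array(tag));
}

// Check a hex tag. subtle.verify performs the comparison internally,
// so no hand-written byte-by-byte comparison is needed.
async function hmacVerify(
  keyBytes: Uint8Array,
  data: Uint8Array,
  tagHex: string,
): Promise<boolean> {
  const key = await crypto.subtle.importKey(
    'raw', keyBytes, { name: 'HMAC', hash: 'SHA-256' }, false, ['verify'],
  );
  return crypto.subtle.verify('HMAC', key, fromHex(tagHex), data);
}
```

A caller following the pipeline above would run hmacVerify first and return an HMAC_FAILED error without touching the share data if it resolves to false.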

Document Classification & Metadata

Each document carries a classification tag and a sourceAlias (pseudonym). These enable policy-driven handling:

Classification | Use Case | Policy Example
source-identity | Real name, location, contact of source | Never store in plaintext anywhere
evidence | Documents, media, records from source | May be published (redacted); long-term archive
communication | Messages, emails, conversations with source | Most sensitive; destroy after investigation

The source alias (e.g., deep-throat-7, whistleblower-2026-03) is a pseudonym tracked by the newsroom. SourceSplit does not enforce identity mapping — that policy lives in journalistic workflows.

Section 05

API Reference

SourceSplit exports two main functions and six types covering the complete workflow.

Core Functions

splitSourceDocument(document: SourceDocument, config: SourceConfig) → Promise<Result<SourceSplitResult, SourceError>>

Splits a source document across press organizations using XorIDA. Each organization receives one share. The document can only be reconstructed when threshold organizations combine their shares.

Pipeline: validate config → pad → HMAC → XorIDA split → SHA-256 hash → package shares

Returns: Success: documentId, shares[], documentHash. Failure: INVALID_CONFIG, SPLIT_FAILED, HMAC_FAILED.

reconstructSourceDocument(shares: SourceShare[]) → Promise<Result<Uint8Array, SourceError>>

Reconstructs the original document from K or more organization shares. Verifies HMAC integrity before processing, then uses XorIDA reconstruction to recover plaintext.

Pipeline: validate threshold → verify HMAC on each share → XorIDA reconstruct → unpad → integrity check → return document bytes

Returns: Success: Uint8Array of original document. Failure: INSUFFICIENT_SHARES, HMAC_FAILED, RECONSTRUCT_FAILED, INTEGRITY_FAILED.
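
The two calls might be combined as in the sketch below. To keep the example self-contained, the functions are passed in through a hypothetical SourceSplitApi interface instead of being imported, and the types are local stand-ins that follow this paper's signatures. The `value` field on the success branch of Result is an assumption; the paper only shows `ok` and `error`.

```typescript
// Local stand-ins for the package's types (assumed shapes, see lead-in).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
interface SourceError { code: string; message: string }
interface PressOrg { id: string; name: string; country: string }
interface SourceConfig { organizations: PressOrg[]; threshold: number }
interface SourceDocument {
  documentId: string;
  classification: 'source-identity' | 'evidence' | 'communication';
  data: Uint8Array;
  submittedAt: string;
  sourceAlias: string;
}
interface SourceShare {
  documentId: string; orgId: string; index: number; total: number;
  threshold: number; data: string; hmac: string; hmacKey: string;
  originalSize: number;
}
interface SourceSplitResult {
  documentId: string; shares: SourceShare[]; documentHash: string;
}

// Hypothetical wrapper so the two documented functions can be injected.
interface SourceSplitApi {
  splitSourceDocument(
    doc: SourceDocument, config: SourceConfig,
  ): Promise<Result<SourceSplitResult, SourceError>>;
  reconstructSourceDocument(
    shares: SourceShare[],
  ): Promise<Result<Uint8Array, SourceError>>;
}

// Split a document, then rebuild it from the shares of the first K organizations.
async function roundTrip(
  api: SourceSplitApi, doc: SourceDocument, config: SourceConfig,
): Promise<Uint8Array> {
  const split = await api.splitSourceDocument(doc, config);
  if (!split.ok) throw new Error(`split failed: ${split.error.code}`);

  // In production each share is sent to a different newsroom;
  // here they stay in memory for illustration.
  const byOrg = new Map<string, SourceShare>();
  for (const share of split.value.shares) byOrg.set(share.orgId, share);

  const quorum = config.organizations
    .slice(0, config.threshold)
    .map((org) => byOrg.get(org.id))
    .filter((s): s is SourceShare => s !== undefined);

  const rebuilt = await api.reconstructSourceDocument(quorum);
  if (!rebuilt.ok) throw new Error(`reconstruct failed: ${rebuilt.error.code}`);
  return rebuilt.value;
}
```

Because both functions return a Result instead of throwing, the caller decides at each step whether a failure code should abort, retry, or fall back to another organization's share.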

Types

Core types
interface SourceDocument {
  documentId: string;
  classification: 'source-identity' | 'evidence' | 'communication';
  data: Uint8Array;
  submittedAt: string;  // ISO 8601
  sourceAlias: string;    // Pseudonymous source name
}

interface PressOrg {
  id: string;                // Unique org identifier
  name: string;              // Display name
  country: string;            // ISO 3166-1 alpha-2
}

interface SourceConfig {
  organizations: PressOrg[];
  threshold: number;         // K-of-N
}

interface SourceShare {
  documentId: string;
  orgId: string;
  index: number;             // 0-based share index
  total: number;             // N total shares
  threshold: number;          // K threshold
  data: string;              // Base64-encoded share with IDA5 header
  hmac: string;              // HMAC-SHA256 (hex)
  hmacKey: string;            // HMAC key for verification (hex)
  originalSize: number;        // Bytes before padding
}

Error Codes

Code | Cause | Recovery
INVALID_CONFIG | Organizations < 2, threshold < 2, or threshold > N | Validate org count ≥ 2, threshold ∈ [2, N]
SPLIT_FAILED | XorIDA split operation failed (e.g., empty data) | Check document.data is not empty; retry
HMAC_FAILED | HMAC verification failed during reconstruction | Share may be tampered with; discard and use backup
RECONSTRUCT_FAILED | XorIDA reconstruction algorithm failed | Shares may be corrupted; verify transport/storage
INSUFFICIENT_SHARES | Fewer than threshold shares provided | Collect K or more shares from K organizations
INTEGRITY_FAILED | Reconstructed document hash ≠ original hash | Document corrupted post-reconstruction; discard
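
Callers may want to check a configuration before calling splitSourceDocument() so that INVALID_CONFIG never reaches production paths. The sketch below applies the documented rule (org count ≥ 2, threshold ∈ [2, N]). The function name checkConfig is hypothetical, and the duplicate-id check is an added assumption based on each organization receiving exactly one share.

```typescript
interface PressOrg { id: string; name: string; country: string }
interface SourceConfig { organizations: PressOrg[]; threshold: number }

// Returns null when the config is acceptable, otherwise a reason string
// that corresponds to an INVALID_CONFIG failure.
function checkConfig(config: SourceConfig): string | null {
  const n = config.organizations.length;
  const k = config.threshold;
  if (n < 2) return 'at least 2 organizations are required';
  if (!Number.isInteger(k) || k < 2) return 'threshold must be an integer >= 2';
  if (k > n) return `threshold ${k} exceeds organization count ${n}`;
  // Assumption: organization ids must be unique (one share per organization).
  const ids = new Set(config.organizations.map((o) => o.id));
  if (ids.size !== n) return 'organization ids must be unique';
  return null;
}
```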
Section 06

Security Guarantees

SourceSplit provides four core security properties: information-theoretic security, HMAC integrity, cryptographic randomness, and a post-quantum transport layer.

Information-Theoretic Security

XorIDA guarantees that any subset of K-1 shares reveals zero information about the original document, regardless of computational resources. An attacker with quantum computers cannot break this guarantee — it is information-theoretic, not computational.

This means: compromising K-1 organizations (no matter how badly) still leaves sources safe.
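
The idea behind this guarantee can be seen in a much simpler scheme. The sketch below is an N-of-N XOR split, not XorIDA's K-of-N construction, whose internals this paper does not describe. It only illustrates why a set of shares below the threshold can be statistically independent of the secret. The function names are hypothetical.

```typescript
// Toy N-of-N XOR sharing (an illustration, NOT XorIDA).

// Fill a buffer with cryptographically secure random bytes.
// getRandomValues accepts at most 65,536 bytes per call, so fill in chunks.
function randomBytes(length: number): Uint8Array {
  const out = new Uint8Array(length);
  for (let off = 0; off < length; off += 65536) {
    crypto.getRandomValues(out.subarray(off, off + 65536));
  }
  return out;
}

// n-1 shares are uniform random; the last is the secret XORed with all of them.
// Any n-1 shares are jointly uniform, so they carry no information about the secret.
function xorSplit(secret: Uint8Array, n: number): Uint8Array[] {
  if (!Number.isInteger(n) || n < 2) throw new RangeError('need at least 2 shares');
  const shares: Uint8Array[] = [];
  const last = secret.slice();
  for (let i = 0; i < n - 1; i++) {
    const r = randomBytes(secret.length);
    for (let j = 0; j < last.length; j++) {
      last[j] = (last[j] ?? 0) ^ (r[j] ?? 0);
    }
    shares.push(r);
  }
  shares.push(last);
  return shares;
}

// XOR of all n shares cancels the random masks and yields the secret.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  const first = shares[0];
  if (first === undefined) throw new RangeError('no shares provided');
  const out = new Uint8Array(first.length);
  for (const s of shares) {
    for (let j = 0; j < out.length; j++) {
      out[j] = (out[j] ?? 0) ^ (s[j] ?? 0);
    }
  }
  return out;
}
```

With three shares, an attacker holding any two sees only random-looking bytes: every candidate secret of the same length is equally consistent with what they hold, which is why more computing power does not help.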

HMAC Integrity & Tamper Detection

Each share carries an HMAC-SHA256 signature. Before reconstruction, every share is verified. If any share is tampered with (bit-flip, truncation, modification), HMAC verification fails and reconstruction is aborted.

HMAC verification happens before XorIDA processing, preventing corrupted data from entering the reconstruction pipeline.

Cryptographic Randomness

All random number generation uses crypto.getRandomValues(). Math.random() is never used. This ensures HMAC keys and any other random material are cryptographically secure.
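
A minimal sketch of generating a key this way is shown below. The helper name newHmacKeyHex and the 32-byte default length are assumptions; the paper states only that keys come from crypto.getRandomValues() and are stored as hex.

```typescript
// Hypothetical helper: a fresh random key, hex-encoded like SourceShare.hmacKey.
function newHmacKeyHex(byteLength = 32): string {
  const key = crypto.getRandomValues(new Uint8Array(byteLength));
  return Array.from(key, (b) => b.toString(16).padStart(2, '0')).join('');
}
```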

Post-Quantum Layer

SourceSplit's payload security (XorIDA splitting) is information-theoretically quantum-safe. When messages are exchanged via Xlink/Xchange (transport layer), hybrid post-quantum cryptography protects the envelope:

  • Key exchange: X25519 + ML-KEM-768 (FIPS 203) always-on
  • Signatures: Ed25519 + ML-DSA-65 (FIPS 204) opt-in
METADATA NOT PROTECTED
Document classification, submission timestamp, and source alias are stored in plaintext. Even without access to the document plaintext, this metadata could let an observer correlate shares with a source. Newsrooms should protect metadata separately (e.g., encrypt the entire SourceShare object at rest).
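
One way a newsroom might wrap a share at rest is sketched below using AES-256-GCM from the Web Crypto API. This wrapper is not part of SourceSplit: the names sealShare, openShare, and newStorageKey are hypothetical, the choice of AES-GCM is an assumption, and key storage is left to the newsroom.

```typescript
// Hypothetical at-rest wrapper; not part of the SourceSplit package.
interface SealedShare {
  iv: Uint8Array;          // 96-bit nonce, unique per encryption
  ciphertext: Uint8Array;  // encrypted JSON plus the GCM authentication tag
}

// Non-extractable AES-256-GCM key controlled by the newsroom.
function newStorageKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey(
    { name: 'AES-GCM', length: 256 }, false, ['encrypt', 'decrypt'],
  );
}

// Serialize the whole share object (metadata included) and encrypt it.
async function sealShare(key: CryptoKey, share: object): Promise<SealedShare> {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const plaintext = new TextEncoder().encode(JSON.stringify(share));
  const ct = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext);
  return { iv, ciphertext: new Uint8Array(ct) };
}

// Decrypt and parse; rejects if the ciphertext or nonce was modified.
async function openShare<T>(key: CryptoKey, sealed: SealedShare): Promise<T> {
  const pt = await crypto.subtle.decrypt(
    { name: 'AES-GCM', iv: sealed.iv }, key, sealed.ciphertext,
  );
  return JSON.parse(new TextDecoder().decode(pt)) as T;
}
```

Unlike the XorIDA payload, this wrapper relies on a computational assumption, so it protects metadata only as long as the storage key stays secret.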

Known Limitations

No document size obfuscation. Share sizes reveal approximate original document size, which could narrow down document identity across a corpus.

HMAC keys in shares. HMAC keys are stored alongside shares to enable verification. An organization holding a share can verify it but cannot prevent metadata-level attacks.

Single-use shares. Each document split produces new shares. Old shares are not reusable for different documents.

Section 07

Limitations & Roadmap

SourceSplit v0.1.0 provides core functionality. Future versions will address metadata protection and multi-document workflows.

Current Limitations (v0.1.0)

  • Metadata plaintext: Classification, timestamp, alias are unencrypted. Newsrooms should wrap the entire SourceShare in application-level encryption.
  • No document expiry: Shares persist indefinitely. Newsrooms must implement TTL/deletion policies in their storage layer.
  • No progress callbacks: Split and reconstruction each run to completion with no progress reporting. Large documents (>10MB) may block for several hundred milliseconds.
  • No key escrow: HMAC keys cannot be held separately. An organization holding a share can verify it independently.

Planned Enhancements (v0.2+)

  • Metadata encryption: Optional wrapper to encrypt classification, timestamp, and alias with a newsroom-controlled key.
  • Async API with progress events: onProgress callback for split/reconstruct to track long-running operations.
  • Document batching: Support for splitting multiple documents in one operation with shared threshold config.
  • Encrypted HMAC keys: HMAC key escrow via a separate key-holder service (not in package scope).

Out of Scope

SourceSplit is a cryptographic splitting library, not a full source protection system. Out of scope:

  • Source identity protection beyond document splitting (e.g., metadata stripping, anonymization)
  • Press organization authentication and access control
  • Secure communication channels between source and journalist (use Xlink/Signal/Wire)
  • Legal protections for journalistic sources (shield laws vary by jurisdiction)
  • Network transport security (use TLS 1.3+; application responsibility)
Section 08

Post-Quantum Security

SourceSplit's core payload security (XorIDA) is information-theoretically quantum-proof. When integrated with Xlink transport, hybrid PQ cryptography protects the entire system.

Payload Layer (XorIDA)

XorIDA threshold sharing is information-theoretically secure against all adversaries, classical and quantum. No computational assumption. K-1 shares reveal zero information regardless of computing power.

Transport Layer (with Xlink)

When SourceShare objects are exchanged via Xlink agents:

  • Key exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on hybrid KEM
  • Signatures (optional): Ed25519 + ML-DSA-65 (FIPS 204) — opt-in via agent config

Recommendation: Applications integrating SourceSplit should create Xlink agents with postQuantumSig: true for full post-quantum protection across all three layers:

  1. Payload: XorIDA (information-theoretic)
  2. Confidentiality: AES-256-GCM + hybrid KEM (computational PQ)
  3. Authenticity: Ed25519 + ML-DSA-65 (signature PQ)
Advanced Topics

Deep Dive: Implementation Details

Appendices covering error taxonomy, benchmarks, and codebase statistics for integrators and operators.

Appendix A

Error Taxonomy

SourceSplit uses a Result<T, E> pattern with discriminated error unions. Every error includes a code, message, and optional documentation link.
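
For readers unfamiliar with the pattern, a Result type can be sketched as below. The exact shape exported by @private.me/shared is not shown in this paper, so the `value` field and the helper names ok, err, pickThreshold, and summarize are assumptions made for illustration.

```typescript
// Assumed shape: the paper shows `ok` and `error`; `value` is a guess.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface SourceError {
  code: string;      // machine-readable, e.g. 'INVALID_CONFIG'
  message: string;
  docUrl?: string;   // optional documentation link
}

function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

// Toy operation that reports failure as a coded error instead of throwing.
function pickThreshold(orgCount: number, requested: number): Result<number, SourceError> {
  if (requested < 2 || requested > orgCount) {
    return err({
      code: 'INVALID_CONFIG',
      message: `threshold must be in [2, ${orgCount}]`,
    });
  }
  return ok(requested);
}

// Callers narrow on `ok`; the compiler then exposes only the matching branch.
function summarize(r: Result<number, SourceError>): string {
  return r.ok ? `threshold ${r.value}` : `${r.error.code}: ${r.error.message}`;
}
```

The discriminant makes it a compile-time error to read the success value without first checking that the call succeeded.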

Error Class Hierarchy

Error class structure
class SourceSplitError extends Error {
  code: string;        // Machine-readable code
  subCode?: string;      // Sub-code from colon-separated codes
  docUrl?: string;      // Doc link for error context
}

class SourceConfigError extends SourceSplitError {
  // Configuration validation errors
}

class SourceIntegrityError extends SourceSplitError {
  // Cryptographic integrity failures
}

class SourceReconstructError extends SourceSplitError {
  // Reconstruction failures
}

Use toSourceSplitError() to convert a Result error code into a typed error class for catch handlers:

Error conversion
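// SourceConfigError and SourceIntegrityError are assumed to be exported alongside the functions.
import {
  splitSourceDocument,
  toSourceSplitError,
  SourceConfigError,
  SourceIntegrityError,
} from '@private.me/sourcesplit';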

const result = await splitSourceDocument(doc, config);

if (!result.ok) {
  const error = toSourceSplitError(result.error.code);
  if (error instanceof SourceConfigError) {
    console.error('Config validation failed', error.message);
  } else if (error instanceof SourceIntegrityError) {
    console.error('Integrity check failed', error.message);
  }
}

Error Codes by Category

Code | Class | Description
INVALID_CONFIG | SourceConfigError | Org count < 2, threshold < 2, or threshold > N
SPLIT_FAILED | SourceIntegrityError | XorIDA split operation failed
HMAC_FAILED | SourceIntegrityError | HMAC verification failed on a share
RECONSTRUCT_FAILED | SourceReconstructError | XorIDA reconstruction failed
INSUFFICIENT_SHARES | SourceReconstructError | Fewer than threshold shares provided
INTEGRITY_FAILED | SourceIntegrityError | Document hash mismatch post-reconstruction
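
The mapping in this table could be implemented roughly as follows. This is a hypothetical sketch, not the package's toSourceSplitError(): the constructor signature, the lookup map, and the example sub-code are assumptions. Only the class names, the code-to-class assignments, and the colon-separated sub-code convention come from this paper.

```typescript
class SourceSplitError extends Error {
  readonly code: string;
  readonly subCode?: string;

  constructor(fullCode: string) {
    super(fullCode);
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    // "CODE:SUB" splits into the main code and an optional sub-code.
    const [code, subCode] = fullCode.split(':');
    this.code = code ?? fullCode;
    this.subCode = subCode;
  }
}

class SourceConfigError extends SourceSplitError {}
class SourceIntegrityError extends SourceSplitError {}
class SourceReconstructError extends SourceSplitError {}

type ErrorCtor = new (fullCode: string) => SourceSplitError;

// Code-to-class assignments taken from the table above.
const CLASS_BY_CODE = new Map<string, ErrorCtor>([
  ['INVALID_CONFIG', SourceConfigError],
  ['SPLIT_FAILED', SourceIntegrityError],
  ['HMAC_FAILED', SourceIntegrityError],
  ['RECONSTRUCT_FAILED', SourceReconstructError],
  ['INSUFFICIENT_SHARES', SourceReconstructError],
  ['INTEGRITY_FAILED', SourceIntegrityError],
]);

// Unknown codes fall back to the base class instead of throwing.
function toSourceSplitErrorSketch(fullCode: string): SourceSplitError {
  const base = fullCode.split(':')[0] ?? fullCode;
  const Ctor = CLASS_BY_CODE.get(base) ?? SourceSplitError;
  return new Ctor(fullCode);
}
```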
Appendix B

Performance Benchmarks

SourceSplit performance was measured on Node.js 22 LTS. For large documents, HMAC-SHA256 and the XorIDA split account for most of the time.

1KB split (3-of-3): 1.2ms
100KB split: 18ms
5MB split: 320ms

Breakdown: Where Time Goes

Step | 1KB | 100KB | 5MB
PKCS7 Padding | <0.1ms | <0.1ms | <1ms
HMAC-SHA256 | 0.3ms | 3ms | 150ms
XorIDA Split | 0.1ms | 5ms | 140ms
SHA-256 Hash | 0.2ms | 2ms | 25ms
Reconstruction (K-of-N) | 0.8ms | 8ms | 160ms

* Benchmarks: M1 Pro, Node 22, single-threaded. Actual times vary by CPU/memory/I/O.

Scaling Notes

HMAC and hashing scale linearly with document size (O(n)). XorIDA split is also O(n); in the breakdown above its cost at 5MB is comparable to HMAC-SHA256. Reconstruction time is dominated by HMAC verification and XorIDA processing.

For >10MB documents, consider:

  • Chunking: Split large documents into 1-10MB chunks and run SourceSplit on each chunk in parallel
  • Async processing: Use Web Workers (browser) or Worker Threads (Node) to avoid blocking
  • Progress callbacks: v0.2+ will support onProgress for long operations
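
The chunking suggestion might look like the sketch below. The helper names chunkBytes, joinChunks, and mapChunks are hypothetical, and the per-chunk worker is supplied by the caller (for example, a function that calls splitSourceDocument() on each chunk).

```typescript
// Hypothetical chunking helpers; not part of the package.

// Cut data into fixed-size views (the last chunk may be shorter). No bytes are copied.
function chunkBytes(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < data.length; off += chunkSize) {
    chunks.push(data.subarray(off, off + chunkSize));
  }
  return chunks;
}

// Concatenate chunks back into one buffer, in order.
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let off = 0;
  for (const c of chunks) {
    out.set(c, off);
    off += c.length;
  }
  return out;
}

// Run an async worker over every chunk; results keep chunk order.
async function mapChunks<T>(
  data: Uint8Array,
  chunkSize: number,
  worker: (chunk: Uint8Array, index: number) => Promise<T>,
): Promise<T[]> {
  return Promise.all(chunkBytes(data, chunkSize).map((c, i) => worker(c, i)));
}
```

Promise.all interleaves the chunk operations on one thread. CPU-bound work only runs truly in parallel when each chunk is handed to a Web Worker or Worker Thread, as the next bullet suggests.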
Appendix C

Codebase Statistics

SourceSplit is a compact, focused package with 100% test coverage on cryptographic operations.

Total lines of TypeScript: ~750
Core modules: 5
Test cases: 31
npm dependencies: 0

Module Breakdown

Module | Purpose | Tests
source-splitter.ts | Core split pipeline (validate → pad → HMAC → XorIDA → hash) | 12
source-reconstructor.ts | Core reconstruct pipeline (HMAC → XorIDA → unpad → verify) | 10
types.ts | TypeScript interfaces (SourceDocument, SourceShare, etc.) |
errors.ts | Error class hierarchy and conversion |
index.ts | Barrel export (public API) |

Test Coverage

Category | Test Count | Coverage
Config validation | 4 | 100%
XorIDA splitting | 8 | 100%
HMAC integrity | 6 | 100%
Document reconstruction | 7 | 100%
Abuse cases (data tampering, insufficient shares) | 6 | 100%

Dependencies

Runtime: 0 npm packages

  • @private.me/crypto (monorepo peer, XorIDA + HMAC + padding)
  • @private.me/shared (monorepo peer, Result pattern + encoding)
  • Web Crypto API (builtin, SHA-256 + cryptographic randomness)

Build/Dev: TypeScript, Vitest (test framework only)

Deployment Options

SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

  • npm install @private.me/sourcesplit
  • TypeScript/JavaScript SDK
  • Full source access
  • Enterprise support available

On-Premise Upon Request

Enterprise CLI for compliance, air-gap, or data residency requirements.

  • Complete data sovereignty
  • Air-gap capable deployment
  • Custom SLA + dedicated support
  • Professional services included

Enterprise On-Premise Deployment

While SourceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

  • Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
  • Air-gapped environments — SCIF, classified networks, offline operations
  • Data residency requirements — EU GDPR, China data laws, government mandates
  • Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.
