PRIVATE.ME · Technical White Paper

SourceSplit: Journalist Source Protection

Confidential sources are the backbone of investigative journalism. SourceSplit uses XorIDA (threshold sharing over GF(2)) to split source documents, identity records, and communications across independent press organizations so that no single newsroom holds the complete record. Reconstruction requires K-of-N organizations to cooperate, ensuring source protection survives the compromise of any individual institution.

v0.1.0 · 31 tests passing · 5 modules · 0 npm deps · Information-theoretic · Post-quantum ready

Section 01

Executive Summary

SourceSplit protects journalist sources from single-institution breach via cryptographic splitting. No single press organization holds the entire document — reconstruction requires threshold collaboration.

Two functions cover the complete workflow. splitSourceDocument() takes a source document and a configuration listing the participating press organizations and the reconstruction threshold (K-of-N), then splits the document into shares using XorIDA. Each organization receives exactly one share. reconstructSourceDocument() reassembles the original document from any K shares.

Every share carries an HMAC-SHA256 integrity tag, verified before reconstruction. The package also records a SHA-256 hash of the original document, enabling post-reconstruction verification that no data was corrupted during transport or storage.

Documents are classified as source-identity, evidence, or communication to enable policy-driven handling. Source names are pseudonymous — the mapping between real identity and source alias lives outside this package, in journalistic workflows.

Zero npm dependencies. Runs on Node.js 20+, Tauri, modern browsers, Deno, and Bun via Web Crypto API. Dual ESM/CJS builds included.

Section 02

The Problem: Newsroom Vulnerability

A single newsroom breach — whether from a subpoena, a hack, or an insider threat — can expose every source who ever trusted that organization.

Subpoena risk. Governments can compel newsrooms to hand over source documents. A newsroom with centralized source records faces legal pressure to disclose, and protection depends entirely on the newsroom's legal team and the courts.

Cyberattack exposure. A breach of one newsroom's database exposes all sources in that system. Encryption at rest helps, but an attacker who gains a foothold inside the newsroom's systems can reach the live keys.

Insider threat. A rogue employee, a disgruntled contractor, or a compromised admin account can exfiltrate source records in seconds.

Metadata leakage. Even if documents are encrypted, the metadata — when sources submitted material, from which regions, how many documents — can narrow down identity.

Threat Model	Single Newsroom	SourceSplit (K-of-N)
Newsroom #1 breached	All sources exposed	Sources safe (attacker needs K orgs)
Subpoena to newsroom #1	Forced disclosure	Newsroom #1 holds only 1 share
Insider threat	Full compromise	Only 1 share stolen (K needed)
K-1 newsrooms compromised	All sources exposed	Zero information leaked (information-theoretic)

* SourceSplit guarantees: no single organization's compromise exposes a source, confidentiality of the split document does not rest on computational hardness, and fewer than K shares leak zero information about it.

The Gap in Journalism

Modern newsrooms use encrypted email, encrypted databases, and secure communication platforms. But they still centralize source records. The journalist in the newsroom is the only person who knows which encrypted email is from which source. The document is encrypted on disk, but the key is in memory on the journalist's workstation or in a secrets manager.

SourceSplit shifts the breach boundary. Compromise of one newsroom no longer means compromise of all sources. Reconstruction requires K independent organizations to cooperate, so an attacker must breach K newsrooms rather than one, a much higher cost.

Section 03

Real-World Use Cases

Three scenarios where SourceSplit protects sources against institutional risk.

Journalism

Whistleblower Network

Three international newsrooms (NYT, Guardian, Der Spiegel) collaborate on an investigation. A whistleblower's documents are split 2-of-3. No single newsroom can be forced or hacked into revealing the source.

splitSourceDocument() + 3 orgs, threshold 2

Privacy

Classified Leak Defense

A journalist receives classified documents from a government insider. SourceSplit splits them across four news outlets in different countries (US, UK, Germany, Canada) with a 3-of-4 threshold. Even two leaked shares together reveal nothing about the document.

4 organizations, 3-of-4 reconstruction

Investigation

Longevity Archive

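A news organization archives sources long-term. A 2-of-2 split between a primary archive and a second archive ensures that neither archive alone can expose sources, even if physically stolen or digitally breached. The trade-off is that losing either archive makes the document unrecoverable, so the second archive is not a backup in the redundancy sense.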

splitSourceDocument() + dual storage
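Each scenario above reduces to two numbers: how many shares can be lost or withheld before the document becomes unrecoverable (N-K), and how many organizations an attacker must breach to learn anything (K). The sketch below computes both for the three configurations; Scenario and profile are hypothetical names for illustration, not part of the SourceSplit API.

```typescript
// Hypothetical model of a K-of-N deployment (not a SourceSplit export).
interface Scenario {
  name: string;
  n: number; // organizations, one share each
  k: number; // reconstruction threshold
}

interface ScenarioProfile {
  name: string;
  sharesThatCanBeLost: number;    // N - K: shares that may be destroyed or withheld
  orgsAttackerMustBreach: number; // K: shares needed to recover anything
}

function profile(s: Scenario): ScenarioProfile {
  return {
    name: s.name,
    sharesThatCanBeLost: s.n - s.k,
    orgsAttackerMustBreach: s.k,
  };
}

const scenarios: Scenario[] = [
  { name: "Whistleblower network", n: 3, k: 2 },
  { name: "Classified leak defense", n: 4, k: 3 },
  { name: "Longevity archive", n: 2, k: 2 },
];

for (const p of scenarios.map(profile)) {
  console.log(
    `${p.name}: tolerates ${p.sharesThatCanBeLost} lost share(s); attacker must breach ${p.orgsAttackerMustBreach}`,
  );
}
```

The 2-of-2 archive tolerates zero lost shares, which is the price of requiring both archives for any disclosure.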

Section 04

Architecture & Design

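SourceSplit follows a strict pipeline: validate → pad → HMAC → split → hash → package. The stage order is fixed and auditable; the HMAC key and the randomness used by the split are fresh on every call, so splitting the same document twice yields different shares.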

XorIDA Threshold Sharing

SourceSplit uses XorIDA (threshold sharing over GF(2)) to split documents. XorIDA is information-theoretically secure: any K-1 shares reveal zero information about the plaintext, regardless of computational power.

The splitting process:

Input: document data (Uint8Array), organization count N, threshold K
Find odd prime P ≥ N+1, block size = P-1
Pad document data with PKCS7 to block size multiple
XorIDA split produces N shares, each same size as padded input
Organization i receives share[i]
Reconstruction requires K or more shares
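Steps 2 and 3 can be sketched as follows, assuming that "find odd prime P ≥ N+1" means the smallest such prime. blockSizeFor, pkcs7Pad, and pkcs7Unpad are hypothetical helpers; the package's internal choice of P and its padding code may differ. PKCS7 always appends between 1 and blockSize bytes, each equal to the pad length, so the padding can be removed unambiguously; it also caps the block size at 255.

```typescript
// Trial division is plenty for the small N used in practice.
function isPrime(x: number): boolean {
  if (x < 2) return false;
  for (let d = 2; d * d <= x; d++) if (x % d === 0) return false;
  return true;
}

// Smallest prime >= max(3, min); any prime >= 3 is odd.
function smallestOddPrimeAtLeast(min: number): number {
  let p = Math.max(3, min);
  while (!isPrime(p)) p++;
  return p;
}

// Assumption: block size = P - 1 with P the smallest odd prime >= N + 1.
function blockSizeFor(n: number): number {
  return smallestOddPrimeAtLeast(n + 1) - 1;
}

// PKCS7: append k bytes of value k (1 <= k <= blockSize).
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (blockSize < 1 || blockSize > 255) throw new RangeError("PKCS7 block size must be 1..255");
  const k = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + k);
  out.set(data);
  out.fill(k, data.length);
  return out;
}

function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) throw new Error("invalid padding");
  const k = padded[padded.length - 1];
  if (k < 1 || k > blockSize) throw new Error("invalid padding");
  for (let i = padded.length - k; i < padded.length; i++) {
    if (padded[i] !== k) throw new Error("invalid padding");
  }
  return padded.subarray(0, padded.length - k);
}
```

For N = 3 organizations this gives P = 5 and a 4-byte block, so a 5-byte document pads to 8 bytes.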

HMAC-SHA256 Integrity Verification

Before reconstruction, every share is verified with an HMAC. This detects tampering during transport or storage.

The integrity pipeline:

During split: generate HMAC-SHA256 over padded document
Store HMAC signature and HMAC key with each share
During reconstruction: verify HMAC before merging shares
HMAC failure = reject reconstruction, raise HMAC_FAILED error
If HMAC passes, combine shares via XorIDA
Verify reconstructed document hash vs. original
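The HMAC and hash steps above map directly onto Node's built-in crypto module. A minimal sketch, assuming hex encoding as in the SourceShare interface and a 256-bit HMAC key; tagPadded, verifyPadded, and documentHash are hypothetical names. The functions take the padded bytes as input and do not model where in the pipeline the package runs the check.

```typescript
import { createHash, createHmac, timingSafeEqual, webcrypto } from "node:crypto";

interface IntegrityTag {
  hmac: string;    // HMAC-SHA256 over the padded document (hex)
  hmacKey: string; // per-split random key (hex), stored with each share
}

function tagPadded(padded: Uint8Array): IntegrityTag {
  const key = webcrypto.getRandomValues(new Uint8Array(32)); // fresh 256-bit key
  const hmac = createHmac("sha256", key).update(padded).digest("hex");
  return { hmac, hmacKey: Buffer.from(key).toString("hex") };
}

function verifyPadded(padded: Uint8Array, tag: IntegrityTag): boolean {
  const expected = Buffer.from(tag.hmac, "hex");
  const actual = createHmac("sha256", Buffer.from(tag.hmacKey, "hex")).update(padded).digest();
  // Constant-time comparison; timingSafeEqual throws if lengths differ.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

// SHA-256 of the original (unpadded) document, for the final integrity check.
function documentHash(original: Uint8Array): string {
  return createHash("sha256").update(original).digest("hex");
}
```

The comparison uses timingSafeEqual so that verification time does not reveal how many leading bytes of a forged tag were correct.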

HMAC BEFORE RECONSTRUCTION

HMAC verification happens before share data is processed, so a corrupted or injected share is detected and rejected before it reaches XorIDA reconstruction.

Document Classification & Metadata

Each document carries a classification tag and a sourceAlias (pseudonym). These enable policy-driven handling:

Classification	Use Case	Policy Example
source-identity	Real name, location, contact of source	Never store in plaintext anywhere
evidence	Documents, media, records from source	May be published (redacted); long-term archive
communication	Messages, emails, conversations with source	Most sensitive; destroy after investigation

The source alias (e.g., deep-throat-7, whistleblower-2026-03) is a pseudonym tracked by the newsroom. SourceSplit does not enforce identity mapping — that policy lives in journalistic workflows.
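Because classification is a closed string union in the SourceDocument type, a newsroom tool can map it to handling rules with an exhaustively typed record: adding a fourth classification becomes a compile error until a policy is supplied. A sketch under that assumption; HandlingPolicy, POLICY, and canPublish are hypothetical, and the entries only restate the examples in the table.

```typescript
type Classification = "source-identity" | "evidence" | "communication";

// Hypothetical policy shape; each field restates one cell of the table above.
interface HandlingPolicy {
  plaintextAtRest?: "forbidden";
  publication?: "redacted-only";
  retention?: "long-term-archive" | "destroy-after-investigation";
}

// Record<Classification, ...> requires an entry for every classification.
const POLICY: Record<Classification, HandlingPolicy> = {
  "source-identity": { plaintextAtRest: "forbidden" },
  evidence: { publication: "redacted-only", retention: "long-term-archive" },
  communication: { retention: "destroy-after-investigation" },
};

// Only classifications with an explicit publication rule may be published.
function canPublish(c: Classification): boolean {
  return POLICY[c].publication !== undefined;
}
```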

Section 05

API Reference

SourceSplit exports two main functions and six types covering the complete workflow.

Core Functions

splitSourceDocument(document: SourceDocument, config: SourceConfig) → Promise<Result<SourceSplitResult, SourceError>>

Splits a source document across press organizations using XorIDA. Each organization receives one share. The document can only be reconstructed when at least K organizations (the threshold) combine their shares.

Pipeline: validate config → pad → HMAC → XorIDA split → SHA-256 hash → package shares

Returns: Success: documentId, shares[], documentHash. Failure: INVALID_CONFIG, SPLIT_FAILED, HMAC_FAILED.

reconstructSourceDocument(shares: SourceShare[]) → Promise<Result<Uint8Array, SourceError>>

Reconstructs the original document from K or more organization shares. Verifies HMAC integrity before processing, then uses XorIDA reconstruction to recover plaintext.

Pipeline: validate threshold → verify HMAC on each share → XorIDA reconstruct → unpad → integrity check → return document bytes

Returns: Success: Uint8Array of original document. Failure: INSUFFICIENT_SHARES, HMAC_FAILED, RECONSTRUCT_FAILED, INTEGRITY_FAILED.
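Before any cryptographic work, the "validate threshold" stage must at least confirm that the shares agree on documentId, N, and K, and that enough distinct indices are present. A hypothetical pre-check along those lines; precheckShares and ShareHeader are illustrative names, and only the count failure corresponds to a documented error code (INSUFFICIENT_SHARES).

```typescript
// Local subset of the SourceShare fields this check needs.
interface ShareHeader {
  documentId: string;
  orgId: string;
  index: number;     // 0-based share index
  total: number;     // N
  threshold: number; // K
}

type Precheck = { ok: true; distinct: number } | { ok: false; reason: string };

// Hypothetical pre-check run before any HMAC or XorIDA work.
function precheckShares(shares: ShareHeader[]): Precheck {
  if (shares.length === 0) return { ok: false, reason: "no shares provided" };
  const { documentId, total, threshold } = shares[0];
  const indices = new Set<number>();
  for (const s of shares) {
    if (s.documentId !== documentId) return { ok: false, reason: "shares belong to different documents" };
    if (s.total !== total || s.threshold !== threshold) return { ok: false, reason: "shares disagree on K or N" };
    if (!Number.isInteger(s.index) || s.index < 0 || s.index >= total) {
      return { ok: false, reason: `share index ${s.index} out of range` };
    }
    indices.add(s.index);
  }
  // A duplicated share adds no information, so only distinct indices count toward K.
  if (indices.size < threshold) {
    return { ok: false, reason: `INSUFFICIENT_SHARES: ${indices.size} distinct of ${threshold} required` };
  }
  return { ok: true, distinct: indices.size };
}
```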

Types

Core types

interface SourceDocument {
  documentId: string;
  classification: 'source-identity' | 'evidence' | 'communication';
  data: Uint8Array;
  submittedAt: string;   // ISO 8601
  sourceAlias: string;   // Pseudonymous source name
}

interface PressOrg {
  id: string;            // Unique org identifier
  name: string;          // Display name
  country: string;       // ISO 3166-1 alpha-2
}

interface SourceConfig {
  organizations: PressOrg[];
  threshold: number;     // K-of-N
}

interface SourceShare {
  documentId: string;
  orgId: string;
  index: number;         // 0-based share index
  total: number;         // N total shares
  threshold: number;     // K threshold
  data: string;          // Base64-encoded share with IDA5 header
  hmac: string;          // HMAC-SHA256 (hex)
  hmacKey: string;       // HMAC key for verification (hex)
  originalSize: number;  // Bytes before padding
}
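Shares are plain objects that travel between organizations, typically as JSON, so a receiver should check their shape before passing anything to reconstructSourceDocument(). A hypothetical runtime guard for the SourceShare interface; it checks structure only (the package's own HMAC and hash checks still apply), and it does not parse the IDA5 header inside data, whose format is not described here.

```typescript
// Local copy of the SourceShare interface from the Types listing above.
interface SourceShare {
  documentId: string;
  orgId: string;
  index: number;
  total: number;
  threshold: number;
  data: string;
  hmac: string;
  hmacKey: string;
  originalSize: number;
}

const HEX = /^(?:[0-9a-f]{2})+$/i;
const BASE64 = /^[A-Za-z0-9+\/]*={0,2}$/;

// Hypothetical structural guard for shares received as JSON.
function isSourceShare(value: unknown): value is SourceShare {
  if (typeof value !== "object" || value === null) return false;
  const { documentId, orgId, index, total, threshold, data, hmac, hmacKey, originalSize } =
    value as Record<string, unknown>;
  return (
    typeof documentId === "string" && documentId.length > 0 &&
    typeof orgId === "string" && orgId.length > 0 &&
    typeof total === "number" && Number.isInteger(total) && total >= 2 &&
    typeof threshold === "number" && Number.isInteger(threshold) && threshold >= 2 && threshold <= total &&
    typeof index === "number" && Number.isInteger(index) && index >= 0 && index < total &&
    typeof originalSize === "number" && Number.isInteger(originalSize) && originalSize >= 0 &&
    typeof data === "string" && data.length % 4 === 0 && BASE64.test(data) &&
    typeof hmac === "string" && hmac.length === 64 && HEX.test(hmac) && // SHA-256 = 32 bytes
    typeof hmacKey === "string" && HEX.test(hmacKey)
  );
}
```

The K and N bounds mirror the INVALID_CONFIG rules, so a share claiming an impossible configuration is rejected before any decoding.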

Error Codes

Code	Cause	Recovery
INVALID_CONFIG	Organizations < 2, threshold < 2, or threshold > N	Validate org count ≥ 2, threshold ∈ [2, N]
SPLIT_FAILED	XorIDA split operation failed (e.g., empty data)	Check document.data is not empty; retry
HMAC_FAILED	HMAC verification failed during reconstruction	Share may be tampered with; discard and use backup
RECONSTRUCT_FAILED	XorIDA reconstruction algorithm failed	Shares may be corrupted; verify transport/storage
INSUFFICIENT_SHARES	Fewer than threshold shares provided	Collect K or more shares from K organizations
INTEGRITY_FAILED	Reconstructed document hash ≠ original hash	Reconstructed bytes do not match the original; discard and re-collect shares
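The INVALID_CONFIG rules in the first row can be checked before calling splitSourceDocument(), for example to validate a configuration form in a newsroom tool. A sketch under stated assumptions: validateConfig and ConfigError are hypothetical, the { ok, value } / { ok, error } Result shape is assumed rather than taken from the package, and the duplicate-id check is an extra rule not listed in the table.

```typescript
interface PressOrg { id: string; name: string; country: string }
interface SourceConfig { organizations: PressOrg[]; threshold: number }

// Assumed Result shape; the package's success field name may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
interface ConfigError { code: "INVALID_CONFIG"; message: string }

function validateConfig(config: SourceConfig): Result<SourceConfig, ConfigError> {
  const fail = (message: string): Result<SourceConfig, ConfigError> => ({
    ok: false,
    error: { code: "INVALID_CONFIG", message },
  });
  const n = config.organizations.length;
  const k = config.threshold;
  if (n < 2) return fail(`need at least 2 organizations, got ${n}`);
  if (!Number.isInteger(k) || k < 2 || k > n) return fail(`threshold must be an integer in [2, ${n}], got ${k}`);
  // Extra rule (assumption, not in the table): one share per distinct organization.
  if (new Set(config.organizations.map((o) => o.id)).size !== n) return fail("organization ids must be unique");
  return { ok: true, value: config };
}
```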

Section 06

Security Guarantees

SourceSplit provides four core security properties: information-theoretic security, HMAC integrity, no long-lived keys to rotate (each split generates a fresh HMAC key), and cryptographic randomness.

Information-Theoretic Security

XorIDA guarantees that any subset of K-1 shares reveals zero information about the original document, regardless of computational resources. An attacker with quantum computers cannot break this guarantee — it is information-theoretic, not computational.

This means: compromising K-1 organizations (no matter how badly) still reveals nothing about the contents of a split document.
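XorIDA itself is not reproduced here. To make the information-theoretic claim concrete, the sketch below shows the simplest related construction, N-of-N XOR secret sharing: the first N-1 shares are uniformly random and the last is the document XORed with all of them, so any N-1 shares are statistically independent of the document. This is a simpler scheme than XorIDA's K-of-N threshold sharing (it needs every share to reconstruct), and xorSplit and xorCombine are hypothetical names.

```typescript
import { webcrypto } from "node:crypto";

// Illustration only: N-of-N XOR secret sharing, NOT XorIDA.
// getRandomValues() accepts at most 65,536 bytes per call, so this
// sketch handles documents up to that size.
function xorSplit(data: Uint8Array, n: number): Uint8Array[] {
  if (n < 2) throw new Error("need at least 2 shares");
  const shares: Uint8Array[] = [];
  const last = Uint8Array.from(data);
  for (let i = 0; i < n - 1; i++) {
    const pad = webcrypto.getRandomValues(new Uint8Array(data.length)); // uniform random share
    for (let j = 0; j < data.length; j++) last[j] ^= pad[j];
    shares.push(pad);
  }
  shares.push(last); // data XOR pad_1 XOR ... XOR pad_(n-1)
  return shares;
}

// XOR of all N shares cancels every random pad and leaves the document.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(shares[0].length);
  for (const s of shares) for (let j = 0; j < out.length; j++) out[j] ^= s[j];
  return out;
}
```

For any single share and any candidate document of the same length, some choice of the other shares is consistent with both, which is why even unlimited (including quantum) computation gains nothing from fewer than N shares in this toy scheme.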

HMAC Integrity & Tamper Detection

Each share carries an HMAC-SHA256 tag. Before reconstruction, every share is verified. If any share is tampered with (bit-flip, truncation, modification), HMAC verification fails and reconstruction is aborted.

HMAC verification happens before XorIDA processing, preventing corrupted data from entering the reconstruction pipeline.

Cryptographic Randomness

All random number generation uses crypto.getRandomValues(). Math.random() is never used. This ensures HMAC keys and any other random material are cryptographically secure.
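A sketch of drawing key material this way, with one practical detail: the Web Crypto specification limits getRandomValues() to 65,536 bytes per call (larger requests throw a QuotaExceededError), so large buffers must be filled in chunks. secureRandomBytes and toHex are hypothetical helpers, not package exports; the sketch uses Node's webcrypto export, and in browsers, Deno, or Bun the same method is available as globalThis.crypto.getRandomValues.

```typescript
import { webcrypto } from "node:crypto";

// Per-call limit imposed by the Web Crypto getRandomValues() specification.
const MAX_CHUNK = 65_536;

// Fill a buffer of any length from the CSPRNG, one chunk at a time.
function secureRandomBytes(length: number): Uint8Array {
  const out = new Uint8Array(length);
  for (let offset = 0; offset < length; offset += MAX_CHUNK) {
    webcrypto.getRandomValues(out.subarray(offset, Math.min(offset + MAX_CHUNK, length)));
  }
  return out;
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// A 256-bit HMAC key in the hex form used by SourceShare.hmacKey.
const hmacKeyHex = toHex(secureRandomBytes(32));
```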

Post-Quantum Layer

SourceSplit's payload security (XorIDA splitting) is information-theoretically quantum-safe. When messages are exchanged via xBind/Xchange (transport layer), hybrid post-quantum cryptography protects the envelope:

Key exchange: X25519 + ML-KEM-768 (FIPS 203) always-on
Signatures: Ed25519 + ML-DSA-65 (FIPS 204) opt-in

METADATA NOT PROTECTED

Document classification, submission timestamp, and source alias are stored in plaintext. This metadata could leak correlation information about sources even without plaintext. Newsrooms should use separate protections for metadata (e.g., encrypt the entire SourceShare object at rest).
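The recommendation to encrypt the entire SourceShare object at rest can be sketched with AES-256-GCM from Node's built-in crypto module. sealShare, openShare, and SealedShare are hypothetical; the newsroom key is assumed to come from the newsroom's own key management, and a fresh 96-bit IV is generated for every encryption because reusing an IV under the same GCM key breaks confidentiality. Encryption hides whatever metadata the stored object carries, but the ciphertext length still reflects the share's size.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical at-rest envelope for a serialized share object.
interface SealedShare {
  iv: string;   // 96-bit nonce (base64), unique per encryption
  tag: string;  // 128-bit GCM authentication tag (base64)
  body: string; // ciphertext (base64)
}

function sealShare(share: object, key: Buffer): SealedShare {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(share), "utf8"), cipher.final()]);
  return { iv: iv.toString("base64"), tag: cipher.getAuthTag().toString("base64"), body: body.toString("base64") };
}

// Throws if the key is wrong or the stored envelope was modified.
function openShare(sealed: SealedShare, key: Buffer): unknown {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(sealed.body, "base64")), decipher.final()]);
  return JSON.parse(plain.toString("utf8"));
}
```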

Known Limitations

No document size obfuscation. Share sizes reveal approximate original document size, which could narrow down document identity across a corpus.

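HMAC keys in shares. HMAC keys are stored alongside shares to enable verification, so any organization holding a share can verify it. Because the key travels with the share, the HMAC detects corruption but does not authenticate who produced a share, and it offers no protection against metadata-level attacks.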

Single-use shares. Each document split produces new shares. Old shares are not reusable for different documents.

Section 07

Limitations & Roadmap

SourceSplit v0.1.0 provides core functionality. Future versions will address metadata protection and multi-document workflows.

Current Limitations (v0.1.0)

Metadata plaintext: Classification, timestamp, alias are unencrypted. Newsrooms should wrap the entire SourceShare in application-level encryption.
No document expiry: Shares persist indefinitely. Newsrooms must implement TTL/deletion policies in their storage layer.
No progress callbacks: Split and reconstruction return Promises but report no progress while doing their CPU-bound work. Large documents (>10MB) may block the calling thread for several hundred milliseconds.
No key escrow: HMAC keys cannot be held separately. An organization holding a share can verify it independently.

Planned Enhancements (v0.2+)

Metadata encryption: Optional wrapper to encrypt classification, timestamp, and alias with a newsroom-controlled key.
Async API with progress events: onProgress callback for split/reconstruct to track long-running operations.
Document batching: Support for splitting multiple documents in one operation with shared threshold config.
Encrypted HMAC keys: HMAC key escrow via a separate key-holder service (not in package scope).

Out of Scope

SourceSplit is a cryptographic splitting library, not a full source protection system. Out of scope:

Source identity protection beyond document splitting (e.g., metadata stripping, anonymization)
Press organization authentication and access control
Secure communication channels between source and journalist (use xBind or other E2EE protocols)
Legal protections for journalistic sources (shield laws vary by jurisdiction)
Network transport security (use TLS 1.3+; application responsibility)

Section 08

Post-Quantum Security

SourceSplit's core payload security (XorIDA) is information-theoretically quantum-proof. When integrated with xBind transport, hybrid PQ cryptography additionally protects shares in transit.

Payload Layer (XorIDA)

XorIDA threshold sharing is information-theoretically secure against all adversaries, classical and quantum. No computational assumption. K-1 shares reveal zero information regardless of computing power.

Transport Layer (with xBind)

When SourceShare objects are exchanged via xBind agents:

Key exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on hybrid KEM
Signatures (optional): Ed25519 + ML-DSA-65 (FIPS 204) — opt-in via agent config

Recommendation: Applications integrating SourceSplit should create xBind agents with postQuantumSig: true for full post-quantum protection across all three layers:

Payload: XorIDA (information-theoretic)
Confidentiality: AES-256-GCM + hybrid KEM (computational PQ)
Authenticity: Ed25519 + ML-DSA-65 (signature PQ)

Advanced Topics

Deep Dive: Implementation Details

Appendices covering error taxonomy, benchmarks, and codebase statistics for integrators and operators.

Appendix A

Error Taxonomy

SourceSplit uses a Result<T, E> pattern with discriminated error unions. Every error includes a code, message, and optional documentation link.

Error Class Hierarchy

Error class structure

class SourceSplitError extends Error {
  code: string;        // Machine-readable code
  subCode?: string;      // Sub-code from colon-separated codes
  docUrl?: string;      // Doc link for error context
}

class SourceConfigError extends SourceSplitError {
  // Configuration validation errors
}

class SourceIntegrityError extends SourceSplitError {
  // Cryptographic integrity failures
}

class SourceReconstructError extends SourceSplitError {
  // Reconstruction failures
}

Use toSourceSplitError() to convert a Result error code into a typed error class for catch handlers:

Error conversion

import {
  splitSourceDocument,
  toSourceSplitError,
  SourceConfigError,
  SourceIntegrityError,
} from '@private.me/sourcesplit';

// doc: SourceDocument, config: SourceConfig (see the Types listing in Section 05)
const result = await splitSourceDocument(doc, config);

if (!result.ok) {
  // Convert the machine-readable code into a typed error class
  const error = toSourceSplitError(result.error.code);
  if (error instanceof SourceConfigError) {
    console.error('Config validation failed', error.message);
  } else if (error instanceof SourceIntegrityError) {
    console.error('Integrity check failed', error.message);
  }
}

Error Codes by Category

Code	Class	Description
INVALID_CONFIG	SourceConfigError	Org count <2, threshold <2, or threshold >N
SPLIT_FAILED	SourceIntegrityError	XorIDA split operation failed
HMAC_FAILED	SourceIntegrityError	HMAC verification failed on a share
RECONSTRUCT_FAILED	SourceReconstructError	XorIDA reconstruction failed
INSUFFICIENT_SHARES	SourceReconstructError	<threshold shares provided
INTEGRITY_FAILED	SourceIntegrityError	Document hash mismatch post-reconstruction
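The table amounts to a lookup from code to class. The sketch below re-implements that mapping with local stand-in classes, including the colon-separated sub-code mentioned in the class listing; the "CODE:detail" format, the toError name, and the message text are assumptions, and the package's own toSourceSplitError() (which also attaches docUrl) may behave differently.

```typescript
// Local stand-ins for the package's classes (hypothetical re-implementation).
class SourceSplitError extends Error {
  readonly code: string;
  readonly subCode?: string;
  constructor(code: string, message: string, subCode?: string) {
    super(message);
    this.name = new.target.name; // e.g. "SourceIntegrityError"
    this.code = code;
    this.subCode = subCode;
  }
}
class SourceConfigError extends SourceSplitError {}
class SourceIntegrityError extends SourceSplitError {}
class SourceReconstructError extends SourceSplitError {}

type ErrorClass = new (code: string, message: string, subCode?: string) => SourceSplitError;

// One entry per row of the table above.
const CLASS_BY_CODE: Record<string, ErrorClass> = {
  INVALID_CONFIG: SourceConfigError,
  SPLIT_FAILED: SourceIntegrityError,
  HMAC_FAILED: SourceIntegrityError,
  INTEGRITY_FAILED: SourceIntegrityError,
  RECONSTRUCT_FAILED: SourceReconstructError,
  INSUFFICIENT_SHARES: SourceReconstructError,
};

// Split "CODE:detail" into code and sub-code; unknown codes fall back to the base class.
function toError(raw: string): SourceSplitError {
  const sep = raw.indexOf(":");
  const code = sep === -1 ? raw : raw.slice(0, sep);
  const subCode = sep === -1 ? undefined : raw.slice(sep + 1);
  const Cls: ErrorClass = CLASS_BY_CODE[code] ?? SourceSplitError;
  return new Cls(code, `SourceSplit error: ${raw}`, subCode);
}
```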

Appendix B

Performance Benchmarks

SourceSplit performance measured on Node.js 22 LTS. Every stage is sub-millisecond at 1KB; at 5MB, HMAC-SHA256 and SHA-256 hashing together (about 175ms) take slightly longer than the XorIDA split (about 140ms).

1KB split (3-of-3): 1.2ms

100KB split: 18ms

5MB split: 320ms

Breakdown: Where Time Goes

Step	1KB	100KB	5MB
PKCS7 Padding	<0.1ms	<0.1ms	<1ms
HMAC-SHA256	0.3ms	3ms	150ms
XorIDA Split	0.1ms	5ms	140ms
SHA-256 Hash	0.2ms	2ms	25ms
Reconstruction (K-of-N)	0.8ms	8ms	160ms

* Benchmarks: M1 Pro, Node 22, single-threaded. Actual times vary by CPU/memory/I/O.

Scaling Notes

HMAC and hashing scale linearly with document size (O(n)). XorIDA split is also O(n) but with a small constant. Reconstruction time is dominated by HMAC verification and XorIDA processing.
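Under the linear scaling described here, the 5MB split figure gives a rough rate of 64ms per megabyte on the benchmark machine. A back-of-envelope sketch; estimateSplitMs is a hypothetical helper, and it ignores the fixed per-call overhead visible at 1KB and any non-linear effects beyond the measured range.

```typescript
// Measured point from the benchmark table (M1 Pro, Node 22).
const MEASURED_MB = 5;
const MEASURED_SPLIT_MS = 320;

// Linear extrapolation: (size / 5MB) * 320ms, i.e. about 64ms per MB.
function estimateSplitMs(sizeMB: number): number {
  return (sizeMB / MEASURED_MB) * MEASURED_SPLIT_MS;
}
```

By this estimate a 10MB split takes roughly 640ms, the regime where the options below become worthwhile.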

For >10MB documents, consider:

Chunking: Split large documents into 1-10MB chunks and run splitSourceDocument() on each chunk in parallel
Async processing: Use Web Workers (browser) or Worker Threads (Node) to avoid blocking
Progress callbacks: v0.2+ will support onProgress for long operations
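The chunking option needs only a slicing step before splitting and a concatenation step after reconstruction. A minimal sketch; chunk and concat are hypothetical helpers, and the 4MB default is an arbitrary choice inside the suggested 1-10MB range. Each chunk would be passed to splitSourceDocument() as its own SourceDocument, so the application must record chunk order and keep a hash of the whole document, since each per-chunk documentHash covers only its own piece.

```typescript
const MB = 1024 * 1024;

// Slice a document into fixed-size views (no copying); the last piece may be shorter.
function chunk(data: Uint8Array, chunkSize = 4 * MB): Uint8Array[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const pieces: Uint8Array[] = [];
  for (let off = 0; off < data.length; off += chunkSize) {
    pieces.push(data.subarray(off, Math.min(off + chunkSize, data.length)));
  }
  return pieces;
}

// Reassemble reconstructed pieces in their recorded order.
function concat(pieces: Uint8Array[]): Uint8Array {
  const total = pieces.reduce((sum, p) => sum + p.length, 0);
  const out = new Uint8Array(total);
  let off = 0;
  for (const p of pieces) {
    out.set(p, off);
    off += p.length;
  }
  return out;
}
```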

Appendix C

Codebase Statistics

SourceSplit is a compact, focused package with 100% test coverage on cryptographic operations.

~750 total lines of TypeScript

5 modules

31 test cases

0 npm dependencies

Module Breakdown

Module	Purpose	Tests
source-splitter.ts	Core split pipeline (validate → pad → HMAC → XorIDA → hash)	12
source-reconstructor.ts	Core reconstruct pipeline (HMAC → XorIDA → unpad → verify)	10
types.ts	TypeScript interfaces (SourceDocument, SourceShare, etc.)	—
errors.ts	Error class hierarchy and conversion	—
index.ts	Barrel export (public API)	—

Test Coverage

Category	Test Count	Coverage
Config validation	4	100%
XorIDA splitting	8	100%
HMAC integrity	6	100%
Document reconstruction	7	100%
Abuse cases (data tampering, insufficient shares)	6	100%

Dependencies

Runtime: 0 npm packages

@private.me/crypto (monorepo peer, XorIDA + HMAC + padding)
@private.me/shared (monorepo peer, Result pattern + encoding)
Web Crypto API (builtin, SHA-256 + cryptographic randomness)

Build/Dev: TypeScript, Vitest (test framework only)

Deployment Options

SaaS (Recommended)

Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.

Zero infrastructure setup
Automatic updates
99.9% uptime SLA
Enterprise SLA available


SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

npm install @private.me/sourcesplit
TypeScript/JavaScript SDK
Full source access
Enterprise support available


On-Premise (Upon Request)

Enterprise CLI for compliance, air-gap, or data residency requirements.

Complete data sovereignty
Air-gap capable deployment
Custom SLA + dedicated support
Professional services included


Enterprise On-Premise Deployment

While SourceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
Air-gapped environments — SCIF, classified networks, offline operations
Data residency requirements — EU GDPR, China data laws, government mandates
Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

Contact sales for assessment and pricing.

Pricing


Coming Soon

Pricing details will be available when this ACI launches. Subscribe to updates to be notified.

Questions about this ACI? Contact us