SourceSplit: Journalist Source Protection
Confidential sources are the backbone of investigative journalism. SourceSplit uses XorIDA (threshold sharing over GF(2)) to split source documents, identity records, and communications across independent press organizations so that no single newsroom holds the complete record. Reconstruction requires K-of-N organizations to cooperate, ensuring source protection survives the compromise of any individual institution.
Executive Summary
SourceSplit protects journalist sources from single-institution breach via cryptographic splitting. No single press organization holds the entire document — reconstruction requires threshold collaboration.
Two functions cover the complete workflow: splitSourceDocument() takes a source document, a list of press organizations, and a reconstruction threshold (K-of-N), then splits it into shares using XorIDA. Each organization receives exactly one share. reconstructSourceDocument() reassembles the original document from any K shares.
Every share carries cryptographic proof of integrity via HMAC-SHA256, verified before reconstruction. The package generates a SHA-256 hash of the original document, enabling post-reconstruction verification that no data was corrupted during transport or storage.
Documents are classified as source-identity, evidence, or communication to enable policy-driven handling. Source names are pseudonymous — the mapping between real identity and source alias lives outside this package, in journalistic workflows.
Zero npm dependencies. Runs on Node.js 20+, Tauri, modern browsers, Deno, and Bun via Web Crypto API. Dual ESM/CJS builds included.
The Problem: Newsroom Vulnerability
A single newsroom breach — whether from a subpoena, a hack, or an insider threat — can expose every source who ever trusted that organization.
Subpoena risk. Governments demand source documents. A newsroom with centralized source records faces legal pressure to disclose. Protection depends entirely on the newsroom's legal team and the courts.
Cyberattack exposure. A breach of one newsroom's database exposes all sources in that system. Encryption at rest helps, but a sophisticated attacker inside the newsroom can access live keys.
Insider threat. A rogue employee, a disgruntled contractor, or a compromised admin account can exfiltrate source records in seconds.
Metadata leakage. Even if documents are encrypted, the metadata — when sources submitted material, from which regions, how many documents — can narrow down identity.
| Threat Model | Single Newsroom | SourceSplit (K-of-N) |
|---|---|---|
| Newsroom #1 breached | All sources exposed | Sources safe (need K orgs) |
| Subpoena newsroom #1 | Forced disclosure | Newsroom #1 has 1 share only |
| Insider threat | Full compromise | Only 1 share stolen (K needed) |
| K-1 newsrooms compromised | All sources known | Information-theoretic security |
* SourceSplit's guarantees: compromising a single organization is not enough, there is no computational break to find, and fewer than K shares leak zero information about document contents.
The Gap in Journalism
Modern newsrooms use encrypted email, encrypted databases, and secure communication platforms. But they still centralize source records. The journalist in the newsroom is the only person who knows which encrypted email is from which source. The document is encrypted on disk, but the key is in memory on the journalist's workstation or in a secrets manager.
SourceSplit shifts the breach boundary. Compromise of one newsroom no longer means compromise of all sources. Reconstruction requires K independent organizations to agree. An attacker must breach K newsrooms simultaneously — a much higher cost.
Real-World Use Cases
Three scenarios where SourceSplit protects sources against institutional risk.
Three international newsrooms (NYT, Guardian, Der Spiegel) collaborate on an investigation. A whistleblower's documents are split 2-of-3. No single newsroom can be forced or hacked into revealing the source.
splitSourceDocument() + 3 orgs, threshold 2
A journalist receives classified documents from a government insider. SourceSplit splits them across 4 regional news outlets (US, UK, Germany, Canada) with 3-of-4 threshold. A single leaked share reveals nothing about the document.
4 organizations, 3-of-4 reconstruction
A news organization archives sources long-term. Using SourceSplit with a 2-of-2 split between a primary archive and a backup ensures that neither archive alone can expose sources, even if physically stolen or digitally breached.
splitSourceDocument() + dual storage
Architecture & Design
SourceSplit follows a strict pipeline: validate → pad → HMAC → split → hash → package. The step order is fixed and every step is auditable.
XorIDA Threshold Sharing
SourceSplit uses XorIDA (threshold sharing over GF(2)) to split documents. XorIDA is information-theoretically secure: any K-1 shares reveal zero information about the plaintext, regardless of computational power.
The splitting process:
- Input: document data (Uint8Array), organization count N, threshold K
- Find odd prime P ≥ N+1, block size = P-1
- Pad document data with PKCS7 to block size multiple
- XorIDA split produces N shares, each same size as padded input
- Each organization receives share[i]
- Reconstruction requires K or more shares
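The parameter and padding steps above can be sketched as follows. This is a minimal illustration, not the package's internal code: the helper names `findPrime`, `pkcs7Pad`, and `pkcs7Unpad` are hypothetical, and the XorIDA split itself is not shown.

```typescript
// Hypothetical helpers for illustration; not the package's internal code.

/** Trial-division primality test; adequate for the small values involved. */
function isPrime(n: number): boolean {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

/** Smallest odd prime P with P >= orgCount + 1. */
function findPrime(orgCount: number): number {
  let p = Math.max(3, orgCount + 1); // every prime >= 3 is odd
  while (!isPrime(p)) p++;
  return p;
}

/** PKCS7: append padLen bytes, each equal to padLen (1..blockSize). */
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (blockSize < 1 || blockSize > 255) {
    throw new RangeError('blockSize must be in 1..255');
  }
  const padLen = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length);
  return out;
}

/** Reverse of pkcs7Pad; throws if the padding bytes are malformed. */
function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) {
    throw new Error('padded length is not a positive multiple of blockSize');
  }
  const padLen = padded[padded.length - 1];
  if (padLen < 1 || padLen > blockSize) throw new Error('invalid padding length');
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('invalid padding bytes');
  }
  return padded.slice(0, padded.length - padLen);
}

// Example: 3 organizations, 5-byte document.
const prime = findPrime(3);
const blockSize = prime - 1;
const padded = pkcs7Pad(new Uint8Array([1, 2, 3, 4, 5]), blockSize);
```

PKCS7 always adds at least one byte, so an input that is already a multiple of the block size grows by a full block. That keeps unpadding unambiguous.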
HMAC-SHA256 Integrity Verification
Before reconstruction, every share is verified with an HMAC. This prevents tampering during transport or storage.
The integrity pipeline:
- During split: generate HMAC-SHA256 over padded document
- Store HMAC tag and HMAC key with each share
- During reconstruction: verify HMAC before merging shares
- HMAC failure = reject reconstruction, raise HMAC_FAILED error
- If HMAC passes, combine shares via XorIDA
- Verify reconstructed document hash vs. original
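The integrity steps above can be sketched as follows. This is a hedged illustration that uses Node's `node:crypto` module for brevity, whereas the package itself is described as using the Web Crypto API. The `IntegrityRecord` shape and the function names are assumptions made for this example.

```typescript
import { createHash, createHmac, randomBytes, timingSafeEqual } from 'node:crypto';

/** Hypothetical record shape; field names are illustrative only. */
interface IntegrityRecord {
  hmacHex: string;         // HMAC-SHA256 tag over the padded document
  hmacKeyHex: string;      // random key stored alongside the tag
  documentHashHex: string; // SHA-256 of the original document
}

/** Split side: compute the HMAC tag and the document hash. */
function seal(original: Uint8Array, padded: Uint8Array): IntegrityRecord {
  const key = randomBytes(32);
  return {
    hmacHex: createHmac('sha256', key).update(padded).digest('hex'),
    hmacKeyHex: key.toString('hex'),
    documentHashHex: createHash('sha256').update(original).digest('hex'),
  };
}

/** Reconstruction side: recompute the tag and compare in constant time. */
function verifyHmac(padded: Uint8Array, record: IntegrityRecord): boolean {
  const key = Buffer.from(record.hmacKeyHex, 'hex');
  const expected = createHmac('sha256', key).update(padded).digest();
  const actual = Buffer.from(record.hmacHex, 'hex');
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

/** Final step: compare the reconstructed document's hash with the stored one. */
function verifyDocumentHash(document: Uint8Array, record: IntegrityRecord): boolean {
  const actual = createHash('sha256').update(document).digest('hex');
  return actual === record.documentHashHex;
}
```

A caller would treat a `false` from `verifyHmac` as the HMAC_FAILED case and a `false` from `verifyDocumentHash` as the INTEGRITY_FAILED case.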
Document Classification & Metadata
Each document carries a classification tag and a sourceAlias (pseudonym). These enable policy-driven handling:
| Classification | Use Case | Policy Example |
|---|---|---|
| source-identity | Real name, location, contact of source | Never store in plaintext anywhere |
| evidence | Documents, media, records from source | May be published (redacted); long-term archive |
| communication | Messages, emails, conversations with source | Most sensitive; destroy after investigation |
The source alias (e.g., deep-throat-7, whistleblower-2026-03) is a pseudonym tracked by the newsroom. SourceSplit does not enforce identity mapping — that policy lives in journalistic workflows.
API Reference
SourceSplit exports two main functions and six types covering the complete workflow.
Core Functions
splitSourceDocument(): Splits a source document across press organizations using XorIDA. Each organization receives one share. The document can be reconstructed only when at least threshold (K) organizations combine their shares.
Pipeline: validate config → pad → HMAC → XorIDA split → SHA-256 hash → package shares
Returns: Success: documentId, shares[], documentHash. Failure: INVALID_CONFIG, SPLIT_FAILED, HMAC_FAILED.
reconstructSourceDocument(): Reconstructs the original document from K or more organization shares. Verifies HMAC integrity before processing, then uses XorIDA reconstruction to recover the plaintext.
Pipeline: validate threshold → verify HMAC on each share → XorIDA reconstruct → unpad → integrity check → return document bytes
Returns: Success: Uint8Array of original document. Failure: INSUFFICIENT_SHARES, HMAC_FAILED, RECONSTRUCT_FAILED, INTEGRITY_FAILED.
Types
interface SourceDocument {
  documentId: string;
  classification: 'source-identity' | 'evidence' | 'communication';
  data: Uint8Array;
  submittedAt: string; // ISO 8601
  sourceAlias: string; // Pseudonymous source name
}

interface PressOrg {
  id: string; // Unique org identifier
  name: string; // Display name
  country: string; // ISO 3166-1 alpha-2
}

interface SourceConfig {
  organizations: PressOrg[];
  threshold: number; // K-of-N
}

interface SourceShare {
  documentId: string;
  orgId: string;
  index: number; // 0-based share index
  total: number; // N total shares
  threshold: number; // K threshold
  data: string; // Base64-encoded share with IDA5 header
  hmac: string; // HMAC-SHA256 (hex)
  hmacKey: string; // HMAC key for verification (hex)
  originalSize: number; // Bytes before padding
}
Error Codes
| Code | Cause | Recovery |
|---|---|---|
| INVALID_CONFIG | Organizations < 2, threshold < 2, or threshold > N | Validate org count ≥ 2, threshold ∈ [2, N] |
| SPLIT_FAILED | XorIDA split operation failed (e.g., empty data) | Check document.data is not empty; retry |
| HMAC_FAILED | HMAC verification failed during reconstruction | Share may be tampered with; discard and use backup |
| RECONSTRUCT_FAILED | XorIDA reconstruction algorithm failed | Shares may be corrupted; verify transport/storage |
| INSUFFICIENT_SHARES | Fewer than threshold shares provided | Collect shares from at least K organizations |
| INTEGRITY_FAILED | Reconstructed document hash ≠ original hash | Document corrupted post-reconstruction; discard |
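The INVALID_CONFIG rule in the table can be checked before calling the library. The sketch below is a hypothetical pre-flight check, not the package's own validator; `checkConfig` and the `ConfigCheck` type are names introduced for this example.

```typescript
/** Hypothetical result type for this sketch. */
type ConfigCheck =
  | { ok: true }
  | { ok: false; code: 'INVALID_CONFIG'; message: string };

/** Mirrors the documented rule: org count >= 2 and threshold in [2, N]. */
function checkConfig(orgCount: number, threshold: number): ConfigCheck {
  if (!Number.isInteger(orgCount) || orgCount < 2) {
    return { ok: false, code: 'INVALID_CONFIG', message: 'need at least 2 organizations' };
  }
  if (!Number.isInteger(threshold) || threshold < 2 || threshold > orgCount) {
    return {
      ok: false,
      code: 'INVALID_CONFIG',
      message: `threshold must be between 2 and ${orgCount}`,
    };
  }
  return { ok: true };
}

// Example: a 2-of-3 configuration passes, a 4-of-3 configuration does not.
const twoOfThree = checkConfig(3, 2);
const fourOfThree = checkConfig(3, 4);
```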
Security Guarantees
SourceSplit provides four core security properties: information-theoretic security, HMAC integrity, no encryption keys to rotate, and cryptographic randomness.
Information-Theoretic Security
XorIDA guarantees that any subset of K-1 shares reveals zero information about the original document, regardless of computational resources. An attacker with quantum computers cannot break this guarantee — it is information-theoretic, not computational.
This means: compromising K-1 organizations (no matter how badly) still leaves sources safe.
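To see why a share below the threshold carries no information, consider the simplest XOR-based scheme, a 2-of-2 split. This is not XorIDA and not the package's algorithm; it is a one-time-pad construction used here only to illustrate the property, and the function names are hypothetical. One share is uniformly random, and the other is the document XORed with it, so each share on its own is statistically independent of the document.

```typescript
import { webcrypto } from 'node:crypto';

/** Fill a buffer with cryptographic randomness (64 KiB per call is the API limit). */
function randomFill(length: number): Uint8Array {
  const out = new Uint8Array(length);
  for (let off = 0; off < length; off += 65536) {
    webcrypto.getRandomValues(out.subarray(off, Math.min(off + 65536, length)));
  }
  return out;
}

/** Byte-wise XOR of two equal-length buffers. */
function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  if (a.length !== b.length) throw new Error('length mismatch');
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

/** Illustrative 2-of-2 split: share A is random, share B = data XOR share A. */
function splitTwoOfTwo(data: Uint8Array): [Uint8Array, Uint8Array] {
  const shareA = randomFill(data.length);
  return [shareA, xorBytes(data, shareA)];
}

/** Both shares are required: A XOR B recovers the data. */
function joinTwoOfTwo(shareA: Uint8Array, shareB: Uint8Array): Uint8Array {
  return xorBytes(shareA, shareB);
}

const secret = new TextEncoder().encode('source: deep-throat-7');
const [shareA, shareB] = splitTwoOfTwo(secret);
const recovered = new TextDecoder().decode(joinTwoOfTwo(shareA, shareB));
```

Because share A is uniformly random, share B is uniformly random as well when viewed alone. No amount of computation distinguishes it from noise.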
HMAC Integrity & Tamper Detection
Each share carries an HMAC-SHA256 tag. Before reconstruction, every share is verified. If any share has been tampered with (bit-flip, truncation, modification), HMAC verification fails and reconstruction is aborted.
HMAC verification happens before XorIDA processing, preventing corrupted data from entering the reconstruction pipeline.
Cryptographic Randomness
All random number generation uses crypto.getRandomValues(). Math.random() is never used. This ensures HMAC keys and any other random material are cryptographically secure.
Post-Quantum Layer
SourceSplit's payload security (XorIDA splitting) is information-theoretically quantum-safe. When messages are exchanged via Xlink/Xchange (transport layer), hybrid post-quantum cryptography protects the envelope:
- Key exchange: X25519 + ML-KEM-768 (FIPS 203) always-on
- Signatures: Ed25519 + ML-DSA-65 (FIPS 204) opt-in
Known Limitations
No document size obfuscation. Share sizes reveal approximate original document size, which could narrow down document identity across a corpus.
HMAC keys in shares. HMAC keys are stored alongside shares to enable verification. An organization holding a share can verify it but cannot prevent metadata-level attacks.
Single-use shares. Each document split produces new shares. Old shares are not reusable for different documents.
Limitations & Roadmap
SourceSplit v0.1.0 provides core functionality. Future versions will address metadata protection and multi-document workflows.
Current Limitations (v0.1.0)
- Metadata plaintext: Classification, timestamp, alias are unencrypted. Newsrooms should wrap the entire SourceShare in application-level encryption.
- No document expiry: Shares persist indefinitely. Newsrooms must implement TTL/deletion policies in their storage layer.
- No progress callbacks: Split and reconstruction are synchronous operations. Large documents (>10MB) may block for several hundred milliseconds.
- No key escrow: HMAC keys cannot be held separately. An organization holding a share can verify it independently.
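The first limitation recommends wrapping each SourceShare in application-level encryption. One way to do that is sketched below with AES-256-GCM from Node's `node:crypto`. The `WrappedShare` shape, the function names, and the choice of JSON serialization are assumptions for this example, and key management is left to the newsroom.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

/** Hypothetical envelope; all fields are base64 strings. */
interface WrappedShare {
  iv: string;
  tag: string;
  ciphertext: string;
}

/** Encrypt a share object (metadata included) under a 32-byte newsroom key. */
function wrapShare(share: unknown, key: Uint8Array): WrappedShare {
  if (key.length !== 32) throw new RangeError('AES-256 requires a 32-byte key');
  const iv = randomBytes(12); // fresh 96-bit nonce for every message
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(share), 'utf8'),
    cipher.final(),
  ]);
  return {
    iv: iv.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'),
    ciphertext: ciphertext.toString('base64'),
  };
}

/** Decrypt and authenticate; throws if the envelope was modified. */
function unwrapShare(wrapped: WrappedShare, key: Uint8Array): unknown {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(wrapped.iv, 'base64'));
  decipher.setAuthTag(Buffer.from(wrapped.tag, 'base64'));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(wrapped.ciphertext, 'base64')),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString('utf8'));
}
```

With this wrapper, the documentId, orgId, and other plaintext fields of a stored share are only readable by holders of the newsroom key. The nonce must never be reused with the same key.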
Planned Enhancements (v0.2+)
- Metadata encryption: Optional wrapper to encrypt classification, timestamp, and alias with a newsroom-controlled key.
- Async API with progress events: onProgress callback for split/reconstruct to track long-running operations.
- Document batching: Support for splitting multiple documents in one operation with shared threshold config.
- Encrypted HMAC keys: HMAC key escrow via a separate key-holder service (not in package scope).
Out of Scope
SourceSplit is a cryptographic splitting library, not a full source protection system. Out of scope:
- Source identity protection beyond document splitting (e.g., metadata stripping, anonymization)
- Press organization authentication and access control
- Secure communication channels between source and journalist (use Xlink/Signal/Wire)
- Legal protections for journalistic sources (shield laws vary by jurisdiction)
- Network transport security (use TLS 1.3+; application responsibility)
Post-Quantum Security
SourceSplit's core payload security (XorIDA) is information-theoretically quantum-proof. When integrated with Xlink transport, hybrid PQ cryptography protects the entire system.
Payload Layer (XorIDA)
XorIDA threshold sharing is information-theoretically secure against all adversaries, classical and quantum. No computational assumption. K-1 shares reveal zero information regardless of computing power.
Transport Layer (with Xlink)
When SourceShare objects are exchanged via Xlink agents:
- Key exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on hybrid KEM
- Signatures (optional): Ed25519 + ML-DSA-65 (FIPS 204) — opt-in via agent config
Recommendation: Applications integrating SourceSplit should create Xlink agents with postQuantumSig: true for full post-quantum protection across all three layers:
- Payload: XorIDA (information-theoretic)
- Confidentiality: AES-256-GCM + hybrid KEM (computational PQ)
- Authenticity: Ed25519 + ML-DSA-65 (signature PQ)
Deep Dive: Implementation Details
Appendices covering error taxonomy, benchmarks, and codebase statistics for integrators and operators.
Error Taxonomy
SourceSplit uses a Result<T, E> pattern with discriminated error unions. Every error includes a code, message, and optional documentation link.
Error Class Hierarchy
class SourceSplitError extends Error {
  code: string; // Machine-readable code
  subCode?: string; // Sub-code from colon-separated codes
  docUrl?: string; // Doc link for error context
}

class SourceConfigError extends SourceSplitError {
  // Configuration validation errors
}

class SourceIntegrityError extends SourceSplitError {
  // Cryptographic integrity failures
}

class SourceReconstructError extends SourceSplitError {
  // Reconstruction failures
}
Use toSourceSplitError() to convert a Result error code into a typed error class for catch handlers:
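// The error classes are assumed to be exported from the package's public API.
import {
  splitSourceDocument,
  toSourceSplitError,
  SourceConfigError,
  SourceIntegrityError,
} from '@private.me/sourcesplit';

// doc (SourceDocument) and config (SourceConfig) are defined by the caller.
const result = await splitSourceDocument(doc, config);
if (!result.ok) {
  // Convert the Result error code into a typed error class.
  const error = toSourceSplitError(result.error.code);
  if (error instanceof SourceConfigError) {
    console.error('Config validation failed', error.message);
  } else if (error instanceof SourceIntegrityError) {
    console.error('Integrity check failed', error.message);
  }
}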
Error Codes by Category
| Code | Class | Description |
|---|---|---|
| INVALID_CONFIG | SourceConfigError | Org count <2, threshold <2, or threshold >N |
| SPLIT_FAILED | SourceIntegrityError | XorIDA split operation failed |
| HMAC_FAILED | SourceIntegrityError | HMAC verification failed on a share |
| RECONSTRUCT_FAILED | SourceReconstructError | XorIDA reconstruction failed |
| INSUFFICIENT_SHARES | SourceReconstructError | <threshold shares provided |
| INTEGRITY_FAILED | SourceIntegrityError | Document hash mismatch post-reconstruction |
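The code-to-class mapping in the table can be expressed as a lookup. The sketch below re-declares simplified stand-ins for the error classes so that it runs on its own. It is not the package's `toSourceSplitError()` implementation, and `toTypedError` is a hypothetical name.

```typescript
// Simplified stand-ins for the package's error classes (illustration only).
class SourceSplitError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
    this.name = new.target.name; // e.g. "SourceConfigError"
  }
}
class SourceConfigError extends SourceSplitError {}
class SourceIntegrityError extends SourceSplitError {}
class SourceReconstructError extends SourceSplitError {}

type ErrorCode =
  | 'INVALID_CONFIG'
  | 'SPLIT_FAILED'
  | 'HMAC_FAILED'
  | 'RECONSTRUCT_FAILED'
  | 'INSUFFICIENT_SHARES'
  | 'INTEGRITY_FAILED';

type ErrorCtor = new (code: string, message: string) => SourceSplitError;

// One entry per row of the table above.
const CLASS_BY_CODE: Record<ErrorCode, ErrorCtor> = {
  INVALID_CONFIG: SourceConfigError,
  SPLIT_FAILED: SourceIntegrityError,
  HMAC_FAILED: SourceIntegrityError,
  RECONSTRUCT_FAILED: SourceReconstructError,
  INSUFFICIENT_SHARES: SourceReconstructError,
  INTEGRITY_FAILED: SourceIntegrityError,
};

/** Hypothetical converter: builds the typed error for a given code. */
function toTypedError(code: ErrorCode, message: string = code): SourceSplitError {
  const Ctor = CLASS_BY_CODE[code];
  return new Ctor(code, message);
}
```

Typing the lookup as `Record<ErrorCode, ErrorCtor>` makes the compiler reject a table that misses a code, which keeps the mapping exhaustive as codes are added.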
Performance Benchmarks
SourceSplit performance measured on Node.js 22 LTS. For large documents, HMAC-SHA256 and the XorIDA split account for most of the split time; padding and SHA-256 hashing are comparatively cheap.
Breakdown: Where Time Goes
| Step | 1KB | 100KB | 5MB |
|---|---|---|---|
| PKCS7 Padding | <0.1ms | <0.1ms | <1ms |
| HMAC-SHA256 | 0.3ms | 3ms | 150ms |
| XorIDA Split | 0.1ms | 5ms | 140ms |
| SHA-256 Hash | 0.2ms | 2ms | 25ms |
| Reconstruction (K-of-N) | 0.8ms | 8ms | 160ms |
* Benchmarks: M1 Pro, Node 22, single-threaded. Actual times vary by CPU/memory/I/O.
Scaling Notes
HMAC, hashing, and the XorIDA split all scale linearly with document size (O(n)). In the table above, the XorIDA split costs about as much as HMAC at 5MB. Reconstruction time is dominated by HMAC verification and XorIDA processing.
For >10MB documents, consider:
- Chunking: Split large documents into 1-10MB chunks and run SourceSplit on each chunk in parallel
- Async processing: Use Web Workers (browser) or Worker Threads (Node) to avoid blocking
- Progress callbacks: v0.2+ will support onProgress for long operations
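The chunking suggestion can be sketched as a pair of helpers. The names `chunkDocument` and `joinChunks` are hypothetical, and the sketch assumes the application tracks chunk order itself, for example by encoding the chunk index in each chunk's documentId.

```typescript
/** Cut a document into fixed-size chunks; the last chunk may be shorter. */
function chunkDocument(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < data.length; off += chunkSize) {
    // subarray creates a view on the same buffer, so nothing is copied here
    chunks.push(data.subarray(off, Math.min(off + chunkSize, data.length)));
  }
  return chunks;
}

/** Concatenate reconstructed chunks back into one document, in order. */
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let off = 0;
  for (const c of chunks) {
    out.set(c, off);
    off += c.length;
  }
  return out;
}

// Example: a 10-byte document in 4-byte chunks yields lengths 4, 4, 2.
const sample = Uint8Array.from([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const pieces = chunkDocument(sample, 4);
const rejoined = joinChunks(pieces);
```

Each chunk would then be split as its own document. Note that chunk count and chunk sizes are additional metadata visible to share holders.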
Codebase Statistics
SourceSplit is a compact, focused package with 100% test coverage on cryptographic operations.
Module Breakdown
| Module | Purpose | Tests |
|---|---|---|
| source-splitter.ts | Core split pipeline (validate → pad → HMAC → XorIDA → hash) | 12 |
| source-reconstructor.ts | Core reconstruct pipeline (HMAC → XorIDA → unpad → verify) | 10 |
| types.ts | TypeScript interfaces (SourceDocument, SourceShare, etc.) | — |
| errors.ts | Error class hierarchy and conversion | — |
| index.ts | Barrel export (public API) | — |
Test Coverage
| Category | Test Count | Coverage |
|---|---|---|
| Config validation | 4 | 100% |
| XorIDA splitting | 8 | 100% |
| HMAC integrity | 6 | 100% |
| Document reconstruction | 7 | 100% |
| Abuse cases (data tampering, insufficient shares) | 6 | 100% |
Dependencies
Runtime: 0 npm packages
- @private.me/crypto (monorepo peer, XorIDA + HMAC + padding)
- @private.me/shared (monorepo peer, Result pattern + encoding)
- Web Crypto API (builtin, SHA-256 + cryptographic randomness)
Build/Dev: TypeScript, Vitest (test framework only)
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
npm install @private.me/sourcesplit
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While SourceSplit is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.