Fedlearn: Gradient Privacy via XorIDA
Federated learning with information-theoretic gradient privacy. Client gradients split via XorIDA threshold sharing across multiple aggregator nodes. No single aggregator sees complete gradient updates — model inversion and membership inference attacks become mathematically impossible, not computationally hard. Zero npm dependencies.
Executive Summary
Federated learning allows distributed model training without sharing raw data. But gradient updates themselves leak information — model inversion attacks can reconstruct training samples, membership inference can detect if specific data was used for training.
Fedlearn splits gradient updates via XorIDA (threshold sharing over GF(2)) across multiple independent aggregator nodes. A 2-of-3 configuration means any single compromised aggregator learns zero information about the gradient — not "computationally hard to break," but mathematically impossible.
Two core functions cover the entire workflow: splitGradient() takes a client's gradient update (Float32Array serialized as Uint8Array), generates an HMAC-SHA256 integrity tag, pads to the next odd prime, and splits into N shares with K-of-N reconstruction threshold. aggregateGradients() collects threshold shares from multiple clients, reconstructs each client's gradient (HMAC verification before reconstruction, fail closed), and computes a sample-weighted average for the training round.
Zero configuration out of the box. Zero npm runtime dependencies. Runs anywhere the Web Crypto API is available — Node.js, Deno, Bun, Cloudflare Workers, browsers. Dual ESM and CJS builds ship in a single package.
Developer Experience
Fedlearn provides structured error codes and comprehensive validation to help developers build reliable federated learning systems.
Structured Error Handling
Fedlearn uses a Result<T, E> pattern with detailed error structures. Every error includes a machine-readable code and human-readable message.
```typescript
type FedLearnError =
  | { code: 'INVALID_CONFIG'; message: string }
  | { code: 'SPLIT_FAILED'; message: string }
  | { code: 'HMAC_FAILED'; message: string }
  | { code: 'RECONSTRUCT_FAILED'; message: string }
  | { code: 'INSUFFICIENT_SHARES'; message: string }
  | { code: 'ROUND_MISMATCH'; message: string };
```
Error Categories
Fedlearn organizes 6 error codes across 3 categories:
| Category | Example Codes | When |
|---|---|---|
| Configuration | INVALID_CONFIG, ROUND_MISMATCH | Config validation, round consistency |
| Integrity | SPLIT_FAILED, HMAC_FAILED | XorIDA split failures, HMAC verification |
| Reconstruction | RECONSTRUCT_FAILED, INSUFFICIENT_SHARES | Share reconstruction, threshold enforcement |
The package also exports FedLearnError, FedLearnConfigError, FedLearnIntegrityError, and FedLearnReconstructError as error classes for try/catch consumers. Use toFedLearnError(code) to convert string codes to class instances.
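As a rough illustration of how a toFedLearnError-style mapping might be wired up (the class constructors and mapping table here are assumptions, not the package's internals):

```typescript
// Hypothetical sketch: map Fedlearn string codes to error classes.
// The real package exports toFedLearnError(); this is illustration only.
class FedLearnError extends Error {
  constructor(readonly code: string, message: string) {
    super(message);
  }
}
class FedLearnConfigError extends FedLearnError {}
class FedLearnIntegrityError extends FedLearnError {}
class FedLearnReconstructError extends FedLearnError {}

const CODE_TO_CLASS: Record<string, typeof FedLearnError> = {
  INVALID_CONFIG: FedLearnConfigError,
  ROUND_MISMATCH: FedLearnConfigError,
  SPLIT_FAILED: FedLearnIntegrityError,
  HMAC_FAILED: FedLearnIntegrityError,
  RECONSTRUCT_FAILED: FedLearnReconstructError,
  INSUFFICIENT_SHARES: FedLearnReconstructError,
};

function toFedLearnError(code: string, message = code): FedLearnError {
  const Cls = CODE_TO_CLASS[code] ?? FedLearnError;
  return new Cls(code, message);
}
```

A mapping like this lets Result-style consumers rethrow as typed exceptions when crossing into try/catch code.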
The Problem
Federated learning keeps raw training data on-device, but gradient updates themselves leak sensitive information about the training corpus.
Model inversion attacks. An attacker with access to gradient updates can reconstruct representative samples from the training data. In healthcare federated learning, this means patient records can be partially recovered from model updates.
Membership inference attacks. An adversary can determine whether a specific data point was used in training by analyzing gradient behavior. This violates privacy guarantees even when raw data never leaves the device.
Central aggregator is a single point of failure. Traditional federated learning routes all gradients through a central aggregation server. If that server is compromised, the attacker sees every gradient from every client — full visibility into the training corpus across all participants.
Differential privacy adds noise. DP-SGD injects Gaussian noise into gradients to provide statistical privacy. But this degrades model accuracy, requires careful hyperparameter tuning, and still relies on computational assumptions about the attacker's capabilities.
Real-World Use Cases
Six scenarios where Fedlearn provides information-theoretic gradient privacy for federated learning deployments.
- Train diagnostic models across multiple hospitals without sharing patient data. Gradient shares routed to 3 aggregator nodes — any single node learns zero information about patient records. (HIPAA compliant, 2-of-3 threshold)
- Federated fraud model training across multiple financial institutions. Transaction patterns stay private — gradient splits prevent reconstruction of customer behavior. (PCI DSS, sample-weighted aggregation)
- Train next-word prediction models across millions of devices. User typing data never reconstructible from gradients — information-theoretic privacy guarantee. (splitGradient() on-device, 2-of-3)
- Federated model training across intelligence agencies with classification boundaries. 3-of-5 threshold across classified and unclassified aggregators. (3-of-5, classified networks)
- Multi-institution research projects with competitive data. Each lab contributes gradients without revealing proprietary training corpus. (Academic collaboration, IP protection)
- Distributed training across IoT sensor networks. Gradient shares routed to edge aggregators — sensor data patterns unrecoverable from individual shares. (Low bandwidth, threshold aggregation)

Solution Architecture
Two core operations: gradient splitting on training clients and threshold aggregation on aggregator nodes.
Gradient Splitting
The training client computes a local gradient update (Float32Array), serializes it as Uint8Array, generates an HMAC-SHA256 tag, pads to the next odd prime (PKCS7), and splits via XorIDA into N shares. Each share includes metadata (clientId, round, modelId, index, threshold) and is base64-encoded for transport.
```typescript
import { splitGradient } from '@private.me/fedlearn';

const config = {
  aggregatorNodes: 3,
  threshold: 2,
  round: 0,
  modelId: 'fraud-v2',
};

const update = {
  clientId: 'hospital-A',
  round: 0,
  modelId: 'fraud-v2',
  gradients: new Uint8Array(new Float32Array([0.1, -0.3, 0.5]).buffer),
  sampleCount: 1000,
};

const result = await splitGradient(update, config);
if (!result.ok) throw new Error(result.error.message);

// result.value.shares = [share0, share1, share2]
// Send share[i] to aggregator[i]
```
Aggregation
The aggregator node collects threshold shares from all training clients for a given round. For each client, it verifies HMAC consistency across shares (all shares must have the same HMAC tag), reconstructs the padded gradient via XorIDA, verifies the HMAC on the reconstructed data (fail closed), unpads, and deserializes to Float32Array. Finally, it computes a sample-weighted average across all clients.
```typescript
import { aggregateGradients } from '@private.me/fedlearn';

// Collect shares from all clients (each client sends K shares)
const clientAShares = [shareA0_from_agg0, shareA1_from_agg1];
const clientBShares = [shareB0_from_agg0, shareB1_from_agg1];

const result = await aggregateGradients(
  [clientAShares, clientBShares],
  config
);
if (!result.ok) throw new Error(result.error.message);

// result.value.gradients = sample-weighted average gradient
// result.value.totalSamples = sum of all client sample counts
// result.value.clientIds = ['hospital-A', 'hospital-B']
```
Integration
Fedlearn integrates with existing federated learning frameworks by replacing the gradient transmission step with XorIDA split-channel delivery.
Installation
pnpm add @private.me/fedlearn @private.me/crypto @private.me/shared
Complete Training Round
```typescript
import { splitGradient, aggregateGradients } from '@private.me/fedlearn';

// ────────────────────────────────────────────
// CLIENT SIDE: Compute and split gradient
// ────────────────────────────────────────────
async function clientTrainingStep(model, localData, config) {
  // 1. Compute local gradient update
  const gradientArray = computeGradient(model, localData);

  // 2. Serialize Float32Array → Uint8Array
  const gradientBytes = new Uint8Array(gradientArray.buffer);

  // 3. Split gradient via XorIDA
  const update = {
    clientId: 'client-123',
    round: config.round,
    modelId: config.modelId,
    gradients: gradientBytes,
    sampleCount: localData.length,
  };
  const splitResult = await splitGradient(update, config);
  if (!splitResult.ok) throw new Error(splitResult.error.message);

  // 4. Send share[i] to aggregator[i]
  for (let i = 0; i < config.aggregatorNodes; i++) {
    await sendToAggregator(i, splitResult.value.shares[i]);
  }
}

// ────────────────────────────────────────────
// SERVER SIDE: Aggregate gradients
// ────────────────────────────────────────────
async function serverAggregationStep(model, config) {
  // 1. Collect threshold shares from all clients
  const allClientShares = await collectSharesFromClients(config.round);

  // 2. Reconstruct and aggregate
  const aggResult = await aggregateGradients(allClientShares, config);
  if (!aggResult.ok) throw new Error(aggResult.error.message);

  // 3. Apply weighted average to global model
  const avgGradient = new Float32Array(aggResult.value.gradients.buffer);
  applyGradientToModel(model, avgGradient);

  return model;
}
```
Configuration Options
| Parameter | Type | Description |
|---|---|---|
| aggregatorNodes | number | Total number of aggregator nodes (N). Must be ≥ 2. |
| threshold | number | Minimum shares for reconstruction (K). Must be ≥ 2 and ≤ N. |
| round | number | Training round number. Must match across all updates. |
| modelId | string | Model identifier. Ensures shares from different models don't mix. |
Security
Fedlearn provides information-theoretic gradient privacy via XorIDA threshold sharing with HMAC-SHA256 integrity verification.
Information-Theoretic Security
XorIDA splits over GF(2) provide unconditional security — an attacker with access to K-1 shares (where K is the reconstruction threshold) learns zero information about the original gradient. This is not "computationally hard to break" — it is mathematically impossible, regardless of computational resources.
In a 2-of-3 configuration, compromising any single aggregator node reveals nothing. The attacker must compromise at least 2 nodes to reconstruct any gradient.
HMAC Integrity
Every gradient split generates an HMAC-SHA256 tag over the padded data. During aggregation:
- Share consistency check: All shares for a client must have identical HMAC tags (fails if shares are from different splits).
- Post-reconstruction verification: After XorIDA reconstruction, the HMAC is verified on the padded data. If verification fails, reconstruction is rejected (fail closed).
Sample-Weighted Aggregation
Aggregation computes a weighted average based on each client's sampleCount. A client that trained on 10,000 samples contributes proportionally more than a client with 100 samples. This preserves statistical validity and prevents small-sample clients from biasing the global model.
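As a standalone sketch (not the package internals), the sample-weighted average over reconstructed Float32Array gradients looks like this:

```typescript
// Sample-weighted average of per-client gradients (illustration).
// Each client's gradient is weighted by sampleCount / totalSamples.
function weightedAverage(
  clients: { gradient: Float32Array; sampleCount: number }[]
): Float32Array {
  const total = clients.reduce((sum, c) => sum + c.sampleCount, 0);
  const dim = clients[0].gradient.length;
  const avg = new Float32Array(dim);
  for (const { gradient, sampleCount } of clients) {
    const w = sampleCount / total;
    for (let i = 0; i < dim; i++) avg[i] += w * gradient[i];
  }
  return avg;
}
```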
Randomness
All randomness via crypto.getRandomValues(). No Math.random() anywhere in the codebase.
Benchmarks
Performance measurements for gradient splitting and aggregation operations.
Splitting Latency
Gradient splitting latency scales linearly with gradient size. A 10,000-parameter gradient (40KB as Float32Array) splits in ~8-10ms on a modern CPU. Most of the time is spent in HMAC-SHA256 generation and XorIDA splitting.
Aggregation Latency
Aggregation latency depends on the number of clients and gradient size. For 10 clients with 10K parameters each, aggregation completes in ~150ms (15ms per client). HMAC verification and XorIDA reconstruction dominate.
Bandwidth Overhead
Share size is approximately (gradient_size / threshold) + metadata. For a 40KB gradient with 2-of-3 threshold, each share is ~20KB + ~200 bytes metadata. Total upload per client: ~60KB (3 shares × 20KB). Bandwidth overhead vs. plaintext: ~1.5x.
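These figures fall out of simple arithmetic; a sketch of the estimate (the ~200-byte metadata constant is the approximation used above, not an exact wire size):

```typescript
// Back-of-envelope share sizing for XorIDA splits (approximate model).
// shareSize ≈ gradientBytes / threshold + metadata; upload = N shares.
function estimateUpload(
  gradientBytes: number,
  n: number,
  k: number,
  metadata = 200
) {
  const shareSize = gradientBytes / k + metadata;
  const totalUpload = n * shareSize;
  return { shareSize, totalUpload, overhead: totalUpload / gradientBytes };
}
```

For a 40KB gradient at 2-of-3, this gives roughly 20KB per share, ~60KB total upload, and ~1.5x overhead versus plaintext, matching the numbers above.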
Honest Limitations
Fedlearn solves gradient privacy but does not address every federated learning challenge.
1. Does Not Prevent Model Poisoning
Fedlearn protects gradient privacy but does not validate gradient quality. A malicious client can submit poisoned gradients designed to degrade model performance or introduce backdoors. Defense requires additional techniques (robust aggregation, Byzantine fault tolerance, anomaly detection).
2. Requires Honest-Majority Aggregators
If K or more aggregators collude (where K is the reconstruction threshold), they can reconstruct gradients. A 2-of-3 configuration fails if any 2 aggregators are compromised. Choose N and K based on your threat model.
3. No Defense Against Sybil Attacks
Fedlearn does not authenticate clients or prevent a single adversary from registering multiple fake clients (Sybil attack). If 80% of "clients" are controlled by one adversary, gradient privacy is irrelevant — the adversary already controls the training corpus. Sybil resistance requires identity verification outside this package.
4. Bandwidth Overhead
Splitting gradients into N shares increases upload bandwidth by a factor of N. For 2-of-3 configuration, each client uploads 3 shares instead of 1 gradient. Low-bandwidth environments (mobile, IoT) may find this prohibitive.
5. No Compression
Fedlearn does not compress gradients before splitting. Gradient compression (top-k sparsification, quantization) can reduce bandwidth but must happen before splitGradient(). Compressing after splitting destroys the XorIDA shares.
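A hedged sketch of what pre-split top-k sparsification might look like (the helper is hypothetical; only its output would be passed to splitGradient()):

```typescript
// Hypothetical pre-split compression: keep only the k largest-magnitude
// gradient entries, zeroing the rest. Apply BEFORE splitGradient() —
// the XorIDA shares themselves must never be compressed.
function topKSparsify(gradient: Float32Array, k: number): Float32Array {
  const indexed = Array.from(gradient, (v, i) => ({ v, i }));
  indexed.sort((a, b) => Math.abs(b.v) - Math.abs(a.v));
  const keep = new Set(indexed.slice(0, k).map((e) => e.i));
  return gradient.map((v, i) => (keep.has(i) ? v : 0));
}
```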
6. Synchronous Aggregation Only
Aggregation waits for threshold shares from all clients before proceeding. Stragglers delay the entire round. Asynchronous federated learning (allow partial aggregation) is not supported.
Threat Model
Fedlearn defends against gradient leakage attacks under an honest-but-curious aggregator model.
Assumptions
| Assumption | Description |
|---|---|
| Honest clients | Clients execute splitGradient() correctly. Malicious clients can poison gradients (not addressed here). |
| Honest-but-curious aggregators | Aggregators follow protocol but may collude to reconstruct gradients. K-1 collusion reveals zero information. |
| Secure channels | Share transmission over TLS 1.3+. Network adversaries cannot intercept shares in transit. |
| No timing attacks | Aggregators cannot infer gradient content from timing side channels. |
Attacks Defended
- Model inversion: Attacker with K-1 shares cannot reconstruct training samples (information-theoretic guarantee).
- Membership inference: Single aggregator cannot determine if specific data was in training set.
- Gradient tampering: HMAC verification detects modified shares before reconstruction.
- Share replay: Round number prevents cross-round share reuse.
Attacks NOT Defended
- K-of-N aggregator collusion: If threshold aggregators collude, they can reconstruct gradients.
- Model poisoning: Malicious clients can submit adversarial gradients.
- Sybil attacks: Single adversary registering multiple fake clients.
- Client compromise: If client device is compromised, gradient is leaked before splitting.
Implementation Details
Low-level details for advanced integrators: error hierarchy, configuration validation, full API surface, and codebase statistics.
Error Hierarchy
Fedlearn exports 4 error classes for try/catch consumers.
```typescript
class FedLearnError extends Error {
  readonly code: string;
  readonly subCode?: string;
  readonly docUrl?: string;
}

class FedLearnConfigError extends FedLearnError {}
class FedLearnIntegrityError extends FedLearnError {}
class FedLearnReconstructError extends FedLearnError {}
```
| Code | Class | Description |
|---|---|---|
| INVALID_CONFIG | FedLearnConfigError | aggregatorNodes < 2, threshold < 2, threshold > aggregatorNodes, round < 0, or missing modelId |
| SPLIT_FAILED | FedLearnIntegrityError | Empty gradient data or XorIDA split failure |
| HMAC_FAILED | FedLearnIntegrityError | HMAC inconsistency across shares or verification failure after reconstruction |
| RECONSTRUCT_FAILED | FedLearnReconstructError | XorIDA reconstruction or unpadding failure |
| INSUFFICIENT_SHARES | FedLearnReconstructError | Fewer shares than threshold for a client group |
| ROUND_MISMATCH | FedLearnConfigError | Gradient update round does not match config round |
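The module breakdown below also lists an isFedLearnError() helper; a minimal type-guard sketch for catch blocks, where the caught value is `unknown` (illustration only — the real helper and class constructor ship with the package):

```typescript
// Sketch of an isFedLearnError-style type guard (hypothetical constructor).
class FedLearnError extends Error {
  constructor(readonly code: string, message: string) {
    super(message);
  }
}

function isFedLearnError(value: unknown): value is FedLearnError {
  return value instanceof FedLearnError;
}

// Narrow an unknown caught value before touching .code safely.
function describe(err: unknown): string {
  return isFedLearnError(err) ? `${err.code}: ${err.message}` : 'unknown error';
}
```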
Configuration
Configuration validation rules and common patterns.
Validation Rules
```typescript
// All must pass:
aggregatorNodes >= 2
threshold >= 2
threshold <= aggregatorNodes
round >= 0
modelId !== ''
```
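A sketch of these rules as a guard function returning the Result shape used elsewhere in this document (illustration, not the package's internal validator):

```typescript
// Config validation sketch mirroring the rules above; returns a
// Result-style object carrying INVALID_CONFIG on failure.
interface FedLearnConfig {
  aggregatorNodes: number;
  threshold: number;
  round: number;
  modelId: string;
}

type ValidationResult =
  | { ok: true }
  | { ok: false; error: { code: 'INVALID_CONFIG'; message: string } };

function validateConfig(c: FedLearnConfig): ValidationResult {
  const fail = (message: string): ValidationResult => ({
    ok: false,
    error: { code: 'INVALID_CONFIG', message },
  });
  if (c.aggregatorNodes < 2) return fail('aggregatorNodes must be >= 2');
  if (c.threshold < 2) return fail('threshold must be >= 2');
  if (c.threshold > c.aggregatorNodes) return fail('threshold must be <= aggregatorNodes');
  if (c.round < 0) return fail('round must be >= 0');
  if (c.modelId === '') return fail('modelId must be non-empty');
  return { ok: true };
}
```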
Common Configurations
| Pattern | N (nodes) | K (threshold) | Use Case |
|---|---|---|---|
| Minimal | 2 | 2 | Development, testing (no fault tolerance) |
| Standard | 3 | 2 | Production (1 node failure tolerance) |
| High Security | 5 | 3 | Sensitive data (2 node collusion required) |
| Government | 5 | 4 | Classified networks (minimal redundancy) |
Full API Surface
Complete public API exported by @private.me/fedlearn.
Codebase Statistics
Package metrics and test coverage.
Module Breakdown
| Module | Purpose | Lines |
|---|---|---|
| gradient-splitter.ts | XorIDA split, HMAC generation, PKCS7 padding | ~180 |
| gradient-aggregator.ts | Threshold reconstruction, HMAC verification, weighted averaging | ~200 |
| types.ts | TypeScript interfaces and error unions | ~75 |
| errors.ts | Error class hierarchy, toFedLearnError(), isFedLearnError() | ~90 |
| index.ts | Public API exports (barrel file) | ~35 |
Dependencies
| Package | Purpose |
|---|---|
| @private.me/crypto | XorIDA threshold sharing, HMAC, padding primitives |
| @private.me/shared | Result type, error utilities |
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
```
npm install @private.me/fedlearn
```
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While Fedlearn is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.