PRIVATE.ME · Technical White Paper

Fedlearn: Gradient Privacy via XorIDA

Federated learning with information-theoretic gradient privacy. Client gradients split via XorIDA threshold sharing across multiple aggregator nodes. No single aggregator sees complete gradient updates — model inversion and membership inference attacks become mathematically impossible, not computationally hard. Zero npm dependencies.

v0.1.0 · 76 tests passing · 5 modules · 0 npm deps · <10ms split · Dual ESM/CJS
Section 01

Executive Summary

Federated learning allows distributed model training without sharing raw data. But the gradient updates themselves leak information: model inversion attacks can reconstruct training samples, and membership inference attacks can detect whether specific data was used for training.

Fedlearn splits gradient updates via XorIDA (threshold sharing over GF(2)) across multiple independent aggregator nodes. A 2-of-3 configuration means any single compromised aggregator learns zero information about the gradient — not "computationally hard to break," but mathematically impossible.

Two core functions cover the entire workflow: splitGradient() takes a client's gradient update (Float32Array serialized as Uint8Array), generates an HMAC-SHA256 integrity tag, pads to the next odd prime, and splits into N shares with K-of-N reconstruction threshold. aggregateGradients() collects threshold shares from multiple clients, reconstructs each client's gradient (HMAC verification before reconstruction, fail closed), and computes a sample-weighted average for the training round.

Zero configuration out of the box. Zero npm runtime dependencies. Runs anywhere the Web Crypto API is available — Node.js, Deno, Bun, Cloudflare Workers, browsers. Dual ESM and CJS builds ship in a single package.

Section 02

Developer Experience

Fedlearn provides structured error codes and comprehensive validation to help developers build reliable federated learning systems.

Structured Error Handling

Fedlearn uses a Result<T, E> pattern with detailed error structures. Every error includes a machine-readable code and human-readable message.

Error types
type FedLearnError =
  | { code: 'INVALID_CONFIG'; message: string }
  | { code: 'SPLIT_FAILED'; message: string }
  | { code: 'HMAC_FAILED'; message: string }
  | { code: 'RECONSTRUCT_FAILED'; message: string }
  | { code: 'INSUFFICIENT_SHARES'; message: string }
  | { code: 'ROUND_MISMATCH'; message: string };
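A hedged sketch of consuming this union through the Result pattern. The error union is copied from above; the `Result` shape and the `describe` helper are illustrative assumptions, not package API:

```typescript
// Sketch: exhaustive handling of the documented error union.
type FedLearnError =
  | { code: 'INVALID_CONFIG'; message: string }
  | { code: 'SPLIT_FAILED'; message: string }
  | { code: 'HMAC_FAILED'; message: string }
  | { code: 'RECONSTRUCT_FAILED'; message: string }
  | { code: 'INSUFFICIENT_SHARES'; message: string }
  | { code: 'ROUND_MISMATCH'; message: string };

type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Hypothetical helper: map each machine-readable code to its category.
function describe(err: FedLearnError): string {
  switch (err.code) {
    case 'INVALID_CONFIG':
    case 'ROUND_MISMATCH':
      return `config: ${err.message}`;
    case 'SPLIT_FAILED':
    case 'HMAC_FAILED':
      return `integrity: ${err.message}`;
    case 'RECONSTRUCT_FAILED':
    case 'INSUFFICIENT_SHARES':
      return `reconstruction: ${err.message}`;
  }
}

const r: Result<number, FedLearnError> = {
  ok: false,
  error: { code: 'HMAC_FAILED', message: 'tag mismatch' },
};
if (!r.ok) console.log(describe(r.error)); // "integrity: tag mismatch"
```

Because the switch covers every member of the union, adding a new error code to the type surfaces a compile-time error at each handler.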

Error Categories

Fedlearn organizes 6 error codes across 3 categories:

| Category | Example Codes | When |
| --- | --- | --- |
| Configuration | INVALID_CONFIG, ROUND_MISMATCH | Config validation, round consistency |
| Integrity | SPLIT_FAILED, HMAC_FAILED | XorIDA split failures, HMAC verification |
| Reconstruction | RECONSTRUCT_FAILED, INSUFFICIENT_SHARES | Share reconstruction, threshold enforcement |
NAMED ERROR CLASSES
Fedlearn exports FedLearnError, FedLearnConfigError, FedLearnIntegrityError, and FedLearnReconstructError for try/catch consumers. Use toFedLearnError(code) to convert string codes to class instances.
Section 03

The Problem

Federated learning keeps raw training data on-device, but gradient updates themselves leak sensitive information about the training corpus.

Model inversion attacks. An attacker with access to gradient updates can reconstruct representative samples from the training data. In healthcare federated learning, this means patient records can be partially recovered from model updates.

Membership inference attacks. An adversary can determine whether a specific data point was used in training by analyzing gradient behavior. This violates privacy guarantees even when raw data never leaves the device.

Central aggregator is a single point of failure. Traditional federated learning routes all gradients through a central aggregation server. If that server is compromised, the attacker sees every gradient from every client — full visibility into the training corpus across all participants.

Differential privacy adds noise. DP-SGD injects Gaussian noise into gradients to provide statistical privacy. But this degrades model accuracy, requires careful hyperparameter tuning, and still relies on computational assumptions about the attacker's capabilities.

The Old Way

(Diagram.) Clients A and B each send their full plaintext gradient to a central aggregator, a single point of failure that sees all gradients. A compromised aggregator gains full visibility, enabling model inversion (reconstructing training samples), membership inference (detecting specific training data), and gradient leakage via updates.

The New Way

(Diagram.) Client A calls splitGradient() to produce 2-of-3 shares; share 1, share 2, and share 3 go to Aggregator 1, Aggregator 2, and Aggregator 3. aggregateGradients() then reconstructs each gradient and computes the weighted average. Properties: information-theoretic privacy (a single share yields zero knowledge), no model inversion (impossible, not merely hard), and HMAC integrity verified before reconstruction. Compromise any 1 node: learn nothing. K-of-N shares are needed for reconstruction.
Section 04

Real-World Use Cases

Six scenarios where Fedlearn provides information-theoretic gradient privacy for federated learning deployments.

🏥
Healthcare
Cross-Hospital Model Training

Train diagnostic models across multiple hospitals without sharing patient data. Gradient shares routed to 3 aggregator nodes — any single node learns zero information about patient records.

HIPAA compliant, 2-of-3 threshold
💹
Financial
Fraud Detection

Federated fraud model training across multiple financial institutions. Transaction patterns stay private — gradient splits prevent reconstruction of customer behavior.

PCI DSS, sample-weighted aggregation
📱
Mobile
On-Device Keyboard

Train next-word prediction models across millions of devices. User typing data never reconstructible from gradients — information-theoretic privacy guarantee.

splitGradient() on-device, 2-of-3
🏛
Government
Multi-Agency Intelligence

Federated model training across intelligence agencies with classification boundaries. 3-of-5 threshold across classified and unclassified aggregators.

3-of-5, classified networks
🤖
AI / ML
Research Collaboration

Multi-institution research projects with competitive data. Each lab contributes gradients without revealing proprietary training corpus.

Academic collaboration, IP protection
📊
IoT
Edge ML Training

Distributed training across IoT sensor networks. Gradient shares routed to edge aggregators — sensor data patterns unrecoverable from individual shares.

Low bandwidth, threshold aggregation
Section 05

Solution Architecture

Two core operations: gradient splitting on training clients and threshold aggregation on aggregator nodes.

Aggregation
Server-side
  • Threshold reconstruction (K-of-N)
  • HMAC verification before reconstruction
  • Sample-weighted averaging
  • Float32Array output

Gradient Splitting

The training client computes a local gradient update (Float32Array), serializes it as Uint8Array, generates an HMAC-SHA256 tag, pads to the next odd prime (PKCS7), and splits via XorIDA into N shares. Each share includes metadata (clientId, round, modelId, index, threshold) and is base64-encoded for transport.

Client gradient splitting
import { splitGradient } from '@private.me/fedlearn';

const config = {
  aggregatorNodes: 3,
  threshold: 2,
  round: 0,
  modelId: 'fraud-v2',
};

const update = {
  clientId: 'hospital-A',
  round: 0,
  modelId: 'fraud-v2',
  gradients: new Uint8Array(new Float32Array([0.1, -0.3, 0.5]).buffer),
  sampleCount: 1000,
};

const result = await splitGradient(update, config);
if (!result.ok) throw new Error(result.error.message);

// result.value.shares = [share0, share1, share2]
// Send share[i] to aggregator[i]

Aggregation

The aggregator node collects threshold shares from all training clients for a given round. For each client, it verifies HMAC consistency across shares (all shares must have the same HMAC tag), reconstructs the padded gradient via XorIDA, verifies the HMAC on the reconstructed data (fail closed), unpads, and deserializes to Float32Array. Finally, it computes a sample-weighted average across all clients.

Server aggregation
import { aggregateGradients } from '@private.me/fedlearn';

// Collect K shares per client (here K = 2: one share from each of two aggregators)
const clientAShares = [clientA_share_from_agg0, clientA_share_from_agg1];
const clientBShares = [clientB_share_from_agg0, clientB_share_from_agg1];

const result = await aggregateGradients(
  [clientAShares, clientBShares],
  config
);

if (!result.ok) throw new Error(result.error.message);

// result.value.gradients = sample-weighted average gradient
// result.value.totalSamples = sum of all client sample counts
// result.value.clientIds = ['hospital-A', 'hospital-B']
Section 06

Integration

Fedlearn integrates with existing federated learning frameworks by replacing the gradient transmission step with XorIDA split-channel delivery.

Installation

Package installation
pnpm add @private.me/fedlearn @private.me/crypto @private.me/shared

Complete Training Round

Full federated learning round
import { splitGradient, aggregateGradients } from '@private.me/fedlearn';

// ────────────────────────────────────────────
// CLIENT SIDE: Compute and split gradient
// ────────────────────────────────────────────
async function clientTrainingStep(model, localData, config) {
  // 1. Compute local gradient update
  const gradientArray = computeGradient(model, localData);

  // 2. Serialize Float32Array → Uint8Array
  const gradientBytes = new Uint8Array(gradientArray.buffer);

  // 3. Split gradient via XorIDA
  const update = {
    clientId: 'client-123',
    round: config.round,
    modelId: config.modelId,
    gradients: gradientBytes,
    sampleCount: localData.length,
  };

  const splitResult = await splitGradient(update, config);
  if (!splitResult.ok) throw new Error(splitResult.error.message);

  // 4. Send share[i] to aggregator[i]
  for (let i = 0; i < config.aggregatorNodes; i++) {
    await sendToAggregator(i, splitResult.value.shares[i]);
  }
}

// ────────────────────────────────────────────
// SERVER SIDE: Aggregate gradients
// ────────────────────────────────────────────
async function serverAggregationStep(model, config) {
  // 1. Collect threshold shares from all clients
  const allClientShares = await collectSharesFromClients(config.round);

  // 2. Reconstruct and aggregate
  const aggResult = await aggregateGradients(allClientShares, config);
  if (!aggResult.ok) throw new Error(aggResult.error.message);

  // 3. Apply weighted average to global model
  // (use byteOffset/byteLength in case `gradients` is a view into a larger buffer)
  const g = aggResult.value.gradients;
  const avgGradient = new Float32Array(g.buffer, g.byteOffset, g.byteLength / 4);

  applyGradientToModel(model, avgGradient);
  return model;
}

Configuration Options

| Parameter | Type | Description |
| --- | --- | --- |
| aggregatorNodes | number | Total number of aggregator nodes (N). Must be ≥ 2. |
| threshold | number | Minimum shares for reconstruction (K). Must be ≥ 2 and ≤ N. |
| round | number | Training round number. Must match across all updates. |
| modelId | string | Model identifier. Ensures shares from different models don't mix. |
Section 07

Security

Fedlearn provides information-theoretic gradient privacy via XorIDA threshold sharing with HMAC-SHA256 integrity verification.

Information-Theoretic Security

XorIDA splits over GF(2) provide unconditional security — an attacker with access to K-1 shares (where K is the reconstruction threshold) learns zero information about the original gradient. This is not "computationally hard to break" — it is mathematically impossible, regardless of computational resources.

In a 2-of-3 configuration, compromising any single aggregator node reveals nothing. The attacker must compromise at least 2 nodes to reconstruct any gradient.

HMAC Integrity

Every gradient split generates an HMAC-SHA256 tag over the padded data. During aggregation:

  1. Share consistency check: All shares for a client must have identical HMAC tags (fails if shares are from different splits).
  2. Post-reconstruction verification: After XorIDA reconstruction, the HMAC is verified on the padded data. If verification fails, reconstruction is rejected (fail closed).

Sample-Weighted Aggregation

Aggregation computes a weighted average based on each client's sampleCount. A client that trained on 10,000 samples contributes proportionally more than a client with 100 samples. This preserves statistical validity and prevents small-sample clients from biasing the global model.
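A simplified sketch of the weighting described above. The package applies this after reconstructing each client's Float32Array; `weightedAverage` here is illustrative, not package API:

```typescript
// Sketch: average gradients weighted by each client's sample count.
function weightedAverage(
  clients: { gradients: Float32Array; sampleCount: number }[]
): Float32Array {
  const totalSamples = clients.reduce((s, c) => s + c.sampleCount, 0);
  const avg = new Float32Array(clients[0].gradients.length);
  for (const { gradients, sampleCount } of clients) {
    const w = sampleCount / totalSamples; // proportional contribution
    for (let i = 0; i < avg.length; i++) avg[i] += w * gradients[i];
  }
  return avg;
}

// A 10,000-sample client outweighs a 100-sample client ~99:1:
const avg = weightedAverage([
  { gradients: new Float32Array([1, 0]), sampleCount: 10_000 },
  { gradients: new Float32Array([0, 1]), sampleCount: 100 },
]);
// avg[0] ≈ 0.990, avg[1] ≈ 0.0099
```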

Randomness

All randomness via crypto.getRandomValues(). No Math.random() anywhere in the codebase.

SECURITY GUARANTEE
Fedlearn provides information-theoretic privacy for gradients: K-1 shares reveal zero bits of information about the original gradient. This holds unconditionally, independent of computational assumptions or adversary capabilities.
Section 08

Benchmarks

Performance measurements for gradient splitting and aggregation operations.

  • Split (10K params): <10ms
  • Aggregate (10K params): <15ms
  • Overhead vs plaintext: ~1.5x
  • npm deps: 0

Splitting Latency

Gradient splitting latency scales linearly with gradient size. A 10,000-parameter gradient (40KB as Float32Array) splits in ~8-10ms on a modern CPU. Most of the time is spent in HMAC-SHA256 generation and XorIDA splitting.

Aggregation Latency

Aggregation latency depends on the number of clients and gradient size. For 10 clients with 10K parameters each, aggregation completes in ~150ms (15ms per client). HMAC verification and XorIDA reconstruction dominate.

Bandwidth Overhead

Share size is approximately (gradient_size / threshold) + metadata. For a 40KB gradient with 2-of-3 threshold, each share is ~20KB + ~200 bytes metadata. Total upload per client: ~60KB (3 shares × 20KB). Bandwidth overhead vs. plaintext: ~1.5x.
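The arithmetic as a quick sketch. The 200-byte metadata figure is the approximation quoted above, not an exact constant:

```typescript
// Back-of-envelope share sizing: (gradient_size / threshold) + metadata.
function shareBytes(gradientBytes: number, threshold: number, metadata = 200): number {
  return Math.ceil(gradientBytes / threshold) + metadata;
}

// Upload overhead vs. sending the plaintext gradient once.
function uploadOverhead(gradientBytes: number, n: number, k: number): number {
  return (n * shareBytes(gradientBytes, k)) / gradientBytes;
}

const grad = 40_000;        // 10K params × 4 bytes (Float32)
shareBytes(grad, 2);        // 20,200 bytes per share for a 2-of-3 split
uploadOverhead(grad, 3, 2); // ≈ 1.5x (3 half-size shares plus metadata)
```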

PERFORMANCE NOTE
All benchmarks measured on Node.js 20.x with Web Crypto API on a modern x64 CPU. WASM acceleration for XorIDA is a future optimization. Current TypeScript implementation is production-ready for gradients up to 1M parameters.
Section 09

Honest Limitations

Fedlearn solves gradient privacy but does not address every federated learning challenge.

1. Does Not Prevent Model Poisoning

Fedlearn protects gradient privacy but does not validate gradient quality. A malicious client can submit poisoned gradients designed to degrade model performance or introduce backdoors. Defense requires additional techniques (robust aggregation, Byzantine fault tolerance, anomaly detection).

2. Requires Honest-Majority Aggregators

If K or more aggregators collude (where K is the reconstruction threshold), they can reconstruct gradients. A 2-of-3 configuration fails if any 2 aggregators are compromised. Choose N and K based on your threat model.

3. No Defense Against Sybil Attacks

Fedlearn does not authenticate clients or prevent a single adversary from registering multiple fake clients (Sybil attack). If 80% of "clients" are controlled by one adversary, gradient privacy is irrelevant — the adversary already controls the training corpus. Sybil resistance requires identity verification outside this package.

4. Bandwidth Overhead

Splitting a gradient into N shares of roughly 1/K its size increases upload bandwidth by a factor of about N/K. For a 2-of-3 configuration, each client uploads 3 half-size shares instead of 1 gradient, roughly 1.5x the plaintext upload. Low-bandwidth environments (mobile, IoT) may still find this prohibitive.

5. No Compression

Fedlearn does not compress gradients before splitting. Gradient compression (top-k sparsification, quantization) can reduce bandwidth but must happen before splitGradient(). Compressing after splitting destroys the XorIDA shares.
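A sketch of top-k sparsification applied before splitting, per the ordering requirement above. The index/value output format is an illustrative choice, not a package requirement:

```typescript
// Sketch: keep only the k largest-magnitude gradient entries, then
// serialize the sparse form and pass THOSE bytes to splitGradient().
// Compressing after splitting would destroy the XorIDA shares.
function topK(
  gradient: Float32Array,
  k: number
): { indices: Uint32Array; values: Float32Array } {
  const order = Array.from(gradient.keys())
    .sort((a, b) => Math.abs(gradient[b]) - Math.abs(gradient[a])) // by magnitude
    .slice(0, k)
    .sort((a, b) => a - b); // restore index order for compact encoding
  return {
    indices: Uint32Array.from(order),
    values: Float32Array.from(order, (i) => gradient[i]),
  };
}

const g = new Float32Array([0.01, -0.9, 0.05, 0.7]);
const sparse = topK(g, 2); // keeps indices 1 and 3 (largest magnitudes)
```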

6. Synchronous Aggregation Only

Aggregation waits for threshold shares from all clients before proceeding. Stragglers delay the entire round. Asynchronous federated learning (allow partial aggregation) is not supported.

USE CASE BOUNDARIES
Fedlearn is a gradient privacy layer, not a complete federated learning framework. You still need: client selection, model distribution, learning rate scheduling, convergence detection, and secure aggregator infrastructure. This package handles gradient splitting and aggregation only.
Section 10

Threat Model

Fedlearn defends against gradient leakage attacks under an honest-but-curious aggregator model.

Assumptions

| Assumption | Description |
| --- | --- |
| Honest clients | Clients execute splitGradient() correctly. Malicious clients can poison gradients (not addressed here). |
| Honest-but-curious aggregators | Aggregators follow protocol but may collude to reconstruct gradients. K-1 collusion reveals zero information. |
| Secure channels | Share transmission over TLS 1.3+. Network adversaries cannot intercept shares in transit. |
| No timing attacks | Aggregators cannot infer gradient content from timing side channels. |

Attacks Defended

  • Model inversion: Attacker with K-1 shares cannot reconstruct training samples (information-theoretic guarantee).
  • Membership inference: Single aggregator cannot determine if specific data was in training set.
  • Gradient tampering: HMAC verification detects modified shares before reconstruction.
  • Share replay: Round number prevents cross-round share reuse.
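The round-binding check in the last bullet reduces to a metadata comparison. Field names follow the share metadata listed in Section 05; `acceptShare` is a hypothetical helper, not package API:

```typescript
// Sketch: reject shares whose round or model does not match the current
// aggregation config, which blocks cross-round (and cross-model) replay.
interface ShareMeta { clientId: string; round: number; modelId: string; index: number }

function acceptShare(
  meta: ShareMeta,
  config: { round: number; modelId: string }
): boolean {
  return meta.round === config.round && meta.modelId === config.modelId;
}

// A share captured in round 3 cannot be replayed into round 4:
acceptShare({ clientId: 'a', round: 3, modelId: 'm', index: 0 }, { round: 4, modelId: 'm' }); // false
```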

Attacks NOT Defended

  • K-of-N aggregator collusion: If threshold aggregators collude, they can reconstruct gradients.
  • Model poisoning: Malicious clients can submit adversarial gradients.
  • Sybil attacks: Single adversary registering multiple fake clients.
  • Client compromise: If client device is compromised, gradient is leaked before splitting.
Advanced Topics

Implementation Details

Low-level details for advanced integrators: error hierarchy, configuration validation, full API surface, and codebase statistics.

Appendix A1

Error Hierarchy

Fedlearn exports 4 error classes for try/catch consumers.

Error class hierarchy
class FedLearnError extends Error {
  readonly code: string;
  readonly subCode?: string;
  readonly docUrl?: string;
}

class FedLearnConfigError extends FedLearnError {}
class FedLearnIntegrityError extends FedLearnError {}
class FedLearnReconstructError extends FedLearnError {}
| Code | Class | Description |
| --- | --- | --- |
| INVALID_CONFIG | FedLearnConfigError | aggregatorNodes < 2, threshold < 2, threshold > aggregatorNodes, round < 0, or missing modelId |
| SPLIT_FAILED | FedLearnIntegrityError | Empty gradient data or XorIDA split failure |
| HMAC_FAILED | FedLearnIntegrityError | HMAC inconsistency across shares or verification failure after reconstruction |
| RECONSTRUCT_FAILED | FedLearnReconstructError | XorIDA reconstruction or unpadding failure |
| INSUFFICIENT_SHARES | FedLearnReconstructError | Fewer shares than threshold for a client group |
| ROUND_MISMATCH | FedLearnConfigError | Gradient update round does not match config round |
Appendix A2

Configuration

Configuration validation rules and common patterns.

Validation Rules

validateConfig() checks
// All must pass:
aggregatorNodes >= 2
threshold >= 2
threshold <= aggregatorNodes
round >= 0
modelId !== ''

Common Configurations

| Pattern | N (nodes) | K (threshold) | Use Case |
| --- | --- | --- | --- |
| Minimal | 2 | 2 | Development, testing (no fault tolerance) |
| Standard | 3 | 2 | Production (1 node failure tolerance) |
| High Security | 5 | 3 | Sensitive data (2 node collusion required) |
| Government | 5 | 4 | Classified networks (minimal redundancy) |
Appendix A3

Full API Surface

Complete public API exported by @private.me/fedlearn.

splitGradient(update: GradientUpdate, config: FedLearnConfig): Promise<Result<GradientSplitResult, FedLearnError>>
Split a gradient update via XorIDA for distribution to aggregator nodes. HMAC generated before splitting for integrity verification.
aggregateGradients(shares: GradientShare[][], config: FedLearnConfig): Promise<Result<AggregatedGradient, FedLearnError>>
Reconstruct gradients from threshold shares and compute weighted average. HMAC verification before reconstruction (fail closed).
validateConfig(config: FedLearnConfig): Result<true, FedLearnError>
Validate federated learning configuration. Checks aggregatorNodes, threshold, round, modelId constraints.
packHmac(key: Uint8Array, signature: Uint8Array): string
Pack HMAC key + signature into a single base64 string for share transmission.
unpackHmac(hmacB64: string): { key: Uint8Array; signature: Uint8Array }
Unpack combined HMAC string into key and signature bytes for verification.
toFedLearnError(code: string): FedLearnError
Convert a string error code to a typed FedLearnError instance. Handles colon-separated sub-codes.
isFedLearnError(value: unknown): value is FedLearnError
Type guard for FedLearnError instances.
Appendix A4

Codebase Statistics

Package metrics and test coverage.

  • Lines of code: 678
  • Source modules: 5
  • Test cases: 76
  • Test files: 2

Module Breakdown

| Module | Purpose | Lines |
| --- | --- | --- |
| gradient-splitter.ts | XorIDA split, HMAC generation, PKCS7 padding | ~180 |
| gradient-aggregator.ts | Threshold reconstruction, HMAC verification, weighted averaging | ~200 |
| types.ts | TypeScript interfaces and error unions | ~75 |
| errors.ts | Error class hierarchy, toFedLearnError(), isFedLearnError() | ~90 |
| index.ts | Public API exports (barrel file) | ~35 |

Dependencies

| Package | Purpose |
| --- | --- |
| @private.me/crypto | XorIDA threshold sharing, HMAC, padding primitives |
| @private.me/shared | Result type, error utilities |

Deployment Options

📦

SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

  • npm install @private.me/fedlearn
  • TypeScript/JavaScript SDK
  • Full source access
  • Enterprise support available
🏢

On-Premise Upon Request

Enterprise CLI for compliance, air-gap, or data residency requirements.

  • Complete data sovereignty
  • Air-gap capable deployment
  • Custom SLA + dedicated support
  • Professional services included

Enterprise On-Premise Deployment

While Fedlearn is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

  • Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
  • Air-gapped environments — SCIF, classified networks, offline operations
  • Data residency requirements — EU GDPR, China data laws, government mandates
  • Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

Contact sales for assessment and pricing →