PRIVATE.ME · Technical White Paper

Xgene: Cross-Jurisdiction Genomic Data Sharing

Genomic data is the most sensitive personal information that exists — irreversible, lifelong, and shared with biological relatives. Xgene splits genomic datasets across distinct legal jurisdictions using XorIDA threshold sharing, ensuring no single jurisdiction holds enough data to reconstruct a complete genome. Consent-first architecture enforces authorization before any cryptographic operation.

v0.1.0 · Consent-enforced · 2MB chunks · SHA-256 integrity · 0 npm deps · TypeScript strict
Section 01

Executive Summary

Cross-border genomic research often requires data to comply with GDPR, HIPAA, PIPEDA, and APPI simultaneously. Xgene addresses this by splitting genomes across jurisdictions so that the shares held in any one jurisdiction are insufficient to reconstruct the data, whatever a single regulator or court order demands.

Two functions cover the core workflow. splitGenome() takes the bytes of a VCF/FASTQ/BAM file, its metadata, a configuration, and a consent record; it validates that consent covers every target jurisdiction, splits the genome into XorIDA shares (default 2MB chunks), and returns a manifest plus shares assigned to jurisdiction-specific endpoints, which the caller then delivers. reconstructGenome() takes the manifest and the threshold shares collected from authorized jurisdictions, verifies HMAC integrity, and reconstructs the original genome with SHA-256 hash verification.

When an attacker compromises a single jurisdiction's storage, they learn exactly zero bits about the genome. This is not a matter of computational difficulty; it is information-theoretically impossible. XorIDA operates over GF(2) with threshold K (at least 2), and any K-1 or fewer shares reveal zero information regardless of computing power, including quantum computers.

Consent validation is mandatory and runs before any cryptographic operation. The system rejects a split request if the consent record does not explicitly authorize every target jurisdiction. This enforces consent at the protocol level rather than only as policy; broader regulatory compliance remains the data controller's responsibility.
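The ordering above (required fields first, then jurisdiction coverage, all before any splitting) can be sketched as a standalone check. This is a minimal illustration, not the package's validateConsent(): the ConsentRecordSketch shape is assumed from the required fields listed in the error taxonomy, and checkConsentCoverage is a hypothetical name.

```typescript
// Hypothetical consent shape; field names follow the CONSENT_INVALID resolution guidance.
interface ConsentRecordSketch {
  subjectId: string;
  purpose: string;
  grantedBy: string;
  grantedAt: string; // ISO 8601 timestamp
  authorizedJurisdictions: string[]; // ISO 3166-1 alpha-2 codes
  expiresAt?: string;
}

type ConsentCheck =
  | { ok: true }
  | { ok: false; code: 'CONSENT_INVALID' | 'JURISDICTION_MISMATCH'; missing: string[] };

// Runs before any cryptographic work: required fields first, then jurisdiction coverage.
function checkConsentCoverage(consent: ConsentRecordSketch, targets: string[]): ConsentCheck {
  const scalarFields = ['subjectId', 'purpose', 'grantedBy', 'grantedAt'] as const;
  const missing: string[] = scalarFields.filter((field) => !consent[field]);
  if (!Array.isArray(consent.authorizedJurisdictions)) missing.push('authorizedJurisdictions');
  if (missing.length > 0) return { ok: false, code: 'CONSENT_INVALID', missing };

  // Every target jurisdiction must be explicitly authorized; no implicit or wildcard grants.
  const authorized = new Set(consent.authorizedJurisdictions);
  const uncovered = targets.filter((code) => !authorized.has(code));
  if (uncovered.length > 0) return { ok: false, code: 'JURISDICTION_MISMATCH', missing: uncovered };

  return { ok: true };
}
```

Checking fields before coverage mirrors the split between CONSENT_INVALID and JURISDICTION_MISMATCH in the error taxonomy, and returning the uncovered codes tells the caller exactly which authorizations to request.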

Section 02

Developer Experience

Xgene provides structured error codes and Result-based error handling to make cross-border genomic sharing safe and debuggable.

Error Categories

Xgene organizes 7 error codes across 4 categories for systematic handling:

| Category | Codes | When |
|---|---|---|
| Configuration | INVALID_CONFIG | Fewer than 2 jurisdictions, threshold out of range, invalid chunk size |
| Consent | CONSENT_INVALID, JURISDICTION_MISMATCH | Missing consent fields, jurisdictions not covered by authorization |
| Cryptographic | SPLIT_FAILED, RECONSTRUCT_FAILED, HMAC_FAILED | XorIDA split/reconstruction failure, integrity check failure |
| Threshold | INSUFFICIENT_SHARES | Fewer than K shares available for chunk reconstruction |

Result Pattern

splitGenome() and reconstructGenome() return Result<T, GenomicShareError>; the validators return null when valid and an error Result otherwise. Every error includes a machine-readable code and a human-readable message.

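Result-based error handling
```typescript
import { splitGenome } from '@private.me/genomicshare';

// Inside an async function; genomeData, metadata, config, and consent come from the caller.
const result = await splitGenome(genomeData, metadata, config, consent);

if (!result.ok) {
  switch (result.error.code) {
    case 'CONSENT_INVALID':
      console.error('Missing required consent fields');
      break;
    case 'JURISDICTION_MISMATCH':
      console.error('Consent does not cover target jurisdictions');
      break;
    case 'SPLIT_FAILED':
      console.error('XorIDA split operation failed');
      break;
    default:
      // INVALID_CONFIG and any other code
      console.error(`Split failed with ${result.error.code}`);
  }
  return;
}

// Success: use result.value.manifest and result.value.shares
const { manifest, shares } = result.value;
```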
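A discriminated-union Result makes the seven codes checkable at compile time. The sketch below is an assumption about how such types could look, not the package's published definitions: the field names (ok, value, error, code, message) follow the snippet above, and categorize/describe are hypothetical helpers. The `never` default turns a forgotten code into a type error.

```typescript
type GenomicShareErrorCode =
  | 'INVALID_CONFIG'
  | 'CONSENT_INVALID'
  | 'JURISDICTION_MISMATCH'
  | 'SPLIT_FAILED'
  | 'RECONSTRUCT_FAILED'
  | 'HMAC_FAILED'
  | 'INSUFFICIENT_SHARES';

interface GenomicShareError {
  code: GenomicShareErrorCode; // machine-readable
  message: string; // human-readable
}

type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

type ErrorCategory = 'Configuration' | 'Consent' | 'Cryptographic' | 'Threshold';

// Exhaustive: adding a code to the union without a case here fails to compile.
function categorize(code: GenomicShareErrorCode): ErrorCategory {
  switch (code) {
    case 'INVALID_CONFIG':
      return 'Configuration';
    case 'CONSENT_INVALID':
    case 'JURISDICTION_MISMATCH':
      return 'Consent';
    case 'SPLIT_FAILED':
    case 'RECONSTRUCT_FAILED':
    case 'HMAC_FAILED':
      return 'Cryptographic';
    case 'INSUFFICIENT_SHARES':
      return 'Threshold';
    default: {
      const unreachable: never = code;
      throw new Error(`Unhandled error code: ${String(unreachable)}`);
    }
  }
}

// Narrowing on `ok` gives typed access to either `value` or `error`.
function describe<T>(result: Result<T, GenomicShareError>): string {
  return result.ok
    ? 'ok'
    : `[${categorize(result.error.code)}] ${result.error.code}: ${result.error.message}`;
}
```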
Section 03

The Problem

Sharing genomic data across jurisdictions routinely conflicts with privacy law, and single-location storage creates a permanent breach risk for the most sensitive personal data that exists.

Genomic data is irreversible. Unlike passwords, you cannot change your genome after a breach. A single compromise exposes lifelong health risks, ancestry, paternity, predispositions to disease, and information about biological relatives who never consented.

Cross-border research is legally paralyzed. EU biobanks face strict limits on transfers to US institutions under GDPR's Chapter V transfer rules (Articles 44-49) and the Article 5 data-minimization principle. Canadian researchers face PIPEDA restrictions. Japanese pharmacogenomics projects struggle to pool data across APPI and HIPAA jurisdictions. In practice, each jurisdiction pushes for the data to stay within its borders.

Centralized storage is a single point of failure. The 23andMe breach exposed 6.9 million genetic profiles. MyHeritage leaked 92 million accounts. GEDmatch was accessed without warrants. Every centralized genomic database is a catastrophic breach waiting to happen.

Existing solutions fail. Encryption-at-rest still leaves the database server with full access to plaintext during queries. Federated learning leaks information through gradient updates. Homomorphic encryption is too slow for genome-scale computation. Differential privacy adds noise that destroys rare variant signals critical for precision medicine.

| Approach | Cross-Border | Single-Location Risk | Query Performance | Rare Variants |
|---|---|---|---|---|
| Centralized DB | Violates GDPR/PIPEDA | Single breach = total loss | Fast | Preserved |
| Federated Learning | Possible | Partial (gradient leakage) | Slow | Gradient noise |
| Homomorphic Encryption | Possible | Encrypted | 500-5000x slower | Preserved |
| Differential Privacy | Possible | Database sees plaintext | Fast | Noise destroys signal |
| Xgene | Compliant | K-1 shares reveal nothing | ~66ms split per 2MB chunk; analysis requires reconstruction | Bit-perfect reconstruction |
Section 04

Real-World Use Cases

Six scenarios where cross-jurisdiction genomic splitting enables research that cross-border restrictions currently block.

🧬
Rare Disease Research
Multi-Country Rare Variant Pooling

Orphan diseases require pooling genomic data from multiple countries to reach statistical power. Xgene splits genomes across EU, US, and CA jurisdictions — each sees only unintelligible shares.

3-of-5 threshold, GDPR+HIPAA+PIPEDA
💊
Pharmacogenomics
Cross-Border Drug Response Studies

Pharmacogenomic trials require ethnic diversity that is hard to achieve within a single jurisdiction. Jurisdiction-aware splitting supports Japanese APPI, EU GDPR, and US HIPAA requirements within one study.

2-of-3 threshold, population stratification preserved
🏛
Population Health
Biobank Federation

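National biobanks (UK Biobank, All of Us, FinnGen) face legal barriers to sharing complete genomes across borders. XorIDA splitting enables pooled storage in which each biobank retains only non-reconstructable shares; analysis requires K biobanks to cooperate in an authorized reconstruction.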

K-of-N threshold per protocol, consent-tracked
🛡
Defense
Military Biodefense Genomics

Defense genomic surveillance requires splitting classified genomic threat data across allied nations with zero single-nation reconstruction. NATO biodefense collaboration without centralized exposure.

3-of-5 threshold, JWICS-compliant distribution
🔬
Enterprise
Multi-Jurisdictional Clinical Trials

Large pharma clinical trials can collect genomic data across 50+ countries. Xgene ensures each trial site retains only jurisdiction-compliant shares, while pooled variant analysis remains possible after an authorized threshold reconstruction.

2MB chunks, VCF format, per-site consent tracking
📊
Finance
Insurance Genomic Privacy

Life insurance genomic risk scoring needs genetic data, which GDPR treats as special category data that is difficult to centralize. Xgene avoids single-database exposure: shares stay distributed, and scoring runs only after an authorized threshold reconstruction.

EU+UK jurisdiction split, Article 9 special category data
Section 05

Solution Architecture

The architecture has three layers (splitting, jurisdiction routing, and integrity), all gated by a consent-validation check that runs first.

Splitting Layer: XorIDA over GF(2)
  • 2MB default chunk size
  • PKCS#7 padding for alignment
  • HMAC-SHA256 per share
  • 1-based share indices

Jurisdiction Routing: ISO 3166-1 codes
  • Share[i] → Jurisdiction[i]
  • One HTTPS endpoint per jurisdiction (the caller performs delivery)
  • GDPR, HIPAA, PIPEDA, APPI aware

Integrity Layer: SHA-256
  • Hash original genome before split
  • Store hash in manifest
  • Verify after reconstruction
  • Reject if mismatch
Figure (data flow): Genome (VCF/FASTQ, 2MB chunks) → consent validation → XorIDA GF(2) split → shares to DE (GDPR), CA (PIPEDA), JP (APPI) → K-of-N reconstruction → genome restored, SHA-256 ✓. In this 2-of-3 example, any 2 shares reconstruct; a single-jurisdiction compromise yields zero information.
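The integrity layer's four steps can be sketched with Node's built-in node:crypto module (the benchmarks cite the Web Crypto API; node:crypto is used here only because it is synchronous). The ManifestSketch field names and the helper functions are hypothetical, not the package's GenomicManifest.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical manifest fields; the real GenomicManifest may name them differently.
interface ManifestSketch {
  dataHash: string; // SHA-256 of the full original genome, hex-encoded
  originalSize: number; // bytes
}

function sha256Hex(data: Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

// Before splitting: hash the whole genome once and record it in the manifest.
function buildManifest(genome: Uint8Array): ManifestSketch {
  return { dataHash: sha256Hex(genome), originalSize: genome.length };
}

// After reconstruction: reject on any size or hash mismatch (fail closed).
function verifyReconstruction(manifest: ManifestSketch, restored: Uint8Array): boolean {
  return restored.length === manifest.originalSize && sha256Hex(restored) === manifest.dataHash;
}
```

A single whole-genome hash catches any error that per-share checks miss, such as chunks reassembled in the wrong order.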
Section 05b

Chunking Strategy

XorIDA operates on fixed-size chunks. Whole-genome sequencing files (typically 100-200GB) are split into 2MB chunks by default, and each chunk is split into shares independently.

Why 2MB? It is a trade-off between memory efficiency and parallelism. Smaller chunks (e.g., 512KB) enlarge the manifest and add per-chunk overhead. Larger chunks (e.g., 16MB) increase memory pressure and reduce parallelization opportunities. A 2MB chunk fits comfortably in the Node.js default heap and allows ~100 chunks to be processed in parallel on typical research servers.

| Chunk Size | Chunks (100GB genome) | Manifest Overhead | Parallelization |
|---|---|---|---|
| 512 KB | 200,000 | ~120MB | Excellent |
| 2 MB (default) | 50,000 | ~30MB | Good |
| 16 MB | 6,250 | ~3.8MB | Fair |

Each chunk is independently split. The manifest contains per-chunk metadata: chunk index, total chunks, original size, data hash (SHA-256 of the full genome, not per-chunk). Shares for chunk N are stored with chunkIndex: N in the GenomicShare structure.
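The chunk counts and manifest sizes in the table follow from simple arithmetic. The sketch below is illustrative only: chunkGenome and estimateManifestBytes are hypothetical helpers, and the ~600 bytes per chunk entry is the figure this paper quotes under Memory Usage.

```typescript
// 2 MiB, matching the documented DEFAULT_CHUNK_SIZE (2 * 1024 * 1024 bytes).
const DEFAULT_CHUNK_SIZE = 2 * 1024 * 1024;

// Cut a genome into fixed-size chunks. subarray() returns zero-copy views;
// the last chunk may be shorter than chunkSize.
function chunkGenome(data: Uint8Array, chunkSize: number = DEFAULT_CHUNK_SIZE): Uint8Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

// Manifest estimate: one entry of ~bytesPerEntry bytes per chunk.
function estimateManifestBytes(genomeBytes: number, chunkSize: number, bytesPerEntry = 600): number {
  return Math.ceil(genomeBytes / chunkSize) * bytesPerEntry;
}
```

With decimal units (100GB ÷ 2MB) this reproduces the table's 50,000 chunks and ~30MB manifest. With the binary DEFAULT_CHUNK_SIZE, a 10^11-byte genome gives Math.ceil(1e11 / 2097152) = 47,684 chunks, so the table's figures are round approximations.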

Custom chunk size
```typescript
import type { GenomicShareConfig } from '@private.me/genomicshare';

const config: GenomicShareConfig = {
  jurisdictions: [
    { code: 'DE', name: 'Germany', regulation: 'GDPR', endpoint: 'https://de.store' },
    { code: 'US', name: 'United States', regulation: 'HIPAA', endpoint: 'https://us.store' },
  ],
  threshold: 2,
  chunkSize: 8 * 1024 * 1024, // 8MB: fewer chunks, smaller manifest
};
```
Reconstruction requires all chunks
Reconstruction fails if any chunk has fewer than K shares. For a 100GB genome with 50,000 chunks (2MB each), every one of the 50,000 chunk groups must have at least K shares. If chunk 42,731 falls below K shares, full genome reconstruction fails even when the other 49,999 chunks are complete. This is an intentional fail-closed design: partial genome reconstruction is not supported.
Section 06

Integration Patterns

Three integration patterns for different genomic data workflows.

Biobank Federation

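Multi-country biobank split
```typescript
import { readFileSync } from 'node:fs';
import { splitGenome } from '@private.me/genomicshare';
import type { ConsentRecord, GenomicMetadata, GenomicShareConfig } from '@private.me/genomicshare';

// Buffer is a Uint8Array subclass, so it can be passed directly
const vcfData = readFileSync('patient-123.vcf');

// GenomicMetadata fields are not documented here; the caller builds this value
declare const metadata: GenomicMetadata;

const config: GenomicShareConfig = {
  jurisdictions: [
    { code: 'GB', name: 'United Kingdom', regulation: 'UK GDPR', endpoint: 'https://uk-biobank.store' },
    { code: 'US', name: 'United States', regulation: 'HIPAA', endpoint: 'https://all-of-us.store' },
    { code: 'FI', name: 'Finland', regulation: 'GDPR', endpoint: 'https://finngen.store' },
  ],
  threshold: 2,
};

const consent: ConsentRecord = {
  subjectId: 'PATIENT-123',
  purpose: 'cardiovascular-gwas',
  grantedBy: 'patient',
  grantedAt: new Date().toISOString(),
  authorizedJurisdictions: ['GB', 'US', 'FI'],
};

const result = await splitGenome(vcfData, metadata, config, consent);
if (!result.ok) throw new Error(`Split failed: ${result.error.code}`);

// result.value.shares holds the XorIDA shares for each chunk. Delivering them to
// UK Biobank, All of Us, and FinnGen is the caller's job (no built-in transport);
// each recipient sees only unintelligible shares.
```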

Clinical Trial Data Distribution

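Per-site jurisdiction assignment
```typescript
import type { ConsentRecord, GenomicShareConfig } from '@private.me/genomicshare';

// Hypothetical ISO 8601 timestamps supplied by the trial system
declare const trialEnrollmentDate: string;
declare const trialEndDate: string;

// Trial sites in Germany, Canada, Japan
const siteJurisdictions = [
  { code: 'DE', name: 'Germany', regulation: 'GDPR', endpoint: 'https://site-berlin.trial' },
  { code: 'CA', name: 'Canada', regulation: 'PIPEDA', endpoint: 'https://site-toronto.trial' },
  { code: 'JP', name: 'Japan', regulation: 'APPI', endpoint: 'https://site-tokyo.trial' },
];

// Each site gets one share; any 2 of the 3 reconstruct
const config: GenomicShareConfig = { jurisdictions: siteJurisdictions, threshold: 2 };

// Consent must cover all trial sites
const consent: ConsentRecord = {
  subjectId: 'TRIAL-456',
  purpose: 'oncology-phase-2',
  grantedBy: 'patient',
  grantedAt: trialEnrollmentDate,
  authorizedJurisdictions: ['DE', 'CA', 'JP'],
  expiresAt: trialEndDate, // recorded for audit; reconstructGenome() does not re-check it
};
```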

Reconstruction for Analysis

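Authorized researcher reconstruction
```typescript
import { reconstructGenome } from '@private.me/genomicshare';
import type { GenomicManifest, GenomicShare } from '@private.me/genomicshare';

// Application-side values and helpers (hypothetical; the package provides no transport)
declare const manifest: GenomicManifest;
declare const manifestId: string;
declare function fetchSharesFromJurisdiction(code: string, id: string): Promise<GenomicShare[]>;
declare function groupSharesByChunk(batches: GenomicShare[][]): GenomicShare[][];

// Fetch shares from authorized jurisdictions only (2 of 3 meets the threshold)
const sharesDE = await fetchSharesFromJurisdiction('DE', manifestId);
const sharesCA = await fetchSharesFromJurisdiction('CA', manifestId);

// Group shares by chunk index: one inner array per chunk
const sharesByChunk = groupSharesByChunk([sharesDE, sharesCA]);

const result = await reconstructGenome(manifest, sharesByChunk);
if (result.ok) {
  const genome = result.value; // Uint8Array, SHA-256 verified against the manifest
  // Run GWAS, variant calling, etc.
} else {
  console.error(`Reconstruction failed: ${result.error.code}`); // e.g. INSUFFICIENT_SHARES
}
```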
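The groupSharesByChunk helper is left to the application. One hypothetical version is sketched below, extended with totalChunks and threshold parameters so it can fail closed before reconstruction is attempted. ShareSketch is an assumed minimal shape (the real GenomicShare may carry more fields), and chunk indices are assumed to be 0-based.

```typescript
// Assumed minimal share shape for illustration.
interface ShareSketch {
  chunkIndex: number; // assumed 0-based
  index: number; // 1-based share index
  data: Uint8Array;
}

// Flatten per-jurisdiction batches into one group per chunk, failing closed
// if any chunk ends up with fewer than `threshold` distinct shares.
function groupSharesByChunk(
  batches: ShareSketch[][],
  totalChunks: number,
  threshold: number,
): ShareSketch[][] {
  const groups: ShareSketch[][] = Array.from({ length: totalChunks }, () => []);
  for (const share of batches.flat()) {
    if (share.chunkIndex < 0 || share.chunkIndex >= totalChunks) {
      throw new RangeError(`chunkIndex ${share.chunkIndex} out of range`);
    }
    const group = groups[share.chunkIndex];
    // A duplicate copy of the same share index does not count toward the threshold
    if (!group.some((s) => s.index === share.index)) group.push(share);
  }
  const short = groups.findIndex((g) => g.length < threshold);
  if (short !== -1) {
    throw new Error(`INSUFFICIENT_SHARES: chunk ${short} has ${groups[short].length} of ${threshold}`);
  }
  return groups;
}
```

Checking every chunk up front surfaces the fail-closed behavior described under Chunking Strategy before any reconstruction work runs.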
Section 07

Security Properties

Five layers of protection. Each independently verifiable.

| Property | Mechanism | Guarantee |
|---|---|---|
| Consent Enforcement | Pre-crypto validation | No shares created without authorization |
| Information-Theoretic | XorIDA over GF(2) | K-1 shares reveal zero bits |
| Integrity | HMAC-SHA256 per share | Tampered shares rejected |
| End-to-End Integrity | SHA-256 genome hash | Reconstruction verified against original |
| Jurisdiction Isolation | Independent endpoints | No cross-jurisdiction data flow |
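A per-share HMAC-SHA256 check can be sketched with node:crypto. This paper does not say how the HMAC key is generated or stored, so the key handling here is an assumption, as is binding the tag to the chunk and share indices (which stops a valid share from being replayed at another position). tagShare and verifyShare are hypothetical names.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Tag = HMAC-SHA256(key, chunkIndex || shareIndex || data), indices as big-endian uint32.
function tagShare(key: Uint8Array, chunkIndex: number, shareIndex: number, data: Uint8Array): Uint8Array {
  const header = new Uint8Array(8);
  const view = new DataView(header.buffer);
  view.setUint32(0, chunkIndex);
  view.setUint32(4, shareIndex);
  return createHmac('sha256', key).update(header).update(data).digest();
}

// Constant-time comparison avoids leaking how many tag bytes matched.
function verifyShare(
  key: Uint8Array,
  chunkIndex: number,
  shareIndex: number,
  data: Uint8Array,
  tag: Uint8Array,
): boolean {
  const expected = tagShare(key, chunkIndex, shareIndex, data);
  return tag.length === expected.length && timingSafeEqual(expected, tag);
}
```

In practice the key should be 32 random bytes held by the data controller rather than by any storage jurisdiction; otherwise a compromised jurisdiction could forge tags for its own shares.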

Information-Theoretic Security

XorIDA operates over GF(2) (the binary field). With threshold K=2 and N=3 jurisdictions, any single jurisdiction holds a share that is statistically independent of the genome, i.e. indistinguishable from uniform random noise. This is not "hard to break with current computers"; it is impossible to break even with unbounded computing power, including quantum computers.
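For the simplest case, two-share XOR (an illustration only; this paper does not give XorIDA's general K-of-N construction), the zero-information claim follows in one line. Let G be an n-bit genome chunk and R a uniformly random pad independent of G:

```latex
\begin{aligned}
S_1 &= R, \qquad S_2 = G \oplus R, \qquad R \sim \mathrm{Uniform}\bigl(\{0,1\}^n\bigr),\; R \perp G \\
\Pr[S_2 = s \mid G = g] &= \Pr[R = s \oplus g] = 2^{-n} \quad \text{for all } g, s \in \{0,1\}^n \\
\Rightarrow\; I(G; S_2) &= H(G) - H(G \mid S_2) = 0
\end{aligned}
```

Because the distribution of S_2 is the same for every genome g, observing S_2 cannot shift any belief about G; S_1 = R is independent of G by construction.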

Quantum-proof by construction
Information-theoretic security does not rely on computational hardness assumptions (factoring, discrete log, lattice problems). It rests on information theory: a single share has zero mutual information with the genome. Quantum computers provide no advantage against information-theoretic security.
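To see the mechanics, here is a simplified n-of-n XOR split. It is deliberately not XorIDA's K-of-N scheme: every share is needed to reconstruct, and any n-1 shares are jointly uniform random bytes. xorSplit and xorCombine are hypothetical names; randomness comes from node:crypto's CSPRNG.

```typescript
import { randomBytes } from 'node:crypto';

// n-of-n XOR sharing: n-1 uniformly random pads, plus one final share
// equal to the secret XOR every pad.
function xorSplit(secret: Uint8Array, n: number): Uint8Array[] {
  if (!Number.isInteger(n) || n < 2) throw new RangeError('n must be an integer >= 2');
  const last = Uint8Array.from(secret);
  const shares: Uint8Array[] = [];
  for (let i = 0; i < n - 1; i++) {
    const pad = new Uint8Array(randomBytes(secret.length)); // CSPRNG output
    for (let j = 0; j < last.length; j++) last[j] ^= pad[j];
    shares.push(pad);
  }
  shares.push(last);
  return shares;
}

// XOR of all n shares cancels every pad and leaves the secret.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  if (shares.length === 0) throw new RangeError('need at least one share');
  const out = new Uint8Array(shares[0].length);
  for (const share of shares) {
    for (let j = 0; j < out.length; j++) out[j] ^= share[j];
  }
  return out;
}
```

Usage: `xorCombine(xorSplit(bytes, 3))` returns the original bytes. A threshold scheme generalizes this so that any K of N shares suffice, which XOR alone cannot do.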

Comparison to Alternatives

| Property | Xgene | Encryption-at-Rest | Federated Learning |
|---|---|---|---|
| Single-location risk | Information-theoretic split | DB admin sees plaintext | Gradient leakage |
| Cross-border compliant | Yes | No | Yes |
| Rare variant preservation | Bit-perfect | Yes | Noise destroys signal |
| Quantum-proof | Information-theoretic | Depends on algorithm | No |
| Consent enforcement | Protocol-level | Policy-level | Policy-level |
Section 08

Benchmarks

Performance characteristics measured on Node.js 22, Apple M2.

<1ms per 1KB chunk split · ~33ms per 1MB chunk split · 2MB default chunk size · 0 npm dependencies
| Operation | Time | Notes |
|---|---|---|
| Consent validation | <1ms | In-memory check, no I/O |
| 2MB chunk split (2-of-3) | ~66ms | XorIDA + 3× HMAC-SHA256 |
| 2MB chunk reconstruct | ~35ms | XorIDA + 2× HMAC verify + unpad |
| SHA-256 genome hash (100MB) | ~120ms | Web Crypto API |
| Whole genome (100GB, 50K chunks) | ~55 minutes | Single-threaded, no parallelization |
| Whole genome (parallel 10×) | ~6 minutes | 10-core parallelization assumed |
Parallelization opportunity
Each chunk is split independently, so production implementations should parallelize across CPU cores. With 10-core parallelization and near-linear scaling, a 100GB genome (50,000 × 2MB chunks) splits in approximately 6 minutes. The package does not provide built-in parallelization; use Node.js worker threads or the cluster module.
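The whole-genome rows follow from the per-chunk cost. This back-of-envelope sketch (estimateSplitMinutes is a hypothetical helper) assumes a constant per-chunk time and perfect scaling across workers; real worker-thread pools add coordination overhead.

```typescript
// Wall-clock estimate for splitting a genome, assuming constant per-chunk cost
// and perfect scaling across workers.
function estimateSplitMinutes(
  genomeBytes: number,
  chunkBytes: number,
  msPerChunk: number,
  workers = 1,
): number {
  if (chunkBytes <= 0 || workers < 1) throw new RangeError('chunkBytes must be > 0 and workers >= 1');
  const chunks = Math.ceil(genomeBytes / chunkBytes);
  return (chunks * msPerChunk) / workers / 60_000;
}
```

With the benchmark figures, 50,000 chunks × 66ms gives 55 minutes single-threaded and 5.5 minutes across 10 workers, which the table rounds to ~6 minutes.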
Section 09

Honest Limitations

What Xgene cannot do, and why.

Partial Reconstruction Not Supported

If any chunk has fewer than K shares, the entire genome reconstruction fails. For a 100GB genome with 50,000 chunks, all 50,000 chunk groups must have K shares. If chunk 1 falls below K shares, reconstruction fails completely, even if the other 49,999 chunks are complete. This is intentional: partial genomes create privacy risks (e.g., reconstructing only exome regions while leaving the whole genome incomplete).

Consent Re-Validation on Reconstruction

The package validates consent before splitting but does not re-validate on reconstruction. The manifest includes the original consent record for audit purposes, but reconstruction logic does not check if consent has expired or been revoked. This is the data controller's responsibility — typically enforced at the API layer that calls reconstructGenome().
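An API layer could gate every reconstructGenome() call on current consent. The sketch below is hypothetical: consentGate, the revocation set, and the reason strings are not package APIs, and a real deployment would query a consent registry rather than an in-memory set.

```typescript
// Minimal consent terms needed for the gate; expiresAt is optional ISO 8601.
interface ConsentTerms {
  subjectId: string;
  expiresAt?: string;
}

type GateDecision =
  | { allowed: true }
  | { allowed: false; reason: 'REVOKED' | 'EXPIRED' | 'INVALID_EXPIRY' };

// Call before reconstructGenome(); the package itself does not re-check consent.
function consentGate(
  consent: ConsentTerms,
  revokedSubjects: ReadonlySet<string>,
  now: Date = new Date(),
): GateDecision {
  if (revokedSubjects.has(consent.subjectId)) return { allowed: false, reason: 'REVOKED' };
  if (consent.expiresAt !== undefined) {
    const expiry = Date.parse(consent.expiresAt);
    // An unparseable expiry fails closed rather than being treated as "no expiry"
    if (Number.isNaN(expiry)) return { allowed: false, reason: 'INVALID_EXPIRY' };
    if (now.getTime() >= expiry) return { allowed: false, reason: 'EXPIRED' };
  }
  return { allowed: true };
}
```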

No Built-In Transport

The package does not handle HTTPS POST to jurisdiction endpoints. The splitGenome() function returns shares and a manifest. Distributing shares to jurisdiction endpoints is the caller's responsibility. This is intentional — different deployments have different network topologies (direct HTTPS, message queues, S3 presigned URLs, etc.).
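Because transport is left to the caller, a distribution step can keep the delivery mechanism injectable. This sketch is illustrative: distributeShares, JurisdictionTarget, OutboundShare, and SendFn are hypothetical, and it assumes 1-based share index i is routed to the i-th configured jurisdiction (Share[i] → Jurisdiction[i]).

```typescript
interface JurisdictionTarget {
  code: string; // ISO 3166-1 alpha-2
  endpoint: string; // e.g. an HTTPS URL
}

interface OutboundShare {
  chunkIndex: number;
  index: number; // 1-based share index
  data: Uint8Array;
}

// Injected transport: an HTTPS POST, a queue publisher, or an S3 presigned-URL upload.
type SendFn = (endpoint: string, batch: OutboundShare[]) => Promise<void>;

// Share index i goes to jurisdictions[i - 1]; one batch per jurisdiction.
async function distributeShares(
  shares: OutboundShare[],
  jurisdictions: JurisdictionTarget[],
  send: SendFn,
): Promise<number[]> {
  const batches: OutboundShare[][] = jurisdictions.map(() => []);
  for (const share of shares) {
    const slot = share.index - 1;
    if (slot < 0 || slot >= jurisdictions.length) {
      throw new RangeError(`No jurisdiction for share index ${share.index}`);
    }
    batches[slot].push(share);
  }
  // Send concurrently; any rejected send rejects the whole call.
  await Promise.all(
    jurisdictions.map((j, i) => (batches[i].length > 0 ? send(j.endpoint, batches[i]) : Promise.resolve())),
  );
  return batches.map((batch) => batch.length); // shares sent per jurisdiction, in config order
}
```

Keeping the routing pure and the transport injected makes the routing testable without a network, which fits the topology-agnostic design described above.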

Memory Usage for Large Genomes

A 100GB genome with 2MB chunks requires ~30MB of manifest metadata in memory (50,000 chunks × ~600 bytes per chunk metadata). For 200GB whole-genome sequencing, manifest overhead is ~60MB. This fits in Node.js default heap, but applications processing thousands of genomes concurrently should monitor memory usage.

No Homomorphic Computation

Xgene provides jurisdiction-isolated storage, not homomorphic computation. You cannot run GWAS directly on split shares. To analyze the genome, you must reconstruct it (requires K shares). If your threat model includes "authorized analyst must never see plaintext," consider secure multi-party computation or trusted execution environments in addition to jurisdiction splitting.

What this solves vs. what it does not
Solves: Cross-border compliance, jurisdiction-isolated storage, catastrophic breach prevention, consent enforcement.
Does not solve: Computation on encrypted data, insider threats at reconstruction time, consent revocation after shares distributed.
Appendix A1

Jurisdiction Mapping

Common regulatory frameworks mapped to ISO 3166-1 country codes.

| Code(s) | Country/Region | Regulation | Genomic Data Provisions |
|---|---|---|---|
| DE / FR / IT / ES | EU Member States | GDPR | Article 9 special category data, Article 5 data minimization, Article 46 cross-border transfers |
| US | United States | HIPAA | Protected Health Information (PHI), 45 CFR Part 160/164, genetic information as PHI |
| CA | Canada | PIPEDA | Sensitive personal information, cross-border transfers require consent |
| GB | United Kingdom | UK GDPR | Post-Brexit GDPR equivalent, genetic data special category |
| JP | Japan | APPI | Sensitive personal information requiring opt-in consent |
| CH | Switzerland | FADP | Genetic data as sensitive personal data, adequacy with EU |
| AU | Australia | Privacy Act 1988 | Sensitive information requiring higher consent standard |
Custom jurisdictions supported
The Jurisdiction interface accepts any ISO 3166-1 code. For example, Singapore (SG) under PDPA, South Korea (KR) under PIPA, or Israel (IL) under Privacy Protection Law. The package does not enforce regulatory compliance — it provides the infrastructure for jurisdiction-aware splitting.
Appendix A2

Full API Surface

Complete function signatures and types.

splitGenome(data: Uint8Array, metadata: GenomicMetadata, config: GenomicShareConfig, consent: ConsentRecord): Promise<Result<GenomicSplitResult, GenomicShareError>>
Split genomic data across jurisdictions with consent validation. Returns manifest and shares grouped by chunk.
reconstructGenome(manifest: GenomicManifest, shares: GenomicShare[][]): Promise<Result<Uint8Array, GenomicShareError>>
Reconstruct genome from threshold shares. Verifies HMAC per share and SHA-256 hash after reconstruction.
validateConfig(config: GenomicShareConfig): Result<never, GenomicShareError> | null
Validate configuration before splitting. Returns null if valid, error result otherwise.
validateConsent(consent: ConsentRecord, jurisdictions: string[]): Result<never, GenomicShareError> | null
Validate consent covers all target jurisdictions. Returns null if valid, error result otherwise.
DEFAULT_CHUNK_SIZE: number
Constant: 2MB (2 * 1024 * 1024 = 2,097,152 bytes). Used when config.chunkSize is not specified.
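The rules behind INVALID_CONFIG can be expressed as a standalone check that follows validateConfig()'s documented convention (null when valid). This is a hypothetical checkConfig() over a simplified ConfigSketch, not the package's implementation; its rules come from the INVALID_CONFIG resolution in Appendix A3, and it redefines the 2MB default locally rather than importing DEFAULT_CHUNK_SIZE.

```typescript
// Simplified config shape for illustration; the real GenomicShareConfig has more fields.
interface ConfigSketch {
  jurisdictions: { code: string }[];
  threshold: number;
  chunkSize?: number;
}

interface ConfigError {
  code: 'INVALID_CONFIG';
  message: string;
}

const DEFAULT_CHUNK_SIZE = 2 * 1024 * 1024; // 2,097,152 bytes

// null when valid, an error otherwise: at least 2 jurisdictions, 2 <= K <= N, chunkSize > 0.
function checkConfig(config: ConfigSketch): ConfigError | null {
  const n = config.jurisdictions.length;
  if (n < 2) {
    return { code: 'INVALID_CONFIG', message: `need at least 2 jurisdictions, got ${n}` };
  }
  const k = config.threshold;
  if (!Number.isInteger(k) || k < 2 || k > n) {
    return { code: 'INVALID_CONFIG', message: `threshold must be an integer in [2, ${n}], got ${k}` };
  }
  const chunkSize = config.chunkSize ?? DEFAULT_CHUNK_SIZE;
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    return { code: 'INVALID_CONFIG', message: `chunkSize must be a positive integer, got ${chunkSize}` };
  }
  return null;
}
```

The lower bound K ≥ 2 is what makes a single-jurisdiction compromise harmless; K = 1 would let any one share reconstruct its chunk.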
Appendix A3

Error Taxonomy

Complete error code reference with descriptions and resolution guidance.

| Code | Message | Resolution |
|---|---|---|
| INVALID_CONFIG | Configuration validation failed | Ensure at least 2 jurisdictions, threshold ≥ 2, threshold ≤ N, chunk size > 0 |
| CONSENT_INVALID | Consent record missing required fields | Provide subjectId, purpose, grantedBy, grantedAt, authorizedJurisdictions |
| JURISDICTION_MISMATCH | Target jurisdictions not covered by consent | Update consent.authorizedJurisdictions to include all config.jurisdictions codes |
| SPLIT_FAILED | XorIDA split operation failed | Check input data validity, ensure chunk size divisible by share count |
| RECONSTRUCT_FAILED | XorIDA reconstruction or unpadding failed | Verify shares are from same split operation, check share integrity |
| HMAC_FAILED | HMAC verification failed after reconstruction | Shares have been tampered with or are from different split operations |
| INSUFFICIENT_SHARES | Fewer than threshold shares for chunk group | Provide at least K shares for every chunk group |

Deployment Options

📦

SDK Integration

Embed directly in your application. Runs in your codebase with full programmatic control.

  • npm install @private.me/genomicshare
  • TypeScript/JavaScript SDK
  • Full source access
  • Enterprise support available
🏢

On-Premise Upon Request

Enterprise CLI for compliance, air-gap, or data residency requirements.

  • Complete data sovereignty
  • Air-gap capable deployment
  • Custom SLA + dedicated support
  • Professional services included

Enterprise On-Premise Deployment

While Xgene (package @private.me/genomicshare) is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:

  • Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
  • Air-gapped environments — SCIF, classified networks, offline operations
  • Data residency requirements — EU GDPR, China data laws, government mandates
  • Custom integration needs — Embed in proprietary platforms, specialized workflows

Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.

Contact sales for assessment and pricing.