Xgene: Cross-Jurisdiction Genomic Data Sharing
Genomic data is the most sensitive personal information that exists — irreversible, lifelong, and shared with biological relatives. Xgene splits genomic datasets across distinct legal jurisdictions using XorIDA threshold sharing, ensuring no single jurisdiction holds enough data to reconstruct a complete genome. Consent-first architecture enforces authorization before any cryptographic operation.
Executive Summary
Cross-border genomic research requires data to comply with GDPR, HIPAA, PIPEDA, and APPI simultaneously. Xgene makes this mathematically possible by splitting genomes across jurisdictions so that no single regulator or court order can reconstruct the data.
Two functions cover the core workflow. splitGenome() takes a VCF/FASTQ/BAM file and a consent record, validates that consent covers all target jurisdictions, splits the genome into XorIDA shares (default 2MB chunks), and returns the shares with a manifest for delivery to jurisdiction-specific endpoints; the caller performs the actual transport. reconstructGenome() collects threshold shares from authorized jurisdictions, verifies HMAC integrity, and reconstructs the original genome with SHA-256 hash verification.
When an attacker compromises a single jurisdiction's storage, they learn exactly zero bits about the genome — not computationally hard to break, but information-theoretically impossible. XorIDA operates over GF(2) with threshold K. Any K-1 or fewer shares reveal zero information, regardless of computing power or quantum capability.
Consent validation is mandatory and enforced before any cryptographic operation. The system rejects split requests if consent does not explicitly authorize all target jurisdictions. This ensures regulatory compliance at the protocol level, not just policy level.
Developer Experience
Xgene provides structured error codes and Result-based error handling to make cross-border genomic sharing safe and debuggable.
Error Categories
Xgene organizes 7 error codes across 4 categories for systematic handling:
| Category | Codes | When |
|---|---|---|
| Configuration | INVALID_CONFIG | Fewer than 2 jurisdictions, threshold out of range, invalid chunk size |
| Consent | CONSENT_INVALID, JURISDICTION_MISMATCH | Missing consent fields, jurisdictions not covered by authorization |
| Cryptographic | SPLIT_FAILED, RECONSTRUCT_FAILED, HMAC_FAILED | XorIDA split/reconstruction failure, integrity check failure |
| Threshold | INSUFFICIENT_SHARES | Fewer than K shares available for chunk reconstruction |
Result Pattern
All operations return Result<T, GenomicShareError>. Every error includes a machine-readable code and human-readable message.
```typescript
const result = await splitGenome(genomeData, metadata, config, consent);
if (!result.ok) {
  switch (result.error.code) {
    case 'CONSENT_INVALID':
      console.error('Missing required consent fields');
      break;
    case 'JURISDICTION_MISMATCH':
      console.error('Consent does not cover target jurisdictions');
      break;
    case 'SPLIT_FAILED':
      console.error('XorIDA split operation failed');
      break;
    default:
      // Any other code (e.g. INVALID_CONFIG): fall back to the human-readable message
      console.error(result.error.message);
  }
  return;
}
// Use result.value.manifest and result.value.shares
```
The Problem
Genomic data sharing across jurisdictions violates privacy laws. Single-location storage creates permanent breach risk for the most sensitive personal data that exists.
Genomic data is irreversible. Unlike passwords, you cannot change your genome after a breach. A single compromise exposes lifelong health risks, ancestry, paternity, predispositions to disease, and information about biological relatives who never consented.
Cross-border research is legally paralyzed. EU biobanks cannot share with US institutions under GDPR Article 5 data minimization. Canadian researchers face PIPEDA restrictions. Japanese pharmacogenomics projects cannot pool data across APPI and HIPAA jurisdictions. Each jurisdiction demands the data stay within its borders.
Centralized storage is a single point of failure. The 23andMe breach exposed 6.9 million genetic profiles. MyHeritage leaked 92 million accounts. GEDmatch was accessed without warrants. Every centralized genomic database is a catastrophic breach waiting to happen.
Existing solutions fail. Encryption-at-rest still leaves the database server with full access to plaintext during queries. Federated learning leaks information through gradient updates. Homomorphic encryption is too slow for genome-scale computation. Differential privacy adds noise that destroys rare variant signals critical for precision medicine.
| Approach | Cross-Border | Single-Location Risk | Query Performance | Rare Variants |
|---|---|---|---|---|
| Centralized DB | Violates GDPR/PIPEDA | Single breach = total loss | Fast | Preserved |
| Federated Learning | Possible | Partial (gradient leakage) | Slow | Gradient noise |
| Homomorphic Encryption | Possible | Encrypted | 500-5000x slower | Preserved |
| Differential Privacy | Possible | Database sees plaintext | Fast | Noise destroys signal |
| Xgene | Compliant | Zero-knowledge split | ~66ms per 2MB chunk split | Bit-perfect reconstruction |
Real-World Use Cases
Six scenarios where cross-jurisdiction genomic splitting enables research that is currently impossible.
Orphan diseases require pooling genomic data from multiple countries to reach statistical power. Xgene splits genomes across EU, US, and CA jurisdictions; each sees only unintelligible shares.
3-of-5 threshold, GDPR+HIPAA+PIPEDA
Pharmacogenomic trials require ethnic diversity impossible within single jurisdictions. Japanese APPI, EU GDPR, and US HIPAA compliance achieved via jurisdiction-aware splitting.
2-of-3 threshold, population stratification preserved
National biobanks (UK Biobank, All of Us, FinnGen) cannot legally share complete genomes. XorIDA splitting enables federated queries while each biobank retains only non-reconstructable shares.
K-of-N threshold per protocol, consent-tracked
Defense genomic surveillance requires splitting classified genomic threat data across allied nations with zero single-nation reconstruction. NATO biodefense collaboration without centralized exposure.
3-of-5 threshold, JWICS-compliant distribution
Pharma clinical trials collect genomic data across 50+ countries. Xgene ensures each trial site retains only jurisdiction-compliant shares while enabling pooled variant analysis.
2MB chunks, VCF format, per-site consent tracking
Life insurance genomic risk scoring requires genetic data that cannot be centralized due to GDPR genetic data protections. Xgene enables distributed scoring without single-database exposure.
EU+UK jurisdiction split, Article 9 special category data
Solution Architecture
Three-layer architecture: consent validation, XorIDA splitting, and jurisdiction routing.
Consent Enforcement
Consent validation happens before any cryptographic operation. This is not a checkbox to skip — it is a protocol-level requirement.
```typescript
const consent: ConsentRecord = {
  subjectId: 'SUBJ-001',
  purpose: 'rare-disease-research',
  grantedBy: 'patient',
  grantedAt: new Date().toISOString(),
  authorizedJurisdictions: ['DE', 'CA', 'JP'],
  expiresAt: '2027-04-10T00:00:00Z', // Optional
};
```
The authorizedJurisdictions array must include every jurisdiction code in the split configuration. If the config specifies shares for DE, CA, and JP, but consent only authorizes DE and CA, the operation fails with JURISDICTION_MISMATCH.
The splitGenome() function calls validateConsent() and returns an error result if validation fails. No shares are created if consent is invalid. The genome never reaches the XorIDA splitting function until consent passes.
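The two consent checks described above (required fields, then jurisdiction coverage) can be sketched as follows. This is a minimal illustration, not the package's actual validateConsent(): the function name `validateConsentSketch`, the `ConsentResult` type, and the error messages are assumptions, and the field list is taken from the ConsentRecord example in this section.

```typescript
// Hypothetical shapes inferred from the examples in this document;
// the real package types may differ.
interface ConsentRecord {
  subjectId: string;
  purpose: string;
  grantedBy: string;
  grantedAt: string; // ISO 8601 timestamp
  authorizedJurisdictions: string[];
  expiresAt?: string; // optional
}

type ConsentErrorCode = 'CONSENT_INVALID' | 'JURISDICTION_MISMATCH';

type ConsentResult =
  | { ok: true }
  | { ok: false; error: { code: ConsentErrorCode; message: string } };

// Sketch of a pre-split consent check: required fields first, then coverage.
function validateConsentSketch(
  consent: Partial<ConsentRecord>,
  targetJurisdictions: string[],
): ConsentResult {
  const required = ['subjectId', 'purpose', 'grantedBy', 'grantedAt'] as const;
  for (const field of required) {
    const value = consent[field];
    if (typeof value !== 'string' || value.length === 0) {
      return {
        ok: false,
        error: { code: 'CONSENT_INVALID', message: `Missing consent field: ${field}` },
      };
    }
  }

  const authorized = consent.authorizedJurisdictions;
  if (!Array.isArray(authorized) || authorized.length === 0) {
    return {
      ok: false,
      error: { code: 'CONSENT_INVALID', message: 'Missing consent field: authorizedJurisdictions' },
    };
  }

  // Every target jurisdiction must appear in the authorization list.
  const allowed = new Set(authorized);
  const uncovered = targetJurisdictions.filter((code) => !allowed.has(code));
  if (uncovered.length > 0) {
    return {
      ok: false,
      error: {
        code: 'JURISDICTION_MISMATCH',
        message: `Consent does not cover: ${uncovered.join(', ')}`,
      },
    };
  }
  return { ok: true };
}
```

Because the check returns before any share material exists, a failed validation leaves nothing to clean up.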
Consent Audit Trail
Every GenomicManifest includes the full consent record. Reconstruction does not re-validate consent (that is the data controller's responsibility), but the manifest provides an immutable record of the original authorization.
Chunking Strategy
XorIDA operates on fixed-size chunks. Large genomes (whole genome sequencing = 100-200GB) are split into 2MB chunks by default, each independently split into shares.
Why 2MB? Trade-off between memory efficiency and parallelization. Smaller chunks (e.g., 512KB) increase manifest size and reduce per-chunk throughput. Larger chunks (e.g., 16MB) increase memory pressure and reduce parallelization opportunities. 2MB fits comfortably in Node.js default heap and allows ~100 parallel chunks on typical research servers.
| Chunk Size | 100GB Genome | Manifest Overhead | Parallelization |
|---|---|---|---|
| 512 KB | 200,000 chunks | ~120MB manifest | Excellent |
| 2 MB (default) | 50,000 chunks | ~30MB manifest | Good |
| 16 MB | 6,250 chunks | ~3.8MB manifest | Fair |
Each chunk is independently split. The manifest contains per-chunk metadata: chunk index, total chunks, original size, data hash (SHA-256 of the full genome, not per-chunk). Shares for chunk N are stored with chunkIndex: N in the GenomicShare structure.
```typescript
const config: GenomicShareConfig = {
  jurisdictions: [
    { code: 'DE', name: 'Germany', regulation: 'GDPR', endpoint: 'https://de.store' },
    { code: 'US', name: 'United States', regulation: 'HIPAA', endpoint: 'https://us.store' },
  ],
  threshold: 2,
  chunkSize: 8 * 1024 * 1024, // 8MB for reduced manifest size
};
```
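The chunk counts and manifest sizes in the table above follow from simple division. The sketch below reproduces that arithmetic and shows one way to slice a buffer into chunks. It assumes decimal units (1 MB = 1,000,000 bytes), which is what the table's round numbers imply, and the ~600 bytes of per-chunk metadata quoted under Honest Limitations; `planChunks` and `sliceChunks` are hypothetical helper names, not package exports.

```typescript
// Assumption: ~600 bytes of manifest metadata per chunk, as quoted in this document.
const BYTES_PER_CHUNK_METADATA = 600;

interface ChunkPlan {
  chunkCount: number;
  lastChunkBytes: number; // the final chunk may be shorter than chunkBytes
  manifestBytes: number;
}

function planChunks(genomeBytes: number, chunkBytes: number): ChunkPlan {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError('chunkBytes must be a positive integer');
  }
  const chunkCount = Math.ceil(genomeBytes / chunkBytes);
  const remainder = genomeBytes % chunkBytes;
  const lastChunkBytes = chunkCount === 0 ? 0 : remainder === 0 ? chunkBytes : remainder;
  return {
    chunkCount,
    lastChunkBytes,
    manifestBytes: chunkCount * BYTES_PER_CHUNK_METADATA,
  };
}

// Slice a buffer into chunk views. subarray() creates views, not copies.
function sliceChunks(data: Uint8Array, chunkBytes: number): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkBytes) {
    chunks.push(data.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// 100 GB genome at the default 2 MB chunk size
const defaultPlan = planChunks(100e9, 2e6);
console.log(defaultPlan.chunkCount, defaultPlan.manifestBytes);
```

With binary units (2 MiB chunks of a 100 GiB file) the count would be 51,200 rather than 50,000, so the unit convention is worth pinning down before sizing storage.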
Integration Patterns
Three integration patterns for different genomic data workflows.
Biobank Federation
```typescript
import { readFileSync } from 'node:fs';
import { splitGenome } from '@private.me/genomicshare';

const vcfData = readFileSync('patient-123.vcf');

const config = {
  jurisdictions: [
    { code: 'GB', name: 'UK', regulation: 'GDPR', endpoint: 'https://uk-biobank.store' },
    { code: 'US', name: 'US', regulation: 'HIPAA', endpoint: 'https://all-of-us.store' },
    { code: 'FI', name: 'Finland', regulation: 'GDPR', endpoint: 'https://finngen.store' },
  ],
  threshold: 2,
};

const consent = {
  subjectId: 'PATIENT-123',
  purpose: 'cardiovascular-gwas',
  grantedBy: 'patient',
  grantedAt: new Date().toISOString(),
  authorizedJurisdictions: ['GB', 'US', 'FI'],
};

// `metadata` is the genome metadata object passed to splitGenome(); its construction is not shown here.
const result = await splitGenome(vcfData, metadata, config, consent);
// On success, the caller delivers the shares to UK Biobank, All of Us, and FinnGen.
// Each endpoint sees only an unintelligible XorIDA share.
```
Clinical Trial Data Distribution
```typescript
// Trial sites in Germany, Canada, Japan
const siteJurisdictions = [
  { code: 'DE', endpoint: 'https://site-berlin.trial' },
  { code: 'CA', endpoint: 'https://site-toronto.trial' },
  { code: 'JP', endpoint: 'https://site-tokyo.trial' },
];

// Each site gets one share, any 2 reconstruct
const config = { jurisdictions: siteJurisdictions, threshold: 2 };

// Consent must cover all trial sites.
// trialEnrollmentDate and trialEndDate are caller-supplied timestamps.
const consent = {
  subjectId: 'TRIAL-456',
  purpose: 'oncology-phase-2',
  grantedBy: 'patient',
  grantedAt: trialEnrollmentDate,
  authorizedJurisdictions: ['DE', 'CA', 'JP'],
  expiresAt: trialEndDate,
};
```
Reconstruction for Analysis
```typescript
import { reconstructGenome } from '@private.me/genomicshare';

// fetchSharesFromJurisdiction() and groupSharesByChunk() are not imported from the
// package in this example; they are assumed to be application-level helpers.
// manifest and manifestId come from the original split.

// Fetch shares from authorized jurisdictions only
const sharesDE = await fetchSharesFromJurisdiction('DE', manifestId);
const sharesCA = await fetchSharesFromJurisdiction('CA', manifestId);

// Group shares by chunk index
const sharesByChunk = groupSharesByChunk([sharesDE, sharesCA]);

const result = await reconstructGenome(manifest, sharesByChunk);
if (result.ok) {
  const genome = result.value; // Uint8Array, SHA-256 verified
  // Run GWAS, variant calling, etc.
}
```
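The grouping step in the example above is left to the application. One possible shape for that helper is sketched below. Only the `chunkIndex` field is confirmed by this document; the `ShareLike` interface, its other fields, and the `Map` return type are assumptions, and the container that reconstructGenome() actually expects may differ.

```typescript
// Hypothetical minimal share shape; only chunkIndex is confirmed by this document.
interface ShareLike {
  chunkIndex: number;
  jurisdiction: string;
  data: Uint8Array;
}

// Merge per-jurisdiction share lists into one group per chunk index.
function groupSharesByChunkSketch(shareSets: ShareLike[][]): Map<number, ShareLike[]> {
  const byChunk = new Map<number, ShareLike[]>();
  for (const shares of shareSets) {
    for (const share of shares) {
      const group = byChunk.get(share.chunkIndex);
      if (group) {
        group.push(share);
      } else {
        byChunk.set(share.chunkIndex, [share]);
      }
    }
  }
  return byChunk;
}
```

Grouping by index rather than by arrival order means shares can be fetched from jurisdictions in parallel and merged afterwards.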
Security Properties
Five layers of protection. Each independently verifiable.
| Property | Mechanism | Guarantee |
|---|---|---|
| Consent Enforcement | Pre-crypto validation | No shares created without authorization |
| Information-Theoretic | XorIDA over GF(2) | K-1 shares reveal zero bits |
| Integrity | HMAC-SHA256 per share | Tampered shares rejected |
| End-to-End Integrity | SHA-256 genome hash | Reconstruction verified against original |
| Jurisdiction Isolation | Independent endpoints | No cross-jurisdiction data flow |
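The per-share integrity row can be illustrated with Node's built-in `node:crypto` module. This is a sketch only: how Xgene derives, stores, or distributes the HMAC key is not described in this document, so the caller-supplied `key` and the function names `tagShare` and `verifyShare` are assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Compute an HMAC-SHA256 tag over one share's bytes.
function tagShare(key: Uint8Array, shareData: Uint8Array): Uint8Array {
  return createHmac('sha256', key).update(shareData).digest();
}

// Recompute the tag and compare in constant time.
// timingSafeEqual throws on unequal lengths, so check length first.
function verifyShare(key: Uint8Array, shareData: Uint8Array, tag: Uint8Array): boolean {
  const expected = tagShare(key, shareData);
  return expected.length === tag.length && timingSafeEqual(expected, tag);
}
```

A share whose bytes were altered in storage or transit produces a different tag and is rejected before it reaches reconstruction.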
Information-Theoretic Security
XorIDA operates over GF(2) (the binary field). With threshold K=2 and N=3 jurisdictions, any single jurisdiction sees a share that is statistically indistinguishable from uniformly random noise. This is not a matter of being hard to break with current computers: with fewer than K shares, recovery is impossible even with unlimited computing power, including quantum computers.
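The reason a single share carries no information is easiest to see in the simplest XOR scheme. The sketch below is plain 2-of-2 XOR secret sharing, not XorIDA itself (whose K-of-N construction is not specified in this document). The pad is passed in to keep the example deterministic; in practice it must come from a cryptographically secure random source. All function names are illustrative.

```typescript
// XOR two equal-length byte arrays.
function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  if (a.length !== b.length) throw new RangeError('length mismatch');
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// 2-of-2 split: share 1 is the random pad, share 2 is secret XOR pad.
// The pad MUST be uniformly random and used once (e.g. from crypto.getRandomValues).
function xorSplit(secret: Uint8Array, pad: Uint8Array): [Uint8Array, Uint8Array] {
  return [pad, xorBytes(secret, pad)];
}

// Both shares together recover the secret: pad XOR (secret XOR pad) = secret.
function xorCombine(share1: Uint8Array, share2: Uint8Array): Uint8Array {
  return xorBytes(share1, share2);
}

// Perfect secrecy argument: for ANY candidate secret of the same length there is
// a pad that would have produced the observed share, so the share rules nothing out.
function padExplaining(observedShare2: Uint8Array, candidateSecret: Uint8Array): Uint8Array {
  return xorBytes(observedShare2, candidateSecret);
}
```

Since every candidate genome of the same length is equally consistent with one observed share, additional computing power gives an attacker nothing to search for.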
Comparison to Alternatives
| Property | Xgene | Encryption-at-Rest | Federated Learning |
|---|---|---|---|
| Single-location risk | Zero-knowledge split | DB admin sees plaintext | Gradient leakage |
| Cross-border compliant | Yes | No | Yes |
| Rare variant preservation | Bit-perfect | Yes | Gradient noise |
| Quantum-proof | Information-theoretic | Depends on algorithm | No |
| Consent enforcement | Protocol-level | Policy-level | Policy-level |
Benchmarks
Performance characteristics measured on Node.js 22, Apple M2.
| Operation | Time | Notes |
|---|---|---|
| Consent validation | <1ms | In-memory check, no I/O |
| 2MB chunk split (2-of-3) | ~66ms | XorIDA + 3× HMAC-SHA256 |
| 2MB chunk reconstruct | ~35ms | XorIDA + 2× HMAC verify + unpad |
| SHA-256 genome hash (100MB) | ~120ms | Web Crypto API |
| Whole genome (100GB, 50K chunks) | ~55 minutes | Single-threaded, no parallelization |
| Whole genome (parallel 10×) | ~6 minutes | 10-core parallelization assumed |
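The whole-genome rows follow from the per-chunk figure. The small calculation below reproduces them, assuming perfectly linear scaling across workers, which is an idealization; real runs will also pay for I/O and scheduling. `estimateSplitMinutes` is an illustrative helper, not part of the package.

```typescript
// Estimate wall-clock minutes to split a genome, given a measured per-chunk time.
// Assumes work divides evenly across workers with no coordination overhead.
function estimateSplitMinutes(chunkCount: number, msPerChunk: number, workers = 1): number {
  if (workers < 1) throw new RangeError('workers must be >= 1');
  return (chunkCount * msPerChunk) / workers / 60000;
}

// 50,000 chunks at ~66 ms each
console.log(estimateSplitMinutes(50000, 66)); // single-threaded
console.log(estimateSplitMinutes(50000, 66, 10)); // 10 workers
```

The single-threaded case works out to 55 minutes and the 10-worker case to 5.5 minutes, which the table rounds to about 6.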
Honest Limitations
What Xgene cannot do, and why.
Partial Reconstruction Not Supported
If a single chunk has fewer than K shares, the entire genome reconstruction fails. For a 100GB genome with 50,000 chunks, all 50,000 chunk groups must have K shares. Missing share for chunk 1 = total failure, even if the other 49,999 chunks are complete. This is intentional — partial genomes create privacy risks (e.g., reconstructing only exome regions while leaving whole genome incomplete).
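Because one short chunk fails the whole reconstruction, it can be worth checking share availability before fetching or processing anything large. A minimal pre-flight check might look like this; the function name, the 0-based chunk indexing, and the input shape (a map from chunk index to available share count) are assumptions for illustration.

```typescript
// Return the indices of chunks that have fewer than `threshold` shares available.
// An empty result means every chunk group can be reconstructed.
function findIncompleteChunks(
  sharesPerChunk: Map<number, number>,
  totalChunks: number,
  threshold: number,
): number[] {
  const incomplete: number[] = [];
  for (let index = 0; index < totalChunks; index++) {
    // A chunk with no entry at all counts as having zero shares.
    if ((sharesPerChunk.get(index) ?? 0) < threshold) {
      incomplete.push(index);
    }
  }
  return incomplete;
}
```

Reporting the specific indices lets an operator re-request only the missing shares instead of retrying the entire fetch.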
Consent Re-Validation on Reconstruction
The package validates consent before splitting but does not re-validate on reconstruction. The manifest includes the original consent record for audit purposes, but reconstruction logic does not check if consent has expired or been revoked. This is the data controller's responsibility — typically enforced at the API layer that calls reconstructGenome().
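A data controller could enforce this at the API layer with a guard that runs before reconstructGenome() is called. The sketch below is one hypothetical form of that guard: `subjectId` and `expiresAt` follow the ConsentRecord example in this document, while the revocation set and the function name are assumptions about the surrounding application.

```typescript
// Fields taken from the consent record stored in the manifest.
interface ConsentWindow {
  subjectId: string;
  expiresAt?: string; // ISO 8601; absent means no expiry was recorded
}

// Returns true only if consent is neither revoked nor expired at `now`.
function isConsentCurrent(
  consent: ConsentWindow,
  revokedSubjects: ReadonlySet<string>,
  now: Date = new Date(),
): boolean {
  if (revokedSubjects.has(consent.subjectId)) return false;
  if (consent.expiresAt === undefined) return true;
  const expiry = Date.parse(consent.expiresAt);
  // Fail closed: an unparseable timestamp is treated as expired.
  return !Number.isNaN(expiry) && now.getTime() < expiry;
}
```

Failing closed on malformed timestamps keeps a corrupted manifest from silently authorizing reconstruction.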
No Built-In Transport
The package does not handle HTTPS POST to jurisdiction endpoints. The splitGenome() function returns shares and a manifest. Distributing shares to jurisdiction endpoints is the caller's responsibility. This is intentional — different deployments have different network topologies (direct HTTPS, message queues, S3 presigned URLs, etc.).
Memory Usage for Large Genomes
A 100GB genome with 2MB chunks requires ~30MB of manifest metadata in memory (50,000 chunks × ~600 bytes per chunk metadata). For 200GB whole-genome sequencing, manifest overhead is ~60MB. This fits in Node.js default heap, but applications processing thousands of genomes concurrently should monitor memory usage.
No Homomorphic Computation
Xgene provides jurisdiction-isolated storage, not homomorphic computation. You cannot run GWAS directly on split shares. To analyze the genome, you must reconstruct it (requires K shares). If your threat model includes "authorized analyst must never see plaintext," consider secure multi-party computation or trusted execution environments in addition to jurisdiction splitting.
Does not solve: computation on encrypted data, insider threats at reconstruction time, or consent revocation after shares are distributed.
Jurisdiction Mapping
Common regulatory frameworks mapped to ISO 3166-1 country codes.
| Jurisdiction Code | Country/Region | Regulation | Genomic Data Provisions |
|---|---|---|---|
| DE / FR / IT / ES | EU Member States | GDPR | Article 9 special category data, Article 5 data minimization, Article 46 cross-border transfers |
| US | United States | HIPAA | Protected Health Information (PHI), 45 CFR Part 160/164, genetic information as PHI |
| CA | Canada | PIPEDA | Sensitive personal information, cross-border transfers require consent |
| GB | United Kingdom | UK GDPR | Post-Brexit GDPR equivalent, genetic data special category |
| JP | Japan | APPI | Sensitive personal information requiring opt-in consent |
| CH | Switzerland | FADP | Genetic data as sensitive personal data, adequacy with EU |
| AU | Australia | Privacy Act 1988 | Sensitive information requiring higher consent standard |
The Jurisdiction interface accepts any ISO 3166-1 code, for example Singapore (SG) under PDPA, South Korea (KR) under PIPA, or Israel (IL) under the Privacy Protection Law. The package does not enforce regulatory compliance; it provides the infrastructure for jurisdiction-aware splitting.
Full API Surface
Complete function signatures and types.
Error Taxonomy
Complete error code reference with descriptions and resolution guidance.
| Code | Message | Resolution |
|---|---|---|
| INVALID_CONFIG | Configuration validation failed | Ensure at least 2 jurisdictions, threshold ≥ 2, threshold ≤ N, chunk size > 0 |
| CONSENT_INVALID | Consent record missing required fields | Provide subjectId, purpose, grantedBy, grantedAt, authorizedJurisdictions |
| JURISDICTION_MISMATCH | Target jurisdictions not covered by consent | Update consent.authorizedJurisdictions to include all config.jurisdictions codes |
| SPLIT_FAILED | XorIDA split operation failed | Check input data validity, ensure chunk size divisible by share count |
| RECONSTRUCT_FAILED | XorIDA reconstruction or unpadding failed | Verify shares are from same split operation, check share integrity |
| HMAC_FAILED | HMAC verification failed after reconstruction | Shares have been tampered with or are from different split operations |
| INSUFFICIENT_SHARES | Fewer than threshold shares for chunk group | Provide at least K shares for every chunk group |
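The INVALID_CONFIG resolution guidance translates directly into a pre-flight check that callers can run before invoking splitGenome(). The sketch below encodes only the four rules listed in the table; `ConfigLike` and `configProblems` are illustrative names, and the package's own validation may check more than this.

```typescript
// Minimal config shape needed for the checks; mirrors the examples in this document.
interface ConfigLike {
  jurisdictions: { code: string }[];
  threshold: number;
  chunkSize?: number; // optional; the package default is 2MB
}

// Collect every rule violation so the caller can report them all at once.
function configProblems(config: ConfigLike): string[] {
  const problems: string[] = [];
  const n = config.jurisdictions.length;
  if (n < 2) {
    problems.push('at least 2 jurisdictions are required');
  }
  if (!Number.isInteger(config.threshold) || config.threshold < 2) {
    problems.push('threshold must be an integer >= 2');
  }
  if (config.threshold > n) {
    problems.push('threshold must not exceed the number of jurisdictions');
  }
  if (config.chunkSize !== undefined && !(config.chunkSize > 0)) {
    problems.push('chunk size must be > 0');
  }
  return problems;
}
```

Returning a list instead of stopping at the first failure makes configuration errors quicker to fix in one pass.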
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
npm install @private.me/genomicshare
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While Xgene is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.