Xgene: Cross-Jurisdiction Genomic Data Sharing

Section 01

Executive Summary

Cross-border genomic research requires data to comply with GDPR, HIPAA, PIPEDA, and APPI simultaneously. Xgene makes this mathematically possible by splitting genomes across jurisdictions so that no single regulator or court order can reconstruct the data.

Two functions cover the core workflow. splitGenome() takes a VCF, FASTQ, or BAM file and a consent record, validates that the consent covers every target jurisdiction, splits the genome into XorIDA shares (2MB chunks by default), and returns the shares with a manifest for delivery to jurisdiction-specific endpoints. reconstructGenome() collects at least the threshold number of shares from authorized jurisdictions, verifies HMAC integrity, reconstructs the original genome, and checks it against the SHA-256 hash recorded in the manifest.

When an attacker compromises a single jurisdiction's storage, they learn exactly zero bits about the genome — not computationally hard to break, but information-theoretically impossible. XorIDA operates over GF(2) with threshold K. Any K-1 or fewer shares reveal zero information, regardless of computing power or quantum capability.

Consent validation is mandatory and enforced before any cryptographic operation. The system rejects split requests if consent does not explicitly authorize all target jurisdictions. This ensures regulatory compliance at the protocol level, not just policy level.

Section 02

Developer Experience

Xgene provides structured error codes and Result-based error handling to make cross-border genomic sharing safe and debuggable.

Error Categories

Xgene organizes 7 error codes across 4 categories for systematic handling:

Category	Codes	When
Configuration	INVALID_CONFIG	Fewer than 2 jurisdictions, threshold out of range, invalid chunk size
Consent	CONSENT_INVALID, JURISDICTION_MISMATCH	Missing consent fields, jurisdictions not covered by authorization
Cryptographic	SPLIT_FAILED, RECONSTRUCT_FAILED, HMAC_FAILED	XorIDA split/reconstruction failure, integrity check failure
Threshold	INSUFFICIENT_SHARES	Fewer than K shares available for chunk reconstruction
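
One way to make the category table machine-usable is a lookup keyed by error code. The sketch below mirrors the table above; the ErrorCategory type and the categoryOf helper are illustrative names and are not part of the package API.

```typescript
// Error codes exactly as listed in the table above.
type GenomicShareErrorCode =
  | 'INVALID_CONFIG'
  | 'CONSENT_INVALID'
  | 'JURISDICTION_MISMATCH'
  | 'SPLIT_FAILED'
  | 'RECONSTRUCT_FAILED'
  | 'HMAC_FAILED'
  | 'INSUFFICIENT_SHARES';

// Hypothetical category type, introduced only for this illustration.
type ErrorCategory = 'configuration' | 'consent' | 'cryptographic' | 'threshold';

const CATEGORY_BY_CODE: Record<GenomicShareErrorCode, ErrorCategory> = {
  INVALID_CONFIG: 'configuration',
  CONSENT_INVALID: 'consent',
  JURISDICTION_MISMATCH: 'consent',
  SPLIT_FAILED: 'cryptographic',
  RECONSTRUCT_FAILED: 'cryptographic',
  HMAC_FAILED: 'cryptographic',
  INSUFFICIENT_SHARES: 'threshold',
};

// Look up the category for a code, e.g. to choose a logging or alerting path.
function categoryOf(code: GenomicShareErrorCode): ErrorCategory {
  return CATEGORY_BY_CODE[code];
}
```

Because the record is typed over the full union of codes, the compiler flags the table if a code is added to the union without a category.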

Result Pattern

All operations return Result<T, GenomicShareError>. Every error includes a machine-readable code and human-readable message.

Result-based error handling

const result = await splitGenome(genomeData, metadata, config, consent);

if (!result.ok) {
  switch (result.error.code) {
    case 'CONSENT_INVALID':
      console.error('Missing required consent fields');
      break;
    case 'JURISDICTION_MISMATCH':
      console.error('Consent does not cover target jurisdictions');
      break;
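    case 'SPLIT_FAILED':
      console.error('XorIDA split operation failed');
      break;
    default:
      // INVALID_CONFIG and any other code: fall back to the error's own message
      console.error(result.error.message);
  }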
  return;
}

// Use result.value.manifest and result.value.shares
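
For readers unfamiliar with the pattern, the following is a minimal sketch of what a Result type of this kind could look like. The shape is inferred from the ok, value, and error fields used in the snippet above; the package's actual type definitions may differ, and unwrapOrThrow is a hypothetical helper.

```typescript
// Assumed shape, inferred from the fields used in the example above.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Assumed error shape: a machine-readable code plus a human-readable message.
interface GenomicShareError {
  code: string;
  message: string;
}

// Hypothetical helper: convert a failed Result into a thrown Error
// for call sites that prefer exceptions.
function unwrapOrThrow<T>(result: Result<T, GenomicShareError>): T {
  if (result.ok === false) {
    throw new Error(`${result.error.code}: ${result.error.message}`);
  }
  return result.value;
}
```

Keeping failures in the return value means the compiler forces every call site to look at ok before touching value, which is the main reason to prefer this over exceptions in a consent-gated workflow.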

Section 03

The Problem

Sharing genomic data across jurisdictions can violate privacy laws, and single-location storage creates a permanent breach risk for some of the most sensitive personal data that exists.

Genomic data is irreversible. Unlike passwords, you cannot change your genome after a breach. A single compromise exposes lifelong health risks, ancestry, paternity, predispositions to disease, and information about biological relatives who never consented.

Cross-border research is legally paralyzed. EU biobanks cannot freely share with US institutions under the GDPR's cross-border transfer rules (Chapter V, Articles 44-50). Canadian researchers face PIPEDA restrictions. Japanese pharmacogenomics projects cannot readily pool data across APPI and HIPAA jurisdictions. Each regime restricts how data may leave its borders.

Centralized storage is a single point of failure. The 2023 23andMe breach exposed profile data for about 6.9 million users. The 2018 MyHeritage incident leaked email addresses and hashed passwords for 92 million accounts. GEDmatch profiles have been searched by law enforcement without warrants. Every centralized genomic database is a catastrophic breach waiting to happen.

Existing solutions fail. Encryption-at-rest still leaves the database server with full access to plaintext during queries. Federated learning leaks information through gradient updates. Homomorphic encryption is too slow for genome-scale computation. Differential privacy adds noise that destroys rare variant signals critical for precision medicine.

Approach	Cross-Border	Single-Location Risk	Query Performance	Rare Variants
Centralized DB	Violates GDPR/PIPEDA	Single breach = total loss	Fast	Preserved
Federated Learning	Possible	Partial (gradient leakage)	Slow	Gradient noise
Homomorphic Encryption	Possible	Encrypted	500-5000x slower	Preserved
Differential Privacy	Possible	Database sees plaintext	Fast	Noise destroys signal
Xgene	Compliant	Zero-knowledge split	~66ms per 2MB chunk split; analysis requires reconstruction	Bit-perfect reconstruction

Section 04

Real-World Use Cases

Six scenarios where cross-jurisdiction genomic splitting enables research that is currently impossible.

Rare Disease Research

Multi-Country Rare Variant Pooling

Orphan diseases require pooling genomic data from multiple countries to reach statistical power. Xgene splits genomes across EU, US, and CA jurisdictions — each sees only unintelligible shares.

3-of-5 threshold, GDPR+HIPAA+PIPEDA

Pharmacogenomics

Cross-Border Drug Response Studies

Pharmacogenomic trials require ethnic diversity impossible within single jurisdictions. Japanese APPI, EU GDPR, and US HIPAA compliance achieved via jurisdiction-aware splitting.

2-of-3 threshold, population stratification preserved

Population Health

Biobank Federation

National biobanks (UK Biobank, All of Us, FinnGen) cannot legally share complete genomes. With XorIDA splitting, each biobank retains only non-reconstructable shares, and analysis runs on a genome reconstructed from K authorized shares.

K-of-N threshold per protocol, consent-tracked

Defense

Military Biodefense Genomics

Defense genomic surveillance requires splitting classified genomic threat data across allied nations with zero single-nation reconstruction. NATO biodefense collaboration without centralized exposure.

3-of-5 threshold, JWICS-compliant distribution

Enterprise

Multi-Jurisdictional Clinical Trials

Pharma clinical trials collect genomic data across 50+ countries. Xgene ensures each trial site retains only jurisdiction-compliant shares while allowing pooled variant analysis after authorized reconstruction.

2MB chunks, VCF format, per-site consent tracking

Finance

Insurance Genomic Privacy

Life insurance genomic risk scoring requires genetic data that cannot be centralized due to GDPR genetic data protections. Xgene keeps the underlying data split across jurisdictions, so scoring runs only on an authorized reconstruction rather than against a central database.

EU+UK jurisdiction split, Article 9 special category data

Section 05

Solution Architecture

Four-layer architecture: consent validation, XorIDA splitting, jurisdiction routing, and integrity verification.

Consent Layer

Mandatory

Validates before any crypto operation

Checks jurisdiction authorization list

Records grantedBy, grantedAt, purpose

Optional expiration enforcement

Splitting Layer

XorIDA GF(2)

2MB default chunk size

PKCS#7 padding for alignment

HMAC-SHA256 per share

Share indices 1-based

Jurisdiction Routing

ISO 3166-1

Share[i] → Jurisdiction[i]

Caller delivers each share to its jurisdiction endpoint (e.g., HTTPS POST)

GDPR, HIPAA, PIPEDA, APPI aware

Integrity Layer

SHA-256

Hash original genome before split

Store hash in manifest

Verify after reconstruction

Reject if mismatch
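
The integrity layer's hash-then-verify steps can be sketched with Node's built-in crypto module. This is an illustration under assumptions: the helper names are hypothetical, the hash is hex-encoded here for readability, and the benchmarks in this document mention the Web Crypto API, so the package itself may compute the digest differently.

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 digest of the full genome bytes.
function sha256Hex(data: Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

// Hypothetical check mirroring "verify after reconstruction, reject if mismatch".
function matchesManifestHash(reconstructed: Uint8Array, manifestHash: string): boolean {
  return sha256Hex(reconstructed) === manifestHash;
}
```

For genomes too large to hold in memory, the same Hash object accepts repeated update() calls, so the digest can be computed chunk by chunk before calling digest() once at the end.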

Section 05a

Consent Enforcement

Consent validation happens before any cryptographic operation. This is not a checkbox to skip — it is a protocol-level requirement.

Consent record structure

const consent: ConsentRecord = {
  subjectId: 'SUBJ-001',
  purpose: 'rare-disease-research',
  grantedBy: 'patient',
  grantedAt: new Date().toISOString(),
  authorizedJurisdictions: ['DE', 'CA', 'JP'],
  expiresAt: '2027-04-10T00:00:00Z', // Optional
};

The authorizedJurisdictions array must include every jurisdiction code in the split configuration. If the config specifies shares for DE, CA, and JP, but consent only authorizes DE and CA, the operation fails with JURISDICTION_MISMATCH.

Consent is enforced in code

This is not a UI warning. The splitGenome() function calls validateConsent() and returns an error result if validation fails. No shares are created if consent is invalid. The genome never touches the XorIDA splitting function until consent passes.
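
The coverage rule described above can be sketched as a small pure function. This is not the package's validateConsent() implementation: checkConsent and ConsentCheck are hypothetical names, and treating an expired record as CONSENT_INVALID is an assumption, since the document does not say which code an expired consent produces.

```typescript
// Fields as shown in the consent record example above.
interface ConsentRecord {
  subjectId: string;
  purpose: string;
  grantedBy: string;
  grantedAt: string;
  authorizedJurisdictions: string[];
  expiresAt?: string;
}

// Hypothetical result shape for this sketch.
interface ConsentCheck {
  ok: boolean;
  code?: 'CONSENT_INVALID' | 'JURISDICTION_MISMATCH';
  missing: string[]; // target jurisdictions not covered by the consent
}

function checkConsent(consent: ConsentRecord, targets: string[], now: Date = new Date()): ConsentCheck {
  const required = [consent.subjectId, consent.purpose, consent.grantedBy, consent.grantedAt];
  const fieldsPresent = required.every((f) => typeof f === 'string' && f.length > 0);
  if (!fieldsPresent || !Array.isArray(consent.authorizedJurisdictions)) {
    return { ok: false, code: 'CONSENT_INVALID', missing: [] };
  }
  // Assumption: an expired or unparseable expiresAt is reported as CONSENT_INVALID.
  if (consent.expiresAt !== undefined) {
    const expiry = new Date(consent.expiresAt).getTime();
    if (Number.isNaN(expiry) || expiry <= now.getTime()) {
      return { ok: false, code: 'CONSENT_INVALID', missing: [] };
    }
  }
  // Every target jurisdiction must appear in the authorization list.
  const authorized = new Set(consent.authorizedJurisdictions);
  const missing = targets.filter((code) => !authorized.has(code));
  if (missing.length > 0) {
    return { ok: false, code: 'JURISDICTION_MISMATCH', missing };
  }
  return { ok: true, missing: [] };
}
```

With consent for DE and CA and a split targeting DE, CA, and JP, this sketch reports JURISDICTION_MISMATCH with JP as the uncovered jurisdiction, which matches the failure described above.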

Consent Audit Trail

Every GenomicManifest includes the full consent record. Reconstruction does not re-validate consent (that is the data controller's responsibility), but the manifest provides an immutable record of the original authorization.

Section 05b

Chunking Strategy

XorIDA operates on fixed-size chunks. Large genomes (whole genome sequencing = 100-200GB) are split into 2MB chunks by default, each independently split into shares.

Why 2MB? The default is a trade-off between memory efficiency and parallelization. Smaller chunks (e.g., 512KB) increase manifest size and reduce per-chunk throughput. Larger chunks (e.g., 16MB) increase memory pressure and reduce parallelization opportunities. 2MB fits comfortably in Node.js default heap and allows ~100 parallel chunks on typical research servers.

Chunk Size	100GB Genome	Manifest Overhead	Parallelization
512 KB	200,000 chunks	~120MB manifest	Excellent
2 MB (default)	50,000 chunks	~30MB manifest	Good
16 MB	6,250 chunks	~3.8MB manifest	Fair

Each chunk is independently split. The manifest contains per-chunk metadata: chunk index, total chunks, original size, data hash (SHA-256 of the full genome, not per-chunk). Shares for chunk N are stored with chunkIndex: N in the GenomicShare structure.
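
The figures in the table can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the function names are hypothetical, and the 600-byte per-chunk metadata figure is the approximation quoted elsewhere in this document. Note that the table's round numbers correspond to decimal units (100 × 10^9 bytes divided by 2 × 10^6 bytes); with binary units (100 GiB and 2 MiB) the count is 51,200 rather than 50,000.

```typescript
// Approximate per-chunk manifest metadata size quoted in this document.
const BYTES_PER_CHUNK_METADATA = 600;

// Number of chunks needed to cover the genome; the last chunk may be short.
function chunkCount(genomeBytes: number, chunkBytes: number): number {
  if (chunkBytes <= 0) throw new RangeError('chunkBytes must be positive');
  return Math.ceil(genomeBytes / chunkBytes);
}

// Rough manifest size estimate in bytes.
function manifestBytes(genomeBytes: number, chunkBytes: number): number {
  return chunkCount(genomeBytes, chunkBytes) * BYTES_PER_CHUNK_METADATA;
}

// Zero-copy views over the input, one per chunk, in order.
function chunkViews(data: Uint8Array, chunkBytes: number): Uint8Array[] {
  if (chunkBytes <= 0) throw new RangeError('chunkBytes must be positive');
  const views: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkBytes) {
    // subarray clamps the end index, so the final view may be shorter.
    views.push(data.subarray(offset, offset + chunkBytes));
  }
  return views;
}
```

chunkViews uses subarray rather than slice so that chunking a large buffer does not double its memory footprint.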

Custom chunk size

const config: GenomicShareConfig = {
  jurisdictions: [
    { code: 'DE', name: 'Germany', regulation: 'GDPR', endpoint: 'https://de.store' },
    { code: 'US', name: 'United States', regulation: 'HIPAA', endpoint: 'https://us.store' },
  ],
  threshold: 2,
  chunkSize: 8 * 1024 * 1024, // 8MB for reduced manifest size
};

Reconstruction requires all chunks

Reconstruction fails if any chunk has fewer than K shares. For a 100GB genome with 50,000 chunks (2MB each), all 50,000 chunk groups must have at least K shares. A missing share for chunk 42,731 prevents full genome reconstruction even if the other 49,999 chunks are complete. This is an intentional fail-closed design — partial genome reconstruction is not supported.
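
A caller can run the same fail-closed check before attempting reconstruction, to report which chunks are short of shares. The sketch below is an assumption-laden illustration: findIncompleteChunks and the shareIndex field name are hypothetical, and chunk indices are assumed to be 0-based array positions.

```typescript
// Minimal share descriptor for this sketch; field names are assumptions.
interface ShareRef {
  chunkIndex: number;
  shareIndex: number;
}

// Return the indices of all chunks that have fewer than `threshold` distinct shares.
// An empty result means every chunk group can be reconstructed.
function findIncompleteChunks(
  sharesByChunk: ShareRef[][],
  totalChunks: number,
  threshold: number,
): number[] {
  const incomplete: number[] = [];
  for (let chunk = 0; chunk < totalChunks; chunk++) {
    const group = sharesByChunk[chunk] ?? [];
    // Count distinct share indices so a duplicated share is not counted twice.
    const distinct = new Set(group.map((s) => s.shareIndex));
    if (distinct.size < threshold) incomplete.push(chunk);
  }
  return incomplete;
}
```

Reporting the full list of short chunks, rather than stopping at the first, lets an operator request all missing shares from the relevant jurisdictions in one round.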

Section 06

Integration Patterns

Three integration patterns for different genomic data workflows.

Biobank Federation

Multi-country biobank split

import * as fs from 'node:fs';
import { splitGenome } from '@private.me/genomicshare';

// readFileSync returns a Buffer, which is a Uint8Array
const vcfData = fs.readFileSync('patient-123.vcf');

const config = {
  jurisdictions: [
    { code: 'GB', name: 'UK', regulation: 'GDPR', endpoint: 'https://uk-biobank.store' },
    { code: 'US', name: 'US', regulation: 'HIPAA', endpoint: 'https://all-of-us.store' },
    { code: 'FI', name: 'Finland', regulation: 'GDPR', endpoint: 'https://finngen.store' },
  ],
  threshold: 2,
};

const consent = {
  subjectId: 'PATIENT-123',
  purpose: 'cardiovascular-gwas',
  grantedBy: 'patient',
  grantedAt: new Date().toISOString(),
  authorizedJurisdictions: ['GB', 'US', 'FI'],
};

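// metadata: a GenomicMetadata value prepared by the caller (fields not shown here)
const result = await splitGenome(vcfData, metadata, config, consent);
if (result.ok) {
  // result.value.shares: one share per chunk for UK Biobank, All of Us, and FinnGen
  // Each holds only an unintelligible XorIDA share; delivery is the caller's job
}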

Clinical Trial Data Distribution

Per-site jurisdiction assignment

// Trial sites in Germany, Canada, Japan
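const siteJurisdictions = [
  { code: 'DE', name: 'Germany', regulation: 'GDPR', endpoint: 'https://site-berlin.trial' },
  { code: 'CA', name: 'Canada', regulation: 'PIPEDA', endpoint: 'https://site-toronto.trial' },
  { code: 'JP', name: 'Japan', regulation: 'APPI', endpoint: 'https://site-tokyo.trial' },
];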

// Each site gets one share, any 2 reconstruct
const config = { jurisdictions: siteJurisdictions, threshold: 2 };

// Consent must cover all trial sites.
// trialEnrollmentDate and trialEndDate are ISO 8601 strings supplied by the caller.
const consent = {
  subjectId: 'TRIAL-456',
  purpose: 'oncology-phase-2',
  grantedBy: 'patient',
  grantedAt: trialEnrollmentDate,
  authorizedJurisdictions: ['DE', 'CA', 'JP'],
  expiresAt: trialEndDate,
};

Reconstruction for Analysis

Authorized researcher reconstruction

import { reconstructGenome } from '@private.me/genomicshare';

// fetchSharesFromJurisdiction and groupSharesByChunk are application-level helpers,
// not listed in the package API.
// Fetch shares from authorized jurisdictions only
const sharesDE = await fetchSharesFromJurisdiction('DE', manifestId);
const sharesCA = await fetchSharesFromJurisdiction('CA', manifestId);

// Group shares by chunk index
const sharesByChunk = groupSharesByChunk([sharesDE, sharesCA]);

const result = await reconstructGenome(manifest, sharesByChunk);
if (result.ok) {
  const genome = result.value; // Uint8Array, SHA-256 verified
  // Run GWAS, variant calling, etc.
} else {
  console.error(result.error.code, result.error.message);
}
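
The groupSharesByChunk helper used above is not defined in this document. One possible sketch follows, assuming each share carries a 0-based chunkIndex field and that the helper receives one array of shares per jurisdiction.

```typescript
// Hypothetical helper: regroup per-jurisdiction share lists into per-chunk groups.
// Position i of the result holds every share collected for chunk i.
function groupSharesByChunk<T extends { chunkIndex: number }>(perJurisdiction: T[][]): T[][] {
  const groups: T[][] = [];
  for (const shares of perJurisdiction) {
    for (const share of shares) {
      if (!Number.isInteger(share.chunkIndex) || share.chunkIndex < 0) {
        throw new RangeError(`invalid chunkIndex: ${share.chunkIndex}`);
      }
      // Fill any gap so every position holds an array (possibly empty).
      while (groups.length <= share.chunkIndex) groups.push([]);
      groups[share.chunkIndex].push(share);
    }
  }
  return groups;
}
```

Chunks for which no jurisdiction returned a share end up as empty groups, so they stay visible to whatever threshold check runs next instead of silently disappearing.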

Section 07

Security Properties

Five layers of protection. Each independently verifiable.

Property	Mechanism	Guarantee
Consent Enforcement	Pre-crypto validation	No shares created without authorization
Information-Theoretic	XorIDA over GF(2)	K-1 shares reveal zero bits
Integrity	HMAC-SHA256 per share	Tampered shares rejected
End-to-End Integrity	SHA-256 genome hash	Reconstruction verified against original
Jurisdiction Isolation	Independent endpoints	No cross-jurisdiction data flow
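
As an illustration of the per-share integrity row, the sketch below tags and verifies share bytes with HMAC-SHA256 using Node's built-in crypto module. How Xgene derives, stores, or distributes the HMAC key is not described in this document, so the key parameter and both function names are assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Compute a 32-byte HMAC-SHA256 tag over the share bytes.
function shareTag(key: Uint8Array, shareBytes: Uint8Array): Buffer {
  return createHmac('sha256', key).update(shareBytes).digest();
}

// Verify a tag in constant time; returns false for any mismatch.
function verifyShareTag(key: Uint8Array, shareBytes: Uint8Array, expectedTag: Uint8Array): boolean {
  const actual = shareTag(key, shareBytes);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return actual.length === expectedTag.length && timingSafeEqual(actual, expectedTag);
}
```

The constant-time comparison avoids leaking how many leading bytes of a forged tag were correct.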

Information-Theoretic Security

XorIDA operates over GF(2) (binary field). With threshold K=2 and N=3 jurisdictions, any single jurisdiction sees a share that is statistically indistinguishable from uniform random noise. This is not "hard to break with current computers": it is impossible to break even with infinite computing power, including quantum computers.

Quantum-proof by construction

Information-theoretic security does not rely on computational hardness assumptions (factoring, discrete log, lattice problems). It is based on entropy: fewer than K shares carry zero Shannon information about the genome. Quantum computers provide no advantage against information-theoretic security.
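
The intuition can be seen in the simplest XOR-based scheme, a 2-of-2 split in which one share is a uniformly random mask and the other is the secret XORed with that mask. This is not the XorIDA algorithm, whose threshold construction is not detailed here; it is a minimal sketch of why a lone share carries no information. All function names are illustrative.

```typescript
import { randomBytes } from 'node:crypto';

// Byte-wise XOR of two equal-length arrays.
function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  if (a.length !== b.length) throw new RangeError('length mismatch');
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// 2-of-2 split: share 1 is a fresh random mask, share 2 is secret XOR mask.
// Each share on its own is uniformly distributed regardless of the secret.
function splitTwoOfTwo(secret: Uint8Array): [Uint8Array, Uint8Array] {
  const mask: Uint8Array = randomBytes(secret.length);
  return [mask, xorBytes(secret, mask)];
}

// Both shares together recover the secret: mask XOR (secret XOR mask) = secret.
function combineTwoOfTwo(share1: Uint8Array, share2: Uint8Array): Uint8Array {
  return xorBytes(share1, share2);
}
```

For any given value of one share, every possible secret of that length is equally likely, because each corresponds to exactly one value of the other share. That is the same argument as the one-time pad, and it holds regardless of the attacker's computing power.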

Comparison to Alternatives

Property	Xgene	Encryption-at-Rest	Federated Learning
Single-location risk	Zero-knowledge split	DB admin sees plaintext	Gradient leakage
Cross-border compliant	Yes	No	Yes
Rare variant preservation	Bit-perfect	Yes	Noise destroys signal
Quantum-proof	Information-theoretic	Depends on algorithm	No
Consent enforcement	Protocol-level	Policy-level	Policy-level

Section 08

Benchmarks

Performance characteristics measured on Node.js 22, Apple M2.

Metric	Value
Per 1KB chunk split	<1ms
Per 1MB chunk split	~33ms
Default chunk size	2MB
npm dependencies	0

Operation	Time	Notes
Consent validation	<1ms	In-memory check, no I/O
2MB chunk split (2-of-3)	~66ms	XorIDA + 3× HMAC-SHA256
2MB chunk reconstruct	~35ms	XorIDA + 2× HMAC verify + unpad
SHA-256 genome hash (100MB)	~120ms	Web Crypto API
Whole genome (100GB, 50K chunks)	~55 minutes	Single-threaded, no parallelization
Whole genome (parallel 10×)	~6 minutes	10-core parallelization assumed

Parallelization opportunity

Each chunk is independently split. Production implementations should parallelize across CPU cores. With 10-core parallelization, a 100GB genome (50,000 × 2MB chunks) splits in approximately 6 minutes. The package does not provide built-in parallelization — use worker threads or cluster module.
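
The 55-minute and 6-minute figures follow directly from the per-chunk timing in the table above. A small sketch of that arithmetic, assuming perfectly even work distribution and no coordination overhead (the function name is illustrative):

```typescript
// Estimate wall-clock minutes to split a genome, given per-chunk cost and worker count.
// Assumes chunks are spread evenly and workers do not contend for resources.
function estimateSplitMinutes(totalChunks: number, msPerChunk: number, workers: number): number {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError('workers must be a positive integer');
  }
  const chunksPerWorker = Math.ceil(totalChunks / workers);
  return (chunksPerWorker * msPerChunk) / 60000;
}

// 50,000 chunks at ~66ms each, as in the benchmark table.
const singleThreaded = estimateSplitMinutes(50000, 66, 1); // 55 minutes
const tenWorkers = estimateSplitMinutes(50000, 66, 10);    // 5.5 minutes, rounded to ~6 above
```

Real deployments will land somewhat above the estimate, since disk reads and share delivery compete with the split itself.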

Section 09

Honest Limitations

What Xgene cannot do, and why.

Partial Reconstruction Not Supported

If any single chunk has fewer than K shares, the entire genome reconstruction fails. For a 100GB genome with 50,000 chunks, all 50,000 chunk groups must have K shares. Too few shares for chunk 1 means total failure, even if the other 49,999 chunks are complete. This is intentional: partial genomes create privacy risks (e.g., reconstructing only exome regions while leaving the whole genome incomplete).

Consent Re-Validation on Reconstruction

The package validates consent before splitting but does not re-validate on reconstruction. The manifest includes the original consent record for audit purposes, but reconstruction logic does not check if consent has expired or been revoked. This is the data controller's responsibility — typically enforced at the API layer that calls reconstructGenome().

No Built-In Transport

The package does not handle HTTPS POST to jurisdiction endpoints. The splitGenome() function returns shares and a manifest. Distributing shares to jurisdiction endpoints is the caller's responsibility. This is intentional — different deployments have different network topologies (direct HTTPS, message queues, S3 presigned URLs, etc.).
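
Since delivery is left to the caller, one way to structure it is to separate the routing decision from the transport. The sketch below assumes the Share[i] → Jurisdiction[i] mapping with 1-based share indices; the type names, field names, and both functions are hypothetical, and the transport is injected so that HTTPS, a message queue, or presigned URLs can be swapped in.

```typescript
// Minimal shapes for this sketch; field names are assumptions.
interface RouteTarget { code: string; endpoint: string; }
interface ShareToSend { chunkIndex: number; shareIndex: number; data: Uint8Array; }
interface Delivery { jurisdiction: string; endpoint: string; share: ShareToSend; }

// Pure routing step: share index i (1-based) goes to jurisdictions[i - 1].
function planDeliveries(shares: ShareToSend[], jurisdictions: RouteTarget[]): Delivery[] {
  return shares.map((share) => {
    const target = jurisdictions[share.shareIndex - 1];
    if (!target) {
      throw new RangeError(`no jurisdiction for share index ${share.shareIndex}`);
    }
    return { jurisdiction: target.code, endpoint: target.endpoint, share };
  });
}

// Transport is supplied by the caller.
type Sender = (delivery: Delivery) => Promise<void>;

// Deliver sequentially and stop at the first failure, so a partial
// distribution is surfaced to the caller rather than ignored.
async function deliverAll(deliveries: Delivery[], send: Sender): Promise<void> {
  for (const delivery of deliveries) {
    await send(delivery);
  }
}
```

A Sender for direct HTTPS could wrap a POST of delivery.share.data to delivery.endpoint, while a queue-based deployment would publish the same payload to a per-jurisdiction topic.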

Memory Usage for Large Genomes

A 100GB genome with 2MB chunks requires ~30MB of manifest metadata in memory (50,000 chunks × ~600 bytes per chunk metadata). For 200GB whole-genome sequencing, manifest overhead is ~60MB. This fits in Node.js default heap, but applications processing thousands of genomes concurrently should monitor memory usage.

No Homomorphic Computation

Xgene provides jurisdiction-isolated storage, not homomorphic computation. You cannot run GWAS directly on split shares. To analyze the genome, you must reconstruct it (requires K shares). If your threat model includes "authorized analyst must never see plaintext," consider secure multi-party computation or trusted execution environments in addition to jurisdiction splitting.

What this solves vs. what it does not

Solves: Cross-border compliance, jurisdiction-isolated storage, catastrophic breach prevention, consent enforcement.
Does not solve: Computation on encrypted data, insider threats at reconstruction time, consent revocation after shares distributed.

Jurisdiction Mapping

Jurisdiction Code	Country/Region	Regulation	Genomic Data Provisions
DE / FR / IT / ES	EU Member States	GDPR	Article 9 special category data, Article 5 data minimization, Article 46 cross-border transfers
US	United States	HIPAA	Protected Health Information (PHI), 45 CFR Part 160/164, genetic information as PHI
CA	Canada	PIPEDA	Sensitive personal information, cross-border transfers require consent
GB	United Kingdom	UK GDPR	Post-Brexit GDPR equivalent, genetic data special category
JP	Japan	APPI	Sensitive personal information requiring opt-in consent
CH	Switzerland	FADP	Genetic data as sensitive personal data, adequacy with EU
AU	Australia	Privacy Act 1988	Sensitive information requiring higher consent standard

Custom jurisdictions supported

The Jurisdiction interface accepts any ISO 3166-1 code. For example, Singapore (SG) under PDPA, South Korea (KR) under PIPA, or Israel (IL) under Privacy Protection Law. The package does not enforce regulatory compliance — it provides the infrastructure for jurisdiction-aware splitting.
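
A custom jurisdiction entry might look like the following sketch. The four fields match the configuration examples in this document; the endpoint URL is a placeholder, and the assumption that codes are two-letter ISO 3166-1 alpha-2 values is inferred from the examples rather than stated by the package. The isAlpha2Code helper is hypothetical.

```typescript
// Fields as used in the configuration examples in this document.
interface Jurisdiction {
  code: string;
  name: string;
  regulation: string;
  endpoint: string;
}

// Syntactic check only: two uppercase letters. It does not confirm that the
// code is actually assigned in ISO 3166-1.
function isAlpha2Code(code: string): boolean {
  return /^[A-Z]{2}$/.test(code);
}

// Example custom jurisdiction; the endpoint is a placeholder.
const singapore: Jurisdiction = {
  code: 'SG',
  name: 'Singapore',
  regulation: 'PDPA',
  endpoint: 'https://sg.store.example',
};
```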

Full ACI Interface

splitGenome(data: Uint8Array, metadata: GenomicMetadata, config: GenomicShareConfig, consent: ConsentRecord): Promise<Result<GenomicSplitResult, GenomicShareError>>

Split genomic data across jurisdictions with consent validation. Returns manifest and shares grouped by chunk.

reconstructGenome(manifest: GenomicManifest, shares: GenomicShare[][]): Promise<Result<Uint8Array, GenomicShareError>>

Reconstruct genome from threshold shares. Verifies HMAC per share and SHA-256 hash after reconstruction.

validateConfig(config: GenomicShareConfig): Result<never, GenomicShareError> | null

Validate configuration before splitting. Returns null if valid, error result otherwise.
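
The rules that validateConfig() is documented to enforce (at least 2 jurisdictions, threshold between 2 and N, chunk size greater than 0) can be sketched as follows. This is not the package's implementation: configProblems returns a list of human-readable problems rather than a Result, and the integer checks are an added assumption.

```typescript
// Minimal config shape for this sketch.
interface ConfigLike {
  jurisdictions: { code: string }[];
  threshold: number;
  chunkSize?: number;
}

// Collect every rule violation; an empty array means the config passes.
function configProblems(config: ConfigLike): string[] {
  const problems: string[] = [];
  const n = config.jurisdictions.length;
  if (n < 2) {
    problems.push('at least 2 jurisdictions are required');
  }
  if (!Number.isInteger(config.threshold) || config.threshold < 2 || config.threshold > n) {
    problems.push(`threshold must be an integer with 2 <= threshold <= ${n}`);
  }
  // chunkSize is optional; the 2MB default applies when it is omitted.
  if (config.chunkSize !== undefined) {
    if (!Number.isInteger(config.chunkSize) || config.chunkSize <= 0) {
      problems.push('chunkSize must be a positive integer');
    }
  }
  return problems;
}
```

Collecting all problems at once is a convenience for configuration UIs; the package itself reports a single INVALID_CONFIG error.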

validateConsent(consent: ConsentRecord, jurisdictions: string[]): Result<never, GenomicShareError> | null

Validate consent covers all target jurisdictions. Returns null if valid, error result otherwise.

DEFAULT_CHUNK_SIZE: number

Constant: 2MB (2 * 1024 * 1024 bytes). Used when config.chunkSize is not specified.

Error Taxonomy

Code	Message	Resolution
INVALID_CONFIG	Configuration validation failed	Ensure at least 2 jurisdictions, threshold ≥ 2, threshold ≤ N, chunk size > 0
CONSENT_INVALID	Consent record missing required fields	Provide subjectId, purpose, grantedBy, grantedAt, authorizedJurisdictions
JURISDICTION_MISMATCH	Target jurisdictions not covered by consent	Update consent.authorizedJurisdictions to include all config.jurisdictions codes
SPLIT_FAILED	XorIDA split operation failed	Check input data validity, ensure chunk size divisible by share count
RECONSTRUCT_FAILED	XorIDA reconstruction or unpadding failed	Verify shares are from same split operation, check share integrity
HMAC_FAILED	HMAC verification failed after reconstruction	Shares have been tampered with or are from different split operations
INSUFFICIENT_SHARES	Fewer than threshold shares for chunk group	Provide at least K shares for every chunk group
