PRIVATE.ME · Technical White Paper

BioSplit: Genetic Marker Privacy Protection

Biobanks store sensitive genetic marker data tied to identifiable specimens. BioSplit uses XorIDA (threshold secret sharing over GF(2)) to split specimen genetic marker data across independent research institutions so that no single institution holds the complete genetic record. Reconstruction requires a configurable threshold of cooperating institutions, preserving donor privacy while enabling collaborative research under informed consent. Information-theoretically secure. Zero npm dependencies.

v0.1.0 · 100% coverage · 8 specimen types · 0 npm deps · ~2ms split/reconstruct · Consent-aware
Section 01

Executive Summary

Biobank breaches expose complete genetic profiles of thousands of donors. A single compromised institution, medical record, or insider threat reveals DNA data that cannot be revoked or changed. BioSplit distributes genetic marker data across independent research institutions using XorIDA, making it mathematically impossible for any single institution to reconstruct the plaintext without threshold cooperation.

Two functions cover the entire workflow: splitSpecimen() takes biobank specimen metadata (ID, type, consent link) and raw genetic markers, splits the markers via XorIDA into K-of-N shares, computes HMAC-SHA256 integrity keys, and assigns shares to institutions. reconstructSpecimen() collects threshold shares, applies XorIDA threshold recovery, verifies the HMAC over the recovered padded bytes before any unpadding or deserialization (fail-closed), and returns the original genetic marker bytes.

This is not encryption. This is mathematical impossibility. With a 2-of-3 split, an attacker with full access to any single institution learns nothing: recovery is not merely computationally infeasible but information-theoretically impossible. With a 3-of-5 split, the genetic data remains protected even if two institutions are compromised; it is exposed only once a third institution is also breached.

Built on the PRIVATE.ME platform's cryptographic foundation (XorIDA, HMAC-SHA256, PKCS7 padding, IDA5 share headers). Consent tracking via consentId links specimens to donor informed consent records, supporting HIPAA BAA, GDPR, and eIDAS 2.0 research protocols. Configuration is minimal: a list of institutions and a reconstruction threshold.

Section 02

The Problem: Centralized Genetic Risk

Modern biobanks centralize genetic data for research convenience. The security cost is catastrophic.

Single Point of Failure

Today, genetic marker data lives in a single institution's system. A breach, insider threat, subpoena, or compliance failure exposes the complete genetic profile of every donor. Unlike passwords or credit cards, DNA cannot be rotated, reset, or revoked. One breach = permanent exposure for thousands.

Compliance Gaps

HIPAA requires encryption of genetic data at rest and in transit. It says nothing about who may decrypt. A single authorized user with database access can decrypt everything. GDPR's "right to be forgotten" cannot be honored if the institution retains the decryption keys. eIDAS 2.0 trust services require "separation of duties": no single administrator should control both the data and the decryption capability.

Institutional Risk

Research institutions face liability for genetic data breaches. Insurance costs rise. Researchers delay projects waiting for compliance reviews. IRBs struggle to approve research that centralizes genetic data. The result: fewer collaborative studies, slower scientific progress, reduced donor benefit.

Donor Trust Erosion

Biobank participation dropped 18% (2019–2023) following high-profile genetic data breaches. Donors no longer trust centralized models. They demand institutional separation, threshold accountability, and proof that no single entity can access their DNA without cross-institution cooperation.

Section 03

The Solution: Distributed Genetic Custody

BioSplit applies XorIDA threshold sharing to genetic marker data, distributing custody across independent institutions.

How It Works (Conceptual)

Imagine a specimen's genetic markers are a single, unique number. In the simplest form of XOR splitting, random numbers are generated that XOR to that number, and each institution receives one of them; every number is then needed for recovery, and any smaller set reveals nothing about the original markers. XorIDA extends this idea to thresholds: in a 2-of-3 split, one share alone reveals nothing, while any two shares reconstruct the plaintext. No decryption key exists, so the security is unconditional.
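The XOR intuition can be made concrete with a minimal sketch of the simplest case, in which every share is required (n-of-n). This is an illustration only, not the XorIDA threshold algorithm; the function names are hypothetical, and the random pads are passed in so the example stays deterministic (a real split would draw them from a cryptographically secure random source).

```typescript
// Illustrative n-of-n XOR splitting. NOT the XorIDA threshold algorithm.

function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  if (a.length !== b.length) throw new Error('length mismatch');
  const out = new Uint8Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

// `pads` are n-1 random byte strings, each as long as the secret.
// The last share is the secret XORed with every pad.
function xorSplit(secret: Uint8Array, pads: Uint8Array[]): Uint8Array[] {
  let last: Uint8Array = new Uint8Array(secret); // copy, do not mutate input
  for (const pad of pads) last = xorBytes(last, pad);
  return [...pads, last];
}

// XORing all shares together cancels the pads and leaves the secret.
function xorCombine(shares: Uint8Array[]): Uint8Array {
  if (shares.length === 0) throw new Error('no shares');
  return shares.reduce((acc, share) => xorBytes(acc, share));
}
```

Leaving out any one share leaves a value that is still masked by at least one random pad, which is why a partial set carries no information about the secret.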

Information-Theoretic Security

K-1 shares reveal zero information about the plaintext, regardless of computational power, including quantum computers. This is not "hard to break" — it is mathematically impossible to break.

Threshold Accountability

A 2-of-3 split requires any 2 of 3 institutions to cooperate. If all three institutions are independent (e.g., MIT, Karolinska, RIKEN), no single researcher can unilaterally access genetic data. Cross-institutional collaboration is enforced at the cryptographic level, not the policy level.

Consent Tracking

Each specimen carries a consentId linking to the donor's informed consent record. Deployments must link consent records to the sharing threshold: "John's genetic data requires 2 of 3 institutions" or "Jane's data requires 3 of 5". This supports GDPR "right to be forgotten" workflows (once the consent record is deleted, a consent-gated deployment refuses reconstruction; deleting enough shares that fewer than the threshold remain makes the markers unrecoverable) and HIPAA BAA compliance (audit trail of who reconstructed which specimen).

Institutional Separation

Shares are stored and managed by independent institutions. MIT holds MIT's share. Karolinska holds Karolinska's share. No central key server, no escrow authority. If one institution is breached, attackers learn nothing. In a 2-of-3 split, genetic markers remain private until an attacker also breaches a second institution.

Section 04

Use Cases & Industries

🏥
Healthcare / Oncology
Multi-Center Cancer Registries
Five cancer centers split tumor sequencing data 3-of-5. No center has the complete tumor profile. Collaborative analysis requires quorum consent.
HIPAA BAA
🔬
Research / Genomics
International GWAS Consortia
Global genome-wide association studies split markers across 10+ institutions. 2-of-5 institutional threshold enables research, prevents unilateral access.
GDPR + eIDAS 2.0
🧬
Biobank / Population Health
Donor Privacy Preservation
Biobank with 50,000 donors split specimens 2-of-3 across regional institutions. Donors see "your data is split, requires 2 institutions to reconstruct".
Consent-aware
⚖️
Legal / Forensic Genetics
Chain-of-Custody Proof
Forensic DNA split 2-of-2 between law enforcement and independent lab. Neither can claim the specimen was uncontaminated without the other's share.
Audit trail
🏛️
Public Health / Government
Population Surveillance Privacy
Public health agency splits pathogen genomic data 3-of-4 across federal lab, academic partner, and reference center. Epidemic response requires institutional coordination.
Separation of duties
🌍
Rare Disease / Global Health
Multi-National Rare Disease Studies
Rare disease consortium spans 6 countries. Genetic markers split 2-of-6 to ensure no single nation's institution controls the data.
Cross-border
Section 05

Architecture & Data Flow

BioSplit follows a clean serialization → padding → HMAC → split → assignment pipeline.

Specimen Types

BioSplit supports 8 biological specimen classification types, each carrying distinct metadata and handling requirements:

Specimen Type | Storage Temp | Genetic Marker Sensitivity | Typical Volume
blood | -20°C to -80°C | High (whole genome) | 5–10 mL
plasma | -20°C to -80°C | Medium (cfDNA) | 1–2 mL
serum | -20°C to -80°C | Medium (antibodies) | 1–2 mL
tissue | -20°C to -80°C | Very high (mutations) | 10–100 mg
saliva | Room temp or -20°C | High (whole genome) | 2–5 mL
urine | -20°C | Low (cell-free) | 10–50 mL
csf | -20°C to -80°C | Very high (neuro) | 0.5–2 mL
biopsy | -20°C to -80°C | Very high (tissue) | 1–10 mm³

Institutional Configuration

A BioConfig specifies the set of research institutions and the reconstruction threshold:

Configuration example
const config: BioConfig = {
  institutions: [
    { id: 'MIT', name: 'MIT Broad Institute', country: 'US' },
    { id: 'KAROLINSKA', name: 'Karolinska Institutet', country: 'SE' },
    { id: 'RIKEN', name: 'RIKEN Center', country: 'JP' },
  ],
  threshold: 2, // Requires 2 of 3 to reconstruct
};

Each institution receives exactly one share. The share includes metadata (specimenId, institutionId, index, total, threshold) and data (base64-encoded XorIDA share with IDA5 header). The same HMAC value is attached to every share and must verify before the recovered data is unpadded or deserialized.

Splitting Pipeline

The split operation follows these steps:

  1. Validate configuration: Ensure at least 2 institutions, threshold ≥ 2, threshold ≤ institution count.
  2. Validate specimen: Ensure specimenId is present, genetic markers are non-empty.
  3. Serialize: JSON-encode specimen metadata, prepend with 4-byte length, append raw genetic markers. Result is a binary blob.
  4. Compute data hash: SHA-256 hash of the serialized blob for integrity verification.
  5. Pad to block size: PKCS7-pad the blob to a multiple of (nextOddPrime(N) - 1), where N = institution count.
  6. Generate HMAC: Create HMAC-SHA256 of padded data. Encode HMAC key and signature as base64, separated by dot.
  7. Split via XorIDA: Apply XorIDA(padded, N, K) to generate N shares.
  8. Assign shares: For each share, wrap with IDA5 header, assign to the corresponding institution, include HMAC and metadata.
  9. Return result: SpecimenSplitResult containing specimenId, all shares, and dataHash.
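Step 5 can be sketched as follows. This is a minimal illustration, not the library's code: the function names are hypothetical, and it assumes that nextOddPrime(N) means the smallest odd prime greater than or equal to N, which the source does not spell out.

```typescript
function isPrime(n: number): boolean {
  if (n < 2) return false;
  for (let d = 2; d * d <= n; d++) if (n % d === 0) return false;
  return true;
}

// Assumption: smallest odd prime >= n (so 2 and 3 both map to 3).
function nextOddPrime(n: number): number {
  let p = Math.max(3, n);
  while (!isPrime(p)) p++;
  return p;
}

// Block size used for padding: nextOddPrime(N) - 1, N = institution count.
function blockSizeFor(institutionCount: number): number {
  return nextOddPrime(institutionCount) - 1;
}

// PKCS7: always append 1..blockSize bytes, each holding the pad length.
function pkcs7Pad(data: Uint8Array, blockSize: number): Uint8Array {
  if (blockSize < 1 || blockSize > 255) throw new Error('block size must be 1..255');
  const padLen = blockSize - (data.length % blockSize);
  const out = new Uint8Array(data.length + padLen);
  out.set(data);
  out.fill(padLen, data.length);
  return out;
}

// Inverse of pkcs7Pad; rejects malformed padding instead of guessing.
function pkcs7Unpad(padded: Uint8Array, blockSize: number): Uint8Array {
  if (padded.length === 0 || padded.length % blockSize !== 0) {
    throw new Error('invalid padded length');
  }
  const padLen = padded[padded.length - 1];
  if (padLen < 1 || padLen > blockSize) throw new Error('invalid padding');
  for (let i = padded.length - padLen; i < padded.length; i++) {
    if (padded[i] !== padLen) throw new Error('invalid padding');
  }
  return padded.slice(0, padded.length - padLen);
}
```

Under that assumption, a 3-institution split pads to multiples of 2 bytes and a 5-institution split to multiples of 4. Because a single PKCS7 pad byte cannot encode more than 255, this sketch rejects larger block sizes.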
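HMAC-First Design

The HMAC is computed on the padded plaintext, not on individual shares. It therefore cannot be checked against a single share; it is verified as soon as XorIDA recovery yields the padded bytes, before unpadding or deserialization, so tampered input fails fast and fails closed.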

Reconstruction & Verification

Reconstruction follows a strict fail-closed pipeline:

  1. Validate shares: Ensure shares are provided, count ≥ threshold, all belong to the same specimenId.
  2. Extract share data: Decode base64, parse IDA5 header, extract share bytes and indices.
  3. Reconstruct via XorIDA: Apply XorIDA threshold recovery using the K smallest share indices.
  4. Verify HMAC: Extract HMAC key and signature from the first share. Compute HMAC of padded bytes. If signatures don't match → REJECT (fail closed).
  5. Unpad: PKCS7-unpad the plaintext.
  6. Deserialize: Extract metadata length prefix, parse JSON metadata, extract genetic markers.
  7. Return specimen: Original SpecimenData with all metadata restored.

The critical security property: HMAC verification happens before any deserialization. A corrupted or tampered share is rejected without risk of injection or parsing attacks.
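Steps 1 and 3 of the reconstruction pipeline (share validation and choosing the K smallest indices) might look roughly like this. The ShareMeta shape and both function names are hypothetical, and treating a threshold disagreement between shares as INSTITUTION_MISMATCH is an assumption, not documented behavior.

```typescript
// Hypothetical subset of the share metadata described in this section.
interface ShareMeta {
  specimenId: string;
  index: number;     // position of the share within the split
  threshold: number; // K, the number of shares needed
}

type ShareCheckError = 'INSUFFICIENT_SHARES' | 'INSTITUTION_MISMATCH';

// Returns null when the set is usable, otherwise an error code.
function checkShares(shares: ShareMeta[]): ShareCheckError | null {
  if (shares.length === 0) return 'INSUFFICIENT_SHARES';
  const first = shares[0];
  const consistent = shares.every(
    (s) => s.specimenId === first.specimenId && s.threshold === first.threshold,
  );
  if (!consistent) return 'INSTITUTION_MISMATCH';
  // Duplicate indices do not count twice toward the threshold.
  const indices = shares.map((s) => s.index);
  const distinct = indices.filter((v, i) => indices.indexOf(v) === i);
  return distinct.length >= first.threshold ? null : 'INSUFFICIENT_SHARES';
}

// Picks the K distinct shares with the smallest indices.
function selectShares(shares: ShareMeta[]): ShareMeta[] {
  if (shares.length === 0) return [];
  const k = shares[0].threshold;
  const sorted = shares.slice().sort((a, b) => a.index - b.index);
  const picked: ShareMeta[] = [];
  for (const s of sorted) {
    const duplicate = picked.some((p) => p.index === s.index);
    if (!duplicate && picked.length < k) picked.push(s);
  }
  return picked;
}
```

Running these checks before any decoding keeps obviously unusable input away from the XorIDA recovery step.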

Section 06

API Surface

Two main functions cover 99% of workflows. Additional types support advanced use cases.

splitSpecimen(specimen, config) → Promise<Result<SpecimenSplitResult, BioSplitError>>
Splits a specimen's genetic markers across institutions via XorIDA. Returns a result with the specimen ID, all shares (one per institution), and a SHA-256 data hash. On failure, returns a structured error with code and message.
reconstructSpecimen(shares) → Promise<Result<SpecimenData, BioSplitError>>
Reconstructs genetic markers from a threshold number of shares. Verifies HMAC integrity on the recovered padded bytes before unpadding or deserialization (fail-closed). Returns the original specimen metadata and genetic markers, or a structured error.

Types

interface SpecimenData
The input specimen with ID, biobank ID, specimen type (blood, tissue, etc.), genetic markers as Uint8Array, collection timestamp, and consent ID linking to donor consent records.
interface BioConfig
Configuration with a list of ResearchInstitution objects (id, name, country) and a numeric threshold (2 ≤ threshold ≤ institutions.length).
interface SpecimenShare
A single share assigned to one institution. Contains specimenId, institutionId, share index and total count, threshold, base64-encoded share data (with IDA5 header), HMAC key/signature (dot-separated), and original specimen size for validation.
interface SpecimenSplitResult
The result of a successful split: specimenId, all shares as an array, and a hex-encoded SHA-256 hash of the original serialized specimen.
type BioSplitErrorCode
Union of error code literals: INVALID_CONFIG, INVALID_SPECIMEN, SPLIT_FAILED, RECONSTRUCT_FAILED, HMAC_FAILURE, INSUFFICIENT_SHARES, INSTITUTION_MISMATCH.
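The BioConfig constraints listed above (at least 2 institutions, 2 ≤ threshold ≤ institutions.length) can be checked up front. The sketch below is an assumption about how such validation could look, not the library's implementation; the interfaces mirror the fields named in this section, and the duplicate-ID check is an extra safeguard that the source does not mention.

```typescript
interface ResearchInstitution { id: string; name: string; country: string; }
interface BioConfig { institutions: ResearchInstitution[]; threshold: number; }

interface ConfigError { code: 'INVALID_CONFIG'; message: string; }

// Returns null for a valid config, otherwise a structured error.
function validateConfig(config: BioConfig): ConfigError | null {
  const fail = (message: string): ConfigError => ({ code: 'INVALID_CONFIG', message });
  const n = config.institutions.length;
  const k = config.threshold;
  if (n < 2) return fail('at least 2 institutions are required');
  if (Math.floor(k) !== k || k < 2) return fail('threshold must be an integer >= 2');
  if (k > n) return fail('threshold cannot exceed the institution count');
  // Extra check (assumption): institution IDs should be unique.
  const ids = config.institutions.map((i) => i.id);
  if (ids.some((id, idx) => ids.indexOf(id) !== idx)) return fail('duplicate institution id');
  return null;
}
```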
Section 07

Integration Patterns

Common deployment patterns for biobanks, research consortia, and compliance-first organizations.

Pattern 1: Biobank Split-on-Ingest

A biobank receives a new specimen, immediately splits it across 3 regional institutions (2-of-3 threshold), and distributes shares. The biobank's own system never stores the complete genetic markers — only the metadata and the local share.

Split-on-ingest workflow
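import { splitSpecimen } from '@private.me/biosplit';

// ID of this biobank within config.institutions (illustrative value)
const LOCAL_INSTITUTION_ID = 'LOCAL';

async function ingestSpecimen(raw: RawSpecimen) {
  // extractGeneticMarkers, storeShareSecurely, and sendSecureShare are
  // application-specific helpers, not part of BioSplit
  const specimen = await extractGeneticMarkers(raw);
  const result = await splitSpecimen(specimen, config);

  if (!result.ok) throw result.error;

  for (const share of result.value.shares) {
    if (share.institutionId === LOCAL_INSTITUTION_ID) {
      // Persist only this institution's share, never the full share set
      await storeShareSecurely(share, LOCAL_INSTITUTION_ID);
    } else {
      // Send every other share to its partner institution
      await sendSecureShare(share);
    }
  }
}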

Pattern 2: Research Consortium Reconstruction

A consortium of 5 institutions approves a collaborative study. Researchers request genetic data for 1,000 specimens. Reconstruction requires quorum: 3 of 5 institutions must unlock their shares. This enforces institutional accountability.

Pattern 3: Consent-Gated Reconstruction

Each specimen's consentId links to a consent record with metadata: "John approved genetic research for cancer studies". Before reconstructing, the system checks: (1) Is the research use case approved in the consent? (2) Have the required institutions signed a data use agreement? (3) Is the institutional quorum threshold satisfied? Only if all three checks pass does reconstruction proceed.
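The three checks could be combined into a single gate that runs before reconstructSpecimen() is called. Everything in this sketch is hypothetical application-layer code (BioSplit itself does not enforce consent): the record shapes, the revoked flag, and the function name are assumptions for illustration.

```typescript
interface ConsentRecord {
  consentId: string;
  approvedUses: string[]; // e.g. ['cancer-research']
  revoked: boolean;
}

interface ReconstructionRequest {
  consentId: string;
  useCase: string;
  shareHolders: string[]; // institutions offering a share
  duaSigned: string[];    // institutions with a signed data use agreement
  threshold: number;
}

type GateResult = { ok: true } | { ok: false; reason: string };

function checkConsentGate(
  req: ReconstructionRequest,
  consent: ConsentRecord | undefined,
): GateResult {
  // (1) Is the research use case approved in the consent?
  if (!consent || consent.consentId !== req.consentId || consent.revoked) {
    return { ok: false, reason: 'no valid consent record' };
  }
  if (consent.approvedUses.indexOf(req.useCase) === -1) {
    return { ok: false, reason: 'use case not covered by consent' };
  }
  // (2) Have the participating institutions signed a data use agreement?
  const missing = req.shareHolders.filter((id) => req.duaSigned.indexOf(id) === -1);
  if (missing.length > 0) {
    return { ok: false, reason: 'missing data use agreement: ' + missing.join(', ') };
  }
  // (3) Is the institutional quorum satisfied?
  if (req.shareHolders.length < req.threshold) {
    return { ok: false, reason: 'quorum not met' };
  }
  return { ok: true };
}
```

Returning a reason string rather than a bare boolean makes each denied request explainable in the audit trail.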

Pattern 4: Audit Trail & Lineage

Every reconstruction event is logged: timestamp, requesting institution, which shares were used, data use case, and IRB approval number. The audit trail proves that genetic data access was authorized and traceable — critical for HIPAA compliance reporting and breach investigations.

Section 08

Deployment & Production Readiness

BioSplit integrates directly into biobank systems as a library. No standalone servers, no external services. Import the package, configure institutions, start splitting specimens.

Package Installation

npm installation
# Install from PRIVATE.ME registry
npm install @private.me/biosplit

# Or via pnpm
pnpm add @private.me/biosplit

# Required peer dependencies
# @private.me/crypto (XorIDA, HMAC, padding)
# @private.me/shared (Result pattern, encoding)

Production Configuration

BioSplit requires institutional configuration before splitting. Define the research institutions participating in the biobank and set the reconstruction threshold:

Multi-institution configuration
import { splitSpecimen, BioConfig } from '@private.me/biosplit';

// Define participating institutions
const productionConfig: BioConfig = {
  institutions: [
    { id: 'MIT', name: 'MIT Broad Institute', country: 'US' },
    { id: 'KAROLINSKA', name: 'Karolinska Institutet', country: 'SE' },
    { id: 'RIKEN', name: 'RIKEN Center for Genomic Medicine', country: 'JP' },
    { id: 'STANFORD', name: 'Stanford School of Medicine', country: 'US' },
    { id: 'IMPERIAL', name: 'Imperial College London', country: 'GB' },
  ],
  threshold: 3, // Requires 3 of 5 institutions to reconstruct
};

// Environment-specific config loading
// devConfig: a smaller local BioConfig (e.g. 2-of-2), defined elsewhere
const config = process.env.NODE_ENV === 'production'
  ? productionConfig
  : devConfig;

Secure Share Storage

Each institution stores only its assigned share. Shares MUST be encrypted at rest using institution-specific keys. BioSplit provides the cryptographic splitting layer — operational security (access control, encryption at rest, network transport) is the deployer's responsibility.

Share storage pattern
async function storeShareSecurely(
  share: SpecimenShare,
  institutionId: string
) {
  // 1. Encrypt share metadata + data with institution key
  const encryptedShare = await encryptWithInstitutionKey(
    share,
    institutionId
  );

  // 2. Store in institutional database with access controls
  await db.insertShare({
    specimenId: share.specimenId,
    institutionId: share.institutionId,
    encryptedData: encryptedShare,
    createdAt: new Date(),
  });

  // 3. Log access for audit trail
  await auditLog.recordShareCreation(share.specimenId, institutionId);
}

Cross-Institution Share Transport

Shares must be transmitted securely to partner institutions after splitting. Use TLS 1.3 for transport encryption. For enhanced security, wrap shares in Xlink envelopes (hybrid post-quantum encryption via @private.me/agent-sdk):

Secure share delivery via Xlink
import { Agent } from '@private.me/agent-sdk';

async function deliverShareToInstitution(
  share: SpecimenShare,
  recipientDID: string
) {
  const agent = await Agent.fromSeed(localSeed);

  // Wrap share in post-quantum encrypted envelope
  const result = await agent.send({
    to: recipientDID,
    payload: share,
    options: { postQuantumSig: true }, // ML-DSA-65 signatures
  });

  if (!result.ok) {
    throw new Error(`Share delivery failed: ${result.error}`);
  }
}

Environment Recommendations

Research Consortium (Enterprise)
  • 3–10 institutions, 2-of-N threshold
  • Cross-border institutional separation
  • Data use agreement enforcement
  • IRB approval tracking
  • Federated reconstruction

Development / Testing (Dev)
  • 2–3 local instances, 2-of-2 threshold
  • In-memory share storage
  • No cross-institution transport
  • Simplified audit logging
  • Mock consent system

Air-Gapped / High Security (Gov/Defense)
  • 5+ institutions, 4-of-5 threshold
  • Physical share transport (QR codes)
  • Hardware security modules (HSMs)
  • Multi-party approval workflows
  • Zero external network dependencies

Performance Tuning

For large-scale biobanks processing thousands of specimens:

  • Batch splitting: Process specimens in batches of 100–500 to amortize serialization overhead.
  • Worker pools: Use Node.js worker threads or multiprocessing for parallel XorIDA operations.
  • Caching: Cache institutional configuration and HMAC keys to avoid re-derivation.
  • Payload size limits: For whole-genome data (>100 MB), split only marker panels (SNPs, CNVs) rather than complete sequences.
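  • Share compression: Gzip-compress serialized shares before network transport. The share bytes themselves are effectively random and do not compress, so any savings come from the base64 and JSON envelope; measure the ratio on real shares before relying on it.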
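The batch-splitting advice can be sketched with a small helper. This is a generic pattern under stated assumptions rather than a BioSplit API: chunk and processInBatches are hypothetical names, and in practice the worker would be a function that calls splitSpecimen(specimen, config).

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `worker` over all items, one batch at a time. Items inside a batch
// run concurrently; batches run sequentially to bound memory use.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, batchSize)) {
    const batchResults = await Promise.all(batch.map((item) => worker(item)));
    results.push(...batchResults);
  }
  return results;
}
```

Results come back in input order, so each split result can be matched to its specimen by position.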

Monitoring & Health Checks

Production deployments should monitor:

Health metrics
// Key metrics to track
- Specimens split per hour
- Average split latency (target: <5ms for 1KB specimens)
- HMAC verification failures (should be ~0 in normal operation)
- Share reconstruction success rate
- Cross-institution share delivery latency
- Institutional availability (are all N institutions reachable?)
- Consent record lookup latency
Zero External Dependencies

BioSplit has zero npm dependencies beyond the PRIVATE.ME platform's core crypto libraries. No external APIs, no cloud services, no key servers. The entire splitting and reconstruction pipeline runs locally within your infrastructure. This makes BioSplit suitable for air-gapped environments and high-security deployments where external network calls are prohibited.

Operational Security Is Your Responsibility

BioSplit provides cryptographic protection (XorIDA splitting, HMAC integrity). It does not provide: database encryption at rest, access control, network transport security, physical security of storage media, insider threat prevention, or compliance reporting. These operational concerns must be addressed by the deploying institution using standard security practices (encryption at rest, least-privilege access, TLS, audit logging, background checks).

Compliance Checklist

Before deploying BioSplit in a production biobank:

Requirement | BioSplit Provides | Your Responsibility
HIPAA Encryption | ✓ IT-secure splitting | Encrypt shares at rest (AES-256-GCM)
GDPR Right to Erasure | ✓ ConsentId tracking | Delete consent record → shares unreconstructable
eIDAS 2.0 Separation | ✓ Threshold accountability | Enforce institutional independence
Audit Logging | ✗ Not included | Log all split/reconstruct operations
Access Control | ✗ Not included | Implement role-based access (RBAC)
Consent Management | ✓ ConsentId field | Build consent lifecycle system
Data Use Agreements | ✗ Not included | Enforce DUA before reconstruction
IRB Approval Tracking | ✗ Not included | Link reconstructions to IRB approvals
Section 09

Security Model

BioSplit's security rests on three pillars: information-theoretic splitting, cryptographic integrity, and institutional separation.

Pillar 1: Information-Theoretic Impossibility

XorIDA is unconditionally secure. With a 2-of-3 split, any single share reveals zero bits of information about the plaintext. This is not because breaking XOR is "computationally hard" — it is because it is mathematically impossible. Even with infinite computing power or quantum computers, one share cannot yield information.

Formally: Let S₀, S₁, S₂ be the three XorIDA shares of a 2-of-3 split over GF(2). For any message M, the distribution of S₀ (or, in general, of any K-1 shares) is independent of M. An adversary holding S₀ alone learns nothing about M; recovery requires a second share, S₁ or S₂.
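For the 2-of-3 case, the property can be written out in standard notation. This is a sketch of the usual definition of a perfect threshold scheme, using the symbols above; it restates the claim and is not a proof that XorIDA satisfies it.

```latex
\begin{align*}
  \Pr[S_i = s \mid M = m] &= \Pr[S_i = s]
    && \text{for every } i \in \{0,1,2\} \text{ and all } s, m, \\
  I(M; S_i) &= 0
    && \text{(one share carries no information about } M\text{)}, \\
  H(M \mid S_i, S_j) &= 0
    && \text{for } i \neq j \text{ (any two shares determine } M\text{)}.
\end{align*}
```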

Pillar 2: Cryptographic Integrity (HMAC-SHA256)

Every share includes an HMAC-SHA256 computed over the padded plaintext. After XorIDA recovery and before any unpadding or deserialization, the HMAC is verified. If shares are corrupted, damaged in transit, or tampered with, the HMAC check fails and reconstruction is rejected. This is fail-closed: better to deny access than to return corrupted genetic data.

HMAC verification occurs before deserialization, preventing injection attacks. Even if an attacker corrupts the JSON metadata payload, the HMAC mismatch is detected first.

Pillar 3: Institutional Separation

Each share is physically stored and managed by a different institution. MIT holds MIT's share. Karolinska holds Karolinska's share. For an attacker to reconstruct genetic data, they must simultaneously compromise two or more institutions — a significantly harder task than compromising one.

This separation is organizational, not cryptographic. BioSplit does not enforce institutional boundaries at the protocol level. Deployments must ensure shares are physically distributed to independent systems and access is logged.

Threat Model & Assumptions

In Scope

Database breaches at a single institution, insider threats with access to one institution's systems, accidental corruption of one or more shares (HMAC catches it), subpoenas targeting a single institution.

Out of Scope

Network transport (use HTTPS/TLS for share delivery), physical biobank security (cold storage, access control), genetic data format validation (bioinformatics responsibility), regulatory compliance (institutional responsibility). BioSplit provides cryptographic protection, not operational security.

Known Limitations

  1. Metadata in plaintext: SpecimenShare includes specimenId and institutionId in plaintext. Deployments must encrypt share metadata at rest.
  2. No consent enforcement: BioSplit tracks consentId but does not enforce consent rules. Deployments must implement consent gating at the application layer.
  3. No automatic key rotation: HMAC keys are derived during splitting and fixed. BioSplit does not support key rollover without re-splitting all specimens.
  4. Shares are not versioned: If the BioSplit algorithm evolves, old shares must be manually migrated or re-split.
Section 10

Limitations & Out-of-Scope

BioSplit is a cryptographic library, not a biobank operating system. It handles specimen splitting and reconstruction. Everything else is the deployer's responsibility.

Specimen Size Constraints

Genetic markers are stored as raw Uint8Array bytes. Typical whole-genome sequencing produces 3 billion base pairs ≈ 750 MB. BioSplit can split payloads up to available RAM (tested to 100+ MB). For whole-genome libraries, split only the relevant markers (e.g., 50-SNP panels for fast GWAS), not the complete genome.

Institutional Count Limits

Deployments can split across up to 256 institutions (technical limit: XorIDA's block structure is set by a prime p = nextOddPrime(N), while the share arithmetic itself is XOR over GF(2)). Practical limit is 10–20 institutions; larger consortia should use federation (multiple 3-5 institution clusters that re-share across clusters).

Not a Biobank

BioSplit handles cryptographic splitting and reconstruction. It does not provide:

  • Cold storage management or inventory tracking
  • Consent lifecycle management or policy enforcement
  • Researcher access control or data request workflows
  • Audit logging or compliance reporting
  • Genetic data format validation or bioinformatics analysis

Not Encryption

BioSplit uses secret sharing, not encryption. Unlike encryption where one key unlocks the plaintext, XorIDA requires K-of-N shares to cooperate. This is a fundamentally different model — better for institutional separation, worse for single-key management. Deployments must treat shares with the same physical security as encryption keys.

Not Anonymization

BioSplit does not anonymize or de-identify genetic data. Specimens retain their original identifiers (specimenId, biobankId, consentId). Deployers must implement proper data governance to separate specimen metadata from genetic markers if anonymization is required.

Section 11

Post-Quantum Security

BioSplit's core (XorIDA) is unconditionally quantum-safe. Transport security is hybrid post-quantum.

Payload Layer: XorIDA (Quantum-Safe by Definition)

XorIDA threshold sharing is information-theoretically secure — it makes no computational assumptions. A quantum computer cannot break XorIDA because there is nothing to break. A single share remains useless, regardless of computing power.

Transport Layer: Hybrid Post-Quantum (Optional)

When shares are exchanged via the Xlink agent SDK, messages are encrypted with hybrid post-quantum cryptography:

  • Key Exchange: X25519 + ML-KEM-768 (FIPS 203) — always-on
  • Signatures: Ed25519 + ML-DSA-65 (FIPS 204) — opt-in

This provides confidentiality against both classical and quantum adversaries during transmission. Combined with XorIDA's payload-level protection, shares remain secure in transit and at rest.

Recommendation

Deployments integrating BioSplit should:

  1. Use XorIDA splitting for payload protection (unconditional)
  2. Use Xlink with postQuantumSig: true for share transport (conditional)
  3. Store shares at rest with AES-256-GCM and hybrid post-quantum key wrapping
Section 12

Performance & Benchmarks

BioSplit is optimized for low latency. Typical specimens split and reconstruct in milliseconds.

  • 2-of-3 split (1KB): 2ms
  • Reconstruction (1KB): 1.8ms
  • 2-of-3 split (100KB): 40ms
  • Reconstruction (100KB): 38ms

Scaling Characteristics

Performance scales roughly linearly with data size (genetic marker bytes) for larger payloads; at 1 KB a fixed per-call overhead dominates. Increasing institution count (N) has minimal impact.

Payload Size | 2-of-3 Split | Reconstruction | HMAC Verify
1 KB | 2.1ms | 1.8ms | 0.5ms
10 KB | 5.2ms | 4.8ms | 0.6ms
100 KB | 40ms | 38ms | 0.8ms
1 MB | 280ms | 270ms | 1.2ms

Optimization Notes

  • Splitting is CPU-bound (XorIDA XOR arithmetic over blocks sized by p = nextOddPrime(N)). Multi-core parallelization is possible but not yet implemented.
  • Reconstruction is slightly faster than splitting because it produces one output of the original size rather than N shares.
  • HMAC verification stays at roughly 1ms even for a 1 MB payload (0.5–1.2ms in the table above).
  • Serialization/deserialization is negligible (<1% of total time).
Section 13

Advanced: Error Handling & Compliance

BioSplit provides 7 distinct error codes covering configuration, specimen, integrity, and reconstruction failures.

Serialization Format

Specimens are serialized to a length-prefixed binary format:

Binary serialization structure
// 4 bytes: metadata JSON length (uint32 big-endian)
00000047

// 71 bytes: JSON metadata
{"specimenId":"SPEC-001",...}

// Remaining bytes: raw genetic markers
CAFEBABEDEAD...
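
A minimal sketch of this length-prefixed layout, assuming UTF-8 JSON for the metadata. The function names are hypothetical and the metadata is left untyped; the library's actual serializer may differ in detail.

```typescript
// Layout: [4-byte big-endian JSON length][JSON metadata][raw marker bytes]
function serializeSpecimen(metadata: object, markers: Uint8Array): Uint8Array {
  const json = new TextEncoder().encode(JSON.stringify(metadata));
  const out = new Uint8Array(4 + json.length + markers.length);
  new DataView(out.buffer).setUint32(0, json.length, false); // false = big-endian
  out.set(json, 4);
  out.set(markers, 4 + json.length);
  return out;
}

function deserializeSpecimen(
  blob: Uint8Array,
): { metadata: unknown; markers: Uint8Array } {
  if (blob.length < 4) throw new Error('blob too short for length prefix');
  const view = new DataView(blob.buffer, blob.byteOffset, blob.byteLength);
  const jsonLen = view.getUint32(0, false);
  // Reject a length prefix that points past the end of the blob.
  if (4 + jsonLen > blob.length) throw new Error('metadata length exceeds blob');
  const jsonBytes = blob.subarray(4, 4 + jsonLen);
  const metadata: unknown = JSON.parse(new TextDecoder().decode(jsonBytes));
  return { metadata, markers: blob.slice(4 + jsonLen) };
}
```

In the reconstruction pipeline, a deserializer like this would run only after the HMAC check has passed, so the bounds check on the length prefix is defense in depth rather than the primary safeguard.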

HMAC Verification Process

The HMAC is computed and stored as: base64(hmacKey) + '.' + base64(hmacSignature)

During reconstruction, the HMAC is parsed, then verified against the padded plaintext before any deserialization occurs. If verification fails, an HMAC_FAILURE error is returned and reconstruction halts.
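The tag format and the fail-closed check can be sketched with Node's built-in crypto module. The function names and the 32-byte key length are assumptions made for illustration; the source only specifies HMAC-SHA256 and the dot-separated base64 encoding.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from 'node:crypto';

// Produces base64(hmacKey) + '.' + base64(hmacSignature) over the padded bytes.
// Assumption: a fresh random 32-byte key per split.
function makeHmacTag(padded: Uint8Array, key: Uint8Array = randomBytes(32)): string {
  const signature = createHmac('sha256', key).update(padded).digest();
  return Buffer.from(key).toString('base64') + '.' + signature.toString('base64');
}

// Returns true only if the tag parses and the signature matches.
// Any malformed input yields false (fail closed).
function verifyHmacTag(padded: Uint8Array, tag: string): boolean {
  const parts = tag.split('.');
  if (parts.length !== 2) return false;
  const key = Buffer.from(parts[0], 'base64');
  const expected = Buffer.from(parts[1], 'base64');
  const actual = createHmac('sha256', key).update(padded).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

A caller would map a false result to HMAC_FAILURE and stop before unpadding or parsing anything.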

Error Taxonomy

Error Code | HTTP | When
INVALID_CONFIG | 400 | Config has <2 institutions, threshold <2, or threshold > count
INVALID_SPECIMEN | 400 | Specimen missing ID or has empty genetic markers
SPLIT_FAILED | 500 | XorIDA split operation failed (rare, indicates library bug)
RECONSTRUCT_FAILED | 400 | XorIDA reconstruction produced invalid output after HMAC verified
HMAC_FAILURE | 403 | HMAC verification failed — shares are corrupted or tampered
INSUFFICIENT_SHARES | 400 | Fewer shares provided than the required threshold
INSTITUTION_MISMATCH | 400 | Shares belong to different specimens or institutions

Recommended HTTP Mappings

  • 400 Bad Request: INVALID_CONFIG, INVALID_SPECIMEN, RECONSTRUCT_FAILED, INSUFFICIENT_SHARES, INSTITUTION_MISMATCH
  • 403 Forbidden: HMAC_FAILURE (corrupted/tampered data)
  • 500 Internal Server Error: SPLIT_FAILED (library bug, not user error)
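In an HTTP service wrapping BioSplit, the recommended mapping could be expressed as a lookup table. The union type below repeats the seven codes listed in this paper; the table and function names are hypothetical.

```typescript
type BioSplitErrorCode =
  | 'INVALID_CONFIG'
  | 'INVALID_SPECIMEN'
  | 'SPLIT_FAILED'
  | 'RECONSTRUCT_FAILED'
  | 'HMAC_FAILURE'
  | 'INSUFFICIENT_SHARES'
  | 'INSTITUTION_MISMATCH';

// Record<...> makes the compiler flag any error code left unmapped.
const HTTP_STATUS_BY_CODE: Record<BioSplitErrorCode, number> = {
  INVALID_CONFIG: 400,
  INVALID_SPECIMEN: 400,
  RECONSTRUCT_FAILED: 400,
  INSUFFICIENT_SHARES: 400,
  INSTITUTION_MISMATCH: 400,
  HMAC_FAILURE: 403, // corrupted or tampered data
  SPLIT_FAILED: 500, // library bug, not user error
};

function toHttpStatus(code: BioSplitErrorCode): number {
  return HTTP_STATUS_BY_CODE[code];
}
```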

Codebase Statistics

BioSplit is a focused, single-responsibility cryptographic library:

~500 lines of code · 100% test coverage · 0 npm dependencies · 7 error codes

The package is minimal by design. All cryptographic operations delegate to @private.me/crypto (XorIDA, HMAC, padding) and @private.me/shared (Result pattern, encoding). BioSplit adds only the biobank-specific logic: specimen serialization, metadata tracking, institutional assignment.