PIIArchive: Compliant Digital PII Archive
Archive sensitive PII with automated compliance retention across jurisdictions using threshold-protected storage that guarantees recoverability without single-point vulnerability.
Digital PII Archiving Without Compromise
Organizations must retain PII for legal, compliance, and business continuity reasons, but centralized storage creates catastrophic breach targets. PIIArchive splits PII into threshold-protected shares distributed across jurisdictions, guaranteeing recoverability while ensuring that compromising any single location reveals nothing about the data.
Unlike encryption-at-rest (which concentrates risk at the decryption point) or traditional backups (which create additional breach surfaces), PIIArchive uses information-theoretic splitting: no set of shares smaller than the threshold reveals anything about the original data. Recovery requires threshold reconstruction, but normal operations never reconstruct the full PII.
The Problem
PII archives are permanent breach targets. Legal holds, compliance retention, and business continuity require storing sensitive data for years — but every day of storage is another day of exposure.
Today's Approaches All Fail
Centralized databases: One breach exposes everything. The Equifax breach exposed 147 million records because all PII sat in queryable databases. A single compromised credential or SQL injection gives an attacker complete access.
Encryption at rest: Moves the problem to key management. Keys stored alongside encrypted data offer no real protection. Hardware Security Modules (HSMs) are expensive, complex, and still represent single points of failure. Cloud KMS services require trusting the provider and create vendor lock-in.
Multiple backups: Multiplies the attack surface. Three backup copies mean three breach opportunities. Geographic distribution helps disaster recovery but worsens the security posture — more locations mean more potential compromise points.
Tape archives: Physical media degrades. Magnetic tape has a 10-30 year lifespan. Optical media fares slightly better but still degrades. Environmental factors (temperature, humidity, magnetic fields) accelerate degradation. Recovery becomes probabilistic over time.
Compliance Creates Contradictory Requirements
Regulations demand both retention and protection:
- GDPR Article 5(1)(e): Data must be kept no longer than necessary (storage limitation) — yet GDPR Article 17(3)(b) exempts retention for legal obligations.
- HIPAA § 164.316(b)(2)(i): Retain documentation for 6 years, while § 164.312(a)(1) requires access controls that limit electronic PHI to authorized users.
- CCPA § 1798.100(a)(3): Businesses must disclose how long they retain each category of personal information, creating a roadmap for attackers who know exactly where long-term PII sits.
- SEC 17a-4: Financial records must be retained for up to 6 years in non-rewritable, non-erasable format — a permanent breach target.
Every existing solution forces organizations to choose between compliance (keep the data) and security (destroy the data). This is a false choice.
How It Works
PIIArchive separates archival obligation (you must retain this) from breach risk (no single location holds complete data). Threshold secret sharing distributes PII across geographic and jurisdictional boundaries.
Three-Layer Architecture
Layer 1: PII Extraction & Classification
The archive pipeline begins by identifying and extracting PII from source documents:
- Entity detection: Structured identifiers (SSNs, passport numbers, credit card numbers, medical record numbers) extracted via pattern matching and named-entity recognition (NER) models
- Contextual PII: Addresses, phone numbers, dates of birth flagged based on document context
- Derived identifiers: Email addresses, account numbers, transaction IDs captured
- Classification tags: Each PII element tagged with regulatory category (GDPR Article 9 special categories, HIPAA PHI, CCPA personal information)
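The pattern-matching half of entity detection can be sketched as below. This is a minimal illustration under stated assumptions, not the PIIArchive pipeline: the regexes, the `MRN-` format, and the `detectPii` name are all hypothetical, and a Luhn checksum is used to discard digit runs that cannot be card numbers.

```typescript
// Hypothetical pattern-based PII detector (illustrative; not the PIIArchive pipeline).
type PiiMatch = { type: string; value: string; index: number };

const PATTERNS: Record<string, RegExp> = {
  SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  MRN: /\bMRN-\d{6,10}\b/g, // assumed local medical record number format
  CARD: /\b\d(?:[ -]?\d){12,18}\b/g, // 13-19 digits, optional space/dash separators
};

// Luhn checksum: rejects digit runs that cannot be valid card numbers
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function detectPii(text: string): PiiMatch[] {
  const matches: PiiMatch[] = [];
  for (const [type, re] of Object.entries(PATTERNS)) {
    for (const m of text.matchAll(re)) {
      const value = m[0];
      if (type === 'CARD' && !luhnValid(value.replace(/[ -]/g, ''))) continue;
      matches.push({ type, value, index: m.index ?? 0 });
    }
  }
  // Return matches in document order
  return matches.sort((a, b) => a.index - b.index);
}
```

Contextual PII (addresses, dates of birth) generally needs an NER model or document context rather than regexes, which is why the list above pairs the two approaches.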
Layer 2: Threshold Splitting (XorIDA)
Each PII element is split using information-theoretic secret sharing:
```typescript
import { createShares } from '@private.me/crypto';
import { createEnvelope } from '@private.me/xformat';

// Extract PII from document
const piiRecord = {
  ssn: '123-45-6789',
  dob: '1985-03-15',
  medicalRecordNumber: 'MRN-8472615',
  classification: 'HIPAA_PHI',
  retentionYears: 7
};

// Serialize and split (2-of-3)
const payload = JSON.stringify(piiRecord);
const shares = await createShares(payload, 2, 3);

// Wrap each share in an xFormat envelope
const envelopes = shares.map((share, idx) =>
  createEnvelope({
    productType: 'piiarchive',
    version: 1,
    threshold: 2,
    totalShares: 3,
    shareIndex: idx + 1,
    payload: share
  })
);

// Distribute shares to 3 geographic locations
// (distributeShare is an application-defined upload helper)
await distributeShare(envelopes[0], 'US_VAULT');
await distributeShare(envelopes[1], 'EU_VAULT');
await distributeShare(envelopes[2], 'APAC_VAULT');
```
2-of-3 threshold means: Any 2 shares can reconstruct the PII. Any single share reveals nothing: not computationally hard to reverse, but information-theoretically impossible. An attacker who compromises one vault location learns nothing about the content of the archived PII (share size can still indicate the approximate payload length).
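To make the "any 2 reconstruct, any 1 reveals nothing" property concrete, here is a sketch using replicated XOR secret sharing, a textbook 2-of-3 construction. It is not XorIDA; it is swapped in only because it is short and demonstrably information-theoretic. The secret is written as `p0 ^ p1 ^ p2` with `p0` and `p1` uniformly random, and each share holds two of the three pieces.

```typescript
import { randomBytes } from 'node:crypto';

// Illustrative 2-of-3 replicated XOR secret sharing (NOT XorIDA).
// secret = p0 ^ p1 ^ p2, with p0 and p1 uniformly random.
// Share i holds pieces i and (i + 1) mod 3.
type Share = { index: number; pieces: [Buffer, Buffer] };

function xor(a: Buffer, b: Buffer): Buffer {
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] ^ b[i];
  return out;
}

function split2of3(secret: Buffer): Share[] {
  const p0 = randomBytes(secret.length);
  const p1 = randomBytes(secret.length);
  const p2 = xor(xor(secret, p0), p1);
  const p = [p0, p1, p2];
  return [0, 1, 2].map((i) => ({
    index: i,
    pieces: [p[i], p[(i + 1) % 3]] as [Buffer, Buffer],
  }));
}

function reconstruct2of3(a: Share, b: Share): Buffer {
  if (a.index === b.index) throw new Error('need two distinct shares');
  // Any two distinct shares together cover all three pieces
  const pieces = new Map<number, Buffer>();
  for (const s of [a, b]) {
    pieces.set(s.index, s.pieces[0]);
    pieces.set((s.index + 1) % 3, s.pieces[1]);
  }
  return xor(xor(pieces.get(0)!, pieces.get(1)!), pieces.get(2)!);
}
```

A single share is missing one piece, and that missing piece is uniformly random given the other two, so the share is statistically independent of the secret. The trade-off of this particular construction is that each share is twice the size of the secret.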
Layer 3: Geographic & Jurisdictional Distribution
Shares are distributed across jurisdictions so that retention obligations can be enforced and no single-jurisdiction legal order can compel reconstruction.
Why geographic distribution matters: A legal order served in one jurisdiction (for example, a court order or government data demand directed at a single vault) reaches only the shares stored there, which are insufficient to reconstruct the PII. Threshold reconstruction requires cooperation across jurisdictions: a feature, not a bug.
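The distribution rule can be checked mechanically before any share is written. The sketch below assumes one share per vault and a hypothetical `VaultPlacement` shape; it flags any jurisdiction or provider that would hold K or more shares, since a single legal order or a single provider breach would then be enough to reconstruct.

```typescript
// Hypothetical placement check (assumes one share per vault).
interface VaultPlacement {
  id: string;
  jurisdiction: string; // e.g. 'US', 'EU', 'IN'
  provider: string;     // e.g. 'aws', 'azure', 'gcp'
}

function placementViolations(vaults: VaultPlacement[], threshold: number): string[] {
  const violations: string[] = [];
  for (const key of ['jurisdiction', 'provider'] as const) {
    // Count shares per jurisdiction, then per provider
    const counts = new Map<string, number>();
    for (const v of vaults) counts.set(v[key], (counts.get(v[key]) ?? 0) + 1);
    for (const [value, count] of counts) {
      if (count >= threshold) {
        violations.push(`${key} ${value} holds ${count} shares (threshold ${threshold})`);
      }
    }
  }
  return violations;
}
```

Checking providers as well as jurisdictions matters because a provider can be subject to legal process outside the region where a vault physically sits.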
Recovery Process
When PII must be recovered (legal hold, data subject request, business continuity):
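```typescript
import { reconstructMessage } from '@private.me/crypto';

// Retrieve any 2 of 3 shares
// (retrieveShare is an application-defined download helper)
const share1 = await retrieveShare('US_VAULT', archiveId);
const share2 = await retrieveShare('EU_VAULT', archiveId);

// Reconstruct original PII
const reconstructed = await reconstructMessage([share1, share2]);
const piiRecord = JSON.parse(reconstructed);

// The audit trail is logged automatically; application logs should carry
// metadata only, never the reconstructed PII values
console.log(`Reconstructed PII for: ${piiRecord.classification}`);
```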
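The recovery snippet hard-codes two vaults. A sketch of "any 2 of 3" with failover is shown below: it walks the vault list, skips unreachable vaults, and stops as soon as K shares are in hand. `FetchShare` is a hypothetical injected function standing in for `retrieveShare`.

```typescript
// Hypothetical share fetcher: resolves to share bytes, rejects if the vault is unreachable.
type FetchShare = (vaultId: string, archiveId: string) => Promise<Uint8Array>;

async function collectThreshold(
  vaultIds: string[],
  archiveId: string,
  threshold: number,
  fetchShare: FetchShare
): Promise<Uint8Array[]> {
  const shares: Uint8Array[] = [];
  const failures: string[] = [];
  for (const vaultId of vaultIds) {
    if (shares.length === threshold) break; // fetch no more shares than needed
    try {
      shares.push(await fetchShare(vaultId, archiveId));
    } catch (err) {
      failures.push(`${vaultId}: ${(err as Error).message}`);
    }
  }
  if (shares.length < threshold) {
    throw new Error(`only ${shares.length}/${threshold} shares reachable; ${failures.join('; ')}`);
  }
  return shares;
}
```

Fetching sequentially and stopping at K keeps the number of shares in transit to the minimum; a latency-sensitive variant could request all N in parallel and use the first K responses instead.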
Use Cases
PIIArchive serves any organization with long-term PII retention obligations and breach prevention requirements.
Integration
PIIArchive integrates with existing data pipelines, compliance workflows, and identity management systems.
Installation
```shell
# Install dependencies
npm install @private.me/piiarchive @private.me/crypto @private.me/xformat

# Initialize archive configuration
piiarchive init --vaults us-east,eu-west,ap-south --threshold 2
```
Basic Workflow
```typescript
import { PIIArchive } from '@private.me/piiarchive';

// Initialize archive with vault configuration
const archive = new PIIArchive({
  vaults: [
    { id: 'us-east', endpoint: 'https://vault-us.example.com' },
    { id: 'eu-west', endpoint: 'https://vault-eu.example.com' },
    { id: 'ap-south', endpoint: 'https://vault-ap.example.com' }
  ],
  threshold: 2,
  retentionPolicy: 'HIPAA_6_YEAR'
});

// Archive PII with automatic splitting
const archiveId = await archive.store({
  data: {
    ssn: '123-45-6789',
    fullName: 'Jane Doe',
    medicalRecordNumber: 'MRN-8472615'
  },
  classification: 'HIPAA_PHI',
  retentionYears: 7,
  legalHold: false
});
console.log(`Archived PII: ${archiveId}`);

// Later: recover PII (requires an authorized requester and a stated purpose)
const recovered = await archive.retrieve(archiveId, {
  authorizedBy: 'compliance-officer@example.com',
  purpose: 'DATA_SUBJECT_ACCESS_REQUEST'
});
console.log(recovered.data); // { ssn: '123-45-6789', ... }
```
Integration with Existing Systems
Database Triggers (PostgreSQL)
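```sql
-- Archive PII on INSERT; store only the archive ID in the production DB.
-- Uses the synchronous pgsql-http extension (CREATE EXTENSION http;).
-- pg_net's net.http_post is asynchronous and returns a request ID rather
-- than the response body, so it cannot supply the archive ID here.
-- Authorization header omitted for brevity.
CREATE OR REPLACE FUNCTION archive_pii_trigger() RETURNS TRIGGER AS $$
DECLARE
  resp http_response;
BEGIN
  SELECT * INTO resp FROM http_post(
    'https://piiarchive.example.com/api/v1/pii/archive',
    jsonb_build_object(
      'data', jsonb_build_object('ssn', NEW.ssn, 'dob', NEW.date_of_birth),
      'classification', 'HIPAA_PHI'
    )::text,
    'application/json'
  );

  -- Abort the INSERT rather than drop PII that was never archived
  IF resp.status <> 200 THEN
    RAISE EXCEPTION 'PII archive failed with HTTP status %', resp.status;
  END IF;

  -- Replace PII with the archive reference
  NEW.pii_archive_id := (resp.content::jsonb)->>'archiveId';
  NEW.ssn := NULL;
  NEW.date_of_birth := NULL;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER archive_patient_pii
  BEFORE INSERT ON patients
  FOR EACH ROW EXECUTE FUNCTION archive_pii_trigger();
```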
Application Layer (REST API)
```typescript
import express from 'express';
import { PIIArchive } from '@private.me/piiarchive';

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

const archive = new PIIArchive({ /* config */ });

// req.user is assumed to be set by upstream authentication middleware;
// verifyDualApproval is an application-defined helper.

// Archive endpoint (protected by RBAC)
app.post('/api/pii/archive', async (req, res, next) => {
  try {
    const { data, classification } = req.body;
    // Validate authorization
    if (!req.user?.roles?.includes('PII_ARCHIVER')) {
      return res.status(403).json({ error: 'Unauthorized' });
    }
    const archiveId = await archive.store({ data, classification });
    res.json({ archiveId });
  } catch (err) {
    next(err); // Express 4 does not forward rejected promises on its own
  }
});

// Retrieval endpoint (requires dual authorization)
app.post('/api/pii/retrieve', async (req, res, next) => {
  try {
    const { archiveId, purpose } = req.body;
    // Require compliance officer + legal approval
    const authorized = await verifyDualApproval(req.user, purpose);
    if (!authorized) {
      return res.status(403).json({ error: 'Dual approval required' });
    }
    const recovered = await archive.retrieve(archiveId, {
      authorizedBy: req.user.email,
      purpose
    });
    res.json({ data: recovered.data });
  } catch (err) {
    next(err);
  }
});
```
Compliance Policy Engine
PIIArchive includes a policy engine that enforces retention and deletion automatically:
```typescript
// Define retention policies per classification
const policies = {
  HIPAA_PHI: {
    retentionYears: 7,
    autoDeleteAfter: true,
    legalHoldOverride: true
  },
  GDPR_SPECIAL_CATEGORY: {
    retentionYears: 3,
    autoDeleteAfter: true,
    dataSubjectRequestDays: 30
  },
  FINANCIAL_AML: {
    retentionYears: 5,
    autoDeleteAfter: false, // Manual review required
    regulatoryHoldCheck: true
  }
};

// vaults and threshold as in the Basic Workflow configuration
const archive = new PIIArchive({ vaults, threshold, policies });
```
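The decision the policy engine makes for each record can be sketched as a pure function. This is an assumption-laden illustration, not the engine's logic: it assumes a legal hold always blocks deletion (one reading of `legalHoldOverride`), that retention is counted from the archive date, and that `autoDeleteAfter: false` means expired records go to manual review.

```typescript
// Hypothetical retention check mirroring the policy fields above.
interface RetentionPolicy {
  retentionYears: number;
  autoDeleteAfter: boolean;
}

interface ArchivedRecordMeta {
  archivedAt: Date;
  legalHold: boolean;
}

type RetentionDecision = 'RETAIN' | 'DELETE' | 'MANUAL_REVIEW';

function retentionDecision(
  record: ArchivedRecordMeta,
  policy: RetentionPolicy,
  now: Date
): RetentionDecision {
  if (record.legalHold) return 'RETAIN'; // assumed: legal hold always blocks deletion
  const expiry = new Date(record.archivedAt);
  expiry.setUTCFullYear(expiry.getUTCFullYear() + policy.retentionYears);
  if (now < expiry) return 'RETAIN';
  return policy.autoDeleteAfter ? 'DELETE' : 'MANUAL_REVIEW';
}
```

With the HIPAA_PHI policy, a record archived on 2024-04-10 is retained until 2031-04-10, matching the `expiresAt` value in the REST API example.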
Deployment
PIIArchive runs as a microservice with pluggable vault backends. Standard deployment uses cloud object storage (S3, Azure Blob, GCS) in multiple regions.
Docker Deployment
```yaml
version: '3.8'
services:
  piiarchive:
    image: privateme/piiarchive:latest
    environment:
      - VAULT_US_ENDPOINT=https://s3.us-east-1.amazonaws.com
      - VAULT_EU_ENDPOINT=https://s3.eu-west-1.amazonaws.com
      - VAULT_APAC_ENDPOINT=https://s3.ap-southeast-1.amazonaws.com
      - THRESHOLD=2
      - AUDIT_LOG_ENDPOINT=https://auditlog.example.com
    ports:
      - "3000:3000"
    volumes:
      - ./config:/app/config
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```
Kubernetes Deployment
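```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: piiarchive
spec:
  replicas: 3
  selector:
    matchLabels:
      app: piiarchive
  template:
    metadata:
      labels:
        app: piiarchive
    spec:
      containers:
        - name: piiarchive
          image: privateme/piiarchive:latest
          ports:
            - containerPort: 3000
          env:
            - name: VAULT_US_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: vault-config
                  key: us-endpoint
            # EU/APAC entries follow the same pattern; these key names are assumed
            - name: VAULT_EU_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: vault-config
                  key: eu-endpoint
            - name: VAULT_APAC_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: vault-config
                  key: apac-endpoint
            - name: THRESHOLD
              value: "2"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```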
Multi-Cloud Vault Configuration
PIIArchive supports heterogeneous vault backends to avoid vendor lock-in:
| Vault Type | Use Case | Configuration |
|---|---|---|
| AWS S3 | Primary cloud storage (US/EU/APAC regions) | s3://bucket-name/prefix |
| Azure Blob Storage | Secondary cloud (geo-redundancy) | https://account.blob.core.windows.net/container |
| Google Cloud Storage | Tertiary cloud (tri-cloud strategy) | gs://bucket-name/prefix |
| On-Premises Object Store | Air-gapped vault (defense, healthcare) | https://internal-vault.corp/api |
| Tape Archive (LTO-9) | Long-term cold storage (10+ years) | tape://library-id/slot-range |
Compliance
PIIArchive is designed to satisfy retention obligations while meeting data protection requirements across jurisdictions.
GDPR (EU General Data Protection Regulation)
Article 5(1)(e) — Storage Limitation: Data must be kept no longer than necessary. PIIArchive enforces retention policies per classification and auto-deletes after expiry (unless legal hold applies).
Article 17 — Right to Erasure: Data subjects can request deletion. PIIArchive supports erasure by securely wiping all shares. Because any K surviving shares can still reconstruct the data, erasure is effective only once fewer than K shares remain, which means destroying at least N-K+1 shares (2 shares in a 2-of-3 scheme).
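The erasure arithmetic above reduces to two one-line checks. This is a sketch for a generic K-of-N scheme; the function names are illustrative.

```typescript
// For a K-of-N scheme the data stays recoverable while at least K shares survive,
// so erasure requires destroying at least N - K + 1 shares.
function minSharesToDestroy(k: number, n: number): number {
  if (!Number.isInteger(k) || !Number.isInteger(n) || k < 1 || k > n) {
    throw new RangeError('require integers with 1 <= k <= n');
  }
  return n - k + 1;
}

// True once too few shares remain to reconstruct
function isErased(k: number, survivingShares: number): boolean {
  return survivingShares < k;
}
```

For the layouts discussed in this document, `minSharesToDestroy(2, 3)` is 2 and `minSharesToDestroy(3, 5)` is 3; wiping every share remains the cleaner operational practice.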
Article 30 — Records of Processing Activities: All archive/retrieval operations logged with timestamp, actor, purpose, and authorization approval. Audit logs are HMAC-chained and tamper-evident.
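Article 32 — Security of Processing: Threshold splitting supports Article 32(1)(b), "the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services", by eliminating single-point breach risk. Regular recovery testing supports Article 32(1)(d), which requires "a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures".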
HIPAA (Health Insurance Portability and Accountability Act)
§ 164.316(b)(2)(i) — Retention Requirements: HIPAA policies and documentation must be retained for 6 years; retention periods for PHI itself are generally set by state law. PIIArchive automates retention schedules for PHI classifications.
§ 164.308(a)(1)(ii)(D) — Information System Activity Review: Audit logs provide the records that the required "procedures to regularly review records of information system activity" examine.
§ 164.312(a)(2)(iv) — Encryption: Encryption is an addressable implementation specification rather than a strict requirement. PIIArchive's threshold splitting provides information-theoretic confidentiality for stored shares, which, unlike encryption, does not rest on computational hardness assumptions.
CCPA (California Consumer Privacy Act)
§ 1798.100(a)(3) — Disclosure of Retention: Businesses must disclose how long each category of personal information is retained. The PIIArchive policy engine generates retention disclosure documentation automatically.
§ 1798.105 — Right to Delete: Consumers can request deletion. PIIArchive supports verified deletion with cryptographic proof of share destruction.
SEC 17a-4 (Financial Recordkeeping)
WORM Requirement: Records must be retained in non-rewritable, non-erasable format. PIIArchive's append-only vault backends with object lock (S3 Object Lock, Azure Immutable Blob Storage) satisfy WORM requirement while threshold splitting eliminates single-point breach risk.
FISMA / FedRAMP (Federal Systems)
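NIST SP 800-53 Rev 5 — AC-3(2) Access Enforcement | Dual Authorization: Dual authorization for PII retrieval addresses the control to "enforce dual authorization for [Assignment: organization-defined privileged commands and/or other organization-defined actions]".
SC-12 Cryptographic Key Establishment and Management: Threshold splitting removes data-encryption keys from the archive layer, so there is no master key to establish, rotate, or protect. Keys used for TLS transport and HMAC audit-log chaining remain in scope for SC-12.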
Security Model
PIIArchive's security derives from information-theoretic splitting, not computational hardness assumptions. Threat resistance is mathematical, not probabilistic.
Threat Scenarios
Single Vault Compromise
Attack: Attacker gains full access to one vault location (cloud account breach, insider threat, legal order).
PIIArchive Defense: The attacker learns nothing about the content of archived PII. A single share from a 2-of-3 threshold scheme is information-theoretically useless: not merely hard to crack, but impossible to crack. No amount of computing power (quantum or classical) can extract PII from K-1 shares.
Multi-Vault Compromise (Below Threshold)
Attack: Attacker compromises K-1 vaults in a K-of-N scheme (e.g., 1 of 3 vaults in a 2-of-3 scheme, or 2 of 5 vaults in a 3-of-5 scheme).
PIIArchive Defense: Still learns nothing. Information-theoretic security means that K-1 shares reveal exactly zero information. The security does not degrade with partial compromise.
Threshold Vault Compromise
Attack: Attacker compromises K or more vaults (e.g., 2 vaults in a 2-of-3 scheme). Because shares are static, the compromises need not be simultaneous: shares stolen at different times can be combined.
PIIArchive Defense: Game over: the attacker can reconstruct PII. This is the threshold boundary. Defense-in-depth: make compromising K vaults across jurisdictions as difficult as possible via geographic distribution, heterogeneous cloud providers, and distinct access control mechanisms per vault.
Supply Chain Attack (Cloud Provider)
Attack: Attacker compromises cloud provider infrastructure (hypervisor escape, privileged access abuse).
PIIArchive Defense: Multi-cloud distribution limits exposure. If AWS is compromised, attacker only gains access to shares in AWS vaults. Shares in Azure and GCP remain protected. Threshold reconstruction still requires cross-provider compromise.
Legal / Regulatory Seizure
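Attack: Government issues a legal order to seize all data in a specific jurisdiction (e.g., a court order or search warrant served on a cloud provider).
PIIArchive Defense: A single-jurisdiction order yields only the shares within that jurisdiction, which are insufficient to reconstruct. Multi-jurisdictional legal cooperation is required to compel threshold reconstruction. Because the U.S. CLOUD Act lets U.S. authorities compel U.S.-based providers to produce data they store abroad, this protection depends on who operates each vault as well as where it sits. This is a compliance feature: data localization requirements are satisfied while unilateral seizure is prevented.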
Attack Surface Analysis
| Attack Vector | Traditional Archive | PIIArchive (2-of-3) |
|---|---|---|
| Database breach | ✗ Full exposure | ✓ Zero disclosure (threshold not met) |
| Backup theft | ✗ PII compromised | ✓ Single share useless |
| Insider threat (1 admin) | ✗ Full access | ✓ No single admin has threshold |
| Cloud provider breach | ✗ All PII exposed | ✓ Only 1 vault compromised |
| Legal order (single jurisdiction) | ✗ Full compliance required | ✓ Insufficient shares to reconstruct |
| Ransomware (cloud account) | ✗ Lose access or pay | ✓ Recover from other 2 vaults |
| Physical disaster (datacenter) | ✗ Data loss if no offsite backup | ✓ 2 vaults remain (threshold met) |
Operational Security
Vault Access Control: Each vault uses distinct authentication mechanisms. AWS vault uses IAM roles. Azure vault uses Managed Identities. On-premises vault uses LDAP. No single credential compromise grants multi-vault access.
Audit Logging: All archive/retrieval operations logged to immutable audit trail (HMAC-chained append-only log). Tamper detection automatic. Satisfies HIPAA § 164.308(a)(1)(ii)(D) and GDPR Article 30.
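An HMAC-chained log can be sketched with Node's built-in `crypto` module. This is an illustration, not PIIArchive's implementation; the entry fields mirror the ones named in the compliance section (timestamp, actor, purpose) and the genesis value is an assumption. Each MAC covers the previous entry's MAC, so editing, reordering, or removing an entry breaks every later link. Truncating the newest entries is only detectable if the latest MAC is also anchored somewhere outside the log.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustrative HMAC-chained append-only audit log (not PIIArchive's implementation).
interface AuditEntry {
  timestamp: string;
  actor: string;
  action: 'ARCHIVE' | 'RETRIEVE' | 'DELETE';
  archiveId: string;
  purpose: string;
  prevMac: string;
  mac: string;
}

const GENESIS = '0'.repeat(64); // assumed starting link for the first entry

function macFor(key: Buffer, e: Omit<AuditEntry, 'mac'>): string {
  return createHmac('sha256', key)
    .update(JSON.stringify([e.timestamp, e.actor, e.action, e.archiveId, e.purpose, e.prevMac]))
    .digest('hex');
}

function append(log: AuditEntry[], key: Buffer, fields: Omit<AuditEntry, 'prevMac' | 'mac'>): AuditEntry {
  const prevMac = log.length ? log[log.length - 1].mac : GENESIS;
  const partial = { ...fields, prevMac };
  const entry: AuditEntry = { ...partial, mac: macFor(key, partial) };
  log.push(entry);
  return entry;
}

function verify(log: AuditEntry[], key: Buffer): boolean {
  let prev = GENESIS;
  for (const e of log) {
    if (e.prevMac !== prev) return false; // broken link: entry removed or reordered
    const given = Buffer.from(e.mac, 'hex');
    const expected = Buffer.from(macFor(key, e), 'hex');
    if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
    prev = e.mac;
  }
  return true;
}
```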
Dual Authorization: High-sensitivity retrievals require dual approval (e.g., compliance officer + legal counsel). Configurable per classification.
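A dual-approval rule like the one described can be sketched as a check over recorded approvals. The `Approval` shape, the role names, and the requirement that approvals be scoped to one archive ID and purpose are all assumptions for illustration; the key property is that two different people must hold the two required roles.

```typescript
// Hypothetical dual-approval rule: two distinct people, one per required role,
// both approving the same archive ID for the same purpose.
interface Approval {
  approver: string; // e.g. an email address
  role: string;     // e.g. 'COMPLIANCE_OFFICER', 'LEGAL_COUNSEL'
  archiveId: string;
  purpose: string;
}

function hasDualApproval(
  approvals: Approval[],
  archiveId: string,
  purpose: string,
  requiredRoles: [string, string] = ['COMPLIANCE_OFFICER', 'LEGAL_COUNSEL']
): boolean {
  const relevant = approvals.filter((a) => a.archiveId === archiveId && a.purpose === purpose);
  const [roleA, roleB] = requiredRoles;
  // One approval per role, from two different people
  return relevant.some(
    (a) => a.role === roleA && relevant.some((b) => b.role === roleB && b.approver !== a.approver)
  );
}
```

Making `requiredRoles` a parameter is one way to keep the rule configurable per classification.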
Performance
PIIArchive performance is dominated by network I/O (vault uploads) rather than cryptographic operations. XorIDA threshold splitting takes about a millisecond or less for records up to 1 KB.
Archival Performance
Throughput Benchmarks
Measured on AWS EC2 t3.medium (2 vCPU, 4GB RAM), uploading to S3 Standard in us-east-1, eu-west-1, ap-southeast-1:
| Payload Size | Split Time | Upload Time (3 vaults) | Total Latency | Throughput (records/sec) |
|---|---|---|---|---|
| 512 B (SSN + name) | 0.6 ms | 85 ms | 85.6 ms | 11,680 |
| 1 KB (patient record) | 1.2 ms | 92 ms | 93.2 ms | 10,730 |
| 10 KB (CDD file) | 8.7 ms | 145 ms | 153.7 ms | 6,510 |
| 100 KB (background check) | 82 ms | 380 ms | 462 ms | 2,165 |
| 1 MB (medical imaging metadata) | 780 ms | 1,850 ms | 2,630 ms | 380 |
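In the table, total latency is split time plus upload time. A harness along the lines below could reproduce that breakdown; `Split` and `Upload` are hypothetical placeholders for the real SDK and vault calls, and uploads to all vaults run in parallel.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical timing harness: split, then upload all shares in parallel.
type Split = (payload: Uint8Array) => Promise<Uint8Array[]>;
type Upload = (vault: string, share: Uint8Array) => Promise<void>;

async function timeArchive(payload: Uint8Array, vaults: string[], split: Split, upload: Upload) {
  const t0 = performance.now();
  const shares = await split(payload);
  const t1 = performance.now();
  if (shares.length !== vaults.length) throw new Error('one vault per share expected');
  await Promise.all(shares.map((s, i) => upload(vaults[i], s)));
  const t2 = performance.now();
  return { splitMs: t1 - t0, uploadMs: t2 - t1, totalMs: t2 - t0 };
}
```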
Reconstruction Performance
Reconstruction latency is dominated by vault download time (network I/O). XorIDA reconstruction itself is sub-millisecond for payloads up to 1 KB:
| Payload Size | Download (2 shares) | Reconstruct Time | Total Latency |
|---|---|---|---|
| 512 B | 65 ms | 0.4 ms | 65.4 ms |
| 1 KB | 68 ms | 0.8 ms | 68.8 ms |
| 10 KB | 95 ms | 6.2 ms | 101.2 ms |
| 100 KB | 180 ms | 58 ms | 238 ms |
| 1 MB | 720 ms | 520 ms | 1,240 ms |
API Reference
PIIArchive provides a TypeScript SDK and REST API for integration.
Core Methods
REST API
```http
POST /api/v1/pii/archive
Authorization: Bearer <token>
Content-Type: application/json

{
  "data": { "ssn": "123-45-6789", "fullName": "Jane Doe" },
  "classification": "HIPAA_PHI",
  "retentionYears": 7
}
```

Response:

```json
{
  "archiveId": "pii_arch_8f7d2e3c9a1b",
  "vaults": ["us-east", "eu-west", "ap-south"],
  "expiresAt": "2031-04-10T00:00:00Z"
}
```

```http
POST /api/v1/pii/retrieve
Authorization: Bearer <token>
Content-Type: application/json

{
  "archiveId": "pii_arch_8f7d2e3c9a1b",
  "purpose": "DATA_SUBJECT_ACCESS_REQUEST",
  "authorizedBy": "compliance-officer@example.com"
}
```

Response:

```json
{
  "data": { "ssn": "123-45-6789", "fullName": "Jane Doe" },
  "classification": "HIPAA_PHI",
  "archivedAt": "2024-04-10T00:00:00Z",
  "auditTrailId": "audit_7f3e8d1c4b2a"
}
```
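A thin client for the two documented endpoints might look like the sketch below. The request and response shapes come from the examples above; the class name, the injectable `fetchImpl` (useful for testing), and the reliance on the global `fetch` in Node 18+ are assumptions.

```typescript
// Minimal sketch of a client for the two REST endpoints (assumes Node 18+ global fetch).
interface ArchiveResponse {
  archiveId: string;
  vaults: string[];
  expiresAt: string;
}

interface RetrieveResponse {
  data: Record<string, unknown>;
  classification: string;
  archivedAt: string;
  auditTrailId: string;
}

class PiiArchiveClient {
  constructor(
    private baseUrl: string,
    private token: string,
    private fetchImpl: typeof fetch = fetch
  ) {}

  private async post<T>(path: string, body: unknown): Promise<T> {
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${this.token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${path} failed: HTTP ${res.status}`);
    return (await res.json()) as T;
  }

  archive(data: Record<string, unknown>, classification: string, retentionYears: number) {
    return this.post<ArchiveResponse>('/api/v1/pii/archive', { data, classification, retentionYears });
  }

  retrieve(archiveId: string, purpose: string, authorizedBy: string) {
    return this.post<RetrieveResponse>('/api/v1/pii/retrieve', { archiveId, purpose, authorizedBy });
  }
}
```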
What PIIArchive Doesn't Solve
PIIArchive addresses long-term PII archival and breach prevention. It does not solve all data protection problems.
Not Addressed
Active database security: PIIArchive protects long-term archives, not operational databases. If your application queries PII regularly, threshold reconstruction on every query is impractical. Use PIIArchive for cold storage, not hot data.
Real-time access control: Reconstruction latency (65 ms to 1.24 s depending on payload size) makes PIIArchive unsuitable for sub-10 ms query requirements. If you need microsecond PII access, this is not the solution.
Searchability: Archived PII is opaque — you cannot search across archived records without reconstructing each one. PIIArchive is for retention and recovery, not analytics. If you need to run SQL queries over PII, keep searchable metadata separate (non-PII fields) and link to archive IDs.
Cross-organizational sharing: PIIArchive is designed for single-organization retention. If you need to share PII with external parties (e.g., health information exchange, financial data sharing), consider xLink or xChange for secure inter-organizational transport.
Quantum resistance: XorIDA threshold splitting is information-theoretically secure (quantum-proof), but vault authentication and transport rely on classical public-key cryptography (TLS key exchange and certificates). Symmetric primitives such as HMAC-SHA256 retain a large security margin against known quantum attacks. Post-quantum migration for the infrastructure layer is planned but not yet implemented.
Operational Considerations
Vault availability: Reconstruction requires K vaults to be reachable. If 2 of 3 vaults are unavailable, recovery fails. Design vault architecture for high availability (99.99%+ uptime) or add shares relative to the threshold for fault tolerance: a 3-of-5 scheme tolerates two vault outages, versus one for 2-of-3.
Vault cost: Multi-cloud distribution increases storage costs (3x for 2-of-3, 5x for 3-of-5). PIIArchive optimizes for security over cost. If budget is the primary constraint, consider single-cloud storage with cross-region replication (lower security, lower cost).
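The availability, security, and cost trade-offs of a K-of-N layout follow directly from K and N. The sketch below assumes one share per vault and that each share is roughly the size of the payload, as the 3x and 5x storage figures imply.

```typescript
// Trade-offs of a K-of-N layout (assumes one share per vault,
// each share roughly the size of the payload).
function schemeProfile(k: number, n: number) {
  if (!Number.isInteger(k) || !Number.isInteger(n) || k < 1 || k > n) {
    throw new RangeError('require integers with 1 <= k <= n');
  }
  return {
    toleratedOutages: n - k,     // vaults that can be offline while recovery still succeeds
    toleratedCompromises: k - 1, // vaults an attacker can hold while learning nothing
    storageMultiplier: n,        // total stored bytes relative to one payload
  };
}
```

`schemeProfile(2, 3)` gives one tolerated outage, one tolerated compromise, and 3x storage; `schemeProfile(3, 5)` gives two of each at 5x storage, which is why moving to 3-of-5 improves availability and breach resistance together.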
Legal complexity: Multi-jurisdictional distribution creates legal complexity — which jurisdiction's data protection law applies? Work with legal counsel to ensure vault placement aligns with regulatory strategy.
Next Steps
Ready to eliminate single-point PII breach risk from your archives?
1. Install PIIArchive
npm install @private.me/piiarchive @private.me/crypto @private.me/xformat
2. Configure Vaults
Set up vault endpoints so that no single region or cloud provider holds K or more shares; for 2-of-3, that means three distinct regions or providers. Recommended: AWS S3 (us-east-1) + Azure Blob (westeurope) + GCS (asia-southeast1).
3. Define Retention Policies
Map your PII classifications to retention schedules (for example, the policies in the Compliance Policy Engine section: HIPAA_PHI 7 years, GDPR special category 3 years, AML/KYC 5 years). The PIIArchive policy engine enforces them automatically.
4. Integrate with Existing Systems
Add archive hooks to database triggers, application APIs, or ETL pipelines. Replace centralized PII storage with archive IDs + threshold retrieval.
5. Test Recovery
Simulate vault failures and legal hold scenarios. Verify that threshold reconstruction works and audit logs capture all access.
Related ACIs
- xArchive: Physical QR-based PII archival for decades-long retention
- xRedact: PII redaction for AI/LLM processing pipelines
- xStore: Universal split-storage layer for threshold-protected data
- xLink: Secure machine-to-machine communication for PII transfer
- xProve: Verifiable computation for proving compliance without data access
Deployment Options
SaaS (Recommended)
Fully managed infrastructure. Call our REST API, we handle scaling, updates, and operations.
- Zero infrastructure setup
- Automatic updates
- 99.9% uptime SLA
- Enterprise SLA available
SDK Integration
Embed directly in your application. Runs in your codebase with full programmatic control.
npm install @private.me/piiarchive
- TypeScript/JavaScript SDK
- Full source access
- Enterprise support available
On-Premise (Upon Request)
Enterprise CLI for compliance, air-gap, or data residency requirements.
- Complete data sovereignty
- Air-gap capable deployment
- Custom SLA + dedicated support
- Professional services included
Enterprise On-Premise Deployment
While PIIArchive is primarily delivered as SaaS or SDK, we build dedicated on-premise infrastructure for customers with:
- Regulatory mandates — HIPAA, SOX, FedRAMP, CMMC requiring self-hosted processing
- Air-gapped environments — SCIF, classified networks, offline operations
- Data residency requirements — EU GDPR, China data laws, government mandates
- Custom integration needs — Embed in proprietary platforms, specialized workflows
Includes: Enterprise CLI, Docker/Kubernetes orchestration, RBAC, audit logging, and dedicated support.