Zero-PII Architecture for Special-Education Software
The post-COPPA-2025 trust surface, why cloud-side inference inherits it, and the three architectural commitments — UUID-only IDs, device-local inference, PII-stripping — that close it.
By Davit Janunts, M.Ed. Special Education (Lehigh University — Fulbright Foreign Student Program); co-author, Morin, Janunts, et al. (2024), Exceptional Children, 90(2), 145-163, doi:10.1177/00144029231165506.
Summary
The three commitments below apply to all special-education software architecture. The LLM-call layer — “AI” in 2026 procurement language — is treated here as the focal example because it is the highest-risk new attack surface districts are actively procuring against, but the underlying privacy-by-design argument generalizes to every cloud-side data path the tool exposes.
Special-education data is uniquely sensitive: disability category, behavioral-incident logs, AAC vocabulary, IEP narrative text, and the legally protected fact of receiving services at all. The FTC’s January 2025 amendments to the Children’s Online Privacy Protection Rule (16 CFR Part 312, effective April 22, 2026) expand the definition of personal information, require a written data-retention policy, and require a written information-security program. FERPA (20 U.S.C. §1232g; 34 CFR Part 99) overlays a separate education-records regime. Layered onto that compliance frame, every cloud-side LLM call from a SPED tool inherits the vendor’s third-party trust surface — a surface that vendor self-attestation has been a poor instrument for measuring. Three architectural commitments — UUID-only identification, device-local inference, and a hard PII-stripping gate before any unavoidable network call — collapse most of that surface by construction.
The cost shape — what a breach in this category looks like
The 2024-2025 PowerSchool incident — disclosed publicly in January 2025 — is the operative counterexample. The breach exposed records spanning multiple districts and reportedly included IEPs, special-education service histories, and disability-category data. Larsen et al. (2019, npj Digital Medicine, 2:18) document, in a parallel domain, that 72% of top-rated mental-health apps make unsupported clinical or privacy claims; vendor self-attestation is not a substitute for architectural constraints. Zeide (2019, Big Data & Society, 6(2), 1-15) frames the structural risk: aggregating children’s educational data centrally creates the conditions for algorithmic profiling regardless of any single vendor’s intent.
The post-2025 cost is not just regulatory exposure. It is the loss of the trust capital that districts spend years building with families. Once a district is the named party in a children’s data-breach press cycle, that capital does not come back on the same procurement cycle.
Three architectural commitments that collapse the trust surface
1. UUID-only identification at the application layer
Every student record in the application database is keyed on a randomly generated UUID. Names, dates of birth, school of record, and any other directly identifying field are stored only on the district’s SIS-backed authentication boundary, never in the SPED tool’s own telemetry, behavioral-event log, or analytics tables. The COPPA 2025 amendments expanded the definition of personal information to include persistent identifiers (16 CFR §312.2). UUIDs that are scoped per-tenant and never reused or rejoined to identifying data on the application side fall outside that expanded definition. van der Hof & Lievens (2018, International Journal of Law and Information Technology, 26(1), 30-58) call this privacy by design — the regulatory obligation is met by data that does not exist in identifiable form, not by promising not to misuse data that does.
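A minimal sketch of the UUID-only pattern. The field names (`student_key`, `tenant_id`, `reading_level`) are illustrative assumptions, not a specific vendor’s schema; the point is that the application-side record simply has no slot for a name, date of birth, or school.

```python
import uuid

def new_student_key() -> str:
    """Random (version 4) UUID: carries no information about the student."""
    return str(uuid.uuid4())

# Application-side record: note the absence of name, DOB, and school of
# record. Those fields live only behind the district's SIS-backed
# authentication boundary and are never joined back to this key.
record = {
    "student_key": new_student_key(),
    "tenant_id": "tenant-scoped-id",   # keys are never reused across tenants
    "reading_level": "grade-2-decodable",
    "events": [],                       # behavioral events keyed on student_key only
}

# The compliance argument is structural: the identifying columns do not exist.
assert "name" not in record and "dob" not in record
```

Because the key is random rather than derived (not a hash of a name or student ID), possession of the application database alone cannot recover an identity even by brute force.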
2. Device-local inference for student-content AI
AI features that touch student-produced content — reading passages a student decoded, AAC utterances a student composed, behavioral-incident text a teacher logged about a student — run on the device, not in a hosted LLM endpoint. Akgun & Greenhow (2022, AI and Ethics, 2, 431-440) document the K-12 ethics surface created by hosted-model training-data appropriation, prompt-log retention, and surveillance affordances. Device-local inference removes those failure modes by construction: there is no network endpoint at which student content lives, however briefly, in identifiable form.
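The routing constraint above can be sketched in a few lines. `run_local_model` is a hypothetical stand-in for an on-device inference runtime (for example, a quantized local model); it is an assumption for illustration, not a specific product API. What matters is that the call path for student content contains no branch that performs network I/O.

```python
def run_local_model(prompt: str) -> str:
    """Placeholder for on-device inference; no network I/O occurs here."""
    return f"[local summary of {len(prompt)} chars]"

def summarize_student_content(text: str) -> str:
    # By construction, the only inference path for student-produced
    # content is device-local: there is no hosted-endpoint fallback,
    # so there is no prompt log, retention window, or transit surface.
    return run_local_model(text)
```

The failure modes Akgun & Greenhow catalog (training-data appropriation, prompt-log retention) are absent not because a policy forbids them but because no code path exists to produce them.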
3. PII-stripping gate before any unavoidable network call
Some AI calls — generating decodable practice passages tuned to a phonics scope, translating a district-side IEP-meeting summary into a parent’s native language at adult-grade-level readability — currently exceed practical device-local capacity. For those calls, every payload passes through a strict stripPII() pipeline that removes student names, school names, district names, dates of birth, addresses, and diagnostic terminology, replacing them with generic tokens; tokens are re-mapped to original values device-locally on response. Haque et al. (2021, IEEE Access, 9, 78008-78049) catalog the encryption requirements for the remaining transit and at-rest surfaces — TLS 1.3 in transit, AES-256-GCM at rest per NIST FIPS 197 and NIST SP 800-38D.
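A minimal sketch of the strip-and-remap pattern behind `stripPII()`. The term list and simple string matching here are illustrative assumptions only; a production gate would draw on district rosters, named-entity recognition, and a diagnostic-term lexicon. The token map never leaves the device.

```python
import uuid

# Illustrative only: a real gate is populated from roster data and
# curated lexicons, not a hard-coded set.
SENSITIVE_TERMS = {"Jordan Smith", "Lincoln Elementary", "dyslexia"}

def strip_pii(payload: str):
    """Replace sensitive strings with opaque tokens before any network call.

    Returns the sanitized payload and a device-local token map.
    """
    token_map = {}
    for term in SENSITIVE_TERMS:
        if term in payload:
            token = f"<TOK_{uuid.uuid4().hex[:8]}>"
            token_map[token] = term
            payload = payload.replace(term, token)
    return payload, token_map

def remap(response: str, token_map: dict) -> str:
    """Restore original values device-locally on the model's response."""
    for token, original in token_map.items():
        response = response.replace(token, original)
    return response

text = "Jordan Smith at Lincoln Elementary, dyslexia goal."
outbound, mapping = strip_pii(text)
assert "Jordan Smith" not in outbound and "dyslexia" not in outbound
assert remap(outbound, mapping) == text
```

The gate sits on the call path itself, so a payload that skips it cannot reach the network layer; that placement, not the term list, is what makes the guarantee structural.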
Why this is structural, not policy
Privacy policies, vendor self-attestations, and Data Processing Addenda are necessary instruments but they describe intent; they do not constrain capability. The Larsen et al. (2019) finding — that the majority of consumer-health apps make unsupported privacy or clinical claims — is the operative counterexample for what self-attestation alone can deliver. Architectural commitments constrain capability: a UUID-only key cannot be joined to a student name; a device-local inference call cannot leak a network payload; a PII-stripping gate is enforced by code on the call path, not by a paragraph in a policy document.
The trade-off is real. Some product features that depend on cross-tenant aggregation, persistent behavioral identifiers, or large-context cloud LLM calls are simpler to build inside a centralized data architecture. The post-2025 question is whether those features are worth the surface they expand. For SPED specifically — where the protected categories overlap directly with the FTC’s expanded definition of personal information and with FERPA-protected education records — the answer for a growing share of K-12 tooling is no.
Equity guard: the data that is not collected cannot disparately impact
Skiba et al. (2011, School Psychology Review, 40(1), 85-107) document that student-level risk models in school-discipline contexts amplify pre-existing demographic disproportionality. The corresponding result for educational-AI tools is structural: a tool that does not retain identifiable student-level features cannot fit a model on those features later, and a vendor that does not aggregate behavioral telemetry cannot reconstruct them retroactively. Zeide (2019) makes the broader case: the equity exposure of educational data infrastructure is set by what is collected and retained, not by the model trained on top of it. The architectural commitments above shrink that exposure upstream of the model.
What changes operationally
For districts, the operational shift is procurement: the COPPA 2025 amendments mean the vendor’s written data-retention and information-security programs must be on file before the tool reaches a student. For vendors, the shift is architectural: the choice of UUID-only schema, device-local inference, and PII-stripping gates moves the privacy compliance argument from what the vendor promises to what the codebase makes possible.
The deliverable is not a longer privacy policy. It is a smaller blast radius if anything goes wrong — and an attestation that holds because the data does not exist to be exposed.
Disclaimer. This brief is a research-informed analysis of published peer-reviewed literature, federal regulation, and the publicly disclosed PowerSchool breach record. It is not legal advice, security advice, or a vendor recommendation. Districts evaluating SPED-tool privacy claims should consult qualified data-privacy counsel and their state education agency’s data-governance office. References to architectural commitments describe a category of design pattern; a specific vendor’s implementation of any pattern requires independent verification.
References
- Akgun, S., & Greenhow, C. (2022). Artificial intelligence in education: Addressing ethical challenges in K-12 settings. AI and Ethics, 2, 431-440.
- Children’s Online Privacy Protection Rule, 16 CFR Part 312 (FTC final amendments, January 2025; effective April 22, 2026).
- Family Educational Rights and Privacy Act, 20 U.S.C. §1232g; 34 CFR Part 99.
- Haque, A.B., Rahman, M.S., Ahsan, M.A.M., Habib, M.A., Rahman, M.A., & Roy, S. (2021). A comprehensive analysis of security and privacy in IoT-based healthcare and EdTech systems. IEEE Access, 9, 78008-78049.
- Larsen, M.E., Huckvale, K., Nicholas, J., Torous, J., Birrell, L., Li, E., & Reda, B. (2019). Using science to sell apps: Evaluation of mental health app store quality claims. npj Digital Medicine, 2, 18.
- NIST FIPS PUB 197 (2001, reaffirmed 2023). Advanced Encryption Standard (AES). National Institute of Standards and Technology.
- NIST SP 800-38D (2007). Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. National Institute of Standards and Technology.
- Skiba, R.J., Horner, R.H., Chung, C.G., Rausch, M.K., May, S.L., & Tobin, T. (2011). Race is not neutral: A national investigation of African American and Latino disproportionality in school discipline. School Psychology Review, 40(1), 85-107.
- van der Hof, S., & Lievens, E. (2018). The importance of privacy by design and data protection impact assessments in strengthening protection of children’s personal data under the GDPR. International Journal of Law and Information Technology, 26(1), 30-58.
- Zeide, E. (2019). The structural consequences of big data-driven education. Big Data & Society, 6(2), 1-15.