A full-spectrum security assessment of NeuralPath AI — a multi-tenant ML model serving platform hosting enterprise inference workloads for 200 clients. We found training data publicly accessible on S3, model weights downloadable without authentication, and Remote Code Execution via insecure pickle deserialization. Privileged GPU containers delivered a reliable path to full Kubernetes cluster compromise.
NeuralPath AI is a cloud-native ML model serving platform where enterprise customers upload proprietary training datasets, NeuralPath hosts and fine-tunes models on GPU infrastructure, and clients consume predictions via REST API endpoints. With 200 enterprise clients, millions of daily inference requests, and petabytes of sensitive training data, the security stakes are extraordinarily high.
Our black-box assessment began by mapping the external attack surface. Within the first 2 hours, we identified that training datasets were stored in publicly accessible S3 buckets — 4.2TB of enterprise customer data including proprietary medical imaging, financial time series, and PII-laden NLP corpora accessible with a single unauthenticated AWS CLI command.
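To illustrate how little it takes to confirm this class of exposure, here is a minimal sketch of an anonymous bucket listing with boto3; the bucket name is a placeholder, not the actual NeuralPath bucket.

```python
# Sketch: confirm a bucket is listable without any credentials.
# Bucket name is a placeholder for illustration only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Equivalent to `aws s3 ls s3://<bucket> --no-sign-request`
resp = s3.list_objects_v2(Bucket="example-training-data-bucket", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```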
Model weight files were downloadable without authentication. A crafted pickle file uploaded to the model upload endpoint executed arbitrary code on inference servers. Privileged GPU containers — running with --privileged=true — provided a path from tenant-controlled training container to full Kubernetes cluster-admin access. Over 5 days, we catalogued 25 findings across all 6 audit modules, demonstrating that all 200 enterprise clients' model IP and training data were simultaneously at risk.
All findings were delivered with full proof-of-concept exploits, technical remediation guidance, and a prioritized remediation roadmap. Critical infrastructure isolation was completed within 72 hours.
9 Critical, 11 High, 5 Medium — spanning training data exfiltration, model IP theft, RCE, container escape, and SOC2 compliance gaps. Every finding includes a proof-of-concept and full remediation.
Training data buckets could be enumerated and downloaded with a single unauthenticated aws s3 ls command.
Remediation: Enable S3 Block Public Access at the account level. Restrict bucket policies to specific IAM roles. Enable SSE-KMS encryption with per-tenant CMKs. Enable S3 access logging and CloudTrail data events.
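A sketch of the first remediation step, assuming boto3 and a placeholder account ID: enforcing S3 Block Public Access account-wide through the S3 Control API.

```python
# Sketch: enforce S3 Block Public Access for the whole account.
# The account ID is a placeholder.
import boto3

s3control = boto3.client("s3control")
s3control.put_public_access_block(
    AccountId="111122223333",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```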
Model weight files were downloadable from artifact endpoints without authentication.
Remediation: Require authentication on all artifact endpoints. Implement S3 presigned URLs with a 1-hour TTL. Add per-tenant ownership verification — tenant A cannot download models belonging to tenant B.
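A sketch of what tenant-scoped artifact downloads could look like: verify ownership first, then issue a short-lived presigned URL. The lookup function, bucket name, and key layout are placeholders, not NeuralPath's actual implementation.

```python
# Sketch: per-tenant ownership check + 1-hour presigned URL.
# Bucket name, key layout, and the metadata lookup are placeholders.
import boto3

s3 = boto3.client("s3")
MODEL_BUCKET = "example-model-artifacts"

def get_model_owner(model_id: str) -> str:
    """Placeholder for a metadata-store lookup of the model's owning tenant."""
    raise NotImplementedError

def presign_model_download(model_id: str, requesting_tenant: str) -> str:
    if get_model_owner(model_id) != requesting_tenant:
        raise PermissionError("tenant does not own this model")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": MODEL_BUCKET,
                "Key": f"{requesting_tenant}/{model_id}/weights.bin"},
        ExpiresIn=3600,  # 1-hour TTL
    )
```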
Long-lived AWS credentials committed to source repositories carried s3:*, ec2:*, sagemaker:* permissions, enabling full training infrastructure access and compute resource abuse at scale.
Remediation: Rotate all exposed credentials immediately. Run TruffleHog on full repo history. Implement pre-commit hooks. Migrate to IAM instance profiles and GCP Workload Identity — no long-lived credentials in code.
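For the immediate rotation step, a sketch of deactivating and removing a known-exposed access key with boto3; the user name and key ID are placeholders. Repo-history scanning and pre-commit hooks are handled separately.

```python
# Sketch: kill an exposed access key immediately.
# User name and key ID are placeholders.
import boto3

iam = boto3.client("iam")

# Disable first so nothing can authenticate with the leaked key...
iam.update_access_key(
    UserName="training-pipeline-svc",
    AccessKeyId="AKIAEXAMPLEEXPOSEDKEY",
    Status="Inactive",
)

# ...then delete it outright once downstream usage is confirmed migrated
# to instance profiles / Workload Identity.
iam.delete_access_key(
    UserName="training-pipeline-svc",
    AccessKeyId="AKIAEXAMPLEEXPOSEDKEY",
)
```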
By tampering with the dataset_path parameter in training job submissions, we accessed training datasets belonging to Fortune 500 enterprise tenants from a low-tier trial account. A fundamental multi-tenancy breach affecting all 200 clients.
Remediation: Implement storage-level isolation: separate S3 buckets or KMS per-tenant encryption. Enforce tenant scoping at the IAM policy level using resource tags. Require cryptographic ownership verification before any storage operation.
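One way to express the tenant-scoping remediation is an IAM policy whose resource path resolves ${aws:PrincipalTag/tenant_id} at evaluation time, so each tenant role can only reach its own prefix. The bucket name and tag key below are assumptions, not NeuralPath's actual configuration.

```python
# Sketch: tenant-scoped S3 access via an IAM policy variable.
# Bucket name and the tenant_id principal tag are assumptions.
import json

tenant_scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TenantPrefixOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-training-data/${aws:PrincipalTag/tenant_id}/*",
        }
    ],
}

print(json.dumps(tenant_scoped_policy, indent=2))
```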
The Kubernetes API server was exposed to the internet with --anonymous-auth=true. Unauthenticated kubectl commands returned pod listings, service configs, and secret names. Full cluster compromise achievable without credentials — all 847 active training jobs accessible.
Remediation: Disable anonymous auth (--anonymous-auth=false). Restrict the API server to the internal VPC only — remove public DNS. Enable Kubernetes audit logging. Use a dedicated bastion/jump host for cluster admin.
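A quick check script in the spirit of this finding: probe the API server without credentials and flag anything other than a 401/403. The endpoint URL is a placeholder.

```python
# Sketch: verify the API server rejects anonymous requests.
# The API server URL is a placeholder; TLS verification is skipped
# only because this is a throwaway probe.
import requests

API_SERVER = "https://k8s-api.example.internal:6443"

resp = requests.get(f"{API_SERVER}/api/v1/namespaces", verify=False, timeout=5)
if resp.status_code in (401, 403):
    print("anonymous access rejected:", resp.status_code)
else:
    print("WARNING: anonymous request returned", resp.status_code)
```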
Uploaded model files were deserialized with pickle.load() without sandboxing. We uploaded a crafted pickle file that executed os.system("id; cat /etc/passwd") — confirmed code execution as the inference-worker service account with access to all tenant models and the internal Kubernetes API.
Remediation: Use torch.load(weights_only=True) or migrate to SafeTensors/ONNX. Validate model files in a sandboxed gVisor container with no network access and a read-only filesystem before accepting uploads.
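A sketch of the safer loading paths named above, assuming PyTorch checkpoints: weights_only=True refuses arbitrary pickled objects, and SafeTensors avoids pickle entirely. File paths are placeholders.

```python
# Sketch: load model weights without executing arbitrary pickled code.
# File paths are placeholders.
import torch
from safetensors.torch import load_file  # pip install safetensors

# Option 1: restrict torch.load to plain tensors/containers (PyTorch >= 1.13).
state_dict = torch.load("uploaded_model.pt", weights_only=True, map_location="cpu")

# Option 2: SafeTensors files carry raw tensors only -- no pickle at all.
state_dict = load_file("uploaded_model.safetensors")
```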
EBS volumes storing model artifacts were unencrypted.
Remediation: Enable EBS encryption on all volumes and set it as the account default. Implement per-tenant KMS CMKs for model artifact encryption. Enable CloudTrail for all EBS and KMS API calls.
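A sketch of the account-default step with boto3; per-tenant CMKs and CloudTrail wiring are configured separately, and the key ARN below is a placeholder.

```python
# Sketch: make EBS encryption the default for new volumes in the current region.
import boto3

ec2 = boto3.client("ec2")
ec2.enable_ebs_encryption_by_default()

# Optionally pin the default to a specific KMS key (ARN is a placeholder).
ec2.modify_ebs_default_kms_key_id(
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/REPLACE-ME"
)
```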
All tenants' models were served from a shared inference deployment running under a single inference-svc service account. No workload isolation, GPU memory partitioning, or per-request sandboxing. Adversarial inputs from one tenant could read cached GPU memory belonging to another tenant's recently completed inference batch.
Remediation: Deploy per-tenant Triton instances or use NVIDIA MIG GPU partitioning. Implement strict Kubernetes NetworkPolicies. Add GPU memory clearing between tenant inference batches. Evaluate NVIDIA Confidential Computing for cryptographic GPU isolation.
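As one piece of the isolation work, a sketch using the official Kubernetes Python client to apply a NetworkPolicy that limits ingress to pods in the same tenant namespace. The namespace name is an assumption; GPU partitioning (MIG) is configured separately.

```python
# Sketch: restrict ingress to same-namespace pods for a tenant namespace.
# Namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()

netpol = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="same-namespace-only"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),   # all pods in the namespace
        policy_types=["Ingress"],
        ingress=[client.V1NetworkPolicyIngressRule(
            # Allow traffic only from pods in this same namespace.
            _from=[client.V1NetworkPolicyPeer(pod_selector=client.V1LabelSelector())]
        )],
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy("tenant-a", netpol)
```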
GPU training containers ran with --privileged=true and seccomp=unconfined. Using CVE-2022-0185 (a Linux kernel heap overflow usable for container escape), we achieved root on the underlying EC2 host node from within a tenant-controlled training container. From the host, we extracted cluster-admin Kubernetes tokens — a reliable path from any tenant's training job to full cluster compromise.
Remediation: Remove --privileged — use the NVIDIA device plugin for Kubernetes instead. Apply Pod Security Admission to block privileged containers. Enable Falco for runtime container escape detection. Patch the Linux kernel to 5.16.2+.
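A sketch of enforcing Pod Security Admission on tenant namespaces via the Python client, so privileged pods are rejected at admission time; the namespace name is a placeholder.

```python
# Sketch: label a namespace so Pod Security Admission blocks privileged pods.
# Namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_namespace(
    "tenant-training",
    {"metadata": {"labels": {
        "pod-security.kubernetes.io/enforce": "restricted",
        "pod-security.kubernetes.io/enforce-version": "latest",
    }}},
)
```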
Inference requests with an input tensor shape of [INT_MAX, INT_MAX, 3] caused the TensorFlow Serving instance to allocate 18GB before OOM-crashing, taking the shared inference server offline for all 200 tenants for 4 minutes. FGSM adversarial inputs also bypassed customer-deployed fraud detection models.

The inference workloads' service account carried a cluster-admin ClusterRoleBinding — the highest Kubernetes privilege. The mounted service account token was readable from within any container. Combined with the Pickle RCE finding, this gave us full Kubernetes API access from a single model upload, including reading secrets across all tenant namespaces.

Shared NFS exports were configured with no_root_squash, allowing any root process on a mounted host — including compromised training containers — to read all tenant data across all NFS paths.

Any authenticated user could call DELETE /api/v1/models/{model_id} with any enterprise tenant's model ID and succeed. We deleted and recreated a production model belonging to a different tenant without any authorization error. Complete sabotage vector for any enterprise client (a sketch of the missing ownership check follows this section).

Scanning the production serving images (tensorflow/serving:2.11.0) revealed CVE-2023-25801 (CVSS 9.8, RCE in REST API handler), CVE-2022-41911 (CVSS 8.8, heap overflow in batch inference), and CVE-2021-41203 (CVSS 9.8, type confusion RCE). Public exploits exist for all three. Images were 18 months out of date despite patches in TF 2.14.0+.

The EC2 instance profile available to training nodes granted s3:*, ec2:*, iam:*, sts:*, sagemaker:* on Resource *. Training jobs run user-controlled code — making EC2 metadata SSRF trivial. An attacker controlling a training job could obtain temporary keys with full AWS account permissions, enabling complete account takeover.

Every finding was mapped to the frameworks that apply to enterprise AI/ML platforms handling proprietary training data for regulated industries. Pre-remediation compliance scores tell the full story.
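Returning to the model-deletion finding above, here is a sketch of the missing object-level authorization check. The framework, route shape, helper function, and tenant-resolution mechanism are assumptions, not NeuralPath's actual code.

```python
# Sketch: object-level authorization before model deletion (names are hypothetical).
from flask import Flask, abort, g

app = Flask(__name__)

def get_model_owner(model_id: str) -> str:
    """Placeholder lookup of the owning tenant in the model metadata store."""
    raise NotImplementedError

@app.route("/api/v1/models/<model_id>", methods=["DELETE"])
def delete_model(model_id: str):
    # g.tenant_id is assumed to be set by the authentication middleware.
    if get_model_owner(model_id) != g.tenant_id:
        abort(403)  # requester does not own this model
    # ... proceed with deletion only after the ownership check passes ...
    return "", 204
```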
Watch us walk through the NeuralPath findings live — demonstrating the S3 data exfiltration, Pickle RCE, and container escape chain from tenant training job to Kubernetes cluster-admin. Coming soon.
Three complete audit walkthroughs across different industries and threat profiles — each covering a distinct compliance framework and attack surface.
If you're hosting ML models for enterprise clients, training data isolation and model IP protection aren't optional. We'll find what's exposed before someone else does.