A full-spectrum security assessment of NeuralPath AI — a multi-tenant ML model serving platform hosting enterprise inference workloads for 200 clients. We found training data publicly accessible on S3, model weights downloadable without authentication, and Remote Code Execution via insecure pickle deserialization. Privileged GPU containers delivered a reliable path to full Kubernetes cluster compromise.
NeuralPath AI is a cloud-native ML model serving platform where enterprise customers upload proprietary training datasets, NeuralPath hosts and fine-tunes models on GPU infrastructure, and clients consume predictions via REST API endpoints. With 200 enterprise clients, millions of daily inference requests, and petabytes of sensitive training data, the security stakes are extraordinarily high.
Our black-box assessment began by mapping the external attack surface. Within the first 2 hours, we identified that training datasets were stored in publicly accessible S3 buckets — 4.2TB of enterprise customer data including proprietary medical imaging, financial time series, and PII-laden NLP corpora accessible with a single unauthenticated AWS CLI command.
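To illustrate how little it takes to confirm this class of exposure, here is a minimal sketch of an anonymous bucket listing with boto3; the bucket name is a placeholder, not the actual NeuralPath bucket.

```python
# Sketch: confirm a bucket is listable without any credentials.
# Bucket name is a placeholder for illustration only.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Equivalent to `aws s3 ls s3://<bucket> --no-sign-request`
resp = s3.list_objects_v2(Bucket="example-training-data-bucket", MaxKeys=10)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```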
Model weight files were downloadable without authentication. A crafted pickle file uploaded to the model upload endpoint executed arbitrary code on inference servers. Privileged GPU containers — running with --privileged=true — provided a path from tenant-controlled training container to full Kubernetes cluster-admin access. Over 5 days, we catalogued 25 findings across all 6 audit modules, demonstrating that all 200 enterprise clients' model IP and training data were simultaneously at risk.
All findings were delivered with full proof-of-concept exploits, technical remediation guidance, and a prioritized remediation roadmap. Critical infrastructure isolation was completed within 72 hours.
9 Critical, 11 High, 5 Medium — spanning training data exfiltration, model IP theft, RCE, container escape, and SOC2 compliance gaps. Every finding includes a proof-of-concept and full remediation.
Training data buckets could be enumerated and downloaded with a single unauthenticated aws s3 ls command.
Remediation: Enable S3 Block Public Access at the account level. Restrict bucket policies to specific IAM roles. Enable SSE-KMS encryption with per-tenant CMKs. Enable S3 access logging and CloudTrail data events.
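A sketch of the first remediation step, assuming boto3 and a placeholder account ID: enforcing S3 Block Public Access account-wide through the S3 Control API.

```python
# Sketch: enforce S3 Block Public Access for the whole account.
# The account ID is a placeholder.
import boto3

s3control = boto3.client("s3control")
s3control.put_public_access_block(
    AccountId="111122223333",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```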
Model weight files were downloadable from artifact endpoints without authentication.
Remediation: Require authentication on all artifact endpoints. Implement S3 presigned URLs with a 1-hour TTL. Add per-tenant ownership verification — tenant A cannot download models belonging to tenant B.
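A sketch of what tenant-scoped artifact downloads could look like: verify ownership first, then issue a short-lived presigned URL. The lookup function, bucket name, and key layout are placeholders, not NeuralPath's actual implementation.

```python
# Sketch: per-tenant ownership check + 1-hour presigned URL.
# Bucket name, key layout, and the metadata lookup are placeholders.
import boto3

s3 = boto3.client("s3")
MODEL_BUCKET = "example-model-artifacts"

def get_model_owner(model_id: str) -> str:
    """Placeholder for a metadata-store lookup of the model's owning tenant."""
    raise NotImplementedError

def presign_model_download(model_id: str, requesting_tenant: str) -> str:
    if get_model_owner(model_id) != requesting_tenant:
        raise PermissionError("tenant does not own this model")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": MODEL_BUCKET,
                "Key": f"{requesting_tenant}/{model_id}/weights.bin"},
        ExpiresIn=3600,  # 1-hour TTL
    )
```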
Long-lived AWS credentials committed to source repositories carried s3:*, ec2:*, sagemaker:* permissions, enabling full training infrastructure access and compute resource abuse at scale.
Remediation: Rotate all exposed credentials immediately. Run TruffleHog on full repo history. Implement pre-commit hooks. Migrate to IAM instance profiles and GCP Workload Identity — no long-lived credentials in code.
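For the immediate rotation step, a sketch of deactivating and removing a known-exposed access key with boto3; the user name and key ID are placeholders. Repo-history scanning and pre-commit hooks are handled separately.

```python
# Sketch: kill an exposed access key immediately.
# User name and key ID are placeholders.
import boto3

iam = boto3.client("iam")

# Disable first so nothing can authenticate with the leaked key...
iam.update_access_key(
    UserName="training-pipeline-svc",
    AccessKeyId="AKIAEXAMPLEEXPOSEDKEY",
    Status="Inactive",
)

# ...then delete it outright once downstream usage is confirmed migrated
# to instance profiles / Workload Identity.
iam.delete_access_key(
    UserName="training-pipeline-svc",
    AccessKeyId="AKIAEXAMPLEEXPOSEDKEY",
)
```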
By tampering with the dataset_path parameter in training job submissions, we accessed training datasets belonging to Fortune 500 enterprise tenants from a low-tier trial account. A fundamental multi-tenancy breach affecting all 200 clients.
Remediation: Implement storage-level isolation: separate S3 buckets or KMS per-tenant encryption. Enforce tenant scoping at the IAM policy level using resource tags. Require cryptographic ownership verification before any storage operation.
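One way to express the tenant-scoping remediation is an IAM policy whose resource path resolves ${aws:PrincipalTag/tenant_id} at evaluation time, so each tenant role can only reach its own prefix. The bucket name and tag key below are assumptions, not NeuralPath's actual configuration.

```python
# Sketch: tenant-scoped S3 access via an IAM policy variable.
# Bucket name and the tenant_id principal tag are assumptions.
import json

tenant_scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TenantPrefixOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-training-data/${aws:PrincipalTag/tenant_id}/*",
        }
    ],
}

print(json.dumps(tenant_scoped_policy, indent=2))
```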
The Kubernetes API server was exposed to the internet with --anonymous-auth=true. Unauthenticated kubectl commands returned pod listings, service configs, and secret names. Full cluster compromise achievable without credentials — all 847 active training jobs accessible.
Remediation: Disable anonymous auth (--anonymous-auth=false). Restrict the API server to the internal VPC only — remove public DNS. Enable Kubernetes audit logging. Use a dedicated bastion/jump host for cluster admin.
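A quick check script in the spirit of this finding: probe the API server without credentials and flag anything other than a 401/403. The endpoint URL is a placeholder.

```python
# Sketch: verify the API server rejects anonymous requests.
# The API server URL is a placeholder; TLS verification is skipped
# only because this is a throwaway probe.
import requests

API_SERVER = "https://k8s-api.example.internal:6443"

resp = requests.get(f"{API_SERVER}/api/v1/namespaces", verify=False, timeout=5)
if resp.status_code in (401, 403):
    print("anonymous access rejected:", resp.status_code)
else:
    print("WARNING: anonymous request returned", resp.status_code)
```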
Uploaded model files were deserialized with pickle.load() without sandboxing. We uploaded a crafted pickle file that executed os.system("id; cat /etc/passwd") — confirmed code execution as the inference-worker service account with access to all tenant models and the internal Kubernetes API.
Remediation: Use torch.load(weights_only=True) or migrate to SafeTensors/ONNX. Validate model files in a sandboxed gVisor container with no network access and a read-only filesystem before accepting uploads.
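A sketch of the safer loading paths named above, assuming PyTorch checkpoints: weights_only=True refuses arbitrary pickled objects, and SafeTensors avoids pickle entirely. File paths are placeholders.

```python
# Sketch: load model weights without executing arbitrary pickled code.
# File paths are placeholders.
import torch
from safetensors.torch import load_file  # pip install safetensors

# Option 1: restrict torch.load to plain tensors/containers (PyTorch >= 1.13).
state_dict = torch.load("uploaded_model.pt", weights_only=True, map_location="cpu")

# Option 2: SafeTensors files carry raw tensors only -- no pickle at all.
state_dict = load_file("uploaded_model.safetensors")
```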
EBS volumes storing model artifacts were unencrypted.
Remediation: Enable EBS encryption on all volumes and set it as the account default. Implement per-tenant KMS CMKs for model artifact encryption. Enable CloudTrail for all EBS and KMS API calls.
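A sketch of the account-default step with boto3; per-tenant CMKs and CloudTrail wiring are configured separately, and the key ARN below is a placeholder.

```python
# Sketch: make EBS encryption the default for new volumes in the current region.
import boto3

ec2 = boto3.client("ec2")
ec2.enable_ebs_encryption_by_default()

# Optionally pin the default to a specific KMS key (ARN is a placeholder).
ec2.modify_ebs_default_kms_key_id(
    KmsKeyId="arn:aws:kms:us-east-1:111122223333:key/REPLACE-ME"
)
```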
All tenants' models were served from a shared inference deployment running under a single inference-svc service account. No workload isolation, GPU memory partitioning, or per-request sandboxing. Adversarial inputs from one tenant could read cached GPU memory belonging to another tenant's recently completed inference batch.
Remediation: Deploy per-tenant Triton instances or use NVIDIA MIG GPU partitioning. Implement strict Kubernetes NetworkPolicies. Add GPU memory clearing between tenant inference batches. Evaluate NVIDIA Confidential Computing for cryptographic GPU isolation.
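As one piece of the isolation work, a sketch using the official Kubernetes Python client to apply a NetworkPolicy that limits ingress to pods in the same tenant namespace. The namespace name is an assumption; GPU partitioning (MIG) is configured separately.

```python
# Sketch: restrict ingress to same-namespace pods for a tenant namespace.
# Namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()

netpol = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="same-namespace-only"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),   # all pods in the namespace
        policy_types=["Ingress"],
        ingress=[client.V1NetworkPolicyIngressRule(
            # Allow traffic only from pods in this same namespace.
            _from=[client.V1NetworkPolicyPeer(pod_selector=client.V1LabelSelector())]
        )],
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy("tenant-a", netpol)
```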
GPU training containers ran with --privileged=true and seccomp=unconfined. Using CVE-2022-0185 (a Linux kernel heap overflow usable for container escape), we achieved root on the underlying EC2 host node from within a tenant-controlled training container. From the host, we extracted cluster-admin Kubernetes tokens — a reliable path from any tenant's training job to full cluster compromise.
Remediation: Remove --privileged — use the NVIDIA device plugin for Kubernetes instead. Apply Pod Security Admission to block privileged containers. Enable Falco for runtime container escape detection. Patch the Linux kernel to 5.16.2+.
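A sketch of enforcing Pod Security Admission on tenant namespaces via the Python client, so privileged pods are rejected at admission time; the namespace name is a placeholder.

```python
# Sketch: label a namespace so Pod Security Admission blocks privileged pods.
# Namespace name is a placeholder.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_namespace(
    "tenant-training",
    {"metadata": {"labels": {
        "pod-security.kubernetes.io/enforce": "restricted",
        "pod-security.kubernetes.io/enforce-version": "latest",
    }}},
)
```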
Inference requests with an input tensor shape of [INT_MAX, INT_MAX, 3] caused the TensorFlow Serving instance to allocate 18GB before OOM-crashing, taking the shared inference server offline for all 200 tenants for 4 minutes. FGSM adversarial inputs also bypassed customer-deployed fraud detection models.

The inference workloads' service account carried a cluster-admin ClusterRoleBinding — the highest Kubernetes privilege. The mounted service account token was readable from within any container. Combined with the Pickle RCE finding, this gave us full Kubernetes API access from a single model upload, including reading secrets across all tenant namespaces.

Shared NFS exports were configured with no_root_squash, allowing any root process on a mounted host — including compromised training containers — to read all tenant data across all NFS paths.

Any authenticated user could call DELETE /api/v1/models/{model_id} with any enterprise tenant's model ID and succeed. We deleted and recreated a production model belonging to a different tenant without any authorization error. Complete sabotage vector for any enterprise client (a sketch of the missing ownership check follows this section).

Scanning the production serving images (tensorflow/serving:2.11.0) revealed CVE-2023-25801 (CVSS 9.8, RCE in REST API handler), CVE-2022-41911 (CVSS 8.8, heap overflow in batch inference), and CVE-2021-41203 (CVSS 9.8, type confusion RCE). Public exploits exist for all three. Images were 18 months out of date despite patches in TF 2.14.0+.

The EC2 instance profile available to training nodes granted s3:*, ec2:*, iam:*, sts:*, sagemaker:* on Resource *. Training jobs run user-controlled code — making EC2 metadata SSRF trivial. An attacker controlling a training job could obtain temporary keys with full AWS account permissions, enabling complete account takeover.

Every finding was mapped to the frameworks that apply to enterprise AI/ML platforms handling proprietary training data for regulated industries. Pre-remediation compliance scores tell the full story.
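Returning to the model-deletion finding above, here is a sketch of the missing object-level authorization check. The framework, route shape, helper function, and tenant-resolution mechanism are assumptions, not NeuralPath's actual code.

```python
# Sketch: object-level authorization before model deletion (names are hypothetical).
from flask import Flask, abort, g

app = Flask(__name__)

def get_model_owner(model_id: str) -> str:
    """Placeholder lookup of the owning tenant in the model metadata store."""
    raise NotImplementedError

@app.route("/api/v1/models/<model_id>", methods=["DELETE"])
def delete_model(model_id: str):
    # g.tenant_id is assumed to be set by the authentication middleware.
    if get_model_owner(model_id) != g.tenant_id:
        abort(403)  # requester does not own this model
    # ... proceed with deletion only after the ownership check passes ...
    return "", 204
```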
Watch us walk through the NeuralPath findings live — demonstrating the S3 data exfiltration, Pickle RCE, and container escape chain from tenant training job to Kubernetes cluster-admin. Coming soon.
Three complete audit walkthroughs across different industries and threat profiles — each covering a distinct compliance framework and attack surface.
If you're hosting ML models for enterprise clients, training data isolation and model IP protection aren't optional. We'll find what's exposed before someone else does.