SafeRoomAI – Real-Time Anomaly Detection Platform
Hybrid computer vision pipeline (YOLO + Autoencoder) with reproducible MLOps (DVC + MLflow) and fault‑tolerant streaming.
Problem
Security operations required *stream-level* anomaly detection with resilience to network instability. Traditional single‑model approaches struggled with novel events and produced brittle alerting when feeds dropped.
Solution Overview
- Hybrid inference: supervised object/event detection (YOLO) + unsupervised reconstruction error (Autoencoder).
- Streaming orchestrator controlling decode → preprocess → dual-path scoring → aggregation.
- Fallback mechanism: switches to a cached local video segment or a circular frame buffer when the RTSP feed is unavailable.
- Reproducible, versioned pipeline (code, data, model artifacts).
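The fallback behavior above can be sketched as a small wrapper around the primary reader. This is a minimal sketch, not the project's actual implementation: `ResilientFrameSource` and its `read_primary` callback are hypothetical names, and the real pipeline would decode video rather than pass raw bytes.

```python
from collections import deque
from typing import Callable, Optional

class ResilientFrameSource:
    """Yields frames from a primary RTSP reader; falls back to a circular
    buffer of recently seen frames when the primary read fails."""

    def __init__(self, read_primary: Callable[[], Optional[bytes]], buffer_size: int = 300):
        self._read_primary = read_primary          # returns None on feed loss
        self._buffer = deque(maxlen=buffer_size)   # circular buffer of last N frames
        self.degraded = False                      # True while serving buffered frames

    def next_frame(self) -> Optional[bytes]:
        frame = self._read_primary()
        if frame is not None:
            self.degraded = False
            self._buffer.append(frame)
            return frame
        # Primary feed dropped: replay the oldest buffered frame, if any.
        self.degraded = True
        return self._buffer.popleft() if self._buffer else None
```

The `degraded` flag lets downstream aggregation suppress or annotate alerts raised while the pipeline is replaying buffered frames instead of live video.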
Architecture Highlights
- Modular Stages: Capture, Frame Queue, Detection, Anomaly Scoring, Aggregation, Output Bus.
- Asynchronous Queues: micro-batch frames to keep the GPU saturated while bounding end-to-end latency.
- Inference Modes: YOLO event detection as the primary signal; anomaly-score gating on the fallback path reduces false positives.
- Packaging: BentoML service exporting REST + JSON schema for detection & anomaly endpoints.
- Interfaces: FastAPI edge wrapper adds routing, auth stub & health probes.
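The micro-batching stage can be illustrated with a short coroutine: collect up to `batch_size` frames, but never wait longer than `max_wait` after the first frame arrives. The function name and parameter values here are illustrative, not taken from the codebase.

```python
import asyncio
from typing import Any, List

async def micro_batch(queue: asyncio.Queue, batch_size: int = 8, max_wait: float = 0.02) -> List[Any]:
    """Drain up to `batch_size` frames from the queue, waiting at most
    `max_wait` seconds after the first frame arrives. This lets the GPU
    score several frames per forward pass while bounding added latency."""
    batch = [await queue.get()]  # block until at least one frame is available
    deadline = asyncio.get_running_loop().time() + max_wait
    while len(batch) < batch_size:
        remaining = deadline - asyncio.get_running_loop().time()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break
    return batch
```

The deadline is anchored to the first frame, so a slow stream pays at most `max_wait` of batching delay while a busy stream fills full batches immediately.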
Reproducibility & Experimentation
- DVC tracks datasets, augmentation configs, and model checkpoints (hash-pinned lineage).
- MLflow logs parameters, metrics, artifacts, env fingerprint; run IDs embedded in service metadata.
- Deterministic builds: lockfiles + Docker layering; build args capture model/runtime versions.
Security & Quality
- Containers follow the principle of least privilege (non-root user, minimal base image).
- Structured logging (JSON) for ingestion anomalies & inference latency.
- Pluggable auth gateway placeholder (JWT / API key) for future RBAC.
- Static analysis / lint gates (Ruff; mypy optional) with CI integration planned.
Performance & Optimization
- Quantization & selective layer fusion to reduce inference time.
- Frame skip heuristics based on motion / event density thresholds.
- Parallel decode (ffmpeg) pipelined with GPU inference to hide I/O latency.
- Candidates: p95 latency target < 70ms (replace with audited number), sustained FPS scaling across N streams.
Deployment & Operations
- BentoML build → Docker multi-stage image → K8s manifest (readiness & liveness).
- Config via env & mounted model store; hot-reload friendly architecture for model swaps.
- Planned lightweight metrics exporter (Prometheus / custom JSON) for live dashboard integration.
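The readiness and liveness semantics above can be expressed as framework-agnostic check functions, easy to wrap in a FastAPI or BentoML route. The function names, queue limit, and response shape are assumptions for illustration, not the project's actual endpoints.

```python
def readiness(model_loaded: bool, queue_depth: int, max_queue: int = 512) -> tuple[int, dict]:
    """Readiness: the pod should only receive traffic when the model is
    loaded and the frame queue is not saturated. Returns (status, body)."""
    ok = model_loaded and queue_depth < max_queue
    return (200 if ok else 503), {"ready": ok, "queue_depth": queue_depth}

def liveness() -> tuple[int, dict]:
    """Liveness is intentionally shallow: the process can still answer HTTP.
    Deep checks belong in readiness, so a busy pod is drained, not restarted."""
    return 200, {"alive": True}
```

Separating the two checks lets Kubernetes stop routing to a saturated pod (readiness 503) without killing it, reserving restarts for genuinely hung processes.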
Timeline (Condensed)
- Problem: Needed real-time anomaly detection for camera streams with graceful fallback when RTSP feeds become unavailable.
- Design: Dual path (YOLO for object/event semantics + Autoencoder reconstruction error for unsupervised anomaly scoring).
- Reproducibility: DVC pins data/model artifacts; MLflow run metadata & metrics ensure repeatable experiments.
- Deployment: BentoML-packaged service containerized; readiness & liveness probes for K8s orchestration.
- Optimization: Quantization, micro-batching & frame-skip heuristics targeting latency reduction (e.g., ~120ms → ~70ms p95 candidate).
Next Focus: finalize empirical latency benchmarks, introduce an auto-scaling policy tuned to queue depth,
integrate an alert fan‑out channel (webhook / email), and add a subscription feed for near real-time anomaly notifications.