
Anomaly Detection

This project demonstrates an end-to-end Machine Learning solution for detecting fraudulent credit card transactions. It encompasses data preprocessing, model training and optimization, and deployment as a containerized RESTful API. The goal is to identify anomalous transactions that might indicate fraud, leveraging a real-world imbalanced dataset.


Project Brief

This project detects fraudulent credit card transactions using an end-to-end ML workflow: preprocessing, model training/optimization, and deployment as a containerized REST API. It targets real-world class imbalance and optimizes the precision/recall trade-off for the fraud class.

Project Duration (Estimate)

Part-time (evenings/weekends): 6–8 weeks for the MVP (data → model → API → Docker), plus 2–4 weeks for monitoring, a retraining workflow, and CI hardening

Repositories

Approach

  • Planning & Problem Framing
    • Defined the goal: detect fraudulent transactions with high recall while keeping precision practical for review teams
    • Identified constraints: severe class imbalance, limited interpretability, need for real-time inference
  • Data Preparation
    • Loaded the public credit-card dataset; separated train/validation/test splits
    • Scaled key features (Amount, Time) and preserved the anonymized V1–V28 components as-is
    • Applied stratified splits to maintain class ratios across sets
  • Modeling
    • Started with baseline (Logistic Regression) → moved to RandomForest for non-linear boundaries
    • Handled imbalance with class_weight and careful cross-validation
    • Tracked metrics beyond accuracy: ROC-AUC, PR-AUC, Precision/Recall/F1 on the fraud class
  • Threshold Tuning
    • Optimized the decision threshold for the fraud class (maximize F1 while guarding precision)
    • Validated the chosen threshold on a hold-out set to avoid optimistic bias
  • API & Contracts
    • Exported the trained model + scalers with joblib
    • Designed FastAPI schemas (Pydantic) for single/batch prediction with strict validation
    • Exposed `/predict` and documented with Swagger UI & ReDoc
  • Packaging & Deployment
    • Containerized the service with Docker for reproducible local and cloud runs
    • Environment-driven config for thresholds, model paths, and log levels
  • Testing & QA
    • Smoke tests for API routes and schema errors (invalid/missing fields)
    • Metric checks so performance degradation doesn’t slip through (spot-checks of F1/precision/recall)
  • Monitoring & Next Steps
    • Baseline logging for predictions and errors; plan for drift checks on score distributions
    • Future: model retraining pipeline, alerting on metric drops, feature importance reports
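The data-preparation step above can be sketched as follows. The frame here is synthetic stand-in data (the real public dataset's columns are `Time`, `Amount`, `V1`–`V28`, and a binary `Class` label), and the split sizes are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit-card dataset (~2% "fraud" for illustration).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame(rng.normal(size=(n, 28)), columns=[f"V{i}" for i in range(1, 29)])
df["Time"] = rng.uniform(0, 172_800, n)
df["Amount"] = rng.exponential(88.0, n)
df["Class"] = (rng.random(n) < 0.02).astype(int)

X, y = df.drop(columns="Class"), df["Class"]

# Stratified splits keep the fraud ratio stable across train/val/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# Scale only Time and Amount (fit on train only); the anonymized
# PCA components V1-V28 are preserved as-is.
scaler = StandardScaler().fit(X_train[["Time", "Amount"]])
for part in (X_train, X_val, X_test):
    part[["Time", "Amount"]] = scaler.transform(part[["Time", "Amount"]])
```

Fitting the scaler on the training split alone avoids leaking validation/test statistics into preprocessing.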
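The modeling step (class-weighted baseline, then a RandomForest, scored with imbalance-aware metrics) can be sketched like this. The data is synthetic and the hyperparameters are illustrative, not the project's tuned values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced problem: ~2% positives, mimicking fraud rates.
X, y = make_classification(n_samples=4000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=200,
                                     class_weight="balanced", random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]
    # Accuracy is misleading at 2% positives; track ROC-AUC and PR-AUC instead.
    print(f"{name}: ROC-AUC={roc_auc_score(y_te, scores):.3f} "
          f"PR-AUC={average_precision_score(y_te, scores):.3f}")
```

`class_weight="balanced"` reweights the loss by inverse class frequency, a common first line of defense before resampling techniques.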
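The threshold-tuning step (maximize F1 while guarding precision) can be sketched as a sweep over the precision-recall curve; the 0.5 precision floor and the toy score distributions are assumptions for illustration:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def tune_threshold(y_true, scores, min_precision=0.5):
    """Pick the threshold maximizing F1 among points meeting a precision floor."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall carry one trailing element with no matching threshold.
    precision, recall = precision[:-1], recall[:-1]
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    ok = precision >= min_precision
    if not ok.any():                     # fall back to the best F1 overall
        ok = np.ones_like(ok, dtype=bool)
    best = np.argmax(np.where(ok, f1, -1.0))
    return float(thresholds[best])

# Toy scores: frauds cluster near 1.0, legit near 0.0, with overlap.
rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(950, int), np.ones(50, int)])
s = np.concatenate([rng.beta(2, 8, 950), rng.beta(8, 2, 50)])
t = tune_threshold(y, s)
```

As noted in the approach, the chosen threshold should then be validated on a hold-out set, since tuning it on the same data inflates the apparent F1.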
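Exporting and reloading the model and scaler with joblib, as in the API & Contracts step, can look like this minimal sketch (toy model; file names are illustrative):

```python
import pathlib
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy artifacts standing in for the trained fraud model and its scaler.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

out = pathlib.Path(tempfile.mkdtemp())
joblib.dump(model, out / "model.joblib")
joblib.dump(scaler, out / "scaler.joblib")

# At API startup the artifacts are loaded once and reused for every request.
model2 = joblib.load(out / "model.joblib")
scaler2 = joblib.load(out / "scaler.joblib")
```

Persisting the scaler alongside the model matters: requests must pass through exactly the preprocessing the model was trained on.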
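The request schemas could resemble the following pydantic sketch; the field names and shapes here are assumptions, not the project's actual contract. FastAPI turns the `ValidationError` raised on a bad payload into a structured 422 response:

```python
from pydantic import BaseModel, ValidationError

class Transaction(BaseModel):
    time: float
    amount: float
    features: list[float]  # the anonymized V1-V28 components

class BatchRequest(BaseModel):
    transactions: list[Transaction]

# Valid payloads parse into typed objects.
tx = Transaction(time=0.0, amount=42.5, features=[0.0] * 28)

# Missing or mistyped fields are rejected before any model code runs.
try:
    Transaction(time=0.0, features=[])  # amount missing
except ValidationError:
    print("rejected invalid payload")
```

Strict validation at the boundary keeps malformed input out of the scoring path, so the model only ever sees well-formed feature vectors.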
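Environment-driven configuration from the packaging step can be a small stdlib helper; the variable names `FRAUD_THRESHOLD`, `MODEL_PATH`, and `LOG_LEVEL` are assumptions for illustration:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    threshold: float
    model_path: str
    log_level: str

def load_settings(env=os.environ) -> Settings:
    """Read service configuration from environment variables with defaults."""
    return Settings(
        threshold=float(env.get("FRAUD_THRESHOLD", "0.5")),
        model_path=env.get("MODEL_PATH", "model.joblib"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )

# In Docker these come from `-e`/`--env-file`; a dict works for tests.
cfg = load_settings({"FRAUD_THRESHOLD": "0.35", "LOG_LEVEL": "debug"})
```

Keeping the threshold in configuration rather than code lets operators retune the precision/recall trade-off without rebuilding the image.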

Features

  • Prediction API: FastAPI endpoints for single & batch scoring
  • Interactive Docs: Swagger UI (/docs) and ReDoc (/redoc)
  • Model Artifacts: joblib-exported model and scalers
  • Threshold Tuning: calibrated decision threshold for fraud class
  • Validation: Pydantic schemas with robust error responses
  • Containerization: Docker image for easy run/deploy
  • CI: optional GitHub Actions for lint/build/test

Tools & Technologies

Python 3.10, scikit-learn, pandas, numpy, joblib, FastAPI, uvicorn, pydantic, Docker, Git, GitHub, Jupyter Notebook