Anomaly Detection
This project demonstrates an end-to-end machine learning solution for detecting fraudulent credit card transactions: data preprocessing, model training and optimization, and deployment as a containerized RESTful API. The goal is to flag anomalous transactions that may indicate fraud on a real-world, heavily imbalanced dataset, optimizing the precision/recall trade-off for the fraud class.

Project Duration (Estimate)
Part-time (evenings/weekends): 6–8 weeks for the MVP (data → model → API → Docker), plus 2–4 weeks for monitoring, a retraining workflow, and CI hardening.
Approach
- Planning & Problem Framing
  - Defined the goal: detect fraudulent transactions with high recall while keeping precision practical for review teams
  - Identified constraints: severe class imbalance, limited interpretability, need for real-time inference
- Data Preparation
  - Loaded the public credit-card dataset; separated train/validation/test splits
  - Scaled key features (Amount, Time) and preserved the anonymized V1–V28 components as-is
  - Applied stratified splits to maintain class ratios across sets
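The preparation steps above can be sketched as follows. This is a minimal illustration on synthetic data standing in for the real dataset; the column layout (V1–V28 first, Amount and Time last) is an assumption for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the credit-card dataset:
# 28 anonymized PCA components plus Amount and Time (last two columns here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = (rng.random(1000) < 0.05).astype(int)  # ~5% "fraud" for illustration

# stratify=y preserves the fraud/non-fraud ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale only Amount and Time; V1-V28 are already PCA components
# and are left as-is. Fit the scaler on training data only.
scaler = StandardScaler().fit(X_train[:, -2:])
X_train[:, -2:] = scaler.transform(X_train[:, -2:])
X_test[:, -2:] = scaler.transform(X_test[:, -2:])
```

Fitting the scaler on the training split alone avoids leaking test-set statistics into preprocessing.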
- Modeling
  - Started with a baseline (Logistic Regression) → moved to Random Forest for non-linear decision boundaries
  - Handled imbalance with class_weight and careful cross-validation
  - Tracked metrics beyond accuracy: ROC-AUC, PR-AUC, and Precision/Recall/F1 on the fraud class
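A sketch of this modeling step, using a synthetic imbalanced dataset as a stand-in; the hyperparameters shown are illustrative, not the project's tuned configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic ~5%-positive data standing in for the real dataset.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" up-weights the rare fraud class during training.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
print("ROC-AUC:", roc_auc_score(y_te, scores))
# PR-AUC (average precision) is far more informative than accuracy
# when positives are rare.
print("PR-AUC :", average_precision_score(y_te, scores))
```

Under heavy imbalance a model can score 99%+ accuracy while missing every fraud, which is why the ranking metrics above are tracked instead.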
- Threshold Tuning
  - Optimized the decision threshold for the fraud class (maximize F1 while guarding precision)
  - Validated the chosen threshold on a hold-out set to avoid optimistic bias
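The threshold sweep can be sketched as below. The validation scores are simulated here; in the project they would come from the trained model's predicted probabilities on a validation split.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Simulated validation labels and scores (positives score higher on average).
rng = np.random.default_rng(1)
y_val = (rng.random(500) < 0.05).astype(int)
scores = np.clip(0.6 * y_val + rng.normal(0.2, 0.15, 500), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_val, scores)
# F1 at each candidate threshold (guard against 0/0).
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
# precision_recall_curve returns one more (precision, recall) pair
# than thresholds, so drop the last point before matching them up.
best = np.argmax(f1[:-1])
best_threshold = thresholds[best]
print(f"best threshold={best_threshold:.3f}  F1={f1[best]:.3f}")
```

The chosen threshold should then be re-checked on a separate hold-out set, since the value that maximizes F1 on validation data is itself an optimistic estimate.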
- API & Contracts
  - Exported the trained model + scalers with joblib
  - Designed FastAPI schemas (Pydantic) for single/batch prediction with strict validation
  - Exposed `/predict` and documented with Swagger UI & ReDoc
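A request to `/predict` might look like the following; the exact field names are assumptions based on the dataset's columns (V3 through V28 would continue the same pattern and are omitted here).

```json
{
  "Time": 406.0,
  "Amount": 123.5,
  "V1": -2.31,
  "V2": 1.95
}
```

A plausible response would carry the binary prediction, the fraud probability, and the threshold in effect, so that review teams can see how close a score was to the cutoff.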
- Packaging & Deployment
  - Containerized the service with Docker for reproducible local and cloud runs
  - Environment-driven config for thresholds, model paths, and log levels
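A minimal Dockerfile sketch for this setup; the module path (`app.main:app`), environment variable names, and port are assumptions for illustration, not the project's actual values.

```dockerfile
FROM python:3.10-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Threshold, model path, and log level are read from the environment,
# so they can be overridden at `docker run` time without rebuilding.
ENV FRAUD_THRESHOLD=0.5 MODEL_PATH=/app/model.joblib LOG_LEVEL=info

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Keeping the tuned threshold in an environment variable lets operations retune the precision/recall balance without shipping a new image.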
- Testing & QA
  - Smoke tests for API routes and schema errors (invalid/missing fields)
  - Metric checks to ensure degradation doesn’t slip through (spot-check F1/precision/recall)
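The metric spot-check could look like the sketch below: evaluate the current model on a fixed hold-out set and fail if any fraud-class metric drops under a floor. The labels, predictions, and floor values here are illustrative.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical hold-out labels and predictions; in CI these would come
# from scoring a frozen evaluation set with the current model artifact.
y_true = [0] * 95 + [1] * 5
y_pred = [1] + [0] * 94 + [1, 1, 1, 1, 0]  # 1 false positive, 4/5 frauds caught

# Illustrative floors; real values come from the accepted baseline model.
MIN_F1, MIN_PRECISION, MIN_RECALL = 0.7, 0.7, 0.7

f1 = f1_score(y_true, y_pred)
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
assert f1 >= MIN_F1 and p >= MIN_PRECISION and r >= MIN_RECALL, "metric regression"
print(f"F1={f1:.2f} precision={p:.2f} recall={r:.2f}")
```

Failing the build on a metric floor catches silent degradation that route-level smoke tests cannot see.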
- Monitoring & Next Steps
  - Baseline logging for predictions and errors; plan for drift checks on score distributions
  - Future: model retraining pipeline, alerting on metric drops, feature importance reports
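One common way to implement the planned drift check on score distributions is the population stability index (PSI) — a technique choice not specified in the project, sketched here on simulated score distributions.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a live score sample."""
    # Bin edges from baseline quantiles, widened to cover all values.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    # Avoid log(0) for empty bins.
    e, a = np.maximum(e, 1e-6), np.maximum(a, 1e-6)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.beta(2, 8, 5000)  # training-time score distribution
live = rng.beta(2, 8, 5000)      # same distribution: no drift
shifted = rng.beta(4, 6, 5000)   # scores have drifted upward
print(psi(baseline, live), psi(baseline, shifted))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift worth an alert.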
Features
- Prediction API: FastAPI endpoints for single & batch scoring
- Interactive Docs: Swagger UI (/docs) and ReDoc (/redoc)
- Model Artifacts: joblib-exported model and scalers
- Threshold Tuning: calibrated decision threshold for fraud class
- Validation: Pydantic schemas with robust error responses
- Containerization: Docker image for easy run/deploy
- CI: optional GitHub Actions for lint/build/test
Tools & Technologies
Python 3.10, scikit-learn, pandas, NumPy, joblib, FastAPI, uvicorn, Pydantic, Docker, Git, GitHub, Jupyter Notebook