---
title: Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI
emoji: 💊
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 8000
pinned: true
---

> **Note:** This repo contains only deployment/demo files.
> For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api).

# Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI

This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation. The aim is to deliver a complete pipeline, from data preprocessing through model training and evaluation to deployment, demonstrating practical ML engineering from scratch to API.

---

## Main Repo

This is a minimal clone with only the necessary files from the main repo.
For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api).

---

## Project Overview

End-to-end defect detection and localization using the **Capsule** class from the **MVTec AD dataset**.
Key steps include:

- Data preprocessing, formatting, and augmentation
- Model design (pre-trained backbone + custom heads)
- Training, evaluation, and hyperparameter tuning
- Dockerized FastAPI deployment for inference

*Portfolio project to showcase ML workflow and engineering.*

---

## Key Results

- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split
- Quantitative results on the test set:
  - Classification accuracy: **83 %**
  - Classification defect-only accuracy: **75 %**
  - Defect presence accuracy: **91 %**
  - Segmentation quality (mIoU / Dice): **0.79 / 0.73**
  - Segmentation defect-only quality (mIoU / Dice): **0.70 / 0.55**
- Model artifacts:
  - Original model size (.keras / SavedModel): **345 MB**
  - Raw converted TFLite size (.tflite): **119 MB**
  - Optimized converted TFLite size (.tflite): **31 MB** (Dynamic Range Quantization applied)
- Container / runtime:
  - Docker image size: **317 MB**
  - Runtime used: **tflite-runtime + Uvicorn/FastAPI**
  - Avg inference latency (inference only, set tensor + invoke): **239 ms**
  - Avg inference latency (single POST request, measured): **271 ms**
  - Average memory usage during inference: **321 MB**
  - Startup time (local): **72 ms**
- Observations:
  - The app returns the expected visualizations and class labels for the MVTec-style test images.
  - POST inference latency was measured locally; expect higher latency in real use due to network delays.
  - Given the small, highly imbalanced dataset (351 samples: 242 'good' and 109 defective across 5 defect types, ~22 per defect), coupled with the nature of the samples (the only distinctive feature is the defect, which is usually small and varied in shape), performance is not as strong as desired, and the results lack statistical confidence for real-world use. Without more data, a meaningful improvement would be difficult to achieve.
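For reference, the segmentation metrics reported above (IoU and Dice) can be computed for a single binary-mask pair with a short NumPy sketch. This illustrates the standard formulas, not the exact evaluation code from the repo:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute IoU and Dice for a pair of binary masks (1 = defect pixel)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

# Toy example: two overlapping 4x4 square "defects" on an 8x8 mask
pred = np.zeros((8, 8), dtype=np.uint8)
target = np.zeros((8, 8), dtype=np.uint8)
pred[2:6, 2:6] = 1    # 16 predicted defect pixels
target[3:7, 3:7] = 1  # 16 ground-truth pixels, overlap = 3x3 = 9
iou, dice = iou_and_dice(pred, target)
# iou = 9/23 ≈ 0.391, dice = 18/32 = 0.5625
```

Averaging these per-image scores over the test set yields the mIoU / mean Dice figures listed in Key Results.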
---

## Dataset

- *Capsule* class from the [MVTec AD dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Dataset folder contains the license file
- Usage is strictly non-commercial/educational

---

## Tech Stack

- Python
- TensorFlow
- Scikit-Learn
- NumPy / Pandas
- OpenCV / Pillow
- Ray Tune (experiment tracking)
- OmegaConf (config management)
- Docker, FastAPI, Uvicorn (deployment)

---

## Folder Structure

```
data/    # Dataset and annotations
app/     # Inference and deployment code and files
models/  # Saved trained models and training logs
```

---

## How to Run

**Build image for deployment:**

- Requirements:
  - `models/final_model/final_model.tflite` (included)
  - `app/` folder and contents (included)
  - `Dockerfile` (included)
  - `.dockerignore` (included)
- From the project root, build and run the Docker image:

```sh
docker build -t cv-app .
docker run -p 8000:8000 cv-app
```

- Open http://localhost:8000 in your browser to access the demo UI

_Note: For the full source code and steps on how to recreate the model, visit the full repo (see "Main Repo" section near the top)_

---

## Citations & References

**Backbone architectures:**

- EfficientNetV2: [EfficientNetV2: Smaller Models and Faster Training](https://arxiv.org/abs/2104.00298) (Mingxing Tan, Quoc V. Le. ICML 2021)
- MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) (Andrew Howard et al. ICCV 2019)
- ConvNeXt: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (Zhuang Liu et al. CVPR 2022)

**Output heads architectures:**

_Not directly implemented, but inspired by:_

- FCN: [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) (Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015)
- U-Net: [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597) (Olaf Ronneberger, Philipp Fischer, Thomas Brox.
MICCAI 2015)

---

## Contact

For questions, reach out via GitHub (Kev-HL).
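As an appendix, the Dynamic Range Quantization step noted in Key Results (which shrank the TFLite artifact from 119 MB to 31 MB) follows the standard TensorFlow Lite converter pattern. Below is a minimal sketch on a tiny stand-in Keras model; the actual project converts the trained ConvNeXt+U-Net model, not this toy network:

```python
import tensorflow as tf

# Tiny stand-in model for illustration only; the real pipeline
# converts the trained ConvNeXt+U-Net model instead.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])

# Dynamic Range Quantization: weights are stored as int8,
# while activations remain float at inference time.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()  # returns the .tflite flatbuffer as bytes

with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting `.tflite` file can then be served with `tflite-runtime`, which is how the Docker image here avoids shipping full TensorFlow.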