---
title: Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI
emoji: 💊
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 8000
pinned: true
---

> **Note:** This repo contains only deployment/demo files.
> For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api).

# Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI

This project addresses a real-world computer vision challenge: detecting and localizing defects on medicinal capsules via image classification and segmentation. The aim is to deliver a complete pipeline, from data preprocessing through model training and evaluation to deployment, demonstrating practical ML engineering from scratch to API.

---

## Main Repo

This is a minimal clone with only the necessary files from the main repo.
For full source, notebooks, and complete code, see [Capsule Defect Detection and Segmentation with ConvNeXt+U-Net and FastAPI](https://github.com/Kev-HL/capsule-defect-segmentation-api).

---

## Project Overview

End-to-end defect detection and localization using the **Capsule** class from the **MVTec AD dataset**.
Key steps include:

- Data preprocessing, formatting, and augmentation
- Model design (pre-trained backbone + custom heads)
- Training, evaluation, and hyperparameter tuning
- Dockerized FastAPI deployment for inference

*Portfolio project to showcase ML workflow and engineering.*

---

## Key Results

- Evaluation dataset: MVTec AD 'capsule' class, 70/15/15 train/val/test split
- Quantitative results on the test set:
  - Classification accuracy: **83 %**
  - Classification defect-only accuracy: **75 %**
  - Defect presence accuracy: **91 %**
  - Segmentation quality (mIoU / Dice): **0.79 / 0.73**
  - Segmentation defect-only quality (mIoU / Dice): **0.70 / 0.55**
- Model artifacts:
  - Original model size (.keras / SavedModel): **345 MB**
  - Raw converted TFLite size (.tflite): **119 MB**
  - Optimized converted TFLite size (.tflite): **31 MB** (Dynamic Range Quantization applied)
- Container / runtime:
  - Docker image size: **317 MB**
  - Runtime used: **tflite-runtime + Uvicorn/FastAPI**
  - Avg inference latency (inference only, set tensor + invoke): **239 ms**
  - Avg inference latency (single POST request, measured): **271 ms**
  - Average memory usage during inference: **321 MB**
  - Startup time (local): **72 ms**
- Observations:
  - The app returns the expected visualizations and class labels for the MVTec-style test images.
  - POST inference latency was measured locally; expect higher latency in real use due to network delays.
  - Given the small, highly imbalanced dataset (351 samples: 242 'good' and 109 defective across 5 defect types, ~22 per defect), coupled with the nature of the samples (the only distinctive feature is the defect, which is usually small and varied in shape), performance is not as strong as desired, and the results lack statistical confidence for real-world use. Without more data, a meaningful improvement would be difficult to achieve.
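For reference, the segmentation metrics reported above (IoU and Dice) can be computed for a single binary-mask pair with a short NumPy sketch. This illustrates the standard formulas, not the exact evaluation code from the repo:

```python
import numpy as np

def iou_and_dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Compute IoU and Dice for a pair of binary masks (1 = defect pixel)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = (intersection + eps) / (union + eps)
    dice = (2 * intersection + eps) / (pred.sum() + target.sum() + eps)
    return float(iou), float(dice)

# Toy example: two overlapping 4x4 square "defects" on an 8x8 mask
pred = np.zeros((8, 8), dtype=np.uint8)
target = np.zeros((8, 8), dtype=np.uint8)
pred[2:6, 2:6] = 1    # 16 predicted defect pixels
target[3:7, 3:7] = 1  # 16 ground-truth pixels, overlap = 3x3 = 9
iou, dice = iou_and_dice(pred, target)
# iou = 9/23 ≈ 0.391, dice = 18/32 = 0.5625
```

Averaging these per-image scores over the test set yields the mIoU / mean Dice figures listed in Key Results.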
---

## Dataset

- *Capsule* class from the [MVTec AD dataset](https://www.mvtec.com/company/research/datasets/mvtec-ad)
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
- Dataset folder contains the license file
- Usage is strictly non-commercial/educational

---

## Tech Stack

- Python
- TensorFlow
- Scikit-Learn
- NumPy / Pandas
- OpenCV / Pillow
- Ray Tune (experiment tracking)
- OmegaConf (config management)
- Docker, FastAPI, Uvicorn (deployment)

---

## Folder Structure

```
data/    # Dataset and annotations
app/     # Inference and deployment code and files
models/  # Saved trained models and training logs
```

---

## How to Run

**Build image for deployment:**

- Requirements:
  - `models/final_model/final_model.tflite` (included)
  - `app/` folder and contents (included)
  - `Dockerfile` (included)
  - `.dockerignore` (included)
- From the project root, build and run the Docker image:

```sh
docker build -t cv-app .
docker run -p 8000:8000 cv-app
```

- Open http://localhost:8000 in your browser to access the demo UI

_Note: For the full source code and steps on how to recreate the model, visit the full repo (see "Main Repo" section near the top)_

---

## Citations & References

**Backbone architectures:**

- EfficientNetV2: [EfficientNetV2: Smaller Models and Faster Training](https://arxiv.org/abs/2104.00298) (Mingxing Tan, Quoc V. Le. ICML 2021)
- MobileNetV3: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) (Andrew Howard et al. ICCV 2019)
- ConvNeXt: [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) (Zhuang Liu et al. CVPR 2022)

**Output heads architectures:**

_Not directly implemented, but inspired by:_

- FCN: [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) (Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015)
- U-Net: [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597) (Olaf Ronneberger, Philipp Fischer, Thomas Brox.
MICCAI 2015)

---

## Contact

For questions, reach out via GitHub (Kev-HL).
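As an appendix, the Dynamic Range Quantization step noted in Key Results (which shrank the TFLite artifact from 119 MB to 31 MB) follows the standard TensorFlow Lite converter pattern. Below is a minimal sketch on a tiny stand-in Keras model; the actual project converts the trained ConvNeXt+U-Net model, not this toy network:

```python
import tensorflow as tf

# Tiny stand-in model for illustration only; the real pipeline
# converts the trained ConvNeXt+U-Net model instead.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])

# Dynamic Range Quantization: weights are stored as int8,
# while activations remain float at inference time.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()  # returns the .tflite flatbuffer as bytes

with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting `.tflite` file can then be served with `tflite-runtime`, which is how the Docker image here avoids shipping full TensorFlow.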