YOLOv12 Steel Surface Defect Detection

A mill inspector staring at moving steel coils for scratches, inclusions, and pits gets tired and misses things. This project automates that with Ultralytics YOLOv12, training an object detector on the public NEU-DET dataset: a steel-surface image goes in, the model draws boxes over defects with a class label and confidence.

The task

NEU-DET (Northeastern University surface-defect database for hot-rolled steel strip) is a classic industrial-vision benchmark with 6 defect classes:

Class	Meaning
`crazing`	fine crack network — the hardest class to detect
`inclusion`	foreign-material inclusions
`pitted_surface`	pitting / point corrosion
`scratches`	scratches
`patches`	surface patches
`rolled-in_scale`	rolled-in oxide scale

About 5000 grayscale steel-surface images, each with YOLO-format labels (class index plus normalized center x, center y, width, height). This is object detection, not just "defect / no defect" classification: the model has to locate each defect and decide which class it belongs to.
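To make the label format concrete, here is a minimal sketch that converts one YOLO label line into pixel corner coordinates. The helper name `yolo_to_pixel_box`, the sample label line, and the 200x200 image size are assumptions chosen for illustration, not values taken from the project files.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels.

    A YOLO label line is: class_id x_center y_center width height,
    with the four box values normalized to the range 0..1.
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # Scale the normalized center/size to pixels and move to corner form.
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return class_id, x1, y1, x2, y2


# Hypothetical label line: class 4, centered box, a quarter of the width, half the height.
print(yolo_to_pixel_box("4 0.5 0.5 0.25 0.5", img_w=200, img_h=200))
# -> (4, 75.0, 50.0, 125.0, 150.0)
```

Because the coordinates are normalized, the same label file stays valid when the image is resized for training.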

Data config (dataset.yaml)

Ultralytics describes a dataset with a single dataset.yaml declaring the train/val paths and the class-name order:

path: ./NEU-DET
train: images/train
val: images/val

names:
  0: crazing
  1: inclusion
  2: pitted_surface
  3: patches
  4: scratches
  5: rolled-in_scale

The order in names is the class index used in the label files — get it wrong and the whole run is silently mislabeled, so this is the first thing to verify.
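A quick sanity check along these lines can be sketched as follows. It counts label instances per class name and flags class indices that the `names` mapping does not cover. The function name `audit_labels` and the sample label lines are hypothetical; in practice the lines would be read from the dataset's label `.txt` files.

```python
from collections import Counter

# Class-index mapping copied from dataset.yaml.
NAMES = {
    0: "crazing",
    1: "inclusion",
    2: "pitted_surface",
    3: "patches",
    4: "scratches",
    5: "rolled-in_scale",
}


def audit_labels(label_lines, names):
    """Count instances per class name and collect indices missing from names."""
    counts = Counter()
    unknown = set()
    for line in label_lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        class_id = int(parts[0])
        if class_id in names:
            counts[names[class_id]] += 1
        else:
            unknown.add(class_id)
    return counts, unknown


# Made-up label lines, including one index (7) that dataset.yaml does not define.
sample = [
    "0 0.5 0.5 0.2 0.2",
    "4 0.3 0.3 0.1 0.4",
    "4 0.7 0.6 0.1 0.3",
    "",
    "7 0.5 0.5 0.1 0.1",
]
counts, unknown = audit_labels(sample, NAMES)
print(dict(counts), unknown)
```

A check like this only catches out-of-range indices and shows the per-class instance balance. Two swapped names (for example `patches` and `scratches`) still pass it, so drawing a few labeled boxes over their images remains the reliable way to confirm the order.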

Training recipe (3-yolo-steel.py)

The pipeline is three stages — train → val → predict — all on Ultralytics' high-level API:

from ultralytics import YOLO

# 1. Load YOLOv12 pretrained weights for transfer learning
#    (official Ultralytics releases name this file "yolo12n.pt"; adjust if the download fails)
model = YOLO("yolov12n.pt")

# 2. Train on NEU-DET
model.train(
    data="dataset.yaml",
    epochs=100,
    imgsz=640,
    mosaic=1.0,        # stitch 4 images — boosts small-object + context variety
    mixup=0.1,         # linear image blending
    copy_paste=0.1,    # cut instances out and paste them onto other images (needs segmentation masks; likely a no-op with box-only labels)
)

# 3. Evaluate on the val split → mAP / per-class AP
metrics = model.val()

# 4. Inference on held-out steel images
results = model.predict("samples/steel_held_out.jpg", conf=0.25)
results[0].show()      # overlay boxes

Key settings:

imgsz=640: larger inputs make small defects easier to resolve; 640 is the usual speed/accuracy balance
mosaic + mixup + copy-paste: defect classes can be imbalanced (for example many scratches, few crazing); these augmentations show each instance in more contexts, which improves generalization, though they do not rebalance class counts on their own
epochs≈100: starting from pretrained weights, a single domain like steel converges fast
conf=0.25: inference threshold; raise it and you miss defects (recall ↓), lower it and you over-flag (precision ↓). QC scenarios usually favor recall, so the threshold is kept low
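The threshold trade-off can be illustrated with a small sketch. The detections below are made up and assumed to be already matched against ground truth. The function name `precision_recall` and the ground-truth count are hypothetical, and this is not how Ultralytics computes its validation metrics internally.

```python
def precision_recall(detections, num_ground_truth, conf_threshold):
    """Precision and recall after dropping detections below conf_threshold.

    detections: list of (confidence, is_true_positive) pairs.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= conf_threshold]
    tp = sum(kept)
    # Convention chosen here: with nothing kept, precision is reported as 1.0.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Made-up detections: (confidence, matched a real defect?)
detections = [
    (0.92, True),
    (0.81, True),
    (0.60, True),
    (0.45, False),
    (0.30, True),
    (0.20, False),
]
NUM_GT = 5  # hypothetical number of real defects in these images

for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall(detections, NUM_GT, threshold)
    print(f"conf>={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lifts precision from 0.8 to 1.0 while recall drops from 0.8 to 0.4, which is why a recall-oriented QC setup tends to keep the threshold low.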

Training ran on an AutoDL GPU, producing best.pt (the highest-val-mAP checkpoint), which inference then loads.
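Loading the saved checkpoint for inference might look like the sketch below. The helper names `summarize_detections` and `run_inference` are hypothetical, and the weights path in the commented example assumes Ultralytics' default output layout, which may differ from the actual run directory.

```python
def summarize_detections(result):
    """Turn one Ultralytics Results object into a list of plain dicts."""
    boxes = result.boxes
    rows = []
    # boxes.xyxy, boxes.conf and boxes.cls are parallel arrays, one entry per detection.
    for xyxy, conf, cls in zip(
        boxes.xyxy.tolist(), boxes.conf.tolist(), boxes.cls.tolist()
    ):
        rows.append(
            {
                "label": result.names[int(cls)],  # class index -> class name
                "confidence": round(conf, 3),
                "xyxy": [round(v, 1) for v in xyxy],  # pixel corners x1, y1, x2, y2
            }
        )
    return rows


def run_inference(weights_path, image_path, conf=0.25):
    """Load a trained checkpoint and return summarized detections for one image."""
    # Imported lazily so summarize_detections can be used without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    results = model.predict(image_path, conf=conf)
    return summarize_detections(results[0])


# Example call (the weights path is an assumption about where training saved best.pt):
# rows = run_inference("runs/detect/train/weights/best.pt", "samples/steel_held_out.jpg")
```

Returning plain dicts rather than the raw result object keeps downstream code, such as a QC log or a dashboard, independent of the detection library.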

Why YOLO instead of a two-stage detector

Dimension	Why YOLO
Speed	single-stage, end-to-end — coils move fast on the line, so detection must be real-time
Accuracy	for regular surface defects like those in NEU-DET, YOLO is accurate enough; the overhead of a two-stage detector such as Faster R-CNN is not needed
Ecosystem	Ultralytics wraps train/val/predict/export, so it's one line from training to ONNX/TensorRT deployment
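As a rough illustration of the export step in the Ecosystem row, the sketch below wraps the Ultralytics `export` call. The wrapper name `export_for_deployment` and the example weights path are assumptions. The export itself needs the `ultralytics` package plus the target format's runtime dependencies, and a TensorRT export additionally needs a compatible NVIDIA GPU environment.

```python
def export_for_deployment(weights_path, fmt="onnx"):
    """Export a trained checkpoint to a deployment format.

    fmt="onnx" writes an ONNX file; fmt="engine" builds a TensorRT engine.
    Returns whatever path the Ultralytics export call reports.
    """
    # Imported lazily so this module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    return model.export(format=fmt)


# Example call (hypothetical path to the trained weights):
# onnx_path = export_for_deployment("runs/detect/train/weights/best.pt", fmt="onnx")
```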

What this signals

You can run end-to-end industrial CV: data config → training → evaluation → inference, not just calling a classification API
It's a real deployment scenario — automated steel-surface QC with 6 named defect classes and a clear production motivation
It complements the VLM / RAG projects in the portfolio: this is the first object-detection / industrial-CV project, filling in the classical-CV lane

Demo strategy

What the demo maps to

The training data (NEU-DET, ~5000 images) and the training code (3-yolo-steel.py / dataset.yaml) are real and come from the course. The trained best.pt weights are NOT shipped with the site, and no model runs in the browser. The project is therefore framed as a reproducible training recipe plus an illustrative inference demo: the defect boxes, confidence scores, and per-class AP in the interactive demo are representative of a typical NEU-DET YOLO run (labeled illustrative), not a measured benchmark from shipped weights.

A public preview can be enabled later without redesigning the case-study layout.