[OD] 객체 탐지(Object Detection) 대표 데이터 포맷 공부

[OD] 객체 탐지(Object Detection) 대표 데이터 포맷 공부 | COCO, Pascal VOC, YOLO

sean11

|2024. 10. 4. 09:00

데이터 포맷

주어진 이미지와 그에 해당하는 클래스가 대응되는 분류 문제와 다르게 객체 탐지 문제는 객체를 찾고(Localization), 이를 분류(Classification)해야 하는 두가지 일이 존재하기 때문에 데이터의 양식이 조금 더 복잡합니다. 특히, 하나의 이미지에서 여러 개의 객체가 존재할 수도 있기 때문에 더욱 어려운 문제가 될 수 있죠.

이러한 문제들을 극복하기 위해 데이터에 레이블을 붙이는 어노테이션(annotation)을 수행하게 되는데, 어노테이션 방식에 따라 데이터를 처리하는 방식이 달라져야 합니다. 가장 대표적인 객체 탐지 데이터 형식은 COCO, Pascal VOC, YOLO 등이 있습니다.

COCO

COCO는 Common Objects in COntext의 약자로 해당 데이터셋은 객체 탐지, 세그멘테이션, 키포인트 추출 등 다양한 작업을 위한 대규모 데이터셋입니다. COCO 데이터셋은 JSON 파일로 객체의 위치, 크기, 분류 정보를 저장하고 있습니다. 객체 탐지를 위한 주요한 정보는 아래와 같이 정리할 수 있습니다.

Image Info : 이미지 id, 이미지 파일명, 크기(width, height), 이미지 출처 및 수집 시기 등 관련 정
Categories : 클래스 정보(id, 명칭)
Annotations : 각 객체에 대한 어노테이션 정보로 bounding box, segmentation 정보, 클래스 id 등 포함
- Bounding box : (x, y, width, height) 형태로 저장되며, 좌측 상단 모서리 좌표(x,y)와 너비와 높이
- Segmentation : 객체를 폴리곤 형태로 나타내는 좌표 목록
- Category ID : 객체 클래스의 ID
- IsCrowd : 객체가 군중인지 여부 (1이면 군중)

특히, Annotations가 중요하므로 예시를 살펴보면 아래와 같이 나타납니다.

좀 더 자세한 내용을 살펴보고 싶으신 경우 아래 링크들을 참고하면 데이터셋 운영주체나 AWS에서 기술하고 있는 설명을 확인할 수 있습니다.

COCO - Common Objects in Context

cocodataset.org

The COCO dataset format - Rekognition

The COCO dataset format A COCO dataset consists of five sections of information that provide information for the entire dataset. The format for a COCO object detection dataset is documented at COCO Data Format. info – general information about the datase

docs.aws.amazon.com

Pascal VOC

Pascal VOC(Visual Object Classes)는 객체 탐지 작업을 위해 XML 형식으로 어노테이션을 저장하는 데이터셋입니다. 이미지와 객체의 bounding box 정보를 기록하는 방식으로 다음과 같은 주요 어노테이션 포맷으로 활용이 됩니다.

Folder :이미지가 저장된 폴더
Filename : 이미지 파일 이름
Size :이미지 너비, 높이, 채널 정보
Object : 이미지에 포함된 객체 정보로 주요 어노테이션 정보 포함
- Name : 객체 클래스 이름
- Bounding box : (xmin, ymin, xmax, ymax)로 객체의 bounding box 좌표
- Difficult : 객체 탐지가 어려운 경우 1 아닌 경우 0

img0001에 대한 정보를 담은 Pascal VOC 포맷 예시입니다. 보면 상단에는 이미지에 대한 정보들이 있고, object 부터 이미지에 존재하는 여러 객체들에 대한 정보(클래스, 바운딩박스 위치 정보)를 담고 있는 것을 확인할 수 있습니다.

YOLO

YOLO(You Only Look Once)는 객체 탐지 모델로 유명하지만, 데이터셋의 어노테이션 양식도 있습니다. YOLO의 포맷은 txt 형태로 저장되며, 각 줄에 한 객체의 어노테이션 정보를 한 칸씩 띄어쓰기한 형태로 포함합니다.
(이러한 형태로 ➡️ 클래스 x y w h)

클래스 인덱스 : 객체가 어떤 클래스에 포함되는지 표현하는 정수
바운딩 박스 좌표 : (x_center, y_center, width, height) 순으로 표현되며, 모두 상대적인 좌표 (0~1 사이 존재)로 주어집니다.

YOLO 데이터셋 포맷의 예시는 아래와 같습니다. 다른 형태에 비해 비교적 간단하게 담아냅니다.

데이터 변환

위에서 살펴보면 데이터셋이 담는 파일 양식(json, xml, txt), 바운딩 박스 표현도 다른 것을 알 수 있습니다. 양식이야 그때 그때 맞춰주면 되겠지만, 바운딩 박스 양식은 크게 영향을 받을 수 있는데 각각의 데이터셋을 변환하는 방법을 정리하려고 합니다. 사실 원리만 알면 2~3개만 정리해도 되지만, 언제든 편리하게 갖다 쓰기 위해 모든 경우의 수를 고려한 6가지 소제목으로 아래와 같이 정리합니다.

COCO ➡️ Pascal VOC

def coco_to_voc(coco_annotations):
    voc_annotations = []
    for ann in coco_annotations:
        x_min, y_min, width, height = ann["bbox"]
        xmax = x_min + width
        ymax = y_min + height
        class_name = coco_category_to_voc_class(ann["category_id"])  # Map category ID to class name
        
        voc_annotations.append({
            "class_name": class_name,
            "xmin": int(x_min),
            "ymin": int(y_min),
            "xmax": int(xmax),
            "ymax": int(ymax)
        })
    return voc_annotations

COCO ➡️ YOLO

def coco_to_yolo(coco_annotations, img_width, img_height):
    yolo_annotations = []
    for ann in coco_annotations:
        x_min, y_min, width, height = ann["bbox"]
        x_center = (x_min + width / 2) / img_width
        y_center = (y_min + height / 2) / img_height
        width = width / img_width
        height = height / img_height
        class_id = ann["category_id"] - 1  # YOLO는 0부터 시작
        yolo_annotations.append([class_id, x_center, y_center, width, height])
    return yolo_annotations

Pascal VOC ➡️ COCO

import xml.etree.ElementTree as ET

def voc_to_coco(voc_annotations, img_id, category_id_map):
    coco_annotations = []
    annotation_id = 1  # Unique ID for each annotation
    
    for voc_annotation in voc_annotations:
        tree = ET.parse(voc_annotation)
        root = tree.getroot()

        img_filename = root.find('filename').text
        img_width = int(root.find('size/width').text)
        img_height = int(root.find('size/height').text)
        
        for obj in root.findall('object'):
            class_name = obj.find('name').text
            if class_name not in category_id_map:
                continue  # Skip unknown classes
            category_id = category_id_map[class_name]

            bndbox = obj.find('bndbox')
            xmin = int(bndbox.find('xmin').text)
            ymin = int(bndbox.find('ymin').text)
            xmax = int(bndbox.find('xmax').text)
            ymax = int(bndbox.find('ymax').text)

            x_min = xmin
            y_min = ymin
            width = xmax - xmin
            height = ymax - ymin

            # Create COCO annotation
            annotation = {
                "id": annotation_id,
                "image_id": img_id,
                "category_id": category_id,
                "bbox": [x_min, y_min, width, height],
                "area": width * height,
                "iscrowd": 0
            }
            coco_annotations.append(annotation)
            annotation_id += 1
        
    return coco_annotations

Pascal VOC ➡️ YOLO

import xml.etree.ElementTree as ET

def voc_to_yolo(voc_annotation, img_width, img_height):
    yolo_annotations = []
    tree = ET.parse(voc_annotation)
    root = tree.getroot()
    
    for obj in root.findall('object'):
        class_name = obj.find('name').text
        bndbox = obj.find('bndbox')
        xmin = int(bndbox.find('xmin').text)
        ymin = int(bndbox.find('ymin').text)
        xmax = int(bndbox.find('xmax').text)
        ymax = int(bndbox.find('ymax').text)

        x_center = (xmin + xmax) / 2 / img_width
        y_center = (ymin + ymax) / 2 / img_height
        width = (xmax - xmin) / img_width
        height = (ymax - ymin) / img_height

        # Assume a mapping from class names to YOLO class indices
        class_id = class_name_to_id(class_name)  # e.g., {'person': 0}
        yolo_annotations.append([class_id, x_center, y_center, width, height])
    return yolo_annotations

YOLO ➡️ COCO

def yolo_to_coco(x_center, y_center, width, height, img_width, img_height):
    # YOLO에서는 w, h가 비율이므로 원래 이미지의 너비와 높이를 따로 받아야 함
    x_min = (x_center - width / 2) * img_width
    y_min = (y_center - height / 2) * img_height
    width = width * img_width
    height = height * img_height
    return [x_min, y_min, width, height]
    
def convert_yolo_to_coco(yolo_annotations, img_width, img_height):
    coco_annotations = []
    for ann in yolo_annotations:
        class_id, x_center, y_center, width, height = ann
        bbox = yolo_to_coco(x_center, y_center, width, height, img_width, img_height)
        coco_annotations.append({
            "category_id": int(class_id),
            "bbox": bbox,
            "area": bbox[2] * bbox[3],
            "iscrowd": 0
        })
    return coco_annotations

YOLO ➡️ Pascal VOC

import xml.etree.ElementTree as ET

def yolo_to_voc(x_center, y_center, width, height, img_width, img_height):
    xmin = (x_center - width / 2) * img_width
    ymin = (y_center - height / 2) * img_height
    xmax = (x_center + width / 2) * img_width
    ymax = (y_center + height / 2) * img_height
    return int(xmin), int(ymin), int(xmax), int(ymax)

def create_voc_annotation(filename, width, height, bbox, class_name):
    annotation = ET.Element("annotation")
    ET.SubElement(annotation, "filename").text = filename

    size = ET.SubElement(annotation, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)

    obj = ET.SubElement(annotation, "object")
    ET.SubElement(obj, "name").text = class_name

    bndbox = ET.SubElement(obj, "bndbox")
    ET.SubElement(bndbox, "xmin").text = str(bbox[0])
    ET.SubElement(bndbox, "ymin").text = str(bbox[1])
    ET.SubElement(bndbox, "xmax").text = str(bbox[2])
    ET.SubElement(bndbox, "ymax").text = str(bbox[3])

    return ET.tostring(annotation)

참고자료

[1] https://cocodataset.org/#format-data
[2] https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/md-coco-overview.html
[3] https://roboflow.com/formats/pascal-voc-xml
[4] https://docs.ultralytics.com/ko/datasets/detect/#ultralytics-yolo-format

'Python > Data Prep' 카테고리의 다른 글

파이썬 이미지 처리 라이브러리 비교 \| PIL, OpenCV, Numpy, PyTorch, Albumentations (0)	2024.11.18
DeepL을 활용해서 외국어 OCR 영수증 데이터셋 합성하기 (2)	2024.11.09
DataLoader에서 오류가 난다면 누락 데이터가 있는지 확인 필요 \| DataLoader는 이터레이터 (2)	2024.09.26
PyTorch에서 Dataset과 DataLoader 클래스를 활용해 데이터 파이프라인 구축하기 (1)	2024.09.09
Colab에서 Kaggle 데이터셋 가져오기 \| Kaggle, API, Colab (0)	2024.08.10