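The COCO dataset is one of the most popular datasets in computer vision. It is used to benchmark a range of vision tasks, such as object detection, segmentation, and keypoint detection.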

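The 2017 release of the dataset has 118K images for training and 5K images for validation. After downloading the dataset, the directory contains the following: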

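COCO Annotations: the basics of the COCO annotation format. In most cases, the COCO API can be used to access the data and labels in the complex JSON annotation files with little effort.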

The data structure of instances_train2017.json is as follows:

    {
        "info": {"description": "COCO 2017 Dataset", "url": "http://cocodataset.org", "version": "1.0", ...},
        "licenses": [
            {"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/", "id": 1, "name": "Attribution-NonCommercial-ShareAlike License"},
            ...
        ],
        "images": [
            {"license": 4, "file_name": "000000397133.jpg", "coco_url": "http://images.cocodataset.org/val2017/000000397133.jpg", "height": 427, "width": 640, "date_captured": "2013-11-14 17:02:52", "flickr_url": "http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg", "id": 397133},
            ...
        ],
        "annotations": [
            {"segmentation": RLE or [polygon], "area": float, "iscrowd": 0 or 1, "image_id": int, "bbox": [x, y, width, height], "category_id": int, "id": int},
            ...
        ],
        "categories": [
            {"supercategory": str, "id": int, "name": str},
            ...
        ]
    }

images

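The images field holds the information for each image in the training set, such as file_name, width, height, and id. The id is unique for every image and is used to index the images in the dataset.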
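Because the id is unique, a dictionary keyed by id gives direct access to any image record. The following is a minimal sketch on a hand-written miniature of the images list (the two records reuse values that appear in this article; the variable names are illustrative and not part of any COCO tooling):

```python
# Miniature "images" list; only the fields discussed above are kept.
images = [
    {"id": 397133, "file_name": "000000397133.jpg", "height": 427, "width": 640},
    {"id": 37777, "file_name": "000000037777.jpg", "height": 230, "width": 352},
]

# Index the records by their unique id for constant-time lookup.
images_by_id = {img["id"]: img for img in images}

record = images_by_id[37777]
print(record["file_name"], record["width"], record["height"])
```

With the full annotation file, the same dictionary comprehension can be applied to the loaded "images" list.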

categories

The categories field holds the class/label names as strings. Each category is assigned a unique category id for easy lookup.

annotations

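The annotations field contains all object instances; each instance carries a set of annotations.

Note: there are usually more object instances than images, because a single image typically contains several objects.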
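Since several annotations can point at the same image, a common first step is to group them by image_id. A minimal sketch, assuming a list of annotation dicts trimmed down to a few fields (the helper name group_by_image is hypothetical):

```python
from collections import defaultdict

# Trimmed annotation records; ids are taken from examples in this article.
annotations = [
    {"id": 538087, "image_id": 552272, "category_id": 1},
    {"id": 547153, "image_id": 552272, "category_id": 1},
    {"id": 82445, "image_id": 397133, "category_id": 44},
]

def group_by_image(anns):
    """Return a dict mapping image_id to the list of its annotations."""
    grouped = defaultdict(list)
    for ann in anns:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)

anns_by_image = group_by_image(annotations)
for image_id, anns in anns_by_image.items():
    print(image_id, [ann["id"] for ann in anns])
```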

Each annotation has the following fields:

id

  • int, the instance id. Every annotation has a unique id.

image_id

  • int, identifies the image that the object belongs to.

category_id

  • int, identifies the category of the object.

bbox

  • [x, y, width, height], the bounding box coordinates.

    The format is [box top-left corner x, box top-left corner y, box width, box height]. Note that the coordinate [0,0] is the top-left corner of the image.
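Many libraries expect corner coordinates instead of a width and height, so a small conversion is often needed. A minimal sketch (the function names are illustrative; the sample box is the one from the annotation example in this article):

```python
def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def xyxy_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] back to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

corners = xywh_to_xyxy([217.62, 240.54, 38.99, 57.75])
print(corners)
```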

iscrowd

  • 0 or 1. iscrowd=1 marks a large group of objects, for example a crowd of people.

segmentation

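  • RLE or [polygon]. If iscrowd=0, the segmentation is a [polygon].

    [polygon] is the set of points outlining the mask of a single object. Each polygon is a flat list in the format [x0, y0, x1, y1, x2, y2, ...].

    RLE (Run Length Encoding) is used for a group of objects. The RLE format is:

  • segmentation: {"counts": [179, 27, 392, ...], "size": [426, 640]}

    RLE is an encoding that records whether each pixel belongs to the foreground or the background. size stores the height and width of the image. counts stores the lengths of the consecutive runs of background and foreground pixels, in alternation.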

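    For example, consider the following image and mask:

RLE encoding counts the background pixels, starting from the top-left corner, until it reaches a foreground pixel, and stores that number in counts. It then counts the foreground pixels and stores that number in counts as well, and so on in alternation. Note that COCO scans the mask column by column (column-major order), not row by row.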
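The counting scheme can be sketched in a few lines of plain Python. This is an illustration of uncompressed RLE only, not the pycocotools implementation; it assumes the mask is a list of rows of 0/1 values and flattens it in column-major order, the convention pycocotools documents for its masks. The function names are hypothetical.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows of 0/1) as uncompressed RLE."""
    height, width = len(mask), len(mask[0])
    # Flatten column by column (column-major order).
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts = []
    current, run = 0, 0  # the first run always counts background pixels
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"counts": counts, "size": [height, width]}

def rle_decode(rle):
    """Rebuild the list-of-rows mask from an uncompressed RLE dict."""
    height, width = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between background and foreground
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]

mask = [[0, 1],
        [0, 1],
        [1, 1]]
rle = rle_encode(mask)
print(rle)  # {'counts': [2, 4], 'size': [3, 2]}
```

A mask whose first pixel is foreground starts with a zero-length background run, so the counts at even positions (0, 2, ...) always refer to background.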


The JSON file mainly contains the following fields:

    {
        "info": info,                  # dict
        "licenses": [license],         # list of dicts
        "images": [image],             # list of dicts
        "annotations": [annotation],   # list of dicts
        "categories": [category]       # list of dicts
    }

To read the JSON file:

    >>> import json
    >>> with open('instances_val2017.json', 'r') as f:
    ...     val = json.load(f)
    ...
    >>> val.keys()
    dict_keys(['info', 'licenses', 'images', 'annotations', 'categories'])

The first two keys are not used here; they only describe the dataset and its licensing:

    >>> val['info']
    {'description': 'COCO 2017 Dataset', 'url': 'http://cocodataset.org', 'version': '1.0', 'year': 2017, 'contributor': 'COCO Consortium', 'date_created': '2017/09/01'}
    >>> val['licenses']
    [{'url': 'http://creativecommons.org/licenses/by-nc-sa/2.0/', 'id': 1, 'name': 'Attribution-NonCommercial-ShareAlike License'},
     {'url': 'http://creativecommons.org/licenses/by-nc/2.0/', 'id': 2, 'name': 'Attribution-NonCommercial License'},
     {'url': 'http://creativecommons.org/licenses/by-nc-nd/2.0/', 'id': 3, 'name': 'Attribution-NonCommercial-NoDerivs License'},
     {'url': 'http://creativecommons.org/licenses/by/2.0/', 'id': 4, 'name': 'Attribution License'},
     {'url': 'http://creativecommons.org/licenses/by-sa/2.0/', 'id': 5, 'name': 'Attribution-ShareAlike License'},
     {'url': 'http://creativecommons.org/licenses/by-nd/2.0/', 'id': 6, 'name': 'Attribution-NoDerivs License'},
     {'url': 'http://flickr.com/commons/usage/', 'id': 7, 'name': 'No known copyright restrictions'},
     {'url': 'http://www.usa.gov/copyright.shtml', 'id': 8, 'name': 'United States Government Work'}]

Next, the categories key:

    >>> len(val['categories'])
    80
    >>> val['categories']
    [{'supercategory': 'person', 'id': 1, 'name': 'person'},
     {'supercategory': 'vehicle', 'id': 2, 'name': 'bicycle'},
     {'supercategory': 'vehicle', 'id': 3, 'name': 'car'},
     {'supercategory': 'vehicle', 'id': 4, 'name': 'motorcycle'},
     {'supercategory': 'vehicle', 'id': 5, 'name': 'airplane'},
     {'supercategory': 'vehicle', 'id': 6, 'name': 'bus'},
     {'supercategory': 'vehicle', 'id': 7, 'name': 'train'},
     ...

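The value of this key is an array of length 80. Only the first few entries are shown here; every entry has the same structure.

'supercategory' is the broader class that the category belongs to. For example, the bicycle category belongs to the vehicle supercategory.

'id' is the number of the category. There are 80 categories in total, but the ids are not contiguous: they run from 1 to 90 with gaps. The id 0 does not appear in the file; many training pipelines reserve it for the background.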
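Training code usually wants class indices 0..N-1, so a common step is to build lookup tables from the categories list. The sketch below works whether or not the ids are contiguous. The first three entries are copied from the output above; the entry with id 44 is an assumption added to illustrate a non-consecutive id (it matches the category_id used in the annotation example in this article).

```python
categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "vehicle", "id": 3, "name": "car"},
    # Assumed entry, included only to show a gap in the id sequence.
    {"supercategory": "kitchen", "id": 44, "name": "bottle"},
]

# Category id -> human-readable name.
id_to_name = {cat["id"]: cat["name"] for cat in categories}

# Category id -> contiguous index starting at 0, in ascending id order.
id_to_index = {cat_id: idx for idx, cat_id in enumerate(sorted(id_to_name))}

print(id_to_name[44], id_to_index[44])  # bottle 3
```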

Next, the images key:

    >>> len(val['images'])
    5000
    >>> val['images'][:2]
    [{'license': 4, 'file_name': '000000397133.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg', 'height': 427, 'width': 640, 'date_captured': '2013-11-14 17:02:52', 'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg', 'id': 397133},
     {'license': 1, 'file_name': '000000037777.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000037777.jpg', 'height': 230, 'width': 352, 'date_captured': '2013-11-14 20:55:31', 'flickr_url': 'http://farm9.staticflickr.com/8429/7839199426_f6d48aa585_z.jpg', 'id': 37777}]
    >>> val['images'][0].keys()
    dict_keys(['license', 'file_name', 'coco_url', 'height', 'width', 'date_captured', 'flickr_url', 'id'])

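The images key has 5000 entries, one for each of the 5000 images.

The most important fields are 'file_name', 'height', 'width', and 'id'. 'height' and 'width' give the height and width of the image in pixels.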
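One typical use of 'height' and 'width' is to turn pixel boxes into coordinates relative to the image size. A minimal sketch, assuming an [x, y, width, height] pixel box and an image record with 'width' and 'height' keys (the function name is hypothetical; the sample values come from image 397133 and an annotation that belongs to it):

```python
def normalize_bbox(bbox, image):
    """Scale an [x, y, w, h] pixel box to the range [0, 1] using the image size."""
    x, y, w, h = bbox
    return [
        x / image["width"],
        y / image["height"],
        w / image["width"],
        h / image["height"],
    ]

image = {"id": 397133, "file_name": "000000397133.jpg", "height": 427, "width": 640}
relative = normalize_bbox([217.62, 240.54, 38.99, 57.75], image)
print([round(v, 4) for v in relative])
```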

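Finally, the most important key, annotations:

    "annotations": [
        {
            "segmentation": [                # boundary points of the object (bounding polygon)
                [224.24, 297.18,             # x, y of the first point
                 228.29, 297.18,             # x, y of the second point
                 234.91, 298.29,
                 ...
                 225.34, 297.55]
            ],
            "area": 1481.3806499999994,      # area of the region
            "iscrowd": 0,                    # 0: a single object
            "image_id": 397133,              # id of the image (matches an id in images)
            "bbox": [217.62, 240.54, 38.99, 57.75],   # bounding box [x, y, w, h]
            "category_id": 44,               # category id (matches an id in categories)
            "id": 82445                      # object id; an image can hold several objects, so each object gets its own unique id
        },
        ...
    ]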

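Note that a single object (iscrowd=0) may need several polygons, for example when the object is partly occluded in the image. When iscrowd=1 (the annotation covers a group of objects, such as a crowd of people), the segmentation uses the RLE format.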
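To work with the polygon form, the flat coordinate list is usually split into (x, y) points. The sketch below also sums the area over all polygons of one object using the shoelace formula. It assumes simple, non-overlapping polygons, and the result only approximates the stored "area" field, which is derived from the mask. The two polygons are made-up shapes chosen so the arithmetic is easy to check.

```python
def to_points(flat):
    """Turn [x0, y0, x1, y1, ...] into [(x0, y0), (x1, y1), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

def polygon_area(flat):
    """Area of one simple polygon via the shoelace formula."""
    pts = to_points(flat)
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical object made of two parts: a 4x3 rectangle and a right triangle.
segmentation = [
    [0, 0, 4, 0, 4, 3, 0, 3],
    [10, 10, 14, 10, 10, 13],
]
total_area = sum(polygon_area(poly) for poly in segmentation)
print(total_area)  # 18.0
```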

The keypoint part of the COCO dataset

Using the COCO dataset:

    from pycocotools.coco import COCO
    import matplotlib.pyplot as plt
    import cv2
    import os

    annot_path_train = '../datasets/coco_2017_dataset/annotations/person_keypoints_train2017.json'
    imgRoot = "D:/PycharmProjects/datasets/coco_2017_dataset"
    dataType = "train2017"

    coco = COCO(annot_path_train)

    # Look up the annotation ids of image 552272 and load the annotations
    annIds = coco.getAnnIds(imgIds=552272)
    anns = coco.loadAnns(annIds)
    print(f'Annotations {annIds}:\n{anns}')

    # Load the record of the image itself
    imgId = coco.getImgIds(imgIds=552272)
    imgInfo = coco.loadImgs(imgId)[0]
    print(f'Image {imgId}:\n{imgInfo}')

    # Read the image; OpenCV loads BGR, while matplotlib expects RGB
    imPath = os.path.join(imgRoot, dataType, imgInfo['file_name'])
    im = cv2.imread(imPath)
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB)); plt.axis('off')

    # Draw the annotations of this image on top of it
    # print(f'Image {imgInfo["id"]} contains {len(anns)} ann objects:\n{annIds}')
    coco.showAnns(anns)

    # Convert the third annotation to a binary mask and show it in a new figure
    print(f'Mask for annotation {annIds[2]}:')
    mask = coco.annToMask(anns[2])
    plt.figure()
    plt.imshow(mask); plt.axis('off')
    plt.show()

Output:

    Annotations [538087, 547153, 1206290, 1716667, 1717136, 2165006, 2165777]:
    [{'segmentation': [[255.14, 151.89, 251.89, 175.68, 244.32, 188.65, 255.14, 202.7, 265.95, 202.7, 277.84, 202.7, 276.76, 237.3, 285.41, 278.38, 307.03, 289.19, 310.27, 247.03, 324.32, 240.54, 352.43, 263.24, 355.68, 312.97, 367.57, 343.24, 400, 367.03, 409.73, 358.38, 402.16, 340, 394.59, 317.3, 390.27, 271.89, 378.38, 241.62, 354.59, 205.95, 335.14, 191.89, 343.78, 178.92, 337.3, 154.05, 314.59, 127.03, 296.22, 129.19, 296.22, 102.16, 283.24, 95.68, 260.54, 104.32, 261.62, 123.78, 261.62, 131.35, 257.3, 147.57, 255.14, 161.62], [286.49, 323.78, 286.49, 337.84, 295.14, 344.32, 312.43, 332.43]], 'num_keypoints': 16, 'area': 16444.5709, 'iscrowd': 0, 'keypoints': [284, 124, 2, 288, 116, 2, 277, 117, 2, 0, 0, 0, 264, 123, 2, 308, 141, 2, 263, 155, 2, 337, 170, 2, 251, 191, 2, 318, 185, 2, 270, 163, 2, 327, 203, 2, 290, 211, 2, 363, 241, 2, 288, 254, 2, 375, 331, 2, 303, 313, 1], 'image_id': 552272, 'bbox': [244.32, 95.68, 165.41, 271.35], 'category_id': 1, 'id': 538087},
     {'segmentation': [[632.43, 377.84, 614.05, 400.54, 601.08, 412.43, 577.3, 409.19, 567.57, 394.05, 562.16, 388.65, 539.46, 385.41, 525.41, 376.76, 517.84, 358.38, 535.14, 341.08, 547.03, 331.35, 554.59, 319.46, 554.59, 309.73, 561.08, 295.68, 577.3, 275.14, 583.78, 261.08, 582.7, 258.92, 556.76, 258.92, 538.38, 256.76, 525.41, 243.78, 515.68, 230.81, 516.76, 215.68, 522.16, 213.51, 547.03, 226.49, 560, 227.57, 568.65, 227.57, 580.54, 218.92, 588.11, 211.35, 598.92, 203.78, 603.24, 191.89, 609.73, 177.84, 619.46, 169.19, 623.78, 168.11, 636.76, 168.11, 640, 168.11]], 'num_keypoints': 12, 'area': 18150.97805, 'iscrowd': 0, 'keypoints': [607, 199, 2, 0, 0, 0, 0, 0, 0, 638, 208, 2, 0, 0, 0, 0, 0, 0, 611, 213, 2, 0, 0, 0, 587, 244, 2, 632, 301, 2, 538, 240, 2, 612, 320, 2, 586, 304, 2, 595, 338, 2, 540, 362, 2, 595, 380, 2, 613, 368, 2], 'image_id': 552272, 'bbox': [515.68, 168.11, 124.32, 244.32], 'category_id': 1, 'id': 547153},
     {'segmentation': [[245.5, 480, 245.74, 476.17, 244.06, 473.03, 251.3, 454.45, 250.81, 412.45, 250.09, 382.28, 250.09, 364.18, 260.33, 352.86, 259.02, 311.68, 273.97, 306.43, 287.09, 311.94, 305.71, 314.3, 317.51, 317.71, 329.05, 317.97, 329.05, 316.14, 321.97, 311.68, 304.4, 298.83, 317.78, 298.3, 315.94, 294.63, 294.69, 289.91, 282.1, 291.48, 273.71, 292.79, 263.48, 292.27, 255.61, 291.75, 252.47, 280.73, 266.1, 276.53, 274.5, 282.04, 279.22, 274.96, 271.35, 269.19, 250.37, 262.37, 248.79, 240.86, 237.78, 207.81, 231.48, 200.47, 225.97, 191.81, 228.07, 183.16, 237.51, 181.58, 248.01, 180.01, 251.42, 168.47, 251.68, 160.6, 253.25, 157.98, 257.97, 156.4, 254.04, 151.16, 251.42, 144.86, 253.25, 136.73, 252.73, 123.62, 256.14, 119.95, 237.25, 109.45, 213.38, 104.73, 196.6, 106.31, 189.25, 112.6, 181.38, 126.24, 175.35, 150.63, 175.09, 163.75, 178.24, 173.72, 177.98, 189.98, 184.01, 199.22, 179.81, 215.75, 178.24, 233.84, 179.02, 262.17, 179.81, 267.68, 171.42, 309.58, 168.01, 335.28, 167.75, 365.45, 190.04, 376.99, 186.37, 389.32, 185.06, 411.35, 182.96, 431.02, 182.43, 442.82, 180.6, 457.77, 177.19, 467.21, 172.73, 478.43, 172.99, 480]], 'num_keypoints': 10, 'area': 28188.6849, 'iscrowd': 0, 'keypoints': [252, 154, 2, 0, 0, 0, 239, 148, 2, 0, 0, 0, 204, 160, 2, 0, 0, 0, 210, 201, 2, 0, 0, 0, 211, 295, 2, 247, 269, 2, 285, 298, 2, 0, 0, 0, 229, 357, 2, 189, 433, 2, 225, 453, 2, 0, 0, 0, 0, 0, 0], 'image_id': 552272, 'bbox': [167.75, 104.73, 161.3, 375.27], 'category_id': 1, 'id': 1206290},
     {'segmentation': [[153.48, 156.52, 174.23, 163.43, 178.18, 172.33, 182.13, 194.06, 184.11, 203.95, 182.13, 243.47, 175.22, 262.25, 176.2, 283, 170.27, 283, 151.5, 286.95, 149.52, 296.83, 137.67, 302.76, 137.67, 296.83, 129.76, 312.64, 128.77, 335.37, 129.76, 350.19, 127.79, 364.02, 128.77, 373.9, 137.67, 386.75, 139.64, 391.69, 140.63, 393.67, 122.84, 403.55, 109.01, 417.38, 103.08, 421.33, 97.15, 418.37, 92.21, 413.43, 91.22, 413.43, 89.25, 407.5, 99.13, 395.64, 109.01, 385.76, 112.96, 382.8, 112.96, 377.86, 110, 363.03, 105.06, 340.31, 107.03, 333.39, 107.03, 317.58, 107.03, 292.88, 111.98, 283.98, 110.99, 275.09, 110, 269.16, 110.99, 266.2, 106.05, 255.33, 104.07, 233.59, 92.21, 222.72, 91.22, 208.89, 87.27, 201.97, 92.21, 182.21, 114.94, 166.4, 107.03, 160.47, 101.11, 146.64, 96.17, 128.85, 96.17, 121.93, 104.07, 113.04, 114.94, 110.07, 128.77, 110.07, 136.68, 115.02, 146.56, 124.9, 149.52, 126.87, 151.5, 132.8, 148.54, 148.61, 148.54, 154.54, 152.49, 159.48]], 'num_keypoints': 14, 'area': 15615.20195, 'iscrowd': 0, 'keypoints': [145, 155, 2, 144, 144, 2, 132, 147, 2, 0, 0, 0, 109, 151, 2, 161, 171, 2, 103, 190, 2, 172, 219, 2, 119, 247, 2, 166, 262, 2, 146, 267, 2, 166, 252, 2, 130, 255, 2, 0, 0, 0, 119, 296, 2, 0, 0, 0, 115, 376, 2], 'image_id': 552272, 'bbox': [87.27, 110.07, 96.84, 311.26], 'category_id': 1, 'id': 1716667},
     {'segmentation': [[0, 418.08, 9.18, 425.45, 12.33, 434.58, 25.32, 434.22, 20.41, 427.21, 16.19, 412.47, 22.16, 407.2, 25.32, 406.5, 29.88, 401.94, 20.41, 397.73, 1.81, 383.69]], 'num_keypoints': 0, 'area': 779.02935, 'iscrowd': 0, 'keypoints': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'image_id': 552272, 'bbox': [0, 383.69, 29.88, 50.89], 'category_id': 1, 'id': 1717136},
     {'segmentation': [[209.45, 98.49, 208.21, 76.05, 214.44, 67.32, 226.91, 61.09, 244.36, 73.56, 241.87, 86.03, 244.36, 97.25, 239.38, 108.47, 211.95, 100.99]], 'num_keypoints': 4, 'area': 1291.0023, 'iscrowd': 0, 'keypoints': [232, 94, 2, 236, 89, 2, 227, 88, 2, 0, 0, 0, 212, 92, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'image_id': 552272, 'bbox': [208.21, 61.09, 36.15, 47.38], 'category_id': 1, 'id': 2165006},
     {'segmentation': [[435.61, 363.65, 424.3, 376.85, 436.55, 390.99, 458.24, 378.73, 458.24, 357.99, 459.18, 302.36, 470.5, 290.1, 496.9, 366.48, 499.73, 370.25, 517.64, 357.99, 515.75, 340.08, 500.67, 265.59, 495.38, 249.08, 503.13, 212.25, 500.23, 192.37, 482.29, 175.89, 483.75, 162.32, 487.14, 144.87, 483.26, 130.33, 465.81, 129.85, 452.24, 139.54, 447.88, 149.24, 445.94, 159.41, 454.18, 169.11, 428.49, 170.56, 417.34, 179.29, 409.1, 203.52, 407.65, 208.85, 413.47, 213.7, 398.32, 256.92, 405.49, 288.69, 415.74, 292.78, 422.91, 230.28, 432.13, 272.29, 436.23, 361.44]], 'num_keypoints': 16, 'area': 16305.31245, 'iscrowd': 0, 'keypoints': [474, 166, 2, 478, 158, 2, 467, 157, 2, 0, 0, 0, 448, 158, 2, 495, 193, 2, 427, 179, 2, 480, 246, 2, 417, 225, 2, 463, 288, 2, 411, 266, 2, 471, 247, 2, 440, 242, 2, 493, 293, 2, 445, 285, 2, 501, 349, 2, 441, 362, 2], 'image_id': 552272, 'bbox': [398.32, 129.85, 119.32, 261.14], 'category_id': 1, 'id': 2165777}]
    Image [552272]:
    {'license': 2, 'file_name': '000000552272.jpg', 'coco_url': 'http://images.cocodataset.org/train2017/000000552272.jpg', 'height': 480, 'width': 640, 'date_captured': '2013-11-21 20:53:16', 'flickr_url': 'http://farm3.staticflickr.com/2542/3970944877_5cd66b8655_z.jpg', 'id': 552272}
    Mask for annotation 1206290:
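The keypoints arrays in the output above are flat lists of (x, y, v) triplets, one per body joint. In the COCO keypoint format, v=0 means the joint is not labeled, v=1 means labeled but not visible, and v=2 means labeled and visible; num_keypoints counts the joints with v > 0. The sketch below unpacks the first annotation's array. The joint names follow the standard 17-keypoint order of the person category and are written out by hand here; in the annotation file they are listed in the "keypoints" field of that category.

```python
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_keypoints(flat):
    """Map each joint name to its (x, y, v) triplet."""
    triplets = zip(flat[0::3], flat[1::3], flat[2::3])
    return dict(zip(KEYPOINT_NAMES, triplets))

# 'keypoints' of annotation 538087, copied from the output above.
keypoints = [284, 124, 2, 288, 116, 2, 277, 117, 2, 0, 0, 0, 264, 123, 2,
             308, 141, 2, 263, 155, 2, 337, 170, 2, 251, 191, 2, 318, 185, 2,
             270, 163, 2, 327, 203, 2, 290, 211, 2, 363, 241, 2, 288, 254, 2,
             375, 331, 2, 303, 313, 1]

points = parse_keypoints(keypoints)
labeled = [name for name, (x, y, v) in points.items() if v > 0]
print(len(labeled))          # 16, matching 'num_keypoints'
print(points["left_ear"])    # (0, 0, 0): not labeled
```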