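Generating raw image files (.png or .jpg) for common image datasets

Introduction

Work in computer vision often calls for a lot of image datasets. A famous dataset like ImageNet is far more than a hard drive of a hundred-odd gigabytes like mine can hold. This post instead covers ways to generate some "Hello world"-level image datasets, along with a roundup of related image data resources.

This post covers:

  1. How to generate .png or .jpg versions of datasets such as MNIST, CIFAR-10, and CIFAR-100;
  2. How to write a script that generates image data and sorts it into classes automatically according to a label file;
  3. How to generate these datasets with the DIGITS tool;
  4. How to inspect the structure of .h5 data files;
  5. Download links for the datasets.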

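Generating image data

MNIST, CIFAR-10, CIFAR-100 and similar datasets are all described on their official sites. The official addresses are:

  1. MNIST: http://yann.lecun.com/exdb/mnist/
  2. CIFAR family: https://www.cs.toronto.edu/~kriz/cifar.html

As the official pages show, the datasets are mostly distributed in binary form or in Python- and MATLAB-specific formats. Sometimes what we need is the raw image files, and in that case we have to generate them ourselves, either with code or with some other tool.

There is plenty of code for this online, but the quality is uneven. Most of it requires you to work from the data format described on the official site and reconstruct the raw images yourself. I won't go into those approaches in detail here, since they are easy to find. Instead, here are some simpler and quicker ways to get the raw image files.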
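
For readers who do want to decode the official binary files directly, the following is a minimal sketch for MNIST, assuming the IDX layout described on the MNIST page (a big-endian header of magic number 2051, image count, rows, and columns, followed by one unsigned byte per pixel). The function names and the output naming scheme are my own choices for illustration, and the PNG writer is a bare-bones grayscale encoder built on the standard library rather than an image library.

```python
import gzip
import os
import struct
import zlib


def parse_idx_images(raw):
    """Parse the bytes of an (already decompressed) IDX3 image file."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError("not an IDX3 image file")
    size = rows * cols
    images = [raw[16 + i * size:16 + (i + 1) * size] for i in range(count)]
    return rows, cols, images


def _chunk(tag, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def gray_png(pixels, rows, cols):
    """Encode 8-bit grayscale pixels (row-major bytes) as PNG bytes."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace
    ihdr = struct.pack(">IIBBBBB", cols, rows, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering)
    scanlines = b"".join(
        b"\x00" + pixels[r * cols:(r + 1) * cols] for r in range(rows)
    )
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(scanlines))
        + _chunk(b"IEND", b"")
    )


def export_mnist_pngs(idx_gz_path, out_dir):
    """Write every image in a gzipped IDX3 file to out_dir as <index>.png."""
    with gzip.open(idx_gz_path, "rb") as f:
        rows, cols, images = parse_idx_images(f.read())
    os.makedirs(out_dir, exist_ok=True)
    for i, pixels in enumerate(images):
        with open(os.path.join(out_dir, f"{i}.png"), "wb") as out:
            out.write(gray_png(pixels, rows, cols))
    return len(images)
```

A call such as export_mnist_pngs("train-images-idx3-ubyte.gz", "mnist_png") would write one file per image. The sketch does not read the label file, so the images are not sorted by class.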


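CIFAR-10 image data

I generated this part from the CIFAR-10 dataset provided on the Kaggle CIFAR-10 competition page. The data there is already in .png format, which is what we want, but there is one problem: all the images sit unsorted in the train directory rather than being grouped into the original 10 classes. Fortunately a class-mapping file, trainLabels.csv, is provided. So the first problem to solve is to split the images into 10 classes automatically according to this mapping file and store them in 10 separate directories.

Here is my code: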

# coding: utf-8

import csv
import os
import shutil


# Return the file name without its extension (None for non-.png files)
def getImageFilePre(filename):
    if filename.endswith(".png"):
        return filename.split(".")[0]
    return None


# string to int
def str2Int(stringValue):
    return int(stringValue)


# int to string
def int2Str(intValue):
    return str(intValue)


# Rename the files in each class directory to <class>.<n>.png
def fileRename(dirPath):
    # os.walk yields three values for each directory it visits:
    # 1. the parent directory
    # 2. the names of its subdirectories (without path)
    # 3. the names of its files
    for parent, dirnames, filenames in os.walk(dirPath):
        for dirname in dirnames:
            count = 1
            newTmpPath = os.path.join(parent, dirname)
            os.chdir(newTmpPath)
            fileContents = os.listdir(newTmpPath)

            for curFile in fileContents:
                if curFile.endswith(".png"):
                    newName = dirname + "." + int2Str(count) + ".png"
                    count = count + 1
                    shutil.move(curFile, newName)
                    print(curFile + " -> " + newName + " ------> OK!")


def main():
    # Read the label file into a list of rows
    with open('trainLabels.csv', newline='') as csvfile:
        reader = list(csv.reader(csvfile))

    # List the image files in the source directory
    dirPath = "F:\\xxxxx\\data_origin\\train_200"
    os.chdir(dirPath)
    dirContents = [f for f in os.listdir(dirPath) if f.endswith(".png")]
    dirContents.sort(key=lambda x: int(x[:-4]))  # sort numerically by file name

    baseDirPath = "F:\\xxxxx\\data_origin\\train_with_class"
    totalFiles = len(dirContents) + 1
    for num in range(1, totalFiles):  # row 0 of the CSV is skipped
        labelID = reader[num][0]
        labelName = reader[num][1]
        imageFilename = dirContents[num - 1]
        tmpFilePre = getImageFilePre(imageFilename)

        # Copy the image only if the CSV id matches the file name
        if str2Int(labelID) == str2Int(tmpFilePre):
            print("labelID == filePre !!!")
            new_dir_path = os.path.join(baseDirPath, labelName)

            if not os.path.isdir(new_dir_path):
                os.makedirs(new_dir_path)
                print(new_dir_path + " created!")
            else:
                print(new_dir_path + " already exists!")

            shutil.copy(imageFilename, new_dir_path)

        print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>")

    rootPath = "F:\\xxxxx\\data_origin\\train_with_class"
    fileRename(rootPath)


if __name__ == '__main__':
    main()

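This splits the images into 10 classes and stores them in separate directories by class, 5000 images per class. On my Windows machine the run took 1.5 hours (including renaming the files), which is admittedly rather slow. The figures below show the final result:

[Figure: CIFAR-10 with class]

[Figure: CIFAR-10 examples]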
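
The same sorting step can also be done in a single pass. The following is a minimal sketch, not the script used to produce the results above. It assumes trainLabels.csv starts with a header row naming the columns id and label, and that each image is stored as <id>.png. The function name sort_by_label and its parameters are hypothetical. It copies each image straight to its final name, so no separate renaming pass is needed. I have not timed it against the original.

```python
import csv
import shutil
from collections import defaultdict
from pathlib import Path


def sort_by_label(csv_path, src_dir, dst_dir):
    """Copy <id>.png from src_dir to dst_dir/<label>/<label>.<n>.png."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    counts = defaultdict(int)  # images copied so far, per label
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # assumes header columns "id" and "label"
            src = src_dir / f"{row['id']}.png"
            if not src.is_file():
                continue  # skip ids that have no matching image
            label = row["label"]
            counts[label] += 1
            class_dir = dst_dir / label
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, class_dir / f"{label}.{counts[label]}.png")
    return dict(counts)
```

The returned dictionary maps each label to the number of images copied, which gives a quick check that every class received the expected count.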


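Generating image datasets with DIGITS, the graphical tool for Caffe

For detailed usage, see this blog post: http://www.cnblogs.com/denny402/p/5136155.html

This requires installing Caffe and DIGITS. The tool can directly produce image data that is already sorted by class, and it is fast, so it is worth a try.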


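.h5 file structure viewer

When working with convolutional neural networks we often save data to .h5 files, and sometimes we need to reuse those files. In transfer learning, for example, we need to know the layout of the .h5 file in order to freeze layers.

Besides poking at the .h5 file structure with your own code, is there a quicker tool? Yes: MATLAB provides a ready-made function. The documentation is here: http://cn.mathworks.com/help/matlab/ref/h5disp.html

For example, we can use the following MATLAB command to view the weight structure of the VGG16 model: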

>> h5disp('vgg16_weights.h5')

The output is as follows:

>> h5disp('vgg16_weights.h5')
HDF5 vgg16_weights.h5
Group '/'
Attributes:
'nb_layers': 37
Group '/layer_0'
Attributes:
'nb_params': 0
Group '/layer_1'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x3x64
MaxSize: 3x3x3x64
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 64
MaxSize: 64
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_10'
Attributes:
'nb_params': 0
Group '/layer_11'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x128x256
MaxSize: 3x3x128x256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 256
MaxSize: 256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_12'
Attributes:
'nb_params': 0
Group '/layer_13'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x256x256
MaxSize: 3x3x256x256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 256
MaxSize: 256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_14'
Attributes:
'nb_params': 0
Group '/layer_15'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x256x256
MaxSize: 3x3x256x256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 256
MaxSize: 256
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_16'
Attributes:
'nb_params': 0
Group '/layer_17'
Attributes:
'nb_params': 0
Group '/layer_18'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x256x512
MaxSize: 3x3x256x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_19'
Attributes:
'nb_params': 0
Group '/layer_2'
Attributes:
'nb_params': 0
Group '/layer_20'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x512x512
MaxSize: 3x3x512x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_21'
Attributes:
'nb_params': 0
Group '/layer_22'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x512x512
MaxSize: 3x3x512x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_23'
Attributes:
'nb_params': 0
Group '/layer_24'
Attributes:
'nb_params': 0
Group '/layer_25'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x512x512
MaxSize: 3x3x512x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_26'
Attributes:
'nb_params': 0
Group '/layer_27'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x512x512
MaxSize: 3x3x512x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_28'
Attributes:
'nb_params': 0
Group '/layer_29'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x512x512
MaxSize: 3x3x512x512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 512
MaxSize: 512
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_3'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x64x64
MaxSize: 3x3x64x64
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 64
MaxSize: 64
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_30'
Attributes:
'nb_params': 0
Group '/layer_31'
Attributes:
'nb_params': 0
Group '/layer_32'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 4096x25088
MaxSize: 4096x25088
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 4096
MaxSize: 4096
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_33'
Attributes:
'nb_params': 0
Group '/layer_34'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 4096x4096
MaxSize: 4096x4096
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 4096
MaxSize: 4096
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_35'
Attributes:
'nb_params': 0
Group '/layer_36'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 1000x4096
MaxSize: 1000x4096
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 1000
MaxSize: 1000
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_4'
Attributes:
'nb_params': 0
Group '/layer_5'
Attributes:
'nb_params': 0
Group '/layer_6'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x64x128
MaxSize: 3x3x64x128
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 128
MaxSize: 128
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_7'
Attributes:
'nb_params': 0
Group '/layer_8'
Attributes:
'nb_params': 2
Dataset 'param_0'
Size: 3x3x128x128
MaxSize: 3x3x128x128
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Dataset 'param_1'
Size: 128
MaxSize: 128
Datatype: H5T_IEEE_F32LE (single)
ChunkSize: []
Filters: none
FillValue: 0.000000
Group '/layer_9'
Attributes:
'nb_params': 0
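
A dump this long is hard to scan when deciding which layers to freeze. The following is a minimal sketch, assuming the h5disp output above has been saved as plain text. It collects, for each group, the datasets it holds and their shapes, so that layers with and without weights are easy to tell apart. The function names summarize_h5disp and param_count are hypothetical helpers, not part of MATLAB or any HDF5 library.

```python
import re
from math import prod


def summarize_h5disp(text):
    """Map each group name to a list of (dataset_name, shape) pairs."""
    layers = {}
    group = dataset = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Group '(.+)'", line)
        if m:
            group, dataset = m.group(1), None
            layers[group] = []
            continue
        m = re.match(r"Dataset '(.+)'", line)
        if m:
            dataset = m.group(1)
            continue
        # Anchored at line start, so "MaxSize:" lines do not match
        m = re.match(r"Size:\s*([\dx]+)$", line)
        if m and group is not None and dataset is not None:
            shape = tuple(int(n) for n in m.group(1).split("x"))
            layers[group].append((dataset, shape))
            dataset = None
    return layers


def param_count(datasets):
    """Total number of values across all datasets of one group."""
    return sum(prod(shape) for _, shape in datasets)
```

Groups that map to an empty list, such as '/layer_0' in the dump above, carry no weights, so only the remaining groups matter when choosing what to freeze.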

References

[1]. https://www.cs.toronto.edu/~kriz/cifar.html
[2]. https://www.kaggle.com/c/cifar-10/data
[3]. http://cn.mathworks.com/help/matlab/ref/h5disp.html
[4]. http://www.cnblogs.com/denny402/p/5136155.html

