SVHN (Street View House Numbers) is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements on data preprocessing and formatting. Dataset download link

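   The dataset I downloaded is the 32×32 version. I load it into the code with scipy and start by analyzing the data. Files with the .mat extension use MATLAB's data format; scipy reads them without trouble, but the layout of the data differs considerably from what one usually works with. The code is as follows: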

# coding: UTF-8
from scipy.io import loadmat as load

traindata = load('data/train_32x32.mat')
testdata = load('data/test_32x32.mat')

print("Train Data Shape:", traindata['X'].shape)
print("Test Data Shape:", testdata['X'].shape)
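
To make the layout of a loaded .mat file easier to see without downloading SVHN, here is a minimal sketch that writes a tiny synthetic file and reads it back. The five random images and their labels are made-up stand-ins; only the variable names `X` and `y` and the axis order (height, width, channels, number of images) follow the real files.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Synthetic stand-in for train_32x32.mat (assumption: 5 random images),
# using the same variable names and axis order as the SVHN files.
fake = {
    'X': np.random.randint(0, 256, size=(32, 32, 3, 5), dtype=np.uint8),
    'y': np.array([[1], [10], [3], [7], [10]], dtype=np.uint8),
}
buffer = io.BytesIO()
savemat(buffer, fake)
buffer.seek(0)

data = loadmat(buffer)

# loadmat returns a dict: the MATLAB variables plus metadata entries
# whose keys start with a double underscore (e.g. '__header__').
variables = {k: v for k, v in data.items() if not k.startswith('__')}
for name, array in variables.items():
    print(name, array.shape, array.dtype)
```

The image count sits on the last axis of `X`, and `y` is a column vector with one label per image, which is why the data needs reshaping before it can be used as a list of images.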

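The output shows that the data is laid out in MATLAB's format, which differs considerably from the usual layout. Next, preprocess the data and take a look at it: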

# coding: UTF-8
from scipy.io import loadmat as load
import matplotlib.pyplot as plt
import numpy as np


def reformat(samples, labels):
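    # Change the shape of the raw data:
    # (height, width, channels, num_images) -> (num_images, height, width, channels)
    # Convert labels to one-hot encoding (SVHN stores digit 0 as label 10)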
    samples = np.transpose(samples, (3, 0, 1, 2))
    labels = np.array([x[0] for x in labels])
    one_hot_labels = []
    for num in labels:
        one_hot = [0.0] * 10
        if num == 10:
            one_hot[0] = 1.0
        else:
            one_hot[num] = 1.0
        one_hot_labels.append(one_hot)
    labels = np.array(one_hot_labels).astype(np.float32)
    return samples, labels


def normalize(samples):
    # Linearly map images from 0~255 to -1.0~+1.0
    # and convert them to grayscale
    pass

def distribution(labels, name):
    # Look at the distribution of each label, i.e. its share of the data
    pass

def inspect(dataset, labels, i):
    # Display an image to take a look
    print(labels[i])
    plt.imshow(dataset[i])
    plt.show()

train = load('data/train_32x32.mat')
test = load('data/test_32x32.mat')

print ("Train Data Shape:", train['X'].shape)
print ("Train Label Shape:", train['y'].shape)

train_samples = train['X']
train_labels = train['y']
test_samples = test['X']
test_labels = test['y']

_train_sample, _train_labels = reformat(train_samples, train_labels)
_test_sample, _test_labels = reformat(test_samples, test_labels)

num_labels = 10
image_size = 32


if __name__ == '__main__':
    # Explore the data
    inspect(_train_sample, _train_labels, 12322)

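In this code, reformat handles the preprocessing: it changes the layout of the raw data and converts the labels to one-hot form. The inspect function then prints the label of a sample and displays its image. The result is shown in the figure below.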
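
The `normalize` and `distribution` functions are left as stubs in the script. The following is one possible way to fill them in, not the author's implementation. It assumes that grayscale means a plain average of the three colour channels, and that `distribution` should return each class's share as well as print it. The demo arrays at the bottom are synthetic.

```python
import numpy as np


def normalize(samples):
    # samples: (num_images, height, width, 3) array with values in 0..255,
    # i.e. the layout produced by reformat.
    # Assumption: grayscale is the unweighted mean of the colour channels;
    # keepdims leaves a channel axis of size 1.
    gray = samples.astype(np.float32).mean(axis=3, keepdims=True)
    # Linearly map 0..255 onto -1.0..+1.0
    return gray / 127.5 - 1.0


def distribution(labels, name):
    # labels: (num_images, 10) one-hot array.
    # Summing over the image axis counts the samples in each class.
    counts = labels.sum(axis=0)
    ratios = counts / counts.sum()
    print(name, np.round(ratios, 3))
    return ratios


# Synthetic demo data (assumption): one black image and three white ones
demo_samples = np.zeros((4, 32, 32, 3), dtype=np.uint8)
demo_samples[1:] = 255
demo_labels = np.eye(10, dtype=np.float32)[[1, 0, 1, 3]]

normalized = normalize(demo_samples)
ratios = distribution(demo_labels, 'demo')
```

A luminance-weighted grayscale conversion would be an equally reasonable choice; the plain mean is used here only to keep the sketch short.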













