Caffe CIFAR-100 Example

"【Caffe-windows】 cifar100实例过程记录"

Posted by Canary on April 17, 2018

Preface

Having worked through the cifar-10 example, it is now time to train on the cifar-100 dataset. This time, however, instead of downloading the binary version of the dataset, I want to try out the caffe-python interface. Here is a brief record of the steps under Caffe on Windows!

Obtaining the cifar-100 dataset

The official page for both the cifar-10 and cifar-100 datasets is: https://www.cs.toronto.edu/~kriz/cifar.html

This dataset is just like CIFAR-10, except it has 100 classes containing 600 images each: 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class it belongs to) and a "coarse" label (the superclass it belongs to); for example, the superclass "fish" is subdivided into the fine classes "aquarium fish, flatfish, ray, shark, trout".

cifar-100 comes in three versions; the one I downloaded is the CIFAR-100 python version:

CIFAR-100 python version    161 MB  eb9058c3a382ffc7106e4002c42a8d85
CIFAR-100 Matlab version    175 MB  6a4bfa1dcd5c9453dda6bb54194911f4
CIFAR-100 binary version (suitable for C programs)  161 MB  03b5dce01913d631647c71ecec9e9cb8

Extract the downloaded archive to the path D:\GitHub Repository\caffe\data\cifar100

The official page gives the following short snippet:

def unpickle(file):
    import cPickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict

This code loads the file into a dict. The train and test dicts contain the following elements:

  • data: an n×3072 numpy array of uint8s; each row stores one 32×32 RGB image, flattened (n is the number of images).

  • coarse_labels: a list of n integers in the range 0-19, giving each image's superclass.

  • fine_labels: a list of n integers in the range 0-99, giving each image's class.

  • The meta dict contains fine_label_names, whose i-th element is the actual name of fine class i.
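
As a quick sanity check after extracting (a sketch, assuming the cifar-100-python folder sits next to the script and reusing the unpickle helper above under Python 2):

import os

train = unpickle(os.path.join('cifar-100-python', 'train'))
meta = unpickle(os.path.join('cifar-100-python', 'meta'))
print(train['data'].shape)  # (50000, 3072)
print(len(train['fine_labels']))  # 50000
print(meta['fine_label_names'][:3])  # first three fine class names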

So, create convert_dict.py in the current directory to perform the conversion; its contents are as follows:

# -*- coding: UTF-8 -*-
import os
import cPickle

import numpy as np
import sklearn
import sklearn.cross_validation
import sklearn.linear_model

import lmdb
import caffe


def unpickle(file):
    """ unpickle the data """
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict


def shuffle_data(data, labels):
    # train_test_split with test_size=0.0 is only used here to shuffle
    # data and labels together with a fixed random seed.
    data, _, labels, _ = sklearn.cross_validation.train_test_split(
        data, labels, test_size=0.0, random_state=42
    )
    return data, labels


def load_data(train_file):
    """Load one pickled CIFAR-100 file; return (data, fine_labels), shuffled."""
    d = unpickle(train_file)
    data = d['data']
    # coarse_labels = d['coarse_labels']
    fine_labels = d['fine_labels']
    length = len(d['fine_labels'])

    data, labels = shuffle_data(
        data,
        # np.array(zip(coarse_labels, fine_labels))
        np.array(fine_labels)
    )
    # coarse_labels, fine_labels = zip(*labels.tolist())
    return (
        data.reshape(length, 3, 32, 32),
        # np.array(coarse_labels),
        labels
    )


if __name__ == '__main__':
    cifar_python_directory = os.path.abspath("cifar-100-python")

    # meta=unpickle(os.path.join(cifar_python_directory, 'meta'))
    # fine_label_names=meta['fine_label_names']
    # print(fine_label_names)

    print("Converting...")
    cifar_caffe_directory = os.path.abspath("cifar100_train_lmdb")
    if not os.path.exists(cifar_caffe_directory):

        X, y_f = load_data(os.path.join(cifar_python_directory, 'train'))
        Xt, yt_f = load_data(os.path.join(cifar_python_directory, 'test'))

        print("Data is fully loaded,now truly convertung.")

        # map_size is an upper bound on the database size in bytes;
        # 50,000 images at ~3 KB each fit comfortably in 250 MB.
        env = lmdb.open(cifar_caffe_directory, map_size=50000 * 1000 * 5)
        txn = env.begin(write=True)
        count = 0
        for i in range(X.shape[0]):
            # Pack the (3, 32, 32) uint8 array and its fine label into a Datum.
            datum = caffe.io.array_to_datum(X[i], y_f[i])
            str_id = '{:08}'.format(count)
            txn.put(str_id, datum.SerializeToString())

            count += 1
            if count % 1000 == 0:
                # Commit periodically so a single transaction never grows too large.
                print('already handled with {} pictures'.format(count))
                txn.commit()
                txn = env.begin(write=True)

        txn.commit()
        env.close()

        env = lmdb.open('cifar100_test_lmdb', map_size=10000 * 1000 * 5)
        txn = env.begin(write=True)
        count = 0
        for i in range(Xt.shape[0]):
            datum = caffe.io.array_to_datum(Xt[i], yt_f[i])
            str_id = '{:08}'.format(count)
            txn.put(str_id, datum.SerializeToString())

            count += 1
            if count % 1000 == 0:
                print('already handled with {} pictures'.format(count))
                txn.commit()
                txn = env.begin(write=True)

        txn.commit()
        env.close()
    else:
        print("Conversion was already done. Did not convert twice.")

Run the file from a command line in the current directory:

python .\convert_dict.py

At this point the frustrating part began: errors popped up one after another. Fine then, let's fix them one by one!!

  1. Error: Non-ASCII character 'xe5' in file

    Python uses ASCII as its default source encoding, so if your Python source contains Chinese (or any other non-English text), it will fail even if you saved the file as UTF-8. The fix is simple: just add the following line at the top of the file.

     # -*- coding: UTF-8 -*-  
    
  2. ImportError: No module named 'cPickle'

     Traceback (most recent call last):
       File ".\convert_dict.py", line 2, in <module>
         import cPickle
     ImportError: No module named 'cPickle'
    

    Python 3 no longer has cPickle; change the import to import pickle.
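
    As a side note, a small compatibility shim (my own addition, not in the original script) keeps the import working under both Python 2 and 3:

     try:
         import cPickle as pickle  # Python 2
     except ImportError:
         import pickle  # Python 3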

  3. Error: module sklearn not found

     ImportError: No module named sklearn
    

    Use anaconda to install the scikit-learn package into the target python environment.
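
    For example (Caffe_3_5 is simply the name of my python environment, as it appears in the tracebacks below):

     conda install -n Caffe_3_5 scikit-learn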

  4. Error: module lmdb not found

     ImportError: No module named lmdb
    

    Use anaconda to install the python-lmdb package into the target python environment.
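
    For example (same environment name as above; on conda the package is typically published as python-lmdb, e.g. on the conda-forge channel):

     conda install -n Caffe_3_5 -c conda-forge python-lmdb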

  5. Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

     Traceback (most recent call last):
       File ".\convert_dict.py", line 60, in <module>
         X, y_f = load_data(os.path.join(cifar_python_directory, 'train'))
       File ".\convert_dict.py", line 30, in load_data
         d = unpickle(train_file)
       File ".\convert_dict.py", line 16, in unpickle
         dict = cPickle.load(fo)
     UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
    

    Change the statement dict = cPickle.load(fo) to dict = cPickle.load(fo, encoding='latin1').
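
    With that change (plus the pickle shim above), the helper for Python 3 reads as follows; latin1 decodes the Python 2 byte strings without complaint:

     def unpickle(file):
         """ unpickle the data under Python 3 """
         with open(file, 'rb') as fo:
             return pickle.load(fo, encoding='latin1')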

  6. Error: RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb:

     RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
     Traceback (most recent call last):
       File ".\convert_dict.py", line 10, in <module>
         import caffe
       File "D:\GitHub Repository\caffe-windows\python\caffe\__init__.py", line 1, in <module>
         from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
       File "D:\GitHub Repository\caffe-windows\python\caffe\pycaffe.py", line 13, in <module>
         from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
     ImportError: numpy.core.multiarray failed to import
    

    The fix: the installed numpy is too old; update numpy to the latest version.
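
    For example:

     conda update -n Caffe_3_5 numpy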

  7. An error I couldn't find an answer for:

     Traceback (most recent call last):
       File ".\convert_dict.py", line 69, in <module>
         datum = caffe.io.array_to_datum(X[i], y_f[i])
       File "D:\GitHub Repository\caffe\python\caffe\io.py", line 80, in array_to_datum
         datum.label = label
       File "C:\ProgramData\Anaconda3\envs\Caffe_3_5\lib\site-packages\google\protobuf\internal\python_message.py", line 674, in field_setter
         new_value = type_checker.CheckValue(new_value)
       File "C:\ProgramData\Anaconda3\envs\Caffe_3_5\lib\site-packages\google\protobuf\internal\type_checkers.py", line 132, in CheckValue
         raise TypeError(message)
     TypeError: 63 has type <class 'numpy.int32'>, but expected one of: (<class 'int'>,)
    

    This was a disheartening error: no solution to be found, and my spirits were sinking.
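
    In hindsight, the message says that protobuf wants a built-in int while y_f[i] is a numpy.int32, so a cast along the following lines would probably have avoided it (my guess only; I did not try it at the time):

     # hypothetical fix: convert the numpy scalar to a built-in int
     datum = caffe.io.array_to_datum(X[i], int(y_f[i]))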

Not wanting to give up, I remembered a sentence I had read before: "Caffe's compatibility with Python 3 is not as good as with Python 2"!!!! Could this be a Python 3 environment problem? With that question in mind (and a grudge against caffe-python), I decided to rebuild the Python 2 version of Caffe. No sooner said than done!!!

The whole process is the same as in my blog post on configuring Caffe under Python 3, "Caffe for Windows10":

  1. To avoid polluting the existing environment, I used anaconda3 to create a new virtual environment Caffe_2_7:
     conda create -n Caffe_2_7 python=2.7.14
    

    Then add the new virtual environment to the Path environment variable:

     C:\ProgramData\Anaconda3\envs\Caffe_2_7
     C:\ProgramData\Anaconda3\envs\Caffe_2_7\Scripts
    

    Because I had already configured the Python 3 version of Caffe, the Python 3 paths were already in the environment variables. To make the system pick up this newly added Python 2, the two paths above must be placed in front of the Python 3 paths.

  2. When generating the build configuration, use the cmake-gui tool as before; the only change needed is to set the python version number to 2.

  3. To get import caffe to work under Python 2, add the newly compiled caffe/python to the PYTHONPATH environment variable, and make sure the newly added path comes before the other paths.

  4. Everything else is exactly the same.

  5. To avoid the missing-module errors above, I also reinstalled the python-lmdb and scikit-learn packages in the new virtual environment with conda. Then I ran python .\convert_dict.py again in Windows PowerShell:

Hooray!!!!!!!! 🙌

Looking at the directory: the data has been successfully converted into the familiar cifar100_train_lmdb and cifar100_test_lmdb.

Viewing the LMDB data

Next, let's convert some records from the LMDB database back into viewable png images. Create read_lmdb.py with the following code:

import lmdb
import os
import cv2
import cPickle
import caffe
from caffe.proto import caffe_pb2


def unpickle(file):
    """ unpickle the data """
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict


if __name__ == '__main__':
    # Map fine label indices back to human-readable class names.
    meta = unpickle(os.path.join('cifar-100-python', 'meta'))
    fine_label_names = meta['fine_label_names']

    env = lmdb.open('cifar100_train_lmdb')
    txn = env.begin()
    cursor = txn.cursor()
    datum = caffe_pb2.Datum()

    i = 0
    for key, value in cursor:
        datum.ParseFromString(value)
        if i < 10:
            data = caffe.io.datum_to_array(datum)  # (3, 32, 32)
            label = datum.label
            img = data.transpose(1, 2, 0)  # to (32, 32, 3)
            # The LMDB stores RGB planes, but cv2.imwrite expects BGR,
            # so reverse the channel order before saving.
            img = img[:, :, ::-1]
            # Prefix the index so two images of the same class don't overwrite each other.
            cv2.imwrite('{}_{}.png'.format(i, fine_label_names[label]), img)
        i += 1

    env.close()
    print('there are {} pictures in total'.format(i))

After running it, the first ten images of the training set appear in the current directory.

If you hit the error ImportError: No module named cv2, just install the opencv package:

conda install -n Caffe_2_7 -c https://conda.binstar.org/menpo opencv

Computing the mean

Image preprocessing usually subtracts the pixel-wise mean. caffe-windows provides the precompiled compute_image_mean.exe for this (mine lives under D:\GitHub Repository\caffe-windows\build\x64\install\bin; for the concrete bat script, see my earlier post on computing the mean for cifar10 with caffe). Here I give a python version instead; create a file mean.py:

import caffe
import lmdb
import numpy as np
from caffe.proto import caffe_pb2
import time

lmdb_env = lmdb.open('cifar100_train_lmdb')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
datum = caffe_pb2.Datum()

N = 0
mean = np.zeros((1, 3, 32, 32))
begintime = time.time()
for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    data = caffe.io.datum_to_array(datum)  # (3, 32, 32)
    image = data.transpose(1, 2, 0)        # (32, 32, 3)
    # Accumulate each channel separately.
    mean[0, 0] += image[:, :, 0]
    mean[0, 1] += image[:, :, 1]
    mean[0, 2] += image[:, :, 2]
    N += 1
    if N % 1000 == 0:
        elapsed = time.time() - begintime
        print("Processed {} images in {:.2f} seconds. "
              "{:.2f} images/second.".format(N, elapsed,
                                             N / elapsed))
mean[0] /= N

# Serialize the (1, 3, 32, 32) mean in Caffe's binaryproto format.
blob = caffe.io.array_to_blobproto(mean)
with open('mean.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())

lmdb_env.close()

Running the file generates mean.binaryproto in the current directory.
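
As a quick sanity check (a sketch that only uses the standard caffe.io helpers), the file can be loaded back and the per-channel means printed:

import caffe
from caffe.proto import caffe_pb2

blob = caffe_pb2.BlobProto()
with open('mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())

mean = caffe.io.blobproto_to_array(blob)  # shape (1, 3, 32, 32)
print(mean.shape)
print(mean.reshape(3, -1).mean(axis=1))  # average value per channel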

Setting up the Caffe network definition files

Modeled on the cifar10 network definition files that ship with caffe, create cifar100_full_solver.prototxt inside the cifar100 folder:

# The train/test net protocol buffer definition
net: "examples/cifar100/cifar100_full_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR100, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 1000 training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 200 iterations
display: 200
# The maximum number of iterations
max_iter: 60000
# snapshot intermediate results
snapshot: 10000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar100/cifar100_full"
# solver mode: CPU or GPU
solver_mode: GPU
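
Since this post set out to try the caffe-python interface, it is worth noting that the same solver file can also be driven from pycaffe instead of caffe.exe (a minimal sketch, run from the Caffe root with the paths above):

import caffe

caffe.set_mode_gpu()  # matches solver_mode: GPU above
solver = caffe.SGDSolver('examples/cifar100/cifar100_full_solver.prototxt')
solver.solve()  # trains until max_iter and writes the snapshots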

Likewise, create cifar100_full_train_test.prototxt in cifar100:

name: "CIFAR100_full"
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "data/cifar100/mean.binaryproto"
  }
  data_param {
    source: "data/cifar100/cifar100_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "data/cifar100/mean.binaryproto"
  }
  data_param {
    source: "data/cifar100/cifar100_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}

The layers that follow are essentially the same as in `cifar10_quick_train_test.prototxt` under caffe/examples/cifar10, so I won't repeat them here!

Start training cifar-100

Go to the Caffe root directory and create a file run-train-cifar100.bat; double-click it to start training:

.\build\x64\install\bin\caffe.exe train --solver=examples/cifar100/cifar100_full_solver.prototxt  
pause  

When training finishes, caffe\examples\cifar100 contains the generated files: the caffemodel holds the trained model parameters, and the solverstate lets you resume training from the point of interruption (see the example below).
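
For example, resuming from a snapshot would look something like this (a sketch; fill in the iteration number yourself, and the .h5 suffix applies only when snapshot_format is HDF5):

.\build\x64\install\bin\caffe.exe train --solver=examples/cifar100/cifar100_full_solver.prototxt --snapshot=examples/cifar100/cifar100_full_iter_xxxx.solverstate.h5
pause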

Create run-test-cifar100.bat and double-click it to evaluate the trained network:

build\x64\install\bin\caffe.exe test -model=examples\cifar100\cifar100_full_train_test.prototxt -weights=examples\cifar100\cifar100_full_iter_xxxx.caffemodel.h5
pause

In the script above, replace the xxxx in cifar100_full_iter_xxxx.caffemodel(.h5) with the iteration number generated on your own machine; mine, for example, is cifar100_full_iter_20000.caffemodel.h5.

Note that depending on the snapshot_format setting in solver.prototxt, the generated weight files are named differently: some are xxx.caffemodel and others are xxx.caffemodel.h5.

Done!!

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. You are welcome to repost it; please credit the source: Huang Gang's blog, and keep the article content and this notice intact!