Classifying a dataset with Caffe and libsvm on Ubuntu 14.04 in a virtual machine


Chenweilunster


Chenweilunster · Published 2015-07-23 16:44:09

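I. Problem description
The task is to run classification experiments on the MIRFlickr-25000 dataset using existing classification methods such as SVM and Ridge Regression. The requirements are:

1. Dataset: MIRFlickr-25000 http://press.liacs.nl/mirflickr/
2. Features: Caffe http://caffe.berkeleyvision.org/
3. Classification methods: SVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and Ridge Regression (Matlab should include it; if not, search Google)
4. Evaluation metrics: AUC (area under curve) and average precision (AP) (note that this is AP for classification, not AP for retrieval)
5. Environment: Matlab and Unix
6. Experiments:
6.1 Effect of the training-set size on the results;
6.2 Effect of the model parameters on the results;
6.3 Comparison of the two methods;
6.4 Performance on each individual class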
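
Since both metrics are computed per class from the classifier scores, a small sketch may make the requirement concrete. It assumes 0/1 labels and real-valued scores for a single class, and it uses one common definition of each metric (AUC from the rank sum with ties sharing their average rank, AP as the mean precision at the rank of every positive). The function names are made up for illustration; this is not the code used in the experiments.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed from the rank sum of the positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the group of tied scores that starts at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('need at least one positive and one negative example')
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / float(k)
    return total / hits if hits else 0.0
```

Calling both functions once per class, on the scores of held-out images only, gives the per-class numbers asked for in 6.4.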

II. Steps (updating)

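1. Install Caffe together with its Python interface: http://caffe.berkeleyvision.org/installation.html#compilation

Note that a virtual machine cannot access the GPU, so CUDA does not need to be installed. Once the dependencies are in place, set CPU_ONLY := 1 in caffe-master/Makefile.config before building.
The Python interface depends on many packages, so installing Anaconda is recommended. Afterwards, be sure to add anaconda/bin to PATH, edit Makefile.config so that it points to the Anaconda paths, and add caffe-master/python to PYTHONPATH.
After make all, also run make pycaffe and make distribute.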
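
A forgotten PATH or PYTHONPATH entry is an easy mistake at this stage, so a quick check can save some time. The sketch below only inspects the two environment variables; the helper name and both directory locations are assumptions and need to be replaced with the real ones.

```python
import os


def missing_entries(env, anaconda_bin, caffe_python):
    """Return a list of problems; an empty list means both entries are present."""
    def entries(name):
        # Split the variable on the platform separator and normalise each entry.
        raw = env.get(name, '')
        return [os.path.normpath(p) for p in raw.split(os.pathsep) if p]

    problems = []
    if os.path.normpath(anaconda_bin) not in entries('PATH'):
        problems.append('PATH lacks ' + anaconda_bin)
    if os.path.normpath(caffe_python) not in entries('PYTHONPATH'):
        problems.append('PYTHONPATH lacks ' + caffe_python)
    return problems


# Hypothetical locations; replace them with the real ones on your machine.
ANACONDA_BIN = '/home/username/anaconda/bin'
CAFFE_PYTHON = '/home/username/Downloads/caffe-master/python'

if __name__ == '__main__':
    for problem in missing_entries(os.environ, ANACONDA_BIN, CAFFE_PYTHON):
        print(problem)
```

If nothing is printed, import caffe followed by caffe.set_mode_cpu() should work in a CPU-only build.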

2. Test feature extraction with one of the pretrained models that ship with Caffe: http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html

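3. Put the MIRFlickr-25k images into caffe-master/examples/images, then extract features with the pretrained model. Note that the 4096-dimensional features should be taken from the fc7 layer. Also adjust batch_size and the last two arguments of the extraction command so that (batch_size) * (number of data mini-batches) = 25000, the total number of images, and change the final argument from lmdb to leveldb. This produces a LevelDB database rather than an LMDB one, which is easier to read from Python.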
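
The batch arithmetic can be sketched as follows. The helper names are invented for illustration, and the model, prototxt and output paths follow the layout of the Caffe feature-extraction example as an assumption, so they should be checked against your own checkout.

```python
def num_mini_batches(num_images, batch_size):
    """Number of mini-batches so that batch_size * result == num_images."""
    if batch_size <= 0 or num_images % batch_size != 0:
        raise ValueError('batch_size must divide the number of images')
    return num_images // batch_size


def extract_command(n_batches, db_type='leveldb'):
    """Argument list for the extraction tool; all paths are assumed, not verified."""
    return [
        './build/tools/extract_features.bin',
        'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel',
        'examples/_temp/imagenet_val.prototxt',
        'fc7',                     # layer to read the 4096 features from
        'examples/_temp/features', # output database directory
        str(n_batches),            # number of data mini-batches
        db_type,                   # leveldb instead of lmdb
    ]


if __name__ == '__main__':
    # With batch_size = 50 in the prototxt, 25000 images need 500 mini-batches.
    print(' '.join(extract_command(num_mini_batches(25000, 50))))
```

Any batch_size that divides 25000 works; choosing one that does not would leave the product different from the number of images.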

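4. In my case the extracted features end up in /home/username/Downloads/caffe-master/examples/_temp/features. The next step is to read them out:

Read the contents of the LevelDB database into a matrix X of size 25000*4096, with one row per image.
The MIRFlickr annotation files are needed for this step; download them and place them on the virtual machine.
Build 24 y vectors, one per class (people, animals, ...). Each y is a 25000*1 column vector. Taking the people class as an example: if the i-th image (row i of X) contains people, that is, it is listed in the annotation file people.txt, then row i of y_people is 1; otherwise it is 0.

The code for extracting the X matrix is as follows: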

import caffe
import leveldb
from caffe.proto import caffe_pb2

# dbName is the directory where the LevelDB data was stored
dbName = '/home/chenweilun/_temp/features'
# featureFile is the text file that receives the X matrix, one image per line
featureFile = '/home/chenweilun/_temp/data'

db = leveldb.LevelDB(dbName)
features = {}
for key, value in db.RangeIter():
    # parse the serialized Datum, then convert it to a numpy array
    datum = caffe_pb2.Datum.FromString(value)
    arr = caffe.io.datum_to_array(datum)
    # the Datum holds the feature as a (channels, 1, 1) array; flatten it to 1-D
    features[int(key)] = arr.flatten().tolist()

# LevelDB iterates over keys as strings, which may differ from numeric order,
# so sort by the integer key to keep row i of X aligned with the i-th image.
output = open(featureFile, 'w')
for k in sorted(features):
    output.write(' '.join(repr(v) for v in features[k]) + '\n')
output.close()
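
Before moving on, it may be worth confirming that the exported file really has 25000 rows of 4096 numbers each. The sketch below is one way to do that; the function name is made up, and the demonstration uses a tiny temporary file rather than the real export.

```python
import os
import tempfile


def feature_file_shape(path):
    """Return (rows, cols) of a space-separated matrix file; fail on ragged rows."""
    rows = 0
    cols = None
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            values = [float(v) for v in line.split()]  # fails on stray brackets
            if cols is None:
                cols = len(values)
            elif len(values) != cols:
                raise ValueError('line %d has %d values, expected %d'
                                 % (line_no, len(values), cols))
            rows += 1
    return rows, cols or 0


# Toy demonstration with a 3 x 4 matrix instead of 25000 x 4096.
demo_path = os.path.join(tempfile.mkdtemp(), 'data')
with open(demo_path, 'w') as f:
    f.write('0.0 1.5 2.0 0.25\n' * 3)
demo_shape = feature_file_shape(demo_path)
```

For the real export, the result should be compared with (25000, 4096).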

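The script for generating the y vectors is below. DBlist refers to the temp file used during feature extraction, from which the image numbers are obtained. classX stands for the annotation file of one class, and classX_output for the location where y_classX is written.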

import os

# DBlist: image list used during feature extraction; its line order matches the rows of X
DBlist = '/home/chenweilun/_temp/temp.txt'
# imagelist receives the image numbers parsed from DBlist, one per line
imagelist = '/home/chenweilun/_temp/imagelist'
annotationDir = '/home/chenweilun/MIRFLICKR-25000 Annotation'
outputDir = os.path.join(annotationDir, 'output')

classes = ['animals', 'baby', 'bird', 'car', 'clouds', 'dog', 'female',
           'flower', 'food', 'indoor', 'lake', 'male', 'night', 'people',
           'plant_life', 'portrait', 'river', 'sea', 'sky', 'structures',
           'sunset', 'transport', 'tree', 'water']

def generateTemp(listPath, idPath):
    # Reduce every entry such as ".../im123.jpg" to its image number "123".
    ids = []
    for line in open(listPath):
        if not line.strip():
            continue
        # look at the file name only, so an "im" or "." in a directory name cannot interfere
        name = os.path.basename(line.strip())
        ids.append(name[name.find('im') + 2:name.find('.')])
    out = open(idPath, 'w')
    out.write('\n'.join(ids) + '\n')
    out.close()
    return ids

def generateY(outPath, annotationPath, ids):
    # Image numbers listed in the annotation file are the positives of this class.
    positives = set(line.strip() for line in open(annotationPath) if line.strip())
    out = open(outPath, 'w')
    for imageId in ids:
        out.write(('1' if imageId in positives else '0') + '\n')
    out.close()

ids = generateTemp(DBlist, imagelist)
for name in classes:
    classX = os.path.join(annotationDir, name + '.txt')
    classX_output = os.path.join(outputDir, name)
    generateY(classX_output, classX, ids)
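
With X and the y vectors on disk, the data still has to be brought into the text format that libsvm reads: one line per example, a label followed by index:value pairs with 1-based indices, where zero-valued features may be left out. The sketch below shows one possible conversion for a single class; the function names are invented for illustration and this is not necessarily how the experiments were run.

```python
def to_libsvm_line(label, features):
    """Format one example as 'label index:value ...', skipping zero features."""
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append('%d:%r' % (index, value))
    return ' '.join(parts)


def convert(x_lines, y_lines):
    """Pair row i of X with row i of y and yield libsvm-formatted lines."""
    for x_line, y_line in zip(x_lines, y_lines):
        features = [float(v) for v in x_line.split()]
        yield to_libsvm_line(int(y_line), features)


if __name__ == '__main__':
    # Tiny stand-in for two rows of X and the matching rows of one y vector.
    for line in convert(['0.5 0 2', '0 0 1.25'], ['1', '0']):
        print(line)
```

In practice x_lines and y_lines would be the open X file and one of the y files, and the resulting file can be passed to the svm-train and svm-predict tools that come with libsvm.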