Classifying a dataset with Caffe and libsvm on an Ubuntu 14.04 virtual machine
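I. Problem Description
The task is to run classification experiments on the MIRFlickr-25000 dataset with existing classification methods, namely SVM and Ridge Regression. Requirements: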
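- Dataset: MIRFlickr-25000 http://press.liacs.nl/mirflickr/
- Features: Caffe http://caffe.berkeleyvision.org/
- Classifiers: SVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) and Ridge Regression (Matlab should already include it; if not, search Google for an implementation)
- Evaluation: AUC (area under the ROC curve) and average precision (AP) (note: this is AP for classification, not the AP used in retrieval)
- Environment: Matlab and Unix
- Experiments:
6.1 Effect of the training set size on the results;
6.2 Effect of the model parameters on the results;
6.3 Comparison between the two methods;
6.4 Performance on each individual class.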
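Both evaluation criteria can be computed from one real-valued score per test image. The following is a minimal NumPy sketch, assuming the labels are the 0/1 y vectors built later and the scores come from either classifier; the function names auc and average_precision are hypothetical. AUC uses the rank (Mann-Whitney) formulation, and AP is the mean of the precision values at the rank of each positive image over the whole ranked test set, which is one common definition of classification AP (PASCAL VOC, for instance, used an interpolated variant, so the assignment's exact definition may differ).

```python
import numpy as np


def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    n_pos = int(y.sum())
    n_neg = y.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('AUC needs both positive and negative images')
    # Average 1-based rank of every score; tied scores share their mean rank
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    ranks = (last_rank - (counts - 1) / 2.0)[inverse]
    # U statistic of the positives, normalised to [0, 1]
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive, over the full ranking."""
    y = np.asarray(y_true).astype(bool)
    if not y.any():
        raise ValueError('AP needs at least one positive image')
    # Rank images by decreasing score (stable sort keeps input order for ties)
    order = np.argsort(-np.asarray(scores, dtype=float), kind='mergesort')
    hits = y[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1, dtype=float)
    return precision_at_k[hits].mean()
```

Calling both functions once per class (for example on y_people and the scores for people) gives the per-class numbers asked for in 6.4; averaging over the 24 classes gives a single figure for comparing the two methods.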
II. Steps (updating)
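1. Install Caffe together with its Python interface: http://caffe.berkeleyvision.org/installation.html#compilation
- Note: the virtual machine cannot access the GPU, so CUDA is not needed. Before compiling, uncomment the line CPU_ONLY := 1 in caffe-master/Makefile.config.
- The Python interface depends on many packages, so installing Anaconda is recommended. After installing it, be sure to add anaconda/bin to PATH, edit Makefile.config so that it uses Anaconda's Python paths, and add caffe-master/python to PYTHONPATH.
- After make all, also run make pycaffe and make distribute.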
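Since a GPU build fails on the virtual machine and the Anaconda paths are easy to get wrong, it may help to check the configuration before running make all. This is a hypothetical helper, not part of Caffe: it assumes Makefile.config uses the NAME := value syntax of Caffe's Makefile.config.example, with disabled settings commented out by a leading #, and it only reports problems rather than fixing them.

```python
import os


def active_settings(config_text):
    """Return the uncommented 'NAME := value' settings of a Makefile.config."""
    settings = {}
    # Join backslash-continued lines (e.g. a PYTHON_INCLUDE spanning two lines)
    text = config_text.replace('\\\n', ' ')
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and commented-out lines
        if ':=' in line:
            name, value = line.split(':=', 1)
            settings[name.strip()] = ' '.join(value.split())
    return settings


def check_cpu_build(config_path, caffe_root):
    """List problems that would break a CPU-only pycaffe build on the VM."""
    with open(config_path) as f:
        settings = active_settings(f.read())
    problems = []
    if settings.get('CPU_ONLY') != '1':
        problems.append('CPU_ONLY := 1 is not active; the build will look for CUDA')
    python_dir = os.path.join(caffe_root, 'python')
    if python_dir not in os.environ.get('PYTHONPATH', '').split(os.pathsep):
        problems.append(python_dir + ' is not on PYTHONPATH')
    return problems
```

An empty list from check_cpu_build('caffe-master/Makefile.config', 'caffe-master') (with absolute paths in practice) means both settings described above are in place.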
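2. Use a model already trained in Caffe to try out feature extraction, following the tutorial: http://caffe.berkeleyvision.org/gathered/examples/feature_extraction.html
3. Put the MIRFlickr-25k images into caffe-master/examples/images, then extract features with the pretrained model. Take the 4096 features of the fc7 layer. Set batch_size (in the data layer of the prototxt) and the number of mini-batches (the second-to-last argument of the extraction command) so that batch_size × number of mini-batches = 25000, the total number of images. Finally, change the last argument from lmdb to leveldb, so that a LevelDB database is generated instead of an LMDB one, which is easier to read from Python.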
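Step 3 can be scripted roughly as follows. This is a hedged sketch: write_image_lists, num_mini_batches and extract_features_cmd are hypothetical helpers, and the file names temp.txt and file_list.txt, the model path models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel and the argument order (model, prototxt, blob name, output database, number of mini-batches, database type) are assumed to follow the Caffe feature-extraction tutorial, so they should be checked against the tutorial for your Caffe version. batch_size itself still has to be set in the data layer of the prototxt.

```python
import os


def write_image_lists(image_dir, temp_txt, file_list_txt):
    """Write temp.txt (one image path per line) and file_list.txt (path + label 0).

    Mirrors the tutorial's find/sed step. Any order works, as long as the y
    vectors are later built from this same temp.txt.
    """
    names = sorted(n for n in os.listdir(image_dir) if n.lower().endswith('.jpg'))
    paths = [os.path.abspath(os.path.join(image_dir, n)) for n in names]
    with open(temp_txt, 'w') as f:
        f.writelines(p + '\n' for p in paths)
    with open(file_list_txt, 'w') as f:
        # The image-list input expects "path label"; the dummy label is unused here
        f.writelines(p + ' 0\n' for p in paths)
    return len(paths)


def num_mini_batches(total_images, batch_size):
    """Number of mini-batches such that batch_size * batches == total_images."""
    if total_images % batch_size != 0:
        raise ValueError('choose a batch_size that divides %d' % total_images)
    return total_images // batch_size


def extract_features_cmd(caffe_root, prototxt, out_db, batches):
    """Argument list for extract_features.bin: fc7 features into a LevelDB."""
    return [os.path.join(caffe_root, 'build/tools/extract_features.bin'),
            os.path.join(caffe_root, 'models/bvlc_reference_caffenet/'
                         'bvlc_reference_caffenet.caffemodel'),
            prototxt, 'fc7', out_db, str(batches), 'leveldb']
```

For 25000 images, a batch_size of 50 gives 500 mini-batches; the resulting list could then be run with subprocess.check_call(cmd, cwd=caffe_root).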
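4. After extraction, the features are stored (in my setup) in /home/username/Downloads/caffe-master/examples/_temp/features. Next, turn them into a feature matrix and label vectors:
- Read the contents of the LevelDB database into a matrix X of size 25000×4096, in which each row corresponds to one image.
- To build the labels, download the MIRFlickr annotation files and copy them onto the virtual machine.
- Build 24 y vectors, one per class (people, animals, ...); each y is a 25000×1 column vector. Taking the people class as an example: if the i-th image (row i of X) contains people, i.e. it is listed in the annotation file people.txt, then row i of y_people is 1; otherwise it is 0.
- The code for extracting X is as follows: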
```python
import leveldb                      # py-leveldb bindings
import caffe
from caffe.proto import caffe_pb2

# dbName: where extract_features.bin stored the LevelDB database
dbName = '/home/chenweilun/_temp/features'
# featureFile: where the X matrix is written (one image per line, 4096 values)
featureFile = '/home/chenweilun/_temp/data'

db = leveldb.LevelDB(dbName)
features = {}
for key, value in db.RangeIter():
    # Each value is a serialized Datum holding one image's fc7 vector
    datum = caffe_pb2.Datum.FromString(value)
    # datum_to_array returns shape (4096, 1, 1); flatten it to a 4096-vector
    features[int(key)] = caffe.io.datum_to_array(datum).flatten()
count = len(features)

# LevelDB iterates keys in byte (string) order, e.g. '10' before '2',
# so sort the integer keys again to restore the image order
with open(featureFile, 'w') as output:
    for k in sorted(features):
        if k > count - 1:           # ignore any key beyond the expected range
            break
        # '%.9g' keeps full float32 precision
        output.write(' '.join('%.9g' % v for v in features[k]) + '\n')
```
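The re-sorting and the resulting text file can be checked without Caffe or LevelDB. The sketch below is a pair of hypothetical helpers that are not from the original scripts: stack_features applies the same numeric re-sort to a dict keyed by raw LevelDB key strings and stacks the fc7 arrays into a matrix, and load_x reads the file written by the script above back into NumPy, assuming its space-separated, one-image-per-line layout, and checks the column count.

```python
import numpy as np


def stack_features(raw):
    """Stack {LevelDB key string: fc7 array} into an (N, D) matrix in image order."""
    # int() turns '10' into 10, so it sorts after '2' instead of before it
    keys = sorted(raw, key=int)
    return np.vstack([np.ravel(raw[k]) for k in keys])


def load_x(path, dim=4096):
    """Read the space-separated X file and check that each row has dim values."""
    X = np.loadtxt(path, ndmin=2)
    if X.shape[1] != dim:
        raise ValueError('expected %d columns, got %d' % (dim, X.shape[1]))
    return X
```

Parsing 25000 × 4096 numbers from text is slow; saving X once with np.save and reloading it with np.load is a faster alternative for repeated experiments.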
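- The code for generating the y vectors is as follows. DBlist is the path of the temp file (temp.txt) written during feature extraction; it supplies the image numbers in the same order as the rows of X. For each annotation class classX (e.g. animals), the script reads classX.txt from the annotation folder and writes y_classX to the output folder.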
```python
import os

# temp.txt written during feature extraction: one image path per line,
# in the same order as the rows of X
DBlist = '/home/chenweilun/_temp/temp.txt'
# Intermediate list of image numbers (im123.jpg -> 123), one per line
imagelist = '/home/chenweilun/_temp/imagelist'
ann_dir = '/home/chenweilun/MIRFLICKR-25000 Annotation'
out_dir = os.path.join(ann_dir, 'output')

classes = ['animals', 'baby', 'bird', 'car', 'clouds', 'dog', 'female',
           'flower', 'food', 'indoor', 'lake', 'male', 'night', 'people',
           'plant_life', 'portrait', 'river', 'sea', 'sky', 'structures',
           'sunset', 'transport', 'tree', 'water']


def generate_temp(db_list_path, out_path):
    """Extract the image number from each path and write them in order."""
    ids = []
    with open(db_list_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Look only at the file name, so a directory such as 'images'
            # cannot be mistaken for the 'im' prefix
            name = os.path.basename(line)
            ids.append(name[name.find('im') + 2:name.find('.')])
    with open(out_path, 'w') as f:
        f.writelines(i + '\n' for i in ids)
    return ids


def generate_y(class_name, ids):
    """Write y_class: line i is 1 if image i is listed in <class>.txt, else 0."""
    with open(os.path.join(ann_dir, class_name + '.txt')) as f:
        positives = set(line.strip() for line in f if line.strip())
    with open(os.path.join(out_dir, class_name), 'w') as out:
        for image_id in ids:
            out.write('1\n' if image_id in positives else '0\n')


if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
ids = generate_temp(DBlist, imagelist)
for class_name in classes:
    generate_y(class_name, ids)
```
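With X and the 24 y vectors ready, the next step is to feed them to the two classifiers. The sketch below is an assumption about how that could look, not part of the original scripts, and the function names are hypothetical. write_libsvm converts one class into libsvm's sparse text format (a label followed by 1-based index:value pairs, zero entries omitted), which svm-train reads. ridge_fit and ridge_scores implement ridge regression in closed form, w = (X_cᵀX_c + λI)⁻¹X_cᵀ(t − t̄) on ±1 targets with an unpenalised bias, as a NumPy stand-in for the Matlab implementation mentioned in the requirements; lam is the λ that experiment 6.2 would vary.

```python
import numpy as np


def write_libsvm(path, X, y):
    """Write X (N x D) and 0/1 labels y in libsvm's sparse text format."""
    with open(path, 'w') as f:
        for row, label in zip(np.asarray(X, dtype=float), y):
            # libsvm feature indices are 1-based; zero entries may be omitted
            feats = ['%d:%.9g' % (j + 1, v) for j, v in enumerate(row) if v != 0]
            f.write(' '.join(['+1' if int(label) == 1 else '-1'] + feats) + '\n')


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on targets in {-1, +1}, bias not penalised."""
    X = np.asarray(X, dtype=float)
    t = 2.0 * np.asarray(y, dtype=float) - 1.0   # map 0/1 labels to -1/+1
    mu, t_mean = X.mean(axis=0), t.mean()
    Xc = X - mu                                  # centring keeps the bias out of the penalty
    A = Xc.T.dot(Xc) + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T.dot(t - t_mean))
    return w, t_mean - mu.dot(w)


def ridge_scores(X, w, b):
    """Real-valued scores; larger means more likely to belong to the class."""
    return np.asarray(X, dtype=float).dot(w) + b
```

For example, a training file for people could be written from the first n rows of X and y_people (varying n for experiment 6.1) and passed to libsvm as svm-train -t 0 -c 1 train.txt people.model, i.e. a linear kernel with cost C = 1, values chosen only for illustration. The ridge scores can go straight into the AUC and AP computation.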