Reading Various File Formats in Python

(Continuously updated…)

Mr-Cat伍可猫


Mr-Cat伍可猫 · Published 2018-12-06 15:44:02


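0 The general-purpose approach: open

0.1 Chinese characters are not recognized when reading

Key points:

Read the file in binary mode, 'rb'.
Decode the bytes as UTF-8.
open can both read and write files; see the reference on reading and writing with open.
open syntax (Python 3):

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

[Figure: table describing the open parameters]
Example: suppose there is a file new/test.txt that contains Chinese text.

Print it with each of the three read methods, read(), readline(), and readlines():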
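1) read(): reads the entire file
In 'rb' mode it returns a bytes object (in text mode, a str).

with open('new/test.txt', 'rb') as f:
    a = f.read()
print(a)                  # raw bytes
print(a.decode('utf-8'))  # decoded text

Note that the bytes must be decoded as UTF-8 for the Chinese text to print as expected.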
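2) readline(): reads a single line
In 'rb' mode it returns a bytes object.

with open('new/test.txt', 'rb') as f:
    a = f.readline()
print(a)                  # raw bytes of the first line
print(a.decode('utf-8'))  # decoded text

As before, the result must be decoded as UTF-8.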
3) readlines(): reads the file line by line
Returns a list in which each line of the file is one element (bytes objects in 'rb' mode).

with open('new/test.txt', 'rb') as f:
    a = f.readlines()
for i in a:
    print(i.decode('utf-8'))
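As an alternative to reading bytes and decoding by hand, the file can be opened in text mode with an explicit encoding. The following is a minimal sketch, not part of the original post; the sample text and the temporary path are assumptions used only to make it self-contained.

```python
import os
import tempfile

# Hypothetical scratch file standing in for new/test.txt
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write('你好，世界\n第二行\n'.encode('utf-8'))

# Binary mode: read() returns bytes, which must be decoded by hand
with open(path, 'rb') as f:
    raw = f.read()
decoded = raw.decode('utf-8')

# Text mode with an explicit encoding: read() returns str directly
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

print(type(raw).__name__, type(text).__name__)
print(text == decoded)
```

Passing encoding='utf-8' avoids depending on the platform's default encoding, which is the usual reason Chinese text comes out garbled.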

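0.2 Writing, including Chinese text

1) Writing English text
This uses a+ mode (append plus read), so new content is added at the end of the file; the \n starts a new line before the text. In w mode the file is truncated first, so writing starts from an empty file, while in r+ mode writing starts at the first line and overwrites the existing content.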

with open('new/test.txt', 'a+') as f:   # append at the end of the file
    f.write('\nabcd')
with open('new/test.txt', 'rb') as f:   # read back to check the result
    w = f.read().decode('utf-8')
print(w)
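The difference between the a, r+, and w modes can be checked with a small experiment. This is an illustrative sketch only; the scratch file name and its contents are made up for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical scratch file

def read_back(p):
    """Return the current contents of the file as a str."""
    with open(p, 'r', encoding='utf-8') as f:
        return f.read()

with open(path, 'w', encoding='utf-8') as f:   # 'w' creates the file (or truncates it)
    f.write('12345')

with open(path, 'a', encoding='utf-8') as f:   # 'a' always writes at the end
    f.write('ab')
after_append = read_back(path)

with open(path, 'r+', encoding='utf-8') as f:  # 'r+' starts at position 0 and overwrites in place
    f.write('XY')
after_rplus = read_back(path)

with open(path, 'w', encoding='utf-8') as f:   # 'w' discards the old contents first
    f.write('Z')
after_w = read_back(path)

print(after_append)  # 12345ab
print(after_rplus)   # XY345ab
print(after_w)       # Z
```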
2) Writing Chinese text
Pass encoding='utf-8' to open.

with open('new/test.txt', 'a+', encoding='utf-8') as f:
    f.write('\n今天天气很好')
with open('new/test.txt', 'rb') as f:   # read back to check the result
    w = f.read().decode('utf-8')
print(w)

3) Reading from file A and writing to file B
Suppose a file named bibtex.bib has the following contents:

@article{Akrami:2018mcd,
      author         = "Akrami, Y. and others",
      title          = "{Planck 2018 results. IV. Diffuse component separation}",
      collaboration  = "Planck",
      year           = "2018",
      eprint         = "1807.06208",
      archivePrefix  = "arXiv",
      primaryClass   = "astro-ph.CO",
      SLACcitation   = "%%CITATION = ARXIV:1807.06208;%%"
}

The goal is to extract the 'author', 'title', 'year', and 'eprint' lines into the file test.txt.

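vocab = ['author', 'title', 'year', 'eprint']  # field names to extract

with open('demo1/bibtex.bib', 'r', encoding='utf-8') as f:  # open the source file for reading
    f_list = f.readlines()      # cache every line in a list

# A list comprehension or two nested for loops would also work:
# [save_f.write(i) for i in f_list for x in vocab if x in i]
with open('demo1/test.txt', 'w', encoding='utf-8') as save_f:  # file that receives the data
    for i in f_list:
        if any(x in i for x in vocab):  # write each matching line once
            save_f.write(i)

# Check the saved file; the with block above has already closed and flushed it
with open('demo1/test.txt', 'rb') as test_f:
    print(test_f.read().decode('utf-8'))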

The resulting test.txt reads:

      author         = "Akrami, Y. and others",
      title          = "{Planck 2018 results. IV. Diffuse component separation}",
      year           = "2018",
      eprint         = "1807.06208",

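Note that a list comprehension may be slightly faster than a for loop, but the gain is small. As a Stack Overflow answer puts it, moving the work to C helps far more: where a list comprehension might give a 15% improvement, C might give 300%.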
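The speed difference can be measured with timeit from the standard library. The sketch below uses made-up input lines and search words; actual timings depend on the machine and Python version, so none are quoted here.

```python
import timeit

# Made-up input: 1000 lines and three search words (assumptions for the demo)
lines = [f'field{i} = "value{i}",\n' for i in range(1000)]
vocab = ['field1', 'field2', 'field3']

def with_loop():
    """Collect matching lines with two nested for loops."""
    out = []
    for line in lines:
        for word in vocab:
            if word in line:
                out.append(line)
    return out

def with_comprehension():
    """Collect the same lines with a list comprehension."""
    return [line for line in lines for word in vocab if word in line]

same_result = with_loop() == with_comprehension()  # both build the same list

print('for loop     :', timeit.timeit(with_loop, number=200))
print('comprehension:', timeit.timeit(with_comprehension, number=200))
```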

1. Reading Excel files

Suppose there is a file named student_score.xlsx whose contents need to be read.

I) Reading with Python:

a) Using the xlrd library

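import xlrd   # third-party library; xlrd 2.0+ reads only .xls, so .xlsx needs xlrd<2.0 or another library such as openpyxl

workbook = xlrd.open_workbook('C:/users/lenovo/desktop/student_score.xlsx')  # path to the file
sheet = workbook.sheet_by_name('Sheet1')     # select the sheet named 'Sheet1'

data_name = sheet.col_values(0)    # read by column: the first column
#data_name1 = sheet.row_values(0)  # read by row: the first row
data_st_ID = sheet.col_values(1)
data_st_score = sheet.col_values(2)

Each variable now holds one column of the sheet as a list.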

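2. Reading CSV files

A CSV file is a comma-separated text file. For example, the Excel file above can be saved as a CSV file and then opened as shown below.

I) Reading with Python:

Either the with open('xxx') as f form or a plain open call works.

a) with open('xxx') as f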

with open('C:/users/lenovo/desktop/student_score.csv','r') as f:
    for line in f:   # read line by line
        print(line)

b) open('xxx')

f = open('C:/users/lenovo/desktop/student_score.csv','r')
for line in f:
    print(line)
f.close()   # without a with block, the file must be closed explicitly
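Reading line by line returns each row as one raw string. To split rows into fields, the standard library's csv module can be used. This is a sketch under assumptions: the column names and sample rows below are invented stand-ins for student_score.csv, not its real contents.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for student_score.csv; column names and rows are assumed
path = os.path.join(tempfile.mkdtemp(), 'student_score.csv')
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'student_id', 'score'])
    writer.writerow(['Alice', '1001', '90'])
    writer.writerow(['Bob', '1002', '85'])

# csv.reader yields each row as a list of strings
with open(path, 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

# csv.DictReader uses the header row as dictionary keys
with open(path, 'r', encoding='utf-8', newline='') as f:
    records = list(csv.DictReader(f))

scores = [int(r['score']) for r in records]
print(rows[0])   # ['name', 'student_id', 'score']
print(scores)    # [90, 85]
```

Opening the file with newline='' is what the csv module documentation recommends, so that line endings inside quoted fields are handled by the module itself.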

3. Reading txt files

numpy.loadtxt

import numpy as np

dataset = np.loadtxt('path')  # replace 'path' with the path to the file
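np.loadtxt also accepts a few commonly needed options. The sketch below uses in-memory text with made-up numbers in place of a real file, purely for illustration.

```python
import io
import numpy as np

# In-memory text standing in for a file on disk; the values are made up
text = io.StringIO('# x y z\n1 2 3\n4 5 6\n')
dataset = np.loadtxt(text)   # whitespace-delimited; lines starting with '#' are treated as comments
print(dataset.shape, dataset.dtype)

# Comma-separated data with a header row, read as integers
csv_text = io.StringIO('x,y,z\n1,2,3\n4,5,6\n')
as_int = np.loadtxt(csv_text, delimiter=',', skiprows=1, dtype=int)
print(as_int)
```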

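Using with open

Reading everything at once

with open('my_file.txt') as file_object:
    contents = file_object.read()  # read the whole file at once
    print(contents)

Reading line by line

with open('my_file.txt') as f:
    for line in f:      # read line by line
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing \n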
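One detail worth checking is what strip() actually removes. The short sketch below, with made-up sample strings, shows that strip() also drops leading indentation, while rstrip('\n') removes only the newline, and that blank lines have to be filtered out explicitly.

```python
line = '    indented text  \n'

print(repr(line.strip()))        # leading and trailing whitespace removed
print(repr(line.rstrip('\n')))   # only the trailing newline removed

# A blank line becomes an empty string after strip(); it is not skipped automatically
lines = ['first\n', '\n', 'second\n']
non_empty = [l.strip() for l in lines if l.strip()]
print(non_empty)
```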

4. Writing arbitrary files with format

import sys 
sys.path.append('..')

import numpy as np 

global_config='''
This is an example of
how to write a file with 
{global_name}
'''

layer_config='''
layer {layername}
P {pbase:.4f} mbar
T {tbase:.6f} K
h {height:.3f} m
column {rh:.6f}%
'''

with open('example.txt','w+') as config_f:   # the with block closes the file when done
    config_f.write(global_config.format(global_name='Python'))
    config_f.write(layer_config.format(layername='layer1',pbase=0.45236,tbase=3.461346,height=123.613476,rh=9.2385723))

This produces the file example.txt.
The last write can also be put in a for loop to write several layers:

import sys 
sys.path.append('..')

import numpy as np 

global_config='''
This is an example of
how to write a file with 
{global_name}
'''

layer_config='''
layer {layername}
P {pbase:.4f} mbar
T {tbase:.6f} K
h {height:.3f} m
column {rh:.6f}%
'''

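with open('example.txt','w+') as config_f:   # the with block closes the file when done
    config_f.write(global_config.format(global_name='Python'))
    for layer in range(1,4):
        config_f.write(layer_config.format(layername=f'layer{layer}',pbase=0.45236,tbase=3.461346,height=123.613476,rh=9.2385723))

In the output, each block is preceded by a blank line, because the text inside ''' starts on the line after the opening quotes. To output integers instead, convert the arguments to integers, for example loadtxt('xxx.txt').astype(int), or convert pbase, tbase, and so on before writing them, and change the format spec to match (for example {pbase:d}), since a .4f spec still prints decimals for an integer.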

import sys 
sys.path.append('..')

import numpy as np 
import csv

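# The text starts right after the opening ''' so the output has no leading blank line
global_config='''This is an example of
how to write a file with 
{global_name}
'''

# Same here: no line break after the opening '''
layer_config='''layer {layername}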
P {pbase:.2f} mbar
T {tbase:.2f} K
h {height:.3f} m
column {rh:.6f}%
'''

config_f = open('example.txt','w+',newline='')
config_f.write(global_config.format(global_name='Python'))
for layer in range(1,4):
    config_f.write(layer_config.format(layername=f'layer{layer}',pbase=45.236,tbase=3.461346,height=123.613476,rh=9.2385723))
config_f.close()
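Two related details can be sketched briefly: a backslash right after the opening ''' also suppresses the leading blank line while keeping the template body on its own lines, and integer output needs an integer format spec. The template below is a shortened, hypothetical variant of layer_config, not the one used above.

```python
# A backslash directly after ''' suppresses the first newline,
# so the template body can still start on its own line.
layer_config = '''\
layer {layername}
P {pbase:d} mbar
T {tbase:.2f} K
'''

# pbase is converted to int to match the :d spec
text = layer_config.format(layername='layer1', pbase=int(45.236), tbase=3.461346)
print(text, end='')

# A float spec applied to an int still prints decimals
as_float = '{:.2f}'.format(45)
print(as_float)  # 45.00
```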