当前位置: 动力学知识库 > 问答 > 编程问答 >

excel - Can pandas implicitly determine header based on value, not row?

问题描述:

I work with people who use Excel and continuously add or subtract rows unbeknownst to me. I have to scrape a document for data, and the row where the header is found changes based on moods.

My challenge is to handle these oscillating currents by detecting where the header is.

I first organized my scrape using xlrd and a number of conditional statements using the values in the workbook.

My initial attempt works and is long (so I will not publish it) but involves bringing in the entire sheet, and not slices:

from xlrd import open_workbook

book = open_workbook(fName)

sheet = book.sheet_by_name(sht)

return book,sheet

However, it is big and I would prefer to get a more targeted selection. The header values never change, nor does when the data shows up after this row.

Do you know of a way to implicitly get the header based on a found value in the sheet using either pandas.ExcelFile or pandas.read_excel?

Here is my attempt with pandas.ExcelFile:

import pandas as pd

xlsx = pd.ExcelFile(fName)

dataFrame = pd.read_excel(xlsx, sht,

parse_cols=21, merge_cells=noMerge,

header=header)

return dataFrame

I cannot get the code to work unless I give the call the correct header value, which is exactly what I'm hoping to avoid.

This previous question seems to present a similar problem without addressing the concern of finding the headers implicitly.

网友答案:

Do the same loop through ExcelFile objects:

xlsx = pd.ExcelFile(fName)
sheet = xlsx.sheet_by_name(sht)
# apply the same algorithm you wrote against xlrd here
# ... results in having header_row = something, 0 based
dataFrame = pd.read_excel(xlsx, sht,
                      parse_cols=21, merge_cells=noMerge, 
                      skip_rows=header_row)
分享给朋友:
您可能感兴趣的文章:
随机阅读: