I work with people who use Excel and continuously add or subtract rows unbeknownst to me. I have to scrape a document for data, and the row where the header is found changes based on moods.
My challenge is to handle these oscillating currents by detecting where the header is.
I first organized my scrape using
xlrd and a number of conditional statements using the values in the workbook.
My initial attempt works and is long (so I will not publish it) but involves bringing in the entire sheet, and not slices:
from xlrd import open_workbook
book = open_workbook(fName)
sheet = book.sheet_by_name(sht)
However, it is big and I would prefer to get a more targeted selection. The header values never change, nor does when the data shows up after this row.
Do you know of a way to implicitly get the header based on a found value in the sheet using either
Here is my attempt with
import pandas as pd
xlsx = pd.ExcelFile(fName)
dataFrame = pd.read_excel(xlsx, sht,
I cannot get the code to work unless I give the call the correct header value, which is exactly what I'm hoping to avoid.
This previous question seems to present a similar problem without addressing the concern of finding the headers implicitly.
Do the same loop through ExcelFile objects:
xlsx = pd.ExcelFile(fName) sheet = xlsx.sheet_by_name(sht) # apply the same algorithm you wrote against xlrd here # ... results in having header_row = something, 0 based dataFrame = pd.read_excel(xlsx, sht, parse_cols=21, merge_cells=noMerge, skip_rows=header_row)