windows - Python EOL issue in text file

Question:

G'day all,

I have a text file that was extracted from the comments field of a Geographic Information Systems (GIS) app (name withheld). I need to parse this text and produce a pretty report. Each line in the text is terminated with Carriage Return/Linefeed (0x0D 0x0A). However, some of the lines also contain a bare newline within the body of the text. I'm not sure how this happens, and the cause is irrelevant; I just need to deal with it. My text looks like this (data changed, but the basic idea is the same):

this is line 01
this is line 02
this is line 03
and it contains a newline after the 03 character string
this is line 04

I can't represent the text file correctly in this post because my cut and paste is stripping the CR/LFs out, but there is a CR/LF after each "line 0?" string. This posting mechanism doesn't permit attaching files, or I'd attach this short text file.

I need to read each whole line up to CR/LF and print it out.

Lines 1 and 2 print OK.

Line 3 prints up to the first 03.

So when I read this with the following snippet --

import sys
import os

if __name__ == '__main__':
    if sys.version_info >= (3, 0):
        print ("script: EOL_Python_test.py");
        print ("Python version: " + str(sys.version_info));
        # vars
        input_file = r"EOL_test_file.txt";
        input_data_line = "";
        line_number = 0;
        output_line = "";
        # end vars def
        if os.path.isfile(input_file):
            output_line = "processing file: " + input_file + "\n";
            print (output_line);
            original_file = open(input_file)
            input_data_line = original_file.readline().strip("\r\n")
            while input_data_line != "":
                line_number = line_number + 1;
                output_line = "line #:" + str(line_number) + " " + \
                    str(input_data_line);
                print (output_line)
                input_data_line = original_file.readline().strip("\r\n")
                # regex for replacing EOL with newline? "\r\n?|\n"
            original_file.close();
    else:
        print ("must run on Python 3+, now exiting...");
        exit;

everything prints OK except the 3rd and 4th lines. Line 3 prints the 3rd line up to the newline. Line 4 prints the remainder of the 3rd line. The program then continues on, adding an extra line to the line count variable and of course, printing one too many lines.

So... why does Python break on both the newline and the carriage return/newline combo when reading a text file?

Is there a way I can remove the newline before I issue the readline() call? Use a regex?

Ideas?

ty, Glen

Answer:

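You can read the whole file into one string and then split it on '\r\n'. In Python 3, open the file with newline='' first; otherwise text mode translates every '\r\n' to '\n' as it reads, which is why readline() stops at both, and there is no '\r\n' left to split on:

original_file = open(input_file, newline='')
input_data = original_file.read().split('\r\n')
for line in input_data:
    ...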
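
As a minimal sketch of the split-on-'\r\n' idea, assuming Python 3, the block below writes a small sample file shaped like the one in the question and reads it back two ways. The sample bytes and the helper names read_default and read_records are made up for illustration. It relies on open() with newline='' leaving CR/LF untranslated, whereas the default text mode treats both CR/LF and a bare LF as a line ending.

```python
import os
import tempfile

# Hypothetical sample mimicking the question: CR/LF ends each record,
# and the third record has a bare LF inside it.
SAMPLE = (
    b"this is line 01\r\n"
    b"this is line 02\r\n"
    b"this is line 03\nand it contains a newline after the 03 character string\r\n"
    b"this is line 04\r\n"
)


def read_default(path):
    """Read with default text mode, where CR/LF and bare LF both end a line."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def read_records(path):
    """Read with newline='' (no translation) and split on CR/LF only."""
    with open(path, newline="") as f:
        records = f.read().split("\r\n")
    # A file that ends with CR/LF leaves one empty string at the end; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "EOL_test_file.txt")
        with open(path, "wb") as f:
            f.write(SAMPLE)
        print(len(read_default(path)))  # 5: the bare LF splits record 3 in two
        print(len(read_records(path)))  # 4: only CR/LF separates records
```

The third record still contains the embedded '\n' after splitting, so it would need a replace('\n', ' ') or similar before printing on one line.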

But remember that reading the whole file into memory this way is not efficient for big files.
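
For big files, one possible alternative (a sketch, not what the answer above proposes) is to let open() do the splitting: with newline='\r\n', Python 3 text mode ends a line only at CR/LF and returns it untranslated, so the file can be read one record at a time. The function name iter_records and the choice to replace the embedded newline with a space are assumptions made for illustration.

```python
import os
import tempfile


def iter_records(path):
    """Yield CR/LF-terminated records one at a time, without loading the whole file."""
    # newline="\r\n" means only CR/LF ends a line; a bare LF stays in the record.
    with open(path, newline="\r\n") as f:
        for line in f:
            # Remove just the CR/LF terminator, if present.
            if line.endswith("\r\n"):
                line = line[:-2]
            # Assumption: an embedded newline should become a single space.
            yield line.replace("\n", " ")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "EOL_test_file.txt")
        with open(path, "wb") as f:
            f.write(b"this is line 01\r\nthis is line 02\r\nthis is line 03\nrest of 03\r\n")
        for line_number, record in enumerate(iter_records(path), start=1):
            print("line #:" + str(line_number) + " " + record)
```

This keeps the line counter in step with the CR/LF-terminated records, which is the behaviour the question asks for.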
