I want to look up the strings listed in list.txt (one string per line) in another text file. If a string is found, I want to print 'string,one_sentence'; if it is not found, 'string,another_sentence'. I'm using the following code, but it only finds the last string from list.txt. What could be the reason?
data = open('c:/tmp/textfile.TXT').read()
for x in open('c:/tmp/list.txt').readlines():
if x in data:
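When you read a file with readlines(), each element of the resulting list keeps its trailing newline character. That is most likely why you get fewer matches than you expected: 'string\n' only matches where the string is immediately followed by a line break in the other file. The last line of list.txt presumably has no trailing newline, which would explain why only that one is found.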
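A minimal sketch of the effect, using an in-memory io.StringIO object as a stand-in for list.txt. The sample strings are made up for illustration, and the last line deliberately has no trailing newline:

```python
import io

# Hypothetical stand-ins for the contents of the two files.
data = "alpha beta gamma"
list_file = io.StringIO("alpha\nbeta\ngamma")  # no newline after the last line

lines = list_file.readlines()
print(lines)    # ['alpha\n', 'beta\n', 'gamma']

# Only the element without a trailing newline is found in data.
matches = [x for x in lines if x in data]
print(matches)  # ['gamma']
```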
Instead of writing
for x in list:
write
for x in (s.strip() for s in list):
This removes leading and trailing whitespace from the strings in list. Hence, it also removes the trailing newline characters.
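As a rough illustration of the stripped loop, here is a sketch with made-up sample data. The name lines is used instead of list to avoid shadowing the built-in, and io.StringIO stands in for the real file:

```python
import io

# Hypothetical sample data, not taken from the real files.
data = "alpha beta gamma"
lines = io.StringIO("alpha\nbeta\ndelta\n").readlines()

# Strip each element before testing it against the text.
found = [x for x in (s.strip() for s in lines) if x in data]
print(found)  # ['alpha', 'beta']
```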
In order to consolidate your program, you could do something like this:
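import sys

# Read the text to be searched in one go.
with open('c:/tmp/textfile.TXT') as f:
    haystack = f.read()

if not haystack:
    sys.exit("Could not read haystack data :-(")

with open('c:/tmp/list.txt') as f:
    # Strip each line on the fly so the trailing newline does not spoil the match.
    for needle in (line.strip() for line in f):
        if needle in haystack:
            # Concatenate so that no space is printed before the comma.
            print(needle + ',one_sentence')
        else:
            print(needle + ',another_sentence')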
I did not want to make overly drastic changes. The most important difference is that I am using a context manager here via the with statement. It ensures proper file handling (mainly closing) for you. Also, the 'needle' lines are stripped on the fly using a generator expression. This approach reads and processes the needle file line by line instead of loading the whole file into memory at once. Of course, this only makes a difference for large files.
readlines() keeps the newline character at the end of each string read from your list file. Call strip() on those strings to remove it (along with any other leading or trailing whitespace).
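One caveat, sketched below with a made-up sample line: strip() with no arguments also removes leading and trailing spaces and tabs. If the search strings may contain meaningful surrounding spaces, rstrip('\n') is an alternative (not mentioned above) that removes only the trailing newline:

```python
# Hypothetical line as readlines() might return it.
line = "  two words \n"

stripped = line.strip()           # drops all leading/trailing whitespace
newline_only = line.rstrip("\n")  # drops only the trailing newline

print(repr(stripped))      # 'two words'
print(repr(newline_only))  # '  two words '
```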