当前位置: 动力学知识库 > 问答 > 编程问答 >

postgresql - Python code randomly freezes

问题描述:

I am going through a dataset and trying to calculate features from a database (postgres).

The issue is that every now and then, the program gets stuck somewhere (confirmed from database logs, which I enabled, where no new queries take place for a long time), and when I press ctrl+c, the program seems to resume normally (I've not yet confirmed if the calculations are correct since there are a lot of rows). It doesn't get stuck at the same location, but it seems to have a random pattern. Any idea what I might be doing wrong?

I have 2 files main.py and NAC.py.

main.py:

import NAC

from dateutil.parser import parse

from datetime import timedelta

rows = fc.Read_CSV_to_Dict(input_file) #just a wrapper around csv.Dictreader

i=0

start_time = time.time()

for row in rows : #rows has about 600,000 rows

ret1,ret2 = NAC.function(row['key1'], ...) #and other parameters

#new keys

row['newKey1'],row['newKey2'] = ret1

row['newKey3'],row['newKey4'] = ret2 #unpacking

i=i+1

if(i%10000==0): #progress monitor

print i

print (time.time()-start_time)/60

NAC.db_close()

NAC.py:

from dateutil.parser import parse

from datetime import timedelta

import psycopg2

import psycopg2.extras

def function(param1, ...):

"""

Returns:

2 element list, each a list by itself

"""

nsclist = [0]*param2_count

naclist = [0]*param2_count

for i in range(param2_count):

stime = (begintime + timedelta(seconds = 60*intervalPeriod * i))

etime = (begintime + timedelta(seconds = 60*intervalPeriod * (i+1)))

table1_query = "select sum(count)from table1 where column1= '{0}' and column2>'{1}'::TIMESTAMP WITH TIME ZONE and column2<='{2}'::TIMESTAMP WITH TIME ZONE"

cur.execute(sched_query.format(param1,stime,etime))

nsclist[i] = cur.fetchone()[0]

if(nsclist[i] == []):

nsclist[i] = 0

table2_query = "select sum(count)from table2 where column1 = '{0}' and column2 >'{1}'::TIMESTAMP WITH TIME ZONE and column2 <='{2}'::TIMESTAMP WITH TIME ZONE"

cur.execute(table2_query .format(param1,stime,etime))

naclist[i] = cur.fetchone()[0]

if(naclist[i] == []):

naclist[i] = 0

return nsclist, naclist

def db_close():

cur.close()

conn.close()

intervalPeriod = 5 #minutes

conn = psycopg2.connect(cs.local_connstr)

cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

DB LOG TIMESTAMPS:

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 ctl+c pressed (manually added... not in the log)

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...

网友答案:

It turns out that there was an issue with the cursor. I just had to open and close the cursor at every function call. I'm not sure why.

from dateutil.parser import parse
from datetime import timedelta
import psycopg2
import psycopg2.extras

def function(param1, ...):
    """     
    Returns:
        2 element list, each a list by itself
    """ 
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    nsclist = [0]*param2_count
    naclist = [0]*param2_count  
    for i in range(param2_count):
        table1_query = "select sum(count)from table1 where column1= '{0}' and column2>'{1}'::TIMESTAMP WITH TIME ZONE and column2<='{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(sched_query.format(param1,stime,etime))
        nsclist[i] = cur.fetchone()[0]
        if(nsclist[i] == []):
            nsclist[i] = 0
        table2_query = "select sum(count)from table2 where column1 = '{0}' and column2 >'{1}'::TIMESTAMP WITH TIME ZONE and column2 <='{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(table2_query .format(param1,stime,etime))
        naclist[i] = cur.fetchone()[0]
        if(naclist[i] == []):
            naclist[i] = 0
    cur.close()
    return nsclist, naclist

def db_close():
    conn.close()

intervalPeriod = 5 #minutes
conn = psycopg2.connect(cs.local_connstr)
分享给朋友:
您可能感兴趣的文章:
随机阅读: