
python - group data ranges by column value not zero

Question:

I have the following dataframe

            count
2015-09-28      2
2015-09-29      2
2015-09-30      0
2015-10-01      2
2015-10-02      3
2015-10-05      2
2015-10-06      1
2015-10-07      0
2015-10-08      1

I would like to group the rows into date ranges that are separated by the days where count == 0, and get something like this:

        first        last  totalcount
1  2015-09-28  2015-09-29           4
2  2015-10-01  2015-10-06           8
3  2015-10-08  2015-10-08           1

Answer:

Use cumsum to associate each row with a group number:

In [134]: df['groupno'] = (df['count'] == 0).cumsum()

In [135]: df
Out[135]: 
            count  groupno
2015-09-28      2        0
2015-09-29      2        0
2015-09-30      0        1
2015-10-01      2        1
2015-10-02      3        1
2015-10-05      2        1
2015-10-06      1        1
2015-10-07      0        2
2015-10-08      1        2
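
To see why this works, it may help to look at the intermediate boolean mask next to its running sum. The following is only a minimal sketch on the same counts; the names `counts`, `is_zero` and `group_ids` are made up for illustration. Each zero row is True in the mask, and since True counts as 1 in the cumulative sum, the group number goes up by one at every zero row and stays constant in between.

```python
import pandas as pd

# Same counts as in the question, without the date index (illustration only).
counts = pd.Series([2, 2, 0, 2, 3, 2, 1, 0, 1])

# True exactly on the separator rows (count == 0).
is_zero = counts == 0

# True is treated as 1, so the running sum increases at each separator row.
group_ids = is_zero.cumsum()

print(pd.DataFrame({'count': counts, 'is_zero': is_zero, 'groupno': group_ids}))
```

Note that each zero row is given the same number as the group that follows it, which is why the zero rows are filtered out before aggregating.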

Then you can use groupby/agg to obtain the desired result:

import pandas as pd
# The index holds the dates as plain strings, as in the question.
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=['2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01',
                         '2015-10-02', '2015-10-05', '2015-10-06', '2015-10-07',
                         '2015-10-08'])


mask = (df['count'] == 0)
df['groupno'] = mask.cumsum()
# Remove the rows where the count is 0
df = df.loc[~mask]
# Make the index a column so we can use 'index':['first', 'last'] to find the
# first and last date in each group.
df = df.reset_index()
result = df.groupby('groupno').agg({'index':['first', 'last'], 'count':'sum'})
result.columns = result.columns.droplevel(0)
result = result.rename(columns={'sum':'totalcount'})

yields (the column order may differ depending on the Python and pandas versions)

         totalcount       first        last
groupno                                    
0                 4  2015-09-28  2015-09-29
1                 8  2015-10-01  2015-10-06
2                 1  2015-10-08  2015-10-08
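
If the exact layout from the question is wanted (columns first, last, totalcount, with groups numbered from 1), one possible variation is sketched below. It is not part of the answer above: it assumes a pandas version that supports named aggregation (0.25 or later), converts the index to datetimes, and uses the illustrative names `date`, `nonzero` and `summary`.

```python
import pandas as pd

# Assumption: the index is converted to real datetimes rather than strings.
dates = pd.to_datetime(['2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01',
                        '2015-10-02', '2015-10-05', '2015-10-06', '2015-10-07',
                        '2015-10-08'])
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]}, index=dates)

# Group number = number of zero rows seen so far.
df['groupno'] = (df['count'] == 0).cumsum()

# Drop the separator rows and turn the index into a 'date' column.
nonzero = df[df['count'] != 0].rename_axis('date').reset_index()

# Named aggregation fixes both the column names and their order.
summary = nonzero.groupby('groupno').agg(
    first=('date', 'first'),
    last=('date', 'last'),
    totalcount=('count', 'sum'),
)

# Renumber the groups 1..n, as in the desired output.
summary.index = pd.RangeIndex(1, len(summary) + 1)

print(summary)
```

Renumbering at the end also covers data that starts with a zero or has several zeros in a row, where the raw group numbers would begin above 0 or skip values.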