
python - pandas: reduce data frame to only specific chain of occurrences

Question:

Assume I have this data:

>>> import pandas as pd
>>> data = {'event': [0, 1, 1, 2, 1, 0],
...         'val1': [1, 2, 3, 4, 5, 6]}
>>> df1 = pd.DataFrame(data, index=['hash1', 'hash1', 'hash2',
...                                 'hash3', 'hash3', 'hash3'])
>>> df1
       event  val1
hash1      0     1
hash1      1     2
hash2      1     3
hash3      2     4
hash3      1     5
hash3      0     6

What I want to do:

I want to reduce df1 so that it only shows data for those hashes that have at least one occurrence of every unique value of event.

So in the end I would get a dataframe looking like this:

       event  val1
hash3      2     4
hash3      1     5
hash3      0     6

I tried to split the dataframe into events equal to zero and events greater than zero, and then tried to look up the indexes from the "equal to zero" dataframe in the "not equal to zero" dataframe, but I'm just really bad with pandas. If someone could help me accomplish this, I would be very grateful.

Thanks in advance guys!

Answer:

You can drop the rows where event is 0, group the remaining rows by the index, and count the unique events per hash using nunique. Then select from the original df the hashes that have more than a single unique nonzero event:

In [62]:
gp = df1[df1['event'] !=0].groupby(level=0)['event'].nunique()
df1.loc[gp[gp> 1].index]

Out[62]:
       event  val1
hash3      2     4
hash3      1     5
hash3      0     6

Breaking the above down:

In [63]:
df1['event'] !=0

Out[63]:
hash1    False
hash1     True
hash2     True
hash3     True
hash3     True
hash3    False
Name: event, dtype: bool

In [64]:
df1[df1['event'] !=0]

Out[64]:
       event  val1
hash1      1     2
hash2      1     3
hash3      2     4
hash3      1     5

In [65]:
df1[df1['event'] !=0].groupby(level=0)['event'].nunique()

Out[65]:
hash1    1
hash2    1
hash3    2
Name: event, dtype: int64

In [66]:
gp[gp> 1]

Out[66]:
hash3    2
Name: event, dtype: int64
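
For comparison, the same selection can also be written as a single groupby(...).filter call. This is only a sketch of an alternative, not part of the original answer; the helper name has_multiple_nonzero_events is made up for illustration, and the data is the frame from the question:

```python
import pandas as pd

# Sample data from the question.
data = {'event': [0, 1, 1, 2, 1, 0],
        'val1': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(data, index=['hash1', 'hash1', 'hash2',
                                'hash3', 'hash3', 'hash3'])


def has_multiple_nonzero_events(group):
    """Hypothetical helper: True if the hash has more than one distinct nonzero event."""
    nonzero = group.loc[group['event'] != 0, 'event']
    return nonzero.nunique() > 1


# filter() keeps every row of each group for which the helper returns True.
filtered = df1.groupby(level=0).filter(has_multiple_nonzero_events)
print(filtered)
```

filter calls the helper once per hash in Python, so on large frames the boolean-mask version above is likely to be faster.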

EDIT

Based on your update, you can compare the number of unique events per hash (nunique) against the number of unique values of event in the whole df:

In [107]:
df1.loc[df1.groupby(level=0)['event'].nunique() == len(df1['event'].unique())]

Out[107]:
       event  val1
hash3      2     4
hash3      1     5
hash3      0     6
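
A variant of the same idea, sketched here as an assumption rather than as part of the original answer, uses transform('nunique') so that the per-hash count is broadcast back onto every row. The resulting mask then has the same index as df1 and does not rely on pandas aligning a per-hash boolean Series against a duplicated index. The names n_events, per_row and result are illustrative:

```python
import pandas as pd

# Sample data from the question.
data = {'event': [0, 1, 1, 2, 1, 0],
        'val1': [1, 2, 3, 4, 5, 6]}
df1 = pd.DataFrame(data, index=['hash1', 'hash1', 'hash2',
                                'hash3', 'hash3', 'hash3'])

# Number of distinct event values in the whole frame.
n_events = df1['event'].nunique()

# For every row, the number of distinct events seen for that row's hash.
per_row = df1.groupby(level=0)['event'].transform('nunique')

# Keep only rows whose hash has seen every distinct event value.
result = df1[per_row == n_events]
print(result)
```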