当前位置: 动力学知识库 > 问答 > 编程问答 >

python & pandas - How to calculate frequency under conditions in columns in DataFrame?

问题描述:

I have a series of data in a DataFrame called frames:

 NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait NoDemande Periods

0 000001 + 287Véh 000087 000079 1 42196000013 Matin

1 000001 - 287Véh 000087 000079 1 42196000013 Matin

2 000314 + 263Véh 000077 006470 1 42196000002 Matin

3 002372 + 287Véh 000087 002932 1 42196000016 Matin

4 000466 + 287Véh 000087 002932 1 42196000015 Matin

5 000314 - 263Véh 000077 000456 1 42196000002 Matin

6 000466 - 287Véh 000087 004900 1 42196000015 Matin

7 002372 - 287Véh 000087 007072 1 42196000016 Matin

8 002641 + 263Véh 000077 007225 1 42196000004 Soir

9 002641 - 263Véh 000077 000889 1 42196000004 Soir

10 000382 + 263Véh 000077 002095 1 42196000006 Soir

11 002641 + 287Véh 000087 000889 1 42196000019 Soir

12 000382 - 263Véh 000077 006168 1 42196000006 Soir

13 002641 - 287Véh 000087 007225 1 42196000019 Soir

14 001611 + 287Véh 000087 004236 -1 42196000021 Soir

15 002785 + 263Véh 000077 007482 1 42196000007 Soir

16 002372 + 287Véh 000087 007072 1 42196000022 Soir

17 002785 - 263Véh 000077 007483 1 42196000007 Soir

18 000466 + 287Véh 000087 004900 1 42196000023 Soir

19 000382 + 263Véh 000077 006168 1 42196000008 Soir

For each Usager, depending on Sens and Periods, they can have more than one related address. I want to know for all the Usager, how many address do they have and the frequency of each address.

I used frames.set_index(['NoUsager','NoAdresse']) to make it looks like:


EDIT

I don't want all the other columns but only a new one with the result of frequency. In which way I can do it? Can I use pivot() to do it?

Any help will be really appreciated!

网友答案:

I think you need groupby by columns which will be indexes (NoUsager,Sens,Periods) in output df. Then need add column (NoAdresse) as last item in list in groupby, which is converted by unstack to columns in output. And you need aggregate by size.

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse']).size().unstack(fill_value=0)
print (df)
NoAdresse              79    456   889   2095  2932  4236  4900  6168  6470  \
NoUsager Sens Periods                                                         
1        +    Matin       1     0     0     0     0     0     0     0     0   
         -    Matin       1     0     0     0     0     0     0     0     0   
314      +    Matin       0     0     0     0     0     0     0     0     1   
         -    Matin       0     1     0     0     0     0     0     0     0   
382      +    Soir        0     0     0     1     0     0     0     1     0   
         -    Soir        0     0     0     0     0     0     0     1     0   
466      +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     1     0     0   
         -    Matin       0     0     0     0     0     0     1     0     0   
1611     +    Soir        0     0     0     0     0     1     0     0     0   
2372     +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     0     0     0   
         -    Matin       0     0     0     0     0     0     0     0     0   
2641     +    Soir        0     0     1     0     0     0     0     0     0   
         -    Soir        0     0     1     0     0     0     0     0     0   
2785     +    Soir        0     0     0     0     0     0     0     0     0   
         -    Soir        0     0     0     0     0     0     0     0     0   

NoAdresse              7072  7225  7482  7483  
NoUsager Sens Periods                          
1        +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
314      +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
382      +    Soir        0     0     0     0  
         -    Soir        0     0     0     0  
466      +    Matin       0     0     0     0  
              Soir        0     0     0     0  
         -    Matin       0     0     0     0  
1611     +    Soir        0     0     0     0  
2372     +    Matin       0     0     0     0  
              Soir        1     0     0     0  
         -    Matin       1     0     0     0  
2641     +    Soir        0     1     0     0  
         -    Soir        0     1     0     0  
2785     +    Soir        0     0     1     0  
         -    Soir        0     0     0     1  

If need reset index:

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse'])
       .size()
       .unstack(fill_value=0)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

Another solution with crosstab:

df = pd.crosstab([df.NoUsager,df.Sens,df.Periods], df.NoAdresse)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

EDIT by comment:

I think you need only aggregate size:

df = df.groupby(['NoUsager','NoAdresse']).size().reset_index(name='Count')
print (df)
    NoUsager  NoAdresse  Count
0          1         79      2
1        314        456      1
2        314       6470      1
3        382       2095      1
4        382       6168      2
5        466       2932      1
6        466       4900      2
7       1611       4236      1
8       2372       2932      1
9       2372       7072      2
10      2641        889      2
11      2641       7225      2
12      2785       7482      1
13      2785       7483      1

If need set indexes, you can use another solution - rename Series name and then call to_frame:

df = df.groupby(['NoUsager','NoAdresse']).size().rename('Count').to_frame()
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1

Or add set_index:

df = df.groupby(['NoUsager','NoAdresse'])
       .size()
       .reset_index(name='Count')
       .set_index(['NoUsager','NoAdresse'])
print (df)
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1
分享给朋友:
您可能感兴趣的文章:
随机阅读: