当前位置: 动力学知识库 > 问答 > 编程问答 >

how to get the 10 most frequent strings in a list in python

问题描述:

I have a list that has 93 different strings. I need to find the 10 most frequent strings and the return must be in order from most frequent to least frequent.

mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']

# this is just a sample of the actual list.

I dont have the newest version of python and cannot use a counter.

网友答案:

You could use a Counter from the collections module to do this.

from collections import Counter
c = Counter(mylist)

Then doing c.most_common(10) returns

[('and', 13),
 ('all', 2),
 ('as', 2),
 ('borogoves', 2),
 ('boy', 1),
 ('blade', 1),
 ('bandersnatch', 1),
 ('beware', 1),
 ('bite', 1),
 ('arms', 1)]
网友答案:

David's answer is the best - but if you are using a version of Python that does not include Counter from the collections module (which was introduced in Python 2.7), you can use this implementation of a counter class that does the same thing. I suspect that it would be slower than the module, but will do the same thing.

网友答案:

Without using Counter as the modified version of the question requests

Changed to use heap.nlargest as suggested by @Duncan

>>> from collections import defaultdict
>>> from operator import itemgetter
>>> from heapq import nlargest
>>> mylist = ['"and', '"beware', '`twas', 'all', 'all', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'and', 'arms', 'as', 'as', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'borogoves', 'borogoves', 'boy', 'brillig']
>>> c = defaultdict(int)
>>> for item in mylist:
        c[item] += 1


>>> [word for word,freq in nlargest(10,c.iteritems(),key=itemgetter(1))]
['and', 'all', 'as', 'borogoves', 'boy', 'blade', 'bandersnatch', 'beware', 'bite', 'arms']
网友答案:

David's solution is the best.

But probably more for fun than anything, here is a solution that does not import any module:

dicto = {}

for ele in mylist:
    try:
        dicto[ele] += 1
    except KeyError:
        dicto[ele] = 1

top_10 = sorted(dicto.iteritems(), key = lambda k: k[1], reverse = True)[:10] 

Result:

>>> top_10
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

EDIT:

Answering the follow up question:

new_dicto = {}

for val, key in zip(dicto.itervalues(), dicto.iterkeys()):

    try:
        new_dicto[val].append(key)
    except KeyError:
        new_dicto[val] = [key]

alph_sorted = sorted([(key,sorted(val)) for key,val in zip(new_dicto.iterkeys(), new_dicto.itervalues())], reverse = True)

Result:

>>> alph_sorted
[(13, ['and']), (2, ['all', 'as', 'borogoves']), (1, ['"and', '"beware', '`twas', 'arms', 'awhile', 'back', 'bandersnatch', 'beamish', 'beware', 'bird', 'bite', 'blade', 'boy', 'brillig'])]

The words that show up once are sorted alphabetically, if you notice some words have extra quotation marks in them.

EDIT:

Answering another follow up question:

top_10 = []

for tup in alph_sorted:
    for word in tup[1]:
        top_10.append(word)
        if len(top_10) == 10:
            break

Result:

>>> top_10
['and', 'all', 'as', 'borogoves', '"and', '"beware', '`twas', 'arms', 'awhile', 'back']
网友答案:

In case your Python Version does not support Counter, you can do the way Counter is implemented

>>> import operator,collections,heapq
>>> counter = collections.defaultdict(int)
>>> for elem in mylist:
    counter[elem]+=1        
>>> heapq.nlargest(10,counter.iteritems(),operator.itemgetter(1))
[('and', 13), ('all', 2), ('as', 2), ('borogoves', 2), ('boy', 1), ('blade', 1), ('bandersnatch', 1), ('beware', 1), ('bite', 1), ('arms', 1)]

If you see the Counter Class, it creates a dictionary of the occurrence of all the elements present in the Iterable It then puts the data in an heapq, key is the value of the dictionary and retrieves the nargest

分享给朋友:
您可能感兴趣的文章:
随机阅读: