I have a list of over 100000 values and I am iterating over these values and checking if each one is contained in another list of random values (of the same size).
I am doing this by using
if item[x] in randomList.
How efficient is this? Does python do some sort of hashing for each container or is it internally doing a straight up search of the other container to find the element I am looking for?
Also, if it does this search linearly, then does it create a dictionary of the randomList and do the lookup with that?
in is implemented by the
__contains__ magic method of the object it applies to, so the efficiency is dependent upon that. For instance,
frozenset will be hash based lookups, while
list will require a linear search. However,
range in Python 3.x) has a
__contains__ method that doesn't require a linear search, but instead can use the start/stop/step information to determine a truthy value. (eg:
7 in xrange(4, 1000000) isn't done linearly).
Custom classes are free to implement
__contains__ however they see fit but ideally should provide some information about how it does so in documentation if "not obvious".
You will want to pre-convert your list to a set, where hashing can be used for O(1) lookup.
(Normally, you have to search every element in a 'classical' list to tell if something is in it (unless your data structure also keeps a set of all elements, but that would add a huge amount of time and space complexity, and the programmer can implement it themselves).)
ninjagecko already answered the specific case of lists (and tuples FWIW), the general answer is : "it depends on the 'container' implementation" (I quote 'container' cause it can as well be a generator). For builtin types, you want sets or dicts for a fast lookup. Else you'd better check the doc or implementation for your container.
You can do this without using a set if the two lists are sorted, as follows:
def gen(P1, P2): i, j = 0, 0 I, J = len(P1), len(P2) while i != I and j != J: if P1[i] == P2[j]: yield P1[i] i, j = i + 1, j + 1 else: if P1[i] < P2[j]: i += 1 else: j += 1
This returns the intersection of P1 and P2:
>>> P1 = [1, 2, 3, 5, 90] >>> P2 = [2, 3, 5, 30, 48] >>> list(gen(P1, P2)) [2, 3, 5]
The algorithm is described here.