6
:Author: Andrew Dalke and Raymond Hettinger
10
Python lists have a built-in :meth:`list.sort` method that modifies the list
11
in-place. There is also a :func:`sorted` built-in function that builds a new
12
sorted list from an iterable.
14
In this document, we explore the various techniques for sorting data using Python.
20
A simple ascending sort is very easy: just call the :func:`sorted` function. It
21
returns a new sorted list::
23
>>> sorted([5, 2, 3, 1, 4])
26
You can also use the :meth:`list.sort` method. It modifies the list
27
in-place (and returns *None* to avoid confusion). Usually it's less convenient
28
than :func:`sorted` - but if you don't need the original list, it's slightly
31
>>> a = [5, 2, 3, 1, 4]
36
Another difference is that the :meth:`list.sort` method is only defined for
37
lists. In contrast, the :func:`sorted` function accepts any iterable.
39
>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
45
Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
46
function to be called on each list element prior to making comparisons.
48
For example, here's a case-insensitive string comparison:
50
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
51
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
53
The value of the *key* parameter should be a function that takes a single argument
54
and returns a key to use for sorting purposes. This technique is fast because
55
the key function is called exactly once for each input record.
57
A common pattern is to sort complex objects using some of the object's indices
60
>>> student_tuples = [
65
>>> sorted(student_tuples, key=lambda student: student[2]) # sort by age
66
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
68
The same technique works for objects with named attributes. For example:
71
def __init__(self, name, grade, age):
76
return repr((self.name, self.grade, self.age))
78
>>> student_objects = [
79
Student('john', 'A', 15),
80
Student('jane', 'B', 12),
81
Student('dave', 'B', 10),
83
>>> sorted(student_objects, key=lambda student: student.age) # sort by age
84
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
86
Operator Module Functions
87
=========================
89
The key-function patterns shown above are very common, so Python provides
90
convenience functions to make accessor functions easier and faster. The
91
:mod:`operator` module has :func:`~operator.itemgetter`,
92
:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
94
Using those functions, the above examples become simpler and faster:
96
>>> from operator import itemgetter, attrgetter
98
>>> sorted(student_tuples, key=itemgetter(2))
99
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
101
>>> sorted(student_objects, key=attrgetter('age'))
102
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
104
The operator module functions allow multiple levels of sorting. For example, to
105
sort by *grade* then by *age*:
107
>>> sorted(student_tuples, key=itemgetter(1,2))
108
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
110
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
111
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
113
Ascending and Descending
114
========================
116
Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
117
boolean value. This is used to flag descending sorts. For example, to get the
118
student data in reverse *age* order:
120
>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
121
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
123
>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
124
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
126
Sort Stability and Complex Sorts
127
================================
129
Sorts are guaranteed to be `stable
130
<http://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
131
when multiple records have the same key, their original order is preserved.
133
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
134
>>> sorted(data, key=itemgetter(0))
135
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
137
Notice how the two records for *blue* retain their original order so that
138
``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
140
This wonderful property lets you build complex sorts in a series of sorting
141
steps. For example, to sort the student data by descending *grade* and then
142
ascending *age*, do the *age* sort first and then sort again using *grade*:
144
>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
145
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
146
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
148
The `Timsort <http://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
149
does multiple sorts efficiently because it can take advantage of any ordering
150
already present in a dataset.
152
The Old Way Using Decorate-Sort-Undecorate
153
==========================================
155
This idiom is called Decorate-Sort-Undecorate after its three steps:
157
* First, the initial list is decorated with new values that control the sort order.
159
* Second, the decorated list is sorted.
161
* Finally, the decorations are removed, creating a list that contains only the
162
initial values in the new order.
164
For example, to sort the student data by *grade* using the DSU approach:
166
>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
168
>>> [student for grade, i, student in decorated] # undecorate
169
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
171
This idiom works because tuples are compared lexicographically; the first items
172
are compared; if they are the same then the second items are compared, and so
175
It is not strictly necessary in all cases to include the index *i* in the
176
decorated list, but including it gives two benefits:
178
* The sort is stable -- if two items have the same key, their order will be
179
preserved in the sorted list.
181
* The original items do not have to be comparable because the ordering of the
182
decorated tuples will be determined by at most the first two items. So for
183
example the original list could contain complex numbers which cannot be sorted
186
Another name for this idiom is
187
`Schwartzian transform <http://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
188
after Randal L. Schwartz, who popularized it among Perl programmers.
190
Now that Python sorting provides key-functions, this technique is not often needed.
193
The Old Way Using the *cmp* Parameter
194
=====================================
196
Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
197
there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
198
arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
199
handle user specified comparison functions.
201
In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
202
simplify and unify the language, eliminating the conflict between rich
203
comparisons and the :meth:`__cmp__` magic method).
205
In Py2.x, sort allowed an optional function which can be called for doing the
206
comparisons. That function should take two arguments to be compared and then
207
return a negative value for less-than, return zero if they are equal, or return
208
a positive value for greater-than. For example, we can do:
210
>>> def numeric_compare(x, y):
212
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)
215
Or you can reverse the order of comparison with:
217
>>> def reverse_numeric(x, y):
219
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
222
When porting code from Python 2.x to 3.x, the situation can arise when you have
223
the user supplying a comparison function and you need to convert that to a key
224
function. The following wrapper makes that easy to do::
226
def cmp_to_key(mycmp):
227
'Convert a cmp= function into a key= function'
229
def __init__(self, obj, *args):
231
def __lt__(self, other):
232
return mycmp(self.obj, other.obj) < 0
233
def __gt__(self, other):
234
return mycmp(self.obj, other.obj) > 0
235
def __eq__(self, other):
236
return mycmp(self.obj, other.obj) == 0
237
def __le__(self, other):
238
return mycmp(self.obj, other.obj) <= 0
239
def __ge__(self, other):
240
return mycmp(self.obj, other.obj) >= 0
241
def __ne__(self, other):
242
return mycmp(self.obj, other.obj) != 0
245
To convert to a key function, just wrap the old comparison function:
247
>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
250
In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
251
:mod:`functools` module in the standard library.
256
* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
257
:func:`locale.strcoll` for a comparison function.
259
* The *reverse* parameter still maintains sort stability (so that records with
260
equal keys retain the original order). Interestingly, that effect can be
261
simulated without the parameter by using the builtin :func:`reversed` function
264
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
265
>>> assert sorted(data, reverse=True) == list(reversed(sorted(reversed(data))))
267
* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
268
between two objects. So, it is easy to add a standard sort order to a class by
269
defining an :meth:`__lt__` method::
271
>>> Student.__lt__ = lambda self, other: self.age < other.age
272
>>> sorted(student_objects)
273
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
275
* Key functions need not depend directly on the objects being sorted. A key
276
function can also access external resources. For instance, if the student grades
277
are stored in a dictionary, they can be used to sort a separate list of student
280
>>> students = ['dave', 'john', 'jane']
281
>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
282
>>> sorted(students, key=newgrades.__getitem__)
283
['jane', 'dave', 'john']