key=itemgetter/attrgetter

Hi,

I would be grateful if someone could explain in detail how this code works, or point me to a reference so that I can understand a bit more.

BOOKS = get_books('books.json')
RAW_BOOKS = get_books('books.json', raw=True)

### SORTED ###
# pub_sort = sorted(RAW_BOOKS, key=itemgetter('publish_date'))
# print(pub_sort[0]['publish_date'], pub_sort[-1]['publish_date'])
# pages_sort = sorted(BOOKS, key=attrgetter('number_of_pages'))
# print(pages_sort[0].number_of_pages, pages_sort[-1].number_of_pages)

According to Kenneth, key=... performs the sorting. I don't know why we use key=itemgetter() for RAW_BOOKS but key=attrgetter() for BOOKS.

This content is from Intermediate Python > Functional Python > Functional Workhorses.

Thanks so much

3 Answers

My summary: itemgetter() operates on dictionaries (and on other containers that support indexing, such as lists and tuples), while attrgetter() operates on an object's attributes. In this lesson, we explicitly pass the results of itemgetter() and attrgetter() to sorted() as the key argument.

When the items being sorted are lists or tuples rather than dictionaries, you pass an int to itemgetter(). The int is the index into each inner list or inner tuple. In the example below, each tuple has two elements, so the int must be 0 or 1, nothing higher:

from operator import itemgetter

fruit = [("cherry", 3), ("banana", 2)]
sorted_by_number = sorted(fruit, key=itemgetter(1))  # compare the numbers
sorted_by_name = sorted(fruit, key=itemgetter(0))    # compare the names

Note that fruit must be an iterable whose elements can themselves be indexed (lists or tuples, for example).
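As a rough sketch of why the index has to be 0 or 1 here (the names get_number, first_number, and out_of_range are only for illustration): itemgetter(1) gives back a callable that indexes whatever you hand it, and an index that no tuple has fails as soon as sorted() calls it.

```python
from operator import itemgetter

fruit = [("cherry", 3), ("banana", 2)]

# itemgetter(1) returns a callable; calling it on a tuple is like tuple[1].
get_number = itemgetter(1)
first_number = get_number(fruit[0])  # same as fruit[0][1]

# Each tuple has only two elements, so index 2 is out of range.
try:
    sorted(fruit, key=itemgetter(2))
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_number, out_of_range)
```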

With attrgetter(), you pass in, as a string, the name of the object's attribute on which you want to sort, and then pass the result to key. Here's a procedural breakdown:

# 1) Decide you want to sort objects on a particular attribute.
    datetime.datetime.day
# 2) Type that attribute's name as a "string".
    'day'
# 3) Pass it into attrgetter().
    attrgetter('day')
# 4) Pass THAT into key.
    key=attrgetter('day')
# 5) Pass an iterable of the objects into sorted() (my_dates is a placeholder name for a list of datetime objects).
    sorted(my_dates, key=attrgetter('day'))
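Put together, the steps above might look like the following minimal sketch. The dates list is made-up sample data, not something from the lesson.

```python
from datetime import datetime
from operator import attrgetter

# Hypothetical sample data: three datetime objects with different days.
dates = [
    datetime(2020, 5, 17),
    datetime(2021, 1, 3),
    datetime(2019, 8, 25),
]

# Sort by the .day attribute only, ignoring year and month.
by_day = sorted(dates, key=attrgetter('day'))
days = [d.day for d in by_day]
print(days)  # [3, 17, 25]
```

Note that sorted() receives a list of datetime objects, not the datetime class itself.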

I hope that helps!

Qui Le

To understand why Kenneth is using itemgetter and attrgetter, I think it would be better to look at a similar but smaller example. Say the variable BOOKS contains a much smaller list of book objects: [Bag of bones, Carrie, Dreamcatcher, The green mile]. These book objects have common attributes, one of which is number_of_pages. How do we access an attribute of an object? We use the dot operator:

# We want to find out the number of pages of the book Bag of bones
BOOKS[0].number_of_pages

So, now let us look at the line of code that sorts BOOKS:

pages_sort = sorted(BOOKS, key=attrgetter('number_of_pages'))

Here, we pass attrgetter('number_of_pages') as the key argument of the built-in sorted() function. attrgetter returns "a callable object that fetches attr from its operand" (Functional Programming Modules, Python Standard Library). From experimenting with operator.attrgetter, I would say that it returns something similar to a function object, to which you can feed any object that has that attribute, which is number_of_pages in our example.

But, you may ask, what is the point of the key argument? This argument, if provided, lets developers customize how the iterable (list, tuple, etc.) is sorted. If key is not provided, sorted() compares the elements of the iterable directly. So, in our sorted() statement above, we sort the BOOKS list by comparing the number_of_pages of each book: attrgetter('number_of_pages') is applied to each book in the list and gives sorted() the number of pages to compare.
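Here is a small sketch of that idea. The Book class is a hypothetical stand-in for the course's book objects, and every page count except the 732 for Bag of bones is made up for illustration.

```python
from operator import attrgetter


class Book:
    """Hypothetical stand-in for the course's book objects."""

    def __init__(self, title, number_of_pages):
        self.title = title
        self.number_of_pages = number_of_pages


# Page counts other than 732 are invented for this example.
BOOKS = [
    Book("Bag of bones", 732),
    Book("Carrie", 253),
    Book("Dreamcatcher", 620),
    Book("The green mile", 400),
]

# attrgetter returns a callable; calling it on a book is like book.number_of_pages.
get_pages = attrgetter('number_of_pages')
first_pages = get_pages(BOOKS[0])

# sorted() calls get_pages on every book and compares the results.
pages_sort = sorted(BOOKS, key=get_pages)
titles = [book.title for book in pages_sort]

# Without key, sorted() compares the Book objects directly, which this
# class does not support, so Python 3 raises TypeError.
try:
    sorted(BOOKS)
    direct_compare_ok = True
except TypeError:
    direct_compare_ok = False

print(titles)
```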

To explain itemgetter, I will first look at RAW_BOOKS and what it contains. If we look at a smaller sample of RAW_BOOKS, we would see this:

RAW_BOOKS = [{ "number_of_pages" : 849,
        "price" : 13.550000000000001,
        "publish_date" : 2011,
        "subjects" : [ "Time travel",
            "Assassination"
          ],
        "title" : "11/22/63"
      },
      { "number_of_pages" : 732,
        "price" : 7.9900000000000002,
        "publish_date" : 1999,
        "subjects" : [ "Authors",
            "Custody of children",
            "Grandfathers",
            "Haunted houses",
            "Novelists",
            "Trials (Custody of children)",
            "Widowers",
            "Widows",
            "Writer's block"
          ],
        "title" : "Bag of bones"
      }]

RAW_BOOKS is a list of dictionaries, and each dictionary holds the details of a specific book. A dictionary stores key/value pairs; if we know that publish_date is one of the keys, this is how we get its value:

# RAW_BOOKS is a list, RAW_BOOKS[0] is the first book item
RAW_BOOKS[0]['publish_date']

So, now we know what RAW_BOOKS is, and we want to sort it:

pub_sort = sorted(RAW_BOOKS, key=itemgetter('publish_date'))

Here, what we provide for the key argument of sorted() is the result of itemgetter('publish_date'), which will "return a callable object that fetches item from its operand using the operand’s __getitem__() method" (Functional Programming Modules, Python Standard Library). Once we have the callable object (like a function object) from itemgetter, we can feed it a dictionary (it does not have to be a dictionary; it can be a list or tuple, as long as it supports the __getitem__() method) and it will return the value, which here is the number stored under the publish_date key.
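A minimal sketch of this, using a trimmed copy of the two sample records above (the price and subjects keys are left out for brevity; get_date is just an illustrative name):

```python
from operator import itemgetter

# Trimmed copy of the two sample records; price and subjects omitted.
RAW_BOOKS = [
    {"title": "11/22/63", "number_of_pages": 849, "publish_date": 2011},
    {"title": "Bag of bones", "number_of_pages": 732, "publish_date": 1999},
]

# itemgetter returns a callable; calling it on a dict is like d['publish_date'].
get_date = itemgetter('publish_date')
first_date = get_date(RAW_BOOKS[0])

# sorted() calls get_date on every dictionary and compares the years.
pub_sort = sorted(RAW_BOOKS, key=get_date)
oldest = pub_sort[0]['publish_date']
newest = pub_sort[-1]['publish_date']
print(oldest, newest)  # 1999 2011
```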

itemgetter is a function used to get the item associated with publish_date. attrgetter is a function used to get the attribute named number_of_pages. itemgetter looks up a key or index; attrgetter looks up an attribute rather than an index.
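One way to picture the difference is to compare each getter with a roughly equivalent lambda. This is only a sketch: record and book are made-up stand-ins for one entry of RAW_BOOKS and one entry of BOOKS, and SimpleNamespace is used here just to get an object with an attribute.

```python
from operator import attrgetter, itemgetter
from types import SimpleNamespace

record = {"publish_date": 1999}              # dictionary: use itemgetter
book = SimpleNamespace(number_of_pages=732)  # object: use attrgetter

# itemgetter('publish_date') behaves like: lambda d: d['publish_date']
item_result = itemgetter('publish_date')(record)
lambda_item_result = (lambda d: d['publish_date'])(record)

# attrgetter('number_of_pages') behaves like: lambda b: b.number_of_pages
attr_result = attrgetter('number_of_pages')(book)
lambda_attr_result = (lambda b: b.number_of_pages)(book)

# Mixing them up fails: a dict has no attribute named 'publish_date'.
try:
    attrgetter('publish_date')(record)
    mixed_up_ok = True
except AttributeError:
    mixed_up_ok = False

print(item_result, attr_result, mixed_up_ok)
```

That is why RAW_BOOKS (dictionaries) is sorted with itemgetter while BOOKS (objects) is sorted with attrgetter.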