It's never easy to admit when you do things wrong, but making errors is part of any learning process, from learning to walk to learning a new programming language, such as Python.
Here's a list of three things I got wrong when I was learning Python, presented so that newer Python programmers can avoid making the same mistakes. These are errors that either I got away with for a long time or that that created big problems that took hours to solve.
Take heed young coders, some of these mistakes are afternoon wasters!
1. Mutable data types as default arguments in function definitions
It makes sense right? You have a little function that, let's say, searches for links on a current page and optionally appends it to another supplied list.
def search_for_links(page, add_to=[]):
new_links = page.search_for_links()
add_to.extend(new_links)
return add_to
On the face of it, this looks like perfectly normal Python, and indeed it is. It works. But there are issues with it. If we supply a list for the add_to parameter, it works as expected. If, however, we let it use the default, something interesting happens.
Try the following code:
def fn(var1, var2=[]):
var2.append(var1)
print var2
fn(3)
fn(4)
fn(5)
You may expect that we would see:
[3]
[4]
[5]
But we actually see this:
[3]
[3, 4]
[3, 4, 5]
Why? Well, you see, the same list is used each time. In Python, when we write the function like this, the list is instantiated as part of the function's definition. It is not instantiated each time the function is run. This means that the function keeps using the exact same list object again and again, unless of course we supply another one:
fn(3, [4])
[4, 3]
Just as expected. The correct way to achieve the desired result is:
def fn(var1, var2=None):
if not var2:
var2 = []
var2.append(var1)
Or, in our first example:
def search_for_links(page, add_to=None):
if not add_to:
add_to = []
new_links = page.search_for_links()
add_to.extend(new_links)
return add_to
This moves the instantiation from module load time so that it happens every time the function runs. Note that for immutable data types, like tuples, strings, or ints, this is not necessary. That means it is perfectly fine to do something like:
def func(message="my message"):
print message
2. Mutable data types as class variables
Hot on the heels of the last error is one that is very similar. Consider the following:
class URLCatcher(object):
urls = []
def add_url(self, url):
self.urls.append(url)
This code looks perfectly normal. We have an object with a storage of URLs. When we call the add_url method, it adds a given URL to the store. Perfect right? Let's see it in action:
a = URLCatcher()
a.add_url('http://www.google.')
b = URLCatcher()
b.add_url('http://www.bbc.co.')
b.urls
['http://www.google.com', 'http://www.bbc.co.uk']
a.urls
['http://www.google.com', 'http://www.bbc.co.uk']
Wait, what?! We didn't expect that. We instantiated two separate objects, a and b. A was given one URL and b the other. How is it that both objects have both URLs?
Turns out it's kinda the same problem as in the first example. The URLs list is instantiated when the class definition is created. All instances of that class use the same list. Now, there are some cases where this is advantageous, but the majority of the time you don't want to do this. You want each object to have a separate store. To do that, we would modify the code like:
class URLCatcher(object):
def __init__(self):
self.urls = []
def add_url(self, url):
self.urls.append(url)
Now the URLs list is instantiated when the object is created. When we instantiate two separate objects, they will be using two separate lists.
3. Mutable assignment errors
This one confused me for a while. Let's change gears a little and use another mutable datatype, the dict.
a = {'1': "one", '2': 'two'}
Now let's assume we want to take that dict and use it someplace else, leaving the original intact.
b = a
b['3'] = 'three'
Simple eh?
Now let's look at our original dict, a, the one we didn't want to modify:
{'1': "one", '2': 'two', '3': 'three'}
Whoa, hold on a minute. What does b look like then?
{'1': "one", '2': 'two', '3': 'three'}
Wait what? But… let's step back and see what happens with our other immutable types, a tuple for instance:
c = (2, 3)
d = c
d = (4, 5)
Now c is:
(2, 3)
While d is:
(4, 5)
That functions as expected. So what happened in our example? When using mutable types, we get something that behaves a little more like a pointer from C. When we said b = a in the code above, what we really meant was: b is now also a reference to a. They both point to the same object in Python's memory. Sound familiar? That's because it's similar to the previous problems. In fact, this post should really have been called, "The Trouble with Mutables."
Does the same thing happen with lists? Yes. So how do we get around it? Well, we have to be very careful. If we really need to copy a list for processing, we can do so like:
b = a[:]
This will go through and copy a reference to each item in the list and place it in a new list. But be warned: If any objects in the list are mutable, we will again get references to those, rather than complete copies.
Imagine having a list on a piece of paper. In the original example, Person A and Person B are looking at the same piece of paper. If someone changes that list, both people will see the same changes. When we copy the references, each person now has their own list. But let's suppose that this list contains places to search for food. If "fridge" is first on the list, even when it is copied, both entries in both lists point to the same fridge. So if the fridge is modified by Person A, by say eating a large gateaux, Person B will also see that the gateaux is missing. There is no easy way around this. It is just something that you need to remember and code in a way that will not cause an issue.
Dicts function in the same way, and you can create this expensive copy by doing:
b = a.copy()
Again, this will only create a new dictionary pointing to the same entries that were present in the original. Thus, if we have two lists that are identical and we modify a mutable object that is pointed to by a key from dict 'a', the dict object present in dict 'b' will also see those changes.
The trouble with mutable data types is that they are powerful. None of the above are real problems; they are things to keep in mind to prevent issues. The expensive copy operations presented as solutions in the third item are unnecessary 99% of the time. Your program can and probably should be modified so that those copies are not even required in the first place.
Happy coding! And feel free to ask questions in the comments.
7 Comments