Use PDB over print

Published on Jul 07, 2022

When I started coding in python, whenever my script crashed with some traceback, I would start adding prints here and there and would re-run the script as many times as needed to figure out what's wrong. Now, this approach works, but takes a lot of time and may be quite frustrating (Where should I put prints? What should I print?).

Here, I would like to show a different approach using python's built-in debugger (and some alternatives). This article should serve as a guide mainly to PDB, but also to debugging in general.

Simple example

Imagine you have a simple script that just requests some URL and tries to convert the response to JSON, like this

# example.py
import requests

response = requests.get("https://example.com")
json_response = response.json()

When you run this script, it will fail with a traceback similar to this one:

Traceback (most recent call last):
  File "/home/libor/venv/lib/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home/libor/venv/lib/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/libor/venv/lib/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/libor/venv/lib/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/libor/example.py", line 4, in <module>
    json_response = response.json()
  File "/home/libor/venv/lib/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The traceback is telling us there is something wrong with deserializing data to JSON. Common approach to get to know what went wrong would be to print pretty much everything relevant. The altered code might look like this

import requests

response = requests.get("https://exmple.com")
print(response)
print(dir(response))
print(response.status_code)
print(response.text)
json_response = response.json()

That's a lot of prints to get an idea of what's wrong, and once you do, you have to remove them all (well, at least if this was production code).

Let's try a different approach using PDB. To invoke our script in PDB, we don't have to install any extra libraries as it is a part of the standard libraries. To start debugging, run python -m pdb example.py, which in human language means "invoke module pdb with argument my-script". Once you do so, you will see an output something like this

/home/libor/my-script(1)<module>()
-> import requests

The first line is telling us that we are on the first line of the script example.py and the second line shows the code on that line. There are now several ways on how to proceed, and the PDB's interactive shell also supports some help. We can see the help page by typing h (or help) and hitting enter. The help shows all the available commands and their shortcuts. The full reference to the commands can also be found in the documentation. Let's start with what I personally find the most useful - breakpoint. We can break the script execution at any line (before a crash occurs) and inspect our variables and code.

When you check the PDB's help or the documentation, you can see there's break. The syntax of setting a breakpoint can be as simple as b 4, telling the debugger "stop execution on line 4 of the current file". But it can be much more complicated (that's the true power of the breakpoints), more on that later. When you type that and hit enter, the PDB will in return tell you where the breakpoint is set.

When the breakpoint's set, let's execute the script by running c (or continue). The execution will stop on line 4 just as we wanted and the PDB will output something like

> /home/libor/example.py(4)<module>()
-> json_response = response.json()

At this moment we can hit c again and the program will fail. Or we can start inspecting what's wrong. Let's do the latter. To print a variable (or expression) simply do p <expression>. So in this case we could do

(Pdb) p response
<Response [200]>
(Pdb) pp dir(response)  # pp stands for prettyprint, dir is built-in function for showing all the methods and attributes on an object
[...
<some attributes>
...]
(Pdb) p response.status_code
200
(Pdb) p response.text
<some HTML string>

After issuing the last command, we can see that the response is not a JSON but HTML, so that's what's wrong with the code. Now we can simply exit the debugger using q or quit and start fixing our code.

I know, I know, this was a lot of work for a very trivial example, but when coding a real script / application / tool / whatever, all this may actually come in handy. Especially when reproducing the crash is not easy or it takes significant amount of time (you don't want to add print here and there, run the code, go grab a coffee to come back and finding out that the print printed something useless and you are back where you started).

Until here I was mainly giving the examples of how to do some things. Let's dive deeper and properly now. I will start with all the ways of how to trigger the PDB, then going through the commands and their optional parameters. Finally, I will go through alternative debuggers, that can be often found in the Python world, and their pros and cons.

Invoking PDB

We have already seen one way of invoking the PDB. It's by running command python -m pdb <script>. This way works but often the debugged code is not a simple script and in case of larger project with multiple files, it's not very convenient to do it this way. The reason is simple, mostly you run the debugger because you want to set a breakpoint somewhere, and running the code this way would require you to type the full path to the file where you want to set up the breakpoint. There are simpler ways of doing it.

From python version 3.7 there is a new built-in function called breakpoint() (guess what it does). Just for completion, before python 3.7 you would have to do import pdb; pdb.set_trace(). Once the executed code hits the line with this function call, it drops the execution to the PDB. This is very convenient way of debugging when you know approximate or exact place in your code where a crash occurs and you want to inspect under what condition it happens.

There is also another way of invoking PDB. Let's say you are running your code from some main() function like this

if __name__ == "__main__":
    main()

A simple alteration to the code will drop you to the debugger once there is an exception, allowing you to inspect stack and variables on it.

import pdb

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        pdb.post_mortem()

These are the ways how to invoke the debugger. Let's look at the power the debugger gives us.

PDB commands

We have already seen some of the commands that can be issued to the debugger. When we run help inside the debugger, we can see all the available commands.

Navigation

First we will need commands for navigating in the code. We have already seen c(ontinue) which runs the code until it hits the next breakpoint. For executing only current line, we have n(ext) and if we want to step into a call there is s(tep), if the call is nested, it will step into the inner most call, then the second inner most, etc... There is also r(eturn) that continues until it hits the return line (in case function is not returning, it will stop on the last line).

Let's see these comands in action debugging a simple script

# navigating.py
def inc(x: int) -> int:
    print(x)  # just to make this function a bit longer
    x += 1
    return x


if __name__ == "__main__":
    number = inc(2)
    print(number)

Running python -m pdb navigating.py will drop us directly to the debugger with output like this

> /home/libor/navigating.py(1)<module>()
-> def inc(x: int) -> int:

Meaning that we are on the first line of the script being debugged. By issuing command n the execution moves on to the line 7 with if __name__ == "__main__":, skipping the inc function. We can now simply hit enter which issues the last command moving the execution to the line 8 with the output

> /home/libor/navigating.py(8)<module>()
-> number = inc(2)

We now may want to step into the inc function call. That can be done by issuing s command, moving us back to the line 1. However, if this time we issue n, it won't move to the line 7 again, but to the line 2.

Issuing r at this moment will execute everything up to the return line (line 4) and stop there. It also means the print on line 2 will output 2.

And finally issuing c will continue the program until it stops. It should execute the print on the last line of the script outputing 3 and say The program finished and will be restarted. At this moment we are back on the line 1 and we can run all the commands again. To stop the debugger use q(uit).

Listing source code

Now that we know how to navigate in debugger, it would be also nice to list the source code, because frankly, having to open a file containing the function we just stepped into would be tedious.

For that pdb offers two commands. First of them is l(ist), it lists 11 lines around line but when issued again, it lists next 11 lines. The command also takes 3 optional parameters, you can read how they work in the docs. The second command is ll or longlist, which lists the source code of current frame - it can be function, method, class or the whole file. These commands are quite intuitive, so I will skip examples of their usage.

Breakpoints

With navigating and listing source code, we have solid basics for starting to place breakpoints. Of course, we can always use simply breakpoint function, but sometimes it is more convenient to use breakpoint functionality provided by pdb.

Firstly let's look into documentation:

b(reak) [([filename:]lineno | function) [, condition]]

This looks a bit complicated, but all parameters from the first square bracket are optional. It also means we can use just b(reak), but more on that later.

The easiest usage is b <linenumber>, which places a breakpoint in current file to line linenumber. We can also specify some other file using b file:<linenumber> which will place breakpoint into file to the specified line number. Also instead of linenumber we can use function name, but that sounds like too much typing. If we know the function name, we also know the line number (or at least modern IDEs should be able to tell us), so why bother with names. When we have specified breakpoints, we can simply run c and wait until it hits some breakpoint. Let's see that in action. We will use the previous example navigating.py:

python -m pdb navigating.py

Following output should show:

> /home/libor/navigating.py(1)<module>()
-> def inc(x: int) -> int:

Let's list the source code, so we can see where to put the breakpoint:

(Pdb) l
  1  -> def inc(x: int) -> int:
  2         print(x)
  3         x += 1
  4         return x
  5
  6
  7     if __name__ == "__main__":
  8         number = inc(2)
  9         print(number)
[EOF]

There are not really many interesting places where to put a breakpoint, but line 8 looks promising, as there is a function call. So now command b 8 will do that and we should see:

(Pdb) b 8
Breakpoint 1 at /home/libor/navigating.py:8

Alternatively, we could use b navigating.py:8 or specify the full file path b /home/libor/navigating.py:8 (be sure to use your path, not mine :)).

This says we have a breakpoint with id 1 set to line 8. We now need not use n or s to 'slowly' get to the line, instead we can just run c and the debugger will stop there on (any) encounter.

(Pdb) c
> /home/libor/navigating.py(8)<module>()
-> number = inc(2)

One more thing about placing breakpoints remain - conditioned breakpoints. Consider code like this:

# iterating.py
items = ["banana", "apple", "orange"]

for item in items:
    print(item)

If we placed simple breakpoint on line 4 and run c, the debugger stops there. If now we wanted the debugger to just run until the end and simply issued another c, we end up on the same line. The reason is simple, we placed a breakpoint into loop and the loop runs 3 times. This might be tedious, especially if we only want to break in loop undder given conditions and we don't want to inspect all the other cases. Lucky for us, we can simply use b 4, item == "apple" and the c command will only break on line 4 if item is equal to "apple".

We have seen all the possible parameter choices for b(reak), but what does it do without any parameters? It simply lists all the breakpoints we have set. If we continue with the previous example and put in b we will get

(Pdb) b
Num Type         Disp Enb   Where
1   breakpoint   keep yes   at /home/libor/iterating.py:4
        stop only if item == "apple"

With this in hand, we can disable or enable a breakpoint and we can delete it by its id. Using disable 1 in the previous example will disable the breakpoint

(Pdb) disable 1
Disabled breakpoint 1 at /home/libor/iterating.py:4

If we now continue the debugger, it will run till the end of the script. We can enable the breakpoint again by using enable 1 or remove it by running clear 1. We can also remove all breakpoints by running clear, this will prompt us Clear all breaks? and by y we confirm it.

There is also (ignore)[https://docs.python.org/3/library/pdb.html#pdbcommand-ignore] command that can be used like ignore bpnumber [count]. It will ignore breakpoint count times and then enable it again.

One convenient way of getting to some place is a tbreak. It has the same syntax as normal breakpoint but it 'disappears' the first time its hit. This can serve well when we want to reach some place in the code without setting there a permanent breakpoint (e.g. frequently called function).

Another convenient way of moving forward in the code execution is using unt(il), which only works in the current frame. Let's say we simply want to move few lines in the current function without setting a breakpoint, then using unt(il) is the best way to do so. Again to see this in the previous example:

(Pdb) unt 4
> /home/libor/iterating.py(4)<module>()
-> print(item)

Inspecting variables, evaluating expressions

We have tools needed for navigating in the code, and we should be able to get to the critical parts of the code. But when we get there we also want to be able to inspect variables and evaluate expressions to get (ideally all) informations necessary for fixing the code. For that we have several options. One of the simplest is to print arguments of current function, to do that just run a(rgs). We can see that on the navigating example:

python -m pdb navigating.py
> /home/libor/navigating.py(1)<module>()
-> def inc(x: int) -> int:
(Pdb) ll
  1  -> def inc(x: int) -> int:
  2         print(x)
  3         x += 1
  4         return x
  5
  6
  7     if __name__ == "__main__":
  8         number = inc(2)
  9         print(number)
(Pdb) unt 8
> /home/libor/navigating.py(8)<module>()
-> number = inc(2)
(Pdb) s
> /home/libor/navigating.py(1)inc()
-> def inc(x: int) -> int:
(Pdb) a
x = 2

Even better are commands p(rint) and pp (prettyprint), let's see them in action. For these commands we don't really need any file to debug, we just need to get into a debugger, let's just use some file to get there but ignore everything in it.

python -m pdb iterating.py
> /home/libor/iterating.py(1)<module>()
-> items = ["banana", "apple", "orange"]
(Pdb) p list(range(23))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
(Pdb) pp list(range(23))
[0,
 1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22]

(The output may differ based on your display size. If the pp prints values in row, increase the range to e.g. 100)

What happens here is that p(rint) prints out whatever representation of given object. In case it was a container such as dict or list, it would print it in a line regardless length. On the other hand pp looks at what the object to be printed is and tries to fit it into some line length, making it somewhat easier for the reader to read the output.

On the above example there is also one more thing worth noticing, p and pp evaluate expressions. This means we can give it whatever expression available in the scope of current frame and it will be evaluated. This is the real power of pdb.

Not only can we evaluate expression available in the current scope, we can also create our own scope. When I return to the previous example where we ended, we can do for example this:

(Pdb) import requests; requests.get("https://example.com")
<Response [200]>

Walking stack

This section is just to make our life easier. When we are debugging, we can print the whole stack and move on it up and down. The commands are w(here) for printing the stack, u(p) for moving up and d(own) for moving down.

Let's see an example of a nested calls:

# nested.py
from typing import Any, Dict

import requests


def get_response_json(url: str) -> Dict[str, Any]:
    response = requests.get(url)
    return response.json()


def main(domain: str) -> None:
    url = "https://" + domain
    data = get_response_json(url)


if __name__ == "__main__":
    main("example.com")

And debugging the inner most function:

python -m pdb nested.py
> /home/libor/nested.py(1)<module>()
-> from typing import Any, Dict
(Pdb) b 8
Breakpoint 1 at /home/libor/nested.py:8
(Pdb) c
> /home/libor/nested.py(8)get_response_json()
-> return response.json()
(Pdb) w
  /home/libor/venv/lib/bdb.py(597)run()
-> exec(cmd, globals, locals)
  <string>(1)<module>()
  /home/libor/nested.py(17)<module>()
-> main()
  /home/libor/nested.py(13)main()
-> data = get_response_json(url)
> /home/libor/nested.py(8)get_response_json()
-> return response.json()

At this moment we can see printed stack similarly as when our script crashes. Let's move up to the main function and see what was the value of domain variable

(Pdb) u
> /home/libor/nested.py(13)main()
-> data = get_response_json(url)
(Pdb) ll
 11     def main(domain: str) -> None:
 12         url = "https://" + domain
 13  ->     data = get_response_json(url)
(Pdb) p domain
'example.com'

We can also move back to the get_response_json function using d and at any point if we use w it will print the whole stack with an arrow indicating the frame we are currently in.

PDB cons

There are some cases where PDB won't be of much help.

First of all debugging C or cython code requires gdb or other c debuggers.

Debugging threads and multiprocesses in pdb doesn't work very well, there's no way to switch between threads, processes and continue execution of just that one.

Allternative debuggers

iPDB

iPDB is a third party debugger that makes the user experience somewhat better especially because it offers syntax highlighting. The commands are the same as for plain pdb.

It can be installed simply by pip install ipdb and invoked similarly as pdb, for example python -m ipdb navigating.py.

iPDB

I personally dislike it for its verbosity, especially when it comes to inspecting the stack. Let's see example:

python -m ipdb nested.py
> /home/libor/nested.py(1)<module>()
----> 1 from typing import Any, Dict
      2 
      3 import requests

ipdb> b 8
Breakpoint 1 at /home/libor/nested.py:8
ipdb> c
> /home/libor/nested.py(8)get_response_json()
      7     response = requests.get(url)
1---> 8     return response.json()
      9

ipdb> w

See the result here

iPDB stack

But I believe iPDB can be useful especially when it comes to learning with debugger.

PyCharm's debugger

PyCharm's debugger is great especially when it comes to debugging threads and multiprocesses. You can simply place breakpoint anywhere in the code and it stops the execution there. It also allows to switch between threads and processes and inspect their variables.

Conclusion

PDB is great debugger for most of the developer's work. It's also useful to be familiar with it because it comes bundled to every python installation, so whenever you have to debug a code outside your environment, it's possibly the only option you have.

Libor's blog

Use PDB over print

Simple example

Invoking PDB

PDB commands

Navigation

Listing source code

Breakpoints

Inspecting variables, evaluating expressions

Walking stack

PDB cons

Allternative debuggers

iPDB

PyCharm's debugger

Conclusion