[Python] Profiling your code
Apr. 12, 2021
“Why is my code taking so long to run?” If you ever feel like your application runs sluggishly and takes forever to finish, where do you start searching for an answer?
Maybe if you are experienced, know the problem at hand, and are an expert at code review and analysis, you may be quick to pinpoint certain snippets as the cause of the problem. However, it can be a tedious task that takes a lot of time and effort.
Here are some tools and techniques that will help you get to the answer more quickly. This exploratory process is called profiling: you can either time a few different modules, classes, or functions to see where most of the execution time is spent, or you can profile your code to get more information about the relative time spent in different functions and sub-functions.
Firstly, you need to know how to time your code. To do so, you can use something as simple as:
from time import time
start = time()
# your code snippets
end = time()
print(f"It took {end - start} seconds!")
If you want something for benchmarking purposes, you can use timeit for reasonably accurate results.
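As a sketch of how timeit is used from Python (the statement being timed here is just an arbitrary illustration):

```python
import timeit

# repeat() runs the statement `number` times per round and returns one
# total duration per round; taking the minimum filters out system noise.
durations = timeit.repeat("sorted(range(1000))", number=1_000, repeat=3)
print(f"best of 3 rounds: {min(durations):.4f} seconds for 1000 runs")
```

The same thing is available from the command line as `python -m timeit "sorted(range(1000))"`.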
From this you can find the total run time of the program and the run time of each piece in isolation. Then you can compare the pieces to each other and to the total time to see whether a specific piece of code can be labeled as the bottleneck.
Python has a module called cProfile that gives us more information. It is a C extension with reasonable overhead, which makes it suitable for profiling long-running programs (see The Python Profilers documentation).
cProfile
A profile is a set of statistics that describe how often and for how long various parts of the program executed. You can use the run() function to start profiling; simply pass what you want to profile as a string statement:
import cProfile
import re

import pandas as pd

cProfile.run("pd.Series(list('ABCDEF'))")

# you can save the result to a file
# by specifying a filename
cProfile.run('re.compile("foo|bar")', 'restats')
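Once the statistics are saved to a file, the standard-library pstats module can load, sort, and print them. A minimal sketch (the "restats" filename and the profiled statement are just illustrations):

```python
import cProfile
import pstats

# Profile a statement and save the raw stats to a file.
cProfile.run("sum(range(10_000))", "restats")

# Load the saved stats, sort by cumulative time, and show the top 5 entries.
stats = pstats.Stats("restats")
stats.sort_stats("cumulative").print_stats(5)
```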
It can also be invoked as a script to profile another script:
python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py)
The result is:
258 function calls (256 primitive calls) in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
Primitive calls are those that are not induced via recursion, and the report is sorted by standard name, which is the text in filename:lineno(function). The columns are:
- ncalls: the number of calls made
- tottime: the total time spent in the given function (excluding time spent in sub-functions)
- percall: tottime divided by ncalls
- cumtime: cumulative time spent in this function and all sub-functions
- percall: cumtime divided by primitive calls
- filename:lineno(function): the respective data of each function
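The same statistics can also be gathered programmatically with cProfile.Profile, which is convenient when you only want to profile a specific region of code. A small sketch (the workload is an arbitrary illustration):

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
# --- region being profiled: an arbitrary workload for illustration ---
total = sum(i * i for i in range(100_000))
# ---------------------------------------------------------------------
profiler.disable()

# Render the report into a string instead of printing straight to stdout.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(3)
report = buffer.getvalue()
print(report)
```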
memory-profiler
This is a Python module for monitoring the memory consumption of a process, as well as line-by-line analysis of memory usage.
Decorate the function with @profile, then run the script with the memory_profiler module (or import the decorator from memory_profiler):
@profile
def my_function():
    # allocate something so there is memory usage to report
    data = [x for x in range(10_000)]
    return data

if __name__ == '__main__':
    my_function()
python -m memory_profiler my_script.py
Fil - a memory profiler for data scientists
If your Python data pipeline is using too much memory, it can be very difficult to figure out where all that memory is going.
Data pipelines and servers
A data pipeline is a batch program that reads some data, processes it, and then writes it out. The impact of memory usage is different from that of a server:
- Server: because it keeps running, memory leaks are a common cause of memory problems. Most servers process only a small amount of data at a time, so memory usage by the actual business logic is usually less of a concern.
- Data pipelines: spikes in memory usage due to processing large chunks of data are a more common problem.