Python profiling data

Python comes with a built-in way of collecting profile information, documented at

In a nutshell, it boils down to:

python3 -m cProfile -o profile.out arguments…

This post is an attempt to document what's in profile.out, since the python documentation does not cover it. I have tried to figure this out by looking at the sources of the pstats module, the sources of gprof2dot, and trying to make sense of the decoded structure, and navigate it.

Decoding Python profile data

The data collected by python's profile/cProfile is a data structure encoding using the marshal module1.

Loading the data is simple:

>>> import marshal
>>> with open("profile.out", "rb") as fd:
>>>     data = marshal.load(fd)

Structure of Python profile data

Decoded profile data is a dictionary indexed by tuples representing functions or scopes:

# File name, Line number, Function or scope name
Func = Tuple[str, int, str]

For example:

("./staticsite/", 1, "<module>")
("./staticsite/features/", 138, "load_dir")
('~', 0, "<method 'items' of 'collections.OrderedDict' objects>")
('/usr/lib/python3.7/', 1156, 'stat')

('~', 0, …) represents a built-in function.

The signature of the decoded dict seems to be:

    # Function of scope
        # Aggregated statistics
        int, int, float, float,
        # Callers
            # Caller function or scope
            # Call statistics collected for this caller → callee
            Tuple[int, int, float, float]

The four int, int, float, float statistics are normally unpacked as cc, nc, tt, ct, and stand for:

Phrases in italic are quotations from profile's documentation.

Building a call graph

This is some example code to turn the collected statistics into a call graph structure:

class Node:
    Node in the caller → callee graph
    def __init__(self, func: Func):
        self.fname, self.lineno, self.scope = func
        self.callees: List["Call"] = []
        self.callers: List["Call"] = [] = 0 = 0 = 0.0
        self.ct = 0.0

    def __str__(self):
        # Builtin function
        if self.fname == "~" and self.lineno == 0:
            return f"[builtin]:{self.scope}"

        # Shorten file names from system libraries
        self.shortname = self.fname
        for path in sorted(sys.path, key=lambda x: -len(x)):
            if not path:
            if self.fname.startswith(path):
                return f"[sys]{self.fname[len(path):]}:{self.lineno}:{self.scope}"

        # File in the local project
        return f"{self.fname}:{self.lineno}:{self.scope}"

class Call:
    Arc in the caller → callee graph
    def __init__(self, caller: Node, callee: Node, cc: int, nc: int, tt: float, ct: float):
        self.caller = caller
        self.callee = callee = cc = nc = tt
        self.ct = ct

class Graph:
    Graph of callers and callees.

    Each node in the graph represents a function, with its aggregated

    Each arc in the graph represents each collected caller → callee statistics
    def __init__(self, stats: Stats):
        # Index of all nodes in the graph
        self.nodes: Dict[Func, Node] = {}
        # Total execution time
        self.total_time = 0.0

        # Build the graph
        for callee, (cc, nc, tt, ct, callers) in stats.items():
            self.total_time += tt

            # Get the callee and fill its aggregated stats
            ncallee = self.node(callee)
   = cc
   = nc
   = tt
            ncallee.ct = ct

            # Create caller → callee arcs
            for caller, (cc, nc, tt, ct) in callers.items():
                ncaller = self.node(caller)
                call = Call(ncaller, ncallee, cc, nc, tt, ct)

    def node(self, fun: Func) -> Node:
        Lookup or create a node
        res = self.nodes.get(fun)
        if res is None:
            res = Node(fun)
            self.nodes[fun] = res
        return res

    def load(cls, pathname: str) -> "Graph":
        Builds a Graph from profile statistics saved on a file
        with open(pathname, "rb") as fd:
            return cls(marshal.load(fd))

    def from_pstats(cls, stats: pstats.Stats) -> "Graph":
        Builds a Graph from an existing pstats.Stats structure
        return cls(stats.stats)

Other python profiling links

  1. marshal does not guarantee stability of the serialized format across python versions. This means that you can't, for example, use a python2 tool to examine profiling results from a python3 script, or vice versa. See for example Debian bug #946894