Python Web Performance 101: Uncovering the root causes

Sustainable Python Performance

Uncovering the root causes

By Alex Ptakhin

Tech Lead at Prestatech GmbH, Berlin

Agenda

CPU tools
RAM tools
Briefly IO
Tracing

Who has used timeit, time.perf_counter(), or a CPU or memory profiler at least once?

Image by Foundry Co from Pixabay

htop

Temporary solution

Scale-up: more CPU, more RAM

Now we have time to debug

CPU

time.perf_counter

import time

from calls import cpu_intensive_call

start = time.perf_counter()
cpu_intensive_call(num_iterations=5000000)
end = time.perf_counter()
print('Elapsed seconds: {:.1f}'.format(end - start))

time.perf_counter

Works out of the box
Requires editing the code and gives no detail about internal calls
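
timeit, mentioned in the opening question, is the standard-library option for timing small snippets: it repeats the call so that a single noisy run matters less. A minimal sketch, assuming a stand-in `cpu_intensive_call` (the talk's `calls` module is not shown) and a much smaller iteration count:

```python
import timeit


def cpu_intensive_call(num_iterations: int) -> int:
    """Hypothetical stand-in for the talk's calls.cpu_intensive_call."""
    total = 0
    for i in range(num_iterations):
        total += i * i
    return total


CALLS_PER_MEASUREMENT = 3

# Take 5 measurements, each timing 3 consecutive calls.
timings = timeit.repeat(
    lambda: cpu_intensive_call(num_iterations=50_000),
    repeat=5,
    number=CALLS_PER_MEASUREMENT,
)

# The minimum is usually the least noisy estimate of the real cost.
best_per_call = min(timings) / CALLS_PER_MEASUREMENT
print('Best seconds per call: {:.4f}'.format(best_per_call))
```

Like time.perf_counter, this still reports only the total time and says nothing about where it was spent.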

cProfile

import cProfile

from calls import cpu_intensive_call

cProfile.run('cpu_intensive_call(num_iterations=5000000)')

cProfile

$ python -m cProfile \
    -o out/cpu-intensive-program.prof \
    load/cpu-intensive-program.py
$ snakeviz out/cpu-intensive-program.prof

cProfile

Works out of the box
Reports timings for internal calls
Has visualization extensions such as snakeviz
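
When a visualizer is not at hand, the same data can be read in the terminal with pstats. A minimal sketch, assuming a stand-in `cpu_intensive_call` in place of the talk's `calls` module:

```python
import cProfile
import io
import pstats


def cpu_intensive_call(num_iterations: int) -> int:
    """Hypothetical stand-in for the talk's calls.cpu_intensive_call."""
    return sum(i * i for i in range(num_iterations))


# Profile only the region of interest instead of the whole program.
profiler = cProfile.Profile()
profiler.enable()
cpu_intensive_call(num_iterations=100_000)
profiler.disable()

# Sort by cumulative time and print at most the 5 most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time puts the outer call at the top, followed by the functions it spends its time in.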

CPU Profilers

Perfect! Is this all?
Not exactly. Measuring something can change the behaviour of the system
Let's take a look at sampling profilers

py-spy

$ py-spy record -o out/py-spy.svg -- python load/cpu-intensive-program.py

Sampling profilers capture stack traces of the running program at regular intervals instead of instrumenting every call

py-spy

Sampling profiler
Requires a development environment
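
The idea behind sampling can be sketched in-process with the standard library. This is not how py-spy works (py-spy inspects the target process from the outside); `sample_thread` and `busy_loop` are hypothetical names, and `sys._current_frames` is specific to CPython:

```python
import collections
import sys
import threading
import time


def sample_thread(thread_id, counts, stop, interval=0.001):
    """Periodically record which function the target thread is executing."""
    while not stop.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_loop(seconds):
    """Hypothetical workload: spin for the given wall-clock duration."""
    deadline = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_thread,
    args=(threading.get_ident(), counts, stop),
)
sampler.start()
busy_loop(0.3)
stop.set()
sampler.join()

# Functions that were on the CPU most often collect the most samples.
print(counts.most_common(3))
```

The profiled code is not modified and pays nothing per call; the cost is that very short functions may never be sampled.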

Bonus: yappi for asyncio

$ python load/asyncio_yappi.py > out/asyncio_yappi.txt
$ snakeviz out/asyncio_yappi.prof

Another very interesting profiler
Supports asynchronous execution
Different clock modes: CPU time and wall time
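
Why the clock mode matters can be shown without yappi, using the two standard-library clocks as stand-ins: waiting on IO consumes wall time but almost no CPU time. A minimal sketch, where `sleepy_call` is a hypothetical IO-like workload:

```python
import time


def sleepy_call(seconds: float) -> None:
    """Hypothetical IO-like workload: waits without using the CPU."""
    time.sleep(seconds)


wall_start = time.perf_counter()  # Wall clock: includes time spent waiting.
cpu_start = time.process_time()   # CPU clock: only time spent computing.
sleepy_call(0.2)
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print('Wall seconds: {:.3f}'.format(wall_elapsed))
print('CPU seconds:  {:.3f}'.format(cpu_elapsed))
```

A profile taken with a CPU clock would make this call look almost free, while a wall clock shows the full wait, which is usually what matters for asyncio code.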

Problem found

As the stored data grew,
we caught a non-obvious iteration over many documents

After a few days

Processes start failing and requests return 500 errors.

htop

RAM

Temporary solution

Restart every N requests
Might also be good as a permanent solution :)
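
If the application happens to run under Gunicorn (an assumption; the talk does not name the server), "restart every N requests" corresponds to its `max_requests` setting. A sketch of a `gunicorn.conf.py` with illustrative values:

```python
# gunicorn.conf.py (hypothetical deployment; all values are illustrative)

# Number of worker processes serving requests.
workers = 4

# Recycle a worker after it has handled this many requests,
# which caps how far a slow memory leak can grow.
max_requests = 1000

# Add a random offset of up to this many requests per worker,
# so that the workers do not all restart at the same moment.
max_requests_jitter = 100
```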

sys.getsizeof

import sys

print(f'Empty dict size: {sys.getsizeof({})}')
print(f'Empty list size: {sys.getsizeof([])}')
print(f'Empty set size: {sys.getsizeof(set())}')

sys.getsizeof

import sys

print(f'Empty list size: {sys.getsizeof([])}')

lorem = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam vitae nisl nisi. Donec malesuada luctus diam ac lacinia. Suspendisse porta dolor sem, id semper nibh tempor a. Proin porttitor nulla nec risus sollicitudin semper. Sed at lectus ante. Curabitur venenatis interdum malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ut nisl rhoncus, laoreet diam et, blandit elit. Maecenas non quam dictum, ullamcorper massa ac, egestas tortor. Suspendisse venenatis leo nisl, vel mollis turpis consequat nec. Suspendisse lobortis auctor ante id condimentum. In porta, dui ultricies placerat dapibus, lorem ante euismod mi, et pretium lectus lorem fringilla mauris. Mauris aliquet, odio ac euismod mollis, lacus dolor accumsan velit, eu dignissim felis arcu eu ex. Nunc consectetur et sapien non iaculis. Sed dictum tellus velit.'

print(f'List with long string size: {sys.getsizeof([lorem])}')
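
The list holding the long string reports almost the same size as the empty one because sys.getsizeof is shallow: it counts the list object and its pointers, not the objects they refer to. A minimal sketch of a recursive variant, where `deep_getsizeof` is a hypothetical helper that follows only the built-in containers:

```python
import sys


def deep_getsizeof(obj, seen=None) -> int:
    """Sum sys.getsizeof over an object and everything it contains.

    Follows only dict, list, tuple, set and frozenset;
    objects reachable through attributes are not visited.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0  # Count shared or self-referencing objects once.
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    return size


text = 'x' * 10_000
print(f'Shallow size: {sys.getsizeof([text])}')
print(f'Deep size: {deep_getsizeof([text])}')
```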

tracemalloc

import tracemalloc


def ram_intensive_dummy_call() -> list[int]:
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a


tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()
# Keep the result referenced so the list is still alive in the second snapshot.
result = ram_intensive_dummy_call()
snapshot2 = tracemalloc.take_snapshot()

top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)
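
Snapshots only see what is alive when they are taken, so the short-lived list `b` does not appear in the comparison. tracemalloc also tracks the peak, which does include it. A minimal sketch with smaller sizes than the slide uses:

```python
import tracemalloc


def ram_intensive_dummy_call() -> list:
    a = [1] * (10 ** 5)
    b = [2] * (2 * 10 ** 6)  # Short-lived allocation, 20x larger than a.
    del b
    return a


tracemalloc.start()
kept = ram_intensive_dummy_call()
# current: memory still allocated now; peak: the maximum since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f'Current: {current / 1024:.0f} KiB')
print(f'Peak: {peak / 1024:.0f} KiB')
```

A large gap between the two values points at temporary allocations, which are often what pushes a process over its memory limit.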

memory-profiler

$ poetry add memory_profiler
$ python -m memory_profiler load/memory_profiler.py

memory-profiler

Requires code changes for the detailed overview
Uses deprecated matplotlib.pylab
No longer maintained

memray

$ poetry add memray
$ memray run -o out/memray.bin load/ram-intensive-program.py
$ memray flamegraph out/memray.bin
$ # ... out/memray-flamegraph-memray.html

memray

Looks promising
No Windows support

IO

General advice

Scale up
Scale out
Network

But it's not the whole story

Problems keep happening

Follow-up: what to do on a regular basis?

Benchmark in CI pipelines

pyperf https://pyperf.readthedocs.io/en/latest/
pytest-benchmark https://pytest-benchmark.readthedocs.io/en/latest/
codspeed https://codspeed.io/

Monitor production

Tracing

OpenTelemetry

from calls import cpu_intensive_call
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

if __name__ == '__main__':
    with tracer.start_as_current_span("cpu_intensive_call") as child:
        cpu_intensive_call(num_iterations=5000000)

OpenTelemetry

from otel_helpers import catchtime, init_otel
from opentelemetry import trace, metrics
from calls import cpu_intensive_call

init_otel()
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
execution_time_hgram = meter.create_histogram('execution_time')

with tracer.start_as_current_span("cpu_intensive_application") as parent:
    for x in range(3):
        with tracer.start_as_current_span("cpu_intensive_call") as child, catchtime() as t:
            cpu_intensive_call(num_iterations=5000000)
        execution_time_hgram.record(t())
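
The `otel_helpers` module is not shown on the slides. One way `catchtime` could be written, judging only from how it is used above, is as a context manager that yields a callable returning the elapsed seconds. This is a guess at the helper, not its actual source:

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def catchtime() -> Iterator[Callable[[], float]]:
    """Yield a callable that returns the seconds spent inside the block.

    Hypothetical stand-in for otel_helpers.catchtime.
    """
    start = end = time.perf_counter()
    try:
        # The lambda reads `end` lazily, so after the block has finished
        # it sees the value assigned in the finally clause.
        yield lambda: end - start
    finally:
        end = time.perf_counter()


with catchtime() as t:
    time.sleep(0.05)

print('Elapsed seconds: {:.3f}'.format(t()))
```

With this version `t()` is meaningful only after the `with` block has exited; inside the block it returns 0.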

OpenTelemetry

Multiple vendors, e.g. Grafana.

Alternatives

Grafana Stack: Loki, Prometheus.

Cloud instrumentation.

4 things to remember

It is worth buying some time with extra resources when you can
Monitor application errors
Measuring something can change the behaviour of the system
Tuning is good, but remember that pure Python is not about performance

Thank you! Questions?

Acknowledgements

PyScript for powering the presentation
highlight.js for syntax highlighting
Prestatech GmbH for supporting this talk
Open-source developers, without whom this talk would not have been possible

By Alex Ptakhin, Tech Lead at Prestatech GmbH, Berlin. github.com/aptakhin / twitter.com/aptakhin / hachyderm.io/@AlexPtakhin / linkedin.com/in/aptakhin

Latest slides

https://aptakhin.name/talks/2023-Sustainable-Python-Performance/

Secret reference slide

https://docs.python.org/3/library/profile.html
Yappi clock types:
- CPU Time
- WALL Time
https://docs.python.org/3/library/tracemalloc.html
https://bloomberg.github.io/memray/
https://opentelemetry.io/ecosystem/vendors/
init_otel implementation (Python source)
