packages = ["pyflame", "opentelemetry-distro"]
terminal = false
[[fetch]]
from = "./load"
files = ["parsing-document-in-cpu-intensive-application.py"]
[[fetch]]
from = "./load"
files = ["ram_intensive_program.py"]
[[fetch]]
from = "./load"
files = ["ram_intensive_dummy_program.py"]
[[fetch]]
from = "./"
files = ["calls.py"]
[[fetch]]
from = "./"
files = ["otel_helpers.py"]
Python web performance 101: uncovering the root causes
Web engineers run into performance issues both while growing fast and while maintaining existing products. The problems always arrive unexpectedly, and we have limited time to make decisions. Together with our hero, we face real RAM, CPU and IO problems and learn troubleshooting approaches for monolithic and distributed systems.
We try different tools from the Python and cloud ecosystems, including but not limited to cProfile, yappi, memory-profiler and tracing.
This talk focuses on the backend side and is designed for intermediate-level web engineers, but all skill levels are welcome.
---
22 minutes.
Structure
Introduction (2 min)
CPU (5 min)
- time and timeit
- cProfile
- snakeviz things
- yappi
RAM (5 min)
- memory_profiler
IO (3 min)
- heavy load, processing big files
Tracing (4 min)
Conclusion (3 min)
Sustainable Python Performance
Uncovering the root causes
First almost white screen
With PyScript you can use the a and d keys to move left and right
Sustainable Python Performance
Uncovering the root causes
Second almost white screen
With PyScript you can use the a and d keys to move left and right
Sustainable Python Performance
Uncovering the root causes
By Alex Ptakhin
Latest slides
Agenda
- CPU tools
- RAM tools
- Briefly IO
- Tracing
Who at least once used timeit, time.perf_counter(), CPU or memory usage profilers?
Who at least once used timeit, time.perf_counter(), CPU or memory usage profilers?
htop
Temporary solution
Temporary solution
Scale-up: more CPU, more RAM
Now we have time to debug
CPU
time.perf_counter
import time
from calls import cpu_intensive_call
start = time.perf_counter()
cpu_intensive_call(num_iterations=5000000)
end = time.perf_counter()
print('Elapsed seconds: {:.1f}'.format(end - start))
time.perf_counter
- Works out of the box
- Requires code edits; no internal details
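The outline also mentions timeit; a minimal sketch of timing the same call with it (the number of runs is illustrative):
import timeit
from calls import cpu_intensive_call
# Run the call three times and report the mean; timeit disables GC while timing
elapsed = timeit.timeit(lambda: cpu_intensive_call(num_iterations=5000000), number=3)
print('Average seconds per run: {:.1f}'.format(elapsed / 3))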
cProfile
import cProfile
from calls import cpu_intensive_call
cProfile.run('cpu_intensive_call(num_iterations=5000000)')
cProfile
$ python -m cProfile \
-o out/cpu-intensive-program.prof \
load/cpu-intensive-program.py
$ snakeviz out/cpu-intensive-program.prof
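If you prefer staying in the terminal, the saved .prof file can also be inspected with the standard pstats module instead of snakeviz; a minimal sketch:
import pstats
# Load the profile written by `python -m cProfile -o ...` and show the 10 most expensive calls
stats = pstats.Stats('out/cpu-intensive-program.prof')
stats.sort_stats('cumulative').print_stats(10)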
cProfile
- Works out of the box
- Timing details for internal calls
cProfile
- Works out of the box
- Timing details for internal calls
- Has visualization extensions (e.g. snakeviz)
CPU Profilers
- Perfect! Is this all?
- Not exactly. Measuring something can change the behaviour of the system
CPU Profilers
- Perfect! Is this all?
- Not exactly. Measuring something can change the behaviour of the system
- Let's take a look at sampling profilers
py-spy
$ py-spy record -o out/py-spy.svg -- python load/cpu-intensive-program.py
Sampling profilers periodically collect stack traces instead of instrumenting every call
py-spy
- Sampling profiler. Requires development environment
Bonus: yappi for asyncio
$ python load/asyncio_yappi.py
Another very interesting profiler. Supports asynchronous execution
Bonus: yappi for asyncio
$ python load/asyncio_yappi.py > out/asyncio_yappi.txt
$ snakeviz out/asyncio_yappi.prof
Bonus: yappi for asyncio
- Supports asynchronous execution
- Different clock modes
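load/asyncio_yappi.py itself is not shown on the slides; a minimal sketch of what such a script could look like (the coroutine name and workload are illustrative):
import asyncio
import yappi

async def fetch_page() -> None:
    # Stand-in for an IO-bound task; wall-clock mode below also counts awaited time
    await asyncio.sleep(0.1)

async def main() -> None:
    await asyncio.gather(*(fetch_page() for _ in range(10)))

yappi.set_clock_type("wall")  # "cpu" would exclude time spent awaiting
yappi.start()
asyncio.run(main())
yappi.stop()
yappi.get_func_stats().print_all()
yappi.get_func_stats().save("out/asyncio_yappi.prof", type="pstat")  # viewable with snakeviz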
Problem found
Problem found
- With more data stored
- We caught a non-obvious iteration over many documents
After a few days
After a few days: failing processes and 500 errors.
htop
RAM
Temporary solution
Temporary solution
Restart every N requests
Temporary solution
Restart every N requests
Might also be good as the permanent solution :)
sys.getsizeof
import sys
print(f'Empty dict size: {sys.getsizeof({})}')
print(f'Empty list size: {sys.getsizeof([])}')
print(f'Empty set size: {sys.getsizeof(set())}')
sys.getsizeof
import sys
print(f'Empty list size: {sys.getsizeof([])}')
lorem = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam vitae nisl nisi. Donec malesuada luctus diam ac lacinia. Suspendisse porta dolor sem, id semper nibh tempor a. Proin porttitor nulla nec risus sollicitudin semper. Sed at lectus ante. Curabitur venenatis interdum malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ut nisl rhoncus, laoreet diam et, blandit elit. Maecenas non quam dictum, ullamcorper massa ac, egestas tortor. Suspendisse venenatis leo nisl, vel mollis turpis consequat nec. Suspendisse lobortis auctor ante id condimentum. In porta, dui ultricies placerat dapibus, lorem ante euismod mi, et pretium lectus lorem fringilla mauris. Mauris aliquet, odio ac euismod mollis, lacus dolor accumsan velit, eu dignissim felis arcu eu ex. Nunc consectetur et sapien non iaculis. Sed dictum tellus velit.'
print(f'List with long string size: {sys.getsizeof([lorem])}')
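Note that sys.getsizeof reports only the container itself (the list's pointer array), not the objects it references, which is why the long string barely changes the reported size. A small follow-up to the snippet above that makes this visible:
# Continuing from the snippet above: the string dominates, the list only stores a pointer to it
print(f'Long string size: {sys.getsizeof(lorem)}')
print(f'List + string size: {sys.getsizeof([lorem]) + sys.getsizeof(lorem)}')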
tracemalloc
import tracemalloc

def ram_intensive_dummy_call() -> list[int]:
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)  # temporary allocation, released before returning
    del b
    return a

tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()
data = ram_intensive_dummy_call()  # keep a reference so the allocation is still alive
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ Top 10 differences ]")
for stat in top_stats[:10]:
    print(stat)
memory-profiler
$ poetry add memory_profiler
$ python -m memory_profiler load/memory_profiler.py
memory-profiler
- Requires code changes for the detailed overview
memory-profiler
- Requires code changes for the detailed overview
- Uses deprecated matplotlib.pylab
- No longer maintained
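The code changes mentioned above are the @profile decorator; a minimal sketch of what load/memory_profiler.py could contain (the actual file is not shown here):
from memory_profiler import profile

@profile  # prints a line-by-line memory increment report when the function runs
def ram_intensive_call() -> list[int]:
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    ram_intensive_call()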
memray
$ poetry add memray
$ memray run -o out/memray.bin load/ram-intensive-program.py
$ memray flamegraph out/memray.bin
$ # ... out/memray-flamegraph-memray.html
memray
- Looks promising
- No Windows support
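memray can also be driven from Python code instead of the CLI; a minimal sketch:
import memray

# Writes the same capture format as `memray run`; analyse it with `memray flamegraph`
with memray.Tracker("out/memray-inline.bin"):
    data = [1] * (10 ** 7)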
IO
General advice
General advice
- Scale up
- Scale out
- Network
But it's not the whole story
Problems keep happening
Follow-up: what to do on a regular basis?
- Benchmark in CI pipelines
Follow-up: what to do on a regular basis?
- Benchmark in CI pipelines
- pyperf https://pyperf.readthedocs.io/en/latest/
Follow-up: what to do on a regular basis?
- Benchmark in CI pipelines
- pyperf https://pyperf.readthedocs.io/en/latest/
- pytest-benchmark https://pytest-benchmark.readthedocs.io/en/latest/
Follow-up: what to do on a regular basis?
- Benchmark in CI pipelines
- pyperf https://pyperf.readthedocs.io/en/latest/
- pytest-benchmark https://pytest-benchmark.readthedocs.io/en/latest/
- codspeed https://codspeed.io/
- Monitor production
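As a sketch of what a CI benchmark with pytest-benchmark could look like (the test name and target are illustrative):
# test_perf.py -- run with: pytest
from calls import cpu_intensive_call

def test_cpu_intensive_call(benchmark):
    # pytest-benchmark calibrates rounds automatically and reports min/mean/stddev
    benchmark(cpu_intensive_call, num_iterations=500000)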
Tracing
OpenTelemetry
from calls import cpu_intensive_call
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

if __name__ == '__main__':
    with tracer.start_as_current_span("cpu_intensive_call") as child:
        cpu_intensive_call(num_iterations=5000000)
OpenTelemetry
from otel_helpers import catchtime, init_otel
from opentelemetry import trace, metrics
from calls import cpu_intensive_call

init_otel()
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
execution_time_hgram = meter.create_histogram('execution_time')

with tracer.start_as_current_span("cpu_intensive_application") as parent:
    for x in range(3):
        with tracer.start_as_current_span("cpu_intensive_call") as child, catchtime() as t:
            cpu_intensive_call(num_iterations=5000000)
            execution_time_hgram.record(t())
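otel_helpers.py is fetched at the top of the page but not shown on the slides; a minimal sketch of what catchtime and a trace-only init_otel could look like (the real helpers may differ, and metrics exporter setup is omitted):
from contextlib import contextmanager
from time import perf_counter

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

def init_otel() -> None:
    # Export finished spans to the console; a real setup would use an OTLP exporter
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)

@contextmanager
def catchtime():
    # Yields a callable that returns the seconds elapsed since entering the block
    start = perf_counter()
    yield lambda: perf_counter() - start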
OpenTelemetry
Multiple vendors are supported, e.g. Grafana.
Alternatives
Grafana Stack: Loki, Prometheus.
Alternatives
Grafana Stack: Loki, Prometheus.
Cloud instrumentation.
3 things to remember
3 things to remember
- It's worth buying some time with extra resources
3 things to remember
- It's worth buying some time with extra resources
- Monitor application errors
3 things to remember
- It's worth buying some time with extra resources
- Monitor application errors
- Measuring something can change the behaviour of the system
4 things to remember
- It's worth buying some time with extra resources
- Monitor application errors
- Measuring something can change the behaviour of the system
- Tuning is good, but remember: pure Python is not about raw performance