packages = ["pyflame", "opentelemetry-distro"]
terminal = false
[[fetch]]
from = "./load"
files = ["parsing-document-in-cpu-intensive-application.py"]
[[fetch]]
from = "./load"
files = ["ram_intensive_program.py"]
[[fetch]]
from = "./load"
files = ["ram_intensive_dummy_program.py"]
[[fetch]]
from = "./"
files = ["calls.py"]
[[fetch]]
from = "./"
files = ["otel_helpers.py"]
Python web performance 101: uncovering the root causes
Web engineers run into performance issues both in fast-growing products and while maintaining existing ones. The problems always arrive unexpectedly, and there is limited time for decisions. Together with our hero, we face real-world RAM, CPU and IO problems and learn troubleshooting approaches for monolithic and distributed systems.
We try different existing tools from the Python and cloud ecosystems, including but not limited to cProfile, yappi, memory-profiler and tracing.
This talk focuses on the backend and is designed for intermediate-level web engineers, but all skill levels are welcome.
---
22 minutes.
Structure
Introduction (2 min)
CPU (5 min)
- time and timeit
- cProfile
- snakeviz things
- yappi
RAM (5 min)
- memory_profiler
IO (3 min)
- heavy load, processing big files
Tracing (4 min)
Conclusion (3 min)
    
Sustainable Python Performance
    Uncovering the root causes
    First almost white screen
    With PyScript, you can use the a and d keys to move left and right
    
    Sustainable Python Performance
    Uncovering the root causes
    By Alex Ptakhin
    
    Latest slides
     
 
    Agenda
    
        - CPU tools
- RAM tools
- Briefly IO
- Tracing
 
    Who has used timeit, time.perf_counter(), or a CPU or memory profiler at least once?
    
 
    htop
     
 
    
Temporary solution
    Scale up: more CPU, more RAM
    
        Now we have time to debug
    
 
    
CPU
    
    time.perf_counter
    
        import time
        from calls import cpu_intensive_call
        start = time.perf_counter()
        cpu_intensive_call(num_iterations=5000000)
        end = time.perf_counter()
        print('Elapsed seconds: {:.1f}'.format(end - start))
    
    
 
    time.perf_counter
    
        - Available out of the box
        - Requires editing the code and gives no internal details
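A single time.perf_counter measurement can be noisy. As a minimal sketch, the standard-library timeit module can repeat the measurement several times. Note that busy_loop is a hypothetical stand-in for cpu_intensive_call from calls.py, whose body is not shown in these slides, and the iteration counts are arbitrary.

```python
import timeit


def busy_loop(num_iterations: int) -> int:
    """Hypothetical stand-in for cpu_intensive_call: sums squares in pure Python."""
    total = 0
    for i in range(num_iterations):
        total += i * i
    return total


# Each measurement runs the call 3 times; the measurement is repeated 5 times.
timings = timeit.repeat(lambda: busy_loop(100_000), number=3, repeat=5)

# The minimum is usually the value least disturbed by other processes.
best_per_call = min(timings) / 3
print(f"Best of 5: {best_per_call:.4f} s per call")
```

Like time.perf_counter, this still reports only the total time and says nothing about where inside the call the time is spent.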
 
    cProfile
    
        import cProfile
        from calls import cpu_intensive_call
        # Profile the call and print per-function timings to stdout
        cProfile.run('cpu_intensive_call(num_iterations=5000000)')
    
    
 
    cProfile
    $ python -m cProfile \
    -o out/cpu-intensive-program.prof \
    load/cpu-intensive-program.py
$ snakeviz out/cpu-intensive-program.prof
     
 
    cProfile
    
        - Available out of the box
        - Shows timings of internal calls
        - Has visualization extensions
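The internal timings can also be read without a visualizer. The following is a small sketch that profiles a call with cProfile.Profile and prints the most expensive entries through pstats. Here busy_loop is a hypothetical stand-in for cpu_intensive_call, and the limit of 5 rows is an arbitrary choice.

```python
import cProfile
import io
import pstats


def busy_loop(num_iterations: int) -> int:
    """Hypothetical stand-in for cpu_intensive_call: sums squares in pure Python."""
    total = 0
    for i in range(num_iterations):
        total += i * i
    return total


# Collect statistics only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
busy_loop(200_000)
profiler.disable()

# Render the report into a string, sorted by cumulative time, top 5 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```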
 
    CPU Profilers
    
        - Perfect! Is this all?
        - Not exactly. Measuring something can change the behaviour of the system
        - Let's take a look at sampling profilers
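One way to see the measurement effect is to time the same function with and without cProfile attached. This is only a sketch: many_small_calls is a hypothetical workload chosen because it makes many function calls, which is where a deterministic profiler tends to add the most overhead. The actual numbers depend on the machine and the Python version.

```python
import cProfile
import time


def many_small_calls(n: int) -> int:
    """Hypothetical workload: n calls to a tiny function."""
    def add_one(x: int) -> int:
        return x + 1

    value = 0
    for _ in range(n):
        value = add_one(value)
    return value


def timed(func, *args) -> float:
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


# Baseline run without any profiler.
plain_seconds = timed(many_small_calls, 200_000)

# Same workload, but every call is recorded by cProfile.
profiler = cProfile.Profile()
profiled_seconds = timed(profiler.runcall, many_small_calls, 200_000)

print(f"Without profiler: {plain_seconds:.3f} s")
print(f"With cProfile:    {profiled_seconds:.3f} s")
```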
 
    py-spy
    
    $ py-spy record -o out/py-spy.svg -- python load/cpu-intensive-program.py
    
     
    
        Sampling profilers capture stack traces at regular intervals instead of instrumenting every call
    
 
    py-spy
    
        - Sampling profiler. Requires a development environment
 
    Bonus: yappi for asyncio
    
    $ python load/asyncio_yappi.py
    
    
        Another very interesting profiler. It supports asynchronous execution
    
 
    Bonus: yappi for asyncio
    $ python load/asyncio_yappi.py > out/asyncio_yappi.txt
$ snakeviz out/asyncio_yappi.prof
     
 
    Bonus: yappi for asyncio
    
        - Supports asynchronous execution
- Different clock modes
 
    
Problem found
    
        - Appears as the amount of stored data grows
        - We caught a non-obvious iteration through many documents
 
    After a few days
    
        After a few days: failing processes and HTTP 500 errors.
    
 
    htop
     
 
    
RAM
    
Temporary solution
    Restart every N requests
    Might also be good as a permanent solution :)
    sys.getsizeof
    
        import sys
        print(f'Empty dict size: {sys.getsizeof({})}')
        print(f'Empty list size: {sys.getsizeof([])}')
        print(f'Empty set size: {sys.getsizeof(set())}')
    
    
 
    sys.getsizeof
    
        import sys
        print(f'Empty list size: {sys.getsizeof([])}')
        lorem = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam vitae nisl nisi. Donec malesuada luctus diam ac lacinia. Suspendisse porta dolor sem, id semper nibh tempor a. Proin porttitor nulla nec risus sollicitudin semper. Sed at lectus ante. Curabitur venenatis interdum malesuada. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed ut nisl rhoncus, laoreet diam et, blandit elit. Maecenas non quam dictum, ullamcorper massa ac, egestas tortor. Suspendisse venenatis leo nisl, vel mollis turpis consequat nec. Suspendisse lobortis auctor ante id condimentum. In porta, dui ultricies placerat dapibus, lorem ante euismod mi, et pretium lectus lorem fringilla mauris. Mauris aliquet, odio ac euismod mollis, lacus dolor accumsan velit, eu dignissim felis arcu eu ex. Nunc consectetur et sapien non iaculis. Sed dictum tellus velit.'
        print(f'List with long string size: {sys.getsizeof([lorem])}')
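sys.getsizeof is shallow: for a list it counts the list object and its pointers, not the objects the list refers to. The sketch below adds up nested sizes for a few built-in containers. deep_getsizeof is a hypothetical helper written for this illustration, not a standard-library function, and it ignores custom objects and their attributes.

```python
import sys


def deep_getsizeof(obj, seen=None) -> int:
    """Rough recursive size: follows dicts, lists, tuples, sets and frozensets only."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        # Count each object once, even if it is referenced several times.
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(
            deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
            for key, value in obj.items()
        )
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


text = "x" * 1000
container = [text]
shallow = sys.getsizeof(container)
deep = deep_getsizeof(container)
print(f"Shallow size: {shallow} bytes")
print(f"Deep size:    {deep} bytes")
```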
    
    
 
    tracemalloc
    
        import tracemalloc
        def ram_intensive_dummy_call() -> list[int]:
            a = [1] * (10 ** 6)
            b = [2] * (2 * 10 ** 7)
            del b  # released before returning, so it does not appear in the diff
            return a
        tracemalloc.start()
        snapshot1 = tracemalloc.take_snapshot()
        # Keep the result referenced so its allocation is still alive in snapshot2
        data = ram_intensive_dummy_call()
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot1, 'lineno')
        print("[ Top 10 differences ]")
        for stat in top_stats[:10]:
            print(stat)
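A snapshot diff only shows what is still allocated at the second snapshot, so a large temporary list that was already deleted is invisible in it. As a small sketch, tracemalloc.get_traced_memory reports both the current and the peak traced size, which makes such temporary spikes visible. allocate_and_release is a hypothetical function with arbitrary sizes, smaller than the ones on the slide.

```python
import tracemalloc


def allocate_and_release() -> list:
    """Hypothetical workload: keeps a small list, drops a larger temporary one."""
    kept = [1] * (10 ** 5)
    temporary = [2] * (10 ** 6)
    del temporary
    return kept


tracemalloc.start()
kept = allocate_and_release()
# current: bytes still allocated now; peak: highest value since start().
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Current: {current / 1024:.0f} KiB, peak: {peak / 1024:.0f} KiB")
```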
    
    
 
    memory-profiler
    
    $ poetry add memory_profiler
$ python -m memory_profiler load/memory_profiler.py
    
    
    
 
    memory-profiler
    
        - Requires code changes for the detailed overview
- Uses deprecated matplotlib.pylab
- No longer maintained
 
    memray
    $ poetry add memray
$ memray run -o out/memray.bin load/ram-intensive-program.py
$ memray flamegraph out/memray.bin
$ # ... out/memray-flamegraph-memray.html
     
 
    memray
    
        - Looks promising
- No Windows support
 
    
IO
    
General advice
    
        - Scale up
- Scale out
- Network
 
    
But it's not the whole story
    
Problems keep happening
    Follow-up: what to do on a regular basis?
    
        - Benchmark in CI pipelines
            - pyperf https://pyperf.readthedocs.io/en/latest/
            - pytest-benchmark https://pytest-benchmark.readthedocs.io/en/latest/
            - codspeed https://codspeed.io/
        - Monitor production
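To illustrate the idea of a benchmark in CI, here is a rough sketch that uses only the standard-library timeit module as a stand-in for the tools listed above; it does not use pyperf, pytest-benchmark or codspeed. parse_numbers is a hypothetical function under test, and the 0.5 second budget is an assumed, deliberately generous value.

```python
import timeit


def parse_numbers(raw: str) -> list:
    """Hypothetical function under test: parses comma-separated integers."""
    return [int(part) for part in raw.split(",")]


def test_parse_numbers_stays_within_budget() -> None:
    raw = ",".join(str(i) for i in range(10_000))
    # 5 measurements of 10 calls each; take the least disturbed one.
    timings = timeit.repeat(lambda: parse_numbers(raw), number=10, repeat=5)
    best_per_call = min(timings) / 10
    budget_seconds = 0.5  # assumed budget, tune it for the real workload
    assert best_per_call < budget_seconds, f"too slow: {best_per_call:.4f} s"


# A test runner would discover the function by name; here it is called directly.
test_parse_numbers_stays_within_budget()
```

A fixed time budget is sensitive to how noisy the CI machine is, which is one reason dedicated benchmarking tools compare results between runs instead.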
 
    
Tracing
    OpenTelemetry
    
        from calls import cpu_intensive_call
        from opentelemetry import trace
        tracer = trace.get_tracer(__name__)
        if __name__ == '__main__':
            with tracer.start_as_current_span("cpu_intensive_call") as child:
                cpu_intensive_call(num_iterations=5000000)
    
 
    OpenTelemetry
    
        from otel_helpers import catchtime, init_otel
        from opentelemetry import trace, metrics
        from calls import cpu_intensive_call
        init_otel()
        tracer = trace.get_tracer(__name__)
        meter = metrics.get_meter(__name__)
        execution_time_hgram = meter.create_histogram('execution_time')
        with tracer.start_as_current_span("cpu_intensive_application") as parent:
            for x in range(3):
                with tracer.start_as_current_span("cpu_intensive_call") as child, catchtime() as t:
                    cpu_intensive_call(num_iterations=5000000)
                execution_time_hgram.record(t())
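The catchtime helper comes from otel_helpers.py, which is not shown on the slides. One possible shape for it, given how it is used above (the with block yields a callable t, and t() returns the elapsed seconds after the block ends), is sketched below. This is an assumption about the helper, not its actual source.

```python
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def catchtime():
    """Assumed implementation: yields a callable returning the elapsed seconds."""
    start = end = perf_counter()
    try:
        # The lambda reads `end` lazily, so after the block exits
        # it reports the final duration.
        yield lambda: end - start
    finally:
        # Runs even if the body raises, so the duration is always recorded.
        end = perf_counter()


with catchtime() as elapsed:
    sleep(0.05)

print(f"Elapsed seconds: {elapsed():.3f}")
```

With this shape, calling the callable inside the with block returns 0, so the value should be read only after the block has finished, as the slide does.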
    
    
 
    OpenTelemetry
    Multiple vendors, e.g. Grafana.
     
 
    Alternatives
    Grafana Stack: Loki, Prometheus.
    Cloud instrumentation.
 
    
3 things to remember
    3 4 things to remember
    
        - It is worth buying some time with extra resources
        - Monitor application errors
        - Measuring something can change the behaviour of the system
        - Tuning is good, but remember that pure Python is not about performance