Slight tangent, but how do you and many others here appear mere hours after your project is mentioned? Is it pure luck, or are a large percentage of HN users querying the API to see who mentions their projects?
thank you so much! it wasn't a big deal (if you remembered to check after using py-spy, granted). i'd take py-spy with this bug vs not having py-spy at all any day :)
> you can either race with the program and hope that you read its memory to get the function stack before it changes what function it's running (and you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample.
Somewhat interestingly, this problem doesn't seem to occur with Ruby - and rbspy can get away without pausing the target program with only minor errors seen when profiling a similar function. I suspect this is because of differences between how the Ruby and Python interpreters store call stack information, but haven't had a chance to dig into the specifics.
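To make the sampling idea a bit more concrete, here's a toy sketch of a sampling profiler. Big caveat: this is not how py-spy or rbspy work. They read another process's memory from the outside, while this runs a sampler thread inside the same interpreter and uses `sys._current_frames()` to peek at the main thread's current frame. The names `sample_stacks`, `busy_work` and `profile_with_sampling` are made up for illustration.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, stop_event, counts, interval=0.001):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        # Snapshot of the topmost Python frame of every running thread.
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work(duration=0.2):
    """Hypothetical hot function: spin for roughly `duration` seconds."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


def profile_with_sampling(func):
    """Run func() while a background thread samples the calling thread."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.get_ident(), stop_event, counts),
    )
    sampler.start()
    try:
        func()
    finally:
        stop_event.set()
        sampler.join()
    return counts


counts = profile_with_sampling(busy_work)
print(counts.most_common(3))
```

The counts are only a statistical picture: functions that run for longer show up in more samples, and anything shorter than the sampling interval can be missed entirely.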
Interesting article. While I definitely think you should be profiling your code to figure out the hot spots, cProfile has some limitations for profiling: cProfile doesn't give you line numbers, doesn’t work with threads, and significantly slows your program down.
I wrote a tool py-spy (https://github.com/benfred/py-spy) that is worth checking out if you’re interested in profiling python programs. Not only does it solve those problems with cProfile - py-spy also lets you generate a flamegraph, profile running programs in production, works with multiprocess python applications, can profile native python extensions etc.
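For anyone who hasn't used it, this is roughly what the cProfile baseline being compared against looks like. It's a minimal sketch using only the standard library; `slow_sum` and `profile_to_text` are hypothetical names made up for the example. Note that the report is aggregated per function (each row names a function and where it is defined), not per executed line.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_text(func, *args):
    """Run func(*args) under cProfile and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most the top 10 entries.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_to_text(slow_sum, 100_000)
print(report)
```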
Have you looked at Yappi[0]? I use it in combination with kcachegrind[1] (call graph viewer) and the combination has been extremely useful in eliminating bottlenecks across entire programs.
Side note: I also used pyreverse, now part of pylint, to diagram entire projects and get a class hierarchy. It helped tremendously in refactoring and decoupling code through whole projects, finding redundancies, and arriving at a better architecture.
Shoutout to Ben, py-spy is an amazing profiler. I believe cProfile has certain limitations and doesn't fully understand deep call stacks. py-spy does not have that limitation. It also offers multiple output formats (especially flamegraph and speedscope format, https://www.speedscope.app/) which make it so much nicer to identify bottlenecks.
At our company, py-spy has helped us a lot for our line-of-business application. I'm not affiliated with Ben in any way, but he deserves some praise for his work on py-spy.
Author here. It's worth noting that since I wrote this post, py-spy has gained the ability to profile multiprocess python applications - and can also now show local variables in the dump command.
Thanks! Both of your suggestions totally make sense. I've created an issue to track the poll() issue here https://github.com/benfred/py-spy/issues/13 - I think that should be an easy fix.
And as a slightly different take than that of the person posting the issue - interfaces like kcachegrind are pretty clunky (if powerful, in their clunky way). A profiler that comes with some built-in presentation and reporting of its own, like the flamegraph and the realtime display, is a big win, and the lack of that is a serious deficiency in most python profilers.
I analyzed the top 1 million robots.txt files looking for sites that allow google and block everyone else here: https://www.benfrederickson.com/robots-txt-analysis/ - it's a relatively common pattern for major websites
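If you want to check a single site for this pattern yourself, here's a rough sketch using the standard library's `urllib.robotparser`. This is not necessarily how the analysis in the linked post was done. The helper name `allows_only_google`, the placeholder agent `SomeOtherBot` and the sample robots.txt are all made up for illustration, and it only tests the root path.

```python
import urllib.robotparser

# Made-up robots.txt showing the "allow Google, block everyone else" pattern.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allows_only_google(robots_txt, other_agent="SomeOtherBot"):
    """Return True if Googlebot may fetch "/" but a generic crawler may not."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    google_ok = parser.can_fetch("Googlebot", "/")
    other_ok = parser.can_fetch(other_agent, "/")
    return google_ok and not other_ok


print(allows_only_google(EXAMPLE_ROBOTS_TXT))
```

A real survey would also need to fetch each file, handle missing or malformed ones, and look at more than the root path.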