Hacker Newsnew | past | comments | ask | show | jobs | submit | benfrederickson's commentslogin

Sorry about leaving your program suspended - I believe that was because of a bug that should be fixed now https://github.com/benfred/py-spy/issues/390 =(


Well look at that! I guess I'll try it again with a newer version when I get a chance, thanks!


Slight tagent but do you and many others here appear mere hours after their project is mentioned? Pure luck or a large percentage of HN users are querying the api to see who mentions their projects?


Interesting I’ve had this happen with strace and/or gdb before when hitting control-c a few times.


thank you so much! it wasn't a big deal (if you remembered to check after using py-spy, granted). i'd take py-spy with this bug vs not having py-spy at all any day :)


> you can either race with the program and hope that you read its memory to get the function stack before it changes what function it's running (and you're likely to win the race, because C is faster than Python), or you can pause the program briefly while taking a sample.

Py-spy defaults to blocking because the results can be pretty wrong otherwise: https://github.com/benfred/py-spy/issues/56 . You can see this problem profiling a program like https://github.com/benfred/py-spy/blob/master/tests/scripts/... with or without the nonblocking flag in py-spy - the nonblocking version produces garbage output.

Somewhat interestingly, this problem doesn't seem to occur with Ruby - and rbspy can get away without pausing the target program with only minor errors seen when profiling a similar function. I suspect this is because of differences between how the Ruby and Python interpreters store call stack information, but haven't had a chance to dig into the specifics.


Interesting article. While I definitely think you should be profiling your code to figure out the hot spots, cProfile has some limitations for profiling: cProfile doesn't give you line numbers, doesn’t work with threads, and significantly slows your program down.

I wrote a tool py-spy (https://github.com/benfred/py-spy) that is worth checking out if you’re interesting in profiling python programs. Not only does it solve those problems with cProfile - py-spy also lets you generate a flamegraph, profile running programs in production, works with multiprocess python applications, can profile native python extensions etc.


Have you looked at Yappi[0]? I use it in combination with kcachegrind[1] (call graph viewer) and the combination has been extremely useful in eliminating bottlenecks across entire programs.

Side note: I also used pyreverse, now part of pylint, to diagram entire projects and get a class hierarchy. It helped tremendously in refactoring and decoupling code through whole projects, finding redundancies, and have a better architecture.

I'll have a look at py-spy. Thanks for that.

[0]: https://pypi.org/project/yappi/

[1]: https://kcachegrind.github.io/html/Home.html


Shoutout to Ben, py-spy is an amazing profiler. I believe cProfile has certain limitations and doesn't fully understand deep call stacks. py-spy does not have that limitation. It also offers multiple output formats (especially flamegraph and speedscope format, https://www.speedscope.app/) which make it so much nicer to identify bottlenecks.

At our company, py-spy has helped us a lot for our line-of-business application. I'm not affiliated with Ben in any way, but he deserves some praise for his work on py-spy.


Author here. It's worth noting that since I wrote this post, py-spy has gained the ability to profile multiprocess python applications - and can also now show local variables in the dump command.


I wrote something that will get you the python interpreter stack from any running cpython process : https://github.com/benfred/py-spy/ , and rbspy can do the same for ruby https://github.com/rbspy/rbspy


the name pyspy was taken on pypi already : https://github.com/tdfischer/pyspy =)


Thanks! Both of your suggestions totally make sense. I've created an issue to track the poll() issue here https://github.com/benfred/py-spy/issues/13 - I think that should be an easy fix.


I think a more robust solution might be to have counters for in-Python samples, outside-Python, and in-syscall.


Not yet - but I'm hoping to have a version that supports this next week. Will update this issue when it's done: https://github.com/benfred/py-spy/issues/3


Great, thanks!

And as a slightly different take than that of the person posting the issue - interfaces like kcachegrind are a pretty clunky (if powerful, in their clunky way) - the profiler coming with some built-in presentation and reporting of its own like the flamegraph and the realtime display is a big win and a serious deficiency in most python profilers.


Check out rbspy https://github.com/rbspy/rbspy (rbspy was the inspiration for this project =)


I analyzed the top 1 million robots.txt files looking for sites that allow google and block everyone else here: https://www.benfrederickson.com/robots-txt-analysis/ - it's a relatively common pattern for major websites


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: