Show HN: UForm v2 – tiny CLIP-like embeddings in 21 languages and Graphcore API (github.com/unum-cloud)
16 points by vov_or on Aug 18, 2023 | 1 comment
I want to share the most recent model release we have prepared. It's a Vision-Language understanding Transformer.

It has 40% fewer parameters than vanilla CLIP while performing much better on text-to-image retrieval, where it also helps that our output embeddings have half as many dimensions (256 vs. 512).
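
To make the dimension point concrete, here is a minimal sketch of why smaller embeddings help retrieval. It is not UForm code: the helper names (`index_bytes`, `cosine`, `top_k`), the one-million-image collection, the float32 storage assumption, and the random vectors standing in for real embeddings are all hypothetical and chosen only for illustration.

```python
import math
import random

def index_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw size in bytes of a dense embedding matrix (float32 assumed)."""
    return num_vectors * dims * bytes_per_value

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, image_embeddings, k=3):
    """Indices of the k image embeddings most similar to the query."""
    scores = [cosine(query, e) for e in image_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical collection of one million images.
size_256 = index_bytes(1_000_000, 256)
size_512 = index_bytes(1_000_000, 512)

# Random stand-ins for real embeddings (assumption: 256 dimensions each).
rng = random.Random(0)
images = [[rng.gauss(0, 1) for _ in range(256)] for _ in range(100)]
query = images[42]  # pretend the text query embeds exactly onto image 42
best = top_k(query, images, k=3)
```

Under these assumptions the 256-dimensional index takes half the memory of the 512-dimensional one, and each similarity computation touches half as many values. Real text and image embeddings would come from the model rather than a random generator.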

Moreover, it supports 21 languages, including widely spoken ones such as English, Hindi, Chinese, and Arabic, as well as lower-resource languages like Ukrainian, Hebrew, and Armenian.

We have packaged the library for ONNX and CoreML, and we provide PyTorch inference code for CPUs and GPUs as well as PopTorch code for Graphcore IPUs.

Demo: http://usearch-images.com/
Blog: https://www.unum.cloud/blog/2023-08-17-uform-graphcore

Looking forward to your feedback!



I ran some tests and compared the results with the CLIP demo at https://huggingface.co/spaces/vivien/clip

CLIP seems to perform better for prompts like "three birds" and "man and woman".



