It uses diff-gaussian-rasterization from the original Gaussian splatting implementation (which is a linked git submodule, so if you're cloning the repo, remember to use --recursive to actually pull it down).
But that part is written mostly in pure CUDA.
It's only used to display the resulting splatted model, though, and there have been other cross-platform implementations for rendering splats – there was even that WebGL demo a few weeks ago [0] – so if one of those were used as the display output in place of the original implementation, there's no reason people couldn't use this on non-Nvidia hardware, I think.
edit: also, device=cuda is hardcoded in the torch portions of the training code (sigh!). This doesn't have to be the case; PyTorch could probably push this onto mps (Metal) just fine.
[0] https://github.com/antimatter15/splat?tab=readme-ov-file
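For what it's worth, making the device dynamic is a small change. A minimal sketch (my own, not code from the repo) of the usual fallback order, assuming a PyTorch build recent enough to have the MPS backend:

    import torch

    # Hypothetical helper (not in the repo): pick the best available backend
    # instead of hardcoding device="cuda".
    def pick_device() -> torch.device:
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():  # Apple Metal backend
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    # Every .to(device) / device= in the training code would then follow along.
    points = torch.randn(1000, 3, device=device)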
So if I'm tracking the progress correctly, now we should be able to do:
Single Image -> Gaussian Splats -> Object Identification -> [Nearest Known Object | Algo-based shell] Mesh Generation -> Use-Case-Based Retopology -> Style-Trained Mesh Transformation
Which would produce a new mesh in the style of your other meshes, based on a single photograph of a real-world object.
...and, at this speed, you could do that as a real-time(ish) import into a running application/game.
Gotta say, I'm looking forward to someone putting these puzzle pieces together! But it really does feel like if we wait another month, there might be some new AI that shrinks that pipeline by another one or two steps! It's an exhausting time to be excited!
I do wonder if we need to stop relying on meshes entirely. NeRFs and splats have potentially much richer representations of material and lighting response. Current hardware is very focused on triangles and bitmaps but GPUs are versatile beasts.
I don’t think the engines will switch their happy-paths to splats until artists have the proper tools to create assets with splats. As cool as generating splats with AI is, the assets in a AAA game must fit the art director’s vision, which means having artists in the loop.
I feel like the visual style of games will change as a result of generative AI, to be whatever style those AI models have a hard time generating. Essentially, the games that will stand out will be truly original, art-wise.
I suspect a lot of game/film artists would be very happy to go back to sculpting physical objects and taking a few photographs as opposed to building the models from scratch in the computer.
> I don’t think the engines will switch their happy-paths to splats until artists have the proper tools to create assets with splats.
Oh - I agree and it's a bit chicken and egg. I'm not expecting this shift to be quick (or even universal). But I do feel the need to put the idea out there that meshes might not be the be-all and end-all for games and other spatial media.
But only because we generally start with a mesh, and creating a low-res collision mesh from it reuses existing tooling. Mesh colliders aren't terribly ideal; you need a lot of triangles. SDFs can be a better choice in some cases.
Collisions could be handled separately (they already are: you don't use the render mesh for collisions). Maybe a separate mesh, maybe an SDF or similar.
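To make the SDF idea concrete, here's a toy sketch (my own example, not anything from the paper's code): a signed distance function gives you an inside/outside test, and the distance value itself, without any triangles at all.

    import math

    # Toy signed distance function for a sphere: negative inside, positive outside.
    def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
        return math.dist(point, center) - radius

    # A collision query is just an evaluation plus a threshold...
    def collides(point, sdf, margin=0.0):
        return sdf(point) <= margin

    # ...and the signed value doubles as a penetration/clearance distance.
    print(collides((0.5, 0.2, 0.1), sphere_sdf))  # True: point is inside the sphere
    print(sphere_sdf((2.0, 0.0, 0.0)))            # 1.0: one unit outside the surface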
UV mapping is a mesh thing. That's Stockholm Syndrome talking. ;-)
Probably a dumb question, but is this trained on lots of inputs of similar objects, or is it 'just' estimating from the look of the input image?
Like, if you have an image of a car, viewed at an angle, you can gauge the shape of the 3d object from the image itself. You could then assume that the hidden side of the car is similar to the side that you can see, and when you generate a 360 rotation animation of it, it will look pretty good (cars being roughly symmetrical). But if you gave it a flat image of a playing card, just showing the face up side, how would it reconstruct the reverse side? Would it infer it based on the front, or would it 'know' from training data that playing cards have a very different patterned back to them?
I came here to ask this. The output was impressive to the point of magic… until they showed whole grids full of fire hydrants and teddy bear training data.
Since it's based on 3D Gaussians in space, is there a way to obtain sharp images? Inherently, Gaussian functions extend infinitely, so images always look blurry. Don't they?
Of course, \sigma can be optimized to be small, but then it converges to some point representation, doesn't it?
Maybe some CV/ML people can help me understand this.
Yes. The main way to keep the images sharp is to render the models at near the same size & resolution they were captured at, or slightly smaller. It’s the same thing as zooming into an image: if you zoom in it gets blurry because the filtered pixels get too big; the highest frequency in the data is now zoom-factor pixels wide. If you zoom out, the Gaussian splat images become sharper automatically (and eventually you run into aliasing issues). The way to obtain sharp images if you want to zoom in is to let the NN hallucinate some high-frequency details based on what it learns about similar objects (or otherwise have external knowledge of the likely geometry and material properties not captured in the original image).
The theoretical Gaussian function is infinite, but splat rendering doesn’t use infinite extent, and that’s not really the reason images look blurry, nor do they always look blurry. (Lots of anti-aliasing pixel filters have theoretically infinite extent, but that doesn’t matter in practice, i.e., what matters is only sigma, not extent, provided the finite extent doesn’t cut off too early.) There is a near optimal range of Gaussian sizes for image sharpness that will antialias without overblurring. The capture / optimization process of opaque objects will probably produce Gaussians that are near this optimal size at the smallest, so if you render them back at the same size, it will stay near the optimal range. Generally, the optimizers we have so far tend to blur a little bit, which is why rendering the reconstruction slightly smaller than the captured image currently tends to sharpen things.
Thinking in 2D for a second, to get a nice crispy edge, you need a long and opaque splat to mark the boundary. Sometimes the long splat could wisp off leaving fuzzy artifacts.
Peyman Milanfar [1] suggested using bump functions instead. Bump functions would allow you to specify cut-off intervals but still make the whole function smooth and continuous (good for my gradient optimization freaks).
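For reference, the standard 1D bump function looks something like the sketch below (the general idea, not necessarily Milanfar's exact formulation): it's infinitely differentiable like a Gaussian, but exactly zero outside a chosen radius, so a splat built from it would have a hard cutoff without breaking gradient-based optimization.

    import numpy as np

    # Classic bump function: exp(-1 / (1 - (x/r)^2)) for |x| < r, exactly 0 elsewhere.
    # Smooth everywhere (all derivatives exist), but with compact support,
    # unlike a Gaussian, which never quite reaches zero.
    def bump(x, radius=1.0):
        x = np.asarray(x, dtype=float) / radius
        out = np.zeros_like(x)
        inside = np.abs(x) < 1.0
        out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
        return out

    print(bump([-2.0, 0.0, 0.5, 2.0]))  # [0., ~0.368, ~0.264, 0.]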
I guess this is how you'd implement that thing in Enemy Of The State where they pan around a single-perspective camera view (which I think doesn't come across as absurd in the movie anyway since the tech guys point out it's basically a clever extrapolation).
For anybody wanting to take a look at the code: this time the GitHub link does include it. The repo isn't empty, unlike what you typically get with these "too good to be true" publications.
Wouldn't it be more useful to generate a vector model than a "3D image" (voxels / radiance field / splats / whatever it's called)? Apart from the use case of "I want to spin the thing or walk around in it", they feel like they're of limited use.
Unlike, say, a crude model of a fire hydrant, which you could throw into a game or whatever – maybe if the model were fed some more constraints/assumptions? I think I saw a recent paper that generated meshes instead of pixels.
See my comment above about meshes. Games should adapt to new representations, not the other way round.
What do games need? Relighting, animation, collision. All of these can be done with non-mesh objects. At the moment it's all in its infancy compared to conventional 3D, but it won't stay that way for long.