It uses diff-gaussian-rasterization from the original Gaussian splatting implementation (which is a linked git submodule, so if you're cloning the repo, remember to use --recursive to actually pull it down).
But that part is written mostly in pure CUDA.
It's only used to display the resulting splatted model, though, and there have been other cross-platform implementations for rendering splats – there was even that WebGL demo a few weeks ago [0] – so if one of those were used as the display output in place of the original implementation, there's no reason people couldn't use this on non-Nvidia hardware, I think.
edit: also, device=cuda is hardcoded in the torch portions of the training code (sigh!). This doesn't have to be the case; PyTorch could probably push this onto mps (Metal) just fine.
[0] https://github.com/antimatter15/splat?tab=readme-ov-file
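For what it's worth, making the device dynamic is a small change. A minimal sketch (my own, not code from the repo) of the usual fallback order, assuming a PyTorch build recent enough to have the MPS backend:

    import torch

    # Hypothetical helper (not in the repo): pick the best available backend
    # instead of hardcoding device="cuda".
    def pick_device() -> torch.device:
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():  # Apple Metal backend
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    # Every .to(device) / device= in the training code would then follow along.
    points = torch.randn(1000, 3, device=device)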
So if I'm tracking the progress correctly, now we should be able to do:
Single Image -> Gaussian Splats -> Object Identification -> [Nearest Known Object | Algo-based shell] Mesh Generation -> Use-Case-Based Retopology -> Style-Trained Mesh Transformation
Which would produce a new mesh in the style of your other meshes, based on a single photograph of a real-world object.
...and, at this speed, you could do that as a real-time(ish) import into a running application/game.
Gotta say, I'm looking forward to someone putting these puzzle pieces together! But it really does feel like if we wait another month, there might be some new AI that shrinks that pipeline by another one or two steps! It's an exhausting time to be excited!
I do wonder if we need to stop relying on meshes entirely. NeRFs and splats have potentially much richer representations of material and lighting response. Current hardware is very focused on triangles and bitmaps but GPUs are versatile beasts.
I don’t think the engines will switch their happy-paths to splats until artists have the proper tools to create assets with splats. As cool as generating splats with AI is, the assets in a AAA game must fit the art director’s vision, which means having artists in the loop.
I feel like the visual style of games will change as a result of generative AI, to be whatever style those AI models have a hard time generating. Essentially, the games that will stand out will be truly original, art-wise.
I suspect a lot of game/film artists would be very happy to go back to sculpting physical objects and taking a few photographs as opposed to building the models from scratch in the computer.
> I don’t think the engines will switch their happy-paths to splats until artists have the proper tools to create assets with splats.
Oh - I agree and it's a bit chicken and egg. I'm not expecting this shift to be quick (or even universal). But I do feel the need to put the idea out there that meshes might not be the be-all and end-all for games and other spatial media.
But only because we generally start with a mesh, and creating a low-res collision mesh from it reuses existing tooling. Mesh colliders aren't terribly ideal; you need a lot of triangles. SDFs can be a better choice in some cases.
Collisions could be handled separately (they already are: you don't use the render mesh for collisions). Maybe a separate mesh, maybe an SDF or similar.
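To make the SDF idea concrete, here's a toy sketch (my own example, not anything from the paper's code): a signed distance function gives you an inside/outside test, and the distance value itself, without any triangles at all.

    import math

    # Toy signed distance function for a sphere: negative inside, positive outside.
    def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
        return math.dist(point, center) - radius

    # A collision query is just an evaluation plus a threshold...
    def collides(point, sdf, margin=0.0):
        return sdf(point) <= margin

    # ...and the signed value doubles as a penetration/clearance distance.
    print(collides((0.5, 0.2, 0.1), sphere_sdf))  # True: point is inside the sphere
    print(sphere_sdf((2.0, 0.0, 0.0)))            # 1.0: one unit outside the surface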
UV mapping is a mesh thing. That's Stockholm Syndrome talking. ;-)
Probably a dumb question, but is this trained on lots of inputs of similar objects, or is it 'just' estimating from the look of the input image?
Like, if you have an image of a car, viewed at an angle, you can gauge the shape of the 3d object from the image itself. You could then assume that the hidden side of the car is similar to the side that you can see, and when you generate a 360 rotation animation of it, it will look pretty good (cars being roughly symmetrical). But if you gave it a flat image of a playing card, just showing the face up side, how would it reconstruct the reverse side? Would it infer it based on the front, or would it 'know' from training data that playing cards have a very different patterned back to them?
I came here to ask this. The output was impressive to the point of magic… until they showed whole grids full of fire hydrants and teddy bear training data.
Since it's based on 3D Gaussians in space, is there a way to obtain sharp images? Inherently, Gaussian functions extend infinitely, so images always look blurry. Don't they?
Of course, \sigma can be optimized to be small, but then it converges to some point representation, doesn't it?
Maybe some CV/ML people can help me understand this.
Yes. The main way to keep the images sharp is to render the models at near the same size & resolution they were captured at, or slightly smaller. It’s the same thing as zooming into an image: if you zoom in it gets blurry because the filtered pixels get too big; the highest frequency in the data is now zoom-factor pixels wide. If you zoom out, the Gaussian splat images become sharper automatically (and eventually you run into aliasing issues). The way to obtain sharp images if you want to zoom in is to let the NN hallucinate some high-frequency details based on what it learns about similar objects (or otherwise have external knowledge of the likely geometry and material properties not captured in the original image).
The theoretical Gaussian function is infinite, but splat rendering doesn’t use infinite extent, and that’s not really the reason images look blurry, nor do they always look blurry. (Lots of anti-aliasing pixel filters have theoretically infinite extent, but that doesn’t matter in practice, i.e., what matters is only sigma, not extent, provided the finite extent doesn’t cut off too early.) There is a near optimal range of Gaussian sizes for image sharpness that will antialias without overblurring. The capture / optimization process of opaque objects will probably produce Gaussians that are near this optimal size at the smallest, so if you render them back at the same size, it will stay near the optimal range. Generally, the optimizers we have so far tend to blur a little bit, which is why rendering the reconstruction slightly smaller than the captured image currently tends to sharpen things.
Thinking in 2D for a second, to get a nice crispy edge, you need a long and opaque splat to mark the boundary. Sometimes the long splat could wisp off leaving fuzzy artifacts.
Peyman Milanfar [1] suggested using bump functions instead. Bump functions would allow you to specify cut-off intervals but still make the whole function smooth and continuous (good for my gradient optimization freaks).
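For reference, the standard 1D bump function looks something like the sketch below (the general idea, not necessarily Milanfar's exact formulation): it's infinitely differentiable like a Gaussian, but exactly zero outside a chosen radius, so a splat built from it would have a hard cutoff without breaking gradient-based optimization.

    import numpy as np

    # Classic bump function: exp(-1 / (1 - (x/r)^2)) for |x| < r, exactly 0 elsewhere.
    # Smooth everywhere (all derivatives exist), but with compact support,
    # unlike a Gaussian, which never quite reaches zero.
    def bump(x, radius=1.0):
        x = np.asarray(x, dtype=float) / radius
        out = np.zeros_like(x)
        inside = np.abs(x) < 1.0
        out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
        return out

    print(bump([-2.0, 0.0, 0.5, 2.0]))  # [0., ~0.368, ~0.264, 0.]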
I guess this is how you'd implement that thing in Enemy Of The State where they pan around a single-perspective camera view (which I think doesn't come across as absurd in the movie anyway since the tech guys point out it's basically a clever extrapolation).
For anybody wanting to take a look at the code: this time the GitHub link does include it. The repo isn't empty, unlike what you typically get with these "too good to be true" publications.
Wouldn't it be more useful to generate a vector model than a "3D image" (voxels / radiance field / splats / whatever it's called)? Apart from the use case of "I want to spin the thing or walk around in it", they feel like they're of limited use.
Unlike, say, a crude model of a fire hydrant, which you could throw into a game or whatever – maybe if the model were fed some more constraints/assumptions? I think I saw a recent paper that generated meshes instead of pixels.
See my comment above about meshes. Games should adapt to new representations, not the other way round.
What do games need? Relighting, animation, collision. All of these can be done with non-mesh objects. At the moment it's all in its infancy compared to conventional 3D, but it won't stay that way for long.