Intel Labs, in collaboration with Blockade Labs, has announced the launch of the Latent Diffusion Model for 3D (LDM3D), a novel diffusion model that uses generative AI to create lifelike 3D visual content. LDM3D is the industry's first model to generate a depth map through the diffusion process itself, enabling the creation of immersive 360-degree 3D images. With the potential to revolutionize content creation, metaverse applications, and digital experiences in entertainment, gaming, architecture, and design, LDM3D represents a significant breakthrough.
Vasudev Lal, an AI/ML research scientist at Intel Labs, highlighted the power of generative AI to expand and enhance human creativity while saving valuable time. Most existing generative AI models, however, are limited to producing 2D images; only a few can generate 3D content from text prompts. Unlike existing latent diffusion models, LDM3D generates both an image and its depth map from a given text prompt while using a nearly identical parameter count. Because it provides more accurate relative depth for each pixel than standard depth estimation post-processing methods, LDM3D saves developers significant time in scene construction.
Intel emphasizes its commitment to democratizing AI by fostering an open ecosystem that allows wider adoption of AI's benefits. Recent years have seen significant advances in computer vision, particularly in generative AI, and LDM3D's joint image-and-depth generation is a natural next step beyond the 2D RGB output of today's text-to-image diffusion models.
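To make this concrete, here is a minimal sketch of what generating an RGB image and its matching depth map from a single prompt could look like through the Hugging Face diffusers library; the pipeline class and the Intel/ldm3d-4c checkpoint name are assumptions based on the planned open-source release, not details from the announcement.

```python
# Minimal sketch: joint RGB + depth generation with LDM3D.
# Assumes diffusers ships a StableDiffusionLDM3DPipeline and that
# Intel publishes a checkpoint under "Intel/ldm3d-4c"; both names
# are assumptions, not confirmed by the announcement.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "a quiet tropical beach at sunset, 360-degree panorama"
output = pipe(prompt)

# One pass through the model yields both modalities.
rgb_image = output.rgb[0]      # PIL image
depth_image = output.depth[0]  # PIL image encoding relative depth
rgb_image.save("beach_rgb.png")
depth_image.save("beach_depth.png")
```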
Moreover, this research has the potential to fundamentally change the way people interact with digital content, enabling users to experience their textual prompts in previously unimaginable ways. The images and depth maps generated by LDM3D allow users to transform quiet tropical beaches or futuristic worlds from science fiction into detailed 360-degree panoramas. This ability to capture depth information immediately enhances overall realism and immersion, opening doors for innovative applications in entertainment, gaming, interior design, real estate sales, virtual museums, and immersive virtual reality (VR) experiences.
Intel notes that LDM3D was trained on a subset of roughly 10,000 samples drawn from LAION-400M, a dataset of over 400 million image-caption pairs. The team used the Dense Prediction Transformer (DPT), a large-scale depth estimation model previously developed by Intel Labs, to annotate the training samples with depth; the DPT model provides highly accurate relative depth for each pixel in an image. LAION-400M was created specifically for research purposes, allowing researchers and interested communities to train and test models at scale. The LDM3D model itself was trained on an Intel AI supercomputer equipped with Intel Xeon processors and Intel Habana Gaudi AI accelerators. The combined output of generated RGB images and depth maps produces a 360-degree view, delivering an immersive experience.
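For reference, DPT models are already available through the Hugging Face transformers library, so the depth-annotation step can be sketched roughly as follows; the Intel/dpt-large checkpoint and the file name are illustrative choices, not necessarily the exact configuration used to build the LDM3D training data.

```python
# Sketch of per-pixel relative depth estimation with DPT, as used
# to annotate the training images. The "Intel/dpt-large" checkpoint
# and file name are illustrative assumptions.
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("training_sample.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    predicted_depth = model(**inputs).predicted_depth

# Upsample the prediction back to the original resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # PIL size is (W, H)
    mode="bicubic",
    align_corners=False,
).squeeze()
```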
To demonstrate the potential of LDM3D, Intel and Blockade researchers developed an application called DepthFusion, which creates immersive and interactive 360-degree experiences using standard 2D RGB photos and depth maps. DepthFusion utilizes the node-based visual programming language TouchDesigner, known for real-time interactive multimedia content, to transform textual prompts into interactive and immersive digital experiences. By enabling the creation of both RGB images and their corresponding depth maps with a single model, LDM3D reduces memory consumption and improves latency.
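DepthFusion itself is built in TouchDesigner, but the core idea of turning a flat panorama plus a depth map into a navigable scene can be illustrated with a short, self-contained sketch: treat each pixel of an equirectangular image as a direction on a sphere and push it outward by its depth value. This is a simplified illustration of the technique, not the actual DepthFusion implementation.

```python
import numpy as np

def panorama_to_points(rgb, depth):
    """Project an equirectangular RGB panorama and its depth map into
    a 3D point cloud (an illustrative simplification, not the actual
    DepthFusion code).

    rgb:   (H, W, 3) uint8 array
    depth: (H, W) float array of relative depths
    """
    h, w = depth.shape
    # Longitude spans [-pi, pi), latitude spans [pi/2, -pi/2].
    lon = (np.arange(w) / w - 0.5) * 2 * np.pi
    lat = (0.5 - np.arange(h) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Spherical direction scaled by per-pixel depth.
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)

    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    return points, colors
```

The resulting colored point cloud can be rendered from a camera placed at the origin, which is what gives the panorama its sense of parallax and depth as the viewpoint moves.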
Intel further highlights that the introduction of LDM3D and DepthFusion paves the way for advances in multi-view generative AI and computer vision. Continuing its push to explore the capabilities of generative AI and to build a robust open-source AI development ecosystem, Intel is currently working to open-source LDM3D in collaboration with Hugging Face. This initiative will enable AI researchers and practitioners to improve the system and fine-tune it for customized applications.
The introduction of LDM3D and DepthFusion marks a significant milestone in the advancement of multi-view generative AI and computer vision. Intel's dedication to pushing the boundaries of generative AI technology not only expands human creativity but also enables users to unlock new levels of interaction with digital content. By combining generative AI with depth maps, LDM3D revolutionizes the creation of immersive 3D visuals, offering a range of innovative applications in various industries, including entertainment, gaming, interior design, real estate, virtual museums, and immersive VR experiences.
As Intel Labs and Blockade Labs join forces to bring LDM3D to the forefront, the future holds promising possibilities for leveraging generative AI and 3D visualization to reshape the way we engage with and experience digital content. With the continuous advancements in AI technology, Intel remains at the forefront of driving innovation and empowering individuals and industries to harness the full potential of generative AI for creative expression and immersive experiences.