Diffusion models stand out for their ability to generate high-quality images by learning to reverse a process that gradually transforms data into noise, an idea inspired by non-equilibrium thermodynamics. This noising process is central to how well these models perform, and it has become a key area of study in generative modeling and image synthesis, particularly as a lever for improving image quality.
A primary challenge in diffusion models is designing the noise schedule, which controls how much Gaussian noise is added to an image at each step of the forward process. Traditionally, this schedule is fixed in advance based on thermodynamic principles, which may limit the model's adaptability and performance. This raises a question: can diffusion models perform better if the noise schedule is learned from the data rather than fixed beforehand?
In practice, the noise schedule is usually fixed or treated as a hyperparameter. This standard approach, while principled, may not adapt to variations within a dataset, which suggests room for improvement. Although the schedule is critical for image quality, it has so far been applied in a one-size-fits-all manner that ignores differences between individual images.
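To make the "one-size-fits-all" point concrete, here is a minimal sketch of a conventional fixed schedule. It assumes the variational-diffusion convention in which a schedule is a function gamma(t) = -log SNR(t), with alpha_t^2 = sigmoid(-gamma) and sigma_t^2 = sigmoid(gamma); the endpoint values `GAMMA_MIN` and `GAMMA_MAX` and the linear shape are illustrative assumptions, not taken from the article. The key property is that every pixel of every image receives exactly the same noise level at a given time t.

```python
import math
import random

# Assumed log-SNR endpoints (illustrative values, not taken from the article).
GAMMA_MIN, GAMMA_MAX = -13.3, 5.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fixed_gamma(t: float) -> float:
    """Fixed schedule: gamma(t) = -log SNR(t), linear in t and shared by every image and pixel."""
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * t


def alpha_sigma(gamma: float) -> tuple[float, float]:
    """Variance-preserving parameterization: alpha^2 = sigmoid(-gamma), sigma^2 = sigmoid(gamma)."""
    return math.sqrt(sigmoid(-gamma)), math.sqrt(sigmoid(gamma))


def noise_image(x0: list[float], t: float, rng: random.Random) -> list[float]:
    """Sample x_t ~ N(alpha_t * x0, sigma_t^2 I); every pixel receives the same noise level."""
    alpha, sigma = alpha_sigma(fixed_gamma(t))
    return [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [0.5, -0.25, 1.0]  # a toy three-pixel "image"
    for t in (0.0, 0.5, 1.0):
        print(t, [round(v, 3) for v in noise_image(image, t, rng)])
```

Because `fixed_gamma` takes only `t` as input, nothing about the image can influence how it is noised; that is the limitation the learned approach targets.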
To address this, researchers at Cornell University introduced Multivariate Learned Adaptive Noise (MuLAN), a machine learning method that learns the diffusion process from data instead of relying on a fixed schedule. MuLAN extends classical diffusion models with three components: a polynomial noise schedule, a conditional noising process, and auxiliary-variable reverse diffusion. Together, these replace the conventional assumption of an invariant noise schedule with one that is learned and adapts to the data.
Because MuLAN learns the diffusion process from data, it can apply noise differently across different parts of an image. The approach is grounded in Bayesian inference: the forward diffusion process is treated as an approximate variational posterior whose parameters are learned. The multivariate aspect means that noise levels can vary across pixels, adapting to each image's specific characteristics. Concretely, the method combines a per-pixel polynomial noise schedule with a conditional noising process, augmented by auxiliary-variable reverse diffusion.
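The per-pixel, conditional schedule can be sketched as follows, under the same variational-diffusion conventions as a fixed schedule. This is an illustrative assumption rather than MuLAN's exact parameterization: each pixel gets coefficients (a, b, c), and its schedule is the degree-5 polynomial f(t) = integral from 0 to t of (a s^2 + b s + c)^2 ds, normalized so all pixels share the same endpoints. Integrating a square is one simple way to guarantee the schedule never decreases. `toy_coefficient_net` is a hypothetical stand-in for the learned network that would predict coefficients from a context vector `z` (for example, an auxiliary latent inferred from the image); the auxiliary-variable reverse diffusion itself is not shown.

```python
import math
import random

# Assumed log-SNR endpoints (illustrative values, not taken from the article).
GAMMA_MIN, GAMMA_MAX = -13.3, 5.0

Coeffs = tuple[float, float, float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def poly_integral(a: float, b: float, c: float, t: float) -> float:
    """f(t) = integral from 0 to t of (a s^2 + b s + c)^2 ds.

    The integrand is a square, so f is a degree-5 polynomial that never decreases in t.
    """
    return (a * a / 5) * t**5 + (a * b / 2) * t**4 + ((b * b + 2 * a * c) / 3) * t**3 + b * c * t**2 + c * c * t


def pixel_gamma(coeffs: Coeffs, t: float) -> float:
    """Per-pixel gamma(t): shared endpoints, pixel-specific monotonic path between them."""
    a, b, c = coeffs
    frac = poly_integral(a, b, c, t) / max(poly_integral(a, b, c, 1.0), 1e-12)
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * frac


def toy_coefficient_net(z: list[float], n_pixels: int) -> list[Coeffs]:
    """Hypothetical stand-in for a learned network mapping a context vector z
    (e.g. an auxiliary latent inferred from the image) to (a, b, c) for each pixel."""
    return [
        (z[0] * math.cos(i), z[1] * math.sin(i), 1.0 + abs(z[2]) * i / n_pixels)
        for i in range(n_pixels)
    ]


def noise_image_adaptive(x0: list[float], coeffs: list[Coeffs], t: float, rng: random.Random) -> list[float]:
    """Sample x_t pixel by pixel, each pixel with its own noise level at time t."""
    out = []
    for x, cf in zip(x0, coeffs):
        g = pixel_gamma(cf, t)
        alpha, sigma = math.sqrt(sigmoid(-g)), math.sqrt(sigmoid(g))
        out.append(alpha * x + sigma * rng.gauss(0.0, 1.0))
    return out


if __name__ == "__main__":
    coeffs = toy_coefficient_net([2.0, -1.0, 0.5], n_pixels=4)
    # At the same time t, each pixel sits at a different point on its own schedule.
    print([round(pixel_gamma(cf, 0.5), 2) for cf in coeffs])
    print(noise_image_adaptive([0.1, 0.2, 0.3, 0.4], coeffs, 0.5, random.Random(1)))
```

Pinning the endpoints keeps every pixel fully clean at t = 0 and equally noisy at t = 1, so only the path in between is learned; in a trained model the coefficients would be optimized jointly with the denoiser rather than produced by a toy formula.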
MuLAN achieves state-of-the-art density estimation on standard image datasets such as CIFAR-10 and ImageNet. This improvement is attributed primarily to MuLAN's ability to adapt the noise schedule to each image instance, which lets the model fit the data more closely.
MuLAN represents a considerable advance in diffusion models because it challenges the long-standing assumption that the noise schedule must be invariant. By learning how noise is applied, the model adapts to variations in the data, which improves image generation. The approach could pave the way for more nuanced and adaptable generative modeling techniques and for further progress in image synthesis with diffusion models.
Check out the Paper and Github. All credit for this research goes to the researchers of this project.
Muhammad Athar Ganaie, a consulting intern at MarktechPost, is a proponent of Efficient Deep Learning, with a focus on Sparse Training. Pursuing an M.Sc. in Electrical Engineering, specializing in Software Engineering, he blends advanced technical knowledge with practical applications. His current endeavor is his thesis on "Improving Efficiency in Deep Reinforcement Learning," showcasing his commitment to enhancing AI's capabilities. Athar's work stands at the intersection of "Sparse Training in DNNs" and "Deep Reinforcement Learning".