Researchers at Purdue University have developed Graph-Based Topological Data Analysis (GTDA), a method for interpreting complex predictive models such as deep neural networks. Such models are often hard to inspect, and it is difficult to tell how well their predictions generalize. GTDA uses topological data analysis to turn an intricate prediction landscape into a simplified topological map.
Unlike traditional methods such as tSNE and UMAP, GTDA supports a more targeted inspection of model results. The method constructs a Reeb network, a discretization of topological structures that simplifies the data while respecting its topology. The construction builds on the mapper algorithm: a recursive splitting and merging procedure produces a discrete approximation of the Reeb graph. GTDA starts with a graph representing relationships among data points and uses lenses, such as the prediction matrix of a neural network, to guide the analysis. The recursive splitting strategy builds the bins in the multidimensional space defined by the lenses.
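The splitting step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_size` and `overlap` parameters, and the rule of splitting along the lens column with the largest spread are all assumptions made for illustration, and the merging of small pieces that the full procedure performs is left out. It takes a graph as an adjacency dictionary and a lens as one row of prediction scores per node, recursively splits large components into overlapping halves, and links the resulting groups wherever they share a data point.

```python
def connected_components(nodes, adj):
    """Connected components of the subgraph of `adj` induced by `nodes`."""
    nodes = set(nodes)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(frozenset(comp))
    return comps


def split_component(comp, lens, adj, overlap):
    """Split one component into two overlapping bins along a lens column.

    Assumption: the column with the largest value range is the one to split.
    """
    n_cols = len(lens[next(iter(comp))])

    def spread(col):
        vals = [lens[u][col] for u in comp]
        return max(vals) - min(vals)

    col = max(range(n_cols), key=spread)
    vals = [lens[u][col] for u in comp]
    lo, hi = min(vals), max(vals)
    if hi == lo:
        return [comp]  # the lens cannot separate these points
    mid, pad = (lo + hi) / 2, overlap * (hi - lo)
    left = [u for u in comp if lens[u][col] <= mid + pad]
    right = [u for u in comp if lens[u][col] >= mid - pad]
    # Each bin may fall apart into several connected pieces.
    return connected_components(left, adj) + connected_components(right, adj)


def reeb_network_sketch(adj, lens, max_size=3, overlap=0.1):
    """Recursively split components until each has at most `max_size` nodes."""
    work = connected_components(adj.keys(), adj)
    final = []
    while work:
        comp = work.pop()
        if len(comp) <= max_size:
            final.append(comp)
            continue
        pieces = split_component(comp, lens, adj, overlap)
        if any(len(p) == len(comp) for p in pieces):
            final.append(comp)  # no progress, so stop splitting here
            continue
        work.extend(pieces)
    groups = list(dict.fromkeys(final))  # drop duplicate groups
    # Two groups are linked when they share at least one data point.
    edges = {(i, j)
             for i in range(len(groups))
             for j in range(i + 1, len(groups))
             if groups[i] & groups[j]}
    return groups, edges


# Toy input: a path graph 0-1-...-7 with a two-class "prediction" lens.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)}
lens = {i: (i / 7, 1 - i / 7) for i in range(8)}
groups, edges = reeb_network_sketch(adj, lens)
```

On the toy path graph, the overlap between neighboring bins is what produces the links between groups, so the simplified network keeps the chain-like shape of the input.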
The researchers applied GTDA to Enformer, a transformer-based model designed to predict gene expression levels from DNA sequences. An analysis of harmful mutations in the BRCA1 gene demonstrated GTDA’s ability to highlight biologically relevant features. GTDA showed where in the DNA sequence the predictions are localized and provided insights into the impact of mutations in specific gene regions.
The GTDA framework also offers automatic error estimation, which outperforms model uncertainty in certain cases. An analysis of a chest X-ray dataset revealed incorrect diagnostic annotations, underscoring GTDA’s potential for identifying errors in deep learning datasets. The method was further applied to a pre-trained ResNet50 model on the Imagenette dataset, where it provided a visual taxonomy of images and uncovered mislabeled data points. The researchers demonstrated GTDA’s scalability by analyzing over a million images in ImageNet, which took about 7.24 hours.
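To give a feel for how a simplified graph can support error estimation, here is a deliberately simple sketch. It is an assumption made for illustration and not the estimator used in GTDA: the hypothetical function `estimate_errors` scores each prediction by the fraction of its graph neighbors with known labels that disagree with it.

```python
def estimate_errors(neighbors, predicted, known_labels):
    """Score each prediction by how often labeled neighbors disagree with it.

    neighbors:    node -> iterable of neighboring nodes in the simplified graph
    predicted:    node -> label predicted by the model
    known_labels: node -> trusted label (only for nodes that have one)
    Returns node -> disagreement fraction in [0, 1], or None when the node
    has no labeled neighbors to compare against.
    """
    scores = {}
    for node, pred in predicted.items():
        labeled = [known_labels[v]
                   for v in neighbors.get(node, ())
                   if v in known_labels]
        if not labeled:
            scores[node] = None  # no evidence either way
            continue
        disagree = sum(1 for label in labeled if label != pred)
        scores[node] = disagree / len(labeled)
    return scores


# Toy example with made-up nodes and labels.
neighbors = {"a": ["b", "c", "d"], "e": ["b"], "f": []}
predicted = {"a": "cat", "e": "dog", "f": "cat"}
known_labels = {"b": "dog", "c": "cat", "d": "dog"}
scores = estimate_errors(neighbors, predicted, known_labels)
```

A high score flags a prediction that sits among differently labeled neighbors, which is the kind of point worth checking for either a model error or a wrong annotation.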
The researchers compared GTDA with tSNE and UMAP across different datasets and found that GTDA gave more detailed insights. They also used the method to study chest X-ray diagnostics and to compare deep-learning frameworks, showcasing its versatility. GTDA offers a promising answer to the challenges of interpreting complex predictive models: by simplifying topological landscapes, it exposes how predictions are formed and helps identify biologically relevant features. Its scalability and applicability to diverse datasets make it a valuable tool for understanding and improving prediction models in various domains.
All credit for this research goes to the researchers of this project.
Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and she follows developments across different fields of AI and ML.