#2.1 Scalability
Research question:
How can we compute persistent homology on large graphs?
#2.2 Motivation: What if the network is too big?
Complexity
Large graphs
Computing higher-dimensional persistence for relatively large graphs can
take days or weeks.
Our approach
We aim to address this fundamental bottleneck in the application of TDA
to large networks by introducing two new efficient algorithms which
significantly reduce the cost of computing persistence diagrams (PD) for
large real-world networks:
CoralTDA and PrunIT.
#2.3 Filtration
Filtration types
In a network setting, PH is typically computed via the power filtration, or by
using different complexes (e.g., Vietoris-Rips or Čech complexes) to construct
the filtration for a given filtering function [11].
We focus on the most common methods to define PH for graphs: sub/superlevel filtrations obtained by a filtering function and the clique (flag) complexes.
Why sub/super level?
We can inject domain information into the PH process if the chosen
filtering function comes from the network domain (e.g., atomic number in
protein networks, transaction amount for blockchain networks).
Our results can be generalized to the persistent homology defined with a filtering function for different complexes.
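As a concrete illustration of a sublevel filtration on a graph, here is a minimal pure-Python sketch (a hypothetical helper, not the paper's implementation): vertices enter the filtration at their filtering-function value, edges at the maximum of their endpoints, and 0-dimensional persistence pairs follow from merging components by the elder rule.

```python
def zero_dim_persistence(vertices, edges, f):
    """0-dimensional persistence of the sublevel filtration of a graph:
    vertex v enters at f[v], edge (u, v) at max(f[u], f[v])."""
    parent, birth, pairs = {}, {}, []

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # process simplices by filtration value; vertices (dim 0) before edges at ties
    events = [(f[v], 0, v) for v in vertices] + \
             [(max(f[u], f[v]), 1, (u, v)) for u, v in edges]
    for val, dim, s in sorted(events):
        if dim == 0:
            parent[s] = s
            birth[s] = val
        else:
            ru, rv = find(s[0]), find(s[1])
            if ru != rv:
                if birth[ru] > birth[rv]:
                    ru, rv = rv, ru   # elder rule: younger component dies
                pairs.append((birth[rv], val))
                parent[rv] = ru
    # surviving components never die
    pairs += [(birth[r], float("inf")) for r in {find(v) for v in vertices}]
    return sorted(pairs)
```

For a path a-b-c with f = {a: 0, b: 1, c: 2}, the components born at 1 and 2 die immediately when the corresponding edges appear, and one component (born at 0) survives forever.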
#2.4 Coral Decomposition offers a way (CoralTDA)
Origins
Core decomposition provides a hierarchical ordering of nodes: the k-core of a
graph is its maximal subgraph in which every vertex has degree at least k
within the subgraph.
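Core decomposition itself is cheap to compute by iterative peeling. A minimal pure-Python sketch (hypothetical helper, not the paper's code), with the graph given as an adjacency dict:

```python
def k_core(adj, k):
    """Return the k-core of a graph given as {vertex: set_of_neighbors}:
    repeatedly delete vertices whose remaining degree is below k."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) < k:
                for u in adj.pop(v):       # remove v and its incident edges
                    if u in adj:
                        adj[u].discard(v)
                changed = True
    return adj
```

For example, a triangle {1, 2, 3} with a pendant vertex 4 attached to 3 has the triangle as its 2-core. In the CoralTDA setting, the (k+1)-core would then replace the full graph as the input to the k-th persistence computation.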
Key idea - CoralTDA
The k-th persistence diagram of a graph's clique complex is completely
determined by its (k+1)-core, so PD_k can be computed on the much smaller
(k+1)-core without any loss of information.
#2.4 Pruning Dominated Vertices offers a way (PrunIT)
Origins
In algebraic topology, homotopy is a very effective tool for computing
topological invariants like homology. If two spaces are homotopy
equivalent, then their corresponding topological invariants are the
same, i.e.,
\(X \simeq Y \Rightarrow H_k(X) = H_k(Y).\)
Key idea - PrunIT
Removing a dominated vertex (one whose closed neighborhood is contained in
the closed neighborhood of another vertex) does not change the persistence
diagrams of the clique complex, so dominated vertices can be pruned before
computing PH.
Example - PrunIT
Vertex 3 dominates vertices 1 and 2 because all neighbors of 1 and 2 are
also neighbors of 3. There are no other dominated vertices.
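The domination check from the example can be sketched in a few lines of pure Python (a hypothetical helper, not the paper's code): u is dominated by v when the closed neighborhood N[u] ∪ {u} is contained in N[v] ∪ {v}.

```python
def dominated_pairs(adj):
    """All (u, v) pairs where u is dominated by v, i.e. the closed
    neighborhood of u is a subset of the closed neighborhood of v."""
    closed = {v: set(ns) | {v} for v, ns in adj.items()}
    return [(u, v) for u in adj for v in adj
            if u != v and closed[u] <= closed[v]]
```

On a path 1-2-3, the center vertex 2 dominates both endpoints; PrunIT could remove either endpoint without affecting the persistence diagrams.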
#2.5 Evaluation - Datasets and Labels
TU Datasets (molecular, chemical, biological)
Single Graphs (citation, social, co-authorship)
Facebook, Twitter ego networks (social)
Stanford Open Graph Benchmark Datasets
Evaluation
CoralTDA vertex reduction in graph and node classification datasets
(higher is better). Reduction values are averages from graph instances
of the datasets (CORA and CITESEER node classification datasets contain
a single graph instance only).
FACEBOOK and TWITTER datasets are reduced by only 10% for k > 4, whereas graphs in the other datasets are reduced to empty sets.
#2.5 Evaluation - PrunIT Reduction in Vertices
Vertex reduction by the PrunIT algorithm in the superlevel filtration. Results are averages over graph instances from the datasets. | PrunIT reduction in the OGB node classification dataset. Each data point is an ego network. Even for large networks, time reduction rates can reach 75%.
Evaluation
The figures show reduction percentages by the PrunIT algorithm. FIRSTMM
and SYNNEW datasets are reduced by less than 10%; however, the other 11
datasets are reduced by at least 35%.
PrunIT reductions in the number of vertices and edges. | Vertex reduction results for 11 large datasets after applying the PrunIT and CoralTDA algorithms.
Evaluation
When we apply both CoralTDA and PrunIT on large networks, even for low
cores of 2 and 3, the combined algorithms reach a vertex reduction rate
of 78%.
Clustering coefficients vs. number of topological features in the Facebook and Twitter datasets. Each data point is a graph instance. We observe hundreds of higher-dimensional topological features in these datasets, which can be highly useful for various graph learning tasks.
Theory
Real world datasets
For a graph order of n = 1000, this implies that the average degree
should be between 31 and 100 to have a nontrivial second homology in
random networks. However, in real-life networks, our results show that
higher Betti numbers are prevalent in much sparser graphs.