Learning to count: A deep learning framework for graphlet count estimation

Published online by Cambridge University Press:  11 September 2020

Xutong Liu*
Affiliation:
The Chinese University of Hong Kong, Shatin, NT, Hong Kong (e-mail: cslui@cse.cuhk.edu.hk)
Yu-Zhen Janice Chen
Affiliation:
University of Massachusetts Amherst, MA 01002, USA (e-mail: yuzhenchen@cs.umass.edu)
John C. S. Lui
Affiliation:
The Chinese University of Hong Kong, Shatin, NT, Hong Kong (e-mail: cslui@cse.cuhk.edu.hk)
Konstantin Avrachenkov
Affiliation:
INRIA Sophia Antipolis, 06902 Valbonne, France (e-mail: k.avrachenkov@inria.fr)
*Corresponding author. Email: liuxt@cse.cuhk.edu.hk

Abstract

Graphlet counting is a widely explored problem in network analysis and has been successfully applied in many domains, most notably bioinformatics, social science, and infrastructure network studies. Efficiently computing graphlet counts remains challenging due to combinatorial explosion: a naive enumeration algorithm needs O(N^k) time for k-node graphlets in a network of size N. Recently, many works have introduced carefully designed combinatorial and sampling methods with encouraging results. However, the existing methods ignore the fact that graphlet counts and graph structural information are correlated. They always treat a graph as a new input and repeat the tedious counting procedure even if the graph is similar or exactly isomorphic to previously studied graphs. This provides an opportunity to speed up graphlet count estimation by exploiting this correlation via learning methods. In this paper, we raise a novel graphlet count learning (GCL) problem: given a set of historical graphs with known graphlet counts, how can we learn to estimate/predict graphlet counts for unseen graphs coming from the same (or a similar) underlying distribution? We develop a deep learning framework, which contains two convolutional neural network models and a series of data preprocessing techniques, to solve the GCL problem. Extensive experiments are conducted on three types of synthetic random graphs and three types of real-world graphs for all 3-, 4-, and 5-node graphlets to demonstrate the accuracy, efficiency, and generalizability of our framework. Compared with state-of-the-art exact/sampling methods, our framework shows great potential: it can offer up to two orders of magnitude speedup on synthetic graphs and achieves comparable speed on real-world graphs with competitive accuracy.
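
To make the O(N^k) cost of naive enumeration concrete, the following is a minimal illustrative sketch for k = 3. It is not the authors' method or any of the baselines they compare against; the function name count_3node_graphlets and the edge-list input format are assumptions made here for illustration only. It inspects every 3-node subset, so the loop runs over N choose 3 subsets, which grows as O(N^3).

```python
from itertools import combinations


def count_3node_graphlets(n, edges):
    """Naively count connected induced 3-node subgraphs of an undirected graph.

    Hypothetical helper for illustration: nodes are 0..n-1 and `edges` is an
    iterable of (u, v) pairs. Every 3-node subset is inspected, so the running
    time grows as O(n^3).
    """
    # Store each undirected edge as a frozenset so (u, v) and (v, u) match.
    edge_set = {frozenset(e) for e in edges}
    counts = {"wedge": 0, "triangle": 0}
    for trio in combinations(range(n), 3):
        # Number of edges present among the three chosen nodes.
        m = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        if m == 2:
            counts["wedge"] += 1      # open path on three nodes
        elif m == 3:
            counts["triangle"] += 1   # all three nodes mutually connected
        # m < 2 means the induced subgraph is disconnected: not a graphlet.
    return counts


# Example: a triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2.
example = count_3node_graphlets(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

On this small example the function finds one triangle (nodes 0, 1, 2) and two wedges (nodes 0, 2, 3 and nodes 1, 2, 3). Extending the same brute-force idea to 4- and 5-node graphlets multiplies the number of subsets by roughly N each time, which is the combinatorial explosion the abstract refers to.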

Type
Research Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press

Footnotes

Special Issue Editor: Hocine Cherifi
