Community-based time segmentation from network snapshots

The voting network before and after our detected event, colored by Louvain grouping.

Abstract

Community detection has proven highly successful in a variety of domains. However, most algorithms used in practice assume that networks do not change over time. Many datasets violate this assumption, which leads to incorrect or misleading communities. Many algorithms have been proposed to address this problem, but most focus on community evolution rather than abrupt changes. Change detection is an easier problem than tracking community evolution, and it is often sufficient. Here, we propose an algorithm for determining community-based change points from network snapshots. Networks can then be aggregated between change points and analyzed without violating the static-network assumption. We define our algorithm for three network types, each with a case study: static nodesets (the Ukrainian Legislature), semi-static nodesets (the Enron email network), and dynamic nodesets (Twitter data from Ukraine). We empirically verify our algorithm in each case study and compare the results to two popular alternatives: Generalized Louvain and GraphScope. We show that Generalized Louvain is impractical and that our method is less sensitive than GraphScope. Lastly, we use the first two case studies to determine optimal parameters for an anomaly-detection-based streaming method. We then demonstrate that this streaming method can detect events arising both from data collection errors and from internal network disruptions.
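The abstract does not spell out the algorithm itself. To illustrate the general idea of community-based change points, here is a minimal Python sketch. It is not the paper's method. It assumes each snapshot has already been partitioned by some community detection algorithm (for example, Louvain). It then compares consecutive partitions using normalized mutual information (NMI) and marks a change point wherever NMI drops below a threshold. The names `nmi`, `change_points`, and `segments`, along with the default threshold of 0.5, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical representation: one partition per snapshot, mapping node -> community label.
Partition = Dict[Hashable, int]


def nmi(a: Partition, b: Partition) -> float:
    """Normalized mutual information between two partitions, computed over
    the nodes present in both (arithmetic-mean normalization)."""
    shared = [v for v in a if v in b]
    n = len(shared)
    if n == 0:
        return 0.0
    count_a = Counter(a[v] for v in shared)
    count_b = Counter(b[v] for v in shared)
    joint = Counter((a[v], b[v]) for v in shared)

    # Entropy of each partition: H = sum p * log(1/p).
    h_a = sum(c / n * math.log(n / c) for c in count_a.values())
    h_b = sum(c / n * math.log(n / c) for c in count_b.values())
    # Mutual information: I = sum p(x,y) * log(p(x,y) / (p(x) * p(y))).
    mi = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    if h_a + h_b == 0.0:
        # Both partitions put every shared node in a single community: identical.
        return 1.0
    return 2.0 * mi / (h_a + h_b)


def change_points(partitions: Sequence[Partition], threshold: float = 0.5) -> List[int]:
    """Indices t where the partition of snapshot t differs sharply from snapshot t-1."""
    return [
        t
        for t in range(1, len(partitions))
        if nmi(partitions[t - 1], partitions[t]) < threshold
    ]


def segments(num_snapshots: int, cps: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open [start, end) snapshot ranges between change points, ready to aggregate."""
    bounds = [0, *cps, num_snapshots]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Toy example: four snapshots of a static nodeset.
snapshots = [
    {1: 0, 2: 0, 3: 1, 4: 1},
    {1: 5, 2: 5, 3: 7, 4: 7},  # same grouping, relabeled: NMI = 1
    {1: 0, 2: 1, 3: 0, 4: 1},  # grouping reshuffled: NMI = 0
    {1: 0, 2: 1, 3: 0, 4: 1},
]
cps = change_points(snapshots)
print(cps)                            # [2]
print(segments(len(snapshots), cps))  # [(0, 2), (2, 4)]
```

This sketch compares only the nodes that two snapshots share, which is one simple way to tolerate semi-static or dynamic nodesets. That choice is an assumption of the sketch and is not necessarily how the paper handles those network types.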

Publication
In Applied Network Science
Tom Magelinski, PhD
Senior Data Scientist - Information Extraction and Generative AI

I build AI systems that help domain experts understand vast amounts of data using state-of-the-art techniques from natural language processing, generative modeling, graph ML, and network science. I’m particularly interested in researching and developing techniques that combine NLP and graph-based approaches to capture complex relationships in unstructured data.
