Graph-Hist: Graph Classification from Latent Feature Histograms with Application to Bot Detection

The Graph-Hist Architecture

Abstract

Neural networks are increasingly used for graph classification in a variety of contexts. Social media is a critical application area in this space; however, the characteristics of social media graphs differ from those seen in most popular benchmark datasets. Social networks tend to be large and sparse, while benchmarks are small and dense. Classically, large and sparse networks are analyzed by studying the distribution of local properties. Inspired by this, we introduce Graph-Hist: an end-to-end architecture that extracts a graph’s latent local features, bins nodes together along 1-D cross sections of the feature space, and classifies the graph based on this multi-channel histogram. We show that Graph-Hist improves state-of-the-art performance on true social media benchmark datasets, while still performing well on other benchmarks. Finally, we demonstrate Graph-Hist’s performance by conducting bot detection in social media. While sophisticated bot and cyborg accounts increasingly evade traditional detection methods, they leave artifacts in their conversational graphs that can be detected through graph classification. We apply Graph-Hist to classify these conversational graphs. In the process, we confirm that social media graphs are different from most baselines and that Graph-Hist outperforms existing bot-detection models.
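The binning step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes Gaussian soft-binning with fixed, evenly spaced bin centers, and it uses random numbers in place of learned latent node features. The function name `soft_histogram` and the parameters `centers` and `width` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_histogram(features, centers, width):
    """Bin nodes along each feature channel separately (hypothetical helper).

    features: (n_nodes, n_channels) array of per-node latent features.
    centers:  (n_bins,) array of bin centers shared by all channels (an assumption).
    width:    scalar bandwidth of the Gaussian soft-binning (an assumption).
    Returns a (n_channels, n_bins) multi-channel histogram.
    """
    # Offset of every node's value from every bin center, per channel.
    diff = features[:, :, None] - centers[None, None, :]  # (nodes, channels, bins)
    # Soft membership of each node in each bin.
    weights = np.exp(-0.5 * (diff / width) ** 2)
    # Each node contributes a total mass of 1 to each channel.
    weights /= weights.sum(axis=2, keepdims=True)
    # Average over nodes so graphs of different sizes give comparable histograms.
    return weights.mean(axis=0)


# Stand-in for latent features of a 50-node graph with 4 channels.
rng = np.random.default_rng(0)
node_features = rng.normal(size=(50, 4))
centers = np.linspace(-2.0, 2.0, 8)

hist = soft_histogram(node_features, centers, width=0.5)
print(hist.shape)
```

Because the result has a fixed shape regardless of the number of nodes, it could be passed to any standard classifier. In an end-to-end setting, the feature extractor and the binning parameters would be trained jointly with that classifier rather than fixed as they are here.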

Publication
In AAAI 2020
Tom Magelinski, PhD
Senior Data Scientist - Information Extraction and Generative AI

I build AI systems that help domain experts understand vast amounts of data through state-of-the-art techniques from natural language processing, generative modeling, graph ML, and network science. I’m particularly interested in researching and developing techniques that combine NLP and graph-based approaches to capture complex relationships in unstructured data.
