I am a computer scientist with a research interest in artificial intelligence. I recently graduated from CMU with a PhD in Societal Computing from the School of Computer Science. There, I was a Knight Fellow in the Center for Informed Democracy and Social Cybersecurity and a member of the Center for Computational Analysis of Social and Organizational Systems. Now I research information representation techniques and help build state-of-the-art technologies into applications that are deployed and used by security analysts. For these applications, I'm particularly interested in researching and developing techniques that combine NLP and graph-based approaches to capture complex relationships in unstructured data. A major emphasis of my recent work has been evaluating and improving the robustness of LLM-enabled systems.
PhD in Societal Computing, 2023
Carnegie Mellon University
BS in Engineering Science and Mechanics, 2017
Virginia Tech
Network science provides a framework for understanding the large-scale discussions that happen on social media and their impact on society. However, a standard network model of a social media conversation destroys the context in which users are interacting. First, the interactional context is destroyed. The interactional component of context includes the content of the conversation in which the users are interacting. When interactional context is not accounted for, separate discussions are combined into one big network, artificially inflating the number of nodes and edges in the network. This leads to inaccurate information about conversation structure and important actors. Next, the personal context is destroyed. The personal component of context includes the attributes of the users involved, as observed through their self-descriptions. Long-standing social theories of offline communities, such as self-categorization theory, place great importance on personal context. Thus, this context must be accounted for to test these theories in the social media setting.
This thesis provides the theory and methodologies needed to account for both interactional and personal context, which were previously lost in network analyses of social media conversations. Specifically, I study the importance of these contexts as they relate to community dynamics. I find that network structure is indeed dependent on interactional context, indicating that existing non-contextualized analyses could be improved. When investigating personal context, I find that the long-standing theory of self-categorization can be extended from offline social communities to massive online communities, with some important limitations. Taken together, the dynamic contextualized analysis outlined in this thesis furthers our understanding of attribute salience in online interactions. Each of these analyses is performed on multiple case studies, providing both validation and a set of examples that inform a list of best practices for contextualized network analysis.
If we want to model Twitter conversations with a network, we need to account for the context that users interact within. We propose a deep-learning approach to separating Twitter data into contextualized networks. We then show that these contextualized networks have very different node sets, topologies, and central actors than the non-contextualized networks. Our findings suggest that the dominant way of modeling social media conversations may inaccurately portray the nature of those conversations and the most important people in them.
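The deep-learning model that assigns conversational context is not reproduced here. As a minimal sketch, assuming every interaction already carries a context label from some upstream classifier (the `build_networks`, `node_set`, and `degree` helpers and the toy records are hypothetical), splitting the data shows how a pooled network can inflate its node set and a bridging user's apparent centrality:

```python
from collections import defaultdict

def build_networks(interactions):
    """Split (source, target, context) records into one pooled edge set and one edge set per context."""
    pooled = set()
    by_context = defaultdict(set)
    for src, dst, context in interactions:
        pooled.add((src, dst))
        by_context[context].add((src, dst))
    return pooled, dict(by_context)

def node_set(edges):
    """Every user appearing as a source or target of an edge."""
    return {user for edge in edges for user in edge}

def degree(edges, user):
    """Number of edges touching `user` (incoming plus outgoing)."""
    return sum(user in edge for edge in edges)

# Hypothetical reply/mention records; context labels assumed to come from a classifier.
interactions = [
    ("alice", "bob", "vaccines"),
    ("bob", "carol", "vaccines"),
    ("dave", "erin", "elections"),
    ("erin", "bob", "elections"),
]
pooled, contexts = build_networks(interactions)
print(len(node_set(pooled)))                                   # 5
print({c: len(node_set(e)) for c, e in contexts.items()})      # {'vaccines': 3, 'elections': 3}
print(degree(pooled, "bob"), degree(contexts["elections"], "bob"))  # 3 1
```

In the pooled network `bob` touches three edges, while within either context he touches at most two, which is the kind of shift in central actors the comparison is meant to surface.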
In this work we detail a scalable method for detecting groups of actors coordinating to exert influence on Twitter. Our method captures more coordinated behaviors than prior work and can detect coordination along multiple modalities. Looking at a discussion of the Reopen America protests, we find obvious but non-threatening coordinated campaigns, as well as a group of suspicious users promoting the protests in concert, each focusing on a different state's protests.
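The detection algorithm itself is not spelled out here. One common way to frame multi-modal coordination, sketched below as an assumption rather than the exact method, is to link users who share the same item within a short time window, build one such network per modality (hashtags, URLs, and so on), and sum them. The 60-second window, the `min_shared` threshold, the function names, and the toy posts are all hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def coordination_edges(posts, window, min_shared=1):
    """Link users who post the same item within `window` seconds of each other.

    posts: iterable of (user, item, timestamp) tuples for a single modality.
    Returns {(user_a, user_b): co-post count} for pairs with at least `min_shared` co-posts.
    """
    by_item = defaultdict(list)
    for user, item, ts in posts:
        by_item[item].append((ts, user))
    counts = defaultdict(int)
    for events in by_item.values():
        events.sort()  # chronological, so t1 <= t2 for every pair below
        for (t1, u1), (t2, u2) in combinations(events, 2):
            if u1 != u2 and t2 - t1 <= window:
                counts[tuple(sorted((u1, u2)))] += 1
    return {pair: n for pair, n in counts.items() if n >= min_shared}

def combine_modalities(*edge_maps):
    """Sum co-post counts across modalities into one multi-modal coordination network."""
    combined = defaultdict(int)
    for edges in edge_maps:
        for pair, n in edges.items():
            combined[pair] += n
    return dict(combined)

# Hypothetical posts: (user, item, seconds since start).
hashtags = [("u1", "#ReopenPA", 0), ("u2", "#ReopenPA", 30),
            ("u1", "#ReopenMI", 100), ("u2", "#ReopenMI", 120),
            ("u3", "#ReopenPA", 5000)]
urls = [("u1", "url_a", 10), ("u2", "url_a", 15)]

hashtag_net = coordination_edges(hashtags, window=60, min_shared=2)
url_net = coordination_edges(urls, window=60)
print(hashtag_net)                               # {('u1', 'u2'): 2}
print(combine_modalities(hashtag_net, url_net))  # {('u1', 'u2'): 3}
```

The pairwise loop is quadratic in the number of posts per item, so it only illustrates the idea; a scalable version would need a sliding window or similar bucketing instead.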
We show that modularity vitality, the difference between the modularity of a graph with and without a node, can be used to measure that node's contribution to community structure. We also derive a scalable way of computing it for all nodes. We then show that this measure identifies nodes that are more important to network integrity than those identified by existing measures. This method fragments the PA road network over 8 times more effectively than previous methods.
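Modularity vitality can be computed directly from its definition by recomputing modularity once per removed node, as in this naive stdlib sketch. It is not the scalable derivation described above, and it assumes an undirected, unweighted edge list, a community assignment that is held fixed when a node is removed, and hypothetical function names:

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph under a fixed partition.

    Q = sum_c [ L_c / m - (d_c / 2m)^2 ], where m is the edge count, L_c the number of
    edges inside community c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def modularity_vitality(edges, community):
    """Q(G) - Q(G without v) for every node v, holding the partition fixed (naive recomputation)."""
    base = modularity(edges, community)
    nodes = {n for edge in edges for n in edge}
    return {v: base - modularity([e for e in edges if v not in e], community) for v in nodes}

# Two triangles joined by the bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
vitality = modularity_vitality(edges, community)
print(round(vitality["a"], 3), round(vitality["c"], 3))  # 0.137 -0.018
```

Here the interior node `a` gets positive vitality (removing it weakens its community), while the bridge node `c` gets negative vitality (removing it makes the partition cleaner). Each modularity call costs O(m), so this loop is O(nm) overall, which is why a scalable computation matters for large networks.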
Modularity vitality measures a node's contribution to group structure. In hashtag networks, then, modularity vitality can be used to select the hashtag that contributes most to a topic found through community detection. We show that this leads to more interpretable topic analysis of a large Twitter dataset.
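Once every hashtag has a vitality score and a community label, choosing the hashtags that best represent each topic is a per-community ranking. A minimal sketch, assuming the scores were already computed; the scores, tags, and the `top_hashtags_per_topic` helper are hypothetical, and `k` lets a topic be labeled by more than one hashtag:

```python
from collections import defaultdict

def top_hashtags_per_topic(vitality, community, k=1):
    """Return the k hashtags with the largest modularity vitality in each community (topic)."""
    members = defaultdict(list)
    for tag, score in vitality.items():
        members[community[tag]].append((score, tag))
    return {
        topic: [tag for _, tag in sorted(pairs, reverse=True)[:k]]
        for topic, pairs in members.items()
    }

# Made-up vitality scores for a toy hashtag co-occurrence network with two detected topics.
vitality = {"#covid": 0.02, "#masks": 0.05, "#stayhome": 0.01,
            "#vote": 0.04, "#election": 0.03}
community = {"#covid": 0, "#masks": 0, "#stayhome": 0,
             "#vote": 1, "#election": 1}
print(top_hashtags_per_topic(vitality, community))       # {0: ['#masks'], 1: ['#vote']}
print(top_hashtags_per_topic(vitality, community, k=2))  # {0: ['#masks', '#covid'], 1: ['#vote', '#election']}
```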
A common deep learning approach to graph classification is to embed nodes in a latent space, typically via graph convolutions, and then to use these embeddings to make a single classification. Because the number of nodes may differ from one training example to the next, the variable-size set of node embeddings must somehow be reduced to a fixed-size input. We demonstrate that the node embedding distribution can be approximated using differentiable histograms. Once the histograms are created, traditional convolutional layers can be used to classify the graph. This procedure leverages all available information, regardless of how graph size varies. We demonstrate that this architecture gives incremental improvements on various benchmark datasets. We also use this approach to classify bots on Twitter based on their communication graphs, and find that it generalizes better than previous methods, though it sacrifices some precision.
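A minimal sketch of the pooling step, assuming Gaussian-kernel soft binning, which is one common way to make a histogram differentiable; the exact kernel, bin centers, and bandwidth used in the work are not given here, and `soft_histogram` is a hypothetical name. Plain Python shows only the forward pass; in a real model the same arithmetic would be written in an autodiff framework so gradients reach the graph-convolution layers:

```python
import math

def soft_histogram(embeddings, centers, bandwidth):
    """Soft (differentiable) histogram of node embeddings, computed per embedding dimension.

    embeddings: one list of d floats per node; the node count may differ between graphs.
    centers: bin centers shared by every dimension.
    Returns a d x len(centers) table whose rows each sum to 1, whatever the node count.
    """
    d = len(embeddings[0])
    hist = [[0.0] * len(centers) for _ in range(d)]
    for node in embeddings:
        for j, x in enumerate(node):
            # Gaussian-kernel soft assignment: a softmax over -(x - c)^2 / (2 * bandwidth^2),
            # which is smooth in x (unlike hard binning) and therefore differentiable.
            logits = [-((x - c) ** 2) / (2 * bandwidth ** 2) for c in centers]
            peak = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(z - peak) for z in logits]
            total = sum(weights)
            for b, w in enumerate(weights):
                hist[j][b] += w / total
    n = len(embeddings)
    return [[v / n for v in row] for row in hist]

centers = [-1.0, 0.0, 1.0]
small_graph = [[0.1, -0.9], [0.0, 1.0], [-0.2, 0.8]]                          # 3 nodes, d = 2
large_graph = [[0.5, 0.5], [-1.0, 0.0], [0.9, -0.4], [0.0, 0.0], [1.2, 0.3]]  # 5 nodes, d = 2
for graph in (small_graph, large_graph):
    hist = soft_histogram(graph, centers, bandwidth=0.5)
    print(len(hist), len(hist[0]), [round(sum(row), 6) for row in hist])  # 2 3 [1.0, 1.0]
```

Both graphs, with 3 and 5 nodes, yield the same 2 x 3 output, which is what lets ordinary convolutional layers consume it.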
In this work we advocate for the use of interoperable pipelines for Social Cybersecurity. We demonstrate one such pipeline in an analysis of the Twitter discussion of the Trident Juncture exercise. We find bot activity aimed at discrediting NATO, targeting both NATO and allied nations.
We develop a procedure for finding time segments of community stability in dynamic networks, which also functions as a community-based event detector. Applying it to the legislative voting network of Ukraine's 8th convocation, we identify the Euromaidan Revolution as a major event and show that the network structure is vastly different before and after it.
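The similarity measure and decision rule are not named here. As a minimal sketch under stated assumptions (per-window partitions from any community detection run on each slice of the voting network, the Rand index as the similarity between partitions, and a fixed threshold, all three of which are assumptions, as are the function names and toy data), segmentation reduces to:

```python
from itertools import combinations

def rand_index(p1, p2):
    """Fraction of node pairs (over nodes in both partitions) that p1 and p2 treat the same way.

    A pair agrees if both partitions put it in the same community, or both split it.
    """
    shared = sorted(set(p1) & set(p2))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    agree = sum((p1[u] == p1[v]) == (p2[u] == p2[v]) for u, v in pairs)
    return agree / len(pairs)

def stable_segments(partitions, threshold):
    """Split a time-ordered list of partitions into segments of community stability.

    Each partition is compared with the first partition of the current segment; a similarity
    below `threshold` closes the segment, so each boundary is a candidate event.
    Returns inclusive (start, end) index pairs.
    """
    if not partitions:
        return []
    segments, start = [], 0
    for t in range(1, len(partitions)):
        if rand_index(partitions[start], partitions[t]) < threshold:
            segments.append((start, t - 1))
            start = t
    segments.append((start, len(partitions) - 1))
    return segments

# Hypothetical legislator -> voting-bloc assignments for four time windows.
before = {"l1": "A", "l2": "A", "l3": "B", "l4": "B"}
after = {"l1": "A", "l2": "B", "l3": "A", "l4": "B"}
print(stable_segments([before, before, after, after], threshold=0.8))  # [(0, 1), (2, 3)]
```

Comparing against the segment's first partition, rather than only the previous one, keeps slow drift from accumulating unnoticed inside a segment.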