Ramponi Giorgia, Brambilla Marco, Ceri Stefano, Daniel Florian, Di Giovanni Marco (2020). Content-based characterization of online social communities. INFORMATION PROCESSING & MANAGEMENT, 57(6), 1-11 [10.1016/j.ipm.2019.102133].
Content-based characterization of online social communities
Di Giovanni Marco
2020
Abstract
Social networks have become an essential part of our lives and the fastest way to share ideas and to influence people. Interactions on social networks tend to take place within communities, that is, sets of social accounts that share friendships, ideas, interests and passions; detecting digital communities is therefore increasingly relevant from both a social and an economic point of view. In this paper, we analyze the problem of community detection from a content-analysis perspective: we argue that the content produced in social interaction is a highly distinctive feature of a community, and hence it can be used effectively for community detection. We approach the problem from a purely textual perspective, using only syntactic and semantic features, including high-level latent features that we denote as topics. We show that, by inspecting the content of tweets, we can build highly efficient classifiers and predictors of account membership within a given community. We describe the features that best constitute a vocabulary, provide their comparative evaluation and select the best features for the task, and finally illustrate an application of our approach to some concrete community detection scenarios, such as Italian politics and targeted advertising.
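The core idea of the abstract is that the text an account posts is enough to predict which community it belongs to. This record does not describe the paper's actual feature sets or classifiers. The block below is therefore a minimal sketch under stated assumptions, not the authors' method. It pools each account's tweets into a bag of word, hashtag and mention tokens and trains a multinomial Naive Bayes classifier with Laplace smoothing. The class `CommunityNB`, the tokenizer and the toy accounts are hypothetical illustrations. The latent topic features mentioned in the abstract are not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, #hashtags and @mentions,
    # a simple stand-in for the syntactic features discussed above.
    return re.findall(r"[#@]?\w+", text.lower())


class CommunityNB:
    """Hypothetical multinomial Naive Bayes over each account's pooled tweets."""

    def fit(self, accounts, labels):
        # accounts: list of accounts, each a list of tweet strings
        # labels: community label for each account
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tweets, label in zip(accounts, labels):
            for tweet in tweets:
                self.counts[label].update(tokenize(tweet))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, tweets):
        # Log prior plus log likelihood of every token in the account's tweets.
        tokens = [t for tweet in tweets for t in tokenize(tweet)]
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = math.log(self.priors[c] / self.n)
            for t in tokens:
                # Laplace (add-one) smoothing so unseen tokens do not zero out a class.
                s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict(self, tweets):
        # Assign the account to the community with the highest log score.
        scores = self.log_scores(tweets)
        return max(scores, key=scores.get)


# Toy, made-up accounts for illustration only (not data from the paper).
train_accounts = [
    ["Parliament votes on the new budget law today", "Our party supports the reform #election"],
    ["Vote for our candidate in the regional election", "Debate tonight on the budget reform"],
    ["New sneakers drop this weekend #fashion", "Loving this season's jackets and shoes"],
    ["Sale on running shoes, 30% off", "Check out the new fashion collection"],
]
train_labels = ["politics", "politics", "fashion", "fashion"]

model = CommunityNB().fit(train_accounts, train_labels)
print(model.predict(["Who will win the election debate?"]))  # → politics
```

Naive Bayes is used here only because it is a compact, dependency-free baseline for text classification. Comparing richer syntactic, semantic and topic-based representations, as the abstract describes, would mean swapping out `tokenize` and the model while keeping the same per-account framing.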