Extraction and analysis of data from Online Social Networks

Sandwich Seminar

This seminar will survey the conceptual framework and the research results we obtained in Messina on analysing some of the user-generated content now available from Online Social Networks (OSNs). I will describe how starting from research in Web data extraction we have become interested in different issues that are now becoming of great interest, in view of the almost-ubiquity of OSNs and their ever-increasing base of content-generating users.

First of all, I will cover the extraction and analysis of [snapshots of] the Facebook friendship graph: what can (still) be done? How to study FB friendship and its evolution? I will describe the main features of two (large samples) we extracted from Facebook by applying two different sampling strategies. Extracted samples have been studied by applying methods which are largely accepted in the field of Complex Network Analysis (vertex degree distribution, clustering coeffcient, diameters and so on).

Second, I will cover the topic of community detection inside OSN, a problem of obvious relevance and notorious computational complexity. I will briefly glance our solution, the CONCLUDE algorithm, and argue for its effectiveness and accuracy. Our results are twofold: on one hand we designed randomized algorithms to weight network edges and this tasks proves to be useful to improve the accuracy of the whole community detection problem. On the other hand I will illustrate some experimentas showing that our approach outperforms other, well-known algorithms when applied on large, real-world OSN instances.

Finally, I will introduce our latest work on the aNobii network of book-lovers (bibliophiles) where we studied the intensity of a user's participation to the SN in terms of i) joining groups (e.g., that on French literature) or assigning tags to books they've read. We have designed, implemented and validated a sampling algorithm that finds a good approximation of the probability distribution of joint user profiles. Our algorithm can be seen as an instantiation of the AA meta-algorithm of Dagum, Karp et al. Its complexity is controlled by the number of samples of a certain class it must find, even though the number of iterations is not fixed a priori; the overall error is bounded.


Alessandro Provetti (University of Messina)

Wednesday, February 20, 2013 - 11:00
to 12:00