BOCCIA ARTIERI G., MESSINA A., COLOMBO F., PASQUALI F., GRECO G. (2013). Social relations and network identity: data extraction, models and query languages.
Social relations and network identity: data extraction, models and query languages
MESSINA, ANTONIO
2013
Abstract
The Unit of the University of Bologna will be responsible for the technical development of the models, languages and tools that the sociological units will use to access and analyze the large amount of data collected from selected Social Network Sites. This activity will not only provide technical support to the other units, enabling their advanced analyses, but will also be devoted to the development of new computing methods. The data are complex, combining structured, graph, and unstructured information with uncertain values generated by the extraction process, and large, comprising millions of nodes and several gigabytes of text messages; developing these languages and tools will therefore require the definition of novel data query and analysis methods.

The activities of the unit can be summarized as follows:
- Definition of a model for a database containing social data (Large Social Database), enabling the representation and storage of complex social structures and associated attributes.
- Extraction of the data from selected Social Network Sites, identified in collaboration with the sociological units and according to the available APIs (data access methods provided by the sites) and privacy policies.
- Definition and implementation of a social-aware language (SocQL) to access the data, to be used by the sociological units.
- Development of automated and semi-automated social data analysis methods.

These methods will be innovative in their ability to deal with data presenting three characteristic aspects of socio-technical environments: graph structure (defined by user connections and by relationships between their actions, e.g., messages), large size, and uncertainty (generated by the limited availability of some relevant information, due to privacy settings, unstructured Web contents and missing data, and by the consequent information extraction activity). As highlighted in our overview of the state of the art, many techniques have been developed to deal with each of these aspects, but not with data presenting all of them together, as usually happens in computer-mediated social contexts.
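
As a purely illustrative aid, the combination of graph structure and uncertain attribute values described in the abstract can be sketched in a few lines of Python. This is not the Large Social Database model or SocQL, neither of which is specified here. The names `UncertainValue`, `SocialGraph`, `select` and `neighbors`, as well as the use of a confidence score in [0, 1], are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class UncertainValue:
    """Hypothetical wrapper: an extracted value plus a confidence score."""
    value: Any
    confidence: float  # assumed to lie in [0, 1]; 1.0 means directly observed


@dataclass
class SocialGraph:
    """Hypothetical toy model: users and messages are nodes, relations are typed edges."""
    attributes: Dict[str, Dict[str, UncertainValue]] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, **attrs: UncertainValue) -> None:
        self.attributes[node_id] = dict(attrs)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def neighbors(self, node_id: str, relation: str) -> List[str]:
        """Targets reachable from node_id through edges of the given type."""
        return [t for s, r, t in self.edges if s == node_id and r == relation]

    def select(self, attr: str, value: Any, min_confidence: float) -> List[str]:
        """Nodes whose attribute equals value with at least min_confidence."""
        matches = []
        for node_id, attrs in self.attributes.items():
            candidate = attrs.get(attr)
            if (candidate is not None
                    and candidate.value == value
                    and candidate.confidence >= min_confidence):
                matches.append(node_id)
        return matches


# Invented sample data, for illustration only.
graph = SocialGraph()
graph.add_node("alice", city=UncertainValue("Bologna", 0.9))
graph.add_node("bob", city=UncertainValue("Bologna", 0.4))
graph.add_node("msg1", text=UncertainValue("hello", 1.0))
graph.add_edge("alice", "follows", "bob")
graph.add_edge("alice", "posted", "msg1")

confident_users = graph.select("city", "Bologna", min_confidence=0.5)
followed_by_alice = graph.neighbors("alice", "follows")
```

In this toy setting a query can combine a structural condition (who follows whom) with a threshold on extraction confidence. An in-memory list of edges would not scale to millions of nodes, so the sketch only illustrates the kind of data involved, not a storage or query strategy.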