Most viewed parts of an image in the Pisa Collection from The Social Picture.

The Social Picture

Abstract: The Social Picture is a framework, developed in collaboration with TIM Telecom Italia, for collecting and exploring large amounts of crowdsourced social images of public events, cultural sites and other customized private events. The collections can be explored through a number of advanced Computer Vision algorithms that capture the visual content of the images in order to organize them semantically.

Related Publications:

Social networks have become increasingly useful for understanding people's opinions and trends. In particular, social media have changed how people communicate by sharing multimedia data: users express emotions and share experiences on social networks. At social events (e.g., parties, concerts, sports matches), users are gradually becoming so-called prosumers: they no longer act only as consumers but also produce and share, with their mobile devices, multimedia data related to whatever has captured their interest. The redundancy in these data, together with the attached metadata (e.g., geolocation, tags, mood tags), can be exploited to infer social information about the attitude of the audience. For instance, systems such as MoViMash, ViComp and RECfusion can generate a video that describes the interests of the crowd from a set of videos by considering the popularity of the scene content. Indeed, the popularity of visual content is an important cue for understanding the mood of the crowd attending an event, or for estimating how interesting the different parts of a cultural heritage site are perceived to be. Large-scale visual data from social media and other multimedia information gathered from multiple sources (e.g., mobile devices) can be processed with Machine Learning and Computer Vision algorithms to infer knowledge about social contexts or to organize images by visual content.

We presented our framework, called The Social Picture (TSP), in which images are gathered from social networks or uploaded directly to the repository by users through a mobile app and a website. The framework can collect, analyze and organize large flows of visual data, and lets users navigate the image collections generated by the community. TSP distinguishes three categories of image collections: social events, private events and cultural heritage landmarks. The collections are processed with several tools, including automatic clustering of images, intensity heatmaps and automatic image captioning. These tools allow TSP to provide users with a number of representative image prototypes for each stored collection, which can be used for different purposes (e.g., selecting the most meaningful pictures of a painting during a showcase in a museum). Automatic clustering is implemented using a Convolutional Neural Network (CNN) representation based on the AlexNet architecture. For each image, the fc7 features are extracted and the t-SNE algorithm (van der Maaten and Hinton, 2008) is employed to compute a 2D embedding that characterizes the pairwise distances between visual features. The intensity heatmap is another tool implemented in the framework. It is a map of values related to the number of collected pictures that contain visual areas similar to those of a specific landmark building or area of interest. Users can interact with the heatmap by selecting points on the map and retrieving the images that contributed to the intensity value at that specific point. Finally, automatic image captioning is used to create and suggest descriptions of images, which is useful for text-based queries performed by users.
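The intensity heatmap and its point query can be illustrated with a minimal sketch. This is not the TSP implementation: it assumes that each collected picture has already been matched against a reference image of the landmark, and that each match is available as an axis-aligned box in reference-image pixel coordinates. The function names `build_heatmap` and `images_at`, the box format and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_heatmap(width, height, matches):
    """Count, per reference pixel, how many pictures cover it.

    matches maps an image id to a list of (x0, y0, x1, y1) boxes in
    reference-image pixel coordinates (assumed format; x1 and y1 are excluded).
    Returns the count grid and, per pixel, the set of contributing images.
    """
    counts = [[0] * width for _ in range(height)]
    contributors = defaultdict(set)
    for image_id, boxes in matches.items():
        covered = set()  # each picture counts at most once per pixel
        for x0, y0, x1, y1 in boxes:
            for y in range(max(0, y0), min(height, y1)):
                for x in range(max(0, x0), min(width, x1)):
                    covered.add((x, y))
        for x, y in covered:
            counts[y][x] += 1
            contributors[(x, y)].add(image_id)
    return counts, contributors


def images_at(contributors, x, y):
    """Return the pictures that contributed to the intensity at (x, y)."""
    return sorted(contributors.get((x, y), ()))


# Hypothetical sample data: three pictures matched on a 4x4 reference image.
sample_matches = {
    "a.jpg": [(0, 0, 2, 2)],
    "b.jpg": [(1, 1, 3, 3)],
    "c.jpg": [(1, 1, 2, 2), (1, 1, 2, 2)],  # duplicate box, counted once
}
counts, contributors = build_heatmap(4, 4, sample_matches)
selected = images_at(contributors, 1, 1)
```

Storing the contributing image ids next to the counts is what makes the interactive query cheap: selecting a point becomes a dictionary lookup rather than a new matching pass over the collection.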

We published two advanced image analysis applications. In the first, we treated the cameras as nodes of a fully connected graph in which the weight of each edge equals the number of matches between the two cameras it connects. The spanning tree of this graph was used to explore the images in a meaningful way, yielding a scene summarization. In the second application, we defined several kinds of density maps based on image features, each with a different aim. We showed how density maps can be used together with the Structure from Motion (SfM) technique to highlight the parts of an image that have robust visual features. In particular, SWD-maps are a good tool for emphasizing the presence of visual features even when a strong occlusion is present in the image.
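The camera graph of the first application can be sketched as follows. The text does not say which spanning tree is used. This sketch assumes a maximum spanning tree, so that exploration follows the pairs of cameras with the most matches, and it builds the tree with Kruskal's algorithm. The function name and the match counts are hypothetical.

```python
def maximum_spanning_tree(num_cameras, match_counts):
    """Kruskal's algorithm on edges sorted by decreasing weight.

    match_counts maps a camera pair (i, j) to the number of matches
    between the two cameras (the edge weight described in the text).
    Returns the tree as a list of (i, j, weight) edges.
    """
    parent = list(range(num_cameras))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    edges = sorted(match_counts.items(), key=lambda item: -item[1])
    for (i, j), weight in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components: keep it
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# Hypothetical match counts for four cameras (fully connected graph).
sample_counts = {
    (0, 1): 120, (0, 2): 15, (0, 3): 40,
    (1, 2): 90, (1, 3): 10, (2, 3): 75,
}
tree = maximum_spanning_tree(4, sample_counts)
```

With these sample counts the tree keeps the chain 0-1, 1-2, 2-3, so walking it moves between the views that share the most content.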