Social media contain open and public discussions about every conceivable topic. These discussions can provide invaluable insights into the views and narratives among consumers, influencers, and businesses. But the information is unstructured, in the form of text and images, and spread out across a large number of social media platforms. Making sense of it has typically required programming and data science skills: the data needs to be collected, preprocessed, structured, and analyzed. This post presents a new approach, one which does not require any programming or expert text analytics skills.
What value can we get from social media? In relation to any given topic, analyzing posts from social media can generate:
- An unbiased overview of discussions and narratives
- Detailed insights into specific discussions
- Mapping of trends and spotting of emerging phenomena
- Identification of influencers and enthusiasts
- An understanding of hot content attracting online interest
In this post, we take a look at how to extract such rich insights by mining social media, using simple drag-and-drops in our text analytics tool Dcipher Analytics. In this particular case, we look into the online discussion around sustainability. Though Dcipher Analytics provides access to tens of thousands of sources – including Twitter, YouTube, and worldwide forums and blogs – here we choose to focus on two: public Facebook and Instagram posts.
Getting and prepping the data
Downloading data from social media and importing it into Dcipher Analytics is a simple matter of defining criteria for the posts of interest and clicking “Import”. We use the keyword “sustainability” and download the latest 10,000 posts from each channel.
Before analyzing the posts, it is a good idea to do a bit of cleaning. We can, for example, add a filter to exclude all posts that are not written in English. We can also apply the Clean text operation to strip unneeded tags and symbols from the text.
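For readers curious about what these two cleaning steps amount to, the Python sketch below keeps only posts marked as English and strips HTML tags, HTML entities, and stray symbols with regular expressions. It is a minimal illustration under stated assumptions, not Dcipher's Clean text operation: the post records and the `text`/`lang` field names are hypothetical, and it assumes each post's language has already been detected.

```python
import re

# Hypothetical post records: the "text" and "lang" fields are assumptions,
# and "lang" is assumed to be filled in by an earlier language-detection step.
posts = [
    {"text": "Going <b>zero waste</b> this year! #sustainability &amp; more", "lang": "en"},
    {"text": "Hållbarhet är viktigt #sustainability", "lang": "sv"},
]

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <b> and </b>
ENTITY_RE = re.compile(r"&[a-z]+;")    # named HTML entities such as &amp;
SYMBOL_RE = re.compile(r"[^\w\s#@']")  # other symbols; keeps hashtags, mentions, apostrophes


def clean_text(text: str) -> str:
    """Replace tags, entities, and symbols with spaces, then collapse whitespace."""
    for pattern in (TAG_RE, ENTITY_RE, SYMBOL_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


# Keep English posts only, then clean their text.
english_posts = [
    {**post, "text": clean_text(post["text"])}
    for post in posts
    if post["lang"] == "en"
]
print(english_posts[0]["text"])  # → Going zero waste this year #sustainability more
```

Replacing matches with a space rather than deleting them keeps words on either side of a tag from being glued together.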
1. Getting an overview of topics in the discussion
The easiest way to get a quick overview of the data is to look at which hashtags are used the most. We do this by finding the hashtag column in the dataset and drag-and-dropping it to the `Display as bubbles` field in the Bubble workbench. This counts the occurrences of each hashtag and displays them as bubbles, with larger bubbles indicating higher frequencies. If we prefer, we can switch to displaying hashtags as words instead of bubbles.
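The counting behind the bubble view can be sketched in a few lines of Python. This assumes hashtags are extracted from raw post text with a simple `#word` pattern and lowercased, so `#ZeroWaste` and `#zerowaste` count as one tag. The sample posts are made up, and Dcipher's own hashtag extraction may differ.

```python
import re
from collections import Counter

# Made-up post texts for illustration.
posts = [
    "Small steps every day #zerowaste #sustainableliving",
    "Thrifted this jacket #slowfashion #zerowaste",
    "Compost day #ZeroWaste",
]

HASHTAG_RE = re.compile(r"#(\w+)")  # captures the tag without the leading '#'


def hashtag_counts(texts):
    """Count every hashtag occurrence, case-insensitively."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


counts = hashtag_counts(posts)
print(counts.most_common(1))  # → [('zerowaste', 3)]
```

The resulting counts are what bubble size (or word size in the word view) would be proportional to.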
The word cloud shows that the sustainability-related discussions on Facebook revolve around topics such as zero waste, sustainable living, recycling and reuse, and eco-friendly fashion and design.
For a higher-level overview, we can cluster hashtags so that tags that tend to be used in the same posts are connected. This is done by drag-and-dropping the hashtags to the `Display as network` drop zone in the Bubble workbench. We now see not only individual hashtags but a network of connected hashtags that form a topic landscape.
The network shows that education, inspiration, organic, and fashion are concepts that bridge multiple topics of discussion.
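To make the network idea concrete, the sketch below connects two hashtags whenever they appear in the same post, weighting each edge by the number of such posts. As a crude stand-in for spotting bridging concepts, it counts each tag's distinct neighbours; a more standard measure would be betweenness centrality. The tag sets are invented for illustration, and this is not how Dcipher lays out or clusters its network.

```python
from collections import Counter
from itertools import combinations

# Invented hashtag sets, one set per post.
post_tags = [
    {"zerowaste", "education", "plasticfree"},
    {"slowfashion", "education", "fashion"},
    {"zerowaste", "plasticfree"},
]


def cooccurrence_edges(tag_sets):
    """Weight each unordered tag pair by the number of posts using both tags."""
    edges = Counter()
    for tags in tag_sets:
        # Sorting gives each pair a single canonical (a, b) order.
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges):
    """Map each tag to the set of tags it shares at least one post with."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


edges = cooccurrence_edges(post_tags)
adjacency = neighbours(edges)
most_connected = max(adjacency, key=lambda tag: len(adjacency[tag]))
print(most_connected)  # → education
```

In this toy data, "education" is the only tag linking the zero-waste tags with the fashion tags, which is the kind of bridging role the network view makes visible.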
To read the posts behind a topic, we can drag-and-drop the words we are interested in to the Document Summary workbench, which scores and sorts the posts by how well they match those words.
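The scoring used by the Document Summary workbench is not described here, so the sketch below swaps in the simplest possible score: the number of times a post uses any of the words of interest. Posts that never mention them are dropped and the rest are sorted from highest to lowest score. The post texts and the `score_posts` name are hypothetical.

```python
import re

# Hypothetical post texts.
posts = [
    "Recycling glass jars is easy",
    "Organic food markets this weekend",
    "Recycling and recyclable packaging: recycling done right",
]


def score_posts(texts, words):
    """Score each post by how often it uses the words of interest; best first."""
    wanted = {word.lower() for word in words}
    scored = []
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        score = sum(1 for token in tokens if token in wanted)
        if score > 0:  # drop posts that never mention the words
            scored.append((score, text))
    # list.sort is stable, so equally scored posts keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


for score, text in score_posts(posts, ["recycling", "recyclable"]):
    print(score, text)
```

A raw count favours long posts; a real relevance score would usually normalise for post length or weight rare words more heavily.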
For easier comparison, we can drag-and-drop interesting keywords to the Bar chart workbench. By default, the height of each bar represents the number of posts mentioning the given word. The dataset also contains engagement metrics, such as the number of likes and shares. Drag-and-dropping these to the Bar chart workbench lets us aggregate them per topic, for example to view the sum or average number of shares of the posts about each topic.
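The bar chart aggregation can be sketched as a per-keyword tally of matching posts plus the sum and mean of an engagement metric. This assumes each post is a record with `text` and `shares` fields (hypothetical names) and that keywords are given in lowercase. A post counts as mentioning a keyword when the keyword occurs anywhere in its lowercased text, a deliberately naive match that would, for example, count "organic" inside "inorganic".

```python
# Hypothetical records: the "text" and "shares" field names are assumptions.
posts = [
    {"text": "Recycling tips for the kitchen", "shares": 12},
    {"text": "Zero waste and recycling at home", "shares": 30},
    {"text": "Organic cotton is the future", "shares": 5},
]


def keyword_stats(posts, keywords, metric="shares"):
    """Per keyword: number of matching posts plus sum and mean of an engagement metric."""
    stats = {}
    for keyword in keywords:
        # Naive substring match on lowercased text.
        values = [post[metric] for post in posts if keyword in post["text"].lower()]
        stats[keyword] = {
            "posts": len(values),
            "sum": sum(values),
            "mean": sum(values) / len(values) if values else 0.0,
        }
    return stats


stats = keyword_stats(posts, ["recycling", "organic", "fashion"])
print(stats["recycling"])  # → {'posts': 2, 'sum': 42, 'mean': 21.0}
```

Each entry corresponds to one bar: `posts` for the default view, `sum` or `mean` when an engagement metric is dropped in.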
2. Finding influencers and enthusiasts
To find out who is most active and influential in relation to the topic, we use the Authors column, which contains information about the account that published each post. By dragging it to the Document Summary workbench, we can choose to group by author name and count the number of posts published by each author. The engagement metrics can be used to generate additional information about the authors, such as the total number of likes and shares of their posts.
Sorting the authors by number of published posts gives the most prolific authors – let us call them the enthusiasts, those who write a lot about sustainability. Sorting by the engagement measures instead surfaces the most influential accounts publishing about sustainability on Facebook.
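Grouping by author and sorting two ways can be sketched as follows. The records, account names, and field names are hypothetical, and combining likes and shares into a single engagement score is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical posts; author names and field names are made up.
posts = [
    {"author": "account_a", "likes": 10, "shares": 1},
    {"author": "account_b", "likes": 500, "shares": 80},
    {"author": "account_a", "likes": 4, "shares": 0},
    {"author": "account_a", "likes": 7, "shares": 2},
]


def author_summary(posts):
    """Group posts by author: post count plus total likes and shares."""
    summary = defaultdict(lambda: {"posts": 0, "likes": 0, "shares": 0})
    for post in posts:
        row = summary[post["author"]]
        row["posts"] += 1
        row["likes"] += post["likes"]
        row["shares"] += post["shares"]
    return dict(summary)


summary = author_summary(posts)
# Enthusiasts: most posts published.
enthusiasts = sorted(summary, key=lambda a: summary[a]["posts"], reverse=True)
# Influencers: highest engagement, assumed here to be likes + shares.
influencers = sorted(
    summary, key=lambda a: summary[a]["likes"] + summary[a]["shares"], reverse=True
)
print(enthusiasts[0], influencers[0])  # → account_a account_b
```

The toy data shows why the two rankings can disagree: the account that posts most is not the one whose posts attract the most engagement.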
We can see that the most prolific public accounts on Facebook in the area of sustainability include Rate It Green, Palm Oil Free Certification, and Semodu AG. Measured by engagement, however, politician Nasir El-Rufai, satirical newspaper The Onion, and journalist Rosana Jatobá top the list.
3. Spotting hot content
People are constantly posting links to content they find interesting, both on and off the social media platforms they are active on. Mapping these links can give insights into what content people in the know are talking about, liking, and sharing. Links in posts are structured and available in the `attached_link` column. By following the same procedure as when we mapped authors above, we get a list of the most frequently posted, liked, and shared links. We can export the list so that we can go through the links one by one in our web browser.
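The link ranking and export follow the same grouping pattern. The sketch counts how often each `attached_link` value is posted, sums the shares it received, skips posts without a link, and writes the ranked list as CSV text. Apart from the `attached_link` column named above, the field names and URLs are made up, and the CSV layout is an assumption rather than Dcipher's export format.

```python
import csv
import io
from collections import Counter

# Hypothetical posts; only the attached_link column name comes from the text.
posts = [
    {"attached_link": "https://example.org/plastic-report", "shares": 40},
    {"attached_link": "https://example.org/plastic-report", "shares": 15},
    {"attached_link": "https://example.com/bamboo-guide", "shares": 3},
    {"attached_link": None, "shares": 9},
]


def rank_links(posts):
    """Count how often each link is posted and total the shares it received."""
    times_posted = Counter()
    total_shares = Counter()
    for post in posts:
        link = post.get("attached_link")
        if not link:  # skip posts without a link
            continue
        times_posted[link] += 1
        total_shares[link] += post["shares"]
    return times_posted, total_shares


def to_csv(times_posted, total_shares):
    """Write the links, most frequently posted first, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["link", "times_posted", "total_shares"])
    for link, count in times_posted.most_common():
        writer.writerow([link, count, total_shares[link]])
    return buffer.getvalue()


times_posted, total_shares = rank_links(posts)
print(to_csv(times_posted, total_shares))
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same `csv.writer` could target an open file instead.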
4. Finding insights about specific topics
While the topic overview described above is useful for letting themes emerge from the data and for finding topics we would not otherwise have thought to look for, gaining in-depth insights into specific topics of interest calls for another approach.
Both the Document Summary workbench and the Bubble workbench contain search functionality. In this case, we are interested in recycling and choose to investigate how the topic is discussed in the downloaded Instagram posts. We search for words containing “recycl” using the search bar of the Bubble workbench to highlight all matching words. Having located the words we are interested in, we can right-click one or more of them and use them as seeds for a similarity search. As a result, other words that tend to appear in the same posts as the recycling-related words are displayed.
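The search-then-expand workflow can be approximated with plain co-occurrence counts: first find every vocabulary word containing a fragment such as "recycl", then use those words as seeds and rank the other words by how many seed-containing posts they appear in. This is a swapped-in technique; Dcipher's similarity search may rely on a different model. The stopword list and sample posts are illustrative assumptions.

```python
import re
from collections import Counter

# Sample posts and stopword list are illustrative assumptions.
posts = [
    "Recycling plastic bottles into new products",
    "Glass jars beat plastic and are recyclable",
    "Bamboo toothbrush, fully biodegradable",
    "Recycled paper packaging for every order",
]
STOPWORDS = {"a", "and", "are", "every", "for", "into", "new", "the"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def search_vocabulary(texts, fragment):
    """All distinct words containing the fragment, e.g. 'recycl'."""
    vocabulary = {t for text in texts for t in tokens(text)}
    return sorted(word for word in vocabulary if fragment in word)


def cooccurring_words(texts, seeds, top_n=5):
    """Rank non-seed words by how many seed-containing posts they appear in."""
    seeds = set(seeds)
    counts = Counter()
    for text in texts:
        words = set(tokens(text))
        if words & seeds:
            counts.update(words - seeds)
    return counts.most_common(top_n)


seeds = search_vocabulary(posts, "recycl")
print(seeds)                               # → ['recyclable', 'recycled', 'recycling']
print(cooccurring_words(posts, seeds)[0])  # → ('plastic', 2)
```

Counting each word once per post (via `set`) keeps a single repetitive post from dominating the ranking.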
To read individual posts, we select the words of interest and drag-and-drop them to the Document Summary workbench. This displays the most relevant posts, so that we can read them and understand the context in depth.
We can see that discussions about recycling revolve around plastic products, alternatives such as glass containers, biodegradable materials such as bamboo, recyclable paper and packaging, landfills, and waste water.
5. Creating a meaning-based landscape of posts
If the hashtag and word networks feel too abstract, we can use another approach called document landscaping. This is triggered by drag-and-dropping the posts to the Scatter plot workbench, which lets the posts self-organize according to contextual similarity. We can navigate the landscape and mouse over dots to read individual posts.
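How document landscaping works internally is not specified here. As a much simpler stand-in, the sketch below represents each post as a bag of words, compares posts with cosine similarity, and puts any pair above a threshold into the same group (connected components). It finds groups but does not compute a 2D layout, which would additionally need a dimensionality-reduction step. The threshold of 0.3, the stopword list, and the posts are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Invented posts; stopwords and threshold are illustrative assumptions.
posts = [
    "Stop deforestation and save the rainforest",
    "The rainforest needs protection from deforestation",
    "Plastic waste is ending up in the ocean",
    "Ocean plastic harms marine life",
]
STOPWORDS = {"and", "the", "is", "in", "up", "from", "needs"}


def vectorize(text):
    """Bag-of-words term counts, stopwords removed."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Label posts so that similar posts (directly or via a chain) share a label."""
    vectors = [vectorize(t) for t in texts]
    labels = list(range(len(texts)))  # each post starts in its own group
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                # Merge j's whole group into i's group.
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


print(cluster(posts))  # → [0, 0, 2, 2]
```

The two groups mirror the kind of clusters described next (deforestation versus ocean plastics), but on real data a threshold this simple tends to chain unrelated posts together, which is one reason landscaping tools use richer text representations.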
In our case, we see that one of the clusters is about deforestation and saving the rainforest, while another relates to plastics ending up in the ocean. We can explore the entire landscape and label its different parts to get an overview of the discussion. Document landscapes are ideal for viewing the forest without losing sight of the trees.
By selecting posts in the landscape, we can view more information about them in the other workbenches. If we want to quantify our findings, we can tag posts and analyze and visualize the tags. If we want to gain qualitative insights, we can continue researching the content in the Document Summary workbench.