
How to Build a Word Cloud in Python?


What Are Word Clouds?

A word cloud is a visual representation of text data, typically used to depict keyword density or popularity. The technique has been used in a variety of fields, including web design, business intelligence, and linguistics.

Word clouds give greater prominence to words that appear more frequently in the source text: the more often a word is used, the larger it appears. You can tweak your clouds with different fonts, layouts, and color schemes. The word clouds you create can be saved as PNG or JPG files and shared with your friends.
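To make the link between frequency and size concrete, here is a minimal sketch that uses only the standard library. The helper names `word_frequencies` and `font_sizes` and the linear scaling rule are assumptions chosen for illustration; the wordcloud library uses its own sizing and layout logic.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale counts linearly so the most frequent word gets max_size."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in freqs.items()
    }


# Made-up sample text, only to show the idea
sample = "data analyst data sql data python sql"
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)

print(freqs.most_common(3))
print(sizes)
```

In this sketch the word that occurs most often always gets the largest size, and every other word is scaled relative to it.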

sample word clouds

Word Cloud in 3 Basic Steps with Python

Here is a sample data set that can be used to demonstrate how to build a word cloud with Python in Jupyter. The data is extracted from LinkedIn job postings for Data Analyst roles and includes job titles, job descriptions, and job requirements. The idea is to get a better understanding of the different kinds of data analytics roles required in 2022, including the typical job scope and the skills needed for a data analyst job.

Data source is from:

1. Install the Wordcloud library

To install the Wordcloud library, simply run “pip install wordcloud” in your terminal.

2. Import the Libraries and Load the Data

  • Import the Pandas and NumPy libraries
  • Import Matplotlib and Seaborn as the libraries for data visualisation
  • This exercise uses CSV data; open the CSV file with the Pandas read_csv command
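The loading step might look roughly like the sketch below. The column names (`job_title`, `job_description`, `job_requirements`) and the two rows are placeholders invented so the example runs on its own; they are not taken from the actual data set. NumPy, Matplotlib, and Seaborn are usually imported alongside Pandas as `np`, `plt` (from `matplotlib.pyplot`), and `sns`.

```python
import io

import pandas as pd

# Placeholder rows so the sketch is self-contained; these are not real postings.
csv_text = io.StringIO(
    "job_title,job_description,job_requirements\n"
    "Senior Data Analyst,Build dashboards and reports,SQL and Python\n"
    "Data Analyst,Analyse sales data,Excel and SQL\n"
)

# With a real file, pass its path instead of the in-memory buffer.
df = pd.read_csv(csv_text)

print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # column names
print(df.head())            # first few rows
```

Checking the shape and the first rows right after loading is a quick way to confirm that the file was parsed as expected.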

Before conducting any exploratory data analysis, it is important to first cleanse the data. This means ensuring that all data is accurate, consistent, and complete. Data cleansing can be a time-consuming process, but it is essential for producing reliable results from the analysis.
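For text columns, cleansing often means normalising case, removing punctuation, and dropping empty or duplicate entries. The sketch below shows one possible approach with the standard library; `clean_text` is a hypothetical helper and the sample values are made up.

```python
import re


def clean_text(value):
    """Lowercase, replace non-letters with spaces, and collapse whitespace."""
    if not isinstance(value, str):
        return ""  # treat missing values (None, NaN) as empty text
    text = value.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Made-up raw values, including a missing entry
raw = ["  Senior Data Analyst (SQL/Python) ", None, "Data-Analyst, 3+ yrs"]
cleaned = [clean_text(v) for v in raw]

# Drop empty entries and exact duplicates while keeping the original order
unique = list(dict.fromkeys(t for t in cleaned if t))
print(unique)
```

With a Pandas DataFrame, the same function could be applied to a text column with `Series.map(clean_text)`. Note that stripping every non-letter also removes digits and symbols, which may not be wanted for terms such as version numbers.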

3. Generating Word Cloud

Import WordCloud and STOPWORDS from the wordcloud library.

What are stopwords in natural language processing?

Stopwords are words which are typically filtered out of natural language processing tasks such as text classification and topic modeling. They are usually words with little meaning which would not be useful in determining the context of a document. Stopwords can vary between languages, but some common examples in English include “the”, “a”, “an”, and “is”.
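As a rough illustration of the filtering idea, the sketch below removes stopwords from a sentence using a small hand-written set. The `remove_stopwords` helper, the extra words, and the sentence are assumptions for this example. In practice, the `STOPWORDS` set shipped with the wordcloud library can be used as a starting point and extended in the same way with a set union.

```python
def remove_stopwords(text, stopwords):
    """Return the lowercase words of text that are not in the stopword set."""
    return [word for word in text.lower().split() if word not in stopwords]


basic_stopwords = {"the", "a", "an", "is"}  # the examples named in the text
# Hypothetical additions, to show how a set can be extended
custom_stopwords = basic_stopwords | {"and", "of", "for", "with"}

sentence = "The analyst is building a dashboard for the sales team"
print(remove_stopwords(sentence, custom_stopwords))
```

Only the words that carry meaning for the topic remain, which is what keeps a word cloud from being dominated by filler words.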

Using Python, we can now visualize the most frequent words in any set of Data Analyst job postings.
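The original code is not reproduced here, so the following is a minimal sketch of how such a word cloud could be generated, not the exact code behind the figures. The helper names, the sample titles, and the parameter values (size, background colour, `max_words`) are assumptions. The third-party imports sit inside the functions so that the text-joining helper also works where wordcloud or Matplotlib is not installed.

```python
def join_texts(values):
    """Join the string entries of a column into one text, skipping missing values."""
    return " ".join(v.strip() for v in values if isinstance(v, str) and v.strip())


def build_wordcloud(text, extra_stopwords=()):
    """Generate a word cloud from text, ignoring default and extra stopwords."""
    from wordcloud import STOPWORDS, WordCloud  # third-party: pip install wordcloud

    stopwords = set(STOPWORDS) | set(extra_stopwords)
    cloud = WordCloud(
        width=800,
        height=400,
        background_color="white",
        stopwords=stopwords,
        max_words=100,
    )
    return cloud.generate(text)


def show_wordcloud(cloud, title):
    """Display a generated word cloud with Matplotlib."""
    import matplotlib.pyplot as plt  # third-party

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Made-up job titles standing in for a real column of the data set
titles = ["Senior Data Analyst", None, "Senior Business Data Analyst "]
text = join_texts(titles)
print(text)
```

Calling `build_wordcloud(text)` returns a cloud that `show_wordcloud(cloud, "Job Titles")` can display, and `cloud.to_file("job_titles.png")` saves it as an image. The same steps can be repeated for the job description and job requirement columns.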

Example: Word Cloud

Job Titles Word Cloud for senior Data Analyst job postings

Job Descriptions Word Cloud for senior Data Analyst job postings

Job Description for Data Analyst Senior Level

Job Requirements Word Cloud for senior Data Analyst job postings

Job Requirements for Data Analyst Senior Level

The word cloud visualisations above help us understand the job expectations for a data analyst, including the skills needed to be successful in this role.

Why Word Cloud for Data Visualisation?

Word clouds are a useful tool for visualizing the most common words used in a body of text. They can also help to reveal patterns or trends in the data. For example, a word cloud generated from a set of survey responses might show that the majority of respondents feel positive about a particular product or service.