RESEARCH PAPER
Visualizing the Covid-19 Pandemic
Have you ever wondered what it would be like to create anything you wanted? The power to create anything with the use of technology is an ethereal feeling. Data is everywhere in modern society, and being able to interpret it is an important skill. The ability to dissect data sets and extract the necessary information has always interested me. Before starting to code, it is important to learn about data visualization. The main purpose of this project is to use Python to visualize data in a way that anyone can digest. The traditional way of visualizing data is through a graph of some sort. Bar graphs, histograms, and line plots are some of the quintessential visualization tools of the 20th century (Lutz). In the 21st century, however, one can go beyond these and visualize data in other ways. Choropleth maps, which shade each region of a map according to a data value, are a form of visualization that has become more accessible due to improvements in technology. They are easily understood, can convey a variety of messages, and are visually appealing. The culmination of these factors is why a choropleth visualization was chosen. Data visualization, color theory, and a programming language were utilized in order to demonstrate the effect of Covid-19 on the United States.
The research question was formulated to enable as much research as possible about data visualization, because the more knowledge that is gained, the more polished the final product will look. A myriad of sources, ranging from websites to books, helped this project greatly. Most sources on this topic take the form of either a book or website documentation. Much of the website information was written by the developers of the Python libraries themselves, which was a major reason those sources were chosen: they are as close to a primary source as could be found regarding the syntax in the code. Scholarly journals typically discussed a newer tool that had been developed and had little to no relevance to the project. The guiding questions helped traverse the oceans of programming knowledge. They centered on four question words: how, why, where, and how much. The first step was to learn how to make the visualization. It won’t create itself, and knowing how to make it is essential. The following steps involved choosing colors and time, which raises the question of why, as in why this color. The visualization will have many colors, and knowing which colors to use and why will benefit the project. Learning ‘where’ is the last important step: where can reliable datasets be found, and where can they be accessed? The data is what the whole visualization is modeling; without it, there would be nothing. Therefore, finding the datasets is crucial to the success of this project. A vital acknowledgment is the lack of scholarly journals in the bibliography due to the aforementioned lack of relevance. Despite this, many reliable primary sources are available on official websites such as anaconda.org or python.org. The information was written by the developers, making it as close to primary as one can get.
Data visualization is the graphical representation of information and data. People would rather see colors and shapes than abstract figures such as numbers and percentages. A visualization makes the data easier to digest and gives the viewer a better understanding. In addition, in keeping with the modern technological world, visualizations can be embedded conveniently in websites, unlike traditional mediums (Dash). But before the specifics of the visualization come into play, it is important to start with the context: “success in data visualization does not start with data visualization. Rather, before one begins down the path of creating a data visualization or communication, attention and time should be paid to understanding the context for the need to communicate” (Knaflic). Visualizations are created in order to get a point across, and the point is left to the creator. With an understanding of what a visualization is, the next step was to learn how to make one. There are many ways to create a visualization, but the one that stood out the most was making one in Python. There are easier visualization tools on the market, but few offer the freedom and power that Python does; almost anything can be visualized with Python. Other tools limit the user, which is a significant drawback (Jolly). The visualization was created using Python’s libraries, which are sets of functions that have already been written. The main libraries used were Bokeh, Matplotlib, Pandas, and NumPy. These four helped read the data, process the data, and finally visualize it. The result had many colors and looked aesthetically pleasing, though it was not created as a piece of art. Even though visualizations include elements of art, their main job is to get a point across. The point of this project is to show which areas of a country are most affected by Covid-19 and which are least affected.
A paper by Mark Shelly, published in a Cambridge University Press journal, establishes that the first step in doing this is to thoroughly understand the data. The variables in use must be checked before any of the visualization process begins. Spread, range, maximum, minimum, and many other statistical measures of the data should be checked (Shelly). A lot can be learned through these classic methods of analysis. Jumping straight into the visualization without properly respecting the data will lead to a skewed visualization. From here, further questions can be raised and pursued. Another large part of the visualization was the color choice and which colors would be best to utilize.
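The statistical checks described above can be sketched in a few lines of Python. This is only a minimal illustration that uses the standard library's statistics module as a stand-in for Pandas; the list daily_cases and the function summarize are hypothetical names, and the values are made up rather than taken from the project's dataset.

```python
import statistics

# Hypothetical daily case counts for one region (illustrative values only).
daily_cases = [120, 135, 150, 90, 300, 280, 310]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (spread)
    }


summary = summarize(daily_cases)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or a very wide range, is the kind of signal that should be investigated before any colors are assigned to a map. With Pandas, the describe method on a DataFrame reports a similar set of measures.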
Through the research, it is evident that some colors are more useful than others. Everyone has their own color preferences, but there is science behind which colors are empirically the most eye-catching. “...demonstrated red to be more stimulating than green and green more stimulating than blue and yellow. According to this study, colors in long wavelengths are more stimulating than those in short wavelengths” (Renk Etkisi). Colors with longer wavelengths can help catch the eye of the viewer. Within the visible spectrum, red has the longest wavelength and violet has the shortest. Because red has the longest wavelength, it will be used to show areas with dense Covid-19 cases. Even though colors like purple and blue have shorter wavelengths, they can still be used for an infographic of vaccinations across an area, because shorter-wavelength colors are more relaxing to the human psyche. The visualization is telling a story, and using colors that fit the emotion being conveyed helps greatly. For example, a visualization showing Covid-19 cases should use colors such as red in order to create a sense of danger and urgency. Otherwise, the viewer might feel relaxed and be less apprehensive. Another thing to consider is the available color range, which is specified with hex colors. Hex colors are six-digit hexadecimal codes used to represent colors, with two digits each for the red, green, and blue channels. The number of possible colors is 256^3 (16,777,216), which is the total number of RGB values (Rhyne). Technically, there are 256^4 combinations if transparency is taken into account, but transparency is not supported by many libraries, so it is unknown whether it will be used in this project. Most monitors are built with the ability to show these colors, which is why they are being used. Using colors that some monitors cannot display would negatively affect my project because some people would be unable to view it on their personal devices.
Due to improvements in technology, the color range is quite broad and should not be a limiting factor in this project. Had this project been done 20 years ago with the same processing power but with older monitors, colors would be a limiting factor. In addition to colors, another important part of the project was choosing Python to be the language of choice.
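The hex color arithmetic and the idea of using red for dense case counts can be illustrated with a short sketch. The function names rgb_to_hex and case_color are hypothetical, and the linear white-to-red ramp is only one assumed way of shading a map; it is not necessarily the scheme used in the final visualization.

```python
# Each of the red, green, and blue channels takes one of 256 values (0-255).
RGB_COLORS = 256 ** 3   # 16,777,216 colors without transparency
RGBA_COLORS = 256 ** 4  # 4,294,967,296 combinations with a transparency channel


def rgb_to_hex(red, green, blue):
    """Format three 0-255 channel values as a six-digit hex color code."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return f"#{red:02x}{green:02x}{blue:02x}"


def case_color(cases, max_cases):
    """Map a case count to a shade between white (none) and pure red (most).

    Assumed linear ramp: red stays at 255 while green and blue fade to 0.
    """
    if max_cases <= 0:
        raise ValueError("max_cases must be positive")
    fraction = min(max(cases / max_cases, 0.0), 1.0)  # clamp to [0, 1]
    fade = round(255 * (1 - fraction))
    return rgb_to_hex(255, fade, fade)


print(RGB_COLORS)              # 16777216
print(rgb_to_hex(255, 0, 0))   # #ff0000
print(case_color(1000, 1000))  # #ff0000
print(case_color(0, 1000))     # #ffffff
```

Keeping the red channel fixed while fading the other two means that regions with more cases appear as a more saturated red, which matches the goal of conveying urgency.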
Among the many programming languages, Python was best suited for this project for a multitude of reasons. “And unlike C, the Python code can be run immediately, without compiling and linking; changes can be tested much quicker in Python” (Lutz). Python is well suited to data science because it is efficient to write and test. Its libraries can handle large quantities of data, which is one reason it is so popular in the data science community. In addition, there are many functions for math and statistics, which makes it a strong fit for my project. Some would argue for C++ for its procedural programming for intensive functions, or for Java for its app compatibility. Both are powerful languages with equal, if not larger, audiences (Adams). The biggest difference is how easily Python works with datasets. In other languages, working through big sets of data would take considerably more code. Python, however, can do it easily using the libraries available for it. Utilizing Pandas or another library makes reading data smoother than it is in most other languages. With the addition of many other features (ease of use, data science tools, computing power, libraries), Python is the best language for this project. In most other languages, the same task would likely mean more errors for the programmer to track down. Another advantage over many other programming languages is the number of public libraries that Python has (Python). A public library is essentially a set of pre-made functions that developers can use to finish a project efficiently, since they do not have to rewrite these functions from scratch. The functions are already written and are available on websites such as GitHub or are already integrated into Python. Without such a library, the process would take longer, as one would have to write the functions manually and then integrate them into the code. With Python libraries, most of that writing is eliminated (Seaborn).
The only step left is to integrate the library into the code, effectively cutting the workload in half. However, since the library was written by another programmer, the next challenge is to learn its purpose and how to use it. The learning curve is quite steep, as many libraries have a wide variety of functions. But after reading through the documentation and practicing, using the library becomes routine and easier. The functions start to make more sense, and filling in the parameters becomes easier.
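The benefit of pre-made functions can be shown with a small sketch. It uses Python's built-in csv library as a stand-in for Pandas, so the parsing logic does not have to be written by hand. The sample data, the column names, and the function total_cases_by_state are all hypothetical and are not taken from the project's dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical data in CSV form; a real project would open a downloaded file.
SAMPLE_CSV = """state,county,cases
State A,County 1,1200
State A,County 2,950
State B,County 3,400
State B,County 4,350
"""


def total_cases_by_state(csv_text):
    """Sum the 'cases' column for each state using the csv library."""
    totals = defaultdict(int)
    # DictReader handles splitting each line and matching values to headers.
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        totals[row["state"]] += int(row["cases"])
    return dict(totals)


totals = total_cases_by_state(SAMPLE_CSV)
print(totals)  # {'State A': 2150, 'State B': 750}
```

The only code written here is the summing step; reading the rows and matching them to column names is handled by the library, which is the kind of saved effort described above.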
In conclusion, the research has explained what a visualization is, what colors are best suited for it, and what programming language is most effective. A visualization is any graphical representation of data. The forms vary from traditional graphs to more modern visualizations. The best option for this project is a choropleth map to show the effects of Covid-19 on an area. In addition, the research on color theory will aid the decision-making process when creating the final project. Utilizing colors with a longer wavelength will help get the viewer’s attention and instill a sense of urgency. Lastly, Python was chosen as the programming language because of its efficiency when working with raw data. The next step is to bring the three research topics together and create the visualization. Further questions have arisen about certain functions within Python. One question that had to be answered was where the data can be found. Through an interview with a professional, it was learned that information on Covid-19 can be found through Johns Hopkins (Karki). A major skill learned over this process is the ability to persevere. Essentially no code runs seamlessly on the first test, and it is important to go back and fix the bugs. Through this process, the life lesson of perseverance has been learned and applied in other facets of life.
Works Cited
Adams, Chad. Learning Python Data Visualization: Master How to Build Dynamic
HTML5-Ready SVG Charts Using Python and the Pygal Library. Birmingham, UK, Packt
Publishing Ltd, 2014.
Anaconda. “Getting Started — Anaconda Documentation.” Anaconda, 2021,
docs.anaconda.com. Accessed 20 Sept. 2021.
Dash. “Dash Documentation & User Guide | Plotly.” Dash Python User Guide, 2021,
dash.plotly.com/.
Jolly, Kevin. Hands-on Data Visualization with Bokeh: Interactive Web Plotting for Python
Using Bokeh. Birmingham, UK, Packt Publishing, 2018.
Karki, Anjan. Interview. Aditya Shah. October 18, 2021.
Knaflic, Cole Nussbaumer. Storytelling with Data: A Data Visualization Guide for Business
Professionals. Hoboken, New Jersey, Wiley, 4 Oct. 2015.
Lutz, Mark. Programming Python: [Solutions for Python Programmers; Covers Python 2].
Beijing, O’Reilly, 2001, books.google.com.np/. Accessed 28 Sept. 2021.
Python. “Welcome to Python.” Python, 29 May 2019, www.python.org/. Accessed 16 Sept.
2021.
“Renk Etkisi | the Effect of Color | Psychology and Color.” Renketkisi.com, 2017,
renketkisi.com/en/psychology-and-color.html.
Rhyne, Theresa. “Applying Color Theory to Digital Media and Visualization.” ACM Digital
Library, 6 May 2017, dl.acm.org. Accessed 22 Sept. 2021.
Seaborn. “User Guide and Tutorial — Seaborn 0.11.2 Documentation.” Seaborn.pydata.org,
2021, seaborn.pydata.org/tutorial.html. Accessed 20 Sept. 2021.
Shelly, Mark A. “Exploratory Data Analysis: Data Visualization or Torture?” Infection Control
and Hospital Epidemiology, vol. 17, no. 9, [Cambridge University Press, Society for
Healthcare Epidemiology of America], 1996, pp. 605–12, https://doi.org/10.2307/30141948.