{"id":376,"date":"2022-03-19T17:40:08","date_gmt":"2022-03-19T21:40:08","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/?post_type=chapter&#038;p=376"},"modified":"2022-06-14T08:18:39","modified_gmt":"2022-06-14T12:18:39","slug":"data-visualizations-with-r","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/chapter\/data-visualizations-with-r\/","title":{"raw":"Student Testimonial: Data Visualizations with R","rendered":"Student Testimonial: Data Visualizations with R"},"content":{"raw":"<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">Box 10.9 - Student Testimony - Making Data Visualizations with R<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nMy thesis made use of simple graphics in R to present basic word frequencies of key terms in my data. R is an open-source coding tool with as much flexibility as its coding language will allow. It can organize information and visualize data through infographics, plots, charts, you name it. Despite its wide applicability, however, the language can be forbidding to the uninitiated coder. I found that each term and function can easily become dependent on more subtle information about the logic of R, resulting in many late nights on Reddit forums to understand my botched attempts to make a simple graph. In the hope that my suffering with R can make the process easier for you, I have presented a simple five step guide to descriptive statistics on R along with some resources for further exploration.\r\n\r\n&nbsp;\r\n<p style=\"text-align: right\"><strong>Alexander Wilson, Sociology Honours student, 2020-2021<\/strong><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><strong>Download R, RStudio and Install a Plotting Package<\/strong><\/h2>\r\n<ul>\r\n \t<li>R For Windows: <a href=\"https:\/\/cran.r-project.org\/bin\/windows\/base\/\">Download R-4.1.2 for Windows. 
The R-project for statistical computing.<\/a><\/li>\r\n \t<li>R For Mac: <a href=\"https:\/\/cran.r-project.org\/bin\/macosx\/\">R for macOS (r-project.org)<\/a><\/li>\r\n \t<li>RStudio: <a href=\"https:\/\/www.rstudio.com\/products\/rstudio\/download\/\">Download the RStudio IDE - RStudio<\/a><\/li>\r\n<\/ul>\r\nOnce you have downloaded both, you should be able to start up the RStudio application, which will take you to a blank coding terminal. RStudio is an integrated development environment (IDE) that makes coding in R easier. It is neater and will suggest completions for the functions you are trying to type in.\r\n\r\nAfter opening RStudio, type in (or copy):\u00a0 install.packages(\"ggplot2\")\r\n\r\nThis installs the latest version of the data visualization package ggplot2. Note that the quotation marks must be straight quotes, not curly ones, or R will report an error.\r\n<h2><strong>Starting up ggplot<\/strong><\/h2>\r\nThe following link will take you to the website of ggplot2, which has extra resources for downloading and a cheat sheet of the relevant functions you will need to know.\r\n\r\nPlotting Package &amp; Cheat Sheet: <a href=\"https:\/\/sourceforge.net\/projects\/ggplot2.mirror\/\">ggplot2 download | SourceForge.net<\/a>\r\n\r\nOnce you have installed ggplot2, you need to load it before you can use it. The command to load it is below:\r\n\r\nlibrary(ggplot2)\r\n<h2><strong>Input your data<\/strong><\/h2>\r\nTo visualize data in R, you must first organize it within the system. The simplest way is to create basic quantitative variables. You can create a simple bivariate data frame in R like so:\r\n\r\ndata.frame(age = c(9, 10, 11, 12, 13), grade = c(6, 7, 8, 9, 10))\r\n\r\nHere age has five cases, namely 9, 10, 11, 12, 13; and grade has five cases, 6, 7, 8, 9, 10. 
It is helpful to name the data frame so that you can reuse it, like so:\r\n\r\nage_grade &lt;- data.frame(age = c(9, 10, 11, 12, 13), grade = c(6, 7, 8, 9, 10))\r\n\r\nFrom here, you can begin to plot simple descriptive statistics by passing age_grade to the ggplot functions outlined below.\r\n<h2><strong>ggplot Legend and Functions<\/strong><\/h2>\r\nFor instance, taking our previous example, you could create a simple box plot of our data frame (age_grade).\r\n\r\nFirst, begin with the basic form of all ggplot calls:\r\n\r\nggplot(data = &lt;DATA&gt;, mapping = aes(&lt;MAPPINGS&gt;)) + &lt;GEOM_FUNCTION&gt;()\r\n\r\nwhere:\r\n\r\ndata = your data frame (in this case age_grade)\r\n\r\nmapping = the aesthetic mapping, set with the function aes, which assigns your variables to the axes of the plot (e.g. x and y). It typically runs like so: aes(x = age, y = grade).\r\n\r\nGEOM_FUNCTION = the type of chart you want to draw (such as a box plot, geom_boxplot())\r\n\r\nPut these together with your data like so:\r\n\r\nggplot(data = age_grade, mapping = aes(x = age, y = grade)) + geom_boxplot()\r\n\r\nAnd you should be presented with the following simple chart:\r\n\r\n&nbsp;\r\n\r\nUse the following legend to map your data onto many different visualizations. For simple repeated use, save your ggplot call like so:\r\n\r\nage_grade_plot &lt;- ggplot(data = age_grade, mapping = aes(x = age, y = grade))\r\n\r\nAnd then simply add the geom function you want to use to age_grade_plot. Because this plot maps both x and y, use geom_col() for a bar chart (geom_bar() counts cases and accepts only one of the two):\r\n\r\nage_grade_plot + geom_col()\r\n\r\nAnd you should get that chart. Screenshot it and it is yours to present in your paper!\r\n\r\n&nbsp;\r\n\r\n[table id=36]\r\n\r\n&nbsp;\r\n\r\n[table id=37]\r\n<h1>References<\/h1>\r\nWickham, H. (2016). Getting started with ggplot2. ggplot2 (pp. 11-31). 
Springer International Publishing.\u00a0<a href=\"https:\/\/doi.org\/10.1007\/978-3-319-24277-4_2\">https:\/\/doi.org\/10.1007\/978-3-319-24277-4_2<\/a>","rendered":"<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">Box 10.9 &#8211; Student Testimony &#8211; Making Data Visualizations with R<\/header>\n<div class=\"textbox__content\">\n<p>My thesis made use of simple graphics in R to present basic word frequencies of key terms in my data. R is an open-source coding tool with as much flexibility as its coding language will allow. It can organize information and visualize data through infographics, plots, charts, you name it. Despite its wide applicability, however, the language can be forbidding to the uninitiated coder. I found that each term and function can easily become dependent on more subtle information about the logic of R, resulting in many late nights on Reddit forums to understand my botched attempts to make a simple graph. In the hope that my suffering with R can make the process easier for you, I have presented a simple five step guide to descriptive statistics on R along with some resources for further exploration.<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: right\"><strong>Alexander Wilson, Sociology Honours student, 2020-2021<\/strong><\/p>\n<\/div>\n<\/div>\n<h2><strong>Download R, RStudio and Install a Plotting Package<\/strong><\/h2>\n<ul>\n<li>R For Windows: <a href=\"https:\/\/cran.r-project.org\/bin\/windows\/base\/\">Download R-4.1.2 for Windows. The R-project for statistical computing.<\/a><\/li>\n<li>R For Mac: <a href=\"https:\/\/cran.r-project.org\/bin\/macosx\/\">R for macOS (r-project.org)<\/a><\/li>\n<li>RStudio: <a href=\"https:\/\/www.rstudio.com\/products\/rstudio\/download\/\">Download the RStudio IDE &#8211; RStudio<\/a><\/li>\n<\/ul>\n<p>Once you have downloaded both, you should be able to start up the RStudio application, which will take you to a blank coding terminal. 
RStudio is an integrated development environment (IDE) that makes coding in R easier. It is neater and will suggest completions for the functions you are trying to type in.<\/p>\n<p>After opening RStudio, type in (or copy):\u00a0 install.packages(\"ggplot2\")<\/p>\n<p>This installs the latest version of the data visualization package ggplot2. Note that the quotation marks must be straight quotes, not curly ones, or R will report an error.<\/p>\n<h2><strong>Starting up ggplot<\/strong><\/h2>\n<p>The following link will take you to the website of ggplot2, which has extra resources for downloading and a cheat sheet of the relevant functions you will need to know.<\/p>\n<p>Plotting Package &amp; Cheat Sheet: <a href=\"https:\/\/sourceforge.net\/projects\/ggplot2.mirror\/\">ggplot2 download | SourceForge.net<\/a><\/p>\n<p>Once you have installed ggplot2, you need to load it before you can use it. The command to load it is below:<\/p>\n<p>library(ggplot2)<\/p>\n<h2><strong>Input your data<\/strong><\/h2>\n<p>To visualize data in R, you must first organize it within the system. The simplest way is to create basic quantitative variables. You can create a simple bivariate data frame in R like so:<\/p>\n<p>data.frame(age = c(9, 10, 11, 12, 13), grade = c(6, 7, 8, 9, 10))<\/p>\n<p>Here age has five cases, namely 9, 10, 11, 12, 13; and grade has five cases, 6, 7, 8, 9, 10. 
It is helpful to name the data frame so that you can reuse it, like so:<\/p>\n<p>age_grade &lt;- data.frame(age = c(9, 10, 11, 12, 13), grade = c(6, 7, 8, 9, 10))<\/p>\n<p>From here, you can begin to plot simple descriptive statistics by passing age_grade to the ggplot functions outlined below.<\/p>\n<h2><strong>ggplot Legend and Functions<\/strong><\/h2>\n<p>For instance, taking our previous example, you could create a simple box plot of our data frame (age_grade).<\/p>\n<p>First, begin with the basic form of all ggplot calls:<\/p>\n<p>ggplot(data = &lt;DATA&gt;, mapping = aes(&lt;MAPPINGS&gt;)) + &lt;GEOM_FUNCTION&gt;()<\/p>\n<p>where:<\/p>\n<p>data = your data frame (in this case age_grade)<\/p>\n<p>mapping = the aesthetic mapping, set with the function aes, which assigns your variables to the axes of the plot (e.g. x and y). It typically runs like so: aes(x = age, y = grade).<\/p>\n<p>GEOM_FUNCTION = the type of chart you want to draw (such as a box plot, geom_boxplot())<\/p>\n<p>Put these together with your data like so:<\/p>\n<p>ggplot(data = age_grade, mapping = aes(x = age, y = grade)) + geom_boxplot()<\/p>\n<p>And you should be presented with the following simple chart:<\/p>\n<p>&nbsp;<\/p>\n<p>Use the following legend to map your data onto many different visualizations. For simple repeated use, save your ggplot call like so:<\/p>\n<p>age_grade_plot &lt;- ggplot(data = age_grade, mapping = aes(x = age, y = grade))<\/p>\n<p>And then simply add the geom function you want to use to age_grade_plot. Because this plot maps both x and y, use geom_col() for a bar chart (geom_bar() counts cases and accepts only one of the two):<\/p>\n<p>age_grade_plot + geom_col()<\/p>\n<p>And you should get that chart. 
Screenshot it and it is yours to present in your paper!<\/p>\n<p>&nbsp;<\/p>\n<table id=\"tablepress-36\" class=\"tablepress tablepress-id-36 tbody-has-connected-cells\">\n<thead>\n<tr class=\"row-1\">\n<th colspan=\"2\" class=\"column-1\"><b>Table 10.5 - Terminology with R<\/b><\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping\">\n<tr class=\"row-2\">\n<td class=\"column-1\"><b>Term<\/b><\/td>\n<td class=\"column-2\"><b>Definition<\/b><\/td>\n<\/tr>\n<tr class=\"row-3\">\n<td class=\"column-1\">Data<\/td>\n<td class=\"column-2\">Data\u00a0you visualize and a set of outlines of how you want to make it look appealing (choice of colour, bolding, etc.).<\/td>\n<\/tr>\n<tr class=\"row-4\">\n<td class=\"column-1\">Layers<\/td>\n<td class=\"column-2\">Layers\u00a0are the statistical summaries of that data which will be represented by geometric objects,\u00a0geoms for short, that show what you see on the plot: points, lines, polygons, and so forth.<\/td>\n<\/tr>\n<tr class=\"row-5\">\n<td class=\"column-1\">Scales<\/td>\n<td class=\"column-2\">Scales show the ratio or proportion in which you have mapped your data onto your graphic.<\/td>\n<\/tr>\n<tr class=\"row-6\">\n<td class=\"column-1\">Coord<\/td>\n<td class=\"column-2\">Coord stands for a coordinate system. A coordinate system,\u00a0coord\u00a0for short, describes how data coordinates are mapped to the plane of the graphic. 
It also provides axes and gridlines to make it possible to read the graph.<\/td>\n<\/tr>\n<tr class=\"row-7\">\n<td class=\"column-1\">Faceting<\/td>\n<td class=\"column-2\">Faceting can break up the data into smaller subsets and make decisions about how to use these smaller groupings of data.<\/td>\n<\/tr>\n<tr class=\"row-8\">\n<td class=\"column-1\">Theme<\/td>\n<td class=\"column-2\">The theme refers to choices of presentation such as colour or font.<\/td>\n<\/tr>\n<tr class=\"row-9\">\n<td colspan=\"2\" class=\"column-1\"><small>Source: Wilson, A. (2021). Driver\u2019s of Dissidence: A Discourse Analysis of Vancouver\u2019s Road to Ride-Hailing. Undergraduate Thesis. (p. 13).<\/small><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!-- #tablepress-36 from cache --><\/p>\n<p>&nbsp;<\/p>\n<table id=\"tablepress-37\" class=\"tablepress tablepress-id-37 tbody-has-connected-cells\">\n<thead>\n<tr class=\"row-1\">\n<th colspan=\"2\" class=\"column-1\"><b>Table 10.6 - R Functions<\/b><\/th>\n<\/tr>\n<\/thead>\n<tbody class=\"row-striping\">\n<tr class=\"row-2\">\n<td class=\"column-1\"><b>Term<\/b><\/td>\n<td class=\"column-2\"><b>Definition<\/b><\/td>\n<\/tr>\n<tr class=\"row-3\">\n<td class=\"column-1\">Getting Started<\/td>\n<td class=\"column-2\">Basic structure: ggplot(mpg, aes(x = displ, y = hwy)) +<\/p>\n<p>Add a layer: geom_point()<\/p>\n<p>You can add colour to the last component. 
For example: ggplot(mpg, aes(x = displ, y = hwy, colour = class)).<\/p>\n<p>Faceting entails splitting the data into subsets and displaying the same graph for each subset.<\/p>\n<p>It is done with the function facet_wrap().<\/td>\n<\/tr>\n<tr class=\"row-4\">\n<td class=\"column-1\">Geom Functions<\/td>\n<td class=\"column-2\">geom_smooth()\u00a0fits a smoother to the data and displays the smooth and its standard error.<\/p>\n<p>geom_boxplot()\u00a0produces a box-and-whisker plot to summarize the distribution of a set of points.<\/p>\n<p>geom_histogram()\u00a0and\u00a0geom_freqpoly()\u00a0show the distribution of continuous variables.<\/p>\n<p>geom_bar()\u00a0shows the distribution of categorical variables.<\/p>\n<p>geom_path()\u00a0and\u00a0geom_line()\u00a0draw lines between the data points. A line plot is constrained to produce lines that travel from left to right, while paths can go in any direction. Lines are typically used to explore how things change over time.<\/td>\n<\/tr>\n<tr class=\"row-5\">\n<td class=\"column-1\">Histograms and Frequency Polygons<\/td>\n<td class=\"column-2\">ggplot(mpg, aes(hwy)) + geom_histogram()<\/p>\n<p>stat_bin() using bins = 30<\/p>\n<p>or ggplot(mpg, aes(hwy)) + geom_freqpoly(binwidth = 2.5)<\/td>\n<\/tr>\n<tr class=\"row-6\">\n<td class=\"column-1\">Bar Charts<\/td>\n<td class=\"column-2\">geom_bar()<\/td>\n<\/tr>\n<tr class=\"row-7\">\n<td class=\"column-1\">Time Series with Line and Path Plots<\/td>\n<td class=\"column-2\">ggplot(economics, aes(date, unemploy \/ pop)) +<\/p>\n<p>geom_line()<\/td>\n<\/tr>\n<tr class=\"row-8\">\n<td colspan=\"2\" class=\"column-1\"><small>Source: Wickham, H. (2016). Getting started with ggplot2.\u00a0ggplot2\u00a0(pp. 11-31). Springer International Publishing.\u00a0https:\/\/doi.org\/10.1007\/978-3-319-24277-4_2<\/small><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><!-- #tablepress-37 from cache --><\/p>\n<h1>References<\/h1>\n<p>Wickham, H. (2016). Getting started with ggplot2. 
ggplot2 (pp. 11-31). Springer International Publishing.\u00a0<a href=\"https:\/\/doi.org\/10.1007\/978-3-319-24277-4_2\">https:\/\/doi.org\/10.1007\/978-3-319-24277-4_2<\/a><\/p>\n","protected":false},"author":1076,"menu_order":7,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-376","chapter","type-chapter","status-publish","hentry"],"part":232,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapters\/376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/wp\/v2\/users\/1076"}],"version-history":[{"count":14,"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapters\/376\/revisions"}],"predecessor-version":[{"id":1799,"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapters\/376\/revisions\/1799"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/parts\/232"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapters\/376\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/wp\/v2\/media?parent=376"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/pressbooks\/v2\/chapter-type?post=376"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/wp\/v2\/contributor?post=376"},{"taxonomy":"license","embeddable":true,"href":"https:\
/\/pressbooks.bccampus.ca\/undergradresearch\/wp-json\/wp\/v2\/license?post=376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}