{"id":57,"date":"2018-10-31T17:00:20","date_gmt":"2018-10-31T21:00:20","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/simplestats\/?post_type=chapter&#038;p=57"},"modified":"2020-09-21T13:37:44","modified_gmt":"2020-09-21T17:37:44","slug":"2-1-data","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/simplestats\/chapter\/2-1-data\/","title":{"raw":"2.1 Data Sets and What Data \"Looks\" Like","rendered":"2.1 Data Sets and What Data &#8220;Looks&#8221; Like"},"content":{"raw":"&nbsp;\r\n\r\nBy now you have learned that <em>variables<\/em> are tools that allow us to measure concepts and to collect information about them. As such they are comprised of information -- information that varies across the <em>units of analysis<\/em> (the 'things' on which we collect information, be it people, organizations, countries, etc.). So far, we have discussed individual variables - but creating and collecting information on a single variable is uncommon. Generally, we collect information on many variables at the same time (which, in turn, allows us to analyze variables together and hypothesize about possible associations between variables).\r\n\r\n&nbsp;\r\n\r\nVariables \"live\" in data sets (or datasets, as I prefer; both usages are common). <strong>A <em>dataset<\/em> is a collection of variables that lists the information (or observations) gathered on them from the units of analysis.<\/strong> As usual, I focus on analysis of people for simplicity's sake (but do keep in mind the units of analysis can be something else.)\r\n\r\n&nbsp;\r\n\r\nThe best way to visualize\u00a0a dataset is as a sort of a table (a.k.a a <em>matrix<\/em>) which summarizes the responses from every individual (in the rows of the table) on the variables in the dataset (in the columns of the table). As such, the size of a dataset depends on two things: the number of variables and the number of individuals supplying information (a.k.a. respondents). 
Typically, datasets vary in size from just a handful of variables and few respondents to hundreds of variables and thousands of respondents. (Huge datasets -- comprising information on millions of people -- exist too; these are known as <em>big data<\/em>. Big data is not analyzed in the conventional ways regular datasets are, so from now on we'll leave big data aside as it's not the subject of this book.)\r\n\r\n&nbsp;\r\n\r\nTo start small, imagine you have just four friends at your university and you decide to list some items of information about them (say, maybe you want to compare your standing at the university with theirs, and to see differences and commonalities between you and them). You could do that in a sentence form, for example, thus: Arjun, who is twenty years old, speaks Punjabi at home and is a first year student in the Business School, has a job and his GPA is 3.6. Benjamin, on the other hand, who is 25, speaks German at home and is a third year Science student, also has a job but his GPA is lower than Arjun's at 3.2. Cecilia, who speaks Spanish at home and is a fourth year Health Sciences student doesn't have a paying job and her GPA is the highest of your friends, 4.0. Finally, Xingxing is also a first year student and is employed like Arjun but she is an Arts major, speaks Mandarin at home, and her GPA is 3.3.\r\n\r\n&nbsp;\r\n\r\nIndeed, you might do that but the points of comparison might get lost as they are not easy to see: one has to read very carefully to keep track of who does what and has a GPA of how much. 
Instead, you could present the same information in a table, as in Example 2.1 (A) below.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 2.1 (A)\u00a0 A Hypothetical Dataset of Four Friends' Characteristics<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<table style=\"border-collapse: collapse;width: 100%;height: 75px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 14.2857%;height: 15px\"><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Age<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Year at university<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Employment<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>GPA<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Major (by Faculty)<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Language spoken at home<\/strong><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 14.2857%;height: 15px\"><strong>Arjun<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">20<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">1<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">yes<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.6<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Business<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Punjabi<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 14.2857%;height: 15px\"><strong>Benjamin<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">25<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: 
center\">3<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">yes<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.2<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Science<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">German<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 14.2857%;height: 15px\"><strong>Cecilia<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">22<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">4<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">no<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">4.0<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Health<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Spanish<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 14.2857%;height: 15px\"><strong>Xingxing<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">19<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">1<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">yes<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.3<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Arts<\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Mandarin<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nIf you do that, what you have created is a dataset. 
Now imagine that instead of this contrived combination of four friends and their varying characteristics, I generalize the example like so:\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 2.1 (B) A Hypothetical Dataset of Four Individuals and Six Variables<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<table style=\"border-collapse: collapse;width: 100%;height: 75px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 16.6936%;height: 15px\"><\/td>\r\n<td style=\"width: 11.8778%;height: 15px;text-align: center\"><strong>Variable 1<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 2<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 3<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 4<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 5<\/strong><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 6<\/strong><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #1<\/strong><\/td>\r\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.1<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.1<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.1<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.1<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.1<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.1<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent 
#2<\/strong><\/td>\r\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.2<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.2<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.2<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.2<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.2<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.2<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #3<\/strong><\/td>\r\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.3<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.3<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.3<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.3<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.3<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.3<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #4<\/strong><\/td>\r\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.4<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.4<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.4<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.4<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.4<\/sub><\/td>\r\n<td style=\"width: 14.2857%;height: 15px;text-align: 
center\">Response<sub>6.4<\/sub><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nIn Example 2.1 (B), the respondents are the four people about whom we have information, and their varying characteristics are represented by the six variables. This, however, seems rather cumbersome. Instead of \"Variable 3\", \"Respondent #4\", \"Response<sub>4.3<\/sub>\", etc., a simpler way to represent all of these in general terms is through mathematical notation.[footnote]A note on mathematical notation, about which, I know, many students feel quite anxious: think of notation as a type of shorthand, or a sort of simplified foreign language. It's used to condense what you could write out in words and sentences but which would be too long and not as clear. The key to notation, just like with any foreign language, is to know what the symbols mean. Keep their meaning in mind, and you can read notation as fast and as easily as your own language.[\/footnote]\r\n\r\n&nbsp;\r\n\r\nSo, prepare yourselves! 
Here comes notation:\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 2.1 (C) A Hypothetical Dataset of Four Individuals and Six Variables 2.0<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<table style=\"border-collapse: collapse;width: 74.2857%\" border=\"0\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 14.2857%\"><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>1<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>2<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>3<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>4<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>5<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>6<\/sub><\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 14.2857%\"><strong>I<sub>1<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>11<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>21<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>31<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>41<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>51<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>61<\/sub><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 14.2857%\"><strong>I<sub>2<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>12<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>22<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>32<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>42<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: 
center\">x<sub>52<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>62<\/sub><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 14.2857%\"><strong>I<sub>3<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>13<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>23<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>33<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>43<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>53<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>63<\/sub><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 14.2857%\"><strong>I<sub>4<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>14<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>24<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>34<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>44<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>54<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center\">x<sub>64<\/sub><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nIn Example 2.1 (C), <em>I<sub>1<\/sub>, I<sub>2<\/sub>, I<sub>3<\/sub><\/em>, and <em>I<sub>4<\/sub><\/em>\u00a0are the four individuals; <em>X<sub>1<\/sub>, X<sub>2<\/sub>, X<sub>3<\/sub>, X<sub>4<\/sub>, X<sub>5<\/sub><\/em>, and <em>X<sub>6<\/sub><\/em> are the six variables; and <em>x<sub>11<\/sub>, x<sub>12<\/sub><\/em>, etc. stand for any specific characteristic\/response a respondent has on a variable. More specifically, <em>x<sub>53<\/sub><\/em>, for example, is the characteristic that Respondent #3 has on Variable 5. 
Scrolling up to Example 2.1 (A) will allow you to see that <em>x<sub>53<\/sub><\/em> is <em>Health<\/em>, which is Cecilia's Major by Faculty.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Do It!<\/em> <em>2.1 Reading Points of Information\u00a0<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nIn a similar vein, look up <em>x<sub>22<\/sub>, x<sub>34<\/sub>,<\/em> and <em>x<sub>61<\/sub><\/em>. It's a simple and easy task but it will help you connect notation to what it stands for, and to understand the logic underlying the way information is presented in datasets.\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nFrom here, it's not difficult to extrapolate the specific dataset we had above to a general one. Thus, Example 2.1 (D) below presents a template of a typical dataset.\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 2.1 (D) A Hypothetical Dataset of N Individuals and K Variables<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<table style=\"border-collapse: collapse;width: 100%;height: 146px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>1<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>2<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>3<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>4<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>5<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>6<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 
15px\"><strong>X<sub>7<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>...<\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>K<\/sub><\/strong><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>1<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>11<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>21<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>31<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>41<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>51<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>61<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>71<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k1<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>2<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>12<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>22<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>32<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>42<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>52<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>62<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>72<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k2<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 
10%;height: 15px\"><strong>I<sub>3<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>13<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>23<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>33<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>43<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>53<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>63<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>73<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k3<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>4<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>14<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>24<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>34<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>44<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>54<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>64<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>74<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k4<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 11px\"><strong>I<sub>5<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>15<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>25<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 
11px\">x<sub>35<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>45<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>55<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>65<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>75<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>k5<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>6<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>16<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>26<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>36<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>46<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>56<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>66<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>76<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k6<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>7<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>17<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>27<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>37<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>47<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>57<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>67<\/sub><\/td>\r\n<td style=\"width: 
10%;text-align: center;height: 15px\">x<sub>77<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k7<\/sub><\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>...<\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<\/tr>\r\n<tr style=\"height: 15px\">\r\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>N<\/sub><\/strong><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>1n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>2n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>3n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>4n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>5n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>6n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>7n<\/sub><\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">...<\/td>\r\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>kn<\/sub><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<em>N\u00a0<\/em>= number of elements in the dataset\r\n\r\n<em>K\u00a0<\/em>= number of variables in the dataset\r\n\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\nIn 
the table above, you may think of <em>N<\/em> as the number of the last row in the table, i.e., of the last individual for whom we have information, and of <em>K<\/em> as the number of the last column in the table, i.e., of the last variable we have in the dataset. Both can theoretically be any positive whole number, though in practice the former is usually a number up to several thousand and the latter a number up to a few hundred. The ellipses in the next-to-last row and the next-to-last column indicate that the table is truncated: there are omitted rows between the seventh and the last individuals (i.e., between <em>I<sub>7<\/sub><\/em> and <em>I<sub>N<\/sub><\/em>), and omitted columns between the seventh and the last variables (i.e., between <em>X<sub>7<\/sub><\/em> and <em>X<sub>K<\/sub><\/em>). (They obviously have to be omitted so that the table can fit on the page.)\r\n\r\n&nbsp;\r\n\r\nArmed with this knowledge, let's take a look at an excerpt from a real dataset. The following Example 2.1 (E) provides a snapshot of the first ten respondents and first nine variables in the <em>Aboriginal Peoples Survey 2012<\/em>\u00a0<span style=\"text-indent: 37.3333px;font-size: 14pt\">dataset\u00a0<\/span><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">(or <\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">APS 2012\u00a0<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">for short)[footnote]APS 2012 is a Statistics Canada dataset which I will formally introduce in <span style=\"color: #ffff00\"><span style=\"color: #000000;background-color: #ffff00\">Ch. 
XX<\/span>.<\/span>[\/footnote] using software called <\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">IBM\u00ae Statistical Package for the Social Sciences<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">, commonly referred to as SPSS.\u00a0<\/span>\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Example 2.1 (E) A Snapshot of Survey Data (APS 2012)<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nSnapshot of <em>APS 2012<\/em>'s <em>Data View\u00a0<\/em>in SPSS:\r\n\r\n&nbsp;\r\n\r\n<span style=\"font-size: 14px\"><img class=\"wp-image-1465 size-full alignleft\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view.png\" alt=\"\" width=\"874\" height=\"261\" \/><\/span>\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n&nbsp;\r\n\r\n<span style=\"font-size: 1rem;text-indent: 1em\">Snapshot of <em>APS 2012<\/em>'s <em>Variable View<\/em> in SPSS:<\/span>\r\n\r\n&nbsp;\r\n\r\n<img class=\"aligncenter wp-image-1464\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view.png\" alt=\"\" width=\"1033\" height=\"218\" \/>\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em>Do It!<\/em> <em>2.2\u00a0 Understanding How Datasets Are Organized<\/em><\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n&nbsp;\r\n\r\nMake sure you can connect the data snapshots from the example above with your understanding of how datasets are organized. What do the numbers in the first (blue) columns in both images represent? (Hint: this is not a variable!) 
What is listed in the first (blue) row in the top image?\u00a0In the top image, what does 1 stand for in the first white row in column <em>ID_03G<\/em>? How about the 1 in the fifth row in the <em>SEX<\/em> column?\r\n\r\n<\/div>\r\n<sub>Answer:<em> Registered\/Status Indian<\/em> and <em>male<\/em>, respectively.<\/sub>\r\n\r\n&nbsp;\r\n\r\n<\/div>\r\n&nbsp;\r\n\r\nOne thing you might find surprising is that all cell entries (i.e., the observations we have) are listed in a number format. Does that mean that all variables in this particular dataset are interval or ratio? What about any nominal or ordinal variables - do they not exist in this dataset? The answer is <em>no<\/em>\u00a0on both counts: the variable\u00a0<em>SEX<\/em> (i.e., \"<em>Sex of respondent<\/em>\" as stated in <em>Variable View<\/em>) is nominal and the variable\u00a0<em>AGE_YRSG<\/em> (i.e., \"Age group of respondent...\")\u00a0is ordinal because its responses are arranged in rank order.\u00a0\u00a0<strong>However, the dataset cells contain only numbers because statistical software works with data in numerical form.<\/strong>\r\n\r\n&nbsp;\r\n\r\nTo that end, nominal and ordinal variables appear \"in code\" in datasets; i.e., <strong>the categories of nominal and ordinal variables are assigned numerical values as <em>labels<\/em> to represent them<\/strong> in the actual dataset you might be working with. Thus, the numbers in nominal and ordinal variables' columns are not <em>actual numbers<\/em>; they are artificially (and in the case of nominal variables, somewhat arbitrarily) assigned to represent the words contained in the categories in order to make computer-based statistical analysis possible. 
(On the other hand, interval\/ratio variables' categories contain <em>actual numbers.<\/em>\u00a0Of course, the trick then is to learn to differentiate the actual numbers from the code\/number values used as labels <span style=\"text-indent: 18.6667px;font-size: 14pt\">in the cells of a dataset.)<\/span>\r\n\r\n&nbsp;\r\n\r\nTherefore, you should always keep track of the code (see the Watch Out! panel below for tips on <em>Variable View<\/em> in SPSS, which allows you to do that), and remember to refer to the categories by their proper (word-based) names -- not by the artificial numerical values (i.e., code) representing them!\r\n\r\n&nbsp;\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\"><em><strong><span style=\"color: #ff0000\">Watch Out! #2\u00a0<\/span><\/strong>...for\u00a0 Making Hasty Decisions about Variables Based <\/em>Only<em> on Data View or <\/em>Only <em>on Variable View<\/em><\/p>\r\n\r\n<\/header>&nbsp;\r\n\r\nIt's tempting, but you cannot deduce <em>all<\/em> categories of a variable with any certainty just by looking at the snapshot in Example 2.1 (E). You cannot do that even if, instead of a snapshot, you had the real, interactive <em>Data View<\/em> window in SPSS in front of you.\u00a0 Not only might you be unable to scroll through all the data (depending on its size) but, more importantly, some categories might not occur among the respondents at all. (For example, imagine the variable <em>hair colour<\/em>, and say, not one respondent having red hair: then a response \"red\" would not be visible in <em>Data View<\/em>, even if such a category existed in the variable.) For the same reasons you should also not decide a variable's level of measurement based on <em>Data View<\/em> alone. 
(Remember, all data in the cells appears in numerical format, regardless of whether it's an actual number or just a value label\/code!)\r\n\r\n&nbsp;\r\n\r\nTo explore any dataset you might end up working with and all the variables contained therein, you should always examine not only the <em>Data View<\/em> but also the <em>Variable View<\/em> of the dataset (in SPSS you can toggle between Data View and Variable View easily with a click of the mouse). The <em>Variable View<\/em> lists all variables along with some information about them -- including something which <em>looks like<\/em> their level of measurement, called <em>Measure<\/em> (it is not included in the bottom snapshot above).\u00a0 <strong>The <em>Measure<\/em>\u00a0information can be quite misleading for students, so never trust this software-generated conclusion!<\/strong>\r\n\r\n&nbsp;\r\n\r\nInstead, you should always explore <em>both<\/em> <em>Variable View<\/em> and <em>Data View<\/em>. You should note the variables' respective categories (in <em>Variable View<\/em>, where you can click on any cell in the <em>Values<\/em> column for a full category listing) and the type of the observations you have in the cells in the table (in <em>Data View<\/em>). Then -- and <em>only<\/em> then -- reach the appropriate conclusion about the levels of measurement of the variables you have at hand.\r\n\r\n&nbsp;\r\n\r\nWhat should guide your decision about a variable's level of measurement is what you see in the <em>Values<\/em> column in <em>Variable View<\/em>. To repeat, clicking on the respective cell in that column will open up a window displaying the (nominal or ordinal) variable's categories\/values along with the number label representing them in the dataset.\r\n\r\n&nbsp;\r\n\r\nAgain, note that reporting on the variable should be done by using its categories\/values, never by the number label you see in <em>Variable View<\/em> standing in for them! 
This point will become more relevant and less abstract once we start learning what to do with variables, in Chapter 3.\r\n\r\n&nbsp;\r\n\r\n<\/div>","rendered":"<p>&nbsp;<\/p>\n<p>By now you have learned that <em>variables<\/em> are tools that allow us to measure concepts and to collect information about them. As such, they consist of information that varies across the <em>units of analysis<\/em> (the &#8216;things&#8217; on which we collect information, be it people, organizations, countries, etc.). So far, we have discussed individual variables, but creating and collecting information on a single variable is uncommon. Generally, we collect information on many variables at the same time (which, in turn, allows us to analyze variables together and hypothesize about possible associations between variables).<\/p>\n<p>&nbsp;<\/p>\n<p>Variables &#8220;live&#8221; in data sets (or datasets, as I prefer; both usages are common). <strong>A <em>dataset<\/em> is a collection of variables that lists the information (or observations) gathered on them from the units of analysis.<\/strong> As usual, I focus on analysis of people for simplicity&#8217;s sake (but do keep in mind that the units of analysis can be something else).<\/p>\n<p>&nbsp;<\/p>\n<p>The best way to visualize a dataset is as a sort of table (a.k.a. a <em>matrix<\/em>) which summarizes the responses from every individual (in the rows of the table) on the variables in the dataset (in the columns of the table). As such, the size of a dataset depends on two things: the number of variables and the number of individuals supplying information (a.k.a. respondents). Typically, datasets vary in size from just a handful of variables and a few respondents to hundreds of variables and thousands of respondents. (Huge datasets, comprising information on millions of people, exist too; these are known as <em>big data<\/em>. 
Big data is not analyzed in the conventional ways regular datasets are, so from now on we&#8217;ll leave big data aside as it&#8217;s not the subject of this book.)<\/p>\n<p>&nbsp;<\/p>\n<p>To start small, imagine you have just four friends at your university and you decide to list some items of information about them (say, maybe you want to compare your standing at the university with theirs, and to see differences and commonalities between you and them). You could do that in sentence form, for example: Arjun, who is 20 years old, speaks Punjabi at home and is a first-year student in the Business School, has a job and his GPA is 3.6. Benjamin, on the other hand, who is 25, speaks German at home and is a third-year Science student, also has a job but his GPA is lower than Arjun&#8217;s at 3.2. Cecilia, who speaks Spanish at home and is a fourth-year Health Sciences student, doesn&#8217;t have a paying job and her GPA is the highest of your friends, 4.0. Finally, Xingxing is also a first-year student and is employed like Arjun but she is an Arts major, speaks Mandarin at home, and her GPA is 3.3.<\/p>\n<p>&nbsp;<\/p>\n<p>Indeed, you might do that but the points of comparison might get lost as they are not easy to see: one has to read very carefully to keep track of who does what and who has which GPA. 
Instead, you could present the same information in a table, as in Example 2.1 (A) below.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 2.1 (A)\u00a0 A Hypothetical Dataset of Four Friends&#8217; Characteristics<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<table style=\"border-collapse: collapse;width: 100%;height: 75px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 14.2857%;height: 15px\"><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Age<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Year at university<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Employment<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>GPA<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Major (by Faculty)<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Language spoken at home<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 14.2857%;height: 15px\"><strong>Arjun<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">20<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">1<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">yes<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.6<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Business<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Punjabi<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 14.2857%;height: 15px\"><strong>Benjamin<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">25<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: 
center\">yes<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.2<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Science<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">German<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 14.2857%;height: 15px\"><strong>Cecilia<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">22<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">4<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">no<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">4.0<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Health<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Spanish<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 14.2857%;height: 15px\"><strong>Xingxing<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">19<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">1<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">yes<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">3.3<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Arts<\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Mandarin<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>If you do that, what you have created is a dataset. 
Now imagine that instead of this contrived combination of four friends and their varying characteristics, I generalize the example like so:<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 2.1 (B) A Hypothetical Dataset of Four Individuals and Six Variables<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<table style=\"border-collapse: collapse;width: 100%;height: 75px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 16.6936%;height: 15px\"><\/td>\n<td style=\"width: 11.8778%;height: 15px;text-align: center\"><strong>Variable 1<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 2<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 3<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 4<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 5<\/strong><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\"><strong>Variable 6<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #1<\/strong><\/td>\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.1<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.1<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.1<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.1<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.1<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.1<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #2<\/strong><\/td>\n<td style=\"width: 11.8778%;height: 
15px;text-align: center\">Response<sub>1.2<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.2<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.2<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.2<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.2<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.2<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #3<\/strong><\/td>\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.3<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.3<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.3<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.3<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.3<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.3<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 16.6936%;height: 15px\"><strong>Respondent #4<\/strong><\/td>\n<td style=\"width: 11.8778%;height: 15px;text-align: center\">Response<sub>1.4<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>2.4<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>3.4<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>4.4<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>5.4<\/sub><\/td>\n<td style=\"width: 14.2857%;height: 15px;text-align: center\">Response<sub>6.4<\/sub><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In Example 2.1 (B), the respondents are the four people 
on whose varying characteristics we have information, and these are represented by the six variables. This, however, seems rather cumbersome. Instead of writing &#8220;Variable 3&#8221;, &#8220;Respondent #2&#8221;, &#8220;Response<sub>4.3<\/sub>&#8221;, etc., we can represent all of these more simply, and in a generalized way, through mathematical notation.<a class=\"footnote\" title=\"A note on mathematical notation, about which, I know, many students feel quite anxious: think of notation as a type of shorthand, or a sort of simplified foreign language. It's used to simplify what you can write out in words and sentences but would be too long and not as clear. The key to notation, just like with any foreign language, is to know what the symbols mean. Keep their meaning in mind, and you can read notation as fast and as easily as your own language.\" id=\"return-footnote-57-1\" href=\"#footnote-57-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>So, prepare yourselves! 
Here comes notation:<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 2.1 (C) A Hypothetical Dataset of Four Individuals and Six Variables 2.0<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<table style=\"border-collapse: collapse;width: 74.2857%\">\n<tbody>\n<tr>\n<td style=\"width: 14.2857%\"><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>1<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>2<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>3<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>4<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>5<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><strong>X<sub>6<\/sub><\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 14.2857%\"><strong>I<sub>1<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>11<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>21<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>31<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>41<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>51<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>61<\/sub><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 14.2857%\"><strong>I<sub>2<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>12<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>22<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>32<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>42<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>52<\/sub><\/td>\n<td style=\"width: 10%;text-align: 
center\">x<sub>62<\/sub><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 14.2857%\"><strong>I<sub>3<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\"><sub><span style=\"font-size: 14.4px\">x<\/span>13<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>23<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>33<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>43<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>53<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>63<\/sub><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 14.2857%\"><strong>I<sub>4<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>14<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>24<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>34<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>44<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>54<\/sub><\/td>\n<td style=\"width: 10%;text-align: center\">x<sub>64<\/sub><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In Example 2.1 (C), <em>I<sub>1<\/sub>, I<sub>2<\/sub>, I<sub>3<\/sub><\/em>, and <em>I<sub>4<\/sub><\/em>\u00a0are the four individuals; <em>X<sub>1<\/sub>, X<sub>2<\/sub>, X<sub>3<\/sub>, X<sub>4<\/sub>, X<sub>5<\/sub><\/em>, and <em>X<sub>6<\/sub><\/em> are the six variables; and <em>x<sub>11<\/sub>, x<sub>12<\/sub><\/em>, etc. stand for any specific characteristic\/response a respondent has on a variable. More specifically, <em>x<sub>53<\/sub><\/em>, for example, is the characteristic that Respondent #3 has on Variable 5. 
Scrolling up to Example 2.1 (A) will allow you to see that <em>x<sub>53<\/sub><\/em> is <em>Health<\/em>, which is Cecilia&#8217;s Major by Faculty.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Do It!<\/em> <em>2.1 Reading Points of Information\u00a0<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>In a similar vein, look up <em>x<sub>22<\/sub>, x<sub>34<\/sub>,<\/em> and <em>x<sub>61<\/sub><\/em>. It&#8217;s a simple and easy task but it will help you connect notation to what it stands for, and to understand the logic underlying the way information is presented in datasets.<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>From here, it&#8217;s not difficult to extrapolate the specific dataset we had above to a general one. Thus, Example 2.1 (D) below presents a template of a typical dataset.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 2.1 (D) A Hypothetical Dataset of N Individuals and K Variables<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<table style=\"border-collapse: collapse;width: 100%;height: 146px\">\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>1<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>2<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>3<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>4<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>5<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>6<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>7<\/sub><\/strong><\/td>\n<td 
style=\"width: 10%;text-align: center;height: 15px\"><strong>&#8230;<\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\"><strong>X<sub>K<\/sub><\/strong><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>1<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>11<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>21<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>31<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>41<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>51<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>61<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>71<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k1<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>2<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>12<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>22<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>32<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>42<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>52<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>62<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>72<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k2<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>3<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: 
center;height: 15px\">x<sub>13<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>23<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>33<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>43<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>53<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>63<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>73<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k3<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>4<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>14<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>24<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>34<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>44<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>54<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>64<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>74<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k4<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 11px\"><strong>I<sub>5<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>15<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>25<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>35<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>45<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 
11px\">x<sub>55<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>65<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>75<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 11px\">x<sub>k5<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>6<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>16<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>26<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>36<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>46<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>56<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>66<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>76<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>k6<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>7<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>17<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>27<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>37<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>47<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>57<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>67<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>77<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 
15px\">x<sub>k7<\/sub><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>&#8230;<\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"width: 10%;height: 15px\"><strong>I<sub>N<\/sub><\/strong><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>1n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>2n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>3n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>4n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>5n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>6n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>7n<\/sub><\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">&#8230;<\/td>\n<td style=\"width: 10%;text-align: center;height: 15px\">x<sub>kn<\/sub><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>N\u00a0<\/em>= number of elements in the dataset<\/p>\n<p><em>K\u00a0<\/em>= number of variables in the dataset<\/p>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In the table above, you may think of <em>N<\/em> as the last row on the table, i.e., the last individual for whom we have information and you may think of <em>K<\/em> as the last column on the 
table, i.e., the last variable we have in the dataset. Both numbers can theoretically be &#8220;any positive whole number&#8221;, though in practice the former is usually a number up to several thousand and the latter a number up to a few hundred. The ellipses in the next-to-last row and the next-to-last column indicate that the table is truncated: there are omitted rows between the seventh and the last individuals (i.e., between <em>I<sub>7<\/sub><\/em> and <em>I<sub>N<\/sub><\/em>), and omitted columns between the seventh and the last variables (i.e., between <em>X<sub>7<\/sub><\/em> and <em>X<sub>K<\/sub><\/em>). (They obviously have to be omitted so that the table can fit on the page.)<\/p>\n<p>&nbsp;<\/p>\n<p>Armed with this knowledge, let&#8217;s take a look at an excerpt from a real dataset. The following Example 2.1 (E) provides a snapshot of the first ten respondents and first nine variables in the <em>Aboriginal Peoples Survey 2012<\/em>\u00a0<span style=\"text-indent: 37.3333px;font-size: 14pt\">dataset\u00a0<\/span><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">(or <\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">APS 2012\u00a0<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">for short)<a class=\"footnote\" title=\"APS 2012 is a Statistics Canada dataset which I will formally introduce in Ch. 
XX.\" id=\"return-footnote-57-2\" href=\"#footnote-57-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> using a software called <\/span><em style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">IBM\u00ae Statistical Package for the Social Sciences<\/em><span style=\"text-align: initial;text-indent: 2em;font-size: 14pt\">, commonly referred to as SPSS.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Example 2.1 (E) A Snapshot of Survey Data (APS 2012<\/em><em>)<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>Snapshot of <em>APS 2012<\/em>&#8216;s <em>Data View\u00a0<\/em>in SPSS:<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-size: 14px\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-1465 size-full alignleft\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view.png\" alt=\"\" width=\"874\" height=\"261\" srcset=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view.png 874w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view-300x90.png 300w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view-768x229.png 768w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view-65x19.png 65w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view-225x67.png 225w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-data-view-350x105.png 350w\" sizes=\"auto, (max-width: 874px) 100vw, 874px\" \/><\/span><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-size: 1rem;text-indent: 
1em\">Snapshot of <em>APS 2012<\/em>&#8216;s <em>Variable View<\/em> in SPSS:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-1464\" src=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view.png\" alt=\"\" width=\"1033\" height=\"218\" srcset=\"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view.png 1043w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-300x63.png 300w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-768x162.png 768w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-1024x216.png 1024w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-65x14.png 65w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-225x47.png 225w, https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-content\/uploads\/sites\/564\/2019\/08\/data-snapshot-variable-view-350x74.png 350w\" sizes=\"auto, (max-width: 1033px) 100vw, 1033px\" \/><\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em>Do It!<\/em> <em>2.2\u00a0 Understanding How Datasets Are Organized<\/em><\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p>&nbsp;<\/p>\n<p>Make sure you can connect the data snapshots from the example above with your understanding of how datasets are organized. What do the numbers in the first (blue) columns in both images represent? (Hint: this is not a variable!) 
What is listed in the first (blue) row in the top image?\u00a0In the top image, what does 1 stand for in the first white row in column <em>ID_03G<\/em>? How about the 1 in the fifth row in the <em>SEX<\/em> column?<\/p>\n<\/div>\n<p><sub>Answer:<em> Registered\/Status Indian<\/em> and <em>male<\/em>, respectively.<\/sub><\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p>One thing you might find surprising is that all cell entries (i.e., the observations we have) are listed in number format. Does that mean that all variables in this particular dataset are interval or ratio? What about any nominal or ordinal variables &#8211; do they not exist in this dataset? The answer is <em>no<\/em>\u00a0on both counts: the variable\u00a0<em>SEX<\/em> (i.e., &#8220;<em>Sex of respondent<\/em>&#8221; as stated in <em>Variable View<\/em>) is nominal and the variable\u00a0<em>AGE_YRSG<\/em> (i.e., &#8220;Age group of respondent&#8230;&#8221;)\u00a0is ordinal because of the hierarchical arrangement of the responses.\u00a0<strong>However, the dataset cells contain only numbers because statistical software can only analyze numerical data.<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>To that effect, nominal and ordinal variables appear &#8220;in code&#8221; in datasets; i.e., <strong>the categories of nominal and ordinal variables are assigned numerical values as <em>labels<\/em> to represent them<\/strong> in the actual dataset you might be working with. Thus, the numbers in nominal and ordinal variables&#8217; columns are not <em>actual numbers<\/em>;\u00a0they are artificially (and in the case of nominal variables, somewhat arbitrarily) assigned to represent the words contained in the categories in order to make computer-based statistical analysis possible. 
(On the other hand, interval\/ratio variables&#8217; categories contain <em>actual numbers.<\/em>\u00a0Of course, the trick then is to learn to differentiate the actual numbers from the code\/number values used as labels <span style=\"text-indent: 18.6667px;font-size: 14pt\">in the cells of a dataset.)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p>Therefore, you should always keep track of the code (see the Watch Out! panel below for tips on <em>Variable View<\/em> in SPSS, which allows you to do that), and remember to refer to the categories by their proper (word-based) names &#8212; not by the artificial numerical values (i.e., code) representing them!<\/p>\n<p>&nbsp;<\/p>\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\"><em><strong><span style=\"color: #ff0000\">Watch Out! #2\u00a0<\/span><\/strong>&#8230;for\u00a0Making Hasty Decisions about Variables Based <\/em>Only<em> on Data View or <\/em>Only <em>on Variable View<\/em><\/p>\n<\/header>\n<p>&nbsp;<\/p>\n<p>It&#8217;s tempting, but you cannot deduce <em>all<\/em> categories of a variable with any certainty just by looking at the snapshot in Example 2.1 (E). You cannot do that even if, instead of a snapshot, you had the real, interactive <em>Data View<\/em> window in SPSS in front of you.\u00a0Not only might you not be able to scroll through all the data (depending on its size) but, more importantly, not all characteristics might exist among the individuals. (For example, imagine the variable <em>hair colour<\/em>, and suppose that not one respondent has red hair: then a response &#8220;red&#8221; would not be visible in <em>Data View<\/em>, even if such a category existed in the variable.) For the same reasons, you should also not decide a variable&#8217;s level of measurement based on <em>Data View<\/em> alone. 
(Remember, all data in the cells appears in numerical format, regardless of whether it&#8217;s an actual number or just a value label\/code!)<\/p>\n<p>&nbsp;<\/p>\n<p>To explore any dataset you might end up working with and all the variables contained therein, you should always examine not only the <em>Data View<\/em> but the <em>Variable View<\/em> of the dataset as well (in SPSS you can toggle between Data View and Variable View easily with a click of the mouse). The <em>Variable View<\/em> lists all variables along with some information about them &#8212; including something which <em>looks like<\/em> their level of measurement, called <em>Measure<\/em> (it is not included in the bottom snapshot above).\u00a0<strong>The <em>Measure<\/em>\u00a0information can be quite misleading for students, so never trust this software-generated conclusion!<\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>Instead, you should always explore <em>both<\/em> <em>Variable View<\/em> and <em>Data View<\/em>. You should note the variables&#8217; respective categories (in <em>Variable View<\/em>,\u00a0where you can click on any cell in the <em>Values<\/em> column for a full category listing) and the type of the observations you have in the cells in the table (in <em>Data View<\/em>). Then, and <em>only<\/em> then, reach the appropriate conclusion about the levels of measurement of the variables you have at hand.<\/p>\n<p>&nbsp;<\/p>\n<p>What should guide your decision about a variable&#8217;s level of measurement is what you see in the <em>Values<\/em> column in <em>Variable View<\/em>. To repeat, clicking on the respective cell in that column will open up a window displaying the (nominal or ordinal) variable&#8217;s categories\/values along with the number label representing them in the dataset.<\/p>\n<p>&nbsp;<\/p>\n<p>Again, note that reporting on the variable should be done by using its categories\/values, never by the number label you see in <em>Data View<\/em> standing in for them! 
This point will become more relevant and less abstract once we start learning what to do with variables, in Chapter 3.<\/p>\n<p>&nbsp;<\/p>\n<\/div>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-57-1\">A note on mathematical notation, about which, I know, many students feel quite anxious: think of notation as a type of shorthand, or a sort of simplified foreign language. It's used to simplify what you can write out in words and sentences but would be too long and not as clear. The key to notation, just like with any foreign language, is to know what the symbols mean. Keep their meaning in mind, and you can read notation as fast and as easily as your own language. <a href=\"#return-footnote-57-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-57-2\">APS 2012 is a Statistics Canada dataset which I will formally introduce in <span style=\"color: #ffff00\"><span style=\"color: #000000;background-color: #ffff00\">Ch. 
XX<\/span>.<\/span> <a href=\"#return-footnote-57-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":533,"menu_order":1,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-57","chapter","type-chapter","status-publish","hentry"],"part":323,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/57","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/users\/533"}],"version-history":[{"count":25,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/57\/revisions"}],"predecessor-version":[{"id":2174,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/57\/revisions\/2174"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/parts\/323"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapters\/57\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/media?parent=57"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/pressbooks\/v2\/chapter-type?post=57"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/contributor?post=57"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/simplestats\/wp-json\/wp\/v2\/license?post=57"}],"curies":[{"name":"wp","href":"ht
tps:\/\/api.w.org\/{rel}","templated":true}]}}