<?xml version="1.0" encoding="UTF-8" ?>
<!-- This is a WordPress eXtended RSS file generated by WordPress as an export of your site. -->
<!-- It contains information about your site's posts, pages, comments, categories, and other content. -->
<!-- You may use this file to transfer that content from one site to another. -->
<!-- This file is not intended to serve as a complete backup of your site. -->

<!-- To import this information into a WordPress site follow these steps: -->
<!-- 1. Log in to that site as an administrator. -->
<!-- 2. Go to Tools: Import in the WordPress admin panel. -->
<!-- 3. Install the "WordPress" importer from the list. -->
<!-- 4. Activate & Run Importer. -->
<!-- 5. Upload this file using the form provided on that page. -->
<!-- 6. You will first be asked to map the authors in this export file to users -->
<!--    on the site. For each author, you may choose to map to an -->
<!--    existing user on the site or to create a new user. -->
<!-- 7. WordPress will then import each of the posts, pages, comments, categories, etc. -->
<!--    contained in this file into your site. -->

<!-- generator="WordPress/5.0.2" created="2019-09-03 22:50" -->
<rss version="2.0"
	xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:wp="http://wordpress.org/export/1.2/"
>

<channel>
	<title>Simple Stats Tools</title>
	<link>https://pressbooks.bccampus.ca/simplestats</link>
	<description>Open Textbook</description>
	<pubDate>Tue, 03 Sep 2019 22:50:03 +0000</pubDate>
	<language>en-CA</language>
	<wp:wxr_version>1.2</wp:wxr_version>
	<wp:base_site_url>http://pressbooks.bccampus.ca/</wp:base_site_url>
	<wp:base_blog_url>https://pressbooks.bccampus.ca/simplestats</wp:base_blog_url>

	<wp:author><wp:author_id>533</wp:author_id><wp:author_login><![CDATA[mariana]]></wp:author_login><wp:author_email><![CDATA[mariana.gatzeva@kpu.ca]]></wp:author_email><wp:author_display_name><![CDATA[Mariana Gatzeva]]></wp:author_display_name><wp:author_first_name><![CDATA[]]></wp:author_first_name><wp:author_last_name><![CDATA[]]></wp:author_last_name></wp:author>

	<wp:category>
		<wp:term_id>1</wp:term_id>
		<wp:category_nicename><![CDATA[uncategorized]]></wp:category_nicename>
		<wp:category_parent><![CDATA[]]></wp:category_parent>
		<wp:cat_name><![CDATA[Uncategorised]]></wp:cat_name>
	</wp:category>
	<wp:term>
		<wp:term_id><![CDATA[23]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[about-the-author]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[About the Author]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[24]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[about-the-publisher]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[About the Publisher]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[2]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[abstracts]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Abstract]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[3]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[acknowledgements]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Acknowledgements]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[25]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[acknowledgements]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Acknowledgements]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[26]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[afterword]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Afterword]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[56]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[all-rights-reserved]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[All Rights Reserved]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[27]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[appendix]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Appendix]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[28]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[authors-note]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Author's Note]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[29]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[back-of-book-ad]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Back of Book Ad]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[4]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[before-title]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Before Title Page]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[30]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[bibliography]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Bibliography]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[31]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[biographical-note]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Biographical Note]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[50]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY (Attribution)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[53]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by-nc]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY-NC (Attribution NonCommercial)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[55]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by-nc-nd]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY-NC-ND (Attribution NonCommercial NoDerivatives)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[54]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by-nc-sa]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY-NC-SA (Attribution NonCommercial ShareAlike)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[52]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by-nd]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY-ND (Attribution NoDerivatives)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[51]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-by-sa]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC BY-SA (Attribution ShareAlike)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[58]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[cc-zero]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[CC0 (Creative Commons Zero)]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[5]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[chronology-timeline]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Chronology, Timeline]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[32]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[colophon]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Colophon]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[33]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[conclusion]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Conclusion]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[34]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[credits]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Credits]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[6]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[dedication]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Dedication]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[35]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[dedication]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Dedication]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[7]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[disclaimer]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Disclaimer]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[8]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[epigraph]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Epigraph]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[36]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[epilogue]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Epilogue]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[9]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[foreword]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Foreword]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[10]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[genealogy-family-tree]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Genealogy, Family Tree]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[37]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[glossary]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Glossary]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[11]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[image-credits]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Image credits]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[38]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[index]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Index]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[12]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[introduction]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Introduction]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[13]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[list-of-abbreviations]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[List of Abbreviations]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[14]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[list-of-characters]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[List of Characters]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[15]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[list-of-illustrations]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[List of Illustrations]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[16]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[list-of-tables]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[List of Tables]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[57]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[contributor]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[mariana-gatzeva]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Mariana Gatzeva]]></wp:term_name>
		<wp:termmeta>
			<wp:meta_key><![CDATA[contributor_first_name]]></wp:meta_key>
			<wp:meta_value><![CDATA[]]></wp:meta_value>
		</wp:termmeta>
		<wp:termmeta>
			<wp:meta_key><![CDATA[contributor_last_name]]></wp:meta_key>
			<wp:meta_value><![CDATA[]]></wp:meta_value>
		</wp:termmeta>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[17]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[miscellaneous]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Miscellaneous]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[39]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[miscellaneous]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Miscellaneous]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[40]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[notes]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Notes]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[48]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[chapter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[numberless]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Numberless]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[18]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[other-books]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Other Books by Author]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[41]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[other-books]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Other Books by Author]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[42]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[permissions]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Permissions]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[19]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[preface]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Preface]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[20]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[prologue]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Prologue]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[49]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[license]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[public-domain]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Public Domain]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[43]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[reading-group-guide]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Reading Group Guide]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[21]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[recommended-citation]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Recommended citation]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[44]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[resources]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Resources]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[45]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[sources]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Sources]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[47]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[chapter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[standard]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Standard]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[46]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[back-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[suggested-reading]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Suggested Reading]]></wp:term_name>
	</wp:term>
	<wp:term>
		<wp:term_id><![CDATA[22]]></wp:term_id>
		<wp:term_taxonomy><![CDATA[front-matter-type]]></wp:term_taxonomy>
		<wp:term_slug><![CDATA[title-page]]></wp:term_slug>
		<wp:term_parent><![CDATA[]]></wp:term_parent>
		<wp:term_name><![CDATA[Title Page]]></wp:term_name>
	</wp:term>

	<generator>https://wordpress.org/?v=5.0.2</generator>

	<item>
		<title>FR for median, household size GSS 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/fr-for-median-household-size-gss-2016/</link>
		<pubDate>Thu, 31 Jan 2019 23:43:31 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FR-for-median-household-size-GSS-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>381</wp:post_id>
		<wp:post_date><![CDATA[2019-01-31 18:43:31]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-01-31 23:43:31]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[fr-for-median-household-size-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>70</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FR-for-median-household-size-GSS-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/01/FR-for-median-household-size-GSS-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:551;s:6:"height";i:246;s:4:"file";s:49:"2019/01/FR-for-median-household-size-GSS-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:49:"FR-for-median-household-size-GSS-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:49:"FR-for-median-household-size-GSS-2016-300x134.png";s:5:"width";i:300;s:6:"height";i:134;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:47:"FR-for-median-household-size-GSS-2016-65x29.png";s:5:"width";i:65;s:6:"height";i:29;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:49:"FR-for-median-household-size-GSS-2016-225x100.png";s:5:"width";i:225;s:6:"height";i:100;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:49:"FR-for-median-household-size-GSS-2016-350x156.png";s:5:"width";i:350;s:6:"height";i:156;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>FT for Median, household size</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/ft-for-median-household-size/</link>
		<pubDate>Thu, 31 Jan 2019 23:51:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FT-for-Median-household-size.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>382</wp:post_id>
		<wp:post_date><![CDATA[2019-01-31 18:51:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-01-31 23:51:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ft-for-median-household-size]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>70</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FT-for-Median-household-size.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/01/FT-for-Median-household-size.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:551;s:6:"height";i:246;s:4:"file";s:40:"2019/01/FT-for-Median-household-size.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:40:"FT-for-Median-household-size-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:40:"FT-for-Median-household-size-300x134.jpg";s:5:"width";i:300;s:6:"height";i:134;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:38:"FT-for-Median-household-size-65x29.jpg";s:5:"width";i:65;s:6:"height";i:29;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:40:"FT-for-Median-household-size-225x100.jpg";s:5:"width";i:225;s:6:"height";i:100;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:40:"FT-for-Median-household-size-350x156.jpg";s:5:"width";i:350;s:6:"height";i:156;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>FT for Median, household size</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/ft-for-median-household-size-2/</link>
		<pubDate>Fri, 01 Feb 2019 01:49:40 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FT-for-Median-household-size-1.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>402</wp:post_id>
		<wp:post_date><![CDATA[2019-01-31 20:49:40]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-01 01:49:40]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ft-for-median-household-size-2]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>70</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FT-for-Median-household-size-1.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/01/FT-for-Median-household-size-1.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:590;s:6:"height";i:263;s:4:"file";s:42:"2019/01/FT-for-Median-household-size-1.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:42:"FT-for-Median-household-size-1-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:42:"FT-for-Median-household-size-1-300x134.jpg";s:5:"width";i:300;s:6:"height";i:134;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:40:"FT-for-Median-household-size-1-65x29.jpg";s:5:"width";i:65;s:6:"height";i:29;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:42:"FT-for-Median-household-size-1-225x100.jpg";s:5:"width";i:225;s:6:"height";i:100;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:42:"FT-for-Median-household-size-1-350x156.jpg";s:5:"width";i:350;s:6:"height";i:156;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:15:"Mariana Gatzeva";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>test</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/test/</link>
		<pubDate>Fri, 01 Feb 2019 01:55:12 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/test.pdf</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>403</wp:post_id>
		<wp:post_date><![CDATA[2019-01-31 20:55:12]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-01 01:55:12]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[test]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>70</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/test.pdf]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/01/test.pdf]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:1:{s:5:"sizes";a:4:{s:9:"thumbnail";a:4:{s:4:"file";s:20:"test-pdf-116x150.jpg";s:5:"width";i:116;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:20:"test-pdf-232x300.jpg";s:5:"width";i:232;s:6:"height";i:300;s:9:"mime-type";s:10:"image/jpeg";}s:5:"large";a:4:{s:4:"file";s:21:"test-pdf-791x1024.jpg";s:5:"width";i:791;s:6:"height";i:1024;s:9:"mime-type";s:10:"image/jpeg";}s:4:"full";a:4:{s:4:"file";s:12:"test-pdf.jpg";s:5:"width";i:1088;s:6:"height";i:1408;s:9:"mime-type";s:10:"image/jpeg";}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Capture</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/capture/</link>
		<pubDate>Fri, 01 Feb 2019 02:04:34 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/Capture.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>404</wp:post_id>
		<wp:post_date><![CDATA[2019-01-31 21:04:34]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-01 02:04:34]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[capture]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>70</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/Capture-e1548986783519.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/01/Capture-e1548986783519.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:578;s:6:"height";i:269;s:4:"file";s:34:"2019/01/Capture-e1548986783519.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:19:"Capture-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:19:"Capture-300x140.png";s:5:"width";i:300;s:6:"height";i:140;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:17:"Capture-65x30.png";s:5:"width";i:65;s:6:"height";i:30;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:19:"Capture-225x105.png";s:5:"width";i:225;s:6:"height";i:105;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:19:"Capture-350x163.png";s:5:"width";i:350;s:6:"height";i:163;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_backup_sizes]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:2:{s:9:"full-orig";a:3:{s:5:"width";i:578;s:6:"height";i:269;s:4:"file";s:11:"Capture.png";}s:18:"full-1548986783519";a:3:{s:5:"width";i:578;s:6:"height";i:269;s:4:"file";s:26:"Capture-e1548986715185.png";}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>FT mean, consulted mental health</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-4-mean/ft-mean-consulted-mental-health/</link>
		<pubDate>Wed, 06 Feb 2019 22:13:18 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-mean-consulted-mental-health.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>455</wp:post_id>
		<wp:post_date><![CDATA[2019-02-06 17:13:18]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-06 22:13:18]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ft-mean-consulted-mental-health]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>72</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-mean-consulted-mental-health.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/02/FT-mean-consulted-mental-health.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:480;s:6:"height";i:451;s:4:"file";s:43:"2019/02/FT-mean-consulted-mental-health.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:43:"FT-mean-consulted-mental-health-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:43:"FT-mean-consulted-mental-health-300x282.jpg";s:5:"width";i:300;s:6:"height";i:282;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:41:"FT-mean-consulted-mental-health-65x61.jpg";s:5:"width";i:65;s:6:"height";i:61;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:43:"FT-mean-consulted-mental-health-225x211.jpg";s:5:"width";i:225;s:6:"height";i:211;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:43:"FT-mean-consulted-mental-health-350x329.jpg";s:5:"width";i:350;s:6:"height";i:329;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>FT for mean stopped smoking</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-4-mean/ft-for-mean-stopped-smoking/</link>
		<pubDate>Wed, 06 Feb 2019 23:45:19 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-for-mean-stopped-smoking.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>468</wp:post_id>
		<wp:post_date><![CDATA[2019-02-06 18:45:19]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-06 23:45:19]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ft-for-mean-stopped-smoking]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>72</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-for-mean-stopped-smoking.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/02/FT-for-mean-stopped-smoking.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:484;s:6:"height";i:517;s:4:"file";s:39:"2019/02/FT-for-mean-stopped-smoking.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:39:"FT-for-mean-stopped-smoking-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:39:"FT-for-mean-stopped-smoking-281x300.jpg";s:5:"width";i:281;s:6:"height";i:300;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:37:"FT-for-mean-stopped-smoking-65x69.jpg";s:5:"width";i:65;s:6:"height";i:69;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:39:"FT-for-mean-stopped-smoking-225x240.jpg";s:5:"width";i:225;s:6:"height";i:240;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:39:"FT-for-mean-stopped-smoking-350x374.jpg";s:5:"width";i:350;s:6:"height";i:374;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>ruler-1023726_1280</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-4-2/ruler-1023726_1280/</link>
		<pubDate>Fri, 08 Feb 2019 18:52:06 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/ruler-1023726_1280.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>526</wp:post_id>
		<wp:post_date><![CDATA[2019-02-08 13:52:06]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-08 18:52:06]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ruler-1023726_1280]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/ruler-1023726_1280.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/02/ruler-1023726_1280.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:1280;s:6:"height";i:426;s:4:"file";s:30:"2019/02/ruler-1023726_1280.png";s:5:"sizes";a:7:{s:9:"thumbnail";a:4:{s:4:"file";s:30:"ruler-1023726_1280-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:30:"ruler-1023726_1280-300x100.png";s:5:"width";i:300;s:6:"height";i:100;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:30:"ruler-1023726_1280-768x256.png";s:5:"width";i:768;s:6:"height";i:256;s:9:"mime-type";s:9:"image/png";}s:5:"large";a:4:{s:4:"file";s:31:"ruler-1023726_1280-1024x341.png";s:5:"width";i:1024;s:6:"height";i:341;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:28:"ruler-1023726_1280-65x22.png";s:5:"width";i:65;s:6:"height";i:22;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:29:"ruler-1023726_1280-225x75.png";s:5:"width";i:225;s:6:"height";i:75;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:30:"ruler-1023726_1280-350x116.png";s:5:"width";i:350;s:6:"height";i:116;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>World population</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-6/world-population/</link>
		<pubDate>Wed, 27 Feb 2019 22:04:34 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/World-population.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>661</wp:post_id>
		<wp:post_date><![CDATA[2019-02-27 17:04:34]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-27 22:04:34]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[world-population]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/World-population.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/02/World-population.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:755;s:6:"height";i:566;s:4:"file";s:28:"2019/02/World-population.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:28:"World-population-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:28:"World-population-300x225.png";s:5:"width";i:300;s:6:"height";i:225;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:26:"World-population-65x49.png";s:5:"width";i:65;s:6:"height";i:49;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:28:"World-population-225x169.png";s:5:"width";i:225;s:6:"height";i:169;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:28:"World-population-350x262.png";s:5:"width";i:350;s:6:"height";i:262;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 250 1 and 2 se</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-central-limit-theorem/normal-mean-250-1-and-2-se/</link>
		<pubDate>Fri, 08 Mar 2019 02:49:38 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>802</wp:post_id>
		<wp:post_date><![CDATA[2019-03-07 21:49:38]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 02:49:38]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-250-1-and-2-se]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>99</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-250-1-and-2-se.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:276;s:4:"file";s:38:"2019/03/normal-mean-250-1-and-2-se.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:38:"normal-mean-250-1-and-2-se-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:37:"normal-mean-250-1-and-2-se-300x99.jpg";s:5:"width";i:300;s:6:"height";i:99;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:38:"normal-mean-250-1-and-2-se-768x253.jpg";s:5:"width";i:768;s:6:"height";i:253;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:36:"normal-mean-250-1-and-2-se-65x21.jpg";s:5:"width";i:65;s:6:"height";i:21;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:37:"normal-mean-250-1-and-2-se-225x74.jpg";s:5:"width";i:225;s:6:"height";i:74;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:38:"normal-mean-250-1-and-2-se-350x115.jpg";s:5:"width";i:350;s:6:"height";i:115;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 250 1 and 2 se</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-central-limit-theorem/normal-mean-250-1-and-2-se-2/</link>
		<pubDate>Fri, 08 Mar 2019 02:57:58 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se-1.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>804</wp:post_id>
		<wp:post_date><![CDATA[2019-03-07 21:57:58]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 02:57:58]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-250-1-and-2-se-2]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>99</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se-1.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-250-1-and-2-se-1.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:276;s:4:"file";s:40:"2019/03/normal-mean-250-1-and-2-se-1.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-1-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:39:"normal-mean-250-1-and-2-se-1-300x99.jpg";s:5:"width";i:300;s:6:"height";i:99;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-1-768x253.jpg";s:5:"width";i:768;s:6:"height";i:253;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:38:"normal-mean-250-1-and-2-se-1-65x21.jpg";s:5:"width";i:65;s:6:"height";i:21;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:39:"normal-mean-250-1-and-2-se-1-225x74.jpg";s:5:"width";i:225;s:6:"height";i:74;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-1-350x115.jpg";s:5:"width";i:350;s:6:"height";i:115;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 250 1 and 2 se B</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-central-limit-theorem/normal-mean-250-1-and-2-se-b/</link>
		<pubDate>Fri, 08 Mar 2019 03:02:55 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se-B.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>806</wp:post_id>
		<wp:post_date><![CDATA[2019-03-07 22:02:55]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 03:02:55]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-250-1-and-2-se-b]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>99</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se-B.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-250-1-and-2-se-B.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:276;s:4:"file";s:40:"2019/03/normal-mean-250-1-and-2-se-B.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-B-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:39:"normal-mean-250-1-and-2-se-B-300x99.jpg";s:5:"width";i:300;s:6:"height";i:99;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-B-768x253.jpg";s:5:"width";i:768;s:6:"height";i:253;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:38:"normal-mean-250-1-and-2-se-B-65x21.jpg";s:5:"width";i:65;s:6:"height";i:21;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:39:"normal-mean-250-1-and-2-se-B-225x74.jpg";s:5:"width";i:225;s:6:"height";i:74;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:40:"normal-mean-250-1-and-2-se-B-350x115.jpg";s:5:"width";i:350;s:6:"height";i:115;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 500 2se</title>
		<link>https://pressbooks.bccampus.ca/simplestats/normal-mean-500-2se/</link>
		<pubDate>Fri, 08 Mar 2019 18:18:32 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-2se.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>810</wp:post_id>
		<wp:post_date><![CDATA[2019-03-08 13:18:32]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 18:18:32]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-500-2se]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-2se.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-500-2se.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:263;s:4:"file";s:31:"2019/03/normal-mean-500-2se.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:31:"normal-mean-500-2se-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:30:"normal-mean-500-2se-300x94.jpg";s:5:"width";i:300;s:6:"height";i:94;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:31:"normal-mean-500-2se-768x241.jpg";s:5:"width";i:768;s:6:"height";i:241;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:29:"normal-mean-500-2se-65x20.jpg";s:5:"width";i:65;s:6:"height";i:20;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:30:"normal-mean-500-2se-225x71.jpg";s:5:"width";i:225;s:6:"height";i:71;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:31:"normal-mean-500-2se-350x110.jpg";s:5:"width";i:350;s:6:"height";i:110;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 500 2se B</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-central-limit-theorem/normal-mean-500-2se-b/</link>
		<pubDate>Fri, 08 Mar 2019 18:20:28 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-2se-B.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>811</wp:post_id>
		<wp:post_date><![CDATA[2019-03-08 13:20:28]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 18:20:28]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-500-2se-b]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>99</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-2se-B.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-500-2se-B.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:263;s:4:"file";s:33:"2019/03/normal-mean-500-2se-B.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:33:"normal-mean-500-2se-B-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:32:"normal-mean-500-2se-B-300x94.jpg";s:5:"width";i:300;s:6:"height";i:94;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:33:"normal-mean-500-2se-B-768x241.jpg";s:5:"width";i:768;s:6:"height";i:241;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:31:"normal-mean-500-2se-B-65x20.jpg";s:5:"width";i:65;s:6:"height";i:20;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:32:"normal-mean-500-2se-B-225x71.jpg";s:5:"width";i:225;s:6:"height";i:71;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:33:"normal-mean-500-2se-B-350x110.jpg";s:5:"width";i:350;s:6:"height";i:110;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 500 1 and 2se B</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-6-confidence-intervals/normal-mean-500-1-and-2se-b/</link>
		<pubDate>Fri, 08 Mar 2019 21:30:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2se-B.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>826</wp:post_id>
		<wp:post_date><![CDATA[2019-03-08 16:30:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 21:30:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-500-1-and-2se-b]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>101</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2se-B.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-500-1-and-2se-B.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:263;s:4:"file";s:39:"2019/03/normal-mean-500-1-and-2se-B.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:39:"normal-mean-500-1-and-2se-B-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:38:"normal-mean-500-1-and-2se-B-300x94.jpg";s:5:"width";i:300;s:6:"height";i:94;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:39:"normal-mean-500-1-and-2se-B-768x241.jpg";s:5:"width";i:768;s:6:"height";i:241;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:37:"normal-mean-500-1-and-2se-B-65x20.jpg";s:5:"width";i:65;s:6:"height";i:20;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:38:"normal-mean-500-1-and-2se-B-225x71.jpg";s:5:"width";i:225;s:6:"height";i:71;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:39:"normal-mean-500-1-and-2se-B-350x110.jpg";s:5:"width";i:350;s:6:"height";i:110;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 500 1 and 2 and 3se C</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-6-confidence-intervals/normal-mean-500-1-and-2-and-3se-c/</link>
		<pubDate>Fri, 08 Mar 2019 22:48:34 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-C.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>830</wp:post_id>
		<wp:post_date><![CDATA[2019-03-08 17:48:34]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 22:48:34]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-500-1-and-2-and-3se-c]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>101</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-C.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-500-1-and-2-and-3se-C.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:263;s:4:"file";s:45:"2019/03/normal-mean-500-1-and-2-and-3se-C.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-C-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:44:"normal-mean-500-1-and-2-and-3se-C-300x94.jpg";s:5:"width";i:300;s:6:"height";i:94;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-C-768x241.jpg";s:5:"width";i:768;s:6:"height";i:241;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"normal-mean-500-1-and-2-and-3se-C-65x20.jpg";s:5:"width";i:65;s:6:"height";i:20;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:44:"normal-mean-500-1-and-2-and-3se-C-225x71.jpg";s:5:"width";i:225;s:6:"height";i:71;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-C-350x110.jpg";s:5:"width";i:350;s:6:"height";i:110;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal mean 500 1 and 2 and 3se D</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-6-confidence-intervals/normal-mean-500-1-and-2-and-3se-d/</link>
		<pubDate>Fri, 08 Mar 2019 22:48:47 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-D.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>831</wp:post_id>
		<wp:post_date><![CDATA[2019-03-08 17:48:47]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-08 22:48:47]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-mean-500-1-and-2-and-3se-d]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>101</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-D.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-mean-500-1-and-2-and-3se-D.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:839;s:6:"height";i:303;s:4:"file";s:45:"2019/03/normal-mean-500-1-and-2-and-3se-D.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-D-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-D-300x108.jpg";s:5:"width";i:300;s:6:"height";i:108;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-D-768x277.jpg";s:5:"width";i:768;s:6:"height";i:277;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"normal-mean-500-1-and-2-and-3se-D-65x23.jpg";s:5:"width";i:65;s:6:"height";i:23;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:44:"normal-mean-500-1-and-2-and-3se-D-225x81.jpg";s:5:"width";i:225;s:6:"height";i:81;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"normal-mean-500-1-and-2-and-3se-D-350x126.jpg";s:5:"width";i:350;s:6:"height";i:126;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal vs t 1</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-7-the-t-distribution/normal-vs-t-1/</link>
		<pubDate>Wed, 20 Mar 2019 01:38:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-vs-t-1.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>936</wp:post_id>
		<wp:post_date><![CDATA[2019-03-19 21:38:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-20 01:38:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-vs-t-1]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>103</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-vs-t-1.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/normal-vs-t-1.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:871;s:6:"height";i:354;s:4:"file";s:25:"2019/03/normal-vs-t-1.jpg";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:25:"normal-vs-t-1-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:25:"normal-vs-t-1-300x122.jpg";s:5:"width";i:300;s:6:"height";i:122;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:25:"normal-vs-t-1-768x312.jpg";s:5:"width";i:768;s:6:"height";i:312;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:23:"normal-vs-t-1-65x26.jpg";s:5:"width";i:65;s:6:"height";i:26;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:24:"normal-vs-t-1-225x91.jpg";s:5:"width";i:225;s:6:"height";i:91;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:25:"normal-vs-t-1-350x142.jpg";s:5:"width";i:350;s:6:"height";i:142;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>boxplots</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-1-between-a-discrete-and-a-continuous-variable/boxplots/</link>
		<pubDate>Thu, 21 Mar 2019 22:15:38 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/boxplots.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>954</wp:post_id>
		<wp:post_date><![CDATA[2019-03-21 18:15:38]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-21 22:15:38]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[boxplots]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>940</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/boxplots.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/boxplots.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:626;s:6:"height";i:581;s:4:"file";s:20:"2019/03/boxplots.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:20:"boxplots-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:20:"boxplots-300x278.jpg";s:5:"width";i:300;s:6:"height";i:278;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:18:"boxplots-65x60.jpg";s:5:"width";i:65;s:6:"height";i:60;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:20:"boxplots-225x209.jpg";s:5:"width";i:225;s:6:"height";i:209;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:20:"boxplots-350x325.jpg";s:5:"width";i:350;s:6:"height";i:325;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Box_plot_descriptionA</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-1-between-a-discrete-and-a-continuous-variable/box_plot_descriptiona/</link>
		<pubDate>Fri, 22 Mar 2019 01:27:59 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/Box_plot_descriptionA.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>966</wp:post_id>
		<wp:post_date><![CDATA[2019-03-21 21:27:59]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-22 01:27:59]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[box_plot_descriptiona]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>940</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/Box_plot_descriptionA.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/Box_plot_descriptionA.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:437;s:6:"height";i:457;s:4:"file";s:33:"2019/03/Box_plot_descriptionA.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:33:"Box_plot_descriptionA-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:33:"Box_plot_descriptionA-287x300.jpg";s:5:"width";i:287;s:6:"height";i:300;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:31:"Box_plot_descriptionA-65x68.jpg";s:5:"width";i:65;s:6:"height";i:68;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:33:"Box_plot_descriptionA-225x235.jpg";s:5:"width";i:225;s:6:"height";i:235;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:33:"Box_plot_descriptionA-350x366.jpg";s:5:"width";i:350;s:6:"height";i:366;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>crosstab aboriginal gender language</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-2-between-two-discrete-variables/crosstab-aboriginal-gender-language/</link>
		<pubDate>Fri, 22 Mar 2019 20:31:37 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>986</wp:post_id>
		<wp:post_date><![CDATA[2019-03-22 16:31:37]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-22 20:31:37]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[crosstab-aboriginal-gender-language]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>974</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/crosstab-aboriginal-gender-language.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:422;s:6:"height";i:186;s:4:"file";s:47:"2019/03/crosstab-aboriginal-gender-language.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:47:"crosstab-aboriginal-gender-language-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:47:"crosstab-aboriginal-gender-language-300x132.jpg";s:5:"width";i:300;s:6:"height";i:132;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:45:"crosstab-aboriginal-gender-language-65x29.jpg";s:5:"width";i:65;s:6:"height";i:29;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:46:"crosstab-aboriginal-gender-language-225x99.jpg";s:5:"width";i:225;s:6:"height";i:99;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:47:"crosstab-aboriginal-gender-language-350x154.jpg";s:5:"width";i:350;s:6:"height";i:154;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>crosstab aboriginal gender language percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-2-between-two-discrete-variables/crosstab-aboriginal-gender-language-percent/</link>
		<pubDate>Fri, 22 Mar 2019 23:53:25 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language-percent.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>999</wp:post_id>
		<wp:post_date><![CDATA[2019-03-22 19:53:25]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-22 23:53:25]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[crosstab-aboriginal-gender-language-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>974</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language-percent.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/crosstab-aboriginal-gender-language-percent.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:582;s:6:"height";i:260;s:4:"file";s:55:"2019/03/crosstab-aboriginal-gender-language-percent.jpg";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:55:"crosstab-aboriginal-gender-language-percent-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:55:"crosstab-aboriginal-gender-language-percent-300x134.jpg";s:5:"width";i:300;s:6:"height";i:134;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:53:"crosstab-aboriginal-gender-language-percent-65x29.jpg";s:5:"width";i:65;s:6:"height";i:29;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:55:"crosstab-aboriginal-gender-language-percent-225x101.jpg";s:5:"width";i:225;s:6:"height";i:101;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:55:"crosstab-aboriginal-gender-language-percent-350x156.jpg";s:5:"width";i:350;s:6:"height";i:156;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>crosstab marital status health cchs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-2-between-two-discrete-variables/crosstab-marital-status-health-cchs/</link>
		<pubDate>Wed, 27 Mar 2019 21:26:10 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-marital-status-health-cchs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1002</wp:post_id>
		<wp:post_date><![CDATA[2019-03-27 17:26:10]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-27 21:26:10]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[crosstab-marital-status-health-cchs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>974</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-marital-status-health-cchs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/crosstab-marital-status-health-cchs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:761;s:6:"height";i:359;s:4:"file";s:47:"2019/03/crosstab-marital-status-health-cchs.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:47:"crosstab-marital-status-health-cchs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:47:"crosstab-marital-status-health-cchs-300x142.png";s:5:"width";i:300;s:6:"height";i:142;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:45:"crosstab-marital-status-health-cchs-65x31.png";s:5:"width";i:65;s:6:"height";i:31;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:47:"crosstab-marital-status-health-cchs-225x106.png";s:5:"width";i:225;s:6:"height";i:106;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:47:"crosstab-marital-status-health-cchs-350x165.png";s:5:"width";i:350;s:6:"height";i:165;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot students attendance testscore</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterplot-students-attendance-testscore/</link>
		<pubDate>Wed, 27 Mar 2019 23:36:43 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1011</wp:post_id>
		<wp:post_date><![CDATA[2019-03-27 19:36:43]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-27 23:36:43]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-students-attendance-testscore]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterplot-students-attendance-testscore.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:53:"2019/03/scatterplot-students-attendance-testscore.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:53:"scatterplot-students-attendance-testscore-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:53:"scatterplot-students-attendance-testscore-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:51:"scatterplot-students-attendance-testscore-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:53:"scatterplot-students-attendance-testscore-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:53:"scatterplot-students-attendance-testscore-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterdplotstudents attendance social media</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterdplotstudents-attendance-social-media/</link>
		<pubDate>Wed, 27 Mar 2019 23:36:51 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1012</wp:post_id>
		<wp:post_date><![CDATA[2019-03-27 19:36:51]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-27 23:36:51]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterdplotstudents-attendance-social-media]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterdplotstudents-attendance-social-media.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:56:"2019/03/scatterdplotstudents-attendance-social-media.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:56:"scatterdplotstudents-attendance-social-media-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:56:"scatterdplotstudents-attendance-social-media-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:54:"scatterdplotstudents-attendance-social-media-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:56:"scatterdplotstudents-attendance-social-media-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:56:"scatterdplotstudents-attendance-social-media-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot students attendance testscore line</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterplot-students-attendance-testscore-line/</link>
		<pubDate>Thu, 28 Mar 2019 21:03:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore-line.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1016</wp:post_id>
		<wp:post_date><![CDATA[2019-03-28 17:03:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-28 21:03:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-students-attendance-testscore-line]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore-line.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterplot-students-attendance-testscore-line.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:58:"2019/03/scatterplot-students-attendance-testscore-line.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:58:"scatterplot-students-attendance-testscore-line-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:58:"scatterplot-students-attendance-testscore-line-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:56:"scatterplot-students-attendance-testscore-line-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:58:"scatterplot-students-attendance-testscore-line-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:58:"scatterplot-students-attendance-testscore-line-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterdplotstudents attendance social media lineA</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterdplotstudents-attendance-social-media-linea/</link>
		<pubDate>Thu, 28 Mar 2019 21:13:01 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media-lineA.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1019</wp:post_id>
		<wp:post_date><![CDATA[2019-03-28 17:13:01]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-28 21:13:01]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterdplotstudents-attendance-social-media-linea]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media-lineA.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterdplotstudents-attendance-social-media-lineA.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:62:"2019/03/scatterdplotstudents-attendance-social-media-lineA.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:62:"scatterdplotstudents-attendance-social-media-lineA-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:62:"scatterdplotstudents-attendance-social-media-lineA-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:60:"scatterdplotstudents-attendance-social-media-lineA-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:62:"scatterdplotstudents-attendance-social-media-lineA-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:62:"scatterdplotstudents-attendance-social-media-lineA-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot student number test score flat lineA</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterplot-student-number-test-score-flat-linea/</link>
		<pubDate>Thu, 28 Mar 2019 21:49:33 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-flat-lineA.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1022</wp:post_id>
		<wp:post_date><![CDATA[2019-03-28 17:49:33]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-28 21:49:33]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-student-number-test-score-flat-linea]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-flat-lineA.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterplot-student-number-test-score-flat-lineA.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:60:"2019/03/scatterplot-student-number-test-score-flat-lineA.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:60:"scatterplot-student-number-test-score-flat-lineA-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:60:"scatterplot-student-number-test-score-flat-lineA-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:58:"scatterplot-student-number-test-score-flat-lineA-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:60:"scatterplot-student-number-test-score-flat-lineA-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:60:"scatterplot-student-number-test-score-flat-lineA-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot student number test score CUBIC flat line</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterplot-student-number-test-score-cubic-flat-line/</link>
		<pubDate>Thu, 28 Mar 2019 21:49:40 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-CUBIC-flat-line.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1023</wp:post_id>
		<wp:post_date><![CDATA[2019-03-28 17:49:40]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-28 21:49:40]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-student-number-test-score-cubic-flat-line]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-CUBIC-flat-line.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterplot-student-number-test-score-CUBIC-flat-line.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:65:"2019/03/scatterplot-student-number-test-score-CUBIC-flat-line.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:65:"scatterplot-student-number-test-score-CUBIC-flat-line-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:65:"scatterplot-student-number-test-score-CUBIC-flat-line-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:63:"scatterplot-student-number-test-score-CUBIC-flat-line-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:65:"scatterplot-student-number-test-score-CUBIC-flat-line-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:65:"scatterplot-student-number-test-score-CUBIC-flat-line-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>correlation attendance test scores</title>
		<link>https://pressbooks.bccampus.ca/simplestats/correlation-attendance-test-scores/</link>
		<pubDate>Thu, 28 Mar 2019 23:45:05 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/correlation-attendance-test-scores.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1031</wp:post_id>
		<wp:post_date><![CDATA[2019-03-28 19:45:05]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-28 23:45:05]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[correlation-attendance-test-scores]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/correlation-attendance-test-scores.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/correlation-attendance-test-scores.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:378;s:6:"height";i:220;s:4:"file";s:46:"2019/03/correlation-attendance-test-scores.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:46:"correlation-attendance-test-scores-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:46:"correlation-attendance-test-scores-300x175.png";s:5:"width";i:300;s:6:"height";i:175;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:44:"correlation-attendance-test-scores-65x38.png";s:5:"width";i:65;s:6:"height";i:38;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:46:"correlation-attendance-test-scores-225x131.png";s:5:"width";i:225;s:6:"height";i:131;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:46:"correlation-attendance-test-scores-350x204.png";s:5:"width";i:350;s:6:"height";i:204;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot paeduc educ gss line</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/scatterplot-paeduc-educ-gss-line/</link>
		<pubDate>Fri, 29 Mar 2019 19:55:13 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-paeduc-educ-gss-line.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1034</wp:post_id>
		<wp:post_date><![CDATA[2019-03-29 15:55:13]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-29 19:55:13]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-paeduc-educ-gss-line]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-paeduc-educ-gss-line.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/scatterplot-paeduc-educ-gss-line.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:44:"2019/03/scatterplot-paeduc-educ-gss-line.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:44:"scatterplot-paeduc-educ-gss-line-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:44:"scatterplot-paeduc-educ-gss-line-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:42:"scatterplot-paeduc-educ-gss-line-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:44:"scatterplot-paeduc-educ-gss-line-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:44:"scatterplot-paeduc-educ-gss-line-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>correlation paeduc educ gss</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/correlation-paeduc-educ-gss/</link>
		<pubDate>Fri, 29 Mar 2019 19:56:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/correlation-paeduc-educ-gss.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1035</wp:post_id>
		<wp:post_date><![CDATA[2019-03-29 15:56:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-29 19:56:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[correlation-paeduc-educ-gss]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>976</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/correlation-paeduc-educ-gss.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/03/correlation-paeduc-educ-gss.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:492;s:6:"height";i:262;s:4:"file";s:39:"2019/03/correlation-paeduc-educ-gss.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:39:"correlation-paeduc-educ-gss-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:39:"correlation-paeduc-educ-gss-300x160.png";s:5:"width";i:300;s:6:"height";i:160;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:37:"correlation-paeduc-educ-gss-65x35.png";s:5:"width";i:65;s:6:"height";i:35;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:39:"correlation-paeduc-educ-gss-225x120.png";s:5:"width";i:225;s:6:"height";i:120;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:39:"correlation-paeduc-educ-gss-350x186.png";s:5:"width";i:350;s:6:"height";i:186;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>boxplot degree income nhs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/9-1-between-a-discrete-and-a-continuous-variable/boxplot-degree-income-nhs/</link>
		<pubDate>Wed, 10 Apr 2019 19:09:07 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/boxplot-degree-income-nhs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1213</wp:post_id>
		<wp:post_date><![CDATA[2019-04-10 15:09:07]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-10 19:09:07]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[boxplot-degree-income-nhs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1137</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/boxplot-degree-income-nhs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/boxplot-degree-income-nhs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:410;s:4:"file";s:37:"2019/04/boxplot-degree-income-nhs.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:37:"boxplot-degree-income-nhs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:37:"boxplot-degree-income-nhs-300x266.png";s:5:"width";i:300;s:6:"height";i:266;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:35:"boxplot-degree-income-nhs-65x58.png";s:5:"width";i:65;s:6:"height";i:58;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:37:"boxplot-degree-income-nhs-225x200.png";s:5:"width";i:225;s:6:"height";i:200;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:37:"boxplot-degree-income-nhs-350x311.png";s:5:"width";i:350;s:6:"height";i:311;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>crosstab degree and citizenship nhs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/9-2-the-chi-square/crosstab-degree-and-citizenship-nhs/</link>
		<pubDate>Fri, 12 Apr 2019 22:58:01 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/crosstab-degree-and-citizenship-nhs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1262</wp:post_id>
		<wp:post_date><![CDATA[2019-04-12 18:58:01]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-12 22:58:01]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[crosstab-degree-and-citizenship-nhs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>126</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/crosstab-degree-and-citizenship-nhs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/crosstab-degree-and-citizenship-nhs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:737;s:6:"height";i:646;s:4:"file";s:47:"2019/04/crosstab-degree-and-citizenship-nhs.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:47:"crosstab-degree-and-citizenship-nhs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:47:"crosstab-degree-and-citizenship-nhs-300x263.png";s:5:"width";i:300;s:6:"height";i:263;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:45:"crosstab-degree-and-citizenship-nhs-65x57.png";s:5:"width";i:65;s:6:"height";i:57;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:47:"crosstab-degree-and-citizenship-nhs-225x197.png";s:5:"width";i:225;s:6:"height";i:197;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:47:"crosstab-degree-and-citizenship-nhs-350x307.png";s:5:"width";i:350;s:6:"height";i:307;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot class assignment requirements mark</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-basics-of-linear-regression/scatterplot-class-assignment-requirements-mark/</link>
		<pubDate>Tue, 23 Apr 2019 22:14:46 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1324</wp:post_id>
		<wp:post_date><![CDATA[2019-04-23 18:14:46]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-23 22:14:46]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-class-assignment-requirements-mark]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>132</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/scatterplot-class-assignment-requirements-mark.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:58:"2019/04/scatterplot-class-assignment-requirements-mark.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:58:"scatterplot-class-assignment-requirements-mark-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:58:"scatterplot-class-assignment-requirements-mark-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:56:"scatterplot-class-assignment-requirements-mark-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:58:"scatterplot-class-assignment-requirements-mark-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:58:"scatterplot-class-assignment-requirements-mark-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot class assignment requirements markA</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-basics-of-linear-regression/scatterplot-class-assignment-requirements-marka/</link>
		<pubDate>Tue, 23 Apr 2019 22:20:30 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-markA.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1327</wp:post_id>
		<wp:post_date><![CDATA[2019-04-23 18:20:30]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-23 22:20:30]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-class-assignment-requirements-marka]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>132</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-markA.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/scatterplot-class-assignment-requirements-markA.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:59:"2019/04/scatterplot-class-assignment-requirements-markA.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:59:"scatterplot-class-assignment-requirements-markA-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:59:"scatterplot-class-assignment-requirements-markA-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:57:"scatterplot-class-assignment-requirements-markA-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:59:"scatterplot-class-assignment-requirements-markA-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:59:"scatterplot-class-assignment-requirements-markA-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot class assignment requirements markA</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-basics-of-linear-regression/scatterplot-class-assignment-requirements-marka-2/</link>
		<pubDate>Wed, 24 Apr 2019 21:11:33 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-markA-1.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1344</wp:post_id>
		<wp:post_date><![CDATA[2019-04-24 17:11:33]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-24 21:11:33]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-class-assignment-requirements-marka-2]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>132</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-markA-1.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/scatterplot-class-assignment-requirements-markA-1.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:61:"2019/04/scatterplot-class-assignment-requirements-markA-1.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:61:"scatterplot-class-assignment-requirements-markA-1-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:61:"scatterplot-class-assignment-requirements-markA-1-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:59:"scatterplot-class-assignment-requirements-markA-1-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:61:"scatterplot-class-assignment-requirements-markA-1-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:61:"scatterplot-class-assignment-requirements-markA-1-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot class assignment requirements mark with variability</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-1-the-linear-regression-model/scatterplot-class-assignment-requirements-mark-with-variability/</link>
		<pubDate>Wed, 24 Apr 2019 21:11:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1345</wp:post_id>
		<wp:post_date><![CDATA[2019-04-24 17:11:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-24 21:11:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-class-assignment-requirements-mark-with-variability]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>135</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:75:"2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:75:"scatterplot-class-assignment-requirements-mark-with-variability-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:75:"scatterplot-class-assignment-requirements-mark-with-variability-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:73:"scatterplot-class-assignment-requirements-mark-with-variability-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:75:"scatterplot-class-assignment-requirements-mark-with-variability-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:75:"scatterplot-class-assignment-requirements-mark-with-variability-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>regression table educ paeduc</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-2-elements-of-the-linear-regression-model/regression-table-educ-paeduc/</link>
		<pubDate>Thu, 25 Apr 2019 22:50:04 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/regression-table-educ-paeduc.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1370</wp:post_id>
		<wp:post_date><![CDATA[2019-04-25 18:50:04]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-25 22:50:04]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[regression-table-educ-paeduc]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1340</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/regression-table-educ-paeduc.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/regression-table-educ-paeduc.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:812;s:6:"height";i:178;s:4:"file";s:40:"2019/04/regression-table-educ-paeduc.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:40:"regression-table-educ-paeduc-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:39:"regression-table-educ-paeduc-300x66.png";s:5:"width";i:300;s:6:"height";i:66;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:40:"regression-table-educ-paeduc-768x168.png";s:5:"width";i:768;s:6:"height";i:168;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:38:"regression-table-educ-paeduc-65x14.png";s:5:"width";i:65;s:6:"height";i:14;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:39:"regression-table-educ-paeduc-225x49.png";s:5:"width";i:225;s:6:"height";i:49;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:39:"regression-table-educ-paeduc-350x77.png";s:5:"width";i:350;s:6:"height";i:77;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot educ paeduc line</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-2-elements-of-the-linear-regression-model/scatterplot-educ-paeduc-line/</link>
		<pubDate>Thu, 25 Apr 2019 22:58:28 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-educ-paeduc-line.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1371</wp:post_id>
		<wp:post_date><![CDATA[2019-04-25 18:58:28]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-25 22:58:28]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-educ-paeduc-line]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1340</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-educ-paeduc-line.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/04/scatterplot-educ-paeduc-line.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:40:"2019/04/scatterplot-educ-paeduc-line.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:40:"scatterplot-educ-paeduc-line-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:40:"scatterplot-educ-paeduc-line-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:38:"scatterplot-educ-paeduc-line-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:40:"scatterplot-educ-paeduc-line-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:40:"scatterplot-educ-paeduc-line-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>scatterplot attendance scores full</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-2-elements-of-the-linear-regression-model/scatterplot-attendance-scores-full/</link>
		<pubDate>Thu, 02 May 2019 20:06:05 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/scatterplot-attendance-scores-full.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1377</wp:post_id>
		<wp:post_date><![CDATA[2019-05-02 16:06:05]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-05-02 20:06:05]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[scatterplot-attendance-scores-full]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1340</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/scatterplot-attendance-scores-full.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/05/scatterplot-attendance-scores-full.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:46:"2019/05/scatterplot-attendance-scores-full.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:46:"scatterplot-attendance-scores-full-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:46:"scatterplot-attendance-scores-full-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:44:"scatterplot-attendance-scores-full-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:46:"scatterplot-attendance-scores-full-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:46:"scatterplot-attendance-scores-full-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>regression attendance scores full</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-2-elements-of-the-linear-regression-model/regression-attendance-scores-full/</link>
		<pubDate>Thu, 02 May 2019 20:06:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/regression-attendance-scores-full.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1378</wp:post_id>
		<wp:post_date><![CDATA[2019-05-02 16:06:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-05-02 20:06:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[regression-attendance-scores-full]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1340</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/regression-attendance-scores-full.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/05/regression-attendance-scores-full.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:733;s:6:"height";i:163;s:4:"file";s:45:"2019/05/regression-attendance-scores-full.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"regression-attendance-scores-full-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:44:"regression-attendance-scores-full-300x67.png";s:5:"width";i:300;s:6:"height";i:67;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"regression-attendance-scores-full-65x14.png";s:5:"width";i:65;s:6:"height";i:14;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:44:"regression-attendance-scores-full-225x50.png";s:5:"width";i:225;s:6:"height";i:50;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:44:"regression-attendance-scores-full-350x78.png";s:5:"width";i:350;s:6:"height";i:78;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>r2 class attendance scores full</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-3-r-squared/r2-class-attendance-scores-full/</link>
		<pubDate>Thu, 02 May 2019 20:51:09 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-class-attendance-scores-full.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1385</wp:post_id>
		<wp:post_date><![CDATA[2019-05-02 16:51:09]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-05-02 20:51:09]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[r2-class-attendance-scores-full]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>137</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-class-attendance-scores-full.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/05/r2-class-attendance-scores-full.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:395;s:6:"height";i:123;s:4:"file";s:43:"2019/05/r2-class-attendance-scores-full.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:43:"r2-class-attendance-scores-full-150x123.png";s:5:"width";i:150;s:6:"height";i:123;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:42:"r2-class-attendance-scores-full-300x93.png";s:5:"width";i:300;s:6:"height";i:93;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:41:"r2-class-attendance-scores-full-65x20.png";s:5:"width";i:65;s:6:"height";i:20;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:42:"r2-class-attendance-scores-full-225x70.png";s:5:"width";i:225;s:6:"height";i:70;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:43:"r2-class-attendance-scores-full-350x109.png";s:5:"width";i:350;s:6:"height";i:109;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>r2 educ paeduc</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-3-r-squared/r2-educ-paeduc/</link>
		<pubDate>Thu, 02 May 2019 21:32:29 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-educ-paeduc.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1394</wp:post_id>
		<wp:post_date><![CDATA[2019-05-02 17:32:29]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-05-02 21:32:29]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[r2-educ-paeduc]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>137</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-educ-paeduc.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/05/r2-educ-paeduc.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:392;s:6:"height";i:135;s:4:"file";s:26:"2019/05/r2-educ-paeduc.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:26:"r2-educ-paeduc-150x135.png";s:5:"width";i:150;s:6:"height";i:135;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:26:"r2-educ-paeduc-300x103.png";s:5:"width";i:300;s:6:"height";i:103;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:24:"r2-educ-paeduc-65x22.png";s:5:"width";i:65;s:6:"height";i:22;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:25:"r2-educ-paeduc-225x77.png";s:5:"width";i:225;s:6:"height";i:77;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:26:"r2-educ-paeduc-350x121.png";s:5:"width";i:350;s:6:"height";i:121;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>hammer-hand-tools-measuring-tape-175039</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/cover/hammer-hand-tools-measuring-tape-175039/</link>
		<pubDate>Thu, 25 Jul 2019 22:16:33 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/07/hammer-hand-tools-measuring-tape-175039.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1413</wp:post_id>
		<wp:post_date><![CDATA[2019-07-25 18:16:33]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-25 22:16:33]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[hammer-hand-tools-measuring-tape-175039]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>166</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/07/hammer-hand-tools-measuring-tape-175039.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/07/hammer-hand-tools-measuring-tape-175039.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:4608;s:6:"height";i:3456;s:4:"file";s:51:"2019/07/hammer-hand-tools-measuring-tape-175039.jpg";s:5:"sizes";a:7:{s:9:"thumbnail";a:4:{s:4:"file";s:51:"hammer-hand-tools-measuring-tape-175039-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:51:"hammer-hand-tools-measuring-tape-175039-300x225.jpg";s:5:"width";i:300;s:6:"height";i:225;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:51:"hammer-hand-tools-measuring-tape-175039-768x576.jpg";s:5:"width";i:768;s:6:"height";i:576;s:9:"mime-type";s:10:"image/jpeg";}s:5:"large";a:4:{s:4:"file";s:52:"hammer-hand-tools-measuring-tape-175039-1024x768.jpg";s:5:"width";i:1024;s:6:"height";i:768;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:49:"hammer-hand-tools-measuring-tape-175039-65x49.jpg";s:5:"width";i:65;s:6:"height";i:49;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:51:"hammer-hand-tools-measuring-tape-175039-225x169.jpg";s:5:"width";i:225;s:6:"height";i:169;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:51:"hammer-hand-tools-measuring-tape-175039-350x263.jpg";s:5:"width";i:350;s:6:"height";i:263;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>stats tools</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/cover/stats-tools/</link>
		<pubDate>Thu, 25 Jul 2019 22:27:31 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/07/stats-tools.jpg</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1414</wp:post_id>
		<wp:post_date><![CDATA[2019-07-25 18:27:31]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-25 22:27:31]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[stats-tools]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>166</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/07/stats-tools.jpg]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/07/stats-tools.jpg]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:4608;s:6:"height";i:3456;s:4:"file";s:23:"2019/07/stats-tools.jpg";s:5:"sizes";a:7:{s:9:"thumbnail";a:4:{s:4:"file";s:23:"stats-tools-150x150.jpg";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:10:"image/jpeg";}s:6:"medium";a:4:{s:4:"file";s:23:"stats-tools-300x225.jpg";s:5:"width";i:300;s:6:"height";i:225;s:9:"mime-type";s:10:"image/jpeg";}s:12:"medium_large";a:4:{s:4:"file";s:23:"stats-tools-768x576.jpg";s:5:"width";i:768;s:6:"height";i:576;s:9:"mime-type";s:10:"image/jpeg";}s:5:"large";a:4:{s:4:"file";s:24:"stats-tools-1024x768.jpg";s:5:"width";i:1024;s:6:"height";i:768;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_small";a:4:{s:4:"file";s:21:"stats-tools-65x49.jpg";s:5:"width";i:65;s:6:"height";i:49;s:9:"mime-type";s:10:"image/jpeg";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:23:"stats-tools-225x169.jpg";s:5:"width";i:225;s:6:"height";i:169;s:9:"mime-type";s:10:"image/jpeg";}s:14:"pb_cover_large";a:4:{s:4:"file";s:23:"stats-tools-350x263.jpg";s:5:"width";i:350;s:6:"height";i:263;s:9:"mime-type";s:10:"image/jpeg";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:6:"Picasa";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:10:"1564092788";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>data snapshot variable view</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/data-snapshot-variable-view/</link>
		<pubDate>Tue, 06 Aug 2019 22:04:18 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-variable-view.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1464</wp:post_id>
		<wp:post_date><![CDATA[2019-08-06 18:04:18]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-06 22:04:18]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[data-snapshot-variable-view]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>57</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-variable-view.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/data-snapshot-variable-view.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:1043;s:6:"height";i:220;s:4:"file";s:39:"2019/08/data-snapshot-variable-view.png";s:5:"sizes";a:7:{s:9:"thumbnail";a:4:{s:4:"file";s:39:"data-snapshot-variable-view-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:38:"data-snapshot-variable-view-300x63.png";s:5:"width";i:300;s:6:"height";i:63;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:39:"data-snapshot-variable-view-768x162.png";s:5:"width";i:768;s:6:"height";i:162;s:9:"mime-type";s:9:"image/png";}s:5:"large";a:4:{s:4:"file";s:40:"data-snapshot-variable-view-1024x216.png";s:5:"width";i:1024;s:6:"height";i:216;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:37:"data-snapshot-variable-view-65x14.png";s:5:"width";i:65;s:6:"height";i:14;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:38:"data-snapshot-variable-view-225x47.png";s:5:"width";i:225;s:6:"height";i:47;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:38:"data-snapshot-variable-view-350x74.png";s:5:"width";i:350;s:6:"height";i:74;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>data snapshot data view</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/data-snapshot-data-view/</link>
		<pubDate>Tue, 06 Aug 2019 22:04:19 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-data-view.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1465</wp:post_id>
		<wp:post_date><![CDATA[2019-08-06 18:04:19]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-06 22:04:19]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[data-snapshot-data-view]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>57</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-data-view.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/data-snapshot-data-view.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:874;s:6:"height";i:261;s:4:"file";s:35:"2019/08/data-snapshot-data-view.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:35:"data-snapshot-data-view-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:34:"data-snapshot-data-view-300x90.png";s:5:"width";i:300;s:6:"height";i:90;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:35:"data-snapshot-data-view-768x229.png";s:5:"width";i:768;s:6:"height";i:229;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:33:"data-snapshot-data-view-65x19.png";s:5:"width";i:65;s:6:"height";i:19;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:34:"data-snapshot-data-view-225x67.png";s:5:"width";i:225;s:6:"height";i:67;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:35:"data-snapshot-data-view-350x105.png";s:5:"width";i:350;s:6:"height";i:105;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>nominal freq table sex gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-frequency-tables/nominal-freq-table-sex-gss-2016/</link>
		<pubDate>Wed, 07 Aug 2019 00:14:06 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/nominal-freq-table-sex-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1479</wp:post_id>
		<wp:post_date><![CDATA[2019-08-06 20:14:06]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-07 00:14:06]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[nominal-freq-table-sex-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>61</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/nominal-freq-table-sex-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/nominal-freq-table-sex-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:452;s:6:"height";i:143;s:4:"file";s:43:"2019/08/nominal-freq-table-sex-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:43:"nominal-freq-table-sex-gss-2016-150x143.png";s:5:"width";i:150;s:6:"height";i:143;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:42:"nominal-freq-table-sex-gss-2016-300x95.png";s:5:"width";i:300;s:6:"height";i:95;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:41:"nominal-freq-table-sex-gss-2016-65x21.png";s:5:"width";i:65;s:6:"height";i:21;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:42:"nominal-freq-table-sex-gss-2016-225x71.png";s:5:"width";i:225;s:6:"height";i:71;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:43:"nominal-freq-table-sex-gss-2016-350x111.png";s:5:"width";i:350;s:6:"height";i:111;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>ordinal freq table workplace size gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-frequency-tables/ordinal-freq-table-workplace-size-gss-2016/</link>
		<pubDate>Wed, 07 Aug 2019 00:14:10 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ordinal-freq-table-workplace-size-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1480</wp:post_id>
		<wp:post_date><![CDATA[2019-08-06 20:14:10]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-07 00:14:10]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ordinal-freq-table-workplace-size-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>61</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ordinal-freq-table-workplace-size-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/ordinal-freq-table-workplace-size-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:520;s:6:"height";i:297;s:4:"file";s:54:"2019/08/ordinal-freq-table-workplace-size-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:54:"ordinal-freq-table-workplace-size-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:54:"ordinal-freq-table-workplace-size-gss-2016-300x171.png";s:5:"width";i:300;s:6:"height";i:171;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:52:"ordinal-freq-table-workplace-size-gss-2016-65x37.png";s:5:"width";i:65;s:6:"height";i:37;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:54:"ordinal-freq-table-workplace-size-gss-2016-225x129.png";s:5:"width";i:225;s:6:"height";i:129;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:54:"ordinal-freq-table-workplace-size-gss-2016-350x200.png";s:5:"width";i:350;s:6:"height";i:200;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>ratio freq table takeout dishes gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-frequency-tables/ratio-freq-table-takeout-dishes-gss-2016/</link>
		<pubDate>Wed, 07 Aug 2019 00:14:13 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ratio-freq-table-takeout-dishes-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1481</wp:post_id>
		<wp:post_date><![CDATA[2019-08-06 20:14:13]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-07 00:14:13]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ratio-freq-table-takeout-dishes-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>61</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ratio-freq-table-takeout-dishes-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/ratio-freq-table-takeout-dishes-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:484;s:6:"height";i:869;s:4:"file";s:52:"2019/08/ratio-freq-table-takeout-dishes-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:52:"ratio-freq-table-takeout-dishes-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:52:"ratio-freq-table-takeout-dishes-gss-2016-167x300.png";s:5:"width";i:167;s:6:"height";i:300;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:51:"ratio-freq-table-takeout-dishes-gss-2016-65x117.png";s:5:"width";i:65;s:6:"height";i:117;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:52:"ratio-freq-table-takeout-dishes-gss-2016-225x404.png";s:5:"width";i:225;s:6:"height";i:404;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:52:"ratio-freq-table-takeout-dishes-gss-2016-350x628.png";s:5:"width";i:350;s:6:"height";i:628;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>pie chart sex gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-4-graphs/pie-chart-sex-gss-2016/</link>
		<pubDate>Thu, 08 Aug 2019 22:54:56 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-sex-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1512</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 18:54:56]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 22:54:56]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[pie-chart-sex-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>63</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-sex-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/pie-chart-sex-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:34:"2019/08/pie-chart-sex-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:34:"pie-chart-sex-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:34:"pie-chart-sex-gss-2016-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:32:"pie-chart-sex-gss-2016-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:34:"pie-chart-sex-gss-2016-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:34:"pie-chart-sex-gss-2016-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>pie chart marstat gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-4-graphs/pie-chart-marstat-gss-2016/</link>
		<pubDate>Thu, 08 Aug 2019 22:54:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-marstat-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1513</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 18:54:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 22:54:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[pie-chart-marstat-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>63</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-marstat-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/pie-chart-marstat-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:38:"2019/08/pie-chart-marstat-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:38:"pie-chart-marstat-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:38:"pie-chart-marstat-gss-2016-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:36:"pie-chart-marstat-gss-2016-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:38:"pie-chart-marstat-gss-2016-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:38:"pie-chart-marstat-gss-2016-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>bar graph workplace size gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-4-graphs/bar-graph-workplace-size-gss-2016/</link>
		<pubDate>Thu, 08 Aug 2019 23:08:16 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/bar-graph-workplace-size-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1517</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 19:08:16]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 23:08:16]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[bar-graph-workplace-size-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>63</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/bar-graph-workplace-size-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/bar-graph-workplace-size-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:45:"2019/08/bar-graph-workplace-size-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"bar-graph-workplace-size-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:45:"bar-graph-workplace-size-gss-2016-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"bar-graph-workplace-size-gss-2016-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:45:"bar-graph-workplace-size-gss-2016-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"bar-graph-workplace-size-gss-2016-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>histogram takeout dishes gss 2016</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-4-graphs/histogram-takeout-dishes-gss-2016/</link>
		<pubDate>Thu, 08 Aug 2019 23:24:01 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/histogram-takeout-dishes-gss-2016.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1522</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 19:24:01]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 23:24:01]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[histogram-takeout-dishes-gss-2016]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>63</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/histogram-takeout-dishes-gss-2016.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/histogram-takeout-dishes-gss-2016.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:45:"2019/08/histogram-takeout-dishes-gss-2016.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"histogram-takeout-dishes-gss-2016-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:45:"histogram-takeout-dishes-gss-2016-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"histogram-takeout-dishes-gss-2016-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:45:"histogram-takeout-dishes-gss-2016-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"histogram-takeout-dishes-gss-2016-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>right skew number cigarettes cchs zoomed</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-6-outliers/right-skew-number-cigarettes-cchs-zoomed/</link>
		<pubDate>Tue, 13 Aug 2019 22:25:12 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs-zoomed.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1609</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 18:25:12]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-13 22:25:12]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[right-skew-number-cigarettes-cchs-zoomed]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1601</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs-zoomed.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/right-skew-number-cigarettes-cchs-zoomed.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:52:"2019/08/right-skew-number-cigarettes-cchs-zoomed.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:52:"right-skew-number-cigarettes-cchs-zoomed-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:52:"right-skew-number-cigarettes-cchs-zoomed-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:50:"right-skew-number-cigarettes-cchs-zoomed-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:52:"right-skew-number-cigarettes-cchs-zoomed-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:52:"right-skew-number-cigarettes-cchs-zoomed-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>right skew number cigarettes cchs1</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-6-outliers/right-skew-number-cigarettes-cchs1/</link>
		<pubDate>Tue, 13 Aug 2019 22:28:18 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs1.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1610</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 18:28:18]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-13 22:28:18]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[right-skew-number-cigarettes-cchs1]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1601</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs1.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/right-skew-number-cigarettes-cchs1.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:46:"2019/08/right-skew-number-cigarettes-cchs1.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:46:"right-skew-number-cigarettes-cchs1-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:46:"right-skew-number-cigarettes-cchs1-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:44:"right-skew-number-cigarettes-cchs1-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:46:"right-skew-number-cigarettes-cchs1-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:46:"right-skew-number-cigarettes-cchs1-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>range iqr freq table smokers cchs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-2-interquartile-range/range-iqr-freq-table-smokers-cchs/</link>
		<pubDate>Wed, 14 Aug 2019 01:14:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/range-iqr-freq-table-smokers-cchs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1635</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 21:14:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-14 01:14:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[range-iqr-freq-table-smokers-cchs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1621</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/range-iqr-freq-table-smokers-cchs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/range-iqr-freq-table-smokers-cchs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:484;s:6:"height";i:1287;s:4:"file";s:45:"2019/08/range-iqr-freq-table-smokers-cchs.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"range-iqr-freq-table-smokers-cchs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:45:"range-iqr-freq-table-smokers-cchs-113x300.png";s:5:"width";i:113;s:6:"height";i:300;s:9:"mime-type";s:9:"image/png";}s:5:"large";a:4:{s:4:"file";s:46:"range-iqr-freq-table-smokers-cchs-385x1024.png";s:5:"width";i:385;s:6:"height";i:1024;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:44:"range-iqr-freq-table-smokers-cchs-65x173.png";s:5:"width";i:65;s:6:"height";i:173;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:45:"range-iqr-freq-table-smokers-cchs-225x598.png";s:5:"width";i:225;s:6:"height";i:598;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"range-iqr-freq-table-smokers-cchs-350x931.png";s:5:"width";i:350;s:6:"height";i:931;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal curve bmi cchs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-the-normal-distribution/normal-curve-bmi-cchs/</link>
		<pubDate>Fri, 16 Aug 2019 19:05:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-bmi-cchs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1679</wp:post_id>
		<wp:post_date><![CDATA[2019-08-16 15:05:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-16 19:05:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-curve-bmi-cchs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>767</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-bmi-cchs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-curve-bmi-cchs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:33:"2019/08/normal-curve-bmi-cchs.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:33:"normal-curve-bmi-cchs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:33:"normal-curve-bmi-cchs-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:31:"normal-curve-bmi-cchs-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:33:"normal-curve-bmi-cchs-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:33:"normal-curve-bmi-cchs-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal curve weight cchs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-the-normal-distribution/normal-curve-weight-cchs/</link>
		<pubDate>Fri, 16 Aug 2019 19:24:46 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-weight-cchs.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1680</wp:post_id>
		<wp:post_date><![CDATA[2019-08-16 15:24:46]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-16 19:24:46]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-curve-weight-cchs]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>767</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-weight-cchs.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-curve-weight-cchs.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:462;s:6:"height";i:370;s:4:"file";s:36:"2019/08/normal-curve-weight-cchs.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:36:"normal-curve-weight-cchs-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:36:"normal-curve-weight-cchs-300x240.png";s:5:"width";i:300;s:6:"height";i:240;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:34:"normal-curve-weight-cchs-65x52.png";s:5:"width";i:65;s:6:"height";i:52;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:36:"normal-curve-weight-cchs-225x180.png";s:5:"width";i:225;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:36:"normal-curve-weight-cchs-350x280.png";s:5:"width";i:350;s:6:"height";i:280;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Appendix</title>
		<link>https://pressbooks.bccampus.ca/simplestats/back-matter/appendix/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/2018/10/31/appendix/</guid>
		<description></description>
		<content:encoded><![CDATA[This is where you can add appendices or other back matter.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>6</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[open]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[appendix]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[back-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<category domain="back-matter-type" nicename="appendix"><![CDATA[Appendix]]></category>
	</item>
	<item>
		<title>References</title>
		<link>https://pressbooks.bccampus.ca/simplestats/back-matter/references/</link>
		<pubDate>Wed, 31 Oct 2018 18:38:52 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=back-matter&#038;p=40</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>40</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:38:52]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:38:52]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[references]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[back-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal with standard deviation</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-1-properties-of-the-normal-curve/normal-with-standard-deviation/</link>
		<pubDate>Tue, 20 Aug 2019 01:17:05 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-with-standard-deviation.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1705</wp:post_id>
		<wp:post_date><![CDATA[2019-08-19 21:17:05]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-20 01:17:05]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-with-standard-deviation]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1694</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-with-standard-deviation.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-with-standard-deviation.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:870;s:6:"height";i:401;s:4:"file";s:42:"2019/08/normal-with-standard-deviation.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:42:"normal-with-standard-deviation-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:42:"normal-with-standard-deviation-300x138.png";s:5:"width";i:300;s:6:"height";i:138;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:42:"normal-with-standard-deviation-768x354.png";s:5:"width";i:768;s:6:"height";i:354;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:40:"normal-with-standard-deviation-65x30.png";s:5:"width";i:65;s:6:"height";i:30;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:42:"normal-with-standard-deviation-225x104.png";s:5:"width";i:225;s:6:"height";i:104;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:42:"normal-with-standard-deviation-350x161.png";s:5:"width";i:350;s:6:"height";i:161;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal test scores 68percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-1-properties-of-the-normal-curve/normal-test-scores-68percent/</link>
		<pubDate>Tue, 20 Aug 2019 01:48:48 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-68percent-.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1707</wp:post_id>
		<wp:post_date><![CDATA[2019-08-19 21:48:48]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-20 01:48:48]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-test-scores-68percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1694</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-68percent-.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-test-scores-68percent-.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:870;s:6:"height";i:401;s:4:"file";s:41:"2019/08/normal-test-scores-68percent-.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:41:"normal-test-scores-68percent--150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:41:"normal-test-scores-68percent--300x138.png";s:5:"width";i:300;s:6:"height";i:138;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:41:"normal-test-scores-68percent--768x354.png";s:5:"width";i:768;s:6:"height";i:354;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:39:"normal-test-scores-68percent--65x30.png";s:5:"width";i:65;s:6:"height";i:30;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:41:"normal-test-scores-68percent--225x104.png";s:5:"width";i:225;s:6:"height";i:104;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:41:"normal-test-scores-68percent--350x161.png";s:5:"width";i:350;s:6:"height";i:161;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal test scores 95percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-1-properties-of-the-normal-curve/normal-test-scores-95percent/</link>
		<pubDate>Tue, 20 Aug 2019 01:48:54 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-95percent-.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1708</wp:post_id>
		<wp:post_date><![CDATA[2019-08-19 21:48:54]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-20 01:48:54]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-test-scores-95percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1694</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-95percent-.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-test-scores-95percent-.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:41:"2019/08/normal-test-scores-95percent-.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:41:"normal-test-scores-95percent--150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:41:"normal-test-scores-95percent--300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:41:"normal-test-scores-95percent--768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:39:"normal-test-scores-95percent--65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:41:"normal-test-scores-95percent--225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:41:"normal-test-scores-95percent--350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal test scores 99percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-1-properties-of-the-normal-curve/normal-test-scores-99percent/</link>
		<pubDate>Tue, 20 Aug 2019 01:49:01 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-99percent-.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1709</wp:post_id>
		<wp:post_date><![CDATA[2019-08-19 21:49:01]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-20 01:49:01]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-test-scores-99percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1694</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-99percent-.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-test-scores-99percent-.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:41:"2019/08/normal-test-scores-99percent-.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:41:"normal-test-scores-99percent--150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:41:"normal-test-scores-99percent--300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:41:"normal-test-scores-99percent--768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:39:"normal-test-scores-99percent--65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:41:"normal-test-scores-99percent--225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:41:"normal-test-scores-99percent--350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal wrong percentiles</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-3-percentiles/normal-wrong-percentiles/</link>
		<pubDate>Wed, 21 Aug 2019 22:22:43 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-wrong-percentiles.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1749</wp:post_id>
		<wp:post_date><![CDATA[2019-08-21 18:22:43]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-21 22:22:43]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-wrong-percentiles]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>150</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-wrong-percentiles.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-wrong-percentiles.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:36:"2019/08/normal-wrong-percentiles.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:36:"normal-wrong-percentiles-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:36:"normal-wrong-percentiles-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:36:"normal-wrong-percentiles-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:34:"normal-wrong-percentiles-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:36:"normal-wrong-percentiles-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:36:"normal-wrong-percentiles-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>freq table social class gss 2016 probabilities</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-3-probabilities-with-frequency-tables/freq-table-social-class-gss-2016-probabilities/</link>
		<pubDate>Fri, 23 Aug 2019 20:31:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-social-class-gss-2016-probabilities.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1842</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 16:31:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 20:31:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[freq-table-social-class-gss-2016-probabilities]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1837</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-social-class-gss-2016-probabilities.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/freq-table-social-class-gss-2016-probabilities.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:533;s:6:"height";i:319;s:4:"file";s:58:"2019/08/freq-table-social-class-gss-2016-probabilities.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:58:"freq-table-social-class-gss-2016-probabilities-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:58:"freq-table-social-class-gss-2016-probabilities-300x180.png";s:5:"width";i:300;s:6:"height";i:180;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:56:"freq-table-social-class-gss-2016-probabilities-65x39.png";s:5:"width";i:65;s:6:"height";i:39;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:58:"freq-table-social-class-gss-2016-probabilities-225x135.png";s:5:"width";i:225;s:6:"height";i:135;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:58:"freq-table-social-class-gss-2016-probabilities-350x209.png";s:5:"width";i:350;s:6:"height";i:209;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>freq table marstat gss2016 probabilities</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-3-probabilities-with-frequency-tables/freq-table-marstat-gss2016-probabilities/</link>
		<pubDate>Fri, 23 Aug 2019 21:20:48 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-marstat-gss2016-probabilities.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1857</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 17:20:48]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 21:20:48]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[freq-table-marstat-gss2016-probabilities]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1837</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-marstat-gss2016-probabilities.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/freq-table-marstat-gss2016-probabilities.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:527;s:6:"height";i:231;s:4:"file";s:52:"2019/08/freq-table-marstat-gss2016-probabilities.png";s:5:"sizes";a:5:{s:9:"thumbnail";a:4:{s:4:"file";s:52:"freq-table-marstat-gss2016-probabilities-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:52:"freq-table-marstat-gss2016-probabilities-300x131.png";s:5:"width";i:300;s:6:"height";i:131;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:50:"freq-table-marstat-gss2016-probabilities-65x28.png";s:5:"width";i:65;s:6:"height";i:28;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:51:"freq-table-marstat-gss2016-probabilities-225x99.png";s:5:"width";i:225;s:6:"height";i:99;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:52:"freq-table-marstat-gss2016-probabilities-350x153.png";s:5:"width";i:350;s:6:"height";i:153;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal 100 percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/normal-100-percent/</link>
		<pubDate>Fri, 23 Aug 2019 22:59:30 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-100-percent.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1868</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 18:59:30]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 22:59:30]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-100-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1759</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-100-percent.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-100-percent.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:30:"2019/08/normal-100-percent.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:30:"normal-100-percent-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:30:"normal-100-percent-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:30:"normal-100-percent-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:28:"normal-100-percent-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:30:"normal-100-percent-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:30:"normal-100-percent-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal 50 50 percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/normal-50-50-percent/</link>
		<pubDate>Fri, 23 Aug 2019 22:59:37 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-50-50-percent.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1869</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 18:59:37]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 22:59:37]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-50-50-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1759</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-50-50-percent.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-50-50-percent.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:32:"2019/08/normal-50-50-percent.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:32:"normal-50-50-percent-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:32:"normal-50-50-percent-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:32:"normal-50-50-percent-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:30:"normal-50-50-percent-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:32:"normal-50-50-percent-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:32:"normal-50-50-percent-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal 68 percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/normal-68-percent/</link>
		<pubDate>Fri, 23 Aug 2019 22:59:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-68-percent.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1870</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 18:59:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 22:59:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-68-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1759</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-68-percent.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-68-percent.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:29:"2019/08/normal-68-percent.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:29:"normal-68-percent-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:29:"normal-68-percent-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:29:"normal-68-percent-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:27:"normal-68-percent-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:29:"normal-68-percent-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:29:"normal-68-percent-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal 95 percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/normal-95-percent/</link>
		<pubDate>Fri, 23 Aug 2019 23:00:02 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-95-percent.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1871</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 19:00:02]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 23:00:02]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-95-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1759</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-95-percent.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-95-percent.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:29:"2019/08/normal-95-percent.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:29:"normal-95-percent-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:29:"normal-95-percent-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:29:"normal-95-percent-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:27:"normal-95-percent-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:29:"normal-95-percent-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:29:"normal-95-percent-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal 99 percent</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/normal-99-percent/</link>
		<pubDate>Fri, 23 Aug 2019 23:00:06 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-99-percent.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1872</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 19:00:06]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 23:00:06]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-99-percent]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1759</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-99-percent.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-99-percent.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:29:"2019/08/normal-99-percent.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:29:"normal-99-percent-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:29:"normal-99-percent-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:29:"normal-99-percent-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:27:"normal-99-percent-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:29:"normal-99-percent-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:29:"normal-99-percent-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example in cm</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example-in-cm/</link>
		<pubDate>Mon, 26 Aug 2019 20:25:47 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1891</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 16:25:47]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 20:25:47]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example-in-cm]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example-in-cm.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:49:"2019/08/normal-hockey-players-z-example-in-cm.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:47:"normal-hockey-players-z-example-in-cm-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example/</link>
		<pubDate>Mon, 26 Aug 2019 20:25:48 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1892</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 16:25:48]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 20:25:48]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:43:"2019/08/normal-hockey-players-z-example.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:41:"normal-hockey-players-z-example-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example in cm 2</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example-in-cm-2/</link>
		<pubDate>Mon, 26 Aug 2019 20:49:38 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-2.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1898</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 16:49:38]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 20:49:38]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example-in-cm-2]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-2.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example-in-cm-2.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:51:"2019/08/normal-hockey-players-z-example-in-cm-2.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-2-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-2-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-2-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-2-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-2-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-2-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example 2</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example-2/</link>
		<pubDate>Mon, 26 Aug 2019 20:51:03 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-2.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1899</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 16:51:03]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 20:51:03]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example-2]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-2.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example-2.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:45:"2019/08/normal-hockey-players-z-example-2.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-2-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-2-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-2-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-2-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-2-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-2-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example in cm 3</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example-in-cm-3/</link>
		<pubDate>Mon, 26 Aug 2019 21:16:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-3.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1903</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 17:16:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 21:16:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example-in-cm-3]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-3.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example-in-cm-3.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:454;s:4:"file";s:51:"2019/08/normal-hockey-players-z-example-in-cm-3.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-3-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-3-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-3-768x388.png";s:5:"width";i:768;s:6:"height";i:388;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:49:"normal-hockey-players-z-example-in-cm-3-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-3-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:51:"normal-hockey-players-z-example-in-cm-3-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>normal hockey players z example 3</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/normal-hockey-players-z-example-3/</link>
		<pubDate>Mon, 26 Aug 2019 21:16:20 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-3.png</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1904</wp:post_id>
		<wp:post_date><![CDATA[2019-08-26 17:16:20]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-26 21:16:20]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[normal-hockey-players-z-example-3]]></wp:post_name>
		<wp:status><![CDATA[inherit]]></wp:status>
		<wp:post_parent>1876</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[attachment]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:attachment_url><![CDATA[https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-3.png]]></wp:attachment_url>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attached_file]]></wp:meta_key>
			<wp:meta_value><![CDATA[2019/08/normal-hockey-players-z-example-3.png]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_attachment_metadata]]></wp:meta_key>
			<wp:meta_value><![CDATA[a:5:{s:5:"width";i:898;s:6:"height";i:455;s:4:"file";s:45:"2019/08/normal-hockey-players-z-example-3.png";s:5:"sizes";a:6:{s:9:"thumbnail";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-3-150x150.png";s:5:"width";i:150;s:6:"height";i:150;s:9:"mime-type";s:9:"image/png";}s:6:"medium";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-3-300x152.png";s:5:"width";i:300;s:6:"height";i:152;s:9:"mime-type";s:9:"image/png";}s:12:"medium_large";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-3-768x389.png";s:5:"width";i:768;s:6:"height";i:389;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_small";a:4:{s:4:"file";s:43:"normal-hockey-players-z-example-3-65x33.png";s:5:"width";i:65;s:6:"height";i:33;s:9:"mime-type";s:9:"image/png";}s:15:"pb_cover_medium";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-3-225x114.png";s:5:"width";i:225;s:6:"height";i:114;s:9:"mime-type";s:9:"image/png";}s:14:"pb_cover_large";a:4:{s:4:"file";s:45:"normal-hockey-players-z-example-3-350x177.png";s:5:"width";i:350;s:6:"height";i:177;s:9:"mime-type";s:9:"image/png";}}s:10:"image_meta";a:12:{s:8:"aperture";s:1:"0";s:6:"credit";s:0:"";s:6:"camera";s:0:"";s:7:"caption";s:0:"";s:17:"created_timestamp";s:1:"0";s:9:"copyright";s:0:"";s:12:"focal_length";s:1:"0";s:3:"iso";s:1:"0";s:13:"shutter_speed";s:1:"0";s:5:"title";s:0:"";s:11:"orientation";s:1:"0";s:8:"keywords";a:0:{}}}]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.1 Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/chapter-1/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/2018/10/31/chapter-1/</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

You can think of a <strong><em>variable</em> as a characteristic that <em>varies</em> across individual elements</strong>. For example, hair colour varies across individuals: black, blonde, brown, red, grey (or practically any colour if we include the wonders of hair dyeing). If we go by other physical characteristics, we can easily see that height, weight, body type, skin colour, age, etc. are all <em>variables</em>.

&nbsp;

Then what about social/economic characteristics like level of education, annual income, occupation, employment, citizenship, marital status, political party affiliation, union membership, participation in sports (to name a few)...? All variables. Or, what about personal opinions and preferences? You might love chocolate a lot but your friend might not care for it; another friend might like it but just a little... Your friend might try to convince you that classical music is great but you might find it terribly boring, preferring rock instead. You might be a dog person and might frequently extol the virtues of dogs in comparison to cats, to the dismay of your cat-loving significant other. You might think that legalizing marijuana in Canada was the right decision but your parents might feel it was a profound mistake on the part of the government. Clearly, opinions and preferences vary, so we can add 'opinion on marijuana legalization', 'liking of chocolate', 'preferred music genre to listen to', and 'favourite pet animal' to our ever-growing list of variables.

&nbsp;

So far, you might decide that variables only apply to <em>people</em>: after all, all the examples mentioned above discuss characteristics that vary across human beings. However, this is absolutely not the case, as we can just as easily see that other things can have varying characteristics. For example, <em>universities</em> can differ in their student enrollment numbers, instructor-to-student ratios, type of degrees awarded, geographical location, source of funding, presence of medical school, percentage of international students, etc. <em>Countries</em> vary on population size, climate, geographical/geopolitical location, language, GDP (gross domestic product), level of human development, presence of minority groups, immigration (and emigration) rates, fertility and mortality rates, access to universal healthcare, average education level, age of majority, freedom of press, type of government... you get the picture. Clearly, variables apply to <em>elements</em> of anything that may be compared on characteristics which vary across these elements (hence the somewhat clumsy definition I started with).

&nbsp;

Researchers refer to <strong><em>units of analysis</em></strong> when they want to specify the elements they study: When we have information about characteristics of <em>people</em>, we say that the unit of analysis is "individual". When instead of people, we study <em>countries</em>, the unit of analysis is "country", and so on.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>5</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[open]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-1]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<category domain="chapter-type" nicename="standard"><![CDATA[Standard]]></category>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.3 Levels of Measurement</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-3-levels-of-measurement/</link>
		<pubDate>Wed, 31 Oct 2018 20:45:17 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=47</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Now that you know there are different ways to operationalize concepts, let me introduce another term with respect to variables: <em>level of measurement</em>. Each and every variable has a level of measurement. Knowing, or being able to identify, the level of measurement of a variable tells us how it has been operationalized and vice versa: knowing how an existing variable has been operationalized gives us information about its level of measurement.

&nbsp;

More importantly, however, knowing and being able to identify <strong>a variable's level of measurement allows us to determine what we can do with that variable in terms of statistical methods and procedures</strong>. This last point is key to doing statistical analysis in a correct and meaningful way. The flip side is also true: misidentifying a variable's level of measurement will inevitably end in erroneous analysis and conclusions (that is, if the analysis can even be performed, as in many cases the statistical software will give an error message).[footnote]The more dangerous - and quite frequent - scenario, however, is when the software will execute the analysis and produce results. In that case, without an error message to warn them, the researchers would trust their analysis and results without realizing both are bogus.[/footnote]

&nbsp;

Why is the level of measurement so important for statistical analysis?

&nbsp;

Simply put, variables are not created equal when it comes to levels of measurement. Due to differences in the nature of the information contained within, you can do very little with some variables in terms of analysis while you can do a whole lot more with others.

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do it!</em> <em>1.2. Measuring Different Types of Variables</em></p>

</header>
<div class="textbox__content">

Imagine you have to analyze the following (individual-level) variables:

a) religious affiliation,

b) educational attainment,

c) exam test scores,

d) age.

&nbsp;

Think of what type of information would be contained within the categories of each of the four variables above. (It might help to imagine the possible answers respondents -- say, university students -- could give if asked questionnaire questions about each.)

&nbsp;

What more (beyond collecting it), if anything, can you do with that information? For example, can you say that one answer is more/bigger than another? Can you identify answers as different or the same as others? Can you do some calculations with the answers?

</div>
</div>
&nbsp;

The exercise above gives you a clue: <strong>there are <em>four </em>levels of measurement</strong>. They are called <em>nominal</em>, <em>ordinal</em>, <em>interval</em>, and <em>ratio</em>. Each and every variable has only one level of measurement once it's operationalized.[footnote]Recall, however, that sometimes -- though not always -- one and the same variable can be operationalized in different ways. These different ways can sometimes be at different levels of measurement, depending on the type of information we want to have.[/footnote] A variable's level of measurement is sometimes also called its measurement <em>scale</em>.

&nbsp;
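To see concretely why this matters, consider what statistical software does with numeric codes. The following is a minimal sketch in Python (used here purely for illustration; this book's software is SPSS, and the codes and values below are invented for the example): the software will happily average nominal codes, but the result is meaningless, whereas the same operation on a ratio-level variable like age is perfectly sensible.

```python
from statistics import mean

# Hypothetical coded survey data: religious affiliation stored as
# arbitrary numeric codes (e.g., 1 = Catholic, 2 = Protestant, 3 = Muslim).
religion_codes = [1, 2, 3, 1]

# Age in years for the same four hypothetical respondents.
ages = [20, 25, 22, 19]

# Software computes both means without complaint...
print(mean(religion_codes))  # 1.75 -- meaningless: religion has no "average"
print(mean(ages))            # 21.5 -- meaningful: age is a ratio-level variable
```

No error message warns us that the first mean is bogus; only knowing the variables' levels of measurement does.

&nbsp;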

The following sub-sections provide details about each measurement scale.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>47</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 16:45:17]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 20:45:17]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-3-levels-of-measurement]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.2 Concepts, Measurement, and Operationalization</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-2-operationalization/</link>
		<pubDate>Wed, 31 Oct 2018 20:47:53 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=49</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

You might be wondering why we even need to introduce a concept such as variables. Can't we simply call them <em>characteristics</em>, if that's what they are? The short answer is that we use the language of variables when we engage in formal research, but the reason is not solely scientific jargon. <em>Variables</em>, as opposed to <em>characteristics</em>, imply measurement.

&nbsp;

You see, sociologists and other social scientists study <em>concepts</em> (i.e., ideas, notions) that are more often than not abstract. If I say "I want to know if the average height of Canadians has changed over time", it's easy for you to suggest that I first collect information about people's heights (perhaps actually measure them, if I don't trust self-reports). By doing that, you might not realize it but what you have done is actually offer <em>a way to measure a concept</em>, which is what we call with the mouthful of a word <strong><em>operationalization</em></strong>. In other words, you have <em>operationalized</em> the abstract concept (height of Canadians) through the actual, physical measurement of individuals' heights (in centimeters or in inches) in real life.

&nbsp;

So operationalization is that easy, right? Unfortunately, no, not really.

&nbsp;

What if, instead of average height of Canadians, I had wanted to study how poverty has changed in Canada over time? Or homelessness? How about income? Or people's attitudes to immigration? Or their religiosity? What about if I wanted to study self-esteem of adolescents? Or social status among Canadian university students? Or bullying in high school?

&nbsp;

I'm sure you have no trouble understanding the concepts as <em>abstract ideas</em> -- but how do you <em>measure</em> them?[footnote]There are various ways one can measure concepts. At the most fundamental level, this depends on whether the chosen method of inquiry (or research) is <em>qualitative</em> or <em>quantitative</em>. We shouldn't reify the boundary between quantitative and qualitative methods, however; many scientists mix the two in a single study with considerable success. Social scientists use statistics predominantly when they have chosen a quantitative method of collecting and analyzing data, so here we'll focus on the quantitative operationalization of concepts.[/footnote]
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do it! 1.1 Measuring Homelessness</em></p>

</header>
<div class="textbox__content">

Imagine you really do want to study the prevalence of homelessness in your city (or any of the abstract concepts mentioned above). Before you decide how to collect information about it, you have to choose what <em>exactly</em> you will be gathering information about. How are you going to define <em>being homeless</em> in order to measure homelessness? In a word, how are you going to <em>operationalize</em> homelessness? Make a list of possible definitions. What are the various aspects of homelessness, which you may choose to consider in your definition or not, that make defining homelessness difficult?

</div>
</div>
&nbsp;

All in all, operationalizing a concept boils down to choosing a working (i.e., operational), <em>measurable</em> definition of a concept within a given study. Most concepts can be (and regularly are) defined differently by different researchers. What matters is that the definition of any concept is provided and is used consistently within each individual study.

&nbsp;

If the <em>Do It!</em> exercise above still seems too abstract, perhaps an easier way to understand the operationalization of concepts into measurable variables with concrete definitions is to imagine a survey question about what you want to study. Sometimes one such question can provide the operationalization/definition of the concept under study. Other times a single question is not enough and a set of questions can help a researcher measure what they want to study.

&nbsp;

Let's say you want to study <em>income</em> (perhaps as a part of a larger study on poverty). You want to ask people about their income but how exactly? Will you be asking about personal or household income? Are there types of income you have in mind -- from salary, from rent, from interest, etc.? Is it weekly, bi-monthly, or annual income? Is it income <em>before</em> or <em>after</em> taxes? For that matter, do you mean only taxable income? Furthermore, what kind of answers would you accept? Will the respondents provide an exact number? Or will you provide a set of multiple-choice answers from which the respondents will choose?

&nbsp;

For example, you can measure income in a hypothetical study (through a survey question) like this: "What is your household's annual after-tax income (from any source)?" This means that you have chosen to operationalize the abstract concept <em>income</em> through the specific, measurable variable <em>annual household after-tax income.</em>

&nbsp;

The types of possible answers you choose to accept for the question are also part of the measurement. Example 1.1. below offers three options to operationalize income.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 1.1. Operationalizing Income</em></p>

</header>
<div class="textbox__content">

Q1. What is your household's annual after-tax income (from any source)?

a) $0 - $50,000;

b) $50,001 - $100,000;

c) $100,001 - $150,000;

d) $150,001 - $200,000;

e) $200,001 or more;

</div>
Q2. Is your annual household after-tax income (from any source) less than $50,000?

a) Yes;

b) No.

&nbsp;

Q3. What is your annual household after-tax income (from any source)?

.... [Any number provided by the respondent will be recorded.]

&nbsp;

</div>
&nbsp;

The multiple choices provided in <em>Q1</em> in the example above can contain any number of categories to choose from. I have chosen to go by 50 thousand dollars to create the categories, but I could have done so by as little as, say, five thousand dollars or as much as 500 thousand dollars (and I would have ended up with a different number of possible answers). If we need the actual dollar amount of the income reported by each respondent, we'll choose to ask <em>Q3</em>.

&nbsp;

Whether and how we choose to create categories depends on the type of answers that will be suitable for our study and what type of information we want. As well, <em>Q2</em> offers only two possible answers, yes or no. If the relevant information for our study is whether household annual income is below or above $50,000 (say, because the average such income has already been established as $50,000), Q2 would be the way to go.

&nbsp;
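The three operationalizations in Example 1.1 record the same underlying answer with different amounts of detail. Here is a minimal sketch in Python (purely illustrative -- the function names are invented, and the category boundaries are those from Q1 and Q2 above) showing how an exact Q3-style answer collapses into the coarser Q1 and Q2 categories:

```python
def q1_category(income):
    """Map an exact dollar amount (a Q3-style answer) to a Q1 category."""
    if income <= 50_000:
        return "a) $0 - $50,000"
    elif income <= 100_000:
        return "b) $50,001 - $100,000"
    elif income <= 150_000:
        return "c) $100,001 - $150,000"
    elif income <= 200_000:
        return "d) $150,001 - $200,000"
    else:
        return "e) $200,001 or more"

def q2_answer(income):
    """Q2 keeps even less information: only whether income is below $50,000."""
    return "Yes" if income < 50_000 else "No"

# The same exact answer yields coarser and coarser information:
print(q1_category(72_500))  # b) $50,001 - $100,000
print(q2_answer(72_500))    # No
```

Moving from Q3 to Q1 to Q2, each operationalization discards detail -- which is exactly why the choice among them depends on what information the study needs.

&nbsp;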

Keep in mind that how a variable is operationalized depends not only on the researcher's goals and needs (and practical considerations like time and money) -- but also on their personal beliefs and preferences, the time period in which they live/d and work/ed, etc. Operationalizing concepts considered controversial at a specific time and place can be quite political and itself become a controversy. Consider the following example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 1.2. Operationalizing Gender</em></p>

</header>
<div class="textbox__content">

It should come as no surprise to anyone studying sociology that how people operationalize gender has changed over time. Until recently, the conventional operationalization went something like this:

</div>
Q1. Are you...?

a) Male

b) Female

&nbsp;

With advances in the study of gender and sexuality, our understanding of gender has changed over time. Nowadays you are far more likely to see an operationalization similar to the following, used by the American Sociological Association when collecting information on its members:

&nbsp;

Q2. What is your gender? Select up to two.
a) Female
b) Male
c) Transgender female/Transgender woman
d) Transgender male/Transgender man
e) Gender queer/Gender non-conforming
f) Different identity (please specify) ......
g) Prefer not to state

&nbsp;

In countries like Canada, using <em>Q1</em> nowadays might be considered too restrictive for many purposes, and even offensive by some. On the other hand, in some countries (like in Eastern Europe) choosing to go with <em>Q2</em> might be seen as quite controversial and as political activism. Even in Canada the switch to a more inclusive gender operationalization is gradual and quite recent. As you will see later in the book, datasets collected in the past typically use a binary operationalization of gender.

&nbsp;

</div>
&nbsp;

Before we continue with measurement in the next section, here is a practical tip when working with SPSS.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 1.1. Exploring How Variables in a Dataset Have Been Operationalized</em></p>

</header>
<div class="textbox__content">

When exploring an existing dataset in SPSS (more on that in Chapter 2), you can see a variable's categories/values in the <em>Values</em> column in <em>Variable View</em>. (You can switch between <em>Data View</em> and <em>Variable View</em> by clicking on their respective tabs at the bottom of your primary data window.) Clicking on a variable's cell in the <em>Values</em> column will open a new window listing all the categories/values through which the variable has been operationalized.

</div>
</div>
&nbsp;

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>49</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 16:47:53]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 20:47:53]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-2-operationalization]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-2-1-operationalization]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.6 Creating Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-6-creating-variables/</link>
		<pubDate>Wed, 31 Oct 2018 20:49:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=53</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

If you ever find yourself needing to create your own variables (perhaps when designing a questionnaire), this brief final note is for you. As well, you can learn to evaluate whether an existing variable has been operationalized properly.

&nbsp;

<strong>To properly create a variable, its categories need to satisfy two requirements: they need to be collectively exhaustive and mutually exclusive.</strong> The first condition, <em>collectively exhaustive</em>, refers to the requirement that the categories cover all possible ways the variable can vary (or all possible answers to a questionnaire question) -- none can be excluded. The second condition, <em>mutually exclusive</em>, adds the logical necessity that a specific variation (or an answer to a questionnaire question) can exist in one and only one category.

&nbsp;

This is simpler than the definition makes it sound. The following example illustrates.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 1.4. Logical Requirements to Operationalizing Variables</em></p>

</header>
<div class="textbox__content">

Imagine you are filling out a questionnaire and one of the questions is about age, like this:

</div>
Q1. What is your age?

a) 20-29

b) 30-39

c) 40-49

d) 50-59

&nbsp;

What if you are 18 or 19? Which answer would you pick? How about if the person filling out the questionnaire is 60 or older? As stated, the Q1 question (i.e., the way the variable <em>age</em> is operationalized by it) violates the first requirement, that of providing an exhaustive list of all possibilities. All possible variations need to be covered by the variable's categories, otherwise the variable is incomplete.

&nbsp;

Now consider another hypothetical way to ask the same questionnaire question:

&nbsp;

Q2. What is your age?

a) 18-25

b) 25-30

c) 30-35

d) 35-40

e) 40-45

f) 45-50

g) 50+

&nbsp;

Assuming the questionnaire is administered only to adults, Q2 provides a collectively exhaustive list of possible answers; the variable's categories now satisfy the first requirement.

&nbsp;

They are, however, misleading as they are not mutually exclusive. Which answer do you pick if you are 25 -- a) or b)? Which answer do you pick if you are 40 -- d) or e)? Logically, one and the same possible variation cannot fall into two or more categories; it can only fall in<em> one</em> of the variable's categories.

&nbsp;

Thus, one proper way to operationalize<em> age</em> is something like this:

&nbsp;

Q3. What is your age?

a) 18-25

b) 26-30

c) 31-35

d) 36-40

e) 41-45

f) 46-50

g) Above 50

&nbsp;

</div>
&nbsp;
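The two requirements from Example 1.4 can even be checked mechanically. Here is a small sketch in Python (illustrative only; the function name is invented, and it assumes the closed age ranges are given as sorted (low, high) pairs, with the open-ended top category such as "50+" handled separately):

```python
def check_age_categories(bins, minimum=18):
    """Return a list of problems with a sorted list of closed (low, high) age ranges."""
    problems = []
    # Collectively exhaustive: the first category must start at the minimum age.
    if bins[0][0] > minimum:
        problems.append(f"not exhaustive: ages {minimum}-{bins[0][0] - 1} missing")
    for (lo1, hi1), (lo2, hi2) in zip(bins, bins[1:]):
        # Mutually exclusive: adjacent categories must not share any age.
        if lo2 <= hi1:
            problems.append(f"not mutually exclusive: age {hi1} falls in two categories")
        # Collectively exhaustive: no age may fall between adjacent categories.
        elif lo2 > hi1 + 1:
            problems.append(f"not exhaustive: ages {hi1 + 1}-{lo2 - 1} missing")
    return problems

# Q2's categories overlap at every boundary (25, 30, 35, 40, 45):
q2 = [(18, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
print(check_age_categories(q2))

# Q3's corrected categories pass both checks:
q3 = [(18, 25), (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
print(check_age_categories(q3))  # []
```

Running the checker on Q2 flags each boundary age once, while Q3 comes back clean.

&nbsp;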

See if you can spot and fix violations of the two logical operationalization requirements in the exercise below.

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 1.6. What is Wrong with These Variables' Operationalizations?</em></p>

</header>
<div class="textbox__content">

Q4. What year in college are you?

a) First-year

b) Second-year

&nbsp;

Q5. How many siblings do you have?

a) 0

b) 1

c) 1-2

d) 3-4

e) 4 or more

</div>
Q6. How do you commute to your institution's campus?

a) Car

b) Public transit

c) Bus

d) Bike

&nbsp;

</div>
&nbsp;

Now that we've covered the theoretical preliminaries, go see what working with actual data is like, in Chapter 2.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>53</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 16:49:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 20:49:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-6-creating-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>9</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-2-3-validity-and-reliability]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.1 Data Sets and What Data &quot;Looks&quot; Like</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/</link>
		<pubDate>Wed, 31 Oct 2018 21:00:20 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=57</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

By now you have learned that <em>variables</em> are tools that allow us to measure concepts and to collect information about them. As such they are composed of information -- information that varies across the <em>units of analysis</em> (the 'things' on which we collect information, be it people, organizations, countries, etc.). So far, we have discussed individual variables -- but creating and collecting information on a single variable is uncommon. Generally, we collect information on many variables at the same time (which, in turn, allows us to analyze variables together and hypothesize about possible associations between variables).

&nbsp;

Variables "live" in data sets (or datasets, as I prefer; both usages are common). <strong>A <em>dataset</em> is a collection of variables that lists the information (or, observations) gathered on them from the units of analysis.</strong> As usual, I focus on analysis of people for simplicity's sake (but do keep in mind the units of analysis can be something else.)

&nbsp;

The best way to visualize a dataset is as a sort of table (a.k.a. a <em>matrix</em>) which summarizes the responses from every individual (in the rows of the table) on the variables in the dataset (in the columns of the table). As such, the size of a dataset depends on two things: the number of variables and the number of individuals supplying information (a.k.a. respondents). Typically, datasets vary in size from just a handful of variables and a few respondents to hundreds of variables and thousands of respondents. (Huge datasets -- comprising information on millions of people -- exist too; these are known as <em>big data</em>. Big data is not analyzed in the conventional ways regular datasets are, so from now on we'll leave it aside as it's not the subject of this book.)

&nbsp;

To start small, imagine you have just four friends at your university and you decide to list some items of information about them (say, maybe you want to compare your standing at the university with theirs, and to see differences and commonalities between you and them). You could do that in sentence form, for example: Arjun, who is twenty years old, speaks Punjabi at home and is a first year student in the Business School, has a job and his GPA is 3.6. Benjamin, on the other hand, who is 25, speaks German at home and is a third year Science student, also has a job but his GPA is lower than Arjun's at 3.2. Cecilia, who is 22, speaks Spanish at home and is a fourth year Health Sciences student, doesn't have a paying job and her GPA is the highest of your friends, 4.0. Finally, Xingxing is also a first year student and is employed like Arjun but she is an Arts major, speaks Mandarin at home, and her GPA is 3.3.

&nbsp;

Indeed, you might do that, but the points of comparison might get lost as they are not easy to see: one has to read very carefully to keep track of who does what and has a GPA of how much. Instead, you could present the same information as in the table in Example 2.1 below.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.1 (A) A Hypothetical Dataset of Four Friends' Characteristics</em></p>

</header>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 100%;height: 75px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 14.2857%;height: 15px"></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Age</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Year at university</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Employment</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>GPA</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Major (by Faculty)</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Language spoken at home</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 14.2857%;height: 15px"><strong>Arjun</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center">20</td>
<td style="width: 14.2857%;height: 15px;text-align: center">1</td>
<td style="width: 14.2857%;height: 15px;text-align: center">yes</td>
<td style="width: 14.2857%;height: 15px;text-align: center">3.6</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Business</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Punjabi</td>
</tr>
<tr style="height: 15px">
<td style="width: 14.2857%;height: 15px"><strong>Benjamin</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center">25</td>
<td style="width: 14.2857%;height: 15px;text-align: center">3</td>
<td style="width: 14.2857%;height: 15px;text-align: center">yes</td>
<td style="width: 14.2857%;height: 15px;text-align: center">3.2</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Science</td>
<td style="width: 14.2857%;height: 15px;text-align: center">German</td>
</tr>
<tr style="height: 15px">
<td style="width: 14.2857%;height: 15px"><strong>Cecilia</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center">22</td>
<td style="width: 14.2857%;height: 15px;text-align: center">4</td>
<td style="width: 14.2857%;height: 15px;text-align: center">no</td>
<td style="width: 14.2857%;height: 15px;text-align: center">4.0</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Health</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Spanish</td>
</tr>
<tr style="height: 15px">
<td style="width: 14.2857%;height: 15px"><strong>Xingxing</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center">19</td>
<td style="width: 14.2857%;height: 15px;text-align: center">1</td>
<td style="width: 14.2857%;height: 15px;text-align: center">yes</td>
<td style="width: 14.2857%;height: 15px;text-align: center">3.3</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Arts</td>
<td style="width: 14.2857%;height: 15px;text-align: center">Mandarin</td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

If you do that, what you have created is a dataset. Now imagine that instead of this contrived combination of four friends and their varying characteristics, I generalize the example like so:

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.1 (B) A Hypothetical Dataset of Four Individuals and Six Variables</em></p>

</header>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 100%;height: 75px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 16.6936%;height: 15px"></td>
<td style="width: 11.8778%;height: 15px;text-align: center"><strong>Variable 1</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Variable 2</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Variable 3</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Variable 4</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Variable 5</strong></td>
<td style="width: 14.2857%;height: 15px;text-align: center"><strong>Variable 6</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 16.6936%;height: 15px"><strong>Respondent #1</strong></td>
<td style="width: 11.8778%;height: 15px;text-align: center">Response<sub>1.1</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>2.1</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>3.1</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>4.1</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>5.1</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>6.1</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 16.6936%;height: 15px"><strong>Respondent #2</strong></td>
<td style="width: 11.8778%;height: 15px;text-align: center">Response<sub>1.2</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>2.2</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>3.2</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>4.2</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>5.2</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>6.2</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 16.6936%;height: 15px"><strong>Respondent #3</strong></td>
<td style="width: 11.8778%;height: 15px;text-align: center">Response<sub>1.3</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>2.3</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>3.3</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>4.3</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>5.3</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>6.3</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 16.6936%;height: 15px"><strong>Respondent #4</strong></td>
<td style="width: 11.8778%;height: 15px;text-align: center">Response<sub>1.4</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>2.4</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>3.4</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>4.4</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>5.4</sub></td>
<td style="width: 14.2857%;height: 15px;text-align: center">Response<sub>6.4</sub></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

In Example 2.1 (B), the respondents are the four people on whose varying characteristics we have information; these characteristics are represented by the six variables. This, however, seems rather cumbersome. Instead of "Variable 3", and "Respondent 5", and "Response<sub>4.3</sub>", etc., a simpler way to represent all of these in a generalized way is through mathematical notation.[footnote]A note on mathematical notation, about which, I know, many students feel quite anxious: think of notation as a type of shorthand, or a sort of simplified foreign language. It's used to simplify what you could write out in words and sentences but would be too long and not as clear. The key to notation, just like with any foreign language, is to know what the symbols mean. Keep their meaning in mind, and you can read notation as fast and as easily as your own language.[/footnote]

So, prepare yourselves! Here comes notation:

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.1 (C) A Hypothetical Dataset of Four Individuals and Six Variables 2.0</em></p>

</header>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 74.2857%" border="0">
<tbody>
<tr>
<td style="width: 14.2857%"></td>
<td style="width: 10%;text-align: center"><strong>X<sub>1</sub></strong></td>
<td style="width: 10%;text-align: center"><strong>X<sub>2</sub></strong></td>
<td style="width: 10%;text-align: center"><strong>X<sub>3</sub></strong></td>
<td style="width: 10%;text-align: center"><strong>X<sub>4</sub></strong></td>
<td style="width: 10%;text-align: center"><strong>X<sub>5</sub></strong></td>
<td style="width: 10%;text-align: center"><strong>X<sub>6</sub></strong></td>
</tr>
<tr>
<td style="width: 14.2857%"><strong>I<sub>1</sub></strong></td>
<td style="width: 10%;text-align: center"><sub><span style="font-size: 14.4px">x</span>11</sub></td>
<td style="width: 10%;text-align: center"><sub><span style="font-size: 14.4px">x</span>21</sub></td>
<td style="width: 10%;text-align: center">x<sub>31</sub></td>
<td style="width: 10%;text-align: center">x<sub>41</sub></td>
<td style="width: 10%;text-align: center">x<sub>51</sub></td>
<td style="width: 10%;text-align: center">x<sub>61</sub></td>
</tr>
<tr>
<td style="width: 14.2857%"><strong>I<sub>2</sub></strong></td>
<td style="width: 10%;text-align: center"><sub><span style="font-size: 14.4px">x</span>12</sub></td>
<td style="width: 10%;text-align: center">x<sub>22</sub></td>
<td style="width: 10%;text-align: center">x<sub>32</sub></td>
<td style="width: 10%;text-align: center">x<sub>42</sub></td>
<td style="width: 10%;text-align: center">x<sub>52</sub></td>
<td style="width: 10%;text-align: center">x<sub>62</sub></td>
</tr>
<tr>
<td style="width: 14.2857%"><strong>I<sub>3</sub></strong></td>
<td style="width: 10%;text-align: center"><sub><span style="font-size: 14.4px">x</span>13</sub></td>
<td style="width: 10%;text-align: center">x<sub>23</sub></td>
<td style="width: 10%;text-align: center">x<sub>33</sub></td>
<td style="width: 10%;text-align: center">x<sub>43</sub></td>
<td style="width: 10%;text-align: center">x<sub>53</sub></td>
<td style="width: 10%;text-align: center">x<sub>63</sub></td>
</tr>
<tr>
<td style="width: 14.2857%"><strong>I<sub>4</sub></strong></td>
<td style="width: 10%;text-align: center">x<sub>14</sub></td>
<td style="width: 10%;text-align: center">x<sub>24</sub></td>
<td style="width: 10%;text-align: center">x<sub>34</sub></td>
<td style="width: 10%;text-align: center">x<sub>44</sub></td>
<td style="width: 10%;text-align: center">x<sub>54</sub></td>
<td style="width: 10%;text-align: center">x<sub>64</sub></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

In Example 2.1 (C), <em>I<sub>1</sub>, I<sub>2</sub>, I<sub>3</sub></em>, and <em>I<sub>4</sub></em> are the four individuals; <em>X<sub>1</sub>, X<sub>2</sub>, X<sub>3</sub>, X<sub>4</sub>, X<sub>5</sub></em>, and <em>X<sub>6</sub></em> are the six variables; and <em>x<sub>11</sub>, x<sub>12</sub></em>, etc. stand for any specific characteristic/response a respondent has on a variable. More specifically, <em>x<sub>53</sub></em>, for example, is the characteristic that Respondent #3 has on Variable 5. Scrolling up to Example 2.1 (A) will allow you to see that <em>x<sub>53</sub></em> is <em>Health</em>, which is Cecilia's Major by Faculty.
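The indexing convention above can also be sketched outside SPSS. Here is a minimal illustration in Python; aside from Cecilia's <em>Health</em> response, all names and values below are made up for illustration:

```python
# Hypothetical stand-in for the dataset in Example 2.1: row k holds
# individual I_k, column j holds variable X_j, so data[k-1][j-1] is x_jk.
data = [
    # X1        X2  X3   X4    X5         X6
    ["Ann",     20, "F", "BC", "Arts",    3],  # I1
    ["Ben",     22, "M", "ON", "Science", 2],  # I2
    ["Cecilia", 21, "F", "BC", "Health",  4],  # I3
    ["Dev",     23, "M", "AB", "Arts",    1],  # I4
]

def x(j, k):
    """Return x_jk: the value individual I_k has on variable X_j (1-based)."""
    return data[k - 1][j - 1]

print(x(5, 3))  # Respondent #3's value on Variable 5 -> Health
```

Note that looking up <em>x<sub>53</sub></em> means going to row 3 and column 5, exactly as you would when reading the table by eye.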

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!</em> <em>2.1 Reading Points of Information </em></p>

</header>
<div class="textbox__content">

In a similar vein, look up <em>x<sub>22</sub>, x<sub>34</sub>,</em> and <em>x<sub>61</sub></em>. It's a simple and easy task, but it will help you connect the notation to what it stands for, and to understand the logic behind the way information is presented in datasets.

</div>
</div>
&nbsp;

From here, it's not difficult to extrapolate the specific dataset we had above to a general one. Thus, Example 2.1 (D) below presents a template of a typical dataset.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.1 (D) A Hypothetical Dataset of N Individuals and K Variables</em></p>

</header>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 100%;height: 146px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>1</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>2</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>3</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>4</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>5</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>6</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>7</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>...</strong></td>
<td style="width: 10%;text-align: center;height: 15px"><strong>X<sub>K</sub></strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>1</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>11</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>21</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>31</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>41</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>51</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>61</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>71</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k1</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>2</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>12</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>22</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>32</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>42</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>52</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>62</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>72</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k2</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>3</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>13</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>23</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>33</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>43</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>53</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>63</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>73</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k3</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>4</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>14</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>24</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>34</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>44</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>54</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>64</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>74</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k4</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 11px"><strong>I<sub>5</sub></strong></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>15</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>25</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>35</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>45</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>55</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>65</sub></td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>75</sub></td>
<td style="width: 10%;text-align: center;height: 11px">...</td>
<td style="width: 10%;text-align: center;height: 11px">x<sub>k5</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>6</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>16</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>26</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>36</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>46</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>56</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>66</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>76</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k6</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>7</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>17</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>27</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>37</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>47</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>57</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>67</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>77</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>k7</sub></td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>...</strong></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
</tr>
<tr style="height: 15px">
<td style="width: 10%;height: 15px"><strong>I<sub>N</sub></strong></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>1n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>2n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>3n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>4n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>5n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>6n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>7n</sub></td>
<td style="width: 10%;text-align: center;height: 15px">...</td>
<td style="width: 10%;text-align: center;height: 15px">x<sub>kn</sub></td>
</tr>
</tbody>
</table>
<em>N</em> = number of elements in the dataset

<em>K</em> = number of variables in the dataset

</div>
</div>
&nbsp;

In the table above, you may think of <em>N</em> as the last row of the table, i.e., the last individual for whom we have information, and of <em>K</em> as the last column, i.e., the last variable we have in the dataset. Both numbers can theoretically be any positive number, though in practice the former is usually up to several thousand and the latter up to a few hundred. The ellipses in the next-to-last row and the next-to-last column indicate that the table is truncated: there are omitted rows between the seventh and the last individuals (i.e., between <em>I<sub>7</sub></em> and <em>I<sub>N</sub></em>), and omitted columns between the seventh and the last variables (i.e., between <em>X<sub>7</sub></em> and <em>X<sub>K</sub></em>). (They obviously have to be omitted so that the table can fit on the page.)
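In computing terms, <em>N</em> and <em>K</em> are simply the dataset's dimensions. A minimal Python sketch with hypothetical values:

```python
# A hypothetical 3-by-4 dataset: N rows (individuals), K columns (variables).
data = [
    [1, 2, 35, 4],  # I1
    [2, 1, 42, 3],  # I2
    [1, 1, 28, 5],  # I3
]

N = len(data)     # number of individuals: the last row is I_N
K = len(data[0])  # number of variables: the last column is X_K
print(N, K)       # -> 3 4
```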

&nbsp;

Armed with this knowledge, let's take a look at an excerpt from a real dataset. The following Example 2.1 (E) provides a snapshot of the first ten respondents and first nine variables in the <em>Aboriginal Peoples Survey 2012</em> dataset (or <em>APS 2012</em> for short)[footnote]APS 2012 is a Statistics Canada dataset which I will formally introduce in <span style="color: #000000;background-color: #ffff00">Ch. XX</span>.[/footnote] using a software package called <em>IBM® Statistical Package for the Social Sciences</em>, commonly referred to as SPSS.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.1 (E) A Snapshot of Survey Data (APS 2012</em><em>)</em></p>

</header>
<div class="textbox__content">

&nbsp;

Snapshot of <em>APS 2012</em>'s <em>Data View </em>in SPSS:

&nbsp;

<span style="font-size: 14px"><img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-data-view.png" alt="" width="874" height="261" class="aligncenter wp-image-1465 size-full" /></span>

&nbsp;

<span style="font-size: 1rem;text-indent: 1em">Snapshot of <em>APS 2012</em>'s <em>Variable View</em> in SPSS:</span>

&nbsp;

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/data-snapshot-variable-view.png" alt="" width="1043" height="220" class="aligncenter wp-image-1464 size-full" />

</div>
</div>
&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!</em> <em>2.2  Understanding How Datasets Are Organized</em></p>

</header>
<div class="textbox__content">

Make sure you can connect the data snapshots in the example above with your understanding of how datasets are organized. What do the numbers in the first (blue) columns of both images represent? (Hint: this is not a variable!) What is listed in the first (blue) row of the top image? In the top image, what does the 1 stand for in the first white row of the <em>ID_03G</em> column? How about the 1 in the fifth row of the <em>SEX</em> column?

</div>
</div>
&nbsp;

One thing you might find surprising is the obvious fact that all cell entries (i.e., the observations we have) are listed in a number format. Does that mean that all variables in this particular dataset are interval or ratio? What about nominal or ordinal variables - do they not exist in this dataset? The answer is "no" on both counts: the variable <em>SEX</em> (i.e., "<em>Sex of respondent</em>" as stated in <em>Variable View</em>) is nominal, and the variable <em>AGE_YRSG</em> (i.e., "Age group of respondent...") is ordinal because of the hierarchical arrangement of the responses. <strong>However, the dataset cells contain only numbers because statistical software can only analyze numerical data.</strong>

&nbsp;

<strong>Coding.</strong> To that effect, nominal and ordinal variables appear "in code" in datasets; i.e., <strong>the categories of nominal and ordinal variables are assigned numerical values as <em>labels</em> to represent them</strong> in the actual dataset you might be working with. Thus, the numbers in nominal and ordinal variables' columns are not <em>actual numbers</em>; they are artificially (and, in the case of nominal variables, somewhat arbitrarily) assigned to represent the words contained in the categories in order to make computer-based statistical analysis possible. (On the other hand, interval/ratio variables' categories contain <em>actual numbers</em>. The trick, then, is to learn to differentiate the actual numbers from the code/number values used as labels in the cells of a dataset.)
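The idea of coding can be sketched in a few lines of Python. The codebooks below are invented for illustration; they are not the actual APS 2012 codes:

```python
# Hypothetical codebooks: numeric labels standing in for word-based categories.
sex_codes = {1: "Male", 2: "Female"}                    # nominal: code order is arbitrary
age_group_codes = {1: "18-24", 2: "25-34", 3: "35-44"}  # ordinal: code order mirrors the ranking

# What the dataset's cells actually store for a SEX column:
sex_column = [1, 2, 2, 1, 2]

# Proper reporting uses the category names, never the raw codes:
decoded = [sex_codes[code] for code in sex_column]
print(decoded)  # -> ['Male', 'Female', 'Female', 'Male', 'Female']
```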

&nbsp;

Therefore, you should always keep track of the code (see the Watch Out! panel below for tips on <em>Variable View</em> in SPSS which allows you to do that), and remember to refer to the categories by their proper (word-based) names -- not by the artificial numerical values (i.e., code) representing them!

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><strong><span style="color: #ff0000">Watch Out!! #2</span></strong>...for  Making Hasty Decisions about Variables Based </em>Only<em> on Data View or </em>Only <em>on Variable View</em></p>

</header>&nbsp;

It's tempting, but you cannot deduce <em>all</em> categories of a variable with any certainty just by looking at the snapshot in Example 2.1 (E). You cannot do that even if, instead of a snapshot, you had the real, interactive <em>Data View</em> window in SPSS in front of you. Not only might you not be able to scroll through all the data (depending on its size) but, more importantly, not all characteristics might exist among the individuals. (For example, imagine the variable <em>hair colour</em> and, say, not one respondent having red hair: then the response "red" would not be visible in <em>Data View</em>, even if such a category existed in the variable.) For the same reasons you should also not decide a variable's level of measurement based on <em>Data View</em>. (Remember, all data in the cells appears in numerical format, regardless of whether it's an actual number or just a value label/code!)

&nbsp;

To explore any dataset you might end up working with, and all the variables contained therein, you should always examine not only the <em>Data View</em> but the <em>Variable View</em> of the dataset as well (in SPSS you can toggle between <em>Data View</em> and <em>Variable View</em> easily with a click of the mouse). The <em>Variable View</em> lists all variables along with some information about them -- including something which <em>looks like</em> their level of measurement, called <em>Measure</em> (it is not included in the bottom snapshot above). <strong>The <em>Measure</em> information can be quite misleading, so: never trust this software-generated conclusion!</strong>

&nbsp;

Instead, you should always explore <em>both</em> <em>Variable View</em> and <em>Data View</em>. You should note the variables' respective categories (in <em>Variable View, </em>where you can click on any cell in the <em>Values</em> column for a full category listing) and the type of the observations you have in the cells in the table (in <em>Data View</em>). Then --and <em>only</em> then -- reach the appropriate conclusion about the levels of measurement of the variables you have at hand.

&nbsp;

What should guide your decision about a variable's level of measurement is what you see in the <em>Values</em> column in <em>Variable View</em>. To repeat, clicking on the respective cell will open up a window displaying the (nominal or ordinal) variable's categories/values along with the number labels representing them in the dataset.

&nbsp;

Again, note that reporting on the variable should be done by using its categories/values, never by the number label you see in <em>Variable View</em> standing in for them! This point will become more relevant and less abstract once we start learning what to do with variables, in Chapter 3.

&nbsp;

</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>57</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:00:20]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:00:20]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-1-data]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-4-spss-data]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.2 Summarizing Data</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-2-summarizing-data/</link>
		<pubDate>Wed, 31 Oct 2018 21:02:04 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=59</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Imagine a dataset containing a hundred respondents and just five variables. Such a dataset would have 500 data points and, while that may seem like a lot, a dataset of this size is considered rather small. Typically, datasets used in sociology (and other social sciences) tend to be larger. What this tells you is that there is an enormous amount of information housed within even an average dataset. Like a library containing thousands upon thousands of books but no catalogue, that information is all but useless unless we have the means to make sense of it - to order it, systematize it, categorize it. In the previous section, I discussed exploring a dataset in SPSS's <em>Data View</em>. While that's a useful (and necessary) task to do before working with any dataset, it doesn't provide anything more than a sort of global view of the variables in it.

In order to understand any variable better, and to be able to fully use the information it contains, we need tools that allow us to <em>zoom in</em> on each individual variable, as it were, and to organize that information in a meaningful way.

Two of the most widely used tools for exploring variables and presenting their information in a summarized, easy-to-understand way are, as you well know, <em>tables</em> and <em>graphs</em>. In the next section I start by introducing <strong>frequency tables</strong>; we will then end this chapter by introducing <strong>graphical displays</strong>.
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>59</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:02:04]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:02:04]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-2-summarizing-data]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-4-1-summarizing-data]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.3 Frequency Tables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-frequency-tables/</link>
		<pubDate>Wed, 31 Oct 2018 21:02:28 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=61</guid>
		<description></description>
		<content:encoded><![CDATA[As usual, let's start ground-up with an example, and work our way up to the concept under study. Consider the following raw (unorganized) data.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.2 (A) Hypothetical Raw Data on</em> <em>Educational Attainment</em></p>

</header>
<div class="textbox__content">

Imagine that a group of 21 people were asked about the highest educational degree they have attained. These are their responses:

</div>
<table class="no-lines aligncenter" style="height: 118px;width: 71.7522%">
<tbody>
<tr style="height: 29px">
<td style="width: 14.5098%;height: 29px"><strong>Secondary/High School</strong></td>
<td style="width: 16.6013%;height: 29px"><strong> Bachelor's</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>Secondary/High School</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>No Degree</strong></td>
<td style="width: 10.719%;height: 29px"><strong>Bachelor's</strong></td>
<td style="width: 21.5921%;height: 29px"><strong>Didn't answer </strong></td>
</tr>
<tr style="height: 29px">
<td style="width: 14.5098%;height: 29px"><strong>Master's</strong></td>
<td style="width: 16.6013%;height: 29px"><strong>Associate's</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>Master's</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>Secondary/High School</strong></td>
<td style="width: 10.719%;height: 29px"><strong>Bachelor's</strong></td>
<td style="width: 21.5921%;height: 29px"></td>
</tr>
<tr style="height: 31px">
<td style="width: 14.5098%;height: 31px"><strong>Secondary/High School</strong><strong>
</strong></td>
<td style="width: 16.6013%;height: 31px"><strong>Secondary/High School</strong><strong>
</strong></td>
<td style="width: 15.5556%;height: 31px"><strong>Didn't answer</strong><strong>
</strong></td>
<td style="width: 15.5556%;height: 31px"><strong>Didn't answer</strong><strong>
</strong></td>
<td style="width: 10.719%;height: 31px"><strong>Bachelor's</strong><strong>
</strong></td>
<td style="width: 21.5921%;height: 31px"><strong> </strong></td>
</tr>
<tr style="height: 29px">
<td style="width: 14.5098%;height: 29px"><strong>Secondary/High School</strong></td>
<td style="width: 16.6013%;height: 29px"><strong>PhD</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>Bachelor's</strong></td>
<td style="width: 15.5556%;height: 29px"><strong>Associate's</strong></td>
<td style="width: 10.719%;height: 29px"><strong>Associate's</strong></td>
<td style="width: 21.5921%;height: 29px"></td>
</tr>
</tbody>
</table>
</div>
&nbsp;

What can we glean from this presentation of the information? Can we easily see which is the most frequently obtained educational degree in the group? How many people do we have with each degree? What fraction/proportion of the total does each represent?

Of course, we could always count -- but what if I had asked you to imagine a group of 36 people? Of 72? Or 200? Or 2,000? Or more? Are you still going to painstakingly count the different responses?

You may be surprised, but the answer is "yes, if we had to". In the past, researchers used to do just that, a lot. Nowadays, of course, we have computers to do it for us. SPSS can easily summarize this data, but to understand the process better, we'll start from scratch.

The most obvious way we can organize the raw data above into something less chaotic is the following:

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.2 (B) Hypothetical Data on Educational Attainment, Organized</em></p>

</header>
<div class="textbox__content">

<em>Table 2.1 Educational Attainment by Frequency </em>
<table style="border-collapse: collapse;width: 56.5894%;height: 121px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px;text-align: center"><strong>Degree</strong></td>
<td style="width: 19.8946%;height: 15px">
<p style="text-align: center"><strong> Count (a.k.a. <em>frequency</em>)</strong></p>
</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   No degree</td>
<td style="width: 19.8946%;height: 15px;text-align: center">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Secondary/High School</td>
<td style="width: 19.8946%;height: 15px;text-align: center">6</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Associate's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Bachelor's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">5</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Master's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">2</td>
</tr>
<tr style="height: 16px">
<td style="width: 28.187%;height: 16px">   PhD</td>
<td style="width: 19.8946%;height: 16px;text-align: center">1</td>
</tr>
<tr>
<td style="width: 28.187%">   Didn't answer</td>
<td style="width: 19.8946%;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">  <strong> TOTAL</strong></td>
<td style="width: 19.8946%;text-align: center;height: 15px"><strong>21</strong></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

In the most basic sense, this is a <em>frequency table</em>. It lists the different categories of a variable along with their observed count, a.k.a. <em>frequency</em>. That is, <strong>we essentially count how many times any given category appears, i.e., we count how <em>frequent</em> a response is among the respondents, and then indicate the number for each category/response. Frequency is usually denoted by <em>f</em> in statistical notation.</strong>
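The counting itself is mechanical. Here is a sketch of it in Python's standard library (the book's own workflow uses SPSS for this); the responses below reproduce the tallies in Table 2.1:

```python
from collections import Counter

# The 21 responses from Example 2.2 (A); the order of the list
# does not matter for counting.
responses = (
    ["Secondary/High School"] * 6 + ["Bachelor's"] * 5 +
    ["Associate's"] * 3 + ["Didn't answer"] * 3 +
    ["Master's"] * 2 + ["No degree", "PhD"]
)

freq = Counter(responses)  # f: category -> observed count
print(freq["Secondary/High School"], sum(freq.values()))  # -> 6 21
```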

Real frequency tables, however, usually contain more information than a simple count. The following few sub-sections provide the details, while we work our way through creating a full frequency table.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>61</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:02:28]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:02:28]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-3-frequency-tables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-4-2-frequency-tables]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-2-1-frequency-tables]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.4 Graphs</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-4-graphs/</link>
		<pubDate>Wed, 31 Oct 2018 21:03:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=63</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

A picture is worth a thousand words, they say, so in this section we will explore the most basic ways we can summarize data using graphical displays rather than tables. Unlike frequency tables, which can be used to summarize variables at all levels of measurement with a table of the same format, the type of graph we use tends to differ depending on the variable's level of measurement. Almost all graphs in this book are produced using SPSS.

&nbsp;

The three most basic graphs used to summarize variables are <em>pie charts</em>, <em>bar graphs</em> (or <em>bar charts</em>), and <em>histograms</em>.

&nbsp;

<strong>Pie charts</strong>. You have undoubtedly encountered (and likely used) pie charts before. Fig. 2.1 below presents one such simple pie chart. The size of a slice of the "pie" corresponds to the category's size. The higher the category's frequency (and, of course, relative frequency), the larger the slice.
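The arithmetic behind slice sizes is simple relative frequency; a short sketch with made-up counts:

```python
# Hypothetical counts for a binary variable; each slice's angle is the
# category's relative frequency times 360 degrees.
counts = {"Male": 120, "Female": 130}
total = sum(counts.values())

angles = {cat: round(f / total * 360, 1) for cat, f in counts.items()}
print(angles)  # -> {'Male': 172.8, 'Female': 187.2}
```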

&nbsp;

<em>Figure 2.1 Sex of the Respondent (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-sex-gss-2016.png" alt="" width="462" height="370" class="alignnone wp-image-1512 size-full" />

The pie chart in Fig. 2.1 corresponds to the frequency table of <em>sex of the respondent</em> in the previous section, namely Table 2.5.

&nbsp;

Since the binary variable <em>sex</em> tends to look 'boring', in Fig. 2.2 below you can find a bonus pie chart for <em>marital status</em>, which is more colourful as it has more categories.

&nbsp;

<em>Figure 2.2 Marital Status of the Respondent (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/pie-chart-marstat-gss-2016.png" alt="" width="462" height="370" class="alignnone wp-image-1513 size-full" />

&nbsp;

Pie charts can be used with both nominal and ordinal variables, though an argument can be made that the circular form of the pie chart may "hide" valuable insights about the order inherent in ordinal variables. As such, some prefer to use pie charts for nominal variables <em>only</em>, and to use bar graphs for ordinal variables. Ultimately, it is a matter of preference, and both usages are correct.

&nbsp;

You should not try to use a pie chart for an interval/ratio variable, however, as in most cases the "pie" will end up divided into far too many, far too small slices, making the chart impossible to "read".

&nbsp;

<strong>Bar graphs</strong>. Fig. 2.3 below features a simple bar graph. The height of the bars corresponds to the size of the different categories. The higher the category's frequency (and relative frequency), the taller the bar.

&nbsp;

<em>Figure 2.3 Workplace Size (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/bar-graph-workplace-size-gss-2016.png" alt="" width="462" height="370" class="alignnone wp-image-1517 size-full" />

This bar graph corresponds to the frequency table for <em>workplace size</em> from the previous section (Table 2.6). Note that the percentages reflected in the graph are the <em>valid</em> percentages from the frequency table.

&nbsp;

Again, using a bar graph with a nominal variable is allowed, and it is up to you whether you prefer a pie chart instead, since the categories of a nominal variable have no order and can be "moved around" without loss of information. However, a bar chart can present the order of an ordinal variable's categories in a more intuitive manner, so for some people bar graphs are the graph of choice for ordinal variables: this way the order runs through the bars from left to right.

&nbsp;

As with pie charts, you shouldn't use bar graphs with interval/ratio variables, as the potential for ending up with far too many bars is quite high, making the graph difficult to read.

&nbsp;

<strong>Histograms</strong>. Histograms are the graphical representations used with interval/ratio variables. Fig. 2.4 presents one such histogram. Once again, the height of each bar represents the frequency of a variable's category. In this case, the histogram corresponds to Table 2.7 from the previous section which was the frequency table of the number of takeout dishes respondents purchased in the last month.

&nbsp;

<em>Figure 2.4 Purchasing Takeout Dishes from Grocery Stores in the Past Month (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/histogram-takeout-dishes-gss-2016.png" alt="" width="462" height="370" class="alignnone wp-image-1522 size-full" />

&nbsp;

At first glance, a histogram might look similar to a bar graph -- albeit usually with more bars/categories. However, the number of categories is not the only difference. Notice how the bars in the bar graph in Fig. 2.3 have space between them, while the bars in the histogram in Fig. 2.4 do not. This difference reflects the distinction between discrete and continuous variables: discrete variables[footnote]If you recall from Section 1.5 (https://pressbooks.bccampus.ca/simplestats/chapter/1-5-discrete-and-continuous-variables/), nominal and typically ordinal variables are considered discrete.[/footnote] have separate categories, hence the distance between the bars in the bar graph. Continuous variables (typically interval/ratio variables) have continuous values, therefore the bars representing their categories touch each other to indicate their continuous nature (i.e., their potentially infinite number of values).

&nbsp;

In the next two chapters you will learn how to use these graphs in greater detail (especially the histogram). Here is how to produce them in SPSS.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 2.2  Basic Graphs </em></p>

</header>
<div class="textbox__content">

<strong>To get a pie chart:</strong>
<ul>
 	<li>From the <em>Main Menu</em>, click <em>Graphs</em> and then <em>Legacy Dialogs</em>;</li>
 	<li>From the pull-down menu of <em>Legacy Dialogs</em>, select <em>Pie</em>; a <em>Pie Charts</em> window will appear.</li>
 	<li>Leave <em>Summaries for groups of cases</em> selected and click <em>Define</em>;</li>
 	<li>Select your variable of interest from the left-hand side variable list and, using the correct arrow, move the variable into the <em>Define Slices by</em> empty space.</li>
 	<li>You can change what the slices represent -- the frequency (<em>N of cases</em>) or percentages (<em>% of cases</em>) -- in the top right section of the window called <em>Slices Represent</em>.</li>
 	<li>When you are done, click <em>OK</em>. The pie chart will appear in the <em>Output</em> window.</li>
</ul>
</div>
<strong>To get a bar graph:</strong>
<ul>
 	<li>From the <em>Main Menu</em>, click <em>Graphs</em> and then <em>Legacy Dialogs</em>;</li>
 	<li>From the pull-down menu of <em>Legacy Dialogs</em>, select <em>Bar</em>; a <em>Bar Charts</em> window will appear.</li>
 	<li>Leave <em>Simple</em> and <em>Summaries for groups of cases</em> selected and click <em>Define</em>;</li>
 	<li>Select your variable of interest from the left-hand side variable list and, using the correct arrow, move the variable into the <em>Category Axis</em> empty space.</li>
 	<li>You can change what the bars represent -- the frequency (<em>N of cases</em>) or percentages (<em>% of cases</em>) -- in the top right section of the window called <em>Bars Represent</em>.</li>
 	<li>When you are done, click <em>OK</em>. The bar graph will appear in the <em>Output</em> window.</li>
</ul>
<strong>To get a histogram:</strong>
<ul>
 	<li>From the <em>Main Menu</em>, click <em>Graphs</em> and then <em>Legacy Dialogs</em>;</li>
 	<li>From the pull-down menu of <em>Legacy Dialogs</em>, select <em>Histogram</em>; a <em>Histogram</em> window will appear.</li>
 	<li>Select your variable of interest from the left-hand side variable list and, using the correct arrow, move it into the <em>Variable</em> empty space.</li>
 	<li>When you are done, click <em>OK</em>. The histogram will appear in the <em>Output</em> window.</li>
</ul>
</div>
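The point-and-click steps above are the book's SPSS workflow; for readers comfortable with scripting, the same three graph types can be sketched in Python with the matplotlib library. This is only an illustrative sketch: the category labels and counts below are made up, not the GSS 2016 values.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display window needed
import matplotlib.pyplot as plt

# Hypothetical frequency-table counts (illustrative, not GSS data)
labels = ["Married", "Single", "Widowed", "Divorced"]
counts = [120, 80, 25, 40]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Pie chart: slice size corresponds to each category's relative frequency
axes[0].pie(counts, labels=labels, autopct="%1.1f%%")

# Bar graph: bar height = frequency; bars are separated (discrete categories)
axes[1].bar(labels, counts)

# Histogram: continuous values are binned, so the bars touch
takeout = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 8, 10]
axes[2].hist(takeout, bins=5)

fig.savefig("basic_graphs.png")
```

Note how the bar graph and the histogram differ in exactly the way described above: `bar()` draws separated bars for discrete categories, while `hist()` draws touching bars over continuous values.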
&nbsp;

&nbsp;

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>63</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:03:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:03:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-4-graphs]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>8</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[1-4-3-graphs]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-2-2-graphs]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.1 Mode</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-1-mode/</link>
		<pubDate>Wed, 31 Oct 2018 21:14:21 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=68</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Central tendency is information about the clustering of a variable's distribution: whether its observations/cases/responses tend to group together (or not) and where (i.e., in which categories/values) they tend to fall.

&nbsp;

<strong>There are three measures of central tendency: mode, median, and mean</strong>. In this section, we explore the <em>mode.</em>

&nbsp;

To find a variable's mode, you only need a frequency table -- or rather, just the frequency column of the table (although the <em>Valid Percent</em> column will serve just as well). Here is a simple, small-<em>N</em>, real-world example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.1 Religious Affiliation of Canadian Prime Ministers</em></p>

</header>
<div class="textbox__content">

<em>Table 3.1 Religious Affiliation of Canadian Prime Ministers (Wikipedia 2017)</em>
<table style="border-collapse: collapse;width: 65.6463%;height: 105px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px;text-align: center"><strong>Religious affiliation</strong></td>
<td style="width: 25.1162%;height: 15px;text-align: center"><strong>Frequency</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">Anglican</td>
<td style="width: 25.1162%;height: 15px;text-align: center">4</td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">Baptist</td>
<td style="width: 25.1162%;height: 15px;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">Evangelical</td>
<td style="width: 25.1162%;height: 15px;text-align: center">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">Presbyterian</td>
<td style="width: 25.1162%;height: 15px;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">Roman Catholic</td>
<td style="width: 25.1162%;height: 15px;text-align: center">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 40.7496%;height: 15px">United Church of Canada (prev. Methodist)</td>
<td style="width: 25.1162%;height: 15px;text-align: center">2</td>
</tr>
<tr>
<td style="width: 40.7496%">TOTAL</td>
<td style="width: 25.1162%;text-align: center">23</td>
</tr>
</tbody>
</table>
&nbsp;

What is the most popular religious affiliation of Canadian Prime Ministers as of 2019? Or, what religious affiliation is most frequently reported by Canadian Prime Ministers so far? In other words, what religious affiliation have Canadian Prime Ministers most tended to have?

&nbsp;

Surprising no one with any knowledge about Canada, the largest category among the religious denominations, or the one that Canadian Prime Ministers most frequently subscribe to -- i.e., <strong>the category with the highest frequency</strong> -- is "Roman Catholic", with 10 of the Canadian Prime Ministers identified as such. (And are you surprised that Canada has only had Christian Prime Ministers?)

</div>
</div>
&nbsp;

As simple as that, <strong>the category/value with the highest frequency is called the mode of the variable</strong>. Alternatively, you can easily spot the mode in a graph: it would be the largest slice of the pie or the tallest column in a bar chart or a histogram.
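In code, finding the mode amounts to counting category frequencies and picking the largest count. A minimal Python sketch (not part of the book's SPSS workflow) using the counts from Table 3.1 above:

```python
from collections import Counter

# Frequencies from Table 3.1 (religious affiliation of Canadian Prime Ministers)
freq = Counter({
    "Anglican": 4,
    "Baptist": 3,
    "Evangelical": 1,
    "Presbyterian": 3,
    "Roman Catholic": 10,
    "United Church of Canada": 2,
})

# The mode is the category with the highest frequency
mode_category, mode_count = freq.most_common(1)[0]
print(mode_category, mode_count)  # Roman Catholic 10
```

Note that the level of measurement never enters the computation: the categories are just labels being counted, which is why the mode applies to any variable.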

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!</em> <em>3.1  Do all variables have a mode?</em></p>

</header>
<div class="textbox__content">

Considering that the only thing you need to do to find a variable's mode is to count the frequency of each of its categories/values and indicate the one with the highest count, is it possible to find the mode of any variable, regardless of its level of measurement? Or is the mode a descriptive statistic applicable only to some variables, depending on their level of measurement?

</div>
</div>
&nbsp;

If by now you have a good grasp of what makes a variable nominal, ordinal, interval, or ratio (and if you do not -- go back and really reread Section 1.3! (https://pressbooks.bccampus.ca/simplestats/chapter/1-3-levels-of-measurement/)), you should be able to easily answer the questions in the <em>Do It! 3.1</em> above. Obtaining the mode, the simplest of all measures of central tendency, does not require any calculations or complicated procedures. To identify the mode, it doesn't matter whether the categories of a variable are made of <em>words</em> or <em>numbers</em>, or if there is any order in them. All that matters is the <em>count</em> -- the frequency -- of responses in each category/value in order to identify <em>where</em> <em>cases tend to cluster</em> across the categories/values. As such, <strong>the mode is a descriptive statistic applicable to any and all variables</strong>.

&nbsp;

To illustrate, let's bring back Example 2.2 (B) from Section 2.3:
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 3.2  </em><em>Educational Attainment's Mode</em></p>

</header>
<div class="textbox__content">

<em>Table 3.2 Educational Attainment </em>
<table style="border-collapse: collapse;width: 56.5894%;height: 121px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px;text-align: center"><strong>Degree</strong></td>
<td style="width: 19.8946%;height: 15px">
<p style="text-align: center"><strong> Count (a.k.a. <em>frequency</em>)</strong></p>
</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   No degree</td>
<td style="width: 19.8946%;height: 15px;text-align: center">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Secondary/High School</td>
<td style="width: 19.8946%;height: 15px;text-align: center">6</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Associate's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Bachelor's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">5</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">   Master's</td>
<td style="width: 19.8946%;height: 15px;text-align: center">2</td>
</tr>
<tr style="height: 16px">
<td style="width: 28.187%;height: 16px">   PhD</td>
<td style="width: 19.8946%;height: 16px;text-align: center">1</td>
</tr>
<tr>
<td style="width: 28.187%">   Didn't answer</td>
<td style="width: 19.8946%;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 28.187%;height: 15px">  <strong> TOTAL</strong></td>
<td style="width: 19.8946%;text-align: center;height: 15px"><strong>21</strong></td>
</tr>
</tbody>
</table>
What is the mode for educational attainment based on the 21 respondents in the example?

</div>
</div>
&nbsp;

Looking for the largest category in Table 3.2 above, you undoubtedly already identified that the mode for <em>educational attainment</em> is "Secondary/High School". That is, to put this into language that even people not trained in statistics could understand, the most frequent educational degree among the 21 respondents in the example is "Secondary/High School", as it has the highest frequency/the largest number of cases, 6. (It is generally quite useful to get into the habit of translating <em>statistics-ese</em> into English when you write reports, so you should practice it on all occasions.)[footnote]Note that the <em>most frequent</em> category does not mean that it contains the <em>majority</em> or <em>most</em> of the cases. Sometimes that may be so, but it's not necessarily the case. In both examples above you can see that neither Roman Catholics nor people with Secondary/High School degrees are a majority in their respective groups (10 out of 23 and 6 out of 21, respectively). Thus, be careful when writing about a mode as being "where <em>most/the majority</em> of cases cluster", because many times that phrasing would be factually incorrect.[/footnote]

&nbsp;

And this is all there is to finding a variable's mode. Beyond simply counting (applicable to relatively small groups, as generally no one would want to count hundreds or thousands of cases by hand), the ways to obtain a mode through SPSS are listed below.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 3.1: Finding a Variable's Mode</em></p>

</header>
<div class="textbox__content">

<strong>Option 1: Through a frequency table</strong>[footnote]You might want to avoid this option when working with interval/ratio variables, as their frequency tables can be very, very long.[/footnote]
<ul>
 	<li>Use SPSS to create a frequency table for your chosen variable[footnote]See Section 2.3.4 (https://pressbooks.bccampus.ca/simplestats/chapter/2-3-4-what-frequency-tables-look-like/) for the tip on how to create frequency tables in SPSS.[/footnote];</li>
 	<li>Look for the category/value with the highest frequency (the relative frequency in the <em>Valid Percent</em> column works too);</li>
 	<li>Report the category with the highest frequency as the mode of that variable.</li>
</ul>
</div>
<strong>Option 2: Directly requesting the statistic</strong>
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze</em>, then <em>Descriptive Statistics</em>, then <em>Frequencies</em>;</li>
 	<li>Select your variable of choice from the left-hand side and use the arrow to move it to the right side of the window;</li>
 	<li>Click on the <em>Statistics</em> button on the right;</li>
 	<li>In the new window, check <em>Mode</em> in the <em>Central Tendency</em> section on your right;</li>
 	<li>Click <em>Continue</em>, then <em>OK</em>.</li>
</ul>
</div>
&nbsp;

Note that SPSS gives you the option to display a frequency table or not before clicking <em>OK</em> in the last step listed in the SPSS Tip above. The reason is practical: the frequency tables of interval/ratio variables can be quite long depending on the number of values they contain. As such, while identifying the mode from the frequency table of a nominal or ordinal variable is fine, it's often more practical to request SPSS to report the mode of an interval/ratio variable directly rather than through a frequency table.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #6</strong></span>... for Reporting Nominal/Ordinal Variable's Modes As Given by SPSS</em></p>

</header>
<div class="textbox__content">

&nbsp;

One thing to keep in mind when requesting the mode directly from SPSS is that SPSS reports modes by their numeric codes (i.e., not by the actual names of the categories). If you recall from Section 2.1 (https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/), datasets contain only numbers, with nominal and ordinal categories appearing in code so that the software can work with them. As such, your SPSS output will list the mode of a nominal or ordinal variable as a number, and it is your job to "translate" that number into its proper form, i.e., the actual category.

&nbsp;

For example, in the <em>Religious Affiliation of Canadian Prime Ministers</em> example above, going in the order the categories are listed, the categories would typically be coded in the following way: "Anglican" = 1, "Baptist" = 2, "Evangelical" = 3, "Presbyterian" = 4, "Roman Catholic" = 5, "United Church of Canada" = 6. The dataset would contain only the codes (i.e., the numbers), and SPSS would report the mode as "5" in the output.

&nbsp;

However, it is a mistake to report the code (the number assigned to the category) instead of the actual category's name. <strong>You should always report the mode with its real category name</strong>. (That is, it is up to you to look up the code -- recall that you can do this through the <em>Values</em> column in SPSS's <em>Variable View</em> -- and find the correct name of the modal category.) In this case, you should report the mode of <em>Religious Affiliation of Canadian Prime Ministers</em> not as 5 but as "Roman Catholic". (The "5" has no real meaning; it simply indicates that Roman Catholic is the fifth category in the listing.)

</div>
</div>
&nbsp;

I'll end this section with a final consideration regarding the mode: it is quite possible for a variable to have more than one mode. After all, two (or more) categories/values might share the highest frequency; in that case we say that the variable's distribution is <em>multimodal</em> (<em>bi-modal</em> or <em>tri-modal</em> in the specific cases of two or three modes). Depending on the number of modes, it's acceptable to report only the first, while indicating that multiple modes exist for that variable. Multiple modes are usually also easy to spot in bar graphs and histograms: they appear as bars of equal height.
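The multiple-modes case is easy to check in code as well: Python's standard library offers statistics.multimode (Python 3.8+), which returns every category tied for the highest frequency. The responses below are made up for illustration.

```python
from statistics import multimode

# Hypothetical survey answers where two categories tie for the highest count
answers = ["agree", "agree", "agree", "neutral",
           "disagree", "disagree", "disagree"]

# multimode() returns every category tied for the highest frequency,
# in the order first encountered -- here the distribution is bi-modal
print(multimode(answers))  # ['agree', 'disagree']
```

A function that returns a list of modes, rather than a single value, mirrors the advice above: report that multiple modes exist rather than silently picking one.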

&nbsp;

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>68</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:14:21]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:14:21]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-1-mode]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-1-mode]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.2 Median</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/</link>
		<pubDate>Wed, 31 Oct 2018 21:14:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=70</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The three measures of central tendency all tell us where typical cases fall, or where cases tend to cluster. After exploring the mode in the previous section, in this section we turn to the second measure of central tendency, called the <em><strong>median</strong></em>.

&nbsp;

The<em> median</em> lives up to its name: it derives from the Latin root <em>medi</em>, meaning "middle", and that's exactly the type of information it provides. Specifically, the median divides the cases of a variable into two equal halves and identifies the case in the middle. As such, it points out the "centre" of the data in a very straightforward way -- it simply reports the middle observation.

&nbsp;

Consider, however, the following point: even in everyday life, the middle implies a beginning and an end (e.g., "in the middle of the book"); something that is in-between, a gradation from a point A to a point C, as it were. From clothes sizes ("small, <em>medium</em>, large") to how spicy you like your Thai food ("a little, <em>medium</em>, or hot") to the volume while listening to music ("low, <em>medium</em>, high"), the "centre" category bisects whatever it is applied to into smaller/larger, less/more, left/right, etc. parts. That is, to speak of <em>the middle</em> of something we need to know where it starts (e.g., the minimum) and where it ends (e.g., the maximum). Simply put, we need an <em>order</em>.

&nbsp;

What all this should tell you is that <strong>the median is not applicable to nominal variables.</strong> Speaking of the middle of <em>gender</em>, or the middle of <em>ethnicity</em>, or <em>religious affiliation</em>, or <em>hair colour</em>, or <em>degree major</em>, or of the middle of any other nominal variable makes no sense. After all, the order in which the categories of a nominal variable appear is either arbitrary or a matter of preference; nothing precludes rearranging the categories in some <em>other</em> way, so a case that ends up in the middle of one arrangement would not necessarily be in the middle of another. A case belonging to <em>any</em> category can easily end up being the middle one. A statistic shouldn't depend on such chance/preference; as such, <strong>nominal variables have no median</strong>.

&nbsp;

<span style="font-size: 14pt;text-indent: 18.6667px">On the other hand, as you know by now, ordinal and interval/ratio variables do have an inherent order arranging their categories/values. They have a "beginning" and an "end", and therefore a "centre". As such, </span><strong style="font-size: 14pt;text-indent: 18.6667px">the median applies (only) to ordinal and interval/ratio variables</strong><span style="font-size: 14pt;text-indent: 18.6667px">.</span>

&nbsp;

Note that while the mode refers to a <em>category</em> (the one containing the largest number of cases), the median is determined by the <em>case</em> (observation) that falls in the middle of the category-ordered listing of all cases. Thus <strong>it's not the middle category that is the median</strong>; depending on the size of the categories, the median <em>case</em> can belong to any category/value. <strong>The median category/value is the one to which the middle case belongs.</strong> Presented this way, the explanation undoubtedly sounds as clear as mud, but do not despair.  It will get better once we establish the manner in which we obtain the median, so trust me and read on.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.2 (A) Three Students, Five Students, Eight Students by Year of Study, Counting</em></p>

</header>
<div class="textbox__content">

<em>N=3</em>

&nbsp;

a) Let's say we have three students at different levels of their studies: one is a first-year, the second one a fourth-year, and the third a third-year. Before we do anything else, we need to establish the correct order. We rearrange the students properly:

&nbsp;

(1) a first-year student

(2) a third-year student           <em>← median</em>

(3) a fourth-year student

&nbsp;

The case in the middle is Case #2, the second one on the list (as there is one student below and one student above), i.e., the third-year student. Thus we have established that the median category is "third year of study". That is, half of the students are below the third year of study and half are above (as odd as it sounds when we only have three cases).

</div>
<em>N=5</em>

&nbsp;

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">b) What happens if I add two more students to our group, say, a first-year student and a second-year student? The order will go like this:</span>

&nbsp;

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">(1) a first-year student</span>

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">(2) a first-year student (new)</span>

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">(3) a second-year student  (new)     <em>← median</em></span>

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">(4) a third-year student</span>

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">(5) a fourth-year student</span>

&nbsp;

<span style="text-align: initial;font-size: 0.9em;text-indent: 0px">Once again, it's easy to see that the middle case is Case #3, the third one on the list (as there are two students below and two students above), i.e., the second-year student. This time around the median category is "second-year of study". That is, half of the students are below their second year of study and half are above.</span>
<div class="textbox__content">

<em>N=8</em>

&nbsp;

c) What if I complicate matters further? What if I add three more students to the group, say, two second-years and a fourth-year? Their order will be:

&nbsp;

(1) a first-year student

(2) a first-year student

(3) a second-year student                 <em>The median is between</em>

(4) a second-year student (new)      <em>← this case</em>

(5) a second-year student (new)      <em>← and this case</em>

(6) a third-year student

(7) a third-year student (new)

(8) a fourth-year student

&nbsp;

If you go by the same logic as above, you'll quickly find that there is no "middle" student: unlike before, there is now an even number of students. The middle of the group actually falls between Case #4 and Case #5, the fourth and fifth cases on the list (so that four are below and four above). Since both the fourth and the fifth students are second-years, we can conclude that, again, the median is "second year of study".  Had the fourth and fifth students been in different years of study, we would say that the median falls between their respective categories.

</div>
</div>
&nbsp;

We could continue the same way as in Example 3.2 (A) above for larger groups too: we could arrange the cases in order of their categories/values, find the middle case (or two middle cases) and report its category/value as the median. However, you can guess that this would quickly become impractical the larger the group size gets. We need some other way of finding the median, one that generalizes across groups of any size.

&nbsp;

Consider the following formula:

&nbsp;

$$\frac{N+1}{2}= $$  "<em>numbered position of the median case in the ordered list of cases</em>"

&nbsp;

where, as usual, <em>N</em> is the group size.

&nbsp;

Instead of counting, let's apply this formula to Example 3.2 (A).

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.2 (B) Three Students, Five Students, Eight Students by Year of Study, Using a Formula</em></p>

</header></header>
<div class="textbox__content">

a) <em>N</em>=3

&nbsp;

(1) a first-year student

(2) a third-year student

(3) a fourth-year student

&nbsp;

According to the formula,

&nbsp;

$$\frac{N+1}{2}=\frac{3+1}{2}=\frac{4}{2}=2 $$

&nbsp;

That is, the "numbered position of the median case in the ordered list of cases" is equal to 2; the middle case is Case #2, the second one on the list, or like we established before, the third-year student.

&nbsp;

b) <em>N</em>=5

&nbsp;

(1) a first-year student

(2) a first-year student (new)

(3) a second-year student  (new)

(4) a third-year student

(5) a fourth-year student

&nbsp;

According to the formula,

&nbsp;

$$\frac{N+1}{2}=\frac{5+1}{2}=\frac{6}{2}=3 $$

&nbsp;

That is, the "numbered position of the median case in the ordered list of cases" is equal to 3; the middle case is Case #3,  the third one on the list, or again, the second-year student.

&nbsp;

c) <em>N</em>=8

&nbsp;

(1) a first-year student

(2) a first-year student

(3) a second-year student

(4) a second-year student (new)

(5) a second-year student (new)

(6) a third-year student

(7) a third-year student (new)

(8) a fourth-year student

&nbsp;

According to the formula,

&nbsp;

$$\frac{N+1}{2}=\frac{8+1}{2}=\frac{9}{2}=4.5 $$

&nbsp;

That is, the "numbered position of the median case in the ordered list of cases" is equal to 4.5. Considering we have discrete numbers (after all, the cases are individuals), there is no case number 4.5. Instead, we say that the median falls between Case #4 and Case #5, the fourth and fifth cases on the list, or between two second-year students, so it is "second year of study".

</div>
</div>
&nbsp;

It is easy to see that we could substitute a group of any size for the <em>N</em> in the formula. Even when working with hundreds or thousands of cases, we can always use the formula to find the place (i.e., the case number) that bisects the variable's distribution into two halves.

&nbsp;
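If you like to check this kind of bookkeeping by computer, the position formula is a one-liner in code. Here is a minimal sketch in Python (the language is my choice for illustration, not part of the text) applying (N+1)/2 to the three lists from Example 3.2 (B):

```python
# A minimal sketch: the (N + 1) / 2 position formula from Example 3.2 (B).
def median_position(n):
    """Numbered position of the median case in an ordered list of n cases."""
    return (n + 1) / 2

years_3 = ["first-year", "third-year", "fourth-year"]
years_5 = ["first-year", "first-year", "second-year", "third-year", "fourth-year"]
years_8 = ["first-year", "first-year", "second-year", "second-year",
           "second-year", "third-year", "third-year", "fourth-year"]

for cases in (years_3, years_5, years_8):
    pos = median_position(len(cases))
    if pos == int(pos):                        # odd N: a single middle case
        print(pos, "->", cases[int(pos) - 1])  # positions are 1-based
    else:                                      # even N: between two middle cases
        print(pos, "->", cases[int(pos) - 1], "and", cases[int(pos)])
```

For the three lists this prints positions 2, 3, and 4.5, matching the worked results above.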

So far I only used an ordinal variable to illustrate the median. How does finding the median work for interval/ratio variables? Would it matter that interval/ratio variables have numerical values rather than qualitative categories? No, not in the least. After all, finding the median doesn't depend on the category or value of any case in any substantive sense -- only on its numbered position in the ordered list of categories/values.

&nbsp;

There is something a bit different about the way interval/ratio variables look, however, which some people find a tad more confusing when working with values rather than categories. To illustrate, I'll give you another example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.3 (A) Median for Number of Siblings, Raw Data</em></p>

</header>
<div class="textbox__content">

Imagine you talk to seven of your friends and ask them about the number of siblings they have. Let's say these are the responses you receive: 2, 1, 4, 2, 1, 0, 3. That is, two friends report having two siblings each, two friends report having one sibling each, and three of your friends report having four, zero, and three siblings each.

&nbsp;

To find the median, the first thing we need to do is put the responses in order:

&nbsp;

(1) 0

(2) 1

(3) 1

(4) 2

(5) 2

(6) 3

(7) 4

&nbsp;

Whether you visually identify Case #4 as the middle case (three cases below and three cases above it) or use the formula ($\frac{N+1}{2}=\frac{7+1}{2}=\frac{8}{2}=4$) to obtain the same result, it is clear that the median is "two siblings": half of your friends in this example have fewer than two siblings, and half have two or more siblings.

&nbsp;

</div>
</div>
&nbsp;
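The same bookkeeping can be sketched in a few lines of Python (again my illustration, not part of the original text); the standard library's `statistics.median` gives the same answer:

```python
# Sketch: median of the raw sibling counts from Example 3.3 (A).
import statistics

siblings = [2, 1, 4, 2, 1, 0, 3]
ordered = sorted(siblings)            # [0, 1, 1, 2, 2, 3, 4]
position = (len(ordered) + 1) / 2     # (7 + 1) / 2 = 4
median = ordered[int(position) - 1]   # Case #4 (1-based) has the value 2

print(median)                         # 2
print(statistics.median(siblings))    # 2 -- the library agrees
```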

What might be confusing for some people is differentiating between the numbered positions of the cases on the list and their values, since both are expressed numerically. In this example I have tried to make the distinction easier by putting the numbered positions of the cases in brackets and the values next to them (just like the categories in the ordinal example above). Thus you can see that Case #1 has 0 siblings, Case #2 has 1 sibling, etc. Had I chosen a different set of values -- for example, if Case #1 had 1 sibling, Case #2 had 2 siblings, Case #3 had 3 siblings, etc. -- you might have found it a bit harder. For that reason, make a mental note to keep clear track of what is a case's value and what is its numbered position.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>70</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:14:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:14:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-2-median]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-2-median]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.4 Mean</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-4-mean/</link>
		<pubDate>Wed, 31 Oct 2018 21:15:13 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=72</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The third, and final, measure of central tendency is one you have undoubtedly encountered before. It is one that most people have had to calculate at least a few times in their lives, and that everyone has heard reported about one thing or another. You most likely know it by its common name, <strong>the average</strong>.

&nbsp;

Recall that the measures of central tendency provide information about the typical cases, or where cases tend to centre in a variable's distribution. Thus a student's Grade Point Average (GPA) provides a measure for how well they do academically, not in one class, but <em>on average</em>, across all of them; a hockey player's points season average provides a measure of their performance on the ice not just in one game but for a whole season; a monthly average temperature gives indication of what the typical weather for a specific month is, etc. All of these averages show what is typical or expected.

&nbsp;

<strong>The mean of a variable is</strong> therefore, quite simply put, <strong>the mathematical average of the values of the variable's cases</strong>. Reported alongside the mode and the median, it provides a fuller picture of where the cases tend to cluster, or what the typical cases are. The mode does this in the simplest way, by counting how often each category or value occurs and reporting the most frequent one. The median does it by providing the most centrally located case in terms of order.

&nbsp;

<strong>Unlike the mode and the median, however, the mean takes into account the actual <em>values</em> of the cases.</strong>

&nbsp;

Keeping the last sentence in mind, do you think the mean will apply to all and any variables? If you have been paying attention, you would know that the answer is "no, of course not".

&nbsp;

Nominal and ordinal variables have categories.  <strong>Only interval/ratio variables have actual numerical values, therefore, the mean applies only to them.</strong> After all, mathematical calculations are only possible when we have <em>numbers</em> with which to do the calculations: we cannot calculate an average of gender, or of race/ethnicity, or of religious affiliation, etc.[footnote]Note that in specific cases it's possible to calculate <em>something like an average</em> for certain ordinal variables, for example, Likert scales, to the extent that their numerical labels reflect somewhat monotonic, stable-unit distances. This should be done with extreme care and ample justification, however, and beginner researchers (like you) are advised against using means for ordinal variables.[/footnote] We could, however, calculate an average age, income, score, temperature, etc.

&nbsp;

If you have ever calculated your GPA, you already know how to calculate the mean. I will still give you an example to strengthen your knowledge.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.4 (A) Mean of Number of Siblings, Raw data</em></p>

</header>
<div class="textbox__content">

&nbsp;

If you recall our Example 3.3 (A) from the previous Section 3.2 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/">https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/</a>), you imagined yourself asking seven of your friends about the number of siblings they had. We imagined the responses as follows: 2, 1, 4, 2, 1, 0, 3. We had to put these values in order to be able to find the median, but the mean works either way, whether the values are in order or not.

&nbsp;

To calculate the average number of siblings your imagined friends have, we simply add all responses together and divide them by the total number of friends, i.e., by 7:

&nbsp;

$$\frac{(2+1+4+2+1+0+3)}{7}=\frac{13}{7}=1.86$$

&nbsp;

That is, your imagined friends have 1.86 siblings on average (not quite two, but closer to two than to one sibling). We could also say that the mean of <em>number of siblings</em> is 1.86.

</div>
</div>
&nbsp;

Let's do it again, as practice makes perfect.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.5 Textbook Prices For a Semester, Raw Data</em></p>

</header>
<div class="textbox__content">

&nbsp;

Depending on the courses you take in a semester, what you pay for books will vary, but let's say we're interested in how much you pay for books in a typical semester. Perhaps you are very well organized and want to finish your degree as quickly as possible, so you have decided to take five courses per semester. For simplicity's sake, let's assume you were assigned one book per course. These are the books' prices: \$120, \$230, \$300, \$65, \$30. How much did you pay for a book on average?

&nbsp;

$$\frac{(120+230+300+65+30)}{5}=\frac{745}{5}=149$$

&nbsp;

That is, despite the fact that some of your books were expensive (like the \$300 one), and some relatively cheap (like the \$30 one), the average price you paid for a book in that semester was \$149.

</div>
</div>
&nbsp;

Now that we've seen how the mean works in practice, let's generalize what we did in the two examples above using proper notation. Fair warning: the formula below does <em>look</em> complicated but remember what we just did: our calculations were quite simple (adding all values, dividing their sum by their total number), and so is the formula. As usual, it simply restates what we've said in words in a mathematical shorthand. If you know what each symbol in the shorthand stands for, you know what the formula means. So, take a deep breath:

&nbsp;

\begin{equation}

\frac{x_1+x_2+x_3+\dots+x_N}{N}=\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\overline{x}

\end{equation}

&nbsp;

where ∑ stands for "sum"[footnote]<em>∑</em> is pronounced "SIG-ma" and is the Greek letter S.[/footnote], $\sum\limits_{i=1}^{N}$ indicates that we sum all cases from the first (1) to the last (<em>N</em>), <em>x<sub>i</sub></em> stands for any case with a number between 1 and <em>N</em>, and $\overline{x}$ indicates the mean[footnote]$\overline{x}$ is pronounced "EX-bar".[/footnote], i.e., the average of all the <em>x<sub>i</sub></em>'s. Thus, the formula simply tells you to add all values and divide by their total number, just as we did in the examples.

&nbsp;
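Translated into code, the formula is just a sum divided by a count. A minimal Python sketch (my illustration, not part of the text) reproduces the two examples above:

```python
# Sketch: mean = sum of all values / N, as in Examples 3.4 (A) and 3.5.
siblings = [2, 1, 4, 2, 1, 0, 3]
mean_siblings = sum(siblings) / len(siblings)   # 13 / 7
print(round(mean_siblings, 2))                  # 1.86

prices = [120, 230, 300, 65, 30]
mean_price = sum(prices) / len(prices)          # 745 / 5
print(mean_price)                               # 149.0
```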

So far, we have only calculated means for raw data, i.e., data not presented in a frequency table. Will the calculation of the mean be different if we had a frequency table instead? While the principle is the same, the fact that the values are grouped by frequency in frequency tables requires a slight modification to our calculations. Here's a small-scale illustration to demonstrate the principle before we do an example with a larger <em>N</em>.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.4 (B) Mean for Number of Siblings, Aggregated Data</em></p>

</header>
<div class="textbox__content">

&nbsp;

Arranging the raw data from Example 3.4 (A) above, we again get the following table.

</div>
<em>Table 3.3 Frequency Table for Number of Siblings</em>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 50%;height: 105px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px;text-align: center"><strong>Value</strong></td>
<td style="width: 2.849%;height: 15px;text-align: center"><strong>Frequency</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">0</td>
<td style="width: 2.849%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">1</td>
<td style="width: 2.849%;height: 15px">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">2</td>
<td style="width: 2.849%;height: 15px">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">3</td>
<td style="width: 2.849%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">4</td>
<td style="width: 2.849%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>Total</strong></td>
<td style="width: 2.849%;height: 15px"><strong>7</strong></td>
</tr>
</tbody>
</table>
According to the formula for the mean, we need to add all values together and then divide their sum by their total number. When the values are disaggregated (i.e., raw), we can proceed to adding them up right away. However, when they are grouped by frequency, we first need to multiply each value by its respective frequency, and then add the value-times-frequency products together, before dividing the sum by the total number of cases, like this:

&nbsp;

$$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{(0+1+1+2+2+3+4)}{7}=\frac{0(1)+1(2)+2(2)+3(1)+4(1)}{7}=\frac{13}{7}=1.86=\overline{x}$$

&nbsp;

Again, the average number of siblings of these seven friends is 1.86, as previously calculated.

</div>
</div>
&nbsp;
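The value-times-frequency step translates directly into code. Here is a minimal Python sketch (my illustration, not part of the text) of the mean from the aggregated data in Table 3.3:

```python
# Sketch: mean from a frequency table (value -> frequency), Table 3.3.
freq = {0: 1, 1: 2, 2: 2, 3: 1, 4: 1}

total_n = sum(freq.values())                        # 7 cases in total
weighted_sum = sum(v * f for v, f in freq.items())  # 0(1)+1(2)+2(2)+3(1)+4(1) = 13
mean = weighted_sum / total_n
print(round(mean, 2))                               # 1.86
```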

Now let's apply the same principle to a new, larger-<em>N</em> example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.6 Age of Classmates, Aggregated Data</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine you are doing a survey for one of your class assignments and one of the questions is about age. You aggregate the data by frequency and you get the following table.

&nbsp;

<em>Table 3.5 Frequency Table for Age of Classmates</em>
<table style="border-collapse: collapse;width: 50%;height: 135px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px;text-align: center"><strong>Value</strong></td>
<td style="width: 23.4743%;height: 15px;text-align: center"><strong>Frequency</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">19</td>
<td style="width: 23.4743%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">20</td>
<td style="width: 23.4743%;height: 15px">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">21</td>
<td style="width: 23.4743%;height: 15px">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">22</td>
<td style="width: 23.4743%;height: 15px">8</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">25</td>
<td style="width: 23.4743%;height: 15px">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">27</td>
<td style="width: 23.4743%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px">35</td>
<td style="width: 23.4743%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 24.3218%;height: 15px"><strong>TOTAL</strong></td>
<td style="width: 23.4743%;height: 15px"><strong>35</strong></td>
</tr>
</tbody>
</table>
By the formula, we have:

$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{19(1)+20(10)+21(12)+22(8)+25(2)+27(1)+35(1)}{35}=\frac{19+200+252+176+50+27+35}{35}=\frac{759}{35}=21.69=\overline{x}$

&nbsp;

Now you know that the average age of your classmates in that class is 21.69 years, or a bit less than 22 years.

</div>
</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>72</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:15:13]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:15:13]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-4-mean]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-3-mean]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[3-3-mean]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>4.1 Range</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-1-range/</link>
		<pubDate>Wed, 31 Oct 2018 21:19:43 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=75</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Providing the <em>range</em> for a set of values is so easy, most people don't even realize it is an actual statistical measure of dispersion. If you have ever said something to the effect of "I have friends whose ages vary between seventeen and twenty-seven" or "my scores on these exams vary from 25/100 to 95/100", etc., you have effectively been providing the range of your friends' ages or the range of your exam scores.

&nbsp;

To give you the more technical definition, <strong>the range of a variable is the difference between its highest and lowest values</strong>. That is, to get the range, we simply subtract the lowest value from the highest value:

&nbsp;

$$x_{max}-x_{min}= range$$

&nbsp;

In the two quick examples above, the range of your friends' ages would be (27-17=) 10 years, and the range of your exam scores would be (95-25=) 70 points.

&nbsp;

I'll use an older, familiar example for the longer work-through, below.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 4.1 The Range for Textbook Prices Paid in One Semester</em></p>

</header>
<div class="textbox__content">

Recall Example 3.5 from Section 3.4 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/3-4-mean/">https://pressbooks.bccampus.ca/simplestats/chapter/3-4-mean/</a>) where we calculated the mean price of textbooks we imagined you paid in a particular semester. The books' prices were \$120, \$230, \$300, \$65, \$30. The cheapest book (i.e., the lowest value, $x_{min}$) was \$30 and the most expensive book (i.e., the highest value, $x_{max}$) was \$300. Thus

&nbsp;

$$x_{max}-x_{min} = 300 - 30 = 270 = range$$

&nbsp;

That is, now we have found that the range of textbook prices for that semester was \$270, with prices you paid ranging between \$30 and \$300.

</div>
</div>
&nbsp;
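Because the range uses only the two extremes, the computation is a single subtraction. A quick Python sketch (my illustration, not part of the text) of Example 4.1:

```python
# Sketch: range = highest value minus lowest value, Example 4.1.
prices = [120, 230, 300, 65, 30]
price_range = max(prices) - min(prices)   # 300 - 30
print(price_range)                        # 270
```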

One thing to note here is that in order to have a difference, i.e., in order to be able to do a mathematical operation like subtraction, we need to have numerical values.

&nbsp;

In truth, as you are about to see, <em>all</em> measures of dispersion are obtained through mathematical operations and, as such, require numerical values. Since interval/ratio variables are the only variables which contain actual numerical values, <strong>all dispersion measures (including the range) are only applicable to interval/ratio variables</strong>.[footnote]Some people find it useful to provide <em>something like a range</em> for ordinal variables: after all, they do have a "lowest" category and a "highest" category. While technically not a statistical measure of dispersion (as no difference can be computed), it can still be useful to add a description of the categories ranging between the lowest and highest points, e.g., "respondents' agreement with the statement varies between 'strongly disagree' and 'strongly agree'". Considering that the categories of nominal variables have no inherent order, nothing of the sort can be applied to them. All in all, providing a qualitative description of dispersion for ordinal variables (like the agreement one I just mentioned) is optional and, strictly speaking, not a statistical measure.[/footnote]

&nbsp;

A final point about the range is that it is a rather unsophisticated measure of dispersion, as you have already noticed. (Hence the very short section about it.) <strong>By taking into account solely the highest and the lowest values, the range effectively ignores all other values</strong>, be they more clustered or more spread out.

&nbsp;

After all, if you recall from Section 3.6 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/3-6-outliers/">https://pressbooks.bccampus.ca/simplestats/chapter/3-6-outliers/</a>), outliers do exist. In the presence of outliers, the range can end up being quite large, even if the majority of the observations are closely clustered. Therefore, we'd better find a dispersion measure which takes into account more than just the two extremes of a variable's distribution.

&nbsp;

The <em>interquartile range</em> is one such measure which provides a bit more information about the variability of the distribution. Alas, the cost of this information is, of course, an increased complexity in obtaining that measure. (An ominous foreshadowing for what's to come!)

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>75</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:19:43]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:19:43]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[4-1-range]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[3-1-range]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>4.3 Variance</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-3-variance/</link>
		<pubDate>Wed, 31 Oct 2018 21:24:07 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=79</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Similarly to how the median is about the central position of a case while the mean is about the average of actual numerical values, the range and the interquartile range are about positions in the overall (ordered) distribution of cases, while the remaining two dispersion measures -- the variance and the standard deviation -- are about averaging numerical values.

&nbsp;

Thus, like the mean, the variance and the standard deviation account for <em>all</em> cases, not just a select few. Unlike the mean, however, instead of calculating the <em>average of all values</em>, <strong>the standard deviation and variance calculate (approximately) <em>the average of the distances of each and every value to the mean</em>.</strong>

&nbsp;

The mean is a measure of central tendency, as you know by now, and it represents a sort of "centre" of the data, <em>value</em>-wise (as opposed to <em>position</em>-wise, which is what the median is). You know that all cases' values enter the calculation of the mean (after all, we sum all values and divide the sum by their total number to get the mean), but, at the same time, the values are <em>different</em> from the mean. (That is, either all are different, or all but one -- it's possible that one of the values equals the mean exactly, in which case its difference is zero.)

&nbsp;

This difference between a value of a case and the mean is what we call <em>distance to the mean</em>. We average these distances (by adding the distances of all cases' values together and dividing by their total number) to obtain the variance and the standard deviation. Once we have these dispersion measures, we'll be able to tell how <em>all</em> cases are spread out around the mean. This, in turn, gives us information about how much <em>variability</em> there is in a given variable's cases -- whether they are dispersed or clustered together.

&nbsp;

You'll be glad to know that the variance and the standard deviation are calculated in almost the exact same way;  the standard deviation needs just one additional mathematical operation after getting the variance. In a sense, they calculate the same thing but are expressed differently, and the standard deviation is usually considered easier to interpret.

&nbsp;

This is all the good news I have for you at this point, I'm afraid, as what follows is a calculation process containing several steps. On the whole, it may look complicated though it really isn't; the key is to not forget what you are doing and where you are in the process. If you find yourself losing track, simply go back and start from the beginning, paying attention to what steps you go through.

&nbsp;

<strong>Variance</strong>. Since we want an average of the distances of the cases from the mean, it makes sense to start with getting these distances as Step 1. Step 2 would be to add these distances together, and Step 3 would be to divide the sum by their total number. This is easier said than done, as you shall see (ominous foreshadowing!), so I'll divide Step 1 into two sub-steps, Step 1A (getting the distances) and Step 1B (a procedure I'll keep as a mystery for now).

&nbsp;

As usual, we'll do all this through an example. For simplicity's sake, I'll reuse Examples 4.2/4.3 from the previous section which we used to introduce the concept of IQR.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example 4.4 (A) <em>Weekly Hours Worked,</em> Revisited</p>

</header>
<div class="textbox__content">

&nbsp;

If you recall, we had imagined you as a research assistant (RA) on a research project and you had worked 20 weeks in total in the last two semesters, ten weeks in each semester. The maximum hours per week you could work was 15, limited by the nature of your contract.

&nbsp;

As there are a lot of calculations to be done, to simplify our job, let's imagine further that we're interested in only one of the two semesters you had worked, and these are only the hours in the <em>ten</em> weeks of that one semester:

&nbsp;

3, 3, 5, 7, 8, 10, 12, 12, 13, 14

&nbsp;

Considering that <strong>for Step 1A we need the distances of each of these ten values to the mean</strong>, we'll calculate the mean as a preliminary requirement.[footnote]Since <em>N</em>=10 or more makes for quite the long equations if the values are listed (summed) one by one separately, from now on I will group values by frequencies in the calculations I do as a matter of principle. (I.e., instead of <em>3+3</em>, here I have <em>(3)2</em>, instead of <em>7+7+7</em>, I would have <em>(7)3</em>, etc.) Coincidentally, this is exactly what we do when working with data organized in a frequency table.[/footnote]

&nbsp;

\begin{equation*}
\begin{aligned}
&amp; \frac{\sum\limits_{i=1}^{N}{x_i}}{N} = \\
&amp;= \frac{(3)2+5+7+8+10+(12)2+13+14}{10} = \\
&amp;=\frac{(6+5+7+8+10+24+13+14)}{10}=\frac{87}{10}=8.7=\overline{x}
\end{aligned}
\end{equation*}

&nbsp;

Armed with the mean of 8.7 hours, we can now proceed to calculate the distance of every value to the mean (i.e., subtract the mean from each value to obtain the difference). I list the values and their respective distances from the mean in the table below.

&nbsp;

<em>Table 4.3 Step 1A Calculating Distances To the Mean</em>
<table class="lines" style="border-collapse: collapse;width: 100%;height: 165px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center"><strong>$x_i$</strong></td>
<td style="width: 50%;height: 15px;text-align: center"><strong>$(x_i - \overline{x})$</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">3</td>
<td style="width: 50%;height: 15px;text-align: center">(3 - 8.7) = -5.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">3</td>
<td style="width: 50%;height: 15px;text-align: center">(3 - 8.7) = -5.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">5</td>
<td style="width: 50%;height: 15px;text-align: center">(5 - 8.7) = -3.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">7</td>
<td style="width: 50%;height: 15px;text-align: center">(7 - 8.7) = -1.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">8</td>
<td style="width: 50%;height: 15px;text-align: center">(8 - 8.7) = -0.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">10</td>
<td style="width: 50%;height: 15px;text-align: center">(10 - 8.7) = 1.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">12</td>
<td style="width: 50%;height: 15px;text-align: center">(12 - 8.7) = 3.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">12</td>
<td style="width: 50%;height: 15px;text-align: center">(12 - 8.7) = 3.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">13</td>
<td style="width: 50%;height: 15px;text-align: center">(13 - 8.7) = 4.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 50%;height: 15px;text-align: center">14</td>
<td style="width: 50%;height: 15px;text-align: center">(14 - 8.7) = 5.3</td>
</tr>
</tbody>
</table>
Again, as usual, $x_i$ is the value of each and any Case #$i$ (from 1 to 10), and $(x_i - \overline{x})$ is the distance (i.e., difference) between the value of each and any Case #$i$ (from 1 to 10) and the mean.

</div>
</div>
&nbsp;

Now if we were to jump directly to Step 2 (summing all distances together) and Step 3 (dividing by the total number), we would be in trouble. You see, since the mean averages all values and provides a "centre" of the variable's distribution value-wise, <strong>the total distance of the values below the mean equals the total distance of the values above the mean, albeit with the opposite sign.</strong>

&nbsp;

That is, summing all distances of the values <em>below</em> the mean (i.e., the negative differences) would equal the sum of all distances of the values <em>above</em> the mean (i.e., the positive differences). <strong>As one sum is negative and the other positive (but with the same <em>absolute value</em>[footnote]The absolute value of a positive number is the number itself; the absolute value of a negative number is the number itself but without the negative sign; the absolute value of zero is zero. Absolute value is denoted with two straight vertical lines. For example, the absolute values of -1 and 1 are equal to each other: |-1| = |1| = 1.[/footnote]), they cancel each other out -- adding them together would result in 0, every time.</strong> This is due to the very nature of the calculation of the mean; it's a mathematical inevitability.

&nbsp;

Don't believe me? Try it. The sum of the distances <em>below</em> the mean is:

&nbsp;

$$(-5.7) + (-5.7) + (-3.7) + (-1.7) + (-0.7) = -17.5$$

&nbsp;

The sum of the distances <em>above</em> the mean is:

&nbsp;

$$1.3 + 3.3 + 3.3 + 4.3 + 5.3 = 17.5$$

&nbsp;

Thus, the sum of <em>all</em> distances from the mean is

&nbsp;

$$(-5.7) + (-5.7) + (-3.7) + (-1.7) + (-0.7) + 1.3 + 3.3 + 3.3 + 4.3 + 5.3 = -17.5 + 17.5 = 0$$

&nbsp;

Told you: <em>Zero</em>. <em>Every. Time</em>.[footnote]If you're still not convinced and think that maybe I selected the numbers <em>just so</em> that the distances to their mean add up to zero on purpose, you are welcome to try this 'trick' with any set of numbers.[/footnote]
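In fact, you can let a computer do the trying for you. Here is a quick Python sketch (my addition, not part of the text); swap in any list of numbers you like and the result will still be zero, up to tiny floating-point rounding:

```python
hours = [3, 3, 5, 7, 8, 10, 12, 12, 13, 14]  # try any numbers here
mean = sum(hours) / len(hours)

# Distances (differences) from the mean, one per case
distances = [x - mean for x in hours]

# The distances always sum to zero (floating-point arithmetic
# may leave a negligible remainder on the order of 1e-15)
total = sum(distances)
print(round(abs(total), 10))  # 0.0
```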

&nbsp;

So if the sum of the distances to the mean is always zero, then what? How are we to average those distances, since dividing the sum (i.e., zero) by any <em>N</em> would give us zero? Are we to give up?

&nbsp;

The thing is, the distances (below and above the mean) only cancel each other out because we consider the distances below the mean as <em>negative</em>. This, however, is somewhat of a mathematical artifact: in real life, there is no such thing as a negative distance from one thing to another. Imagine yourself standing between two of your friends, one on your left and the other on your right. Let's assume they both stand a meter away from you: you wouldn't say that one is a negative meter away while the other is a positive meter away, would you? There are no negative and positive meters, just meters (and, well, they are always positive, as distance in the physical sense always is).

&nbsp;

Thus we are actually not interested in summing the cases' distances from the mean <em>as calculated</em>, but only in their "positive versions", ignoring the signs -- i.e., we want their <em>absolute values</em>.

&nbsp;

True, we could proceed with our Steps 1 and 2 using only positive distances. When done, this produces an actual dispersion measure called the <em>mean deviation</em> (or <em>mean absolute deviation</em>). The mean deviation is easy to understand and quite intuitive; however (and perhaps to your chagrin), it is rarely used -- specifically because we have the variance and standard deviation, which are found to be much more useful (this comes into play in inferential statistics, as you will see in the latter part of this book). Due to its unpopularity, I'll therefore skip the mean deviation -- we'll have to look for another way of getting only positive numbers for our calculation of the average distance from the mean.[footnote]For the curious souls out there (all three of them), this is what the mean deviation looks like, using the numbers from Example 4.4 (A) above. As the below-the-mean sum was -17.5 and the above-the-mean sum was 17.5, ignoring the negative signs we would get $5.7 + 5.7 + 3.7 + 1.7 + 0.7 + 1.3 + 3.3 + 3.3 + 4.3 + 5.3 = 17.5 + 17.5 = 35$. Since <em>N</em>=10, by averaging the distances we get $\frac{35}{10}=3.5$ (the mean absolute deviation). That is, the average distance of a case's value from the mean is 3.5, or, in terms of our example, your weekly hours (which ranged from 3 to 14) on average varied by 3.5 hours from the mean of 8.7 hours, across the ten weeks you worked as a research assistant.[/footnote]
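For the same curious souls, the mean absolute deviation from the footnote can also be checked in Python (again, my own illustration, using the book's numbers):

```python
hours = [3, 3, 5, 7, 8, 10, 12, 12, 13, 14]
mean = sum(hours) / len(hours)  # 8.7

# Absolute distances from the mean, ignoring the signs
abs_distances = [abs(x - mean) for x in hours]

# Mean absolute deviation: average of the absolute distances
mad = sum(abs_distances) / len(hours)
print(round(mad, 1))  # 3.5
```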

&nbsp;

Now stop and think: beside absolute values, is there another way of turning numbers positive?

&nbsp;

If you thought of squaring, good for you! A (non-zero) number squared is a positive number: $(-2)^2 = 2^2 = 4$. Thus one other way of getting around our distances-summing-to-zero problem is to <em>square</em> the distances <em>before</em> adding them up! Nifty trick, eh?

&nbsp;

Let's test how this works with our Example 4.4.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 4.4 (B) Weekly Hours Worked, Revisited</em></p>

</header>
<div class="textbox__content">

&nbsp;

A reminder: what we are trying to get is a dispersion measure giving us an average distance of the cases to the mean; something to account for the variability of<em> all</em> cases, not just a few (unlike the range and IQR). To make the calculations look more orderly, I add a third column to Table 4.3 above, one with the squared distances. Thus, our mysterious <strong>Step 1B is squaring each individual distance</strong>.

&nbsp;

<em>Table 4.4 Step 1B Squaring Individual Distances</em>
<table class="lines" style="border-collapse: collapse;width: 131.917%;height: 165px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center"><strong>$x_i$</strong></td>
<td style="width: 36.6855%;height: 15px;text-align: center"><strong>$(x_i - \overline{x})$</strong></td>
<td style="width: 61.0955%;height: 15px;text-align: center"><strong>$(x_i - \overline{x})^2$</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">3</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(3 - 8.7) = -5.7</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-5.7)<sup>2</sup> = 32.5</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">3</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(3 - 8.7) = -5.7</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-5.7)<sup>2</sup> = 32.5</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">5</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(5 - 8.7) = -3.7</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-3.7)<sup>2</sup> = 13.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">7</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(7 - 8.7) = -1.7</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-1.7)<sup>2</sup> = 2.9</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">8</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(8 - 8.7) = -0.7</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-0.7)<sup>2</sup> = 0.5</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">10</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(10 - 8.7) = 1.3</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(1.3)<sup>2</sup> = 1.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">12</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(12 - 8.7) = 3.3</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(3.3)<sup>2</sup> = 10.9</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">12</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(12 - 8.7) = 3.3</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(3.3)<sup>2</sup> = 10.9</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">13</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(13 - 8.7) = 4.3</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(4.3)<sup>2</sup> = 18.5</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">14</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(14 - 8.7) = 5.3</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(5.3)<sup>2</sup> = 28.1</td>
</tr>
</tbody>
</table>
&nbsp;

We are thus ready for <strong>Step 2: summing up the (now-squared) distances from the mean</strong>:

&nbsp;

$\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2} =(32.5)2+13.7+2.9+0.5+1.7+(10.9)2+18.5+28.1= 152.1 =$      ← <em>Sum of Squares</em>

&nbsp;

As you can see above, <strong>the sum of the squared distances from the mean is called the<em> sum of squares </em></strong>(sometimes indicated by <em>SS</em>).

&nbsp;

Finally, to get the average distance from the mean we need <strong>Step 3: to divide the sum of squares by the total number, <em>N</em></strong>:

&nbsp;

$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}= \frac{152.1}{10} = 15.21 =\sigma^2 =$      ← <em>variance</em>

&nbsp;

That is, the variance of your hours worked per week is 15.21, or the average of the squared distances from the mean is 15.21. (Note that we cannot say 15.21 <em>hours</em> as now we are working in squared units.)
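The three steps translate directly into code. Here is a Python sketch (mine, not the book's SPSS procedure) that reproduces both the sum of squares and the variance:

```python
hours = [3, 3, 5, 7, 8, 10, 12, 12, 13, 14]
N = len(hours)
mean = sum(hours) / N  # 8.7

# Steps 1A and 1B: distance of each value from the mean, squared
squared_distances = [(x - mean) ** 2 for x in hours]

# Step 2: the sum of squares (SS)
ss = sum(squared_distances)

# Step 3: divide by N to get the variance
variance = ss / N
print(round(ss, 1), round(variance, 2))  # 152.1 15.21
```

(The standard library's statistics.pvariance(hours) computes the same population variance in one call.)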

&nbsp;

</div>
</div>
&nbsp;

And this is it, the <em>variance</em>. It is denoted by the lowercase Greek letter <em>s</em>, i.e. <em>σ</em>,[footnote]It is pronounced SIG-ma, just like Σ which is the capital Greek letter <em>S</em>.[/footnote] and, since it's in squared units, actually <em>σ<sup>2</sup></em> (SIG-ma-squared).[footnote]An alternative notation for variance you might encounter is <strong>var(<em>x</em>)</strong> where <em>x</em> is the variable in question.[/footnote]

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>79</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:24:07]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:24:07]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[4-3-variance]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[3-2-standard-deviation-and-variance]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[4-2-variance-and-standard-deviation]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[4-2-variance]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.7 Central Tendency and the Levels of Measurement</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-7/</link>
		<pubDate>Wed, 31 Oct 2018 21:30:48 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=85</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

This chapter introduced a lot of new concepts and terminology so a recap is in order. The three measures of central tendency -- the mode, the median, and the mean -- provide information about the so-called "centre of gravity" of a variable's distribution, or where the cases tend to cluster. <strong>The <em>mode</em> provides the most frequent category/value; the <em>median</em> provides the middle point/"centre" of the data and bisects the distribution into two equal parts; and the <em>mean</em> is the mathematical average of the values.</strong>

&nbsp;

One thing worth repeating is the caveat about the appropriateness of each of the measures of central tendency given the level of measurement of the variables at hand. Below is a quick, "cheat sheet" type of <strong>a table summarizing which central tendency measures are appropriate for which levels of measurement</strong>.

&nbsp;

<em>Table 3.8 What Central Tendency Measures to Report for The Different Types of Variables</em>
<table class="shaded" style="border-collapse: collapse;width: 100%;height: 68px" border="0">
<tbody>
<tr style="height: 17px">
<td style="width: 25%;height: 17px"></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>Nominal Scale</strong></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>Ordinal Scale</strong></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>Interval/Ratio Scale</strong></td>
</tr>
<tr style="height: 17px">
<td style="width: 25%;height: 17px"><strong>Mode</strong></td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
</tr>
<tr style="height: 17px">
<td style="width: 25%;height: 17px"><strong>Median</strong></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>-</strong></td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
</tr>
<tr style="height: 17px">
<td style="width: 25%;height: 17px"><strong>Mean</strong></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>-</strong></td>
<td style="width: 25%;height: 17px;text-align: center"><strong>-</strong></td>
<td style="width: 25%;height: 17px;text-align: center">♦</td>
</tr>
</tbody>
</table>
&nbsp;

<strong>In other words, the mode is appropriate for all variables, regardless of their level of measurement; the median works only with ordinal and interval/ratio variables; and the mean can be calculated only for interval/ratio variables. </strong>

&nbsp;

I'll also restate it in terms of the variable type: <strong>nominal variables have only a mode; ordinal variables a mode and a median; and interval/ratio variables have all three measures of central tendency.</strong>
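If you happen to be following along in Python rather than SPSS, all three measures are available in the standard library's statistics module. (A side note of mine, with made-up numbers.)

```python
import statistics

# A small interval/ratio variable: ages in years (all three measures apply)
ages = [19, 20, 20, 21, 22, 24, 28]

print(statistics.mode(ages))    # most frequent value: 20
print(statistics.median(ages))  # middle value: 21
print(statistics.mean(ages))    # average: 22
```

Note that, much like SPSS, nothing stops you from feeding these functions the numeric labels of nominal categories -- the logical check for appropriateness is still yours to make.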

&nbsp;

In terms of working with SPSS, as usual, it is <em>you</em> who makes the decision to request modes, medians, and means. You can either memorize the above Table 3.8, or, better yet, understand the logic behind each central tendency measure to know whether it's logically possible to apply it to a variable of a given scale -- but in either case, SPSS will not make the decision for you.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!!</strong></span>  #9... for Trusting SPSS to Provide Only Appropriate Measures</em></p>

</header>
<div class="textbox__content">

&nbsp;

SPSS cannot tell you the appropriate central tendency measures for a specific variable. Sometimes, if you make a mistake, depending on the mathematical procedure requested, SPSS might be genuinely unable to execute a command, which will alert you to the fact that you have made an error. <strong>However, in many cases SPSS will execute a command and will produce output, regardless of whether the command makes logical sense or not.</strong>

&nbsp;

To your bad luck, the measures of central tendency (and, as we will see in the next chapter, the measures of dispersion) are exactly one of these cases where SPSS will produce <em>any</em> measure of central tendency for <em>any</em> variable you ask of it. Thus, for example, if you request a mean for <em>race/ethnicity</em>, or a median for <em>religious affiliation</em>, it will execute the commands and give you what you asked for: it will produce numbers (which, if you remember, stand for the numerical labels of the categories). It will then be up to you to interpret those numbers.

&nbsp;

This, however, would be a logical impossibility -- there is no average <em>race/ethnicity</em>, nor a "centre value" for <em>religious affiliation</em>. You would have made a mistake, and SPSS would have let you have your meaningless output.

&nbsp;

This basically illustrates the saying "garbage in, garbage out": if you input nonsense, the output will be nonsensical too. It thus falls on you to not input nonsense and to not request measures of central tendency for variables for which they are inappropriate.

&nbsp;

</div>
</div>
&nbsp;

Results aside, properly communicating findings is also very important. Even when output is produced correctly, your job is still not done: you still have to interpret the results and communicate what you have found. Considering that people in general (including in the social sciences) are variously trained in quantitative research, it is always a good idea to "translate" the more technical jargon into easily understandable, everyday language.

&nbsp;

Specifically about descriptive statistics like the measures of central tendency we explored in this chapter, or the measures of dispersion in Chapter 4, the goal is to communicate your findings not only about <em>variables</em> and <em>measures</em> and <em>modes</em>, etc. but to explain what you have found in terms of <em>people</em> (or whatever units of analysis you happen to work with). Thus, "the mode of <em>religious affiliation</em> is..." becomes "the most frequently reported religious affiliation is..." or even "respondents most frequently identified as ... in terms of their religious affiliation". (As well, getting into the habit of "translating" variable-centric jargon into people-centered statements is a good practice for your understanding of the material.)

&nbsp;

Finally, a related issue is remembering to use the variable's units of measurement when communicating results. To give a few examples, the median of <em>number of siblings</em> is measured in "siblings", the mean of <em>income</em> is measured in "dollars", the mode of <em>age</em> is measured in "years", etc. If you know the unit of measurement of the variable you describe (and you should), use it: a median age is never, say, 20; it's 20 <em>years</em>.

&nbsp;

With this done, we now turn to the last set of measures used to describe variables, namely measures of dispersion.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>85</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:30:48]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:30:48]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-7]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[2-4-spss]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[3-4]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>4.5 Summary</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-5-summary/</link>
		<pubDate>Wed, 31 Oct 2018 21:31:43 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=87</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

It sure feels like we've covered a lot! You might need a recap. You will find it below.

&nbsp;

The measures of dispersion tell us how a variable's cases are distributed: whether they are more tightly clustered together, or more loosely spread out. After all, it's perfectly possible to have two variables with the same central tendency measures but with different measures of dispersion!

&nbsp;

There are four measures of dispersion that are typically used: range, interquartile range (IQR), variance, and standard deviation. While the former two are simple and account for the dispersion of cases only through the positioning of a few cases in the (ordered) distribution, the latter two employ <em>all</em> cases' values to produce somewhat more complicated and comprehensive measures of a variable's spread.

&nbsp;

The range reports the difference between the highest and the lowest values. The IQR provides the same but for the middle half of the cases. The variance calculates <em>something like</em> an average of the squared distances of all cases from the mean (in squared terms), while the standard deviation, through square-rooting the variance, provides us with an almost-average of the distances of all cases from the mean (in standard -- i.e., <em>regular</em> -- units). Generally, the larger the measures of dispersion, the more <em>variability</em> the variable has.
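For those who like to verify with code, here is a Python sketch (my addition, reusing the hours data from Example 4.4) computing all four dispersion measures. Be aware that there are several conventions for locating quartiles, so a hand-calculated IQR may differ slightly from what statistics.quantiles returns.

```python
import statistics

data = [3, 3, 5, 7, 8, 10, 12, 12, 13, 14]  # Example 4.4's weekly hours

# Range: highest value minus lowest value
value_range = max(data) - min(data)
print(value_range)  # 14 - 3 = 11

# IQR: the spread of the middle half of the cases
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q3 - q1)

# Variance and standard deviation (population versions, dividing by N)
print(statistics.pvariance(data))
print(round(statistics.pstdev(data), 2))
```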

&nbsp;

Finally, as they all require numerical values, all measures of dispersion are applicable only to interval/ratio variables: we cannot provide dispersion measures for nominal or ordinal variables.

&nbsp;

With this, we have the full range of measures to describe variables: we not only learned how to graph variables to see their distribution visually, but also to calculate how their cases cluster (through the three measures of central tendency, the mode, the median, and the mean) and how the cases can spread (through the four measures of dispersion, the range, the interquartile range, the variance, and the standard deviation).

&nbsp;

We also learned that while we can graph all types of variables, the measures of central tendency and dispersion vary in their applicability depending on a variable's level of measurement. <strong>While the mode applies to all variables, and the median to ordinal and interval/ratio variables, the mean, the range, the IQR, the variance, and the standard deviation apply <em>only</em> to interval/ratio variables. </strong>Keep this in mind when deciding what kind of information to provide about a specific variable.[footnote]Again, do not trust SPSS to make that decision for you: it cannot and it will not.[/footnote]

&nbsp;

Before we continue inching toward inferential statistics, starting with the normal curve and the basics of probability in Chapter 5, here is a handy list of things you should know before proceeding further.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title" style="text-align: center"><strong>What You Need To Know So Far</strong></p>

</header>
<div class="textbox__content">
<ul>
 	<li>How to visually display a variable's distribution (i.e., how to graph variables) and the proper graph for each variable type depending on level of measurement;</li>
 	<li>How to display a variable's distribution in a tabular format, specifically how to create and how to read frequency tables;</li>
 	<li>What the central tendency measures are, how many and what they are, their applicability to variable types depending on level of measurement, and what methods there are to obtain them (including calculation);</li>
 	<li>What the measures of dispersion are, how many and what they are, their applicability to variable types depending on level of measurement, and what methods there are to obtain them (including calculation);</li>
 	<li>What outliers are and how they affect the central tendency and dispersion measures, and what makes a more appropriate measure of central tendency or dispersion in the presence of outliers.</li>
 	<li>How to interpret graphs, frequency tables, measures of central tendency, and measures of dispersion both by using statistical jargon and <em>without</em> using statistical jargon. (You should be able to explain what any of these concepts are and what they mean to someone not trained in statistics.)</li>
 	<li>Finally, to use proper and precise vocabulary to express yourself both orally and in writing when discussing statistics concepts -- including <em>variables, measurement, operationalization, levels of measurement, units of analysis, units of measurement, etc.</em></li>
 	<li><strong>Hint/Warning: If any of the above gives you trouble, go back and reread the relevant section. Proceeding further with gaps in your knowledge will only make things worse. (There is no hope that by reading the more complicated material which follows you will suddenly learn/understand the things discussed so far!)</strong></li>
</ul>
</div>
</div>
&nbsp;

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>87</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:31:43]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:31:43]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[4-5-summary]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[3-4-spss]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[4-4-summary]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[4-3-summary]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.1 Populations and Samples</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-1-populations-and-samples/</link>
		<pubDate>Wed, 31 Oct 2018 21:37:23 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=91</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Before we start, yet another word of warning: what follows is only a brief overview of the topic of sampling and types of sampling. What I offer is enough in terms of a necessary background to statistical inference -- but the main learning objective here <em>is</em> inference, <em>not</em> everything there is to know about sampling methods and their intricacies. Thus, if this is the first time you encounter the concept, you would be better served to read a thorough introduction on sampling and the benefits and downsides of the different sampling methods in virtually any research methods textbook, as it would give a more comprehensive treatment than I do here.

&nbsp;

With that in mind, onward to the preliminaries: populations and samples.

&nbsp;

In the introduction to this chapter, I asked a question: <em>Do Canadians approve of immigration?</em> How, do you think, we can go about answering it?

&nbsp;

Presumably, the simplest way to investigate this would be <em>to simply ask</em> -- imagine we contacted everyone and, indeed, simply asked them whatever version of the question we have decided on (i.e., whichever way we have operationalized our variable, <em>attitudes to immigration</em>), noting everyone's responses. Many governments, both historically and to this day, have employed this method of gathering information.

&nbsp;

<strong>When we gather information from everyone in whom we are interested, we are doing a <em>census</em>.</strong> You probably know that the Government of Canada, through Statistics Canada, conducts a census of the Canadian population every five years. (You might have even filled out the form yourself, if you are of age, or seen your parents do it.) Then, can the government (or any researcher/agency, for that matter) collect information about everything it might need or want through censuses, every time the information is required?

&nbsp;

Theoretically, it's an option. In practice, no way: it would be prohibitively expensive. You might find the reason prosaic, but any research is limited by the availability of resources, money <em>and</em> time. Asking one additional question on a questionnaire to one additional person has costs, which add up quickly the more questions and the more people are included in the study. Thus, censuses of the population are enormous undertakings reserved for collecting only <em>really</em> important (typically demographic) information, and are usually quite limited in scope.[footnote]For more information on the Canadian census program see here: https://www12.statcan.gc.ca/census-recensement/index-eng.cfm[/footnote][footnote]Censuses of the population are so expensive, some governments cannot afford to do them (or at least not regularly) and instead rely on survey data from samples. As well, in some places censuses can be fraught with controversies due to racial/ethnic and/or religious tensions, etc. and are therefore avoided. (REFERENCE Weeks 2015).[/footnote]

&nbsp;

<span style="text-indent: 1em;font-size: 14pt">Given that conducting censuses for everything anyone (researches, governments, etc.) might want information on is generally impractical/unfeasible, what can be done when information about a population is needed? </span>

&nbsp;

<span style="text-indent: 1em;font-size: 14pt">Here is where statistics saves the day: with probability theory and inferential statistics, we can use the next best thing to a census -- <em>random-sample surveys</em>! </span><span style="text-indent: 1em;font-size: 14pt">My job in this chapter will be to convince you that you don't need to do a census of the population you want to study as long as you have a well-selected sample.</span>

&nbsp;

You, undoubtedly, have taken a survey at some point in your life in one form or another: a survey for which you were selected/invited, or for which you volunteered; one which included other people but definitely not <em>everyone</em>. In other words, unless we are discussing a census, surveys typically are administered to <em>samples</em> (i.e., sub-groups) of the population. However, not all surveys are created equal: those that can "substitute" for the population, as it were, rely on the just-mentioned technique of <em>random sampling</em>.

&nbsp;

But first off, let's establish what samples and populations really are. While it's intuitive to think of <em>population</em> as the population of a country (say, 36.7 mln. Canadians), and of <em>sample</em> as a sub-group of that population (say, ten thousand Canadians), this is only a special case of the general terms <em>sample</em> and <em>population</em>. <strong>In research, a <em>population</em> is a group encompassing everyone on whom we want information, i.e. everyone (or everything) we want to study.</strong> Considering that we might not be studying people (recall that the units of analysis can be countries, organizations, etc.), we say that <strong>a population encompasses all elements under study</strong>. This means that we could have study populations such as "countries in South America", or "hospitals and medical clinics in Toronto", or "departments of sociology in Canadian universities", etc.

&nbsp;

As well, while the elements may be people, instead of the whole population of a country, we might be interested in studying "university students in Canada," or "early childhood educators in British Columbia," or "dog walkers in downtown Vancouver," or "Telus company employees," or "dentists in Surrey, BC," etc. All of these examples are of populations that can be defined as such by researchers interested in them.

&nbsp;

Thus, <strong>a <em>sample</em> is any sub-group of the population under study</strong>. For example, if I decide to study "KPU students," my study population would be defined as "everyone registered as a student at KPU." If I select a hundred students for my study, I would have a sample of <em>N</em>=100.
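As a concrete (if simplified) illustration, here is a minimal sketch of drawing such a sample with Python's standard library; the population of 20,000 made-up student IDs is entirely hypothetical:

```python
import random

# A hypothetical study population: 20,000 made-up student IDs standing in
# for "everyone registered as a student at KPU"
population = [f"student_{i}" for i in range(1, 20001)]

random.seed(42)  # fixed seed so the illustration is reproducible
sample = random.sample(population, k=100)  # a simple random sample, N=100

print(len(sample))       # 100
print(len(set(sample)))  # 100 -- random.sample draws without replacement
```

Each element of the population has the same chance of being selected, which is the defining feature of the random sampling discussed in the next section.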

&nbsp;

Ultimately, again, <strong>what the population for a particular study is depends on what the researcher wants to study</strong>.

&nbsp;

If we go back to the <em>Do Canadians approve of immigration?</em> example, the population under study would be, of course, "Canadians" -- but we have to be very careful how we define "Canadians": Are we interested in <em>all</em> Canadians, regardless of where they live or are at the moment? (I.e., do we include ex-pats, people with dual citizenship residing abroad, Canadian tourists travelling the world, etc.?) Or do we only want to study Canadians <em>in Canada</em>? And do we want to study permanent residents in Canada too, or only people with Canadian passports? Regardless of how we define our study population, the definition has to be precise, with objective criteria that we follow consistently.

&nbsp;

Once a researcher has decided on and defined a study population, and collecting data on all elements of that population is considered unfeasible[footnote]And, as you will eventually see, collecting data on all elements of the population might even be undesirable, as it is unnecessary, even if it were feasible.[/footnote], the researcher needs to select a sample for their study.

&nbsp;

<strong>The procedure of selecting a sample is called <em>sampling</em>. There are two broad types of sampling, <em>non-random</em> and <em>random</em></strong>, and the next section is devoted to them.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>91</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:37:23]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:37:23]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-1-populations-and-samples]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-1-non-random-vs-random-sampling]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-1-non-random-vs-random-sampling]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-1-populations-and-samples]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.4. The Sampling Distribution</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-4-the-sampling-distribution/</link>
		<pubDate>Wed, 31 Oct 2018 21:37:55 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=94</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

With this section we reach a point where you'll have to make good use of your imagination and abstract thinking. Unlike our presentation and discussion of variables early on, giving real-life examples for this material becomes impossible, as the sampling distribution lies firmly in the realm of abstract mathematical concepts. Yet we need it, because it's the sampling distribution which makes inference possible and bridges the gap between a sample and the population from which it was taken.

Thus, as promised in my introduction to keep everything I present to the minimum necessary to be understandable, below I offer as non-technical and non-mathematical an explanation of what the sampling distribution is and how we use it as possible. However, this course of action has its inevitable downsides: since we are skipping the actual mathematical proofs and going directly to their results, you will have to take the presentation at my word. This is a hard thing to ask of anyone (<em>"it is what it is because I tell you so"</em>). My justification is that the vast majority of my students so far seem to find the alternative (<em>"it is what it is because of all this very long presentation of complex mathematical concepts and complicated procedures"</em>) even more unpalatable, without any gains in comprehensibility -- and, as such, ultimately mostly useless. (Of course, if interested, you can always check other, more comprehensive books and online sources.) [PERHAPS SUGGEST?]

Despite the dire warning about upcoming doom in the form of abstract concepts, I still start with an example.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Age of Classmates</em></p>

</header>
<div class="textbox__content">

Imagine you are enrolled in a class along with 49 other students, so the total class size is 50. Let's say as a class assignment (perhaps in a research methods class) you are tasked with taking a sample of your class and administering a survey to your sample. In this sense, your class is your population of interest. For simplicity's sake, we focus on one possible question, say, <em>age of respondent</em>. You want to know the average age of the study population but, instead of asking all 50 of your classmates, you draw a random sample of them for the purposes of estimating the class's average age.[footnote]Of course, with a population of only 50 in real life you can just collect the information from everyone. I'm using a small-size example for teaching purposes only and to make calculations manageable. The principle of sampling applies equally not only to a population of 50 but of any size -- and when your study population's size is in the millions, you wouldn't attempt to survey all of them (barring the already discussed case for censuses).[/footnote]

Now, despite the fact that I still haven't said anything about sample size (but we're getting there), I'll assume that a sample size of 10 (i.e., 20 percent of the population) sounds reasonable enough to you. The random draw yields the following ten classmates' ages:

19, 19, 20, 20, 20, 21, 21, 22, 23, 28

Based on these values, the average age of the sample, $\overline{x}$, is

$\overline{x}=\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{(19)2+(20)3+(21)2+22+23+28}{10}= \frac{213}{10}=21.3$

I.e., your sample's average age is 21.3 years. Considering that these ten people were randomly drawn, and that they are, well, only <em>ten</em>, can we assume that the average age of your <em>entire</em> class of 50 is 21.3 years? While this is a good -- <em>educated</em> even -- guess and a good starting point, <strong>it is unlikely that, had you polled everyone in the class, your calculation would have produced <em>exactly</em> 21.3</strong>. After all, polling 10 people is not the same as polling 50; in the latter case your calculation would include a lot more information than in the former. Thus, it's also reasonable to expect that there will be <em>some</em> difference between the mean based on the sample, $\overline{x}$, and the <em>true</em> population mean, <em>μ</em>.

Then how about if you decided to draw another random sample of ten people out of your class? Would you expect to get the exact same mean of 21.3 years? Unless you somehow end up with the exact same ten people who were in the first sample (and after Chapter X on probability you should know how minuscule that probability is), it is again unlikely you'd get the same mean. We could easily imagine that the new, second sample's ages might look like this:

18, 19, 19, 19, 20, 20, 22, 22, 24, 25

Based on these ten new values, the average age of the second sample (let's call it $\overline{x_2}$) is:

$\overline{x_2}=\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{18+(19)3+(20)2+(22)2+24+25}{10}= \frac{208}{10}=20.8$

I.e., your <em>second</em> sample's average age is 20.8 years, despite it being drawn from the same population. Your two samples (of the same size) yielded two close -- but still different -- numbers. As well, following the same logic as before, it's just as unlikely that the population mean <em>μ</em> (your class's average age) is 20.8 years as it was unlikely that it's 21.3 years (the sample is still only 10 people).

</div>
</div>
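The arithmetic of Example XX can be verified in a couple of lines; a sketch using Python's standard library:

```python
from statistics import mean

# The two random samples of classmates' ages from Example XX
sample_1 = [19, 19, 20, 20, 20, 21, 21, 22, 23, 28]
sample_2 = [18, 19, 19, 19, 20, 20, 22, 22, 24, 25]

print(mean(sample_1))  # 21.3
print(mean(sample_2))  # 20.8
```

Two same-size samples from the same population produce two different estimates of the population mean, which is exactly the point of the example.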
What then? How can we trust a sample statistic to estimate a population parameter? It appears we need more information. Before we get to that, however, let's finally address the elephant in the room -- the issue of <em>sample size</em> I have been neglecting so far.

<strong>Sample size.</strong> One reason you might think the sample estimates in Example XX above differ (both from each other and from the true population mean) could be the sample size: isn't <em>N</em>=10 just too small? The answer is of the <i>yes-but-no</i> variety: No, a sample size that's 20 percent of the population size is actually quite big for a research study of a typical, relatively large size. Yes, a sample of 10 out of a population of 50 <em>is</em> way too small. And, in general, yes, the larger the sample the better. Let's unpack -- and qualify -- all three of these contradictory pieces of information properly.

Inferential statistics -- at least the typical kind discussed in this textbook -- is about estimating <em>relatively large</em> populations; luckily, quantitative social science research most commonly deals with such populations too.[footnote]There is no magic number as to what constitutes a relatively large population, and therefore an adequate minimum requirement for a sample size. For the latter, I could offer 100; some suggest 30, others 50, but in truth all of these are more or less arbitrary. It <em>is</em> a fact that having a larger sample (in both the absolute and the proportionate sense) puts you on safer ground in terms of statistical inference (this has to do with probability theory, the law of large numbers, the sampling distribution, the normal curve, and the Central Limit Theorem discussed below, for which to work a minimum N=30 is a frequently cited number). What you can take from this is that it's better to avoid dealing with N&lt;30 (or even N&lt;100), as the tools and methods discussed in this textbook are better suited for larger sample (and population) sizes.[/footnote] <strong>The recommended sample size depends on the size of the population it will be used to estimate, but at <em>diminishing returns</em>: the larger the population, the larger the sample's <em>absolute</em> size should generally be -- but at the same time the gains of a larger sample size diminish (to zero) the larger the population is. </strong>In other words, <strong>smaller populations need samples of bigger proportion to represent them correctly, while larger populations need samples of smaller (and smaller, and smaller) proportions to do so.</strong> (This also means that even with larger and larger populations, there will be no gains in increasing the sample size beyond a certain point.)

In reality, no one would try<em> estimating</em> the parameters of a population as small as 50, as in most cases they can be easily obtained -- not to mention that to have a meaningful estimate of a population that small, one would indeed need almost the entire population as the sample. Sample size calculators are abundant and free online[footnote]You can find one example <a href="https://www.surveymonkey.com/mp/sample-size-calculator/?ut_source=help_center">here</a>, at SurveyMonkey.com (https://www.surveymonkey.com/mp/sample-size-calculator/?ut_source=help_center).[/footnote] but to give you an idea of the diminishing returns to increasing sample size I'll just list a few. To estimate a population of 200, you'll typically need a sample of about 180;[footnote]Here and from here on, "typically" refers to a frequently used <em>margin of error</em> of ±2.5%; more on what this actually means in Section XX.[/footnote] to estimate a population of 500, you'll typically need a sample of about 380; to estimate a population of 1,000, a sample of 600 would be adequate; for a population of 2,000, a sample of about 870 would work; for a population of 5,000, a sample of 1,200 would be enough; for a population of 10,000, a sample of about 1,300 would be enough... then for a population of 50,000, a sample of only about 1,500 would suffice, and a population of 100,000 would do just as well with the same number of 1,500.[footnote]You may also find a table summarizing sample sizes like <a href="https://www.research-advisors.com/tools/SampleSize.htm">this</a> one useful.[/footnote]
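The figures above can be approximated with the standard sample-size formula with finite population correction (a sketch; the ±2.5% margin of error and a 95% confidence level are assumed, matching the footnote, and online calculators may round these numbers slightly differently):

```python
import math

def required_sample_size(pop_size, margin=0.025, z=1.96, p=0.5):
    """Simple-random-sample size with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / pop_size))

for pop in [200, 500, 1000, 2000, 5000, 10000, 50000, 100000]:
    print(pop, required_sample_size(pop))
```

The printed sizes level off around 1,500 as the population grows, which is the diminishing-returns pattern described above.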

What it comes down to is that, to the surprise of many, actually<strong> a sample size of "just" 1,500 respondents can safely and accurately estimate any population of 25,000+.</strong> This also means that a random sample of 1,500 people can statistically represent, for example, both the population of Toronto (2.7+ mln. people) <em>and</em> the population of Canada (36.7 mln. people) -- however, it cannot be the <em>same</em> sample (the former needs to be drawn from Torontonians only, the latter from all Canadians).[footnote]In truth, researchers do want larger samples to represent Canada (or other countries' populations) but that's only to increase the <em>power</em> (to be defined later) of their statistical findings, not their generalizability. This desire for larger N is, of course, constrained by limited resources (time, money, etc.).[/footnote]
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><span style="color: #ff0000"><strong>Watch Out!!</strong></span>... for (Mis)Judging a Study On Its Sample Size</p>

</header>
<div class="textbox__content">

The point against judging a study on its sample size alone should be clear already but it bears repeating. When people unfamiliar with statistics encounter social-scientific reports based on studies of what they consider a "too small" sample size, they tend to dismiss the findings; they consider the "only 500 respondents" or "only 1000 cases" too few to accurately represent the population from which they were drawn, especially if the population is, in their view, disproportionately large. As you should have learned by now, the generalizability of a study is more a matter of <em>how</em> the sample is drawn, not of its size (beyond a certain point). As long as the chosen sampling method is a type of random sampling, and the sample size is adequate for the population size[footnote]At the desired -- and reported -- margin of error.[/footnote], the results of the study will be generalizable to the population -- the actual sample size doesn't matter for that, even if it may look "too small" to some.

</div>
</div>
In any event, even if from a certain point on it is unnecessary (as demonstrated above), as a logical inevitability, the closer the sample is in size to the population from which it is drawn, the smaller the difference between statistics and parameters should be. Even in Example XX above, with its imagined, only-for-illustration-purposes population of 50, getting information from 40 of your classmates instead of the 10 we used in the example should get us an average age that is closer to the true population age (of all 50). However, as a corollary, unless we obtain information from truly everyone (i.e., we do a census), <strong>in random sampling a difference between the sample statistic and the population parameter will always exist.</strong>[footnote]Well, <em>almost</em> always: it is possible (though very unlikely) that a sample will just so happen to produce the true population parameter. This too will be a result of random chance, as unlikely as it may be.[/footnote] <strong>This difference between the estimate (the statistic) and what is being estimated (the parameter) is called <em>random error</em>.</strong> Random error is <em>inevitable</em> -- no matter what we do, a sample will always only produce an estimate, never the "real thing", as it were.

<strong>The sampling distribution. </strong>Going back to Example XX above, we can extrapolate that if we randomly drew sample after sample after sample (of the same size, as long as it's adequate) an infinite number of times, and calculated mean after mean after mean, we'd get a long (well, <em>infinite</em>) list of means which would all be somewhat close to, but not exactly, the true population mean. If you can imagine this very long (infinite[footnote]For ease of imagination, I'll stick to "very long/large" from now on, but at the far back of your mind, remember it's actually infinite.[/footnote]) list of means as similar to a variable with a large number of observations, please do so; it helps. This variable you imagined (made of the very large number of means that would be produced by the very large number of samples if we took them) will have a frequency distribution just like any real variable we have discussed so far. <strong>The distribution of the variable made of the means is called the <em>sampling distribution of the mean</em>.</strong>[footnote]I provide this definition only to make understanding the sampling distribution easier. It's in no way the technical definition of the sampling distribution. As well, keep in mind that this "variable" made of the means is a perfectly imaginary heuristic device.[/footnote] However, since all this is <em>theoretical</em> (we do not take more than one sample), this distribution is not really about actual frequencies but rather about probabilities.
As such, <strong>the sampling distribution is a <em>probability distribution</em> -- it lists each (hypothetical sample's) mean's <em>probability</em> of occurring</strong>.[footnote]Compare this to flipping a coin or throwing a die: as we saw, in both cases the distribution of the <em>possible</em> outcomes (over an infinite number of flips/throws) is a calculable and known probability distribution. After all, that's why we know that the probability of getting tails or heads is 0.5 <em>in theory</em>, just like it's 0.167 for throwing any of the die's six numbers <em>in theory</em> (even if calculating actual flipped/thrown frequencies in real life yields different results).[/footnote]
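The thought experiment of repeatedly sampling a class can be simulated (a sketch; the class of 50 ages is generated at random and is purely illustrative):

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is reproducible

# An invented population: 50 classmates' ages
class_ages = [random.randint(18, 30) for _ in range(50)]
mu = mean(class_ages)  # the true population mean (knowable here only
                       # because we invented the population)

# Draw many random samples of N=10; record each sample's mean
sample_means = [mean(random.sample(class_ages, 10)) for _ in range(10000)]

# Individual sample means vary, but their distribution clusters around mu
print(round(mu, 2))
print(round(mean(sample_means), 2))
print(round(min(sample_means), 1), round(max(sample_means), 1))
```

A histogram of `sample_means` would trace out (an approximation of) the sampling distribution of the mean described above.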

In a more precise phrasing, all statistics based on samples (e.g., the mean, the median, deviations, etc., plus many others we haven't yet encountered) have a sampling distribution, which refers to their theoretical[footnote]It is theoretical because we do not actually take multiple samples, much less an infinite number of them, as there is no need: courtesy of probability theory and the Central Limit Theorem, we just <em>know</em> what <em>would</em> happen if we did.[/footnote] variability over repeated (to infinity) random samples of specific (and equal) size. What we know about the sampling distribution of sample statistics is summarized in the Central Limit Theorem, next.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>94</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:37:55]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:37:55]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-4-the-sampling-distribution]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-2-the-sampling-distribution]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-4-the-sampling-distribution]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.5. The Central Limit Theorem</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-5-the-central-limit-theorem/</link>
		<pubDate>Wed, 31 Oct 2018 21:41:36 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=99</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Despite its scary-sounding name, the <em>Central Limit Theorem</em> (CLT) simply <em>describes</em> the sampling distribution -- and simultaneously explains why, and how, we can use sample statistics (like the mean of a variable, $\overline{x}$, obtained through sample data) to estimate population parameters (like the true population mean of that variable,<em> μ</em>).

Recall what we use to describe a variable's frequency distribution: 1) a graph to visually display the distribution's shape; 2) measures of central tendency; and 3) measures of dispersion. In the previous section I also asked you to imagine the (entirely theoretical, i.e., <em>probability</em>) distribution of the mean (again, in theory, over infinitely repeated samples). What the CLT does, then, is provide information about all three of these elements (shape, central tendency, dispersion), but for the distribution of the mean. <strong>In short, the CLT describes the sampling distribution of the mean. </strong>

The sample size plays an important role: the CLT applies for "large <em>N</em>", and is stated for "as the sample size grows", bringing us back to the point that the larger the <em>N</em>, the better it is for inference (as per the law of large numbers).

Specifically, the CLT states that with random sampling, as <em>N</em> increases (i.e., for large <em>N</em>), the shape, central tendency, and the dispersion (of the sampling distribution) of the mean, $\overline{x}$, will be the following:
<ol>
 	<li>The distribution of $\overline{x}$ will approach a normal distribution in shape. (That is, the sampling distribution is a bell-shaped curve.)</li>
 	<li>The mean of the sampling distribution[footnote]You can think of it as "the mean of the means", or the mean of the hypothetical variable <em>mean</em>. [/footnote] (denoted as $\mu_\overline{x}$)  will become the population mean, $\mu$. (That is, $\mu_\overline{x}$ $=\mu$.)</li>
 	<li>The standard deviation of the sampling distribution (denoted as $\sigma_\overline{x}$) is called <em>the standard error</em>, and is related to the population standard deviation,<em> σ</em>, by the formula $\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$.</li>
</ol>
This may seem like a lot to take in (what with all the jargon, notation, and all) but it really <em>is</em> simply a description of a distribution. The next paragraph clarifies each of the CLT's points in turn.

As brief as it is, the CLT is conveniently packed with all sorts of useful information: The sampling distribution is normal in shape -- so we can apply all we know about the normal distribution to it (for example, that it's bisected by its mean). Hence, the sampling distribution is <em>centered</em> on the population mean. Finally, according to the formula for the sampling distribution's standard deviation (a.k.a. the standard error), as the sample size <em>N</em> grows, the standard error becomes smaller[footnote]After all, <em>N</em> is in the denominator.[/footnote] -- so the distribution will be less variable/spread out, and thus the estimates will be closer to the parameters.[footnote]On the flip side, the larger the original variable's dispersion, the larger the standard error; and the smaller the original variable's dispersion, the smaller the standard error (as <em>σ</em> is in the numerator).[/footnote]
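As a quick numerical illustration of how the standard error shrinks (assuming, purely for the sake of the arithmetic, a population standard deviation of σ = 100):

```latex
\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{N}}, \qquad
N=100:\ \frac{100}{\sqrt{100}} = 10, \qquad
N=400:\ \frac{100}{\sqrt{400}} = 5, \qquad
N=1600:\ \frac{100}{\sqrt{1600}} = 2.5
```

Quadrupling the sample size only halves the standard error -- another face of the diminishing returns discussed earlier.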

To summarize, the sampling distribution provides us with a bridge between sample statistics (i.e., estimators) and population parameters (i.e., the estimated). <strong>The CLT provides a description of the sampling distribution: by giving us information about an estimator in repeated sampling, it decreases the uncertainty of the estimation since now we can calculate how close the statistic is to the parameter.</strong>

I say <em>estimator</em> and <em>statistic</em>, not <em>mean</em>, because <strong>the CLT (or a version thereof) applies to all statistical estimators, as they all have a normal distribution with increasing sample size. </strong>The latter is noteworthy because<strong> it's true regardless of the shape of the original variable's distribution </strong>(in the population)<strong>: a variable might not be normally distributed but its mean (and other statistics) always is.</strong>[footnote]Many, if not most, social science type variables tend to be normally distributed in the population. The point I'm emphasizing here is that even when they are not, the statistics of these variables based on random sample data <em>are</em> normally distributed. This relates to our discussion of how large <em>N</em> should be: if the original variable's distribution in the population is close to normal to start with, a smaller <em>N</em> will be fine. On the other hand, if a variable is not normally distributed in the population (or is too widely dispersed/has a lot of outliers, as reflected in <em>σ</em>), a relatively large <em>N</em> will be needed to ensure the normality of the sampling distribution.[/footnote]
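That the sampling distribution of the mean is approximately normal and centered on <em>μ</em> even when the variable itself is not can be checked numerically; a sketch with a deliberately right-skewed, made-up population:

```python
import random
from statistics import mean, pstdev, stdev

random.seed(1)  # fixed seed so the illustration is reproducible

# A deliberately right-skewed (exponential-like) made-up population
population = [random.expovariate(1 / 40) for _ in range(20000)]
mu, sigma = mean(population), pstdev(population)

N = 100  # sample size
sample_means = [mean(random.choices(population, k=N)) for _ in range(2000)]

# CLT point 2: the mean of the sample means is approximately mu
print(round(mu, 1), round(mean(sample_means), 1))
# CLT point 3: their standard deviation approximates sigma / sqrt(N)
print(round(sigma / N ** 0.5, 2), round(stdev(sample_means), 2))
```

Note that `random.choices` draws with replacement, so each draw approximates independent sampling from the population; the population itself is strongly skewed, yet both CLT claims hold.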

If you are wondering about the connection between random sampling and the normal distribution, the following video might help:

https://youtu.be/Kq7e6cj2nDw

The video above uses a <em>Galton board</em> to demonstrate the connection between randomness and normal curves by showing that balls falling randomly end up distributed approximately into a bell-shaped curve -- with the majority in the centre, fewer to the sides, and fewer yet in the "tails". You can think of a sample mean as one of these balls (all other balls are the means of other samples of the same size). Thus, what we see is that the majority of means would fall in the centre, fewer to the sides, and fewer still in the tail ends. However, since we do not have many means at all but only one, produced by one sample, we are dealing with a probability distribution. In turn, this tells us that the highest probability is for the mean to fall in the centre region, with a smaller probability of it being to the sides but still close to the centre, and a further decreasing probability the farther it gets from the centre.[footnote]Of course, in the video you see an <em>approximation</em> of a normal curve; after all, this is a finite, not infinite, number of balls. That is why the perfectly normal distribution is only a theoretical concept.[/footnote]

If you still find all this hopelessly abstract (as I'm sure most do), you can see exactly how we use the CLT for inference in the example below. (Unfortunately, your relief at being back to examples will be premature at this point: we have more necessary theory to cover ahead. On the bright side, we are more than half-way through the chapter, so cheer up: the end is near.)

As a heads-up, here's the rationale of what we'll do: In order to explain inference about populations based on samples, we'll reverse-engineer it. That is, we'll start with "knowledge" about the population and, based on the CLT, we'll "infer" the sample statistic. At the end we'll see that, following the same logic (but in reverse), we can easily do the opposite -- to estimate the population parameter through a sample statistic -- which is exactly what we want to do in the first place.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Price of Statistics Textbooks</em></p>

</header>
<div class="textbox__content">

Let's say that university students on average spend \$250 for a statistics textbook, with a standard deviation of \$100 -- i.e., we assume to know the population parameters:

<em>μ</em> = 250 and <em>σ</em> = 100

We draw a random sample of<em> N</em>=1,600 students. We want to know the probability for that sample to have a specific mean price paid for statistics textbooks.

To get that probability, we first need the standard error, $\sigma_\overline{x}$:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}=\frac{100}{\sqrt{1600}}=\frac{100}{40}=2.5$
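The same arithmetic can be checked in a line or two of Python (a throwaway sanity check, not part of the example):

```python
import math

sigma = 100  # population standard deviation (given)
N = 1600     # sample size

# Standard error of the mean: sigma / sqrt(N)
standard_error = sigma / math.sqrt(N)
print(standard_error)  # 2.5
```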

Next, we can draw the sampling distribution: bell-shaped, centered on <em>μ</em>, and with a standard deviation (called standard error) of \$2.50. Applying what we know about the normal distribution in terms of the probability under the curve, we get the following Fig. XX.

Fig. XX <em>The Sampling Distribution of the Mean Price of Statistics Textbooks</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-250-1-and-2-se-B.jpg" alt="" width="627" height="206" class="alignnone wp-image-806 " />

&nbsp;

That is, we see that 68% of the sample mean prices of statistics textbooks in hypothetical repeated sampling would fall between \$247.5 and \$252.5[footnote]That is, 250-2.5=247.5 and 250+2.5=252.5.[/footnote] (i.e., within 1 standard error away from the mean, denoted with green in Fig. XX) and 95% of the sample means will fall between \$245 and \$255[footnote]That is, 250-2(2.5)=250-5=245 and 250+2(2.5)=250+5=255.[/footnote] (i.e., within 2 standard errors away from the mean, denoted with blue in Fig. XX). Since this is just a heuristic way to <em>imagine</em> the sampling distribution, we can restate our finding more correctly: a single, one-off sample mean will fall between \$247.5 and \$252.5 68% of the time, and between \$245 and \$255 95% of the time.

Or, even <em>more</em> precisely, we have a 68% probability that the average price paid for statistics books obtained from a random sample of 1,600 students will be between \$247.5 and \$252.5, and a 95% probability that it will be between \$245 and \$255. This means that we have a 95% chance that the sample mean, $\overline{x}$, will fall within \$5 (i.e., within an interval \$10 wide) of the population mean, <em>μ</em>.
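Where do the 68% and 95% come from? They are normal-curve probabilities, which Python's standard library can reproduce from <em>μ</em> and the standard error (a sketch of my own; the exact probabilities for 1 and 2 standard errors are 68.27% and 95.45%):

```python
from statistics import NormalDist

mu, se = 250, 2.5
sampling_dist = NormalDist(mu, se)  # the CLT's normal sampling distribution

# Probability that a sample mean lands within 1 and 2 standard errors of mu
p_1se = sampling_dist.cdf(mu + se) - sampling_dist.cdf(mu - se)
p_2se = sampling_dist.cdf(mu + 2*se) - sampling_dist.cdf(mu - 2*se)
print(round(p_1se, 4), round(p_2se, 4))  # 0.6827 0.9545
print(mu - 2*se, mu + 2*se)              # 245.0 255.0
```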

Quite good as far as predictions go, eh?

</div>
</div>
Of course, we rarely would have the population mean to go by, and we would <em>never</em> need to estimate a statistic -- usually, it's the other way around. But the sampling distribution <em>is</em> the same, as we still go by the CLT: With large <em>N</em>, it is still a normal curve. With large <em>N</em>, the sample mean, $\overline{x}$, still approaches the true population mean, <em>μ</em>. And, with large <em>N</em>, the formula for the standard error is still the same, $\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$. For statistical inference, we need only follow the logic presented in Example XX above (albeit in reverse).

However, there is one thing we normally do <em>not</em> have in order to proceed: the population standard deviation, <em>σ</em>. We typically use the sample standard deviation, <em>s</em>, as a substitute, even if this does increase the uncertainty of the estimates.[footnote]<span style="text-indent: 18.6667px;font-size: 14pt">We have a way to account for that, however, as we will see in Section XX on the <em>t-distribution</em> and the concept of <em>degrees of freedom</em></span><span style="text-indent: 1em;font-size: 14pt">.[/footnote]</span>

Then, finally, here is <strong>how inference works</strong>, in one paragraph: <strong>we use sample statistics to estimate population parameters</strong> -- i.e., the statistics we calculate based on random sample data act as statistical estimators for what we truly want to know, the unknown population parameters. <strong>We do that by the postulates of the Central Limit Theorem</strong>, which describe the sampling distribution, the bridge between the statistics and the parameters. By the CLT, we have <strong>the sampling distribution as normal.</strong> Again, by the CLT, <strong>we can center the sampling distribution on the sample mean, and calculate the sampling distribution's standard error using the sample standard deviation. By applying the properties of the normal probability distribution to the sampling distribution, we then produce population estimates.</strong> Ta-da!
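That one-paragraph recipe can be condensed into a few lines of Python (an illustrative sketch; the function name and the rounded z-value of 2 are my choices, not the book's):

```python
import math

def infer_mean(sample_mean, sample_sd, n):
    """Estimate a population mean from sample statistics, per the CLT:
    center the normal sampling distribution on the sample mean and use
    s / sqrt(N) as its standard error."""
    se = sample_sd / math.sqrt(n)
    # ~95% of the probability lies within 2 standard errors (z rounded to 2)
    return (sample_mean - 2 * se, sample_mean + 2 * se)

# With a sample mean of 250, sample SD of 100, and N=1,600:
print(infer_mean(250, 100, 1600))  # (245.0, 255.0)
```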

We'll end this section with an example to illustrate the full process from the beginning to the end.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Average Annual Income</em></p>

</header>
<div class="textbox__content">

Imagine you are interested in the average annual income in a medium-sized city. You randomly select <em>N</em>=1,600 people, and ask them about their annual income. You then calculate the mean of the resulting variable as \$50,000, and the standard deviation as \$12,000. That is,

$\overline{x}=50,000$ and <em>s</em> = 12,000

<em>As a first guess</em>, you <em>could</em> say that the average annual income in the city is \$50,000. However, since we know this is an estimate, and random error exists, you can do better: you can also provide information about how certain you are about your estimate along with some margins for error.

To do that, you need to draw the sampling distribution of the mean. Following the CLT, you draw the sampling distribution as a normal curve centered on \$50,000. At this point, you also need information about the sampling distribution's dispersion, i.e., its standard error. You substitute the <em>s</em> you do know for the <em>σ</em> you don't[footnote]Recall that a "hat" over a symbol indicates it being estimated.[/footnote]:

$\hat\sigma_\overline{x}$ $=s_\overline{x}$ $=\frac{s}{\sqrt{N}}= \frac{12000}{\sqrt{1600}}=\frac{12000}{40}=300$

Fig. XX shows the resulting sampling distribution.

Fig. XX. <em>Average Annual Income</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-2se-B.jpg" alt="" width="642" height="201" class="alignnone wp-image-811 " />

&nbsp;

Based on the figure above (and following the same logic as in the previous Example XX), you find that the average annual income of the city's population will be between \$49,400 and \$50,600 with 95% probability.[footnote]We get these bounds (i.e., within two standard errors away from the mean) through 50,000-2(300)=50,000-600=49,400 and 50,000+2(300)=50,000+600=50,600.[/footnote] That is, you can be 95% confident that the city's average annual income will be within \$600 (i.e., within an interval \$1,200 wide) of the sample average of \$50,000, or, that the city's average annual income is \$50,000 ±\$600, with 95% certainty.

</div>
</div>
You should be able to appreciate that this "average annual income of \$50,000 ±\$600" is a much more qualified and precise statement than simply assuming the population average is the same as the sample average (which it is not). <strong>Now you <em>know</em> how much potential variability the population mean has, with a specific </strong>(and quite high!) <strong>level of certainty.</strong>

This is in no way trivial, and it is the best "guess" you can offer as an estimate of the population mean. No other research method using sample data can generalize sample findings to the population level as closely, much less with the mathematical, probability-theory-backed proof offered by random sampling. This is what statistical inference does, and now you even know how and why it works!

Try some inference yourself!
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Do It! XX</p>

</header>
<div class="textbox__content">

[PLACEHOLDER FOR EXERCISE]

</div>
</div>
We are almost, but not quite, done with this abstract monster of a chapter. There is light at the end of the tunnel -- what is left is tying up some loose ends, formally introducing a concept you're already using, and providing some final details on inference in the next section -- and then we are good to go: we can start on some real research and work with variables again in Chapter 7!]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>99</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:41:36]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:41:36]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-5-the-central-limit-theorem]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-2-2-the-central-limit-theorem]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-4-1-the-central-limit-theorem]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-5-the-central-limit-theorem]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_110b05a292e9ed7c30d2a4942caf6104]]></wp:meta_key>
			<wp:meta_value><![CDATA[<iframe width="500" height="375" src="https://www.youtube.com/embed/Kq7e6cj2nDw?feature=oembed&rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_time_110b05a292e9ed7c30d2a4942caf6104]]></wp:meta_key>
			<wp:meta_value><![CDATA[1554428219]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_11b09bd00d291d087fe75d3a92d175d2]]></wp:meta_key>
			<wp:meta_value><![CDATA[<iframe width="743" height="557" src="https://www.youtube.com/embed/Kq7e6cj2nDw?feature=oembed&rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_time_11b09bd00d291d087fe75d3a92d175d2]]></wp:meta_key>
			<wp:meta_value><![CDATA[1554428220]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_c43d73ac7139c4cebcb887fc5881b2b6]]></wp:meta_key>
			<wp:meta_value><![CDATA[<iframe width="534" height="401" src="https://www.youtube.com/embed/Kq7e6cj2nDw?feature=oembed&rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_oembed_time_c43d73ac7139c4cebcb887fc5881b2b6]]></wp:meta_key>
			<wp:meta_value><![CDATA[1562971297]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.6. Confidence Intervals</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-6-confidence-intervals/</link>
		<pubDate>Wed, 31 Oct 2018 21:43:55 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=101</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

You might have noticed that in our discussion on statistical inference so far, I have used not one type of estimator but two, without bringing your attention to it. I do that now: probability theory and the Central Limit Theorem describing the sampling distribution of statistics provide us with two types of estimators, called <em>point estimators</em> and <em>interval estimators</em>.

<strong>A single sample statistic which estimates a population parameter</strong> -- one which offers a "best guess" for that parameter -- <strong>is a <em>point</em> estimator. </strong>We have worked with several point estimates by now: the sample mean $\overline{x}$ is a point estimate of the population mean <em>μ</em>, while the sample standard deviation <em>s</em> is a point estimate (which we can note as $\hat{\sigma}$) of the population standard deviation <em>σ</em>.

Similarly, I'll add another useful point estimate, that of the sample <em>proportion</em>. Imagine we are interested in studying unemployment. We take a random sample which reveals that, say, 10% of the sample respondents report being unemployed. Thus, we have the sample proportion <em>p</em> as 0.1, and we can use that proportion as a point estimate of the proportion of the population which is unemployed. We denote population proportions by the lower-case Greek letter for <em>p</em>, which is <em>π</em>[footnote]Pronounced "pie", as you probably already know from the mathematical constant <em>π</em>=3.14. While we use the letter <em>π</em> for both population proportions and the mathematical constant, context provides enough clues to differentiate them.[/footnote]. In other words, the sample proportion <em>p</em> serves as a point estimate of the population proportion <em>π</em>.

You'll be happy to know that you are also already familiar with the other, <em>interval</em>, type of statistical estimators. As their name suggests, <strong>interval estimators, called <em>confidence intervals</em>, provide not just one number as a best guess but a whole set of plausible values for the population parameter. </strong>

If you recall Example XX from the previous section, you'll recognize that we already calculated confidence intervals. In Example XX on the average annual income, we found a range of values within which the average annual income of the city population was estimated to fall. Specifically, the average annual income of the random sample was \$50,000, and we were able to estimate with 95% certainty that the average annual income of the city population would fall between \$49,400 and \$50,600. This range of values between \$49,400 and \$50,600 is in effect a confidence interval (a 95% confidence interval, to be precise). The actual numbers "bracketing" the interval are called <em>error bounds</em>; the interval itself is between, and including, the <em>lower error bound</em> and the <em>upper error bound</em>.

Up until now, we calculated the confidence interval in a fast and easy way as I wanted to get the point of the logic underlying statistical inference across. At this point, however, we need to get more technical and precise about it.

First, let's revisit how we did it in the previous section to refresh your memory; then I'll show you the <em>more</em> correct way to do it. (Before you panic, know that what we did before was not incorrect; we just used rounded numbers to make calculations easier/faster.)

This is the information about the sample mean and standard deviation we had from Example XX <em>Average Annual Income </em>(without the dollar signs for clarity of presentation):

$\overline{x}=50000$

$s=12000$

$N=1600$

Our starting point is the sample mean (which, according to the CLT, approximates the population mean with large <em>N</em>). In order to calculate a confidence interval around the sample mean $\overline{x}$, we first need to get the standard error $\sigma_\overline{x}$, given by the CLT-based formula:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$

We don't know the population standard deviation $\sigma$ but we estimate it with its point estimator <em>s</em>, so we get:

$\hat\sigma_\overline{x}$ $=s_\overline{x}$ $=\frac{s}{\sqrt{N}}$

Substituting <em>s</em> and <em>N</em> in the formula gives us the following:

$s_\overline{x}$ $=\frac{12000}{\sqrt{1600}}=\frac{12000}{40}=300$

Now, by the CLT, we have everything we need for the sampling distribution: its mean (as estimated by the sample mean $\overline{x}$), its standard deviation (i.e., the standard error $\sigma_\overline{x}$), and its shape as a normal curve. From Chapter 5, we know the probabilities under the normal curve: 68% of cases (in this case, the cases are the hypothetical means over repeated sampling) fall within 1 standard deviation from the mean, while 95% of cases fall within 2 standard deviations from the mean.

The resulting graph was presented in Fig. XX in the previous section. Here it is again, this time with the 68% demarcations included:

Figure XX <em>Average Annual Income (in thousands of dollars), Revisited </em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2se-B.jpg" alt="" width="650" height="203" class="alignnone wp-image-826 " />

&nbsp;

However, to calculate a confidence interval we don't need to draw the sampling distribution every time; we just need to keep in mind what it represents in terms of probabilities.

From our discussion of Example XX in the previous section and now, we can easily deduce the basic formula for calculating a confidence interval:
<ul>
 	<li>for a 68% confidence interval around the mean, we would have
<ul>
 	<li>$\overline{x}\pm1\times\hat\sigma_\overline{x}$</li>
</ul>
</li>
 	<li>for a 95% confidence interval around the mean, we would have
<ul>
 	<li>$\overline{x}\pm2\times\hat\sigma_\overline{x}$</li>
</ul>
</li>
</ul>
We could even add the 99% confidence interval, encompassing values within 3 standard deviations away from the mean:
<ul>
 	<li>for a 99% confidence interval, we would have
<ul>
 	<li>$\overline{x}\pm3\times\hat\sigma_\overline{x}$</li>
</ul>
</li>
</ul>
Using the data from Example XX for illustration, we then have the following confidence intervals (CI):
<ul>
 	<li>68% CI: $\overline{x}\pm1\times\hat\sigma_\overline{x}$ $=50000\pm1\times300=50000\pm300=(49700; 50300)$</li>
 	<li>95% CI: $\overline{x}\pm2\times\hat\sigma_\overline{x}$ $=50000\pm2\times300=50000\pm600= (49400; 50600)$</li>
 	<li>99% CI: $\overline{x}\pm3\times\hat\sigma_\overline{x}$ $=50000\pm3\times300=50000\pm900= (49100; 50900)$</li>
</ul>
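These rounded intervals are easy to verify with a few lines of Python (a quick check of the arithmetic above, not part of the text):

```python
import math

x_bar, s, N = 50000, 12000, 1600
se = s / math.sqrt(N)  # the estimated standard error, 300.0

# Rounded z-values of 1, 2, and 3, as in the list above
for z, level in [(1, 68), (2, 95), (3, 99)]:
    print(f"{level}% CI: ({x_bar - z*se:.0f}; {x_bar + z*se:.0f})")
```

Running this reproduces the three intervals listed above.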
Fig. XX illustrates these confidence intervals (the 99% CI is denoted only by its error bounds to avoid overcrowding).

Figure XX <em>Confidence Intervals for Average Annual Income</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-D.jpg" alt="" width="673" height="243" class="alignnone wp-image-831 " />

That is, we find that the average annual income for the city population is between \$49,700 and \$50,300 with 68% certainty; it is between \$49,400 and \$50,600 with 95% certainty; and it's between \$49,100 and \$50,900 with 99% certainty. Alternatively, we could report that the average annual income of the city population is \$50,000 ±\$300 with 68% confidence; \$50,000 ±\$600 with 95% confidence; and \$50,000 ±\$900 with 99% confidence. The plus-or-minus term (i.e., the estimated standard error $\hat\sigma_\overline{x}$ multiplied by 1, 2, or 3) represents the <em>margin of error</em> for the specific confidence interval. The margin of error of course reflects the interval's error bounds, as illustrated in Fig. XX below.

Figure XX <em>Average Annual Income, Margins of Error</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-mean-500-1-and-2-and-3se-C.jpg" alt="" width="683" height="214" class="alignnone wp-image-830 " />

Now that you understand the principle of calculating confidence intervals, let's start doing it with greater precision, as we normally would in real-life research.

Even if I used "1, 2, 3 standard deviations/errors away from the mean" in the calculations so far, this is a quick-and-easy rounding only <em>approximating</em> the real formula for a confidence interval. From Chapter 5, we know that the probabilities under the normal curve are associated with specific <em>z</em>-values. Specifically, we know that the precise <em>z</em>-values associated with 95% probability and 99% probability are 1.96 (almost but not quite 2) and 2.58 (almost but not quite 3), respectively.

Thus, even though the <em>z</em>-value associated with 68% probability is indeed (very nearly) 1, the other two confidence intervals we have used so far need to be recalculated properly:
<ul>
 	<li>68% CI: $\overline{x}\pm1\times\hat\sigma_\overline{x}$ $=50000\pm1\times300=50000\pm300=(49700; 50300)$</li>
 	<li>95% CI: $\overline{x}\pm1.96\times\hat\sigma_\overline{x}$ $=50000\pm1.96\times300=50000\pm588= (49412; 50588)$</li>
 	<li>99% CI: $\overline{x}\pm2.58\times\hat\sigma_\overline{x}$ $=50000\pm2.58\times300=50000\pm774= (49226; 50774)$</li>
</ul>
To interpret, we find that we can be 95% certain that the average annual income of the population is between \$49,412 and \$50,588. As well, we find that we can be 99% certain that the average annual income is between \$49,226 and \$50,774.

Furthermore, although going by "1, 2, 3 standard deviations/errors" makes intuitive sense, in reality, would you be happy to learn anything "with 68% certainty"? Sixty-eight percent certainty is hardly certainty at all; as such, the 68% interval is pretty much never used outside of teaching.

On the other hand, while the 95% and 99% confidence intervals are the most widely used and useful ones, nothing stops you from calculating <em>any</em> confidence interval you wish. <strong>The general formula for a confidence interval is thus:</strong>
<ul>
 	<li><strong>Any % CI:</strong> $\overline{x}\pm$ $z\times\hat\sigma_\overline{x}$</li>
</ul>
To calculate this, choose the level of certainty you want; once you have that probability, look up its corresponding <em>z</em>-value and multiply it by the standard error to get the margin of error at the desired level of certainty. For example, I might want the 90% CI (not as popular as the other two, but still a relevant confidence interval that has its uses).

I check for the z-value associated with 90% probability in a <em>z</em>-distribution table and I find that it's 1.65. Then, for the example used above, I would get:
<ul>
 	<li>90% CI: $\overline{x}\pm1.65\times\hat\sigma_\overline{x}$ $=50000\pm1.65\times300=50000\pm495= (49505; 50495)$</li>
</ul>
Or, I can be 90% certain that the average annual income of the population is between \$49,505 and \$50,495.

By analogy, you can thus produce any confidence interval with any level of certainty you want.
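The general recipe lends itself to a short helper function (a sketch of my own; `NormalDist.inv_cdf` from Python's standard library supplies the z-value for any confidence level, so no table lookup is needed):

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """CI for a population mean: mean ± z * (sd / sqrt(n)),
    with z taken from the standard normal for the chosen level."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # e.g. ~1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return (mean - margin, mean + margin)

# The income example at 90%: close to the table-based (49505; 50495),
# differing slightly because the exact z is 1.645 rather than 1.65
lo, hi = confidence_interval(50000, 12000, 1600, level=0.90)
print(round(lo), round(hi))
```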

<strong>The trade-off between certainty and precision</strong>. One thing you might have noticed from the above calculations is that the more certainty you get, the larger your confidence interval becomes (or, vice versa: the smaller the interval, the less certain your estimate):
<ul>
 	<li>between \$49,700 and \$50,300 with 68% confidence;</li>
 	<li>between \$49,505 and \$50,495 with 90% confidence;</li>
 	<li>between \$49,412 and \$50,588 with 95% confidence; and</li>
 	<li>between \$49,226 and \$50,774 with 99% confidence.</li>
</ul>
Of course, who wouldn't want estimates that are both more precise <em>and</em> more certain? Unfortunately, there simply is no way to have our cake and eat it too in this case. As you can see above, the more confident in our estimate we get, the further out the error bounds of the confidence interval spread. It's a trade-off between precision and confidence: the more precise our estimate, the less certain we are of it; the more confident we are in our estimate, the less precise our "guess" is.

Logically, this makes a lot of sense: imagine the population parameter as a target and estimation as throwing a dart at it. The smaller the target, the more precise you'll have to be but also the less confident of hitting it. At the same time, increasing the target size will accommodate less precise "shots" while simultaneously increasing the certainty of the target being hit.

<strong>Why can't we have a 100% CI?</strong> The non-technical answer is simply that a statistical estimator is based on a sample drawn from a population of interest: as long as you don't have data on your entire population, there will always be a possibility for random error (and thus uncertainty). The more technical answer lies in the characteristics of the normal probability distribution. Specifically, we know from Chapter 5 that the probability in its "tails" is not bounded -- i.e., a probability for <em>any</em> z-value exists, no matter how small or large, and it never reaches 0. Thus, a 100% confidence interval would run from -∞ to +∞, i.e., it would have to be <em>infinitely</em> large to accommodate the perfect certainty. Logically, no bounded, finite interval can provide 100% certainty, by the nature of statistical <em>inference</em> itself. (At 100%, it would stop being inference altogether.)

<strong>The effect of sample size on confidence intervals</strong>. Let's also consider the effect of sample size on the precision and level of certainty of confidence intervals. In Section XX I attempted to convince you that increasing the sample size beyond a specific (large) number becomes not only unfeasible in a world of limited resources but also statistically pointless. Let's see if I can further support that claim with the effect of sample size on the standard error.

If you recall, we find the standard error in the following way:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$

where we estimate <em>σ</em> (the standard deviation of the population) with <em>s</em> (the standard deviation of the sample) to get

$\hat\sigma_\overline{x}$ $=s_\overline{x}$ $=\frac{s}{\sqrt{N}}$

We already established that a larger <em>N</em> results in a smaller standard error. Given the formula for calculating confidence intervals, a smaller standard error should in turn lead to smaller intervals (i.e., to more precise estimates) <em>at a fixed level of certainty</em>. The question is -- how much smaller?
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>The Effect of Sample Size on Confidence Intervals</em></p>

</header>
<div class="textbox__content">

Going back to our <em>Average Annual Income</em> example, we had that

$N=1600$

$\overline{x}=50000$

$s=12000$

We had also already calculated its 95% CI:
<ul>
 	<li>95% CI: $\overline{x}\pm1.96\times\hat\sigma_\overline{x}$ $=50000\pm1.96\times300=50000\pm588= (49412; 50588)$.</li>
</ul>
What would happen if we increased the sample size to, say, <em>N</em>=10,000?

As usual, we start with calculating the standard error:

$\hat\sigma_\overline{x}$ $=s_\overline{x}$ $=\frac{s}{\sqrt{N}}= \frac{12000}{\sqrt{10000}}=\frac{12000}{100}=120$

Then, the new 95% CI would be
<ul>
 	<li>95% CI: $\overline{x}\pm1.96\times\hat\sigma_\overline{x}$ $=50000\pm1.96\times120=50000\pm235= (49765; 50235)$.</li>
</ul>
To be sure, the larger-<em>N</em> confidence interval <em>is</em> smaller; we did gain precision. But consider these numbers for what they actually are, in actual dollar terms, had this been real-life research instead of a hypothetical example. With a sample of <em>N</em>=1,600 we found that, with 95% certainty, the average annual income for the population is between \$49,412 and \$50,588. We now find that, had we a sample of <em>N</em>=10,000, the average annual income of the population would be between \$49,765 and \$50,235.

The precision "gain" between the two sample sizes is \$353 on each error bound; i.e., our estimate of the average annual income of the population becomes ±\$353 more precise (a total "gain" of \$706). At the same time, consider that surveying a sample of <em>N</em>=10,000 would cost over six times as much as surveying one of <em>N</em>=1,600 (as 10,000 is 6.25 times 1,600) -- would this be worth it, to improve your estimate by only \$350, give or take, on each side, when the actual sums we are dealing with are on the order of tens of thousands of dollars?

</div>
</div>
Most people would agree that \$49,412 to \$50,588 <em>is</em> precise enough, and that there's no need to waste six times more resources on such a relatively insignificant gain in precision when it comes to average annual income.[footnote]To demonstrate the effect of sample size only, this example keeps the other conditions (i.e., the sample mean and standard deviation) the same. Arguably, however, a larger <em>N</em> would have a mean and a standard deviation "truer" to the population. To the extent that a larger sample ends up with a smaller standard deviation, the standard error would be further reduced, and the confidence interval would be even tighter, thus gaining more precision. Still, the point of the effect of sample size <em>per se</em> remains.[/footnote]
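The diminishing returns have a simple mathematical source: the standard error shrinks only with the <em>square root</em> of <em>N</em>. A quick illustration (the sample sizes other than <em>N</em>=1,600 and <em>N</em>=10,000 are my own hypothetical additions):

```python
import math

s = 12000  # sample standard deviation, as in the example

for n in [100, 400, 1600, 6400, 10000, 25600]:
    se = s / math.sqrt(n)
    margin_95 = 1.96 * se
    print(f"N={n:>6}  SE={se:7.1f}  95% margin = ±{margin_95:7.1f}")

# Quadrupling N only halves the margin of error: to get 10x more
# precision you would need 100x the sample size (and budget).
```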

However, had we been discussing the effectiveness of a life-saving medical treatment instead of average annual income, our preferences regarding the trade-off between precision and cost would most likely be different. Thus, the actual value of increasing sample size cannot be judged solely on statistical grounds: what is considered a small/insignificant change in precision in one context may very well be a large and worthy change in another. Still, in social science research there's rarely a need for such increases in the precision of inference no matter the cost, even if larger samples are generally preferred.[footnote]Large sample sizes are very useful for gaining <em>power</em> in detecting associations between variables, as you'll see in the remaining chapters.[/footnote]

<strong>Confidence intervals for a proportion. </strong>Just like we may like to know the population mean of something (like the average annual income above), we might want to know the population <em>proportion</em> of something else (like, say, the proportion of Canadians working part time). Population proportions are, like population means, parameters that can be estimated.

<strong>The principle of estimating a population proportion through a confidence interval is the same as for the mean -- we need a standard error to create error bounds around the sample statistic (in this case, the proportion).</strong> The question, however, is how to calculate the standard error of a proportion. After all, the CI formula requires a standard deviation -- one that proportions do <em>not</em> have, as the dispersion measures we studied apply only to interval/ratio data. Calculating the mean and the standard deviation of an interval/ratio variable is all well and good, but what do we do with proportions, considering that they relate to <em>categories</em>?

In fact, there is a way to measure dispersion in a binary distribution (i.e., where there are only two categories/outcomes, e.g., employed vs. unemployed, women vs. men, undergraduate vs. graduate students, heads vs. tails, approval vs. disapproval, yes vs. no, success vs. failure, etc.). Unlike interval/ratio variables (which usually have an approximately normal -- continuous -- distribution), such a binary distribution (formally called <em>binomial</em>) is a <em>discrete</em> distribution.[footnote]It is also called a Bernoulli distribution (after Jacob Bernoulli, a Swiss mathematician) in the special case of a single "trial", like a single random sample is.[/footnote]

Since the standard deviation as we know it is thus off the table, here is an example to demonstrate the logic underlying the measurement of variability for proportions.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Variability Through Clothing</em></p>

</header>
<div class="textbox__content">

Imagine you have a friend who is partial to the colour black, so much so that they always wear a monochromatic, all-black outfit. Then one day you notice your friend is wearing a single article of a different colour, say, dark purple. Arguably, that's more variability than wearing all black, but the outfit would still be predominantly black. Then on the next day, there are two pieces of purple amid all the black, then three, then four, and so on. At what point would your friend's outfit stop being "predominantly black" and become "predominantly purple"? And what would happen eventually, if the exchanging-black-for-purple trend continued?

The answer to the latter question is obvious: the end point of such a trend would be for the outfit to become monochromatic again, this time all-purple. Now think about variability. At what point was there the greatest and at what point was there the least amount of variability in your imaginary friend's outfit?

To make it easier, let's add a numerical aspect to what we have imagined, and say that your friend's outfit consisted of 10 articles of clothing (and accessories) to start with, and then your friend swapped a black article for a purple article on each successive day, for ten days straight after that. Table XX illustrates.

Table XX <em>Black and Purple Articles of Clothing</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 180px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center"></td>
<td style="width: 25%;height: 15px;text-align: center"><strong>Black Articles</strong></td>
<td style="width: 25%;height: 15px;text-align: center"><strong>Purple Articles</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center"><strong>Initial state</strong></td>
<td style="width: 25%;height: 15px;text-align: center">10</td>
<td style="width: 25%;height: 15px;text-align: center">0</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 1</td>
<td style="width: 25%;height: 15px;text-align: center">9</td>
<td style="width: 25%;height: 15px;text-align: center">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 2</td>
<td style="width: 25%;height: 15px;text-align: center">8</td>
<td style="width: 25%;height: 15px;text-align: center">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 3</td>
<td style="width: 25%;height: 15px;text-align: center">7</td>
<td style="width: 25%;height: 15px;text-align: center">3</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 4</td>
<td style="width: 25%;height: 15px;text-align: center">6</td>
<td style="width: 25%;height: 15px;text-align: center">4</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 5</td>
<td style="width: 25%;height: 15px;text-align: center">5</td>
<td style="width: 25%;height: 15px;text-align: center">5</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 6</td>
<td style="width: 25%;height: 15px;text-align: center">4</td>
<td style="width: 25%;height: 15px;text-align: center">6</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 7</td>
<td style="width: 25%;height: 15px;text-align: center">3</td>
<td style="width: 25%;height: 15px;text-align: center">7</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 8</td>
<td style="width: 25%;height: 15px;text-align: center">2</td>
<td style="width: 25%;height: 15px;text-align: center">8</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 9</td>
<td style="width: 25%;height: 15px;text-align: center">1</td>
<td style="width: 25%;height: 15px;text-align: center">9</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 10</td>
<td style="width: 25%;height: 15px;text-align: center">0</td>
<td style="width: 25%;height: 15px;text-align: center">10</td>
</tr>
</tbody>
</table>
Again, on what day(s) would your friend's outfit be the least and the most variable in terms of colour? Looking at Table XX, it's not difficult to spot that the least variable were your friend's initial (all-black) outfit and what they wore on Day 10 (all-purple), both consisting of a single colour. There is slight variability on Days 1 and 9 (when there was a <em>single</em> article of a different colour); then more variability on Days 2 and 8 (when there were <em>two</em> articles of a different colour); then even more variability on Days 3 and 7 (when your friend had <em>three</em> different-coloured articles); and yet more variability on Days 4 and 6 (when there were <em>four</em> articles of a different colour). The outfit was most variable on Day 5, when it was half-black and half-purple, neither colour predominating.

Going by "half-black and half-purple", let's restate the information in Table XX in terms of proportions, as this will help us generalize the logic without the constraint of an actual count (of 10 articles of clothing, or anything else).

Table XX <em>Black and Purple Articles of Clothing, Proportions</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 180px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center"></td>
<td style="width: 25%;height: 15px;text-align: center"><strong>Black Articles</strong></td>
<td style="width: 25%;height: 15px;text-align: center"><strong>Purple Articles</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center"><strong>Initial state</strong></td>
<td style="width: 25%;height: 15px;text-align: center">1</td>
<td style="width: 25%;height: 15px;text-align: center">0</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 1</td>
<td style="width: 25%;height: 15px;text-align: center">0.9</td>
<td style="width: 25%;height: 15px;text-align: center">0.1</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 2</td>
<td style="width: 25%;height: 15px;text-align: center">0.8</td>
<td style="width: 25%;height: 15px;text-align: center">0.2</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 3</td>
<td style="width: 25%;height: 15px;text-align: center">0.7</td>
<td style="width: 25%;height: 15px;text-align: center">0.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 4</td>
<td style="width: 25%;height: 15px;text-align: center">0.6</td>
<td style="width: 25%;height: 15px;text-align: center">0.4</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 5</td>
<td style="width: 25%;height: 15px;text-align: center">0.5</td>
<td style="width: 25%;height: 15px;text-align: center">0.5</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 6</td>
<td style="width: 25%;height: 15px;text-align: center">0.4</td>
<td style="width: 25%;height: 15px;text-align: center">0.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 7</td>
<td style="width: 25%;height: 15px;text-align: center">0.3</td>
<td style="width: 25%;height: 15px;text-align: center">0.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 8</td>
<td style="width: 25%;height: 15px;text-align: center">0.2</td>
<td style="width: 25%;height: 15px;text-align: center">0.8</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 9</td>
<td style="width: 25%;height: 15px;text-align: center">0.1</td>
<td style="width: 25%;height: 15px;text-align: center">0.9</td>
</tr>
<tr style="height: 15px">
<td style="width: 25%;height: 15px;text-align: center">Day 10</td>
<td style="width: 25%;height: 15px;text-align: center">0</td>
<td style="width: 25%;height: 15px;text-align: center">1</td>
</tr>
</tbody>
</table>
One convenient way to quantify what we found in terms of the least and the greatest variability is to multiply the proportions in the two columns, like so:

Table XX <em>Black and Purple Articles of Clothing, Variability</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 180px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center"></td>
<td style="width: 21.9277%;height: 15px;text-align: center"><strong>Black Articles</strong></td>
<td style="width: 18.1836%;height: 15px;text-align: center"><strong>Purple Articles</strong></td>
<td style="width: 16.8011%;text-align: center;height: 15px"><strong>Variability</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center"><strong>Initial state</strong></td>
<td style="width: 21.9277%;height: 15px;text-align: center">1</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0</td>
<td style="width: 16.8011%;text-align: center;height: 15px">1(0)=<strong>0</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 1</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.9</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.1</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.9(0.1)=<strong>0.09</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 2</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.8</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.2</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.8(0.2)=<strong>0.16</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 3</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.7</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.3</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.7(0.3)=<strong>0.21</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 4</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.6</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.4</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.6(0.4)=<strong>0.24</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 5</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.5</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.5</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.5(0.5)=<strong>0.25</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 6</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.4</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.6</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.4(0.6)=<strong>0.24</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 7</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.3</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.7</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.3(0.7)=<strong>0.21</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 8</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.2</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.8</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.2(0.8)=<strong>0.16</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 9</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.1</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0.9</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.1(0.9)=<strong>0.09</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 10</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0</td>
<td style="width: 18.1836%;height: 15px;text-align: center">1</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0(1)=<strong>0</strong></td>
</tr>
</tbody>
</table>
That is, starting from zero, variability is the highest at precisely the half-and-half point, when neither outcome/category (in our example, neither <em>colour</em>) predominates.

Now we are ready for the formula to measure the dispersion of a proportion. I demonstrate it by restating Table XX, designating black as 1 and purple as 0, and taking black as the colour of interest (i.e., all proportions will be expressed in terms of black).

Table XX <em>Black and Purple Articles of Clothing, Generalized</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 180px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center"></td>
<td style="width: 21.9277%;height: 15px;text-align: center"><strong>Black Articles</strong></td>
<td style="width: 18.1836%;height: 15px;text-align: center"><strong>Non-black Articles</strong></td>
<td style="width: 16.8011%;text-align: center;height: 15px"><strong>Variability</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center"><strong>Initial state</strong></td>
<td style="width: 21.9277%;height: 15px;text-align: center">1</td>
<td style="width: 18.1836%;height: 15px;text-align: center">0</td>
<td style="width: 16.8011%;text-align: center;height: 15px">1(0)=<strong>0</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 1</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.9</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-09)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.9(1-0.9)=<strong>0.09</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 2</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.8</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.8)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.8(1-0.8)=<strong>0.16</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 3</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.7</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.7)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.7(1-0.7)=<strong>0.21</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 4</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.6</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.6)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.6(1-0.6)=<strong>0.24</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 5</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.5</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.5)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.5(1-0.5)=<strong>0.25</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 6</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.4</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.4)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.4(1-0.4)=<strong>0.24</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 7</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.3</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.3)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.3(1-0.3)=<strong>0.21</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 8</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.2</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.2)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.2(1-0.2)=<strong>0.16</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 9</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0.1</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0.1)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0.1(1-0.1)=<strong>0.09</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.0876%;height: 15px;text-align: center">Day 10</td>
<td style="width: 21.9277%;height: 15px;text-align: center">0</td>
<td style="width: 18.1836%;height: 15px;text-align: center">(1-0)</td>
<td style="width: 16.8011%;text-align: center;height: 15px">0(1-0)=<strong>0</strong></td>
</tr>
</tbody>
</table>

</div>
</div>
And there you have it in Table XX above: the formula for calculating the variability of a proportion (i.e., of a discrete binary variable). Since we denote sample proportions by <em>p</em> and population proportions by <em>π</em>,<strong> the variability of a proportion is given by multiplying the proportion of the outcome we're interested in by <em>1 minus that proportion</em> </strong>(i.e., by the other outcome's proportion) -- that is, we have <em>p</em>(1-<em>p</em>) for samples and <em>π</em>(1-<em>π</em>) for populations.

Technically speaking, the variability I have been describing is the proportion's variance:

$\sigma^2=\pi(1-\pi)$

Thus, to get the proportion's standard deviation, we need a square root of the variance:

$\sigma=\sqrt{\sigma^2}=\sqrt{\pi(1-\pi)}$
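These two formulas are easy to check numerically. Below is a minimal Python sketch (the function names are my own, for illustration only) that computes π(1-π) for the proportions in Table XX and confirms what we observed: variability is zero at the monochromatic extremes and peaks at 0.25 when π = 0.5.

```python
import math

# Variance of a binomial proportion: sigma^2 = pi * (1 - pi),
# and its standard deviation: sigma = sqrt(pi * (1 - pi)).
def proportion_variance(pi):
    return pi * (1 - pi)

def proportion_sd(pi):
    return math.sqrt(proportion_variance(pi))

# Proportions of black articles from Table XX, initial state through Day 10.
proportions = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
variances = [round(proportion_variance(p), 2) for p in proportions]
print(variances)
# [0.0, 0.09, 0.16, 0.21, 0.24, 0.25, 0.24, 0.21, 0.16, 0.09, 0.0]
# zero at the monochromatic extremes, maximal (0.25) at half-and-half
```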

With this, we are finally ready to get back to calculating a confidence interval for a proportion, as we now have everything we need to calculate its standard error. If you recall, the formula for the standard error was:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$

Substituting the standard deviation of the proportion, we get:

$\sigma_p=\frac{\sigma}{\sqrt{N}}=\frac{\sqrt{\pi(1-\pi)}}{\sqrt{N}}=\sqrt{\frac{\pi(1-\pi)}{N}}$ = <em>standard error of the proportion</em>

Of course, when we don't have the population standard deviation, we estimate it with the sample standard deviation -- i.e., we need to substitute <em>p</em> for <em>π</em>:

$\hat\sigma_p=\frac{\sigma}{\sqrt{N}}=\frac{\sqrt{p(1-p)}}{\sqrt{N}}=\sqrt{\frac{p(1-p)}{N}}$ = <em>estimated standard error of the proportion</em>

Following the formula for confidence interval (the sample statistic ± z$\times$ the standard error), we ultimately get <strong>the confidence interval for a proportion:</strong>
<ul>
 	<li><strong>Any % CI</strong>: $p \pm$ $z\times\hat\sigma_p=p \pm$ $z\times\sqrt{\frac{p(1-p)}{N}}$</li>
</ul>
As with the mean, we can calculate a confidence interval with any preferred level of certainty by substituting with the z-value associated with that probability. For example, the 95% confidence interval for the proportion would be:
<ul>
 	<li>95% CI: $p \pm1.96\times\hat\sigma_p=p \pm1.96\times\sqrt{\frac{p(1-p)}{N}}$</li>
</ul>
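The interval formula above can be sketched in a few lines of code. This is only an illustration (the function name <em>proportion_ci</em> and the sample numbers are mine, not from the text): it computes p ± z·√(p(1-p)/N), with z = 1.96 for 95% confidence.

```python
import math

def proportion_ci(p, n, z=1.96):
    """p +/- z * sqrt(p * (1 - p) / n); z = 1.96 gives a 95% interval,
    z = 2.58 (approximately) a 99% interval."""
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of the proportion
    return p - z * se, p + z * se

# Illustrative numbers: a 50% sample proportion from a sample of N = 100.
low, high = proportion_ci(0.5, 100)
print(round(low, 3), round(high, 3))  # 0.402 0.598
```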
If you find all this too technical and abstract, the following example should help.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Part-Time Workers in Canada, Age 25-54</em></p>

</header>
<div class="textbox__content">

Let's say we want to know what proportion of Canadian workers work part time, and that we are especially interested in what Statistics Canada calls "the core ages" 25 to 54 (Statistics Canada, 2017 [https://www150.statcan.gc.ca/n1/pub/71-222-x/71-222-x2018002-eng.htm]). We conduct a survey of N=1,600 Canadian individuals aged 25-54 and find that 12% of our respondents work part time. As usual, we want to estimate the proportion of <em>all</em> Canadians aged 25-54 who work part time.

We start with calculating the standard error:

$\hat\sigma_p=\sqrt{\frac{p(1-p)}{N}}=\sqrt{\frac{0.12(0.88)}{1600}}=\sqrt{\frac{0.106}{1600}}=\frac{0.325}{40}=0.008$

Then, a 95% confidence interval for the proportion would be:
<ul>
 	<li>95% CI: $p \pm1.96\times\hat\sigma_p=p \pm1.96\times0.008=0.12 \pm0.016=(0.104; 0.136)$</li>
</ul>
That is, we can estimate with 95% certainty (i.e., 95% of the time such a study is undertaken) that between 10.4% and 13.6% of Canadian workers aged 25-54 work part time. Alternatively, we can say with 95% certainty that 12% ±1.6 percentage points of Canadian workers aged 25-54 work part time.

</div>
</div>
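For readers who prefer to verify with code, the arithmetic of Example XX can be replayed as follows (a sketch only; it computes in one pass, so intermediate values differ slightly from the rounded two-step presentation in the text):

```python
import math

p, n = 0.12, 1600  # sample proportion and sample size from Example XX

se = math.sqrt(p * (1 - p) / n)  # estimated standard error of the proportion
print(round(se, 3))              # 0.008, matching the text

low, high = p - 1.96 * se, p + 1.96 * se
print(round(low, 3), round(high, 3))  # 0.104 0.136 -- i.e., 10.4% to 13.6%
```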
As there is a lot to take in here, a second example is in order.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Women in Managerial Positions</em></p>

</header>
<div class="textbox__content">

Let's say a large, nationally-representative study of N=10,000 finds that women in Canada occupy 36% of managerial positions. [https://www.expertmarket.com/female-managers] What would be the estimate for Canada as a whole?

The estimated standard error of the proportion would be:

$\hat\sigma_p=\sqrt{\frac{p(1-p)}{N}}=\sqrt{\frac{0.36(0.64)}{10000}}=\sqrt{\frac{0.230}{10000}}=\frac{0.48}{100}=0.005$

As in the previous examples, the 95% confidence interval for the proportion would be:
<ul>
 	<li>95% CI: $p \pm1.96\times\hat\sigma_p=p \pm1.96\times0.005=0.36 \pm0.01=(0.35; 0.37)$</li>
</ul>
That is, we can estimate with 95% certainty (i.e., 95% of the time such a study is undertaken) that between 35% and 37% of managerial positions in Canada are occupied by women. Alternatively, we can say with 95% certainty that women occupy 36% ±1 percentage point of managerial positions in Canada.[footnote]In this chapter I have presented the most commonly used interpretation of confidence intervals, and the one most frequently taught to introductory statistics students. I should point out, however, that this is one of those instances (of which I spoke in the introduction to this book) where the reality is a bit different from what is being taught. The interpretation presented here is easier to understand and follows a logic that is more intuitive to students than what confidence intervals <em>really</em> tell us. Briefly, the range of plausible values we find are just that -- values that the population <em>could</em> have, as we haven't ruled them out yet, and 95% (or 99%) of the time such studies will not be able to rule these plausible values out (van der Zee, 2017). This, technically speaking, is somewhat different from the "95% (or 99%) certainty that the population mean/proportion <em>will be</em> between the calculated error bounds" version we usually work with. If you'd like to go down that particular rabbit hole, go <a href="http://www.timvanderzee.com/not-interpret-confidence-intervals/">here</a>. For everyone else, the interpretation of confidence intervals presented so far in this chapter should be enough.[/footnote]

If you find this a bit too precise to believe, note the quite large sample size of N=10,000. As established above, confidence intervals based on a large <em>N</em> and on proportions with relatively weak variability (after all, the sample statistic indicated that managerial positions are predominantly occupied by men) tend to have small standard errors, due to the relatively small numerator (the variability) and the large denominator (the sample size).

</div>
</div>
Now it's your turn to try, first with means...
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It! XX <em>Average Height of NHL Players</em></p>

</header>
<div class="textbox__content">

Let's say that a random sample of N=900 past and present players in the National Hockey League finds that the average height of players is 73 inches, with a standard deviation of 3 inches. What can you say about the average height of NHL players as a whole? Construct a 95% and a 99% confidence interval for the average height of NHL players.

</div>
</div>
... And now with proportions.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It! XX Paying of Student Debt Within Three Years After Graduation</p>

</header>
<div class="textbox__content">

Let's say that a sample of N=1,600 finds that only 34% of Canadians with a bachelor's degree have paid off their student loans within three years after graduation. Can you estimate the rate for all Canadians with a bachelor's degree? Construct both a 95% and a 99% confidence interval for that rate.

</div>
</div>
To summarize, confidence intervals allow us to estimate population parameters with a specific level of precision and certainty. We construct them based on the idea of the (normally distributed) sampling distribution of the mean (or the proportion), using the CLT's postulates: we centre the interval on the sample mean (or proportion) and take a given number of standard errors below and above it. The "how many standard errors" determines the interval's confidence (i.e., certainty) level.

Before we move on to variable associations (along with further uses of confidence intervals in statistical inference; you didn't think it was just this, did you?), let's finally address the glaring omission in my presentation so far: how come we can simply use the sample standard deviation <em>s</em> instead of the population standard deviation <em>σ</em> when calculating the standard error? I have left that explanation for last, in the next Section XX.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>101</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:43:55]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:43:55]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-6-confidence-intervals]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-3-confidence-intervals]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-6-confidence-intervals]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.7. The t-Distribution</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-7-the-t-distribution/</link>
		<pubDate>Wed, 31 Oct 2018 21:44:51 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=103</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

If, having reached this chapter's final section, after all we have been through -- random sampling, sampling distribution, CLT, parameters, estimates, statistics, confidence intervals -- you are now groaning in dismay (<em>why is there even more to this topic??</em>[footnote]As a general principle, in introductory texts such as this there is <em>always</em> more. Much, much more; it's not a matter of <em>if</em> but of <em>how much</em> something is left out.[/footnote]), take heart: this is a short explanation I kept for last, delivered through a brief introduction of a new concept.

If you recall, when we needed to calculate the standard error of the mean (or proportion) in the previous Section XX, I simply replaced the <em>unknown</em> population standard deviation <em>σ</em> with the <em>known</em> sample standard deviation <em>s</em> in the formula. This is what I did:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$ = <em>standard error of the mean</em>

Substituting in <em>s </em>for <em>σ</em> we had

$\hat\sigma_\overline{x}$ $=s_\overline{x}$ $=\frac{s}{\sqrt{N}}$ = <em>estimated standard error of the mean</em>

Similarly, for the proportion we had

$\sigma_p=\frac{\sigma}{\sqrt{N}}=\frac{\sqrt{\pi(1-\pi)}}{\sqrt{N}}=\sqrt{\frac{\pi(1-\pi)}{N}}$ = <em>standard error of the proportion</em>

and substituting the known sample proportion <em>p</em> for the unknown population proportion <em>π</em> in calculating the proportion's variability, we ended up with

$\hat\sigma_p=\frac{\sigma}{\sqrt{N}}=\frac{\sqrt{p(1-p)}}{\sqrt{N}}=\sqrt{\frac{p(1-p)}{N}}$ = <em>estimated standard error of the proportion</em>
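Again as an illustrative sketch with invented numbers: suppose a hypothetical sample of <em>N</em> = 900 respondents in which 270 answer "yes" to some question. The estimated standard error of the proportion then follows directly from the formula above:

```python
import math

# Hypothetical survey: 270 "yes" answers out of N = 900 respondents (made-up numbers)
N = 900
p = 270 / N                           # sample proportion, p = 0.3

se_prop = math.sqrt(p * (1 - p) / N)  # estimated standard error of the proportion
print(round(se_prop, 4))
```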

But why can we do that?

The more observant among you might have noticed that I swept the explanation for this substitution under the carpet and simply moved on. But why should the variability of the population be the same as that of the sample?

In truth, they are not -- or rather, they <em>might</em> be; there's just no way to know. That is, by using the sample statistics to estimate the variability of the population, we introduce more <em>uncertainty</em> into the calculation. When we do that, we actually move away from using the normal distribution and its associated z-values. What we end up using is something similar, called the <em>t-distribution</em>[footnote]Also called <em>Student's</em> t-distribution, after the pseudonym of William Gosset, who introduced it to statistics (along with many other concepts). Due to contractual obligations, Gosset published under the name "Student" (Pagels, 2018). You can find more about his <a href="https://medium.com/value-stream-design/the-curious-tale-of-william-sealy-gosset-b3178a9f6ac8">curious case</a> here.[/footnote]: an entire set of bell-shaped curves, accounting for each and every sample size <em>N</em>. Figure XX illustrates.

Figure XX <em>The Normal vs. the t-Distribution</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/normal-vs-t-1.jpg" alt="" width="673" height="274" class="wp-image-936 aligncenter" />
<p style="text-indent: 18.6667px"> <strong>The t-distribution provides a separate bell-shaped curve for each possible sample size</strong>, thus helping us "ground", as it were, the estimation in the reality of an actual sample of a specific size.</p>
<strong>The accommodation of the sample size is done through the concept of <em>degrees of freedom</em> (commonly abbreviated to <em>df</em>). The degrees of freedom represent the number of values in a statistical calculation that are free to vary. In the case of the t-distribution, the degrees of freedom are <em>N</em>-1: one degree of freedom is reserved for estimating the mean, and <em>N</em>-1 degrees remain for estimating the variability. </strong>Unlike with z-values, where each z-value represents a specific probability under the normal curve, the probabilities associated with t-values are calculated based on the distribution's degrees of freedom.

Still, none of this explains why I was able to shamelessly switch from using the z-distribution to the t-distribution, without any change to the standard error and confidence interval calculations in the examples in the previous sections. If z-values and t-values (and their associated probabilities) are different, shouldn't the calculations differ too?

Before I reassure you that all is well, let's revisit what z-values actually represent. From Chapter 5, you know that the z-value is the distance between a case and the mean, expressed in terms of standard deviations (i.e., standardized):

$z=\frac{x_i-\overline{x}}{s}$

The reason we were able to use <em>z</em>=1, <em>z</em>=1.96, and <em>z</em>=2.58 in the calculations of the 68%, 95%, and 99% confidence intervals, respectively, was that the sampling distribution is a normal distribution (per the CLT). That is, the z-value in this case is the distance between the sample mean (the "case" in the sampling distribution) and the population mean ("the mean of means", the mean of the sampling distribution), expressed in standard errors (the "standard deviation" of the sampling distribution):

$z=\frac{\overline{x}-\mu}{\sigma_\overline{x}}$ [footnote]where $\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$.[/footnote]

Now what about <em>t</em>? By substituting the sample standard deviation for the population standard deviation, we end up with the <em>estimated</em> standard error. In turn, substituting the <em>estimated</em> standard error for the standard error in the formula for the z-value above, we get the t-value, the distance between the sample mean and the population mean, expressed in <em>estimated</em> standard errors:

$t=\frac{\overline{x}-\mu}{s_\overline{x}}$ [footnote]Where $s_\overline{x}$ $=\frac{s}{\sqrt{N}}$.[/footnote]
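As a small computational sketch (with invented sample values and an invented hypothesized population mean, for illustration only), the t-value is computed exactly like the z-value, only with the estimated standard error in the denominator:

```python
import math
import statistics

# Hypothetical sample and a hypothetical population mean (made-up numbers)
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14]
mu = 13                           # hypothesized population mean
N = len(sample)

xbar = statistics.mean(sample)
s = statistics.stdev(sample)      # sample standard deviation
est_se = s / math.sqrt(N)         # estimated standard error of the mean
t = (xbar - mu) / est_se          # t-value, with N - 1 degrees of freedom

print(round(t, 2))
```

Whatever this produces would be reported with its degrees of freedom attached: here, <em>df</em> = <em>N</em>-1 = 8.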

Compare the two formulas for the <em>z</em>-value and the <em>t</em>-value above. As similar as they look, the <em>t</em>-value is more "uncertain" than the z-value, and comes with the aforementioned specification of degrees of freedom. Given specific degrees of freedom, the shape of the <em>t</em>-distribution curve changes, and thus the probabilities associated with each <em>t</em>-value change too.

Finally, for the drum roll: the reason I was able to work with <em>t</em>-values instead of <em>z</em>-values in the calculations of confidence intervals in the previous section without acknowledging it is the sample sizes I chose for my examples. See, <strong>the biggest difference between the <em>z</em> and the <em>t</em> occurs at small <em>N</em> (especially <em>N</em>&lt;30). The larger the <em>N</em>, the closer the <em>t</em>-distribution approaches the <em>z</em>-distribution. </strong>You can see this in Figure XX above: as the degrees of freedom increase, the shape of the distribution becomes more and more normal, so much so that the <em>t</em>-distribution at <em>df</em>=30 is already rendered invisible in the figure, its light blue colour overridden by the normal distribution's black. And from <strong><em>N</em>=100 on, the <em>t</em> converges so closely to <em>z</em> that the <em>t</em>-distribution curve becomes</strong> our old, familiar, beloved <strong>normal curve!</strong> (Okay, maybe "beloved" applies just to me.)
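One way to see this convergence for yourself, without consulting t-tables, is a quick simulation (an illustrative sketch; the population parameters, random seed, and number of repetitions below are arbitrary choices): draw many random samples from a normal population, compute each sample's t-value, and check how often it falls within the z-based 95% cutoff of ±1.96.

```python
import math
import random
import statistics

random.seed(42)
MU, SIGMA, REPS = 100, 15, 4_000   # arbitrary, made-up population parameters

def coverage(n):
    """Share of samples whose t-value falls within the z-based cutoff of 1.96."""
    hits = 0
    for _ in range(REPS):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        est_se = statistics.stdev(sample) / math.sqrt(n)
        t = (statistics.mean(sample) - MU) / est_se   # t-value, df = n - 1
        if abs(t) <= 1.96:
            hits += 1
    return hits / REPS

small_n, large_n = coverage(5), coverage(100)
print(small_n, large_n)
```

With <em>N</em> = 5, noticeably fewer than 95% of the t-values fall within ±1.96 (the t-distribution's tails are fatter than the normal's), while with <em>N</em> = 100 the share is essentially 95% -- the <em>t</em> has converged to the <em>z</em>.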

Given that in the confidence interval examples in Section XX I used only large <em>N</em>'s (900 and above), the probabilities associated with the <em>t</em>-values at <em>N</em>-1 degrees of freedom (899 and above) were the same as those associated with the <em>z</em>-values: 68% for <em>t=z</em>=1, 95% for <em>t=z</em>=1.96, and 99% for <em>t=z</em>=2.58. (Hence I left the t-distribution out of the discussion at the time, in order to explain it properly here.)
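To tie this back to the confidence interval calculations: with a large sample (the summary figures below are hypothetical, invented for illustration), using <em>z</em> = 1.96 for the 95% interval gives effectively the same answer as the t-value at <em>df</em> = 899 would:

```python
import math

# Hypothetical large-sample summary figures: N = 900, sample mean 50, sample sd 12
N, xbar, s = 900, 50.0, 12.0

se = s / math.sqrt(N)               # estimated standard error = 12 / 30 = 0.4
lower = xbar - 1.96 * se            # 95% CI lower bound; t at df = 899 is ~1.96
upper = xbar + 1.96 * se            # 95% CI upper bound

print(lower, upper)
```

At <em>df</em> = 899 the 95% t-value is 1.96 to two decimal places, so the resulting interval is the same to any practical precision.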

<em>Hmm, much ado about nothing</em>, I can imagine you saying at this point. If the <em>t</em>-distribution and the <em>z</em>-distribution are no different at larger <em>N</em>, why even bother with the <em>t</em> (beyond any small-<em>N</em> uses)? As unsatisfying as the answer "I'll explain later" is, I'm afraid I have no choice but to resort to it again. Briefly, it has to do with something called a <em>t</em>-<em>test for significance</em>, which we will be using soon enough for hypothesis testing in Chapter 7, next.

For now, what you should take from this section is that <strong>the <em>t</em>-distribution exists, and it is what we actually use for estimation (and not<em> z</em>!), given a specific sample size. </strong>As well, remember that<strong> for <em>N</em>=100 and above, <em>t</em> converges to <em>z</em>, so you can readily apply any probabilities you associate with <em>z</em> to <em>t</em> with<em> N</em>-1 <em>df</em>. </strong>(Regarding the latter, <strong>do not forget to always specify the degrees of freedom for whatever <em>t</em> you might have. A <em>t</em>-value <em>always</em> comes with <em>df</em> attached as it's meaningless/undefined without them.</strong>)]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>103</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:44:51]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:44:51]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-7-the-t-distribution]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>8</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-4-the-t-distribution]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.2 Describing and Examining Bivariate Associations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-describing-bivariate-associations/</link>
		<pubDate>Wed, 31 Oct 2018 21:49:36 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=108</guid>
		<description></description>
		<content:encoded><![CDATA[Before we can get to establishing statistical associations between two variables, we need to know what we are looking for (or at), as it were. Social research, especially deductive reasoning, usually starts with an idea -- a research question if you will -- which is frequently grounded in an empirical observation of two variables' possible association (e.g., "Hey, it seems like all vegetarians/vegans I know tend to be well off. I wonder if income and vegetarianism/veganism are related...") Then, if one is quantitatively inclined, a random sample can be used to "check" for such an association.

Most people conceive of that "check" as a one-step process, but it actually involves two steps, frequently undertaken in such quick succession that they appear to be one. As this is your introduction to the topic, we'll take the steps slowly, one after the other.

<strong>The first step is the descriptive part: given our sample data, does it <em>look like</em> there is an association between the two variables of interest?</strong> This step concerns the data obtained through our sample, i.e., it describes <em>our sample</em>, and <em>only</em> our sample.

<strong>The second step is the inferential part: assuming that it looks like there is an association between the two variables of interest <em>in the sample</em>, is this association <em>generalizable to the population</em>? </strong>That is, is this a "real" association reflecting the population, or is it something we have observed in our sample due to the vagaries of random chance? This is the part where we formulate and test hypotheses in order to be able to make generalizable conclusions; we'll focus on it in Section XX and further on in the remaining chapters.

<strong>We hereby start with the first step, describing bivariate associations</strong> (again, based on sample data). What you need for this step is a recollection of the types of variables, and of the fact that we generally use both visual (graphical) and numerical descriptions.

From Section XX (a <em>long</em> while back), recall that we <em>univariately</em> described a variable by 1) graphing its distribution (we used pie charts, bar graphs, and histograms, depending on level of measurement), and 2) providing numerical measures of central tendency and dispersion where applicable; this is how we used to "get a sense" of the variable and what it looked like. Similarly, we can also use graphical and numerical bivariate descriptives, this time depending on the combination of continuous-or-discrete variable type, to "get a sense" of the potential association between two variables and what it might look like.

Recall as well (from Section XX) that we can classify variables as discrete and continuous[footnote]Briefly, nominal and ordinal variables tend to be (but, especially the latter, are not always) <em>treated</em> as discrete, and interval/ratio variables tend to be (but are not always) <em>treated</em> as continuous. Note, again, that all social science data tend to be discrete -- we just treat some variables (those with a relatively large number of categories/values) as continuous. For the remainder of the text I'll refer to variables as discrete or continuous, and you should take this to mean that that's how they are <em>treated</em> (and not as an indication of their "true nature").[/footnote] (I know, I know -- it too has been a while, but I did warn you we'd eventually get back to that). <strong>From this chapter on, we'll proceed by considering all three possible bivariate combinations</strong> of these: <strong>1) associations between a <em>discrete</em> and a <em>continuous</em> variable</strong>[footnote]We'll learn to test this type of association in Section XX of the current chapter and in Chapter 8.[/footnote], <strong>2) associations between <em>two discrete</em> variables</strong>[footnote]We'll learn to test this type of association in Chapter 9.[/footnote], and, finally, <strong>3) associations between <em>two continuous</em> variables</strong>[footnote]We'll learn to test this type of association in Chapter 10.[/footnote].

I discuss describing each of the three types of associations in the following subsections.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>108</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:49:36]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:49:36]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-2-describing-bivariate-associations]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-1-visualizing-bivariate-associations]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>8.2. Hypotheses</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/8-2-hypotheses/</link>
		<pubDate>Wed, 31 Oct 2018 21:51:09 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=110</guid>
		<description></description>
		<content:encoded><![CDATA[Now that we have come to terms with the fact that we will not be making causal statements at this point, let's turn our attention to establishing statistical associations. As I mentioned in the previous section, this is done through testing. In order to test variables' associations we need to know how hypotheses are scientifically tested.

To have a hypothesis about something means to have an idea about how to explain it. This idea, or proposed explanation, might be based on any combination of logic, previous related observations, experience, etc. In science, hypotheses are formulated as relatively concise, <em>testable</em> statements. If a statement cannot be tested, it doesn't qualify as a scientific hypothesis.

Most students unfamiliar with the scientific method of testing hypotheses are surprised to learn that the testing is done in a roundabout, method-of-exclusion kind of way: we don't set out to confirm our hypothesis but rather to reject the opposite of what we claim. To baffle you further, if we reject the opposite, we have found evidence in support of our hypothesis, but we have not <em>proven</em> that it's true. Similarly, if we do <em>not</em> reject the opposite, it doesn't mean that we've <em>proven</em> the opposite to be true <em>or</em> that we have <em>proven</em> our hypothesis wrong. (Nothing is ever proven in science, as that would require 100% certainty, which we have already established is impossible.) Thus, interpreting a hypothesis test requires careful, qualified language so as not to overstate findings.

Confused? Not to worry. I'm getting ahead of myself here to give you a quick sketch of where we're headed in this section, but of course I'll go over and explain the parts of the paragraph above in greater detail below. Also a heads-up: after the brief respite, things are about to get technical again (in the next section). But first things first.

<strong>To test a hypothesis of interest, we make <em>two</em> contradictory statements: one about what we hypothesize and another stating the <em>exact</em> opposite. </strong>[footnote]Why? Beyond what I already said about proofs, also because scientists need to be impartial about what a test will reveal. As a scientist, you want to test a hypothesis with an open mind and to be equally prepared to accept the result either way it goes -- so you cannot set out from the start to find your hypothesis supported.[/footnote] The "opposite" hypothesis is called the <em>null hypothesis</em> (frequently designated H<sub>0</sub>) and is usually stated first; the original hypothesis of interest is called the <em>alternative hypothesis</em> (usually designated H<sub>a</sub>) and is stated second[footnote]Do not get alarmed if you see different notation in published research. When researchers test many hypotheses in the same study, they may designate them H<sub>1</sub>, H<sub>2</sub>, H<sub>3</sub>, etc. Even more importantly, experienced researchers typically don't explicitly state the null hypotheses in their studies -- these are self-understood as the opposite of whatever each alternative hypothesis states. Further, some researchers never explicitly designate a hypothesis as such, as it's taken as evident that this is what they do. Beginner researchers like you, however, should practice stating -- and clearly designating -- both null and alternative hypotheses.[/footnote].

When we apply all this to testing associations between variables, we end up with null hypotheses such as "the two variables are not associated", "there is no association between the two variables", "Variable 1 does not affect Variable 2", or "the two variables are independent of each other". The alternative hypotheses would then be something like "the two variables are associated", "there is an association between the two variables", or "Variable 1 does affect Variable 2". (However, recall that when interpreting and reporting results it's always better to state the findings not only in terms of variables but also in terms of people.) See some examples in the box below.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Stating Hypotheses</em></p>

</header>
<div class="textbox__content">

<em>Hair colour</em> <em>and eye colour</em>:
<ul>
 	<li>H<sub>0</sub>: Hair colour and eye colour are not associated; e.g., dark-haired individuals are equally likely to have blue eyes as blond people are.</li>
 	<li>H<sub>a</sub>: Hair colour and eye colour are associated; e.g., dark-haired individuals' and blond individuals' likelihood of having blue eyes is different.</li>
</ul>
</div>
<em>Smoking and lung disease:</em>
<ul>
 	<li>H<sub>0</sub>: Smoking and lung disease are not associated; e.g., smokers and non-smokers have the same odds of developing lung disease.</li>
 	<li>H<sub>a</sub>: Smoking and lung disease are associated; e.g., smokers and non-smokers have different odds of developing lung disease.</li>
</ul>
<em>Gender and income:</em>
<ul>
 	<li>H<sub>0</sub>: Income is independent of gender; e.g., men and women have the same average income.</li>
 	<li>H<sub>a</sub>: Income is dependent on gender; e.g., women and men have different income on average.</li>
</ul>
<em>Parental education and offspring education:</em>
<ul>
 	<li>H<sub>0</sub>: Parental education is unrelated to the education of their offspring; e.g., the level of parental education has no effect on children's level of education.</li>
 	<li>H<sub>a</sub>: Parental education and offspring education are related; e.g., the level of parental education is associated with the children's level of education.</li>
</ul>
</div>
There are three things that you can learn from the examples presented above. <strong>First, the hypotheses are formulated as short statements that can be evaluated in a simple yes-or-no kind of way</strong>: "Average income is independent of gender": YES, or "Average income is independent of gender": NO. Thus you really need only one statement per hypothesis; if your proposed explanation is complicated and involves more than two variables, this means you are dealing with multiple hypotheses, each of which needs to be tested separately.

<strong>Second</strong>, while there are many ways to state essentially the same hypothesis, <strong>try to keep the null hypothesis as the <em>same</em> statement as the alternative hypothesis, only in opposition</strong> -- such as "...are not related/associated" and "...are related/associated", or "...are independent" and "...are not independent", etc.

<strong>Third</strong>, you may have noticed the slightly awkward way in which some of the alternative hypotheses are listed above. Couldn't I have stated "women have lower income than men on average"? Or "blond individuals are more likely to have blue eyes than dark-haired individuals"? I could have, but then these would have been different alternative hypotheses. The reason I didn't imply who is more or less likely to have blue eyes, or who has the higher average income, but <strong>kept the statements as a generic "different likelihood" and "different income", is that this affects the kind of test that needs to be used</strong>. Briefly, there is a general test for association/difference (aka a <em>two-tailed test</em>), and a more specific version (aka a <em>one-tailed test</em>) which implies "direction". The former is more "open-minded", as it doesn't rely on or imply prior knowledge, and is therefore more conservative; the latter indicates not only a difference/association but also of what type, so its usage needs to be justified. More on that in the next section, but for now keep in mind that, as beginner researchers, you would do well to use the general, two-tailed version of the test.

Before we move to some actual hypothesis testing, see if you can formulate some hypotheses on your own.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Try It! XX Stating Hypotheses</em></p>

</header>
<div class="textbox__content">

Formally state the null and alternative hypotheses about each of the following pairs of variables: class attendance and test scores, time spent on social media prior to a test and test scores, race/ethnicity and years of schooling, gender and belief in climate change, and political affiliation and attitudes toward gun control. In fact, just go ahead and practice formulating hypotheses about anything you like.

</div>
</div>
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>110</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:51:09]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:51:09]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[8-2-hypotheses]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>1051</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-2-hypotheses-and-hypotheses-testing]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-4-hypotheses-and-hypotheses-testing]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[8-2-hypotheses-and-hypotheses-testing]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.1. Types of Bivariate Associations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-1-types-of-bivariate-associations/</link>
		<pubDate>Wed, 31 Oct 2018 21:55:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=117</guid>
		<description></description>
<content:encoded><![CDATA[To start with, what does it mean for two variables to be associated? Even without prior knowledge of statistics and statistical terminology, you have likely considered, or at least noticed, associations between variables both during your studies and in social life in general. For example, you probably know that fertility rates are higher in some countries and lower in others, and you might also know that the level of socioeconomic development also tends to differ between the two groups. You might also have noticed that, say, early childhood educators and hospital nurses tend to be women, while auto mechanics or refrigerator repair technicians tend to be men. You certainly know that (for now) prime ministers of Canada and presidents of the USA have tended to be white (and male, and Christian).

These of course are all examples of associations between variables. <strong>Every time you can note that specific attributes of one variable tend to appear more often with certain attributes of another variable, you're looking at an association. That is, we're looking for a <em>pattern</em> between the sets of attributes of two variables: a pattern where some attribute combinations are seen more frequently while other attribute combinations are observed less often.</strong>

Recall that we defined variables as characteristics that vary across cases. Variables can vary <em>independently</em> of one another, or they can vary together -- <em>in tandem</em>, as it were -- in such a way that when some attributes of one variable are present, you'd expect to see some specific attributes of the other variable present too. Like so: Countries defined as <em>developed</em> tend to have lower fertility rates than countries defined as <em>developing</em>, so we have the variables <em>level of socioeconomic development</em> on the one hand, and <em>fertility rate</em> on the other. The association pairs high levels of the former variable with low levels of the latter, and vice versa -- low levels of the former with high levels of the latter. These two combinations (high development/low fertility and low development/high fertility) are more likely to be observed than a no-pattern situation, where all sorts of combinations of development and fertility levels would be equally likely.

Similarly, research has repeatedly shown that some occupations tend to be male-dominated while others are female-dominated. If there were no association (i.e., no pattern between the two sets of attributes), we would expect to observe approximately equal numbers of women and men in all occupations -- but from what we've seen, that's not the case. That is, it seems there is an association between the variables <em>gender</em> and (choice of) <em>occupation</em>. Furthermore, participation in Canadian and US politics (and voters' preferences), especially at the highest levels of power, also appears to be gendered -- as well as associated with other variables like <em>race/ethnicity</em> and <em>religious affiliation</em>.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It! XX</p>

</header>
<div class="textbox__content">

Try to think of some other bivariate associations on your own. Start with something simple, like asking yourself whether you commonly encounter some characteristic alongside a specific other characteristic; e.g., are dark-haired people more likely to have brown eyes, while blond people are more likely to have blue eyes? (Or: are the combinations dark hair/brown eyes and blond hair/blue eyes more common than dark hair/blue eyes and blond hair/brown eyes? Is hair colour related to -- associated with -- eye colour?) Etc.

</div>
</div>
Now that you're more familiar with the associations vocabulary, let's clarify the typology of variable associations. <strong>There are two substantively different types of variable associations: <em>statistical </em>associations and <em>causal </em>associations.</strong> Claiming a causal association between variables is stronger than the claim for statistical association. Further, <strong>having a statistical association between two variables is a prerequisite for claiming a causal association between them -- a prerequisite that is a <em>necessary but not sufficient</em> condition</strong> at that.

Statistical inference provides tests for establishing statistical association; I'll introduce you to some basics of these in the remaining chapters. Establishing causality, however, takes statistical association as only a starting point, as you will see in Section XX. <strong>Statistical associations are for the most part a <em>technical</em> matter -- causality, on the other hand, is based on <em>logic</em> </strong>and involves one's ability to consider (and account for) multiple variables' associations at the same time.

When two variables vary together, we simply can say they are <em>associated</em>; however, when we claim causality, we call one variable <em>the cause</em> (or <em>predictor</em>) and the other <em>the effect</em> (or <em>outcome</em>).

In summary, finding out whether two variables are statistically associated (i.e., whether some attributes of one variable tend to go with specific attributes of the other) is relatively easy. Claiming that one variable <em>affects</em> another (i.e., that changes in one variable produce/cause changes in the other), on the other hand, is not easy at all -- in the social world, it is quite difficult. But we'll get to that later.

For now, let's start with statistical associations and how to "find" them. To get there, we first need to take a brief trip to the (almost everyone's favourite) land of descriptive statistics, in order to learn to recognize potential statistical associations in the first place. We do that through bivariate description, i.e., by describing two variables together, considering them and their potential association at the same time.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>117</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 17:55:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 21:55:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-1-types-of-bivariate-associations]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>8.4. Errors of Inference</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/8-4-errors-of-inference/</link>
		<pubDate>Wed, 31 Oct 2018 22:02:10 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=122</guid>
		<description></description>
		<content:encoded><![CDATA[Making decisions about hypotheses is inference based on evidence and logic. Inference, however, doesn't come with a guarantee of being right -- in fact, it's guaranteed that being right all the time is impossible. All the evidence and logic in the world will not be enough to ensure 100% certainty of making the right decision, simply because of the probabilistic nature of statistical inference. As long as we work with samples to estimate populations, some amount of uncertainty will be unavoidable -- or it wouldn't be called inference.

Logically speaking, since we have <em>two</em> options given a null hypothesis (to reject or not to reject), we can make <em>two</em> types of mistakes. One is to be wrong about rejecting the null hypothesis, the other to be wrong about <em>not</em> rejecting it.

You might be rolling your eyes at this -- well<em> duh!</em> -- but bear with me: these really are the two types of statistical error, imaginatively called <em>Type I</em> and <em>Type II</em>.

<strong>If we reject a true null hypothesis, we commit a Type I error. If we fail to reject a false null hypothesis, we commit a Type II error.</strong> Before I even explain these further, make a mental note that since we <em>either</em> reject <em>or</em> fail to reject a null hypothesis -- one <em>or</em> the other -- <strong>at any given time we can make <em>only one</em> of the two types of errors.</strong> If you've rejected your null hypothesis, the <em>only</em> error you could have committed is Type I; if you've <em>not</em> rejected your null hypothesis, the <em>only</em> error you could have made is Type II.

The trick of course is that we never know if we have made an error or not. (If we knew, we wouldn't be making it in the first place, am I right?) We only know that the possibility that we've made an error exists. However, as with everything about inference we've discussed so far, what we can do is to quantify the uncertainty as best as we can.

Table XX summarizes the errors of inference based on the (unknown) real situation and the (uncertain) decision we've made about it, through an analogy of a criminal trial. The null hypothesis then stands for "innocent" (no effect/difference/association, etc.) while the alternative hypothesis stands for "guilty" (there is an effect/difference/association, etc.).

<em>Table XX Errors of Statistical Inference</em>
<table class="shaded" style="border-collapse: collapse;width: 100%" border="0">
<tbody>
<tr>
<td style="width: 48.8947%"></td>
<td style="width: 25.8911%;text-align: center"><strong>Reality: Guilty</strong></td>
<td style="width: 25.2141%;text-align: center"><strong>Reality: Innocent</strong></td>
</tr>
<tr>
<td style="width: 48.8947%"><strong>Reject H<sub>0</sub>: Innocent ⇒ Guilty Verdict</strong></td>
<td style="width: 25.8911%;text-align: center"><span style="color: #3366ff">Correct Decision (1-<em>β</em>)</span></td>
<td style="width: 25.2141%;text-align: center"><span style="color: #ff0000">Type I Error (<em>α</em>)</span></td>
</tr>
<tr>
<td style="width: 48.8947%"><strong>Fail to Reject H<sub>0</sub>: Innocent ⇒ Innocent Verdict</strong></td>
<td style="width: 25.8911%;text-align: center"><span style="color: #ff0000">Type II Error (<em>β</em>)</span></td>
<td style="width: 25.2141%;text-align: center"><span style="color: #3366ff">Correct Decision </span></td>
</tr>
</tbody>
</table>
Recall from Section XX that to reject the null hypothesis, we had to have a test with a <em>p</em>-value lower than the pre-selected level of significance <em>α</em>,  i.e., <em>p≤α</em>. The level of significance amounted essentially to how much probability of being wrong we were able to tolerate (so as long as the probability of having the observations we did given a true null hypothesis -- i.e., the p-value -- was less than that, we would be fine).

Now consider that I just defined Type I error as the probability that we're wrong about rejecting a true null hypothesis -- and <em>ta-dam!</em> -- <strong>Type I error is exactly equal to <em>α, </em>the significance level</strong>! The great thing about it is that it's not only precise, it's also utterly under our control as <em>we</em> are the ones to decide how much error (regarding "convicting an innocent") we want to tolerate. If we want a smaller such chance, we can just raise the bar, as it were -- so that only the smallest <em>p</em>-values can pass under the lowest possible <em>α</em>[footnote]Make sure you don't confuse <em>p</em> and <em>α</em>, especially in that <em>p</em> doesn't show the probability of being wrong. Even the significance level is not the <em>true</em> error rate (Sellke, Bayarri &amp; Berger, 2001), more on which you can find <a href="https://blog.minitab.com/blog/adventures-in-statistics-2/how-to-correctly-interpret-p-values">here</a> if you're curious.[/footnote].

On the other hand, <strong>when we fail to reject a false null hypothesis </strong>(i.e., when we "let a guilty person go free as if innocent"), <strong>we make a Type II error, called <em>β</em></strong>[footnote]The lower-case Greek letter <em>b</em> is <em>β</em>, pronounced ['BEI-tuh].[/footnote] At the same time, as you can see in Table XX, <strong>the probability to correctly reject a false null hypothesis is a neat </strong><em><strong>1-β</strong> </em>(after all, the decision has only two options), <strong>known as the <em>power</em> of the test</strong>.

Unfortunately, there is no way for us to directly control <em>β</em>; your best bet is to have a large sample size, which increases the test's power (to detect an effect/difference/"guilt" where it truly exists) and thus indirectly decreases <em>β</em>.

<em>Well, then</em>, you might logically ask, <em>why don't we just decrease both Type I and Type II errors? </em>I'm afraid you can't do that: <strong>Type I and Type II errors are opposites, and as such there is a trade-off between them.</strong> Think about it: if you hate the thought of convicting an innocent, and say you'd never do it, you'll end up deciding "innocence" all the time, thus inevitably at some point letting a criminal go. If you decide that you hate letting criminals go, you can convict everyone, but then of course, you'll inevitably end up convicting an innocent.

In other words, the harder you make it to reject a null hypothesis/"to convict" (by making <em>α</em> the lowest possible), the higher the chances you'll commit Type II error, failing to reject a false null hypothesis (and you'll let a criminal slip free). The easier you make it to reject a null hypothesis/"to convict" (by making <em>α</em> as high as you want), of course the higher the odds of committing Type I error, rejecting a true null hypothesis (and convicting an innocent).

In summary, the errors of inference are unavoidable: every time we make a decision about the null hypothesis one way or the other, we run the risk of making <em>one</em> of the statistical errors. With a careful selection of <em>α</em> and a comfortably large sample size, making an error shouldn't worry you too much -- but do not forget that it is a distinct possibility.
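To make the α/β trade-off concrete, here is a quick sketch in Python (my own illustration, not part of the text; the function name <code>beta</code> and the effect size of 2 standard errors are assumptions chosen for the demonstration). It computes the Type II error rate of a two-sided <em>z</em>-test for several significance levels, showing that as <em>α</em> shrinks, <em>β</em> grows:

```python
# Illustrative sketch: the trade-off between Type I error (alpha) and
# Type II error (beta) for a two-sided z-test, stdlib only.
from statistics import NormalDist

Z = NormalDist()  # the standard normal distribution

def beta(alpha, delta):
    """Type II error rate of a two-sided z-test at level alpha,
    when the true effect sits delta standard errors away from H0."""
    z_crit = Z.inv_cdf(1 - alpha / 2)  # the rejection cutoff for this alpha
    # Probability the test statistic lands in the "fail to reject" zone:
    return Z.cdf(z_crit - delta) - Z.cdf(-z_crit - delta)

delta = 2.0  # an assumed true effect of 2 standard errors
for alpha in (0.10, 0.05, 0.01):
    b = beta(alpha, delta)
    print(f"alpha={alpha:.2f}  beta={b:.3f}  power={1 - b:.3f}")
```

Lowering <em>α</em> from 0.10 to 0.01 roughly doubles <em>β</em> here: making it harder "to convict" makes it easier to "let a criminal go free", exactly as described above.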

I end this chapter with a warning.
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch out!!</strong></span>... for Mixing Up Your Error Concepts</em></p>

</header>
<div class="textbox__content">

You might recall that, the statistical errors presented in this chapter aside, we discussed two other error concepts, the <em>random error</em> and the <em>standard error</em>. Make a note about all three:<strong> 1) the random error, 2) the standard error, and, 3) the Type I error and Type II error of statistical inference are all different concepts.</strong>

As a brief reminder, the random error is an inevitable corollary of sampling and reflects the fact that a sample is different from the population from which it was taken; the standard error is simply a formula for the standard deviation of the sampling distribution; and finally, the Type I and Type II statistical errors apply to decisions about the null hypothesis during testing.

</div>
</div>
Now that you know how hypothesis testing works <em>in principle</em>, let's get us some variables' associations tested with their appropriate tests, in Chapter 9 and Chapter 10.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>122</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:02:10]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:02:10]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[8-4-errors-of-inference]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>1051</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-4-errors-of-inference]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[8-3-errors-of-inference]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>9.2 Between Two Discrete Variables: The χ2</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/9-2-the-chi-square/</link>
		<pubDate>Wed, 31 Oct 2018 22:05:15 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=126</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

As in the previous section, here you need to recall how we examine a potential association between two variables, both treated as discrete (Section 7.2.2). We described such associations through contingency tables, reporting differences of proportions as appropriate.

We can start with the simplest, binary case: when the discrete variables have two groups each. Then we compare the groups of interest (categories of one variable) on one of the categories of the other variable. (The example we used in Chapter 7 compared the percentage of first-year students who like the campus cafeteria to the percentage of second-year students who do.)

<strong>The <em>t</em>-test for testing difference of <em>two</em> proportions.</strong> When we have only two proportions (or percentages) to compare, we can actually use the same <em>t</em>-test we used for testing differences of means, again treating the<em> difference</em> as a single, normally distributed statistic. Since we have categorical variables, however, and no standard deviations/variances, we resort to measuring population variability by π(1-π) and sample variability by <em>p</em>(1-<em>p</em>)[footnote]Don't forget that <em>p</em> here stands for <em>proportion</em>, not <em>probability/p</em>-<em>value</em>.[/footnote] (see Section XX). We can thus simply substitute that into the formula for <em>z</em>:

$z=\frac{(p_1 -p_2)-(\pi_1 -\pi_2 )}{\sqrt{\frac{\pi_1(1-\pi_1)}{N_1}+\frac{\pi_2(1-\pi_2)}{N_2}}}$

where, of course, under the null hypothesis $(\pi_1 -\pi_2 )=0$. Then, using the sample proportions leaves us with <em>t</em>:

$t=\frac{(p_1 -p_2)}{\sqrt{\frac{p_1(1-p_1)}{N_1}+\frac{p_2(1-p_2)}{N_2}}}$

Again, under the null hypothesis the two groups' proportions are assumed to be the same so effectively we have:

$t=\frac{(p_1 -p_2)}{\sqrt{p(1-p)(\frac{1}{N_1}+\frac{1}{N_2})}}$
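The pooled formula above translates almost word for word into code. Here is a minimal Python sketch (mine, not part of the text; the function name <code>two_prop_t</code> is an illustrative choice):

```python
# A minimal sketch of the pooled two-proportion test statistic
# given by the formula above.
from math import sqrt

def two_prop_t(p1, n1, p2, n2):
    """t (strictly, z) statistic for H0: pi1 = pi2, estimating the
    common variability p(1-p) with the pooled sample proportion p."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)          # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # pooled standard error
    return (p1 - p2) / se
```

Note that with identical sample proportions the statistic comes out to exactly 0 -- just what the null hypothesis would lead us to expect.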

Let's revisit the cafeteria-preferences example from Section 7.2.2 to see how the <em>t</em>-test for testing difference of proportions works.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Do You Like the Campus Cafeteria? (A t-Test)</em></p>

</header>
<div class="textbox__content">

In Chapter 7 we imagined that you asked 35 students in your class[footnote]Note that this of course is not a random sample; we're using it here only for illustrating how hypothesis testing works so we are effectively pretending it's random. In a real-life study, you shouldn't use non-probability samples for statistical inference.[/footnote] whether they liked the campus cafeteria: 12 of your classmates said yes (i.e., 34.3%), 7 (out of 15) first-years and 5 (out of 20) second-years (46.7% of all first-years and 25% of all second-years, respectively).

We want to know whether the difference in proportions observed in the sample (0.467-0.25=0.217) is statistically significant: can it be generalized to a larger student population, or is it due to regular sampling variability?
<ul>
 	<li>H<sub>0</sub>: The proportion of first year students who like the cafeteria is the same as the proportion of second year students who do; $\pi_1=\pi_2$.</li>
 	<li>H<sub>a</sub>: The proportion of first year students who like the cafeteria is different than the proportion of second year students who do; $\pi_1\neq\pi_2$.</li>
</ul>
Substituting these numbers in the formula we have:

$t=\frac{(p_1 -p_2)}{\sqrt{p(1-p)(\frac{1}{N_1}+\frac{1}{N_2})}}=\frac{0.467-0.25}{\sqrt{0.343(1-0.343)(\frac{1}{15}+\frac{1}{20})}}=\frac{0.217}{0.162}=1.34$

<strong>With a <em>t</em>=1.34, <em>df</em>=34, and <em>p</em>=0.189 (i.e., <em>p</em>&gt;0.05) we <em>fail</em> to reject the null hypothesis: at this point we don't have enough evidence to conclude there is a difference between the proportions of first and second year students who like the campus cafeteria. The 21.7 percentage points difference is not statistically significant, and has a high enough probability of being due to random chance</strong>.

We can check this with a confidence interval too:
<ul>
 	<li>95% CI: $(p_1 -p_2)\pm1.96\times\sqrt{\frac{p_1(1-p_1)}{N_1}+\frac{p_2(1-p_2)}{N_2}}=0.217\pm1.96\times\sqrt{\frac{0.467(0.533)}{15}+\frac{0.25(0.75)}{20}}=0.217\pm0.316=(-0.099; 0.533)$</li>
</ul>
<strong>In other words, the difference between the proportion of first years and the proportion of second years who like the cafeteria could be anywhere between -9.9 percentage points and 53.3 percentage points with 95% confidence (or 19 out of 20 such samples will have a difference within this pretty large interval).</strong> The difference can be in favour of second years or in favour of the first years (notice the negative lower bound); it can even be 0. Thus, <strong>since a difference of 0 (i.e., no difference) is a plausible value, we cannot reject the null hypothesis. We conclude that we don't have enough evidence of an association between year of study and opinion on the campus cafeteria.</strong>

</div>
</div>
Admittedly, the formulas look scary, but if you've followed through the example above, you have seen by now that the actual calculation is quite simple. You can try it out and see for yourself.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It!! XX Vegetarianism/Veganism among Canadian and International Students</p>

</header>
<div class="textbox__content">

Imagine you're interested in exploring whether there is a difference between Canadian and international students in your university when it comes to dietary preferences like vegetarianism and veganism. With your institution's registrar's assistance, you take a random sample of 100 students and poll them on 1) whether they are a Canadian or an international student, and 2) whether they are vegetarian/vegan or not.

You find that you have 70 Canadian and 30 international students in your sample. Out of the Canadian students, 15 (or 21.4%) are vegetarian or vegan; out of the international students 5 (or 16.7%) have such dietary restrictions.

Check if the observed <em>in the sample</em> difference in proportions is generalizable to the larger student population by testing the hypothesis whether dietary preferences are associated with country of origin. Create a 95% confidence interval for that difference, and substantively interpret what you have found with both the t-test and the confidence interval.

Useful hint 1: Among the 100, there are 20 vegan/vegetarian students in total.

Useful hint 2: You can find the <em>p</em>-value of your <em>t</em>-statistic <a href="https://www.socscistatistics.com/pvalues/tdistribution.aspx">here</a>.

</div>
</div>
Of course, discrete variables don't have to be binary: they can have more than two categories each. Just like the case of an association between a continuous and a discrete variable discussed in the previous section, where non-binary variables required the use of an <em>F</em>-test, there is a different test for the association between any two discrete variables, regardless of their respective number of categories (i.e., not just binary ones).

<strong>The <em>χ<sup>2</sup></em>-test for testing associations between discrete variables. </strong>The <em>χ<sup>2</sup></em>-test[footnote]This is the lower-case Greek letter <em>chi</em>, <em>χ</em>. <em>It is pronounced [KHAI]</em>, but since it's transliterated as <em>chi</em>, many people incorrectly pronounce it as [CHAI] or even [CHEE]. The test itself is called the chi-square test (again, pronounced [KHAI-squared], not [CHAI- or CHEE-squared]).[/footnote] (or Pearson's <em>χ<sup>2</sup></em>-test) is based on <strong>a comparison between the <em>observed</em> and the <em>expected</em> cell values in a contingency table.</strong>

The observed values are the cell counts you see in a contingency table given a specific dataset. The expected values, on the other hand, are the counts we would <em>expect</em> to see <em>if there were no pattern/association in the data</em>. In other words, the test effectively compares the sample to a null-hypothesis-like hypothetical distribution of the observations across the cells. Thus, logically, <strong>if there is a relatively large difference between the observed and the expected values, we can take that as evidence against the null hypothesis and reject it. If, however, the difference between observed and expected values is relatively small, the evidence against the null hypothesis will be insufficient and we would <em>fail</em> to reject it.</strong>

The actual way the <em>χ<sup>2</sup></em><sup> </sup>is calculated is this:

$$\chi^2=\Sigma\frac{(f_o -f_e)^2}{f_e}$$

where <em>f<sub>o</sub></em> is the observed frequency (count) and <em>f<sub>e</sub></em> is the expected frequency count of a given cell.

The formula looks more complicated than it is (don't they always?) -- it only asks us to calculate the difference between the observed and the expected count <em>for each cell</em>, square it and divide it by the expected count; once we have done this for all cells, we need only add the resulting numbers together to get the <em>χ<sup>2</sup></em> .

Considering that the <em>χ<sup>2</sup></em><sup> </sup>is then a sum of as many numbers as there are cells, the larger the table (i.e., the more rows and columns there are), the bigger the resulting <em>χ<sup>2</sup></em><sup> </sup>will be. To account for that, the <em>χ<sup>2</sup></em><sup> </sup>too has degrees of freedom, where the <em>df</em>=(<em>rows</em>-1)(<em>columns</em>-1). The <em>χ<sup>2 </sup></em>follows a <em>χ<sup>2</sup>-</em>distribution, which too provides a <em>p</em>-value given specific <em>df</em>.

<strong>The hypothesis testing then follows the same steps as the <em>t</em>-test and the <em>F</em>-test: obtain the <em>χ<sup>2</sup></em>-value with its specific <em>df</em>, find its associated <em>p</em>-value, and finally compare the <em>p</em>-value to the pre-selected significance level. If <em>p≤α</em>, reject the null hypothesis.</strong>
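The cell-by-cell sum described above is simple enough to sketch in a couple of lines of Python (an illustration of my own, not from the text; <code>chi_square</code> and <code>table_df</code> are names I've made up for the occasion):

```python
# Pearson's chi-squared from flattened lists of observed and
# expected cell counts, plus the df rule for an r-by-c table.
def chi_square(observed, expected):
    """Sum over all cells of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

def table_df(rows, cols):
    """Degrees of freedom for an r-by-c contingency table."""
    return (rows - 1) * (cols - 1)
```

When every observed count equals its expected count, the sum is 0 -- no evidence at all against the null hypothesis; the statistic grows as the observed cells drift away from the expected ones.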

To demonstrate, I'll first show you a <em>one-way</em> <em>χ<sup>2</sup></em><sup> </sup>calculation, i.e., based on the frequency distribution of just one variable. (Of course, if tabulated, this would not be considered a contingency table but a frequency table.)
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Do You Like The Campus Cafeteria? (Univariate χ<sup>2</sup>-Test)</em></p>

</header>
<div class="textbox__content">

To use the imaginary data from before, we had 12 people who admitted liking the campus cafeteria food out of the 35 polled. (Since we're interested only in one of the variables, here we ignore whether the students who like the cafeteria are first- or second-years.) As such, we have the following table:

<em>Table XX Approval of the Campus Cafeteria, Observed Count (Univariate)  </em>
<table class="lines" style="border-collapse: collapse;width: 50.2841%;height: 85px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>Yes</strong></td>
<td style="width: 2.83286%;height: 15px">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>No</strong></td>
<td style="width: 2.83286%;height: 15px">23</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>Total</strong></td>
<td style="width: 2.83286%;height: 15px">35</td>
</tr>
</tbody>
</table>
If you didn't know anything about the campus cafeteria and had no observations about it whatsoever -- i.e., had you been an impartial observer, as it were -- wouldn't you expect to see an approximately 50/50 split of the 35 students into the two categories? After all, there are only two groups, and an unbiased (random) distribution would be exactly like everyone flipping a coin as a manner of deciding in which group they end up. Thus, <strong>the expected count here is simply N divided by the number of groups/categories</strong> (denoted by <em>k</em>):

$f_e=\frac{N}{k}=\frac{35}{2}=17.5$

Table XX adds the expected count in brackets next to the observed count.

<em>Table XX Approval of the Campus Cafeteria, Observed and Expected</em> <em>Count (Univariate)</em>
<table class="lines" style="border-collapse: collapse;width: 50.2841%;height: 85px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>Yes</strong></td>
<td style="width: 2.83286%;height: 15px">12    (17.5)</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>No</strong></td>
<td style="width: 2.83286%;height: 15px">23    (17.5)</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px"><strong>Total</strong></td>
<td style="width: 2.83286%;height: 15px">35</td>
</tr>
</tbody>
</table>
Then, according to the formula, this is what we have for each of the two groups:
<ul>
 	<li>Yes-group: $\frac{(f_o-f_e)^2}{f_e}=\frac{(12-17.5)^2}{17.5}=\frac{30.25}{17.5}=1.73$</li>
 	<li>No-group: $\frac{(f_o-f_e)^2}{f_e}=\frac{(23-17.5)^2}{17.5}=\frac{30.25}{17.5}=1.73$</li>
</ul>
Finally, to get the<em> χ<sup>2</sup> </em>we only need to add these two numbers together:

$\chi^2=\Sigma\frac{(f_o -f_e)^2}{f_e}= \frac{(12-17.5)^2}{17.5}+\frac{(23-17.5)^2}{17.5}=1.73+1.73=3.46$

The degrees of freedom in a one-way test are <em>k</em>-1, where <em>k</em> is the number of categories/groups. In this case we have <em>k</em>=2, so <em>df</em>=1.

<strong>With a <em>χ<sup>2</sup></em>=3.46, <em>df</em>=1</strong>,<strong> and a <em>p</em>=0.06</strong>[footnote]You can check the significance of any <em>χ<sup>2</sup></em> with a convenient online calculator, like this one <a href="https://www.socscistatistics.com/pvalues/chidistribution.aspx">here</a>.[/footnote] (i.e., <em>p</em>&gt;0.05), <strong>we fail to reject the null hypothesis. At this time, we do<em> not</em> have enough evidence to conclude that the observed distribution of the students is unusual enough to suggest a pattern which is different than a random variation of a 50/50 split. As such, this distribution is <em>not</em> statistically significant -- we cannot conclude that the students lean one way or the other in their opinion about the campus cafeteria.</strong>

</div>
</div>
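If you'd like to double-check the one-way arithmetic, the whole calculation fits in three lines of Python (again a sketch of my own, not part of the text):

```python
# One-way chi-squared for the cafeteria poll: 12 "yes" vs 23 "no",
# with an expected 50/50 split of the 35 students.
n, k = 35, 2
fe = n / k                                        # expected count, 17.5
chi2 = sum((fo - fe) ** 2 / fe for fo in (12, 23))
print(round(chi2, 2))  # 3.46, as in the example
```

With <em>df</em>=<em>k</em>-1=1, this is the same 3.46 whose <em>p</em>-value the example looks up.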
Calculating a two-way <em>χ<sup>2 </sup></em>-- by far the more often used one as it tests associations between two variables -- is just as easy, even if it involves calculating more numbers (since in the bivariate case we have more cells; four at the minimum, given a 2x2 cross-tabulation).
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Do You Like The Campus Cafeteria? (Bivariate χ<sup>2</sup>-Test)</em></p>

</header>
<div class="textbox__content">

While we already know from the <em>t</em>-test in Example XX that year of study and opinion on the campus cafeteria are not statistically associated, I will further use the imaginary data in the original contingency table from Example XX to demonstrate a two-way <em>χ<sup>2</sup></em>-test. This was the table we had in Section 7.2.2.

Table XX <em>Do You Like The Campus Cafeteria? (Revisited)</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 60px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"></td>
<td style="width: 30.8403%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 29.8917%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 36.8762%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> YES</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">7</td>
<td style="width: 29.8917%;height: 15px;text-align: center">5</td>
<td style="width: 36.8762%;height: 15px;text-align: center">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> NO</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">8</td>
<td style="width: 29.8917%;height: 15px;text-align: center">15</td>
<td style="width: 36.8762%;height: 15px;text-align: center">23</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">15</td>
<td style="width: 29.8917%;height: 15px;text-align: center">20</td>
<td style="width: 36.8762%;height: 15px;text-align: center">35</td>
</tr>
</tbody>
</table>
Our hypotheses are:
<ul>
 	<li>H<sub>0</sub>: Liking the cafeteria or not is not associated with one's year of study; first- and second-year students are equally likely to like the cafeteria, or<em> π<sub>1</sub>=π<sub>2</sub></em>.</li>
 	<li>H<sub>a</sub>: Liking the cafeteria is associated with one's year of study; first-year students and second-year students differ in their liking of the cafeteria, or <em>π<sub>1</sub>≠π<sub>2.</sub></em></li>
</ul>
To compute the <em>χ<sup>2</sup></em>, we need the expected count for each cell. Unlike the one-way <em>χ<sup>2</sup></em> case, however, determining the expected count in a contingency table is a bit more complicated than dividing N by the number of groups and expecting the same (expected) number in each cell. Instead, we multiply the respective group/category sizes (i.e., the row total and the column total at the margins) and divide the product by N (the full total)[footnote]We do that to account for the different group/category sizes.[/footnote]:

$f_e=\frac{N_j\times N_k}{N}$

where <em>N<sub>j</sub></em> is the size of the respective group and <em>N<sub>k</sub></em> is the size of the respective category[footnote]Recall that to differentiate between the groups/categories of the two variables, we refer to one variable having groups and the other having categories: so that we can say we compare the groups of one variable on the categories of the other.[/footnote].

Thus we have the following:
<ul>
 	<li>First-years who said "Yes": $f_e=\frac{N_j\times N_k}{N}=\frac{15\times 12}{35}=5.14$</li>
 	<li>Second-years who said "Yes": $f_e=\frac{N_j\times N_k}{N}=\frac{20\times 12}{35}=6.86$</li>
 	<li>First-years who said "No": $f_e=\frac{N_j\times N_k}{N}=\frac{15\times 23}{35}=9.86$</li>
 	<li>Second-years who said "No": $f_e=\frac{N_j\times N_k}{N}=\frac{20\times 23}{35}=13.14$</li>
</ul>
Table XX adds the expected count in brackets next to the observed count.

Table XX <em>Do You Like The Campus Cafeteria? (Observed and Expected Frequencies)</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 60px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"></td>
<td style="width: 30.8403%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 29.8917%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 36.8762%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> YES</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">7   (5.14)</td>
<td style="width: 29.8917%;height: 15px;text-align: center">5   (6.86)</td>
<td style="width: 36.8762%;height: 15px;text-align: center">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> NO</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">8   (9.86)</td>
<td style="width: 29.8917%;height: 15px;text-align: center">15   (13.14)</td>
<td style="width: 36.8762%;height: 15px;text-align: center">23</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">15</td>
<td style="width: 29.8917%;height: 15px;text-align: center">20</td>
<td style="width: 36.8762%;height: 15px;text-align: center">35</td>
</tr>
</tbody>
</table>
Now we only need to calculate the four elements of the <em>χ<sup>2</sup></em> and add them all together at the end.
<ul>
 	<li>First-years who said "Yes": $\frac{(f_o-f_e)^2}{f_e}=\frac{(7-5.14)^2}{5.14}=0.67$</li>
 	<li>Second-years who said "Yes": $\frac{(f_o-f_e)^2}{f_e}=\frac{(5-6.86)^2}{6.86}=0.5$</li>
 	<li>First-years who said "No": $\frac{(f_o-f_e)^2}{f_e}=\frac{(8-9.86)^2}{9.86}=0.35$</li>
 	<li>Second-years who said "No": $\frac{(f_o-f_e)^2}{f_e}=\frac{(15-13.14)^2}{13.14}=0.26$</li>
</ul>
Finally,

$\chi^2=\Sigma\frac{(f_o -f_e)^2}{f_e}=0.67+0.5+0.35+0.26=1.78$

The degrees of freedom are, again, <em>df</em>=(<em>rows</em>-1)(<em>columns</em>-1), so here <em>df</em>=(2-1)(2-1)=1(1)=1.

That is, <strong>with <em>χ<sup>2</sup></em>=1.78, <em>df</em>=1, and <em>p</em>=0.18 (i.e., <em>p</em>&gt;0.05), we do <em>not</em> have enough evidence to reject the null hypothesis. At this time, we <em>cannot</em> claim there is an association between year of study and opinion on the cafeteria, i.e., the 0.217 difference in proportions we observe in the sample (7/15 versus 5/20, or 0.467 versus 0.25) is <em>not</em> statistically significant.</strong>
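For readers who want to double-check the whole calculation in a few lines of code, here is a sketch of my own (not part of the example); it uses only Python's standard library, relying on the fact that for <em>df</em>=1 the chi-square right-tail probability reduces to erfc(√(x/2)):

```python
# Recomputing the chi-square statistic and its p-value for the cafeteria table.
from math import erfc, sqrt

observed = {("yes", "first"): 7, ("yes", "second"): 5,
            ("no", "first"): 8, ("no", "second"): 15}
expected = {("yes", "first"): 5.14, ("yes", "second"): 6.86,
            ("no", "first"): 9.86, ("no", "second"): 13.14}

chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
p = erfc(sqrt(chi_sq / 2))  # right-tail p-value, valid for df=1 only
# chi_sq comes out near 1.79 (summing the rounded terms gives the text's 1.78);
# p is about 0.18, well above 0.05, so we fail to reject the null hypothesis.
```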

Of course, we already knew this from the <em>t</em>-test in Example XX[footnote]You may find it curious to know that the correspondence of results between the <em>t</em> and the <em>χ<sup>2</sup></em> goes even further: in the binary variables' case, squaring the <em>t</em>-value will give you exactly the <em>χ<sup>2</sup></em>: <em>t<sup>2</sup>=χ<sup>2</sup></em>. In our examples, <em>t</em>=1.34, and 1.34<sup>2</sup>=1.79 which, were it not for rounding, would be the same as <em>χ<sup>2</sup></em>. Even their respective degrees of freedom are the same, 1. This of course isn't the case when at least one of the discrete variables has more than two categories.[/footnote], so no surprises here.
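The footnote's <em>t<sup>2</sup>=χ<sup>2</sup></em> correspondence can also be checked numerically. Here is a small sketch of my own, using the normal (<em>z</em>) approximation, for which the identity is exact: the two-tailed p-value of <em>z</em> equals the right-tail chi-square (<em>df</em>=1) p-value of <em>z</em> squared.

```python
# Illustrating the squared-statistic correspondence with standard-library math:
# two-tailed normal tail of z  ==  chi-square (df=1) right tail of z**2.
from math import erfc, sqrt

z = 1.34                               # the t-value from Example XX
p_two_tailed = erfc(abs(z) / sqrt(2))  # two-tailed normal p-value
p_chi_sq = erfc(sqrt(z ** 2 / 2))      # chi-square df=1 right tail of z squared
# Both are about 0.18, mirroring chi-square = 1.78 with p = 0.18 above.
```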

</div>
</div>
The imaginary example above serves well as a walk-through for calculating the <em>χ<sup>2</sup></em>, but we can do better -- an example using real, random-sample data and a large <em>N</em> is in order.

If you recall, in Section 7.7.2 we explored gender differences in the ability to speak an Aboriginal language using <em>APS 2012</em> (Statistics Canada, 2019) data. Armed with knowledge about the <em>χ<sup>2</sup></em>, now we can finish that investigation.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Testing Gender Differences in Speaking Aboriginal Language Ability among Indigenous Canadians, APS 2012</em></p>

</header>
<div class="textbox__content">

Our exploration in Section 7.7.2 left us with the following table.

Table XXB <em> Speaking Aboriginal Language Ability by Gender, APS 2012</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language-percent.jpg" alt="" width="582" height="260" class="wp-image-999 size-full aligncenter" />

Source: Statistics Canada (2019).

Our hypotheses are:
<ul>
 	<li>H<sub>0</sub>: Gender and the ability to speak an Aboriginal language are not associated; women and men are equally likely to speak an Aboriginal language, or <em>π<sub>f</sub>=π<sub>m</sub></em>.</li>
 	<li>H<sub>a</sub>: Gender and the ability to speak an Aboriginal language are associated; women and men are not equally likely to speak an Aboriginal language, or <em>π<sub>f</sub>≠π<sub>m</sub></em>.</li>
</ul>
SPSS calculates <em>χ<sup>2</sup></em> as 31.78. <strong>With <em>χ<sup>2</sup></em>=31.78, <em>df</em>=1, and <em>p</em>&lt;0.001, we have enough evidence to reject the null hypothesis and conclude that Indigenous women and men tend to differ in their ability to speak an Aboriginal language. The 3.6 percentage-point difference (i.e., 45%-41.4%) in favour of women being more likely to speak an Aboriginal language is statistically significant and therefore generalizable to the larger Indigenous population.</strong>

</div>
</div>
I "cheated" out of presenting the actual calculations in the example above to give you the opportunity to do them on your own. Use it as an exercise in practicing your understanding of the <em>χ<sup>2</sup></em> and <em>t</em> statistical significance tests.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It!! XX <em>Testing Gender Differences in Speaking Aboriginal Language Ability among Indigenous Canadians, APS 2012</em></p>

</header>
<div class="textbox__content">

Using the information presented in Table XX, 1) calculate the expected frequencies for each cell and compute the <em>χ<sup>2</sup></em>; and 2) do a <em>t</em>-test on the difference of proportions and create a 95% confidence interval for the difference, to observe the correspondence between the different tests.

</div>
</div>
Finally, lest I leave you with the impression that there is no difference between using a t-test and a <em>χ<sup>2</sup></em>-test, let's consider a case where both variables have more than two categories. We definitely need to use the <em>χ<sup>2 </sup></em>for it, as we no longer have two proportions to consider.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Citizenship and Education, NHS 2011</em></p>

</header>
<div class="textbox__content">

A lot has been written about Canada's selective immigration practices: the Canadian government is committed to getting "the best and the brightest" immigrants through a point system which awards more points the more education the prospective immigrant has. [CITATIONS] Be that as it may, how does the rest of the Canadian population (those born in Canada) compare to the supposedly highly-educated foreign-born? With the help of <em>NHS 2011</em> (Statistics Canada, 2019), we can find out. (Note that once again, I'm using a roughly 3% random sub-sample of the data, for an <em>N</em>=21,577.)

For this example I use the variable <em>citizenship</em> which has three categories: born in Canada, naturalized Canadian, and not a Canadian citizen. For education, I use the same recoded variable I used in Example XX in the previous section, namely <em>degree</em>. Degree has six categories, ranging from (1) "no high school degree" to (6) "PhD" (for full category listing, see Example XX).

Table XX cross-tabulates citizenship and degree in a busy-looking 3x6 table (that's 18 cells!).

<em>Table XX Degree by Canadian Citizenship Status</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/crosstab-degree-and-citizenship-nhs.png" alt="" width="737" height="646" class="wp-image-1262 size-full aligncenter" />

Source: Statistics Canada (2019).

What do we see? Let's carefully examine the evidence[footnote]Don't forget to focus on the percentages, not the number count in each cell! Recall that you can only compare relative frequencies (relative to group size, that is).[/footnote]. While all citizenship groups follow a similar vertical "spread" (i.e., relatively few people without degrees, most people with high/secondary school and some post-secondary school certificates short of Bachelor's degrees, then decreasing proportions in the higher education categories), this isn't what we are interested in. Recall that we are looking for a pattern between the two variables' categories/groups -- we are comparing groups on their levels of education.

As such, we see that fewer naturalized Canadians (18.6%) and fewer still non-Canadian citizens (15.8%) have no degrees compared to the Canadian-born (20.4%). Furthermore, in the three highest education categories (Bachelor's, Master's, and PhD), both naturalized Canadians and non-Canadian citizens outperform those born in Canada (and the non-Canadian citizens even outperform naturalized Canadians in turn): 16.7% of naturalized Canadians and 20.5% of non-Canadian citizens have Bachelor's degrees compared to only 12.3% of Canadians born in the country; 11.2% of naturalized Canadians and 13.5% of non-Canadian citizens have Master's degrees compared to only 5.5% of those born in Canada; and, finally, 1% of naturalized Canadians and 1.4% of non-Canadian citizens have PhDs compared to 0.4% of those born in Canada.

Thus, the table suggests a pattern -- Canadians born elsewhere and non-Canadian citizens seem to have more education than the Canadian-born. Whether this pattern showing difference in proportions in the education degrees among the different citizenship status groups is statistically significant (i.e., generalizable to the Canadian population) remains to be checked -- through a <em>χ<sup>2</sup>-</em>test.

These are our hypotheses:
<ul>
 	<li>H<sub>0</sub>: Citizenship status and educational degree are not associated; Canadian-born, naturalized citizens, and non-Canadian citizens are on average similarly educated, and are equally likely to be highly educated.</li>
 	<li>H<sub>a</sub>: Citizenship status and educational degree are associated; Canadian-born, naturalized citizens, and non-Canadian citizens have different levels of education on average, and are not equally likely to be highly educated.</li>
</ul>
I would guess you'd rather not calculate the expected frequencies and their differences from the observed frequencies for all 18 cells (but if you want to do it, who am I to stop you), so I'll report the SPSS output instead.

<strong>With <em>χ<sup>2</sup>=</em>449.543, <em>df</em>=10</strong>[footnote]<em>Df</em>=(<em>rows</em>-1)(<em>columns</em>-1)=(6-1)(3-1)=5(2)=10.[/footnote]<strong>, and <em>p</em>&lt;0.001, we have enough evidence to reject the null hypothesis and conclude that citizenship status and educational degree are statistically significantly associated: people born in Canada, naturalized Canadians, and non-Canadian citizens differ in their levels of education.</strong> It seems indeed that Canadians born in the country are on average less educated than both naturalized Canadians and non-Canadian citizens, perhaps as a result of the selective criteria for Canadian immigration.

</div>
</div>
<strong>Important conditions for using the <em>χ<sup>2</sup></em>-test.</strong> For the <em>χ<sup>2</sup></em>-test to work properly, two conditions must be met: 1) the expected count should not be less than 1 for any of the contingency table cells; and 2) no more than 20% of the cells should have an expected count less than 5. SPSS warns you about violations of these conditions in the output; if you're not using SPSS, you should make sure the conditions are met before proceeding with analysis. Either way, if these conditions are not met, you shouldn't use the <em>χ<sup>2</sup></em>-test and should instead consider a different type of test (not discussed here).
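If you're working outside SPSS, these two conditions are easy to script. Here is an illustrative Python helper of my own (not an SPSS feature):

```python
# Checking the two chi-square test conditions on a table of expected counts:
# 1) no cell's expected count below 1; 2) at most 20% of cells below 5.
def chi_square_conditions_met(expected_counts):
    cells = [e for row in expected_counts for e in row]
    no_cell_below_1 = min(cells) >= 1
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return no_cell_below_1 and share_below_5 <= 0.20

# The cafeteria example's expected counts are all at least 5, so the test applies.
ok = chi_square_conditions_met([[5.14, 6.86], [9.86, 13.14]])  # True
```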

Finally, a brief word of warning.
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><strong><span style="color: #ff0000">Watch Out!!</span>... for Identifying The Wrong Pattern</strong></em></p>

</header>
<div class="textbox__content">

Once again, the warning is about how to read a contingency table <em>in light of an association between two variables</em>. The pattern (association) we are interested in and the one we test is a comparison between the groups of one variable on the categories of the other variable. Thus, looking at how the observations are divided within each group is only marginally relevant to the research question, and doesn't contribute to analyzing the association in question.

In Example XX above, all citizenship status groups were divided relatively similarly across the educational categories but, as interesting as you may find this "pattern", that is <em>not</em> an indication of an association -- comparing the percentages/proportions of the different groups in the same category is. In other words, in that example we were interested in whether there was a <em>difference in percentages/proportions among the Canadian-born, naturalized Canadians, and non-Canadian citizens</em> with no education, <em>or</em> with a high school degree only, <em>or</em> with some college degree or certificate only, <em>or</em> with a Bachelor's degree, etc. We were <em>not</em> interested in what percentage of Canadian-born (or naturalized, or non-Canadian citizens) have no degree, <em>and</em> what percentage have a high school degree, <em>and</em> what percentage have some college degree or certificate, etc. (if you recall, the latter add up to 100%, and can be referred to as how the observations are spread across categories <em>within</em> each group).

As in Section 7.2.2, what it comes down to is knowing which way to read the table, according to the research question you have.[footnote]I'll remind you again of the rule of thumb: if the groups you're comparing are in the columns, and the percentages down the columns add to 100%, then look at and compare the percentages/proportions on the same row. If the groups you're comparing are in the rows, and the rows add up to 100%, then compare the percentages down the same column.[/footnote] To use the language of causality, <strong>to the extent that you can identify an independent and a dependent variable, to examine an association between the variables you'll be looking to compare the groups of the independent variable on the categories of the dependent variable.</strong>

</div>
</div>
We finish the chapter with tips on using SPSS for <em>χ<sup>2</sup></em>-testing.
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip XX The χ<sup>2</sup>-test</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze</em>, and from the pull-down menu, click on <em>Descriptive Statistics</em> and then <em>Crosstabs</em>;</li>
 	<li>From the variable list on the left, select your variable (the independent variable, with groups to be compared) and, using the bottom arrow, move it to the <em>Column(s)</em> empty space on the right;</li>
 	<li>From the variable list on the left, select your variable (the dependent variable, on whose categories you'll compare the groups) and, using the bottom arrow, move it to the <em>Row(s)</em> empty space on the right[footnote]Again, the convention is to put the independent variable in the columns and the dependent variable in the rows. This is not a hard-set rule, however, and it's perfectly acceptable to do it the opposite way. The only thing that is <em>not</em> a matter of preference is for which percentages you should ask, <em>columns</em> or <em>rows</em>. <em>If your independent variable is in the columns, you need column percentages to compare, if your independent variable is in the rows, you need row percentages to compare.</em> In this latter case, this <em>is</em> a hard-set rule, and if you violate it, you will not be able to properly identify -- and test -- the association you might be investigating.[/footnote];</li>
 	<li>Click on <em>Statistics</em> and select <em>Chi-square</em> at the top of the new window, click <em>Continue</em>;</li>
 	<li>Once back in the<em> Crosstabs</em> window, click <em>Cells</em>; in the new window keep <em>Observed</em>[footnote]Note that from here you can also request <em>Expected</em> counts if you'd like to check them at any point.[/footnote] in <em>Counts</em> selected, and further select <em>Column</em> in <em>Percentages</em>; click <em>Continue</em>;</li>
 	<li>Once back to the <em>Crosstabs</em> window, click <em>OK.</em></li>
 	<li>SPSS will provide the requested output in the Output window: a contingency table followed by a <em>χ<sup>2</sup></em>-test table, containing the <em>χ<sup>2</sup></em>-value, <em>df</em>, and <em>p</em>-value.[footnote]Note that the table contains more than just the <em>χ<sup>2</sup></em>-test; discussing the rest of the tests is beyond the intended scope of this book.[/footnote]</li>
</ul>
</div>
</div>
With this, we turn to our last remaining topic: the testing and investigation of the association between two continuous variables in Chapter 10, next.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>126</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:05:15]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:05:15]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[9-2-the-chi-square]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>120</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.1. Correlation</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-1-correlation/</link>
		<pubDate>Wed, 31 Oct 2018 22:08:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=130</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

You'll recall from Section 7.2.3 that we use the coefficient of correlation (Pearson's) <em>r</em> to examine associations between two continuous variables. The correlation coefficient <em>r</em> varies between -1 and 1. The closer it is to either extreme, the stronger the correlation; the closer it is to 0, the weaker the correlation[footnote]The sign of <em>r</em> is there <em>only</em> to indicate the direction of the association, positive or negative, nothing else -- a reminder not to use <em>r</em>'s sign as a measure of the magnitude or strength of the association. For example, -0.9 is a stronger association than 0.2 because -0.9 is closer to -1 than 0.2 is to 1. (In fact, 0.2 is much closer to 0, or no association.) That is, a strong negative correlation is <em>stronger</em> than a weak positive one, despite that -0.9&lt;0.2.[/footnote].

Where does <em>r</em> come from though? What does it actually measure? I doubt you have lost sleep wondering about these questions which I left unanswered in Chapter 7, but here is your chance to learn this anyway (think of it as closure of sorts).

The correlation coefficient is, essentially, a ratio of the variabilities of the two variables[footnote]To be precise, the ratio is between the covariance of <em>x</em> and <em>y</em> (i.e., their joint variability, <em>s<sub>xy</sub></em>) and the product of their separate standard deviations <em>s<sub>x</sub></em> and <em>s<sub>y</sub></em>:

$$r=\frac{s_{xy}}{s_x s_y}$$ or

$$\rho=\frac{\sigma_{xy}}{\sigma_x \sigma_y}$$ if we apply it to a population instead of a sample. (Here <em>ρ</em> is the lower-case Greek letter <em>rho</em>, the equivalent of <em>r</em>, pronounced [ROH].)

[/footnote]. <strong>The easiest way to calculate <em>r</em> between a variable <em>x</em> and a variable <em>y</em> is through the distances of the observations from the means of the two variables, or, more accurately, through the sums of squares</strong>[footnote]Recall that the sum of squares was the numerator in the formulas for the variance and the standard deviation. We take the distances of the observations from the mean, square them, and then add them all together. (We square them <em>before</em> adding to turn them all positive; otherwise they'd cancel each other out upon summation. See Section XX for details.)[/footnote]:

$$r=\frac{\Sigma{(x-\overline{x})(y-\overline{y})}}{\sqrt{\Sigma{(x-\overline{x})^2}\Sigma{(y-\overline{y})^2}}}$$

From Section XX, we know that $\Sigma{(x-\overline{x})}^2$ is the sum of squares of the variable <em>x</em> (so, <em>SS<sub>x</sub></em>); by analogy, $\Sigma{(y-\overline{y})}^2$ will be the sum of squares of the variable <em>y</em> (so, <em>SS<sub>y</sub></em>). When the distances between an observation and the two means are "cross-multiplied" before summing (as in the numerator), they are called the sum of products (<em>SP<sub>xy</sub></em>).

Thus we can restate the formula above in the following simplified (and easier to remember) way[footnote]Note that other "versions" of the formula for <em>r</em> exist. All of them calculate the same <em>r</em>, just restated in different terms. The two "versions" presented in the text above are the simplest. For example, one of the most common ways to express <em>r</em> you may find elsewhere (but which is rather hard on the eyes and for purposes of calculation by hand) is this:

$$r=\frac{N\Sigma{xy}-\Sigma{x}\Sigma{y}}{\sqrt{(N\Sigma{x^2}-(\Sigma{x})^2)(N\Sigma{y^2}-(\Sigma{y})^2)}}$$

[/footnote]:

$$r=\frac{SP_{xy}}{\sqrt{SS_x SS_y}}$$

Example XX provides an empirical application of <em>r</em>'s calculation.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XXA Education and Parental Education, GSS 2018</em></p>

</header>
<div class="textbox__content">

Table XX lists the years of schooling (our variable <em>y</em>) of seven respondents in the <em>GSS 2018</em> (NORC, 2019) and the years of schooling of their respective fathers (our variable <em>x</em>)[footnote]Here <em>parental education</em> is the independent variable and <em>respondent's education</em> is the dependent variable, so they are denoted as <em>x</em> and <em>y</em>, respectively, according to convention. [/footnote]. While inference with <em>N</em>=7 is not a serious proposition, the small observation count allows for a quick calculation for demonstration purposes only. (After all, we already know the correlation coefficient of these exact same two variables from Section 7.2.3; there the SPSS-calculated <em>r</em>=0.413.)

The rest of the columns in Table XX list the necessary computations (obtaining distances from the mean, squaring distances, summing distances, etc.) to produce <em>SS<sub>x</sub>, SS<sub>y</sub></em>, and <em>SP<sub>xy</sub>.</em>

<em>Table XX Calculating Pearson's r</em>
<table style="border-collapse: collapse;width: 100%;height: 289px" border="0">
<tbody>
<tr style="height: 30px">
<td style="width: 1.41643%;height: 30px;text-align: center">$x$</td>
<td style="width: 2.31468%;height: 30px;text-align: center">$y$</td>
<td style="width: 21.1225%;height: 30px;text-align: center"> $(x-\overline{x})$</td>
<td style="width: 18.0203%;height: 30px;text-align: center">$(x-\overline{x})^2$</td>
<td style="width: 21.4724%;height: 30px;text-align: center">$(y-\overline{y})$</td>
<td style="width: 16.8352%;height: 30px;text-align: center">$(y-\overline{y})^2$</td>
<td style="width: 18.8183%;height: 30px;text-align: center"> $(x-\overline{x})(y-\overline{y})$</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">12</td>
<td style="width: 2.31468%;height: 15px;text-align: center">8</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(12-12.4) = -0.4</td>
<td style="width: 18.0203%;height: 15px;text-align: center">(-0.4)<sup>2 </sup>= 0.2</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(8-13.6) = -5.6</td>
<td style="width: 16.8352%;height: 15px;text-align: center">(-5.6)<sup>2 </sup>= 31.4</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(-0.4)(-5.6)=2.2</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">6</td>
<td style="width: 2.31468%;height: 15px;text-align: center">12</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(6-12.4) = -6.4</td>
<td style="width: 18.0203%;height: 15px;text-align: center">(-6.4)<sup>2 </sup>= 41</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(12-13.6) = -1.6</td>
<td style="width: 16.8352%;height: 15px;text-align: center">(-1.6)<sup>2 </sup>= 2.6</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(-6.4)(-1.6)=10.2</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">12</td>
<td style="width: 2.31468%;height: 15px;text-align: center">19</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(12-12.4) = -0.4</td>
<td style="width: 18.0203%;height: 15px;text-align: center">(-0.4)<sup>2 </sup>= 0.2</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(19-13.6) = 5.4</td>
<td style="width: 16.8352%;height: 15px;text-align: center">5.4<sup>2 </sup>= 29.2</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(-0.4)(5.4)=-2.2</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">16</td>
<td style="width: 2.31468%;height: 15px;text-align: center">16</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(16-12.4) = 3.6</td>
<td style="width: 18.0203%;height: 15px;text-align: center">3.6<sup>2 </sup>= 13</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(16-13.6) = 2.4</td>
<td style="width: 16.8352%;height: 15px;text-align: center">2.4<sup>2 </sup>= 5.8</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(3.6)(2.4)=8.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">15</td>
<td style="width: 2.31468%;height: 15px;text-align: center">12</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(15-12.4) = 2.6</td>
<td style="width: 18.0203%;height: 15px;text-align: center">2.6<sup>2 </sup>= 6.8</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(12-13.6) = -1.6</td>
<td style="width: 16.8352%;height: 15px;text-align: center">(-1.6)<sup>2 </sup>= 2.6</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(2.6)(-1.6)=-4.2</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">12</td>
<td style="width: 2.31468%;height: 15px;text-align: center">12</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(12-12.4) = -0.4</td>
<td style="width: 18.0203%;height: 15px;text-align: center">(-0.4)<sup>2 </sup>= 0.2</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(12-13.6) = -1.6</td>
<td style="width: 16.8352%;height: 15px;text-align: center">(-1.6)<sup>2 </sup>= 2.6</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(-0.4)(-1.6)=0.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">14</td>
<td style="width: 2.31468%;height: 15px;text-align: center">16</td>
<td style="width: 21.1225%;height: 15px;text-align: center">(14-12.4) = 1.6</td>
<td style="width: 18.0203%;height: 15px;text-align: center">1.6<sup>2 </sup>= 2.6</td>
<td style="width: 21.4724%;height: 15px;text-align: center">(16-13.6) = 2.4</td>
<td style="width: 16.8352%;height: 15px;text-align: center">2.4<sup>2 </sup>= 5.8</td>
<td style="width: 18.8183%;height: 15px;text-align: center">(1.6)(2.4)=3.8</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center"><img src="https://pressbooks.bccampus.ca/simplestats/wp-content/ql-cache/quicklatex.com-0d00c2da2b2541a97ae0ac3c10e1504e_l3.svg" alt="\overline{x}" /> = 12.4</td>
<td style="width: 2.31468%;height: 15px;text-align: center"><img src="https://pressbooks.bccampus.ca/simplestats/wp-content/ql-cache/quicklatex.com-01881adf9c51d256ce0a5af82c2e7024_l3.svg" alt="\overline{y}" /> = 13.6</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"><strong><em>SS<sub>x</sub></em>=63.7</strong></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"><strong><em>SS<sub>y</sub></em>=79.7</strong></td>
<td style="width: 18.8183%;height: 15px;text-align: center"><strong><em>SP<sub>xy</sub></em>=19.3</strong></td>
</tr>
</tbody>
</table>
Then, according to the formula for <em>r</em> we have:

$$r=\frac{SP_{xy}}{\sqrt{SS_x SS_y}}=\frac{19.3}{\sqrt{63.7\times79.7}}=\frac{19.3}{71.3}=0.271$$

Obviously, this <em>r</em>=0.271 is not the same as the SPSS-produced <em>r</em>=0.413 we had from Section 7.2.3; in fact, it would be very surprising if they were the same, considering the former is based on <em>N</em>=7 while the latter is based on <em>N</em>=1,687. The exact value of <em>r</em> in the above calculation (<em>r</em>=0.271) doesn't matter, doesn't serve any purpose, and shouldn't be interpreted; it exists only as the end result of our demonstration.
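If you'd rather let a few lines of code do Table XX's arithmetic, here is a minimal Python sketch of the sums-of-squares formula (the variable names are mine, purely illustrative):

```python
# Pearson's r via sums of squares: r = SP_xy / sqrt(SS_x * SS_y),
# reproducing the Table XX calculation on the seven GSS 2018 cases.
x = [12, 6, 12, 16, 15, 12, 14]   # father's years of schooling
y = [8, 12, 19, 16, 12, 12, 16]   # respondent's years of schooling

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
ss_x = sum((xi - mean_x) ** 2 for xi in x)                       # about 63.7
ss_y = sum((yi - mean_y) ** 2 for yi in y)                       # about 79.7
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # about 19.3

r = sp_xy / (ss_x * ss_y) ** 0.5  # about 0.271, matching the text
```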

</div>
</div>
Fancy trying it out on your own?
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!! XX Calculating Pearson's r</em></p>

</header>
<div class="textbox__content">

Here are seven more cases from the same <em>GSS 2018</em> dataset. Fill out the table fully and produce <em>r</em>.
<table style="border-collapse: collapse;width: 100%;height: 289px" border="0">
<tbody>
<tr style="height: 30px">
<td style="width: 1.41643%;height: 30px;text-align: center">$x$</td>
<td style="width: 2.31468%;height: 30px;text-align: center">$y$</td>
<td style="width: 21.1225%;height: 30px;text-align: center"> $(x-\overline{x})$</td>
<td style="width: 18.0203%;height: 30px;text-align: center">$(x-\overline{x})^2$</td>
<td style="width: 21.4724%;height: 30px;text-align: center">$(y-\overline{y})$</td>
<td style="width: 16.8352%;height: 30px;text-align: center">$(y-\overline{y})^2$</td>
<td style="width: 18.8183%;height: 30px;text-align: center"> $(x-\overline{x})(y-\overline{y})$</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">12</td>
<td style="width: 2.31468%;height: 15px;text-align: center">12</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">12</td>
<td style="width: 2.31468%;height: 15px;text-align: center">14</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">13</td>
<td style="width: 2.31468%;height: 15px;text-align: center">13</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">13</td>
<td style="width: 2.31468%;height: 15px;text-align: center">16</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">14</td>
<td style="width: 2.31468%;height: 15px;text-align: center">20</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">20</td>
<td style="width: 2.31468%;height: 15px;text-align: center">16</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">21</td>
<td style="width: 2.31468%;height: 15px;text-align: center">18</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"></td>
<td style="width: 18.8183%;height: 15px;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">$\overline{x}$=</td>
<td style="width: 2.31468%;height: 15px;text-align: center">$\overline{y}$=</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"><strong><em>SS<sub>x</sub></em>=</strong></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"><strong><em>SS<sub>y</sub></em>=</strong></td>
<td style="width: 18.8183%;height: 15px;text-align: center"><strong><em>SP<sub>xy</sub></em>=</strong></td>
</tr>
</tbody>
</table>
</div>
</div>
Even if we dismiss the value of the <em>N</em>=7 coefficient and go back to <em>r</em>=0.413 based on <em>N</em>=1,687, we still want to know whether this correlation observed in the sample is statistically significant (i.e., generalizable to the population). Thus, we need to test <em>r</em>, and we do that through a <em>t</em>-test.

<strong>The <em>t</em>-test for Pearson's <em>r</em> is</strong> given by the following formula:

$$t=\frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$

with <em>df</em>=<em>N</em>-2.
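For readers who want to check such calculations themselves, here is a small Python sketch of this <em>t</em>-test (the function name is ours, for illustration only):

```python
import math

def t_for_r(r, n):
    """t-statistic for testing Pearson's r against rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The GSS 2018 education correlation discussed in the text: r = 0.413, N = 1,687
t = t_for_r(0.413, 1687)
print(round(t, 2))  # about 18.6; small differences from the text reflect intermediate rounding
```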
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XXB <em>Testing the Education and Parental Education Correlation, GSS 2018</em></p>

</header>
<div class="textbox__content">

As usual, it helps to know what we are testing exactly:
<ul>
 	<li>H<sub>0</sub>: There is no correlation between parental and offspring education; <em>ρ</em>=0.</li>
 	<li>H<sub>a</sub>: There is a correlation between parental and offspring education; <em>ρ</em>≠0.</li>
</ul>
Then, for <em>N</em>=1,687 and <em>r</em>=0.413, we have:

$$t=\frac{r\sqrt{N-2}}{\sqrt{1-r^2}}=\frac{0.413\sqrt{1687-2}}{\sqrt{1-0.413^2}}=\frac{0.413(41.1)}{0.911}=18.633$$

<strong>With <em>t</em>=18.633, <em>df</em>=1,685, and <em>p</em>=0.00001 (i.e., <em>p</em>=0.00001&lt;0.05), we can reject the null hypothesis that parental and offspring education are not correlated. At this time, we have enough evidence to conclude that there is a moderately weak (<em>r</em>=0.413), statistically significant correlation between parental education and offspring education in the US population</strong>[footnote]Purely for demonstration purposes, we could also calculate the <em>t</em> for the 7 respondents whose responses we used to calculate <em>r</em>=0.271:

$$t=\frac{r\sqrt{N-2}}{\sqrt{1-r^2}}=\frac{0.271\sqrt{7-2}}{\sqrt{1-0.271^2}}=\frac{0.271(2.236)}{0.963}=0.629$$

In this case, we could interpret the results like this: "With <em>t</em>=0.629, <em>df</em>=5, and <em>p</em> well above 0.05, we cannot reject the null hypothesis that parental and offspring education are not correlated. At this time, we do not have enough evidence to conclude that there is a statistically significant correlation between parental education and offspring education in the US population." However, we cannot trust this "inference" as it is based on only <em>N</em>=7.[/footnote]<strong>.</strong>

</div>
</div>
With this, we have established (with 99% certainty) that parental education and offspring education are correlated. Considering that parents tend to have their schooling done before their children have theirs, on average, it's also reasonable to assume that parental education affects offspring education (and not vice versa)[footnote]In terms of establishing causality, we are limited by the bivariate case we have: it's entirely possible (and expected) that other things affect offspring education too, not just their parents' education. As well, it's possible that something else (for example, wealth, income, socioeconomic class, etc.) might be affecting both parental and offspring education, rendering the effect of parental education on offspring education spurious. These types of considerations are exactly the purpose of multivariate analysis, but since we are dealing with bivariate analysis here, we have to leave them aside. I bring them up here to remind you not to forget them in the discussion that follows, which will focus on the two variables at hand.[/footnote].

Wouldn't it then be good to know <em>exactly how much</em> effect parental education has on offspring education? That is, wouldn't you like to know, if one father had one more year of schooling than another, how much more schooling the child of the former would be expected to have than the child of the latter? One type of regression -- called <em>linear regression</em> -- can tell us just that.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>130</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:08:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:08:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-1-correlation]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.2 Basics of Linear Regression</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-basics-of-linear-regression/</link>
		<pubDate>Wed, 31 Oct 2018 22:09:03 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=132</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

You may find it surprising but you already <em>have</em> an idea about linear regression from Section 7.2.3. Again, when describing and examining the association between two continuous variables, we can use the correlation coefficient <em>r</em> and a scatterplot plotting the observations in a coordinate system. To visualize the linear relationship between the variables, we could also add a <em>line of best fit</em> to the scatterplot. The line of best fit is actually also called a <em>regression line</em>; and regression itself is based upon the concepts of correlation and variance, with which you are already familiar.

You might be asking yourselves at this point what regression adds to the analysis of two continuous variables -- or, in other words, why we even need it: don't we already have Pearson's <em>r</em> for that? As you will see in the examples below, <strong>linear regression allows us to precisely calculate and predict a change in the <em>dependent</em> variable that is due to the <em>independent</em> variable</strong>.

What we say in this case is that <strong>the independent variable <em>explains</em> a percentage of the variance of the dependent variable</strong>. Think about it this way: the dependent variable varies due to arguably many causes (i.e., independent variables), which affect it to different extents and which each explain some part of its total variance. <strong>Through linear regression, we are able to quantify to what extent an independent variable explains the variability of the dependent variable, i.e., to what extent it affects it.</strong>[footnote]Multivariate regression thus allows for direct comparisons of the size of the independent variables' effects. In the bivariate case, we only focus on the effect of <em>one</em> independent variable, without considering and accounting for others -- which isn't something you should do in real-life social science research, especially in terms of causal analysis. Again, the bivariate case serves only as an illustration/introduction to the expansive topic of regression in general.[/footnote] To take the example about parental and offspring education from the previous section, doing a regression analysis on these two variables would allow us to predict how much more education a respondent is expected to have for every additional year of schooling of the parent[footnote]Or, to put it differently, if one father has one more year of schooling than another father, how much more schooling the offspring of the first would be expected to have in comparison to the offspring of the second.[/footnote] (father, in our case), and what percent of the respondent's schooling is explained by the years of education of the parent.

How does linear regression do all that? To put it simply, through the regression line (of best fit), or more precisely, through the way the regression line is created.

<strong>The linear function.</strong> How do you draw a line? The simplest method requires exactly two pieces of information: a starting point for the line, and an indicator of slope (so that you know whether the line is flat, sloping upward, or sloping downward). This is captured in the following formula:

$$y=\alpha + \beta x$$

where <em>α</em> is the line's starting point and <em>β</em> is the slope of the line. The two variables, <em>x</em> and <em>y</em>, are the independent and the dependent variable, respectively: we know this because the formula establishes <em>y</em> as a <em>function</em> of <em>x</em> (i.e., if we know <em>α</em> and <em>β</em>, we can calculate <em>y</em> for any value of <em>x</em>).

Let's take a brief example.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Class Assignment Mark</em></p>

</header>
<div class="textbox__content">

Imagine you are given a written, take-home assignment in some class. Your professor has stipulated that there are three parts to the assignment, each worth 30 points, and that you'll receive 10 points just for turning in your work.

In this case, your assignment mark is entirely a function of your submitted work. You'll be getting 10 points to start with, then 30 points for fulfilling each of the three requirements. The class grades on the submitted assignments could thus be 10 points (0 completed requirements), 40 points (1 completed requirement), 70 points (2 completed requirements), and 100 points (3 completed requirements). Figure XX plots this.

<em>Figure XX Assignment Mark as a Function of Completed Requirements</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-markA-1.png" alt="" width="462" height="370" class="wp-image-1344 size-full aligncenter" />

</div>
&nbsp;
<div class="textbox__content">

As you can see, the relationship between the two variables, <em>assignment requirements completed</em> and <em>assignment mark</em>, is simply

$$y=10+30x$$

as helpfully shown in the graph itself. This is a summary form of having to write out all the observations:
<ul>
 	<li>when x=0, $y=10+30x=10+30\times 0=10+0=10$</li>
 	<li>when x=1, $y=10+30x= 10+30\times 1=10+30=40$</li>
 	<li>when x=2, $y=10+30x= 10+30\times 2=10+60=70$</li>
 	<li>when x=3, $y=10+30x= 10+30\times 3=10+90=100$[footnote]Of course, to draw a line you only really need <em>two</em> points. Thus if you only take <em>x</em>=0/<em>y</em>=10 and <em>x</em>=3/<em>y</em>=100 and connect these points with a line, the line will also pass through <em>x</em>=1/<em>y</em>=40 and <em>x</em>=2/<em>y</em>=70. This is a useful property if you need to draw a line by hand.[/footnote].</li>
</ul>
</div>
</div>
In the example above <em>α</em>=10 and <em>β</em>=30: the line starts at <em>x</em>=0 and <em>y</em>=10, and for each additional unit of <em>x</em> (i.e., each additional requirement completed), <em>y</em> increases by 30 points.

In fact, these are the exact definitions of <em>α</em> and <em>β</em>. That is, <strong><em>α</em> is the value of <em>y</em> when <em>x</em>=0, also called <em>Y-intercept</em></strong> (as it shows where the regression line crosses the vertical <em>Y</em>-axis), and <strong><em>β</em> is the <em>slope</em>, also called the <em>regression coefficient</em>, i.e., the amount of change in the dependent variable <em>y</em> expected for every unit change in the independent variable<em> x</em> (or simply, the size of the effect of <em>x</em> on <em>y</em>)</strong>.
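The assignment example makes these definitions concrete. A minimal Python sketch (illustrative only; the function name is ours), with <em>α</em>=10 as the Y-intercept and <em>β</em>=30 as the slope:

```python
# Linear function from the assignment example: y = alpha + beta * x
alpha = 10  # Y-intercept: the mark when 0 requirements are completed
beta = 30   # slope: the mark increase per additional completed requirement

def mark(completed):
    """Predicted assignment mark for a given number of completed requirements."""
    return alpha + beta * completed

print([mark(x) for x in range(4)])  # [10, 40, 70, 100]
```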

Let's take a look at the regression model in detail.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>132</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:09:03]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:09:03]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-2-basics-of-linear-regression]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.2.1. The Linear Regression Model and the Line of Best Fit</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-1-the-linear-regression-model/</link>
		<pubDate>Wed, 31 Oct 2018 22:13:37 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=135</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

You might have noticed that there was no uncertainty of any kind in Example XX about the assignment requirements and mark in the previous section. The line in that case represented a <em>deterministic</em> relationship -- <em>x</em> fully determined <em>y</em> (i.e., <em>x</em> fully explained the variability of <em>y</em>) -- hence all the observations were on the line itself.

As such, this wasn't a typical situation and this wasn't a typical <em>regression</em> line. In reality, in statistical inference we deal with <em>probabilistic</em> associations, where the regression line does <em>not</em> capture all observations in itself but their general (on average) trend. That is, in a usual regression model situation, some observations will be above the line and some below it; thus some observations would be <em>underestimated</em> and others would be <em>overestimated</em> because <strong>the line serves as a <em>prediction</em> </strong>(an expectation, a summary, a trend) of the association. And as we know by now, predictions/estimations always contain a level of uncertainty.

Specifically, we cannot expect that a single independent variable <em>x</em> will explain away <em>all</em> variability in a dependent variable <em>y</em>; there will always be some unexplained (by the regression model) variability left. Figure XX illustrates.

<em>Figure XX Assignment Mark as a Function of Completed Requirements (With Variance)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png" alt="" width="462" height="370" class="wp-image-1345 size-full aligncenter" />

In Figure XX I have added seven more observations to the case we had in Figure XX in the previous section, this time allowing for additional variability in the assignment marks: no longer is it enough to know the number of requirements completed to predict the assignment grade. (Imagine that the professor has started evaluating the completed requirements substantively, not just counting them: in this case, while the number of requirements is still essential for the grade, <em>something else</em>[footnote]This <em>something else</em> is an 'unobserved variable', or a variable not included in the model (even though we could speculate about it). Such unobserved variables are the source of the unexplained variance in <em>y</em>.[/footnote] also affects the final assignment mark.)

An actual <strong>regression model accommodates the uncertainty inherent in estimation through two interrelated concepts, <em>error of prediction</em> (a.k.a. statistical error) and <em>residuals</em>.</strong>

<strong>The <em>error of prediction</em> reflects the difference between the observations and the predicted values we would have if we had data about the population.</strong> That is, if we imagined a line of best fit of the population[footnote]This line of course doesn't exist; it's a heuristic device.[/footnote], <em>α+βx</em>, the difference between our observations and that line would be:

$$y-(\alpha+\beta x)=\epsilon$$ = <em>error of prediction</em>[footnote]This is the lowercase Greek letter epsilon, <em>ε</em>.[/footnote]

That is, we need to include the error term in the regression model:

$$y=\alpha+\beta x+\epsilon$$

Considering that we pretty much never have information about the population, however, we can restate <strong>the <em>sample</em> regression model like this</strong>:

$$y=a+bx+e$$

<strong>where <em>a</em> is the estimated <em>α</em>, <em>b</em> is the estimated <em>β</em>, and <em>e</em> is the estimated <em>ε</em>, with all estimations based on sample data. Note that <em>e</em> here is called the <em>residual</em>, and it is not only the estimation of the unobservable error of prediction, but also simply the difference between an observation and its predicted value</strong>:

$$y-(a+bx)=e$$ = <em>residual</em>

Since <em>a+bx</em> is the regression line, or the prediction, it also stands for the predicted (estimated) values, which we can, as usual, denote $\hat{y}$. Then, since

$$\hat{y}=a+bx$$,

we also have

$$y-\hat{y}=e$$

or, again, that <strong>the residuals are the difference between the observations and their predicted values.</strong>

With this, we come full circle to the reason for all the notation and protracted explanations above (and here you thought I was subjecting you to all these equations without a purpose): in a graph, <strong>the residuals are simply the distance between the observations and the regression line</strong>. (In Figure XX that's the empty space -- the vertical distance -- between an observation and the regression line.)

A comprehensive treatment of the residuals (through a full-blown analysis of variance) is beyond the scope of this book, but they do help us understand the nature of the regression line and the logic of regression in general. You see, <strong>the regression line is called a line of <em>best fit</em> precisely because it <em>minimizes</em> the residuals</strong> -- it is created in such a way as to minimize the residuals (and therefore the error of prediction) and fit the data/observations as closely as possible. Visually, this means that the line is drawn to pass <em>as close as possible</em> to all the observations.

In fact, <strong>linear regression is also called <em>OLS regression</em>, which stands for <em>ordinary least squares</em>.</strong> The <em>least squares</em> concept comes from the fact that to minimize the distances of the observations to the prediction line, we need to first square them before adding them together[footnote]I.e., $\Sigma{(y-\hat{y})^2}$.[/footnote] -- just like we needed to do in the calculation of the variance and the sum of squares (or the distances would cancel each other out)[footnote]The <em>ordinary</em> part is there to differentiate it from another regression version called <em>generalized least squares regression</em>, or <em>GLS</em> regression (not discussed here).[/footnote].
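The 'least squares' idea can be checked numerically. A Python sketch on toy data (the numbers are invented purely for this illustration), using the textbook OLS estimates for slope and intercept: nudging the fitted line in any direction only increases the sum of squared residuals.

```python
# Toy data, invented purely for this illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# OLS estimates: slope b = SP_xy / SS_x, intercept a = mean(y) - b * mean(x)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
b = sp_xy / ss_x
a = mean_y - b * mean_x

def ssr(a_, b_):
    """Sum of squared residuals around the line y-hat = a_ + b_ * x."""
    return sum((y - (a_ + b_ * x)) ** 2 for x, y in zip(xs, ys))

# Perturbing the OLS line in any direction only increases the squared residuals
assert ssr(a, b) <= ssr(a + 0.5, b)
assert ssr(a, b) <= ssr(a - 0.5, b)
assert ssr(a, b) <= ssr(a, b + 0.5)
assert ssr(a, b) <= ssr(a, b - 0.5)
```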

But how do we ensure that the regression line minimizes the residuals? The next section explains.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>135</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:13:37]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:13:37]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-2-1-the-linear-regression-model]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[10-2-1-elements]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.2.3 R-squared</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-3-r-squared/</link>
		<pubDate>Wed, 31 Oct 2018 22:13:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=137</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

In the previous section we established that the correlation coefficient <em>r</em> and the regression coefficient <em>b</em> are related:

$$b=r\frac{s_y}{s_x}$$

And how could they not be: if a slope exists, correlation exists. As such, the standard regression output provided by SPSS includes a <em>Model Summary</em> table that lists the Pearson's <em>r</em>. Table XX below is the <em>Model Summary</em> table of the simulated-data class attendance/final class scores regression.

<em>Table XX R and R<sup>2</sup> for Class Attendance and Final Class Scores</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-class-attendance-scores-full.png" alt="" width="395" height="123" class="wp-image-1385 size-full aligncenter" />

Pearson's <em>r</em> (listed as <em>R </em>above) in this table is, of course, exactly the same as what the SPSS <em>Correlate</em> procedure provides. Squaring that number, however, provides us with a new and useful piece of information, sometimes called <strong>the <em>coefficient of determination</em>, but more often simply referred to as<em> R<sup>2</sup></em></strong>.

$$r\times r=R^2$$

<strong style="font-size: 14pt;text-indent: 18.6667px"><em>R<sup>2</sup></em> provides a measure of the proportion of the variability in the dependent variable explained by the independent variable in the model.</strong>[footnote]Or, independent variable<strong>s</strong>, in the case of multivariate regression.[/footnote]

$$R^2=\frac{explained~variation~of~y}{total~variation~of~y}$$
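A small Python sketch (with invented toy data) can confirm that these two faces of <em>R<sup>2</sup></em> -- the squared correlation and the explained-to-total variation ratio -- coincide:

```python
import math

# Toy data, invented purely for this illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 4, 8, 9]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# R^2 as the square of Pearson's r
r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

# R^2 as explained variation of y over total variation of y
b = sp_xy / ss_x
a = mean_y - b * mean_x
y_hat = [a + b * x for x in xs]
explained = sum((yh - mean_y) ** 2 for yh in y_hat)
r_squared_ratio = explained / ss_y

assert abs(r_squared - r_squared_ratio) < 1e-9
print(round(r_squared, 3))  # about 0.87 for this toy data
```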

Recall that regression's logic is based on minimizing residuals/errors and on explaining the variation of the dependent variable through information about the independent variable. In a deterministic case, where the dependent variable depends entirely on the independent one, we'd have a correlation of 1 and <em>R<sup>2</sup></em>=1. However, with uncertainty and estimation, this is not the case -- some variability of the dependent variable remains unexplained by the regression model (i.e., the independent variable).

Thus, one way to look at <em>R<sup>2</sup></em> is as an indication of <em>goodness of fit</em>: how closely the observations are fitted around the regression line (i.e., how little variability is left unexplained). The larger the <em>R<sup>2</sup></em>, then, the better -- as a large <em>R<sup>2</sup></em> would mean the model (the independent variable/s) explains a large proportion of the variability in the dependent variable.

As you can see in Table XX, the <em>R<sup>2</sup></em> of the class attendance/final test scores is:

$$r\times r=0.849^2=0.721=R^2$$

Or, class attendance explains 72.1% of the variability in final test scores, which is a lot, and quite a good regression fit[footnote]Of course, this also means that (100-72.1=) 27.9% of the variation in test scores is left unexplained by class attendance, i.e., is due to something else beyond class attendance.[/footnote].

Compare this to the Model Summary table of respondent's and father's years of schooling in Table XX below.

<em>Table XX R and R<sup>2</sup> for Respondent's and Father's Years of Schooling</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/r2-educ-paeduc.png" alt="" width="392" height="135" class="wp-image-1394 size-full aligncenter" />

Unlike the very strong correlation of <em>r</em>=0.849, the moderately weak correlation coefficient <em>r</em>=0.413 is already an indication of not that great a fit. Thus, the <em>R<sup>2</sup></em> of offspring and parental education is:

$$r\times r=0.413^2=0.170=R^2$$

That is, fathers' years of schooling explain only 17% of the variation of respondents' years of schooling. The biggest 'chunk' of the variation in schooling is left unexplained, i.e., there are other factors influencing how much education one is expected to have, on average. Regardless, we shouldn't dismiss parental education outright -- it still has a statistically significant effect on offspring education (albeit not very strong).

. . . Or does it? Recall our discussion on causality. The fact that two variables are statistically associated doesn't necessarily mean that one causes the other to change (or, that it explains the other's variability). Working with two variables only prevents us from accounting for alternative explanations -- i.e., of taking into account other factors, other variables, other effects. Luckily, regression has our backs. I leave you with how that happens in the next -- <em>final!</em> -- section of this textbook.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>137</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:13:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:13:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-2-3-r-squared]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[10-2-2-r-squared]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.2.4 What Lies Ahead: Multiple Regression</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-4-what-lies-ahead-multiple-regression/</link>
		<pubDate>Wed, 31 Oct 2018 22:16:35 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=143</guid>
		<description></description>
<content:encoded><![CDATA[Don't worry, this is but a brief farewell. Do me a last favour and imagine we had more ideas about why students end up with different final test scores, or why people end up with different numbers of years of education. In other words, what else could possibly explain some of the variability in the dependent variables we've been interested in so far?

In the case of test scores, perhaps hours of independent study? Doing end-of-chapter exercises? How many classes in total is the student taking that semester? Does the student work for pay? Have they recently experienced problems in their personal life? Do they have dependants to take care of at home? Is English their native language? Are they an international student? What is their area of study? . . . And so on, and so on; I'm certain you can add more on your own.

In the case of years of schooling, perhaps the family's socioeconomic status? Wealth? Gender? Race/ethnicity? Citizenship status? Attitudes toward education? The presence of role models? Being passionate about a field of study? Go on, add your own ideas to the list.

If there are so many factors that can affect a (dependent) variable, how do we examine their effects? Bivariately, one by one? While this is a good first step (to establish that <em>something</em> is going on), obviously that cannot be the end of our analysis. We<em> have</em> to be able to account for all of them at the same time, to compare their effects, and to create more complicated models which <em>together</em> explain more variability in the dependent variable.

Multiple regression allows us to do just that. Instead of <em>one</em> independent variable <em>x</em>, we can consider many independent variables at the same time. Then, the effect of each single variable is provided <em>net</em> of the effects of the other variables (or we say that we <em>control for</em> the other variables), so that we can simultaneously take care of alternative explanations. In this way, a variable's effect on <em>y</em> may be decreased or increased (from what it used to be in the bivariate case), and its statistical significance may disappear (or even appear, in some cases). In any case, this effect would likely be 'truer' than the one obtained bivariately (though this of course depends on the choice of variable controls).
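To make the "controlling for" idea a bit more concrete for readers who know some programming (nothing in this textbook requires it), here is a sketch in Python using entirely made-up, simulated data: test scores are driven by study hours only, while attendance merely tags along because it is correlated with study hours. The variable names and numbers are invented for illustration.

```python
import numpy as np

# Hypothetical, simulated data -- these variables and numbers come from no
# real study. Scores depend on study hours only; attendance is correlated
# with study hours but has no effect of its own.
rng = np.random.default_rng(0)
n = 1000
study_hours = rng.normal(10, 2, n)
attendance = 0.8 * study_hours + rng.normal(0, 1, n)
score = 3.0 * study_hours + rng.normal(0, 2, n)

# Bivariate model: regress score on attendance alone.
X_biv = np.column_stack([np.ones(n), attendance])
b_biv = np.linalg.lstsq(X_biv, score, rcond=None)[0]

# Multiple regression: regress score on attendance AND study hours.
X_mult = np.column_stack([np.ones(n), attendance, study_hours])
b_mult = np.linalg.lstsq(X_mult, score, rcond=None)[0]

print("attendance slope, bivariate:", round(b_biv[1], 2))
print("attendance slope, net of study hours:", round(b_mult[1], 2))
# The bivariate slope is sizable; net of study hours it shrinks toward
# zero, because in this simulation attendance has no effect of its own.
```

This is exactly the pattern described above: a variable's apparent effect can change, or even vanish, once the alternative explanation is included in the model.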

And this is where you will be going, if you choose to continue on the statistics path. If I said there is a lot more to learn it would be a gross understatement -- but, given what statistics (<em>proper</em> use of statistics!) enables you to do in social research, it's absolutely and totally worth it.

If you choose not to continue[footnote]I'll be crushed. Don't let me know.[/footnote], then use what statistical knowledge you already have, and use it responsibly (great power, and all that).[footnote]Either way, here you are, in the last section -- you survived! (Possibly even with your sanity mostly intact.) Go celebrate![/footnote]

With this, I bid you adieu.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>143</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:16:35]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:16:35]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-2-4-what-lies-ahead-multiple-regression]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>8.1. Causality</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/8-1-causality/</link>
		<pubDate>Wed, 31 Oct 2018 22:18:13 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=146</guid>
		<description></description>
		<content:encoded><![CDATA[From the start, I need to make one thing clear: regardless of whether observed only in sample data or generalizable to populations, so far we have only discussed <em>statistical</em> associations.

<em>Well, what kind of </em>other<em> associations could we discuss</em>, I can imagine you grumbling, <em>it's a </em>statistics<em> textbook</em> -- <em>of </em>course<em> the associations will be statistical!</em>

You are correct, of course, but (you knew there would be a "but") -- "statistical" here has a <em>very</em> narrow meaning, something most people unfamiliar with statistics seem unaware of, and thus they interpret it to mean a lot more than it actually does.

You see, <em>statistical association</em> refers only to whether there is a pattern in the data or not; whether certain attributes of one variable tend to go with specific attributes of another variable. In no way does this imply that one variable is what it is <em>because</em> of another, or that a change in one <em>causes</em> another variable to change, or that a variable is dependent on another.

If we can state any of these, we make a much stronger claim -- one of <em>causality</em> -- and the associations are then called <em>causal</em>[footnote]Please make sure you don't confuse causal ['KO-zal] and causality [ko-'ZA-liti] with casual ['KEH-jwal] and casuality (which doesn't exist).[/footnote]. When we have a causal association, we call one variable<em> independent</em> and the other <em>dependent</em>[footnote]You can think of the independent variable (i.e., the cause) as free to vary on its own; with or without the dependent variable, the independent is what it is. On the other hand, the dependent variable (the effect) varies <em>because</em> of the independent one, that's why it's called <em>dependent</em>. (Note that it's<em> depend<strong>e</strong>nt </em>variable and not <em>depend<strong>a</strong>nt</em>. The latter applies to people who are economically supported by others, like children are dependants of their parents.)[/footnote].

See if you can differentiate statistical and causal associations. Smoking is associated with lung cancer: people who smoke (or smoke more) have lung cancer at higher rates than those who don't. Smoking <em>causes</em> lung cancer: smokers are <em>more likely</em> to get lung cancer <em>because</em> of the fact that they smoke. Class attendance and test scores are associated: students who attend more classes have higher test scores. Test scores are <em>dependent</em> on class attendance: coming to class more often is partly<em> responsible</em> for higher test scores. Parental education and offspring education are positively correlated: higher levels of parental schooling are associated with higher levels of schooling for the offspring. Individuals end up with more education <em>because</em> their parents were better educated themselves.

The first sentence in each of the examples in the previous paragraph was a statement of statistical association; the second was one of causality. If they generally sound the same to you, you should start paying more explicit attention to phrasing, specifically to how claims of association are put into words. As one of the most often-quoted sayings in statistics goes, <strong>correlation is <em>not</em> causation</strong>. Apart from urging caution in interpreting results, it also brings attention to how careful researchers must be when reporting results and conclusions in order not to overstate their claims.

What is the main difference between statistical association[footnote]While many times the words <em>association</em> and <em>correlation</em> are used interchangeably, I prefer to use <em>correlation</em> only in relation to continuous variables in the context of the correlation coefficient. Referring to any statistical association as <em>correlation</em>, however, is not technically wrong; the usage is simply a matter of preference.[/footnote] and causation? Briefly, the method of establishing either; what is necessary for us to be able to claim one or the other.

Establishing a statistical association between two variables is relatively straightforward and easy: there are tests for that (as we shall shortly see)[footnote]Of course, it's not as easy as I present it further in this text. As an introduction to the topic, however, it will suffice. My point is that relative to establishing causality, it is easier.[/footnote]. Establishing a causal association between two variables (especially in the social sciences), on the other hand, is notoriously hard.

<strong>Criteria for establishing causality.</strong> There are three basic requirements for establishing causal associations, and an additional, overarching one related to the logic of research as a whole.
<ol>
 	<li><strong>Does the variable we claim is the <em>cause</em> come before the variable we claim as an <em>effect</em> in time? </strong></li>
</ol>
This requirement is also known as <strong><em>temporal precedence</em></strong> -- that is, <strong>whether the potential cause happens before the potential outcome</strong>. It is squarely based on logic: after all, an outcome cannot logically precede its cause. You can't take a test on the first day of class and claim that your test score was due to your attending class or being absent later in the semester: that's not how time works. Similarly, you cannot claim that the bachelor's degree you will get in the near future is somehow responsible for your parents' college degrees from twenty or so years ago.

While in these examples the temporal precedence is crystal clear, keep in mind that this is not always the case. There are plenty of occasions in social research when it's difficult to adjudicate which one of a pair of variables came first, as well as cases of mutual causality and reverse causality (relative to what is being claimed). Without getting into too much detail, take for example the popular finding [citations? Waite?] that married people tend to be happier, on average. One could easily conclude that marriage promotes happiness. But what if happier people tend to have more successful relationships leading to marriage, and a related propensity to stay married? Which one, marriage or happiness, is the cause in the association and which one the outcome? Further analysis and investigation of the variables' association is necessary in such cases (and even that might not lead to a definite conclusion).

<ol start="2">
 	<li><strong>Are the two variables statistically associated?</strong></li>
</ol>

This provides further evidence that statistical association is different from causation by listing the presence of a statistical association as a necessary requirement for establishing causality, among others. In short, <strong>the presence of a statistical association between two variables is a <em>necessary</em> <em>but not sufficient</em> condition for claiming causality</strong>.

Why it's necessary should be obvious: we cannot claim that one variable causes another if we have no evidence whatsoever that the two are statistically associated in the first place. Otherwise, if there is no observable pattern between the values/categories of the two variables, how can we claim that changes in one variable <em>cause</em> changes in the other? Again, logically, the cause and the effect must be related in some way -- an association for which we have enough evidence at a specific desired level of certainty. (The remaining chapters are devoted to finding just that type of evidence.)

<ol start="3">
 	<li><strong>Are there no alternative explanations of the variables' statistical association?</strong></li>
</ol>

This condition is the most complicated one of the three, as it requires the examination of other variables and not just the two of initial interest. Again briefly, there are concerns about causality due to the social world being vastly complex and to the social science variables' complicated interplay in real life. Basically, in the social world there rarely is a single cause of anything.

For example, is the statistical association in question observed because the potential cause variable <em>indeed</em> affects the potential outcome variable -- or because both variables are in fact effects of a <em>third</em> variable (sometimes without any real association between the original two)? Can we differentiate between a genuine relationship and a so-called <em>spurious</em> (i.e., fake, bogus) one like the one just described? As well, perhaps we only observe a statistical association between two variables and claim one as the cause because we haven't considered different potential causes. How can we be certain that it is (solely) the "cause" we have identified, or that, if we considered alternative causes, the original so-called "cause" would remain as one?

Regarding the latter, consider again the association between <em>class attendance</em> and <em>test scores</em>. Would you believe me if I told you that your statistics test scores depended <em>only</em> on your class attendance? What about hours of studying, potential after-class tutoring, doing exercises, pre-existing math knowledge, searching for/reading additional sources online or in the library, asking relevant questions in class and/or office hours, etc., etc.?

There are numerous reasons why anyone would score higher or lower on a test, and I just listed a few of the study-related ones. We don't need to limit ourselves to these though. How about general health on the date of the exam (maybe you have come to the test sick)? Or romantic relationship or family problems one might be going through? A sick relative at home? Episodes of anxiety and/or depression? Being overworked, working a night shift before the test, and/or not getting enough sleep for another reason?

You can certainly add even more reasons for why a particular test score ends up what it is, and that class attendance is merely <em>one</em> such potential cause. (Are we even certain that, if we somehow accounted for all the other potential causes, we would still observe an association between attendance and scores?)

As to spurious associations, consider that it's possible for two variables to seem associated (i.e., there is a pattern between their values/categories; changes in one are accompanied by changes in the other) only because a third variable is causing the changes in both. Then, if we ignore the third variable and focus instead on its two outcomes -- which just happen to change at the same time -- we would wrongly attribute causality to an association that essentially doesn't exist.

Take for example <em>life expectancy</em> and <em>the internet</em>: since the 1990s, as the internet was becoming more and more widespread in Canada, Canadian life expectancy at birth was also increasing. We could therefore conclude that the internet prolongs life. But there is a reason you've never before heard about this particular beneficial effect of the internet on one's health and life -- it's extremely doubtful it exists. After all, wouldn't it make more sense to attribute both trends to general technological progress (not only in communications, IT, and infrastructure but also in healthcare and medicine)?

Finally, this is where the additional, overarching general condition for causality comes into play. Assuming the three conditions listed above are met, <strong>claiming causality essentially implies providing a <em>logical explanation </em>of the observed association.</strong> In and of itself, causality is about having a theory -- an idea, if you will, of <em>why</em> there is such an association. Without such an idea, we are left simply with two variables which may or may not be <em>statistically -- but definitely not causally</em> -- associated, and the statistical association doesn't mean much on its own[footnote]You most certainly need to check <a href="http://www.tylervigen.com/spurious-correlations">these</a> associations out. (You need any distraction you can get, and this time you can even say it's for a good, pedagogically meaningful cause. Or so I can tell myself.) Among them, you'll learn that the number of doctorates in Sociology awarded in the USA is very strongly correlated over time with worldwide non-commercial space launches, not to mention that the number of drownings by people falling into a pool correlates moderately strongly with the number of movies in which Nicolas Cage appeared in the ten years between 1999 and 2009 (Spurious Media LLC/Tyler Vigen, http://www.tylervigen.com/spurious-correlations).[/footnote]. And given that the potential statistical association you may think exists might not even be there once other alternative causes are considered, you should realize by now that making a causal claim is indeed not a walk in the park.

What is to be done then? Obviously, such a brief presentation of the topic leaves a lot to be desired and is not going to be enough to fully prepare you for the task of comprehensively establishing causality in real-life research. What you should be able to do even now, however, is appreciate causality's complexity, keep in mind the necessary conditions for claiming causality (and apply them when reading about research findings and questioning conclusions), and always, always keep an eye out for alternative explanations in particular (by asking yourself "what else could be causing this?"). These should provide enough basis for you not to take statements about statistical association between variables as more than they are, and not to confuse them with claims about causality.

As well, I hope you will be careful in phrasing your own conclusions when communicating statistical research to others, by not overstating the findings of any analyses you might end up doing, especially if they involve only two variables, as per our discussion. By now it should be clear that real-life research considers many variables at the same time. Such <em>multivariate</em> analysis lies beyond the scope of this book, so you should take any bivariate associations we discuss to be of a solely <em>indicative</em> (or exploratory) nature -- something that additional, multivariate analysis may establish at a later point, but definitely not a finished product. After all, you didn't expect that you could establish causality by considering only two variables, did you?

With this in mind, we proceed with the question of how to establish <em>statistical</em> associations -- and not just observable in sample data, but the associations in which we are truly interested, i.e., those generalizable to populations. You may not be able to make claims about causality at this point but you can certainly learn how to test for evidence of statistical associations between two variables. To that purpose, the next section introduces the logic of using hypotheses in research and how hypotheses get tested.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>146</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:18:13]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:18:13]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[8-1-causality]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>1051</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-4-causality]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-3-causality]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.1.2 The z-Value</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-2-the-z-value/</link>
		<pubDate>Wed, 31 Oct 2018 22:21:21 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=148</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

In the previous section you discovered that we can "orient" ourselves about where a specific value lies along the normal distribution in relation to the average by means of the standard deviation. In Example 5.1 we saw that 68 percent of students' test scores were between 55 and 75 (i.e., between -1 and +1 standard deviations from the mean), 95 percent of scores were between approximately 45 and 85 (i.e., between about -2 and +2 standard deviations from the mean), and that 99 percent of scores were between approximately 35 and 95 (i.e., between -3 and +3 standard deviations from the mean). Thus, if your score was, say, 60, you would know that it was below the mean, but within 1 standard deviation away, which wouldn't be as bad as, say, had you scored 40, which is more than two standard deviations away from the mean.

&nbsp;

<em>Hmm, do we really need standard deviations to tell us that a test score of 40 is bad news,</em> you ask. <em>Everyone knows that.</em>

&nbsp;

In absolute terms, sure, a score of 40 (out of 100) would be considered a failing one. In relative terms, however -- which is also known as grading on a curve -- a score of 40 doesn't tell you anything, unless you know the mean and the standard deviation.

&nbsp;

To better illustrate this, imagine another set of test scores, and that on that test you get a score of 80. In absolute terms, a score of 80 (out of 100) would be quite good. What about in relative terms? Can you think of a situation where a score of 80 would be considered worse than a score of 40?

&nbsp;

What if I told you that the mean in the first case (when we imagine you scored 40) was 35 with a standard deviation of 5, while the mean in the second case (when we imagined you scored 80) was 90 with a standard deviation of 2? (You might find it easier to see the point if you grab a pen and paper and simply draw a line with the mean in the middle, then add and subtract that many standard deviations away from it in each direction, above and below.)

&nbsp;

A score of 40 (i.e., $35+5=40$) is 1 standard deviation <em>above</em> <em>the mean</em> of that test. A score of 80 (i.e., $90-5(2)=80$) is 5 standard deviations <em>below the mean</em> of that other test. In fact, 80 is well beyond even the 3 standard deviations from the mean within which 99 percent of scores fall; it's at the very far end of the left "tail" of the distribution, likely an outlier.

&nbsp;

It turns out that the second test we imagined was so easy that scoring 80 on it was actually quite low. On the other hand, scoring 40 on the first test we imagined was quite good given how hard it was.

&nbsp;

This mental exercise shows you that <strong>expressing values in terms of standard deviations</strong> has its merits, as it <strong>puts the values into perspective</strong> -- which allows us to make comparisons. A score/value in and of itself doesn't tell you anything -- not unless you know where it falls in relation to the mean and how far away from it it is. Now if only there were a way to express <em>any</em> value in terms of standard deviations without having to always calculate 1 standard deviation away, 2 standard deviations away, 3 standard deviations away from the mean (or to have to resort to pen and paper)...

&nbsp;

Guess what? There is! <strong>Expressing a value in terms of standard deviations is a process aptly called <em>standardization</em></strong> (as it produces scores that have a uniform, <em>standard</em> meaning allowing comparison) <strong>and</strong> <strong>the standardized values are called <em>z-values</em> (or <em>z-scores</em>). We standardize values by expressing the distance of the value from the mean in standard deviations,</strong> i.e.:

&nbsp;

$$\frac{\textrm{original score} - \textrm{mean}}{\textrm{standard deviation}}=\textrm{z-value}$$

&nbsp;

Or, in proper notation, where we denote the mean by <em>μ</em>[footnote]The difference between using $\overline{x}$ and <em>μ</em> and the reason we use the latter here will be explained in Chapter 6.[/footnote]:

&nbsp;

$$\frac{x_i - \mu}{\sigma}=z$$

&nbsp;

Following this formula, a score of 40 when the mean is 35 and the standard deviation is 5 (i.e., when <em>μ</em>=35 and <em>σ</em>=5) has a <em>z</em>-score of

&nbsp;

$$\frac{x_i - \mu}{\sigma}=\frac{40-35}{5}=\frac{5}{5}=1=z$$

&nbsp;

and a score of 80 when the mean is 90 and the standard deviation is 2 (i.e., when <em>μ</em>=90 and <em>σ</em>=2) has a <em>z</em>-score of

&nbsp;

$$\frac{x_i - \mu}{\sigma}=\frac{80-90}{2}=\frac{-10}{2}=-5=z$$

&nbsp;

Thus, we formally found what we already knew from before: that in the former case, the score of 40 was 1 standard deviation above the mean (i.e., its $z=1$) and the score of 80 was 5 standard deviations below the mean (i.e., its $z=-5$). If this seems repetitive -- after all, we reached the same conclusion without any fancy formulas -- that's only because I chose easily calculable numbers to illustrate my point. Perhaps an example with less "easy" numbers will convince you of the formula's worth.
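As an aside for readers who know a little Python (no programming is assumed anywhere in this textbook), the standardization formula translates directly into a couple of lines of code:

```python
# A tiny sketch of standardization: the z-value is a value's distance
# from the mean, expressed in standard deviations.

def z_value(x, mean, sd):
    """Return the z-score of x for a distribution with the given mean and sd."""
    return (x - mean) / sd

print(z_value(40, 35, 5))  # 1.0: the score of 40 is 1 standard deviation above the mean
print(z_value(80, 90, 2))  # -5.0: the score of 80 is 5 standard deviations below the mean
```

The function is nothing more than the formula above, which is exactly the point: standardization is a single subtraction and a single division.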

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 5.2 Average Monthly Rent for a Two-Bedroom Apartment in Vancouver</em></p>

</header>
<div class="textbox__content">

&nbsp;

<em>The Vancouver Sun</em> recently reported that the average monthly rent of a two-bedroom apartment in Vancouver, BC was \$2,915, at the time of writing the highest in all of Canada. (REFERENCE https://vancouversun.com/news/local-news/vancouver-two-bedroom-apartments-now-cost-close-to-3000-report) While the standard deviation was not reported, for the purposes of this exercise we can imagine it to be \$150.

</div>
What is the z-score of a family which pays \$2,630 per month for their two-bedroom condo? How about the z-score of someone who pays \$3,450 for theirs?

&nbsp;

Of course, we could grab a pen and paper and draw the normal distribution demarcating where 1, 2, and 3 standard deviations away from the mean fall in order to see where the two listed rents are relative to the demarcations. However, using the <em>z</em>-score formula makes for a faster (and a more precise) answer.

&nbsp;

In the first case, we have:

&nbsp;

$$\frac{x_i - \mu}{\sigma}=\frac{2630-2915}{150}=\frac{-285}{150}=-1.9=z$$

&nbsp;

In the second case, we have:

&nbsp;

$$\frac{x_i - \mu}{\sigma}=\frac{3450-2915}{150}=\frac{535}{150}=3.6=z$$

&nbsp;

That is, the first family's monthly rent of \$2,630 is below the average but not that unusual: with a <em>z</em>-score of -1.9, it falls within 2 standard deviations away from the mean, which is within what 95 percent of renters in Vancouver pay for their two-bedroom apartments.

&nbsp;

On the other hand, the second person's rent of \$3,450 is quite high: with its <em>z</em>-score of 3.6, it falls beyond 3 standard deviations away from the mean, i.e., it's higher than what 99 percent of people pay monthly for a two-bedroom apartment.

&nbsp;

Again, we see the use of standardization and <em>z</em>-scores, as it allows us to put values into perspective.

&nbsp;

</div>
&nbsp;

Now it's your turn to try.

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.1 Comparing Average Monthly Rent for a One-Bedroom Apartment in Vancouver, Toronto, and Montreal</em></p>

</header>
<div class="textbox__content">

&nbsp;

According to the <em>National Rent Ranking</em>s monthly report for July 2019 by Rentals.ca (REFERENCE https://rentals.ca/national-rent-report), the average monthly rent for a one-bedroom apartment was \$2,028 in Vancouver, BC, \$2,259 in Toronto, ON, and \$1,231 in Montreal, QC. Assume the standard deviations are \$140 in Vancouver, \$180 in Toronto, and \$125 in Montreal.

&nbsp;

Using <em>z</em>-values, compare and analyze where in the distribution a rent of \$1,950 will put a Vancouverite, a Torontonian, and a Montrealer who all pay the same rent but in different cities.

&nbsp;

</div>
<sub>(Answer: Vancouverite's z=-0.6, Torontonian's z=-1.7, Montrealer's z=5.8.)</sub>

&nbsp;

</div>
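As a further aside for Python-inclined readers (again, nothing in this textbook requires programming), the answers above can be checked with the same simple z-value formula:

```python
# Checking the z-scores for a $1,950 one-bedroom rent in three cities,
# using the assumed means and standard deviations from the exercise.

def z_value(x, mean, sd):
    return (x - mean) / sd

rents = {"Vancouver": (2028, 140), "Toronto": (2259, 180), "Montreal": (1231, 125)}
for city, (mean, sd) in rents.items():
    print(city, round(z_value(1950, mean, sd), 1))
# Vancouver -0.6, Toronto -1.7, Montreal 5.8 -- matching the answers above.
```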
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>148</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:21:21]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:21:21]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-1-2-the-z-value]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-1-the-z-score]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-1-the-z-score]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-1-the-z-value]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-1-1-the-z-value]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.1.3 Percentiles</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-3-percentiles/</link>
		<pubDate>Wed, 31 Oct 2018 22:21:42 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=150</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Remember quartiles? We used them in Section 4.2 to find the interquartile range (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/4-2-interquartile-range/">https://pressbooks.bccampus.ca/simplestats/chapter/4-2-interquartile-range/</a>). They split the cases in the distribution into four equal parts (i.e., into quarters), giving us a first (1 percent to 25 percent of the data), second (26 percent to 50 percent), third (51 percent to 75 percent), and fourth quartile (76 percent to 100 percent).

&nbsp;

What if, instead of splitting the distribution into <em>four</em> equal parts, we decided to divide it into <em>five</em>? That would be easy: Instead of having four parts, 25 percent of the data in each, we can just have five parts, 20 percent of the data in each. Like this: 1 percent to 20 percent, 21 percent to 40 percent, 41 percent to 60 percent, 61 percent to 80 percent, and 81 percent to 100 percent. This time, we call the five equal parts <em>quintiles</em> (from the Latin root "quin" like <em>quinctus</em>, meaning five).

&nbsp;

Just as easily, we can divide the distribution into <em>ten</em> equal parts: 1 percent to 10 percent, 11 percent to 20 percent, etc. ... all the way up to the last part, 91 percent to 100 percent. Then we have ten <em>deciles</em> (from the Latin root "dec" like <em>decem</em>, meaning ten).

&nbsp;

Following the same logic down to the smallest whole-number parts into which we can divide a distribution, we get <em>percentiles</em> -- a distribution divided into a hundred equal parts, with 1 percent of the data in each. It turns out percentiles can be quite useful when working with a normal distribution. (You didn't forget that's our current topic, did you?)
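If you'd like to see these cut-points concretely, here is a minimal sketch in Python (my addition -- the book itself works with SPSS and online calculators). The standard library's `statistics.quantiles` returns the cut-points that split ordered data into equal parts; the data below is made up for illustration.

```python
import statistics

# Hypothetical data: the values 1 through 100 (invented for illustration).
data = list(range(1, 101))

# statistics.quantiles returns the n - 1 cut-points that divide the
# distribution into n equal parts.
quartiles = statistics.quantiles(data, n=4)      # 3 cut-points -> 4 quarters
quintiles = statistics.quantiles(data, n=5)      # 4 cut-points -> 5 fifths
deciles = statistics.quantiles(data, n=10)       # 9 cut-points -> 10 tenths
percentiles = statistics.quantiles(data, n=100)  # 99 cut-points -> 100 parts

print(len(quartiles), len(quintiles), len(deciles), len(percentiles))
```

Note that dividing into n parts always requires n - 1 cut-points, which is why a distribution split into a hundred parts has 99 percentile cut-points.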

&nbsp;

The key piece of knowledge you need to recall from our discussion about quartiles is that to split the distribution, we need the cases lined up in order from the lowest value to the highest (or else we wouldn't be able to speak of first, second, third, or last quartiles). Applying this to the normal distribution, we might be tempted to imagine the normal curve as illustrated in Fig. 5.5 below.

&nbsp;

<em>Figure 5.5 What Percentiles Do </em>Not<em> Look Like</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-wrong-percentiles.png" alt="" width="898" height="454" class="aligncenter wp-image-1749 size-full" />

&nbsp;

Fig. 5.5 lists the position of four randomly selected percentiles, <em>had the percentiles been evenly spread over the horizontal axis</em>. Of course, this is wrong. If we did this, we would be ignoring the <em>actual</em> distribution -- you know, the blue curve on the graph. After all, we have established by now that 68 percent of observations fall in the middle, within only 1 standard deviation away from the mean, where the curve is at its highest. (Recall that the height of the curve -- and the fact that it's a <em>curve</em>, not a <em>line</em> -- reflects the larger frequencies of the values around the mean, and the smaller and smaller frequencies of the values further away from the mean, in the "tails".)

&nbsp;

What this should tell you is that we can't just assume the percentiles are uniformly spread -- because the data is not. We need to account for the fact that values in the middle are far more common than the ones in the "tails". How, then, do we know what percentile a particular value has?

&nbsp;

Again, it's easy. We have <em>z</em>-scores for that. You see, every value has a <em>z</em>-score, and the <em>z</em>-score reflects the percentage of cases that fall below or above that value. This is precisely the reason we know that 68 percent of the data falls within 1 standard deviation from the mean and that 95 percent falls within about 2 standard deviations from the mean.

&nbsp;

Thus, with a normal distribution, you can turn any value into a <em>z</em>-score (as we saw in the previous section), and this <em>z</em>-score into a percentile. While there are z-score tables providing percentages associated with any z-value, the easiest way to find a percentile is through online calculators like this one by <em>Measuring U</em>: <a href="https://measuringu.com/pcalcz/">https://measuringu.com/pcalcz/</a>.[footnote]For that matter, you can use an online calculator to find the <em>z</em>-score of any value. You can try one here (provided by <em>Social Science Statistics</em>): <a href="https://www.socscistatistics.com/tests/ztest/zscorecalculator.aspx">https://www.socscistatistics.com/tests/ztest/zscorecalculator.aspx</a>.[/footnote]  There, you can enter a <em>z</em>-score (make sure you choose "one-sided") and see what percent of data falls below it (on the normal curve on the left) and what percent of data falls above it (on the normal curve on the right). The exact percentile is the number reflecting the data "below".
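If you prefer code to an online calculator, the same value-to-<em>z</em>-to-percentile conversion can be sketched in Python (my addition; the test score, mean, and standard deviation below are invented):

```python
from statistics import NormalDist

def percentile_of(value, mean, sd):
    """Convert a raw value to a z-score and a one-sided percentile."""
    z = (value - mean) / sd            # the z-score, as in the previous section
    below = NormalDist().cdf(z) * 100  # percent of data falling below the value
    return z, below

# Hypothetical example: a score of 580 on a test with mean 500 and sd 80.
z, pct = percentile_of(580, mean=500, sd=80)
print(z, round(pct))  # z = 1.0, which is roughly the 84th percentile
```

The `cdf` call gives the "one-sided" area below the z-score, which is exactly the number the online calculator reports as the data "below".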

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.2 Finding Percentiles Using an Online Calculator</em></p>

</header>
<div class="textbox__content">

&nbsp;

Using the percentile calculator linked above, you find that the percentile for <em>z</em>=1 is 84. Explain where this result comes from. (Hint: The mean bisects the distribution in two equal halves. A z-score of 1 is of course 1 standard deviation <em>above</em> the mean.)

&nbsp;

</div>
</div>
&nbsp;

<em>Cool,</em> you say (probably quite sarcastically), <em>we now know how to find percentiles. But what do we use them for?</em>

&nbsp;

I'm glad you asked. <strong>Percentiles allow us to compare a score in relation to the rest of the data; just like <em>z</em>-scores, they put things into perspective.</strong> Let's say you have 69 on a test. Turning your score into a percentile will tell you <em>exactly</em> what percent of the test-takers scored <em>below</em> you, whether it's 35 percent (then your score wouldn't be considered too impressive) or 99 percent (which would be most impressive, seeing how you'd be in the top 1 percent of test-takers) or any other percent it might be.[footnote]This is exactly what standardized tests (e.g., SAT) do to interpret individual scores. They provide percentiles so that any test-taker can find how they did <em>relative to others</em> (i.e., it provides the place of a score in the overall distribution of scores).[/footnote]

&nbsp;

Let's make sure you understand all that, shall we?

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.3 Hourly Wage</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine you have applied for a job and your employer offers you \$13.5/hour. You also learn that the average hourly wage your potential employer pays to their employees is \$17.5/hour with a standard deviation of \$2.5/hour. See if this is a generous offer (after all, you would be just starting) by finding its <em>z</em>-score and percentile and comparing it to how the other employees of the company are faring. (Don't forget to interpret both the percentile and the <em>z</em>-score.)

&nbsp;

</div>
</div>
&nbsp;

And now that you might be starting to feel somewhat comfortable with the uses of the normal distribution, I'll pull the rug a bit from under you, as it were. Recall how I started the chapter by explaining that many real-world interval/ratio variables tend to be approximately normally distributed? (That part's true.) And then we talked about where the variable's observations fall in the normal distribution? Well, there I lied. (It was necessary!)

&nbsp;

If you think about it carefully, these two statements cannot both be true. On the one hand, a real-existing variable has a specific distribution -- an <em>approximately</em> normal one. But would two real-existing variables have <em>exactly the same</em> approximately normal distribution? That would be unlikely, considering that different variables, in different datasets, with different numbers of observations, units of measurement, units of analysis, means and standard deviations, etc., cannot possibly look exactly the same when plotted on a histogram. How then do we get these very fixed and very specific numbers and percentages associated with the <em>z</em>-scores and the percentiles?

&nbsp;

The thing is, everything I told you about the normal distribution, starting with its defining features and ending with the <em>z</em>-scores and percentiles, refers to the ideal-type, only-existing-in-theory, perfect normal distribution. All the numbers and calculations and percentages we discussed reflect the <em>theoretical</em> normal distribution; they serve as a sort of <em>expectation</em> of how a (continuous, random)[footnote]I explain randomness a bit in the next section, and further in Chapter 6. For now, know that in statistics it doesn't mean "arbitrary" or "accidental" but rather "obtained in an unbiased way" (i.e., with every element having an equal chance to be picked).[/footnote] variable <em>is expected </em>to be distributed. Of course, real-existing variables generally fall short of this ideal, and therefore we call their distributions <em>approximately</em> normal.

&nbsp;

I will repeat: <strong>the theoretical (perfect) normal distribution provides us with what we can <em>expect</em> the actual frequencies of the variable's values to be, in theory</strong>. (In reality, the distribution differs from that expectation to varying degrees). It turns out, <strong>when we work with <em>z</em>-scores and associated percentages and percentiles, we work with what is <em>expected</em></strong>, not with what <em>is</em>. (The variables' observed distributions differ but the normal -- expected --  distribution is always the same.)

&nbsp;

What do we do then, with this reality-versus-expectation situation we have here? Why did we learn all we did about the normal distribution if "it isn't real"?[footnote]That said, again, some standardized tests can be designed in such a way that their test scores are distributed normally. Thus, real-existing data <em>can</em> have a normal distribution; it's just that usually it's an approximation.[/footnote]

&nbsp;

This is where probability comes in. Hold the thought about the normal distribution being an expectation; we'll come back to it in the remaining sections of this chapter.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>150</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:21:42]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:21:42]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-1-3-percentiles]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-percentiles]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-2-percentiles]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-1-2-percentiles]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.2. Non-random Sampling</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-2-non-random-sampling/</link>
		<pubDate>Thu, 28 Feb 2019 18:59:53 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=674</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

How do we go about selecting elements (be they individuals, organizations, etc.) for a study, once we have decided on a population? In short, how do we go about sampling?

&nbsp;

You know by now (if only because of the title of this section) that the two broad types of sampling are non-random and random.<strong> Statistics (specifically, <em>inferential</em> statistics) is based on random sampling</strong>, therefore in what follows I disproportionately focus on that. This is not because non-random sampling is not used or isn't useful -- not at all! <strong>Non-random sampling comprises several very much valid and valuable sampling techniques, typically used in qualitative studies.</strong> However, these are situated outside the scope of this book; as such I will give only a passing overview of non-random sampling (so that you are able to spot it and differentiate it from random sampling).[footnote]You would be doing yourself a favour to learn about all research (and sampling) methods available. After all, not every research question can be approached and studied from a quantitative perspective. (And, at the very least, there are study populations that can only be sampled non-randomly.) I thus very much encourage you, if you haven't already, to take an introductory course in research methods to learn all there is to learn about sampling, both non-random and random.[/footnote]

&nbsp;

With that in mind, I start my lopsided mini-presentation on the topic: non-random sampling first, and random sampling in the next section.

&nbsp;

Professors in social science classes sometimes ask students to conduct interviews or administer surveys as part of class assignments. You might have had to do that, or you can just imagine such an assignment -- so how did/would you select your subjects? Most likely you would go with what's most convenient: fellow students in your class, students who happen to be in, say, the cafeteria when you have time to do the assignment, or your closest relatives or friends if you were instructed to choose non-fellow students. All of these ways of sampling are generally classified as non-random (a.k.a. non-probability) sampling.

&nbsp;

<strong>Non-random sampling techniques typically include <em>convenience sampling</em></strong> (selecting whichever elements are closest/most convenient to you), <em><strong>purposive sampling</strong></em> (sampling with a purpose: selecting only the most useful -- e.g., most knowledgeable/rich in information -- cases as judged by the researcher; also called <em>judgment, selective</em>, or <em>subjective sampling</em>), <em><strong>snowball sampling</strong></em> (where a select few initial participants contact/invite/recruit others in their respective circles to become participants in the research), and <em><strong>quota sampling</strong></em> (sampling on a specific desired characteristic, e.g., specifically selecting a certain number of men and a certain number of women for a study).

&nbsp;

As well, <strong>any time the subjects of a study are self-selected</strong> <strong>(i.e., the study is based on</strong> <strong>people volunteering to participate), it is also considered non-random sampling</strong>.

&nbsp;

The one defining feature common to all non-random sampling methods relates to the probability of elements being selected/included in the study. <strong>If the probability of the elements of the population being included in the study is unequal -- i.e., if some elements have a higher probability of being in the study than others -- the sampling is called <em>non-random</em>.</strong> Non-random samples are in this sense <em>biased</em> -- they focus on, and collect information from, some elements more than others.
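The "unequal probability" idea is easy to simulate. Here is a minimal Python sketch (my addition, with a made-up five-person population) in which convenience sampling reaches nearby elements far more often than equal-probability selection would:

```python
import random

random.seed(3)

# Hypothetical population of five people; the first two are "convenient"
# (e.g., they sit next to you), so they are ten times as likely to be picked.
people = ["Ana", "Ben", "Chloe", "Dev", "Eli"]
convenience_weights = [10, 10, 1, 1, 1]  # unequal inclusion probabilities

convenience = random.choices(people, weights=convenience_weights, k=1000)
simple_random = random.choices(people, k=1000)  # equal probability for all

# "Ana" ends up heavily over-represented under convenience sampling.
print(convenience.count("Ana"), simple_random.count("Ana"))
```

Under equal probability each person should appear in roughly a fifth of the draws; the weighted draw over-samples the "convenient" people, which is exactly the bias described above.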

&nbsp;

The information about these specific elements might be very useful but it reflects <em>only</em> the elements from which it was collected. In other words, <strong>such information (and studies based on it) is said to have <em>limited generalizability</em>.</strong> To the extent that there is a claim to generalizability, the generalizability is <em>assumed</em> (perhaps by assuming the population is so uniform that any sub-group would reflect it).

&nbsp;

A word of caution, however: The limited generalizability of non-random sampling techniques should never be taken as somehow detracting from, or invalidating, research that legitimately uses them. To take a prime example, ethnographies usually rely on non-random sampling methods, yet they typically provide a wealth of information and a level of detail that could never be achieved through quantitative survey research alone. Thus, non-random sampling techniques should never be considered inferior to random ones -- just different, and serving different purposes.

&nbsp;

<strong>The purpose of random sampling, then, is to find a way for a sample to truthfully reflect</strong> -- i.e., to stand in for -- <strong>the population from which it is taken.</strong> This truthful reflection -- i.e., generalizability -- is no longer assumed (as it is in non-random sampling) but rather verifiably established through mathematical means based on probability theory.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>674</wp:post_id>
		<wp:post_date><![CDATA[2019-02-28 13:59:53]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-02-28 18:59:53]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-2-non-random-sampling]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-2-non-random-and-random-sampling]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-2-non-random-and-random-sampling]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.4 Parameters, Statistics, and Estimators</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-4-parameters-statistics-and-estimators/</link>
		<pubDate>Fri, 01 Mar 2019 23:24:34 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=701</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The logic underlying statistical inference is that we want to know something about a population of interest but, since we cannot know it directly, what we do is study a subgroup of that population. Based on what we learn/know about the subgroup, we can then <em>estimate</em> (i.e., infer) things about the population. In the previous section, we already established that not any subgroup of the population will do -- what we need is a <em>randomly</em> selected sample, created through one of the random sampling methods I listed (simple, systematic, stratified, and cluster). What we do is <strong>collect data from/about elements of a <em>sample</em> (e.g., respondents) with the explicit goal of finding something and making conclusions about a <em>population</em></strong>. (Again, we can do that due to the fact that random sampling allows us to use probability theory through the normal curve.)

&nbsp;

Saying we want to find "something" about the population of interest is hardly formal (much less precise) terminology but I wanted to get the message across before I introduced you to the proper statistics jargon. Let's do that now.

&nbsp;

Populations have <em>parameters</em> and samples have <em>statistics</em>. <strong>We describe populations with their <em>parameters</em> while we describe samples with their <em>statistics</em>.</strong> When we study something, we are interested in the parameters of the population, however, in most cases it is difficult to collect the information to calculate them. What we do instead is <strong>we take a random sample of the population and calculate the <em>sample's statistics</em>. We then use the sample statistics to <em>estimate </em>(i.e., infer) the <em>population parameters</em>.</strong> Thus, sample statistics are also called <em>estimators</em> of population parameters.

&nbsp;

For example, if we want to know the average age of Canadians, we could either do a census and ask everyone or simply take a nationally representative sample. Considering how expensive and time-consuming it would be to ask all 36.7 million Canadians (and Statistics Canada conducts the official census only every five years), we can poll a random selection of people across Canada, calculate their average age, and use <em>that</em> as an <em>estimate</em> of the average age of all Canadians.[footnote]When people who have no statistics background learn of this, they usually protest that the information is not accurate because it's not based on <em>everyone</em>. What you will learn in this chapter is that you don't <em>need</em> everyone, and a sample is perfectly sufficient because random samples of sufficient size are mathematically proven to produce the best (closest, truest, most unbiased) estimates of the population parameters. To the extent that there is a difference between a statistic and the parameter it estimates, this difference is accounted for by reporting levels of certainty/confidence. More on that later.[/footnote]

&nbsp;

In this example, the average age calculated based on the people in the sample is the <em>statistic</em> which we use to <em>estimate</em> the average age of all Canadians, the population <em>parameter</em>. All measures of central tendency and dispersion describing variables based on sample data are statistics. On the other hand, if we calculate measures of central tendency and dispersion with data from the whole population, we have parameters.
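The statistic-estimates-parameter logic is easy to simulate. A minimal Python sketch (my addition -- the "population" below is invented, and in real research the parameter would be unknowable):

```python
import random
import statistics

random.seed(7)

# Pretend we could see a whole population: 100,000 invented "ages".
population_ages = [random.gauss(41, 18) for _ in range(100_000)]
mu = statistics.mean(population_ages)  # the parameter (normally unknowable)

# What we actually do: draw a random sample and compute its statistic.
sample = random.sample(population_ages, 1000)
xbar = statistics.mean(sample)  # the statistic -- our estimator of mu

print(round(mu, 1), round(xbar, 1))  # the estimate lands close to the parameter
```

The sample mean will not equal the population mean exactly, but with a random sample of this size it lands very close -- which is the whole point of estimation.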

&nbsp;

Consider, if you will, examples I have used in past chapters: whenever the example was based on actual data from a dataset, and SPSS was used, this was sample data producing statistics.[footnote]All datasets used in this book are nationally representative data collected by Statistics Canada.[/footnote] Even if we haven't used statistics in this way yet, they <em>can</em> be used to estimate things about Canadians as a whole. On the other hand, any time I have used examples with hypothetical (imaginary) data about "your friends," "your classmates," "hours you have worked per week," etc., we can be considered as having population data, as we imagine we have all the information about those things, and there's nothing to estimate.

&nbsp;

A final note concerns formal notation. <strong>To differentiate between statistics and parameters, we designate sample statistics by Latin letters but we denote population parameters by Greek letters.</strong>

&nbsp;

You have already seen a ready-made example for this rule: recall our discussion on variance and standard deviation. In Section 4.4 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/4-4-standard-deviation/">https://pressbooks.bccampus.ca/simplestats/chapter/4-4-standard-deviation/</a>) I introduced formulas for <em>σ</em> and <em>σ<sup>2</sup> </em>and I mentioned (without much explanation) that another "version" of these exist as <em>s</em> and <em>s<sup>2</sup></em>. In truth, when we calculated the variance and the standard deviation with the hypothetical data in the examples, we needed the <em>population</em> standard deviation and variance (i.e., <em>σ</em> and <em>σ<sup>2</sup></em>, respectively); but when we use SPSS with a dataset (i.e., sample data), we need the <em>sample</em> standard deviation and variance (i.e., <em>s</em> and <em>s<sup>2</sup></em>, respectively). Here they are again:

&nbsp;

$$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N} = \sigma^2 =\textrm{population variance}$$

&nbsp;

$$\sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}} = \sqrt{\sigma^2}=\sigma=\textrm{population standard deviation}$$

&nbsp;

$$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1} = s^2 =\textrm{sample variance}$$

&nbsp;

$$\sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1}} = \sqrt{s^2}=s=\textrm{sample standard deviation}$$

&nbsp;

I'll take this opportunity to finally explain why we need the difference in the formulas (i.e., why we divide by <em>N-1</em> in the <em>sample</em> formulas but by <em>N</em> in the <em>population</em> formulas). Considering that the sample statistics <em>estimate</em> the population parameters but are arguably different from the exact parameters -- i.e., some uncertainty exists, as inference is not a perfect "guess" -- assuming that what we obtain from a sample is exactly what we would obtain from the population would produce a biased estimation. Thus, the <em>N-1</em> is meant to correct that bias[footnote]This is called <em>Bessel's correction</em>, after Friedrich Bessel who introduced it.[/footnote] (which it does for the variance, and does to an extent for the standard deviation). <strong>What we have then is that <em>s<sup>2</sup></em> is an unbiased estimator of <em>σ<sup>2</sup></em> (and <em>s</em> a nearly unbiased estimator of <em>σ</em>).</strong>
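Bessel's correction can be checked empirically. A minimal Python simulation (my addition, with invented data) showing that dividing by <em>N</em> underestimates the population variance on average, while dividing by <em>N-1</em> does not:

```python
import random
import statistics

random.seed(42)

# An invented "population" with variance close to 10**2 = 100.
population = [random.gauss(50, 10) for _ in range(10_000)]
sigma2 = statistics.pvariance(population)  # population variance (divide by N)

# Draw many small samples and average the two competing variance estimates.
n, trials = 5, 20_000
biased_avg = 0.0
unbiased_avg = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    biased_avg += ss / n / trials          # divide by N: too small on average
    unbiased_avg += ss / (n - 1) / trials  # Bessel's correction: divide by N - 1

print(round(sigma2, 1), round(biased_avg, 1), round(unbiased_avg, 1))
```

With samples of size 5, the divide-by-<em>N</em> average comes out around four-fifths of the true variance, while the divide-by-<em>N-1</em> average lands essentially on it -- the bias and its correction in action.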

&nbsp;

Thus it should be clear why we use the <em>s</em> and <em>s<sup>2</sup></em> formulas when working with datasets and SPSS -- the actual data has been collected from respondents randomly selected from a population of interest, comprising a sample of a specific size. On the other hand, when we have data about everyone/everything we're interested in (like in the small-scale examples with made-up data), we have a <em>de facto</em> population on our hands -- hence the <em>σ</em> and <em>σ<sup>2</sup></em> formulas are appropriate. In the former case, the findings can be extrapolated to the population (acknowledging that we are dealing with inferred estimates); in the latter case, there is nothing further to extrapolate, as we are calculating the parameters directly.

&nbsp;

Another important parameter to note as we will be using it a lot from now on is the population mean designated by the small-case Greek letter for <em>m</em> (from <em>mean</em>) -- <em>μ</em>.[footnote]The Greek letter  <em>μ</em> is pronounced as "MYU".[/footnote] Unlike the correspondence between <em>s</em> and <em>σ</em>, however, we don't usually denote the sample mean with an <em>m</em>; as you know we use $\overline{x}$ instead (so that we know which variable's mean we have in mind).

Finally, when a parameter is being estimated by an estimator, it is designated by a "hat" on top: for example, if we have a sample statistic called <em>a</em> estimating a population parameter <em>α</em>[footnote]This is the small-case Greek letter <em>a</em>: <em>α</em>, pronounced "AL-pha".[/footnote], the estimated <em>α</em> will be $\hat{\alpha}$, pronounced "alpha-hat". By analogy, if a statistic <em>b</em> estimates a parameter <em>β</em>[footnote]This is the small-case Greek letter <em>b</em>: <em>β</em>, pronounced "BAY-ta".[/footnote], the estimated <em>β</em> will be $\hat{\beta}$, pronounced "beta-hat".

Thus, the logic of inference tells us that while <em>a</em> = $\hat{\alpha}$ and <em>b</em> = $\hat{\beta}$ (i.e., the statistics are estimators for the parameters), <em>a</em> = $\hat{\alpha}\neq\alpha$ and <em>b</em> = $\hat{\beta}\neq\beta$ (i.e., the statistics (also estimators) are not the same as the parameters). More on this, next.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>701</wp:post_id>
		<wp:post_date><![CDATA[2019-03-01 18:24:34]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-01 23:24:34]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-4-parameters-statistics-and-estimators]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[7-3-parameters-statistics-and-estimators]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[6-3-parameters-statistics-and-estimators]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.1 The Normal Distribution</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-the-normal-distribution/</link>
		<pubDate>Thu, 07 Mar 2019 21:28:54 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=767</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

You might have already heard of bell curves (or bell-shaped curves), or even normal curves. If you have, you also probably know they look similar to the one in Fig. 5.1.

&nbsp;

<em>Figure 5.1 Body Mass Index of Respondents (CCHS 2015/2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-bmi-cchs.png" alt="" width="462" height="370" class="alignnone wp-image-1679 size-full" />

&nbsp;

Fig. 5.1 shows a histogram with the distribution of the variable <em>body mass index</em> (or <em>BMI</em>) of respondents to the <em>CCHS 2015/2016</em>. Judging by the height of the bars that comprise it, the histogram illustrates the fact that most cases tend to cluster at the centre (i.e., most people's <em>BMI</em> is average), while a decreasing number of cases end up in the "tails" of the distribution (i.e., the further their <em>BMI</em> is from the average, the fewer cases there are).

&nbsp;

You can easily notice that the distribution (as reflected in the green bars) is not perfectly symmetric but a bit positively skewed: the right "tail" is longer than the left. Still, its shape approximates a bell well enough (note for comparison the black curve in Fig. 5.1, which is a true bell shape). <strong>We call this type of distribution <em>approximately normal</em></strong>.

&nbsp;

A great many interval/ratio variables in the world tend to have an approximately normal distribution when plotted (true for both the social and natural sciences). That is, the majority of observations are centered in the middle of the distribution (i.e., they tend to be <em>average</em>); we find fewer observations just below and just above the average, and fewer still which are much below or much above the average.

&nbsp;

Think about height, for example. Most people are of average height (that's why it's called <em>average</em> height after all), some people are above and some below average, fewer people are much taller or shorter, and rather rarely are some people extremely short or extremely tall. Variables like age, or weight (which you can see in Fig. 5.2 below[footnote]The reason you observe the "double" distribution -- one shorter (darker), the other taller (lighter) -- is the self-reporting of weight. Most people tend to report their weight in whole numbers, and here some have done so, stating their weight as 65 kg or 85 kg, etc.; these are the tall bars. Others, however, may have reported it with grams and/or in pounds (which when converted to kilograms would produce a non-whole number weight), thus resulting in weights such as 65.35 kg or 85.75 kg, etc., leading to the short bars and to the histogram appearing like two histograms plotted on top of each other. Had the responses been rounded to the nearest whole kilogram, the histogram would have taken a regular, "single" normal-curve shape.[/footnote]), but also, say, test marks, or points scored per hockey game, or text messages sent per day, etc., are similar. There will be an average, and a continuous decrease in frequency the further one gets from that average.

&nbsp;

<em>Fig. 5.2 Weight of Respondents (CCHS 2015/2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-curve-weight-cchs.png" alt="" width="462" height="370" class="alignnone wp-image-1680 size-full" />

&nbsp;

<em>As fascinating as all this is</em>, you might be thinking now, <em>why do we care about it?</em> <em>It's just one type of a distribution among many.</em>

&nbsp;

True, but as I already mentioned, the normal distribution is special, and not just because many variables' histograms tend to plot an approximately normal curve. To understand why, we need to start exploring the normal distribution as a <em>theoretical</em> concept (or, to borrow from Max Weber, as an <em>ideal type</em>).

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>767</wp:post_id>
		<wp:post_date><![CDATA[2019-03-07 16:28:54]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-07 21:28:54]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-1-the-normal-distribution]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-the-normal-distribution]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.8. Summary [EMPTY]</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-8-summary-empty/</link>
		<pubDate>Mon, 18 Mar 2019 22:42:18 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=912</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>912</wp:post_id>
		<wp:post_date><![CDATA[2019-03-18 18:42:18]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-18 22:42:18]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-8-summary-empty]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>9</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.2.1. Between A Discrete and A Continuous Variable</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-1-between-a-discrete-and-a-continuous-variable/</link>
		<pubDate>Wed, 20 Mar 2019 20:58:31 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=940</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

<strong>We can "get a sense" if a discrete and a continuous variable <em>seem</em> associated visually through a chart called a <em>boxplot</em></strong> (discussed further below) <strong>and numerically through examining the <em>difference of means</em></strong> (or medians, if one so prefers).

What type of association do we get when we consider a discrete and a continuous variable? The easiest way to represent this type of association is with a binary (two-category) discrete variable: we check whether a continuous variable's statistics (like the mean, or the median) vary between the discrete variable's categories. This sounds far more complicated than it is. A couple of examples will show you that you have probably considered questions about "comparisons of means" even in your everyday life. The first explains the idea conceptually, the second with actual data.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Sex Differences in Upper Body Strength, American College Students</em></p>

</header>
<div class="textbox__content">

Research has shown that, despite similar lower body strength, women have less upper body strength than men, on average. [LIST CITATIONS FROM HERE https://health.howstuffworks.com/wellness/diet-fitness/personal-training/men-vs-women-upper-body-strength.htm AND THE FOLLOWING] One such study examined differences in upper body strength in a sample of Caucasian and East-Asian college students engaged in weight-lifting classes in American colleges (Chen, Liu and Yu, 2012) [https://content.sciendo.com/view/journals/ssr/21/3-4/article-p153.xml, pdf here https://www.degruyter.com/downloadpdf/j/ssr.2012.xxi.issue-3-4/v10237-012-0015-5/v10237-012-0015-5.pdf].

While the study examined numerous aspects of the difference in strength, I'll take only one of the researchers' findings to illustrate my point: triceps strength in arm extension. The reported means were 46.2 pounds for women versus 87.4 pounds for men in the Caucasian sample, and 39.6 pounds for women versus 82.1 pounds for men in the East-Asian sample (Chen, Liu and Yu, 2012, p. 156).

Consider what we are discussing here: We have two variables of interest[footnote]You could argue that <em>race/ethnicity</em> is also there. As reported in the study, however, <em>race/ethnicity</em> was a secondary variable bringing more detail to the study, through which the authors were able to demonstrate that upper-body strength differences based on sex exist in both race/ethnic groups considered.[/footnote], <em>gender</em> and <em>upper-body strength</em>. <em>Gender</em> is a nominal discrete (and, in this study, binary) variable, while <em>upper-body strength</em> (through various measurements in pounds) is a ratio continuous variable. The hypothesized association between the two posits that some categories of the discrete variable (e.g., men) tend to go with specific values of the continuous variable (e.g., higher values of upper-body strength). That is, if both men and women had the same means for, in this case, triceps strength in arm extension, <em>gender</em> and <em>upper-body strength</em> would be unrelated, as one's sex wouldn't be predictive of one's upper-body strength at all.

In effect, we are comparing the mean values (of a continuous variable) across groups (i.e., the categories of a discrete variable). Now, as far as a numerical description of that comparison goes, we have the two means (of men and of women) and we can thus calculate the difference of means:

$\overline{x}_{men}$ $-\overline{x}_{women}$ $=87.4-46.2=41.2$ (Caucasian sub-sample)

$\overline{x}_{men}$ $-\overline{x}_{women}$ $=82.1-39.6=42.5$ (East-Asian sub-sample)

Thus, what we observe <em>in this sample</em> is a 41.2-pound difference in upper-body strength (as measured by triceps strength in arm extension) between Caucasian men and women, and a 42.5-pound difference between East-Asian men and women. Again, note that <strong>the fact that we see these differences in the sample does not mean they exist in the population -- they may, or they may not</strong>. <strong>We wouldn't know unless we test whether the differences are generalizable to the population</strong>[footnote]If you are interested, the authors of the study did test these differences (with a t-test, discussed later) and found them generalizable to the population indeed (Chen, Liu and Yu, 2012).[/footnote]<strong>.</strong> We'll get to testing later; for now we are only interested in the differences <em>descriptively</em>, i.e., in that they exist <em>in the sample</em>.

</div>
</div>
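For readers who like to verify such arithmetic in code, the difference-of-means computation above can be sketched in a few lines of Python. (Python is not the tool this book uses; the variable names are mine, while the means are the values reported by Chen, Liu and Yu, 2012.)

```python
# Reported mean triceps strength in arm extension, in pounds
# (Chen, Liu and Yu, 2012, p. 156).
means = {
    "Caucasian": {"men": 87.4, "women": 46.2},
    "East-Asian": {"men": 82.1, "women": 39.6},
}

# Difference of means: mean for men minus mean for women, per sub-sample.
for sample, m in means.items():
    diff = round(m["men"] - m["women"], 1)
    print(f"{sample}: {diff}-pound difference of means")
```

This prints a 41.2-pound difference for the Caucasian sub-sample and a 42.5-pound difference for the East-Asian sub-sample, matching the hand calculation above.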
Example XX above shows that every time we compare averages of two (or indeed, more than two) groups and calculate the differences in the means, we are effectively describing associations between variables. I could have easily presented other examples, like gender or race/ethnic differences in annual income, years of education, occupational prestige, test scores[footnote]For an example of a brief study on the association between <em>race/ethnicity</em> (a five-category discrete variable) and <em>SAT scores</em> of Harvard University applicants, see <a href="https://www.thecrimson.com/article/2018/10/22/asian-american-admit-sat-scores/">here</a>.[/footnote], etc. The reason I chose an example about a sex-based rather than gender-based difference (that is, a kinesiological rather than a sociological study) was so that I could warn you in passing about a common mistake, called the <em>ecological fallacy</em>.
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><strong><span style="color: #ff0000">Watch Out!!</span></strong>  . . . for The Ecological Fallacy</p>

</header>
<div class="textbox__content">

Consider the findings from the study in Example XX above: men's average upper-body strength is higher than women's. Assuming we can generalize the findings to the general population[footnote]As mentioned above, many studies support this as a real physiological sex difference; this is the reason I chose this example instead of more controversial/debated issues like gender differences in IQ, or the gender pay gap, etc.[/footnote], the evidence suggests that when it comes to upper-body strength, men are stronger than women <em>on average</em>. Many people take this to mean that a randomly selected man would <em>always</em> be stronger than a randomly selected woman . . . which does not follow at all from the difference in mean strength.

Statistically speaking, it is a matter of the dispersion around the means of the two groups, and of how big the difference in means is. Figure XX below demonstrates.

[PLACEHOLDER FOR CROSSING NORMAL CURVES - and perhaps non crossing]

Ultimately, the takeaway from this caveat is not to over-interpret differences in averages to mean more than what they actually are: differences in <em>averaged</em> values, not in the specific values of the <em>individuals</em> belonging to the different groups being compared.

</div>
</div>
With that warning out of the way, let's take another (this time, sociologically motivated) example for examining differences of means, along with a proper visual description -- boxplots.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Gender Differences in Total Income, NHS 2011</em></p>

</header>
<div class="textbox__content">

Statistics Canada's <em>National Household Survey 2011</em> (NHS 2011) was designed to replace the until-then mandatory long form of the Census[footnote]For the problematic nature of the (Harper) Government's decision in 2010 to make the survey voluntary and its related implications, see for example <a href="https://ocul.on.ca/node/3400">here</a>. The mandatory long-form census was restored in 2016 by the Liberal Government. My usage of the data here is strictly for demonstration purposes and as such shouldn't be taken as an endorsement of the NHS 2011.[/footnote]. For this example, I'm using a random sample of about 3% of the NHS 2011 individual data (aka a Public Use Microdata File, or PUMF), resulting in <em>N</em>=22,123. I'm interested in whether men's and women's income for the year preceding the survey differed, i.e., whether the variables <em>gender</em> (called <em>sex</em> in the dataset) and <em>total income</em> (i.e., income from all possible sources) appear associated.

With the help of SPSS, I plot the data. The resulting boxplots graph is given in Figure XX below.

Figure XX <em>Gender Differences in Total Income, NHS 2011</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/boxplots.jpg" alt="" width="580" height="538" class="wp-image-954 aligncenter" />

Source: Statistics Canada, 2019.

Boxplots are charts visually incorporating a lot of statistical information in one neat little package; I encourage you to make use of them when exploring your data, as they can be quite useful. What do we see in Figure XX in our case? Obviously, we have two groups to compare (as per the two categories of the nominal variable <em>gender</em>), women and men, and therefore the graph presents two boxplots. (Had we multiple categories in our discrete variable, we'd have had multiple boxplots.)

<strong>How to read a boxplot.</strong> Each boxplot consists of the eponymous "box" and two so-called "whiskers" protruding from it. The "box" (in green above) represents the middle 50 percent of the data (i.e., the two middle quartiles, or the IQR); the lower whisker represents the first/bottom quartile of the data, and the upper whisker represents the last/top quartile. The dark line bisecting the box indicates the median. The two ends of the whiskers mark the lowest and the highest values -- excluding outliers, so as not to visually distort the "regular" spread of the data. As such, the chart plots run-of-the-mill outlier cases as small circles (in red above) outside the whiskers; extreme outliers are indicated by stars (in black above)[footnote]Also note that to make the boxplot readable at an appropriate size, in Figure XX I cut some <em>extremely</em> extreme outliers off at the top of the male boxplot.[/footnote].
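Every element of a boxplot corresponds to a simple summary statistic, so the anatomy just described can be recomputed directly. Below is a minimal Python sketch (Python is not the tool this book uses) with hypothetical income values -- not the NHS data -- using the common convention that cases beyond 1.5&times;IQR from the box are flagged as outliers:

```python
import statistics

# Hypothetical income values for one group (not the NHS 2011 data).
incomes = [18_000, 21_000, 23_000, 26_000, 30_000,
           34_000, 41_000, 55_000, 180_000]

q1, median, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
iqr = q3 - q1                                        # height of the "box"

# Common convention: cases beyond 1.5 x IQR from the box are drawn as
# outlier circles (and beyond 3 x IQR, as extreme-outlier stars).
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in incomes if not lower_fence <= x <= upper_fence]
```

Here the median is 30,000, the box spans 22,000 to 48,000, and only the 180,000 case falls outside the whiskers -- it would be plotted as an outlier circle.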

Now that you know how to read them, compare the two boxplots above. First, we see that the median for men is higher than the median for women (again, these are the dark lines within the boxes); as well, total income appears more spread out for men than for women (the whiskers in the men's boxplot reach further, indicating a larger range and IQR). Further, while both men and women appear to have outliers, the men's group seems to include more extreme outliers, and at higher values than those observed in the women's group[footnote]You might have noticed that the first quartile (i.e., the bottom "whisker") includes negative values. Statistics Canada uses several income variables for which this is the case. Negative income exists as an accounting possibility: when one's annual expenses end up larger than one's annual income (e.g., for a self-employed individual whose business hasn't been as successful, etc.). In much real-life research, negative income values are frequently dropped/removed if that course of action is justified by the study's design, research question, and purposes. In this example I have no reason to do that, hence I left the negative income values in the data.[/footnote].

<span style="text-indent: 1em;font-size: 1rem">All this points to the conclusion that men in the sample had higher (median, and quite likely average) total income for 2010 than women did, despite that the individuals with the lowest incomes also appear to be men.</span>

As useful as the general information we gleaned from the boxplots is, we should look at the precise numbers too. SPSS calculates the mean total income as \$32,465 for women and \$48,866 for men -- that is, a \$16,401 difference in mean total income in favour of men. In this sample of 22,123 people, men's average total income is \$16,401 more than women's.

We could also compare the medians (especially useful when dealing with income variables): SPSS gives the median total income of women in the sample as \$23,000, while the median total income for men is \$35,000 -- a difference of medians of \$12,000, again in favour of men.

</div>
</div>
To summarize,<strong> you can explore a potential association between a discrete and a continuous variable of interest in two ways: 1) visually, by plotting and comparing boxplots; and 2) numerically, by inspecting the means (or medians) for the groups (i.e., the categories of the discrete variable being compared) and reporting their difference.</strong>

Keep in mind that we are not estimating anything at this point and are not claiming anything about the population: we are simply describing data based on a specific, actual sample.

Figure XX below shows a quick reference for interpreting boxplots.

Figure XX <em>How to Interpret a Boxplot</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/Box_plot_descriptionA.jpg" alt="" width="437" height="457" class="wp-image-966 size-full aligncenter" />

Source: https://commons.wikimedia.org/wiki/File:Box_plot_description.jpg

[BRIEFLY DISCUSS 3+ groups]
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title">SPSS Tip XX <em>Bivariate Descriptions of Discrete and Continuous Variables: Boxplots and Comparisons of Means</em></p>

</header>
<div class="textbox__content">

This is <strong>how you can get boxplots</strong> like the ones in Figure XX above:
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Graphs</em>, then from the pull-down menu <em>Legacy Dialogues</em>, and finally <em>Boxplot</em>;</li>
 	<li>In the resulting <em>Boxplot</em> window select <em>Simple</em> and, keeping <em>Summaries of groups of cases</em> checked, click <em>Define</em>;</li>
 	<li>Select your continuous variable of interest from the list on the left and, using the appropriate arrow, move it into the <em>Variable</em> empty space on the right (at the top);</li>
 	<li>Select your discrete variable of interest from the list on the left and, using the appropriate arrow, move it into the <em>Category Axis</em> empty space on the right (below the <em>Variable</em>), then click <em>OK</em>;</li>
 	<li>Your boxplots will appear in the <em>Output</em> window. (Note that the graph will appear in its default SPSS colours and specifications. Double-clicking the chart will make a <em>Chart Editor</em> window appear. In the <em>Chart Editor</em> you can change, edit, and modify the appearance of your boxplots to your heart's content.)</li>
</ul>
This is <strong>how you can get means, medians (or any descriptive statistic really) for different groups</strong>:
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Data</em> and then from the pull-down menu, select <em>Split File</em>;</li>
 	<li>In the new window, select <em>Compare groups</em>, then find your discrete variable of interest from the left-hand side, and using the arrow, move it into the <em>Groups Based on</em> empty space; click <em>OK</em>.</li>
 	<li>You would have just placed a filter on your data. From this point on (until you switch the filter off), everything you do in SPSS will be done for each separate group (this is indicated by a message "SORT CASES BY [your discrete variable name]. SPLIT FILE LAYERED BY [your discrete variable name]." appearing in the <em>Output</em> window).</li>
 	<li>Then, from the <em>Main Menu</em>, select <em>Analyze</em>, and then <em>Frequencies</em>, etc. to request any descriptive statistics you may like, e.g., the mean, the median, the standard deviation, etc. as discussed in SPSS Tip XX in Chapter XX.</li>
 	<li>Your output in the <em>Output</em> window will list the requested descriptives by the different groups (categories of the discrete variable).</li>
 	<li>Once you are done with the comparisons, do not forget to switch the filter off (or your data file will remain split by groups): go again to <em>Data</em> in the <em>Main Menu</em>, select <em>Split File</em> and click <em>Analyze all cases, do not create groups</em> on the right-hand side; click OK.</li>
 	<li>Your <em>Output</em> window will give a message of "SPLIT FILE OFF." to indicate that the data is no longer split by group and it's in its original condition.</li>
</ul>
</div>
</div>
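For readers working outside SPSS, the split-file-then-describe workflow above has a straightforward equivalent in code. A minimal Python sketch, with records invented purely for illustration (this is not the NHS 2011 microdata):

```python
import statistics
from collections import defaultdict

# Hypothetical (gender, total income) records -- invented for illustration.
records = [("F", 23_000), ("F", 31_000), ("F", 44_000),
           ("M", 35_000), ("M", 52_000), ("M", 60_000)]

# Step 1: "split the file" by the discrete variable.
groups = defaultdict(list)
for gender, income in records:
    groups[gender].append(income)

# Step 2: request descriptives per group, as SPSS does after Split File.
summary = {g: {"mean": statistics.mean(v), "median": statistics.median(v)}
           for g, v in groups.items()}
```

Each group's mean and median can then be compared directly, just as in the examples above.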
Now let's see how to "spot" and describe potential associations between two discrete variables.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>940</wp:post_id>
		<wp:post_date><![CDATA[2019-03-20 16:58:31]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-20 20:58:31]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-2-1-between-a-discrete-and-a-continuous-variable]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.2.2. Between Two Discrete Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-2-between-two-discrete-variables/</link>
		<pubDate>Fri, 22 Mar 2019 03:19:47 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=974</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Examining a potential statistical association between two discrete variables amounts to comparing groups (as per the categories of one of the variables) on the number (and proportion) of their respective members that fall in the categories of the other variable[footnote]Since both variables are discrete, for clarity's sake I refer to the attributes of one variable as <em>groups</em> and to the attributes of the other variable as <em>categories</em>. (I have used them interchangeably until now, but here it helps to distinguish the two variables by using two different words for their attributes.)[/footnote]. Again, this sounds far worse than it actually is, as you will see in the examples that follow.

The potential association between discrete variables can be examined both visually and numerically via a special table called a <em>cross-tabulation</em> table ("cross-table" or "crosstab" for short) or <em>contingency</em> table. While a contingency table can have any number of rows and columns, <em>too</em> large a number of either (or both) can easily make the table unreadable, as it would contain too much data to contemplate at once. (This is also the reason why we chose to treat some variables as continuous -- when they have too many categories -- as then we can use another tool to visualize and examine them; Section XX explains.) Thus, below I introduce the simplest form of a contingency table, a 2x2 crosstab (i.e., 2 rows and 2 columns).

In the general sense a <em>K</em>x<em>J</em> cross-table would be a table containing <em>K</em> rows and <em>J</em> columns, where the categories of one variable go into the rows (a <em>K</em> number of them) and the categories of the second variable (a <em>J</em> number of them) go into the columns of the table (therefore <em>crossing</em> in the interior cells of the table).

Thus, a 2x2 contingency table would mean we have two binary variables, each with two categories. Before I show you an actual data exploration, Table XX presents an "empty shell" of one such table which I use to introduce some needed vocabulary.

Table XX<em> A Generic Cross-tabulation Table</em>
<table class="shaded" style="border-collapse: collapse;width: 100%;height: 68px" border="0">
<tbody>
<tr style="height: 17px">
<td style="width: 23.7821%;height: 17px;text-align: center"></td>
<td style="width: 22.429%;height: 17px;text-align: center"><strong>Variable 1 Group 1</strong></td>
<td style="width: 24.3234%;height: 17px;text-align: center"><strong>Variable 1 Group 2</strong></td>
<td style="width: 29.4655%;height: 17px;text-align: center"><span style="color: #3366ff"><strong>Total</strong></span></td>
</tr>
<tr style="height: 17px">
<td style="width: 23.7821%;height: 17px;text-align: center"><strong>Variable 2 Category 1</strong></td>
<td style="width: 22.429%;height: 17px;text-align: center"><span style="color: #008000">Number A</span></td>
<td style="width: 24.3234%;height: 17px;text-align: center"><span style="color: #008000">Number B</span></td>
<td style="width: 29.4655%;height: 17px;text-align: center"><span style="color: #3366ff">Category 1 Total (A+B)</span></td>
</tr>
<tr style="height: 17px">
<td style="width: 23.7821%;height: 17px;text-align: center"><strong>Variable 2 Category 2</strong></td>
<td style="width: 22.429%;height: 17px;text-align: center"><span style="color: #008000">Number C</span></td>
<td style="width: 24.3234%;height: 17px;text-align: center"><span style="color: #008000">Number D</span></td>
<td style="width: 29.4655%;height: 17px;text-align: center"><span style="color: #3366ff">Category 2 Total (C+D)</span></td>
</tr>
<tr style="height: 17px">
<td style="width: 23.7821%;height: 17px;text-align: center"><span style="color: #3366ff"><strong>Total</strong></span></td>
<td style="width: 22.429%;height: 17px;text-align: center"><span style="color: #3366ff">Group 1 Total (A+C)</span></td>
<td style="width: 24.3234%;height: 17px;text-align: center"><span style="color: #3366ff">Group 2 Total (B+D)</span></td>
<td style="width: 29.4655%;height: 17px;text-align: center"><span style="color: #3366ff">Total All (A+B+C+D)</span></td>
</tr>
</tbody>
</table>
The first thing you should notice is that the <em>K</em>x<em>J</em>, or the <em>2</em>x<em>2</em> in our case, refers to the groups/categories of the variables in question, not to the actual number of rows and columns in the contingency table. Technically speaking, Table XX contains four rows and four columns -- but the ones that count are only the ones in green: two "green" rows and two "green" columns, indicating the number of groups and categories of the variables. The last row and the last column (in blue above) are called <em>margins</em> and are reserved for reporting totals[footnote]The more observant of you may notice that the <em>horizontal margin</em> (the last row) shows the frequency distribution of Variable 1 (i.e., the number of cases per group), while the <em>vertical margin</em> (the last column) shows the frequency distribution of Variable 2 (the number of cases per category).[/footnote]. The first column and the first row (in bold above) are simply titles.

The central cells of the table are the most important ones. In the example above, <em>Number A</em> indicates the number of cases (observations/individuals/etc.) that belong simultaneously to Group 1 (of the first variable) and Category 1 (of the second variable). By analogy, <em>Number B</em> indicates the number of cases that belong simultaneously to Group 2 and Category 1; <em>Number C</em> stands for the number of cases that belong to both Group 1 and Category 2; and finally, <em>Number D</em> is the number of cases that belong to both Group 2 and Category 2.

The margins contain the totals by row and by column, and the last cell (last row/last column) is reserved for the total <em>N</em>.
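The bookkeeping of cells and margins is easy to express in code. A short Python sketch with hypothetical counts (the letters follow the cell labels in Table XX):

```python
# Interior cells of a generic 2x2 table (hypothetical counts).
a, b = 10, 20   # Category 1: cases in Group 1, Group 2
c, d = 30, 40   # Category 2: cases in Group 1, Group 2

category_totals = (a + b, c + d)  # vertical margin (last column)
group_totals = (a + c, b + d)     # horizontal margin (last row)
n = a + b + c + d                 # total N (last cell)
```

Note how the two margins are just the frequency distributions of the two variables, and both sum to the same total <em>N</em>.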

<em>So what is so special about this table? I've seen such tables all my life!</em> you might be saying right about now. Bear with me, we'll eventually get to the special -- and somewhat complicated -- part (and likely you'll be sorry for it). First though, let's look at a contingency table with some actual numbers.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Do You Like The Campus Cafeteria?</em></p>

</header>
<div class="textbox__content">

Imagine you are frustrated with the food options available in your campus cafeteria and you wonder if others share your thoughts on the matter (perhaps in order to gauge support for changes you'd like to see enacted, or for similar activism). Before you devote time to an actual random-sample study (now that you know how), you do a quick exploratory poll of your classmates in one of your classes. You ask 35 people whether they like the campus cafeteria, and in the process you get the inkling that second-year students seem to have a different opinion about the food options than the first-year students in the class. You plot your results:

Table XXA <em>Do You Like The Campus Cafeteria?</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 128px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"></td>
<td style="width: 30.8403%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 29.8917%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 36.8762%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> YES</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">7</td>
<td style="width: 29.8917%;height: 15px;text-align: center">5</td>
<td style="width: 36.8762%;height: 15px;text-align: center">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong> NO</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">8</td>
<td style="width: 29.8917%;height: 15px;text-align: center">15</td>
<td style="width: 36.8762%;height: 15px;text-align: center">23</td>
</tr>
<tr style="height: 15px">
<td style="width: 17.5807%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 30.8403%;height: 15px;text-align: center">15</td>
<td style="width: 29.8917%;height: 15px;text-align: center">20</td>
<td style="width: 36.8762%;height: 15px;text-align: center">35</td>
</tr>
</tbody>
</table>
I'm certain you know how to read this: 7 first-year and 5 second-year students like the cafeteria, while 8 first-year and 15 second-year students do not. There is a total of 12[footnote]As 7+5=12.[/footnote] students who like the cafeteria and 23[footnote]As 8+15=23.[/footnote] students who do not.  You talked to 15[footnote]As 7+8=15.[/footnote] first-year and 20[footnote]As 5+15=20.[/footnote] second-year students, a total of 35 students.

Can you compare the relevant numbers as they are presented in the table? And, for that matter, what are the relevant numbers?

Let's answer both questions in turn.

Recall from Chapter XX on frequencies: No, you cannot compare the numbers as stated, since the two groups you have to compare are of different sizes. The relevant comparison is between the different-year students who like the cafeteria -- first-years vs. second-years -- as this is what you want to know.

It's true that 2 more first-years like the food in the cafeteria than second-years (7&gt;5), but at the same time you had 5 more second-year students in your sample (20&gt;15). To take into account the differing group sizes, you need to compare proportions (or percentages): the proportion of first-year students who like the cafeteria against the proportion of second-year students who like the cafeteria. You therefore calculate the respective proportions, turning them into percentages at the end:
<ul>
 	<li>$\frac{7}{15}=0.467$, or 46.7% of first-years like the cafeteria</li>
 	<li>$\frac{5}{20}=0.250$, or 25% of second-years like the cafeteria</li>
 	<li>$\frac{8}{15}=0.533$, or 53.3% of first years do NOT like the cafeteria</li>
 	<li>$\frac{15}{20}=0.750$, or 75% of the second-years do NOT like the cafeteria</li>
 	<li>$\frac{12}{35}=0.343$, or 34.3% of ALL students like the cafeteria</li>
 	<li>$\frac{23}{35}=0.657$, or 65.7% of ALL students do NOT like the cafeteria</li>
</ul>
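The arithmetic above can be sketched in a few lines of Python (a minimal illustration using the counts from the table above; the variable names are my own):

```python
# Counts from the hypothetical cafeteria survey.
counts = {
    "First Year":  {"YES": 7, "NO": 8},
    "Second Year": {"YES": 5, "NO": 15},
}

# Column percentages: divide each cell by its group (column) total.
column_pct = {}
for group, cells in counts.items():
    total = sum(cells.values())
    column_pct[group] = {answer: round(100 * n / total, 1)
                         for answer, n in cells.items()}

print(column_pct)  # e.g. 7 / 15 = 0.467, so 46.7% of first-years say YES
```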
To summarize the information neatly, we modify our table to this:

Table XXB <em>Do You Like The Campus Cafeteria? (Column Percentages)</em>
<table class="lines" style="border-collapse: collapse;width: 0%;height: 128px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"></td>
<td style="width: 26.4268%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 38.4833%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 48.941%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong> YES</strong></td>
<td style="width: 26.4268%;height: 15px;text-align: center">46.7%</td>
<td style="width: 38.4833%;height: 15px;text-align: center">25%</td>
<td style="width: 48.941%;height: 15px;text-align: center">34.3%</td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong> NO</strong></td>
<td style="width: 26.4268%;height: 15px;text-align: center">53.3%</td>
<td style="width: 38.4833%;height: 15px;text-align: center">75%</td>
<td style="width: 48.941%;height: 15px;text-align: center">71.8%</td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 26.4268%;height: 15px;text-align: center">100%</td>
<td style="width: 38.4833%;height: 15px;text-align: center">100%</td>
<td style="width: 48.941%;height: 15px;text-align: center">100%</td>
</tr>
</tbody>
</table>
So far so good? From Table XXB we now clearly see that your initial hunch was right: there does seem to be a difference in your classmates' opinions based on their year of study. That is, while you do have support for anti-cafeteria activism (only 34.3% of your classmates like the campus cafeteria, while 65.7% dislike it), first-year students seem to like the cafeteria almost twice as much as second-year students do: 46.7% of first-years like the food options in the cafeteria compared to only 25% of the second-years, a difference of 21.7 percentage points.

</div>
</div>
The example above shows <strong>what you need to examine a possible association between two discrete variables: visually, a <em>cross-tabulation</em> (listing percentages, not absolute numbers!), and, numerically, a <em>difference in proportions</em> (or percentages)</strong>.

Again, a reminder that this is sample-only exploration. We make no predictions or inferences about a population; we simply explore what the data we have at hand show.

So far, I have purposely shown you how the logic of the descriptive analysis of contingency tables goes, <em>the right way</em>. Here comes the complication, however: why did I calculate the proportions in the example the way I did? Consider the alternative:
<ul>
 	<li>$\frac{7}{12}=0.583$, or 58.3% of the students who like the cafeteria are first-years</li>
 	<li>$\frac{5}{12}=0.417$, or 41.7% of the students who like the cafeteria are second-years</li>
 	<li>$\frac{8}{23}=0.348$, or 34.8% of students who do NOT like the cafeteria are first-years</li>
 	<li>$\frac{15}{23}=0.652$, or 65.2% of students who do NOT like the cafeteria are second-years</li>
 	<li>$\frac{15}{35}= 0.429$, or 42.9% of ALL students are first-years</li>
 	<li>$\frac{20}{35}=0.571$, or 57.1% of ALL students are second-years</li>
</ul>
Table XXC below demonstrates this alternative.

Table XXC <em>Do You Like The Campus Cafeteria? (Row Percentages)</em>
<table class="shaded" style="border-collapse: collapse;width: 0%;height: 128px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"></td>
<td style="width: 29.9292%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 34.0584%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 49.8635%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong> YES</strong></td>
<td style="width: 29.9292%;height: 15px;text-align: center">58.3%</td>
<td style="width: 34.0584%;height: 15px;text-align: center">41.7%</td>
<td style="width: 49.8635%;height: 15px;text-align: center">100%</td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong> NO</strong></td>
<td style="width: 29.9292%;height: 15px;text-align: center">34.8%</td>
<td style="width: 34.0584%;height: 15px;text-align: center">65.2%</td>
<td style="width: 49.8635%;height: 15px;text-align: center">100%</td>
</tr>
<tr style="height: 15px">
<td style="width: 18.4798%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 29.9292%;height: 15px;text-align: center">42.9%</td>
<td style="width: 34.0584%;height: 15px;text-align: center">57.1%</td>
<td style="width: 49.8635%;height: 15px;text-align: center">100%</td>
</tr>
</tbody>
</table>
<div>Table XXB and Table XXC contain two different sets of percentages. The percentages in Table XXB are called <em>column</em> percentages, while the percentages in Table XXC are called <em>row</em> percentages. <strong>Column percentages are calculated "down the columns" (i.e., the proportions are based on the numbers on the horizontal margin/last row, which in turn lists "100%" in each column). Row percentages are calculated "across the rows" (i.e., the proportions are based on the vertical margin/last column, which in turn lists "100%").</strong></div>
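The two normalizations can be contrasted directly in code (a sketch in Python; the 2x2 counts are those from the cafeteria example, with answers in the rows and year groups in the columns):

```python
# 2x2 counts: rows = answer (YES, NO), columns = (First Year, Second Year).
counts = [[7, 5],
          [8, 15]]

col_totals = [sum(counts[r][c] for r in range(2)) for c in range(2)]  # 15, 20
row_totals = [sum(counts[r]) for r in range(2)]                       # 12, 23

# Column percentages (as in Table XXB): each column sums to 100%.
col_pct = [[round(100 * counts[r][c] / col_totals[c], 1) for c in range(2)]
           for r in range(2)]

# Row percentages (as in Table XXC): each row sums to 100%.
row_pct = [[round(100 * counts[r][c] / row_totals[r], 1) for c in range(2)]
           for r in range(2)]
```

Note that the same counts produce two entirely different percentage tables depending only on which margin you divide by.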
<div></div>
<div>Why didn't we use Table XXC in the example above? The answer is in the warning box below.</div>
<div></div>
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><span style="color: #ff0000"><strong>Watch Out!!</strong></span>...<em>for Choosing The Wrong Percentages in Contingency Tables</em></p>

</header>
<div class="textbox__content">

The complication in choosing the "right" percentage arises because what counts as the "right" or the "wrong" percentage depends on what you actually want to know -- that is, on your research question/question of interest. The percentages in Table XXC are "wrong" only because they do not help answer the question of whether there is a difference between the two groups of students we compare, first-years and second-years. Had we been comparing the YES group and the NO group on how many first-year students they each contained, we'd have used Table XXC. However, this doesn't seem like the most relevant question we could ask in <em>this</em> hypothetical study.

Unfortunately, that's not all. If you thought <em>OK, then, I'll always just use column percentages and be done with it</em>, you'd have been too hasty. You see, <strong>the "correctness" of the percentages you need depends on where your compared-groups variable is placed.</strong> In Table XXB I placed the groups-to-be-compared (first-years vs. second-years) in the columns, and therefore I calculated the column percentages. If I had put the groups-to-be-compared in the rows, I would have calculated the row percentages (which would have resulted in a transposed Table XXB)[footnote]

<em>Table XXD Do You Like The Campus Cafeteria? (Transposed -- and still correct)</em>
<table class="shaded" style="border-collapse: collapse;width: 0%;height: 128px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 21.171%;height: 15px;text-align: center"></td>
<td style="width: 23.7356%;height: 15px;text-align: center"><strong>YES</strong></td>
<td style="width: 26.1799%;height: 15px;text-align: center"><b>NO</b></td>
<td style="width: 42.0742%;height: 15px;text-align: center"><strong>Total</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 21.171%;height: 15px;text-align: center"><strong>First Year Students</strong></td>
<td style="width: 23.7356%;height: 15px;text-align: center">46.7%</td>
<td style="width: 26.1799%;height: 15px;text-align: center">53.3%</td>
<td style="width: 42.0742%;height: 15px;text-align: center">100%</td>
</tr>
<tr style="height: 15px">
<td style="width: 21.171%;height: 15px;text-align: center"><strong>Second Year Students</strong></td>
<td style="width: 23.7356%;height: 15px;text-align: center">25%</td>
<td style="width: 26.1799%;height: 15px;text-align: center">75%</td>
<td style="width: 42.0742%;height: 15px;text-align: center">100%</td>
</tr>
<tr style="height: 15px">
<td style="width: 21.171%;height: 15px;text-align: center"><strong>Total</strong></td>
<td style="width: 23.7356%;height: 15px;text-align: center">34.3%</td>
<td style="width: 26.1799%;height: 15px;text-align: center">65.7%</td>
<td style="width: 42.0742%;height: 15px;text-align: center">100%</td>
</tr>
</tbody>
</table>
[/footnote].

Many students faced with contingency tables have trouble deciding whether they need column or row percentages. My advice (which you can take as a rule of thumb) is to be clear about which groups you are comparing, based on your question:<strong> if you compare the groups in the columns, you need column percentages; if you compare the groups in the rows, you need row percentages</strong>. (This is also the reason why I labeled Variable 2's attributes as <em>categories</em> earlier in this section: so as not to confuse them with Variable 1's <em>groups</em>.)

Another rule of thumb you might find useful: try to always put your groups-to-be-compared in the columns (most people find comparing a left column to a right column, horizontally, easier); then you'll always need column percentages. That said, do not assume that everyone follows this advice: sometimes you might find a table where the relevant comparison is top row to bottom row, vertically. To orient yourself in the organization of the table, look for which margin contains the "100%"s: if it's the horizontal margin (bottom row), you're dealing with column percentages; if it's the vertical margin (last column), you're dealing with row percentages.

Finally, <strong>never try to "compare" the percentages that add to 100%</strong> (be they in the rows or in the columns) as this would not constitute a comparison at all -- instead, it would be a breakdown of the groups in terms of composition (that's why they'd add up to 100%, like the 25% of second-years who liked the cafeteria and the 75% who did not in Table XXB above). Again, <strong>what you need to compare is always the fraction of cases from one group falling in a category of interest to the fraction of cases from the other group in the same category of interest.</strong>
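The orientation check above (look for which margin holds the "100%"s) can be sketched as a small Python helper. The function name is hypothetical, invented for this illustration; the two example tables use the body percentages from Tables XXB and XXC:

```python
def percentage_direction(pct, tol=0.5):
    """Guess whether a table of percentages is column- or row-normalized
    by checking which margin sums to (approximately) 100%.
    `pct` is a list of rows; this is an illustrative helper, not a
    standard library function."""
    col_ok = all(abs(sum(row[c] for row in pct) - 100) <= tol
                 for c in range(len(pct[0])))
    row_ok = all(abs(sum(row) - 100) <= tol for row in pct)
    if col_ok and not row_ok:
        return "column"
    if row_ok and not col_ok:
        return "row"
    return "ambiguous"

# Body of Table XXB (column percentages):
print(percentage_direction([[46.7, 25.0], [53.3, 75.0]]))   # "column"
# Body of Table XXC (row percentages):
print(percentage_direction([[58.3, 41.7], [34.8, 65.2]]))   # "row"
```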

</div>
</div>
All of this is arguably complicated at first blush. The light at the end of the tunnel is that the more you work with contingency tables, the easier you will find constructing and/or interpreting them correctly.

To that effect, let's take an example with real existing data.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Gender Differences in Speaking Aboriginal Language Ability among Indigenous Canadians, APS 2012</em></p>

</header>
<div class="textbox__content">

Statistics Canada's <em>Aboriginal Peoples Survey (APS) 2012</em> is a nationally representative survey of First Nations peoples (living off reserve), Métis, and Inuit, 6 years of age and older (Statistics Canada, 2019)[footnote]One could perhaps see the APS 2012 as an effort by Statistics Canada to address some of the voluntary NHS 2011's issues with coverage/non-response for the listed population groups.[/footnote]. Language is a key element in retaining, preserving, and transmitting culture; as such, the ability of Indigenous peoples to speak their ancestral languages is of special interest, given the recommendations of the Truth and Reconciliation Commission's (TRC) final report (2015).

For the purposes of this example, I am interested in whether there are gender differences in the ability to speak an Aboriginal language within the collected sample. Table XX shows the cross-tabulation of the <em>gender</em> (called <em>sex</em> in the APS) and <em>speaking Aboriginal language</em> variables. (Both variables are binary in the survey.)

Table XXA Speaking Aboriginal Language Ability by Gender, APS 2012

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language.jpg" alt="" width="422" height="186" class="alignnone wp-image-986 size-full" />

Source: Statistics Canada (2019)

As you can see, working with real, large-<em>N</em> data makes proportions even more indispensable for making sense of the table. We need to compare the fraction of women who speak an Aboriginal language (or languages) to the fraction of men who are able to do that. To make things easier, I followed the rules of thumb I listed in the Watch Out!! XX Box above: the groups-to-be-compared are in the columns, and we need to compare them horizontally[footnote]A point to be made here is that when working with binary data, it's enough to focus on one of the two categories on which you compare the groups, as the other category is simply the complement of the first (we are working with proportions). That is, here we need to consider only the YES category (the NO category is its exact complement, i.e., "1 - YES"), as we're interested in those who can speak the language.[/footnote]. Therefore, I need column percentages. Table XXB does just that.

Table XXB <em> Speaking Aboriginal Language Ability by Gender, APS 2012 (Column Percentages)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-aboriginal-gender-language-percent.jpg" alt="" width="582" height="260" class="alignnone wp-image-999 size-full" />

Source: Statistics Canada (2019)

SPSS provides both the original cell count (i.e., frequency) and the respective percentage below it[footnote]Yet another useful rule of thumb: make sure that SPSS lists "% within [groups-to-be-compared]" as this indicates that the correct percentages appear in the table. In this case, SPSS tells us that it has listed "% within Sex of respondent", i.e., the ones we need in order to compare the two gender groups.[/footnote].

We can thus easily see that while only 41.4% of men in the sample can speak an Aboriginal language, 45% of women in the sample can do that; i.e., there is a gender difference of 3.6 percentage points in favour of women[footnote]You might think it a small difference, but the magnitude of the difference is not the most important thing when establishing statistical associations. More on the topic in Section XX below.[/footnote].

</div>
</div>
So far, we have discussed only 2x2 contingency tables, i.e., tables of two binary variables. Of course, discrete variables can have more than two categories each. In the case of a 2x3 table (and assuming our groups-to-be-compared are in the columns), we'd simply have three groups/proportions to compare. In the case of a 2x<em>J</em> table, where <em>J</em>&gt;3, we'd have <em>J</em> groups/proportions to compare. The proportions can be compared in two ways: one group against the remaining ones combined (through a single difference of proportions), or each group against each of the others (through several differences of proportions).
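For illustration, the pairwise comparisons for a hypothetical 2x3 table can be sketched in Python (the first two groups reuse the cafeteria counts; the third-year counts are invented for this example):

```python
from itertools import combinations

# Hypothetical counts: three year groups, YES answers and group totals.
yes = {"First": 7, "Second": 5, "Third": 9}
totals = {"First": 15, "Second": 20, "Third": 18}

# Proportion saying YES within each group.
props = {g: yes[g] / totals[g] for g in yes}

# Pairwise differences of proportions, in percentage points.
for a, b in combinations(props, 2):
    diff = round(100 * (props[a] - props[b]), 1)
    print(f"{a} vs {b}: {diff} percentage points")
```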

Matters become more complicated when we let go of binary variables altogether and have a <em>K</em>x<em>J</em> table where both <em>K</em>&gt;2 and <em>J</em>&gt;2. The larger <em>K</em> and <em>J</em> are, the more visually complicated this type of table becomes. However, the comparison can still be done between groups on a category of interest in the manner described above. For a brief illustration, see Table XX below.

Table XX <em>Marital Status Differences in Perceived Health, CCHS 2016</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/crosstab-marital-status-health-cchs.png" alt="" width="761" height="359" class="wp-image-1002 size-full alignleft" />


Source: Statistics Canada (2019)

Table XX is a 5x4 table and it presents data from Statistics Canada's <em>Canadian Community Health Survey (CCHS) 2015-2016,</em> crosstabulating <em>marital status</em> (in 4 groups) and <em>perceived health</em> (in 5 categories). Considering that the latter is an ordinal variable, a way to mentally simplify the presented information is to focus on the extremes -- the proportions of people in the different marital status groups who reported excellent or poor health.

A quick examination of the relevant percentages reveals that fewer widowed/divorced/separated respondents appear to report their health as excellent than any of the other groups (15.9% vs. 22.2%, 23.8%, and 25.1% of married, common-law, and single respondents, respectively) -- a difference of 6.3 percentage points at the minimum in favour of the other groups. Correspondingly, widowed/separated/divorced respondents also report their health as poor more often than the other marital status groups (6.4% vs. 3.3%, 2.3%, and 2.6% for married, common-law, and single individuals, respectively) -- a difference of 3.1 percentage points at the minimum in favour (or rather, <em>dis</em>favour) of the widowed/separated/divorced group.

As such, it appears that while the other groups do not seem to differ much in their self-reported health, the widowed/separated/divorced group stands out by reporting lower levels of health. This observation is consistent through all five health categories -- an indication that the variables <em>marital status</em> and <em>perceived health</em> could be associated.

This concludes my presentation on how to analyze contingency tables data for possible discrete variable associations; the only thing left is to tell you how to produce a table with SPSS.
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title">SPSS Tip XX <em>How to Create Contingency Tables</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu,</em> select <em>Analyze</em>, and then from the pull-down menu, <em>Descriptive Statistics</em> and then <em>Crosstabs</em>;</li>
 	<li>Select your pair of discrete variables of interest from the list on the left-hand side, and, using the appropriate arrows, move each to their respective slot on the right: <em>Row(s)</em> or <em>Column(s)</em>;</li>
 	<li>Click on the <em>Cells</em> button in the top right corner[footnote]This is important as if you fail to click on <em>Cells</em> and just click <em>OK</em> at the bottom, SPSS will produce a table with only the observed count (i.e., number of elements in each cell) which will make comparison between the groups impossible. Clicking <em>Cells</em> allows you to choose which percentages you want calculated and included in the table.[/footnote]; in the resultant window select <em>Observed</em> in <em>Counts</em>, and either <em>Row </em><strong>or</strong><em> Column</em> in <em>Percentages</em>[footnote]Avoid selecting both, and even more so, avoid selecting all three  options (<em>Row</em>, <em>Column</em>, and <em>Total</em>). I guarantee you wouldn't want to interpret the resulting table should you choose more than one set of percentages. Again, be careful to request the percentages for the place, rows or columns, where you put your groups-to-be-compared. If they are in the rows, select <em>Row</em> in <em>Percentages</em>; if they are in the columns, select <em>Column</em> in <em>Percentages</em>.[/footnote], depending on where you put your groups-to-be-compared, and click <em>Continue</em>;</li>
 	<li>Once back at the original window, click <em>OK</em>.</li>
 	<li>The <em>Output</em> window will show the contingency table of the variables you selected.</li>
</ul>
</div>
</div>
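If you work outside SPSS, an analogous table can be produced in Python with the pandas library's <code>crosstab</code> function (a sketch recreating the cafeteria example as individual records; <code>normalize="columns"</code> plays the role of SPSS's "% within [groups-to-be-compared]" column percentages):

```python
import pandas as pd

# Recreate the 35 cafeteria answers as individual records.
year = ["First"] * 7 + ["First"] * 8 + ["Second"] * 5 + ["Second"] * 15
likes = ["YES"] * 7 + ["NO"] * 8 + ["YES"] * 5 + ["NO"] * 15

# Observed counts, answers in the rows, year groups in the columns.
counts = pd.crosstab(pd.Series(likes, name="Likes cafeteria"),
                     pd.Series(year, name="Year"))

# Column percentages, i.e., "% within Year" in SPSS terms.
col_pct = pd.crosstab(pd.Series(likes, name="Likes cafeteria"),
                      pd.Series(year, name="Year"),
                      normalize="columns") * 100
print(counts)
print(col_pct.round(1))
```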
So far, we have seen how to examine potential bivariate associations between a discrete and a continuous variable (previous Section 7.2.1) and between two discrete variables (presently). We now turn to the last bivariate combination: between two continuous variables.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>974</wp:post_id>
		<wp:post_date><![CDATA[2019-03-21 23:19:47]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-22 03:19:47]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-2-2-between-two-discrete-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.2.3. Between Two Continuous Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-2-3-between-two-continuous-variables/</link>
		<pubDate>Fri, 22 Mar 2019 03:20:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=976</guid>
		<description></description>
<content:encoded><![CDATA[The distinctive feature of continuous variables is their large number of values. As discussed previously, we typically treat most interval/ratio variables as continuous. However, sometimes ordinal variables too can have a number of categories large enough to justify their treatment as continuous for the purposes of statistical analysis. (Think back to the previous Section 7.2.2 and imagine crosstabulating a variable with, say, 10+ categories on another; the resulting table will be too unwieldy for meaningful examination.)

As well, continuous variables have values of different magnitudes, which can be ordered from low to high. Thus, what we will be looking for when examining two such variables for a possible association is whether a pattern exists between their values, or, alternatively, whether their values do not exhibit any predictable combination. While many types of patterns can exist, for the purposes of this introductory text we'll focus on the two simplest ones: a <em>positive linear</em> association and a <em>negative linear</em> association. The way we describe and examine such associations is visually, through a graph called a <em>scatterplot</em>, and numerically, through a special indicator called <em>Pearson's correlation coefficient r</em> (or <em>Pearson's r</em>, or just <em>r</em>). I explain both below.

<strong>A positive linear association is a pattern in which low values of one variable go with low values of the other variable alongside with high values of the former going with high values of the latter.</strong> That is, in a positive linear association when the values of Variable 1 increase or decrease, so do the values of Variable 2. As its name suggests, <strong>a negative linear association is the exact opposite: low values of one variable go with high values of the other variable and vice versa.</strong> Then, as the values of Variable 1 <em>increase</em>, the values of Variable 2 will tend to <em>decrease</em>, or vice versa.

Both the positive and the negative version of this pattern are called <em>linear</em> because plotting the values of the two variables on a coordinate system shows the data points "congregating" in an approximately "straight" fashion, as if along an imaginary straight line with an upward (i.e., positive) or downward (i.e., negative) slope[footnote]Associations other than linear also exist, e.g., <em>curvilinear</em> ones (imagine U-shaped or inverted-U-shaped <em>curves</em> in the data, instead of a straight line). Analyzing these is more complicated and beyond the scope of this book. The discussion hereafter will consider only bivariate linear associations, whether or not I mention this explicitly.[/footnote].

Consider the following two figures.

Figure XXA <em>Positive Association: Test Scores by Class Attendance (Simulated Data</em>[footnote]The simulated data used here for illustration purposes only is provided by DataBake (www.databake.io) [see terms of use 3.6, 3.7: "(free) datasets can be copied, modified, stored or otherwise used for your own personal, academic, or internal business purposes"].[/footnote]<em>)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore.png" alt="" width="462" height="370" class="wp-image-1011 size-full aligncenter" />

In the <strong>scatterplot</strong> in Figure XX above, I have plotted data from 35 imaginary students on their class attendance and subsequent final test scores[footnote]The data is called <em>simulated</em> as it's computer-generated for the purposes of the exercise.[/footnote]. Both <em>class attendance</em> and <em>test scores</em> are continuous variables. (Attendance is a ratio variable measuring proportion of the class time attended while test scores is an interval variable measured in percentages.) Each point of the data represents <em>simultaneously</em> a student's attendance (on the horizontal axis) and a student's test score (on the vertical axis); e.g., the lowest/left-most data point stands for a student who attended about 20% of class time and scored less than 20% on the final exam. The data points look "scattered" all over the graph, hence the name <em>scatterplot</em>.

You can easily see the pattern in the data in Figure XX: lower attendance seems to go with lower test scores, and higher attendance with higher scores. The bottom right side (high attendance/low scores) and the top left side (low attendance/high scores) of the graph are empty: there seem to be no students who attended classes a lot but scored low on the test, nor students who didn't attend much but scored high on the test. Had there been no pattern, the data points would have spread all over the graph, identifying no clear "congregation" of values based on their magnitude.

Since class attendance and test scores seem to go <em>concordantly</em> "together" (i.e., low/low and high/high), we have indication of a <em>positive</em> association.
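A scatterplot of this kind can be drawn in Python with the matplotlib library (a sketch using freshly simulated data under the same positive-association assumption, not the actual dataset behind the figures):

```python
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt
import random

random.seed(1)
# Simulate 35 students: attendance (%) drives the test score (%), plus noise.
attendance = [random.uniform(20, 100) for _ in range(35)]
scores = [0.8 * a + random.uniform(-10, 10) for a in attendance]

plt.scatter(attendance, scores)
plt.xlabel("Class attendance (%)")
plt.ylabel("Final test score (%)")
plt.savefig("scatterplot.png")
```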

Figure XXA <em>Negative Association: Test Scores by Time Spent On Social Media (Simulated Data)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media.png" alt="" width="462" height="370" class="wp-image-1012 size-full aligncenter" />

Again, both <em>time on social media</em> and <em>test scores</em> are continuous variables, with time on social media measured in average hours per day.

The pattern in Figure XX is the opposite of the one we had before: a lower number of hours spent on social media seems to go with higher test scores, and higher social media usage with lower scores. This time, the bottom left side (low on social media/low scores) and the top right side (high on social media/high scores) of the graph are empty: there seem to be no students who spent very little time on social media but scored low on the test, nor students who used social media heavily but scored high on the test.

Since social media usage and test scores seem to go <em>discordantly</em> "together" (i.e., low/high and high/low), here we have an indication of a <em>negative</em> association.

Figure XXB and Figure XXB below make the point about linearity clearer by adding something called a <em>line of best fit </em>to the original graphs[footnote]We discuss the line of best fit (aka regression line) in Chapter XX.[/footnote]. <strong>The slope of the line indicates the nature of the supposed association: upward/positive or downward/negative.</strong>

Figure XXB <em>Positive Association: Test Scores by Class Attendance With Line of Best Fit</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-students-attendance-testscore-line.png" alt="" width="462" height="370" class="wp-image-1016 size-full aligncenter" />

Figure XXB <em>Negative Association: Test Scores by Time Spent On Social Media With Line of Best Fit</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterdplotstudents-attendance-social-media-lineA.png" alt="" width="462" height="370" class="wp-image-1019 size-full aligncenter" />

Compare the slopes of the lines in the figures above to the one in Figure XX below.

<em>Figure XX No Association: Test Scores By Student Number in Class (Selected Scores)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-flat-lineA.png" alt="" width="462" height="370" class="wp-image-1022 size-full aligncenter" />

The graph in Figure XX above plots the non-existent association between a student's number in the class and their final test score. Of course, this is a bogus "association" which I'm showing here only as an example of <strong>a <em>flat</em> line of best fit, an indication that the two variables have nothing to do with each other</strong>. The line in Figure XX is not perfectly flat, however, so it helps to have a numerical indication of association in addition to the visual ones the scatterplots give us.
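The slope of a line of best fit comes from the standard least-squares formula, so its sign (positive, negative, or near zero) can be checked numerically. A minimal Python sketch with toy data:

```python
def slope(x, y):
    """Least-squares slope: sum((x - mx)(y - my)) / sum((x - mx)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# Positive association: y rises with x, so the slope is positive.
print(slope([1, 2, 3, 4], [2, 4, 6, 8]))    # 2.0
# No association: y does not vary with x, so the slope is flat.
print(slope([1, 2, 3, 4], [5, 5, 5, 5]))    # 0.0
```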

Before we get to that, a word of warning. The presumption of linearity for this type of analysis is <em>very</em> important, and <strong>you should make sure not to impose linearity where it doesn't exist</strong>. The caveat below explains.
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><span style="color: #ff0000"><strong>Watch Out!!</strong></span>... For Non-Linear Associations</p>

</header>
<div class="textbox__content">

Data points without a pattern produce a flat (i.e., with no linear slope) line of best fit, as shown in Figure XX above. However, <strong>data points in a non-linear pattern will also result in a flat (i.e., with no linear slope) line of best fit</strong> if we insist on seeing the variables as linearly associated. This can lead to dismissing a potential association only because it's non-linear, which would be a mistake. While this textbook doesn't go into non-linear associations, that doesn't mean they do not exist or are not important; on the contrary, they simply require different methods of investigation.

My warning here is simple: <strong>When working with given data, keep an eye on potential non-linearity. Otherwise you may incorrectly assume no association when in fact a non-linear association exists. </strong>Figure XX below illustrates.

Figure XX <em>Curvilinear Association: Test Scores By Student Number in Class (All Scores)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-student-number-test-score-CUBIC-flat-line.png" alt="" width="462" height="370" class="wp-image-1023 size-full aligncenter" />

Surprisingly enough, Figure XX shows that students at the beginning and at the end of the class list scored lower on their final test than their peers for whatever reason, or simply by chance (my bet would be on the latter).

Regardless of the reason or lack thereof, my goal here is to show you that imposing linearity by drawing a <em>linear</em> line of best fit will end up as a flat line, which one may hastily take as an indication of no association (see the straight blue line on the graph). A closer and more careful look, however, reveals the inverted-U shape pattern of the data points in the scatterplot: As the student numbers increase initially, so do the test scores. Then, as the student numbers continue to increase, the test scores start decreasing (see the curved red line following the data points much more closely than the blue flat one). This is clearly a pattern that should not be ignored in any serious, real-life study.
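The flat-line effect described above can be checked numerically. Here is a made-up Python illustration (the numbers are mine, not the textbook's simulated data): a perfectly symmetric inverted-U pattern produces a Pearson's <em>r</em> of essentially zero, even though the two variables are clearly related.

```python
import math

# Hypothetical inverted-U data: scores rise, then fall, symmetrically
x = list(range(1, 12))                  # e.g., student number 1..11
y = [-(xi - 6) ** 2 + 80 for xi in x]   # scores peak in the middle

# Pearson's r from its definition: covariance over product of spreads
n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
r = cov / (math.sqrt(sum((a - mx) ** 2 for a in x)) *
           math.sqrt(sum((b - my) ** 2 for b in y)))
print(round(r, 6))  # essentially 0: the linear line of best fit is flat
```

A correlation of zero here would be exactly the mistaken "no association" conclusion the warning above cautions against.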

</div>
</div>
A visual summary of the data and any potential bivariate associations, like the scatterplot, is thus very useful. Scatterplots are in fact rather indispensable if one is to base their analysis on the assumption of a linear association between two continuous variables. Still, like in the previous two cases (of two discrete variables, and of a discrete and a continuous variable), a numerical summary of the potential association can be of great help.

For discrete variables we could examine and report differences of proportions, while for a discrete and a continuous variable we use differences of means (or medians). In both cases we could compare groups (on proportions, or means). In the case of two continuous variables, we have neither groups nor a convenient number on which to compare them. Instead, here we have a correlation coefficient, <em>Pearson's r</em>. The correlation coefficient takes all data points simultaneously and summarizes to what extent certain values of one of the variables go with certain values of the other variable, i.e., whether they form a pattern or vary independently of each other.

While we'll examine the exact definition and calculation of the Pearson's <em>r</em> in Chapter X, for now we'll focus on its interpretation.

<strong>The correlation coefficient <em>r</em> is a number between -1 and +1, indicating the strength of any possible (linear) association between two continuous variables.</strong> However, there is a catch: <strong>the strength of the association is calculated in absolute terms while the ± sign is there to indicate whether the association is positive or negative</strong>. Thus, both <em>r</em>=-1 and<em> r</em>=1 stand for the strongest possible (i.e., perfect) correlation, the former a perfect <em>negative association</em>, the latter a perfect <em>positive association.</em> Between them is <em>r</em>=0, or no association.

While perfect correlations (<em>r</em>=±1) are very rare (if not non-existent)[footnote]The obvious exception here is the correlation of a variable on itself, which will produce <em>r</em>=1.[/footnote], most associations between variables fall somewhere between 0 and ±1. <strong>The closer a correlation is to 0, the weaker it is; the closer the correlation is to -1 or +1, the stronger it is.</strong> Typically, in the social sciences a correlation of about <em>r</em>=±0.7 would be considered strong, a correlation of about <em>r</em>=±0.5 would be considered moderate, and a correlation of about <em>r</em>=±0.3 would be considered weak. Correlations around ±0.8 or ±0.9 would therefore be very strong, while associations around ±0.2 and ±0.1 would be quite weak.
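For the curious, Pearson's <em>r</em> can be computed directly from its definition (the full derivation waits until Chapter X). A minimal Python sketch with made-up attendance/score numbers (the data and function name are mine, purely for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient: the covariance of x and y
    divided by the product of their spreads (sums of squared deviations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Hypothetical data: classes attended and final test scores
attendance = [2, 5, 8, 10, 12]
scores = [51, 60, 72, 80, 88]
print(round(pearson_r(attendance, scores), 3))  # close to +1: a strong positive correlation
```

Note that correlating a variable with itself, e.g. `pearson_r(scores, scores)`, returns 1 -- the perfect correlation mentioned in the footnote above.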

Now that you are well-equipped with knowledge about interpreting correlations, let's see what the correlations of the associations discussed above were.

First we looked at class attendance and test scores (Figures XXA and XXB); the correlation between the two variables was a very strong <em>r</em>=0.881. Then, we looked at social media usage and test scores (Figures XXA and XXB), where the correlation was an equally strong <em>r</em>=-0.882[footnote]If you're wondering why the correlations appear to be of the same strength, the reason lies in the way I created the synthetic variable <em>social media usage</em> -- as an inversion of the simulated variable <em>class attendance</em>. I did warn you the data is made up as a heuristic. (Do not take this to mean that such associations -- between attendance and class performance and social media usage and test scores -- do not exist in real life.)[/footnote]. Finally, we discussed the practically non-existent association between student number and test scores (of selected students, Figure XX) whose <em>r</em>=0.049, while the improperly imposed linearity in Figure XX from the caveat had a similar so-weak-almost-zero correlation of <em>r</em>=-0.051.

Tired of fake data? Ready to return to the real world of sociological research? Then let's take a real example with existing data and see how it all works out.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title">Example XX <em>Intergenerational Reproduction of Privilege in Education in the USA (GSS 2018)</em></p>

</header>
<div class="textbox__content">

For this example I'm using data from the <em>General Social Survey (GSS) 2018</em> of the National Opinion Research Center (NORC) at the University of Chicago. I'm interested in exploring whether <em>father's education</em> and the respondent's own <em>education</em> are potentially correlated (as demonstrated in numerous studies over the years [citations]). Both father's education and the respondent's education are measured in years of schooling, ranging from 0 (no education) to 20 years. As such they are discrete ratio variables which we can treat as continuous due to their number of values being quite large (twenty-one, to be precise). Figure XX shows the relevant scatterplot.

Figure XX <em>Respondent's Years of Schooling by Father's Years of Schooling (GSS, 2018)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/scatterplot-paeduc-educ-gss-line.png" alt="" width="462" height="370" class="wp-image-1034 size-full aligncenter" />

Source: NORC/GSS (2018).

There are several things to note in the graph above. One is that the data points look less "scattered" and more orderly, arranged in neat rows and columns, than would be the case had we used variables with a much larger number of values. Furthermore, while <em>N</em>=1,687, there are far fewer data points on the scatterplot: the reason, of course, is that many observations sit "on top" of each other, i.e., most data points represent more than one person's combination of their own years of education and their father's years of education. (After all, most such combinations are unlikely to be unique; we can arguably expect more than one respondent and their father to both have, say, 12 years of education in the dataset.)
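This "stacking" of observations is easy to demonstrate with a quick sketch (the simulated pairs below are mine, not the actual GSS records): counting how many cases share each (father's education, own education) combination shows far fewer distinct plotted points than cases.

```python
from collections import Counter
import random

random.seed(1)
# Hypothetical data: 1,687 father/respondent pairs of years of schooling
# (ranges chosen arbitrarily for illustration)
pairs = [(random.randint(8, 16), random.randint(10, 18)) for _ in range(1687)]

stacked = Counter(pairs)
print(len(pairs))    # N = 1687 observations
print(len(stacked))  # far fewer distinct points appear on a scatterplot
print(stacked.most_common(1))  # the most heavily "stacked" combination
```

With only 21 possible values per variable, the number of distinct points is capped well below <em>N</em>, which is exactly why the scatterplot looks like a grid.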

Substantively, however, what do we see in the scatterplot above? To the extent that there are respondents with low levels of education, they seem to have fathers with low levels of education too. As well, while respondents with higher levels of education seem to have fathers with all levels of education, those with higher parental education appear to outnumber those with lower parental education. (That is, both the left and the right side of the upper half of the scatterplot have many observations, but the top right area does seem to contain more observations than the top left area.) Finally, and most importantly, there seem to be almost no respondents with low levels of education whose fathers had high levels of education (note the empty bottom right area of the graph).

All in all, it seems like more years of father's education "go" with more years of respondent's education, and fewer years of father's education "go" with fewer years of respondent's education -- though not completely so, or the top left area of the graph (the less educated fathers with more educated offspring) would be empty too. This is reflected in the line of best fit whose slope, while positive, is not very steep.

<strong>Ultimately, the scatterplot indicates that <em>father's education</em> and<em> respondent's education</em> seem positively associated in the dataset but also that this association is not very strong.</strong> That is, there appears to be intergenerational reproduction of privilege in education, however, fortunately, one's father's lower levels of education don't seem to completely preclude one's own educational attainment.

The correlation coefficient provides a numerical summary of the potential association described above.

Table XX <em>Correlation between Father's Years of Schooling and Respondent's Years of Schooling (GSS 2018)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/03/correlation-paeduc-educ-gss.png" alt="" width="492" height="262" class="wp-image-1035 size-full aligncenter" />

<strong>SPSS's output provides <em>r</em> as "Pearson Correlation", and here <em>r</em>=0.413. As suspected, this reflects a positive, moderate/moderately-weak association.</strong>[footnote]Note that SPSS's bivariate correlation tables are 2x2 tables, with the information repeated twice. Thus, while four coefficients are provided in the central cells of the table, they are actually two pairs of the same two correlations. (That is, correlations are symmetric: correlating <em>Variable 1</em> with <em>Variable 2</em> is the same as correlating <em>Variable 2</em> with <em>Variable 1</em>.) As well, one of these two pairs is always equal to 1, as a variable correlated with itself is a perfect correlation. This is shown in the table as <em>corr</em>(<em>Highest year school completed, Highest year school completed, father</em>)=0.413=<em>corr</em>(<em>Highest year school completed, father, Highest year school completed</em>) and <em>corr</em>(<em>Highest year school completed, Highest year school completed</em>)=1=<em>corr</em>(<em>Highest year school completed, father, Highest year school completed, father</em>).[/footnote]

</div>
</div>
<header class="textbox__header">To summarize, you can describe and examine potential associations between continuous variables through scatterplots with lines of best fit (looking for a concordant or discordant pattern in the data points) and the coefficient of correlation <em>r</em> (ranging from 0 to ±1 in strength, with 0 standing for no correlation and ±1 constituting a perfect negative or a perfect positive correlation).</header><header>Before we move on, the tip below shows how to get the visual and the numerical summary of continuous bivariate associations in SPSS.</header><header>
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title">SPSS Tip XX <em>Scatterplot and Correlation Coefficient</em></p>

</header>&nbsp;

<strong>For Scatterplots:</strong>
<ul>
 	<li>From the <em>Main Menu</em> select <em>Graphs</em> and, from the pull-down menu, <em>Legacy Dialogs</em>; click on <em>Scatter/Dot</em>;</li>
 	<li>Keep the pre-selected <em>Simple Scatter</em> option and click <em>Define</em>;</li>
 	<li>In the new window, select your two variables of interest one at a time from the list on the left and, using the arrow buttons, move them to the <em>X Axis</em> and <em>Y Axis</em>[footnote]At this point, it doesn't really matter which one you put in the X or Y Axis though I would suggest placing the variable that precedes the other in time (like father's education generally precedes offspring's education) in the X Axis. The reasons for this will be explained in Section XX, Chapter X.[/footnote] empty spaces on the right; click <em>OK</em>.</li>
 	<li>The <em>Output</em> window will show the resulting scatterplot; double-clicking on it will open a <em>Chart Editor</em> window from where you can change the text, colours, size, etc. of the graph to suit your needs.</li>
</ul>
<strong>For the correlation coefficient (Pearson's<em> r</em>):</strong>
<ul>
 	<li> From the <em>Main Menu</em>, select <em>Analyze</em>;</li>
 	<li>From the pull-down menu, select <em>Correlate</em> and then <em>Bivariate</em>;</li>
 	<li>In the resulting window, select one at a time your two variables of interest from the list on the left and, using the arrow button, move them to the <em>Variables</em> space on the right (the order is not important); click <em>OK</em>.</li>
 	<li>The <em>Output</em> window will display a symmetric 2x2 table with your requested correlation coefficient.</li>
</ul>
</div>
[SUMMARY ALL BIVARIATE DESCRIPTIONS; A DO-IT HERE AND the OTHER TWO CASES?]

</header>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>976</wp:post_id>
		<wp:post_date><![CDATA[2019-03-21 23:20:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-22 03:20:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-2-3-between-two-continuous-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>7.3. Summary [EMPTY]</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/7-3-summary-empty/</link>
		<pubDate>Fri, 29 Mar 2019 22:29:26 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1054</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1054</wp:post_id>
		<wp:post_date><![CDATA[2019-03-29 18:29:26]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-29 22:29:26]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[7-3-summary-empty]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>34</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>8.3. Hypothesis Testing</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/8-3-hypothesis-testing/</link>
		<pubDate>Thu, 04 Apr 2019 22:47:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1093</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The first thing you should know about testing hypotheses is their relationship to statistical inference: <strong>We formulate hypotheses about the population of interest, and <em>only</em> about the population of interest</strong>. <strong>We test them through sample data.</strong>

Like so: say I have heard enough on the topic of the gender gap that I hypothesize that women and men receive different income on average. I explore my sample data and I do find that in <em>the sample with which I'm working</em> men have a higher average income than women. <em>It seems</em> like there is an association between gender and income; however, <em>I do not know </em>if there is an association between gender and income <em>in the population </em>in general. To that effect, I want to estimate (with a given level of certainty) whether such an association exists <em>in the population</em>. My hypothesis is about <em>the gender/income association in the population</em>. (After all, I can <em>see</em> the different average income levels in the sample; there is no need to <em>hypothesize</em> about the sample.)

You may be getting tired of my italicizing "the population" but it really <em>is</em> that important: hypotheses are stated about <em>the population</em>. This is key for testing, so keep it in mind.

If the test provides us with evidence in support of our alternative hypothesis, we call the association being tested <em>statistically significant</em>[footnote]Statistical significance has a very narrow, very specific meaning as you will learn further in this section. On the difference between statistical significance and significance in general, see warning Box XX.[/footnote].

Before we get to the nitty-gritty of hypothesis testing, here's <strong>an overview to show you the underlying logic of how it all works</strong>:
<ol>
 	<li>State null and alternative hypotheses;</li>
 	<li>Assuming the null hypothesis as "true", calculate the related score (e.g., <em>z</em>-value, <em>t</em>-value, etc.);</li>
 	<li>Find the probability associated with that score, i.e., the probability of obtaining such a result if the null hypothesis were indeed "true" (called the <em>p-value</em>);</li>
 	<li>If that probability is <em>low enough</em> (i.e., below the <em>level of significance</em>, explained below), reject the null hypothesis; if the probability is <em>too high</em> (above the level of significance), fail to reject the null hypothesis;</li>
 	<li><strong>If the null hypothesis has been rejected, you have found support for the alternative hypothesis: your bivariate association is statistically significant, and therefore generalizable to the population.</strong></li>
 	<li><strong>If the null hypothesis has not been rejected, you have found no support for the alternative hypothesis: your bivariate association is not statistically significant, and is <i>perhaps </i>due to expected sampling variability (i.e., to random error) appearing in this one particular sample.</strong></li>
</ol>
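The steps above can be sketched in code. Here is a minimal Python illustration (assuming, for simplicity, a known population standard deviation so a z-test applies; the function names are my own, not a standard library's):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test(xbar, mu, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: the sample mean xbar comes from a
    population with mean mu and standard deviation sigma."""
    se = sigma / math.sqrt(n)      # standard error of the mean
    z = (xbar - mu) / se           # step 2: the score under H0
    p = 2 * (1 - phi(abs(z)))      # step 3: the two-tailed p-value
    reject = p <= alpha            # step 4: compare to the level of significance
    return z, p, reject

# A sample mean far from mu (in standard errors) yields a tiny p-value: reject H0
print(z_test(650, 600, 100, 100))
# A sample mean close to mu yields a large p-value: fail to reject H0
print(z_test(620, 600, 100, 25))
```

The two calls preview the two outcomes walked through in the examples that follow.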
Example XX below illustrates the whole process in detail. As with applying the CLT to confidence intervals, it's easier to start with an example where we assume we have the population parameters. Once you grasp the underlying logic, we can move on to testing bivariate associations.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XXA Employee Productivity (Finding Statistically Significant Results, N=100)</em></p>

</header>
<div class="textbox__content">

Imagine a large company has created a productivity index to measure its employees' productivity. The (interval scale) index is constructed to be normally distributed, with a mean of 600 points and a standard deviation of 100 points.

Imagine further that a hundred of the company's employees were randomly selected to attend a new specialized training course, after which their average productivity score was measured at 650 points. Can we conclude that the training course had indeed increased productivity? Or is the gain of 50 points due to regular sampling variability? That is -- is this 50-point gain statistically significant?

Here's what we have, formally stated:

$\mu=600$

$\sigma=100$

$\overline{x}=650$

$N=100$

What we want to know is the probability of a score of 650 if the training course didn't contribute to the gain, i.e., <strong>the probability of a score of 650 <em>under the condition of the null hypothesis</em>.</strong>
<ul>
 	<li>H<sub>0</sub>: The training course did not affect productivity (the 650 score was due to random chance); $\mu_\overline{x}$ $=\mu$.</li>
 	<li>H<sub>a</sub>: The training course affected productivity (the 650 score was a true gain);           $\mu_\overline{x}$ $\neq\mu$.</li>
</ul>
Recall from Chapter 5 and Chapter 6 that to obtain the probability of a score we need to express it in terms of standard deviations (i.e., in standard errors, as we are working with a sampling distribution); that is, we need its z-value.

The standard error is:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}=\frac{100}{\sqrt{100}}=\frac{100}{10}=10$

Then the z-value of 650 is:

$z=\frac{\overline{x}-\mu}{\sigma_\overline{x}}$ $=\frac{650-600}{10}=\frac{50}{10}=5$

What is the probability under the normal curve of a $z=5$? Figure XX shows the sampling distribution under the condition of the null hypothesis ($\mu_\overline{x}$ $=\mu$, i.e., centered on 600) which demonstrates where a $z=5$ falls.

[PLACEHOLDER FOR FIGURE]
<p style="padding-left: 30px">Given the properties of the normal curve, we know that 68% of all means in infinite sampling will fall between ±1 standard error (i.e., between 590 and 610), 95% will fall between ±1.96 standard errors (i.e., approximately between 580 and 620), and 99% will fall between ±2.58 standard errors (i.e., approximately between 570 and 630). The score of 650 has $z=5$ -- it falls very, <em>very</em> far in the right tail.</p>
In terms of probabilities, consider the following: if a sample mean has a 99% probability of being approximately between 570 and 630, and the remaining 1% is distributed equally in the two tails, the probability beyond 630 is 0.5%. Assuming the null hypothesis were true, our calculations show that the 650 score then appears with a probability of <em>p</em>&lt;0.005[footnote]The <em>p</em> here stands for "probability".[/footnote] -- a very small probability, so small that a score of 650 seems highly unusual[footnote]Generally, you don't need to draw the sampling distribution to obtain this probability. While SPSS provides it as a default, you can also check the probability of any z-value <a href="https://www.socscistatistics.com/pvalues/normaldistribution.aspx">here</a>.[/footnote].
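The exact tail probability of a $z=5$ can be checked in a few lines of Python (a sketch using the standard normal CDF, expressed through the error function):

```python
import math

z = 5
# P(Z > 5) under the standard normal curve: one minus the CDF at z
p_one_tail = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(p_one_tail)  # roughly 3e-7, far below the 0.005 bound reasoned out above
```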

<strong>And this is where the crux of the logic of hypothesis testing lies: the chance of the 100 employees getting an average productivity score of 650 after a training course <em>if the course had no effect</em> is so small, that it is <em>highly</em> unlikely to be the case. It is much likelier that the course had an effect</strong>, so that $\mu_\overline{x}$ $\neq\mu$ (and in fact $\mu_\overline{x}$ $&gt;\mu$).

<strong>We therefore reject the null hypothesis and conclude that the score of 650</strong> doesn't appear to be just due to random variability (otherwise it would be within 3 standard errors away from the sampling mean while it stands at 5, under the null hypothesis). Rather, it <strong>is statistically significantly different from 600</strong>. In other words, our evidence suggests that the training course may have affected the productivity score of employees who took it. (Again, causality aside, note that we haven't proven beyond a shadow of a doubt that it did; rather, <em>given our evidence at this point in time, we have a reason to believe it did</em>.)

</div>
</div>
In the example above we ended up rejecting the null hypothesis. I will also show how it can turn out that we cannot reject the null hypothesis, but first I'll use the opportunity to 1) make a connection to a concept with which you are already familiar -- confidence intervals; and 2) introduce two interrelated and important theoretical concepts, the level of significance and the p-value.

<strong>Hypothesis testing and confidence intervals. </strong>Believe it or not, these two are complementary, as both testing a hypothesis and constructing a confidence interval allow us to arrive at the same conclusion. To see this, we just need to construct a, say, 95% confidence interval for $\mu_\overline{x}$ for Example XX:
<ul>
 	<li>95% CI: $\overline{x}\pm1.96\times\sigma_\overline{x}$ $= 650\pm1.96\times10=650\pm19.6=(630.4; 669.6)$</li>
</ul>
That is, as per Section XX, we can be 95% certain that the average score for a larger population of employees who take the training course would be between approximately 630 points and 670 points. The average general score of 600 points is not part of the plausible values for $\mu_\overline{x}$, which is consistent with our decision to reject the null hypothesis.
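The interval arithmetic is quick to verify (a minimal sketch of the CI computation above, using the same numbers):

```python
import math

xbar, sigma, n = 650, 100, 100
se = sigma / math.sqrt(n)                       # standard error = 100/10 = 10
low, high = xbar - 1.96 * se, xbar + 1.96 * se  # 95% CI bounds
print(round(low, 1), round(high, 1))  # 630.4 669.6: 600 falls outside the interval
```

Since the general mean of 600 lies outside (630.4, 669.6), the interval agrees with the decision to reject the null hypothesis.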

<strong>The level of significance and the p-value.</strong> The concept of <em>level of significance</em> is used to adjudicate whether the probability (of our results if the null hypothesis is true) is too high to dismiss the null hypothesis or low enough to allow us to reject the null hypothesis. In other words, the level of significance is what we use to proclaim results as statistically significant (when we reject the null hypothesis) or not statistically significant (when we fail to reject the null hypothesis).

Think about it this way: recall that with confidence intervals we had selected 95% certainty and 99% certainty as meaningful levels of confidence. What is left is 5% and 1% "uncertainty", as it were, which we agree to tolerate. These 5% or 1% are distributed equally between the two tails of the normal distribution (2.5% on each side or 0.5% on each side, respectively). They also correspond to <em>z</em>=1.96 and <em>z</em>=2.58. Following the logic of Example XX above, in order to reject a null hypothesis, we want the probability to be lower than these 5% or 1% (so that we can "feel confident enough").

And this is exactly it: When we put it that way, saying that we want the probability (of our result if the null hypothesis is true) -- called a <em>p-value</em> -- to be less than 5%, we have essentially set the level of significance at 0.05. If we want the probability to be less than 1%, we have set the level of significance at 0.01. We can go even further: we might want to be extra cautious and want a "confidence" of 99.9%, so that we want the probability to be less than 0.1% -- then we have set the level of significance at 0.001.

These three numbers -- 0.05, 0.01, and 0.001 -- are the most commonly used levels of significance. The level of significance is denoted by the lower-case Greek letter alpha, i.e., <em>α</em>[footnote]The lower-case Greek letter <em>α</em> is pronounced AL-pha, as most of you surely know.[/footnote], thus we usually choose one of the following:

$\alpha=0.05$

$\alpha=0.01$

$\alpha=0.001$

<strong>You can think of the significance level as the acceptable probability of being wrong</strong> -- and what is acceptable is left to the discretion of the researcher, subject to the purposes of the particular study.

Following the logic presented in Example XX then, <strong>if the probability of the result under the null hypothesis -- the p-value -- is smaller than a pre-selected significance level <em>α</em>, the null hypothesis is rejected and the result is considered statistically significant</strong>[footnote]Note the difference between <em>α</em> and the p-value. While α indicates what probability of being wrong we are willing to tolerate, the actual p-value we obtain is <em>not</em> the probability of being wrong. The p-value, again, is the probability of our result if the null hypothesis is true; in other words, if the null hypothesis is in fact true, and our p-value is, say, 0.03, we'd obtain our results 3% of the time simply due to random sampling error. [/footnote]. This is denoted in one of the following ways:

p ≤ 0.05

p ≤ 0.01

p ≤ 0.001[footnote]In published research you will find results marked by one asterisk, two asterisks, and three asterisks. These correspond to their significance based on the level used: <em>α</em>=0.05, <em>α</em>=0.01, and <em>α</em>=0.001, respectively. The smaller the level of significance, the more strongly statistically significant the result is (i.e., most consider <em>α</em>=0.001 to indicate "highly statistically significant" results). (If you happen upon a dagger (†), it indicates significance at the <em>α</em>=0.1 level, or 10% probability of being wrong, which most researchers consider too high, but some still use.)[/footnote]

<strong>To summarize, when a hypothesis is tested, we end up with an associated <em>p</em>-value (again, the probability of the observed sample statistics if the null hypothesis is true). We compare the p-value to the pre-selected significance level <em>α</em>: if <em>p</em> ≤ <em>α</em>, the results are statistically significant and therefore generalizable to the population.</strong>

So far so good? Good. Unfortunately, however, this isn't all (sorry!). What I have presented above is the most conventional treatment of how to use and interpret <em>p</em>-values. It is attractively straightforward -- but it's also arbitrary, and its <em>true</em> interpretation is the subject of an ongoing debate. As an introduction to the topic, I'll leave it at that, but you should be aware that there's more to the <em>p</em>-value, and that its usage has been (rightfully) questioned and challenged in recent years.[footnote]You can find plenty of information on the topic online; from journals banning the use of p-values and hypothesis testing in favour of effect size (the <em>Journal of Applied and Social Psychology</em>, see Trafimow &amp; Marks, 2015 https://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991), to calls to abandon statistical significance (e.g., McShane, Gal, Gelman, Robert &amp; Tackett, 2019 https://www.tandfonline.com/doi/abs/10.1080/00031305.2018.1527253), to others coming to the defense of p-values and significance testing (e.g., Kuffner &amp; Walker, 2016 https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1277161?src=recsys; Greenland, 2019 https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529625?src=recsys). One thing is clear: p-values and levels of significance have become increasingly controversial. Still, the American Statistical Association's position is that although caution against over-reliance on a single indicator is necessary, p-values can still be used, <em>alongside other appropriate methods</em>: "Researchers should recognize that a <i>p</i>-value without context or other evidence provides limited information. For example, a <i>p</i>-value near 0.05 taken by itself offers only weak evidence against the null hypothesis. Likewise, a relatively large <i>p</i>-value does not imply evidence in favor of the null hypothesis; many other hypotheses may be equally or more consistent with the observed data. For these reasons, data analysis should not end with the calculation of a <i>p</i>-value when other approaches are appropriate and feasible" (Wasserstein &amp; Lazar, 2016 https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108?src=recsys). Finally, if you really want to avoid overstating what the p-value actually shows, see Greenland et al. (2016) for a <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/">list</a> of common misinterpretations and over-interpretations of the p-value, of confidence intervals, and of tests of significance. Because of the topic's enormity (it goes well beyond the scope of this book), it is still conventionally taught as I presented it above, at least at the introductory level.[/footnote]

Going back to our example from above, let's see how the p-value can change due to particular features of the study, such as the sample size. Example XXB illustrates.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XXAB Employee Productivity (Finding Statistically Non-significant Results, N=25)</em></p>

</header>
<div class="textbox__content">

Imagine that we had the same information as in Example XXA, except that 25 employees took the training course instead of 100, and their average score was 620. Then we have:

$\mu=600$

$\sigma=100$

$\overline{x}=620$

$N=25$

We still want to know the probability of a score of 620 if the training course didn't contribute to the gain, i.e., <strong>the probability of a score of 620 <em>under the condition of the null hypothesis</em>.</strong>
<ul>
 	<li>H<sub>0</sub>: The training course did not affect productivity (the 620 score was due to random chance); $\mu_\overline{x}$ $=\mu$.</li>
 	<li>H<sub>a</sub>: The training course affected productivity (the 620 score was a true gain); $\mu_\overline{x}$ $\neq\mu$.</li>
</ul>
The new standard error is:

$\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}=\frac{100}{\sqrt{25}}=\frac{100}{5}=20$

Then the z-value of 620 is:

$z=\frac{\overline{x}-\mu}{\sigma_\overline{x}}$ $=\frac{620-600}{20}=\frac{20}{20}=1$

What is the probability under the normal curve of $z=1$? Figure XX shows the sampling distribution under the condition of the null hypothesis ($\mu_\overline{x}$ $=\mu$, i.e., centered on 600), which demonstrates where $z=1$ falls.

[PLACEHOLDER FOR FIGURE]
<p style="padding-left: 30px">Given the properties of the normal curve, we know that 68% of all means in infinite sampling will fall between ±1 standard error (i.e., between 580 and 620), 95% will fall between ±1.96 standard errors (i.e., approximately between 560 and 640), and 99% will fall between ±2.58 standard errors (i.e., approximately between 540 and 660). The score of 620 has $z=1$ -- it falls quite close to the mean of 600.</p>
In terms of probabilities, consider the following: z=1 has a <em>p&gt;</em>0.30[footnote]Again, you can check the exact p-value <a href="https://www.socscistatistics.com/pvalues/normaldistribution.aspx">here</a> (in a two-tailed test).[/footnote]. <strong>Assuming the null hypothesis is true, our calculations show that the 620 score will appear more than 30% of the time due to random chance, which is a lot more than the 5% (at α=0.05) that we are willing to tolerate. As such, we cannot reject the null hypothesis: we do not have enough evidence to conclude that the gain in productivity of 20 points which the 25 employees demonstrated is statistically significant. In other words, we don't have enough evidence that the training course was effective.</strong> (This doesn't mean that the course didn't work beyond a shadow of a doubt, just that <em>at this point, in this particular study, we don't have enough evidence to say it did</em>.)

We can also see the correspondence with confidence intervals:
<ul>
 	<li>95% CI: $\overline{x}\pm1.96\times\sigma_\overline{x}$ $= 620\pm1.96\times20=620\pm39.2=(580.8; 659.2)$</li>
</ul>
That is, we can be 95% certain that the average score for a larger population of employees who take the training course would be between roughly 581 points and 659 points. <strong>The average general score of 600 points is a plausible value for $\mu_\overline{x}$, which is consistent with our decision to not reject the null hypothesis.</strong>
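As an aside from the SPSS-based workflow of this book, you can verify the numbers in this example with a few lines of code. The sketch below uses only Python's standard library; the variable names are mine, purely for illustration:

```python
# Verifying Example XXB: standard error, z-value, two-tailed p-value, and 95% CI.
from math import sqrt
from statistics import NormalDist

mu, sigma, xbar, n = 600, 100, 620, 25      # population mean/sd, sample mean, sample size

se = sigma / sqrt(n)                        # standard error: 100/5 = 20
z = (xbar - mu) / se                        # z-value: 20/20 = 1
p = 2 * (1 - NormalDist().cdf(z))           # two-tailed p-value: ~0.317, i.e., p > 0.30

ci_low = xbar - 1.96 * se                   # 95% CI lower bound: 580.8
ci_high = xbar + 1.96 * se                  # 95% CI upper bound: 659.2
print(se, z, round(p, 3), (ci_low, ci_high))
```

Since p ≈ 0.317 is far above α=0.05, the code agrees with the conclusion above: we cannot reject the null hypothesis.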

</div>
</div>
Example XX is a heuristic device, used only to explain the logic of hypothesis testing. Of course, normally we wouldn't have information about population parameters and would be using sample statistics (i.e., we would use not only the sample mean $\overline{x}$ but also the sample standard deviation <em>s</em>, to calculate the estimated standard error $s_\overline{x}$). As you learned in Section XX, this moves us from using the z-distribution to the t-distribution with given degrees of freedom. Recall that with a sample size of about 100 -- i.e., with <em>df</em>=100 -- the two distributions converge.

Here, then, is a quick-and-dirty method you can use as a preliminary indication of whether something will be statistically significant. Since z=1.96 corresponds to 5% probability (2.5% in each tail), and z=2.58 corresponds to 1% probability (0.5% in each tail), even without knowing the exact <em>p</em>-value associated with a given z-value, you can guess that a z&lt;1.96 will be non-significant while a z&gt;1.96 will be significant at <em>α</em>=0.05; similarly, a z&gt;2.58 will be statistically significant at α=0.01[footnote]Obviously, for negative z-values we'll have all these in reverse: a z&gt;-1.96 will be non-significant and a z&lt;-1.96 will be significant, etc.[/footnote]. As samples used in sociological research are commonly of N&gt;100, the same insight applies to the corresponding t-values with df≥100. Understand, however, that this is not an official way to test hypotheses or report findings: to do that, <strong>you always need to report the <em>exact p</em>-value associated with a z-value or a t-value with given df</strong>[footnote]You can find a handy online p-value calculator of t-values <a href="https://goodcalculators.com/student-t-value-calculator/">here</a>.[/footnote].
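The quick-and-dirty rule can be captured in a tiny helper function. The function below is hypothetical and for intuition only -- as just noted, it is not a substitute for reporting the exact <em>p</em>-value:

```python
# A tiny helper capturing the quick-and-dirty rule above (two-tailed test).
# Hypothetical, for intuition only: always report the exact p-value.
def rough_significance(z):
    z = abs(z)  # two-tailed: only the magnitude of z matters
    if z >= 2.58:
        return "significant at alpha = 0.01"
    if z >= 1.96:
        return "significant at alpha = 0.05"
    return "non-significant at alpha = 0.05"

print(rough_significance(1.0))   # the N=25 example above: non-significant
print(rough_significance(2.2))   # between the two cut-offs
```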

<strong>One-tailed tests.</strong> Finally, a note on <em>one-tailed tests</em>. While I'd advise you against using them yourself at the beginner researcher level, it's not a bad idea to know that they exist and what they are. Briefly, the idea is that if we have a good reason to suspect not only a difference/effect but a difference/effect in a specific direction (i.e., positive or negative), we can specify the hypotheses accordingly. To use Example XXA again, say we think there is no possibility that the training course <em>decreased</em> productivity scores. Then we can state the hypotheses as:
<ul>
 	<li>H<sub>0</sub>: The training course either did not affect productivity or <em>decreased</em> it; $\mu_\overline{x}$ ≤$\mu$.</li>
 	<li>H<sub>a</sub>: The training course <em>increased</em> productivity;  $\mu_\overline{x}$ &gt;$\mu$.</li>
</ul>
This is a stronger claim (that's why it needs to be well-justified) -- we test not a difference (that can be either positive or negative) but an <em>increase</em>. Thus, we move the significance level to only <em>one</em> of the tails, as it were, the positive (right) tail. In Figure XX you can see that instead of having 2.5% in each tail, we then have 5% in the right tail.

[FIGURE XX PLACEHOLDER]

This change in probability essentially "moves" the z-value corresponding to significance closer to the mean; now a smaller z-value will have the p-value necessary to achieve statistical significance. To be precise, 5% (2.5% in each tail) corresponded to z=1.96; all 5% in the <em>right</em> tail corresponds to z=1.65. This obviously "lowers the bar" of achieving statistical significance <em>without changing the level of significance α itself</em>, and makes rejecting the null hypothesis easier, hence my description of the two-tailed test as more conservative (and my insistence on using it instead of a one-tailed test).
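If you'd like to see where these cut-offs come from, Python's standard library can invert the normal CDF. (Note that it returns 1.645 for the one-tailed case, which the text rounds to 1.65; the code below is my illustration, not part of the book's procedure.)

```python
# Critical z-values for a two-tailed vs. a one-tailed test at alpha = 0.05.
from statistics import NormalDist

nd = NormalDist()                      # standard normal distribution
two_tailed = nd.inv_cdf(1 - 0.05 / 2)  # 2.5% in each tail -> ~1.96
one_tailed = nd.inv_cdf(1 - 0.05)      # all 5% in the right tail -> ~1.645
print(round(two_tailed, 3), round(one_tailed, 3))
```

The smaller one-tailed cut-off is exactly the "lowered bar" described above.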

Before we move on to the last section of this theoretical chapter, the promised warning about the meanings of the term <em>significance</em>.
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch out!!</strong></span>... for Mistaking Statistical Significance for Magnitude or Importance</em></p>

</header>
<div class="textbox__content">

If you have been paying attention, you have learned by now that statistical significance has a very narrow meaning. To have a statistically significant result simply means that the probability of observing our sample statistics (or difference, or effect, etc.) as they are, given that the null hypothesis is true, is small enough to be (highly) unusual -- so relatively rare as to indicate that what we have is not a result of random sampling variation.

None of this says <em>anything</em> about how <em>big</em> a difference/effect is -- in fact, it can be quite small and still be <em>statistically</em> significant, given a large enough sample size and other study specifications[footnote]This is actually one of the reasons some have called for abandoning <em>p</em>-values, statistical significance, and hypothesis testing altogether: statistical significance is not indicative of effect size and is frequently over-stated to mean more than it does; at the same time, over-reliance on <em>p</em>-values decreases attention to effect size, careful study design, context, etc.[/footnote].

Similarly, many people unfamiliar with statistics take statistical significance to mean that the findings are of significant importance. Again, nothing about statistical significance confers great meaning to, or implies importance of, statistically significant findings. One can study an objectively trivial/unimportant issue and have statistically significant findings of no relevance to anyone whatsoever.

To conclude, keep these distinctions -- between the conventional usage of the word significant (meaning either important, or big) and statistical significance -- in mind, both when interpreting and reporting results and when reading and evaluating existing research.

</div>
</div>
When testing hypotheses, I defined the significance level as a sort of probability of being wrong that we are willing to tolerate. This implies that there is a likelihood of making an <i>erroneous</i> decision about the null hypothesis (to reject it or not). The next and final section deals with just that.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1093</wp:post_id>
		<wp:post_date><![CDATA[2019-04-04 18:47:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-04 22:47:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[8-3-hypothesis-testing]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>1051</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>9.1. Between a Discrete and a Continuous Variable: The t-test and The F-test</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/9-1-between-a-discrete-and-a-continuous-variable/</link>
		<pubDate>Sat, 06 Apr 2019 06:17:08 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1137</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

For this part, you need to recall (from Section 7.2.1) how we described bivariate associations between two variables, one of which is treated as discrete and one as continuous. In this case we essentially compared the groups (categories of the discrete variable) by their mean (or median) value on the continuous variable.  We examine the potential association between such variables visually through boxplots and numerically through a difference of means.

Now the question in front of us is: even if we do see a difference in the means of the different groups <em>in sample data</em>, how certain can we be that this association is real and reflective of the population? As we learned in Chapter 8, to answer this question, we need to test the difference for statistical significance.

I'll start with a few theoretical notes, which we'll then apply to the example I used in Chapter 7 about the potential gender difference in average income. In this way we'll be able to test whether the difference observed in the NHS 2011 data (\$16,401 in favour of men to be precise) is statistically significant or not. In the latter half of this chapter we'll see what happens when there are more than two groups' means to compare.

<strong>Testing the difference of two means.</strong> Recall from Section 8.3. that we tested whether the employees who took a training course indeed had a higher average productivity by simply calculating the <em>z</em>-value (or, using the estimated standard error, the <em>t</em>-value with a given <em>df</em>) for the mean and then finding its associated <em>p</em>-value. We could then compare the <em>p</em>-value to the preselected <em>α</em>-level and make a conclusion regarding the null hypothesis.

You'll be happy to know that testing a difference of means follows the same principle: obtain the <em>z/t</em>-value, get the associated <em>p</em>-value, compare it to <em>α</em>. What's not the same is that now we are testing a difference of two means -- so we need a <em>z/t</em>-value for the <em>difference</em>. It turns out we can calculate one as easily as ever, as long as we have the standard error of the <em>difference</em>[footnote]I hope you haven't forgotten that $z=\frac{\overline{x}-\mu}{\sigma_\overline{x}}$, where the standard error $\sigma_\overline{x}$ $=\frac{\sigma}{\sqrt{N}}$.[/footnote].

<strong>The standard error of a difference of two means is a combination of their separate standard errors:</strong>

$\sigma_{(\overline{x}_1-\overline{x}_2)}$ $=\sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}}$ = <em>standard error of the difference of two means</em>

where the subscripts refer to the first and second group being compared.

The z-value for a difference of two means follows the ordinary z-value formula, but with the <em>difference</em> taking the place of the single mean:

$z=\frac{(\overline{x_1} -\overline{x_2})-(\mu_1 -\mu_2 )}{\sigma_{(\overline{x}_1-\overline{x}_2)}}$

However, under the null hypothesis we hypothesize there is no difference in the population means, as such $\mu_1=\mu_2$, and thus $\mu_1-\mu_2=0$. Accounting for that in the formula, along with substituting the standard error with its own formula from above, we get:

$z=\frac{\overline{x_1} -\overline{x_2}}{\sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}}}$

Finally, since we generally don't know the population parameters but work with sample data, we estimate the population standard deviation <em>σ</em> with the sample standard deviation <em>s</em>, thus moving to the <strong><em>t</em>-value through which we test the difference for statistical significance:</strong>

$t=\frac{\overline{x_1} -\overline{x_2}}{\sqrt{\frac{s_1^2}{N_1}+\frac{s_2^2}{N_2}}}$ = <em>t-test for the difference of means</em>[footnote]The more observant of you will notice that the squared standard deviations of the two groups, i.e., the <em>s<sub>1</sub><sup>2</sup></em> and s<em><sub>2</sub><sup>2</sup></em> here, are of course the groups' variances (which we need if we are to have them under the square root). In this version of the formula, the groups are taken to have <em>unequal</em> variances, which is a more conservative assumption than assuming the variances of the two groups are equal. If we have a good reason to assume <em>equal</em> variances, then <em>s<sub>1</sub><sup>2 </sup></em>and <em>s<sub>2</sub><sup>2</sup></em> will just be the same (combined, or pooled) variance<em> s<sup>2</sup></em>, and the formula will look like this:

$t=\frac{\overline{x_1} -\overline{x_2}}{s\sqrt{\frac{1}{N_1}+\frac{1}{N_2}}}$ [/footnote]

Note that unlike the single-value case, where <em>df</em>=<em>N</em>-1, when working with a difference of means of two groups the <em>df</em>=<em>N</em>-2.

Before your eyes glaze over (completely), rest assured that SPSS calculates all of this for you; I only provide it here to show you that the logic of hypothesis testing is the same -- only the formulas change to accommodate the testing of a <em>difference of means</em> rather than a single mean.

From this point on, it's easy: you only need to check the <em>p</em>-value of the <em>t</em>-value you have obtained (given the specific <em>df</em>)[footnote]You can do that through an online p-value calculator for the t-distribution like the one <a href="https://www.socscistatistics.com/pvalues/tdistribution.aspx">here</a>.[/footnote] and compare it to the significance level, and <em>voila</em> -- you have yourself a significance test!
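The unequal-variances formula above can be sketched as a small Python function. The function name and the example numbers below are mine, purely for illustration of the formula (SPSS does this for you):

```python
# Sketch of the t-value for a difference of two means, unequal variances assumed.
from math import sqrt

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """t-value for a difference of two means (unequal variances assumed)."""
    se_diff = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    return (xbar1 - xbar2) / se_diff

# Made-up numbers: two groups of 100, means 52 and 48, sds 10 and 12.
print(round(welch_t(52, 10, 100, 48, 12, 100), 2))
```

Identical groups give t = 0, and the larger the difference relative to its standard error, the larger the t-value -- exactly the logic of the formula.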

Let's see how this all works out in an example. I promised you to test the gender differences in average income, didn't I?
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Testing Gender Differences in Average Income, NHS 2011 </em></p>

</header>
<div class="textbox__content">

As in Example XX in Section 7.2.1, I use a random sample of about 3 percent of the entire NHS 2011 data, this time resulting in <em>N</em>=21,902[footnote]Since I use a new random sub-sample of the data, you can consider this an indirect illustration of sampling variation. For comparison of sample statistics as well as variable description, refer back to Example XX[/footnote].

We are still interested in whether women and men on average earn differently, i.e., whether <em>gender</em> affects <em>income</em>:
<ul>
 	<li>H<sub>0</sub>: The average income of women and men is the same, $\mu_m =\mu_f$</li>
 	<li>H<sub>a</sub>: The average income of women and men is different, $\mu_m \neq\mu_f$</li>
</ul>
There are 11,323 women (<em>N<sub>f</sub></em>=11,323) and 10,579 men (<em>N<sub>m</sub></em>=10,579) in the sample. The men earn an average of \$48,113 ($\overline{x}_m =48113$) and the women earn an average of \$31,519 ($\overline{x}_f =31519$). The respective standard deviations are \$68,214 for men ($s_m =68214$) and \$34,760 for women ($s_f=34760$).

The difference of means is therefore:

$\overline{x}_m -\overline{x}_f =48113-31519=16594$

The question is whether this \$16,594 is due to sampling variation (i.e., statistically not different from a population difference of means of \$0), or unusual enough to make a population difference of \$0 unlikely (i.e., statistically significant).

To test this, we need to calculate the standard error of the difference. Once we have the standard error of the difference, we can calculate the <em>t</em>-value.

The standard error of the difference is:

$s_{\overline{x}_m-\overline{x}_f}$ = $\sqrt{\frac{s_m^2}{N_m}+\frac{s_f^2}{N_f}}=\sqrt{\frac{68214^2}{10579}+\frac{34760^2}{11323}}=\sqrt{439848+106708}=739$

The <em>t</em>-value is then:

$t=\frac{\overline{x}_m -\overline{x}_f}{s_{\overline{x}_m-\overline{x}_f}}=\frac{16594}{739}=22.446$

Given the large <em>N</em>, even just looking at the <em>t</em>-value should make it clear that the difference is statistically significant -- after all, in a two-tailed test, the <em>t</em>-value is significant at 1.96 and above (for <em>α</em>=0.05) and at 2.58 and above (for <em>α</em>=0.01).

Still, this isn't the way to report a test -- this is: <strong>With a <em>t</em>=22.446, <em>df</em>=21,900, and <em>p</em>=0.000</strong>[footnote]You can check this with a p-value calculator; SPSS reports it too.[/footnote]<strong>, i.e., <em>p</em>&lt;0.001</strong>[footnote]That is, the probability of observing a difference of \$16,594 in the sample if there was no difference in the population is smaller than 0.1%.[/footnote]<strong>, we have enough evidence to reject the null hypothesis. Indeed, we can conclude with 99.9% certainty that there is a statistically significant difference between the average incomes of men and women (i.e., that the difference exists in the population).</strong>

We can check this with a confidence interval too, again substituting the difference in place of a single value[footnote]I hope you remember that 95% CI: $\overline{x} \pm 1.96\times s_\overline{x}$. [/footnote]:

95% CI: $\overline{x}_m - \overline{x}_f \pm 1.96\times s_{\overline{x}_m-\overline{x}_f}$ = $16594 \pm 1.96 \times 739 = 16594 \pm 1448 = (15145; 18043)$

That is, we can say that <b>the difference of average incomes between men and women will be between \$15,145 and \$18,043 with 95% certainty; or that 19 out of 20 such studies will find a difference of \$16,594 $\pm$ \$1,448.</b> (We also see the correspondence with hypothesis testing: since the interval doesn't contain 0, 0 is not a plausible value for the difference.)
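If you'd like to double-check the arithmetic of this example outside SPSS, a few lines of Python (standard library only; variable names are mine) reproduce it. The tiny differences from the hand calculation come from using the unrounded standard error rather than 739:

```python
# Re-checking the NHS 2011 gender income difference from the example above.
from math import sqrt

n_m, xbar_m, s_m = 10579, 48113, 68214   # men: size, mean, standard deviation
n_f, xbar_f, s_f = 11323, 31519, 34760   # women: size, mean, standard deviation

diff = xbar_m - xbar_f                        # 16594
se = sqrt(s_m ** 2 / n_m + s_f ** 2 / n_f)    # standard error of the difference: ~739.3
t = diff / se                                 # ~22.45
ci = (diff - 1.96 * se, diff + 1.96 * se)     # 95% CI: ~(15145, 18043)
print(round(t, 2), round(ci[0]), round(ci[1]))
```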

Inference is not doing too badly, no?

</div>
</div>
Again, SPSS will provide all the calculations but I advise you to still test your understanding of the procedure with the following exercise.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title">Try It!! XX Gender Differences in Age of Actors in Main Roles</p>

</header>
<div class="textbox__content">

Studies find that due to the gendered social construction of aging (i.e., women are considered "older" and "mature" at younger ages than men), male actors are frequently paired with much younger female actors [CITATIONS]. For example, the average ages of male and female Academy Award nominees are telling: in the Best Actor/Actress categories, the average age of male nominees is 43.4 years while the average age of female nominees is 37.2 years (Beckwith &amp; Hester, 2018 [http://thedataface.com/2018/03/culture/oscar-nominees-age]).

Let's say that you want to investigate this phenomenon yourself. You randomly select 100 male and 100 female Academy Award nominees, and calculate their age at nomination for an Academy Award. You find that men's average age is 45 years and women's is 36 years, with standard deviations of 15 years for men and 20 years for women. Test the hypothesis that the average age of women is different from that of men for the population of all Best Actor/Actress Oscar nominees. Create a 95% CI for the difference to see its correspondence with the hypothesis test.

</div>
</div>
Now that you understand the principle of testing the difference of two means, let's see what we can do about non-binary discrete variables.

<strong>Testing the difference of more than two means.</strong> When the discrete variable of interest has more than two categories, we can no longer use the simple t-test presented above. While we can still use a boxplot chart for visualizing the association between the two variables -- where instead of two boxplots, we'll have as many boxplots as there are groups (categories of the discrete variable) -- we no longer have only one difference to test.

Testing multiple means for statistical significance is done through a version of a test called an <em>F</em>-test. This <em>F</em>-test tests whether the means of several groups[footnote]Note that "several groups" includes the two-groups case as well: you <em>could</em> test the significance of a difference between the means of two groups with an F-test too (it will just provide less information).[/footnote] are all equal (versus at least one of them not being the same as the rest) through an analysis of variance (aka ANOVA).

At this point you might feel like a treatment of the topic of the kind I offered for the <em>t</em>-test above would be a tad too much, and you'd be correct: providing the full technical details and the formula of the <em>F</em>-test is beyond the scope of this book.

Briefly, the ANOVA <em>F</em>-test calculates a ratio of variances (between groups to within groups, in terms of sums of squares): the larger the ratio, the more evidence there is against the null hypothesis, and vice versa. The <em>F</em>-test statistic follows an <em>F</em>-distribution (not discussed here), which provides the <em>F</em>-value with its <em>p</em>-value, which is then compared to the <em>α</em>-level and interpreted in the usual way.
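To make the "ratio of variances" idea concrete, here is a minimal sketch of the <em>F</em>-statistic in Python (my own illustration, not the book's procedure; it computes only the <em>F</em>-value, since getting the <em>p</em>-value requires the <em>F</em>-distribution, which SPSS handles for you):

```python
# Sketch of the one-way ANOVA F-statistic: between-groups variance over
# within-groups variance, in terms of sums of squares. Function name is mine.
def anova_f(groups):
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # between-groups sum of squares (each group weighted by its size)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-groups sum of squares (spread of each observation around its group mean)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1           # number of groups minus 1
    df_within = len(values) - len(groups)  # N minus number of groups
    return (ss_between / df_between) / (ss_within / df_within)

# Three small made-up groups with visibly different means:
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

The farther apart the group means are relative to the spread within each group, the larger the F-value, and the stronger the evidence against the null hypothesis.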

Example XX illustrates.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Education Differences in Average Income, NHS 2011</em></p>

</header>
<div class="textbox__content">

Presumably, college is worth it. You delay your full entry into the labour force and instead invest in your education, with the hope that you'll then be able to have a better -- and better-paying -- job.

Let's examine this question then -- do higher educational degrees translate into higher average income? -- using an approximately 3% random sample of the <em>NHS 2011</em> data. The variable <em>income</em> is the same one I used on previous occasions (i.e., <em>total income</em> in NHS 2011). The groups to compare are the categories of a variable called (highest) <em>degree</em>. The variable <em>degree</em> is a recoded version of the <em>NHS 2011</em>'s <em>highest certificate, diploma or degree</em>. I recoded the original variable's thirteen categories into <em>degree</em>'s six: 1) no high school, 2) high school, 3) certificate or diploma below Bachelor's, 4) Bachelor's, 5) Master's[footnote]This category includes certificates above Bachelor's, and medical, dentistry, and veterinary degrees.[/footnote], and 6) PhD.

A brief descriptive investigation of the data reveals that the average income reported by the six education groups <em>looks</em> different: \$19,433 for respondents without a high school degree, \$30,455 for respondents with a high school degree, \$41,971 for respondents with more than a high school but less than a Bachelor's degree, \$60,360 for respondents with a Bachelor's degree, \$71,593 for respondents with a Master's degree, and \$93,924 for respondents with a PhD. This potential positive association (more education, more income) is also reflected in the boxplots in Figure XX. While there are outliers with extremely high average income in all groups (the most extreme were even truncated at the top), the median and the outlier-less maximum income increase from left to right with the increase of highest degree.

<em>Figure XX Average Income by Highest Degree, NHS 2011</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/boxplot-degree-income-nhs.png" alt="" width="462" height="410" class="alignnone wp-image-1213 size-full" />

Source: Statistics Canada (2019).

Are these differences statistically significant? In other words, are the differences observed in the sample a result of regular sampling variation, or reflective of differences in the population?
<ul>
 	<li>H<sub>0</sub>: The average income of all six education groups is the same.</li>
 	<li>H<sub>a</sub>: The average income of some of the education groups is different from others.</li>
</ul>
SPSS reports a larger between-groups than within-groups variance; <strong><em>F</em>=413.535 with <em>p</em>&lt;0.001. Since the probability of observing such differences between the groups in the sample, had there been no difference in the population (i.e., under the null hypothesis), is less than 1 in a thousand, we reject the null hypothesis and conclude that the differences in average income of groups with different highest degrees are statistically significant.</strong>

</div>
</div>
Before we turn to testing associations between two discrete variables, the SPSS Tip XX below lists the steps of the <em>t</em>-test and ANOVA <em>F</em>-test procedures.
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip XX The t-test and the F-test</em></p>

</header>
<div class="textbox__content">

<strong>For a <em>t</em>-test:</strong>
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze</em>, and from the pull-down menu, click on <em>Compare Means</em> and <em>Independent Samples T Test</em>;</li>
 	<li>Select your continuous variable from the list of variables on the left and, using the top arrow, move it to the <em>Test Variable(s)</em> empty space on the right;</li>
 	<li>Select your discrete variable from the list of variables on the left and, using the bottom arrow, move it to the <em>Grouping Variable</em> empty space on the right;</li>
 	<li>Click on <em>Define Groups</em>, and in the new window, keep <em>Use specified values</em> selected; in the empty spaces for <em>Group 1</em> and <em>Group 2</em>, enter the <em>numeric</em> values[footnote]That would be the "code" -- for example, <em>gender</em> may be coded as "1 female, 2 male", or "0 male, 1 female", etc., depending on the dataset. You have to know this beforehand; if unsure, go back to Variable View and check.[/footnote] corresponding to the two categories of your discrete variable; click <em>Continue</em>.</li>
 	<li>In the <em>Independent Samples T Test</em> window click <em>Options</em>...; you can request specific confidence interval in the new window (the default is 95%); click <em>Continue</em>;</li>
 	<li> Click <em>OK</em> once back to the <em>Independent Samples T Test</em> window.</li>
 	<li>SPSS will produce two tables in the <em>Output</em> window: a <em>Group Statistics</em> one (where you can see the sample size, mean, standard deviation, and standard error for each group, i.e., each category of the discrete variable), and an <em>Independent Samples Test</em> one (where you can find the <em>t</em>-value, <em>df</em>, <em>p</em>-value, mean difference, standard error of the difference, and the requested confidence interval)[footnote]The table provides two versions of the test: with and without equal variances assumed. Which one you should use depends on the size of the two groups' variances. If the variance of one group is twice (or more) as big as the other group's variance (like in Example XX above, where the men's variance was much larger than the women's), use the test results in the bottom row, "equal variances not assumed". If the two groups' variances are relatively similar, you can use the top row, "equal variances assumed". You don't have to decide on your own, as SPSS provides a convenient indication of which one is better to use, under <em>Levene's Test/F</em> for comparing variances. If the <em>F</em>-test is significant (i.e., <em>p</em>≤0.05), the variances are too different and using the bottom row is better; if the <em>F</em>-test is non-significant (i.e., <em>p</em>&gt;0.05), you can assume the variances are equal and use the top row of results.[/footnote].</li>
</ul>
<strong>For the F-test:</strong>
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze</em>, and from the pull-down menu, click on <em>Compare Means</em> and then <em>One-Way ANOVA</em>;</li>
 	<li>Select your continuous variable from the list of variables on the left and, using the top arrow, move it to the <em>Dependent List</em> empty space on the right;</li>
 	<li>Select your discrete variable from the list of variables on the left and, using the bottom arrow, move it to the <em>Factor</em> empty space on the right; click OK.</li>
 	<li>The <em>Output</em> window will present a <em>Oneway ANOVA</em> table, listing a breakdown of variances (by sums of squares) and, most importantly, the resulting <em>F</em>-statistic and <em>p</em>-value.</li>
</ul>
</div>
</div>
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1137</wp:post_id>
		<wp:post_date><![CDATA[2019-04-06 02:17:08]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-06 06:17:08]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[9-1-between-a-discrete-and-a-continuous-variable]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>120</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>10.2.2. Elements of the Linear Regression Model</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/10-2-2-elements-of-the-linear-regression-model/</link>
		<pubDate>Wed, 24 Apr 2019 20:56:03 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1340</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The secret to minimizing the residuals -- and to ensuring the regression line is indeed <em>the best fitting</em> (to the data) line -- lies in the way the elements of the line are calculated. The regression/prediction line is, after all, created through <em>a</em> and <em>b</em>, as I explained in Section 10.2:

$$\hat{y}=a+bx$$ = <em>predicted values</em>

We can calculate <em>a</em> and <em>b</em> such that they minimize the residuals through the following formulas:

$$b=\frac{\Sigma{(x-\overline{x})(y-\overline{y})}}{\Sigma{(x-\overline{x})^2}}=\frac{SP}{SS_x}$$ = <em>slope</em>, or <em>regression coefficient</em>

$$a=\overline{y}-b\overline{x}$$ = <em>Y-intercept</em>, or <em>constant</em>

where <em>SP</em> is, again, the sum of products, <em>SS<sub>x</sub></em> is the sum of squares for <em>x</em>, and $\overline{x}$ and $\overline{y}$ are the variable means of <em>x</em> and <em>y</em>, respectively.

As with the correlation coefficient <em>r</em>, once again, everything revolves around variances (and means)[footnote]So much so that the correlation coefficient r and the regression coefficient b are related: $b=r\frac{s_y}{s_x}$ where <em>s<sub>y</sub></em> and <em>s<sub>x</sub></em> are, of course, the standard deviations of <em>y</em> and <em>x</em>, respectively.[/footnote].
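For readers who like to see formulas as computations, here is a minimal sketch (in Python, with made-up numbers, not the book's examples) of the slope and intercept formulas above, together with the footnote's identity $b=r\frac{s_y}{s_x}$:

```python
# Sketch of the formulas above: b = SP / SS_x and a = ybar - b * xbar,
# plus the footnote identity b = r * (s_y / s_x). Data are hypothetical.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

sp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # sum of products
ss_x = sum((xi - xbar) ** 2 for xi in x)                     # sum of squares of x
ss_y = sum((yi - ybar) ** 2 for yi in y)                     # sum of squares of y

b = sp / ss_x                 # slope / regression coefficient
a = ybar - b * xbar           # Y-intercept / constant

# Footnote identity: b = r * s_y / s_x (standard deviations use n - 1)
r = sp / sqrt(ss_x * ss_y)
s_x = sqrt(ss_x / (n - 1))
s_y = sqrt(ss_y / (n - 1))
assert abs(b - r * s_y / s_x) < 1e-12
```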

An example will serve best to illustrate all this.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Assignment Requirements and Marks</em></p>

</header>
<div class="textbox__content">

Here I continue with the fictitious data on which Figure XX is based. In a "sample" of <em>N</em>=11, I have data about the "respondents'" completed assignment requirements (<em>x</em>) and their assignment marks (<em>y</em>). In Table XX, I calculate the necessary means, sums of squares, and sum of products.

<em>Table XX Assignment Requirements and Marks: Calculating a and b</em>
<table class="lines" style="border-collapse: collapse;width: 99.9998%;height: 210px" border="0">
<tbody>
<tr style="height: 30px">
<td style="width: 1.41643%;height: 30px;text-align: center">$x$</td>
<td style="width: 2.31468%;height: 30px;text-align: center">$y$</td>
<td style="width: 21.1225%;height: 30px;text-align: center"> $(x-\overline{x})$</td>
<td style="width: 18.0203%;height: 30px;text-align: center">$(x-\overline{x})^2$</td>
<td style="width: 21.4724%;height: 30px;text-align: center">$(y-\overline{y})$</td>
<td style="width: 16.8352%;height: 30px;text-align: center">$(y-\overline{y})^2$</td>
<td style="width: 18.8183%;height: 30px;text-align: center"> $(x-\overline{x})(y-\overline{y})$</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">0</td>
<td style="width: 2.31468%;height: 15px;text-align: center">10</td>
<td style="width: 21.1225%;height: 15px;text-align: center">-1.64</td>
<td style="width: 18.0203%;height: 15px;text-align: center">2.68</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" width="64" height="19">-41.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">1748.76</td>
<td style="width: 18.8183%;height: 15px;text-align: center">68.43</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">1</td>
<td style="width: 2.31468%;height: 15px;text-align: center">40</td>
<td style="width: 21.1225%;height: 15px;text-align: center">-0.64</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.40</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">-11.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">139.67</td>
<td style="width: 18.8183%;height: 15px;text-align: center">7.52</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">2</td>
<td style="width: 2.31468%;height: 15px;text-align: center">70</td>
<td style="width: 21.1225%;height: 15px;text-align: center">0.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.13</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">18.18</td>
<td style="width: 16.8352%;height: 15px;text-align: center">330.58</td>
<td style="width: 18.8183%;height: 15px;text-align: center">6.61</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">3</td>
<td style="width: 2.31468%;height: 15px;text-align: center">100</td>
<td style="width: 21.1225%;height: 15px;text-align: center">1.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">1.86</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">48.18</td>
<td style="width: 16.8352%;height: 15px;text-align: center">2321.49</td>
<td style="width: 18.8183%;height: 15px;text-align: center">65.70</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">1</td>
<td style="width: 2.31468%;height: 15px;text-align: center">30</td>
<td style="width: 21.1225%;height: 15px;text-align: center">-0.64</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.40</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">-21.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">476.03</td>
<td style="width: 18.8183%;height: 15px;text-align: center">13.88</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">1</td>
<td style="width: 2.31468%;height: 15px;text-align: center">45</td>
<td style="width: 21.1225%;height: 15px;text-align: center">-0.64</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.40</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">-6.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">46.49</td>
<td style="width: 18.8183%;height: 15px;text-align: center">4.34</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">2</td>
<td style="width: 2.31468%;height: 15px;text-align: center">55</td>
<td style="width: 21.1225%;height: 15px;text-align: center">0.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.13</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">3.18</td>
<td style="width: 16.8352%;height: 15px;text-align: center">10.12</td>
<td style="width: 18.8183%;height: 15px;text-align: center">1.16</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">2</td>
<td style="width: 2.31468%;height: 15px;text-align: center">40</td>
<td style="width: 21.1225%;height: 15px;text-align: center">0.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">0.13</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">-11.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">139.67</td>
<td style="width: 18.8183%;height: 15px;text-align: center">-4.30</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">3</td>
<td style="width: 2.31468%;height: 15px;text-align: center">85</td>
<td style="width: 21.1225%;height: 15px;text-align: center">1.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">1.86</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">33.18</td>
<td style="width: 16.8352%;height: 15px;text-align: center">1101.03</td>
<td style="width: 18.8183%;height: 15px;text-align: center">45.25</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">3</td>
<td style="width: 2.31468%;height: 15px;text-align: center">90</td>
<td style="width: 21.1225%;height: 15px;text-align: center">1.36</td>
<td style="width: 18.0203%;height: 15px;text-align: center">1.86</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">38.18</td>
<td style="width: 16.8352%;height: 15px;text-align: center">1457.85</td>
<td style="width: 18.8183%;height: 15px;text-align: center">52.07</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center">0</td>
<td style="width: 2.31468%;height: 15px;text-align: center">5</td>
<td style="width: 21.1225%;height: 15px;text-align: center">-1.64</td>
<td style="width: 18.0203%;height: 15px;text-align: center">2.68</td>
<td class="xl65" style="height: 15px;width: 21.4724%;text-align: center" align="right" height="19">-46.82</td>
<td style="width: 16.8352%;height: 15px;text-align: center">2191.94</td>
<td style="width: 18.8183%;height: 15px;text-align: center">76.61</td>
</tr>
<tr style="height: 15px">
<td style="width: 1.41643%;height: 15px;text-align: center"><img src="https://pressbooks.bccampus.ca/simplestats/wp-content/ql-cache/quicklatex.com-0d00c2da2b2541a97ae0ac3c10e1504e_l3.svg" alt="\overline{x}" />1.6</td>
<td style="width: 2.31468%;height: 15px;text-align: center"><img src="https://pressbooks.bccampus.ca/simplestats/wp-content/ql-cache/quicklatex.com-01881adf9c51d256ce0a5af82c2e7024_l3.svg" alt="\overline{y}" />51.8</td>
<td style="width: 21.1225%;height: 15px;text-align: center"></td>
<td style="width: 18.0203%;height: 15px;text-align: center"><strong><em>SS<sub>x</sub></em>=12.55</strong></td>
<td style="width: 21.4724%;height: 15px;text-align: center"></td>
<td style="width: 16.8352%;height: 15px;text-align: center"><strong><em>SS<sub>y</sub></em>=9963.64</strong></td>
<td style="width: 18.8183%;height: 15px;text-align: center"><strong><em>SP<sub>xy</sub></em>=337.27</strong></td>
</tr>
</tbody>
</table>
Then, I substitute the relevant numbers into the formulas for <em>a</em> and <em>b</em>:

$$b=\frac{SP}{SS_x}=\frac{337.27}{12.55}=26.88$$

$$a=\overline{y}-b\overline{x}=51.8182-26.88\times 1.6364=51.8182-43.99=7.83$$

(Note that I use the unrounded means here; with the rounded $\overline{x}=1.6$ the multiplication would come out noticeably different, purely due to rounding error.)

This makes our <strong>best-fitting/regression line</strong> this:

$$\hat{y}=a+bx=7.83+26.88x$$

... which is exactly what SPSS had already told us, if you care to go back to Table XX in the previous section and check.
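If you'd like to check the arithmetic without a calculator, the computation in Table XX can be reproduced in a few lines of Python (the eleven data pairs are the same; the language choice is mine, not the book's):

```python
# Verifying the Table XX hand calculation with the same eleven
# (requirements, mark) pairs used in the example.
x = [0, 1, 2, 3, 1, 1, 2, 2, 3, 3, 0]                # completed requirements
y = [10, 40, 70, 100, 30, 45, 55, 40, 85, 90, 5]     # assignment marks
n = len(x)

xbar = sum(x) / n                      # about 1.64
ybar = sum(y) / n                      # about 51.82

sp = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # SP = 337.27
ss_x = sum((xi - xbar) ** 2 for xi in x)                     # SS_x = 12.55

b = sp / ss_x          # slope: about 26.88
a = ybar - b * xbar    # intercept: about 7.83

# Predicted values on the regression line, for x = 0, 1, 2, 3
predicted = [a + b * xi for xi in (0, 1, 2, 3)]
print(b, a, predicted)
```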

You may or may not be impressed by this, but you certainly need to know how to interpret it. In this case, the regression tells us that <strong>a student who doesn't complete even one requirement of their assignment is expected to receive 7.83 points (= constant, or Y-intercept); further, for every requirement completed, their mark would increase by 26.88 points (= regression coefficient). That is, the effect of one completed requirement on the assignment mark is 26.88 points.</strong>

We can also calculate the actual predicted values (which form the regression line itself):
<ul>
 	<li>for <em>x</em>=0, $\hat{y}=7.83+26.88\times 0=7.83+0=7.83$;</li>
 	<li>for <em>x</em>=1, $\hat{y}=7.83+26.88\times 1=7.83+26.88=34.71$;</li>
 	<li>for <em>x</em>=2, $\hat{y}=7.83+26.88\times 2=7.83+53.76=61.59$;</li>
 	<li>for <em>x</em>=3, $\hat{y}=7.83+26.88\times 3=7.83+80.64=88.47$.</li>
</ul>
As you can see, these are different values than the ones we had in the deterministic version with which we started in Section XX (i.e., 0 requirements=10 points, 1 requirement=40 points, 2 requirements=70 points, 3 requirements=100 points). The difference between the certainty of the deterministic version and the uncertainty of the current probabilistic version is the unexplained (by number of requirements) variance[footnote]That is, in the deterministic version, we could say that $y=\hat{y}$ (<em>reality</em> = <em>prediction</em>), or rather, that there is no prediction at all -- we know what the true relationship between the variables is as the assignment mark depends entirely on the number of fulfilled requirements. In the actual/probabilistic version, $y=\hat{y}+e$ (<em>reality</em> = <em>prediction plus residual/error</em>), where the residual is what is left unexplained, or simply the difference between reality and prediction. [/footnote]. How much variance we <em>have</em> explained we'll see in the next Section. Before that, here is Figure XX again so that you can pinpoint the predicted values for yourselves. (Hint: they're on the line.)

<em>Figure XX Assignment Requirements and Mark (Redux)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-class-assignment-requirements-mark-with-variability.png" alt="" width="462" height="370" class="wp-image-1345 size-full aligncenter" />

</div>
</div>
<strong>Testing the regression coefficient for statistical significance. </strong>Of course, as with any statistic obtained through a sample, we have to be able to check whether the regression coefficient is generalizable to the population, i.e., whether it's statistically significant. In other words, we have to examine the evidence as to whether the identified effect of the independent variable on the dependent variable exists in the population or whether it's merely a result of random sampling variability.

The significance test for <em>b</em> is your familiar <em>t</em>-test, given by the following formula[footnote]The population version is $z=\frac{b}{\sigma_b}$. Since we generally don't know $\sigma_b$, we substitute it with its estimate, the sample-based <em>s<sub>b</sub></em>. This of course also means we move to the <em>t</em>-distribution.[/footnote]:

$$t=\frac{b}{s_b}$$

where <em>s<sub>b</sub></em> is <em>b</em>'s standard error.[footnote]The standard error of <em>b</em> is calculated by this, admittedly scary-looking, formula:

$$s_b =\sqrt{\frac{\Sigma{(y-\hat{y})^2}/(N-2)}{\Sigma{(x-\overline{x})^2}}}$$

This can be simplified to be more user-friendly, but then I'll need to introduce additional concepts (like the <em>mean squared error</em> and <em>the standard error of the estimate</em>) which are not necessary for you at this stage and are therefore beyond the scope of this book. You'll be happy to know that the hand calculation of <em>s<sub>b</sub></em> also falls in that category.[/footnote]

The degrees of freedom for <em>t<sub>b</sub></em> are <em>N</em>-2 in the bivariate case.
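Although the hand calculation of <em>s<sub>b</sub></em> is beyond our scope, a computer has no such qualms. Here is a sketch in Python applying the footnote's formula to the assignment-marks example data from earlier in this section:

```python
# Sketch: the footnote's formula for b's standard error, applied to the
# assignment-marks example data, with the resulting t = b / s_b.
from math import sqrt

x = [0, 1, 2, 3, 1, 1, 2, 2, 3, 3, 0]
y = [10, 40, 70, 100, 30, 45, 55, 40, 85, 90, 5]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
ss_x = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_x
a = ybar - b * xbar

# Sum of squared residuals, Sigma (y - y-hat)^2
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# s_b = sqrt( [residual_ss / (N - 2)] / SS_x )
s_b = sqrt((residual_ss / (n - 2)) / ss_x)

t = b / s_b            # tested against the t-distribution with df = n - 2
print(f"s_b = {s_b:.3f}, t = {t:.3f}")
```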

<strong>Hypothesis testing and confidence intervals for the regression coefficient. </strong>To test the regression coefficient <em>b</em> for significance we have the following hypotheses:
<ul>
 	<li>H<sub>0</sub>: The independent variable <em>x</em> has no effect on the dependent variable <em>y</em> (i.e., the variables are not associated); <em>β</em>=0.</li>
 	<li>H<sub>a</sub>: The independent variable <em>x</em> has an effect on the dependent variable <em>y</em> (i.e., the variables are associated); <em>β</em>≠0[footnote]Note that I'm using causal language here under the assumption that the conditions for causality are met; establishing those is a separate investigation. In and of itself, finding a significant effect of <em>x</em> on <em>y</em> doesn't establish that changes in <em>x</em> <em>cause</em> changes in <em>y</em>.[/footnote].</li>
</ul>
<strong>After calculating <em>t<sub>b</sub></em> with <em>df</em>=<em>N</em>-2 and finding its associated <em>p</em>-value, we then compare the <em>p</em>-value to the pre-selected significance level <em>α</em>. As usual, when <em>p</em>≤<em>α</em>, we reject the null hypothesis and have enough evidence to deem the regression coefficient <em>b</em> statistically significant. If, on the contrary, <em>p</em>&gt;<em>α</em>, we fail to reject the null hypothesis and therefore conclude that, at present, there is no evidence to suggest an effect of <em>x</em> on <em>y</em>.</strong>

Again, similarly to other statistics, we can <strong>calculate confidence intervals for <em>b</em>, so that we can report the size of the effect with a specific level of certainty</strong>. For example, the 95% confidence interval for the regression coefficient <em>b</em> is:
<ul>
 	<li>95% CI: $b\pm 1.96\times s_b$</li>
</ul>
To illustrate, let's revisit our example about the effect of parental education on their offspring's education. (Don't worry, with <em>N</em>=1,686 I will not offer you a calculation by hand: SPSS is there for us.)
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example XX Effect of Parental Years of Schooling on Respondent's Years of Schooling (GSS 2018)</em></p>

</header>
<div class="textbox__content">

We already examined the association between parental and offspring education through the correlation coefficient <em>r</em> and found it to be moderately weak at 0.413, and statistically significant at <em>α</em>=0.01. Can we do better, however, and estimate the effect of each additional year of parental schooling on the schooling of the respondents?

Again, we use data from the U.S. <em>GSS 2018</em> (NORC, 2019).  Our sample is <em>N</em>=1,686, and <strong>our hypotheses are:</strong>
<ul>
 	<li>H<sub>0</sub>: Father's education has no effect on respondent's education; <em>β</em>=0.</li>
 	<li>H<sub>a</sub>: Father's education has an effect on respondent's education; <em>β</em>≠0.</li>
</ul>
<strong>The regression model is:</strong>

$$\text{years of schooling}=y=a+bx+e=a+b(\text{years of parental schooling})+e$$

<strong>Our predicted values are:</strong>

$$\text{predicted years of schooling}=\hat{y}=a+bx=a+b(\text{years of parental schooling})$$

Figure XX plots the association and Table XX shows the relevant SPSS output.

Figure XX <em>Linear Regression of Respondent's Years of Schooling and Father's Years of Schooling</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/scatterplot-educ-paeduc-line.png" alt="" width="462" height="370" class="wp-image-1371 size-full aligncenter" />

&nbsp;

<em>Table XX Linear Regression of Respondent's Years of Schooling and Father's Years of Schooling</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/04/regression-table-educ-paeduc.png" alt="" width="812" height="178" class="wp-image-1370 size-full aligncenter" />

That is, SPSS has calculated the constant (or <em>Y</em>-intercept) <em>a</em> and the regression coefficient <em>b</em> in such a way as to minimize the residuals:
<ul>
 	<li><em>a</em> = 10.67</li>
 	<li><em>b</em> = 0.29</li>
</ul>
Then, the predicted values (i.e., the regression line on Figure XX above) are:

$$\text{predicted years of schooling}=\hat{y}=a+bx=10.67+0.29(\text{years of parental schooling})$$

We also know that the standard error of<em> b</em> is <em>s<sub>b</sub></em>=0.016, so

$$t=\frac{b}{s_b}=\frac{0.29}{0.016}=18.607$$ [footnote]If you actually divide 0.29 by 0.016, you'll end up with 18.125. The difference from 18.607 is due to rounding (as the standard error of b is rounded up to 0.016 from 0.01558...).[/footnote]

<strong>Thus, with <em>t</em>=18.607, <em>df</em>=1,684, and <em>p</em>&lt;0.001 (smaller than any conventional <em>α</em>), we can reject the null hypothesis. Our current evidence supports our hypothesis that father's education affects their offspring's education, on average. The effect is 0.29 years (or about 3.5 months) for every additional year of father's schooling, and it's statistically significant.</strong>

As well, we can interpret <strong>the confidence interval:</strong>
<ul>
 	<li>95% CI: $$ b\pm1.96s_b =0.29\pm1.96(0.016)=0.29\pm0.031=(0.26; 0.32)$$</li>
</ul>
<strong>Or, father's education's effect on offspring's education would be between 0.26 additional years and 0.32 additional years for every year of father's schooling with 95% certainty; in other words, the effect would be 0.29 ± 0.031, 19 out of 20 times.</strong>
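For those following along in code, the <em>t</em>-ratio and the confidence interval above can be reproduced from the two quoted SPSS figures (a Python sketch; the small discrepancy with SPSS's <em>t</em>=18.607 comes from the rounded standard error, as footnoted earlier):

```python
# Sketch: the t-ratio and 95% CI for b, using the SPSS figures quoted
# above (b = 0.29, s_b = 0.016, df = 1684).
b, s_b = 0.29, 0.016

t = b / s_b                # about 18.12 here; SPSS reports 18.607
                           # because it uses the unrounded s_b internally

ci_low = b - 1.96 * s_b    # lower bound of the 95% CI
ci_high = b + 1.96 * s_b   # upper bound of the 95% CI
print(f"t = {t:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```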

That's a lot more information than simply stating that the variables are associated based on the correlation coefficient!

</div>
</div>
Let's make sure you understand how regression works and where the regression coefficients and line come from by interpreting regression output.
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Try It! XX  Class Attendance and Final Test Scores (Simulated Data)</em></p>

</header>
<div class="textbox__content">

We are revisiting the simulated data on student class attendance (measured in percent of classes attended) and their final class scores. <em>N</em>=987. Start by stating your hypotheses; then, using the SPSS output presented in Figure XX and Table XX below, write a paragraph interpreting what you have found, discussing the evidence presented regarding your hypotheses and your decision about them, etc. Include as much information as possible, and don't forget to justify your use of linear regression in this case.

<em>Figure XX Class Attendance and Final Test Scores</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/scatterplot-attendance-scores-full.png" alt="" width="462" height="370" class="wp-image-1377 size-full aligncenter" />

<em>Table XX Class Attendance and Final Test Scores</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/05/regression-attendance-scores-full.png" alt="" width="733" height="163" class="wp-image-1378 size-full aligncenter" />

</div>
</div>
Finally, these are the steps through which the regression output is obtained in SPSS.
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip XX Linear Regression</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze</em>, then from the pull-down menu, select <em>Regression</em> and click on <em>Linear</em>;</li>
 	<li>Select your dependent variable from the list of variables on the left and, using the appropriate arrow, move it to the <em>Dependent</em> open space on the right;</li>
 	<li>Select your independent variable from the list of variables on the left and, using the appropriate arrow, move it to the <em>Block 1 of 1</em> empty space on the right.</li>
 	<li>You can click <em>OK</em> or, if you need a confidence interval for <em>b</em>, click on <em>Statistics</em>, and check off <em>Confidence intervals</em> in the new window (here you can also specify the confidence <em>Level</em> of the CI); click <em>Continue</em>;</li>
 	<li>Once back in the original window, click <em>OK</em>.</li>
 	<li>After the <em>OK</em>, SPSS will provide the output in the<em> Output</em> window. The relevant information we have discussed so far can be found in the last table called <em>Coefficients</em>.</li>
</ul>
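If you don't have SPSS at hand, equivalent output (slope, intercept, standard error of <em>b</em>, and <em>p</em>-value) can be obtained in Python with SciPy's <code>linregress</code>; here is a sketch using the assignment-marks example data from earlier in this section:

```python
# Sketch: the core of SPSS's Coefficients table (slope, intercept,
# standard error of b, p-value) via scipy.stats.linregress, run on
# the chapter's assignment-marks example data.
from scipy import stats

x = [0, 1, 2, 3, 1, 1, 2, 2, 3, 3, 0]                # completed requirements
y = [10, 40, 70, 100, 30, 45, 55, 40, 85, 90, 5]     # assignment marks

result = stats.linregress(x, y)
print(f"b = {result.slope:.2f}, a = {result.intercept:.2f}, "
      f"s_b = {result.stderr:.3f}, p = {result.pvalue:.4f}")
```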
</div>
</div>
SPSS provides several tables as the standard regression output. Beyond the <em>Coefficients</em> one, there are three other short tables: a <em>Variables Entered/Removed</em> (which lists the independent variable/s in the model and the dependent variable as a footnote), an <em>ANOVA</em> table (which presents analysis of variance information that, as mentioned before, is outside the scope of this book), and a <em>Model Summary</em> table. We'll take a brief look at that last table in the next section.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1340</wp:post_id>
		<wp:post_date><![CDATA[2019-04-24 16:56:03]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-04-24 20:56:03]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[10-2-2-elements-of-the-linear-regression-model]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>128</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.3.1 Nominal Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-3-1-nominal-variables/</link>
		<pubDate>Tue, 30 Jul 2019 20:07:25 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1428</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

As the name of this level of measurement implies, the information contained in the categories of a nominal-scale variable is solely their... well, <em>name</em>. Think about the <em>religious affiliation</em> variable from the <em>Do it!</em> <em>1.2. </em>exercise. You have probably already imagined people's possible affiliations in terms of religion (i.e., what religion they subscribe to, if any) as something like <em>Muslim</em>, <em>Jewish</em>, <em>Christian</em>, <em>Sikh</em>, <em>Hindu</em>, <em>Buddhist</em>, <em>not religious -- </em>though likely (and depending on your own religious affiliation) <em>not in this particular order.</em>[footnote]It's also likely that these general categories might have been disaggregated to list variations/denominations, e.g., <em>Catholic</em> and <em>Protestant</em> instead of simply <em>Christian</em>, or <em>Shia</em> and <em>Sunni</em> instead of simply <em>Muslim</em>, etc. For simplicity's sake, I choose to use the most general religious categories in the example.[/footnote]

&nbsp;

Of course, I could have just as easily listed the possible categories (or "questionnaire answers") as <em>Christian</em>, <em>Muslim</em>, <em>Jewish</em>, <em>Buddhist</em>, <em>Hindu</em>, <em>Sikh</em>, <em>not religious</em>. Or, as <em>Sikh</em>, <em>not religious</em>, <em>Buddhist</em>, <em>Hindu</em>, <em>Jewish</em>, <em>Muslim</em>, <em>Christian</em>. Or, as... virtually any possible variation in the ordering of the list.

&nbsp;

In other words, the information we have about religious affiliation is simply in <em>identifying</em> the different categories, and that is <em>all</em>. We cannot do much more than count the different answers and specify what they are. We cannot even use some inherent order to them, as they are only that, <em>names</em>.[footnote]Of course, we could order the categories alphabetically -- just like you can order pretty much <em>anything</em> alphabetically. That would be an arbitrary decision, however, not an <em>inherent</em> order contained in the names (like that in small to big, left to right, slow to fast, less to more, etc.).

&nbsp;

When researchers study religious affiliation in real life, they usually list the groups' names by the size of the religious group/popularity of a religion in their area. For example, in the Americas and Europe the listings usually start with <em>Christian</em>. In India, one can arguably assume they start with <em>Hindu</em>, etc. This type of ordering by size is still <em>purposefully imposed</em>, <em>not an inherent one</em>. [/footnote]

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do it!</em> <em>1.3. Nominal Variables</em></p>

</header>
<div class="textbox__content">

Try to come up with at least three different nominal variables. Can you explain why they are nominal? Try to defend your choice in identifying the scale for these variables as nominal.

</div>
</div>
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1428</wp:post_id>
		<wp:post_date><![CDATA[2019-07-30 16:07:25]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-30 20:07:25]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-3-1-nominal-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.3.2 Ordinal Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-3-2-ordinal-variables/</link>
		<pubDate>Tue, 30 Jul 2019 20:13:46 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1431</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

As with the nominal scale, the name of this scale is indicative of its defining feature: an order. That is, unlike the categories of nominal variables, the categories of an ordinal variable cannot be listed in any arbitrary order: <strong>the categories of any ordinal variable have an inherent order to them</strong>. Listing the categories of an ordinal variable differently would violate the intrinsic logic of their order and would make little to no sense; we would also lose the information contained in that order.

&nbsp;

Think back to the variable <em>educational attainment</em> from the <em>Do It! 1.2.</em> exercise earlier. <em>Educational attainment</em> is usually measured by the educational degrees attained by an individual, so if you imagined the categories being something like <em>no degree</em>, <em>secondary/high school</em>, <em>Associate's</em>, <em>Bachelor's</em>, <em>Master's</em>, <em>doctorate/PhD</em>, you are probably not alone. That is, chances are, most, if not everyone, would come up with a list <em>in that particular order</em>. Why? Because, I can hear you explaining, no degree is <em>the lowest</em> formal educational attainment one can have; it's clearly <em>less</em> than having finished secondary/high school, which in turn is <em>less</em> than having a college degree, which again is clearly <em>less</em> than achieving a Master's degree, while, finally, a PhD is the highest degree one can get in academia. Arbitrarily switching the categories in <em>educational attainment</em> to be listed as, say, <em>Associate's</em>, <em>Master's</em>, <em>no degree</em>, <em>PhD</em>, <em>Bachelor's</em> makes little (rather, no) sense, and worse, it deprives us of the information about there being an intrinsically ascending order in the obtaining of the degrees (as one can only have a doctorate if they have previously finished college, which can only be done after secondary/high school).

&nbsp;

Note that having an intrinsic order (in this case, from less to more), however, is a necessary but not a sufficient condition for identifying an ordinal scale. <strong>There is an additional requirement: a variable is ordinal only when the categories do not have a precise (numerical) value.</strong> In other words, while we know that a Bachelor's degree is <em>more</em> than an Associate's degree, we don't know <em>how much</em> more. Having a PhD is more than having a Master's degree, but again, we don't know by how much. The same goes for any of the categories. We know the order, but not the precise "distance" between one category and another. As well, the "distance" between the first category and the second one might be unequal (while still unknown) to the "distance" between the second category and the third, and so on. It is not the size of the distance that matters here, only that the distance exists and that a category is clearly less/more (or bigger/smaller, nearer/farther, etc.) than another.[footnote]You might be tempted to measure the "distance" between the categories in <em>educational attainment</em> in terms of years. For example, you could say that the "distance" between <em>secondary/high school</em> and <em>Associate's</em> is two years, or that between <em>Associate's</em> and <em>Bachelor's</em> is another two years, etc. This would still be an imprecise measurement, however, because different people take different amounts of time to complete their degrees, not to mention that there is no way to measure the difference between <em>no degree</em> and <em>secondary/high school</em> (as <em>no degree</em> can mean anything from no education at all -- still a sad reality in many countries -- to dropping out of school a year before graduation). As well, doctoral studies vary enormously in duration depending not only on the chosen discipline but also on the country, etc.
In short, measuring the "distance" between educational attainment categories in years would vary far too much on a case-by-case basis to be meaningful in any way. Note, however, that you could operationalize a variable <em>years of schooling</em> measured in years, but that would not be the same variable anymore (nor would it be an ordinal variable).[/footnote]

&nbsp;

To summarize: As you can see from this example, the key feature of ordinal variables is the intrinsic logical ordering of their categories, a logic that would be lost if we were to reorder them in any other way. This also tells you that ordinal variables contain more information than nominal variables do: namely, the ordering of the ordinal variable's categories. Ordering the categories of a variable is an additional action you can perform beyond simply listing them. Finally, the general order is the <em>only</em> additional information: the "distances" between the categories could vary and should not be measurable/quantifiable. If the latter is not the case, you are already moving into interval/ratio territory.
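The "order without distance" idea can be seen concretely in a minimal Python sketch (the category names are the hypothetical ones from the example above; the code is illustrative, not part of any real dataset):

```python
# Hypothetical ordinal variable: educational attainment.
# The list position encodes the inherent order of the categories --
# we can compare ranks, but the "gaps" between ranks have no size.
EDUCATION = ["no degree", "secondary/high school", "Associate's",
             "Bachelor's", "Master's", "doctorate/PhD"]

def rank(category):
    """Return the position of a category in the inherent order."""
    return EDUCATION.index(category)

# Order comparisons are meaningful...
print(rank("Master's") > rank("Bachelor's"))   # True

# ...but rank differences are NOT meaningful "distances":
# nothing says a Master's sits "2 units" above an Associate's.
```

Only the comparison itself carries information here; the numeric ranks are a convenience, not a measurement.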

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!</em> <em>1.4. Ordinal Variables</em></p>

</header>
<div class="textbox__content">

At the risk of being repetitive, I'll ask that you try to think of three different ordinal variables. Can you explain why they should be classified as ordinal? Remember to make sure that the internal logical ordering of the categories of your variables is of the "more/less" type rather than involving precise measurement.

</div>
</div>
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1431</wp:post_id>
		<wp:post_date><![CDATA[2019-07-30 16:13:46]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-30 20:13:46]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-3-2-ordinal-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.3.3 Interval and Ratio Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-3-3-interval-and-ratio-variables/</link>
		<pubDate>Tue, 30 Jul 2019 20:34:31 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1437</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Going back to the original <em>Do It!</em> <em>1.2.</em> exercise, I am sure that you found imagining the categories of <em>exam test scores</em> and <em>age</em> the easiest, as they would simply be numbers. Perhaps something like <em>30, 65, 72, 88, 95,</em> etc.<em>...</em> points out of 100 in the former case (though I know you don't want to imagine a test score of 30 on any exam!), and, if we're imagining college students, something like <em>18, 19, 20, 22, 23, 24,</em> etc.... years in the latter. Notice the major difference from the categories of the nominal and ordinal variables we discussed above: now we are working with <em>numbers</em>. Not only are the <em>exam scores</em> and <em>age</em> categories composed of numbers (as opposed to words) but they are also ordered in <em>measurable "distances"</em>. In other words, <strong>there is a stable/unchangeable unit by which the "distance" between any two categories can be measured</strong>: <em>a point</em> in the <em>exam scores</em> case and <em>a year</em> in the <em>age</em> case. <strong>This unit is called the <em>unit of measurement</em> of interval and ratio variables.</strong>

&nbsp;

<em>Wait a second</em>, you're probably thinking now -- the <em>exam scores</em> variable above lists 30, 65, 72... as categories, and a quick calculation reveals that the "distance" between 30 and 65 is thirty-five points, while the "distance" between 65 and 72 is only seven points. Thirty-five is clearly much bigger (five times bigger, to be precise) than seven: isn't that as arbitrary as the "distances" across the <em>educational attainment</em> categories above? Well, no. The difference is that for interval and ratio variables the information contained in the categories and their "distances" from each other is not simply of the more/less, bigger/smaller, left/right, etc. kind but is readily quantifiable and measurable in precise, stable units. In practical terms, you can specify <em>exactly how much</em> smaller/bigger one category is than another (i.e., 65 points is thirty-five points more than 30 points; a 22-year-old is two years older than a 20-year-old) -- unlike with ordinal variables, where we know a Bachelor's is a bigger educational attainment than secondary/high school but there is no agreed unit to measure the "distance" precisely (as it's measured neither in years nor in number of degrees).

&nbsp;

Furthermore, my <em>exam scores</em> example lists 30, 65, 72... but I simply chose these numbers at random: I could have just as easily listed 25, 45, 70..., or 12, 54, 69..., etc. The point here is that one can have <em>any</em> number between 0 and 100 (in a conventional 100-point exam) as a <em>potential</em> score, or be a college student of <em>any</em> potential age (say, more than 5 years old),[footnote]You might think I'm joking but do look Michael Kearney up. He graduated high school at age 6 and had earned his Bachelor's degree at age 10, thus making him the youngest university graduate on record (January 15, 1995 | Richard Kahlenberg | The LA Times).[/footnote] while the categories of an ordinal variable are <em>fixed</em>, or set, during operationalization (to a usually relatively small number), and cannot potentially be anything else (unless you operationalize the variable in a different way, which would result in a new variable).

&nbsp;

Finally, a happy corollary to the fact that interval and ratio variables' categories are composed of numbers is our ability to perform mathematical operations on them, beyond simple comparisons -- something we can do neither with nominal nor with ordinal variables. (Exactly what kinds of mathematical operations we can perform on interval and ratio variables, you'll see in Chapter 2.)

&nbsp;

<strong>To summarize, interval and ratio variables have three defining features: 1) their categories (typically called <em>values</em>) are composed of numbers; 2) the categories follow an order inherent in the fact that there is a measurable, unit-based scale, so that we can speak of a variable's unit of measurement; and 3) we can perform mathematical operations on these values.</strong>
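As a quick illustration of the third feature, here is a minimal Python sketch (the ages are made-up values, in line with the hypothetical examples above):

```python
# Hypothetical ages of five college students (ratio scale, unit = 1 year).
ages = [18, 19, 20, 22, 24]

# The unit of measurement makes "distances" exact:
# a 22-year-old is exactly 3 years older than a 19-year-old.
gap = ages[3] - ages[1]

# Ordinary mathematical operations on the values are meaningful too:
mean_age = sum(ages) / len(ages)

print(gap)        # 3
print(mean_age)   # 20.6
```

Neither the subtraction nor the average would make sense for the categories of a nominal or ordinal variable.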

&nbsp;

Wait though... Why did I say that interval and ratio variables are different when I keep defining them together, and in the same way? Not to worry: the difference comes next, as I saved what students usually find the trickiest part for last.

&nbsp;

At the risk of oversimplification (and, inevitably, exaggeration), interval scales are "made up" while ratio scales are "real". The difference is purely conceptual: you have to know whether the scale on which the variable is measured is "artificially designed", as it were, or whether it exists as some sort of "objective reality". A rule-of-thumb piece of advice on differentiating them that you may encounter is the "existence of a true zero": <strong>ratio variables have a <em>true zero</em> while interval variables do not</strong>. (Clear as mud, eh? I did say it's tricky.)

&nbsp;
<p style="text-indent: 18.6667px">Examples usually help make this conundrum seem less of a conundrum.</p>
&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 1.3. Interval Variables: Temperature</em></p>

</header>
<div class="textbox__content">
<p style="text-indent: 18.6667px">Let's take the classic example of an interval-scale variable, <em>temperature</em>. If you go by centigrade, 0°C is, I'm sure you know, the temperature at which water freezes. If you go by Fahrenheit, however, 0°F is... well, nothing in particular; it's just equal to about -18°C. On the other hand, if you are more scientifically minded, you might go by Kelvin, where 0 K is the coldest-cold-and-nothing-could-ever-be-colder temperature (a.k.a. <em>absolute zero</em>), equal to -273.15°C, or -459.67°F.</p>
&nbsp;
<p style="text-indent: 18.6667px">Have you ever wondered why there are three scales for measuring temperature? Where did they come from? They were "artificially designed" (or, you might say, invented) by people: Anders Celsius, Daniel Fahrenheit, and Lord Kelvin were the scientists who came up with them and whose names we use to indicate in which scale we have chosen to report temperature. Not only is a temperature of 0 degrees different in all three systems, <em>it doesn't indicate zero/nothing/an absence of something</em>.[footnote]Well, 0 K <em>does</em> indicate an absence of all energy, a temperature at which all atoms stop moving, but it is still not an absence of <em>temperature</em>.[/footnote] Temperatures of 0°C or 0°F do not indicate an <em>absence</em> of temperature or <em>no</em> temperature whatsoever; they are purposefully (and, one could say, arbitrarily) chosen by people as a zero point on a human-made scale.</p>

</div>
</div>
&nbsp;

In a similar vein, a score of 0 points on an exam doesn't typically mean a complete absence of knowledge of the subject -- such a score usually simply means that the test-taker did not perform well on <em>that particular test</em>. Arguably, an easier test on the subject could be designed, and the test-taker would likely score more points.

&nbsp;

Contrast this to our other variable from the original <em>Do It!</em> <em>1.2.</em> exercise, <em>age</em>. An age of 0 years means exactly that - that we are talking about an infant who hasn't yet reached their first birthday, and thus has completed 0 years of life (pardon the awkward phrasing).[footnote]Of course, we measure babies' ages in smaller units, like months, or weeks, or even days and hours -- just as we can measure any person's precise age that way. However, we <em>usually</em> don't do it for anyone who's not an infant, so I'll leave it at that.[/footnote] Or consider a variable for, say, income: an income of $0 means the complete absence of income in dollars, i.e., <em>no income</em>. Neither age nor income is "made up": they exist regardless of how we measure them, and a zero on either indicates an absence of something (<em>time</em> in the former case, <em>dollars</em> in the latter). Physical attributes like height and weight work the same way.
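A practical payoff of a true zero is that <em>ratios</em> make sense. A small Python sketch (the temperatures and ages are illustrative values, not data from any study):

```python
# Interval scale: Celsius has no true zero, so ratios depend on the scale.
def c_to_f(c):
    """Standard Celsius-to-Fahrenheit conversion."""
    return c * 9 / 5 + 32

warm, cool = 20.0, 10.0                  # degrees Celsius
print(warm / cool)                       # 2.0 -- but NOT "twice as hot":
print(c_to_f(warm) / c_to_f(cool))       # 1.36 -- the "ratio" changes with the scale!

# Ratio scale: age has a true zero, so the ratio survives a change of units.
older, younger = 40, 20                  # years
print(older / younger)                   # 2.0
print((older * 12) / (younger * 12))     # 2.0 -- still double, even in months
```

A 40-year-old really is twice as old as a 20-year-old; 20°C is not "twice as hot" as 10°C, because the zero point of the Celsius scale is a human convention.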

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It!</em> <em>1.5. Interval/Ratio Variables</em></p>

</header>
<div class="textbox__content">

You saw it coming: Try to come up with three interval/ratio variables (in addition to the ones I listed above). Try to differentiate between the interval and ratio scales and to identify which variable goes with each. Make sure you can explain what makes each variable interval- or ratio-scale.

</div>
</div>
&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1437</wp:post_id>
		<wp:post_date><![CDATA[2019-07-30 16:34:31]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-30 20:34:31]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-3-3-interval-and-ratio-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.4 Level of Measurement and Operationalization Considerations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-4-level-of-measurement-and-operationalization-considerations/</link>
		<pubDate>Tue, 30 Jul 2019 20:44:18 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1442</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

All in all, the difference between interval and ratio variables exists more on a conceptual level than in practical terms. As such, they are frequently grouped together in an interval/ratio category and treated the same for the purposes of statistical analysis. At this stage, while it's preferable to know the difference between them, it is far more important to be able to differentiate interval/ratio variables from nominal and ordinal ones.

&nbsp;

Here is proof of how tricky identifying the correct level of measurement of a variable can be.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><strong><span style="color: #ff0000">Watch Out!! #1 ... <span style="color: #ffffff">for Likert Scales</span></span></strong></em></p>

</header>
<div class="textbox__content">
<div class="textbox__content">

Most likely, at some point you have encountered survey questions that read something like this:

&nbsp;

"On a scale of 1 to 5, where 1 is the lowest and 5 is the highest, how much do you like ...?"

&nbsp;

... let's say, "chocolate". It is possible that you were presented with the numbers from 1 to 5 to choose from, or that they were accompanied by phrasing of the <em>strongly dislike, dislike, neither like nor dislike, like, strongly like</em> type. Now that you know about levels of measurement, how would you classify the variable <em>liking of chocolate</em>: as nominal, ordinal, or interval/ratio?

&nbsp;

Considering that the answers from which one can choose are listed as numbers, many students are tempted to classify such a variable as interval. However, the <em>strongly dislike, dislike, neither like nor dislike, like, strongly like</em> part should give you more clues. Ask yourself: is there a uniform unit that allows us to precisely measure the "distance" between <em>dislike</em> and <em>strongly dislike</em>? Or between <em>like</em> and <em>neither like nor dislike</em>? Is it even the same "distance"? We would be hard-pressed to say "yes" to any of these questions. We know that people who like chocolate like it more than those who neither like it nor dislike it, but we don't know <em>exactly how much</em> more. The numbers are there to make analyzing the responses easier, and as a sort of "code" for the ranking of preferences regarding chocolate, but substantively the ranking contains only order, not a precise measurement of these preferences.

&nbsp;

Variables such as these are called <strong><em>Likert scales</em></strong>. As I just explained, they are ordinal by construction (although, in some special cases -- for example, when the possible responses are not five but, say, ten or more -- they can be <em>treated</em> as interval for purposes of analysis). Researchers usually use them to capture people's preferences -- but preferences are generally "fuzzy" and not fully defined; they do not come with a built-in, measurable, uniform unit scale, despite the fact that it seems like the numbers represent one such scale.

&nbsp;

In Chapter 2 you will see that numbers can be used to represent a lot more than actual numbers. (And you were just starting to think identifying the level of measurement is easy!)

</div>
</div>
</div>
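One way to see that Likert numbers are codes rather than measurements: order-based summaries like the median respect the ordinal scale, while a mean quietly assumes equal "distances" between the codes. A hedged Python sketch (the labels, codes, and responses are the hypothetical chocolate example from above):

```python
from statistics import median

# Hypothetical Likert coding for "how much do you like chocolate?"
CODES = {"strongly dislike": 1, "dislike": 2,
         "neither like nor dislike": 3, "like": 4, "strongly like": 5}

responses = ["like", "strongly like", "like", "dislike", "strongly like"]
coded = sorted(CODES[r] for r in responses)   # [2, 4, 4, 5, 5]

# The median uses only the ORDER of the codes, so it is safe for ordinal data:
print(median(coded))   # 4, i.e., the "like" category

# A mean of these codes would treat the gap between "dislike" (2) and
# "like" (4) as exactly twice the gap between "like" (4) and
# "strongly like" (5) -- an assumption the scale itself never makes.
```

This is exactly why Likert-scale variables are usually analyzed with order-based tools unless there is a good reason to treat them as interval.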
&nbsp;

A further word of caution: the examples I used in this chapter might leave you with the impression that you can simply <em>hear</em> the name of a variable and be able to identify its scale of measurement. That would be wrong. My examples are <em>hypothetical</em>, and as such I <em>imagine</em> what the variables' categories might look like. (I also ask you to imagine variables and their categories in the <em>Do It!</em> exercises.) However, variables -- not hypothetical but <em>real</em> variables that we use for analysis -- exist in real datasets, where they have been operationalized in one specific, concrete way.

&nbsp;

As such, upon hearing the name of a variable, instead of <em>imagining</em> what it looks like, you should always - <em>always</em>! - <em>actually look</em> at it and its categories in the given/specific dataset of which the variable is a part. <strong>Determining an existing variable's scale of measurement requires exploring <em>the actual variable as it was created</em></strong>. Recall that there is more than one way to operationalize a variable. Thus, the researcher(s) who created a variable you might be looking into may well have created it differently than you would, or differently than some other researchers might have created theirs -- <em>even if these variables</em> (the different researchers' and your hypothetical one) <em>have the same name</em>.

&nbsp;

This leads us to the question: <strong>Can the same concept be operationalized at different levels of measurement?</strong> The answer lies in the nature of the concept (or that of the hypothetical variable, if you prefer). Let's go back to the example of <em>income</em> from the previous section on operationalization. There I provided you with a few different ways to create income categories. One was based on a yes/no question ("Is your income below...?" a specific number), and a few more ways listed several categories based on income groups ("0-19,999", "20,000-29,999",... etc.). Additionally, we could ask people to supply their specific income, rounded to the nearest dollar. Alternatively, thinking along the lines of survey questions, these would result in a) a yes/no response, b) a multiple-choice answer, and, lastly, c) an open-ended, respondent-supplied answer.

&nbsp;

In this way, we can say that we can successfully operationalize the concept of <em>income</em> at three different levels of measurement: a) nominal, b) ordinal, and c) ratio, respectively. This is only possible because of the <em>numerical</em> nature of income: income is monetary, and money is countable - and expressed in <em>numbers</em>. We can <em>choose</em> to create several categories of income (out of the numbers involved), or we could <em>choose</em> to create only a binary variable (i.e., one with two categories) to indicate an income below/above some threshold. In choosing either of these, we also make the decision to forego, or lose, the more specific information of the actual income of everyone we ask. Logically, though, we can only forego/lose information that is otherwise potentially available: we cannot <em>make</em> information <em>up</em>.
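The three operationalizations of income can be sketched in a few lines of Python (the bracket boundaries and the $25,000 threshold are made-up values for illustration):

```python
# One respondent's exact income in dollars (ratio scale).
income = 27_500

# (c) Ratio: keep the number itself.
ratio_version = income

# (b) Ordinal: recode DOWN into ordered brackets (boundaries invented here).
if income < 20_000:
    ordinal_version = "0-19,999"
elif income < 30_000:
    ordinal_version = "20,000-29,999"
else:
    ordinal_version = "30,000 or more"

# (a) Nominal (binary): recode further down to a hypothetical
# yes/no question, "Is your income below $25,000?"
nominal_version = "yes" if income < 25_000 else "no"

print(ratio_version, "|", ordinal_version, "|", nominal_version)
# Each step down discards information: from "no" alone we could never
# recover the bracket, let alone the exact dollar amount.
```

Notice that the code only ever moves from more information to less; there is no function that could take `"no"` and return `27_500`.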

&nbsp;

What it all boils down to is that <strong>we can operationalize <em>down</em>: from the highest level of measurement possible for a variable towards the lower ones - but never vice versa.</strong> A concept of numerical nature, i.e., an interval/ratio variable, can be operationalized <em>down</em> and created as an ordinal variable, or even further down as a nominal variable, losing potential information (actual numbers and order) along the way. A concept of ordinal nature can also be operationalized down to a nominal scale, again foregoing the potential information of order. However, a "naturally" nominal variable cannot be operationalized as anything else but nominal: there is simply no further information available. The same goes for "naturally" ordinal variables - they cannot be operationalized as interval/ratio, as the only information we can have is order, while precision and measurable, defined, constant units are not possible to obtain.[footnote]Beyond the original operationalization, sometimes researchers actually <em>recode</em> variables down within an existing dataset. Since they start with an interval/ratio variable, they can choose which level of measurement they want to use, and go back and forth between ordinal and nominal and back to interval/ratio. They can do this only because the information has initially been collected at the interval/ratio level of detail. If the original information is collected as nominal or ordinal data, no further information can be accessed: <em>recoding up is impossible.</em>[/footnote]]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1442</wp:post_id>
		<wp:post_date><![CDATA[2019-07-30 16:44:18]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-30 20:44:18]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-4-level-of-measurement-and-operationalization-considerations]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>1.5 Discrete and Continuous Variables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/1-5-discrete-and-continuous-variables/</link>
		<pubDate>Tue, 30 Jul 2019 21:01:27 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1448</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

I will introduce a final useful typology by which variables can be grouped: discrete and continuous.

&nbsp;

By definition, variables called <em>discrete</em> (note, not discreet!) have a finite number of categories (i.e., there is "space" between the categories, and nothing occupies that space), while variables called <em>continuous</em> have a potentially infinite number of values (i.e., between any two given values another value can exist, in smaller and smaller "spaces", to infinity). To make things easier to understand, and with more than a little risk of oversimplification, <strong>in a very broad sense you can think of nominal and ordinal variables as discrete and of interval/ratio variables as continuous</strong>.[footnote]Technically speaking, in theory nominal and some ordinal variables are categorical, ordinal variables with numerical categories are discrete, and interval/ratio variables are continuous. In practice, things are less clear-cut.[/footnote] For example, <em>hair colour, religious affiliation,</em> and <em>educational attainment</em> (as measured in educational degrees) are all discrete: they have a finite number of <em>discrete</em> categories.

&nbsp;

On the other hand, age, income, or exam scores are all continuous: a number (value) can exist between any two given values, depending on how precise you want your measurement to be. To take <em>age</em>, for example, if two people report being 20 and 22, respectively, it's obviously possible that another person is 21. However, we need not round to full years: between two people aged 21 and 22, a value of 21.5 (or 21 years and 6 months) can exist. Further, between the ages of 21 years and 21 years and 6 months, we can have a value of 21 years and 3 months, and so on, until we are down to counting days, then hours, then minutes, then seconds, then milliseconds, then microseconds, then nanoseconds, etc.... The point is that, in theory, there is always another number between any two numbers (which can be represented by the possibility of an infinite number of digits after the decimal point). The same applies to income and exam scores too.
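The "always a value in between" idea can be sketched by repeatedly halving the interval between two ages (a purely illustrative loop; in theory it could run forever):

```python
# Between any two ages another age always exists: take the midpoint,
# then the midpoint of the remaining interval, and so on.
low, high = 20.0, 21.0
for _ in range(4):
    mid = (low + high) / 2
    print(mid)        # 20.5, then 20.25, then 20.125, then 20.0625
    high = mid        # narrow the interval and repeat
```

Four steps already take us from whole years down to roughly three weeks of age; nothing but measurement precision stops the subdivision.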

&nbsp;

In practice, however, things are different. In sociological research (as in other similar disciplines), the data collected are <em>empirically</em> discrete, as the values collected are finite in number and are typically rounded to whole numbers: we don't bother to measure age in anything but years, income in anything but dollars (and not cents), etc. Still, we usually call interval/ratio variables continuous because of the <em>potential</em> for an infinite number of values.

&nbsp;

At the same time, however, some ratio variables are truly discrete. Think, for example, about a measure called <em>number of children</em> of the respondent. Clearly, there is no possibility of an infinite number of values, just as with any "number of people"-type variable: people can only be counted in whole numbers, and the count is always finite.

&nbsp;

All this is undoubtedly confusing, so here is a practical tip for applied research, and what you need to focus on. Regardless of whether a variable is discrete or continuous <em>in theory</em>, in practice all variables you will encounter in real-life, actual datasets will be discrete. <strong>What we do is <em>treat</em> some variables as discrete, and other variables as continuous, <em>for the purposes of statistical analysis</em></strong>. The rule of thumb is to make the differentiation based on the number of categories/values: <strong><em>typically</em> nominal and ordinal variables have relatively few categories, so we treat them as discrete, while interval/ratio variables <em>typically</em> have a relatively large number of values, so we treat them as continuous.</strong> If, however, an ordinal variable has a relatively large number of categories, it may be treated as continuous, and, on the flip side, if an interval/ratio variable has relatively few values, it may be treated as discrete. Generally, and assuming proper justification (i.e., a large number of categories/values), the decision to treat an ordinal variable as continuous or an interval/ratio variable as discrete remains a matter of the researcher's discretion.
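The rule of thumb can be sketched as a tiny heuristic function (the cutoff of 10 distinct values is just one plausible choice, and the example variables are invented):

```python
def suggested_treatment(values, cutoff=10):
    """Heuristic only: treat a variable as continuous when it takes
    'relatively many' distinct values, discrete otherwise.
    The cutoff is a researcher's judgment call, not a statistical law."""
    return "continuous" if len(set(values)) >= cutoff else "discrete"

likert_codes = [1, 2, 2, 3, 4, 5, 5, 3]                       # 5 distinct codes
student_ages = [18, 19, 20, 21, 22, 23, 24, 25, 27, 30, 34]   # 11 distinct values

print(suggested_treatment(likert_codes))   # discrete
print(suggested_treatment(student_ages))   # continuous
```

The point of the sketch is that the decision rests on counting distinct categories/values, not on the variable's name or on whether its categories happen to look like numbers.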

&nbsp;

Finally, what is the magic number in the "relatively large number of categories/values" rule? This also depends, but from what I have seen in practice, the number is around 7-10 categories/values for most purposes (i.e., if a variable has more categories/values than that, it's treated as continuous, and if it has fewer, it is treated as discrete).

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1448</wp:post_id>
		<wp:post_date><![CDATA[2019-07-30 17:01:27]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-07-30 21:01:27]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[1-5-discrete-and-continuous-variables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>3</wp:post_parent>
		<wp:menu_order>8</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.3.1 Relative Frequency: Adding Percentages</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-1-adding-percentages/</link>
		<pubDate>Thu, 08 Aug 2019 21:44:59 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1495</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Simply counting the frequency of a variable's different categories (or the number of specific responses) is rarely enough. Often we also want to know what <em>proportion</em> -- or what <em>percentage</em> -- of the total each category represents. This is especially important when comparing across two or more different groups. Thus we will stop on our way to frequency tables to undertake a brief side quest into <em>relative frequency</em> territory.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #3</strong></span>... for Cross-Group Comparisons Using Counted Numbers</em></p>

</header>
<div class="textbox__content">

Imagine that researchers are conducting a study on eating habits and have interviewed 170 people; 102 identified as men and 68 identified as women. Say that the researchers found that 17 of the men and 13 of the women reported a vegan diet. Can the researchers conclude that men tend to favour vegan diets more than women do?

If you go by the actual, counted numbers reported, you may decide that yes, the researchers' conclusion is correct, as 17 is more than 13, i.e., four more men than women reported eating vegan. This, however, would be wrong. We cannot compare the two groups (men and women) directly since the groups have different sizes. That is, comparing the numbers as counted in the two groups has little meaning since it does not take group size into account. Yes, more men report eating vegan, but men in the study outnumber women by 34 to start with. Thus, <strong>maybe we find more vegan men than women simply because there are more men than women in the study.</strong> What we should be asking ourselves instead is whether a larger <em>proportion</em> of men eat vegan, compared to women -- and the correct answer would require a comparison of the numbers <strong>relative to group size</strong>.

&nbsp;

A quick calculation reveals that 17 out of 102 is actually <em>less</em> than 13 out of 68:

&nbsp;

$$\frac{17}{102}=0.167$$

$$\frac{13}{68}=0.191$$

&nbsp;

That is, <strong>the proportion of vegan men (0.167) is smaller than the proportion of vegan women (0.191)</strong>, so no, we cannot say that men tend to be vegan more than women do. Rather, it's the other way around: <strong>women tend to eat vegan more than men do, because vegan women make up a higher proportion of their group (i.e., the number for women is higher relative to their group size).</strong>

To conclude, <strong>never use numbers as counted to compare between groups</strong> (unless they are of equal size). To make comparison possible -- and meaningful -- you should <strong>always use <em>proportions or percentages </em></strong>(i.e., the numbers relative to the size of each group).

</div>
</div>
&nbsp;

A bit more notation then: if we denote <em>frequency</em> by <em>f</em>, and recall that <em>N</em> stands for <em>number</em> (of elements in a dataset, of people in a group, etc.), it is easy to see that <em>proportion</em> -- denoted by <em>p</em> -- should be

&nbsp;

$$\frac{f}{N}=p$$.

&nbsp;

While actual numbers represent frequency, proportions are one way of expressing <strong><em>relative frequency</em></strong>. You are probably more familiar with another way of expressing relative frequency -- <strong>percentages</strong>.

&nbsp;

In the example I used in <em>Watch Out!! #3</em> above, we concluded that more women than men were vegan based on the fact that the proportion of vegan women (0.191) was higher than the proportion of vegan men (0.167). In everyday life, people usually use percentages to express this. <strong>To convert proportions to percentages you only need to multiply by 100</strong>[footnote]After all, percent or per cent comes from the Latin "per centum", meaning "by a hundred"; i.e., whatever proportion you are expressing, standardized by a hundred.[/footnote]:

&nbsp;

$$\frac{f}{N}(100)=percent$$

&nbsp;

Thus, we get the following percentages when comparing vegan men and women from the <em>Watch Out!! #3 </em> above:

&nbsp;

$$0.167(100)=16.7\%$$ and

$$0.191(100)=19.1\%$$.

&nbsp;

That is, we could rephrase our finding and say that since only 16.7 percent of men reported being vegan while 19.1 percent of women did, clearly women are more likely to be vegan based on this particular group of respondents.

&nbsp;

Note that <strong>while proportions range from 0 to 1 and typically get rounded to three digits after the decimal point</strong> (e.g., 0.167 and 0.191), <strong>percentages range from 0 to 100 and usually get rounded to one or two digits after the decimal point</strong> (e.g., 16.7% and 19.1%). Also note that <strong>differences in percentages are expressed in <em>percentage points</em>, not in percent</strong>: in the current example, the difference between men and women who eat vegan is (19.1% - 16.7% =) 2.4 percentage <em>points</em> in favour of women being vegan, <em>not</em> 2.4 percent.
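The proportion and percentage calculations above can be reproduced in a few lines of Python (a sketch using the vegan example's frequencies; the variable names are mine, and the rounding mirrors the conventions just described):

```python
# Vegan example from Watch Out!! #3: 17 of 102 men, 13 of 68 women.
men_vegan, men_total = 17, 102
women_vegan, women_total = 13, 68

# Proportion: p = f / N, rounded to three decimal places.
p_men = round(men_vegan / men_total, 3)        # 0.167
p_women = round(women_vegan / women_total, 3)  # 0.191

# Percent: (f / N) * 100, rounded to one decimal place.
pct_men = round(men_vegan / men_total * 100, 1)        # 16.7
pct_women = round(women_vegan / women_total * 100, 1)  # 19.1

# A difference between percentages is expressed in percentage POINTS.
diff = round(pct_women - pct_men, 1)  # 2.4 percentage points
```

Note how the comparison flips once group size is accounted for: the raw counts favour men (17 vs. 13), but the proportions favour women.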

&nbsp;

A final way to express relative frequency is the <strong>ratio</strong>, where a ratio is simply one frequency/count relative to another:

&nbsp;

$$\frac{f_1}{f_2}=ratio$$

&nbsp;

Using the numbers from the <em>Watch Out!! #3</em> above, we can say that in the group of 170 respondents (102 men and 68 women), we have a men-to-women ratio of 1.5 -- or, men in the study outnumber women by 1.5 to 1, since

&nbsp;

$$\frac{f_m}{f_w}=\frac{102}{68}=1.5$$.

&nbsp;

It's easy to see that if we want the women-to-men ratio, we only need to switch the numerator and denominator of the ratio:

&nbsp;

$$\frac{f_w}{f_m}=\frac{68}{102}=0.7$$

&nbsp;

This still tells us that men outnumber women, as for every 1 man there is only "0.7 of a woman". Since this type of fraction can, depending on the context, lead to awkward phrasing (as in this case), you may choose to report a ratio in whichever way makes for the easiest interpretation.
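As a small Python sketch of the same idea (variable names are mine), a ratio is just one frequency divided by another, and switching numerator and denominator switches the direction of the comparison:

```python
# Frequencies from the Watch Out!! #3 example.
f_men, f_women = 102, 68

# ratio = f1 / f2; the order decides which group is the reference.
men_to_women = round(f_men / f_women, 1)  # 1.5 men for every 1 woman
women_to_men = round(f_women / f_men, 1)  # "0.7 of a woman" for every 1 man
```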

&nbsp;

Relative frequencies are all nice and good, but let's go back to our main quest: the frequency table. Since we established that actual numbers as reported are meaningless for comparison purposes and that we need relative frequencies instead, it only makes sense to add a relative frequency column to our <em>educational attainment</em> Table 2.1 from Example 2.2 (B).

The percentages in Table 2.2 below have all been calculated using the steps described above: 1) obtain the proportion, and 2) multiply by 100. For example, only one of our original 21 respondents had no degree. The percentage of the 21 respondents with no degree is then:

&nbsp;

$$\frac{f}{N}(100)=\frac{1}{21}(100)=0.047(100)=4.7\%$$

&nbsp;

The rest of the categories' percentages are calculated in the same vein.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.2 (C) Hypothetical Data on Educational Attainment, Organized and with Relative Frequencies Added</em></p>

</header>
<div class="textbox__content">

<em>Table 2.2 Educational Attainment by Frequency and Percent</em>
<table style="border-collapse: collapse;width: 24.0408%;height: 121px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px;text-align: center"><strong>Degree</strong></td>
<td style="width: 7.3832%;height: 15px">
<p style="text-align: center"><strong>  Frequency</strong></p>
</td>
<td style="width: 9.9473%">
<p style="text-align: center"><strong>Percent</strong></p>
</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">   No degree</td>
<td style="width: 7.3832%;height: 15px;text-align: center">1</td>
<td style="width: 9.9473%;text-align: center">4.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">   Secondary/High School</td>
<td style="width: 7.3832%;height: 15px;text-align: center">6</td>
<td style="width: 9.9473%;text-align: center">28.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">   Associate's</td>
<td style="width: 7.3832%;height: 15px;text-align: center">3</td>
<td style="width: 9.9473%;text-align: center">14.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">   Bachelor's</td>
<td style="width: 7.3832%;height: 15px;text-align: center">5</td>
<td style="width: 9.9473%;text-align: center">23.8</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">   Master's</td>
<td style="width: 7.3832%;height: 15px;text-align: center">2</td>
<td style="width: 9.9473%;text-align: center">9.5</td>
</tr>
<tr style="height: 16px">
<td style="width: 30.7511%;height: 16px">   PhD</td>
<td style="width: 7.3832%;height: 16px;text-align: center">1</td>
<td style="width: 9.9473%;text-align: center">4.7</td>
</tr>
<tr>
<td style="width: 30.7511%">   Didn't answer</td>
<td style="width: 7.3832%;text-align: center">3</td>
<td style="width: 9.9473%;text-align: center">14.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 30.7511%;height: 15px">  <strong> TOTAL</strong></td>
<td style="width: 7.3832%;text-align: center;height: 15px"><strong>21</strong></td>
<td style="width: 9.9473%;text-align: center"><strong>100.0</strong></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

This way we can easily see how the respondents are distributed across the different educational attainment categories and each category's share as a fraction of the total. If we had another group of respondents, we could easily compare between our initial group of 21 and the second hypothetical group by using the percentages above. Or can we?]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1495</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 17:44:59]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 21:44:59]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-3-1-adding-percentages]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.3.2 Missing Data: Adding Valid Percentages</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-2-missing-data-adding-valid-percentages/</link>
		<pubDate>Thu, 08 Aug 2019 21:54:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1499</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

If you've paid attention so far, you must have noticed that three of our 21 respondents provided a "Didn't answer" response when asked about their educational attainment. Sometimes respondents may refuse to answer a question, or the question may not have been applicable to them and wasn't asked, or a response might not get recorded due to an error, etc. In short, sometimes we have a case of what is known as <em>missing data</em>.

&nbsp;

What do we know about the educational attainment of the three individuals who, for whatever reason, didn't answer this question?

&nbsp;

Nothing.

&nbsp;

Can we in some way infer their educational attainment? Not with the data provided in the example.

&nbsp;

So then what do we do? How do we analyze our <em>educational attainment</em> variable?

&nbsp;

The most frequent -- and strongly recommended (especially for people just starting on their journey to research) -- course of action is to simply<em> drop</em> the missing cases.[footnote]Depending on the particular data and particular situation, and assuming strong justification, researchers experienced in data analysis may have different options, such as estimation, imputation of means, etc. These, however, are beyond the scope of this text. The safest action for students/beginners to take remains dropping any missing cases from the analysis. See <a href="https://www.iriseekhout.com/missing-data/missing-data-methods/imputation-methods/">https://www.iriseekhout.com/missing-data/missing-data-methods/imputation-methods/</a> for a discussion. [/footnote] Missing cases have no part in any analysis and using them as they are would inevitably compromise conclusions -- after all, we have no information on what we want to know about them, and we cannot make that information up.

&nbsp;

Generally, how statistical software deals with missing data by default may vary. SPSS's default is to skip missing cases, so that analysis is always based on valid cases only.

&nbsp;

As well, SPSS provides a separate column in <em>Variable View</em> indicating which values in the data stand for a missing data point. As discussed in <span style="color: #000000">Section 2.1 </span>(<a href="https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/">https://pressbooks.bccampus.ca/simplestats/chapter/2-1-data/</a>), you can find the <em>coding</em> of the values in the <em>Values</em> column in <em>Variable View</em>. Clicking the specific cell in that column opens up a window with the value codes. There you may find several types of missing data, typically values such as "Valid skip"/"Not applicable" (the respondent was not asked the question on which the variable is based, due to a previous answer)[footnote]For example, if a respondent has indicated previously that they didn't smoke, a subsequent question about how often they smoked would make no sense; the respondent then would be "validly skipped" from answering this subsequent question.[/footnote], "Don't know" (the respondent did not know the answer to the question), "Refusal" (the respondent refused to answer the question), "Not stated" (when the question should have been answered/an answer should have been recorded but, for whatever reason, it hasn't been), etc.

&nbsp;

Apart from "Not applicable", the codes listed here are standard Statistics Canada codes used in all their datasets and can be found in any Statistics Canada dataset documentation.[footnote]Currently, Statistics Canada uses 6, 96, 996, etc. for "Valid skip"; 7, 97, 997, etc. for "Don't know"; 8, 98, 998, etc. for "Refused"; and 9, 99, 999, etc. for "Not stated". [/footnote]
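Outside SPSS, the same drop-the-missing-cases logic can be sketched in plain Python. The response values below are hypothetical; the codes follow the Statistics Canada convention from the footnote (the two-digit versions, for a variable whose valid values are one digit):

```python
# Hypothetical coded responses to a single survey question.
responses = [1, 3, 97, 2, 99, 1, 96, 4]

# Statistics Canada-style missing-data codes.
MISSING_CODES = {96: "Valid skip", 97: "Don't know",
                 98: "Refusal", 99: "Not stated"}

# Drop missing cases before any analysis.
valid = [r for r in responses if r not in MISSING_CODES]

n_total, n_valid = len(responses), len(valid)  # 8 total, 5 valid
```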

&nbsp;

So given that we had three cases of missing data within our group of 21 respondents, are the percentages reported in the previous sub-section's Table 2.2 in Example 2.2 (C) <em>valid</em> to use?

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #4</strong></span>... for Findings Based on Missing Data</em></p>

</header>
<div class="textbox__content">

This will be a short warning but it deserves its own scary-red <em>Watch Out!!</em> reiteration: do not trust analysis and findings that include missing cases, as they will be distorted and unreliable. Missing data is exactly that -- <em>missing</em>. It simply does not exist. As a beginner researcher, always make sure you have dropped (i.e., excluded) any missing cases before analyzing your data and reporting any results.

</div>
</div>
&nbsp;

Considering that Table 2.2 included the missing data in the calculation of percentages, let us correct that by modifying it to include another column: <strong><em>valid</em> percentages</strong>.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.2 (D) Hypothetical Data on Educational Attainment, Organized and with Relative Frequencies and Valid Percentages Added</em></p>

</header>
<div class="textbox__content">

<em>Table 2.3 Educational Attainment by Frequency, Percent and Valid Percent</em>
<table style="border-collapse: collapse;width: 79.1507%;height: 209px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 10.9909%;text-align: center;height: 59px"><strong> </strong></td>
<td style="width: 37.8257%;height: 59px;text-align: center"><strong>Degree</strong></td>
<td style="width: 4.26491%;height: 59px">
<p style="text-align: center"><strong>  Frequency</strong></p>
</td>
<td style="width: 17.6642%;height: 59px">
<p style="text-align: center"><strong>Percent</strong></p>
</td>
<td style="width: 18.7802%;height: 59px">
<p style="text-align: center"><strong>Valid Percent</strong></p>
</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px">Valid</td>
<td style="width: 37.8257%;height: 15px">   No degree</td>
<td style="width: 4.26491%;height: 15px;text-align: center">1</td>
<td style="width: 17.6642%;text-align: center;height: 15px">4.7</td>
<td style="width: 18.7802%;text-align: center;height: 15px">5.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   Secondary/High School</td>
<td style="width: 4.26491%;height: 15px;text-align: center">6</td>
<td style="width: 17.6642%;text-align: center;height: 15px">28.6</td>
<td style="width: 18.7802%;text-align: center;height: 15px">33.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   Associate's</td>
<td style="width: 4.26491%;height: 15px;text-align: center">3</td>
<td style="width: 17.6642%;text-align: center;height: 15px">14.3</td>
<td style="width: 18.7802%;text-align: center;height: 15px">16.7</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   Bachelor's</td>
<td style="width: 4.26491%;height: 15px;text-align: center">5</td>
<td style="width: 17.6642%;text-align: center;height: 15px">23.8</td>
<td style="width: 18.7802%;text-align: center;height: 15px">27.8</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   Master's</td>
<td style="width: 4.26491%;height: 15px;text-align: center">2</td>
<td style="width: 17.6642%;text-align: center;height: 15px">9.5</td>
<td style="width: 18.7802%;text-align: center;height: 15px">11.1</td>
</tr>
<tr style="height: 16px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   PhD</td>
<td style="width: 4.26491%;height: 15px;text-align: center">1</td>
<td style="width: 17.6642%;text-align: center;height: 15px">4.7</td>
<td style="width: 18.7802%;text-align: center;height: 15px">5.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px"><strong>   Total Valid</strong></td>
<td style="width: 4.26491%;text-align: center;height: 15px"><strong>18</strong></td>
<td style="width: 17.6642%;text-align: center;height: 15px"><strong>85.6</strong></td>
<td style="width: 18.7802%;text-align: center;height: 15px"><strong>100.0</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px">Missing</td>
<td style="width: 37.8257%;height: 15px">   Didn't answer</td>
<td style="width: 4.26491%;text-align: center;height: 15px">3</td>
<td style="width: 17.6642%;text-align: center;height: 15px">14.3</td>
<td style="width: 18.7802%;text-align: center;height: 15px"></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">   Total Missing</td>
<td style="width: 4.26491%;text-align: center;height: 15px">3</td>
<td style="width: 17.6642%;text-align: center;height: 15px">14.3</td>
<td style="width: 18.7802%;text-align: center;height: 15px"></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 37.8257%;height: 15px">    TOTAL</td>
<td style="width: 4.26491%;text-align: center;height: 15px">21</td>
<td style="width: 17.6642%;text-align: center;height: 15px">100.0</td>
<td style="width: 18.7802%;text-align: center;height: 15px"><strong> </strong></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

As you see in the modified Table 2.3 above, I have separated the missing cases from the valid cases (the cases for which we have educational attainment data). <strong>Since we have only 18 valid cases, we should use only those 18 cases for any calculations and analysis -- and not the total of 21 cases</strong> (which includes the missing). Thus, instead of having just

&nbsp;

$$\frac{f}{N}(100)=\frac{1}{21}(100)=0.047(100)=4.7\%$$

&nbsp;

along with the rest of the categories' percentages calculated in this way, we should calculate the categories' <em>valid</em> percentages, discarding the three missing cases, like this:

&nbsp;

$$\frac{f}{N}(100)=\frac{1}{18}(100)=0.056(100)=5.6\%$$

&nbsp;

(As usual, I only show you the calculation for the first category as the rest follow in the same way.)

&nbsp;

Note that although the table does include the percentages calculated with the missing data, <strong>the valid percentages are the only percentages you should use in your analysis and report in your findings</strong>.
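To make the valid-percent logic concrete, here is a short Python sketch using the valid frequencies from Table 2.3 (the dictionary structure and names are mine; the numbers are from the example):

```python
# Valid frequencies from Table 2.3; 3 further cases are missing.
freqs = {"No degree": 1, "Secondary/High School": 6, "Associate's": 3,
         "Bachelor's": 5, "Master's": 2, "PhD": 1}
n_valid = sum(freqs.values())  # 18 valid cases (21 total minus 3 missing)

# Valid percent: each frequency relative to the VALID total only.
valid_pct = {deg: round(f / n_valid * 100, 1) for deg, f in freqs.items()}
# e.g. "No degree" gives 5.6 here, versus the 4.7 reported for all 21 cases
```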

&nbsp;

Alright, you might say now: we have added percentages and valid percentages to the simple frequencies -- surely this means we have a complete frequency table by now.

&nbsp;

Sorry, no, not yet. One thing remains.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1499</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 17:54:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 21:54:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-3-2-missing-data-adding-valid-percentages]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.3.3 Summing Up: Adding Cumulative Percentages</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-3-summing-up-adding-cumulative-percentages/</link>
		<pubDate>Thu, 08 Aug 2019 22:06:04 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1504</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

The thing that remains to add to our frequency table is there only for convenience's sake. It can be useful to know, for example, what percentage of the respondents in our group do not have graduate degrees, or what percentage have not gone to university, etc. Of course, in our specific <em>educational attainment</em> example it would be easy to do the quick-and-dirty calculation of adding 11.1 percent (those with Master's degrees) to 5.6 percent (those with PhDs), thus finding that 16.7 percent of our respondents have graduate degrees; or adding 5.6 percent (those without a degree) to 33.3 percent (those with Secondary/High School) and finding that 38.9 percent of our respondents have not gone to university. Doing such calculations all the time, depending on the question, might get tedious at best -- and, at worst, it's also incorrect (hence the "quick-and-dirty" appellation).

&nbsp;

Let's then improve on our frequency table-in-progress a final time, shall we? The version below is the final version, ta-da!

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 2.2 (E) Frequency Table for Educational Attainment</em></p>

</header>
<div class="textbox__content">

<em>Table 2.4 Educational Attainment by Frequency, Percent, Valid Percent and Cumulative Percent</em>
<table style="border-collapse: collapse;width: 101.198%;height: 209px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 10.9909%;text-align: center;height: 59px"><strong> </strong></td>
<td style="width: 24.2285%;height: 59px;text-align: center"><strong>Degree</strong></td>
<td style="width: 16.1624%;height: 59px">
<p style="text-align: center"><strong>  Frequency</strong></p>
</td>
<td style="width: 14.1231%;height: 59px">
<p style="text-align: center"><strong>Percent</strong></p>
</td>
<td style="width: 13.4978%;height: 59px">
<p style="text-align: center"><strong>Valid Percent</strong></p>
</td>
<td style="width: 10.5232%">
<p style="text-align: center"><strong>Cumulative Percent</strong></p>
</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px">Valid</td>
<td style="width: 24.2285%;height: 15px">   No degree</td>
<td style="width: 16.1624%;height: 15px;text-align: center">1</td>
<td style="width: 14.1231%;text-align: center;height: 15px">4.7</td>
<td style="width: 13.4978%;text-align: center;height: 15px">5.6</td>
<td style="width: 10.5232%;text-align: center">5.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   Secondary/High School</td>
<td style="width: 16.1624%;height: 15px;text-align: center">6</td>
<td style="width: 14.1231%;text-align: center;height: 15px">28.6</td>
<td style="width: 13.4978%;text-align: center;height: 15px">33.3</td>
<td style="width: 10.5232%;text-align: center">38.9</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   Associate's</td>
<td style="width: 16.1624%;height: 15px;text-align: center">3</td>
<td style="width: 14.1231%;text-align: center;height: 15px">14.3</td>
<td style="width: 13.4978%;text-align: center;height: 15px">16.7</td>
<td style="width: 10.5232%;text-align: center">55.6</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   Bachelor's</td>
<td style="width: 16.1624%;height: 15px;text-align: center">5</td>
<td style="width: 14.1231%;text-align: center;height: 15px">23.8</td>
<td style="width: 13.4978%;text-align: center;height: 15px">27.8</td>
<td style="width: 10.5232%;text-align: center">83.3</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   Master's</td>
<td style="width: 16.1624%;height: 15px;text-align: center">2</td>
<td style="width: 14.1231%;text-align: center;height: 15px">9.5</td>
<td style="width: 13.4978%;text-align: center;height: 15px">11.1</td>
<td style="width: 10.5232%;text-align: center">94.4</td>
</tr>
<tr style="height: 16px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   PhD</td>
<td style="width: 16.1624%;height: 15px;text-align: center">1</td>
<td style="width: 14.1231%;text-align: center;height: 15px">4.7</td>
<td style="width: 13.4978%;text-align: center;height: 15px">5.6</td>
<td style="width: 10.5232%;text-align: center">100.0</td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px"><strong>   Total Valid</strong></td>
<td style="width: 16.1624%;text-align: center;height: 15px"><strong>18</strong></td>
<td style="width: 14.1231%;text-align: center;height: 15px"><strong>85.6</strong></td>
<td style="width: 13.4978%;text-align: center;height: 15px"><strong>100.0</strong></td>
<td style="width: 10.5232%;text-align: center"><strong> </strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px">Missing</td>
<td style="width: 24.2285%;height: 15px">   Didn't answer</td>
<td style="width: 16.1624%;text-align: center;height: 15px">3</td>
<td style="width: 14.1231%;text-align: center;height: 15px">14.3</td>
<td style="width: 13.4978%;text-align: center;height: 15px"></td>
<td style="width: 10.5232%;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">   Total Missing</td>
<td style="width: 16.1624%;text-align: center;height: 15px">3</td>
<td style="width: 14.1231%;text-align: center;height: 15px">14.3</td>
<td style="width: 13.4978%;text-align: center;height: 15px"></td>
<td style="width: 10.5232%;text-align: center"></td>
</tr>
<tr style="height: 15px">
<td style="width: 10.9909%;height: 15px"></td>
<td style="width: 24.2285%;height: 15px">    TOTAL</td>
<td style="width: 16.1624%;text-align: center;height: 15px">21</td>
<td style="width: 14.1231%;text-align: center;height: 15px">100.0</td>
<td style="width: 13.4978%;text-align: center;height: 15px"><strong> </strong></td>
<td style="width: 10.5232%;text-align: center"><strong> </strong></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

The final column I have added to our Table 2.4 is called <em><strong>Cumulative Percent</strong></em>. <strong>What it does is keep a sort of "running total"</strong>: adding the second category's frequency to the first and reporting the first two categories as a fraction of the total; adding the third category's frequency to the total of the first two and reporting the first three categories as a fraction of the total; etc. -- in effect <strong>adding each subsequent category to the total of all preceding ones, one by one, until all categories are added together.</strong>

&nbsp;

Note, however, that you should not add the percentages in the <em>Valid Percent</em> column to obtain cumulative percentages. Despite the quick-and-dirty trick I did before, I actually calculated the cumulative percentages based on the summed categories' frequencies -- and so should you, if you have to create a frequency table from scratch.

&nbsp;

Like this: there is one person without a degree and six people with secondary/high school degrees, or seven people combined. Therefore, the cumulative percent of these two categories is obtained thus:

&nbsp;

$$\frac{f_1+f_2}{N}(100)=\frac{1+6}{18}(100)=\frac{7}{18}(100)=0.389(100)=38.9\%$$

&nbsp;

and <em>not</em> by adding 5.6 percent (the person with no degree) to 33.3 percent (the ones with secondary/high school degrees) -- even if in this case, both produce the same result, 38.9 percent.

&nbsp;

<strong>The reason we need to add the original frequencies and not the valid percentages themselves is rounding.</strong> The percentages reported in the frequency table are rounded to one digit after the decimal point; adding rounded numbers inevitably adds imprecision to the result, which, depending on the situation, might end up being crucial. In our case it makes no difference, but do note that the percentages reported in the <em>Percent</em> column actually add up to only 99.9 percent, not 100 percent; similarly, the percentages reported in the <em>Valid Percent</em> column actually add up to 100.1 percent rather than 100 percent. These differences, as negligible as they seem when working with a variable with few categories like the one here, can add up and become more significant for variables with numerous categories (like interval/ratio variables, for example).
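The safe procedure can be sketched in a few lines of Python (assuming the valid frequencies are listed in Table 2.4's order): keep a running total of <em>frequencies</em> and convert each running total to a percent, rather than summing the already-rounded percentages.

```python
# Valid categories of Table 2.4, in order:
# No degree, Secondary/High School, Associate's, Bachelor's, Master's, PhD.
freqs = [1, 6, 3, 5, 2, 1]
n_valid = sum(freqs)  # 18

cumulative, running = [], 0
for f in freqs:
    running += f  # running total of FREQUENCIES, not rounded percentages
    cumulative.append(round(running / n_valid * 100, 1))

# cumulative is [5.6, 38.9, 55.6, 83.3, 94.4, 100.0], matching Table 2.4
```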

&nbsp;

You can see examples of real-data frequency tables in the next subsection.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1504</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 18:06:04]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 22:06:04]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-3-3-summing-up-adding-cumulative-percentages]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>2.3.4 What Frequency Tables Really Look Like</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/2-3-4-what-frequency-tables-look-like/</link>
		<pubDate>Thu, 08 Aug 2019 22:14:09 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1508</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Before we move on to the last section of this chapter,  take a look at what frequency tables of real variables look like, using SPSS. All three variables in the tables below come from the <em>General Social Survey 2016</em> (or <em>GSS 2016</em>) (Statistics Canada 2018) which I'll formally introduce in <span style="background-color: #ffff00">Chapter XX</span>.

&nbsp;

<em>Table 2.5 Frequency Table for Sex of Respondent (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/nominal-freq-table-sex-gss-2016.png" alt="" width="452" height="143" class="wp-image-1479 size-full alignnone" />

&nbsp;

Table 2.5 shows a nominal variable, <em>sex of respondent</em>, with no missing data (thus both <em>Percent</em> and <em>Valid Percent</em> columns contain the same information).

&nbsp;

In contrast, Table 2.6 below shows an ordinal variable, <em>workplace size</em>, where almost half (47.4 percent) of the respondents didn't supply a valid response. In cases like this it's imperative that you use only the data presented in the <em>Valid Percent</em> column, not the <em>Percent</em> one.

&nbsp;

<em>Table 2.6 Frequency Table for Workplace Size (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ordinal-freq-table-workplace-size-gss-2016.png" alt="" width="520" height="297" class="alignnone wp-image-1480 size-full" />

&nbsp;

Table 2.7 below presents a ratio variable, <em>purchasing grocery store takeout dishes in the past month</em>, with a relatively moderate amount of missing data (9.3 percent). Again, <em>Valid Percent</em> is the column you should be looking at. As well, note that the first (blue) column lists the categories (or values) of the variable as supplied by the respondents, as it normally does. Since these consist of actual numbers, you might be tempted to see them as some sort of consecutive listing, but that would be wrong. If you look carefully, you'll see that numbers like 11, 19, 22, 23, etc. are not listed there. This is not because they are somehow "missing" but because no respondent provided such a response.

&nbsp;

<em>Table 2.7 Frequency Table for Purchasing Grocery Store Takeout Dishes (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ratio-freq-table-takeout-dishes-gss-2016.png" alt="" width="484" height="869" class="alignnone wp-image-1481 size-full" />

&nbsp;

Finally, note that although the <em>Cumulative Percent</em> column is less useful when we are dealing with nominal variables, it's quite handy to have when working with ordinal and especially with interval/ratio variables. Thus we can easily state that 83.2 percent of respondents work at a small or a midsize workplace and that almost 90 percent of respondents have purchased no more than 4 grocery takeout dishes in the past month.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 2.1: How to Request Frequency Tables</em></p>

</header>
<div class="textbox__content">

From the <em>Main Menu</em>:
<ul>
 	<li>Click <em>Analyze</em>, then <em>Descriptive statistics</em>, and then <em>Frequencies</em>;</li>
 	<li>Select variable/s from the left side of the window and use the arrow button to move the variable/s to the right side;</li>
 	<li>Click <em>OK</em>;</li>
 	<li>The <em>Output</em> window will display the selected variable/s' frequency table/s.</li>
</ul>
</div>
</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1508</wp:post_id>
		<wp:post_date><![CDATA[2019-08-08 18:14:09]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-08 22:14:09]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[2-3-4-what-frequency-tables-look-like]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>323</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.3 The Median With Frequency Tables and Other Considerations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-3-the-median-with-frequency-tables/</link>
		<pubDate>Mon, 12 Aug 2019 22:41:12 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1555</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

A similar -- though far more widespread -- confusion may happen when working with frequency tables. Frequency tables, as you know from Section 2.3.3 (https://pressbooks.bccampus.ca/simplestats/chapter/2-3-3-summing-up-adding-cumulative-percentages/), list a variable's categories/values in the first column and their frequencies in the second column. Take a look at the incomplete frequency table of the fictitious <em>number of siblings</em> variable used before.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 3.3 (B)  Number of Siblings, Aggregated</em></p>

</header>
<div class="textbox__content">

<em>Table 3.3 Frequency Table for Number of Siblings</em>
<table style="border-collapse: collapse;width: 50%;height: 90px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px;text-align: center"><strong>Value</strong></td>
<td style="width: 2.83286%;height: 15px;text-align: center"><strong>Frequency</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">0</td>
<td style="width: 2.83286%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">1</td>
<td style="width: 2.83286%;height: 15px">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">2</td>
<td style="width: 2.83286%;height: 15px">2</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">3</td>
<td style="width: 2.83286%;height: 15px">1</td>
</tr>
<tr style="height: 15px">
<td style="width: 2.83286%;height: 15px">4</td>
<td style="width: 2.83286%;height: 15px">1</td>
</tr>
<tr>
<td style="width: 2.83286%"><strong>Total</strong></td>
<td style="width: 2.83286%"><strong>7</strong></td>
</tr>
</tbody>
</table>
</div>
</div>
&nbsp;

Can you as easily see that one of your (imaginary) friends has zero siblings, two of your (imaginary) friends have one sibling each, another two of them have two siblings each, etc.? While Table 3.3 presents the same information as Example 3.3 (A) in the previous section does, the way the data is organized is different, so again, make sure you differentiate the variable's values (first column) from the values' frequencies (second column).

&nbsp;

A further consideration is finding the median itself. While the mode depended only on identifying the category/value with the highest frequency (it was therefore just a matter of finding the largest number in the <em>Frequency</em> column of a frequency table), are you able to determine the median from the partial frequency table in Example 3.3 (B) above? I would venture that the answer is "no" for most readers.

&nbsp;

Of course, you can find a solution to our median-finding problem by "unpacking" the frequency column from Table 3.3 and reverting to raw (uncategorized) data again: one 0, two 1's, two 2's, one 3, and one 4 are 0, 1, 1, 2, 2, 3, 4. We already established (both visually and through using the position-of-the-median formula) that the middle case was Case #4, or "two siblings". Would you like, however, to do that for the following Table 3.4?

&nbsp;

<em>Table 3.4 Household Size of the Respondent (GSS 2016)</em>[footnote]Note that this variable is technically an ordinal variable. Despite the numerical values and equal "distances" (of <em>one person</em>) between the first five categories, the last category "Six <em>or more</em> person household" prevents us from categorizing the variable as ratio. After all, we don't know exactly how many individuals live with any of the 426 people in that category: it could be six, or seven, or eight, etc. Thus it is not possible to say how many more persons live in the households of the respondents in the last category compared to any of the preceding categories: the "distance" is no longer <em>one person</em>. Any interval/ratio variable that has its last category truncated in this way (i.e., it has "... or more" in its label) becomes technically ordinal. Nevertheless, for heuristic purposes I will ignore the "...or more" part in this example, which allows me to assume that everyone in that last category lives in a six-person household. This, in turn, allows me to pretend the variable is a ratio one. However, the example works the same way regardless of whether the variable is truly ordinal or ratio.[/footnote]

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/01/FR-for-median-household-size-GSS-2016.png" alt="" width="551" height="246" class="alignnone wp-image-381 size-full" />

&nbsp;

Most likely, you wouldn't "unpack" the 19,609 cases into raw data, so we should seek some other -- and more generalizable -- method for finding the median through frequency tables, one that would apply to <em>N</em> of any size.

&nbsp;

We could, of course, use the formula to at least establish the middle case's numbered position, and then work our way through the table to identify the median.

&nbsp;

$$\frac{N+1}{2}=\frac{19,609+1}{2}=\frac{19,610}{2}=9,805$$

&nbsp;

That is, Case #9,805's household size will be the median household size for these almost 20 thousand respondents.

&nbsp;

How do we find it? There are 5,462 respondents who reported living alone ("one person household") so we know that Case #5,462 does not "reach" the median yet, thus we have to count further. We take the next 7,432 respondents who reported living in two person households, but we need to add them to the 5,462 people living alone in order to obtain the second group's case number positions. After all, the case count for the 7,432 respondents does not start from 1 but from 5,463, and Case #5,463 will already be living in a two person household. So will Case #5,464, Case #5,465, etc. ... all the way up to Case #12,894 (because 5,462+7,432=12,894), which will be the last respondent living in a two person household.

&nbsp;

However, we now see that we have "counted" too far ahead -- we have jumped not to Case #9,805 but all the way to Case #12,894! We do know, though, that all cases between Case #5,463 and Case #12,894 live in two person households: this is enough for us to establish that Case #9,805 lives in a two person household as well.

&nbsp;

In short, the median household size of the 19,609 respondents is a two-person household. That is, half of the respondents live in two-person or smaller households and half of them live in two-person or larger households.
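The counting procedure above generalizes to any frequency table, and can be sketched in a few lines of Python. The first two household-size frequencies (5,462 and 7,432) and the last (426) come from the text; the middle ones are hypothetical fill-ins chosen only so that the total equals 19,609:

```python
# Find the median category by accumulating frequencies until the
# middle case's position is reached.
# 5,462, 7,432, and 426 are from the text; the middle three counts
# are hypothetical, filled in so that N = 19,609.
table = [("one person", 5462), ("two persons", 7432),
         ("three persons", 3000), ("four persons", 2500),
         ("five persons", 789), ("six or more", 426)]

N = sum(f for _, f in table)   # 19,609 respondents
middle = (N + 1) / 2           # position of the median case: 9,805

running = 0
for category, f in table:
    running += f
    if running >= middle:      # the middle case falls in this category
        median = category
        break

print(median)  # "two persons"
```

Note that the loop stops at "two persons" because the running total first reaches 9,805 there (5,462 < 9,805 but 5,462 + 7,432 = 12,894 ≥ 9,805), exactly mirroring the hand count in the text.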

&nbsp;

<em>Hmm,</em> I hear you say, <em>this is still quite the roundabout way of getting to the median -- can you do better? </em>

&nbsp;

Alright, let's think of something else then. We tried adding the frequencies together until we reached the median... How about we try using percentages this time around -- and more to the point, <em>cumulative</em> percentages, as they are already keeping a running total? We just need to know which percent corresponds to the middle case.

&nbsp;

Recall, then, that the middle case splits the distribution of the cases into <em>two equal halves</em>. What percent is half of something? Of course, 50 percent. Thus it makes sense to simply look at the <em>Cumulative Percent</em> column and figure out where 50 percent would fall. The respondents living alone comprise 27.9 percent -- too low for the median -- but the respondents living in one or two person households <em>added together</em> already comprise 65.8 percent of the total. Following the same logic as with the frequencies, the 50th percent falls within the one/two person household cumulative group. However, we know it's not within the one person household group. That means the 50th percent can only fall among the respondents living in two person households, which again confirms what we already knew: the median household size is two persons.

&nbsp;

To generalize, if you'd rather not use the formula for the median's position and add up the frequencies of a frequency table in order to find the median, you can always simply look for the category/value within which the 50th percent falls. That category/value is the median one.
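The 50-percent shortcut boils down to a single lookup. In the Python sketch below, the first two cumulative percents (27.9 and 65.8) are the ones quoted in the text for household size; the remaining entries are illustrative:

```python
# The median category is the first whose cumulative percent reaches 50.
# 27.9 and 65.8 are quoted in the text; later entries are illustrative.
cum_pct = [("one person", 27.9), ("two persons", 65.8),
           ("three persons", 81.1), ("four persons", 93.9),
           ("five persons", 97.8), ("six or more", 100.0)]

median = next(cat for cat, cp in cum_pct if cp >= 50)
print(median)  # "two persons"
```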

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 3.3 Median Workplace Size</em></p>

</header>
<div class="textbox__content">

&nbsp;

Let's revisit Table 2.6 from Section 2.3.4 (https://pressbooks.bccampus.ca/simplestats/chapter/2-3-4-what-frequency-tables-look-like/). Can you identify the median of workplace size? And since you're at it anyway, what about the mode?

&nbsp;

Imagine you have to report what you have found to some of your friends who have no knowledge of statistics. How are you going to explain your findings about the mode and the median of <em>workplace size</em> to them?

</div>
<em>Table 2.6 Frequency Table for Workplace Size of the Respondent (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/ordinal-freq-table-workplace-size-gss-2016.png" alt="" width="520" height="297" class="alignnone wp-image-1480 size-full" />

&nbsp;

</div>
&nbsp;

Finally, now that you have learned what the median is and how you can find it, I will also casually mention that you can use SPSS for that. (Okay, okay. Don't throw bricks, please: it really <em>is</em> important to work through the examples and exercises manually so that you understand what the SPSS output tells you and so that you are able to interpret that output properly.)

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 3.3 Finding the Median Of a Variable</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze,</em> then <em>Descriptive Statistics,</em> then <em>Frequencies</em>;</li>
 	<li>Select your variable of choice from the list on the left and use the arrow to move it to the right side of the window;</li>
 	<li>Click on the <em>Statistics</em> button on the right;</li>
 	<li>In this new window, check <em>Median</em> of the <em>Central Tendency</em> section on your right;</li>
 	<li>Click <em>Continue</em>, then <em>OK</em>.</li>
 	<li>The <em>Output</em> window will provide a small table listing the median of the selected variable.</li>
</ul>
</div>
</div>
&nbsp;

Keep in mind that the <em>Watch Out!! #6</em> warning from Section 3.1 about the mode applies equally to the median: <strong>for ordinal variables, SPSS will provide the median in numerical code.</strong> <strong>It is your job to "translate" the code into the actual category's name.</strong> In the case of <em>household size</em> SPSS supplies "2" as the median, which stands for "two person household". Thus we say that the median household is a two-person one; we do <em>not</em> report that the median household is "2".

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #7</strong></span>... for Misinterpreting the Formula for the Median</em></p>

</header>
<div class="textbox__content">

&nbsp;

An extremely common mistake regarding the median is to take the result of $\frac{N+1}{2}$ to be equal to the median itself. This is patently not true. Again, what the formula provides is the <em>place </em>(or the <em>numbered position</em>) of the median once the cases have been put in their correct order:

&nbsp;

$$\frac{N+1}{2}= $$ <em>"numbered <strong>position of the median case</strong> in the ordered list of cases</em>"

&nbsp;

Thus, once your calculation for the place of the median is done, do not forget to do the final step: check the position you have calculated and see what the <em>category/value</em> of the median case is. <strong>You need to report only that <em>value</em>, not the position itself.</strong>

</div>
</div>
&nbsp;

<strong>Stability of the median.</strong> A final noteworthy observation about the median is its <em>stability as a measure of central tendency</em>. Since the median is entirely about the central position in a variable's distribution and all it takes into account is the <em>order</em> of the cases, <em>not</em> their substantive <em>values</em>, <strong>it's</strong> <strong>impervious to the actual magnitude of the values</strong>. Thus it doesn't matter if we have a set of values like 1, 5, 20, or one like 4, 5, 6, or another like 0, 5, 9 -- the median is the same for all three, even if the values in the sets are different. Whether a value is small or large is immaterial; all that matters is where the value falls in the order of the variable's cases.

&nbsp;

You will learn why this has important implications for measuring central tendency in the next section, which is devoted entirely to the mean.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1555</wp:post_id>
		<wp:post_date><![CDATA[2019-08-12 18:41:12]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-12 22:41:12]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-3-the-median-with-frequency-tables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.5 The Mean With Existing Data and Other Considerations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-5-the-mean-with-real-data-and-other-considerations/</link>
		<pubDate>Tue, 13 Aug 2019 21:07:04 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1592</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Let's work through some real-world data, this time from the Canadian Community Health Survey 2015-2016 (Statistics Canada 2017), a.k.a. <em>CCHS 15/16</em>, a very large dataset containing information on more than 100,000 respondents.

&nbsp;
<p style="text-align: left"><em>Table 3.6 Number of Times the Respondent Consulted a Mental Health</em>
<em>Professional in the Last 12 Months (CCHS 15/16)</em>
<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-mean-consulted-mental-health.jpg" alt="" width="480" height="451" class="alignnone wp-image-455 size-full" /></p>
&nbsp;

To calculate how many times Canadians consulted a mental health professional in the last year preceding their participation in the survey based on the data above, we need to follow the principle we used in the <em>age of classmates</em> and <em>number of siblings</em> examples in the previous section.

&nbsp;

Specifically, we need to multiply each value (1 through 12 times a mental health professional was seen) by its frequency, then sum all the products together, and finally divide the sum by the total number of respondents, 15,462 (recall that we use only valid cases for analysis and exclude the missing ones).

&nbsp;

\begin{equation*}
\begin{aligned}
&amp; \frac{\sum\limits_{i=1}^{N}{x_i}}{N} = \\
&amp;= \frac{1(3778)+2(2851)+3(1700)+4(1426)+5(778)+6(1008)}{15462}+ \\
&amp;+ \frac{7(205)+8(357)+9(66)+10(534)+11(24)+12(2735)}{15462} = \\
&amp;= \frac{3778+5702+5100+5704+3890+6048}{15462}+ \\
&amp;+ \frac{1435+2856+594+5340+264+32820}{15462}= \\
&amp;=\frac{73531}{15462}=4.76=\overline{x}
\end{aligned}
\end{equation*}

&nbsp;

That is, we have found that the respondents on average consulted a mental health professional 4.76 times over the 12 months preceding the survey.
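The whole computation can be checked with a few lines of Python, using the values and frequencies from Table 3.6:

```python
# Mean from a frequency table: multiply each value by its frequency,
# sum the products, and divide by the number of valid cases.
# Values and frequencies are taken from Table 3.6 (CCHS 15/16).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
freqs  = [3778, 2851, 1700, 1426, 778, 1008,
          205, 357, 66, 534, 24, 2735]

N = sum(freqs)                                     # 15,462 valid cases
total = sum(v * f for v, f in zip(values, freqs))  # 73,531
mean = total / N

print(round(mean, 2))  # 4.76
```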

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 3.4 How Many Times Has The Respondent Stopped Smoking for at Least 24 hrs In the Past 12 Months (CCHS 15/16)</em></p>

</header>
<div class="textbox__content">

&nbsp;

To save you from calculating into the thousands, here is a variable based on a question that 99.9 percent of the respondents did not have to answer, which gives you a manageable <em>N</em>=106. Calculate the average number of times respondents stopped smoking for at least 24 hrs during the 12 months preceding the survey. While you're at it, find and report the mode and median of this variable.

</div>
<em>Table 3.7 Number of Times Respondent Stopped Smoking In the Past Year (CCHS 15/16)</em>
<div class="textbox__content">

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/FT-for-mean-stopped-smoking.jpg" alt="" width="484" height="517" class="alignnone wp-image-468 size-full" />

</div>
</div>
&nbsp;

I strongly encourage you to do the above exercise yourself. Still, as usual, here is an SPSS tip on how to obtain a mean in SPSS.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 3.4 Obtaining the Mean</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze, </em>then<em> Descriptive Statistics, </em>and then<em> Frequencies</em>;</li>
 	<li>Select your variable of choice from the list on the left and use the arrow to move it to the right side of the window;</li>
 	<li>Click on the <em>Statistics</em> button on the right;</li>
 	<li>In this new window, check <em>Mean</em> in the <em>Central Tendency</em> section on your right;</li>
 	<li>Click <em>Continue</em>, then <em>OK</em>.</li>
 	<li>The Output window will provide a small table listing the selected variable's mean.</li>
</ul>
</div>
</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1592</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 17:07:04]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-13 21:07:04]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-5-the-mean-with-real-data-and-other-considerations]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>3.6 Outliers</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/3-6-outliers/</link>
		<pubDate>Tue, 13 Aug 2019 21:34:51 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1601</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Out of the three measures of central tendency, the mean is the only one that takes into account the actual numerical values of the cases. As such, it is easily affected by the size of the values: a sequence of numbers such as "1, 5, 7, 10, 15" will produce a smaller mean than a sequence like "100, 80, 75, 130, 90".

&nbsp;

When all values to be averaged are of relatively comparable magnitude, the mean does a good job at reflecting the central tendency of a variable -- that is why it is the most familiar and widely used measure. However, <strong>when a variable contains an extremely small or an extremely large value (or several values) compared to the rest of the values, the mean gets easily distorted</strong> and stops reflecting the central tendency "truthfully", as it were. <strong>Extremely small and extremely large values are called statistical <em>outliers</em>.</strong>

&nbsp;

While there is a convenient method for identifying outliers (using a concept called <em>interquartile range</em>, which we will discuss in the next chapter), at this stage it is not necessary that you be so technical. You can identify outliers visually, albeit less precisely, by the "disturbance" they cause in the general pattern of the data you observe. For example, if you have values like "1, 5, 7, 10, 15", a value of 130 in that sequence would be considered an outlier. Similarly, if you have values like "100, 80, 75, 130, 90", a value of 5 would be an outlier.

&nbsp;

Let's calculate the means of the two sequences, first with and then without the so-called outliers and see what happens.

&nbsp;

The first sequence is 1, 5, 7, 10, 15 and we want to see what happens when we add 130.

&nbsp;

$$\frac{(1+5+7+10+15)}{5}=\frac{38}{5}=7.6$$

&nbsp;

We add 130 to the sequence:

&nbsp;

$$\frac{(1+5+7+10+15+130)}{6}=\frac{168}{6}=28$$

&nbsp;

Both means, 7.6 and 28, are the true averages of the sequences of values as listed. However, the addition of an uncommonly large number "pulled" the mean away from the "centre" of the original data.

&nbsp;

How truthfully does 28 represent the "centre" of a sequence where the majority of the cases' values (in fact, five of the six values) are 15 and below? Not that much.[footnote]If you believe it's not the magnitude of the value but just its addition that causes the "pulling" of the mean, consider redoing the example by adding 18 instead of 130. Then we have $\frac{(1+5+7+10+15+18)}{6}=\frac{56}{6}=9.3$. The "pull" from 7.6 to 9.3 is much smaller than from 7.6 to 28. The value 9.3 reflects the central tendency of the data more truthfully than 28 does.[/footnote]

&nbsp;

To demonstrate the effect of an extremely small value, let's continue with the next sequence:

&nbsp;

$$\frac{(100+80+75+130+90)}{5}=\frac{475}{5}=95$$

&nbsp;

Adding a value of 5 to the sequence produces the following:

&nbsp;

$$\frac{(100+80+75+130+90+5)}{6}=\frac{460}{6}=80$$

&nbsp;

As with the first sequence, the mean here gets "pulled", but in the opposite direction, from 95 to 80. Both means are technically true averages of their respective values, but the latter is "artificially" low: after all, four of the six values are the same or higher.[footnote]Again, if we added a value of a comparable size to this sequence instead of 5, the mean would not be impacted as much: $\frac{(100+80+75+130+90+70)}{6}=\frac{545}{6}=90.8.$ Consider the "pull" from 95 to 80 vs. from 95 to 90.8.[/footnote]

&nbsp;

What this tells you is that <strong>the mean is an unstable measure of central tendency, prone to being affected by outliers.</strong> Contrast this to what you know about the median: the median does not take the magnitude of the values into consideration, beyond their order. Thus, as explained in the previous Section 3.3 (https://pressbooks.bccampus.ca/simplestats/chapter/3-3-the-median-with-frequency-tables/), adding a value (be it extremely small or extremely large) to a sequence does not affect the median much -- unlike the mean. The median of 1, 5, 7, 10, 15 is 7 (there are two values above and two below it), and whether we add 130 or 18, it doesn't matter: it's just an additional value in the sequence.[footnote]The median of 1, 5, 7, 10, 15, 18 is between 7 and 10, i.e., 8.5 (since we need the half-way distance between 7 and 10, we use the average of 7 and 10, that is 7+10=17 and divide it by 2 to get 8.5).  The median of 1, 5, 7, 10, 15, 130 is exactly the same -- it is still half-way between the two middle values, 7 and 10, or again 8.5. [/footnote]
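The contrast between the two measures is easy to verify with Python's standard library <code>statistics</code> module, using the first sequence from above:

```python
# Outliers pull the mean substantially but barely move the median.
from statistics import mean, median

base = [1, 5, 7, 10, 15]
print(mean(base), median(base))                  # 7.6 and 7

with_outlier = base + [130]                      # add an extreme value
print(mean(with_outlier), median(with_outlier))  # 28 and 8.5

with_typical = base + [18]                       # add a comparable value
print(median(with_typical))                      # still 8.5
```

Whether 130 or 18 is appended, the median moves only from 7 to 8.5 (the midpoint of the two middle values, 7 and 10), while the mean jumps from 7.6 to 28 in the outlier case.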

&nbsp;

Since the mean is prone to being affected by outliers, while the median is not,<strong> in some situations it is advisable to report the median as a more "valid" measure of the typical cases/"centre" of the data rather than the mean.</strong> Specifically, watch out for reports on average income, average age, average weight, etc. where a few outliers can <em>skew</em> a variable's distribution.

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #8</strong></span> ... for Reports on Averages of Variables Prone to Skewing by Outliers</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine a small company advertising an open position by claiming that the average salary of their employees is 100 thousand dollars per year. For simplicity's sake, let's assume the company has ten employees and these are their salaries:

</div>
<em>Table 3.8 Employee Salaries (Hypothetical Data) </em>
<div class="textbox__content">
<table style="border-collapse: collapse;width: 100%" border="0">
<tbody>
<tr>
<td style="width: 50%;text-align: left"><strong>Value (in thousands)</strong></td>
<td style="width: 50%;text-align: left"><strong>Frequency</strong></td>
</tr>
<tr>
<td style="width: 50%">70</td>
<td style="width: 50%">5</td>
</tr>
<tr>
<td style="width: 50%">87.5</td>
<td style="width: 50%">4</td>
</tr>
<tr>
<td style="width: 50%">300</td>
<td style="width: 50%">1</td>
</tr>
<tr>
<td style="width: 50%"><strong>TOTAL</strong></td>
<td style="width: 50%"><strong>10</strong></td>
</tr>
</tbody>
</table>
You can check for yourself what the average annual salary is:

&nbsp;

$$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{70(5)+87.5(4)+300(1)}{10}=\frac{350+350+300}{10}= \frac{1000}{10}=100$$

&nbsp;

or, indeed, 100 thousand dollars. However, how representative is this annual salary of what the regular employee makes? After all, nine out of ten employees of the company get less than that. The average annual salary reported is inflated by the very high salary of one employee (perhaps the manager), a clear outlier.

&nbsp;

Let's instead look at the median. We start by arranging the values in order:

&nbsp;

70, 70, 70, 70, 70, 87.5, 87.5, 87.5, 87.5, 300

&nbsp;

Using the formula for finding the position of the median, we have

&nbsp;

$$\frac{(N+1)}{2}=\frac{(10+1)}{2}=\frac{11}{2}=5.5$$

&nbsp;

I.e., we find that the median falls between the fifth and the sixth value in the order, or between 70 and 87.5. The halfway point between these two values is found by averaging them:

&nbsp;

$$\frac{(70+87.5)}{2}=\frac{157.5}{2}=78.75$$

&nbsp;

which shows us that the median annual salary of the employees in that company is \$78,750. This is a lot less than the touted average of \$100,000 and a lot more reflective of what nine out of ten employees receive.
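You can confirm both figures with a few lines of code; a quick sketch of the Table 3.8 data using Python's standard library:

```python
from statistics import mean, median

# Salaries from Table 3.8, in thousands of dollars (hypothetical data).
salaries = [70] * 5 + [87.5] * 4 + [300]

print(mean(salaries))    # 100 thousand -- the advertised "average"
print(median(salaries))  # 78.75 thousand -- closer to what most employees earn
```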

&nbsp;

</div>
</div>
&nbsp;

Examples like the <em>Watch Out!! #8</em> above show that relying on the mean can be tricky, and in some cases can be deliberately used to "lie with statistics" (i.e., a report might be technically correct but at the same time very misleading). Thus, <strong>generally reporting all three central tendency measures is the way to go</strong>, and you, as a beginner researcher, should do just that.

&nbsp;

Finally, you can observe a skew in the data even visually by looking at an interval/ratio variable's graphical representation, i.e., its histogram. Extremely high values tend to "pull" the mean to the right of the "centre", i.e., with the majority of cases being relatively smaller, the few high values will produce a "tail" on the right side of the distribution (a.k.a. <em>positive skew</em>). On the other hand, extremely low values tend to "pull" the mean to the left of the "centre", i.e., with the majority of cases being relatively larger, the few low values will produce a "tail" on the left side of the distribution (a.k.a. <em>negative skew</em>).

&nbsp;

<span style="text-indent: 1em;font-size: 14pt">As well, since the median indicates the "centre" of the data better, a mean smaller than the median would typically indicate a negative/left skew, while a mean larger than the median would typically indicate a positive/right skew. When you observe a skew in the data, the median would typically be a the preferred measure of central tendency.</span>

&nbsp;

Observe the positive skew in Fig. 3.1 below, and in its zoomed-in version in Fig. 3.2.

&nbsp;

<em>Figure 3.1 Number of Cigarettes Smoked Per Day by Occasional Smokers (CCHS 15/16)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs1.png" alt="" width="462" height="370" class="alignnone wp-image-1610 size-full" />

The reason the numbers on the horizontal axis reach as high as 100, despite the fact that there appears to be nothing there, is that there is at least one outlier case -- a respondent who said they were an occasional smoker but reported smoking 99 cigarettes per day.[footnote]Whether this is to be believed is not important here, just the fact that such a value exists in the data. You will learn what is to be done about outliers in statistical analysis in Chapter 4.[/footnote] Thus the distribution has a long right-side "tail", as it were, which you can better see in Fig. 3.2, providing the "zoomed-in" version of the histogram above. (The "tail" is what you will have if you trace an imaginary line through the tops of all the bars in the histogram down to the single case of 99 cigarettes per day.)

&nbsp;

<em>Figure 3.2</em> <em>Number of Cigarettes Smoked Per Day by Occasional Smokers (CCHS 15/16), Zoomed</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/right-skew-number-cigarettes-cchs-zoomed.png" alt="" width="462" height="370" class="alignnone wp-image-1609 size-full" />

In this case the median is 3 cigarettes smoked per day by an occasional smoker. The mean is 4.33, and as expected, it is larger than the median.

&nbsp;

Similarly, an exceptionally small value compared to the bulk of the cases will produce a negatively-skewed histogram where the distribution has a "tail" but on the left of where most cases are. In that case the mean will be smaller than the median.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1601</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 17:34:51]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-13 21:34:51]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[3-6-outliers]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>24</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>4.2 Interquartile Range</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-2-interquartile-range/</link>
		<pubDate>Tue, 13 Aug 2019 23:38:19 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1621</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Unlike the range, which focuses on the extreme ends, the <strong>interquartile range</strong> (frequently referred to as <strong><em>IQR</em></strong>) looks into the distribution of observations around the "centre". To that end, it splits the distribution into <strong>four equal parts called <em>quartiles</em></strong> (from the Latin <em>quartus</em>, meaning one-fourth, i.e., a quarter), and then provides the range of the middle two parts taken together. This sounds more complicated than it actually is, so let's turn to examples and make it clearer.

&nbsp;

To begin, let me first demonstrate what all this means with a set of raw values which we can call, say, <em>hours worked per week</em>.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 4.2  Weekly Hours Worked (Raw Data)</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine you have been hired as a research assistant (RA) on a research project. You have worked 20 weeks in total in the past two semesters, ten weeks in each semester (with your classes and all, you couldn't work every week). The maximum hours per week you could work was 15, limited by the nature of your contract. You make a list of all hours you have worked in each of the twenty weeks, and you list the twenty values <em>in ascending order</em>. Here they are:

&nbsp;

2, 3, 3, 4, 5, 5, 7, 7, 8, 8, 10, 10, 10, 10, 12, 12, 13, 13, 13, 14

&nbsp;

If you recall from our discussion of the median, to split a group of values into equal parts we need the values' positions in the order. You can find these in the table below:

&nbsp;

<em>Table 4.1 Values and Their Positions of Hours Worked per Week</em>
<table style="border-collapse: collapse;width: 94.6712%;height: 165px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center"><strong>Position</strong></td>
<td style="width: 31.9405%;height: 15px;text-align: center"><strong>Hours Worked per week</strong></td>
<td style="width: 17.4929%;height: 15px;text-align: center"><strong>Position</strong></td>
<td style="width: 32.2238%;height: 15px;text-align: center"><strong>Hours Worked per Week</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(1)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">2</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(11)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(2)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">3</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(12)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(3)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">3</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(13)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(4)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">4</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(14)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">10</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(5)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">5</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(15)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(6)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">5</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(16)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">12</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(7)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">7</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(17)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">13</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(8)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">7</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(18)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">13</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(9)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">8</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(19)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">13</td>
</tr>
<tr style="height: 15px">
<td style="width: 13.102%;height: 15px;text-align: center">(10)</td>
<td style="width: 31.9405%;height: 15px;text-align: center">8</td>
<td style="width: 17.4929%;height: 15px;text-align: center">(20)</td>
<td style="width: 32.2238%;height: 15px;text-align: center">14</td>
</tr>
</tbody>
</table>
You might be tempted to use an intuitive method for splitting the set of twenty values given in the example into 4 equal parts (i.e., into quartiles) by simply dividing 20 by 4, which will let you have 5 values in each quartile:

&nbsp;

2, 3, 3, 4, 5          5, 7, 7, 8, 8          10, 10, 10, 10, 12          12, 13, 13, 13, 14

&nbsp;

Thus the interquartile range (or "the range of the middle two parts taken together") of the entire set of 20 values would be the range of 5, 7, 7, 8, 8, 10, 10, 10, 10, 12.

&nbsp;

A quick-and-dirty calculation would show that the IQR is (12-5=) 7 hours. You would be correct -- indeed, the interquartile range <em>is</em> 7 hours -- but I'll stop you nevertheless. This worked out only because I've chosen the values on either side of the first-quarter cut-off to be both 5, and the values on either side of the third-quarter cut-off to be both 12. Read on to find out the proper method for obtaining the IQR. (The example continues further down.)

</div>
</div>
&nbsp;

Quick-and-dirty calculations are not precise, even if they serve their purpose to give you a basic idea of what we are doing. Now that you've seen where this is going, let's do everything <em>properly</em>.

&nbsp;

First, we need to calculate the precise positions of the values that separate the quartiles. Recall how we used to split a set of values in two in order to get the position of the median. We used the following formula:

&nbsp;

$\frac{N+1}{2}=$     ←<em>"position of the median"</em>

&nbsp;

We'll follow the same logic to split each of the halves in two themselves. Thus let me restate the above formula to this:

&nbsp;

$\frac{N+1}{2}=(N+1)\frac{1}{2}=(N+1)0.5$    ←<em>"position of the median"</em>

&nbsp;

Since we effectively multiply <em>N+1</em> by 0.5 in order to split the entire set in two halves (or, to get <em>one</em> <em>half of the data</em>), to split the first half of the values further in two itself, we need to multiply <em>N+1</em> by "half of 0.5", i.e., by 0.25 (essentially getting <em>one quarter</em> of the data):

&nbsp;

<span style="text-indent: 18.6667px;font-size: 14pt">$\frac{N+1}{4}=(N+1)\frac{1}{4}=(N+1)0.25$   ←</span> <em>"position of the first quartile"</em>

&nbsp;

By analogy, splitting the second half in two itself will require getting <em>three quarters</em> of the data,  or to multiply <em>N+ 1</em> by "0.5 and a quarter", i.e., by 0.75:

&nbsp;

$\frac{(N+1)3}{4}=(N+1)\frac{3}{4}=(N+1)0.75$   ← <em>"position of the third quartile"</em>

&nbsp;

If you follow the logic, you'll easily conclude that <strong>the median is also <em>de facto</em> the second quartile</strong> (i.e., <em>two quarters</em> of the data).

&nbsp;

To restate, we have the following way to split the data into four equal parts:

&nbsp;

The position of the first quartile,<em> Q</em><sub><em>1</em>,</sub> is found through $(N+1)0.25$.

&nbsp;

The position of the second quartile, <em>Q</em><sub><em>2</em></sub> (a.k.a. the median), is found through $(N+1)0.5$.

&nbsp;

The position of the third quartile, <em>Q</em><sub><em>3</em>, </sub>is found through $(N+1)0.75$.[footnote]Obviously, we don't speak of a <em>fourth quartile</em>, as four quarters comprise the whole thing: the fourth quartile would simply be 100%, or all of the data.[/footnote]
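The three position formulas are easy to compute mechanically; here is a small sketch in Python (the function name is mine, for illustration only):

```python
def quartile_positions(n):
    """Return the positions (not the values!) of Q1, Q2, and Q3
    for a set of n ordered values, using the (N+1)*p rule."""
    return (n + 1) * 0.25, (n + 1) * 0.5, (n + 1) * 0.75

# For the twenty weeks of Example 4.2:
print(quartile_positions(20))  # (5.25, 10.5, 15.75)
```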

&nbsp;

Now let's use our newfound formulas in the Example 4.2.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 4.2 Weekly Hours Worked, Continued</em></p>

</header>
<div class="textbox__content">

With <em>N</em>=20, we get:

&nbsp;

<em>Q<sub>1</sub>'s position</em> →    $(N+1)0.25=(20+1)0.25=(21)0.25=5.25$

&nbsp;

<em>Q<sub>2</sub>'s position</em> →    $(N+1)0.5=(20+1)0.5=(21)0.5=10.5$

&nbsp;

<em>Q<sub>3</sub>'s position</em> →    $(N+1)0.75=(20+1)0.75=(21)0.75=15.75$

&nbsp;

Once again, do not forget that all these formulas provide the <em>positions</em> of the quartiles, <em>not</em> their respective values. To see the values, we have to look at Table 4.1 above which cross-lists the cases' positions <em>and</em> values. Since there is no Case #5.25, we know that the value we're looking for is between Cases #5 and #6 (a quarter further than #5) -- but as the values of both Cases #5 and #6 are 5, we conclude that the value of the first quartile is 5.

&nbsp;

Similarly, there is no Case #15.75 (so the value we're looking for is three quarters past the 15th case), but both Cases #15 and #16 are 12, so we conclude that the third quartile is 12.

&nbsp;

We are still interested in the interquartile range -- or the range of the two middle quarters of the data (or the middle 50 percent, so to speak). Then, since

&nbsp;

<em>Q<sub>3</sub></em> = 12 and <em>Q<sub>1</sub></em> = 5,

&nbsp;

we have that

&nbsp;

<em>Q<sub>3</sub> - Q</em><sub><em>1</em> </sub>= $12 - 5=7$

&nbsp;

We have thus found that the IQR for <em>hours worked</em> <em>per week</em> is 7 hours per week: at the mid-range, your hours worked per week varied between 5 and 12 hours.
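The whole procedure can be sketched in code. The helper below is my own illustration, not a standard routine: it interpolates between the two bracketing cases when a quartile position is fractional (here both bracketing values happen to be equal, so no interpolation is visible):

```python
def quantile_value(sorted_vals, p):
    """Value at the (N+1)*p position (1-based), interpolating between
    the two bracketing cases when the position is fractional."""
    pos = (len(sorted_vals) + 1) * p
    lo = int(pos)                 # the case just below the position
    frac = pos - lo
    below = sorted_vals[lo - 1]
    if frac == 0:
        return below
    return below + frac * (sorted_vals[lo] - below)

hours = [2, 3, 3, 4, 5, 5, 7, 7, 8, 8, 10, 10, 10, 10, 12, 12, 13, 13, 13, 14]
q1 = quantile_value(hours, 0.25)  # 5 -- cases #5 and #6 are both 5
q3 = quantile_value(hours, 0.75)  # 12 -- cases #15 and #16 are both 12
print(q3 - q1)                    # IQR = 7
```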

&nbsp;

</div>
</div>
&nbsp;

Alright, but <em>why</em>, you might ask -- couldn't we just have the range and be done with it?

&nbsp;

The added value of using the interquartile range is that it takes care of outliers, so it's frequently a better measure of dispersion than the range. The IQR provides the spread of the centrally located 50 percent of the data, which in many situations paints a more accurate picture of how the more "typical" of the variable's cases are spread out than the more extreme spread provided by the range, which encompasses all cases, even the clear outliers.

&nbsp;

All in all, however, just like with choosing whether to use the median or the mean, the decision about which of these two measures of dispersion is the more appropriate one to use and report depends on the specific situation and the researcher's discretion. I would urge you, as a beginner researcher, to make a habit of reporting both the range and the interquartile range, while simultaneously discussing the effect of any potential outliers.

&nbsp;

Instead of working with raw data, we might have frequency tables at hand. <strong>How do we get the range and IQR from aggregated data?</strong>  For the range, simply subtract the lowest value (the one listed first in the <em>Values</em> column, of course) from the highest value (the one listed last in the <em>Values</em> column) and report the difference (in its appropriate units of measurement). For the IQR, look for the 75th percentile (i.e., <em>Q<sub>3</sub></em>) and the 25th percentile (i.e., <em>Q<sub>1</sub></em>) in the <em>Cumulative Percent</em> column, then subtract the <em>Q<sub>1</sub></em> value from the <em>Q<sub>3</sub></em> value, and again report the difference. (This is similar to how we looked for the 50th percentile for the median, <em>Q<sub>2</sub></em>, in Section 3.3 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/3-3-the-median-with-frequency-tables/">https://pressbooks.bccampus.ca/simplestats/chapter/3-3-the-median-with-frequency-tables/</a>).)

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Exercise 4.1 Range and IQR for Cigarettes Smoked per Day </em></p>

</header>
<div class="textbox__content">

&nbsp;

Practice your newly acquired skills to find <em>Q<sub>1</sub>, Q<sub>2</sub></em> (i.e., the median), and <em>Q<sub>3</sub></em> in the following table. Calculate and report the range and the interquartile range for <em>number of cigarettes smoked each day</em>.

</div>
<em>Table 4.2 Number of Cigarettes Smoked Per Day by Daily Smokers (CCHS 15/16)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/range-iqr-freq-table-smokers-cchs.png" alt="" width="484" height="1287" class="alignnone wp-image-1635 size-full" />

&nbsp;

</div>
&nbsp;

To make sure you're doing it correctly, let's quickly check your answers right away. The range is of course (99-1=) 98 cigarettes per day. To find the IQR, you must have first identified <em>Q<sub>1</sub></em> = 10 (since 23.9 percent of the cases smoke up to 9 cigarettes per day, the 25th percentile falls in the 10-cigarettes-per-day category) and <em>Q<sub>3</sub></em> = 20 (since 65.4 percent of the cases smoke up to 19 cigarettes per day, the 75th percentile falls in the 20-cigarettes-per-day category). Then the IQR is (20-10=) 10. Thus you see the difference between the range and the interquartile range: while the range might leave you with the impression that cigarettes smoked per day vary by almost a hundred for daily smokers, the middle half of the cases actually vary by only 10 cigarettes.
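The percentile lookup described here can also be sketched in a few lines of Python. Note that the (value, cumulative percent) pairs below are a hypothetical abbreviated table for illustration, not the full CCHS Table 4.2:

```python
def percentile_category(cum_table, pct):
    """Return the first category whose cumulative percent reaches pct.
    cum_table: list of (value, cumulative_percent) in ascending order."""
    for value, cum_pct in cum_table:
        if cum_pct >= pct:
            return value
    raise ValueError("cumulative percents must reach 100")

# Hypothetical abbreviated frequency table (cumulative percent column).
table = [(9, 23.9), (10, 38.0), (19, 65.4), (20, 80.0), (99, 100.0)]
q1 = percentile_category(table, 25)  # 10 -- the 25th percentile's category
q3 = percentile_category(table, 75)  # 20 -- the 75th percentile's category
print(q3 - q1)                       # IQR = 10
```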

&nbsp;

Of course, there's also SPSS. Check below to see how to find the range and IQR (semi-)directly.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 4.1 Obtaining Range and Interquartile Range</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze, </em>then <em>Descriptive Statistics, </em>and then<em> Frequencies</em>;</li>
 	<li>Select your variable of choice from the list on the left and use the arrow to move it to the right side of the window;</li>
 	<li>Click on the <em>Statistics</em> button on the right;</li>
 	<li>In this new window, check <em>Quartiles</em> from the <em>Percentile Values</em> on your top left and check <em>Range</em> (and <em>Minimum</em> and <em>Maximum</em> if you wish) from the <i>Dispersion </i>section below it;</li>
 	<li>Click <em>Continue</em>, then <em>OK</em>.</li>
 	<li>Range (along with the smallest and largest values, if you asked for them) will be reported in the <em>Output</em> directly.</li>
 	<li>To obtain the IQR, simply subtract the value reported as 25th percentile from the value reported as 75th percentile.</li>
</ul>
</div>
</div>
&nbsp;

With the range and IQR covered, we are halfway through the typically used measures of dispersion. On to the remaining two, the variance and the standard deviation.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1621</wp:post_id>
		<wp:post_date><![CDATA[2019-08-13 19:38:19]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-13 23:38:19]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[4-2-interquartile-range]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>4.4 Variance Continued, Standard Deviation</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/4-4-standard-deviation/</link>
		<pubDate>Thu, 15 Aug 2019 20:30:33 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1647</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

I'm sure you'll agree the preceding section was a lot to take in. And here's the kicker: after all that, we arrived at something which we cannot easily or intuitively interpret, given the squared units. However, the variance is used a lot in statistics, for a great many things. Generally, the larger the variance, the greater the <em>variability</em> of the variable, or the larger the "dispersed-ness" of the cases.

&nbsp;

Despite the seemingly convoluted way we arrived at the variance and all the calculations and mathematical notation, what we did was actually quite simple. (No, really!)

&nbsp;

To recap: just like we average all values by summing them up and dividing the sum by their total number to get the mean, we average the distances of the values from the mean by summing them up and dividing the sum by their total number. The only difference is that in order to be able to sum the distances, we need to square each of them first, or we cannot proceed.

&nbsp;

Here are the formulas for the mean and the variance together so that you can compare:

&nbsp;

$\frac{\sum\limits_{i=1}^{N}{x_i}}{N} = \overline{x}$   ← <em>mean</em>

&nbsp;

$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N} = \sigma^2$    ← <em>variance</em>

&nbsp;

Now that I have you feeling somewhat comfortable, I have a confession to make. <strong>This above isn't the only version of the formula for variance that exists or that we will be using.</strong>

&nbsp;

Bear with me (and welcome back, to those who threw the reading away in disgust) -- I promise to explain everything when we get to inferential statistics further in the textbook, as the explanation requires concepts and terminology we have not yet covered and which cannot be easily introduced at this point. (Hint: it deals with estimation and uncertainty.)[footnote]If you'd like a preview, <strong>the alternative, to-be-explained-later, formula for variance is:</strong>

&nbsp;

$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1} = s^2$    ← <strong><em>variance</em></strong>

&nbsp;

As you can see, the modification is quite small -- <strong>instead of dividing the sum of squares by the total number <em>N</em>, we actually divide it by the total <em>minus one</em>, <em>N</em>-1</strong>. If it makes you feel better, dividing by <em>N</em> or by <em>N-1</em> produces generally similar results in terms of the magnitude of the variance. We also denote this version with a regular lower-case $s^2$.[/footnote]

&nbsp;

One thing worth noting, however, is that, although the full explanation is still to come, when working with typical datasets <strong>SPSS will produce variances by dividing the sum of squares by <em>N-1</em> instead of by <em>N</em>.</strong>
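If you want to see the difference between the two versions, Python's standard library happens to implement both (the data below is made up for illustration):

```python
from statistics import pvariance, variance

x = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; the mean is 5

print(pvariance(x))  # divides the sum of squares by N    -> 4
print(variance(x))   # divides by N-1 (as SPSS does)      -> about 4.57, slightly larger
```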

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!!</strong></span> <span style="background-color: #000000;color: #ff0000">#9</span> ... for The Order of Operations</em></p>

</header>
<div class="textbox__content">

&nbsp;

When considering the formula for variance, and the steps we took to calculate it, pay special attention to the <em>sum of squares</em>. That is, we need a sum of <em>squares</em> (i.e., we add the squared distances from the mean together): <strong>we <em>first</em> calculate the distances, <em>then</em> square them, and finally sum the <em>squared</em> distances up</strong>.

&nbsp;

A common mistake, however, is to calculate the distances, sum them up, and <em>then</em> square the sum. As explained above, the (un-squared) distances add up to zero, and squaring zero will not improve things. A version of this mistake is to calculate the distances, sum them, divide by <em>N</em>-1, and <em>then</em> square the result. Obviously this would also be unsuccessful. To avoid this type of frustration, try to remember the purpose of the squaring: to "turn" all distances into positive numbers. Everything else we do (summing, dividing), we do to the already squared distances.
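A quick demonstration of why the order matters, using the number-of-siblings values (2, 1, 4, 2, 1, 0, 3) that appear elsewhere in the textbook:

```python
from statistics import fmean

x = [2, 1, 4, 2, 1, 0, 3]
m = fmean(x)
deviations = [xi - m for xi in x]

# Wrong order: summing first, then squaring -- the deviations cancel out,
# so the sum (and hence its square) is zero, give or take floating-point dust.
print(sum(deviations))

# Right order: square each deviation first, THEN sum.
print(sum(d ** 2 for d in deviations))  # about 10.857 -- the sum of squares
```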

</div>
</div>
&nbsp;

In an effort to show you that the calculation of the variance is simple when done without the protracted explanations, take another example we have used before, <em>number of siblings</em>.
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 4.5 Variance for Number of Siblings</em></p>

</header>
<div class="textbox__content">

&nbsp;

In discussing the median in Section 3.2 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/">https://pressbooks.bccampus.ca/simplestats/chapter/3-2-median/</a>), we imagined you asked seven of your friends about the number of their siblings. These were the values we used:  2, 1, 4, 2, 1, 0, 3.

&nbsp;

Let's produce the variance in four simple steps after calculating the mean: Step 1A, obtain the distances from the mean; Step 1B, square the distances from the mean; Step 2, obtain the sum of squares (i.e., sum the squared distances up); Step 3, divide by <em>N</em>.

&nbsp;

<strong>Preliminary step: obtain the mean.</strong>

$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{2+1+4+2+1+0+3}{7}=\frac{13}{7}=1.857= \overline{x}$

&nbsp;

<strong>Steps 1A and 1B are presented in the table below:</strong>

&nbsp;

<em>Table 4.4 Calculating Distances To the Mean and Squaring Each Distance</em>
<table class="lines" style="border-collapse: collapse;width: 131.729%;height: 236px" border="0">
<tbody>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center"><strong>$x_i$</strong></td>
<td style="width: 36.6855%;height: 15px;text-align: center"><strong>$(x_i - \overline{x})$</strong></td>
<td style="width: 61.0955%;height: 15px;text-align: center"><strong>$(x_i - \overline{x})^2$</strong></td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">2</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(2 - 1.857) = 0.143</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(0.143)<sup>2</sup> = 0.02</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">1</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(1 - 1.857) = -0.857</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-0.857)<sup>2</sup> = 0.734</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">4</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(4 - 1.857) = 2.143</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(2.143)<sup>2</sup> = 4.592</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">2</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(2 - 1.857) = 0.143</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(0.143)<sup>2</sup> = 0.02</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">1</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(1 - 1.857) = -0.857</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-0.857)<sup>2</sup> = 0.734</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">0</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(0 - 1.857) = -1.857</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(-1.857)<sup>2</sup> = 3.448</td>
</tr>
<tr style="height: 15px">
<td style="width: 34.1359%;height: 15px;text-align: center">3</td>
<td style="width: 36.6855%;height: 15px;text-align: center">(3 - 1.857) = 1.143</td>
<td style="width: 61.0955%;height: 15px;text-align: center">(1.143)<sup>2</sup> = 1.306</td>
</tr>
</tbody>
</table>
<strong>Step 2, obtain the sum of squares</strong>:

&nbsp;

$\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2} = 2(0.02)+2(0.734)+4.592+3.448+1.306=10.854$    ←<em>Sum of Squares</em>

&nbsp;

<strong>Step 3, divide the sum of squares</strong> (rounded to two decimal places) <strong>by <em>N</em></strong>, i.e., by <em>7</em>:

&nbsp;

$\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}=\frac{10.85}{7}=1.55= \sigma^2$    ← <strong><em>variance</em></strong>

&nbsp;

Thus, we find that your seven friends' numbers of siblings have an average squared distance of about 1.6 from the mean number of siblings, 1.9 (rounded up from 1.857).

</div>
</div>
&nbsp;
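If you'd rather let a computer grind through the four steps, here is a short sketch in Python (a generic illustration, not part of the book's SPSS workflow) that mirrors the worked example above:

```python
# Population variance of the seven friends' numbers of siblings,
# following the same four steps as the worked example.
siblings = [2, 1, 4, 2, 1, 0, 3]

N = len(siblings)                          # N = 7
mean = sum(siblings) / N                   # preliminary step: the mean (~1.857)
deviations = [x - mean for x in siblings]  # Step 1A: distances from the mean
squared = [d ** 2 for d in deviations]     # Step 1B: square each distance
sum_of_squares = sum(squared)              # Step 2: the sum of squares (~10.857;
                                           #   the hand total of 10.854 differs
                                           #   only because of rounding)
variance = sum_of_squares / N              # Step 3: divide by N

print(round(variance, 2))  # → 1.55
```

Working with the unrounded mean, the computer's sum of squares comes out slightly different from the hand calculation, but the variance still rounds to the same 1.55.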

<em>Oh, great</em>, you are probably thinking now, and I can imagine the sarcasm -- <em>we calculated something we can't even interpret properly</em>. I mean, it's more than a tad awkward to try to explain "an average of about 1.6 squared distances from the mean number of siblings" to anyone not versed in statistics. Maybe it would be better if we could get rid of the "squared-ness"?

&nbsp;

You know what? <em>We can</em>. The standard deviation is here to help.

&nbsp;

<strong>Standard deviation</strong>. Believe it or not, after all the steps we went through to get to the variance, calculating the standard deviation is a breeze: specifically, a breeze that turns back the squared units into <em>standard</em> units, hence the name.

&nbsp;

See for yourself:

&nbsp;

$\sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N}} = \sqrt{\sigma^2}=\sigma$    ← <strong><em>standard deviation</em></strong>

&nbsp;

Despite its scary looks, this is actually just the formula for variance <em>under a square root</em>. That is, <strong>we take the square root of the variance to get the standard deviation</strong>. That's it. Nothing more. Just a regular square root, and we're there. Cue a sigh of relief![footnote]Note, however, that just like there is an "alternative", to-be-explained-later, formula for variance, there is an "alternative" formula for standard deviation, following the same principle of dividing the sum of squares by <em>N-1</em> instead of by <em>N</em>:

&nbsp;

$\sqrt{\frac{\sum\limits_{i=1}^{N}{(x_i-\overline{x})^2}}{N-1}} = \sqrt{s^2}=s$   ← <strong><em>standard deviation </em></strong>

&nbsp;

As well, SPSS will use this (<em>N</em>-1) version of the formula when working with variables in a dataset.[/footnote]

&nbsp;
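Python's standard library happens to implement both versions of the formula, which makes for a quick check (again, a generic illustration rather than the book's SPSS procedure): <code>pstdev</code> divides the sum of squares by <em>N</em>, while <code>stdev</code> divides by <em>N</em>-1, the version SPSS uses.

```python
import statistics

siblings = [2, 1, 4, 2, 1, 0, 3]

# pstdev: the "population" formula used in the text (divide by N).
# stdev: the N-1 formula SPSS applies to variables in a dataset.
sigma = statistics.pstdev(siblings)
s = statistics.stdev(siblings)

print(round(sigma, 2))  # → 1.25
print(round(s, 2))      # → 1.35
```

With only seven cases, the two versions differ noticeably; with hundreds of cases they would be nearly identical.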

Now that we know how to get back to standard units, let's do that for the two examples we used. We had a variance of <em>σ<sup>2</sup></em> = 15.21 for <em>hours worked per week</em> in the previous section and a variance of <em>σ<sup>2</sup></em> = 1.55 for <em>number of siblings</em> in the example above. Square-rooting gives us the following:

&nbsp;

$$\sqrt{\sigma^2}=\sqrt{15.21}=3.9$$

&nbsp;

and

&nbsp;

$$\sqrt{\sigma^2}=\sqrt{1.55}=1.25$$

&nbsp;

Now <em>these</em> we <em>can</em> interpret: on average, your hours worked per week deviated from the mean of 8.7 hours per week by 3.9 <em>hours</em>, and your friends deviated from the average number of siblings, 1.9, by 1.25 <em>siblings</em>.

&nbsp;

To repeat, <strong>the standard deviation is the square root of the variance. The standard deviation is a measure of dispersion which gives us the average deviation of the cases from the mean.</strong> (Technically, it is the square root of the average squared distance from the mean, expressed in the variable's original units.)

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 4.2 Longevity of The First Fifteen Canadian Prime Ministers</em></p>

</header>
<div class="textbox__content">

&nbsp;

Calculate the variance and standard deviation of the longevity of the first fifteen Prime Ministers of Canada. In chronological order (starting with Macdonald and ending with Pierre Trudeau), their ages at the time of death were: 76, 70, 72, 49, 93, 94, 77, 82, 86, 75, 76, 91, 83, 75, and 80. Interpret your results (i.e., explain what you have found beyond "the standard deviation is ...").

&nbsp;

You can use a table like Table 4.4 to organize your calculations. (Hint: Start by calculating the mean age at death, $\overline{x}$, and round it up to a whole number to make your job easier.) Here $x_i$ is age at death for each PM and <em>N</em>=15.

&nbsp;

You can check your answers in this footnote.[footnote]The mean is 79 years (rounded up from 78.6); the sum of squares is 1,724; the variance is 114.9; the standard deviation is 10.7 years. However, if you calculated the variance and standard deviation with <em>N</em>-1 in the denominators, you will get a variance of 123.1 and a standard deviation of 11.1 years. The difference is as large as it is due to the small <em>N</em>. Had we been working with a real dataset of hundreds or thousands of cases, the difference between the just-<em>N</em> and <em>N</em>-1 versions of the formulas would have been less pronounced.[/footnote]

</div>
</div>
&nbsp;
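If you want to double-check your hand calculation, a few lines of Python will do it (a hypothetical check, using the rounded mean of 79 as the hint suggests):

```python
# Ages at death of the first fifteen Canadian Prime Ministers.
ages = [76, 70, 72, 49, 93, 94, 77, 82, 86, 75, 76, 91, 83, 75, 80]

N = len(ages)                 # N = 15
mean = round(sum(ages) / N)   # 78.6, rounded up to 79 per the hint

sum_of_squares = sum((x - mean) ** 2 for x in ages)
variance = sum_of_squares / N  # the just-N version of the formula
std_dev = variance ** 0.5

print(mean, sum_of_squares, round(variance, 1), round(std_dev, 1))
```

Swapping the denominator for <code>N - 1</code> gives the SPSS-style answers instead.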

Of course, one wouldn't normally calculate variances and standard deviations by hand: we did it here only so that, by obtaining the measures ourselves, you understand what they are and what they really provide us with. Usually, however, we simply use SPSS.

&nbsp;
<div class="textbox textbox--key-takeaways"><header class="textbox__header">
<p class="textbox__title"><em>SPSS Tip 4.2 Obtaining Variance and Standard Deviation</em></p>

</header>
<div class="textbox__content">
<ul>
 	<li>From the <em>Main Menu</em>, select <em>Analyze,</em> then<em> Descriptive Statistics, </em>and then<em> Frequencies</em>;</li>
 	<li>Select your variable of choice from the list on the left and use the arrow to move it to the right side of the window;</li>
 	<li>Click on the <em>Statistics</em> button on the right;</li>
 	<li>In this new window, check <em>Variance</em> and <em>Standard deviation</em> in the <em>Dispersion</em> section at the bottom left;</li>
 	<li>Click <em>Continue</em>, then <em>OK</em>.</li>
 	<li>The <em>Output</em> window will provide a table with the requested measures.</li>
 	<li>Make sure you know how to interpret your results! (Try to use as little statistics jargon as possible.)</li>
</ul>
</div>
</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1647</wp:post_id>
		<wp:post_date><![CDATA[2019-08-15 16:30:33]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-15 20:30:33]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[4-4-standard-deviation]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>26</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2 Probability Basics</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-probability-basics/</link>
		<pubDate>Thu, 15 Aug 2019 22:35:44 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1677</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Whenever we talk about <strong>the likelihood of some future event taking place</strong>, we talk about <strong><em>probability</em></strong>. This likelihood serves as a prediction -- what we can expect to happen or not happen. For example, people might mention the odds of winning the lottery or the probability of being hit by lightning, discuss the fact that it's likelier to die in a car accident than in an airplane one, or note that the odds of having a baby girl are the same as the odds of having a baby boy. Sociologists in particular are typically interested in an individual's life chances -- things like the probability of going to college, of being unemployed, or of having a high-paying job -- and in comparing these probabilities based on characteristics like race/ethnicity, gender, socioeconomic class, religion, sexual orientation, etc.

&nbsp;

Probability is predicated on uncertainty; as the old song goes, "the future's not ours to see." We use probabilities to manage that uncertainty, usually by quantifying it. For example, life expectancy at birth is the predicted longevity of a newborn (given current death rates). You might even have made important decisions and choices based on odds and likelihoods (i.e., on probabilities). An entire industry -- betting and gambling -- is based on the fact that we don't know what <em>will</em> happen but we nevertheless try to predict what <em>might</em> happen.

&nbsp;

Given this dealing with uncertainty and prediction, it shouldn't be too surprising that probability is completely and entirely <em>theoretical</em>. It's an <em>expectation</em> about the future, which can't be anything but abstract. (After all, if something had already happened and become reality, we wouldn't need to predict it or to discuss its probability of occurring.)

&nbsp;

Let's start with an example which is familiar to absolutely everyone, usually from an early age. At some point in your life you have likely uttered the phrase "there's a fifty-fifty chance of..." Like "I didn't do too well on my last test, so now there's a fifty-fifty chance I'll pass the course." Or "the traffic looks bad but it might clear up; I still have a fifty-fifty chance of making it to the job interview on time." Or "this plan has a fifty-fifty chance of success." Or even "these nachos look disgusting; you have a fifty-fifty chance of getting food poisoning."

&nbsp;

<em>A fifty-fifty chance</em> of course means <em>an equal probability of something happening or not</em>; out of two possible outcomes, either can occur with equal likelihood, so it's impossible to predict in favour of either.

&nbsp;

I'm sure you know that the fifty-fifty chance expression comes from the impossibility of predicting the outcome of a flipped coin: be it heads or tails. Assuming a coin cannot possibly fall on its edge, when flipped it has only two outcomes, represented by its two sides: falling as heads or as tails. Thus, the probability of its falling on a side (100 percent) is divided by two -- giving us a 50 percent chance to get heads and a 50 percent chance to get tails.

&nbsp;

The 50/50 percent is a <em>prediction</em>. The moment the coin falls, one outcome has been realized and the prediction no longer applies because the event is no longer in the future. The distinction between the <em>factual</em> reality (the event has happened) and the <em>theoretical</em> probability[footnote]Note that the theoretical probability is still grounded in the reality of there being only two possible outcomes. Thus predictions we base on probability are not wild, baseless guesses but a product of rational thinking and calculations.[/footnote] (of the event happening) might seem trivially easy to make at this point, but it's nevertheless very important. Keep it in mind; you'll need it for what's to come.

&nbsp;

Imagine you flip a coin two times in a row. Can you predict that you'll get heads once and tails the other time? Is it possible that you get heads twice in a row? What if you flip a coin ten times? Would you get tails exactly 5 times and heads exactly 5 times? Or could you perhaps get 3 heads and 7 tails? What about 9 times heads and 1 time tails? And what if you flip a coin a hundred times? Or more?

&nbsp;

You might have already reasoned it out, or you might even have tried it at some point: it's quite possible to flip a coin and get the same side twice in a row. Or three times. Or four times. Or more. (It's even possible to flip heads ten out of ten times in a row... or even a hundred out of a hundred. In this case <em>possible</em> means that there is such a probability, as small as it is. <em>Possible</em> doesn't necessarily mean <em>plausible</em>.) How do you reconcile this with the knowledge that the probability of getting heads is 50 percent?

&nbsp;

And that -- the <em>probability</em> -- is just it. We know that <em>theoretically</em>, with each toss the coin can fall as either heads or tails, so the prediction/expectation is a fifty-fifty chance. We know that <em>in theory</em>, if we flipped coins <em>forever</em>, heads and tails would each average out at 50 percent of the time[footnote]This website provides a neat visualization of both the probability/expectation and a digital coin toss: <a href="https://seeing-theory.brown.edu/basic-probability/index.html">https://seeing-theory.brown.edu/basic-probability/index.html</a>. There you can try flipping the coin 100, even 1000 times, and see that the larger the number of flips, the closer you get to the fifty-fifty expectation. The same website allows you to throw a die and to pick a card out of ten consecutively numbered cards.[/footnote]. We can't flip coins forever, however, so it's possible to get a different distribution of outcomes in any finite number of flips (but the larger the number of flips, the likelier we are to get to 50/50 percent, or close[footnote]You can find more on this property of large numbers in Chapter 6.[/footnote]).

&nbsp;

Thus there is no contradiction in <em>theoretically expecting</em> a fifty-fifty chance of flipping tails out of, say, ten tosses and <em>actually</em> <em>getting</em> heads 6 times and tails only 4, as I'm sure you know. The former is a probability distribution, the latter is the observed, actual frequency distribution of the cases/observations/data. Keep this thought too.
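The gap between the theoretical fifty-fifty expectation and an actual, finite run of flips is easy to see in a quick simulation (a generic Python sketch, much like the interactive coin toss on the website mentioned above):

```python
import random

random.seed(1)  # fix the "luck" so the run is reproducible

# Flip a fair coin n times and report the observed share of heads.
def share_of_heads(n):
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# The theoretical probability is always 0.50, but short runs wander;
# longer runs settle closer and closer to the fifty-fifty expectation.
for n in (10, 100, 1000, 100000):
    print(n, share_of_heads(n))
```

Run it a few times with different seeds: the ten-flip share jumps around (you may well see 0.3 or 0.7), while the hundred-thousand-flip share stays very near 0.5.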

&nbsp;

Before we continue on to something more novel and exciting than the old coin-toss example, however, let's formalize our discussion a bit.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1677</wp:post_id>
		<wp:post_date><![CDATA[2019-08-15 18:35:44]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-15 22:35:44]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-probability-basics]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.1.1 Properties of the Normal Curve</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-1-1-properties-of-the-normal-curve/</link>
		<pubDate>Mon, 19 Aug 2019 23:02:46 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1694</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Recall that we describe a distribution via three things: its shape, its central tendency measures, and its measures of dispersion. <strong>The</strong> perfect (i.e., theoretical) <strong>normal distribution thus has three defining features.</strong>

&nbsp;

First, the normal curve is <strong>bell-shaped and perfectly symmetric</strong> (i.e., if you bisect it in the middle, the left side will be identical to the right side).[footnote]It's also asymptotic to the horizontal axis line, i.e., it gets as close to it as possible in the "tails" without ever touching it. More on this after you learn about probabilities.[/footnote]

&nbsp;

Second, the normal curve is <strong>centered on the mean</strong>, which also happens to be equal to its median and mode. That is, for the normal curve <strong>all measures of central tendency fall on the same value</strong>.

&nbsp;

Third, <strong>the normal curve's standard deviation tells us what percentage of observations fall within a specific distance from the mean</strong>. When we have a normal curve, the area below the curve contains 100 percent of all observations. Then, 68 percent of all observations fall within 1 standard deviation from the mean[footnote]Given the symmetry, this means 34 percent fall within -1 standard deviation below and 34 percent fall within +1 standard deviation above the mean.[/footnote]; 95 percent of observations fall within about 2 standard deviations from the mean[footnote]That is, 47.5 percent fall within about -2 standard deviations below the mean and 47.5 percent fall within about +2 standard deviations above the mean.[/footnote]; and 99 percent of observations fall within about 3 standard deviations from the mean[footnote]That is, 49.5 percent fall within about -3 standard deviations below the mean and 49.5 percent fall within about +3 standard deviations above the mean.[/footnote]. Fig. 5.3 illustrates.

&nbsp;

<em>Figure 5.3 Normal Curve with Standard Deviations</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-with-standard-deviation.png" alt="" width="754" height="348" class="wp-image-1705 aligncenter" />

&nbsp;

If you imagine Fig. 5.3 superimposed on an approximately normally distributed variable's histogram, you can see what percentage of observations will fall within 1, 2, and 3 standard deviations from the mean. (Obviously, the mean is at 0, since the normal curve is centered on the mean, which is neither below nor above itself -- i.e., "the mean is 0 standard deviations away from the mean," as awkward as that sounds.)
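For the curious, these percentages can be reproduced directly from the normal curve: Python's standard library ships a <code>NormalDist</code> class that does the area-under-the-curve work for us (a generic illustration, not part of the book's SPSS toolkit):

```python
from statistics import NormalDist

z = NormalDist()  # the standard normal curve: mean 0, standard deviation 1

# Share of the area under the curve within k standard deviations of the mean.
def within(k):
    return z.cdf(k) - z.cdf(-k)

print(round(within(1), 3))  # → 0.683 (the "68 percent")
print(round(within(2), 3))  # → 0.954 (the "about 95 percent")
print(round(within(3), 3))  # → 0.997 (the "about 99 percent")
```

Note the exact figures: the often-quoted 95 and 99 percent are themselves roundings of 95.4 and 99.7 percent.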

&nbsp;

Let's make sure this makes sense to you in applied terms, through the example below.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 5.1 Normally Distributed Test Scores (Hypothetical Data)</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine your statistics class has taken a test. The average test score is 65 with a standard deviation of 10 and the following distribution of scores. (You can imagine a histogram whose many bars follow the curve in the three panels of Fig. 5.4 below.)

&nbsp;

<em>Figure 5.4 (A) Test Scores within 1 Standard Deviation</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-68percent-.png" alt="" width="870" height="401" class="wp-image-1707 size-full alignleft" />

</div>
<em>Figure 5.4 (B) Test Scores within About 2 Standard Deviations</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-95percent-.png" alt="" width="708" height="358" class="aligncenter wp-image-1708 " />
<div class="textbox__content">

<em>Figure 5.4 (C) Test Scores within About 3 Standard Deviations</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-test-scores-99percent-.png" alt="" width="898" height="454" class="wp-image-1709 size-full alignleft" />

&nbsp;

&nbsp;

&nbsp;

&nbsp;

<span style="text-align: initial;text-indent: 2em;font-size: 0.9em">Given the properties of the normal curve, we now know that 68 percent of students in the class scored between 55 and 75 (i.e., between -1 and +1 standard deviations from the mean, and since the standard deviation is 10, then $65-10=55$ and $65+10=75$). We also know that 95 percent of students scored approximately between 45 and 85 (i.e., between about -2 and +2 standard deviations from the mean, or $65-2(10)=65-20=45$ and $65+2(10)=65+20=85$). Finally, we know that 99 percent of students (almost everyone!) scored approximately between 35 and 95 (i.e., between -3 and +3 standard deviations from the mean, or $65-3(10)=65-30=35$ and $65+3(10)=65+30=95$).</span>

&nbsp;

<span style="text-align: initial;text-indent: 2em;font-size: 0.9em">As is typical of normal distributions, the majority of scores (68 percent) are clustered in the middle (within -1 and +1 standard deviations) around the mean; the remaining 32 percent are split between the "tails" of the distribution, with about 16 percent in each "tail" beyond -1 and beyond +1 standard deviation from the mean. Only 5 percent of test scores fall farther than about -2 and +2 standard deviations from the mean, with just 2.5 percent at the tip of each "tail". And at the very, very far ends of the "tails", beyond -3 and +3 standard deviations from the mean, you have 1 percent split between them -- so a minuscule 0.5 percent of students have a score below 35 and another 0.5 percent have a score above 95.</span>

</div>
&nbsp;

</div>
&nbsp;
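The arithmetic in Example 5.1 generalizes to any approximately normal variable: the range covered by <em>k</em> standard deviations is simply the mean plus or minus <em>k</em> times the standard deviation. A tiny sketch (with hypothetical values matching the example):

```python
# Score ranges covered by 1, 2, and 3 standard deviations around the mean,
# for the hypothetical test of Example 5.1 (mean 65, standard deviation 10).
mean, sd = 65, 10

for k, share in ((1, "68%"), (2, "~95%"), (3, "~99%")):
    low, high = mean - k * sd, mean + k * sd
    print(f"within {k} SD: {low}-{high} ({share} of scores)")
```

Change <code>mean</code> and <code>sd</code> to match any other normally distributed variable and the same three ranges fall out.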

These features of the normal distribution (symmetrical, centered on the mean/median/mode, measurable in standard deviations from the mean) make it very useful to work with. At the same time, you can now begin to see why the standard deviation is the most popular measure of dispersion: it has a unique relationship with the normal curve.

&nbsp;

Can we find more uses of the normal distribution? Read on to find out.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1694</wp:post_id>
		<wp:post_date><![CDATA[2019-08-19 19:02:46]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-19 23:02:46]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-1-1-properties-of-the-normal-curve]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 1 Variables and Their Measurement</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/main-body/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/2018/10/31/main-body/</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Naturally, we start with preliminaries. Before you learn the tools of any trade, you need to learn about your subject matter, i.e., what you will be applying those tools to. In this chapter I introduce you to the "building blocks" of statistics: the concept of variables and some related vocabulary. You will learn what variables are and about their levels of measurement (what nominal, ordinal, interval, and ratio scales are); how to determine the level of measurement of an actual existing variable; and whether you should treat variables as discrete or as continuous for the purposes of statistical analysis.

&nbsp;

Think of this chapter as the one establishing the main characters of a fictional story -- the characters might seem too many at first, appearing too fast one after the other, so initially it might be hard to keep track of who is who and who does what. In time, however, as you read more about them (sometimes going back to re-read key passages), they become familiar to you; then and only then can you comfortably follow their story.
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>3</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[main-body]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Introduction</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/introduction/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/2018/10/31/introduction/</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

This book is intended to be your "first date" with statistics. It might end up as your <em>last</em> date with statistics too, so I'll try to make the most of it while given the chance.

&nbsp;

The book is organized as follows. Applied statistics is about data. Chapters 1 and 2 introduce you to concepts like variables and data sets and the type of information collected therein, and generally cover all the preliminaries you need to know in order to start 'doing' statistics. Chapters 3 and 4 follow with the ways we can summarize and describe data. Altogether, this first part of the book is usually called <em>descriptive statistics</em>; it allows us to learn things from and about data that in many cases we cannot readily see just from looking at it.

&nbsp;

I have devoted Chapters 5 and 6 to some theoretical concepts which are necessary to continue with the rest of the book, i.e., the part usually referred to as <em>inferential statistics</em>. You see, statistics would have a rather limited value if all it allowed us to do were to summarize or <em>describe</em> data (as useful as that is). The real power of statistics comes from <em>prediction</em> and <em>estimation</em> (i.e., <em>inference</em>), the subjects of the latter part of the book. In Chapters 7 through 10 you will learn how and why we can know things that go beyond the actual data we have; how likely they are and how confident we can be in this newfound knowledge; what it means for variables to be statistically associated; and, finally, whether we can identify causes and effects in the social world with any amount of certainty.

&nbsp;

At this point, when promising all this to my students I usually feel like a charlatan at a county fair: <em>Come one, come all, I'll look at my crystal ball and the palm of your hand and tell you things I cannot possibly know</em>. After all, yes, alright, describing data you can see is one thing -- but this <em>inference</em> thing?.. However, the more you learn about statistics and statistical tools and methods, the less (and less, and less) it will feel like charlatanry (I promise). Like many things in science, it only <em>looks</em> <em>like</em> charlatanry at first blush because you lack the knowledge of the principles that make the seemingly impossible, possible. In reality, what you will be learning in this book is not even all that complicated. If you don't believe me yet, check it yourself -- just promise to go consecutively and patiently through all the parts until the end -- no skipping!

&nbsp;

So, ready to go?]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>4</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[open]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[introduction]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<category domain="front-matter-type" nicename="introduction"><![CDATA[Introduction]]></category>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Authors</title>
		<link>https://pressbooks.bccampus.ca/simplestats/authors/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/authors/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>7</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[authors]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>Cover</title>
		<link>https://pressbooks.bccampus.ca/simplestats/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/cover/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>8</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[cover]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>Table of Contents</title>
		<link>https://pressbooks.bccampus.ca/simplestats/table-of-contents/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/table-of-contents/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>9</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[table-of-contents]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>About</title>
		<link>https://pressbooks.bccampus.ca/simplestats/about/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/about/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>10</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[about]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>Buy</title>
		<link>https://pressbooks.bccampus.ca/simplestats/buy/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/buy/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>11</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[buy]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>Access Denied</title>
		<link>https://pressbooks.bccampus.ca/simplestats/access-denied/</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/access-denied/</guid>
		<description></description>
		<content:encoded><![CDATA[<!-- Here be dragons. -->]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>12</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[access-denied]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[page]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
	</item>
	<item>
		<title>Book Information</title>
		<link>https://pressbooks.bccampus.ca/simplestats/?metadata=book-information</link>
		<pubDate>Wed, 31 Oct 2018 18:29:39 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/2018/10/31/book-information/</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>16</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:29:39]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:29:39]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[book-information]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>0</wp:menu_order>
		<wp:post_type><![CDATA[metadata]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<category domain="contributor" nicename="mariana-gatzeva"><![CDATA[Mariana Gatzeva]]></category>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_authors]]></wp:meta_key>
			<wp:meta_value><![CDATA[mariana]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[Simple Stats Tools]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_language]]></wp:meta_key>
			<wp:meta_value><![CDATA[en-ca]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_cover_image]]></wp:meta_key>
			<wp:meta_value><![CDATA[https://pressbooks.bccampus.ca/wp-content/plugins/pressbooks/assets/dist/images/default-book-cover.jpg]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Preface</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/foreword/</link>
		<pubDate>Wed, 31 Oct 2018 18:30:59 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=front-matter&#038;p=20</guid>
		<description></description>
		<content:encoded><![CDATA[<p class="indent">I have dedicated this book to my statistics students, former and future, all of them. Future, because it's all for them; they'll be the ones making use of it.</p>
&nbsp;

Former, because over the years they have been showing me (and, in many cases, telling me in no uncertain terms and with great emotion) how their first experience with statistics went. Because, somehow, along the way they have also taught me how to teach statistics to <em>them</em>. Not to a mass of generalized "undergraduate social science students in an introductory stats class," with my initial preconceived idea of these students' abilities, prior knowledge, and needs, no -- but to the actual <em>them</em>, the very real people I see in my classes. During almost ten years of teaching "SOCI 2365 Introduction to Social Research Statistics" at Kwantlen Polytechnic University, I have learned how best to approach teaching stats to <em>my</em> students, in accordance with their actual academic needs and their actual academic abilities.

&nbsp;

So who are the students in my classes? (Forgive me, now I'll have to generalize after all.) The typical student in my introductory stats class tends to be there because they have to (the course is compulsory for our major, along with a handful of others); is majoring in sociology; is likely "not very good with math" and, therefore, has delayed taking the course as long as possible because, understandably, they are terrified. I could have used "she is" instead of the gender-neutral "they are" -- I typically have more female than male students. That is not to say that students not fitting this profile don't take my class; they do, and they're not few. This example simply gives me the opportunity to give you a taste of what the book will be about: statistics and sociology.

&nbsp;

See, "tends to", "likely", and "on average" are all terms with specific statistical meaning (as much as they can be misused and misinterpreted in conventional, everyday usage) -- but you'll have to go further into the book for that. However, I can easily tell you that I also have students, many of them, who are <em>not</em> majoring in the social sciences, are in their second year (as they are supposed to be), are great with math, and who find the course easy. Of course, many of my students are also male. Obviously, none of what I just said contradicts the description of my typical student (and if it's not obvious, you <em>definitely</em> need this book). The "typical student" description is simply based on a brief statistical profile of an average class I usually have. The various characteristics I listed may or may not be statistically associated with each other, to say nothing of <em>causal</em> association. (Were you perhaps thinking that, say, women in my classes are the ones "not good with math" while men "find it easy"? I never said that, nor even implied it. But now you see how easily statistical information can be misinterpreted and how statements based on statistical information can be taken to mean more than they actually do.)

&nbsp;

Why sociology, though? The description above can lead us to a few questions (i.e., we can formulate hypotheses), like: are students majoring in sociology (or other social sciences, except economics) really more likely to say they are "not good at math" than, say, students in the natural sciences? For that matter, are women on average more likely to major in the social sciences and humanities than in the STEM (science, technology, engineering, and mathematics) fields? The answers to these questions can be found through statistical analysis (both are "yes," by the way), but the explanations (or theories) -- i.e., <em>why</em> we observe the relationships between gender, major, and perceived math ability -- are profoundly sociological.

&nbsp;

In a similar vein, throughout this book I will bring up questions of sociological relevance, I will refer to sociological theories, research, and findings, I will give sociological examples, and ultimately I will use sociological data.

&nbsp;

Why does that matter? Stats is stats, right? Hmm, yes and no -- and in the case of applied statistics, as this text is, rather no. Yes: if you go by the table of contents, you'll see what one typically sees in a generic introductory statistics book (for social scientists); statistics is a set of tools, and it can be presented as generically and as generally as possible. However, like any tool, its value is higher the more specialized it is (you <em>can</em> take an ailing tooth out with a hammer, yet arguably it's better to use specialized dental equipment). Like any tool, it also matters what it is used for and how.

&nbsp;

In other words, in this book the statistics instruction will be specialized: from a sociologist (granted, herself specialized in social statistics) for sociologists. (If you are neither a sociology student nor a sociology instructor, you can take this as a sort of <em>caveat emptor</em> clause: buyer beware.) To the extent that sociology itself is a rather broad discipline and its use of statistics is equally broad, one could use the book as an introduction to social science statistics. However, I do not go out of my way to engage with statistical instruments more frequently used in, say, criminology or psychology (i.e., small-size court case data, or experiment data, etc.).

&nbsp;

I'll give you a different example: if you open an introductory psychology textbook, you will likely find a chapter on Sexuality and Gender. Yet "gender" and "sexuality" are also huge topics in sociology, and any introductory sociology textbook also has a chapter on them. There will be some overlap in the treatment of the topic by the two disciplines, but you'd be wrong to expect everything -- or even most of it -- to be the same.

&nbsp;

Simply put, psychologists and sociologists generally tend to ask different questions, to approach a topic differently, to have different concerns, to have different preferred methods for collecting and analyzing (quantitative) data, even to reach different conclusions, and therefore to offer different theories (as one would expect from two separate disciplines). Why wouldn't we want specialized statistics for each discipline?

&nbsp;

Think of this book as a crash course in statistics. As such, I make these promises:

1) I promise to include only what is absolutely necessary.

2) I promise to skip the fluff and padding and any other material that is not strictly relevant to the exposition.

3) I promise to avoid repetitiveness as much as possible and instead explain everything only once but slowly and patiently.

&nbsp;

Given my promises, this book provides a necessarily brief introduction to statistics. It is also a conventional introduction in that, like almost all such books, it does not include all there is to know about some of the more complex concepts, i.e., it is not entirely truthful.

&nbsp;

Don't be alarmed by this admission. Rather, think of this introduction as your first date with statistics. No one tells all and bares all their secrets on a first date, do they? (...Or it might be their last.) Some things need to be revealed at a later time, once you've come to know your love interest better. Statistics is like that too. Some advanced concepts and relatively new developments in the discipline would make sense to you only after the initial period of getting to know it has passed; then you can learn more "truthfully" and understand in what way and why the tools and concepts were simplified when they were first introduced to you.

&nbsp;

And if you never get to "a second date" with statistics, never fear. What you will learn from this brief introduction will be quite practical "in real life" and will still serve you well. (You'll just know there is more to what you've learned -- but that's the case with everything, no?) You will learn the basics of summarizing data and extracting useful information from it; how data can be manipulated, and how and why not to do that; how and when you can generalize from data, and the limits to your generalizations; what role probability and uncertainty play in statistics; how to interpret basic statistical information; what to look for in existing statistical reports; and how to execute a basic statistics report on your own. You will learn how to talk about statistics, and how to write about statistics. Finally, you will learn where to go from here, should you ever feel like going on a second date with statistics after all.

&nbsp;

Given the purposefully streamlined content, some will not like this book. If you are an instructor (or a student) looking for a theoretically comprehensive and expansive introductory treatment of statistics, this is not the book for you -- but then, as you know, many such books exist, freely available online or otherwise. Statisticians will likely be severely displeased by some of the things missing here, as compared to a truly conventional introductory statistics text.

&nbsp;

But this is indeed why this book exists at all: to include only what I've discovered my students need in order to have a basic working knowledge of the most useful and most frequently used simple stats tools.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>20</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:30:59]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:30:59]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[foreword]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Acknowledgements</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/acknowledgements/</link>
		<pubDate>Wed, 31 Oct 2018 18:38:01 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=front-matter&#038;p=38</guid>
		<description></description>
		<content:encoded><![CDATA[]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>38</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:38:01]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:38:01]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[acknowledgements]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Cover</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/cover/</link>
		<pubDate>Thu, 08 Nov 2018 00:09:35 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=front-matter&#038;p=166</guid>
		<description></description>
		<content:encoded><![CDATA[<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/07/stats-tools.jpg" alt="" width="776" height="582" class="alignnone wp-image-1414 " />]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>166</wp:post_id>
		<wp:post_date><![CDATA[2018-11-07 19:09:35]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-11-08 00:09:35]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[cover]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Title page</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/title-page/</link>
		<pubDate>Thu, 08 Nov 2018 00:09:47 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=front-matter&#038;p=168</guid>
		<description></description>
		<content:encoded><![CDATA[<h1 style="text-align: center"><strong><span style="text-align: center">Simple Statistics Tools </span></strong></h1>
<h1 style="text-align: center"><strong><span style="text-align: center">(for Sociologists)</span></strong></h1>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>168</wp:post_id>
		<wp:post_date><![CDATA[2018-11-07 19:09:47]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-11-08 00:09:47]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[title-page]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Dedication</title>
		<link>https://pressbooks.bccampus.ca/simplestats/front-matter/dedication/</link>
		<pubDate>Wed, 14 Nov 2018 21:53:14 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=front-matter&#038;p=176</guid>
		<description></description>
		<content:encoded><![CDATA[To all my students, former, present, and future.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>176</wp:post_id>
		<wp:post_date><![CDATA[2018-11-14 16:53:14]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-11-14 21:53:14]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[dedication]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[front-matter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2.4 The Real Normal Distribution Is a Probability One</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-4-the-real-normal-distribution/</link>
		<pubDate>Thu, 22 Aug 2019 01:52:34 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1759</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Now back to the normal distribution, as promised.

&nbsp;

Recall, if you will, the distinction between discrete and continuous variables[footnote]We discussed this in Section 1.5, here: <a href="https://pressbooks.bccampus.ca/simplestats/chapter/1-5-discrete-and-continuous-variables/">https://pressbooks.bccampus.ca/simplestats/chapter/1-5-discrete-and-continuous-variables/</a>.[/footnote]. Flipping coins, throwing dice, and selecting respondents from a small number of categories all produce discrete outcomes, so their probability distributions are also discrete.

&nbsp;

On the other hand, continuous variables (i.e., mostly interval/ratio variables) have continuous probability distributions. <strong>The normal distribution</strong> -- whose features we discussed at length -- <strong>is one type of continuous probability distribution.</strong>

&nbsp;

As well, recall that probabilities are expectations. Thus, while some continuous random variables might have an approximately normal <em>observed</em> distribution, their <em>probability</em> distribution (i.e., expected in theory) is perfectly normal -- because it's theoretical.

&nbsp;

I said it before and it bears repeating: just like a few coin flips can produce an unequal number of heads and tails despite the fact that the probabilities of getting heads or tails are both equal to 0.5 <em>in theory</em>, a variable can have an approximately normal frequency distribution while its probability distribution is theoretically normal. In short, we can <em>expect</em> some continuous variables to be normally distributed. For example, we can <em>expect</em> most people to be of average height or thereabouts, few people to be much shorter or much taller, and the shortest and the tallest to be so rare as to be exceptional.

&nbsp;

This, however, is actually<em> not</em> why the normal distribution is so important in statistics. Why should we care about "some variables" and whether their distribution is normal or only approximately so? (Well, we do use that information, of course, but that's not the point here.) <strong>The reason the normal distribution is so valuable is that one specific, very special distribution is normal -- the sampling distribution</strong>, as we will see in Chapter 6. (The sampling distribution lies at the basis of statistical inference.) But let's not get ahead of ourselves.

&nbsp;

After all this, you can see the normal distribution as a <em>normally distributed probability</em>. (Or, instead of as a frequency distribution, as a relative frequency distribution.) Thus, the area under the normal curve is equal to 1 (or 100 percent, the whole probability), and it can be sectioned off, as it were, to indicate various outcomes' probabilities. See the following set of Figures 5.6.

&nbsp;

<em>Figure 5.6 (A) Probability of 1 (100%)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-100-percent.png" alt="" width="898" height="454" class="aligncenter wp-image-1868 size-full" />

&nbsp;

<em>Figure 5.6 (B) The Mean Gives Us Two Identical (Symmetric) Parts of 50% Probability Each</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-50-50-percent.png" alt="" width="898" height="454" class="aligncenter wp-image-1869 size-full" />

&nbsp;

<em>Figure 5.6 (C) 1 Standard Deviation from the Mean Sections Off 68% Probability</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-68-percent.png" alt="" width="898" height="454" class="aligncenter wp-image-1870 size-full" />

&nbsp;

<em>Figure 5.6 (D) About 2 Standard Deviations from the Mean Section Off 95% Probability</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-95-percent.png" alt="" width="898" height="454" class="aligncenter wp-image-1871 size-full" />

&nbsp;

<em>Figure 5.6 (E) About 3 Standard Deviations from the Mean Section Off 99% Probability</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-99-percent.png" alt="" width="898" height="454" class="aligncenter wp-image-1872 size-full" />

&nbsp;

Thus, apart from what percentage of cases falls where, we can now discuss the probability that a case will fall in a particular place. Both refer essentially to the same thing, but the latter indicates the <em>theoretical expectation</em> and allows us to be more precise (as empirically, cases are only approximately normally distributed). Or, you can think of it like this: given the properties of the normal probability distribution, we can <em>expect</em> a given percentage of the data to be within a given number of standard deviations from the mean.
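(An illustrative aside of mine, not part of the book's required toolkit: if you are curious, these expected percentages can be checked numerically. Python's standard library provides the error function, from which the standard normal's cumulative probability follows; the helper name <em>normal_cdf</em> below is my own.)

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Probability of a case falling within k standard deviations of the mean:
for k in (1, 2, 3):
    p = normal_cdf(k) - normal_cdf(-k)
    print(f"within {k} SD of the mean: {p:.4f}")
# prints approximately 0.6827, 0.9545, and 0.9973 --
# i.e., the 68%, 95%, and 99% of Figures 5.6 (C), (D), and (E)
```

(The exact values, 68.27%, 95.45%, and 99.73%, are why the figure captions say "about" 2 and "about" 3 standard deviations.)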

&nbsp;

You'll see how the normal curve allows us to calculate probabilities through <em>z</em>-values in the next and (to your eternal relief) final section on the topic.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1759</wp:post_id>
		<wp:post_date><![CDATA[2019-08-21 21:52:34]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-22 01:52:34]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-4-the-real-normal-distribution]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>9</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-1-the-real-use-of-z-values]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-2-the-real-use-of-z-values]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_wp_old_slug]]></wp:meta_key>
			<wp:meta_value><![CDATA[5-2-4-the-real-use-of-z-values]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2.1 Working with Probabilities</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-1-calculating-probabilities/</link>
		<pubDate>Fri, 23 Aug 2019 02:15:31 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1784</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

<strong>We express probabilities as proportions</strong> (and we also denote them with <em>p, </em>just like we do proportions[footnote]If you need a reminder, the relevant part is in Section 2.3.1, here: <a href="https://pressbooks.bccampus.ca/simplestats/chapter/2-3-1-adding-percentages">https://pressbooks.bccampus.ca/simplestats/chapter/2-3-1-adding-percentages</a>/[/footnote]), as this is indeed what they are:

&nbsp;

$$p=\frac{\textrm{number of specific outcomes we are interested in}}{\textrm{number of all possible outcomes}}$$

&nbsp;

Or, the probability of a specific outcome is the proportion of the number of such outcomes out of the number of all possible outcomes.

&nbsp;

Thus the probability of getting heads in a coin toss is:

&nbsp;

$$p(\textrm{heads})=\frac{\textrm{number of heads sides of a coin}}{\textrm{number of all sides of a coin}}=\frac{1}{2}=0.5$$

&nbsp;

The same of course applies to tails:

&nbsp;

$$p(\textrm{tails})=\frac{\textrm{number of tails sides of a coin}}{\textrm{number of all sides of a coin}}=\frac{1}{2}=0.5$$

&nbsp;

Heads and tails together exhaust all possible outcomes, so the probability that a coin will fall on any of its two sides is:

&nbsp;

$$p(\textrm{heads or tails})=\frac{2}{2}=\frac{1}{2}+\frac{1}{2}=0.5+0.5=1$$

&nbsp;

Now how about we extend our example to something that has more than two outcomes? With six sides, a conventional die will serve us perfectly.

&nbsp;

Following the same logic as with the coin, the probability of throwing, say, a five is:

&nbsp;

$$p(\textrm{five})=\frac{\textrm{number of "five" sides of a die}}{\textrm{number of all sides of a die}}=\frac{1}{6}=0.167$$

&nbsp;

The same goes for throwing a one, a two, a three, a four, or a six:

&nbsp;

$$p(\textrm{one})=p(\textrm{two})=p(\textrm{three})=p(\textrm{four})=p(\textrm{five})=p(\textrm{six})=\frac{1}{6}=0.167$$

&nbsp;

Or, imagine you have a bowl with ten balls inside, numbered from 1 to 10. The probability of randomly drawing any one of them (without looking!) is, you guessed it, 1 out of 10, as each number appears only once and there are ten possible outcomes:

&nbsp;

$$p(1)=p(2)=p(3)=\ldots=p(10)=\frac{1}{10}=0.1$$

&nbsp;

While this principle applies to <em>N</em> of any size -- so we can increase the number of outcomes as much as we want -- note <strong>the key prerequisite for the calculations to work: the outcomes must happen randomly.</strong> A coin toss and a die throw are classic examples of random chance. But when picking balls out of a bowl we have to make sure we don't look, or we might (consciously or subconsciously) <em>choose</em> one. Choosing a ball with a specific number introduces bias and thus invalidates randomness -- i.e., it invalidates the principle of the outcomes having the same probability. Without this principle we cannot calculate anything: the only way to know the probability of an outcome is, in a sense, to divide the total probability (i.e., 1) by the number of all possible outcomes, giving us equal probability for each. <strong>We know the probability of an outcome <em>only</em> <em>if</em> we know how many outcomes are possible in total and they all have the same probability. </strong>(Chapter 6 has more on this, as it's devoted to how random selection works.)
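The equal-probability formula above is simple enough to sketch in a few lines of code. This is purely an illustration (the helper name <em>probability</em> is mine, not part of the text):

```python
from fractions import Fraction

def probability(outcomes_of_interest, all_outcomes):
    # p = (number of specific outcomes we are interested in) /
    #     (number of all possible outcomes),
    # valid only when every outcome is equally likely (random selection).
    return Fraction(len(outcomes_of_interest), len(all_outcomes))

die = [1, 2, 3, 4, 5, 6]
print(probability([5], die))    # 1/6, i.e., about 0.167

balls = list(range(1, 11))      # the bowl with balls numbered 1 to 10
print(probability([7], balls))  # 1/10 = 0.1
```

Using Fraction keeps the results exact; calling float() on them gives the decimal versions used in the text.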

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1784</wp:post_id>
		<wp:post_date><![CDATA[2019-08-22 22:15:31]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 02:15:31]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-1-calculating-probabilities]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2.2 Simple Probability Calculations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-2-simple-probability-calculations/</link>
		<pubDate>Fri, 23 Aug 2019 17:21:48 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1802</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

This section is a brief side quest which shows you how to calculate combinations of probabilities. For example, and back to die throwing, what is the probability of throwing a two <em>or</em> a four?

&nbsp;

I'm certain you already know the answer. In this case there are two "outcomes of interest" instead of one, so the probability is two out of six possible outcomes:

&nbsp;

$$p(\textrm{two or four})=\frac{\textrm{number of outcomes we are interested in}}{\textrm{number of all outcomes}}=\frac{2}{6}=\frac{1}{3}=0.333$$

&nbsp;

Or I could have just as easily simply added the two outcomes' individual probabilities:

&nbsp;

$$p(\textrm{two or four})=\frac{\textrm{number of two's}}{\textrm{all outcomes}} + \frac{\textrm{number of four's}}{\textrm{all outcomes}}=\frac{1}{6}+\frac{1}{6}=\frac{2}{6}=0.333$$

&nbsp;

And this is it: <strong>to combine the probabilities of two outcomes which cannot happen at the same time (a.k.a. <em>disjoint events</em></strong>[footnote]You can recognize disjoint events by the use of "or": it's one or the other (or a third one, etc.). When flipping one coin, you can get either heads or tails; when you throw one die, you can get only one of its sides at a time. Hence, we add their probabilities.[/footnote]<strong>), you simply have to add them together.</strong> (Recall we already used this when we started with the probability of getting heads <em>or</em> tails being 1; it's simply the probability of getting heads (0.5) added to the probability of getting tails (0.5).)
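The addition rule for disjoint events can be checked with a quick sketch (illustrative only; the helper name <em>p</em> is mine):

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]

def p(event):
    # Probability of a set of disjoint outcomes on one die:
    # count the outcomes of interest over all six outcomes.
    return Fraction(len(set(event)), len(die))

# p(two or four) = p(two) + p(four) = 1/6 + 1/6 = 2/6
assert p({2, 4}) == p({2}) + p({4}) == Fraction(2, 6)
print(float(p({2, 4})))  # 0.333...
```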

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.4 Adding Probabilities</em></p>

</header>
<div class="textbox__content">

&nbsp;

Since we already imagined a bowl with ten consecutively numbered balls inside, let's save ourselves the effort of imagining a new one and reuse it again. What is the probability of randomly selecting the #5 ball <em>or</em> the #7 ball <em>or</em> the #9 ball?

</div>
<sub>(Answer: 0.3)</sub>

&nbsp;

</div>
&nbsp;

On the other hand, <strong>combining probabilities of events that <em>can</em> happen at the same time, or that happen one after another in time (both a.k.a. <em>independent</em> <em>events</em></strong>[footnote]Events are called independent when the outcome of one doesn't affect the outcome of the other whatsoever. (Contrast this with getting heads in a coin toss, which precludes getting tails; same with throwing any number on a die as it precludes the other numbers from being thrown.)[/footnote]<strong>)</strong> is a tad more complicated and <strong>requires multiplication.</strong>

&nbsp;

For example, the probability of throwing double two's with two dice (or throwing a two with one die and then immediately throwing another two) is:

&nbsp;

$$p(\textrm{double two's})=\frac{\textrm{number of two's (1st die)}}{\textrm{all outcomes (1st die)}}\times\frac{\textrm{number of two's (2nd die)}}{\textrm{all outcomes (2nd die)}}=$$

$=\frac{1}{6}\times\frac{1}{6}=\frac{1}{36}=0.028$

&nbsp;

Or, if we flip a coin three times (or three coins at the same time), the probability of getting three tails is the probability of getting tails once out of one coin flip (i.e., 0.5) multiplied by the same probability and then multiplied by the same probability again (or simply 0.5<sup>3</sup>):

&nbsp;

$$p(\textrm{three tails})=\frac{1}{2}\times\frac{1}{2}\times\frac{1}{2}=\frac{1}{8}=0.125$$

&nbsp;

Thus the probability of flipping three tails in a row (or three tails with three coins at the same time) is 12.5 percent.
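The multiplication rule for independent events, in the same sketch style (illustrative only):

```python
# Independent events: multiply the individual probabilities.
p_tails = 1 / 2
p_three_tails = p_tails ** 3      # 0.5 x 0.5 x 0.5
print(p_three_tails)              # 0.125, i.e., 12.5 percent

p_two = 1 / 6
p_double_twos = p_two * p_two     # (1/6) x (1/6) = 1/36
print(round(p_double_twos, 3))    # 0.028
```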

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do it! 5.5 Multiplying Probabilities</em></p>

</header>
<div class="textbox__content">

&nbsp;

Using the same imaginary bowl with ten consecutively numbered balls inside as in the previous exercise, what is the probability of randomly selecting first the #3 ball, then the #4 ball, and then the #5 ball, <em>if you return the selected balls immediately back in the bowl before selecting the next one?</em>

</div>
<sub>(Answer: 0.001)</sub>

&nbsp;

</div>
&nbsp;

Now take the time to note the italicized condition at the end of the question in the exercise you just did. It's important enough to necessitate its own scary-red warning:

&nbsp;
<div class="textbox textbox--learning-objectives"><header class="textbox__header">
<p class="textbox__title"><em><span style="color: #ff0000"><strong>Watch Out!! #10</strong></span>... for Replacement When Working with Probabilities</em></p>

</header>
<div class="textbox__content">

&nbsp;

What would have happened, had I not specified that in the calculation in <em>Do It! 5.5</em> you should consider the selected balls being returned right after their random selection? Why, you would have tampered with the number of all possible outcomes, of course.

</div>
After all, once you've randomly selected the first ball, <em>unless you imagine returning it to the bowl</em>, there will be only (10-1=) 9 balls left from which to make the second selection. Then after removing the second ball, <em>and again not returning it to the bowl</em>, you'd be left with only (9-1=) 8 imaginary balls from which to select your third ball. Then, unlike the $\frac{1}{10}\times\frac{1}{10}\times\frac{1}{10}$ you should have used above, the calculation becomes:

&nbsp;

$$p(\textrm{"3", "4", "5" balls in a row})=\frac{1}{10}\times\frac{1}{9}\times\frac{1}{8}=0.0014$$

&nbsp;

The difference between this result and the one in the exercise seems small but that's only because we're working with small numbers. It's still important to understand how random selection with replacement differs from random selection without replacement and to use the correct calculations.
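The difference between the two sampling schemes can be made concrete with a short sketch (the function name <em>p_sequence</em> is mine, for illustration):

```python
from fractions import Fraction

def p_sequence(n_balls, n_picks, replace):
    # Probability of drawing one specific sequence of n_picks balls
    # from a bowl of n_balls, with or without replacement.
    p = Fraction(1)
    remaining = n_balls
    for _ in range(n_picks):
        p *= Fraction(1, remaining)
        if not replace:
            remaining -= 1  # one fewer ball to choose from next time
    return p

print(float(p_sequence(10, 3, replace=True)))             # 1/10 x 1/10 x 1/10 = 0.001
print(round(float(p_sequence(10, 3, replace=False)), 4))  # 1/10 x 1/9 x 1/8, about 0.0014
```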

&nbsp;

</div>
&nbsp;

Before we move on to using probabilities with actual data, you could use a bit more practice.

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.6 Adding and Multiplying Probabilities, With and Without Replacement</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine you and four of your friends (let's call them Adam, Bhav, Chen, and Dila) are in a class of 25 students. Assume that it's the first time your class meets and your professor doesn't know any of you; she only has the class roster in front of her so any name she calls, she calls from the roster at random. Answer the following questions:
<ul>
 	<li>What is the probability that your professor will call your name?</li>
 	<li>What is the probability that she calls on Bhav?</li>
 	<li> What is the probability that she calls on you, then Chen, and then Dila, one after the other? (Hint: She won't call a name twice in a row, she remembers that much.)</li>
 	<li>What is the probability that she calls either your name or Adam's?</li>
 	<li>What is the probability that she calls on any one of your friends?</li>
 	<li>Your professor also needs  to randomly pair up students for a group assignment; what is the probability that she selects Chen and Dila to be in the same group?</li>
</ul>
<sub>(Answers: 0.04; 0.04; 0.000; 0.08; 0.16; 0.002)</sub>

</div>
</div>]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1802</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 13:21:48]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 17:21:48]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-2-simple-probability-calculations]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2.3 Probabilities with Frequency Tables</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-3-probabilities-with-frequency-tables/</link>
		<pubDate>Fri, 23 Aug 2019 19:37:09 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1837</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

So far we've been working only with small-<em>N</em> examples but there is no reason to think what you learned from coins and dice and balls in bowls will not apply to actual, large-<em>N</em> data.

&nbsp;

We already established that probabilities are proportions, and they can also be expressed in percentage terms. Conveniently enough, I had the foresight to introduce percentages (a.k.a. relative frequency) as early as Section 2.3.1 (<a href="https://pressbooks.bccampus.ca/simplestats/chapter/2-3-1-adding-percentages/">https://pressbooks.bccampus.ca/simplestats/chapter/2-3-1-adding-percentages/</a>). (I am that wise.) It turns out we can work with the percentages we find in frequency tables as easily as with any of the imaginary examples in the previous sections. I'll prove my claim with an example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 5.3 Social Class (GSS 2016)</em></p>

</header>
<div class="textbox__content">

Supposedly everyone thinks they're middle class, and Canadians are no different. And while Table 5.1 shows that not quite everyone thinks so, the majority of them do.

</div>
<em>Table 5.1 Respondent's Social Class (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-social-class-gss-2016-probabilities.png" alt="" width="533" height="319" class="wp-image-1842 size-full alignnone" />

&nbsp;

Out of all 19,161 respondents who provided a valid response when asked about their social class, what would be the probability of randomly selecting a middle-class person?

&nbsp;

Going by the formula we've used so far, we have:

&nbsp;

$$p(\textrm{middle class})=\frac{\textrm{middle class N}}{\textrm{total N}}=\frac{12230}{19161}=0.638$$

&nbsp;

Or, the probability of randomly selecting a middle-class respondent from this group of people is 63.8 percent[footnote]In Chapter 6 we will see that this is also the probability that a randomly selected <em>Canadian</em> (out of all Canadians) is middle class, and why that is. This of course applies to all the calculations below.[/footnote], exactly as the <em>Valid Percent</em> column tells us.

&nbsp;

And what would be the probability of randomly selecting either an upper-class <em>or</em> an upper-middle-class person?

&nbsp;

$p(\textrm{upper class or upper-middle class})=\frac{\textrm{upper class N}}{\textrm{total N}}+\frac{\textrm{upper-middle class N}}{\textrm{total N}}=$

$=\frac{233}{19161}+\frac{3321}{19161}=\frac{3554}{19161}=0.185$

&nbsp;

Or, the probability of randomly selecting an upper-class or an upper-middle-class respondent is 18.5 percent, as we can well see in the <em>Cumulative Percent</em> column.

&nbsp;

Finally, what would be the probability of randomly selecting (with replacement) first a respondent who reported being lower class <em>and</em> <em>then</em> a respondent who reported being upper class?

&nbsp;

$p(\textrm{lower class and upper class})=\frac{\textrm{lower class N}}{\textrm{total N}}\times\frac{\textrm{upper class N}}{\textrm{total N}}=$

$=\frac{628}{19161}\times\frac{233}{19161}=0.033\times0.012=0.0004$

&nbsp;

Or, the probability of first selecting a person who reported being lower class <em>and then</em> a person who reported being upper class is a minuscule 0.04 percent. (A quick-and-dirty multiplication of the two groups' valid percentages, 3.3 percent and 1.2 percent, will give you the same result.)
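The three calculations in this example can also be reproduced directly from the frequency counts quoted from Table 5.1 (a sketch, using only the numbers given in the text):

```python
total = 19161  # all valid responses in Table 5.1
counts = {"upper": 233, "upper middle": 3321, "middle": 12230, "lower": 628}

# Probability of one category: its count over the total.
p_middle = counts["middle"] / total
print(round(p_middle, 3))  # 0.638

# Disjoint categories ("or"): add the probabilities.
p_upper_or_upper_middle = (counts["upper"] + counts["upper middle"]) / total
print(round(p_upper_or_upper_middle, 3))  # 0.185

# Independent selections with replacement ("and then"): multiply.
p_lower_then_upper = (counts["lower"] / total) * (counts["upper"] / total)
print(round(p_lower_then_upper, 4))  # 0.0004
```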

&nbsp;

</div>
&nbsp;

See, it works! Now try it on your own.

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.7 Marital Status (GSS 2016)</em></p>

</header>
<div class="textbox__content">

&nbsp;

Look at Table 5.2 and answer the questions listed below.

&nbsp;

<em>Table 5.2 Respondent's Marital Status (GSS 2016)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/freq-table-marstat-gss2016-probabilities.png" alt="" width="527" height="231" class="alignnone wp-image-1857 size-full" />
<ul>
 	<li>What is the probability of randomly selecting a person (out of the 19,609 people) who is living common-law?</li>
 	<li>What is the probability of randomly selecting a person (out of the 19,609 people) who is either separated <em>or</em> divorced?</li>
 	<li>What is the probability of first randomly selecting a person (out of the 19,609 people, with replacement) who is married <em>and then</em> one who is single?</li>
</ul>
<sub>(Answers: 0.091; 0.117; 0.106)</sub>

</div>
</div>
&nbsp;

&nbsp;

In passing, we can also extrapolate that since percentages and proportions are relative frequencies, and probabilities are proportions and percentages, <strong>probability <em>is</em> relative frequency</strong> too.

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1837</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 15:37:09]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 19:37:09]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-3-probabilities-with-frequency-tables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>8</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>5.2.5 The Real Use of z-Values</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/5-2-5-the-real-use-of-z-values/</link>
		<pubDate>Fri, 23 Aug 2019 23:15:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1876</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Recall from Section 5.1.2 (here: <a href="https://pressbooks.bccampus.ca/simplestats/chapter/5-1-2-the-z-value/">https://pressbooks.bccampus.ca/simplestats/chapter/5-1-2-the-z-value/</a>) that any value/score can be converted into a <em>z</em>-value, which tells us how far the value is from the mean in terms of standard deviations. Now that we know the normal curve has a bell shape reflecting probabilities (the higher the curve at any point, the bigger the probability), any point on the horizontal axis can be seen as a <em>z</em>-value associated with a specific probability -- or rather, with the probability below it and the probability above it.

&nbsp;

You can find the <em>z</em>-values' probabilities listed in a Normal Distribution Table, e.g., this one: <a href="https://www.mathsisfun.com/data/standard-normal-distribution-table.html">https://www.mathsisfun.com/data/standard-normal-distribution-table.html</a>. Note that since the normal distribution is symmetric (i.e., the left side, below the mean, is exactly the same as the right side, above the mean), such tables usually list only the probabilities between the mean and the <em>z</em>-score and above the <em>z</em>-score; this needs to be taken into account when calculating probabilities.[footnote]To help make sense of this, the linked webpage also provides an interactive tool that displays the normal curve for any <em>z</em>-value, with three options: between the mean and <em>z</em>, above <em>z</em>, and below <em>z</em>.[/footnote]

&nbsp;

Alternatively, online normal distribution calculators like this one: <a href="http://onlinestatbook.com/2/calculators/normal_dist.html">http://onlinestatbook.com/2/calculators/normal_dist.html</a> give you the option to specify which probability you need calculated, based on a specific mean and standard deviation.

&nbsp;

Let's take an example to see how this works.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 5.4 Hockey Player Heights</em></p>

</header>
<div class="textbox__content">

&nbsp;

According to Hockey Graphs (<a href="https://hockey-graphs.com/2015/02/19/nhl-player-size-from-1917-18-to-2014-15-a-brief-look/">https://hockey-graphs.com/2015/02/19/nhl-player-size-from-1917-18-to-2014-15-a-brief-look/</a>), the average height of players in the National Hockey League is about 185 cm, with a standard deviation of about 5.3 cm[footnote]2014 data.[/footnote].

&nbsp;

What is the probability that a new recruit (to your team of choice) will be taller than 185 cm? (Suspend disbelief and assume the recruit is randomly selected; i.e., his height has no bearing on his selection.)

&nbsp;

This one is easy: 185 is the mean, so the probability of a particular height being above the mean is 50 percent (equal to the probability of a height being below the mean). (For a visual, refer to Fig. 5.6 (B) in the previous section.)

&nbsp;

So let's complicate matters further: What is the probability of the new recruit being taller than 198 cm?

&nbsp;

To find it, we first need to convert the value into a z-score:

&nbsp;

$$z=\frac{x_i - \mu}{\sigma}=\frac{198-185}{5.3}=\frac{13}{5.3}=2.45$$

&nbsp;

where of course <em>x<sub>i</sub></em> is the original value,<em> μ</em> is the mean, and <em>σ</em> is the standard deviation.

&nbsp;

Then, using a normal distribution table (e.g., the one linked above, <a href="https://www.mathsisfun.com/data/standard-normal-distribution-table.html">https://www.mathsisfun.com/data/standard-normal-distribution-table.html</a>[footnote]Or its applet, set to "<em>z</em> onwards".[/footnote]), we find that the probability of a height above <em>z</em>=2.45 (i.e., above 198 cm) is 0.71 percent, or less than 1 percent. (Of course, if you're curious, you'll also know that the probability of a new recruit being shorter than 198 cm is (100-0.71=) 99.29 percent.)

&nbsp;

You can see the correspondence between the two graphs below in Fig. 5.7, one showing the height values and the other the z-scores. The area we are interested in is beyond/above 198 cm, i.e., beyond/above z=2.45.

&nbsp;

<em>Figure 5.7 (A) The Area Beyond 198 cm</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm.png" alt="" width="898" height="454" class="wp-image-1891 size-full alignleft" />

</div>
&nbsp;

&nbsp;

<em>Figure 5.7 (B) The Area Beyond z = 2.45</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example.png" alt="" width="714" height="361" class="wp-image-1892 alignleft" />

&nbsp;


We can also ask for the probability of a team recruit being shorter than 180 cm. Then:

&nbsp;

$$z=\frac{x_i - \mu}{\sigma}=\frac{180-185}{5.3}=\frac{-5}{5.3}=-0.94$$

&nbsp;

Checking the normal distribution table, we find that the probability up to/below <em>z</em>=-0.94 is 17.36 percent. Thus we have found that the probability of a recruit being shorter than 180 cm is 17.36 percent. (Alternatively, we also know that the probability of a recruit being taller than 180 cm is (100-17.36=) 82.64 percent.) Again, see the graphs in Fig. 5.8 below.

&nbsp;

<em>Figure 5.8 (A) The Area Up To 180 cm</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-2.png" alt="" width="898" height="454" class="wp-image-1898 size-full aligncenter" />

<em>Figure 5.8 (B) The Area Up To z = -0.94</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-2.png" alt="" width="898" height="454" class="wp-image-1899 size-full aligncenter" />

&nbsp;

Finally, let's try finding the probability of a new recruit being between 178 cm and 188 cm. In this case we need to find two <em>z</em>-scores, and add the probabilities between each of the <em>z</em>-scores and the mean (i.e., above the lower score up to the mean, and below the higher score down to the mean).

&nbsp;

$$z=\frac{x_i - \mu}{\sigma}=\frac{178-185}{5.3}=\frac{-7}{5.3}=-1.32$$

&nbsp;

$$z=\frac{x_i - \mu}{\sigma}=\frac{188-185}{5.3}=\frac{3}{5.3}=0.57$$

&nbsp;

Using a normal distribution table we find that the probability between <em>z</em>=-1.32 and the mean is 40.66 percent. The probability between the mean and <em>z</em>=0.57 is 21.57 percent. Thus, the probability that a new recruit's height will be between 178 cm and 188 cm is (40.66+21.57=) 62.23 percent. See Fig. 5.9 below.

&nbsp;

<em>Figure 5.9 (A) The Area Between 178 cm and 188 cm (Or Rather Between 178 cm and 185 cm and Between 185 cm and 188 cm)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-in-cm-3.png" alt="" width="898" height="454" class="wp-image-1903 size-full aligncenter" />

<em>Figure 5.9 (B) The Area Between z = -1.32  and z = 0.57 (Or Rather Between z = -1.32 and 0 and Between 0 and z = 0.57)</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/08/normal-hockey-players-z-example-3.png" alt="" width="898" height="455" class="wp-image-1904 size-full aligncenter" />
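All three probabilities in this example can also be checked with Python's standard-library statistics.NormalDist (a sketch; the exact <em>z</em>-values give slightly different decimals than the table values, which use <em>z</em> rounded to two places):

```python
from statistics import NormalDist

# NHL player heights from the example: mean 185 cm, SD 5.3 cm.
heights = NormalDist(mu=185, sigma=5.3)

# P(taller than 198 cm) = area above 198 = 1 - CDF(198)
print(round((1 - heights.cdf(198)) * 100, 2))  # 0.71 percent

# P(shorter than 180 cm) = area below 180
print(round(heights.cdf(180) * 100, 1))        # 17.3 percent (17.36 with z rounded to -0.94)

# P(between 178 cm and 188 cm)
p_between = heights.cdf(188) - heights.cdf(178)
print(round(p_between * 100, 1))               # 62.1 percent (62.23 with rounded z-values)
```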

&nbsp;

&nbsp;

</div>
&nbsp;

Time to practice on your own!

&nbsp;
<div class="textbox textbox--exercises"><header class="textbox__header">
<p class="textbox__title"><em>Do It! 5.8 Test Scores</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine you learn that the average score on some test you've taken is 110, with a standard deviation of 8. You still don't know your own score, so you'll try to estimate some probabilities. What is the probability that you scored more than 130? What about more than 95? Below 87? Between 90 and 115? Feel free to use the normal distribution table linked above. (Hint: Drawing out the normal curve centered on 110 helps.)

&nbsp;

<sub>(Answers: <em>z</em>=2.5, 0.62%; <em>z</em>=-1.88, 96.99%; <em>z</em>=-2.88, 0.2%; <em>z</em>=-2.5 and <em>z</em>=0.63, 49.38% + 23.57%= 72.95%)</sub>

</div>
</div>
&nbsp;

Now, with the concepts of probabilities and the normal distribution under your belt, you are finally ready to delve into statistical inference. Unfortunately for you, another theoretical chapter looms on the horizon, next. Grit your teeth and bear it, for the payoff (once we get to actually applying the theory in practice) is well worth it.

&nbsp;

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1876</wp:post_id>
		<wp:post_date><![CDATA[2019-08-23 19:15:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-23 23:15:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[5-2-5-the-real-use-of-z-values]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>28</wp:post_parent>
		<wp:menu_order>10</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>6.3 Random Sampling</title>
		<link>https://pressbooks.bccampus.ca/simplestats/chapter/6-3-random-sampling/</link>
		<pubDate>Wed, 28 Aug 2019 23:13:07 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=chapter&#038;p=1912</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

In order to be able to use what we know about probability distributions and the normal curve, and to apply this knowledge in the service of inference (how exactly we do that comes later in the chapter), we need to know each population element's probability of being selected. The problem is, estimating these probabilities (every time, for each and every new study) can be way too burdensome, if not outright impossible. Consider the following example.

&nbsp;
<div class="textbox textbox--examples"><header class="textbox__header">
<p class="textbox__title"><em>Example 6.1 Mode of Transportation of Students</em></p>

</header>
<div class="textbox__content">

&nbsp;

Imagine that you are interested in what mode of transportation the students in your university usually take to campus. You decide that a sample of <em>N</em>=100 sounds reasonable. Imagine further that you don't know anything about sampling (or logic), so you decide to go to the bus stop nearest to your campus and talk to the first hundred students who happen to come by once you're there.

&nbsp;

Arguably, if you did that, you could expect close to 100 percent of your sample to choose <em>bus</em> as their usual mode of transportation to school -- after all, you have talked only to students waiting at a <em>bus stop</em>. True, it's possible that some of your respondents were taking the bus only at that particular time (their car might have broken down, or they didn't feel like driving that day, etc.) but it's hardly likely that this is the case for more than a few of the selected hundred.

&nbsp;

<span style="text-indent: 1em;font-size: 1rem">So far, what you could learn from your study is that some hundred (or close to it) students in your university happen to usually take the bus to school. In and of itself, there is nothing wrong with that. The question, however, is whether you can use this information to conclude that <em>bus</em> is the usual form of transportation for students in your university <em>in general</em>. To paraphrase in the language of research: is the information regarding usual modes of transportation gathered by you from a hundred students at a bus stop generalizable to your institution's student body as a whole?</span>

&nbsp;

Even going by logic alone you should be able to easily see that the answer is <em>no, of course not</em>. After all, you only talked to students at a bus stop who are there specifically to take the bus, at a specific time, on a specific day. What about the students that directly went to the parking lot to take their cars, or those who went to retrieve their bikes from the bike racks, or who simply walked home? Then what about students who had no classes on the day that you went to the bus stop? Or the students that were in class at the time you were interviewing your subjects? Or the students in your institution whose classes were at a different campus and never came to the one you happened to be in?

&nbsp;

In short, your method of collecting information has produced a <em>biased sample</em>: some elements in it (students who happened to be taking the bus at the time of your survey) had a higher chance of participating in your study than others (everyone else). The sample is biased toward bus-takers -- those you talked to had something like a 100 percent chance of being in the study (they were in it, after all); other bus-takers who weren't there had a smaller but still non-zero chance of being in the study; and those who never take the bus had a 0 percent probability of being in your study.

&nbsp;

What's more, not only are the probabilities of being selected different for different students; calculating the exact probability for every element in every new study and accounting for the differences would be a fool's errand, as unfeasible (or outright impossible) as collecting information on the entire population under study in the general case.

</div>
</div>
&nbsp;

The takeaway from Example 6.1 is that in statistics we want elements to have easily known (to make calculations easy) and equal (so as not to produce bias) probabilities of being selected. Fortunately for us, random sampling (also called <em>probability sampling</em>) provides both: when elements are chosen at random, they all have the same, equal probability of being chosen -- and that is precisely how the probabilities come to be known.

&nbsp;

Recall that in a coin flip the probability of getting heads is the same as the probability of getting tails, and they are both $\frac{1}{2}$, one outcome out of two possible outcomes, or 0.5. The probabilities of throwing a die and getting a one, or a two, or a three, or a four, or a five, or a six are all equal, and known: $\frac{1}{6}$, one outcome out of six possible outcomes, or 0.167. Similarly, the probability of selecting one person at random out of a group of thirty-five people is the same for all thirty-five people, and equal to $\frac{1}{35}$, or 0.029.
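&nbsp;

These equal-probability calculations are simple enough to sketch in a few lines of Python (the function name here is purely illustrative, not from any particular library):

```python
# Probability of any single outcome when all outcomes are equally likely:
# one outcome out of the total number of possible outcomes.
def equal_outcome_probability(total_outcomes):
    return 1 / total_outcomes

coin = equal_outcome_probability(2)     # heads (or tails) in a coin flip
die = equal_outcome_probability(6)      # any one face of a die
person = equal_outcome_probability(35)  # one person out of thirty-five

print(coin, round(die, 3), round(person, 3))
```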

&nbsp;

<strong>Throwing dice, flipping coins, and selecting at random are all random (chance) events -- there is no bias in them, as the probability of any outcome is the same as that of any other outcome, and easily calculable as one out of the total number of possible outcomes.</strong>

&nbsp;

If we apply the same logic to sampling, we can see that the only thing we need is to make sure that our selection is random and that it applies to all elements in a population of a particular known size: then the probability of selecting an element will always be <span style="text-indent: 18.6667px;font-size: 14pt">one out of the total number of elements, i.e., the total study population size.</span>

&nbsp;

<span style="text-indent: 18.6667px;font-size: 14pt">When this condition -- equal probability of elements to be selected -- is met and we know that probability, we know its frequency distribution (psst, it's normal!). We can thus use probability theory and its theorems and postulates, which provide mathematical proof that a random (i.e., unbiased) sample truthfully reflects and <em>represents</em> the population from which it was drawn. Then and only then, whatever we learn from the sample would be generalizable to the population. (Of course, it's not <em>that</em> simple; there is more to it -- like sample size -- but I'll leave this for later, when we get to the Central Limit Theorem.) </span>

&nbsp;

So what would have been the best way to get a representative answer to the question regarding usual modes of transportation for students in your institution? Theoretically, you could have obtained a list of all students from the registrar, selected your hundred at random from the full list, and contacted only the persons selected. Their responses would indicate the most popular mode of student transportation and <em>now</em>, with random sampling, <em>they would reflect the entire student body.</em>

&nbsp;

In practice things are more complicated: How <em>exactly</em> do you choose at random any desired number of elements from a list of all elements?[footnote]The comprehensive list of all elements in a population is called a <em>sampling frame</em>. Note that in practice some sampling frames might not include all elements they purport to have. For example, using the phone book as a sampling frame for a population is a frequently used method, yet we know that some people have unlisted numbers -- or, possibly, do not have a phone -- so they are not listed in the phone book. Thus there is a difference between the population and the sampling frame for it, where the sampling frame is an approximation of but not quite a list of the entire population.[/footnote] How do you even obtain a list of all elements in the first place? Even if you have one, do you put every element's name/number in a hat and pull them out one by one?

&nbsp;

While providing details on how random sampling is done in real life is also outside the scope of this text, I can assure you several such methods exist (though pulling names out of a hat isn't one of them). For a comprehensive treatment, again, I encourage you to consult a research methods text; for my purposes here I will just list the major ones.

&nbsp;

<em>Simple random sampling</em> is the closest that you can get to the pulling-names-out-of-a-hat proposition; however, in this day and age it is usually done with computers using random number generators. The same goes for <em>systematic random sampling</em> (when the selection starts at a random starting point and proceeds at a fixed interval). Then there are also <em>stratified random sampling</em> (the population is first divided into <em>strata</em> based on similar characteristics of the elements, not unlike in quota sampling, but then the selection from each stratum is random) and <em>cluster random sampling</em> (the population is divided into clusters -- think sub-groups -- and then clusters are selected at random), where the latter can even be done in several stages (called <em>multistage cluster random sampling</em>).
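&nbsp;

As a rough sketch of how the first two methods might be done with a computer, here is a small example using Python's standard random module (the sampling frame of 5,000 student IDs is hypothetical):

```python
import random

# Hypothetical sampling frame: a list of 5,000 student IDs.
frame = list(range(1, 5001))
n = 100  # desired sample size

# Simple random sampling: every element in the frame has the same
# n-out-of-5,000 chance of ending up in the sample.
simple_sample = random.sample(frame, n)

# Systematic random sampling: pick a random starting point,
# then take every k-th element at a fixed interval.
k = len(frame) // n            # sampling interval (here, 50)
start = random.randrange(k)    # random start between 0 and k-1
systematic_sample = frame[start::k][:n]

print(len(simple_sample), len(systematic_sample))  # 100 100
```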

&nbsp;

To conclude: ultimately, the important thing to learn here is not how the sampling is done empirically but the key difference between non-random and random sampling. <strong>Non-random/non-probability sampling methods select elements arbitrarily, at researchers' discretion, with unknown and unequal probabilities of elements being selected; this, in turn, precludes the use of probability theory and therefore allows for only assumed (but unprovable) generalizability of the samples produced in this way.</strong>

&nbsp;

On the other hand, random/probability sampling methods, in selecting elements at random, ensure that elements have an equal (and therefore known) probability of being chosen; this <strong>random selection allows for the use of probability theory, the normal curve, and everything that is already mathematically proven regarding features of random variables and their probability distributions. Probability theory demonstrates that randomly selected samples (of sufficient size) are representative of and generalizable to the population from which they were drawn.</strong> <strong>Therefore, conducting a census of all elements under study becomes unnecessary as long as we are able to draw a random sample (of sufficient size) of the population.</strong>

&nbsp;

At this point, <span style="text-indent: 18.6667px;font-size: 14pt">(if you are still awake) </span><span style="text-indent: 1em;font-size: 14pt">you have probably noticed that I ask you to accept the fact that random samples are representative of their populations on my word, with little proof. While I will not go about proving this mathematically (and you'll be happier for it), I will provide the theorem on which my claims are based soon enough. First, however, we still have a few other things to cover, and the logic of inference is next.</span>

&nbsp;]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1912</wp:post_id>
		<wp:post_date><![CDATA[2019-08-28 19:13:07]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-08-28 23:13:07]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[6-3-random-sampling]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>32</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[chapter]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
		<wp:postmeta>
			<wp:meta_key><![CDATA[pb_show_title]]></wp:meta_key>
			<wp:meta_value><![CDATA[on]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 3 Measures of Central Tendency</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-3/</link>
		<pubDate>Wed, 31 Oct 2018 18:34:27 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=24</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

Now that you have learned the preliminaries -- what datasets and variables are, and how to summarize the information within a variable in tabular and graphical formats -- it's time to turn to applied statistics proper. Statistics allows us to <em>analyze</em> information, i.e., to learn more than what we simply see at first glance. Thus we scrutinize the data collected in great detail to get the most out of it, in terms of both description (examining what we see) and inference (reaching evidence-based conclusions).

&nbsp;

Aptly, we talk about <em>descriptive statistics</em> and <em>inferential statistics</em>. I<span style="font-size: 14pt">n the latter half of this book </span><span style="text-indent: 1em;font-size: 14pt">we will turn to inferential statistics which is devoted to inferential analysis on the basis of probability theory. </span><span style="font-size: 14pt">We now start with descriptive statistics devoted to the descriptive analysis of variables, i.e., to learning all we possibly can about a variable and its distribution. If you recall from Chapter 2's introduction, <strong>a</strong> <strong>variable's distribution is the way the observations/cases are distributed across the variable's categories</strong>. The cases can be concentrated closer together or more spread out, and exploring such features of a variable's distribution is the focus of this chapter and the next.</span>

&nbsp;

In addition to the visual summary of a variable which we get through graphs and which allow us to virtually <em>see</em> a variable's distribution, generally there are two further types of information we can get through descriptive analysis. They are called <em><strong>central tendency </strong></em>and <em><strong>dispersion</strong></em>.

&nbsp;

Considering what a variable in a dataset looks like, recall that a variable has a list of observations/cases (think, for example, of the responses collected through a survey question) where the list is of size <em>N</em> (<em>N</em>, again, is the number of <em>elements,</em> in general, or <em>respondents</em> if we focus specifically on people, as we usually do). Thus, on the one hand, we talk about <em>typical cases</em>, or <em>where cases tend to cluster</em> -- for example, what the most frequent response given is, if respondents tend to give similar answers, etc. -- and what the <em>"centre"</em> of the variable's distribution is. Measures related to this type of information are called <strong>measures of central tendency</strong>. There are three of them -- the mode, the median, and the mean -- and we explore each of them in turn in the current chapter.

&nbsp;

On the other hand, we can also talk about how much a variable's distribution is "spread out". That is, if a variable is called that because the responses <em>vary</em> across people, how <em>variable</em> is a variable actually -- does it vary a lot or does it vary a little? Are all responses clustered around the "centre" or are they relatively dispersed? Measures related to this type of information are called <strong>measures of dispersion</strong>, and they are presented in the next chapter.

&nbsp;

To summarize,<strong> we describe variables by </strong>providing and exploring<strong> 1) the visual summary of their distribution (i.e., a graph), 2) their measures of central tendency, and 3) their measures of dispersion.</strong>

&nbsp;

There is a catch, however: <strong>Not all measures of central tendency and dispersion are appropriate for all variables.</strong> Just like not all graphs are appropriate for each type of variable, <strong>whether a measure of central tendency or dispersion is applicable to a variable or not depends on the variable's level of measurement.</strong>

&nbsp;

I did already warn you that determining the proper level of measurement of a variable is key -- without that, you can correctly execute neither descriptive nor inferential analysis. Go back and reread Section 1.3 <span style="text-indent: 37.3333px;font-size: 14pt">if necessary </span><span style="text-align: initial;text-indent: 2em;font-size: 14pt">(https://pressbooks.bccampus.ca/simplestats/chapter/1-3-levels-of-measurement/) or what comes next will make little sense to you.</span>

&nbsp;

But enough with the boring theory -- on to the application of central tendency measures!]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>24</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:34:27]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:34:27]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-3]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>2</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 4 Measures of Dispersion</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-4-2/</link>
		<pubDate>Wed, 31 Oct 2018 18:34:50 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=26</guid>
		<description></description>
		<content:encoded><![CDATA[[latexpage]

Early on in Chapter 3 we established that there are three pieces of information which help us describe variables. Describing variables helps us glean something from a variable's distribution beyond the raw list of observations of which it is made. In other words, through descriptive analysis we get to learn something about the cases that is not readily observable when all we have is a collection of data points.

&nbsp;

Graphs provide a first glimpse at a variable's distribution. Measures of central tendency provide information about the typical cases, where most cases tend to cluster, or about the "centre" of the data. We now turn to measures of dispersion, the last of the three key pieces of descriptive information pertaining to variables. Measures of dispersion tell us how "spread out" a variable's cases are; they provide a "clusteredness" measure of the data, as it were, and of how <em>dispersed</em> cases are across the variable's values.

&nbsp;

A simple illustration will make dispersion measures easier to understand. Take two sets of three numbers: "4, 5, 6" and "2, 5, 8". By now, you should be able to tell immediately that the median of both sets is 5 (each set has one value below and one above 5). You might also be able to easily see that the mean of both sets is also <em>5</em>; if not, this is how we get it:

&nbsp;

$$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{(4+5+6)}{3}=\frac{15}{3}=5$$

$$\frac{\sum\limits_{i=1}^{N}{x_i}}{N}=\frac{(2+5+8)}{3}=\frac{15}{3}=5$$

&nbsp;

Even if both "4, 5, 6" and "2, 5, 8" sets have the same measures of central tendency, you'd be hard-pressed to claim they are the same sets of numbers. Take a look at the image below (or just look at a ruler of your own, if you have one close by): the values of 4 and 6 are much closer to 5 than 2 and 8 are. That is, the values of our first set are more closely clustered around the "centre", while the values of our second set are more loosely spread around it. This "clustering" vs. "spreading" is precisely what dispersion measures.

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/ruler-1023726_1280-1024x341.png" alt="" width="560" height="186" class="alignnone wp-image-526" title="ruler-1023726_1280.png from Pixabay; no attribution necessary." />
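&nbsp;

A quick numerical check of this, using Python's standard statistics module (the standard deviation printed at the end is one of the dispersion measures introduced later in this chapter):

```python
from statistics import mean, median, pstdev

set_a = [4, 5, 6]
set_b = [2, 5, 8]

# Identical measures of central tendency...
print(mean(set_a), median(set_a))  # 5 5
print(mean(set_b), median(set_b))  # 5 5

# ...but clearly different dispersion around that centre:
# set_a clusters tightly near 5, set_b is more spread out.
print(round(pstdev(set_a), 3))  # 0.816
print(round(pstdev(set_b), 3))  # 2.449
```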

There are four commonly used measures of dispersion.[footnote]A fifth measure of dispersion exists but is less commonly used. I'll introduce it only insofar as it is useful for understanding the standard deviation, the most widely used measure of dispersion.[/footnote] Before we turn to each of them in turn, note what I have just demonstrated here: <strong>it is quite possible for two variables to have the same measures of central tendency but different measures of dispersion</strong>.

&nbsp;

The four measures of dispersion can be divided into two groups. We begin with the simpler two, the <em>range</em> and the <em>interquartile range</em>, then turn to the more complicated (but most widely used) pair, the <em>variance</em> and the <em>standard deviation</em>.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>26</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:34:50]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:34:50]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-4-2]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>3</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 5 The Normal Distribution and Some Basics of Probability</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-5/</link>
		<pubDate>Wed, 31 Oct 2018 18:35:11 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=28</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

A variable's distribution, you recall,<span style="font-size: 14pt"> is the way the observations/cases are distributed across the variable's categories. Frequency tables, graphs, as well as measures of central tendency and dispersion all provide information about the distributions of variables. </span>

&nbsp;

<span style="font-size: 14pt">All variables have a distribution (of course!) but some variables have a special type of distribution: one whose features and uses in statistics go beyond being simply "a variable's distribution". We call this distribution the <em>normal distribution</em>.</span>

&nbsp;

In the first part of this chapter I introduce the normal distribution, detailing its features that make it so special. The latter half of the chapter is devoted to a concept without which we wouldn't be able to do any statistical inference and estimation, namely statistical probability. You will learn some basics of probability theory which are necessary for us to eventually proceed to statistical inference.

&nbsp;

You might be wondering why these two seemingly unrelated things -- a variable's distribution and probability theory -- are in the same chapter together. For now I will just give you a hint: probabilities have distributions too. Read on to find out more.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>28</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:35:11]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:35:11]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-5]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>4</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 6 Sampling, the Basis of Inference</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-6/</link>
		<pubDate>Wed, 31 Oct 2018 18:35:44 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=32</guid>
		<description></description>
		<content:encoded><![CDATA[&nbsp;

While describing variables is all nice and good -- and useful -- statistics would be rather limited if we only used it for that. In reality, descriptive statistics, while popular (consider sports statistics, for example), is only a relatively tiny part of all that statistics has to offer. The true power of statistics lies in granting us a superpower: the ability to <em>infer --</em> to know (and even to predict), within reason, things we cannot otherwise possibly know through observation alone. This part of statistics is called <em>inferential statistics</em>, and it's based on probability theory, a branch of mathematics of which you had a small taste in Chapter 5.

&nbsp;

How do we know that life expectancy at birth is 82.3 years <span style="text-indent: 37.3333px;font-size: 14pt">in Canada </span><span style="text-align: initial;text-indent: 2em;font-size: 14pt">and 78.7 years in the United States but only 51.8 years in Sierra Leone (REFERENCE World Bank, 2016)? How can we predict, with reasonable certainty, the outcome of elections? How can we predict how many people will die of a particular cause in a specific country in a year? How do we know if most Canadians approve of immigration? Or what percentage of the Canadian work force is employed part-time? How do we predict how many people will be added to the world population in any year, or how many people the world will have in 2100?</span>

&nbsp;

<em>Figure 6.1 World Population Projection 2100</em>

<img src="https://pressbooks.bccampus.ca/simplestats/wp-content/uploads/sites/564/2019/02/World-population.png" alt="" width="722" height="541" class="wp-image-661  aligncenter" />

[https://population.un.org/wpp/Graphs/Probabilistic/POP/TOT/]

&nbsp;

Fig. 6.1 above might seem complicated to you now, but soon enough you will be able to read it, as we will be covering all the concepts listed in the legend.[footnote]As it's somewhat difficult to see on the graph, the answer to the last question -- what is the projected population of the world for 2100? -- is 11.2 billion people (REFERENCE UN Population Division, 2017).<span style="text-indent: 18.6667px;font-size: 14pt">[/footnote] We can do all that, and more, because of inferential statistics. </span>

&nbsp;

While I'll leave the demography examples and projections about the future aside (as the scope of this text is rather more modest), let's take an example closer to home and, say, talk about attitudes to immigration in Canada. How do we know if Canadians approve of immigration? What do we mean when we even say "Canadians"? If we say "Canadians approve of immigration," does that mean all Canadians do? If not, then how many Canadians approve and how many disapprove?

&nbsp;

To answer these questions, we need to introduce more vocabulary than we have been using so far; vocabulary that is generally used in all sorts of research, both quantitative and qualitative, and not pertaining to statistics <em>per se</em>, though very relevant to it. In short, we have to start differentiating between a <em>sample</em> and a <em>population</em> (a term that has a more general meaning than the way we use it in everyday life), and we need to talk about <em>sampling</em>.

&nbsp;

Following that, I'll explain the concept of <em>randomness</em> in greater detail, which, coupled with what we now know about probability, will help us get to the <em>sampling distribution</em>. With that and the <em>Central Limit Theorem</em>, we'll be ready. Then, and only then, we'll be able to answer questions like <em>How do we know if Canadians approve of immigration? </em>along with any other question we might have about things/entities we cannot directly obtain information about.

&nbsp;

But I am getting too far ahead and too fast in my overview which, as any abstract talk, easily gets confusing. Let's take it slowly from the beginning -- samples and populations in the next section -- and build from there. Be forewarned, however: what follows is indeed quite a bit theoretical and abstract, I'm afraid. (Yes, more so than the last chapter, sorry.) Believe me, I wouldn't do this to you if it weren't necessary.
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>32</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:35:44]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:35:44]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-6]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>5</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 7 Variables Associations</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-7/</link>
		<pubDate>Wed, 31 Oct 2018 18:35:57 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=34</guid>
		<description></description>
		<content:encoded><![CDATA[Statistical inference is hardly only a matter of estimating single variable means and proportions, and of constructing confidence intervals around them. Rather, quantitative sociologists (and other social scientists), like all scientists trying to explain the world around them, study associations between variables. Does class attendance affect students' marks? Are male professors praised more highly in student evaluations than female professors? Are children of more educated parents more likely to earn post-secondary degrees? Does abstinence-only sex education lead to higher teen pregnancy (and abortion) rates? Are rich people more likely to vote? Are religious people more likely to espouse more socially conservative values? Does playing violent video games increase the incidence of real-life violence and crime? Does race/ethnicity affect one's educational attainment and/or income?
<p style="text-indent: 18.6667px">All of these questions reflect variable <em>associations</em>. Every time we hypothesize that two characteristics are related, or think that something <em>causes</em> change in another, every time we ask <em>why</em> something is the way it is and what makes it that way, we already speak the language of variable associations, even without acknowledging it as such.</p>
<p style="text-indent: 18.6667px"><span style="text-indent: 18.6667px"><span style="font-size: 14pt">While we can use various research methods to provide answers to these questions, </span><span style="font-size: 18.6667px">quantitative</span><span style="font-size: 14pt"> analysis can shed a unique light on them due to its grounding in mathematical/probability theory and the generalizability that stems from it.</span></span><span style="text-indent: 18.6667px;font-size: 14pt">[footnote]For generalizability, see Section XX in Chapter 6.[/footnote]</span><span style="font-size: 14pt;text-indent: 18.6667px"> Of course, like any research method, using statistics for inference particularly in the social sciences has its problems and limitations. Thus, we have to be very careful not to overstate conclusions and to always qualify our findings based on the specific way we have operationalized our variables (i.e., exactly how we have measured a concept), as well as depending on our sample size, the statistical assumptions we've made, the uncertainty we're dealing with, etc., etc.</span></p>
While most real-life research involves/considers many variables at the same time, examining <em>multivariate</em> associations like that is beyond the scope of this book. Instead, in the remainder of this text I focus on <em>bivariate</em> associations -- associations between two variables. Still, keep in mind that while this is a necessary first step when just entering the world of variable associations, it hardly ever (rather, never) reflects reality in any way: the social world is too complex for there to be one and <em>only one</em> cause of something we observe and are trying to explain. I'll remind you of this fact frequently, as one of the biggest mistakes you could probably make with inference is to assume that the variable on which you have chosen to focus is the <em>only</em> one associated with (or worse, affecting) another variable of interest. In short, from now on we work with two variables in order to understand how associations work in principle, not because inference based on two variables reflects reality (neither in general, nor in real-life research).

The chapter starts by introducing what we mean by associations between variables, and by distinguishing between statistical and causal associations. In a brief return to descriptive statistics, you'll then learn how to describe bivariate associations. After that, I'll bring you back to the theory (and practice) of statistical inference, specifically to hypotheses and hypothesis testing, as this is again what allows us to move from sample descriptions to generalizable conclusions about the population of interest. Finally, I provide a brief discussion of the inevitability of uncertainty by introducing you to the two types of errors of inference.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>34</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 14:35:57]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 18:35:57]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-7]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>6</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 9 Testing Associations I: Difference of Means, F-test, and χ2 Test</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-9-bivariate-testing-contingency-tables/</link>
		<pubDate>Wed, 31 Oct 2018 22:01:10 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=120</guid>
		<description></description>
<content:encoded><![CDATA[All the theory you had to suffer through in Chapter 8 (and all other theoretical chapters) was for the purposes of what we do in this chapter and the next. All your efforts in introductory statistics will culminate in your ability to test bivariate associations for statistical significance -- i.e., to make statistical inferences about populations based on random samples.

Recall that we ended Chapter 7 with the knowledge that, in a given dataset, we describe and examine potential bivariate associations 1) between a discrete and a continuous variable through boxplots and the difference of means, 2) between two discrete variables through contingency tables and the difference of proportions, and 3) between two continuous variables through scatterplots and the correlation coefficient <em>r</em>.
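Although this book does not rely on any particular software, the three descriptive approaches above can be sketched in a few lines of Python. All numbers, group labels, and variable names below are made up for illustration only; they do not come from any dataset in the text.

```python
from statistics import mean

# 1) Discrete + continuous variable: difference of means across groups
#    (hypothetical weekly leisure hours by gender)
hours = {"men": [5, 7, 6, 8], "women": [6, 9, 7, 10]}
diff_means = mean(hours["women"]) - mean(hours["men"])

# 2) Two discrete variables: difference of proportions from a contingency table
#    (hypothetical counts of respondents agreeing with a statement, by group)
agree = {"men": 30, "women": 45}
total = {"men": 100, "women": 100}
diff_props = agree["women"] / total["women"] - agree["men"] / total["men"]

# 3) Two continuous variables: Pearson's correlation coefficient r
def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

age = [20, 25, 30, 35, 40]
income = [25, 30, 41, 49, 55]  # hypothetical, in $1000s
r = pearson_r(age, income)
```

Each quantity summarizes a different pairing of variable types, which is exactly why the tests introduced in this chapter and the next also come in different flavours.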

In this chapter and the next you will learn how to test these three types of bivariate associations for statistical significance, i.e., to check whether they can be generalized to the population of interest. The current chapter is devoted to the first two types. Chapter 10, the last chapter in this book, offers a first glimpse into a powerful technique for multivariate inference (one that can accommodate variables at any level of measurement) called regression -- although we cover only the continuous-variable case, to serve as an introduction.

Now that you know how hypothesis testing works, most of the association testing will seem straightforward and somewhat formulaic: pose hypotheses, test them, make a decision regarding them, and interpret the findings in a substantive manner. The only thing that differs is the tests themselves, as different types of associations generally require different tests. Regression is the one procedure that adds more, as it were, to this predictable pattern, but we'll deal with that when we get there.
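The four-step pattern above can be sketched in Python for one concrete case, a two-sample test of a difference of means. The data and the group names are hypothetical; note also that ±1.96 is the large-sample 5% critical value, and with samples this small a real analysis would use the t distribution with the appropriate degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical samples of a continuous variable for two groups
group_a = [4, 5, 6, 5, 4, 6, 5, 5]
group_b = [7, 8, 7, 9, 8, 7, 8, 8]

# Step 1: pose hypotheses -- H0: mu_a = mu_b, Ha: mu_a != mu_b

# Step 2: compute the test statistic (unpooled two-sample statistic)
def two_sample_stat(x, y):
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se

t_stat = two_sample_stat(group_a, group_b)

# Step 3: make a decision by comparing |t| to a critical value
# (1.96 for alpha = 0.05, two-tailed, large samples -- an assumption here)
reject_h0 = abs(t_stat) > 1.96

# Step 4: interpret substantively, in the language of the research question,
# not just "reject/fail to reject H0".
```

Swapping in a different test statistic (a chi-square statistic, an F statistic) changes step 2 and the critical value in step 3, but the four-step skeleton stays the same.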

And then you'll be done. So what are you waiting for? Gird up your loins for this final push and let's get it over with!]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>120</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:01:10]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:01:10]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-9-bivariate-testing-contingency-tables]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 10 Testing Associations II: Correlation and Regression</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/chapter-10-bivariate-testing-basics-of-regression/</link>
		<pubDate>Wed, 31 Oct 2018 22:07:13 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=128</guid>
		<description></description>
<content:encoded><![CDATA[This is it: you are finally here, reading the <em>last chapter</em>. (And after nine chapters, what's just one more?) This is not as heavy a chapter as some of the others, but regression <em>is</em> sufficiently different from the type of testing you learned about in Chapter 8 to deserve a heads-up -- so if you find yourself despairing at some point, just remind yourself that <em>this is it</em>: once you've learned <em>this</em>, you'll have a passing knowledge of <em>how</em>, <em>what for</em>, and <em>why</em> statistics is used in sociological research, you'll be able to do some basic analysis on your own, and you'll be done in no time.

Pep talk aside, for this chapter you should review Chapter 7, and specifically Section 7.2.3 on examining associations between two variables treated as continuous for the purposes of statistical analysis. We did that visually through a scatterplot (with a line of best fit) and numerically through the correlation coefficient, Pearson's <em>r</em>.

In this chapter, you will learn what <em>r</em> actually is, and that it has its own <em>t</em>-test and <em>p</em>-value for testing its significance. In addition, I will present a relatively brief and basic introduction to the topic of regression, a powerful and versatile technique with a truly impressive number of applications, one which readily allows for <em>multivariate</em> analysis.
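As a preview, the <em>t</em> statistic for testing whether a correlation differs from zero can be computed directly from <em>r</em> and the sample size, using the standard formula t = r·√(n−2)/√(1−r²). The numbers below (r = 0.6 from a sample of n = 30) are purely hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for H0: the population correlation is zero; df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical example: r = 0.6 observed in a random sample of n = 30
t_stat = t_for_r(0.6, 30)

# With df = 28, the two-tailed 5% critical value is about 2.05,
# so a statistic this large would lead us to reject H0.
```

In practice statistical software reports this <em>t</em> and its <em>p</em>-value alongside <em>r</em> itself, so you rarely compute it by hand; the formula just shows where those numbers come from.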

After all, recall that when we do bivariate analysis, we ignore the complexity of the real world, where variables may be tangled in a veritable web of almost endless interrelationships. With bivariate analysis we set all that aside to focus on how just <em>two</em> variables are statistically associated. But because of that, we can't say anything about <em>causality</em>, as we cannot account for additional variables that could serve as alternative explanations for what we observe. And while multivariate regression cannot <em>completely</em> do that either (in the social sciences, establishing causality is a pretty tall order), with careful assumptions and the right specifications it can take us more than a few steps further in that direction.

Of course, even if I hadn't already told you, you would have been able to tell by now that multivariate regression analysis falls beyond the scope of what we do here. What follows is a necessary stepping stone, however: once you have the right idea about how regression works with two continuous variables, every other <em>regression</em> technique follows the same basic principle and can thus be built on top of the foundation you'll have by the end of this chapter (and book!).

So, ready? Let's go then! The end is just a few sections away!]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>128</wp:post_id>
		<wp:post_date><![CDATA[2018-10-31 18:07:13]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2018-10-31 22:07:13]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[chapter-10-bivariate-testing-basics-of-regression]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>8</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>Chapter 2 What Data Looks Like and Summarizing Data</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/ch-2-what-data-looks-like-and-summarizing-data/</link>
		<pubDate>Mon, 28 Jan 2019 23:26:25 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=323</guid>
		<description></description>
<content:encoded><![CDATA[
This chapter moves us to more practical matters, namely working with actual data. Once you are familiar with what real data sets look like and how they are organized, you will learn how to summarize the information contained within variables. We can do that through tables and through graphs. Both reflect the <em>distribution of a variable</em> (a concept which we'll discuss extensively from Chapter 3 on), which is the way the observations/data points are distributed across a variable's categories. (For example, counting how many of your friends don't have siblings, how many have one sibling, how many have two siblings, etc., and writing the information down will give you the (frequency) distribution of the variable <em>number of siblings your friends have</em>.)
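The siblings example can be tallied in a few lines of Python; the list of answers below is invented purely for illustration.

```python
from collections import Counter

# Hypothetical answers from ten friends to "How many siblings do you have?"
siblings = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]

# Counter builds the frequency distribution: category -> count
freq = Counter(siblings)

for category in sorted(freq):
    print(category, freq[category])
```

The printed pairs (0 siblings: 3 friends, 1 sibling: 4 friends, and so on) are exactly the rows of the frequency tables we turn to next.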


We start with frequency tables and explore the summary information contained within them. We end the chapter with how to display variables (i.e., their distributions) visually, and a discussion of which type of graph (a pie chart, a bar graph, or a histogram) is most appropriate for variables at different levels of measurement.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>323</wp:post_id>
		<wp:post_date><![CDATA[2019-01-28 18:26:25]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-01-28 23:26:25]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[ch-2-what-data-looks-like-and-summarizing-data]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>1</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[533]]></wp:meta_value>
		</wp:postmeta>
	</item>
	<item>
		<title>8 Hypotheses Testing</title>
		<link>https://pressbooks.bccampus.ca/simplestats/part/8-hypotheses-testing/</link>
		<pubDate>Fri, 29 Mar 2019 22:24:56 +0000</pubDate>
		<dc:creator><![CDATA[mariana]]></dc:creator>
		<guid isPermaLink="false">https://pressbooks.bccampus.ca/simplestats/?post_type=part&#038;p=1051</guid>
		<description></description>
<content:encoded><![CDATA[In Chapter 7 we learned how to look for associations between two variables in random sample data. However, just because two variables' observations exhibit a pattern in the sample doesn't mean that the variables are necessarily <em>truly</em> related in the population. Recall the purpose of sampling from Chapter 6: to infer something about a population based on a sample -- to use sample statistics to estimate population parameters.

Given this, the questions you should be asking at this point are: Is an association we observe in the sample data something that exists in the population of interest? That is, do we observe this association because it really exists in the population and is reflected in the sample? Or is our sample unusual enough that the association is an artifact of random chance, present only in this one sample? How certain can we be of our conclusion either way?

To answer these questions, you need to learn how <em>to test potential associations for statistical significance</em>. The last section of this chapter and the next two chapters are devoted to just that. First, however, there is some preliminary work to do. To that end, in this chapter I introduce you to the concept of a <em>hypothesis</em> in social science research and to the logic of hypothesis testing, both in theory and in practical terms.

Before we delve into this (rather extensive) topic, I still have to address the elephant in the room when it comes to statistical associations: <em>causality</em>, next.]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>1051</wp:post_id>
		<wp:post_date><![CDATA[2019-03-29 18:24:56]]></wp:post_date>
		<wp:post_date_gmt><![CDATA[2019-03-29 22:24:56]]></wp:post_date_gmt>
		<wp:comment_status><![CDATA[closed]]></wp:comment_status>
		<wp:ping_status><![CDATA[closed]]></wp:ping_status>
		<wp:post_name><![CDATA[8-hypotheses-testing]]></wp:post_name>
		<wp:status><![CDATA[publish]]></wp:status>
		<wp:post_parent>0</wp:post_parent>
		<wp:menu_order>7</wp:menu_order>
		<wp:post_type><![CDATA[part]]></wp:post_type>
		<wp:post_password><![CDATA[]]></wp:post_password>
		<wp:is_sticky>0</wp:is_sticky>
		<wp:postmeta>
			<wp:meta_key><![CDATA[_edit_last]]></wp:meta_key>
			<wp:meta_value><![CDATA[667]]></wp:meta_value>
		</wp:postmeta>
	</item>
</channel>
</rss>
