{"id":313,"date":"2023-10-18T11:37:22","date_gmt":"2023-10-18T15:37:22","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/chapter\/__unknown__\/"},"modified":"2024-02-08T14:34:54","modified_gmt":"2024-02-08T19:34:54","slug":"__unknown__","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/chapter\/__unknown__\/","title":{"raw":"What is p hacking?","rendered":"What is p hacking?"},"content":{"raw":"<h1>p-hacking or data dredging<\/h1>\r\n[caption id=\"attachment_314\" align=\"aligncenter\" width=\"108\"]<img class=\"wp-image-314 size-medium\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-108x300.png\" alt=\"green jelly beans - p-hacking\" width=\"108\" height=\"300\" \/> https:\/\/xkcd.com\/882\/[\/caption]\r\n\r\n&nbsp;\r\n\r\nComic:\u00a0 XKCD, <a href=\"https:\/\/xkcd.com\/882\/\">https:\/\/xkcd.com\/882\/<\/a>\r\n<h2>What is p-hacking?<\/h2>\r\nRecall the iced-coffee promotion.\u00a0 Normally, the average sale per customer in our store is $4.18.\u00a0 I recently ran a promotion where the store offered half-priced iced coffees.\u00a0 My hope is that this changes customer buying habits enough to raise the average sale per customer.\u00a0 We also really want to succeed!\u00a0 Our boss was unsure about the promotion, and if it turns out not to have an effect, we're in trouble.\r\n\r\nNULL HYPOTHESIS: The promotion didn't work, so sales are the same or decreased:\r\n\r\n<math display=\"block\"><semantics><mstyle><msub><mi>H<\/mi><mn>0<\/mn><\/msub><mo>:<\/mo><mi>\u03bc<!-- \u03bc --><\/mi><mo>\u2264<!-- \u2264 --><\/mo><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>4.18<\/mn><\/mstyle><annotation encoding=\"latex\">{\"version\":\"1.1\",\"math\":\"H_0: \\mu \\leq $4.18\"}<\/annotation><\/semantics><\/math>\r\n\r\nALTERNATIVE HYPOTHESIS: The promotion worked, and average sales increased:\r\n\r\n<math 
display=\"block\"><semantics><mstyle><msub><mi>H<\/mi><mi>A<\/mi><\/msub><mo>:<\/mo><mi>\u03bc<!-- \u03bc --><\/mi><mo>&gt;<\/mo><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>4.18<\/mn><\/mstyle><annotation encoding=\"latex\">{\"version\":\"1.1\",\"math\":\"H_A: \\mu &gt; $4.18\"}<\/annotation><\/semantics><\/math>\r\n\r\nRemember, we need a criterion for deciding when to implement iced-coffee afternoons across the entire chain.\u00a0 We use the typical 5% level of significance.\u00a0 That is, we will reject the null hypothesis, and conclude that there is enough evidence to show that the promotion raises the average customer bill, if p &lt; 0.05.\r\n\r\nIn class we noted that if we use a significance level of 0.05, we would expect a result this strange\u00a0<strong><em>if the null hypothesis were true<\/em><\/strong> only 5% of the time.\u00a0 But what if I...\u00a0 did the experiment 20 times?\u00a0 So, every day I pick out 50 random customers, and measure their average spend.\u00a0 Even if the promotion didn't do anything, we would expect that at least one of these samples would have a sample mean far enough away from the true population mean of $4.18 to give a small p-value.\r\n\r\nHere's the result of my experiment (run in Excel, using the norm.inv() function to draw from the correct distribution).\u00a0 I used a standard error of\r\n\r\n<math display=\"block\"><semantics><mstyle><mfrac><mrow><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>0.84<\/mn><\/mrow><msqrt><mn>50<\/mn><\/msqrt><\/mfrac><\/mstyle><annotation encoding=\"latex\">{\"version\":\"1.1\",\"math\":\"\\frac{$0.84}{\\sqrt{50}}\"}<\/annotation><\/semantics><\/math>\r\n\r\n, just like last week.\u00a0 If you open the file, it will rerun the randomization, and you should get different results.\r\n\r\n<img class=\"size-medium wp-image-315 aligncenter\" 
src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-300x186.png\" alt=\"Screengrab of an Excel file\" width=\"300\" height=\"186\" \/>\r\n\r\n&nbsp;\r\n\r\nThen I show the best result to my boss - in this screengrab, I have an impressive result of p = 0.0088, which is less than 1%, a strong result!\u00a0 Note that in 7 of the 20 experiments, my sample mean was actually lower than $4.18.\r\n<h2>Publication Bias<\/h2>\r\nI hope that it's clear that only showing the good results to my boss would be unethical, and would not benefit our business.\u00a0 But in the world of research, it's unusual to publish something with a negative conclusion - \"not enough evidence to suggest anything\" is an unexciting headline.\u00a0 That means this kind of selective reporting happens more often than we would like.\r\n\r\nFor more info, try googling \"reproducibility crisis\", to see how publishing only good results has gotten the field of psychology into hot water.\r\n<h2>Data Dredging, or what about the green jellybeans?<\/h2>\r\nThis is a little bit different.\u00a0 In the comic at the top, the researcher didn't test jellybeans over and over.\u00a0 Instead, she kept testing different hypotheses on the same dataset.\u00a0 Much as testing the\u00a0<em>same false hypothesis<\/em> 20 times gave us a significant result, testing\u00a0<em>20 different, but all false, hypotheses<\/em> might give us something that is statistically significant (i.e., p &lt; 0.05) by random chance.\u00a0 Certainly, if we test 100, 1000, or even 10,000 hypotheses all at once, we would probably come up with something significant.\r\n<h2>More to explore:<\/h2>\r\n<ul>\r\n \t<li>Try to get a publishable result by testing different factors: <a href=\"https:\/\/projects.fivethirtyeight.com\/p-hacking\/\">https:\/\/projects.fivethirtyeight.com\/p-hacking\/<\/a><\/li>\r\n \t<li>Check out some weird and spurious 
correlations, by comparing thousands of variables: <a href=\"http:\/\/www.tylervigen.com\/spurious-correlations\">http:\/\/www.tylervigen.com\/spurious-correlations<\/a><\/li>\r\n \t<li>Read up on what happens to a promising career when people find out you've been p-hacking: <a href=\"https:\/\/www.buzzfeednews.com\/article\/stephaniemlee\/brian-wansink-cornell-p-hacking\">https:\/\/www.buzzfeednews.com\/article\/stephaniemlee\/brian-wansink-cornell-p-hacking<\/a><\/li>\r\n<\/ul>","rendered":"<h1>p-hacking or data dredging<\/h1>\n<figure id=\"attachment_314\" aria-describedby=\"caption-attachment-314\" style=\"width: 108px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-314 size-medium\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-108x300.png\" alt=\"green jelly beans - p-hacking\" width=\"108\" height=\"300\" srcset=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-108x300.png 108w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-369x1024.png 369w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-65x180.png 65w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-225x624.png 225w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant-350x971.png 350w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/significant.png 540w\" sizes=\"auto, (max-width: 108px) 100vw, 108px\" \/><figcaption id=\"caption-attachment-314\" class=\"wp-caption-text\">https:\/\/xkcd.com\/882\/<\/figcaption><\/figure>\n<p>&nbsp;<\/p>\n<p>Comic:\u00a0 XKCD, <a href=\"https:\/\/xkcd.com\/882\/\">https:\/\/xkcd.com\/882\/<\/a><\/p>\n<h2>What is 
p-hacking?<\/h2>\n<p>Recall the iced-coffee promotion.\u00a0 Normally, the average sale per customer in our store is $4.18.\u00a0 I recently ran a promotion where the store offered half-priced iced coffees.\u00a0 My hope is that this changes customer buying habits enough to raise the average sale per customer.\u00a0 We also really want to succeed!\u00a0 Our boss was unsure about the promotion, and if it turns out not to have an effect, we&#8217;re in trouble.<\/p>\n<p>NULL HYPOTHESIS: The promotion didn&#8217;t work, so sales are the same or decreased:<\/p>\n<p><math display=\"block\"><semantics><mstyle><msub><mi>H<\/mi><mn>0<\/mn><\/msub><mo>:<\/mo><mi>\u03bc<!-- \u03bc --><\/mi><mo>\u2264<!-- \u2264 --><\/mo><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>4.18<\/mn><\/mstyle><annotation encoding=\"latex\">{&#8220;version&#8221;:&#8221;1.1&#8243;,&#8221;math&#8221;:&#8221;H_0: \\mu \\leq $4.18&#8243;}<\/annotation><\/semantics><\/math><\/p>\n<p>ALTERNATIVE HYPOTHESIS: The promotion worked, and average sales increased:<\/p>\n<p><math display=\"block\"><semantics><mstyle><msub><mi>H<\/mi><mi>A<\/mi><\/msub><mo>:<\/mo><mi>\u03bc<!-- \u03bc --><\/mi><mo>&gt;<\/mo><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>4.18<\/mn><\/mstyle><annotation encoding=\"latex\">{&#8220;version&#8221;:&#8221;1.1&#8243;,&#8221;math&#8221;:&#8221;H_A: \\mu &gt; $4.18&#8243;}<\/annotation><\/semantics><\/math><\/p>\n<p>Remember, we need a criterion for deciding when to implement iced-coffee afternoons across the entire chain.\u00a0 We use the typical 5% level of significance.\u00a0 That is, we will reject the null hypothesis, and conclude that there is enough evidence to show that the promotion raises the average customer bill, if p &lt; 0.05.<\/p>\n<p>In class we noted that if we use a significance level of 0.05, we would expect a result this strange\u00a0<strong><em>if the null hypothesis were true<\/em><\/strong> only 5% of the time.\u00a0 But what if 
I&#8230;\u00a0 did the experiment 20 times?\u00a0 So, every day I pick out 50 random customers, and measure their average spend.\u00a0 Even if the promotion didn&#8217;t do anything, we would expect that at least one of these samples would have a sample mean far enough away from the true population mean of $4.18 to give a small p-value.<\/p>\n<p>Here&#8217;s the result of my experiment (run in Excel, using the norm.inv() function to draw from the correct distribution).\u00a0 I used a standard error of<\/p>\n<p><math display=\"block\"><semantics><mstyle><mfrac><mrow><mrow class=\"MJX-TeXAtom-ORD\"><mo>$<\/mo><\/mrow><mn>0.84<\/mn><\/mrow><msqrt><mn>50<\/mn><\/msqrt><\/mfrac><\/mstyle><annotation encoding=\"latex\">{&#8220;version&#8221;:&#8221;1.1&#8243;,&#8221;math&#8221;:&#8221;\\frac{$0.84}{\\sqrt{50}}&#8221;}<\/annotation><\/semantics><\/math><\/p>\n<p>, just like last week.\u00a0 If you open the file, it will rerun the randomization, and you should get different results.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-315 aligncenter\" src=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-300x186.png\" alt=\"Screengrab of an Excel file\" width=\"300\" height=\"186\" srcset=\"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-300x186.png 300w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-65x40.png 65w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-225x140.png 225w, https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457-350x217.png 350w, 
https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-content\/uploads\/sites\/1653\/2023\/10\/File_ozfo30tr6f1fhtf6yvxrb7uwyztr90nb00168493457.png 508w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Then I show the best result to my boss &#8211; in this screengrab, I have an impressive result of p = 0.0088, which is less than 1%, a strong result!\u00a0 Note that in 7 of the 20 experiments, my sample mean was actually lower than $4.18.<\/p>\n<h2>Publication Bias<\/h2>\n<p>I hope that it&#8217;s clear that only showing the good results to my boss would be unethical, and would not benefit our business.\u00a0 But in the world of research, it&#8217;s unusual to publish something with a negative conclusion &#8211; &#8220;not enough evidence to suggest anything&#8221; is an unexciting headline.\u00a0 That means this kind of selective reporting happens more often than we would like.<\/p>\n<p>For more info, try googling &#8220;reproducibility crisis&#8221;, to see how publishing only good results has gotten the field of psychology into hot water.<\/p>\n<h2>Data Dredging, or what about the green jellybeans?<\/h2>\n<p>This is a little bit different.\u00a0 In the comic at the top, the researcher didn&#8217;t test jellybeans over and over.\u00a0 Instead, she kept testing different hypotheses on the same dataset.\u00a0 Much as testing the\u00a0<em>same false hypothesis<\/em> 20 times gave us a significant result, testing\u00a0<em>20 different, but all false, hypotheses<\/em> might give us something that is statistically significant (i.e., p &lt; 0.05) by random chance.\u00a0 Certainly, if we test 100, 1000, or even 10,000 hypotheses all at once, we would probably come up with something significant.<\/p>\n<h2>More to explore:<\/h2>\n<ul>\n<li>Try to get a publishable result by testing different factors: <a href=\"https:\/\/projects.fivethirtyeight.com\/p-hacking\/\">https:\/\/projects.fivethirtyeight.com\/p-hacking\/<\/a><\/li>\n<li>Check out some 
weird and spurious correlations, by comparing thousands of variables: <a href=\"http:\/\/www.tylervigen.com\/spurious-correlations\">http:\/\/www.tylervigen.com\/spurious-correlations<\/a><\/li>\n<li>Read up on what happens to a promising career when people find out you&#8217;ve been p-hacking: <a href=\"https:\/\/www.buzzfeednews.com\/article\/stephaniemlee\/brian-wansink-cornell-p-hacking\">https:\/\/www.buzzfeednews.com\/article\/stephaniemlee\/brian-wansink-cornell-p-hacking<\/a><\/li>\n<\/ul>\n","protected":false},"author":883,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-313","chapter","type-chapter","status-publish","hentry"],"part":102,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/313","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/users\/883"}],"version-history":[{"count":2,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/313\/revisions"}],"predecessor-version":[{"id":389,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/313\/revisions\/389"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/parts\/102"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapters\/313\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/media?parent=313"}],"wp:term":[{"taxonomy":"chapter-type","embeddab
le":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/pressbooks\/v2\/chapter-type?post=313"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/contributor?post=313"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/businessanalytics\/wp-json\/wp\/v2\/license?post=313"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}