{"id":194,"date":"2025-12-10T12:40:39","date_gmt":"2025-12-10T17:40:39","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/chapter\/reproducible-environments\/"},"modified":"2026-02-12T15:11:06","modified_gmt":"2026-02-12T20:11:06","slug":"reproducible-environments","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/chapter\/reproducible-environments\/","title":{"raw":"Reproducible Environments","rendered":"Reproducible Environments"},"content":{"raw":"At this point, we have covered quite a bit: open as a principle of software development, open as a principle of human and machine interpretability, and open as a facet of reproducible workflows.\r\n\r\nIf we think back to the content covered in Open Workflows, and the discussion of reproducibility and replicability, it\u2019s worth considering that reproducibility is really about internal validation while replicability is about external validation. Reproducibility confirms the same data and processing methods will produce the same result. Replicability contributes to the evidence base by conducting a new study modeled on a previous study.\r\n\r\nIn the spirit of open as it relates to the digital environment and reproducibility, one of the gold standards when we\u2019re looking at a single study is computational reproducibility; that is to say, if I pass off all of my inputs (data, scripts, etc.) to someone else, can they, on their computer, reproduce what I did exactly? Unfortunately, the answer is frequently no, because computers are complex environments, and no two machines are going to have exactly the same environment; hardware and software differences will exist and these will impact how data is processed by a program. In complex food production, like brewing beer, it\u2019s often said that making a great beer once is easy, but making it a second time is much harder. 
Small variations \u2014 such as precise temperatures, ingredient sources, and even the weather \u2014 can subtly change the flavour. The same challenge exists in computational reproducibility \u2014 unless we use a container and apply the concept of containerization.\r\n\r\nThe full details of how containerization is deployed are really beyond an introductory section on open research. But the principles being addressed by containerization are critical to navigating a digital environment when we think about the ability of work, embedded within a piece of software, to be validated by others.\r\n<h2>A Recipe for Understanding Containers<\/h2>\r\nThe thing about any piece of software or any script is that it is never fully self-contained. We always rely on dependencies or pre-existing bundles of code, usually called libraries. Think of it like baking muffins.\r\n\r\nWhen you write your R or Python script, you\u2019re writing out your recipe: a set of instructions with particular steps that need to be followed. To be fully executed, though, your recipe requires certain things:\r\n<ul>\r\n \t<li>an environment in which to run; let\u2019s call this your kitchen<\/li>\r\n \t<li>something to process your ingredients into the end product; let\u2019s call this your oven<\/li>\r\n \t<li>something to validate all the ingredients; let\u2019s call this your mixing bowl<\/li>\r\n \t<li>a list of ingredients; let\u2019s call these your dependencies or libraries.<\/li>\r\n<\/ul>\r\nNow, your favourite muffin recipe depends on you having eggs, butter, white flour, and cow\u2019s milk. Let\u2019s imagine that when you built your working environment \u2014 your kitchen \u2014 you made sure to include a lifetime supply of all of these dependencies \u2014 your ingredients. All is well until you come home one day and realize that your partner has done some upgrades. 
One of these upgrades is to replace all of your white flour with rye flour and your cow\u2019s milk with goat\u2019s milk.\r\n\r\nThis upgrade was ostensibly made to reflect the need for a healthier lifestyle. Beyond potentially being annoyed about the lack of consultation, maybe you see the problem? Next time you try to make your muffins, your validator \u2014 your mixing bowl \u2014 will be expecting white flour and cow\u2019s milk. Unable to find these ingredients, your mixing bowl will fail to pass all the ingredients off to your oven. No more muffins, in spite of the most well-documented script \u2014 your recipe \u2014 being in hand.\r\n\r\nEven if you never had any upgrades done, what if your friend wanted your recipe? Sure, you could give them the script, and they could source all the ingredients. But if you wanted to ensure that the recipe was a perfect match to your own, you\u2019d gift-wrap all the ingredients with the recipe attached, ensuring success.\r\n\r\nThis gift wrapping, or bundling, is exactly what software that containerizes a piece of code does \u2014 it ensures that the code is accompanied by the appropriate environment and dependencies so that it will run into the future. This is a critical aspect of reproducibility.\r\n\r\n<a href=\"https:\/\/www.docker.com\/\">Docker<\/a> is a popular open-source tool for containerizing software and code. 
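In Docker, this gift wrapping is written down as a Dockerfile. As a minimal sketch (the file names analysis.py and requirements.txt are hypothetical placeholders, not files from this chapter), a Dockerfile for a Python analysis pins the kitchen, the ingredients, and the recipe together:

```dockerfile
# The kitchen: pin an exact base environment rather than "latest".
FROM python:3.11-slim

# The ingredients: install library dependencies at pinned versions
# listed in requirements.txt (e.g. pandas==2.2.0).
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# The recipe: bundle the analysis script itself into the image.
COPY analysis.py .

# The oven: the command that turns inputs into the end product.
CMD ["python", "analysis.py"]
```

Anyone with Docker installed can then rebuild and run an identical environment with `docker build` and `docker run`, regardless of what happens to be installed on their own machine.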
For those working in the realm of High-Performance Computing, <a href=\"https:\/\/apptainer.org\">Apptainer<\/a> is another popular option.\r\n<div class=\"textbox shaded\">\r\n<table style=\"border-collapse: collapse;width: 100%;height: 120px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 120px\">\r\n<td style=\"width: 85%;height: 120px\">\r\n<h5>Dig Deeper<\/h5>\r\nTo learn more about containers and the way they can improve reproducibility, review the following videos:\r\n<ul>\r\n \t<li>A conference presentation on the basic elements and implementation of Docker: <a href=\"https:\/\/www.youtube.com\/watch?v=GqqX0j127wA\">Using Docker Containers to Improve Reproducibility in PL\/SE Research<\/a> (42:08)<\/li>\r\n \t<li><a href=\"https:\/\/www.youtube.com\/watch?v=DA87Ba2dpNM\">An introduction to containers using Singularity and some of the differences between Docker and Singularity<\/a> (48:23)<\/li>\r\n \t<li><a href=\"https:\/\/youtube.com\/playlist?list=PLKZ9c4ONm-VkxWW98Gcn9H6WwykMiqtnF\">An introduction to Apptainer<\/a> (8-part series)<\/li>\r\n<\/ul>\r\n<\/td>\r\n<td style=\"width: 15%;height: 120px\"><img class=\"aligncenter wp-image-33 size-thumbnail\" src=\"https:\/\/pressbooks.bccampus.ca\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<h2>Using GenAI for Data Analysis: Why Reproducibility Can Be Challenging<\/h2>\r\n[pb_glossary id=\"263\"]Large Language Models[\/pb_glossary] are powerful tools for data analysis tasks like sentiment analysis and text summarization. They can very quickly make sense of large amounts of text and generate useful insights in a natural, human-like way.\r\n\r\nBut while they\u2019re impressive, using LLMs comes with a reproducibility challenge. If you run the same data analysis twice using an LLM, you might not get the exact same result. 
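The variation comes from the sampling step that picks each output token from the model's scores. Here is a minimal, self-contained sketch of that step (made-up scores for a three-token vocabulary, not any particular provider's API) showing how temperature, greedy decoding, and a fixed random seed rein in the randomness:

```python
import math
import random

def sample_token(token_scores, temperature=1.0, seed=None):
    """Pick one token index from raw model scores (logits).

    temperature <= 0 means greedy decoding: always take the
    top-scoring token, which is fully deterministic.
    """
    if temperature <= 0:
        return max(range(len(token_scores)), key=token_scores.__getitem__)
    rng = random.Random(seed)  # a fixed seed makes sampling repeatable
    # Softmax over temperature-scaled scores (subtract max for stability).
    scaled = [s / temperature for s in token_scores]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(token_scores)), weights=weights, k=1)[0]

scores = [2.0, 1.0, 0.5]  # hypothetical scores for a 3-token vocabulary

# Greedy decoding: identical on every run and every machine.
assert sample_token(scores, temperature=0) == 0

# Sampling with a fixed seed: random-looking, but repeatable.
assert sample_token(scores, seed=42) == sample_token(scores, seed=42)
```

Hosted LLM APIs expose the same ideas through request parameters (for example, a temperature setting and, where offered, a seed), which is why recording those settings alongside your prompts matters.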
That\u2019s because LLMs rely on built-in randomness when generating their text output. This randomness is intentional and helps make the output more natural, but it also makes it harder to get consistent, repeatable results.\r\n\r\nThe problem is even more complicated when you\u2019re using proprietary software-as-a-service models like GPT-4 or Claude. These models are hosted by companies and act like \u201cblack boxes\u201d. You don\u2019t have access to their inner workings, such as their training data or model weights. That means if the provider makes a change to the model behind the scenes, your results may change too, even if your inputs stay the same.\r\n\r\nTo work around this, it\u2019s important to:\r\n<ul>\r\n \t<li>Save all your prompts and settings<\/li>\r\n \t<li>Use fixed versions of the model when possible<\/li>\r\n \t<li>If available, use model settings that minimize randomization, such as setting the temperature to 0, using greedy decoding, and fixing a random seed<\/li>\r\n \t<li>Try repeating your analysis a few times to observe the scale of the random effects<\/li>\r\n \t<li>Consider using a locally hosted model such as Llama, Gemma, or Mistral<\/li>\r\n<\/ul>\r\n<div class=\"textbox shaded\">\r\n<table style=\"border-collapse: collapse;width: 100%;height: 120px\" border=\"0\">\r\n<tbody>\r\n<tr style=\"height: 120px\">\r\n<td style=\"width: 85%;height: 120px\">\r\n<h5>Dig Deeper<\/h5>\r\nTo learn more about data analysis with LLMs:\r\n<ul>\r\n \t<li>Download and install <a href=\"https:\/\/github.com\/ollama\/ollama\">ollama<\/a>, a user-friendly tool for running LLMs locally.<\/li>\r\n \t<li>Explore AI-Assisted Data Extraction (AIDE): <a href=\"https:\/\/youtu.be\/33fE_nqjoE0?si=jVleOwPKlxKOJ_xG\">Demo Video<\/a> and <a href=\"https:\/\/github.com\/noah-schroeder\/AIDE\">GitHub repository<\/a><\/li>\r\n<\/ul>\r\n<\/td>\r\n<td style=\"width: 15%;height: 120px\"><img class=\"aligncenter wp-image-33 size-thumbnail\" 
src=\"https:\/\/pressbooks.bccampus.ca\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" \/><\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>","rendered":"<p>At this point, we have covered quite a bit: open as a principle of software development, open as a principle of human and machine interpretability, and open as a facet of reproducible workflows.<\/p>\n<p>If we think back to the content covered in Open Workflows, and the discussion of reproducibility and replicability, it\u2019s worth considering that reproducibility is really about internal validation while replicability is about external validation. Reproducibility confirms the same data and processing methods will produce the same result. Replicability contributes to the evidence base by conducting a new study modeled on a previous study.<\/p>\n<p>In the spirit of open as it relates to the digital environment and reproducibility, one of the gold standards when we\u2019re looking at a single study is computational reproducibility; that is to say, if I pass off all of my inputs (data, scripts, etc.) to someone else, can they, on their computer, reproduce what I did exactly? Unfortunately, the answer is frequently no, because computers are complex environments, and no two machines are going to have exactly the same environment; hardware and software differences will exist and these will impact how data is processed by a program. In complex food production, like brewing beer, it\u2019s often said that making a great beer once is easy, but making it a second time is much harder. Small variations \u2014 such as precise temperatures, ingredient sources, and even the weather \u2014 can subtly change the flavour. 
The same challenge exists in computational reproducibility \u2014 unless we use a container and apply the concept of containerization.<\/p>\n<p>The full details of how containerization is deployed are really beyond an introductory section on open research. But the principles being addressed by containerization are critical to navigating a digital environment when we think about the ability of work, embedded within a piece of software, to be validated by others.<\/p>\n<h2>A Recipe for Understanding Containers<\/h2>\n<p>The thing about any piece of software or any script is that it is never fully self-contained. We always rely on dependencies or pre-existing bundles of code, usually called libraries. Think of it like baking muffins.<\/p>\n<p>When you write your R or Python script, you\u2019re writing out your recipe: a set of instructions with particular steps that need to be followed. To be fully executed, though, your recipe requires certain things:<\/p>\n<ul>\n<li>an environment in which to run; let\u2019s call this your kitchen<\/li>\n<li>something to process your ingredients into the end product; let\u2019s call this your oven<\/li>\n<li>something to validate all the ingredients; let\u2019s call this your mixing bowl<\/li>\n<li>a list of ingredients; let\u2019s call these your dependencies or libraries.<\/li>\n<\/ul>\n<p>Now, your favourite muffin recipe depends on you having eggs, butter, white flour, and cow\u2019s milk. Let\u2019s imagine that when you built your working environment \u2014 your kitchen \u2014 you made sure to include a lifetime supply of all of these dependencies \u2014 your ingredients. All is well until you come home one day and realize that your partner has done some upgrades. One of these upgrades is to replace all of your white flour with rye flour and your cow\u2019s milk with goat\u2019s milk.<\/p>\n<p>This upgrade was ostensibly made to reflect the need for a healthier lifestyle. 
Beyond potentially being annoyed about the lack of consultation, maybe you see the problem? Next time you try to make your muffins, your validator \u2014 your mixing bowl \u2014 will be expecting white flour and cow\u2019s milk. Unable to find these ingredients, your mixing bowl will fail to pass all the ingredients off to your oven. No more muffins, in spite of the most well-documented script \u2014 your recipe \u2014 being in hand.<\/p>\n<p>Even if you never had any upgrades done, what if your friend wanted your recipe? Sure, you could give them the script, and they could source all the ingredients. But if you wanted to ensure that the recipe was a perfect match to your own, you\u2019d gift-wrap all the ingredients with the recipe attached, ensuring success.<\/p>\n<p>This gift wrapping, or bundling, is exactly what software that containerizes a piece of code does \u2014 it ensures that the code is accompanied by the appropriate environment and dependencies so that it will run into the future. This is a critical aspect of reproducibility.<\/p>\n<p><a href=\"https:\/\/www.docker.com\/\">Docker<\/a> is a popular open-source tool for containerizing software and code. 
For those working in the realm of High-Performance Computing, <a href=\"https:\/\/apptainer.org\">Apptainer<\/a> is another popular option.<\/p>\n<div class=\"textbox shaded\">\n<table style=\"border-collapse: collapse;width: 100%;height: 120px\">\n<tbody>\n<tr style=\"height: 120px\">\n<td style=\"width: 85%;height: 120px\">\n<h5>Dig Deeper<\/h5>\n<p>To learn more about containers and the way they can improve reproducibility, review the following videos:<\/p>\n<ul>\n<li>A conference presentation on the basic elements and implementation of Docker: <a href=\"https:\/\/www.youtube.com\/watch?v=GqqX0j127wA\">Using Docker Containers to Improve Reproducibility in PL\/SE Research<\/a> (42:08)<\/li>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=DA87Ba2dpNM\">An introduction to containers using Singularity and some of the differences between Docker and Singularity<\/a> (48:23)<\/li>\n<li><a href=\"https:\/\/youtube.com\/playlist?list=PLKZ9c4ONm-VkxWW98Gcn9H6WwykMiqtnF\">An introduction to Apptainer<\/a> (8-part series)<\/li>\n<\/ul>\n<\/td>\n<td style=\"width: 15%;height: 120px\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-33 size-thumbnail\" src=\"https:\/\/pressbooks.bccampus.ca\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" srcset=\"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png 150w, https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-65x64.png 65w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2>Using GenAI for Data Analysis: Why Reproducibility Can Be Challenging<\/h2>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_194_263\">Large Language Models<\/a> are powerful tools for data analysis tasks like sentiment analysis and text summarization. 
They can very quickly make sense of large amounts of text and generate useful insights in a natural, human-like way.<\/p>\n<p>But while they\u2019re impressive, using LLMs comes with a reproducibility challenge. If you run the same data analysis twice using an LLM, you might not get the exact same result. That\u2019s because LLMs rely on built-in randomness when generating their text output. This randomness is intentional and helps make the output more natural, but it also makes it harder to get consistent, repeatable results.<\/p>\n<p>The problem is even more complicated when you\u2019re using proprietary software-as-a-service models like GPT-4 or Claude. These models are hosted by companies and act like \u201cblack boxes\u201d. You don\u2019t have access to their inner workings, such as their training data or model weights. That means if the provider makes a change to the model behind the scenes, your results may change too, even if your inputs stay the same.<\/p>\n<p>To work around this, it\u2019s important to:<\/p>\n<ul>\n<li>Save all your prompts and settings<\/li>\n<li>Use fixed versions of the model when possible<\/li>\n<li>If available, use model settings that minimize randomization, such as setting the temperature to 0, using greedy decoding, and fixing a random seed<\/li>\n<li>Try repeating your analysis a few times to observe the scale of the random effects<\/li>\n<li>Consider using a locally hosted model such as Llama, Gemma, or Mistral<\/li>\n<\/ul>\n<div class=\"textbox shaded\">\n<table style=\"border-collapse: collapse;width: 100%;height: 120px\">\n<tbody>\n<tr style=\"height: 120px\">\n<td style=\"width: 85%;height: 120px\">\n<h5>Dig Deeper<\/h5>\n<p>To learn more about data analysis with LLMs:<\/p>\n<ul>\n<li>Download and install <a href=\"https:\/\/github.com\/ollama\/ollama\">ollama<\/a>, a user-friendly tool for running LLMs locally.<\/li>\n<li>Explore AI-Assisted Data Extraction (AIDE): <a 
href=\"https:\/\/youtu.be\/33fE_nqjoE0?si=jVleOwPKlxKOJ_xG\">Demo Video<\/a> <a href=\"https:\/\/github.com\/noah-schroeder\/AIDE\">GitHub<\/a><\/li>\n<\/ul>\n<\/td>\n<td style=\"width: 15%;height: 120px\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-33 size-thumbnail\" src=\"https:\/\/pressbooks.bccampus.ca\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png\" alt=\"\" width=\"150\" height=\"150\" srcset=\"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-150x150.png 150w, https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-content\/uploads\/sites\/2593\/2025\/11\/Dig-Deeper-2-65x64.png 65w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_194_263\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_194_263\"><div tabindex=\"-1\"><p>A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. 
(<a href=\"https:\/\/en.wikipedia.org\/wiki\/Large_language_model\">Wikipedia<\/a>)<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":1076,"menu_order":5,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-194","chapter","type-chapter","status-publish","hentry"],"part":184,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapters\/194","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/wp\/v2\/users\/1076"}],"version-history":[{"count":3,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapters\/194\/revisions"}],"predecessor-version":[{"id":448,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapters\/194\/revisions\/448"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/parts\/184"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapters\/194\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/wp\/v2\/media?parent=194"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/pressbooks\/v2\/chapter-type?post=194"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/wp\/v2\/contributor?post=194"},{"taxonomy":"licen
se","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/openscholarship\/wp-json\/wp\/v2\/license?post=194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}