{"id":131,"date":"2022-08-23T12:55:32","date_gmt":"2022-08-23T16:55:32","guid":{"rendered":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/chapter\/11-quantitative-measurement\/"},"modified":"2022-08-23T12:58:38","modified_gmt":"2022-08-23T16:58:38","slug":"11-quantitative-measurement","status":"publish","type":"chapter","link":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/chapter\/11-quantitative-measurement\/","title":{"raw":"Quantitative measurement","rendered":"Quantitative measurement"},"content":{"raw":"\n<div class=\"textbox examples\">\n<h3>Chapter Outline<\/h3>\n<ol>\n \t<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.1\">Conceptual definitions<\/a> (17 minute read)<\/li>\n \t<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.2\">Operational definitions<\/a> (36 minute read)<\/li>\n \t<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.3\">Measurement quality<\/a> (21 minute read)<\/li>\n \t<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.4\">Ethical and social justice considerations<\/a> (15 minute read)<\/li>\n<\/ol>\nContent warning: examples in this chapter contain references to ethnocentrism, toxic masculinity, racism in science, drug use, mental health and depression, psychiatric inpatient care, poverty and basic needs insecurity, pregnancy, and racism and sexism in the workplace and higher education.<a id=\"11.1\"><\/a>\n\n<\/div>\n<h1>11.1 Conceptual definitions<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\nLearners will be able to...\n<ul>\n \t<li>Define measurement and conceptualization<\/li>\n \t<li>Apply Kaplan\u2019s three categories to determine the complexity of measuring a given variable<\/li>\n \t<li>Identify the role previous research and theory play in defining concepts<\/li>\n \t<li>Distinguish between unidimensional and multidimensional 
concepts<\/li>\n \t<li>Critically apply reification to how you conceptualize the key variables in your research project<\/li>\n<\/ul>\n<\/div>\nIn social science, when we use the term&nbsp;[pb_glossary id=\"585\"]<strong>measurement<\/strong>[\/pb_glossary], we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. At its core, measurement is about defining one\u2019s terms in as clear and precise a way as possible. Of course, measurement in social science isn\u2019t quite as simple as using a measuring cup or spoon, but there are some basic tenets on which most social scientists agree when it comes to measurement. We\u2019ll explore those, as well as some of the ways that measurement might vary depending on your unique approach to the study of your topic.\n\nAn important point here is that measurement does not require any particular instruments or procedures. What it does require is a <em>systematic procedure<\/em> for assigning scores, meanings, and descriptions to individuals or objects so that those scores represent the characteristic of interest. You can measure phenomena in many different ways, but you must be sure that how you choose to measure gives you information and data that lets you answer your research question. If you're looking for information about a person's income, but your main points of measurement have to do with the money they have in the bank, you're not really going to find the information you're looking for!\n\nThe question of what social scientists measure can be answered by asking yourself what social scientists study. Think about the topics you\u2019ve learned about in other classes you\u2019ve taken or the topics you\u2019ve considered investigating yourself. Let\u2019s consider Melissa Milkie and Catharine Warner\u2019s study (2011)[footnote]Milkie, M. A., &amp; Warner, C. H. (2011). 
Classroom learning environments and the mental health of first grade children. <em>Journal of Health and Social Behavior, 52<\/em>, 4\u201322[\/footnote] of first graders\u2019 mental health. In order to conduct that study, Milkie and Warner needed to have some idea about how they were going to measure mental health. What does mental health mean, exactly? And how do we know when we\u2019re observing someone whose mental health is good and when we see someone whose mental health is compromised? Understanding how measurement works in research methods helps us answer these sorts of questions.\n\nAs you might have guessed, social scientists will measure just about anything that they have an interest in investigating. For example, those who are interested in learning something about the correlation between social class and levels of happiness must develop some way to measure both social class and happiness. Those who wish to understand how well immigrants cope in their new locations must measure immigrant status and coping. Those who wish to understand how a person\u2019s gender shapes their learning experiences must measure gender and learning experiences (and get more specific about which experiences are under examination). You get the idea. Social scientists can and do measure just about anything you can imagine observing or wanting to study. Of course, some things are easier to observe or measure than others.\n\n&nbsp;\n\n<img class=\"aligncenter wp-image-4171\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/jose-martin-ramirez-carrasco-z2tinW7Z6Bw-unsplash-scaled-1.jpg\" alt=\"\" width=\"500\" height=\"750\">\n<h2>Observing your variables<\/h2>\nPhilosopher Abraham Kaplan (1964)[footnote]Kaplan, A. (1964). <em>The conduct of inquiry: Methodology for behavioral science<\/em>. 
San Francisco, CA: Chandler Publishing Company.[\/footnote] wrote <em>The<\/em>&nbsp;<em>Conduct of Inquiry,&nbsp;<\/em>which has since become a classic work in research methodology (Babbie, 2010).[footnote]Earl Babbie offers a more detailed discussion of Kaplan\u2019s work in his text. You can read it in: Babbie, E. (2010). <em>The practice of social research<\/em> (12th ed.). Belmont, CA: Wadsworth.[\/footnote] In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called \u201cobservational terms,\u201d is probably the simplest to measure in social science. <strong>[pb_glossary id=\"628\"]Observational terms[\/pb_glossary]<\/strong> are the sorts of things that we can see with the naked eye simply by looking at them. Kaplan roughly defines them as conditions that are easy to identify and verify through direct observation. If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.\n\n<strong>[pb_glossary id=\"641\"]Indirect observables[\/pb_glossary]<\/strong>, on the other hand, are less straightforward to assess. In Kaplan's framework, they are conditions so subtle and complex that we must use existing knowledge and intuition to define them. If we conducted a study for which we wished to know a person\u2019s income, we\u2019d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if it has only been observed indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won\u2019t have directly observed any of those people being born in the locations they report.\n\nSometimes the measures that we are interested in are more complex and more abstract than observational terms or indirect observables. 
Think about some of the concepts you\u2019ve learned about in other classes\u2014for example, ethnocentrism. What is ethnocentrism? Well, from completing an earlier class you might know that it has something to do with the way a person judges another\u2019s culture. But how would you&nbsp;<em>measure&nbsp;<\/em>it? Here\u2019s another construct: bureaucracy. We know this term has something to do with organizations and how they operate but measuring such a construct is trickier than measuring something like a person\u2019s income. The theoretical concepts of ethnocentrism and bureaucracy represent ideas whose meanings we have come to agree on. Though we may not be able to observe these abstractions directly, we can observe their components.\n\nKaplan referred to these more abstract things that behavioral scientists measure as constructs.&nbsp;<strong>[pb_glossary id=\"663\"]Constructs[\/pb_glossary]<\/strong>&nbsp;are \u201cnot observational either directly or indirectly\u201d (Kaplan, 1964, p. 55),[footnote]Kaplan, A. (1964). <em>The conduct of inquiry: Methodology for behavioral science<\/em>. San Francisco, CA: Chandler Publishing Company.[\/footnote] but they can be defined based on observables. For example, the construct of bureaucracy could be measured by counting the number of supervisors that need to approve teacher reimbursements of routine personal spending on their classrooms. The greater the number of administrators that must sign off on routine matters, the greater the degree of bureaucracy. Similarly, we might be able to ask a person the degree to which they trust people from different cultures around the world and then assess the ethnocentrism inherent in their answers. We can measure constructs like bureaucracy and ethnocentrism by defining them in terms of what we can observe.[footnote]In this chapter, we will use the terms concept and construct interchangeably. 
While each term has a distinct meaning in research conceptualization, we do not believe this distinction is important enough to warrant discussion in this chapter. [\/footnote]\n\nThe idea of coming up with your own measurement tool might sound pretty intimidating at this point. The good news is that if you find something in the literature that works for you, you can use it (with proper attribution, of course). If there are only pieces of it that you like, you can reuse those pieces (with proper attribution and describing\/justifying any changes). You don't always have to start from scratch! Indeed, I would encourage you <em>not<\/em> to start from scratch.\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nLook at the variables in your research question.\n<ul>\n \t<li>Classify them as direct observables, indirect observables, or constructs.<\/li>\n \t<li>Do you think measuring them will be easy or hard?<\/li>\n \t<li>What are your first thoughts about how to measure each variable? No wrong answers here, just write down a thought about each variable.<\/li>\n<\/ul>\n<\/div>\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-4172\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/simone-pellegrini-L3QG_OBluT0-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\">\n<h2>Measurement starts with conceptualization<\/h2>\nIn order to measure the concepts in your research question, we first have to understand what we think about them. As an aside, the word <em>concept&nbsp;<\/em>has come up quite a bit, and it is important to be sure we have a shared understanding of that term. A&nbsp;[pb_glossary id=\"718\"]<strong>concept<\/strong>[\/pb_glossary] is the notion or image that we conjure up when we think of some cluster of related observations or ideas. For example, masculinity is a concept. What do you think of when you hear that word? 
Presumably, you imagine some set of behaviors and perhaps even a particular style of self-presentation. Of course, we can\u2019t necessarily assume that everyone conjures up the same set of ideas or images when they hear the word&nbsp;<em>masculinity<\/em>. While there are many possible ways to define the term and some may be more common or have more support than others, there is no universal definition of masculinity. What counts as masculine may shift over time, from culture to culture, and even from individual to individual (Kimmel, 2008).[footnote]Kimmel, M. (2008). Masculinity. In W. A. Darity Jr. (Ed.),&nbsp;<em>International encyclopedia of the social sciences<\/em>&nbsp;(2nd ed., Vol. 5, pp. 1\u20135). Detroit, MI: Macmillan Reference USA.[\/footnote] This is why defining our concepts is so important.\n\n<span style=\"text-align: initial\"><span style=\"font-size: 1em\">Not all researchers clearly explain their theoretical or conceptual framework for their study, but they should! Without understanding how a researcher has defined their key concepts, it would be nearly impossible to understand the meaning of that researcher\u2019s findings and conclusions. Back in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/7-theory-and-paradigm\/\">Chapter 7<\/a>, you developed a theoretical framework for your study based on a survey of the theoretical literature in your topic area. If you haven't done that yet, consider flipping back to that section to familiarize yourself with some of the techniques for finding and using theories relevant to your research question. Continuing with our example on masculinity, we would need to survey the literature on theories of masculinity. After a few queries on masculinity, I found a wonderful article by Wong et al. (2010)[footnote]Wong, Y. J., Steinfeldt, J. A., Speight, Q. L., &amp; Hickman, S. J. (2010). 
Content analysis of Psychology of men &amp; masculinity (2000\u20132008).&nbsp;<i>Psychology of Men &amp; Masculinity<\/i>,&nbsp;<i>11<\/i>(3), 170.[\/footnote] that analyzed eight years of the journal <em>Psychology of Men&nbsp;&amp; Masculinity<\/em> and examined <a href=\"https:\/\/www.researchgate.net\/profile\/Y-Joel-Wong\/publication\/232438006_Content_Analysis_of_Psychology_of_Men_Masculinity_2000-2008\/links\/565e3f8008aefe619b2705d3\/Content-Analysis-of-Psychology-of-Men-Masculinity-2000-2008.pdf\">how often different theories of masculinity were used<\/a>. Not only can I get a sense of which theories are more accepted and which are more marginal in the social science on masculinity, but I am also able to identify a range of options from which I can find the theory or theories that will inform my project.&nbsp;<\/span><\/span>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nIdentify a specific theory (or more than one theory) and how it helps you understand...\n<ul>\n \t<li>Your independent variable(s).<\/li>\n \t<li>Your dependent variable(s).<\/li>\n \t<li>The relationship between your independent and dependent variables.<\/li>\n<\/ul>\nRather than completing this exercise from scratch, build from your theoretical or conceptual framework developed in previous chapters.\n\n<\/div>\nIn quantitative methods, <strong>[pb_glossary id=\"366\"]conceptualization[\/pb_glossary]<\/strong> involves writing out clear, concise definitions for our key concepts. These are the kind of definitions you are used to, like the ones in a dictionary. A conceptual definition involves defining a concept in terms of other concepts, usually by making reference to how other social scientists and theorists have defined those concepts in the past. 
Of course, new conceptual definitions are created all the time because our conceptual understanding of the world is always evolving.\n\nConceptualization is deceptively challenging\u2014spelling out exactly what the concepts in your research question mean to you. Following along with our example, think about what comes to mind when you read the term masculinity. How do you know masculinity when you see it? Does it have something to do with men or with social norms? If so, perhaps we could define masculinity as the social norms that men are expected to follow. That seems like a reasonable start, and at this early stage of conceptualization, brainstorming about the images conjured up by concepts and playing around with possible definitions is appropriate. Doing so can also help you explore your own personal biases and assumptions\u2014something that can help you limit the ways they might corrupt your findings down the line. However, this reflective engagement is just the first step. At this point, you should be moving beyond brainstorming for your key variables because you have read a good amount of research about them.\n\nIn addition, we should consult previous research and theory to understand the definitions that other scholars have already given for the concepts we are interested in. This doesn\u2019t mean we must use their definitions, but understanding how concepts have been defined in the past will help us to compare our conceptualizations with how other scholars define and relate concepts. Understanding prior definitions of our key concepts will also help us decide whether we plan to challenge those conceptualizations or rely on them for our own work. Finally, working on conceptualization is likely to help in the process of refining your research question to one that is specific and clear in what it asks. 
Conceptualization and operationalization (next section) are where \"the rubber meets the road,\" so to speak, and you have to specify what you mean by the question you are asking. As your conceptualization deepens, you will often find that your research question becomes more specific and clear.\n\nIf we turn to the literature on masculinity, we will surely come across work by <a href=\"https:\/\/www.youtube.com\/watch?v=wnLmKmTdAgM\">Michael Kimmel<\/a>, one of the preeminent masculinity scholars in the United States. After consulting Kimmel\u2019s prior work (2000; 2008),[footnote]Kimmel, M. (2000).&nbsp;<em>The<\/em><em>&nbsp;gendered society<\/em>. New York, NY: Oxford University Press; Kimmel, M. (2008). Masculinity. In W. A. Darity Jr. (Ed.),&nbsp;<em>International<\/em><em>&nbsp;encyclopedia of the social sciences&nbsp;<\/em>(2nd ed., Vol. 5, pp. 1\u20135). Detroit, MI: Macmillan Reference USA[\/footnote] we might tweak our initial definition of masculinity. Rather than defining masculinity as \u201cthe social norms that men are expected to follow,\u201d perhaps instead we\u2019ll define it as \u201cthe social roles, behaviors, and meanings prescribed for men in any given society at any one time\u201d (Kimmel &amp; Aronson, 2004, p. 503).[footnote]Kimmel, M. &amp; Aronson, A. B. (2004).&nbsp;<em>Men and masculinities: A-J<\/em>. Denver, CO: ABC-CLIO.[\/footnote] Our revised definition is more precise and complex because it goes beyond addressing one aspect of men\u2019s lives (norms), and addresses three aspects: roles, behaviors, and meanings. It also implies that roles, behaviors, and meanings may vary across societies and over time. Using definitions developed by theorists and scholars is a good idea, though you may find that you want to define things your own way.\n\nAs you can see, conceptualization isn\u2019t as simple as applying any random definition that we come up with to a term. 
For example, note the difference between the research-based definition of masculinity and a basic dictionary <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/masculinity\">definition<\/a>: \"the quality or nature of the male sex: the quality, state, or degree of being masculine or manly.\"\n\nDefining our terms may involve some brainstorming at the very beginning. But conceptualization must go beyond that, to engage with or critique existing definitions and conceptualizations in the literature. Once we\u2019ve brainstormed about the images associated with a particular word, we should also consult prior work to understand how others define the term in question. After we\u2019ve identified a clear definition that we\u2019re happy with, we should make sure that every term used in our definition will make sense to others. Are there terms used within our definition that also need to be defined? If so, our conceptualization is not yet complete. Our definition includes the concept of \"social roles,\" so we should have a definition for what those mean and become familiar with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Role_theory\">role theory<\/a> to help us with our conceptualization. If we don't know what roles are, how can we study them?\n\nLet's say we do all of that. We have a clear definition of the term <em>masculinity<\/em> with reference to previous literature and we also have a good understanding of the terms in our conceptual definition...then we're done, right? Not so fast. You\u2019ve likely met more than one man in your life, and you\u2019ve probably noticed that they are not the same, even if they live in the same society during the same historical time period. This could mean there are dimensions of masculinity. In terms of social scientific measurement, concepts can be said to have <strong>[pb_glossary id=\"376\"]multiple dimensions[\/pb_glossary]<\/strong>&nbsp;when there are multiple elements that make up a single concept. 
With respect to the term&nbsp;<em>masculinity<\/em>, dimensions could be based on gender identity, gender performance, sexual orientation, etc. In any of these cases, the concept of masculinity would be considered to have multiple dimensions.\n\n<span style=\"text-align: initial;font-size: 1em\">While you do not need to spell out every possible dimension of the concepts you wish to measure, it is important to identify whether your concepts are <\/span><strong style=\"text-align: initial;font-size: 1em\">[pb_glossary id=\"383\"]unidimensional[\/pb_glossary]<\/strong><span style=\"text-align: initial;font-size: 1em\"> (and therefore relatively easy to define and measure) or multidimensional (and therefore require multi-part definitions and measures). In this way, how you conceptualize your variables determines how you will measure them in your study. Unidimensional concepts are those that are expected to have a single underlying dimension. These concepts can be measured using a single measure or test. Examples include simple concepts such as a person\u2019s weight, time spent sleeping, and so forth.&nbsp;<\/span>\n\n<span style=\"text-align: initial;font-size: 1em\">One frustrating thing is that there is no clear demarcation between concepts that are inherently unidimensional or multidimensional. Even something as simple as age could be broken down into multiple dimensions including mental age and chronological age, so where does conceptualization stop? How far down the dimensional rabbit hole do we have to go? Researchers should consider two things. First, how important is this variable in your study? If age is not important in your study (maybe it is a control variable), it seems like a waste of time to do a lot of work drawing from developmental theory to conceptualize this variable. A unidimensional measure from zero to dead is all the detail we need. 
On the other hand, if we were measuring the impact of age on masculinity, conceptualizing our independent variable (age) as multidimensional may provide a richer understanding of that impact. Second, your conceptualization will lead directly to your operationalization of the variable, and once your operationalization is complete, make sure someone reading your study could follow how your conceptual definitions informed the measures you chose for your variables.&nbsp;<\/span>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nWrite a conceptual definition for your independent and dependent variables.\n<ul>\n \t<li>Cite and attribute definitions to other scholars, if you use their words.<\/li>\n \t<li>Describe how your definitions are informed by your theoretical framework.<\/li>\n \t<li>Place your definition in conversation with other theories and conceptual definitions commonly used in the literature.<\/li>\n \t<li>Are there multiple dimensions of your variables?<\/li>\n \t<li>Are any of these dimensions important for you to measure?<\/li>\n<\/ul>\n<\/div>\n&nbsp;\n\n<img class=\"aligncenter wp-image-119\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-774x1024.png\" alt=\"\" width=\"302\" height=\"400\">\n<h2>Do researchers actually know what we're talking about?<\/h2>\nConceptualization proceeds differently in qualitative research compared to quantitative research. Since qualitative researchers are interested in the understandings and experiences of their participants, it is less important for them to find one fixed definition for a concept before starting to interview or interact with participants. 
The researcher\u2019s job is to accurately and completely represent how their participants understand a concept, not to test their own definition of that concept.\n\nIf you were conducting qualitative research on masculinity, you would likely consult previous literature like Kimmel\u2019s work mentioned above. From your literature review, you may come up with a&nbsp;<em>working definition<\/em>&nbsp;for the terms you plan to use in your study, which can change over the course of the investigation. However, the definition that matters is the definition that your participants share during data collection. A working definition is merely a place to start, and researchers should take care not to think it is the only or best definition out there.\n\nIn qualitative inquiry, your participants are the experts on the concepts that arise during the study. Your job as the researcher is to accurately and reliably collect and interpret their understanding of the concepts they describe while answering your questions. Your conceptualization is likely to change over the course of qualitative inquiry, as you learn more information from your participants. Indeed, getting participants to comment on, extend, or challenge the definitions and understandings of other participants is a hallmark of qualitative research. This is the opposite of quantitative research, in which definitions must be completely set in stone before the inquiry can begin.\n\nThe contrast between qualitative and quantitative conceptualization is instructive for understanding how quantitative methods (and positivist research in general) privilege the knowledge of the researcher over the knowledge of study participants and community members. Positivism holds that the researcher is the \"expert,\" and can define concepts based on their expert knowledge of the scientific literature. 
This knowledge is in contrast to the lived experience that participants possess from experiencing the topic under examination day-in, day-out. For this reason, it would be wise to remind ourselves not to take our definitions too seriously and to be critical about the limitations of our knowledge.\n\nConceptualization must be open to revisions, even radical revisions, as scientific knowledge progresses. While I\u2019ve suggested consulting prior scholarly definitions of our concepts, you should not assume that prior, scholarly definitions are more real than the definitions we create. Likewise, we should not think that our own made-up definitions are any more real than any other definition. It would also be wrong to assume that just because definitions exist for some concept that the concept itself exists beyond some abstract idea in our heads. Building on the paradigmatic ideas behind interpretivism and the critical paradigm, the assumption that our abstract concepts exist in some concrete, tangible way is known as <strong>[pb_glossary id=\"390\"]reification[\/pb_glossary]<\/strong>. The concept of reification draws our attention to the power dynamics behind how we can create reality by how we define it.\n\nReturning again to our example of masculinity, think about how our notions of masculinity have developed over the past few decades, and how different and yet so similar they are to patriarchal definitions throughout history. Conceptual definitions become more or less popular based on the power arrangements inside of social science and the broader world. Western knowledge systems are privileged, while others are viewed as unscientific and marginal. The historical domination of social science by white men from WEIRD countries meant that definitions of masculinity were imbued with their cultural biases and were designed explicitly and implicitly to preserve their power. 
This has inspired movements for <a href=\"https:\/\/www.india-seminar.com\/2009\/597\/597_shiv_visvanathan.htm\">cognitive justice<\/a> as we seek to use social science to achieve global development.\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n \t<li>Measurement is the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating.<\/li>\n \t<li>Kaplan identified three categories of things that social scientists measure: observational terms, indirect observables, and constructs.<\/li>\n \t<li>Some concepts have multiple elements or dimensions.<\/li>\n \t<li>Researchers often use measures previously developed and studied by other researchers.<\/li>\n \t<li>Conceptualization is a process that involves coming up with clear, concise definitions.<\/li>\n \t<li>Conceptual definitions are based on the theoretical framework you are using for your study (and the paradigmatic assumptions underlying those theories).<\/li>\n \t<li>Whether your conceptual definitions come from your own ideas or the literature, you should be able to situate them in terms of other commonly used conceptual definitions.<\/li>\n \t<li>Researchers should acknowledge the limited explanatory power of their definitions for concepts and how oppression can shape what explanations are considered true or scientific.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nThink historically about the variables in your research question.\n<ul>\n \t<li>How has the conceptual definition of your topic changed over time?<\/li>\n \t<li>What scholars or social forces were responsible for this change?<\/li>\n<\/ul>\nTake a critical look at your conceptual definitions.\n<ul>\n \t<li>How might participants define terms for themselves differently, based on their daily experience?<\/li>\n \t<li>On what cultural assumptions are your conceptual definitions based?<\/li>\n \t<li>Are your conceptual 
definitions applicable across all cultures that will be represented in your sample?<a id=\"11.2\"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.2 Operational definitions<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\nLearners will be able to...\n<ul>\n \t<li>Define and give an example of indicators and attributes for a variable<\/li>\n \t<li>Apply the three components of an operational definition to a variable<\/li>\n \t<li>Distinguish between levels of measurement for a variable and how those differences relate to measurement<\/li>\n \t<li>Describe the purpose of composite measures like scales and indices<\/li>\n<\/ul>\n<\/div>\nConceptual definitions are like dictionary definitions. They tell you what a concept means by defining it using other concepts. In this section we will move from the abstract realm (theory) to the real world (measurement). <strong>[pb_glossary id=\"616\"]Operationalization[\/pb_glossary]<\/strong> is the process by which researchers spell out precisely how a concept will be measured in their study. It involves identifying the specific research procedures we will use to gather data about our concepts. If conceptually defining your terms means looking at theory, how do you operationally define your terms? By looking for indicators of when your variable is present or not, more or less intense, and so forth. Operationalization is probably the most challenging part of quantitative research, but once it's done, the design and implementation of your study will be straightforward.\n\n&nbsp;\n\n<img class=\"aligncenter wp-image-120\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-1024x1024.png\" alt=\"\" width=\"400\" height=\"400\">\n<h2>Indicators<\/h2>\nOperationalization works by identifying specific&nbsp;<strong>[pb_glossary id=\"719\"]indicators[\/pb_glossary]<\/strong> that will be taken to represent the ideas we are interested in studying. 
If we are interested in studying masculinity, then the indicators for that concept might include some of the social roles prescribed to men in society such as breadwinning or fatherhood. Being a breadwinner or a father might therefore be considered <em>indicators&nbsp;<\/em>of a person\u2019s masculinity. The extent to which a man fulfills either, or both, of these roles might be understood as clues (or indicators) about the extent to which he is viewed as masculine.\n\nLet\u2019s look at another example of indicators. Each day, Gallup researchers poll 1,000 randomly selected Americans to ask them about their well-being. To measure well-being, Gallup asks these people to respond to questions covering six broad areas: physical health, emotional health, work environment, life evaluation, healthy behaviors, and access to basic necessities. Gallup uses these six factors as indicators of the concept that they are really interested in, which is <a href=\"http:\/\/www.well-beingindex.com\/\">well-being<\/a>.\n\nIdentifying indicators can be even simpler than the examples described thus far. Political party affiliation is another relatively easy concept for which to identify indicators. If you asked a person what party they voted for in the last national election (or gained access to their voting records), you would get a good indication of their party affiliation. Of course, some voters split tickets between multiple parties when they vote and others swing from party to party each election, so our indicator is not perfect. Indeed, if our study were about political identity as a key concept, operationalizing it solely in terms of who they voted for in the previous election leaves out a lot of information about identity that is relevant to that concept. Nevertheless, it's a pretty good indicator of political party affiliation.\n\nChoosing indicators is not an arbitrary process. 
As described earlier, utilizing prior theoretical and empirical work in your area of interest is a great way to identify indicators in a scholarly manner. Your conceptual definitions will also point you in the direction of relevant indicators. Empirical work will give you some very specific examples of how the important concepts in an area have been measured in the past and what sorts of indicators have been used. Often, it makes sense to use the same indicators as previous researchers; however, you may find that some previous measures have potential weaknesses that your own study will improve upon.\n\nAll of the examples in this chapter have dealt with questions you might ask a research participant on a survey or in a quantitative interview. If you plan to collect data from other sources, such as through direct observation or the analysis of available records, think practically about what the design of your study might look like and how you can feasibly collect data on various indicators. If your study asks about whether the participant regularly changes the oil in their car, you will likely not observe them directly doing so. Instead, you will likely need to rely on a survey question that asks how frequently they change their oil, or ask to see their car maintenance records.\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n \t<li>What indicators are commonly used to measure the variables in your research question?<\/li>\n \t<li>How can you feasibly collect data on these indicators?<\/li>\n \t<li>Are you planning to collect your own data using a questionnaire or interview? Or are you planning to analyze available data like client files or raw data shared from another researcher's project?<\/li>\n<\/ul>\nRemember, you need [pb_glossary id=\"503\"]<strong>raw data<\/strong>[\/pb_glossary]. Your research project cannot rely solely on the results reported by other researchers or the arguments you read in the literature. 
A literature review is only the first part of a research project, and your review of the literature should inform the indicators you end up choosing when <em>you<\/em> measure the variables in your research question.\n\n<\/div>\nUnlike conceptual definitions, which define a concept in terms of other concepts, an operational definition consists of three components: (1) the variable being measured and its attributes, (2) the measure you will use, and (3) how you plan to interpret the data collected from that measure to draw conclusions about the variable you are measuring.\n<h2>Step 1: Specifying variables and attributes<\/h2>\nThe first component, the variable, should be the easiest part. At this point in quantitative research, you should have a research question that has at least one independent and at least one dependent variable. Remember that variables must be able to vary. For example, the United States is not a variable. Country of residence is a variable, as is patriotism. Similarly, if your sample only includes men, gender is a constant in your study, not a variable. A&nbsp;<strong>[pb_glossary id=\"388\"]constant[\/pb_glossary]<\/strong> is a characteristic that does not change in your study.\n\nWhen social scientists measure concepts, they sometimes use the language of variables and attributes. A&nbsp;<strong>[pb_glossary id=\"4195\"]variable[\/pb_glossary]<\/strong> refers to a quality or quantity that varies across people or situations. <strong>[pb_glossary id=\"387\"]Attributes[\/pb_glossary]<\/strong>&nbsp;are the characteristics that make up a variable. For example, the variable hair color would contain attributes like blonde, brown, black, red, gray, etc. A variable\u2019s attributes determine its level of measurement. There are four possible levels of measurement: nominal, ordinal, interval, and ratio. 
The first two levels of measurement are&nbsp;<strong>[pb_glossary id=\"695\"]categorical[\/pb_glossary]<\/strong>, meaning their attributes are categories rather than numbers. The latter two levels of measurement are&nbsp;<strong>[pb_glossary id=\"654\"]continuous[\/pb_glossary]<\/strong>, meaning their attributes are numbers.\n\n[caption id=\"attachment_130\" align=\"aligncenter\" width=\"654\"]<img class=\"size-large wp-image-4175\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/tommy-van-kessel-BXFY8_iii9M-unsplash-scaled-1.jpg\" alt=\"\" width=\"654\" height=\"1024\"> I exist to frustrate researchers' categorizations.[\/caption]\n<h3>Levels of measurement<\/h3>\nHair color is an example of a nominal level of measurement.&nbsp;<strong>[pb_glossary id=\"720\"]Nominal[\/pb_glossary]<\/strong> measures are categorical, and those categories cannot be mathematically ranked. As a brown-haired person (with some gray), I can\u2019t say for sure that brown-haired people are better than blonde-haired people. As with all nominal levels of measurement, there is no ranking order between hair colors; they are simply different. That is what constitutes a nominal level--gender and race are also measured at the nominal level.\n\nWhat attributes are contained in the variable&nbsp;<em>hair color<\/em>? While blonde, brown, black, and red are common colors, some people may not fit into these categories if we only list these attributes. My wife, who currently has purple hair, wouldn\u2019t fit anywhere. This means that our attributes were not exhaustive. <strong>[pb_glossary id=\"721\"]Exhaustiveness[\/pb_glossary]<\/strong>&nbsp;means that all possible attributes are listed. We may have to list a lot of colors before we can meet the criteria of exhaustiveness. Clearly, there is a point at which exhaustiveness has been reasonably met. 
If a person insists that their hair color is&nbsp;<em>light burnt sienna<\/em>, it is not your responsibility to list that as an option. Rather, that person would reasonably be described as brown-haired. Perhaps listing a category for&nbsp;<em>other color<\/em>&nbsp;would suffice to make our list of colors exhaustive.\n\nWhat about a person who has multiple hair colors at the same time, such as red and black? They would fall into multiple attributes. This violates the rule of&nbsp;<strong>[pb_glossary id=\"722\"]mutual exclusivity[\/pb_glossary]<\/strong>, in which a person cannot fall into two different attributes. Instead of listing all of the possible combinations of colors, perhaps you might include a&nbsp;<em>multi-color<\/em>&nbsp;attribute to describe people with more than one hair color.\n\nMaking sure researchers provide mutually exclusive and exhaustive attributes is about making sure all people are represented in the data record. For many years, the attributes for gender were only male or female. Now, our understanding of gender has evolved to encompass more attributes that better reflect the diversity in the world. Children of parents from different races were often classified as one race or another, even if they identified with both cultures. The option for bi-racial or multi-racial on a survey not only more accurately reflects the racial diversity in the real world but validates and acknowledges people who identify in that manner. If we did not measure race in this way, we would leave empty the data record for people who identify as biracial or multiracial, impairing our search for truth.\n\nUnlike nominal-level measures, attributes at the&nbsp;<strong>[pb_glossary id=\"524\"]ordinal[\/pb_glossary]<\/strong>&nbsp;level can be rank ordered. For example, someone\u2019s degree of satisfaction in their romantic relationship can be ordered by rank. 
That is, you could say you are not at all satisfied, a little satisfied, moderately satisfied, or highly satisfied. Note that even though these have a rank order to them (not at all satisfied is certainly worse than highly satisfied), we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute.\n\nThis can get a little confusing when using <strong>[pb_glossary id=\"723\"]rating scales[\/pb_glossary]<\/strong>. If you have ever taken a customer satisfaction survey or completed a course evaluation for school, you are familiar with rating scales. \u201cOn a scale of 1-5, with 1 being the lowest and 5 being the highest, how likely are you to recommend our company to other people?\u201d That surely sounds familiar. Rating scales use numbers, but only as a shorthand, to indicate what attribute (highly likely, somewhat likely, etc.) the person feels describes them best. You wouldn\u2019t say you are \u201c2\u201d likely to recommend the company, but you would say you are not very likely to recommend the company. Ordinal-level attributes must also be exhaustive and mutually exclusive, as with nominal-level variables.\n\nAt the&nbsp;<strong>[pb_glossary id=\"461\"]interval[\/pb_glossary]&nbsp;<\/strong>level, attributes must also be exhaustive and mutually exclusive and there is equal distance between attributes. Interval measures are also continuous, meaning their attributes are numbers, rather than categories. IQ scores are interval level, as are temperatures in Fahrenheit and Celsius. Their defining characteristic is that we can say how much more or less one attribute differs from another. We cannot, however, say with certainty what the ratio of one attribute is in comparison to another. 
For example, it would not make sense to say that a person with an IQ score of 140 has twice the IQ of a person with a score of 70, or that 20 degrees is twice as hot as 10 degrees. However, the difference between IQ scores of 80 and 100 is the same as the difference between IQ scores of 120 and 140 (and the difference between a temperature of 20 and 10 is the same as the difference between 35 and 25).\n\nWhile we cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, we can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level. Finally, at the <strong>[pb_glossary id=\"462\"]ratio[\/pb_glossary]&nbsp;<\/strong>level, attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point.&nbsp;Thus, with these variables, we <em>can&nbsp;<\/em>say what the ratio of one attribute is in comparison to another. Examples of ratio-level variables include age and years of education. We know that a person who is 12 years old is twice as old as someone who is 6 years old. Height measured in meters and weight measured in kilograms are good examples. So are counts of discrete objects or events such as the number of siblings one has or the number of questions a student answers correctly on an exam. 
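The constraints above can be made concrete in code. Here is a minimal sketch in Python (a hypothetical illustration, not part of the chapter; the variables and values are invented) of which summary operations are meaningful at each level of measurement:

```python
# A minimal sketch (hypothetical data, not from the chapter) of which
# summary operations are meaningful at each level of measurement.
from statistics import mode, median

hair_color = ["brown", "blonde", "brown", "black"]   # nominal
satisfaction = [1, 3, 3, 4, 2]   # ordinal codes: 1 = not at all ... 4 = highly
iq_scores = [80, 100, 120, 140]  # interval
num_siblings = [0, 3, 6, 2]      # ratio

# Nominal: categories can only be counted; the mode is the only
# meaningful measure of central tendency.
print(mode(hair_color))  # brown

# Ordinal: attributes can be ranked, so the median is also meaningful.
print(median(satisfaction))  # 3

# Interval: differences between attributes are meaningful...
print(iq_scores[3] - iq_scores[2] == iq_scores[1] - iq_scores[0])  # True
# ...but ratios are not: an IQ of 140 is not "twice" an IQ of 70.

# Ratio: a true zero point makes ratios meaningful.
print(num_siblings[2] / num_siblings[1])  # 2.0, i.e., twice as many siblings
```

Nothing in the code stops you from averaging nominal codes or taking ratios of IQ scores; the level of measurement is a constraint on interpretation that the researcher, not the software, must enforce.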
The differences between each level of measurement are visualized in Table 11.1.\n<table><caption>Table 11.1 Criteria for Different Levels of Measurement<\/caption>\n<tbody>\n<tr>\n<td><\/td>\n<td>Nominal<\/td>\n<td>Ordinal<\/td>\n<td>Interval<\/td>\n<td>Ratio<\/td>\n<\/tr>\n<tr>\n<td>Exhaustive<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Mutually exclusive<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Rank-ordered<\/td>\n<td><\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Equal distance between attributes<\/td>\n<td><\/td>\n<td><\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>True zero point<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>X<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h4>Levels of measurement = levels of specificity<\/h4>\nWe have spent time learning how to determine our data's level of measurement. Now what? How can we use this information to help us as we measure concepts and develop measurement tools? First, the types of statistical tests that we are able to use generally depend on our data's level of measurement.&nbsp;With nominal-level measurement, for example, the only available measure of central tendency is the mode. With ordinal-level measurement, the median or mode can be used as indicators of central tendency[footnote]That said, when using a Likert scale, which is an ordinal scale, many researchers will argue that averages, measures of variation, and parametric tests are appropriate. For more on this, see Sullivan, G. M., &amp; Artino, A. R., Jr (2013). Analyzing and interpreting data from Likert-type scales. <i>Journal of Graduate Medical Education<\/i>, <i>5<\/i>(4), 541\u2013542. <a href=\"https:\/\/doi.org\/10.4300\/JGME-5-4-18\">https:\/\/doi.org\/10.4300\/JGME-5-4-18<\/a>&nbsp;and Norman, G. (2010). Likert scales, levels of measurement and the \"laws\" of statistics. 
<i>Advances in Health Sciences Education: Theory and Practice<\/i>, <i>15<\/i>(5), 625\u2013632. <a href=\"https:\/\/doi.org\/10.1007\/s10459-010-9222-y\">https:\/\/doi.org\/10.1007\/s10459-010-9222-y<\/a>[\/footnote]. Interval and ratio-level measurement are typically considered the most desirable because they permit any measure of central tendency (i.e., mean, median, or mode) to be computed. Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores. The higher the level of measurement, the more complex the statistical tests we are able to conduct. This knowledge may help us decide what kind of data we need to gather, and how.\n\nThat said, we have to balance this knowledge with the understanding that sometimes collecting data at a higher level of measurement could negatively impact our studies. For instance, providing answers in ranges may make prospective participants feel more comfortable responding to sensitive items. Imagine that you were interested in collecting information on topics such as income, number of sexual partners, or number of times someone used illicit drugs. You would have to think about the sensitivity of these items and determine whether it would make more sense to collect some data at a lower level of measurement (e.g., asking whether they are sexually active (nominal) versus their total number of sexual partners (ratio)).\n\nFinally, when analyzing data, researchers sometimes find a need to change a variable's level of measurement. For example, a few years ago, a student was interested in studying the relationship between mental health and life satisfaction. This student used a variety of measures. One item asked about the number of mental health symptoms, reported as the actual number. When analyzing the data, the student examined the mental health symptom variable and noticed that she had two groups: those with one symptom or none and those with many symptoms. 
Instead of using the ratio-level data (the actual number of mental health symptoms), she collapsed her cases into two categories, few and many, and used that variable in her analyses. It is important to note that you can collapse data from a higher level of measurement into a lower level; however, you cannot move data from a lower level to a higher level.\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n \t<li>Check that the variables in your research question can vary...and that they are not constants or one of many potential attributes of a variable.<\/li>\n \t<li>Think about the attributes your variables have. Are they categorical or continuous? What level of measurement seems most appropriate?<\/li>\n<\/ul>\n<\/div>\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-4176\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/markus-winkler-htShI76GLDM-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\">\n<h2>Step 2: Specifying measures for each variable<\/h2>\nLet\u2019s pick a research question and walk through the process of operationalizing variables to see how specific we need to get. I\u2019m going to hypothesize that students in a class who are unmotivated are less likely to be satisfied with instruction. Remember, this would be a direct relationship\u2014as motivation decreases, satisfaction decreases. In this question, motivation&nbsp;is my independent variable (the cause) and satisfaction with instruction is my dependent variable (the effect). Now that we have identified our variables, their attributes, and levels of measurement, we can move on to the second component: the measure itself.\n\nSo, how would you measure my key variables: motivation&nbsp;and satisfaction? What indicators would you look for? Some students might say that motivation could be measured by observing a participant\u2019s body language. 
They may also say that a motivated&nbsp;person will often express feelings of engagement or energy. In addition, a satisfied person might be happy around instructors and often express gratitude. While these factors may indicate that the variables are present, they lack the precision and consistency a measure requires. Unfortunately, what this \u201cmeasure\u201d is actually saying is \u201cI know motivation and satisfaction when I see them.\u201d While you are likely a decent judge of motivation and satisfaction, in a research study you need to specify exactly how you plan to measure your variables. Your judgment is subjective, based on your own idiosyncratic experiences with motivation and satisfaction. It couldn\u2019t be replicated by another researcher, and it can\u2019t be applied consistently to a large group of people. Operationalization requires that you come up with a specific and rigorous measure for determining who is motivated or satisfied.\n\nFinding a good measure for your variable depends on the kind of variable it is. Variables that are directly observable don't come up very often in my students' classroom projects, but they might include things like taking someone's blood pressure, marking attendance or participation in a group, and so forth. To measure an indirectly observable variable like age, you would probably put a question on a survey that asked, \u201cHow old are you?\u201d Measuring a variable like income might require some more thought, though. Are you interested in this person\u2019s individual income or the income of their family unit? This might matter if your participant does not work or is dependent on other family members for income. Do you count income from social welfare programs? Are you interested in their income per month or per year? Even though indirect observables are relatively easy to measure, the measures you use must be clear in what they are asking, and operationalization is all about figuring out the specifics of what you want to know. 
For more complicated constructs, you will need compound measures that use multiple indicators to measure a single variable.\n\nHow you plan to collect your data also influences how you will measure your variables. If you are using secondary data like student records as a data source, you are limited by what information is in the data sources you can access. If your organization uses a given measurement for a learning outcome, that is the one you will use in your study. One of the benefits of collecting your own data is being able to select the measures you feel best exemplify your understanding of the topic.\n<h3>Measuring unidimensional concepts<\/h3>\nThe previous section mentioned two important considerations: how complicated the variable is and how you plan to collect your data. With these in hand, we can use the level of measurement to further specify how you will measure your variables and consider specialized rating scales developed by social science researchers.\n<h4>Measurement at each level<\/h4>\nNominal measures assess categorical variables. These measures are used for variables or indicators that have mutually exclusive attributes but that cannot be rank-ordered. Nominal measures ask about the variable and provide names or labels for the different attribute values, such as social work, counseling, and nursing for the variable profession. Nominal measures are relatively straightforward.\n\nOrdinal measures often use a rating scale: an ordered set of responses from which participants must choose. Figure 11.1 shows several examples. The number of response options on a typical rating scale is usually five or seven, though it can range from three to eleven. Five-point scales are best for unipolar scales where only one construct is tested, such as frequency (Never, Rarely, Sometimes, Often, Always). 
Seven-point scales are best for bipolar scales where there is a dichotomous spectrum, such as liking (Like very much, Like somewhat, Like slightly, Neither like nor dislike, Dislike slightly, Dislike somewhat, Dislike very much). Sometimes you want to force people to choose one way or the other, so you might use a forced-choice scale with an even number of options (4, 6, or 8) that doesn't offer a mid-point option. For bipolar questions, it is useful to offer an earlier question that branches respondents into an area of the scale; if asking about liking ice cream, first ask \u201cDo you generally like or dislike ice cream?\u201d Once the respondent chooses like or dislike, refine it by offering them relevant choices from the seven-point scale. Branching improves both reliability and validity (Krosnick &amp; Berent, 1993).[footnote]Krosnick, J.A. &amp; Berent, M.K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format.&nbsp;<em>American Journal of Political Science, 27<\/em>(3), 941-964.[\/footnote] Although you often see scales with numerical labels, it is best to present only verbal labels to respondents and convert them to numerical values in the analysis. Avoid partial labels and lengthy or overly specific labels. In some cases, the verbal labels can be supplemented with (or even replaced by) meaningful graphics. 
The last rating scale shown in Figure 11.1 is a visual-analog scale, on which participants make a mark somewhere along the horizontal line to indicate the magnitude of their response.\n\n&nbsp;\n\n[caption id=\"attachment_130\" align=\"aligncenter\" width=\"900\"]<img class=\"size-full wp-image-4149\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/9.2.png\" alt=\"\" width=\"900\" height=\"461\"> Figure 11.1 Example rating scales for closed-ended questionnaire items[\/caption]\n\nInterval measures are those where the values measured are not only rank-ordered but are also equidistant from adjacent attributes. Consider the temperature scale (in Fahrenheit or Celsius), where the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. Likewise, if you have a scale that asks about respondents\u2019 annual income using ranges such as $0 to $10,000, $10,000 to $20,000, $20,000 to $30,000, and so forth, this is also an interval measure, because the mid-points of the ranges (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. The intelligence quotient (IQ) scale is also an interval measure, because the measure is designed such that the difference between IQ scores of 100 and 110 is supposed to be the same as between 110 and 120 (although, in all honesty, we do not really know whether that is truly the case). Interval measures allow us to examine \u201chow much more\u201d one attribute is than another, which is not possible with nominal or ordinal measures. You may find researchers who argue that ordinal rating scales can be treated as interval measures so that a wider range of statistical techniques can be used to analyze them. As we will discuss in the latter part of the chapter, this is debatable because there is no way to know whether the difference between a 3 and a 4 on a rating scale is the same as the difference between a 2 and a 3. 
Those numbers are just placeholders for categories.\n\nRatio measures are those that have all the qualities of nominal, ordinal, and interval scales and, in addition, have a \u201ctrue zero\u201d point (where the value zero implies lack or non-availability of the underlying construct). Think about how to measure the number of people working in human resources at a social work agency. It could be one, several, or none (if the agency contracts out for those services). Measuring interval and ratio data is relatively easy, as people either select or input a number for their answer. If you ask a person how many eggs they purchased last week, they can simply tell you they purchased a dozen eggs at the store, two at breakfast on Wednesday, or none at all.\n<h4>Commonly used rating scales in questionnaires<\/h4>\n<p class=\"c4\"><span class=\"c5 c1\">The level of measurement will give you the basic information you need, but social scientists have developed specialized instruments for use in questionnaires, a common tool used in quantitative research.&nbsp;<\/span><span class=\"c5 c1\">Although <strong>[pb_glossary id=\"386\"]Likert scale[\/pb_glossary]<\/strong> is a term colloquially used to refer to almost any rating scale (e.g., a 0-to-10 life satisfaction scale), it has a much more precise meaning. <\/span><span class=\"c5 c1\">In the 1930s, researcher Rensis Likert (pronounced LICK-ert) created a new approach for measuring people\u2019s attitudes (Likert, 1932)<\/span><span class=\"c22 c5\">.[footnote]Likert, R. (1932). A technique for the measurement of attitudes.&nbsp;<em>Archives of Psychology, 140<\/em>, 1\u201355.[\/footnote]<\/span><span class=\"c5 c1\">&nbsp;It involves presenting people with several statements\u2014including both favorable and unfavorable statements\u2014about some person, group, or idea. 
Respondents then express their agreement or disagreement with each statement on a 5-point scale:&nbsp;<\/span><em><span class=\"c5 c8 c1\">Strongly Agree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Agree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Neither Agree nor Disagree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Disagree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Strongly Disagree<\/span><\/em><span class=\"c5 c1\">. Numbers are assigned to each response a<\/span><span class=\"c5 c1\">nd then summed across all items to produce a score representing the attitude toward the person, group, or idea. For items that are phrased in an opposite direction (e.g., negatively worded statements instead of positively worded statements), reverse coding is used so that the numerical scoring of statements also runs in the opposite direction.&nbsp;<\/span><span class=\"c5 c1\">The entire set of items came to be called a Likert scale, as indicated in Table 11.2 below.<\/span><\/p>\n<p class=\"c33 c70\"><span class=\"c5 c1\">Unless you are measuring people\u2019s attitude toward something by assessing their level of agreement with several statements about it, it is best to avoid calling it a Likert scale. You are probably just using a rating scale. Likert scales allow for more granularity (more finely tuned response) than yes\/no items, including whether respondents are neutral to the statement. 
<\/span>Below is an example of how we might use a Likert scale to assess your attitudes about research as you work your way through this textbook.<\/p>\n&nbsp;\n<table class=\"grid\" style=\"border-collapse: collapse;width: 100%;height: 131px\" border=\"0\"><caption>Table 11.2 Likert scale<\/caption>\n<tbody>\n<tr>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><strong>Strongly agree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Agree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Neutral<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Disagree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Strongly disagree<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">I like research more now than when I started reading this book.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">This textbook is easy to use.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">I feel confident about how well I understand levels of measurement.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">This textbook is helping me plan my research proposal.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<strong>[pb_glossary id=\"385\"]Semantic differential scales[\/pb_glossary]<\/strong> are composite 
(multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. Whereas in the above Likert scale, the participant is asked how much they <em>agree or disagree<\/em> with a statement, in a semantic differential scale the participant is asked to indicate how they <em>feel<\/em> about a specific item. This makes the s<span style=\"font-size: 1em\">emantic differential scale an excellent technique for measuring people\u2019s attitudes or feelings toward objects, events, or behaviors.<\/span><span style=\"text-align: initial;font-size: 1em\"> Table 11.3 is an example of a semantic differential scale that was created to assess participants' feelings about this textbook.&nbsp;<\/span>\n\n&nbsp;\n<table style=\"height: 90px\"><caption><strong>Table 11.3. A semantic differential scale for measuring attitudes towards a textbook<\/strong><\/caption>\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 779.826px\" colspan=\"7\"><em><strong>1) <\/strong><span style=\"text-decoration: underline\"><strong>How would you rate your opinions toward this textbook?<\/strong><\/span><\/em><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\">Very much<\/td>\n<td style=\"height: 15px;width: 104.444px\">Somewhat<\/td>\n<td style=\"height: 15px;width: 77.3438px\">Neither<\/td>\n<td style=\"height: 15px;width: 104.444px\">Somewhat<\/td>\n<td style=\"height: 15px;width: 107.465px\">Very much<\/td>\n<td style=\"height: 15px;width: 103.524px\"><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Boring<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 
15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Exciting<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Useless<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Useful<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Hard<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Easy<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Irrelevant<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Applicable<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>Notice that on a Likert scale, each item is different but the choices for the scale are the same (e.g., strongly agree, agree, etc.). However, for a semantic differential scale, the thing that you are reviewing, in this case, beliefs about research content, remains the same. It is the choices that change. 
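Rating-scale responses like those in Tables 11.2 and 11.3 are usually converted to numbers before analysis. Below is a minimal Python sketch of one common approach: assign the numeric codes 1 through 5 to the response options and average across items to get a composite score. The item names and the 1-to-5 coding here are illustrative assumptions, not part of any published instrument.

```python
# Hypothetical coding of Likert response options; many instruments use
# a 1-5 coding like this one, but always check the scale's documentation.
LIKERT_CODES = {
    "Strongly agree": 5,
    "Agree": 4,
    "Neutral": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}

def composite_score(responses):
    """Average the numeric codes across items into one composite score."""
    codes = [LIKERT_CODES[answer] for answer in responses.values()]
    return sum(codes) / len(codes)

# One (made-up) respondent's answers to three items:
answers = {
    "likes_research": "Agree",                 # coded 4
    "textbook_easy_to_use": "Strongly agree",  # coded 5
    "confident_about_levels": "Neutral",       # coded 3
}
print(composite_score(answers))  # -> 4.0
```

Note that some instruments also include reverse-worded items, whose codes must be flipped (e.g., 5 becomes 1) before averaging.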
<\/div>\nA third type of composite scale, designed by Louis Guttman, uses a series of items arranged in increasing order of intensity (least intense to most intense) of the concept. This type of scale allows us to understand the intensity of beliefs or feelings. Each item in a <strong>[pb_glossary id=\"384\"]Guttman scale[\/pb_glossary]<\/strong> has a weight (not indicated on the tool itself) that varies with the intensity of that item, and the weighted combination of the responses is used as an aggregate measure of an observation. The items below are an example.\n<div class=\"textbox shaded\">\n\n<strong>Example Guttman Scale Items<\/strong>\n<ol>\n \t<li>I often felt the material was not engaging (Yes\/No)<\/li>\n \t<li>I was often thinking about other things in class (Yes\/No)<\/li>\n \t<li>I was often working on other tasks during class (Yes\/No)<\/li>\n \t<li>I will work to abolish research from the curriculum (Yes\/No)<\/li>\n<\/ol>\n<\/div>\nNotice how the items move from lower intensity to higher intensity. A researcher reviews the yes answers and creates a score for each participant.\n<h3>Composite measures: Scales and indices<\/h3>\nDepending on your research design, your measure may be something you put on a survey or pre\/post-test that you give to your participants. For a variable like age or income, one well-worded question may suffice. Unfortunately, most variables in the social world are not so simple. Motivation and satisfaction are multidimensional concepts. 
Relying on a single indicator, like a question that asks \u201cYes or no, are you motivated?\u201d, does not capture the complexity of motivation, which is bound up with mood, energy, and happiness. There is no easy way to delineate between multidimensional and unidimensional concepts; the distinction lies in how you think about your variable. Satisfaction could be validly measured using a unidimensional ordinal rating scale. However, if satisfaction were a key variable in our study, we would need a theoretical framework and conceptual definition for it. That means we'd probably have more indicators to ask about, like timeliness, respect, sensitivity, and many others, and we would want our study to say something about what satisfaction truly means in terms of our other key variables. If satisfaction is not a key variable in your conceptual framework, though, it makes sense to operationalize it as a unidimensional concept.\n\nFor more complicated measures, researchers use scales and indices (sometimes called indexes) to measure their variables because they assess multiple indicators to develop a composite (or total) score. <span style=\"font-size: 1em\">Composite scores provide a much greater understanding of concepts than a single item could. 
Although we won't delve too deeply into the process of scale development, we will cover some important topics for you to understand how scales and indices developed by other researchers can be used in your project.<\/span>\n\nAlthough scales and indices differ in ways we will discuss later, they share several features.\n<ul>\n \t<li>Both are ordinal measures of variables.<\/li>\n \t<li>Both can order the units of analysis in terms of specific variables.<\/li>\n \t<li>Both are [pb_glossary id=\"375\"]<strong>composite measures<\/strong>[\/pb_glossary].<\/li>\n<\/ul>\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-124\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-1024x691.png\" alt=\"\" width=\"1024\" height=\"691\">\n<h4>Scales<\/h4>\nThe previous section discussed how to measure respondents\u2019 responses to predesigned items or indicators belonging to an underlying construct. But how do we create the indicators themselves? The process of creating the indicators is called scaling. More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. Stevens (1946)[footnote]Stevens, S. S. (1946). On the theory of scales of measurement.&nbsp;<i>Science<\/i>,&nbsp;<i>103<\/i>(2684), 677-680.[\/footnote] described measurement as the assignment of numerals to objects or events according to rules. This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research.\n\nThe outcome of a scaling process is a <strong>[pb_glossary id=\"724\"]scale[\/pb_glossary]<\/strong>, which is an empirical structure for measuring items or indicators of a given construct. 
Understand that the multidimensional \u201cscales\u201d discussed in this section are a little different from the \u201crating scales\u201d discussed in the previous section. A rating scale is used to capture a respondent\u2019s reaction to a given item on a questionnaire. For example, an ordinally scaled item captures a value from \u201cstrongly disagree\u201d to \u201cstrongly agree.\u201d Attaching a rating scale to a statement or instrument is not scaling. Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items.\n\nIf creating your own scale sounds painful, don\u2019t worry! For most multidimensional variables, you would likely be duplicating work that has already been done by other researchers. Scale development is the focus of a branch of science called psychometrics. You do not need to create a scale for motivation because scales such as the Intrinsic Motivation Inventory (IMI), the General Causality Orientations Scale (GCOS), and the Sport Climate Questionnaire (SCQ) have been developed and refined over decades to measure variables like motivation. As we will discuss in the next section, these scales have been shown to be reliable and valid. While you could create a new scale to measure motivation or satisfaction, a rigorous study would pilot test and refine that new scale over time to make sure it measures the concept accurately and consistently. This high level of rigor is often unachievable in student research projects because of the cost and time involved in pilot testing and validating, so using existing scales is recommended.\n\nUnfortunately, there is no good one-stop shop for psychometric scales. 
The <a href=\"https:\/\/databases.lib.sfu.ca\/record\/61245147620003610\/Mental-Measurements-Yearbook-with-Tests-in-Print\">Mental Measurements Yearbook<\/a> provides a searchable database of measures for social science variables, though it is woefully incomplete and often does not contain the full documentation for scales in its database. You can access it from a university library\u2019s list of databases. If you can\u2019t find anything in there, your next stop should be the methods section of the articles in your literature review. The methods section of each article will detail how the researchers measured their variables, and often the results section is instructive for understanding more about measures. In a quantitative study, researchers may have used a scale to measure key variables and will provide a brief description of that scale, its names, and maybe a few example questions. If you need more information, look at the results section and tables discussing the scale to get a better idea of how the measure works. Looking beyond the articles in your literature review, searching Google Scholar using queries like \u201cmotivation scale\u201d or \u201csatisfaction scale\u201d should also provide some relevant results. For example, searching for documentation for the Rosenberg Self-Esteem Scale (which we will discuss in the next section), I found this <a href=\"http:\/\/www.integrativehealthpartners.org\/downloads\/ACTmeasures.pdf\">report from researchers investigating acceptance and commitment therapy<\/a> which details this scale and many others used to assess mental health outcomes. If you find the name of the scale somewhere but cannot find the documentation (all questions and answers plus how to interpret the scale), a general web search with the name of the scale and \".pdf\" may bring you to what you need. 
Or, for professional help finding information, ask a librarian!\n\nUnfortunately, these approaches do not guarantee that you will be able to view the scale itself or get information on how it is interpreted. Many scales cost money to use and may require training to administer properly. You may also find scales that are related to your variable but would need to be slightly modified to match your study\u2019s needs. You could adapt a scale to fit your study; however, changing even small parts of a scale can influence its accuracy and consistency. While it is perfectly acceptable in student projects to adapt a scale without testing it first (time may not allow you to do so), pilot testing is always recommended for adapted scales, and researchers seeking to draw valid conclusions and publish their results must take this additional step.\n<h4>Indices<\/h4>\nAn [pb_glossary id=\"576\"]<strong>index<\/strong>[\/pb_glossary] is a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas. It is different from a scale: scales also aggregate measures, but those measures examine different dimensions <em>or<\/em> the same dimension of a single construct. A well-known example of an index is the <a href=\"https:\/\/www.bls.gov\/cpi\/\">consumer price index<\/a> (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. The CPI is a measure of how much consumers have to pay for goods and services (in general) and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and \u201cother goods and services\u201d), which are further subdivided into more than 200 smaller items. Each month, government employees call all over the country to get the current prices of more than 80,000 items. 
Using a complicated weighting scheme that takes into account the location and probability of purchase for each item, analysts then combine these prices into an overall index score using a series of formulas and rules.\n\nAnother example of an index is the <a href=\"https:\/\/usa.ipums.org\/usa-action\/variables\/SEI#description_section\">Duncan Socioeconomic Index<\/a> (SEI). This index is used to quantify a person's socioeconomic status (SES) and is a combination of three concepts: income, education, and occupation. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These very different measures are combined to create an overall SES index score. However, SES index measurement has generated a lot of controversy and disagreement among researchers and may not easily generalize from nation to nation. For a discussion of SES in Canada, check out <a href=\"https:\/\/journals.sfu.ca\/ijepl\/index.php\/ijepl\/article\/view\/858\">Measures of Socio-Economic Status in Educational Research: The Canadian Context<\/a>.\n<div class=\"textbox\">Here is a resource where you can read a&nbsp;<a href=\"https:\/\/usa.ipums.org\/usa\/chapter4\/sei_note.shtml\">summary of the Socio-Economic Index debate.<\/a><\/div>\nThe process of creating an index is similar to that of a scale. First, conceptualize (define) the index and its constituent components. Though this appears simple, there may be a lot of disagreement on what components (concepts\/constructs) should be included or excluded from an index. For instance, in the SES index, isn\u2019t income correlated with education and occupation? And if so, should we include one component only or all three components? Reviewing the literature, using theories, and\/or interviewing experts or key stakeholders may help resolve this issue. Second, operationalize and measure each component. 
For instance, how will you categorize occupations, particularly since some occupations may have changed with time (e.g., there were no Web developers before the Internet)? Third, create a rule or formula for calculating the index score. Again, this process may involve a lot of subjectivity, so validating the index score using existing or new data is important.\n\nScale and index development are often taught in their own course in doctoral education, so it is unreasonable to expect to develop a consistently accurate measure within the span of a week or two. Using available indices and scales is recommended for this reason.\n<h4>Differences between scales and indices<\/h4>\nThough indices and scales yield a single numerical score or value representing a concept of interest, they are different in many ways. First, indices often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Conversely, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale about customer satisfaction).\n\nSecond, indices often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. Some argue that the sophistication of the scaling methodology makes scales different from indices, while others suggest that indexing methodology can be equally sophisticated. Nevertheless, indices and scales are both essential tools in social science research.\n\nScales and indices seem like clean, convenient ways to measure different phenomena in social science, but just like with a lot of research, we have to be mindful of the assumptions and biases underneath. What if a scale or an index was developed using only White women as research participants? 
Is it going to be useful for other groups? It very well might be, but when using a scale or index with a group for whom it hasn't been tested, it is important to evaluate the validity and reliability of the instrument, which we address in the rest of the chapter.\n\nFinally, it's important to note that while scales and indices are often made up of nominal or ordinal variables, when we combine them into composite scores, we typically treat those scores as interval\/ratio variables.\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n \t<li>Look back at your work from the previous section: are your variables unidimensional or multidimensional?<\/li>\n \t<li>Describe the specific measures you will use (actual questions and response options you will use with participants) for each variable in your research question.<\/li>\n \t<li>If you are using a measure developed by another researcher but do not have all of the questions, response options, and instructions needed to implement it, put it on your to-do list to get them.<\/li>\n<\/ul>\n<\/div>\n&nbsp;\n\n[caption id=\"attachment_130\" align=\"aligncenter\" width=\"1024\"]<img class=\"wp-image-4178 size-large\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/mockup-graphics-i1iqQRLULlg-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\"> If we were operationalizing blood pressure, the cuff and reader would be the measure...but how do we interpret what counts as high, low, and normal blood pressure?[\/caption]\n<h3>Step 3: How you will interpret your measures<\/h3>\nThe final stage of operationalization involves setting the rules for how the measure works and how the researcher should interpret the results. Sometimes, interpreting a measure can be incredibly easy. If you ask someone their age, you\u2019ll probably interpret the result by noting the raw number (e.g., 22) the person provides and comparing it with other people's ages. 
However, you could also recode that person into age categories (e.g., under 25, 25\u201334 years old, Generation Z, etc.). Even scales may be simple to interpret. If there is a scale of problem behaviors, one might simply add up the number of behaviors checked off, with a total of 1\u20135 indicating low risk of delinquent behavior, 6\u201310 indicating moderate risk, and so on. How you choose to interpret your measures should be guided by how they were designed, how you conceptualize your variables, the data sources you used, and your plan for analyzing your data statistically. Whatever measure you use, you need a set of rules for how to take any valid answer a respondent provides to your measure and interpret it in terms of the variable being measured.\n\nFor more complicated measures like scales, refer to the information provided by the author for how to interpret the scale. If you can\u2019t find enough information from the scale\u2019s creator, look at how the results of that scale are reported in the results section of research articles.\n\nOne common mistake I see is that students introduce another variable into their operational definition. This is incorrect. Your operational definition should mention only one variable\u2014the variable being defined. While your study will certainly draw conclusions about the relationships between variables, that's not what operationalization is. Operationalization specifies what instrument you will use to measure your variable and how you plan to interpret the data collected using that measure.\n\nOperationalization is probably the trickiest component of basic research methods, so please don\u2019t get frustrated if it takes a few drafts and a lot of feedback to get to a workable definition. 
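To make the idea of an interpretation rule concrete, here is a minimal Python sketch of the problem-behaviors example discussed above: sum the behaviors checked off and map the total to a risk band. The 1&#8211;5 and 6&#8211;10 cutoffs come from the text; the number of items and the "high risk" band are assumptions added for illustration, not part of any real instrument.

```python
# Hypothetical interpretation rule for a problem-behavior checklist.
# Each item is recorded as 0 (not checked) or 1 (checked).

def risk_band(checked_behaviors):
    """Sum checked items and map the total to a labeled risk band."""
    total = sum(checked_behaviors)
    if total <= 5:
        return "low risk"       # cutoff from the text: 1-5 = low risk
    elif total <= 10:
        return "moderate risk"  # cutoff from the text: 6-10 = moderate risk
    return "high risk"          # assumed band for totals above 10

# A (made-up) respondent who checked 7 of 12 behaviors:
print(risk_band([1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0]))  # -> moderate risk
```

The point is not the code itself but that the rule is explicit: any valid pattern of answers maps to exactly one interpretation.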
At the time of this writing, the book's original author was in the process of operationalizing the concept of \u201cattitudes towards research methods.\u201d Originally, he thought that he could gauge students\u2019 attitudes toward research methods by looking at their end-of-semester course evaluations. As he became aware of the potential methodological issues with student course evaluations, he opted to use focus groups of students to measure their common beliefs about research. You may recall some of these opinions from <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/1-science-and-social-work\/\">Chapter 1<\/a>, such as the common beliefs that research is boring, useless, and too difficult. After the focus group, he created a scale based on the opinions he gathered, and he plans to pilot test it with another group of students. After the pilot test, he expects that he will have to revise the scale again before he can implement the measure in a real research project.\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n \t<li>Operationalization involves spelling out precisely how a concept will be measured.<\/li>\n \t<li>Operational definitions must include the variable, the measure, and how you plan to interpret the measure.<\/li>\n \t<li>There are four different levels of measurement: nominal, ordinal, interval, and ratio (in increasing order of specificity).<\/li>\n \t<li>Scales and indices are common ways to collect information and involve using multiple indicators in measurement.<\/li>\n \t<li>A key difference between a scale and an index is that a scale contains multiple indicators for one concept, whereas an index combines multiple concepts (components).<\/li>\n \t<li>Using scales developed and refined by other researchers can improve the rigor of a quantitative study.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nUse the research question that you developed in the previous chapters and find a 
related scale or index that researchers have used. If you have trouble finding the exact phenomenon you want to study, get as close as you can.\n<ul>\n \t<li>What is the level of measurement for each item on each tool? Take a second and think about why the tool's creator decided to include these levels of measurement. Identify any levels of measurement you would change and why.<\/li>\n \t<li>If these tools don't exist for what you are interested in studying, why do you think that is?<a id=\"11.3\"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.3 Measurement quality<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\nLearners will be able to...\n<ul>\n \t<li>Define and describe the types of validity and reliability<\/li>\n \t<li>Assess for systematic error<\/li>\n<\/ul>\n<\/div>\nThe previous sections provided insight into measuring concepts in social science research. We discussed the importance of identifying concepts and their corresponding indicators as a way to help us operationalize them. In essence, we now understand that when we think about our measurement process, we must be intentional and thoughtful in the choices that we make. This section is all about how to judge the quality of the measures you've chosen for the key variables in your research question.\n<h2><strong><span style=\"color: #ff0000\">--&gt;Reliability&nbsp;and Validity: Really Important Sections&lt;--<\/span><\/strong><\/h2>\n(If I could make it flash, I would)\n<h2>Reliability<\/h2>\nFirst, let\u2019s say we\u2019ve decided to measure alcoholism by asking people to respond to the following question: Have you ever had a problem with alcohol? If we measure alcoholism this way, then it is likely that anyone who identifies as an alcoholic would respond \u201cyes.\u201d This may seem like a good way to identify our group of interest, but think about how you and your peer group may respond to this question. 
Would participants respond differently after a wild night out, compared to any other night? Could an infrequent drinker\u2019s current headache from last night\u2019s glass of wine influence how they answer the question this morning? How would that same person respond to the question before consuming the wine? In each case, the same person might respond differently to the same question at different points, so it is possible that our measure of alcoholism has a reliability problem.&nbsp;<strong>[pb_glossary id=\"589\"]Reliability[\/pb_glossary]<\/strong>&nbsp;in measurement is about consistency.\n\nOne common problem of reliability with social scientific measures is memory. If we ask research participants to recall some aspect of their own past behavior, we should try to make the recollection process as simple and straightforward for them as possible. Sticking with the topic of alcohol intake, if we ask respondents how much wine, beer, and liquor they\u2019ve consumed each day over the course of the past three months, how likely are we to get accurate responses? Unless a person keeps a journal documenting their intake, there will very likely be some inaccuracies in their responses. On the other hand, we might get more accurate responses if we ask a participant how many drinks of any kind they have consumed in the past week.\n\nReliability can be an issue even when we\u2019re not reliant on others to accurately report their behaviors. Perhaps a researcher is interested in observing how alcohol intake influences interactions in public locations. They may decide to conduct observations at a local pub by noting how many drinks patrons consume and how their behavior changes as their intake changes. What if the researcher has to use the restroom, and the patron next to them takes three shots of tequila during the brief period the researcher is away from their seat? 
The reliability of this researcher\u2019s measure of alcohol intake depends on their ability to physically observe every instance of patrons consuming drinks. If they are unlikely to be able to observe every such instance, then perhaps their mechanism for measuring this concept is not reliable.\n\nThe following subsections describe the types of reliability that are important for you to know about, but keep in mind that you may see other approaches to judging reliability mentioned in the empirical literature.\n<h3>Test-retest reliability<\/h3>\nWhen researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. [pb_glossary id=\"653\"]<strong>Test-retest reliability<\/strong>[\/pb_glossary] is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent (Whoops! Pro tip: did you know the human race has been getting smarter over the past century[footnote]Trahan, L. H., Stuebing, K. K., Fletcher, J. M., &amp; Hiscock, M. (2014). The Flynn effect: A meta-analysis.&nbsp;<i>Psychological Bulletin<\/i>,&nbsp;<i>140<\/i>(5), 1332\u20131360. <a href=\"https:\/\/doi.org\/10.1037\/a0037173\">https:\/\/doi.org\/10.1037\/a0037173<\/a>[\/footnote]?).\n\nAssessing test-retest reliability requires using the measure on a group of people at one time and then using it again on the&nbsp;<em>same<\/em> group of people at a later time. Unlike an experiment, you aren't giving participants an intervention but trying to establish a reliable baseline of the variable you are measuring. 
Once you have these two measurements, you then look at the correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient. Figure 11.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered twice, one week apart. The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.\n\n&nbsp;\n\n[caption id=\"attachment_130\" align=\"aligncenter\" width=\"902\"]<img class=\"wp-image-3152 size-full\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/5.2.png\" alt=\"A scatterplot with scores at time 1 on the x-axis and scores at time 2 on the y-axis, both ranging from 0 to 30. The dots on the scatter plot indicate a strong, positive correlation.\" width=\"902\" height=\"448\"> Figure 11.2 Test-retest correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered twice, one week apart[\/caption]\nAgain, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.\n<h3>Internal consistency<\/h3>\nAnother kind of reliability is [pb_glossary id=\"725\"]<strong>internal consistency<\/strong>[\/pb_glossary], which is the consistency of people\u2019s responses across the items on a multiple-item measure. 
In general, all the items on such measures are supposed to reflect the same underlying construct, so people\u2019s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people\u2019s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants\u2019 bets were consistently high or low across trials. A statistic known as Cronbach\u2019s alpha provides a way to measure how well each item of a scale relates to the others.\n<h3>Interrater reliability<\/h3>\nMany behavioral measures involve significant judgment on the part of an observer or a rater. [pb_glossary id=\"649\"]<strong>Interrater reliability<\/strong>[\/pb_glossary] is the extent to which different observers are consistent in their judgments. For example, if you were interested in measuring university students\u2019 social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student\u2019s level of social skills. 
To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers\u2019 ratings should be highly correlated with each other.\n\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-127\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-1024x683.jpg\" alt=\"\" width=\"1024\" height=\"683\">\n<h2>Validity<\/h2>\n[pb_glossary id=\"590\"]<strong>Validity<\/strong>[\/pb_glossary], another key element of assessing measurement quality, is the extent to which the scores from a measure represent the variable they are intended to measure. But how do researchers make this judgment? We have already considered one factor that they take into account\u2014reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to represent. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. For example, think about a math test of story problems designed to evaluate addition skill. If the story problems were written at the fifth grade reading level but given to a first grade class, the test would be reliable (students would consistently fail) but not valid (you wouldn't get an accurate understanding of the students' mathematical ability).\n\nDiscussions of validity usually divide it into several distinct \u201ctypes.\u201d But a good way to interpret these types is that they are other kinds of evidence\u2014in addition to reliability\u2014that should be taken into account when judging the validity of a measure.\n<h3>Face validity<\/h3>\n[pb_glossary id=\"643\"]<strong>Face validity<\/strong>[\/pb_glossary] is the extent to which a measurement method appears \u201con its face\u201d to measure the construct of interest. 
Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. Although face validity can be assessed quantitatively\u2014for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to\u2014it is usually assessed informally.\n\nFace validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people\u2019s intuitions about human behavior, which are frequently wrong. Math teachers might look at our test of story problems and see them as measuring addition skills, yet not realize the story problems are all written using language that is too complex for first grade students to grasp.\n<h3>Content validity<\/h3>\n[pb_glossary id=\"644\"]<strong>Content validity<\/strong>[\/pb_glossary] is the extent to which a measure \u201ccovers\u201d the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then their measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that they think positive thoughts about exercising, feel good about exercising, and actually exercise. So to have good content validity, a measure of people\u2019s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively.
Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.\n<h3><b><\/b>Criterion validity<\/h3>\n[pb_glossary id=\"647\"]<strong>Criterion validity<\/strong>[\/pb_glossary] is the extent to which people\u2019s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people\u2019s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people\u2019s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people\u2019s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.\n\nA criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People\u2019s scores on this measure should be correlated with their participation in \u201cextreme\u201d activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. 
When the criterion is measured at the same time as the construct, criterion validity is referred to as [pb_glossary id=\"646\"]<strong>concurrent validity<\/strong>[\/pb_glossary]; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as [pb_glossary id=\"645\"]<strong>predictive validity<\/strong>[\/pb_glossary] (because scores on the measure have \u201cpredicted\u201d a future outcome).\n<h3>Discriminant validity<\/h3>\n[pb_glossary id=\"726\"]<strong>Discriminant validity<\/strong>[\/pb_glossary], on the other hand, is the extent to which scores on a measure are <em>not<\/em>&nbsp;correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people\u2019s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.\n<h2>Increasing the reliability and validity of measures<\/h2>\nWe have reviewed the types of errors and how to evaluate our measures based on reliability and validity considerations. However, what can we do while selecting or creating our tool so that we minimize the potential for error? Many of our options were covered in our discussion about reliability and validity. Nevertheless, the following list provides a quick summary of things that you should do when creating or selecting a measurement tool. While not all of these will be feasible in your project, it is important to implement those that are feasible in your research context.\n\nMake sure that you engage in a rigorous literature review so that you understand the concept that you are studying.
This means understanding the different ways that your concept may manifest itself. This review should include a search for existing instruments.[footnote]Sullivan G. M. (2011). A primer on the validity of assessment instruments. <em>Journal of graduate medical education, 3<\/em>(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1[\/footnote]\n<ul>\n \t<li>Do you understand all the dimensions of your concept? Do you have a good understanding of the content dimensions of your concept(s)?<\/li>\n \t<li>What instruments exist? How many items are on the existing instruments? Are these instruments appropriate for your population?<\/li>\n \t<li>Are these instruments standardized? Note: If an instrument is standardized, that means it has been rigorously studied and tested.<\/li>\n<\/ul>\nConsult content experts to review your instrument. This is a good way to check the face validity of your items. Additionally, content experts can also help you understand the content validity.[footnote]Sullivan G. M. (2011). A primer on the validity of assessment instruments. <em>Journal of graduate medical education, 3<\/em>(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1[\/footnote]\n<ul>\n \t<li>Do you have access to a reasonable number of content experts? If not, how can you locate them?<\/li>\n \t<li>Did you provide a list of critical questions for your content reviewers to use in the reviewing process?<\/li>\n<\/ul>\nPilot test your instrument on a sufficient number of people and get detailed feedback.[footnote]Engel, R. &amp; Schutt, R. (2013). <em>The practice of research in social work (3rd. ed.)<\/em>. Thousand Oaks, CA: SAGE.[\/footnote] Ask your group to provide feedback on the wording and clarity of items. 
Keep detailed notes and make adjustments BEFORE you administer your final tool.\n<ul>\n \t<li>How many people will you use in your pilot testing?<\/li>\n \t<li>How will you set up your pilot testing so that it mimics the actual process of administering your tool?<\/li>\n \t<li>How will you receive feedback from your pilot testing group? Have you provided a list of questions for your group to think about?<\/li>\n<\/ul>\nProvide training for anyone collecting data for your project.[footnote]Engel, R. &amp; Schutt, R. (2013). <em>The practice of research in social work (3rd. ed.)<\/em>. Thousand Oaks, CA: SAGE.[\/footnote] You should provide those helping you with a written research protocol that explains all of the steps of the project. You should also problem solve and answer any questions that those helping you may have. This will increase the chances that your tool will be administered in a consistent manner.\n<ul>\n \t<li>How will you conduct your orientation\/training? How long will it be? What modality?<\/li>\n \t<li>How will you select those who will administer your tool? What qualifications do they need?<\/li>\n<\/ul>\nWhen thinking of items, use a higher level of measurement, if possible.[footnote]Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.[\/footnote] This will provide more information and you can always downgrade to a lower level of measurement later.\n<ul>\n \t<li>Have you examined your items and the levels of measurement?<\/li>\n \t<li>Have you thought about whether you need to modify the type of data you are collecting? Specifically, are you asking for information that is too specific (at a higher level of measurement) which may reduce participants' willingness to participate?<\/li>\n<\/ul>\nUse multiple indicators for a variable.[footnote]Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). 
Thousand Oaks, CA: SAGE.[\/footnote] Think about the number of items that you will include in your tool.\n<ul>\n \t<li>Do you have enough items? Enough indicators? The correct indicators?<\/li>\n<\/ul>\nConduct an item-by-item assessment of multiple-item measures.[footnote]Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.[\/footnote] When you do this assessment, think about each word and how it changes the meaning of your item.\n<ul>\n \t<li>Are there items that are redundant? Do you need to modify, delete, or add items?<\/li>\n<\/ul>\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-128\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-1024x767.jpg\" alt=\"\" width=\"1024\" height=\"767\">\n<h2>Types of error<\/h2>\nAs you can see, measures never perfectly describe what exists in the real world. Good measures demonstrate validity and reliability but will always have some degree of error. <strong>[pb_glossary id=\"382\"]Systematic error[\/pb_glossary]<\/strong> (also called bias) causes our measures to consistently output incorrect data in one direction or another on a measure, usually due to an identifiable process. Imagine you created a measure of height, but you didn\u2019t put an option for anyone over six feet tall. If you gave that measure to your local college or university, some of the taller students might not be measured accurately. In fact, you would be under the mistaken impression that the tallest person at your school was six feet tall, when in actuality there are likely people taller than six feet at your school. This error seems innocent, but if you were using that measure to help you build a new building, those people might hit their heads!\n\nA less innocent form of error arises when researchers word questions in a way that might cause participants to think one answer choice is preferable to another. 
For example, if I were to ask you \u201cDo you think global warming is caused by human activity?\u201d you would probably feel comfortable answering honestly. But what if I asked you \u201cDo you agree with 99% of scientists that global warming is caused by human activity?\u201d Would you feel comfortable saying no, if that\u2019s what you honestly felt? I doubt it. That is an example of a&nbsp;<strong>[pb_glossary id=\"727\"]leading question[\/pb_glossary]<\/strong>, a question with wording that influences how a participant responds. We\u2019ll discuss leading questions and other problems in question wording in greater detail in <a href=\"https:\/\/pressbooks.rampages.us\/msw-research\/chapter\/12-survey-design\/\">Chapter 12<\/a>.\n\nIn addition to error created by the researcher, your participants can cause error in measurement. Some people will respond without fully understanding a question, particularly if the question is worded in a confusing way. Let\u2019s consider another potential source of error. If we asked people if they always washed their hands after using the bathroom, would we expect people to be perfectly honest? Polling people about whether they wash their hands after using the bathroom might only elicit what people would like others to think they do, rather than what they actually do. This is an example of&nbsp;<strong>[pb_glossary id=\"343\"]social desirability bias[\/pb_glossary]<\/strong>, in which participants in a research study want to present themselves in a positive, socially desirable way to the researcher. People in your study will want to seem tolerant, open-minded, and intelligent, but their true feelings may be closed-minded, simple, and biased. Participants may lie in this situation.
This occurs often in political polling, which may show greater support for a candidate from a minority race, gender, or political party than actually exists in the electorate.\n\nA related form of bias is called&nbsp;<strong>[pb_glossary id=\"728\"]acquiescence bias[\/pb_glossary]<\/strong>, also known as \u201cyea-saying.\u201d It occurs when people say yes to whatever the researcher asks, even when doing so contradicts previous answers. For example, a person might say yes to both \u201cI am a confident leader in group discussions\u201d and \u201cI feel anxious interacting in group discussions.\u201d Those two responses are unlikely to both be true for the same person. Why would someone do this? Similar to social desirability, people want to be agreeable and nice to the researcher asking them questions, or they might ignore contradictory feelings when responding to each question. You could interpret this as someone saying \"yeah, I guess.\" Respondents may also act for cultural reasons, trying to \u201csave face\u201d for themselves or the person asking the questions. Regardless of the reason, the results of your measure don\u2019t match what the person truly feels.\n\nSo far, we have discussed sources of error that come from choices made by respondents or researchers. Systematic errors will result in responses that are incorrect in one direction or another. For example, social desirability bias usually means that the number of people who <em>say<\/em>&nbsp;they will vote for a third party in an election is greater than the number of people who actually vote for that party. Systematic errors such as these can be reduced, but random error can never be eliminated. Unlike systematic error, which biases responses consistently in one direction or another,&nbsp;<strong>[pb_glossary id=\"378\"]random error[\/pb_glossary]<\/strong>&nbsp;is unpredictable and does not result in scores that are consistently higher or lower on a given measure.
Instead, random error is more like statistical noise, which will likely average out across participants.\n\nRandom error is present in any measurement. If you\u2019ve ever stepped on a bathroom scale twice and gotten two slightly different results, maybe a difference of a tenth of a pound, then you\u2019ve experienced random error. Maybe you were standing slightly differently or had a fraction of your foot off of the scale the first time. If you were to take enough measures of your weight on the same scale, you\u2019d be able to figure out your true weight. In social science, if you gave someone a scale measuring motivation on a day after they lost their job, they would likely score differently than if they had just gotten a promotion and a raise. Thus, social scientists speak with humility about our measures. We are reasonably confident that what we found is true, but we must always acknowledge that our measures are only an approximation of reality.\n\nHumility is important in scientific measurement, as errors can have real consequences. At the time I'm writing this, I tested positive for COVID. Like most people, I used a home test from the pharmacy. If the test said I was positive when I was not, that would be a <strong>[pb_glossary id=\"381\"]false positive[\/pb_glossary]<\/strong>. On the other hand, if the test indicated that I was not positive when I was in fact ill, that would be a&nbsp;<strong>[pb_glossary id=\"380\"]false negative[\/pb_glossary]<\/strong>. Even if the test is 99% accurate, that means that one in a hundred testers will get an erroneous result when they use the test. For me, a false negative would have been a relief, then devastating when I found out I was ill. A false positive would have been worrisome at first and then quite a relief when I discovered I wasn't sick with COVID.
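The arithmetic behind that 99% figure is worth sketching. In the toy calculation below, the 5% prevalence figure is an assumption for illustration only, not a real COVID statistic:

```python
# Hypothetical: 100,000 home tests, each with a 1% chance of being wrong,
# and an assumed 5% of testers actually infected
tests = 100_000
prevalence = 0.05
error_rate = 0.01

infected = tests * prevalence            # 5,000 truly positive people
healthy = tests - infected               # 95,000 truly negative people
false_negatives = infected * error_rate  # sick, but the test says negative
false_positives = healthy * error_rate   # well, but the test says positive

print(false_negatives, false_positives)  # 50.0 950.0
```

Notice that with identical error rates, far more false positives than false negatives occur simply because most testers in this scenario are healthy: the absolute number of each kind of error depends on who is actually being measured, not just on the instrument's accuracy.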
While both false positives and false negatives are not very likely for home COVID tests (when taken correctly), measurement error can have consequences for the people being measured.\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n \t<li>Reliability is a matter of consistency.<\/li>\n \t<li>Validity is a matter of accuracy.<\/li>\n \t<li>There are many types of validity and reliability.<\/li>\n \t<li>Systematic error may arise from the researcher, participant, or measurement instrument.<\/li>\n \t<li>Systematic error biases results in a particular direction, whereas random error can be in any direction.<\/li>\n \t<li>All measures are prone to error and should be interpreted with humility.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\nUse the measurement tools you located in the previous exercise. Evaluate the reliability and validity of these tools. Hint: You will need to go into the literature to \"research\" these tools.\n<ul>\n \t<li>Provide a clear statement regarding the reliability and validity of these tools. What strengths did you notice? What were the limitations?<\/li>\n \t<li>Think about your [pb_glossary id=\"621\"]<strong>target population<\/strong>[\/pb_glossary]. Are there changes that need to be made in order for one of these tools to be appropriate for your population?<\/li>\n \t<li>If you decide to create your own tool, how will you assess its validity and reliability?<a id=\"11.4\"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.4 Ethical and social justice considerations<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\nLearners will be able to...\n<ul>\n \t<li>Identify potential cultural, ethical, and social justice issues in measurement.<\/li>\n<\/ul>\n<\/div>\nWith your variables operationalized, it's time to take a step back and look at how measurement in social science impacts our daily lives.
As we will see, how we measure things is shaped by power arrangements inside our society; more insidiously, by establishing what is scientifically true, measures have their own power to influence the world. Just like reification in the conceptual world, how we operationally define concepts can reinforce or fight against oppressive forces.\n\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-4181\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/mitchell-griest-ImgBdiGAl4c-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"781\">\n<h2>Data equity<\/h2>\nHow we decide to measure our variables determines what kind of data we end up with in our research project. Because scientific processes are a part of our sociocultural context, the same biases and oppressions we see in the real world can be manifested or even magnified in research data. Jagadish and colleagues (2021)[footnote]Jagadish, H. V., Stoyanovich, J., &amp; Howe, B. (2021). COVID-19 Brings Data Equity Challenges to the Fore. <i>Digital Government: Research and Practice<\/i>,&nbsp;<i>2<\/i>(2), 1-7.[\/footnote] present four dimensions of data equity that are relevant to consider: in representation of non-dominant groups within data sets; in how data are collected, analyzed, and combined across datasets; in equitable and participatory access to data; and finally in the outcomes associated with the data collection. Historically, we have mostly focused on measures producing outcomes that are biased in one way or another, and this section reviews many such examples.
However, it is important to note that equity must also come from designing measures that respond to questions like:\n<ol>\n \t<li>Are groups historically suppressed from the data record represented in the sample?<\/li>\n \t<li>Are equity data gathered by researchers and used to uncover and quantify inequity?<\/li>\n \t<li>Are the data accessible across domains and levels of expertise, and can community members participate in the design, collection, and analysis of the public data record?<\/li>\n \t<li>Are the data collected used to monitor and mitigate inequitable impacts?<\/li>\n<\/ol>\nSo, it's not just about whether measures work for one population or another. Data equity is about the entire context in which data are created, including how we measure people and things. We agree with these authors that data equity should be considered within the context of automated decision-making systems and in light of a broader literature on the role of administrative systems in creating and reinforcing discrimination. To combat the inequitable processes and outcomes we describe below, researchers must foreground equity as a core component of measurement.\n<h2>Flawed measures &amp; missing measures<\/h2>\nAt the end of every semester, students in just about every university classroom in North America complete similar student evaluations of teaching (SETs). Since every student is likely familiar with these, we can recognize many of the concepts we discussed in the previous sections. There are a number of rating scale questions that ask you to rate the professor, class, and teaching effectiveness on a scale of 1-5. Scores are averaged across students and used to determine the quality of teaching delivered by the faculty member. SETs scores are often a principal component of how faculty are reappointed to teaching positions. Would it surprise you to learn that student evaluations of teaching are of questionable quality?
If your instructors are assessed with a biased or incomplete measure, how might that impact your education?\n\nMost often, student scores are averaged across questions and reported as a final average. This average is used as one factor, often the most important factor, in a faculty member's reappointment to teaching roles. We learned in this chapter that rating scales are ordinal, not interval or ratio, and the data are categories not numbers. Although rating scales use a familiar 1-5 scale, the numbers 1, 2, 3, 4, &amp; 5 are really just helpful labels for categories like \"excellent\" or \"strongly agree.\" If we relabeled these categories as letters (A-E) rather than as numbers (1-5), how would you average them?\n\nAveraging ordinal data is methodologically dubious, as the numbers are merely a useful convention. As you will learn in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/14-univariate-analysis\/\">Chapter 14<\/a>, taking the [pb_glossary id=\"393\"]<strong>median<\/strong>[\/pb_glossary] value is what makes the most sense with ordinal data. Median values are also less sensitive to outliers. So, a single student who has strong negative or positive feelings towards the professor could bias the class's SETs scores higher or lower than what the \"average\" student in the class would say, particularly for classes with few students or in which fewer students completed evaluations of their teachers.\n\nWe care about teaching quality because more effective teachers will produce more knowledgeable and capable students. However, student evaluations of teaching are not particularly good indicators of teaching quality and are not associated with the independently measured learning gains of students (i.e., test scores, final grades) (Uttl et al., 2017).[footnote]Uttl, B., White, C. A., &amp; Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. 
<i>Studies in Educational Evaluation<\/i>,&nbsp;<i>54<\/i>, 22-42.[\/footnote] This speaks to the lack of criterion validity. Higher teaching quality should be associated with better learning outcomes for students, but across multiple studies stretching back years, there is no association that cannot be better explained by other factors. To be fair, there are scholars who find that SETs are valid and reliable. For a thorough <a href=\"https:\/\/www.academia.edu\/31896041\/Student_Ratings_of_Instruction_in_College_and_University_Courses\">defense of SETs as well as a historical summary of the literature<\/a> see Benton &amp; Cashin (2012).[footnote]Benton, S. L., &amp; Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In <i>Higher education: Handbook of theory and research<\/i>&nbsp;(pp. 279-326). Springer, Dordrecht.[\/footnote]\n\nEven though student evaluations of teaching often contain dozens of questions, researchers often find that the questions are so highly interrelated that one concept (or factor, as it is called in a <a href=\"https:\/\/stats.idre.ucla.edu\/spss\/seminars\/introduction-to-factor-analysis\/a-practical-introduction-to-factor-analysis\/\">factor analysis<\/a>) explains a large portion of the variance in teachers' scores on student evaluations (Clayson, 2018).[footnote]Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability.&nbsp;<i>Assessment &amp; Evaluation in Higher Education<\/i>,&nbsp;<i>43<\/i>(4), 666-681.[\/footnote] Personally, I believe based on completing SETs myself that factor is probably best conceptualized as student satisfaction, which is obviously worthwhile to measure, but is conceptually quite different from teaching effectiveness or whether a course achieved its intended outcomes. The lack of a clear operational and conceptual definition for the variable or variables being measured in student evaluations of teaching also speaks to a lack of content validity. 
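Returning to the averaging problem raised above: a toy example (hypothetical ratings, not real SET data) shows how a single extreme rating drags the mean while leaving the median, the more defensible summary for ordinal data, untouched.

```python
from statistics import mean, median

# Hypothetical 1-5 SET ratings from a small class, with one outlier
ratings = [5, 5, 4, 5, 4, 1]
print(mean(ratings))    # 4.0 -- pulled down by the single 1
print(median(ratings))  # 4.5 -- closer to what the typical student reported
```

Of course, a better summary statistic cannot fix the deeper problem of an unclear construct, which is where content validity comes in.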
Researchers check content validity by comparing the measurement method with the conceptual definition, but without a clear conceptual definition of the concept measured by student evaluations of teaching, it's not clear how we can know our measure is valid. Indeed, the lack of clarity around what is being measured in teaching evaluations impairs students' ability to provide reliable and valid evaluations. So, while many researchers argue that the class average SETs scores are reliable in that they are consistent over time and across classes, it is unclear what exactly is being measured even if it is consistent (Clayson, 2018).[footnote]Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. <i>Assessment &amp; Evaluation in Higher Education<\/i>,&nbsp;<i>43<\/i>(4), 666-681.[\/footnote]\n\nAs a faculty member, there are a number of things I can do to influence my evaluations and disrupt validity and reliability. Since SETs scores are associated with the grades students perceive they will receive (e.g., Boring et al., 2016),[footnote]Boring, A., Ottoboni, K., &amp; Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness.&nbsp;<i>ScienceOpen Research<\/i>.[\/footnote] guaranteeing everyone a final grade of A in my class will likely increase my SETs scores and my chances at tenure and promotion. I could time an email reminder to complete SETs with releasing high grades for a major assignment to boost my evaluation scores. On the other hand, student evaluations might be coincidentally timed with poor grades or difficult assignments that will bias student evaluations downward. Students may also infer I am manipulating them and give me lower SET scores as a result. To maximize my SET scores and chances at promotion, I also need to select which courses I teach carefully.
Classes that are more quantitatively oriented generally receive lower ratings than more qualitative and humanities-driven classes, which makes my decision to teach social work research a poor strategy (Uttl &amp; Smibert, 2017).[footnote]Uttl, B., &amp; Smibert, D. (2017). Student evaluations of teaching: teaching quantitative courses can be hazardous to one\u2019s career. <i>PeerJ<\/i>,&nbsp;<i>5<\/i>, e3299.[\/footnote] The only manipulative strategy I will admit to using is bringing food (usually cookies or donuts) to class during the period in which students are completing evaluations. <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/29956364\/\">Measurement is impacted by context<\/a>&nbsp;(cookies get me better scores!).\n\nAs a white cis-gender male educator, I am adversely impacted by SETs because of their sketchy validity, reliability, and methodology. The other flaws with student evaluations actually help me while disadvantaging teachers from oppressed groups. <a href=\"https:\/\/www.researchgate.net\/profile\/Troy-Heffernan\/publication\/349864729_Sexism_racism_prejudice_and_bias_a_literature_review_and_synthesis_of_research_surrounding_student_evaluations_of_courses_and_teaching\/links\/6046e75492851c077f27d53f\/Sexism-racism-prejudice-and-bias-a-literature-review-and-synthesis-of-research-surrounding-student-evaluations-of-courses-and-teaching.pdf\">Heffernan (2021)<\/a>[footnote]Heffernan, T. (2021).
Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching.&nbsp;<i>Assessment &amp; Evaluation in Higher Education<\/i>, 1-11.[\/footnote] provides a comprehensive overview of the sexism, racism, ableism, and prejudice baked into student evaluations:\n<blockquote>\"In all studies relating to gender, the analyses indicate that the highest scores are awarded in subjects filled with young, white, male students being taught by white English first language speaking, able-bodied, male academics who are neither too young nor too old (approx. 35\u201350 years of age), and who the students believe are heterosexual. Most deviations from this scenario in terms of student and academic demographics equates to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual, men of a certain age are not only the least affected, they benefit from the practice. When every demographic group who does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance the position of the already privileged\" (p. 5).<\/blockquote>\nThe staggering consistency of studies examining prejudice in SETs has led to some rather superficial reforms like reminding students to not submit racist or sexist responses in the written instructions given before SETs. Yet, even though we know that SETs are systematically biased against women, people of color, and people with disabilities, the overwhelming majority of universities in North America continue to use them to evaluate faculty for promotion or reappointment. From a critical perspective, it is worth considering why university administrators continue to use such a biased and flawed instrument. SETs produce data that make it easy to compare faculty to one another and track faculty members over time. 
Furthermore, they offer students a direct opportunity to voice their concerns and highlight what went well.\n\nBecause students are the people with the greatest knowledge about what happened in the classroom and whether it met their expectations, open-ended questions are the most productive part of SETs. There is very rarely student input on the criteria and methodology for teaching evaluations, yet students are the most impacted by helpful or harmful teaching practices.\n\nStudents should fight for better assessment in the classroom because well-designed assessments provide documentation to support more effective teaching practices and discourage unhelpful or discriminatory practices. Flawed assessments like SETs can lead to a lack of information about problems with courses, instructors, or other aspects of the program. Think critically about what data your program uses to gauge its effectiveness. How might you introduce areas of student concern into how your program evaluates itself? Are there issues with food or housing insecurity, mentorship of nontraditional and first generation students, or other issues that faculty should consider when they evaluate their program? Finally, as you transition into practice, think about how your school measures its impact and how it privileges or excludes student, parent, and community voices in the assessment process.\n<div class=\"textbox\">\n\nWhile writing this section, one of the authors wrote this <a href=\"https:\/\/osf.io\/preprints\/socarxiv\/bgk6n\/\">commentary article<\/a> addressing potential racial bias in social work licensing exams.
If you are interested in an example of missing or flawed measures that relates to systems <em>your<\/em> social work practice is governed by (rather than SETs, which govern <i>our<\/i> practice in higher education), check it out!\n\nYou may also be interested in similar <a href=\"https:\/\/www.jessestommel.com\/ungrading-an-faq\/\">arguments against the standard grading scale<\/a> (A-F), and why grades (numerical, letter, etc.) do not do a good job of measuring learning. Think critically about the role that grades play in your life as a student, your self-concept, and your relationships with teachers. Your test and grade anxiety is due in part to how your learning is measured. Those measurements end up becoming an official record of your scholarship and allow employers or funders to compare you to other scholars. The stakes for measurement are the same for participants in your research study.\n\n<\/div>\n&nbsp;\n\n<img class=\"aligncenter size-large wp-image-130\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-1024x634.png\" alt=\"\" width=\"1024\" height=\"634\">\n<h2>Self-reflection and measurement<\/h2>\nStudent evaluations of teaching are just like any other measure. How we decide to measure what we are researching is influenced by our backgrounds, including our culture, implicit biases, and individual experiences. For me as a middle-class, cisgender white man, the decisions I make about measurement will probably default to ones that make the most sense to me and others like me, and thus measure characteristics about us most accurately if I don't think carefully about it. There are major implications for research here because this could affect the validity of my measurements for other populations.\n\nThis doesn't mean that standardized scales or indices, for instance, won't work for diverse groups of people. 
What it means is that researchers must not ignore difference in deciding how to measure a variable in their research. Doing so may serve to push already marginalized people further into the margins of academic research and, consequently, social work intervention. Social work researchers, with our strong orientation toward celebrating difference and working for social justice, are obligated to keep this in mind for ourselves and encourage others to think about it in their research, too.\n\nThis involves reflecting on <em>what<\/em> we are measuring, <em>how<\/em> we are measuring, and <em>why<\/em> we are measuring. Do we have biases that impacted how we operationalized our concepts? Did we include [pb_glossary id=\"308\"]<strong>stakeholders<\/strong>[\/pb_glossary] and [pb_glossary id=\"285\"]<strong>gatekeepers<\/strong>[\/pb_glossary] in the development of our concepts? This can be a way to gain access to vulnerable populations. What feedback did we receive on our measurement process and how was it incorporated into our work? These are all questions we should ask as we are thinking about measurement. Further, engaging in this intentionally reflective process will help us maximize the chances that our measurement will be accurate and as free from bias as possible.\n\nUnfortunately, social science researchers do not do a great job of sharing their measures in a way that allows practitioners and administrators to use them to evaluate the impact of interventions and programs on clients. Few scales are published under an open copyright license that allows other people to view it for free and share it with others. Instead, the best way to find a scale mentioned in an article is often to simply search for it in Google with \".pdf\" or \".docx\" in the query to see if someone posted a copy online (usually in violation of copyright law). 
As we discussed in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/4-critical-information-literacy\/\">Chapter 4<\/a>, this is an issue of information privilege, or the structuring impact of oppression and discrimination on groups' access to and use of scholarly information. As a student at a university with a research library, you can access the Mental Measurements Yearbook to look up scales and indexes that measure client or program outcomes, while researchers unaffiliated with university libraries cannot do so. Similarly, the vast majority of scholarship in social work and allied disciplines does not share measures, data, or other research materials openly, which is a best practice in open and collaborative science. It is important to underscore these structural barriers to using valid and reliable scales in practice. An invalid or unreliable outcome test may cause ineffective or harmful programs to persist or may worsen existing prejudices and oppressions experienced by students, communities, and practitioners.\n\nBut it's not just about reflecting and identifying problems and biases in our measurement, operationalization, and conceptualization\u2014what are we going to&nbsp;<em>do<\/em> about it? Consider this as you move through this book and become a more critical consumer of research. Sometimes there isn't something you can do in the immediate sense\u2014the literature base at this moment just is what it is. But how does that inform what you will do later?\n<h2>A place to start: Stop oversimplifying race<\/h2>\n<span style=\"text-align: initial;background-color: initial;font-size: 1em\">We will address many more of the critical issues related to measurement in the next chapter. One way to get started in bringing cultural awareness to scientific measurement is through a critical examination of how we analyze race quantitatively. There are many important methodological objections to how we measure the impact of race. We encourage you to watch Dr. 
Abigail Sewell's three-part workshop series called \"Nested Models for Critical Studies of Race &amp; Racism\" for the Inter-university Consortium for Political and Social Research (ICPSR). She discusses how to operationalize and measure inequality, racism, and intersectionality and critiques researchers' attempts to oversimplify or overlook racism when we measure concepts in social science. If you are interested in developing your social work research skills further, consider applying for financial support from your university to attend an ICPSR summer seminar like Dr. Sewell's where you can receive more advanced and specialized training in using research for social change. <\/span>\n<ul>\n \t<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/youtu.be\/04OZ3BFPpVg\">Part 1: Creating Measures of Supraindividual Racism<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n \t<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/youtu.be\/pfcKQ_7O9FE\">Part 2: Evaluating Population Risks of Supraindividual Racism<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n \t<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/www.youtube.com\/watch?v=4OZL7fu2YkI\">Part 3: Quantifying Intersectionality<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n<\/ul>\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n \t<li>Researchers must be attentive to personal and institutional biases in the measurement process that affect marginalized groups.<\/li>\n \t<li>What is measured and how it is measured is shaped by power, and educators must be critical and self-reflective in their research projects.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox 
">
exercises\">\n<h3>Exercises<\/h3>\nThink about your current research question and the tool(s) that you will use to gather data. Even if you haven't chosen your tools yet, think of some that you have encountered in the literature so far.\n<ul>\n \t<li>How do your positionality and experience shape what variables you are choosing to measure and how you measure them?<\/li>\n \t<li>Evaluate the measures in your study for potential biases.<\/li>\n \t<li>If you are using measures developed by another researcher, investigate whether they are valid and reliable in other studies across cultures.<\/li>\n<\/ul>\n<\/div>\n","rendered":"<div class=\"textbox examples\">\n<h3>Chapter Outline<\/h3>\n<ol>\n<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.1\">Conceptual definitions<\/a> (17 minute read)<\/li>\n<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.2\">Operational definitions<\/a> (36 minute read)<\/li>\n<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.3\">Measurement quality<\/a> (21 minute read)<\/li>\n<li><a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/11-quantitative-measurement\/#11.4\">Ethical and social justice considerations<\/a> (15 minute read)<\/li>\n<\/ol>\n<p>Content warning: examples in this chapter contain references to ethnocentrism, toxic masculinity, racism in science, drug use, mental health and depression, psychiatric inpatient care, poverty and basic needs insecurity, pregnancy, and racism and sexism in the workplace and higher education.<a id=\"11.1\"><\/a><\/p>\n<\/div>\n<h1>11.1 Conceptual definitions<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<p>Learners will be able to&#8230;<\/p>\n<ul>\n<li>Define measurement and conceptualization<\/li>\n<li>Apply Kaplan\u2019s three categories to determine the complexity of measuring a given variable<\/li>\n<li>Identify the role previous research and 
theory play in defining concepts<\/li>\n<li>Distinguish between unidimensional and multidimensional concepts<\/li>\n<li>Critically apply reification to how you conceptualize the key variables in your research project<\/li>\n<\/ul>\n<\/div>\n<p>In social science, when we use the term&nbsp;<a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_585\"><strong>measurement<\/strong><\/a>, we mean the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating. At its core, measurement is about defining one\u2019s terms in as clear and precise a way as possible. Of course, measurement in social science isn\u2019t quite as simple as using a measuring cup or spoon, but there are some basic tenets on which most social scientists agree when it comes to measurement. We\u2019ll explore those, as well as some of the ways that measurement might vary depending on your unique approach to the study of your topic.<\/p>\n<p>An important point here is that measurement does not require any particular instruments or procedures. What it does require is a <em>systematic procedure<\/em> for assigning scores, meanings, and descriptions to individuals or objects so that those scores represent the characteristic of interest. You can measure phenomena in many different ways, but you must be sure that how you choose to measure gives you information and data that lets you answer your research question. If you&#8217;re looking for information about a person&#8217;s income, but your main points of measurement have to do with the money they have in the bank, you&#8217;re not really going to find the information you&#8217;re looking for!<\/p>\n<p>The question of what social scientists measure can be answered by asking yourself what social scientists study. 
Think about the topics you\u2019ve learned about in other classes you\u2019ve taken or the topics you\u2019ve considered investigating yourself. Let\u2019s consider Melissa Milkie and Catharine Warner\u2019s study (2011)<a class=\"footnote\" title=\"Milkie, M. A., &amp; Warner, C. H. (2011). Classroom learning environments and the mental health of first grade children. Journal of Health and Social Behavior, 52, 4\u201322\" id=\"return-footnote-131-1\" href=\"#footnote-131-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a> of first graders\u2019 mental health. In order to conduct that study, Milkie and Warner needed to have some idea about how they were going to measure mental health. What does mental health mean, exactly? And how do we know when we\u2019re observing someone whose mental health is good and when we see someone whose mental health is compromised? Understanding how measurement works in research methods helps us answer these sorts of questions.<\/p>\n<p>As you might have guessed, social scientists will measure just about anything that they have an interest in investigating. For example, those who are interested in learning something about the correlation between social class and levels of happiness must develop some way to measure both social class and happiness. Those who wish to understand how well immigrants cope in their new locations must measure immigrant status and coping. Those who wish to understand how a person\u2019s gender shapes their learning experiences must measure gender and learning experiences (and get more specific about which experiences are under examination). You get the idea. Social scientists can and do measure just about anything you can imagine observing or wanting to study. 
Of course, some things are easier to observe or measure than others.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4171\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/jose-martin-ramirez-carrasco-z2tinW7Z6Bw-unsplash-scaled-1.jpg\" alt=\"\" width=\"500\" height=\"750\" \/><\/p>\n<h2>Observing your variables<\/h2>\n<p>In 1964, philosopher Abraham Kaplan<a class=\"footnote\" title=\"Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.\" id=\"return-footnote-131-2\" href=\"#footnote-131-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> wrote <em>The<\/em>&nbsp;<em>Conduct of Inquiry,&nbsp;<\/em>which has since become a classic work in research methodology (Babbie, 2010).<a class=\"footnote\" title=\"Earl Babbie offers a more detailed discussion of Kaplan\u2019s work in his text. You can read it in: Babbie, E. (2010). The practice of social research (12th ed.). Belmont, CA: Wadsworth.\" id=\"return-footnote-131-3\" href=\"#footnote-131-3\" aria-label=\"Footnote 3\"><sup class=\"footnote\">[3]<\/sup><\/a> In his text, Kaplan describes different categories of things that behavioral scientists observe. One of those categories, which Kaplan called \u201cobservational terms,\u201d is probably the simplest to measure in social science. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_628\">Observational terms<\/a><\/strong> are the sorts of things that we can see with the naked eye simply by looking at them. Kaplan roughly defines them as conditions that are easy to identify and verify through direct observation. 
If, for example, we wanted to know how the conditions of playgrounds differ across different neighborhoods, we could directly observe the variety, amount, and condition of equipment at various playgrounds.<\/p>\n<p><strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_641\">Indirect observables<\/a><\/strong>, on the other hand, are less straightforward to assess. In Kaplan&#8217;s framework, they are conditions so subtle and complex that we must use existing knowledge and intuition to define them. If we conducted a study for which we wished to know a person\u2019s income, we\u2019d probably have to ask them their income, perhaps in an interview or a survey. Thus, we have observed income, even if it has only been observed indirectly. Birthplace might be another indirect observable. We can ask study participants where they were born, but chances are good we won\u2019t have directly observed any of those people being born in the locations they report.<\/p>\n<p>Sometimes the measures that we are interested in are more complex and more abstract than observational terms or indirect observables. Think about some of the concepts you\u2019ve learned about in other classes\u2014for example, ethnocentrism. What is ethnocentrism? Well, from completing an earlier class you might know that it has something to do with the way a person judges another\u2019s culture. But how would you&nbsp;<em>measure&nbsp;<\/em>it? Here\u2019s another construct: bureaucracy. We know this term has something to do with organizations and how they operate, but measuring such a construct is trickier than measuring something like a person\u2019s income. The theoretical concepts of ethnocentrism and bureaucracy represent ideas whose meanings we have come to agree on. 
Though we may not be able to observe these abstractions directly, we can observe their components.<\/p>\n<p>Kaplan referred to these more abstract things that behavioral scientists measure as constructs.&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_663\">Constructs<\/a><\/strong>&nbsp;are \u201cnot observational either directly or indirectly\u201d (Kaplan, 1964, p. 55),<a class=\"footnote\" title=\"Kaplan, A. (1964). The conduct of inquiry: Methodology for behavioral science. San Francisco, CA: Chandler Publishing Company.\" id=\"return-footnote-131-4\" href=\"#footnote-131-4\" aria-label=\"Footnote 4\"><sup class=\"footnote\">[4]<\/sup><\/a> but they can be defined based on observables. For example, the construct of bureaucracy could be measured by counting the number of supervisors that need to approve teacher reimbursements of routine personal spending on their classrooms. The greater the number of administrators that must sign off on routine matters, the greater the degree of bureaucracy. Similarly, we might be able to ask a person the degree to which they trust people from different cultures around the world and then assess the ethnocentrism inherent in their answers. We can measure constructs like bureaucracy and ethnocentrism by defining them in terms of what we can observe.<a class=\"footnote\" title=\"In this chapter, we will use the terms concept and construct interchangeably. While each term has a distinct meaning in research conceptualization, we do not believe this distinction is important enough to warrant discussion in this chapter.\" id=\"return-footnote-131-5\" href=\"#footnote-131-5\" aria-label=\"Footnote 5\"><sup class=\"footnote\">[5]<\/sup><\/a><\/p>\n<p>The idea of coming up with your own measurement tool might sound pretty intimidating at this point. 
The good news is that if you find something in the literature that works for you, you can use it (with proper attribution, of course). If there are only pieces of it that you like, you can reuse those pieces (with proper attribution and describing\/justifying any changes). You don&#8217;t always have to start from scratch! Indeed, I would encourage you <em>not<\/em> to start from scratch.<\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Look at the variables in your research question.<\/p>\n<ul>\n<li>Classify them as direct observables, indirect observables, or constructs.<\/li>\n<li>Do you think measuring them will be easy or hard?<\/li>\n<li>What are your first thoughts about how to measure each variable? No wrong answers here, just write down a thought about each variable.<\/li>\n<\/ul>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-4172\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/simone-pellegrini-L3QG_OBluT0-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\" \/><\/p>\n<h2>Measurement starts with conceptualization<\/h2>\n<p>In order to measure the concepts in your research question, we first have to understand what we think about them. As an aside, the word <em>concept&nbsp;<\/em>has come up quite a bit, and it is important to be sure we have a shared understanding of that term. A&nbsp;<a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_718\"><strong>concept<\/strong><\/a> is the notion or image that we conjure up when we think of some cluster of related observations or ideas. For example, masculinity is a concept. What do you think of when you hear that word? Presumably, you imagine some set of behaviors and perhaps even a particular style of self-presentation. 
Of course, we can\u2019t necessarily assume that everyone conjures up the same set of ideas or images when they hear the word&nbsp;<em>masculinity<\/em>. While there are many possible ways to define the term and some may be more common or have more support than others, there is no universal definition of masculinity. What counts as masculine may shift over time, from culture to culture, and even from individual to individual (Kimmel, 2008). This is why defining our concepts is so important.<\/p>\n<p><span style=\"text-align: initial\"><span style=\"font-size: 1em\">Not all researchers clearly explain their theoretical or conceptual framework for their study, but they should! Without understanding how a researcher has defined their key concepts, it would be nearly impossible to understand the meaning of that researcher\u2019s findings and conclusions. Back in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/7-theory-and-paradigm\/\">Chapter 7<\/a>, you developed a theoretical framework for your study based on a survey of the theoretical literature in your topic area. If you haven&#8217;t done that yet, consider flipping back to that section to familiarize yourself with some of the techniques for finding and using theories relevant to your research question. Continuing with our example on masculinity, we would need to survey the literature on theories of masculinity. After a few queries on masculinity, I found a wonderful article by Wong (2010)<a class=\"footnote\" title=\"Wong, Y. J., Steinfeldt, J. A., Speight, Q. L., &amp; Hickman, S. J. (2010). 
Content analysis of Psychology of men &amp; masculinity (2000\u20132008).\u00a0Psychology of Men &amp; Masculinity,\u00a011(3), 170.\" id=\"return-footnote-131-6\" href=\"#footnote-131-6\" aria-label=\"Footnote 6\"><sup class=\"footnote\">[6]<\/sup><\/a> that analyzed eight years of the journal <em>Psychology of Men&nbsp;&amp; Masculinity<\/em> and examined <a href=\"https:\/\/www.researchgate.net\/profile\/Y-Joel-Wong\/publication\/232438006_Content_Analysis_of_Psychology_of_Men_Masculinity_2000-2008\/links\/565e3f8008aefe619b2705d3\/Content-Analysis-of-Psychology-of-Men-Masculinity-2000-2008.pdf\">how often different theories of masculinity were used<\/a>. Not only can I get a sense of which theories are more accepted and which are more marginal in the social science on masculinity, but I can also identify a range of options from which I can find the theory or theories that will inform my project.&nbsp;<\/span><\/span><\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Identify a specific theory (or more than one theory) and how it helps you understand&#8230;<\/p>\n<ul>\n<li>Your independent variable(s).<\/li>\n<li>Your dependent variable(s).<\/li>\n<li>The relationship between your independent and dependent variables.<\/li>\n<\/ul>\n<p>Rather than completing this exercise from scratch, build from your theoretical or conceptual framework developed in previous chapters.<\/p>\n<\/div>\n<p>In quantitative methods, <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_366\">conceptualization<\/a><\/strong> involves writing out clear, concise definitions for our key concepts. These are the kind of definitions you are used to, like the ones in a dictionary. A conceptual definition involves defining a concept in terms of other concepts, usually by making reference to how other social scientists and theorists have defined those concepts in the past. 
Of course, new conceptual definitions are created all the time because our conceptual understanding of the world is always evolving.<\/p>\n<p>Conceptualization is deceptively challenging\u2014spelling out exactly what the concepts in your research question mean to you. Following along with our example, think about what comes to mind when you read the term masculinity. How do you know masculinity when you see it? Does it have something to do with men or with social norms? If so, perhaps we could define masculinity as the social norms that men are expected to follow. That seems like a reasonable start, and at this early stage of conceptualization, brainstorming about the images conjured up by concepts and playing around with possible definitions is appropriate. Doing so can also be used as a tool to explore your own personal biases and assumptions&#8211;something that can help you limit their influence on your work in ways that could corrupt your findings down the line. However, this reflective engagement is just the first step. At this point, you should be moving beyond brainstorming for your key variables because you have read a good amount of research about them.<\/p>\n<p>In addition, we should consult previous research and theory to understand the definitions that other scholars have already given for the concepts we are interested in. This doesn\u2019t mean we must use their definitions, but understanding how concepts have been defined in the past will help us to compare our conceptualizations with how other scholars define and relate concepts. Understanding prior definitions of our key concepts will also help us decide whether we plan to challenge those conceptualizations or rely on them for our own work. Finally, working on conceptualization is likely to help in the process of refining your research question to one that is specific and clear in what it asks. 
Conceptualization and operationalization (next section) are where &#8220;the rubber meets the road,&#8221; so to speak, and you have to specify what you mean by the question you are asking. As your conceptualization deepens, you will often find that your research question becomes more specific and clear.<\/p>\n<p>If we turn to the literature on masculinity, we will surely come across work by <a href=\"https:\/\/www.youtube.com\/watch?v=wnLmKmTdAgM\">Michael Kimmel<\/a>, one of the preeminent masculinity scholars in the United States. After consulting Kimmel\u2019s prior work (2000; 2008),<a class=\"footnote\" title=\"Kimmel, M. (2000).\u00a0The\u00a0gendered society. New York, NY: Oxford University Press; Kimmel, M. (2008). Masculinity. In W. A. Darity Jr. (Ed.),\u00a0International\u00a0encyclopedia of the social sciences\u00a0(2nd ed., Vol. 5, p. 1\u20135). Detroit, MI: Macmillan Reference USA\" id=\"return-footnote-131-7\" href=\"#footnote-131-7\" aria-label=\"Footnote 7\"><sup class=\"footnote\">[7]<\/sup><\/a> we might tweak our initial definition of masculinity. Rather than defining masculinity as \u201cthe social norms that men are expected to follow,\u201d perhaps instead we\u2019ll define it as \u201cthe social roles, behaviors, and meanings prescribed for men in any given society at any one time\u201d (Kimmel &amp; Aronson, 2004, p. 503).<a class=\"footnote\" title=\"Kimmel, M. &amp; Aronson, A. B. (2004).\u00a0Men and masculinities: A-J. Denver, CO: ABL-CLIO.\" id=\"return-footnote-131-8\" href=\"#footnote-131-8\" aria-label=\"Footnote 8\"><sup class=\"footnote\">[8]<\/sup><\/a> Our revised definition is more precise and complex because it goes beyond addressing one aspect of men\u2019s lives (norms), and addresses three aspects: roles, behaviors, and meanings. It also implies that roles, behaviors, and meanings may vary across societies and over time. 
Using definitions developed by theorists and scholars is a good idea, though you may find that you want to define things your own way.<\/p>\n<p>As you can see, conceptualization isn\u2019t as simple as applying any random definition that we come up with to a term. For example, note the difference between the research-based definition of masculinity and a basic dictionary <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/masculinity\">definition<\/a>: &#8220;the quality or nature of the male sex: the quality, state, or degree of being masculine or manly.&#8221;<\/p>\n<p>Defining our terms may involve some brainstorming at the very beginning. But conceptualization must go beyond that, to engage with or critique existing definitions and conceptualizations in the literature. Once we\u2019ve brainstormed about the images associated with a particular word, we should also consult prior work to understand how others define the term in question. After we\u2019ve identified a clear definition that we\u2019re happy with, we should make sure that every term used in our definition will make sense to others. Are there terms used within our definition that also need to be defined? If so, our conceptualization is not yet complete. Our definition includes the concept of &#8220;social roles,&#8221; so we should have a definition for what those mean and become familiar with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Role_theory\">role theory<\/a> to help us with our conceptualization. If we don&#8217;t know what roles are, how can we study them?<\/p>\n<p>Let&#8217;s say we do all of that. We have a clear definition of the term <em>masculinity<\/em> with reference to previous literature and we also have a good understanding of the terms in our conceptual definition&#8230;then we&#8217;re done, right? Not so fast. 
You\u2019ve likely met more than one man in your life, and you\u2019ve probably noticed that they are not the same, even if they live in the same society during the same historical time period. This could mean there are dimensions of masculinity. In terms of social scientific measurement, concepts can be said to have <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_376\">multiple dimensions<\/a><\/strong>&nbsp;when there are multiple elements that make up a single concept. With respect to the term&nbsp;<em>masculinity<\/em>, dimensions could be based on gender identity, gender performance, sexual orientation, etc. In any of these cases, the concept of masculinity would be considered to have multiple dimensions.<\/p>\n<p><span style=\"text-align: initial;font-size: 1em\">While you do not need to spell out every possible dimension of the concepts you wish to measure, it is important to identify whether your concepts are <\/span><strong style=\"text-align: initial;font-size: 1em\"><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_383\">unidimensional<\/a><\/strong><span style=\"text-align: initial;font-size: 1em\"> (and therefore relatively easy to define and measure) or multidimensional (and therefore require multi-part definitions and measures). In this way, how you conceptualize your variables determines how you will measure them in your study. Unidimensional concepts are those that are expected to have a single underlying dimension. These concepts can be measured using a single measure or test. Examples include simple concepts such as a person\u2019s weight, time spent sleeping, and so forth.&nbsp;<\/span><\/p>\n<p><span style=\"text-align: initial;font-size: 1em\">One frustrating thing is that there is no clear demarcation between concepts that are inherently unidimensional or multidimensional. 
Even something as simple as age could be broken down into multiple dimensions including mental age and chronological age, so where does conceptualization stop? How far down the dimensional rabbit hole do we have to go? Researchers should consider two things. First, how important is this variable in your study? If age is not important in your study (maybe it is a control variable), it seems like a waste of time to do a lot of work drawing from developmental theory to conceptualize this variable. A unidimensional measure from zero to dead is all the detail we need. On the other hand, if we were measuring the impact of age on masculinity, conceptualizing our independent variable (age) as multidimensional may provide a richer understanding of its impact on masculinity. Finally, your conceptualization will lead directly to your operationalization of the variable, and once your operationalization is complete, make sure someone reading your study could follow how your conceptual definitions informed the measures you chose for your variables.&nbsp;<\/span><\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Write a conceptual definition for your independent and dependent variables.<\/p>\n<ul>\n<li>Cite and attribute definitions to other scholars, if you use their words.<\/li>\n<li>Describe how your definitions are informed by your theoretical framework.<\/li>\n<li>Place your definition in conversation with other theories and conceptual definitions commonly used in the literature.<\/li>\n<li>Are there multiple dimensions of your variables?<\/li>\n<li>Are any of these dimensions important for you to measure?<\/li>\n<\/ul>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-119\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-774x1024.png\" alt=\"\" width=\"302\" height=\"400\" 
srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-774x1024.png 774w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-227x300.png 227w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-768x1017.png 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-65x86.png 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-225x298.png 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280-350x463.png 350w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5573925_1280.png 967w\" sizes=\"auto, (max-width: 302px) 100vw, 302px\" \/><\/p>\n<h2>Do researchers actually know what we&#8217;re talking about?<\/h2>\n<p>Conceptualization proceeds differently in qualitative research compared to quantitative research. Since qualitative researchers are interested in the understandings and experiences of their participants, it is less important for them to find one fixed definition for a concept before starting to interview or interact with participants. The researcher\u2019s job is to accurately and completely represent how their participants understand a concept, not to test their own definition of that concept.<\/p>\n<p>If you were conducting qualitative research on masculinity, you would likely consult previous literature like Kimmel\u2019s work mentioned above. From your literature review, you may come up with a&nbsp;<em>working definition<\/em>&nbsp;for the terms you plan to use in your study, which can change over the course of the investigation. However, the definition that matters is the definition that your participants share during data collection. 
A working definition is merely a place to start, and researchers should take care not to think it is the only or best definition out there.<\/p>\n<p>In qualitative inquiry, your participants are the experts on the concepts that arise during the study. Your job as the researcher is to accurately and reliably collect and interpret their understanding of the concepts they describe while answering your questions. Conceptualization is likely to change over the course of qualitative inquiry, as you learn more information from your participants. Indeed, getting participants to comment on, extend, or challenge the definitions and understandings of other participants is a hallmark of qualitative research. This is the opposite of quantitative research, in which definitions must be completely set in stone before the inquiry can begin.<\/p>\n<p>The contrast between qualitative and quantitative conceptualization is instructive for understanding how quantitative methods (and positivist research in general) privilege the knowledge of the researcher over the knowledge of study participants and community members. Positivism holds that the researcher is the &#8220;expert,&#8221; and can define concepts based on their expert knowledge of the scientific literature. This knowledge is in contrast to the lived experience that participants possess from experiencing the topic under examination day-in, day-out. For this reason, it would be wise to remind ourselves not to take our definitions too seriously and to be critical about the limitations of our knowledge.<\/p>\n<p>Conceptualization must be open to revisions, even radical revisions, as scientific knowledge progresses. While I\u2019ve suggested consulting prior scholarly definitions of our concepts, you should not assume that prior scholarly definitions are more real than the definitions we create. Likewise, we should not think that our own made-up definitions are any more real than any other definition. 
It would also be wrong to assume that just because definitions exist for some concept, the concept itself exists beyond some abstract idea in our heads. Building on the paradigmatic ideas behind interpretivism and the critical paradigm, researchers call the assumption that our abstract concepts exist in some concrete, tangible way <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_390\">reification<\/a><\/strong>. Examining reification exposes the power dynamics behind how we create reality by how we define it.<\/p>\n<p>Let\u2019s return again to our example of masculinity. Think about how our notions of masculinity have developed over the past few decades, and how different and yet so similar they are to patriarchal definitions throughout history. Conceptual definitions become more or less popular based on the power arrangements inside of social science and the broader world. Western knowledge systems are privileged, while others are viewed as unscientific and marginal. The historical domination of social science by white men from WEIRD countries meant that definitions of masculinity were imbued with their cultural biases and were designed, explicitly and implicitly, to preserve their power. 
This has inspired movements for <a href=\"https:\/\/www.india-seminar.com\/2009\/597\/597_shiv_visvanathan.htm\">cognitive justice<\/a> as we seek to use social science to achieve global development.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n<li>Measurement is the process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena that we are investigating.<\/li>\n<li>Kaplan identified three categories of things that social scientists measure: observational terms, indirect observables, and constructs.<\/li>\n<li>Some concepts have multiple elements or dimensions.<\/li>\n<li>Researchers often use measures previously developed and studied by other researchers.<\/li>\n<li>Conceptualization is a process that involves coming up with clear, concise definitions.<\/li>\n<li>Conceptual definitions are based on the theoretical framework you are using for your study (and the paradigmatic assumptions underlying those theories).<\/li>\n<li>Whether your conceptual definitions come from your own ideas or the literature, you should be able to situate them in terms of other commonly used conceptual definitions.<\/li>\n<li>Researchers should acknowledge the limited explanatory power of their definitions for concepts and how oppression can shape what explanations are considered true or scientific.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Think historically about the variables in your research question.<\/p>\n<ul>\n<li>How has the conceptual definition of your topic changed over time?<\/li>\n<li>What scholars or social forces were responsible for this change?<\/li>\n<\/ul>\n<p>Take a critical look at your conceptual definitions.<\/p>\n<ul>\n<li>How might participants define terms for themselves differently, based on their daily experience?<\/li>\n<li>On what cultural assumptions are your conceptual definitions based?<\/li>\n<li>Are your conceptual definitions applicable 
across all cultures that will be represented in your sample?<a id=\"11.2\"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.2 Operational definitions<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<p>Learners will be able to&#8230;<\/p>\n<ul>\n<li>Define and give an example of indicators and attributes for a variable<\/li>\n<li>Apply the three components of an operational definition to a variable<\/li>\n<li>Distinguish between levels of measurement for a variable and how those differences relate to measurement<\/li>\n<li>Describe the purpose of composite measures like scales and indices<\/li>\n<\/ul>\n<\/div>\n<p>Conceptual definitions are like dictionary definitions. They tell you what a concept means by defining it using other concepts. In this section we will move from the abstract realm (theory) to the real world (measurement). <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_616\">Operationalization<\/a><\/strong> is the process by which researchers spell out precisely how a concept will be measured in their study. It involves identifying the specific research procedures we will use to gather data about our concepts. If conceptually defining your terms means looking at theory, how do you operationally define your terms? By looking for indicators of when your variable is present or not, more or less intense, and so forth. 
Operationalization is probably the most challenging part of quantitative research, but once it&#8217;s done, the design and implementation of your study will be straightforward.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-120\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-1024x1024.png\" alt=\"\" width=\"400\" height=\"400\" srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-1024x1024.png 1024w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-300x300.png 300w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-150x150.png 150w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-768x769.png 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-65x65.png 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-225x225.png 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280-350x350.png 350w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/detective-152085_1280.png 1279w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<h2>Indicators<\/h2>\n<p>Operationalization works by identifying specific&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_719\">indicators<\/a><\/strong> that will be taken to represent the ideas we are interested in studying. 
If we are interested in studying masculinity, then the indicators for that concept might include some of the social roles prescribed to men in society such as breadwinning or fatherhood. Being a breadwinner or a father might therefore be considered <em>indicators&nbsp;<\/em>of a person\u2019s masculinity. The extent to which a man fulfills either, or both, of these roles might be understood as clues (or indicators) about the extent to which he is viewed as masculine.<\/p>\n<p>Let\u2019s look at another example of indicators. Each day, Gallup researchers poll 1,000 randomly selected Americans to ask them about their well-being. To measure well-being, Gallup asks these people to respond to questions covering six broad areas: physical health, emotional health, work environment, life evaluation, healthy behaviors, and access to basic necessities. Gallup uses these six factors as indicators of the concept that they are really interested in, which is <a href=\"http:\/\/www.well-beingindex.com\/\">well-being<\/a>.<\/p>\n<p>Identifying indicators can be even simpler than the examples described thus far. Political party affiliation is another relatively easy concept for which to identify indicators. If you asked a person what party they voted for in the last national election (or gained access to their voting records), you would get a good indication of their party affiliation. Of course, some voters split tickets between multiple parties when they vote and others swing from party to party each election, so our indicator is not perfect. Indeed, if our study were about political identity as a key concept, operationalizing it solely in terms of who they voted for in the previous election leaves out a lot of information about identity that is relevant to that concept. Nevertheless, it&#8217;s a pretty good indicator of political party affiliation.<\/p>\n<p>Choosing indicators is not an arbitrary process. 
As described earlier, utilizing prior theoretical and empirical work in your area of interest is a great way to identify indicators in a scholarly manner. And your conceptual definitions will point you in the direction of relevant indicators. Empirical work will give you some very specific examples of how the important concepts in an area have been measured in the past and what sorts of indicators have been used. Often, it makes sense to use the same indicators as previous researchers; however, you may find that some previous measures have potential weaknesses that your own study will improve upon.<\/p>\n<p>All of the examples in this chapter have dealt with questions you might ask a research participant on a survey or in a quantitative interview. If you plan to collect data from other sources, such as through direct observation or the analysis of available records, think practically about what the design of your study might look like and how you can collect data on various indicators feasibly. If your study asks about whether the participant regularly changes the oil in their car, you will likely not observe them directly doing so. Instead, you will likely need to rely on a survey question that asks them the frequency with which they change their oil or ask to see their car maintenance records.<\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n<li>What indicators are commonly used to measure the variables in your research question?<\/li>\n<li>How can you feasibly collect data on these indicators?<\/li>\n<li>Are you planning to collect your own data using a questionnaire or interview? Or are you planning to analyze available data like client files or raw data shared from another researcher&#8217;s project?<\/li>\n<\/ul>\n<p>Remember, you need <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_503\"><strong>raw data<\/strong><\/a>. 
Your research project cannot rely solely on the results reported by other researchers or the arguments you read in the literature. A literature review is only the first part of a research project, and your review of the literature should inform the indicators you end up choosing when <em>you<\/em> measure the variables in your research question.<\/p>\n<\/div>\n<p>Unlike conceptual definitions, which contain other concepts, an operational definition consists of three components: (1) the variable being measured and its attributes, (2) the measure you will use, and (3) how you plan to interpret the data collected from that measure to draw conclusions about the variable you are measuring.<\/p>\n<h2>Step 1: Specifying variables and attributes<\/h2>\n<p>The first component, the variable, should be the easiest part. At this point in quantitative research, you should have a research question that has at least one independent and at least one dependent variable. Remember that variables must be able to vary. For example, the United States is not a variable. Country of residence is a variable, as is patriotism. Similarly, if your sample only includes men, gender is a constant in your study, not a variable. A&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_388\">constant<\/a><\/strong> is a characteristic that does not change in your study.<\/p>\n<p>When social scientists measure concepts, they sometimes use the language of variables and attributes. A&nbsp;<strong>variable<\/strong> refers to a quality or quantity that varies across people or situations. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_387\">Attributes<\/a><\/strong>&nbsp;are the characteristics that make up a variable. For example, the variable hair color would contain attributes like blonde, brown, black, red, gray, etc. A variable\u2019s attributes determine its level of measurement. 
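The vocabulary of variables and attributes maps naturally onto code. Here is a minimal sketch (Python, used purely for illustration; the attribute list is a deliberately short, hypothetical one):

```python
# A variable is a quality that varies; its attributes are the values it can take.
# The variable "hair color" with a deliberately incomplete attribute list.
HAIR_COLOR_ATTRIBUTES = ["blonde", "brown", "black", "red", "gray"]

def classify(response, attributes):
    """Assign a survey response to one attribute, or None if nothing fits."""
    response = response.strip().lower()
    return response if response in attributes else None

print(classify("Brown", HAIR_COLOR_ATTRIBUTES))   # -> brown
print(classify("purple", HAIR_COLOR_ATTRIBUTES))  # -> None (our list misses this case)
```

A `None` result signals that the attribute list cannot place every respondent, a problem the chapter takes up below under exhaustiveness.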
There are four possible levels of measurement: nominal, ordinal, interval, and ratio. The first two levels of measurement are&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_695\">categorical<\/a><\/strong>, meaning their attributes are categories rather than numbers. The latter two levels of measurement are&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_654\">continuous<\/a><\/strong>, meaning their attributes are numbers.<\/p>\n<figure id=\"attachment_130\" aria-describedby=\"caption-attachment-130\" style=\"width: 654px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-4175\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/tommy-van-kessel-BXFY8_iii9M-unsplash-scaled-1.jpg\" alt=\"\" width=\"654\" height=\"1024\" \/><figcaption id=\"caption-attachment-130\" class=\"wp-caption-text\">I exist to frustrate researchers&#8217; categorizations.<\/figcaption><\/figure>\n<h3>Levels of measurement<\/h3>\n<p>Hair color is an example of a nominal level of measurement.&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_720\">Nominal<\/a><\/strong> measures are categorical, and those categories cannot be mathematically ranked. As a brown-haired person (with some gray), I can\u2019t say for sure that brown-haired people are better than blonde-haired people. As with all nominal levels of measurement, there is no ranking order between hair colors; they are simply different. That is what constitutes a nominal level&#8211;gender and race are also measured at the nominal level.<\/p>\n<p>What attributes are contained in the variable&nbsp;<em>hair color<\/em>? While blonde, brown, black, and red are common colors, some people may not fit into these categories if we only list these attributes. 
My wife, who currently has purple hair, wouldn\u2019t fit anywhere. This means that our attributes were not exhaustive. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_721\">Exhaustiveness<\/a><\/strong>&nbsp;means that all possible attributes are listed. We may have to list a lot of colors before we can meet the criteria of exhaustiveness. Clearly, there is a point at which exhaustiveness has been reasonably met. If a person insists that their hair color is&nbsp;<em>light burnt sienna<\/em>, it is not your responsibility to list that as an option. Rather, that person would reasonably be described as brown-haired. Perhaps listing a category for&nbsp;<em>other color<\/em>&nbsp;would suffice to make our list of colors exhaustive.<\/p>\n<p>What about a person who has multiple hair colors at the same time, such as red and black? They would fall into multiple attributes. This violates the rule of&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_722\">mutual exclusivity<\/a><\/strong>, in which a person cannot fall into two different attributes. Instead of listing all of the possible combinations of colors, perhaps you might include a&nbsp;<em>multi-color<\/em>&nbsp;attribute to describe people with more than one hair color.<\/p>\n<p>Making sure researchers provide mutually exclusive and exhaustive attributes is about making sure all people are represented in the data record. For many years, the attributes for gender were only male or female. Now, our understanding of gender has evolved to encompass more attributes that better reflect the diversity in the world. Children of parents from different races were often classified as one race or another, even if they identified with both cultures. 
The option for bi-racial or multi-racial on a survey not only more accurately reflects the racial diversity in the real world but validates and acknowledges people who identify in that manner. If we did not measure race in this way, we would leave empty the data record for people who identify as biracial or multiracial, impairing our search for truth.<\/p>\n<p>Unlike nominal-level measures, attributes at the&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_524\">ordinal<\/a><\/strong>&nbsp;level can be rank ordered. For example, someone\u2019s degree of satisfaction in their romantic relationship can be ordered by rank. That is, you could say you are not at all satisfied, a little satisfied, moderately satisfied, or highly satisfied. Note that even though these have a rank order to them (not at all satisfied is certainly worse than highly satisfied), we cannot calculate a mathematical distance between those attributes. We can simply say that one attribute of an ordinal-level variable is more or less than another attribute.<\/p>\n<p>This can get a little confusing when using <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_723\">rating scales<\/a><\/strong>. If you have ever taken a customer satisfaction survey or completed a course evaluation for school, you are familiar with rating scales. \u201cOn a scale of 1-5, with 1 being the lowest and 5 being the highest, how likely are you to recommend our company to other people?\u201d That surely sounds familiar. Rating scales use numbers, but only as a shorthand, to indicate what attribute (highly likely, somewhat likely, etc.) the person feels describes them best. You wouldn\u2019t say you are \u201c2\u201d likely to recommend the company, but you would say you are not very likely to recommend the company. 
Ordinal-level attributes must also be exhaustive and mutually exclusive, as with nominal-level variables.<\/p>\n<p>At the&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_461\">interval<\/a>&nbsp;<\/strong>level, attributes must also be exhaustive and mutually exclusive and there is equal distance between attributes. Interval measures are also continuous, meaning their attributes are numbers, rather than categories. IQ scores are interval level, as are temperatures in Fahrenheit and Celsius. Their defining characteristic is that we can say how much more or less one attribute differs from another. We cannot, however, say with certainty what the ratio of one attribute is in comparison to another. For example, it would not make sense to say that a person with an IQ score of 140 has twice the IQ of a person with a score of 70, or that 20 degrees is twice as hot as 10 degrees. However, the difference between IQ scores of 80 and 100 is the same as the difference between IQ scores of 120 and 140 (and the difference between a temperature of 20 and 10 is the same as the difference between 35 and 25).<\/p>\n<p>While we cannot say that someone with an IQ of 140 is twice as intelligent as someone with an IQ of 70 because IQ is measured at the interval level, we can say that someone with six siblings has twice as many as someone with three because number of siblings is measured at the ratio level. Finally, at the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_462\">ratio<\/a>&nbsp;<\/strong>level, attributes are mutually exclusive and exhaustive, attributes can be rank ordered, the distance between attributes is equal, and attributes have a true zero point.&nbsp;Thus, with these variables, we <em>can&nbsp;<\/em>say what the ratio of one attribute is in comparison to another. Examples of ratio-level variables include age and years of education. 
We know that a person who is 12 years old is twice as old as someone who is 6 years old. Height measured in meters and weight measured in kilograms are good examples. So are counts of discrete objects or events such as the number of siblings one has or the number of questions a student answers correctly on an exam. The differences between each level of measurement are visualized in Table 11.1.<\/p>\n<table>\n<caption>Table 11.1 Criteria for Different Levels of Measurement<\/caption>\n<tbody>\n<tr>\n<td><\/td>\n<td>Nominal<\/td>\n<td>Ordinal<\/td>\n<td>Interval<\/td>\n<td>Ratio<\/td>\n<\/tr>\n<tr>\n<td>Exhaustive<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Mutually exclusive<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Rank-ordered<\/td>\n<td><\/td>\n<td>X<\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>Equal distance between attributes<\/td>\n<td><\/td>\n<td><\/td>\n<td>X<\/td>\n<td>X<\/td>\n<\/tr>\n<tr>\n<td>True zero point<\/td>\n<td><\/td>\n<td><\/td>\n<td><\/td>\n<td>X<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h4>Levels of measurement = levels of specificity<\/h4>\n<p>We have spent time learning how to determine our data&#8217;s level of measurement. Now what? How could we use this information to help us as we measure concepts and develop measurement tools? First, the types of statistical tests that we are able to use are generally dependent on our data&#8217;s level of measurement.&nbsp;With nominal-level measurement, for example, the only available measure of central tendency is the mode. With ordinal-level measurement, the median or mode can be used as indicators of central tendency<a class=\"footnote\" title=\"That said, when using a Likert scale, which is an ordinal scale, many researchers will argue that averages, measures of variation, and parametric tests are appropriate. For more on this, see Sullivan, G. M., &amp; Artino, A. R., Jr (2013). 
Analyzing and interpreting data from Likert-type scales. Journal of Graduate Medical Education, 5(4), 541\u2013542. https:\/\/doi.org\/10.4300\/JGME-5-4-18\u00a0and Norman G. (2010). Likert scales, levels of measurement and the &quot;laws&quot; of statistics. Advances in Health Sciences Education: Theory and Practice, 15(5), 625\u2013632. https:\/\/doi.org\/10.1007\/s10459-010-9222-y\" id=\"return-footnote-131-9\" href=\"#footnote-131-9\" aria-label=\"Footnote 9\"><sup class=\"footnote\">[9]<\/sup><\/a>. Interval and ratio-level measurement are typically considered the most desirable because they permit any measure of central tendency to be computed (i.e., mean, median, or mode). Also, ratio-level measurement is the only level that allows meaningful statements about ratios of scores. The higher the level of measurement, the more complex the statistical tests we are able to conduct. This knowledge may help us decide what kind of data we need to gather, and how.<\/p>\n<p>That said, we have to balance this knowledge with the understanding that sometimes, collecting data at a higher level of measurement could negatively impact our studies. For instance, sometimes providing answers in ranges may make prospective participants feel more comfortable responding to sensitive items. Imagine that you were interested in collecting information on topics such as income, number of sexual partners, number of times someone used illicit drugs, etc. You would have to think about the sensitivity of these items and determine if it would make more sense to collect some data at a lower level of measurement (e.g., asking if they are sexually active or not (nominal) versus their total number of sexual partners (ratio)).<\/p>\n<p>Finally, sometimes when analyzing data, researchers find a need to change a variable&#8217;s level of measurement. For example, a few years ago, a student was interested in studying the relationship between mental health and life satisfaction. 
This student used a variety of measures. One item asked about the number of mental health symptoms, reported as the actual number. When analyzing data, my student examined the mental health symptom variable and noticed that she had two groups: those with zero or one symptom and those with many symptoms. Instead of using the ratio-level data (the actual number of mental health symptoms), she collapsed her cases into two categories, few and many, and used this variable in her analyses. It is important to note that you can convert data from a higher level of measurement to a lower level; however, you cannot convert a lower level to a higher level.<\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n<li>Check that the variables in your research question can vary&#8230;and that they are not constants or one of many potential attributes of a variable.<\/li>\n<li>Think about the attributes your variables have. Are they categorical or continuous? What level of measurement seems most appropriate?<\/li>\n<\/ul>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-4176\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/markus-winkler-htShI76GLDM-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\" \/><\/p>\n<h2>Step 2: Specifying measures for each variable<\/h2>\n<p>Let\u2019s pick a research question and walk through the process of operationalizing variables to see how specific we need to get. I\u2019m going to hypothesize that students in a class who are unmotivated are less likely to be satisfied with instruction. Remember, this would be a direct relationship\u2014as motivation decreases, satisfaction decreases. In this question, motivation&nbsp;is my independent variable (the cause) and satisfaction with instruction is my dependent variable (the effect). 
Now that we have identified our variables, their attributes, and levels of measurement, we move on to the second component: the measure itself.<\/p>\n<p>So, how would you measure my key variables: motivation&nbsp;and satisfaction? What indicators would you look for? Some students might say that motivation could be measured by observing a participant\u2019s body language. They may also say that a motivated&nbsp;person will often express feelings of engagement or energy. In addition, a satisfied person might be happy around instructors and often express gratitude. While these factors may indicate that the variables are present, they lack coherence. Unfortunately, what this \u201cmeasure\u201d is actually saying is that \u201cI know motivation and satisfaction when I see them.\u201d While you are likely a decent judge of motivation and satisfaction, you need to provide more information in a research study for how you plan to measure your variables. Your judgment is subjective, based on your own idiosyncratic experiences with motivation and satisfaction. Such judgments couldn\u2019t be replicated by another researcher, nor applied consistently across a large group of people. Operationalization requires that you come up with a specific and rigorous measure for seeing who is motivated or satisfied.<\/p>\n<p>Finding a good measure for your variable depends on the kind of variable it is. Variables that are directly observable don&#8217;t come up very often in my students&#8217; classroom projects, but they might include things like taking someone&#8217;s blood pressure, marking attendance or participation in a group, and so forth. To measure an indirectly observable variable like age, you would probably put a question on a survey that asked, \u201cHow old are you?\u201d Measuring a variable like income might require some more thought, though. Are you interested in this person\u2019s individual income or the income of their family unit? 
This might matter if your participant does not work or is dependent on other family members for income. Do you count income from social welfare programs? Are you interested in their income per month or per year? Even though indirect observables are relatively easy to measure, the measures you use must be clear in what they are asking, and operationalization is all about figuring out the specifics of what you want to know. For more complicated constructs, you will need composite measures (measures that use multiple indicators to capture a single variable).<\/p>\n<p>How you plan to collect your data also influences how you will measure your variables. For researchers using secondary data like student records as a data source, you are limited by what information is in the data sources you can access. If your organization uses a given measurement for a learning outcome, that is the one you will use in your study. One of the benefits of collecting your own data is being able to select the measures you feel best exemplify your understanding of the topic.<\/p>\n<h3>Measuring unidimensional concepts<\/h3>\n<p>The previous section mentioned two important considerations: how complicated the variable is and how you plan to collect your data. With these in hand, we can use the level of measurement to further specify how you will measure your variables and consider specialized rating scales developed by social science researchers.<\/p>\n<h4>Measurement at each level<\/h4>\n<p>Nominal measures assess categorical variables. These measures are used for variables or indicators that have mutually exclusive attributes, but that cannot be rank-ordered. Nominal measures ask about the variable and provide names or labels for different attribute values, like social work, counseling, and nursing for the variable profession. Nominal measures are relatively straightforward.<\/p>\n<p>Ordinal measures often use a rating scale: an ordered set of responses that participants must choose from. 
Figure 11.1 shows several examples. The number of response options on a typical rating scale is usually five or seven, though it can range from three to eleven. Five-point scales are best for unipolar scales where only one construct is tested, such as frequency (Never, Rarely, Sometimes, Often, Always). Seven-point scales are best for bipolar scales where there is a dichotomous spectrum, such as liking (Like very much, Like somewhat, Like slightly, Neither like nor dislike, Dislike slightly, Dislike somewhat, Dislike very much). Sometimes you want to force people to choose one way or another, so you might use a forced-choice scale of even-numbered options (4, 6, or 8) that doesn&#8217;t offer a mid-point option. For bipolar questions, it is useful to offer an earlier question that branches respondents into one side of the scale; if asking about liking ice cream, first ask \u201cDo you generally like or dislike ice cream?\u201d Once the respondent chooses like or dislike, refine it by offering them relevant choices from the seven-point scale. Branching improves both reliability and validity (Krosnick &amp; Berent, 1993).<a class=\"footnote\" title=\"Krosnick, J.A. &amp; Berent, M.K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format.\u00a0American Journal of Political Science, 37(3), 941-964.\" id=\"return-footnote-131-10\" href=\"#footnote-131-10\" aria-label=\"Footnote 10\"><sup class=\"footnote\">[10]<\/sup><\/a> Although you often see scales with numerical labels, it is best to present only verbal labels to the respondents and convert them to numerical values in the analyses. Avoid partial, lengthy, or overly specific labels. In some cases, the verbal labels can be supplemented with (or even replaced by) meaningful graphics. 
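<\/p>
<p>The convention described above (present verbal labels to respondents, then convert them to numbers only at analysis time) can be sketched in code. This is a minimal illustration; the label set and the 1-to-5 coding are assumptions, not part of any published instrument.<\/p>

```python
# Hypothetical coding scheme for a five-point unipolar frequency scale.
# Respondents only ever see the verbal labels; the numeric codes exist
# solely for analysis.
FREQUENCY_CODES = {
    "Never": 1,
    "Rarely": 2,
    "Sometimes": 3,
    "Often": 4,
    "Always": 5,
}

def code_responses(responses):
    """Convert a list of verbal ratings into numeric values for analysis."""
    return [FREQUENCY_CODES[r] for r in responses]

print(code_responses(["Never", "Often", "Sometimes"]))  # [1, 4, 3]
```

<p>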
The last rating scale shown in Figure 11.1 is a visual-analog scale, on which participants make a mark somewhere along the horizontal line to indicate the magnitude of their response.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_130\" aria-describedby=\"caption-attachment-130\" style=\"width: 900px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4149\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/9.2.png\" alt=\"\" width=\"900\" height=\"461\" \/><figcaption id=\"caption-attachment-130\" class=\"wp-caption-text\">Figure 11.1 Example rating scales for closed-ended questionnaire items<\/figcaption><\/figure>\n<p>Interval measures are those where the values measured are not only rank-ordered, but are also equidistant from adjacent attributes. Consider the temperature scale (in Fahrenheit or Celsius), where the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. Likewise, if you have a scale that asks respondents\u2019 annual income using the following attributes (ranges): $0 to 10,000, $10,000 to 20,000, $20,000 to 30,000, and so forth, this is also an interval measure, because the mid-points of each range (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. The intelligence quotient (IQ) scale is also an interval measure, because the measure is designed such that the difference between IQ scores 100 and 110 is supposed to be the same as between 110 and 120 (although in all honesty, we do not really know whether that is truly the case). Interval measures allow us to examine \u201chow much more\u201d of one attribute is present when compared to another, which is not possible with nominal or ordinal measures. You may find researchers who argue that ordinal rating scales are actually interval measures so that we can use different statistical techniques for analyzing them. 
As we will discuss in the latter part of the chapter, this is debatable because there is no way to know whether the difference between a 3 and a 4 on a rating scale is the same as the difference between a 2 and a 3. Those numbers are just placeholders for categories.<\/p>\n<p>Ratio measures are those that have all the qualities of nominal, ordinal, and interval scales, and, in addition, have a \u201ctrue zero\u201d point (where the value zero implies lack or non-availability of the underlying construct). Think about how to measure the number of people working in human resources at a social work agency. It could be one, several, or none (if the agency contracts out for those services). Measuring interval and ratio data is relatively easy, as people either select or input a number for their answer. If you ask a person how many eggs they purchased last week, they can simply tell you they purchased a dozen eggs, just two, or none at all.<\/p>\n<h4>Commonly used rating scales in questionnaires<\/h4>\n<p class=\"c4\"><span class=\"c5 c1\">The level of measurement will give you the basic information you need, but social scientists have developed specialized instruments for use in questionnaires, a common tool used in quantitative research.&nbsp;<\/span><span class=\"c5 c1\">Although <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_386\">Likert scale<\/a><\/strong> is a term colloquially used to refer to almost any rating scale (e.g., a 0-to-10 life satisfaction scale), it has a much more precise meaning. <\/span><span class=\"c5 c1\">In the 1930s, researcher Rensis Likert (pronounced LICK-ert) created a new approach for measuring people\u2019s attitudes (Likert, 1932)<\/span><span class=\"c22 c5\">.<a class=\"footnote\" title=\"Likert, R. (1932). 
A technique for the measurement of attitudes.\u00a0Archives of Psychology,140, 1\u201355.\" id=\"return-footnote-131-11\" href=\"#footnote-131-11\" aria-label=\"Footnote 11\"><sup class=\"footnote\">[11]<\/sup><\/a><\/span><span class=\"c5 c1\">&nbsp;It involves presenting people with several statements\u2014including both favorable and unfavorable statements\u2014about some person, group, or idea. Respondents then express their agreement or disagreement with each statement on a 5-point scale:&nbsp;<\/span><em><span class=\"c5 c8 c1\">Strongly Agree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Agree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Neither Agree nor Disagree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Disagree<\/span><span class=\"c5 c1\">,&nbsp;<\/span><span class=\"c5 c8 c1\">Strongly Disagree<\/span><\/em><span class=\"c5 c1\">. Numbers are assigned to each response a<\/span><span class=\"c5 c1\">nd then summed across all items to produce a score representing the attitude toward the person, group, or idea. For items that are phrased in an opposite direction (e.g., negatively worded statements instead of positively worded statements), reverse coding is used so that the numerical scoring of statements also runs in the opposite direction.&nbsp;<\/span><span class=\"c5 c1\">The entire set of items came to be called a Likert scale, as indicated in Table 11.2 below.<\/span><\/p>\n<p class=\"c33 c70\"><span class=\"c5 c1\">Unless you are measuring people\u2019s attitude toward something by assessing their level of agreement with several statements about it, it is best to avoid calling it a Likert scale. You are probably just using a rating scale. Likert scales allow for more granularity (more finely tuned response) than yes\/no items, including whether respondents are neutral to the statement. 
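<\/span><\/p>
<p>The scoring procedure described above (assign numbers to each response, reverse code negatively worded items, then sum across items) can be sketched in code. The items and the 5-point coding below are illustrative assumptions, not an actual published Likert scale.<\/p>

```python
# Minimal sketch of Likert scoring with reverse coding.
AGREEMENT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Agree nor Disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

def likert_score(responses, reverse_items, points=5):
    """Sum numeric codes across items, flipping negatively worded ones."""
    total = 0
    for item, answer in responses.items():
        value = AGREEMENT_CODES[answer]
        if item in reverse_items:
            # Reverse coding on a 5-point scale maps 1<->5 and 2<->4.
            value = points + 1 - value
        total += value
    return total

responses = {
    "I enjoy research methods.": "Agree",        # positively worded
    "Research methods are boring.": "Disagree",  # negatively worded
}
print(likert_score(responses, {"Research methods are boring."}))  # 8
```

<p class=\"c33 c70\"><span class=\"c5 c1\">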
<\/span>Below is an example of how we might use a Likert scale to assess your attitudes about research as you work your way through this textbook.<\/p>\n<p>&nbsp;<\/p>\n<table class=\"grid\" style=\"border-collapse: collapse;width: 0%;height: 131px\">\n<caption>Table 11.2 Likert scale<\/caption>\n<tbody>\n<tr>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><strong>Strongly agree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Agree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Neutral<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Disagree<\/strong><\/td>\n<td style=\"width: 16.6667%\"><strong>Strongly disagree<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">I like research more now than when I started reading this book.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">This textbook is easy to use.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">I feel confident about how well I understand levels of measurement.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 16.6667%\">This textbook is helping me plan my research proposal.<\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<td style=\"width: 16.6667%\"><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" 
href=\"#term_131_385\">Semantic differential scales<\/a><\/strong> are composite (multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. Whereas in the above Likert scale, the participant is asked how much they <em>agree or disagree<\/em> with a statement, in a semantic differential scale the participant is asked to indicate how they <em>feel<\/em> about a specific item. This makes the s<span style=\"font-size: 1em\">emantic differential scale an excellent technique for measuring people\u2019s attitudes or feelings toward objects, events, or behaviors.<\/span><span style=\"text-align: initial;font-size: 1em\"> Table 11.3 is an example of a semantic differential scale that was created to assess participants&#8217; feelings about this textbook.&nbsp;<\/span><\/p>\n<p>&nbsp;<\/p>\n<table style=\"height: 90px\">\n<caption><strong>Table 11.3. A semantic differential scale for measuring attitudes towards a textbook<\/strong><\/caption>\n<tbody>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 779.826px\" colspan=\"7\"><em><strong>1) <\/strong><span style=\"text-decoration: underline\"><strong>How would you rate your opinions toward this textbook?<\/strong><\/span><\/em><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\">Very much<\/td>\n<td style=\"height: 15px;width: 104.444px\">Somewhat<\/td>\n<td style=\"height: 15px;width: 77.3438px\">Neither<\/td>\n<td style=\"height: 15px;width: 104.444px\">Somewhat<\/td>\n<td style=\"height: 15px;width: 107.465px\">Very much<\/td>\n<td style=\"height: 15px;width: 103.524px\"><\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Boring<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 
77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Exciting<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Useless<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Useful<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Hard<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Easy<\/td>\n<\/tr>\n<tr style=\"height: 15px\">\n<td style=\"height: 15px;width: 97.4306px\">Irrelevant<\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 77.3438px\"><\/td>\n<td style=\"height: 15px;width: 104.444px\"><\/td>\n<td style=\"height: 15px;width: 107.465px\"><\/td>\n<td style=\"height: 15px;width: 103.524px\">Applicable<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div>Notice that on a Likert scale, each item is different but the choices for the scale are the same (e.g., strongly agree, agree, etc.). However, for a semantic differential scale, the thing that you are reviewing, in this case, beliefs about research content, remains the same. It is the choices that change. 
<\/div>\n<p>Another composite scale, the <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_384\">Guttman scale<\/a><\/strong>, was designed by Louis Guttman and uses a series of items arranged in increasing order of intensity (least intense to most intense) of the concept. This type of scale allows us to understand the intensity of beliefs or feelings. Each item in a Guttman scale has a weight (not shown on the tool itself) which varies with the intensity of that item, and the weighted combination of the responses is used as an aggregate measure of an observation.<\/p>\n<div class=\"textbox shaded\">\n<p><strong>Example Guttman Scale Items<\/strong><\/p>\n<ol>\n<li>I often felt the material was not engaging&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Yes\/No<\/li>\n<li>I was often thinking about other things in class&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Yes\/No<\/li>\n<li>I was often working on other tasks during class&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Yes\/No<\/li>\n<li>I will work to abolish research from the curriculum&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Yes\/No<\/li>\n<\/ol>\n<\/div>\n<p>Notice how the items move from lower intensity to higher intensity. A researcher reviews the yes answers and creates a score for each participant.<\/p>\n<h3>Composite measures: Scales and indices<\/h3>\n<p>Depending on your research design, your measure may be something you put on a survey or pre\/post-test that you give to your participants. For a variable like age or income, one well-worded question may suffice. Unfortunately, most variables in the social world are not so simple. 
Motivation and satisfaction are multidimensional concepts. Relying on a single indicator like a question that asks \u201cYes or no, are you motivated?\u201d does not encompass the complexity of motivation, including issues with mood, energy, and happiness. There is no easy way to delineate between multidimensional and unidimensional concepts, as it&#8217;s all in how you think about your variable. Satisfaction could be validly measured using a unidimensional ordinal rating scale. However, if satisfaction were a key variable in our study, we would need a theoretical framework and conceptual definition for it. That means we&#8217;d probably have more indicators to ask about, such as timeliness, respect, and sensitivity, among others, and we would want our study to say something about what satisfaction truly means in terms of our other key variables. On the other hand, if satisfaction is not a key variable in your conceptual framework, it makes sense to operationalize it as a unidimensional concept.<\/p>\n<p>For more complicated measures, researchers use scales and indices (sometimes called indexes) to measure their variables because they assess multiple indicators to develop a composite (or total) score. <span style=\"font-size: 1em\">Composite scores provide a much greater understanding of concepts than a single item could. 
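<\/span><\/p>
<p>As a toy illustration of a composite score, several satisfaction indicators like those mentioned above could be combined into one number. The indicator names and the 1-to-5 ratings are hypothetical:<\/p>

```python
# Hedged sketch: averaging several 1-5 indicator ratings into a single
# composite satisfaction score. The indicators are hypothetical.
def composite_score(item_ratings):
    """Average multiple 1-5 indicator ratings into one composite score."""
    return sum(item_ratings.values()) / len(item_ratings)

ratings = {"timeliness": 4, "respect": 5, "sensitivity": 3}
print(composite_score(ratings))  # 4.0
```

<p><span style=\"font-size: 1em\">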
Although we won&#8217;t delve too deeply into the process of scale development, we will cover some important topics to help you understand how scales and indices developed by other researchers can be used in your project.<\/span><\/p>\n<p>Although scales and indices differ in ways we will discuss later, they share several characteristics.<\/p>\n<ul>\n<li>Both are ordinal measures of variables.<\/li>\n<li>Both can order the units of analysis in terms of specific variables.<\/li>\n<li>Both are <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_375\"><strong>composite measures<\/strong><\/a>.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-124\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-1024x691.png\" alt=\"\" width=\"1024\" height=\"691\" srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-1024x691.png 1024w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-300x203.png 300w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-768x518.png 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-1536x1037.png 1536w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-65x44.png 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-225x152.png 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920-350x236.png 350w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/survey-4441595_1920.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><\/p>\n<h4>Scales<\/h4>\n<p>The previous section discussed how to measure respondents\u2019 responses to predesigned items or indicators belonging to an underlying construct. But how do we create the indicators themselves? The process of creating the indicators is called scaling. More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. Stevens (1946)<a class=\"footnote\" title=\"Stevens, S. S. (1946). On the Theory of Scales of Measurement.\u00a0Science,\u00a0103(2684), 677-680.\" id=\"return-footnote-131-12\" href=\"#footnote-131-12\" aria-label=\"Footnote 12\"><sup class=\"footnote\">[12]<\/sup><\/a> said, \u201cScaling is the assignment of objects to numbers according to a rule.\u201d This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research.<\/p>\n<p>The outcome of a scaling process is a <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_724\">scale<\/a><\/strong>, which is an empirical structure for measuring items or indicators of a given construct. Understand that multidimensional \u201cscales\u201d, as discussed in this section, are a little different from \u201crating scales\u201d discussed in the previous section. A rating scale is used to capture the respondents\u2019 reactions to a given item on a questionnaire. For example, an ordinally scaled item captures a value between \u201cstrongly disagree\u201d to \u201cstrongly agree.\u201d Attaching a rating scale to a statement or instrument is not scaling. Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items.<\/p>\n<p>If creating your own scale sounds painful, don\u2019t worry! 
For most multidimensional variables, you would likely be duplicating work that has already been done by other researchers. Scale development is its own branch of science, called psychometrics. You do not need to create a scale for motivation because scales such as the Intrinsic Motivation Inventory (IMI), General Causality Orientations Scale (GCOS), and the Sport Climate Questionnaire (SCQ) have been developed and refined over decades to measure variables like motivation. As we will discuss in the next section, these scales have been shown to be reliable and valid. While you could create a new scale to measure motivation or satisfaction, a rigorous study would pilot test and refine that new scale over time to make sure it measures the concept accurately and consistently. This high level of rigor is often unachievable in student research projects because of the cost and time involved in pilot testing and validating, so using existing scales is recommended.<\/p>\n<p>Unfortunately, there is no good one-stop-shop for psychometric scales. The <a href=\"https:\/\/databases.lib.sfu.ca\/record\/61245147620003610\/Mental-Measurements-Yearbook-with-Tests-in-Print\">Mental Measurements Yearbook<\/a> provides a searchable database of measures for social science variables, though it is woefully incomplete and often does not contain the full documentation for scales in its database. You can access it from a university library\u2019s list of databases. If you can\u2019t find anything in there, your next stop should be the methods section of the articles in your literature review. The methods section of each article will detail how the researchers measured their variables, and often the results section is instructive for understanding more about measures. In a quantitative study, researchers may have used a scale to measure key variables and will provide a brief description of that scale, its name, and maybe a few example questions. 
If you need more information, look at the results section and tables discussing the scale to get a better idea of how the measure works. Looking beyond the articles in your literature review, searching Google Scholar using queries like \u201cmotivation scale\u201d or \u201csatisfaction scale\u201d should also provide some relevant results. For example, when searching for documentation for the Rosenberg Self-Esteem Scale (which we will discuss in the next section), I found this <a href=\"http:\/\/www.integrativehealthpartners.org\/downloads\/ACTmeasures.pdf\">report from researchers investigating acceptance and commitment therapy<\/a> which details this scale and many others used to assess mental health outcomes. If you find the name of the scale somewhere but cannot find the documentation (all questions and answers plus how to interpret the scale), a general web search with the name of the scale and &#8220;.pdf&#8221; may bring you to what you need. Or, to get professional help with finding information, always ask a librarian!<\/p>\n<p>Unfortunately, these approaches do not guarantee that you will be able to view the scale itself or get information on how it is interpreted. Many scales cost money to use and may require training to properly administer. You may also find scales that are related to your variable but would need to be slightly modified to match your study\u2019s needs. You could adapt a scale to fit your study; however, changing even small parts of a scale can influence its accuracy and consistency. 
While it is perfectly acceptable in student projects to adapt a scale without testing it first (time may not allow you to do so), pilot testing is always recommended for adapted scales, and researchers seeking to draw valid conclusions and publish their results must take this additional step.<\/p>\n<h4>Indices<\/h4>\n<p>An <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_576\"><strong>index<\/strong><\/a> is a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas. It is different from a scale. Scales also aggregate measures; however, these measures examine different dimensions <em>or<\/em> the same dimension of a single construct. A well-known example of an index is the <a href=\"https:\/\/www.bls.gov\/cpi\/\">consumer price index<\/a> (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. The CPI is a measure of how much consumers have to pay for goods and services (in general) and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and \u201cother goods and services\u201d), which are further subdivided into more than 200 smaller items. Each month, government employees call all over the country to get the current prices of more than 80,000 items. Using a complicated weighting scheme that takes into account the location and probability of purchase for each item, analysts then combine these prices into an overall index score using a series of formulas and rules.<\/p>\n<p>Another example of an index is the <a href=\"https:\/\/usa.ipums.org\/usa-action\/variables\/SEI#description_section\">Duncan Socioeconomic Index<\/a> (SEI). This index is used to quantify a person&#8217;s socioeconomic status (SES) and is a combination of three concepts: income, education, and occupation. 
Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These very different measures are combined to create an overall SES index score. However, SES index measurement has generated a lot of controversy and disagreement among researchers and may not easily generalize from nation to nation. For a discussion of SES in Canada, check out <a href=\"https:\/\/journals.sfu.ca\/ijepl\/index.php\/ijepl\/article\/view\/858\">Measures of Socio-Economic Status in Educational Research: The Canadian Context<\/a>.<\/p>\n<div class=\"textbox\">Here is a resource where you can read a&nbsp;<a href=\"https:\/\/usa.ipums.org\/usa\/chapter4\/sei_note.shtml\">summary of the Socio-Economic Index debate.<\/a><\/div>\n<p>The process of creating an index is similar to that of a scale. First, conceptualize (define) the index and its constituent components. Though this appears simple, there may be a lot of disagreement on what components (concepts\/constructs) should be included or excluded from an index. For instance, in the SES index, isn\u2019t income correlated with education and occupation? And if so, should we include one component only or all three components? Reviewing the literature, using theories, and\/or interviewing experts or key stakeholders may help resolve this issue. Second, operationalize and measure each component. For instance, how will you categorize occupations, particularly since some occupations may have changed with time (e.g., there were no Web developers before the Internet)? As we will see in step three below, researchers must create a rule or formula for calculating the index score. 
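<\/p>
<p>To make the idea of an index rule concrete, here is a deliberately simplified, hypothetical SES-style calculation. The components, rescaling rules, caps, and equal weights are all assumptions for illustration; real indices like the CPI or Duncan SEI rest on carefully validated formulas.<\/p>

```python
# Toy SES-style index: rescale three heterogeneous components to a
# common 0-1 range, then combine them with (assumed) equal weights.
def ses_index(income_dollars, education_years, occupation_status,
              weights=(1 / 3, 1 / 3, 1 / 3)):
    income_part = min(income_dollars / 100_000, 1.0)  # cap at $100,000
    education_part = min(education_years / 20, 1.0)   # cap at 20 years
    occupation_part = occupation_status / 100         # status scored 0-100
    parts = (income_part, education_part, occupation_part)
    return sum(w * p for w, p in zip(weights, parts))

print(round(ses_index(50_000, 16, 60), 3))  # 0.633
```

<p>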
Again, this process may involve a lot of subjectivity, so validating the index score using existing or new data is important.<\/p>\n<p>Scale and index development are often taught in their own course in doctoral education, so it is unreasonable for you to expect to develop a consistently accurate measure within the span of a week or two. Using available indices and scales is recommended for this reason.<\/p>\n<h4>Differences between scales and indices<\/h4>\n<p>Though indices and scales yield a single numerical score or value representing a concept of interest, they are different in many ways. First, indices often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Conversely, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale about customer satisfaction).<\/p>\n<p>Second, indices often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. Nevertheless, indexes and scales are both essential tools in social science research.<\/p>\n<p>Scales and indices seem like clean, convenient ways to measure different phenomena in social science, but just like with a lot of research, we have to be mindful of the assumptions and biases underneath. What if a scale or an index was developed using only White women as research participants? Is it going to be useful for other groups? 
It very well might be, but when using a scale or index on a group for whom it hasn&#8217;t been tested, it will be very important to evaluate the validity and reliability of the instrument, which we address in the rest of the chapter.<\/p>\n<p>Finally, it&#8217;s important to note that while scales and indices are often made up of nominal or ordinal variables, when we aggregate them into composite scores, we will treat them as interval\/ratio variables.<\/p>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<ul>\n<li>Look back to your work from the previous section: are your variables unidimensional or multidimensional?<\/li>\n<li>Describe the specific measures you will use (actual questions and response options you will use with participants) for each variable in your research question.<\/li>\n<li>If you are using a measure developed by another researcher but do not have all of the questions, response options, and instructions needed to implement it, put it on your to-do list to get them.<\/li>\n<\/ul>\n<\/div>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_130\" aria-describedby=\"caption-attachment-130\" style=\"width: 1024px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4178 size-large\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/mockup-graphics-i1iqQRLULlg-unsplash-scaled-1.jpg\" alt=\"\" width=\"1024\" height=\"683\" \/><figcaption id=\"attachment_130\" class=\"wp-caption-text\">If we were operationalizing blood pressure, the cuff and reader would be the measure&#8230;but how do we interpret what is high, low, and normal blood pressure?<\/figcaption><\/figure>\n<h3>Step 3: How you will interpret your measures<\/h3>\n<p>The final stage of operationalization involves setting the rules for how the measure works and how the researcher should interpret the results. Sometimes, interpreting a measure can be incredibly easy. 
If you ask someone their age, you\u2019ll probably interpret the results by noting the raw number (e.g., 22) someone provides and whether it is lower or higher than other people&#8217;s ages. However, you could also recode that person into age categories (e.g., under 25, 25-34, etc.) or into cohorts (e.g., Generation Z). Even scales may be simple to interpret. If there is a scale of problem behaviors, one might simply add up the number of behaviors checked off, with a total of 1-5 indicating low risk of delinquent behavior, 6-10 indicating moderate risk, etc. How you choose to interpret your measures should be guided by how they were designed, how you conceptualize your variables, the data sources you used, and your plan for analyzing your data statistically. Whatever measure you use, you need a set of rules for how to take any valid answer a respondent provides to your measure and interpret it in terms of the variable being measured.<\/p>\n<p>For more complicated measures like scales, refer to the information provided by the author for how to interpret the scale. If you can\u2019t find enough information from the scale\u2019s creator, look at how the results of that scale are reported in the results section of research articles.<\/p>\n<p>One common mistake I often see is that students introduce another variable into their operational definition. This is incorrect. Your operational definition should mention only one variable\u2014the variable being defined. While your study will certainly draw conclusions about the relationships between variables, that&#8217;s not what operationalization is. Operationalization specifies what instrument you will use to measure your variable and how you plan to interpret the data collected using that measure.<\/p>\n<p>Operationalization is probably the trickiest component of basic research methods, so please don\u2019t get frustrated if it takes a few drafts and a lot of feedback to get to a workable definition. 
At the time of this writing, the book\u2019s original author was in the process of operationalizing the concept of \u201cattitudes towards research methods.\u201d Originally, he thought that he could gauge students\u2019 attitudes toward research methods by looking at their end-of-semester course evaluations. As he became aware of the potential methodological issues with student course evaluations, he opted to use focus groups of students to measure their common beliefs about research. You may recall some of these opinions from <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/1-science-and-social-work\/\">Chapter 1<\/a>, such as the common beliefs that research is boring, useless, and too difficult. After the focus group, he created a scale based on the opinions he gathered, and he plans to pilot test it with another group of students. After the pilot test, he expects that he will have to revise the scale again before he can implement the measure in a real research project.<\/p>\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n<li>Operationalization involves spelling out precisely how a concept will be measured.<\/li>\n<li>Operational definitions must include the variable, the measure, and how you plan to interpret the measure.<\/li>\n<li>There are four different levels of measurement: nominal, ordinal, interval, and ratio (in increasing order of specificity).<\/li>\n<li>Scales and indices are common ways to collect information and involve using multiple indicators in measurement.<\/li>\n<li>A key difference between a scale and an index is that a scale contains multiple indicators for one concept, whereas an index examines multiple concepts (components).<\/li>\n<li>Using scales developed and refined by other researchers can improve the rigor of a quantitative study.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Use the research question that you developed in the previous chapters and find a related scale 
or index that researchers have used. If you have trouble finding the exact phenomenon you want to study, get as close as you can.<\/p>\n<ul>\n<li>What is the level of measurement for each item on each tool? Take a second and think about why the tool&#8217;s creator decided to include these levels of measurement. Identify any levels of measurement you would change and why.<\/li>\n<li>If these tools don&#8217;t exist for what you are interested in studying, why do you think that is?<a id=\"11.3\"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.3 Measurement quality<\/h1>\n<div class=\"textbox learning-objectives\">\n<h3>Learning Objectives<\/h3>\n<p>Learners will be able to&#8230;<\/p>\n<ul>\n<li>Define and describe the types of validity and reliability<\/li>\n<li>Assess for systematic error<\/li>\n<\/ul>\n<\/div>\n<p>The previous sections provided insight into measuring concepts in social science research. We discussed the importance of identifying concepts and their corresponding indicators as a way to help us operationalize them. In essence, we now understand that when we think about our measurement process, we must be intentional and thoughtful in the choices that we make. This section is all about how to judge the quality of the measures you&#8217;ve chosen for the key variables in your research question.<\/p>\n<h2><strong><span style=\"color: #ff0000\">&#8211;&gt;Reliability&nbsp;and Validity: Really Important Sections&lt;&#8211;<\/span><\/strong><\/h2>\n<p>(If I could make it flash, I would)<\/p>\n<h2>Reliability<\/h2>\n<p>First, let\u2019s say we\u2019ve decided to measure alcoholism by asking people to respond to the following question: Have you ever had a problem with alcohol? If we measure alcoholism this way, then it is likely that anyone who identifies as an alcoholic would respond \u201cyes.\u201d This may seem like a good way to identify our group of interest, but think about how you and your peer group may respond to this question. 
Would participants respond differently after a wild night out, compared to any other night? Could an infrequent drinker\u2019s current headache from last night\u2019s glass of wine influence how they answer the question this morning? How would that same person respond to the question before consuming the wine? In each case, the same person might respond differently to the same question at different points, so it is possible that our measure of alcoholism has a reliability problem.&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_589\">Reliability<\/a><\/strong>&nbsp;in measurement is about consistency.<\/p>\n<p>One common reliability problem with social scientific measures is memory. If we ask research participants to recall some aspect of their own past behavior, we should try to make the recollection process as simple and straightforward for them as possible. Sticking with the topic of alcohol intake, if we ask respondents how much wine, beer, and liquor they\u2019ve consumed each day over the course of the past 3 months, how likely are we to get accurate responses? Unless a person keeps a journal documenting their intake, there will very likely be some inaccuracies in their responses. On the other hand, we might get more accurate responses if we ask a participant how many drinks of any kind they have consumed in the past week.<\/p>\n<p>Reliability can be an issue even when we\u2019re not reliant on others to accurately report their behaviors. Perhaps a researcher is interested in observing how alcohol intake influences interactions in public locations. They may decide to conduct observations at a local pub by noting how many drinks patrons consume and how their behavior changes as their intake changes. What if the researcher has to use the restroom, and the patron next to them takes three shots of tequila during the brief period the researcher is away from their seat? 
The reliability of this researcher\u2019s measure of alcohol intake depends on their ability to physically observe every instance of patrons consuming drinks. If they are unlikely to be able to observe every such instance, then perhaps their mechanism for measuring this concept is not reliable.<\/p>\n<p>The following subsections describe the types of reliability that are important for you to know about, but keep in mind that you may see other approaches to judging reliability mentioned in the empirical literature.<\/p>\n<h3>Test-retest reliability<\/h3>\n<p>When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_653\"><strong>Test-retest reliability<\/strong><\/a> is the extent to which this is actually the case. For example, intelligence is generally thought to be consistent across time. A person who is highly intelligent today will be highly intelligent next week. This means that any good measure of intelligence should produce roughly the same scores for this individual next week as it does today. Clearly, a measure that produces highly inconsistent scores over time cannot be a very good measure of a construct that is supposed to be consistent (Whoops&#8230; pro-tip. Did you know the human race has been getting smarter over the past century<a class=\"footnote\" title=\"Trahan, L. H., Stuebing, K. K., Fletcher, J. M., &amp; Hiscock, M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332\u20131360. https:\/\/doi.org\/10.1037\/a0037173\" id=\"return-footnote-131-13\" href=\"#footnote-131-13\" aria-label=\"Footnote 13\"><sup class=\"footnote\">[13]<\/sup><\/a>?).<\/p>\n<p>Assessing test-retest reliability requires using the measure on a group of people at one time and then using it again on the&nbsp;<em>same<\/em> group of people at a later time. 
Unlike an experiment, you aren&#8217;t giving participants an intervention but trying to establish a reliable baseline of the variable you are measuring. Once you have these two measurements, you then look at the correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing the correlation coefficient. Figure 11.2 shows the correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered twice, a week apart. The correlation coefficient for these data is +.95. In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.<\/p>\n<p>&nbsp;<\/p>\n<figure id=\"attachment_130\" aria-describedby=\"caption-attachment-130\" style=\"width: 902px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-3152 size-full\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/5.2.png\" alt=\"A scatterplot with scores at time 1 on the x-axis and scores at time 2 on the y-axis, both ranging from 0 to 30. The dots on the scatter plot indicate a strong, positive correlation.\" width=\"902\" height=\"448\" \/><figcaption id=\"caption-attachment-130\" class=\"wp-caption-text\">Figure 11.2 Test-retest correlation between two sets of scores of several university students on the Rosenberg Self-Esteem Scale, administered twice, a week apart<\/figcaption><\/figure>\n<p>Again, high test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for intelligence, self-esteem, and the Big Five personality dimensions. But other constructs are not assumed to be stable over time. The very nature of mood, for example, is that it changes. 
So a measure of mood that produced a low test-retest correlation over a period of a month would not be a cause for concern.<\/p>\n<h3>Internal consistency<\/h3>\n<p>Another kind of reliability is <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_725\"><strong>internal consistency<\/strong><\/a>, which is the consistency of people\u2019s responses across the items on a multiple-item measure. In general, all the items on such measures are supposed to reflect the same underlying construct, so people\u2019s scores on those items should be correlated with each other. On the Rosenberg Self-Esteem Scale, people who agree that they are a person of worth should tend to agree that they have a number of good qualities. If people\u2019s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct. This is as true for behavioral and physiological measures as for self-report measures. For example, people might make a series of bets in a simulated game of roulette as a measure of their level of risk seeking. This measure would be internally consistent to the extent that individual participants\u2019 bets were consistently high or low across trials. A statistic known as Cronbach\u2019s alpha provides a way to measure how well each item of a scale is related to the others.<\/p>\n<h3>Interrater reliability<\/h3>\n<p>Many behavioral measures involve significant judgment on the part of an observer or a rater. <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_649\"><strong>Interrater reliability<\/strong><\/a> is the extent to which different observers are consistent in their judgments. 
For example, if you were interested in measuring university students\u2019 social skills, you could make video recordings of them as they interacted with another student whom they are meeting for the first time. Then you could have two or more observers watch the videos and rate each student\u2019s level of social skills. To the extent that each participant does, in fact, have some level of social skills that can be detected by an attentive observer, different observers\u2019 ratings should be highly correlated with each other.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-127\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-1024x683.jpg\" alt=\"\" width=\"1024\" height=\"683\" srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-1024x683.jpg 1024w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-300x200.jpg 300w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-768x512.jpg 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-1536x1024.jpg 1536w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-65x43.jpg 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-225x150.jpg 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920-350x233.jpg 350w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/dartboard-5518055_1920.jpg 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h2>Validity<\/h2>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" 
aria-describedby=\"definition\" href=\"#term_131_590\"><strong>Validity<\/strong><\/a>, another key element of assessing measurement quality, is the extent to which the scores from a measure represent the variable they are intended to measure. But how do researchers make this judgment? We have already considered one factor that they take into account\u2014reliability. When a measure has good test-retest reliability and internal consistency, researchers should be more confident that the scores represent what they are supposed to. There has to be more to it, however, because a measure can be extremely reliable but have no validity whatsoever. For example, think about a math test of story problems designed to evaluate addition skill. If the story problems were written at the fifth grade reading level, but given to a first grade class, they would be reliable (students would consistently fail) but not valid (you wouldn&#8217;t get an accurate understanding of the students&#8217; mathematical ability).<\/p>\n<p>Discussions of validity usually divide it into several distinct \u201ctypes.\u201d But a good way to interpret these types is that they are other kinds of evidence\u2014in addition to reliability\u2014that should be taken into account when judging the validity of a measure.<\/p>\n<h3>Face validity<\/h3>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_643\"><strong>Face validity<\/strong><\/a> is the extent to which a measurement method appears \u201con its face\u201d to measure the construct of interest. Most people would expect a self-esteem questionnaire to include items about whether they see themselves as a person of worth and whether they think they have good qualities. So a questionnaire that included these kinds of items would have good face validity. 
Although face validity can be assessed quantitatively\u2014for example, by having a large sample of people rate a measure in terms of whether it appears to measure what it is intended to\u2014it is usually assessed informally.<\/p>\n<p>Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people\u2019s intuitions about human behavior, which are frequently wrong. Math teachers might look at our test of story problems and see them as measuring addition skills, yet not realize the story problems are all written using language that is too complex for first grade students to grasp.<\/p>\n<h3>Content validity<\/h3>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_644\"><strong>Content validity<\/strong><\/a> is the extent to which a measure \u201ccovers\u201d the construct of interest. For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then their measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something. By this conceptual definition, a person has a positive attitude toward exercise to the extent that they think positive thoughts about exercising, feel good about exercising, and actually exercise. So to have good content validity, a measure of people\u2019s attitudes toward exercise would have to reflect all three of these aspects. Like face validity, content validity is not usually assessed quantitatively. 
Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.<\/p>\n<h3><b><\/b>Criterion validity<\/h3>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_647\"><strong>Criterion validity<\/strong><\/a> is the extent to which people\u2019s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people\u2019s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam. If it were found that people\u2019s scores were in fact negatively correlated with their exam performance, then this would be a piece of evidence that these scores really represent people\u2019s test anxiety. But if it were found that people scored equally well on the exam regardless of their test anxiety scores, then this would cast doubt on the validity of the measure.<\/p>\n<p>A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam. Or imagine that a researcher develops a new measure of physical risk taking. People\u2019s scores on this measure should be correlated with their participation in \u201cextreme\u201d activities such as snowboarding and rock climbing, the number of speeding tickets they have received, and even the number of broken bones they have had over the years. 
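If it helps to see the arithmetic behind such a criterion check, here is a minimal Python sketch. The anxiety and exam scores are invented for illustration, and `pearson_r` is a hand-rolled helper rather than a function from any particular statistics package:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: higher test-anxiety scores paired with lower exam scores.
anxiety = [10, 25, 40, 55, 70]
exam = [95, 88, 80, 72, 60]

r = pearson_r(anxiety, exam)
print(round(r, 2))  # strongly negative, consistent with criterion validity
```

A strong negative correlation here would count as evidence for criterion validity; a correlation near zero would cast doubt on the measure.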
When the criterion is measured at the same time as the construct, criterion validity is referred to as <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_646\"><strong>concurrent validity<\/strong><\/a>; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_645\"><strong>predictive validity<\/strong><\/a> (because scores on the measure have \u201cpredicted\u201d a future outcome).<\/p>\n<h3>Discriminant validity<\/h3>\n<p><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_726\"><strong>Discriminant validity<\/strong><\/a>, on the other hand, is the extent to which scores on a measure are <em>not<\/em>&nbsp;correlated with measures of variables that are conceptually distinct. For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people\u2019s scores on a new measure of self-esteem should not be very highly correlated with their moods. If the new measure of self-esteem were highly correlated with a measure of mood, it could be argued that the new measure is not really measuring self-esteem; it is measuring mood instead.<\/p>\n<h2>Increasing the reliability and validity of measures<\/h2>\n<p>We have reviewed how to evaluate our measures based on reliability and validity considerations. However, what can we do while selecting or creating our tool so that we minimize the potential for error? Many of our options were covered in our discussion about reliability and validity. Nevertheless, the following checklist provides a quick summary of things that you should do when creating or selecting a measurement tool. 
While not all of these steps will be feasible in every project, implement those that are practical in your research context.<\/p>\n<p>Make sure that you engage in a rigorous literature review so that you understand the concept that you are studying. This means understanding the different ways that your concept may manifest itself. This review should include a search for existing instruments.<a class=\"footnote\" title=\"Sullivan G. M. (2011). A primer on the validity of assessment instruments. Journal of graduate medical education, 3(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1\" id=\"return-footnote-131-14\" href=\"#footnote-131-14\" aria-label=\"Footnote 14\"><sup class=\"footnote\">[14]<\/sup><\/a><\/p>\n<ul>\n<li>Do you understand all the dimensions and content areas of your concept(s)?<\/li>\n<li>What instruments exist? How many items are on the existing instruments? Are these instruments appropriate for your population?<\/li>\n<li>Are these instruments standardized? Note: If an instrument is standardized, that means it has been rigorously studied and tested.<\/li>\n<\/ul>\n<p>Consult content experts to review your instrument. This is a good way to check the face validity of your items. Additionally, content experts can help you assess content validity.<a class=\"footnote\" title=\"Sullivan G. M. (2011). A primer on the validity of assessment instruments. Journal of graduate medical education, 3(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1\" id=\"return-footnote-131-15\" href=\"#footnote-131-15\" aria-label=\"Footnote 15\"><sup class=\"footnote\">[15]<\/sup><\/a><\/p>\n<ul>\n<li>Do you have access to a reasonable number of content experts? 
If not, how can you locate them?<\/li>\n<li>Did you provide a list of critical questions for your content reviewers to use in the reviewing process?<\/li>\n<\/ul>\n<p>Pilot test your instrument on a sufficient number of people and get detailed feedback.<a class=\"footnote\" title=\"Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.\" id=\"return-footnote-131-16\" href=\"#footnote-131-16\" aria-label=\"Footnote 16\"><sup class=\"footnote\">[16]<\/sup><\/a> Ask your group to provide feedback on the wording and clarity of items. Keep detailed notes and make adjustments BEFORE you administer your final tool.<\/p>\n<ul>\n<li>How many people will you use in your pilot testing?<\/li>\n<li>How will you set up your pilot testing so that it mimics the actual process of administering your tool?<\/li>\n<li>How will you receive feedback from your pilot testing group? Have you provided a list of questions for your group to think about?<\/li>\n<\/ul>\n<p>Provide training for anyone collecting data for your project.<a class=\"footnote\" title=\"Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.\" id=\"return-footnote-131-17\" href=\"#footnote-131-17\" aria-label=\"Footnote 17\"><sup class=\"footnote\">[17]<\/sup><\/a> You should provide those helping you with a written research protocol that explains all of the steps of the project. You should also problem solve and answer any questions that those helping you may have. This will increase the chances that your tool will be administered in a consistent manner.<\/p>\n<ul>\n<li>How will you conduct your orientation\/training? How long will it be? What modality?<\/li>\n<li>How will you select those who will administer your tool? What qualifications do they need?<\/li>\n<\/ul>\n<p>When thinking of items, use a higher level of measurement, if possible.<a class=\"footnote\" title=\"Engel, R. &amp; Schutt, R. (2013). 
The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.\" id=\"return-footnote-131-18\" href=\"#footnote-131-18\" aria-label=\"Footnote 18\"><sup class=\"footnote\">[18]<\/sup><\/a> This will provide more information and you can always downgrade to a lower level of measurement later.<\/p>\n<ul>\n<li>Have you examined your items and the levels of measurement?<\/li>\n<li>Have you thought about whether you need to modify the type of data you are collecting? Specifically, are you asking for information that is too specific (at a higher level of measurement) which may reduce participants&#8217; willingness to participate?<\/li>\n<\/ul>\n<p>Use multiple indicators for a variable.<a class=\"footnote\" title=\"Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.\" id=\"return-footnote-131-19\" href=\"#footnote-131-19\" aria-label=\"Footnote 19\"><sup class=\"footnote\">[19]<\/sup><\/a> Think about the number of items that you will include in your tool.<\/p>\n<ul>\n<li>Do you have enough items? Enough indicators? The correct indicators?<\/li>\n<\/ul>\n<p>Conduct an item-by-item assessment of multiple-item measures.<a class=\"footnote\" title=\"Engel, R. &amp; Schutt, R. (2013). The practice of research in social work (3rd. ed.). Thousand Oaks, CA: SAGE.\" id=\"return-footnote-131-20\" href=\"#footnote-131-20\" aria-label=\"Footnote 20\"><sup class=\"footnote\">[20]<\/sup><\/a> When you do this assessment, think about each word and how it changes the meaning of your item.<\/p>\n<ul>\n<li>Are there items that are redundant? 
Do you need to modify, delete, or add items?<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-128\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-1024x767.jpg\" alt=\"\" width=\"1024\" height=\"767\" srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-1024x767.jpg 1024w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-300x225.jpg 300w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-768x576.jpg 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-1536x1151.jpg 1536w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-65x49.jpg 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-225x169.jpg 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920-350x262.jpg 350w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/error-63628_1920.jpg 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/p>\n<h2>Types of error<\/h2>\n<p>As you can see, measures never perfectly describe what exists in the real world. Good measures demonstrate validity and reliability but will always have some degree of error. <strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_382\">Systematic error<\/a><\/strong> (also called bias) causes our measures to consistently output incorrect data in one direction or another on a measure, usually due to an identifiable process. Imagine you created a measure of height, but you didn\u2019t put an option for anyone over six feet tall. 
If you gave that measure to your local college or university, some of the taller students might not be measured accurately. In fact, you would be under the mistaken impression that the tallest person at your school was six feet tall, when in actuality there are likely people taller than six feet at your school. This error seems innocent, but if you were using that measure to help you build a new building, those people might hit their heads!<\/p>\n<p>A less innocent form of error arises when researchers word questions in a way that might cause participants to think one answer choice is preferable to another. For example, if I were to ask you \u201cDo you think global warming is caused by human activity?\u201d you would probably feel comfortable answering honestly. But what if I asked you \u201cDo you agree with 99% of scientists that global warming is caused by human activity?\u201d Would you feel comfortable saying no, if that\u2019s what you honestly felt? I doubt it. That is an example of a&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_727\">leading question<\/a><\/strong>, a question with wording that influences how a participant responds. We\u2019ll discuss leading questions and other problems in question wording in greater detail in <a href=\"https:\/\/pressbooks.rampages.us\/msw-research\/chapter\/12-survey-design\/\">Chapter 12<\/a>.<\/p>\n<p>In addition to error created by the researcher, your participants can cause error in measurement. Some people will respond without fully understanding a question, particularly if the question is worded in a confusing way. Let\u2019s consider another potential source of error. If we asked people if they always washed their hands after using the bathroom, would we expect people to be perfectly honest? 
Polling people about whether they wash their hands after using the bathroom might only elicit what people would like others to think they do, rather than what they actually do. This is an example of&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_343\">social desirability bias<\/a><\/strong>, in which participants in a research study want to present themselves in a positive, socially desirable way to the researcher. People in your study will want to seem tolerant, open-minded, and intelligent, but their true feelings may be closed-minded, simple, and biased. Participants may lie in this situation. This occurs often in political polling, which may show greater support for a candidate from a minority race, gender, or political party than actually exists in the electorate.<\/p>\n<p>A related form of bias is called&nbsp;<strong><a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_728\">acquiescence bias<\/a><\/strong>, also known as \u201cyea-saying.\u201d It occurs when people say yes to whatever the researcher asks, even when doing so contradicts previous answers. For example, a person might say yes to both \u201cI am a confident leader in group discussions\u201d and \u201cI feel anxious interacting in group discussions.\u201d Those two responses are unlikely to both be true for the same person. Why would someone do this? Similar to social desirability, people want to be agreeable and nice to the researcher asking them questions, or they might ignore contradictory feelings when responding to each question. You could interpret this as someone saying &#8220;yeah, I guess.&#8221; Respondents may also have cultural reasons for agreeing, such as trying to \u201csave face\u201d for themselves or the person asking the questions. 
Regardless of the reason, the results of your measure don\u2019t match what the person truly feels.<\/p>\n<p>So far, we have discussed sources of error that come from choices made by respondents or researchers. Systematic errors will result in responses that are incorrect in one direction or another. For example, social desirability bias usually means that the number of people who <em>say<\/em>&nbsp;they will vote for a third party in an election is greater than the number of people who actually vote for that party. Systematic errors such as these can be reduced, but random error can never be eliminated. Unlike systematic error, which biases responses consistently in one direction or another,&nbsp;<strong><a class="glossary-term" aria-haspopup="dialog" aria-describedby="definition" href="#term_131_378">random error<\/a><\/strong>&nbsp;is unpredictable and does not result in scores that are consistently higher or lower on a given measure. Instead, random error is more like statistical noise, which will likely average out across participants.<\/p>\n<p>Random error is present in any measurement. If you\u2019ve ever stepped on a bathroom scale twice and gotten two slightly different results, maybe a difference of a tenth of a pound, then you\u2019ve experienced random error. Maybe you were standing slightly differently or had a fraction of your foot off of the scale the first time. If you were to take enough measures of your weight on the same scale, you\u2019d be able to figure out your true weight. In social science, if you gave someone a scale measuring motivation on a day after they lost their job, they would likely score differently than if they had just gotten a promotion and a raise. Thus, social scientists speak with humility about our measures. 
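<\/p>
<p>The way random error averages out can be illustrated with a short simulation. This is a toy sketch, not data from any real study: we assume a hypothetical \u201ctrue\u201d weight of 150 pounds and add a small, unpredictable error to each bathroom-scale reading.<\/p>

```python
import random

random.seed(42)  # make the illustration reproducible

# Hypothetical setup: a person's true weight is 150.0 lbs, and each
# scale reading is off by a small, unpredictable amount (up to 0.3 lbs
# in either direction).
true_weight = 150.0
readings = [true_weight + random.uniform(-0.3, 0.3) for _ in range(10_000)]

# Any single reading can be wrong, but because the errors are random
# rather than systematic, they cancel out: the average of many readings
# lands very close to the true value.
average = sum(readings) / len(readings)
print(round(average, 1))
```

<p>A systematic error, by contrast, would not cancel out: if the scale always read half a pound heavy, the average of the readings would settle at 150.5 no matter how many times we stepped on it.<\/p>
<p>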
We are reasonably confident that what we found is true, but we must always acknowledge that our measures are only an approximation of reality.<\/p>\n<p>Humility is important in scientific measurement, as errors can have real consequences. At the time I&#8217;m writing this, I tested positive for COVID. Like most people, I used a home test from the pharmacy. If the test said I was positive when I was not, that would be a <strong><a class="glossary-term" aria-haspopup="dialog" aria-describedby="definition" href="#term_131_381">false positive<\/a><\/strong>. On the other hand, if the test indicated that I was not positive when I was in fact ill, that would be a&nbsp;<strong><a class="glossary-term" aria-haspopup="dialog" aria-describedby="definition" href="#term_131_380">false negative<\/a><\/strong>. Even if the test is 99% accurate, that means that one in a hundred testers will get an erroneous result when they use the test. For me, a false negative would have been a relief at first, then devastating when I found out I was ill. A false positive would have been worrisome at first and then quite a relief when I discovered I wasn&#8217;t sick with COVID. 
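<\/p>
<p>A little arithmetic shows why even a \u201c99% accurate\u201d test produces a meaningful number of wrong results. The numbers below are hypothetical and not taken from any real test&#8217;s specifications: we assume 1,000 testers, 5% of whom are truly infected, and a test that errs 1% of the time in each direction.<\/p>

```python
# Hypothetical back-of-envelope calculation (assumed numbers, not real
# test specifications): 1,000 people take a home test that is wrong
# 1% of the time, and 50 of them (5%) are truly infected.
testers = 1000
truly_positive = 50
truly_negative = testers - truly_positive
error_rate = 0.01

# False negatives: truly infected people whose tests read negative.
false_negatives = truly_positive * error_rate

# False positives: uninfected people whose tests read positive.
false_positives = truly_negative * error_rate

erroneous = false_negatives + false_positives
print(false_negatives, false_positives, erroneous)
```

<p>Under these assumptions, about 10 of the 1,000 testers get a wrong result, and most of those errors are false positives, simply because far more of the testers were uninfected to begin with. This is one reason the consequences of each kind of error matter when evaluating a measure.<\/p>
<p>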
While false positives and false negatives are both unlikely for home COVID tests (when taken correctly), measurement error can have consequences for the people being measured.<\/p>\n<div class="textbox key-takeaways">\n<h3>Key Takeaways<\/h3>\n<ul>\n<li>Reliability is a matter of consistency.<\/li>\n<li>Validity is a matter of accuracy.<\/li>\n<li>There are many types of validity and reliability.<\/li>\n<li>Systematic error may arise from the researcher, participant, or measurement instrument.<\/li>\n<li>Systematic error biases results in a particular direction, whereas random error can be in any direction.<\/li>\n<li>All measures are prone to error and should be interpreted with humility.<\/li>\n<\/ul>\n<\/div>\n<div class="textbox exercises">\n<h3>Exercises<\/h3>\n<p>Use the measurement tools you located in the previous exercise. Evaluate the reliability and validity of these tools. Hint: You will need to go into the literature to &#8220;research&#8221; these tools.<\/p>\n<ul>\n<li>Provide a clear statement regarding the reliability and validity of these tools. What strengths did you notice? What were the limitations?<\/li>\n<li>Think about your <a class="glossary-term" aria-haspopup="dialog" aria-describedby="definition" href="#term_131_621"><strong>target population<\/strong><\/a>. 
Are there changes that need to be made in order for one of these tools to be appropriate for your population?<\/li>\n<li>If you decide to create your own tool, how will you assess its validity and reliability?<a id="11.4"><\/a><\/li>\n<\/ul>\n<\/div>\n<h1>11.4 Ethical and social justice considerations<\/h1>\n<div class="textbox learning-objectives">\n<h3>Learning Objectives<\/h3>\n<p>Learners will be able to&#8230;<\/p>\n<ul>\n<li>Identify potential cultural, ethical, and social justice issues in measurement.<\/li>\n<\/ul>\n<\/div>\n<p>With your variables operationalized, it&#8217;s time to take a step back and look at how measurement in social science impacts our daily lives. As we will see, how we measure things is shaped by power arrangements inside our society; more insidiously, by establishing what counts as scientifically true, measures have their own power to influence the world. Just like reification in the conceptual world, how we operationally define concepts can reinforce or fight against oppressive forces.<\/p>\n<p><img loading="lazy" decoding="async" class="aligncenter size-large wp-image-4181" src="https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/mitchell-griest-ImgBdiGAl4c-unsplash-scaled-1.jpg" alt="" width="1024" height="781" \/><\/p>\n<h2>Data equity<\/h2>\n<p>How we decide to measure our variables determines what kind of data we end up with in our research project. Because scientific processes are a part of our sociocultural context, the same biases and oppressions we see in the real world can be manifested or even magnified in research data. Jagadish and colleagues (2021)<a class="footnote" title="Jagadish, H. V., Stoyanovich, J., &amp; Howe, B. (2021). COVID-19 Brings Data Equity Challenges to the Fore. 
Digital Government: Research and Practice,\u00a02(2), 1-7." id="return-footnote-131-21" href="#footnote-131-21" aria-label="Footnote 21"><sup class="footnote">[21]<\/sup><\/a> present four dimensions of data equity that are relevant to consider: equity in the representation of non-dominant groups within data sets; in how data are collected, analyzed, and combined across datasets; in equitable and participatory access to data; and, finally, in the outcomes associated with the data collection. Historically, we have mostly focused on measures producing outcomes that are biased in one way or another, and this section reviews many such examples. However, it is important to note that equity must also come from designing measures that respond to questions like:<\/p>\n<ol>\n<li>Are groups historically suppressed from the data record represented in the sample?<\/li>\n<li>Are equity data gathered by researchers and used to uncover and quantify inequity?<\/li>\n<li>Are the data accessible across domains and levels of expertise, and can community members participate in the design, collection, and analysis of the public data record?<\/li>\n<li>Are the data collected used to monitor and mitigate inequitable impacts?<\/li>\n<\/ol>\n<p>So, it&#8217;s not just about whether measures work for one population or another. Data equity is about the entire context in which data are created when we measure people and things. We agree with these authors that data equity should be considered within the context of automated decision-making systems, and we recognize a broader literature on the role of administrative systems in creating and reinforcing discrimination. 
To combat the inequitable processes and outcomes we describe below, researchers must foreground equity as a core component of measurement.<\/p>\n<h2>Flawed measures &amp; missing measures<\/h2>\n<p>At the end of every semester, students in just about every university classroom in North America complete similar student evaluations of teaching (SETs). Since every student is likely familiar with these, we can recognize many of the concepts we discussed in the previous sections. There are a number of rating scale questions that ask you to rate the professor, class, and teaching effectiveness on a scale of 1-5. Scores are averaged across students and used to determine the quality of teaching delivered by the faculty member. SETs scores are often a principal component of how faculty are reappointed to teaching positions. Would it surprise you to learn that student evaluations of teaching are of questionable quality? If your instructors are assessed with a biased or incomplete measure, how might that impact your education?<\/p>\n<p>Most often, student scores are averaged across questions and reported as a final average. This average is used as one factor, often the most important factor, in a faculty member&#8217;s reappointment to teaching roles. We learned in this chapter that rating scales are ordinal, not interval or ratio, and the data are categories, not numbers. Although rating scales use a familiar 1-5 scale, the numbers 1, 2, 3, 4, &amp; 5 are really just helpful labels for categories like &#8220;excellent&#8221; or &#8220;strongly agree.&#8221; If we relabeled these categories as letters (A-E) rather than as numbers (1-5), how would you average them?<\/p>\n<p>Averaging ordinal data is methodologically dubious, as the numbers are merely a useful convention. 
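<\/p>
<p>A small, hypothetical example (not data from any real course evaluation) shows why the choice of summary statistic matters for rating-scale data:<\/p>

```python
from statistics import mean, median

# Hypothetical SET ratings from a ten-student class, where the labels
# run from 1 ("poor") to 5 ("excellent"): nine students choose 4,
# and one very unhappy student chooses 1.
ratings = [4, 4, 4, 4, 4, 4, 4, 4, 4, 1]

# Treating the category labels as numbers and averaging lets the single
# outlier drag the summary score well below what any typical student said.
print(mean(ratings))    # 3.7

# The median reports the response of the "middle" student and is
# unaffected by the lone extreme rating.
print(median(ratings))  # 4.0
```

<p>Nine out of ten students chose the second-highest category, yet the averaged score of 3.7 suggests a middling course, while the median of 4.0 matches what the typical student actually reported.<\/p>
<p>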
As you will learn in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/14-univariate-analysis\/\">Chapter 14<\/a>, taking the <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_393\"><strong>median<\/strong><\/a> value is what makes the most sense with ordinal data. Median values are also less sensitive to outliers. So, a single student who has strong negative or positive feelings towards the professor could bias the class&#8217;s SETs scores higher or lower than what the &#8220;average&#8221; student in the class would say, particularly for classes with few students or in which fewer students completed evaluations of their teachers.<\/p>\n<p>We care about teaching quality because more effective teachers will produce more knowledgeable and capable students. However, student evaluations of teaching are not particularly good indicators of teaching quality and are not associated with the independently measured learning gains of students (i.e., test scores, final grades) (Uttl et al., 2017).<a class=\"footnote\" title=\"Uttl, B., White, C. A., &amp; Gonzalez, D. W. (2017). Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. Studies in Educational Evaluation,\u00a054, 22-42.\" id=\"return-footnote-131-22\" href=\"#footnote-131-22\" aria-label=\"Footnote 22\"><sup class=\"footnote\">[22]<\/sup><\/a> This speaks to the lack of criterion validity. Higher teaching quality should be associated with better learning outcomes for students, but across multiple studies stretching back years, there is no association that cannot be better explained by other factors. To be fair, there are scholars who find that SETs are valid and reliable. 
For a thorough <a href="https:\/\/www.academia.edu\/31896041\/Student_Ratings_of_Instruction_in_College_and_University_Courses">defense of SETs as well as a historical summary of the literature<\/a>, see Benton &amp; Cashin (2012).<a class="footnote" title="Benton, S. L., &amp; Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In Higher education: Handbook of theory and research\u00a0(pp. 279-326). Springer, Dordrecht." id="return-footnote-131-23" href="#footnote-131-23" aria-label="Footnote 23"><sup class="footnote">[23]<\/sup><\/a><\/p>\n<p>Even though student evaluations of teaching often contain dozens of questions, researchers often find that the questions are so highly interrelated that one concept (or factor, as it is called in a <a href="https:\/\/stats.idre.ucla.edu\/spss\/seminars\/introduction-to-factor-analysis\/a-practical-introduction-to-factor-analysis\/">factor analysis<\/a>) explains a large portion of the variance in teachers&#8217; scores on student evaluations (Clayson, 2018).<a class="footnote" title="Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability.\u00a0Assessment &amp; Evaluation in Higher Education,\u00a043(4), 666-681." id="return-footnote-131-24" href="#footnote-131-24" aria-label="Footnote 24"><sup class="footnote">[24]<\/sup><\/a> Personally, based on completing SETs myself, I believe this factor is probably best conceptualized as student satisfaction, which is obviously worthwhile to measure, but is conceptually quite different from teaching effectiveness or whether a course achieved its intended outcomes. The lack of a clear operational and conceptual definition for the variable or variables being measured in student evaluations of teaching also speaks to a lack of content validity. 
Researchers check content validity by comparing the measurement method with the conceptual definition, but without a clear conceptual definition of the concept measured by student evaluations of teaching, it&#8217;s not clear how we can know our measure is valid. Indeed, the lack of clarity around what is being measured in teaching evaluations impairs students&#8217; ability to provide reliable and valid evaluations. So, while many researchers argue that the class average SETs scores are reliable in that they are consistent over time and across classes, it is unclear what exactly is being measured even if it is consistent (Clayson, 2018).<a class=\"footnote\" title=\"Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. Assessment &amp; Evaluation in Higher Education,\u00a043(4), 666-681.\" id=\"return-footnote-131-25\" href=\"#footnote-131-25\" aria-label=\"Footnote 25\"><sup class=\"footnote\">[25]<\/sup><\/a><\/p>\n<p>As a faculty member, there are a number of things I can do to influence my evaluations and disrupt validity and reliability. Since SETs scores are associated with the grades students perceive they will receive (e.g., Boring et al., 2016),<a class=\"footnote\" title=\"Boring, A., Ottoboni, K., &amp; Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness.\u00a0ScienceOpen Research.\" id=\"return-footnote-131-26\" href=\"#footnote-131-26\" aria-label=\"Footnote 26\"><sup class=\"footnote\">[26]<\/sup><\/a> guaranteeing everyone a final grade of A in my class will likely increase my SETs scores and my chances at tenure and promotion. I could time an email reminder to complete SETs with releasing high grades for a major assignment to boost my evaluation scores. On the other hand, student evaluations might be coincidentally timed with poor grades or difficult assignments that will bias student evaluations downward. 
Students may also infer I am manipulating them and give me lower SET scores as a result. To maximize my SET scores and chances at promotion, I also need to select which courses I teach carefully. Classes that are more quantitatively oriented generally receive lower ratings than more qualitative and humanities-driven classes, which makes my decision to teach social work research a poor strategy (Uttl &amp; Smibert, 2017).<a class="footnote" title="Uttl, B., &amp; Smibert, D. (2017). Student evaluations of teaching: teaching quantitative courses can be hazardous to one\u2019s career. PeerJ,\u00a05, e3299." id="return-footnote-131-27" href="#footnote-131-27" aria-label="Footnote 27"><sup class="footnote">[27]<\/sup><\/a> The only manipulative strategy I will admit to using is bringing food (usually cookies or donuts) to class during the period in which students are completing evaluations. <a href="https:\/\/pubmed.ncbi.nlm.nih.gov\/29956364\/">Measurement is impacted by context<\/a>&nbsp;(cookies get me better scores!).<\/p>\n<p>As a white cis-gender male educator, I am adversely impacted by SETs because of their sketchy validity, reliability, and methodology. The other flaws with student evaluations actually help me while disadvantaging teachers from oppressed groups. <a href="https:\/\/www.researchgate.net\/profile\/Troy-Heffernan\/publication\/349864729_Sexism_racism_prejudice_and_bias_a_literature_review_and_synthesis_of_research_surrounding_student_evaluations_of_courses_and_teaching\/links\/6046e75492851c077f27d53f\/Sexism-racism-prejudice-and-bias-a-literature-review-and-synthesis-of-research-surrounding-student-evaluations-of-courses-and-teaching.pdf">Heffernan (2021)<\/a><a class="footnote" title="Heffernan, T. (2021). 
Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching.\u00a0Assessment &amp; Evaluation in Higher Education, 1-11." id="return-footnote-131-28" href="#footnote-131-28" aria-label="Footnote 28"><sup class="footnote">[28]<\/sup><\/a> provides a comprehensive overview of the sexism, racism, ableism, and prejudice baked into student evaluations:<\/p>\n<blockquote><p>&#8220;In all studies relating to gender, the analyses indicate that the highest scores are awarded in subjects filled with young, white, male students being taught by white English first language speaking, able-bodied, male academics who are neither too young nor too old (approx. 35\u201350 years of age), and who the students believe are heterosexual. Most deviations from this scenario in terms of student and academic demographics equates to lower SET scores. These studies thus highlight that white, able-bodied, heterosexual, men of a certain age are not only the least affected, they benefit from the practice. When every demographic group who does not fit this image is significantly disadvantaged by SETs, these processes serve to further enhance the position of the already privileged&#8221; (p. 5).<\/p><\/blockquote>\n<p>The staggering consistency of studies examining prejudice in SETs has led to some rather superficial reforms, like using the written instructions given before SETs to remind students not to submit racist or sexist responses. Yet, even though we know that SETs are systematically biased against women, people of color, and people with disabilities, the overwhelming majority of universities in North America continue to use them to evaluate faculty for promotion or reappointment. From a critical perspective, it is worth considering why university administrators continue to use such a biased and flawed instrument. 
SETs produce data that make it easy to compare faculty to one another and track faculty members over time. Furthermore, they offer students a direct opportunity to voice their concerns and highlight what went well.<\/p>\n<p>Because students are the people with the greatest knowledge about what happened in the classroom and whether it met their expectations, providing them with open-ended questions is the most productive part of SETs. There is very rarely student input on the criteria and methodology for teaching evaluations, yet students are the most impacted by helpful or harmful teaching practices.<\/p>\n<p>Students should fight for better assessment in the classroom because well-designed assessments provide documentation to support more effective teaching practices and discourage unhelpful or discriminatory practices. Flawed assessments like SETs can lead to a lack of information about problems with courses, instructors, or other aspects of the program. Think critically about what data your program uses to gauge its effectiveness. How might you introduce areas of student concern into how your program evaluates itself? Are there issues with food or housing insecurity, mentorship of nontraditional and first-generation students, or other issues that faculty should consider when they evaluate their program? Finally, as you transition into practice, think about how your school measures its impact and how it privileges or excludes student, parent, and community voices in the assessment process.<\/p>\n<div class="textbox">\n<p>While writing this section, one of the authors wrote this <a href="https:\/\/osf.io\/preprints\/socarxiv\/bgk6n\/">commentary article<\/a> addressing potential racial bias in social work licensing exams. 
If you are interested in an example of missing or flawed measures that relates to systems <em>your<\/em> social work practice is governed by (rather than SETs which govern <i>our <\/i>practice in higher education) check it out!<\/p>\n<p>You may also be interested in similar <a href=\"https:\/\/www.jessestommel.com\/ungrading-an-faq\/\">arguments against the standard grading scale<\/a> (A-F), and why grades (numerical, letter, etc.) do not do a good job of measuring learning. Think critically about the role that grades play in your life as a student, your self-concept, and your relationships with teachers. Your test and grade anxiety is due in part to how your learning is measured. Those measurements end up becoming an official record of your scholarship and allow employers or funders to compare you to other scholars. The stakes for measurement are the same for participants in your research study.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-130\" src=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-1024x634.png\" alt=\"\" width=\"1024\" height=\"634\" srcset=\"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-1024x634.png 1024w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-300x186.png 300w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-768x475.png 768w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-65x40.png 65w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-225x139.png 225w, https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280-350x217.png 350w, 
https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-content\/uploads\/sites\/1753\/2022\/08\/man-5732103_1280.png 1280w" sizes="auto, (max-width: 1024px) 100vw, 1024px" \/><\/p>\n<h2>Self-reflection and measurement<\/h2>\n<p>Student evaluations of teaching are just like any other measure. How we decide to measure what we are researching is influenced by our backgrounds, including our culture, implicit biases, and individual experiences. For me as a middle-class, cisgender white man, if I don&#8217;t think carefully about it, the decisions I make about measurement will probably default to ones that make the most sense to me and others like me, and thus measure characteristics about people like us most accurately. There are major implications for research here because this could affect the validity of my measurements for other populations.<\/p>\n<p>This doesn&#8217;t mean that standardized scales or indices, for instance, won&#8217;t work for diverse groups of people. What it means is that researchers must not ignore difference in deciding how to measure a variable in their research. Doing so may serve to push already marginalized people further into the margins of academic research and, consequently, social work intervention. Social work researchers, with our strong orientation toward celebrating difference and working for social justice, are obligated to keep this in mind for ourselves and encourage others to think about it in their research, too.<\/p>\n<p>This involves reflecting on <em>what<\/em> we are measuring, <em>how<\/em> we are measuring, and <em>why<\/em> we are measuring. Do we have biases that impacted how we operationalized our concepts? 
Did we include <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_308\"><strong>stakeholders<\/strong><\/a> and <a class=\"glossary-term\" aria-haspopup=\"dialog\" aria-describedby=\"definition\" href=\"#term_131_285\"><strong>gatekeepers<\/strong><\/a> in the development of our concepts? This can be a way to gain access to vulnerable populations. What feedback did we receive on our measurement process and how was it incorporated into our work? These are all questions we should ask as we are thinking about measurement. Further, engaging in this intentionally reflective process will help us maximize the chances that our measurement will be accurate and as free from bias as possible.<\/p>\n<p>Unfortunately, social science researchers do not do a great job of sharing their measures in a way that allows practitioners and administrators to use them to evaluate the impact of interventions and programs on clients. Few scales are published under an open copyright license that allows other people to view it for free and share it with others. Instead, the best way to find a scale mentioned in an article is often to simply search for it in Google with &#8220;.pdf&#8221; or &#8220;.docx&#8221; in the query to see if someone posted a copy online (usually in violation of copyright law). As we discussed in <a href=\"https:\/\/sfuedl.pressbooks.com\/chapter\/4-critical-information-literacy\/\">Chapter 4<\/a>, this is an issue of information privilege, or the structuring impact of oppression and discrimination on groups&#8217; access to and use of scholarly information. As a student at a university with a research library, you can access the Mental Measurement Yearbook to look up scales and indexes that measure client or program outcomes while researchers unaffiliated with university libraries cannot do so. 
Similarly, the vast majority of scholarship in social work and allied disciplines does not share measures, data, or other research materials openly, which is a best practice in open and collaborative science. It is important to underscore these structural barriers to using valid and reliable scales in practice. An invalid or unreliable outcome test may cause ineffective or harmful programs to persist or may worsen existing prejudices and oppressions experienced by students, communities, and practitioners.<\/p>\n<p>But it&#8217;s not just about reflecting and identifying problems and biases in our measurement, operationalization, and conceptualization\u2014what are we going to&nbsp;<em>do<\/em> about it? Consider this as you move through this book and become a more critical consumer of research. Sometimes there isn&#8217;t something you can do in the immediate sense\u2014the literature base at this moment just is what it is. But how does that inform what you will do later?<\/p>\n<h2>A place to start: Stop oversimplifying race<\/h2>\n<p><span style="text-align: initial;background-color: initial;font-size: 1em">We will address many more of the critical issues related to measurement in the next chapter. One way to get started in bringing cultural awareness to scientific measurement is through a critical examination of how we analyze race quantitatively. There are many important methodological objections to how we measure the impact of race. We encourage you to watch Dr. Abigail Sewell&#8217;s three-part workshop series called &#8220;Nested Models for Critical Studies of Race &amp; Racism&#8221; for the Inter-university Consortium for Political and Social Research (ICPSR). She discusses how to operationalize and measure inequality, racism, and intersectionality and critiques researchers&#8217; attempts to oversimplify or overlook racism when we measure concepts in social science. 
If you are interested in developing your social work research skills further, consider applying for financial support from your university to attend an ICPSR summer seminar like Dr. Sewell&#8217;s where you can receive more advanced and specialized training in using research for social change. <\/span><\/p>\n<ul>\n<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/youtu.be\/04OZ3BFPpVg\">Part 1: Creating Measures of Supraindividual Racism<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/youtu.be\/pfcKQ_7O9FE\">Part 2: Evaluating Population Risks of Supraindividual Racism<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n<li><a style=\"text-align: initial;background-color: initial;font-size: 1em\" href=\"https:\/\/www.youtube.com\/watch?v=4OZL7fu2YkI\">Part 3: Quantifying Intersectionality<\/a><span style=\"text-align: initial;background-color: initial;font-size: 1em\"> (2-hour video)<\/span><\/li>\n<\/ul>\n<div class=\"textbox key-takeaways\">\n<h3>Key Takeaways<\/h3>\n<ul>\n<li>Researchers must be attentive to personal and institutional biases in the measurement process that affect marginalized groups.<\/li>\n<li>What is measured and how it is measured is shaped by power, and educators must be critical and self-reflective in their research projects.<\/li>\n<\/ul>\n<\/div>\n<div class=\"textbox exercises\">\n<h3>Exercises<\/h3>\n<p>Think about your current research question and the tool(s) that you will use to gather data. 
Even if you haven&#8217;t chosen your tools yet, think of some that you have encountered in the literature so far.<\/p>\n<ul>\n<li>How does your positionality and experience shape what variables you are choosing to measure and how you measure them?<\/li>\n<li>Evaluate the measures in your study for potential biases.<\/li>\n<li>If you are using measures developed by another researcher, investigate whether they are valid and reliable in other studies across cultures.<\/li>\n<\/ul>\n<\/div>\n<hr class="before-footnotes clear" \/><div class="footnotes"><ol><li id="footnote-131-1">Milkie, M. A., &amp; Warner, C. H. (2011). Classroom learning environments and the mental health of first grade children. <em>Journal of Health and Social Behavior, 52<\/em>, 4\u201322. <a href="#return-footnote-131-1" class="return-footnote" aria-label="Return to footnote 1">&crarr;<\/a><\/li><li id="footnote-131-2">Kaplan, A. (1964). <em>The conduct of inquiry: Methodology for behavioral science<\/em>. San Francisco, CA: Chandler Publishing Company. <a href="#return-footnote-131-2" class="return-footnote" aria-label="Return to footnote 2">&crarr;<\/a><\/li><li id="footnote-131-3">Earl Babbie offers a more detailed discussion of Kaplan\u2019s work in his text. You can read it in: Babbie, E. (2010). <em>The practice of social research<\/em> (12th ed.). Belmont, CA: Wadsworth. <a href="#return-footnote-131-3" class="return-footnote" aria-label="Return to footnote 3">&crarr;<\/a><\/li><li id="footnote-131-4">Kaplan, A. (1964). <em>The conduct of inquiry: Methodology for behavioral science<\/em>. San Francisco, CA: Chandler Publishing Company. <a href="#return-footnote-131-4" class="return-footnote" aria-label="Return to footnote 4">&crarr;<\/a><\/li><li id="footnote-131-5">In this chapter, we will use the terms concept and construct interchangeably. 
While each term has a distinct meaning in research conceptualization, we do not believe this distinction is important enough to warrant discussion in this chapter.  <a href="#return-footnote-131-5" class="return-footnote" aria-label="Return to footnote 5">&crarr;<\/a><\/li><li id="footnote-131-6">Wong, Y. J., Steinfeldt, J. A., Speight, Q. L., &amp; Hickman, S. J. (2010). Content analysis of Psychology of men &amp; masculinity (2000\u20132008).&nbsp;<i>Psychology of Men &amp; Masculinity<\/i>,&nbsp;<i>11<\/i>(3), 170. <a href="#return-footnote-131-6" class="return-footnote" aria-label="Return to footnote 6">&crarr;<\/a><\/li><li id="footnote-131-7">Kimmel, M. (2000).&nbsp;<em>The<\/em><em>&nbsp;gendered society<\/em>. New York, NY: Oxford University Press; Kimmel, M. (2008). Masculinity. In W. A. Darity Jr. (Ed.),&nbsp;<em>International<\/em><em>&nbsp;encyclopedia of the social sciences&nbsp;<\/em>(2nd ed., Vol. 5, p. 1\u20135). Detroit, MI: Macmillan Reference USA <a href="#return-footnote-131-7" class="return-footnote" aria-label="Return to footnote 7">&crarr;<\/a><\/li><li id="footnote-131-8">Kimmel, M. &amp; Aronson, A. B. (2004).&nbsp;<em>Men and masculinities: A-J<\/em>. Denver, CO: ABL-CLIO. <a href="#return-footnote-131-8" class="return-footnote" aria-label="Return to footnote 8">&crarr;<\/a><\/li><li id="footnote-131-9">That said, when using a Likert scale, which is an ordinal scale, many researchers will argue that averages, measures of variation, and parametric tests are appropriate. For more on this, see Sullivan, G. M., &amp; Artino, A. R., Jr (2013). Analyzing and interpreting data from likert-type scales. <i>Journal of graduate medical education<\/i>, <i>5<\/i>(4), 541\u2013542. <a href="https:\/\/doi.org\/10.4300\/JGME-5-4-18">https:\/\/doi.org\/10.4300\/JGME-5-4-18<\/a>&nbsp;and Norman G. (2010). Likert scales, levels of measurement and the "laws" of statistics. 
<i>Advances in Health Sciences Education: Theory and Practice<\/i>, <i>15<\/i>(5), 625\u2013632. <a href=\"https:\/\/doi.org\/10.1007\/s10459-010-9222-y\">https:\/\/doi.org\/10.1007\/s10459-010-9222-y<\/a> <a href=\"#return-footnote-131-9\" class=\"return-footnote\" aria-label=\"Return to footnote 9\">&crarr;<\/a><\/li><li id=\"footnote-131-10\">Krosnick, J. A., &amp; Berent, M. K. (1993). Comparisons of party identification and policy preferences: The impact of survey question format.&nbsp;<em>American Journal of Political Science, 27<\/em>(3), 941\u2013964. <a href=\"#return-footnote-131-10\" class=\"return-footnote\" aria-label=\"Return to footnote 10\">&crarr;<\/a><\/li><li id=\"footnote-131-11\">Likert, R. (1932). A technique for the measurement of attitudes.&nbsp;<em>Archives of Psychology, 140<\/em>, 1\u201355. <a href=\"#return-footnote-131-11\" class=\"return-footnote\" aria-label=\"Return to footnote 11\">&crarr;<\/a><\/li><li id=\"footnote-131-12\">Stevens, S. S. (1946). On the theory of scales of measurement.&nbsp;<i>Science<\/i>,&nbsp;<i>103<\/i>(2684), 677\u2013680. <a href=\"#return-footnote-131-12\" class=\"return-footnote\" aria-label=\"Return to footnote 12\">&crarr;<\/a><\/li><li id=\"footnote-131-13\">Trahan, L. H., Stuebing, K. K., Fletcher, J. M., &amp; Hiscock, M. (2014). The Flynn effect: A meta-analysis. <i>Psychological Bulletin<\/i>, <i>140<\/i>(5), 1332\u20131360. <a href=\"https:\/\/doi.org\/10.1037\/a0037173\">https:\/\/doi.org\/10.1037\/a0037173<\/a> <a href=\"#return-footnote-131-13\" class=\"return-footnote\" aria-label=\"Return to footnote 13\">&crarr;<\/a><\/li><li id=\"footnote-131-14\">Sullivan, G. M. (2011). A primer on the validity of assessment instruments. <em>Journal of Graduate Medical Education, 3<\/em>(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1 <a href=\"#return-footnote-131-14\" class=\"return-footnote\" aria-label=\"Return to footnote 14\">&crarr;<\/a><\/li><li id=\"footnote-131-15\">Sullivan, G. M. (2011). 
A primer on the validity of assessment instruments. <em>Journal of Graduate Medical Education, 3<\/em>(2), 119\u2013120. doi:10.4300\/JGME-D-11-00075.1 <a href=\"#return-footnote-131-15\" class=\"return-footnote\" aria-label=\"Return to footnote 15\">&crarr;<\/a><\/li><li id=\"footnote-131-16\">Engel, R., &amp; Schutt, R. (2013). <em>The practice of research in social work<\/em> (3rd ed.). Thousand Oaks, CA: SAGE. <a href=\"#return-footnote-131-16\" class=\"return-footnote\" aria-label=\"Return to footnote 16\">&crarr;<\/a><\/li><li id=\"footnote-131-17\">Engel, R., &amp; Schutt, R. (2013). <em>The practice of research in social work<\/em> (3rd ed.). Thousand Oaks, CA: SAGE. <a href=\"#return-footnote-131-17\" class=\"return-footnote\" aria-label=\"Return to footnote 17\">&crarr;<\/a><\/li><li id=\"footnote-131-18\">Engel, R., &amp; Schutt, R. (2013). <em>The practice of research in social work<\/em> (3rd ed.). Thousand Oaks, CA: SAGE. <a href=\"#return-footnote-131-18\" class=\"return-footnote\" aria-label=\"Return to footnote 18\">&crarr;<\/a><\/li><li id=\"footnote-131-19\">Engel, R., &amp; Schutt, R. (2013). <em>The practice of research in social work<\/em> (3rd ed.). Thousand Oaks, CA: SAGE. <a href=\"#return-footnote-131-19\" class=\"return-footnote\" aria-label=\"Return to footnote 19\">&crarr;<\/a><\/li><li id=\"footnote-131-20\">Engel, R., &amp; Schutt, R. (2013). <em>The practice of research in social work<\/em> (3rd ed.). Thousand Oaks, CA: SAGE. <a href=\"#return-footnote-131-20\" class=\"return-footnote\" aria-label=\"Return to footnote 20\">&crarr;<\/a><\/li><li id=\"footnote-131-21\">Jagadish, H. V., Stoyanovich, J., &amp; Howe, B. (2021). COVID-19 brings data equity challenges to the fore. <i>Digital Government: Research and Practice<\/i>,&nbsp;<i>2<\/i>(2), 1\u20137. <a href=\"#return-footnote-131-21\" class=\"return-footnote\" aria-label=\"Return to footnote 21\">&crarr;<\/a><\/li><li id=\"footnote-131-22\">Uttl, B., White, C. A., &amp; Gonzalez, D. W. (2017). 
Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related. <i>Studies in Educational Evaluation<\/i>,&nbsp;<i>54<\/i>, 22\u201342. <a href=\"#return-footnote-131-22\" class=\"return-footnote\" aria-label=\"Return to footnote 22\">&crarr;<\/a><\/li><li id=\"footnote-131-23\">Benton, S. L., &amp; Cashin, W. E. (2014). Student ratings of instruction in college and university courses. In <i>Higher education: Handbook of theory and research<\/i>&nbsp;(pp. 279\u2013326). Dordrecht: Springer. <a href=\"#return-footnote-131-23\" class=\"return-footnote\" aria-label=\"Return to footnote 23\">&crarr;<\/a><\/li><li id=\"footnote-131-24\">Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability.&nbsp;<i>Assessment &amp; Evaluation in Higher Education<\/i>,&nbsp;<i>43<\/i>(4), 666\u2013681. <a href=\"#return-footnote-131-24\" class=\"return-footnote\" aria-label=\"Return to footnote 24\">&crarr;<\/a><\/li><li id=\"footnote-131-25\">Clayson, D. E. (2018). Student evaluation of teaching and matters of reliability. <i>Assessment &amp; Evaluation in Higher Education<\/i>,&nbsp;<i>43<\/i>(4), 666\u2013681. <a href=\"#return-footnote-131-25\" class=\"return-footnote\" aria-label=\"Return to footnote 25\">&crarr;<\/a><\/li><li id=\"footnote-131-26\">Boring, A., Ottoboni, K., &amp; Stark, P. (2016). Student evaluations of teaching (mostly) do not measure teaching effectiveness.&nbsp;<i>ScienceOpen Research<\/i>. <a href=\"#return-footnote-131-26\" class=\"return-footnote\" aria-label=\"Return to footnote 26\">&crarr;<\/a><\/li><li id=\"footnote-131-27\">Uttl, B., &amp; Smibert, D. (2017). Student evaluations of teaching: teaching quantitative courses can be hazardous to one\u2019s career. <i>PeerJ<\/i>,&nbsp;<i>5<\/i>, e3299. <a href=\"#return-footnote-131-27\" class=\"return-footnote\" aria-label=\"Return to footnote 27\">&crarr;<\/a><\/li><li id=\"footnote-131-28\">Heffernan, T. (2021). 
Sexism, racism, prejudice, and bias: a literature review and synthesis of research surrounding student evaluations of courses and teaching.&nbsp;<i>Assessment &amp; Evaluation in Higher Education<\/i>, 1\u201311. <a href=\"#return-footnote-131-28\" class=\"return-footnote\" aria-label=\"Return to footnote 28\">&crarr;<\/a><\/li><\/ol><\/div><div class=\"glossary\"><span class=\"screen-reader-text\" id=\"definition\">definition<\/span><template id=\"term_131_585\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_585\"><div tabindex=\"-1\"><p>The process by which we describe and ascribe meaning to the key facts, concepts, or other phenomena under investigation in a research study.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_628\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_628\"><div tabindex=\"-1\"><p>In measurement, conditions that are easy to identify and verify through direct observation.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_641\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_641\"><div tabindex=\"-1\"><p>In measurement, conditions that are subtle and complex, which we must define using existing knowledge and intuition.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_663\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_663\"><div tabindex=\"-1\"><p>Conditions that are not directly observable and represent states of being, experiences, and ideas.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close 
definition<\/span><\/button><\/div><\/template><template id=\"term_131_718\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_718\"><div tabindex=\"-1\"><p>A mental image that summarizes a set of similar observations, feelings, or ideas<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_366\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_366\"><div tabindex=\"-1\"><p>developing clear, concise definitions for the key concepts in a research question<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_376\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_376\"><div tabindex=\"-1\"><p>concepts that are composed of multiple elements<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_383\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_383\"><div tabindex=\"-1\"><p>concepts that are expected to have a single underlying dimension<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_390\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_390\"><div tabindex=\"-1\"><p>assuming that abstract concepts exist in some concrete, tangible way<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_616\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_616\"><div tabindex=\"-1\"><p>process by which researchers spell out precisely how a concept 
will be measured in their study<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_719\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_719\"><div tabindex=\"-1\"><p>Clues that demonstrate the presence, intensity, or other aspects of a concept in the real world<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_503\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_503\"><div tabindex=\"-1\"><p>unprocessed data that researchers can analyze using quantitative and qualitative methods (e.g., responses to a survey or interview transcripts) <\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_388\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_388\"><div tabindex=\"-1\"><p>a characteristic that does not change in a study<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_4195\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_4195\"><div tabindex=\"-1\"><\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_387\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_387\"><div tabindex=\"-1\"><p>The characteristics that make up a variable<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_695\"><div class=\"glossary__definition\" 
role=\"dialog\" data-id=\"term_131_695\"><div tabindex=\"-1\"><p>variables whose values are organized into mutually exclusive groups but whose numerical values cannot be used in mathematical operations.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_654\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_654\"><div tabindex=\"-1\"><p>variables whose values are mutually exclusive and can be used in mathematical operations<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_720\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_720\"><div tabindex=\"-1\"><p>The lowest level of measurement; categories cannot be mathematically ranked, though they are exhaustive and mutually exclusive<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_721\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_721\"><div tabindex=\"-1\"><p>Exhaustive categories are options for closed ended questions that allow for every possible response (no one should feel like they can't find the answer for them).<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_722\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_722\"><div tabindex=\"-1\"><p>Mutually exclusive categories are options for closed ended questions that do not overlap, so people only fit into one category or another, not both.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close 
definition<\/span><\/button><\/div><\/template><template id=\"term_131_524\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_524\"><div tabindex=\"-1\"><p>Level of measurement that follows the nominal level. Has mutually exclusive categories and a hierarchy (rank order), but we cannot calculate a mathematical distance between attributes.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_723\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_723\"><div tabindex=\"-1\"><p>An ordered set of responses that participants must choose from.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_461\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_461\"><div tabindex=\"-1\"><p>A level of measurement that is continuous, can be rank ordered, is exhaustive and mutually exclusive, and for which the distance between attributes is known to be equal, but for which there is no true zero point.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_462\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_462\"><div tabindex=\"-1\"><p>The highest level of measurement. 
It has mutually exclusive categories and a rank order; its values can be added, subtracted, multiplied, and divided; and it has an absolute zero point.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_386\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_386\"><div tabindex=\"-1\"><p>measuring people\u2019s attitude toward something by assessing their level of agreement with several statements about it<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_385\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_385\"><div tabindex=\"-1\"><p>Composite (multi-item) scales in which respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_384\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_384\"><div tabindex=\"-1\"><p>A composite scale using a series of items arranged in increasing order of intensity of the construct of interest, from least intense to most intense.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_375\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_375\"><div tabindex=\"-1\"><p>measurements of variables based on more than one indicator<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_724\"><div 
class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_724\"><div tabindex=\"-1\"><p>An empirical structure for measuring items or indicators of the multiple dimensions of a concept.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_576\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_576\"><div tabindex=\"-1\"><p>a composite score derived from aggregating measures of multiple concepts (called components) using a set of rules and formulas<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_589\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_589\"><div tabindex=\"-1\"><p>The ability of a measurement tool to measure a phenomenon the same way, time after time. Note: Reliability does not imply validity.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_653\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_653\"><div tabindex=\"-1\"><p>The extent to which scores obtained on a scale or other measure are consistent across time<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_725\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_725\"><div tabindex=\"-1\"><p>The consistency of people\u2019s responses across the items on a multiple-item measure. 
Responses about the same underlying construct should be correlated, though not perfectly.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_649\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_649\"><div tabindex=\"-1\"><p>The extent to which different observers are consistent in their assessment or rating of a particular characteristic or item.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_590\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_590\"><div tabindex=\"-1\"><p>The extent to which the scores from a measure represent the variable they are intended to.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_643\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_643\"><div tabindex=\"-1\"><p>The extent to which a measurement method appears \u201con its face\u201d to measure the construct of interest<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_644\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_644\"><div tabindex=\"-1\"><p>The extent to which a measure \u201ccovers\u201d the construct of interest, i.e., its comprehensiveness in measuring the construct.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_647\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_647\"><div tabindex=\"-1\"><p>The extent to which people\u2019s scores 
on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_646\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_646\"><div tabindex=\"-1\"><p>A type of criterion validity. Examines how well a tool provides the same scores as an already existing tool administered at the same point in time.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_645\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_645\"><div tabindex=\"-1\"><p>A type of criterion validity that examines how well your tool predicts a future criterion.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_726\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_726\"><div tabindex=\"-1\"><p>The extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_382\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_382\"><div tabindex=\"-1\"><p>(also known as bias) occurs when a measure consistently outputs incorrect data, usually in one direction and due to an identifiable process<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_727\"><div class=\"glossary__definition\" 
data-id=\"term_131_727\"><div tabindex=\"-1\"><p>When a participant's answer to a question is altered due to the way in which a question is written. In essence, the question leads the participant to answer in a specific way.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_343\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_343\"><div tabindex=\"-1\"><p>Social desirability bias occurs when we create questions that lead respondents to answer in ways that don't reflect their genuine thoughts or feelings to avoid being perceived negatively.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_728\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_728\"><div tabindex=\"-1\"><p>In a measure, when people say yes to whatever the researcher asks, even when doing so contradicts previous answers.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_378\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_378\"><div tabindex=\"-1\"><p>Unpredictable error that does not result in scores that are consistently higher or lower on a given measure but are nevertheless inaccurate.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_381\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_381\"><div tabindex=\"-1\"><p>when a measure indicates the presence of a phenomenon, when in reality it is not present<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close 
definition<\/span><\/button><\/div><\/template><template id=\"term_131_380\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_380\"><div tabindex=\"-1\"><p>when a measure does not indicate the presence of a phenomenon, when in reality it is present<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_621\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_621\"><div tabindex=\"-1\"><p>the group of people whose needs your study addresses<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_393\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_393\"><div tabindex=\"-1\"><p>The value in the middle when all our values are placed in numerical order. Also called the 50th percentile.<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_308\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_308\"><div tabindex=\"-1\"><p>individuals or groups who have an interest in the outcome of the study you conduct<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close definition<\/span><\/button><\/div><\/template><template id=\"term_131_285\"><div class=\"glossary__definition\" role=\"dialog\" data-id=\"term_131_285\"><div tabindex=\"-1\"><p>the people or organizations who control access to the population you want to study<\/p>\n<\/div><button><span aria-hidden=\"true\">&times;<\/span><span class=\"screen-reader-text\">Close 
definition<\/span><\/button><\/div><\/template><\/div>","protected":false},"author":1686,"menu_order":2,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"class_list":["post-131","chapter","type-chapter","status-publish","hentry"],"part":104,"_links":{"self":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapters\/131","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/wp\/v2\/users\/1686"}],"version-history":[{"count":1,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapters\/131\/revisions"}],"predecessor-version":[{"id":741,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapters\/131\/revisions\/741"}],"part":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/parts\/104"}],"metadata":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapters\/131\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/wp\/v2\/media?parent=131"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/pressbooks\/v2\/chapter-type?post=131"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/wp\/v2\/contributor?post=131"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.bccampus.ca\/dlaitsch\/wp-json\/wp\/v2\/license?post=131"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}