Data Organization

Oral Robinson; Alexander Wilson

As you collect your data, it must go through some early stages of preparation before it is analyzed. The methods that you use for organization should likewise help guide your data collection, allowing for easier categorization and aggregation of the complex data that you will be receiving. We therefore recommend that you read the qualitative and quantitative analysis sections in addition to this chapter on data collection before you begin. In this final section, we briefly overview common problems and concerns that researchers have with organizing their data.

Table 8.3 - Common Problems with Data Organization
Problem	Description
Coded Chaotically	Many problems with organization ultimately stem from arbitrariness. Organization implies that definite principles and rules are applied to the arrangement of information. Defining these rules allows any researcher who understands the rules to predict the placement and relationship of the information. But if no rule or pattern exists, each dive into the data will remain as confusing as the first. As a consequence, keep your organization and coding consistent by establishing strict rules around the coded data. For each code you attribute to a set of data, define it clearly and write down its definition in your field notes. This will not only help you find related data in the future, but will also remind you of your reasoning later on.
Inconsistency in Variable Measurement	Use the same measurement tool for every observation of a variable. For instance, if you record “income” as an ordinal measure (income brackets) for some respondents, do not switch to an interval-ratio measure (exact dollar amounts) for others. Consistent measurement makes the variable much easier to manipulate, because values recorded on the same scale can be compared directly in your analysis. Data recorded under different systems of measurement must first be translated into a single system before it can be compared (e.g., as you would have to do if you had measured half of your cases with the metric system and the other half with imperial).
Inconsistent Treatment of Missing Data	How you deal with missing data is a vital part of the data collection process and, like everything else, it must follow a consistent set of principles. Consistency first means that you actually deal with all of the missing data: if you address the missing data for one variable (which you should), do the same for all the others. Second, it means treating missing data in the same way throughout. If you acknowledge the methodological problems behind missing data in your survey of ‘immigrant incomes in Canada,’ then you should try to address the same problems for your other data (the problems may not exist there, but explain why).
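
One way to keep coding from becoming chaotic is to store each code together with its written definition and to check your coded data against that list. The following is only a minimal sketch in Python: the code names, definitions, and interview excerpts are hypothetical and are not drawn from any real study.

```python
# Hypothetical codebook: every code is paired with a written definition,
# so any researcher who knows the rules can predict where data belongs.
CODEBOOK = {
    "BARRIER_LANG": "Respondent describes language as an obstacle to employment.",
    "BARRIER_CRED": "Respondent describes foreign credentials not being recognized.",
    "SUPPORT_FAM": "Respondent describes help received from family members.",
}


def undefined_codes(coded_segments, codebook):
    """Return the codes that were applied to data but never defined."""
    used = {code for _, code in coded_segments}
    return used - set(codebook)


# Hypothetical coded excerpts: (text, code) pairs.
segments = [
    ("I couldn't follow the interview questions in English.", "BARRIER_LANG"),
    ("My degree from home didn't count here.", "BARRIER_CRED"),
    ("It was just hard, you know?", "MISC"),  # applied ad hoc, never defined
]

print(undefined_codes(segments, CODEBOOK))  # {'MISC'}
```

Any code that this check reports should either be defined in your notes or replaced with an existing code before you continue.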

Missing Data

Regardless of how you define the scope of your empirical research, no method yields a complete picture of an experience. However, as a researcher, it is your task to account as fully as possible for the data that relates to your topic. As a consequence, if your methods fail to capture key data, that failure must be discussed. Data can go missing for many reasons: the survey may have been too long (non-response error), the interviewer may have forgotten a question (administrative error), some populations may have been out of the researcher’s reach (coverage error), the variable may have been too narrowly defined (measurement error), the questions may have been leading, and so on.

While missing data can affect your ability to answer research questions, in most cases it is not cause for alarm. The questions posed by missing data are often as important as the research question itself. The data that your method does not find can often help to explain the weaknesses of the method, or the need for a different method in researching this part of social life (see Chapter 11 for a discussion about writing your limitations). The missing data may also point to structural barriers within your topic itself. For instance, the tendency of your respondents to not disclose their incomes may indicate a social desirability bias (the tendency to answer in ways that respondents expect others to view positively, here by lying about or omitting a low or high income). In other cases, the data is simply scarce or hard to access. Missing data, oddly enough, can be an important kind of social data: it may indicate the barriers and power inequalities that restrict the flow of information.

Because missing data is most likely to be noticed during data entry, it is important that you devise a protocol for tabulating it. Here is what Bhattacherjee (2012, p. 120) has to say about missing data in data entry:

“During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation.”

Given the importance of data entry, be wary of both ‘imputation’ and ‘listwise deletion.’ A missing value cannot simply be inferred without supporting evidence, and both listwise deletion and imputation (for example, based on previous values) make assumptions about the data that cannot be easily shrugged off. As a consequence, if you are carrying out a statistical procedure on a dataset with missing values, be sure to also explain in words the possible reasons for the missing data, its impact on your dataset, and how you treated it in your calculations.
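
To make the trade-off concrete, the sketch below recodes placeholder values as missing, tabulates how many values are missing per variable, and then compares listwise deletion with one very simple form of imputation (filling in the mean of the observed values). It is an illustration under stated assumptions rather than a recommended procedure: the survey rows are invented, the placeholder codes -1 and 999 are taken from the quotation above, and mean imputation is our choice of example, not a method the quoted text prescribes.

```python
from statistics import mean

# Assumed placeholder codes for "missing"; use whatever your own codebook defines.
MISSING_CODES = {-1, 999}

# Hypothetical survey rows (income in thousands of dollars).
rows = [
    {"age": 34, "income": 52},
    {"age": 999, "income": 61},
    {"age": 45, "income": -1},
    {"age": 29, "income": 49},
    {"age": 51, "income": -1},
]


def recode_missing(rows):
    """Replace placeholder codes with None so they are never read as real values."""
    return [{k: (None if v in MISSING_CODES else v) for k, v in r.items()} for r in rows]


def count_missing(rows):
    """Tabulate the number of missing values for every variable."""
    counts = {}
    for r in rows:
        for k, v in r.items():
            counts[k] = counts.get(k, 0) + (v is None)
    return counts


def listwise_delete(rows):
    """Drop every observation that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows, variable):
    """Fill missing values of one variable with the mean of its observed values."""
    fill = mean(r[variable] for r in rows if r[variable] is not None)
    return [{**r, variable: fill if r[variable] is None else r[variable]} for r in rows]


clean = recode_missing(rows)
print(count_missing(clean))          # {'age': 1, 'income': 2}
print(len(listwise_delete(clean)))   # 2 of the 5 observations survive
print([r["income"] for r in mean_impute(clean, "income")])  # [52, 61, 54, 49, 54]
```

In this toy example listwise deletion discards three of five observations, while mean imputation keeps all five but assigns two respondents an income they never reported. Either choice needs to be explained in your write-up.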

Data Transformation

As we noted in the discussion of missing data, it is sometimes necessary to alter your data values before they can be interpreted. This, however, should be done with caution. While taking the logarithm of your data values may help to bring out (and therefore make noticeable) a pattern within a dataset, it can also distort the viewer’s perception of your findings. That is why it is always important to remain explicit about your methods throughout the process, so that your reader knows exactly why you performed the adjustments you did. That caveat stated, data transformation is an important part of dealing with statistical data. Its purpose is to reveal trends in the data that are not necessarily evident at first glance, so it is guided by a search for those key trends. For example, a common type of transformation involves scaling the weight of an item up or down. Bhattacherjee (2012) suggests creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (see Fink, 2009 for a deeper discussion of data transformation).
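
The transformations mentioned above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the three survey items, their weights, and the income cut-offs are hypothetical and should be replaced by values justified in your own methods section.

```python
import math

# Hypothetical respondent: three 1-5 survey items plus annual income in dollars.
respondent = {"trust_1": 4, "trust_2": 5, "trust_3": 2, "income": 58000}

ITEMS = ["trust_1", "trust_2", "trust_3"]
WEIGHTS = {"trust_1": 0.5, "trust_2": 0.25, "trust_3": 0.25}  # assumed weights


def summated_scale(record, items):
    """Create a scale measure by adding the individual scale items."""
    return sum(record[i] for i in items)


def weighted_index(record, weights):
    """Create a weighted index from a set of observed measures."""
    return sum(record[i] * w for i, w in weights.items())


def collapse_income(income):
    """Collapse exact incomes into a few ranges (cut-offs are assumptions)."""
    if income < 30000:
        return "low"
    if income < 80000:
        return "middle"
    return "high"


def log_income(income):
    """Base-10 logarithm, which compresses large values; income must be positive."""
    return math.log10(income)


print(summated_scale(respondent, ITEMS))      # 11
print(weighted_index(respondent, WEIGHTS))    # 3.75
print(collapse_income(respondent["income"]))  # middle
print(round(log_income(respondent["income"]), 2))
```

Whichever of these you apply, record the rule (the weights, the cut-offs, the base of the logarithm) so that a reader can see exactly how the original values were changed.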

References

Bhattacherjee, A. (2012). Social Science Research: Principles, Methods, and Practices. https://scholarcommons.usf.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1002&context=oa_textbooks

Fink, E. L. (2009). The FAQs on data transformation. Communication Monographs, 76(4), 379-397.

License

Practicing and Presenting Social Research Copyright © 2022 by Oral Robinson and Alexander Wilson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
