Transcribing and Coding

Oral Robinson; Alexander Wilson

Before you get to the point where you are able to answer your research question, several things need to happen. First, you need to accurately transcribe your data into a form that will lend itself to reliable scrutiny. This includes organizing your field notes, memos, observational and other data in a way that will make it easy for you to review.

The second stage of data analysis is coding and counting, which helps ensure that your analysis does not appear to be merely a series of biased anecdotes (see Silverman, 2015). Counting is the process of enumerating or assigning numbers to non-numerical data, while coding is the process of organizing data into categories so that it can be analyzed. Researchers use several strategies, including grids (or matrix tables), affinity diagrams and content mapping, to help discern and organize patterns. Third, you need to transform the evidence into a coherent argument while exercising reflexivity about the decisions you made to transform that evidence (Rosaline, 2011). It is only at this stage that you will be able to answer your research question. We will expand on these stages before returning to how qualitative researchers answer their questions. Let us begin by discussing transcribing and coding.

Transcribing

All audio interviews must be transcribed before being analyzed. Transcribing usually takes about 6 to 7 times as long as the interview itself, so a 1-hour interview requires roughly 6 to 7 hours of transcribing (Halcomb & Davidson, 2006). The actual time depends on the software used and on the transcriber’s skill, motivation and experience, among other factors. In some cases it takes significantly longer, and in others it takes less.
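
The rule of thumb above can be turned into a quick planning estimate. The following is a minimal sketch only: the function name, the default ratios as parameters, and the recording lengths are assumptions made for illustration, and real transcription time will vary for the reasons just described.

```python
def estimate_transcription_hours(interview_minutes, low_ratio=6, high_ratio=7):
    """Return (low, high) hours of transcribing for one recording."""
    hours = interview_minutes / 60
    return hours * low_ratio, hours * high_ratio

# Hypothetical recording lengths, in minutes.
recordings = {"interview_01": 60, "interview_02": 45, "interview_03": 90}

total_low = total_high = 0.0
for name, minutes in recordings.items():
    low, high = estimate_transcription_hours(minutes)
    total_low += low
    total_high += high
    print(f"{name}: {low:.1f} to {high:.1f} hours")

print(f"Total: {total_low:.1f} to {total_high:.1f} hours")
```

An estimate like this is mainly useful for budgeting time before data collection begins; it should be revised once you have transcribed your first one or two files.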

There are two approaches to assembling qualitative data for analysis: verbatim transcripts, or field notes and memos of the research process. According to Halcomb and Davidson (2006), verbatim transcription refers to the “word-for-word reproduction of verbal data, where the written words are an exact replication of the audiorecorded words” (p. 39). Depending on the study, researchers might be less interested in verbatim records and might focus more on field notes and memos. In some cases, they might listen to audio records as a means of supplementing the field notes and memos created during the research process. Most researchers rely on a combination of field notes, memos and verbatim transcripts. Regardless of the approach used, the qualitative data analysis process is usually guided by the same goal: to identify patterns.

Box 9.2 – Getting Started with Transcription

Before you begin transcribing a file, listen to a few minutes of the recording to get a sense of the speech patterns and the quality of the recording.
Begin transcribing with the shorter and clearer files. This will give you a sense of victory and help you build momentum. Even if you save some of the shorter files for later, do not do all the long, difficult files first. That can be demotivating.
Take a break between transcripts.
Omitting fillers such as um, uh, like, you know and so forth is acceptable so long as the context of what is stated is not greatly altered by the change.
Unless language competence is important to the research, it is acceptable to make small grammatical changes.
Try enhancing files with audio issues. Omit data only if the audio is irretrievable or if the time/resource investment is substantial. Inaudible sections should be marked with a blank (________).
Transcribe in small amounts at a time, e.g., 5 seconds of audio. This helps ensure that you remember everything that was said, which saves time and enhances accuracy.
Label emotions and actions for what they are, e.g., sighs, breathes heavily, laughter.
Use an ellipsis (…) to indicate unfinished sentences or pauses mid-sentence.
Numbers should be written out in words.
Start a new paragraph (block format) for each new speaker, and separate paragraphs with an empty line.

Adapted from Frankfort-Nachmias, C. and Nachmias, D. (1996). Research Methods in the Social Sciences (5th edition). St. Martin’s Press Inc.
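
Some of the conventions in the box can be applied consistently with a short script once a rough transcript has been typed. The sketch below is illustrative only and rests on several assumptions: the `[inaudible]` marker, the function names and the sample turns are all hypothetical. It removes only "um" and "uh", because words such as "like" or "you know" can carry meaning and need a human decision.

```python
import re

# Fillers that are usually safe to drop automatically. "like" and "you know"
# are deliberately left alone so that a person can judge them in context.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", flags=re.IGNORECASE)

def clean_turn(text):
    """Remove simple fillers and mark inaudible passages with a blank."""
    text = FILLER_PATTERN.sub("", text)
    # "[inaudible]" is an assumed marker typed by the transcriber.
    text = text.replace("[inaudible]", "________")
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Restore the capital letter if a filler was removed from the start.
    return text[:1].upper() + text[1:]

def format_transcript(turns):
    """Put each speaker turn in its own block, separated by an empty line."""
    return "\n\n".join(f"{speaker}: {clean_turn(text)}" for speaker, text in turns)

# Hypothetical raw turns typed while listening to a recording.
raw_turns = [
    ("Interviewer", "Um, why did you join the movement?"),
    ("Respondent 1", "I was, uh, eight years old [inaudible] and I wanted to help..."),
]

print(format_transcript(raw_turns))
```

A script like this does not replace re-listening to the audio; it only keeps the formatting decisions consistent across files.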

Transcription Software

Several transcription programs can make the process easier for you. These include Dragon NaturallySpeaking, Adobe Premiere, Otter, Happy Scribe, Rev and Amberscript. Most of these require a subscription or payment to use. However, your institution might have some available. It is also worth checking out free software. It is important to remember that no software is completely accurate. Regardless of the program used, you will need to make edits and corrections.

Coding in Grounded Theory

Miles and Huberman (1994) define codes as “tags or labels for assigning units of meaning to the descriptive or inferential information compiled during a study” (p. 56). Simply put, codes are abstractions: labels that we assign to chunks of text of varying size (e.g., words, phrases, sentences or whole paragraphs) to summarize their meaning (Miles & Huberman, 1994). Coding means breaking the data into manageable chunks so that it can be analyzed to uncover relationships (similarities and dissimilarities). Coding is hence the bedrock of qualitative data analysis. We discuss three coding strategies: grounded theory, systematic analysis and content analysis.

Bhattacherjee (2012) describes grounded theory as “an inductive technique of interpreting recorded data about a social phenomenon to build theories about that phenomenon [in which] interpretations are ‘grounded in’ (or based on) observed empirical data” (p. 113). This process has three common techniques:

Open coding: Also called emergent coding because codes are derived from the text, rather than from preconceived ideas and concepts (Blair, 2015). Open coding begins by analyzing texts to determine labels (Strauss & Corbin, 1998) and then derives concepts and categories/sub-categories, which will ultimately evolve into constructs. It is the bedrock of grounded theory because the researcher attempts to be open to new ideas while suspending pre-existing beliefs, concepts, theories and attitudes to allow meanings to emerge from the data. This is no doubt an extremely difficult undertaking.
Axial coding: Organizes categories and sub-categories into causal explanations that could possibly explain the phenomenon. This can be performed simultaneously with open coding. Researchers need to be alert to the categories that cut across all data sets. It is only through this process that one can determine the themes in the dataset. Remember, a theme is a collection of related codes. While conducting axial coding, the researcher is looking for general patterns and explanations.
Selective coding: “involves identifying a central category or a core variable and systematically and logically relating this central category to other categories” (Bhattacherjee, 2012, p. 114). Doing so will help you to better recognize patterns and explanations. In particular, you might need to ask yourself: (1) Can certain codes be grouped together under a common category? (2) Are there specific relationships between codes (e.g., is there a progression such as A leads to B, C mitigates B, or A and B usually happen before C)? Strauss and Corbin (1998, p. 161) note that “categories are organised around a central explanatory concept.”

From the above, it is evident that open coding is foundational to grounded theory because it generates a “participant generated ‘theory’ from the data” (Blair, 2015, p. 17). Do not make the claim that you are using grounded theory if the codes do not emerge from the data. Essentially, grounded theory coding means that the explanations and concepts used to answer the research questions are generated from within the data and not from the literature or other external sources. This requires that researchers read, re-read and label texts until they reach theoretical saturation. Theoretical saturation is “when additional data does not yield any marginal change in the core categories or the relationships” (Bhattacherjee, 2012, p. 115). In other words, it is the point at which you are not finding any new concepts, relationships or codes. Reaching theoretical saturation requires an intimate connection to the data. Many insights do not stand out the first time you code the data. You must be prepared to code it multiple times, paying attention to the context in which something was said (e.g., was it said in relation to another topic, or did you have to probe for it?). Taking these things into account could reveal new instances of a code or theme. However, once you reach theoretical saturation, it is important to move on and focus on either axial or selective coding.
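
One rough way to monitor progress toward theoretical saturation is to track how many new codes each additional transcript contributes. The sketch below is a simplified illustration, not a formal test: the function names, the sample codes and the `window` of two transcripts are assumptions, and it tracks only new codes, not changes in the relationships between categories.

```python
def new_codes_per_round(coded_transcripts):
    """For each transcript, in order, list the codes not seen in any earlier one."""
    seen = set()
    history = []
    for transcript_id, codes in coded_transcripts:
        fresh = sorted(set(codes) - seen)
        history.append((transcript_id, fresh))
        seen.update(codes)
    return history

def reached_saturation(history, window=2):
    """True if the last `window` transcripts added no new codes (assumed rule)."""
    if len(history) < window:
        return False
    return all(not fresh for _, fresh in history[-window:])

# Hypothetical open codes assigned to five interviews.
coded_transcripts = [
    ("interview_01", ["coordination", "family influence"]),
    ("interview_02", ["power", "peer influence"]),
    ("interview_03", ["coordination", "policy delay"]),
    ("interview_04", ["peer influence", "power"]),
    ("interview_05", ["family influence", "policy delay"]),
]

history = new_codes_per_round(coded_transcripts)
for transcript_id, fresh in history:
    print(transcript_id, "new codes:", fresh or "none")
print("Saturation reached:", reached_saturation(history))
```

Because insights often appear only on a second or third pass, a tally like this is best recomputed after each round of re-coding rather than treated as a one-time check.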

On a final note, grounded theory and open coding can be used with any type of qualitative data, but content analysis is used less often to analyze interviews and other primary data. Instead, content analysis (along with template and systematic analysis) is often used to analyze secondary data, e.g., institutional documents, newspaper reports, books and other social artifacts.

The Constant Comparative Method

An important element of qualitative data analysis is constantly comparing and contrasting your findings. The constant comparative method involves “looking systematically at who is saying what and in what context…it relies on identifying patterns in your data and this means that you need to do some counting” (Rosaline, 2011, p. 254). Counting in this context does not equate to statistical inference, but you do need to provide evidence that a theme or perspective was really important. For example, you might say “seven out of the fifteen respondents articulated that…”. Hence it is important to compare and contrast the perspectives of your respondents.
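
The kind of counting described above can be sketched in a few lines. This is an illustrative example under stated assumptions: the respondent labels, themes and sample size are hypothetical, and each respondent is counted once per theme no matter how often they raise it.

```python
from collections import defaultdict

# Hypothetical coded segments as (respondent, theme) pairs. A respondent may
# mention a theme several times, so a set is used to count each person once.
coded_segments = [
    ("R1", "coordination"), ("R1", "coordination"), ("R2", "power"),
    ("R3", "coordination"), ("R3", "power"), ("R4", "coordination"),
]
total_respondents = 5  # assumed sample size; R5 mentioned neither theme

respondents_by_theme = defaultdict(set)
for respondent, theme in coded_segments:
    respondents_by_theme[theme].add(respondent)

for theme, respondents in sorted(respondents_by_theme.items()):
    print(f"{len(respondents)} out of {total_respondents} respondents "
          f"mentioned {theme}: {', '.join(sorted(respondents))}")
```

Keeping the respondent identifiers, and not only the totals, preserves the "who is saying what" part of the comparison, so that you can go back to the context of each statement.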

Dealing with Exceptional Findings

In the coding process, you are likely to find a theme or certain insights that do not fit with the general trends of the analysis. You might be tempted to (a) ignore the finding or (b) treat it as a major theme. You should certainly not ignore it, but neither should you treat it as the rule or as a generalizable finding. The adage, “the exception does not prove the rule,” applies here: exceptional claims require exceptional evidence. Think of your audience and the background research in your field: is your finding really that unique? If it is, then it requires extra evidence: ideally, many of your interviewees should have made a statement that supports your point. If the exception is interesting but you lack the evidence to support it as a major finding, you should note it as an issue for further research. On the other hand, findings that are well-established in the field do not need extensive elaboration. You can simply offer a couple of quotations before moving on to something your audience does not know.

Checking for Internal Consistency

Before drawing definitive conclusions from your analysis, you must check for internal consistency (whether what you are saying contradicts itself) and then (re)check your results against the raw data (whether you have omitted key evidence from what you are saying). Checking for internal consistency means applying your explanations to all the data you have gathered and asking: does it contradict any of my data? Are any of these contradictions abundant or important enough to undermine the explanatory power of the theory? For example, if some of your raw data contradicts the dominant theme that “all right-wing media outlets are funded to neglect nefarious corporate behaviour,” you will need to address the contradictions. Suppose you find some text from a right-wing media outlet with grassroots funding that condemns big corporations. You would need to ask how prevalent such a contradiction is and under what conditions it occurs, and then evaluate the implications for your dominant finding.

Grounded theory is commonly criticized for its lack of strict standards for defining concepts before observation. Because the concepts (or bits of data) are gathered according to the judgment of the researcher, the approach asks the reader to trust the researcher’s judgment in picking relevant and accurate data. In this respect, grounded theory risks becoming a tool to confirm the bias of the researcher (a risk it shares with interpretive research). It is still, however, an evidence-driven approach, and it requires the conceptualization and amassing of evidence in order to support its argument. For this reason, the grounded theory researcher should place extra emphasis on thick description in their data analysis. Thick description means providing multiple detailed descriptions (usually through verbatim quotes and narratives) together with interpretations (explanations) of them. Many different networks of data are thereby connected to the main argument of the research, presenting multiple viewpoints on a single topic. This concrete and direct evidence (as opposed to an abstract and jargon-laden description) will show your reader that, while your data presentation still relied on your judgment as a researcher, that judgment is based on comprehensive evidence that is coded, not fabricated.

Box 9.3 – Ensuring a Grounded Theory

Glaser (1998, pp. 18-19) states that there are four primary requirements for judging a good grounded theory:

Fit: Do the emerging concepts accurately describe the pattern of data?
Workability: Do the concepts and hypotheses account for how participants’ concerns are resolved?
Relevance: Is the issue of social concern, i.e., are people interested in the finding? What are the wider social implications?
Modifiability: Is the theory amenable to modification if new data shed more light on the phenomena?

Source: Glaser, B.G. (1998). Doing grounded theory – Issues and discussions. Sociology Press.

Content Analysis

Content analysis begins with a different coding scheme than grounded theory. Rather than beginning with open coding, content analysis uses systematic coding. Bhattacherjee (2012) defines content analysis as “the systematic analysis of the content of a text (e.g., who says what, to whom, why, and to what extent and with what effect) in a quantitative or qualitative manner” (p. 115). Systematic coding establishes, before the text is read, a system for sorting what might be found. It does so by specifying “inputs” for each code. For example, the terms “good” and “bad” could serve as inputs for the “sentiment” a customer feels about a product, and a broader concept such as “care for cost” could be treated as equivalent to the use of “expensive, cheap, cost-effective, cost or price” when describing Uber’s service. Content analysis can also evaluate a text numerically, to determine how often a particular code appears throughout a given discourse. Similarly, some researchers use template coding, where codes are predefined by the researcher based on prior research, reading or theory (Blair, 2015; King, 1994; Miles et al., 2014).

Content analysis can be used deductively, to test how well a theory explains a given phenomenon. For instance, I could derive a hypothesis (based on other readings about Uber’s arrival in urban landscapes) that the primary concern of the public about Uber is cost. I could define the code “cost” and its potential inputs beforehand, and then home in on how much it is discussed relative to other potential issues such as “working conditions,” “emissions,” and “speed” to determine what is actually mentioned most in public discourse. This makes systematic coding an effective tool for testing whether assumptions in the field hold across a large discourse. Devising codes beforehand also allows more data to be organized easily, making content analysis a more effective tool for coding larger datasets.
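
The Uber example can be sketched as a small dictionary-based coder in which each code is defined by its input terms before any text is read. This is a minimal illustration under stated assumptions: only the "cost" inputs come from the discussion above, while the other input terms, the function name and the sample comments are hypothetical.

```python
import re
from collections import Counter

# Hypothetical codebook, defined before reading the texts. The "cost" inputs
# follow the example in the text; the other inputs are illustrative guesses.
CODEBOOK = {
    "cost": ["expensive", "cheap", "cost-effective", "cost", "price"],
    "working conditions": ["wages", "drivers' rights", "working conditions"],
    "emissions": ["emissions", "pollution"],
    "speed": ["fast", "quick", "wait time"],
}

def code_text(text, codebook=CODEBOOK):
    """Count how often each code's input terms appear in a text."""
    counts = Counter()
    lowered = text.lower()
    for code, terms in codebook.items():
        for term in terms:
            # Lookarounds keep "cost" from also matching inside "cost-effective".
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            counts[code] += len(re.findall(pattern, lowered))
    return counts

# Hypothetical public comments about the service.
comments = [
    "Uber is cheap and fast, but the price surges at night.",
    "It is cost-effective for short trips.",
    "I worry about emissions and about drivers' wages.",
]

totals = Counter()
for comment in comments:
    totals.update(code_text(comment))

for code, n in totals.most_common():
    print(code, n)
```

Exact term matching is a deliberate simplification: it misses synonyms and cannot tell whether "cheap" is praise or complaint, which is why the coded output still needs to be read in context.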

Unlike grounded theory, content analysis involves the creation of a predesigned set of codes or constructs, into which the “text” or data is then ordered. For instance, say I am analyzing media coverage of an upcoming election to determine whether one candidate is given more favourable representation than the others. I might choose to devise codes that capture “favourable representation” with both qualitative and quantitative aspects. I could define “favourable” as “allusion to the positive benefits of their policy or leadership (their ‘sound’ fiscal policy will…)” and then count the instances where this occurs.

Box 9.4 – Content Analysis in Five Steps

Transcription

Are all your audio and visual data converted into an easily accessible textual file (by hand or by computer program)?

Coding Rules – What am I looking for and how do I define it?

Is your hypothesis able to anticipate what text you might find?
- How might that hypothesis be split up into clear codes? What are some potential examples for each code?
- How do you define the codes so that they are mutually exclusive and exhaustive (i.e. that they do not explain the same thing and that they capture as much text as could fit into that definition)?
- Are the codes worth finding out? Are they interesting? Has another researcher searched for the same thing and confirmed/disconfirmed the existence of that speech?

Code Data According to Rules – Have I found what I was looking for?

Were my codes present or non-existent in my textual data?
Have I found out the quantity of each code in comparison to the others?
- Have I addressed the frequency (amount in relation to the total responses), direction (positive or negative statement, stance towards another institution, person, idea, etc.) and depth (how many other statements was it referring to?) of each of my codes?

The Uncoded – Is there data I am misinterpreting/ignoring according to my initial rules?

Check for data that was left uncoded according to your protocols
- Have I accounted for my biases as a researcher?
- Do the uncoded data reflect my biases as a researcher?
- Can any of them be redressed without compromising the intentions of my hypothesis?

Reflection and Reiteration – Has my hypothesis been proven/falsified, and which codes best prove/falsify it?

Evaluate your findings with regard to your initial hypotheses
- Do the findings follow the trend you were expecting?
- If not, how do they deviate from that trend?
- Are there “negative cases” (cases which contradict the expectation) which you can explain?
Nuance your expectation in an attempt to explain the cases that contradicted it
Reread and repeat coding steps to continually test and strengthen your thesis
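
The third and fourth steps of the box can be sketched as follows. This is an illustrative example, not a prescribed procedure: the segment records, code names and direction labels are hypothetical, and "depth" is left out because it requires reading each statement against the ones it refers to.

```python
from collections import Counter

# Hypothetical coded segments. Each segment carries one code (or None if the
# coding rules did not capture it) and a direction for the statement.
segments = [
    {"id": 1, "code": "cost", "direction": "positive"},
    {"id": 2, "code": "cost", "direction": "negative"},
    {"id": 3, "code": None, "direction": None},
    {"id": 4, "code": "emissions", "direction": "negative"},
    {"id": 5, "code": "cost", "direction": "negative"},
]

frequency = Counter()
direction = {}
for segment in segments:
    code = segment["code"]
    if code is None:
        continue
    frequency[code] += 1
    direction.setdefault(code, Counter())[segment["direction"]] += 1

# Frequency is reported relative to the total number of segments.
total = len(segments)
for code, n in frequency.items():
    print(f"{code}: {n}/{total} segments ({n / total:.0%}), {dict(direction[code])}")

# The uncoded segments are listed for manual review.
uncoded = [s["id"] for s in segments if s["code"] is None]
print("Uncoded segments:", uncoded)
```

Listing the uncoded segments explicitly makes it harder to overlook data that the initial rules missed, which is the point of the fourth step.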

Framework Analysis

Because template and systematic coding rely on pre-existing ideas, they are usually considered forms of framework analysis (Ritchie & Spencer, 1994). Framework analysis uses grids and matrices to organize data into categories and to establish patterns. Pre-existing categories allow us to fit findings neatly into grids, which can give a clear sense of patterns. Matrix tables are particularly common in qualitative analysis. A matrix is basically a table that organizes quotations or chunks of data under broad themes (in the columns) and cases (in the rows) to allow for easy comparison. Despite the advantages of matrices for organizing raw data, Rosaline (2011) notes that researchers also need to ask: (1) What are the exceptions and how might they be explained? (2) Could exceptions point to general principles (generalizations) and, if so, how? (3) How can the patterns established by the grids be explained? One way of dealing with exceptions is to create a separate column or matrix for them.

Table 9.1 is an illustrative matrix that codes hypothetical interviews with student environmental activists around three themes: (a) their hopes for the movement; (b) why they began to participate in the movement; and (c) their discipline of study. Box 9.5 then offers some practical tips on organizing matrix tables. Hopefully, the table helps you to think about how to organize the major codes of your study, and how to record supporting evidence. At this point, it might be useful to contemplate: what patterns and exceptions are discernible from the table? By constantly reflecting on this question, you will be better able to identify the answers to your research question.

Table 9.1 - Sample Matrix
	Hopes	Beginning	Discipline
Respondent 1	“I hope the movement will be able to coordinate itself better in the future. The last protest was an embarrassment, the speaker could not even find some basic agreements with each other regarding the needs of the environmental movement.” (‘30:20)	“I was eight years old. My parents had informed me of the risks of climate change, and I wanted to do something about it. I joined my school’s recycling club and helped sort cans. I actually thought that was enough to fix the environment at the time. Recycling.” (‘15:45)	Engineering
Respondent 2	“We need the attention of those in power. Environmental protests have been happening for thirty years, and policy is still too slow to follow the popularity of the movement. We need to focus on those in positions of power now, not just popularity.” (‘15:20)	“I am embarrassed to say that I never participated until I joined university. Yeaa, I guess it was about then when my friends were protesting that I thought of joining them. Once I had attended, listened to the speakers at Vancouver’s protest, then I think the impact of the movement, which I already knew of, struck me in all its importance.” (‘10:00)	Forestry
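
A matrix like Table 9.1 can also be assembled from a list of coded excerpts. The sketch below is one possible way to do so, using only the Python standard library: the function name is an assumption, the excerpts are shortened from the table, and one cell is left empty on purpose to show how gaps in the evidence become visible.

```python
import csv
import io

THEMES = ["Hopes", "Beginning", "Discipline"]

# Key quotations shortened from Table 9.1, as (case, theme, excerpt) records.
# Respondent 2's "Beginning" excerpt is omitted here to illustrate a gap.
excerpts = [
    ("Respondent 1", "Hopes", "coordinate itself better"),
    ("Respondent 1", "Beginning", "joined my school's recycling club"),
    ("Respondent 1", "Discipline", "Engineering"),
    ("Respondent 2", "Hopes", "attention of those in power"),
    ("Respondent 2", "Discipline", "Forestry"),
]

def build_matrix(excerpts, themes=THEMES):
    """Arrange excerpts with cases in rows and themes in columns."""
    matrix = {}
    for case, theme, text in excerpts:
        row = matrix.setdefault(case, {t: "" for t in themes})
        # Join several excerpts for the same cell rather than overwriting.
        row[theme] = f"{row[theme]} | {text}" if row[theme] else text
    return matrix

matrix = build_matrix(excerpts)

# Write the matrix as tab-separated text that a spreadsheet can open.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Case"] + THEMES)
for case, row in matrix.items():
    writer.writerow([case] + [row[t] for t in THEMES])
print(buffer.getvalue())
```

An empty cell is informative in its own right: it may signal a question that was not asked, an answer that was not coded, or an exception worth recording separately.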

Box 9.5 – Sample Matrix

A common way to make a matrix is to simply highlight the raw data of your interview transcripts (or a collection of your textual raw data, a corpus file).

For instance, take this hypothetical interview with the first respondent of the previous matrix. The highlights are yellow for discipline, green for beginning:

Interviewer: What is your discipline of study? Has it had anything to do with your participation in the environmental movement?
Respondent 1: I study engineering. I suppose it has had an impact, but only indirectly. I was interested in math in high school and also felt that new technology could reduce the environmental damages of the old kind. The interests intertwined with my passion for engineering, which was not as purely theoretical as studying mathematics, nor lacking quantitative reasoning like other environmental activism roles.

Once passages are highlighted, you can return to them and copy them into the relevant part of the outline of your argument. If the interview transcripts take 20 pages, and there are five of them, scouring the documents for highlights can quickly become tedious. This is where having both highlights (initial data categorizations) on the raw data and narrowed key quotations (potent examples) in a separate matrix can make your final write-up much easier. You can use the first to look for more data and to get a sense of how comprehensive your evidence is for a particular code, while the second holds a few of your most lucid examples for the write-up.

References

Bhattacherjee, A. (2012). Social Science Research: Principles, Methods, and Practices. https://scholarcommons.usf.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=1002&context=oa_textbooks

Blair, E. (2015). A reflexive exploration of two qualitative data coding techniques. Journal of Methods and Measurement in the Social Sciences, 6(1), 14-29.

Frankfort-Nachmias, C., & Nachmias, D. (1996). Research Methods in the Social Sciences (5th ed.). St. Martin’s Press Inc.

Glaser, B. G. (1998). Doing grounded theory – Issues and discussions. Sociology Press.

Halcomb, E. J., & Davidson, P. M. (2006). Is verbatim transcription of interview data always necessary? Applied Nursing Research, 19(1), 38-42.

King (1994).

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. Sage.

Ritchie & Spencer (1994).

Rosaline (2011).

Silverman, D. (2015). Interpreting qualitative data. Sage.

Strauss, A., & Corbin, J. (1998). Basics of qualitative research (2nd ed.). Sage.

License

Practicing and Presenting Social Research Copyright © 2022 by Oral Robinson and Alexander Wilson is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.