41 Codebooks and Data Dictionaries
Researchers use codebooks and data dictionaries to document how their data were collected, structured, or analyzed. What these documents contain and how they are formatted depends on your research design.
Codebooks and Data Dictionaries for Tabular Data
A codebook or data dictionary explains the variable names and values in a data set. These documents are often used to describe quantitative data stored in a spreadsheet (tabular data). A data dictionary should define every field in the table, including:
- Variable names as they appear in the spreadsheet
- Readable variable names (e.g., ID in a spreadsheet may be short for Participant ID)
- Units of measurement and level of precision for each variable (e.g., Are measurements metric? Is time recorded as HH:MM:SS or in some other format?)
- Allowed values for the variable, which may be:
  - selected from a list (e.g., M = male, F = female, O = other), or
  - defined in a description
- Description of the variable
- Values used to represent missing data
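As an illustration, the fields listed above can be sketched as a small Python structure. This is a minimal sketch, not a standard format: the variable names (ID, SEX, HT), their rules, and the missing-data codes ("NA", "-99") are hypothetical examples. The helper `check_coded_values` is also hypothetical; it checks only variables whose allowed values come from a list, because rules written as descriptions are meant for human readers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableEntry:
    """One row of a data dictionary (fields mirror the list above)."""
    name: str                          # variable name as it appears in the spreadsheet
    label: str                         # readable variable name
    description: str                   # what the variable records
    units: Optional[str] = None        # unit of measurement or format, if any
    codes: dict = field(default_factory=dict)  # allowed values selected from a list: code -> meaning
    value_rule: Optional[str] = None   # allowed values defined in a description
    missing: Optional[str] = None      # value used to represent missing data


# Hypothetical entries for illustration only.
DATA_DICTIONARY = [
    VariableEntry("ID", "Participant ID", "Unique identifier assigned to each participant",
                  value_rule="Positive integer"),
    VariableEntry("SEX", "Sex", "Self-reported sex",
                  codes={"M": "male", "F": "female", "O": "other"}, missing="NA"),
    VariableEntry("HT", "Height", "Standing height, measured to the nearest 0.1 cm",
                  units="cm", value_rule="Number between 50.0 and 250.0", missing="-99"),
]


def check_coded_values(row: dict, dictionary: list) -> list:
    """Return a message for each list-coded variable whose value is neither an allowed code nor the missing-data value."""
    problems = []
    for entry in dictionary:
        if not entry.codes:
            continue  # described rules are left for a human to check
        value = row.get(entry.name)
        if value == entry.missing:
            continue  # a recorded missing-data value is acceptable
        if value not in entry.codes:
            problems.append(f"{entry.name}: unexpected value {value!r}")
    return problems


print(check_coded_values({"ID": "17", "SEX": "X", "HT": "172.4"}, DATA_DICTIONARY))
```

Keeping the dictionary in a machine-readable form like this lets the same document serve both as human-readable documentation and as a simple check on data entry.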

Data Dictionary image from “How to Make a Data Dictionary” licensed CC0.
Researchers who collect data through surveys create codebooks that describe how the data file is structured and which response codes are used to record answers (e.g., 1 = yes, 0 = no, 999 = nonresponse). Common software packages for collecting and analyzing survey responses (e.g., REDCap, SPSS) can generate codebooks automatically.
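To show why a survey codebook matters, the sketch below uses the response codes from the example above (1 = yes, 0 = no, 999 = nonresponse) to translate stored numbers into labels. The item name `Q1`, its wording, and the `decode` helper are hypothetical; real tools such as REDCap or SPSS store codebooks in their own formats.

```python
# Hypothetical codebook for one survey item, using the example response codes.
CODEBOOK = {
    "Q1": {
        "question": "Have you used the library in the past month?",  # hypothetical wording
        "codes": {1: "yes", 0: "no"},
        "missing": {999: "nonresponse"},
    }
}


def decode(variable: str, raw_value: int):
    """Translate a stored numeric code into its label; missing-data codes become None."""
    entry = CODEBOOK[variable]
    if raw_value in entry["missing"]:
        return None
    if raw_value not in entry["codes"]:
        raise ValueError(f"{variable}: code {raw_value} is not in the codebook")
    return entry["codes"][raw_value]


responses = [1, 0, 999, 1]                     # raw values as stored in the data file
decoded = [decode("Q1", r) for r in responses]  # ["yes", "no", None, "yes"]

valid = [d for d in decoded if d is not None]
share_yes = valid.count("yes") / len(valid)     # 2 of 3 valid answers
naive_mean = sum(responses) / len(responses)    # 250.25: 999 distorts the result if treated as data
```

Without the codebook, an analyst could mistake 999 for a real answer; the naive mean above shows how much a single nonresponse code can distort a summary.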
Qualitative Codebooks
Researchers who do qualitative coding of data (e.g., coding interview data by themes) create codebooks that describe and define each code (theme) used in the analysis.
These codes may be deductive (a list of codes is created beforehand and applied to the data) or inductive (the codes emerge during the analysis). Software tools for qualitative analysis (e.g., NVivo, Atlas.ti, MAXQDA) support qualitative coding and can export codebooks containing the code definitions that you, as the researcher, enter.
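The difference between deductive and inductive codes can be sketched as a small codebook that starts with predefined codes and gains a new one during analysis. This is a hypothetical illustration: the code names and definitions are invented, and the CSV export shown here is a stand-in, not the export format of NVivo, Atlas.ti, or MAXQDA.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class Code:
    name: str
    definition: str  # written by the researcher
    origin: str      # "deductive" (defined before coding) or "inductive" (emerged from the data)


# Deductive codes: defined before any transcripts are coded (hypothetical examples).
codebook = [
    Code("Cost", "Participant mentions price or affordability as a factor.", "deductive"),
    Code("Access", "Participant describes physical or scheduling barriers to access.", "deductive"),
]

# Inductive code: added partway through analysis when a new theme appears in the data.
codebook.append(Code("Peer support", "Participant credits friends or classmates with help.", "inductive"))


def export_codebook(codes) -> str:
    """Write the codebook as CSV text, one row per code."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "definition", "origin"])
    writer.writeheader()
    for code in codes:
        writer.writerow(asdict(code))
    return buffer.getvalue()


print(export_codebook(codebook))
```

Recording each code's origin alongside its definition makes it clear to later readers which themes were expected and which emerged from the data.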