2 Bioinformatics assignment 2
Pardis Mani and kathleef
Why do we want to do this assignment?
Global food demand is increasing rapidly, with projections indicating that farmers must produce 70% more food by 2050 to feed the growing world population (Alston et al. 2009; The State of Food and Agriculture 2020; Ray et al. 2023). However, traditional farming methods are reaching their limits, and climate change is exacerbating conditions by reducing the efficiency of photosynthesis in crops like rice (Figure 1) (Alston et al. 2009; Nahar et al. 2018).

One key way climate change threatens food production is by raising temperatures, which in turn reduces the efficiency of photosynthesis, especially in C3 plants like rice (Yamori et al. 2014). Under heat stress, C3 plants experience increased rates of photorespiration, a wasteful process that competes with normal photosynthesis (Peterhansel et al. 2010).
Photosynthesis relies on the enzyme RuBisCO, Ribulose bisphosphate carboxylase/oxygenase, which ideally uses CO₂ as a substrate to fix carbon and build sugars. However, RuBisCO can also mistakenly use O₂ instead of CO₂, triggering photorespiration, which consumes energy without producing sugars, thereby reducing overall photosynthetic efficiency (Peterhansel et al. 2010).
This issue escalates with rising temperatures: CO₂ solubility drops more rapidly than O₂ solubility in water, resulting in a reduced CO₂/O₂ ratio within the plant (Figure 2). Consequently, RuBisCO is increasingly likely to bind O₂ rather than CO₂, which increases photorespiration and reduces sugar yields (Ku and Edwards 1977). Therefore, C3 crop productivity is expected to decline in a warming climate, directly threatening future food security.

In contrast, C4 plants have evolved specialized anatomy and biochemistry to concentrate CO₂ inside their cells. By capturing CO₂ in mesophyll cells and transporting it to bundle sheath cells, where O₂ concentration is minimized, C4 plants largely avoid photorespiration (Figure 3) (Yamori et al. 2014). This anatomical adaptation allows them to sustain higher photosynthetic efficiency, especially in high-temperature environments.

Our long-term goal is to engineer C3 plants, such as rice, to perform C4-like photosynthesis by integrating traits such as higher vein density and carbon-concentrating anatomy. To accomplish this, we must first identify and understand the genes that control vein development and other structural characteristics essential for C4 function.
In this assignment, we are targeting genes in the C4 plant Setaria italica (foxtail millet), which serves as a model system that is simple enough to modify genetically. Utilizing CRISPR, we plan to knock out genes related to auxin signaling and vein formation. By examining the resulting phenotypes, we aim to identify crucial genes for developing C4 traits, knowledge that could ultimately aid in engineering higher-yield and climate-resilient rice.
Before getting into more details, we have to get familiar with the CRISPR-Cas system.
CRISPR-Cas in Bacteria and Archaea: A Natural Immune System
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems originated in bacteria and archaea and serve as a genetic immune mechanism against viruses and foreign DNA. When a virus invades a bacterial cell, the bacterial immune system captures small segments of the viral DNA, which are then integrated into the cell’s genome at a designated locus known as the CRISPR locus.
These integrated sequences, referred to as “spacers,” function as a genetic memory of previous infections. If the same virus attacks again, the bacteria transcribe these spacers into short RNA molecules known as CRISPR RNAs (crRNAs), each containing a sequence that matches part of a known invader.
The crRNA interacts with a small RNA known as tracrRNA (trans-activating CRISPR RNA). This tracrRNA binds to crRNA, assisting it in directing the Cas9 protein, an endonuclease, towards the target DNA. Once the corresponding sequence is located, Cas9 cleaves both strands of the viral DNA, thereby destroying it and protecting the cell.

How CRISPR-Cas was modified for gene editing
To adapt the system for genome editing, scientists simplified the bacterial CRISPR-Cas machinery by combining the crRNA and tracrRNA into a single RNA molecule called the single-guide RNA (sgRNA or gRNA). The sgRNA contains a 20-nucleotide “spacer” sequence that matches the target gene and a scaffold region that binds Cas9. When introduced into cells with the Cas9 protein, the sgRNA guides Cas9 to a specific location in the genome, where it produces a double-stranded break in the DNA. Following the cut, the cell attempts to repair it through one of two pathways:
- Non-Homologous End Joining (NHEJ): A repair method prone to errors that can lead to insertions or deletions (indels), frequently resulting in gene knockout.
- Homology-Directed Repair (HDR): A more accurate repair process that requires a DNA template.
In this project, we utilize NHEJ to knock out genes without providing a repair template.

What is the PAM sequence and why is it important?
For Cas9 to effectively cleave DNA, the target sequence must be immediately followed by a specific short sequence called the PAM (Protospacer Adjacent Motif). For Cas9, the PAM sequence is typically 5’-NGG-3’, where N represents any nucleotide. Cas9 relies on a PAM to identify and attach to the target DNA. In the absence of a PAM, even if the guide RNA perfectly matches the DNA, Cas9 cannot cleave.
In this project, when selecting a target sequence, it is crucial to account for the PAM site in your design and analysis: the 20-nt target sequence must be immediately followed by an NGG PAM in the genomic DNA, even though the PAM itself is not part of the oligo you order.
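The PAM requirement can be illustrated with a short script. This is only a minimal sketch, not the algorithm used by any particular design tool: the function names `revcomp` and `find_targets` and the demo sequence are made up for illustration. It scans both strands for every 20-nt stretch that is immediately followed by NGG.

```python
import re

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_targets(seq):
    """List (strand, start, protospacer, pam) for each 20-nt site followed by NGG.

    For the "-" strand, start is counted on the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # A lookahead is used so that overlapping candidate sites are all reported.
        for m in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits

# Hypothetical demo sequence: one 20-nt site followed by the PAM "TGG".
demo = "ATGCATGCATGCATGCATGC" + "TGG" + "AAAA"
for hit in find_targets(demo):
    print(hit)
```

A sequence without any NGG on either strand returns an empty list, which mirrors the point above: without a PAM there is nothing for Cas9 to cut, however good the 20-nt match is.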
Key Takeaways
Component | Function |
---|---|
crRNA | Provides the guide sequence complementary to the target DNA. |
tracrRNA | Stabilizes crRNA and facilitates its loading into Cas9. |
sgRNA / gRNA | Engineered fusion of crRNA and tracrRNA into a single guide RNA. |
Cas9 | An endonuclease that cuts DNA at the target site. |
PAM | A required short DNA sequence adjacent to the target site that Cas9 must recognize to cut (typically 5’-NGG-3’). |
What do we want to do?
In this assignment, we aim to design a pair of CRISPR oligonucleotides that guide the Cas9 protein to a specific genomic region within an assigned Setaria italica (foxtail millet) gene. By targeting and disrupting genes associated with vein development and auxin signaling, we contribute to a broader project focused on engineering C3 crops, such as rice, to improve photosynthesis and increase heat tolerance.
To achieve this effectively, we must select a target sequence that is:
- Positioned within a large exon (to prevent introns from interfering with the match),
- Close enough to the gene’s center to reduce the risk of affecting only partial functions,
- Associated with a PAM sequence (5’-NGG-3’) for Cas9 recognition,
- Specific (with low off-target risk) and effective (with good GC content and efficacy score).
What steps are involved?
- Find your assigned gene’s genomic DNA (gDNA) sequence using the accession number. For this step, we use Phytozome.
- Choose an appropriate oligo sequence among the available options using a CRISPR design tool.
What is happening in each step?
Now let’s talk about each step in more depth:
1. Find your assigned gene’s genomic DNA (gDNA) sequence.
In this step, we work with the genomic DNA (gDNA) sequence, including exons and introns. Since CRISPR-Cas9 specifically targets DNA and not RNA, it is essential to design our guide RNA (gRNA) against a continuous exon sequence to ensure accurate targeting and disruption of the gene.
Using cDNA, which consists solely of spliced exons, would result in the loss of critical information regarding the exon boundaries in the original genome. This could lead to the unintentional design of a gRNA that spans an exon-exon junction, preventing it from properly matching the unspliced genomic DNA. Consequently, this mismatch would hinder the gRNA’s ability to recognize the target site, thereby preventing Cas9 from effectively cutting the DNA.
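The junction problem can be shown with a toy example. The sketch below uses invented sequences (two hypothetical exons and one intron, not a real Setaria italica gene) to show that a guide copied from cDNA across an exon-exon junction has no continuous match in the genomic DNA, whereas a guide taken from inside a single exon does.

```python
# Hypothetical toy gene: two exons separated by one intron.
exon1 = "ATGGCTAGCTAGGATCCGATCGTAGCTAGC"
intron = "GTAAGTTTTCCCTTTTCTGCAG"
exon2 = "TTGACCGGTACGATCGATTAGCCTAGGTAA"

gdna = exon1 + intron + exon2   # genomic DNA: what Cas9 actually encounters
cdna = exon1 + exon2            # spliced transcript: the intron is gone

# A 20-nt guide copied from the cDNA across the exon1/exon2 junction.
junction = len(exon1)
spanning_guide = cdna[junction - 10:junction + 10]

# A 20-nt guide copied from inside exon 1 only.
exon_guide = exon1[5:25]

print("junction-spanning guide found in gDNA:", spanning_guide in gdna)  # False
print("single-exon guide found in gDNA:", exon_guide in gdna)            # True
```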
1.1. Access the Phytozome database:
1.2. Search for your gene.
- Choose Setaria italica v2.2 in the first search bar.
- After selecting your species, select “find genes by keyword.”
- In the second search bar, enter your designated gene accession number.
- Click search.
1.3. Locate and download the genomic DNA sequence.
- Choose the genomic DNA.
- To get the sequence, you might have to scroll down a bit. Then choose “Genomic Sequence.”
- From your sequence, choose an exon that is:
  - Long enough (usually over 100–150 bp) to allow for flexibility in target site selection.
  - In the middle of your gene. Avoid targeting exons at the very beginning (e.g., start codon region) or at the very end (near stop codon).
- Save your exon for the next step.
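If you have the exon coordinates written down, the two exon criteria above can be checked with a few lines of code. This is a rough sketch under stated assumptions: the helper name `pick_exon`, the 100-bp cutoff, and the coordinates are all hypothetical, and "middle of the gene" is approximated as the exon whose midpoint lies closest to the gene midpoint.

```python
def pick_exon(exons, gene_length, min_len=100):
    """Return the (start, end) exon that is at least min_len bp long and whose
    midpoint is closest to the middle of the gene, or None if none qualifies.

    Coordinates are 0-based and end-exclusive.
    """
    long_enough = [(s, e) for s, e in exons if e - s >= min_len]
    if not long_enough:
        return None
    gene_mid = gene_length / 2
    return min(long_enough, key=lambda se: abs((se[0] + se[1]) / 2 - gene_mid))

# Hypothetical exon coordinates on a 3,000-bp gene.
exons = [(0, 180), (400, 460), (900, 1300), (1700, 1950), (2600, 3000)]
chosen = pick_exon(exons, gene_length=3000)
print(chosen)  # (1700, 1950)
```

The 60-bp exon is rejected for length, and the first and last exons lose out because they sit near the start and stop codons.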
Common Mistakes & Troubleshooting:
- Retrieved cDNA or mRNA sequences instead of gDNA? Ensure you choose the genomic DNA tab, not the “transcript sequence” or “CDS sequence.”
- Selected the wrong gene version or species? Verify that you are using Setaria italica v2.2 before proceeding and double check your accession number.
- Missed a portion of the sequence while copying and pasting? Ensure you select the entire gDNA sequence, from the first base to the last.
2. Choose an appropriate oligo sequence amongst the available options.
2.1. Access a CRISPR design tool:
2.2. Input selected exon sequence:
- Paste your selected exon sequence into the query box.
2.3. Adjust the settings before submitting:
- Keep most settings on default.
- For in vitro transcription, choose “custom” and add the cloning overhangs:
  - Forward overhang: 5′-GGCA
  - Reverse overhang: 5′-AAAC
- Set species to: Foxtail millet (Setaria italica v2.2)
- Submit your sequence.
Why do we add these overhangs to CRISPR oligos?
When designing oligos for CRISPR experiments, we add specific 4-nucleotide sequences known as overhangs at the 5′ ends of both forward and reverse oligos. Although these overhangs do not form part of the target sequence, they play a crucial role in inserting the guide RNA into a CRISPR expression vector.
These overhangs are complementary to the sticky ends left on the plasmid vector after it is cut with a restriction enzyme. For this project, we use BsaI to cut the plasmid vector, so the oligos are designed with matching overhangs for accurate insertion into the plasmid. BsaI is a Type IIS restriction enzyme that cuts outside its recognition sequence, creating specific 4-base sticky ends.
The two overhangs, GGCA and AAAC, are specifically chosen to be different. This is important for two main reasons:
- Ensures correct directionality: The distinct nature of the overhangs guarantees that the guide RNA is inserted into the plasmid in only one direction, ensuring the vector’s promoter accurately transcribes the guide RNA.
- Prevents vector self-ligation: If the overhangs were identical, the plasmid could re-ligate to itself, leading to constructs devoid of inserts. Using non-palindromic, non-complementary overhangs prevents the plasmid from closing without inserting the guide RNA, which boosts cloning efficiency and reduces screening time.
If the overhangs are inaccurate, the plasmid and oligos will fail to ligate, rendering the CRISPR construct ineffective.
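To make the oligo layout concrete, here is a minimal sketch of how such a pair could be assembled by hand. It follows the description in this assignment (PAM omitted, overhangs on the 5′ end of each oligo); the function name `design_oligos` and the example guide are hypothetical, and the design tool may apply additional vector-specific rules that are not modelled here.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def design_oligos(guide, fwd_overhang="GGCA", rev_overhang="AAAC"):
    """Build a forward/reverse oligo pair from a 20-nt guide (PAM excluded).

    forward = forward overhang + guide
    reverse = reverse overhang + reverse complement of the guide
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be exactly 20 nt of A/C/G/T, without the PAM")
    return fwd_overhang + guide, rev_overhang + revcomp(guide)

# Hypothetical 20-nt guide, for illustration only.
forward, reverse = design_oligos("GATTACAGATTACAGATTAC")
print("Forward: 5'-" + forward + "-3'")  # GGCAGATTACAGATTACAGATTAC
print("Reverse: 5'-" + reverse + "-3'")  # AAACGTAATCTGTAATCTGTAATC
```

When the two oligos anneal, the 20-nt guide region pairs up and the two different 4-nt overhangs are left single-stranded at either end, which is what fixes the insert's orientation in the vector.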
2.4. Interpret the results:
- A map will display the locations of the target sites.
- Below that, you can see the details about each target site candidate.
2.5. Select a target that meets these criteria:
- High efficacy score (the top of the list is usually best).
- No mismatches in the on-target row (your gene).
- Prefer a target whose off-targets all carry four mismatches, mostly within the core region (shown in [ ]).
- Prefer off-targets that fall in introns or intergenic areas (look for “yellow” or “green” boxes).
- Try to avoid targets that have off-targets within exons.
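The mismatch criterion can be illustrated by comparing a guide with a candidate off-target site position by position. This is only a sketch: the function name `mismatch_profile` and both sequences are hypothetical, and the core is assumed here to be the 12 PAM-proximal nucleotides, which may not match how the design tool defines the region it shows in [ ].

```python
def mismatch_profile(guide, site, core_len=12):
    """Count mismatches between a guide and an equally long off-target site.

    Both sequences are written 5'->3' with the PAM following the 3' end, so the
    last core_len positions are the PAM-proximal "core" (an assumption).
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    flags = [a != b for a, b in zip(guide.upper(), site.upper())]
    core = sum(flags[-core_len:])
    return {"total": sum(flags), "core": core, "distal": sum(flags) - core}

# Hypothetical guide and off-target site differing at four positions.
guide = "GATTACAGATTACAGATTAC"
off_target = "CATTACAGATAACAGCTTAG"
print(mismatch_profile(guide, off_target))
# {'total': 4, 'core': 3, 'distal': 1}
```

An off-target like this one, with four mismatches and most of them in the core, is much less likely to be cut than one that differs from the guide at only one or two PAM-distal positions.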
What is efficacy?
When designing a CRISPR guide RNA (gRNA), the efficacy score predicts the probability of successful binding and cutting of the desired DNA sequence. CRISPR design tools determine this score from multiple factors:
- Off-target effects
  Off-targets are DNA sites that partially match your gRNA.
  If your guide has many off-targets, or off-targets that match it closely, Cas9 may inadvertently cleave the wrong gene, thereby decreasing specificity and increasing unwanted effects.
  Choose a guide with minimal off-targets, particularly none in coding regions.
- GC content (Recommended: 40–60%)
  The GC content influences the gRNA’s binding affinity to the DNA.
  A low GC content results in weak binding, while a high GC content leads to overly stable (i.e., inflexible) binding.
  Aim for approximately 40–60% GC content for the best targeting performance.
- Nucleotide position effects
  Specific nucleotides in certain positions of the guide RNA can enhance or weaken activity.
  The region closest to the PAM (positions 1–12), called the “seed region,” is particularly important.
  The positive or negative effect depends on the nucleotide and its position: for example, having a T at position four or a G at position two might reduce efficacy.

The image below highlights the positions most sensitive in the PAM-proximal versus PAM-distal regions, based on studies of guide performance (Jung et al. 2024).
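Of the factors above, GC content is the easiest to verify yourself. The following is a small sketch with hypothetical helper names and example guides; it only computes the percentage and checks it against the recommended 40–60% window, and does not reproduce the design tool's efficacy score.

```python
def gc_content(seq):
    """Return the percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)

def gc_in_range(seq, low=40, high=60):
    """True if the GC content falls inside the recommended window."""
    return low <= gc_content(seq) <= high

# Hypothetical 20-nt guides.
for guide in ("GATTACAGATTACAGATTAC", "GACCTGATCGGATCCTAGCA"):
    print(guide, gc_content(guide), gc_in_range(guide))
# GATTACAGATTACAGATTAC 30.0 False
# GACCTGATCGGATCCTAGCA 55.0 True
```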
2.6. Confirm location:
- Copy the 23 bp target (20-nt guide + PAM) and search your gDNA sequence to ensure it is entirely within the exon.
- If located on the − strand, obtain the reverse complement using: http://www.bioinformatics.org/sms/rev_comp.html.

The CRISPR design tool provides the oligo pair, including forward and reverse strands, with the PAM site omitted and the appropriate overhangs added to the 5’ ends of each strand.
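The location check can also be done programmatically instead of with a text search. The sketch below is an illustration under stated assumptions: `locate_target`, the toy sequences, and the exon coordinates (0-based, end-exclusive) are hypothetical. It looks for the 23-bp target on the + strand first and then, via the reverse complement, on the − strand.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def locate_target(target, gdna, exon_start, exon_end):
    """Find a 23-bp target (20-nt guide + PAM) in gDNA on either strand.

    Returns (strand, position, inside_exon) for the first match, or None.
    position is the 0-based index on the gDNA as given (+ strand coordinates).
    """
    gdna, target = gdna.upper(), target.upper()
    for strand, query in (("+", target), ("-", revcomp(target))):
        pos = gdna.find(query)
        if pos != -1:
            inside = exon_start <= pos and pos + len(query) <= exon_end
            return strand, pos, inside
    return None

# Hypothetical 23-bp target and toy gDNA sequences.
target = "GATTACAGATTACAGATTAC" + "AGG"
gdna_plus = "T" * 10 + target + "T" * 10
gdna_minus = "T" * 10 + revcomp(target) + "T" * 10

print(locate_target(target, gdna_plus, exon_start=5, exon_end=40))   # ('+', 10, True)
print(locate_target(target, gdna_minus, exon_start=0, exon_end=20))  # ('-', 10, False)
```

In the second call the match runs past the end of the exon, so the target would be rejected even though the sequence is present in the gDNA.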
Common Mistakes & Troubleshooting:
- Used default overhangs instead of custom ones? Always add GGCA and AAAC under “custom” overhangs before running the tool.
- Used the same overhangs for both oligos? The forward and reverse oligos must have distinct overhangs to guarantee directionality.
- Included the PAM in the oligo? Remember that the PAM is not part of the oligo.
References:
Al-Salim, S.H.F., Al-Edelbi, R., Aljbory, F., and M.M. Saleh, 2016 Evaluation of the Performance of Some Rice (Oryza sativa L.) Varieties in Two Different Environments. OALib. 03: 1–7
Alston, J.M., Beddow, J.M., and P.G. Pardey, 2009 Agricultural Research, Productivity, and Food Prices in the Long Run. Science. 325: 1209–1210
CRISPR Systems Doudna Lab
Jung, W.J., Park, S.-J., Cha, S., and K. Kim, 2024 Factors affecting the cleavage efficiency of the CRISPR-Cas9 system. Animal Cells and Systems. 28: 75–83
Ku, S.-B., and G.E. Edwards, 1977 Oxygen Inhibition of Photosynthesis: I. Temperature Dependence and Relation to O2/CO2 Solubility Ratio. Plant Physiol. 59: 986–990
Nahar, A., Luckstead, J., Wailes, E.J., and M.J. Alam, 2018 An assessment of the potential impact of climate change on rice farmers and markets in Bangladesh. Climatic Change. 150: 289–304
Peterhansel, C., Horst, I., Niessen, M., Blume, C., Kebeish, R., Kürkcüoglu, S., and F. Kreuzaler, 2010 Photorespiration. The Arabidopsis Book. 8: e0130
Rasul, M.F., Hussen, B.M., Salihi, A., Ismael, B.S., Jalal, P.J., Zanichelli, A., Jamali, E., Baniahmad, A., Ghafouri-Fard, S., Basiri, A., and M. Taheri, 2022 Strategies to overcome the main challenges of the use of CRISPR/Cas9 as a replacement for cancer therapy. Mol Cancer. 21: 64
Ray, A., Rai, A., and S. Ravichandran, 2023 Impact of Agriculture Production on Climate: Contributor and Victim. International Journal of Green Chemistry. 9: 37–43
Sauvagère, S., and C. Siatka, 2023 CRISPR-Cas: ‘The Multipurpose Molecular Tool’ for Gene Therapy and Diagnosis. Genes. 14: 1542
Smith, M., 2025 CRISPR. National Human Genome Research Institute
The State of Food and Agriculture 2020, 2020 FAO.
Understanding food insecurity
What is CRISPR? A bioengineer explains
Yamori, W., Hikosaka, K., and D.A. Way, 2014 Temperature response of photosynthesis in C3, C4, and CAM plants: temperature acclimation and temperature adaptation. Photosynth Res. 119: 101–117