Course Review: Bioinformatics I
I last took biology in high school, and I have been interested in learning more out of my own curiosity, especially at the intersection of biology and computation, and all the more so because of the pandemic.
Over the last week, I completed Coursera’s Finding Hidden Messages in DNA (Bioinformatics I).
Overall, I enjoyed the course. It’s a nice sampler of bioinformatics, and it piqued my curiosity for the rest of the specialization.
Coming from a professional programming background, I was surprised at how much overlap there was with basic algorithms: Big O running time, greedy and randomized algorithms, and Longest Common Substring all make appearances (more below). The course touches on just enough biology to scaffold the computational problems, so I will be looking to learn some more introductory college biology and genetics.
Key Biology Concepts Covered:
- (Weeks 1-2) DNA Replication - how DNA is copied in simple organisms with circular DNA, and finding where replication begins and ends using various computational methods
- (Weeks 3-4) DNA Motif Finding - finding common patterns in DNA that correspond to some biological function, such as circadian rhythm in plants
Computation Topics Touched Upon:
- Algorithm Running Time - Big O running time. Motif finding algorithms can be quite expensive and algorithm design matters.
- Probability / Joint Probability Distribution - Estimating the probability of a sequence based on a candidate profile (probability distribution)
- Greedy Algorithms + Randomized Algorithms
- Laplace Smoothing
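To make the profile, joint-probability, and Laplace smoothing bullets above a bit more concrete, here is a minimal Python sketch. It is not the course's reference code: the function names (`build_profile`, `kmer_probability`, `most_probable_kmer`) and the toy motifs are made up for illustration, and it assumes a profile is stored as one dictionary of nucleotide probabilities per column.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def build_profile(motifs, pseudocount=1):
    """Build a column-wise probability profile from equal-length motifs.

    With pseudocount=1 this applies Laplace smoothing: every nucleotide
    gets one extra count per column, so none ends up with probability 0.
    (Hypothetical helper; the name and layout are assumptions.)
    """
    total = len(motifs) + pseudocount * len(NUCLEOTIDES)
    profile = []
    for column in zip(*motifs):  # iterate over motif positions
        counts = Counter(column)
        profile.append(
            {base: (counts[base] + pseudocount) / total for base in NUCLEOTIDES}
        )
    return profile


def kmer_probability(kmer, profile):
    """Probability of a k-mer under the profile, treating columns as independent."""
    probability = 1.0
    for base, column in zip(kmer, profile):
        probability *= column[base]
    return probability


def most_probable_kmer(text, k, profile):
    """Return the k-mer in text with the highest probability under the profile."""
    kmers = (text[i:i + k] for i in range(len(text) - k + 1))
    return max(kmers, key=lambda kmer: kmer_probability(kmer, profile))


if __name__ == "__main__":
    motifs = ["ACG", "ACT", "AGG"]  # toy data, not from the course
    smoothed = build_profile(motifs)
    unsmoothed = build_profile(motifs, pseudocount=0)
    # "TCG" never starts with T in the motifs, so only smoothing keeps it non-zero.
    print(kmer_probability("TCG", unsmoothed), kmer_probability("TCG", smoothed))
    print(most_probable_kmer("TTACGTT", 3, smoothed))
```

Without the pseudocount, a single unseen nucleotide in any column zeroes out the whole product, which is why smoothing matters when a profile is built from only a few motifs.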
Tips / Recommendations for Future Learners:
- I worked on the “Honors Track”, which has additional computational problems. The problems were not too difficult to solve, roughly on par with a “Leetcode” Medium, and they come with detailed pseudocode.
- The problems typically come with a toy example data set to validate your answer against. The actual grading is done on a full randomized data set. I found that if I could get the correct solution for the toy data set, 90% of the time my algorithm was implemented correctly for the full data set.
- Modularize your code into libraries. Code that you write in Week 2 will be useful in Week 3, Week 3 code will be useful in Week 4, and so on. Code reuse will make your life easier.
Feel free to check out my code.