I last took biology in high school, and I have been interested in learning more for my own curiosity, especially at the intersection of biology and computation, and all the more so because of the pandemic.

Over the last week, I completed Coursera’s Finding Hidden Messages in DNA (Bioinformatics I).

Overall, I enjoyed the course. It’s a nice sampler of bioinformatics and piqued my curiosity for the rest of the specialization.

Coming from a professional programming background, I was surprised by how much overlap there was with basic algorithms: Big O runtime, greedy and randomized algorithms, and Longest Common Substring all make appearances (more below). The course touches on just enough biology to scaffold the computational problems, so I will be looking to learn some more introductory college biology and genetics.

Key Biology Concepts Covered:

  • (Week 1 / Week 2) DNA Replication - how DNA is copied in simple organisms with circular DNA, and finding where replication begins and ends via various computational methods
  • (Week 3 / Week 4) DNA Motif Finding - finding common patterns in DNA that correspond to some biological function, such as circadian rhythm in plants
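
To give a flavor of the replication problems, here is a minimal sketch of one technique for guessing where replication begins: tracking the running difference between G and C counts (the "GC skew") and reporting where it bottoms out. The function name `skew_minima` and the 1-based position convention are my own choices for illustration, not taken from the course materials.

```python
def skew_minima(genome: str) -> list[int]:
    """Return every position where the running (#G - #C) skew is lowest.

    Position i means "after reading the first i bases", so position 0
    is the empty prefix. A minimum of the skew is a candidate for the
    replication origin.
    """
    skew = 0
    min_skew = 0
    positions = [0]  # the empty prefix has skew 0
    for i, base in enumerate(genome, start=1):
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        # A and T leave the skew unchanged.
        if skew < min_skew:
            min_skew = skew
            positions = [i]
        elif skew == min_skew:
            positions.append(i)
    return positions


print(skew_minima("CCGG"))  # [2]
```

This runs in a single pass over the genome, which matters once the input is a full bacterial chromosome rather than a toy string.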

Computation Topics Touched Upon:

  • Algorithm Running Time - Big O running time. Motif finding algorithms can be quite expensive and algorithm design matters.
  • Probability / Joint Probability Distribution - Estimating the probability of a sequence based on a candidate profile (probability distribution)
  • Greedy Algorithms + Randomized Algorithms
  • Laplace Smoothing
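
The probability and Laplace smoothing items fit together, and a small sketch may make that concrete. Assuming a profile is stored as one dictionary of base probabilities per column, the probability of a k-mer is the product of its per-column probabilities, and adding a pseudocount to every base keeps a single unseen base from zeroing out the whole product. The names `build_profile` and `kmer_probability` are hypothetical helpers of mine, not the course's.

```python
from collections import Counter

BASES = "ACGT"


def build_profile(motifs: list[str], pseudocount: int = 1) -> list[dict[str, float]]:
    """Build a column-wise probability profile from equal-length motifs.

    With pseudocount=1 this is Laplace smoothing: every base is treated
    as if it had been seen one extra time in every column.
    """
    k = len(motifs[0])
    total = len(motifs) + len(BASES) * pseudocount
    profile = []
    for col in range(k):
        counts = Counter(motif[col] for motif in motifs)
        profile.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return profile


def kmer_probability(kmer: str, profile: list[dict[str, float]]) -> float:
    """Probability of a k-mer under the profile, treating columns as independent."""
    probability = 1.0
    for base, column in zip(kmer, profile):
        probability *= column[base]
    return probability


motifs = ["AC", "AG"]
unsmoothed = build_profile(motifs, pseudocount=0)
smoothed = build_profile(motifs, pseudocount=1)
print(kmer_probability("AT", unsmoothed))  # 0.0, because T never appears in column 2
print(kmer_probability("AT", smoothed) > 0)  # True
```

Without smoothing, any k-mer containing a base the motifs have not shown in that column gets probability zero, which can lock a greedy or randomized search out of good candidates early on.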

Tips / Recommendations for Future Learners

  • I worked on the “Honors Track” with additional computational problems. The problems were not too difficult to solve, similar to a “Leetcode” Medium, with detailed pseudocode.
    • The problems typically come with a toy example data set to validate your answer against. The actual grading is done on a full randomized data set. I found that if I could get the correct solution for the toy data set, my algorithm was implemented correctly for the full data set about 90% of the time.
  • Modularize your code into libraries. Code that you write in Week 2 will be useful in Week 3, Week 3 code will be useful in Week 4, and so on. Code reuse will make your life easier.

Feel free to check out my code.