Weekly outline

  • Welcome to Bioinformatics 🧬 Algorithms & Data Structures!

    This course will be held by me, Gökçe

    📡 Class room

    aydos.de/binfroom

    like inf2

    Class Recordings

    like inf2

    The first recording from 2020-03-26 is not listed on aydos.de/binfroom

    Minutes & Code examples from the Class

    https://mygit.th-deg.de/lsi/binf-notes

    Meeting room

    like inf2

    Class

    Thursdays 14:00 to 15:30

    Flipped class like inf2

    Textbook

    We will use the book Bioinformatics - Algorithms and Data Structures from Phillip Compeau & Pavel Pevzner, the third edition. The first chapters are available at the book website. The author said he will open up all the chapters in the next weeks.

    Interactive Online Course

    There is an interactive online course which contains all the book contents and interactive coding challenges

    Bioinformatics Algorithms: An Active Learning Approach @ Stepik

    The first chapter is free:

    Bioinformatics I: Finding Hidden Messages in DNA

    If you do not want to. I think the online version of the textbook should be sufficient. The textbook seems to contain all the content in the print edition and the code challenges are also available at Rosalind.

    Another free interactive course based on the book is available at Coursera. I only briefly audited this course, but it seems to be a subset of the textbook. The advantage is that there are peer-reviewed assignments.

    Videos

    https://www.bioinformaticsalgorithms.org/lecture-videos

    Slides

    If you need the slides, please contact me.

    Exam

    written exam on 30.07.2020, open book & internet & computer

    Assignments

    like inf2

    Critical Awareness

    like inf2

    Moodleoverflow

    like inf2

  • CW13 — March 26th

    Prep

    • Please finish the following feedback form
    • Finish the following subchapters on the website or in the free Stepik course:
      • 1.1 (A Journey of a Thousand Miles...) to including 1.4 (An Explosion of Hidden Messages)
      • take the detours if you can and please try to do the coding challenges (on Rosalind or Stepik) using Python or command line
    • We will discuss your questions on Thursday at 2pm in the online meeting room

    Agenda

    • Testing microphone
    • Introduction new students
    • course start feedback
    • course details
    • poll: could you finish the chapters?
    • meaning of 5 ECTS
      • 1 ECTS ~= 28h
      • 5 ECTS ~= 140h
      • time for class preparation and assignments = 140h - 24h (classes ~1.5h x 16 weeks) - 20h (exam prep) = ~96h
      • prep time per week = ~6h
    • poll: which language did you use?
    • questions
    • It has not escaped our... — meaning
    • hidden messages
      • https://en.wikipedia.org/wiki/The_Gold-Bug
      • https://en.wikipedia.org/wiki/Caesar_cipher
    • Pseudocode
      • exercise: character count in a string
    • one minute paper
      • How useful was the online class today? Please give details.

  • CW14 — April 2th

    Prep

    • until including Chapter 1.7

    Assignments to Submit

    • character count in a string as pseudo code

       input: string of length n, character c

       output: number of c's in the string

    • Implementation of PatternCount() in Python (from Chapter 1.2)
    • Please submit them in one archive (e.g., zip)

    Agenda

    • Feedback
    • Real-time tracking of pathogen evolution
    • learning goal
      • understand what pseudo code is and be able to write a basic problem in pseudo code
      • dictionary vs list
    • assignment discussion
      • code example
    • questions
    • Pseudocode
      • group exercise:
        • input: k-mer kmer, DNA-sequence seq
        • output: number of the k-mers kmer in the DNA-sequence seq
    • frequentWords algorithm
      • discussion in groups
        • what if we search for all possible k-mers in one singe iteration?
    • dict vs list (vs set)
      • list: order important
      • dict: faster to access the elements
      • https://docs.python.org/3/tutorial/datastructures.html
  • CW15 — April 9th

    Prep

    • until including Chapter 1.8

    Agenda

    • assignment discussion
    • questions
    • creating a Python library
    • Python style guide
    • evaluating the runtime
      • big-O notation
      • examples
        • patternCount
        • frequentWords
        • O(1)
        • O( n)
        • O(n^2)
        • O(2^n)
    • What is the probability that two randomly generated strings of length 9 in the DNA alphabet ({A,C,G,T}) are identical?

      • Pseudocode
        • Chapter 1.3 Reverse Complement Problem
      • Python codes for the following coding challenges
        • Chapter 1.2 frequent_words problem
        • Chapter 1.3 Reverse Complement Problem
        • Chapter 1.3 Pattern Matching Problem
  • CW 16 — April 16th

    Prep

    • until including Chapter 2.3

    Agenda

      • consider the following function
        • input: array of integers
        • output: minimum of these numbers
      • write the pseudocode for your algorithm
      • implement it in Python
      • calculate the runtime
        • best case
        • worst case
      • is your function iterative or recursive?
        • how would you convert it to an iterative or recursive function?
  • CW 17 — April 23th

    Prep

    • until including Chapter 2.4

    Agenda

    • feedback
    • questions
    • Hanoi towers
      • can you find a formula for T(n) that does not require recursion?
    • brute force algorithms
    • entropy
  • CW 18 — April 30th

    Prep

    • until including Chapter 2.7

    Agenda

    • feedback
    • grading
      • idea: to have multiple grades instead of one single exam
      • waiting for feedback from faculty
    • where are we?
      • (regulatory) Motif finding => searching for genes which are circadian
        • but the motifs we find are mutated
    • where are you?
      • until which chapter did you prepare?
    • questions
    • slides into 2nd chapter
    • What is the expected number of occurrences of a 9-mer in 500 random DNA strings, each of length 1000? Assume that the sequences are formed by selecting each nucleotide (A,C,G,T) with the same probability (0.25). (Chapter 2.2.2)
      • ~1.8
      • https://brilliant.org/wiki/linearity-of-expectation/
  • CW 19 — May 07

    Prep

    • until including Chapter 2.8

    Agenda

    • feedback
    • grading
      • graded assignments + 90 min exam
      • not allowed by the exam commission
    • in which chapter are you?
      • do you use the lecture videos for prep?
    • questions
    • what is heuristic
      • e.g., chess
      • e.g., local search heuristic
  • CW 20 — May 14

    Prep

    • until including Chapter 3.7

    Agenda

    • feedback
    • in which chapter are you?
    • your questions
    • we calculated the probability for
      • 10 randomly selected 15-mers from the 10 600-nuc-long dna_strings capture *at least* one impl  anted 15-mer
      • we found ~.017
      • what does it mean for us in case of randomizedMotifSearch()
    • do you use ipython?
      • reloading a module
      • import importlib
      • importlib.reload(your_library)
    • howto implement random() in Gibbs Sampling using random_integer(int)
  • CW 21 — May 21

    💤

  • CW 22 — May 28

    Prep

    • until including Chapter 3.7
    • in which code challenges did you have problems?

    Agenda

    • feedback
    • discussions of code challenges in chapter 3
  • CW 23 — Jun 04

    Prep

    • until including Chapter 3.8 (do not forget the coding challenges)

    Agenda

    • feedback
    • in which chapter are you?
    • discussion from last session
      • does information get lost when we convert an overlap graph to a de Bruijn graph?
      • it depends if we are working on the composition (exploded newspaper particles) or the overlap graph constructed directly from the string
      • example ATGATGGATG
      • the overlap graph constructed from the string does not equal to the overlap graph built from the composition
      • de Bruijn graph is the same for both
      • but our problem is based on exploded newspaper anyway
    • https://aydos.de/jhub
    • assignments ba3e, ba3f
    • Midterm Eval:
      • http://tinyurl.com/yblqfbdv
      • also give feedback: do you profit from the teaching style (preparation + discussing only assignments & key points & your questions)
  • CW 25 — Jun 18

    Prep

    • Chapter 3.9 (Assembling Genomes from Read-pairs)

    Agenda

  • CW 26 — Jun 25

    Prep

    • 3.10 Epilogue: Genome Assembly Faces Real Sequencing Data
    • pre-class assignment ↴

    Agenda

    • feedback

      • The self-preparation of the chapters is not sufficient I think. Maybe we could just sum up what we prepared together at the beginning of the lecture
      • Generally, the online course spends much time with developing the ideas behind algorithms. It starts with "let's try it..." and then finds out certain weaknesses of the tried approach to afterwards improve it iteratively. I think we should spend more time on explaining the final algorithm than on developing approaches that we will find out they're not good enough anyway
      • Also, with regard to preparation for the exam, it is hard to judge which of the contents is relevant and which is not so relevant.
    • pre-class assignment discussion

    • questions
  • CW 27 — Jul 02

    Prep

    Pre-class assignment

    This is a Jupyter Notebook file. Either:

    • use aydos.de/jhub

      • login with your university credentials
      • go to Assignments tab
      • fetch motifs assignment
      • suffer or have fun with it & complete what you can
      • you can run individual cells using CTRL + Enter
      • the tests for coding assignments are included (assertions)
      • the answers for questions are not tested, we will discuss them in class
      • you can turn your solutions in via submit button, then I can browse your answers during class
    • download it and run it on your own Jupyter Notebook server and turn your solutions in via https://aydos.de/jhub

  • CW 28 — Jul 09

    • semester overview & summary

      • what is computational thinking
      • how do biologists search, how do computer scientists search?
      • how do we search for patterns?
      • how to write your approach as pseudocode?
      • which algorithms are better than the others?
      • how do we calculate the runtime complexity of algorithms?
      • what is a DnaA box?
      • what is the motivation behind skew diagrams?
      • how to write an algorithm iteratively and recursively?
      • what are motifs?
      • how do we search for motifs?
      • what is a profile matrix?
      • what is the difference between count- and entropy-based score?
      • what is a median string?
      • what are the four algorithms we used for motif search? What are the differences?
      • what is the analogy between an exploded newspaper and genome assembly?
      • how do we use graphs to assemble a genome?
      • what are the differences between overlap and de Bruijn graphs?
      • what is an Eulerian graph?
      • what is en Eulerian cycle?
      • what is the advantage of read-pairs?
      • what are the challenges of real sequencing data? How can we overcome them?