COSC 348: Lab06

Hidden Markov Models (HMMs)

Work through the section "Scoring a Sequence with an HMM", up to the forward algorithm, in the online tutorial by Rachel Karchin from the department of Bioinformatics and Computational Biology, University of California, Santa Cruz, USA:
      Hidden Markov Models and Protein Sequence Analysis.

(Rachel has given permission for us to use her tutorial material.)
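
As a warm-up for that section, here is a minimal sketch of the forward algorithm, which scores a sequence by summing the probability of every state path that could have emitted it. It assumes a small, fully connected HMM with no silent (delete) states and a non-empty sequence, so it is simpler than the profile HMM in the tutorial. The function name `forward`, the state labels and all the probabilities are hypothetical, chosen only for illustration.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over all state paths (obs must be non-empty)."""
    # f[s] = probability of emitting the symbols seen so far and ending in state s
    f = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Extend every path by one step: move r -> s, then emit the symbol from s
        f = {
            s: emit_p[s][symbol] * sum(f[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(f.values())


# Toy two-state model over a two-letter alphabet (all numbers are hypothetical)
states = ("H", "P")
start_p = {"H": 0.5, "P": 0.5}
trans_p = {"H": {"H": 0.7, "P": 0.3}, "P": {"H": 0.4, "P": 0.6}}
emit_p = {"H": {"A": 0.6, "G": 0.4}, "P": {"A": 0.2, "G": 0.8}}

score = forward("AG", states, start_p, trans_p, emit_p)
```

Because the sum over paths is built up one position at a time, the cost grows linearly with the sequence length rather than with the number of possible paths. Real implementations usually work with log probabilities to avoid underflow on long sequences; this sketch does not.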


IMPORTANT

Save the last 30 minutes or so for writing (and submitting) a one-page report in which you answer the questions below. Remember, today's lab work and report are worth 1% of your final mark.

In your report, try to answer these questions (time permitting):

  • Given an HMM with 3 match states, 3 delete states and 4 insertion states (as in Figure 8), what is the minimum number of states that a "random walk" will visit? What is the maximum? (Do not count the BEGIN and END states.)
  • If a "random walk" generates a sequence of L amino acids, and we define N as the number of states visited in that path, is it true that L is always less than or equal to N? Why?
  • In a pairwise alignment, a higher penalty is generally imposed for a gap "opening" than for a gap "extension". Is this feature present in a profile HMM? Why?
  • Why are HMMs better suited to finding distant homologs than profiles based on position-specific probability distributions?
  • An HMM is very sensitive to the quality of the training set. Discuss briefly how each of the following limitations affects the resulting model, and suggest a possible solution for each.
    1) Training sequences are not evenly distributed in the sequence space of their class
    2) There are too many sequences in the training set and they cover a huge space
    3) There are only a few sequences available for the training set
