COSC 348: Lab06

Hidden Markov Models (HMMs)

Work through the section "Scoring a Sequence with an HMM", up to the forward algorithm, in the online tutorial by Rachel Karchin of the Department of Bioinformatics and Computational Biology, University of California, Santa Cruz, USA:
Hidden Markov Models and Protein Sequence Analysis.

(Rachel has given permission for us to use her tutorial material.)
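
As a warm-up for that section, the forward algorithm can be tried out on a toy model. The following is only a minimal sketch, not code from the tutorial: it assumes a plain two-state HMM with no silent delete states and no BEGIN/END states (so it is simpler than the profile HMM in the tutorial), and every name and probability in it is made up for illustration.

```python
def forward(sequence, states, start, trans, emit):
    """Return P(sequence | model), summed over all state paths.

    Assumes a non-empty sequence and an HMM in which every state emits
    one symbol (no silent states such as the delete states of a profile HMM).
    """
    # f[s] = probability of emitting the prefix read so far and ending in state s
    f = {s: start[s] * emit[s][sequence[0]] for s in states}
    for symbol in sequence[1:]:
        # Extend every path by one step: sum over all predecessor states r
        f = {
            s: emit[s][symbol] * sum(f[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(f.values())


# Hypothetical toy model: two states, two-letter alphabet
states = ("A", "B")
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emit = {"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}}

p = forward("xy", states, start, trans, emit)
print(p)
```

The point of the sketch is that the score of a sequence is a sum over every path that could have generated it, computed one position at a time rather than by enumerating the paths.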

IMPORTANT

Save the last 30 minutes or so for writing (and submitting) a one-page report in which you answer the questions below. Remember, today's lab work and report are worth 1% of your final mark.

In your report, try to answer these questions (time permitting):

Given an HMM with 3 match states, 3 delete states and 4 insertion states (as in Figure 8), what is the minimum number of states that a "random walk" will visit? What is the maximum? (Do not count the BEGIN and END states.)

If a "random walk" generates a sequence of L amino acids, and we define N as the number of states visited on that path, is it true that L is always less than or equal to N? Why?

In a pairwise alignment, a higher penalty is generally imposed for a gap "opening" than for a gap "extension". Is this feature present in a profile HMM? Why?

Why are HMMs better suited to finding distant homologs than profiles based on position-specific probability distributions?

An HMM is very sensitive to the quality of the training set. Discuss briefly how each of the following limitations affects the resulting model, and suggest a possible solution.
1) Training sequences are not evenly distributed in the sequence space of their class
2) There are too many sequences in the training set and they cover a huge space
3) There are only a few sequences available for the training set
