COSC 348: Lab02

Overview: Finding Data

An important part of bioinformatics is processing data, so the first thing we need to find out is "where is the data".

The purpose of this lab is to find information about a single protein called GABRA1, and about the DNA sequence that codes for it. In the second part, we'll look into how to find information about a whole genome.

IMPORTANT: Save the last 30 minutes or so for writing a 1-page report reflecting on today's investigation, and submit it. Answering the questions provided throughout this document will help you write it. Do not hesitate to speculate; it is not an exam. Any feedback you provide regarding the lab session will be very valuable as well.
Remember, today's lab work and this reflection are worth 1% of your final mark.


Part 1: Protein and DNA sequence retrieval

We will retrieve various information about the protein denoted by GABRA1.
First let us find out what the GABRA1 protein is: have a read of this.

Make sure you understand what GABRA1 is -- talk to someone near you, and see if you have the same idea.

ExPASy (SwissProt database) – retrieving the AA sequence of proteins

Now that we know what GABRA1 is, we want to find, and store, the sequence of amino acids (AA) that makes up the protein.


About the FASTA format:

>Sequence_identifier_and_name | The definition line, followed by the sequence of amino acids or bases, e.g.:
ARCTGKINYD.....
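
The layout above can be read with a few lines of Python. The following is only a minimal sketch, not the parser used by any particular tool: the function name parse_fasta and the demo record are made up for illustration, and converting the sequence to upper case is an assumption made here so that mixed-case input is treated consistently.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are joined; case is normalised (an assumption).
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record, invented for this example only.
example = """>demo_id | hypothetical definition line
ARCTGKINYD
arctgkinyd
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Each record comes back as a pair, so the definition line and the sequence can be stored or compared separately.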

FASTA is the default format for many sequence analysis programs. These programs are case-sensitive. Be aware:


GenBank – Retrieving the coding DNA sequence for a protein

This sequence format is called the GenBank format.


About the GenBank format:

GenBank is also the default format for many sequence analysis programs. It consists of 4 parts (described in detail at http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html ).
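
As a rough illustration of working with this format, the sketch below pulls the raw sequence out of one record. It assumes the standard flat-file layout, in which the sequence starts after a line beginning with ORIGIN and the record ends with a line containing //. The function name extract_origin_sequence and the tiny demo record are hypothetical and were invented for this example; a real record downloaded from GenBank is much longer.

```python
def extract_origin_sequence(record_text):
    """Return the sequence stored in the ORIGIN section of one GenBank record."""
    in_origin = False
    pieces = []
    for line in record_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break  # end-of-record marker
        if in_origin:
            # Each sequence line starts with a position number followed by
            # blocks of bases; keep the blocks and drop the number.
            pieces.extend(part for part in line.split() if not part.isdigit())
    return "".join(pieces).upper()


# Hypothetical record, invented for this example only.
demo_record = """LOCUS       DEMO0001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical demo record.
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atggcgtacg ttagcctgaa ctga
//
"""

sequence = extract_origin_sequence(demo_record)
print(sequence, len(sequence))
```

The same function can be applied to the text of a record saved from the GenBank web page, provided the file holds a single record.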


By now, you should be ready to retrieve the same information for any gene/protein. Try it with human haemoglobin subunit alpha, and see if you can answer some of the following questions.


Part 2: Genome retrieval

Let us first see the genome in action: how viruses inject their RNA or DNA into our cells and thus force them to make more viruses. Life cycle of viruses in video.

GenBank – finding and retrieving genomes of viruses

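GenBank – a leading nucleotide sequence repository/database maintained by the NCBI (U.S. National Center for Biotechnology Information). It exchanges data with the nucleotide archive at EMBL (European Molecular Biology Laboratory) and with DDBJ (DNA Data Bank of Japan) as part of the International Nucleotide Sequence Database Collaboration.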


The genome of the influenza A virus is made of RNA and has 8 segments, which carry 10 genes:

How important are viruses in evolution? Did DNA Come From Viruses?

Ensembl Project – Exploring the Human Genome

Ensembl is a joint project of the European Bioinformatics Institute and the Sanger Institute, both located near Cambridge, U.K. You can spend weeks navigating all the options here.