BIOINFORMATICS: Protein Databases

Introduction to Databases

A database is an organized collection of related information stored in a computer system.

In Bioinformatics, databases are developed for drug design, clinical data, or general information on proteins, nucleotides, genes, gene prediction and so on.

A database can be created by anyone who has good computer knowledge.

Protein Database

A protein database is a collection of related protein information, such as sequence or structure data.
The three being discussed today:
-> PDB
-> dbPTM
-> SCOP

Protein Data Bank (PDB)

Maintained in the US by the RCSB (Research Collaboratory for Structural Bioinformatics).

A repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids.

The PDB is a key resource in areas of structural biology, such as structural genomics.
The data are typically obtained by X-ray crystallography or NMR spectroscopy.

Overseen by an organization called the Worldwide Protein Data Bank.

The PDB database is updated every Tuesday.

The PDB ID has four characters: the first is a digit from 1 to 9, and the remaining three can be alphanumeric.
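The ID rule above can be checked with a short regular expression. This is only a minimal sketch in Python; the names `PDB_ID_PATTERN` and `is_valid_pdb_id` are made up for illustration, and the check only tests the format of an ID, not whether such an entry actually exists in the PDB.

```python
import re

# Four-character ID as described above:
# one digit from 1 to 9, followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(candidate: str) -> bool:
    """Return True if the string has the four-character PDB ID format."""
    return PDB_ID_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    # 4HHB is well formed; 0XYZ starts with 0; 12345 is too long.
    for pdb_id in ["4HHB", "1abc", "0XYZ", "12345"]:
        print(pdb_id, is_valid_pdb_id(pdb_id))
```

`fullmatch` is used instead of `match` so that longer strings which merely start with a valid ID are rejected.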

In 2007, 7,263 structures were added. In 2008, only 7,073 structures were added, for a total of 55,660 structures.

The information summarized for each entry includes several data items:

Title- The title of the experiment or analysis that is represented in the entry.

Author- The names of the authors responsible for the deposition.

Primary Citation- Includes the primary journal reference to the structure.

History- Includes the date of deposition, the date of release of the structure by the PDB, and a "supersedes" item (which appears if a previous version, or versions, of a structure were deposited with the PDB).

Experimental Method- The technique used to solve the structure; this may be an experimental method or theoretical modeling.

Parameters- For structures that were determined by X-ray diffraction, this section gives information about the refinement of the structure.

Unit Cell- For structures that were determined by X-ray diffraction, this section gives the crystal unit cell lengths and angles.

NMR Ensemble- For structures determined by NMR, this section includes the total number of conformers that were calculated in the final round, the number of conformers that were submitted for the ensemble, and a description of how the submitted conformers (models) were selected.

NMR Refine- Contains the method used to determine the structure.

Molecular Description- Contains the number of polymers, the molecule name, any mutations present, the entity fragment description, chain identifiers and the EC (Enzyme Commission) number.

Source- Specifies the biological and/or chemical source of the molecule given for each entity identified in the molecular description section.

Related PDB entries- Data items in this section contain references to other entries that are related to this entry.

Chemical Component- Contains the 3-letter code, the name and the chemical formula of the chemical component.

SCOP classification- Classifications are pulled from the SCOP database and summarized here.

CATH classification- As classified by the CATH database.

GO Terms- Clicking on any of the results in this section will perform a search of the database, resulting in a Query Results Browser page containing all structures with the selected Molecular Function, Biological Process or Cellular Component.
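As a small illustration of the Unit Cell item above, the cell lengths and angles can be read from the CRYST1 record of a legacy PDB-format file and used to compute the cell volume. This is a rough sketch, not part of the PDB site itself: it assumes the fixed-column CRYST1 layout of the legacy PDB file format, the names `UnitCell` and `parse_cryst1` are made up here, and the sample record contains made-up values rather than data from a real entry.

```python
import math
from dataclasses import dataclass


@dataclass
class UnitCell:
    a: float      # cell lengths in angstroms
    b: float
    c: float
    alpha: float  # cell angles in degrees
    beta: float
    gamma: float

    def volume(self) -> float:
        """General (triclinic) unit cell volume in cubic angstroms."""
        ca, cb, cg = (
            math.cos(math.radians(angle))
            for angle in (self.alpha, self.beta, self.gamma)
        )
        return self.a * self.b * self.c * math.sqrt(
            1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
        )


def parse_cryst1(line: str) -> UnitCell:
    """Read lengths and angles from a fixed-column CRYST1 record.

    Assumed layout: columns 7-15, 16-24, 25-33 hold a, b, c and
    columns 34-40, 41-47, 48-54 hold alpha, beta, gamma.
    """
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    return UnitCell(
        a=float(line[6:15]),
        b=float(line[15:24]),
        c=float(line[24:33]),
        alpha=float(line[33:40]),
        beta=float(line[40:47]),
        gamma=float(line[47:54]),
    )


# Made-up record, built with format codes so the columns line up.
sample = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f" % (10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
cell = parse_cryst1(sample)
print(cell)
print("volume:", round(cell.volume(), 3))
```

For a cell with all angles at 90 degrees the formula reduces to a × b × c, which makes the made-up record easy to check by hand.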

dbPTM

dbPTM is a database that compiles information on protein post-translational modifications (PTMs), together with related information such as catalytic sites, solvent accessibility of amino acid residues, protein secondary and tertiary structures, protein domains and protein variations.

The database includes all the experimentally validated PTM sites from Swiss-Prot, PhosphoELM and O-GLYCBASE.

dbPTM also systematically identifies sites for three major types of protein PTM (phosphorylation, glycosylation and sulfation) against Swiss-Prot proteins.

The summary table of PTMs:

The summary table helps users investigate and browse all the types of PTM in release 2.0 of dbPTM. In the table, each type of PTM is categorized by its modified amino acid, with the number of experimentally verified sites. For example, users can choose the acetylation of lysine (K) to see more detailed information, such as the position of the modification on the amino acid, the location of the modification on the protein sequence, the modified chemical formula, and the mass difference. The most useful information about a PTM is its substrate site specificity, including the frequency of amino acids, the average solvent accessibility, and the frequency of secondary structure surrounding the modified site.
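The two ideas above, counting sites per PTM type and modified residue, and looking at amino acid frequencies around the modified site, can be sketched in a few lines of Python. This is only an illustration and not how dbPTM itself is implemented: the records in `SITES` are hypothetical toy sequences, and the function names are made up for this example.

```python
from collections import Counter

# Hypothetical toy records: (PTM type, protein sequence, 1-based modified position).
SITES = [
    ("Phosphorylation", "MKRSPEDLA", 4),
    ("Phosphorylation", "AARSPLKTE", 4),
    ("Acetylation", "GGAKLMNPQ", 4),
]


def summary_table(sites):
    """Count sites per (PTM type, modified residue)."""
    return Counter((ptm, seq[pos - 1]) for ptm, seq, pos in sites)


def flanking_frequencies(sites, ptm_type, flank=2):
    """Count amino acids at each offset within +/- flank residues of the site."""
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for ptm, seq, pos in sites:
        if ptm != ptm_type:
            continue
        for offset in counts:
            index = pos - 1 + offset  # convert to a 0-based index
            if 0 <= index < len(seq):  # skip offsets that fall off the sequence
                counts[offset][seq[index]] += 1
    return counts


if __name__ == "__main__":
    for (ptm, residue), n in summary_table(SITES).items():
        print(ptm, residue, n)
    for offset, counter in flanking_frequencies(SITES, "Phosphorylation").items():
        print(offset, dict(counter))
```

Solvent accessibility and secondary structure would need structural data for each site, so they are left out of this sketch.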

Wednesday, April 8, 2009
