English
  • Събития
  • Новини
  • Методи
  • Обучение
  • Обзорни статии
  • Публикации
  • Кариера
  • За нас
  • Блог
 
Speeding up tandem mass spectrometry-based database searching by longest common prefix
A↑ a↓ A print
Background:
Tandem mass spectrometry-based database searching has become an important technology for peptide and protein identification. One of the key challenges in database searching is the remarkable increase in computational demand, brought about by the expansion of protein databases, semi- or non-specific enzymatic digestion, post-translational modifications and other factors. Some software tools choose peptide indexing to accelerate processing. However, peptide indexing requires a large amount of time and space for construction, especially for the non-specific digestion. Additionally, it is not flexible to use.
Results:
We developed an algorithm based on the longest common prefix (ABLCP) to efficiently organize a protein sequence database. The longest common prefix is a data structure that is always coupled to the suffix array. It eliminates redundant candidate peptides in databases and reduces the corresponding peptide-spectrum matching times, thereby decreasing the identification time. This algorithm is based on the property of the longest common prefix. Even enzymatic digestion poses a challenge to this property, but some adjustments can be made to this algorithm to ensure that no candidate peptides are omitted. Compared with peptide indexing, ABLCP requires much less time and space for construction and is subject to fewer restrictions.
Conclusions:
The ABLCP algorithm can help to improve data analysis efficiency. A software tool implementing this algorithm is available at http://pfind.ict.ac.cn/pfind2dot5/index.htm

Read the original article at BMC Bioinformatics.
Published: Nov 2010, Updated: 25th Nov
Most recent items in "Биоинформатика":
Background:
Published descriptions of biology protocols are often ambiguous and incomplete, making them difficult to replicate in other laboratories. However, there is increasing benefit to formalizing the descriptions of protocols, as laboratory automation systems (such as microfluidic chips) ...
25th Nov Биоинформатика: DeltaProt: a software toolbox for comparative genomics
Background:
Statistical bioinformatics is the study of biological data sets obtained by new micro-technologies by means of proper statistical methods. For a better understanding of environmental adaptations of proteins, orthologous sequences from different habitats may be explored and compared.... Background:
The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, i... Background:
The problem of determining the physical conformation of a protein dimer, given the structures of the two interacting proteins in their unbound state, is a difficult one. The location of the docking interface is determined largely by geometric complementarity, but finding complementa... Background:
The data from DNA microarrays are increasingly being used in order to understand effects of different conditions, exposures or diseases on the modulation of the expression of various genes in a biological system. This knowledge is then further used in order to generate molecular mec...