Research Article Open Access

The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Khalid Jaber, Nur’Aini Abdul Rashid and Rosni Abdullah

Abstract

Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences and we then parallelized the algorithm to improve computation times and allowed large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.

American Journal of Applied Sciences
Volume 6 No. 7, 2009, 1368-1372

DOI: https://doi.org/10.3844/ajassp.2009.1368.1372

Submitted On: 3 April 2009 Published On: 31 July 2009

How to Cite: Jaber, K., Rashid, N. A. & Abdullah, R. (2009). The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering. American Journal of Applied Sciences, 6(7), 1368-1372. https://doi.org/10.3844/ajassp.2009.1368.1372

  • 3,340 Views
  • 2,761 Downloads
  • 3 Citations

Download

Keywords

  • Maximal cliques algorithm
  • parallel processing
  • protein sequence clustering
  • MPI
  • pthread