Multiple sequence alignment quality comparison in T-Coffee, MUSCLE and M-Coffee based on different benchmarks
Abstract
Multiple sequence alignment (MSA) is a fundamental step in studies that determine the evolutionary, structural and functional relationships of biological sequences or organisms. Various heuristic approaches exist for aligning more than two sequences, but no single MSA tool is suitable for every dataset. Given the importance of MSA in a wide range of relationship studies, we compared the performance of different MSA tools on various datasets. In this study, we applied three MSA tools, T-Coffee, MUSCLE and M-Coffee, to several benchmark datasets: BAliBase, SABmark, DIRMBASE, ProteinBali and DNABali. We evaluated the differences in the performance of these tools on the stated benchmarks in terms of % consistency, sum of pairs (SP) and column scores (CS), computed with Suite MSA. We also calculated the average value of each score for each tool to compare the results. We conclude that all three tools performed best on the datasets from ProteinBali (average % consistency: 29.6, 32.3, 29.7; SP: 0.74, 0.73, 0.74; CS with gaps: 0.27, 0.27, 0.26 for T-Coffee, MUSCLE and M-Coffee, respectively), whereas the lowest performance was obtained on the datasets from DIRMBASE (average % consistency: 1.8, 1.1, 4.3; SP: 0.05, 0.04, 0.04; CS with gaps: 0.01, 0, 0.008 for T-Coffee, MUSCLE and M-Coffee, respectively).
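For readers unfamiliar with the SP and CS metrics named above, the following is a minimal Python sketch of how such scores are commonly defined relative to a reference alignment: SP as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and CS as the fraction of reference columns reproduced exactly (gap positions included). This is an illustration under those assumed definitions, not the computation performed by Suite MSA, whose exact conventions (in particular for "CS with gaps") may differ. The function names and the toy alignments are hypothetical.

```python
from itertools import combinations


def residue_columns(alignment):
    """Turn an alignment (list of equal-length gapped strings) into columns.

    Each column becomes a tuple holding, per sequence, the index of the
    residue found there (counted within that sequence) or None for a gap.
    """
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for i, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[i])
                counters[i] += 1
        columns.append(tuple(entry))
    return columns


def aligned_pairs(columns):
    """Collect every pair of residues that share a column."""
    pairs = set()
    for col in columns:
        for i, j in combinations(range(len(col)), 2):
            if col[i] is not None and col[j] is not None:
                pairs.add((i, col[i], j, col[j]))
    return pairs


def sp_score(reference, test):
    """Assumed SP definition: share of reference pairs recovered by the test."""
    ref_pairs = aligned_pairs(residue_columns(reference))
    test_pairs = aligned_pairs(residue_columns(test))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def column_score(reference, test):
    """Assumed CS definition: share of reference columns reproduced exactly,
    with gap positions counted as part of the column."""
    ref_cols = residue_columns(reference)
    test_cols = set(residue_columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0


if __name__ == "__main__":
    # Toy alignments of the same three sequences (hypothetical data).
    reference = ["AC-GT", "ACGGT", "A-CGT"]
    test = ["ACG-T", "ACGGT", "AC-GT"]
    print("SP:", round(sp_score(reference, test), 3))
    print("CS:", round(column_score(reference, test), 3))
```

Both scores lie between 0 and 1, and an alignment compared with itself scores 1 on both. Averaging them over all alignments in a benchmark gives per-tool means of the kind reported in the abstract.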