Research Interests

One strand of Pawel's research relates to his role as Head of Bioinformatics, Microarray & DNA Analysis in the SHWFGF. He is interested in developing and applying new computational methods for the analysis of microarray data, particularly for the Affymetrix platform. The second strand of his research relates to his activity in the Bioinformatics Research Centre. He is interested in protein structure prediction in general and in the transmembrane (TM) domains of integral membrane proteins in particular.

Glasgow Analysis of Microarrays (GlaMA):

Over the last several years we have worked on the development of methods for the analysis of microarray data containing a realistic number of replicated samples. We have co-developed a cascade of methods known as GlaMA. It contains RankProducts (RP), a "primary" method for identifying differentially expressed genes between small groups of biologically replicated samples, as well as Iterative GroupAnalysis (iGA) and Graph-based Iterative GroupAnalysis (GiGA), "secondary" methods for finding significantly changed gene classes and subnets, respectively. Standalone computer programs for each of these methods are available from GlaMA, the independent RP module (RankProd) is available from the BioConductor suite, and an automated pipeline for the analysis of Affymetrix GeneChip expression arrays was created in SHWFGF and has been used routinely for the analysis of over 3000 arrays. Although all these methods were developed in the context of microarray technology, we have subsequently demonstrated that they can be used in proteomics and metabolomics contexts as well.
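
The core idea behind a rank product statistic can be sketched in a few lines of Python. This is a simplified illustration only, not the RankProd or GlaMA implementation: the function names, the toy data and the column-shuffling permutation scheme are assumptions made for this example. Genes are ranked by log fold change within each replicate, and the geometric mean of a gene's ranks is its rank product; small values indicate consistent up-regulation.

```python
import math
import random


def ranks_descending(values):
    """Rank 1 goes to the largest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_products(log_ratios):
    """log_ratios[g][r] is the log fold change of gene g in replicate r.

    Returns, for each gene, the geometric mean of its ranks across replicates.
    """
    n_genes = len(log_ratios)
    n_reps = len(log_ratios[0])
    per_rep = [
        ranks_descending([log_ratios[g][r] for g in range(n_genes)])
        for r in range(n_reps)
    ]
    return [
        math.exp(sum(math.log(per_rep[r][g]) for r in range(n_reps)) / n_reps)
        for g in range(n_genes)
    ]


def permutation_pvalues(log_ratios, n_perm=200, seed=0):
    """Estimate how often a rank product this small arises by chance.

    Assumption for this sketch: the null distribution is obtained by
    shuffling gene labels independently within each replicate.
    """
    rng = random.Random(seed)
    observed = rank_products(log_ratios)
    n_genes = len(log_ratios)
    n_reps = len(log_ratios[0])
    hits = [0] * n_genes
    for _ in range(n_perm):
        cols = []
        for r in range(n_reps):
            col = [log_ratios[g][r] for g in range(n_genes)]
            rng.shuffle(col)
            cols.append(col)
        permuted = [[cols[r][g] for r in range(n_reps)] for g in range(n_genes)]
        null_rp = rank_products(permuted)
        for g in range(n_genes):
            hits[g] += sum(1 for v in null_rp if v <= observed[g])
    return [h / (n_perm * n_genes) for h in hits]


# Toy data (hypothetical): 4 genes measured in 3 replicates.
toy = [
    [2.1, 1.9, 2.4],     # consistently the most up-regulated gene
    [0.1, -0.2, 0.0],
    [-0.3, 0.2, 0.1],
    [-1.5, -1.8, -1.2],  # consistently the most down-regulated gene
]
rp = rank_products(toy)
pvals = permutation_pvalues(toy, n_perm=50)
```

Because only ranks enter the statistic, a gene needs to be near the top of the list in every replicate to score well, which is one reason the approach suits experiments with few replicates. To look for down-regulated genes, the same sketch can be applied to the negated log ratios.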

RankProducts superiority for studies with a small number of replicates:

Our experience, built upon many years of running microarray projects at SHWFGF, tells us that the most common request from biologists is for statistically significant lists of genes differentially expressed between groups of replicated samples. Using both biological and simulated data we have shown that RP-generated lists are superior to those obtained using the most common methods, such as the t-test, the Wilcoxon test or the t-statistic-based SAM method, when the number of biological replicates is fewer than ten. Recently, an independent study testing 20 different statistical methods for the identification of differentially expressed genes on simulated data with 25, 10 and 5 replicates revealed that the RP method was superior to the others in the scenario with 5 replicates. It is therefore becoming increasingly clear that RP should be supported as a method of choice in studies with a small number of replicates.

Modelling protein transmembrane domains:

The goal of structural genomics is to experimentally determine protein structures representative of all available protein sequences. Unfortunately, technical problems with the overexpression and crystallization of transmembrane (TM) proteins make this class of proteins particularly resistant to structural biology techniques and consequently to structural genomics. For this reason, bioinformatics efforts to predict protein structure from sequence should now concentrate on TM proteins. TM proteins are encoded by ca. 25% of genes in eukaryotic organisms and are vital for intercellular communication and selective transport. In the past we developed a computational methodology for generating accurate models of helical TM domains from a plethora of experimental and theoretical information. This was successfully applied to a number of proteins, and the model of the TM domain of the G-protein-coupled receptor rhodopsin was confirmed by the subsequently determined high-resolution 3-D structure. Building on this, we are currently investigating whether the methodology can be useful when little or no experimental data is available. In particular, we are investigating the possibility of deriving structural restraints from multiple sequence alignments by applying the correlated mutation concept as well as a hidden class site model of evolution. Such restraints are then used for TM domain modelling with a Monte Carlo Simulated Annealing protocol, or to identify protein-protein interaction surfaces.
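
To illustrate how a multiple sequence alignment can suggest structural restraints, the sketch below scores pairs of alignment columns by mutual information, a simple and widely used proxy for correlated mutations. It is not necessarily the measure used in the work described above. The function names and the toy alignment are hypothetical, and gaps, sequence weighting and phylogenetic correction are ignored.

```python
import math
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        p_indep = (count_a[a] / n) * (count_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def top_covarying_pairs(alignment, k=3):
    """Return the k column pairs with the highest mutual information."""
    length = len(alignment[0])
    scored = [
        ((i, j), mutual_information(column(alignment, i), column(alignment, j)))
        for i, j in combinations(range(length), 2)
    ]
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


# Toy alignment (hypothetical): columns 0 and 2 always change together,
# column 1 is fully conserved, and column 3 varies independently of column 0.
toy_alignment = ["ALKV", "GLRV", "ALKV", "GLRV", "ALKI", "GLRI"]
best_pair, best_score = top_covarying_pairs(toy_alignment, k=1)[0]
```

Column pairs that score highly are candidates for residues in spatial contact. In a modelling protocol such pairs could be turned into distance restraints that are scored during a simulated annealing search, although how the restraints are weighted is a separate design decision not covered by this sketch.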