Program for building and clustering of genomic profiles using GPU
Before RNA transcription starts, special segments of DNA form a complex of regulatory proteins called transcription factors. This complex allows RNA polymerase to be bound to DNA and to start reading RNA. It is a difficult problem to search binding sites on DNA because of many factors influencing the binding. In particular, other sites in the vicinity of a given site may influence binding. To reveal those dependencies the authors introduce histograms of density distribution of binding sites, called genomic profiles. The software package developed in the scope of this work allows building genomic profiles using the prediction of binding sites by a weight matrices algorithm for different implementations: for multicore CPU’s or NVidia GPU’s using CUDA. In addition, the software allows clustering genomic profiles using K-means clustering and hierarchical clustering. The algorithm allows to build samples – random transcription factors hierarchies based on the existing experimental structural classification to estimate the correspondence between genomic profiles construction and the existing classification. The analysis of correspondence of genomic profiles with biological classification of transcription factors was developed using the sampling algorithm.