An insight into genome-wide nucleosome positions is required to understand the local regulation of genome function.  Bioinformatics tools to analyse nucleosome positions in genomes are limited. This paucity is addressed with NUCPOS, a suite that provides several utilities to analyse important aspects of nucleosomal organisation, including nucleosome density, the positioning strength of individual nucleosomes, the contribution of sequence to observed positions, and the average nucleosomal organization of specified genomic positions, such as pol II transcription start sites.

    NUCPOS is available under the GNU General Public License (GPL-3). The C++11 source code may be downloaded from https://sourceforge.net/projects/nucpos/ and compiled with GCC (gcc.gnu.org) g++ version 4.7.3 or later.

    Nucfrag: takes the SAM format (samtools.github.io/hts-specs/SAMv1.pdf) output file without headers generated by Bowtie2 (bowtie-bio.sourceforge.net) as input, and generates a series of data files of the number of nucleosomes centred at each base pair position of each chromosome. Nucfrag allows the selection of lower and upper fragment sizes to select subpopulations of nucleosomes, possibly with or without associated linker histone H1, from the Bowtie2 output file.  Additional outputs include a file of the distribution of the aligned fragment sizes in the selected size range, and data files of virtual footprints, simulating genomic areas that would be protected from nuclease cleavage by nucleosomes.  The nucleosome position data files can be uploaded and viewed in a genome browser after addition of the appropriate BED or bigBed format headers (genome.ucsc.edu).
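
    The counting step described above can be sketched in Python as follows. This is only an illustration, not the NUCPOS implementation: it assumes paired-end alignments in which the SAM TLEN field (column 9) holds the fragment length, counts each fragment once via its leftmost mate, and takes the centre as POS + TLEN // 2. The function name count_dyads, the default size limits and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

def count_dyads(sam_lines, min_len=140, max_len=160):
    """Count fragment centres per chromosome position (1-based).

    Assumes header-less, paired-end SAM records; size limits are illustrative.
    """
    dyads = defaultdict(Counter)
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        chrom, pos, tlen = fields[2], int(fields[3]), int(fields[8])
        # Keep only the leftmost mate (positive TLEN) so each fragment counts once.
        if chrom == "*" or tlen <= 0:
            continue
        if min_len <= tlen <= max_len:
            dyads[chrom][pos + tlen // 2] += 1
    return dyads

# Toy records (SEQ and QUAL shortened for readability).
records = [
    "r1\t99\tchrI\t100\t42\t50M\t=\t197\t147\tACGT\tIIII",
    "r1\t147\tchrI\t197\t42\t50M\t=\t100\t-147\tACGT\tIIII",
    "r2\t99\tchrI\t101\t42\t50M\t=\t196\t145\tACGT\tIIII",
    "r3\t99\tchrI\t500\t42\t50M\t=\t750\t300\tACGT\tIIII",
]
dyads = count_dyads(records)
print(dict(dyads["chrI"]))  # {173: 2}
```

    In this toy input, fragments r1 and r2 share the same centre, while r3 falls outside the selected size range and is ignored.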

    Dyad_bins: is a program that takes the nucleosome position data files generated by Nucfrag as input, and performs a binning analysis, i.e., it counts the number of times that a specific number of nucleosomes are co-aligned in the genome.  The output can simply be visualized with a graphing program such as Gnuplot (www.gnuplot.info).  The co-alignment of many nucleosomes at a specific genomic position generally indicates a strongly positioned nucleosome, which may have functional relevance.  This analysis provides insight into the general nucleosome density and number of well-positioned nucleosomes in specified genomic regions (Fig. 1 A).
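
    As a rough illustration of the binning idea, the sketch below counts how many genomic positions carry exactly k co-aligned nucleosome centres. It assumes the positions are held in a dictionary mapping position to count and that only occupied positions are listed; the function name bin_dyads and the data are hypothetical, and the actual Dyad_bins file formats are not reproduced here.

```python
from collections import Counter

def bin_dyads(dyad_counts):
    """Map k -> number of positions with exactly k co-aligned dyads."""
    return Counter(dyad_counts.values())

# Hypothetical counts: position -> number of nucleosomes centred there.
dyad_counts = {173: 2, 340: 1, 505: 1, 671: 7}
bins = bin_dyads(dyad_counts)
for k in sorted(bins):
    print(k, bins[k])
```

    A long tail at high k in such a table would point to positions where many nucleosomes are co-aligned.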

    Align_dyads: is a program that takes a text file with a list of genomic positions and the nucleosome position data files generated by Nucfrag as input, and generates a data file of superimposed nucleosome positions aligned at the specified genomic positions.  The user can select the number of nucleotides to include upstream and downstream of the listed positions.  The program is especially useful to gain insight into the average nucleosomal organization of transcription start sites, defined replication origins, and similar functional elements in a genome.  An output file is written that can be visualized in Gnuplot (Fig. 1 B).
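
    The superposition can be sketched as a sum of nucleosome counts at each offset relative to the listed positions. The example below is a simplified illustration on a single chromosome that ignores strand orientation; the function name align_dyads, its parameters and the data are hypothetical and do not describe the actual Align_dyads interface.

```python
def align_dyads(dyad_counts, sites, upstream=2, downstream=2):
    """Sum dyad counts at each offset relative to the listed sites."""
    profile = {off: 0 for off in range(-upstream, downstream + 1)}
    for site in sites:
        for off in profile:
            profile[off] += dyad_counts.get(site + off, 0)
    return profile

# Hypothetical data: position -> nucleosome count, and two sites of interest.
dyad_counts = {98: 1, 100: 4, 102: 2, 200: 3, 201: 1}
profile = align_dyads(dyad_counts, sites=[100, 200])
print(profile)  # {-2: 1, -1: 0, 0: 7, 1: 1, 2: 2}
```

    Plotting the offsets against the summed counts gives the kind of average profile around a functional element that is described above.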

    The precise rotational and translational position of a nucleosome is determined by the DNA sequence accommodated by the nucleosome, as well as steric influences due to other proteins bound to the DNA. The contribution of intrinsic sequence effects is often useful to understand whether specific, functionally significant, well-positioned nucleosomes are precisely placed due to the inherent DNA sequence, or due to other features of the genome.  The utility hp_fft allows the user to quantitatively assess the contribution of dinucleotide periodicities to the anisotropic flexibility of nucleosomes positioned at identified genomic positions (Fig. 1 C).

    hp_fft: performs the fast Fourier transform of the distribution of each of the 16 possible dinucleotides in a sliding 128 nt window, and provides the Fourier magnitude of the distribution of each dinucleotide at a periodicity of approximately 10 nt.  hp_fft takes as input the fractional occurrence of each dinucleotide at each sequence position in the sequence of interest.  The fractional distribution is generated with the utility dinucleotide_frequencies, which takes as input a FastA format sequence file that may contain multiple sequences representing nucleosomes positioned at specific genomic features. hp_fft requires the open source FFTW library (www.fftw.org).
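
    The two steps can be illustrated with a short Python sketch. It is not the hp_fft code: instead of a full FFT via FFTW it evaluates a single discrete Fourier transform bin directly, and it subtracts the window mean first, which is an assumption of this example. In a 128 nt window, bin 13 corresponds to a period of 128/13, or about 9.85 nt. All function names are hypothetical, and the input sequences are assumed to be pre-aligned and of similar length.

```python
import cmath
import math
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

def dinucleotide_fractions(seqs):
    """Fraction of sequences carrying each dinucleotide at each position."""
    length = min(len(s) for s in seqs) - 1
    fracs = {d: [0.0] * length for d in DINUCS}
    for s in seqs:
        s = s.upper()
        for i in range(length):
            d = s[i:i + 2]
            if d in fracs:  # skip dinucleotides containing N or other symbols
                fracs[d][i] += 1.0 / len(seqs)
    return fracs

def magnitude_at_bin(signal, k):
    """Magnitude of DFT bin k of a mean-subtracted signal."""
    n = len(signal)
    mean = sum(signal) / n
    return abs(sum((x - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(signal)))

def periodicity_profile(signal, window=128, k=13):
    """Slide a window along the signal and report the magnitude of bin k."""
    return [magnitude_at_bin(signal[i:i + window], k)
            for i in range(len(signal) - window + 1)]

# Synthetic signal with a period of 128/13 nt, standing in for one dinucleotide.
signal = [math.cos(2 * math.pi * 13 * t / 128) for t in range(160)]
profile = periodicity_profile(signal)
print(len(profile), round(profile[0], 3))
```

    For this synthetic signal every window yields a magnitude close to 64, half the window length, whereas a signal without a periodicity of about 10 nt would give values near zero at bin 13.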

    Fig. 1. (A) Distribution of co-aligned nucleosomes into bins. Nucleosomes present on coding regions (filled circles) and on non-coding regions (white circles) are shown. (B) The average nucleosomal occupancy of a polyadenylation site.  Note the nucleosome depleted region at position 0, and the strongly positioned nucleosome at position 150. (C) Fourier amplitude of a genomic region.  Dinucleotide distributions that could support well-positioned nucleosomes are present at position 50 and in the region 200-400. All images were generated from NUCPOS output files with Gnuplot.



    Modern computing has enabled research that was previously considered unfeasible. Parallel algorithms have been developed to run over powerful multicore machines. For even more computing power, these machines can be aggregated together into large high performance computing (HPC) clusters. On these clusters, jobs can be spread out across a large number of nodes instead of being executed on a single machine. This can substantially decrease the time required to execute resource intensive modeling and simulation jobs – a common requirement in the field of biophysics. It is also useful when a large number of much smaller jobs need to be executed. Unfortunately, running jobs on a cluster involves a steep learning curve. Jobs must be submitted via software systems known as resource managers. These systems can usually only be run via the command line and require expertise that most researchers don't have.

    To solve this problem, we have developed JMS, a web-based front-end to an HPC cluster. JMS allows users to run, manage and monitor jobs via a user-friendly web interface. It also lets users create new tools that can be pipelined together along with existing tools to create complex computational workflows. These workflows can be saved, versioned and reused as needed. A detailed history of all jobs is stored and can be accessed and downloaded at any time. All tools, workflows and jobs can be shared with other users to create a highly collaborative work environment. In addition, tools and workflows can be made public via external interfaces. Although applicable to any field, JMS is currently being tailored toward structural bioinformatics with the introduction of tools and workflows for homology modelling, docking studies, and molecular dynamics.

    JMS has been open-sourced and is freely available at https://github.com/RUBi-ZA/JMS.

    JMS has been published in PLoS ONE:
    David K Brown, David L Penkler, Thommas M Musyoka, and Özlem Tastan Bishop "JMS: An open source workflow management system and web-based cluster front-end for high performance computing"  PLoS ONE 10(8): e0134273, 2015. doi: 10.1371/journal.pone.0134273


    The admixture mapping tools include a suite of tools for use on multi-way admixed populations to overcome the limitation of existing tools, which tend to work best with 2- or 3-way admixed populations only. The admixture tools included in this project are:
    1.    Tool for selecting the best proxy ancestral populations for an admixed population
    2.    Tool for inferring local ancestry in admixed populations

    The first tool is an important precursor for the second, as identifying the correct ancestral populations is crucial for accurately inferring local ancestry. A prototype for this first tool, PROXYANC, has already been developed in the group. PROXYANC implements two novel algorithms: one based on the correlation between observed linkage disequilibrium in an admixed population and population genetic differentiation in ancestral populations, and an optimal quadratic programming approach based on the linear combination of population genetic distances (FST). PROXYANC was evaluated against other methods, such as the f3 statistic, using a simulated 5-way admixed population as well as real data for the local South African Coloured (SAC) population, which is also 5-way admixed. The simulation results showed that PROXYANC was a significant improvement on existing methods for multi-way admixed populations.

    For the second tool, we have evaluated some of the existing methods for inferring local ancestry (or locus-specific ancestry) and determining the date of admixture on multi-way admixed populations, including the SAC and simulated data. These methods include HapMix, ROLLOFF and a PCA-based method, StepPCO, for dating admixture, and WinPOP and LampLD for local ancestry inference. All three of the dating tools gave quite different predictions of the date of admixture events, showing the lack of accuracy of existing methods and the need for a better one.

    PROXYANC

    PROXYANC implements an approach to select the best proxy ancestral populations for admixed populations. It searches for the best combination of reference populations that can minimize the genetic distance between the admixed population and all possible synthetic populations, consisting of a linear combination from reference populations. PROXYANC also computes a proxy-ancestry score by regressing a statistic for LD (at short distance < 0.25 Morgan) between a pair of SNPs in the admixed population against a weighted ancestral allele frequency differentiation. Download PROXYANC.
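
    The idea of comparing the admixed population with synthetic populations built from linear combinations of references can be sketched as follows. This is a deliberately simplified stand-in, not the PROXYANC algorithm: it uses a coarse grid search over mixing weights rather than quadratic programming, and the mean squared allele-frequency difference rather than FST. The function names, population labels and frequencies are hypothetical.

```python
from itertools import product

def synthetic(freqs, weights):
    """Weighted average of reference allele-frequency vectors."""
    return [sum(w * f[i] for w, f in zip(weights, freqs))
            for i in range(len(freqs[0]))]

def best_mixture(admixed, refs, steps=10):
    """Grid-search weights (summing to 1) minimising mean squared difference."""
    names = list(refs)
    freqs = [refs[n] for n in names]
    best = None
    for combo in product(range(steps + 1), repeat=len(names) - 1):
        if sum(combo) > steps:
            continue  # weights would exceed 1
        weights = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        syn = synthetic(freqs, weights)
        dist = sum((a - s) ** 2 for a, s in zip(admixed, syn)) / len(admixed)
        if best is None or dist < best[0]:
            best = (dist, dict(zip(names, weights)))
    return best

# Hypothetical allele frequencies at three SNPs for three reference populations.
refs = {"A": [0.1, 0.9, 0.5], "B": [0.8, 0.2, 0.4], "C": [0.5, 0.5, 0.9]}
# A toy admixed population constructed as 70% A and 30% B.
admixed = [0.31, 0.69, 0.47]
dist, weights = best_mixture(admixed, refs)
print(weights)
```

    In this toy case the search recovers weights of 0.7 for A, 0.3 for B and 0.0 for C, so C would not be chosen as a proxy ancestor.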

    PROXYANC can select ancestry informative markers (AIMs) in two ways: from the relationship between the observed local multi-locus linkage disequilibrium in a recently admixed population and the ancestral population difference in allele frequency, or by kernel principal component analysis (Kernel-PCA), an extension of linear PCA.

    PROXYANC can identify unusual differences in allele frequency between pairs of populations as a possible signal of natural selection.

    PROXYANC computes the expected maximum admixture LD from the proxy ancestral populations of the admixed population.

    PROXYANC computes population pair-wise FST (genetic distance).
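
    For readers unfamiliar with FST, the sketch below computes a pair-wise value from allele frequencies at biallelic loci using the classical heterozygosity-based definition, FST = (HT - HS) / HT, combined across loci as a ratio of sums with equal population weights. The estimator actually used by PROXYANC is not stated here, so this is only an illustration; the function name pairwise_fst and the frequencies are hypothetical.

```python
def pairwise_fst(p1, p2):
    """Ratio-of-sums FST = (HT - HS) / HT over biallelic loci."""
    num = den = 0.0
    for a, b in zip(p1, p2):
        mean = (a + b) / 2
        # Mean within-population expected heterozygosity.
        hs = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        # Expected heterozygosity of the pooled population.
        ht = 2 * mean * (1 - mean)
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0

# Hypothetical allele frequencies at three loci in two populations.
pop1 = [0.2, 0.5, 0.9]
pop2 = [0.8, 0.5, 0.1]
print(round(pairwise_fst(pop1, pop2), 3))
```

    Identical frequency vectors give a value of 0, and populations fixed for different alleles at every locus give a value of 1.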


    ancGWAS

    ancGWAS is an algebraic graph-based method to identify the most significant sub-network underlying ethnic differences in complex disease risk in a recently admixed population. This approach integrates the association signal from a GWAS data set, the local ancestry, and SNP pair-wise linkage disequilibrium from the admixed population into the protein-protein interaction (PPI) network.


    ancMETA

    ancMETA is an application that leverages cross-population gene and sub-network meta-analysis to recover the Disease Association Signal (DAS) in a homogeneous or recently admixed population. This approach integrates the association signal from a GWAS data set, the local ancestry, and SNP pair-wise linkage disequilibrium into both the PPI and protein functional networks.



    The Human Mutation Analysis (HUMA) web server has been developed as a freely available platform for the analysis of genetic variation in humans. HUMA provides an extensive database, populated from a myriad of different sources, and incorporates a number of tools to analyse and visualize the data. The HUMA database is populated with genes, transcripts, exons, proteins and protein structures, diseases and variations. All data has been linked to allow advanced search functionality. For example, searching for a protein will provide all related data, including the genes that encode it, the known SNPs and other variations within it, the diseases associated with it, and all the experimentally determined PDB structures.

    The HUMA database has been created with the aim of analysing the effects of non-synonymous SNPs on protein stability and function. In order to do this, analysis tools needed to be incorporated into the web server. Firstly, a BLAST tool has been incorporated to allow users to search the HUMA database for homologous proteins. Secondly, a homology modelling pipeline has been included to allow users to model proteins with variations from the database included. In addition, tools that try to predict the effects of variations, such as Polyphen 2.0 and nsSNPAnalyzer, will also be made available via the web interface.

    Collaboration features have been built into HUMA to facilitate the sharing of job results and analysis. HUMA also allows users to upload their own variation data to the server. These datasets are stored privately and users can choose to share them with other users and groups.
    Once launched, the HUMA web server will be freely available at https://huma.rubi.ru.ac.za


    Genesis can be used to display screen and publication quality pictures of population PCA and admixture charts and has been developed by: Wits Bioinformatics, Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg.

    Why use Genesis?

    Genesis takes the output of popular programs such as Admixture and EIGENSTRAT and produces good quality pictures, which the user can interactively change. There are first class tools that can be used to create good quality pictures, but they require expertise to use and are best used when one already knows exactly what the output should look like. In practice there is a huge need for an interactive tool. Which PCs display interesting data can be interactively explored. Which colours are best to use is not just an aesthetic problem: in some cases a set of colours works well, but with other data the same colours don't, because they no longer contrast clearly with the new positions of the objects being drawn. There may be a need to rearrange the labelling or the data. We want to make the fonts as big as possible, but what is "big as possible" depends on the quantity and arrangement of data. Often when displaying admixture charts, multiple charts are shown in one diagram, so we need to keep the colours consistent and may want to play with the ordering of data.

    We see the need for an interactive tool that can be used to explore possibilities and produce good quality data. Although tools like Distruct and R are more flexible and produce very high quality pictures, Genesis is interactive and requires much less expertise to use.

    Requirements:

    Genesis requires Java 1.7 with SWT libraries installed. Genesis runs on Windows, Linux and Mac OS X.  For Mac OS X, X11 must be installed (for example, via XQuartz).

    On Windows and Linux, the program should be run as:

    java -jar Genesis.jar

    On Mac OS X, X11 must be installed and the program should be run as:

    java -XstartOnFirstThread -jar Genesis.jar

    Some sample data files can be found here

    Download: The executable can be obtained from here (Latest version: 0.2.5, 27 January 2015)

    GitHub:  https://github.com/shaze/genesis
    From the command line: git clone https://github.com/shaze/genesis

    Documentation: The manual is available as a pdf file



    © 2021 H3ABioNet.org
    The H3ABioNet website content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.