Introduction

hlmm is a Python library for fitting heteroskedastic linear mixed models to genetic data.

A heteroskedastic linear model (hetlm.model) can model the effect of a set of variables on the mean of a response (such as a continuous phenotype) and the effect of a (potentially different) set of variables on the variability of the response.
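To make the idea concrete, here is a minimal sketch of the log-likelihood such a model maximizes. It assumes a log-linear model for the variance, so observation i has mean x_i·beta and variance exp(z_i·gamma), where X holds the mean covariates and Z the variance covariates. The function name and arguments are hypothetical and are not the hlmm API.

```python
import math

def hetlm_loglik(y, X, Z, beta, gamma):
    """Log-likelihood of a heteroskedastic linear model (illustrative sketch).

    Assumes y_i ~ Normal(mean = x_i . beta, variance = exp(z_i . gamma)).
    This parameterization and these names are hypothetical, not hlmm's API.
    """
    loglik = 0.0
    for y_i, x_i, z_i in zip(y, X, Z):
        mean = sum(a * b for a, b in zip(x_i, beta))      # fixed effects on the mean
        log_var = sum(a * b for a, b in zip(z_i, gamma))  # fixed effects on the log-variance
        resid = y_i - mean
        # Gaussian log-density with an observation-specific variance exp(log_var)
        loglik += -0.5 * (math.log(2 * math.pi) + log_var + resid ** 2 / math.exp(log_var))
    return loglik
```

If Z contains only an intercept column, every observation shares the same variance and the model reduces to ordinary (homoskedastic) linear regression; adding further columns to Z lets the variance differ between observations.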

A hetlm.model treats all effects on both the mean and the variance of the response as fixed effects. A heteroskedastic linear mixed model (hetlmm.model) additionally models random effects of a set of variables on the mean of the response.
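For concreteness, the two models can be written as follows. This is one natural parameterization, assumed here for illustration (a log-linear variance model and independent random effects sharing a common variance sigma_u^2); it is not taken from the package source. Here x_i are the mean covariates, z_i the variance covariates, and g_i the L variables with random effects.

```latex
% hetlm: fixed effects on the mean and on the log-variance
y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \epsilon_i,
\qquad \epsilon_i \sim N\!\left(0,\, e^{\mathbf{z}_i^\top \boldsymbol{\gamma}}\right)

% hetlmm: adds random effects u of the variables g_i on the mean
y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \mathbf{g}_i^\top \mathbf{u} + \epsilon_i,
\qquad \mathbf{u} \sim N\!\left(\mathbf{0},\, \sigma_u^2 I_L\right)

% implied covariance of y = (y_1, ..., y_n), with G the n x L matrix whose rows are g_i
\operatorname{Var}(\mathbf{y}) = D + \sigma_u^2 G G^\top,
\qquad D = \operatorname{diag}\!\left(e^{\mathbf{z}_1^\top \boldsymbol{\gamma}}, \ldots, e^{\mathbf{z}_n^\top \boldsymbol{\gamma}}\right)
```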

Modelling random effects can make fitting a hetlmm.model very computationally demanding, because the number of operations scales with the cube of the sample size.

The hlmm package can fit a hetlmm.model much more quickly when the number of variables with random effects is small compared to the sample size. It uses an algorithm whose number of operations scales with the sample size multiplied by the square of the number of random effects, that is O(nL^2) rather than O(n^3) for n samples and L random effects.
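The README does not spell out the algorithm. One standard way to obtain O(nL^2) scaling, assumed here and not necessarily what hlmm does internally, is to write the covariance of the response as V = D + sigma2_u * G G', with D diagonal (the heteroskedastic residual variances) and G the n x L matrix of variables with random effects. The Woodbury identity and the matrix determinant lemma then give the two expensive likelihood terms, log|V| and r'V^{-1}r, using only an L x L matrix. The function name and arguments below are hypothetical.

```python
import numpy as np

def woodbury_loglik_terms(d, G, sigma2_u, r):
    """Return (log|V|, r' V^{-1} r) for V = diag(d) + sigma2_u * G G'.

    d: length-n array of residual variances (the heteroskedastic part)
    G: n x L matrix of variables with random effects
    sigma2_u: variance of the random effects (hypothetical parameterization)
    r: length-n residual vector
    The n x n matrix V is never formed.
    """
    L = G.shape[1]
    Dinv_G = G / d[:, np.newaxis]              # D^{-1} G: O(nL)
    # L x L capacitance matrix C = I / sigma2_u + G' D^{-1} G: O(nL^2)
    C = np.eye(L) / sigma2_u + G.T.dot(Dinv_G)
    # Matrix determinant lemma: |V| = |C| * sigma2_u^L * |D|
    _, logdet_C = np.linalg.slogdet(C)         # O(L^3)
    logdet_V = logdet_C + L * np.log(sigma2_u) + np.sum(np.log(d))
    # Woodbury identity: V^{-1} = D^{-1} - D^{-1} G C^{-1} G' D^{-1}
    Dinv_r = r / d                             # O(n)
    Gt_Dinv_r = G.T.dot(Dinv_r)                # O(nL)
    quad = Dinv_r.dot(r) - Gt_Dinv_r.dot(np.linalg.solve(C, Gt_Dinv_r))  # O(L^3)
    return logdet_V, quad
```

The O(L^3) steps never exceed the O(nL^2) cost when L is at most n, so the total is O(nL^2). As a rough illustration, ignoring constant factors, n = 10,000 and L = 1,000 give n^3 = 10^12 operations versus nL^2 = 10^10, about 100 times fewer. Both terms can be checked against a direct computation with np.linalg.slogdet(V) and np.linalg.solve(V, r) on small inputs.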

Main features

hetlm.model: define heteroskedastic linear models and find maximum likelihood estimates of parameters

hetlmm.model: define heteroskedastic linear mixed models and find maximum likelihood estimates of parameters

hlmm_chr.py (Documentation for hlmm_chr.py script): command line script that fits heteroskedastic linear models or heteroskedastic linear mixed models to a contiguous segment of the genome. The script takes bed-formatted genotypes as input and can incorporate covariates for the fixed effects on the mean and/or variance.

fit_hlmm_model.py (Documentation for fit_hlmm_model.py script): command line script that fits a heteroskedastic linear model or a heteroskedastic linear mixed model to a given response (phenotype), mean covariates, variance covariates, and variables for which to model random effects.

Quick install

We recommend installing using pip (https://pip.pypa.io/en/stable/). At the command line, type

sudo pip install hlmm

Detailed Package Install Instructions

hlmm has the following dependencies:

Python 2.7

Packages:

  • numpy
  • scipy
  • pysnptools
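Before installing, it can help to see which of these packages are already importable. The helper below is a small sketch and is not part of hlmm; it uses only importlib.import_module, which is available in both Python 2.7 and Python 3.

```python
import importlib

def missing_dependencies(names=("numpy", "scipy", "pysnptools")):
    """Return the module names from `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_dependencies() returns an empty list when all three runtime dependencies are installed; pass ("numdifftools",) to check the extra test dependency.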

We highly recommend using a Python distribution such as Anaconda (https://store.continuum.io/cshop/anaconda/). It comes with numpy and scipy preinstalled and can include an MKL-compiled distribution for optimal speed.

To install from source, clone the git repository, and in the directory containing the HLMM source code, at the shell type

sudo python setup.py install

or, at the Windows command prompt, type

python setup.py install

Running tests

The tests directory contains scripts for testing the computation of the likelihoods, gradients, and maximum likelihood solutions for both heteroskedastic linear models (test_hetlm.py) and for heteroskedastic linear mixed models (test_hetlmm.py). To run these tests, a further dependency is required: numdifftools.

To run the tests, first install hlmm. Change to the tests/ directory and at the shell type

python test_hetlm.py

python test_hetlmm.py

Both tests should run without any failures.
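The gradient tests rest on a simple idea: an analytic gradient should agree with a numerical derivative of the likelihood. The hlmm tests use numdifftools for the numerical side; the sketch below swaps in plain central differences to show the principle and is not the test scripts themselves. It uses a hypothetical one-observation likelihood with mean beta and log-variance gamma.

```python
import math

def central_difference_gradient(f, theta, h=1e-6):
    """Approximate the gradient of scalar function f at theta by central differences."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Hypothetical example: log-likelihood of a single observation Y
# with mean beta and variance exp(gamma); theta = [beta, gamma].
Y = 1.5

def loglik(theta):
    beta, gamma = theta
    return -0.5 * (math.log(2 * math.pi) + gamma + (Y - beta) ** 2 * math.exp(-gamma))

def analytic_gradient(theta):
    beta, gamma = theta
    scaled_sq = (Y - beta) ** 2 * math.exp(-gamma)
    # d/dbeta = (Y - beta) e^{-gamma};  d/dgamma = -0.5 (1 - (Y - beta)^2 e^{-gamma})
    return [(Y - beta) * math.exp(-gamma), -0.5 * (1.0 - scaled_sq)]

def gradients_agree(theta, tol=1e-5):
    """True if the analytic and numerical gradients match within tol."""
    numeric = central_difference_gradient(loglik, theta)
    return all(abs(a - b) < tol for a, b in zip(numeric, analytic_gradient(theta)))
```

Central differences have O(h^2) truncation error, which is why they are usually preferred over one-sided differences for this kind of check; numdifftools goes further with adaptive step sizes and extrapolation.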