Channel: FaST-LMM: FActored Spectrally Transformed Linear Mixed Models

Updated Wiki: Home

More current research and software can be found at the eScience FaST-LMM site.

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems and has been tested on data sets with over 120,000 individuals [1,2].

  1. J. Listgarten*, C. Lippert*, C.M. Kadie, R.I. Davidson, E. Eskin, and D. Heckerman*. Improved linear mixed models for genome-wide association studies. Nature Methods, 9: 525-526, June 2012 (doi:10.1038/nmeth.2037). (*equal contributions)
  2. C. Lippert*, J. Listgarten*, Y. Liu, C.M. Kadie, R.I. Davidson, and D. Heckerman*. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681). (*equal contributions)

Newer papers with improved algorithms can be found on the MSR eScience site for FaST-LMM.

Releases

Please go to the MSR eScience site for FaST-LMM for newer releases.

2012/05      - Future updates are now hosted on the MSR eScience site for FaST-LMM.

2012/04/06 - v1.09 - Several usability enhancements (MaxThreads), update to latest MKL libs (v10.3.7),
                                and misc fixes
2012/03/17 - v1.08 - Improve perf of SNC detection, add -extractSimTopK option, doc updates, 
                                -verboseOut fixes
2012/03/06 - v1.07 - Fix bug 17541 -sim option not processing the alternate phenotype file properly
2012/02/27 - v1.06 - Detection and filtering of constant genotypes in a SNP
                               + Bug fixes for binary and transposed files
2012/02/02 - v1.05 - Bug fixes
2012/01/05 - v1.04 - Add support for using dosage information + bug fixes
2011/11/21 - v1.03 - Add several perf enhancements relating -extract and -numjobs for improved
                                throughput and reduce memory usage
2011/09/27 - v1.02 - Add several perf enhancements relating to C++ for improved throughput and
                                reduce memory usage
2011/09/14 - v1.01 - Add -extract support to C# and small clean-up changes
2011/09/04 - v1.00 - Initial Release

 


Released: FastLmm v1.09 Binaries for Windows and Linux (Apr 09, 2012)

-------------------------------------------------------------------------------------
More current research and software can be found at the eScience FaST-LMM site:
http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm
-------------------------------------------------------------------------------------

These downloads contain the files necessary to run FaSTLMM on Windows or Linux, along with the license and user's manual. To download the FaSTLMM source code, please follow the changeset link located above to the Source Code tab.

The FaSTLMM.Win.zip download contains both C++ and CSharp executable versions of FaSTLMM. No installer is required: just unzip the file into a directory and run from there, or put the installation directory on your path and run it from anywhere. The included C++ version runs on the 64-bit version of the OS, while the CSharp version runs on both 64-bit and 32-bit versions of the OS.

The FaSTLMM.Linux.zip download contains the C++ executable and has been run on the Ubuntu distribution.

NOTE: The Beta status does not indicate a low quality release. The Beta status does indicate there is ongoing active development with updates and new features being delivered frequently.

v1.09
This version contains several usability enhancements suggested by users. These include a 'pre-check' that the output file is writable, to prevent a write failure at the end of a long run, and support for -MaxThreads to control threading and CPU usage, which helps when running multiple jobs on a cluster. We also updated the system to use the latest MKL libraries (v10.3.7) on both Windows and Linux, and we fixed some bugs where the intersection of data across plink, alternate phenotype, and covariates files was improperly failing to produce output.

v1.08
This version contains significant performance improvements to SNC detection and reduces the memory requirements for processing binary files. In addition, the -verboseOut reports have been cleaned up and the doc has been updated to reflect the new output format and options.

v1.07
This version contains the fix for bug 17541, which was causing an error in -sim file operations. Each individual must be uniquely identified by the combination of family name and individual name; the unique key is the concatenation of these two pieces of information with a space separator. The -sim file requires a <tab> to separate fields, so FAM<sp>INDIVIDUAL is a valid input.

V1.06
This version adds detection and filtering of constant genotypes in a SNP, along with bug fixes.

V1.05
This version primarily fixes problems in -extract C++ memory management and the C# dll references to Sho.

V1.04
This version now contains the much-requested support for dosage data. See the updated manual for specific instructions. In addition, several fixes around file handling on Linux were incorporated in this drop.

V1.03
This version has several more improvements to throughput and memory reduction, especially when partitioning the inputs with -numjobs or -extract, along with numerous bug fixes. (Linux version updated 12/12/2011 with small bug fix - see change list 83740)

V1.02
The C++ version has several improvements to throughput and memory reduction.

V1.01
The C# version now supports the -extract option and the documentation is updated.
In addition, these files contain some small clean-up changes from the 1.0 release.

V1.0
Initial release

New Post: -excludeByPosition

-excludeByPosition is extremely slow. Is there a workaround for this problem?

Also, the user manual states that 2,000,000 bp works well with human data. Any idea what would be sufficient for mouse data?

New Post: No results in output

I'll answer my own question. Your software doesn't work on large data, guys. I don't know what the limit on the number of SNPs is, and I don't have time to check it.
Cheers

New Post: No results in output

chrustaly,

Can you give a bit more info on which version you are using and what your specific command line is?
Hopefully you saw the home page pointed to a new software distribution site and are using v2.06 from there.
In terms of sizes, I have run fastlmm on ~500k snps with a 15k sample size. Others have run w/ ~1M snps and ~100k sample sizes.

New Post: No results in output

Hey,

I didn’t see your new site, will definitely take a look.

I was running it on 2 different sets, ~400k snps x ~200 people each, and got all 0 as the result.

Then I processed those sets with snip-snip and ran fastlmm again and it worked (on about 20k snps).

I don’t have many missing values, so it shouldn’t be a problem. Format was exactly the same for 400k and 20k. So it should be the size or I don’t know…

Thank you for your answer.

Katya

New Post: No results in output

PS: Perhaps fastlmm fails when a whole column has exactly the same genotype?


New Post: No results in output

I think fastlmm removes a specific SNP from consideration if there is no genotype variation in the sample set after all the individuals are selected.
The best way for us to help get through or fix problems is to get the steps to reproduce the specific problem here on our hardware. The key pieces of information to be able to repro the issue are the version of the program (in the startup banner) and the command line where the problem occurs. If the problem is data specific, then a sample of the data that reproduces the problem is really helpful too.
-bobd-
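
To check a data set for this situation before a run, the idea can be sketched in a few lines of Python. This is only an illustration of "no genotype variation", not FaST-LMM's actual implementation; the function name `constant_snps` and the in-memory layout (SNP ID mapped to a list of 0/1/2 calls, with `None` for missing) are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence


def constant_snps(genotypes: Dict[str, Sequence[Optional[int]]]) -> List[str]:
    """Return the IDs of SNPs whose non-missing calls are all identical.

    `genotypes` maps a SNP ID to one call per individual (0, 1 or 2),
    with None marking a missing call. This layout is hypothetical.
    """
    constant = []
    for snp_id, calls in genotypes.items():
        observed = {c for c in calls if c is not None}
        # Zero or one distinct value means the SNP shows no variation
        # in the selected individuals.
        if len(observed) <= 1:
            constant.append(snp_id)
    return constant


example = {
    "rs1": [0, 1, 2, 0],     # varies, so it is kept
    "rs2": [2, 2, 2, 2],     # constant
    "rs3": [0, None, 0, 0],  # constant once the missing call is ignored
}
print(constant_snps(example))  # prints ['rs2', 'rs3']
```

If a large share of the SNPs in a small sample come back as constant, that is worth knowing before interpreting the output.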

New Post: No results in output

I will run it later today and will send you the output message and everything else that I get. I don’t remember, but I hope there will be a log file.

Unfortunately I will not be able to give you my dataset, since it was obtained in our lab and is private.

But I will try my best to help.

New Post: No results in output

Understood about the dataset. We will do our best to help figure out what is wrong in your situation too.
When reporting problems, note that most of the output goes to stderr, so to capture it in a file you have to redirect both streams: 1>file.log 2>&1 (the order matters: stdout is pointed at the file first, then stderr is pointed at stdout).
Also, sending email to fastlmm at microsoft dot com with other questions will reach more of the
people involved with fastlmm.
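
For scripted runs, the same capture of stdout and stderr into one log file can be done from Python. The following is a minimal sketch, not part of FaST-LMM: `run_and_log` is a hypothetical helper, and the child process in the demo is a stand-in that would be replaced by the real command line.

```python
import subprocess
import sys
from typing import List


def run_and_log(cmd: List[str], log_path: str) -> int:
    """Run `cmd`, send stdout and stderr to one log file, return the exit code."""
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges stderr into the stream
        # that is already being written to the log file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Stand-in child process that writes to both streams.
    demo = [
        sys.executable,
        "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
    ]
    code = run_and_log(demo, "run.log")
    print("exit code:", code)
```

Because the two streams are buffered differently, the order of lines in the log may not match the order in which they were produced.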

Commented Issue: Incorrect ID key mapping with -sim command [18361]

Seems like issue 17541 still isn't completely resolved in version 1.09 for Windows.
 
I use this command to read in a created sim-file:
fastlmmc -bfile data.fastlmm -pheno data.pheno -sim data.sim
 
I get this error message:
Fatal Error : Cannot create column major array from Vector and Mapping. kernelColumns contains [0] copies of mapping key [1 123] (epected 1 copy).
 
Has http://fastlmm.codeplex.com/SourceControl/changeset/changes/87382 not yet been implemented in the latest version?
Comments: ** Comment from web user: mihuzx **

hi,

Has the problem been solved?
I came across the same problem. Could you please suggest how I can fix my sim-file and make it work?
I'd appreciate a quick reply.

Commented Issue: Incorrect ID key mapping with -sim command [18361]

Comments: ** Comment from web user: mihuzx **

hi,
The problem is solved. I read the user manual for version 2.07 and found that the title row also needs the family ID and individual ID separated by a space, which is not explained in the version 1.09 manual.
thanks all the same
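
A quick way to catch this kind of mismatch before running is to compare the IDs in the sim-file title row against the keys built from the PLINK .fam file. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the first tab-separated cell of the title row is a corner label rather than an ID (check the user manual for the exact layout).

```python
from typing import List, Set, Tuple


def fam_keys(fam_lines: List[str]) -> Set[str]:
    """Build 'FAMILY INDIVIDUAL' keys from the first two columns of .fam lines."""
    keys = set()
    for line in fam_lines:
        fields = line.split()
        if len(fields) >= 2:
            keys.add(f"{fields[0]} {fields[1]}")
    return keys


def check_sim_header(header_line: str, expected: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (malformed IDs in the title row, expected keys missing from it)."""
    cells = header_line.rstrip("\r\n").split("\t")
    ids = cells[1:]  # assumption: the first cell is a corner label, not an ID
    malformed = []
    for sim_id in ids:
        parts = sim_id.split(" ")
        # A valid ID is exactly two non-empty parts joined by one space.
        if len(parts) != 2 or "" in parts:
            malformed.append(sim_id)
    missing = sorted(expected - set(ids))
    return malformed, missing


fam = ["1 123 0 0 1 -9", "1 124 0 0 2 -9"]
header = "var\t1 123\t124"  # second ID lacks the family part
print(check_sim_header(header, fam_keys(fam)))  # prints (['124'], ['1 124'])
```

An ID reported as missing here corresponds to the "contains [0] copies of mapping key" style of failure quoted in the issue.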

Commented Issue: Incorrect ID key mapping with -sim command [18361]

Comments: ** Comment from web user: bobd00 **

Glad you were able to resolve the problem using the new docs.

best,
-bobd-

New Post: Found missing SNP minor or major allele information in file

Hi there,

I'm trying to use fast-lmm to run a GWAS on a quantitative trait and ran into the following error:

Fatal Error : Found missing SNP minor or major allele information in file [test.bim] near line 1023:41.
found [0] [T]

My samples involve 500K SNPs with ~7000 individuals.

The command I used was:
fastlmmc -verboseOutput -bfile test -bfileSim test -pheno pheno.txt -covar covar.txt -out test_pheno &

I wonder if I should exclude the SNPs that do not have the minor / major allele information, but I can't find how to do so with the plink options. Any ideas would be appreciated.

Thank you!

Katie

New Post: Found missing SNP minor or major allele information in file

Katie,

Could you let me know which version of fastlmm you are using and on what host OS? That helps me to better understand the context of any questions coming in.

The PLINK docs say that a SNP should have both alleles encoded or no alleles encoded. If you are on the X chromosome or Y chromosome without a pair, they suggest coding as homozygous rather than missing. In this case, I would expect the SNP to be encoded as [T] [T] or [0] [0].

We usually fix the input so we know what data is being operated on in an attempt to avoid surprises.

I don’t believe there is a switch to fix up or ignore input we think is suspect.

Thanks

-bobd-

New Post: Found missing SNP minor or major allele information in file

Hi bobd,

I'm using the "FaSTLMM.207.Linux" version and running on a Linux platform. It would be great if you could provide some hints on how to recode the SNPs on the X or Y chromosome as homozygous.

Thank you!

New Post: Found missing SNP minor or major allele information in file

Looking a little deeper at the error you sent, it indicates that the .BIM file is malformed. I had been thinking this message was about a specific genotype, whereas a BIM file is part of PLINK’s binary PED file format. A .BIM file is a text file in which each line defines a SNP that is represented in the associated .BED file, and a SNP must have at least two possible allele values at a location to be a SNP. A ‘0’ for an allele means it is missing, which is an error when describing a SNP. This should be fixed.

The BIM file format is:

Chromosome<tab>Snp_Id<tab>centiMorgans<tab>BasePosition<tab>Minor_Allele<tab>Major_Allele<newline>

1 RS1234 0 1 G A

I am not certain how it was created, but it seems there is a real problem there. It is important to understand the quality of the data you are operating on, and it looks like these offending SNPs should be removed or investigated to understand more completely what is happening.

Sorry I cannot help out more, but I think you need to understand how the BIM file was created with a SNP defined as “0 T”.

-bobd-
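
To find the offending SNPs, the .BIM file can be scanned for lines where either allele column is '0'. This is a minimal sketch, not a FaST-LMM or PLINK tool: the function name is hypothetical, and it assumes the six-column layout given above with whitespace-separated fields.

```python
from typing import Iterable, List

MISSING_ALLELE = "0"


def snps_with_missing_allele(bim_lines: Iterable[str]) -> List[str]:
    """Return the IDs of SNPs whose allele columns (5 and 6) contain '0'."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        snp_id, allele_a, allele_b = fields[1], fields[4], fields[5]
        if MISSING_ALLELE in (allele_a, allele_b):
            flagged.append(snp_id)
    return flagged


sample = [
    "1\tRS1234\t0\t1\tG\tA",  # both alleles present
    "1\tRS5678\t0\t2\t0\tT",  # missing allele, like the "found [0] [T]" case
]
print(snps_with_missing_allele(sample))  # prints ['RS5678']
```

Writing the flagged IDs to a text file, one per line, gives a list that PLINK's --exclude option can take together with --make-bed to write a cleaned fileset. It is still worth investigating why those SNPs were written with a '0' allele in the first place.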

New Post: Found missing SNP minor or major allele information in file

Hi Bobd,

It works after cleaning up those SNPs with the weird coding of "0" instead of the normal "G/T/C/A". However, I'm not sure why the fast-lmm run got killed after the step "End Loading FastLmm Data:….". Any idea?

Thank you!

Katie

New Post: Found missing SNP minor or major allele information in file

Hi Katie,
fastlmmc should always print an error indication if it does not complete. I'll send a private message to see about getting some data to reproduce the problem where I can debug it.