New Post: Found missing SNP minor or major allele information in file

March 5, 2014, 5:31 pm

Hi there,

These are the last few lines of the verbose output:

           Number of Phenotypes:       1
            Number of SNPs Read:  500418
            Number of SNPs Used:  497252

-- End Processing PLINK fileset: [test]
-- End Loading Test Data:
/bin/bash: line 368: 32493 Killed

I suspect it's due to the limited memory of the machine I'm using. Could that be the case? Thanks!
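
A bare "Killed" from the shell usually means the process received SIGKILL, which on Linux is often the out-of-memory killer. One rough way to gauge whether memory is the culprit: if the SNPs used were held as a dense matrix of 8-byte doubles (an assumption about fastlmmc's internals, not something documented in this thread), the footprint would be SNPs × individuals × 8 bytes. The post does not give the number of individuals, so the sample size below is hypothetical.

```python
def dense_matrix_bytes(n_snps: int, n_individuals: int, bytes_per_value: int = 8) -> int:
    """Bytes for a dense n_snps x n_individuals matrix (8-byte doubles by default)."""
    return n_snps * n_individuals * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    snps_used = 497_252     # "Number of SNPs Used" in the verbose output above
    hypothetical_n = 2_000  # hypothetical sample size; not stated in the post
    needed = dense_matrix_bytes(snps_used, hypothetical_n)
    print(f"{needed:,} bytes, about {to_gib(needed):.2f} GiB")
```

Comparing such an estimate with the machine's free memory gives a first indication, although the real peak usage also depends on how the program stores and copies its data.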

New Post: Difference between bfile and bfilesim

March 6, 2014, 10:02 pm

Hi there,

Could you explain the difference between the bfile and bfilesim input files? May I have an example of a bfilesim file set, please? Or can I use the same bfile set as the bfilesim?

Thanks,
Katie

New Post: Difference between bfile and bfilesim

March 7, 2014, 9:14 am

The file formats are the same for both options. One file set is used to create the similarity matrix (*sim) and the other is the test data.

It is not uncommon to use the same file set for both inputs when working with a small sample, but this is perhaps not the best method for large datasets.

The documentation describes how to select a ‘better’ subset of SNPs with the -autoselect option, which may lead to better results.

-bobd-
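
To make the two roles concrete: the bfilesim set supplies the SNPs from which the similarity matrix is built, and the bfile set supplies the SNPs that are tested. The sketch below shows one common way to build a genetic similarity matrix, standardizing each SNP and averaging the products of standardized genotypes across SNPs. It is written in plain Python for illustration and is not necessarily the exact computation fastlmmc performs.

```python
import math
from typing import List, Sequence


def standardize(snp: Sequence[float]) -> List[float]:
    """Center one SNP column to mean 0 and scale it to unit variance."""
    n = len(snp)
    mean = sum(snp) / n
    var = sum((g - mean) ** 2 for g in snp) / n
    if var == 0:
        return [0.0] * n  # a constant SNP carries no information about relatedness
    sd = math.sqrt(var)
    return [(g - mean) / sd for g in snp]


def similarity_matrix(snps: List[Sequence[float]]) -> List[List[float]]:
    """K[i][j] = mean over SNPs of z[i] * z[j], with z the standardized genotypes.

    `snps` holds one column per SNP, each with one genotype (0/1/2) per individual.
    """
    cols = [standardize(s) for s in snps]
    n = len(cols[0])
    m = len(cols)
    return [[sum(c[i] * c[j] for c in cols) / m for j in range(n)] for i in range(n)]
```

For example, `similarity_matrix([[0, 1, 2], [2, 1, 0]])` gives a 3 × 3 symmetric matrix in which individuals 1 and 3, who carry opposite genotypes at both SNPs, have a negative similarity.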

New Post: Input files

March 7, 2014, 1:50 pm

Do I need to order the families and individuals in the same way as in the pheno file? Thank you!

New Post: Difference between bfile and bfilesim

March 7, 2014, 3:34 pm

Since I am testing only one SNP here, I guess I don't need to use the -autoselect option to trim the data. Am I correct? Thank you!

New Post: Difference between bfile and bfilesim

March 7, 2014, 4:04 pm

Please take a look at the FaST-LMM-Select section of the documentation to understand this more fully.
The reason to select SNPs for the similarity matrix is to reduce noise and pull out more of the signal for the SNP of interest.
You can run with all SNPs in the similarity matrix, but prediction accuracy improves when you remove 'unrelated' noise.
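
The selection idea can be illustrated with a toy sketch: rank candidate SNPs by the strength of their marginal association with the phenotype and keep only the top k for the similarity matrix. The ranking score here (absolute Pearson correlation) and the function names are assumptions for illustration; FaST-LMM-Select's actual procedure, including how k is chosen, is described in its documentation.

```python
import math
from typing import List, Sequence


def abs_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def select_top_k(snps: List[Sequence[float]], phenotype: Sequence[float], k: int) -> List[int]:
    """Return indices of the k SNPs most correlated with the phenotype, strongest first."""
    scores = [abs_correlation(s, phenotype) for s in snps]
    order = sorted(range(len(snps)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

Only the selected SNP columns would then go into the similarity matrix, which is how the unrelated noise is kept out.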

New Post: Input files

March 8, 2014, 7:11 pm

No. The order of individuals in the alternate phenotype file is not significant. They will be matched with the correct individuals in the other input files.
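
The matching described above can be pictured as a lookup keyed on the family and individual IDs rather than on row position. A minimal sketch, assuming whitespace-delimited rows that begin with FID and IID as in PLINK-style files; the function names are hypothetical and this is not fastlmmc's own reader.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (FID, IID)


def read_pheno(lines: List[str], column: int = 1) -> Dict[Key, str]:
    """Map (FID, IID) -> value of one phenotype column (1 = first column after the IDs)."""
    table: Dict[Key, str] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        table[(fields[0], fields[1])] = fields[1 + column]
    return table


def align(individuals: List[Key], pheno: Dict[Key, str]) -> List[Optional[str]]:
    """Phenotype values in the order of `individuals`; None where an ID has no row."""
    return [pheno.get(key) for key in individuals]
```

Because the lookup uses the ID pair, the phenotype rows can appear in any order and still line up with the genotype file's individuals.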

New Post: Detecting interaction with covariates using FaST-LMM

April 13, 2014, 11:56 pm

Hello,
I was wondering whether it is possible to test gene-environment interactions with FaST-LMM. Ideally, I would like to have new columns in the output file with the p-values for the interactions between a specified set of SNPs and an environmental exposure.

As far as I understand, although it is possible to specify a covariate file, FaST-LMM does not provide a way to test interactions with covariates directly, i.e. by specifying an option on the command line.

I was wondering whether it would be possible to code the interactions directly in the SNP file. For each SNP in my predefined set, this would result in two new columns with the product allele x env.

Could somebody give me a bit of advice on this issue?
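
The approach proposed above, precomputing interaction terms, amounts to multiplying each SNP's dosage by the exposure value for every individual. A minimal sketch of building those product columns, assuming allele dosages coded 0 to 2 and one numeric exposure value per individual in the same individual order; the names are hypothetical, and whether fastlmmc can test such columns appropriately is exactly the open question in this post.

```python
from typing import Dict, List, Sequence


def interaction_columns(dosages: Dict[str, Sequence[float]],
                        exposure: Sequence[float]) -> Dict[str, List[float]]:
    """For each SNP, return the element-wise product dosage * exposure.

    `dosages` maps a SNP id to one dosage per individual; `exposure` holds one
    environmental value per individual, in the same individual order.
    """
    out: Dict[str, List[float]] = {}
    for snp_id, d in dosages.items():
        if len(d) != len(exposure):
            raise ValueError(f"{snp_id}: {len(d)} dosages but {len(exposure)} exposure values")
        out[f"{snp_id}_x_env"] = [g * e for g, e in zip(d, exposure)]
    return out
```

In a standard gene-environment model the main effects of the SNP and of the exposure are kept in the model alongside the product term; otherwise the product column can pick up main-effect signal and its p-value is hard to interpret as an interaction.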

New Post: No results in output

July 31, 2014, 5:08 am

Hi, I am wondering if you ever got to the bottom of this problem. I am facing the same problem now: I have input dosage files for the GWAS and have computed the genetic similarity matrix. When I use the dosage files (.dat and .fam) and the similarity matrix for association mapping, fastlmm gives P values for some phenotypes but not for others (all NAs). All the phenotypes are quantitative measures between -3 and 3, and missing values (a small proportion of samples) are coded -9, which is also specified with the -missingPhenotype option. Under what circumstances would the P values be NAs?

Thank you for your help!

Created Unassigned: Re-use decomposition object [21195]

July 31, 2014, 7:43 am

Hello,

I have a question concerning the re-use of the decomposition object.

My dataset contains 22,500 phenotypes and 60,000 SNPs for 200 samples. First I compute the spectral decomposition object for the first phenotype and store it using -eigenOut xx. For the remaining phenotypes, which are addressed using -mpheno, the already computed decomposition object should be used.
When I use -runGWAS NORUN and -eigen xx, nothing happens and I don't get any output data.
Is the -eigen option not included yet?
Thanks in advance!

Franziska

New Post: No results in output

July 31, 2014, 3:49 pm

I believe we resolved the NaN issues I was aware of.
A few questions for you:
Can you please confirm which version and build of fastlmm you are running and what your host OS is?
And if you can send us repro instructions, I can take a deeper look.
thanks

New Post: Re-use decomposition object

August 1, 2014, 2:27 am

New Post: Re-use decomposition object

August 1, 2014, 3:02 pm

Franziska,

I am not sure what your command line is specifically, but it sounds like there is a bit of confusion.
The -eigenOut [directory] option produces the spectral decomposition of the similarity matrix, which can be used in subsequent runs.
The NORUN option tells fastlmmc not to run the actual GWAS code. This allows one to stop processing after the spectral decomposition is output when the -eigenOut option is used.
The -eigen [directory] option uses the files produced by -eigenOut.

The expectation is that you would run -eigenOut [dir] and -runGwasType NORUN in the same command to produce the spectral decomposition output without running GWAS, because you will be running multiple GWAS commands with the same similarity matrix.

Running a command with -eigen and NORUN loads the eigenvectors and then stops. To run the GWAS for each phenotype with the saved decomposition, use -eigen without NORUN.

-bobd-
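
The two-stage workflow can be sketched as a small script that only builds the command lines: one run with -eigenOut and -runGwasType NORUN, then one GWAS run per phenotype column that reuses the saved decomposition through -eigen. The flag spellings follow the posts in this thread; the file names, the choice of -bfile/-bfileSim as inputs and the output naming are hypothetical placeholders, so check everything against the documentation for your fastlmmc version before running anything.

```python
from typing import List


def eigen_precompute_cmd(bfile: str, bfile_sim: str, pheno: str, eigen_dir: str) -> List[str]:
    """Stage 1: compute and save the spectral decomposition, but do not run the GWAS."""
    return ["fastlmmc", "-bfile", bfile, "-bfileSim", bfile_sim, "-pheno", pheno,
            "-eigenOut", eigen_dir, "-runGwasType", "NORUN"]


def gwas_cmds(bfile: str, bfile_sim: str, pheno: str, eigen_dir: str,
              n_pheno: int, out_prefix: str) -> List[List[str]]:
    """Stage 2: one GWAS per phenotype column (-mpheno 1..n), reusing the saved decomposition.

    The similarity input is kept alongside -eigen, as in a command line elsewhere in
    this thread; whether it is still required once -eigen is given is not verified here.
    """
    return [["fastlmmc", "-bfile", bfile, "-bfileSim", bfile_sim, "-pheno", pheno,
             "-mpheno", str(i), "-eigen", eigen_dir, "-runGwasType", "RUN",
             "-out", f"{out_prefix}.pheno{i}.txt"]
            for i in range(1, n_pheno + 1)]


if __name__ == "__main__":
    # Only print the commands; each could be run with subprocess.run(cmd, check=True).
    print(" ".join(eigen_precompute_cmd("test", "sim", "pheno.txt", "eigen_dir")))
    for cmd in gwas_cmds("test", "sim", "pheno.txt", "eigen_dir", 3, "results"):
        print(" ".join(cmd))
```

Keeping NORUN only in the first stage is the point: the decomposition is computed once, and every later command loads it and runs the GWAS.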

New Post: No results in output

August 15, 2014, 9:17 am

Hi,

Thank you for your reply. I am using FastLmmC v2.06.20130802 and the host OS is Linux. The first lines of stderr from the program when it runs are:

FastLmmC v2.06.20130802 - Factored Spectrally Transformed Linear Mixed Models [Release]

Copyright Microsoft Corporation -- Licensed Only for Non-Commercial use.

Compiled Aug 2 2013 at 23:01:11 by erg00lx for Linux

using MKL v11.00.04 - Build: 20130517

The command line I used:

FaSTLMM.206.Linux/Linux_MKL/fastlmmc -dosage1 converge_n11443.chr5_075_080 -pheno converge_11443samples_noduplicates_isCase.txt -mpheno 1 -sim converge.indep.no_X.sim -eigen kinship -REML -runGwasType RUN -logReg -out converge_11443samples_noduplicates_isCase.converge_n11443.chr5_075_080 -log -logDir log -maxThreads 1 -maxChromosomeValue 23 2> converge_11443samples_noduplicates_isCase.converge_n11443.chr5_075_080.stderr

Here, -dosage1 converge_n11443.chr5_075_080 is a dosage file containing dosages for one segment of the genome; the -sim file (/well/mott-flint/caina/CONVERGE/kinship/converge.indep.no_X.sim) and the -eigen directory (/well/mott-flint/caina/CONVERGE/kinship) had been computed beforehand from a small number of tagging SNPs across the whole genome.


For the NaN issue: it seems to occur for some phenotypes and not others, and I haven't been able to work out why. The phenotype I have here is a binary measure of disease status (this gave the expected output with numerical P values), but there are other phenotypes I used, some binary and some quantitative, for which fastLMM gave all NaNs in the output (I had disabled the -logReg option accordingly for the quantitative measures).
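
One quick check when a phenotype returns only NaN P values is whether the column still has enough usable, non-constant values once missing codes are removed. The sketch below assumes -9 as the missing code (as in the earlier post in this thread) and reports the count, the number of distinct values and the variance; the minimum-count threshold is arbitrary, it does not reproduce fastlmmc's own filtering, and few values or zero variance are only possible causes, not a confirmed explanation.

```python
from typing import Dict, Sequence


def phenotype_summary(values: Sequence[float], missing: float = -9.0) -> Dict[str, float]:
    """Summarize a phenotype column after dropping the missing-value code."""
    kept = [v for v in values if v != missing]
    n = len(kept)
    if n == 0:
        return {"n": 0, "distinct": 0, "variance": 0.0}
    mean = sum(kept) / n
    variance = sum((v - mean) ** 2 for v in kept) / n
    return {"n": n, "distinct": len(set(kept)), "variance": variance}


def looks_degenerate(summary: Dict[str, float], min_n: int = 10) -> bool:
    """Flag columns with too few observations or no variation (threshold is arbitrary)."""
    return summary["n"] < min_n or summary["variance"] == 0.0
```

Running this over each phenotype column and comparing the columns that fail with those that work may help narrow down whether the problem lies in the data or elsewhere.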

====


There is another issue I have just run into. I ran the command line given above for that segment of the genome and for all other segments. While most other segments gave the expected results, this segment and some others (making up about 1/5 of the genome) failed with this error message:

Compute GWAS w/ Logistic Regression:
Warning : deltaLL[1.91e+04] > maxDeltaLL[1.000000e-10]. Setting pval to 1.0
This may indicate optimizer parameters need tuning in the sources
Warning : deltaLL[3.47e+04] > maxDeltaLL[1.000000e-10]. Setting pval to 1.0
This may indicate optimizer parameters need tuning in the sources
fastlmmc: ./Inc/minimize.h:302: bool ts::minimize::_LBFGS::IsDegenerateGradient(const TVector&, const TVector&) [with TScalar = double, TVector = std::vector<double>]: Assertion `false' failed.

When I ran associations for other phenotypes with the same dosage files, a different set of segments failed in this manner for each phenotype.

Could you tell what the problem is by looking at the error message? I would appreciate it if you can help me understand what’s going on, or if I am doing anything inappropriate.

Thank you very much for your time and I look forward to hearing from you.

Best,
Na 

New Post: Putative bug in the program feature_selection_cv.py

August 21, 2014, 6:15 am

Hi,
I get the following error message when I run feature_selection_cv.py:

C:\FastLmm.Py\fastlmm\feature_selection >python feature_selection_cv.py

File "feature_selection_cv.py", line 33, ...
from fastlmm.pyplink.snpreader.Bed import Bed, ...
ImportError: No module named fastlmm.pyplink.snpreader.Bed

I checked that the module was present, and it is.

Could you help me find what's going wrong?

Thanks in advance;

Regards;

P.
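
When an ImportError names a module that is present on disk, a usual first step is to check whether the running interpreter can resolve it from its search path. A minimal diagnostic sketch, written for Python 3 (the environments in this thread may be Python 2.7, where `importlib.util.find_spec` is not available); `locate_module` is a hypothetical helper name.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return where a module would be imported from, or None if it cannot be found.

    For dotted names, find_spec imports the parent packages first; a missing
    parent raises ModuleNotFoundError, which is treated here as "not found".
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for entry in sys.path:
        print("search path:", entry)
    print("fastlmm.pyplink.snpreader.Bed ->", locate_module("fastlmm.pyplink.snpreader.Bed"))
```

If this prints None, one possibility (not confirmed for this case) is that the directory containing the `fastlmm` package, here presumably C:\FastLmm.Py, is not on `sys.path`: running a script from inside fastlmm\feature_selection puts only that script's directory at the front of the path, so the package must be installed or its parent directory added, for example through PYTHONPATH.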

Created Unassigned: missing output files with feature_selection_cv.py for UNIX [21262]

August 28, 2014, 12:07 am

Hello,
I ran feature_selection_cv.py on UNIX and got only the _mse.txt file. The other files, especially the most important ones (_report.txt and _snps.cnv), are missing. The log reports many warnings and, at the end, a RuntimeError ("Invalid DISPLAY variable"):

/usr/local/anaconda/lib/python2.7/site-packages/fastlmm/pyplink/plink.py:357: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  byte_a = ba_raw.reshape((SP.ceil(0.25*N),Sblock),order='F')
/usr/local/anaconda/lib/python2.7/site-packages/fastlmm/pyplink/plink.py:357: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  byte_a = ba_raw.reshape((SP.ceil(0.25*N),Sblock),order='F')
Traceback (most recent call last):
  File "/usr/local/anaconda/lib/python2.7/site-packages//fastlmm/feature_selection/feature_selection_cv.py", line 934, in <module>
    result = main()
  File "/usr/local/anaconda/lib/python2.7/site-packages//fastlmm/feature_selection/feature_selection_cv.py", line 926, in main
    best_k, best_delta, best_obj, best_snps = fss.perform_selection(k_values, delta_values, args.strategy, output_prefix=args.output_prefix, select_by_ll=args.select_by_ll, runner=runner)
  File "/usr/local/anaconda/lib/python2.7/site-packages//fastlmm/feature_selection/feature_selection_cv.py", line 440, in perform_selection
    result = runner.run(perform_selection_distributable)
  File "/usr/local/anaconda/lib/python2.7/site-packages/fastlmm/util/runner/Local.py", line 25, in run
    result = run_all_in_memory(distributable)
  File "/usr/local/anaconda/lib/python2.7/site-packages/fastlmm/util/runner/__init__.py", line 40, in run_all_in_memory
    return work.reduce(result_sequence)
  File "/usr/local/anaconda-2.0.1/lib/python2.7/site-packages/fastlmm/feature_selection/PerformSelectionDistributable.py", line 118, in reduce
    best_k_mse, best_delta_mse, best_mse, best_delta_mse_interp, best_mse_interp = self.feature_selection_strategy.reduce_result(mse_cv, self.k_values, self.delta_values, self.strategy, self.output_prefix, best_delta_for_k, label="mse")
  File "/usr/local/anaconda/lib/python2.7/site-packages//fastlmm/feature_selection/feature_selection_cv.py", line 520, in reduce_result
    pylab.figure()
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 423, in figure
    **kwargs)
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 31, in new_figure_manager
    return new_figure_manager_given_figure(num, thisFig)
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 38, in new_figure_manager_given_figure
    canvas = FigureCanvasQTAgg(figure)
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 70, in __init__
    FigureCanvasQT.__init__( self, figure )
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 207, in __init__
    _create_qApp()
  File "/usr/local/anaconda/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 62, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable
--------------------------------------------------------------------------
Could anyone tell me what's going wrong? Any clues that could help fix the problem will be greatly appreciated.

PS: I installed the same program on my PC and it works perfectly, so the problem likely comes from a bad interaction between this program and the UNIX platform I use.

Thanks in advance for the help.

P. Dubreuil
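
The traceback ends in matplotlib's Qt4 backend, which needs an X display; on a headless UNIX machine the DISPLAY variable is typically unset, whereas a desktop PC has a display, which would fit the observation that the program works there. A common workaround, offered as a suggestion rather than a documented fastlmm setting, is to select matplotlib's non-interactive Agg backend before pyplot or pylab is first imported. A minimal sketch with hypothetical helper names:

```python
import os
from typing import Mapping, Optional


def pick_matplotlib_backend(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "Agg" when no X display is advertised, else None (keep the default backend)."""
    return None if environ.get("DISPLAY") else "Agg"


def configure_plotting() -> None:
    """Switch to the Agg backend when needed; call before importing pyplot or pylab."""
    backend = pick_matplotlib_backend()
    if backend is not None:
        import matplotlib  # imported lazily so pick_matplotlib_backend works without it
        matplotlib.use(backend)
```

On Windows and macOS DISPLAY is usually unset as well, so this would choose Agg there too; that is harmless for a batch script that only writes plots to files. Instead of editing the script, setting `backend : Agg` in a matplotlibrc file on the UNIX machine should have the same effect.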

New Post: Using indicator matrix as covariates

September 16, 2014, 8:05 am

Hi,

I wonder whether using an indicator matrix for a covariate instead of absolute values could make any difference to the p-value distribution?

I have two highly correlated covariates: batch (introduced as an indicator matrix) and age (absolute values). They give very different p-value distributions when I run with each of these covariates separately. Could the indicator matrix format be the reason for that?

Thanks in advance!
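
To see what the two encodings put into the model: a numeric covariate such as age contributes a single column, while a categorical covariate such as batch becomes several 0/1 indicator columns, one per batch minus a reference level if the model already includes an intercept (whether and how fastlmmc adds an intercept should be checked in its documentation). A minimal sketch of that indicator (one-hot) encoding, with hypothetical batch labels:

```python
from typing import List, Sequence, Tuple


def indicator_matrix(labels: Sequence[str],
                     drop_first: bool = True) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode a categorical covariate.

    Returns (level names, rows of 0/1 indicators). With drop_first=True the first
    sorted level is the reference and gets no column, which avoids a column set
    that is perfectly collinear with an intercept.
    """
    levels = sorted(set(labels))
    kept = levels[1:] if drop_first else levels
    rows = [[1 if lab == level else 0 for level in kept] for lab in labels]
    return kept, rows
```

With k batches the indicator coding spends k - 1 degrees of freedom and can absorb arbitrary batch-to-batch differences, whereas a single numeric age column only fits a linear trend. Different p-value distributions are therefore not necessarily surprising, even when the two covariates are correlated.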

Updated Wiki: Home

October 15, 2014, 3:57 pm

More current research and software can be found at the eScience FaST-LMM site.

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a program for performing genome-wide association studies (GWAS) on large data sets. It runs on both Windows and Linux systems, and has been tested on data sets with over 120,000 individuals [1,2].

J. Listgarten*, C. Lippert*, C.M. Kadie, R.I. Davidson, E. Eskin, and D. Heckerman*. Improved linear mixed models for genome-wide association studies. Nature Methods, 9: 525-526, June 2012 (doi:10.1038/nmeth.2037). (*equal contributions)
C. Lippert*, J. Listgarten*, Y. Liu, C.M. Kadie, R.I. Davidson, and D. Heckerman*. FaST linear mixed models for genome-wide association studies. Nature Methods, 8: 833-835, Oct 2011 (doi:10.1038/nmeth.1681). (*equal contributions)

Newer papers with improved algorithms can be found on the MSR eScience site for FaST-LMM.

Releases

Please go to the MSR eScience site for FaST-LMM for newer releases.

2012/05 - This code is deprecated. Future updates are now hosted on the MSR eScience site for FaST-LMM.

2012/04/06 - v1.09 - Several usability enhancements (MaxThreads), update to latest MKL libs (v10.3.7), and misc fixes
2012/03/17 - v1.08 - Improve perf of SNC detection, add -extractSimTopK option, doc updates, -verboseOut fixes
2012/03/06 - v1.07 - Fix bug 17541: -sim option not processing the alternate phenotype file properly
2012/02/27 - v1.06 - Detection and filtering of constant genotypes in a SNP + bug fixes for binary and transposed files
2012/02/02 - v1.05 - Bug fixes
2012/01/05 - v1.04 - Add support for using dosage information + bug fixes
2011/11/21 - v1.03 - Add several perf enhancements relating to -extract and -numjobs for improved throughput and reduced memory usage
2011/09/27 - v1.02 - Add several perf enhancements relating to C++ for improved throughput and reduced memory usage
2011/09/14 - v1.01 - Add -extract support to C# and small clean-up changes
2011/09/04 - v1.00 - Initial Release

New Post: Fatal Error : Expected phenotype indicator in file [allF2phe_plink.txt] near line 1:8 to be 0, 1, -9, or a floating point number.

November 13, 2014, 7:42 am

I am trying to use FaST-LMM for the very first time, but I'm getting the following error, and I'm wondering whether the problem comes from incorrect coding or something else:

FastLmmC v2.07.20131102 - Factored Spectrally Transformed Linear Mixed Models [Release]
Copyright Microsoft Corporation -- Licensed Only for Non-Commercial use.
Compiled Nov 2 2013 at 20:09:57 by erg00lx for Linux v3.x kernel
using MKL v11.00.04 - Build: 20130517

++ Start Processing CommandLine:
-- End Processing CommandLine:

++ Start Loading FastLmm Data:
++ Start Loading Covariance Data:
Fatal Error : Expected phenotype indicator in file [allF2phe_plink.txt] near line 1:8 to be 0, 1, -9, or a floating point number. Found [foodconversion]

I'm very confused about the error. It works fine when I run this command in plink: plink --file cleanF2deChr0 --noweb --pheno allF2phe_plink.txt --mpheno 4 --epistasis --out test

Why does fastlmmc report an error with the phenotype file? Can anyone see any obvious errors with this command and/or files: fastlmmc -verboseOutput -file cleanF2deChr0 -fileSim cleanF2deChr0 -pheno allF2phe_plink.txt -out $savedir/test.dummy.txt

New Post: Fatal Error : Expected phenotype indicator in file [allF2phe_plink.txt] near line 1:8 to be 0, 1, -9, or a floating point number.

November 13, 2014, 8:35 am

The error says fastlmmc was looking in the file allF2phe_plink.txt for a numeric value and found "foodconversion" at line 1, column 8, so it sounds like the file is malformed.

Do you have a header row in the file allF2phe_plink.txt? If so, are the first two labels FID and IID?

If you don't have a header, then it sounds like you have the text string "foodconversion" where the format expects a numeric value.
The file reader validates the file as it is read, so even though you specify -mpheno 4, it still expects all the columns to be well formed.

What version of plink did you run?
The alternate phenotype file is described in more detail at http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#pheno

If that is not the problem, could you send allF2phe_plink.txt (or at least the first lines) so we can look it over?

thanks
-bobd-
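
The check described in the reply (every phenotype field must parse as 0, 1, -9 or a floating point number, with an optional header row whose first two labels are FID and IID) can be sketched as a small pre-flight validator for finding offending cells before running fastlmmc. This is a hypothetical helper, not the program's own parser; it reports 1-based line and field numbers (fields 1 and 2 being FID and IID), which may not match how fastlmmc counts the "1:8" position in its message.

```python
from typing import List, Tuple


def is_number(token: str) -> bool:
    """True if the token parses as a floating point number (this covers 0, 1 and -9)."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def find_bad_phenotype_cells(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (line number, field number, token) for non-numeric phenotype values.

    Line and field numbers are 1-based; fields 1 and 2 are FID and IID. A first line
    whose first two labels are FID and IID is treated as a header and skipped.
    """
    bad: List[Tuple[int, int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if line_no == 1 and fields[:2] == ["FID", "IID"]:
            continue  # header row in the expected form
        for field_no, token in enumerate(fields[2:], start=3):
            if not is_number(token):
                bad.append((line_no, field_no, token))
    return bad
```

A header row that does not start with FID and IID shows up as bad cells on line 1, which is the situation the question about the header is probing.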
