I have been using B values in LimmaGUI to rank genes from most likely
to least likely to be differentially expressed.
Now that I am using limma directly, I noticed that the default value of
the "proportion" parameter of the eBayes() function is 0.01 (i.e. 1% of
genes are expected to be differentially expressed). I didn't pay much
attention to this parameter before, because LimmaGUI does not let you
specify it.
However, now that I use "straight" limma more, I have been playing with
the proportion parameter, and it affects the B statistics a lot. This
brings me to the question of how best to estimate this parameter.
My first guess is to use the adjusted P values (FDR, calculated by the
Benjamini-Hochberg method) to decide on a cutoff, usually 0.05, then
count how many genes are differentially expressed according to that
rule, and use this observed proportion of differentially expressed
genes as my proportion parameter.
Is this the correct way to do it?
Jose
--
Dr. Jose I. de las Heras Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology Fax: +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
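The heuristic proposed above can be sketched in a few lines of base R. This is only an illustration of that idea under stated assumptions, not a recommended method: the helper name estimate_proportion_bh is hypothetical (it is not part of limma), and the P values are made up.

```r
# Hypothetical helper (not part of limma): the fraction of genes whose
# Benjamini-Hochberg adjusted P value falls below `cutoff`.
estimate_proportion_bh <- function(pvalues, cutoff = 0.05) {
  adjusted <- p.adjust(pvalues, method = "BH")
  mean(adjusted < cutoff)
}

# Made-up example: 950 null genes with uniform P values plus
# 50 genes with very small P values.
set.seed(1)
p <- c(runif(950), rep(1e-6, 50))
prop <- estimate_proportion_bh(p)
prop

# The estimate would then be passed on, e.g.
#   fit2 <- eBayes(fit, proportion = prop)
# where `fit` is an lmFit() result. A value of exactly 0 or 1 would need
# guarding, because the prior log-odds log(prop / (1 - prop)) is then
# infinite.
```

Note that Gordon Smyth's reply quoted further down argues that this proportion is hard to estimate reliably from small experiments, so the sketch shows the mechanics of the proposal, not an endorsement of it.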

Jose,
I'm very glad you asked this question. One of the things that has made
me wary of using limma is that the proportion of differentially
expressed genes is often one of the primary things I'm trying to
discover from the data, so I feel uneasy making an assumption as to
what that proportion is. In your email below, you say that the output
of limma is sensitive to the assumption, which, of course, makes me
feel even more uneasy about it.
I've not noticed any responses on the BioC list. Has anyone commented
on this issue to you?
-Ben
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of J.delasHeras at ed.ac.uk
> Sent: Wednesday, April 19, 2006 8:06 AM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] Limma: correct calculation of B statistics (log odds)
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

Hi Ben,
The only thing that changes is the B value; everything else (I think!)
stays unaffected. If you don't want to use the B value, I think you can
ignore that parameter (proportion), because I haven't noticed any
differences in the P values obtained, either adjusted or unadjusted
for multiple testing.
Jose
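Why only B moves can be seen in a small base-R sketch. It assumes a B-type log-odds of the form log(p/(1-p)) - log(r)/2 + (d+1)/2 * log((t^2 + d)/(t^2/r + d)), with the variance ratio r and degrees of freedom d treated as fixed, illustrative values shared by all genes (in limma they come from the fitted model). The helper name b_stat and the t-statistics are hypothetical. Under these assumptions the proportion enters only through the prior log-odds term, so every gene's B shifts by the same constant.

```r
# Hypothetical helper: B-type log posterior odds for moderated
# t-statistics, ASSUMING every gene shares the same variance ratio `r`
# (> 1) and total degrees of freedom `d`. Illustrative values only.
b_stat <- function(tstat, proportion, r = 10, d = 10) {
  prior_log_odds <- log(proportion / (1 - proportion))
  kernel <- (d + 1) / 2 * log((tstat^2 + d) / (tstat^2 / r + d))
  prior_log_odds - log(r) / 2 + kernel
}

tstat <- c(-4.2, 0.3, 2.5, 6.1, -1.0)   # made-up moderated t-statistics
b1 <- b_stat(tstat, proportion = 0.01)
b2 <- b_stat(tstat, proportion = 0.10)

b2 - b1                                # the same shift for every gene
log(0.10 / 0.90) - log(0.01 / 0.99)    # the change in prior log-odds
order(b1)                              # ranking under proportion = 0.01
order(b2)                              # same ranking under 0.10
```

Because r > 1 makes the kernel increase with t^2, the ranking here also matches the ranking by |t|.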

Dear Gordon,
I apologize for not thanking you more quickly for your detailed and
thoughtful response. I think I agree with everything you've said below,
but now I have another concern on which I would like your opinion.
For many of the data sets I've dealt with, the variances of the two
classes do not seem to be equal for many genes. For example, the code
below uses R's var.test() to produce a p-value for each gene and then
plots a histogram of the p-values. The histogram can be viewed at
http://tinyurl.com/epdn7
The model implemented in limma seems to assume a single variance for
each gene. Do you think this is a problem?
Thanks again,
-Ben
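library(Biobase)   # exprs(), pData()
library(ALL)       # acute lymphoblastic leukemia data set
data(ALL)
pdat <- pData(ALL)
# B-cell samples whose molecular subtype is either BCR/ABL or NEG
keep <- intersect(grep('^B', as.character(pdat$BT)),
                  which(pdat$mol.biol %in% c('BCR/ABL', 'NEG')))
eset <- ALL[, keep]
i1 <- which(eset$mol.biol == 'BCR/ABL')
i2 <- which(eset$mol.biol == 'NEG')
# F test of equal variances between the two classes, one test per gene
pvals <- apply(exprs(eset), 1,
               function(v) var.test(v[i1], v[i2])$p.value)
# With equal variances (and normal data) these should be roughly uniform
jpeg(filename='ALL.jpeg', width=240, height=240)
hist(pvals, col='green',
     main='Histogram of var.test() pvals for ALL BCR/ABL vs NEG')
dev.off()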
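When reading a histogram like the one linked above, it may help to know what var.test() p-values look like when the two classes really do share one variance. A minimal base-R sketch on simulated data follows; the number of genes, the group sizes (10 and 12) and the doubled standard deviation are arbitrary assumptions, not properties of the ALL data.

```r
# Simulated (made-up) data: 2000 genes, two groups of 10 and 12 arrays.
set.seed(7)
n_genes <- 2000
g1 <- 1:10
g2 <- 11:22

# Equal variances in both groups: the null hypothesis of var.test() holds
x_equal <- matrix(rnorm(n_genes * 22), nrow = n_genes)
p_equal <- apply(x_equal, 1, function(v) var.test(v[g1], v[g2])$p.value)

# Second group has twice the standard deviation of the first
x_unequal <- cbind(matrix(rnorm(n_genes * 10), nrow = n_genes),
                   matrix(rnorm(n_genes * 12, sd = 2), nrow = n_genes))
p_unequal <- apply(x_unequal, 1, function(v) var.test(v[g1], v[g2])$p.value)

mean(p_equal < 0.05)    # close to 0.05 when variances are equal
mean(p_unequal < 0.05)  # much larger when they differ
```

Under equal variances the p-values are roughly uniform, so a pile-up near zero points to unequal variances. One caveat: the F test behind var.test() assumes normality and is sensitive to departures from it, so heavy-tailed expression values can also produce small p-values.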
> -----Original Message-----
> From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Thursday, April 20, 2006 8:02 PM
> To: Wittner, Ben, Ph.D.
> Cc: bioconductor at stat.math.ethz.ch; J.delasHeras at ed.ac.uk
> Subject: [BioC] Limma: correct calculation of B statistics (log odds)
>
> Dear Ben,
>
> Please see also my longer reply to Jose in a separate email.
>
> The t-statistics, p-values and gene rankings provided by limma do not
> depend on the assumed proportion. In fact part of the motivation for
> developing the moderated t-statistics was to obtain a statistic with
> the same power as the posterior odds without needing this
> difficult-to-estimate quantity.
>
> While the B-statistic does depend on the prior assumed proportion,
> this dependence is very straightforward, well understood and
> explicit. The prior log-odds simply adds a constant to all the
> genewise B-statistics. It doesn't change the ordering.
>
> I agree with your desire to avoid dependence on unjustified
> assumptions. My approach in limma has been to minimise assumptions
> where possible but otherwise to make the assumptions very explicit.
>
> What I personally feel uneasy about are statistical methods which
> propose to estimate quantities about which the data contains very
> little information. The dependence on assumptions may be hard to see.
> It seems to me that the proportion of DE genes is just such a
> quantity, because its estimation must be highly sensitive to model
> assumptions in small microarray experiments. I could easily provide
> an automatic estimate of this quantity as part of the eBayes()
> computations in limma, but I deliberately chose not to do this.
>
> Expanding a little further on this topic, it seems to me that a
> biologically meaningful treatment of the proportion of truly DE genes
> would require a more careful definition of the concept of
> differential expression than has so far appeared in the literature.
> It seems to me that mathematicians and biologists have different
> things in mind when they think of this quantity. Mathematicians
> include many genes with very small fold changes which biologists
> would not consider of interest. A biologically meaningful treatment
> would have to specify how large a fold change needs to be in order to
> be considered material. I suspect that biologists are going to be
> surprised by how sensitive the estimated proportion is to this
> threshold.
>
> Best wishes
> Gordon
>
> >[BioC] Limma: correct calculation of B statistics (log odds)
> >Wittner, Ben, Ph.D. Wittner.Ben at mgh.harvard.edu
> >Thu Apr 20 19:40:10 CEST 2006
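Gordon's closing point, that the proportion of "DE" genes depends on how large a fold change counts as material, can be illustrated with a small base-R sketch. The true log2 fold changes below are simulated (a made-up mixture of many tiny effects and a few large ones) and the thresholds are arbitrary; the sketch uses true rather than estimated fold changes, so it shows only that the proportion is not a single number until a threshold is fixed.

```r
# Simulated (made-up) true log2 fold changes for 10,000 genes:
# many tiny but nonzero effects plus a minority of larger ones.
set.seed(42)
lfc <- c(rnorm(9500, sd = 0.1), rnorm(500, sd = 1.5))

# Arbitrary log2 fold-change thresholds for calling a gene "DE"
thresholds <- c(0, 0.1, 0.25, 0.5, 1, 2)
prop_de <- sapply(thresholds, function(k) mean(abs(lfc) > k))
names(prop_de) <- thresholds
round(prop_de, 3)
```

With a threshold of 0 every gene counts as DE, which matches the "mathematician's" definition; raising the threshold shrinks the proportion sharply.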