Applied Data Analysis

Below are the 10 most recent journal entries recorded in Data analysis and applied statistics' LiveJournal:

Friday, April 27th, 2007
12:25 pm
R Help
I am currently working on this problem, but I'm not sure how to approach the second half of the question:

    In the data set (an artificial one of 3121 patients, that is similar to a subset of the data analyzed in Stiell et al., 2001) head.injury, obtain a logistic regression model relating clinically.important.brain.injury to other variables. Patients whose risk is sufficiently high will be sent for CT (computed tomography). Using a risk threshold of 0.025 (2.5%), turn the result into a decision rule for use of CT.

So far I have these commands: [collapsed in the original post]

If anyone could help with the "risk threshold" part, or offer advice on how to proceed, that'd be great. Thanks!
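One way to read the "risk threshold" part: a fitted logistic model gives each patient a predicted probability, and the rule is "send for CT when that probability exceeds 0.025". Because the logistic function is monotone, this is the same as comparing the linear predictor with logit(0.025). Below is a minimal sketch in Python. The intercept, coefficients, and predictor names are hypothetical placeholders, not values fitted to head.injury; the real ones would come from your own fitted model.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# HYPOTHETICAL values for illustration only; not fitted to head.injury.
INTERCEPT = -4.5
COEFS = {"factor_a": 1.4, "factor_b": 1.2, "factor_c": 1.0}

THRESHOLD = 0.025
CUTOFF = logit(THRESHOLD)  # the 2.5% risk threshold on the log-odds scale

def linear_predictor(patient):
    """Intercept plus the coefficients of the risk factors that are present (coded 1)."""
    return INTERCEPT + sum(COEFS[name] * patient.get(name, 0) for name in COEFS)

def predicted_risk(patient):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(patient)))

def send_for_ct(patient):
    """Decision rule: CT when the linear predictor exceeds logit(threshold)."""
    return linear_predictor(patient) > CUTOFF

no_factors = {}
one_factor = {"factor_a": 1}
```

With 0/1 predictors, the rule reduces to a checklist: add up the coefficients of the factors a patient has and send for CT when intercept plus that sum is above the cutoff.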
Saturday, October 14th, 2006
1:09 am
I built a web interface (and the back-end regression program) at http://www.braintechllc.com (a start-up company, not yet really doing much business, I think). Er, the direct link is http://www.braintechllc.com/mydatatools.aspx?Regression=1. Does anyone know of a similar tool on the internet that lets you run multivariate regression and view/play with the results online? I would love to get in touch with others who have participated in similar projects, and maybe get a few hints on how to speed up the process. I'm also looking for feedback on what could be added to the tool.

I am a bit worried about crunching numbers all day on the web host's CPU, so any contacts with others who have done similar projects (and know any mathematical shortcuts to speed up the process) would be much appreciated! Peace,

Friday, March 17th, 2006
11:26 am
Whatever happened to Blind Signal Separation?
Assume you have a number of sources of sound, for instance radios, a fountain and chatter in a park. You have placed a number of microphones around the park, and each of these microphones gets a mix of all these sounds. The relative mixing of the sounds is different at each microphone, since the microphones are placed at different distances from the sources.
The challenge of singling out the sound from each source is called signal separation and is an important problem in nonlinear dynamics (or so I'm told). You may also call this noise removal if one or more of the signal sources are noise to you.
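To make the setup concrete, here is a small sketch of the mixing model in Python, assuming the simplest case: two sources, two microphones, and instantaneous linear mixing. The signals and the mixing matrix are made up for illustration. With the mixing matrix known, separation is just a matrix inverse; the "blind" problem is to estimate the unmixing from the microphone signals alone, which this sketch does not attempt.

```python
import math

# HYPOTHETICAL source signals: a slow sine wave and an alternating +1/-1 signal.
N = 8
s1 = [math.sin(0.5 * t) for t in range(N)]
s2 = [(-1.0) ** t for t in range(N)]

# HYPOTHETICAL mixing matrix: row i gives how strongly microphone i hears each source.
A = [[1.0, 0.5],
     [0.3, 1.0]]

def mix(A, s1, s2):
    """Each microphone records a weighted sum of the two sources."""
    x1 = [A[0][0] * a + A[0][1] * b for a, b in zip(s1, s2)]
    x2 = [A[1][0] * a + A[1][1] * b for a, b in zip(s1, s2)]
    return x1, x2

def unmix_with_known_matrix(A, x1, x2):
    """Non-blind separation: apply the inverse of the known 2x2 mixing matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det],
           [-A[1][0] / det, A[0][0] / det]]
    r1 = [inv[0][0] * a + inv[0][1] * b for a, b in zip(x1, x2)]
    r2 = [inv[1][0] * a + inv[1][1] * b for a, b in zip(x1, x2)]
    return r1, r2

x1, x2 = mix(A, s1, s2)
r1, r2 = unmix_with_known_matrix(A, x1, x2)

# Largest difference between the original and the recovered sources.
max_error = max(abs(a - b) for a, b in zip(s1 + s2, r1 + r2))
```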

Blind Signal Separation (BSS) is one method to do this. The most heavily cited paper on it is A New Learning Algorithm for Blind Signal Separation, and a good introduction (at least the first few pages are) is Blind Signal Separation: Statistical Principles.

I'm new to the field and trying to get a feeling for where the "forefront in technology" is, especially regarding signals and simulation of nonlinear systems. BSS seemed interesting, since it needs little knowledge about the source of the signal. However, papers about this method have been reduced to a trickle over the last few years. Why is this? The 2001 paper A Proof of the Non-Existence of Universal Nonlinearities for Blind Signal Separation could have something to do with it.

Is BSS a dead topic? If so, what are currently good methods in signal separation and prediction of nonlinear systems?

With apologies for crossposting
Saturday, March 11th, 2006
6:02 pm
Statistics Questions
I'm trying to complete a take-home test. I have answered everything, but I was the least confident about the following seven questions. I have included the explanation for why I chose each answer, but like I said, I'm not too confident in my answers. Any help would be greatly appreciated. Although my goal is to answer everything correctly, I would like it if someone could provide explanations for each of the questions below, so that I understand them.

1) In testing the hypothesis H0: µ1 = µ2 for independent samples using the one-way ANOVA, when compared to the t-test, the actual level of significance is:
***B) higher than α (I chose this because I thought in ANOVA it is α)
C)not able to be determined
D)lower than α

2) If the underlying assumptions of the ANOVA are violated, the actual Type I error rate will:
A) equal α
***B) be greater than α (I thought Type I errors were = 1-α)
C) be smaller than α
D) be greater or less than α

3) In determining the sample size, which of the following is not used?
A) The α level
B) Power
C) A standardized effect size
****D) The mean of the population (Usually we are deriving the mean after we have a sample size.)

4) One of the assumptions of a two-way ANOVA is not:
A) there are equal numbers of observations in each cell of the factorial.
****B) the population variances in all cells of the factorial design are not equal. (I figured there was supposed to be homogeneity of variance in all these types of tests.)
C) the samples are random, independent, and from defined populations.
D) the scores on the dependent variable are normally distributed in the population.

5) It is not true that the F ratio for a one-way ANOVA:
A) can be less than one.
B) involves at least two kinds of degrees of freedom.
****C) measures SS(Tr)/SSE. (I thought it measured a ratio of (MS something)/(MS error).)
D) must be at least zero.

6) The omnibus test is:
****A) testing the null hypothesis with the ANOVA with a constant α level. (I'm unsure why I chose this, it just felt right.)
B) an alternative to the ANOVA.
C) used to determine which pairs of means are not equal.
D) None of these.

7) When a statistically significant _________ effect has only two levels, the nature of the relationship is determined in the same fashion for the independent groups t-test.
A) interaction
B) simple
****C) main (I chose this because I don't remember learning about a "simple" effect, and I wouldn't think that an interaction effect would ever be run with a t-test.)
D) a and b
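Since several of these questions turn on how the one-way ANOVA relates to the independent-groups t-test, a small numeric check may help. The sketch below (Python, with made-up scores for two groups) computes the F ratio as mean square between over mean square within, and the pooled-variance t statistic. With exactly two groups, F equals t squared, so the two tests lead to the same decision at the same α.

```python
def mean(xs):
    return sum(xs) / len(xs)

def one_way_f(groups):
    """F ratio for a one-way ANOVA: MS(between groups) / MS(within groups)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1          # first kind of degrees of freedom
    df_within = n_total - len(groups)     # second kind of degrees of freedom
    return (ss_between / df_between) / (ss_within / df_within)

def pooled_t(a, b):
    """Independent-groups t statistic with pooled variance."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    std_err = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / std_err

# HYPOTHETICAL scores, for illustration only.
group_a = [4.0, 5.0, 6.0, 7.0]
group_b = [6.0, 8.0, 9.0, 9.0]

f_ratio = one_way_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
```

Note that F is a ratio of mean squares (sums of squares divided by their degrees of freedom), so it can fall below one but never below zero.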

Current Mood: busy
Wednesday, January 25th, 2006
6:16 pm
Statistics Questions
I'm trying to answer four questions, and I think I might have answered them correctly, but I'm not entirely sure. I'm looking for any help (wisdom). If you see that I'm wrong, let me know, and feel free to give me an answer (as well as the reason). I appreciate the help:

1) A null hypothesis will be rejected if:
(a) alpha=.05 and p-value=.15
(b) the obtained value of z is closer to the null hypothesis mean than the critical value.
**(c) alpha=.05 and p-value=.03
(d) alpha=.01 and p-value=.03

2) Which of the following is a possible null hypothesis for a two-tailed hypothesis test?
**(a) population mean = 30
(b) population mean (NOT =) 30
(c) sample mean = 30
(d) population mean >= 30

3) If alpha = .05 and beta = .10 then
(a) p(reject Ho | Ho true) = .95
(b) p(reject Ho | Ho false) = .95
**(c) p(retain Ho | Ho true) = .95
(d) p(retain Ho | Ho false) = .95

4) When the alpha level is .05, a result is defined as being "unexpected" if the probability of obtaining the result, assuming the null hypothesis is true, is:
(a) less than .95
(b) greater than .95
(c) less than .05
**(d) greater than .05
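The definitions behind these four questions can be written down in a few lines. This is a sketch in Python using the usual textbook conventions, which I am assuming here: alpha is the probability of rejecting H0 when it is true, beta is the probability of retaining H0 when it is false, and H0 is rejected when the p-value is at or below alpha.

```python
def reject_null(p_value, alpha):
    """Usual decision rule: reject H0 when the p-value is at or below alpha."""
    return p_value <= alpha

def outcome_probabilities(alpha, beta):
    """The four conditional probabilities implied by alpha and beta."""
    return {
        "reject H0 | H0 true": alpha,         # Type I error rate
        "retain H0 | H0 true": 1 - alpha,
        "retain H0 | H0 false": beta,         # Type II error rate
        "reject H0 | H0 false": 1 - beta,     # power
    }

table = outcome_probabilities(alpha=0.05, beta=0.10)
```

Under this rule, a result counts as "unexpected" at alpha = .05 when its probability under the null hypothesis is below .05, which is exactly the case in which H0 is rejected.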

Current Mood: busy
Tuesday, June 21st, 2005
12:02 pm
I joined Outfoxed (user Id recompiler and vladg).
One of my biggest pet peeves is that it IS centralized even though it claims not to be. By default, most people who join trust Stan (very, very, very bad) and Outfoxed (bad, but not TOO terrible). Since the default length of trust is 3 nodes, Stan and Outfoxed are 2 hops away from anyone (even unwanted and hostile parties). The guy was in a hurry to write something cool and gather data for his master's project, and he fucked up. There may have been a lot of growth on the system, but the trust matrix is broken from the very start (compare how PGP key signing works). One thing they can try to do to fix it is, in a month or so, to break the trust relationship between Stan and everyone who is not actually his friend, and hope that people have started trusting other users on the system. One fun thing I found is rating the trust of processes on your computer. Out of boredom I added a bunch of random processes to trusted. Then I decided to artificially try to fix it for myself by creating a second account on another system, giving it a high level of trust in my primary account and my trusted friends, and then setting both accounts to ignore/not trust Stan.
I urge everyone to write to the university Stan is at and voice your concerns. There is no shortcut to good data, no matter how many front-page stories on Slashdot you get.
Thursday, November 18th, 2004
2:43 pm
Don't have the time
I don't have the time to analyze this, but here's an interesting point.

The largest counties were more likely to vote Kerry, but also more likely to have etouch voting. If etouch voting randomly exchanged Bush/Kerry votes for a small percentage of voters (either due to normal malfunctions or due to malice), then the NUMBER of votes randomly exchanged towards Bush would be more than the number randomly exchanged towards Kerry, because the likelihood of an etouch voter intending to vote Kerry was considerably higher.

This doesn't require malice, just normal malfunctions of touch screens when tens of thousands of people go to use them.
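The arithmetic behind this point is short enough to sketch. The Python below assumes a symmetric error rate (every intended vote has the same small chance of being recorded for the other candidate) and uses made-up numbers for the county size, the Kerry share, and the error rate; none of these come from the actual election data.

```python
def expected_flips(n_voters, share_kerry, error_rate):
    """Expected number of votes switched in each direction under a symmetric error rate."""
    kerry_intended = n_voters * share_kerry
    bush_intended = n_voters * (1 - share_kerry)
    to_bush = kerry_intended * error_rate    # Kerry votes recorded as Bush
    to_kerry = bush_intended * error_rate    # Bush votes recorded as Kerry
    net_swing_to_bush = 2 * (to_bush - to_kerry)  # each flip moves the margin by two
    return to_bush, to_kerry, net_swing_to_bush

# HYPOTHETICAL county: 100,000 etouch voters, 60% intending Kerry, 1% error rate.
to_bush, to_kerry, net_swing = expected_flips(100_000, 0.60, 0.01)
```

In general the expected difference in flipped votes is n × error_rate × (2 × share_kerry − 1), which favors Bush whenever the Kerry share is above one half and vanishes when the county is evenly split.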
2:39 pm
Voting data as collected at UC Berkeley

Information on vote tallies and so forth. The raw data is available for your analysis pleasure.
Monday, September 27th, 2004
10:40 am
Phone usage
Cell phone providers have tons of "deals" that they try to get people to sign up for. Of course the cheaper your plan per minute, the more minutes you're likely to use, but probably only up to a point. Also there are various patterns of usage. Some people don't have wired phones at home for example.

Here's what I've got from my most recent bill:

My balance was $34.52 with taxes etc.

I used 179 total minutes, and 93 of them were "nights and weekends" (unlimited)

I consider myself a low usage person, and there is only one phone on my plan.

This means that my phone is costing me on average 19 cents a minute. I have a home phone with a good long distance plan so I prefer that for calls to friends on weekends.
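For anyone who wants to run the same calculation on their own bill, here is the arithmetic in Python, using the figures above. The second number, cost per billable minute, is my own extra step and assumes the unlimited night and weekend minutes should not count toward what the plan is actually charging for.

```python
balance = 34.52          # dollars, with taxes etc.
total_minutes = 179
free_minutes = 93        # "nights and weekends" (unlimited)

# Average cost over every minute used: about 19 cents.
cost_per_minute = balance / total_minutes

# Cost over only the minutes that count against the plan (an assumption, see above).
billable_minutes = total_minutes - free_minutes
cost_per_billable_minute = balance / billable_minutes

print(f"{cost_per_minute:.3f} dollars per minute overall")
print(f"{cost_per_billable_minute:.3f} dollars per billable minute")
```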

How many of you optimize your phone plan?

How would you design a survey to model cell phone usage in the general public? What variables would need to be collected? How small a sample could you get away with?
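On the sample size question, one rough starting point is the standard formula for estimating a proportion from a simple random sample, n = z² p(1 − p) / E². The sketch below is in Python; the 95% confidence level, the ±5 point margin, and the worst-case p = 0.5 are my assumptions, and a model with many predictors would likely need more respondents than this.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin, for a simple random sample."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# ASSUMED targets: 95% confidence (z = 1.96), margin of error of 5 percentage points.
n_needed = sample_size_for_proportion(margin=0.05)
```

Halving the margin of error roughly quadruples the required sample, which is usually the main constraint on how small a survey can be.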

My idea for survey questions:

1) Do you have a cell phone?
2) How many phones are on your plan?
2b) How many of your plan's phones are used by children?
3) How much did it cost you last month?
3a) Is long distance included in your plan or do you pay per minute for long distance?
3b) Do you use phone cards for long distance?
4) How many minutes did you use during regular hours?
5) How many minutes did you use during nights and weekends or other promotional or off peak times?
6) Do you have a home phone?
7) Do you have a long distance plan on your home phone?
8) Do you consider yourself a "very low, low, average, high, or very high" user of cell phones?
9) How old are you?
10) Do you like to talk on the phone?

I think with that data, you could produce a reasonably good statistical model of phone usage that could be used by consumers to choose plans.

My hypothesis is that most people are paying more for their plan than they need to because they buy too many minutes for their actual usage.

What other predictors of phone usage can you think of that should be included in that survey?
Tuesday, September 21st, 2004
10:07 am
First data set
Ok, here's a set of data from my power bill that I fooled around with yesterday. It has some missing data.

Our heat is natural gas (90 KBTU/hr input), and we have no air conditioning. We use a fair number of compact fluorescent lights, and have about 1600 sq ft and 3 adults in the house. Our zip code is 94549, for keying to weather data.

I consider our household fairly low energy consumers, especially with no air conditioner, but I am amazed at how poorly we perform in the winter. An electric space heater could run all day long for those prices. We have several computers that run 24 hrs/day.

Perhaps some others would care to add their own power consumption data and we could analyze the seasonal effects and variance.

Year	Month	ElecKwh	GasTherm	Dollars
2000	12	548	114	219.97
2001	1	523	118	276.07
2001	2	476	82	186.69
2001	3	377	45	101.81
2001	4	310	32	72.60
2001	5	345	28	69.80
2001	6	343	21	57.38
2001	7	342	23	52.18
2001	8	404	25	64.96
2001	9	378	54	78.34
2001	10	444	14	58.33
2001	11	492	76	112.91
2001	12	424	76	105.45
2002	1	610	133	182.44
2002	2	519	77	112.88
2002	3	518	61	98.91
2002	4	438	43	78.51
2002	5	306	28	54.06
2002	6	280	20	42.91
2002	7	323	20	40.85
2002	8	332	16	47.20
2002	9	322	17	40.39
2002	10	373	43	67.11
2003	8	371	16	
2003	7	396	17	
2003	6	423	26	
2003	5	480	49	
2003	1	633	92	
2003	2	549	84	
2003	3	521	63	
2004	8	393	18	60.58
2004	7	397	17	59.71
2004	6	449	24	71.57
2004	5	425	28	68.61
2004	1	655	107	200.53
2004	2	549	92	163.67
2004	3	536	67	116.91
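As a first pass at the seasonal effect, the table can be averaged by calendar month. Here is a minimal sketch in Python that does this for the ElecKwh and GasTherm columns of the table above. The Dollars column is left out because it is missing for 2003, and treating December to February as "winter" and June to August as "summer" is my own choice.

```python
from collections import defaultdict

# Columns: Year Month ElecKwh GasTherm (copied from the table; Dollars omitted).
DATA = """
2000 12 548 114
2001 1 523 118
2001 2 476 82
2001 3 377 45
2001 4 310 32
2001 5 345 28
2001 6 343 21
2001 7 342 23
2001 8 404 25
2001 9 378 54
2001 10 444 14
2001 11 492 76
2001 12 424 76
2002 1 610 133
2002 2 519 77
2002 3 518 61
2002 4 438 43
2002 5 306 28
2002 6 280 20
2002 7 323 20
2002 8 332 16
2002 9 322 17
2002 10 373 43
2003 8 371 16
2003 7 396 17
2003 6 423 26
2003 5 480 49
2003 1 633 92
2003 2 549 84
2003 3 521 63
2004 8 393 18
2004 7 397 17
2004 6 449 24
2004 5 425 28
2004 1 655 107
2004 2 549 92
2004 3 536 67
"""

rows = [tuple(int(v) for v in line.split()) for line in DATA.strip().splitlines()]

def mean(values):
    return sum(values) / len(values)

# Group the readings by calendar month, pooling all years.
kwh_by_month = defaultdict(list)
therm_by_month = defaultdict(list)
for year, month, kwh, therm in rows:
    kwh_by_month[month].append(kwh)
    therm_by_month[month].append(therm)

mean_kwh = {m: mean(v) for m, v in sorted(kwh_by_month.items())}
mean_therm = {m: mean(v) for m, v in sorted(therm_by_month.items())}

# Season definitions are an assumption, not from the post.
WINTER, SUMMER = (12, 1, 2), (6, 7, 8)
winter_therm = mean([t for _, m, _, t in rows if m in WINTER])
summer_therm = mean([t for _, m, _, t in rows if m in SUMMER])

for m in mean_kwh:
    print(f"month {m:2d}: {mean_kwh[m]:7.1f} kWh  {mean_therm[m]:6.1f} therms")
```

Some months have readings from four years and others from only one or two, so the monthly means are not equally reliable.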