I sent Dr. Burnham a very polite request for his study data, protocols, stat programs and log files. He wrote me back within 2 hours but demurred, citing a busy schedule and the desire to protect both his surveyors and the sampled neighborhoods. Both are totally understandable points but really don't extend to everything I asked for -- he could have an RA e-mail me his .do files in 5 minutes. I'll ask him again in six months.
Comment Posted By DrSteve On 15.10.2006 @ 13:32
As I'd noted, it took me a long time to come to the conclusion I did, and I was pretty uncomfortable doing even that, but the more I think about it the more I think that spatial correlation "looks" to the statistical calculations the way any other strong within-cluster association would: Like a lack of variance in the underlying data.
The phenomenology for the homogeneity is different from other patterns we observe in clustering, say, children of the same case-head sharing environmental factors, but I think the result is probably the same -- the large SEEs we already see.
Their method isn't a first-best, but I'm no longer sure it overstates anything -- unless, as I'd also noted, the death certificates weren't unduplicated across households.
So this is me stepping back from my earlier statement to xii.
Comment Posted By DrSteve On 13.10.2006 @ 08:28
You sound a bit closed-minded yourself, I'm afraid.
Comment Posted By DrSteve On 12.10.2006 @ 15:27
What I'm interested in knowing is whether you, as a professional in this field, think that this study's results are likely too high.
And, how likely, and how much too high.
Maybe very difficult to say. In statistics, we have formulae to use as tools, but these formulae are generally based on assumptions. Once those assumptions are violated, we depart from the nice formulae and sometimes the impact is hard to describe without re-deriving all the metrics from scratch. I know that's unsatisfactory, but there it is.
In other words, is the study worthless? And, since it's likely that this kind of study has been done before on other subjects, are they also worthless?
I would look askance at any study that geographically concentrated data collection where spatial correlation was an issue, particularly if it relied on a method where heterogeneity and independence of those data were important. But I'm not going to be baited into a blanket statement about cluster sampling. Legitimate methods can be applied to data that violate the assumptions required to use them.
Anyone who refutes this study should go and do their own. The best way to refute a scientific study is with your own.
Maybe the best way, but certainly not the only way. I can evaluate a study, in many cases, by reviewing the data, the methods and assumptions used, even the batch files written to process the data. All of which I'm asking Burnham for. I've audited federal studies where the researchers used the wrong "weights" option in their software. I didn't need to re-perform the study from scratch to know their report was wrong.
And let's not let this devolve into a chickensurveyor argument, OK?
Comment Posted By DrSteve On 12.10.2006 @ 15:17
Yes the clusters were allegedly randomly selected. My comment goes to nonrandom data collection within the cluster (beyond the selection of the first home), and the general requirement for heterogeneity within clusters if one's going to get all one needs out of having selected that methodology.
I understand there's a stated reason for the protocol, but the sampling math doesn't care whether it's a good reason or not.
I guess I'm just going to have to do an illustrative simulation at some point, but my intuition is that spatial correlation of war deaths results in overstatement of counts under this methodology. The rationale is that types of deaths are not spatially independent -- finding one increases the probability of finding another next door. I'm less sure about that than I am that the confidence intervals understate the actual standard error of the estimate here, but in any event I don't have a whole lot of confidence in their number.
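A toy version of such a simulation might look like the sketch below. Everything in it is an assumption made for illustration, not anything taken from the study: a single "street" of 2,000 households, risk that is constant within 40-house blocks, hypothetical death probabilities of 0.30 in "hot" blocks and 0.02 elsewhere, and helper names (`make_street`, `contiguous_rate`, `scattered_rate`) invented here. It compares a random start followed by contiguous neighbors against households scattered at random along the same street.

```python
import random
import statistics


def make_street(rng, n_blocks=50, block_size=40,
                p_hot=0.30, p_quiet=0.02, share_hot=0.2):
    """Return a list of 0/1 flags (1 = household reports a death).

    Risk is constant within a block and differs between blocks, so
    neighbouring households tend to share outcomes. All probabilities
    are hypothetical placeholders.
    """
    street = []
    for _ in range(n_blocks):
        p = p_hot if rng.random() < share_hot else p_quiet
        street.extend(1 if rng.random() < p else 0 for _ in range(block_size))
    return street


def contiguous_rate(street, k, rng):
    """Random start house, then the next k-1 neighbours (wrapping around)."""
    n = len(street)
    start = rng.randrange(n)
    return sum(street[(start + i) % n] for i in range(k)) / k


def scattered_rate(street, k, rng):
    """k households drawn at random, without replacement, from the street."""
    return sum(rng.sample(street, k)) / k


def simulate(n_draws=2000, k=40, seed=1):
    """Repeat both sampling schemes and summarise the estimated rates."""
    rng = random.Random(seed)
    street = make_street(rng)
    contig = [contiguous_rate(street, k, rng) for _ in range(n_draws)]
    scatter = [scattered_rate(street, k, rng) for _ in range(n_draws)]
    return {
        "true_rate": sum(street) / len(street),
        "contig_mean": statistics.mean(contig),
        "contig_sd": statistics.stdev(contig),
        "scatter_mean": statistics.mean(scatter),
        "scatter_sd": statistics.stdev(scatter),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Because every household in this toy setup has the same chance of being included, neither scheme is biased on average here; what changes is the spread of the estimates, which is wider for the contiguous scheme. A sketch this simple cannot settle the direction of any bias in a real survey, where start points, household sizes and reporting all vary.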
I shouldn't have to jump up and down and shout about "grossly misleading findings" or professional misconduct, should I? This subject is a big deal. I'm trying to maintain some gravity here.
Comment Posted By DrSteve On 12.10.2006 @ 12:28
"I'm waiting for one statistical expert to tell me that the methodology of this study is bad. Somebody step up, or shut the hell up."
Please see my numerous posts above. Maybe you don't think I qualify as an expert (PhD in Economics from a top-10 school, took every econometrics class offered, plus 10 years experience working with complex survey data).
Open minded Engineer:
"so 1 dead for every 4 randomly slected home. That's bad no matter how you look at it"
Reread the study. Only the initial home in each cluster was randomly selected. The others were contiguous to that home. That's not an innocuous detail. Spatial correlation would mean that the probability of deaths in one household could affect the probability of deaths in other households in the same cluster. Look at how fish & wildlife does its bird count cluster surveys -- if they start with a randomly selected observation point, they move a specified minimum distance away before setting up the next observation point. There are multiple reasons for doing this, some of which aren't relevant to the study we're discussing (birds are easier to double-count than people) but spatial correlation is a big consideration.
Comment Posted By DrSteve On 12.10.2006 @ 11:12
Saying that a statistical method is "well-established" is one thing; showing that it's an appropriate method for the data is quite another.
Here's an example: Vector autoregression is a "well-established" econometric technique. But you can't use it on every single time series you encounter (you have to examine integration properties first). I'd suggest that using contiguous-household sampling to study the incidence of violent death in a war zone is a bad fit for the data, even if the overall sampling design is the "well-established" clustering method. To knock that application isn't to deny that the technique is well-established, it's to suggest that nearly every statistical technique relies on a series of distributional and other assumptions, and using them when these assumptions are violated can get you into trouble.
Comment Posted By DrSteve On 12.10.2006 @ 06:44
"Why wouldn't spatial correlation error also be likely to produce underestimated death rates?"
Offhand I don't think I could speak to the direction of that effect. It would certainly muck up the SEEs even more, though.
Comment Posted By DrSteve On 11.10.2006 @ 18:35
"A sample size of 12,000 was calculated to be adequate to identify a doubling of an estimated pre-invasion crude mortality rate"
Well, yes and no. Let's not let "a sample of 12,000" become the meme here. The units sampled were households, and only 1800-and-something of those were sampled. Now, 12,800-plus people correspond to those households, but this isn't the same as a random sample of 12,800-plus people because, *within* households, individuals *share* many of the characteristics that would be used to project to the larger population. One has to take account of the clustering in e.g. calculating standard errors of one's estimates, lest the software think you have more unique observations than you do. The software they used, Stata 8, has good routines for complex survey data, so provided they set up the design metadata correctly prior to analysis, the numbers as calculated by the program are correct.
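To put rough numbers on that point, here is a minimal sketch using the standard Kish approximation for the design effect, deff = 1 + (m - 1) * rho, where m is the average cluster size and rho is the intra-cluster correlation. The function names are made up for this example, the counts are rounded placeholders for the "1800-and-something" households and "12,800-plus" people, and the rho values are purely hypothetical. It treats only the household as the cluster and ignores the higher-level grouping of households into neighborhoods.

```python
def design_effect(avg_cluster_size, icc):
    """Kish approximation: variance inflation relative to a simple random sample."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_individuals, n_clusters, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    avg_cluster_size = n_individuals / n_clusters
    return n_individuals / design_effect(avg_cluster_size, icc)


if __name__ == "__main__":
    # Rounded placeholder counts; the icc values are hypothetical.
    people, households = 12_800, 1_800
    for icc in (0.0, 0.1, 0.5, 1.0):
        n_eff = effective_sample_size(people, households, icc)
        print(f"icc={icc:.1f}  effective n = {n_eff:.0f}")
```

At rho = 0 the sample behaves like 12,800 independent people; at rho = 1 it collapses to the 1,800 households. Where the real figure falls between those limits depends on how alike household members are with respect to the outcome being measured.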
Let me pipe up with one more thing -- clustering works best when, within a cluster, you have good heterogeneity. I'm not sure that picking a household from within a cluster and then going to all geographically contiguous households until you reach your quota of 40 gives you the heterogeneity you need. The incidence of violent death in this conflict might exhibit spatial correlation.
Comment Posted By DrSteve On 11.10.2006 @ 15:40
I suppose I'd also be interested to see how the results would be affected if projected solely from the documented (i.e. certified) deaths. Validation of the household observations is terrifically important, since people could say anything they liked.
Household membership, etc., might also be subject to interpretation. Questions of interpretation of the data collection instrument often factor heavily into reported outcomes -- as was certainly the case with the early Card-Katz-Krueger studies on fast food employment and the minimum wage. Payroll data on restaurants in the same ZIP codes as the respondents in the original study didn't confirm the original phone questionnaire data at all. Just the opposite, in fact.
Comment Posted By DrSteve On 11.10.2006 @ 15:09