Academics Use Imaginary Data in Their Research
Academia values the appearance of truth over actual truth.
After surviving a disastrous congressional hearing, Claudine Gay was forced to resign as the president of Harvard for repeatedly copying and pasting language used by other scholars and passing it off as her own. She's hardly alone among elite academics, and plagiarism has become a roiling scandal in academia.
There's another common practice among professional researchers that should be generating even more outrage: making up data. I'm not talking about explicit fraud, which also happens way too often, but about openly inserting fictional data into a supposedly objective analysis.
Instead of doing the hard work of gathering data to test hypotheses, researchers take the easy path of generating numbers to support their preconceptions or to claim statistical significance. They cloak this practice in fancy-sounding words like "imputation," "ecological inference," "contextualization," and "synthetic control."
They're actually just making stuff up.
Claudine Gay was accused of plagiarizing sections of her Ph.D. thesis, for which she was awarded Harvard's Toppan Prize for the best dissertation in political science. She has since requested three corrections. More outrageous is that she wrote a paper on white voter participation without having any data on white voter participation.
In an article in the American Political Science Review that was based on her dissertation, Gay set out to investigate "the link between black congressional representation and political engagement," finding that "the election of blacks to Congress negatively affects white political involvement and only rarely increases political engagement among African Americans."
To arrive at that finding, you might assume that Gay had done the hard work of measuring white and black voting patterns in the districts she was studying. You would assume wrong.
Instead, Gay used regression analysis to estimate white voting patterns. She analyzed 10 districts with black representatives and observed that those with more voting-age whites had lower turnout at the polls than her model predicted. So she concluded that whites must be the ones not voting.
She committed what in statistics is known as the "ecological fallacy"—you see two things occurring in the same place and assume a causal relationship. For example, you notice a lot of people dying in hospitals, so you assume hospitals kill people. The classic example is that Jim Crow laws were strictest in the states with the largest black populations; ecological inference leads to the false conclusion that blacks supported Jim Crow.
Gay's theory that a black congressional representative depresses white voter turnout could be true, but there are other plausible explanations for what she observed. The point is that we don't know. The way to investigate white voter turnout is to measure white voter turnout.
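To see how an aggregate-level regression can point at the wrong individuals, here is a minimal simulated sketch in Python. Everything in it is hypothetical (it is not Gay's data or model); the point is only that a district-level correlation between racial composition and turnout can appear even when white and black individuals vote at identical rates.

```python
# Toy simulation of the ecological-inference problem: a district-level
# regression attributes a turnout gap to race even though, at the individual
# level, white and black turnout rates are identical. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n_districts = 10

pct_white = rng.uniform(0.2, 0.6, n_districts)   # voting-age share that is white (hypothetical)
white_rate = 0.50                                 # true individual white turnout
black_rate = 0.50                                 # true individual black turnout

# An unobserved district-level factor (say, local mobilization) that happens
# to be weaker where the white share is higher drives total turnout down there.
mobilization = 0.10 * (0.6 - pct_white) / 0.4

turnout = (pct_white * white_rate
           + (1 - pct_white) * black_rate
           + mobilization
           + rng.normal(0, 0.01, n_districts))

# District-level ("ecological") regression of turnout on percent white.
slope, intercept = np.polyfit(pct_white, turnout, 1)
print(f"aggregate slope on pct_white: {slope:+.2f}")    # comes out strongly negative
print("true individual white-black turnout gap: 0.00")  # the ground truth in this toy world
```

In the simulation, the negative slope comes entirely from an unobserved district-level factor, which is exactly the kind of alternative explanation that only individual-level turnout data could rule in or out.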
Gay is hardly the only culprit. Because she was the president of Harvard, it's worth making an example of her work, but it reflects broad trends in academia. Unlike plagiarism, which is treated as an academic crime, inventing data under the guise of statistical sophistication is something students are taught and encouraged to do. Academia values the appearance of truth over actual truth.
You need real data to understand the world. The process of gathering real data also leads to essential insights. Researchers pick up on subtleties that often cause them to shift their hypotheses. Armchair investigators, on the other hand, build neat rows and columns that don't say anything about what's happening outside their windows.
Another technique for generating rather than collecting data is called "imputation," which was used in a paper titled "Green innovations and patents in OECD countries" by economists Almas Heshmati and Mike Shinas. The authors wanted to analyze the number of "green" patents issued by different countries in different years, but they had data for only some countries and some years.
"Imputation" means filling in data gaps with educated guesses. It can be defensible if you have a good basis for your guesses and they don't affect your conclusions strongly. For example, you can usually guess gender based on a person's name. But if you're studying the number of green patents, and you don't know that number, imputation isn't an appropriate tool for solving the problem.
The use of imputation allowed them to publish a paper arguing that environmentalist policies lead to innovation—which is likely the conclusion they had hoped for—and to do so with enough statistical significance to pass muster with journal editors.
A graduate student in economics working with the same data as Heshmati and Shinas recounted being "dumbstruck" after reading their paper. The student, who wants to remain anonymous for career reasons, reached out to Heshmati to find out how he and Shinas had filled in the data gaps. The research accountability site Retraction Watch reported that they had used the Excel "autofill" function.
According to an analysis by the economist Gary Smith, altogether there were over 2,000 fictional data points amounting to 13 percent of all the data used in the paper.
The Excel autofill function is a lot of fun and genuinely handy in some situations. When you enter 1, 2, 3, it guesses 4. But it doesn't work when the data—like much of reality—have no simple or predictable pattern.
When you give Excel a list of U.S. presidents, it can't predict the next one. I did give it a try, though. Why did Excel think that William Henry Harrison would retake the White House in 1941? Harrison died in office just 31 days after his inauguration—in 1841. Most likely, autofill figured it was only fair that he be allowed to serve out his remaining years. Why did it pick 1941? That's when FDR began his third term, which apparently Excel considered to be illegitimate, so it exhumed Harrison and put him back in the White House.
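For readers who want to see the mechanics, here is a rough Python stand-in for what a numeric autofill does: fit the simplest pattern available and extend it. The function name and the sample values are mine, purely for illustration; this is not Excel's actual algorithm.

```python
import numpy as np

def autofill_next(values, k=1):
    """Extend a numeric sequence by fitting a straight line and projecting it
    forward -- roughly what a naive spreadsheet fill-down does (hypothetical
    helper, not Excel's real logic)."""
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    new_x = np.arange(len(values), len(values) + k)
    return slope * new_x + intercept

print(autofill_next([1, 2, 3]))              # ~[4.]  -- looks clever
print(autofill_next([7.1, 2.4, 9.8, 3.3]))   # a confident guess about nothing
```

The second call returns a number just as confidently as the first; the extrapolation has no way of knowing that the series it was handed has no pattern to extend.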
In a paper published in the Journal of the American Medical Association and written up by CNN and the New York Post, a team of academics claimed to show that age-adjusted death rates soared 106 percent during the pandemic among renters who had received eviction filing notices, compared to 25 percent for a control group.
The authors collected 483,408 eviction filings and asked the U.S. Census Bureau how many of the tenants had died. The answer was that 0.3 percent had died and 58 percent were still alive. The status of about 42 percent was unknown—usually because the tenant had moved without filing a change of address. If the authors had assumed that all the unknowns were still alive, the COVID-era mortality increase would have been 22 percent for tenants who got eviction notices versus 25 percent for those who didn't. This would have been a statistically insignificant finding, wouldn't have been publishable, and certainly wouldn't have gotten any press attention.
Some of the tenants that the Census couldn't find probably did die, though likely not many, since most dead people end up with death certificates—and people who are dead can't move, so you'd expect most of them to be linked to their census addresses. But some might move or change their names and then die, or perhaps they were missing from the Census database before receiving an eviction notice.
But whatever the reality, the authors didn't have the data. The entire result of their paper—the 106 percent claimed increase in mortality for renters with eviction filings versus the 22 percent observed rate—comes from a guess about how many of the unknown tenants had died.
How did they guess? They made the wildly implausible assumption that the Census and the Social Security Administration are just as likely to lose track of a dead person as a living one. Yet the government is far more interested in when people die than in when they move, not least because it doesn't want to keep cutting Social Security checks to the dead. Also, dead people don't move or change their names.
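To make the sensitivity concrete, here is a small Python check using the figures quoted above (483,408 filings, 0.3 percent confirmed dead, 58 percent confirmed alive). The assumed death rates for the unknown group are hypothetical stand-ins; the point is that the overall death rate, and any headline comparison built on it, moves almost entirely with that single assumption.

```python
# Sensitivity of the estimated death rate to what you assume about the
# roughly 42 percent of tenants whose status is unknown. Assumed rates are
# illustrative only.
filings = 483_408
confirmed_dead = 0.003 * filings     # 0.3 percent matched to a death record
confirmed_alive = 0.58 * filings
unknown = filings - confirmed_dead - confirmed_alive   # roughly 42 percent

for assumed_rate in (0.0, 0.003, 0.01, 0.03):
    dead = confirmed_dead + assumed_rate * unknown
    print(f"assume {assumed_rate:.1%} of unknowns died "
          f"-> overall death rate {dead / filings:.2%}")
```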
Whether or not their assumption was plausible, the paper reported a guess as if it reflected objective data. That's considered acceptable in academia, but it shouldn't be.
Another paper, titled "Association Between Connecticut's Permit-to-Purchase Handgun Law and Homicides," was published in the American Journal of Public Health. It cooked up data to use as a control. The study claimed to show that a 1994 gun control law passed in Connecticut cut firearm homicides by 40 percent. But firearm homicide rates in Connecticut followed national trends, with no obvious change after the 1994 law.
Forty percent compared to what? The authors arrived at their conclusion by concocting an imaginary state to serve as the control group, combining numbers from California, Maryland, Nevada, New Hampshire, and Rhode Island. This fictional state had 40 percent more homicides than the real Connecticut.
Reality is too messy for a technique like this to tell us anything meaningful. The authors' entire finding derived from the fact that Rhode Island, which comprised most of "synthetic Connecticut," experienced a temporary spate of about 20 extra murders from 1999 to 2003, a large percentage increase in such a small state. Since the temporary spike in murders wasn't the result of a change in gun control policy, it tells us little about the efficacy of Connecticut's 1994 law or the policy issue at hand.
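For readers unfamiliar with the method, here is a bare-bones Python sketch of the synthetic-control idea using made-up homicide rates. Real implementations typically constrain the weights to be non-negative and sum to one and also match on covariates; this toy version skips all of that and exists only to show where a donor state's own post-period shock can masquerade as a treatment effect.

```python
# Toy synthetic control: weight "donor" states so their combined pre-period
# series tracks the treated state, then compare the post-period paths.
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1985, 2006)
pre = years < 1995                                # pre-treatment period

# Five donor states with fake homicide rates; the treated state is, by
# construction, a mix of two of them plus noise.
donors = rng.normal(3.0, 0.5, (5, years.size))
treated = 0.3 * donors[0] + 0.7 * donors[3] + rng.normal(0, 0.05, years.size)

# Fit weights on the pre-period only (plain least squares here).
w, *_ = np.linalg.lstsq(donors[:, pre].T, treated[pre], rcond=None)
synthetic = w @ donors

gap = treated[~pre].mean() - synthetic[~pre].mean()
print("donor weights:", np.round(w, 2))
print(f"post-1995 gap (treated minus synthetic): {gap:+.2f}")
# Any post-period shock in a heavily weighted donor shows up in this gap
# and can be mistaken for a "treatment effect."
```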
Is it always wrong to guess about missing data? No, not under conditions of extreme uncertainty in which data collection is impossible before a decision has to be made. For example, if you're considering taking a potentially life-saving medicine that hasn't been properly studied, you make the best guess you can with the information you have. But guesses forced on us by scarce information shouldn't drive public policy and aren't worthy of publication.
Yet researchers routinely rely on these methods to generate results on matters of no great urgency, because in academia publishing matters more than truth. Which is a shame. Progress in human knowledge requires real-world observations, not clicking a mouse and dragging it to the bottom of the screen.
Photo Credits: Michael Brochstein/Sipa USA/Newscom, Walter G Arce Sr Grindstone Medi/ASP, Graeme Sloan/Sipa USA/Newscom
Music Credits: Strange Connection by Nobou, Digital Dreams by Jimmy Svensson, Nothing Can Stop Us by Nobou, Hero Is Born by idokay, Sneaky Shenanigans by Charlie Ryan
- Video Editor: Adani Samat
- Audio Production: Ian Keyser
Also: increasing numbers of social science, and even life and physical science, experiments cannot be reproduced when other skeptics try.
The "scientists" should be embarrassed. If they had any actual ethics.
"cannot be reproduced when other skeptics try."
That's not so surprising. If those trying to replicate the studies are skeptical of the intentions or honesty or competence of the original researchers, their own work is possibly compromised from the start. The gold standard of experimentation is the double blind test. Under such a protocol, the potential role of the skeptical scientist's personal prejudices is minimized. Such experiments are more complicated, time consuming and expensive, but experience shows they are more reliable.
It's a pretty unimaginative scientist who won't guess what an experiment will do.
"scientist who won’t guess what an experiment will do"
That's called a hypothesis, an essential start to any experiment. Still, it might be beneficial to shield the would-be replicator from knowing the identity of the original experimenter, as s/he may be inclined to overturn, even subconsciously, anything from a scientist whom s/he finds incompetent, or contemptible. Same goes for the publication or academic institution. Double blind tests are used to minimize prejudice, whether outright or subconscious. The mental state of the experimenters has been found to be decisive in some cases. I vaguely remember reading of a crystallographer who got positive results when nobody else could. The difference was the successful scientist prayed during the experiment while the others didn't.
Perhaps as science delves finer and finer, deeper and deeper into natural phenomena, the mental involvement of the scientist becomes more germane. It's an explanation I suspect won't find much traction here as the first impulse seems to be to demonize scientists as dishonest, corrupt retards.
No, jackass, I'm referring to your nonsensical dream robot world ...
Only a complete doofus can conduct an experiment without making some guesses as to what's going on, and there goes what you laughingly refer to as "double blind". What that phrase really means is no one involved knows whether the test subjects or samples are the control set or the experimental set. It doesn't mean that the experimenter is ignorant about what the test does. Maybe a bottom level lab tech doesn't know why he's irradiating some unknown sample, but the experimenter who told him to do that sure does.
Typical statist. Everyone's a cog in a great statist play directed by some elite.
"Only a complete doofus"
You've identified the person you're responding to.
I always just thought he was an ass. Recently I’ve realized he’s a dim bulb.
I'm suggesting that the attitude of the would-be replicator toward the person who performed the original experiment may spoil the attempt. It's not so different from other research where some knowledge is purposefully kept from the researchers for fear it will influence, even subconsciously, the results. It's the very same reason why those subjects in soft drink taste test advertising wear blindfolds during the tests. The idea is that the foreknowledge will prejudice the results. It's not all that difficult a concept, I promise you.
"Only a complete doofus can conduct an experiment without making some guesses as to what’s going on"
Guessing what's going to be the result of an experiment is called a hypothesis, or an hypothesis if you want to quibble.
"What that phrase really means is no one involved knows whether the test subjects or samples are the control set or the experimental set."
True enough. My intended meaning is that the would-be replicator should be shielded from knowing the identity of the original experimenter and any other information, like the name of the publication or the academic institute associated with the originator, that could prejudice the results. It's not really double blind at all, is it, and that seems to have confused you. Sorry for my lack of clarity. But please, I beseech you not to underestimate or dismiss outright the experimenter's mental state influencing results. And for god's sake don't fall into the statist cheap-jack cynicism of assuming that scientists who don't get the results you expect are corrupt retards.
" It doesn’t mean that the experimenter is ignorant about what the test does."
It's not about the test or what the test does. It's about shielding the would-be replicator from who the original experimenter was. That way personal feelings and prejudices the would-be replicator has towards the originator should be removed from consideration.
"Typical statist. "
Again, you seem confused. I am not talking about states. I am proposing changes to the way attempts to replicate experiments are carried out. If an experimenter has a grudge against another scientist or institution, this may prejudice the results of replication, even at a subconscious level. These personal feelings could interfere with objectivity.
" Everyone’s a cog in a great statist play directed by some elite."
Again, you don't appear to have grasped my point. I urge you to set aside your preconceptions and begin again. If that's not helpful, have someone else read my comments here and have them explain my comments to you.
I was on the debate team, I can show you data from the god-forsaken NYT that says there are 200,000 homeless people or two million in the same month.
Bullshit. If the methodology and data of the original study are properly documented and disclosed, the motivation of the reproducer will be irrelevant. The result is either reproducible or not.
Double-blind is a methodological control hopefully used in the original survey. It blinds the researcher from the data source, not the hypothesis.
Imaginary data in the aid of trying to bring about an imaginary utopia.
Another reason you can’t have chicks in charge
"Academics Use Imaginary Data in Their Research."
Well, how else are these over-educated useful idiots going to get their grant money to pay for their third vacation home in the Bahamas?
Tell the truth?
^BINGO. Look at how the funding is NOT *EARNED* (it's stolen) and it's pretty easy to see why there is no value in the result. If the result had any value it wouldn't require 'armed-theft' to fund it.
After diligent research, I found that 68.5% of research papers contain made-up statistics.
A few years ago, in 2015, there was an attempt to reproduce important published psychology research; 40 percent of the results could not be replicated. If even 5 percent of chemistry research results had failed to reproduce, there would have been a firestorm in chemistry circles, but psychologists have mostly just shrugged this off.
Soft 'sciences' have always been less than accurate or even honest. Perhaps that explains why so many students choose them.
You get more of what you subsidize, and when you subsidize marginal students, universities have to invent marginal fields and hire marginal professors to teach them. And those marginal students and faculty know they are marginal, they are bored out of their skulls learning new pronouns and genders, and why not protest? Queers for Palestine is the result of such marginal thinking.
Anyone who thinks non-profit universities don't care about money should ponder how much of that sweet student loan money they are soaking up.
As economists say, incentives matter. When you get paid based partially upon how much you publish, and you don't get a dime for debunking what others publish, this is what you get.
As I understand it, chemistry is based on observable shit that exists in the physical world with stuff like chemicals and equations and whatnot. Psychology on the other hand is based on theories that some guy pulled out of his ass.
These day, it's more likely to be Some Gal.
Some Thing.
"psychologists have mostly just shrugged this off." lol...on what basis are you making such a pronouncement? There are lots of things happening on the research-side of psychology that are being implemented with the hope that this will do anything to the problem (e.g., required pre-registering of studies, making data public as a prerequisite of publication, statistical/methodological review of papers post-peer review). I know none of this gets reported about, so people don't realize. In the end, all of these things are "theater" anyway, because they won't actually address the issues that are driving most of this: money, prestige, and status.
Ahem .... climatastrophe pseudo science?
Good thing Ronald Bailey didn't write this, or he would have left out climate modeling used as if it were real data.
The data identifies as truth.
Here's a good listen for anyone curious about the subject.
https://freakonomics.com/podcast/why-is-there-so-much-fraud-in-academia/
Can I be a liar and plagiarist and be paid 900K annually from Harvard?
Are you a POC female and a member of the rainbow mafia?
No.
Then no. Unless you're willing to cut your dick off.
No, you only have to identify as dickless. And it really helps if the Babylon Bee misgenders you and loses their Twitter account.
If I say I am, it's literally murder to question it.
This is the dirty little secret of progressivism and the broader managerial technocracy, both of which are predicated on the notion that enlightened rule by an elite employing the latest science and academic thinking is preferable to allowing the great unwashed to manage their affairs on their own. Even if you believe that, ignoring the principles of self-governance or the sovereignty of the individual, you have to ignore the fact that the underlying science and academic thinking is often utterly shoddy and often willfully so. That science and research isn't divine revelation from on high. It's the product of fallible and often self-serving human beings.
This guy gets it.
I know it's trite to attack climate science anytime data manipulation is used, but this is too relevant not to mention.
Before creating global temperature models, the temperature trends of each individual station are taken and processed. Documented changes have to be adjusted for. Then there is the process of homogenization: when a site or sites don't fit the regional trend, the trend is corrected to homogenize it.
This involves taking the data, removing the trend of the measurement, and adding the regional trend to the result. For those who have even cursorily touched data science, you realize the problem. Homogenization removes the measurement and keeps the noise, the precise OPPOSITE of what you are supposed to do.
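A minimal Python sketch of the adjustment as described above, purely illustrative and not a claim about how any particular homogenization package actually works: strip the station's own linear trend, keep its residuals, and graft on the regional trend.

```python
import numpy as np

def replace_trend(station, regional, years):
    """Swap a station's own linear trend for the regional one, keeping only
    the station's residual wiggles (the operation as described above)."""
    s_slope, s_int = np.polyfit(years, station, 1)
    r_slope, r_int = np.polyfit(years, regional, 1)
    residual = station - (s_slope * years + s_int)
    return residual + (r_slope * years + r_int)

years = np.arange(1970, 2020)
rng = np.random.default_rng(2)
station = 14.0 - 0.01 * (years - 1970) + rng.normal(0, 0.3, years.size)   # fake cooling station
regional = 14.0 + 0.02 * (years - 1970)                                   # fake warming region
adjusted = replace_trend(station, regional, years)

print(f"station trend:  {np.polyfit(years, station, 1)[0]:+.3f} deg/yr")
print(f"adjusted trend: {np.polyfit(years, adjusted, 1)[0]:+.3f} deg/yr")
```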
Ever since I read about that practice, I have put a hard eye to any trend. After all, if they would do that on such an important measurement with so much funding to do it right, what wouldn't be manipulated?
It was that kind of thing that prompted Richard Muller, originally somewhat of a climate sceptic, at least about the data and methodology, to set up the Berkeley Earth Surface Temperature project to re-analyse the station data, look into the urban heat island effect, etc.
https://en.wikipedia.org/wiki/Berkeley_Earth
That's what they called "massaging the data" when they came up with the "hockey stick" graph for Global Warming.
Nah. The hockey stick was even worse. They cut off the modern period and stuck a thermometer graph onto a tree ring sample, ignoring the fact that the rings showed a decrease in temperature over the time that the thermometer showed an increase, which under normal circumstances would invalidate the correlation.
Worse, their data algorithm was a signal-search algorithm that weighted more heavily the data sets that agreed with their initial assertion.
I'm still wondering how they fallacied over WWII which clearly shows the fastest Global Cooling years while more UN-filtered leaded gasoline was burned than ever before.
Oh yeah; they don't. They literally cut-off their charts right at that point.
I have some exposure to some of the models used by climate researchers at NOAA. I can tell you, the models are frequently ad hoc and contain numerous fudge factors and corrections to massage the data, throw out outliers, adjust that term during this time period, this term during that time period, etc. Further, many temperature measurements are based on proxies--e.g. assuming tree rings are wider during higher temperatures, but there's simply no way to determine how much wider per degree C.
I'm not saying that any particular models are wrong, just that, having implemented models like these before, I understand enough of the math to know that a minor mistake in a fudge factor meant to allow dissimilar measurements to be used as if they were from the same dataset can make a huge difference in the validity of the model. Not to mention simple errors in implementation that can have the results "look right" but still be completely wrong.
"In early 2001, CPC was requested to implement the 1971–2000 normal for operational forecasts. So, we constructed a new SST normal for the 1971–2000 base period and implemented it operationally at CPC in August of 2001" (Journal of Climate).
Just the abstract to that particular paper reveals how fragile the models are, being based on assumptions piled on top of assumptions, and unveiling a tendency to massage data.
"SST predictions are usually issued in terms of anomalies and standardized anomalies relative to a 30-yr normal: climatological mean (CM) and standard deviation (SD). The World Meteorological Organization (WMO) suggests updating the 30-yr normal every 10 yr."
How can a normal be updated--the data is the data, and its normal is its normal? This sentence implies that the data is somehow massaged every ten years or so. There may be legitimate reasons to do so, but anytime you massage data, there have to be questions as to the legitimacy of the alteration.
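To make the mechanics concrete, here is a minimal Python sketch of what a 30-yr "normal" is and what updating it does: the raw series is untouched, but the anomalies are re-expressed against a different base period's mean and standard deviation. The series and numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1951, 2021)
sst = 20.0 + 0.01 * (years - 1951) + rng.normal(0, 0.2, years.size)   # fake SST series

def standardized_anomaly(series, years, base_start, base_end):
    """Anomaly relative to the mean and SD of a chosen 30-yr base period."""
    base = series[(years >= base_start) & (years <= base_end)]
    return (series - base.mean()) / base.std()

a_old = standardized_anomaly(sst, years, 1961, 1990)
a_new = standardized_anomaly(sst, years, 1971, 2000)
print(f"2020 anomaly vs 1961-1990 normal: {a_old[-1]:+.2f} SD")
print(f"2020 anomaly vs 1971-2000 normal: {a_new[-1]:+.2f} SD")
```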
"Using the extended reconstructed sea surface temperature (ERSST) on a 28 grid for 1854–2000 and the Hadley Centre Sea Ice and SST dataset (HadISST) on a 18 grid for 1870–1999, eleven 30-yr normals are calculated, and the interdecadal changes of seasonal CM, seasonal SD, and seasonal persistence (P) are discussed."
I.e., data is being assembled from widely disparate data sources, with different measurement techniques, and some of the data was made with instrumentation that simply cannot be validated (data from 1854?).
"Both PDO and NAO show a multidecadal oscillation that is consistent between ERSST and HadISST except that HadISST is biased toward warm in summer and cold in winter relative to ERSST."
Now we see that different data sets, ostensibly of the same population, disagree. And the fact that one data set exhibits bias to the extreme (too warm in summer and too cold in winter) raises questions about the proper use of this data. One scientist may be able to make a valid claim that the more stable data is in error and "correct" it to be more in line with the more volatile data; another scientist may do the opposite. And their personal bias will play a role as to which way they go.
Fortunately, because these academics nonetheless had to provide their methodology, and often enough their data, their papers could be checked and later rejected or disproved.
Poor gov’na shrike.
What part of my point do you disagree with?
Big Mac doesn't disagree with points. He's not smart enough. Instead he disagrees with people.
I thought you had me muted.
Nobody is muted.
he disagrees with people.
That appears to be true of most of the right-wing claque here.
Cite?
LOL
…..their papers could be checked and later rejected or disproved.
But only after the “results” have set off a new wave of policy making.
Isn't that largely on the policy makers?
I suppose some researchers know that their work will be used that way, and I'm sure that many in the social sciences have a strong ideological bias (or know what's expected of them). We really need government out of the academic research funding business. Way too many perverse incentives that are destroying the credibility of science.
It's both, and in collusion.
Former U.S. Senator Timothy Wirth (D-CO), then representing the Clinton-Gore administration as U.S undersecretary of state for global issues, addressing the same Rio Climate Summit audience, agreed: “We have got to ride the global warming issue. Even if the theory of global warming is wrong, we will be doing the right thing in terms of economic policy and environmental policy.”
Christine Stewart, former Canadian Environment Minister: “No matter if the science is all phoney, there are collateral environmental benefits…. climate change [provides] the greatest chance to bring about justice and equality in the world.”
Monika Kopacz, atmospheric scientist: “It is no secret that a lot of climate-change research is subject to opinion, that climate models sometimes disagree even on the signs of the future changes (e.g. drier vs. wetter future climate). The problem is, only sensational exaggeration makes the kind of story that will get politicians’ — and readers’ — attention. So, yes, climate scientists might exaggerate, but in today’s world, this is the only way to assure any political action and thus more federal financing to reduce the scientific uncertainty.”
Researcher Robert Phalen’s 2010 testimony to the California Air Resources Board: “It benefits us personally to have the public be afraid, even if these risks are trivial.”
Here, take a look. TEN years. That's not science, that's ScIeNcE FrAuD.
https://wattsupwiththat.com/2024/04/22/with-tree-rings-on-their-fingers/
It's a 30 minute video and a 10 minute read.
From your link
In a 2011 study in the Annals of Applied Statistics, two statisticians, Blakely McShane and Abraham Wyner, demonstrated this by constructing multiple different models using the Mann hockey stick data and showed that while they implied completely different conclusions about how today’s climate compares to the past, they all fit the data about the same.
That doesn't rule out that their model provides a similar reconstruction but has much wider standard errors
That's not what that conclusion means at all. What McShane and Wyner demonstrated was that using Mann's data, they could construct models that reached wildly divergent conclusions despite meeting near-identical statistical standards. In other words, the error margins are so wide that no conclusion is possible.
More to the point, you're completely ignoring the other findings of scientific malfeasance uncovered by McIntyre (which are excerpted in ABC's comment).
That’s not what that conclusion means at all
From the actual paper.
https://projecteuclid.org/journals/annals-of-applied-statistics/volume-5/issue-1/A-statistical-analysis-of-multiple-temperature-proxies–Are-reconstructions/10.1214/10-AOAS398.full
“Our model provides a similar reconstruction but has much wider standard errors,”
Yes, I can read your quote. You misinterpreted it and still ignored ABC's substantive point.
Nope. They had their own model that Watts neglected to mention as it undermined his thesis.
That’s the ideal but rarely the reality. The abysmal reproducibility statistics cover only the minority of studies that even have reproducible methodologies and data in the first place. The number of studies that are unreproducible because the authors “declined” to publish their methodology and/or data is staggering.
Dude. The article shows how the papers are checked (Excel fill-in) and the disprove happens every day (the sky still hasn’t fallen down) yet the “check and disprove” department continues to sit in blatant ignorance. It's bigotry at work not science and nobody would pay for such garbage if the 'armed-theft' guns were forcing everyone to pay for it.
How dare you question the experts!
— Lying Jeffy
+ charliehall
+ JasonT20
+ SRG2
AlWaYs TrUsT THE SCIENCE!!!!111!!!!!
This seems appropriate here.
I wouldn't trust "The Science", but I do trust science, i.e. the process of systematically checking your hypotheses against reality.
Naturally, there's a big difference between "science" and "The Science."
'"Imputation" means filling in data gaps with educated guesses.'
Not accurate. Imputation is usually based on data and statistical modeling. It can be quite accurate when done correctly.
You're missing the point. Imputation may (or may not) lead to an accurate result but it is not data. It is the output of your hypothesis, not an input that can be used to prove or disprove your hypothesis.
Bayesian priors, please.
The proper statistical methods for imputation take into account the fact that imputed values are not actual values.
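As a concrete illustration of that point, here is a minimal Python sketch of the multiple-imputation idea: fill the gaps several times with random draws, analyze each completed dataset, and pool the results so the uncertainty added by guessing is carried into the final standard error (Rubin's pooling rule). The data and the normal-draw fill model are toy assumptions, not anyone's actual method.

```python
import numpy as np

rng = np.random.default_rng(4)
observed = np.array([2.1, 2.5, np.nan, 3.0, np.nan, 2.8, 2.2])   # toy data with gaps
mask = np.isnan(observed)
mu, sigma = np.nanmean(observed), np.nanstd(observed)

m = 20                        # number of imputed datasets
means, variances = [], []
for _ in range(m):
    completed = observed.copy()
    completed[mask] = rng.normal(mu, sigma, mask.sum())   # stochastic fill, not one "best guess"
    means.append(completed.mean())
    variances.append(completed.var(ddof=1) / completed.size)

within = np.mean(variances)                 # average within-imputation sampling variance
between = np.var(means, ddof=1)             # extra variance introduced by the guessing
total_se = np.sqrt(within + (1 + 1 / m) * between)   # Rubin's rule
print(f"pooled mean {np.mean(means):.2f}, standard error {total_se:.3f}")
```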
"based on data and statistical modeling" ... that is based on more "imputation," "ecological inference," "contextualization," and "synthetic control."
Like a bad rumor - the BS just spreads and gets exaggerated.
She committed what in statistics is known as the "ecological fallacy"—you see two things occurring in the same place and assume a causal relationship.
Well, most of the definitions I see for "ecological fallacy" are that one has assumed that inferences about individuals can be deduced from inferences about the group.
Whatever, that is not as important as the question I have for Aaron Brown:
What is the fallacy called when one makes sweeping generalizations about all of academia and gives only four anecdotes as proof?
You should go find those needles in the haystack of academia research that doesn't use one or all of these fallacy methods? Then you can publish your findings WITHOUT using sweeping generalizations.. eh? Then when my child is looking for Non-BS education I'll know where to send them.
You should go find those needles in the haystack of academia research that doesn’t use one or all of these fallacy methods?
What is the fallacy called where you assume that your conclusion is true as part of your argument?
I don't think you finding needles in the haystack will ever be true because you're too lazy to support your argument or more likely know you're arguing BS.
I don't think you even recognized what I was arguing, since you never addressed it.
You make a sweeping generalization that just because there are 4-cases instead of all cases presented that the conclusion is a fallacy. So I told you to present all-cases without "sweeping generalizations" and you wanted to play the chase-your-own-tail game instead of presenting those supposed 'other' cases.
Why do I have to present all cases? My whole point was that Brown, in an article where he criticizes academics for being sloppy or actually making up data, makes his case against all of academia with 4 cases of academics doing what he says is wrong. Where is his evidence that those 4 cases are representative of the whole? It isn’t on me to disprove his hypothesis. It is on him to show that his hypothesis holds up to scrutiny.
I made no sweeping generalizations. I was pointing out the problem with one person's argument, where he made a sweeping generalization about all of academia based on 4 cases of alleged poor scholarship. You just used that phrase because I did. You clearly didn't think about what I wrote enough to understand it.
Here's your evidence, which you're too lazy to go get because you're too busy trying to destroy the credibility of Brown's work.
"In a study that analysed how ML prediction models deal with missing data" 96 out of 152 accounted for missing data.
https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-024-02173-x#Sec10
I always understood the ecological fallacy (or population fallacy) to mean that also. It’s the special case in the context of statistics of the fallacy of division: assuming that knowing something about a whole tells you that that is true of a part of the whole, like claiming that because someone is bilaterally symmetrical his liver must be. An example of the ecological fallacy as I understand it is when the ‘proof’ the FDA uses to determine that a drug is ‘safe and effective’ uses statistics (which tell you things about a whole population) and then assumes that those data apply to particular non-stochastic systems called individual human beings. Another is when doctors use statistical correlations to decide that a particular patient’s blood pressure or serum lipid levels is ‘too high’.
An example of the ecological fallacy as I understand it is when the ‘proof’ the FDA uses to determine that a drug is ‘safe and effective’ uses statistics (which tell you things about a whole population) and then assumes that those data apply to particular non-stochastic systems called individual human beings. Another is when doctors use statistical correlations to decide that a particular patient’s blood pressure or serum lipid levels is ‘too high’.
If "safe and effective" means that the risk is low compared to the likely benefit, then they haven't engaged in any invalid leaps of reasoning, in my opinion. That is what "high" blood pressure is, for instance. They see a correlation between people that have measured bp consistently above some statistical threshold having a significantly higher risk of adverse events like strokes and heart disease. That plus the very plausible causal relationship between having high blood pressure and those events, and it is a solid conclusion to draw that an individual with blood pressure above that threshold is at risk of stroke or cardiovascular disease.
Cause and effect can be tough to prove, especially in highly varied human populations. That is why statistics lead to conclusions about risks and not "this will happen" if doctors and researchers are careful.
Terms like 'probability' and 'correlation' are from statistics, which as I said is a branch of mathematics that tells you things about populations. Terms like 'cosine' and 'secant' are from trigonometry, which tells you things about triangles. Since individual human beings are neither populations nor triangles, terminology from either of those branches of mathematics is meaningless when applied to them: what philosophers call a 'category error'.
What happens when it turns out that much of the intellectual class are frauds and charlatans, but the political class demands Rule by Experts?
I think we're finding out right now.
We found out good and hard in 2020.
You should see some of the crazy, convoluted methods used by Economists! Economists just love to keep adding variables to their models until they get the desired result.
It's not much of a surprise that the majority of economists failed to predict just about every recession and bust cycle of the past 50 years.
BTW...I hold M.S. degrees in engineering AND statistics. I ran the statistics department of a major semiconductor company for 7 years.
Yeah it's all pretty dismal.
And let's not forget all the "health" studies, which almost always seem to devolve into finding another proxy for wealth and conclude that being rich is healthy.
“you can usually guess ~~gender~~ sex based on a person’s name”
Fixed it.
You are behind the times. The original statement is correct. You broke it. Some people are choosing to use a gender that is different than their biological sex, and they are changing their names to reflect the chosen gender. For this reason names are more useful for guessing a person's gender than for guessing the person's sex. It is true that many of these people don't understand the difference between sex and gender.
https://thecritic.co.uk/the-sexist-pseudoscience-of-gender-identity/
Give this a read. "Gender" as commonly defined today is absolute nonsense.
And once again, Reason contributors are three years behind the commentariat.
And, once again, pointing at other media and saying “Look! Over there! It was them!” while continuing to publish Reason/Cato studies on immigrant-owned business, immigrant-related crime, immigrant-related welfare, etc., etc. that contain the exact same categorization errors, affirmation/survivorship bias, ecological fallacies, and nonsensical imputation.
“But when you absolutely know that something is true, it’s all right to make up data that proves it, especially when the real data fails to.”
Progressives
John Lott isn't progressive.
Look at that! The 'Guns' (gov-guns) aren't teaching people anything but pure BS.
But leftards don't care. They're always posting 'links' to pure BS (studies) to support their BS on top of spouting pure BS all because they don't want to *earn* anything.
Maybe life should be just as simple as ensured Individual Liberty and Justice for all. And all the BS producers should go out of business due to lack of 'armed-theft' funding.
Regarding the Connecticut case: in 1994, a year before the new Conn. law AND the Brady Act (nationally) took effect, Connecticut had 215 gun homicides. 10 years later it had 100 gun homicides (about average for that period for Conn). So gun homicides dropped by about 57%.
A reasonable comparison would be to the national drop (as all states saw decreases after the Brady Act took effect in 1995, the same year as the new Conn laws): nationally, in 1994, there were 32,150 gun homicides. In 2004, there were 11,624 gun homicides in the US. The decrease over the decade was 64%, so based on these calculations, the additional Conn law cannot be shown to have decreased gun homicides compared to the national average (and one could argue, unconvincingly equating correlation with cause, that it increased gun homicides). But what is really important is that the Brady Act, which affected all states, had such a powerful effect in cutting back gun homicides. The comparison of Connecticut with the entire nation is not a proper apples-to-apples comparison, and the fictional synthetic Connecticut used by the researchers has more validity, by creating a model that matches the demographic, economic, and other features of Connecticut.
Laws don’t stop criminals
Actually, if you’re looking at all the data instead of cherrypicking what supports your pre-ordained conclusion, the fact that the gun homicide rate dropped further, faster before the Brady Bill rather than during/after suggests the same thing you say about the Conn. Law one layer higher. That, if the Brady Bill had any effect at all, it was to slow/stop the decrease in gun homicides. It took us 20 yrs. from 1995-2015, during and after the Brady Bill to see the same rate of decrease we saw from 1990-1995, the 5 yrs. before the Brady Bill took effect.
I think your 1994 national statistic includes suicides. The 2004 statistic appears not to include suicides.
Sure, like THAT is the only thing wrong with his conclusions.
Let's start with the most basic rule, correlation is not causation. Concluding that the Brady act reduced gun deaths without accounting for, literally, everything else in the world. All violent crime in the US was at a peak in the early 90s and declining. Precipitously, in fact.
His conclusions are specious, and beside the point. Which is that inferring data that don't exist makes for bad science.
There is a similar problem with the analyses that claim to show that death penalties reduce crime.
This reminds me of the "hot deck" imputation method that was frequently used in the 1960s and 1970s for analyzing and reporting on census data. If you were ethical, you would disclose if the hot deck method had been used and why it had been used, and of course, you would always keep the original raw data. The term "hot deck" originated due to the data being commonly stored on punched cards, and the missing data allocation process frequently would be to simply fill in missing values by using the data from the preceding card in the sorted deck.
Yes, this data was used for government planning and spending programs.
In the 1970s I was part of a project to obtain better quality and more frequent data than the US Census data could provide for one specific geographic area. This data was used to support longitudinal studies. This project was an example of why data imputation methods may still have to be used even when better data exists. The raw data contained information about identifiable individuals, and as such the data could only be released with the identity information and any data that could be used to determine identity removed.
The raw data contained information about identifiable individuals, and as such the data could only be released with the identity information and any data that could be used to determine identity removed.
Take your meds old man, you're not making any sense. I'd cut you some slack and say that some of the information and relational theories that out you as a doddering old fool weren't well known in the 60s and 70s but that would still mean you haven't learned a goddamned thing in more than 50 yrs.
Privacy laws are indeed a major impediment to good science. You just gave one example. It is an even bigger issue in healthcare research. We are so concerned about privacy that we accept bad science and bad policies that result from bad science.
Well, this is to be expected, given the crowding out of other sources of research funding by federal grants. When you know your grant depends on your 'proving' that something requires regulation by the agency providing the grant, your objectivity is corrupted.
As Eisenhower noted in his farewell address:
"Today, the solitary inventor, tinkering in his shop, has been over shadowed by task forces of scientists in laboratories and testing fields. In the same fashion, the free university, historically the fountainhead of free ideas and scientific discovery, has experienced a revolution in the conduct of research. Partly because of the huge costs involved, a government contract becomes virtually a substitute for intellectual curiosity. For every old blackboard there are now hundreds of new electronic computers.
"The prospect of domination of the nation's scholars by Federal employment, project allocations, and the power of money is ever present and is gravely to be regarded.
"Yet, in holding scientific research and discovery in respect, as we should, we must also be alert to the equal and opposite danger that public policy could itself become the captive of a scientific-technological elite."
"When you know your grant depends on your ‘proving’ that something requires regulation by the agency providing the grant, your objectivity is corrupted."
It is similar to corporate-funded research. The tobacco or pharmaceutical industries also fund or have funded studies that are corruptly motivated and whose results are lined up with company interests. Sometimes the results are surprising, and this lends credibility to the study that it wouldn't have otherwise. The studies, for example, that the oil companies funded that agreed with anthropogenic climate change, CO2, greenhouse gases, and fossil fuels. Perhaps the oil companies know that they are a guaranteed money machine and are at no risk or danger regardless of which way the climate change study went. Still, the fact that the study weighed against oil interests lends it credence.
“When you know your grant depends on your ‘proving’ that something requires regulation by the agency providing the grant, your objectivity is corrupted.”
Most grant giving agencies have no regulatory power.
The real shame here is not so much that academics continue to publish fraudulent studies but, rather, that politicians and the demographics in their political camps keep imposing policies on all of us based on fake "studies."