Exclusive: Elsevier to retract paper by economist who failed to disclose data tinkering

Almas Heshmati

A paper on green innovation that drew sharp rebuke for using questionable and undisclosed methods to replace missing data will be retracted, its publisher told Retraction Watch.

Previous work by one of the authors, a professor of economics in Sweden, is also facing scrutiny, according to another publisher. 

As we reported earlier this month, Almas Heshmati of Jönköping University mended a dataset full of gaps by liberally applying Excel’s autofill function and copying data between countries – operations other experts described as “horrendous” and “beyond concern.”

Heshmati and his coauthor, Mike Tsionas, a professor of economics at Lancaster University in the UK who died recently, made no mention of missing data or how they dealt with them in their 2023 article, “Green innovations and patents in OECD countries.” Instead, the paper gave the impression of a complete dataset. One economist argued in a guest post on our site that there was “no justification” for such lack of disclosure.

Elsevier, in whose Journal of Cleaner Production the study appeared, moved quickly on the new information. A spokesperson for the publisher told us yesterday: “We have investigated the paper and can confirm that it will be retracted.”

We first contacted Heshmati after a PhD student tipped us off about his dubious research practices. The student had obtained Heshmati’s dataset, which, along with email correspondence between the professor and the student, revealed how Heshmati had approached the numerous missing observations in the data. 

When we presented Heshmati with the criticism leveled against him, the researcher stood by his methods, but said they should have been “acknowledged and explained.” He had missed doing so “unintentionally in the writing stage of the paper,” he said.

It’s a mistake Heshmati has made twice, it turns out. In 2020, he and two colleagues published a paper in Empirical Economics, a Springer Nature title, that bore a strong resemblance to the 2023 article and relied on the same patched-up dataset. The article mentioned neither the data gaps nor the Excel operations.

The 2020 paper, “Green innovations and patenting renewable energy technologies,” has been cited 23 times, according to Clarivate’s Web of Science.

A spokesperson for Springer Nature told us:

Thank you for bringing these concerns to our attention. We are now looking into the matter carefully following an established procedure and in line with best-practice COPE guidance. We would be happy to provide an update when we have completed our investigation.

When we reached out to Heshmati about the 2020 paper, he told us he used the same data as in the later article, but with some “differences in the way the variables are defined and transformed.” As such, he said, the description of the data was similar between the two papers:

The first paper is cited in the second paper. Journal of Cleaner Production has strict control of similarity rate and decline [sic] papers with high rate of similarity regardless of their authors.

Heshmati also said he was “disappointed” by Retraction Watch’s “approach and clearly implied false accusations”:

As you well know, I was open to supporting [the PhD student who contacted Heshmati], after exchanging some e-mail [I] sent him a zoom link and had a long zoom meeting with him explaining the reason of imputation of missing observations and their benefits and harms. I shared the data with him where all imputed points are highlighted in yellow color to allow future replacement. I have been all the way open, honest, and willing to share information without fear of negative publicity. I still believe that I did right and would use the technique again on same data but with deleting years and countries with high frequency of missing values and produce sensitivity analysis of the result with full and restricted sample.

Esfandiar Maasoumi, a professor of economics at Emory University in Atlanta and first author of the 2020 paper, said he believed Heshmati to be “honest and reliable,” but denied involvement in and knowledge of the data imputation. He told Retraction Watch:

I take data accuracy and transparency very seriously. I did not have any hand in data and code/implementation parts of the paper, and relied on my coauthors for these components. I regard my coauthor, Dr. Heshmati, as very honest and reliable and well informed. I would reject any implication that he has intentionally done anything improper.

Maasoumi also characterized Retraction Watch’s coverage of Heshmati’s research as “sensational” and “rather uninformed” and said we jumped to conclusions about intentions:

The criticism of lack of reporting should be placed on both authors and journals that do not allow or accommodate full descriptions of work in scarce journal pages. I noted that Dr. Heshmati had willingly, and openly engaged with [the PhD student who approached Heshmati about the missing data] and completely cooperated with him/her. This is clear indication of his good and honest intentions. Personally, my experience with other authors for the last 50 years is that it is next to impossible to get data, code, and meaningful correspondence with authors of papers, published or otherwise.

He elaborated, quoting experts we spoke with:

Any reader of your work here, and generally, would note that the central, key message, is ” dishonesty”, or ” cheating” as you put it and imply it very strongly. Given that those implications are categorically false, will you write a retraction, a correction, and contact the same sources immediately?

Heshmati did hand over his data to the PhD student, if grudgingly.

“I have the excel data file but do not distribute it as I may update and use it again in research,” Heshmati wrote in an email seen by Retraction Watch. “Now I have teaching and a busy schedule, but if your intension [sic] is to learn I may show you the file.”

The student vowed not to share the spreadsheet with others – a promise he later broke after realizing just how appalling Heshmati’s methods were. So much of the data had been filled in by Heshmati that it felt to the student like “fabrication” warranting a retraction.

Like Retraction Watch? You can make a tax-deductible contribution to support our work, subscribe to our free daily digest or paid weekly update, follow us on Twitter, like us on Facebook, or add us to your RSS reader. If you find a retraction that’s not in The Retraction Watch Database, you can let us know here. For comments or feedback, email us at team@retractionwatch.com.

How (not) to deal with missing data: An economist’s take on a controversial study

Gary Smith

Nearly 100 years ago, Muriel Bristol refused to drink a cup of tea that had been prepared by her colleague, the great British statistician Ronald Fisher, because Fisher had poured milk into the cup first and tea second, rather than tea first and milk second. Fisher didn’t believe she could tell the difference, so he tested her with eight cups of tea, half milk first and half tea first. When she got all eight correct, Fisher calculated the probability a random guesser would do so as well – which works out to 1.4%. He soon recognized that the results of agricultural experiments could be gauged in the same way – by the probability that random variation would generate the observed outcomes.
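Fisher’s figure is easy to verify: a guesser who knows four of the eight cups are milk-first must pick exactly the right four out of C(8, 4) = 70 equally likely combinations. A quick check in Python:

```python
from math import comb

# The taster knows 4 of the 8 cups are milk-first, so a random guesser
# picks 4 cups out of C(8, 4) = 70 equally likely combinations.
# Exactly one of those combinations is fully correct.
p_all_correct = 1 / comb(8, 4)
print(f"{p_all_correct:.1%}")  # 1.4%
```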

If this probability (the P-value) is sufficiently low, the results might be deemed statistically significant. How low? Fisher recommended we use a 5% cutoff and “ignore entirely all results which fail to reach this level.”

His 5% solution soon became the norm. Not wanting their hard work to be ignored entirely, many researchers strive mightily to get their P-values below 0.05.

For example, a student in my introductory statistics class once surveyed 54 classmates and was disappointed that the P-value was 0.114. This student’s creative solution was to multiply the original data by three by assuming each survey response had been given by three people instead of one: “I assumed I originally picked a perfect random sample, and that if I were to poll 3 times as many people, my data would be greater in magnitude, but still distributed in the same way.” This ingenious solution reduced the P-value to 0.011, well below Fisher’s magic threshold.

Ingenious, yes. Sensible, no. If this procedure were legitimate, every researcher could multiply their data by whatever number is necessary to get a P-value below 0.05. The only valid way to get more data is, well, to get more data. This student should have surveyed more people instead of fabricating data.
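A minimal sketch shows why. The 32-of-54 split below is invented (the student’s actual survey responses aren’t given), and the test is a simple two-sided one-proportion z-test, but the mechanism is general: tripling the counts inflates the z-statistic by √3 without adding any information.

```python
from math import sqrt, erf

def two_sided_p(z):
    """Two-sided p-value from a standard normal statistic."""
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # normal CDF at |z|
    return 2 * (1 - phi)

def prop_test(successes, n, p0=0.5):
    """One-proportion z-test against H0: p = p0."""
    phat = successes / n
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    return two_sided_p(z)

# Hypothetical honest sample: 32 "yes" out of 54 respondents.
print(prop_test(32, 54))      # ~0.17, not significant

# Counting each response three times triples n but adds nothing;
# the z-statistic grows by sqrt(3) and "significance" appears for free.
print(prop_test(96, 162))     # ~0.02, below the 0.05 cutoff
```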

I was reminded of this student’s clever ploy when Frederik Joelving, a journalist with Retraction Watch, recently contacted me about a published paper written by two prominent economists, Almas Heshmati and Mike Tsionas, on green innovations in 27 countries during the years 1990 through 2018. Joelving had been contacted by a PhD student who had been working with the same data used by Heshmati and Tsionas. The student knew the data in the article had large gaps and was “dumbstruck” by the paper’s assertion these data came from a “balanced panel.” Panel data are cross-sectional data for, say, individuals, businesses, or countries at different points in time. A “balanced panel” has complete cross-section data at every point in time; an unbalanced panel has missing observations. This student knew firsthand there were lots of missing observations in these data.

The student contacted Heshmati and eventually obtained spreadsheets of the data he had used in the paper. Heshmati acknowledged that, although he and his coauthor had not mentioned this fact in the paper, the data had gaps. He revealed in an email that these gaps had been filled by using Excel’s autofill function: “We used (forward and) backward trend imputations to replace the few missing unit values….using 2, 3, or 4 observed units before or after the missing units.”  

That statement is striking for two reasons. First, far from being a “few” missing values, nearly 2,000 observations for the 19 variables that appear in their paper are missing (13% of the data set). Second, the flexibility of using two, three, or four adjacent values is concerning. Joelving played around with Excel’s autofill function and found that changing the number of adjacent units had a large effect on the estimates of missing values.

Joelving also found that Excel’s autofill function sometimes generated negative values, which were, in theory, impossible for some data. For example, Korea is missing R&Dinv (green R&D investments) data for 1990-1998. Heshmati and Tsionas used Excel’s autofill with three years of data (1999, 2000, and 2001) to create data for the nine missing years. The imputed values for 1990-1996 were negative, so the authors set these equal to the positive 1997 value.
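A rough sketch of the mechanics, assuming Excel’s linear autofill behaves like an ordinary least-squares trend through the selected cells (the figures below are invented, not Korea’s actual R&Dinv series, and the clamping rule is a guess at the procedure described above):

```python
# Illustrative only -- made-up figures, not Korea's actual R&D data.
years = [1999, 2000, 2001]
values = [0.6, 1.0, 1.4]          # three observed years

# Excel's autofill extends a least-squares linear trend through the
# selected cells; this is the same ordinary least-squares fit.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(values) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) \
        / sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

missing_years = range(1990, 1999)
imputed = [slope * x + intercept for x in missing_years]
# Backward extrapolation drives the early years below zero --
# impossible values for an investment series.
negatives = [v for v in imputed if v < 0]

# The paper's apparent workaround: overwrite negatives with the first
# positive imputed value (an assumption about the exact rule).
first_positive = next(v for v in imputed if v > 0)
clamped = [first_positive if v < 0 else v for v in imputed]
```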

[Figure: extrapolation of Korea’s missing R&Dinv values]

Overall, the missing observations in this data set are not evenly distributed across countries and years. IPRpro (an index of intellectual property rights strength) is missing 79% of its data because there are only observations every four, five, or six years. Another variable, EDUter (government expenditures on tertiary education as a percentage of GDP), was said to be a “crucial determinant of innovativeness” but is missing 34% of its data.

Some countries are missing data for several consecutive years. For example, the variable MKTcap is the market capitalization of listed domestic companies measured as a percentage of gross domestic product (GDP). The MKTcap data end for Finland in 2005, Denmark in 2004, and Sweden in 2003, requiring 13, 14, and 15 years of imputed data, respectively. The MKTcap data for Greece don’t begin until 2001 (requiring 12 years of imputed data). Italy has MKTcap data for only 1999 through 2008. The authors imputed the values for the nine years before and the 10 years after this interval.

The most extreme cases are where a country has no data for a given variable. The authors’ solution was to copy and paste data for another country. Iceland has no MKTcap data, so all 29 years of data for Japan were pasted into the Iceland cells. Similarly, the ENVpol (environmental policy stringency) data for Greece (with six years imputed) were pasted into Iceland’s cells and the ENVpol data for Netherlands (with 2013-2018 imputed) were pasted into New Zealand’s cells. The WASTE (municipal waste per capita) data for Belgium (with 1991-1994 and 2018 imputed) were pasted into Canada. The United Kingdom’s R&Dpers (R&D personnel) data were pasted into the United States (though the 10.417 entry for the United Kingdom in 1990 was inexplicably changed to 9.900 for the United States).

The copy-and-pasted countries were usually adjacent in the alphabetical list (Belgium and Canada, Greece and Iceland, Netherlands and New Zealand, United Kingdom and United States), but there is no reason an alphabetical sorting gives the most reasonable candidates for copying and pasting. Even more troubling is the pasting of Japan’s MKTcap data into Iceland and the simultaneous pasting of Greece’s ENVpol data into Iceland. Iceland and Japan are not adjacent alphabetically, suggesting this match was chosen to bolster the desired results.

Imputation is attractive because it provides more observations and, if the imputed data are similar to the actual data, the P-values are likely to drop. In an email exchange with Retraction Watch, Heshmati said, “If we do not use imputation, such data is [sic] almost useless.”

Imputation sometimes seems reasonable. If we are measuring the population of an area and are missing data for 2011, it is reasonable to fit a trend line and, unless there has been substantial immigration or emigration, use the predicted value for 2011. Using stock returns for 2010 and 2012 to impute a stock return for 2011 is not reasonable.
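The contrast is easy to demonstrate with linear interpolation, here via pandas (all figures invented): a trending series yields a plausible fill-in, while the same operation on stock returns just averages two unrelated numbers.

```python
import pandas as pd

# A smooth, trending series: population of some area, in millions.
pop = pd.Series([10.1, 10.3, None, 10.7, 10.9],
                index=[2009, 2010, 2011, 2012, 2013])
pop_filled = pop.interpolate()   # 2011 -> 10.5, a sensible estimate

# Annual stock returns: essentially unpredictable year to year.
ret = pd.Series([0.12, -0.08, None, 0.21, -0.15],
                index=[2009, 2010, 2011, 2012, 2013])
ret_filled = ret.interpolate()   # 2011 -> 0.065, a meaningless midpoint
```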

Clearly, the more values are imputed, the less trustworthy are the results. It is surely questionable to use data for, say, 1999 through 2008 to impute values for 1990-1998 and 2009-2018. It is hard to think of any sensible justification for using 29 years of one country’s data to fill in missing cells for another country.

There is no justification for a paper not stating that some data were imputed and describing how the imputation was done. It is even worse to state the data had no missing observations. This paper might have been assessed quite differently – perhaps not been published at all – if the reviewers had known about the many imputations and how they were done.

Gary Smith is an economics professor at Pomona College. He has written (or co-authored) more than 100 peer-reviewed papers and 17 books, including the best-seller Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie With Statistics.


Econ journal board quits en masse because Wiley ‘appeared to emphasize quantity over quality’

In what has become a familiar refrain, more than 30 editors and advisors of an economics journal have resigned because they felt the publisher’s need for growth would increase the “risks of proliferation of poor-quality science.”

In a letter uploaded to Dropbox on February 7, the editors and advisors of the Journal of Economic Surveys said: “We no longer believed that the corporate policies and practices of the Journal’s publisher, Wiley, as we perceived them through several statements made by Wiley and the draft of a new editor agreement submitted to the attention of Editors-in-Chief and Managing Editors by Wiley, were coherent with ours.”

Despite involving a lawyer, the now-former editors said:

The non-negotiable documentation submitted to our analysis appeared to emphasize quantity over quality of the papers submitted and strongly favoured cross pollination among the various Wiley publications also in relation to papers that we would have not considered favourably for the Journal of Economic Surveys increasing–in our perception–risks of proliferation of poor-quality science.

In a statement sent through a spokesperson to Retraction Watch, Allyn Molina, Wiley’s vice president of publishing development, said the company was “grateful to the leadership and dedication of the former Editors who created a strong foundation.” The journal offers an open access option, for a fee of more than $4,000 per article. But Molina said “this journal does not have any open access publishing targets.”

He continued:

Wiley proposed increasing publication output by 4% over the prior year as a goal rather than a requirement. In absolute terms, that’s an increase of 2 additional articles per year. This is supported by significant growth in submissions and a trend of emerging topics in the field.

Les Oxley, one of the two former managing editors in chief of the journal, said he and his colleagues “have no comment to make” on Wiley’s statement.

The mass resignation joins a growing list of more than two dozen such episodes, as academic researchers battle for-profit journal publishing companies. Quoting public statements by Wiley, the editors wrote:

Wiley’s performance metrics for journals are “growth in submissions” and “growth in published articles”; Wiley also appears ready to affirm that “it’s all driven by volume ultimately”.

The company, the editors wrote, again quoting public statements, “considers it ‘important to keep authors at the center of journal strategy’” and “has an ‘Open Access (‘pay to publish’)’ business model and a ‘Cascade strategy of finding initially rejected articles another more appropriate home within Wiley’s portfolio.’”

Wiley told Retraction Watch:

We regularly take steps to ensure our journals are best serving the needs of the academic community. Over time, this means updating our editorial and production processes to improve the author experience, ensuring our workflows meet the needs of authors, librarians and funders, and bolstering processes to safeguard research integrity. Editors are key to making these changes happen, and our goal is always to reach common ground on changes that support the long-term success of the journal. We recognize that not all Editors will embrace these changes, and where compromise cannot be reached, it is sometimes in our mutual interest to part ways.

The publisher said it “will work with the succeeding editorial team to carry on that legacy.”

We are currently working with members of the board and wider community to recruit new leadership for the journal. This is a fully transparent process that includes advertising and recruiting within the wider field. In partnership with the new editorial team, we plan to introduce best practice editorial workflows and broaden the journal’s scope to reflect new and emerging fields.


No data? No problem! Undisclosed tinkering in Excel behind economics paper

Almas Heshmati

Last year, a new study on green innovations and patents in 27 countries left one reader slack-jawed. The findings were no surprise. What was baffling was how the authors, two professors of economics in Europe, had pulled off the research in the first place. 

The reader, a PhD student in economics, was working with the same data described in the paper. He knew they were riddled with holes – sometimes big ones: For several countries, observations for some of the variables the study tracked were completely absent. The authors made no mention of how they dealt with this problem. On the contrary, they wrote they had “balanced panel data,” which in economic parlance means a dataset with no gaps.

“I was dumbstruck for a week,” said the student, who requested anonymity for fear of harming his career. (His identity is known to Retraction Watch.)

The student wrote a polite email to the paper’s first author, Almas Heshmati, a professor of economics at Jönköping University in Sweden, asking how he dealt with the missing data. 

In email correspondence seen by Retraction Watch and a follow-up Zoom call, Heshmati told the student he had used Excel’s autofill function to mend the data. He had marked anywhere from two to four observations before or after the missing values and dragged the selected cells down or up, depending on the case. The program then filled in the blanks. If the new numbers turned negative, Heshmati replaced them with the last positive value Excel had spit out. 

The student was shocked. Replacing missing observations with substitute values – an operation known in statistics as imputation – is a common but controversial technique in economics that allows certain types of analyses to be carried out on incomplete data. Researchers have established methods for the practice; each comes with its own drawbacks that affect how the results are interpreted. As far as the student knew, Excel’s autofill function was not among these methods, especially not when applied in a haphazard way without clear justification.

But it got worse. Heshmati’s data, which the student convinced him to share, showed that in several instances where there were no observations to use for the autofill operation, the professor had taken the values from an adjacent country in the spreadsheet. New Zealand’s data had been copied from the Netherlands, for example, and the United States’ data from the United Kingdom. 

This way, Heshmati had filled in thousands of empty cells in the dataset – well over one in 10 – including missing values for the study’s outcome variables. A table listing descriptive statistics for the study’s 25 variables referred to “783 observations” of each variable, but did not mention that many of these “observations” were in fact imputations.

“This fellow, he imputed everything,” the student said. “He is a professor, he should know that if you do so much imputation then your data will be entirely fabricated.”

Other experts echoed the student’s concerns when told of the Excel operations underlying the paper.

“That sounds rather horrendous,” said Andrew Harvey, a professor of econometrics at the University of Cambridge, in England. “If you fill in lots of data points in this way it will invalidate a lot of the statistics and associated tests. There are ways of dealing with these problems correctly but they do require some effort.

“Interpolating data is bad practice but lots of people do it and it’s not dishonest so long as it’s mentioned,” Harvey added. “The other point about copying data from one country to another sounds much worse.”

Søren Johansen, an econometrician and professor emeritus at the University of Copenhagen, in Denmark, characterized what Heshmati did as “cheating.” 

“The reason it’s cheating isn’t that he’s done it, but that he hasn’t written it down,” Johansen said. “It’s pretty egregious.” 

The paper, “Green innovations and patents in OECD countries,” was published in the Journal of Cleaner Production, a highly ranked title from Elsevier. It has been cited just once, according to Clarivate’s Web of Science.

Neither the publisher nor the journal’s editors, whom the student said he alerted to his concerns, have responded to our requests for comment.

Heshmati’s coauthor, Mike Tsionas, a professor of economics at Lancaster University in the UK, died recently. In a eulogy posted on LinkedIn in January, the International Finance and Banking Society hailed Tsionas as “a true luminary in the field of econometrics.” 

In a series of emails to Retraction Watch, Heshmati, who, according to the paper, was responsible for data curation, first said Tsionas had been aware of how Heshmati dealt with the missing data.

“If we do not use imputation, such data is almost useless,” Heshmati said. He added that the description of the data in the paper as “balanced” referred to “the final data” – that is, the mended dataset.

Referring to the imputation, Heshmati wrote in a subsequent email:

Of course, the procedure must be acknowledged and explained. I have missed to explain the imputation procedure in the data section unintentionally in the writing stage of the paper. I am fully responsible for imputations and missing to acknowledge it.

He added that when he was approached by the PhD student: 

I offered him a zoom meeting to explain to him the procedure and even gave him the data. If I had other intensions [sic] and did not believe in my imputation approach, I would not share the data with him. If I had to start over again, I would have managed the data in the same way as the alternative would mean dropping several countries and years.

Gary Smith, a professor of economics at Pomona College in Claremont, California, said the copying of data between countries was “beyond concerning.” He reviewed Heshmati’s spreadsheet for Retraction Watch and found five cases where more than two dozen data points had been copied from one country to another. 

Marco Hafner, a senior economist at the RAND Corporation, a nonprofit think tank, said “using the autofill function may not be the best of ideas in the first place as I can imagine it is not directly evident to what conditions missing values have been determined/imputed.”

Hafner, who is research leader at RAND Europe, added that “under reasonable assumptions and if it’s really necessary for analytical reasons, one could fill in data gaps for one country with data from another country.” But, he said, the impact of those assumptions would need to be reported in a sensitivity analysis – something Heshmati said he had not done. 

“At the bare minimum,” Hafner said, the paper should have stated the assumptions underlying the imputation and how it was done – something that, he added, would have reduced the chances of the work getting published should the reviewers find the methods inappropriate.


Nobel Prize for research in global labor markets, using historical data

Claudia Goldin, an economist at Harvard, has won the Nobel Prize in Economics. A big part of her studies is rooted in the collection and analysis of centuries-old data:

Women are vastly underrepresented in the global labour market and, when they work, they earn less than men. Claudia Goldin has trawled the archives and collected over 200 years of data from the US, allowing her to demonstrate how and why gender differences in earnings and employment rates have changed over time.

Goldin showed that female participation in the labour market did not have an upward trend over this entire period, but instead forms a U-shaped curve. The participation of married women decreased with the transition from an agrarian to an industrial society in the early nineteenth century, but then started to increase with the growth of the service sector in the early twentieth century. Goldin explained this pattern as the result of structural change and evolving social norms regarding women’s responsibilities for home and family.

Amazing.

The illustrations by Johan Jarnestad that accompany the announcement are also really useful.


More friendships between rich and poor might mean less poverty

Recently published in Nature, research by Raj Chetty, Matthew O. Jackson, Theresa Kuchler, and colleagues suggests that economic connectedness, or friendships between rich and poor, could improve economic mobility. The researchers used Facebook connection data from 70.3 million users, along with demographic and income data. NYT’s The Upshot explains the relationships with a collection of maps and charts.

You can find an anonymized, aggregated version of the data through the Social Capital Atlas. Also, I am very much into this socially-focused use of social media data.


Inflation explained with eggs

The prices of everything seem to be rising a lot lately. Why? For Vox, Emily Stewart uses eggs as a case study to explain:

“There are different ways of thinking about the inflation issue, and economists by default tend to think about macroeconomic issues such as inflation in macroeconomic terms,” said Isabella Weber, an economist at UMass Amherst. “In this current situation that we are facing, we basically have very strong micro dynamics, that is dynamics on the level of specific sectors that translate into a more general kind of price pressure.”

Eggs don’t paint the full inflation picture in the US, but they do paint part of it — it’s more expensive to feed chickens and move eggs around, so it’s more expensive to produce and move eggs, so it’s more expensive for consumers to buy eggs.


Journals punished by high-profile indexing service cry foul, demand a recount

A group of editors of journals focused on the history of economics has gone public to urge Clarivate Analytics, which publishes a highly influential ranking of journals, to reconsider its decision to drop the titles from this year’s index. Clarivate said it suppressed the titles because of apparent “citation stacking,” in which various editors agree …

Famous Harvard economist reused parts of 2002 paper multiple times, says journal

A former Harvard economist and co-founder of a massive repository of free papers in social sciences has been accused of reusing similar material over multiple papers. The three papers share the same title. According to an investigation by one of the journals, two papers by Michael Jensen, now an emeritus faculty member at Harvard, are …

Why detailed retraction notices are important (according to economists)

When journals retract a paper but don’t explain why, what should readers think? Was the problem as simple as an administrative error by the publisher, or more concerning, like fraud? In a recent paper in Research Policy, economists led by Adam Cox at the University of Portsmouth, UK, analyzed 55 retractions from hundreds of economics …