07Sep / 2023

Flawed Rotten Tomatoes ratings

Rotten Tomatoes aggregates movie reviews to spit out a freshness score for each film. There’s a problem though. For Vulture, Lane Brown reports on the flawed system:

But despite Rotten Tomatoes’ reputed importance, it’s worth a reminder: Its math stinks. Scores are calculated by classifying each review as either positive or negative and then dividing the number of positives by the total. That’s the whole formula. Every review carries the same weight whether it runs in a major newspaper or a Substack with a dozen subscribers.

If a review straddles positive and negative, too bad. “I read some reviews of my own films where the writer might say that he doesn’t think that I pull something off, but, boy, is it interesting in the way that I don’t pull it off,” says Schrader, a former critic. “To me, that’s a good review, but it would count as negative on Rotten Tomatoes.”

Studios have, of course, learned how to game the system. On top of that, most of the site is now owned by movie ticket seller Fandango.
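To make the quoted formula concrete, here is a minimal sketch of a score computed the way Brown describes: classify each review as positive or negative, then divide positives by the total. The `Review` class, the `freshness_score` function, and the sample reviews are all hypothetical, invented for illustration. This is not Rotten Tomatoes' actual code or data.

```python
from dataclasses import dataclass


@dataclass
class Review:
    outlet: str      # hypothetical label; it has no effect on the score
    positive: bool   # every review is forced into positive or negative


def freshness_score(reviews):
    """Percentage of reviews classified as positive, all weighted equally."""
    if not reviews:
        raise ValueError("need at least one review")
    positives = sum(1 for r in reviews if r.positive)
    return 100 * positives / len(reviews)


# Invented sample: three positive reviews and one mixed review.
reviews = [
    Review("major newspaper", True),
    Review("newsletter with a dozen subscribers", True),
    Review("trade magazine", True),
    Review("mixed but intrigued critic", False),  # a straddling review counts as negative
]

print(f"{freshness_score(reviews):.0f}%")  # 3 of 4 positive, so 75%
```

In this sketch the outlet never enters the calculation, and the mixed review contributes exactly as much as an outright pan, which are the two complaints in the excerpt.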

Tags: bias, movies, ratings, Rotten Tomatoes, Vulture

15Aug / 2023

Supermarket provides AI-driven meal planner and is disappointed by the internet using it to output weird recipes

A supermarket chain in New Zealand offered an AI-based recipe generator, and of course people started throwing in random household items to see what it would make. For The Guardian, Tess McClure reports:

The app, created by supermarket chain Pak ‘n’ Save, was advertised as a way for customers to creatively use up leftovers during the cost of living crisis. It asks users to enter in various ingredients in their homes, and auto-generates a meal plan or recipe, along with cheery commentary. It initially drew attention on social media for some unappealing recipes, including an “oreo vegetable stir-fry”.

When customers began experimenting with entering a wider range of household shopping list items into the app, however, it began to make even less appealing recommendations. One recipe it dubbed “aromatic water mix” would create chlorine gas. The bot recommends the recipe as “the perfect nonalcoholic beverage to quench your thirst and refresh your senses”.

Tags: AI, grocery, Guardian, recipes

03Aug / 2023

Honesty research likely faked data

Research by Dan Ariely and Francesca Gino suggested that people answer a survey more honestly when you ask them about honesty at the beginning. The problem is that the data in the analysis was likely faked. The research is over ten years old. Ariely suggested that the insurance company that supplied the data altered it before he received it, but the insurance company recently stated that the data was faked after it was supplied.

In any case, there’s fake data in there somewhere. Planet Money broke it all down.

See also the analysis by Data Colada, which is how the fraud came to light.

Tags: fake data, honesty, Planet Money

09Jan / 2023

Bias in AI-generated images

Lensa is an app that lets you retouch photos, and it recently added a feature that uses Stable Diffusion to generate AI-assisted portraits. While fun for some, the feature reveals biases in the underlying dataset. Melissa Heikkilä, for MIT Technology Review, describes problematic biases towards sexualized images for some groups:

Lensa generates its avatars using Stable Diffusion, an open-source AI model that generates images based on text prompts. Stable Diffusion is built using LAION-5B, a massive open-source data set that has been compiled by scraping images off the internet.

And because the internet is overflowing with images of naked or barely dressed women, and pictures reflecting sexist, racist stereotypes, the data set is also skewed toward these kinds of images.

This leads to AI models that sexualize women regardless of whether they want to be depicted that way, Caliskan says—especially women with identities that have been historically disadvantaged.

Tags: AI, bias, images, Lensa, MIT Technology Review, Stable Diffusion

26Sep / 2022

Historical data

Randall Munroe provides another fine observation through xkcd.

I often wonder what our data and charts will look like a century or two from now. Will the conventions and aesthetics look silly and amateur or classic and vintage? Will what seems like a lot of detailed data now seem spotty and useless, or will we look back in disbelief that companies were allowed to track our activities? Will AI have taken over human cognition and made these questions obsolete, because we’re in a suspended dream state, our bodies used as energy to power supercomputers, unsure of what is real and what is simulated? Important questions.

Tags: humor, xkcd

16Aug / 2022

Google Maps incorrectly pointing people to crisis pregnancy centers

Davey Alba and Jack Gillum, for Bloomberg, found that Google Maps commonly points people to crisis pregnancy centers, non-medical locations that encourage women to follow through with pregnancy, when they search for “abortion clinic”.

Tags: abortion, Bloomberg, Google, search

12Aug / 2022

Looking for falsified images in Alzheimer’s study

Charles Piller, for Science, highlights the work of Matthew Schrag, who uses image analysis to look for falsified data, recently scrutinizing a link between a protein and Alzheimer’s:

“So much in our field is not reproducible, so it’s a huge advantage to understand when data streams might not be reliable,” Schrag says. “Some of that’s going to happen reproducing data on the bench. But if it can happen in simpler, faster ways—such as image analysis—it should.” Eventually Schrag ran across the seminal Nature paper, the basis for many others. It, too, seemed to contain multiple doctored images.

Science asked two independent image analysts—Bik and Jana Christopher—to review Schrag’s findings about that paper and others by Lesné. They say some supposed manipulation might be digital artifacts that can occur inadvertently during image processing, a possibility Schrag concedes. But Bik found his conclusions compelling and sound. Christopher concurred about the many duplicated images and some markings suggesting cut-and-pasted Western blots flagged by Schrag. She also identified additional dubious blots and backgrounds he had missed.

Tags: Alzheimer’s, protein, science

15Jun / 2022

Unreliable FBI crime data

The Marshall Project and Axios report that the FBI changed its reporting system last year, and nearly 40 percent of law enforcement agencies didn’t submit any data:

In 2021, the FBI retired its nearly century-old national crime data collection program, the Summary Reporting System used by the Uniform Crime Reporting (UCR) program. The agency switched to a new system, the National Incident-Based Reporting System (NIBRS), which gathers more specific information on each incident. Even though the FBI announced the transition years ago and the federal government spent hundreds of millions of dollars to help local police make the switch, about 7,000 of the nation’s 18,000 law enforcement agencies did not successfully send crime data to the voluntary program last year.

I am sure policymakers will definitely be very responsible and cite data appropriately and not cherrypick from incomplete data to push an agenda.

Tags: Axios, crime, FBI, Marshall Project

30Nov / 2021

Simpson’s Paradox in vaccination data

This chart, made by someone who is against vaccinations, shows a higher mortality rate for those who are vaccinated versus those who are not. Strange. It shows real data from the Office for National Statistics in the UK. As explained by Stuart McDonald, Simpson’s Paradox is at play:

[W]ithin the 10-59 age band, the average unvaccinated person is much younger than the average vaccinated person, and therefore has a lower death rate. Any benefit from the vaccines is swamped by the increase in all-cause mortality rates with age.

If you’re unfamiliar, Simpson’s Paradox is when a trend appears in separate groups but then disappears or reverses when you combine the groups. In this case, the confounding factors of age and vaccine uptake make the above chart useless.
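A small worked example can show how the reversal happens. The counts below are entirely hypothetical, invented for illustration. They are not the Office for National Statistics figures. The two age groups and the names `groups`, `rate_per_1000`, and `combined_rate` are assumptions for this sketch only.

```python
# Hypothetical counts, invented for illustration: (people, deaths) per group.
# The younger group is mostly unvaccinated and the older group is mostly vaccinated.
groups = {
    "younger": {"unvaccinated": (8000, 8), "vaccinated": (2000, 1)},
    "older": {"unvaccinated": (2000, 20), "vaccinated": (8000, 40)},
}


def rate_per_1000(people, deaths):
    """Deaths per 1,000 people."""
    return 1000 * deaths / people


def combined_rate(status):
    """Death rate for one vaccination status with the age groups pooled."""
    people = sum(g[status][0] for g in groups.values())
    deaths = sum(g[status][1] for g in groups.values())
    return rate_per_1000(people, deaths)


# Within each age group, the vaccinated rate is lower.
for name, g in groups.items():
    for status, (people, deaths) in g.items():
        print(f"{name:8} {status:13} {rate_per_1000(people, deaths):.1f} per 1,000")

# Pooled across ages, the vaccinated rate looks higher.
for status in ("unvaccinated", "vaccinated"):
    print(f"combined {status:13} {combined_rate(status):.1f} per 1,000")
```

With these made-up numbers, the vaccinated rate is half the unvaccinated rate in each age group, yet the pooled vaccinated rate comes out higher. That happens only because most vaccinated people sit in the older, higher-mortality group.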

Tags: coronavirus, Simpson's Paradox, vaccination

11Oct / 2021

Scientists with bad data

Tim Harford warns against bad data in science:

Some frauds seem comical. In the 1970s, a researcher named William Summerlin claimed to have found a way to prevent skin grafts from being rejected by the recipient. He demonstrated his results by showing a white mouse with a dark patch of fur, apparently a graft from a black mouse. It transpired that the dark patch had been coloured with a felt-tip pen. Yet academic fraud is no joke.

Tags: fake data, science, Tim Harford

Category Archives: Mistaken Data