News organizations blocking OpenAI

Ben Welsh has a running list of the news organizations blocking OpenAI crawlers:

In total, 532 of 1,147 news publishers surveyed by the homepages.news archive have instructed OpenAI, Google AI or the non-profit Common Crawl to stop scanning their sites, which amounts to 46.4% of the sample.

The three organizations systematically crawl web sites to gather the information that fuels generative chatbots like OpenAI’s ChatGPT and Google’s Bard. Publishers can request that their content be excluded by opting out via the robots.txt convention.
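For a sense of what opting out looks like in practice, here is a minimal sketch that checks which AI crawlers a given robots.txt turns away. It assumes the user-agent tokens are GPTBot (OpenAI), Google-Extended (Google AI) and CCBot (Common Crawl). The sample file and the example.com URL are made up for illustration and are not taken from any publisher in the survey.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for the three crawlers named in the excerpt.
AI_CRAWLERS = ("GPTBot", "Google-Extended", "CCBot")


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt text bars from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt: blocks OpenAI and Common Crawl, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE_ROBOTS_TXT))
```

Keep in mind that robots.txt is a request, not an enforcement mechanism. It only works if the crawler chooses to honor it.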

On the web, it used to be that you would write or make something and there would be a link to the thing. Other websites could link to the thing, and people would go to the place with the thing. With this recent AI wave, a lot of the thing ends up elsewhere and no one sees the original place.

Fun times ahead.

Finding a troll’s identity

A troll kept leaving comments on a woman’s TikTok videos, so she figured out who he was by following bits of information.

[Embedded TikTok video by @rx0rcist]

The sleuthing genre of videos that find something based on digital footprints continues to fascinate. Plus, this one is really satisfying. Although it also makes me wonder about privacy and people using the bits of information for bad things.

Save recipes to your email

Yums, by Matthew Phillips, is a quick and simple way to save recipes via email:

Yums is a recipe storage system built on top of email. Content on the web disappears with time, but email you have control over forever.

To use Yums, find a recipe you like and send an email to save@yums.email. In the body paste the link to the recipe you want to save. You can paste as many links as you like, separated by new lines.
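As a rough sketch of those instructions, the snippet below builds such an email with Python's standard library. Only the save@yums.email address and the one-link-per-line body come from the description above. The sender address, the subject line and the recipe links are placeholders, and whether Yums reads the subject at all is an assumption.

```python
from email.message import EmailMessage

YUMS_ADDRESS = "save@yums.email"  # address given in the Yums description


def build_yums_email(sender: str, recipe_links: list[str]) -> EmailMessage:
    """Build an email whose body is the recipe links, one per line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = YUMS_ADDRESS
    msg["Subject"] = "Recipes to save"  # placeholder; not specified by Yums
    msg.set_content("\n".join(recipe_links))
    return msg


# Hypothetical sender and links, for illustration only.
message = build_yums_email(
    "me@example.com",
    ["https://example.com/soup", "https://example.com/bread"],
)
print(message.get_content())
```

Actually sending the message is left out. It could go through any mail client, or through `smtplib` with your own provider's settings.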

Super easy and the email is just the recipe without the backstory. I understand why people include all the wordage and pictures before the actual recipe, but it’s gotten out of hand.

Directory of date-me docs

Instead of using dating apps, some have turned to the date-me doc, which is a single, view-only page about the individual. It’s appealing, because it’s a flexible format that lets you include what you want and how you want. With the surge in interest, of course someone started to tabulate the docs.

Oddly specific ad profiles

Advertising funds a big chunk of the web, but for advertisers to continue to spend, their placements have to deliver results. So companies collect data about people’s online activity and create profiles based on the behavior. For The Markup, Jon Keegan and Joel Eastwood dig into the specificity of these profiles.

Profiles get stuck in segments or groups, and advertisers can choose which segment to put ads in front of. The above are finance-based segments. I’ve always dreamed of being a “Silver Sophisticate” myself.

You can download the data the project is based on here.

NYT switches to CDC data for their Covid dashboard

After three years, The New York Times is switching away from local data collection to data from the Centers for Disease Control and Prevention:

As the virus began to spread rapidly in the United States in March 2020, it became clear that there was no single source that tracked infections at the local level. In the absence of comprehensive government data, The Times quickly built a custom system for gathering, vetting and publishing data from more than 100 state and local government sources.

By collecting the data continually, and from multiple levels of government, The Times was able to map the spread of the virus, with updated information published several times a day.

It’s sad that NYT had to collect data at all, but I’m glad they did. Those Covid pages were an invaluable resource those first two years.

Data warehouse at the supermarket

Grocery stores with loyalty programs collect data on what and when you buy at their stores. Then they sell that data, because of course they do. For The Markup, Jon Keegan delves into why that matters when two big companies, Kroger and Albertsons, plan to get together:

In October 2022, Kroger and another top supermarket chain, Albertsons, announced plans for a $24.6 billion merger that would combine the top two supermarket chains in the U.S., creating stiff competition for Walmart, the overall top seller of groceries. U.S. regulators and members of Congress are scrutinizing the deal, including by examining its potential to erode privacy: Kroger has carefully grown two “alternative profit business” units that monetize customer information, expected by Kroger to yield more than $1 billion in “profits opportunity.” Folding Albertsons into Kroger will potentially add tens of millions of additional households to this data pool, netting half the households in America as customers.

Sleuthing for birth dates, with just TikTok profiles as clues

TikTok user notkahnjunior figures out people’s birth dates through the pseudo-privacy of the internet. People give her their TikTok profile, and she takes it from there.

[Embedded TikTok video by @notkahnjunior]

No special tools required. Just web searches coupled with interactions among those who don’t know or care about privacy on the internets. It seems too easy. But it is also entertaining.

Border enforcement data collection

Drew Harwell, for The Washington Post, reporting on a growing database and who has access to the records:

The rapid expansion of the database and the ability of 2,700 CBP officers to access it without a warrant — two details not previously known about the database — have raised alarms in Congress about what use the government has made of the information, much of which is captured from people not suspected of any crime. CBP officials told congressional staff the data is maintained for 15 years.

Details of the database were revealed Thursday in a letter to CBP Commissioner Chris Magnus from Sen. Ron Wyden (D-Ore.), who criticized the agency for “allowing indiscriminate rifling through Americans’ private records” and called for stronger privacy protections.

Open cameras and AI to locate Instagram photos

Dries Depoorter recorded video from open cameras for a week and scraped Instagram photos. Then he used AI to identify the people in the photos and their locations. Depoorter calls it The Follower.