DNA-generated face run through facial recognition in attempt to find suspect

In 2017, a police detective requested that a 3-D rendering of a face, generated from crime-scene DNA, be run through facial recognition software in an effort to find a suspect in a 1990 murder. For Wired, Dhruv Mehrotra reports:

The detective’s request to run a DNA-generated estimation of a suspect’s face through facial recognition tech has not previously been reported. Found in a trove of hacked police records published by the transparency collective Distributed Denial of Secrets, it appears to be the first known instance of a police department attempting to use facial recognition on a face algorithmically generated from crime-scene DNA.

This seems like a natural progression, but it is easy to see how pairing the two technologies could cause all sorts of problems when a face is poorly reconstructed and then misidentified by facial recognition. What’s the confidence interval equivalent for a face?

Manual data labeling behind the AI

Part of what makes AI seem magical is that you can enter a question and get a reasonable answer in a human-like tone. For Bloomberg, Davey Alba reports on how some of that magic is just a bunch of people labeling data for low wages:

Other technology companies training AI products also hire human contractors to improve them. In January, Time reported that laborers in Kenya, paid $2 an hour, had worked to make ChatGPT less toxic. Other tech giants, including Meta Platforms Inc., Amazon.com Inc. and Apple Inc. make use of subcontracted staff to moderate social network content and product reviews, and to provide technical support and customer service.

“If you want to ask, what is the secret sauce of Bard and ChatGPT? It’s all of the internet. And it’s all of this labeled data that these labelers create,” said Laura Edelson, a computer scientist at New York University. “It’s worth remembering that these systems are not the work of magicians — they are the work of thousands of people and their low-paid labor.”

Guide for working with machine learning datasets

As part of the Knowing Machines research project, A Critical Field Guide for Working with Machine Learning Datasets, by Sarah Ciston, offers advice for working with large, complex datasets through every phase of their life cycle:

Machine learning datasets are powerful but unwieldy. They are often far too large to check all the data manually, to look for inaccurate labels, dehumanizing images, or other widespread issues. Despite the fact that datasets commonly contain problematic material — whether from a technical, legal, or ethical perspective — datasets are also valuable resources when handled carefully and critically. This guide offers questions, suggestions, strategies, and resources to help people work with existing machine learning datasets at every phase of their lifecycle. Equipped with this understanding, researchers and developers will be more capable of avoiding the problems unique to datasets. They will also be able to construct more reliable, robust solutions, or even explore promising new ways of thinking with machine learning datasets that are more critical and conscientious.

Plus points for framing the guide in a spreadsheet layout.

Sources and attribution for AI-generated images

AI-based image generation takes bits and pieces from existing human-made images and tries to mash those sources together into something new. However, that something new often looks a lot like someone else’s work. It’s why Getty Images is suing Stability AI, the company behind Stable Diffusion.

Stable Attribution goes in the opposite direction: instead of generating an image, it tries to identify the source images behind a given AI-generated one. Load an image, and Stable Attribution looks for the most similar images in the Stable Diffusion training data.

The explainer on the Stable Attribution homepage is a nice abstraction of how the process works.
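
Stable Attribution hasn’t published the details of its matching method, but the basic idea it describes, ranking training images by how similar they are to a query image, can be sketched as a nearest-neighbor search over image embeddings. Everything below is an illustrative assumption rather than the site’s actual pipeline: embed_image is a stand-in for whatever image-embedding model you have on hand (a CLIP-style encoder, say), and the training images are assumed to have been embedded ahead of time.

    import numpy as np

    def most_similar_sources(query_vec, training_vecs, training_ids, k=5):
        # Cosine similarity between the query embedding and every
        # precomputed training-image embedding, highest scores first.
        q = query_vec / np.linalg.norm(query_vec)
        t = training_vecs / np.linalg.norm(training_vecs, axis=1, keepdims=True)
        scores = t @ q
        top = np.argsort(scores)[::-1][:k]
        return [(training_ids[i], float(scores[i])) for i in top]

    # Hypothetical usage: embed_image() stands in for an image-embedding model.
    # query = embed_image("generated.png")
    # matches = most_similar_sources(query, training_embeddings, training_image_ids)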

Manual removal of harmful text to train AI models

AI training data comes from the internet, which, as we know but sometimes forget, has corners that are terrible for people. For Time, Billy Perrigo reports on how OpenAI outsourced the labeling of such data to a firm whose workers had to read disturbing text:

To build that safety system, OpenAI took a leaf out of the playbook of social media companies like Facebook, who had already shown it was possible to build AIs that could detect toxic language like hate speech to help remove it from their platforms. The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.

To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet.
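
The system described above is, at its core, supervised text classification: gather labeled examples of toxic and non-toxic text, train a classifier on them, and use it to screen responses before they reach the user. Here is a minimal sketch of that shape of approach, using scikit-learn and a made-up toy set of labels; it is only an illustration, not OpenAI’s actual detector.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy labeled examples: 1 = toxic, 0 = not toxic. In practice the labels
    # come from human annotators reading large amounts of real, often disturbing text.
    texts = [
        "you are wonderful",
        "thanks, that was a helpful answer",
        "i will hurt you",
        "those people are worthless",
    ]
    labels = [0, 0, 1, 1]

    detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
    detector.fit(texts, labels)

    def screen(candidate_response, threshold=0.5):
        # Estimate the probability that a response is toxic and
        # suppress it if the score crosses the threshold.
        toxicity = detector.predict_proba([candidate_response])[0, 1]
        return None if toxicity >= threshold else candidate_response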

Generative AI trade-offs

People have been having fun with generative AI lately. Enter a prompt and get a believable body of text, or enter descriptive text and get a photorealistic image. But as with all fun things on the internet, there are those looking to exploit the popularity. Maggie Appleton discusses the trade-offs:

There’s a swirl of optimism around how these models will save us from a suite of boring busywork: writing formal emails, internal memos, technical documentation, marketing copy, product announcement, advertisements, cover letters, and even negotiating with medical insurance companies.

But we’ll also need to reckon with the trade-offs of making insta-paragraphs and 1-click cover images. These new models are poised to flood the web with generic, generated content.

You thought the first page of Google was bunk before? You haven’t seen Google where SEO optimizer bros pump out billions of perfectly coherent but predictably dull informational articles for every longtail keyword combination under the sun.

AI-based image generation ethics

AI-based image generation is having a moment. Type some text and you can get a piece of art that resembles the style of your favorite artist. However, there’s an ethical dilemma with the source material. Andy Baio talked to Hollie Mengert, whose artwork was used to create a model for Stable Diffusion:

“For me, personally, it feels like someone’s taking work that I’ve done, you know, things that I’ve learned — I’ve been a working artist since I graduated art school in 2011 — and is using it to create art that that I didn’t consent to and didn’t give permission for,” she said. “I think the biggest thing for me is just that my name is attached to it. Because it’s one thing to be like, this is a stylized image creator. Then if people make something weird with it, something that doesn’t look like me, then I have some distance from it. But to have my name on it is ultimately very uncomfortable and invasive for me.”

AI-generated charts are only tangentially a thing so far. We humans still have a leg up when it comes to the context and meaning behind the data.

Crisis Text Line and data sharing

Crisis Text Line was sharing data with a for-profit business started by its founder. Given the sensitive nature of the data, the relationship understandably seemed questionable at best. Danah Boyd, who serves on the board of Crisis Text Line, provides a detailed view into what happened and why:

The practice of non-profit governance requires collectively grappling with trade-off after trade-off. I have been a volunteer director of the board of Crisis Text Line for 8 years both because I believe in the mission and because I have been grateful to govern alongside amazing directors from whom I constantly learn. This doesn’t mean it’s been easy and it definitely doesn’t mean we always agree. But we do push each other and I learn a lot in the process. We strived to govern ethically, but that doesn’t mean others would see our decisions as such. We also make decisions that do not pan out as expected, requiring us to own our mistakes even as we change course. Sometimes, we can be fully transparent about our decisions; in other situations – especially when personnel matters are involved – we simply can’t. That is the hardest part of governance, both for our people and for myself personally.

AI-generated voice used to fake phone call and steal money

Reporting for The Washington Post, Drew Harwell describes the case of the fake voice used for bad things:

Thieves used voice-mimicking software to imitate a company executive’s speech and dupe his subordinate into sending hundreds of thousands of dollars to a secret account, the company’s insurer said, in a remarkable case that some researchers are calling one of the world’s first publicly reported artificial-intelligence heists.

The managing director of a British energy company, believing his boss was on the phone, followed orders one Friday afternoon in March to wire more than $240,000 to an account in Hungary, said representatives from the French insurance giant Euler Hermes, which declined to name the company.

Publicly available software that makes it straightforward to impersonate others digitally: what could go wrong?
