Discrimination algorithms

Claire Cain Miller for the Upshot on when algorithms discriminate:

There is a widespread belief that software and algorithms that rely on data are objective. But software is not free of human influence. Algorithms are written and maintained by people, and machine learning algorithms adjust what they do based on people’s behavior. As a result, say researchers in computer science, ethics and law, algorithms can reinforce human prejudices.

I bring this up often, because I apparently still hold a grudge, but I will always remember the time I told someone I study statistics. He responded skeptically, "Don't computers do that for you?"

In the words of Jeffrey Heer: "It's an absolute myth that you can send an algorithm over raw data and have insights pop up."


Data into art and connecting to humans

Visualization tends to rest in the realm of efficiency and accuracy. From a research perspective, these are easier things to measure than, say, emotion and connection to the data that a visualization represents. In decision-making and, well, just overall opinion about the world we live in, social aspects of data play a significant role. The Creators Project interviewed data artists who work on this fuzzier side of insight.


Just Skin Deep — Your Immune System at the Surface

The skin is the human body’s largest organ. At 1.8 square meters for the average adult, skin covers about as much area as a large closet, and accounts for 12-15% of total body weight. The incredible variation in skin …

The post Just Skin Deep — Your Immune System at the Surface appeared first on PLOS Blogs Network.

The dots are people

Fatalities in Baghdad

The simple analysis is to approach data blind, as machine output. But this almost always produces an incomplete analysis and a detached, less than meaningful visualization. Jacob Harris, a developer at the New York Times, talks context, empathy, and what the dots represent.

In reference to the New York Times' map of deaths in Baghdad after receiving the Wikileaks war logs:

Before it was a final graphic though, it was a demo piece I hastily hacked into Google Earth using its KML format. I remember feeling pretty proud of myself at how cool even a crude rendering like this looked, and the detailed work I had done to pull out all the data within reports to see these dots surge and wane as I dragged the slider. Then I remembered that each of those data points was a life snuffed out, and I suddenly felt ashamed of my pride in my programming chops. As data journalists, we often prefer the "20,000 foot view," placing points on a map or trends on a chart. And so we often grapple with the problems such a perspective creates for us and our readers—and from a distance, it's easy to forget the dots are people. If I lose sight of that while I am making the map, how can I expect my readers to see it in the final product?


Chart none of the things

When it comes to storytelling, copious amounts of data often means lots of charts. Sometimes though, a chart isn't what you need. Sarah Slobin, a graphics editor for the Wall Street Journal, talks about such an experience. The urge was to chart all the things, but in the end, there was a better route.

Losing the graphics made sense to all of us on the project. What worked best for the story won out, as it should. We didn't need graphics for the sake of graphics, especially graphics that weren't working in service of the piece. And photos, while not numbers, are also data in their own right. My own internal calculus, data = charts, was based on habit and that habit had become like armor over time, I put it on without thinking before trudging off to battle. So now, at the outset of each project, I’m working on learning to be really honest with myself each time I sort through a set of statistics; "What does the reader really need here?" Not, "What cool thing can I do with these data?"


Inadvertent algorithmic cruelty

If you logged into Facebook the past couple of weeks, you saw your friends' automatically generated year-end reviews. Estimated events and popular pictures appear in chronological order. Facebook eventually pinned your own year in review at the top of your feed for perusal. Seems harmless — until you realize there are people who don't want to look back, like Eric Meyer, whose daughter died this year.

And I know, of course, that this is not a deliberate assault. This inadvertent algorithmic cruelty is the result of code that works in the overwhelming majority of cases, reminding people of the awesomeness of their years, showing them selfies at a party or whale spouts from sailing boats or the marina outside their vacation house.

But for those of us who lived through the death of loved ones, or spent extended time in the hospital, or were hit by divorce or losing a job or any one of a hundred crises, we might not want another look at this past year.

See also Meyer's follow-up. While many took the original post as a way to hate on Facebook, Meyer didn't mean it like that.


Explorations of People Movements

Running

In 2010, I surveyed visual explorations of traffic, and it was all about how cars, planes, trains, and ships moved about their respective landscapes. It was implied that the moving things had people in them, but the focus was mostly on the things themselves. Location data was a byproduct of the need of vehicles to get from point A to point B in the most efficient way possible.

Airplanes floated across the sky. Cabs left ghostly trails in the city. The visualization projects were, and still are, impressive.

However, around the same time, it was growing more common for people to carry phones with GPS capability, and these days it's commonplace in areas where most people use smartphones. This new data source gave rise to similar but different visualization projects that were more granular.

We see people. Movements.

Aggregated estimates

Let's not get ahead of ourselves though. Data for people movements has been around for a while. It's just that it came as aggregated estimates — and other forms of course, as you go back further. I mean, the ever so popular Minard chart of Napoleon's March shows people moving.

During this century though, way back in 2007 (a whole six years ago) I played with some global migration data, and that was only on the country level. It also didn't include all the countries in their entirety if I remember correctly.

Getting more local and recent, Jon Bruner used data from the Internal Revenue Service to show migration in the United States, at the county level. Red lines represent a net migration leaving the selected county, and black lines represent arrivals.

Where Americans are moving

Bruner followed up the next year with a more refined version. The map seemed to re-spark interest in migration in other places, such as Carlo Zapponi's peoplemovin project, which still used lines connecting regions but went sans map.

Moving to the USA

WNYC's The Brian Lehrer Show asked listeners who moved to or from New York about where they came from or where they went and then posted the data so that people could play around with it. Interesting projects came out of that little experiment.

Even earlier this year, Hyperakt and Ekene Ijeoma visualized refugee data that goes back to 1975.

Refugee project

So there's a lot of data that shows how people move, but until recent years, it's been typically in aggregate and only reveals endpoints.
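To make the "aggregate endpoints" idea concrete, here is a minimal sketch of how net migration falls out of origin-destination counts. The county names and counts are invented for illustration, and `net_migration` is a hypothetical helper, not something taken from the IRS data or from any of the projects above.

```python
from collections import defaultdict

# Hypothetical aggregated flows: (origin, destination, people moved).
# These values are invented for illustration only.
flows = [
    ("County A", "County B", 120),
    ("County B", "County A", 80),
    ("County A", "County C", 50),
    ("County C", "County A", 90),
]


def net_migration(flows):
    """Return arrivals minus departures for each place in the flow list."""
    net = defaultdict(int)
    for origin, destination, count in flows:
        net[origin] -= count       # people leaving the origin
        net[destination] += count  # people arriving at the destination
    return dict(net)


if __name__ == "__main__":
    for county, value in sorted(net_migration(flows).items()):
        direction = "net arrivals" if value >= 0 else "net departures"
        print(f"{county}: {value:+d} ({direction})")
```

Note what is missing: the table says how many people left one place and arrived at another, but nothing about the route or the timing in between.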

Single points from a lot of people

Like I said though, GPS in phones brought a different type of data. And it wasn't just directions to a place that remained isolated in your car. People check in to places (restaurants, stores, airports) and they share it with others. Many keep their location data public.

Instead of aggregated estimates, you can infer the movement of individuals, through individual check-ins. For example, Foursquare looked at transportation over a year via people who checked in at airports, railway stations, and highways.

foursquare travels

And there are plenty of Foursquare-related maps that let readers make their own inferences. Given enough people with smartphones who use the service, you can see where those people go. The now defunct Weeplaces was a fun interactive that let you explore your check-in data.

This single point check-in data eventually found itself on other services, such as Facebook, Twitter, and Flickr, attached to status updates, tweets, and photos. Location was often implied with these objects, but it soon transitioned to latitude and longitude coordinates.

Anonymous and separate, the data points aren't always immediately informative. See them all at once? You often see obvious patterns. This drove many of Eric Fischer's maps. Zoom in enough and his maps just look like a bunch of dots; zoom out and the activity of the masses emerges.

For example, Fischer looked at where tourists and locals go in major cities, based on geotagged Flickr photos.

Tourist pictures in San Francisco

The map looks like a lot of connecting lines in the view above. They're actually a bunch of dots that are close enough together that the streets in San Francisco grow apparent. Blue dots represent locals whereas red dots represent tourists. (It looks like there's a dichotomy between locals and tourists, but the red dots got precedence in this map, overlaid on top of the blue. Still, it's an interesting contrast.)

Fischer followed up the next year with a contrast between geotagged Flickr photos and tweets on Twitter. He called the series See Something or Say Something, and the idea was that people photograph areas with notable things whereas tweets are more everyday. He provided country-level views, but the zoomed in city-level views were far more interesting.

The granular data shows details that the aggregates obscure. That's what's so interesting. And it was just the beginning. More continuous tracking and the idea of phones as sensors were building momentum too.
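One simple way to see how separate points turn into a pattern is to count them per grid cell. The sketch below is only an illustration under assumptions: the coordinates are made up, `bin_points` is a hypothetical helper, and a fixed grid in degrees is just one of many possible binning choices, not the method behind any of the maps above.

```python
import math
from collections import Counter


def bin_points(points, cell_size=0.01):
    """Count (lat, lon) points per grid cell of `cell_size` degrees."""
    counts = Counter()
    for lat, lon in points:
        # Each cell is identified by the integer grid index of its corner.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical geotagged points (latitude, longitude).
    points = [
        (37.7751, -122.4194),
        (37.7755, -122.4199),
        (37.8081, -122.4098),
    ]
    for cell, n in bin_points(points).most_common():
        print(cell, n)
```

A single point says little on its own, but cells with high counts start to trace out streets and landmarks once enough points pile up.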

Many points from a few people

Not surprisingly, continuous tracking seemed to first gain wider adoption in sports. Athletes want to improve their performance, so if there's a more concrete way to see what makes them better and by how much, then all the better. However, the technology could be expensive, and at other times it was rough around the edges, involving video cameras and manual labor to put together some form of stop motion.

The technology and demand eventually caught up to each other though, which turns out better for everyone who's interested in their movements. Made by Humans, an artistic interpretation of athletes in motion, is one of my favorites.

These days, it's hard to watch sports without sports analytics entering the picture. It used to be just random sports facts, but now there are sensors everywhere to track basketball players, log serving speed and ball placement in tennis, and survey quarterback passes at a higher accuracy than before.

Tim Duncan movements

Similar things exist for soccer, baseball, golf, and probably ping pong, bowling, and underwater basket weaving.

Many points from a lot of people

Enter the tracking for individuals today. Many apps for movement still focus on sports-like activity: Strava, RunKeeper, MapMyRun, Endomondo, and plenty more. The data is useful for people with the apps, because they can keep track of how far and how fast they walked, ran, or cycled. People can use this information to set goals and to improve their performance. Some just want to be more healthy.
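How far and how fast can be estimated from a sequence of timestamped GPS samples. The following is a rough sketch, assuming samples arrive as `(seconds, latitude, longitude)` tuples and using the haversine formula with a mean Earth radius. The function names and sample format are hypothetical, and this is not how any particular app computes its numbers.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_summary(samples):
    """Return (distance_km, average_kmh) for time-ordered GPS samples.

    `samples` is a list of (seconds, lat, lon) tuples (assumed format).
    """
    if len(samples) < 2:
        return 0.0, 0.0
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(samples, samples[1:])
    )
    elapsed_hours = (samples[-1][0] - samples[0][0]) / 3600
    speed = distance / elapsed_hours if elapsed_hours > 0 else 0.0
    return distance, speed


if __name__ == "__main__":
    # Hypothetical run: three samples, five minutes apart.
    run = [(0, 38.8895, -77.0353), (300, 38.8895, -77.0230), (600, 38.8977, -77.0230)]
    km, kmh = track_summary(run)
    print(f"{km:.2f} km at {kmh:.1f} km/h")
```

Summing straight-line segments underestimates the true path when samples are sparse, which is one reason sampling rate matters for this kind of data.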

However, just like the check-in maps of single data points, the path data from running and cycling can also be useful when you look at data from many people. For example, with limited access to RunKeeper data, I was able to get a sense of where people run in major cities. Here's a look at running in Washington, D.C.:

Where people run

Like I said, I only had access to some of the RunKeeper data. Strava released a more expansive interactive map that lets you see running and cycling around the world.

Strava cycling map

These apps used to drain your battery, because it took a lot of resources to constantly grab your location for long periods of time. That's why most people only logged their physical activity. It didn't last all day.

A project I was a part of in graduate school, pre-iPhone, tracked location throughout the day. It tried to at least. A separate GPS device connected to the phone via bluetooth, and we carried a second battery pack to replace the drained one midday.

Now you can download an app that uses your phone's built-in GPS. Leave it on all day to run in the background, and sometimes you don't even notice. Why just track running and cycling when you can track your movements all day? The Moves app, the "activity diary for your life" (and recently acquired by Facebook), stays on all day to log your location and estimates steps, calories burned, and your activities. Human is a similar app. They just released a bunch of maps that show activity in major cities.

Human maps

I use OpenPaths, because I like the data model for privacy and selective sharing. Although I have my eye on an actual GPS device that stores the data locally on one of those memory card things instead of the cloud.

Looking ahead

We can log our location all day every day, with little effort. What next? Well, the apps can go to even higher granularity with increased sampling rates. Whatever the new phones allow, I guess. More sensors? Sure. Continuous data? Alright.

At some point though I hope that there are more people who care about where their data goes than there are who do not. I hope that people wonder how services use their data before they sign up for it. Why does company X provide such a neat service for free? I don't buy into the whole "if you're not paying, you're the product blah blah blah" stuff, and the reason behind a service can be beneficial to a community. But, it's important to know where such personal information — your location and movements — wanders off to in the endlessness of someone else's servers.

For example, German politician Malte Spitz provided six months of phone data to Zeit Online a couple of years ago to demonstrate what such data revealed. It showed where he went and how and when he made phone calls.

How comfortable would you be if a company had a similar profile of you, and what if they used the data for purposes other than improving the service they provide you?

As another example, Strava started a for-fee program that gives cities a license to access anonymized cycling data. The Oregon Department of Transportation paid $20,000 for a one-year license. The hope is that cities can use the detailed data to improve cycling routes. That would be great. However, what if a city closes routes, based on what they see in the data, rather than upgrade or maintain existing paths?

I'm not saying this has happened or will happen, but if there aren't enough people who care about where their data goes, it's easy to see how a less savory company might aim for profit over the good of its users. Naturally this goes for other types of data too — not just location.

At the end of the day, it's great that we can log detailed data almost effortlessly. But also, now is as good a time as any to remind ourselves of the value of our personal information.

Modern Humans: Were We Really Better than Neanderthals, or Did We Just Get Lucky?


We’ve all heard the story: dim-witted Neanderthals couldn’t quite keep up with our intelligent modern human ancestors, leading to their eventual downfall and disappearance from the world we know now. Apparently they needed more brain space for their eyes. The …

The post Modern Humans: Were We Really Better than Neanderthals, or Did We Just Get Lucky? appeared first on PLOS Blogs Network.

Looking for a bioinformatics expert?

What I have to offer:
  • A balanced background in both biology (BSc) and computer science (BCS)
  • Soon to be completed PhD
  • Extensive research experience in bioinformatics, genomics, phylogenetics/phylogenomics, evolution, and bacterial pathogenesis
  • Some previous research experience in medical imaging, ontology development, and metagenomics
  • An impressive publishing record (7 papers, 3 as first author, 2 more as first author under review)
  • Solid computational skills including Perl programming, database design (MySQL), parallel programming, and web design (PHP & JavaScript)
  • Good communication and social skills
  • More information
What I am looking for:
  • Post-doc or job (academic or industrial)
  • Preferably, a position where I have some significant managerial or leadership responsibilities
  • Geographically interested in northeastern parts of North America (Ottawa down to New York), but would entertain positions elsewhere in N.A.
I didn't put any limitations on research interests, since I am open to many areas. However, anything having to do with the human microbiome project, human-bacteria interactions, or metagenomics would be of particular interest.

Please email me if you are interested or if you have suggestions on some good openings.