Collection of NBA basketball data sources and apps

If you’re into basketball data, Sravan Pannala is keeping a running list of data sources, apps, and visualizations on the topic. I can always use more basketball data, so I’ll bookmark this for later.


Data on net worth, income, and savings

Data for the 2022 Survey of Consumer Finances, from the Federal Reserve Board, was made available recently. It comes out every three years, so you can see how income and savings have changed over time, broken down by demographics.

For example, a chart from the SCF’s “chartbook” shows that net worth increased across age groups since the last release in 2019. The chartbook isn’t the most elegant thing in the world, but it works.


When Mike Breen announces “Bang!”

Mike Breen is a well-known NBA announcer. When a player hits a big three-pointer, Breen often uses his catchphrase, “Bang!” Someone counted all the times he yelled the phrase for each player, because sure, why not. Stephen Curry and Klay Thompson are a cut above the rest.


Tattoos and impulsiveness dataset

For their research on tattoos and choice, Bradley Ruffle and Anne Wilson provide the dataset for perusal:

Survey and experimental evidence documents discrimination against tattooed individuals in the labor market and in commercial transactions. Thus, individuals’ decision to get tattooed may reflect short-sighted time preferences. We show that, according to numerous measures, those with tattoos, especially visible ones, are more short-sighted and impulsive than the non-tattooed. Almost nothing mitigates these results, neither the motive for the tattoo, the time contemplated before getting tattooed nor the time elapsed since the last tattoo. Even the expressed intention to get a(nother) tattoo predicts increased short-sightedness and helps establish the direction of causality between tattoos and short-sightedness.

The sample includes only a couple of hundred responses from tattooed individuals, but maybe there’s something there. The dataset is linked towards the bottom of the page.


Guide for working with machine learning datasets

As part of the Knowing Machines research project, A Critical Field Guide for Working with Machine Learning Datasets, by Sarah Ciston, offers advice for working through the life cycle of complex and large datasets:

Machine learning datasets are powerful but unwieldy. They are often far too large to check all the data manually, to look for inaccurate labels, dehumanizing images, or other widespread issues. Despite the fact that datasets commonly contain problematic material — whether from a technical, legal, or ethical perspective — datasets are also valuable resources when handled carefully and critically. This guide offers questions, suggestions, strategies, and resources to help people work with existing machine learning datasets at every phase of their lifecycle. Equipped with this understanding, researchers and developers will be more capable of avoiding the problems unique to datasets. They will also be able to construct more reliable, robust solutions, or even explore promising new ways of thinking with machine learning datasets that are more critical and conscientious.

Plus points for framing the guide in a spreadsheet layout.


Database of feathers

There’s a database of feathers called Featherbase, because of course there is:

Featherbase is a working group of German feather scientists and other collectors worldwide who came together with their personal collections and created the biggest and most comprehensive online feather library in the world. Using our website, it is possible to identify feathers from hundreds of different species, compare similarities between them, work out gender or age-specific characteristics and look at the statistics of countless feather measurements.


Scraping public data ruled legal

Zack Whittaker, reporting for TechCrunch:

In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.

The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.


1950 Census released by U.S. National Archives

For privacy reasons, there’s a 72-year restriction on individual Census records, which include names and addresses. Today marks 72 years since the 1950 Census, so the National Archives released the records. The scanned paper records are available for browsing and downloading.


World Bank’s Gender Data Portal

In an effort to make gender inequalities more obvious, the World Bank updated its Gender Data Portal:

The World Bank Group has redesigned its Gender Data Portal with these audiences in mind by offering over 900 gender indicators in different formats, ranging from raw data to appealing visualizations and stories. Making sex-disaggregated data easier to analyze, interpret and visualize will bring into focus gender issues that are frequently invisible, including on topics such as digital development, transport, and water. It will highlight existing gender gaps as well as gaps in the availability of gender data.


More detailed data release from Census 2020

After a lot of angst over the past few years around undercount, representation, and anonymization, the Census Bureau released detailed data from the 2020 decennial census:

The U.S. Census Bureau today released additional 2020 Census results showing an increase in the population of U.S. metro areas compared to a decade ago. In addition, these once-a-decade results showed the nation’s diversity in how people identify their race and ethnicity.

“We are excited to reach this milestone of delivering the first detailed statistics from the 2020 Census,” said acting Census Bureau Director Ron Jarmin. “We appreciate the public’s patience as Census Bureau staff worked diligently to process these data and ensure it meets our quality standards.”

These statistics, which come from the 2020 Census Redistricting Data (Public Law 94-171) Summary File, provide the first look at populations for small areas and include information on Hispanic origin, race, age 18 and over, housing occupancy and group quarters. They represent where people were living as of April 1, 2020, and are available for the nation, states and communities down to the block level.
