If you’re into basketball data, Sravan Pannala is keeping a running list of data sources, apps, and visualizations on the topic. I can always use more basketball data, so I’ll bookmark this for later.
Tags: basketball, Sravan Pannala
Posted in basketball, Data Sources, Sravan Pannala
Data for the 2022 Survey of Consumer Finances, from the Federal Reserve Board, was made available recently. It comes out every three years, so you can see how income and savings have changed over time, broken down by demographics.
For example, the chart above shows that net worth increased across age groups since the last release in 2019. The chart is from the SCF’s “chartbook,” which isn’t the most elegant thing in the world, but it works.
Tags: Federal Reserve, income
Posted in Data Sources, Federal Reserve, income
Mike Breen is a well-known NBA announcer. When a player hits a big three-pointer, Breen often yells his catchphrase, “Bang!” Someone counted all the times he yelled it for each player, because, sure, why not. Stephen Curry and Klay Thompson are a cut above the rest.
Tags: bang, basketball, Mike Breen
Posted in bang, basketball, Data Sources, Mike Breen
For their research on tattoos and choice, Bradley Ruffle and Anne Wilson provide the dataset for perusal:
Survey and experimental evidence documents discrimination against tattooed individuals in the labor market and in commercial transactions. Thus, individuals’ decision to get tattooed may reflect short-sighted time preferences. We show that, according to numerous measures, those with tattoos, especially visible ones, are more short-sighted and impulsive than the non-tattooed. Almost nothing mitigates these results, neither the motive for the tattoo, the time contemplated before getting tattooed nor the time elapsed since the last tattoo. Even the expressed intention to get a(nother) tattoo predicts increased short-sightedness and helps establish the direction of causality between tattoos and short-sightedness.
It’s a couple hundred responses from tattooed individuals, but maybe there’s something there. The dataset is linked toward the bottom of the page.
Tags: tattoos
Posted in Data Sources, tattoos
As part of the Knowing Machines research project, A Critical Field Guide for Working with Machine Learning Datasets, by Sarah Ciston, offers advice for working through the life cycle of large, complex datasets:
Machine learning datasets are powerful but unwieldy. They are often far too large to check all the data manually, to look for inaccurate labels, dehumanizing images, or other widespread issues. Despite the fact that datasets commonly contain problematic material — whether from a technical, legal, or ethical perspective — datasets are also valuable resources when handled carefully and critically. This guide offers questions, suggestions, strategies, and resources to help people work with existing machine learning datasets at every phase of their lifecycle. Equipped with this understanding, researchers and developers will be more capable of avoiding the problems unique to datasets. They will also be able to construct more reliable, robust solutions, or even explore promising new ways of thinking with machine learning datasets that are more critical and conscientious.
Plus points for framing the guide in a spreadsheet layout.
Tags: ethics, guide, machine learning, Sarah Ciston
Posted in Data Sources, Ethics, guide, machine learning, Sarah Ciston
There’s a database of feathers called Featherbase, because of course there is:
Featherbase is a working group of German feather scientists and other collectors worldwide who came together with their personal collections and created the biggest and most comprehensive online feather library in the world. Using our website, it is possible to identify feathers from hundreds of different species, compare similarities between them, work out gender or age-specific characteristics and look at the statistics of countless feather measurements.
Posted in birds, Data Sources, feather
For TechCrunch, Zack Whittaker reporting:
In its second ruling on Monday, the Ninth Circuit reaffirmed its original decision and found that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA, which governs what constitutes computer hacking under U.S. law.
The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo.
Posted in Data Sources, Public, scraping
For privacy reasons, there’s a 72-year restriction on individual Census records, which include names and addresses. Today marks 72 years since the 1950 Census, so its scanned paper records are now available for browsing and downloading.
Posted in Archive, census, Data Sources, history
In an effort to make gender inequalities more visible, the World Bank updated its Gender Data Portal:
The World Bank Group has redesigned its Gender Data Portal with these audiences in mind by offering over 900 gender indicators in different formats, ranging from raw data to appealing visualizations and stories. Making sex-disaggregated data easier to analyze, interpret and visualize will bring into focus gender issues that are frequently invisible, including on topics such as digital development, transport, and water. It will highlight existing gender gaps as well as gaps in the availability of gender data.
Tags: gender, World Bank
Posted in Data Sources, gender, World Bank
After a lot of angst over the past few years about undercounting, representation, and anonymization, the Census Bureau released detailed data from the 2020 decennial census:
The U.S. Census Bureau today released additional 2020 Census results showing an increase in the population of U.S. metro areas compared to a decade ago. In addition, these once-a-decade results showed the nation’s diversity in how people identify their race and ethnicity.
“We are excited to reach this milestone of delivering the first detailed statistics from the 2020 Census,” said acting Census Bureau Director Ron Jarmin. “We appreciate the public’s patience as Census Bureau staff worked diligently to process these data and ensure it meets our quality standards.”
These statistics, which come from the 2020 Census Redistricting Data (Public Law 94-171) Summary File, provide the first look at populations for small areas and include information on Hispanic origin, race, age 18 and over, housing occupancy and group quarters. They represent where people were living as of April 1, 2020, and are available for the nation, states and communities down to the block level.
Tags: Census Bureau, demographics
Posted in Census Bureau, Data Sources, demographics