Introduction to Data Science, an open source book

Introduction to Data Science, by Harvard biostatistics professor Rafael A. Irizarry, is an open source book that provides, as you might have guessed, an introduction to data science:

The demand for skilled data science practitioners in industry, academia, and government is rapidly growing. This book introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning.


Unemployed data scientist

It seems a lot of data scientists have either left or were laid off from their jobs during the past few months. Jacqueline Nolis and Emily Robinson, data scientists who hosted a podcast and wrote a book on building a career in the field, happened to be in the lot. So naturally, they brought back the podcast for a bonus episode on their experiences with sudden unemployment and the job search.

I’ve never had a “real” job (as some tend to tell me), so workplace experiences are always interesting to me, like peering into an aquarium. The layoff process seems not fun.


Revisiting data science, the career

In 2012, Thomas Davenport and DJ Patil outlined a budding career choice called “data science” where people, with a combination of programming and statistics, made sense of “big” datasets. For Harvard Business Review, Davenport and Patil revisit the career ten years later:

A decade later, the job is more in demand than ever with employers and recruiters. AI is increasingly popular in business, and companies of all sizes and locations feel they need data scientists to develop AI models. By 2019, postings for data scientists on Indeed had risen by 256%, and the U.S. Bureau of Labor Statistics predicts data science will see more growth than almost any other field between now and 2029. The sought-after job is generally paid quite well; the median salary for an experienced data scientist in California is approaching $200,000.

So data science is looking pretty strong.

At the time of their first article, I was writing my dissertation, aiming to finish my PhD in statistics in a year. I wondered what I was going to do after. Statisticians, including me, were resistant to data science, or at the least had mixed feelings about it. They felt they were already doing it, so there was no need for a new field of study. Plus, statistician was already declared the “sexy” job of the decade three years prior. We still had time left.

I don’t hear those arguments anymore. There were overlapping skills to start, and the overlap seemed to increase over time. The label seemed to grow less important, as statisticians became data scientists and data scientists learned more analysis.

When people ask me what I do, I don’t say that I’m a statistician. I just say I help interpret data, and if I’m pressed, I say that I make a lot of charts.


NLM announces rescheduled Curation at Scale Workshop

Data curation plays a critical role in today’s biomedical research and ensures scientific data will be accessible for future research and reuse. In the time of pandemics, the need to get scientific information to researchers, medical personnel, and the public as quickly as possible is greater than ever before. In response to the need for …

The post NLM announces rescheduled Curation at Scale Workshop appeared first on NCBI Insights.

New positions: Data Scientist in Public Health Epidemiology and Postdoc in Statistical Methods

I am looking to fill two positions at the Big Data Institute, Nuffield Department of Population Health, University of Oxford: a Data Scientist in Public Health Epidemiology and a Postdoctoral Researcher in Statistical Methods.

The Big Data Institute (BDI) is an interdisciplinary research centre that develops, evaluates and deploys efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The Nuffield Department of Population Health (NDPH), a key partner in the BDI, contains world-renowned population health research groups and is an excellent environment for multi-disciplinary teaching and research.  

The role of the Data Scientist in Public Health Epidemiology is to help pilot a project developing systems for continuous record linkage between a large Public Health England (PHE) data source and other population health records, with the aim of facilitating research into infectious diseases.

The post holder will manage and develop record linkage algorithms that compare records against relational databases containing health records, using appropriate anonymization protocols. They will also manage and develop systems for identifying incoming records of interest, for near-real-time updating of SQL databases, and for issuing email and SMS alerts in response to these events. The responsibilities also include contributing to large-scale statistical studies that use public health records to investigate disease epidemiology, analysing and interpreting results, reviewing and refining working hypotheses, writing reports, and presenting findings to colleagues.

To be considered, applicants will hold a degree in Computer Science, Data Science, Statistics, or another relevant subject with a strong quantitative component, or have equivalent experience. They will also need an understanding of relational database construction and SQL queries, experience coding in at least one common programming language (e.g. C#, Java, Python) and good interpersonal skills with the ability to work closely with others as part of a team, while taking personal responsibility for assigned tasks.

The role of the Postdoctoral Researcher in Statistical Methods is to develop statistical methods based on the harmonic mean p-value (HMP) approach. The HMP bridges classical and Bayesian approaches to model-averaged hypothesis testing, with applications to very large-scale data analysis problems in biomedical science.
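For readers who have not come across the HMP, here is a minimal sketch of the raw statistic, assuming the usual weighted form: the sum of the weights divided by the sum of each weight over its p-value. The function name `harmonic_mean_p` and the example p-values are made up for illustration and do not come from the job description. The raw value is only the combining step; turning it into a calibrated significance test takes further adjustment that is not shown here.

```python
def harmonic_mean_p(p_values, weights=None):
    """Return the weighted harmonic mean of a list of p-values.

    Illustrative helper (hypothetical name). With no weights given,
    every test gets the same weight 1/n.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / n] * n  # equal weights that sum to 1
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Sum of weights divided by the sum of weight / p-value.
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Made-up p-values from three hypothetical tests.
combined = harmonic_mean_p([0.01, 0.20, 0.50])
print(round(combined, 4))  # 3/107, about 0.028
```

Small p-values dominate the harmonic mean, so one strong signal among many weak ones still pulls the combined value down, which is part of why the approach suits very large numbers of tests.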

The post holder will join a team with expertise in statistical inference, population genetics, genomics, evolution, epidemiology and infectious disease. The responsibilities will include developing statistical methods based on the HMP, undertaking research under the direction of the principal investigator, helping with supervision within the project as required, driving forward manuscripts for publication in collaboration with group members and disseminating results through other means such as academic conferences.

To be considered, applicants will hold, or be close to completion of, a PhD/DPhil involving statistical methods development, and will have a track record of publication-quality work in statistical theory or methods development. The ability to work independently in pursuing the goals of an agreed research plan, excellent interpersonal skills, and the ability to work closely with others as a team are also essential.

The closing date for both positions is noon on 5 May 2021. Only applications received through the online system will be considered.

Jobs of a data scientist

Roger Peng outlines four main roles of a data scientist:

If you’re reading this and find yourself saying “I’m not an X” where X is either scientist, statistician, systems engineer, or politician, then chances are that is where you are weak at data science. I think a good data scientist has to have some skill in each of these domains in order to be able to complete the basic data analytic iteration.

The good thing about data science is that you can apply the skills to different fields and tasks. It’s also one of the challenges when you’re in the early phases of learning, because you have to figure out what to work on. This should point you in the right direction.

See also: Peng’s tentpoles of data science.


Audit advanced data science course online

Jeff Leek and Roger Peng started their course Advanced Data Science at Johns Hopkins University. It’s meant for JHU students, but you can learn from the weekly course material for free:

The class is not designed to teach a set of statistical methods or packages – there are a ton of awesome classes, books, and tutorials about those things out there! Rather the goal is to help you to organize your thinking around how to combine the things you have learned about statistics, data manipulation, and visualization into complete data analyses that answer important questions about the world around you.

So you know the methods and tools (or how to learn them on your own), but you want to learn more about putting it all together.

Nice. I could probably use a refresher.

You can get the weekly updates here.


Computational Medicine Codeathon and AWS workshop at Chapel Hill in March

NIH is pleased to announce a computational medicine-focused codeathon. To apply, please complete the application form by February 25, 2020. We will also be offering a free workshop, AWS Technical Essentials, the day before the codeathon. Read on for more information …

Request for proposals: Single Cell in the Cloud codeathon at NYGC in January

The New York Genome Center is hosting an NCBI Single Cell in the Cloud codeathon from January 15-17, 2020. Submissions for project proposals are due December 2nd. Please submit your proposal and apply here. What topics are in scope? This codeathon …