Introduction to Modern Statistics

Introduction to Modern Statistics by Mine Cetinkaya-Rundel and Johanna Hardin is a free-to-download book:

Introduction to Modern Statistics is a re-imagining of a previous title, Introduction to Statistics with Randomization and Simulation. The new book puts a heavy emphasis on exploratory data analysis (specifically exploring multivariate relationships using visualization, summarization, and descriptive models) and provides a thorough discussion of simulation-based inference using randomization and bootstrapping, followed by a presentation of the related Central Limit Theorem-based approaches.

Read it in the browser or buy a print version. A good deal either way.


Addition of a Message Board to the blog


This is a short post just to point out that there is now a Message Board on this blog, where people can post community information, such as jobs and scholarships, as well as any other requests or information. The link is at the upper-right of the blog pages.

To post a message to the Board, send an email to: Leo van Iersel.


Meet DEB: Caroline Robertson, Program Assistant

Caroline and her pup, Gracie

What’s your role here at DEB? I am a Program Assistant with the Ecosystem Science and Population and Community Ecology Clusters. One of my favorite tasks is coordinating travel arrangements (no, really!) because it breaks up my day and gets my brain going.

What did you study in school? I have a degree in anthropology from SIU Carbondale where I focused on cultural transmission of historical violence (see here for my senior thesis for a better understanding of what I enjoy studying). Currently, I am in school at American University studying Public Anthropology for a master’s degree and continuing to focus on the cultural transmission of historical violence (stay tuned for the online archive I am creating). I am scheduled to receive my degree in August after I finish my thesis, so wish me luck!

What do you love to do in your spare time? Spare time? What is that!? When I’m not doing school work, I spend any spare time playing with/training my Australian Shepherd puppy, Gracie! She takes up most of my fiancé’s and my time, but we love her for it!

Hamburgers or hotdogs? Depends who is cooking it, but if it’s a restaurant, then hamburgers!

Why do you enjoy working with DEB? DEB is full of kind people who are a joy to work with! There are many different personalities here that all have their own charm. I am happiest when working with people and so love to work with the diverse crowd that is DEB (I am an anthropologist after all).

 


A quick refresher for blog visitors

We’ve come a long way since launching the blog in February of 2013. For one, there are a lot more of you following us and reading posts on a regular basis.

So we’re interrupting our normal content to post a quick “nuts and bolts” refresher in the hope of improving your experience here and encouraging more participation.

Content:

This blog is an informal communication medium of the NSF’s Division of Environmental Biology. You can share with us anything you’d like us to know and we can provide you with information, pass along news and announcements, respond to your questions and comments and share our experiences. Suggestions for new and follow-up post topics are always appreciated.

Why doesn’t DEBrief run reader polls and ask for discussion of specific scenarios?

Soliciting advice or information from the public (e.g., us asking you questions that could be seen as placing undue burden on your time) is governed by various laws and takes place through formal mechanisms, for example by the BIO Advisory Committee. The blog is not an approved formal mechanism for doing this.

Comments:

Please do: your questions, comments, responses to posts, and even general venting are all welcome. We started the blog because we wanted to create an open opportunity for two-way communication between DEB and the research community.

Comments are moderated (see the blog policies) so an email address can be useful for follow-up but isn’t required. In fact, you can comment without filling in any of the information and stay anonymous if you wish.

Schedule:

We try to have something new roughly every week but make no guarantees, since we put a high premium on quality content and also have other responsibilities, such as managing proposal review.

For your convenience, you can click the links to the right to receive notifications of new posts or to subscribe via RSS.

Sharing:

DEBrief is more than just captivating original content you might “like”. We also bring you important reminders and critical updates related to new and ongoing funding opportunities. Help us get this information to everyone who would benefit from seeing it. Sharing buttons for various traditional (email, print) and social (Facebook, Twitter, Reddit) media are at the bottom of each post.

Navigation:

Old posts are below and will continuously load as you scroll deeper into the past. Since some are quite long, most content for older posts has been moved below the fold. A search function, categorical filters, and monthly archives are available via the menus on the right-hand side of the page.

Up top, we have some tabs with permanent information:

  • “DEB Resources and Links” takes you directly to frequently sought official information hosted on the NSF.gov website.
  • “About” lists who we are and our aims in presenting information here.
  • “Blog Policies” explains what you should expect from us and what we expect from you while interacting through the blog.

Blog posts and formal publications


One possible use of blog posts is as first drafts of ideas that might make their appearance in a refereed publication at a later date. Thus, many of my blog posts have appeared in one form or another in my recent publications. Here I have listed the ones that I can remember using, just in case anyone wants a citable reference for the information in these posts.

A. Morrison DA (2013) Phylogenetic networks are fundamentally different from other kinds of biological networks. In W.J. Zhang (ed.) Network Biology: Theories, Methods and Applications (Nova Science Publishers, New York) pp. 23-68.

    9 Biological versus phylogenetic networks
  13 Network measures and phylogenetic networks
  23 An explanation of graph types
  25 Networks and bootstraps as tree-support criteria
  34 Networks of affinity rather than genealogy
  36 Networks of genealogy
  53 Are mathematical constraints biologically realistic?
  54 Some odd network definitions and terms
  63 Human races, networks and fuzzy clusters
  69 Is this the first network from conflicting datasets?
  70 Why do we still use trees for the Neandertal genealogy?
  72 Networks and most recent common ancestors
  74 Open questions about evolutionary networks, part 1
  75 Open questions about evolutionary networks, part 2
  76 Open questions about evolutionary networks, part 3
  88 When is there support for a large phylogeny?
  90 Explanation of the names for phylogenetic networks
  94 Phylogenetic position of turtles: a network view
  99 How networks differ from bootstrapped trees
107 We should present bayesian phylogenetic analyses using networks
115 Is there a philosophy of phylogenetic networks?

B. Morrison DA (2014) Phylogenetic networks — a new form of multivariate data summary for data mining and exploratory data analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4: in press.

  29 Network analysis of scotch whiskies
  50 Phylogenetic network of the FIFA World Cup
  61 How to interpret splits graphs
101 Distortions and artifacts in Principal Components Analysis analysis of genome data
103 Networks can outperform PCA ordinations in phylogenetic analysis
114 Network analysis of Genesis 1:3
119 Network of ancient Thai bronze Buddha images
134 A network analysis of Simon and Garfunkel
159 Networks and human inter-population variation
172 The acoustics of the Sydney Opera House

C. Morrison DA (2014) Next generation sequencing and phylogenetic networks. EMBnet.journal: Bioinformatics in Action 20: e760.

191 Next Generation Sequencing and phylogenetic networks

D. Morrison DA (2014) Phylogenetic networks: a review of methods to display evolutionary history. Annual Research and Review in Biology 4: 1518-1543.

    2 The first phylogenetic network (1755)
  21 The second phylogenetic network (1766)
  34 Networks of affinity rather than genealogy
  36 Networks of genealogy
  67 Metaphors for evolutionary relationships
  89 Relationship trees drawn like real trees
168 Who first used the term "phylogenetic network"?
182 Affinity networks updated
183 Reticulation patterns and processes in phylogenetic networks
187 What are evolutionary networks currently used for?

E. Morrison DA (2014) Rooted phylogenetic networks for exploratory data analysis. Advances in Research 2: 145-152.

  43 Rooted networks for exploratory data analysis

F. Morrison DA (2014) Is the Tree of Life the best metaphor, model or heuristic for phylogenetics? Systematic Biology 63: 628-638.

  23 An explanation of graph types
  34 Networks of affinity rather than genealogy
  36 Networks of genealogy
  58 Who published the first phylogenetic tree?
  89 Relationship trees drawn like real trees
143 Resistance to network thinking
144 Destroying the Tree of Life?
147 Should phylogenetic modelling proceed from simple to complex or vice versa?
171 Conflicting placental roots: network or tree?
182 Affinity networks updated

R for cats and cat lovers

Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It's a playful introduction to R intended for those who have little to no programming experience.

The bulk of it so far is a primer on data structures, and there's a little bit on functions and some dos and don'ts. It's stuff you should know before you get into more advanced tutorials.

Mainly though: ooo look, kitty.

Once you're done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.

Blogs about phylogenetics


I have occasionally been asked about what blogs currently exist in phylogenetics, because there seem to be very few. There are blogs in related areas, such as phyloinformatics, evolutionary biology, and systematics, but very few blogs dedicated primarily to phylogenetics (not just occasionally mentioning it).

Below is a list of the current and former blogs that I know about. In each case I have provided basic information taken from the blog itself. Please let me know about any suitable blogs that have been missed.

Current General Blogs


The Genealogical World of Phylogenetic Networks

Biology, computational science, and networks in phylogenetic analysis. This blog is about the use of networks in phylogenetic analysis, as a replacement for (or an adjunct to) the usual use of trees. This topic has received considerable attention in the biological literature, not least in microbiology (where horizontal gene transfer is often considered to be rampant) and botany (where hybridization has always been considered to be common). It has also received increasing attention in the computational sciences.

Contributors: David Morrison, Steven Kelk, Leo van Iersel, Mike Charleston, Jesper Jansson
Started: 25 February 2012

TreeThinkers

TreeThinkers is a blog devoted to phylogenetic and phylogeny-based inference. We aim to use it as a place to discuss recent research and methods; to ask and answer questions; and serve as a general resource for news and trivia in phylogenetics. Although the blog is associated with the Bodega workshop, we welcome posts and participation from the entire phylogenetics community.

Contributors: Bastien Boussau, Gideon Bradburd, Jeremy Brown, Rich Glor, Tracy Heath, David Hillis, Sebastian Höhna, Luke Mahler, Mike May, Brian Moore, Samantha Price, Peter Wainwright
Editor: Bob Thomson
Started: 2 October 2012

Open Tree of Life

The tree of life links all biodiversity through a shared evolutionary history. This project will produce the first online, comprehensive first-draft tree of all 1.8 million named species, accessible to both the public and scientific communities. Assembly of the tree will incorporate previously-published results, with strong collaborations between computational and empirical biologists to develop, test and improve methods of data synthesis. This initial tree of life will not be static; instead, we will develop tools for scientists to update and revise the tree as new data come in.

Contributors: Robin Blom, Karen Cranston, Karl Gude, Mark Holder, Rosemary Keane, Rick Ree
Started: April 8, 2012

The Bayesian Kitchen

Statistical inference and evolutionary biology. Undoubtedly, since its introduction in phylogenetics in the late 90's, Bayesian inference has become an essential part of current applied statistical work in evolutionary sciences. However, there are still many problems, computational, theoretical and even foundational. After ten years of applied Bayesian work in phylogenetics and in evolutionary genetics, I feel the need to step back and re-think the whole thing.

Contributor: Nicolas Lartillot
Started: 24 December 2013

EvoPhylo

Evolution, phylogenetics, bioinformatics, stuff.

Contributor: Dave Lunt
Started: 30 January 2008


Current Program Blogs


Phylogenetic Tools for Comparative Biology

This web-log chronicles the development of new tools for phylogenetic analyses in the phytools R package. Unless you are reading a very recent page of the blog, I recommend that you install the latest CRAN version of phytools (or latest beta release) before attempting to replicate any of the analyses of this site. That is because the linked functions may be archived, and very likely have been replaced by newer versions.

Contributor: Liam Revell
Started: 11 December 2010

Osiris Phylogenetics

Accessible and reproducible phylogenetics using the Galaxy workflow system.

Contributor: Todd Oakley
Started: 7 September 2012


Blogs Currently in Limbo


Dechronization

Dechronization is authored by evolutionary biologists interested in the development and application of methods for estimating phylogeny and making phylogeny-based inferences. The goal of the blog is to provide a forum for discussion of the latest research and methods, while also providing anecdotes, tidbits of natural history, and other related information.

Contributors: Rich Glor, Luke Harmon, Brian Moore, Tom Near, Dan Rabosky, Liam Revell
Started: 29 April 2008      Last post: 6 June 2011

CYPHY - Cybertaxonomy and Phylogenetics

Mostly harmless pointing at things pertaining to cybertaxonomy and phylogenetics.

Contributor: Matt Yoder
Started: 6 November 2007      Last post: 23 February 2011

Fish Phylogenetics

I created this new blog to share thoughts on work from my research group on the phylogenetics and evolutionary biology of fishes. This will provide a forum to share insight about the studies that we publish, discuss important scientific aspects of fish diversity, reflect on my experiences teaching ichthyology (the study of fishes), and to comment and review contributions by other researchers.

Contributor: Tom Near
Started: 23 August 2012      Last post: 15 September 2012

Taxonomy Phylogeny

Taxonomies group organisms according to phenotype, while phylogenetic systems group organisms according to shared evolutionary heritage.

Contributor: ???
Started: 1 January 2008      Last post: 31 December 2010

Phylogenetic Geek

A bag of info on phylogenetics.

Contributor: ???
Started: 5 August 2011      Last post: 16 September 2011

DEB Numbers: An Introduction

This is the first of what will hopefully be somewhat regular posts on Division of Environmental Biology data.  Our goals for these posts include: stimulating discussions with our PI community, providing deeper context for understanding the grant review process, and sharing insights we gain through ongoing examination of our programs.

This introductory post outlines some of the things we hope to cover in future posts and provides an overview of the challenges that apply to reporting and interpreting DEB Numbers.

Who will be writing these posts?

In DEB we have several analysts whose duties include assisting in management of the review process and working with the NSF databases to produce reports and analyses as well as taking part in communications with our PI community.  Expect most of the posts under the DEB Numbers heading to be written by and based on the work of the analysts.  Program Officers in the Division will also contribute to Numbers posts but they have other priorities (e.g., managing review of your individual proposals) and expect to author these posts less frequently.

What types of data will be presented?

What we will provide to you are our best efforts to clearly and accurately present the numbers you care about for DEB programs. This includes information on things like submission trends, program demographics, and various portfolio metrics. The scope of the presented data will be limited to the programs housed in the Division of Environmental Biology, though comparisons may be made to public data presented elsewhere. Hopefully, your questions, feedback, and comments will help us to advance our data presentations beyond the static glimpses offered in outreach talks.

We will not, as stated in this blog’s policy notes, discuss or provide data about individual proposals or groups of proposals at a level that the information could be tied to specific applicants.

Timeliness, completeness, and terminology: challenges to presenting data clearly and definitively

Timeliness: The U.S. federal government operates on a fiscal year calendar which runs from October 1 – September 30. NSF reporting and analyses account for each proposal by the fiscal year when a decision was recorded. For instance, full proposals received for the August 2012 deadline ultimately become part of the FY2013 dataset because they are reviewed and processed to completion after October 1, 2012.  However, because we process reviews and decisions throughout the fiscal year, proposal data is coming in right up until the end of business on September 30.  This creates a conflict between our desires to present information quickly and our responsibility to be accurate.
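
The fiscal-year rule described above can be illustrated with a small sketch. This is not NSF code: the function name `fiscal_year` is made up for the example, and the only assumption is the October 1 to September 30 calendar stated in the paragraph.

```python
from datetime import date


def fiscal_year(decision_date: date) -> int:
    """Return the U.S. federal fiscal year that contains decision_date.

    Fiscal year N runs from October 1 of calendar year N-1
    through September 30 of calendar year N.
    (Hypothetical helper for illustration only.)
    """
    # October, November and December belong to the next fiscal year.
    if decision_date.month >= 10:
        return decision_date.year + 1
    return decision_date.year


# A decision recorded just after October 1, 2012 counts toward FY2013.
print(fiscal_year(date(2012, 10, 15)))
# A decision recorded on the last day of September 2012 counts toward FY2012.
print(fiscal_year(date(2012, 9, 30)))
```

Under this rule, a proposal submitted for an August 2012 deadline but decided after October 1, 2012 is counted by its decision date, which is why it lands in the FY2013 dataset.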

Completeness: We can only show you the data that we have.  Some things we recognize as important and know you care about cannot be definitively reported because of gaps in the data.  For instance, many pieces of key demographic information for reporting on historically underrepresented groups by gender, ethnicity, or career status can be based only on the records the public voluntarily self-reports through approved information collections (the optional PI Information fields of your FastLane profiles).  Other potentially useful data is incomplete because the data fields were recent additions or are specific to certain programs and so are missing from portions of the record.

Terminology: We recognize that the phrasing and terminology we use may be interpreted to different ends by blog participants.  There are many key words and phrases that have a common English use, one or more colloquial uses in the PI community, and a specific technical meaning inside NSF.  Just a few that spring to mind are: proposal, project, jacket, collaborator, year, broader impact, postdoc, ethnicity, underrepresented, minority, program, peer review, and merit review.

Our aim is to explain and consistently apply technical wording (e.g., specifying “fiscal year”).  However, this is no small feat with NSF where we have hundreds of solicitations, submission mechanisms and specialized considerations that vary from office to office and year to year.  We will inevitably miss some terms but welcome the opportunities provided by a blog to quickly see when we have been confusing and clarify.

Responses to these challenges

1) Values for the current fiscal year are tentative. They will be noted, as applicable, with qualifiers such as “estimated (est.)”, “tentative” or “to date (t.d.)”. Generally, final values will not be available until after the start of the following fiscal year or later.

2) Descriptive language will be attached to values where the data has known gaps or limitations (e.g., the success rate for proposals self-reporting a female PI was XX%, at least YY proposals were received from minority PIs).

3) Technical terms will be explained when presented and the explanation referenced when the term is used in future posts.

4) If you are not seeing what you are looking for or something seems off, speak up in the comments.

Coming up next in DEB Numbers: Revisiting performance of PI demographic groups during the first preliminary proposal cycle.


DEB in the News, Feb. 19, 2013

This will be a semi-regular feature where we share with you links to press coverage and other media featuring DEB PIs that came our way. (Note: today’s items cover the last two weeks, but that is not a planned frequency at this point.) Some of the items are picked up via the NSF public affairs team, others we only hear about when you tell us.  If you saw something they didn’t, feel free to share it in the comments.

Two DEB Ecosystems Studies program PIs are honored with ASLO 2013 society awards: http://www.aslo.org/meetings/neworleans2013/awards.html

Bruce Peterson, Senior Scientist, The Ecosystems Center, Marine Biological Laboratory, is the 2013 recipient of the Alfred C. Redfield Lifetime Achievement Award

Emily Bernhardt, Associate Professor, Department of Biology, Duke University, is the 2013 recipient of the Yentsch-Schindler Early Career Award

Biodiversity and Disease Transmission

Population and Community Ecology CAREER awardee Pieter Johnson, Assistant Professor, Ecology and Evolutionary Biology, University of Colorado, is in the news for a new paper: popular press here and, for those with access, research paper here.

Long Term Research of Yellowstone Ecosystem (NSF Discovery Article)

Highlights a recent paper in Proceedings of the Royal Society by Kristin Marshall (formerly of Colorado State University (CSU) and now of the National Oceanic and Atmospheric Administration), Tom Hobbs of CSU, and David Cooper of CSU.

Scientists build new ‘tree of life’ for placentals, visualize common ancestor (NSF Discovery Article)

Fruit from the Assembling the Tree of Life project featured on NSF website and NYT’s Science section, and for those with access, research article here.

 

Disclosures: DEB Program Officer Linda Deegan and DEB Division Director Alan Townsend, both rotators, have had scientific collaborations at their home institutions with PIs Peterson and Johnson, respectively. Tom Hobbs is a former rotating Program Officer in DEB.  The above list includes all DEB-related news items received in the last two weeks; individual affiliations played no role in determining inclusion in this post.