Changes to SRA Data Access on the Google Cloud Platform (GCP)

Sequence Read Archive (SRA) data available via the Google Cloud Platform (GCP) are migrating from multi-region to single region us-east-1. This migration is projected to be complete by May 2024. To minimize the impact of this change, we recommend updating your workflow to access SRA data in us-east-1 region as soon as conveniently possible. Please note …

Update to GenBank Qualifier

‘Country’ will transition to ‘Geographic Location’ effective June 2024. As announced earlier this year, we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA). As part of this effort and to make location data more accurate and informative, we are also changing the …

Introducing Pebblescout: Index and Search Petabyte-Scale Sequence Resources Faster than Ever

NCBI is excited to introduce Pebblescout, a freely available web service that allows you to search for sequence matches in very large nucleotide databases, such as runs in the NIH Sequence Read Archive (SRA) and assemblies for whole genome shotgun sequencing projects in GenBank – faster and more efficiently! Pebblescout uses short segments of your …

NCBI Virus: Mutation-Based Search for SARS-CoV-2 Data

Millions of SARS-CoV-2 samples from around the world have been made publicly available as assembled and unassembled sequence data in GenBank and the Sequence Read Archive (SRA). Now you can find sequences with a particular mutation by searching with the protein and the amino acid change (e.g., S:F486V). Visit our SARS-CoV-2 Variant Overview on NCBI …

Coming Soon! Including Sample Location and Collection Date and Time for Sequences Submitted to GenBank and SRA

As previously announced, in collaboration with our partners at the International Nucleotide Sequence Database Collaboration (INSDC), we will begin to systematically gather ‘location of collection’ and ‘date and time of collection’ for sequence data submitted to GenBank and the Sequence Read Archive (SRA). Gathering information about where and when a biological sample was collected aligns …

Streamlining Access to SRA COVID-19 Datasets on the Cloud

To make it easier for you to find and access Sequence Read Archive (SRA) data, we are re-organizing and improving our cloud storage systems. Beginning April 2023, we will move the SARS-CoV-2 normalized data and source files from the COVID-19 data buckets on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to the NIH …

3+ Ways NCBI is Enhancing the SRA Database

Do you submit or access Sequence Read Archive (SRA) data? In an ongoing effort to enhance your experience, NCBI is making several improvements to our widely used SRA database. SRA is the largest publicly available repository of high-throughput sequencing data. The archive accepts data from all organisms as well as metagenomic and environmental surveys. …

Scrubbing human sequence contamination from Sequence Read Archive (SRA) submissions

Do you work with human-derived sequence data? Do you often struggle with the need to determine if your data is free of human sequence and therefore suitable for public distribution? We encourage submitters to screen for and remove contaminating human reads from data files prior to submission to SRA. To support investigators in this effort, …

Announcing the GenBank and SRA Data Processing Webpage

Interested in understanding how sequence data are submitted, processed, and made publicly available in GenBank and the Sequence Read Archive (SRA)? Announcing the GenBank and SRA Data Processing webpage! Here you can learn about procedures that the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), uses for processing submitted …