Tuesday, December 30, 2014

Technically Speaking: Converting Glimmer predict and gff3 Gene Annotations

Example screenshot of open reading frame
annotation within the Geneious program.
Predicting open reading frames within genomic sequences is one of the most basic yet important tasks in bioinformatics and sequence analysis. This is the process by which, given an organism's genomic sequence or a section of it, we predict which regions of that genome are potential genes. At its most basic level, this can be done by looking for sequence regions between start and stop codons (sequence signals for the beginning and end of a gene). While there are many programs for predicting open reading frames, I often use the common Glimmer3 toolkit. This program works great overall, but one drawback is that it can be hard to visualize your open reading frames on your genome or genomic region (using Geneious or the Integrative Genomics Viewer) because it does not give you a '.gff3' formatted file, which is commonly used by these programs. In this technical post, I am going to focus on the file types you get from Glimmer3, explain the .gff3 file type, and leave you with a perl script to convert between the two.
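To give a rough idea of what such a conversion involves, here is a minimal Python sketch (not the perl script from the post). It assumes the usual Glimmer3 '.predict' layout: a '>' header line naming the sequence, followed by lines holding an ORF ID, start, end, reading frame, and score. The function name `predict_to_gff3`, the "Glimmer3" source label, and the choice of "CDS" as the feature type are illustrative assumptions, and ORFs that wrap around the origin of a circular genome are not handled.

```python
def predict_to_gff3(predict_text):
    """Convert Glimmer3 .predict text into GFF3 text (illustrative sketch)."""
    gff_lines = ["##gff-version 3"]
    seqid = "unknown"  # placeholder used only if no '>' header is seen
    for line in predict_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the first word after '>' is taken as the sequence ID.
            seqid = line[1:].split()[0]
            continue
        # Assumed column order: ORF ID, start, end, frame, score.
        orf_id, start, end, frame, score = line.split()[:5]
        start, end = int(start), int(end)
        strand = "-" if frame.startswith("-") else "+"
        # GFF3 wants start <= end, while reverse-strand ORFs in the
        # .predict file list the larger coordinate first.
        # Note: ORFs spanning the origin of a circular genome are NOT handled.
        low, high = min(start, end), max(start, end)
        gff_lines.append("\t".join([
            seqid, "Glimmer3", "CDS", str(low), str(high),
            score, strand, "0", "ID=" + orf_id,
        ]))
    return "\n".join(gff_lines) + "\n"


example = """>contig1 example sequence
orf00001      100      400  +1     8.50
orf00002      900      500  -2     6.10
"""
print(predict_to_gff3(example), end="")
```

The nine tab-separated columns follow the GFF3 layout (sequence ID, source, type, start, end, score, strand, phase, attributes), which is what viewers such as Geneious or IGV expect.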

Thursday, November 27, 2014

Potential New Roles for Bacteria and B Cells in Promoting the Stomach Flu

A CDC infographic highlighting the infectivity of
Norovirus. <Source>

Introduction

We are all unfortunately familiar with the notorious stomach flu.  Most of us have experienced that awful nausea, vomiting, diarrhea, and tiredness associated with catching some stomach bug.  While there are many viral causes of the stomach flu (also called gastroenteritis), one of the most common is the Norovirus.  The Norovirus is a common and contagious virus that is currently the leading cause of viral gastroenteritis.  The infection can last a few days and, while most people recover, it can cause serious issues such as dehydration, and jeopardize the health of many at-risk populations.  There is also no vaccine against the Norovirus right now (the flu vaccine does not protect you from the stomach flu), although researchers are working on it.

Sunday, November 9, 2014

Using Specific Bacteria to Treat Antibiotic-Induced Diarrheal Disease (C. difficile)

Clostridium difficile establishes infections following
antibiotic treatment and causes diarrheal disease.
<Source>
There has been a lot of talk about the microbiome and Clostridium difficile infections.  This is because patient antibiotic or chemotherapeutic exposure (both of which can destroy your commensal bacterial communities) increases the risk of C. difficile infection.  This observation suggests a role for commensal bacteria in mediating infection resistance.  The exact commensal bacteria that mediate protection against C. difficile infection are not known, but luckily for us, scientists are working on it.  A paper, recently published in Nature, describes a study that sheds light on what bacteria might be offering protection against C. difficile infection.

Sunday, October 12, 2014

Your Artificial Sweeteners, Your Bacteria, and Your Health

It seems like one cannot help hearing about this paper throughout the microbiome and related fields.  The paper "Artificial Sweeteners Induce Glucose Intolerance by Altering the Gut Microbiota" was recently published in Nature, and it has had a lot of press.  Interest in the paper is partially due to its focus on two hot topics: the influence of food on the gut microbiome and aspects of obesity, as well as artificial sweeteners, which have long been a topic of debate.  I presented this paper at our student microbiome journal club a couple of weeks ago, so I wanted to go over it here too.

The big three artificial sweeteners. <Source>
First let's get the media hype out of the way.  This paper has been oversold by the media as a reason to stop eating artificial sweeteners, and the results do not actually support this claim (we will get to this below).  Furthermore, to be honest, the title of the paper oversold itself a bit too.  The results don't quite support the bold title, which also contributed to some of the perceived hype surrounding the paper (we'll get to this below too).  But beyond the hype, this is a pretty great and interesting paper and worth discussing in more detail.

Thursday, September 18, 2014

Microbiome Analysis: Average Sequence Lengths and Looping with xargs

Sometimes I want to quickly calculate the average sequence length of a collection of sequences in either a fasta or fastq file.  To address this, I wrote a couple of small perl scripts that calculate the median sequence length of a fasta or fastq file.  You can find these perl scripts on GitHub in my "Microbiome_sequence_analysis_toolkit" repository.  The nice thing about these scripts is that they return the median and file name to the standard output, which makes it easier to loop across many files and collect the results into a single summary file.
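As a rough illustration of the idea (a Python sketch, not the perl scripts from the repository), the median length of the sequences in a fasta file could be computed as below. The function names `fasta_lengths` and `median_length` are hypothetical, only fasta input is covered, and an empty file will raise an error from `statistics.median`.

```python
import statistics
import sys


def fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            # Sequence may be wrapped over several lines, so keep adding.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def median_length(lines):
    """Median sequence length for an iterable of FASTA lines."""
    return statistics.median(fasta_lengths(lines))


if __name__ == "__main__":
    # Print "file name <tab> median" for each file given on the command line,
    # so results from many files can be appended to one summary file.
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(f"{path}\t{median_length(handle)}")
```

Because each result is a single line on standard output, the output of several runs can simply be redirected or appended to one summary file.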

Sunday, August 24, 2014

A Microbiome Analysis Toolkit and "Block Fasta" Formatting

I write a lot of scripts in my day-to-day sequence analysis of microbiome data.  While a lot of these are a bit project specific, some of these could be useful for others in their sequence analysis projects.  A while back I posted about a script for formatting Qiime output files for input into the Lefse analysis toolkit, but now I am thinking it would be worth adding more.  Therefore, I changed the "Lefse formatting" repository to be a more general "microbiome sequence analysis toolkit" repository.  This seems like a nice place to periodically add scripts for easy use by others.  To get this new repo started, I added a new script for removing "block fasta" formatting from fasta sequence files.  It's relatively simple, but I think it's also pretty useful.
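For readers unfamiliar with the term, the sketch below assumes that "block fasta" means sequences wrapped across several fixed-width lines, and that removing the formatting means joining each record's sequence onto a single line. This is a Python illustration under that assumption, not the script from the repository, and the name `unwrap_fasta` is hypothetical.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line."""
    chunks = []  # pieces of the sequence for the current record
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header: flush the previous record's sequence first.
            if chunks:
                yield "".join(chunks)
                chunks = []
            yield line
        elif line.strip():
            chunks.append(line.strip())
    if chunks:
        yield "".join(chunks)


wrapped = [">seq1\n", "ACGT\n", "TTAA\n", ">seq2\n", "GG\n"]
for out_line in unwrap_fasta(wrapped):
    print(out_line)
```

Writing it as a generator keeps memory use low, since only one record's sequence is held at a time.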

Sunday, August 3, 2014

To Create an Online Forum for the Virus Ecology Research Community

 The Marine Microbiology Initiative is striving to build a
community of virus ecology resources online.
<Source>
Like bacterial and fungal ecology, virus ecology is an important part of the functioning world around us, as well as an important aspect of our health and wellbeing (viruses are part of the human microbiome after all).  While there is a lot of great virus ecology research going on, there remains a lack of communication and standardization throughout the field.  For example, there are a lot of great analytical tools available online, but it is often unclear which is best suited for certain analyses, especially for those entering the field.  This lack of communication and standardization is hindering the research potential of the field as it moves forward.  Luckily for us, the awesome people at the Gordon and Betty Moore Foundation Marine Microbiology Initiative are working on implementing online tools for enhancing communication and methodological standardization across the field.  In this post, I just want to spread the word about their project and their call for proposals.

Sunday, July 20, 2014

Bacterial CRISPRs: Not Just For Targeting Foreign Nucleic Acids

The CRISPR-Cas system has been found to play roles
in the antibiotic resistance of some bacteria. <Source>
In recent years, CRISPRs (Clustered, regularly interspaced, short palindromic repeats) have been gaining popularity in the microbiology field.  Briefly, CRISPRs serve as an adaptive immune system for bacteria, meaning that they are able to remember what viruses (bacteriophages) or other entities have infected them and mount a targeted defensive response the next time they are infected with the same entity (think of it as an analog to our adaptive immune response which uses antibodies and other agents to target invading microbes).  More specifically, the CRISPR-Cas (Cas are the CRISPR associated genes) system facilitates the integration of a small section of the foreign genomic DNA into the CRISPR array within the bacterial genome (see left side of the detailed diagram below).  While in the array, this section of foreign DNA will serve as a template for recognizing the invading genome again if another infection occurs, and the template will be used for targeting that invading genome for rapid destruction.  As can be seen in the figure below, this system is similar to the Eukaryotic RNA-interference system found in organisms including humans.  I've already gotten pretty technical here, and anything more in-depth would be beyond the scope of this post, so please check out reference [1] for further reading.

Sunday, July 13, 2014

Gene Therapy for Hemophilia, Blindness, and Cancer, and the Tools That Make it Possible at the HMGS 2014 Symposium

Every blogger has those times when life gets busy and their blog takes a back seat.  For me, this summer has been one of those times.  Between meetings, research, our family trip back home, and the general effort involved in being a scientist, I have been behind in updating this blog.  Despite my blog slacking, I'm sure it will be worth it when I have more cool stuff to write about in the next couple of months (especially when I have cool new research findings to talk about!).  So let's get started with some awesome catchup!

Dr. Maus presenting her research to the students attending
the symposium.
A month ago, we here at Penn hosted the annual Howard Hughes Medical Institute (HHMI) Med Into Grad Scholars (HMGS) symposium for the participating northeast schools.  This program was made to promote translational research in PhD training by integrating more medically relevant training into the PhD candidate curriculum (including coursework and clinical clerkships).  For more information about the program, and to read about the symposium last year, check out my post here.

Sunday, June 15, 2014

Microbes in Cancer, Staph Infection Biofilms, Improved Lectures and More at ASM 2014


A few weeks ago I attended the American Society for Microbiology (ASM) General Meeting in Boston.  I wrote up a post about the virome workshop I attended (follow this link for the post), but I also want to write out a summary for the rest of the conference.  The ASM meeting was huge (I'm talking thousands of microbiologists) with many full days of great science, so an in-depth review of the meeting would be beyond the scope of this single blog post.  Therefore my goal is to give you a brief summary of the meeting, along with some links and resources you can use if you want to get more information.  Additionally, for another summary of the general meeting, check out this ASM summary episode of TWIM.

Sunday, May 25, 2014

Understanding and Analyzing Viral Communities with Shotgun Metagenomics (ASM 2014 Workshop)


Last week I attended the annual American Society for Microbiology (ASM) meeting in Boston.  This is a HUGE meeting (I'm talking thousands of microbiologists all in one place) with a lot going on.  There was so much going on that I decided to break my discussions on the meeting into a couple different posts.  This post is going to be about the viromics (the study of viral communities, which are called viromes) workshop I attended on the first day.

Sunday, April 27, 2014

Abstract Art and Laser Capture Microscopy


The lab is a place where we see some really cool and beautiful things (especially when doing microscopy).  For this post I want to quickly share a cool picture I took in the lab this week, as well as give you a little background about the method I used to get it.

Saturday, April 5, 2014

NDSEG Featured Fellow Announcement

This week the kind folks from the NDSEG (National Defense Science and Engineering Graduate) Fellowship program highlighted me as their weekly featured fellow.  This is an announcement the group puts out weekly on their Facebook and Twitter pages to highlight the cool work their fellows are doing.  I am honored to be their featured fellow, and am thankful for their support.  The announcement from their Facebook page is as follows:




Monday, March 17, 2014

GitHub Education Officially Providing Students and Educators Free Resources

In my last post I talked about PLOS's efforts to improve data sharing, and how important it is to make your data and analysis tools available when you publish, especially in microbiome research.  This importance of sharing published data and analysis scripts was also highlighted in a recent article in the journal Microbiome (see reference below).  Now the code versioning and sharing site GitHub is even getting on board by officially supporting students and classrooms with free micro and organization accounts to improve coding education and data sharing practices.  I'll also note that, as they point out in their blog, this is something they had done for a while, but now they have made it official.

Sunday, March 2, 2014

New PLOS Publishing Requirements Aim to Advance Data Sharing Practices

PLOS (the Public Library Of Science) is a popular scientific journal publisher whose journals include PLOS Genetics, PLOS Pathogens, and of course, PLOS ONE.  What makes PLOS stand out is not that they publish great science (which they do, of course), but rather their leadership in open access publishing (open access means that anybody can read their publications for free).  Recently PLOS announced that they will be taking their open access policies to the next level by requiring all published data to be openly and clearly accessible to the public.  Specifically their blog stated that "authors must make all data publicly available, without restriction, immediately upon publication of the article".  This has already sparked some important conversations about the feasibility of such a requirement.

Saturday, February 22, 2014

Modest Data Reported From Oxford Nanopore's Exciting MinION Sequencing Platform

The first data from Oxford Nanopore's promising MinION sequencing platform was released a couple of days ago.  The sequencing data was released at the Advances in Genome Biology and Technology Meeting in Florida, and was presented by Dr David Jaffe of the Broad Institute in Cambridge, MA.  The reported data is a bit underwhelming because it failed to live up to the goals set by the company a couple of years ago, and because it does not seem to offer anything new to the field of DNA sequencing.  Here I am briefly going to go over the way the MinION system works, how it is performing, and what we can possibly expect in the future.  I also have my sources below, so check those out for further reading.

Thursday, February 13, 2014

Microbial Biomarker Discovery and How to Properly Format Your Data (Lefse)

Biomarker discovery is a big part of medical research.  A biomarker is a clinical signal, like the presence of a gene in your genome or colonization of your lungs with a certain bacterial species, whose presence or absence indicates a disease state or predicts an increased risk for developing a disease state.  These play important roles in medicine because they can allow for disease diagnosis (i.e., the physician can test for the disease biomarker) or provide a prediction of whether the patient is at a higher risk for developing a condition (e.g., a gene that puts a patient at higher risk for developing diabetes).  There are some good programs out there for biomarker discovery, but one I particularly like is Lefse.

Tuesday, February 4, 2014

ReadCube: One Giant Leap for Scientific Literature Management

Maybe you know the feeling?  A "scientific literature" or "articles" file with a confusing tree of sub-files that you need a map to navigate.  Hundreds of articles titled "Sci_paper01119922838347.pdf" that contain who-knows-what.  Constantly having to navigate your University's version of PubMed or Google Scholar whenever you need a new paper.

Saturday, January 18, 2014

Details and Perspectives as Illumina Announces their Newest DNA Sequencing Machines and the $1,000 Human Genome

INTRODUCTION

A couple of days ago, at the JP Morgan Healthcare Conference (a healthcare investment meeting), the CEO of Illumina (one of the major DNA sequencing technology companies) announced their newest line of sequencing machines.  The two new DNA sequencers are the NextSeq 500 and the HiSeq X10, with the NextSeq 500 being marketed for everyday laboratory use, and the HiSeq X10 being marketed as a factory level, population sequencer (this is the higher power model).  These are going to be powerful, state-of-the-art machines that will have a significant impact on both research and clinical applications.  Here I am going to briefly cover what these new machines are and what their release means for contemporary research and clinical applications.  As always, I will also point you in the right direction for further reading, in case you are interested in more.

Sunday, January 12, 2014

Recent Publication: Our Ongoing Study of The Traumatic Wound Microbiome

A few days ago our lab, in collaboration with some folks here at the Hospital of the University of Pennsylvania Orthopaedics Department, published a manuscript in the Journal of Orthopaedic Research.  The title of our manuscript was "Culture-independent pilot study of microbiota colonizing open fractures and association with severity, mechanism, location, and complication from presentation to early outpatient follow-up".  This is a report of our ongoing prospective study in which we are characterizing the microbial communities associated with open fracture wounds and their adjacent healthy skin, as well as describing correlations between the microbiome and clinical factors (such as healing complications and wound severity).  Unfortunately this paper is not open access, so you are going to have to access it through a university or local library subscription to the journal.

Sunday, January 5, 2014

Learning Linux: How I Turned My Old PC Into a Linux Server

INTRODUCTION

I do a lot of computational microbiological research, so having a basic understanding of how computers work is essential.  Because computers play such an important role in my research, and because I am just genuinely interested in them, I decided to dive further into the linux world by setting up and maintaining my own personal linux "server".  I always find this kind of thing is a great way for me to learn because it gives me the chance to immerse myself in the environment and forces me to keep learning about how the systems work.  So far this has been a pretty cool experience.  In this post I am going to outline my process of setting up my own linux "server" using an old PC, and go over the pitfalls and decisions I encountered through the process.  I hope this will be an enjoyable little story of my short linux adventure so far, and I also hope it will provide you with helpful resources in case you want to try the same thing (and if you do try this on your own, ALWAYS back up your hard drive before you start).  And let's be honest, I will forget how I did it if I don't record it somewhere, and blogs are great places for recording these kinds of outlines.