Research

I'm a member of the Notre Dame Bioinformatics Lab (NDBL). My research focuses on genetic data, currently Expressed Sequence Tags (ESTs). So far, my work has centered on parallelizing applications to run in a grid environment, primarily our 500-node campus Condor grid. In the future, I will continue working on ESTs related to the VectorBase site.

My primary work in the NDBL is on scalable bioinformatics. This work is heavily collaborative, involving members of the Cooperative Computing Lab and the Department of Biological Sciences. It produced a paper presented at the 5th annual Workshop on Workflows in Support of Large-Scale Science (WORKS), held in conjunction with Supercomputing 2010 in New Orleans (Slides, Paper). My work on the MAKER genome annotation tool was presented at the 2nd IEEE Conference on Computational Advances in Bio and Medical Sciences (ICCABS) 2012 (Slides, Paper, coming soon).

My initial work after arriving at Notre Dame was parallelizing the sequence alignment tool SSAHA. This work was presented in July at an ISMB 2010 poster session in Boston (Poster). It was also included in the article "Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions," published in the Journal of Cluster Computing, September 2010 (Paper).

A full list of publications is available in my CV.

Course-Related Research

CSE 60641 - Graduate Operating Systems

AVATAR - AVATAR is an abstraction for using virtual machines in a distributed computing environment. Its goal is to present a homogeneous set of resources to the users of a heterogeneous grid, such as Condor, in a way that is nearly transparent to them. To that end, AVATAR only requires users to state their job's requirements, much as they would for Condor. The system checks these requirements against the remote host on which the job is executing and decides whether to start a virtual machine or run the job natively. When a virtual machine is necessary, the system fetches the filesystem and kernel and boots a virtual machine instance in which the job executes. The job's output is then returned to the local filesystem, where the grid system (e.g. Condor) can pick it up and return it to the user.
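The native-or-virtual-machine decision described above can be sketched roughly as follows. This is a hypothetical illustration, not AVATAR's actual code: the field names are loosely modeled on Condor machine attributes such as OpSys and Memory, and the fixed VM memory overhead and the "reject" outcome are assumptions added for the example.

```python
from dataclasses import dataclass


@dataclass
class JobRequirements:
    """What the user states about the job (hypothetical subset)."""
    opsys: str      # requested operating system, e.g. "LINUX"
    memory_mb: int  # memory the job needs, in MB


@dataclass
class HostInfo:
    """What the remote execution host offers (hypothetical subset)."""
    opsys: str            # host operating system
    memory_mb: int        # memory available on the host, in MB
    can_virtualize: bool  # whether a hypervisor is available


def choose_execution_mode(req: JobRequirements, host: HostInfo,
                          vm_overhead_mb: int = 256) -> str:
    """Return "native", "vm", or "reject" for a job landing on this host.

    vm_overhead_mb is an assumed, illustrative memory cost of booting a VM.
    """
    # The host already provides the requested environment: skip the VM.
    if host.opsys == req.opsys:
        return "native" if host.memory_mb >= req.memory_mb else "reject"
    # Otherwise, boot a VM with the requested filesystem and kernel,
    # provided the host can virtualize and has room for the VM overhead.
    if host.can_virtualize and host.memory_mb >= req.memory_mb + vm_overhead_mb:
        return "vm"
    return "reject"


job = JobRequirements(opsys="LINUX", memory_mb=2048)
host = HostInfo(opsys="WINDOWS", memory_mb=4096, can_virtualize=True)
print(choose_execution_mode(job, host))  # vm
```

Checking for a native match first reflects the goal of avoiding virtualization overhead whenever the host already satisfies the job's requirements.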

CSE 60543 - Algorithms for Biological Networks

Parallel short read assembly - Applied network methods, primarily data and graph partitioning, to the data and graphs produced during modern short-read assembly. The goal of this project is to reduce the very high memory (RAM) requirements of de Bruijn graph-based assembly.
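To illustrate how partitioning can spread a de Bruijn graph's memory footprint, here is a minimal sketch using simple hash partitioning of (k-1)-mer nodes. Hash partitioning is a swapped-in technique chosen for brevity; it is not necessarily the method used in this project, and the function names are hypothetical. Edges are kept as sets, ignoring the k-mer multiplicities a real assembler would track.

```python
import zlib
from collections import defaultdict


def kmers(read: str, k: int):
    """Yield every k-mer (substring of length k) in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def partition_of(node: str, num_parts: int) -> int:
    """Deterministically assign a (k-1)-mer node to a partition."""
    # crc32 is stable across runs, unlike Python's randomized str hash.
    return zlib.crc32(node.encode()) % num_parts


def build_partitioned_debruijn(reads, k: int, num_parts: int):
    """Build a de Bruijn graph split into num_parts adjacency maps.

    Each k-mer contributes an edge prefix -> suffix between its two
    (k-1)-mer ends. The edge is stored in the partition that owns the
    prefix node, so no single partition holds the whole graph.
    """
    parts = [defaultdict(set) for _ in range(num_parts)]
    for read in reads:
        for kmer in kmers(read, k):
            prefix, suffix = kmer[:-1], kmer[1:]
            parts[partition_of(prefix, num_parts)][prefix].add(suffix)
    return parts


parts = build_partitioned_debruijn(["ACGTAC", "GTACGT"], k=3, num_parts=2)
for i, part in enumerate(parts):
    print(i, dict(part))
```

In a distributed setting, each worker would build and hold only its own partition, so per-machine memory shrinks roughly in proportion to the number of partitions. Traversals that cross partition boundaries then require communication between workers.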

CSE 60532 - Bioinformatics Computing

Examined EST data from the Salt Cress project, looking for structural variations between two related populations and among three related species. Used the EST pipeline described in my WORKS 2010 paper, which was also used in O'Neil et al., "Population-level transcriptome sequencing of non-model organisms Erynnis propertius and Papilio zelicaon".