For interested parties, this page describes my working skill set and the domains in which I have prior experience. I support these claims with a detailed list of my projects and publications.
Resume / Academic CV / Blog / LinkedIn
As a developer, writing code is the core of my regular workflow. I have experience building software in languages ranging from systems-level C to Python and Java. For analysis and system administration, I have employed scripting languages like awk and Python to process and clean raw data files for input into databases, or into data analysis environments like Jupyter and RStudio. A detailed list of familiar languages and software tools can be found on my experience timeline.
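As a small illustration of the kind of cleaning step I mean (the data, field names, and rules here are hypothetical, not from any real engagement), a Python sketch that normalizes a messy CSV export before loading it elsewhere:

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray whitespace, blank fields.
raw = """name,dept,hours
Alice ,ENG,40
bob,eng,
CAROL,Eng,35
"""

def clean_rows(text):
    """Normalize whitespace/case and drop rows missing required fields."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        if not row["hours"]:
            continue  # skip incomplete records
        yield {
            "name": row["name"].strip().title(),
            "dept": row["dept"].strip().lower(),
            "hours": int(row["hours"]),
        }

rows = list(clean_rows(raw))
```

In practice the same shape of script works equally well in awk; the point is a small, auditable transformation between the raw file and the database.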
Looking at the data is a critical step in almost any data analysis. As part of the InfoVis group at UBC, I have discussed visualization techniques and evaluations since 2005. In my own work, I use Tableau, or dplyr and ggplot2, for exploration of static tables, and GNU Octave/MATLAB, RStudio, and IPython notebooks for iterative exploration while developing algorithms.
Statistics and machine learning offer a suite of techniques for building predictive models of data distributions, along with guidance about how much confidence to place in those models. As head of analytics at Coho Data, I have employed time-series analysis and clustering to analyze customer usage patterns. As a doctoral student, I have devised unsupervised learning techniques for data exploration. As a quant, I have used a variety of supervised learning techniques for fitting model parameters, and statistical hypothesis testing for confirmatory analyses.
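To sketch the clustering side of this, here is a generic one-dimensional k-means on made-up usage numbers (an illustration of the technique, not Coho's actual pipeline or data):

```python
import random

def kmeans_1d(values, k, iters=20, seed=0):
    """Minimal 1-D k-means: assign each point to its nearest
    centroid, then recenter each centroid on its cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# e.g. daily I/O volumes drawn from two distinct usage regimes
usage = [10, 12, 11, 9, 95, 100, 102, 98]
centers = kmeans_1d(usage, k=2)
```

Real usage data would of course be multivariate and need feature engineering first, but the cluster-then-inspect loop is the same.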
Familiarity with computer graphics APIs like WebGL/OpenGL has yielded at least two benefits for me as a researcher/analyst. First, I have leveraged the graphics pipeline to build novel visualization techniques that scale to large datasets. Second, I have exploited the parallelism of graphics processors (GPUs) to speed up existing analysis techniques.
My team and I designed and developed an analytics system for analyzing storage product performance, events, and alerts. The system is deployed on AWS, is designed to scale easily with demand, and uses a flexible Elasticsearch back end.
I have also developed different methods for analyzing the workload statistics of computer storage systems. I am interested in analyzing block-level storage traces for:
My exposure to finance is on the trading side of things. I have researched the following:
I have been involved in several projects to help researchers navigate unordered collections of documents. These projects are:
My experience with scientific computing has focused on numerical linear algebra and nonlinear optimization; I have taken graduate courses in both topics. In my own research, I have experience with:
I consulted with the BC Cancer Agency to develop software for analyzing DNA copy number alterations. The project involved collaborating with a lead researcher to design a visual console for examining copy numbers and labels across chromosomes.
The Counter Stack is a compressed representation of a request stream. With Counter Stacks, one can calculate cache-usage statistics in sub-linear space for any time interval without storing the entire trace.
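The quantity being estimated is the LRU stack distance of each request. As a point of reference, here is the exact (and decidedly not sub-linear) computation on a toy trace; Counter Stacks replace the distinct-count below with compact probabilistic cardinality counters:

```python
def stack_distances(trace):
    """Exact LRU stack distance per request: the number of distinct
    addresses referenced since this address was last seen (None on a
    first reference). A hit occurs in an LRU cache large enough to
    cover this distance."""
    last_seen = {}  # address -> index of its most recent reference
    dists = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # distinct addresses strictly between the two references
            dists.append(len(set(trace[last_seen[addr] + 1:i])))
        else:
            dists.append(None)
        last_seen[addr] = i
    return dists

trace = ["a", "b", "c", "a", "b", "b"]
```

From these distances one can read off hit rates for every cache size at once, which is what makes the sub-linear approximation valuable on long traces.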
Color Thief is an iOS app for color transfer. It harnesses a fast GPU code I wrote for quickly swapping colors between photos using phone hardware.
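One common formulation of color transfer is per-channel statistics matching in the style of Reinhard et al.; the app's actual GPU method may differ, but the idea can be sketched for a single channel as:

```python
from statistics import mean, pstdev

def transfer_channel(src, tgt):
    """Shift and scale src's values so their mean and spread match
    tgt's (per-channel statistics matching, a la Reinhard et al.)."""
    mu_s, mu_t = mean(src), mean(tgt)
    sd_s = pstdev(src) or 1.0  # guard a flat source channel
    sd_t = pstdev(tgt)
    return [(v - mu_s) * sd_t / sd_s + mu_t for v in src]
```

Applied independently to each channel of a suitable color space, this moves one photo's palette onto another; the per-pixel arithmetic is embarrassingly parallel, which is why it maps so well to phone GPUs.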
Overview is a tool for exploring large, unlabelled document datasets. It is targeted at helping journalists quickly analyze text dumps.
Glimmer is an algorithm for fast multidimensional scaling. Significant speed gains are achieved by leveraging GPU parallelism.
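The core force update behind stochastic MDS can be sketched serially in Python (a generic sketch of the technique; Glimmer's actual algorithm adds a multilevel scheme and runs these updates in parallel on the GPU):

```python
import math
import random

def mds_2d(D, iters=3000, lr=0.05, seed=1):
    """Stochastic force-directed MDS: repeatedly pick a pair of
    points and nudge them so their layout distance moves toward
    the target distance D[i][j]."""
    n = len(D)
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    for _ in range(iters):
        i, j = rng.sample(range(n), 2)
        dx = pos[j][0] - pos[i][0]
        dy = pos[j][1] - pos[i][1]
        d = math.hypot(dx, dy) or 1e-9
        f = lr * (d - D[i][j]) / d  # signed step along the pair's axis
        pos[i][0] += f * dx
        pos[i][1] += f * dy
        pos[j][0] -= f * dx
        pos[j][1] -= f * dy
    return pos

# three points with unit pairwise distances: an equilateral triangle
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
layout = mds_2d(D)
```

Because each pairwise update is independent, thousands of them can be evaluated simultaneously, which is where the GPU speedup comes from.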
Using the prefuse toolkit, I built a graph visualization tool based on a paper by Jarke van Wijk.
The MetaCombine Project at Emory University assessed using semantic clustering as a means of exploring digital libraries.
This report describes an algorithm for computing fast nonnegative matrix factorizations.
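For context, the standard Lee–Seung multiplicative updates that such fast methods improve upon can be sketched as follows (a generic pure-Python baseline for illustration, not the report's algorithm):

```python
import random

def matmul(A, B):
    """Dense matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, r, iters=500, seed=0):
    """Lee-Seung multiplicative updates for V ~= W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    eps = 1e-9  # avoid division by zero
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(r)]
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(n)]
    return W, H
```

The multiplicative form keeps both factors nonnegative automatically, but converges slowly, which is precisely what motivates faster alternatives.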
In 2004, my PC game, Growbot, was one of the Student Showcase winners in the Independent Games Festival.
I1 is an algorithm for stochastic optimization based on information filtering. It is designed to complement K1, a related optimization algorithm based on the Kalman filter.
I wrote a tutorial on matrix reordering algorithms. Minimum-degree reordering reduces fill-in during sparse matrix factorization, making it more efficient to solve linear systems (and hence to apply the inverse).
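The greedy idea behind minimum degree can be sketched on a graph adjacency structure (a simplified illustration; production orderings such as approximate minimum degree use cheaper degree estimates):

```python
def min_degree_order(adj):
    """Greedy minimum-degree ordering: repeatedly eliminate the vertex
    with the fewest neighbours, connecting its remaining neighbours
    into a clique (the fill edges that elimination would create)."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))  # lowest degree wins
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        for u in nbrs:
            for w in nbrs:
                if u != w:
                    adj[u].add(w)  # fill edge among v's neighbours
        order.append(v)
    return order

# path graph a - b - c - d: eliminating endpoints first creates no fill
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
order = min_degree_order(path)
```

Eliminating low-degree vertices first keeps the cliques, and hence the fill-in, small, which is why the resulting factors stay sparse.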