LiDAR may be the next big advancement in digital health. Here’s what to know about LiDAR and its role in medicine.

Photo by National Cancer Institute on Unsplash

What is LiDAR?

LiDAR, which stands for “light detection and ranging,” is not a new technology, although its addition to the plethora of sensors available on the Apple iPhone 12 Pro and iPad Pro has caused a recent buzz. LiDAR determines the distance between itself and an object by measuring how long it takes a pulse of light (often a laser) to bounce back. This is similar to how radar works, but with infrared light instead of radio waves. I like to think of LiDAR as a bat that uses light instead of sound.
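The time-of-flight idea described above can be sketched in a few lines of Python. This is only an illustration, not how any particular device implements it: the function name `distance_from_round_trip` and the 20-nanosecond example are hypothetical, and the sketch assumes the pulse travels at the speed of light in a vacuum.

```python
# Speed of light in a vacuum, in metres per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Estimate the distance to an object from a pulse's round-trip time.

    Hypothetical helper for illustration: the pulse travels to the object
    and back, so the one-way distance is half of the total path length.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


# Example: a pulse that returns after 20 nanoseconds (a made-up value).
print(f"{distance_from_round_trip(20e-9):.2f} m")  # prints "3.00 m"
```

Dividing by two is the step that is easiest to forget: the measured time covers the trip out and the trip back.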


An explanation of residuals, sum of squared residuals, simple linear regression, and multiple linear regression with code in R

Linear regression is one of the first concepts we learn in data science and machine learning. Yet many people are confused by linear regression and the terminology associated with it. In this article, we explore linear regression step by step. We discuss residuals, the sum of squared residuals (or errors), simple and multiple linear regression, and linear regression terminology. We then bring everything together in a simple example of linear regression in R.
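To make residuals and the sum of squared residuals concrete, here is a minimal sketch in plain Python (the article itself works in R). The data values and the function names are made up for illustration and are not taken from the article or its dataset. The sketch fits a simple linear regression with the standard least-squares formulas.

```python
# Made-up example data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """A residual is the observed y minus the y predicted by the line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squared_residuals(res):
    """Square each residual and add them up; least squares minimizes this."""
    return sum(r ** 2 for r in res)


intercept, slope = fit_simple_linear_regression(x, y)
res = residuals(x, y, intercept, slope)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"sum of squared residuals={sum_of_squared_residuals(res):.4f}")
```

Any other line drawn through these points would give a larger sum of squared residuals, which is what "least squares" means.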

Image by Wikipedia.

We are going to examine the different components of linear regression using a dataset based on the Seven Countries Study, which examined factors that affect cardiovascular disease around the world…


Supervised vs. Unsupervised Machine Learning in Plain Language

Photo by Tim Mossholder on Unsplash

I thought we should tackle one of the first questions asked in machine learning:

“What is the difference between supervised and unsupervised machine learning?”

In machine learning, two of the main subcategories are “supervised” and “unsupervised” learning. No, this doesn’t mean you have to watch supervised models run but can read a book while an unsupervised model runs. Let’s take a look at what these terms mean and how we might use each of them in data science.

We are going to explain the difference between supervised and unsupervised machine learning using a simple shape…


Feature Engineering for Machine Learning in Plain Language

Photo by William Iven on Unsplash

Machine learning may get all the credit, but the real work is done in engineering features for the machine learning models. Feature engineering is the process of using domain knowledge to create features (or variables) that become the input of a machine learning model. When developing a model, I spend the majority of my time on feature engineering. No matter how robust your model is, if you have poor features you will not achieve high accuracy (i.e., your model will be crap).

The Explanation

Let’s say you want to build a model that will tell you every day whether you should…


A beginner’s guide to learning the statistical programming language R

Photo by Roman Synkevych on Unsplash

I created a mini-course on R a couple of years ago and decided to incorporate the main ideas into a blog to reach more people trying to learn R. While I do most of my work in Python, you can’t beat R’s statistical packages and visualizations through ggplot2 (more on this later).

What is R?

R is an open-source (free!) software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS, making it an ideal platform for all your statistical computing needs. R is frequently used by data scientists, statisticians, and researchers.

Why should I learn R?

Next…


How to explain cross validation in plain language

Photo by Franki Chamaki on Unsplash

As a data scientist in the field of digital medicine, I often find myself having to explain machine learning models to non-technical individuals. This happens so frequently, in fact, that I have created a bunch of explanations for common machine learning methods that are so easy you could use them to explain machine learning to grandma! Make sure to check out Explaining Machine Learning to Grandma: Tree-based Models.

Today we are going to be discussing cross validation of machine learning models. …


How to explain tree-based machine learning models in plain language

Photo by Christopher Rusev on Unsplash

As a data scientist in the field of digital medicine, I often find myself having to explain machine learning models to non-technical individuals. This happens so frequently, in fact, that I have created a bunch of explanations for common machine learning methods that are so easy you could use them to explain machine learning to grandma!

Today we are going to be discussing tree-based machine learning models. Tree-based machine learning models, including decision trees, random forests, and gradient boosting, are commonly used machine learning methods.

You wake up and need to decide if you should bring your umbrella with you…


A data scientist’s guide to Thanksgiving — with code to make your own Thanksgiving pie charts

It’s that time of year again, when Google searches for turkey, casserole, and pie skyrocket. (Don’t believe me? Check for yourself!)

Thanksgiving is well known for its turkey, which makes sense because 46 million turkeys are consumed on Thanksgiving Day (Source). However, you may be surprised to know that Americans eat even more pumpkin pie on Thanksgiving than turkey: it is estimated that over 50 million pumpkin pies are eaten on Thanksgiving (Source). So perhaps instead of nicknaming Thanksgiving “Turkey Day,” it ought to be called “Pie Day.” And instead of the 1,000 turkey trots that occur across…


This tutorial is part of dbdpED, the educational platform for digital biomarker discovery. This tutorial is also available as a Jupyter Notebook. This is a beginner tutorial. If you are more advanced, we recommend our other case studies. Before starting, we recommend that you read this blog on the DBDP and the basics of digital biomarker discovery.

In this case study, we will be using continuous glucose monitor (CGM) data. CGMs are commonly used by people with Type 1 Diabetes.


With all of the election maps we have seen this week in the US, I thought it was time to release the next edition of Delightful Figures in Python with — you guessed it — plotting maps in Python!

I wanted to put together a series to guide Python-ers through generating delightful figures. Delightful figures can help you effectively tell your story, add credibility to your data analysis, and give your papers/articles/blogs a pop of fun. In Part 1 of the series, we focused on Donut Plots and in Part 2 of the series we created Word Clouds.

Getting Started

Some basic…

Digital Health Data Scientist | Lead developer at DBDP.org | Final year PhD Candidate at Duke | Ultramarathoner
