LiDAR, which stands for “light detection and ranging,” is not a new technology, although its addition to the plethora of sensors on the Apple iPhone 12 Pro and iPad Pro has caused a recent buzz. LiDAR determines the distance between itself and an object by measuring how long a pulse of light (often a laser) takes to bounce back. This is similar to how radar works, but with infrared light instead of radio waves. I like to think of LiDAR as a bat that uses light instead of sound.
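To make the timing idea concrete, here is a minimal Python sketch of the time-of-flight arithmetic: the pulse travels out and back, so the distance is half the round-trip time multiplied by the speed of light. The function name and the 20-nanosecond reading are assumptions made up for illustration, not values from any real sensor.

```python
# Speed of light in a vacuum, in metres per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a pulse's round-trip time.

    Hypothetical helper for illustration only; real sensors also correct
    for noise, timing jitter, and the medium the light travels through.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse travels to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2


# Made-up reading: a pulse that returns after 20 nanoseconds.
print(f"{distance_from_round_trip(20e-9):.2f} m")
```

A 20-nanosecond round trip works out to roughly 3 metres, which shows how short the time intervals are at room scale.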
Linear regression is one of the first concepts we learn in data science and machine learning. Yet many people are confused by linear regression and the common terminology associated with it. In this article, we explore linear regression step by step. We discuss residuals, the sum of squared residuals (or errors), simple and multiple linear regression, and linear regression terminology. We then bring everything together in a simple example of linear regression in R.
We are going to examine the different components of linear regression using a dataset based on the Seven Countries Study, which examined factors that affect cardiovascular disease around the world…
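As a rough illustration of residuals and the sum of squared residuals, here is a minimal sketch of simple (one-predictor) linear regression fitted by ordinary least squares. The article itself works in R; this sketch uses plain Python instead, and the function names and data points are invented for the example rather than taken from the Seven Countries Study.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # The least squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed value minus the value predicted by the line, per point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squared_residuals(res):
    """The quantity that ordinary least squares minimizes."""
    return sum(r * r for r in res)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear_regression(x, y)
res = residuals(x, y, intercept, slope)
print("intercept:", intercept, "slope:", slope)
print("sum of squared residuals:", sum_of_squared_residuals(res))
```

Any other choice of intercept and slope would give a larger sum of squared residuals on the same data, which is what "least squares" refers to.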
I thought we should tackle one of the first questions asked in machine learning:
“What is the difference between supervised and unsupervised machine learning?”
In the machine learning field, there are two subcategories called “supervised” and “unsupervised” learning. No, this doesn’t mean you have to watch a supervised model run but can read a book while an unsupervised model runs. Let’s take a look at what these terms mean and how we might use each of them in data science.
We are going to explain the difference between supervised and unsupervised machine learning using a simple shape…
Machine learning may get all the credit, but the real work is done in engineering features for the machine learning models. Feature engineering is the process of using domain knowledge to create features (or variables) that become the input of a machine learning model. When developing a model, I spend the majority of my time on feature engineering. It doesn’t matter how robust your model is: if you have poor features, you will not achieve high accuracy (i.e., your model will be crap).
Let’s say you want to build a model that will tell you every day whether you should…
I created a mini-course on R a couple of years ago and decided to incorporate the main ideas into a blog to reach more people trying to learn R. While I do most of my work in Python, you can’t beat R’s statistical packages and visualizations through ggplot2 (more on this later).
R is an open-source (free!) software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS, making it an ideal platform for all your statistical computing needs. R is frequently used by data scientists, statisticians, and researchers.
As a data scientist in the field of digital medicine, I often find myself having to explain machine learning models to non-technical individuals. This happens so frequently, in fact, that I have created a bunch of explanations for common machine learning methods that are so easy you could use them to explain machine learning to grandma! Make sure to check out Explaining Machine Learning to Grandma: Tree-based Models.
Today we are going to be discussing cross-validation of machine learning models. …
As a data scientist in the field of digital medicine, I often find myself having to explain machine learning models to non-technical individuals. This happens so frequently, in fact, that I have created a bunch of explanations for common machine learning methods that are so easy you could use them to explain machine learning to grandma!
Today we are going to be discussing tree-based machine learning models. Tree-based machine learning models, including decision trees, random forests, and gradient boosting, are commonly used machine learning methods.
You wake up and need to decide if you should bring your umbrella with you…
It’s that time of year again, when Google searches for turkey, casserole, and pie skyrocket. (Don’t believe me? Check for yourself!)
Thanksgiving is well known for its turkey, which makes sense because 46 million turkeys are consumed on Thanksgiving Day! (Source). However, you may be surprised to know that Americans eat even more pumpkin pie on Thanksgiving than turkey — it is estimated that over 50 million pumpkin pies are eaten on Thanksgiving. (Source). So perhaps instead of nicknaming Thanksgiving “Turkey Day” it ought to be called “Pie Day”. And instead of the 1,000 turkey trots that occur across…
This tutorial is part of dbdpED, the educational platform for digital biomarker discovery. This tutorial is also available as a Jupyter Notebook. This is a beginner tutorial. If you are more advanced, we recommend our other case studies. Before starting, we recommend that you read this blog on the DBDP and the basics of digital biomarker discovery.
In this case study, we will be using continuous glucose monitor (CGM) data. CGMs are commonly used by people with Type 1 Diabetes.
With all of the election maps we have seen this week in the US, I thought it was time to release the next edition of Delightful Figures in Python with — you guessed it — plotting maps in Python!
I wanted to put together a series to guide Python-ers through generating delightful figures. Delightful figures can help you effectively tell your story, add credibility to your data analysis, and give your papers/articles/blogs a pop of fun. In Part 1 of the series, we focused on Donut Plots and in Part 2 of the series we created Word Clouds.
Some basic…
Digital Health Data Scientist | Lead developer at DBDP.org | Final year PhD Candidate at Duke | Ultramarathoner