Understand how to import various data formats (CSV, TXT, Excel, SPSS) into R.
Become familiar with raw CGM data structures from different devices (Dexcom, iPro, Libre).
Use the cgmanalysis package to clean and standardize raw CGM files.
Before you can analyze CGM data, you first need to import it into R and then clean it so that it is ready for summaries, visualizations, and clinical reports.
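Most CGM devices allow you to export data as a CSV file (Comma‑Separated Values). R can read these files directly. Common columns include a subject identifier, a timestamp, and the sensor glucose reading (subjectid, timestamp, and sensorglucose in the sample file used in this chapter).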
You will use the read.csv() function (or read_csv() from the readr package) to bring the data into R as a data frame.
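As a minimal, self-contained sketch of that import step, the example below writes three rows (copied from the sample file used in this chapter) to a temporary file and reads them back with read.csv(). The temporary path and the object name cgm are illustrative assumptions, not part of the course data.

```r
# Write a tiny CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "subjectid,timestamp,sensorglucose",
  "1,1/11/18 4:07,101",
  "1,1/11/18 4:12,100",
  "1,1/11/18 4:17,101"
), csv_path)

# Import it as a data frame; header = TRUE uses the first line as column names.
cgm <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Inspect the structure: column names and the type R assigned to each column.
str(cgm)
```

Here str() reports timestamp as plain text (character) and the two numeric columns as integers. Checking the structure right after every import is a cheap way to catch surprises early.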
In R, the working directory is the folder where R looks for files to import from. You can check it with getwd() and change it with setwd(). We’ll assume the data used in this chapter is in a folder called cgm_data and you already know where this folder is.
# Check current working directory
getwd()
[1] "/Users/oscar/Lectures Notes. Continuous glucose monitoring data analysis"
# Set working directory (if needed)
# setwd("path/to/your/project/cgm_data")
R supports many file formats. We’ll focus on those commonly used for CGM data.
The working directory is the folder where R looks for files when you read them, and where it saves outputs by default. You can check it with:
getwd()
[1] "/Users/oscar/Lectures Notes. Continuous glucose monitoring data analysis"
You can change it with:
setwd("path/to/your/folder")
.csv files
Comma-separated values files (.csv) are the most common export format from CGM devices. Use read.csv() from base R or read_csv() from the readr package. This section covers only importing the data; variable types, and choosing and setting the right formats, are covered later.
# Base R
datos <- read.csv("cgm_data/cgm.csv", header = TRUE)
head(datos)
  subjectid    timestamp sensorglucose
1         1 1/11/18 4:07           101
2         1 1/11/18 4:12           100
3         1 1/11/18 4:17           101
4         1 1/11/18 4:22           107
5         1 1/11/18 4:27           105
6         1 1/11/18 4:32           105
# tidyverse alternative
library(readr)
datos <- read_csv("cgm_data/cgm.csv")
head(datos)
The readxl package provides read_excel() to import Excel workbooks. The sheet argument selects which worksheet to import when the workbook contains more than one.
library(readxl)
datos <- read_excel("cgm_data/cgm.xlsx", sheet = 1)
head(datos)
# A tibble: 6 × 3
  subjectid timestamp           sensorglucose
      <dbl> <dttm>                      <dbl>
1         1 2018-11-01 04:07:04           101
2         1 2018-11-01 04:12:04           100
3         1 2018-11-01 04:17:04           101
4         1 2018-11-01 04:22:05           107
5         1 2018-11-01 04:27:05           105
6         1 2018-11-01 04:32:04           105
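Notice that the two imports treat timestamp differently: read.csv() returned text such as 1/11/18 4:07, whereas read_excel() returned a proper date-time (2018-11-01 04:07:04). A hedged sketch of converting the text version by hand follows. It assumes the CSV stores day/month/two-digit year, which is what the Excel import suggests, and it uses UTC only to keep the example reproducible.

```r
# Timestamps as read.csv() returned them (text; day/month/year assumed).
timestamp_text <- c("1/11/18 4:07", "1/11/18 4:12", "1/11/18 4:17")

# Convert to POSIXct; the format string must match the text exactly.
timestamp <- as.POSIXct(timestamp_text, format = "%d/%m/%y %H:%M", tz = "UTC")

format(timestamp, "%Y-%m-%d %H:%M")
# "2018-11-01 04:07" "2018-11-01 04:12" "2018-11-01 04:17"
```

If the format string does not match the text, as.POSIXct() returns NA instead of raising an error, so it is worth checking for missing values after the conversion.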
SPSS files (.sav)
The haven package handles SPSS, SAS, or Stata files.
library(haven)
datos <- read_sav("cgm_data/cgm.sav")
head(datos)
# A tibble: 6 × 3
  subjectid timestamp           sensorglucose
      <dbl> <dttm>                      <dbl>
1         1 2018-11-01 04:07:04           101
2         1 2018-11-01 04:12:04           100
3         1 2018-11-01 04:17:04           101
4         1 2018-11-01 04:22:05           107
5         1 2018-11-01 04:27:05           105
6         1 2018-11-01 04:32:04           105
Raw data files from Dexcom, iPro or Libre have different structures. We will use the cgmanalysis package to convert them into a uniform format suitable for further analysis.
iPro <- read_excel("raw_registers/iPro.xlsx")
head(iPro, 20) # a messy collection of metadata
# A tibble: 20 × 22
Medtronic Diabetes iP…¹ ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 PATIENT INFO <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Name Test… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
3 Report Range 43313 to 43320 <NA> <NA> <NA> <NA> <NA> <NA>
4 DEVICE INFO <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 Glucose Sensor Recorder Medt… s/n:… <NA> <NA> <NA> <NA> <NA> <NA> <NA>
6 Meter Life… s/n:… <NA> <NA> <NA> <NA> <NA> <NA> <NA>
7 Data Exported on 4333… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
8 DEVICE DATA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
9 Number of Records 2051 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
10 Data Time Range 4331… to 4332… <NA> <NA> <NA> <NA> <NA> <NA>
11 Index Date Time Time… Sour… BG R… Used… ISIG… Sens… Sens…
12 1 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
13 2 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
14 3 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
15 4 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
16 5 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
17 6 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
18 7 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
19 8 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
20 9 43313 0.5 4331… GSR <NA> <NA> <NA> <NA> iPro…
# ℹ abbreviated name: ¹`Medtronic Diabetes iPro Data Export File (v1.1.0)`
# ℹ 12 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
# ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
# ...20 <chr>, ...21 <chr>, ...22 <chr>
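Two things stand out in this raw export: the real column header (Index, Date, Time, and so on) sits several rows below the top, and dates appear as Excel serial numbers such as 43313. The sketch below shows one way to handle both by hand. It uses a made-up miniature export in CSV form; the sample lines and object names are assumptions for illustration only, and cleandata() takes care of these steps for you.

```r
# Hypothetical miniature export: metadata lines first, then the real header.
raw_lines <- c(
  "PATIENT INFO,,",
  "Name,Test,",
  "DEVICE DATA,,",
  "Index,Date,Time",
  "1,43313,0.5",
  "2,43313,0.75"
)

# Locate the header row and read only from there onwards.
header_row <- grep("^Index,", raw_lines)
records <- read.csv(text = raw_lines[header_row:length(raw_lines)])

# Excel (Windows) counts days from 1899-12-30; Time is a fraction of a day.
records$date <- as.Date(records$Date, origin = "1899-12-30")
records$hour <- records$Time * 24

records
```

Serial 43313 converts to 2018-08-01, and the day fractions 0.5 and 0.75 correspond to 12:00 and 18:00.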
Expected Clean Output
When the cleaning works correctly, your data should look like this:
| subjectid | timestamp | sensorglucose |
|---|---|---|
| 1234567 | 2018-11-01 00:02:05 | 117 |
| 1234567 | 2018-11-01 00:07:06 | 127 |
| 1234567 | 2018-11-01 00:12:06 | 119 |
| … | … | … |
cgmanalysis::cleandata
The cleandata() function from the cgmanalysis package automates the cleaning of raw CGM export files. You point it to a folder containing the original files (e.g., CSV exports from sensors), and it processes each file, applies standard cleaning rules, and saves cleaned versions in a separate folder. This ensures every file ends up with the same column names, date‑time format, and validated glucose values.
Arguments
| Argument | Description |
|---|---|
| inputdirectory | The path to the folder containing your raw CGM files (e.g., "raw_registers"). |
| outputdirectory | The folder where the cleaned CSV files will be saved (e.g., "processed_cgm"). |
| removegaps | Logical (TRUE or FALSE). If TRUE, gaps in the data are removed; if FALSE, gaps are kept. |
| gapfill | Logical (TRUE or FALSE). If TRUE, gaps no longer than maximumgap are filled by interpolation rather than removed. The package documentation describes this as applying when removegaps = TRUE. |
| maximumgap | The longest gap, in minutes, that will be filled by interpolation (e.g., 20). |
| verbose | Logical (TRUE or FALSE). If TRUE, the function prints the name of each file as it processes it. |
How it works
1. Scans inputdirectory for the raw export files (in this chapter a .txt, a .xlsx, and a .csv file).
2. Reads each file and converts it to the standard three-column layout: subjectid, timestamp, and sensorglucose.
3. Handles gaps in the readings according to removegaps, gapfill, and maximumgap.
4. Saves each cleaned file as a CSV in outputdirectory under the same base name (e.g., Dexcom.txt becomes Dexcom.csv).
Before long, you will be using this function to clean dozens of patient files in seconds, turning raw sensor exports into analysis‑ready data frames without manual work.
Why cleandata() is a game‑changer
Manual cleaning is slow and error‑prone. With this function, you define the rules once and apply them consistently to every file, every time. It is the same principle as the for loop but packaged into a convenient, purpose‑built tool.
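To make the for-loop comparison concrete, here is a minimal base R sketch of the manual approach. It is not how cleandata() is implemented; the toy files, the column names Time and Glucose, and the folder names are all assumptions made up for this illustration.

```r
# Create two toy "raw" files in temporary folders (hypothetical layout).
raw_dir   <- file.path(tempdir(), "raw_demo")
clean_dir <- file.path(tempdir(), "clean_demo")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(clean_dir, showWarnings = FALSE)

toy <- data.frame(
  Time    = c("2018-11-01 00:02:05", "2018-11-01 00:07:06"),
  Glucose = c(115, 113)
)
write.csv(toy, file.path(raw_dir, "patient_a.csv"), row.names = FALSE)
write.csv(toy, file.path(raw_dir, "patient_b.csv"), row.names = FALSE)

# Apply the same cleaning rules to every file in the folder.
for (f in list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)) {
  raw <- read.csv(f)
  clean <- data.frame(
    subjectid     = tools::file_path_sans_ext(basename(f)),  # ID taken from the file name
    timestamp     = raw$Time,
    sensorglucose = raw$Glucose
  )
  write.csv(clean, file.path(clean_dir, basename(f)), row.names = FALSE)
}

list.files(clean_dir)
```

Every file passes through exactly the same steps, which is the consistency argument made above. A real export needs far more handling (metadata rows, device-specific columns, gaps), and that is the part the package provides.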
# Install (once) and load
install.packages("cgmanalysis")
library(cgmanalysis)
# Clean the raw files
cleandata(
  inputdirectory = "raw_registers",   # folder with raw files
  outputdirectory = "processed_cgm",  # where cleaned files go
  removegaps = FALSE,                 # keep gaps in data
  gapfill = TRUE,                     # interpolate short gaps (documented as applying when removegaps = TRUE)
  maximumgap = 20,                    # maximum gap length (minutes) to fill
  verbose = TRUE)                     # print each file name as it is processed
[1] "Dexcom.txt"
[1] "iPro.xlsx"
[1] "Libre.csv"
After cleaning, we can read the resulting CSV files.
libre <- read.csv("processed_cgm/Libre.csv", header = TRUE)
iPro <- read.csv("processed_cgm/iPro.csv", header = TRUE)
dexcom <- read.csv("processed_cgm/Dexcom.csv", header = TRUE)
Each cleaned file contains the columns subjectid, timestamp, and sensorglucose.
names(iPro)
[1] "subjectid"     "timestamp"     "sensorglucose"
head(libre)
            subjectid           timestamp sensorglucose
1      Test Testerson 2018-08-01 12:00:00           117
2 08/01/2018 14:00:00 2018-08-01 12:15:00           127
3 08/15/2018 11:59:00 2018-08-01 12:30:00           125
4                     2018-08-01 12:45:00           119
5                     2018-08-01 13:00:00            99
6                     2018-08-01 13:15:00           104
tail(libre)
     subjectid           timestamp sensorglucose
1332           2018-08-15 08:44:00           105
1333           2018-08-15 08:59:00           113
1334           2018-08-15 09:14:00           109
1335           2018-08-15 09:29:00           104
1336           2018-08-15 09:44:00            95
1337           2018-08-15 09:59:00            99
head(dexcom)
            subjectid           timestamp sensorglucose
1             1234567 2018-11-01 00:02:05           115
2 11/01/2018 01:02:05 2018-11-01 00:07:06           113
3 11/06/2018 02:31:50 2018-11-01 00:12:06           121
4                     2018-11-01 00:17:06           117
5                     2018-11-01 00:22:06           119
6                     2018-11-01 00:27:05           123
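In the cleaned files shown above, only the first row of subjectid holds the identifier (rows 2 and 3 hold date-times and the remaining rows are empty), and timestamp comes back from read.csv() as text. Below is a small sketch of two follow-up steps, using the first Dexcom rows shown above. The object name cleaned and the choice of UTC are assumptions for illustration.

```r
# First rows of the cleaned Dexcom file, typed in by hand for a runnable example.
cleaned <- data.frame(
  subjectid = c("1234567", "11/01/2018 01:02:05", "11/06/2018 02:31:50", "", ""),
  timestamp = c("2018-11-01 00:02:05", "2018-11-01 00:07:06",
                "2018-11-01 00:12:06", "2018-11-01 00:17:06",
                "2018-11-01 00:22:06"),
  sensorglucose = c(115, 113, 121, 117, 119),
  stringsAsFactors = FALSE
)

# Copy the identifier from the first row to every row.
cleaned$subjectid <- cleaned$subjectid[1]

# Parse the text timestamps into date-times.
cleaned$timestamp <- as.POSIXct(cleaned$timestamp,
                                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Minutes between consecutive readings (a quick sampling-interval check).
interval_min <- as.numeric(diff(cleaned$timestamp), units = "mins")
round(interval_min, 1)
```

Overwriting subjectid discards the date-times stored in its second and third rows, so keep a copy first if you need them.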
We learned how to import CGM data from various formats and clean it using the cgmanalysis package. In the next session, we’ll start exploring the data and learn basic R programming concepts.