3  Diving into CGM

3.1 Learning Objectives

  • Understand how to import various data formats (CSV, TXT, Excel, SPSS) into R.

  • Become familiar with raw CGM data structures from different devices (Dexcom, iPro, Libre).

  • Use the cgmanalysis package to clean and standardize raw CGM files.

3.2 Importing and Cleaning CGM Data

Before you can analyze CGM data, you first need to import it into R and then clean it so that it is ready for summaries, visualizations, and clinical reports.

Most CGM devices allow you to export data as a CSV file (Comma‑Separated Values). R can read these files directly. Common columns include:

  • Time – date and time of each glucose reading.
  • Glucose – sensor glucose value (mg/dL or mmol/L).
  • Sensor ID or Patient ID – identifier for the device or patient.

You will use the read.csv() function (or read_csv() from the readr package) to bring the data into R as a data frame.

3.3 Setting the Working Directory

In R, the working directory is the folder where R looks for files when you import them. You can check it with getwd() and change it with setwd(). We assume the data used in this chapter is in a folder called cgm_data and that you know where this folder is. The import examples in this chapter use relative paths such as "cgm_data/cgm.csv", so the working directory should be the folder that contains cgm_data.

# Check current working directory
getwd()
[1] "/Users/oscar/Lectures Notes. Continuous glucose monitoring data analysis"
# Set working directory (if needed) to the folder that contains cgm_data
# setwd("path/to/your/project")

3.4 Importing Data

R supports many file formats. We’ll focus on those commonly used for CGM data.

3.4.1 Working Directory

The working directory is the folder where R looks for files when you read them, and where it saves outputs by default. You can check it with:

getwd()
[1] "/Users/oscar/Lectures Notes. Continuous glucose monitoring data analysis"

You can change it with:

setwd("path/to/your/folder")

3.5 Method: Using the RStudio Menu

If you prefer a visual approach over writing code, RStudio provides a straightforward way to manage your file paths through the interface.

Step-by-Step Instructions

  1. Navigate to the Menu
    Look at the top toolbar in RStudio and click on the Session menu.

  2. Select Directory Options
    Hover over Set Working Directory and then select Choose Directory… from the submenu.

  3. Locate Your Folder
    A file browser window will appear. Browse to the specific folder where your data files are stored.

  4. Confirm
    Click Open (or Select Folder on some systems).

Verifying the Change

Once you have completed these steps, RStudio automatically updates the working directory for your current session. You can store the current path in a variable at any time by typing the following into your console:

mydir <- getwd()

With list.files() we can check what this folder contains:

list.files(mydir) # files present in the folder
[1] "cgm.csv"  "cgm.sav"  "cgm.xlsx"
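
Before importing, it can help to confirm that a file really is where R will look for it. The following is a minimal sketch rather than part of the chapter's workflow: it builds a path with file.path() and tests it with file.exists(). The folder and file are created in a temporary directory only so that the example runs anywhere; demo_dir and demo_file are hypothetical names.

```r
# Create a throwaway folder and file so the example is self-contained
demo_dir <- file.path(tempdir(), "cgm_demo")   # hypothetical folder
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "cgm.csv")    # hypothetical file
write.csv(data.frame(subjectid = 1, sensorglucose = 101),
          demo_file, row.names = FALSE)

# file.path() joins folder and file names with the correct separator
file.exists(demo_file)                         # TRUE: safe to import
file.exists(file.path(demo_dir, "nope.csv"))   # FALSE: wrong name or folder

# list.files() can also filter by pattern, e.g. only CSV files
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files
```

If file.exists() returns FALSE for a file you expect to be there, the usual cause is a working directory that differs from the one you assumed.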

3.6 Importing .csv files

Comma-separated values (.csv) files are the most common export format from CGM devices. Use read.csv() from base R or read_csv() from the readr package. This section covers only the import itself; variable types, and choosing and setting the right format, are addressed later.

# Base R
datos <- read.csv("cgm_data/cgm.csv", header = TRUE)
head(datos)
  subjectid    timestamp sensorglucose
1         1 1/11/18 4:07           101
2         1 1/11/18 4:12           100
3         1 1/11/18 4:17           101
4         1 1/11/18 4:22           107
5         1 1/11/18 4:27           105
6         1 1/11/18 4:32           105
# tidyverse alternative
library(readr)
datos <- read_csv("cgm_data/cgm.csv")
head(datos)
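
Note that read.csv() imports a timestamp column like the one above as plain text. As a small preview of format handling, here is a hedged sketch of converting such text into a date-time with as.POSIXct(). It assumes the values are day/month/year (which matches the 2018-11-01 dates shown when the same data are read from Excel) and uses UTC as an arbitrary time zone; check both assumptions against your own export.

```r
# A few rows typed inline so the example does not depend on any file
csv_text <- "subjectid,timestamp,sensorglucose
1,1/11/18 4:07,101
1,1/11/18 4:12,100
1,1/11/18 4:17,101"
demo <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
class(demo$timestamp)   # "character": not yet a date-time

# Assumption: day/month/2-digit year, 24-hour clock, UTC time zone
demo$timestamp <- as.POSIXct(demo$timestamp,
                             format = "%d/%m/%y %H:%M", tz = "UTC")
class(demo$timestamp)   # "POSIXct" "POSIXt"
diff(demo$timestamp)    # spacing between consecutive readings
```

If the format string does not match the text, as.POSIXct() returns NA, so a quick check with anyNA() after conversion is a sensible habit.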

3.7 Importing Excel files

The readxl package provides read_excel() to import this common data format. The sheet argument selects which worksheet to import when the workbook contains more than one.

library(readxl)
datos <- read_excel("cgm_data/cgm.xlsx", sheet = 1)
head(datos)
# A tibble: 6 × 3
  subjectid timestamp           sensorglucose
      <dbl> <dttm>                      <dbl>
1         1 2018-11-01 04:07:04           101
2         1 2018-11-01 04:12:04           100
3         1 2018-11-01 04:17:04           101
4         1 2018-11-01 04:22:05           107
5         1 2018-11-01 04:27:05           105
6         1 2018-11-01 04:32:04           105
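
When a workbook has several worksheets, it helps to list them before choosing a value for sheet. The sketch below uses excel_sheets() on the example workbook that ships with readxl, so no CGM file is needed; the workbook is only a stand-in and has nothing to do with CGM data.

```r
library(readxl)

# readxl ships a small example workbook; we use it as a stand-in
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding what to pass to `sheet`
sheets <- excel_sheets(path)
sheets

# A sheet can be selected by position or by name
by_position <- read_excel(path, sheet = 1)
by_name     <- read_excel(path, sheet = sheets[1])
identical(names(by_position), names(by_name))
```

Selecting by name is usually the safer choice, since it keeps working if someone reorders the sheets.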

3.8 Importing SPSS files (.sav)

The haven package reads SPSS, SAS, and Stata files. For SPSS .sav files, use read_sav().

library(haven)
datos <- read_sav("cgm_data/cgm.sav")
head(datos)
# A tibble: 6 × 3
  subjectid timestamp           sensorglucose
      <dbl> <dttm>                      <dbl>
1         1 2018-11-01 04:07:04           101
2         1 2018-11-01 04:12:04           100
3         1 2018-11-01 04:17:04           101
4         1 2018-11-01 04:22:05           107
5         1 2018-11-01 04:27:05           105
6         1 2018-11-01 04:32:04           105
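
If you have no SPSS file at hand, one way to try read_sav() is a round trip: write a small data frame with write_sav() and read it back. This is a minimal sketch with made-up values; demo and sav_path are hypothetical names.

```r
library(haven)

# Toy data frame standing in for a CGM export (values are illustrative)
demo <- data.frame(subjectid     = c(1, 1, 1),
                   sensorglucose = c(101, 100, 101))

# Write to a temporary .sav file, then read it back
sav_path <- tempfile(fileext = ".sav")
write_sav(demo, sav_path)
back <- read_sav(sav_path)

names(back)
as.data.frame(back)   # convert the tibble to a base data frame if preferred
```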

3.9 Raw CGM Data from Different Devices

Raw data files from Dexcom, iPro or Libre have different structures. We will use the cgmanalysis package to convert them into a uniform format suitable for further analysis.

iPro <- read_excel("raw_registers/iPro.xlsx")
head(iPro, 20) # a messy collection of "metadata"
# A tibble: 20 × 22
   Medtronic Diabetes iP…¹ ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10
   <chr>                   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 PATIENT INFO            <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 2 Name                    Test… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 3 Report Range            43313 to    43320 <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 4 DEVICE INFO             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 5 Glucose Sensor Recorder Medt… s/n:… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 6 Meter                   Life… s/n:… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 7 Data Exported on        4333… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 8 DEVICE DATA             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
 9 Number of Records       2051  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
10 Data Time Range         4331… to    4332… <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
11 Index                   Date  Time  Time… Sour… BG R… Used… ISIG… Sens… Sens…
12 1                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
13 2                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
14 3                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
15 4                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
16 5                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
17 6                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
18 7                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
19 8                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
20 9                       43313 0.5   4331… GSR   <NA>  <NA>  <NA>  <NA>  iPro…
# ℹ abbreviated name: ¹​`Medtronic Diabetes iPro Data Export File (v1.1.0)`
# ℹ 12 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>,
#   ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
#   ...20 <chr>, ...21 <chr>, ...22 <chr>
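
Two things in the output above show why this export needs cleaning: the real column headers (Index, Date, Time, ...) sit in row 11, below the metadata, and dates appear as numbers such as 43313. The sketch below shows, on a toy data frame, how one might handle both by hand. It is an illustration only and not a description of what cgmanalysis does internally; the toy column "Glucose" is a hypothetical name, and the origin "1899-12-30" is the usual convention for converting Excel day counts to dates in R.

```r
# Toy stand-in for the imported sheet: metadata rows, a header row, then data
raw <- data.frame(
  V1 = c("PATIENT INFO", "Name", "Index", "1", "2"),
  V2 = c(NA, "Test", "Date", "43313", "43313"),
  V3 = c(NA, NA, "Glucose", "117", "127"),   # "Glucose" is a made-up header
  stringsAsFactors = FALSE
)

# Find the row that holds the real column names
header_row <- which(raw$V1 == "Index")[1]

# Keep only the rows below it and promote that row to column names
clean <- raw[(header_row + 1):nrow(raw), , drop = FALSE]
names(clean) <- as.character(unlist(raw[header_row, ]))
rownames(clean) <- NULL

# Excel stores dates as day counts; 43313 becomes 2018-08-01
clean$Date    <- as.Date(as.numeric(clean$Date), origin = "1899-12-30")
clean$Glucose <- as.numeric(clean$Glucose)
clean
```

Doing this by hand for every device format quickly becomes tedious, which is the motivation for using a dedicated cleaning function.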

Expected Clean Output

When the cleaning works correctly, your data should look like this:

subjectid  timestamp            sensorglucose
1234567    2018-11-01 00:02:05  117
1234567    2018-11-01 00:07:06  127
1234567    2018-11-01 00:12:06  119

3.9.1 Using cgmanalysis::cleandata

The cleandata() function from the cgmanalysis package automates the cleaning of raw CGM export files. You point it to a folder containing the original files (e.g., exports from Dexcom, iPro, or Libre sensors), and it processes each file and saves a cleaned version in a separate folder. This ensures every file ends up with the same three columns (subjectid, timestamp, sensorglucose) and a consistent date-time format.

Arguments

The table lists the arguments used in this chapter; see ?cleandata for the full list and the default values.

Argument         Description
inputdirectory   The path to the folder containing your raw CGM files (e.g., "raw_registers").
outputdirectory  The folder where the cleaned files will be saved (e.g., "processed_cgm").
removegaps       Logical (TRUE or FALSE). Controls whether gaps in the recording are handled; with FALSE, gaps are kept in the data.
gapfill          Logical (TRUE or FALSE). If TRUE, short gaps are filled by interpolation.
maximumgap       The longest gap, in minutes, that will be filled by interpolation (e.g., 20).
verbose          Logical (TRUE or FALSE). If TRUE, the function prints the name of each file as it processes it.

How it works

  1. The function goes through the files in inputdirectory (in this chapter, a .txt, an .xlsx, and a .csv export).
  2. For each file, it reads the data and extracts the subject identifier, the timestamps, and the sensor glucose values, whatever the device-specific layout.
  3. It converts the timestamps to a common date-time format.
  4. It handles gaps in the recording according to removegaps, gapfill, and maximumgap.
  5. It saves the cleaned data to outputdirectory as a CSV file with the same base name (for example, Dexcom.txt becomes Dexcom.csv).

Before long, you will be using this function to clean dozens of patient files in seconds — turning raw sensor exports into analysis‑ready data frames without manual work.

Why cleandata() is a game‑changer
Manual cleaning is slow and error‑prone. With this function, you define the rules once and apply them consistently to every file, every time. It is the same principle as the for loop but packaged into a convenient, purpose‑built tool.
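
The "define the rules once, apply them to every file" idea can be sketched in a few lines of base R. This is only an illustration of the principle, not the code inside cleandata(): the folders, file names, and the standardize() helper are all hypothetical, and the helper simply assumes each file has its three columns in the order identifier, time, glucose.

```r
# Two toy "raw" exports with different column names (all names hypothetical)
in_dir  <- file.path(tempdir(), "raw_demo")
out_dir <- file.path(tempdir(), "clean_demo")
dir.create(in_dir,  showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1, time = "2018-11-01 00:02:05", glucose = 117),
          file.path(in_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(ID = 2, Time = "2018-11-01 00:07:06", Glucose = 127),
          file.path(in_dir, "b.csv"), row.names = FALSE)

# One rule, defined once: three columns with fixed names
# (assumes the column order is identifier, time, glucose)
standardize <- function(df) {
  names(df) <- c("subjectid", "timestamp", "sensorglucose")
  df
}

# Apply the same rule to every file in the input folder
for (f in list.files(in_dir, pattern = "\\.csv$")) {
  cleaned <- standardize(read.csv(file.path(in_dir, f)))
  write.csv(cleaned, file.path(out_dir, f), row.names = FALSE)
}
list.files(out_dir)
```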


# Install (only needed once), then load
install.packages("cgmanalysis")
library(cgmanalysis)
# Clean the raw files
cleandata(
  inputdirectory  = "raw_registers",          # folder with raw files
  outputdirectory = "processed_cgm",  # where cleaned files go
  removegaps = FALSE, # keep gaps in data
  gapfill = TRUE,     # fill short gaps by interpolation
  maximumgap = 20,    # maximum gap length (minutes) to fill
  verbose = TRUE)
[1] "Dexcom.txt"
[1] "iPro.xlsx"
[1] "Libre.csv"
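
To make "fill short gaps by interpolation" concrete, here is a minimal base R sketch of the idea. It is not the package's implementation: it assumes one reading every 5 minutes, measures a gap as the number of missing readings times that interval, and fills only gaps no longer than a threshold that mirrors maximumgap. All values are made up.

```r
# Toy series: one reading every 5 minutes, NA marks a missing reading
glucose      <- c(100, NA, NA, 106, NA, NA, NA, NA, NA, 118)
interval_min <- 5    # assumed sampling interval in minutes
maximumgap   <- 20   # longest gap (minutes) we are willing to fill

idx   <- seq_along(glucose)
known <- !is.na(glucose)

# Linear interpolation across every gap
interp <- approx(idx[known], glucose[known], xout = idx)$y

# Length (in readings) of the run each position belongs to
runs    <- rle(is.na(glucose))
run_len <- rep(runs$lengths, runs$lengths)

# Fill only the missing values that sit in a short enough gap
short_gap <- is.na(glucose) & (run_len * interval_min <= maximumgap)
filled <- glucose
filled[short_gap] <- interp[short_gap]
filled   # the 10-minute gap is filled (102, 104); the 25-minute gap stays NA
```

Leaving long gaps empty is a deliberate choice: interpolating across them would invent glucose values for periods when the sensor recorded nothing.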

After cleaning, we can read the resulting CSV files.

libre  <- read.csv("processed_cgm/Libre.csv",  header = TRUE)
iPro   <- read.csv("processed_cgm/iPro.csv",   header = TRUE)
dexcom <- read.csv("processed_cgm/Dexcom.csv", header = TRUE)

Each cleaned file contains the columns subjectid, timestamp, and sensorglucose. In the output below, the subjectid column holds the identifier only in its first row; the next two rows hold what appear to be the start and end times of the recording, and the remaining rows are empty.

names(iPro)
[1] "subjectid"     "timestamp"     "sensorglucose"
head(dexcom)
            subjectid           timestamp sensorglucose
1             1234567 2018-11-01 00:02:05           115
2 11/01/2018 01:02:05 2018-11-01 00:07:06           113
3 11/06/2018 02:31:50 2018-11-01 00:12:06           121
4                     2018-11-01 00:17:06           117
5                     2018-11-01 00:22:06           119
6                     2018-11-01 00:27:05           123
head(libre)
            subjectid           timestamp sensorglucose
1      Test Testerson 2018-08-01 12:00:00           117
2 08/01/2018 14:00:00 2018-08-01 12:15:00           127
3 08/15/2018 11:59:00 2018-08-01 12:30:00           125
4                     2018-08-01 12:45:00           119
5                     2018-08-01 13:00:00            99
6                     2018-08-01 13:15:00           104
tail(libre)
     subjectid           timestamp sensorglucose
1332           2018-08-15 08:44:00           105
1333           2018-08-15 08:59:00           113
1334           2018-08-15 09:14:00           109
1335           2018-08-15 09:29:00           104
1336           2018-08-15 09:44:00            95
1337           2018-08-15 09:59:00            99

3.10 Summary

We learned how to import CGM data from various formats and how to clean it using the cgmanalysis package. In the next session, we’ll start exploring the data and learn basic R programming concepts.