R is a free, open-source environment built for the statistical analysis of data. In this course, you will use R to analyse continuous glucose monitoring (CGM) data: calculating time in range, visualizing glucose trends, and generating summary reports. R is ideal for this because:
- It handles time-stamped sensor data cleanly.
- It has ready-made packages for data manipulation and plotting (e.g., dplyr, ggplot2), plus CGM-specific packages for diabetes metrics.
- It makes reproducible work easy: you can save your analysis steps and repeat them for any patient.
You do not need to be a programmer. The course will give you step‑by‑step code and explain every piece.
R does not work like point‑and‑click software. It is program‑oriented: you give commands (code) and R executes them step by step.
Hopefully, with the rise of natural language programming (AI), the whole analysis landscape will change, but in the meantime, learning some core concepts will make you fluent.
Think of an object as a container that holds a piece of information.
In R, everything you work with is an object, from a single glucose value (e.g., 120) to an entire CGM dataset.
Each object has a class: a label that tells R what kind of information it contains and what you can do with it.
| Class | What it stores | CGM example | Typical use / functions |
|---|---|---|---|
| numeric | Numbers | Glucose value (120, 95, 185) | mean(), summary() |
| integer | Whole numbers | Count of hypoglycemic events | min(), max() |
| character | Text | Sensor serial number | table() |
| factor | Categories | “Day” / “Night” | levels() |
| POSIXct / Date | Dates and times | Time of glucose reading | difftime() |
| data.frame / tibble | Tables (rows and columns) | Entire CGM dataset | subset() |
| list | A container that can hold multiple objects | A summary report with numbers, text, and a plot | Access with $ or [ ] |
Date vs. POSIXct
- Date: stores only the calendar date (e.g., "2025-03-20"). Use for daily summaries.
- POSIXct: stores date and time (e.g., "2025-03-20 14:30:00"). Essential for tracing glucose over hours and calculating time‑in‑range.
object1 <- 3 + 2 # the symbol "<-" is the assignment operator: it creates the object
object1
[1] 5
object2 <- "woman" # a second object, this time holding text
object3 <- TRUE
With the function class() we can check the class of each object.
class(object1)
[1] "numeric"
class(object2)
[1] "character"
class(object3)
[1] "logical"
The class of an object determines which functions work with it.
If you use the wrong class, R will either give you an error or — even worse — give you a meaningless result without any warning.
- Glucose values are usually numeric. You can calculate mean(), median(), min(), max(), all useful for metrics. If glucose were stored as character (text), mean() would return NA with a warning instead of a number.
- Time stamps (e.g., "2025-03-20 14:30:00") are usually POSIXct. You can calculate time differences (difftime()), extract the hour (lubridate::hour()), or plot glucose over time correctly. If they were stored as character, a plot would treat time as unrelated text labels, and you could not compute durations.
- Sensor ID or patient ID is usually character (text) or factor (category). You can group by it (group_by(sensor_id)) or count how many readings each sensor has. Do not take the mean() of a patient ID: that would be meaningless, and R will either complain or produce nonsense.
Always check your class
If a function does not behave as expected, use class() on the object. Many “strange” results or errors in R come from mismatched classes.
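As a minimal sketch of what a class mismatch looks like, the hypothetical vector glucose_text below stores glucose readings as text, the way a badly imported CSV column might. The object names and values are made up for illustration.

```r
# Hypothetical example: glucose values that were imported as text
glucose_text <- c("110", "145", "130")
class(glucose_text)                  # "character"

# mean() cannot average text: it returns NA and issues a warning
bad_mean <- suppressWarnings(mean(glucose_text))
bad_mean                             # NA

# After converting to numeric, the same call works
glucose_num <- as.numeric(glucose_text)
class(glucose_num)                   # "numeric"
good_mean <- mean(glucose_num)
good_mean                            # about 128.33
```

Checking class() first would have revealed the problem before any statistic was computed.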
When you import CGM data from a CSV file, R sometimes guesses the class incorrectly. You can convert between classes using functions like:
- as.numeric(): to turn text that looks like numbers into actual numbers.
- as.POSIXct(): to convert text dates into date-time objects.
- as.factor(): to turn text into categories.
You will practice these conversions in the course — they are essential for preparing CGM data for analysis.
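The three conversions can be sketched on a small, made-up import where every column arrived as text. The data frame raw, its column names, the "LOW" flag, and the UTC time zone are all assumptions for illustration, not the layout of any particular sensor export.

```r
# Hypothetical raw import: every column arrived as text
raw <- data.frame(
  time    = c("2025-03-20 08:00:00", "2025-03-20 08:15:00", "2025-03-20 08:30:00"),
  glucose = c("110", "LOW", "130"),    # "LOW" mimics an out-of-range flag (assumed)
  period  = c("Day", "Day", "Night"),
  stringsAsFactors = FALSE
)

# Text -> numbers; entries that are not numbers become NA (with a warning)
raw$glucose <- suppressWarnings(as.numeric(raw$glucose))

# Text -> date-time; tz = "UTC" is an assumption, use your sensor's time zone
raw$time <- as.POSIXct(raw$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Text -> categories
raw$period <- as.factor(raw$period)

sapply(raw, function(col) class(col)[1])   # first class of each column
mean(raw$glucose, na.rm = TRUE)            # 120, ignoring the NA
```

Note that as.numeric() silently turns anything it cannot read into NA, so it is worth counting the NA values after a conversion.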
Key takeaway: Objects are the pieces of data you work with, while classes tell R what kind of data each object holds and what actions are allowed.
When working with CGM data, you will encounter two main date/time classes in R:
- Date: stores only the calendar date (e.g., 2025-03-20). Use when: you need daily summaries (e.g., average glucose per day).
- POSIXct: stores both date and time (e.g., 2025-03-20 14:30:00). Use when: you work with sensor traces, need to calculate time between readings, or want to see glucose patterns by hour.
In this course, most of your CGM data will be imported as POSIXct because the time of each glucose reading matters for metrics like time in range, hypoglycemia duration, and for plotting ambulatory glucose profiles (AGP).
Always check your time column
After importing your data, use class(cgm_data$time) to verify it is POSIXct. If it is not, you can convert it with as.POSIXct() — we will practice this together.
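date1 <- as.Date("2025-01-24")
date2 <- as.Date("2025-06-15")
class(date1)
[1] "Date"
difftime(date2, date1, units = "days")
Time difference of 142 days
date_hour1 <- as.POSIXct("2025-01-24 15:30:00")
date_hour2 <- as.POSIXct("01-06-2025 18:30:00", format = "%d-%m-%Y %H:%M:%OS")
difftime(date_hour2, date_hour1, units = "hours")
Time difference of 3074 hours
# 128 days and 3 hours would be 3075 hours; the result is one hour shorter here because the clocks of the session's time zone moved forward for daylight saving time in between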
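To connect the two classes, here is a small sketch that starts from POSIXct time stamps, measures the gap between readings, and then drops the clock time to build a daily summary. The readings are made up, and fixing the time zone to UTC is an assumption made so the result does not depend on the computer's settings.

```r
# Hypothetical readings spanning two days (tz = "UTC" is an assumption)
time <- as.POSIXct(c("2025-03-20 08:00:00", "2025-03-20 20:00:00",
                     "2025-03-21 08:00:00", "2025-03-21 20:00:00"), tz = "UTC")
glucose <- c(110, 150, 100, 140)

# POSIXct keeps the clock time, so gaps between readings can be measured
gap_hours <- as.numeric(difftime(time[2], time[1], units = "hours"))
gap_hours                                   # 12

# Date drops the clock time, which is what a daily summary needs
day <- as.Date(time, tz = "UTC")
class(day)                                  # "Date"

# Mean glucose per calendar day (days used as text labels for grouping)
daily_mean <- tapply(glucose, format(day), mean)
daily_mean                                  # 130 for 2025-03-20, 120 for 2025-03-21
```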
Logical operators are questions you ask the data. In R, the answer to these questions is always a binary “Nursing Assessment”: YES (TRUE) or NO (FALSE).
| Operator | Meaning | Clinical Example |
|---|---|---|
| > | Greater than | Is the Temperature > 38.0°C? |
| < | Less than | Is the Glucose < 70 mg/dL? |
| == | Exactly equal to | Is the Patient ID == "12345"? |
| != | Not equal to | Is the Heart Rhythm != "Sinus"? |
| & | AND (Both must be true) | Is BP low & is Heart Rate high? |
| \| | OR (Either can be true) | Is the patient in pain \| having a fever? |
The Double Equal: Notice that we use == to ask a question (Is this equal to that?). In R, a single = is used to assign a value (like writing a note in a chart), just as <- is.
- bp = 120 (Setting the BP to 120).
- bp == 120 (Asking: “Is the BP 120?”).
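Applied to CGM data, these operators can be sketched as follows. The glucose values are made up, and the 70 and 180 mg/dL limits are used here only as illustrative thresholds.

```r
# Illustrative glucose readings in mg/dL (made-up values)
glucose <- c(65, 110, 145, 190, 98, 250)

# Each comparison returns one TRUE/FALSE answer per reading
is_low  <- glucose < 70
is_high <- glucose > 180

# Combine questions: in range means "at least 70 AND at most 180"
in_range <- glucose >= 70 & glucose <= 180
out_of_range <- is_low | is_high

# TRUE counts as 1, so sum() counts readings and mean() gives a proportion
sum(is_low)            # 1 reading below 70
sum(out_of_range)      # 3 readings outside 70-180
mean(in_range) * 100   # 50 percent of readings in range
```

Because R answers the question for every element at once, no loop is needed to check a whole column of readings.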
In the context of CGM analysis, a for loop is like an automated processing line. Instead of manually opening 250 patient files one by one, the loop handles the entire “queue” (folder) for you. This gives you:
- Efficiency: It processes all 250 files in about the time it takes you to do one by hand.
- Safety: It applies the exact same cleaning rules to every file, eliminating “fatigue errors” or manual typos.
- Consistency: Every file in your /processed_cgm folder ends up in the same standard format, ready for analysis.
Here is a simple example. The for loop iterates i (or whatever variable name you choose) over a sequence — in this case, from 1 to 5. Each time through the loop, R executes the code inside the curly braces { }, and i takes the next value in the sequence.
On the first pass i = 1, on the second pass i = 2, and so on until i = 5.
This pattern of “do something for each item in a set” is the foundation for automating repetitive tasks like processing multiple patient files.
for (i in 1:5) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Clinical Analogy: A for loop is like an Automated Medication Dispensing System. You program the logic once, and it follows that exact protocol for every single patient in the system, every single time.
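The same pattern applied to files might look like the sketch below. It first writes two tiny, made-up CSV files into a temporary folder so the example runs anywhere. The folder name cgm_demo, the file names, and the single glucose column are assumptions, not the structure of a real sensor export.

```r
# Set up a throwaway folder with two small CSV files (made-up data),
# standing in for a folder of sensor exports
cgm_dir <- file.path(tempdir(), "cgm_demo")
dir.create(cgm_dir, showWarnings = FALSE)
write.csv(data.frame(glucose = c(110, 145, 130)),
          file.path(cgm_dir, "PT001.csv"), row.names = FALSE)
write.csv(data.frame(glucose = c(98, 120)),
          file.path(cgm_dir, "PT002.csv"), row.names = FALSE)

# List the files, then apply the same steps to each one
files <- list.files(cgm_dir, pattern = "\\.csv$", full.names = TRUE)
mean_glucose <- numeric(length(files))     # empty container for the results
names(mean_glucose) <- basename(files)

for (i in seq_along(files)) {
  one_patient <- read.csv(files[i])        # read file number i
  mean_glucose[i] <- mean(one_patient$glucose)
}

mean_glucose   # one mean per file: about 128.33 and 109
```

Using seq_along(files) instead of a fixed range such as 1:5 lets the loop adapt to however many files the folder contains.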
We did not come this far to deal with a single value — we need more complex objects to store and manage real CGM data. R provides several data structures, each suited for different kinds of information.
| Data structure | What it holds | CGM example |
|---|---|---|
| Vector | A sequence of values of the same type | A single patient’s glucose readings: c(110, 145, 130, 98) |
| Data frame | A table with rows and columns (different types allowed) | The complete dataset for one patient: time stamps, glucose, sensor ID |
| List | A container that can hold any mix of objects | A report containing numbers, text, a data frame, and a plot |
| Matrix | A two‑dimensional structure where all elements are the same type | Rare in CGM analysis; sometimes used for time‑glucose matrices |
| Array | Multi‑dimensional version of a matrix | Not commonly used in basic CGM workflows |
In this course, you will work most often with vectors and data frames.
Now that you understand objects and classes, these structures will feel like natural containers for your CGM data.
A vector is a one‑dimensional collection of elements that are all the same type. In R, vectors are the building blocks for most data structures — a data frame is essentially a collection of vectors of equal length.
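CGM examples of vectors
- Glucose readings: 110, 145, 130, 98, 120.
- Time stamps: "08:00", "08:15", "08:30", "08:45", "09:00".
- Patient IDs: "PT001", "PT001", "PT001", "PT002", "PT002".
Once you have a vector, you can:
- Compute summaries such as mean, min, max.
- Select individual elements or subsets with [].
Why vectors matter for CGM analysis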
Every column in your CGM dataset is a vector. Understanding vectors helps you manipulate single columns efficiently before combining them into a full data frame.
# Create a vector of glucose values
glucose <- c(110, 145, 130, 98, 120)
# Access elements
glucose[1] # first value
[1] 110
glucose[3] # third value
[1] 130
# Subset based on condition
glucose[glucose > 130]
[1] 145
Before long, you will be computing summaries like these on actual patient data (patience).
# Descriptive Statistics
mean(glucose)
min(glucose)
max(glucose)
A data.frame is a table-like structure where columns can have different types. This is the workhorse of CGM analysis: it allows you to store time stamps (as POSIXct), glucose values (as numeric), and patient identifiers (as character or factor) all in one object.
CGM example of a data frame
| time | glucose | sensor_id |
|---|---|---|
| 2025-03-20 08:00:00 | 110 | S01 |
| 2025-03-20 08:15:00 | 145 | S01 |
| 2025-03-20 08:30:00 | 130 | S01 |
| 2025-03-20 08:45:00 | 98 | S02 |
| 2025-03-20 09:00:00 | 120 | S02 |
With a data frame, you can:
- Keep all related data together (time, glucose, sensor ID).
- Filter rows (e.g., only readings above 180 mg/dL).
- Group by columns (e.g., calculate daily averages per sensor).
- Create plots that combine multiple columns (e.g., glucose over time).
Proper data analysis starts with how you organize your files before even opening R. Please read: Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2-10. https://doi.org/10.1080/00031305.2017.1375989
Soon you will be building data frames from real sensor exports and using them to generate clinical insights.
# Create a data frame from vectors
cgm_data <- data.frame(
time = as.POSIXct(c("2025-03-20 08:00:00", "2025-03-20 08:15:00",
"2025-03-20 08:30:00", "2025-03-20 08:45:00",
"2025-03-20 09:00:00")),
glucose = c(110, 145, 130, 98, 120),
sensor_id = c("S01", "S01", "S01", "S02", "S02")
)
# View the data frame
head(cgm_data)
                 time glucose sensor_id
1 2025-03-20 08:00:00     110       S01
2 2025-03-20 08:15:00     145       S01
3 2025-03-20 08:30:00     130       S01
4 2025-03-20 08:45:00      98       S02
5 2025-03-20 09:00:00     120       S02
dim(cgm_data) # number of rows (readings) and columns (variables)
[1] 5 3
names(cgm_data) # variable names
[1] "time"      "glucose"   "sensor_id"
str(cgm_data) # structure: class and first values of each variable
'data.frame': 5 obs. of 3 variables:
 $ time     : POSIXct, format: "2025-03-20 08:00:00" "2025-03-20 08:15:00" ...
 $ glucose  : num 110 145 130 98 120
 $ sensor_id: chr "S01" "S01" "S01" "S02" ...
Why data frames matter for CGM analysis
A data frame is the natural format for a patient’s CGM records. It keeps everything organized, and almost every analysis, from time-in-range calculations to AGP plots, starts with a data frame.
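Two of the data frame operations mentioned in this section, filtering rows and grouping by a column, can be sketched in base R as follows. The toy table is rebuilt here so the block runs on its own. The UTC time zone and the 125 mg/dL cut-off are assumptions chosen only for illustration.

```r
# Toy data, rebuilt so this block is self-contained
cgm_data <- data.frame(
  time = as.POSIXct(c("2025-03-20 08:00:00", "2025-03-20 08:15:00",
                      "2025-03-20 08:30:00", "2025-03-20 08:45:00",
                      "2025-03-20 09:00:00"), tz = "UTC"),  # tz is an assumption
  glucose = c(110, 145, 130, 98, 120),
  sensor_id = c("S01", "S01", "S01", "S02", "S02")
)

# Filter rows: readings above 125 mg/dL (illustrative threshold)
high <- cgm_data[cgm_data$glucose > 125, ]
nrow(high)                                  # 2

# Group by sensor: mean glucose per sensor_id
per_sensor <- aggregate(glucose ~ sensor_id, data = cgm_data, FUN = mean)
per_sensor                                  # S01 about 128.33, S02 109
```

aggregate() is the base R counterpart of the group_by() approach from dplyr. Either can be used once the data sit in a data frame.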
Once your CGM data is in a data frame, you often need to extract specific parts, such as the readings of one patient, a single column, or only the values above a threshold.
R provides several ways to select exactly what you need. The most common are:
- data[rows, columns]: using square brackets; leaving rows or columns blank means “all”.
- data$column_name: extracts a single column as a vector.
- subset(): a more readable way to filter rows based on conditions.
CGM examples
- All readings of one patient: all_data[all_data$patient_id == "PT001", ]
- The glucose column as a vector: all_data$glucose
- The first 10 rows, time and glucose columns only: all_data[1:10, c("time", "glucose")]
- Readings above 180 mg/dL: subset(all_data, glucose > 180)
You will use these selections constantly, whether you are isolating a single patient, focusing on nocturnal readings, or extracting glucose values for a summary statistic.
Why selection matters
Real‑world CGM datasets often contain multiple patients, days, or sensor types. Being able to select the exact rows and columns you need is the first step toward any meaningful analysis.
# Select rows where sensor_id is "S01"
cgm_data[cgm_data$sensor_id == "S01", ]
                 time glucose sensor_id
1 2025-03-20 08:00:00     110       S01
2 2025-03-20 08:15:00     145       S01
3 2025-03-20 08:30:00     130       S01
# Select the glucose column as a vector
cgm_data$glucose
[1] 110 145 130  98 120
# Select the first 3 rows and only the time and glucose columns
cgm_data[1:3, c("time", "glucose")]
                 time glucose
1 2025-03-20 08:00:00     110
2 2025-03-20 08:15:00     145
3 2025-03-20 08:30:00     130
# Use subset to get hyperglycemia (glucose > 140)
subset(cgm_data, glucose > 140)
                 time glucose sensor_id
2 2025-03-20 08:15:00     145       S01
# Combine: rows for S01 and columns time and glucose
cgm_data[cgm_data$sensor_id == "S01", c("time", "glucose")]
                 time glucose
1 2025-03-20 08:00:00     110
2 2025-03-20 08:15:00     145
3 2025-03-20 08:30:00     130
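Selecting by time of day follows the same logic. The sketch below isolates nocturnal readings by first extracting the hour from the time stamp. The readings are made up, the UTC time zone is an assumption, and defining night as 00:00 to 05:59 is an illustrative choice.

```r
# Toy readings across a day (made-up values; tz = "UTC" is an assumption)
cgm_data <- data.frame(
  time = as.POSIXct(c("2025-03-20 02:00:00", "2025-03-20 08:00:00",
                      "2025-03-20 14:00:00", "2025-03-20 23:30:00"), tz = "UTC"),
  glucose = c(85, 110, 160, 95)
)

# Extract the hour of each reading as a number (0-23)
cgm_data$hour <- as.integer(format(cgm_data$time, "%H"))

# Night is defined here as 00:00-05:59, an illustrative cut-off
night <- cgm_data[cgm_data$hour < 6, c("time", "glucose")]
night                                       # the 02:00 reading only

# The same idea with subset(), combining two conditions
day_high <- subset(cgm_data, hour >= 6 & glucose > 140)
day_high                                    # the 14:00 reading only
```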
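In real-world CGM analysis, your data often lives in multiple tables: for example, one with the glucose readings exported by the sensor and another with patient information such as age, diabetes type, and glucose targets.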
To bring everything together for analysis, you need to merge data frames. This is like combining two spreadsheets using a common column — in CGM work, usually the sensor_id or patient_id.
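Common types of merges
- Inner join: keeps only the rows that have a match in both tables (the default of merge()).
- Left join: keeps every row of the first table and adds the matching columns of the second (all.x = TRUE).
- Right join: keeps every row of the second table (all.y = TRUE).
- Full join: keeps all rows of both tables (all = TRUE).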
For most CGM workflows, you will use left joins to attach patient metadata to glucose readings, ensuring no sensor data is lost.
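How the join type changes the result can be sketched with base R's merge() and its all, all.x, and all.y arguments. The two tables are made up: sensor S03 has a reading but no patient record, and S04 has a patient record but no readings.

```r
# Toy tables (made-up): S03 has a reading but no patient record,
# S04 has a patient record but no readings
glucose_data <- data.frame(
  sensor_id = c("S01", "S02", "S03"),
  glucose   = c(110, 130, 150)
)
patient_info <- data.frame(
  sensor_id = c("S01", "S02", "S04"),
  age       = c(34, 28, 51)
)

# Inner join: only sensors present in both tables
inner <- merge(glucose_data, patient_info, by = "sensor_id")
# Left join: every glucose reading is kept
left <- merge(glucose_data, patient_info, by = "sensor_id", all.x = TRUE)
# Full join: every row of both tables is kept
full <- merge(glucose_data, patient_info, by = "sensor_id", all = TRUE)

nrow(inner)   # 2: S01 and S02
nrow(left)    # 3: S03 is kept, its age is NA
nrow(full)    # 4: S04 is added, its glucose is NA
```

The left join is the one that never drops a glucose reading, at the price of NA values where patient information is missing.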
CGM example
You have:
- glucose_data, with columns: time, glucose, sensor_id
- patient_info, with columns: sensor_id, age, diabetes_type, target_low, target_high
Merging these allows you to:
- Calculate time in range using each patient’s personal targets
- Compare glucose patterns by age or diabetes type
- Create patient-specific reports
Before long, you will be merging sensor exports with clinic databases to produce personalized CGM summaries.
Why merging matters
CGM devices export time‑stamped glucose data, but clinical context (patient demographics, insulin regimens) often lives elsewhere. Merging bridges that gap and turns raw numbers into actionable insights.
# Create example data frames
glucose_data <- data.frame(
time = as.POSIXct(c("2025-03-20 08:00:00", "2025-03-20 08:15:00",
"2025-03-20 08:30:00", "2025-03-20 08:45:00")),
glucose = c(110, 145, 130, 98),
sensor_id = c("S01", "S01", "S02", "S02")
)
glucose_data
                 time glucose sensor_id
1 2025-03-20 08:00:00     110       S01
2 2025-03-20 08:15:00     145       S01
3 2025-03-20 08:30:00     130       S02
4 2025-03-20 08:45:00      98       S02
patient_info <- data.frame(
sensor_id = c("S01", "S02"),
age = c(34, 28),
diabetes_type = c("Type 1", "Type 2"),
target_low = c(70, 80),
target_high = c(180, 200)
)
patient_info
  sensor_id age diabetes_type target_low target_high
1       S01  34        Type 1         70         180
2       S02  28        Type 2         80         200
# Left join: keep all glucose readings, add patient info where available
merged_data <- merge(glucose_data, patient_info,
by = "sensor_id", all.x = TRUE)
# View merged data
merged_data
  sensor_id                time glucose age diabetes_type target_low target_high
1       S01 2025-03-20 08:00:00     110  34        Type 1         70         180
2       S01 2025-03-20 08:15:00     145  34        Type 1         70         180
3       S02 2025-03-20 08:30:00     130  28        Type 2         80         200
4       S02 2025-03-20 08:45:00      98  28        Type 2         80         200
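Once the targets sit next to each reading, a per-patient time in range can be sketched as below. The readings and targets are made up. The percentage of readings in range is used as a stand-in for time in range, which assumes the readings are evenly spaced.

```r
# Made-up readings and personal targets, kept small so the result is easy to check
glucose_data <- data.frame(
  sensor_id = c("S01", "S01", "S01", "S02", "S02", "S02", "S02"),
  glucose   = c(110, 145, 190, 130, 98, 75, 210)
)
patient_info <- data.frame(
  sensor_id   = c("S01", "S02"),
  target_low  = c(70, 80),
  target_high = c(180, 200)
)

# Left join: attach each patient's targets to every reading
merged <- merge(glucose_data, patient_info, by = "sensor_id", all.x = TRUE)

# One TRUE/FALSE per reading: is it inside that patient's own target range?
merged$in_range <- merged$glucose >= merged$target_low &
                   merged$glucose <= merged$target_high

# Percentage of readings in range per sensor
# (approximates time in range only if readings are evenly spaced)
tir <- aggregate(in_range ~ sensor_id, data = merged,
                 FUN = function(x) 100 * mean(x))
tir   # S01 about 66.7, S02 50
```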
Row binding (rbind)
Sometimes you need to stack data frames on top of each other: for example, combining CGM data from multiple patients into one large table, or appending a new day of readings to an existing dataset.
Row binding does exactly that. The function rbind() takes two or more data frames with the same columns and stacks their rows together.
CGM examples
- You have separate CSV files for each patient. After importing them one by one, you use rbind() to combine them into a single data frame for analysis across your entire clinic.
- A patient wears a new sensor. You want to add the new readings to their existing data frame.
- You download data for the same patient from two different months and want a continuous timeline.
Important: For rbind() to work, the data frames must have the same column names. rbind() matches columns by name, so their order may differ, but a missing or extra column causes an error. If the names differ, align them first.
Soon you will be combining hundreds of patient files into one master data frame.
Why row binding matters
Real CGM datasets are often split across multiple files or time periods. rbind() lets you bring everything together so you can analyze the full picture without manually copying and pasting in Excel.
# Create two data frames for the same patient on different days
day1 <- data.frame(
time = as.POSIXct(c("2025-03-20 08:00:00", "2025-03-20 08:15:00")),
glucose = c(110, 145),
sensor_id = "S01"
)
day1
                 time glucose sensor_id
1 2025-03-20 08:00:00     110       S01
2 2025-03-20 08:15:00     145       S01
day2 <- data.frame(
time = as.POSIXct(c("2025-03-21 08:00:00", "2025-03-21 08:15:00")),
glucose = c(130, 98),
sensor_id = "S01"
)
day2
                 time glucose sensor_id
1 2025-03-21 08:00:00     130       S01
2 2025-03-21 08:15:00      98       S01
# Stack them
all_readings <- rbind(day1, day2)
# View combined data
all_readings
                 time glucose sensor_id
1 2025-03-20 08:00:00     110       S01
2 2025-03-20 08:15:00     145       S01
3 2025-03-21 08:00:00     130       S01
4 2025-03-21 08:15:00      98       S01
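With more than two tables, calling rbind() pair by pair becomes tedious. One possible approach, sketched below with made-up per-patient tables, is to collect the data frames in a list and stack them in a single call with do.call(). The object and column names are illustrative.

```r
# Made-up per-patient tables, standing in for imported CSV files
pt1 <- data.frame(patient_id = "PT001", glucose = c(110, 145))
pt2 <- data.frame(patient_id = "PT002", glucose = c(130, 98))
# This table has the same columns in a different order
pt3 <- data.frame(glucose = c(120, 101), patient_id = "PT003")

# Collect the tables in a list, then stack them all in one call
all_tables <- list(pt1, pt2, pt3)
all_patients <- do.call(rbind, all_tables)

dim(all_patients)               # 6 rows, 2 columns
names(all_patients)             # column order follows the first table
table(all_patients$patient_id)  # 2 readings per patient
```

Because rbind() matches data frame columns by name, the third table is stacked correctly even though its columns are in a different order.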
We covered the basics of R objects, data structures, and merging. These skills will be essential for handling CGM data.