I’ve worked with similar datasets in my research, and here’s an approach that might help:
Use the tidyverse (dplyr in particular) for the data manipulation. Start by arranging the data by pupil and date, then group by pupil. Within each group, use `lag()` to calculate the time difference between consecutive classes for each student. You can then use `case_when()` to categorize the gaps as breaks or regular attendance and flag whether a student continued after a break.
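Here is a minimal sketch of those steps. The column names `pupil` and `date`, the sample rows, and the 30-day `break_days` threshold are all assumptions for illustration, not taken from your data. "Continued after a break" is read here as "attended at least one more class after the one that ended the break", which may not match your definition.

```r
library(dplyr)

# Hypothetical example data: one row per class attended.
classes <- data.frame(
  pupil = c("A", "A", "A", "A", "B", "B"),
  date  = as.Date(c("2023-01-01", "2023-01-08", "2023-03-15", "2023-03-22",
                    "2023-01-01", "2023-03-01"))
)

# Assumed threshold: a gap longer than this many days counts as a break.
break_days <- 30

result <- classes %>%
  arrange(pupil, date) %>%
  group_by(pupil) %>%
  mutate(
    # Days since the same pupil's previous class (NA for their first class).
    days_since_last = as.numeric(difftime(date, lag(date), units = "days")),
    break_type = case_when(
      is.na(days_since_last)       ~ "first class",
      days_since_last > break_days ~ "break",
      TRUE                         ~ "regular"
    ),
    # TRUE when this class ended a break AND the pupil has a later class.
    continued_after_break = break_type == "break" & !is.na(lead(date))
  ) %>%
  ungroup()

print(result)
```

In this made-up data, pupil A returns after a long gap and attends again the following week, while pupil B returns once and has no later class, so only A's break row is flagged as continued.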
This should give you a good starting point. You might need to adjust the logic for ‘continued_after_break’ depending on your specific requirements. Hope this helps!
Hey there, Iris_92Paint! That’s a really interesting dataset you’re working with. I’m curious about what you’ve tried so far with dplyr. Have you managed to identify the breaks in studies yet?
I’m thinking we could probably use dplyr’s `lag()` to compare each class date with the previous one for the same student, and treat any gap above some cutoff as a break.
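A rough sketch of that lag idea, assuming columns called `pupil` and `date` and a 30-day cutoff for what counts as a break (all placeholders, since I haven't seen your actual data):

```r
library(dplyr)

# Hypothetical data: one row per class attended.
classes <- data.frame(
  pupil = c(1, 1, 1, 2, 2),
  date  = as.Date(c("2023-01-05", "2023-01-12", "2023-04-01",
                    "2023-02-01", "2023-02-08"))
)

gaps <- classes %>%
  arrange(pupil, date) %>%
  group_by(pupil) %>%
  mutate(
    # Gap in days to the same pupil's previous class (NA for the first one).
    gap_days = as.numeric(date - lag(date)),
    # Assumed rule: more than 30 days between classes is a break.
    is_break = !is.na(gap_days) & gap_days > 30
  ) %>%
  ungroup()

# Count how many breaks each pupil had.
breaks_per_pupil <- gaps %>%
  group_by(pupil) %>%
  summarise(n_breaks = sum(is_break))

print(breaks_per_pupil)
```

Grouping before `lag()` matters here: without `group_by(pupil)`, the first class of one pupil would be compared against the last class of the previous pupil.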
This is just off the top of my head, so it might need some tweaking. What do you think? Have you tried anything similar?
Also, what are you planning to do with this information once you’ve identified the breaks and continuations? Are you looking at student retention patterns or something like that? It sounds like a really cool project!