A single wrapper function to rule them all

Each function in the catapultR package has a very specific purpose. For this reason, it is often necessary to use several of them in order to access all desired data. This post introduces an alternative approach using an external function that serves as a wrapper for many commonly used catapultR functions.

Brian Hart (Catapult Sports - Data Science Team) www.catapultsports.com
02-26-2020

Difficulty Level: Beginner

Problem Statement

Due to the narrow scope of each individual catapultR function, access to a complete set of data often requires combining outputs from several functions. This is intentional because it helps the user better understand the organization of Catapult data in the cloud. The specificity also allows each individual function more flexibility. However, for many common data queries, it would be convenient to have a single function that could be used to access and join the data. The image below contains a simplified representation of the OpenField data model accessible through the Connect API.

This blog post introduces a function (and the thought process behind it) that aims to simplify the process for accessing data. It is not meant as a replacement for all requests, and it is recommended that users customize the logic to meet their specific needs.

Solving the Problem

To begin, a more clearly defined set of requirements is useful. Ultimately, we want a function that:

The function will combine ofCloudGetActivities(), ofCloudGetAthleteDevicesInActivity(), ofCloudGetStatistics(), ofCloudGetActivityEvents(), ofCloudGetActivityEfforts(), and ofCloudGetActivitySensorData() into a single function.

Generalized approach to data acquisition

In general, the function will use the following logic:

  1. Query activities for specified date range
    • If data requested is aggregated data, just use the statistics table
    • Else, for each activity, determine athlete involvement
      1. For each activity-athlete combination, query data (events, efforts, or 10 Hz sensor data)
      2. Join athlete, activity, and period information to data
  2. Return data and total query count
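The non-aggregated branch of this logic can be sketched in base R. This is only an illustration, not the code in the downloadable script: mock_get_activities(), mock_get_athletes(), mock_get_events(), and get_data_sketch() are hypothetical stand-ins that return small made-up tables instead of calling the API, and the period join is left out for brevity.

```r
# Hypothetical stand-ins for the API calls; they return small made-up tables.
mock_get_activities <- function(from, to) {
  data.frame(activity_id = c("a1", "a2"),
             activity_name = c("Training", "Match"),
             stringsAsFactors = FALSE)
}

mock_get_athletes <- function(activity_id) {
  data.frame(athlete_id = c("p1", "p2"),
             athlete_name = c("Athlete One", "Athlete Two"),
             stringsAsFactors = FALSE)
}

mock_get_events <- function(activity_id, athlete_id) {
  data.frame(event_type = c("example_event", "example_event"),
             start_time = c(10, 20),
             stringsAsFactors = FALSE)
}

get_data_sketch <- function(from, to, data_type = "events") {
  n_queries <- 0
  # Wrap every (mock) query so the running total is updated in one place.
  counted <- function(result) {
    n_queries <<- n_queries + 1
    result
  }

  # Step 1: query activities for the date range.
  activities <- counted(mock_get_activities(from, to))
  if (data_type == "activities_only") {
    return(list(data = activities, query_count = n_queries))
  }

  pieces <- list()
  for (i in seq_len(nrow(activities))) {
    # Determine athlete involvement for this activity.
    athletes <- counted(mock_get_athletes(activities$activity_id[i]))
    for (j in seq_len(nrow(athletes))) {
      # One query per activity-athlete combination.
      rows <- counted(mock_get_events(activities$activity_id[i],
                                      athletes$athlete_id[j]))
      # Attach activity and athlete information to the returned rows.
      rows$activity_id   <- activities$activity_id[i]
      rows$activity_name <- activities$activity_name[i]
      rows$athlete_id    <- athletes$athlete_id[j]
      rows$athlete_name  <- athletes$athlete_name[j]
      pieces[[length(pieces) + 1]] <- rows
    }
  }

  # Step 2: return the combined data and the total query count.
  list(data = do.call(rbind, pieces), query_count = n_queries)
}

result <- get_data_sketch("2021-03-01", "2021-03-20")
nrow(result$data)    # 8 rows: 2 activities x 2 athletes x 2 events
result$query_count   # 7 queries: 1 activities + 2 athlete lookups + 4 data queries
```

Even in this toy case the query count grows with the number of activity-athlete combinations, which is why narrow date ranges and athlete filters matter for the event, effort, and sensor data types.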

Function Arguments: The get_cat_data() function takes the following arguments:

For the actual code used to create this function, you can download the R script file below. Feel free to alter the code to better meet your needs. The function is not currently part of the catapultR package and is intended only as an example: it does not meet production-level standards for error handling or efficiency.

Download one_func_to_rule_them_all.R


Examples

We will use an anonymized rugby account for a few examples. The first step is always to log in. We will also load a few additional packages and source the R script for the function. You will see that we can use the function get_cat_data() to accomplish almost everything by changing a few of the arguments.

To keep from printing excessively large tables in this post, all dataframes will be limited to the first 100 rows.

You will see in the examples below that several helpful messages are provided with the get_cat_data() function. There are also built-in progress bars since many of the queries take a while to complete.

Login:


# first, load a few packages
library(catapultR)
library(tidyverse)
library(lubridate)
source("one_func_to_rule_them_all.R")

token <- ofCloudCreateToken(sToken = "8SyAJZonGYUH5adDgINl1re7WkOPXhB0EtbVzcMp", sRegion = "America")


Now that we’re logged in, we can see what activities are available.

Get Activities: Below, we will get activities for a specified date range with the data_type argument set to “activities_only”.


activities <- get_cat_data(credentials = token,
                           data_type = "activities_only",
                           from = "2021-01-01", to = Sys.Date())

rmarkdown::paged_table(head(activities, 100))

[1] "Total API Query Count: 1"


Get Statistics: Now, we will get statistics aggregated by activity and period by changing the data_type to “stats” and including parameters as arguments.


stats <- get_cat_data(credentials = token, 
                      data_type = "stats", 
                      from = "2021-03-01", to = "2021-03-20", 
                      options = list(stats_params = c("total_player_load", "max_vel")))

rmarkdown::paged_table(head(stats, 100))

Total API Query Count: 1


We can filter this result by specifying either a vector of athlete_ids or position_names in the options argument. Below, we will make the same request but with a filter so that only centre and flanker positions will be returned.


stats <- get_cat_data(credentials = token,
                      data_type = "stats",
                      from = "2021-03-01", to = "2021-03-20",
                      options = list(stats_params = c("total_player_load", "max_vel"),
                                     athlete_filter = list(position_name = c("Centre", "Flanker"))))

rmarkdown::paged_table(head(stats, 100))

Total API Query Count: 1

Joining, by = "athlete_name"

Joining, by = "activity_name"


Sport-Specific Events & IMA Events: Very similar logic can be used to access events for multiple athletes over several activities. Instead of “stats” as the data_type and stats_params, we now use “events” as the data_type and provide event_types to query.

If you set the interact argument to TRUE, you will be prompted with messages as the function progresses, and you will be given the option to exit if the number of API queries seems too high.


events <- get_cat_data(credentials = token, 
                       data_type = "events", 
                       from = "2019-11-01", to = "2019-11-08", 
                       options = list(event_types = c("rugby_union_scrum", 
                                                      "rugby_union_contact_involvement")),
                       interact = FALSE)

rmarkdown::paged_table(head(events, 100))

There are 11 activities containing athletes.
All empty activities will be removed.

  |                                                                  
  |                                                            |   0%
  |                                                                  
  |=====                                                       |   9%
  |                                                                  
  |===========                                                 |  18%
  |                                                                  
  |================                                            |  27%
  |                                                                  
  |======================                                      |  36%
  |                                                                  
  |===========================                                 |  45%
  |                                                                  
  |=================================                           |  55%
  |                                                                  
  |======================================                      |  64%
  |                                                                  
  |============================================                |  73%
  |                                                                  
  |=================================================           |  82%
  |                                                                  
  |=======================================================     |  91%
  |                                                                  
  |============================================================| 100%
Final Touches... almost complete...
Total API Query Count: 157

Joining, by = "activity_name"
Joining, by = "activity_name"

Joining, by = "period_name"
Joining, by = "period_name"

Joining, by = "athlete_id"

Joining, by = "athlete_name"

Joining, by = "athlete_first_name"

Joining, by = "athlete_last_name"

Joining, by = "device_id"

The table above contains lots of information, but by scrolling to the right you can see specific information related to the rugby scrums and contact involvements.

To get a quick count of the number of each event type by athlete, you can use the table() function.


head(table(events$athlete_name, events$event_type), 10) # just see 10 athletes

                       
                        rugby_union_contact_involvement
  Aaron Medrano                                      88
  Ahlaam el-Shariff                                  39
  Alexander Gram                                     35
  Alexander Jamili                                   37
  Alexandra Babcock                                 172
  Angelica Martinez                                  97
  Behron Victoria                                     1
  Charya Kim                                         33
  Ciprie Rodriguez Loya                              64
  Cody Kim                                           40
                       
                        rugby_union_scrum
  Aaron Medrano                         0
  Ahlaam el-Shariff                     0
  Alexander Gram                        0
  Alexander Jamili                      0
  Alexandra Babcock                    31
  Angelica Martinez                     1
  Behron Victoria                       0
  Charya Kim                           14
  Ciprie Rodriguez Loya                 2
  Cody Kim                             16


GPS-Based Velocity and Acceleration Efforts: GPS-based efforts can also be accessed with similar logic. The data_type should now be “efforts” and instead of event_types, we now use effort_types. You can specify velocity efforts only (“vel”), acceleration efforts only (“accel”), or both velocity and acceleration efforts (“both”).

Again, if you set the interact argument to TRUE, you will be prompted with messages as the function progresses, and you will be given the option to exit if the number of API queries seems too high.


efforts <- get_cat_data(credentials = token,
                        data_type = "efforts",
                        from = "2019-03-01", to = "2019-03-07",
                        options = list(effort_types = c("both")),
                        interact = FALSE)

rmarkdown::paged_table(head(efforts, 100))

There are 7 activities containing athletes.
All empty activities will be removed.

  |                                                                  
  |                                                            |   0%
  |                                                                  
  |=========                                                   |  14%
  |                                                                  
  |=================                                           |  29%
  |                                                                  
  |==========================                                  |  43%
  |                                                                  
  |==================================                          |  57%
  |                                                                  
  |===========================================                 |  71%
  |                                                                  
  |===================================================         |  86%
  |                                                                  
  |============================================================| 100%
Final Touches... almost complete...
Total API Query Count: 90

Joining, by = "activity_name"
Joining, by = "activity_name"

Joining, by = "period_name"
Joining, by = "period_name"

Joining, by = "athlete_id"

Joining, by = "athlete_name"

Joining, by = "athlete_first_name"

Joining, by = "athlete_last_name"

Joining, by = "device_id"


By scrolling to the right in the table above, you can see specific information related to the GPS-based velocity and acceleration efforts.

To get a quick count of the number of each effort type by athlete, you can use the table() function.


head(table(efforts$athlete_name, efforts$effort_type), 10) # just see 10 athletes

                        
                         accel  vel
  Alexander Rash           313  809
  Alexandra Littlejohn     119  856
  Amie Bernat              302 1073
  Andrea Kunard            203  747
  Caitlyn Drago            303  642
  Damian Kennah             19  119
  Ghaamid el-Sami          131  701
  Hayley Walters             0   13
  Jaden Mcfalls-Brothers   255  913
  Jaelynn Thomas           157  752


10 Hz Sensor Data: Similar logic will work for 10 Hz data as well. However, be careful when making API queries for 10 Hz data – this type of data is much bigger in terms of both storage and memory demands. These requests will take longer to process as well. It is recommended that you limit your requests for 10 Hz data to single activities and consider using athlete filters as well. To access 10 Hz data, just use “sd” as the data_type.
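As a rough sketch, a cautious 10 Hz request might look like the following. The dates and the position filter are placeholders, and it is an assumption that the "sd" data type accepts the same athlete_filter option shown in the statistics example; check the downloaded script before relying on it. The call is guarded so that the snippet does nothing unless get_cat_data() has been sourced and a token exists.

```r
# Arguments for a deliberately small 10 Hz request. The dates and the position
# filter are placeholders; the option names mirror the earlier examples and are
# assumed (not confirmed) to apply to the "sd" data type as well.
sd_args <- list(
  data_type = "sd",
  from = "2021-03-01", to = "2021-03-02",
  options = list(athlete_filter = list(position_name = c("Centre"))),
  interact = TRUE  # prompts with query counts before proceeding
)

# Only run the query when the wrapper has been sourced and a token exists.
if (exists("get_cat_data") && exists("token")) {
  sensor_data <- do.call(get_cat_data, c(list(credentials = token), sd_args))
}
```

Setting interact to TRUE here gives a chance to back out if the reported number of API queries is larger than expected.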


Conclusion

You can see from the examples that it is possible to accomplish many useful API queries with the get_cat_data() function. This function was also written in a way that minimizes the number of API queries. Use caution with wrapper functions like this – they make it very easy to create very large requests that involve potentially thousands of API queries. If you choose to experiment with the function yourself, it is strongly recommended that you do so with an interactive session first. This will provide periodic messages with API query counts and ask for user input before proceeding.

Citation

For attribution, please cite this work as

Hart (2020, Feb. 26). catapultR: A single wrapper function to rule them all. Retrieved from http://catapultr.catapultsports.com/posts/2020-02-26-one-function-to-rule-them-all/

BibTeX citation

@misc{hart2020a,
  author = {Hart, Brian},
  title = {catapultR: A single wrapper function to rule them all},
  url = {http://catapultr.catapultsports.com/posts/2020-02-26-one-function-to-rule-them-all/},
  year = {2020}
}