First the move2
package needs to be loaded.
First valid movebank credentials need to be stored. Here the ‘username’ needs to be set. In an interactive session the user will be prompted for a password and possibly will be prompted to install additional packages. Alternatively the password can be provided on the command line as a second argument however it is then important to ensure it will not be stored in the R history. For more details and setups storing multiple credentials we refer to the “movebank” vignette.
movebank_store_credentials("username")
For this example we download the data from the study ‘Galapagos
Albatrosses’ (notice that matching is conducted if no study is named as
such). Here we specify to only download data from the ‘gps’ sensor.
Filtering at this early stage speeds up the working cycle as no unneeded
data is extracted from the database. The resulting data is printed to
the screen showing an overview of the data (here the number of lines are
reduced). First general properties of the data are shown including the
number of tracks and the average track duration. After, the first
observations are printed. The units of attributes derived from the
movebank vocabulary are associated to the locations are shown between
square brackets. Finally, the summary of the track_data
is
printed, where each row corresponds to the track level data for each
track.
In case the license terms for the study have not been accepted
before, the download command will fail and prompt the user to read the
license terms and accept these. This can be done by adding the
license-md5
argument to the download command with the hash
provided in the error message.
data <- movebank_download_study("Galapagos Albatrosses", sensor_type_id = "gps")
data
#> A <move2> with `track_id_column` "individual_local_identifier" and `time_column`
#> "timestamp"
#> Containing 28 tracks lasting on average 37.1 days in a
#> Simple feature collection with 16414 features and 18 fields (with 386 geometries empty)
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -91.3732 ymin: -12.79464 xmax: -77.51874 ymax: 0.1821983
#> Geodetic CRS: WGS 84
#> # A tibble: 16,414 × 19
#> sensor_type_id individual_local_iden…¹ eobs_battery_voltage eobs_fix_battery_vol…²
#> <int64> <fct> [mV] [mV]
#> 1 653 4264-84830852 3686 3437
#> 2 653 4264-84830852 3701 3452
#> 3 653 4264-84830852 3701 3482
#> # ℹ 16,411 more rows
#> # ℹ abbreviated names: ¹individual_local_identifier, ²eobs_fix_battery_voltage
#> # ℹ 15 more variables: eobs_horizontal_accuracy_estimate [m],
#> # eobs_key_bin_checksum <int64>, eobs_speed_accuracy_estimate [m/s],
#> # eobs_start_timestamp <dttm>, eobs_status <ord>, …
#> First 3 track features:
#> # A tibble: 28 × 52
#> deployment_id tag_id individual_id animal_life_stage attachment_type
#> <int64> <int64> <int64> <fct> <fct>
#> 1 2911170 2911124 2911090 adult tape
#> 2 2911150 2911126 2911091 adult tape
#> 3 2911167 2911127 2911092 adult tape
#> # ℹ 25 more rows
#> # ℹ 47 more variables: deployment_comments <chr>, deploy_on_timestamp <dttm>,
#> # duty_cycle <chr>, deployment_local_identifier <fct>, manipulation_type <fct>, …
As the move2
class extends sf
we can profit
from existing plotting functionality. Here we use ggplot2
to visualize the data. Notice that geom_sf
is called twice,
once to plot the location records, and a second time to plot the tracks
for each individual. For the later mt_track_lines
is used
convert the point location data to a single line geometry per
individual.
library(ggplot2)
ggplot() +
ggspatial::annotation_map_tile(zoom = 5) +
ggspatial::annotation_scale() +
theme_linedraw() +
geom_sf(data = data, color = "darkgrey", size = 1) +
geom_sf(data = mt_track_lines(data), aes(color = individual_local_identifier)) +
coord_sf(
crs = sf::st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km"),
xlim = c(-1000, 600),
ylim = c(-800, 700)
) +
guides(color = "none")
#> In total 386 empty location records are removed before summarizing.
#> Zoom: 5
#>
#> Fetching 9 missing tiles
#> | | | 0% | |======== | 11% | |================ | 22% | |========================= | 33% | |================================= | 44% | |========================================= | 56% | |================================================= | 67% | |========================================================== | 78% | |================================================================== | 89% | |==========================================================================| 100%
#> ...complete!
For more interactive explorations of the data other packages like
mapview
and leaflet
might be of interest, as
it allows to zoom into the tracks and explore the attributes of each
location.
Using the tools from gganimate
and
ggspatial
we can also add more context to the map and
animate it. Here annotate_map_tile
is used to add the OpenStreetMap background map
and annotate_scale
to add a scale bar. A variety of
different animations are possible, here we show birds originating from
different study_site
’s. These kind of animations help to
gain insights into the differences between subgroups of the data set.
Besides tagging location, also different years, life stages or sexes are
clear candidates to compare.
require(gganimate)
require(ggspatial)
animation_site <- ggplot() +
annotation_map_tile(zoom = 5, progress = "none") +
geom_sf(
data = mt_track_lines(data),
mapping = aes(group = individual_local_identifier),
color = "black"
) +
transition_states(study_site, state_length = 2) +
enter_fade() +
exit_fade() +
ease_aes("cubic-in-out") +
labs(title = "{closest_state}") +
annotation_scale()
#> In total 386 empty location records are removed before summarizing.
animation_site
We can also take this one step further by animating the map view. To do that we use a local equal area projection and define for each state a manual zoom.
animation_site +
coord_sf(
crs = sf::st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km")
) +
view_zoom_manual(
pause_length = 2,
xmin = c(-850, -900, -950),
ymin = c(-350, -800, -350),
ymax = c(610, 700, 610),
xmax = c(500, 600, 500)
)
It is also possible to animate the movement of individuals. To make the animation smoother, we linearly interpolate the position of individuals. Here we interpolate to a location every 60 minutes. We avoid interpolating longer time lags (in this case larger than 3 hours) to avoid inferring movement between location to long apart, this causes some individuals to temporarily disappear from the animation. To ensure that the data set is fully regular we omit the existing locations. For visual purposes we only show a short period of a few days, however, this can be adjusted to what is most appropriate for the intended visualization. For each timestamp we render one frame. Here we color by individual but other attributes like the speed could also be used.
data_interpolated <- data[!sf::st_is_empty(data), ] |>
mt_interpolate(
seq(
as.POSIXct("2008-7-27"),
as.POSIXct("2008-8-1"), "60 mins"
),
max_time_lag = units::as_units(3, "hours"),
omit = TRUE
)
animation <- ggplot() +
annotation_map_tile(zoom = 5, progress = "none") +
annotation_scale() +
theme_linedraw() +
geom_sf(
data = data_interpolated, size = 3,
aes(color = individual_local_identifier)
) +
transition_manual(timestamp) +
labs(
title = "Galapagos Albatrosses",
subtitle = "Time: {current_frame}",
color = "Individual"
)
animate(animation,
nframes = length(unique(data_interpolated$timestamp))
)
The previous animation was made using a relatively simple approach.
However, with some additional changes it is possible to add a tail to
the movement of individuals and use a dark color scheme. To visualize
the tail we use shadow_wake
which does not work in
combination with transition_manual
, therefore we switch
here to the usage of transition_time
. To keep the
continuity of movement, this case we do in interpolate over longer time
spans.
date_range <- as.POSIXct(c("2008-7-29", "2008-8-1"))
ts <- mt_time(data)
data_interpolated <- data[!sf::st_is_empty(data), ] |>
mt_interpolate(
times <- sort(unique((c(date_range, ts[ts < max(date_range) & ts > min(date_range)])))),
omit = T
)
label_df <- data.frame(
timestamp = date_range,
display_time = lubridate::with_tz(date_range, "America/Lima")
)
animation <- ggplot() +
annotation_map_tile("cartodark", zoom = 5, progress = "none") +
annotation_scale(bar_cols = c("gray80", "gray40"), text_col = "gray80") +
geom_sf(data = mt_track_lines(data), color = "grey40") +
theme_linedraw() +
geom_sf(
data = data_interpolated, size = 3,
aes(color = individual_local_identifier)
) +
scale_color_brewer(palette = "Set1") +
guides(color = "none") +
xlab("") +
ylab("") +
geom_text(
data = label_df,
aes(label = display_time, x = -10100000, y = -1370000),
color = "grey80", size = 3, hjust = 0
) +
transition_time(timestamp) +
shadow_wake(.2, exclude_layer = 6)
#> In total 386 empty location records are removed before summarizing.
The time period selected is 3 days and we generate one frame every 30
minutes. The detail
argument is used to estimate additional
intermediate locations so that the tail gets a smooth shape.
animate(animation,
nframes = 3 * 24 * 2 + 1, detail = 5
)
Note that the timezone is set to Lima, this provides us the possibility to easily identify that most movement occurs during the day and that the birds are relatively stationary and floating with the current during the night.
Now lets take a deeper dive into the albatross data. We have seen that these birds breed on the Galapagos islands and forage on the coast of South America. For some of these birds only one foraging trip has been recorded while for others multiple trips are available. In this example we explore the tracks within four different stages of these trips (in the breeding area, the outbound and inbound flight and the foraging area). To do this we do spatial intersections with the defined regions, split the track into different sections and summarize each track.
First we download the data again, however this time we additionally include the accelerometer data, this gives us one trajectory per individual. Some individuals are marked in the deployment data as being ‘not used for analysis’, so we omit these individuals from the dataset.
require(units)
require(dplyr)
require(sf)
data <- movebank_download_study("Galapagos Albatrosses",
sensor_type_id = c("gps", "acceleration")
)
data <- data %>%
filter_track_data(deployment_comments != "not used in analysis")
The dataset now contains both accelerometer and gps data, these
records are not necessarily recorded at the same time and are thus
reported as separate rows in the data. The accelerometer data are not
associated with locations. In order to associate the measurements to a
specific region we do a linear interpolation for all missing locations
(the default of mt_interpolate
). The records as downloaded
by default are not sorted by time and individual, therefore we first
need to ensure this ordering is correct.
data <- data[order(mt_track_id(data), mt_time(data)), ]
data <- mt_interpolate(data)
Next, we want to associate records with the respective regions
(breeding and foraging), we do this by a spatial intersection
(st_join
). For simplicity here the breeding region is
defined as with a fixed distance from any of the deployment locations of
the tracks. The foraging region is defined as the area within a 100
kilometer from the coast of South America. As a result a new column
called region
is generated where all records outside of
either of the regions has a NA
value.
library(rnaturalearth)
breeding_area <- st_buffer(mt_track_data(data)$deploy_on_location, as_units(25, "km")) |>
st_union()
foraging_area <- ne_countries(110,
returnclass = "sf",
continent = "South America"
) |>
st_union() |>
st_buffer(as_units(100, "km"))
regions <- st_sf(data.frame(
region = c("Breeding", "Foraging"),
polygon = c(breeding_area, foraging_area)
))
data <- st_join(data, regions)
Next, we need to find those trajectories that are either inbound or
outbound. To identify these we look for series of NA
records where the previous location was within the breeding region and
the next location in the foraging region, or vice versa. This approach
has the advantage that only full trips from one to the other region are
defined as commutes. Shorter trips outside of the breeding or foraging
region can be redefined to be either foraging or breeding so that these
sections are not cut up.
data <- data %>%
group_by(mt_track_id(.)) %>%
mutate(
region_change = paste(
vctrs::vec_fill_missing(region),
vctrs::vec_fill_missing(region, "up")
),
region = case_match(region_change,
"Foraging Breeding" ~ "Inbound",
"Breeding Foraging" ~ "Outbound",
"Breeding Breeding" ~ "Breeding",
"Foraging Foraging" ~ "Foraging",
.default = region
)
)
Next, we redefine tracks, up to now the tracks corresponded to all
tracking data from one individual. Now we want to convert all continuous
data within one region as being one track. Here rle
is used
to identify continues section within one region. We do omit locations
that have no region associated, these occur at the start or end of
trajectories
data <- data %>%
mutate(
sequence_number = with(rle(region), rep(seq_along(lengths), lengths)),
track = paste(individual_local_identifier, region, sequence_number)
) %>%
ungroup() %>%
mutate_track_data(individual = droplevels(individual_local_identifier)) %>%
mt_set_track_id("track") %>%
filter(!is.na(region))
These used gps tracking devices were also equipped with accelerometers, here we do not go into the calibration of these measurements, however we can still calculate a proxy for energy expenditure, the dynamic body acceleration (DBA). The acceleration measurements, in this case collected in bursts, are stored in movebank as a text string which can be parsed to an expenditure for each burst as follows (notice that for these tags no calibration is available):
acc_to_dba <- function(x) {
acc_mat <- matrix(as.numeric(unlist(strsplit(x, " "))), nrow = 2)
mean(colSums(abs(acc_mat - rowMeans(acc_mat))))
}
data$dba <- unlist(lapply(data$eobs_accelerations_raw, acc_to_dba))
Now we can calculate summaries for each track, in this case we calculate the number of records in each track but also the start and end so the duration can be calculates. Furthermore we calculate summary statistics for the ground speed as measured by the gps and the DBA.
track_summary <- data %>%
mt_track_lines(
region = unique(region), n = dplyr::n(), start = min(timestamp),
end = max(timestamp),
across(
all_of(c("ground_speed", "dba")),
list(
mean = function(x) mean(x, na.rm = TRUE),
sd = function(x) sd(x, na.rm = TRUE)
)
)
) %>%
mutate(duration = as_units(end - start))
We can first tabulate the number of tracks per region for each individual. Here we see that each individual has one track more in the breeding region then in the other regions. This makes sense as all individuals were equipped with transmitters in the breeding region and data is only retrieved once they returned.
table(track_summary$individual, track_summary$region)
#>
#> Breeding Foraging Inbound Outbound
#> 1163-1163 4 3 3 3
#> 2131-2131 2 1 1 1
#> 3275-30662 4 3 3 3
#> 4261-2228 4 3 3 3
#> 4262-84830876 2 1 1 1
#> 4264-84830852 2 1 1 1
#> 4265-8483009431 2 1 1 1
#> 4266-84831108 3 2 2 2
#> 4267-84830990 2 1 1 1
Now we can plot the obtained summary statistics for the trajectories in each ‘region’. First we plot a map, to show where each track is, the inbound tracks are longer then the outbound tracks.
ggplot(track_summary) +
geom_sf(data = ne_coastline(returnclass = "sf", 50)) +
geom_sf(aes(color = region)) +
theme_linedraw() +
coord_sf(
crs = st_crs("+proj=aeqd +lon_0=-83 +lat_0=-6 +units=km"),
xlim = c(-1000, 600),
ylim = c(-800, 700)
) +
labs(color = "Region") +
scale_color_brewer(type = "qual")
However the distance is covered in a shorter time as can be seen in the following graph. Here the units are changes to days to facilitate the interpretation.
ggplot(track_summary, aes(x = region, y = duration)) +
geom_boxplot(outlier.shape = NA) +
geom_point(aes(color = individual, group = individual),
position = position_jitterdodge()
) +
xlab("") +
scale_y_units("Duration", unit = "days", trans = "log10") +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = .5)) +
theme_linedraw() +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = .5)) +
scale_color_brewer("Individual", type = "qual", palette = "Set1")
This corresponds with the fact that the ground speed measured by the gps sensor is on average higher (notice units get propagated from movebank).
ggplot(
track_summary,
aes(x = region, y = ground_speed_mean)
) +
theme_linedraw() +
geom_boxplot(outlier.shape = NA) +
geom_point(aes(color = individual, group = individual),
position = position_jitterdodge()
) +
xlab("") +
ylab("Mean ground speed") +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = .5)) +
scale_color_brewer("Individual", type = "qual", palette = "Set1")
#> Warning: The `scale_name` argument of `continuous_scale()` is deprecated as of ggplot2
#> 3.5.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
At the same time DBA is lower in these inbound flight corresponding to the tail winds these birds profit from. We also see that birds have a higher speed and DBA in the foraging region compared to the breeding region corresponding with their respective activities.
ggplot(
track_summary,
aes(x = region, y = dba_mean)
) +
theme_linedraw() +
geom_boxplot(outlier.shape = NA) +
geom_point(aes(color = individual, group = individual),
position = position_jitterdodge()
) +
ylab("Mean DBA") +
xlab("") +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust = .5)) +
scale_color_brewer("Individual", type = "qual", palette = "Set1")
This analysis provides an example on how movement data can be
analyzed using move2
and how the retention of meta data is
important to maintain consistency throughout the analysis. This enables
tight integration with other packages facilitating visualization and
spatial operations.