A mobility aggregate groups the pseudonymised CDR data of many individual subscribers into an output that characterises the behaviour of the entire group of subscribers.
We can calculate aggregates which capture a broad range of aspects of population distributions and mobility, including both short-term and longer-term patterns. These aggregates can then be combined, scaled and adjusted to derive indicators which can be used to inform decision-making.
Aggregates should meet the following criteria:
1. They contain no personal information – the data must be anonymised such that subscribers’ individual privacy is preserved.
2. They are fast and easy to compute – MNOs should be able to calculate the aggregates with limited resources, for example on a single server.
3. They are robust to infrequent phone usage – for Low- and Middle-Income Countries (LMICs) in particular, the methodology must be suitable even when many subscribers do not use their mobile devices daily.
Criterion 1 is always essential when producing CDR aggregates.
It should not be possible to reidentify any subscribers from CDR aggregates. You can read more about preserving individual privacy here.
Criteria 2 and 3 are primarily of concern in Low- and Middle-Income Countries (LMICs).
The resources available in LMICs for processing CDR data may be more limited, so the methods for calculating aggregates need to use these resources efficiently. For example, FlowKit, the open-source software suite developed by Flowminder, has been designed and optimised to run on a single server within an MNO’s firewall.
Infrequent use of mobile devices is also more likely to be a limitation in LMICs, and especially in less wealthy communities within LMICs. It’s therefore important that the methods are developed without the assumption that all subscribers are going to be active every day, or even once every few days.
In this section, we will introduce the four main categories of aggregate derived from CDR data and how they are calculated.
Count of Subscribers
Count of subscribers aggregates capture the number of active subscribers recorded in a given area during a given time interval.
Each active subscriber may be present in multiple locations during a single time interval as they may visit multiple locations during that period.
Count of subscribers aggregates are primarily used to calculate Presence-class indicators. These indicators describe short-term (hourly, daily) changes in the number of people who are present within each area.
Count of subscribers aggregates are generally calculated for short time intervals (e.g. daily, weekly) but, depending on the requirements of the analysis, the spatial and temporal resolution can vary.
The presence aggregate is the simplest count of subscribers aggregate. Presence is calculated as the number of unique active subscribers recorded in a given area during a given time interval. This aggregate is used to calculate a broad range of indicators related to presence, population mixing, intra-regional travel and hotspots.
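As a rough illustration of the presence definition above, the sketch below counts unique subscribers per area and day. The record layout `(subscriber_id, area, day)` and the function name `presence` are assumptions made for this example; they are not taken from FlowKit or any other specific tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, already pseudonymised records: (subscriber_id, area, day).
records = [
    ("s1", "A", date(2024, 1, 1)),
    ("s1", "A", date(2024, 1, 1)),  # repeated event: still one subscriber
    ("s1", "B", date(2024, 1, 1)),  # same subscriber seen in a second area
    ("s2", "A", date(2024, 1, 1)),
    ("s2", "A", date(2024, 1, 2)),
]


def presence(records):
    """Count unique active subscribers per (area, day)."""
    seen = defaultdict(set)
    for subscriber, area, day in records:
        seen[(area, day)].add(subscriber)
    return {key: len(subscribers) for key, subscribers in seen.items()}


presence_counts = presence(records)
```

Because a subscriber is added to the set of every area in which they are recorded, `s1` contributes to both Area A and Area B on the same day, so the counts across areas can sum to more than the number of active subscribers.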
However, we can also calculate other types of count of subscribers aggregates which may be used to address a range of different questions. These other aggregates include:
- Count of visitors: the count of unique active subscribers recorded in a given area during a given time interval who are not resident in that area during that time interval.
- Count of subscribers only seen in home location: the count of active subscribers who, during a given time interval, are recorded only in the area that contains their home location.
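The two aggregates listed above can be sketched as follows for a single time interval. This is a minimal example under assumed inputs: `home` (one home area per subscriber) and `sightings` (pairs of subscriber and area) are hypothetical structures, and subscribers without a home location are simply skipped.

```python
from collections import defaultdict

# Hypothetical inputs for one time interval.
home = {"s1": "A", "s2": "A", "s3": "B"}
sightings = [("s1", "A"), ("s1", "B"), ("s2", "A"), ("s3", "B"), ("s3", "A")]


def count_visitors(sightings, home):
    """Unique subscribers seen in each area that is not their home area."""
    visitors = defaultdict(set)
    for subscriber, area in sightings:
        if subscriber in home and home[subscriber] != area:
            visitors[area].add(subscriber)
    return {area: len(subs) for area, subs in visitors.items()}


def count_only_seen_at_home(sightings, home):
    """Subscribers whose every sighting falls in their home area."""
    areas_seen = defaultdict(set)
    for subscriber, area in sightings:
        areas_seen[subscriber].add(area)
    counts = defaultdict(int)
    for subscriber, areas in areas_seen.items():
        if subscriber in home and areas == {home[subscriber]}:
            counts[home[subscriber]] += 1
    return dict(counts)


visitor_counts = count_visitors(sightings, home)
only_home_counts = count_only_seen_at_home(sightings, home)
```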
Count of Travellers
Count of travellers aggregates describe the number of active subscribers who were present in a given area and then another area within a given time interval.
Each active subscriber may travel between multiple pairs of locations during the same time period.
Count of travellers aggregates are primarily used to calculate Movement-class indicators. These indicators describe short-term (hourly, daily) changes in the number of people who are travelling into, out of and between areas.
Count of travellers aggregates are generally calculated for short time intervals (e.g. daily, weekly) but, depending on the requirements of the analysis, the spatial and temporal resolution can vary.
We can calculate different types of count of travellers aggregates depending on the type of question we are trying to address.
First, count of travellers aggregates can be directed or undirected. Directed count of travellers aggregates differentiate between trips from Area A to Area B and from Area B to Area A, counting each of these separately; undirected aggregates do not differentiate and count all trips between Area A and Area B regardless of subscribers’ starting points.
Second, count of travellers aggregates can count all pairs of locations within subscribers’ trajectories, or only consecutive pairs of locations. For example, a subscriber could have a trajectory through Area A, then Area B, then Area C, and finally Area D: a consecutive aggregate would record trips from Area A to Area B, from Area B to Area C, and from Area C to Area D; an all pairs aggregate would record trips from Area A to Areas B, C and D, from Area B to Areas C and D, and from Area C to Area D.
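The directed/undirected and consecutive/all pairs options can be sketched as below. The function names and the `trajectories` structure (an ordered list of areas per subscriber) are hypothetical. The sketch also assumes that each subscriber is counted at most once per pair of areas within the interval, which is one possible reading of the definition above.

```python
from collections import Counter
from itertools import combinations


def trips(trajectory, consecutive=True, directed=True):
    """Return the set of area pairs that one subscriber contributes."""
    if consecutive:
        pairs = zip(trajectory, trajectory[1:])
    else:
        # combinations() keeps the visit order, so (earlier, later) pairs
        pairs = combinations(trajectory, 2)
    result = set()
    for origin, destination in pairs:
        if origin == destination:
            continue  # staying in the same area is not a trip
        if directed:
            result.add((origin, destination))
        else:
            result.add(tuple(sorted((origin, destination))))
    return result


def count_travellers(trajectories, **options):
    """Count subscribers travelling between each pair of areas."""
    counts = Counter()
    for trajectory in trajectories.values():
        counts.update(trips(trajectory, **options))
    return counts


# Hypothetical trajectories within one time interval.
trajectories = {"s1": ["A", "B", "C", "D"], "s2": ["B", "A"]}
consecutive_directed = count_travellers(trajectories)
all_pairs_directed = count_travellers(trajectories, consecutive=False)
consecutive_undirected = count_travellers(trajectories, directed=False)
```

In the directed case, `s1` and `s2` are counted separately under A to B and B to A; in the undirected case, both contribute to the single pair of Area A and Area B.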
Count of Relocations
Count of relocations aggregates describe the number of active subscribers whose home location changed between two periods of time.
Every active subscriber in a given time period is assigned a home location and each subscriber can only have one home location.
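Given one home location per subscriber in each of two periods, a count of relocations can be sketched as below. The dictionaries and the function name are hypothetical, and the sketch assumes that subscribers who have a home location in only one of the two periods are left out of the count.

```python
from collections import Counter

# Hypothetical home locations in two periods (one area per subscriber).
home_before = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
home_after = {"s1": "B", "s2": "A", "s3": "B", "s5": "A"}


def count_relocations(home_before, home_after):
    """Count subscribers whose home area differs between the two periods."""
    counts = Counter()
    for subscriber, origin in home_before.items():
        destination = home_after.get(subscriber)
        # Skip subscribers missing from the later period, and non-movers.
        if destination is not None and destination != origin:
            counts[(origin, destination)] += 1
    return counts


relocations = count_relocations(home_before, home_after)
```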
There are a number of methods for assigning home locations. You can find out more about the different approaches, including Flowminder’s methods, here.
Count of relocations aggregates can be used to calculate Resident- and Relocation-class indicators. These indicators describe, respectively, the long-term (weekly, monthly or seasonal) variation in the number of people whose home location is within an area of interest, and changes in the number of people moving their home location between pairs of areas.
Count of relocations aggregates are generally calculated for longer time periods (e.g. weeks, months) but, depending on the requirements of the analysis, the spatial and temporal resolution can vary.
Count of relocations aggregates can also be calculated using time windows of different lengths (e.g. home locations assigned using data from the preceding 3 days or 7 days) and updated at different frequencies (e.g. recalculated daily, weekly, or monthly). The optimal time window and frequency of update will depend on your application and the resolution needed to address your questions.
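To show how the length of the time window can change the assigned home location, the sketch below uses a deliberately simple rule: the most frequently observed area within a trailing window. This rule is an assumption chosen for illustration only and is not presented as Flowminder's method. The function name, the input layout and the alphabetical tie-break are likewise hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def modal_home(sightings, as_of, window_days):
    """Most frequently observed area in the window ending on as_of.

    sightings: list of (day, area) pairs for one subscriber.
    Returns None if the subscriber was not active in the window.
    """
    start = as_of - timedelta(days=window_days - 1)
    counts = Counter(area for day, area in sightings if start <= day <= as_of)
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically (arbitrary choice).
    return min(counts, key=lambda area: (-counts[area], area))


# Hypothetical subscriber: mostly in Area A, with a recent stay in Area B.
sightings = [
    (date(2024, 1, 1), "A"),
    (date(2024, 1, 2), "A"),
    (date(2024, 1, 3), "A"),
    (date(2024, 1, 4), "A"),
    (date(2024, 1, 5), "B"),
    (date(2024, 1, 6), "B"),
    (date(2024, 1, 7), "A"),
]

home_3_day = modal_home(sightings, date(2024, 1, 7), window_days=3)
home_7_day = modal_home(sightings, date(2024, 1, 7), window_days=7)
```

Here the 3-day window picks up the short stay in Area B, while the 7-day window still assigns Area A, which illustrates the trade-off between responsiveness and stability.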
Count of Residents
Count of residents aggregates describe the number of active subscribers with their home location in each area at each point in time.
Every active subscriber in a given time period is assigned a home location and each subscriber can only have one home location.
There are a number of methods for assigning home locations. You can find out more about the different approaches, including Flowminder’s methods, here.
While count of residents aggregates can be used to directly calculate resident indicators, we recommend using relocation aggregates where possible, as described here. This is because changes in resident aggregates may be more influenced by variation in the number of active subscribers than by subscriber mobility, especially in Low- and Middle-Income Countries where phone usage may be less regular.
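The effect described above can be seen in a small, made-up example: two subscribers become inactive between two weeks, nobody changes home location, and yet the resident count for Area A falls. All names and data here are hypothetical.

```python
from collections import Counter


def count_residents(home):
    """Count subscribers per home area for one period."""
    return Counter(home.values())


# Hypothetical home locations; s2 and s3 are inactive in week 2.
home_week1 = {"s1": "A", "s2": "A", "s3": "A", "s4": "B"}
home_week2 = {"s1": "A", "s4": "B"}

residents_week1 = count_residents(home_week1)
residents_week2 = count_residents(home_week2)

# Apparent change in Area A, driven purely by subscriber activity.
change_in_a = residents_week2["A"] - residents_week1["A"]

# Subscribers present in both weeks whose home area actually changed.
movers = sum(
    1
    for subscriber, area in home_week1.items()
    if subscriber in home_week2 and home_week2[subscriber] != area
)
```

The resident count for Area A drops by two, while the number of subscribers who actually relocated is zero, which is why relocation aggregates are less sensitive to changes in the number of active subscribers.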
Count of residents aggregates are generally calculated for longer time periods (e.g. weeks, months) but, depending on the requirements of the analysis, the spatial and temporal resolution can vary.
Count of residents aggregates can also be calculated using time windows of different lengths (e.g. home locations assigned using data from the preceding 3 days or 7 days) and updated at different frequencies (e.g. recalculated daily, weekly, or monthly). The optimal time window and frequency of update will depend on your application and the resolution needed to address your questions.