Infinite possibilities

Repurposing and integrating public- and private-intent data can help provide real-time and finer-scale insights, fill data gaps, and overcome limitations associated with each data type. Despite the recognized potential of these efforts to improve living standards in low-income countries, several hurdles need to be cleared, including technological and human capital constraints, data privacy risks, limits to representativeness of the poorest and marginalized populations, and gaps in research on methods, tools and standards for integration of public- and private-intent data.

Monitoring Smallholder Agriculture from Space

Agriculture is an integral part of livelihoods in Sub-Saharan Africa, where it can contribute up to two-thirds of household income in rural areas. As such, improving the productivity of smallholder farmers has been a long-standing goal in many African countries that aim to eliminate poverty and food insecurity.

To monitor progress towards national and international development goals related to agricultural productivity, countries need accurate, crop-specific measures of area under cultivation, production and yields. These measures are needed not only at the national level but also with sufficient within-country disaggregation to guide the targeting and evaluation of policies and programs that promote agricultural and rural development and resilience against disasters and extreme weather events.

With the commencement of the European Space Agency’s Sentinel-2 mission in 2015 and the subsequent increase in the public availability of high-resolution satellite imagery, research has shown the potential for satellite-based monitoring of agricultural outcomes in smallholder farming systems that could not be studied from space previously.

Satellite-based measures of crop areas and yields require data for training and validating underlying models, for which machine learning is increasingly used. Research conducted in Kenya, Uganda, and Mali has recently demonstrated the promise of georeferenced survey data for training and validating models that combine household surveys and satellite imagery to derive high-resolution, crop-specific estimates of area under cultivation and yields – with a mix of accuracy, precision and timeliness that is only possible thanks to repurposing and integration of public-intent data sources.

Building on these activities at sub-national scale, new research conducted by the World Bank Living Standards Measurement Study (LSMS) and Atlas AI, under the 50x2030: Data-Smart Agriculture Initiative, is now zeroing in on the required volume of, and approach to, survey data collection for meeting model training and validation needs in earth observation applications that can in turn be scaled up across entire countries. These applications can provide reliable satellite-based estimates of agricultural outcomes, particularly in settings that are characterized by smallholder farming and poverty and that would stand to benefit the most from these efforts.

Using Sentinel-2 satellite imagery and innovative survey data collected by the Malawi National Statistical Office and the Central Statistical Agency of Ethiopia, research has identified best practices in georeferenced survey data collection that would maximize the utility of recurrent household surveys for satellite-based estimation of agricultural outcomes, with a focus on mapping areas cultivated with maize, a key contributor to livelihoods and diets in both countries. The analysis yields three findings. First, a simple machine learning workflow can classify smallholder plots with maize cultivation with up to 75 percent accuracy. Second, classification performance peaks with slightly less than 60 percent of the training data. Third, the seemingly small erosion in predictive accuracy under less preferable approaches to georeferencing plot locations in household surveys can in fact result in significant overestimation of total area cultivated with maize, in the amount of 0.16 to 0.47 million hectares (8 to 24 percent).
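As a quick consistency check on the overestimation range quoted above, the two ends can be converted into the reference area they imply. This is a minimal sketch, assuming each percentage is the overestimate divided by a reference maize area; the function name is hypothetical and the calculation is not taken from the study itself.

```python
def implied_reference_area(overestimate_mha: float, share: float) -> float:
    """Back out the reference area (million hectares) implied by an
    overestimate and its share of that reference area.

    Assumption: share = overestimate / reference area.
    """
    return overestimate_mha / share

# Range quoted in the text: 0.16 to 0.47 million hectares (8 to 24 percent)
quoted_range = [(0.16, 0.08), (0.47, 0.24)]

for overestimate_mha, share in quoted_range:
    reference = implied_reference_area(overestimate_mha, share)
    print(f"{overestimate_mha} Mha at {share:.0%} implies about {reference:.2f} Mha")
```

Under that assumption, both ends of the range point to a reference area of roughly 2 million hectares, so the hectare and percentage figures are mutually consistent.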

After identifying the best available model, the researchers use it to map maize cultivation across Malawi and Ethiopia at 10-meter spatial resolution, an achievement that was unfathomable prior to 2015. Below, the map for Malawi contrasts two views of the 2018/19 agricultural season. On the left are estimates of area cultivated with maize at the district level, the most refined level at which the household survey can provide reliable statistics in Malawi. On the right is a snapshot of a selected area in central Malawi, where we can zoom down to areas as small as 100 square meters, made possible by satellite-based estimation that is anchored in high-quality ground data from surveys.
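The link between the 10-meter resolution and the 100-square-meter figure is simple arithmetic: each pixel covers 10 m by 10 m. The sketch below shows how a count of maize-classified pixels could be converted into area, assuming square pixels. The function name and the pixel count are hypothetical and are not taken from the study.

```python
PIXEL_SIZE_M = 10            # map resolution quoted in the text
SQ_M_PER_HECTARE = 10_000

def pixels_to_hectares(n_pixels: int, pixel_size_m: float = PIXEL_SIZE_M) -> float:
    """Convert a count of square pixels to area in hectares."""
    pixel_area_sq_m = pixel_size_m ** 2   # 10 m x 10 m = 100 square meters
    return n_pixels * pixel_area_sq_m / SQ_M_PER_HECTARE

# Hypothetical example: 250 pixels classified as maize
print(pixels_to_hectares(250))  # 2.5
```

Summing such pixel areas within a district boundary is one way a pixel-level map can be aggregated back to the administrative level at which survey statistics are reported.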

Mapping of Areas under Maize Cultivation in Malawi for 2018/19 Agricultural Season

Source: Azzari, G., Jain, S., Jeffries, G., Kilic, T., and Murray, S. (2021). “Understanding the requirements for surveys to support satellite-based crop type mapping: evidence from Sub-Saharan Africa.” World Bank Policy Research Working Paper.

Advances in high-resolution satellite imagery, remote sensing research, and georeferenced household surveys have the potential to dramatically improve monitoring and understanding of smallholder farming in Africa.

Mobile phone data to combat COVID-19

After the onset of the COVID-19 outbreak, governments began implementing policy measures to reduce social contact and curb the spread of the virus. Many non-essential businesses were closed, and citizens were asked or ordered to stay home, with the goal of saving lives and livelihoods. While the development of a vaccine was still ongoing, these non-pharmaceutical interventions played a crucial role in fighting the novel coronavirus. But policy makers had a hard time answering key questions about their efficacy. For example, have impacts differed over time and across district boundaries? And crucially, are vulnerable groups less able than others to comply with social distancing regulations?

Data offers some insights. Specifically, data collected through mobile phones, such as call detail records (CDRs) and global positioning system (GPS) location data, have been extremely valuable in quantifying variations in human mobility, population density, travel patterns, and population mixing in real time and at high resolution, making it possible to better target policy interventions and improve epidemiological modeling. Soon after the onset of the pandemic, several large-scale technology companies made anonymized location data available to support pandemic response and recovery. Google’s Community Mobility reports, Facebook’s Mobility Index, Apple’s Mobility Trends Reports, as well as data pipelines provided by Veraset, Unacast and Cuebiq, are only a few examples of these efforts.

To get a better sense of the impact of the pandemic and the resulting social distancing policies on human mobility, we analyzed privacy-protected GPS location data from Veraset in developing countries. In Jakarta, Indonesia, for example, we found that after the state of emergency was declared, households living in the wealthiest 20% of neighborhoods (what we term “high-wealth users”) increased their time spent at home by about 35% more than those living in the poorest 40% of neighborhoods (known as “low-wealth users”), relative to the pre-pandemic period. In addition, the share of high-wealth users commuting to work decreased by about 25% more than that of low-wealth users.
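The comparison above rests on two steps: assigning users to wealth groups by the wealth percentile of their neighborhood, and measuring each group's change relative to a baseline period. A minimal sketch of those steps follows. The cutoffs (poorest 40%, wealthiest 20%) come from the text; the records, field layout, and function names are made up for illustration and do not reflect the Veraset data or the authors' actual pipeline.

```python
from statistics import mean

def wealth_group(percentile: float) -> str:
    """Assign a neighborhood wealth percentile (0-100) to a group.

    Cutoffs follow the text: poorest 40% = low, wealthiest 20% = high.
    """
    if percentile < 40:
        return "low"
    if percentile < 80:
        return "medium"
    return "high"

def pct_change(baseline: float, current: float) -> float:
    """Percent change relative to the baseline period."""
    return (current - baseline) / baseline * 100

# Hypothetical records: (wealth percentile, baseline hours at home, current hours)
records = [
    (10, 12.0, 13.2),
    (30, 12.0, 13.8),
    (60, 12.0, 14.4),
    (90, 12.0, 15.0),
    (95, 12.0, 15.6),
]

# Collect per-user percent changes by wealth group, then average each group
changes = {}
for percentile, baseline, current in records:
    changes.setdefault(wealth_group(percentile), []).append(pct_change(baseline, current))

summary = {group: mean(values) for group, values in changes.items()}
print(summary)
```

The gap between the "high" and "low" entries of `summary` is the kind of quantity the comparison between high-wealth and low-wealth users describes.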

Time spent at home

[Chart: time spent at home (% change) by date, shown for low-wealth, medium-wealth, and high-wealth users; wealth of administrative unit divided at 40% and 80%]

Source: Veraset

Note: Change in the time users spent at home from February 15 to November 14, 2020, relative to the baseline period. PSBB refers to large-scale social restrictions (Indonesian: Pembatasan Sosial Berskala Besar).

After the state of emergency was declared, high-wealth smartphone users living in Jakarta increased their time spent at home by 35% more than low-wealth users

Share of commuters

[Chart: share of commuters (% change) by date, shown for low-wealth, medium-wealth, and high-wealth users; wealth of administrative unit divided at 40% and 80%]

Source: Veraset

Note: Change in the share of users living in Jakarta and commuting to work from February 15 to November 14, 2020, relative to the baseline period. PSBB refers to large-scale social restrictions (Indonesian: Pembatasan Sosial Berskala Besar).

The share of high-wealth smartphone users commuting to work decreased by 25% more than that of low-wealth users

These findings are not unique to Indonesian cities and underscore a simple fact: lockdown policies affect citizens differently. Vulnerable groups find it more challenging to comply with social distancing measures for a wide variety of reasons, including limited household savings, a weak or nonexistent social safety net, incomes that depend on face-to-face contact, crowded living arrangements, and poor access to basic services. Many of the most vulnerable are left with the terrible choice of staying home and forgoing an income, or going to work but doing so at great potential health risk. These findings also illustrate that fine-grained and high-frequency measures of mobility derived from mobile phone data provide a useful tool to policymakers for assessing the impact of the pandemic, especially on the most vulnerable.

Beyond the assessment of variations in human mobility, mobile phones have also been a crucial component of governments’ contact tracing strategies. Both private companies and government actors have developed mobile contact-tracing applications, such as the Corona 100m app in the Republic of Korea, TraceTogether in Singapore, or COVIDSafe in Australia, with the objective of alerting individuals who may have been in contact with a person infected with COVID-19. While the extent to which digital contact tracing has contributed to a reduction in the spread of the virus is still debated, its use raises important concerns about data protection, prompting researchers worldwide to develop contact tracing technologies that preserve privacy. Examples are the Private Kit: Safe Paths developed by the Massachusetts Institute of Technology (MIT) and the Decentralized Privacy-Preserving Proximity Tracing (DP3T) protocol developed by a consortium of European research institutions.

Although mobile phone data have played a key role in combating the COVID-19 pandemic by allowing for a rapid assessment of changes in human mobility and for contact tracing, they have important limitations in terms of representativeness of the population. Smartphone usage is still limited worldwide, especially in rural areas of the developing world. Data collected from mobile phones should be seen as a complement to, rather than a substitute for, surveys in tackling development challenges such as the COVID-19 pandemic and beyond.

The use of mobile phone data to inform COVID-19 public health response and their possible biases


Source: Adapted from Grantz et al. (2020) - Figure 1

Mobile phone data should be considered in light of ownership and use biases that may limit generalizability to the overall population.

Mobile phone owners and users only represent a subset of the population and may have additional age, socio-demographic, or geographic biases.

Contact-tracing applications that require the use of a smartphone or application may further limit the generalizability of these data since they represent smaller subsets of the user population.

Looking forward

Repurposing and integrating public- and private-intent data have the potential to improve the welfare of our planet and the men, women, and children of all nations. To realize that potential, countries must strengthen investments in the technological and human capital necessary for frontier applications of data integration. Data partnerships must be nurtured among public- and private-intent data producers and users to increase access to confidential data without compromising privacy. And these steps must be complemented with investments in research on methods, tools and standards for the integrated use of public- and private-intent data.