Explore, Summer '21 feature story

Capturing a World of Data

Computing power is the key to analyzing a changing environment
By Cindy Spence


The frontiers remaining in the natural world today are not in the thickest jungles, deepest oceans and highest mountains.

For naturalists today, the last frontier is data.

Robert Guralnick, the biodiversity informatics curator at the Florida Museum of Natural History, says data science approaches, particularly machine learning, can help meet a critical challenge: extracting the most useful data from an environment that is monitored ever more closely, and using those data to save global biodiversity.

“We really need to be able to do this and do it well,” says Guralnick, “and relatively quickly. Data limitations are perhaps the key impediment in understanding just how quickly the planet is changing and the consequences of those changes.”

Robert Guralnick

Naturalists and scientists still use field notebooks, but to those analog tools they are adding high-tech sources of data that artificial intelligence can help analyze:

  • From space, satellites monitor Earth around the clock.
  • Closer to ground, drones provide surveillance of any terrain.
  • Some sensors record readings such as temperature and moisture, while others record the sounds of birds and insects.
  • Camera traps record animal behavior when humans are not around, as well as seasonal changes in plants.
  • And environmental DNA can be collected and analyzed with the latest sequencing equipment.

With the help of AI and computing power, there is much to learn from these data.

“Ecology is now a big data science, but with all the data we are generating — and we are generating those at increasing rates — are we generating more rapid insights we can use to get ahead of environmental challenges?” Guralnick asks.

“The answer to that question is, I think, not yet. We have to move faster. We have to be better at this.”

One ecology app, iNaturalist, illustrates the data explosion. The app lets users identify plants and animals around them with the help of other users, and it often produces research-grade observations. Since it started in 2008, the app has recorded 54 million observations of more than 305,000 species. In August 2020 alone, users uploaded more than a million photographs of plants, covering about 20,000 species; about half a million of those photographs came from the U.S.

Another platform, eBird, allows people to use their cellphones to record and share birdwatching data. Enlisting citizen scientists to collect data, and then analyzing those data with digital tools, has allowed researchers to monitor bird movements and migrations better than ever.

“We use machine learning to classify where birds are and what they are doing across space and time,” Guralnick says. “Collective birdwatching has changed the world.”

Today iNaturalist records about 200,000 observations of plants and animals a day, and one day that number will be 1 million, then 5 million.

“That’s a hugely powerful resource for understanding what’s happening in the environment and making sense of it,” Guralnick says. “These platforms can be transformational. AI is perhaps the critical toolkit here to actually make progress.”

Ecosystem Services

Trees cover 31 percent of the world’s land area, and the ecosystem services they provide are valued in the trillions of dollars. Because plants are the foundation of all terrestrial ecosystems, and trees dominate the plant world, understanding changes in forests is key to protecting the ecosystem services, or benefits, that forests offer.

An ecosystem can be an urban trail, a state park or a national forest. As the scale grows, understanding ecosystems becomes more challenging and requires both teamwork and technology, says Ethan White, an associate professor in the Department of Wildlife Ecology and Conservation, part of UF’s Institute of Food and Agricultural Sciences.

A multidisciplinary research team — Integrating Data science with Trees and Remote Sensing (IDTReeS) — was formed with faculty including Alina Zare in electrical and computer engineering, Stephanie Bohlman in the School of Forest, Fisheries and Geomatics Sciences, Daisy Wang in computer and information science and engineering, and Aditya Singh in agricultural and biological engineering. The group is developing ways to identify individual trees in large forests.

“Changes in forests due to climate change, disturbance, land use change and forest management influence carbon storage, economics and ecosystem services,” Bohlman says. “These changes depend fundamentally on the characteristics of individual trees, but it is traditionally only possible to collect this data at very local scales using people-on-the-ground field techniques.

“Remote sensing from satellites, aircraft and drones has the potential to allow us to measure individual trees across huge areas,” Bohlman says. “That creates the potential for more informed decisions about forest management and responses to climate change.”

To develop machine learning methods for doing this at large scales, White says, it made sense to use data from the National Ecological Observatory Network (NEON). UF is a leader in NEON, which is funded by the National Science Foundation. NEON uses aircraft flyovers to collect imagery of ecosystems from Puerto Rico to Alaska, and White and his collaborators used those images to develop algorithms that identify millions of individual trees in each of these forests, including more than 5 million trees at UF’s NEON site, the Ordway-Swisher Biological Station.

NSF map of NEON Field Sites

“At this phase, we want to know if we can generate data at this scale in a way that is sufficiently close to the kinds of information we’d get from really intensive field work,” White says. “Our work so far shows that we can determine where the trees are, and how big they are, and we can do this for over 100 million trees.”

Because photographic data are becoming more widely available, White says, the team’s methods can be applied broadly, although scaling up to the entire U.S. would still take a long time. And as more data pour in, the task becomes even more computationally demanding.

“Everything scales really, really quickly,” White says. “But we’re getting faster and better, and HiPerGator has a substantial amount of resources, so we can go quite a bit bigger.”

Carbon storage is a huge question for forestry, but White says it is also important to provide methods that can answer other questions at larger scales.

“A lot of times when we ask ecological questions, we ask them at the scale that a graduate student or small team can go out and collect the data in the field,” White says. “But we’re often interested in applying data at much larger scales. So what we’re trying to do is produce the kinds of data we’d produce in the field — size, species, leaf traits — to answer a broad suite of ecological questions. Where are the largest trees, where is the most biomass, where are the most biodiverse regions?

“We are building a platform for research by providing large-scale data on forests quite broadly. We don’t just analyze it ourselves. We turn it into data products and make the products publicly available so other people can work on them and do ecology with them.”

Collaborations in ecology draw on multiple departments. From left, Daisy Wang, Ethan White, Stephanie Bohlman, Alina Zare and Aditya Singh, who are working together to analyze forest-level data.

Nature’s Voices

Images are not the only inputs. Sound, too, is a key data source in documenting the natural world, says Brian Stucky, an AI facilitator and consultant with UF Research Computing.

Animal sounds can reveal significant information, including:

  • Which species are present in an ecosystem.
  • Whether there are seasonal patterns to activity.
  • How animals interact.
  • How abundant a species is in an ecosystem.
  • Whether there are new sounds, indicating species that have not yet been identified.

Brian Stucky

One example of sound as big data, Stucky says, is research on frogs, which are in steep decline across the world.

Doctoral researcher Greg Jongsma studies African frogs, and in 2019 he visited Gabon for field work. He strapped two audio recorders to trees and turned them on. One recorded 10 days of sound, the other six days; together they yielded about 380 hours of bioacoustic data.

“As much as I love bioacoustics,” Stucky says, “nobody I know is crazy enough to sit down with a pair of headphones and try to manually analyze 380 hours of audio.”

The natural world is a noisy place 24/7, so isolating the frog calls was a labor-intensive proposition. The cacophony included frogs, to be sure, but also birds and insects. A biology student, Katie Everett, volunteered to take a stab at the task.

“Eventually, we heard from Katie that after spending about two hours, she hadn’t even made it through two minutes of audio,” Stucky says.

Stucky says the team developed new, faster methods for annotating audio by hand, which let them generate the training data needed to build an AI system that could analyze the full dataset. The system identified the calls of the target frog species with high accuracy.

“This is a work in progress,” Stucky says. “We’re still actively experimenting with various network architectures. Our goal here is to use these tools to analyze the daily patterns of calling activity for four key frog species.”

Stucky says the AI methods have allowed the team to analyze all the audio and describe when a newly discovered frog is most active during a typical day, at a level of detail not available for any other frog species in Africa.

“It’s breaking new ground in that way as well,” Stucky says, and Jongsma agrees.

“If Brian had not developed this amazing AI-driven approach to tackling these audio recordings, they likely would have remained on SD cards, collecting dust, quite possibly never to be used,” Jongsma says. “I can see so much potential for asking big questions that would have been insurmountable in terms of bridging big data and long-term field data collection without an AI specialist like Brian.”

One of Greg Jongsma’s camera traps in Gabon.

If AI can do the heavy lifting, Stucky says, scientists can find new ways to answer biological questions simply by deploying acoustic monitors and analyzing the recordings.

Data and Diversity

With Earth in the midst of what has been called the sixth great extinction, and species disappearing 1,000 times faster than the natural background rate, there’s little time to spare in conquering the data frontier for natural resources.

“Documenting the rate and drivers of biodiversity change is critical because those losses have important consequences for ecosystem services that underlie human society, like food, fuel and fiber,” Guralnick says. “We’re moving toward disequilibrium and losing diversity before we even know what we’re losing.”

The traditional mainstay of ecological study — field work — isn’t going anywhere. But more and more, naturalists will be teaming up with data scientists, or learning the skills of data science for themselves.

White is a strong believer in applying data science to ecology. He helped build the Data Carpentry organization, which teaches researchers to use data science tools such as programming languages and databases. This spring, the IDTReeS group ran a data science competition in which teams used open remote sensing data to design algorithms that identify and classify trees.

“The idea behind the competition is that teams will develop new ways to process ecological information, and those methods can benefit scientists everywhere,” says Wang, one of the leaders of the program. “An open challenge can attract solutions from a broad range of participants and help us refine methods to help us solve problems from various research teams.

“Our efforts are among the first to apply data science to the field of ecology,” Wang says.

Wang, Zare and White also teach classes that give future ecologists a foundation in tools such as machine learning. Raising baseline tech skills for scientists can have a huge impact on ecology, White says.

“We can’t just get someone from Google to handle data for us. This requires a real fusion of domain expertise with advanced technical skills, so it’s essential for these projects to have a whole range of people from biological experts through machine learning and computer science experts who are all capable of interacting and talking with one another. That’s a really difficult thing to accomplish, and it takes time to figure out.”

Cross-disciplinary collaborations, such as those with Zare’s Machine Learning and Sensing Lab and Wang’s Data Science Research Lab, draw on both ecological and technical expertise.

Zare says the computer scientists may not be experts on an ecological problem, but if an ecologist hands her team a curated dataset, it could have a big impact. Ecologists, too, could have more impact with help on coding and machine learning methods. The back-and-forth discussions about creating a meaningful dataset provide insights to both sides.

“We’re a unique group,” Zare says.

The goal is for such collaborations not to be unique for long. White says ecology offers legitimately difficult problems that will require big interdisciplinary teams.

“A key component of these approaches is large amounts of high-quality field data to even begin to develop models in the first place. And because systems change through time, the need for new field data is never going away,” White says.

“By combining this field work with remote sensing and analysis,” he adds, “we can do so much more than any of us could do on our own.”


Guralnick says integrating “pixel views” of the world with observations made on the ground will require another kind of collaboration: one between humans and machines. Automating tasks that humans have previously done is important because machines can do them faster, he says, but it is equally important to keep humans in the loop, because human intelligence will be needed to make automation succeed.

“We have managed to develop a remarkable set of tools, especially in the last 50 years, for monitoring the environment,” Guralnick says. “Now we have the enormous challenge of integrating these data into a coherent picture that could be useful for solving problems.

“It is a frontier challenge.”


Sources:

Robert Guralnick
Biodiversity Informatics Curator, Florida Museum of Natural History
rguralnick@flmnh.ufl.edu

Brian Stucky
Biodiversity Informatics Researcher, Florida Museum of Natural History
stuckyb@ufl.edu


Related Website:

Integrating Data science with Trees and Remote Sensing