Since starting LIFEPLAN, we have built a network of 104 volunteer teams around the world. These teams conduct LIFEPLAN sampling on a global scale, using camera traps, audio recorders, insect traps, fungal spore samplers and soil sampling. We have purchased and sent equipment to the teams, and helped them acquire permits and comply with the relevant international legislation. We have created a mobile application and a website for keeping track of tens of thousands of samples, and a cloud-based system for transferring digital data. We have also started our own sampling using the same methods on a finer spatial scale, at an additional 68 locations in Sweden and Madagascar.
To identify the species in the samples, images and sounds that are being collected, we have developed machine learning models. The volume of data we are collecting is far too large for human experts to review manually, which is why we need machine learning methods. We have also set up websites for collecting the training data that the identification task requires. The training data will consist of sound, image and DNA barcode libraries of known species.
A major challenge is that we are discovering many new species. We have addressed this challenge by developing a classification approach that uses probabilities to represent uncertainty in both classification and the discovery of new taxa. We have also developed a new approach for predicting the number of new taxa that would be discovered if a given number of additional samples were processed, which provides valuable information for designing sampling and predicting biodiversity. This approach also adds to the statistical literature on species sampling models, which are relevant to a broad range of applications beyond ecology.
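To make the prediction problem concrete, the sketch below implements the classical Good-Toulmin estimator, a long-established species sampling result. It is not the new approach developed in LIFEPLAN, which is not detailed here. The function name `good_toulmin`, the toy abundance list, and the choice to express the additional sampling effort as a fraction `t` of the original sample size are all assumptions made for illustration.

```python
from collections import Counter


def good_toulmin(abundances, t):
    """Classical Good-Toulmin estimate of the expected number of new taxa.

    abundances: how many times each already-observed taxon was seen.
    t: size of the additional sample as a fraction of the original sample
       (illustrative convention; the plain series is only stable for t <= 1).
    """
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1] for the unsmoothed estimator")
    # f_k = number of taxa observed exactly k times ("frequency of frequencies")
    freq_of_freq = Counter(a for a in abundances if a > 0)
    # Alternating series: U(t) = sum_k (-1)^(k+1) * t^k * f_k
    return sum((-1) ** (k + 1) * t ** k * f_k for k, f_k in freq_of_freq.items())


# Toy data (hypothetical): seven taxa, three of them seen only once.
toy_abundances = [1, 1, 1, 2, 2, 3, 5]
expected_new = good_toulmin(toy_abundances, 0.5)
print(expected_new)  # 1.15625
```

With the toy counts above, increasing the sample by half is expected to reveal about one further taxon. The estimate is driven mainly by the taxa seen only once, which is why rare taxa matter so much when planning further sampling.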
Beyond collecting and analyzing our own data, a major part of LIFEPLAN is developing new methods for big data statistics. We have developed multiple new modelling frameworks that can flexibly adapt to the types of structure common in spatial ecology data, as well as in many other applications. We have produced multiple algorithms for more efficient computation when modelling large spatial data; these algorithms can handle a broad range of data types and models. We have also developed two new classes of algorithms that enable much faster Bayesian statistical analyses of very long time series data, while maintaining theoretical guarantees on the accuracy of the approximations employed.