Data Science platform
The Strategic Science Investment Fund (SSIF) Data Science platform aims to significantly lift New Zealand’s capability, support and encourage dynamic, world-class data science research, and deliver on the Government’s data science investment goals.
The Government’s data science investment goals are to:
- make sure New Zealand has sufficient advanced data science capability to develop useful and transformative data science techniques
- create benefits for New Zealand.
MBIE funding
The Government is committed to investing $49 million between 2020 and 2027 in four Strategic Science Investment Fund (SSIF) Data Science research programmes.
Research programmes funded
The research programmes being funded by the SSIF platform for Data Science are:
Read the public statement
Aquaculture is New Zealand’s best opportunity to sustainably grow its Blue Economy, yet the industry is facing significant challenges in achieving its $1 billion revenue target by 2025. We now have more mussel farms, but the total annual yield hasn’t increased. Could it be due to climate change? The industry depends on natural sources of spat (mussel larvae) found in springtime off Ninety Mile Beach. Little is known about where it comes from – or how best to get spat to adhere to the growing ropes. Mussel farming relies on experience, but conditions change from farm to farm and season to season.
The aquaculture industry thinks that data science could be the answer. By bringing marine scientists together with specialists in machine learning, modelling, and data visualization, we will develop new science to support decision-making, so farm managers can respond to climate challenges, manage disease, improve production yields, and farm sustainably at scale. Applying data science to marine farming makes sense.
In this programme, we will develop innovative data science techniques that will enable the aquaculture industry to produce efficiently and at large scale, producing high-quality, low-carbon protein for New Zealand and the world without compromising the environment. To do this, we will build data science knowledge in the industry.
Māori own large aquaculture assets. With our partners Whakatōhea and the Wakatū Incorporation, we will co-design a programme to educate young Māori in data science and create the next generation of industry leaders. All our students, from undergraduate to PhD, will work with industry within the research programme and through internships and summer projects.
Aquaculture data poses immense challenges for the researchers, so our programme is led by our best data scientists, guided by distinguished international researchers. Our students will learn from the world’s best.
Read the public update from the 2023/24 annual report
Over the fourth year, we have achieved significant progress in the four established case study themes on shellfish and finfish, and Vision Mātauranga, as well as the fundamental research in data science and AI. The team has 34 high-quality publications in international journals and conferences. Our team members have also been invited to present our work at international conferences. A number of team members have won best paper awards and competitions. We have also developed machine learning tools, such as online tools and demos for real-world users, so that industry can apply our techniques.
Our capacity-building is highly effective in both recruiting and training students, including undergraduate, summer/honours, master's and PhD students, providing a solid framework for their academic and professional development as they work on this programme. We have regularly managed the Māori scholarships/internships, including Undergraduate Scholarships/internships, Summer scholarships, Graduate Awards, Master by Research Scholarships and PhD Scholarships, to attract, support and train our young Māori students and researchers in data science and its application to aquaculture. We have offered Māori Undergraduate Scholarships to three students, and one of them is currently working with us as a summer student.
To improve our impact, we have delivered 17 talks/speeches, organised 6 special sessions/issues and workshops, and published 2 newsletters to promote our programme and achievements. We have also invited 9 world-leading researchers to visit us and to deliver distinguished talks and panel discussions. Our team members also hold significant roles in the data science and aquaculture disciplines, such as associate editors of top journals, members of advisory boards for international conferences, and conference chairs. We have been actively engaging with industry partners and attending NZ Aquaculture and Seafood industry conferences/meetings/forums. Finally, we have organised one workshop to strengthen collaboration among team members, and provided opportunities for engagement with international leaders in aquaculture.
Work generated by this programme has been recognised by the professional UK magazine WORLDFISHING & AQUACULTURE. One of our papers, accepted by the Journal of the Royal Society of New Zealand, has been recognised and shared by the Australian Society for Fish Biology, a professional organisation of fish and fisheries researchers, via LinkedIn.
Read the public update from the 2022/23 annual report
Over the third year, we have achieved significant progress in the four case study themes on shellfish production, finfish breeding, fish health prediction, and Vision Mātauranga, as well as the fundamental research in data science and AI.
In terms of research, each case study has been progressing well, based on collaboration between experts from both the aquaculture and data science sides. New aquaculture-related data and applications have also been explored and added, greatly enhancing the case studies. The team has a good number of high-quality publications in international journals and conferences. Our team members have also been invited to present our work at international conferences, workshops and summer schools through plenary and keynote talks and specialised tutorials, and have been recognised through best paper awards and competitions.
We have successfully built a pipeline with Māori scholarships/internships, including Undergraduate Scholarships/internships, Summer scholarships, Graduate Awards, Master by Research Scholarships and PhD Scholarships, to attract, support and train our young Māori students and researchers in data science and application to aquaculture. We have successfully offered Māori Undergraduate Scholarships to three Māori students, and expect to offer more soon.
To improve our impact, we have delivered different kinds of talks/speeches, organised different special sessions/issues and workshops, and published newsletters to promote our programme and achievements. We have also invited world-leading researchers to visit us and deliver distinguished talks and panel discussions. Our team members also play leading roles in the data science and aquaculture disciplines, such as associate editors of top journals and members of advisory boards for international conferences. We have been actively engaging with industry partners and attending NZ Aquaculture and Seafood industry conferences/meetings/forums. Finally, our Te Whiri Kawe – Centre for Data Science and AI (https://www.wgtn.ac.nz/cdsai) was successfully launched by Minister Dr Ayesha Verrall in June 2023.
Launch of Te Whiri Kawe—Centre for Data Science and Artificial Intelligence(external link) — Victoria University of Wellington
Read the public update from the 2021/22 annual report
The goal of this programme is to support decision-making in the aquaculture industry by bringing marine and aquaculture scientists together with specialists in machine learning, modelling, and data visualisation to develop new science. In this programme, we aim to develop innovative data science techniques that will enable the aquaculture industry to produce efficiently and at a large scale, producing high-quality, low-carbon protein for NZ and the world without compromising the environment.
To do this, we will build data science knowledge in the industry. Māori own large aquaculture assets. With our partners Whakatōhea and the Wakatū Incorporation, we will co-design a programme to educate young Māori in data science and create the next generation of industry leaders. All our students, from undergraduate to PhD, will work with industry within the research programme and through undergraduate internships, postgraduate scholarships and summer projects.
Over the past year, we have made significant progress in several areas. We have established four application case study themes, which focus on shellfish production, finfish breeding, fish performance and health prediction, and Vision Mātauranga, respectively. Each theme is run by experts from both industry and academia. We have published 59 papers, including fundamental research in data science and applications to aquaculture. To improve our impact, we have delivered 19 talks/presentations, organised 21 special sessions/issues and workshops, and published 1 newsletter to make our achievements more visible. We have also started building a pipeline that provides scholarships and internships to help young Māori in data science. In addition, we have been collaborating with top international researchers to carry out research, and engaged with world-leading researchers by inviting talks and discussions. More details can be found on our programme website:
Progress Report 2022 - Groups/DataScienceForAquaculture | ECS(external link) — Victoria University of Wellington
More information:
Data science for aquaculture(external link) — Victoria University of Wellington
Read the public statement
Led by Te Hiku Media, our research will lead the revitalisation of minority and indigenous languages and the indigenisation of digital devices worldwide. Over the next seven years, our research programme will bring together world-leading data scientists from New Zealand, Cambridge and Oxford Universities, Māori communities and Mozilla in a unique collaboration to tackle this challenge.
Our proposal aims to establish a multilingual language platform to develop natural language processing tools and methods that will enable New Zealanders to engage with technology in the language they use or aspire to use every day. Starting with te reo Māori and New Zealand English, our programme will ensure a New Zealand identity is firmly embedded in the digital world. We will also extend into the Pacific and work with Samoan and Hawaiian communities. Our tools will make it possible to switch between these languages, so people can speak into their devices to “find a choice as kai of panipopo”. Most importantly, however, we will secure the future for these languages in a changing, dynamic, digital world.
Currently, minority languages do not have datasets big enough for existing methods to work, so these languages and their communities are largely invisible and unheard in these contexts. Their existence is under threat because, as digital technologies further permeate our day-to-day life, engaging with the language and transmitting it across generations becomes more and more difficult. Our research programme will make it possible for ‘low resource’ languages and their speakers to participate fully in a digital context by creating cutting-edge technology.
Read the public update from the 2023/24 annual report
"Ehara te toka i Akiha, he toka pakupaku, he toka whitianga-ā-rā; ka pā tāu ko te toka o Mapuna, tēnā tāu e titiro ai ko te ripo kau." This whakataukī describes two approaches to the research and work undertaken by Papa Reo this year. The first, like Akiha, is visible and all of its qualities can be seen. The second, like Māpuna, is hidden below the surface and is only noticed by the ripples and currents that swirl around it.
This year, Papa Reo had, like Akiha, moments of very public recognition for the work undertaken by the team. The Rise 25 award to the Chief Technology Officer of the project, Keoni Mahelona, was a global recognition of the efforts to build a language platform for under-resourced languages and maintain sovereignty over data. Invitations to, and subsequent presentations at, the Creative Commons Summit, National Digital Forum and the ADA Copyright Forum, for example, gave the project, its goals and the team's efforts a moment in the sun.
Like Māpuna, however, much of what we do occurs below the surface and often consists of incremental steps towards the technological goals of the project. Our models are continually validated and benchmarked, which is a vital step in maintaining integrity and legitimacy in the AI community. Our models outperform efforts by big tech to deliver NLP tools for te reo Māori, and we believe that this is a result of our approach, our relationships and the work we conduct below the surface.
Read the public update from the 2022/23 annual report
"Kua tawhiti kē tō haerenga mai kia kore e haere tonu. He nui rawa ō mahi kia kore e mahi tonu. You have come too far not to go further; you have done too much not to do more." (Tā Hemi Henare).
Having successfully built a strong research team and a set of foundation tools for te reo Māori, Te Hiku Media and the Papa Reo project focussed on maximising the impact of the tools and their use and relevance to Aotearoa. This meant tackling the bilingual research problem, ensuring the speech recognition tools will accurately transcribe te reo Māori and NZ English. The new bilingual model was released in early 2023 and, when benchmarked against models released by the likes of Meta and OpenAI, is at least 50% more accurate for both languages and performs better across subsets such as contemporary speakers of NZ English and archival native speakers of te reo Māori.
As Big Tech launched Large Language Models like ChatGPT into the market with abandon and little care for the consequences, Te Hiku Media has advanced its advocacy for ethical approaches to data collection, training of machine learning models and indigenous data sovereignty. This has seen global recognition, with members of the team invited to participate in discussions hosted by the Stanford Institute for Human-Centered Artificial Intelligence and the Computing Research Association of America, and to become members of the Partnership in AI Task Force for Inclusive AI. On the home front, Te Hiku Media continued to make an impact in the Māori language community by getting tools such as Rongo and Kaituhi into the hands of users, further claiming the technological landscape for te reo Māori.
Read the public update from the 2021/22 annual report
“Iti pioke nō Rangaunu, he au tōna. Small as the dog shark of Rangaunu may be, great is its wake.”
The second year of Papa Reo has been about building the momentum of the project. Despite being a small team, the impact of Papa Reo has been felt extensively across the language technology and the indigenous data sovereignty spaces. Papa Reo worked closely with University-based research teams providing access to tools and data. We also supported undergraduate internships and postgraduate research projects.
Papa Reo also invested in growing the capacity and capability of our team and in our tools. Kaitāia now houses one of the fastest AI servers in the country. Ōrongonui, named for the moon phase when it was turned on, means the team has been able to experiment and innovate without constraint. Our new team of data scientists, developers and engineers can now experiment with novel approaches quickly, creating and improving our tools.
With this state-of-the-art infrastructure, Papa Reo continues to develop a multilingual language platform. Papa Reo deployed a new Māori text-to-speech model. We launched the Rongo app on Apple using a model that provides real-time feedback to improve pronunciation. The first Māori speech-to-text model developed by Te Hiku Media continues to be refined with a reduced word error rate while we prepare to train a new model using novel approaches. With these models in place, over the coming years, we will reduce the work required for other under-resourced languages.
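For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below is a generic Python illustration of that definition, not Papa Reo's evaluation code; the function name and the sample phrases are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four words is transcribed without its macron.
wer = word_error_rate("kei te pēhea koe", "kei te pehea koe")
print(wer)
```

A lower value is better, and because insertions are counted the rate can exceed 1.0 when the output is much longer than the reference.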
We remain an active voice in the recognition of indigenous data sovereignty, contributing to global conversations, for example at APEC, and on the ground in national forums. The growing awareness of Te Hiku Media’s position on data sovereignty continues to create opportunities to educate and drive our decisions when choosing tools and partners.
Read more:
A language platform for a multilingual Aotearoa(external link) — Te Hiku Media
Read the public statement
Data are essential to research, understand, set policy for and manage New Zealand’s environment, but environmental data present many challenges that require new data science methods to overcome, along with a substantial increase in the capability of environmental researchers, governors and managers to use data science in their work. This programme will develop those new methods and build the required capability.
In particular, we will focus on developing methods to deal with environmental datasets that are collected in large volumes over time and must therefore be dealt with as streams that are analysed incrementally, as they are measured, rather than as collections of data that can be analysed all at once. These methods will address underlying characteristics of the data that evolve over time (e.g. due to climatic or ecological changes), and data that are collected at a range of time intervals and spatial scales ranging from broad-scale satellite images to single-point measurements on the ground, in the water or air. The methods we develop will be interpretable and explainable (to help users understand why an algorithm produces some particular output), identify and understand anomalies (to distinguish 'normal' from 'unusual' measurements) and quantify uncertainty in algorithm output (to help decision-makers understand how confident they can be in conclusions drawn from the data science methods).
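As a rough illustration of analysing a stream incrementally, the Python sketch below keeps a running mean and variance with Welford's online algorithm and flags readings that sit far from what has been seen so far. It is a minimal sketch, not the programme's method: the class and function names, the threshold of three standard deviations, the warm-up length and the sample readings are all assumptions chosen for the example, and it does not handle data whose characteristics drift over time.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance updated one reading at a time."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; undefined for fewer than two readings.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def flag_anomalies(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly), testing each reading before it is absorbed."""
    stats = RunningStats()
    for x in stream:
        is_anomaly = (
            stats.count >= warmup
            and stats.std > 0
            and abs(x - stats.mean) > threshold * stats.std
        )
        yield x, is_anomaly
        stats.update(x)  # the full history is never stored, only the summary


# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]
flags = [is_anomaly for _, is_anomaly in flag_anomalies(readings)]
print(flags)
```

Here only the 25.0 reading is flagged. Each reading is processed once and then discarded, which is what makes the approach usable on data that arrive continuously.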
To deliver the methods we develop in a form that environmental scientists and managers can use, we will build a new open source framework to do machine learning on time series data, and provide an open access repository of environmental datasets to improve reproducibility in environmental data science. Through workshops and undergraduate and postgraduate research projects within the programme, we will build New Zealand’s capability in fundamental and applied data science relevant to environmental data, from introductory to postdoctoral level.
Read the public update from the 2023/24 annual report
TAIAO’s vision is to advance data science by providing robust, accessible tools and methods for New Zealand's environmental sectors. Our community platform upholds data sovereignty and open-source principles, supporting Vision Mātauranga and fostering growth across the data, environmental science, and software development communities. In line with these goals, we launched CapyMOA in March 2024, an advanced machine learning library specifically designed for data streams. CapyMOA significantly improves real-time data processing, offering faster, scalable, and more accessible machine learning for continuous data streams. Additionally, we have published a range of work aligned with our research aims in peer-reviewed journals and conference proceedings over the past year.
We actively supported a number of exciting networking and conference initiatives. In addition to hosting our annual TAIAO Workshop in August 2023, we contributed to the 21st Australasian Data Mining Conference (AusDM 2023) and the AI Researchers Association Annual Conference 2024, where we showcased our research and engaged with the broader data science community. We also supported Indigidata Aotearoa, an initiative focused on developing an understanding of Indigenous data science and sovereignty, specifically tailored for Māori participants. We were proud to support the AI for the Environment Hackathon Festival last year, and this year we are again hosting the hackathon event for the Waikato and Canterbury regions. It was heartening to see teams competing with such passion for improving environmental outcomes for Aotearoa New Zealanders.
In addition to the three existing case studies, we have begun developing a new case study in collaboration with Sanctuary Mountain Maungatautari. The goal is to create a flexible and extensible annotation platform designed to address the complex challenges of annotating images of endangered kiwi. This platform aims to be a transformative solution for the detailed analysis of photographic and video imagery.
Read the public update from the 2022/23 annual report
The vision of TAIAO continues to be to enable the next level of data science to provide robust and fit-for-purpose tools and methods that are accessible and useful to researchers and practitioners across all areas of the New Zealand environment. In the third year of our project, we've achieved significant milestones, including the continued expansion of the community formed during the initial two years. Additionally, our team has gained deeper insights into the needs of potential end-users within Aotearoa New Zealand.
The TAIAO community platform (taiao.ai) received a new look and feel that was launched in November 2022. With this fresh launch, we introduced “Categories”, which facilitate filtering of Datasets, Notebooks, Software, and Tutorials. Aligned with the essence of TAIAO, the platform maintains its status as both data sovereign and open source. These foundational principles are instrumental in upholding our dedication to Vision Mātauranga and in cultivating not only the data and environmental science communities but also the software development community.
In addition to the two existing case studies, we have begun developing a new case study in collaboration with the Department of Conservation, with the goal of creating a flexible and extensible annotation platform. This platform aims to tackle the intricate challenges associated with annotating undersea habitats.
The TAIAO Machine Learning Course for Flood Practitioners took place in June 2023 at the University of Waikato, drawing attendees from the Waikato Regional Council, including flood practitioners, hydrologists, and environmental scientists. This event has generated interest from various other regional councils for potential future editions of the course. Efforts have been initiated to encourage greater engagement on the TAIAO community platform. In line with this, we are planning to establish a new dedicated category within the platform that aligns with this objective.
Read the public update from the 2021/22 annual report
The vision of TAIAO continues to be to enable the next level of data science to provide robust and fit-for-purpose tools and methods that are accessible and useful to researchers and practitioners across all areas of the New Zealand environment. Achievements for year two are continued strong growth of the community established during the first year, and increased team understanding of the needs of potential end-users in Aotearoa New Zealand.
The TAIAO community platform (taiao.ai) has continued its iterative improvement, culminating in a public launch in November 2021. Feedback from the environmental data science community after the launch was strongly supportive of our intent to develop the TAIAO community through the sharing of datasets, notebooks, kōrero and resources. True to the intent of TAIAO, the platform will remain data sovereign and open source. These two fundamentals drive both the commitment to Vision Mātauranga and the ability to build the data and environmental science communities alongside the software development community.
We have been working on two case studies for the TAIAO data platform based on a data mesh architecture. For the first case study, in collaboration with the Waikato Regional Council and MetService, we created an operational live archive and API for MetService's rain radar data. The archive has been populated with all the historical surveillance radar scans from the Auckland and the Bay of Plenty radars and is progressively being backfilled with data from the other New Zealand rain radars.
The second case study is ongoing work with SCION, which takes advantage of data from the ForestFlows system, which monitors forests in real-time. This work will help to increase the information flow to interested entities regarding the forests of New Zealand.
Read more:
Machine Learning for Streams(external link) — Time-Evolving Data Science and Artificial Intelligence for Advanced Open Environmental Science (TAIAO)
Read the public statement
Data science facilitates new approaches to longstanding problems in healthcare, policy, ecology and economy. But to make the most effective use of it, we need analytical methods that are straightforward to apply, open to review and audit, and produce results that can be correctly interpreted by practicing researchers and policymakers. Furthermore, we need methods that discover, gather and integrate potentially useful data with minimal human intervention, and ensure that the most suitable analysis methods are used with such data. Finally, we need to empower a whole generation of researchers, across all fields, to use these new methods in robust and defensible ways.
Our team comprises researchers from the Universities of Auckland, Otago and Canterbury, and Massey University. In it, computer scientists and statisticians will work alongside domain scientists in fields such as computational biology, ecology and public health. Over the next 7 years, we will improve the application of data science methods in complex research settings, make processing more efficient, and create transparent and computationally reproducible workflows that are published, open and easily reused. We will commit the majority of our budget to training and equipping the doctoral and post-doctoral researchers who will go on to apply data science methods successfully to improve our environment, economy and society.
Read the public update from the 2023/24 annual report
Our overall aim is to develop new data science methods that help to improve the process of conducting and reporting research, and to use developments in data science to improve our national capability in genomics and biology research, thus helping New Zealand to remain competitive in these two important areas.
One of our major project goals is to help make research more transparent and trustworthy. Over the past year, we have developed a scientific claim verification methodology and an associated tool that allow questions to be posed about the accuracy of, and supporting evidence for, any claim made against the scientific literature. An associated web application allows claims to be evaluated in real time against established scientific literature. This kind of tool helps to empower non-scientists to ask specific questions about scientific findings and evidence, and to explore for themselves the peer-reviewed evidence that supports, or refutes, any science claim.
Another major achievement over the past year is the use of Digital Twin technology to create a genomic digital twin of a threatened native species, the Hihi (stitchbird). This digital twin allows us to model with precision how the genomics of the current Hihi population will affect its ability to survive and thrive in a changing world. Such methods allow us to better understand how our precious flora and fauna will react in the future to changes in their environment, so we can understand associated risks before changes occur and plan for better mitigation.
Collaborations within our team are now leading to additional grant proposals in the areas of AI and genomics, and we have strengthened our collaborations with other research partners based in Aotearoa (notably ESR and Genomics Aotearoa) along with several prestigious overseas collaborations, including the Allen AI Institute and Globus Labs.
We sponsored and helped to deliver a significant national outreach & training event (ResBaz 2024) enabling over 1000 researchers to learn new data science skills that they can apply to their own research.
Event schedule(external link) — ResBaz Aotearoa
Read the public update from the 2022/23 annual report
The team has achieved significant research progress this year in four distinct areas: (i) phylogenetic analysis – studying the relationships between genomics and disease using advanced data science methods; (ii) ecological modelling – contrasting geographical and genomic differences within endangered species; (iii) live science publishing – creating a new way of publishing science experiments embedded within traditional research articles, which is now gaining traction with other researchers; and (iv) AI tools that can validate or refute scientific claims, for example: "Covid vaccines cause sterility in males". In each of these areas we have developed new methods and related software that are being shared openly with the science community.
Collaborations within our team are now leading to additional grant funding in the areas of AI and genomics, and we have strengthened our collaborations with other research partners based in Aotearoa (notably ESR and Genomics Aotearoa with plans to extend to the national RNA Platform once it is underway).
We sponsored a significant national outreach & training event (ResBaz 2022: https://resbaz.auckland.ac.nz/sessions/) enabling several hundred applied researchers throughout the country to learn new data science skills that they can apply in their own research. We are also now training a future generation of NZ data scientists via our PhD and post-doctoral scholarships and via related AI Carpentry work.
Taken together, this progress is producing the new methods, code, collaborations and workforce that will ensure Aotearoa can take better advantage of data science in its future research endeavours in key industries such as genomics and AI and branches of government such as public health and debate on science and policy.
For more information, contact Prof. Mark Gahegan, Professor of Computer Science at the University of Auckland, email: m.gahegan@auckland.ac.nz
Read the public update from the 2021/22 annual report
The team has achieved significant research progress this year in three distinct areas: (i) phylogenetic analysis—studying the relationships between genomics and disease, (ii) ecological modelling—contrasting geographical and genomic differences within endangered species and (iii) live science publishing—creating a new way of publishing science experiments embedded within traditional research articles. In each of these areas we have developed new methods and related software that has been shared openly with the science community.
Collaborations within our team are now leading to additional grant proposals in the areas of AI and genomics, and we have strengthened our collaborations with other research partners based in Aotearoa (notably ESR and Genomics Aotearoa).
We sponsored a significant national outreach & training event (ResBaz 2021: https://resbaz.auckland.ac.nz/sessions/) enabling hundreds of applied researchers throughout the country to learn new data science skills that they can apply in their own research. We are also now training a future generation of NZ data scientists via our PhD and post-doctoral scholarships.
Taken together, this progress is producing the new methods, code, collaborations and workforce that will ensure Aotearoa can take full advantage of data science in its future research endeavours in key industries and branches of government.
For more information, contact Prof. Mark Gahegan, Professor of Computer Science at the University of Auckland, email: m.gahegan@auckland.ac.nz