Iomairt AI a’ toirt cothrom na Fèinne dhan Ghàidhlig san linn dhidseatach

Published: 2 May 2023

AI initiative gives Gaelic a foothold in the digital age

[English below]

Tha cànanaichean agus eòlaichean AI gu bhith a’ tòiseachadh air prògram àrd-amasach a tha air a dhealbh gus Gàidhlig na h-Alba a chuideachadh gus soirbheachadh san linn dhidseatach.

Tha eòlaichean ann an Oilthighean Dhùn Èideann agus Ghlaschu air £225,000 fhaighinn bho Riaghaltas na h-Alba gus siostam fo-thiotalan Gàidhlig a chruthachadh a bhios freagarrach dhan BhBC.

Bheir am maoineachadh cothrom dhan sgioba tòiseachadh air obrachadh a dh’ionnsaigh a bhith a’ cruthachadh modail cànain mòr – coltach ri ChatGPT – airson luchd-labhairt na Gàidhlig.

Tha oidhirpean gus an siostam a chruthachadh mar phàirt de dh’iomairt nas fharsainge gus cur an-aghaidh a’ chunnart bho mhùchadh didseatach, a tha mu choinneamh Gàidhlig na h-Alba agus mion-chànanan eile.

Bidh seo a’ tachairt nuair nach urrainn do luchd-labhairt cànain pàirt a ghabhail ann an conaltradh didseatach air sgàth dìth teicneòlais sa chànan.

Tha an sgioba rannsachaidh cuideachd a’ cuideachadh le bhith a’ cruthachadh siostam aithneachadh cainnte airson Ojibwe, fear de na cànanan dùthchasach ann an Canada. 

Tha cnapan-starra mòra mu choinneamh na Gàidhlig ann a bhith a’ cruthachadh agus a’ cumail suas a teicneòlas cànain air sgàth gainnead dàta.

Mar as trice tha prògraman AI air an trèanadh air seataichean dàta mòra, a bhios gu tric air an togail bhon eadar-lìn agus an uair sin air an gleusadh le fios air ais bho dhaoine.

Cruinnichidh luchd-rannsachaidh corpas mòr de dhàta Gàidhlig agus cleachdaidh iad e gus siostam aithneachaidh cainnt fèin-obrachail (automatic speech recognition - ASR) àrd-inbhe a chruthachadh airson meadhanan, foghlam agus rannsachadh.

Cruthaichidh am pròiseact teicneòlas fo-thiotalan Gàidhlig air a bheil cruaidh fheum agus bidh e na dheagh thoiseach tòiseachaidh airson a bhith a’ cruthachadh mhodailean Gàidhlig ùr-nodha.

Tha luchd-rannsachaidh den bheachd gun cuidich seo le bhith a’ dìon a’ chànain ann an raointean didseatach agus mar sin gun cuir e gu mòr ri oidhirpean ath-bheothachadh nàiseanta.

Tha modhan ionnsachaidh domhainn mar a theirear riutha feumach air tòrr mòr dàta trèanaidh, agus airson mion-chànanan mar a’ Ghàidhlig, tha gainnead dàta aig sgèile mhòr na chnap-starra mòr.

Tha am pròiseact ag amas air dèiligeadh ris a’ chnap-starra seo gu dìreach, le bhith a’ cruthachadh cruinneachadh susbainteach de dhàta trèanaidh Gàidhlig stèidhichte air daoine aig a bheil Gàidhlig bho thùs.

Tha an sgioba mar-thà air clach-mhìle chudromach a ruigsinn ann an 2021 le bhith a’ cruthachadh a’ chiad siostam aithneachaidh cainnt airson na Gàidhlig a tha ri fhaighinn gu poblach.

Ged a tha feum air obair leasachadh a bharrachd gus a dhèanamh nas pungaile, tha an teicneòlas air cuideachadh mar-thà le bhith a’ cruthachadh fo-thiotalan airson bhidiothan teagaisg.

Bidh grunn phròiseactan didseatach a tha ann mar-thà nam bun-stèidh airson na bunait eòlais a tha a dhìth gus cùisean a mheudachadh chun na h-ìre a thathar a’ moladh an seo.

Nam measg tha 15,000 duilleag de dh’aithris Ghàidhlig air tar-sgrìobhadh a fhuaras bho Thasglann Sgoil Eòlais na h-Alba, a tha stèidhichte ann an Oilthigh Dhùn Èideann.

Cleachdaidh an sgioba cuideachd susbaint bhon Dachaigh Airson Stòras na Gàidhlig (DASG). Bidh seo a’ gabhail a-steach mu 30 millean facal teacsa bho Chorpas na Gàidhlig aig Oilthigh Ghlaschu agus clàraidhean de dhaoine aig a bheil Gàidhlig bho thùs aig tasglann claistinn Cluas ri Claisneachd aig DASG.

Thuirt prìomh neach-rannsaiche a’ phròiseict, an t-Àrd-Ollamh Uilleam Lamb, bho Sgoil Litreachasan, Cànanan is Chultaran Oilthigh Dhùn Èideann: “Tha seo mu bhith a’ cur tòrr còmhla – a chaidh a chruinneachadh bho luchd-labhairt na Gàidhlig san àm a dh’fhalbh – agus ga thoirt air ais do luchd-labhairt na Gàidhlig, ann an diofar riochdan, san latha an-diugh.”

Thuirt an co-neach-rannsachaidh, an t-Àrd-oll Roibeard Ó Maolalaigh, bho Oilthigh Ghlaschu – a tha cuideachd na Stiùiriche air DASG: “Cuiridh seo gu mòr ri leasachadh teicneòlas cànain airson na Gàidhlig. Tha e na thoileachas dhuinn gu bheil goireasan DASG gan cleachdadh san dòigh seo agus gan leasachadh a bharrachd.”

Thuirt Jenny Gilruth (BPA), Rùnaire a’ Chaibineit airson Foghlam agus Sgilean: “Tha Riaghaltas na h-Alba ro thoilichte taic a chumail ris a’ phròiseact innealta seo, a chuidicheas a’ Ghàidhlig gus soirbheachadh san linn dhidseatach agus a ghlèidheas dualchas cànanach is cultarach ar dùthcha.”

Cuideachd a’ gabhail pàirt tha dithis luchd-rannsachaidh eile aig Oilthigh Dhùn Èideann – an Dr Beatrice Alex, àrd-òraidiche ann am mèinneadh teacsa, agus an Dr Peter Bell, leughadair ann an teicneòlas cainnt.

Tha am pròiseact ga dhèanamh an co-bhoinn ri BBC Alba. Tha e cuideachd a’ gabhail a-steach DASG, am faclair eachdraidheil Faclair na Gàidhlig, seirbheis nam meadhanan Gàidhlig MG ALBA agus Tobar an Dualchais/Kist o Riches – clàr air-loidhne gun samhail de bheul-eòlais beairteach na h-Alba.

Gus barrachd ionnsachadh mun phròiseact DASG a tha aithnichte le Acadamaidh Bhreatainn, rachaibh gu: https://dasg.ac.uk/gd 


AI initiative gives Gaelic a foothold in the digital age

Linguists and Artificial Intelligence specialists are embarking on an ambitious programme designed to help Scottish Gaelic flourish in the digital age.

Experts at the Universities of Edinburgh and Glasgow have been awarded £225,000 by the Scottish Government to produce a Gaelic subtitling system suitable for the BBC.

Funding will also enable the team to start working towards production of a large language model – similar to ChatGPT – for Scottish Gaelic speakers.

Efforts to create the system are part of a wider initiative to counter the threat of digital extinction faced by Scottish Gaelic and other minority languages.

The phenomenon occurs when speakers of a language are unable to participate in digital communication because of inadequate language technology.

The research team is also helping to develop a speech recognition system for Ojibwe, one of the indigenous languages of Canada. 

Gaelic faces significant obstacles in developing and maintaining its language technology because of a scarcity of data.

AI programs are typically trained on large data sets, often scraped from the internet, and then fine-tuned with human feedback.

Researchers will assemble a large body of Gaelic language data and use it to generate a high-quality automatic speech recognition (ASR) system for media, education and research.

The project will provide desperately needed Gaelic subtitling technology and jump-start the development of state-of-the-art Gaelic language models.

Researchers say this will help to safeguard the language in digital domains and contribute substantially to national revitalisation efforts.

So-called deep learning approaches are ravenous for training data, and for minority languages like Gaelic, lack of data at scale is a significant obstacle.

The project aims to tackle this obstacle head-on, by generating a substantial body of colloquial Gaelic training data.

The team already reached a significant milestone in 2021 by developing the first publicly available speech recognition system for Gaelic.

Although it requires additional development to improve its accuracy, the technology has already helped to create subtitles for teaching videos.

A number of existing digital projects will lay the foundation for the knowledge base needed to support the scaling-up effort proposed here.

Among them are 15,000 pages of transcribed Gaelic narrative sourced from the School of Scottish Studies Archives, based at the University of Edinburgh.

The team will also access material from the Digital Archive of Scottish Gaelic (DASG). This will include some 30 million words of text from the University of Glasgow’s Corpas na Gàidhlig and vernacular recordings from DASG’s Cluas ri Claisneachd audio archive.

Lead researcher Professor William Lamb, of the University of Edinburgh’s School of Literatures, Languages and Cultures, said: “This is about compiling large amounts of knowledge – gleaned from Gaelic speakers in the past – and returning it to Gaelic speakers, in various forms, in the present.”

Fellow researcher Professor Roibeard Ó Maolalaigh, of the University of Glasgow – who is also DASG Director – said: “This will add substantially to the development of language technology for Gaelic. It is gratifying that DASG’s resources are being deployed in this way and being further developed.”

Education Secretary Jenny Gilruth said: “The Scottish Government is proud to support this cutting-edge project, which will help Gaelic to thrive in the digital age and safeguard our country’s rich linguistic and cultural heritage.”

Also taking part are two other University of Edinburgh researchers – Dr Beatrice Alex, a senior lecturer in text mining, and Dr Peter Bell, a reader in speech technology.

The project is being carried out in tandem with BBC Alba. It also involves DASG, the historical dictionary Faclair na Gàidhlig, Gaelic media service MG ALBA and Tobar an Dualchais/Kist o Riches – a unique online record of Scotland’s rich oral heritage.

To learn more about the British Academy-recognised DASG project, go to: https://dasg.ac.uk/en
