As noted in an article published yesterday, Iceland has partnered with OpenAI (the company that developed ChatGPT) to use its next-generation version of GPT in an effort to preserve the Icelandic language – and to turn “a defensive position into an opportunity to innovate.” The Minister of Culture and Business Affairs told RÚV yesterday that the partnership had proved that small nations could, if they had “done their homework,” use AI and language technology to aid in the preservation of their languages.
Turning a defensive position into an opportunity
Yesterday, OpenAI released GPT-4, a multimodal large language model and the fourth in the company’s GPT series, which powers ChatGPT, an AI chatbot launched last November. GPT-4 will initially be available to ChatGPT Plus subscribers (who pay $20 per month for premium access to the service), and it already powers Microsoft’s Bing search engine.
As noted in an article published on OpenAI’s website yesterday, the Icelandic government is among those putting GPT-4 to use, employing this next-generation version of GPT to help preserve the Icelandic language. GPT-4 is significantly better at responding in Icelandic, an improvement that is partly the result of a collaboration inspired by an Icelandic delegation’s visit to OpenAI’s headquarters in May of last year. The delegation, consisting of Iceland’s President and government ministers, met with OpenAI’s co-founder and CEO, Sam Altman.
“Iceland […] partnered with OpenAI to use GPT-4 in the preservation effort of the Icelandic language – and to turn a defensive position into an opportunity to innovate. The partnership was envisioned not only as a way to boost GPT-4’s ability to service a new corner of the world, but also as a step towards creating resources that could serve to promote the preservation of other low-resource languages.”
The article also quotes Jóhanna Vigdís Guðmundsdóttir, CEO of Almannarómur (a non-profit language technology center): “We want to make sure that artificial intelligence will be used not only to help preserve language, culture, and history, but also to underpin economic prosperity.”
Better, but still flawed
As noted by the New York Times, GPT-4 is markedly more accurate than its predecessor (GPT-3.5): it can describe and comment on images, summarise complicated texts, and pass a bar exam and several standardised tests. However, it still shows a tendency to hallucinate answers.
Likewise, GPT-4, while much better at Icelandic than GPT-3.5, still produces Icelandic with “grammatical errors, ‘translationese,’ and incorrect cultural knowledge.” To make further improvements, Vilhjálmur Þorsteinsson, CEO of Miðeind (a privately owned software company based in Reykjavík that specialises in language technology), assembled a team of 40 volunteers to train GPT-4 on proper Icelandic grammar and cultural knowledge.
In a process called Reinforcement Learning from Human Feedback (RLHF), human testers give GPT-4 a prompt, and four possible completions are generated. After reviewing the four responses, testers choose the most suitable one and refine it into an optimal completion. This feedback is then used to improve GPT-4, so that it generates better responses in the future.
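For readers who want to picture the feedback step concretely, the following is a minimal Python sketch of how a single tester judgement might be recorded. It is not OpenAI’s or Miðeind’s actual tooling: the `FeedbackRecord` class, its field names, and the placeholder strings are hypothetical, and turning one choice into "chosen versus rejected" pairs is a common RLHF convention assumed here, not something the article confirms.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FeedbackRecord:
    """One tester judgement: a prompt, four completions, a pick, an optional edit.

    Hypothetical structure for illustration only.
    """
    prompt: str
    completions: List[str]          # the four candidate completions
    chosen_index: int               # index of the completion the tester preferred
    refined: Optional[str] = None   # the tester's improved version, if any

    def __post_init__(self) -> None:
        # The process described in the article generates four completions.
        if len(self.completions) != 4:
            raise ValueError("expected exactly four completions")
        if not 0 <= self.chosen_index < len(self.completions):
            raise ValueError("chosen_index is out of range")

    def best_completion(self) -> str:
        """Return the refined text if the tester edited it, else the raw pick."""
        if self.refined is not None:
            return self.refined
        return self.completions[self.chosen_index]

    def preference_pairs(self) -> List[Tuple[str, str]]:
        """Pair the chosen completion with each rejected one (assumed convention)."""
        chosen = self.completions[self.chosen_index]
        return [
            (chosen, other)
            for i, other in enumerate(self.completions)
            if i != self.chosen_index
        ]


# Placeholder data; a real record would hold Icelandic prompts and completions.
record = FeedbackRecord(
    prompt="placeholder prompt",
    completions=["completion A", "completion B", "completion C", "completion D"],
    chosen_index=2,
    refined="completion C, corrected by the tester",
)

print(record.best_completion())
print(len(record.preference_pairs()))
```

Each record yields one refined target text and three preference pairs, which is one way a small number of human judgements could be stretched into more training signal.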
As noted by OpenAI, RLHF produces results with just 100 examples, which makes it “more feasible for other low-resource languages, with less digital language data available, to replicate the process.” Prior to RLHF, fine-tuning a model was labour- and data-intensive. Þorsteinsson’s team had initially attempted to fine-tune a GPT-3 model with 300,000 Icelandic language examples, but the results were “disappointing.”
“Now we can just jump directly to the general capabilities of the large models,” Þorsteinsson is quoted as saying on OpenAI’s website, “and enable things with our language that used to require a lot of manual labour, data preparation, and resource collection for each use case.”
With only a single round of RLHF complete, the model still has room for improvement. The Icelandic team’s ongoing work is to continue training GPT-4 with enough examples that the model can power “the most complex and creative applications in Icelandic, rather than defaulting to English.”
The aim is to allow everyone in the country to interact with OpenAI’s models in their own language, which would, for example, save Icelandic companies from relying on English-speaking chatbots on their websites.
A “very happy day”
In an interview with RÚV yesterday, Minister of Culture and Business Affairs Lilja Alfreðsdóttir stated that she was very happy with the government’s partnership with OpenAI:
“This is what we’ve been aiming for over the last five years. The government has invested over ISK 2 billion ($14 million / €13 million) in creating this basic language infrastructure so that we can get to this point,” Lilja stated, adding that over sixty experts had been working on the project for the last four to five years.
“This was always the goal: that we could introduce our efforts to companies using artificial intelligence and language technology. We met with OpenAI and this was the result: that we’re the first language besides English that they plan to introduce. So we’re incredibly happy.”
Lilja explained that during their first meeting with OpenAI, it was clear that the company was interested in introducing a language that was not as widely spoken as English: “To show that the world is not just English. We somehow managed to talk about it in cultural, historical, and literary terms.”
Lilja added that this partnership was of great significance in an international context.
“We’re proving that small nations, if they’ve done their homework, can use AI and language technology to preserve their languages. And what our collaborators thought was so amazing was seeing all the work that we had already done – creating this infrastructure so that this technology may be harnessed.”