We have been working on corpus creation for many years. We have created monolingual, bilingual and multilingual corpora, which have become an essential tool for continually completing and updating dictionaries.
To create the corpora, we use our own technology, the result of our research in the field of language technologies.
Here is a sample of our main corpora:
- Monolingual corpus (eu) Lexikoaren Behatokia. It is a Euskaltzaindia project, developed in collaboration with UZEI and the IXA research group of the UPV/EHU. At the end of 2020, the corpus contained 77,958,327 textual forms, and it will continue to grow each year.
- Parallel corpus (en-es/eu) EHUskaratuak, created for the UPV/EHU. It is a multilingual corpus composed of 18,048,431 textual forms.
- Corpus of science and technology (ZTC), created by Elhuyar in collaboration with the IXA research group of the UPV/EHU. It is a labeled corpus composed of 8.5 million textual forms.
- Multilingual parallel corpus Consumer, made for the Eroski Foundation: