
We have been working on corpus creation for many years. We have created monolingual, bilingual and multilingual corpora, which have become an essential tool for continually completing and updating dictionaries.
To create these corpora we use our own technology, the result of our research work in the field of language technologies.

Here is a sample of our main corpora:

  • Monolingual corpus (eu) Lexikoaren Behatokia. A Euskaltzaindia project developed in collaboration with UZEI and the IXA research group of the UPV/EHU. At the end of 2020 the corpus contained 77,958,327 textual forms, and it will continue to grow every year.
  • Parallel corpus (en-es/eu) EHUskaratuak, created for the UPV/EHU. A multilingual corpus of 18,048,431 textual forms.
  • Science and Technology Corpus (ZTC), created by Elhuyar in collaboration with the IXA research group of the UPV/EHU. An annotated corpus of 8.5 million textual forms.
  • Multilingual parallel corpus Consumer, created for the Eroski Foundation.

If you would like to receive further information, please fill in the form.

Data processor: Elhuyar

Purpose: Managing any requests and queries made.

Legal grounds: Consent obtained, legitimate interest and regulatory compliance.

Recipients: Third-party service providers (including international transfers).

Rights: The rights to access, rectify and erase the data, and to restrict and object to its processing, including the right to object to being subject to automated individual decisions.

Source: User.

Read the privacy policy