Corpus analysis

The Maj68 corpus contains 874 texts published between 1964 and 1972 in the periodicals Tribuna, Problemi and Problemi. Literatura. Each text carries complete bibliographic data and is classified by text type, language type, and the degree of presence of non-standard Slovenian, foreign languages, modernism, and visual elements. Author metadata records each author's gender and year of birth. Visual elements are marked in the corpus; note that 39 texts consist solely of visual elements, i.e. they contain no text.

The corpus is available in several forms: as facsimiles (PDFs) stored in the SI-DIH repository, in the source TEI encoding, as plain text files accompanied by metadata files, as a linguistically annotated TEI corpus, and as derived vertical files. The TEI encoding follows the CLARIN.SI TEI customisation (https://github.com/clarinsi/TEI-schema).

The automatic linguistic annotation comprises lemmas, MULTEXT-East morphosyntactic descriptions, and Universal Dependencies morphological features and syntactic annotation.
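As an illustration of how such annotation might be read, the following minimal Python sketch extracts tokens from a small TEI fragment. The fragment itself is invented for the example, and the element and attribute names (`w` for words, `pc` for punctuation, `lemma`, `ana`) are assumptions based on common TEI practice; the actual Maj68 files may encode these layers differently, so check the CLARIN.SI schema before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sentence; not taken from the corpus.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="biti" ana="mte:Va-r3s-n">je</w>
  <w lemma="maj" ana="mte:Ncmsn">maj</w>
  <pc ana="mte:Z">.</pc>
</s>"""


def read_tokens(xml_text):
    """Return one dict per token (word or punctuation) in document order."""
    root = ET.fromstring(xml_text)
    tokens = []
    for element in root.iter():
        # Strip the "{namespace}" prefix that ElementTree puts on tag names.
        tag = element.tag.split("}")[-1]
        if tag in ("w", "pc"):
            tokens.append(
                {
                    "form": element.text or "",
                    "lemma": element.get("lemma"),  # None for punctuation here
                    "ana": element.get("ana"),      # assumed to hold the MSD
                    "is_word": tag == "w",
                }
            )
    return tokens


tokens = read_tokens(SAMPLE)
word_count = sum(1 for token in tokens if token["is_word"])
print(len(tokens), "tokens,", word_count, "words")
```

Counting `w` elements separately from all tokens mirrors the distinction between words and tokens in the size figures, under the assumption that punctuation accounts for the difference.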

Size: 874 texts, 646970 words, 794382 tokens
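A few derived figures follow directly from these counts. The short Python sketch below computes them; it assumes that the difference between tokens and words consists of non-word tokens such as punctuation, and that the 39 visual-only texts contribute no words, neither of which is stated explicitly in the description.

```python
# Figures taken from the corpus description.
TEXTS = 874
VISUAL_ONLY_TEXTS = 39
WORDS = 646_970
TOKENS = 794_382

# Assumption: tokens that are not words are punctuation or similar.
non_word_tokens = TOKENS - WORDS

# Assumption: visual-only texts contain no words at all.
texts_with_text = TEXTS - VISUAL_ONLY_TEXTS

words_per_text = WORDS / texts_with_text
tokens_per_word = TOKENS / WORDS

print(f"non-word tokens: {non_word_tokens}")
print(f"texts with text: {texts_with_text}")
print(f"mean words per text with text: {words_per_text:.1f}")
print(f"tokens per word: {tokens_per_word:.3f}")
```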

Published: 2021-05-31

Cite as: Juvan, Marko; et al., 2021, Corpus of 1968 Slovenian literature Maj68 1.0, Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1430.