Methodology

This page explains how Maqor prepares textual datasets and lexical information for interlinear study. The objectives are reproducibility, transparency, and stable token-level analysis across corpora.

1) Source Ingestion

We import source text datasets and lexical resources from curated files under the project sources/ directory. Each corpus is registered with explicit identifiers so the application can query consistent structures by language, book, chapter, and verse.
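As a rough illustration of registering corpora under explicit identifiers, the sketch below keeps an in-memory registry keyed by corpus, book, chapter, and verse. The names VerseRef and CorpusRegistry, and the identifier "wlc", are hypothetical and not taken from the Maqor codebase.

```python
from typing import NamedTuple


class VerseRef(NamedTuple):
    """Hypothetical lookup key: corpus identifier plus book/chapter/verse."""
    corpus_id: str
    book: str
    chapter: int
    verse: int


class CorpusRegistry:
    """Minimal in-memory registry (an assumption, not Maqor's storage layer)."""

    def __init__(self):
        self._languages = {}  # corpus_id -> language code
        self._verses = {}     # VerseRef -> list of surface tokens

    def register(self, corpus_id, language):
        # Explicit identifiers must be unique so lookups stay unambiguous.
        if corpus_id in self._languages:
            raise ValueError(f"corpus already registered: {corpus_id}")
        self._languages[corpus_id] = language

    def add_verse(self, ref, tokens):
        # Refuse data for corpora that were never registered.
        if ref.corpus_id not in self._languages:
            raise KeyError(f"unregistered corpus: {ref.corpus_id}")
        self._verses[ref] = list(tokens)

    def language_of(self, corpus_id):
        return self._languages[corpus_id]

    def lookup(self, ref):
        # An unknown reference yields an empty list rather than an exception.
        return self._verses.get(ref, [])


registry = CorpusRegistry()
registry.register("wlc", "heb")  # "wlc" is an illustrative identifier
registry.add_verse(VerseRef("wlc", "Gen", 1, 1), ["token-a", "token-b"])
```

Keying every lookup on the same tuple shape is one way to give the application a consistent query structure regardless of which source file a verse came from.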

2) Canonical Structure

Each token is normalized into a unified schema that includes surface form, language, lexical key (Strong-style or analogous), morphology payload, IPA output, and translation mapping fields. This enables a shared UI model across Hebrew, Greek, and Syriac streams.
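The unified schema could be modelled along the following lines. This is a minimal sketch: the class name Token, the field names, and the sample values are assumptions chosen to mirror the fields listed above, not the project's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical unified token record shared by all language streams."""
    surface: str                       # surface form as it appears in the source
    language: str                      # e.g. "heb", "grc", "syr" (illustrative codes)
    lexical_key: Optional[str] = None  # Strong-style or analogous key, if known
    morphology: dict = field(default_factory=dict)   # raw tag, display code, expansion
    ipa: Optional[str] = None                        # generated IPA, if available
    translation: dict = field(default_factory=dict)  # translation mapping fields


# Sample values are placeholders, not data from any corpus.
greek = Token(surface="\u03bb\u03cc\u03b3\u03bf\u03c2", language="grc",
              lexical_key="G3056", morphology={"raw": "N-NSM"})
syriac = Token(surface="example", language="syr")
```

Because every stream produces the same record shape, UI code can read token.surface or token.morphology without branching on language.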

3) Morphology Normalization

Morphological tags from heterogeneous sources are preserved in raw form and also transformed into compact display codes. A parallel expanded representation is generated for explanatory popups so users can inspect part-of-speech, case, number, gender, tense, voice, mood, and related attributes when available.
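To make the raw, compact, and expanded representations concrete, the sketch below handles one tag shape, loosely modelled on Robinson-style Greek verb codes such as V-AAI-3S. The lookup tables are an illustrative subset and the function name normalize_verb_tag is hypothetical; the project's real tag sets and display codes may differ.

```python
# Illustrative lookup tables (a subset, assumed for this example only).
TENSE = {"P": "present", "I": "imperfect", "F": "future",
         "A": "aorist", "R": "perfect", "L": "pluperfect"}
VOICE = {"A": "active", "M": "middle", "P": "passive"}
MOOD = {"I": "indicative", "S": "subjunctive", "O": "optative", "M": "imperative"}
NUMBER = {"S": "singular", "P": "plural"}


def normalize_verb_tag(raw):
    """Return the raw tag, a compact display code, and an expanded mapping."""
    parts = raw.split("-")
    shape_ok = (len(parts) == 3 and parts[0] == "V"
                and len(parts[1]) == 3 and len(parts[2]) == 2)
    if not shape_ok:
        # Unrecognized shapes keep the raw tag and get no expansion.
        return {"raw": raw, "code": raw, "expanded": {}}

    tense, voice, mood = parts[1]
    person, number = parts[2]
    expanded = {"part_of_speech": "verb"}
    for label, table, key in (("tense", TENSE, tense), ("voice", VOICE, voice),
                              ("mood", MOOD, mood), ("number", NUMBER, number)):
        if key in table:  # attributes appear only when they can be decoded
            expanded[label] = table[key]
    if person in ("1", "2", "3"):
        expanded["person"] = person
    return {"raw": raw, "code": raw.replace("-", ""), "expanded": expanded}


result = normalize_verb_tag("V-AAI-3S")
```

Keeping the raw tag alongside the derived forms means a decoding mistake never destroys the source annotation.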

4) IPA Generation

IPA values are generated with language-specific rules. For Hebrew and Aramaic, stress placement takes Masoretic signals into account. For Greek, the configured pronunciation model is applied with explicit handling for breathing marks, diphthongs, consonant clusters, and stress assignment.
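One small piece of the Greek handling, detecting breathing marks, can be sketched with Unicode decomposition. This is only an assumption about how such a step might look: the function name breathing is hypothetical, and whether a rough breathing is realized as a sound depends on the configured pronunciation model.

```python
import unicodedata

ROUGH = "\u0314"   # COMBINING REVERSED COMMA ABOVE (dasia, rough breathing)
SMOOTH = "\u0313"  # COMBINING COMMA ABOVE (psili, smooth breathing)


def breathing(word):
    """Classify the breathing mark on a polytonic Greek word, if any."""
    # NFD splits precomposed letters into a base letter plus combining marks,
    # so the breathing can be found regardless of accompanying accents.
    decomposed = unicodedata.normalize("NFD", word)
    if ROUGH in decomposed:
        return "rough"
    if SMOOTH in decomposed:
        return "smooth"
    return None


article = "\u1f41"                         # omicron with rough breathing
preposition = "\u1f10\u03bd"               # epsilon with smooth breathing + nu
noun = "\u03bb\u03cc\u03b3\u03bf\u03c2"    # consonant-initial word, no breathing
```

A later rule could then, for example, prepend an aspirate when breathing(word) returns "rough" under a model that pronounces it.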

5) Translation Mapping

Interlinear translation rows are generated from prebuilt token mapping tables, one per language variant. Mapping logic prioritizes deterministic alignment and falls back to monotonic proportional placement where tokenization differs across traditions.
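The fallback can be illustrated as follows. In this sketch, each target token j out of m is placed on source index floor(j * n / m), which never moves backwards as j increases. The function names and the shape of the mapping argument are assumptions for illustration, not the project's actual interfaces.

```python
def proportional_alignment(n_source, n_target):
    """Spread n_target tokens over n_source positions in non-decreasing order."""
    if n_source <= 0 or n_target <= 0:
        return []
    # Integer arithmetic keeps the result deterministic across platforms.
    return [min(n_source - 1, (j * n_source) // n_target)
            for j in range(n_target)]


def align_row(n_source, n_target, mapping):
    """Prefer a deterministic mapping; otherwise fall back to proportional.

    `mapping` is a hypothetical prebuilt table row: target position -> source index.
    """
    if all(j in mapping for j in range(n_target)):
        return [mapping[j] for j in range(n_target)]
    return proportional_alignment(n_source, n_target)


# Six translation tokens over three source tokens.
print(proportional_alignment(3, 6))       # [0, 0, 1, 1, 2, 2]
# A complete table row wins over the fallback.
print(align_row(2, 2, {0: 1, 1: 0}))      # [1, 0]
# An incomplete row triggers the fallback for the whole row.
print(align_row(4, 2, {0: 3}))            # [0, 2]
```

Falling back for the whole row, rather than token by token, is a design choice made here to keep the output monotonic; a production implementation might mix the two strategies.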

6) Quality Controls

The application includes runtime checks for missing chapters, unknown books, malformed references, and absent lexical entries. Build and deployment flows also verify environment variables and corpus visibility flags to prevent accidental misconfiguration in production.
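A reference check of the kind described might look like the sketch below. The reference format ("Book chapter:verse"), the book abbreviations, and the function name check_reference are assumptions; the chapter table is a two-entry illustrative subset.

```python
import re

# Illustrative subset: book abbreviation -> number of chapters.
CHAPTER_COUNTS = {"Gen": 50, "Jonah": 4}

# Assumed reference shape: "<Book> <chapter>:<verse>".
REFERENCE = re.compile(r"^(?P<book>[A-Za-z]+)\s+(?P<chapter>\d+):(?P<verse>\d+)$")


def check_reference(ref):
    """Return None when the reference is usable, else a short error label."""
    match = REFERENCE.match(ref.strip())
    if match is None:
        return "malformed reference"
    book = match["book"]
    if book not in CHAPTER_COUNTS:
        return "unknown book"
    if not 1 <= int(match["chapter"]) <= CHAPTER_COUNTS[book]:
        return "missing chapter"
    if int(match["verse"]) < 1:
        return "malformed reference"
    return None


for candidate in ("Gen 1:1", "Gen 51:1", "Foo 1:1", "Gen 1"):
    print(candidate, "->", check_reference(candidate))
```

Returning a label instead of raising lets the caller decide whether to show a message, log the problem, or redirect to a valid reference.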

7) Limitations

Interlinear alignment and IPA are computational aids, not substitutes for formal philological review. Users should consult critical editions, grammars, and specialized lexica for publication-grade research conclusions.

Last updated: 2026-05-11