Methodology
This page explains how Maqor prepares textual datasets and lexical information for interlinear study. The objective is reproducibility, transparency, and stable token-level analysis across corpora.
1) Source Ingestion
We import source text datasets and lexical resources from curated files under the project sources/ directory. Each corpus is registered with explicit identifiers so the application can query consistent structures by language, book, chapter, and verse.
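As an illustration of registering corpora under explicit identifiers, here is a minimal sketch in Python. The names VerseRef and CorpusRegistry, the corpus identifier "demo-hebrew", and the in-memory storage are assumptions made for this example; they do not describe Maqor's actual code or file layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VerseRef:
    """Hypothetical key identifying one verse in one registered corpus."""
    corpus_id: str
    language: str
    book: str
    chapter: int
    verse: int


class CorpusRegistry:
    """Illustrative in-memory registry keyed by explicit verse references."""

    def __init__(self) -> None:
        self._verses: Dict[VerseRef, List[str]] = {}

    def register(self, ref: VerseRef, tokens: List[str]) -> None:
        # Refuse silent overwrites so that repeated imports stay reproducible.
        if ref in self._verses:
            raise ValueError(f"duplicate verse registration: {ref}")
        self._verses[ref] = list(tokens)

    def lookup(self, ref: VerseRef) -> List[str]:
        # Raises KeyError when the verse was never registered.
        return self._verses[ref]


registry = CorpusRegistry()
gen_1_1 = VerseRef("demo-hebrew", "hebrew", "Gen", 1, 1)
registry.register(gen_1_1, ["בְּרֵאשִׁית", "בָּרָא"])
```

Because the key is a frozen dataclass, two references with the same fields compare equal and hash identically, so the same query always resolves to the same stored tokens.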
2) Canonical Structure
Each token is normalized into a unified schema that includes surface form, language, lexical key (Strong-style or analogous), morphology payload, IPA output, and translation mapping fields. This enables a shared UI model across Hebrew, Greek, and Syriac streams.
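The unified schema could be modeled roughly as follows. This is a sketch only: the field names are assumptions derived from the list above, not the project's actual schema. The sample values use the Greek noun λόγος with its Strong's number G3056; the IPA field is left empty because its value depends on the configured pronunciation model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Token:
    """Hypothetical unified token record shared by all language streams."""
    surface: str                           # surface form as it appears in the text
    language: str                          # e.g. "hebrew", "greek", "syriac"
    lexical_key: Optional[str] = None      # Strong-style or analogous key
    morphology_raw: Optional[str] = None   # tag exactly as given by the source
    ipa: Optional[str] = None              # filled in by the IPA stage
    translations: Dict[str, str] = field(default_factory=dict)


logos = Token(
    surface="λόγος",
    language="greek",
    lexical_key="G3056",
    morphology_raw="N-NSM",
    translations={"en": "word"},
)
```

Keeping optional fields as None rather than empty strings lets later stages distinguish "not yet computed" from "computed and empty".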
3) Morphology Normalization
Morphological tags from heterogeneous sources are preserved in raw form and also transformed into compact display codes. A parallel expanded representation is generated for explanatory popups so users can inspect part-of-speech, case, number, gender, tense, voice, mood, and related attributes when available.
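The three representations (raw, compact display, expanded) can be sketched as below. The tag layout assumed here is a simplified, hypothetical one in which a part-of-speech letter is followed by a hyphen and case, number, and gender letters (for example "N-NSM"). The lookup tables and the function name normalize_morphology are illustrative and do not reflect any specific source's tag set.

```python
from typing import Dict

# Hypothetical, deliberately small lookup tables for the assumed tag layout.
POS = {"N": "noun", "A": "adjective", "V": "verb"}
CASE = {"N": "nominative", "G": "genitive", "D": "dative",
        "A": "accusative", "V": "vocative"}
NUMBER = {"S": "singular", "P": "plural"}
GENDER = {"M": "masculine", "F": "feminine", "N": "neuter"}


def normalize_morphology(raw: str) -> Dict[str, object]:
    """Keep the raw tag, derive a compact code, and expand known attributes."""
    pos_code, _, rest = raw.partition("-")
    expanded = {"part_of_speech": POS.get(pos_code, "unknown")}
    # Only nominal tags with exactly three trailing letters are expanded here.
    if pos_code in ("N", "A") and len(rest) == 3:
        expanded["case"] = CASE.get(rest[0], "unknown")
        expanded["number"] = NUMBER.get(rest[1], "unknown")
        expanded["gender"] = GENDER.get(rest[2], "unknown")
    return {
        "raw": raw,                        # preserved untouched
        "display": raw.replace("-", ""),   # compact code for the token row
        "expanded": expanded,              # attributes for the popup
    }


result = normalize_morphology("N-NSM")
```

Attributes that the tag does not carry are simply absent from the expanded form, which matches the "when available" behavior described above.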
4) IPA Generation
IPA values are generated with language-specific rules. For Hebrew and Aramaic, stress placement takes Masoretic signals (such as accent marks) into account. For Greek, the configured pronunciation model is applied, with explicit handling of breathing marks, diphthongs, consonant clusters, and stress assignment.
5) Translation Mapping
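Interlinear translation rows are generated from prebuilt token mapping tables, one per language variant. Mapping logic prefers deterministic alignment from these tables. Where tokenization differs across traditions, it falls back to monotonic proportional placement: each translation token is assigned to the source token at the corresponding relative position, so word order is never crossed.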
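One way to realize the fallback is integer scaling of token positions. The sketch below is an assumption about how such placement might work, not the project's actual mapping code; the function names and the explicit parameter (standing in for a prebuilt mapping table entry) are hypothetical.

```python
from typing import List, Optional, Sequence


def proportional_alignment(source_len: int, target_len: int) -> List[int]:
    """Map each target index to a source index by relative position.

    The result is non-decreasing, so alignments never cross.
    """
    if source_len <= 0 or target_len <= 0:
        return []
    # For j < target_len, j * source_len // target_len is always < source_len.
    return [(j * source_len) // target_len for j in range(target_len)]


def align(source_len: int, target_len: int,
          explicit: Optional[Sequence[int]] = None) -> List[int]:
    """Prefer an explicit mapping when one exists, else place proportionally."""
    if explicit is not None:
        return list(explicit)
    return proportional_alignment(source_len, target_len)


stretched = align(3, 5)   # five translation tokens over three source tokens
squeezed = align(5, 3)    # three translation tokens over five source tokens
```

With these inputs, stretched is [0, 0, 1, 1, 2] and squeezed is [0, 1, 3]. Some source tokens receive several translation tokens or none, which is the expected cost of a purely positional fallback.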
6) Quality Controls
The application includes runtime checks for missing chapters, unknown books, malformed references, and absent lexical entries. Build and deployment flows also verify environment variables and corpus visibility flags to prevent accidental misconfiguration in production.
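A runtime reference check along these lines might look like the following sketch. The "Book chapter:verse" reference format, the book abbreviations, the returned messages, and the function name check_reference are assumptions for illustration; only the chapter counts for Genesis (50) and Exodus (40) are factual.

```python
import re
from typing import Optional

# Assumed reference shape: optional leading digit, book letters, "chapter:verse".
REFERENCE_PATTERN = re.compile(
    r"^(?P<book>[1-3]?[A-Za-z]+)\s+(?P<chapter>\d+):(?P<verse>\d+)$"
)

# Hypothetical excerpt of a book table; a real one would cover every book.
CHAPTER_COUNTS = {"Gen": 50, "Exod": 40}


def check_reference(ref: str) -> Optional[str]:
    """Return a problem description, or None when the reference looks valid."""
    match = REFERENCE_PATTERN.match(ref.strip())
    if match is None:
        return "malformed reference"
    book = match["book"]
    chapter = int(match["chapter"])
    if book not in CHAPTER_COUNTS:
        return "unknown book"
    if not 1 <= chapter <= CHAPTER_COUNTS[book]:
        return "missing chapter"
    return None


problems = {ref: check_reference(ref)
            for ref in ["Gen 1:1", "Gen 51:1", "Foo 1:1", "Gen one"]}
```

Returning a description instead of raising lets the caller decide whether a problem should block rendering or only be logged.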
7) Limitations
Interlinear alignment and IPA are computational aids, not substitutes for formal philological review. Users should consult critical editions, grammars, and specialized lexica for publication-grade research conclusions.
Last updated: 2026-05-11