Methodology

Maqor separates the original source token from generated or supporting data. This matters because morphology, pronunciation, lexical references, and translation alignment are not the same kind of evidence.

Original token first

The word as it appears in the source text is kept as the primary display. Hebrew and Syriac are rendered right-to-left; Greek remains left-to-right. Prefixes attached to Hebrew words remain attached in the displayed token because that is how the word appears in the text.

Morphology normalization

Different datasets encode morphology differently. Maqor normalizes compact morphology codes so that the verse display stays readable while the popup can expand the code into a human-readable breakdown. A compact code in the word box is not meant to be self-explanatory; it is a space-saving reference that can be expanded in the interface.
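As a rough illustration of the expansion step, the sketch below turns a compact code into a readable breakdown. The code scheme (one character per feature) and the name expand_morphology are invented for this example and do not correspond to any particular dataset or to Maqor's actual codes.

```python
# Hypothetical compact scheme, invented for illustration only:
# each character stands for one morphological feature.
LABELS = {
    "V": "verb",
    "N": "noun",
    "1": "first person",
    "2": "second person",
    "3": "third person",
    "m": "masculine",
    "f": "feminine",
    "c": "common",
    "s": "singular",
    "p": "plural",
}


def expand_morphology(code: str) -> str:
    """Expand a compact code such as 'V3ms' into a human-readable description."""
    # Unknown characters are reported rather than silently dropped, so a bad
    # mapping is visible in the popup instead of hidden.
    return ", ".join(LABELS.get(ch, f"unrecognized '{ch}'") for ch in code)
```

Under these assumptions, the word card would show the short code, and the popup would call the expansion only when the user asks for detail.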

Lemmas and lexical references

Lemmas are displayed in the word popup and are used to connect the surface form to a dictionary headword or lexical identifier. Strong's numbers are used where available. For datasets without Strong's, Maqor can use analogous identifiers or source-specific lexical data, but those should be documented per corpus.

IPA and pronunciation aids

IPA output is generated from language-specific rules and source markings where available. For Hebrew, stress marks and Masoretic signs affect the position of stress in the output. For Greek, the project currently follows an Erasmian-style rule set with documented choices for letters such as phi, theta, chi, zeta, omicron, omega, and diphthongs. IPA is a study aid, not a claim that one exact historical pronunciation is certain in every case.

Translation support

The short translation line in a word block is a support gloss aligned to the word or phrase. It is not a full translation of the verse. Full verse translations are shown separately where configured. Alignment across languages is especially difficult when a translation reorders words or compresses multiple original tokens into one expression.

Tokenization and prefixes

Source texts do not always divide meaning into words in the same way that English does. Hebrew can attach conjunctions, prepositions, articles, and pronominal elements to a written form. Maqor keeps those visible as part of the surface token, because separating them visually can make the displayed word look unlike the text a reader would see in the source. When possible, prefix information is expanded in the word popup so the user can see both the attached element and the base lemma.
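One way to model this is sketched below, under the assumption that a token stores its surface form unchanged and carries its prefix analysis alongside it. The class names Segment and Token and the function popup_breakdown are hypothetical, not Maqor's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    text: str                    # written piece exactly as it sits inside the surface form
    role: str                    # e.g. "preposition", "conjunction", "base"
    lemma: Optional[str] = None  # dictionary headword, where one applies


@dataclass(frozen=True)
class Token:
    surface: str                 # displayed as-is in the word card
    segments: Tuple[Segment, ...]

    def __post_init__(self) -> None:
        # The segments are an analysis of the surface form, never a replacement
        # for it, so they must join back into exactly the same string.
        if "".join(s.text for s in self.segments) != self.surface:
            raise ValueError("segments do not reassemble into the surface form")


def popup_breakdown(token: Token) -> List[str]:
    """Describe each attached element and the base lemma for the popup."""
    lines = []
    for seg in token.segments:
        detail = seg.role if seg.lemma is None else f"{seg.role}, lemma {seg.lemma}"
        lines.append(f"{seg.text} ({detail})")
    return lines


# Unpointed first word of Genesis 1:1: an attached preposition plus a noun.
bereshit = Token(
    surface="בראשית",
    segments=(
        Segment("ב", "preposition"),
        Segment("ראשית", "base", lemma="ראשית"),
    ),
)
```

The word card would render only bereshit.surface, while popup_breakdown(bereshit) supplies the two-line analysis on demand.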

Greek and Syriac present different tokenization questions. Greek inflection often carries information that English expresses with word order or helper words. Syriac datasets may encode lexical or morphological details in transliteration schemes that require careful normalization before display. Maqor treats those issues as source-specific rather than forcing every language into a Hebrew-shaped display.

Why normalization is documented

Normalization is useful because it gives the interface consistent behavior across corpora. It is also risky if it hides the original source encoding. For that reason, Maqor keeps raw morphology available in the popup where possible. The compact code is a display convenience; the expanded description and raw source value remain important for verification.

Translation alignment limits

Aligning a translation to individual source words is especially difficult. A source word can be omitted in a natural translation, supplied by context, represented by a phrase, or split across multiple translated words. Maqor's gloss line should therefore be treated as a reading aid. It helps the user follow the interlinear flow, but it should not be cited as a standalone translation. The full verse translation, source syntax, and context must still be considered.
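These cases can be represented with alignment links that join groups of source positions to groups of translated positions. The sketch below is one possible shape, not Maqor's stored format; AlignmentLink and gloss_for are hypothetical names, and the example uses the opening words of John 1:1 with a familiar English rendering.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AlignmentLink:
    source_indices: Tuple[int, ...]  # positions of source tokens in the verse
    target_indices: Tuple[int, ...]  # positions of translated words


def gloss_for(source_index: int, links: List[AlignmentLink], target_words: List[str]) -> str:
    """Return the support gloss for one source token, or '' if it has no link."""
    for link in links:
        if source_index in link.source_indices:
            return " ".join(target_words[i] for i in link.target_indices)
    return ""


source_tokens = ["Ἐν", "ἀρχῇ", "ἦν", "ὁ", "λόγος"]
target_words = ["In", "the", "beginning", "was", "the", "Word"]
links = [
    AlignmentLink((0,), (0,)),
    AlignmentLink((1,), (1, 2)),  # one source word represented by a phrase
    AlignmentLink((2,), (3,)),
    AlignmentLink((3,), (4,)),
    AlignmentLink((4,), (5,)),
]
```

Here gloss_for(1, links, target_words) yields the two-word phrase for a single Greek token, and a token with no link yields an empty gloss, which is the honest display for a word the translation leaves unexpressed.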

Review and correction workflow

Because Maqor includes generated data, corrections are expected over time. A correction may involve a morphology mapping, a lemma display, a Syriac transliteration rule, a Greek pronunciation case, a Hebrew stress rule, or a translation alignment. Useful correction reports should identify the corpus, book, chapter, verse, surface word, expected value, and reason for the correction. That keeps the project auditable rather than anecdotal.
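The fields listed above map naturally onto a small record with a completeness check. This is only a sketch: the class name CorrectionReport, the function missing_fields, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class CorrectionReport:
    corpus: str
    book: str
    chapter: int
    verse: int
    surface_word: str
    expected_value: str
    reason: str


def missing_fields(report: CorrectionReport) -> List[str]:
    """List the fields that are empty, so an incomplete report can be returned to its author."""
    missing = []
    for f in fields(report):
        value = getattr(report, f.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(f.name)
    return missing


# Hypothetical report with the reason left blank.
incomplete = CorrectionReport(
    corpus="example-corpus",
    book="Genesis",
    chapter=1,
    verse=1,
    surface_word="בראשית",
    expected_value="preposition + noun",
    reason="",
)
```

A report that names every field can be checked against the data directly, which is what makes a correction auditable.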

Directionality and layout

Layout is part of methodology. Hebrew and Syriac should not be forced into a left-to-right reading pattern simply because the interface is built with web technologies. Maqor keeps Semitic text direction right-to-left where appropriate and keeps Greek left-to-right. This affects not only the full verse line but also the order in which word blocks should be read. A correct interlinear layout must respect the reading direction of the source language.
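A minimal sketch of the direction rule follows. In a browser the usual mechanism is to set the dir attribute on the container and let the layout engine order the blocks; the helper visual_order is included only to make the reading order explicit. Both function names and the language keys are assumptions.

```python
from typing import List

# Languages named in this methodology; anything else is rejected rather than guessed.
DIRECTIONS = {"hebrew": "rtl", "syriac": "rtl", "greek": "ltr"}


def direction_for(language: str) -> str:
    """Return 'rtl' or 'ltr' for a supported source language."""
    try:
        return DIRECTIONS[language.lower()]
    except KeyError:
        raise ValueError(f"no direction configured for language: {language!r}")


def visual_order(blocks: List[str], language: str) -> List[str]:
    """Return word blocks in left-to-right screen order.

    Blocks are stored in reading order, so for right-to-left languages the
    first word must end up at the right edge of the line.
    """
    if direction_for(language) == "rtl":
        return list(reversed(blocks))
    return list(blocks)
```

Failing on an unconfigured language is a deliberate choice in this sketch: a silent left-to-right default is exactly the mistake the paragraph above warns against.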

Why the app uses a compact display

The compact word card is a compromise between density and readability. A user should be able to read the chapter or verse without every word becoming a paragraph. For that reason, morphology is abbreviated in the card, while the popup expands it. Strong's or analogous lexical identifiers are visible quickly, while definitions and broader lexical notes are shown on demand. This design is especially important on mobile screens where space is limited.

Language-specific pronunciation rules

Pronunciation rules are not shared blindly across languages. Hebrew stress and vowel signs require one set of decisions. Greek pronunciation uses a documented Erasmian-style model with explicit choices for letters and diphthongs. Syriac may require separate Eastern and Western pronunciation variants. Maqor should keep those rule sets separate because a single generic transliteration system would erase meaningful differences between the languages.
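Keeping rule sets separate can be as simple as a registry keyed by language and variant, with no generic fallback. The sketch below assumes that structure; the names RULE_SETS, select_rules, and apply_rules are hypothetical, the three Greek entries are placeholder values rather than the project's documented choices, and the Syriac tables are intentionally left empty.

```python
from typing import Dict, Optional, Tuple

RuleKey = Tuple[str, Optional[str]]

# Placeholder tables for illustration only; real rule sets would be far larger
# and would handle context, diphthongs, and stress.
RULE_SETS: Dict[RuleKey, Dict[str, str]] = {
    ("greek", None): {"φ": "f", "θ": "θ", "χ": "x"},
    ("syriac", "eastern"): {},  # rules omitted in this sketch
    ("syriac", "western"): {},  # rules omitted in this sketch
}


def select_rules(language: str, variant: Optional[str] = None) -> Dict[str, str]:
    """Look up the rule set for one language and variant, failing loudly if absent."""
    key = (language.lower(), variant)
    if key not in RULE_SETS:
        raise LookupError(f"no pronunciation rules registered for {key}")
    return RULE_SETS[key]


def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Replace each character that has a rule; leave all other characters unchanged."""
    return "".join(rules.get(ch, ch) for ch in text)
```

Because Syriac is registered only with an explicit variant, a call such as select_rules("syriac") raises instead of quietly picking one tradition.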

How generated data is treated

Generated data is useful, but it must be identified as generated. IPA, compact morphology codes, normalized lemma displays, and translation alignments can each be produced by project logic rather than copied from a source. If a generated field is wrong, the correction may not require changing the source text. It may require changing the transformation rule. This distinction helps keep source data stable while improving the interface over time.
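The distinction can be made concrete by tagging every value with its origin and, for generated values, the rule that produced it. The sketch below assumes that design; FieldValue, RULES, generate, the rule name compact_morph, and the raw code format are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FieldValue:
    value: str
    origin: str                 # "source" or "generated"
    rule: Optional[str] = None  # name of the transformation rule, for generated values


# Hypothetical rule with a bug: it uppercases the whole code.
RULES: Dict[str, Callable[[str], str]] = {
    "compact_morph": lambda raw: raw.replace("-", "").upper(),
}


def generate(raw: FieldValue, rule_name: str) -> FieldValue:
    """Derive a generated value from a source value, recording which rule was used."""
    if raw.origin != "source":
        raise ValueError("generated values must be derived from source values")
    return FieldValue(RULES[rule_name](raw.value), "generated", rule_name)


raw_morph = FieldValue("V-3-m-s", "source")
before_fix = generate(raw_morph, "compact_morph")

# The correction replaces the rule; the source value is never edited.
RULES["compact_morph"] = lambda raw: raw.replace("-", "")
after_fix = generate(raw_morph, "compact_morph")
```

After the fix, every value generated by that rule can be recomputed, and raw_morph is byte-for-byte what the dataset supplied.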

How this supports future interpretation discussion

The planned forum-style discussion layer depends on this methodology. If users are going to debate interpretations, they need stable references to the underlying data. A discussion about a verb form should be able to point to the surface token, morphology, lemma, source, and translation support. Without that structure, the conversation becomes detached from the text. Maqor's methodology is therefore not only technical; it prepares the ground for accountable discussion.
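A stable reference could take the form of a short string that identifies one token and survives being pasted into a discussion post. The format below (corpus:book.chapter.verse#position) and the class name TokenRef are assumptions for illustration, not an existing Maqor convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRef:
    corpus: str
    book: str
    chapter: int
    verse: int
    position: int  # 1-based index of the token within the verse

    def to_string(self) -> str:
        """Serialize to the hypothetical form 'corpus:book.chapter.verse#position'."""
        return f"{self.corpus}:{self.book}.{self.chapter}.{self.verse}#{self.position}"

    @classmethod
    def parse(cls, text: str) -> "TokenRef":
        """Parse a string produced by to_string back into a TokenRef."""
        corpus, rest = text.split(":", 1)
        locator, position = rest.split("#", 1)
        # Split from the right so book names containing dots are still handled.
        book, chapter, verse = locator.rsplit(".", 2)
        return cls(corpus, book, int(chapter), int(verse), int(position))


ref = TokenRef("example-corpus", "John", 1, 1, 5)
```

A post that cites ref.to_string() can be resolved back to the same token, so a discussion about morphology or lemma is anchored to the data rather than to a quotation.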