batu

{{wikipedia|Batu}}
{{wikipedia|Batu (ukuran)}}

== Bahasa Melayu ==
[[Imej:Plagioclase_porphyry.jpg|thumb|Batu.]]

===Etimologi===
Turunan {{inh|ms|poz-mly-pro|*batu}} lanjut dari {{inh|ms|poz-pro|*batu}} diakar {{inh|ms|map-pro|*batu}}. Sewarisan {{cog|tl|bato}}.

===Takrifan===
{{ms-kn|j=باتو}}
# [[bahan]] [[galian]] [[keras]] yang berasal dari Bumi tetapi bukan [[logam]] dan terdiri daripada pelbagai jenis.
#: {{cp|ms|Mereka sedang mengutip '''batu''' di sungai.}}
# sejenis logam kecil yang dipakai dalam alat pemantik [[api]] (untuk mencetuskan api).
# [[permata]].
# [[ukuran]] [[jarak]] sejauh 1,760 [[ela]] (1.61 [[kilometer]]).
#: {{cp|ms|Berapa '''batu''' jauhnya dari Simpang Tiga ke Simpang Empat?}}
# [[penjodoh bilangan]] bagi gigi.

====Kata majmuk====
{{col3|ms|batu api|batu asah|batu bata|batu-batan|batu berani|batu bersurat|batu dacing|batu giling|batu igneus|batu kapur|batu karang|batu kelikir|batu kepala|batu kisar|batu lada|batu loncatan|batu marmar|batu permata|batu telerang|batu timbang}}

===Sebutan===
* {{dewan|ba|tu}}
* {{IPA|ms|/batu/}}
* {{rhymes|ms|atu|tu|u}}
* {{audio|ms|Ms-MY-batu.ogg|Audio (MY)}}

===Kata terbitan===
* {{l|ms|berbatu}}
* {{l|ms|berbatu-batu}}
* {{l|ms|membatu}}

===Terjemahan===
{{ter-atas|benda keras semula jadi}}
* Albania: {{t+|sq|gur|m}}
* Arab: {{ARchar|صخر}}
* Armenia: քար (k‘ar)
* Asturia: piedra {{f}}
* Basque: harri
* Belanda: {{t+|nl|steen|m}}
* Breton: maen, mein {{p}}
* Bulgaria: {{t+|bg|камък|m|tr=kamǎk|sc=Cyrl}}
* Catalonia: pedra {{f}}
* Cina: {{t+|zh|石|tr=shí|sc=Hani}}
* Chuvash: чул (chul)
* Croatia: {{t+|hr|kamen|m}}
* Czech: {{t+|cs|kámen|m}}
* Denmark: {{t-|da|sten|m}}
* Esperanto: {{t-|eo|ŝtono}}
* Estonia: {{t-|et|kivi}}
* Finland: {{t+|fi|kivi}}
* Frisia Utara: stien
* Gaelik Scot: clach {{f}}
* Guaraní: ita
* Hawaii: haku
* Hungary: {{t+|hu|kő}}
* Iceland: {{t+|is|steinn|m}}
* Ido: petro
* Ilocano: bato
* Inggeris: {{t+|en|rock}}, {{t+|en|stone}}
* Inggeris Lama: {{t-|ang|stan}}
* Interlingua: petra
* Ireland: cloch ''f2''
* Itali: {{t+|it|pietra|f}}, {{t+|it|roccia|f}}
* Jepun: {{t+|ja|石|tr=ishi|sc=Jpan}} (いし)
* Jerman: {{t+|de|Stein|m}}
* Korea: [[돌]] (dol)
* Kurdi: {{t+|kmr|kevir}}, {{t+|kmr|ber}}, {{t+|kmr|berd}}, {{t+|kmr|kuç}}, {{t+|ckb|به‌رد}}
* Latin: {{t+|la|lapis|m}}
* Latvia: {{t-|lv|akmens|m}}
* Lithuania: akmuo
* Malagasy: vato
* Maori: {{t-|mi|whatu}}
* Perancis: {{t+|fr|pierre|f}}
* Pitjantjatjara: apu
* Poland: {{t+|pl|kamień|m}}
* Portugis: {{t+|pt|pedra|f}}
* Romania: {{t-|ro|piatră|f}}
* Rusia: {{t+|ru|камень|m|tr=kám'en’|sc=Cyrl}}
* Sardinia (Nugor): preda {{f}}
* Scotland: stane
* Sami Utara: geađgi
* Saxon Rendah: steen
* Sepanyol: {{t+|es|piedra|f}}
* Serbia: {{t-|sr|kamen|m}}
* Slovakia: kameň
* Slovenia: {{t+|sl|kamen|m}}
* Suluk: {{t+|tsg|batu}}
* Sweden: {{t+|sv|sten|c}}
* Tagalog: {{t+|tl|bato}}
* Tupinambá: itá
* Turki: {{t+|tr|taş}}
* Ukraine: камінь (kamіn’)
* Vietnam: {{t+|vi|đá}}
* Wales: {{t-|cy|carreg|f}}
* Yunani: {{t+|el|λίθος|m|tr=líthos|sc=Grek}}
{{ter-bawah}}

{{ter-atas|ukuran}}
* Indonesia: mil
* Inggeris: {{t+|en|mile}}
* Itali: miglio
* Perancis: mille
* Thai: {{THchar|ไมล์}}
* Vietnam: Dặm Anh
{{ter-bawah}}

==Bahasa Iban==

===Etimologi===
Turunan {{inh|iba|poz-pro|*batu}} dari {{inh|iba|map-pro|*batu}}. Sewarisan {{cog|tl|bato}}.

===Sebutan===
* {{dewan|ba|tu}}
* {{IPA|iba|/batu/}}

===Kata nama===
{{head|iba|kata nama}}
# [[batu]]
#: {{cp|iba|Tikau iya '''batu''' nya ke sungai|Dia membaling '''batu''' itu ke sungai}}

== Bahasa Indonesia ==

===Kata nama===
# Lihat [[#Takrifan|takrifan bahasa Melayu]].
# [[bateri]] (lampu suluh).

[[Kategori:Ukuran]]

==Bahasa Kadazandusun==

===Takrifan===
====Kata nama====
{{inti|dtp|kata nama}}
# batu (jarak).
#: {{cp|dtp|Piro po '''batu''' sinodu tinadalanon tokou?|Berapa '''batu''' lagi jarak perjalanan kita?}}

===Etimologi===
{{inh+|dtp|poz-pro|*batu}}, daripada {{inh|dtp|map-pro|*batu}}.

===Sebutan===
* {{IPA|dtp|/ɓa.tʊ/}}
* {{penyempangan|dtp|ba|tu}}
[[File:LL-Q5317225 (dtp)-Nelynnnnn-batu.wav|thumb|left]]

===Kata terbitan===
* {{l|dtp|sabatu}}

===Rujukan===
* {{R:Komoiboros DusunKadazan}}

== Bahasa Bugis ==

===Takrifan===
====Kata nama====
{{inti|bug|kata nama}}
# batu

===Etimologi===
{{inh+|bug|poz-pro|*batu}}, daripada {{inh|bug|map-pro|*batu}}.

== Bahasa Suluk ==

===Takrifan===
==== Kata nama ====
{{inti|tsg|kata nama}}
# batu

===Etimologi===
{{inh+|tsg|phi-pro|*batu}}, daripada {{inh|tsg|poz-pro|*batu}}, daripada {{inh|tsg|map-pro|*batu}}. Banding {{cog|ceb|bato}} dan {{cog|tl|bato}}.

====Kata terbitan====
{{der3|tsg|kabatuhan|tubig batu<qq:ais; air batu>|mabatu<qq:berbatu>|batu balani|batu lantup|atay-batu|batu hihilug}}

==Bahasa Iban==

===Takrifan===
====Kata nama====
{{inti|iba|kata nama}}
# batu
#: {{cp|iba|Iya deka nikau aku ngena '''batu'''.|Dia hendak membaling saya menggunakan '''batu'''.}}

Templat:it-adj

<includeonly>{{#invoke:it-headword|show|Kata sifat}}</includeonly><noinclude>{{documentation}}</noinclude>

Modul:headword

local export = {}

-- Named constants for all modules used, to make it easier to swap out sandbox versions.
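-- A minimal sketch of what the comment above means (the sandbox page name below is only a
-- hypothetical example, not an existing page): to try out an unsaved rewrite of [[Module:links]],
-- one would repoint the single constant instead of editing every require() call, e.g.
--
--     local links_module = "Module:links/sandbox"
--
-- and every lazy loader defined further down would then pick up the sandbox copy automatically.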
local debug_track_module = "Module:debug/track" local en_utilities_module = "Module:en-utilities" local gender_and_number_module = "Module:gender and number" local headword_data_module = "Module:headword/data" local headword_page_module = "Module:headword/page" local links_module = "Module:links" local load_module = "Module:load" local pages_module = "Module:pages" local palindromes_module = "Module:palindromes" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local scripts_data_module = "Module:scripts/data" local script_utilities_module = "Module:script utilities" local script_utilities_data_module = "Module:script utilities/data" local string_utilities_module = "Module:string utilities" local table_module = "Module:table" local utilities_module = "Module:utilities" local concat = table.concat local dump = mw.dumpObject local insert = table.insert local ipairs = ipairs local max = math.max local new_title = mw.title.new local pairs = pairs local require = require local toNFC = mw.ustring.toNFC local toNFD = mw.ustring.toNFD local type = type local ufind = mw.ustring.find local ugmatch = mw.ustring.gmatch local ugsub = mw.ustring.gsub local umatch = mw.ustring.match --[==[ Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.]==] local function debug_track(...) debug_track = require(debug_track_module) return debug_track(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return encode_entities(...) end local function extend(...) extend = require(table_module).extend return extend(...) end local function find_best_script_without_lang(...) find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang return find_best_script_without_lang(...) end local function format_categories(...) format_categories = require(utilities_module).format_categories return format_categories(...) end local function format_genders(...) format_genders = require(gender_and_number_module).format_genders return format_genders(...) end local function format_pron_qualifiers(...) format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers return format_pron_qualifiers(...) end local function full_link(...) full_link = require(links_module).full_link return full_link(...) end local function get_current_L2(...) get_current_L2 = require(pages_module).get_current_L2 return get_current_L2(...) end local function get_link_page(...) get_link_page = require(links_module).get_link_page return get_link_page(...) end local function get_script(...) get_script = require(scripts_module).getByCode return get_script(...) end local function is_palindrome(...) is_palindrome = require(palindromes_module).is_palindrome return is_palindrome(...) end local function language_link(...) language_link = require(links_module).language_link return language_link(...) end local function load_data(...) load_data = require(load_module).load_data return load_data(...) end local function pattern_escape(...) pattern_escape = require(string_utilities_module).pattern_escape return pattern_escape(...) end local function pluralize(...) pluralize = require(en_utilities_module).pluralize return pluralize(...) 
end local function process_page(...) process_page = require(headword_page_module).process_page return process_page(...) end local function remove_links(...) remove_links = require(links_module).remove_links return remove_links(...) end local function shallow_copy(...) shallow_copy = require(table_module).shallowCopy return shallow_copy(...) end local function tag_text(...) tag_text = require(script_utilities_module).tag_text return tag_text(...) end local function tag_transcription(...) tag_transcription = require(script_utilities_module).tag_transcription return tag_transcription(...) end local function tag_translit(...) tag_translit = require(script_utilities_module).tag_translit return tag_translit(...) end local function trim(...) trim = require(string_utilities_module).trim return trim(...) end local function ulen(...) ulen = require(string_utilities_module).len return ulen(...) end local function ucfirst(...) ucfirst = require(string_utilities_module).ucfirst return ucfirst(...) end --[==[ Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as "foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo" has been set, "get_foo" will not be called again.]==] local m_data local function get_data() m_data = load_data(headword_data_module) return m_data end local script_data local function get_script_data() script_data = load_data(scripts_data_module) return script_data end local script_utilities_data local function get_script_utilities_data() script_utilities_data = load_data(script_utilities_data_module) return script_utilities_data end -- If set to true, categories always appear, even in non-mainspace pages local test_force_categories = false -- Add a tracking category to track entries with certain (unusually undesirable) properties. `track_id` is an identifier -- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the -- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property -- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]]. -- -- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is -- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id` -- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to -- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang` -- references at etymology-only language, both that language's code and its full parent's code are tracked. local function track(track_id, lang) local tracking_page = "headword/" .. track_id if lang and lang:hasType("etymology-only") then debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(), tracking_page .. "/" .. lang:getFullCode()} elseif lang then debug_track{tracking_page, tracking_page .. "/" .. lang:getCode()} else debug_track(tracking_page) end return true end local function text_in_script(text, script_code) local sc = get_script(script_code) if not sc then error("Internal error: Bad script code " .. 
script_code) end local characters = sc.characters local out if characters then text = ugsub(text, "%W", "") out = ufind(text, "[" .. characters .. "]") end if out then return true else return false end end local spacingPunctuation = "[%s%p]+" --[[ List of punctuation or spacing characters that are found inside of words. Used to exclude characters from the regex above. ]] local wordPunc = "-#%%&@־׳״'.·*’་•:᠊" local notWordPunc = "[^" .. wordPunc .. "]+" -- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references -- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should -- optionally contain: -- * left qualifiers in `q`, an array of strings; -- * right qualifiers in `qq`, an array of strings; -- * left labels in `l`, an array of strings; -- * right labels in `ll`, an array of strings; -- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text` -- (formatted reference text) and optionally `name` and/or `group`; -- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "". -- `formatted` is the formatted version of the term itself, and `j` is the index of the term. local function format_term_with_qualifiers_and_refs(lang, part, formatted, j) local function part_non_empty(field) local list = part[field] if not list then return nil end if type(list) ~= "table" then error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list))) end return list[1] end if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or part_non_empty("ll") or part_non_empty("refs") then formatted = format_pron_qualifiers { lang = lang, text = formatted, q = part.q, qq = part.qq, l = part.l, ll = part.ll, refs = part.refs, } end local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator if separator then formatted = separator .. formatted end return formatted end --[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==] function export.head_is_multiword(head) for possibleWordBreak in ugmatch(head, spacingPunctuation) do if umatch(possibleWordBreak, notWordPunc) then return true end end return false end do local function workaround_to_exclude_chars(s) return (ugsub(s, notWordPunc, "\2%1\1")) end --[==[Add links to a multiword head.]==] function export.add_multiword_links(head, default) head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2" if default then head = head :gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2") :gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2") end --Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]"). head = encode_entities(head, "[]", true, true) --[=[ use this when workaround is no longer needed: head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]" Remove any empty links, which could have been created above at the beginning or end of the string. ]=] return (head :gsub("\1\2", "") :gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"})) end end local function non_categorizable(full_raw_pagename) return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or -- Unsupported titles with descriptive names. 
(full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`")) end local function tag_text_and_add_quals_and_refs(data, head, formatted, j) -- Add language and script wrapper. formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil) -- Add qualifiers, labels, references and separator. return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j) end -- Format a headword with transliterations. local function format_headword(data) -- Are there non-empty transliterations? local has_translits = false local has_manual_translits = false ------ Format the headwords. ------ local head_parts = {} local unique_head_parts = {} local has_multiple_heads = not not data.heads[2] for j, head in ipairs(data.heads) do if head.tr or head.ts then has_translits = true end if head.tr and head.tr_manual or head.ts then has_manual_translits = true end local formatted -- Apply processing to the headword, for formatting links and such. if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then formatted = language_link{term = head.term, lang = data.lang} else formatted = data.lang:makeDisplayText(head.term, head.sc, true) end local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j) insert(head_parts, head_part) -- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively -- rerun the text tagging and addition of qualifiers and references, using 1 for all indices. if has_multiple_heads then local unique_head_part if j == 1 then unique_head_part = head_part else unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1) end unique_head_parts[unique_head_part] = true end end local set_size = 0 if has_multiple_heads then for _ in pairs(unique_head_parts) do set_size = set_size + 1 end end if set_size == 1 then head_parts = head_parts[1] else head_parts = concat(head_parts) end if has_manual_translits then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]] track("manual-tr", data.lang) end ------ Format the transliterations and transcriptions. ------ local translits_formatted if has_translits then local translit_parts = {} for _, head in ipairs(data.heads) do if head.tr or head.ts then local this_parts = {} if head.tr then insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual)) if head.ts then insert(this_parts, " ") end end if head.ts then insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/") end insert(translit_parts, concat(this_parts)) end end translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")" local langname = data.lang:getCanonicalName() local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus") local saw_translit_page = false if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted saw_translit_page = true end -- If data.lang is an etymology-only language and we didn't find a translation page for it, fall back to the -- full parent. if not saw_translit_page and data.lang:hasType("etymology-only") then langname = data.lang:getFullName() transliteration_page = new_title("Transliterasi bahasa " .. 
langname, "Wikikamus") if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted end end else translits_formatted = "" end ------ Paste heads and transliterations/transcriptions. ------ local lemma_gloss if data.gloss then lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>' else lemma_gloss = "" end return head_parts .. translits_formatted .. lemma_gloss end local function format_headword_genders(data) local retval = "" if data.genders and data.genders[1] then if data.gloss then retval = "," end local pos_for_cat if not data.nogendercat then local no_gender_cat = (m_data or get_data()).no_gender_cat if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")] end end local text, cats = format_genders(data.genders, data.lang, pos_for_cat) if cats then extend(data.categories, cats) end retval = retval .. "&nbsp;" .. text end return retval end -- Forward reference local format_inflections local function format_inflection_parts(data, parts) for j, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local partaccel = part.accel local face = part.face or "bold" if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then error("The face `" .. face .. "` " .. ( (script_utilities_data or get_script_utilities_data()).faces[face] and "should not be used for non-headword terms on the headword line." or "is invalid." )) end -- Here the final part 'or data.nolinkinfl' allows to have 'nolinkinfl=true' -- right into the 'data' table to disable inflection links of the entire headword -- when inflected forms aren't entry-worthy, e.g.: in Vulgar Latin local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or ( data.nolink and track("nolink") or data.nolinkinfl) local formatted if part.label then -- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable. formatted = "<i>" .. part.label .. "</i>" else -- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is -- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e. -- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections, -- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with -- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages -- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it -- to be enabled in languages with more complex scripts (e.g. Arabic). -- -- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the -- nested inflections structure? local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil -- FIXME: Temporary errors added 2025-10-03. Remove after a month or so. 
if part.translit then error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit") end if part.transcription then error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription") end local postprocess_annotations if part.inflections then postprocess_annotations = function(infldata) insert(infldata.annotations, format_inflections(data, part.inflections)) end end formatted = full_link( { term = not nolinkinfl and part.term or nil, alt = part.alt or (nolinkinfl and part.term or nil), lang = part.lang or data.lang, sc = part.sc or parts.sc or nil, gloss = part.gloss, pos = part.pos, lit = part.lit, id = part.id, genders = part.genders, tr = tr, ts = part.ts, accel = partaccel or parts.accel, postprocess_annotations = postprocess_annotations, }, face ) end parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part, formatted, j) end local parts_output if parts[1] then parts_output = (parts.label and " " or "") .. concat(parts) elseif parts.request then parts_output = " <small>[please provide]</small>" insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries") else parts_output = "" end local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or "" return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1) end -- Format the inflections following the headword or nested after a given inflection. Declared local above. function format_inflections(data, inflections) if inflections and inflections[1] then -- Format each inflection individually. for key, infl in ipairs(inflections) do inflections[key] = format_inflection_parts(data, infl) end return concat(inflections, ", ") else return "" end end -- Format the top-level inflections following the headword. Currently this just adds parens around the -- formatted comma-separated inflections in `data.inflections`. local function format_top_level_inflections(data) local result = format_inflections(data, data.inflections) if result ~= "" then return " (" .. result .. ")" else return result end end -- Forward reference local check_red_link_inflections -- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections) -- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the -- category), stop further processing, and return true. If no red links found, return false. local function check_red_link_inflection_parts(data, parts, plpos) for _, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local term = part.term if term and not term:find("%[%[") then local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil) if stripped_physical_term then local title = mw.title.new(stripped_physical_term) if title and not title:getContent() then insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines") return true end end end if part.inflections then if check_red_link_inflections(data, part.inflections, plpos) then return true end end end return false end -- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and -- consists of a label and zero or more terms, each possibly with nested inflections) for red links. 
If so, insert a -- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing, -- and return true. If no red links found, return false. function check_red_link_inflections(data, inflections, plpos) if inflections and inflections[1] then -- Check each inflection individually. for key, infl in ipairs(inflections) do if check_red_link_inflection_parts(data, infl, plpos) then return true end end end return false end -- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so, -- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further -- processing, and return true. If no red links found, return false. local function check_red_link_inflections_top_level(data, plpos) return check_red_link_inflections(data, data.inflections, plpos) end --[==[ Returns the plural form of `pos`, a raw part of speech input, which could be singular or plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to "kanji"). ]==] function export.pluralize_pos(pos) -- Make the plural form of the part of speech return (m_data or get_data()).irregular_plurals[pos] or pos:sub(-1) == "s" and pos or pluralize(pos) end --[==[ Return "lemma" if the given POS is a lemma, "non-lemma form" if a non-lemma form, or nil if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.). If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it in a smart fashion that knows when to add "-s" and when to add "-es", and also takes into account any irregular plurals. If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess based on whether it ends in " forms"; otherwise, return nil. ]==] function export.pos_lemma_or_nonlemma(plpos, best_guess) local m_headword_data = m_data or get_data() local isLemma = m_headword_data.lemmas -- Is it a lemma category? if isLemma[plpos] then return "Lema" end local plpos_no_recon = plpos:gsub("^reconstructed ", "") if isLemma[plpos_no_recon] then return "Lema" end -- Is it a nonlemma category? local isNonLemma = m_headword_data.nonlemmas if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then return "Bentuk bukan lema" end local plpos_no_mut = plpos:gsub("^mutated ", "") if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then return "Bentuk bukan lema" elseif best_guess then return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema" else return nil end end --[==[ Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural. ]==] function export.canonicalize_pos(pos) -- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away. if pos == "pre" then -- Don't throw error on 'pref' as it's an alias for "prefix". error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'") end -- Likewise for pro = pronoun. if pos == "pro" or pos == "prof" then error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'") end local m_headword_data = m_data or get_data() if m_headword_data.pos_aliases[pos] then pos = m_headword_data.pos_aliases[pos] elseif pos:sub(-1) == "f" then pos = pos:sub(1, -2) pos = "Bentuk " .. 
(m_headword_data.pos_aliases[pos] or pos) end return export.pluralize_pos(pos) end -- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a -- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by -- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given, -- no blank (zero-length) strings are present. local function init_and_find_maximum_index(data, element, allow_blank_string) local maxind = 0 if not data[element] then data[element] = {} end local typ = type(data[element]) if typ ~= "table" then error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ)) end for k, v in pairs(data[element]) do if k ~= "maxindex" then if type(k) ~= "number" then error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element)) end if k > maxind then maxind = k end if v then if type(v) ~= "string" then error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v))) end if not allow_blank_string and v == "" then error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element)) end end end end return maxind end --[==[ -- Add the page to various maintenance categories for the language and the -- whole page. These are placed in the headword somewhat arbitrarily, but -- mainly because headword templates are mandatory for entries (meaning that -- in theory it provides full coverage). -- -- This is provided as an external entry point so that modules which transclude -- information from other entries (such as {{tl|ja-see}}) can take advantage -- of this feature as well, because they are used in place of a conventional -- headword template.]==] do -- Handle any manual sortkeys that have been specified in raw categories -- by tracking if they are the same or different from the automatically- -- generated sortkey, so that we can track them in maintenance -- categories. local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) sortkey = sortkey or lang:makeSortKey(page.pagename) -- If there are raw categories with no sortkey, then they will be -- sorted based on the default MediaWiki sortkey, so we check against -- that. if tbl == true then if page.raw_defaultsort ~= sortkey then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return end local redundant, different for k in pairs(tbl) do if k == sortkey then redundant = true else different = true end end if redundant then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah") end if different then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return sortkey end function export.maintenance_cats(page, lang, lang_cats, page_cats) extend(page_cats, page.cats) lang = lang:getFull() -- since we are just generating categories local canonical = lang:getCanonicalName() local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()] if tbl then sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. 
" dengan kategori topik yang menggunakan penanda mentah") end tbl = page.wikitext_langname_cat[canonical] if tbl then handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah") end local current_L2 = get_current_L2() if current_L2 then local trimmed_L2 = trim(current_L2) local expected_L2 = "Bahasa " .. canonical if trimmed_L2 ~= expected_L2 then insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul") -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]] track("pengepala bahasa tidak betul", lang) end end end end --[==[This is the primary external entry point. {{lua|full_headword(data)}} This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line. See [[#Further explanations for full_headword()]] ]==] function export.full_headword(data) -- Prevent data from being destructively modified. local data = shallow_copy(data) ------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------ if data.getCanonicalName then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object") end if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object") end if data.id and type(data.id) ~= "string" then error("Internal error: The id in the data table should be a string.") end ------------ 2. Initialize pagename etc. ------------ local langcode = data.lang:getCode() local full_langcode = data.lang:getFullCode() local langname = data.lang:getCanonicalName() local full_langname = data.lang:getFullName() local raw_pagename = data.pagename local page local m_headword_data = m_data or get_data() if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc. -- data.pagename is often set on documentation and test pages through the pagename= parameter of various -- templates, to emulate running on that page. Having a large number of such test templates on a single -- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However, -- we don't really need to do that and can function fine without fetching and parsing the contents of a -- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser -- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to -- be found and where actual content does not live (User, Template, Module). local actual_namespace = m_headword_data.page.namespace local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or actual_namespace == "Module" page = process_page(raw_pagename, no_fetch_content) else page = m_headword_data.page end local namespace = page.namespace ------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. 
------------ if type(data.heads) == "table" and type(data.heads[1]) == "table" then -- new-style if data.translits or data.transcriptions then error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given") end else -- convert old-style `heads`, `translits` and `transcriptions` to new-style local maxind = max( init_and_find_maximum_index(data, "heads"), init_and_find_maximum_index(data, "translits", true), init_and_find_maximum_index(data, "transcriptions", true) ) for i = 1, maxind do data.heads[i] = { term = data.heads[i], tr = data.translits[i], ts = data.transcriptions[i], } end end -- Make sure there's at least one head. if not data.heads[1] then data.heads[1] = {} end ------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------ -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then data.noposcat = true end init_and_find_maximum_index(data, "categories") init_and_find_maximum_index(data, "whole_page_categories") local pos_category_already_present = false if data.categories[1] then local escaped_langname = pattern_escape(full_langname) local matches_lang_pattern = "^" .. escaped_langname .. " " for _, cat in ipairs(data.categories) do -- Does the category begin with the language name? If not, tag it with a tracking category. if not cat:find(matches_lang_pattern) then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]] track("no lang category", data.lang) end end -- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we -- throw an error below. if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "") -- Optimization to avoid inserting category already present. pos_category_already_present = true end end if not data.pos_category then error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in " .. "`data.categories`. Either specify the plural part of speech in `data.pos_category` " .. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the " .. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")." ) end -- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given. if not pos_category_already_present and not data.noposcat then local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname -- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment. if pos_category ~= "Aksara Han rentas bahasa" then insert(data.categories, 1, pos_category) end end -- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out, -- add an appropriate category. local postype = export.pos_lemma_or_nonlemma(data.pos_category) if not postype then -- We don't know what this category is, so tag it with a tracking category. 
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]] track("unrecognized pos", data.lang) -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS/LANGCODE]] track("unrecognized pos/pos/" .. data.pos_category, data.lang) elseif not data.noposcat then insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname) end -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname) end ------------ 5. Create a default headword, and add links to multiword page names. ------------ -- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be -- reconstructed. local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!") local lang_reconstructed = data.lang:hasType("reconstructed") if is_anti_asterisk then if not lang_reconstructed then error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages") end lang_reconstructed = false end -- Determine if term is reconstructed local is_reconstructed = namespace == "Rekonstruksi" or data.lang:hasType("reconstructed") -- Create a default headword based on the pagename, which is determined in -- advance by the data module so that it only needs to be done once. local default_head = page.pagename -- Add links to multi-word page names when appropriate if not (is_reconstructed or data.nolinkhead) then local no_links = m_headword_data.no_multiword_links if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then default_head = export.add_multiword_links(default_head, true) end end if is_reconstructed then default_head = "*" .. default_head end ------------ 6. Check the namespace against the language type. ------------ if namespace == "" then if lang_reconstructed then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ") elseif data.lang:hasType("appendix-constructed") then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ") end elseif namespace == "Petikan" or namespace == "Tesaurus" then error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .") end ------------ 7. Fill in missing values in `data.heads`. ------------ -- True if any script among the headword scripts has spaces in it. local any_script_has_spaces = false -- True if any term has a redundant head= param. local has_redundant_head_param = false for _, head in ipairs(data.heads) do ------ 7a. If missing head, replace with default head. if not head.term then head.term = default_head elseif head.term == default_head then has_redundant_head_param = true elseif is_anti_asterisk and head.term == "!!" then -- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head. head.term = "!!" .. default_head elseif head.term:find("^[!?]$") then -- If explicit head= just consists of ! or ?, add it to the end of the default head. head.term = default_head .. 
head.term end head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term if is_reconstructed then local head_term = head.term if head_term:find("%[%[") then head_term = remove_links(head_term) end if head_term:sub(1, 1) ~= "*" then error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.") end end ------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence, ------ otherwise fall back to the overall script if given. If neither given, autodetect the script. local auto_sc = data.lang:findBestScript(head.term) if ( auto_sc:getCode() == "None" and find_best_script_without_lang(head.term):getCode() ~= "None" ) then insert(data.categories, "Perkataan dengan bentuk tulisan tidak piawai bahasa " .. full_langname ) end if not (head.sc or data.sc) then -- No script code given, so use autodetected script. head.sc = auto_sc else if not head.sc then -- Overall script code given. head.sc = data.sc end -- Track uses of sc parameter. if head.sc:getCode() == auto_sc:getCode() then track("redundant script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname ) end else track("non-redundant manual script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname ) end end end -- If using a discouraged character sequence, add to maintenance category. if head.sc:hasNormalizationFixes() == true then local composed_head = toNFC(head.term) if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan") end end any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces() ------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given ------ (provided automatic translit is available, e.g. not in Persian or Hebrew). -- Make transliterations head.tr_manual = nil -- Try to generate a transliteration if necessary if head.tr == "-" then head.tr = nil else local notranslit = m_headword_data.notranslit if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then head.tr_manual = not not head.tr local text = head.term_no_initial_bang_bang if not data.lang:link_tr(head.sc) then text = remove_links(text) end local automated_tr = data.lang:transliterate(text, head.sc) if automated_tr then local manual_tr = head.tr if manual_tr then if remove_links(manual_tr) == remove_links(automated_tr) then insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah") else insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah") end end if not manual_tr then head.tr = automated_tr end end -- There is still no transliteration? -- Add the entry to a cleanup category. if not head.tr then head.tr = "<small>transliteration needed</small>" -- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar. -- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]]. insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname) else -- Otherwise, trim it. head.tr = trim(head.tr) end end end -- Link to the transliteration entry for languages that require this. 
if head.tr and data.lang:link_tr(head.sc) then head.tr = full_link{ term = head.tr, lang = data.lang, sc = get_script("Latn"), tr = "-" } end end ------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------ -- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace. -- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.) -- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII -- might need to be expanded to a check for any Latin characters and whitespace or punctuation. local display_title -- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to -- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be -- about the best we can do (alternatively we could potentially do script detection on the pagename). local dt_script = data.heads[1].sc local dt_script_code = dt_script:getCode() local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$") local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "") if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>' elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code] or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana"))) or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>' -- Keep Han entries region-neutral in the display title. elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>' elseif namespace == "Rekonstruksi" then local matched display_title, matched = ugsub( page.full_raw_pagename, "^(Rekonstruksi:[^/]+/)(.+)$", function(before, term) return before .. tag_text(term, data.lang, dt_script) end ) if matched == 0 then display_title = nil end end -- FIXME: Generalize this. -- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab -- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title -- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other -- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word -- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages. -- FIXME: This is checking for more than one language on the page but instead needs to check if there are any -- languages using scripts other than the ones just mentioned. if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then display_title = nil end if display_title then mw.getCurrentFrame():callParserFunction( "DISPLAYTITLE", display_title ) end ------------ 9. Insert additional categories. 
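-- For orientation only (the page name is an invented example): for a Japanese entry such as
-- [[食べる]], whose pagename contains kana, the display-title branch above ends up calling the
-- parser function roughly as
--
--     {{DISPLAYTITLE:<span class="Jpan">食べる</span>}}
--
-- so the page title is rendered with the same script-specific styling as the headword itself.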
whole_page_cats end return export dsxezkypi61q35s0q5jp4h6np4acc9c 281454 281453 2026-04-22T15:32:12Z Hakimi97 2668 281454 Scribunto text/plain local export = {} -- Named constants for all modules used, to make it easier to swap out sandbox versions. local debug_track_module = "Module:debug/track" local en_utilities_module = "Module:en-utilities" local gender_and_number_module = "Module:gender and number" local headword_data_module = "Module:headword/data" local headword_page_module = "Module:headword/page" local links_module = "Module:links" local load_module = "Module:load" local pages_module = "Module:pages" local palindromes_module = "Module:palindromes" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local scripts_data_module = "Module:scripts/data" local script_utilities_module = "Module:script utilities" local script_utilities_data_module = "Module:script utilities/data" local string_utilities_module = "Module:string utilities" local table_module = "Module:table" local utilities_module = "Module:utilities" local concat = table.concat local dump = mw.dumpObject local insert = table.insert local ipairs = ipairs local max = math.max local new_title = mw.title.new local pairs = pairs local require = require local toNFC = mw.ustring.toNFC local toNFD = mw.ustring.toNFD local type = type local ufind = mw.ustring.find local ugmatch = mw.ustring.gmatch local ugsub = mw.ustring.gsub local umatch = mw.ustring.match --[==[ Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.]==] local function debug_track(...) debug_track = require(debug_track_module) return debug_track(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return encode_entities(...) end local function extend(...) extend = require(table_module).extend return extend(...) end local function find_best_script_without_lang(...) find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang return find_best_script_without_lang(...) end local function format_categories(...) format_categories = require(utilities_module).format_categories return format_categories(...) end local function format_genders(...) format_genders = require(gender_and_number_module).format_genders return format_genders(...) end local function format_pron_qualifiers(...) format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers return format_pron_qualifiers(...) end local function full_link(...) full_link = require(links_module).full_link return full_link(...) end local function get_current_L2(...) get_current_L2 = require(pages_module).get_current_L2 return get_current_L2(...) end local function get_link_page(...) get_link_page = require(links_module).get_link_page return get_link_page(...) end local function get_script(...) get_script = require(scripts_module).getByCode return get_script(...) end local function is_palindrome(...) is_palindrome = require(palindromes_module).is_palindrome return is_palindrome(...) end local function language_link(...) language_link = require(links_module).language_link return language_link(...) end local function load_data(...) load_data = require(load_module).load_data return load_data(...) 
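-- A minimal sketch, not part of this module, of the self-overwriting loader pattern used
-- by the functions in this block: the local starts out as a stub that, on first call,
-- replaces itself with the real function from the target module and forwards the call,
-- so the module is only loaded when actually needed. "Module:example" and heavy_func
-- are hypothetical names.
local example_module = "Module:example"

local function heavy_func(...)
    heavy_func = require(example_module).heavy_func
    return heavy_func(...)
end

-- The first call triggers require(); every later call goes straight to the loaded function.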
end local function pattern_escape(...) pattern_escape = require(string_utilities_module).pattern_escape return pattern_escape(...) end local function pluralize(...) pluralize = require(en_utilities_module).pluralize return pluralize(...) end local function process_page(...) process_page = require(headword_page_module).process_page return process_page(...) end local function remove_links(...) remove_links = require(links_module).remove_links return remove_links(...) end local function shallow_copy(...) shallow_copy = require(table_module).shallowCopy return shallow_copy(...) end local function tag_text(...) tag_text = require(script_utilities_module).tag_text return tag_text(...) end local function tag_transcription(...) tag_transcription = require(script_utilities_module).tag_transcription return tag_transcription(...) end local function tag_translit(...) tag_translit = require(script_utilities_module).tag_translit return tag_translit(...) end local function trim(...) trim = require(string_utilities_module).trim return trim(...) end local function ulen(...) ulen = require(string_utilities_module).len return ulen(...) end local function ucfirst(...) ucfirst = require(string_utilities_module).ucfirst return ucfirst(...) end --[==[ Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as "foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo" has been set, "get_foo" will not be called again.]==] local m_data local function get_data() m_data = load_data(headword_data_module) return m_data end local script_data local function get_script_data() script_data = load_data(scripts_data_module) return script_data end local script_utilities_data local function get_script_utilities_data() script_utilities_data = load_data(script_utilities_data_module) return script_utilities_data end -- If set to true, categories always appear, even in non-mainspace pages local test_force_categories = false -- Add a tracking category to track entries with certain (unusually undesirable) properties. `track_id` is an identifier -- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the -- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property -- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]]. -- -- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is -- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id` -- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to -- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang` -- references at etymology-only language, both that language's code and its full parent's code are tracked. local function track(track_id, lang) local tracking_page = "headword/" .. track_id if lang and lang:hasType("etymology-only") then debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(), tracking_page .. "/" .. lang:getFullCode()} elseif lang then debug_track{tracking_page, tracking_page .. "/" .. 
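-- A minimal sketch, not part of this module, of the list of tracking pages built just
-- above for an etymology-only language: the base page, one page per language code and one
-- per full parent code. tracking_pages_for is a hypothetical helper; "xxx" and "yy" are
-- placeholder codes, not taken from this module.
local function tracking_pages_for(track_id, code, full_code)
    local base = "headword/" .. track_id
    return { base, base .. "/" .. code, base .. "/" .. full_code }
end

-- tracking_pages_for("nolink", "xxx", "yy")
--   --> { "headword/nolink", "headword/nolink/xxx", "headword/nolink/yy" }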
lang:getCode()} else debug_track(tracking_page) end return true end local function text_in_script(text, script_code) local sc = get_script(script_code) if not sc then error("Internal error: Bad script code " .. script_code) end local characters = sc.characters local out if characters then text = ugsub(text, "%W", "") out = ufind(text, "[" .. characters .. "]") end if out then return true else return false end end local spacingPunctuation = "[%s%p]+" --[[ List of punctuation or spacing characters that are found inside of words. Used to exclude characters from the regex above. ]] local wordPunc = "-#%%&@־׳״'.·*’་•:᠊" local notWordPunc = "[^" .. wordPunc .. "]+" -- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references -- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should -- optionally contain: -- * left qualifiers in `q`, an array of strings; -- * right qualifiers in `qq`, an array of strings; -- * left labels in `l`, an array of strings; -- * right labels in `ll`, an array of strings; -- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text` -- (formatted reference text) and optionally `name` and/or `group`; -- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "". -- `formatted` is the formatted version of the term itself, and `j` is the index of the term. local function format_term_with_qualifiers_and_refs(lang, part, formatted, j) local function part_non_empty(field) local list = part[field] if not list then return nil end if type(list) ~= "table" then error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list))) end return list[1] end if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or part_non_empty("ll") or part_non_empty("refs") then formatted = format_pron_qualifiers { lang = lang, text = formatted, q = part.q, qq = part.qq, l = part.l, ll = part.ll, refs = part.refs, } end local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator if separator then formatted = separator .. formatted end return formatted end --[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==] function export.head_is_multiword(head) for possibleWordBreak in ugmatch(head, spacingPunctuation) do if umatch(possibleWordBreak, notWordPunc) then return true end end return false end do local function workaround_to_exclude_chars(s) return (ugsub(s, notWordPunc, "\2%1\1")) end --[==[Add links to a multiword head.]==] function export.add_multiword_links(head, default) head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2" if default then head = head :gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2") :gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2") end --Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]"). head = encode_entities(head, "[]", true, true) --[=[ use this when workaround is no longer needed: head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]" Remove any empty links, which could have been created above at the beginning or end of the string. 
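-- A minimal sketch, not part of this module, of the intended net effect of
-- add_multiword_links on a plain space-separated head, ignoring the \1/\2 placeholder
-- workaround and the entity escaping handled above. naive_multiword_links is a
-- hypothetical, simplified stand-in.
local function naive_multiword_links(head)
    return "[[" .. head:gsub(" ", "]] [[") .. "]]"
end

-- naive_multiword_links("batu api")  --> "[[batu]] [[api]]"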
]=] return (head :gsub("\1\2", "") :gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"})) end end local function non_categorizable(full_raw_pagename) return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or -- Unsupported titles with descriptive names. (full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`")) end local function tag_text_and_add_quals_and_refs(data, head, formatted, j) -- Add language and script wrapper. formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil) -- Add qualifiers, labels, references and separator. return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j) end -- Format a headword with transliterations. local function format_headword(data) -- Are there non-empty transliterations? local has_translits = false local has_manual_translits = false ------ Format the headwords. ------ local head_parts = {} local unique_head_parts = {} local has_multiple_heads = not not data.heads[2] for j, head in ipairs(data.heads) do if head.tr or head.ts then has_translits = true end if head.tr and head.tr_manual or head.ts then has_manual_translits = true end local formatted -- Apply processing to the headword, for formatting links and such. if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then formatted = language_link{term = head.term, lang = data.lang} else formatted = data.lang:makeDisplayText(head.term, head.sc, true) end local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j) insert(head_parts, head_part) -- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively -- rerun the text tagging and addition of qualifiers and references, using 1 for all indices. if has_multiple_heads then local unique_head_part if j == 1 then unique_head_part = head_part else unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1) end unique_head_parts[unique_head_part] = true end end local set_size = 0 if has_multiple_heads then for _ in pairs(unique_head_parts) do set_size = set_size + 1 end end if set_size == 1 then head_parts = head_parts[1] else head_parts = concat(head_parts) end if has_manual_translits then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]] track("manual-tr", data.lang) end ------ Format the transliterations and transcriptions. ------ local translits_formatted if has_translits then local translit_parts = {} for _, head in ipairs(data.heads) do if head.tr or head.ts then local this_parts = {} if head.tr then insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual)) if head.ts then insert(this_parts, " ") end end if head.ts then insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/") end insert(translit_parts, concat(this_parts)) end end translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")" local langname = data.lang:getCanonicalName() local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus") local saw_translit_page = false if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted saw_translit_page = true end -- If data.lang is an etymology-only language and we didn't find a translation page for it, fall back to the -- full parent. 
if not saw_translit_page and data.lang:hasType("etymology-only") then langname = data.lang:getFullName() transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus") if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted end end else translits_formatted = "" end ------ Paste heads and transliterations/transcriptions. ------ local lemma_gloss if data.gloss then lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>' else lemma_gloss = "" end return head_parts .. translits_formatted .. lemma_gloss end local function format_headword_genders(data) local retval = "" if data.genders and data.genders[1] then if data.gloss then retval = "," end local pos_for_cat if not data.nogendercat then local no_gender_cat = (m_data or get_data()).no_gender_cat if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")] end end local text, cats = format_genders(data.genders, data.lang, pos_for_cat) if cats then extend(data.categories, cats) end retval = retval .. "&nbsp;" .. text end return retval end -- Forward reference local format_inflections local function format_inflection_parts(data, parts) for j, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local partaccel = part.accel local face = part.face or "bold" if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then error("The face `" .. face .. "` " .. ( (script_utilities_data or get_script_utilities_data()).faces[face] and "should not be used for non-headword terms on the headword line." or "is invalid." )) end -- Here the final part 'or data.nolinkinfl' allows to have 'nolinkinfl=true' -- right into the 'data' table to disable inflection links of the entire headword -- when inflected forms aren't entry-worthy, e.g.: in Vulgar Latin local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or ( data.nolink and track("nolink") or data.nolinkinfl) local formatted if part.label then -- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable. formatted = "<i>" .. part.label .. "</i>" else -- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is -- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e. -- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections, -- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with -- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages -- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it -- to be enabled in languages with more complex scripts (e.g. Arabic). -- -- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the -- nested inflections structure? local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil -- FIXME: Temporary errors added 2025-10-03. Remove after a month or so. 
if part.translit then error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit") end if part.transcription then error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription") end local postprocess_annotations if part.inflections then postprocess_annotations = function(infldata) insert(infldata.annotations, format_inflections(data, part.inflections)) end end formatted = full_link( { term = not nolinkinfl and part.term or nil, alt = part.alt or (nolinkinfl and part.term or nil), lang = part.lang or data.lang, sc = part.sc or parts.sc or nil, gloss = part.gloss, pos = part.pos, lit = part.lit, id = part.id, genders = part.genders, tr = tr, ts = part.ts, accel = partaccel or parts.accel, postprocess_annotations = postprocess_annotations, }, face ) end parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part, formatted, j) end local parts_output if parts[1] then parts_output = (parts.label and " " or "") .. concat(parts) elseif parts.request then parts_output = " <small>[please provide]</small>" insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries") else parts_output = "" end local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or "" return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1) end -- Format the inflections following the headword or nested after a given inflection. Declared local above. function format_inflections(data, inflections) if inflections and inflections[1] then -- Format each inflection individually. for key, infl in ipairs(inflections) do inflections[key] = format_inflection_parts(data, infl) end return concat(inflections, ", ") else return "" end end -- Format the top-level inflections following the headword. Currently this just adds parens around the -- formatted comma-separated inflections in `data.inflections`. local function format_top_level_inflections(data) local result = format_inflections(data, data.inflections) if result ~= "" then return " (" .. result .. ")" else return result end end -- Forward reference local check_red_link_inflections -- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections) -- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the -- category), stop further processing, and return true. If no red links found, return false. local function check_red_link_inflection_parts(data, parts, plpos) for _, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local term = part.term if term and not term:find("%[%[") then local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil) if stripped_physical_term then local title = mw.title.new(stripped_physical_term) if title and not title:getContent() then insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines") return true end end end if part.inflections then if check_red_link_inflections(data, part.inflections, plpos) then return true end end end return false end -- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and -- consists of a label and zero or more terms, each possibly with nested inflections) for red links. 
If so, insert a -- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing, -- and return true. If no red links found, return false. function check_red_link_inflections(data, inflections, plpos) if inflections and inflections[1] then -- Check each inflection individually. for key, infl in ipairs(inflections) do if check_red_link_inflection_parts(data, infl, plpos) then return true end end end return false end -- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so, -- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further -- processing, and return true. If no red links found, return false. local function check_red_link_inflections_top_level(data, plpos) return check_red_link_inflections(data, data.inflections, plpos) end --[==[ Returns the plural form of `pos`, a raw part of speech input, which could be singular or plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to "kanji"). ]==] function export.pluralize_pos(pos) -- Make the plural form of the part of speech return (m_data or get_data()).irregular_plurals[pos] or pos:sub(-1) == "s" and pos or pluralize(pos) end --[==[ Return "lemma" if the given POS is a lemma, "non-lemma form" if a non-lemma form, or nil if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.). If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it in a smart fashion that knows when to add "-s" and when to add "-es", and also takes into account any irregular plurals. If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess based on whether it ends in " forms"; otherwise, return nil. ]==] function export.pos_lemma_or_nonlemma(plpos, best_guess) local m_headword_data = m_data or get_data() local isLemma = m_headword_data.lemmas -- Is it a lemma category? if isLemma[plpos] then return "Lema" end local plpos_no_recon = plpos:gsub("^reconstructed ", "") if isLemma[plpos_no_recon] then return "Lema" end -- Is it a nonlemma category? local isNonLemma = m_headword_data.nonlemmas if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then return "Bentuk bukan lema" end local plpos_no_mut = plpos:gsub("^mutated ", "") if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then return "Bentuk bukan lema" elseif best_guess then return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema" else return nil end end --[==[ Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural. ]==] function export.canonicalize_pos(pos) -- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away. if pos == "pre" then -- Don't throw error on 'pref' as it's an alias for "prefix". error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'") end -- Likewise for pro = pronoun. if pos == "pro" or pos == "prof" then error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'") end local m_headword_data = m_data or get_data() if m_headword_data.pos_aliases[pos] then pos = m_headword_data.pos_aliases[pos] elseif pos:sub(-1) == "f" then pos = pos:sub(1, -2) pos = "Bentuk " .. 
(m_headword_data.pos_aliases[pos] or pos) end return export.pluralize_pos(pos) end -- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a -- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by -- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given, -- no blank (zero-length) strings are present. local function init_and_find_maximum_index(data, element, allow_blank_string) local maxind = 0 if not data[element] then data[element] = {} end local typ = type(data[element]) if typ ~= "table" then error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ)) end for k, v in pairs(data[element]) do if k ~= "maxindex" then if type(k) ~= "number" then error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element)) end if k > maxind then maxind = k end if v then if type(v) ~= "string" then error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v))) end if not allow_blank_string and v == "" then error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element)) end end end end return maxind end --[==[ -- Add the page to various maintenance categories for the language and the -- whole page. These are placed in the headword somewhat arbitrarily, but -- mainly because headword templates are mandatory for entries (meaning that -- in theory it provides full coverage). -- -- This is provided as an external entry point so that modules which transclude -- information from other entries (such as {{tl|ja-see}}) can take advantage -- of this feature as well, because they are used in place of a conventional -- headword template.]==] do -- Handle any manual sortkeys that have been specified in raw categories -- by tracking if they are the same or different from the automatically- -- generated sortkey, so that we can track them in maintenance -- categories. local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) sortkey = sortkey or lang:makeSortKey(page.pagename) -- If there are raw categories with no sortkey, then they will be -- sorted based on the default MediaWiki sortkey, so we check against -- that. if tbl == true then if page.raw_defaultsort ~= sortkey then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return end local redundant, different for k in pairs(tbl) do if k == sortkey then redundant = true else different = true end end if redundant then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah") end if different then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return sortkey end function export.maintenance_cats(page, lang, lang_cats, page_cats) extend(page_cats, page.cats) lang = lang:getFull() -- since we are just generating categories local canonical = lang:getCanonicalName() local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()] if tbl then sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. 
" dengan kategori topik yang menggunakan penanda mentah") end tbl = page.wikitext_langname_cat[canonical] if tbl then handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah") end local current_L2 = get_current_L2() if current_L2 then local trimmed_L2 = trim(current_L2) local expected_L2 = "Bahasa " .. canonical if trimmed_L2 ~= expected_L2 then insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul") -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]] track("pengepala bahasa tidak betul", lang) end end end end --[==[This is the primary external entry point. {{lua|full_headword(data)}} This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line. See [[#Further explanations for full_headword()]] ]==] function export.full_headword(data) -- Prevent data from being destructively modified. local data = shallow_copy(data) ------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------ if data.getCanonicalName then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object") end if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object") end if data.id and type(data.id) ~= "string" then error("Internal error: The id in the data table should be a string.") end ------------ 2. Initialize pagename etc. ------------ local langcode = data.lang:getCode() local full_langcode = data.lang:getFullCode() local langname = data.lang:getCanonicalName() local full_langname = data.lang:getFullName() local raw_pagename = data.pagename local page local m_headword_data = m_data or get_data() if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc. -- data.pagename is often set on documentation and test pages through the pagename= parameter of various -- templates, to emulate running on that page. Having a large number of such test templates on a single -- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However, -- we don't really need to do that and can function fine without fetching and parsing the contents of a -- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser -- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to -- be found and where actual content does not live (User, Template, Module). local actual_namespace = m_headword_data.page.namespace local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or actual_namespace == "Module" page = process_page(raw_pagename, no_fetch_content) else page = m_headword_data.page end local namespace = page.namespace ------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. 
------------ if type(data.heads) == "table" and type(data.heads[1]) == "table" then -- new-style if data.translits or data.transcriptions then error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given") end else -- convert old-style `heads`, `translits` and `transcriptions` to new-style local maxind = max( init_and_find_maximum_index(data, "heads"), init_and_find_maximum_index(data, "translits", true), init_and_find_maximum_index(data, "transcriptions", true) ) for i = 1, maxind do data.heads[i] = { term = data.heads[i], tr = data.translits[i], ts = data.transcriptions[i], } end end -- Make sure there's at least one head. if not data.heads[1] then data.heads[1] = {} end ------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------ -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then data.noposcat = true end init_and_find_maximum_index(data, "categories") init_and_find_maximum_index(data, "whole_page_categories") local pos_category_already_present = false if data.categories[1] then local escaped_langname = pattern_escape(full_langname) local matches_lang_pattern = "^" .. escaped_langname .. " " for _, cat in ipairs(data.categories) do -- Does the category begin with the language name? If not, tag it with a tracking category. if not cat:find(matches_lang_pattern) then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]] track("no lang category", data.lang) end end -- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we -- throw an error below. if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "") -- Optimization to avoid inserting category already present. pos_category_already_present = true end end if not data.pos_category then error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in " .. "`data.categories`. Either specify the plural part of speech in `data.pos_category` " .. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the " .. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")." ) end -- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given. if not pos_category_already_present and not data.noposcat then local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname -- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment. if pos_category ~= "Aksara Han rentas bahasa" then insert(data.categories, 1, pos_category) end end -- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out, -- add an appropriate category. local postype = export.pos_lemma_or_nonlemma(data.pos_category) if not postype then -- We don't know what this category is, so tag it with a tracking category. 
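-- A minimal sketch, not part of this module, of the best-guess fallback inside
-- pos_lemma_or_nonlemma: the call just above does not pass best_guess, so an unrecognised
-- POS comes back as nil and is tracked instead; when best_guess is requested, the
-- classification reduces to the check below. guess_postype is a hypothetical name.
local function guess_postype(plpos)
    if plpos:find("^Bentuk ") then
        return "Bentuk bukan lema"
    end
    return "Lema"
end

-- guess_postype("Bentuk kata kerja")  --> "Bentuk bukan lema"
-- guess_postype("Kata nama")          --> "Lema"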
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]] track("unrecognized pos", data.lang) -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS/LANGCODE]] track("unrecognized pos/pos/" .. data.pos_category, data.lang) elseif not data.noposcat then insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname) end -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname) end ------------ 5. Create a default headword, and add links to multiword page names. ------------ -- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be -- reconstructed. local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!") local lang_reconstructed = data.lang:hasType("reconstructed") if is_anti_asterisk then if not lang_reconstructed then error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages") end lang_reconstructed = false end -- Determine if term is reconstructed local is_reconstructed = namespace == "Rekonstruksi" or data.lang:hasType("reconstructed") -- Create a default headword based on the pagename, which is determined in -- advance by the data module so that it only needs to be done once. local default_head = page.pagename -- Add links to multi-word page names when appropriate if not (is_reconstructed or data.nolinkhead) then local no_links = m_headword_data.no_multiword_links if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then default_head = export.add_multiword_links(default_head, true) end end if is_reconstructed then default_head = "*" .. default_head end ------------ 6. Check the namespace against the language type. ------------ if namespace == "" then if lang_reconstructed then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ") elseif data.lang:hasType("appendix-constructed") then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ") end elseif namespace == "Petikan" or namespace == "Tesaurus" then error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .") end ------------ 7. Fill in missing values in `data.heads`. ------------ -- True if any script among the headword scripts has spaces in it. local any_script_has_spaces = false -- True if any term has a redundant head= param. local has_redundant_head_param = false for _, head in ipairs(data.heads) do ------ 7a. If missing head, replace with default head. if not head.term then head.term = default_head elseif head.term == default_head then has_redundant_head_param = true elseif is_anti_asterisk and head.term == "!!" then -- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head. head.term = "!!" .. default_head elseif head.term:find("^[!?]$") then -- If explicit head= just consists of ! or ?, add it to the end of the default head. head.term = default_head .. 
head.term end head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term if is_reconstructed then local head_term = head.term if head_term:find("%[%[") then head_term = remove_links(head_term) end if head_term:sub(1, 1) ~= "*" then error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.") end end ------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence, ------ otherwise fall back to the overall script if given. If neither given, autodetect the script. local auto_sc = data.lang:findBestScript(head.term) if ( auto_sc:getCode() == "None" and find_best_script_without_lang(head.term):getCode() ~= "None" ) then insert(data.categories, "Perkataan dalam bentuk tulisan tidak piawai bahasa " .. full_langname ) end if not (head.sc or data.sc) then -- No script code given, so use autodetected script. head.sc = auto_sc else if not head.sc then -- Overall script code given. head.sc = data.sc end -- Track uses of sc parameter. if head.sc:getCode() == auto_sc:getCode() then track("redundant script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname ) end else track("non-redundant manual script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname ) end end end -- If using a discouraged character sequence, add to maintenance category. if head.sc:hasNormalizationFixes() == true then local composed_head = toNFC(head.term) if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan") end end any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces() ------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given ------ (provided automatic translit is available, e.g. not in Persian or Hebrew). -- Make transliterations head.tr_manual = nil -- Try to generate a transliteration if necessary if head.tr == "-" then head.tr = nil else local notranslit = m_headword_data.notranslit if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then head.tr_manual = not not head.tr local text = head.term_no_initial_bang_bang if not data.lang:link_tr(head.sc) then text = remove_links(text) end local automated_tr = data.lang:transliterate(text, head.sc) if automated_tr then local manual_tr = head.tr if manual_tr then if remove_links(manual_tr) == remove_links(automated_tr) then insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah") else insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah") end end if not manual_tr then head.tr = automated_tr end end -- There is still no transliteration? -- Add the entry to a cleanup category. if not head.tr then head.tr = "<small>transliteration needed</small>" -- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar. -- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]]. insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname) else -- Otherwise, trim it. head.tr = trim(head.tr) end end end -- Link to the transliteration entry for languages that require this. 
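-- A minimal sketch, not part of this module, of the redundancy test applied to manual
-- transliterations above: a manual transliteration counts as redundant when it equals the
-- automatic one after links are stripped from both, and non-redundant otherwise, which is
-- what decides between the two maintenance categories. strip_links stands in for
-- remove_links from [[Module:links]].
local function translit_is_redundant(manual_tr, automated_tr, strip_links)
    return strip_links(manual_tr) == strip_links(automated_tr)
end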
if head.tr and data.lang:link_tr(head.sc) then head.tr = full_link{ term = head.tr, lang = data.lang, sc = get_script("Latn"), tr = "-" } end end ------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------ -- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace. -- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.) -- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII -- might need to be expanded to a check for any Latin characters and whitespace or punctuation. local display_title -- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to -- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be -- about the best we can do (alternatively we could potentially do script detection on the pagename). local dt_script = data.heads[1].sc local dt_script_code = dt_script:getCode() local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$") local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "") if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>' elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code] or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana"))) or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>' -- Keep Han entries region-neutral in the display title. elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>' elseif namespace == "Rekonstruksi" then local matched display_title, matched = ugsub( page.full_raw_pagename, "^(Rekonstruksi:[^/]+/)(.+)$", function(before, term) return before .. tag_text(term, data.lang, dt_script) end ) if matched == 0 then display_title = nil end end -- FIXME: Generalize this. -- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab -- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title -- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other -- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word -- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages. -- FIXME: This is checking for more than one language on the page but instead needs to check if there are any -- languages using scripts other than the ones just mentioned. if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then display_title = nil end if display_title then mw.getCurrentFrame():callParserFunction( "DISPLAYTITLE", display_title ) end ------------ 9. Insert additional categories. 
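-- A minimal sketch, not part of this module, of how the display title assembled in the
-- step above (end of section 8) is applied: the raw pagename is wrapped in a span whose
-- CSS class is the detected script code and handed to the DISPLAYTITLE parser function.
-- set_display_title is a hypothetical helper.
local function set_display_title(frame, full_raw_pagename, script_code)
    local display = '<span class="' .. script_code .. '">' .. full_raw_pagename .. '</span>'
    frame:callParserFunction("DISPLAYTITLE", display)
end

-- set_display_title(mw.getCurrentFrame(), "石", "Hani")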
------------ if data.force_cat_output then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/force cat output]] track("force cat output") end if has_redundant_head_param then if not data.no_redundant_head_cat then -- This is not the right way to go about this; too many exceptions and problems due to language-specific headword -- handling customization. If we want this, it should be opt-in by a given language passing in the default headword. -- insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan parameter kepala lewah") end end -- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms". if not data.nomultiwordcat and any_script_has_spaces and postype == "lemma" then local no_multiword_cat = m_headword_data.no_multiword_cat if not (no_multiword_cat[langcode] or no_multiword_cat[full_langcode]) then -- Check for spaces or hyphens, but exclude prefixes and suffixes. -- Use the pagename, not the head= value, because the latter may have extra -- junk in it, e.g. superscripted text that throws off the algorithm. local no_hyphen = m_headword_data.hyphen_not_multiword_sep -- Exclude hyphens if the data module states that they should for this language. local checkpattern = (no_hyphen[langcode] or no_hyphen[full_langcode]) and ".[%s፡]." or ".[%s%-፡]." local is_multiword = umatch(page.pagename, checkpattern) if is_multiword and not non_categorizable(page.full_raw_pagename) then insert(data.categories, "Perkataan berbilang kata bahasa " .. full_langname) elseif not is_multiword then local long_word_threshold = m_headword_data.long_word_thresholds[langcode] or m_headword_data.long_word_thresholds[full_langcode] if long_word_threshold and ulen(page.pagename) >= long_word_threshold then insert(data.categories, "Perkataan panjang bahasa " .. full_langname) end end end end local default_sccat = m_headword_data.default_sccat if data.sccat or data.sccat == nil and (default_sccat[langcode] or default_sccat[full_langcode]) then for _, head in ipairs(data.heads) do insert(data.categories, ucfirst(data.pos_category) .. " bahasa " .. full_langname .. " dalam " .. head.sc:getDisplayForm()) end end -- Reconstructed terms often use weird combinations of scripts and realistically aren't spelled so much as notated. if namespace ~= "Rekonstruksi" then -- Map from languages to a string containing the characters to ignore when considering whether a term has -- multiple written scripts in it. Typically these are Greek or Cyrillic letters used for their phonetic -- values. local characters_to_ignore = { ["aaq"] = "αάὰ", -- Penobscot (Algonquian) ["acy"] = "δθ", -- Cypriot Arabic ["aez"] = "β", -- Aeka (Trans-New Guinea) ["anc"] = "γ", -- Ngas (Chadic/Afroasiatic) ["aou"] = "χ", -- A'ou (Kra-Dai) ["art-blk"] = "ч", -- Bolak (conlang) ["awg"] = "β", -- Anguthimri (Pama-Nyungan) ["az"] = "ь", -- Azerbaijani (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["ba"] = "ь", -- Bashkir (Turkic; Yañalif Latin spelling, c. 
1928 - 1938) ["bhp"] = "β", -- Bima (Austronesian) ["bjz"] = "β", -- Baruga (Trans-New Guinea) ["byk"] = "θ", -- Biao (Kra-Dai) ["cdy"] = "θ", -- Chadong (Kra-Dai) ["chp"] = "θ", -- Chipewyan (Athabaskan) ["cjh"] = "χ", -- Upper Chehalis (Salishan) ["clm"] = "χ", -- Klallam (Salishan) ["col"] = "χ", -- Colombia-Wenatchi (Salishan) ["coo"] = "χθ", -- Comox (Salishan) ["crx"] = "θ", -- Carrier (Athabaskan) ["ets"] = "θ", -- Yekhee (Edoid/Niger-Congo) ["ett"] = "χ", -- Etruscan (isolate; in romanizations) ["fla"] = "χ", -- Montana Salish (Salishan) ["grt"] = "་", -- Garo (South Asian Sino-Tibetan) ["gmw-gts"] = "χ", -- Gottscheerish (Bavarian variant spoken in Slovenia) ["hur"] = "χθ", -- Halkomelem (Salishan) ["itc-psa"] = "f", -- Pre-Samnite (Italic; normally written in Greek) ["izh"] = "ь", -- Ingrian (Finnic) ["kic"] = "θ", -- Kickapoo (Algonquian) ["kk"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["ky"] = "ь", -- Kyrgyz (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["lil"] = "χ", -- Lillooet (Salishan) ["lsi"] = "ꓹ", -- Lashi (Lolo-Burmese/Sino-Tibetan; represents a glottal stop) ["mhz"] = "β", -- Mor (Austronesian) ["mqn"] = "β", -- Moronene (Austronesian) ["neg"]= "ӡā", -- Negidal (Tungusic; normally in Cyrillic) ["oka"] = "χ", -- Okanagan (Salishan) ["ole"] = "θ", -- Olekha (Sino-Tibetan) ["oui"] = "γβ", -- Old Uyghur (Turkic; FIXME: others? E.g. Greek delta (δ)?) ["pox"] = "χ", -- Polabian (West Slavic) ["rif"] = "ε", -- Tarifit (Berber) ["rom"] = "Θθ", -- Romani (Indic: International Standard; two different thetas???) ["rpn"] = "β", -- Repanbitip (Austronesian) ["sah"] = "ь", -- Yakut (Turkic; 1929 - 1939 Latin spelling) ["sit-jap"] = "χ", -- Japhug (Sino-Tibetan) ["sjw"] = "θ", -- Shawnee (Algonquian) ["squ"] = "χ", -- Squamish (Salishan) ["str"] = "χθ", -- Saanich (Salishan) ["teh"] = "χ", -- Tehuelche (Chonan; spoken in Argentina) ["tep"] = "η", -- Tepecano (Uto-Aztecan) ["thp"] = "χ", -- Thompson (Salishan) ["tk"] = "ь", -- Turkmen (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["tt"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["twa"] = "χ", -- Twana (Salishan) ["wbl"] = "ы", -- Wakhi (Iranian) ["xbc"] = "ϸ", -- Bactrian (Iranian; represents š; normally written in Greek) ["yha"] = "θ", -- Baha (Kra-Dai) ["za"] = "зч", -- Zhuang (Tai/Kra-Dai); 1957-1982 alphabet used two Cyrillic letters (as well as some others like -- ƃ, ƅ, ƨ, ɯ and ɵ that look like Cyrillic or Greek but are actually Latin) ["zlw-slv"] = "χђћ", -- Slovincian (West Slavic; FIXME: χ is Greek, the other two are Cyrillic, but I'm not sure -- the currect characters are being chosen in the entry names) ["zng"] = "θ", -- Mang (Mon-Khmer) ["ztp"] = "θ", -- Loxicha Zapotec (Zapotecan) } -- Determine how many real scripts are found in the pagename, where we exclude symbols and such. We exclude -- scripts whose `character_category` is false as well as Zmth (mathematical notation symbols), which has a -- category of "Mathematical notation symbols". When counting scripts, we need to elide language-specific -- variants because e.g. Beng and as-Beng have slightly different characters but we don't want to consider them -- two different scripts (e.g. [[এৰ]] has two characters which are detected respectively as Beng and as-Beng). local seen_scripts = {} local num_seen_scripts = 0 local num_loops = 0 local canon_pagename = page.pagename local ch_to_ignore = characters_to_ignore[full_langcode] if ch_to_ignore then canon_pagename = ugsub(canon_pagename, "[" .. ch_to_ignore .. 
"]", "") end while true do if canon_pagename == "" or num_seen_scripts >= 2 or num_loops >= 10 then break end -- Make sure we don't get into a loop checking the same script over and over again; happens with e.g. [[ᠪᡳ]] num_loops = num_loops + 1 local pagename_script = find_best_script_without_lang(canon_pagename, "None only as last resort") local script_chars = pagename_script.characters if not script_chars then -- we are stuck; this happens with None break end local script_code = pagename_script:getCode() local replaced canon_pagename, replaced = ugsub(canon_pagename, "[" .. script_chars .. "]", "") if ( replaced and script_code ~= "Zmth" and (script_data or get_script_data())[script_code] and script_data[script_code].character_category ~= false ) then script_code = script_code:gsub("^.-%-", "") if not seen_scripts[script_code] then seen_scripts[script_code] = true num_seen_scripts = num_seen_scripts + 1 end end end if num_seen_scripts > 1 then insert(data.categories, "Perkataan bahasa " .. full_langname .. " dieja dalam berbilang tulisan") end end -- Categorise for unusual characters. Takes into account combining characters, so that we can categorise for characters with diacritics that aren't encoded as atomic characters (e.g. U̠). These can be in two formats: single combining characters (i.e. character + diacritic(s)) or double combining characters (i.e. character + diacritic(s) + character). Each can have any number of diacritics. local standard = data.lang:getStandardCharacters() if standard and not non_categorizable(page.full_raw_pagename) then local function char_category(char) local specials = { ["#"] = "number sign", ["("] = "parentheses", [")"] = "parentheses", ["<"] = "angle brackets", [">"] = "angle brackets", ["["] = "square brackets", ["]"] = "square brackets", ["_"] = "underscore", ["{"] = "braces", ["|"] = "vertical line", ["}"] = "braces", ["ß"] = "ẞ", ["\205\133"] = "", -- this is UTF-8 for U+0345 ( ͅ) ["\239\191\189"] = "replacement character", } char = toNFD(char) :gsub(".[\128-\191]*", function(m) local new_m = specials[m] new_m = new_m or m:uupper() return new_m end) return toNFC(char) end if full_langcode ~= "hi" and full_langcode ~= "lo" then local standard_chars_scripts = {} for _, head in ipairs(data.heads) do standard_chars_scripts[head.sc:getCode()] = true end -- Iterate over the scripts, in case there is more than one (as they can have different sets of standard characters). for code in pairs(standard_chars_scripts) do local sc_standard = data.lang:getStandardCharacters(code) if sc_standard then if page.pagename_len > 1 then local explode_standard = {} local function explode(char) explode_standard[char] = true return "" end local sc_standard = ugsub(sc_standard, page.comb_chars.combined_double, explode) sc_standard = ugsub(sc_standard,page.comb_chars.combined_single, explode) :gsub(".[\128-\191]*", explode) local num_cat_inserted for char in pairs(page.explode_pagename) do if not explode_standard[char] then if char:find("[0-9]") then if not num_cat_inserted then insert(data.categories, "Perkataan dieja dengan nombor bahasa " .. full_langname) num_cat_inserted = true end elseif ufind(char, page.emoji_pattern) then insert(data.categories, "Perkataan dieja dengan emoji bahasa " .. full_langname) else local upper = char_category(char) if not explode_standard[upper] then char = upper end insert(data.categories, "Perkataan dieja dengan " .. char .. " bahasa " .. 
full_langname) end end end end -- If a diacritic doesn't appear in any of the standard characters, also categorise for it generally. sc_standard = toNFD(sc_standard) for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_single) do if not umatch(sc_standard, diacritic) then insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. " bahasa " .. full_langname) end end for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_double) do if not umatch(sc_standard, diacritic) then insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. "◌ bahasa " .. full_langname) end end end end -- Ancient Greek, Hindi and Lao handled the old way for now, as their standard chars still need to be converted to the new format (because there are a lot of them). elseif ulen(page.pagename) ~= 1 then for character in ugmatch(page.pagename, "([^" .. standard .. "])") do local upper = char_category(character) if not umatch(upper, "[" .. standard .. "]") then character = upper end insert(data.categories, "Perkataan dieja dengan " .. character .. " bahasa " .. full_langname) end end end if data.heads[1].sc:isSystem("alphabet") then local pagename, i = page.pagename:ulower(), 2 while umatch(pagename, "(%a)" .. ("%1"):rep(i)) do i = i + 1 insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan " .. i .. " contoh huruf yang sama berturut-turut") end end -- Categorise for palindromes if not data.nopalindromecat and namespace ~= "Rekonstruksi" and ulen(page.pagename) > 2 -- FIXME: Use of first script here seems hacky. What is the clean way of doing this in the presence of -- multiple scripts? and is_palindrome(page.pagename, data.lang, data.heads[1].sc) then insert(data.categories, "Palindrom bahasa " .. full_langname) end if namespace == "" and not lang_reconstructed then for _, head in ipairs(data.heads) do if page.full_raw_pagename ~= get_link_page(remove_links(head.term), data.lang, head.sc) then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch/LANGCODE]] track("pagename spelling mismatch", data.lang) break end end end -- Add red link category if called for and we're not a "large" page, where such checks are disabled. if data.checkredlinks and not m_headword_data.large_pages[m_headword_data.pagename] then local plposcat = type(data.checkredlinks) == "string" and data.checkredlinks or data.pos_category check_red_link_inflections_top_level(data, plposcat) end -- Add to various maintenance categories. export.maintenance_cats(page, data.lang, data.categories, data.whole_page_categories) ------------ 10. Format and return headwords, genders, inflections and categories. ------------ -- Format and return all the gathered information. This may add more categories (e.g. gender/number categories), -- so make sure we do it before evaluating `data.categories`. local text = '<span class="headword-line">' .. format_headword(data) .. format_headword_genders(data) .. format_top_level_inflections(data) .. '</span>' -- Language-specific categories. local cats = format_categories( data.categories, data.lang, data.sort_key, page.encoded_pagename, data.force_cat_output or test_force_categories, data.heads[1].sc ) -- Language-agnostic categories. local whole_page_cats = format_categories( data.whole_page_categories, nil, "-" ) return text .. cats .. 
whole_page_cats end return export 300k6vdtnx49kob85k9vg814otvxtfj 281455 281454 2026-04-22T15:34:00Z Hakimi97 2668 281455 Scribunto text/plain local export = {} -- Named constants for all modules used, to make it easier to swap out sandbox versions. local debug_track_module = "Module:debug/track" local en_utilities_module = "Module:en-utilities" local gender_and_number_module = "Module:gender and number" local headword_data_module = "Module:headword/data" local headword_page_module = "Module:headword/page" local links_module = "Module:links" local load_module = "Module:load" local pages_module = "Module:pages" local palindromes_module = "Module:palindromes" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local scripts_data_module = "Module:scripts/data" local script_utilities_module = "Module:script utilities" local script_utilities_data_module = "Module:script utilities/data" local string_utilities_module = "Module:string utilities" local table_module = "Module:table" local utilities_module = "Module:utilities" local concat = table.concat local dump = mw.dumpObject local insert = table.insert local ipairs = ipairs local max = math.max local new_title = mw.title.new local pairs = pairs local require = require local toNFC = mw.ustring.toNFC local toNFD = mw.ustring.toNFD local type = type local ufind = mw.ustring.find local ugmatch = mw.ustring.gmatch local ugsub = mw.ustring.gsub local umatch = mw.ustring.match --[==[ Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.]==] local function debug_track(...) debug_track = require(debug_track_module) return debug_track(...) end local function encode_entities(...) encode_entities = require(string_utilities_module).encode_entities return encode_entities(...) end local function extend(...) extend = require(table_module).extend return extend(...) end local function find_best_script_without_lang(...) find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang return find_best_script_without_lang(...) end local function format_categories(...) format_categories = require(utilities_module).format_categories return format_categories(...) end local function format_genders(...) format_genders = require(gender_and_number_module).format_genders return format_genders(...) end local function format_pron_qualifiers(...) format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers return format_pron_qualifiers(...) end local function full_link(...) full_link = require(links_module).full_link return full_link(...) end local function get_current_L2(...) get_current_L2 = require(pages_module).get_current_L2 return get_current_L2(...) end local function get_link_page(...) get_link_page = require(links_module).get_link_page return get_link_page(...) end local function get_script(...) get_script = require(scripts_module).getByCode return get_script(...) end local function is_palindrome(...) is_palindrome = require(palindromes_module).is_palindrome return is_palindrome(...) end local function language_link(...) language_link = require(links_module).language_link return language_link(...) end local function load_data(...) load_data = require(load_module).load_data return load_data(...) 
end local function pattern_escape(...) pattern_escape = require(string_utilities_module).pattern_escape return pattern_escape(...) end local function pluralize(...) pluralize = require(en_utilities_module).pluralize return pluralize(...) end local function process_page(...) process_page = require(headword_page_module).process_page return process_page(...) end local function remove_links(...) remove_links = require(links_module).remove_links return remove_links(...) end local function shallow_copy(...) shallow_copy = require(table_module).shallowCopy return shallow_copy(...) end local function tag_text(...) tag_text = require(script_utilities_module).tag_text return tag_text(...) end local function tag_transcription(...) tag_transcription = require(script_utilities_module).tag_transcription return tag_transcription(...) end local function tag_translit(...) tag_translit = require(script_utilities_module).tag_translit return tag_translit(...) end local function trim(...) trim = require(string_utilities_module).trim return trim(...) end local function ulen(...) ulen = require(string_utilities_module).len return ulen(...) end local function ucfirst(...) ucfirst = require(string_utilities_module).ucfirst return ucfirst(...) end --[==[ Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as "foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo" has been set, "get_foo" will not be called again.]==] local m_data local function get_data() m_data = load_data(headword_data_module) return m_data end local script_data local function get_script_data() script_data = load_data(scripts_data_module) return script_data end local script_utilities_data local function get_script_utilities_data() script_utilities_data = load_data(script_utilities_data_module) return script_utilities_data end -- If set to true, categories always appear, even in non-mainspace pages local test_force_categories = false -- Add a tracking category to track entries with certain (unusually undesirable) properties. `track_id` is an identifier -- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the -- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property -- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]]. -- -- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is -- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id` -- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to -- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang` -- references at etymology-only language, both that language's code and its full parent's code are tracked. local function track(track_id, lang) local tracking_page = "headword/" .. track_id if lang and lang:hasType("etymology-only") then debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(), tracking_page .. "/" .. lang:getFullCode()} elseif lang then debug_track{tracking_page, tracking_page .. "/" .. 
lang:getCode()} else debug_track(tracking_page) end return true end local function text_in_script(text, script_code) local sc = get_script(script_code) if not sc then error("Internal error: Bad script code " .. script_code) end local characters = sc.characters local out if characters then text = ugsub(text, "%W", "") out = ufind(text, "[" .. characters .. "]") end if out then return true else return false end end local spacingPunctuation = "[%s%p]+" --[[ List of punctuation or spacing characters that are found inside of words. Used to exclude characters from the regex above. ]] local wordPunc = "-#%%&@־׳״'.·*’་•:᠊" local notWordPunc = "[^" .. wordPunc .. "]+" -- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references -- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should -- optionally contain: -- * left qualifiers in `q`, an array of strings; -- * right qualifiers in `qq`, an array of strings; -- * left labels in `l`, an array of strings; -- * right labels in `ll`, an array of strings; -- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text` -- (formatted reference text) and optionally `name` and/or `group`; -- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "". -- `formatted` is the formatted version of the term itself, and `j` is the index of the term. local function format_term_with_qualifiers_and_refs(lang, part, formatted, j) local function part_non_empty(field) local list = part[field] if not list then return nil end if type(list) ~= "table" then error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list))) end return list[1] end if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or part_non_empty("ll") or part_non_empty("refs") then formatted = format_pron_qualifiers { lang = lang, text = formatted, q = part.q, qq = part.qq, l = part.l, ll = part.ll, refs = part.refs, } end local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator if separator then formatted = separator .. formatted end return formatted end --[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==] function export.head_is_multiword(head) for possibleWordBreak in ugmatch(head, spacingPunctuation) do if umatch(possibleWordBreak, notWordPunc) then return true end end return false end do local function workaround_to_exclude_chars(s) return (ugsub(s, notWordPunc, "\2%1\1")) end --[==[Add links to a multiword head.]==] function export.add_multiword_links(head, default) head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2" if default then head = head :gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2") :gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2") end --Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]"). head = encode_entities(head, "[]", true, true) --[=[ use this when workaround is no longer needed: head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]" Remove any empty links, which could have been created above at the beginning or end of the string. 
]=] return (head :gsub("\1\2", "") :gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"})) end end local function non_categorizable(full_raw_pagename) return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or -- Unsupported titles with descriptive names. (full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`")) end local function tag_text_and_add_quals_and_refs(data, head, formatted, j) -- Add language and script wrapper. formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil) -- Add qualifiers, labels, references and separator. return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j) end -- Format a headword with transliterations. local function format_headword(data) -- Are there non-empty transliterations? local has_translits = false local has_manual_translits = false ------ Format the headwords. ------ local head_parts = {} local unique_head_parts = {} local has_multiple_heads = not not data.heads[2] for j, head in ipairs(data.heads) do if head.tr or head.ts then has_translits = true end if head.tr and head.tr_manual or head.ts then has_manual_translits = true end local formatted -- Apply processing to the headword, for formatting links and such. if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then formatted = language_link{term = head.term, lang = data.lang} else formatted = data.lang:makeDisplayText(head.term, head.sc, true) end local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j) insert(head_parts, head_part) -- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively -- rerun the text tagging and addition of qualifiers and references, using 1 for all indices. if has_multiple_heads then local unique_head_part if j == 1 then unique_head_part = head_part else unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1) end unique_head_parts[unique_head_part] = true end end local set_size = 0 if has_multiple_heads then for _ in pairs(unique_head_parts) do set_size = set_size + 1 end end if set_size == 1 then head_parts = head_parts[1] else head_parts = concat(head_parts) end if has_manual_translits then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]] track("manual-tr", data.lang) end ------ Format the transliterations and transcriptions. ------ local translits_formatted if has_translits then local translit_parts = {} for _, head in ipairs(data.heads) do if head.tr or head.ts then local this_parts = {} if head.tr then insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual)) if head.ts then insert(this_parts, " ") end end if head.ts then insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/") end insert(translit_parts, concat(this_parts)) end end translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")" local langname = data.lang:getCanonicalName() local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus") local saw_translit_page = false if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted saw_translit_page = true end -- If data.lang is an etymology-only language and we didn't find a translation page for it, fall back to the -- full parent. 
if not saw_translit_page and data.lang:hasType("etymology-only") then langname = data.lang:getFullName() transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus") if transliteration_page and transliteration_page:getContent() then translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted end end else translits_formatted = "" end ------ Paste heads and transliterations/transcriptions. ------ local lemma_gloss if data.gloss then lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>' else lemma_gloss = "" end return head_parts .. translits_formatted .. lemma_gloss end local function format_headword_genders(data) local retval = "" if data.genders and data.genders[1] then if data.gloss then retval = "," end local pos_for_cat if not data.nogendercat then local no_gender_cat = (m_data or get_data()).no_gender_cat if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")] end end local text, cats = format_genders(data.genders, data.lang, pos_for_cat) if cats then extend(data.categories, cats) end retval = retval .. "&nbsp;" .. text end return retval end -- Forward reference local format_inflections local function format_inflection_parts(data, parts) for j, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local partaccel = part.accel local face = part.face or "bold" if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then error("The face `" .. face .. "` " .. ( (script_utilities_data or get_script_utilities_data()).faces[face] and "should not be used for non-headword terms on the headword line." or "is invalid." )) end -- Here the final part 'or data.nolinkinfl' allows to have 'nolinkinfl=true' -- right into the 'data' table to disable inflection links of the entire headword -- when inflected forms aren't entry-worthy, e.g.: in Vulgar Latin local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or ( data.nolink and track("nolink") or data.nolinkinfl) local formatted if part.label then -- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable. formatted = "<i>" .. part.label .. "</i>" else -- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is -- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e. -- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections, -- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with -- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages -- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it -- to be enabled in languages with more complex scripts (e.g. Arabic). -- -- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the -- nested inflections structure? local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil -- FIXME: Temporary errors added 2025-10-03. Remove after a month or so. 
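-- Added illustrative comment (not part of the original module; the label and term are
-- hypothetical, but the field names follow the `parts` handling in this function): a
-- nested-inflection entry with per-inflection auto-transliteration enabled, roughly as a
-- headword template would build it:
--   local infl = { label = "jamak", enable_auto_translit = true, {term = "كُتُب"} }
-- With `enable_auto_translit` set here (autotrinfl=1 for all inflections, or fNautotr=1 for a
-- single inflection, in {{head}}), `tr` above is left nil and full_link() below generates the
-- transliteration automatically; without it, `tr` is forced to "-" and the transliteration is
-- suppressed.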
if part.translit then error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit") end if part.transcription then error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription") end local postprocess_annotations if part.inflections then postprocess_annotations = function(infldata) insert(infldata.annotations, format_inflections(data, part.inflections)) end end formatted = full_link( { term = not nolinkinfl and part.term or nil, alt = part.alt or (nolinkinfl and part.term or nil), lang = part.lang or data.lang, sc = part.sc or parts.sc or nil, gloss = part.gloss, pos = part.pos, lit = part.lit, id = part.id, genders = part.genders, tr = tr, ts = part.ts, accel = partaccel or parts.accel, postprocess_annotations = postprocess_annotations, }, face ) end parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part, formatted, j) end local parts_output if parts[1] then parts_output = (parts.label and " " or "") .. concat(parts) elseif parts.request then parts_output = " <small>[please provide]</small>" insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries") else parts_output = "" end local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or "" return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1) end -- Format the inflections following the headword or nested after a given inflection. Declared local above. function format_inflections(data, inflections) if inflections and inflections[1] then -- Format each inflection individually. for key, infl in ipairs(inflections) do inflections[key] = format_inflection_parts(data, infl) end return concat(inflections, ", ") else return "" end end -- Format the top-level inflections following the headword. Currently this just adds parens around the -- formatted comma-separated inflections in `data.inflections`. local function format_top_level_inflections(data) local result = format_inflections(data, data.inflections) if result ~= "" then return " (" .. result .. ")" else return result end end -- Forward reference local check_red_link_inflections -- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections) -- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the -- category), stop further processing, and return true. If no red links found, return false. local function check_red_link_inflection_parts(data, parts, plpos) for _, part in ipairs(parts) do if type(part) ~= "table" then part = {term = part} end local term = part.term if term and not term:find("%[%[") then local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil) if stripped_physical_term then local title = mw.title.new(stripped_physical_term) if title and not title:getContent() then insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines") return true end end end if part.inflections then if check_red_link_inflections(data, part.inflections, plpos) then return true end end end return false end -- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and -- consists of a label and zero or more terms, each possibly with nested inflections) for red links. 
If so, insert a -- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing, -- and return true. If no red links found, return false. function check_red_link_inflections(data, inflections, plpos) if inflections and inflections[1] then -- Check each inflection individually. for key, infl in ipairs(inflections) do if check_red_link_inflection_parts(data, infl, plpos) then return true end end end return false end -- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so, -- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further -- processing, and return true. If no red links found, return false. local function check_red_link_inflections_top_level(data, plpos) return check_red_link_inflections(data, data.inflections, plpos) end --[==[ Returns the plural form of `pos`, a raw part of speech input, which could be singular or plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to "kanji"). ]==] function export.pluralize_pos(pos) -- Make the plural form of the part of speech return (m_data or get_data()).irregular_plurals[pos] or pos:sub(-1) == "s" and pos or pluralize(pos) end --[==[ Return "lemma" if the given POS is a lemma, "non-lemma form" if a non-lemma form, or nil if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.). If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it in a smart fashion that knows when to add "-s" and when to add "-es", and also takes into account any irregular plurals. If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess based on whether it ends in " forms"; otherwise, return nil. ]==] function export.pos_lemma_or_nonlemma(plpos, best_guess) local m_headword_data = m_data or get_data() local isLemma = m_headword_data.lemmas -- Is it a lemma category? if isLemma[plpos] then return "Lema" end local plpos_no_recon = plpos:gsub("^reconstructed ", "") if isLemma[plpos_no_recon] then return "Lema" end -- Is it a nonlemma category? local isNonLemma = m_headword_data.nonlemmas if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then return "Bentuk bukan lema" end local plpos_no_mut = plpos:gsub("^mutated ", "") if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then return "Bentuk bukan lema" elseif best_guess then return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema" else return nil end end --[==[ Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural. ]==] function export.canonicalize_pos(pos) -- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away. if pos == "pre" then -- Don't throw error on 'pref' as it's an alias for "prefix". error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'") end -- Likewise for pro = pronoun. if pos == "pro" or pos == "prof" then error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'") end local m_headword_data = m_data or get_data() if m_headword_data.pos_aliases[pos] then pos = m_headword_data.pos_aliases[pos] elseif pos:sub(-1) == "f" then pos = pos:sub(1, -2) pos = "Bentuk " .. 
(m_headword_data.pos_aliases[pos] or pos) end return export.pluralize_pos(pos) end -- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a -- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by -- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given, -- no blank (zero-length) strings are present. local function init_and_find_maximum_index(data, element, allow_blank_string) local maxind = 0 if not data[element] then data[element] = {} end local typ = type(data[element]) if typ ~= "table" then error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ)) end for k, v in pairs(data[element]) do if k ~= "maxindex" then if type(k) ~= "number" then error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element)) end if k > maxind then maxind = k end if v then if type(v) ~= "string" then error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v))) end if not allow_blank_string and v == "" then error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element)) end end end end return maxind end --[==[ -- Add the page to various maintenance categories for the language and the -- whole page. These are placed in the headword somewhat arbitrarily, but -- mainly because headword templates are mandatory for entries (meaning that -- in theory it provides full coverage). -- -- This is provided as an external entry point so that modules which transclude -- information from other entries (such as {{tl|ja-see}}) can take advantage -- of this feature as well, because they are used in place of a conventional -- headword template.]==] do -- Handle any manual sortkeys that have been specified in raw categories -- by tracking if they are the same or different from the automatically- -- generated sortkey, so that we can track them in maintenance -- categories. local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) sortkey = sortkey or lang:makeSortKey(page.pagename) -- If there are raw categories with no sortkey, then they will be -- sorted based on the default MediaWiki sortkey, so we check against -- that. if tbl == true then if page.raw_defaultsort ~= sortkey then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return end local redundant, different for k in pairs(tbl) do if k == sortkey then redundant = true else different = true end end if redundant then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah") end if different then insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik") end return sortkey end function export.maintenance_cats(page, lang, lang_cats, page_cats) extend(page_cats, page.cats) lang = lang:getFull() -- since we are just generating categories local canonical = lang:getCanonicalName() local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()] if tbl then sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. 
" dengan kategori topik yang menggunakan penanda mentah") end tbl = page.wikitext_langname_cat[canonical] if tbl then handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats) insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah") end local current_L2 = get_current_L2() if current_L2 then local trimmed_L2 = trim(current_L2) local expected_L2 = "Bahasa " .. canonical if trimmed_L2 ~= expected_L2 then insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul") -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]] track("pengepala bahasa tidak betul", lang) end end end end --[==[This is the primary external entry point. {{lua|full_headword(data)}} This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line. See [[#Further explanations for full_headword()]] ]==] function export.full_headword(data) -- Prevent data from being destructively modified. local data = shallow_copy(data) ------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------ if data.getCanonicalName then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object") end if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object") end if data.id and type(data.id) ~= "string" then error("Internal error: The id in the data table should be a string.") end ------------ 2. Initialize pagename etc. ------------ local langcode = data.lang:getCode() local full_langcode = data.lang:getFullCode() local langname = data.lang:getCanonicalName() local full_langname = data.lang:getFullName() local raw_pagename = data.pagename local page local m_headword_data = m_data or get_data() if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc. -- data.pagename is often set on documentation and test pages through the pagename= parameter of various -- templates, to emulate running on that page. Having a large number of such test templates on a single -- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However, -- we don't really need to do that and can function fine without fetching and parsing the contents of a -- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser -- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to -- be found and where actual content does not live (User, Template, Module). local actual_namespace = m_headword_data.page.namespace local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or actual_namespace == "Module" page = process_page(raw_pagename, no_fetch_content) else page = m_headword_data.page end local namespace = page.namespace ------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. 
------------ if type(data.heads) == "table" and type(data.heads[1]) == "table" then -- new-style if data.translits or data.transcriptions then error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given") end else -- convert old-style `heads`, `translits` and `transcriptions` to new-style local maxind = max( init_and_find_maximum_index(data, "heads"), init_and_find_maximum_index(data, "translits", true), init_and_find_maximum_index(data, "transcriptions", true) ) for i = 1, maxind do data.heads[i] = { term = data.heads[i], tr = data.translits[i], ts = data.transcriptions[i], } end end -- Make sure there's at least one head. if not data.heads[1] then data.heads[1] = {} end ------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------ -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then data.noposcat = true end init_and_find_maximum_index(data, "categories") init_and_find_maximum_index(data, "whole_page_categories") local pos_category_already_present = false if data.categories[1] then local escaped_langname = pattern_escape(full_langname) local matches_lang_pattern = "^" .. escaped_langname .. " " for _, cat in ipairs(data.categories) do -- Does the category begin with the language name? If not, tag it with a tracking category. if not cat:find(matches_lang_pattern) then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]] track("no lang category", data.lang) end end -- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we -- throw an error below. if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "") -- Optimization to avoid inserting category already present. pos_category_already_present = true end end if not data.pos_category then error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in " .. "`data.categories`. Either specify the plural part of speech in `data.pos_category` " .. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the " .. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")." ) end -- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given. if not pos_category_already_present and not data.noposcat then local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname -- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment. if pos_category ~= "Aksara Han rentas bahasa" then insert(data.categories, 1, pos_category) end end -- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out, -- add an appropriate category. local postype = export.pos_lemma_or_nonlemma(data.pos_category) if not postype then -- We don't know what this category is, so tag it with a tracking category. 
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]] track("unrecognized pos", data.lang) -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS/LANGCODE]] track("unrecognized pos/pos/" .. data.pos_category, data.lang) elseif not data.noposcat then insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname) end -- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]] if data.altform then insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname) end ------------ 5. Create a default headword, and add links to multiword page names. ------------ -- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be -- reconstructed. local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!") local lang_reconstructed = data.lang:hasType("reconstructed") if is_anti_asterisk then if not lang_reconstructed then error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages") end lang_reconstructed = false end -- Determine if term is reconstructed local is_reconstructed = namespace == "Rekonstruksi" or data.lang:hasType("reconstructed") -- Create a default headword based on the pagename, which is determined in -- advance by the data module so that it only needs to be done once. local default_head = page.pagename -- Add links to multi-word page names when appropriate if not (is_reconstructed or data.nolinkhead) then local no_links = m_headword_data.no_multiword_links if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then default_head = export.add_multiword_links(default_head, true) end end if is_reconstructed then default_head = "*" .. default_head end ------------ 6. Check the namespace against the language type. ------------ if namespace == "" then if lang_reconstructed then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ") elseif data.lang:hasType("appendix-constructed") then error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ") end elseif namespace == "Petikan" or namespace == "Tesaurus" then error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .") end ------------ 7. Fill in missing values in `data.heads`. ------------ -- True if any script among the headword scripts has spaces in it. local any_script_has_spaces = false -- True if any term has a redundant head= param. local has_redundant_head_param = false for _, head in ipairs(data.heads) do ------ 7a. If missing head, replace with default head. if not head.term then head.term = default_head elseif head.term == default_head then has_redundant_head_param = true elseif is_anti_asterisk and head.term == "!!" then -- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head. head.term = "!!" .. default_head elseif head.term:find("^[!?]$") then -- If explicit head= just consists of ! or ?, add it to the end of the default head. head.term = default_head .. 
head.term end head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term if is_reconstructed then local head_term = head.term if head_term:find("%[%[") then head_term = remove_links(head_term) end if head_term:sub(1, 1) ~= "*" then error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.") end end ------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence, ------ otherwise fall back to the overall script if given. If neither given, autodetect the script. local auto_sc = data.lang:findBestScript(head.term) if ( auto_sc:getCode() == "None" and find_best_script_without_lang(head.term):getCode() ~= "None" ) then insert(data.categories, "Perkataan bahasa " .. full_langname .. " dalam bentuk tulisan tidak piawai") end if not (head.sc or data.sc) then -- No script code given, so use autodetected script. head.sc = auto_sc else if not head.sc then -- Overall script code given. head.sc = data.sc end -- Track uses of sc parameter. if head.sc:getCode() == auto_sc:getCode() then track("redundant script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname ) end else track("non-redundant manual script code", data.lang) if not data.no_script_code_cat then insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname ) end end end -- If using a discouraged character sequence, add to maintenance category. if head.sc:hasNormalizationFixes() == true then local composed_head = toNFC(head.term) if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan") end end any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces() ------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given ------ (provided automatic translit is available, e.g. not in Persian or Hebrew). -- Make transliterations head.tr_manual = nil -- Try to generate a transliteration if necessary if head.tr == "-" then head.tr = nil else local notranslit = m_headword_data.notranslit if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then head.tr_manual = not not head.tr local text = head.term_no_initial_bang_bang if not data.lang:link_tr(head.sc) then text = remove_links(text) end local automated_tr = data.lang:transliterate(text, head.sc) if automated_tr then local manual_tr = head.tr if manual_tr then if remove_links(manual_tr) == remove_links(automated_tr) then insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah") else insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah") end end if not manual_tr then head.tr = automated_tr end end -- There is still no transliteration? -- Add the entry to a cleanup category. if not head.tr then head.tr = "<small>transliteration needed</small>" -- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar. -- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]]. insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname) else -- Otherwise, trim it. head.tr = trim(head.tr) end end end -- Link to the transliteration entry for languages that require this. 
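-- Added note (illustrative, not from the original module): when data.lang:link_tr(head.sc)
-- returns true, the transliteration itself is turned into a link by the full_link() call below,
-- rendered in the Latin script with tr = "-" so that no further transliteration is attempted;
-- a hypothetical head.tr of "qalam" would then display as a link to the entry "qalam".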
if head.tr and data.lang:link_tr(head.sc) then head.tr = full_link{ term = head.tr, lang = data.lang, sc = get_script("Latn"), tr = "-" } end end ------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------ -- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace. -- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.) -- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII -- might need to be expanded to a check for any Latin characters and whitespace or punctuation. local display_title -- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to -- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be -- about the best we can do (alternatively we could potentially do script detection on the pagename). local dt_script = data.heads[1].sc local dt_script_code = dt_script:getCode() local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$") local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "") if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>' elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code] or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana"))) or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>' -- Keep Han entries region-neutral in the display title. elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>' elseif namespace == "Rekonstruksi" then local matched display_title, matched = ugsub( page.full_raw_pagename, "^(Rekonstruksi:[^/]+/)(.+)$", function(before, term) return before .. tag_text(term, data.lang, dt_script) end ) if matched == 0 then display_title = nil end end -- FIXME: Generalize this. -- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab -- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title -- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other -- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word -- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages. -- FIXME: This is checking for more than one language on the page but instead needs to check if there are any -- languages using scripts other than the ones just mentioned. if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then display_title = nil end if display_title then mw.getCurrentFrame():callParserFunction( "DISPLAYTITLE", display_title ) end ------------ 9. Insert additional categories. 
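-- Added overview comment (not part of the original module; "Melayu" below is only a stand-in
-- for the language's canonical name). Categories this section can emit include:
--   "Perkataan berbilang kata bahasa Melayu" (multiword lemmas)
--   "Perkataan panjang bahasa Melayu" (pagename at or above the language's long-word threshold)
--   "Perkataan bahasa Melayu dieja dalam berbilang tulisan" (more than one real script detected)
--   "Perkataan dieja dengan nombor bahasa Melayu", "Perkataan dieja dengan emoji bahasa Melayu",
--   "Perkataan dieja dengan <aksara> bahasa Melayu" (characters outside the standard set)
--   "Palindrom bahasa Melayu" (palindromic pagename longer than two characters)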
------------ if data.force_cat_output then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/force cat output]] track("force cat output") end if has_redundant_head_param then if not data.no_redundant_head_cat then -- This is not the right way to go about this; too many exceptions and problems due to language-specific headword -- handling customization. If we want this, it should be opt-in by a given language passing in the default headword. -- insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan parameter kepala lewah") end end -- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms". if not data.nomultiwordcat and any_script_has_spaces and postype == "lemma" then local no_multiword_cat = m_headword_data.no_multiword_cat if not (no_multiword_cat[langcode] or no_multiword_cat[full_langcode]) then -- Check for spaces or hyphens, but exclude prefixes and suffixes. -- Use the pagename, not the head= value, because the latter may have extra -- junk in it, e.g. superscripted text that throws off the algorithm. local no_hyphen = m_headword_data.hyphen_not_multiword_sep -- Exclude hyphens if the data module states that they should for this language. local checkpattern = (no_hyphen[langcode] or no_hyphen[full_langcode]) and ".[%s፡]." or ".[%s%-፡]." local is_multiword = umatch(page.pagename, checkpattern) if is_multiword and not non_categorizable(page.full_raw_pagename) then insert(data.categories, "Perkataan berbilang kata bahasa " .. full_langname) elseif not is_multiword then local long_word_threshold = m_headword_data.long_word_thresholds[langcode] or m_headword_data.long_word_thresholds[full_langcode] if long_word_threshold and ulen(page.pagename) >= long_word_threshold then insert(data.categories, "Perkataan panjang bahasa " .. full_langname) end end end end local default_sccat = m_headword_data.default_sccat if data.sccat or data.sccat == nil and (default_sccat[langcode] or default_sccat[full_langcode]) then for _, head in ipairs(data.heads) do insert(data.categories, ucfirst(data.pos_category) .. " bahasa " .. full_langname .. " dalam " .. head.sc:getDisplayForm()) end end -- Reconstructed terms often use weird combinations of scripts and realistically aren't spelled so much as notated. if namespace ~= "Rekonstruksi" then -- Map from languages to a string containing the characters to ignore when considering whether a term has -- multiple written scripts in it. Typically these are Greek or Cyrillic letters used for their phonetic -- values. local characters_to_ignore = { ["aaq"] = "αάὰ", -- Penobscot (Algonquian) ["acy"] = "δθ", -- Cypriot Arabic ["aez"] = "β", -- Aeka (Trans-New Guinea) ["anc"] = "γ", -- Ngas (Chadic/Afroasiatic) ["aou"] = "χ", -- A'ou (Kra-Dai) ["art-blk"] = "ч", -- Bolak (conlang) ["awg"] = "β", -- Anguthimri (Pama-Nyungan) ["az"] = "ь", -- Azerbaijani (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["ba"] = "ь", -- Bashkir (Turkic; Yañalif Latin spelling, c. 
1928 - 1938) ["bhp"] = "β", -- Bima (Austronesian) ["bjz"] = "β", -- Baruga (Trans-New Guinea) ["byk"] = "θ", -- Biao (Kra-Dai) ["cdy"] = "θ", -- Chadong (Kra-Dai) ["chp"] = "θ", -- Chipewyan (Athabaskan) ["cjh"] = "χ", -- Upper Chehalis (Salishan) ["clm"] = "χ", -- Klallam (Salishan) ["col"] = "χ", -- Columbia-Wenatchi (Salishan) ["coo"] = "χθ", -- Comox (Salishan) ["crx"] = "θ", -- Carrier (Athabaskan) ["ets"] = "θ", -- Yekhee (Edoid/Niger-Congo) ["ett"] = "χ", -- Etruscan (isolate; in romanizations) ["fla"] = "χ", -- Montana Salish (Salishan) ["grt"] = "་", -- Garo (South Asian Sino-Tibetan) ["gmw-gts"] = "χ", -- Gottscheerish (Bavarian variant spoken in Slovenia) ["hur"] = "χθ", -- Halkomelem (Salishan) ["itc-psa"] = "f", -- Pre-Samnite (Italic; normally written in Greek) ["izh"] = "ь", -- Ingrian (Finnic) ["kic"] = "θ", -- Kickapoo (Algonquian) ["kk"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["ky"] = "ь", -- Kyrgyz (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["lil"] = "χ", -- Lillooet (Salishan) ["lsi"] = "ꓹ", -- Lashi (Lolo-Burmese/Sino-Tibetan; represents a glottal stop) ["mhz"] = "β", -- Mor (Austronesian) ["mqn"] = "β", -- Moronene (Austronesian) ["neg"] = "ӡā", -- Negidal (Tungusic; normally in Cyrillic) ["oka"] = "χ", -- Okanagan (Salishan) ["ole"] = "θ", -- Olekha (Sino-Tibetan) ["oui"] = "γβ", -- Old Uyghur (Turkic; FIXME: others? E.g. Greek delta (δ)?) ["pox"] = "χ", -- Polabian (West Slavic) ["rif"] = "ε", -- Tarifit (Berber) ["rom"] = "Θθ", -- Romani (Indic: International Standard; two different thetas???) ["rpn"] = "β", -- Repanbitip (Austronesian) ["sah"] = "ь", -- Yakut (Turkic; 1929 - 1939 Latin spelling) ["sit-jap"] = "χ", -- Japhug (Sino-Tibetan) ["sjw"] = "θ", -- Shawnee (Algonquian) ["squ"] = "χ", -- Squamish (Salishan) ["str"] = "χθ", -- Saanich (Salishan) ["teh"] = "χ", -- Tehuelche (Chonan; spoken in Argentina) ["tep"] = "η", -- Tepecano (Uto-Aztecan) ["thp"] = "χ", -- Thompson (Salishan) ["tk"] = "ь", -- Turkmen (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["tt"] = "ь", -- Tatar (Turkic; Yañalif Latin spelling, c. 1928 - 1938) ["twa"] = "χ", -- Twana (Salishan) ["wbl"] = "ы", -- Wakhi (Iranian) ["xbc"] = "ϸ", -- Bactrian (Iranian; represents š; normally written in Greek) ["yha"] = "θ", -- Baha (Kra-Dai) ["za"] = "зч", -- Zhuang (Tai/Kra-Dai); 1957-1982 alphabet used two Cyrillic letters (as well as some others like -- ƃ, ƅ, ƨ, ɯ and ɵ that look like Cyrillic or Greek but are actually Latin) ["zlw-slv"] = "χђћ", -- Slovincian (West Slavic; FIXME: χ is Greek, the other two are Cyrillic, but I'm not sure -- the correct characters are being chosen in the entry names) ["zng"] = "θ", -- Mang (Mon-Khmer) ["ztp"] = "θ", -- Loxicha Zapotec (Zapotecan) } -- Determine how many real scripts are found in the pagename, where we exclude symbols and such. We exclude -- scripts whose `character_category` is false as well as Zmth (mathematical notation symbols), which has a -- category of "Mathematical notation symbols". When counting scripts, we need to elide language-specific -- variants because e.g. Beng and as-Beng have slightly different characters but we don't want to consider them -- two different scripts (e.g. [[এৰ]] has two characters which are detected respectively as Beng and as-Beng). local seen_scripts = {} local num_seen_scripts = 0 local num_loops = 0 local canon_pagename = page.pagename local ch_to_ignore = characters_to_ignore[full_langcode] if ch_to_ignore then canon_pagename = ugsub(canon_pagename, "[" .. ch_to_ignore ..
"]", "") end while true do if canon_pagename == "" or num_seen_scripts >= 2 or num_loops >= 10 then break end -- Make sure we don't get into a loop checking the same script over and over again; happens with e.g. [[ᠪᡳ]] num_loops = num_loops + 1 local pagename_script = find_best_script_without_lang(canon_pagename, "None only as last resort") local script_chars = pagename_script.characters if not script_chars then -- we are stuck; this happens with None break end local script_code = pagename_script:getCode() local replaced canon_pagename, replaced = ugsub(canon_pagename, "[" .. script_chars .. "]", "") if ( replaced and script_code ~= "Zmth" and (script_data or get_script_data())[script_code] and script_data[script_code].character_category ~= false ) then script_code = script_code:gsub("^.-%-", "") if not seen_scripts[script_code] then seen_scripts[script_code] = true num_seen_scripts = num_seen_scripts + 1 end end end if num_seen_scripts > 1 then insert(data.categories, "Perkataan bahasa " .. full_langname .. " dieja dalam berbilang tulisan") end end -- Categorise for unusual characters. Takes into account combining characters, so that we can categorise for characters with diacritics that aren't encoded as atomic characters (e.g. U̠). These can be in two formats: single combining characters (i.e. character + diacritic(s)) or double combining characters (i.e. character + diacritic(s) + character). Each can have any number of diacritics. local standard = data.lang:getStandardCharacters() if standard and not non_categorizable(page.full_raw_pagename) then local function char_category(char) local specials = { ["#"] = "number sign", ["("] = "parentheses", [")"] = "parentheses", ["<"] = "angle brackets", [">"] = "angle brackets", ["["] = "square brackets", ["]"] = "square brackets", ["_"] = "underscore", ["{"] = "braces", ["|"] = "vertical line", ["}"] = "braces", ["ß"] = "ẞ", ["\205\133"] = "", -- this is UTF-8 for U+0345 ( ͅ) ["\239\191\189"] = "replacement character", } char = toNFD(char) :gsub(".[\128-\191]*", function(m) local new_m = specials[m] new_m = new_m or m:uupper() return new_m end) return toNFC(char) end if full_langcode ~= "hi" and full_langcode ~= "lo" then local standard_chars_scripts = {} for _, head in ipairs(data.heads) do standard_chars_scripts[head.sc:getCode()] = true end -- Iterate over the scripts, in case there is more than one (as they can have different sets of standard characters). for code in pairs(standard_chars_scripts) do local sc_standard = data.lang:getStandardCharacters(code) if sc_standard then if page.pagename_len > 1 then local explode_standard = {} local function explode(char) explode_standard[char] = true return "" end local sc_standard = ugsub(sc_standard, page.comb_chars.combined_double, explode) sc_standard = ugsub(sc_standard,page.comb_chars.combined_single, explode) :gsub(".[\128-\191]*", explode) local num_cat_inserted for char in pairs(page.explode_pagename) do if not explode_standard[char] then if char:find("[0-9]") then if not num_cat_inserted then insert(data.categories, "Perkataan dieja dengan nombor bahasa " .. full_langname) num_cat_inserted = true end elseif ufind(char, page.emoji_pattern) then insert(data.categories, "Perkataan dieja dengan emoji bahasa " .. full_langname) else local upper = char_category(char) if not explode_standard[upper] then char = upper end insert(data.categories, "Perkataan dieja dengan " .. char .. " bahasa " .. 
full_langname) end end end end -- If a diacritic doesn't appear in any of the standard characters, also categorise for it generally. sc_standard = toNFD(sc_standard) for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_single) do if not umatch(sc_standard, diacritic) then insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. " bahasa " .. full_langname) end end for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_double) do if not umatch(sc_standard, diacritic) then insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. "◌ bahasa " .. full_langname) end end end end -- Ancient Greek, Hindi and Lao handled the old way for now, as their standard chars still need to be converted to the new format (because there are a lot of them). elseif ulen(page.pagename) ~= 1 then for character in ugmatch(page.pagename, "([^" .. standard .. "])") do local upper = char_category(character) if not umatch(upper, "[" .. standard .. "]") then character = upper end insert(data.categories, "Perkataan dieja dengan " .. character .. " bahasa " .. full_langname) end end end if data.heads[1].sc:isSystem("alphabet") then local pagename, i = page.pagename:ulower(), 2 while umatch(pagename, "(%a)" .. ("%1"):rep(i)) do i = i + 1 insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan " .. i .. " contoh huruf yang sama berturut-turut") end end -- Categorise for palindromes if not data.nopalindromecat and namespace ~= "Rekonstruksi" and ulen(page.pagename) > 2 -- FIXME: Use of first script here seems hacky. What is the clean way of doing this in the presence of -- multiple scripts? and is_palindrome(page.pagename, data.lang, data.heads[1].sc) then insert(data.categories, "Palindrom bahasa " .. full_langname) end if namespace == "" and not lang_reconstructed then for _, head in ipairs(data.heads) do if page.full_raw_pagename ~= get_link_page(remove_links(head.term), data.lang, head.sc) then -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch]] -- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch/LANGCODE]] track("pagename spelling mismatch", data.lang) break end end end -- Add red link category if called for and we're not a "large" page, where such checks are disabled. if data.checkredlinks and not m_headword_data.large_pages[m_headword_data.pagename] then local plposcat = type(data.checkredlinks) == "string" and data.checkredlinks or data.pos_category check_red_link_inflections_top_level(data, plposcat) end -- Add to various maintenance categories. export.maintenance_cats(page, data.lang, data.categories, data.whole_page_categories) ------------ 10. Format and return headwords, genders, inflections and categories. ------------ -- Format and return all the gathered information. This may add more categories (e.g. gender/number categories), -- so make sure we do it before evaluating `data.categories`. local text = '<span class="headword-line">' .. format_headword(data) .. format_headword_genders(data) .. format_top_level_inflections(data) .. '</span>' -- Language-specific categories. local cats = format_categories( data.categories, data.lang, data.sort_key, page.encoded_pagename, data.force_cat_output or test_force_categories, data.heads[1].sc ) -- Language-agnostic categories. local whole_page_cats = format_categories( data.whole_page_categories, nil, "-" ) return text .. cats .. 
whole_page_cats end return export iwm7l3x8d8tnsm0zzbesfxagd8a88jz Modul:affix/templates 828 10378 281460 254174 2026-04-23T04:27:49Z Hakimi97 2668 Mengemas kini mengikut padanan Wikikamus bahasa Inggeris (semakan [[en:Special:Diff/89784227|89784227]]) 281460 Scribunto text/plain local export = {} local m_affix = require("Module:affix") local m_utilities = require("Module:utilities") local en_utilities_module = "Module:en-utilities" local parameter_utilities_module = "Module:parameter utilities" local pseudo_loan_module = "Module:affix/pseudo-loan" local insert = table.insert local boolean_param = {type = "boolean"} local function is_property_key(k) return require(parameter_utilities_module).item_key_is_property(k) end local recognized_affix_types = { prefix = "awalan", pre = "awalan", suffix = "akhiran", suf = "akhiran", interfix = "jalinan", inter = "jalinan", infix = "sisipan", ["in"] = "sisipan", circumfix = "apitan", circum = "apitan", ["non-affix"] = "non-affix", naf = "non-affix", root = "non-affix", } local function pre_normalize_affix_type(data) local modtext = data.modtext modtext = modtext:match("^<(.*)>$") if not modtext then error(("Internal error: Passed-in modifier isn't surrounded by angle brackets: %s"):format(data.modtext)) end if recognized_affix_types[modtext] then modtext = "type:" .. modtext end return "<" .. modtext .. ">" end -- Parse raw arguments. A single parameter `data` is passed in, with the following fields: -- * `raw_args`: The raw arguments to parse, normally taken from `frame:getParent().args`. -- * `extra_params`: An optional function of one argument that is called on the `params` structure before parsing; its -- purpose is to specify additional allowed parameters or possibly disable parameters. -- * `has_source`: There is a source-language parameter following 1= (which becomes the "destination" language -- parameter) and preceding the terms. This is currently used for {{pseudo-loan}}. -- * `ilang`: If given, it is a language object that serves as the default for the language. If specified, there is no -- language code specified in 1=; instead the term parameters start directly at 1= (or at 2= if `has_source` is -- given). -- * `require_index_for_pos`: There is no separate |pos= parameter distinct from |pos1=, |pos2=, etc. Instead, -- specifying |pos= results in an error. -- * `dont_require_index`: Allow |foo= to be specified as a synonym for |foo1= (except for |lit=, which remains -- distinct). -- * `allow_type`: Allow |type1=, |type2=, etc. or inline <type:...> for the affix type, and allow a separate |type= -- parameter for the etymology type (FIXME: this may be confusing; consider changing the etymology type to |etype=). -- * `allow_semicolon_separator`: Allow semicolon as a separator, displaying as " or ". This requires changes in the -- display of the output, to not always put a + between the items. -- -- Note that all language parameters are allowed to be etymology-only languages. -- -- Return five values ARGS, ITEMS, LANG_OBJ, SCRIPT_OBJ, SOURCE_LANG_OBJ where ARGS is a table of the parsed arguments; -- ITEMS is the list of parsed items; LANG_OBJ is the language object corresponding to the language code specified in 1= -- (or taken from `ilang` if given); SCRIPT_OBJ is the script object corresponding to sc= (if given, otherwise nil); and -- SOURCE_LANG_OBJ is the language object corresponding to the source-language code specified in 2= (or 1= if `ilang` is -- given) if `has_source` is specified (otherwise nil). 
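--
-- A rough usage sketch (illustrative only; the template call and the extra parameter shown
-- here are hypothetical, but the call shape mirrors how export.affix() further below invokes
-- this function):
--
--     local args, items, lang, sc, source = parse_args {
--         raw_args = frame:getParent().args,  -- e.g. arguments of a call like {{af|ms|ber-|main}}
--         extra_params = function(params)     -- optional hook to permit additional parameters
--             params.nocat = {type = "boolean"}
--         end,
--         allow_type = true,
--         allow_semicolon_separator = true,
--     }
--     -- `items` holds the parsed parts and `lang` the language object from 1=; `source` is
--     -- only set when `has_source` is given (as in {{pseudo-loan}}).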
local function parse_args(data) local raw_args = data.raw_args local has_source = data.has_source local ilang = data.ilang if raw_args.lang then error("The |lang= parameter is not used by this template. Place the language code in parameter 1 instead.") end local term_index = (ilang and 1 or 2) + (has_source and 1 or 0) local params = { [term_index] = {list = true, allow_holes = true}, ["sort"] = {}, ["nocap"] = boolean_param, -- always allow this even if not used, for use with {{surf}}, which adds it } if not ilang then params[1] = {required = true, type = "language", default = "und"} end local source_index if has_source then source_index = term_index - 1 params[source_index] = {required = true, type = "language", default = "und"} end local m_param_utils = require(parameter_utilities_module) local param_mod_source = {} if not data.dont_require_index then insert(param_mod_source, -- We want to require an index for all params (or use separate_no_index, which also requires an index for the -- param corresponding to the first item). {default = true, require_index = true} ) end insert(param_mod_source, {group = {"link", "ref", "lang", "q", "l", "infl"}}) -- Override lit= to be separate from lit1=. insert(param_mod_source, {param = "lit", separate_no_index = true}) if not data.dont_require_index and not data.require_index_for_pos then -- Override pos= to be separate from pos1=. insert(param_mod_source, {param = "pos", separate_no_index = true}) end if data.allow_type then insert(param_mod_source, {param = "type", separate_no_index = true}) end local param_mods = m_param_utils.construct_param_mods(param_mod_source) if data.extra_params then data.extra_params(params) end local items, args = m_param_utils.parse_list_with_inline_modifiers_and_separate_params { params = params, param_mods = param_mods, raw_args = raw_args, termarg = term_index, parse_lang_prefix = true, track_module = "homophones", -- the inclusion of &lrm; is what [[Module:affix]] has always done default_separator = data.allow_semicolon_separator and " +&lrm; " or nil, special_separators = data.allow_semicolon_separator and {[";"] = " or "} or nil, disallow_custom_separators = not data.allow_semicolon_separator, -- For compatibility, we need to not skip completely unspecified items. It is common, for example, to do -- {{suffix|lang||foo}} to generate "+ -foo". dont_skip_items = true, -- Allow e.g. <infix> to be specified in place of <type:infix>. pre_normalize_modifiers = pre_normalize_affix_type, -- Don't pass in `lang` or `sc`, as they will be used as defaults to initialize the items, which we don't want -- (particularly for `lang`), as the code in [[Module:affix]] uses the presence of `lang` as an indicator that -- a part-specific language was explicitly given. } local lang = ilang or args[1] local source if has_source then source = args[source_index] end -- For compatibility with the prior code, we need to convert items without term or properties to nil. for i = 1, #items do local item = items[i] local saw_item_property = item.term if not saw_item_property then for k, v in pairs(item) do if is_property_key(k) then saw_item_property = true break end end end if not saw_item_property then items[i] = nil elseif item.type then -- Validate and canonicalize affix types. 
if not recognized_affix_types[item.type] then local valid_types = {} for k in pairs(recognized_affix_types) do insert(valid_types, ("'%s'"):format(k)) end table.sort(recognized_affix_types) error(("Unrecognized affix type '%s' in item %s; valid values are %s"):format( item.type, item.itemno, table.concat(valid_types, ", "))) else item.type = recognized_affix_types[item.type] end end end if args.type and args.type.default and not m_affix.etymology_types[args.type.default] then error("Unrecognized etymology type: '" .. args.type.default .. "'") end return args, items, lang, args.sc.default, source end local function augment_affix_data(data, args, lang, sc) data.lang = lang data.sc = sc data.pos = args.pos and args.pos.default data.lit = args.lit and args.lit.default data.sort_key = args.sort data.type = args.type and args.type.default data.nocap = args.nocap data.notext = args.notext data.nocat = args.nocat data.force_cat = args.force_cat data.l = args.l.default data.ll = args.ll.default data.q = args.q.default data.qq = args.qq.default data.infl = args.infl.default return data end function export.affix(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown. if not next(parts) and not args.type.default then if mw.title.getCurrentTitle().nsText == "Templat" then parts = { {term = "awalan-"}, {term = "kata dasar"}, {term = "-akhiran"} } else error("You must provide at least one part.") end end return m_affix.show_affix(augment_affix_data({ parts = parts }, args, lang, sc)) end function export.compound(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown. if not next(parts) and not args.type.default then if mw.title.getCurrentTitle().nsText == "Templat" then parts = { {term = "pertama"}, {separator = " +&lrm; ", term = "kedua"} } else error("You must provide at least one part of a compound.") end end return m_affix.show_compound(augment_affix_data({ parts = parts }, args, lang, sc)) end -- FIXME: Temporary for check in compound_like() below for old-style {{contraction}} parameters. Remove eventually. local function ine(arg) if arg == "" then return nil else return arg end end function export.compound_like(frame) local iparams = { ["lang"] = {type = "language"}, ["template"] = {}, ["text"] = {}, ["oftext"] = {}, ["cat"] = {}, ["noaffixcat"] = boolean_param, ["dont_require_index"] = boolean_param, } local iargs = require("Module:parameters").process(frame.args, iparams) local parent_args = frame:getParent().args -- Error to catch most uses of old-style parameters for {{contraction}}. (FIXME: Remove eventually.) 
local term_param = iargs.lang and 1 or 2 if ine(parent_args[term_param + 2]) and not ine(parent_args[term_param + 1]) and not ine(parent_args.tr2) and not ine(parent_args.ts2) and not ine(parent_args.t2) and not ine(parent_args.gloss2) and not ine(parent_args.g2) and not ine(parent_args.alt2) then error(("You specified a term in %s= and not one in %s=. You probably meant to use t= to specify a gloss instead. " .. "If you intended to specify two terms, put the second term in %s=."):format(term_param + 2, term_param + 1, term_param + 1)) end if not ine(parent_args[term_param + 1]) and not ine(parent_args.alt2) and not ine(parent_args.tr2) and not ine(parent_args.ts2) and ine(parent_args.g2) then error(("You specified a gender in g2= but no term in %s=. You were probably trying to specify two genders for " .. "a single term. To do that, put both genders in g=, comma-separated."):format(term_param + 1)) end local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = parent_args, extra_params = extra_params, ilang = iargs.lang, dont_require_index = iargs.dont_require_index, -- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall -- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this -- necessary? require_index_for_pos = not iargs.dont_require_index, allow_semicolon_separator = true, } local template = iargs.template local nocat = args.nocat local notext = args.notext local text = not notext and iargs.text local oftext = not notext and (iargs.oftext or text and "bagi") local cat = not nocat and iargs.cat local noaffixcat = nocat or iargs.noaffixcat if not next(parts) then if mw.title.getCurrentTitle().nsText == "Templat" then parts = { {term = "pertama"}, {separator = " +&lrm; ", term = "kedua"} } end end return m_affix.show_compound_like(augment_affix_data({ parts = parts, text = text, oftext = oftext, cat = cat, noaffixcat = noaffixcat }, args, lang, sc)) end function export.surface_analysis(frame) local function ine(arg) -- Since we're operating before calling [[Module:parameters]], we need to imitate how that module processes -- arguments, including trimming since numbered arguments don't have automatic whitespace trimming. if not arg then return arg end arg = mw.text.trim(arg) if arg == "" then arg = nil end return arg end local parent_args = frame:getParent().args local etymtext local arg1 = ine(parent_args[1]) if not arg1 then -- Allow omitted first argument to just display "By surface analysis". etymtext = "" elseif arg1:find("^%+") then -- If the first argument (normally a language code) is prefixed with a +, it's a template name. local template_name = arg1:sub(2) local new_args = {} for i, v in pairs(parent_args) do if type(i) == "number" then if i > 1 then new_args[i - 1] = v end else new_args[i] = v end end new_args.nocap = true etymtext = ", " .. frame:expandTemplate { title = template_name, args = new_args } end if etymtext then return (ine(parent_args.nocap) and "m" or "M") .. "elalui [[Lampiran:Glosari#analisis dasar|analisis dasar]]" .. 
etymtext end local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = parent_args, extra_params = extra_params, allow_type = true, allow_semicolon_separator = true, } -- There must be at least one part to display. If there are gaps, a term -- request will be shown. if not next(parts) then if mw.title.getCurrentTitle().nsText == "Templat" then parts = { {term = "pertama"}, {separator = " +&lrm; ", term = "kedua"} } else error("You must provide at least one part.") end end return m_affix.show_surface_analysis(augment_affix_data({ parts = parts }, args, lang, sc)) end local function check_max_items(items, max_allowed) if #items > max_allowed then local bad_item = items[max_allowed + 1] if bad_item.term then error(("At most %s terms can be specified but saw a term specified for term #%s") :format(max_allowed, max_allowed + 1)) else for k, v in pairs(bad_item) do if is_property_key(k) then error(("At most %s terms can be specified but saw a value for property '%s' of term #%s") :format(max_allowed, k, max_allowed + 1)) end end end error(("Internal error: Something wrong, %s items generated when there should be at most %s, but item #%s doesn't have a term or any properties") :format(#items, max_allowed, max_allowed + 1)) end end function export.circumfix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 3) local prefix = parts[1] local base = parts[2] local suffix = parts[3] -- Just to make sure someone didn't use the template in a silly way if not (prefix and base and suffix) then if mw.title.getCurrentTitle().nsText == "Templat" then prefix = {term = "apitan", alt = "awalan"} base = {term = "kata dasar"} suffix = {term = "apitan", alt = "akhiran"} else error("You must specify a prefix part, a base term and a suffix part.") end end return m_affix.show_circumfix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc)) end function export.confix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 3) local prefix = parts[1] local base = parts[3] and parts[2] or nil local suffix = parts[3] or parts[2] -- Just to make sure someone didn't use the template in a silly way if not (prefix and suffix) then if mw.title.getCurrentTitle().nsText == "Templat" then prefix = {term = "awalan"} suffix = {term = "akhiran"} else error("You must specify a prefix part, an optional base term and a suffix part.") end end return m_affix.show_confix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc)) end function export.pseudo_loan(frame) local function extra_params(params) params.notext = boolean_param params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc, source = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, has_source = true, -- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall -- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this -- necessary? 
require_index_for_pos = true, allow_semicolon_separator = true, } return require(pseudo_loan_module).show_pseudo_loan( augment_affix_data({ source = source, parts = parts }, args, lang, sc)) end function export.infix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } check_max_items(parts, 3) local base = parts[1] local infix = parts[2] -- Just to make sure someone didn't use the template in a silly way if not (base and infix) then if mw.title.getCurrentTitle().nsText == "Templat" then base = {term = "kata dasar"} infix = {term = "sisipan"} else error("You must provide a base term and an infix.") end end return m_affix.show_infix(augment_affix_data({ base = base, infix = infix }, args, lang, sc)) end function export.prefix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } local prefixes = parts local base = nil local max_prefix = 0 for k, v in pairs(prefixes) do max_prefix = math.max(k, max_prefix) end if max_prefix >= 2 then base = prefixes[max_prefix] prefixes[max_prefix] = nil end -- Just to make sure someone didn't use the template in a silly way if not next(prefixes) then if mw.title.getCurrentTitle().nsText == "Templat" then base = {term = "kata dasar"} prefixes = { {term = "awalan"} } else error("You must provide at least one prefix.") end end return m_affix.show_prefix(augment_affix_data({ prefixes = prefixes, base = base }, args, lang, sc)) end function export.suffix(frame) local function extra_params(params) params.nocat = boolean_param params.force_cat = boolean_param end local args, parts, lang, sc = parse_args { raw_args = frame:getParent().args, extra_params = extra_params, } local base = parts[1] local suffixes = {} for k, v in pairs(parts) do suffixes[k - 1] = v end -- Just to make sure someone didn't use the template in a silly way if not next(suffixes) then if mw.title.getCurrentTitle().nsText == "Templat" then base = {term = "kata dasar"} suffixes = { {term = "akhiran"} } else error("You must provide at least one suffix.") end end return m_affix.show_suffix(augment_affix_data({ base = base, suffixes = suffixes }, args, lang, sc)) end function export.derivsee(frame) local iargs = frame.args local iparams = { ["derivtype"] = {}, } local iargs = require("Module:parameters").process(frame.args, iparams) local params = { ["head"] = {}, ["id"] = {}, ["sc"] = {type = "script"}, ["pos"] = {}, } local derivtype = iargs.derivtype params[1] = {required = "true", type = "language", default = "und"} params[2] = {} local args = require("Module:parameters").process(frame:getParent().args, params) local lang = args[1] local term = args[2] or args.head local id = args.id local sc = args.sc local pos = require(en_utilities_module).pluralize(args.pos or "Perkataan") if not term then local SUBPAGE = mw.loadData("Module:headword/data").pagename if lang:hasType("reconstructed") or mw.title.getCurrentTitle().nsText == "Rekonstruksi" then term = "*" .. SUBPAGE elseif lang:hasType("appendix-constructed") then term = SUBPAGE else term = SUBPAGE end end local category = nil local langname = lang:getFullName() if (derivtype == "compound" and pos == nil) then category = "Kata majmuk dengan " .. term .. " bahasa " .. 
langname elseif derivtype == "compound" and pos == "verbs" then category = "Kata majmuk terbentuk dengan " .. term .. " bahasa " .. langname elseif derivtype == "compound" then category = "Kata majmuk dengan " .. term .. " bahasa " .. langname else category = pos .. " dengan " .. derivtype .. " " .. term .. (id and " (" .. id .. ")" or "") .. " bahasa " .. langname end return require('Module:collapsible category tree').make{ lang = lang, sc = sc, category = category, } end return export gnop5l124mqbcxemso7h12z5ouo813q Modul:affix 828 10384 281459 258373 2026-04-23T04:07:07Z Hakimi97 2668 Mengemas kini mengikut padanan Wikikamus bahasa Inggeris (semakan [[en:Special:Diff/89886462|89886462]]) 281459 Scribunto text/plain local export = {} local debug_force_cat = false -- if set to true, always display categories even on userspace pages local m_links = require("Module:links") local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local en_utilities_module = "Module:en-utilities" local etymology_module = "Module:etymology" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local utilities_module = "Module:utilities" -- Export this so the category code in [[Module:category tree/etymology]] can access it. export.affix_lang_data_module_prefix = "Module:affix/lang-data/" local rsub = m_str_utils.gsub local usub = m_str_utils.sub local ulen = m_str_utils.len local rfind = m_str_utils.find local rmatch = m_str_utils.match local pluralize = require(en_utilities_module).pluralize local u = m_str_utils.char local ucfirst = m_str_utils.ucfirst local unpack = unpack or table.unpack -- Lua 5.2 compatibility function export.affix_variants(canonical, variants) local mappings = {} for _, variant in ipairs(variants) do mappings[variant] = canonical end return mappings end function export.id_mapping(default, ids) local mapping = { default = default } if ids then for id, target in pairs(ids) do mapping[id] = target end end return mapping end function export.id_mapping_with_affix_variants(base, id_variants) local mappings = {} for id, variants in pairs(id_variants) do for _, variant in ipairs(variants) do mappings[variant] = export.id_mapping(base, {[id] = base}) end end return mappings end function export.merge_tables(...) local result = {} for i = 1, select('#', ...) do local t = select(i, ...) if t then for k, v in pairs(t) do result[k] = v end end end return result end -- Export this so the category code in [[Module:category tree/etymology]] can access it. export.langs_with_lang_specific_data = { ["az"] = true, ["fi"] = true, ["fr"] = true, ["izh"] = true, ["la"] = true, ["sah"] = true, ["tr"] = true, ["trk-pro"] = true, } local default_pos = "Perkataan" --[==[ intro: ===About different types of hyphens ("template", "display" and "lookup"):=== * The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left languages), it is different. * The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked, to indicate that a term is an affix. 
Currently this is always either the same as the template hyphen or an empty string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically: *# For East Asian languages, the display hyphen is always blank. *# For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) are allowed as template hyphens, where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen). * The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question. Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel. ===About different types of affixes ("template", "display", "link", "lookup" and "category"):=== * A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix, suffix, interfix or circumfix), but some of the older-style templates such as {{tl|suffix}}, {{tl|prefix}}, {{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. the second or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen. * A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ from the template affix for various reasons: *# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the display affix is `-käs`. *# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain languages have differences between the "template hyphen" specified in the template (which always needs to be specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of affix it is) and the display hyphen (see above), with corresponding differences between template and display affixes. * A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually the same as the display affix, but will differ in one of three circumstances: *# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers or piped links, as described above under "display affix". *# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. 
For example, in Finnish, the adjective-forming suffix {{m|fi|-kas}} appears as {{m|fi|-käs}} after front vowels, but logically both forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and intensive prefixes spelled {{m|la|in-}} (etymologically two distinct prefixes) appear variously as {{m|la|il-}}, {{m|la|im-}} or {{m|la|ir-}} before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]] to convert Finnish {{m|fi|-käs}} to {{m|fi|-kas}} for linking and categorization purposes. Note that the affixes in the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the "lookup affix"; see below. * A "stripped link affix" is a link affix that has been passed through the language's `stripDiacritics()` function, which may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but Ottoman Turkish does not). Stripped link affixes are currently what are used in category names. * A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above under link affixes. There are actually two lookup stages: *# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using lookup hyphens). Note that this lookup does not occur if an explicit display form is given using {{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded link. *# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display form passed through the language's `stripDiacritics()` function, which strips out certain diacritics, but with the lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages). The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but also allow for mappings that are not sensitive in this fashion (e.g. Russian {{m|ru|-ливый}} occurs both stressed and unstressed, but is the same prefix either way). * A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas| Category:Finnish terms suffixed with ''-kas'']]. The category affix is currently always the same as the stripped link affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the correponding display affix and regular link affix have a tatweel. As mentioned above, stripDiacritics() strips tatweel for Arabic, Persian and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template affix contains a ZWNJ, the display (and hence the link and category affixes) will have no hyphen attached in any case. 
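As a compact worked illustration of the terminology above (the template call itself is hypothetical, but it reuses the Finnish case already described): for a suffix supplied to a template as {{m|fi|-käs}}, the ''template affix'' is -käs with an ordinary Latin-script template hyphen, the ''display affix'' is likewise -käs, the ''link affix'' is -kas via the mapping in [[Module:affix/lang-data/fi]], and the ''category affix'' is also -kas, which is why such entries land in [[:Category:Finnish terms suffixed with -kas]]. For an Arabic-script suffix the same chain applies, except that the template hyphen may be a tatweel, a ZWNJ or a regular hyphen, the corresponding display hyphen is blank when ZWNJ is used, and lookups in the affix mappings always use the tatweel.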
]==] ----------------------------------------------------------------------------------------- -- Template and display hyphens -- ----------------------------------------------------------------------------------------- --[=[ Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template (in the wikicode). See above. They key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' will match 'Arab'. The value below is a string consisting of one or more hyphen characters. If there is more than one character, the default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.). Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we don't need to do script detection. Specifically, we only need to do script detection for a given language if (a) the language has multiple scripts; and (b) at least one of those scripts is listed below or in display_hyphens. ]=] local ZWNJ = u(0x200C) -- zero-width non-joiner local template_hyphens = { -- This covers all Arabic scripts. See above. ["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen ["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef" ["Mong"] = "᠊", -- FIXME! What about the following right-to-left scripts? -- Adlm (Adlam) -- Armi (Imperial Aramaic) -- Avst (Avestan) -- Cprt (Cypriot) -- Khar (Kharoshthi) -- Mand (Mandaic/Mandaean) -- Mani (Manichaean) -- Mend (Mende/Mende Kikakui) -- Narb (Old North Arabian) -- Nbat (Nabataean/Nabatean) -- Nkoo (N'Ko) -- Orkh (Orkhon runes) -- Phli (Inscriptional Pahlavi) -- Phlp (Psalter Pahlavi) -- Phlv (Book Pahlavi) -- Phnx (Phoenician) -- Prti (Inscriptional Parthian) -- Rohg (Hanifi Rohingya) -- Samr (Samaritan) -- Sarb (Old South Arabian) -- Sogd (Sogdian) -- Sogo (Old Sogdian) -- Syrc (Syriac) -- Thaa (Thaana) } -- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys -- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' -- will match 'Arab'. The value should be a single character. local lookup_hyphens = { ["Hebr"] = "־", -- This covers all Arabic scripts. See above. ["Arab"] = "ـ", } -- Default display-hyphen function. local function default_display_hyphen(script, hyph) if not hyph then return template_hyphens[script] or "-" end return hyph end local function arab_get_display_hyphen(script, hyph) if not hyph then return "ـ" -- tatweel elseif hyph == ZWNJ then return "" else return hyph end end local function no_display_hyphen(script, hyph) return "" end -- Per-script function to return the correct display hyphen given the script and template hyphen. The function should -- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in -- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen -- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'. local display_hyphens = { -- This covers all Arabic scripts. See above. 
["Arab"] = arab_get_display_hyphen, ["Bopo"] = no_display_hyphen, ["Hani"] = no_display_hyphen, ["Hans"] = no_display_hyphen, ["Hant"] = no_display_hyphen, -- The following is a mixture of several scripts. Hopefully the specs here are correct! ["Jpan"] = no_display_hyphen, ["Jurc"] = no_display_hyphen, ["Kitl"] = no_display_hyphen, ["Kits"] = no_display_hyphen, ["Laoo"] = no_display_hyphen, ["Nshu"] = no_display_hyphen, ["Shui"] = no_display_hyphen, ["Tang"] = no_display_hyphen, ["Thaa"] = no_display_hyphen, ["Thai"] = no_display_hyphen, ["Tibt"] = no_display_hyphen, } ----------------------------------------------------------------------------------------- -- Basic Utility functions -- ----------------------------------------------------------------------------------------- local function glossary_link(entry, text) text = text or entry return "[[Lampiran:Glosari#" .. entry .. "|" .. text .. "]]" end local function track(page) if type(page) == "table" then for i, pg in ipairs(page) do page[i] = "affix/" .. pg end else page = "affix/" .. page end require("Module:debug/track")(page) end local function ine(val) return val ~= "" and val or nil end ----------------------------------------------------------------------------------------- -- Compound types -- ----------------------------------------------------------------------------------------- local function make_compound_type(typ, alttext) return { text = glossary_link(typ, alttext) .. " majmuk", cat = typ .. " majmuk", } end -- Make a compound type entry with a simple rather than glossary link. -- These should be replaced with a glossary link when the entry in the glossary -- is created. local function make_non_glossary_compound_type(typ, alttext) local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]" return { text = link .. " majmuk", cat = typ .. 
" majmuk", } end local function make_raw_compound_type(typ, alttext) return { text = glossary_link(typ, alttext), cat = pluralize(typ), } end local function make_borrowing_type(typ, alttext) return { text = glossary_link(typ, alttext), borrowing_type = pluralize(typ), } end export.etymology_types = { ["adapted borrowing"] = make_borrowing_type("adapted borrowing"), ["adap"] = "adapted borrowing", ["abor"] = "adapted borrowing", ["alliterative"] = make_non_glossary_compound_type("alliterative"), ["allit"] = "alliterative", ["antonymous"] = make_non_glossary_compound_type("antonymous"), ["ant"] = "antonymous", ["bahuvrihi"] = make_compound_type("bahuvrihi", "bahuvrīhi"), ["bahu"] = "bahuvrihi", ["bv"] = "bahuvrihi", ["coordinative"] = make_compound_type("coordinative"), ["coord"] = "coordinative", ["descriptive"] = make_compound_type("descriptive"), ["desc"] = "descriptive", ["determinative"] = make_compound_type("determinative"), ["det"] = "determinative", ["dvandva"] = make_compound_type("dvandva"), ["dva"] = "dvandva", ["dvigu"] = make_compound_type("dvigu"), ["dvi"] = "dvigu", ["endocentric"] = make_compound_type("endocentric"), ["endo"] = "endocentric", ["exocentric"] = make_compound_type("exocentric"), ["exo"] = "exocentric", ["izafet I"] = make_compound_type("izafet I"), ["iz1"] = "izafet I", ["izafet II"] = make_compound_type("izafet II"), ["iz2"] = "izafet II", ["izafet III"] = make_compound_type("izafet III"), ["iz3"] = "izafet III", ["karmadharaya"] = make_compound_type("karmadharaya", "karmadhāraya"), ["karma"] = "karmadharaya", ["kd"] = "karmadharaya", ["kenning"] = make_raw_compound_type("kenning"), ["ken"] = "kenning", ["rhyming"] = make_non_glossary_compound_type("rhyming"), ["rhy"] = "rhyming", ["synonymous"] = make_non_glossary_compound_type("synonymous"), ["syn"] = "synonymous", ["tatpurusa"] = make_compound_type("tatpurusa", "tatpuruṣa"), ["tat"] = "tatpurusa", ["tp"] = "tatpurusa", } local function process_etymology_type(typ, nocap, notext, has_parts) local text_sections = {} local categories = {} local borrowing_type if typ then local typdata = export.etymology_types[typ] if type(typdata) == "string" then typdata = export.etymology_types[typdata] end if not typdata then error("Internal error: Unrecognized type '" .. typ .. "'") end local text = typdata.text if not nocap then text = ucfirst(text) end local cat = typdata.cat borrowing_type = typdata.borrowing_type local oftext = typdata.oftext or " of" if not notext then table.insert(text_sections, text) if has_parts then table.insert(text_sections, oftext) table.insert(text_sections, " ") end end if cat then table.insert(categories, cat) end end return text_sections, categories, borrowing_type end ----------------------------------------------------------------------------------------- -- Utility functions -- ----------------------------------------------------------------------------------------- -- Iterate an array up to the greatest integer index found. local function ipairs_with_gaps(t) local indices = m_table.numKeys(t) local max_index = #indices > 0 and math.max(unpack(indices)) or 0 local i = 0 return function() while i < max_index do i = i + 1 return i, t[i] end end end export.ipairs_with_gaps = ipairs_with_gaps --[==[ Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories, which are formatted by prepending the language name as found in `lang`. 
The value of an entry in `categories` can be either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>, sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be added even on userspace pages. ]==] function export.join_formatted_parts(data) local cattext local lang = data.data.lang local force_cat = data.data.force_cat or debug_force_cat if data.data.nocat then cattext = "" else for i, cat in ipairs(data.categories) do if type(cat) == "table" then data.categories[i] = require(utilities_module).format_categories(cat.cat .. " bahasa " .. lang:getFullName(), lang, cat.sort_key, cat.sort_base, force_cat) else data.categories[i] = require(utilities_module).format_categories(cat .. " bahasa " .. lang:getFullName(), lang, data.data.sort_key, nil, force_cat) end end cattext = table.concat(data.categories) end local result = table.concat(data.parts_formatted, not data.separator_already_added and " +&lrm; " or nil) .. (data.data.lit and ", secara harfiah " .. m_links.mark(data.data.lit, "gloss") or "") local q = data.data.q local qq = data.data.qq local l = data.data.l local ll = data.data.ll local infl = data.data.infl if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] or infl and infl[1] then result = require(pron_qualifier_module).format_qualifiers { lang = lang, text = result, q = q, qq = qq, l = l, ll = ll, infl = infl, } end return result .. cattext end local function pluralize(pos) return pos end -- Remove links and call lang:stripDiacritics(term). local function strip_diacritics_no_links(lang, term) return lang:stripDiacritics(m_links.remove_links(term)) end --[=[ Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall language and script objects. This uses the overall language and script objects as defaults for the part and parses off any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we correctly do affix mapping even in the presence of fragments. ]=] local function canonicalize_part(part, lang, sc) if not part then return end -- Save the original (user-specified, part-specific) value of `lang`. If such a value is specified, we don't insert -- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than -- full_link() in [[Module:links]]. part.part_lang = part.lang part.lang = part.lang or lang part.sc = part.sc or sc local term = part.term if not term then return elseif not part.fragment then part.term, part.fragment = m_links.get_fragment(term) else part.term = m_links.get_fragment(term) end end --[==[ Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points. This should be called after `canonicalize_part()` is called on the part. This is a thin wrapper around `full_link()` in [[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used to access information for constructing the categories added by `format_derived()`. 
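A minimal usage sketch (the part values here are hypothetical): after `canonicalize_part()` has run, a caller might format a suffix part with

	local formatted = export.link_term({term = "-an", lang = data.lang, sc = data.sc}, data, true)

With no part-specific language set this simply delegates to `full_link()` in [[Module:links]]; if the part had carried its own language (`part.part_lang`), `format_derived()` in [[Module:etymology]] would be used instead. When `include_separator` is passed and the part has a `separator` (such as the default {" +&lrm; "}), that separator is prepended to the formatted result.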
]==] function export.link_term(part, data, include_separator) local result if part.part_lang then result = require(etymology_module).format_derived { lang = data.lang, terms = {part}, sources = {part.lang}, sort_key = data.sort_key, nocat = data.nocat, template_name = "affix", qualifiers_labels_on_outside = true, borrowing_type = data.borrowing_type, force_cat = data.force_cat or debug_force_cat, } else result = m_links.full_link(part, "term", nil, "show qualifiers") end if include_separator and part.separator then return part.separator .. result else return result end end local function canonicalize_script_code(scode) -- Convert fa-Arab, ur-Arab etc. to Arab. return (scode:gsub("^.*%-", "")) end ----------------------------------------------------------------------------------------- -- Affix-handling functions -- ----------------------------------------------------------------------------------------- -- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and -- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting -- default values as appropriate. Four values are returned: -- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN local function detect_script_and_hyphens(text, lang, sc) local scode -- 1. If the script is explicitly passed in, use it. if sc then scode = sc:getCode() else local possible_script_codes = lang:getScriptCodes() -- YUCK! `possible_script_codes` comes from loadData() so #possible_scripts doesn't work (always returns 0). local num_possible_script_codes = m_table.length(possible_script_codes) if num_possible_script_codes == 0 then -- This shouldn't happen; if the language has no script codes, -- the list {"None"} should be returned. error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.") end if num_possible_script_codes == 1 then -- 2. If the language has only one possible script, use it. scode = possible_script_codes[1] else -- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[] -- or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn", -- which may not be technically correct but produces the right results because Latn has all default -- values for template_hyphens[] and display_hyphens[]. local may_have_nondefault_hyphen = false for _, script_code in ipairs(possible_script_codes) do script_code = canonicalize_script_code(script_code) if template_hyphens[script_code] or display_hyphens[script_code] then may_have_nondefault_hyphen = true break end end if not may_have_nondefault_hyphen then scode = "Latn" else scode = lang:findBestScript(text):getCode() end end end scode = canonicalize_script_code(scode) local template_hyphen = template_hyphens[scode] or "-" local lookup_hyphen = lookup_hyphens[scode] or "-" local display_hyphen = display_hyphens[scode] or default_display_hyphen return scode, template_hyphen, display_hyphen, lookup_hyphen end --[=[ Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string, specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen, or nil of no relevant template hyphen is present). 
`thyph_re` is a Lua pattern (which must be enclosed in parens) that matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix). ]=] local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen) local function get_hyphen(hyph) if type(new_hyphen) == "string" then return new_hyphen end return new_hyphen(scode, hyph) end if affix_type == "non-affix" then return term elseif affix_type == "apitan" then local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re .. "(.*)$") if not before or ulen(term) <= 3 then -- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to -- a circumfix. Also, if the term is just hyphen + space + hyphen, return it. return term end return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after elseif affix_type == "sisipan" or affix_type == "jalinan" then local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "awalan" then local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$") if middle and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "akhiran" then local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. (middle or term) else error(("Internal error: Unrecognized affix type '%s'"):format(affix_type)) end end --[=[ Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are language-specific according to `lang`, and may be ID-specific according to `affix_id`. The affixes as they appear in the lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The remaining parameters to this function are used to convert from template affixes to lookup affixes; see the reconstruct_term_per_hyphens() function above. If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments at the top of the file (specifically, the comments describing lookup affixes). The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID (where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or tables with keys `affix` and `id` (again, use `false` to indicate no ID). 
This allows an affix mapping to map from one ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the ID 'not'). The Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string, specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen, or nil of no relevant template hyphen is present). `thyph_re` is a Lua pattern (which must be enclosed in parens) that matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix). ]=] local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id) local function do_lookup(affix) -- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before -- or no hyphens. local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph) local function do_lookup_for_langcode(langcode) if export.langs_with_lang_specific_data[langcode] then local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode) if langdata.affix_mappings then local mapping = langdata.affix_mappings[lookup_affix] if mapping then if type(mapping) == "table" then mapping = mapping[affix_id] or mapping.default or mapping[affix_id or false] if mapping then return mapping end else return mapping end end end end end -- If `lang` is an etymology-only language, look for a mapping both for it and its full parent. local langcode = lang:getCode() local mapping = do_lookup_for_langcode(langcode) if mapping then return mapping end local full_langcode = lang:getFullCode() if full_langcode ~= langcode then mapping = do_lookup_for_langcode(full_langcode) if mapping then return mapping end end return nil end if affix:find("%[%[") then return nil end return do_lookup(affix) or do_lookup(lang:stripDiacritics(affix)) or nil end --[==[ For a given template term in a given language (see the definition of "template affix" near the top of the file), possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"awalan"}, {"jalinan"}, {"akhiran"}, {"apitan"} or {"non-affix"}) along with the corresponding link and display affixes (see definitions near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified). The term passed in should already have any fragment (after the # sign) parsed off of it. Four values are returned: `affix_type`, `link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected; in this case, the template term need not have any attached hyphens, and the appropriate hyphens will be added in the appropriate places. If `do_affix_mapping` is specified, look up the affix in the lang-specific affix mappings, as described in the comment at the top of the file; otherwise, the link and display terms will always be the same. (They will be the same in any case if the template term has a bracketed link in it or is not an affix.) If `return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens in the appropriate places; otherwise, it is the same as the display term. 
(This functionality is used in [[Module:category tree/affixes and compounds]] to convert link affixes into lookup affixes so that they can be looked up in the affix mapping tables.) ]==] local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not term then return "non-affix", nil, nil, nil end if term == "^" then -- Indicates a null term to emulate the behavior of {{suffix|foo||bar}}. term = "" return "non-affix", term, term, term end if term:find("^%^") then -- HACK! ^ at the beginning of Korean languages has a special meaning, triggering capitalization of the -- transliteration. Don't interpret it as "force non-affix" for those languages. local langcode = lang:getCode() if langcode ~= "ko" and langcode ~= "okm" and langcode ~= "jje" then -- Formerly we allowed ^ to force non-affix type; this is now handled using an inline modifier -- <naf>, <root>, etc. Throw an error for the moment when the old way is encountered. error("Use of ^ to force non-affix status is no longer supported; use an inline modifier <naf> or <root> " .. "after the component") end end -- Remove an asterisk if the morpheme is reconstructed and add it back at the end. local reconstructed = "" if term:find("^%*") then reconstructed = "*" term = term:gsub("^%*", "") end local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc) thyph = "([" .. thyph .. "])" if not affix_type then if rfind(term, thyph .. " " .. thyph) then affix_type = "apitan" else local has_beginning_hyphen = rfind(term, "^" .. thyph) local has_ending_hyphen = rfind(term, thyph .. "$") if has_beginning_hyphen and has_ending_hyphen then affix_type = "jalinan" elseif has_ending_hyphen then affix_type = "awalan" elseif has_beginning_hyphen then affix_type = "akhiran" else affix_type = "non-affix" end end end local link_term, display_term, lookup_term if affix_type == "non-affix" then link_term = term display_term = term lookup_term = term else display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph) if do_affix_mapping then link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id) -- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping -- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in -- either case, but in the latter case we can reuse the display term, which has already been converted. if link_term then link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph) else link_term = display_term end else link_term = display_term end if return_lookup_affix then lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph) else lookup_term = display_term end end link_term = reconstructed .. link_term display_term = reconstructed .. display_term lookup_term = reconstructed .. lookup_term return affix_type, link_term, display_term, lookup_term end --[==[ Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens in that place. For example, if `affix_type` == {"awalan"}, we'll add a hyphen onto the end if it's not already there (or is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin wrapper around `parse_term_for_affixes`; see the comments above that function for more information. 
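-- A rough sketch (hypothetical call, not from the original documentation): for a
-- Latin-script language object `lang`,
--
--   local link, display, lookup = export.make_affix("an", lang, nil, "akhiran")
--
-- would yield "-an" for all three values, since the missing leading hyphen is added for a
-- suffix; for an East Asian script the display hyphen is empty, so the display term would
-- come out without any hyphen attached.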
Note that this function is exposed externally because it is called by [[Module:category tree/affixes and compounds]]; see the comment in `parse_term_for_affixes` for more information. ]==] function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not (affix_type == "awalan" or affix_type == "akhiran" or affix_type == "apitan" or affix_type == "sisipan" or affix_type == "jalinan" or affix_type == "non-affix") then error("Internal error: Invalid affix type " .. (affix_type or "(nil)")) end local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) return link_term, display_term, lookup_term end ----------------------------------------------------------------------------------------- -- Main entry points -- ----------------------------------------------------------------------------------------- --[==[ Core categorization logic for affixes. This is shared between show_affix(), show_compound_like() and get_affix_categories_only(). Returns the categories array and other metadata needed for formatting. ]==] local function generate_affix_categories(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type -- Process each part local whole_words = 0 local is_affix_or_compound = false -- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when -- processing the first part for categorization, we may access the second part and need it already canonicalized. for i, part in ipairs_with_gaps(data.parts) do part = part or {} data.parts[i] = part canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). Store them in the part -- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they -- can be used in the loop below when categorizing. part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline -- modifiers. The intention in either case is not to link the term. part.term = ine(part.affix_link_term) -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil end if not data.noaffixcat then -- Now do categorization. for i, part in ipairs_with_gaps(data.parts) do local affix_type = part.affix_type if affix_type ~= "non-affix" then is_affix_or_compound = true -- Make a sort key. For the first part, use the second part as the sort key; the intention is that if the -- term has a prefix, sorting by the prefix won't be very useful so we sort by what follows, which is -- presumably the root. local part_sort_base = nil local part_sort = part.sort or data.sort_key if i == 1 and data.parts[2] and data.parts[2].term then local part2 = data.parts[2] -- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error -- by using the alt value if available. 
part_sort_base = ine(part2.affix_link_term) or ine(part2.alt) if part_sort_base then part_sort_base = strip_diacritics_no_links(part2.lang, part_sort_base) end end if part.pos and rfind(part.pos, "patronym") then table.insert(categories, {cat = "patronim", sort_key = part_sort, sort_base = part_sort_base}) end if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then table.insert(categories, {cat = data.pos .. " diminutif", sort_key = part_sort, sort_base = part_sort_base}) end -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if ine(part.affix_link_term) and not part.part_lang then table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, part.affix_link_term) .. (part.id and " (" .. part.id .. ")" or ""), sort_key = part_sort, sort_base = part_sort_base}) end else whole_words = whole_words + 1 if whole_words == 2 then is_affix_or_compound = true table.insert(categories, data.pos .. " majmuk") end end end -- Make sure there was either an affix or a compound (two or more non-affix terms). if not is_affix_or_compound and not data.allow_no_affixes_or_compounds then error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.") end end return text_sections, categories, borrowing_type end --[==[ Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to be displayed, and contains the following: * `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below). * `.sc`: Overall script object (usually omitted). Different from term-specific script objects. * `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language different from the overall `.lang` value (in such a case, the language name is shown along with the term and an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively modified. * `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech. * `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese. * `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the compound type is displayed before the parts (normally capitalized, unless `.nocap` is given). * `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.nocat`: Disable all categorization. * `.noaffixcat`: Disable affix (and compound) categorization. Relevant for e.g. blends, which may otherwise be incorrectly categorized as compound terms. * `.lit`: Overall literal definition. Different from term-specific literal definitions. * `.force_cat`: Always display categories, even on userspace pages. * `.surface_analysis`: Implement {{surface analysis}}; adds `By surface analysis, ` before the parts. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. 
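-- A minimal sketch of a call (hypothetical values; in practice `lang` comes from
-- [[Module:languages]] and `data` is assembled by the template wrapper):
--
--   export.show_affix {
--     lang  = lang,
--     parts = { {term = "ber-"}, {term = "lari"} },   -- a prefix plus a base
--     pos   = "kata kerja",
--   }
--
-- formats the parts as links and, because "ber-" is detected as an awalan, adds a category
-- of the form "... dengan awalan ber- bahasa ..." for the entry.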
]==] function export.show_affix(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if data.surface_analysis then local text = "dengan " .. glossary_link("surface analysis") .. ", " if not data.nocap then text = ucfirst(text) end table.insert(text_sections, 1, text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Get only the categories that would be generated by show_affix(), without any text output or formatting. This is used by Module:etymon to get affix categorization. Returns an array of category objects, where each entry is either a string (simple category name) or a table with keys `cat`, `sort_key`, and `sort_base` for more complex categorization. `data` should have the same structure as passed to show_affix(): * `.lang` (required): Overall language object * `.parts` (required): Array of affix part objects with `.term`, `.lang`, `.id`, etc. * `.pos`: Part of speech (defaults to "terms") * `.sort_key`: Overall sort key for categories '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.get_affix_categories_only(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) return categories end function export.show_surface_analysis(data) data.surface_analysis = true data.allow_no_affixes_or_compounds = true return export.show_affix(data) end --[==[ Implementation of {{tl|compound}}. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type local parts_formatted = {} local pos_for_category = (data.pos == "Perkataan") and "Kata" or data.pos table.insert(categories, pos_for_category .. " majmuk") -- Make links out of all the parts local whole_words = 0 for i, part in ipairs(data.parts) do canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If the term is an interfix or the type was explicitly given, recognize it as such (which means e.g. that we -- will display the term without hyphens for East Asian languages). Otherwise, ignore the fact that it looks -- like an affix and display as specified in the template (but pay attention to the detected affix type for -- certain tracking purposes). if affix_type == "jalinan" or (part.type and part.type ~= "non-affix") then -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with -- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with' -- category in this case, or if the term is in a different language. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. 
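-- Sketch of this branch's effect (hypothetical interfix, not from the original comments):
-- a part given as "-o-" is detected as "jalinan", so a category of the form
-- "<pos> dengan jalinan -o-" is added below, unless the part is unlinked or belongs to a
-- different language.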
if link_term and link_term ~= "" and not part.part_lang then table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, link_term), sort_key = part.sort or data.sort_key}) end part.term = link_term ~= "" and link_term or nil part.alt = part.alt or (display_term ~= link_term and display_term) or nil else if affix_type ~= "non-affix" then local langcode = data.lang:getCode() -- If `data.lang` is an etymology-only language, track both using its code and its full parent's code. track { affix_type, affix_type .. "/lang/" .. langcode } local full_langcode = data.lang:getFullCode() if langcode ~= full_langcode then track(affix_type .. "/lang/" .. full_langcode) end else whole_words = whole_words + 1 end end table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if whole_words == 1 then track("one whole word") elseif whole_words == 0 then track("looks like confix") end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound_like(data) data.allow_no_affixes_or_compounds = true local text_sections, categories, borrowing_type = generate_affix_categories(data) if data.cat then table.insert(categories, data.cat) end -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if #data.parts > 0 and data.oftext then table.insert(text_sections, 1, " " .. data.oftext .. " ") end if data.text then table.insert(text_sections, 1, data.text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any relevant affix mappings. For example, if the desired affix type is "akhiran", this will (in general) add a hyphen onto the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the "display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the part will be effectively removed.) `lang` and `sc` hold overall language and script objects. Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will contain [[-käs]]. This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) where the nature of the affix is specified by the template itself rather than auto-determined from the affix, as is the case with {{tl|affix}}. '''WARNING''': This destructively modifies `part`. 
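-- A small sketch of the effect (hypothetical part, Latin script): given
--
--   local part = {term = "an"}
--   make_part_into_affix(part, lang, nil, "akhiran")
--
-- `part.term` comes out as "-an" (or as a canonicalized variant where a language-specific
-- affix mapping applies); `part.alt` stays nil unless the display form differs from the link
-- form, and `part.tr`/`part.ts`, if present, receive the same hyphenation in Latin script.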
]==] local function make_part_into_affix(part, lang, sc, affix_type) canonicalize_part(part, lang, sc) local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id) part.term = link_term -- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of -- make_affix()) are the same. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil local Latn = require(scripts_module).getByCode("Latn") part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type) part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type) end local function track_wrong_affix_type(template, part, expected_affix_type) if part and not part.type then local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc) if affix_type ~= expected_affix_type then local part_name = expected_affix_type or "base" local langcode = part.lang:getCode() local full_langcode = part.lang:getFullCode() require("Module:debug/track") { template, template .. "/" .. part_name, template .. "/" .. part_name .. "/" .. (affix_type or "none"), template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode } -- If `part.lang` is an etymology-only language, track both using its code and its full parent's code. if full_langcode ~= langcode then require("Module:debug/track")( template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. full_langcode ) end end end end local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base) -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if part.term and not part.part_lang then local cat = pos .. " dengan " .. affix_type .. " " .. make_entry_name_no_links(part.lang, part.term) .. (part.id and " (" .. part.id .. ")" or "") if sort_key or sort_base then table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base}) else table.insert(categories, cat) end end end --[==[ Implementation of {{tl|circumfix}}. '''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_circumfix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "awalan") make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran") track_wrong_affix_type("apitan", data.prefix, "awalan") track_wrong_affix_type("apitan", data.base, nil) track_wrong_affix_type("apitan", data.suffix, "akhiran") -- Create circumfix term. local circumfix = nil if data.prefix.term and data.suffix.term then circumfix = data.prefix.term .. " " .. data.suffix.term data.prefix.alt = data.prefix.alt or data.prefix.term data.suffix.alt = data.suffix.alt or data.suffix.term data.prefix.term = circumfix data.suffix.term = circumfix end -- Make links out of all the parts. 
local parts_formatted = {} local categories = {} local sort_base if data.base.term then sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) end table.insert(parts_formatted, export.link_term(data.prefix, data)) table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.suffix, data)) -- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language. if not data.prefix.part_lang then table.insert(categories, {cat=data.pos .. " dengan apitan " .. strip_diacritics_no_links(data.prefix.lang, circumfix), sort_key=data.sort_key, sort_base=sort_base}) end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|confix}}. '''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_confix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "awalan") make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran") track_wrong_affix_type("confix", data.prefix, "awalan") track_wrong_affix_type("confix", data.base, nil) track_wrong_affix_type("confix", data.suffix, "akhiran") -- Make links out of all the parts. local parts_formatted = {} local prefix_sort_base if data.base and data.base.term then prefix_sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) elseif data.suffix.term then prefix_sort_base = strip_diacritics_no_links(data.suffix.lang, data.suffix.term) end -- Insert the categories and parts. local categories = {} table.insert(parts_formatted, export.link_term(data.prefix, data)) insert_affix_category(categories, data.pos, "awalan", data.prefix, data.sort_key, prefix_sort_base) if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) end table.insert(parts_formatted, export.link_term(data.suffix, data)) -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "akhiran", data.suffix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|infix}}. '''WARNING''': This destructively modifies both `data` and `.base` and `.infix`. ]==] function export.show_infix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.infix, data.lang, data.sc, "sisipan") track_wrong_affix_type("sisipan", data.base, nil) track_wrong_affix_type("sisipan", data.infix, "sisipan") -- Make links out of all the parts. local parts_formatted = {} local categories = {} table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.infix, data)) -- Insert the categories. -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "sisipan", data.infix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|prefix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`. 
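-- Sketch of the expected shape of `data` (hypothetical values):
--
--   export.show_prefix {
--     lang     = lang,
--     prefixes = { {term = "ber"} },   -- hyphens are added automatically: "ber" -> "ber-"
--     base     = {term = "lari"},
--   }
--
-- Each prefix is passed through make_part_into_affix() with type "awalan", and a category of
-- the form "<pos> dengan awalan ber-" is added unless the prefix is empty or in another
-- language.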
]==] function export.show_prefix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. for i, prefix in ipairs(data.prefixes) do make_part_into_affix(prefix, data.lang, data.sc, "awalan") end for i, prefix in ipairs(data.prefixes) do track_wrong_affix_type("awalan", prefix, "awalan") end track_wrong_affix_type("awalan", data.base, nil) -- Make links out of all the parts. local parts_formatted = {} local first_sort_base = nil local categories = {} if data.prefixes[2] then first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.prefixes[2].lang, first_sort_base) end elseif data.base then first_sort_base = ine(data.base.term) or ine(data.base.alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.base.lang, first_sort_base) end end for i, prefix in ipairs(data.prefixes) do table.insert(parts_formatted, export.link_term(prefix, data)) insert_affix_category(categories, data.pos, "awalan", prefix, data.sort_key, i == 1 and first_sort_base or nil) end if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|suffix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`. ]==] function export.show_suffix(data) local categories = {} data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. for i, suffix in ipairs(data.suffixes) do make_part_into_affix(suffix, data.lang, data.sc, "akhiran") end track_wrong_affix_type("akhiran", data.base, nil) for i, suffix in ipairs(data.suffixes) do track_wrong_affix_type("akhiran", suffix, "akhiran") end -- Make links out of all the parts. local parts_formatted = {} if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end for i, suffix in ipairs(data.suffixes) do table.insert(parts_formatted, export.link_term(suffix, data)) end -- Insert the categories. for i, suffix in ipairs(data.suffixes) do -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "akhiran", suffix) if suffix.pos and rfind(suffix.pos, "patronym") then table.insert(categories, "patronim") end end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end return export r86ilta92vkxbd2j53n2sb0jvjy4ggi 281461 281459 2026-04-23T04:34:41Z Hakimi97 2668 281461 Scribunto text/plain local export = {} local debug_force_cat = false -- if set to true, always display categories even on userspace pages local m_links = require("Module:links") local m_str_utils = require("Module:string utilities") local m_table = require("Module:table") local en_utilities_module = "Module:en-utilities" local etymology_module = "Module:etymology" local pron_qualifier_module = "Module:pron qualifier" local scripts_module = "Module:scripts" local utilities_module = "Module:utilities" -- Export this so the category code in [[Module:category tree/etymology]] can access it. 
export.affix_lang_data_module_prefix = "Module:affix/lang-data/" local rsub = m_str_utils.gsub local usub = m_str_utils.sub local ulen = m_str_utils.len local rfind = m_str_utils.find local rmatch = m_str_utils.match local pluralize = require(en_utilities_module).pluralize local u = m_str_utils.char local ucfirst = m_str_utils.ucfirst local unpack = unpack or table.unpack -- Lua 5.2 compatibility function export.affix_variants(canonical, variants) local mappings = {} for _, variant in ipairs(variants) do mappings[variant] = canonical end return mappings end function export.id_mapping(default, ids) local mapping = { default = default } if ids then for id, target in pairs(ids) do mapping[id] = target end end return mapping end function export.id_mapping_with_affix_variants(base, id_variants) local mappings = {} for id, variants in pairs(id_variants) do for _, variant in ipairs(variants) do mappings[variant] = export.id_mapping(base, {[id] = base}) end end return mappings end function export.merge_tables(...) local result = {} for i = 1, select('#', ...) do local t = select(i, ...) if t then for k, v in pairs(t) do result[k] = v end end end return result end -- Export this so the category code in [[Module:category tree/etymology]] can access it. export.langs_with_lang_specific_data = { ["az"] = true, ["fi"] = true, ["fr"] = true, ["izh"] = true, ["la"] = true, ["sah"] = true, ["tr"] = true, ["trk-pro"] = true, } local default_pos = "Perkataan" --[==[ intro: ===About different types of hyphens ("template", "display" and "lookup"):=== * The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left languages), it is different. * The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked, to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically: *# For East Asian languages, the display hyphen is always blank. *# For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) are allowed as template hyphens, where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen). * The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question. Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel. 
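-- To make the hyphen distinctions concrete for Arabic-script affixes (summarizing the tables
-- and arab_get_display_hyphen() further down; Latin-script affixes simply use "-" for all
-- three):
--   * template hyphens accepted: tatweel, ZWNJ (U+200C) or an ordinary "-"
--   * display hyphen: tatweel stays tatweel, ZWNJ becomes the empty string, "-" is kept
--     as-is, and a missing hyphen (as in {{prefix}}/{{suffix}} calls) defaults to tatweel
--   * lookup hyphen: always tatweel, because the lang-data mappings are written with it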
===About different types of affixes ("template", "display", "link", "lookup" and "category"):=== * A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix, suffix, interfix or circumfix), but some of the older-style templates such as {{tl|suffix}}, {{tl|prefix}}, {{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. the second or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen. * A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ from the template affix for various reasons: *# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the display affix is `-käs`. *# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain languages have differences between the "template hyphen" specified in the template (which always needs to be specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of affix it is) and the display hyphen (see above), with corresponding differences between template and display affixes. * A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually the same as the display affix, but will differ in one of three circumstances: *# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers or piped links, as described above under "display affix". *# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example, in Finnish, the adjective-forming suffix {{m|fi|-kas}} appears as {{m|fi|-käs}} after front vowels, but logically both forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and intensive prefixes spelled {{m|la|in-}} (etymologically two distinct prefixes) appear variously as {{m|la|il-}}, {{m|la|im-}} or {{m|la|ir-}} before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]] to convert Finnish {{m|fi|-käs}} to {{m|fi|-kas}} for linking and categorization purposes. Note that the affixes in the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the "lookup affix"; see below. * A "stripped link affix" is a link affix that has been passed through the language's `stripDiacritics()` function, which may strip certain diacritics: e.g. 
macrons in Latin and Old English (indicating length); acute and grave accents in Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but Ottoman Turkish does not). Stripped link affixes are currently what are used in category names. * A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above under link affixes. There are actually two lookup stages: *# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using lookup hyphens). Note that this lookup does not occur if an explicit display form is given using {{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded link. *# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display form passed through the language's `stripDiacritics()` function, which strips out certain diacritics, but with the lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages). The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but also allow for mappings that are not sensitive in this fashion (e.g. Russian {{m|ru|-ливый}} occurs both stressed and unstressed, but is the same prefix either way). * A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas| Category:Finnish terms suffixed with ''-kas'']]. The category affix is currently always the same as the stripped link affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the correponding display affix and regular link affix have a tatweel. As mentioned above, stripDiacritics() strips tatweel for Arabic, Persian and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template affix contains a ZWNJ, the display (and hence the link and category affixes) will have no hyphen attached in any case. ]==] ----------------------------------------------------------------------------------------- -- Template and display hyphens -- ----------------------------------------------------------------------------------------- --[=[ Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template (in the wikicode). See above. They key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' will match 'Arab'. The value below is a string consisting of one or more hyphen characters. If there is more than one character, the default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.). Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we don't need to do script detection. Specifically, we only need to do script detection for a given language if (a) the language has multiple scripts; and (b) at least one of those scripts is listed below or in display_hyphens. 
]=] local ZWNJ = u(0x200C) -- zero-width non-joiner local template_hyphens = { -- This covers all Arabic scripts. See above. ["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen ["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef" ["Mong"] = "᠊", -- FIXME! What about the following right-to-left scripts? -- Adlm (Adlam) -- Armi (Imperial Aramaic) -- Avst (Avestan) -- Cprt (Cypriot) -- Khar (Kharoshthi) -- Mand (Mandaic/Mandaean) -- Mani (Manichaean) -- Mend (Mende/Mende Kikakui) -- Narb (Old North Arabian) -- Nbat (Nabataean/Nabatean) -- Nkoo (N'Ko) -- Orkh (Orkhon runes) -- Phli (Inscriptional Pahlavi) -- Phlp (Psalter Pahlavi) -- Phlv (Book Pahlavi) -- Phnx (Phoenician) -- Prti (Inscriptional Parthian) -- Rohg (Hanifi Rohingya) -- Samr (Samaritan) -- Sarb (Old South Arabian) -- Sogd (Sogdian) -- Sogo (Old Sogdian) -- Syrc (Syriac) -- Thaa (Thaana) } -- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys -- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab' -- will match 'Arab'. The value should be a single character. local lookup_hyphens = { ["Hebr"] = "־", -- This covers all Arabic scripts. See above. ["Arab"] = "ـ", } -- Default display-hyphen function. local function default_display_hyphen(script, hyph) if not hyph then return template_hyphens[script] or "-" end return hyph end local function arab_get_display_hyphen(script, hyph) if not hyph then return "ـ" -- tatweel elseif hyph == ZWNJ then return "" else return hyph end end local function no_display_hyphen(script, hyph) return "" end -- Per-script function to return the correct display hyphen given the script and template hyphen. The function should -- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in -- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen -- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'. local display_hyphens = { -- This covers all Arabic scripts. See above. ["Arab"] = arab_get_display_hyphen, ["Bopo"] = no_display_hyphen, ["Hani"] = no_display_hyphen, ["Hans"] = no_display_hyphen, ["Hant"] = no_display_hyphen, -- The following is a mixture of several scripts. Hopefully the specs here are correct! ["Jpan"] = no_display_hyphen, ["Jurc"] = no_display_hyphen, ["Kitl"] = no_display_hyphen, ["Kits"] = no_display_hyphen, ["Laoo"] = no_display_hyphen, ["Nshu"] = no_display_hyphen, ["Shui"] = no_display_hyphen, ["Tang"] = no_display_hyphen, ["Thaa"] = no_display_hyphen, ["Thai"] = no_display_hyphen, ["Tibt"] = no_display_hyphen, } ----------------------------------------------------------------------------------------- -- Basic Utility functions -- ----------------------------------------------------------------------------------------- local function glossary_link(entry, text) text = text or entry return "[[Lampiran:Glosari#" .. entry .. "|" .. text .. "]]" end local function track(page) if type(page) == "table" then for i, pg in ipairs(page) do page[i] = "affix/" .. pg end else page = "affix/" .. 
page end require("Module:debug/track")(page) end local function ine(val) return val ~= "" and val or nil end ----------------------------------------------------------------------------------------- -- Compound types -- ----------------------------------------------------------------------------------------- local function make_compound_type(typ, alttext) return { text = glossary_link(typ, alttext) .. " majmuk", cat = typ .. " majmuk", } end -- Make a compound type entry with a simple rather than glossary link. -- These should be replaced with a glossary link when the entry in the glossary -- is created. local function make_non_glossary_compound_type(typ, alttext) local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]" return { text = link .. " majmuk", cat = typ .. " majmuk", } end local function make_raw_compound_type(typ, alttext) return { text = glossary_link(typ, alttext), cat = pluralize(typ), } end local function make_borrowing_type(typ, alttext) return { text = glossary_link(typ, alttext), borrowing_type = pluralize(typ), } end export.etymology_types = { ["adapted borrowing"] = make_borrowing_type("adapted borrowing"), ["adap"] = "adapted borrowing", ["abor"] = "adapted borrowing", ["alliterative"] = make_non_glossary_compound_type("alliterative"), ["allit"] = "alliterative", ["antonymous"] = make_non_glossary_compound_type("antonymous"), ["ant"] = "antonymous", ["bahuvrihi"] = make_compound_type("bahuvrihi", "bahuvrīhi"), ["bahu"] = "bahuvrihi", ["bv"] = "bahuvrihi", ["coordinative"] = make_compound_type("coordinative"), ["coord"] = "coordinative", ["descriptive"] = make_compound_type("descriptive"), ["desc"] = "descriptive", ["determinative"] = make_compound_type("determinative"), ["det"] = "determinative", ["dvandva"] = make_compound_type("dvandva"), ["dva"] = "dvandva", ["dvigu"] = make_compound_type("dvigu"), ["dvi"] = "dvigu", ["endocentric"] = make_compound_type("endocentric"), ["endo"] = "endocentric", ["exocentric"] = make_compound_type("exocentric"), ["exo"] = "exocentric", ["izafet I"] = make_compound_type("izafet I"), ["iz1"] = "izafet I", ["izafet II"] = make_compound_type("izafet II"), ["iz2"] = "izafet II", ["izafet III"] = make_compound_type("izafet III"), ["iz3"] = "izafet III", ["karmadharaya"] = make_compound_type("karmadharaya", "karmadhāraya"), ["karma"] = "karmadharaya", ["kd"] = "karmadharaya", ["kenning"] = make_raw_compound_type("kenning"), ["ken"] = "kenning", ["rhyming"] = make_non_glossary_compound_type("rhyming"), ["rhy"] = "rhyming", ["synonymous"] = make_non_glossary_compound_type("synonymous"), ["syn"] = "synonymous", ["tatpurusa"] = make_compound_type("tatpurusa", "tatpuruṣa"), ["tat"] = "tatpurusa", ["tp"] = "tatpurusa", } local function process_etymology_type(typ, nocap, notext, has_parts) local text_sections = {} local categories = {} local borrowing_type if typ then local typdata = export.etymology_types[typ] if type(typdata) == "string" then typdata = export.etymology_types[typdata] end if not typdata then error("Internal error: Unrecognized type '" .. typ .. 
"'") end local text = typdata.text if not nocap then text = ucfirst(text) end local cat = typdata.cat borrowing_type = typdata.borrowing_type local oftext = typdata.oftext or " of" if not notext then table.insert(text_sections, text) if has_parts then table.insert(text_sections, oftext) table.insert(text_sections, " ") end end if cat then table.insert(categories, cat) end end return text_sections, categories, borrowing_type end ----------------------------------------------------------------------------------------- -- Utility functions -- ----------------------------------------------------------------------------------------- -- Iterate an array up to the greatest integer index found. local function ipairs_with_gaps(t) local indices = m_table.numKeys(t) local max_index = #indices > 0 and math.max(unpack(indices)) or 0 local i = 0 return function() while i < max_index do i = i + 1 return i, t[i] end end end export.ipairs_with_gaps = ipairs_with_gaps --[==[ Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories, which are formatted by prepending the language name as found in `lang`. The value of an entry in `categories` can be either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>, sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be added even on userspace pages. ]==] function export.join_formatted_parts(data) local cattext local lang = data.data.lang local force_cat = data.data.force_cat or debug_force_cat if data.data.nocat then cattext = "" else for i, cat in ipairs(data.categories) do if type(cat) == "table" then data.categories[i] = require(utilities_module).format_categories(cat.cat .. " bahasa " .. lang:getFullName(), lang, cat.sort_key, cat.sort_base, force_cat) else data.categories[i] = require(utilities_module).format_categories(cat .. " bahasa " .. lang:getFullName(), lang, data.data.sort_key, nil, force_cat) end end cattext = table.concat(data.categories) end local result = table.concat(data.parts_formatted, not data.separator_already_added and " +&lrm; " or nil) .. (data.data.lit and ", secara harfiah " .. m_links.mark(data.data.lit, "gloss") or "") local q = data.data.q local qq = data.data.qq local l = data.data.l local ll = data.data.ll local infl = data.data.infl if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] or infl and infl[1] then result = require(pron_qualifier_module).format_qualifiers { lang = lang, text = result, q = q, qq = qq, l = l, ll = ll, infl = infl, } end return result .. cattext end local function pluralize(pos) return pos end -- Remove links and call lang:stripDiacritics(term). local function strip_diacritics_no_links(lang, term) return lang:stripDiacritics(m_links.remove_links(term)) end --[=[ Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall language and script objects. This uses the overall language and script objects as defaults for the part and parses off any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we correctly do affix mapping even in the presence of fragments. ]=] local function canonicalize_part(part, lang, sc) if not part then return end -- Save the original (user-specified, part-specific) value of `lang`. 
If such a value is specified, we don't insert -- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than -- full_link() in [[Module:links]]. part.part_lang = part.lang part.lang = part.lang or lang part.sc = part.sc or sc local term = part.term if not term then return elseif not part.fragment then part.term, part.fragment = m_links.get_fragment(term) else part.term = m_links.get_fragment(term) end end --[==[ Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points. This should be called after `canonicalize_part()` is called on the part. This is a thin wrapper around `full_link()` in [[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used to access information for constructing the categories added by `format_derived()`. ]==] function export.link_term(part, data, include_separator) local result if part.part_lang then result = require(etymology_module).format_derived { lang = data.lang, terms = {part}, sources = {part.lang}, sort_key = data.sort_key, nocat = data.nocat, template_name = "affix", qualifiers_labels_on_outside = true, borrowing_type = data.borrowing_type, force_cat = data.force_cat or debug_force_cat, } else result = m_links.full_link(part, "term", nil, "show qualifiers") end if include_separator and part.separator then return part.separator .. result else return result end end local function canonicalize_script_code(scode) -- Convert fa-Arab, ur-Arab etc. to Arab. return (scode:gsub("^.*%-", "")) end ----------------------------------------------------------------------------------------- -- Affix-handling functions -- ----------------------------------------------------------------------------------------- -- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and -- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting -- default values as appropriate. Four values are returned: -- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN local function detect_script_and_hyphens(text, lang, sc) local scode -- 1. If the script is explicitly passed in, use it. if sc then scode = sc:getCode() else local possible_script_codes = lang:getScriptCodes() -- YUCK! `possible_script_codes` comes from loadData() so #possible_scripts doesn't work (always returns 0). local num_possible_script_codes = m_table.length(possible_script_codes) if num_possible_script_codes == 0 then -- This shouldn't happen; if the language has no script codes, -- the list {"None"} should be returned. error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.") end if num_possible_script_codes == 1 then -- 2. If the language has only one possible script, use it. scode = possible_script_codes[1] else -- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[] -- or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn", -- which may not be technically correct but produces the right results because Latn has all default -- values for template_hyphens[] and display_hyphens[]. 
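-- As an illustrative sketch (not from the original comments) of the four values returned
-- below: a term in a language whose only script is "Latn" yields
--   "Latn", "-", default_display_hyphen, "-"
-- while an Arabic-script term yields
--   "Arab", "ـ" .. ZWNJ .. "-", arab_get_display_hyphen, "ـ"
-- i.e. DETECTED_SCRIPT, TEMPLATE_HYPHEN(S), DISPLAY_HYPHEN, LOOKUP_HYPHEN.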
local may_have_nondefault_hyphen = false for _, script_code in ipairs(possible_script_codes) do script_code = canonicalize_script_code(script_code) if template_hyphens[script_code] or display_hyphens[script_code] then may_have_nondefault_hyphen = true break end end if not may_have_nondefault_hyphen then scode = "Latn" else scode = lang:findBestScript(text):getCode() end end end scode = canonicalize_script_code(scode) local template_hyphen = template_hyphens[scode] or "-" local lookup_hyphen = lookup_hyphens[scode] or "-" local display_hyphen = display_hyphens[scode] or default_display_hyphen return scode, template_hyphen, display_hyphen, lookup_hyphen end --[=[ Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string, specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen, or nil of no relevant template hyphen is present). `thyph_re` is a Lua pattern (which must be enclosed in parens) that matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix). ]=] local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen) local function get_hyphen(hyph) if type(new_hyphen) == "string" then return new_hyphen end return new_hyphen(scode, hyph) end if affix_type == "non-affix" then return term elseif affix_type == "apitan" then local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re .. "(.*)$") if not before or ulen(term) <= 3 then -- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to -- a circumfix. Also, if the term is just hyphen + space + hyphen, return it. return term end return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after elseif affix_type == "sisipan" or affix_type == "jalinan" then local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "awalan" then local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$") if middle and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return (middle or term) .. get_hyphen(after_hyphen) elseif affix_type == "akhiran" then local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$") if before_hyphen and ulen(term) <= 1 then -- If the term is just a hyphen, return it. return term end return get_hyphen(before_hyphen) .. (middle or term) else error(("Internal error: Unrecognized affix type '%s'"):format(affix_type)) end end --[=[ Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are language-specific according to `lang`, and may be ID-specific according to `affix_id`. 
The affixes as they appear in the lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The remaining parameters to this function are used to convert from template affixes to lookup affixes; see the reconstruct_term_per_hyphens() function above. If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments at the top of the file (specifically, the comments describing lookup affixes). The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID (where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or tables with keys `affix` and `id` (again, use `false` to indicate no ID). This allows an affix mapping to map from one ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the ID 'not'). ]=] local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id) local function do_lookup(affix) -- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before -- or no hyphens. local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph) local function do_lookup_for_langcode(langcode) if export.langs_with_lang_specific_data[langcode] then local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode) if langdata.affix_mappings then local mapping = langdata.affix_mappings[lookup_affix] if mapping then if type(mapping) == "table" then mapping = mapping[affix_id] or mapping.default or mapping[affix_id or false] if mapping then return mapping end else return mapping end end end end end -- If `lang` is an etymology-only language, look for a mapping both for it and its full parent.
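-- For instance (hypothetical codes): if `lang` were an etymology-only variety whose
-- getCode() returned "xx-dialect" while getFullCode() returned "xx", the mapping would be
-- looked up under "xx-dialect" first and then, failing that, under "xx".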
local langcode = lang:getCode() local mapping = do_lookup_for_langcode(langcode) if mapping then return mapping end local full_langcode = lang:getFullCode() if full_langcode ~= langcode then mapping = do_lookup_for_langcode(full_langcode) if mapping then return mapping end end return nil end if affix:find("%[%[") then return nil end return do_lookup(affix) or do_lookup(lang:stripDiacritics(affix)) or nil end --[==[ For a given template term in a given language (see the definition of "template affix" near the top of the file), possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"awalan"}, {"jalinan"}, {"akhiran"}, {"apitan"} or {"non-affix"}) along with the corresponding link and display affixes (see definitions near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified). The term passed in should already have any fragment (after the # sign) parsed off of it. Four values are returned: `affix_type`, `link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected; in this case, the template term need not have any attached hyphens, and the appropriate hyphens will be added in the appropriate places. If `do_affix_mapping` is specified, look up the affix in the lang-specific affix mappings, as described in the comment at the top of the file; otherwise, the link and display terms will always be the same. (They will be the same in any case if the template term has a bracketed link in it or is not an affix.) If `return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens in the appropriate places; otherwise, it is the same as the display term. (This functionality is used in [[Module:category tree/affixes and compounds]] to convert link affixes into lookup affixes so that they can be looked up in the affix mapping tables.) ]==] local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not term then return "non-affix", nil, nil, nil end if term == "^" then -- Indicates a null term to emulate the behavior of {{suffix|foo||bar}}. term = "" return "non-affix", term, term, term end if term:find("^%^") then -- HACK! ^ at the beginning of Korean languages has a special meaning, triggering capitalization of the -- transliteration. Don't interpret it as "force non-affix" for those languages. local langcode = lang:getCode() if langcode ~= "ko" and langcode ~= "okm" and langcode ~= "jje" then -- Formerly we allowed ^ to force non-affix type; this is now handled using an inline modifier -- <naf>, <root>, etc. Throw an error for the moment when the old way is encountered. error("Use of ^ to force non-affix status is no longer supported; use an inline modifier <naf> or <root> " .. "after the component") end end -- Remove an asterisk if the morpheme is reconstructed and add it back at the end. local reconstructed = "" if term:find("^%*") then reconstructed = "*" term = term:gsub("^%*", "") end local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc) thyph = "([" .. thyph .. "])" if not affix_type then if rfind(term, thyph .. " " .. thyph) then affix_type = "apitan" else local has_beginning_hyphen = rfind(term, "^" .. thyph) local has_ending_hyphen = rfind(term, thyph .. 
"$") if has_beginning_hyphen and has_ending_hyphen then affix_type = "jalinan" elseif has_ending_hyphen then affix_type = "awalan" elseif has_beginning_hyphen then affix_type = "akhiran" else affix_type = "non-affix" end end end local link_term, display_term, lookup_term if affix_type == "non-affix" then link_term = term display_term = term lookup_term = term else display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph) if do_affix_mapping then link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id) -- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping -- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in -- either case, but in the latter case we can reuse the display term, which has already been converted. if link_term then link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph) else link_term = display_term end else link_term = display_term end if return_lookup_affix then lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph) else lookup_term = display_term end end link_term = reconstructed .. link_term display_term = reconstructed .. display_term lookup_term = reconstructed .. lookup_term return affix_type, link_term, display_term, lookup_term end --[==[ Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens in that place. For example, if `affix_type` == {"awalan"}, we'll add a hyphen onto the end if it's not already there (or is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin wrapper around `parse_term_for_affixes`; see the comments above that function for more information. Note that this function is exposed externally because it is called by [[Module:category tree/affixes and compounds]]; see the comment in `parse_term_for_affixes` for more information. ]==] function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) if not (affix_type == "awalan" or affix_type == "akhiran" or affix_type == "apitan" or affix_type == "sisipan" or affix_type == "jalinan" or affix_type == "non-affix") then error("Internal error: Invalid affix type " .. (affix_type or "(nil)")) end local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id) return link_term, display_term, lookup_term end ----------------------------------------------------------------------------------------- -- Main entry points -- ----------------------------------------------------------------------------------------- --[==[ Core categorization logic for affixes. This is shared between show_affix(), show_compound_like() and get_affix_categories_only(). Returns the categories array and other metadata needed for formatting. 
]==] local function generate_affix_categories(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type -- Process each part local whole_words = 0 local is_affix_or_compound = false -- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when -- processing the first part for categorization, we may access the second part and need it already canonicalized. for i, part in ipairs_with_gaps(data.parts) do part = part or {} data.parts[i] = part canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). Store them in the part -- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they -- can be used in the loop below when categorizing. part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline -- modifiers. The intention in either case is not to link the term. part.term = ine(part.affix_link_term) -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil end if not data.noaffixcat then -- Now do categorization. for i, part in ipairs_with_gaps(data.parts) do local affix_type = part.affix_type if affix_type ~= "non-affix" then is_affix_or_compound = true -- Make a sort key. For the first part, use the second part as the sort key; the intention is that if the -- term has a prefix, sorting by the prefix won't be very useful so we sort by what follows, which is -- presumably the root. local part_sort_base = nil local part_sort = part.sort or data.sort_key if i == 1 and data.parts[2] and data.parts[2].term then local part2 = data.parts[2] -- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error -- by using the alt value if available. part_sort_base = ine(part2.affix_link_term) or ine(part2.alt) if part_sort_base then part_sort_base = strip_diacritics_no_links(part2.lang, part_sort_base) end end if part.pos and rfind(part.pos, "patronym") then table.insert(categories, {cat = "patronim", sort_key = part_sort, sort_base = part_sort_base}) end if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then table.insert(categories, {cat = data.pos .. " diminutif", sort_key = part_sort, sort_base = part_sort_base}) end -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if ine(part.affix_link_term) and not part.part_lang then table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, part.affix_link_term) .. (part.id and " (" .. part.id .. ")" or ""), sort_key = part_sort, sort_base = part_sort_base}) end else whole_words = whole_words + 1 if whole_words == 2 then is_affix_or_compound = true table.insert(categories, data.pos .. " majmuk") end end end -- Make sure there was either an affix or a compound (two or more non-affix terms). 
if not is_affix_or_compound and not data.allow_no_affixes_or_compounds then error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.") end end return text_sections, categories, borrowing_type end --[==[ Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to be displayed, and contains the following: * `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below). * `.sc`: Overall script object (usually omitted). Different from term-specific script objects. * `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language different from the overall `.lang` value (in such a case, the language name is shown along with the term and an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively modified. * `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech. * `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese. * `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the compound type is displayed before the parts (normally capitalized, unless `.nocap` is given). * `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given). * `.nocat`: Disable all categorization. * `.noaffixcat`: Disable affix (and compound) categorization. Relevant for e.g. blends, which may otherwise be incorrectly categorized as compound terms. * `.lit`: Overall literal definition. Different from term-specific literal definitions. * `.force_cat`: Always display categories, even on userspace pages. * `.surface_analysis`: Implement {{surface analysis}}; adds `By surface analysis, ` before the parts. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_affix(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if data.surface_analysis then local text = "dengan " .. glossary_link("surface analysis") .. ", " if not data.nocap then text = ucfirst(text) end table.insert(text_sections, 1, text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Get only the categories that would be generated by show_affix(), without any text output or formatting. This is used by Module:etymon to get affix categorization. Returns an array of category objects, where each entry is either a string (simple category name) or a table with keys `cat`, `sort_key`, and `sort_base` for more complex categorization. 
`data` should have the same structure as passed to show_affix(): * `.lang` (required): Overall language object * `.parts` (required): Array of affix part objects with `.term`, `.lang`, `.id`, etc. * `.pos`: Part of speech (defaults to "terms") * `.sort_key`: Overall sort key for categories '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.get_affix_categories_only(data) local text_sections, categories, borrowing_type = generate_affix_categories(data) return categories end function export.show_surface_analysis(data) data.surface_analysis = true data.allow_no_affixes_or_compounds = true return export.show_affix(data) end --[==[ Implementation of {{tl|compound}}. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) local text_sections, categories, borrowing_type = process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0) data.borrowing_type = borrowing_type local parts_formatted = {} local pos_for_category = (data.pos == "Perkataan") and "Kata" or data.pos table.insert(categories, pos_for_category .. " majmuk") -- Make links out of all the parts local whole_words = 0 for i, part in ipairs(data.parts) do canonicalize_part(part, data.lang, data.sc) -- Determine affix type and get link and display terms (see text at top of file). local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc, part.type, not part.alt, nil, part.id) -- If the term is an interfix or the type was explicitly given, recognize it as such (which means e.g. that we -- will display the term without hyphens for East Asian languages). Otherwise, ignore the fact that it looks -- like an affix and display as specified in the template (but pay attention to the detected affix type for -- certain tracking purposes). if affix_type == "jalinan" or (part.type and part.type ~= "non-affix") then -- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with -- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with' -- category in this case, or if the term is in a different language. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. if link_term and link_term ~= "" and not part.part_lang then table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, link_term), sort_key = part.sort or data.sort_key}) end part.term = link_term ~= "" and link_term or nil part.alt = part.alt or (display_term ~= link_term and display_term) or nil else if affix_type ~= "non-affix" then local langcode = data.lang:getCode() -- If `data.lang` is an etymology-only language, track both using its code and its full parent's code. track { affix_type, affix_type .. "/lang/" .. langcode } local full_langcode = data.lang:getFullCode() if langcode ~= full_langcode then track(affix_type .. "/lang/" .. 
full_langcode) end else whole_words = whole_words + 1 end end table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if whole_words == 1 then track("one whole word") elseif whole_words == 0 then track("looks like confix") end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates. '''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`. ]==] function export.show_compound_like(data) data.allow_no_affixes_or_compounds = true local text_sections, categories, borrowing_type = generate_affix_categories(data) if data.cat then table.insert(categories, data.cat) end -- Process each part for display local parts_formatted = {} for i, part in ipairs_with_gaps(data.parts) do -- Make a link for the part table.insert(parts_formatted, export.link_term(part, data, "include_separator")) end if #data.parts > 0 and data.oftext then table.insert(text_sections, 1, " " .. data.oftext .. " ") end if data.text then table.insert(text_sections, 1, data.text) end table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories, separator_already_added = true }) return table.concat(text_sections) end --[==[ Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any relevant affix mappings. For example, if the desired affix type is "akhiran", this will (in general) add a hyphen onto the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the "display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the part will be effectively removed.) `lang` and `sc` hold overall language and script objects. Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will contain [[-käs]]. This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) where the nature of the affix is specified by the template itself rather than auto-determined from the affix, as is the case with {{tl|affix}}. '''WARNING''': This destructively modifies `part`. ]==] local function make_part_into_affix(part, lang, sc, affix_type) canonicalize_part(part, lang, sc) local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id) part.term = link_term -- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of -- make_affix()) are the same. -- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being -- redundant alt text. 
part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil local Latn = require(scripts_module).getByCode("Latn") part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type) part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type) end local function track_wrong_affix_type(template, part, expected_affix_type) if part and not part.type then local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc) if affix_type ~= expected_affix_type then local part_name = expected_affix_type or "base" local langcode = part.lang:getCode() local full_langcode = part.lang:getFullCode() require("Module:debug/track") { template, template .. "/" .. part_name, template .. "/" .. part_name .. "/" .. (affix_type or "none"), template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode } -- If `part.lang` is an etymology-only language, track both using its code and its full parent's code. if full_langcode ~= langcode then require("Module:debug/track")( template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. full_langcode ) end end end end local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base) -- Don't add a '*fixed with' category if the link term is empty or is in a different language. if part.term and not part.part_lang then local cat = pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, part.term) .. (part.id and " (" .. part.id .. ")" or "") if sort_key or sort_base then table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base}) else table.insert(categories, cat) end end end --[==[ Implementation of {{tl|circumfix}}. '''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_circumfix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "awalan") make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran") track_wrong_affix_type("apitan", data.prefix, "awalan") track_wrong_affix_type("apitan", data.base, nil) track_wrong_affix_type("apitan", data.suffix, "akhiran") -- Create circumfix term. local circumfix = nil if data.prefix.term and data.suffix.term then circumfix = data.prefix.term .. " " .. data.suffix.term data.prefix.alt = data.prefix.alt or data.prefix.term data.suffix.alt = data.suffix.alt or data.suffix.term data.prefix.term = circumfix data.suffix.term = circumfix end -- Make links out of all the parts. local parts_formatted = {} local categories = {} local sort_base if data.base.term then sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) end table.insert(parts_formatted, export.link_term(data.prefix, data)) table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.suffix, data)) -- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language. if not data.prefix.part_lang then table.insert(categories, {cat=data.pos .. " dengan apitan " .. strip_diacritics_no_links(data.prefix.lang, circumfix), sort_key=data.sort_key, sort_base=sort_base}) end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|confix}}. 
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`. ]==] function export.show_confix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.prefix, data.lang, data.sc, "awalan") make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran") track_wrong_affix_type("confix", data.prefix, "awalan") track_wrong_affix_type("confix", data.base, nil) track_wrong_affix_type("confix", data.suffix, "akhiran") -- Make links out of all the parts. local parts_formatted = {} local prefix_sort_base if data.base and data.base.term then prefix_sort_base = strip_diacritics_no_links(data.base.lang, data.base.term) elseif data.suffix.term then prefix_sort_base = strip_diacritics_no_links(data.suffix.lang, data.suffix.term) end -- Insert the categories and parts. local categories = {} table.insert(parts_formatted, export.link_term(data.prefix, data)) insert_affix_category(categories, data.pos, "awalan", data.prefix, data.sort_key, prefix_sort_base) if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) end table.insert(parts_formatted, export.link_term(data.suffix, data)) -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "akhiran", data.suffix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|infix}}. '''WARNING''': This destructively modifies both `data` and `.base` and `.infix`. ]==] function export.show_infix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. make_part_into_affix(data.infix, data.lang, data.sc, "sisipan") track_wrong_affix_type("sisipan", data.base, nil) track_wrong_affix_type("sisipan", data.infix, "sisipan") -- Make links out of all the parts. local parts_formatted = {} local categories = {} table.insert(parts_formatted, export.link_term(data.base, data)) table.insert(parts_formatted, export.link_term(data.infix, data)) -- Insert the categories. -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "sisipan", data.infix) return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|prefix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`. ]==] function export.show_prefix(data) data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. for i, prefix in ipairs(data.prefixes) do make_part_into_affix(prefix, data.lang, data.sc, "awalan") end for i, prefix in ipairs(data.prefixes) do track_wrong_affix_type("awalan", prefix, "awalan") end track_wrong_affix_type("awalan", data.base, nil) -- Make links out of all the parts. 
local parts_formatted = {} local first_sort_base = nil local categories = {} if data.prefixes[2] then first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.prefixes[2].lang, first_sort_base) end elseif data.base then first_sort_base = ine(data.base.term) or ine(data.base.alt) if first_sort_base then first_sort_base = strip_diacritics_no_links(data.base.lang, first_sort_base) end end for i, prefix in ipairs(data.prefixes) do table.insert(parts_formatted, export.link_term(prefix, data)) insert_affix_category(categories, data.pos, "awalan", prefix, data.sort_key, i == 1 and first_sort_base or nil) end if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end --[==[ Implementation of {{tl|suffix}}. '''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`. ]==] function export.show_suffix(data) local categories = {} data.pos = data.pos or default_pos data.pos = pluralize(data.pos) canonicalize_part(data.base, data.lang, data.sc) -- Hyphenate the affixes and apply any affix mappings. for i, suffix in ipairs(data.suffixes) do make_part_into_affix(suffix, data.lang, data.sc, "akhiran") end track_wrong_affix_type("akhiran", data.base, nil) for i, suffix in ipairs(data.suffixes) do track_wrong_affix_type("akhiran", suffix, "akhiran") end -- Make links out of all the parts. local parts_formatted = {} if data.base then table.insert(parts_formatted, export.link_term(data.base, data)) else table.insert(parts_formatted, "") end for i, suffix in ipairs(data.suffixes) do table.insert(parts_formatted, export.link_term(suffix, data)) end -- Insert the categories. for i, suffix in ipairs(data.suffixes) do -- FIXME, should we be specifying a sort base here? insert_affix_category(categories, data.pos, "akhiran", suffix) if suffix.pos and rfind(suffix.pos, "patronym") then table.insert(categories, "patronim") end end return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories } end return export 7dyhsp9hwsg3p0jvb2vy3wh1uq9p26j elettroluminescente 0 10844 281464 245683 2026-04-23T09:42:21Z Hakimi97 2668 /* Kata sifat */ 281464 wikitext text/x-wiki == Bahasa Itali == ===Kata sifat=== {{it-adj}} # [[elektropendar]] ====Istilah berkaitan==== * [[elettroluminescenza]] ===Etimologi=== {{awalan|it|elettro|luminescente}} i7n4ti693i3zpr3htn2103fojat7jza Modul:category tree/topic/Communication 828 11523 281456 281414 2026-04-23T00:37:53Z Hakimi97 2668 Membatalkan semakan [[Special:Diff/281414|281414]] oleh [[Special:Contributions/PeaceSeekers|PeaceSeekers]] ([[User talk:PeaceSeekers|bincang]]) 281456 Scribunto text/plain local labels = {} local unpack = unpack or table.unpack -- Lua 5.2 compatibility -- FIXME: Lookup langs in the language list. for _, lang_etc in ipairs { "Arab", {"Cina", "Bahasa-bahasa Cina"}, "Inggeris", "Jerman", "Jepun", "Okinawa", "Portugis", "Sepanyol", "Vietnam", {"Melayu", "Bahasa-bahasa Melayik"}, } do if type(lang_etc) ~= "table" then lang_etc = {lang_etc} end local lang, desc = unpack(lang_etc) desc = desc or ("[[:Kategori:Bahasa %s|bahasa %s]]"):format(lang, lang) labels[lang] = { type = "berkenaan", description = "=" .. 
desc, parents = {"bahasa-bahasa"}, } end labels["komunikasi"] = { type = "berkenaan", description = "default", parents = {"Semua topik"}, } labels["huruf"] = { type = "nama", description = "default", parents = {"sistem tulisan"}, } labels["bahasa buatan"] = { -- distinguish from "cat:constructed languages" family category type = "nama", description = "={{w|constructed language}}s", parents = {"bahasa-bahasa"}, } labels["bahasa badan"] = { type = "berkenaan", description = "default", parents = {"bahasa", "nonverbal communication"}, } labels["penyiaran"] = { type = "berkenaan", description = "default", parents = {"media", "telekomunikasi"}, } labels["Komponen aksara Cina"] = { type = "set", description = "=[[komponen|Komponen]] [[aksara]] [[Cina]].", parents = {"Huruf, simbol dan tanda baca"}, } labels["diacritical marks"] = { type = "set", description = "default", parents = {"Huruf, simbol dan tanda baca"}, } labels["dialects"] = { type = "set", description = "default", parents = {"bahasa"}, } labels["dictation"] = { type = "berkenaan", description = "default", parents = {"komunikasi"}, } labels["bahasa pupus"] = { type = "nama", description = "default", parents = {"bahasa-bahasa"}, } labels["bahasa isyarat"] = { type = "nama", description = "default", parents = {"bahasa-bahasa"}, } labels["facial expressions"] = { type = "set", description = "default", parents = {"nonverbal communication", "face"}, } labels["kiasan"] = { type = "set", description = "=[[figure of speech|figures of speech]]", parents = {"retorik"}, } labels["bendera"] = { type = "berkenaan,name,type", description = "default", parents = {"komunikasi"}, } labels["jargon"] = { type = "berkenaan", description = "default", parents = {"bahasa"}, } labels["aksara Han"] = { type = "berkenaan", description = "default", parents = {"sistem tulisan"}, } labels["bahasa"] = { type = "berkenaan", description = "default", parents = {"komunikasi"}, } labels["keluarga bahasa"] = { type = "nama", description = "Topik berkenaan [[keluarga bahasa]], termasuklah yang diterima dan yang bersifat kontroversi.", parents = {"bahasa", "nama"}, } labels["bahasa-bahasa"] = { type = "nama", description = "default", parents = {"bahasa", "nama"}, } labels["Huruf, simbol dan tanda baca"] = { type = "set", description = "=[[letter]]s, [[symbol]]s, and [[punctuation]]", parents = {"Ortografi"}, } labels["logical fallacies"] = { type = "set", description = "=[[logical fallacy|logical fallacies]], clearly defined errors in reasoning used to support or refute an argument", additional = "{{also|Kategori:{{{langcode}}}:biases}}", parents = {"retorik", "logic"}, } labels["media"] = { type = "berkenaan", description = "default", parents = {"komunikasi"}, } labels["telefon bimbit"] = { type = "berkenaan,set", description = "default", parents = {"telefoni"}, } labels["nonverbal communication"] = { type = "berkenaan", description = "default", parents = {"komunikasi"}, } labels["ortografi"] = { type = "berkenaan", description = "default", parents = {"penulisan"}, } labels["palaeography"] = { type = "berkenaan", description = "default", parents = {"penulisan"}, } labels["pos"] = { type = "berkenaan", description = "=[[post#Noun|post]] or [[mail#Noun|mail]]", parents = {"komunikasi"}, } labels["postal abbreviations"] = { type = "nama", description = "default", parents = {"pos"}, } labels["public relations"] = { type = "berkenaan", description = "default no singularize", parents = {"komunikasi"}, } labels["tanda baca"] = { type = "set", description = "default", parents = 
{"Huruf, simbol dan tanda baca"}, } labels["radio"] = { type = "berkenaan", description = "default", parents = {"telekomunikasi"}, } labels["retorik"] = { type = "berkenaan", description = "default", parents = {"bahasa"}, } labels["signs"] = { type = "berkenaan,name,type", description = "default", parents = {"komunikasi"}, } labels["sociolects"] = { type = "nama", description = "default", parents = {"bahasa"}, } labels["simbol"] = { type = "set", description = "=[[symbol]]s, especially [[mathematical]] and [[scientific]] symbols", additional = "Most symbols have equivalent meanings in many languages and can therefore be found in [[:Category:Translingual symbols]].", parents = {"Huruf, simbol dan tanda baca"}, } labels["talking"] = { type = "berkenaan", description = "default", parents = {"bahasa", "tingkah laku manusia"}, } labels["telekomunikasi"] = { type = "berkenaan", description = "default no singularize", parents = {"komunikasi", "teknologi"}, } labels["telegraphy"] = { type = "berkenaan", description = "default", parents = {"telekomunikasi", "elektronik"}, wpcat = true, commonscat = true, } labels["telefoni"] = { type = "berkenaan", description = "default", parents = {"telekomunikasi", "elektronik"}, } labels["texting"] = { type = "berkenaan", description = "default", parents = {"telekomunikasi"}, } labels["textual division"] = { type = "berkenaan", description = "default", parents = {"penulisan"}, } labels["tipografi"] = { type = "berkenaan", description = "default", parents = {"penulisan", "percetakan"}, } labels["penulisan"] = { type = "berkenaan", description = "default", parents = {"bahasa", "tingkah laku manusia"}, } labels["sistem tulisan"] = { type = "set", description = "default", parents = {"penulisan"}, } return labels gyp35snlkpffileqsjqpu60ovnf03zf Modul:it-headword 828 13932 281463 112707 2026-04-23T09:41:01Z Hakimi97 2668 Mengemas kini mengikut padanan Wikikamus bahasa Inggeris (semakan [[en:Special:Diff/89361722|89361722]]) (perlu semakan semula) 281463 Scribunto text/plain -- This module contains code for Italian headword templates. -- Templates covered are: -- * {{it-noun}}, {{it-proper noun}}; -- * {{it-verb}}; -- * {{it-adj}}, {{it-adj-comp}}, {{it-adj-sup}}; -- * {{it-det}}; -- * {{it-art}}; -- * {{it-pron-adj}}; -- * {{it-pp}}; -- * {{it-presp}}; -- * {{it-card-noun}}, {{it-card-adj}}, {{it-card-inv}}; -- * {{it-adv}}; -- * {{it-pos}}; -- * {{it-suffix form}}. -- See [[Module:it-verb]] for Italian conjugation templates. 
local export = {} local pos_functions = {} local force_cat = false -- for testing; if true, categories appear in non-mainspace pages local m_strutils = require("Module:string utilities") local usub = m_strutils.sub local require_when_needed = require("Module:utilities/require when needed") local insert = table.insert local remove = table.remove local m_table = require("Module:table") local com = require("Module:it-common") local en_utilities_module = "Module:en-utilities" local headword_module = "Module:headword" local headword_utilities_module = "Module:headword utilities" local inflection_utilities_module = "Module:inflection utilities" local it_verb_module = "Module:it-verb" local parse_interface_module = "Module:parse interface" local romut_module = "Module:romance utilities" local lang = require("Module:languages").getByCode("it") local langname = lang:getCanonicalName() local m_en_utilities = require_when_needed(en_utilities_module) local m_headword_utilities = require_when_needed(headword_utilities_module) local glossary_link = require_when_needed(headword_utilities_module, "glossary_link") local unpack = unpack or table.unpack -- Lua 5.2 compatibility local no_split_apostrophe_words = { ["c'è"] = true, ["c'era"] = true, ["c'erano"] = true, } ----------------------------------------------------------------------------------------- -- Utility functions -- ----------------------------------------------------------------------------------------- local function track(page) require("Module:debug/track")("it-headword/" .. page) return true end -- Parse and insert an inflection not requiring additional processing into `data.inflections`. The raw arguments come -- from `args[field]`, which is parsed for inline modifiers. `label` is the label that the inflections are given; -- `accel` is the accelerator form, or nil. local function parse_and_insert_inflection(data, args, field, label, accel, frob) m_headword_utilities.parse_and_insert_inflection { headdata = data, forms = args[field], paramname = field, splitchar = ",", label = label, accel = accel and {form = accel} or nil, frob = frob, } end local function replace_hash_with_lemma(term, lemma) -- If there is a % sign in the lemma, we have to replace it with %% so it doesn't get interpreted as a capture -- replace expression. lemma = lemma:gsub("%%", "%%%%") return (term:gsub("#", lemma)) end local list_param = {list = true, disallow_holes = true} local boolean_param = {type = "boolean"} ----------------------------------------------------------------------------------------- -- Main entry point -- ----------------------------------------------------------------------------------------- function export.show(frame) local poscat = frame.args[1] or error("Part of speech has not been specified. 
Please pass parameter 1 to the module invocation.") local parargs = frame:getParent().args local params = { ["head"] = list_param, ["id"] = true, ["sort"] = true, ["apoc"] = boolean_param, ["splithyph"] = boolean_param, ["nolinkhead"] = boolean_param, ["nolink"] = {type = "boolean", alias_of = "nolinkhead"}, ["json"] = boolean_param, ["pagename"] = true, -- for testing } if pos_functions[poscat] then for key, val in pairs(pos_functions[poscat].params) do params[key] = val end end local args = require("Module:parameters").process(parargs, params) local pagename = args.pagename or mw.loadData("Module:headword/data").pagename local user_specified_heads = args.head local heads = user_specified_heads if args.nolinkhead then if #heads == 0 then heads = {pagename} end else local romut = require(romut_module) local auto_linked_head = romut.add_links_to_multiword_term(pagename, args.splithyph, no_split_apostrophe_words) if #heads == 0 then heads = {auto_linked_head} else for i, head in ipairs(heads) do if head:find("^~") then head = romut.apply_link_modifiers(auto_linked_head, usub(head, 2)) heads[i] = head end if head == auto_linked_head then track("redundant-head") end end end end local data = { lang = lang, pos_category = pos_functions[poscat] and pos_functions[poscat].pos_category or poscat, categories = {}, heads = heads, user_specified_heads = user_specified_heads, no_redundant_head_cat = #user_specified_heads == 0, genders = {}, inflections = {}, pagename = pagename, id = args.id, sort_key = args.sort, force_cat_output = force_cat, checkredlinks = pos_functions[poscat] and pos_functions[poscat].redlink_pos or true, } if pagename:find("^%-") and poscat ~= "bentuk akhiran" then data.is_suffix = true data.pos_category = "akhiran" data.checkredlinks = true local singular_poscat = m_en_utilities.singularize(poscat) insert(data.categories, "Akhiran membentuk " .. singular_poscat .. " bahasa " .. langname) insert(data.inflections, {label = "Akhiran membentuk " .. singular_poscat}) end if pos_functions[poscat] then pos_functions[poscat].func(args, data) end if args.apoc then -- Apocopated form of a term; do this after calling pos_functions[], because the function might modify -- data.pos_category. local pos = data.pos_category if not pos:find("Bentuk ") then -- Apocopated forms are non-lemma forms. local singular_poscat = m_en_utilities.singularize(pos) data.pos_category = "Bentuk " .. singular_poscat end -- If this is a suffix, insert label 'apocopated' after 'FOO-forming suffix', otherwise insert at the beginning. insert(data.inflections, data.is_suffix and 2 or 1, {label = glossary_link("apocopated")}) end if args.json then return require("Module:JSON").toJSON(data) end return require(headword_module).full_headword(data) end local deriv_params = { {"dim", glossary_link("diminutif")}, {"dim_dim", "double " .. glossary_link("diminutif")}, {"aug_dim", glossary_link("agam") .. "-" .. glossary_link("diminutif")}, {"aug", glossary_link("agam")}, {"dim_aug", glossary_link("diminutif") .. "-" .. glossary_link("agam")}, {"aug_aug", "double " .. glossary_link("agam")}, {"pej", glossary_link("pejoratif")}, {"dim_pej", glossary_link("diminutif") .. "-" .. glossary_link("pejoratif")}, {"aug_pej", glossary_link("agam") .. "-" .. glossary_link("pejoratif")}, {"pej_pej", "double " .. glossary_link("pejoratif")}, {"end", glossary_link("endearing")}, {"dim_end", glossary_link("diminutif") .. "-" .. glossary_link("endearing")}, {"aug_end", glossary_link("agam") .. "-" .. 
glossary_link("endearing")}, {"derog", glossary_link("hinaan")}, {"dim_derog", glossary_link("diminutif") .. "-" .. glossary_link("hinaan")}, {"aug_derog", glossary_link("agam") .. "-" .. glossary_link("hinaan")}, {"end_derog", glossary_link("endearing") .. "-" .. glossary_link("hinaan")}, } local function insert_deriv_params(params) for _, deriv_param in ipairs(deriv_params) do local param = unpack(deriv_param) params[param] = list_param end end local param_mods = { t = { -- We need to store the <t:...> inline modifier into the "gloss" key of the parsed part, because that is what -- [[Module:links]] expects. item_dest = "gloss", }, gloss = {}, -- no 'tr' or 'ts', doesn't make sense for Italian g = { -- We need to store the <g:...> inline modifier into the "genders" key of the parsed part, because that is what -- [[Module:links]] expects. item_dest = "genders", sublist = true, }, id = {}, alt = {}, q = {type = "qualifier"}, qq = {type = "qualifier"}, lit = {}, pos = {}, -- no 'sc', doesn't make sense for Italian } local function parse_term_with_modifiers(paramname, val) local function generate_obj(term) local decomp = com.decompose(term) local lemma = com.remove_non_final_accents(decomp) if lemma ~= decomp then term = com.compose("[[" .. lemma .. "|" .. decomp .. "]]") end return {term = term} end local retval = require(parse_interface_module).parse_inline_modifiers(val, { paramname = paramname, param_mods = param_mods, generate_obj = generate_obj, splitchar = "[/;,]", preserve_splitchar = true, }) for _, obj in ipairs(retval) do if obj.delimiter == ";" then obj.separator = "; " elseif obj.delimiter == "/" then obj.separator = "/" -- default to nil for comma end end return retval end local function insert_deriv_inflections(data, args) for _, deriv_param in ipairs(deriv_params) do local param, desc = unpack(deriv_param) if #args[param] > 0 then local inflection = {label = desc} for _, term in ipairs(args[param]) do local parsed_terms = parse_term_with_modifiers(param, term) for _, parsed_term in ipairs(parsed_terms) do insert(inflection, parsed_term) end end insert(data.inflections, inflection) end end end ----------------------------------------------------------------------------------------- -- Nouns -- ----------------------------------------------------------------------------------------- local allowed_genders = m_table.listToSet( {"m", "f", "mf", "mfbysense", "mfequiv", "gneut", "n", "m-p", "f-p", "mf-p", "mfbysense-p", "mfequiv-p", "gneut-p", "n-p", "?", "?-p"} ) local function validate_genders(genders) for _, g in ipairs(genders) do if type(g) == "table" then g = g.spec end if not allowed_genders[g] then error("Unrecognized gender: " .. g) end end end local function do_noun(args, data, is_proper) local is_plurale_tantum = false local has_singular = false local category_plpos = data.checkredlinks if category_plpos == true then category_plpos = data.pos_category end local category_pos = m_en_utilities.singularize(category_plpos) validate_genders(args[1]) data.genders = args[1] local saw_m = false local saw_f = false local gender_for_default_plural -- Check for specific genders and pluralia tantum. 
for _, g in ipairs(args[1]) do if type(g) == "table" then g = g.spec end if g:find("-p$") then is_plurale_tantum = true else has_singular = true if g == "m" or g == "mf" or g == "mfbysense" then saw_m = true end if g == "f" or g == "mf" or g == "mfbysense" then saw_f = true end end end if saw_m and saw_f then gender_for_default_plural = "mf" elseif saw_f then gender_for_default_plural = "f" else gender_for_default_plural = "m" end local lemma = data.pagename local function inscat(cat) insert(data.categories, langname .. " " .. cat) end local function insert_noun_inflection(terms, label, accel, no_inv) for _, term in ipairs(terms) do if not no_inv and term.term == lemma then term.term = nil term.label = glossary_link("invariable") end end m_headword_utilities.insert_inflection { headdata = data, terms = terms, label = label, accel = accel and {form = accel} or nil, } end -- Plural local plurals = {} -- Fetch explicit masculine and feminine plurals here because we may change them below when processing plurals. local mpls = m_headword_utilities.parse_term_list_with_modifiers { paramname = "mpl", forms = args.mpl, splitchar = ",", } local fpls = m_headword_utilities.parse_term_list_with_modifiers { paramname = "fpl", forms = args.fpl, splitchar = ",", } if is_plurale_tantum and not has_singular then if args[2][1] then error("Can't specify plurals of plurale tantum " .. category_pos) end insert(data.inflections, {label = glossary_link("hanya jamak")}) elseif args.apoc then -- apocopated noun if args[2][1] then error("Can't specify plurals of apocopated " .. category_pos) end else -- Fetch plurals and associated qualifiers, labels and genders. plurals = m_headword_utilities.parse_term_list_with_modifiers { paramname = {2, "pl"}, forms = args[2], splitchar = ",", include_mods = {"g"}, } -- Check for special plural signals local mode = nil local pl1 = plurals[1] if pl1 and #pl1.term == 1 then mode = pl1.term if mode == "?" or mode == "!" or mode == "-" or mode == "~" then pl1.term = nil if next(pl1) then error(("Can't specify inline modifiers with plural code '%s'"):format(mode)) end remove(plurals, 1) -- Remove the mode parameter elseif mode ~= "+" and mode ~= "#" then error(("Unexpected plural code '%s'"):format(mode)) end end if is_plurale_tantum then -- both singular and plural insert(data.inflections, {label = "kadangkala " .. glossary_link("hanya jamak") .. ", dengan kelainan"}) end if mode == "?" then -- Plural is unknown insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak dikenal pasti") elseif mode == "!" then -- Plural is not attested insert(data.inflections, {label = "plural not attested"}) insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak ditentusahkan") if plurals[1] then error("Can't specify any plurals along with unattested plural code '!'") end elseif mode == "-" then -- Uncountable noun; may occasionally have a plural insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) -- If plural forms were given explicitly, then show "usually" if plurals[1] then insert(data.inflections, {label = "biasanya " .. glossary_link("tak berbilang")}) insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) else insert(data.inflections, {label = glossary_link("uncountable")}) end else -- Countable or mixed countable/uncountable -- If no plurals, use the default plural unless mpl= or fpl= explicitly given. 
if not plurals[1] and not mpls[1] and not fpls[1] and not is_proper then plurals[1] = {term = "+"} end if mode == "~" then -- Mixed countable/uncountable noun, always has a plural insert(data.inflections, {label = glossary_link("berbilang") .. " dan " .. glossary_link("tak berbilang")}) insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) elseif plurals[1] then -- Countable nouns insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) else -- Uncountable nouns insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) end end -- Process plurals, handling requests for default plurals. local has_default_or_hash = false for _, pl in ipairs(plurals) do if pl.term:find("^%+") or pl.term:find("#") or pl.term == "cap*" or pl.term == "cap*+" then has_default_or_hash = true break end end if has_default_or_hash then local newpls = {} local function insert_pl(pl, defpl) pl.term = defpl insert(newpls, pl) end local function make_gendered_plural(pl, special) if gender_for_default_plural == "mf" then local default_mpl = com.make_plural(lemma, "m", special) local default_fpl = com.make_plural(lemma, "f", special) if default_mpl then if default_mpl == default_fpl then insert_pl(pl, default_mpl) else if args.mpl[1] or args.fpl[1] then error("Can't specify gendered plural spec '" .. (special or "+") .. "' along with gender=" .. gender_for_default_plural .. " and also specify mpl= or fpl=") end mpls = {m_table.shallowCopy(pl)} mpls[1].term = default_mpl fpls = {pl} fpls[1].term = default_fpl end end else local defpl = com.make_plural(lemma, gender_for_default_plural, special) if defpl then insert_pl(pl, defpl) end end end for _, pl in ipairs(plurals) do if pl.term == "cap*" or pl.term == "cap*+" then make_gendered_plural(pl, pl.term) elseif pl.term == "+" then make_gendered_plural(pl) elseif pl.term:find("^%+") then local special = require(romut_module).get_special_indicator(pl.term) make_gendered_plural(pl, special) else insert_pl(pl, replace_hash_with_lemma(pl.term, lemma)) end end plurals = newpls end if plurals[2] then inscat(category_plpos .. " with multiple plurals") end -- If the first or only plural is the same as the singular, replace it with 'invariable', or 'usually -- invariable' if there is more than one plural. pl1 = plurals[1] if pl1 and pl1.term == lemma then if plurals[2] then insert(data.inflections, {label = "usually " .. glossary_link("invariable"), q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs }) else insert(data.inflections, {label = glossary_link("invariable"), q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs }) end remove(plurals, 1) inscat("indeclinable " .. category_plpos) end if plurals[1] then -- Check for gender-changing plurals. for _, pl in ipairs(plurals) do if pl.genders then for _, g in ipairs(pl.genders) do if type(g) ~= "table" then g = {spec = g} end if g.spec == "m" and not saw_m or g.spec == "f" and not saw_f then inscat(category_plpos .. " that change gender in the plural") end end end end end end -- Gather masculines/feminines. For each one, generate the corresponding plural. 
`field` is the name of the field -- containing the masculine or feminine forms (normally "m" or "f"); `inflect` is a function of one or two arguments -- to generate the default masculine or feminine from the lemma (the arguments are the lemma and optionally a -- "special" flag to indicate how to handle multiword lemmas, and the function is normally make_feminine or -- make_masculine from [[Module:it-common]]); and `default_plurals` is a list into which the corresponding default -- plurals of the gathered or generated masculine or feminine forms are stored. local function handle_mf(field, inflect, default_plurals) local special local mfs = m_headword_utilities.parse_term_list_with_modifiers { paramname = field, forms = args[field], splitchar = ",", frob = function(term) if term == "+" then -- Generate default masculine/feminine. term = inflect(lemma) else term = replace_hash_with_lemma(term, lemma) end special = require(romut_module).get_special_indicator(term) if special then term = inflect(lemma, special) end return term end } for _, mf in ipairs(mfs) do local plobj = m_table.shallowCopy(mf) plobj.term = com.make_plural(mf.term, field, special) if plobj.term then -- Add an accelerator for each masculine/feminine plural whose lemma is the corresponding singular, so that -- the accelerated entry that is generated has a definition that looks like -- # {{plural of|it|MFSING}} plobj.accel = {form = "p", lemma = mf.term} insert(default_plurals, plobj) end end return mfs end local feminine_plurals = {} local feminines = handle_mf("f", com.make_feminine, feminine_plurals) local masculine_plurals = {} local masculines = handle_mf("m", com.make_masculine, masculine_plurals) local function handle_mf_plural(mfplfield, mfpls, gender, default_plurals, singulars) if is_plurale_tantum then return mfpls, true end local new_mfpls = {} local saw_plus local noinv for i, mfpl in ipairs(mfpls) do local accel if #mfpls == #singulars then -- If same number of overriding masculine/feminine plurals as singulars, assume each plural goes with -- the corresponding singular and use each corresponding singular as the lemma in the accelerator. The -- generated entry will have -- # {{plural of|it|SINGULAR}} -- as the definition. accel = {form = "p", lemma = singulars[i].term} else accel = nil end if mfpl.term == "+" then -- We should never see + twice. If we do, it will lead to problems since we overwrite the values of -- default_plurals the first time around. if saw_plus then error(("Saw + twice when handling %s="):format(mfplfield)) end saw_plus = true if not default_plurals[1] then local defpl = com.make_plural(lemma, gender) if not defpl then error("Unable to generate default plural of '" .. lemma .. 
"'") end default_plurals[1] = {term = defpl} end for _, defpl in ipairs(default_plurals) do -- defpl is already a table and has an accel field m_headword_utilities.combine_termobj_qualifiers_labels(defpl, mfpl) insert(new_mfpls, defpl) end -- don't use "invariable" because the plural is not with respect to the lemma but with respect to the -- masc/fem singular noinv = true elseif mfpl.term == "cap*" or mfpl.term == "cap*+" or mfpl.term:find("^%+") then if mfpl.term:find("^%+") then mfpl.term = require(romut_module).get_special_indicator(mfpl.term) end if singulars[1] then for _, mf in ipairs(singulars) do local mfplobj = m_table.shallowCopy(mfpl) mfplobj.term = com.make_plural(mf.term, gender, mfpl.term) if mfplobj.term then mfplobj.accel = accel m_headword_utilities.combine_termobj_qualifiers_labels(mfplobj, mf) insert(new_mfpls, mfplobj) end -- don't use "invariable" because the plural is not with respect to the lemma but with respect -- to the masc/fem singular noinv = true -- FIXME: Should we throw an error if no plural could be generated? end else -- FIXME: This clause didn't exist in the corresponding code in [[Module:pt-headword]]. Is it -- correct? mfpl.term = com.make_plural(lemma, gender, mfpl.term) if mfpl.term then insert(new_mfpls, mfpl) end end else mfpl.accel = accel mfpl.term = replace_hash_with_lemma(mfpl.term, lemma) insert(new_mfpls, mfpl) -- don't use "invariable" if masc/fem singular present because the plural is not with respect to -- the lemma but with respect to the masc/fem singular noinv = noinv or #singulars > 0 end end return new_mfpls, noinv end local mpl_noinv, fpl_noinv -- Not fpls[1] because if the user didn't specify any explicit mpl= or fpl= but the lemma gender is mf or mfbysense -- and has separate masculine and feminine plural forms (e.g. any term in -ista), we don't want to reprocess those -- auto-generated forms. if args.fpl[1] then -- Override any existing feminine plurals. feminine_plurals, fpl_noinv = handle_mf_plural("fpl", fpls, "f", feminine_plurals, feminines) else feminine_plurals, fpl_noinv = fpls, false end if args.mpl[1] then -- Override any existing masculine plurals. masculine_plurals, mpl_noinv = handle_mf_plural("mpl", mpls, "m", masculine_plurals, masculines) else masculine_plurals, mpl_noinv = mpls, false end local function redundant_plural(pl) for _, p in ipairs(plurals) do if p.term == pl.term then return true end end return false end for _, mpl in ipairs(masculine_plurals) do if redundant_plural(mpl) then track("noun-redundant-mpl") end end for _, fpl in ipairs(feminine_plurals) do if redundant_plural(fpl) then track("noun-redundant-fpl") end end if plurals[1] then -- Set 'noinv' because we already took care of invariable plurals above. insert_noun_inflection(plurals, "plural", "p", "noinv") end insert_noun_inflection(masculines, "masculine") insert_noun_inflection(masculine_plurals, "masculine plural", nil, mpl_noinv) insert_noun_inflection(feminines, "feminine", "f") insert_noun_inflection(feminine_plurals, "feminine plural", nil, fpl_noinv) local function parse_and_insert_noun_inflection(field, label, accel) parse_and_insert_inflection(data, args, field, label, accel) end parse_and_insert_noun_inflection("adj", glossary_link("relational", "relational adjective")) parse_and_insert_noun_inflection("adv", glossary_link("adverb")) parse_and_insert_noun_inflection("dem", glossary_link("demonym")) parse_and_insert_noun_inflection("fdem", "female " .. 
glossary_link("demonym")) insert_deriv_inflections(data, args) -- Maybe add category 'Italian nouns with irregular gender' (or similar) local irreg_gender_lemma = lemma:gsub(" .*", "") -- only look at first word if (irreg_gender_lemma:find("o$") and (gender_for_default_plural == "f" or gender_for_default_plural == "mf" or gender_for_default_plural == "mfbysense")) or (irreg_gender_lemma:find("a$") and (gender_for_default_plural == "m" or gender_for_default_plural == "mf" or gender_for_default_plural == "mfbysense")) then inscat(category_plpos .. " dengan genus tak tentu") end end local function get_noun_params(nountype) local params = { [1] = {list = "g", disallow_holes = true, required = nountype ~= "proper", default = "?", type = "genders", flatten = true}, [2] = {list = "pl", disallow_holes = true}, ["m"] = list_param, ["f"] = list_param, ["mpl"] = list_param, ["fpl"] = list_param, ["adj"] = list_param, --adjective(s) ["adv"] = list_param, --adverb(s) ["dem"] = list_param, --demonym(s) ["fdem"] = list_param, --female demonym(s) } insert_deriv_params(params) return params end pos_functions["Kata nama"] = { params = get_noun_params("base"), func = do_noun, } pos_functions["Kata nama khas"] = { params = get_noun_params("proper"), func = function(args, data) do_noun(args, data, "is proper noun") end, } pos_functions["Kata nama kardinal"] = { params = get_noun_params("base"), func = function(args, data) do_noun(args, data) insert(data.categories, 1, "Nombor kardinal " .. langname) end, pos_category = "Kata bilangan", } ----------------------------------------------------------------------------------------- -- Adjectives -- ----------------------------------------------------------------------------------------- local function do_adjective(args, data, is_superlative) local feminines = {} local masculine_plurals = {} local feminine_plurals = {} -- Use "participle" not "past participle" for categories such as 'invariable participles' local category_plpos = data.checkredlinks if category_plpos == true then category_plpos = data.pos_category end local category_pos = m_en_utilities.singularize(category_plpos) if args.sp then local romut = require(romut_module) if not romut.allowed_special_indicators[args.sp] then local indicators = {} for indic, _ in pairs(romut.allowed_special_indicators) do insert(indicators, "'" .. indic .. "'") end table.sort(indicators) error("Special inflection indicator beginning can only be " .. mw.text.listToText(indicators) .. ": " .. args.sp) end end local lemma = data.pagename local function fetch_inflections(field) local retval = m_headword_utilities.parse_term_list_with_modifiers { paramname = field, forms = args[field], splitchar = ",", } if not retval[1] then return {{term = "+"}} end return retval end local function insert_inflection(terms, label, accel) m_headword_utilities.insert_inflection { headdata = data, terms = terms, label = label, accel = accel and {form = accel} or nil, } end if args.inv then -- invariable adjective insert(data.inflections, {label = glossary_link("invariable")}) insert(data.categories, langname .. " indeclinable " .. 
category_plpos) end if args.noforms then -- [[bello]] and any others too complicated to describe in headword insert(data.inflections, {label = "see below for inflection"}) end if args.inv or args.apoc or args.noforms then if args.sp or args.f[1] or args.pl[1] or args.mpl[1] or args.fpl[1] then error("Can't specify inflections with an invariable or apocopated adjective or with noforms=") end elseif args.fonly then -- feminine-only if args.f[1] then error("Can't specify explicit feminines with feminine-only " .. category_pos) end if args.pl[1] then error("Can't specify explicit plurals with feminine-only " .. category_pos .. ", use fpl=") end if args.mpl[1] then error("Can't specify explicit masculine plurals with feminine-only " .. category_pos) end local argsfpl = fetch_inflections("fpl") for _, fpl in ipairs(argsfpl) do if fpl.term == "+" then local defpl = com.make_plural(lemma, "f", args.sp) if not defpl then error("Unable to generate default plural of '" .. lemma .. "'") end fpl.term = defpl else fpl.term = replace_hash_with_lemma(fpl.term, lemma) end insert(feminine_plurals, fpl) end insert(data.inflections, {label = "feminine-only"}) insert_inflection(feminine_plurals, "feminine plural", "f|p") else -- Gather feminines. for _, f in ipairs(fetch_inflections("f")) do if f.term == "+" then -- Generate default feminine. f.term = com.make_feminine(lemma, args.sp) else f.term = replace_hash_with_lemma(f.term, lemma) end insert(feminines, f) end local fem_like_lemma = #feminines == 1 and feminines[1].term == lemma and not m_headword_utilities.termobj_has_qualifiers_or_labels(feminines[1]) if fem_like_lemma then insert(data.categories, langname .. " epicene " .. category_plpos) end local mpl_field = "mpl" local fpl_field = "fpl" if args.pl[1] then if args.mpl[1] or args.fpl[1] then error("Can't specify both pl= and mpl=/fpl=") end mpl_field = "pl" fpl_field = "pl" end local argsmpl = fetch_inflections(mpl_field) local argsfpl = fetch_inflections(fpl_field) for _, mpl in ipairs(argsmpl) do if mpl.term == "+" then -- Generate default masculine plural. local defpl = com.make_plural(lemma, "m", args.sp) if not defpl then error("Unable to generate default plural of '" .. lemma .. "'") end mpl.term = defpl else mpl.term = replace_hash_with_lemma(mpl.term, lemma) end insert(masculine_plurals, mpl) end for _, fpl in ipairs(argsfpl) do if fpl.term == "+" then for _, f in ipairs(feminines) do -- Generate default feminine plural; f is a table. local fplobj = m_table.shallowCopy(fpl) local defpl = com.make_plural(f.term, "f", args.sp) if not defpl then error("Unable to generate default plural of '" .. f.term .. "'") end fplobj.term = defpl m_headword_utilities.combine_termobj_qualifiers_labels(fplobj, f) insert(feminine_plurals, fplobj) end else fpl.term = replace_hash_with_lemma(fpl.term, lemma) insert(feminine_plurals, fpl) end end local fem_pl_like_masc_pl = masculine_plurals[1] and feminine_plurals[1] and m_table.deepEquals(masculine_plurals, feminine_plurals) local masc_pl_like_lemma = #masculine_plurals == 1 and masculine_plurals[1].term == lemma and not m_headword_utilities.termobj_has_qualifiers_or_labels(masculine_plurals[1]) if fem_like_lemma and fem_pl_like_masc_pl and masc_pl_like_lemma then -- actually invariable insert(data.inflections, {label = glossary_link("invariable")}) insert(data.categories, langname .. " indeclinable " .. category_plpos) else -- Make sure there are feminines given and not same as lemma. 
if not fem_like_lemma then insert_inflection(feminines, "feminine", "f|s") elseif args.gneut then data.genders = {"gneut"} else data.genders = {"mfbysense"} end if fem_pl_like_masc_pl then if args.gneut then insert_inflection(masculine_plurals, "plural", "p") else -- This is how the Spanish module works. -- insert_inflection(masculine_plurals, "masculine and feminine plural", "p") insert_inflection(masculine_plurals, "plural", "p") end else insert_inflection(masculine_plurals, "masculine plural", "m|p") insert_inflection(feminine_plurals, "feminine plural", "f|p") end end end local function parse_and_insert_adj_inflection(field, label, accel, frob) parse_and_insert_inflection(data, args, field, label, accel, frob) end parse_and_insert_adj_inflection("n", "neuter") parse_and_insert_adj_inflection("comp", glossary_link("comparative")) parse_and_insert_adj_inflection("sup", glossary_link("superlative")) parse_and_insert_adj_inflection("adv", glossary_link("adverb")) insert_deriv_inflections(data, args) if args.irreg and is_superlative then insert(data.categories, langname .. " irregular superlative " .. category_plpos) end end local function get_adjective_params(adjtype) local params = { ["inv"] = boolean_param, --invariable ["noforms"] = boolean_param, --too complicated to list forms except in a table ["sp"] = true, -- special indicator: "first", "first-last", etc. ["f"] = list_param, --feminine form(s) ["pl"] = list_param, --plural override(s) ["fpl"] = list_param, --feminine plural override(s) ["mpl"] = list_param, --masculine plural override(s) ["adv"] = list_param, --adverb(s) } if adjtype == "base" or adjtype == "part" or adjtype == "det" then params["comp"] = list_param --comparative(s) params["sup"] = list_param --superlative(s) params["fonly"] = boolean_param -- feminine only end if adjtype == "sup" then params["irreg"] = boolean_param end insert_deriv_params(params) return params end pos_functions["adjectives"] = { params = get_adjective_params("base"), func = do_adjective, } pos_functions["comparative adjectives"] = { params = get_adjective_params("comp"), func = do_adjective, pos_category = "adjectives", } pos_functions["superlative adjectives"] = { params = get_adjective_params("sup"), func = function(args, data) do_adjective(args, data, "is superlative") end, pos_category = "adjectives", } pos_functions["cardinal adjectives"] = { params = get_adjective_params("card"), func = function(args, data) do_adjective(args, data) insert(data.categories, 1, langname .. " cardinal numbers") end, pos_category = "numerals", } pos_functions["past participles"] = { params = get_adjective_params("part"), func = do_adjective, redlink_pos = "participles", } pos_functions["present participles"] = { params = get_adjective_params("part"), func = do_adjective, redlink_pos = "participles", } pos_functions["determiners"] = { params = get_adjective_params("det"), func = do_adjective, } pos_functions["articles"] = { params = get_adjective_params("det"), func = do_adjective, } pos_functions["adjective-like pronouns"] = { params = get_adjective_params("pron"), func = do_adjective, pos_category = "pronouns", } pos_functions["cardinal invariable"] = { params = {}, func = function(args, data) insert(data.categories, langname .. " cardinal numbers") insert(data.categories, langname .. 
" indeclinable numerals") insert(data.inflections, {label = glossary_link("invariable")}) end, pos_category = "numerals", } ----------------------------------------------------------------------------------------- -- Adverbs -- ----------------------------------------------------------------------------------------- local function do_adverb(args, data) local function parse_and_insert_adv_inflection(field, label, accel, frob) parse_and_insert_inflection(data, args, field, label, accel, frob) end parse_and_insert_adv_inflection("comp", glossary_link("comparative")) parse_and_insert_adv_inflection("sup", glossary_link("superlative")) parse_and_insert_adv_inflection("adj", glossary_link("adjective")) end local function get_adverb_params(advtype) local params = { ["adj"] = list_param, --adjective(s) } if advtype == "base" then params["comp"] = list_param --comparative(s) params["sup"] = list_param --superlative(s) end return params end pos_functions["adverbs"] = { params = get_adverb_params("base"), func = do_adverb, } pos_functions["comparative adverbs"] = { params = get_adverb_params("comp"), func = do_adverb, pos_category = "adverbs", } pos_functions["superlative adverbs"] = { params = get_adverb_params("sup"), func = do_adverb, pos_category = "adverbs", } ----------------------------------------------------------------------------------------- -- Verbs -- ----------------------------------------------------------------------------------------- pos_functions["verbs"] = { params = { [1] = {}, ["noautolinktext"] = boolean_param, ["noautolinkverb"] = boolean_param, }, func = function(args, data) if args[1] then local alternant_multiword_spec = require(it_verb_module).do_generate_forms(args, "from headword", data.heads[1]) local function do_verb_form(slot, label, rowslot, rowlabel) local forms = alternant_multiword_spec.forms[slot] local retval if alternant_multiword_spec.rowprops.all_defective[rowslot] then if not alternant_multiword_spec.rowprops.defective[rowslot] then -- No forms, but none expected; don't display anything return end retval = {label = "no " .. rowlabel} elseif not forms then retval = {label = "no " .. label} elseif alternant_multiword_spec.rowprops.all_unknown[rowslot] then retval = {label = "unknown " .. rowlabel} elseif forms[1].form == "?" then retval = {label = "unknown " .. label} else -- Disable accelerators for now because we don't want the added accents going into the headwords. -- FIXME: We now have support in [[Module:accel]] to specify the target explicitly; we can use this -- so we can add the accelerators back with a param to avoid the accents. local accel_form = nil -- all_verb_slots[slot] retval = {label = label, accel = accel_form and {form = accel_form} or nil} local prev_footnotes = nil -- If the footnotes for this form are the same as the footnotes for the preceding form or -- contain the preceding footnotes, replace the footnotes that are the same with "ditto". -- This avoids repetition on pages like [[succedere]] where the form ''succedétti'' has a long -- footnote which gets repeated in the traditional form ''succedètti'' (which also has the -- footnote "[traditional]"). for _, form in ipairs(forms) do local quals, refs = require(inflection_utilities_module). 
convert_footnotes_to_qualifiers_and_references(form.footnotes) local quals_with_ditto = quals if quals and prev_footnotes then local quals_contains_previous = true for _, qual in ipairs(prev_footnotes) do if not m_table.contains(quals, qual) then quals_contains_previous = false break end end if quals_contains_previous then local inserted_ditto = false quals_with_ditto = {} for _, qual in ipairs(quals) do if m_table.contains(prev_footnotes, qual) then if not inserted_ditto then insert(quals_with_ditto, "ditto") inserted_ditto = true end else insert(quals_with_ditto, qual) end end end end prev_footnotes = quals insert(retval, {term = form.form, q = quals_with_ditto, refs = refs}) end end insert(data.inflections, retval) end if alternant_multiword_spec.props.is_pronominal then insert(data.inflections, {label = glossary_link("pronominal")}) end if alternant_multiword_spec.props.impers then insert(data.inflections, {label = glossary_link("impersonal")}) end if alternant_multiword_spec.props.thirdonly then insert(data.inflections, {label = "third-person only"}) end local thirdonly = alternant_multiword_spec.props.impers or alternant_multiword_spec.props.thirdonly local sing_label = thirdonly and "third-person singular" or "first-person singular" for _, rowspec in ipairs { {"pres", "present", true}, {"phis", "past historic", true}, {"pp", "past participle", true}, {"imperf", "imperfect"}, {"fut", "future"}, {"sub", "subjunctive"}, {"impsub", "imperfect subjunctive"}, } do local rowslot, desc, always_show = unpack(rowspec) local slot = rowslot .. (thirdonly and "3s" or "1s") local must_show = alternant_multiword_spec.is_irreg[slot] if always_show then must_show = true elseif rowslot == "imperf" and alternant_multiword_spec.props.has_explicit_stem_spec then -- If there is an explicit stem spec, make sure it gets displayed; the imperfect is a good way of -- showing this. must_show = true elseif not alternant_multiword_spec.forms[slot] then -- If the principal part is unexpectedly missing, make sure we show this. must_show = true elseif alternant_multiword_spec.forms[slot][1].form == "?" then -- If the principal part is unknown, make sure we show this. must_show = true end if must_show then if rowslot == "pp" then do_verb_form(rowslot, desc, rowslot, desc) else do_verb_form(slot, sing_label .. " " .. desc, rowslot, desc) end end end -- Also do the imperative, but not for third-only verbs, which are always missing the imperative. if not thirdonly and (alternant_multiword_spec.is_irreg.imp2s or not alternant_multiword_spec.forms.imp2s) then do_verb_form("imp2s", "second-person singular imperative", "imp", "imperative") end -- If there is a past participle but no auxiliary (e.g. [[malfare]]), explicitly add "no auxiliary". In -- cases where there's no past participle and no auxiliary (e.g. [[irrompere]]), we don't do this as we -- already get "no past participle" displayed. Don't display an auxiliary in any case if the lemma -- consists entirely of reflexive verbs (for which the auxiliary is always [[essere]]). if alternant_multiword_spec.props.is_non_reflexive and ( alternant_multiword_spec.forms.aux or alternant_multiword_spec.forms.pp ) then do_verb_form("aux", "auxiliary", "aux", "auxiliary") end -- Add categories. 
for _, cat in ipairs(alternant_multiword_spec.categories) do insert(data.categories, cat) end -- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to -- incorporate any links in that head into the 1= specification, use the infinitive generated by -- [[Module:it-verb]] it in place of the user-specified or auto-generated head so that we get accents marked -- on the verb(s). Don't do this if the user gave multiple heads or gave a head with a multiword-linked -- verbal expression such as '[[dare esca]] [[al]] [[fuoco]]'. if #data.user_specified_heads == 0 or ( #data.user_specified_heads == 1 and alternant_multiword_spec.incorporated_headword_head_into_lemma ) then data.heads = {} for _, lemma_obj in ipairs(alternant_multiword_spec.forms.inf) do local quals, refs = require(inflection_utilities_module). convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes) insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs}) end end end end } ----------------------------------------------------------------------------------------- -- Suffix forms -- ----------------------------------------------------------------------------------------- pos_functions["suffix forms"] = { params = { [1] = {required = true, list = true, disallow_holes = true}, ["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true}, }, func = function(args, data) validate_genders(args.g) data.genders = args.g local suffix_type = {} for _, typ in ipairs(args[1]) do insert(suffix_type, typ .. "-forming suffix") end insert(data.inflections, {label = "non-lemma form of " .. m_table.serialCommaJoin(suffix_type, {conj = "or"})}) end, } ----------------------------------------------------------------------------------------- -- Arbitrary parts of speech -- ----------------------------------------------------------------------------------------- pos_functions["arbitrary part of speech"] = { params = { [1] = {required = true}, ["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true}, }, func = function(args, data) if data.is_suffix then error("Can't use [[Template:it-pos]] with suffixes") end validate_genders(args.g) data.genders = args.g local plpos = m_en_utilities.pluralize(args[1]) data.pos_category = plpos end, } return export nwrvr8c2uw88xzkgo5v0lz55gsyd6a5 281465 281463 2026-04-23T09:44:51Z Hakimi97 2668 281465 Scribunto text/plain -- This module contains code for Italian headword templates. -- Templates covered are: -- * {{it-noun}}, {{it-proper noun}}; -- * {{it-verb}}; -- * {{it-adj}}, {{it-adj-comp}}, {{it-adj-sup}}; -- * {{it-det}}; -- * {{it-art}}; -- * {{it-pron-adj}}; -- * {{it-pp}}; -- * {{it-presp}}; -- * {{it-card-noun}}, {{it-card-adj}}, {{it-card-inv}}; -- * {{it-adv}}; -- * {{it-pos}}; -- * {{it-suffix form}}. -- See [[Module:it-verb]] for Italian conjugation templates. 
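-- A wrapper headword template normally invokes this module with the part of speech as
-- the first frame argument, along the lines of (assumed wiring; the actual template
-- names may differ):
--   {{#invoke:<this module>|show|Kata nama}}
-- export.show() below merges that POS's extra parameters from pos_functions[], resolves
-- the displayed head(s) (auto-linking multiword pagenames), lets the POS-specific func()
-- fill in genders, inflections and categories, and finally passes the assembled `data`
-- table to full_headword() in [[Module:headword]].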
local export = {} local pos_functions = {} local force_cat = false -- for testing; if true, categories appear in non-mainspace pages local m_strutils = require("Module:string utilities") local usub = m_strutils.sub local require_when_needed = require("Module:utilities/require when needed") local insert = table.insert local remove = table.remove local m_table = require("Module:table") local com = require("Module:it-common") local en_utilities_module = "Module:en-utilities" local headword_module = "Module:headword" local headword_utilities_module = "Module:headword utilities" local inflection_utilities_module = "Module:inflection utilities" local it_verb_module = "Module:it-verb" local parse_interface_module = "Module:parse interface" local romut_module = "Module:romance utilities" local lang = require("Module:languages").getByCode("it") local langname = lang:getCanonicalName() local m_en_utilities = require_when_needed(en_utilities_module) local m_headword_utilities = require_when_needed(headword_utilities_module) local glossary_link = require_when_needed(headword_utilities_module, "glossary_link") local unpack = unpack or table.unpack -- Lua 5.2 compatibility local no_split_apostrophe_words = { ["c'è"] = true, ["c'era"] = true, ["c'erano"] = true, } ----------------------------------------------------------------------------------------- -- Utility functions -- ----------------------------------------------------------------------------------------- local function track(page) require("Module:debug/track")("it-headword/" .. page) return true end -- Parse and insert an inflection not requiring additional processing into `data.inflections`. The raw arguments come -- from `args[field]`, which is parsed for inline modifiers. `label` is the label that the inflections are given; -- `accel` is the accelerator form, or nil. local function parse_and_insert_inflection(data, args, field, label, accel, frob) m_headword_utilities.parse_and_insert_inflection { headdata = data, forms = args[field], paramname = field, splitchar = ",", label = label, accel = accel and {form = accel} or nil, frob = frob, } end local function replace_hash_with_lemma(term, lemma) -- If there is a % sign in the lemma, we have to replace it with %% so it doesn't get interpreted as a capture -- replace expression. lemma = lemma:gsub("%%", "%%%%") return (term:gsub("#", lemma)) end local list_param = {list = true, disallow_holes = true} local boolean_param = {type = "boolean"} ----------------------------------------------------------------------------------------- -- Main entry point -- ----------------------------------------------------------------------------------------- function export.show(frame) local poscat = frame.args[1] or error("Part of speech has not been specified. 
Please pass parameter 1 to the module invocation.") local parargs = frame:getParent().args local params = { ["head"] = list_param, ["id"] = true, ["sort"] = true, ["apoc"] = boolean_param, ["splithyph"] = boolean_param, ["nolinkhead"] = boolean_param, ["nolink"] = {type = "boolean", alias_of = "nolinkhead"}, ["json"] = boolean_param, ["pagename"] = true, -- for testing } if pos_functions[poscat] then for key, val in pairs(pos_functions[poscat].params) do params[key] = val end end local args = require("Module:parameters").process(parargs, params) local pagename = args.pagename or mw.loadData("Module:headword/data").pagename local user_specified_heads = args.head local heads = user_specified_heads if args.nolinkhead then if #heads == 0 then heads = {pagename} end else local romut = require(romut_module) local auto_linked_head = romut.add_links_to_multiword_term(pagename, args.splithyph, no_split_apostrophe_words) if #heads == 0 then heads = {auto_linked_head} else for i, head in ipairs(heads) do if head:find("^~") then head = romut.apply_link_modifiers(auto_linked_head, usub(head, 2)) heads[i] = head end if head == auto_linked_head then track("redundant-head") end end end end local data = { lang = lang, pos_category = pos_functions[poscat] and pos_functions[poscat].pos_category or poscat, categories = {}, heads = heads, user_specified_heads = user_specified_heads, no_redundant_head_cat = #user_specified_heads == 0, genders = {}, inflections = {}, pagename = pagename, id = args.id, sort_key = args.sort, force_cat_output = force_cat, checkredlinks = pos_functions[poscat] and pos_functions[poscat].redlink_pos or true, } if pagename:find("^%-") and poscat ~= "bentuk akhiran" then data.is_suffix = true data.pos_category = "akhiran" data.checkredlinks = true local singular_poscat = m_en_utilities.singularize(poscat) insert(data.categories, "Akhiran membentuk " .. singular_poscat .. " bahasa " .. langname) insert(data.inflections, {label = "Akhiran membentuk " .. singular_poscat}) end if pos_functions[poscat] then pos_functions[poscat].func(args, data) end if args.apoc then -- Apocopated form of a term; do this after calling pos_functions[], because the function might modify -- data.pos_category. local pos = data.pos_category if not pos:find("Bentuk ") then -- Apocopated forms are non-lemma forms. local singular_poscat = m_en_utilities.singularize(pos) data.pos_category = "Bentuk " .. singular_poscat end -- If this is a suffix, insert label 'apocopated' after 'FOO-forming suffix', otherwise insert at the beginning. insert(data.inflections, data.is_suffix and 2 or 1, {label = glossary_link("apocopated")}) end if args.json then return require("Module:JSON").toJSON(data) end return require(headword_module).full_headword(data) end local deriv_params = { {"dim", glossary_link("diminutif")}, {"dim_dim", "double " .. glossary_link("diminutif")}, {"aug_dim", glossary_link("agam") .. "-" .. glossary_link("diminutif")}, {"aug", glossary_link("agam")}, {"dim_aug", glossary_link("diminutif") .. "-" .. glossary_link("agam")}, {"aug_aug", "double " .. glossary_link("agam")}, {"pej", glossary_link("pejoratif")}, {"dim_pej", glossary_link("diminutif") .. "-" .. glossary_link("pejoratif")}, {"aug_pej", glossary_link("agam") .. "-" .. glossary_link("pejoratif")}, {"pej_pej", "double " .. glossary_link("pejoratif")}, {"end", glossary_link("endearing")}, {"dim_end", glossary_link("diminutif") .. "-" .. glossary_link("endearing")}, {"aug_end", glossary_link("agam") .. "-" .. 
glossary_link("endearing")}, {"derog", glossary_link("hinaan")}, {"dim_derog", glossary_link("diminutif") .. "-" .. glossary_link("hinaan")}, {"aug_derog", glossary_link("agam") .. "-" .. glossary_link("hinaan")}, {"end_derog", glossary_link("endearing") .. "-" .. glossary_link("hinaan")}, } local function insert_deriv_params(params) for _, deriv_param in ipairs(deriv_params) do local param = unpack(deriv_param) params[param] = list_param end end local param_mods = { t = { -- We need to store the <t:...> inline modifier into the "gloss" key of the parsed part, because that is what -- [[Module:links]] expects. item_dest = "gloss", }, gloss = {}, -- no 'tr' or 'ts', doesn't make sense for Italian g = { -- We need to store the <g:...> inline modifier into the "genders" key of the parsed part, because that is what -- [[Module:links]] expects. item_dest = "genders", sublist = true, }, id = {}, alt = {}, q = {type = "qualifier"}, qq = {type = "qualifier"}, lit = {}, pos = {}, -- no 'sc', doesn't make sense for Italian } local function parse_term_with_modifiers(paramname, val) local function generate_obj(term) local decomp = com.decompose(term) local lemma = com.remove_non_final_accents(decomp) if lemma ~= decomp then term = com.compose("[[" .. lemma .. "|" .. decomp .. "]]") end return {term = term} end local retval = require(parse_interface_module).parse_inline_modifiers(val, { paramname = paramname, param_mods = param_mods, generate_obj = generate_obj, splitchar = "[/;,]", preserve_splitchar = true, }) for _, obj in ipairs(retval) do if obj.delimiter == ";" then obj.separator = "; " elseif obj.delimiter == "/" then obj.separator = "/" -- default to nil for comma end end return retval end local function insert_deriv_inflections(data, args) for _, deriv_param in ipairs(deriv_params) do local param, desc = unpack(deriv_param) if #args[param] > 0 then local inflection = {label = desc} for _, term in ipairs(args[param]) do local parsed_terms = parse_term_with_modifiers(param, term) for _, parsed_term in ipairs(parsed_terms) do insert(inflection, parsed_term) end end insert(data.inflections, inflection) end end end ----------------------------------------------------------------------------------------- -- Nouns -- ----------------------------------------------------------------------------------------- local allowed_genders = m_table.listToSet( {"m", "f", "mf", "mfbysense", "mfequiv", "gneut", "n", "m-p", "f-p", "mf-p", "mfbysense-p", "mfequiv-p", "gneut-p", "n-p", "?", "?-p"} ) local function validate_genders(genders) for _, g in ipairs(genders) do if type(g) == "table" then g = g.spec end if not allowed_genders[g] then error("Unrecognized gender: " .. g) end end end local function do_noun(args, data, is_proper) local is_plurale_tantum = false local has_singular = false local category_plpos = data.checkredlinks if category_plpos == true then category_plpos = data.pos_category end local category_pos = m_en_utilities.singularize(category_plpos) validate_genders(args[1]) data.genders = args[1] local saw_m = false local saw_f = false local gender_for_default_plural -- Check for specific genders and pluralia tantum. 
for _, g in ipairs(args[1]) do if type(g) == "table" then g = g.spec end if g:find("-p$") then is_plurale_tantum = true else has_singular = true if g == "m" or g == "mf" or g == "mfbysense" then saw_m = true end if g == "f" or g == "mf" or g == "mfbysense" then saw_f = true end end end if saw_m and saw_f then gender_for_default_plural = "mf" elseif saw_f then gender_for_default_plural = "f" else gender_for_default_plural = "m" end local lemma = data.pagename local function inscat(cat) insert(data.categories, langname .. " " .. cat) end local function insert_noun_inflection(terms, label, accel, no_inv) for _, term in ipairs(terms) do if not no_inv and term.term == lemma then term.term = nil term.label = glossary_link("invariable") end end m_headword_utilities.insert_inflection { headdata = data, terms = terms, label = label, accel = accel and {form = accel} or nil, } end -- Plural local plurals = {} -- Fetch explicit masculine and feminine plurals here because we may change them below when processing plurals. local mpls = m_headword_utilities.parse_term_list_with_modifiers { paramname = "mpl", forms = args.mpl, splitchar = ",", } local fpls = m_headword_utilities.parse_term_list_with_modifiers { paramname = "fpl", forms = args.fpl, splitchar = ",", } if is_plurale_tantum and not has_singular then if args[2][1] then error("Can't specify plurals of plurale tantum " .. category_pos) end insert(data.inflections, {label = glossary_link("hanya jamak")}) elseif args.apoc then -- apocopated noun if args[2][1] then error("Can't specify plurals of apocopated " .. category_pos) end else -- Fetch plurals and associated qualifiers, labels and genders. plurals = m_headword_utilities.parse_term_list_with_modifiers { paramname = {2, "pl"}, forms = args[2], splitchar = ",", include_mods = {"g"}, } -- Check for special plural signals local mode = nil local pl1 = plurals[1] if pl1 and #pl1.term == 1 then mode = pl1.term if mode == "?" or mode == "!" or mode == "-" or mode == "~" then pl1.term = nil if next(pl1) then error(("Can't specify inline modifiers with plural code '%s'"):format(mode)) end remove(plurals, 1) -- Remove the mode parameter elseif mode ~= "+" and mode ~= "#" then error(("Unexpected plural code '%s'"):format(mode)) end end if is_plurale_tantum then -- both singular and plural insert(data.inflections, {label = "kadangkala " .. glossary_link("hanya jamak") .. ", dengan kelainan"}) end if mode == "?" then -- Plural is unknown insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak dikenal pasti") elseif mode == "!" then -- Plural is not attested insert(data.inflections, {label = "plural not attested"}) insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak ditentusahkan") if plurals[1] then error("Can't specify any plurals along with unattested plural code '!'") end elseif mode == "-" then -- Uncountable noun; may occasionally have a plural insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) -- If plural forms were given explicitly, then show "usually" if plurals[1] then insert(data.inflections, {label = "biasanya " .. glossary_link("tak berbilang")}) insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) else insert(data.inflections, {label = glossary_link("uncountable")}) end else -- Countable or mixed countable/uncountable -- If no plurals, use the default plural unless mpl= or fpl= explicitly given. 
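-- In the plural list processed below, "+" requests the default plural from
-- com.make_plural() for the gender worked out above, a spec beginning with "+" (such as
-- "+first" or "+first-last", cf. the sp= special indicators) chooses how a multiword
-- lemma is pluralised, and "#" stands for the lemma itself; a plural that ends up equal
-- to the lemma is displayed as "invariable" further down rather than repeated.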
if not plurals[1] and not mpls[1] and not fpls[1] and not is_proper then plurals[1] = {term = "+"} end if mode == "~" then -- Mixed countable/uncountable noun, always has a plural insert(data.inflections, {label = glossary_link("berbilang") .. " dan " .. glossary_link("tak berbilang")}) insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) elseif plurals[1] then -- Countable nouns insert(data.categories, category_plpos .. " berbilang bahasa " .. langname) else -- Uncountable nouns insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname) end end -- Process plurals, handling requests for default plurals. local has_default_or_hash = false for _, pl in ipairs(plurals) do if pl.term:find("^%+") or pl.term:find("#") or pl.term == "cap*" or pl.term == "cap*+" then has_default_or_hash = true break end end if has_default_or_hash then local newpls = {} local function insert_pl(pl, defpl) pl.term = defpl insert(newpls, pl) end local function make_gendered_plural(pl, special) if gender_for_default_plural == "mf" then local default_mpl = com.make_plural(lemma, "m", special) local default_fpl = com.make_plural(lemma, "f", special) if default_mpl then if default_mpl == default_fpl then insert_pl(pl, default_mpl) else if args.mpl[1] or args.fpl[1] then error("Can't specify gendered plural spec '" .. (special or "+") .. "' along with gender=" .. gender_for_default_plural .. " and also specify mpl= or fpl=") end mpls = {m_table.shallowCopy(pl)} mpls[1].term = default_mpl fpls = {pl} fpls[1].term = default_fpl end end else local defpl = com.make_plural(lemma, gender_for_default_plural, special) if defpl then insert_pl(pl, defpl) end end end for _, pl in ipairs(plurals) do if pl.term == "cap*" or pl.term == "cap*+" then make_gendered_plural(pl, pl.term) elseif pl.term == "+" then make_gendered_plural(pl) elseif pl.term:find("^%+") then local special = require(romut_module).get_special_indicator(pl.term) make_gendered_plural(pl, special) else insert_pl(pl, replace_hash_with_lemma(pl.term, lemma)) end end plurals = newpls end if plurals[2] then inscat(category_plpos .. " with multiple plurals") end -- If the first or only plural is the same as the singular, replace it with 'invariable', or 'usually -- invariable' if there is more than one plural. pl1 = plurals[1] if pl1 and pl1.term == lemma then if plurals[2] then insert(data.inflections, {label = "usually " .. glossary_link("invariable"), q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs }) else insert(data.inflections, {label = glossary_link("invariable"), q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs }) end remove(plurals, 1) inscat("indeclinable " .. category_plpos) end if plurals[1] then -- Check for gender-changing plurals. for _, pl in ipairs(plurals) do if pl.genders then for _, g in ipairs(pl.genders) do if type(g) ~= "table" then g = {spec = g} end if g.spec == "m" and not saw_m or g.spec == "f" and not saw_f then inscat(category_plpos .. " that change gender in the plural") end end end end end end -- Gather masculines/feminines. For each one, generate the corresponding plural. 
`field` is the name of the field -- containing the masculine or feminine forms (normally "m" or "f"); `inflect` is a function of one or two arguments -- to generate the default masculine or feminine from the lemma (the arguments are the lemma and optionally a -- "special" flag to indicate how to handle multiword lemmas, and the function is normally make_feminine or -- make_masculine from [[Module:it-common]]); and `default_plurals` is a list into which the corresponding default -- plurals of the gathered or generated masculine or feminine forms are stored. local function handle_mf(field, inflect, default_plurals) local special local mfs = m_headword_utilities.parse_term_list_with_modifiers { paramname = field, forms = args[field], splitchar = ",", frob = function(term) if term == "+" then -- Generate default masculine/feminine. term = inflect(lemma) else term = replace_hash_with_lemma(term, lemma) end special = require(romut_module).get_special_indicator(term) if special then term = inflect(lemma, special) end return term end } for _, mf in ipairs(mfs) do local plobj = m_table.shallowCopy(mf) plobj.term = com.make_plural(mf.term, field, special) if plobj.term then -- Add an accelerator for each masculine/feminine plural whose lemma is the corresponding singular, so that -- the accelerated entry that is generated has a definition that looks like -- # {{plural of|it|MFSING}} plobj.accel = {form = "p", lemma = mf.term} insert(default_plurals, plobj) end end return mfs end local feminine_plurals = {} local feminines = handle_mf("f", com.make_feminine, feminine_plurals) local masculine_plurals = {} local masculines = handle_mf("m", com.make_masculine, masculine_plurals) local function handle_mf_plural(mfplfield, mfpls, gender, default_plurals, singulars) if is_plurale_tantum then return mfpls, true end local new_mfpls = {} local saw_plus local noinv for i, mfpl in ipairs(mfpls) do local accel if #mfpls == #singulars then -- If same number of overriding masculine/feminine plurals as singulars, assume each plural goes with -- the corresponding singular and use each corresponding singular as the lemma in the accelerator. The -- generated entry will have -- # {{plural of|it|SINGULAR}} -- as the definition. accel = {form = "p", lemma = singulars[i].term} else accel = nil end if mfpl.term == "+" then -- We should never see + twice. If we do, it will lead to problems since we overwrite the values of -- default_plurals the first time around. if saw_plus then error(("Saw + twice when handling %s="):format(mfplfield)) end saw_plus = true if not default_plurals[1] then local defpl = com.make_plural(lemma, gender) if not defpl then error("Unable to generate default plural of '" .. lemma .. 
"'") end default_plurals[1] = {term = defpl} end for _, defpl in ipairs(default_plurals) do -- defpl is already a table and has an accel field m_headword_utilities.combine_termobj_qualifiers_labels(defpl, mfpl) insert(new_mfpls, defpl) end -- don't use "invariable" because the plural is not with respect to the lemma but with respect to the -- masc/fem singular noinv = true elseif mfpl.term == "cap*" or mfpl.term == "cap*+" or mfpl.term:find("^%+") then if mfpl.term:find("^%+") then mfpl.term = require(romut_module).get_special_indicator(mfpl.term) end if singulars[1] then for _, mf in ipairs(singulars) do local mfplobj = m_table.shallowCopy(mfpl) mfplobj.term = com.make_plural(mf.term, gender, mfpl.term) if mfplobj.term then mfplobj.accel = accel m_headword_utilities.combine_termobj_qualifiers_labels(mfplobj, mf) insert(new_mfpls, mfplobj) end -- don't use "invariable" because the plural is not with respect to the lemma but with respect -- to the masc/fem singular noinv = true -- FIXME: Should we throw an error if no plural could be generated? end else -- FIXME: This clause didn't exist in the corresponding code in [[Module:pt-headword]]. Is it -- correct? mfpl.term = com.make_plural(lemma, gender, mfpl.term) if mfpl.term then insert(new_mfpls, mfpl) end end else mfpl.accel = accel mfpl.term = replace_hash_with_lemma(mfpl.term, lemma) insert(new_mfpls, mfpl) -- don't use "invariable" if masc/fem singular present because the plural is not with respect to -- the lemma but with respect to the masc/fem singular noinv = noinv or #singulars > 0 end end return new_mfpls, noinv end local mpl_noinv, fpl_noinv -- Not fpls[1] because if the user didn't specify any explicit mpl= or fpl= but the lemma gender is mf or mfbysense -- and has separate masculine and feminine plural forms (e.g. any term in -ista), we don't want to reprocess those -- auto-generated forms. if args.fpl[1] then -- Override any existing feminine plurals. feminine_plurals, fpl_noinv = handle_mf_plural("fpl", fpls, "f", feminine_plurals, feminines) else feminine_plurals, fpl_noinv = fpls, false end if args.mpl[1] then -- Override any existing masculine plurals. masculine_plurals, mpl_noinv = handle_mf_plural("mpl", mpls, "m", masculine_plurals, masculines) else masculine_plurals, mpl_noinv = mpls, false end local function redundant_plural(pl) for _, p in ipairs(plurals) do if p.term == pl.term then return true end end return false end for _, mpl in ipairs(masculine_plurals) do if redundant_plural(mpl) then track("noun-redundant-mpl") end end for _, fpl in ipairs(feminine_plurals) do if redundant_plural(fpl) then track("noun-redundant-fpl") end end if plurals[1] then -- Set 'noinv' because we already took care of invariable plurals above. insert_noun_inflection(plurals, "plural", "p", "noinv") end insert_noun_inflection(masculines, "masculine") insert_noun_inflection(masculine_plurals, "masculine plural", nil, mpl_noinv) insert_noun_inflection(feminines, "feminine", "f") insert_noun_inflection(feminine_plurals, "feminine plural", nil, fpl_noinv) local function parse_and_insert_noun_inflection(field, label, accel) parse_and_insert_inflection(data, args, field, label, accel) end parse_and_insert_noun_inflection("adj", glossary_link("relational", "relational adjective")) parse_and_insert_noun_inflection("adv", glossary_link("adverb")) parse_and_insert_noun_inflection("dem", glossary_link("demonym")) parse_and_insert_noun_inflection("fdem", "female " .. 
glossary_link("demonym")) insert_deriv_inflections(data, args) -- Maybe add category 'Italian nouns with irregular gender' (or similar) local irreg_gender_lemma = lemma:gsub(" .*", "") -- only look at first word if (irreg_gender_lemma:find("o$") and (gender_for_default_plural == "f" or gender_for_default_plural == "mf" or gender_for_default_plural == "mfbysense")) or (irreg_gender_lemma:find("a$") and (gender_for_default_plural == "m" or gender_for_default_plural == "mf" or gender_for_default_plural == "mfbysense")) then inscat(category_plpos .. " dengan genus tak tentu") end end local function get_noun_params(nountype) local params = { [1] = {list = "g", disallow_holes = true, required = nountype ~= "proper", default = "?", type = "genders", flatten = true}, [2] = {list = "pl", disallow_holes = true}, ["m"] = list_param, ["f"] = list_param, ["mpl"] = list_param, ["fpl"] = list_param, ["adj"] = list_param, --adjective(s) ["adv"] = list_param, --adverb(s) ["dem"] = list_param, --demonym(s) ["fdem"] = list_param, --female demonym(s) } insert_deriv_params(params) return params end pos_functions["Kata nama"] = { params = get_noun_params("base"), func = do_noun, } pos_functions["Kata nama khas"] = { params = get_noun_params("proper"), func = function(args, data) do_noun(args, data, "is proper noun") end, } pos_functions["Kata nama kardinal"] = { params = get_noun_params("base"), func = function(args, data) do_noun(args, data) insert(data.categories, 1, "Nombor kardinal " .. langname) end, pos_category = "Kata bilangan", } ----------------------------------------------------------------------------------------- -- Adjectives -- ----------------------------------------------------------------------------------------- local function do_adjective(args, data, is_superlative) local feminines = {} local masculine_plurals = {} local feminine_plurals = {} -- Use "participle" not "past participle" for categories such as 'invariable participles' local category_plpos = data.checkredlinks if category_plpos == true then category_plpos = data.pos_category end local category_pos = m_en_utilities.singularize(category_plpos) if args.sp then local romut = require(romut_module) if not romut.allowed_special_indicators[args.sp] then local indicators = {} for indic, _ in pairs(romut.allowed_special_indicators) do insert(indicators, "'" .. indic .. "'") end table.sort(indicators) error("Special inflection indicator beginning can only be " .. mw.text.listToText(indicators) .. ": " .. args.sp) end end local lemma = data.pagename local function fetch_inflections(field) local retval = m_headword_utilities.parse_term_list_with_modifiers { paramname = field, forms = args[field], splitchar = ",", } if not retval[1] then return {{term = "+"}} end return retval end local function insert_inflection(terms, label, accel) m_headword_utilities.insert_inflection { headdata = data, terms = terms, label = label, accel = accel and {form = accel} or nil, } end if args.inv then -- invariable adjective insert(data.inflections, {label = glossary_link("invariable")}) insert(data.categories, langname .. " indeclinable " .. 
category_plpos) end if args.noforms then -- [[bello]] and any others too complicated to describe in headword insert(data.inflections, {label = "see below for inflection"}) end if args.inv or args.apoc or args.noforms then if args.sp or args.f[1] or args.pl[1] or args.mpl[1] or args.fpl[1] then error("Can't specify inflections with an invariable or apocopated adjective or with noforms=") end elseif args.fonly then -- feminine-only if args.f[1] then error("Can't specify explicit feminines with feminine-only " .. category_pos) end if args.pl[1] then error("Can't specify explicit plurals with feminine-only " .. category_pos .. ", use fpl=") end if args.mpl[1] then error("Can't specify explicit masculine plurals with feminine-only " .. category_pos) end local argsfpl = fetch_inflections("fpl") for _, fpl in ipairs(argsfpl) do if fpl.term == "+" then local defpl = com.make_plural(lemma, "f", args.sp) if not defpl then error("Unable to generate default plural of '" .. lemma .. "'") end fpl.term = defpl else fpl.term = replace_hash_with_lemma(fpl.term, lemma) end insert(feminine_plurals, fpl) end insert(data.inflections, {label = "feminine-only"}) insert_inflection(feminine_plurals, "feminine plural", "f|p") else -- Gather feminines. for _, f in ipairs(fetch_inflections("f")) do if f.term == "+" then -- Generate default feminine. f.term = com.make_feminine(lemma, args.sp) else f.term = replace_hash_with_lemma(f.term, lemma) end insert(feminines, f) end local fem_like_lemma = #feminines == 1 and feminines[1].term == lemma and not m_headword_utilities.termobj_has_qualifiers_or_labels(feminines[1]) if fem_like_lemma then insert(data.categories, langname .. " epicene " .. category_plpos) end local mpl_field = "mpl" local fpl_field = "fpl" if args.pl[1] then if args.mpl[1] or args.fpl[1] then error("Can't specify both pl= and mpl=/fpl=") end mpl_field = "pl" fpl_field = "pl" end local argsmpl = fetch_inflections(mpl_field) local argsfpl = fetch_inflections(fpl_field) for _, mpl in ipairs(argsmpl) do if mpl.term == "+" then -- Generate default masculine plural. local defpl = com.make_plural(lemma, "m", args.sp) if not defpl then error("Unable to generate default plural of '" .. lemma .. "'") end mpl.term = defpl else mpl.term = replace_hash_with_lemma(mpl.term, lemma) end insert(masculine_plurals, mpl) end for _, fpl in ipairs(argsfpl) do if fpl.term == "+" then for _, f in ipairs(feminines) do -- Generate default feminine plural; f is a table. local fplobj = m_table.shallowCopy(fpl) local defpl = com.make_plural(f.term, "f", args.sp) if not defpl then error("Unable to generate default plural of '" .. f.term .. "'") end fplobj.term = defpl m_headword_utilities.combine_termobj_qualifiers_labels(fplobj, f) insert(feminine_plurals, fplobj) end else fpl.term = replace_hash_with_lemma(fpl.term, lemma) insert(feminine_plurals, fpl) end end local fem_pl_like_masc_pl = masculine_plurals[1] and feminine_plurals[1] and m_table.deepEquals(masculine_plurals, feminine_plurals) local masc_pl_like_lemma = #masculine_plurals == 1 and masculine_plurals[1].term == lemma and not m_headword_utilities.termobj_has_qualifiers_or_labels(masculine_plurals[1]) if fem_like_lemma and fem_pl_like_masc_pl and masc_pl_like_lemma then -- actually invariable insert(data.inflections, {label = glossary_link("invariable")}) insert(data.categories, langname .. " indeclinable " .. category_plpos) else -- Make sure there are feminines given and not same as lemma. 
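-- When the adjective does inflect: show the feminine singular if it differs from the
-- lemma (otherwise only the combined gender is recorded on the headword), then either a
-- single "plural" line when the masculine and feminine plurals coincide, or separate
-- masculine plural / feminine plural lines when they do not.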
if not fem_like_lemma then insert_inflection(feminines, "feminine", "f|s") elseif args.gneut then data.genders = {"gneut"} else data.genders = {"mfbysense"} end if fem_pl_like_masc_pl then if args.gneut then insert_inflection(masculine_plurals, "plural", "p") else -- This is how the Spanish module works. -- insert_inflection(masculine_plurals, "masculine and feminine plural", "p") insert_inflection(masculine_plurals, "plural", "p") end else insert_inflection(masculine_plurals, "masculine plural", "m|p") insert_inflection(feminine_plurals, "feminine plural", "f|p") end end end local function parse_and_insert_adj_inflection(field, label, accel, frob) parse_and_insert_inflection(data, args, field, label, accel, frob) end parse_and_insert_adj_inflection("n", "neuter") parse_and_insert_adj_inflection("comp", glossary_link("comparative")) parse_and_insert_adj_inflection("sup", glossary_link("superlative")) parse_and_insert_adj_inflection("adv", glossary_link("adverb")) insert_deriv_inflections(data, args) if args.irreg and is_superlative then insert(data.categories, langname .. " irregular superlative " .. category_plpos) end end local function get_adjective_params(adjtype) local params = { ["inv"] = boolean_param, --invariable ["noforms"] = boolean_param, --too complicated to list forms except in a table ["sp"] = true, -- special indicator: "first", "first-last", etc. ["f"] = list_param, --feminine form(s) ["pl"] = list_param, --plural override(s) ["fpl"] = list_param, --feminine plural override(s) ["mpl"] = list_param, --masculine plural override(s) ["adv"] = list_param, --adverb(s) } if adjtype == "base" or adjtype == "part" or adjtype == "det" then params["comp"] = list_param --comparative(s) params["sup"] = list_param --superlative(s) params["fonly"] = boolean_param -- feminine only end if adjtype == "sup" then params["irreg"] = boolean_param end insert_deriv_params(params) return params end pos_functions["Kata sifat"] = { params = get_adjective_params("base"), func = do_adjective, } pos_functions["Kata sifat bandingan"] = { params = get_adjective_params("comp"), func = do_adjective, pos_category = "Kata sifat", } pos_functions["Kata sifat superlatif"] = { params = get_adjective_params("sup"), func = function(args, data) do_adjective(args, data, "is superlative") end, pos_category = "Kata sifat", } pos_functions["Kata sifat kardinal"] = { params = get_adjective_params("card"), func = function(args, data) do_adjective(args, data) insert(data.categories, 1, "Nombor kardinal bahasa " .. langname) end, pos_category = "numerals", } pos_functions["past participles"] = { params = get_adjective_params("part"), func = do_adjective, redlink_pos = "participles", } pos_functions["present participles"] = { params = get_adjective_params("part"), func = do_adjective, redlink_pos = "participles", } pos_functions["determiners"] = { params = get_adjective_params("det"), func = do_adjective, } pos_functions["articles"] = { params = get_adjective_params("det"), func = do_adjective, } pos_functions["adjective-like pronouns"] = { params = get_adjective_params("pron"), func = do_adjective, pos_category = "pronouns", } pos_functions["cardinal invariable"] = { params = {}, func = function(args, data) insert(data.categories, langname .. " cardinal numbers") insert(data.categories, langname .. 
" indeclinable numerals") insert(data.inflections, {label = glossary_link("invariable")}) end, pos_category = "numerals", } ----------------------------------------------------------------------------------------- -- Adverbs -- ----------------------------------------------------------------------------------------- local function do_adverb(args, data) local function parse_and_insert_adv_inflection(field, label, accel, frob) parse_and_insert_inflection(data, args, field, label, accel, frob) end parse_and_insert_adv_inflection("comp", glossary_link("comparative")) parse_and_insert_adv_inflection("sup", glossary_link("superlative")) parse_and_insert_adv_inflection("adj", glossary_link("adjective")) end local function get_adverb_params(advtype) local params = { ["adj"] = list_param, --adjective(s) } if advtype == "base" then params["comp"] = list_param --comparative(s) params["sup"] = list_param --superlative(s) end return params end pos_functions["Adverba"] = { params = get_adverb_params("base"), func = do_adverb, } pos_functions["comparative adverbs"] = { params = get_adverb_params("comp"), func = do_adverb, pos_category = "Adverba", } pos_functions["superlative adverbs"] = { params = get_adverb_params("sup"), func = do_adverb, pos_category = "Adverba", } ----------------------------------------------------------------------------------------- -- Verbs -- ----------------------------------------------------------------------------------------- pos_functions["Kata kerja"] = { params = { [1] = {}, ["noautolinktext"] = boolean_param, ["noautolinkverb"] = boolean_param, }, func = function(args, data) if args[1] then local alternant_multiword_spec = require(it_verb_module).do_generate_forms(args, "from headword", data.heads[1]) local function do_verb_form(slot, label, rowslot, rowlabel) local forms = alternant_multiword_spec.forms[slot] local retval if alternant_multiword_spec.rowprops.all_defective[rowslot] then if not alternant_multiword_spec.rowprops.defective[rowslot] then -- No forms, but none expected; don't display anything return end retval = {label = "no " .. rowlabel} elseif not forms then retval = {label = "no " .. label} elseif alternant_multiword_spec.rowprops.all_unknown[rowslot] then retval = {label = "unknown " .. rowlabel} elseif forms[1].form == "?" then retval = {label = "unknown " .. label} else -- Disable accelerators for now because we don't want the added accents going into the headwords. -- FIXME: We now have support in [[Module:accel]] to specify the target explicitly; we can use this -- so we can add the accelerators back with a param to avoid the accents. local accel_form = nil -- all_verb_slots[slot] retval = {label = label, accel = accel_form and {form = accel_form} or nil} local prev_footnotes = nil -- If the footnotes for this form are the same as the footnotes for the preceding form or -- contain the preceding footnotes, replace the footnotes that are the same with "ditto". -- This avoids repetition on pages like [[succedere]] where the form ''succedétti'' has a long -- footnote which gets repeated in the traditional form ''succedètti'' (which also has the -- footnote "[traditional]"). for _, form in ipairs(forms) do local quals, refs = require(inflection_utilities_module). 
convert_footnotes_to_qualifiers_and_references(form.footnotes) local quals_with_ditto = quals if quals and prev_footnotes then local quals_contains_previous = true for _, qual in ipairs(prev_footnotes) do if not m_table.contains(quals, qual) then quals_contains_previous = false break end end if quals_contains_previous then local inserted_ditto = false quals_with_ditto = {} for _, qual in ipairs(quals) do if m_table.contains(prev_footnotes, qual) then if not inserted_ditto then insert(quals_with_ditto, "ditto") inserted_ditto = true end else insert(quals_with_ditto, qual) end end end end prev_footnotes = quals insert(retval, {term = form.form, q = quals_with_ditto, refs = refs}) end end insert(data.inflections, retval) end if alternant_multiword_spec.props.is_pronominal then insert(data.inflections, {label = glossary_link("pronominal")}) end if alternant_multiword_spec.props.impers then insert(data.inflections, {label = glossary_link("impersonal")}) end if alternant_multiword_spec.props.thirdonly then insert(data.inflections, {label = "third-person only"}) end local thirdonly = alternant_multiword_spec.props.impers or alternant_multiword_spec.props.thirdonly local sing_label = thirdonly and "third-person singular" or "first-person singular" for _, rowspec in ipairs { {"pres", "present", true}, {"phis", "past historic", true}, {"pp", "past participle", true}, {"imperf", "imperfect"}, {"fut", "future"}, {"sub", "subjunctive"}, {"impsub", "imperfect subjunctive"}, } do local rowslot, desc, always_show = unpack(rowspec) local slot = rowslot .. (thirdonly and "3s" or "1s") local must_show = alternant_multiword_spec.is_irreg[slot] if always_show then must_show = true elseif rowslot == "imperf" and alternant_multiword_spec.props.has_explicit_stem_spec then -- If there is an explicit stem spec, make sure it gets displayed; the imperfect is a good way of -- showing this. must_show = true elseif not alternant_multiword_spec.forms[slot] then -- If the principal part is unexpectedly missing, make sure we show this. must_show = true elseif alternant_multiword_spec.forms[slot][1].form == "?" then -- If the principal part is unknown, make sure we show this. must_show = true end if must_show then if rowslot == "pp" then do_verb_form(rowslot, desc, rowslot, desc) else do_verb_form(slot, sing_label .. " " .. desc, rowslot, desc) end end end -- Also do the imperative, but not for third-only verbs, which are always missing the imperative. if not thirdonly and (alternant_multiword_spec.is_irreg.imp2s or not alternant_multiword_spec.forms.imp2s) then do_verb_form("imp2s", "second-person singular imperative", "imp", "imperative") end -- If there is a past participle but no auxiliary (e.g. [[malfare]]), explicitly add "no auxiliary". In -- cases where there's no past participle and no auxiliary (e.g. [[irrompere]]), we don't do this as we -- already get "no past participle" displayed. Don't display an auxiliary in any case if the lemma -- consists entirely of reflexive verbs (for which the auxiliary is always [[essere]]). if alternant_multiword_spec.props.is_non_reflexive and ( alternant_multiword_spec.forms.aux or alternant_multiword_spec.forms.pp ) then do_verb_form("aux", "auxiliary", "aux", "auxiliary") end -- Add categories. 
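-- The loop below copies every category generated by [[Module:it-verb]] onto the headword
-- data; after that, when the user supplied no explicit head (or a single head whose links
-- could be folded into 1=), data.heads is rebuilt from the accented infinitive forms so
-- that the headword line shows the stress marks that the page title lacks.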
for _, cat in ipairs(alternant_multiword_spec.categories) do insert(data.categories, cat) end -- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to -- incorporate any links in that head into the 1= specification, use the infinitive generated by -- [[Module:it-verb]] in place of the user-specified or auto-generated head so that we get accents marked -- on the verb(s). Don't do this if the user gave multiple heads or gave a head with a multiword-linked -- verbal expression such as '[[dare esca]] [[al]] [[fuoco]]'. if #data.user_specified_heads == 0 or ( #data.user_specified_heads == 1 and alternant_multiword_spec.incorporated_headword_head_into_lemma ) then data.heads = {} for _, lemma_obj in ipairs(alternant_multiword_spec.forms.inf) do local quals, refs = require(inflection_utilities_module). convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes) insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs}) end end end end } ----------------------------------------------------------------------------------------- -- Suffix forms -- ----------------------------------------------------------------------------------------- pos_functions["Bentuk akhiran"] = { params = { [1] = {required = true, list = true, disallow_holes = true}, ["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true}, }, func = function(args, data) validate_genders(args.g) data.genders = args.g local suffix_type = {} for _, typ in ipairs(args[1]) do insert(suffix_type, typ .. "-forming suffix") end insert(data.inflections, {label = "non-lemma form of " .. m_table.serialCommaJoin(suffix_type, {conj = "or"})}) end, } ----------------------------------------------------------------------------------------- -- Arbitrary parts of speech -- ----------------------------------------------------------------------------------------- pos_functions["arbitrary part of speech"] = { params = { [1] = {required = true}, ["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true}, }, func = function(args, data) if data.is_suffix then error("Can't use [[Template:it-pos]] with suffixes") end validate_genders(args.g) data.genders = args.g local plpos = m_en_utilities.pluralize(args[1]) data.pos_category = plpos end, } return export fmkyqlfvyjsu95owimlbwfw6333j55j suminding 0 25411 281458 213202 2026-04-23T00:40:20Z GodModeBoros 10321 281458 wikitext text/x-wiki ==Bahasa Kadazandusun== ===Takrifan=== ====Kata kerja==== {{inti|dtp|kata kerja}} # [[menyanyi]] #: {{syn|dtp|lumoyou}} #: {{syn|dtp|monondig}} #: {{ux|dtp|Orohian no moti i Kuding di do '''suminding'''.
|Kuding sangat suka '''menyanyi'''.}} ===Sebutan=== * {{IPA|dtp|/sʊ.min.diŋ/}} * {{penyempangan|dtp|su|min|ding}} 4zzjun2vvgrlq1iebjqwzwk8bolqkws anid 0 39910 281450 153360 2026-04-22T12:57:42Z Hakimi97 2668 /* Kata nama */ 281450 wikitext text/x-wiki ==Bahasa Lun dayeh== ===Takrifan=== ====Kata nama==== {{inti|lnd|kata nama}} # setiap 7fc7rinw672a8n6uzwy3hi3qk6h0vkb 281451 281450 2026-04-22T12:57:54Z Hakimi97 2668 281451 wikitext text/x-wiki ==Bahasa Lun Dayeh== ===Takrifan=== ====Kata nama==== {{inti|lnd|kata nama}} # setiap npc1mf02zj3znal8zur7kb78dm1sxlo mamaso 0 97162 281457 260609 2026-04-23T00:38:58Z GodModeBoros 10321 281457 wikitext text/x-wiki ==Bahasa Kadazandusun== ===Takrifan=== ====Kata nama==== {{inti|dtp|kata nama}} # [[semasa]] #: {{cp|dtp|Mamain tadi ku o tolipun ku '''mamaso''' nokodop oku di konihab.| Adik saya bermain telefon saya '''semasa''' saya tertidur kelmarin.}} 515ov3yrv1ylzpgaxc292qwbr0qcqf2 Kategori:Perkataan bahasa Korea dengan kunci isih tidak lewah dan tidak automatik 14 114937 281452 2026-04-22T15:26:55Z Hakimi97 2668 Mencipta laman baru dengan kandungan '{{auto cat}}' 281452 wikitext text/x-wiki {{auto cat}} eomzlm5v4j7ond1phrju7cnue91g5qx