Wikikamus
mswiktionary
https://ms.wiktionary.org/wiki/Wikikamus:Laman_Utama
MediaWiki 1.46.0-wmf.24
case-sensitive
Media
Khas
Perbincangan
Pengguna
Perbincangan pengguna
Wikikamus
Perbincangan Wikikamus
Fail
Perbincangan fail
MediaWiki
Perbincangan MediaWiki
Templat
Perbincangan templat
Bantuan
Perbincangan bantuan
Kategori
Perbincangan kategori
Lampiran
Perbincangan lampiran
Rima
Perbincangan rima
Tesaurus
Perbincangan tesaurus
Indeks
Perbincangan indeks
Petikan
Perbincangan petikan
Rekonstruksi
Perbincangan rekonstruksi
Padanan isyarat
Perbincangan padanan isyarat
Konkordans
Perbincangan konkordans
TimedText
TimedText talk
Modul
Perbincangan modul
Acara
Perbincangan acara
batu
0
5878
281462
252914
2026-04-23T04:40:07Z
Hakimi97
2668
/* Terjemahan */ Buang ralat kod kerana parameter xs sudah tidak digunakan lagi (cari menggunakan regex \|xs=[^}]* )
281462
wikitext
text/x-wiki
{{wikipedia|Batu}}
{{wikipedia|Batu (ukuran)}}
== Bahasa Melayu ==
[[Imej:Plagioclase_porphyry.jpg|thumb|Batu.]]
===Etimologi===
Turunan {{inh|ms|poz-mly-pro|*batu}}, lanjutan daripada {{inh|ms|poz-pro|*batu}}, daripada akar {{inh|ms|map-pro|*batu}}. Sewarisan {{cog|tl|bato}}.
===Takrifan===
{{ms-kn|j=باتو}}
# [[bahan]] [[galian]] [[keras]] yang berasal dari Bumi tetapi bukan [[logam]] dan terdiri daripada pelbagai jenis.
#: {{cp|ms|Mereka sedang mengutip '''batu''' di sungai.}}
# sejenis logam kecil yang dipakai dalam alat pemantik [[api]] (untuk mencetuskan api).
# [[permata]].
# [[ukuran]] [[jarak]] sejauh 1,760 [[ela]] (1.61 [[kilometer]]).
#: {{cp|ms|Berapa '''batu''' jauhnya dari Simpang Tiga ke Simpang Empat?}}
# [[penjodoh bilangan]] bagi gigi.
====Kata majmuk====
{{col3|ms|batu api|batu asah|batu bata|batu-batan|batu berani|batu bersurat|batu dacing|batu giling|batu igneus|batu kapur|batu karang|batu kelikir|batu kepala|batu kisar|batu lada|batu loncatan|batu marmar|batu permata|batu telerang|batu timbang}}
===Sebutan===
* {{dewan|ba|tu}}
* {{IPA|ms|/batu/}}
* {{rhymes|ms|atu|tu|u}}
* {{audio|ms|Ms-MY-batu.ogg|Audio (MY)}}
===Kata terbitan===
* {{l|ms|berbatu}}
* {{l|ms|berbatu-batu}}
* {{l|ms|membatu}}
===Terjemahan===
{{ter-atas|benda keras semula jadi}}
* Albania: {{t+|sq|gur|m}}
* Arab: {{t|ar|صخر}}
* Armenia: {{t|hy|քար|tr=k‘ar}}
* Asturia: {{t|ast|piedra|f}}
* Basque: {{t|eu|harri}}
* Belanda: {{t+|nl|steen|m}}
* Breton: {{t|br|maen}}, {{t|br|mein|p}}
* Bulgaria: {{t+|bg|камък|m|tr=kamǎk|sc=Cyrl}}
* Catalonia: {{t|ca|pedra|f}}
* Cina: {{t+|zh|石|tr=shí|sc=Hani}}
* Chuvash: {{t|cv|чул|tr=chul}}
* Croatia: {{t+|hr|kamen|m}}
* Czech: {{t+|cs|kámen|m}}
* Denmark: {{t-|da|sten|c}}
* Esperanto: {{t-|eo|ŝtono}}
* Estonia: {{t-|et|kivi}}
* Finland: {{t+|fi|kivi}}
* Frisia Utara: {{t|frr|stien}}
* Gaelik Scot: {{t|gd|clach|f}}
* Guaraní: {{t|gn|ita}}
* Hawaii: {{t|haw|haku}}
* Hungary: {{t+|hu|kő}}
* Iceland: {{t+|is|steinn|m}}
* Ido: {{t|io|petro}}
* Ilocano: {{t|ilo|bato}}
* Inggeris: {{t+|en|rock}}, {{t+|en|stone}}
* Inggeris Lama: {{t-|ang|stan}}
* Interlingua: {{t|ia|petra}}
* Ireland: {{t|ga|cloch|f}}
* Itali: {{t+|it|pietra|f}}, {{t+|it|roccia|f}}
* Jepun: {{t+|ja|石|tr=ishi|sc=Jpan}} (いし)
* Jerman: {{t+|de|Stein|m}}
* Korea: {{t|ko|돌|tr=dol}}
* Kurdi: {{t+|kmr|kevir}}, {{t+|kmr|ber}}, {{t+|kmr|berd}}, {{t+|kmr|kuç}}, {{t+|ckb|بەرد}}
* Latin: {{t+|la|lapis|m}}
* Latvia: {{t-|lv|akmens|m}}
* Lithuania: {{t|lt|akmuo}}
* Malagasy: {{t|mg|vato}}
* Maori: {{t-|mi|whatu}}
* Perancis: {{t+|fr|pierre|f}}
* Pitjantjatjara: {{t|pjt|apu}}
* Poland: {{t+|pl|kamień|m}}
* Portugis: {{t+|pt|pedra|f}}
* Romania: {{t-|ro|piatră|f}}
* Rusia: {{t+|ru|камень|m|tr=kámen’|sc=Cyrl}}
* Sardinia (Nugor): {{t|sc|preda|f}}
* Scotland: {{t|sco|stane}}
* Sami Utara: {{t|se|geađgi}}
* Saxon Rendah: {{t|nds|steen}}
* Sepanyol: {{t+|es|piedra|f}}
* Serbia: {{t-|sr|kamen|m}}
* Slovakia: {{t|sk|kameň|m}}
* Slovenia: {{t+|sl|kamen|m}}
* Suluk: {{t+|tsg|batu}}
* Sweden: {{t+|sv|sten|c}}
* Tagalog: {{t+|tl|bato}}
* Tupinambá: {{t|tpn|itá}}
* Turki: {{t+|tr|taş}}
* Ukraine: {{t|uk|камінь|m|tr=kamin’}}
* Vietnam: {{t+|vi|đá}}
* Wales: {{t-|cy|carreg|f}}
* Yunani: {{t+|el|λίθος|m|tr=líthos|sc=Grek}}
{{ter-bawah}}
{{ter-atas|ukuran}}
* Indonesia: {{t|id|mil}}
* Inggeris: {{t+|en|mile}}
* Itali: {{t|it|miglio|m}}
* Perancis: {{t|fr|mille|m}}
* Thai: {{t|th|ไมล์}}
* Vietnam: {{t|vi|dặm Anh}}
{{ter-bawah}}
==Bahasa Iban==
===Etimologi===
Turunan {{inh|iba|poz-pro|*batu}} daripada {{inh|iba|map-pro|*batu}}. Sewarisan {{cog|tl|bato}}.
===Sebutan===
* {{penyempangan|iba|ba|tu}}
* {{IPA|iba|/batu/}}
===Kata nama===
{{head|iba|kata nama}}
# [[batu]]
#: {{cp|iba|Tikau iya '''batu''' nya ke sungai.|Dia membaling '''batu''' itu ke sungai.}}
#: {{cp|iba|Iya deka nikau aku ngena '''batu'''.|Dia hendak membaling saya menggunakan '''batu'''.}}
== Bahasa Indonesia ==
===Kata nama===
{{inti|id|kata nama}}
# Lihat [[#Takrifan|takrifan bahasa Melayu]].
# [[bateri]] (lampu suluh).
[[Kategori:Ukuran]]
==Bahasa Kadazandusun==
===Takrifan===
====Kata nama====
{{inti|dtp|kata nama}}
# batu (jarak).
#: {{cp|dtp|Piro po '''batu''' sinodu tinadalanon tokou?|Berapa '''batu''' lagi jarak perjalanan kita?}}
===Etimologi===
{{inh+|dtp|poz-pro|*batu}}, daripada {{inh|dtp|map-pro|*batu}}.
===Sebutan===
* {{IPA|dtp|/ɓa.tʊ/}}
* {{penyempangan|dtp|ba|tu}}
* {{audio|dtp|LL-Q5317225 (dtp)-Nelynnnnn-batu.wav|Audio}}
===Kata terbitan===
*{{l|dtp|sabatu}}
===Rujukan===
* {{R:Komoiboros DusunKadazan}}
== Bahasa Bugis ==
===Takrifan===
====Kata nama====
{{inti|bug|kata nama}}
# batu
===Etimologi===
{{inh+|bug|poz-pro|*batu}}, daripada {{inh|bug|map-pro|*batu}}.
== Bahasa Suluk ==
===Takrifan===
====Kata nama====
{{inti|tsg|kata nama}}
# batu
===Etimologi===
{{inh+|tsg|phi-pro|*batu}}, daripada {{inh|tsg|poz-pro|*batu}}, daripada {{inh|tsg|map-pro|*batu}}. Banding {{cog|ceb|bato}} dan {{cog|tl|bato}}.
===Kata terbitan===
{{der3|tsg|kabatuhan|tubig batu<qq:ais; air batu>|mabatu<qq:berbatu>|batu balani|batu lantup|atay-batu|batu hihilug}}
szp9kpf123je7u14i8dpjvg0kcra5br
Templat:it-adj
10
6413
281466
245684
2026-04-23T09:45:35Z
Hakimi97
2668
281466
wikitext
text/x-wiki
<includeonly>{{#invoke:it-headword|show|Kata sifat}}</includeonly><noinclude>{{documentation}}</noinclude>
3r2u8lq81rjux9b8olgm88g3d02t8ep
Modul:headword
828
9757
281453
281238
2026-04-22T15:30:22Z
Hakimi97
2668
281453
Scribunto
text/plain
local export = {}
-- Named constants for all modules used, to make it easier to swap out sandbox versions.
local debug_track_module = "Module:debug/track"
local en_utilities_module = "Module:en-utilities"
local gender_and_number_module = "Module:gender and number"
local headword_data_module = "Module:headword/data"
local headword_page_module = "Module:headword/page"
local links_module = "Module:links"
local load_module = "Module:load"
local pages_module = "Module:pages"
local palindromes_module = "Module:palindromes"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local scripts_data_module = "Module:scripts/data"
local script_utilities_module = "Module:script utilities"
local script_utilities_data_module = "Module:script utilities/data"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"
local concat = table.concat
local dump = mw.dumpObject
local insert = table.insert
local ipairs = ipairs
local max = math.max
local new_title = mw.title.new
local pairs = pairs
local require = require
local toNFC = mw.ustring.toNFC
local toNFD = mw.ustring.toNFD
local type = type
local ufind = mw.ustring.find
local ugmatch = mw.ustring.gmatch
local ugsub = mw.ustring.gsub
local umatch = mw.ustring.match
--[==[
Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.]==]
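-- The pattern, sketched with a hypothetical function `foo` exported by `foo_module`:
--   local function foo(...)
--       foo = require(foo_module).foo -- first call replaces this stub with the real function
--       return foo(...)
--   end
-- All loaders below follow this shape.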
local function debug_track(...)
debug_track = require(debug_track_module)
return debug_track(...)
end
local function encode_entities(...)
encode_entities = require(string_utilities_module).encode_entities
return encode_entities(...)
end
local function extend(...)
extend = require(table_module).extend
return extend(...)
end
local function find_best_script_without_lang(...)
find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang
return find_best_script_without_lang(...)
end
local function format_categories(...)
format_categories = require(utilities_module).format_categories
return format_categories(...)
end
local function format_genders(...)
format_genders = require(gender_and_number_module).format_genders
return format_genders(...)
end
local function format_pron_qualifiers(...)
format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers
return format_pron_qualifiers(...)
end
local function full_link(...)
full_link = require(links_module).full_link
return full_link(...)
end
local function get_current_L2(...)
get_current_L2 = require(pages_module).get_current_L2
return get_current_L2(...)
end
local function get_link_page(...)
get_link_page = require(links_module).get_link_page
return get_link_page(...)
end
local function get_script(...)
get_script = require(scripts_module).getByCode
return get_script(...)
end
local function is_palindrome(...)
is_palindrome = require(palindromes_module).is_palindrome
return is_palindrome(...)
end
local function language_link(...)
language_link = require(links_module).language_link
return language_link(...)
end
local function load_data(...)
load_data = require(load_module).load_data
return load_data(...)
end
local function pattern_escape(...)
pattern_escape = require(string_utilities_module).pattern_escape
return pattern_escape(...)
end
local function pluralize(...)
pluralize = require(en_utilities_module).pluralize
return pluralize(...)
end
local function process_page(...)
process_page = require(headword_page_module).process_page
return process_page(...)
end
local function remove_links(...)
remove_links = require(links_module).remove_links
return remove_links(...)
end
local function shallow_copy(...)
shallow_copy = require(table_module).shallowCopy
return shallow_copy(...)
end
local function tag_text(...)
tag_text = require(script_utilities_module).tag_text
return tag_text(...)
end
local function tag_transcription(...)
tag_transcription = require(script_utilities_module).tag_transcription
return tag_transcription(...)
end
local function tag_translit(...)
tag_translit = require(script_utilities_module).tag_translit
return tag_translit(...)
end
local function trim(...)
trim = require(string_utilities_module).trim
return trim(...)
end
local function ulen(...)
ulen = require(string_utilities_module).len
return ulen(...)
end
local function ucfirst(...)
ucfirst = require(string_utilities_module).ucfirst
return ucfirst(...)
end
--[==[
Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as "foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo" has been set, "get_foo" will not be called again.]==]
local m_data
local function get_data()
m_data = load_data(headword_data_module)
return m_data
end
local script_data
local function get_script_data()
script_data = load_data(scripts_data_module)
return script_data
end
local script_utilities_data
local function get_script_utilities_data()
script_utilities_data = load_data(script_utilities_data_module)
return script_utilities_data
end
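-- Typical call site for these object loaders (as used throughout below):
--   local m_headword_data = m_data or get_data()
-- The first call loads [[Module:headword/data]]; subsequent calls reuse the cached `m_data`.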
-- If set to true, categories always appear, even in non-mainspace pages
local test_force_categories = false
-- Add a tracking category to track entries with certain (usually undesirable) properties. `track_id` is an identifier
-- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the
-- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property
-- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]].
--
-- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is
-- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id`
-- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to
-- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang`
references an etymology-only language, both that language's code and its full parent's code are tracked.
local function track(track_id, lang)
local tracking_page = "headword/" .. track_id
if lang and lang:hasType("etymology-only") then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(),
tracking_page .. "/" .. lang:getFullCode()}
elseif lang then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode()}
else
debug_track(tracking_page)
end
return true
end
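-- A usage sketch (assuming `lang` is the language object for Malay, code "ms"):
--   track("manual-tr", lang)
-- links the page to [[Wiktionary:Tracking/headword/manual-tr]] and
-- [[Wiktionary:Tracking/headword/manual-tr/ms]], so affected entries can be found
-- via Special:WhatLinksHere.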
local function text_in_script(text, script_code)
local sc = get_script(script_code)
if not sc then
error("Internal error: Bad script code " .. script_code)
end
local characters = sc.characters
local out
if characters then
text = ugsub(text, "%W", "")
out = ufind(text, "[" .. characters .. "]")
end
if out then
return true
else
return false
end
end
local spacingPunctuation = "[%s%p]+"
--[[ List of punctuation or spacing characters that are found inside of words.
Used to exclude characters from the regex above. ]]
local wordPunc = "-#%%&@־׳״'.·*’་•:᠊"
local notWordPunc = "[^" .. wordPunc .. "]+"
-- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references
-- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should
-- optionally contain:
-- * left qualifiers in `q`, an array of strings;
-- * right qualifiers in `qq`, an array of strings;
-- * left labels in `l`, an array of strings;
-- * right labels in `ll`, an array of strings;
-- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text`
-- (formatted reference text) and optionally `name` and/or `group`;
-- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "".
-- `formatted` is the formatted version of the term itself, and `j` is the index of the term.
local function format_term_with_qualifiers_and_refs(lang, part, formatted, j)
local function part_non_empty(field)
local list = part[field]
if not list then
return nil
end
if type(list) ~= "table" then
error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list)))
end
return list[1]
end
if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or
part_non_empty("ll") or part_non_empty("refs") then
formatted = format_pron_qualifiers {
lang = lang,
text = formatted,
q = part.q,
qq = part.qq,
l = part.l,
ll = part.ll,
refs = part.refs,
}
end
local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator
if separator then
formatted = separator .. formatted
end
return formatted
end
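-- A sketch of a typical `part` object handled above (field names per the comment
-- preceding format_term_with_qualifiers_and_refs; values are hypothetical):
--   { q = {"archaic"}, ll = {"dialectal"}, refs = {{text = "<ref>...</ref>", name = "r1"}} }
-- With j > 1 and no explicit separator, " <i>or</i> " is prepended to the result.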
--[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==]
function export.head_is_multiword(head)
for possibleWordBreak in ugmatch(head, spacingPunctuation) do
if umatch(possibleWordBreak, notWordPunc) then
return true
end
end
return false
end
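-- Examples: head_is_multiword("batu api") is true (the space is a word separator),
-- while head_is_multiword("batu-api") is false, since "-" is listed in `wordPunc`
-- as word-internal punctuation.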
do
local function workaround_to_exclude_chars(s)
return (ugsub(s, notWordPunc, "\2%1\1"))
end
--[==[Add links to a multiword head.]==]
function export.add_multiword_links(head, default)
head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2"
if default then
head = head
:gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2")
:gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2")
end
--Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]").
head = encode_entities(head, "[]", true, true)
--[=[
use this when workaround is no longer needed:
head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]"
Remove any empty links, which could have been created above
at the beginning or end of the string.
]=]
return (head
:gsub("\1\2", "")
:gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"}))
end
end
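-- Example: export.add_multiword_links("batu api") returns "[[batu]] [[api]]";
-- each whitespace-separated word becomes its own link, while word-internal
-- punctuation such as "-" stays inside a single link.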
local function non_categorizable(full_raw_pagename)
return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or
-- Unsupported titles with descriptive names.
(full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`"))
end
local function tag_text_and_add_quals_and_refs(data, head, formatted, j)
-- Add language and script wrapper.
formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil)
-- Add qualifiers, labels, references and separator.
return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j)
end
-- Format a headword with transliterations.
local function format_headword(data)
-- Are there non-empty transliterations?
local has_translits = false
local has_manual_translits = false
------ Format the headwords. ------
local head_parts = {}
local unique_head_parts = {}
local has_multiple_heads = not not data.heads[2]
for j, head in ipairs(data.heads) do
if head.tr or head.ts then
has_translits = true
end
if head.tr and head.tr_manual or head.ts then
has_manual_translits = true
end
local formatted
-- Apply processing to the headword, for formatting links and such.
if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then
formatted = language_link{term = head.term, lang = data.lang}
else
formatted = data.lang:makeDisplayText(head.term, head.sc, true)
end
local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j)
insert(head_parts, head_part)
-- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively
-- rerun the text tagging and addition of qualifiers and references, using 1 for all indices.
if has_multiple_heads then
local unique_head_part
if j == 1 then
unique_head_part = head_part
else
unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1)
end
unique_head_parts[unique_head_part] = true
end
end
local set_size = 0
if has_multiple_heads then
for _ in pairs(unique_head_parts) do
set_size = set_size + 1
end
end
if set_size == 1 then
head_parts = head_parts[1]
else
head_parts = concat(head_parts)
end
if has_manual_translits then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]]
track("manual-tr", data.lang)
end
------ Format the transliterations and transcriptions. ------
local translits_formatted
if has_translits then
local translit_parts = {}
for _, head in ipairs(data.heads) do
if head.tr or head.ts then
local this_parts = {}
if head.tr then
insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual))
if head.ts then
insert(this_parts, " ")
end
end
if head.ts then
insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/")
end
insert(translit_parts, concat(this_parts))
end
end
translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")"
local langname = data.lang:getCanonicalName()
local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
local saw_translit_page = false
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
saw_translit_page = true
end
-- If data.lang is an etymology-only language and we didn't find a transliteration page for it, fall back to the
-- full parent.
if not saw_translit_page and data.lang:hasType("etymology-only") then
langname = data.lang:getFullName()
transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
end
end
else
translits_formatted = ""
end
------ Paste heads and transliterations/transcriptions. ------
local lemma_gloss
if data.gloss then
lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>'
else
lemma_gloss = ""
end
return head_parts .. translits_formatted .. lemma_gloss
end
local function format_headword_genders(data)
local retval = ""
if data.genders and data.genders[1] then
if data.gloss then
retval = ","
end
local pos_for_cat
if not data.nogendercat then
local no_gender_cat = (m_data or get_data()).no_gender_cat
if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then
pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")]
end
end
local text, cats = format_genders(data.genders, data.lang, pos_for_cat)
if cats then
extend(data.categories, cats)
end
retval = retval .. " " .. text
end
return retval
end
-- Forward reference
local format_inflections
local function format_inflection_parts(data, parts)
for j, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local partaccel = part.accel
local face = part.face or "bold"
if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then
error("The face `" .. face .. "` " .. (
(script_utilities_data or get_script_utilities_data()).faces[face] and
"should not be used for non-headword terms on the headword line." or
"is invalid."
))
end
-- The final 'or data.nolinkinfl' term allows 'nolinkinfl=true' to be set directly
-- in the 'data' table to disable inflection links for the entire headword, for use
-- when inflected forms aren't entry-worthy, e.g. in Vulgar Latin.
local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or (
data.nolink and track("nolink") or data.nolinkinfl)
local formatted
if part.label then
-- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable.
formatted = "<i>" .. part.label .. "</i>"
else
-- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is
-- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e.
-- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections,
-- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with
-- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages
-- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it
-- to be enabled in languages with more complex scripts (e.g. Arabic).
--
-- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the
-- nested inflections structure?
local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil
-- FIXME: Temporary errors added 2025-10-03. Remove after a month or so.
if part.translit then
error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit")
end
if part.transcription then
error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription")
end
local postprocess_annotations
if part.inflections then
postprocess_annotations = function(infldata)
insert(infldata.annotations, format_inflections(data, part.inflections))
end
end
formatted = full_link(
{
term = not nolinkinfl and part.term or nil,
alt = part.alt or (nolinkinfl and part.term or nil),
lang = part.lang or data.lang,
sc = part.sc or parts.sc or nil,
gloss = part.gloss,
pos = part.pos,
lit = part.lit,
id = part.id,
genders = part.genders,
tr = tr,
ts = part.ts,
accel = partaccel or parts.accel,
postprocess_annotations = postprocess_annotations,
},
face
)
end
parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part,
formatted, j)
end
local parts_output
if parts[1] then
parts_output = (parts.label and " " or "") .. concat(parts)
elseif parts.request then
parts_output = " <small>[please provide]</small>"
insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries")
else
parts_output = ""
end
local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or ""
return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1)
end
-- Format the inflections following the headword or nested after a given inflection. Declared local above.
function format_inflections(data, inflections)
if inflections and inflections[1] then
-- Format each inflection individually.
for key, infl in ipairs(inflections) do
inflections[key] = format_inflection_parts(data, infl)
end
return concat(inflections, ", ")
else
return ""
end
end
-- Format the top-level inflections following the headword. Currently this just adds parens around the
-- formatted comma-separated inflections in `data.inflections`.
local function format_top_level_inflections(data)
local result = format_inflections(data, data.inflections)
if result ~= "" then
return " (" .. result .. ")"
else
return result
end
end
-- Forward reference
local check_red_link_inflections
-- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections)
-- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the
-- category), stop further processing, and return true. If no red links found, return false.
local function check_red_link_inflection_parts(data, parts, plpos)
for _, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local term = part.term
if term and not term:find("%[%[") then
local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil)
if stripped_physical_term then
local title = mw.title.new(stripped_physical_term)
if title and not title:getContent() then
insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines")
return true
end
end
end
if part.inflections then
if check_red_link_inflections(data, part.inflections, plpos) then
return true
end
end
end
return false
end
-- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and
-- consists of a label and zero or more terms, each possibly with nested inflections) for red links. If so, insert a
-- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing,
-- and return true. If no red links found, return false.
function check_red_link_inflections(data, inflections, plpos)
if inflections and inflections[1] then
-- Check each inflection individually.
for key, infl in ipairs(inflections) do
if check_red_link_inflection_parts(data, infl, plpos) then
return true
end
end
end
return false
end
-- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so,
-- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further
-- processing, and return true. If no red links found, return false.
local function check_red_link_inflections_top_level(data, plpos)
return check_red_link_inflections(data, data.inflections, plpos)
end
--[==[
Returns the plural form of `pos`, a raw part of speech input, which could be singular or
plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to
"kanji").
]==]
function export.pluralize_pos(pos)
-- Make the plural form of the part of speech
return (m_data or get_data()).irregular_plurals[pos] or
pos:sub(-1) == "s" and pos or
pluralize(pos)
end
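-- Examples: pluralize_pos("noun") returns "nouns"; pluralize_pos("kanji") returns
-- "kanji" (listed as an irregular plural); a POS already ending in "s" is returned
-- unchanged.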
--[==[
Return "lemma" if the given POS is a lemma, "non-lemma form" if a non-lemma form, or nil
if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.).
If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it
in a smart fashion that knows when to add "-s" and when to add "-es", and also takes
into account any irregular plurals.
If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess
based on whether it ends in " forms"; otherwise, return nil.
]==]
function export.pos_lemma_or_nonlemma(plpos, best_guess)
local m_headword_data = m_data or get_data()
local isLemma = m_headword_data.lemmas
-- Is it a lemma category?
if isLemma[plpos] then
return "Lema"
end
local plpos_no_recon = plpos:gsub("^reconstructed ", "")
if isLemma[plpos_no_recon] then
return "Lema"
end
-- Is it a nonlemma category?
local isNonLemma = m_headword_data.nonlemmas
if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then
return "Bentuk bukan lema"
end
local plpos_no_mut = plpos:gsub("^mutated ", "")
if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then
return "Bentuk bukan lema"
elseif best_guess then
return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema"
else
return nil
end
end
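-- A usage sketch (assuming [[Module:headword/data]] lists "nouns" among `lemmas`
-- and "noun forms" among `nonlemmas`):
--   pos_lemma_or_nonlemma("nouns")             --> "Lema"
--   pos_lemma_or_nonlemma("noun forms")        --> "Bentuk bukan lema"
--   pos_lemma_or_nonlemma("Bentuk xyz", true)  --> "Bentuk bukan lema" (best guess)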
--[==[
Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form
aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural.
]==]
function export.canonicalize_pos(pos)
-- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away.
if pos == "pre" then
-- Don't throw error on 'pref' as it's an alias for "prefix".
error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'")
end
-- Likewise for pro = pronoun.
if pos == "pro" or pos == "prof" then
error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'")
end
local m_headword_data = m_data or get_data()
if m_headword_data.pos_aliases[pos] then
pos = m_headword_data.pos_aliases[pos]
elseif pos:sub(-1) == "f" then
pos = pos:sub(1, -2)
pos = "Bentuk " .. (m_headword_data.pos_aliases[pos] or pos)
end
return export.pluralize_pos(pos)
end
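-- A usage sketch (assuming "adj" is listed as an alias of "adjective" in
-- [[Module:headword/data]]): canonicalize_pos("adj") expands the alias and then
-- pluralizes, returning "adjectives"; canonicalize_pos("adjf") strips the final "f"
-- first and returns the non-lemma form "Bentuk adjectives".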
-- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a
-- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by
-- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given,
-- no blank (zero-length) strings are present.
local function init_and_find_maximum_index(data, element, allow_blank_string)
local maxind = 0
if not data[element] then
data[element] = {}
end
local typ = type(data[element])
if typ ~= "table" then
error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ))
end
for k, v in pairs(data[element]) do
if k ~= "maxindex" then
if type(k) ~= "number" then
error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element))
end
if k > maxind then
maxind = k
end
if v then
if type(v) ~= "string" then
error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v)))
end
if not allow_blank_string and v == "" then
error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element))
end
end
end
end
return maxind
end
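-- For example, with data.heads = {[1] = "a", [3] = "b"} this returns 3, leaving the
-- gap at index 2 in place; with data.heads unset it initializes data.heads = {} and
-- returns 0.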
--[==[
-- Add the page to various maintenance categories for the language and the
-- whole page. These are placed in the headword somewhat arbitrarily, but
-- mainly because headword templates are mandatory for entries (meaning that
-- in theory it provides full coverage).
--
-- This is provided as an external entry point so that modules which transclude
-- information from other entries (such as {{tl|ja-see}}) can take advantage
-- of this feature as well, because they are used in place of a conventional
-- headword template.]==]
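-- A call-site sketch for external modules (argument shapes assumed from the usage
-- in full_headword() below):
--   local m_headword = require("Module:headword")
--   m_headword.maintenance_cats(page, lang, lang_cats, page_cats)
-- where `page` is the object from [[Module:headword/page]], `lang` a language
-- object, and `lang_cats`/`page_cats` arrays that receive category names.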
do
-- Handle any manual sortkeys that have been specified in raw categories
-- by tracking if they are the same or different from the automatically-
-- generated sortkey, so that we can track them in maintenance
-- categories.
local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
sortkey = sortkey or lang:makeSortKey(page.pagename)
-- If there are raw categories with no sortkey, then they will be
-- sorted based on the default MediaWiki sortkey, so we check against
-- that.
if tbl == true then
if page.raw_defaultsort ~= sortkey then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return
end
local redundant, different
for k in pairs(tbl) do
if k == sortkey then
redundant = true
else
different = true
end
end
if redundant then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah")
end
if different then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return sortkey
end
function export.maintenance_cats(page, lang, lang_cats, page_cats)
extend(page_cats, page.cats)
lang = lang:getFull() -- since we are just generating categories
local canonical = lang:getCanonicalName()
local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()]
if tbl then
sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori topik yang menggunakan penanda mentah")
end
tbl = page.wikitext_langname_cat[canonical]
if tbl then
handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah")
end
local current_L2 = get_current_L2()
if current_L2 then
local trimmed_L2 = trim(current_L2)
local expected_L2 = "Bahasa " .. canonical
if trimmed_L2 ~= expected_L2 then
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul")
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]]
track("pengepala bahasa tidak betul", lang)
end
end
end
end
--[==[This is the primary external entry point.
{{lua|full_headword(data)}}
This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line.
See [[#Further explanations for full_headword()]]
]==]
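--[==[ A minimal call-site sketch (hypothetical values; real callers are headword
templates such as {{temp|head}} and language-specific headword modules):
    local lang = require("Module:languages").getByCode("ms")
    require("Module:headword").full_headword{
        lang = lang,
        pos_category = "kata nama", -- plural part of speech, used for categories
        categories = {},
        heads = {{term = "batu"}},
        genders = {},
        inflections = {},
    }
]==]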
function export.full_headword(data)
-- Prevent data from being destructively modified.
local data = shallow_copy(data)
------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------
if data.getCanonicalName then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object")
end
if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object")
end
if data.id and type(data.id) ~= "string" then
error("Internal error: The id in the data table should be a string.")
end
------------ 2. Initialize pagename etc. ------------
local langcode = data.lang:getCode()
local full_langcode = data.lang:getFullCode()
local langname = data.lang:getCanonicalName()
local full_langname = data.lang:getFullName()
local raw_pagename = data.pagename
local page
local m_headword_data = m_data or get_data()
if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc.
-- data.pagename is often set on documentation and test pages through the pagename= parameter of various
-- templates, to emulate running on that page. Having a large number of such test templates on a single
-- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However,
-- we don't really need to do that and can function fine without fetching and parsing the contents of a
-- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser
-- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to
-- be found and where actual content does not live (User, Template, Module).
local actual_namespace = m_headword_data.page.namespace
local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or
actual_namespace == "Module"
page = process_page(raw_pagename, no_fetch_content)
else
page = m_headword_data.page
end
local namespace = page.namespace
------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. ------------
if type(data.heads) == "table" and type(data.heads[1]) == "table" then
-- new-style
if data.translits or data.transcriptions then
error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given")
end
else
-- convert old-style `heads`, `translits` and `transcriptions` to new-style
local maxind = max(
init_and_find_maximum_index(data, "heads"),
init_and_find_maximum_index(data, "translits", true),
init_and_find_maximum_index(data, "transcriptions", true)
)
for i = 1, maxind do
data.heads[i] = {
term = data.heads[i],
tr = data.translits[i],
ts = data.transcriptions[i],
}
end
end
-- Make sure there's at least one head.
if not data.heads[1] then
data.heads[1] = {}
end
------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
data.noposcat = true
end
init_and_find_maximum_index(data, "categories")
init_and_find_maximum_index(data, "whole_page_categories")
local pos_category_already_present = false
if data.categories[1] then
local escaped_langname = pattern_escape(full_langname)
local matches_lang_pattern = "^" .. escaped_langname .. " "
for _, cat in ipairs(data.categories) do
-- Does the category begin with the language name? If not, tag it with a tracking category.
if not cat:find(matches_lang_pattern) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]]
track("no lang category", data.lang)
end
end
-- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we
-- throw an error below.
if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then
data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "")
-- Optimization to avoid inserting category already present.
pos_category_already_present = true
end
end
if not data.pos_category then
error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in "
.. "`data.categories`. Either specify the plural part of speech in `data.pos_category` "
.. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the "
.. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")."
)
end
-- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given.
if not pos_category_already_present and not data.noposcat then
local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname
-- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment.
if pos_category ~= "Aksara Han rentas bahasa" then
insert(data.categories, 1, pos_category)
end
end
-- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out,
-- add an appropriate category.
local postype = export.pos_lemma_or_nonlemma(data.pos_category)
if not postype then
-- We don't know what this category is, so tag it with a tracking category.
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]]
track("unrecognized pos", data.lang)
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS/LANGCODE]]
track("unrecognized pos/pos/" .. data.pos_category, data.lang)
elseif not data.noposcat then
insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname)
end
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname)
end
------------ 5. Create a default headword, and add links to multiword page names. ------------
-- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be
-- reconstructed.
local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!")
local lang_reconstructed = data.lang:hasType("reconstructed")
if is_anti_asterisk then
if not lang_reconstructed then
error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages")
end
lang_reconstructed = false
end
-- Determine if term is reconstructed
local is_reconstructed = namespace == "Rekonstruksi" or data.lang:hasType("reconstructed")
-- Create a default headword based on the pagename, which is determined in
-- advance by the data module so that it only needs to be done once.
local default_head = page.pagename
-- Add links to multi-word page names when appropriate
if not (is_reconstructed or data.nolinkhead) then
local no_links = m_headword_data.no_multiword_links
if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then
default_head = export.add_multiword_links(default_head, true)
end
end
if is_reconstructed then
default_head = "*" .. default_head
end
------------ 6. Check the namespace against the language type. ------------
if namespace == "" then
if lang_reconstructed then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ")
elseif data.lang:hasType("appendix-constructed") then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ")
end
elseif namespace == "Petikan" or namespace == "Tesaurus" then
error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .")
end
------------ 7. Fill in missing values in `data.heads`. ------------
-- True if any script among the headword scripts has spaces in it.
local any_script_has_spaces = false
-- True if any term has a redundant head= param.
local has_redundant_head_param = false
for _, head in ipairs(data.heads) do
------ 7a. If missing head, replace with default head.
if not head.term then
head.term = default_head
elseif head.term == default_head then
has_redundant_head_param = true
elseif is_anti_asterisk and head.term == "!!" then
-- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head.
head.term = "!!" .. default_head
elseif head.term:find("^[!?]$") then
-- If explicit head= just consists of ! or ?, add it to the end of the default head.
head.term = default_head .. head.term
end
head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term
if is_reconstructed then
local head_term = head.term
if head_term:find("%[%[") then
head_term = remove_links(head_term)
end
if head_term:sub(1, 1) ~= "*" then
error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.")
end
end
------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence,
------ otherwise fall back to the overall script if given. If neither given, autodetect the script.
local auto_sc = data.lang:findBestScript(head.term)
if (
auto_sc:getCode() == "None" and
find_best_script_without_lang(head.term):getCode() ~= "None"
) then
insert(data.categories, "Perkataan dengan bentuk tulisan tidak piawai bahasa " .. full_langname )
end
if not (head.sc or data.sc) then -- No script code given, so use autodetected script.
head.sc = auto_sc
else
if not head.sc then -- Overall script code given.
head.sc = data.sc
end
-- Track uses of sc parameter.
if head.sc:getCode() == auto_sc:getCode() then
track("redundant script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname )
end
else
track("non-redundant manual script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname )
end
end
end
-- If using a discouraged character sequence, add to maintenance category.
if head.sc:hasNormalizationFixes() == true then
local composed_head = toNFC(head.term)
if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then
insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan")
end
end
any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces()
------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given
------ (provided automatic translit is available, e.g. not in Persian or Hebrew).
-- Make transliterations
head.tr_manual = nil
-- Try to generate a transliteration if necessary
if head.tr == "-" then
head.tr = nil
else
local notranslit = m_headword_data.notranslit
if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then
head.tr_manual = not not head.tr
local text = head.term_no_initial_bang_bang
if not data.lang:link_tr(head.sc) then
text = remove_links(text)
end
local automated_tr = data.lang:transliterate(text, head.sc)
if automated_tr then
local manual_tr = head.tr
if manual_tr then
if remove_links(manual_tr) == remove_links(automated_tr) then
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah")
else
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah")
end
end
if not manual_tr then
head.tr = automated_tr
end
end
-- There is still no transliteration?
-- Add the entry to a cleanup category.
if not head.tr then
head.tr = "<small>transliteration needed</small>"
-- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar.
-- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]].
insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname)
else
-- Otherwise, trim it.
head.tr = trim(head.tr)
end
end
end
-- Link to the transliteration entry for languages that require this.
if head.tr and data.lang:link_tr(head.sc) then
head.tr = full_link{
term = head.tr,
lang = data.lang,
sc = get_script("Latn"),
tr = "-"
}
end
end
------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------
-- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace.
-- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.)
-- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII
-- might need to be expanded to a check for any Latin characters and whitespace or punctuation.
local display_title
-- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to
-- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be
-- about the best we can do (alternatively we could potentially do script detection on the pagename).
local dt_script = data.heads[1].sc
local dt_script_code = dt_script:getCode()
local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$")
local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "")
if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then
display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>'
elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code]
or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana")))
or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then
display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>'
-- Keep Han entries region-neutral in the display title.
elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then
display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>'
elseif namespace == "Rekonstruksi" then
local matched
display_title, matched = ugsub(
page.full_raw_pagename,
"^(Rekonstruksi:[^/]+/)(.+)$",
function(before, term)
return before .. tag_text(term, data.lang, dt_script)
end
)
if matched == 0 then
display_title = nil
end
end
-- FIXME: Generalize this.
-- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab
-- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title
-- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other
-- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word
-- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages.
-- FIXME: This is checking for more than one language on the page but instead needs to check if there are any
-- languages using scripts other than the ones just mentioned.
if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then
display_title = nil
end
if display_title then
mw.getCurrentFrame():callParserFunction(
"DISPLAYTITLE",
display_title
)
end
------------ 9. Insert additional categories. ------------
if data.force_cat_output then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/force cat output]]
track("force cat output")
end
if has_redundant_head_param then
if not data.no_redundant_head_cat then
-- This is not the right way to go about this; too many exceptions and problems due to language-specific headword
-- handling customization. If we want this, it should be opt-in by a given language passing in the default headword.
-- insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan parameter kepala lewah")
end
end
-- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms".
if not data.nomultiwordcat and any_script_has_spaces and postype == "Lema" then
local no_multiword_cat = m_headword_data.no_multiword_cat
if not (no_multiword_cat[langcode] or no_multiword_cat[full_langcode]) then
-- Check for spaces or hyphens, but exclude prefixes and suffixes.
-- Use the pagename, not the head= value, because the latter may have extra
-- junk in it, e.g. superscripted text that throws off the algorithm.
local no_hyphen = m_headword_data.hyphen_not_multiword_sep
-- Exclude hyphens if the data module states that they should for this language.
local checkpattern = (no_hyphen[langcode] or no_hyphen[full_langcode]) and ".[%s፡]." or ".[%s%-፡]."
local is_multiword = umatch(page.pagename, checkpattern)
if is_multiword and not non_categorizable(page.full_raw_pagename) then
insert(data.categories, "Perkataan berbilang kata bahasa " .. full_langname)
elseif not is_multiword then
local long_word_threshold = m_headword_data.long_word_thresholds[langcode] or
m_headword_data.long_word_thresholds[full_langcode]
if long_word_threshold and ulen(page.pagename) >= long_word_threshold then
insert(data.categories, "Perkataan panjang bahasa " .. full_langname)
end
end
end
end
local default_sccat = m_headword_data.default_sccat
if data.sccat or data.sccat == nil and (default_sccat[langcode] or default_sccat[full_langcode]) then
for _, head in ipairs(data.heads) do
insert(data.categories, ucfirst(data.pos_category) .. " bahasa " .. full_langname .. " dalam " ..
head.sc:getDisplayForm())
end
end
-- Reconstructed terms often use weird combinations of scripts and realistically aren't spelled so much as notated.
if namespace ~= "Rekonstruksi" then
-- Map from languages to a string containing the characters to ignore when considering whether a term has
-- multiple written scripts in it. Typically these are Greek or Cyrillic letters used for their phonetic
-- values.
local characters_to_ignore = {
["aaq"] = "αάὰ", -- Penobscot (Algonquian)
["acy"] = "δθ", -- Cypriot Arabic
["aez"] = "β", -- Aeka (Trans-New Guinea)
["anc"] = "γ", -- Ngas (Chadic/Afroasiatic)
["aou"] = "χ", -- A'ou (Kra-Dai)
["art-blk"] = "ч", -- Bolak (conlang)
["awg"] = "β", -- Anguthimri (Pama-Nyungan)
["az"] = "ь", -- Azerbaijani (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ba"] = "ь", -- Bashkir (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["bhp"] = "β", -- Bima (Austronesian)
["bjz"] = "β", -- Baruga (Trans-New Guinea)
["byk"] = "θ", -- Biao (Kra-Dai)
["cdy"] = "θ", -- Chadong (Kra-Dai)
["chp"] = "θ", -- Chipewyan (Athabaskan)
["cjh"] = "χ", -- Upper Chehalis (Salishan)
["clm"] = "χ", -- Klallam (Salishan)
["col"] = "χ", -- Colombia-Wenatchi (Salishan)
["coo"] = "χθ", -- Comox (Salishan)
["crx"] = "θ", -- Carrier (Athabaskan)
["ets"] = "θ", -- Yekhee (Edoid/Niger-Congo)
["ett"] = "χ", -- Etruscan (isolate; in romanizations)
["fla"] = "χ", -- Montana Salish (Salishan)
["grt"] = "་", -- Garo (South Asian Sino-Tibetan)
["gmw-gts"] = "χ", -- Gottscheerish (Bavarian variant spoken in Slovenia)
["hur"] = "χθ", -- Halkomelem (Salishan)
["itc-psa"] = "f", -- Pre-Samnite (Italic; normally written in Greek)
["izh"] = "ь", -- Ingrian (Finnic)
["kic"] = "θ", -- Kickapoo (Algonquian)
["kk"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ky"] = "ь", -- Kyrgyz (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["lil"] = "χ", -- Lillooet (Salishan)
["lsi"] = "ꓹ", -- Lashi (Lolo-Burmese/Sino-Tibetan; represents a glottal stop)
["mhz"] = "β", -- Mor (Austronesian)
["mqn"] = "β", -- Moronene (Austronesian)
["neg"]= "ӡā", -- Negidal (Tungusic; normally in Cyrillic)
["oka"] = "χ", -- Okanagan (Salishan)
["ole"] = "θ", -- Olekha (Sino-Tibetan)
["oui"] = "γβ", -- Old Uyghur (Turkic; FIXME: others? E.g. Greek delta (δ)?)
["pox"] = "χ", -- Polabian (West Slavic)
["rif"] = "ε", -- Tarifit (Berber)
["rom"] = "Θθ", -- Romani (Indic: International Standard; two different thetas???)
["rpn"] = "β", -- Repanbitip (Austronesian)
["sah"] = "ь", -- Yakut (Turkic; 1929 - 1939 Latin spelling)
["sit-jap"] = "χ", -- Japhug (Sino-Tibetan)
["sjw"] = "θ", -- Shawnee (Algonquian)
["squ"] = "χ", -- Squamish (Salishan)
["str"] = "χθ", -- Saanich (Salishan)
["teh"] = "χ", -- Tehuelche (Chonan; spoken in Argentina)
["tep"] = "η", -- Tepecano (Uto-Aztecan)
["thp"] = "χ", -- Thompson (Salishan)
["tk"] = "ь", -- Turkmen (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["tt"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["twa"] = "χ", -- Twana (Salishan)
["wbl"] = "ы", -- Wakhi (Iranian)
["xbc"] = "ϸ", -- Bactrian (Iranian; represents š; normally written in Greek)
["yha"] = "θ", -- Baha (Kra-Dai)
["za"] = "зч", -- Zhuang (Tai/Kra-Dai); 1957-1982 alphabet used two Cyrillic letters (as well as some others like
-- ƃ, ƅ, ƨ, ɯ and ɵ that look like Cyrillic or Greek but are actually Latin)
["zlw-slv"] = "χђћ", -- Slovincian (West Slavic; FIXME: χ is Greek, the other two are Cyrillic, but I'm not sure
-- the correct characters are being chosen in the entry names)
["zng"] = "θ", -- Mang (Mon-Khmer)
["ztp"] = "θ", -- Loxicha Zapotec (Zapotecan)
}
-- Determine how many real scripts are found in the pagename, where we exclude symbols and such. We exclude
-- scripts whose `character_category` is false as well as Zmth (mathematical notation symbols), which has a
-- category of "Mathematical notation symbols". When counting scripts, we need to elide language-specific
-- variants because e.g. Beng and as-Beng have slightly different characters but we don't want to consider them
-- two different scripts (e.g. [[এৰ]] has two characters which are detected respectively as Beng and as-Beng).
local seen_scripts = {}
local num_seen_scripts = 0
local num_loops = 0
local canon_pagename = page.pagename
local ch_to_ignore = characters_to_ignore[full_langcode]
if ch_to_ignore then
canon_pagename = ugsub(canon_pagename, "[" .. ch_to_ignore .. "]", "")
end
while true do
if canon_pagename == "" or num_seen_scripts >= 2 or num_loops >= 10 then
break
end
-- Make sure we don't get into a loop checking the same script over and over again; happens with e.g. [[ᠪᡳ]]
num_loops = num_loops + 1
local pagename_script = find_best_script_without_lang(canon_pagename, "None only as last resort")
local script_chars = pagename_script.characters
if not script_chars then
-- we are stuck; this happens with None
break
end
local script_code = pagename_script:getCode()
local replaced
canon_pagename, replaced = ugsub(canon_pagename, "[" .. script_chars .. "]", "")
if (
replaced and
script_code ~= "Zmth" and
(script_data or get_script_data())[script_code] and
script_data[script_code].character_category ~= false
) then
script_code = script_code:gsub("^.-%-", "")
if not seen_scripts[script_code] then
seen_scripts[script_code] = true
num_seen_scripts = num_seen_scripts + 1
end
end
end
if num_seen_scripts > 1 then
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dieja dalam berbilang tulisan")
end
end
-- Categorise for unusual characters. Takes into account combining characters, so that we can categorise for characters with diacritics that aren't encoded as atomic characters (e.g. U̠). These can be in two formats: single combining characters (i.e. character + diacritic(s)) or double combining characters (i.e. character + diacritic(s) + character). Each can have any number of diacritics.
local standard = data.lang:getStandardCharacters()
if standard and not non_categorizable(page.full_raw_pagename) then
local function char_category(char)
local specials = {
["#"] = "number sign",
["("] = "parentheses",
[")"] = "parentheses",
["<"] = "angle brackets",
[">"] = "angle brackets",
["["] = "square brackets",
["]"] = "square brackets",
["_"] = "underscore",
["{"] = "braces",
["|"] = "vertical line",
["}"] = "braces",
["ß"] = "ẞ",
["\205\133"] = "", -- this is UTF-8 for U+0345 ( ͅ)
["\239\191\189"] = "replacement character",
}
char = toNFD(char)
:gsub(".[\128-\191]*", function(m)
local new_m = specials[m]
new_m = new_m or m:uupper()
return new_m
end)
return toNFC(char)
end
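-- For example (illustrative): char_category("á") decomposes to "a" plus a combining
-- acute, uppercases the base letter and recomposes to "Á"; char_category("ß") maps
-- to "ẞ" via the specials table; and char_category("(") returns the label
-- "parentheses".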
if full_langcode ~= "hi" and full_langcode ~= "lo" then
local standard_chars_scripts = {}
for _, head in ipairs(data.heads) do
standard_chars_scripts[head.sc:getCode()] = true
end
-- Iterate over the scripts, in case there is more than one (as they can have different sets of standard characters).
for code in pairs(standard_chars_scripts) do
local sc_standard = data.lang:getStandardCharacters(code)
if sc_standard then
if page.pagename_len > 1 then
local explode_standard = {}
local function explode(char)
explode_standard[char] = true
return ""
end
local sc_standard = ugsub(sc_standard, page.comb_chars.combined_double, explode)
sc_standard = ugsub(sc_standard, page.comb_chars.combined_single, explode)
:gsub(".[\128-\191]*", explode)
local num_cat_inserted
for char in pairs(page.explode_pagename) do
if not explode_standard[char] then
if char:find("[0-9]") then
if not num_cat_inserted then
insert(data.categories, "Perkataan dieja dengan nombor bahasa " .. full_langname)
num_cat_inserted = true
end
elseif ufind(char, page.emoji_pattern) then
insert(data.categories, "Perkataan dieja dengan emoji bahasa " .. full_langname)
else
local upper = char_category(char)
if not explode_standard[upper] then
char = upper
end
insert(data.categories, "Perkataan dieja dengan " .. char .. " bahasa " .. full_langname)
end
end
end
end
-- If a diacritic doesn't appear in any of the standard characters, also categorise for it generally.
sc_standard = toNFD(sc_standard)
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_single) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. " bahasa " .. full_langname)
end
end
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_double) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. "◌ bahasa " .. full_langname)
end
end
end
end
-- Hindi and Lao are handled the old way for now, as their standard chars still need to be converted to the new format (because there are a lot of them).
elseif ulen(page.pagename) ~= 1 then
for character in ugmatch(page.pagename, "([^" .. standard .. "])") do
local upper = char_category(character)
if not umatch(upper, "[" .. standard .. "]") then
character = upper
end
insert(data.categories, "Perkataan dieja dengan " .. character .. " bahasa " .. full_langname)
end
end
end
if data.heads[1].sc:isSystem("alphabet") then
local pagename, i = page.pagename:ulower(), 2
while umatch(pagename, "(%a)" .. ("%1"):rep(i)) do
i = i + 1
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan " .. i .. " contoh huruf yang sama berturut-turut")
end
end
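-- Illustrative example: a hypothetical pagename "greeen" matches "(%a)%1%1" (three
-- identical letters in a row), so the loop adds the category for 3 identical
-- consecutive letters, then checks for a run of four and stops once no longer run
-- is found.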
-- Categorise for palindromes
if not data.nopalindromecat and namespace ~= "Rekonstruksi" and ulen(page.pagename) > 2
-- FIXME: Use of first script here seems hacky. What is the clean way of doing this in the presence of
-- multiple scripts?
and is_palindrome(page.pagename, data.lang, data.heads[1].sc) then
insert(data.categories, "Palindrom bahasa " .. full_langname)
end
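-- For example: "katak" is longer than two characters and reads the same in both
-- directions, so (assuming is_palindrome() agrees for the language and script) the
-- entry would be added to "Palindrom bahasa Melayu".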
if namespace == "" and not lang_reconstructed then
for _, head in ipairs(data.heads) do
if page.full_raw_pagename ~= get_link_page(remove_links(head.term), data.lang, head.sc) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch/LANGCODE]]
track("pagename spelling mismatch", data.lang)
break
end
end
end
-- Add red link category if called for and we're not a "large" page, where such checks are disabled.
if data.checkredlinks and not m_headword_data.large_pages[m_headword_data.pagename] then
local plposcat = type(data.checkredlinks) == "string" and data.checkredlinks or data.pos_category
check_red_link_inflections_top_level(data, plposcat)
end
-- Add to various maintenance categories.
export.maintenance_cats(page, data.lang, data.categories, data.whole_page_categories)
------------ 10. Format and return headwords, genders, inflections and categories. ------------
-- Format and return all the gathered information. This may add more categories (e.g. gender/number categories),
-- so make sure we do it before evaluating `data.categories`.
local text = '<span class="headword-line">' ..
format_headword(data) ..
format_headword_genders(data) ..
format_top_level_inflections(data) .. '</span>'
-- Language-specific categories.
local cats = format_categories(
data.categories, data.lang, data.sort_key, page.encoded_pagename,
data.force_cat_output or test_force_categories, data.heads[1].sc
)
-- Language-agnostic categories.
local whole_page_cats = format_categories(
data.whole_page_categories, nil, "-"
)
return text .. cats .. whole_page_cats
end
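--[==[
Minimal usage sketch (not part of the module; all argument values are invented for
illustration, and "Module:headword" is assumed to be this module's title). A
language-specific headword template module might call full_headword() roughly like
this:

	local m_headword = require("Module:headword")
	local lang = require("Module:languages").getByCode("ms")
	return m_headword.full_headword{
		lang = lang,
		pos_category = "kata nama",
		categories = {},
		heads = {{term = "batu"}},
		inflections = {},
	}

The return value is the formatted headword line (a string of HTML) with the
formatted categories appended.
]==]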
return export
dsxezkypi61q35s0q5jp4h6np4acc9c
281454
281453
2026-04-22T15:32:12Z
Hakimi97
2668
281454
Scribunto
text/plain
local export = {}
-- Named constants for all modules used, to make it easier to swap out sandbox versions.
local debug_track_module = "Module:debug/track"
local en_utilities_module = "Module:en-utilities"
local gender_and_number_module = "Module:gender and number"
local headword_data_module = "Module:headword/data"
local headword_page_module = "Module:headword/page"
local links_module = "Module:links"
local load_module = "Module:load"
local pages_module = "Module:pages"
local palindromes_module = "Module:palindromes"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local scripts_data_module = "Module:scripts/data"
local script_utilities_module = "Module:script utilities"
local script_utilities_data_module = "Module:script utilities/data"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"
local concat = table.concat
local dump = mw.dumpObject
local insert = table.insert
local ipairs = ipairs
local max = math.max
local new_title = mw.title.new
local pairs = pairs
local require = require
local toNFC = mw.ustring.toNFC
local toNFD = mw.ustring.toNFD
local type = type
local ufind = mw.ustring.find
local ugmatch = mw.ustring.gmatch
local ugsub = mw.ustring.gsub
local umatch = mw.ustring.match
--[==[
Loaders for functions in other modules, which overwrite themselves with the target function when called. This ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded functions, and has no overhead after the first call, since the target functions are called directly in any subsequent calls.]==]
local function debug_track(...)
debug_track = require(debug_track_module)
return debug_track(...)
end
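-- Illustrative sketch of the loader pattern above: the first call goes through
-- require() and replaces the local with the target function, so subsequent calls
-- are direct.
--	debug_track("headword/some-id")  -- first call: loads [[Module:debug/track]]
--	debug_track("headword/other-id") -- later calls: already the target function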
local function encode_entities(...)
encode_entities = require(string_utilities_module).encode_entities
return encode_entities(...)
end
local function extend(...)
extend = require(table_module).extend
return extend(...)
end
local function find_best_script_without_lang(...)
find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang
return find_best_script_without_lang(...)
end
local function format_categories(...)
format_categories = require(utilities_module).format_categories
return format_categories(...)
end
local function format_genders(...)
format_genders = require(gender_and_number_module).format_genders
return format_genders(...)
end
local function format_pron_qualifiers(...)
format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers
return format_pron_qualifiers(...)
end
local function full_link(...)
full_link = require(links_module).full_link
return full_link(...)
end
local function get_current_L2(...)
get_current_L2 = require(pages_module).get_current_L2
return get_current_L2(...)
end
local function get_link_page(...)
get_link_page = require(links_module).get_link_page
return get_link_page(...)
end
local function get_script(...)
get_script = require(scripts_module).getByCode
return get_script(...)
end
local function is_palindrome(...)
is_palindrome = require(palindromes_module).is_palindrome
return is_palindrome(...)
end
local function language_link(...)
language_link = require(links_module).language_link
return language_link(...)
end
local function load_data(...)
load_data = require(load_module).load_data
return load_data(...)
end
local function pattern_escape(...)
pattern_escape = require(string_utilities_module).pattern_escape
return pattern_escape(...)
end
local function pluralize(...)
pluralize = require(en_utilities_module).pluralize
return pluralize(...)
end
local function process_page(...)
process_page = require(headword_page_module).process_page
return process_page(...)
end
local function remove_links(...)
remove_links = require(links_module).remove_links
return remove_links(...)
end
local function shallow_copy(...)
shallow_copy = require(table_module).shallowCopy
return shallow_copy(...)
end
local function tag_text(...)
tag_text = require(script_utilities_module).tag_text
return tag_text(...)
end
local function tag_transcription(...)
tag_transcription = require(script_utilities_module).tag_transcription
return tag_transcription(...)
end
local function tag_translit(...)
tag_translit = require(script_utilities_module).tag_translit
return tag_translit(...)
end
local function trim(...)
trim = require(string_utilities_module).trim
return trim(...)
end
local function ulen(...)
ulen = require(string_utilities_module).len
return ulen(...)
end
local function ucfirst(...)
ucfirst = require(string_utilities_module).ucfirst
return ucfirst(...)
end
--[==[
Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as "foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo" has been set, "get_foo" will not be called again.]==]
local m_data
local function get_data()
m_data = load_data(headword_data_module)
return m_data
end
local script_data
local function get_script_data()
script_data = load_data(scripts_data_module)
return script_data
end
local script_utilities_data
local function get_script_utilities_data()
script_utilities_data = load_data(script_utilities_data_module)
return script_utilities_data
end
-- If set to true, categories always appear, even in non-mainspace pages
local test_force_categories = false
-- Add a tracking category to track entries with certain (unusually undesirable) properties. `track_id` is an identifier
-- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the
-- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property
-- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]].
--
-- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is
-- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id`
-- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to
-- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang`
-- references at etymology-only language, both that language's code and its full parent's code are tracked.
local function track(track_id, lang)
local tracking_page = "headword/" .. track_id
if lang and lang:hasType("etymology-only") then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(),
tracking_page .. "/" .. lang:getFullCode()}
elseif lang then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode()}
else
debug_track(tracking_page)
end
return true
end
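-- Illustrative example: track("manual-tr", lang) for a language whose code is "ar"
-- links the page to [[Wiktionary:Tracking/headword/manual-tr]] and
-- [[Wiktionary:Tracking/headword/manual-tr/ar]], both queryable through
-- Special:WhatLinksHere.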
local function text_in_script(text, script_code)
local sc = get_script(script_code)
if not sc then
error("Internal error: Bad script code " .. script_code)
end
local characters = sc.characters
local out
if characters then
text = ugsub(text, "%W", "")
out = ufind(text, "[" .. characters .. "]")
end
if out then
return true
else
return false
end
end
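-- Illustrative example (script codes are those defined by [[Module:scripts]]):
-- text_in_script("さくら", "Hira") should return true, since the characters left
-- after stripping %W fall within the Hiragana character set, while
-- text_in_script("batu", "Hira") should return false.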
local spacingPunctuation = "[%s%p]+"
--[[ List of punctuation or spacing characters that are found inside of words.
Used to exclude characters from the regex above. ]]
local wordPunc = "-#%%&@־׳״'.·*’་•:᠊"
local notWordPunc = "[^" .. wordPunc .. "]+"
-- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references
-- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should
-- optionally contain:
-- * left qualifiers in `q`, an array of strings;
-- * right qualifiers in `qq`, an array of strings;
-- * left labels in `l`, an array of strings;
-- * right labels in `ll`, an array of strings;
-- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text`
-- (formatted reference text) and optionally `name` and/or `group`;
-- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "".
-- `formatted` is the formatted version of the term itself, and `j` is the index of the term.
local function format_term_with_qualifiers_and_refs(lang, part, formatted, j)
local function part_non_empty(field)
local list = part[field]
if not list then
return nil
end
if type(list) ~= "table" then
error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list)))
end
return list[1]
end
if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or
part_non_empty("ll") or part_non_empty("refs") then
formatted = format_pron_qualifiers {
lang = lang,
text = formatted,
q = part.q,
qq = part.qq,
l = part.l,
ll = part.ll,
refs = part.refs,
}
end
local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator
if separator then
formatted = separator .. formatted
end
return formatted
end
--[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==]
function export.head_is_multiword(head)
for possibleWordBreak in ugmatch(head, spacingPunctuation) do
if umatch(possibleWordBreak, notWordPunc) then
return true
end
end
return false
end
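-- Examples: export.head_is_multiword("kapal terbang") returns true, since the
-- space is not in `wordPunc`; export.head_is_multiword("batu-batan") returns
-- false, because "-" is listed as word-internal punctuation.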
do
local function workaround_to_exclude_chars(s)
return (ugsub(s, notWordPunc, "\2%1\1"))
end
--[==[Add links to a multiword head.]==]
function export.add_multiword_links(head, default)
head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2"
if default then
head = head
:gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2")
:gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2")
end
-- Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]").
head = encode_entities(head, "[]", true, true)
--[=[
use this when workaround is no longer needed:
head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]"
Remove any empty links, which could have been created above
at the beginning or end of the string.
]=]
return (head
:gsub("\1\2", "")
:gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"}))
end
end
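-- Example: export.add_multiword_links("kapal terbang") returns
-- "[[kapal]] [[terbang]]". The control bytes \1 and \2 temporarily mark link
-- boundaries around each word and are converted to "[[" and "]]" at the end.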
local function non_categorizable(full_raw_pagename)
return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or
-- Unsupported titles with descriptive names.
(full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`"))
end
local function tag_text_and_add_quals_and_refs(data, head, formatted, j)
-- Add language and script wrapper.
formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil)
-- Add qualifiers, labels, references and separator.
return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j)
end
-- Format a headword with transliterations.
local function format_headword(data)
-- Are there non-empty transliterations?
local has_translits = false
local has_manual_translits = false
------ Format the headwords. ------
local head_parts = {}
local unique_head_parts = {}
local has_multiple_heads = not not data.heads[2]
for j, head in ipairs(data.heads) do
if head.tr or head.ts then
has_translits = true
end
if head.tr and head.tr_manual or head.ts then
has_manual_translits = true
end
local formatted
-- Apply processing to the headword, for formatting links and such.
if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then
formatted = language_link{term = head.term, lang = data.lang}
else
formatted = data.lang:makeDisplayText(head.term, head.sc, true)
end
local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j)
insert(head_parts, head_part)
-- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively
-- rerun the text tagging and addition of qualifiers and references, using 1 for all indices.
if has_multiple_heads then
local unique_head_part
if j == 1 then
unique_head_part = head_part
else
unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1)
end
unique_head_parts[unique_head_part] = true
end
end
local set_size = 0
if has_multiple_heads then
for _ in pairs(unique_head_parts) do
set_size = set_size + 1
end
end
if set_size == 1 then
head_parts = head_parts[1]
else
head_parts = concat(head_parts)
end
if has_manual_translits then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]]
track("manual-tr", data.lang)
end
------ Format the transliterations and transcriptions. ------
local translits_formatted
if has_translits then
local translit_parts = {}
for _, head in ipairs(data.heads) do
if head.tr or head.ts then
local this_parts = {}
if head.tr then
insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual))
if head.ts then
insert(this_parts, " ")
end
end
if head.ts then
insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/")
end
insert(translit_parts, concat(this_parts))
end
end
translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")"
local langname = data.lang:getCanonicalName()
local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
local saw_translit_page = false
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
saw_translit_page = true
end
-- If data.lang is an etymology-only language and we didn't find a transliteration page for it, fall back to
-- the full parent.
if not saw_translit_page and data.lang:hasType("etymology-only") then
langname = data.lang:getFullName()
transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
end
end
else
translits_formatted = ""
end
------ Paste heads and transliterations/transcriptions. ------
local lemma_gloss
if data.gloss then
lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>'
else
lemma_gloss = ""
end
return head_parts .. translits_formatted .. lemma_gloss
end
local function format_headword_genders(data)
local retval = ""
if data.genders and data.genders[1] then
if data.gloss then
retval = ","
end
local pos_for_cat
if not data.nogendercat then
local no_gender_cat = (m_data or get_data()).no_gender_cat
if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then
pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")]
end
end
local text, cats = format_genders(data.genders, data.lang, pos_for_cat)
if cats then
extend(data.categories, cats)
end
retval = retval .. " " .. text
end
return retval
end
-- Forward reference
local format_inflections
local function format_inflection_parts(data, parts)
for j, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local partaccel = part.accel
local face = part.face or "bold"
if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then
error("The face `" .. face .. "` " .. (
(script_utilities_data or get_script_utilities_data()).faces[face] and
"should not be used for non-headword terms on the headword line." or
"is invalid."
))
end
-- Here the final part 'or data.nolinkinfl' allows to have 'nolinkinfl=true'
-- right into the 'data' table to disable inflection links of the entire headword
-- when inflected forms aren't entry-worthy, e.g.: in Vulgar Latin
local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or (
data.nolink and track("nolink") or data.nolinkinfl)
local formatted
if part.label then
-- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable.
formatted = "<i>" .. part.label .. "</i>"
else
-- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is
-- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e.
-- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections,
-- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with
-- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages
-- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it
-- to be enabled in languages with more complex scripts (e.g. Arabic).
--
-- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the
-- nested inflections structure?
local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil
-- FIXME: Temporary errors added 2025-10-03. Remove after a month or so.
if part.translit then
error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit")
end
if part.transcription then
error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription")
end
local postprocess_annotations
if part.inflections then
postprocess_annotations = function(infldata)
insert(infldata.annotations, format_inflections(data, part.inflections))
end
end
formatted = full_link(
{
term = not nolinkinfl and part.term or nil,
alt = part.alt or (nolinkinfl and part.term or nil),
lang = part.lang or data.lang,
sc = part.sc or parts.sc or nil,
gloss = part.gloss,
pos = part.pos,
lit = part.lit,
id = part.id,
genders = part.genders,
tr = tr,
ts = part.ts,
accel = partaccel or parts.accel,
postprocess_annotations = postprocess_annotations,
},
face
)
end
parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part,
formatted, j)
end
local parts_output
if parts[1] then
parts_output = (parts.label and " " or "") .. concat(parts)
elseif parts.request then
parts_output = " <small>[please provide]</small>"
insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries")
else
parts_output = ""
end
local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or ""
return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1)
end
-- Format the inflections following the headword or nested after a given inflection. Declared local above.
function format_inflections(data, inflections)
if inflections and inflections[1] then
-- Format each inflection individually.
for key, infl in ipairs(inflections) do
inflections[key] = format_inflection_parts(data, infl)
end
return concat(inflections, ", ")
else
return ""
end
end
-- Format the top-level inflections following the headword. Currently this just adds parens around the
-- formatted comma-separated inflections in `data.inflections`.
local function format_top_level_inflections(data)
local result = format_inflections(data, data.inflections)
if result ~= "" then
return " (" .. result .. ")"
else
return result
end
end
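-- Illustrative sketch (values invented): with
--	data.inflections = { {label = "jamak", "batu-batu"} }
-- format_top_level_inflections(data) would return something like
--	" (<i>jamak</i> [[batu-batu]])"
-- i.e. the comma-separated formatted inflections wrapped in parentheses; the exact
-- link markup depends on [[Module:links]].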
-- Forward reference
local check_red_link_inflections
-- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections)
-- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the
-- category), stop further processing, and return true. If no red links found, return false.
local function check_red_link_inflection_parts(data, parts, plpos)
for _, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local term = part.term
if term and not term:find("%[%[") then
local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil)
if stripped_physical_term then
local title = new_title(stripped_physical_term)
if title and not title:getContent() then
insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines")
return true
end
end
end
if part.inflections then
if check_red_link_inflections(data, part.inflections, plpos) then
return true
end
end
end
return false
end
-- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and
-- consists of a label and zero or more terms, each possibly with nested inflections) for red links. If so, insert a
-- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing,
-- and return true. If no red links found, return false.
function check_red_link_inflections(data, inflections, plpos)
if inflections and inflections[1] then
-- Check each inflection individually.
for _, infl in ipairs(inflections) do
if check_red_link_inflection_parts(data, infl, plpos) then
return true
end
end
end
return false
end
-- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so,
-- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further
-- processing, and return true. If no red links found, return false.
local function check_red_link_inflections_top_level(data, plpos)
return check_red_link_inflections(data, data.inflections, plpos)
end
--[==[
Returns the plural form of `pos`, a raw part of speech input, which could be singular or
plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to
"kanji").
]==]
function export.pluralize_pos(pos)
-- Make the plural form of the part of speech
return (m_data or get_data()).irregular_plurals[pos] or
pos:sub(-1) == "s" and pos or
pluralize(pos)
end
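-- Examples: export.pluralize_pos("noun") returns "nouns" via pluralize(); a POS
-- already ending in "s" (e.g. "nouns") is returned unchanged; and any POS listed
-- in the data module's irregular_plurals table (such as "kanji") maps to its
-- recorded plural.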
--[==[
Return "Lema" if the given POS is a lemma, "Bentuk bukan lema" if a non-lemma form, or nil
if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.).
If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it
in a smart fashion that knows when to add "-s" and when to add "-es", and also takes
into account any irregular plurals.
If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess
based on whether it begins with "Bentuk "; otherwise, return nil.
]==]
function export.pos_lemma_or_nonlemma(plpos, best_guess)
local m_headword_data = m_data or get_data()
local isLemma = m_headword_data.lemmas
-- Is it a lemma category?
if isLemma[plpos] then
return "Lema"
end
local plpos_no_recon = plpos:gsub("^reconstructed ", "")
if isLemma[plpos_no_recon] then
return "Lema"
end
-- Is it a nonlemma category?
local isNonLemma = m_headword_data.nonlemmas
if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then
return "Bentuk bukan lema"
end
local plpos_no_mut = plpos:gsub("^mutated ", "")
if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then
return "Bentuk bukan lema"
elseif best_guess then
return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema"
else
return nil
end
end
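-- Examples (assuming the usual contents of the lemmas/nonlemmas data tables):
-- export.pos_lemma_or_nonlemma("nouns") returns "Lema";
-- export.pos_lemma_or_nonlemma("noun forms") returns "Bentuk bukan lema"; and for
-- an unlisted POS, export.pos_lemma_or_nonlemma("widgets", true) guesses "Lema",
-- since the POS does not begin with "Bentuk ".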
--[==[
Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form
aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural.
]==]
function export.canonicalize_pos(pos)
-- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away.
if pos == "pre" then
-- Don't throw error on 'pref' as it's an alias for "prefix".
error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'")
end
-- Likewise for pro = pronoun.
if pos == "pro" or pos == "prof" then
error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'")
end
local m_headword_data = m_data or get_data()
if m_headword_data.pos_aliases[pos] then
pos = m_headword_data.pos_aliases[pos]
elseif pos:sub(-1) == "f" then
pos = pos:sub(1, -2)
pos = "Bentuk " .. (m_headword_data.pos_aliases[pos] or pos)
end
return export.pluralize_pos(pos)
end
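-- Example (assuming a hypothetical alias mapping "n" to "noun"):
-- export.canonicalize_pos("n") would return "nouns", and
-- export.canonicalize_pos("nf") would strip the trailing "f", prepend "Bentuk "
-- and pluralize, giving "Bentuk nouns".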
-- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a
-- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by
-- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given,
-- no blank (zero-length) strings are present.
local function init_and_find_maximum_index(data, element, allow_blank_string)
local maxind = 0
if not data[element] then
data[element] = {}
end
local typ = type(data[element])
if typ ~= "table" then
error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ))
end
for k, v in pairs(data[element]) do
if k ~= "maxindex" then
if type(k) ~= "number" then
error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element))
end
if k > maxind then
maxind = k
end
if v then
if type(v) ~= "string" then
error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v)))
end
if not allow_blank_string and v == "" then
error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element))
end
end
end
end
return maxind
end
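-- Example: with data.heads = {[1] = "a", [3] = "b", maxindex = 3},
-- init_and_find_maximum_index(data, "heads") returns 3. Gaps are allowed, the
-- "maxindex" key is skipped, and a non-string value would raise an internal error.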
--[==[
-- Add the page to various maintenance categories for the language and the
-- whole page. These are placed in the headword somewhat arbitrarily, but
-- mainly because headword templates are mandatory for entries (meaning that
-- in theory it provides full coverage).
--
-- This is provided as an external entry point so that modules which transclude
-- information from other entries (such as {{tl|ja-see}}) can take advantage
-- of this feature as well, because they are used in place of a conventional
-- headword template.]==]
do
-- Handle any manual sortkeys that have been specified in raw categories
-- by tracking if they are the same or different from the automatically-
-- generated sortkey, so that we can track them in maintenance
-- categories.
local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
sortkey = sortkey or lang:makeSortKey(page.pagename)
-- If there are raw categories with no sortkey, then they will be
-- sorted based on the default MediaWiki sortkey, so we check against
-- that.
if tbl == true then
if page.raw_defaultsort ~= sortkey then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return
end
local redundant, different
for k in pairs(tbl) do
if k == sortkey then
redundant = true
else
different = true
end
end
if redundant then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah")
end
if different then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return sortkey
end
function export.maintenance_cats(page, lang, lang_cats, page_cats)
extend(page_cats, page.cats)
lang = lang:getFull() -- since we are just generating categories
local canonical = lang:getCanonicalName()
local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()]
if tbl then
sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori topik yang menggunakan penanda mentah")
end
tbl = page.wikitext_langname_cat[canonical]
if tbl then
handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah")
end
local current_L2 = get_current_L2()
if current_L2 then
local trimmed_L2 = trim(current_L2)
local expected_L2 = "Bahasa " .. canonical
if trimmed_L2 ~= expected_L2 then
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul")
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]]
track("pengepala bahasa tidak betul", lang)
end
end
end
end
--[==[This is the primary external entry point.
{{lua|full_headword(data)}}
This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line.
See [[#Further explanations for full_headword()]]
]==]
function export.full_headword(data)
-- Prevent data from being destructively modified.
local data = shallow_copy(data)
------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------
if data.getCanonicalName then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object")
end
if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object")
end
if data.id and type(data.id) ~= "string" then
error("Internal error: The id in the data table should be a string.")
end
------------ 2. Initialize pagename etc. ------------
local langcode = data.lang:getCode()
local full_langcode = data.lang:getFullCode()
local langname = data.lang:getCanonicalName()
local full_langname = data.lang:getFullName()
local raw_pagename = data.pagename
local page
local m_headword_data = m_data or get_data()
if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc.
-- data.pagename is often set on documentation and test pages through the pagename= parameter of various
-- templates, to emulate running on that page. Having a large number of such test templates on a single
-- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However,
-- we don't really need to do that and can function fine without fetching and parsing the contents of a
-- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser
-- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to
-- be found and where actual content does not live (User, Template, Module).
local actual_namespace = m_headword_data.page.namespace
local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or
actual_namespace == "Module"
page = process_page(raw_pagename, no_fetch_content)
else
page = m_headword_data.page
end
local namespace = page.namespace
------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. ------------
if type(data.heads) == "table" and type(data.heads[1]) == "table" then
-- new-style
if data.translits or data.transcriptions then
error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given")
end
else
-- convert old-style `heads`, `translits` and `transcriptions` to new-style
local maxind = max(
init_and_find_maximum_index(data, "heads"),
init_and_find_maximum_index(data, "translits", true),
init_and_find_maximum_index(data, "transcriptions", true)
)
for i = 1, maxind do
data.heads[i] = {
term = data.heads[i],
tr = data.translits[i],
ts = data.transcriptions[i],
}
end
end
-- Make sure there's at least one head.
if not data.heads[1] then
data.heads[1] = {}
end
------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
data.noposcat = true
end
init_and_find_maximum_index(data, "categories")
init_and_find_maximum_index(data, "whole_page_categories")
local pos_category_already_present = false
if data.categories[1] then
local escaped_langname = pattern_escape(full_langname)
local matches_lang_pattern = "^" .. escaped_langname .. " "
for _, cat in ipairs(data.categories) do
-- Does the category begin with the language name? If not, tag it with a tracking category.
if not cat:find(matches_lang_pattern) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]]
track("no lang category", data.lang)
end
end
-- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we
-- throw an error below.
if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then
data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "")
-- Optimization to avoid inserting category already present.
pos_category_already_present = true
end
end
if not data.pos_category then
error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in "
.. "`data.categories`. Either specify the plural part of speech in `data.pos_category` "
.. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the "
.. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")."
)
end
-- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given.
if not pos_category_already_present and not data.noposcat then
local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname
-- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment.
if pos_category ~= "Aksara Han rentas bahasa" then
insert(data.categories, 1, pos_category)
end
end
-- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out,
-- add an appropriate category.
local postype = export.pos_lemma_or_nonlemma(data.pos_category)
if not postype then
-- We don't know what this category is, so tag it with a tracking category.
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]]
track("unrecognized pos", data.lang)
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/POS/LANGCODE]]
track("unrecognized pos/pos/" .. data.pos_category, data.lang)
elseif not data.noposcat then
insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname)
end
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname)
end
------------ 5. Create a default headword, and add links to multiword page names. ------------
-- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be
-- reconstructed.
local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!")
local lang_reconstructed = data.lang:hasType("reconstructed")
if is_anti_asterisk then
if not lang_reconstructed then
error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages")
end
lang_reconstructed = false
end
-- Determine if term is reconstructed
local is_reconstructed = namespace == "Rekonstruksi" or lang_reconstructed
-- Create a default headword based on the pagename, which is determined in
-- advance by the data module so that it only needs to be done once.
local default_head = page.pagename
-- Add links to multi-word page names when appropriate
if not (is_reconstructed or data.nolinkhead) then
local no_links = m_headword_data.no_multiword_links
if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then
default_head = export.add_multiword_links(default_head, true)
end
end
if is_reconstructed then
default_head = "*" .. default_head
end
------------ 6. Check the namespace against the language type. ------------
if namespace == "" then
if lang_reconstructed then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ")
elseif data.lang:hasType("appendix-constructed") then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ")
end
elseif namespace == "Petikan" or namespace == "Tesaurus" then
error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .")
end
------------ 7. Fill in missing values in `data.heads`. ------------
-- True if any script among the headword scripts has spaces in it.
local any_script_has_spaces = false
-- True if any term has a redundant head= param.
local has_redundant_head_param = false
for _, head in ipairs(data.heads) do
------ 7a. If missing head, replace with default head.
if not head.term then
head.term = default_head
elseif head.term == default_head then
has_redundant_head_param = true
elseif is_anti_asterisk and head.term == "!!" then
-- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head.
head.term = "!!" .. default_head
elseif head.term:find("^[!?]$") then
-- If explicit head= just consists of ! or ?, add it to the end of the default head.
head.term = default_head .. head.term
end
head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term
if is_reconstructed then
local head_term = head.term
if head_term:find("%[%[") then
head_term = remove_links(head_term)
end
if head_term:sub(1, 1) ~= "*" then
error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.")
end
end
------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence,
------ otherwise fall back to the overall script if given. If neither given, autodetect the script.
local auto_sc = data.lang:findBestScript(head.term)
if (
auto_sc:getCode() == "None" and
find_best_script_without_lang(head.term):getCode() ~= "None"
) then
insert(data.categories, "Perkataan dalam bentuk tulisan tidak piawai bahasa " .. full_langname )
end
if not (head.sc or data.sc) then -- No script code given, so use autodetected script.
head.sc = auto_sc
else
if not head.sc then -- Overall script code given.
head.sc = data.sc
end
-- Track uses of sc parameter.
if head.sc:getCode() == auto_sc:getCode() then
track("redundant script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname )
end
else
track("non-redundant manual script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname )
end
end
end
-- If using a discouraged character sequence, add to maintenance category.
if head.sc:hasNormalizationFixes() == true then
local composed_head = toNFC(head.term)
if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then
insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan")
end
end
any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces()
------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given
------ (provided automatic translit is available, e.g. not in Persian or Hebrew).
-- Make transliterations
head.tr_manual = nil
-- Try to generate a transliteration if necessary
if head.tr == "-" then
head.tr = nil
else
local notranslit = m_headword_data.notranslit
if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then
head.tr_manual = not not head.tr
local text = head.term_no_initial_bang_bang
if not data.lang:link_tr(head.sc) then
text = remove_links(text)
end
local automated_tr = data.lang:transliterate(text, head.sc)
if automated_tr then
local manual_tr = head.tr
if manual_tr then
if remove_links(manual_tr) == remove_links(automated_tr) then
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah")
else
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah")
end
end
if not manual_tr then
head.tr = automated_tr
end
end
-- There is still no transliteration?
-- Add the entry to a cleanup category.
if not head.tr then
head.tr = "<small>transliteration needed</small>"
-- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar.
-- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]].
insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname)
else
-- Otherwise, trim it.
head.tr = trim(head.tr)
end
end
end
-- Link to the transliteration entry for languages that require this.
if head.tr and data.lang:link_tr(head.sc) then
head.tr = full_link{
term = head.tr,
lang = data.lang,
sc = get_script("Latn"),
tr = "-"
}
end
end
------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------
-- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace.
-- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.)
-- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII
-- might need to be expanded to a check for any Latin characters and whitespace or punctuation.
local display_title
-- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to
-- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be
-- about the best we can do (alternatively we could potentially do script detection on the pagename).
local dt_script = data.heads[1].sc
local dt_script_code = dt_script:getCode()
local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$")
local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "")
if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then
display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>'
elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code]
or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana")))
or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then
display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>'
-- Keep Han entries region-neutral in the display title.
elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then
display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>'
elseif namespace == "Rekonstruksi" then
local matched
display_title, matched = ugsub(
page.full_raw_pagename,
"^(Rekonstruksi:[^/]+/)(.+)$",
function(before, term)
return before .. tag_text(term, data.lang, dt_script)
end
)
if matched == 0 then
display_title = nil
end
end
-- FIXME: Generalize this.
-- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab
-- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title
-- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other
-- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word
-- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages.
-- FIXME: This is checking for more than one language on the page but instead needs to check if there are any
-- languages using scripts other than the ones just mentioned.
if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then
display_title = nil
end
if display_title then
mw.getCurrentFrame():callParserFunction(
"DISPLAYTITLE",
display_title
)
end
------------ 9. Insert additional categories. ------------
if data.force_cat_output then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/force cat output]]
track("force cat output")
end
if has_redundant_head_param then
if not data.no_redundant_head_cat then
-- This is not the right way to go about this; too many exceptions and problems due to language-specific headword
-- handling customization. If we want this, it should be opt-in by a given language passing in the default headword.
-- insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan parameter kepala lewah")
end
end
-- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms".
if not data.nomultiwordcat and any_script_has_spaces and postype == "Lema" then
local no_multiword_cat = m_headword_data.no_multiword_cat
if not (no_multiword_cat[langcode] or no_multiword_cat[full_langcode]) then
-- Check for spaces or hyphens, but exclude prefixes and suffixes.
-- Use the pagename, not the head= value, because the latter may have extra
-- junk in it, e.g. superscripted text that throws off the algorithm.
local no_hyphen = m_headword_data.hyphen_not_multiword_sep
-- Exclude hyphens if the data module states that they should for this language.
local checkpattern = (no_hyphen[langcode] or no_hyphen[full_langcode]) and ".[%s፡]." or ".[%s%-፡]."
local is_multiword = umatch(page.pagename, checkpattern)
if is_multiword and not non_categorizable(page.full_raw_pagename) then
insert(data.categories, "Perkataan berbilang kata bahasa " .. full_langname)
elseif not is_multiword then
local long_word_threshold = m_headword_data.long_word_thresholds[langcode] or
m_headword_data.long_word_thresholds[full_langcode]
if long_word_threshold and ulen(page.pagename) >= long_word_threshold then
insert(data.categories, "Perkataan panjang bahasa " .. full_langname)
end
end
end
end
local default_sccat = m_headword_data.default_sccat
if data.sccat or data.sccat == nil and (default_sccat[langcode] or default_sccat[full_langcode]) then
for _, head in ipairs(data.heads) do
insert(data.categories, ucfirst(data.pos_category) .. " bahasa " .. full_langname .. " dalam " ..
head.sc:getDisplayForm())
end
end
-- Reconstructed terms often use weird combinations of scripts and realistically aren't spelled so much as notated.
if namespace ~= "Rekonstruksi" then
-- Map from languages to a string containing the characters to ignore when considering whether a term has
-- multiple written scripts in it. Typically these are Greek or Cyrillic letters used for their phonetic
-- values.
local characters_to_ignore = {
["aaq"] = "αάὰ", -- Penobscot (Algonquian)
["acy"] = "δθ", -- Cypriot Arabic
["aez"] = "β", -- Aeka (Trans-New Guinea)
["anc"] = "γ", -- Ngas (Chadic/Afroasiatic)
["aou"] = "χ", -- A'ou (Kra-Dai)
["art-blk"] = "ч", -- Bolak (conlang)
["awg"] = "β", -- Anguthimri (Pama-Nyungan)
["az"] = "ь", -- Azerbaijani (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ba"] = "ь", -- Bashkir (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["bhp"] = "β", -- Bima (Austronesian)
["bjz"] = "β", -- Baruga (Trans-New Guinea)
["byk"] = "θ", -- Biao (Kra-Dai)
["cdy"] = "θ", -- Chadong (Kra-Dai)
["chp"] = "θ", -- Chipewyan (Athabaskan)
["cjh"] = "χ", -- Upper Chehalis (Salishan)
["clm"] = "χ", -- Klallam (Salishan)
["col"] = "χ", -- Colombia-Wenatchi (Salishan)
["coo"] = "χθ", -- Comox (Salishan)
["crx"] = "θ", -- Carrier (Athabaskan)
["ets"] = "θ", -- Yekhee (Edoid/Niger-Congo)
["ett"] = "χ", -- Etruscan (isolate; in romanizations)
["fla"] = "χ", -- Montana Salish (Salishan)
["grt"] = "་", -- Garo (South Asian Sino-Tibetan)
["gmw-gts"] = "χ", -- Gottscheerish (Bavarian variant spoken in Slovenia)
["hur"] = "χθ", -- Halkomelem (Salishan)
["itc-psa"] = "f", -- Pre-Samnite (Italic; normally written in Greek)
["izh"] = "ь", -- Ingrian (Finnic)
["kic"] = "θ", -- Kickapoo (Algonquian)
["kk"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ky"] = "ь", -- Kyrgyz (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["lil"] = "χ", -- Lillooet (Salishan)
["lsi"] = "ꓹ", -- Lashi (Lolo-Burmese/Sino-Tibetan; represents a glottal stop)
["mhz"] = "β", -- Mor (Austronesian)
["mqn"] = "β", -- Moronene (Austronesian)
["neg"]= "ӡā", -- Negidal (Tungusic; normally in Cyrillic)
["oka"] = "χ", -- Okanagan (Salishan)
["ole"] = "θ", -- Olekha (Sino-Tibetan)
["oui"] = "γβ", -- Old Uyghur (Turkic; FIXME: others? E.g. Greek delta (δ)?)
["pox"] = "χ", -- Polabian (West Slavic)
["rif"] = "ε", -- Tarifit (Berber)
["rom"] = "Θθ", -- Romani (Indic: International Standard; two different thetas???)
["rpn"] = "β", -- Repanbitip (Austronesian)
["sah"] = "ь", -- Yakut (Turkic; 1929 - 1939 Latin spelling)
["sit-jap"] = "χ", -- Japhug (Sino-Tibetan)
["sjw"] = "θ", -- Shawnee (Algonquian)
["squ"] = "χ", -- Squamish (Salishan)
["str"] = "χθ", -- Saanich (Salishan)
["teh"] = "χ", -- Tehuelche (Chonan; spoken in Argentina)
["tep"] = "η", -- Tepecano (Uto-Aztecan)
["thp"] = "χ", -- Thompson (Salishan)
["tk"] = "ь", -- Turkmen (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["tt"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["twa"] = "χ", -- Twana (Salishan)
["wbl"] = "ы", -- Wakhi (Iranian)
["xbc"] = "ϸ", -- Bactrian (Iranian; represents š; normally written in Greek)
["yha"] = "θ", -- Baha (Kra-Dai)
["za"] = "зч", -- Zhuang (Tai/Kra-Dai); 1957-1982 alphabet used two Cyrillic letters (as well as some others like
-- ƃ, ƅ, ƨ, ɯ and ɵ that look like Cyrillic or Greek but are actually Latin)
["zlw-slv"] = "χђћ", -- Slovincian (West Slavic; FIXME: χ is Greek, the other two are Cyrillic, but I'm not sure
-- the currect characters are being chosen in the entry names)
["zng"] = "θ", -- Mang (Mon-Khmer)
["ztp"] = "θ", -- Loxicha Zapotec (Zapotecan)
}
-- Determine how many real scripts are found in the pagename, where we exclude symbols and such. We exclude
-- scripts whose `character_category` is false as well as Zmth (mathematical notation symbols), which has a
-- category of "Mathematical notation symbols". When counting scripts, we need to elide language-specific
-- variants because e.g. Beng and as-Beng have slightly different characters but we don't want to consider them
-- two different scripts (e.g. [[এৰ]] has two characters which are detected respectively as Beng and as-Beng).
local seen_scripts = {}
local num_seen_scripts = 0
local num_loops = 0
local canon_pagename = page.pagename
local ch_to_ignore = characters_to_ignore[full_langcode]
if ch_to_ignore then
canon_pagename = ugsub(canon_pagename, "[" .. ch_to_ignore .. "]", "")
end
while true do
if canon_pagename == "" or num_seen_scripts >= 2 or num_loops >= 10 then
break
end
-- Make sure we don't get into a loop checking the same script over and over again; happens with e.g. [[ᠪᡳ]]
num_loops = num_loops + 1
local pagename_script = find_best_script_without_lang(canon_pagename, "None only as last resort")
local script_chars = pagename_script.characters
if not script_chars then
-- we are stuck; this happens with None
break
end
local script_code = pagename_script:getCode()
local replaced
canon_pagename, replaced = ugsub(canon_pagename, "[" .. script_chars .. "]", "")
if (
replaced and
script_code ~= "Zmth" and
(script_data or get_script_data())[script_code] and
script_data[script_code].character_category ~= false
) then
script_code = script_code:gsub("^.-%-", "")
if not seen_scripts[script_code] then
seen_scripts[script_code] = true
num_seen_scripts = num_seen_scripts + 1
end
end
end
if num_seen_scripts > 1 then
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dieja dalam berbilang tulisan")
end
end
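-- Illustrative trace (not part of the module) of the script-counting loop above,
-- for a hypothetical pagename mixing Latin and Cyrillic letters:
--   canon_pagename = "abвг"
--   pass 1: find_best_script_without_lang -> Latn; strip its characters -> "вг";
--           seen_scripts = { Latn = true }, num_seen_scripts = 1
--   pass 2: find_best_script_without_lang -> Cyrl; strip its characters -> "";
--           seen_scripts = { Latn = true, Cyrl = true }, num_seen_scripts = 2
-- Since num_seen_scripts > 1, the page would land in the
-- "Perkataan bahasa ... dieja dalam berbilang tulisan" category.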
-- Categorise for unusual characters. Takes into account combining characters, so that we can categorise for
-- characters with diacritics that aren't encoded as atomic characters (e.g. U̠). These can be in two formats:
-- single combining characters (i.e. character + diacritic(s)) or double combining characters (i.e. character +
-- diacritic(s) + character). Each can have any number of diacritics.
local standard = data.lang:getStandardCharacters()
if standard and not non_categorizable(page.full_raw_pagename) then
local function char_category(char)
local specials = {
["#"] = "number sign",
["("] = "parentheses",
[")"] = "parentheses",
["<"] = "angle brackets",
[">"] = "angle brackets",
["["] = "square brackets",
["]"] = "square brackets",
["_"] = "underscore",
["{"] = "braces",
["|"] = "vertical line",
["}"] = "braces",
["ß"] = "ẞ",
["\205\133"] = "", -- this is UTF-8 for U+0345 ( ͅ)
["\239\191\189"] = "replacement character",
}
char = toNFD(char)
:gsub(".[\128-\191]*", function(m)
local new_m = specials[m]
new_m = new_m or m:uupper()
return new_m
end)
return toNFC(char)
end
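-- Rough worked example (an assumption about typical input, not verified module
-- output): char_category("á") decomposes "á" into "a" + U+0301 via toNFD, maps
-- "a" to "A" with uupper (the combining acute has no uppercase and stays put),
-- then recomposes with toNFC, yielding "Á". Characters in `specials`, such as
-- "(" and ")", instead map to a shared descriptive name ("parentheses"), so
-- paired punctuation ends up in a single category.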
if full_langcode ~= "hi" and full_langcode ~= "lo" then
local standard_chars_scripts = {}
for _, head in ipairs(data.heads) do
standard_chars_scripts[head.sc:getCode()] = true
end
-- Iterate over the scripts, in case there is more than one (as they can have different sets of standard characters).
for code in pairs(standard_chars_scripts) do
local sc_standard = data.lang:getStandardCharacters(code)
if sc_standard then
if page.pagename_len > 1 then
local explode_standard = {}
local function explode(char)
explode_standard[char] = true
return ""
end
local sc_standard = ugsub(sc_standard, page.comb_chars.combined_double, explode)
sc_standard = ugsub(sc_standard, page.comb_chars.combined_single, explode)
:gsub(".[\128-\191]*", explode)
local num_cat_inserted
for char in pairs(page.explode_pagename) do
if not explode_standard[char] then
if char:find("[0-9]") then
if not num_cat_inserted then
insert(data.categories, "Perkataan dieja dengan nombor bahasa " .. full_langname)
num_cat_inserted = true
end
elseif ufind(char, page.emoji_pattern) then
insert(data.categories, "Perkataan dieja dengan emoji bahasa " .. full_langname)
else
local upper = char_category(char)
if not explode_standard[upper] then
char = upper
end
insert(data.categories, "Perkataan dieja dengan " .. char .. " bahasa " .. full_langname)
end
end
end
end
-- If a diacritic doesn't appear in any of the standard characters, also categorise for it generally.
sc_standard = toNFD(sc_standard)
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_single) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. " bahasa " .. full_langname)
end
end
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_double) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. "◌ bahasa " .. full_langname)
end
end
end
end
-- Hindi and Lao are handled the old way for now, as their standard chars still need to be converted to the new
-- format (because there are a lot of them).
elseif ulen(page.pagename) ~= 1 then
for character in ugmatch(page.pagename, "([^" .. standard .. "])") do
local upper = char_category(character)
if not umatch(upper, "[" .. standard .. "]") then
character = upper
end
insert(data.categories, "Perkataan dieja dengan " .. character .. " bahasa " .. full_langname)
end
end
end
if data.heads[1].sc:isSystem("alphabet") then
local pagename, i = page.pagename:ulower(), 2
while umatch(pagename, "(%a)" .. ("%1"):rep(i)) do
i = i + 1
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan " .. i .. " contoh huruf yang sama berturut-turut")
end
end
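-- Minimal sketch of the repeated-letter check above, in plain Lua string
-- patterns (the module uses the ustring equivalents on the lowercased pagename):
--   local word, i = "aaa", 2
--   -- "(%a)" .. ("%1"):rep(2) matches three identical letters in a row, so the
--   -- loop body runs once, i becomes 3, and the page is categorised under
--   -- "... dengan 3 contoh huruf yang sama berturut-turut"; the next pattern
--   -- (four in a row) fails and the loop stops.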
-- Categorise for palindromes
if not data.nopalindromecat and namespace ~= "Rekonstruksi" and ulen(page.pagename) > 2
-- FIXME: Use of first script here seems hacky. What is the clean way of doing this in the presence of
-- multiple scripts?
and is_palindrome(page.pagename, data.lang, data.heads[1].sc) then
insert(data.categories, "Palindrom bahasa " .. full_langname)
end
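-- Illustrative case: a pagename such as "kakak" (longer than two characters and
-- reading the same both ways) would be placed in "Palindrom bahasa ...",
-- provided `data.nopalindromecat` is unset and the page is not a reconstruction.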
if namespace == "" and not lang_reconstructed then
for _, head in ipairs(data.heads) do
if page.full_raw_pagename ~= get_link_page(remove_links(head.term), data.lang, head.sc) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch/LANGCODE]]
track("pagename spelling mismatch", data.lang)
break
end
end
end
-- Add red link category if called for and we're not a "large" page, where such checks are disabled.
if data.checkredlinks and not m_headword_data.large_pages[m_headword_data.pagename] then
local plposcat = type(data.checkredlinks) == "string" and data.checkredlinks or data.pos_category
check_red_link_inflections_top_level(data, plposcat)
end
-- Add to various maintenance categories.
export.maintenance_cats(page, data.lang, data.categories, data.whole_page_categories)
------------ 10. Format and return headwords, genders, inflections and categories. ------------
-- Format and return all the gathered information. This may add more categories (e.g. gender/number categories),
-- so make sure we do it before evaluating `data.categories`.
local text = '<span class="headword-line">' ..
format_headword(data) ..
format_headword_genders(data) ..
format_top_level_inflections(data) .. '</span>'
-- Language-specific categories.
local cats = format_categories(
data.categories, data.lang, data.sort_key, page.encoded_pagename,
data.force_cat_output or test_force_categories, data.heads[1].sc
)
-- Language-agnostic categories.
local whole_page_cats = format_categories(
data.whole_page_categories, nil, "-"
)
return text .. cats .. whole_page_cats
end
return export
300k6vdtnx49kob85k9vg814otvxtfj
281455
281454
2026-04-22T15:34:00Z
Hakimi97
2668
281455
Scribunto
text/plain
local export = {}
-- Named constants for all modules used, to make it easier to swap out sandbox versions.
local debug_track_module = "Module:debug/track"
local en_utilities_module = "Module:en-utilities"
local gender_and_number_module = "Module:gender and number"
local headword_data_module = "Module:headword/data"
local headword_page_module = "Module:headword/page"
local links_module = "Module:links"
local load_module = "Module:load"
local pages_module = "Module:pages"
local palindromes_module = "Module:palindromes"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local scripts_data_module = "Module:scripts/data"
local script_utilities_module = "Module:script utilities"
local script_utilities_data_module = "Module:script utilities/data"
local string_utilities_module = "Module:string utilities"
local table_module = "Module:table"
local utilities_module = "Module:utilities"
local concat = table.concat
local dump = mw.dumpObject
local insert = table.insert
local ipairs = ipairs
local max = math.max
local new_title = mw.title.new
local pairs = pairs
local require = require
local toNFC = mw.ustring.toNFC
local toNFD = mw.ustring.toNFD
local type = type
local ufind = mw.ustring.find
local ugmatch = mw.ustring.gmatch
local ugsub = mw.ustring.gsub
local umatch = mw.ustring.match
--[==[
Loaders for functions in other modules, which overwrite themselves with the target function when called. This
ensures modules are only loaded when needed, retains the speed/convenience of locally-declared pre-loaded
functions, and has no overhead after the first call, since the target functions are called directly in any
subsequent calls.]==]
local function debug_track(...)
debug_track = require(debug_track_module)
return debug_track(...)
end
local function encode_entities(...)
encode_entities = require(string_utilities_module).encode_entities
return encode_entities(...)
end
local function extend(...)
extend = require(table_module).extend
return extend(...)
end
local function find_best_script_without_lang(...)
find_best_script_without_lang = require(scripts_module).findBestScriptWithoutLang
return find_best_script_without_lang(...)
end
local function format_categories(...)
format_categories = require(utilities_module).format_categories
return format_categories(...)
end
local function format_genders(...)
format_genders = require(gender_and_number_module).format_genders
return format_genders(...)
end
local function format_pron_qualifiers(...)
format_pron_qualifiers = require(pron_qualifier_module).format_qualifiers
return format_pron_qualifiers(...)
end
local function full_link(...)
full_link = require(links_module).full_link
return full_link(...)
end
local function get_current_L2(...)
get_current_L2 = require(pages_module).get_current_L2
return get_current_L2(...)
end
local function get_link_page(...)
get_link_page = require(links_module).get_link_page
return get_link_page(...)
end
local function get_script(...)
get_script = require(scripts_module).getByCode
return get_script(...)
end
local function is_palindrome(...)
is_palindrome = require(palindromes_module).is_palindrome
return is_palindrome(...)
end
local function language_link(...)
language_link = require(links_module).language_link
return language_link(...)
end
local function load_data(...)
load_data = require(load_module).load_data
return load_data(...)
end
local function pattern_escape(...)
pattern_escape = require(string_utilities_module).pattern_escape
return pattern_escape(...)
end
local function pluralize(...)
pluralize = require(en_utilities_module).pluralize
return pluralize(...)
end
local function process_page(...)
process_page = require(headword_page_module).process_page
return process_page(...)
end
local function remove_links(...)
remove_links = require(links_module).remove_links
return remove_links(...)
end
local function shallow_copy(...)
shallow_copy = require(table_module).shallowCopy
return shallow_copy(...)
end
local function tag_text(...)
tag_text = require(script_utilities_module).tag_text
return tag_text(...)
end
local function tag_transcription(...)
tag_transcription = require(script_utilities_module).tag_transcription
return tag_transcription(...)
end
local function tag_translit(...)
tag_translit = require(script_utilities_module).tag_translit
return tag_translit(...)
end
local function trim(...)
trim = require(string_utilities_module).trim
return trim(...)
end
local function ulen(...)
ulen = require(string_utilities_module).len
return ulen(...)
end
local function ucfirst(...)
ucfirst = require(string_utilities_module).ucfirst
return ucfirst(...)
end
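-- The self-overwriting loader pattern above, in a nutshell (illustrative sketch
-- with a hypothetical module name, not part of this module):
--   local function greet(...)
--       greet = require("Module:example").greet -- replace wrapper on first call
--       return greet(...)
--   end
-- The first call pays the require() cost; every later call goes straight to the
-- target function.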
--[==[
Loaders for objects, which load data (or some other object) into some variable, which can then be accessed as
"foo or get_foo()", where the function get_foo sets the object to "foo" and then returns it. This ensures they are
only loaded when needed, and avoids the need to check for the existence of the object each time, since once "foo"
has been set, "get_foo" will not be called again.]==]
local m_data
local function get_data()
m_data = load_data(headword_data_module)
return m_data
end
local script_data
local function get_script_data()
script_data = load_data(scripts_data_module)
return script_data
end
local script_utilities_data
local function get_script_utilities_data()
script_utilities_data = load_data(script_utilities_data_module)
return script_utilities_data
end
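-- Usage sketch for these object loaders, as used throughout the module:
-- (m_data or get_data()).pagename evaluates get_data() only on first use, which
-- caches the data in m_data; all later evaluations short-circuit on m_data.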
-- If set to true, categories always appear, even in non-mainspace pages
local test_force_categories = false
-- Add a tracking category to track entries with certain (unusually undesirable) properties. `track_id` is an identifier
-- for the particular property being tracked and goes into the tracking page. Specifically, this adds a link in the
-- page text to [[Wiktionary:Tracking/headword/TRACK_ID]], meaning you can find all entries with the `track_id` property
-- by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID]].
--
-- If `lang` (a language object) is given, an additional tracking page [[Wiktionary:Tracking/headword/TRACK_ID/CODE]] is
-- linked to where CODE is the language code of `lang`, and you can find all entries in the combination of `track_id`
-- and `lang` by visiting [[Special:WhatLinksHere/Wiktionary:Tracking/headword/TRACK_ID/CODE]]. This makes it possible to
-- isolate only the entries with a specific tracking property that are in a given language. Note that if `lang`
-- references an etymology-only language, both that language's code and its full parent's code are tracked.
local function track(track_id, lang)
local tracking_page = "headword/" .. track_id
if lang and lang:hasType("etymology-only") then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode(),
tracking_page .. "/" .. lang:getFullCode()}
elseif lang then
debug_track{tracking_page, tracking_page .. "/" .. lang:getCode()}
else
debug_track(tracking_page)
end
return true
end
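-- Example usage (illustrative): track("manual-tr", data.lang) links the entry to
-- [[Wiktionary:Tracking/headword/manual-tr]] and, because a language object is
-- passed, also to [[Wiktionary:Tracking/headword/manual-tr/CODE]] for that
-- language's code (plus the full parent's code for etymology-only languages).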
local function text_in_script(text, script_code)
local sc = get_script(script_code)
if not sc then
error("Internal error: Bad script code " .. script_code)
end
local characters = sc.characters
local out
if characters then
text = ugsub(text, "%W", "")
out = ufind(text, "[" .. characters .. "]")
end
if out then
return true
else
return false
end
end
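-- Illustrative usage (assuming standard script data): text_in_script("日本",
-- "Hani") is true because the text contains characters from the Hani character
-- class, while text_in_script("abc", "Hani") is false.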
local spacingPunctuation = "[%s%p]+"
--[[ List of punctuation or spacing characters that are found inside of words.
Used to exclude characters from the regex above. ]]
local wordPunc = "-#%%&@־׳״'.·*’་•:᠊"
local notWordPunc = "[^" .. wordPunc .. "]+"
-- Format a term (either a head term or an inflection term) along with any left or right qualifiers, labels, references
-- or customized separator: `part` is the object specifying the term (and `lang` the language of the term), which should
-- optionally contain:
-- * left qualifiers in `q`, an array of strings;
-- * right qualifiers in `qq`, an array of strings;
-- * left labels in `l`, an array of strings;
-- * right labels in `ll`, an array of strings;
-- * references in `refs`, an array either of strings (formatted reference text) or objects containing fields `text`
-- (formatted reference text) and optionally `name` and/or `group`;
-- * a separator in `separator`, defaulting to " <i>or</i> " if this is not the first term (j > 1), otherwise "".
-- `formatted` is the formatted version of the term itself, and `j` is the index of the term.
local function format_term_with_qualifiers_and_refs(lang, part, formatted, j)
local function part_non_empty(field)
local list = part[field]
if not list then
return nil
end
if type(list) ~= "table" then
error(("Internal error: Wrong type for `part.%s`=%s, should be \"table\""):format(field, dump(list)))
end
return list[1]
end
if part_non_empty("q") or part_non_empty("qq") or part_non_empty("l") or
part_non_empty("ll") or part_non_empty("refs") then
formatted = format_pron_qualifiers {
lang = lang,
text = formatted,
q = part.q,
qq = part.qq,
l = part.l,
ll = part.ll,
refs = part.refs,
}
end
local separator = part.separator or j > 1 and " <i>or</i> " -- use "" to request no separator
if separator then
formatted = separator .. formatted
end
return formatted
end
--[==[Return true if the given head is multiword according to the algorithm used in full_headword().]==]
function export.head_is_multiword(head)
for possibleWordBreak in ugmatch(head, spacingPunctuation) do
if umatch(possibleWordBreak, notWordPunc) then
return true
end
end
return false
end
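-- Illustrative behaviour: export.head_is_multiword("batu api") returns true,
-- since the space is a spacing/punctuation run that is not word-internal
-- punctuation, whereas "batu" returns false. Punctuation listed in wordPunc
-- (e.g. the apostrophe) does not by itself make a head multiword.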
do
local function workaround_to_exclude_chars(s)
return (ugsub(s, notWordPunc, "\2%1\1"))
end
--[==[Add links to a multiword head.]==]
function export.add_multiword_links(head, default)
head = "\1" .. ugsub(head, spacingPunctuation, workaround_to_exclude_chars) .. "\2"
if default then
head = head
:gsub("(\1[^\2]*)\\([:#][^\2]*\2)", "%1\\\\%2")
:gsub("(\1[^\2]*)([:#][^\2]*\2)", "%1\\%2")
end
--Escape any remaining square brackets to stop them breaking links (e.g. "[citation needed]").
head = encode_entities(head, "[]", true, true)
--[=[
use this when workaround is no longer needed:
head = "[[" .. ugsub(head, WORDBREAKCHARS, "]]%1[[") .. "]]"
Remove any empty links, which could have been created above
at the beginning or end of the string.
]=]
return (head
:gsub("\1\2", "")
:gsub("[\1\2]", {["\1"] = "[[", ["\2"] = "]]"}))
end
end
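-- Illustrative behaviour (a sketch, assuming no special characters):
-- export.add_multiword_links("batu api") wraps each word in link brackets,
-- giving "[[batu]] [[api]]"; with `default` set, ":" or "#" inside a word is
-- additionally backslash-escaped so it cannot be misread as link syntax.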
local function non_categorizable(full_raw_pagename)
return full_raw_pagename:find("^Lampiran:Gerak isyarat/") or
-- Unsupported titles with descriptive names.
(full_raw_pagename:find("^Tajuk tidak disokong/") and not full_raw_pagename:find("`"))
end
local function tag_text_and_add_quals_and_refs(data, head, formatted, j)
-- Add language and script wrapper.
formatted = tag_text(formatted, data.lang, head.sc, "head", nil, j == 1 and data.id or nil)
-- Add qualifiers, labels, references and separator.
return format_term_with_qualifiers_and_refs(data.lang, head, formatted, j)
end
-- Format a headword with transliterations.
local function format_headword(data)
-- Are there non-empty transliterations?
local has_translits = false
local has_manual_translits = false
------ Format the headwords. ------
local head_parts = {}
local unique_head_parts = {}
local has_multiple_heads = not not data.heads[2]
for j, head in ipairs(data.heads) do
if head.tr or head.ts then
has_translits = true
end
if head.tr and head.tr_manual or head.ts then
has_manual_translits = true
end
local formatted
-- Apply processing to the headword, for formatting links and such.
if head.term:find("[[", nil, true) and head.sc:getCode() ~= "Image" then
formatted = language_link{term = head.term, lang = data.lang}
else
formatted = data.lang:makeDisplayText(head.term, head.sc, true)
end
local head_part = tag_text_and_add_quals_and_refs(data, head, formatted, j)
insert(head_parts, head_part)
-- If multiple heads, try to determine whether all heads display the same. To do this we need to effectively
-- rerun the text tagging and addition of qualifiers and references, using 1 for all indices.
if has_multiple_heads then
local unique_head_part
if j == 1 then
unique_head_part = head_part
else
unique_head_part = tag_text_and_add_quals_and_refs(data, head, formatted, 1)
end
unique_head_parts[unique_head_part] = true
end
end
local set_size = 0
if has_multiple_heads then
for _ in pairs(unique_head_parts) do
set_size = set_size + 1
end
end
if set_size == 1 then
head_parts = head_parts[1]
else
head_parts = concat(head_parts)
end
if has_manual_translits then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/manual-tr/LANGCODE]]
track("manual-tr", data.lang)
end
------ Format the transliterations and transcriptions. ------
local translits_formatted
if has_translits then
local translit_parts = {}
for _, head in ipairs(data.heads) do
if head.tr or head.ts then
local this_parts = {}
if head.tr then
insert(this_parts, tag_translit(head.tr, data.lang:getCode(), "head", nil, head.tr_manual))
if head.ts then
insert(this_parts, " ")
end
end
if head.ts then
insert(this_parts, "/" .. tag_transcription(head.ts, data.lang:getCode(), "head") .. "/")
end
insert(translit_parts, concat(this_parts))
end
end
translits_formatted = " (" .. concat(translit_parts, " <i>or</i> ") .. ")"
local langname = data.lang:getCanonicalName()
local transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
local saw_translit_page = false
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
saw_translit_page = true
end
-- If data.lang is an etymology-only language and we didn't find a transliteration page for it, fall back to the
-- full parent.
if not saw_translit_page and data.lang:hasType("etymology-only") then
langname = data.lang:getFullName()
transliteration_page = new_title("Transliterasi bahasa " .. langname, "Wikikamus")
if transliteration_page and transliteration_page:getContent() then
translits_formatted = " [[Wikikamus:Transliterasi bahasa " .. langname .. "|•]]" .. translits_formatted
end
end
else
translits_formatted = ""
end
------ Paste heads and transliterations/transcriptions. ------
local lemma_gloss
if data.gloss then
lemma_gloss = ' <span class="ib-content qualifier-content">' .. data.gloss .. '</span>'
else
lemma_gloss = ""
end
return head_parts .. translits_formatted .. lemma_gloss
end
local function format_headword_genders(data)
local retval = ""
if data.genders and data.genders[1] then
if data.gloss then
retval = ","
end
local pos_for_cat
if not data.nogendercat then
local no_gender_cat = (m_data or get_data()).no_gender_cat
if not (no_gender_cat[data.lang:getCode()] or no_gender_cat[data.lang:getFullCode()]) then
pos_for_cat = (m_data or get_data()).pos_for_gender_number_cat[data.pos_category:gsub("^reconstructed ", "")]
end
end
local text, cats = format_genders(data.genders, data.lang, pos_for_cat)
if cats then
extend(data.categories, cats)
end
retval = retval .. " " .. text
end
return retval
end
-- Forward reference
local format_inflections
local function format_inflection_parts(data, parts)
for j, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local partaccel = part.accel
local face = part.face or "bold"
if face ~= "bold" and face ~= "plain" and face ~= "hypothetical" then
error("The face `" .. face .. "` " .. (
(script_utilities_data or get_script_utilities_data()).faces[face] and
"should not be used for non-headword terms on the headword line." or
"is invalid."
))
end
-- Here the final part 'or data.nolinkinfl' allows 'nolinkinfl=true' to be set
-- directly in the 'data' table, disabling inflection links for the entire headword
-- when inflected forms aren't entry-worthy, e.g. in Vulgar Latin
local nolinkinfl = part.face == "hypothetical" or (part.nolink and track("nolink") or part.nolinkinfl) or (
data.nolink and track("nolink") or data.nolinkinfl)
local formatted
if part.label then
-- FIXME: There should be a better way of italicizing a label. As is, this isn't customizable.
formatted = "<i>" .. part.label .. "</i>"
else
-- Convert the term into a full link. Don't show a transliteration here unless enable_auto_translit is
-- requested, either at the `parts` level (i.e. per inflection) or at the `data.inflections` level (i.e.
-- specified for all inflections). This is controllable in {{head}} using autotrinfl=1 for all inflections,
-- or fNautotr=1 for an individual inflection (remember that a single inflection may be associated with
-- multiple terms). The reason for doing this is to avoid clutter in headword lines by default in languages
-- where the script is relatively straightforward to read by learners (e.g. Greek, Russian), but allow it
-- to be enabled in languages with more complex scripts (e.g. Arabic).
--
-- FIXME: With nested inflections, should we also respect `enable_auto_translit` at the top level of the
-- nested inflections structure?
local tr = part.tr or not (parts.enable_auto_translit or data.inflections.enable_auto_translit) and "-" or nil
-- FIXME: Temporary errors added 2025-10-03. Remove after a month or so.
if part.translit then
error("Internal error: Use field `tr` not `translit` for specifying an inflection part translit")
end
if part.transcription then
error("Internal error: Use field `ts` not `transcription` for specifying an inflection part transcription")
end
local postprocess_annotations
if part.inflections then
postprocess_annotations = function(infldata)
insert(infldata.annotations, format_inflections(data, part.inflections))
end
end
formatted = full_link(
{
term = not nolinkinfl and part.term or nil,
alt = part.alt or (nolinkinfl and part.term or nil),
lang = part.lang or data.lang,
sc = part.sc or parts.sc or nil,
gloss = part.gloss,
pos = part.pos,
lit = part.lit,
id = part.id,
genders = part.genders,
tr = tr,
ts = part.ts,
accel = partaccel or parts.accel,
postprocess_annotations = postprocess_annotations,
},
face
)
end
parts[j] = format_term_with_qualifiers_and_refs(part.lang or data.lang, part,
formatted, j)
end
local parts_output
if parts[1] then
parts_output = (parts.label and " " or "") .. concat(parts)
elseif parts.request then
parts_output = " <small>[please provide]</small>"
insert(data.categories, "Requests for inflections in " .. data.lang:getFullName() .. " entries")
else
parts_output = ""
end
local parts_label = parts.label and ("<i>" .. parts.label .. "</i>") or ""
return format_term_with_qualifiers_and_refs(data.lang, parts, parts_label .. parts_output, 1)
end
-- Format the inflections following the headword or nested after a given inflection. Declared local above.
function format_inflections(data, inflections)
if inflections and inflections[1] then
-- Format each inflection individually.
for key, infl in ipairs(inflections) do
inflections[key] = format_inflection_parts(data, infl)
end
return concat(inflections, ", ")
else
return ""
end
end
-- Format the top-level inflections following the headword. Currently this just adds parens around the
-- formatted comma-separated inflections in `data.inflections`.
local function format_top_level_inflections(data)
local result = format_inflections(data, data.inflections)
if result ~= "" then
return " (" .. result .. ")"
else
return result
end
end
-- Forward reference
local check_red_link_inflections
-- Check a single inflection (which consists of a label and zero or more terms, each possibly with nested inflections)
-- for red links. If so, insert a red-link category based on `plpos` (the plural part of speech to insert in the
-- category), stop further processing, and return true. If no red links found, return false.
local function check_red_link_inflection_parts(data, parts, plpos)
for _, part in ipairs(parts) do
if type(part) ~= "table" then
part = {term = part}
end
local term = part.term
if term and not term:find("%[%[") then
local stripped_physical_term = get_link_page(term, data.lang, part.sc or parts.sc or nil)
if stripped_physical_term then
local title = mw.title.new(stripped_physical_term)
if title and not title:getContent() then
insert(data.categories, data.lang:getFullName() .. " " .. plpos .. " with red links in their headword lines")
return true
end
end
end
if part.inflections then
if check_red_link_inflections(data, part.inflections, plpos) then
return true
end
end
end
return false
end
-- Check a set of inflections (each of which describes a single inflection of the term, such as feminine or plural, and
-- consists of a label and zero or more terms, each possibly with nested inflections) for red links. If so, insert a
-- red-link category based on `plpos` (the plural part of speech to insert in the category), stop further processing,
-- and return true. If no red links found, return false.
function check_red_link_inflections(data, inflections, plpos)
if inflections and inflections[1] then
-- Check each inflection individually.
for key, infl in ipairs(inflections) do
if check_red_link_inflection_parts(data, infl, plpos) then
return true
end
end
end
return false
end
-- Check the top-level inflections in `data.inflections`, along with any nested inflections, for red links. If so,
-- insert a red-link category based on `plpos` (the plural part of speech to insert in the category), stop further
-- processing, and return true. If no red links found, return false.
local function check_red_link_inflections_top_level(data, plpos)
return check_red_link_inflections(data, data.inflections, plpos)
end
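-- Illustrative case: if an inflection names a term with no existing entry (its
-- title resolves but has no content), the entry is placed in
-- "<language> <plpos> with red links in their headword lines" and checking
-- stops at the first red link found.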
--[==[
Returns the plural form of `pos`, a raw part of speech input, which could be singular or
plural. Irregular plural POS are taken into account (e.g. "kanji" pluralizes to
"kanji").
]==]
function export.pluralize_pos(pos)
-- Make the plural form of the part of speech
return (m_data or get_data()).irregular_plurals[pos] or
pos:sub(-1) == "s" and pos or
pluralize(pos)
end
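-- Illustrative behaviour: a POS already ending in "s" comes back unchanged; a
-- POS listed in irregular_plurals (e.g. "kanji", per the comment above) maps to
-- its recorded plural; anything else goes through the generic English
-- pluralizer, so a hypothetical "noun" would come back as "nouns".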
--[==[
Return "lemma" if the given POS is a lemma, "non-lemma form" if a non-lemma form, or nil
if unknown. The POS passed in must be in its plural form ("nouns", "prefixes", etc.).
If you have a POS in its singular form, call {export.pluralize_pos()} above to pluralize it
in a smart fashion that knows when to add "-s" and when to add "-es", and also takes
into account any irregular plurals.
If `best_guess` is given and the POS is in neither the lemma nor non-lemma list, guess
based on whether it begins with "Bentuk "; otherwise, return nil.
]==]
function export.pos_lemma_or_nonlemma(plpos, best_guess)
local m_headword_data = m_data or get_data()
local isLemma = m_headword_data.lemmas
-- Is it a lemma category?
if isLemma[plpos] then
return "Lema"
end
local plpos_no_recon = plpos:gsub("^reconstructed ", "")
if isLemma[plpos_no_recon] then
return "Lema"
end
-- Is it a nonlemma category?
local isNonLemma = m_headword_data.nonlemmas
if isNonLemma[plpos] or isNonLemma[plpos_no_recon] then
return "Bentuk bukan lema"
end
local plpos_no_mut = plpos:gsub("^mutated ", "")
if isLemma[plpos_no_mut] or isNonLemma[plpos_no_mut] then
return "Bentuk bukan lema"
elseif best_guess then
return plpos:find("^Bentuk ") and "Bentuk bukan lema" or "Lema"
else
return nil
end
end
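-- Flow sketch (illustrative; the actual lemma/non-lemma lists live in
-- [[Module:headword/data]]): a plural POS found in `lemmas` (directly or after
-- stripping a "reconstructed " prefix) yields "Lema"; one found in `nonlemmas`,
-- or whose unmutated base is known, yields "Bentuk bukan lema"; anything else
-- yields nil unless `best_guess` is set, in which case a "Bentuk " prefix
-- decides.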
--[==[
Canonicalize a part of speech as specified in 2= in {{tl|head}}. This checks for POS aliases and non-lemma form
aliases ending in 'f', and then pluralizes if the POS term does not have an invariable plural.
]==]
function export.canonicalize_pos(pos)
-- FIXME: Temporary code to throw an error for alias 'pre' (= preposition) that will go away.
if pos == "pre" then
-- Don't throw error on 'pref' as it's an alias for "prefix".
error("POS 'pre' for 'preposition' no longer allowed as it's too ambiguous; use 'prep'")
end
-- Likewise for pro = pronoun.
if pos == "pro" or pos == "prof" then
error("POS 'pro' for 'pronoun' no longer allowed as it's too ambiguous; use 'pron'")
end
local m_headword_data = m_data or get_data()
if m_headword_data.pos_aliases[pos] then
pos = m_headword_data.pos_aliases[pos]
elseif pos:sub(-1) == "f" then
pos = pos:sub(1, -2)
pos = "Bentuk " .. (m_headword_data.pos_aliases[pos] or pos)
end
return export.pluralize_pos(pos)
end
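-- Flow sketch (illustrative; the alias table lives in [[Module:headword/data]]):
-- a recognised alias in 2= expands to its full POS name; otherwise a trailing
-- "f" is stripped and the remainder (expanded if it is itself an alias) is
-- prefixed with "Bentuk " to mark a non-lemma form. Either way the result is
-- pluralized with export.pluralize_pos().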
-- Find and return the maximum index in the array `data[element]` (which may have gaps in it), and initialize it to a
-- zero-length array if unspecified. Check to make sure all keys are numeric (other than "maxindex", which is set by
-- [[Module:parameters]] for list parameters), all values are strings, and unless `allow_blank_string` is given,
-- no blank (zero-length) strings are present.
local function init_and_find_maximum_index(data, element, allow_blank_string)
local maxind = 0
if not data[element] then
data[element] = {}
end
local typ = type(data[element])
if typ ~= "table" then
error(("Internal error: In full_headword(), `data.%s` must be an array but is a %s"):format(element, typ))
end
for k, v in pairs(data[element]) do
if k ~= "maxindex" then
if type(k) ~= "number" then
error(("Internal error: Unrecognized non-numeric key '%s' in `data.%s`"):format(k, element))
end
if k > maxind then
maxind = k
end
if v then
if type(v) ~= "string" then
error(("Internal error: For key '%s' in `data.%s`, value should be a string but is a %s"):format(k, element, type(v)))
end
if not allow_blank_string and v == "" then
error(("Internal error: For key '%s' in `data.%s`, blank string not allowed; use 'false' for the default"):format(k, element))
end
end
end
end
return maxind
end
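-- Illustrative behaviour: for data = { translits = { [1] = "a", [3] = "b",
-- maxindex = 3 } }, init_and_find_maximum_index(data, "translits", true)
-- returns 3, skipping the "maxindex" bookkeeping key; a missing field is
-- initialised to {} and yields 0.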
--[==[
-- Add the page to various maintenance categories for the language and the
-- whole page. These are placed in the headword somewhat arbitrarily, but
-- mainly because headword templates are mandatory for entries (meaning that
-- in theory it provides full coverage).
--
-- This is provided as an external entry point so that modules which transclude
-- information from other entries (such as {{tl|ja-see}}) can take advantage
-- of this feature as well, because they are used in place of a conventional
-- headword template.]==]
do
-- Handle any manual sortkeys that have been specified in raw categories
-- by tracking if they are the same or different from the automatically-
-- generated sortkey, so that we can track them in maintenance
-- categories.
local function handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
sortkey = sortkey or lang:makeSortKey(page.pagename)
-- If there are raw categories with no sortkey, then they will be
-- sorted based on the default MediaWiki sortkey, so we check against
-- that.
if tbl == true then
if page.raw_defaultsort ~= sortkey then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return
end
local redundant, different
for k in pairs(tbl) do
if k == sortkey then
redundant = true
else
different = true
end
end
if redundant then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih lewah")
end
if different then
insert(lang_cats, "Perkataan bahasa " .. lang:getFullName() .. " dengan kunci isih tidak lewah dan tidak automatik")
end
return sortkey
end
function export.maintenance_cats(page, lang, lang_cats, page_cats)
extend(page_cats, page.cats)
lang = lang:getFull() -- since we are just generating categories
local canonical = lang:getCanonicalName()
local tbl, sortkey = page.wikitext_topic_cat[lang:getCode()]
if tbl then
sortkey = handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori topik yang menggunakan penanda mentah")
end
tbl = page.wikitext_langname_cat[canonical]
if tbl then
handle_raw_sortkeys(tbl, sortkey, page, lang, lang_cats)
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan kategori nama bahasa yang menggunakan penanda mentah")
end
local current_L2 = get_current_L2()
if current_L2 then
local trimmed_L2 = trim(current_L2)
local expected_L2 = "Bahasa " .. canonical
if trimmed_L2 ~= expected_L2 then
insert(lang_cats, "Entri bahasa " .. canonical .. " dengan pengepala bahasa tidak betul")
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pengepala bahasa tidak betul/LANGCODE]]
track("pengepala bahasa tidak betul", lang)
end
end
end
end
--[==[This is the primary external entry point.
{{lua|full_headword(data)}}
This is used by {{temp|head}} and various language-specific headword templates (e.g. {{temp|ru-adj}} for Russian adjectives, {{temp|de-noun}} for German nouns, etc.) to display an entire headword line.
See [[#Further explanations for full_headword()]]
]==]
function export.full_headword(data)
-- Prevent data from being destructively modified.
local data = shallow_copy(data)
------------ 1. Basic checks for old-style (multi-arg) calling convention. ------------
if data.getCanonicalName then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) of properties, not a language object")
end
if not data.lang or type(data.lang) ~= "table" or not data.lang.getCode then
error("Internal error: In full_headword(), the first argument `data` needs to be a Lua object (table) and `data.lang` must be a language object")
end
if data.id and type(data.id) ~= "string" then
error("Internal error: The id in the data table should be a string.")
end
------------ 2. Initialize pagename etc. ------------
local langcode = data.lang:getCode()
local full_langcode = data.lang:getFullCode()
local langname = data.lang:getCanonicalName()
local full_langname = data.lang:getFullName()
local raw_pagename = data.pagename
local page
local m_headword_data = m_data or get_data()
if raw_pagename and raw_pagename ~= m_headword_data.pagename then -- for testing, doc pages, etc.
-- data.pagename is often set on documentation and test pages through the pagename= parameter of various
-- templates, to emulate running on that page. Having a large number of such test templates on a single
-- page often leads to timeouts, because we fetch and parse the contents of each page in turn. However,
-- we don't really need to do that and can function fine without fetching and parsing the contents of a
-- given page, so turn off content fetching/parsing (and also setting the DEFAULTSORT key through a parser
-- function, which is *slooooow*) in certain namespaces where test and documentation templates are likely to
-- be found and where actual content does not live (User, Template, Module).
local actual_namespace = m_headword_data.page.namespace
local no_fetch_content = actual_namespace == "User" or actual_namespace == "Template" or
actual_namespace == "Module"
page = process_page(raw_pagename, no_fetch_content)
else
page = m_headword_data.page
end
local namespace = page.namespace
------------ 3. Initialize `data.heads` table; if old-style, convert to new-style. ------------
if type(data.heads) == "table" and type(data.heads[1]) == "table" then
-- new-style
if data.translits or data.transcriptions then
error("Internal error: In full_headword(), if `data.heads` is new-style (array of head objects), `data.translits` and `data.transcriptions` cannot be given")
end
else
-- convert old-style `heads`, `translits` and `transcriptions` to new-style
local maxind = max(
init_and_find_maximum_index(data, "heads"),
init_and_find_maximum_index(data, "translits", true),
init_and_find_maximum_index(data, "transcriptions", true)
)
for i = 1, maxind do
data.heads[i] = {
term = data.heads[i],
tr = data.translits[i],
ts = data.transcriptions[i],
}
end
end
-- Make sure there's at least one head.
if not data.heads[1] then
data.heads[1] = {}
end
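-- Shape of the conversion above (illustrative values): an old-style call with
-- data.heads = {"foo"} and data.translits = {"bar"} becomes new-style
-- data.heads = { { term = "foo", tr = "bar", ts = nil } }; a call with no heads
-- at all ends up as data.heads = { {} }, so later code can always iterate over
-- at least one head object.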
------------ 4. Initialize and validate `data.categories` and `data.whole_page_categories`, and determine `pos_category` if not given, and add basic categories. ------------
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
data.noposcat = true
end
init_and_find_maximum_index(data, "categories")
init_and_find_maximum_index(data, "whole_page_categories")
local pos_category_already_present = false
if data.categories[1] then
local escaped_langname = pattern_escape(full_langname)
local matches_lang_pattern = "^" .. escaped_langname .. " "
for _, cat in ipairs(data.categories) do
-- Does the category begin with the language name? If not, tag it with a tracking category.
if not cat:find(matches_lang_pattern) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/no lang category/LANGCODE]]
track("no lang category", data.lang)
end
end
-- If `pos_category` not given, try to infer it from the first specified category. If this doesn't work, we
-- throw an error below.
if not data.pos_category and data.categories[1]:find(matches_lang_pattern) then
data.pos_category = data.categories[1]:gsub(matches_lang_pattern, "")
-- Optimization to avoid inserting category already present.
pos_category_already_present = true
end
end
if not data.pos_category then
error("Internal error: `data.pos_category` not specified and could not be inferred from the categories given in "
.. "`data.categories`. Either specify the plural part of speech in `data.pos_category` "
.. "(e.g. \"proper nouns\") or ensure that the first category in `data.categories` is formed from the "
.. "language's canonical name plus the plural part of speech (e.g. \"Norwegian Bokmål proper nouns\")."
)
end
-- Insert a category at the beginning for the part of speech unless it's already present or `data.noposcat` given.
if not pos_category_already_present and not data.noposcat then
local pos_category = ucfirst(data.pos_category) .. " bahasa " .. full_langname
-- FIXME: [[User:Theknightwho]] Why is this special case here? Please add an explanatory comment.
if pos_category ~= "Aksara Han rentas bahasa" then
insert(data.categories, 1, pos_category)
end
end
-- Try to determine whether the part of speech refers to a lemma or a non-lemma form; if we can figure this out,
-- add an appropriate category.
local postype = export.pos_lemma_or_nonlemma(data.pos_category)
if not postype then
-- We don't know what this category is, so tag it with a tracking category.
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/LANGCODE]]
track("unrecognized pos", data.lang)
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/pos/POS]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/unrecognized pos/pos/POS/LANGCODE]]
track("unrecognized pos/pos/" .. data.pos_category, data.lang)
elseif not data.noposcat then
insert(data.categories, 1, ucfirst(postype) .. " bahasa " .. full_langname)
end
-- EXPERIMENTAL: see [[Wiktionary:Beer parlour/2024/June#Decluttering the altform mess]]
if data.altform then
insert(data.categories, 1, "Bentuk alternatif bahasa " .. full_langname)
end
------------ 5. Create a default headword, and add links to multiword page names. ------------
-- Determine if this is an "anti-asterisk" term, i.e. an attested term in a language that must normally be
-- reconstructed.
local is_anti_asterisk = data.heads[1].term and data.heads[1].term:find("^!!")
local lang_reconstructed = data.lang:hasType("reconstructed")
if is_anti_asterisk then
if not lang_reconstructed then
error("Anti-asterisk feature (head= beginning with !!) can only be used with reconstructed languages")
end
lang_reconstructed = false
end
-- Determine if term is reconstructed
local is_reconstructed = namespace == "Rekonstruksi" or data.lang:hasType("reconstructed")
-- Create a default headword based on the pagename, which is determined in
-- advance by the data module so that it only needs to be done once.
local default_head = page.pagename
-- Add links to multi-word page names when appropriate
if not (is_reconstructed or data.nolinkhead) then
local no_links = m_headword_data.no_multiword_links
if not (no_links[langcode] or no_links[full_langcode]) and export.head_is_multiword(default_head) then
default_head = export.add_multiword_links(default_head, true)
end
end
if is_reconstructed then
default_head = "*" .. default_head
end
------------ 6. Check the namespace against the language type. ------------
if namespace == "" then
if lang_reconstructed then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Rekonstruksi: ")
elseif data.lang:hasType("appendix-constructed") then
error("Entri dalam bahasa " .. langname .. " mesti dimasukkan dalam ruang nama Lampiran: ")
end
elseif namespace == "Petikan" or namespace == "Tesaurus" then
error("Templat pengepala tidak boleh digunakan dalam ruang nama " .. namespace .. ": .")
end
------------ 7. Fill in missing values in `data.heads`. ------------
-- True if any script among the headword scripts has spaces in it.
local any_script_has_spaces = false
-- True if any term has a redundant head= param.
local has_redundant_head_param = false
for _, head in ipairs(data.heads) do
------ 7a. If missing head, replace with default head.
if not head.term then
head.term = default_head
elseif head.term == default_head then
has_redundant_head_param = true
elseif is_anti_asterisk and head.term == "!!" then
-- If explicit head=!! is given, it's an anti-asterisk term and we fill in the default head.
head.term = "!!" .. default_head
elseif head.term:find("^[!?]$") then
-- If explicit head= just consists of ! or ?, add it to the end of the default head.
head.term = default_head .. head.term
end
head.term_no_initial_bang_bang = is_anti_asterisk and head.term:sub(3) or head.term
if is_reconstructed then
local head_term = head.term
if head_term:find("%[%[") then
head_term = remove_links(head_term)
end
if head_term:sub(1, 1) ~= "*" then
error("The headword '" .. head_term .. "' must begin with '*' to indicate that it is reconstructed.")
end
end
------ 7b. Try to detect the script(s) if not provided. If a per-head script is provided, that takes precedence,
------ otherwise fall back to the overall script if given. If neither given, autodetect the script.
local auto_sc = data.lang:findBestScript(head.term)
if (
auto_sc:getCode() == "None" and
find_best_script_without_lang(head.term):getCode() ~= "None"
) then
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dalam bentuk tulisan tidak piawai")
end
if not (head.sc or data.sc) then -- No script code given, so use autodetected script.
head.sc = auto_sc
else
if not head.sc then -- Overall script code given.
head.sc = data.sc
end
-- Track uses of sc parameter.
if head.sc:getCode() == auto_sc:getCode() then
track("redundant script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan lewah bahasa " .. full_langname )
end
else
track("non-redundant manual script code", data.lang)
if not data.no_script_code_cat then
insert(data.categories, "Perkataan dengan kod tulisan manual tidak lewah bahasa " .. full_langname )
end
end
end
-- If using a discouraged character sequence, add to maintenance category.
if head.sc:hasNormalizationFixes() == true then
local composed_head = toNFC(head.term)
if head.sc:fixDiscouragedSequences(composed_head) ~= composed_head then
insert(data.whole_page_categories, "Laman menggunakan jujukan aksara tidak digalakkan")
end
end
any_script_has_spaces = any_script_has_spaces or head.sc:hasSpaces()
------ 7c. Create automatic transliterations for any non-Latin headwords without manual translit given
------ (provided automatic translit is available, e.g. not in Persian or Hebrew).
-- Make transliterations
head.tr_manual = nil
-- Try to generate a transliteration if necessary
if head.tr == "-" then
head.tr = nil
else
local notranslit = m_headword_data.notranslit
if not (notranslit[langcode] or notranslit[full_langcode]) and head.sc:isTransliterated() then
head.tr_manual = not not head.tr
local text = head.term_no_initial_bang_bang
if not data.lang:link_tr(head.sc) then
text = remove_links(text)
end
local automated_tr = data.lang:transliterate(text, head.sc)
if automated_tr then
local manual_tr = head.tr
if manual_tr then
if remove_links(manual_tr) == remove_links(automated_tr) then
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi lewah")
else
insert(data.categories, "Perkataan bahasa ".. full_langname .. " dengan transliterasi manual tidak lewah")
end
end
if not manual_tr then
head.tr = automated_tr
end
end
-- There is still no transliteration?
-- Add the entry to a cleanup category.
if not head.tr then
head.tr = "<small>transliteration needed</small>"
-- FIXME: No current support for 'Request for transliteration of Classical Persian terms' or similar.
-- Consider adding this support in [[Module:category tree/poscatboiler/data/entry maintenance]].
insert(data.categories, "Permintaan transliterasi perkataan bahasa " .. full_langname)
else
-- Otherwise, trim it.
head.tr = trim(head.tr)
end
end
end
-- Link to the transliteration entry for languages that require this.
if head.tr and data.lang:link_tr(head.sc) then
head.tr = full_link{
term = head.tr,
lang = data.lang,
sc = get_script("Latn"),
tr = "-"
}
end
end
------------ 8. Maybe tag the title with the appropriate script code, using the `display_title` mechanism. ------------
-- Assumes that the scripts in "toBeTagged" will never occur in the Reconstruction namespace.
-- (FIXME: Don't make assumptions like this, and if you need to do so, throw an error if the assumption is violated.)
-- Avoid tagging ASCII as Hani even when it is tagged as Hani in the headword, as in [[check]]. The check for ASCII
-- might need to be expanded to a check for any Latin characters and whitespace or punctuation.
local display_title
-- Where there are multiple headwords, use the script for the first. This assumes the first headword is similar to
-- the pagename, and that headwords that are in different scripts from the pagename aren't first. This seems to be
-- about the best we can do (alternatively we could potentially do script detection on the pagename).
local dt_script = data.heads[1].sc
local dt_script_code = dt_script:getCode()
local page_non_ascii = namespace == "" and not page.pagename:find("^[%z\1-\127]+$")
local unsupported_pagename, unsupported = page.full_raw_pagename:gsub("^Tajuk tidak disokong/", "")
if unsupported == 1 and page.unsupported_titles[unsupported_pagename] then
display_title = 'Tajuk tidak disokong/<span class="' .. dt_script_code .. '">' .. page.unsupported_titles[unsupported_pagename] .. '</span>'
elseif page_non_ascii and m_headword_data.toBeTagged[dt_script_code]
or (dt_script_code == "Jpan" and (text_in_script(page.pagename, "Hira") or text_in_script(page.pagename, "Kana")))
or (dt_script_code == "Kore" and text_in_script(page.pagename, "Hang")) then
display_title = '<span class="' .. dt_script_code .. '">' .. page.full_raw_pagename .. '</span>'
-- Keep Han entries region-neutral in the display title.
elseif page_non_ascii and (dt_script_code == "Hant" or dt_script_code == "Hans") then
display_title = '<span class="Hani">' .. page.full_raw_pagename .. '</span>'
elseif namespace == "Rekonstruksi" then
local matched
display_title, matched = ugsub(
page.full_raw_pagename,
"^(Rekonstruksi:[^/]+/)(.+)$",
function(before, term)
return before .. tag_text(term, data.lang, dt_script)
end
)
if matched == 0 then
display_title = nil
end
end
-- FIXME: Generalize this.
-- If the current language uses ur-Arab (for Urdu, etc.), ku-Arab (Central Kurdish) or pa-Arab
-- (Shahmukhi, for Punjabi) and there's more than one language on the page, don't set the display title
-- because these three scripts display in Nastaliq and we don't want this for terms that also exist in other
-- languages that don't display in Nastaliq (e.g. Arabic or Persian) to display in Nastaliq. Because the word
-- "Urdu" occurs near the end of the alphabet, Urdu fonts tend to override the fonts of other languages.
-- FIXME: This is checking for more than one language on the page but instead needs to check if there are any
-- languages using scripts other than the ones just mentioned.
if (dt_script_code == "ur-Arab" or dt_script_code == "ku-Arab" or dt_script_code == "pa-Arab") and page.L2_list.n > 1 then
display_title = nil
end
if display_title then
mw.getCurrentFrame():callParserFunction(
"DISPLAYTITLE",
display_title
)
end
------------ 9. Insert additional categories. ------------
if data.force_cat_output then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/force cat output]]
track("force cat output")
end
if has_redundant_head_param then
if not data.no_redundant_head_cat then
-- This is not the right way to go about this; too many exceptions and problems due to language-specific headword
-- handling customization. If we want this, it should be opt-in by a given language passing in the default headword.
-- insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan parameter kepala lewah")
end
end
-- If the first head is multiword (after removing links), maybe insert into "LANG multiword terms".
if not data.nomultiwordcat and any_script_has_spaces and postype == "Lema" then
local no_multiword_cat = m_headword_data.no_multiword_cat
if not (no_multiword_cat[langcode] or no_multiword_cat[full_langcode]) then
-- Check for spaces or hyphens, but exclude prefixes and suffixes.
-- Use the pagename, not the head= value, because the latter may have extra
-- junk in it, e.g. superscripted text that throws off the algorithm.
local no_hyphen = m_headword_data.hyphen_not_multiword_sep
-- Exclude hyphens if the data module states that they should for this language.
local checkpattern = (no_hyphen[langcode] or no_hyphen[full_langcode]) and ".[%s፡]." or ".[%s%-፡]."
local is_multiword = umatch(page.pagename, checkpattern)
if is_multiword and not non_categorizable(page.full_raw_pagename) then
insert(data.categories, "Perkataan berbilang kata bahasa " .. full_langname)
elseif not is_multiword then
local long_word_threshold = m_headword_data.long_word_thresholds[langcode] or
m_headword_data.long_word_thresholds[full_langcode]
if long_word_threshold and ulen(page.pagename) >= long_word_threshold then
insert(data.categories, "Perkataan panjang bahasa " .. full_langname)
end
end
end
end
local default_sccat = m_headword_data.default_sccat
if data.sccat or data.sccat == nil and (default_sccat[langcode] or default_sccat[full_langcode]) then
for _, head in ipairs(data.heads) do
insert(data.categories, ucfirst(data.pos_category) .. " bahasa " .. full_langname .. " dalam " ..
head.sc:getDisplayForm())
end
end
-- Reconstructed terms often use weird combinations of scripts and realistically aren't spelled so much as notated.
if namespace ~= "Rekonstruksi" then
-- Map from languages to a string containing the characters to ignore when considering whether a term has
-- multiple written scripts in it. Typically these are Greek or Cyrillic letters used for their phonetic
-- values.
local characters_to_ignore = {
["aaq"] = "αάὰ", -- Penobscot (Algonquian)
["acy"] = "δθ", -- Cypriot Arabic
["aez"] = "β", -- Aeka (Trans-New Guinea)
["anc"] = "γ", -- Ngas (Chadic/Afroasiatic)
["aou"] = "χ", -- A'ou (Kra-Dai)
["art-blk"] = "ч", -- Bolak (conlang)
["awg"] = "β", -- Anguthimri (Pama-Nyungan)
["az"] = "ь", -- Azerbaijani (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ba"] = "ь", -- Bashkir (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["bhp"] = "β", -- Bima (Austronesian)
["bjz"] = "β", -- Baruga (Trans-New Guinea)
["byk"] = "θ", -- Biao (Kra-Dai)
["cdy"] = "θ", -- Chadong (Kra-Dai)
["chp"] = "θ", -- Chipewyan (Athabaskan)
["cjh"] = "χ", -- Upper Chehalis (Salishan)
["clm"] = "χ", -- Klallam (Salishan)
["col"] = "χ", -- Colombia-Wenatchi (Salishan)
["coo"] = "χθ", -- Comox (Salishan)
["crx"] = "θ", -- Carrier (Athabaskan)
["ets"] = "θ", -- Yekhee (Edoid/Niger-Congo)
["ett"] = "χ", -- Etruscan (isolate; in romanizations)
["fla"] = "χ", -- Montana Salish (Salishan)
["grt"] = "་", -- Garo (South Asian Sino-Tibetan)
["gmw-gts"] = "χ", -- Gottscheerish (Bavarian variant spoken in Slovenia)
["hur"] = "χθ", -- Halkomelem (Salishan)
["itc-psa"] = "f", -- Pre-Samnite (Italic; normally written in Greek)
["izh"] = "ь", -- Ingrian (Finnic)
["kic"] = "θ", -- Kickapoo (Algonquian)
["kk"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["ky"] = "ь", -- Kyrgyz (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["lil"] = "χ", -- Lillooet (Salishan)
["lsi"] = "ꓹ", -- Lashi (Lolo-Burmese/Sino-Tibetan; represents a glottal stop)
["mhz"] = "β", -- Mor (Austronesian)
["mqn"] = "β", -- Moronene (Austronesian)
["neg"]= "ӡā", -- Negidal (Tungusic; normally in Cyrillic)
["oka"] = "χ", -- Okanagan (Salishan)
["ole"] = "θ", -- Olekha (Sino-Tibetan)
["oui"] = "γβ", -- Old Uyghur (Turkic; FIXME: others? E.g. Greek delta (δ)?)
["pox"] = "χ", -- Polabian (West Slavic)
["rif"] = "ε", -- Tarifit (Berber)
["rom"] = "Θθ", -- Romani (Indic: International Standard; two different thetas???)
["rpn"] = "β", -- Repanbitip (Austronesian)
["sah"] = "ь", -- Yakut (Turkic; 1929 - 1939 Latin spelling)
["sit-jap"] = "χ", -- Japhug (Sino-Tibetan)
["sjw"] = "θ", -- Shawnee (Algonquian)
["squ"] = "χ", -- Squamish (Salishan)
["str"] = "χθ", -- Saanich (Salishan)
["teh"] = "χ", -- Tehuelche (Chonan; spoken in Argentina)
["tep"] = "η", -- Tepecano (Uto-Aztecan)
["thp"] = "χ", -- Thompson (Salishan)
["tk"] = "ь", -- Turkmen (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["tt"] = "ь", -- Kazakh (Turkic; Yañalif Latin spelling, c. 1928 - 1938)
["twa"] = "χ", -- Twana (Salishan)
["wbl"] = "ы", -- Wakhi (Iranian)
["xbc"] = "ϸ", -- Bactrian (Iranian; represents š; normally written in Greek)
["yha"] = "θ", -- Baha (Kra-Dai)
["za"] = "зч", -- Zhuang (Tai/Kra-Dai); 1957-1982 alphabet used two Cyrillic letters (as well as some others like
-- ƃ, ƅ, ƨ, ɯ and ɵ that look like Cyrillic or Greek but are actually Latin)
["zlw-slv"] = "χђћ", -- Slovincian (West Slavic; FIXME: χ is Greek, the other two are Cyrillic, but I'm not sure
-- the correct characters are being chosen in the entry names)
["zng"] = "θ", -- Mang (Mon-Khmer)
["ztp"] = "θ", -- Loxicha Zapotec (Zapotecan)
}
-- Determine how many real scripts are found in the pagename, where we exclude symbols and such. We exclude
-- scripts whose `character_category` is false as well as Zmth (mathematical notation symbols), which has a
-- category of "Mathematical notation symbols". When counting scripts, we need to elide language-specific
-- variants because e.g. Beng and as-Beng have slightly different characters but we don't want to consider them
-- two different scripts (e.g. [[এৰ]] has two characters which are detected respectively as Beng and as-Beng).
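-- Illustrative sketch of the loop below (hypothetical pagename, not from the data): for a page like
-- "abcдe", the first pass detects Latn and strips "abce", the second pass detects Cyrl and strips "д",
-- leaving num_seen_scripts == 2 and triggering the multiple-scripts category; a pure-Latin pagename is
-- emptied on the first pass and never reaches 2.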
local seen_scripts = {}
local num_seen_scripts = 0
local num_loops = 0
local canon_pagename = page.pagename
local ch_to_ignore = characters_to_ignore[full_langcode]
if ch_to_ignore then
canon_pagename = ugsub(canon_pagename, "[" .. ch_to_ignore .. "]", "")
end
while true do
if canon_pagename == "" or num_seen_scripts >= 2 or num_loops >= 10 then
break
end
-- Make sure we don't get into a loop checking the same script over and over again; happens with e.g. [[ᠪᡳ]]
num_loops = num_loops + 1
local pagename_script = find_best_script_without_lang(canon_pagename, "None only as last resort")
local script_chars = pagename_script.characters
if not script_chars then
-- we are stuck; this happens with None
break
end
local script_code = pagename_script:getCode()
local replaced
canon_pagename, replaced = ugsub(canon_pagename, "[" .. script_chars .. "]", "")
if (
replaced and
script_code ~= "Zmth" and
(script_data or get_script_data())[script_code] and
script_data[script_code].character_category ~= false
) then
script_code = script_code:gsub("^.-%-", "")
if not seen_scripts[script_code] then
seen_scripts[script_code] = true
num_seen_scripts = num_seen_scripts + 1
end
end
end
if num_seen_scripts > 1 then
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dieja dalam berbilang tulisan")
end
end
-- Categorise for unusual characters. Takes into account combining characters, so that we can categorise for
-- characters with diacritics that aren't encoded as atomic characters (e.g. U̠). These can be in two formats:
-- single combining characters (i.e. character + diacritic(s)) or double combining characters (i.e. character +
-- diacritic(s) + character). Each can have any number of diacritics.
local standard = data.lang:getStandardCharacters()
if standard and not non_categorizable(page.full_raw_pagename) then
local function char_category(char)
local specials = {
["#"] = "number sign",
["("] = "parentheses",
[")"] = "parentheses",
["<"] = "angle brackets",
[">"] = "angle brackets",
["["] = "square brackets",
["]"] = "square brackets",
["_"] = "underscore",
["{"] = "braces",
["|"] = "vertical line",
["}"] = "braces",
["ß"] = "ẞ",
["\205\133"] = "", -- this is UTF-8 for U+0345 ( ͅ)
["\239\191\189"] = "replacement character",
}
char = toNFD(char)
:gsub(".[\128-\191]*", function(m)
local new_m = specials[m]
new_m = new_m or m:uupper()
return new_m
end)
return toNFC(char)
end
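-- Hypothetical examples of char_category()'s behavior (assumed, for illustration only):
-- char_category("(") maps to "parentheses" via the specials table; char_category("ß") maps to "ẞ";
-- char_category("é") decomposes to "e" + U+0301, uppercases both pieces and recomposes to "É", so
-- diacritics not encoded as atomic characters are still handled.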
if full_langcode ~= "hi" and full_langcode ~= "lo" then
local standard_chars_scripts = {}
for _, head in ipairs(data.heads) do
standard_chars_scripts[head.sc:getCode()] = true
end
-- Iterate over the scripts, in case there is more than one (as they can have different sets of standard characters).
for code in pairs(standard_chars_scripts) do
local sc_standard = data.lang:getStandardCharacters(code)
if sc_standard then
if page.pagename_len > 1 then
local explode_standard = {}
local function explode(char)
explode_standard[char] = true
return ""
end
local sc_standard = ugsub(sc_standard, page.comb_chars.combined_double, explode)
sc_standard = ugsub(sc_standard, page.comb_chars.combined_single, explode)
:gsub(".[\128-\191]*", explode)
local num_cat_inserted
for char in pairs(page.explode_pagename) do
if not explode_standard[char] then
if char:find("[0-9]") then
if not num_cat_inserted then
insert(data.categories, "Perkataan dieja dengan nombor bahasa " .. full_langname)
num_cat_inserted = true
end
elseif ufind(char, page.emoji_pattern) then
insert(data.categories, "Perkataan dieja dengan emoji bahasa " .. full_langname)
else
local upper = char_category(char)
if not explode_standard[upper] then
char = upper
end
insert(data.categories, "Perkataan dieja dengan " .. char .. " bahasa " .. full_langname)
end
end
end
end
-- If a diacritic doesn't appear in any of the standard characters, also categorise for it generally.
sc_standard = toNFD(sc_standard)
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_single) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. " bahasa " .. full_langname)
end
end
for diacritic in ugmatch(page.decompose_pagename, page.comb_chars.diacritics_double) do
if not umatch(sc_standard, diacritic) then
insert(data.categories, "Perkataan dieja dengan ◌" .. diacritic .. "◌ bahasa " .. full_langname)
end
end
end
end
-- Hindi and Lao are handled the old way for now, as their standard chars still need to be converted to the new format (because there are a lot of them).
elseif ulen(page.pagename) ~= 1 then
for character in ugmatch(page.pagename, "([^" .. standard .. "])") do
local upper = char_category(character)
if not umatch(upper, "[" .. standard .. "]") then
character = upper
end
insert(data.categories, "Perkataan dieja dengan " .. character .. " bahasa " .. full_langname)
end
end
end
if data.heads[1].sc:isSystem("alphabet") then
local pagename, i = page.pagename:ulower(), 2
while umatch(pagename, "(%a)" .. ("%1"):rep(i)) do
i = i + 1
insert(data.categories, "Perkataan bahasa " .. full_langname .. " dengan " .. i .. " contoh huruf yang sama berturut-turut")
end
end
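-- Illustrative sketch (hypothetical pagename): for "aaargh", the lowercased page matches "(%a)%1%1"
-- (three identical consecutive letters), so i becomes 3 and the page is added to the
-- "... dengan 3 contoh huruf yang sama berturut-turut" category; the loop then tests for four in a row,
-- fails, and stops.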
-- Categorise for palindromes
if not data.nopalindromecat and namespace ~= "Rekonstruksi" and ulen(page.pagename) > 2
-- FIXME: Use of first script here seems hacky. What is the clean way of doing this in the presence of
-- multiple scripts?
and is_palindrome(page.pagename, data.lang, data.heads[1].sc) then
insert(data.categories, "Palindrom bahasa " .. full_langname)
end
if namespace == "" and not lang_reconstructed then
for _, head in ipairs(data.heads) do
if page.full_raw_pagename ~= get_link_page(remove_links(head.term), data.lang, head.sc) then
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch]]
-- [[Special:WhatLinksHere/Wiktionary:Tracking/headword/pagename spelling mismatch/LANGCODE]]
track("pagename spelling mismatch", data.lang)
break
end
end
end
-- Add red link category if called for and we're not a "large" page, where such checks are disabled.
if data.checkredlinks and not m_headword_data.large_pages[m_headword_data.pagename] then
local plposcat = type(data.checkredlinks) == "string" and data.checkredlinks or data.pos_category
check_red_link_inflections_top_level(data, plposcat)
end
-- Add to various maintenance categories.
export.maintenance_cats(page, data.lang, data.categories, data.whole_page_categories)
------------ 10. Format and return headwords, genders, inflections and categories. ------------
-- Format and return all the gathered information. This may add more categories (e.g. gender/number categories),
-- so make sure we do it before evaluating `data.categories`.
local text = '<span class="headword-line">' ..
format_headword(data) ..
format_headword_genders(data) ..
format_top_level_inflections(data) .. '</span>'
-- Language-specific categories.
local cats = format_categories(
data.categories, data.lang, data.sort_key, page.encoded_pagename,
data.force_cat_output or test_force_categories, data.heads[1].sc
)
-- Language-agnostic categories.
local whole_page_cats = format_categories(
data.whole_page_categories, nil, "-"
)
return text .. cats .. whole_page_cats
end
return export
iwm7l3x8d8tnsm0zzbesfxagd8a88jz
Modul:affix/templates
828
10378
281460
254174
2026-04-23T04:27:49Z
Hakimi97
2668
Updating to match the English Wiktionary counterpart (revision [[en:Special:Diff/89784227|89784227]])
281460
Scribunto
text/plain
local export = {}
local m_affix = require("Module:affix")
local m_utilities = require("Module:utilities")
local en_utilities_module = "Module:en-utilities"
local parameter_utilities_module = "Module:parameter utilities"
local pseudo_loan_module = "Module:affix/pseudo-loan"
local insert = table.insert
local boolean_param = {type = "boolean"}
local function is_property_key(k)
return require(parameter_utilities_module).item_key_is_property(k)
end
local recognized_affix_types = {
prefix = "awalan",
pre = "awalan",
suffix = "akhiran",
suf = "akhiran",
interfix = "jalinan",
inter = "jalinan",
infix = "sisipan",
["in"] = "sisipan",
circumfix = "apitan",
circum = "apitan",
["non-affix"] = "non-affix",
naf = "non-affix",
root = "non-affix",
}
local function pre_normalize_affix_type(data)
local modtext = data.modtext
modtext = modtext:match("^<(.*)>$")
if not modtext then
error(("Internal error: Passed-in modifier isn't surrounded by angle brackets: %s"):format(data.modtext))
end
if recognized_affix_types[modtext] then
modtext = "type:" .. modtext
end
return "<" .. modtext .. ">"
end
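-- Illustrative sketch (hypothetical input): a bare inline modifier "<suf>" is rewritten to
-- "<type:suf>" because "suf" is a key of recognized_affix_types, while a modifier such as
-- "<alt:-käs>" is returned unchanged.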
-- Parse raw arguments. A single parameter `data` is passed in, with the following fields:
-- * `raw_args`: The raw arguments to parse, normally taken from `frame:getParent().args`.
-- * `extra_params`: An optional function of one argument that is called on the `params` structure before parsing; its
-- purpose is to specify additional allowed parameters or possibly disable parameters.
-- * `has_source`: There is a source-language parameter following 1= (which becomes the "destination" language
-- parameter) and preceding the terms. This is currently used for {{pseudo-loan}}.
-- * `ilang`: If given, it is a language object that serves as the default for the language. If specified, there is no
-- language code specified in 1=; instead the term parameters start directly at 1= (or at 2= if `has_source` is
-- given).
-- * `require_index_for_pos`: There is no separate |pos= parameter distinct from |pos1=, |pos2=, etc. Instead,
-- specifying |pos= results in an error.
-- * `dont_require_index`: Allow |foo= to be specified as a synonym for |foo1= (except for |lit=, which remains
-- distinct).
-- * `allow_type`: Allow |type1=, |type2=, etc. or inline <type:...> for the affix type, and allow a separate |type=
-- parameter for the etymology type (FIXME: this may be confusing; consider changing the etymology type to |etype=).
-- * `allow_semicolon_separator`: Allow semicolon as a separator, displaying as " or ". This requires changes in the
-- display of the output, to not always put a + between the items.
--
-- Note that all language parameters are allowed to be etymology-only languages.
--
-- Return five values ARGS, ITEMS, LANG_OBJ, SCRIPT_OBJ, SOURCE_LANG_OBJ where ARGS is a table of the parsed arguments;
-- ITEMS is the list of parsed items; LANG_OBJ is the language object corresponding to the language code specified in 1=
-- (or taken from `ilang` if given); SCRIPT_OBJ is the script object corresponding to sc= (if given, otherwise nil); and
-- SOURCE_LANG_OBJ is the language object corresponding to the source-language code specified in 2= (or 1= if `ilang` is
-- given) if `has_source` is specified (otherwise nil).
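-- Illustrative call sketch (hypothetical arguments, mirroring a wikitext invocation like
-- {{affix|ms|ber-|lari}}):
--
-- local args, items, lang, sc, source = parse_args {
-- raw_args = {"ms", "ber-", "lari"},
-- }
--
-- Here term_index is 2 (no `ilang`, no `has_source`), so lang is the object for "ms",
-- items[1].term == "ber-", items[2].term == "lari", and source is nil.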
local function parse_args(data)
local raw_args = data.raw_args
local has_source = data.has_source
local ilang = data.ilang
if raw_args.lang then
error("The |lang= parameter is not used by this template. Place the language code in parameter 1 instead.")
end
local term_index = (ilang and 1 or 2) + (has_source and 1 or 0)
local params = {
[term_index] = {list = true, allow_holes = true},
["sort"] = {},
["nocap"] = boolean_param, -- always allow this even if not used, for use with {{surf}}, which adds it
}
if not ilang then
params[1] = {required = true, type = "language", default = "und"}
end
local source_index
if has_source then
source_index = term_index - 1
params[source_index] = {required = true, type = "language", default = "und"}
end
local m_param_utils = require(parameter_utilities_module)
local param_mod_source = {}
if not data.dont_require_index then
insert(param_mod_source,
-- We want to require an index for all params (or use separate_no_index, which also requires an index for the
-- param corresponding to the first item).
{default = true, require_index = true}
)
end
insert(param_mod_source, {group = {"link", "ref", "lang", "q", "l", "infl"}})
-- Override lit= to be separate from lit1=.
insert(param_mod_source, {param = "lit", separate_no_index = true})
if not data.dont_require_index and not data.require_index_for_pos then
-- Override pos= to be separate from pos1=.
insert(param_mod_source, {param = "pos", separate_no_index = true})
end
if data.allow_type then
insert(param_mod_source, {param = "type", separate_no_index = true})
end
local param_mods = m_param_utils.construct_param_mods(param_mod_source)
if data.extra_params then
data.extra_params(params)
end
local items, args = m_param_utils.parse_list_with_inline_modifiers_and_separate_params {
params = params,
param_mods = param_mods,
raw_args = raw_args,
termarg = term_index,
parse_lang_prefix = true,
track_module = "homophones",
-- the separator contains U+200E (an invisible left-to-right mark) after the plus sign, which is what [[Module:affix]] has always done
default_separator = data.allow_semicolon_separator and " +‎ " or nil,
special_separators = data.allow_semicolon_separator and {[";"] = " or "} or nil,
disallow_custom_separators = not data.allow_semicolon_separator,
-- For compatibility, we need to not skip completely unspecified items. It is common, for example, to do
-- {{suffix|lang||foo}} to generate "+ -foo".
dont_skip_items = true,
-- Allow e.g. <infix> to be specified in place of <type:infix>.
pre_normalize_modifiers = pre_normalize_affix_type,
-- Don't pass in `lang` or `sc`, as they will be used as defaults to initialize the items, which we don't want
-- (particularly for `lang`), as the code in [[Module:affix]] uses the presence of `lang` as an indicator that
-- a part-specific language was explicitly given.
}
local lang = ilang or args[1]
local source
if has_source then
source = args[source_index]
end
-- For compatibility with the prior code, we need to convert items without term or properties to nil.
for i = 1, #items do
local item = items[i]
local saw_item_property = item.term
if not saw_item_property then
for k, v in pairs(item) do
if is_property_key(k) then
saw_item_property = true
break
end
end
end
if not saw_item_property then
items[i] = nil
elseif item.type then
-- Validate and canonicalize affix types.
if not recognized_affix_types[item.type] then
local valid_types = {}
for k in pairs(recognized_affix_types) do
insert(valid_types, ("'%s'"):format(k))
end
table.sort(valid_types)
error(("Unrecognized affix type '%s' in item %s; valid values are %s"):format(
item.type, item.itemno, table.concat(valid_types, ", ")))
else
item.type = recognized_affix_types[item.type]
end
end
end
if args.type and args.type.default and not m_affix.etymology_types[args.type.default] then
error("Unrecognized etymology type: '" .. args.type.default .. "'")
end
return args, items, lang, args.sc.default, source
end
local function augment_affix_data(data, args, lang, sc)
data.lang = lang
data.sc = sc
data.pos = args.pos and args.pos.default
data.lit = args.lit and args.lit.default
data.sort_key = args.sort
data.type = args.type and args.type.default
data.nocap = args.nocap
data.notext = args.notext
data.nocat = args.nocat
data.force_cat = args.force_cat
data.l = args.l.default
data.ll = args.ll.default
data.q = args.q.default
data.qq = args.qq.default
data.infl = args.infl.default
return data
end
function export.affix(frame)
local function extra_params(params)
params.notext = boolean_param
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
allow_type = true,
allow_semicolon_separator = true,
}
-- There must be at least one part to display. If there are gaps, a term
-- request will be shown.
if not next(parts) and not args.type.default then
if mw.title.getCurrentTitle().nsText == "Templat" then
parts = { {term = "awalan-"}, {term = "kata dasar"}, {term = "-akhiran"} }
else
error("You must provide at least one part.")
end
end
return m_affix.show_affix(augment_affix_data({ parts = parts }, args, lang, sc))
end
function export.compound(frame)
local function extra_params(params)
params.notext = boolean_param
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
allow_type = true,
allow_semicolon_separator = true,
}
-- There must be at least one part to display. If there are gaps, a term
-- request will be shown.
if not next(parts) and not args.type.default then
if mw.title.getCurrentTitle().nsText == "Templat" then
parts = { {term = "pertama"}, {separator = " +‎ ", term = "kedua"} }
else
error("You must provide at least one part of a compound.")
end
end
return m_affix.show_compound(augment_affix_data({ parts = parts }, args, lang, sc))
end
-- FIXME: Temporary for check in compound_like() below for old-style {{contraction}} parameters. Remove eventually.
local function ine(arg)
if arg == "" then
return nil
else
return arg
end
end
function export.compound_like(frame)
local iparams = {
["lang"] = {type = "language"},
["template"] = {},
["text"] = {},
["oftext"] = {},
["cat"] = {},
["noaffixcat"] = boolean_param,
["dont_require_index"] = boolean_param,
}
local iargs = require("Module:parameters").process(frame.args, iparams)
local parent_args = frame:getParent().args
-- Error to catch most uses of old-style parameters for {{contraction}}. (FIXME: Remove eventually.)
local term_param = iargs.lang and 1 or 2
if ine(parent_args[term_param + 2]) and not ine(parent_args[term_param + 1]) and not ine(parent_args.tr2) and not ine(parent_args.ts2)
and not ine(parent_args.t2) and not ine(parent_args.gloss2) and not ine(parent_args.g2)
and not ine(parent_args.alt2) then
error(("You specified a term in %s= and not one in %s=. You probably meant to use t= to specify a gloss instead. "
.. "If you intended to specify two terms, put the second term in %s=."):format(term_param + 2, term_param + 1,
term_param + 1))
end
if not ine(parent_args[term_param + 1]) and not ine(parent_args.alt2) and not ine(parent_args.tr2) and not ine(parent_args.ts2)
and ine(parent_args.g2) then
error(("You specified a gender in g2= but no term in %s=. You were probably trying to specify two genders for "
.. "a single term. To do that, put both genders in g=, comma-separated."):format(term_param + 1))
end
local function extra_params(params)
params.notext = boolean_param
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = parent_args,
extra_params = extra_params,
ilang = iargs.lang,
dont_require_index = iargs.dont_require_index,
-- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall
-- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this
-- necessary?
require_index_for_pos = not iargs.dont_require_index,
allow_semicolon_separator = true,
}
local template = iargs.template
local nocat = args.nocat
local notext = args.notext
local text = not notext and iargs.text
local oftext = not notext and (iargs.oftext or text and "bagi")
local cat = not nocat and iargs.cat
local noaffixcat = nocat or iargs.noaffixcat
if not next(parts) then
if mw.title.getCurrentTitle().nsText == "Templat" then
parts = { {term = "pertama"}, {separator = " +‎ ", term = "kedua"} }
end
end
return m_affix.show_compound_like(augment_affix_data({ parts = parts, text = text, oftext = oftext, cat = cat, noaffixcat = noaffixcat },
args, lang, sc))
end
function export.surface_analysis(frame)
local function ine(arg)
-- Since we're operating before calling [[Module:parameters]], we need to imitate how that module processes
-- arguments, including trimming since numbered arguments don't have automatic whitespace trimming.
if not arg then
return arg
end
arg = mw.text.trim(arg)
if arg == "" then
arg = nil
end
return arg
end
local parent_args = frame:getParent().args
local etymtext
local arg1 = ine(parent_args[1])
if not arg1 then
-- Allow omitted first argument to just display "By surface analysis".
etymtext = ""
elseif arg1:find("^%+") then
-- If the first argument (normally a language code) is prefixed with a +, it's a template name.
local template_name = arg1:sub(2)
local new_args = {}
for i, v in pairs(parent_args) do
if type(i) == "number" then
if i > 1 then
new_args[i - 1] = v
end
else
new_args[i] = v
end
end
new_args.nocap = true
etymtext = ", " .. frame:expandTemplate { title = template_name, args = new_args }
end
if etymtext then
return (ine(parent_args.nocap) and "m" or "M") .. "elalui [[Lampiran:Glosari#analisis dasar|analisis dasar]]" ..
etymtext
end
local function extra_params(params)
params.notext = boolean_param
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = parent_args,
extra_params = extra_params,
allow_type = true,
allow_semicolon_separator = true,
}
-- There must be at least one part to display. If there are gaps, a term
-- request will be shown.
if not next(parts) then
if mw.title.getCurrentTitle().nsText == "Templat" then
parts = { {term = "pertama"}, {separator = " +‎ ", term = "kedua"} }
else
error("You must provide at least one part.")
end
end
return m_affix.show_surface_analysis(augment_affix_data({ parts = parts }, args, lang, sc))
end
local function check_max_items(items, max_allowed)
if #items > max_allowed then
local bad_item = items[max_allowed + 1]
if bad_item.term then
error(("At most %s terms can be specified but saw a term specified for term #%s")
:format(max_allowed, max_allowed + 1))
else
for k, v in pairs(bad_item) do
if is_property_key(k) then
error(("At most %s terms can be specified but saw a value for property '%s' of term #%s")
:format(max_allowed, k, max_allowed + 1))
end
end
end
error(("Internal error: Something wrong, %s items generated when there should be at most %s, but item #%s doesn't have a term or any properties")
:format(#items, max_allowed, max_allowed + 1))
end
end
function export.circumfix(frame)
local function extra_params(params)
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
}
check_max_items(parts, 3)
local prefix = parts[1]
local base = parts[2]
local suffix = parts[3]
-- Just to make sure someone didn't use the template in a silly way
if not (prefix and base and suffix) then
if mw.title.getCurrentTitle().nsText == "Templat" then
prefix = {term = "apitan", alt = "awalan"}
base = {term = "kata dasar"}
suffix = {term = "apitan", alt = "akhiran"}
else
error("You must specify a prefix part, a base term and a suffix part.")
end
end
return m_affix.show_circumfix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc))
end
function export.confix(frame)
local function extra_params(params)
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
}
check_max_items(parts, 3)
local prefix = parts[1]
local base = parts[3] and parts[2] or nil
local suffix = parts[3] or parts[2]
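-- Illustrative sketch (hypothetical calls): for {{confix|ms|ke|an}}, parts[3] is nil, so base stays nil
-- and parts[2] ("an") becomes the suffix; for {{confix|ms|ke|ada|an}}, parts[2] ("ada") is the base and
-- parts[3] ("an") is the suffix.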
-- Just to make sure someone didn't use the template in a silly way
if not (prefix and suffix) then
if mw.title.getCurrentTitle().nsText == "Templat" then
prefix = {term = "awalan"}
suffix = {term = "akhiran"}
else
error("You must specify a prefix part, an optional base term and a suffix part.")
end
end
return m_affix.show_confix(augment_affix_data({ prefix = prefix, base = base, suffix = suffix }, args, lang, sc))
end
function export.pseudo_loan(frame)
local function extra_params(params)
params.notext = boolean_param
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc, source = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
has_source = true,
-- FIXME, why are we doing this? Formerly we had 'params.pos = nil' whose intention was to disable the overall
-- pos= while preserving posN=, which is equivalent to the following using the new syntax. But why is this
-- necessary?
require_index_for_pos = true,
allow_semicolon_separator = true,
}
return require(pseudo_loan_module).show_pseudo_loan(
augment_affix_data({ source = source, parts = parts }, args, lang, sc))
end
function export.infix(frame)
local function extra_params(params)
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
}
check_max_items(parts, 3)
local base = parts[1]
local infix = parts[2]
-- Just to make sure someone didn't use the template in a silly way
if not (base and infix) then
if mw.title.getCurrentTitle().nsText == "Templat" then
base = {term = "kata dasar"}
infix = {term = "sisipan"}
else
error("You must provide a base term and an infix.")
end
end
return m_affix.show_infix(augment_affix_data({ base = base, infix = infix }, args, lang, sc))
end
function export.prefix(frame)
local function extra_params(params)
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
}
local prefixes = parts
local base = nil
local max_prefix = 0
for k, v in pairs(prefixes) do
max_prefix = math.max(k, max_prefix)
end
if max_prefix >= 2 then
base = prefixes[max_prefix]
prefixes[max_prefix] = nil
end
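-- Illustrative sketch (hypothetical call): for {{prefix|ms|ber|lari}}, max_prefix is 2, so "lari" is
-- pulled out as the base and "ber" remains the sole prefix; with only one term given, it is kept as a
-- prefix with no base.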
-- Just to make sure someone didn't use the template in a silly way
if not next(prefixes) then
if mw.title.getCurrentTitle().nsText == "Templat" then
base = {term = "kata dasar"}
prefixes = { {term = "awalan"} }
else
error("You must provide at least one prefix.")
end
end
return m_affix.show_prefix(augment_affix_data({ prefixes = prefixes, base = base }, args, lang, sc))
end
function export.suffix(frame)
local function extra_params(params)
params.nocat = boolean_param
params.force_cat = boolean_param
end
local args, parts, lang, sc = parse_args {
raw_args = frame:getParent().args,
extra_params = extra_params,
}
local base = parts[1]
local suffixes = {}
for k, v in pairs(parts) do
-- Skip parts[1], which is the base; otherwise it would leak into suffixes[0] and defeat the
-- emptiness check below.
if k > 1 then
suffixes[k - 1] = v
end
end
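-- Illustrative sketch (hypothetical call): for {{suffix|ms|main|an}}, parts[1] ("main") becomes the
-- base and parts[2] ("an") shifts down to suffixes[1]; any further terms become suffixes[2],
-- suffixes[3], and so on.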
-- Just to make sure someone didn't use the template in a silly way
if not next(suffixes) then
if mw.title.getCurrentTitle().nsText == "Templat" then
base = {term = "kata dasar"}
suffixes = { {term = "akhiran"} }
else
error("You must provide at least one suffix.")
end
end
return m_affix.show_suffix(augment_affix_data({ base = base, suffixes = suffixes }, args, lang, sc))
end
function export.derivsee(frame)
local iparams = {
["derivtype"] = {},
}
local iargs = require("Module:parameters").process(frame.args, iparams)
local params = {
["head"] = {},
["id"] = {},
["sc"] = {type = "script"},
["pos"] = {},
}
local derivtype = iargs.derivtype
params[1] = {required = "true", type = "language", default = "und"}
params[2] = {}
local args = require("Module:parameters").process(frame:getParent().args, params)
local lang = args[1]
local term = args[2] or args.head
local id = args.id
local sc = args.sc
-- Pluralize only a user-supplied (English) POS; the Malay default "Perkataan" must not be run
-- through the English pluralizer.
local pos = args.pos and require(en_utilities_module).pluralize(args.pos) or "Perkataan"
if not term then
local SUBPAGE = mw.loadData("Module:headword/data").pagename
if lang:hasType("reconstructed") or mw.title.getCurrentTitle().nsText == "Rekonstruksi" then
term = "*" .. SUBPAGE
elseif lang:hasType("appendix-constructed") then
term = SUBPAGE
else
term = SUBPAGE
end
end
local category
local langname = lang:getFullName()
-- `pos` always has a default above, so it can never be nil here; the old `pos == nil` branch
-- produced the same category as the plain compound branch anyway.
if derivtype == "compound" and pos == "verbs" then
category = "Kata majmuk terbentuk dengan " .. term .. " bahasa " .. langname
elseif derivtype == "compound" then
category = "Kata majmuk dengan " .. term .. " bahasa " .. langname
else
category = pos .. " dengan " .. derivtype .. " " .. term .. (id and " (" .. id .. ")" or "") .. " bahasa " .. langname
end
return require('Module:collapsible category tree').make{
lang = lang,
sc = sc,
category = category,
}
end
return export
gnop5l124mqbcxemso7h12z5ouo813q
Modul:affix
828
10384
281459
258373
2026-04-23T04:07:07Z
Hakimi97
2668
Updating to match the English Wiktionary counterpart (revision [[en:Special:Diff/89886462|89886462]])
281459
Scribunto
text/plain
local export = {}
local debug_force_cat = false -- if set to true, always display categories even on userspace pages
local m_links = require("Module:links")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local en_utilities_module = "Module:en-utilities"
local etymology_module = "Module:etymology"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local utilities_module = "Module:utilities"
-- Export this so the category code in [[Module:category tree/etymology]] can access it.
export.affix_lang_data_module_prefix = "Module:affix/lang-data/"
local rsub = m_str_utils.gsub
local usub = m_str_utils.sub
local ulen = m_str_utils.len
local rfind = m_str_utils.find
local rmatch = m_str_utils.match
local pluralize = require(en_utilities_module).pluralize
local u = m_str_utils.char
local ucfirst = m_str_utils.ucfirst
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
function export.affix_variants(canonical, variants)
local mappings = {}
for _, variant in ipairs(variants) do
mappings[variant] = canonical
end
return mappings
end
function export.id_mapping(default, ids)
local mapping = { default = default }
if ids then
for id, target in pairs(ids) do
mapping[id] = target
end
end
return mapping
end
function export.id_mapping_with_affix_variants(base, id_variants)
local mappings = {}
for id, variants in pairs(id_variants) do
for _, variant in ipairs(variants) do
mappings[variant] = export.id_mapping(base, {[id] = base})
end
end
return mappings
end
function export.merge_tables(...)
local result = {}
for i = 1, select('#', ...) do
local t = select(i, ...)
if t then
for k, v in pairs(t) do
result[k] = v
end
end
end
return result
end
-- Export this so the category code in [[Module:category tree/etymology]] can access it.
export.langs_with_lang_specific_data = {
["az"] = true,
["fi"] = true,
["fr"] = true,
["izh"] = true,
["la"] = true,
["sah"] = true,
["tr"] = true,
["trk-pro"] = true,
}
local default_pos = "Perkataan"
--[==[ intro:
===About different types of hyphens ("template", "display" and "lookup"):===
* The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an
affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally
this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left
languages), it is different.
* The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked,
to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty
string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically:
*# For East Asian languages, the display hyphen is always blank.
*# For Arabic-script languages, either tatweel (ـ) or ZWNJ (zero-width non-joiner) are allowed as template hyphens,
where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen
corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also
the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen).
* The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are
discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question.
Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic
have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are
three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel.
===About different types of affixes ("template", "display", "link", "lookup" and "category"):===
* A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an
attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix,
suffix, interfix or circumfix), but some of the older-style templates such as {{tl|suffix}}, {{tl|prefix}},
{{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. the second
or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen.
* A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ
from the template affix for various reasons:
*# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline
modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should
display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the
display affix is `-käs`.
*# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain
languages have differences between the "template hyphen" specified in the template (which always needs to be
specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of
affix it is) and the display hyphen (see above), with corresponding differences between template and display
affixes.
* A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually
the same as the display affix, but will differ in one of three circumstances:
*# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>`
inline modifiers or piped links, as described above under "display affix".
*# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example,
in Finnish, the adjective-forming suffix {{m|fi|-kas}} appears as {{m|fi|-käs}} after front vowels, but logically
both forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and
intensive prefixes spelled {{m|la|in-}} (etymologically two distinct prefixes) appear variously as {{m|la|il-}},
{{m|la|im-}} or {{m|la|ir-}} before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]]
to convert Finnish {{m|fi|-käs}} to {{m|fi|-kas}} for linking and categorization purposes. Note that the affixes in
the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the
template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but
only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the
"lookup affix"; see below.
* A "stripped link affix" is a link affix that has been passed through the language's `stripDiacritics()` function, which
may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in
Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and
also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but
Ottoman Turkish does not). Stripped link affixes are currently what are used in category names.
* A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above
under link affixes. There are actually two lookup stages:
*# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using
lookup hyphens). Note that this lookup does not occur if an explicit display form is given using
{{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded
link.
*# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display
form passed through the language's `stripDiacritics()` function, which strips out certain diacritics, but with the
lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages).
The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but
also allow for mappings that are not sensitive in this fashion (e.g. Russian {{m|ru|-ливый}} occurs both stressed and
unstressed, but is the same prefix either way).
* A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas|
Category:Finnish terms suffixed with ''-kas'']]. The category affix is currently always the same as the stripped link
affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the corresponding display
affix and regular link affix have a tatweel. As mentioned above, stripDiacritics() strips tatweel for Arabic, Persian
and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the
tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template
affix contains a ZWNJ, the display (and hence the link and category affixes) will have no hyphen attached in any case.
]==]
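--[==[
Illustrative sketch of the distinctions above (hypothetical Persian suffix): if a template call
attaches a tatweel to the affix, the template, display and lookup hyphens are all the tatweel, so the
displayed, linked and looked-up affixes coincide; if the call uses a ZWNJ instead, the display hyphen
is empty (the affix displays with no hyphen), but the lookup against [[Module:affix/lang-data/fa]]
still rewrites the affix with a tatweel, per lookup_hyphens below.
]==]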
-----------------------------------------------------------------------------------------
-- Template and display hyphens --
-----------------------------------------------------------------------------------------
--[=[
Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template
(in the wikicode). See above.
The key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab'
and 'ur-Arab' will match 'Arab'.
The value below is a string consisting of one or more hyphen characters. If there is more than one character, the
default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so
the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.).
Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we
don't need to do script detection. Specifically, we only need to do script detection for a given language if
(a) the language has multiple scripts; and
(b) at least one of those scripts is listed below or in display_hyphens.
]=]
local ZWNJ = u(0x200C) -- zero-width non-joiner
local template_hyphens = {
-- This covers all Arabic scripts. See above.
["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen
["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef"
["Mong"] = "᠊",
-- FIXME! What about the following right-to-left scripts?
-- Adlm (Adlam)
-- Armi (Imperial Aramaic)
-- Avst (Avestan)
-- Cprt (Cypriot)
-- Khar (Kharoshthi)
-- Mand (Mandaic/Mandaean)
-- Mani (Manichaean)
-- Mend (Mende/Mende Kikakui)
-- Narb (Old North Arabian)
-- Nbat (Nabataean/Nabatean)
-- Nkoo (N'Ko)
-- Orkh (Orkhon runes)
-- Phli (Inscriptional Pahlavi)
-- Phlp (Psalter Pahlavi)
-- Phlv (Book Pahlavi)
-- Phnx (Phoenician)
-- Prti (Inscriptional Parthian)
-- Rohg (Hanifi Rohingya)
-- Samr (Samaritan)
-- Sarb (Old South Arabian)
-- Sogd (Sogdian)
-- Sogo (Old Sogdian)
-- Syrc (Syriac)
-- Thaa (Thaana)
}
-- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys
-- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab'
-- will match 'Arab'. The value should be a single character.
local lookup_hyphens = {
["Hebr"] = "־",
-- This covers all Arabic scripts. See above.
["Arab"] = "ـ",
}
-- Default display-hyphen function.
local function default_display_hyphen(script, hyph)
if not hyph then
return template_hyphens[script] or "-"
end
return hyph
end
local function arab_get_display_hyphen(script, hyph)
if not hyph then
return "ـ" -- tatweel
elseif hyph == ZWNJ then
return ""
else
return hyph
end
end
local function no_display_hyphen(script, hyph)
return ""
end
-- Per-script function to return the correct display hyphen given the script and template hyphen. The function should
-- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in
-- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen
-- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'.
local display_hyphens = {
-- This covers all Arabic scripts. See above.
["Arab"] = arab_get_display_hyphen,
["Bopo"] = no_display_hyphen,
["Hani"] = no_display_hyphen,
["Hans"] = no_display_hyphen,
["Hant"] = no_display_hyphen,
-- The following is a mixture of several scripts. Hopefully the specs here are correct!
["Jpan"] = no_display_hyphen,
["Jurc"] = no_display_hyphen,
["Kitl"] = no_display_hyphen,
["Kits"] = no_display_hyphen,
["Laoo"] = no_display_hyphen,
["Nshu"] = no_display_hyphen,
["Shui"] = no_display_hyphen,
["Tang"] = no_display_hyphen,
["Thaa"] = no_display_hyphen,
["Thai"] = no_display_hyphen,
["Tibt"] = no_display_hyphen,
}
-----------------------------------------------------------------------------------------
-- Basic Utility functions --
-----------------------------------------------------------------------------------------
local function glossary_link(entry, text)
text = text or entry
return "[[Lampiran:Glosari#" .. entry .. "|" .. text .. "]]"
end
local function track(page)
if type(page) == "table" then
for i, pg in ipairs(page) do
page[i] = "affix/" .. pg
end
else
page = "affix/" .. page
end
require("Module:debug/track")(page)
end
local function ine(val)
return val ~= "" and val or nil
end
-----------------------------------------------------------------------------------------
-- Compound types --
-----------------------------------------------------------------------------------------
local function make_compound_type(typ, alttext)
return {
text = glossary_link(typ, alttext) .. " majmuk",
cat = typ .. " majmuk",
}
end
-- Make a compound type entry with a simple rather than glossary link.
-- These should be replaced with a glossary link when the entry in the glossary
-- is created.
local function make_non_glossary_compound_type(typ, alttext)
local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]"
return {
text = link .. " majmuk",
cat = typ .. " majmuk",
}
end
local function make_raw_compound_type(typ, alttext)
return {
text = glossary_link(typ, alttext),
cat = pluralize(typ),
}
end
local function make_borrowing_type(typ, alttext)
return {
text = glossary_link(typ, alttext),
borrowing_type = pluralize(typ),
}
end
export.etymology_types = {
["adapted borrowing"] = make_borrowing_type("adapted borrowing"),
["adap"] = "adapted borrowing",
["abor"] = "adapted borrowing",
["alliterative"] = make_non_glossary_compound_type("alliterative"),
["allit"] = "alliterative",
["antonymous"] = make_non_glossary_compound_type("antonymous"),
["ant"] = "antonymous",
["bahuvrihi"] = make_compound_type("bahuvrihi", "bahuvrīhi"),
["bahu"] = "bahuvrihi",
["bv"] = "bahuvrihi",
["coordinative"] = make_compound_type("coordinative"),
["coord"] = "coordinative",
["descriptive"] = make_compound_type("descriptive"),
["desc"] = "descriptive",
["determinative"] = make_compound_type("determinative"),
["det"] = "determinative",
["dvandva"] = make_compound_type("dvandva"),
["dva"] = "dvandva",
["dvigu"] = make_compound_type("dvigu"),
["dvi"] = "dvigu",
["endocentric"] = make_compound_type("endocentric"),
["endo"] = "endocentric",
["exocentric"] = make_compound_type("exocentric"),
["exo"] = "exocentric",
["izafet I"] = make_compound_type("izafet I"),
["iz1"] = "izafet I",
["izafet II"] = make_compound_type("izafet II"),
["iz2"] = "izafet II",
["izafet III"] = make_compound_type("izafet III"),
["iz3"] = "izafet III",
["karmadharaya"] = make_compound_type("karmadharaya", "karmadhāraya"),
["karma"] = "karmadharaya",
["kd"] = "karmadharaya",
["kenning"] = make_raw_compound_type("kenning"),
["ken"] = "kenning",
["rhyming"] = make_non_glossary_compound_type("rhyming"),
["rhy"] = "rhyming",
["synonymous"] = make_non_glossary_compound_type("synonymous"),
["syn"] = "synonymous",
["tatpurusa"] = make_compound_type("tatpurusa", "tatpuruṣa"),
["tat"] = "tatpurusa",
["tp"] = "tatpurusa",
}
local function process_etymology_type(typ, nocap, notext, has_parts)
local text_sections = {}
local categories = {}
local borrowing_type
if typ then
local typdata = export.etymology_types[typ]
if type(typdata) == "string" then
typdata = export.etymology_types[typdata]
end
if not typdata then
error("Internal error: Unrecognized type '" .. typ .. "'")
end
local text = typdata.text
if not nocap then
text = ucfirst(text)
end
local cat = typdata.cat
borrowing_type = typdata.borrowing_type
local oftext = typdata.oftext or " bagi"
if not notext then
table.insert(text_sections, text)
if has_parts then
table.insert(text_sections, oftext)
table.insert(text_sections, " ")
end
end
if cat then
table.insert(categories, cat)
end
end
return text_sections, categories, borrowing_type
end
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
-- Iterate an array up to the greatest integer index found.
local function ipairs_with_gaps(t)
local indices = m_table.numKeys(t)
local max_index = #indices > 0 and math.max(unpack(indices)) or 0
local i = 0
return function()
while i < max_index do
i = i + 1
return i, t[i]
end
end
end
export.ipairs_with_gaps = ipairs_with_gaps
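-- Illustrative usage sketch (hypothetical table): given t = {[1] = "a", [3] = "c"},
-- ipairs_with_gaps(t) yields (1, "a"), (2, nil), (3, "c"); plain ipairs() would stop after index 1,
-- which is why gappy part lists are iterated with this helper instead.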
--[==[
Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories,
which are formatted by prepending the language name as found in `lang`. The value of an entry in `categories` can be
either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>,
sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when
formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be
added even on userspace pages.
]==]
function export.join_formatted_parts(data)
local cattext
local lang = data.data.lang
local force_cat = data.data.force_cat or debug_force_cat
if data.data.nocat then
cattext = ""
else
for i, cat in ipairs(data.categories) do
if type(cat) == "table" then
data.categories[i] = require(utilities_module).format_categories(cat.cat .. " bahasa " .. lang:getFullName(),
lang, cat.sort_key, cat.sort_base, force_cat)
else
data.categories[i] = require(utilities_module).format_categories(cat .. " bahasa " .. lang:getFullName(), lang,
data.data.sort_key, nil, force_cat)
end
end
cattext = table.concat(data.categories)
end
local result = table.concat(data.parts_formatted, not data.separator_already_added and " +‎ " or nil) ..
(data.data.lit and ", secara harfiah " .. m_links.mark(data.data.lit, "gloss") or "")
local q = data.data.q
local qq = data.data.qq
local l = data.data.l
local ll = data.data.ll
local infl = data.data.infl
if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] or infl and infl[1] then
result = require(pron_qualifier_module).format_qualifiers {
lang = lang,
text = result,
q = q,
qq = qq,
l = l,
ll = ll,
infl = infl,
}
end
return result .. cattext
end
-- Malay category names are not pluralized, so code below this point uses an identity `pluralize`.
-- Note that this shadows the English pluralize required above; the helper functions defined earlier
-- still close over the English version.
local function pluralize(pos)
return pos
end
-- Remove links and call lang:stripDiacritics(term).
local function strip_diacritics_no_links(lang, term)
return lang:stripDiacritics(m_links.remove_links(term))
end
--[=[
Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall
language and script objects. This uses the overall language and script objects as defaults for the part and parses off
any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we
correctly do affix mapping even in the presence of fragments.
]=]
local function canonicalize_part(part, lang, sc)
if not part then
return
end
-- Save the original (user-specified, part-specific) value of `lang`. If such a value is specified, we don't insert
-- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than
-- full_link() in [[Module:links]].
part.part_lang = part.lang
part.lang = part.lang or lang
part.sc = part.sc or sc
local term = part.term
if not term then
return
elseif not part.fragment then
part.term, part.fragment = m_links.get_fragment(term)
else
part.term = m_links.get_fragment(term)
end
end
--[==[
Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points.
This should be called after `canonicalize_part()` is called on the part. This is a thin wrapper around `full_link()` in
[[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which
case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of
the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used
to access information for constructing the categories added by `format_derived()`.
]==]
function export.link_term(part, data, include_separator)
local result
if part.part_lang then
result = require(etymology_module).format_derived {
lang = data.lang,
terms = {part},
sources = {part.lang},
sort_key = data.sort_key,
nocat = data.nocat,
template_name = "affix",
qualifiers_labels_on_outside = true,
borrowing_type = data.borrowing_type,
force_cat = data.force_cat or debug_force_cat,
}
else
result = m_links.full_link(part, "term", nil, "show qualifiers")
end
if include_separator and part.separator then
return part.separator .. result
else
return result
end
end
local function canonicalize_script_code(scode)
-- Convert fa-Arab, ur-Arab etc. to Arab.
return (scode:gsub("^.*%-", ""))
end
-----------------------------------------------------------------------------------------
-- Affix-handling functions --
-----------------------------------------------------------------------------------------
-- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and
-- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting
-- default values as appropriate. Four values are returned:
-- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN
local function detect_script_and_hyphens(text, lang, sc)
local scode
-- 1. If the script is explicitly passed in, use it.
if sc then
scode = sc:getCode()
else
local possible_script_codes = lang:getScriptCodes()
-- YUCK! `possible_script_codes` comes from loadData() so #possible_scripts doesn't work (always returns 0).
local num_possible_script_codes = m_table.length(possible_script_codes)
if num_possible_script_codes == 0 then
-- This shouldn't happen; if the language has no script codes,
-- the list {"None"} should be returned.
error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.")
end
if num_possible_script_codes == 1 then
-- 2. If the language has only one possible script, use it.
scode = possible_script_codes[1]
else
-- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[]
-- or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn",
-- which may not be technically correct but produces the right results because Latn has all default
-- values for template_hyphens[] and display_hyphens[].
local may_have_nondefault_hyphen = false
for _, script_code in ipairs(possible_script_codes) do
script_code = canonicalize_script_code(script_code)
if template_hyphens[script_code] or display_hyphens[script_code] then
may_have_nondefault_hyphen = true
break
end
end
if not may_have_nondefault_hyphen then
scode = "Latn"
else
scode = lang:findBestScript(text):getCode()
end
end
end
scode = canonicalize_script_code(scode)
local template_hyphen = template_hyphens[scode] or "-"
local lookup_hyphen = lookup_hyphens[scode] or "-"
local display_hyphen = display_hyphens[scode] or default_display_hyphen
return scode, template_hyphen, display_hyphen, lookup_hyphen
end
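-- Illustrative sketch (assumed behavior): for a Persian (fa) affix, the code above canonicalizes
-- "fa-Arab" to "Arab" and returns that code, the tatweel/ZWNJ/"-" template-hyphen string,
-- arab_get_display_hyphen and the tatweel lookup hyphen; for a language whose only script is Latn,
-- it returns ("Latn", "-", default_display_hyphen, "-") without any script detection.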
--[=[
Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to
the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string,
specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen,
or nil if no relevant template hyphen is present). `thyph_re` is a Lua pattern (which must be enclosed in parens) that
matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only
the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix).
]=]
local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen)
local function get_hyphen(hyph)
if type(new_hyphen) == "string" then
return new_hyphen
end
return new_hyphen(scode, hyph)
end
if affix_type == "non-affix" then
return term
elseif affix_type == "apitan" then
local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re
.. "(.*)$")
if not before or ulen(term) <= 3 then
-- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to
-- a circumfix. Also, if the term is just hyphen + space + hyphen, return it.
return term
end
return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after
elseif affix_type == "sisipan" or affix_type == "jalinan" then
local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$")
if before_hyphen and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen)
elseif affix_type == "awalan" then
local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$")
if middle and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return (middle or term) .. get_hyphen(after_hyphen)
elseif affix_type == "akhiran" then
local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$")
if before_hyphen and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return get_hyphen(before_hyphen) .. (middle or term)
else
error(("Internal error: Unrecognized affix type '%s'"):format(affix_type))
end
end
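--[=[
Illustrative examples (a sketch, not exercised by the module itself): with the Latin-script pattern
thyph_re = "([-])" and a fixed new_hyphen = "-",
	reconstruct_term_per_hyphens("ber", "awalan", "Latn", "([-])", "-")  --> "ber-"
	reconstruct_term_per_hyphens("-an", "akhiran", "Latn", "([-])", "-") --> "-an"
	reconstruct_term_per_hyphens("el", "sisipan", "Latn", "([-])", "-")  --> "-el-"
A relevant hyphen that is already present is passed through get_hyphen() and replaced rather than
doubled.
]=]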
--[=[
Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are
language-specific according to `lang`, and may be ID-specific according to `affix_id`. The affixes as they appear in the
lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a
regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but
the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script
entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The
remaining parameters to this function are used to convert from template affixes to lookup affixes; see the
reconstruct_term_per_hyphens() function above.
If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix
directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments
at the top of the file (specifically, the comments describing lookup affixes).
The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID
(where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or
tables with keys `affix` and `id` (again, use `false` to indicate no ID). This allows an affix mapping to map from one
ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the
ID 'not').
]=]
local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id)
local function do_lookup(affix)
-- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before
-- or no hyphens.
local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph)
local function do_lookup_for_langcode(langcode)
if export.langs_with_lang_specific_data[langcode] then
local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode)
if langdata.affix_mappings then
local mapping = langdata.affix_mappings[lookup_affix]
if mapping then
if type(mapping) == "table" then
mapping = mapping[affix_id] or mapping.default or mapping[affix_id or false]
if mapping then
return mapping
end
else
return mapping
end
end
end
end
end
-- If `lang` is an etymology-only language, look for a mapping both for it and its full parent.
local langcode = lang:getCode()
local mapping = do_lookup_for_langcode(langcode)
if mapping then
return mapping
end
local full_langcode = lang:getFullCode()
if full_langcode ~= langcode then
mapping = do_lookup_for_langcode(full_langcode)
if mapping then
return mapping
end
end
return nil
end
if affix:find("%[%[") then
return nil
end
return do_lookup(affix) or do_lookup(lang:stripDiacritics(affix)) or nil
end
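--[=[
Illustrative data shape (hypothetical entries, mirroring the description above): a lang-data module
such as [[Module:affix/lang-data/fi]] might contain
	affix_mappings = {
		["-käs"] = "-kas",                                    -- map regardless of affix ID
		["an-"] = { [false] = { affix = "a-", id = "not" } }, -- map only when no ID is given
	}
]=]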
--[==[
For a given template term in a given language (see the definition of "template affix" near the top of the file),
possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"awalan"},
{"sisipan"}, {"jalinan"}, {"akhiran"}, {"apitan"} or {"non-affix"}) along with the corresponding link and display affixes
(see definitions near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified).
The term passed in should already have any fragment (after the # sign) parsed off of it. Four values are returned:
`affix_type`, `link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected; in
this case, the template term need not have any attached hyphens, and the appropriate hyphens will be added in the
appropriate places. If `do_affix_mapping` is specified, look up the affix in the lang-specific affix mappings, as
described in the comment at the top of the file; otherwise, the link and display terms will always be the same. (They
will be the same in any case if the template term has a bracketed link in it or is not an affix.) If
`return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens in the
appropriate places; otherwise, it is the same as the display term. (This functionality is used in
[[Module:category tree/affixes and compounds]] to convert link affixes into lookup affixes so that they can be looked up
in the affix mapping tables.)
]==]
local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
if not term then
return "non-affix", nil, nil, nil
end
if term == "^" then
-- Indicates a null term to emulate the behavior of {{suffix|foo||bar}}.
term = ""
return "non-affix", term, term, term
end
if term:find("^%^") then
-- HACK! ^ at the beginning of a term in the Koreanic languages (Korean, Middle Korean, Jeju) has a special
-- meaning, triggering capitalization of the transliteration. Don't interpret it as "force non-affix" for
-- those languages.
local langcode = lang:getCode()
if langcode ~= "ko" and langcode ~= "okm" and langcode ~= "jje" then
-- Formerly we allowed ^ to force non-affix type; this is now handled using an inline modifier
-- <naf>, <root>, etc. Throw an error for the moment when the old way is encountered.
error("Use of ^ to force non-affix status is no longer supported; use an inline modifier <naf> or <root> " ..
"after the component")
end
end
-- Remove an asterisk if the morpheme is reconstructed and add it back at the end.
local reconstructed = ""
if term:find("^%*") then
reconstructed = "*"
term = term:gsub("^%*", "")
end
local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc)
thyph = "([" .. thyph .. "])"
if not affix_type then
if rfind(term, thyph .. " " .. thyph) then
affix_type = "apitan"
else
local has_beginning_hyphen = rfind(term, "^" .. thyph)
local has_ending_hyphen = rfind(term, thyph .. "$")
if has_beginning_hyphen and has_ending_hyphen then
affix_type = "jalinan"
elseif has_ending_hyphen then
affix_type = "awalan"
elseif has_beginning_hyphen then
affix_type = "akhiran"
else
affix_type = "non-affix"
end
end
end
local link_term, display_term, lookup_term
if affix_type == "non-affix" then
link_term = term
display_term = term
lookup_term = term
else
display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph)
if do_affix_mapping then
link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id)
-- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping
-- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in
-- either case, but in the latter case we can reuse the display term, which has already been converted.
if link_term then
link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph)
else
link_term = display_term
end
else
link_term = display_term
end
if return_lookup_affix then
lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph)
else
lookup_term = display_term
end
end
link_term = reconstructed .. link_term
display_term = reconstructed .. display_term
lookup_term = reconstructed .. lookup_term
return affix_type, link_term, display_term, lookup_term
end
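--[=[
Illustrative autodetection results (a sketch, assuming a Latin-script language whose only template
hyphen is "-"):
	parse_term_for_affixes("ber-", lang)    --> "awalan", ...
	parse_term_for_affixes("-an", lang)     --> "akhiran", ...
	parse_term_for_affixes("-el-", lang)    --> "jalinan", ...
	parse_term_for_affixes("ke- -an", lang) --> "apitan", ...
	parse_term_for_affixes("rumah", lang)   --> "non-affix", ...
Note that "sisipan" is never autodetected (a term hyphenated on both sides reads as "jalinan"); it
must be passed in explicitly via `affix_type`, as {{tl|infix}} does through make_affix().
]=]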
--[==[
Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens
in that place. For example, if `affix_type` == {"awalan"}, we'll add a hyphen onto the end if it's not already there (or
is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin
wrapper around `parse_term_for_affixes`; see the comments above that function for more information. Note that this
function is exposed externally because it is called by [[Module:category tree/affixes and compounds]]; see the comment
in `parse_term_for_affixes` for more information.
]==]
function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
if not (affix_type == "awalan" or affix_type == "akhiran" or affix_type == "apitan" or affix_type == "sisipan" or
affix_type == "jalinan" or affix_type == "non-affix") then
error("Internal error: Invalid affix type " .. (affix_type or "(nil)"))
end
local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type,
do_affix_mapping, return_lookup_affix, affix_id)
return link_term, display_term, lookup_term
end
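-- Illustrative usage (a sketch): export.make_affix("kas", lang, nil, "akhiran") returns "-kas",
-- "-kas", "-kas" (link, display and lookup terms, each with the missing hyphen added), while with
-- `do_affix_mapping` and the hypothetical Finnish mapping sketched above, "käs" would link as
-- "-kas" but display as "-käs".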
-----------------------------------------------------------------------------------------
-- Main entry points --
-----------------------------------------------------------------------------------------
--[==[
Core categorization logic for affixes. This is shared between show_affix(), show_compound_like() and
get_affix_categories_only(). Returns the categories array and other metadata needed for formatting.
]==]
local function generate_affix_categories(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
local text_sections, categories, borrowing_type =
process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0)
data.borrowing_type = borrowing_type
-- Process each part
local whole_words = 0
local is_affix_or_compound = false
-- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when
-- processing the first part for categorization, we may access the second part and need it already canonicalized.
for i, part in ipairs_with_gaps(data.parts) do
part = part or {}
data.parts[i] = part
canonicalize_part(part, data.lang, data.sc)
-- Determine affix type and get link and display terms (see text at top of file). Store them in the part
-- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they
-- can be used in the loop below when categorizing.
part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term,
part.lang, part.sc, part.type, not part.alt, nil, part.id)
-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline
-- modifiers. The intention in either case is not to link the term.
part.term = ine(part.affix_link_term)
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil
end
if not data.noaffixcat then
-- Now do categorization.
for i, part in ipairs_with_gaps(data.parts) do
local affix_type = part.affix_type
if affix_type ~= "non-affix" then
is_affix_or_compound = true
-- Make a sort key. For the first part, use the second part as the sort key; the intention is that if the
-- term has a prefix, sorting by the prefix won't be very useful so we sort by what follows, which is
-- presumably the root.
local part_sort_base = nil
local part_sort = part.sort or data.sort_key
if i == 1 and data.parts[2] and data.parts[2].term then
local part2 = data.parts[2]
-- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error
-- by using the alt value if available.
part_sort_base = ine(part2.affix_link_term) or ine(part2.alt)
if part_sort_base then
part_sort_base = strip_diacritics_no_links(part2.lang, part_sort_base)
end
end
if part.pos and rfind(part.pos, "patronym") then
table.insert(categories, {cat = "patronim", sort_key = part_sort, sort_base = part_sort_base})
end
if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then
table.insert(categories, {cat = data.pos .. " diminutif", sort_key = part_sort,
sort_base = part_sort_base})
end
-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
if ine(part.affix_link_term) and not part.part_lang then
table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " ..
strip_diacritics_no_links(part.lang, part.affix_link_term) ..
(part.id and " (" .. part.id .. ")" or ""),
sort_key = part_sort, sort_base = part_sort_base})
end
else
whole_words = whole_words + 1
if whole_words == 2 then
is_affix_or_compound = true
table.insert(categories, data.pos .. " majmuk")
end
end
end
-- Make sure there was either an affix or a compound (two or more non-affix terms).
if not is_affix_or_compound and not data.allow_no_affixes_or_compounds then
error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.")
end
end
return text_sections, categories, borrowing_type
end
--[==[
Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to
be displayed, and contains the following:
* `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below).
* `.sc`: Overall script object (usually omitted). Different from term-specific script objects.
* `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would
be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language
different from the overall `.lang` value (in such a case, the language name is shown along with the term and
an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively
modified.
* `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech.
* `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese.
* `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the
compound type is displayed before the parts (normally capitalized, unless `.nocap` is given).
* `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or
`.surface_analysis` is given).
* `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given).
* `.nocat`: Disable all categorization.
* `.noaffixcat`: Disable affix (and compound) categorization. Relevant for e.g. blends, which may otherwise
be incorrectly categorized as compound terms.
* `.lit`: Overall literal definition. Different from term-specific literal definitions.
* `.force_cat`: Always display categories, even on userspace pages.
* `.surface_analysis`: Implement {{tl|surface analysis}}; adds `Dengan surface analysis, ` before the parts.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_affix(data)
local text_sections, categories, borrowing_type = generate_affix_categories(data)
-- Process each part for display
local parts_formatted = {}
for i, part in ipairs_with_gaps(data.parts) do
-- Make a link for the part
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if data.surface_analysis then
local text = "dengan " .. glossary_link("surface analysis") .. ", "
if not data.nocap then
text = ucfirst(text)
end
table.insert(text_sections, 1, text)
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Get only the categories that would be generated by show_affix(), without any text output or formatting.
This is used by Module:etymon to get affix categorization.
Returns an array of category objects, where
each entry is either a string (simple category name) or a table with keys `cat`, `sort_key`,
and `sort_base` for more complex categorization.
`data` should have the same structure as passed to show_affix():
* `.lang` (required): Overall language object
* `.parts` (required): Array of affix part objects with `.term`, `.lang`, `.id`, etc.
* `.pos`: Part of speech (defaults to "terms")
* `.sort_key`: Overall sort key for categories
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.get_affix_categories_only(data)
local text_sections, categories, borrowing_type = generate_affix_categories(data)
return categories
end
function export.show_surface_analysis(data)
data.surface_analysis = true
data.allow_no_affixes_or_compounds = true
return export.show_affix(data)
end
--[==[
Implementation of {{tl|compound}}.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
local text_sections, categories, borrowing_type =
process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0)
data.borrowing_type = borrowing_type
local parts_formatted = {}
local pos_for_category = (data.pos == "Perkataan") and "Kata" or data.pos
table.insert(categories, pos_for_category .. " majmuk")
-- Make links out of all the parts
local whole_words = 0
for i, part in ipairs(data.parts) do
canonicalize_part(part, data.lang, data.sc)
-- Determine affix type and get link and display terms (see text at top of file).
local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc,
part.type, not part.alt, nil, part.id)
-- If the term is an interfix or the type was explicitly given, recognize it as such (which means e.g. that we
-- will display the term without hyphens for East Asian languages). Otherwise, ignore the fact that it looks
-- like an affix and display as specified in the template (but pay attention to the detected affix type for
-- certain tracking purposes).
if affix_type == "jalinan" or (part.type and part.type ~= "non-affix") then
-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with
-- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with'
-- category in this case, or if the term is in a different language.
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
if link_term and link_term ~= "" and not part.part_lang then
table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " ..
strip_diacritics_no_links(part.lang, link_term), sort_key = part.sort or data.sort_key})
end
part.term = link_term ~= "" and link_term or nil
part.alt = part.alt or (display_term ~= link_term and display_term) or nil
else
if affix_type ~= "non-affix" then
local langcode = data.lang:getCode()
-- If `data.lang` is an etymology-only language, track both using its code and its full parent's code.
track { affix_type, affix_type .. "/lang/" .. langcode }
local full_langcode = data.lang:getFullCode()
if langcode ~= full_langcode then
track(affix_type .. "/lang/" .. full_langcode)
end
else
whole_words = whole_words + 1
end
end
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if whole_words == 1 then
track("one whole word")
elseif whole_words == 0 then
track("looks like confix")
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound_like(data)
data.allow_no_affixes_or_compounds = true
local text_sections, categories, borrowing_type = generate_affix_categories(data)
if data.cat then
table.insert(categories, data.cat)
end
-- Process each part for display
local parts_formatted = {}
for i, part in ipairs_with_gaps(data.parts) do
-- Make a link for the part
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if #data.parts > 0 and data.oftext then
table.insert(text_sections, 1, " " .. data.oftext .. " ")
end
if data.text then
table.insert(text_sections, 1, data.text)
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any
relevant affix mappings. For example, if the desired affix type is "akhiran", this will (in general) add a hyphen onto
the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the
"display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an
empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the
part will be effectively removed.) `lang` and `sc` hold overall language and script objects.
Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user
specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will
contain [[-käs]].
This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) where the nature of
the affix is specified by the template itself rather than auto-determined from the affix, as is the case with
{{tl|affix}}.
'''WARNING''': This destructively modifies `part`.
]==]
local function make_part_into_affix(part, lang, sc, affix_type)
canonicalize_part(part, lang, sc)
local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id)
part.term = link_term
-- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of
-- make_affix()) are the same.
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil
local Latn = require(scripts_module).getByCode("Latn")
part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type)
part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type)
end
local function track_wrong_affix_type(template, part, expected_affix_type)
if part and not part.type then
local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc)
if affix_type ~= expected_affix_type then
local part_name = expected_affix_type or "base"
local langcode = part.lang:getCode()
local full_langcode = part.lang:getFullCode()
require("Module:debug/track") {
template,
template .. "/" .. part_name,
template .. "/" .. part_name .. "/" .. (affix_type or "none"),
template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode
}
-- If `part.lang` is an etymology-only language, track both using its code and its full parent's code.
if full_langcode ~= langcode then
require("Module:debug/track")(
template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. full_langcode
)
end
end
end
end
local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base)
-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
if part.term and not part.part_lang then
local cat = pos .. " dengan " .. affix_type .. " " .. make_entry_name_no_links(part.lang, part.term) ..
(part.id and " (" .. part.id .. ")" or "")
if sort_key or sort_base then
table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base})
else
table.insert(categories, cat)
end
end
end
--[==[
Implementation of {{tl|circumfix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_circumfix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.prefix, data.lang, data.sc, "awalan")
make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran")
track_wrong_affix_type("apitan", data.prefix, "awalan")
track_wrong_affix_type("apitan", data.base, nil)
track_wrong_affix_type("apitan", data.suffix, "akhiran")
-- Create circumfix term.
local circumfix = nil
if data.prefix.term and data.suffix.term then
circumfix = data.prefix.term .. " " .. data.suffix.term
data.prefix.alt = data.prefix.alt or data.prefix.term
data.suffix.alt = data.suffix.alt or data.suffix.term
data.prefix.term = circumfix
data.suffix.term = circumfix
end
-- Make links out of all the parts.
local parts_formatted = {}
local categories = {}
local sort_base
if data.base.term then
sort_base = strip_diacritics_no_links(data.base.lang, data.base.term)
end
table.insert(parts_formatted, export.link_term(data.prefix, data))
table.insert(parts_formatted, export.link_term(data.base, data))
table.insert(parts_formatted, export.link_term(data.suffix, data))
-- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language.
if not data.prefix.part_lang then
table.insert(categories, {cat=data.pos .. " dengan apitan " .. strip_diacritics_no_links(data.prefix.lang,
circumfix), sort_key=data.sort_key, sort_base=sort_base})
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|confix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_confix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.prefix, data.lang, data.sc, "awalan")
make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran")
track_wrong_affix_type("confix", data.prefix, "awalan")
track_wrong_affix_type("confix", data.base, nil)
track_wrong_affix_type("confix", data.suffix, "akhiran")
-- Make links out of all the parts.
local parts_formatted = {}
local prefix_sort_base
if data.base and data.base.term then
prefix_sort_base = strip_diacritics_no_links(data.base.lang, data.base.term)
elseif data.suffix.term then
prefix_sort_base = strip_diacritics_no_links(data.suffix.lang, data.suffix.term)
end
-- Insert the categories and parts.
local categories = {}
table.insert(parts_formatted, export.link_term(data.prefix, data))
insert_affix_category(categories, data.pos, "awalan", data.prefix, data.sort_key, prefix_sort_base)
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
end
table.insert(parts_formatted, export.link_term(data.suffix, data))
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "akhiran", data.suffix)
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|infix}}.
'''WARNING''': This destructively modifies both `data` and `.base` and `.infix`.
]==]
function export.show_infix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.infix, data.lang, data.sc, "sisipan")
track_wrong_affix_type("sisipan", data.base, nil)
track_wrong_affix_type("sisipan", data.infix, "sisipan")
-- Make links out of all the parts.
local parts_formatted = {}
local categories = {}
table.insert(parts_formatted, export.link_term(data.base, data))
table.insert(parts_formatted, export.link_term(data.infix, data))
-- Insert the categories.
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "sisipan", data.infix)
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|prefix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`.
]==]
function export.show_prefix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
for i, prefix in ipairs(data.prefixes) do
make_part_into_affix(prefix, data.lang, data.sc, "awalan")
end
for i, prefix in ipairs(data.prefixes) do
track_wrong_affix_type("awalan", prefix, "awalan")
end
track_wrong_affix_type("awalan", data.base, nil)
-- Make links out of all the parts.
local parts_formatted = {}
local first_sort_base = nil
local categories = {}
if data.prefixes[2] then
first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt)
if first_sort_base then
first_sort_base = strip_diacritics_no_links(data.prefixes[2].lang, first_sort_base)
end
elseif data.base then
first_sort_base = ine(data.base.term) or ine(data.base.alt)
if first_sort_base then
first_sort_base = strip_diacritics_no_links(data.base.lang, first_sort_base)
end
end
for i, prefix in ipairs(data.prefixes) do
table.insert(parts_formatted, export.link_term(prefix, data))
insert_affix_category(categories, data.pos, "awalan", prefix, data.sort_key, i == 1 and first_sort_base or nil)
end
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
else
table.insert(parts_formatted, "")
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|suffix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`.
]==]
function export.show_suffix(data)
local categories = {}
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
for i, suffix in ipairs(data.suffixes) do
make_part_into_affix(suffix, data.lang, data.sc, "akhiran")
end
track_wrong_affix_type("akhiran", data.base, nil)
for i, suffix in ipairs(data.suffixes) do
track_wrong_affix_type("akhiran", suffix, "akhiran")
end
-- Make links out of all the parts.
local parts_formatted = {}
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
else
table.insert(parts_formatted, "")
end
for i, suffix in ipairs(data.suffixes) do
table.insert(parts_formatted, export.link_term(suffix, data))
end
-- Insert the categories.
for i, suffix in ipairs(data.suffixes) do
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "akhiran", suffix)
if suffix.pos and rfind(suffix.pos, "patronym") then
table.insert(categories, "patronim")
end
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
return export
r86ilta92vkxbd2j53n2sb0jvjy4ggi
281461
281459
2026-04-23T04:34:41Z
Hakimi97
2668
281461
Scribunto
text/plain
local export = {}
local debug_force_cat = false -- if set to true, always display categories even on userspace pages
local m_links = require("Module:links")
local m_str_utils = require("Module:string utilities")
local m_table = require("Module:table")
local en_utilities_module = "Module:en-utilities"
local etymology_module = "Module:etymology"
local pron_qualifier_module = "Module:pron qualifier"
local scripts_module = "Module:scripts"
local utilities_module = "Module:utilities"
-- Export this so the category code in [[Module:category tree/etymology]] can access it.
export.affix_lang_data_module_prefix = "Module:affix/lang-data/"
local rsub = m_str_utils.gsub
local usub = m_str_utils.sub
local ulen = m_str_utils.len
local rfind = m_str_utils.find
local rmatch = m_str_utils.match
local pluralize = require(en_utilities_module).pluralize
local u = m_str_utils.char
local ucfirst = m_str_utils.ucfirst
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
function export.affix_variants(canonical, variants)
local mappings = {}
for _, variant in ipairs(variants) do
mappings[variant] = canonical
end
return mappings
end
function export.id_mapping(default, ids)
local mapping = { default = default }
if ids then
for id, target in pairs(ids) do
mapping[id] = target
end
end
return mapping
end
function export.id_mapping_with_affix_variants(base, id_variants)
local mappings = {}
for id, variants in pairs(id_variants) do
for _, variant in ipairs(variants) do
mappings[variant] = export.id_mapping(base, {[id] = base})
end
end
return mappings
end
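--[=[
Illustrative results for the three mapping-table builders above (a sketch):
	export.affix_variants("-kas", {"-käs"})
		--> { ["-käs"] = "-kas" }
	export.id_mapping("a-", { ["not"] = "a-" })
		--> { default = "a-", ["not"] = "a-" }
	export.id_mapping_with_affix_variants("a-", { ["not"] = {"an-"} })
		--> { ["an-"] = { default = "a-", ["not"] = "a-" } }
The resulting tables are the affix_mappings values consumed by the affix lookup code further down.
]=]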
function export.merge_tables(...)
local result = {}
for i = 1, select('#', ...) do
local t = select(i, ...)
if t then
for k, v in pairs(t) do
result[k] = v
end
end
end
return result
end
-- Export this so the category code in [[Module:category tree/etymology]] can access it.
export.langs_with_lang_specific_data = {
["az"] = true,
["fi"] = true,
["fr"] = true,
["izh"] = true,
["la"] = true,
["sah"] = true,
["tr"] = true,
["trk-pro"] = true,
}
local default_pos = "Perkataan"
--[==[ intro:
===About different types of hyphens ("template", "display" and "lookup"):===
* The "template hyphen" is the per-script hyphen character that is used in template calls to indicate that a term is an
affix. This is always a single Unicode char, but there may be multiple possible hyphens for a given script. Normally
this is just the regular hyphen character "-", but for some non-Latin-script languages (currently only right-to-left
languages), it is different.
* The "display hyphen" is the string (which might be an empty string) that is added onto a term as displayed and linked,
to indicate that a term is an affix. Currently this is always either the same as the template hyphen or an empty
string, but the code below is written generally enough to handle arbitrary display hyphens. Specifically:
*# For East Asian languages, the display hyphen is always blank.
*# For Arabic-script languages, both tatweel (ـ) and ZWNJ (zero-width non-joiner) are allowed as template hyphens,
where ZWNJ is supported primarily for Farsi, because some suffixes have non-joining behavior. The display hyphen
corresponding to tatweel is also tatweel, but the display hyphen corresponding to ZWNJ is blank (tatweel is also
the default display hyphen, for calls to {{tl|prefix}}/{{tl|suffix}}/etc. that don't include an explicit hyphen).
* The "lookup hyphen" is the hyphen that is used when looking up language-specific affix mappings. (These mappings are
discussed in more detail below when discussing link affixes.) It depends only on the script of the affix in question.
Most scripts (including East Asian scripts) use a regular hyphen "-" as the lookup hyphen, but Hebrew and Arabic
have their own lookup hyphens (respectively maqqef and tatweel). Note that for Arabic in particular, there are
three possible template hyphens that are recognized (tatweel, ZWNJ and regular hyphen), but mappings must use tatweel.
===About different types of affixes ("template", "display", "link", "lookup" and "category"):===
* A "template affix" is an affix in its source form as it appears in a template call. Generally, a template affix has an
attached template hyphen (see above) to indicate that it is an affix and indicate what type of affix it is (prefix,
suffix, interfix or circumfix), but some of the older-style templates such as {{tl|suffix}}, {{tl|prefix}},
{{tl|confix}}, etc. have "positional" affixes where the presence of the affix in a certain position (e.g. the second
or third parameter) indicates that it is a certain type of affix, whether or not it has an attached template hyphen.
* A "display affix" is the corresponding affix as it is actually displayed to the user. The display affix may differ
from the template affix for various reasons:
*# The display affix may be specified explicitly using the {{para|alt<var>N</var>}} parameter, the `<alt:...>` inline
modifier or a piped link of the form e.g. `<nowiki>[[-kas|-käs]]</nowiki>` (here indicating that the affix should
display as `-käs` but be linked as `-kas`). Here, the template affix is arguably the entire piped link, while the
display affix is `-käs`.
*# Even in the absence of {{para|alt<var>N</var>}} parameters, `<alt:...>` inline modifiers and piped links, certain
languages have differences between the "template hyphen" specified in the template (which always needs to be
specified somehow or other in templates like {{tl|affix}}, to indicate that the term is an affix and what type of
affix it is) and the display hyphen (see above), with corresponding differences between template and display
affixes.
* A (regular) "link affix" is the affix that is linked to when the affix is shown to the user. The link affix is usually
the same as the display affix, but will differ in one of three circumstances:
*# The display and link affixes are explicitly made different using {{para|alt<var>N</var>}} parameters, `<alt:...>`
inline modifiers or piped links, as described above under "display affix".
*# For certain languages, certain affixes are mapped to canonical form using language-specific mappings. For example,
in Finnish, the adjective-forming suffix {{m|fi|-kas}} appears as {{m|fi|-käs}} after front vowels, but logically
both forms are the same suffix and should be linked and categorized the same. Similarly, in Latin, the negative and
intensive prefixes spelled {{m|la|in-}} (etymologically two distinct prefixes) appear variously as {{m|la|il-}},
{{m|la|im-}} or {{m|la|ir-}} before certain consonants. Mappings are supplied in [[Module:affix/lang-data/LANGCODE]]
to convert Finnish {{m|fi|-käs}} to {{m|fi|-kas}} for linking and categorization purposes. Note that the affixes in
the mappings use "lookup hyphens" to indicate the different types of affixes, which is usually the same as the
template hyphen but differs for Arabic scripts, because there are multiple possible template hyphens recognized but
only one lookup hyphen (tatweel). The form of the affix as used to look up in the mapping tables is called the
"lookup affix"; see below.
* A "stripped link affix" is a link affix that has been passed through the language's `stripDiacritics()` function, which
may strip certain diacritics: e.g. macrons in Latin and Old English (indicating length); acute and grave accents in
Russian and various other Slavic languages (indicating stress); vowel diacritics in most Arabic-script languages; and
also tatweel in some Arabic-script languages (currently, for example, Persian, Arabic and Urdu strip tatweel, but
Ottoman Turkish does not). Stripped link affixes are currently what are used in category names.
* A "lookup affix" is the form of the affix as it is looked up in the language-specific lookup mappings described above
under link affixes. There are actually two lookup stages:
*# First, the affix is looked up in a modified display form (specifically, the same as the display affix but using
lookup hyphens). Note that this lookup does not occur if an explicit display form is given using
{{para|alt<var>N</var>}} or an `<alt:...>` inline modifier, or if the template affix contains a piped or embedded
link.
*# If no entry is found, the affix is then looked up in a modified link form (specifically, the modified display
form passed through the language's `stripDiacritics()` function, which strips out certain diacritics, but with the
lookup hyphen re-added if it was stripped out, as in the case of tatweel in many Arabic-script languages).
The reason for this double lookup procedure is to allow for mappings that are sensitive to the extra diacritics, but
also allow for mappings that are not sensitive in this fashion (e.g. Russian {{m|ru|-ливый}} occurs both stressed and
unstressed, but is the same prefix either way).
* A "category affix" is the affix as it appears in categories such as [[:Category:Finnish terms suffixed with -kas|
Category:Finnish terms suffixed with ''-kas'']]. The category affix is currently always the same as the stripped link
affix. This means that for Arabic-script languages, it may or may not have a tatweel, even if the corresponding display
affix and regular link affix have a tatweel. As mentioned above, stripDiacritics() strips tatweel for Arabic, Persian
and Urdu, but not for Ottoman Turkish. Hence affix categories for Arabic, Persian and Urdu will be missing the
tatweel, but affix categories for Ottoman Turkish will have it. An additional complication is that if the template
affix contains a ZWNJ, the display (and hence the link and category affixes) will have no hyphen attached in any case.
]==]
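--[==[
Worked example (illustrative, using the Finnish case described above): for the template affix
"-käs", the template, display and lookup affixes are all "-käs", since Latin script uses the
regular hyphen "-" for every hyphen type; the link affix after applying the lang-specific mapping
is "-kas"; and the stripped link affix used in category names is likewise "-kas".
]==]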
-----------------------------------------------------------------------------------------
-- Template and display hyphens --
-----------------------------------------------------------------------------------------
--[=[
Per-script template hyphens. The template hyphen is what appears in the {{affix}}/{{prefix}}/{{suffix}}/etc. template
(in the wikicode). See above.
The key below is a script code, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab'
and 'ur-Arab' will match 'Arab'.
The value below is a string consisting of one or more hyphen characters. If there is more than one character, the
default hyphen must come last and a non-default function must be specified for the script in display_hyphens[] so
the correct display hyphen will be specified when no template hyphen is given (in {{suffix}}/{{prefix}}/etc.).
Script detection is normally done when linking, but we need to do it earlier. However, under most circumstances we
don't need to do script detection. Specifically, we only need to do script detection for a given language if
(a) the language has multiple scripts; and
(b) at least one of those scripts is listed below or in display_hyphens.
]=]
local ZWNJ = u(0x200C) -- zero-width non-joiner
local template_hyphens = {
-- This covers all Arabic scripts. See above.
["Arab"] = "ـ" .. ZWNJ .. "-", -- tatweel + zero-width non-joiner + regular hyphen
["Hebr"] = "־", -- Hebrew-specific hyphen termed "maqqef"
["Mong"] = "᠊",
-- FIXME! What about the following right-to-left scripts?
-- Adlm (Adlam)
-- Armi (Imperial Aramaic)
-- Avst (Avestan)
-- Cprt (Cypriot)
-- Khar (Kharoshthi)
-- Mand (Mandaic/Mandaean)
-- Mani (Manichaean)
-- Mend (Mende/Mende Kikakui)
-- Narb (Old North Arabian)
-- Nbat (Nabataean/Nabatean)
-- Nkoo (N'Ko)
-- Orkh (Orkhon runes)
-- Phli (Inscriptional Pahlavi)
-- Phlp (Psalter Pahlavi)
-- Phlv (Book Pahlavi)
-- Phnx (Phoenician)
-- Prti (Inscriptional Parthian)
-- Rohg (Hanifi Rohingya)
-- Samr (Samaritan)
-- Sarb (Old South Arabian)
-- Sogd (Sogdian)
-- Sogo (Old Sogdian)
-- Syrc (Syriac)
-- Thaa (Thaana)
}
-- Hyphens used when looking up an affix in a lang-specific affix mapping. Defaults to regular hyphen (-). The keys
-- are script codes, after removing a hyphen and anything preceding. Hence, script codes like 'fa-Arab' and 'ur-Arab'
-- will match 'Arab'. The value should be a single character.
local lookup_hyphens = {
["Hebr"] = "־",
-- This covers all Arabic scripts. See above.
["Arab"] = "ـ",
}
-- Default display-hyphen function.
local function default_display_hyphen(script, hyph)
if not hyph then
return template_hyphens[script] or "-"
end
return hyph
end
local function arab_get_display_hyphen(script, hyph)
if not hyph then
return "ـ" -- tatweel
elseif hyph == ZWNJ then
return ""
else
return hyph
end
end
local function no_display_hyphen(script, hyph)
return ""
end
-- Per-script function to return the correct display hyphen given the script and template hyphen. The function should
-- also handle the case where the passed-in template hyphen is nil, corresponding to the situation in
-- {{prefix}}/{{suffix}}/etc. where no template hyphen is specified. The key is the script code after removing a hyphen
-- and anything preceding, so 'fa-Arab', 'ur-Arab' etc. will match 'Arab'.
local display_hyphens = {
-- This covers all Arabic scripts. See above.
["Arab"] = arab_get_display_hyphen,
["Bopo"] = no_display_hyphen,
["Hani"] = no_display_hyphen,
["Hans"] = no_display_hyphen,
["Hant"] = no_display_hyphen,
-- The following is a mixture of several scripts. Hopefully the specs here are correct!
["Jpan"] = no_display_hyphen,
["Jurc"] = no_display_hyphen,
["Kitl"] = no_display_hyphen,
["Kits"] = no_display_hyphen,
["Laoo"] = no_display_hyphen,
["Nshu"] = no_display_hyphen,
["Shui"] = no_display_hyphen,
["Tang"] = no_display_hyphen,
["Thaa"] = no_display_hyphen,
["Thai"] = no_display_hyphen,
["Tibt"] = no_display_hyphen,
}
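-- Illustrative behavior (a sketch): arab_get_display_hyphen("Arab", nil) returns tatweel (the
-- default when {{tl|prefix}}/{{tl|suffix}}/etc. supply no hyphen); arab_get_display_hyphen("Arab",
-- ZWNJ) returns "", so the suffix displays unjoined with no visible hyphen; and no_display_hyphen
-- always returns "", so e.g. a Japanese suffix entered as "-さん" displays as "さん".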
-----------------------------------------------------------------------------------------
-- Basic Utility functions --
-----------------------------------------------------------------------------------------
local function glossary_link(entry, text)
text = text or entry
return "[[Lampiran:Glosari#" .. entry .. "|" .. text .. "]]"
end
local function track(page)
if type(page) == "table" then
for i, pg in ipairs(page) do
page[i] = "affix/" .. pg
end
else
page = "affix/" .. page
end
require("Module:debug/track")(page)
end
local function ine(val)
return val ~= "" and val or nil
end
-----------------------------------------------------------------------------------------
-- Compound types --
-----------------------------------------------------------------------------------------
local function make_compound_type(typ, alttext)
return {
text = glossary_link(typ, alttext) .. " majmuk",
cat = typ .. " majmuk",
}
end
-- Make a compound type entry with a simple rather than glossary link.
-- These should be replaced with a glossary link when the entry in the glossary
-- is created.
local function make_non_glossary_compound_type(typ, alttext)
local link = alttext and "[[" .. typ .. "|" .. alttext .. "]]" or "[[" .. typ .. "]]"
return {
text = link .. " majmuk",
cat = typ .. " majmuk",
}
end
local function make_raw_compound_type(typ, alttext)
return {
text = glossary_link(typ, alttext),
cat = pluralize(typ),
}
end
local function make_borrowing_type(typ, alttext)
return {
text = glossary_link(typ, alttext),
borrowing_type = pluralize(typ),
}
end
export.etymology_types = {
["adapted borrowing"] = make_borrowing_type("adapted borrowing"),
["adap"] = "adapted borrowing",
["abor"] = "adapted borrowing",
["alliterative"] = make_non_glossary_compound_type("alliterative"),
["allit"] = "alliterative",
["antonymous"] = make_non_glossary_compound_type("antonymous"),
["ant"] = "antonymous",
["bahuvrihi"] = make_compound_type("bahuvrihi", "bahuvrīhi"),
["bahu"] = "bahuvrihi",
["bv"] = "bahuvrihi",
["coordinative"] = make_compound_type("coordinative"),
["coord"] = "coordinative",
["descriptive"] = make_compound_type("descriptive"),
["desc"] = "descriptive",
["determinative"] = make_compound_type("determinative"),
["det"] = "determinative",
["dvandva"] = make_compound_type("dvandva"),
["dva"] = "dvandva",
["dvigu"] = make_compound_type("dvigu"),
["dvi"] = "dvigu",
["endocentric"] = make_compound_type("endocentric"),
["endo"] = "endocentric",
["exocentric"] = make_compound_type("exocentric"),
["exo"] = "exocentric",
["izafet I"] = make_compound_type("izafet I"),
["iz1"] = "izafet I",
["izafet II"] = make_compound_type("izafet II"),
["iz2"] = "izafet II",
["izafet III"] = make_compound_type("izafet III"),
["iz3"] = "izafet III",
["karmadharaya"] = make_compound_type("karmadharaya", "karmadhāraya"),
["karma"] = "karmadharaya",
["kd"] = "karmadharaya",
["kenning"] = make_raw_compound_type("kenning"),
["ken"] = "kenning",
["rhyming"] = make_non_glossary_compound_type("rhyming"),
["rhy"] = "rhyming",
["synonymous"] = make_non_glossary_compound_type("synonymous"),
["syn"] = "synonymous",
["tatpurusa"] = make_compound_type("tatpurusa", "tatpuruṣa"),
["tat"] = "tatpurusa",
["tp"] = "tatpurusa",
}
local function process_etymology_type(typ, nocap, notext, has_parts)
local text_sections = {}
local categories = {}
local borrowing_type
if typ then
local typdata = export.etymology_types[typ]
if type(typdata) == "string" then
typdata = export.etymology_types[typdata]
end
if not typdata then
error("Internal error: Unrecognized type '" .. typ .. "'")
end
local text = typdata.text
if not nocap then
text = ucfirst(text)
end
local cat = typdata.cat
borrowing_type = typdata.borrowing_type
local oftext = typdata.oftext or " of"
if not notext then
table.insert(text_sections, text)
if has_parts then
table.insert(text_sections, oftext)
table.insert(text_sections, " ")
end
end
if cat then
table.insert(categories, cat)
end
end
return text_sections, categories, borrowing_type
end
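-- Illustrative call (a sketch): process_etymology_type("bahu", false, false, true) resolves the
-- alias "bahu" -> "bahuvrihi" and returns text sections containing the glossary-linked
-- "bahuvrīhi majmuk" text (capitalized, since nocap is false) followed by the " of" connective,
-- plus the single category "bahuvrihi majmuk" and a nil borrowing type.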
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
-- Iterate an array up to the greatest integer index found.
local function ipairs_with_gaps(t)
local indices = m_table.numKeys(t)
local max_index = #indices > 0 and math.max(unpack(indices)) or 0
local i = 0
return function()
while i < max_index do
i = i + 1
return i, t[i]
end
end
end
export.ipairs_with_gaps = ipairs_with_gaps
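-- Illustrative behavior (a sketch): for t = {[1] = "a", [3] = "c"}, ipairs_with_gaps(t) yields
-- (1, "a"), (2, nil), (3, "c"); unlike ipairs(), it does not stop at the first gap, which matters
-- for template calls with skipped positional parameters.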
--[==[
Join formatted parts (in `parts_formatted`) together with any overall {{para|lit}} spec (in `lit`) plus categories,
which are formatted by prepending the language name as found in `lang`. The value of an entry in `categories` can be
either a string (which is formatted using `sort_key`) or a table of the form `{ {cat=<var>category</var>,
sort_key=<var>sort_key</var>, sort_base=<var>sort_base</var>}`, specifying the sort key and sort base to use when
formatting the category. If `nocat` is given, no categories are added; otherwise, `force_cat` causes categories to be
added even on userspace pages.
]==]
function export.join_formatted_parts(data)
local cattext
local lang = data.data.lang
local force_cat = data.data.force_cat or debug_force_cat
if data.data.nocat then
cattext = ""
else
for i, cat in ipairs(data.categories) do
if type(cat) == "table" then
data.categories[i] = require(utilities_module).format_categories(cat.cat .. " bahasa " .. lang:getFullName(),
lang, cat.sort_key, cat.sort_base, force_cat)
else
data.categories[i] = require(utilities_module).format_categories(cat .. " bahasa " .. lang:getFullName(), lang,
data.data.sort_key, nil, force_cat)
end
end
cattext = table.concat(data.categories)
end
local result = table.concat(data.parts_formatted, not data.separator_already_added and " +‎ " or nil) ..
(data.data.lit and ", secara harfiah " .. m_links.mark(data.data.lit, "gloss") or "")
local q = data.data.q
local qq = data.data.qq
local l = data.data.l
local ll = data.data.ll
local infl = data.data.infl
if q and q[1] or qq and qq[1] or l and l[1] or ll and ll[1] or infl and infl[1] then
result = require(pron_qualifier_module).format_qualifiers {
lang = lang,
text = result,
q = q,
qq = qq,
l = l,
ll = ll,
infl = infl,
}
end
return result .. cattext
end
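-- Illustrative category entry shapes (a sketch): `data.categories` may mix plain strings such as
-- "bahuvrihi majmuk" with tables such as {cat = "Perkataan dengan akhiran -an", sort_key = nil,
-- sort_base = "rumah"}; either way " bahasa " plus the language name is appended, yielding a
-- category name like "Perkataan dengan akhiran -an bahasa Melayu".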
-- Part-of-speech labels in Malay category names are not pluralized, so this is a no-op that
-- shadows the English pluralize() imported above for all code below this point.
local function pluralize(pos)
return pos
end
-- Remove links and call lang:stripDiacritics(term).
local function strip_diacritics_no_links(lang, term)
return lang:stripDiacritics(m_links.remove_links(term))
end
--[=[
Convert a raw part as passed into an entry point into a part ready for linking. `lang` and `sc` are the overall
language and script objects. This uses the overall language and script objects as defaults for the part and parses off
any fragment from the term. We need to do the latter so that fragments don't end up in categories and so that we
correctly do affix mapping even in the presence of fragments.
]=]
local function canonicalize_part(part, lang, sc)
if not part then
return
end
-- Save the original (user-specified, part-specific) value of `lang`. If such a value is specified, we don't insert
-- a '*fixed with' category, and we format the part using format_derived() in [[Module:etymology]] rather than
-- full_link() in [[Module:links]].
part.part_lang = part.lang
part.lang = part.lang or lang
part.sc = part.sc or sc
local term = part.term
if not term then
return
elseif not part.fragment then
part.term, part.fragment = m_links.get_fragment(term)
else
part.term = m_links.get_fragment(term)
end
end
--[==[
Construct a single linked part based on the information in `part`, for use by `show_affix()` and other entry points.
This should be called after `canonicalize_part()` is called on the part. This is a thin wrapper around `full_link()` in
[[Module:links]] unless `part.part_lang` is specified (indicating that a part-specific language was given), in which
case `format_derived()` in [[Module:etymology]] is called to display a term in a language other than the language of
the overall term (specified in `data.lang`). `data` contains the entire object passed into the entry point and is used
to access information for constructing the categories added by `format_derived()`.
]==]
function export.link_term(part, data, include_separator)
local result
if part.part_lang then
result = require(etymology_module).format_derived {
lang = data.lang,
terms = {part},
sources = {part.lang},
sort_key = data.sort_key,
nocat = data.nocat,
template_name = "affix",
qualifiers_labels_on_outside = true,
borrowing_type = data.borrowing_type,
force_cat = data.force_cat or debug_force_cat,
}
else
result = m_links.full_link(part, "term", nil, "show qualifiers")
end
if include_separator and part.separator then
return part.separator .. result
else
return result
end
end
local function canonicalize_script_code(scode)
-- Convert fa-Arab, ur-Arab etc. to Arab.
return (scode:gsub("^.*%-", ""))
end
-----------------------------------------------------------------------------------------
-- Affix-handling functions --
-----------------------------------------------------------------------------------------
-- Figure out the appropriate script for the given affix and language (unless the script is explicitly passed in), and
-- return the values of template_hyphens[], display_hyphens[] and lookup_hyphens[] for that script, substituting
-- default values as appropriate. Four values are returned:
-- DETECTED_SCRIPT, TEMPLATE_HYPHEN, DISPLAY_HYPHEN, LOOKUP_HYPHEN
local function detect_script_and_hyphens(text, lang, sc)
local scode
-- 1. If the script is explicitly passed in, use it.
if sc then
scode = sc:getCode()
else
local possible_script_codes = lang:getScriptCodes()
-- YUCK! `possible_script_codes` comes from loadData() so #possible_script_codes doesn't work (always returns 0).
local num_possible_script_codes = m_table.length(possible_script_codes)
if num_possible_script_codes == 0 then
-- This shouldn't happen; if the language has no script codes,
-- the list {"None"} should be returned.
error("Something is majorly wrong! Language " .. lang:getCanonicalName() .. " has no script codes.")
end
if num_possible_script_codes == 1 then
-- 2. If the language has only one possible script, use it.
scode = possible_script_codes[1]
else
-- 3. Check if any of the possible scripts for the language have non-default values for template_hyphens[]
-- or display_hyphens[]. If so, we need to do script detection on the text. If not, just use "Latn",
-- which may not be technically correct but produces the right results because Latn has all default
-- values for template_hyphens[] and display_hyphens[].
local may_have_nondefault_hyphen = false
for _, script_code in ipairs(possible_script_codes) do
script_code = canonicalize_script_code(script_code)
if template_hyphens[script_code] or display_hyphens[script_code] then
may_have_nondefault_hyphen = true
break
end
end
if not may_have_nondefault_hyphen then
scode = "Latn"
else
scode = lang:findBestScript(text):getCode()
end
end
end
scode = canonicalize_script_code(scode)
local template_hyphen = template_hyphens[scode] or "-"
local lookup_hyphen = lookup_hyphens[scode] or "-"
local display_hyphen = display_hyphens[scode] or default_display_hyphen
return scode, template_hyphen, display_hyphen, lookup_hyphen
end
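-- A hypothetical call for a Latin-script language (assuming the default
-- hyphen tables defined near the top of the file):
--   local scode, thyph, dhyph, lhyph = detect_script_and_hyphens("ke-", lang, nil)
--   -- scode == "Latn"; thyph and lhyph are "-"; dhyph is the default display hyphen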
--[=[
Given a template affix `term` and an affix type `affix_type`, change the relevant template hyphen(s) in the affix to
the display or lookup hyphen specified in `new_hyphen`, or add them if they are missing. `new_hyphen` can be a string,
specifying a fixed hyphen, or a function of two arguments (the script code `scode` and the discovered template hyphen,
or nil if no relevant template hyphen is present). `thyph_re` is a Lua pattern (which must be enclosed in parens) that
matches the possible template hyphens. Note that not all template hyphens present in the affix are changed, but only
the "relevant" ones (e.g. for a prefix, a relevant template hyphen is one coming at the end of the affix).
]=]
local function reconstruct_term_per_hyphens(term, affix_type, scode, thyph_re, new_hyphen)
local function get_hyphen(hyph)
if type(new_hyphen) == "string" then
return new_hyphen
end
return new_hyphen(scode, hyph)
end
if affix_type == "non-affix" then
return term
elseif affix_type == "apitan" then
local before, before_hyphen, after_hyphen, after = rmatch(term, "^(.*)" .. thyph_re .. " " .. thyph_re
.. "(.*)$")
if not before or ulen(term) <= 3 then
-- Unlike with other types of affixes, don't try to add hyphens in the middle of the term to convert it to
-- a circumfix. Also, if the term is just hyphen + space + hyphen, return it.
return term
end
return before .. get_hyphen(before_hyphen) .. " " .. get_hyphen(after_hyphen) .. after
elseif affix_type == "sisipan" or affix_type == "jalinan" then
local before_hyphen, middle, after_hyphen = rmatch(term, "^" .. thyph_re .. "(.*)" .. thyph_re .. "$")
if before_hyphen and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return get_hyphen(before_hyphen) .. (middle or term) .. get_hyphen(after_hyphen)
elseif affix_type == "awalan" then
local middle, after_hyphen = rmatch(term, "^(.*)" .. thyph_re .. "$")
if middle and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return (middle or term) .. get_hyphen(after_hyphen)
elseif affix_type == "akhiran" then
local before_hyphen, middle = rmatch(term, "^" .. thyph_re .. "(.*)$")
if before_hyphen and ulen(term) <= 1 then
-- If the term is just a hyphen, return it.
return term
end
return get_hyphen(before_hyphen) .. (middle or term)
else
error(("Internal error: Unrecognized affix type '%s'"):format(affix_type))
end
end
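-- Sketch of the behavior, assuming template hyphen "-" and a fixed
-- replacement hyphen "-" for a Latin-script term:
--   reconstruct_term_per_hyphens("ke", "awalan", "Latn", "([-])", "-")  --> "ke-"
--   reconstruct_term_per_hyphens("ke-", "awalan", "Latn", "([-])", "-") --> "ke-"
--   reconstruct_term_per_hyphens("an", "akhiran", "Latn", "([-])", "-") --> "-an"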
--[=[
Look up a mapping from a given affix variant to the canonical form used in categories and links. The lookup tables are
language-specific according to `lang`, and may be ID-specific according to `affix_id`. The affixes as they appear in the
lookup tables (both the variant and the canonical form) are in "lookup affix" format (approximately speaking, they use a
regular hyphen for most scripts, but a tatweel for Arabic-script entries and a maqqef for Hebrew-script entries), but
the passed-in `affix` param is in "template affix" format (which differs from the lookup affix for Arabic-script
entries, because more types of hyphens are allowed in template affixes; see the comments at the top of the file). The
remaining parameters to this function are used to convert from template affixes to lookup affixes; see the
reconstruct_term_per_hyphens() function above.
If the affix contains brackets, no lookup is done. Otherwise, a two-stage process is used, first looking up the affix
directly and then stripping diacritics and looking it up again. The reason for this is documented above in the comments
at the top of the file (specifically, the comments describing lookup affixes).
The value of a mapping can either be a string (do the mapping regardless of affix ID) or a table indexed by affix ID
(where the special value `false` indicates no affix ID). The values of entries in this table can also be strings, or
tables with keys `affix` and `id` (again, use `false` to indicate no ID). This allows an affix mapping to map from one
ID to another (for example, this is used in English to map the [[an-]] prefix with no ID to the [[a-]] prefix with the
ID 'not').
]=]
local function lookup_affix_mapping(affix, affix_type, lang, scode, thyph_re, lookup_hyph, affix_id)
local function do_lookup(affix)
-- Ensure that the affix uses lookup hyphens regardless of whether it used a different type of hyphens before
-- or no hyphens.
local lookup_affix = reconstruct_term_per_hyphens(affix, affix_type, scode, thyph_re, lookup_hyph)
local function do_lookup_for_langcode(langcode)
if export.langs_with_lang_specific_data[langcode] then
local langdata = mw.loadData(export.affix_lang_data_module_prefix .. langcode)
if langdata.affix_mappings then
local mapping = langdata.affix_mappings[lookup_affix]
if mapping then
if type(mapping) == "table" then
mapping = mapping[affix_id] or mapping.default or mapping[affix_id or false]
if mapping then
return mapping
end
else
return mapping
end
end
end
end
end
-- If `lang` is an etymology-only language, look for a mapping both for it and its full parent.
local langcode = lang:getCode()
local mapping = do_lookup_for_langcode(langcode)
if mapping then
return mapping
end
local full_langcode = lang:getFullCode()
if full_langcode ~= langcode then
mapping = do_lookup_for_langcode(full_langcode)
if mapping then
return mapping
end
end
return nil
end
if affix:find("%[%[") then
return nil
end
return do_lookup(affix) or do_lookup(lang:stripDiacritics(affix)) or nil
end
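-- Sketch of the expected shape of a lang-specific `affix_mappings` table,
-- using the English an-/a- example from the comment above:
--   affix_mappings = {
--     ["an-"] = { [false] = { affix = "a-", id = "not" } },
--   }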
--[==[
For a given template term in a given language (see the definition of "template affix" near the top of the file),
possibly in an explicitly specified script `sc` (but usually nil), return the term's affix type ({"awalan"},
{"sisipan"}, {"jalinan"}, {"akhiran"}, {"apitan"} or {"non-affix"}) along with the corresponding link and display affixes
(see definitions near the top of the file); also the corresponding lookup affix (if `return_lookup_affix` is specified).
The term passed in should already have any fragment (after the # sign) parsed off of it. Four values are returned:
`affix_type`, `link_term`, `display_term` and `lookup_term`. The affix type can be passed in instead of autodetected; in
this case, the template term need not have any attached hyphens, and the appropriate hyphens will be added in the
appropriate places. If `do_affix_mapping` is specified, look up the affix in the lang-specific affix mappings, as
described in the comment at the top of the file; otherwise, the link and display terms will always be the same. (They
will be the same in any case if the template term has a bracketed link in it or is not an affix.) If
`return_lookup_affix` is given, the fourth return value contains the term with appropriate lookup hyphens in the
appropriate places; otherwise, it is the same as the display term. (This functionality is used in
[[Module:category tree/affixes and compounds]] to convert link affixes into lookup affixes so that they can be looked up
in the affix mapping tables.)
]==]
local function parse_term_for_affixes(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
if not term then
return "non-affix", nil, nil, nil
end
if term == "^" then
-- Indicates a null term to emulate the behavior of {{suffix|foo||bar}}.
term = ""
return "non-affix", term, term, term
end
if term:find("^%^") then
-- HACK! ^ at the beginning of a term in a Koreanic language has a special meaning, triggering capitalization of the
-- transliteration. Don't interpret it as "force non-affix" for those languages.
local langcode = lang:getCode()
if langcode ~= "ko" and langcode ~= "okm" and langcode ~= "jje" then
-- Formerly we allowed ^ to force non-affix type; this is now handled using an inline modifier
-- <naf>, <root>, etc. Throw an error for the moment when the old way is encountered.
error("Use of ^ to force non-affix status is no longer supported; use an inline modifier <naf> or <root> " ..
"after the component")
end
end
-- Remove an asterisk if the morpheme is reconstructed and add it back at the end.
local reconstructed = ""
if term:find("^%*") then
reconstructed = "*"
term = term:gsub("^%*", "")
end
local scode, thyph, dhyph, lhyph = detect_script_and_hyphens(term, lang, sc)
thyph = "([" .. thyph .. "])"
if not affix_type then
if rfind(term, thyph .. " " .. thyph) then
affix_type = "apitan"
else
local has_beginning_hyphen = rfind(term, "^" .. thyph)
local has_ending_hyphen = rfind(term, thyph .. "$")
if has_beginning_hyphen and has_ending_hyphen then
affix_type = "jalinan"
elseif has_ending_hyphen then
affix_type = "awalan"
elseif has_beginning_hyphen then
affix_type = "akhiran"
else
affix_type = "non-affix"
end
end
end
local link_term, display_term, lookup_term
if affix_type == "non-affix" then
link_term = term
display_term = term
lookup_term = term
else
display_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, dhyph)
if do_affix_mapping then
link_term = lookup_affix_mapping(term, affix_type, lang, scode, thyph, lhyph, affix_id)
-- The return value of lookup_affix_mapping() may be an affix mapping with lookup hyphens if a mapping
-- was found, otherwise nil if a mapping was not found. We need to convert to display hyphens in
-- either case, but in the latter case we can reuse the display term, which has already been converted.
if link_term then
link_term = reconstruct_term_per_hyphens(link_term, affix_type, scode, thyph, dhyph)
else
link_term = display_term
end
else
link_term = display_term
end
if return_lookup_affix then
lookup_term = reconstruct_term_per_hyphens(term, affix_type, scode, thyph, lhyph)
else
lookup_term = display_term
end
end
link_term = reconstructed .. link_term
display_term = reconstructed .. display_term
lookup_term = reconstructed .. lookup_term
return affix_type, link_term, display_term, lookup_term
end
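-- A hypothetical call with autodetection for a Latin-script language:
--   local affix_type, link_term, display_term, lookup_term =
--       parse_term_for_affixes("ke-", lang, nil)
--   -- affix_type == "awalan"; with default hyphens and no affix mapping,
--   -- the other three values are all "ke-".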
--[==[
Add a hyphen to a term in the appropriate place, based on the specified affix type, stripping off any existing hyphens
in that place. For example, if `affix_type` == {"awalan"}, we'll add a hyphen onto the end if it's not already there (or
is of the wrong type). Three values are returned: the link term, display term and lookup term. This function is a thin
wrapper around `parse_term_for_affixes`; see the comments above that function for more information. Note that this
function is exposed externally because it is called by [[Module:category tree/affixes and compounds]]; see the comment
in `parse_term_for_affixes` for more information.
]==]
function export.make_affix(term, lang, sc, affix_type, do_affix_mapping, return_lookup_affix, affix_id)
if not (affix_type == "awalan" or affix_type == "akhiran" or affix_type == "apitan" or affix_type == "sisipan" or
affix_type == "jalinan" or affix_type == "non-affix") then
error("Internal error: Invalid affix type " .. (affix_type or "(nil)"))
end
local _, link_term, display_term, lookup_term = parse_term_for_affixes(term, lang, sc, affix_type,
do_affix_mapping, return_lookup_affix, affix_id)
return link_term, display_term, lookup_term
end
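-- For example, a legacy suffix template might canonicalize a bare term:
--   local link_term, display_term = export.make_affix("an", lang, nil, "akhiran")
--   -- both values are "-an" for a Latin-script language with default hyphens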
-----------------------------------------------------------------------------------------
-- Main entry points --
-----------------------------------------------------------------------------------------
--[==[
Core categorization logic for affixes. This is shared between show_affix(), show_compound_like() and
get_affix_categories_only(). Returns the categories array and other metadata needed for formatting.
]==]
local function generate_affix_categories(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
local text_sections, categories, borrowing_type =
process_etymology_type(data.type, data.surface_analysis or data.nocap, data.notext, #data.parts > 0)
data.borrowing_type = borrowing_type
-- Process each part
local whole_words = 0
local is_affix_or_compound = false
-- Canonicalize and generate links for all the parts first; then do categorization in a separate step, because when
-- processing the first part for categorization, we may access the second part and need it already canonicalized.
for i, part in ipairs_with_gaps(data.parts) do
part = part or {}
data.parts[i] = part
canonicalize_part(part, data.lang, data.sc)
-- Determine affix type and get link and display terms (see text at top of file). Store them in the part
-- (in fields that won't clash with fields used by full_link() in [[Module:links]] or link_term()), so they
-- can be used in the loop below when categorizing.
part.affix_type, part.affix_link_term, part.affix_display_term = parse_term_for_affixes(part.term,
part.lang, part.sc, part.type, not part.alt, nil, part.id)
-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with inline
-- modifiers. The intention in either case is not to link the term.
part.term = ine(part.affix_link_term)
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
part.alt = part.alt or (part.affix_display_term ~= part.affix_link_term and part.affix_display_term) or nil
end
if not data.noaffixcat then
-- Now do categorization.
for i, part in ipairs_with_gaps(data.parts) do
local affix_type = part.affix_type
if affix_type ~= "non-affix" then
is_affix_or_compound = true
-- Make a sort key. For the first part, use the second part as the sort base; the intention is that if the
-- term has a prefix, sorting by the prefix won't be very useful, so we sort by what follows, which is
-- presumably the root.
local part_sort_base = nil
local part_sort = part.sort or data.sort_key
if i == 1 and data.parts[2] and data.parts[2].term then
local part2 = data.parts[2]
-- If the second-part link term is empty, the user requested an unlinked term; avoid a wikitext error
-- by using the alt value if available.
part_sort_base = ine(part2.affix_link_term) or ine(part2.alt)
if part_sort_base then
part_sort_base = strip_diacritics_no_links(part2.lang, part_sort_base)
end
end
if part.pos and rfind(part.pos, "patronym") then
table.insert(categories, {cat = "patronim", sort_key = part_sort, sort_base = part_sort_base})
end
if data.pos ~= "terms" and part.pos and rfind(part.pos, "diminutive") then
table.insert(categories, {cat = data.pos .. " diminutif", sort_key = part_sort,
sort_base = part_sort_base})
end
-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
if ine(part.affix_link_term) and not part.part_lang then
table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " ..
strip_diacritics_no_links(part.lang, part.affix_link_term) ..
(part.id and " (" .. part.id .. ")" or ""),
sort_key = part_sort, sort_base = part_sort_base})
end
else
whole_words = whole_words + 1
if whole_words == 2 then
is_affix_or_compound = true
table.insert(categories, data.pos .. " majmuk")
end
end
end
-- Make sure there was either an affix or a compound (two or more non-affix terms).
if not is_affix_or_compound and not data.allow_no_affixes_or_compounds then
error("The parameters did not include any affixes, and the term is not a compound. Please provide at least one affix.")
end
end
return text_sections, categories, borrowing_type
end
--[==[
Implementation of {{tl|affix}} and {{tl|surface analysis}}. `data` contains all the information describing the affixes to
be displayed, and contains the following:
* `.lang` ('''required'''): Overall language object. Different from term-specific language objects (see `.parts` below).
* `.sc`: Overall script object (usually omitted). Different from term-specific script objects.
* `.parts` ('''required'''): List of objects describing the affixes to show. The general format of each object is as would
be passed to `full_link()`, except that the `.lang` field should be missing unless the term is of a language
different from the overall `.lang` value (in such a case, the language name is shown along with the term and
an additional "derived from" category is added). '''WARNING''': The data in `.parts` will be destructively
modified.
* `.pos`: Overall part of speech (used in categories, defaults to {"terms"}). Different from term-specific part of speech.
* `.sort_key`: Overall sort key. Normally omitted except e.g. in Japanese.
* `.type`: Type of compound, if the parts in `.parts` describe a compound. Strictly optional, and if supplied, the
compound type is displayed before the parts (normally capitalized, unless `.nocap` is given).
* `.nocap`: Don't capitalize the first letter of text displayed before the parts (relevant only if `.type` or
`.surface_analysis` is given).
* `.notext`: Don't display any text before the parts (relevant only if `.type` or `.surface_analysis` is given).
* `.nocat`: Disable all categorization.
* `.noaffixcat`: Disable affix (and compound) categorization. Relevant for e.g. blends, which may otherwise
be incorrectly categorized as compound terms.
* `.lit`: Overall literal definition. Different from term-specific literal definitions.
* `.force_cat`: Always display categories, even on userspace pages.
* `.surface_analysis`: Implement {{tl|surface analysis}}; adds `Dengan surface analysis, ` (lowercase `dengan` if `.nocap` is given) before the parts.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_affix(data)
local text_sections, categories, borrowing_type = generate_affix_categories(data)
-- Process each part for display
local parts_formatted = {}
for i, part in ipairs_with_gaps(data.parts) do
-- Make a link for the part
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if data.surface_analysis then
local text = "dengan " .. glossary_link("surface analysis") .. ", "
if not data.nocap then
text = ucfirst(text)
end
table.insert(text_sections, 1, text)
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Get only the categories that would be generated by show_affix(), without any text output or formatting.
This is used by [[Module:etymon]] to get affix categorization.
Returns an array of category objects, where
each entry is either a string (simple category name) or a table with keys `cat`, `sort_key`,
and `sort_base` for more complex categorization.
`data` should have the same structure as passed to show_affix():
* `.lang` (required): Overall language object
* `.parts` (required): Array of affix part objects with `.term`, `.lang`, `.id`, etc.
* `.pos`: Part of speech (defaults to "terms")
* `.sort_key`: Overall sort key for categories
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.get_affix_categories_only(data)
local _, categories = generate_affix_categories(data)
return categories
end
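-- Hypothetical usage from another module (values invented for illustration):
--   local cats = export.get_affix_categories_only {
--     lang = lang,
--     parts = { {term = "ke-"}, {term = "jadi"}, {term = "-an"} },
--   }
--   -- `cats` then holds strings and/or {cat = ..., sort_key = ..., sort_base = ...} tables.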
function export.show_surface_analysis(data)
data.surface_analysis = true
data.allow_no_affixes_or_compounds = true
return export.show_affix(data)
end
--[==[
Implementation of {{tl|compound}}.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
local text_sections, categories, borrowing_type =
process_etymology_type(data.type, data.nocap, data.notext, #data.parts > 0)
data.borrowing_type = borrowing_type
local parts_formatted = {}
local pos_for_category = (data.pos == "Perkataan") and "Kata" or data.pos
table.insert(categories, pos_for_category .. " majmuk")
-- Make links out of all the parts
local whole_words = 0
for i, part in ipairs(data.parts) do
canonicalize_part(part, data.lang, data.sc)
-- Determine affix type and get link and display terms (see text at top of file).
local affix_type, link_term, display_term = parse_term_for_affixes(part.term, part.lang, part.sc,
part.type, not part.alt, nil, part.id)
-- If the term is an interfix or the type was explicitly given, recognize it as such (which means e.g. that we
-- will display the term without hyphens for East Asian languages). Otherwise, ignore the fact that it looks
-- like an affix and display as specified in the template (but pay attention to the detected affix type for
-- certain tracking purposes).
if affix_type == "jalinan" or (part.type and part.type ~= "non-affix") then
-- If link_term is an empty string, either a bare ^ was specified or an empty term was used along with
-- inline modifiers. The intention in either case is not to link the term. Don't add a '*fixed with'
-- category in this case, or if the term is in a different language.
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
if link_term and link_term ~= "" and not part.part_lang then
table.insert(categories, {cat = data.pos .. " dengan " .. affix_type .. " " ..
strip_diacritics_no_links(part.lang, link_term), sort_key = part.sort or data.sort_key})
end
part.term = link_term ~= "" and link_term or nil
part.alt = part.alt or (display_term ~= link_term and display_term) or nil
else
if affix_type ~= "non-affix" then
local langcode = data.lang:getCode()
-- If `data.lang` is an etymology-only language, track both using its code and its full parent's code.
track { affix_type, affix_type .. "/lang/" .. langcode }
local full_langcode = data.lang:getFullCode()
if langcode ~= full_langcode then
track(affix_type .. "/lang/" .. full_langcode)
end
else
whole_words = whole_words + 1
end
end
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if whole_words == 1 then
track("one whole word")
elseif whole_words == 0 then
track("looks like confix")
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Implementation of {{tl|blend}}, {{tl|univerbation}} and similar "compound-like" templates.
'''WARNING''': This destructively modifies both `data` and the individual structures within `.parts`.
]==]
function export.show_compound_like(data)
data.allow_no_affixes_or_compounds = true
local text_sections, categories, borrowing_type = generate_affix_categories(data)
if data.cat then
table.insert(categories, data.cat)
end
-- Process each part for display
local parts_formatted = {}
for i, part in ipairs_with_gaps(data.parts) do
-- Make a link for the part
table.insert(parts_formatted, export.link_term(part, data, "include_separator"))
end
if #data.parts > 0 and data.oftext then
table.insert(text_sections, 1, " " .. data.oftext .. " ")
end
if data.text then
table.insert(text_sections, 1, data.text)
end
table.insert(text_sections, export.join_formatted_parts { data = data, parts_formatted = parts_formatted,
categories = categories, separator_already_added = true })
return table.concat(text_sections)
end
--[==[
Make `part` (a structure holding information on an affix part) into an affix of type `affix_type`, and apply any
relevant affix mappings. For example, if the desired affix type is "akhiran", this will (in general) add a hyphen onto
the beginning of the term, alt, tr and ts components of the part if not already present. The hyphen that's added is the
"display hyphen" (see above) and may be script-specific. (In the case of East Asian scripts, the display hyphen is an
empty string whereas the template hyphen is the regular hyphen, meaning that any regular hyphen at the beginning of the
part will be effectively removed.) `lang` and `sc` hold overall language and script objects.
Note that this also applies any language-specific affix mappings, so that e.g. if the language is Finnish and the user
specified [[-käs]] in the affix and didn't specify an `.alt` value, `part.term` will contain [[-kas]] and `part.alt` will
contain [[-käs]].
This function is used by the "legacy" templates ({{tl|prefix}}, {{tl|suffix}}, {{tl|confix}}, etc.) where the nature of
the affix is specified by the template itself rather than auto-determined from the affix, as is the case with
{{tl|affix}}.
'''WARNING''': This destructively modifies `part`.
]==]
local function make_part_into_affix(part, lang, sc, affix_type)
canonicalize_part(part, lang, sc)
local link_term, display_term = export.make_affix(part.term, part.lang, part.sc, affix_type, not part.alt, nil, part.id)
part.term = link_term
-- When we don't specify `do_affix_mapping` to make_affix(), link and display terms (first and second retvals of
-- make_affix()) are the same.
-- If part.alt would be the same as part.term, make it nil, so that it isn't erroneously tracked as being
-- redundant alt text.
part.alt = part.alt and export.make_affix(part.alt, part.lang, part.sc, affix_type) or (display_term ~= link_term and display_term) or nil
local Latn = require(scripts_module).getByCode("Latn")
part.tr = export.make_affix(part.tr, part.lang, Latn, affix_type)
part.ts = export.make_affix(part.ts, part.lang, Latn, affix_type)
end
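-- Hypothetical usage: given part = {term = "an"}, calling
--   make_part_into_affix(part, lang, nil, "akhiran")
-- leaves part.term == "-an" for a Latin-script language with default hyphens,
-- after applying any lang-specific affix mapping (cf. the Finnish -käs example above).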
local function track_wrong_affix_type(template, part, expected_affix_type)
if part and not part.type then
local affix_type = parse_term_for_affixes(part.term, part.lang, part.sc)
if affix_type ~= expected_affix_type then
local part_name = expected_affix_type or "base"
local langcode = part.lang:getCode()
local full_langcode = part.lang:getFullCode()
require("Module:debug/track") {
template,
template .. "/" .. part_name,
template .. "/" .. part_name .. "/" .. (affix_type or "none"),
template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. langcode
}
-- If `part.lang` is an etymology-only language, track both using its code and its full parent's code.
if full_langcode ~= langcode then
require("Module:debug/track")(
template .. "/" .. part_name .. "/" .. (affix_type or "none") .. "/lang/" .. full_langcode
)
end
end
end
end
local function insert_affix_category(categories, pos, affix_type, part, sort_key, sort_base)
-- Don't add a '*fixed with' category if the link term is empty or is in a different language.
if part.term and not part.part_lang then
local cat = pos .. " dengan " .. affix_type .. " " .. strip_diacritics_no_links(part.lang, part.term) ..
(part.id and " (" .. part.id .. ")" or "")
if sort_key or sort_base then
table.insert(categories, {cat = cat, sort_key = sort_key, sort_base = sort_base})
else
table.insert(categories, cat)
end
end
end
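-- For example (hypothetical values), a suffix part {term = "-an"} with
-- pos == "kata nama" and affix_type == "akhiran" adds the category
-- "kata nama dengan akhiran -an".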
--[==[
Implementation of {{tl|circumfix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_circumfix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.prefix, data.lang, data.sc, "awalan")
make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran")
track_wrong_affix_type("apitan", data.prefix, "awalan")
track_wrong_affix_type("apitan", data.base, nil)
track_wrong_affix_type("apitan", data.suffix, "akhiran")
-- Create circumfix term.
local circumfix = nil
if data.prefix.term and data.suffix.term then
circumfix = data.prefix.term .. " " .. data.suffix.term
data.prefix.alt = data.prefix.alt or data.prefix.term
data.suffix.alt = data.suffix.alt or data.suffix.term
data.prefix.term = circumfix
data.suffix.term = circumfix
end
-- Make links out of all the parts.
local parts_formatted = {}
local categories = {}
local sort_base
if data.base.term then
sort_base = strip_diacritics_no_links(data.base.lang, data.base.term)
end
table.insert(parts_formatted, export.link_term(data.prefix, data))
table.insert(parts_formatted, export.link_term(data.base, data))
table.insert(parts_formatted, export.link_term(data.suffix, data))
-- Insert the categories, but don't add a '*fixed with' category if the link term is in a different language.
if not data.prefix.part_lang then
table.insert(categories, {cat=data.pos .. " dengan apitan " .. strip_diacritics_no_links(data.prefix.lang,
circumfix), sort_key=data.sort_key, sort_base=sort_base})
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|confix}}.
'''WARNING''': This destructively modifies both `data` and `.prefix`, `.base` and `.suffix`.
]==]
function export.show_confix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.prefix, data.lang, data.sc, "awalan")
make_part_into_affix(data.suffix, data.lang, data.sc, "akhiran")
track_wrong_affix_type("confix", data.prefix, "awalan")
track_wrong_affix_type("confix", data.base, nil)
track_wrong_affix_type("confix", data.suffix, "akhiran")
-- Make links out of all the parts.
local parts_formatted = {}
local prefix_sort_base
if data.base and data.base.term then
prefix_sort_base = strip_diacritics_no_links(data.base.lang, data.base.term)
elseif data.suffix.term then
prefix_sort_base = strip_diacritics_no_links(data.suffix.lang, data.suffix.term)
end
-- Insert the categories and parts.
local categories = {}
table.insert(parts_formatted, export.link_term(data.prefix, data))
insert_affix_category(categories, data.pos, "awalan", data.prefix, data.sort_key, prefix_sort_base)
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
end
table.insert(parts_formatted, export.link_term(data.suffix, data))
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "akhiran", data.suffix)
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|infix}}.
'''WARNING''': This destructively modifies both `data` and `.base` and `.infix`.
]==]
function export.show_infix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
make_part_into_affix(data.infix, data.lang, data.sc, "sisipan")
track_wrong_affix_type("sisipan", data.base, nil)
track_wrong_affix_type("sisipan", data.infix, "sisipan")
-- Make links out of all the parts.
local parts_formatted = {}
local categories = {}
table.insert(parts_formatted, export.link_term(data.base, data))
table.insert(parts_formatted, export.link_term(data.infix, data))
-- Insert the categories.
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "sisipan", data.infix)
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|prefix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.prefixes`, as well as `.base`.
]==]
function export.show_prefix(data)
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
for i, prefix in ipairs(data.prefixes) do
make_part_into_affix(prefix, data.lang, data.sc, "awalan")
end
for i, prefix in ipairs(data.prefixes) do
track_wrong_affix_type("awalan", prefix, "awalan")
end
track_wrong_affix_type("awalan", data.base, nil)
-- Make links out of all the parts.
local parts_formatted = {}
local first_sort_base = nil
local categories = {}
if data.prefixes[2] then
first_sort_base = ine(data.prefixes[2].term) or ine(data.prefixes[2].alt)
if first_sort_base then
first_sort_base = strip_diacritics_no_links(data.prefixes[2].lang, first_sort_base)
end
elseif data.base then
first_sort_base = ine(data.base.term) or ine(data.base.alt)
if first_sort_base then
first_sort_base = strip_diacritics_no_links(data.base.lang, first_sort_base)
end
end
for i, prefix in ipairs(data.prefixes) do
table.insert(parts_formatted, export.link_term(prefix, data))
insert_affix_category(categories, data.pos, "awalan", prefix, data.sort_key, i == 1 and first_sort_base or nil)
end
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
else
table.insert(parts_formatted, "")
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
--[==[
Implementation of {{tl|suffix}}.
'''WARNING''': This destructively modifies both `data` and the structures within `.suffixes`, as well as `.base`.
]==]
function export.show_suffix(data)
local categories = {}
data.pos = data.pos or default_pos
data.pos = pluralize(data.pos)
canonicalize_part(data.base, data.lang, data.sc)
-- Hyphenate the affixes and apply any affix mappings.
for i, suffix in ipairs(data.suffixes) do
make_part_into_affix(suffix, data.lang, data.sc, "akhiran")
end
track_wrong_affix_type("akhiran", data.base, nil)
for i, suffix in ipairs(data.suffixes) do
track_wrong_affix_type("akhiran", suffix, "akhiran")
end
-- Make links out of all the parts.
local parts_formatted = {}
if data.base then
table.insert(parts_formatted, export.link_term(data.base, data))
else
table.insert(parts_formatted, "")
end
for i, suffix in ipairs(data.suffixes) do
table.insert(parts_formatted, export.link_term(suffix, data))
end
-- Insert the categories.
for i, suffix in ipairs(data.suffixes) do
-- FIXME, should we be specifying a sort base here?
insert_affix_category(categories, data.pos, "akhiran", suffix)
if suffix.pos and rfind(suffix.pos, "patronym") then
table.insert(categories, "patronim")
end
end
return export.join_formatted_parts { data = data, parts_formatted = parts_formatted, categories = categories }
end
return export
7dyhsp9hwsg3p0jvb2vy3wh1uq9p26j
elettroluminescente
0
10844
281464
245683
2026-04-23T09:42:21Z
Hakimi97
2668
/* Kata sifat */
281464
wikitext
text/x-wiki
== Bahasa Itali ==
===Kata sifat===
{{it-adj}}
# [[elektropendar]]
====Istilah berkaitan====
* [[elettroluminescenza]]
===Etimologi===
{{awalan|it|elettro|luminescente}}
i7n4ti693i3zpr3htn2103fojat7jza
Modul:category tree/topic/Communication
828
11523
281456
281414
2026-04-23T00:37:53Z
Hakimi97
2668
Membatalkan semakan [[Special:Diff/281414|281414]] oleh [[Special:Contributions/PeaceSeekers|PeaceSeekers]] ([[User talk:PeaceSeekers|bincang]])
281456
Scribunto
text/plain
local labels = {}
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
-- FIXME: Lookup langs in the language list.
for _, lang_etc in ipairs {
"Arab", {"Cina", "Bahasa-bahasa Cina"}, "Inggeris", "Jerman", "Jepun", "Okinawa",
"Portugis", "Sepanyol", "Vietnam", {"Melayu", "Bahasa-bahasa Melayik"},
} do
if type(lang_etc) ~= "table" then
lang_etc = {lang_etc}
end
local lang, desc = unpack(lang_etc)
desc = desc or ("[[:Kategori:Bahasa %s|bahasa %s]]"):format(lang, lang)
labels[lang] = {
type = "berkenaan",
description = "=" .. desc,
parents = {"bahasa-bahasa"},
}
end
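-- The loop above expands each entry into a standard label, e.g.:
--   labels["Arab"] = {
--     type = "berkenaan",
--     description = "=[[:Kategori:Bahasa Arab|bahasa Arab]]",
--     parents = {"bahasa-bahasa"},
--   }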
labels["komunikasi"] = {
type = "berkenaan",
description = "default",
parents = {"Semua topik"},
}
labels["huruf"] = {
type = "nama",
description = "default",
parents = {"sistem tulisan"},
}
labels["bahasa buatan"] = { -- distinguish from "cat:constructed languages" family category
type = "nama",
description = "={{w|constructed language}}s",
parents = {"bahasa-bahasa"},
}
labels["bahasa badan"] = {
type = "berkenaan",
description = "default",
parents = {"bahasa", "nonverbal communication"},
}
labels["penyiaran"] = {
type = "berkenaan",
description = "default",
parents = {"media", "telekomunikasi"},
}
labels["Komponen aksara Cina"] = {
type = "set",
description = "=[[komponen|Komponen]] [[aksara]] [[Cina]].",
parents = {"Huruf, simbol dan tanda baca"},
}
labels["diacritical marks"] = {
type = "set",
description = "default",
parents = {"Huruf, simbol dan tanda baca"},
}
labels["dialects"] = {
type = "set",
description = "default",
parents = {"bahasa"},
}
labels["dictation"] = {
type = "berkenaan",
description = "default",
parents = {"komunikasi"},
}
labels["bahasa pupus"] = {
type = "nama",
description = "default",
parents = {"bahasa-bahasa"},
}
labels["bahasa isyarat"] = {
type = "nama",
description = "default",
parents = {"bahasa-bahasa"},
}
labels["facial expressions"] = {
type = "set",
description = "default",
parents = {"nonverbal communication", "face"},
}
labels["kiasan"] = {
type = "set",
description = "=[[figure of speech|figures of speech]]",
parents = {"retorik"},
}
labels["bendera"] = {
type = "berkenaan,name,type",
description = "default",
parents = {"komunikasi"},
}
labels["jargon"] = {
type = "berkenaan",
description = "default",
parents = {"bahasa"},
}
labels["aksara Han"] = {
type = "berkenaan",
description = "default",
parents = {"sistem tulisan"},
}
labels["bahasa"] = {
type = "berkenaan",
description = "default",
parents = {"komunikasi"},
}
labels["keluarga bahasa"] = {
type = "nama",
description = "Topik berkenaan [[keluarga bahasa]], termasuklah yang diterima dan yang bersifat kontroversi.",
parents = {"bahasa", "nama"},
}
labels["bahasa-bahasa"] = {
type = "nama",
description = "default",
parents = {"bahasa", "nama"},
}
labels["Huruf, simbol dan tanda baca"] = {
type = "set",
description = "=[[letter]]s, [[symbol]]s, and [[punctuation]]",
parents = {"Ortografi"},
}
labels["logical fallacies"] = {
type = "set",
description = "=[[logical fallacy|logical fallacies]], clearly defined errors in reasoning used to support or refute an argument",
additional = "{{also|Kategori:{{{langcode}}}:biases}}",
parents = {"retorik", "logic"},
}
labels["media"] = {
type = "berkenaan",
description = "default",
parents = {"komunikasi"},
}
labels["telefon bimbit"] = {
type = "berkenaan,set",
description = "default",
parents = {"telefoni"},
}
labels["nonverbal communication"] = {
type = "berkenaan",
description = "default",
parents = {"komunikasi"},
}
labels["ortografi"] = {
type = "berkenaan",
description = "default",
parents = {"penulisan"},
}
labels["palaeography"] = {
type = "berkenaan",
description = "default",
parents = {"penulisan"},
}
labels["pos"] = {
type = "berkenaan",
description = "=[[post#Noun|post]] or [[mail#Noun|mail]]",
parents = {"komunikasi"},
}
labels["postal abbreviations"] = {
type = "nama",
description = "default",
parents = {"pos"},
}
labels["public relations"] = {
type = "berkenaan",
description = "default no singularize",
parents = {"komunikasi"},
}
labels["tanda baca"] = {
type = "set",
description = "default",
parents = {"Huruf, simbol dan tanda baca"},
}
labels["radio"] = {
type = "berkenaan",
description = "default",
parents = {"telekomunikasi"},
}
labels["retorik"] = {
type = "berkenaan",
description = "default",
parents = {"bahasa"},
}
labels["signs"] = {
type = "berkenaan,name,type",
description = "default",
parents = {"komunikasi"},
}
labels["sociolects"] = {
type = "nama",
description = "default",
parents = {"bahasa"},
}
labels["simbol"] = {
type = "set",
description = "=[[symbol]]s, especially [[mathematical]] and [[scientific]] symbols",
additional = "Most symbols have equivalent meanings in many languages and can therefore be found in [[:Category:Translingual symbols]].",
parents = {"Huruf, simbol dan tanda baca"},
}
labels["talking"] = {
type = "berkenaan",
description = "default",
parents = {"bahasa", "tingkah laku manusia"},
}
labels["telekomunikasi"] = {
type = "berkenaan",
description = "default no singularize",
parents = {"komunikasi", "teknologi"},
}
labels["telegraphy"] = {
type = "berkenaan",
description = "default",
parents = {"telekomunikasi", "elektronik"},
wpcat = true,
commonscat = true,
}
labels["telefoni"] = {
type = "berkenaan",
description = "default",
parents = {"telekomunikasi", "elektronik"},
}
labels["texting"] = {
type = "berkenaan",
description = "default",
parents = {"telekomunikasi"},
}
labels["textual division"] = {
type = "berkenaan",
description = "default",
parents = {"penulisan"},
}
labels["tipografi"] = {
type = "berkenaan",
description = "default",
parents = {"penulisan", "percetakan"},
}
labels["penulisan"] = {
type = "berkenaan",
description = "default",
parents = {"bahasa", "tingkah laku manusia"},
}
labels["sistem tulisan"] = {
type = "set",
description = "default",
parents = {"penulisan"},
}
return labels
gyp35snlkpffileqsjqpu60ovnf03zf
Modul:it-headword
828
13932
281463
112707
2026-04-23T09:41:01Z
Hakimi97
2668
Mengemas kini mengikut padanan Wikikamus bahasa Inggeris (semakan [[en:Special:Diff/89361722|89361722]]) (perlu semakan semula)
281463
Scribunto
text/plain
-- This module contains code for Italian headword templates.
-- Templates covered are:
-- * {{it-noun}}, {{it-proper noun}};
-- * {{it-verb}};
-- * {{it-adj}}, {{it-adj-comp}}, {{it-adj-sup}};
-- * {{it-det}};
-- * {{it-art}};
-- * {{it-pron-adj}};
-- * {{it-pp}};
-- * {{it-presp}};
-- * {{it-card-noun}}, {{it-card-adj}}, {{it-card-inv}};
-- * {{it-adv}};
-- * {{it-pos}};
-- * {{it-suffix form}}.
-- See [[Module:it-verb]] for Italian conjugation templates.
local export = {}
local pos_functions = {}
local force_cat = false -- for testing; if true, categories appear in non-mainspace pages
local m_strutils = require("Module:string utilities")
local usub = m_strutils.sub
local require_when_needed = require("Module:utilities/require when needed")
local insert = table.insert
local remove = table.remove
local m_table = require("Module:table")
local com = require("Module:it-common")
local en_utilities_module = "Module:en-utilities"
local headword_module = "Module:headword"
local headword_utilities_module = "Module:headword utilities"
local inflection_utilities_module = "Module:inflection utilities"
local it_verb_module = "Module:it-verb"
local parse_interface_module = "Module:parse interface"
local romut_module = "Module:romance utilities"
local lang = require("Module:languages").getByCode("it")
local langname = lang:getCanonicalName()
local m_en_utilities = require_when_needed(en_utilities_module)
local m_headword_utilities = require_when_needed(headword_utilities_module)
local glossary_link = require_when_needed(headword_utilities_module, "glossary_link")
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local no_split_apostrophe_words = {
["c'è"] = true,
["c'era"] = true,
["c'erano"] = true,
}
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
local function track(page)
require("Module:debug/track")("it-headword/" .. page)
return true
end
-- Parse and insert an inflection not requiring additional processing into `data.inflections`. The raw arguments come
-- from `args[field]`, which is parsed for inline modifiers. `label` is the label that the inflections are given;
-- `accel` is the accelerator form, or nil.
local function parse_and_insert_inflection(data, args, field, label, accel, frob)
m_headword_utilities.parse_and_insert_inflection {
headdata = data,
forms = args[field],
paramname = field,
splitchar = ",",
label = label,
accel = accel and {form = accel} or nil,
frob = frob,
}
end
local function replace_hash_with_lemma(term, lemma)
-- If there is a % sign in the lemma, we have to replace it with %% so it doesn't get interpreted as a capture
-- replace expression.
lemma = lemma:gsub("%%", "%%%%")
return (term:gsub("#", lemma))
end
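-- For example, replace_hash_with_lemma("gran#", "duca") returns "granduca";
-- the %-escaping above keeps any literal "%" in the lemma from being treated
-- as a capture reference by gsub.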
local list_param = {list = true, disallow_holes = true}
local boolean_param = {type = "boolean"}
-----------------------------------------------------------------------------------------
-- Main entry point --
-----------------------------------------------------------------------------------------
function export.show(frame)
local poscat = frame.args[1]
or error("Part of speech has not been specified. Please pass parameter 1 to the module invocation.")
local parargs = frame:getParent().args
local params = {
["head"] = list_param,
["id"] = true,
["sort"] = true,
["apoc"] = boolean_param,
["splithyph"] = boolean_param,
["nolinkhead"] = boolean_param,
["nolink"] = {type = "boolean", alias_of = "nolinkhead"},
["json"] = boolean_param,
["pagename"] = true, -- for testing
}
if pos_functions[poscat] then
for key, val in pairs(pos_functions[poscat].params) do
params[key] = val
end
end
local args = require("Module:parameters").process(parargs, params)
local pagename = args.pagename or mw.loadData("Module:headword/data").pagename
local user_specified_heads = args.head
local heads = user_specified_heads
if args.nolinkhead then
if #heads == 0 then
heads = {pagename}
end
else
local romut = require(romut_module)
local auto_linked_head = romut.add_links_to_multiword_term(pagename, args.splithyph,
no_split_apostrophe_words)
if #heads == 0 then
heads = {auto_linked_head}
else
for i, head in ipairs(heads) do
if head:find("^~") then
head = romut.apply_link_modifiers(auto_linked_head, usub(head, 2))
heads[i] = head
end
if head == auto_linked_head then
track("redundant-head")
end
end
end
end
local data = {
lang = lang,
pos_category = pos_functions[poscat] and pos_functions[poscat].pos_category or poscat,
categories = {},
heads = heads,
user_specified_heads = user_specified_heads,
no_redundant_head_cat = #user_specified_heads == 0,
genders = {},
inflections = {},
pagename = pagename,
id = args.id,
sort_key = args.sort,
force_cat_output = force_cat,
checkredlinks = pos_functions[poscat] and pos_functions[poscat].redlink_pos or true,
}
if pagename:find("^%-") and poscat ~= "bentuk akhiran" then
data.is_suffix = true
data.pos_category = "akhiran"
data.checkredlinks = true
local singular_poscat = m_en_utilities.singularize(poscat)
insert(data.categories, "Akhiran membentuk " .. singular_poscat .. " bahasa " .. langname)
insert(data.inflections, {label = "Akhiran membentuk " .. singular_poscat})
end
if pos_functions[poscat] then
pos_functions[poscat].func(args, data)
end
if args.apoc then
-- Apocopated form of a term; do this after calling pos_functions[], because the function might modify
-- data.pos_category.
local pos = data.pos_category
if not pos:find("Bentuk ") then
-- Apocopated forms are non-lemma forms.
local singular_poscat = m_en_utilities.singularize(pos)
data.pos_category = "Bentuk " .. singular_poscat
end
-- If this is a suffix, insert label 'apocopated' after 'FOO-forming suffix', otherwise insert at the beginning.
insert(data.inflections, data.is_suffix and 2 or 1, {label = glossary_link("apocopated")})
end
if args.json then
return require("Module:JSON").toJSON(data)
end
return require(headword_module).full_headword(data)
end
local deriv_params = {
{"dim", glossary_link("diminutif")},
{"dim_dim", "double " .. glossary_link("diminutif")},
{"aug_dim", glossary_link("agam") .. "-" .. glossary_link("diminutif")},
{"aug", glossary_link("agam")},
{"dim_aug", glossary_link("diminutif") .. "-" .. glossary_link("agam")},
{"aug_aug", "double " .. glossary_link("agam")},
{"pej", glossary_link("pejoratif")},
{"dim_pej", glossary_link("diminutif") .. "-" .. glossary_link("pejoratif")},
{"aug_pej", glossary_link("agam") .. "-" .. glossary_link("pejoratif")},
{"pej_pej", "double " .. glossary_link("pejoratif")},
{"end", glossary_link("endearing")},
{"dim_end", glossary_link("diminutif") .. "-" .. glossary_link("endearing")},
{"aug_end", glossary_link("agam") .. "-" .. glossary_link("endearing")},
{"derog", glossary_link("hinaan")},
{"dim_derog", glossary_link("diminutif") .. "-" .. glossary_link("hinaan")},
{"aug_derog", glossary_link("agam") .. "-" .. glossary_link("hinaan")},
{"end_derog", glossary_link("endearing") .. "-" .. glossary_link("hinaan")},
}
local function insert_deriv_params(params)
for _, deriv_param in ipairs(deriv_params) do
local param = unpack(deriv_param)
params[param] = list_param
end
end
local param_mods = {
t = {
-- We need to store the <t:...> inline modifier into the "gloss" key of the parsed part, because that is what
-- [[Module:links]] expects.
item_dest = "gloss",
},
gloss = {},
-- no 'tr' or 'ts', doesn't make sense for Italian
g = {
-- We need to store the <g:...> inline modifier into the "genders" key of the parsed part, because that is what
-- [[Module:links]] expects.
item_dest = "genders",
sublist = true,
},
id = {},
alt = {},
q = {type = "qualifier"},
qq = {type = "qualifier"},
lit = {},
pos = {},
-- no 'sc', doesn't make sense for Italian
}
local function parse_term_with_modifiers(paramname, val)
local function generate_obj(term)
local decomp = com.decompose(term)
local lemma = com.remove_non_final_accents(decomp)
if lemma ~= decomp then
term = com.compose("[[" .. lemma .. "|" .. decomp .. "]]")
end
return {term = term}
end
local retval = require(parse_interface_module).parse_inline_modifiers(val, {
paramname = paramname,
param_mods = param_mods,
generate_obj = generate_obj,
splitchar = "[/;,]",
preserve_splitchar = true,
})
for _, obj in ipairs(retval) do
if obj.delimiter == ";" then
obj.separator = "; "
elseif obj.delimiter == "/" then
obj.separator = "/"
-- default to nil for comma
end
end
return retval
end
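-- A hypothetical call: parse_term_with_modifiers("f", "attrice<t:actress>")
-- returns a one-element list {{term = "attrice", gloss = "actress"}};
-- comma-, slash- or semicolon-separated alternatives come back as separate
-- objects with `separator` set as above.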
local function insert_deriv_inflections(data, args)
for _, deriv_param in ipairs(deriv_params) do
local param, desc = unpack(deriv_param)
if #args[param] > 0 then
local inflection = {label = desc}
for _, term in ipairs(args[param]) do
local parsed_terms = parse_term_with_modifiers(param, term)
for _, parsed_term in ipairs(parsed_terms) do
insert(inflection, parsed_term)
end
end
insert(data.inflections, inflection)
end
end
end
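-- Hypothetical usage: with args.dim = {"ombrellino"}, the loop above appends
-- an inflection {label = glossary_link("diminutif"), {term = "ombrellino"}}
-- to data.inflections.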
-----------------------------------------------------------------------------------------
-- Nouns --
-----------------------------------------------------------------------------------------
local allowed_genders = m_table.listToSet(
{"m", "f", "mf", "mfbysense", "mfequiv", "gneut", "n", "m-p", "f-p", "mf-p", "mfbysense-p", "mfequiv-p", "gneut-p", "n-p", "?", "?-p"}
)
local function validate_genders(genders)
for _, g in ipairs(genders) do
if type(g) == "table" then
g = g.spec
end
if not allowed_genders[g] then
error("Unrecognized gender: " .. g)
end
end
end
local function do_noun(args, data, is_proper)
local is_plurale_tantum = false
local has_singular = false
local category_plpos = data.checkredlinks
if category_plpos == true then
category_plpos = data.pos_category
end
local category_pos = m_en_utilities.singularize(category_plpos)
validate_genders(args[1])
data.genders = args[1]
local saw_m = false
local saw_f = false
local gender_for_default_plural
-- Check for specific genders and pluralia tantum.
for _, g in ipairs(args[1]) do
if type(g) == "table" then
g = g.spec
end
if g:find("-p$") then
is_plurale_tantum = true
else
has_singular = true
if g == "m" or g == "mf" or g == "mfbysense" then
saw_m = true
end
if g == "f" or g == "mf" or g == "mfbysense" then
saw_f = true
end
end
end
if saw_m and saw_f then
gender_for_default_plural = "mf"
elseif saw_f then
gender_for_default_plural = "f"
else
gender_for_default_plural = "m"
end
local lemma = data.pagename
local function inscat(cat)
insert(data.categories, langname .. " " .. cat)
end
local function insert_noun_inflection(terms, label, accel, no_inv)
for _, term in ipairs(terms) do
if not no_inv and term.term == lemma then
term.term = nil
term.label = glossary_link("invariable")
end
end
m_headword_utilities.insert_inflection {
headdata = data,
terms = terms,
label = label,
accel = accel and {form = accel} or nil,
}
end
-- Plural
local plurals = {}
-- Fetch explicit masculine and feminine plurals here because we may change them below when processing plurals.
local mpls = m_headword_utilities.parse_term_list_with_modifiers {
paramname = "mpl",
forms = args.mpl,
splitchar = ",",
}
local fpls = m_headword_utilities.parse_term_list_with_modifiers {
paramname = "fpl",
forms = args.fpl,
splitchar = ",",
}
if is_plurale_tantum and not has_singular then
if args[2][1] then
error("Can't specify plurals of plurale tantum " .. category_pos)
end
insert(data.inflections, {label = glossary_link("hanya jamak")})
elseif args.apoc then
-- apocopated noun
if args[2][1] then
error("Can't specify plurals of apocopated " .. category_pos)
end
else
-- Fetch plurals and associated qualifiers, labels and genders.
plurals = m_headword_utilities.parse_term_list_with_modifiers {
paramname = {2, "pl"},
forms = args[2],
splitchar = ",",
include_mods = {"g"},
}
-- Check for special plural signals
local mode = nil
local pl1 = plurals[1]
if pl1 and #pl1.term == 1 then
mode = pl1.term
if mode == "?" or mode == "!" or mode == "-" or mode == "~" then
pl1.term = nil
if next(pl1) then
error(("Can't specify inline modifiers with plural code '%s'"):format(mode))
end
remove(plurals, 1) -- Remove the mode parameter
elseif mode ~= "+" and mode ~= "#" then
error(("Unexpected plural code '%s'"):format(mode))
end
end
if is_plurale_tantum then
-- both singular and plural
insert(data.inflections, {label = "kadangkala " .. glossary_link("hanya jamak") .. ", dengan kelainan"})
end
if mode == "?" then
-- Plural is unknown
insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak dikenal pasti")
elseif mode == "!" then
-- Plural is not attested
insert(data.inflections, {label = "plural not attested"})
insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak ditentusahkan")
if plurals[1] then
error("Can't specify any plurals along with unattested plural code '!'")
end
elseif mode == "-" then
-- Uncountable noun; may occasionally have a plural
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
-- If plural forms were given explicitly, then show "usually"
if plurals[1] then
insert(data.inflections, {label = "biasanya " .. glossary_link("tak berbilang")})
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
else
insert(data.inflections, {label = glossary_link("uncountable")})
end
else
-- Countable or mixed countable/uncountable
-- If no plurals, use the default plural unless mpl= or fpl= explicitly given.
if not plurals[1] and not mpls[1] and not fpls[1] and not is_proper then
plurals[1] = {term = "+"}
end
if mode == "~" then
-- Mixed countable/uncountable noun, always has a plural
insert(data.inflections, {label = glossary_link("berbilang") .. " dan " .. glossary_link("tak berbilang")})
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
elseif plurals[1] then
-- Countable nouns
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
else
-- Uncountable nouns
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
end
end
-- Process plurals, handling requests for default plurals.
local has_default_or_hash = false
for _, pl in ipairs(plurals) do
if pl.term:find("^%+") or pl.term:find("#") or pl.term == "cap*" or pl.term == "cap*+" then
has_default_or_hash = true
break
end
end
if has_default_or_hash then
local newpls = {}
local function insert_pl(pl, defpl)
pl.term = defpl
insert(newpls, pl)
end
local function make_gendered_plural(pl, special)
if gender_for_default_plural == "mf" then
local default_mpl = com.make_plural(lemma, "m", special)
local default_fpl = com.make_plural(lemma, "f", special)
if default_mpl then
if default_mpl == default_fpl then
insert_pl(pl, default_mpl)
else
if args.mpl[1] or args.fpl[1] then
error("Can't specify gendered plural spec '" .. (special or "+") ..
"' along with gender=" .. gender_for_default_plural ..
" and also specify mpl= or fpl=")
end
mpls = {m_table.shallowCopy(pl)}
mpls[1].term = default_mpl
fpls = {pl}
fpls[1].term = default_fpl
end
end
else
local defpl = com.make_plural(lemma, gender_for_default_plural, special)
if defpl then
insert_pl(pl, defpl)
end
end
end
for _, pl in ipairs(plurals) do
if pl.term == "cap*" or pl.term == "cap*+" then
make_gendered_plural(pl, pl.term)
elseif pl.term == "+" then
make_gendered_plural(pl)
elseif pl.term:find("^%+") then
local special = require(romut_module).get_special_indicator(pl.term)
make_gendered_plural(pl, special)
else
insert_pl(pl, replace_hash_with_lemma(pl.term, lemma))
end
end
plurals = newpls
end
if plurals[2] then
inscat(category_plpos .. " with multiple plurals")
end
-- If the first or only plural is the same as the singular, replace it with 'invariable', or 'usually
-- invariable' if there is more than one plural.
pl1 = plurals[1]
if pl1 and pl1.term == lemma then
if plurals[2] then
insert(data.inflections, {label = "usually " .. glossary_link("invariable"),
q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs
})
else
insert(data.inflections, {label = glossary_link("invariable"),
q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs
})
end
remove(plurals, 1)
inscat("indeclinable " .. category_plpos)
end
if plurals[1] then
-- Check for gender-changing plurals.
for _, pl in ipairs(plurals) do
if pl.genders then
for _, g in ipairs(pl.genders) do
if type(g) ~= "table" then
g = {spec = g}
end
if g.spec == "m" and not saw_m or g.spec == "f" and not saw_f then
inscat(category_plpos .. " that change gender in the plural")
end
end
end
end
end
end
-- Gather masculines/feminines. For each one, generate the corresponding plural. `field` is the name of the field
-- containing the masculine or feminine forms (normally "m" or "f"); `inflect` is a function of one or two arguments
-- to generate the default masculine or feminine from the lemma (the arguments are the lemma and optionally a
-- "special" flag to indicate how to handle multiword lemmas, and the function is normally make_feminine or
-- make_masculine from [[Module:it-common]]); and `default_plurals` is a list into which the corresponding default
-- plurals of the gathered or generated masculine or feminine forms are stored.
local function handle_mf(field, inflect, default_plurals)
local special
local mfs = m_headword_utilities.parse_term_list_with_modifiers {
paramname = field,
forms = args[field],
splitchar = ",",
frob = function(term)
if term == "+" then
-- Generate default masculine/feminine.
term = inflect(lemma)
else
term = replace_hash_with_lemma(term, lemma)
end
special = require(romut_module).get_special_indicator(term)
if special then
term = inflect(lemma, special)
end
return term
end
}
for _, mf in ipairs(mfs) do
local plobj = m_table.shallowCopy(mf)
plobj.term = com.make_plural(mf.term, field, special)
if plobj.term then
-- Add an accelerator for each masculine/feminine plural whose lemma is the corresponding singular, so that
-- the accelerated entry that is generated has a definition that looks like
-- # {{plural of|it|MFSING}}
plobj.accel = {form = "p", lemma = mf.term}
insert(default_plurals, plobj)
end
end
return mfs
end
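-- Usage sketch (hypothetical lemma "maestro"): handle_mf("f", com.make_feminine,
-- feminine_plurals) turns an f=+ request into "maestra" and appends its default
-- plural "maestre" to feminine_plurals, with an accelerator whose lemma is the
-- feminine singular.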
local feminine_plurals = {}
local feminines = handle_mf("f", com.make_feminine, feminine_plurals)
local masculine_plurals = {}
local masculines = handle_mf("m", com.make_masculine, masculine_plurals)
local function handle_mf_plural(mfplfield, mfpls, gender, default_plurals, singulars)
if is_plurale_tantum then
return mfpls, true
end
local new_mfpls = {}
local saw_plus
local noinv
for i, mfpl in ipairs(mfpls) do
local accel
if #mfpls == #singulars then
-- If same number of overriding masculine/feminine plurals as singulars, assume each plural goes with
-- the corresponding singular and use each corresponding singular as the lemma in the accelerator. The
-- generated entry will have
-- # {{plural of|it|SINGULAR}}
-- as the definition.
accel = {form = "p", lemma = singulars[i].term}
else
accel = nil
end
if mfpl.term == "+" then
-- We should never see + twice. If we do, it will lead to problems since we overwrite the values of
-- default_plurals the first time around.
if saw_plus then
error(("Saw + twice when handling %s="):format(mfplfield))
end
saw_plus = true
if not default_plurals[1] then
local defpl = com.make_plural(lemma, gender)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
default_plurals[1] = {term = defpl}
end
for _, defpl in ipairs(default_plurals) do
-- defpl is already a table and has an accel field
m_headword_utilities.combine_termobj_qualifiers_labels(defpl, mfpl)
insert(new_mfpls, defpl)
end
-- don't use "invariable" because the plural is not with respect to the lemma but with respect to the
-- masc/fem singular
noinv = true
elseif mfpl.term == "cap*" or mfpl.term == "cap*+" or mfpl.term:find("^%+") then
if mfpl.term:find("^%+") then
mfpl.term = require(romut_module).get_special_indicator(mfpl.term)
end
if singulars[1] then
for _, mf in ipairs(singulars) do
local mfplobj = m_table.shallowCopy(mfpl)
mfplobj.term = com.make_plural(mf.term, gender, mfpl.term)
if mfplobj.term then
mfplobj.accel = accel
m_headword_utilities.combine_termobj_qualifiers_labels(mfplobj, mf)
insert(new_mfpls, mfplobj)
end
-- don't use "invariable" because the plural is not with respect to the lemma but with respect
-- to the masc/fem singular
noinv = true
-- FIXME: Should we throw an error if no plural could be generated?
end
else
-- FIXME: This clause didn't exist in the corresponding code in [[Module:pt-headword]]. Is it
-- correct?
mfpl.term = com.make_plural(lemma, gender, mfpl.term)
if mfpl.term then
insert(new_mfpls, mfpl)
end
end
else
mfpl.accel = accel
mfpl.term = replace_hash_with_lemma(mfpl.term, lemma)
insert(new_mfpls, mfpl)
-- don't use "invariable" if masc/fem singular present because the plural is not with respect to
-- the lemma but with respect to the masc/fem singular
noinv = noinv or #singulars > 0
end
end
return new_mfpls, noinv
end
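-- Sketch of the "+" branch above (hypothetical): for a noun with f=amica and
-- fpl=+, the default feminine plurals gathered earlier (e.g. "amiche") are
-- reused, merging in any qualifiers from the fpl spec, instead of regenerating
-- a plural from the lemma.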
local mpl_noinv, fpl_noinv
-- Check args.fpl[1] rather than fpls[1]: if the user didn't specify any explicit mpl= or fpl= but the lemma
-- gender is mf or mfbysense with separate masculine and feminine plural forms (e.g. any term in -ista), the
-- mpls/fpls lists were auto-generated above, and we don't want to reprocess those auto-generated forms.
if args.fpl[1] then
-- Override any existing feminine plurals.
feminine_plurals, fpl_noinv = handle_mf_plural("fpl", fpls, "f", feminine_plurals, feminines)
else
feminine_plurals, fpl_noinv = fpls, false
end
if args.mpl[1] then
-- Override any existing masculine plurals.
masculine_plurals, mpl_noinv = handle_mf_plural("mpl", mpls, "m", masculine_plurals, masculines)
else
masculine_plurals, mpl_noinv = mpls, false
end
local function redundant_plural(pl)
for _, p in ipairs(plurals) do
if p.term == pl.term then
return true
end
end
return false
end
for _, mpl in ipairs(masculine_plurals) do
if redundant_plural(mpl) then
track("noun-redundant-mpl")
end
end
for _, fpl in ipairs(feminine_plurals) do
if redundant_plural(fpl) then
track("noun-redundant-fpl")
end
end
if plurals[1] then
-- Set 'noinv' because we already took care of invariable plurals above.
insert_noun_inflection(plurals, "plural", "p", "noinv")
end
insert_noun_inflection(masculines, "masculine")
insert_noun_inflection(masculine_plurals, "masculine plural", nil, mpl_noinv)
insert_noun_inflection(feminines, "feminine", "f")
insert_noun_inflection(feminine_plurals, "feminine plural", nil, fpl_noinv)
local function parse_and_insert_noun_inflection(field, label, accel)
parse_and_insert_inflection(data, args, field, label, accel)
end
parse_and_insert_noun_inflection("adj", glossary_link("relational", "relational adjective"))
parse_and_insert_noun_inflection("adv", glossary_link("adverb"))
parse_and_insert_noun_inflection("dem", glossary_link("demonym"))
parse_and_insert_noun_inflection("fdem", "female " .. glossary_link("demonym"))
insert_deriv_inflections(data, args)
-- Maybe add category 'Italian nouns with irregular gender' (or similar)
local irreg_gender_lemma = lemma:gsub(" .*", "") -- only look at first word
if (irreg_gender_lemma:find("o$") and (gender_for_default_plural == "f" or gender_for_default_plural == "mf"
or gender_for_default_plural == "mfbysense")) or
(irreg_gender_lemma:find("a$") and (gender_for_default_plural == "m" or gender_for_default_plural == "mf"
or gender_for_default_plural == "mfbysense")) then
inscat(category_plpos .. " dengan genus tak tentu")
end
end
local function get_noun_params(nountype)
local params = {
[1] = {list = "g", disallow_holes = true, required = nountype ~= "proper", default = "?", type = "genders",
flatten = true},
[2] = {list = "pl", disallow_holes = true},
["m"] = list_param,
["f"] = list_param,
["mpl"] = list_param,
["fpl"] = list_param,
["adj"] = list_param, --adjective(s)
["adv"] = list_param, --adverb(s)
["dem"] = list_param, --demonym(s)
["fdem"] = list_param, --female demonym(s)
}
insert_deriv_params(params)
return params
end
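-- Illustrative mapping (hypothetical call): {{it-noun|m|+|f=+|fpl=+}} reaches
-- do_noun() as args[1] = {"m"} (genders), args[2] = {"+"} (plural specs), plus
-- the f= and fpl= list params, which are then expanded into the default
-- feminine and its plural.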
pos_functions["Kata nama"] = {
params = get_noun_params("base"),
func = do_noun,
}
pos_functions["Kata nama khas"] = {
params = get_noun_params("proper"),
func = function(args, data)
do_noun(args, data, "is proper noun")
end,
}
pos_functions["Kata nama kardinal"] = {
params = get_noun_params("base"),
func = function(args, data)
do_noun(args, data)
insert(data.categories, 1, "Nombor kardinal " .. langname)
end,
pos_category = "Kata bilangan",
}
-----------------------------------------------------------------------------------------
-- Adjectives --
-----------------------------------------------------------------------------------------
local function do_adjective(args, data, is_superlative)
local feminines = {}
local masculine_plurals = {}
local feminine_plurals = {}
-- Use "participle" not "past participle" for categories such as 'invariable participles'
local category_plpos = data.checkredlinks
if category_plpos == true then
category_plpos = data.pos_category
end
local category_pos = m_en_utilities.singularize(category_plpos)
if args.sp then
local romut = require(romut_module)
if not romut.allowed_special_indicators[args.sp] then
local indicators = {}
for indic, _ in pairs(romut.allowed_special_indicators) do
insert(indicators, "'" .. indic .. "'")
end
table.sort(indicators)
error("Special inflection indicator beginning can only be " ..
mw.text.listToText(indicators) .. ": " .. args.sp)
end
end
local lemma = data.pagename
local function fetch_inflections(field)
local retval = m_headword_utilities.parse_term_list_with_modifiers {
paramname = field,
forms = args[field],
splitchar = ",",
}
if not retval[1] then
return {{term = "+"}}
end
return retval
end
local function insert_inflection(terms, label, accel)
m_headword_utilities.insert_inflection {
headdata = data,
terms = terms,
label = label,
accel = accel and {form = accel} or nil,
}
end
if args.inv then
-- invariable adjective
insert(data.inflections, {label = glossary_link("invariable")})
insert(data.categories, langname .. " indeclinable " .. category_plpos)
end
if args.noforms then
-- [[bello]] and any others too complicated to describe in headword
insert(data.inflections, {label = "see below for inflection"})
end
if args.inv or args.apoc or args.noforms then
if args.sp or args.f[1] or args.pl[1] or args.mpl[1] or args.fpl[1] then
error("Can't specify inflections with an invariable or apocopated adjective or with noforms=")
end
elseif args.fonly then
-- feminine-only
if args.f[1] then
error("Can't specify explicit feminines with feminine-only " .. category_pos)
end
if args.pl[1] then
error("Can't specify explicit plurals with feminine-only " .. category_pos .. ", use fpl=")
end
if args.mpl[1] then
error("Can't specify explicit masculine plurals with feminine-only " .. category_pos)
end
local argsfpl = fetch_inflections("fpl")
for _, fpl in ipairs(argsfpl) do
if fpl.term == "+" then
local defpl = com.make_plural(lemma, "f", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
fpl.term = defpl
else
fpl.term = replace_hash_with_lemma(fpl.term, lemma)
end
insert(feminine_plurals, fpl)
end
insert(data.inflections, {label = "feminine-only"})
insert_inflection(feminine_plurals, "feminine plural", "f|p")
else
-- Gather feminines.
for _, f in ipairs(fetch_inflections("f")) do
if f.term == "+" then
-- Generate default feminine.
f.term = com.make_feminine(lemma, args.sp)
else
f.term = replace_hash_with_lemma(f.term, lemma)
end
insert(feminines, f)
end
local fem_like_lemma = #feminines == 1 and feminines[1].term == lemma and
not m_headword_utilities.termobj_has_qualifiers_or_labels(feminines[1])
if fem_like_lemma then
insert(data.categories, langname .. " epicene " .. category_plpos)
end
local mpl_field = "mpl"
local fpl_field = "fpl"
if args.pl[1] then
if args.mpl[1] or args.fpl[1] then
error("Can't specify both pl= and mpl=/fpl=")
end
mpl_field = "pl"
fpl_field = "pl"
end
local argsmpl = fetch_inflections(mpl_field)
local argsfpl = fetch_inflections(fpl_field)
for _, mpl in ipairs(argsmpl) do
if mpl.term == "+" then
-- Generate default masculine plural.
local defpl = com.make_plural(lemma, "m", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
mpl.term = defpl
else
mpl.term = replace_hash_with_lemma(mpl.term, lemma)
end
insert(masculine_plurals, mpl)
end
for _, fpl in ipairs(argsfpl) do
if fpl.term == "+" then
for _, f in ipairs(feminines) do
-- Generate default feminine plural; f is a table.
local fplobj = m_table.shallowCopy(fpl)
local defpl = com.make_plural(f.term, "f", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. f.term .. "'")
end
fplobj.term = defpl
m_headword_utilities.combine_termobj_qualifiers_labels(fplobj, f)
insert(feminine_plurals, fplobj)
end
else
fpl.term = replace_hash_with_lemma(fpl.term, lemma)
insert(feminine_plurals, fpl)
end
end
local fem_pl_like_masc_pl = masculine_plurals[1] and feminine_plurals[1] and
m_table.deepEquals(masculine_plurals, feminine_plurals)
local masc_pl_like_lemma = #masculine_plurals == 1 and masculine_plurals[1].term == lemma and
not m_headword_utilities.termobj_has_qualifiers_or_labels(masculine_plurals[1])
if fem_like_lemma and fem_pl_like_masc_pl and masc_pl_like_lemma then
-- actually invariable
insert(data.inflections, {label = glossary_link("invariable")})
insert(data.categories, langname .. " indeclinable " .. category_plpos)
else
-- Make sure there are feminines given and not same as lemma.
if not fem_like_lemma then
insert_inflection(feminines, "feminine", "f|s")
elseif args.gneut then
data.genders = {"gneut"}
else
data.genders = {"mfbysense"}
end
if fem_pl_like_masc_pl then
if args.gneut then
insert_inflection(masculine_plurals, "plural", "p")
else
-- This is how the Spanish module works.
-- insert_inflection(masculine_plurals, "masculine and feminine plural", "p")
insert_inflection(masculine_plurals, "plural", "p")
end
else
insert_inflection(masculine_plurals, "masculine plural", "m|p")
insert_inflection(feminine_plurals, "feminine plural", "f|p")
end
end
end
local function parse_and_insert_adj_inflection(field, label, accel, frob)
parse_and_insert_inflection(data, args, field, label, accel, frob)
end
parse_and_insert_adj_inflection("n", "neuter")
parse_and_insert_adj_inflection("comp", glossary_link("comparative"))
parse_and_insert_adj_inflection("sup", glossary_link("superlative"))
parse_and_insert_adj_inflection("adv", glossary_link("adverb"))
insert_deriv_inflections(data, args)
if args.irreg and is_superlative then
insert(data.categories, langname .. " irregular superlative " .. category_plpos)
end
end
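-- Sketch of the defaulting above (hypothetical lemma "rosso", no overrides):
-- fetch_inflections() returns {{term = "+"}} for each empty slot, so the
-- headword would show feminine "rossa", masculine plural "rossi" and feminine
-- plural "rosse", assuming the usual [[Module:it-common]] pluralization.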
local function get_adjective_params(adjtype)
local params = {
["inv"] = boolean_param, --invariable
["noforms"] = boolean_param, --too complicated to list forms except in a table
["sp"] = true, -- special indicator: "first", "first-last", etc.
["f"] = list_param, --feminine form(s)
["pl"] = list_param, --plural override(s)
["fpl"] = list_param, --feminine plural override(s)
["mpl"] = list_param, --masculine plural override(s)
["adv"] = list_param, --adverb(s)
}
if adjtype == "base" or adjtype == "part" or adjtype == "det" then
params["comp"] = list_param --comparative(s)
params["sup"] = list_param --superlative(s)
params["fonly"] = boolean_param -- feminine only
end
if adjtype == "sup" then
params["irreg"] = boolean_param
end
insert_deriv_params(params)
return params
end
pos_functions["adjectives"] = {
params = get_adjective_params("base"),
func = do_adjective,
}
pos_functions["comparative adjectives"] = {
params = get_adjective_params("comp"),
func = do_adjective,
pos_category = "adjectives",
}
pos_functions["superlative adjectives"] = {
params = get_adjective_params("sup"),
func = function(args, data)
do_adjective(args, data, "is superlative")
end,
pos_category = "adjectives",
}
pos_functions["cardinal adjectives"] = {
params = get_adjective_params("card"),
func = function(args, data)
do_adjective(args, data)
insert(data.categories, 1, langname .. " cardinal numbers")
end,
pos_category = "numerals",
}
pos_functions["past participles"] = {
params = get_adjective_params("part"),
func = do_adjective,
redlink_pos = "participles",
}
pos_functions["present participles"] = {
params = get_adjective_params("part"),
func = do_adjective,
redlink_pos = "participles",
}
pos_functions["determiners"] = {
params = get_adjective_params("det"),
func = do_adjective,
}
pos_functions["articles"] = {
params = get_adjective_params("det"),
func = do_adjective,
}
pos_functions["adjective-like pronouns"] = {
params = get_adjective_params("pron"),
func = do_adjective,
pos_category = "pronouns",
}
pos_functions["cardinal invariable"] = {
params = {},
func = function(args, data)
insert(data.categories, langname .. " cardinal numbers")
insert(data.categories, langname .. " indeclinable numerals")
insert(data.inflections, {label = glossary_link("invariable")})
end,
pos_category = "numerals",
}
-----------------------------------------------------------------------------------------
-- Adverbs --
-----------------------------------------------------------------------------------------
local function do_adverb(args, data)
local function parse_and_insert_adv_inflection(field, label, accel, frob)
parse_and_insert_inflection(data, args, field, label, accel, frob)
end
parse_and_insert_adv_inflection("comp", glossary_link("comparative"))
parse_and_insert_adv_inflection("sup", glossary_link("superlative"))
parse_and_insert_adv_inflection("adj", glossary_link("adjective"))
end
local function get_adverb_params(advtype)
local params = {
["adj"] = list_param, --adjective(s)
}
if advtype == "base" then
params["comp"] = list_param --comparative(s)
params["sup"] = list_param --superlative(s)
end
return params
end
pos_functions["adverbs"] = {
params = get_adverb_params("base"),
func = do_adverb,
}
pos_functions["comparative adverbs"] = {
params = get_adverb_params("comp"),
func = do_adverb,
pos_category = "adverbs",
}
pos_functions["superlative adverbs"] = {
params = get_adverb_params("sup"),
func = do_adverb,
pos_category = "adverbs",
}
-----------------------------------------------------------------------------------------
-- Verbs --
-----------------------------------------------------------------------------------------
pos_functions["verbs"] = {
params = {
[1] = {},
["noautolinktext"] = boolean_param,
["noautolinkverb"] = boolean_param,
},
func = function(args, data)
if args[1] then
local alternant_multiword_spec = require(it_verb_module).do_generate_forms(args, "from headword", data.heads[1])
local function do_verb_form(slot, label, rowslot, rowlabel)
local forms = alternant_multiword_spec.forms[slot]
local retval
if alternant_multiword_spec.rowprops.all_defective[rowslot] then
if not alternant_multiword_spec.rowprops.defective[rowslot] then
-- No forms, but none expected; don't display anything
return
end
retval = {label = "no " .. rowlabel}
elseif not forms then
retval = {label = "no " .. label}
elseif alternant_multiword_spec.rowprops.all_unknown[rowslot] then
retval = {label = "unknown " .. rowlabel}
elseif forms[1].form == "?" then
retval = {label = "unknown " .. label}
else
-- Disable accelerators for now because we don't want the added accents going into the headwords.
-- FIXME: We now have support in [[Module:accel]] to specify the target explicitly; we can use this
-- so we can add the accelerators back with a param to avoid the accents.
local accel_form = nil -- all_verb_slots[slot]
retval = {label = label, accel = accel_form and {form = accel_form} or nil}
local prev_footnotes = nil
-- If the footnotes for this form are the same as the footnotes for the preceding form or
-- contain the preceding footnotes, replace the footnotes that are the same with "ditto".
-- This avoids repetition on pages like [[succedere]] where the form ''succedétti'' has a long
-- footnote which gets repeated in the traditional form ''succedètti'' (which also has the
-- footnote "[traditional]").
for _, form in ipairs(forms) do
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(form.footnotes)
local quals_with_ditto = quals
if quals and prev_footnotes then
local quals_contains_previous = true
for _, qual in ipairs(prev_footnotes) do
if not m_table.contains(quals, qual) then
quals_contains_previous = false
break
end
end
if quals_contains_previous then
local inserted_ditto = false
quals_with_ditto = {}
for _, qual in ipairs(quals) do
if m_table.contains(prev_footnotes, qual) then
if not inserted_ditto then
insert(quals_with_ditto, "ditto")
inserted_ditto = true
end
else
insert(quals_with_ditto, qual)
end
end
end
end
prev_footnotes = quals
insert(retval, {term = form.form, q = quals_with_ditto, refs = refs})
end
end
insert(data.inflections, retval)
end
if alternant_multiword_spec.props.is_pronominal then
insert(data.inflections, {label = glossary_link("pronominal")})
end
if alternant_multiword_spec.props.impers then
insert(data.inflections, {label = glossary_link("impersonal")})
end
if alternant_multiword_spec.props.thirdonly then
insert(data.inflections, {label = "third-person only"})
end
local thirdonly = alternant_multiword_spec.props.impers or alternant_multiword_spec.props.thirdonly
local sing_label = thirdonly and "third-person singular" or "first-person singular"
for _, rowspec in ipairs {
{"pres", "present", true},
{"phis", "past historic", true},
{"pp", "past participle", true},
{"imperf", "imperfect"},
{"fut", "future"},
{"sub", "subjunctive"},
{"impsub", "imperfect subjunctive"},
} do
local rowslot, desc, always_show = unpack(rowspec)
local slot = rowslot .. (thirdonly and "3s" or "1s")
local must_show = alternant_multiword_spec.is_irreg[slot]
if always_show then
must_show = true
elseif rowslot == "imperf" and alternant_multiword_spec.props.has_explicit_stem_spec then
-- If there is an explicit stem spec, make sure it gets displayed; the imperfect is a good way of
-- showing this.
must_show = true
elseif not alternant_multiword_spec.forms[slot] then
-- If the principal part is unexpectedly missing, make sure we show this.
must_show = true
elseif alternant_multiword_spec.forms[slot][1].form == "?" then
-- If the principal part is unknown, make sure we show this.
must_show = true
end
if must_show then
if rowslot == "pp" then
do_verb_form(rowslot, desc, rowslot, desc)
else
do_verb_form(slot, sing_label .. " " .. desc, rowslot, desc)
end
end
end
-- Also do the imperative, but not for third-only verbs, which are always missing the imperative.
if not thirdonly and (alternant_multiword_spec.is_irreg.imp2s
or not alternant_multiword_spec.forms.imp2s) then
do_verb_form("imp2s", "second-person singular imperative", "imp", "imperative")
end
-- If there is a past participle but no auxiliary (e.g. [[malfare]]), explicitly add "no auxiliary". In
-- cases where there's no past participle and no auxiliary (e.g. [[irrompere]]), we don't do this as we
-- already get "no past participle" displayed. Don't display an auxiliary in any case if the lemma
-- consists entirely of reflexive verbs (for which the auxiliary is always [[essere]]).
if alternant_multiword_spec.props.is_non_reflexive and (
alternant_multiword_spec.forms.aux or alternant_multiword_spec.forms.pp
) then
do_verb_form("aux", "auxiliary", "aux", "auxiliary")
end
-- Add categories.
for _, cat in ipairs(alternant_multiword_spec.categories) do
insert(data.categories, cat)
end
-- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to
-- incorporate any links in that head into the 1= specification, use the infinitive generated by
-- [[Module:it-verb]] in place of the user-specified or auto-generated head so that we get accents marked
-- on the verb(s). Don't do this if the user gave multiple heads or gave a head with a multiword-linked
-- verbal expression such as '[[dare esca]] [[al]] [[fuoco]]'.
if #data.user_specified_heads == 0 or (
#data.user_specified_heads == 1 and alternant_multiword_spec.incorporated_headword_head_into_lemma
) then
data.heads = {}
for _, lemma_obj in ipairs(alternant_multiword_spec.forms.inf) do
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes)
insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs})
end
end
end
end
}
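-- Flow sketch (hypothetical): on a page like [[amare]], 1= holds the
-- conjugation spec handed to [[Module:it-verb]]; do_verb_form() then surfaces
-- selected principal parts (e.g. the first-person singular present) as
-- headword inflections, with repeated footnotes collapsed to "ditto".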
-----------------------------------------------------------------------------------------
-- Suffix forms --
-----------------------------------------------------------------------------------------
pos_functions["suffix forms"] = {
params = {
[1] = {required = true, list = true, disallow_holes = true},
["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true},
},
func = function(args, data)
validate_genders(args.g)
data.genders = args.g
local suffix_type = {}
for _, typ in ipairs(args[1]) do
insert(suffix_type, typ .. "-forming suffix")
end
insert(data.inflections, {label = "non-lemma form of " .. m_table.serialCommaJoin(suffix_type, {conj = "or"})})
end,
}
-----------------------------------------------------------------------------------------
-- Arbitrary parts of speech --
-----------------------------------------------------------------------------------------
pos_functions["arbitrary part of speech"] = {
params = {
[1] = {required = true},
["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true},
},
func = function(args, data)
if data.is_suffix then
error("Can't use [[Template:it-pos]] with suffixes")
end
validate_genders(args.g)
data.genders = args.g
local plpos = m_en_utilities.pluralize(args[1])
data.pos_category = plpos
end,
}
return export
-- This module contains code for Italian headword templates.
-- Templates covered are:
-- * {{it-noun}}, {{it-proper noun}};
-- * {{it-verb}};
-- * {{it-adj}}, {{it-adj-comp}}, {{it-adj-sup}};
-- * {{it-det}};
-- * {{it-art}};
-- * {{it-pron-adj}};
-- * {{it-pp}};
-- * {{it-presp}};
-- * {{it-card-noun}}, {{it-card-adj}}, {{it-card-inv}};
-- * {{it-adv}};
-- * {{it-pos}};
-- * {{it-suffix form}}.
-- See [[Module:it-verb]] for Italian conjugation templates.
local export = {}
local pos_functions = {}
local force_cat = false -- for testing; if true, categories appear in non-mainspace pages
local m_strutils = require("Module:string utilities")
local usub = m_strutils.sub
local require_when_needed = require("Module:utilities/require when needed")
local insert = table.insert
local remove = table.remove
local m_table = require("Module:table")
local com = require("Module:it-common")
local en_utilities_module = "Module:en-utilities"
local headword_module = "Module:headword"
local headword_utilities_module = "Module:headword utilities"
local inflection_utilities_module = "Module:inflection utilities"
local it_verb_module = "Module:it-verb"
local parse_interface_module = "Module:parse interface"
local romut_module = "Module:romance utilities"
local lang = require("Module:languages").getByCode("it")
local langname = lang:getCanonicalName()
local m_en_utilities = require_when_needed(en_utilities_module)
local m_headword_utilities = require_when_needed(headword_utilities_module)
local glossary_link = require_when_needed(headword_utilities_module, "glossary_link")
local unpack = unpack or table.unpack -- Lua 5.2 compatibility
local no_split_apostrophe_words = {
["c'è"] = true,
["c'era"] = true,
["c'erano"] = true,
}
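-- Sketch: entries listed here keep their apostrophe intact, so
-- (hypothetically) the auto-linked head for "c'è" stays [[c'è]] rather than
-- being split at the apostrophe by add_links_to_multiword_term.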
-----------------------------------------------------------------------------------------
-- Utility functions --
-----------------------------------------------------------------------------------------
local function track(page)
require("Module:debug/track")("it-headword/" .. page)
return true
end
-- Parse and insert an inflection not requiring additional processing into `data.inflections`. The raw arguments come
-- from `args[field]`, which is parsed for inline modifiers. `label` is the label that the inflections are given;
-- `accel` is the accelerator form, or nil.
local function parse_and_insert_inflection(data, args, field, label, accel, frob)
m_headword_utilities.parse_and_insert_inflection {
headdata = data,
forms = args[field],
paramname = field,
splitchar = ",",
label = label,
accel = accel and {form = accel} or nil,
frob = frob,
}
end
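-- Usage sketch: parse_and_insert_inflection(data, args, "dem", "demonym")
-- parses the comma-separated args.dem values (inline modifiers included) and
-- appends them to data.inflections under the "demonym" label.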
local function replace_hash_with_lemma(term, lemma)
-- If there is a % sign in the lemma, escape it as %% so that it isn't interpreted as a capture reference
-- in gsub's replacement string.
lemma = lemma:gsub("%%", "%%%%")
return (term:gsub("#", lemma))
end
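-- For instance, replace_hash_with_lemma("ri#", "fare") returns "rifare"; the
-- escaping keeps a lemma such as "100%" from being read as a capture
-- reference in gsub's replacement string.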
local list_param = {list = true, disallow_holes = true}
local boolean_param = {type = "boolean"}
-----------------------------------------------------------------------------------------
-- Main entry point --
-----------------------------------------------------------------------------------------
function export.show(frame)
local poscat = frame.args[1]
or error("Part of speech has not been specified. Please pass parameter 1 to the module invocation.")
local parargs = frame:getParent().args
local params = {
["head"] = list_param,
["id"] = true,
["sort"] = true,
["apoc"] = boolean_param,
["splithyph"] = boolean_param,
["nolinkhead"] = boolean_param,
["nolink"] = {type = "boolean", alias_of = "nolinkhead"},
["json"] = boolean_param,
["pagename"] = true, -- for testing
}
if pos_functions[poscat] then
for key, val in pairs(pos_functions[poscat].params) do
params[key] = val
end
end
local args = require("Module:parameters").process(parargs, params)
local pagename = args.pagename or mw.loadData("Module:headword/data").pagename
local user_specified_heads = args.head
local heads = user_specified_heads
if args.nolinkhead then
if #heads == 0 then
heads = {pagename}
end
else
local romut = require(romut_module)
local auto_linked_head = romut.add_links_to_multiword_term(pagename, args.splithyph,
no_split_apostrophe_words)
if #heads == 0 then
heads = {auto_linked_head}
else
for i, head in ipairs(heads) do
if head:find("^~") then
head = romut.apply_link_modifiers(auto_linked_head, usub(head, 2))
heads[i] = head
end
if head == auto_linked_head then
track("redundant-head")
end
end
end
end
local data = {
lang = lang,
pos_category = pos_functions[poscat] and pos_functions[poscat].pos_category or poscat,
categories = {},
heads = heads,
user_specified_heads = user_specified_heads,
no_redundant_head_cat = #user_specified_heads == 0,
genders = {},
inflections = {},
pagename = pagename,
id = args.id,
sort_key = args.sort,
force_cat_output = force_cat,
checkredlinks = pos_functions[poscat] and pos_functions[poscat].redlink_pos or true,
}
if pagename:find("^%-") and poscat ~= "bentuk akhiran" then
data.is_suffix = true
data.pos_category = "akhiran"
data.checkredlinks = true
local singular_poscat = m_en_utilities.singularize(poscat)
insert(data.categories, "Akhiran membentuk " .. singular_poscat .. " bahasa " .. langname)
insert(data.inflections, {label = "Akhiran membentuk " .. singular_poscat})
end
if pos_functions[poscat] then
pos_functions[poscat].func(args, data)
end
if args.apoc then
-- Apocopated form of a term; do this after calling pos_functions[], because the function might modify
-- data.pos_category.
local pos = data.pos_category
if not pos:find("Bentuk ") then
-- Apocopated forms are non-lemma forms.
local singular_poscat = m_en_utilities.singularize(pos)
data.pos_category = "Bentuk " .. singular_poscat
end
-- If this is a suffix, insert label 'apocopated' after 'FOO-forming suffix', otherwise insert at the beginning.
insert(data.inflections, data.is_suffix and 2 or 1, {label = glossary_link("apocopated")})
end
if args.json then
return require("Module:JSON").toJSON(data)
end
return require(headword_module).full_headword(data)
end
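-- Flow sketch: a headword template invokes export.show with frame argument 1
-- naming the POS (e.g. "Kata nama"); its per-POS params are merged in, heads
-- are auto-linked (or taken from head=), pos_functions[poscat].func fills
-- `data`, and [[Module:headword]].full_headword renders the line.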
local deriv_params = {
{"dim", glossary_link("diminutif")},
{"dim_dim", "double " .. glossary_link("diminutif")},
{"aug_dim", glossary_link("agam") .. "-" .. glossary_link("diminutif")},
{"aug", glossary_link("agam")},
{"dim_aug", glossary_link("diminutif") .. "-" .. glossary_link("agam")},
{"aug_aug", "double " .. glossary_link("agam")},
{"pej", glossary_link("pejoratif")},
{"dim_pej", glossary_link("diminutif") .. "-" .. glossary_link("pejoratif")},
{"aug_pej", glossary_link("agam") .. "-" .. glossary_link("pejoratif")},
{"pej_pej", "double " .. glossary_link("pejoratif")},
{"end", glossary_link("endearing")},
{"dim_end", glossary_link("diminutif") .. "-" .. glossary_link("endearing")},
{"aug_end", glossary_link("agam") .. "-" .. glossary_link("endearing")},
{"derog", glossary_link("hinaan")},
{"dim_derog", glossary_link("diminutif") .. "-" .. glossary_link("hinaan")},
{"aug_derog", glossary_link("agam") .. "-" .. glossary_link("hinaan")},
{"end_derog", glossary_link("endearing") .. "-" .. glossary_link("hinaan")},
}
local function insert_deriv_params(params)
for _, deriv_param in ipairs(deriv_params) do
local param = unpack(deriv_param)
params[param] = list_param
end
end
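-- Sketch (hypothetical): with these list params installed, dim=omino on a
-- noun or adjective template is collected into args.dim and later rendered by
-- insert_deriv_inflections() under the "diminutif" label.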
local param_mods = {
t = {
-- We need to store the <t:...> inline modifier into the "gloss" key of the parsed part, because that is what
-- [[Module:links]] expects.
item_dest = "gloss",
},
gloss = {},
-- no 'tr' or 'ts', doesn't make sense for Italian
g = {
-- We need to store the <g:...> inline modifier into the "genders" key of the parsed part, because that is what
-- [[Module:links]] expects.
item_dest = "genders",
sublist = true,
},
id = {},
alt = {},
q = {type = "qualifier"},
qq = {type = "qualifier"},
lit = {},
pos = {},
-- no 'sc', doesn't make sense for Italian
}
local function parse_term_with_modifiers(paramname, val)
local function generate_obj(term)
local decomp = com.decompose(term)
local lemma = com.remove_non_final_accents(decomp)
if lemma ~= decomp then
term = com.compose("[[" .. lemma .. "|" .. decomp .. "]]")
end
return {term = term}
end
local retval = require(parse_interface_module).parse_inline_modifiers(val, {
paramname = paramname,
param_mods = param_mods,
generate_obj = generate_obj,
splitchar = "[/;,]",
preserve_splitchar = true,
})
for _, obj in ipairs(retval) do
if obj.delimiter == ";" then
obj.separator = "; "
elseif obj.delimiter == "/" then
obj.separator = "/"
-- default to nil for comma
end
end
return retval
end
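-- Example sketch: parse_term_with_modifiers("dim", "casetta<q:informal>,casina")
-- yields two term objects, the first carrying the qualifier "informal"; a ";"
-- delimiter between items renders as "; " instead of the default comma.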
local function insert_deriv_inflections(data, args)
for _, deriv_param in ipairs(deriv_params) do
local param, desc = unpack(deriv_param)
if #args[param] > 0 then
local inflection = {label = desc}
for _, term in ipairs(args[param]) do
local parsed_terms = parse_term_with_modifiers(param, term)
for _, parsed_term in ipairs(parsed_terms) do
insert(inflection, parsed_term)
end
end
insert(data.inflections, inflection)
end
end
end
-----------------------------------------------------------------------------------------
-- Nouns --
-----------------------------------------------------------------------------------------
local allowed_genders = m_table.listToSet(
{"m", "f", "mf", "mfbysense", "mfequiv", "gneut", "n", "m-p", "f-p", "mf-p", "mfbysense-p", "mfequiv-p", "gneut-p", "n-p", "?", "?-p"}
)
local function validate_genders(genders)
for _, g in ipairs(genders) do
if type(g) == "table" then
g = g.spec
end
if not allowed_genders[g] then
error("Unrecognized gender: " .. g)
end
end
end
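-- For example, validate_genders({"m", "q"}) raises "Unrecognized gender: q",
-- while specs like "mfbysense-p" pass the allowed_genders check.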
local function do_noun(args, data, is_proper)
local is_plurale_tantum = false
local has_singular = false
local category_plpos = data.checkredlinks
if category_plpos == true then
category_plpos = data.pos_category
end
local category_pos = m_en_utilities.singularize(category_plpos)
validate_genders(args[1])
data.genders = args[1]
local saw_m = false
local saw_f = false
local gender_for_default_plural
-- Check for specific genders and pluralia tantum.
for _, g in ipairs(args[1]) do
if type(g) == "table" then
g = g.spec
end
if g:find("-p$") then
is_plurale_tantum = true
else
has_singular = true
if g == "m" or g == "mf" or g == "mfbysense" then
saw_m = true
end
if g == "f" or g == "mf" or g == "mfbysense" then
saw_f = true
end
end
end
if saw_m and saw_f then
gender_for_default_plural = "mf"
elseif saw_f then
gender_for_default_plural = "f"
else
gender_for_default_plural = "m"
end
local lemma = data.pagename
local function inscat(cat)
insert(data.categories, langname .. " " .. cat)
end
local function insert_noun_inflection(terms, label, accel, no_inv)
for _, term in ipairs(terms) do
if not no_inv and term.term == lemma then
term.term = nil
term.label = glossary_link("invariable")
end
end
m_headword_utilities.insert_inflection {
headdata = data,
terms = terms,
label = label,
accel = accel and {form = accel} or nil,
}
end
-- Plural
local plurals = {}
-- Fetch explicit masculine and feminine plurals here because we may change them below when processing plurals.
local mpls = m_headword_utilities.parse_term_list_with_modifiers {
paramname = "mpl",
forms = args.mpl,
splitchar = ",",
}
local fpls = m_headword_utilities.parse_term_list_with_modifiers {
paramname = "fpl",
forms = args.fpl,
splitchar = ",",
}
if is_plurale_tantum and not has_singular then
if args[2][1] then
error("Can't specify plurals of plurale tantum " .. category_pos)
end
insert(data.inflections, {label = glossary_link("hanya jamak")})
elseif args.apoc then
-- apocopated noun
if args[2][1] then
error("Can't specify plurals of apocopated " .. category_pos)
end
else
-- Fetch plurals and associated qualifiers, labels and genders.
plurals = m_headword_utilities.parse_term_list_with_modifiers {
paramname = {2, "pl"},
forms = args[2],
splitchar = ",",
include_mods = {"g"},
}
-- Check for special plural signals
local mode = nil
local pl1 = plurals[1]
if pl1 and #pl1.term == 1 then
mode = pl1.term
if mode == "?" or mode == "!" or mode == "-" or mode == "~" then
pl1.term = nil
if next(pl1) then
error(("Can't specify inline modifiers with plural code '%s'"):format(mode))
end
remove(plurals, 1) -- Remove the mode parameter
elseif mode ~= "+" and mode ~= "#" then
error(("Unexpected plural code '%s'"):format(mode))
end
end
if is_plurale_tantum then
-- both singular and plural
insert(data.inflections, {label = "kadangkala " .. glossary_link("hanya jamak") .. ", dengan kelainan"})
end
if mode == "?" then
-- Plural is unknown
insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak dikenal pasti")
elseif mode == "!" then
-- Plural is not attested
insert(data.inflections, {label = "plural not attested"})
insert(data.categories, category_plpos .. " bahasa " .. langname .. " dengan bentuk jamak yang tidak ditentusahkan")
if plurals[1] then
error("Can't specify any plurals along with unattested plural code '!'")
end
elseif mode == "-" then
-- Uncountable noun; may occasionally have a plural
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
-- If plural forms were given explicitly, then show "usually"
if plurals[1] then
insert(data.inflections, {label = "biasanya " .. glossary_link("tak berbilang")})
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
else
insert(data.inflections, {label = glossary_link("uncountable")})
end
else
-- Countable or mixed countable/uncountable
-- If no plurals, use the default plural unless mpl= or fpl= explicitly given.
if not plurals[1] and not mpls[1] and not fpls[1] and not is_proper then
plurals[1] = {term = "+"}
end
if mode == "~" then
-- Mixed countable/uncountable noun, always has a plural
insert(data.inflections, {label = glossary_link("berbilang") .. " dan " .. glossary_link("tak berbilang")})
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
elseif plurals[1] then
-- Countable nouns
insert(data.categories, category_plpos .. " berbilang bahasa " .. langname)
else
-- Uncountable nouns
insert(data.categories, category_plpos .. " tak berbilang bahasa " .. langname)
end
end
-- Process plurals, handling requests for default plurals.
local has_default_or_hash = false
for _, pl in ipairs(plurals) do
if pl.term:find("^%+") or pl.term:find("#") or pl.term == "cap*" or pl.term == "cap*+" then
has_default_or_hash = true
break
end
end
if has_default_or_hash then
local newpls = {}
local function insert_pl(pl, defpl)
pl.term = defpl
insert(newpls, pl)
end
local function make_gendered_plural(pl, special)
if gender_for_default_plural == "mf" then
local default_mpl = com.make_plural(lemma, "m", special)
local default_fpl = com.make_plural(lemma, "f", special)
if default_mpl then
if default_mpl == default_fpl then
insert_pl(pl, default_mpl)
else
if args.mpl[1] or args.fpl[1] then
error("Can't specify gendered plural spec '" .. (special or "+") ..
"' along with gender=" .. gender_for_default_plural ..
" and also specify mpl= or fpl=")
end
mpls = {m_table.shallowCopy(pl)}
mpls[1].term = default_mpl
fpls = {pl}
fpls[1].term = default_fpl
end
end
else
local defpl = com.make_plural(lemma, gender_for_default_plural, special)
if defpl then
insert_pl(pl, defpl)
end
end
end
for _, pl in ipairs(plurals) do
if pl.term == "cap*" or pl.term == "cap*+" then
make_gendered_plural(pl, pl.term)
elseif pl.term == "+" then
make_gendered_plural(pl)
elseif pl.term:find("^%+") then
local special = require(romut_module).get_special_indicator(pl.term)
make_gendered_plural(pl, special)
else
insert_pl(pl, replace_hash_with_lemma(pl.term, lemma))
end
end
plurals = newpls
end
if plurals[2] then
inscat(category_plpos .. " with multiple plurals")
end
-- If the first or only plural is the same as the singular, replace it with 'invariable', or 'usually
-- invariable' if there is more than one plural.
pl1 = plurals[1]
if pl1 and pl1.term == lemma then
if plurals[2] then
insert(data.inflections, {label = "usually " .. glossary_link("invariable"),
q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs
})
else
insert(data.inflections, {label = glossary_link("invariable"),
q = pl1.q, qq = pl1.qq, l = pl1.l, ll = pl1.ll, refs = pl1.refs
})
end
remove(plurals, 1)
inscat("indeclinable " .. category_plpos)
end
if plurals[1] then
-- Check for gender-changing plurals.
for _, pl in ipairs(plurals) do
if pl.genders then
for _, g in ipairs(pl.genders) do
if type(g) ~= "table" then
g = {spec = g}
end
if g.spec == "m" and not saw_m or g.spec == "f" and not saw_f then
inscat(category_plpos .. " that change gender in the plural")
end
end
end
end
end
end
-- Gather masculines/feminines. For each one, generate the corresponding plural. `field` is the name of the field
-- containing the masculine or feminine forms (normally "m" or "f"); `inflect` is a function of one or two arguments
-- to generate the default masculine or feminine from the lemma (the arguments are the lemma and optionally a
-- "special" flag to indicate how to handle multiword lemmas, and the function is normally make_feminine or
-- make_masculine from [[Module:it-common]]); and `default_plurals` is a list into which the corresponding default
-- plurals of the gathered or generated masculine or feminine forms are stored.
local function handle_mf(field, inflect, default_plurals)
local special
local mfs = m_headword_utilities.parse_term_list_with_modifiers {
paramname = field,
forms = args[field],
splitchar = ",",
frob = function(term)
if term == "+" then
-- Generate default masculine/feminine.
term = inflect(lemma)
else
term = replace_hash_with_lemma(term, lemma)
end
special = require(romut_module).get_special_indicator(term)
if special then
term = inflect(lemma, special)
end
return term
end
}
for _, mf in ipairs(mfs) do
local plobj = m_table.shallowCopy(mf)
plobj.term = com.make_plural(mf.term, field, special)
if plobj.term then
-- Add an accelerator for each masculine/feminine plural whose lemma is the corresponding singular, so that
-- the accelerated entry that is generated has a definition that looks like
-- # {{plural of|it|MFSING}}
plobj.accel = {form = "p", lemma = mf.term}
insert(default_plurals, plobj)
end
end
return mfs
end
local feminine_plurals = {}
local feminines = handle_mf("f", com.make_feminine, feminine_plurals)
local masculine_plurals = {}
local masculines = handle_mf("m", com.make_masculine, masculine_plurals)
local function handle_mf_plural(mfplfield, mfpls, gender, default_plurals, singulars)
if is_plurale_tantum then
return mfpls, true
end
local new_mfpls = {}
local saw_plus
local noinv
for i, mfpl in ipairs(mfpls) do
local accel
if #mfpls == #singulars then
-- If same number of overriding masculine/feminine plurals as singulars, assume each plural goes with
-- the corresponding singular and use each corresponding singular as the lemma in the accelerator. The
-- generated entry will have
-- # {{plural of|it|SINGULAR}}
-- as the definition.
accel = {form = "p", lemma = singulars[i].term}
else
accel = nil
end
if mfpl.term == "+" then
-- We should never see + twice. If we do, it will lead to problems since we overwrite the values of
-- default_plurals the first time around.
if saw_plus then
error(("Saw + twice when handling %s="):format(mfplfield))
end
saw_plus = true
if not default_plurals[1] then
local defpl = com.make_plural(lemma, gender)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
default_plurals[1] = {term = defpl}
end
for _, defpl in ipairs(default_plurals) do
-- defpl is already a table and has an accel field
m_headword_utilities.combine_termobj_qualifiers_labels(defpl, mfpl)
insert(new_mfpls, defpl)
end
-- don't use "invariable" because the plural is not with respect to the lemma but with respect to the
-- masc/fem singular
noinv = true
elseif mfpl.term == "cap*" or mfpl.term == "cap*+" or mfpl.term:find("^%+") then
if mfpl.term:find("^%+") then
mfpl.term = require(romut_module).get_special_indicator(mfpl.term)
end
if singulars[1] then
for _, mf in ipairs(singulars) do
local mfplobj = m_table.shallowCopy(mfpl)
mfplobj.term = com.make_plural(mf.term, gender, mfpl.term)
if mfplobj.term then
mfplobj.accel = accel
m_headword_utilities.combine_termobj_qualifiers_labels(mfplobj, mf)
insert(new_mfpls, mfplobj)
end
-- don't use "invariable" because the plural is not with respect to the lemma but with respect
-- to the masc/fem singular
noinv = true
-- FIXME: Should we throw an error if no plural could be generated?
end
else
-- FIXME: This clause didn't exist in the corresponding code in [[Module:pt-headword]]. Is it
-- correct?
mfpl.term = com.make_plural(lemma, gender, mfpl.term)
if mfpl.term then
insert(new_mfpls, mfpl)
end
end
else
mfpl.accel = accel
mfpl.term = replace_hash_with_lemma(mfpl.term, lemma)
insert(new_mfpls, mfpl)
-- don't use "invariable" if masc/fem singular present because the plural is not with respect to
-- the lemma but with respect to the masc/fem singular
noinv = noinv or #singulars > 0
end
end
return new_mfpls, noinv
end
local mpl_noinv, fpl_noinv
-- Check args.fpl[1] rather than fpls[1]: if the user didn't specify any explicit mpl= or fpl= but the lemma
-- gender is mf or mfbysense with separate masculine and feminine plural forms (e.g. any term in -ista), the
-- mpls/fpls lists were auto-generated above, and we don't want to reprocess those auto-generated forms.
if args.fpl[1] then
-- Override any existing feminine plurals.
feminine_plurals, fpl_noinv = handle_mf_plural("fpl", fpls, "f", feminine_plurals, feminines)
else
feminine_plurals, fpl_noinv = fpls, false
end
if args.mpl[1] then
-- Override any existing masculine plurals.
masculine_plurals, mpl_noinv = handle_mf_plural("mpl", mpls, "m", masculine_plurals, masculines)
else
masculine_plurals, mpl_noinv = mpls, false
end
local function redundant_plural(pl)
for _, p in ipairs(plurals) do
if p.term == pl.term then
return true
end
end
return false
end
for _, mpl in ipairs(masculine_plurals) do
if redundant_plural(mpl) then
track("noun-redundant-mpl")
end
end
for _, fpl in ipairs(feminine_plurals) do
if redundant_plural(fpl) then
track("noun-redundant-fpl")
end
end
if plurals[1] then
-- Set 'noinv' because we already took care of invariable plurals above.
insert_noun_inflection(plurals, "plural", "p", "noinv")
end
insert_noun_inflection(masculines, "masculine")
insert_noun_inflection(masculine_plurals, "masculine plural", nil, mpl_noinv)
insert_noun_inflection(feminines, "feminine", "f")
insert_noun_inflection(feminine_plurals, "feminine plural", nil, fpl_noinv)
local function parse_and_insert_noun_inflection(field, label, accel)
parse_and_insert_inflection(data, args, field, label, accel)
end
parse_and_insert_noun_inflection("adj", glossary_link("relational", "relational adjective"))
parse_and_insert_noun_inflection("adv", glossary_link("adverb"))
parse_and_insert_noun_inflection("dem", glossary_link("demonym"))
parse_and_insert_noun_inflection("fdem", "female " .. glossary_link("demonym"))
insert_deriv_inflections(data, args)
-- Maybe add category 'Italian nouns with irregular gender' (or similar)
local irreg_gender_lemma = lemma:gsub(" .*", "") -- only look at first word
if (irreg_gender_lemma:find("o$") and (gender_for_default_plural == "f" or gender_for_default_plural == "mf"
or gender_for_default_plural == "mfbysense")) or
(irreg_gender_lemma:find("a$") and (gender_for_default_plural == "m" or gender_for_default_plural == "mf"
or gender_for_default_plural == "mfbysense")) then
inscat(category_plpos .. " dengan genus tak tentu")
end
end
local function get_noun_params(nountype)
local params = {
[1] = {list = "g", disallow_holes = true, required = nountype ~= "proper", default = "?", type = "genders",
flatten = true},
[2] = {list = "pl", disallow_holes = true},
["m"] = list_param,
["f"] = list_param,
["mpl"] = list_param,
["fpl"] = list_param,
["adj"] = list_param, --adjective(s)
["adv"] = list_param, --adverb(s)
["dem"] = list_param, --demonym(s)
["fdem"] = list_param, --female demonym(s)
}
insert_deriv_params(params)
return params
end
pos_functions["Kata nama"] = {
params = get_noun_params("base"),
func = do_noun,
}
pos_functions["Kata nama khas"] = {
params = get_noun_params("proper"),
func = function(args, data)
do_noun(args, data, "is proper noun")
end,
}
pos_functions["Kata nama kardinal"] = {
params = get_noun_params("base"),
func = function(args, data)
do_noun(args, data)
insert(data.categories, 1, "Nombor kardinal " .. langname)
end,
pos_category = "Kata bilangan",
}
-----------------------------------------------------------------------------------------
-- Adjectives --
-----------------------------------------------------------------------------------------
local function do_adjective(args, data, is_superlative)
local feminines = {}
local masculine_plurals = {}
local feminine_plurals = {}
-- Use "participle" not "past participle" for categories such as 'invariable participles'
local category_plpos = data.checkredlinks
if category_plpos == true then
category_plpos = data.pos_category
end
local category_pos = m_en_utilities.singularize(category_plpos)
if args.sp then
local romut = require(romut_module)
if not romut.allowed_special_indicators[args.sp] then
local indicators = {}
for indic, _ in pairs(romut.allowed_special_indicators) do
insert(indicators, "'" .. indic .. "'")
end
table.sort(indicators)
error("Special inflection indicator beginning can only be " ..
mw.text.listToText(indicators) .. ": " .. args.sp)
end
end
local lemma = data.pagename
local function fetch_inflections(field)
local retval = m_headword_utilities.parse_term_list_with_modifiers {
paramname = field,
forms = args[field],
splitchar = ",",
}
if not retval[1] then
return {{term = "+"}}
end
return retval
end
local function insert_inflection(terms, label, accel)
m_headword_utilities.insert_inflection {
headdata = data,
terms = terms,
label = label,
accel = accel and {form = accel} or nil,
}
end
if args.inv then
-- invariable adjective
insert(data.inflections, {label = glossary_link("invariable")})
insert(data.categories, langname .. " indeclinable " .. category_plpos)
end
if args.noforms then
-- [[bello]] and any others too complicated to describe in headword
insert(data.inflections, {label = "see below for inflection"})
end
if args.inv or args.apoc or args.noforms then
if args.sp or args.f[1] or args.pl[1] or args.mpl[1] or args.fpl[1] then
error("Can't specify inflections with an invariable or apocopated adjective or with noforms=")
end
elseif args.fonly then
-- feminine-only
if args.f[1] then
error("Can't specify explicit feminines with feminine-only " .. category_pos)
end
if args.pl[1] then
error("Can't specify explicit plurals with feminine-only " .. category_pos .. ", use fpl=")
end
if args.mpl[1] then
error("Can't specify explicit masculine plurals with feminine-only " .. category_pos)
end
local argsfpl = fetch_inflections("fpl")
for _, fpl in ipairs(argsfpl) do
if fpl.term == "+" then
local defpl = com.make_plural(lemma, "f", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
fpl.term = defpl
else
fpl.term = replace_hash_with_lemma(fpl.term, lemma)
end
insert(feminine_plurals, fpl)
end
insert(data.inflections, {label = "feminine-only"})
insert_inflection(feminine_plurals, "feminine plural", "f|p")
else
-- Gather feminines.
for _, f in ipairs(fetch_inflections("f")) do
if f.term == "+" then
-- Generate default feminine.
f.term = com.make_feminine(lemma, args.sp)
else
f.term = replace_hash_with_lemma(f.term, lemma)
end
insert(feminines, f)
end
local fem_like_lemma = #feminines == 1 and feminines[1].term == lemma and
not m_headword_utilities.termobj_has_qualifiers_or_labels(feminines[1])
if fem_like_lemma then
insert(data.categories, langname .. " epicene " .. category_plpos)
end
local mpl_field = "mpl"
local fpl_field = "fpl"
if args.pl[1] then
if args.mpl[1] or args.fpl[1] then
error("Can't specify both pl= and mpl=/fpl=")
end
mpl_field = "pl"
fpl_field = "pl"
end
local argsmpl = fetch_inflections(mpl_field)
local argsfpl = fetch_inflections(fpl_field)
for _, mpl in ipairs(argsmpl) do
if mpl.term == "+" then
-- Generate default masculine plural.
local defpl = com.make_plural(lemma, "m", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. lemma .. "'")
end
mpl.term = defpl
else
mpl.term = replace_hash_with_lemma(mpl.term, lemma)
end
insert(masculine_plurals, mpl)
end
for _, fpl in ipairs(argsfpl) do
if fpl.term == "+" then
for _, f in ipairs(feminines) do
-- Generate default feminine plural; f is a table.
local fplobj = m_table.shallowCopy(fpl)
local defpl = com.make_plural(f.term, "f", args.sp)
if not defpl then
error("Unable to generate default plural of '" .. f.term .. "'")
end
fplobj.term = defpl
m_headword_utilities.combine_termobj_qualifiers_labels(fplobj, f)
insert(feminine_plurals, fplobj)
end
else
fpl.term = replace_hash_with_lemma(fpl.term, lemma)
insert(feminine_plurals, fpl)
end
end
local fem_pl_like_masc_pl = masculine_plurals[1] and feminine_plurals[1] and
m_table.deepEquals(masculine_plurals, feminine_plurals)
local masc_pl_like_lemma = #masculine_plurals == 1 and masculine_plurals[1].term == lemma and
not m_headword_utilities.termobj_has_qualifiers_or_labels(masculine_plurals[1])
if fem_like_lemma and fem_pl_like_masc_pl and masc_pl_like_lemma then
-- actually invariable
insert(data.inflections, {label = glossary_link("invariable")})
insert(data.categories, langname .. " indeclinable " .. category_plpos)
else
-- Make sure there are feminines given and not same as lemma.
if not fem_like_lemma then
insert_inflection(feminines, "feminine", "f|s")
elseif args.gneut then
data.genders = {"gneut"}
else
data.genders = {"mfbysense"}
end
if fem_pl_like_masc_pl then
if args.gneut then
insert_inflection(masculine_plurals, "plural", "p")
else
-- This is how the Spanish module works.
-- insert_inflection(masculine_plurals, "masculine and feminine plural", "p")
insert_inflection(masculine_plurals, "plural", "p")
end
else
insert_inflection(masculine_plurals, "masculine plural", "m|p")
insert_inflection(feminine_plurals, "feminine plural", "f|p")
end
end
end
local function parse_and_insert_adj_inflection(field, label, accel, frob)
parse_and_insert_inflection(data, args, field, label, accel, frob)
end
parse_and_insert_adj_inflection("n", "neuter")
parse_and_insert_adj_inflection("comp", glossary_link("comparative"))
parse_and_insert_adj_inflection("sup", glossary_link("superlative"))
parse_and_insert_adj_inflection("adv", glossary_link("adverb"))
insert_deriv_inflections(data, args)
if args.irreg and is_superlative then
insert(data.categories, langname .. " irregular superlative " .. category_plpos)
end
end
local function get_adjective_params(adjtype)
local params = {
["inv"] = boolean_param, --invariable
["noforms"] = boolean_param, --too complicated to list forms except in a table
["sp"] = true, -- special indicator: "first", "first-last", etc.
["f"] = list_param, --feminine form(s)
["pl"] = list_param, --plural override(s)
["fpl"] = list_param, --feminine plural override(s)
["mpl"] = list_param, --masculine plural override(s)
["adv"] = list_param, --adverb(s)
}
if adjtype == "base" or adjtype == "part" or adjtype == "det" then
params["comp"] = list_param --comparative(s)
params["sup"] = list_param --superlative(s)
params["fonly"] = boolean_param -- feminine only
end
if adjtype == "sup" then
params["irreg"] = boolean_param
end
insert_deriv_params(params)
return params
end
pos_functions["Kata sifat"] = {
params = get_adjective_params("base"),
func = do_adjective,
}
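-- Usage sketch (assumed wiring; the calling headword template is not shown in this excerpt):
-- a headword line passing pl=+ reaches do_adjective with args.pl = {"+"}, and the "+" is
-- expanded to the default plural of the page name via com.make_plural above.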
pos_functions["Kata sifat bandingan"] = {
params = get_adjective_params("comp"),
func = do_adjective,
pos_category = "Kata sifat",
}
pos_functions["Kata sifat superlatif"] = {
params = get_adjective_params("sup"),
func = function(args, data)
do_adjective(args, data, "is superlative")
end,
pos_category = "Kata sifat",
}
pos_functions["Kata sifat kardinal"] = {
params = get_adjective_params("card"),
func = function(args, data)
do_adjective(args, data)
insert(data.categories, 1, "Nombor kardinal bahasa " .. langname)
end,
pos_category = "numerals",
}
pos_functions["past participles"] = {
params = get_adjective_params("part"),
func = do_adjective,
redlink_pos = "participles",
}
pos_functions["present participles"] = {
params = get_adjective_params("part"),
func = do_adjective,
redlink_pos = "participles",
}
pos_functions["determiners"] = {
params = get_adjective_params("det"),
func = do_adjective,
}
pos_functions["articles"] = {
params = get_adjective_params("det"),
func = do_adjective,
}
pos_functions["adjective-like pronouns"] = {
params = get_adjective_params("pron"),
func = do_adjective,
pos_category = "pronouns",
}
pos_functions["cardinal invariable"] = {
params = {},
func = function(args, data)
insert(data.categories, langname .. " cardinal numbers")
insert(data.categories, langname .. " indeclinable numerals")
insert(data.inflections, {label = glossary_link("invariable")})
end,
pos_category = "numerals",
}
-----------------------------------------------------------------------------------------
-- Adverbs --
-----------------------------------------------------------------------------------------
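-- Adverb headwords can list comparative(s), superlative(s) and the adjective(s) they derive from.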
local function do_adverb(args, data)
local function parse_and_insert_adv_inflection(field, label, accel, frob)
parse_and_insert_inflection(data, args, field, label, accel, frob)
end
parse_and_insert_adv_inflection("comp", glossary_link("comparative"))
parse_and_insert_adv_inflection("sup", glossary_link("superlative"))
parse_and_insert_adv_inflection("adj", glossary_link("adjective"))
end
local function get_adverb_params(advtype)
local params = {
["adj"] = list_param, --adjective(s)
}
if advtype == "base" then
params["comp"] = list_param --comparative(s)
params["sup"] = list_param --superlative(s)
end
return params
end
pos_functions["Adverba"] = {
params = get_adverb_params("base"),
func = do_adverb,
}
pos_functions["comparative adverbs"] = {
params = get_adverb_params("comp"),
func = do_adverb,
pos_category = "Adverba",
}
pos_functions["superlative adverbs"] = {
params = get_adverb_params("sup"),
func = do_adverb,
pos_category = "Adverba",
}
-----------------------------------------------------------------------------------------
-- Verbs --
-----------------------------------------------------------------------------------------
pos_functions["Kata kerja"] = {
params = {
[1] = {},
["noautolinktext"] = boolean_param,
["noautolinkverb"] = boolean_param,
},
func = function(args, data)
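-- 1= is the conjugation spec handed straight to [[Module:it-verb]]; when absent, the headword is shown without generated inflections.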
if args[1] then
local alternant_multiword_spec = require(it_verb_module).do_generate_forms(args, "from headword", data.heads[1])
local function do_verb_form(slot, label, rowslot, rowlabel)
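-- slot/label name one principal part (e.g. the first-person singular present), while rowslot/rowlabel name its whole row, used for the row-level "no ..."/"unknown ..." labels.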
local forms = alternant_multiword_spec.forms[slot]
local retval
if alternant_multiword_spec.rowprops.all_defective[rowslot] then
if not alternant_multiword_spec.rowprops.defective[rowslot] then
-- No forms, but none expected; don't display anything
return
end
retval = {label = "no " .. rowlabel}
elseif not forms then
retval = {label = "no " .. label}
elseif alternant_multiword_spec.rowprops.all_unknown[rowslot] then
retval = {label = "unknown " .. rowlabel}
elseif forms[1].form == "?" then
retval = {label = "unknown " .. label}
else
-- Disable accelerators for now because we don't want the added accents going into the headwords.
-- FIXME: We now have support in [[Module:accel]] to specify the target explicitly; we can use this
-- so we can add the accelerators back with a param to avoid the accents.
local accel_form = nil -- all_verb_slots[slot]
retval = {label = label, accel = accel_form and {form = accel_form} or nil}
local prev_footnotes = nil
-- If the footnotes for this form are the same as the footnotes for the preceding form or
-- contain the preceding footnotes, replace the footnotes that are the same with "ditto".
-- This avoids repetition on pages like [[succedere]] where the form ''succedétti'' has a long
-- footnote which gets repeated in the traditional form ''succedètti'' (which also has the
-- footnote "[traditional]").
for _, form in ipairs(forms) do
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(form.footnotes)
local quals_with_ditto = quals
if quals and prev_footnotes then
local quals_contains_previous = true
for _, qual in ipairs(prev_footnotes) do
if not m_table.contains(quals, qual) then
quals_contains_previous = false
break
end
end
if quals_contains_previous then
local inserted_ditto = false
quals_with_ditto = {}
for _, qual in ipairs(quals) do
if m_table.contains(prev_footnotes, qual) then
if not inserted_ditto then
insert(quals_with_ditto, "ditto")
inserted_ditto = true
end
else
insert(quals_with_ditto, qual)
end
end
end
end
prev_footnotes = quals
insert(retval, {term = form.form, q = quals_with_ditto, refs = refs})
end
end
insert(data.inflections, retval)
end
if alternant_multiword_spec.props.is_pronominal then
insert(data.inflections, {label = glossary_link("pronominal")})
end
if alternant_multiword_spec.props.impers then
insert(data.inflections, {label = glossary_link("impersonal")})
end
if alternant_multiword_spec.props.thirdonly then
insert(data.inflections, {label = "third-person only"})
end
local thirdonly = alternant_multiword_spec.props.impers or alternant_multiword_spec.props.thirdonly
local sing_label = thirdonly and "third-person singular" or "first-person singular"
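-- Each rowspec is {row slot, display label, always_show}; rows without always_show are only displayed when the principal part is irregular, unexpectedly missing or unknown.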
for _, rowspec in ipairs {
{"pres", "present", true},
{"phis", "past historic", true},
{"pp", "past participle", true},
{"imperf", "imperfect"},
{"fut", "future"},
{"sub", "subjunctive"},
{"impsub", "imperfect subjunctive"},
} do
local rowslot, desc, always_show = unpack(rowspec)
local slot = rowslot .. (thirdonly and "3s" or "1s")
local must_show = alternant_multiword_spec.is_irreg[slot]
if always_show then
must_show = true
elseif rowslot == "imperf" and alternant_multiword_spec.props.has_explicit_stem_spec then
-- If there is an explicit stem spec, make sure it gets displayed; the imperfect is a good way of
-- showing this.
must_show = true
elseif not alternant_multiword_spec.forms[slot] then
-- If the principal part is unexpectedly missing, make sure we show this.
must_show = true
elseif alternant_multiword_spec.forms[slot][1].form == "?" then
-- If the principal part is unknown, make sure we show this.
must_show = true
end
if must_show then
if rowslot == "pp" then
do_verb_form(rowslot, desc, rowslot, desc)
else
do_verb_form(slot, sing_label .. " " .. desc, rowslot, desc)
end
end
end
-- Also do the imperative, but not for third-only verbs, which are always missing the imperative.
if not thirdonly and (alternant_multiword_spec.is_irreg.imp2s
or not alternant_multiword_spec.forms.imp2s) then
do_verb_form("imp2s", "second-person singular imperative", "imp", "imperative")
end
-- If there is a past participle but no auxiliary (e.g. [[malfare]]), explicitly add "no auxiliary". In
-- cases where there's no past participle and no auxiliary (e.g. [[irrompere]]), we don't do this as we
-- already get "no past participle" displayed. Don't display an auxiliary in any case if the lemma
-- consists entirely of reflexive verbs (for which the auxiliary is always [[essere]]).
if alternant_multiword_spec.props.is_non_reflexive and (
alternant_multiword_spec.forms.aux or alternant_multiword_spec.forms.pp
) then
do_verb_form("aux", "auxiliary", "aux", "auxiliary")
end
-- Add categories.
for _, cat in ipairs(alternant_multiword_spec.categories) do
insert(data.categories, cat)
end
-- If the user didn't explicitly specify head=, or specified exactly one head (not 2+) and we were able to
-- incorporate any links in that head into the 1= specification, use the infinitive generated by
-- [[Module:it-verb]] in place of the user-specified or auto-generated head so that we get accents marked
-- on the verb(s). Don't do this if the user gave multiple heads or gave a head with a multiword-linked
-- verbal expression such as '[[dare esca]] [[al]] [[fuoco]]'.
if #data.user_specified_heads == 0 or (
#data.user_specified_heads == 1 and alternant_multiword_spec.incorporated_headword_head_into_lemma
) then
data.heads = {}
for _, lemma_obj in ipairs(alternant_multiword_spec.forms.inf) do
local quals, refs = require(inflection_utilities_module).
convert_footnotes_to_qualifiers_and_references(lemma_obj.footnotes)
insert(data.heads, {term = lemma_obj.form, q = quals, refs = refs})
end
end
end
end
}
-----------------------------------------------------------------------------------------
-- Suffix forms --
-----------------------------------------------------------------------------------------
pos_functions["Bentuk akhiran"] = {
params = {
[1] = {required = true, list = true, disallow_holes = true},
["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true},
},
func = function(args, data)
validate_genders(args.g)
data.genders = args.g
local suffix_type = {}
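-- Each type in 1= becomes "<type>-forming suffix"; e.g. the (hypothetical) input |diminutive|augmentative yields the label "non-lemma form of diminutive-forming suffix or augmentative-forming suffix".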
for _, typ in ipairs(args[1]) do
insert(suffix_type, typ .. "-forming suffix")
end
insert(data.inflections, {label = "non-lemma form of " .. m_table.serialCommaJoin(suffix_type, {conj = "or"})})
end,
}
-----------------------------------------------------------------------------------------
-- Arbitrary parts of speech --
-----------------------------------------------------------------------------------------
pos_functions["arbitrary part of speech"] = {
params = {
[1] = {required = true},
["g"] = {list = true, disallow_holes = true, type = "genders", flatten = true},
},
func = function(args, data)
if data.is_suffix then
error("Can't use [[Template:it-pos]] with suffixes")
end
validate_genders(args.g)
data.genders = args.g
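-- 1= is the singular POS name; pluralize it for the category (e.g. "noun" -> "nouns").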
local plpos = m_en_utilities.pluralize(args[1])
data.pos_category = plpos
end,
}
return export
fmkyqlfvyjsu95owimlbwfw6333j55j
suminding
0
25411
281458
213202
2026-04-23T00:40:20Z
GodModeBoros
10321
281458
wikitext
text/x-wiki
==Bahasa Kadazandusun==
===Takrifan===
====Kata kerja====
{{inti|dtp|kata kerja}}
# [[menyanyi]]
#: {{syn|dtp|lumoyou|monondig}}
#: {{ux|dtp|Orohian no moti i Kuding di do '''suminding'''.
|Kuding sangat suka '''menyanyi'''.}}
===Sebutan===
* {{IPA|dtp|/sʊ.min.diŋ/}}
* {{penyempangan|dtp|su|min|ding}}
4zzjun2vvgrlq1iebjqwzwk8bolqkws
anid
0
39910
281450
153360
2026-04-22T12:57:42Z
Hakimi97
2668
/* Kata nama */
281450
wikitext
text/x-wiki
==Bahasa Lun dayeh==
===Takrifan===
====Kata nama====
{{inti|lnd|kata nama}}
# setiap
7fc7rinw672a8n6uzwy3hi3qk6h0vkb
281451
281450
2026-04-22T12:57:54Z
Hakimi97
2668
281451
wikitext
text/x-wiki
==Bahasa Lun Dayeh==
===Takrifan===
====Kata nama====
{{inti|lnd|kata nama}}
# setiap
npc1mf02zj3znal8zur7kb78dm1sxlo
mamaso
0
97162
281457
260609
2026-04-23T00:38:58Z
GodModeBoros
10321
281457
wikitext
text/x-wiki
==Bahasa Kadazandusun==
===Takrifan===
====Kata nama====
{{inti|dtp|kata nama}}
# [[semasa]]
#: {{cp|dtp|Mamain tadi ku o tolipun ku '''mamaso''' nokodop oku di konihab.|Adik saya bermain telefon saya '''semasa''' saya tertidur kelmarin.}}
515ov3yrv1ylzpgaxc292qwbr0qcqf2
Kategori:Perkataan bahasa Korea dengan kunci isih tidak lewah dan tidak automatik
14
114937
281452
2026-04-22T15:26:55Z
Hakimi97
2668
Mencipta laman baru dengan kandungan '{{auto cat}}'
281452
wikitext
text/x-wiki
{{auto cat}}
eomzlm5v4j7ond1phrju7cnue91g5qx