User talk:Benwing2

Definition from Wiktionary, the free dictionary
Jump to: navigation, search




Hello, welcome to Wiktionary, and thank you for your contributions so far. Here are a few good links for newcomers:

  • How to edit a page is a concise list of technical guidelines to the wiki format we use here: how to, for example, make text boldfaced or create hyperlinks. Feel free to practice in the sandbox. If you would like a slower introduction we have a short tutorial.
  • Entry layout explained (ELE) is a detailed policy documenting how Wiktionary pages should be formatted. All entries should conform to this standard, the easiest way to do this is to copy exactly an existing page for a similar word.
  • Our Criteria for inclusion (CFI) define exactly which words Wiktionary is interested in including. There is also a list of things that Wiktionary is not for a higher level overview.
  • If you already have some experience with editing our sister project Wikipedia, then you may find our guide to Wikipedia users useful.
  • The FAQ aims to answer most of your remaining questions, and there are several help pages that you can browse for more information.
  • We have discussion rooms in which you can ask any question about Wiktionary or its entries, a glossary of our technical jargon, and some hints for dealing with the more common communication issues.

Also, please add a BabelBox to your userpage so we can help you with the languages you'll be working in.

I hope you enjoy editing here and being a Wiktionarian! If you have any questions, bring them to the Wiktionary:Information desk, or ask me on my talk page. If you do so, please sign your posts with four tildes: ~~~~ which automatically produces your username and the current date and time.

Again, welcome!

RuakhTALK 14:07, 22 August 2012 (UTC)

Vowel length[edit]

Please do not edit policy or guideline pages to reflect your personal opinion on this matter without discussing with other editors with experience in Ancient Greek entries first. —Μετάknowledgediscuss/deeds 02:15, 15 November 2013 (UTC)

You have new messages Hello, Benwing2. You have new messages at Metaknowledge's talk page.
You can remove this notice at any time by removing the {{talkback}} template.

Μετάknowledgediscuss/deeds 02:29, 15 November 2013 (UTC)

I was the one who wrote many of the original orthography and transliteration standards, though they have undergone some changes in the intervening time. In any case, I'm happy to address some of your issues. However, I think we need to set some things straight first. To begin with, we have had a great many self-proclaimed experts come and go on this project. You must understand that the anonymous context of the internet forces us to treat claims of authority with a grain of salt. Additionally, assertions of what absolutely needs to happen right now simply won't do. Things are done here based on consensus. If you would like things to change, that's completely reasonable. However, you must present your evidence, and win allies with discussion. Personally, I think that vowel length and accent are real components of Ancient Greek phonology, and are something that merits note in our entries; however I think it's important to understand what the purpose of transliterations are here on Wiktionary. Transliterations are never used here as a substitute for the original script, as they are in many other contexts. They are a pedagogic tool, used to help those who don't understand the original script, which they accompany. So, they are an approximation for the uninformed. A highly precise technical transliteration is unnecessary, and serves only to confuse those whom it is meant to help. -Atelaes λάλει ἐμοί 03:04, 15 November 2013 (UTC)
Sorry to barge in like this but this issue with vowel length is one of many issues with Wiktionary which (esp. compared to the English Wikipedia) make it look rather amateurish. (Lack of references is another one.)
I understand your concern about self-proclaimed experts. But go look at my contributions on the English Wikipedia and you will see that I do actually know a bit about the subjects at hand. Ask User:CodeCat, User:Angr and others who contribute to Wikipedia linguistics/language articles about me, if you want.
I'm also guessing that you are not an expert in linguistics, but may have some Classicist knowledge of Ancient Greek. The Classicist viewpoint comes through in various things you say (denigration of transcriptions as an "approximation for the uninformed", insistence on use and importance of the original script, apparent unconcern with not noting vowel length explicitly in all cases). However, Wiktionary is a linguistic work; this goes especially for etymologies. Hence we need to be following linguistic standards, not Classicist standards.
On top of this, your statements about transcriptions are wrong on a number of counts:
  1. In technical linguistics articles esp. on historical linguistics and etymology, it is not reasonable to expect that readers can handle every script out there. Transcription (not "transliteration", which refers to letter-for-letter representation in Latin script, although for Greek the difference isn't too great) is the norm and is the only reasonable way e.g. for even a knowledgeable reader to handle the different languages. Hence, something like the etymology of Old Irish ibid "he drinks" that makes references to Latin bibō and pōtō, Greek pī́nō, Armenian ǝmpǝm, Sanskrit pibati, Old Church Slavonic piti will make everyone go crazy if they are written in four different scripts (Greek, Armenian, Devanagari, Cyrillic) with the expectation that the readers "should" know all these scripts and are "uninformed" (your words) if they don't.
  2. Furthermore, the problem here is that the original Greek script wasn't properly reflecting long vowels, either. This is evidently due to your assertion, made into policy, that vowel length doesn't need to be noted in the Greek script or transcription — a typically Classicist viewpoint, quite reasonable in the context of intrepreting a work of Ancient Greek literature but not appropriate to a linguistic work.
Where should this discussion take place? I'm not asserting, and never asserted, that this change must happen "right now", but it does indeed need to happen at some point, hopefully soon. I am almost positive that all the other linguists working here (I've seen CodeCat and Angr here, there must be others) will agree with me, so I imagine consensus is not too hard to reach on this.
For reference, compare what's done in Latin, Old English, Old High German, etc. where long vowels are always indicated in all uses of every word including in head words, even though the original texts didn't have length marks any more than the original Greek texts did. Greek should follow what every other language does.

Benwing (talk) 09:07, 15 November 2013 (UTC)

If you would like to gain official consensus the Beer parlour is the appropriate place. There are indeed a number of other editors who seem to prefer the more involved transcriptions. I have held them off thus far, but it's quite possible that a determined and eloquent proponent could cause a shift in policy. Until such time, though, I would ask that you refrain from editing existing entries to conform to your view, as I will continue to undo such edits. If you wish to create new content, you are more at liberty to do so as you wish. -Atelaes λάλει ἐμοί 16:43, 15 November 2013 (UTC)

Moving pages[edit]

We generally avoid redirects on Wiktionary, so when you move a page to correct the spelling, could you place {{delete}} on the redirect that's left behind? —CodeCat 10:47, 2 July 2014 (UTC)

Will do. Does it matter where I put it in the redirect page? Presumably after the redirect itself, on the next line?


Rollback link is very close to patrol link so I misclick them sometimes. I use a browser extension which enables me to select a screen region with a mouse and "click" all of the selected links at once upon release. Edits here are high volume and people often make mistakes... Cheers --Ivan Štambuk (talk) 12:16, 2 July 2014 (UTC)

Old French[edit]

I'd be interested to know what your background is. Renard Migrant (talk) 17:58, 12 July 2014 (UTC)

With {{fro-conj-er}}, could you fix it if possible to not need any parameters? Like {{fro-conj-er|dress}}, using Lua can't it deduce that the stem is dress by taking off the final -er? Renard Migrant (talk) 10:30, 24 July 2014 (UTC)
You're right, this is possible. I'll look into it. Benwing (talk) 11:21, 24 July 2014 (UTC)
Please don't delete words that definitely exist. I have no idea why you would do that, I can only assume you haven't read WT:CFI#Attestation. Renard Migrant (talk) 12:19, 24 July 2014 (UTC)
I have undone the deletions. They appear to be Anglo-Norman words, not standard OF words. Standard OF has -gier, not -ger. Benwing (talk) 12:21, 24 July 2014 (UTC)
As I'm sure you know, there's no such thing as standard Old French. Standardization didn't exist yet, by a few hundred years at that. Renard Migrant (talk) 12:22, 24 July 2014 (UTC)
But use of -gier is pretty consistent in Francien works. And there is such a thing as standard spellings in the handbooks. e.g. amer is standard, aimer is not. I actually question whether aimer is a spurious form based on the later language. Yes, you might find occasional places where 'aim-' and 'am-' intrude on each other, but that doesn't (IMO) justify having an entry for aimer. In general, I've been trying to correct a mess of mistakes, e.g. non-standard forms like herberger having entries while standard herbergier doesn't, or std forms being claimed as alternatives to non-standard forms, etc. The current situation has the appearance that someone didn't really know OF very well when creating the entries. Certainly the conjugations were completely and utterly wrong; whoever did them just copied Modern French declensions and hoped they were the same (oops ...). Since you complain, I will not delete these forms but I'll continue to redirect non-standard to standard forms, to try and reduce the chaos of these forms. Benwing (talk) 12:30, 24 July 2014 (UTC)
These are scholarly standard forms. Basically the forms preferred by scholars. We include all words whether included by scholars or not. Less common forms are not mistakes! Do you propose to delete honor because we already have an entry for honour? If your 'corrections' involve deleting truthful information then please stop. If you're just not well enough informed on the subject, also stop. Renard Migrant (talk) 12:34, 24 July 2014 (UTC)
If you want to nominate these forms for deletion, your rationale will have to be "these definitely exist but I don't like them". See what response you get from other people. Renard Migrant (talk) 12:35, 24 July 2014 (UTC)
Look, I already told you I won't be deleting any entries. And my corrections aren't deleting truthful info. You're welcome to look over my changes and critique them if you really want. BTW I'm going to bed now so if you don't hear any more responses from me for awhile it's not because I'm ignoring you or anything but just because I need sleep. Benwing (talk) 12:51, 24 July 2014 (UTC)
Sorry you're right. Renard Migrant (talk) 13:24, 24 July 2014 (UTC)
We class Anglo-Norman as a dialect of Old French. See Template talk:xno. — Ungoliant (falai) 19:08, 24 July 2014 (UTC)

Category:Pages with module errors[edit]

Seems to be Module:fro-verb's fault. Keφr 10:21, 28 July 2014 (UTC)

Thanks, I fixed it. Benwing (talk) 19:53, 28 July 2014 (UTC)

Template:Old French preterite type boiler, Template:Old French verb ending boiler[edit]

These are named contrary to our template naming customs. Also, User:CodeCat has been developing a category boilerplate infrastructure recently, into which it might be desirable to integrate these two. Keφr 08:05, 12 August 2014 (UTC)

How are they supposed to be named? I couldn't figure that out from the link you posted. I named them based on Template:Spanish conjugation boiler. Is that also misnamed? Benwing (talk) 08:39, 12 August 2014 (UTC)
I guess {{fro-preterite catboiler}} or something similar. And I never saw these templates, but I suppose yes; though these naming conventions are not actually strict policies. They are just a codification of some coding practices, some of which are relatively recent. Keφr 09:43, 12 August 2014 (UTC)
How about {{fro-preterite type catboiler}} and {{fro-verb ending catboiler}}? Benwing (talk) 05:48, 13 August 2014 (UTC)
Fine by me. Keφr 09:42, 13 August 2014 (UTC)
OK, they've been changed. Benwing (talk) 10:55, 13 August 2014 (UTC)


I created حَقَّقَ(ḥaqqaqa) because of your edits on حق. By the way, how much do you know about the Arabic language? --Lo Ximiendo (talk) 00:00, 15 August 2014 (UTC)

I studied Arabic for a couple of years and I know the verb conjugations reasonably well. I wonder why they aren't automated? Seems like a perfect opportunity since the conjugations are so systematic. I've written code in other circumstances to generate Arabic verb conjugations and it isn't all that hard. Benwing (talk) 03:17, 15 August 2014 (UTC)
Hi and welcome. You're more than welcome to take over the work on Module:ar-verb (there are many existing working templates too, which cover various conjugations). Even if the module doesn't provide transliterations, it would be great to have it. Please don't underestimate the amount of work required for this module to cover all types of conjugations. Pls add a Babel to your user page, so that people know which languages you speak. --Anatoli T. (обсудить/вклад) 23:18, 24 August 2014 (UTC)
Hi Anatoli. You might have noticed I've done a bunch of changes to Module:ar-verb, generalizing the code (e.g. you can specify an arbitrary number of verbal nouns), finishing form I geminate (including the alternative jussive forms), and adding form II and III strong. It should be easier to expand from now on and it does provide transliterations, using Module:ar-translit. You're right that it's a lot of work to get all the conjugations. Potentially especially problematic are the hamzated ones. I think the best thing here is to write a module that substitutes the correct hamza seat based on the surrounding vowels. This is definitely possible, and there are detailed rules (which I wrote) on the Wikipedia page on "hamza". I'll look into adding Babel stuff; not quite sure how to do it but I'll look at some existing user pages. I already have this info on my Wikipedia user page. Benwing (talk) 01:14, 25 August 2014 (UTC)
Oh, I didn't notice that you edited the module page. At some stage I just lost motivation. I've got Arabic grammar books though, so I can help with testing the module for specific conjugation types and might add some types, once all the infrastructure is there and we have some working examples. I won't be able to fix any issues with the wrong display for diacritics. I hope User:ZxxZxxZ can also help. Good luck! --Anatoli T. (обсудить/вклад) 01:23, 25 August 2014 (UTC)
Thanks. I'm not sure what the issue is with the diacritics; I notice a comment about shadda + fatha getting displayed wrong, but I don't see this, regardless of whether I put the diacritics in shadda-fatha order or in the fatha-shadda order that you stuck in using dia.sh_a. Possibly this bug has been fixed in the software? Benwing (talk) 01:29, 25 August 2014 (UTC)
BTW there's also a detailed Wikipedia page on w:Arabic verbs which I wrote awhile ago; it lists all the conjugations with all the weaknesses. It's largely in transliterated form so the hamza issue doesn't come up and isn't treated as a weakness. Benwing (talk) 01:32, 25 August 2014 (UTC)
The diacritics bugs are not consistent and they are visible when testing with different OS and browsers. I think it's best to use the correct logical order and address the issues when they happen. Your WP page looks very good. The focus should be on the Arabic script, though, so hamzated verbs should take into account spelling changes. --Anatoli T. (обсудить/вклад) 01:47, 25 August 2014 (UTC)
Thanks. Agreed on targeting the Arabic script. If the diacritic bugs are still there and simply requiring reversing the order of shadda-fatha and such, then the correct way to deal with them is to postprocess the output, applying the reversals as necessary. Do you see the errors on your machine? (If so, what is your OS and browser? I'm using Chrome on Mac OS X, and no problems for me.) Take a look at User:Atitarev/ar-conjug-I-geminate-test and tell me if you see the errors in any of the numerous forms with shadda-fatha (e.g. 'dalla' or 'dallā') or combinations with other short vowels. Benwing (talk) 02:29, 25 August 2014 (UTC)
I currently see User:Atitarev/ar-conjug-I-geminate-test correctly on Windows 7, Firefox 31. --Anatoli T. (обсудить/вклад) 02:36, 25 August 2014 (UTC)

About moving Arabic verbs[edit]

I moved two verbs and a noun to أصل from اصل. Would you like to create entries for the two verbs that are listed on the latter? --Lo Ximiendo (talk) 22:35, 31 August 2014 (UTC)

Done. Benwing (talk) 22:40, 31 August 2014 (UTC)
I actually mean the verbs تأصل and استأصل. --Lo Ximiendo (talk) 23:49, 31 August 2014 (UTC)
I'm confused. If you can move those two verbs to where they belong, I can add the conjugations. Benwing (talk) 01:21, 1 September 2014 (UTC)
I added the verbs already. *gulp* --Lo Ximiendo (talk) 11:02, 1 September 2014 (UTC)
Thank you very much! I went ahead and added the conj. Benwing (talk) 11:26, 1 September 2014 (UTC)


I wasn't too sure about the imperfect, especially in the automated conjugation table that was given to the entry. Have you noticed that? I did. --Lo Ximiendo (talk) 10:21, 1 September 2014 (UTC)

Noticed what? I just checked my verb tables and it looks correct. I have tables for the verb أقام and the ones I generate for that verb look correct, and أجاب should follow exactly the same conjugation. Is there anything in particular that seems wrong to you?
BTW which automated tool are you using to do the edits such as you did on أجاب? I only know of AWB but usually it announces itself in edit entries. Benwing (talk) 11:14, 1 September 2014 (UTC)
The ar-conj template gives out yujību instead of ar-verb's yajību. That's what I noticed. --Lo Ximiendo (talk) 11:24, 1 September 2014 (UTC)
ar-verb is wrong, ar-conj is correct. Forms II, III, IV and Iq take prefixes with -u- in the active imperfect, whereas all the others take -a-. There may be lots of other errors in ar-verb but I'm pretty confident in the correctness of ar-conj. Benwing (talk) 11:30, 1 September 2014 (UTC)
Maybe it's just the editor's fault that they used yajību instead of yujību? --Lo Ximiendo (talk) 11:33, 1 September 2014 (UTC)
Probably ... I'm thinking actually that ar-verb needs to be automated like ar-conj so you don't have to type in any more info than what you type into ar-conj (except to clarify the radicals in a few cases), and it automatically figures out the radicals from the headword and generates the 3rd-person masculine singular past and non-past indicative. I added a comment to your talk page about this. Benwing (talk) 11:38, 1 September 2014 (UTC)


I also created تأمل to move it from أمل. Cheers. --Lo Ximiendo (talk) 10:08, 2 September 2014 (UTC)

ar-verb forms for ط و ع[edit]

I think there should be a way to modify {{ar-verb forms}} so that it accommodates Arabic roots such as ط و ع. --Lo Ximiendo (talk) 10:30, 2 September 2014 (UTC)

I don't have a very good understanding of all those templates. Can you explain how {{ar-verb forms}} is used? Do you call it directly or is it call from another template? Where is it used (in the headword line, etc.)?
However, all the code to handle all types of Arabic roots is already in Module:ar-verb. In the process of generating conjugation tables it generates all the forms that {{ar-verb forms}} generates and it handles all the types of roots and in general does all sorts of things way better than any of the current templates. Notice for example that in a non-form-I verb, all I have to do is write e.g. {{ar-conj|III}} and it automatically infers the appropriate radicals and generates all the forms, with all the vowels and also automatically transliterated. There's no reason that {{ar-verb}} couldn't take similar parameters and automatically generate the vocalized head word, the vocalized 3rd-person masculine singular imperfect indicative to display in the headword line, plus automatic transliteration, etc. Benwing (talk) 11:03, 2 September 2014 (UTC)
You could have a look at ج ه د for an example of {{ar-verb forms}} at work. --Lo Ximiendo (talk) 11:56, 2 September 2014 (UTC)
I moved the red link verbs to their new homes, along with those that were already created. They also request definitions (maybe not the form I and II verbs?). --Lo Ximiendo (talk) 10:43, 3 September 2014 (UTC)

Beer parlour[edit]

These discussions you are starting at the BP about Arabic templates, don't really belong there. The BP is sort of like the Supreme Court in that discussions there should affect all of Wiktionary. --WikiTiki89 14:18, 3 September 2014 (UTC)

@Atitarev Actually you were the one who started the latest one. --WikiTiki89 14:20, 3 September 2014 (UTC)

Two Arabic verb categories[edit]

I created Category:Arabic form-? verbs‏‎ and re-created Category:Arabic geminate form-II verbs‏‎ because they have members, but I can easily delete them if you think we shouldn't have them. In the latter case, you should make the corrections necessary so the entries don't get categorized in them. Thanks! Chuck Entz (talk) 15:47, 5 September 2014 (UTC)

Yeah, these categories should be there, thanks. The first one indicates a mistake in the entry (missing form= param) but it's still useful. I have no idea why I deleted the second one. Benwing (talk) 18:01, 5 September 2014 (UTC)

Entries created from the list of Arabic Quranic Verbs[edit]

Hi, in case you haven't noticed, I created the verb مَكَثَ(makaṯa) from the second half (501-1000) of the aforementioned list. --Lo Ximiendo (talk) 13:17, 14 September 2014 (UTC)

Also created نَفِدَ(nafida) some time ago. --Lo Ximiendo (talk) 02:27, 15 September 2014 (UTC)

Arabic collective nouns and their category[edit]

I wish {{ar-coll-noun}} gets its Category:Arabic collective nouns sorting back. Any thoughts about that? --Lo Ximiendo (talk) 13:04, 16 September 2014 (UTC)

Fixed. Benwing (talk) 13:13, 16 September 2014 (UTC)
Thank you. :) So it was just a simple bug... --Lo Ximiendo (talk) 13:22, 16 September 2014 (UTC)
I don't know how the templates {{ar-coll-noun}} and {{ar-sing-noun}} now sort Arabic nouns into a single red link category now instead of Category:Arabic collective nouns and Category:Arabic singulative nouns. --Lo Ximiendo (talk) 04:01, 19 September 2014 (UTC)
Oops. That is now fixed. Benwing (talk) 04:09, 19 September 2014 (UTC)
Thank you again. Besides, I'm going on a vacation to Topsail Island and be back in about a week (I think). --Lo Ximiendo (talk) 04:34, 19 September 2014 (UTC)
Have fun!!! Benwing (talk) 04:36, 19 September 2014 (UTC)

Category:Old French verbs with partial overrides[edit]

Were you intending to use this category for anything? —CodeCat 20:38, 25 September 2014 (UTC)

I went ahead and created it. It's intended to signal a particular practice that should be avoided as much as possible. Benwing (talk) 21:13, 25 September 2014 (UTC)

Arabic head parameters[edit]

If I understand it correctly, all Arabic headword lines should eventually have this parameter? If so, then it may be more efficient to make it the first positional parameter. We've already done this for Russian, Ukrainian and Slovene, which need accent marks for most words. What do you think of this? —CodeCat 20:58, 5 October 2014 (UTC)

Yes, all Arabic words should have it. However, there's a complication in that sometimes there are multiple possible vocalizations, which are currently implemented using head2=, head3=, head4=, etc. If we make head= the first positional parameter, what do we do about the remainder? One possibility is to allow multiple heads to be specified in a single head= parameter, separated by e.g. commas (this means in the unlikely case where a comma appears in a headword, it needs to be HTML-escaped, but that seems no big deal). It also shortens the typing effort. I suppose we could also have the first positional param be the head, and other ones still use head2=, head3=, etc.
Also keep in mind the effort required to fix all the various Arabic headword templates and usages of those templates if you make this change. Benwing (talk) 21:07, 5 October 2014 (UTC)
Yes I was thinking the only change would be head= to 1=, but the other headword parameters wouldn't change. This kind of "paradigm" is relatively common in Wiktionary templates. I am considering making Module:ar-headword for this. —CodeCat 21:12, 5 October 2014 (UTC)
If you're willing to fix everything up yourself, go ahead. Keep in mind there are many templates in Category:Arabic headword-line templates that make use of the param head= in various ways, and would all need to be fixed. Benwing (talk) 21:15, 5 October 2014 (UTC)
Yes, I'm aware of that. But it's fairly easy to rename and move around parameters with a bot, combined with tracking categories. —CodeCat 21:18, 5 October 2014 (UTC)
OK. Benwing (talk) 21:19, 5 October 2014 (UTC)
I've made the change to all Arabic headword-line parameters, except (for now) {{ar-nisba}}, {{ar-verb}} and {{ar-verb-part}}. It turned out that none of the templates used the 1= parameter for anything yet, so I didn't need to shift anything around. This means that for now, both head= and 1= work. But of course the former is deprecated now. Could you update the documentation of the templates? —CodeCat 23:34, 5 October 2014 (UTC)
Done. Benwing (talk) 00:08, 6 October 2014 (UTC)
Thank you. We could change more of the parameters to positional too. g= is probably a candidate, and maybe other {{ar-noun}} parameters too. —CodeCat 00:11, 6 October 2014 (UTC)
I'm wary of too much of this. At least, there should be some logic to parameters that are positional so it's not just a random collection in a hard-to-remember order (or to remember which are positional and which aren't). Benwing (talk) 00:15, 6 October 2014 (UTC)
A lot of templates already have the gender as the first positional parameter, and I noted above that for some, the headword is the first; the gender is the second then. So this is not so hard to remember. —CodeCat 00:20, 6 October 2014 (UTC)
OK, if you're gonna write the bot code to fix up the calls, go ahead. Benwing (talk) 00:24, 6 October 2014 (UTC)
Just to add... On Wiktionary, a somewhat general practice in writing templates is that the most frequently used and non-optional parameters are positional, while more rarely used or optional ones are named. In principle, every call to {{ar-noun}} should have a gender specified, so it's a good candidate for making it positional. That's actually the same reason I offered to make the headword parameter positional too. —CodeCat 00:30, 6 October 2014 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── As it happens, in the case of gender, it could be made optional. The large majority of nouns have their gender in accordance with their ending, and we could potentially list only the exceptions. This is what Arabic dictionaries typically do, for example. Benwing (talk) 00:37, 6 October 2014 (UTC)

I think it's a good idea to add gender, anyway, even if it's largely predictable and can be loaded automatically. That way, Wiktionary will be better than other dictionaries, which don't show genders. I sometimes have doubts about nouns ending in ه‎ (which may be silent or stand for ة‎), ا or ء. The noun gender for humans are often determined semantically, not by endings and it's somewhat confusing for place and country names. --Anatoli T. (обсудить/вклад) 00:45, 6 October 2014 (UTC)
We could automatically determine gender for nouns in quite a few languages. But in practice we don't do this because gender tends to be somewhat unpredictable even then. Every language has exceptions. —CodeCat 00:47, 6 October 2014 (UTC)

Arabic adjective genders[edit]

I noticed that {{ar-adj}} takes a gender parameter, but I'm not sure why. I imagine that Arabic adjectives, like those in Indo-European languages, take the gender of the thing they refer to. I examined the entries that provide this parameter, and apparently the vast majority specify g=m but a few have g=f. I don't know what the practices are regarding which form is considered the lemma, but if I assume right that it's the masculine singular form, then the entries which specify feminine gender should probably be looked at and converted into a {{feminine of}} type entry. Could you have a look at these? They're at Special:WhatLinksHere/Template:tracking/ar-head/adj g/f.

Regarding the remainder, would it be correct to eliminate the g= parameter altogether for {{ar-adj}}, and assume that all entries that use this template are masculine singular adjectives? —CodeCat 19:46, 6 October 2014 (UTC)

Yes, you are right that masculine singular forms are the lemma, and gender in Arabic adjectives does work essentially like Indo-European languages. I think it's correct to eliminate the gender code from them. Those forms marked as feminine are non-lemma feminine singular forms and should be converted as you specify. Benwing (talk) 01:05, 7 October 2014 (UTC)
But I don't know the corresponding masculine forms, so I don't know how to fix them. —CodeCat 01:07, 7 October 2014 (UTC)
@CodeCat All terms, except أنثى and عليا are term minus the final ة, which acts as a feminine marker, e.g. أوروبية is a feminine form of أوروبي. Yes, they should use {{feminine of}}. --Anatoli T. (обсудить/вклад)
@CodeCat All the terms should be fixed up. أنثى appears to possibly be feminine tantum and so I listed it just as an adjective (no gender); the others are listed as "adjective form"s with g=f and use of {{feminine of}}. Benwing (talk) 09:28, 8 October 2014 (UTC)
I've removed the genders from adjectives now. But can you check something? The template {{ar-adj-color}} had feminine and plural forms, but which plural form is this? Is it the masculine plural or the common plural? —CodeCat 19:25, 8 October 2014 (UTC)

Entries where the xxhead= parameter is not the xx= parameter + vowels[edit]

I'm working on removing redundant the xxhead= parameters now. But there are a few entries, listed at Special:WhatLinksHere/Template:tracking/ar-head/xhead/needed, where, if vowels are removed from the xxhead= parameter, the result is not identical to the xx= parameter. I don't know much at all about Arabic and my knowledge of the writing is very basic, so I'm not able to fix these. Could you have a look? —CodeCat 21:04, 6 October 2014 (UTC)

Benwing, are we making headword with or without ʾiʿrāb? Also, do we need to add fatḥa before alif - إِمْتِحَان or إِمْتِحان? it seems to work without. I can fix some. --Anatoli T. (обсудить/вклад) 23:49, 6 October 2014 (UTC)
There are cases with irregular transliterations خَطَر xaṭar (x should be replaced with ḵ, ħ with ḥ) or missing vowels صَوت (missing sukūn), should be صَوْت, etc. --Anatoli T. (обсудить/вклад) 23:54, 6 October 2014 (UTC)
(Edit conflict) I have fixed most of them, not sure why سريع is still appearing there. I don't know the vowels for the second word in شبح ظل. Is it a SoP? --Anatoli T. (обсудить/вклад) 00:37, 7 October 2014 (UTC)
We don't currently have any consensus about whether to put ʾiʿrāb in headwords. What do you think? There seems to be a sort-of convention to include ʾiʿrāb in noun headwords but not in the transliterations, but that will require some special-case hacking to distinguish verbs from nouns, since we do want the ʾiʿrāb in verbs. Convention in most dictionaries seems to be to omit the ʾiʿrāb in nouns, but then diptotes need to be marked in some special fashion. For example, the Hans Wehr dictionary puts a subscript 2 by diptotes. Adding ʾiʿrāb is one way of indicating this. I guess probably we should include ʾiʿrāb and if necessary go ahead and include it in the transliterations as well, but I'm not sure.
There are unfortunately various systems being used for transliteration. x in place of ḵ and ħ instead of ẖ are some of the most common substitutions. If you look in Module:ar-translit you'll see I handle lots of transliteration conventions in the code that generates vowels from transliteration. This should be fixed with a bot.
Missing vowels should be added. This can be done from the transliteration usually. As for fatḥa before alif, there is special casing in the transliteration code to handle this case and a few other cases where there's no ambiguity when the vowels are omitted, but they should be there still. Benwing (talk) 00:29, 7 October 2014 (UTC)
CodeCat (talkcontribs), I fixed the two cases I saw in Special:WhatLinksHere/Template:tracking/ar-head/xhead/needed. Benwing (talk) 00:33, 7 October 2014 (UTC)
I think it's OK to put ʾiʿrāb in headwords AND transliterate it. It was already agreed on, I think. Not sure why inflected forms are not transliterated. I don't like tāʿ marbūta transliterated as "-a(t)". Not sure it was discussed and agreed on. I think it's better to use "-a" or "-atun" if ʾiʿrāb is given. The page about Arabic can teach the actual (pausal, informal) pronunciations. --Anatoli T. (обсудить/вклад) 00:37, 7 October 2014 (UTC)
I'm ok with ʾiʿrāb in headwords and transliterations. Inflected forms aren't transliterated because of changes that CodeCat made; I've asked her to undo these changes or incorporate them into {{head}}. The transliteration of tāʾ marbūṭa as "-a(t)" should occur only when it appears as the first word in a multi-word expression. When appearing at the end of text, it should appear as "-a", and as "-atun" with ʾiʿrāb vowels. Benwing (talk) 00:45, 7 October 2014 (UTC)
Thanks. In ʾiḍāfa, the genitive construct, it should be "-at", not "-a(t)" and "-āt", if it follows an alif. We discussed this as well. --Anatoli T. (обсудить/вклад) 00:50, 7 October 2014 (UTC)
I think the best compromise about ʾiʿrāb is to de-emphasize them, either by graying them out—سَنَةٌ(sanatun)—or by superscripting them—سَنَةٌ(sanatun). The superscript currently looks ugly and I am not sure why. The remaining question is what to do with the fatḥatān-ʾalif ending, which when omitted leaves behind a long -ā. I think the best solution is to simply transliterate all fatḥatān occurrences normally—مَعًا(maʿan). --WikiTiki89 00:51, 7 October 2014 (UTC)
So, you oppose ʾiʿrāb transliteration? It's easier to arrive at pausal form than the other way around. ʾiʿrāb can be simply omitted in pronunciation but users will know the full form and know, which one is a diptote or triptote. Also, I just thought that it won't be possible to determine programmatically ʾiḍāfa or a noun + adjective. A flag could be used for that, I think. --Anatoli T. (обсудить/вклад) 00:55, 7 October 2014 (UTC)
Graying them out is OK with me, if you don't like the normal way. --Anatoli T. (обсудить/вклад) 00:57, 7 October 2014 (UTC)
The only thing about graying them or superscripting them is that it's a bit tricky to do this without adding manual tr= params everywhere, which defeats the purpose of automatic transliteration -- at least that's the case if we want to cite verbs with ʾiʿrāb. There are ways around this but they might not work consistently. (Alternatively, we could always gray, even for verbs.) As for fatḥatān-ʾalif, they should definitely appear normally as -an. Benwing (talk) 01:01, 7 October 2014 (UTC)
Why is it tricky to do it without manual tr= params? The module can return html tags as part of the transliteration. --WikiTiki89 03:59, 9 October 2014 (UTC)
The problem isn't with returning html from the module. What's tricky is if we want to gray out or omit ʾiʿrāb in nouns but not verbs, because the transliteration module doesn't know what's a noun and what's a verb. Verbs are traditionally cited in forms with full ʾiʿrāb -- certainly the dictionary form is. If you think we should gray out all ʾiʿrāb, including in dictionary-form verbs, then doing it automatically is not an issue. Benwing (talk) 04:26, 9 October 2014 (UTC)
If we are graying out ʾiʿrāb, there is no reason not to also gray it out for verbs. The ʾiʿrāb on verbs is omitted in the same contexts as nouns (i.e. in pausal position or in colloquial speech). --WikiTiki89 04:32, 9 October 2014 (UTC)
Anatoli -- The problem with transliterating -at in the genitive construct is that it's not programmatically obvious when such a construct occurs and when it doesn't, e.g. غُرْفّة البَيْت vs. الغُرْفَة الكَبِيرَة. Benwing (talk) 01:01, 7 October 2014 (UTC)
Ah, I see you noticed this too. Benwing (talk) 01:08, 7 October 2014 (UTC)
Yes, in that case, just "-a(t)" is fine, since it's not possible to determine if they are ʾiḍāfa or a noun + adjective. --Anatoli T. (обсудить/вклад) 05:24, 8 October 2014 (UTC)
In fully vowelated text, ʾiḍāfa is easy to identify as it lacks both nunation and the definite article. --WikiTiki89 03:59, 9 October 2014 (UTC)
Yes, you're right. That means we need to provide full vowels. For terms without ʾiʿrāb it's OK to leave "-a(t)" if greying out is not used. --Anatoli T. (обсудить/вклад) 04:13, 9 October 2014 (UTC)
I agree. Graying out is only relevant with ʾiʿrāb anyway. --WikiTiki89 04:17, 9 October 2014 (UTC)
The display of "-a(t)" already only occurs without ʾiʿrāb; it only occurs when ة is at the very end of the word followed by a space, or when ʾiʿrāb display is turned off and a space follows. If ʾiʿrāb vowels are supplied, ة is always displayed as a "t". However, your suggestion is useful when graying out to determine whether to gray out the "t". Benwing (talk) 04:26, 9 October 2014 (UTC)
All the old xxhead= parameters have now been removed, with their old values transferred over to the regular parameter name. —CodeCat 19:18, 8 October 2014 (UTC)

Arabic genders of numerals, collective and singulative nouns[edit]

I've now converted all uses of the g= parameter to the second positional parameter. But I came across a few things that I wonder if you could clarify.

  • All singulative nouns I came across were feminine. If this is a rule, then I suppose it could be made the default. Are there any exceptions?
  • Most collective nouns were masculine, except for ذرة and بوم. Again is this something that could be made default, but there are apparently exceptions. Unless those are errors, but I don't know that.
  • Currently, numerals also have a gender parameter. Do numerals have inherent gender like nouns, or do they adapt their gender like adjectives? Most of them were masculine, but some were both masculine and feminine.

CodeCat 22:05, 8 October 2014 (UTC)

As far as I know, singulative nouns are always feminine. They are formed from collective nouns by adding the feminine ending -ة. Collective nouns are generally always masculine, and are distinguished by having a singular form but plural meaning. (The corresponding singulative noun has a singular meaning.) The Wehr dictionary doesn't indicate ذرة as collective, so I'm not sure why it's marked as such, and it does indicate بوم as collective, but not as feminine. So it's possible those are both errors.
Numerals in Arabic are complicated. It's rather like Russian, where the cardinal numbers become progressively more noun-like and less adjective-like as they get higher. I went through them all recently and marked gender, which I think is correct, but it's questionable because some forms are in between nouns and adjectives. "One" and "two" are pure adjectives; "three" through "ten" behave like nouns in that the corresponding noun (e.g. in "three men") is in the genitive plural, but they also agree in gender with the governing noun. 11 through 19 are similar but govern the accusative singular. 20 through 90 again govern the accusative singular but don't agree with the governing noun, or alternatively their form is invariable in gender, which is why I marked them as both masculine and feminine. 100 and 1000 are clearly pure nouns and govern the genitive singular; 100 is feminine and 1000 is masculine, which can be seen by the agreement of smaller numbers in forms like 300 and 3000, where the word for 3 is feminine in 300 but masculine in 3000. The whole system is a huge mess. Benwing (talk) 23:02, 8 October 2014 (UTC)
Maybe we should indicate numerals using the part of speech they actually belong to then, rather than "numeral". After all, if they really are nouns or adjectives, then we should mark them as such. Concerning collectives, I wonder if the template could have no gender parameter at all, and always assume that they are masculine with no way to override. That assumption is only valid if there are no exceptions of course. For singulatives, I've already done this. —CodeCat 23:06, 8 October 2014 (UTC)
The issue with this is the forms that are partly noun-like and partly adjective-like, like 3 through 10 ... which do you declare them as? Benwing (talk) 23:09, 8 October 2014 (UTC)
I don't really know. Numerals are always a bit strange that way in many languages, that's why we use the "numeral" part of speech. It's kind of a catch-all for all the weirdity that goes on with such words in various languages. Of course that doesn't mean every single cardinal number term in a language has to be called "numeral". For example, miljoen(million) is marked as a noun, while duizend(thousand) and honderd(hundred) are both noun and numeral, and tien(ten) is a numeral only. So I'd suggest using adjective or noun for those where those terms clearly fit, and use numeral for the remainder?
And what about collectives? —CodeCat 23:15, 8 October 2014 (UTC)
What happens in Russian re. numerals? That's probably the closest to Arabic. As for collectives, بوم is apparently masculine in reality. No indication that it's feminine in any of the three dicts I looked in. ذرة is claimed to be simultaneously collective and singulative in Lane's comprehensive and verbose dictionary. I don't know what to do about that. I guess make the gender default to masculine but let it be overridden. Benwing (talk) 23:33, 8 October 2014 (UTC)
OK, Russian has all numerals as just "numeral" or "cardinal number". 1 and 2 are given with masculine and feminine forms; 100, 1000, etc. are tagged with their inherent gender, and the in-between ones, which are gender-invariable like Arabic 20 through 90, are marked without gender. I think this is probably the right solution for Arabic as well. Most languages appear to be consistent in using "numeral" etc. for all numbers; Dutch is the odd case out apparently. The Russian entries are also very well documented, including extensive usage notes on all the complications, so I think they're a good model to follow. Benwing (talk) 23:45, 8 October 2014 (UTC)
Thanks :) Arabic and Russian complexity of numerals are often used in debates and comparisons. They are a bit similar in usage, only feminine and masculine are confusing reversed in usage where feminine خمسة is used with mascline nouns and masculine خمس with feminine nouns. Russian numerals usually use genitive (singular or plural depending on the number). Number "one" is identical in usage, only Russian has also neuter. --Anatoli T. (обсудить/вклад) 23:50, 8 October 2014 (UTC)
I think more care should be taken regarding the part of speech. Dutch being the odd one out is not a good thing for the other languages I would say. German closely parallels Dutch for example so the entries should be similar. Dutch numbers don't inflect for gender or number, but the noun-ness is apparent from other syntactical structures. 100 and 1000 have plurals, for example. And "million" must be preceded by an article like any other counting noun (such as liter, dozijn(dozen), stapel(pile)). The entries themselves are a bit sparse, but w:Dutch grammar#Numerals goes into some detail. I've also tried to be exact for Proto-Slavic entries, so accordingly 1-4 are adjectives with full three-gender paradigms, 5-10 are feminine nouns with paradigms for only that gender. —CodeCat 23:55, 8 October 2014 (UTC)
See Cherine's second post here [1] Example:عشر نساء "ten women" (masculine numeral with feminine noun in plural), ستة أيام "six days" - feminine numeral with masculine noun in plural. --Anatoli T. (обсудить/вклад) 23:57, 8 October 2014 (UTC)
I just think it's a bit clunky to try to assign noun or adjective to numerals that don't behave quite as either. To assign "numeral" to 3-90 whereas "adjective" to 1 and 2 and "noun" to 100 up seems really ugly. All are numerals; they also behave similar to adjectives and/or nouns, but with enough special cases that this should probably be treated as usage info. For example, the word مئة "hundred" behaves mostly as a noun, but irregularly has a plural that's the same as its singular, which no other feminine noun does. Benwing (talk) 00:04, 9 October 2014 (UTC)
Yes, treat them as numerals regardless of behaviour. We'll need to explain why خَمْسَة(ḵamsa) (feminine-looking numeral) is a masculine and خَمْس(ḵams) (masculine-looking numeral) is a feminine. Usage notes, appendix, something else? --Anatoli T. (обсудить/вклад) 05:18, 9 October 2014 (UTC)

Plural of inanimate nouns[edit]


Also @CodeCat I've edited رِيَاح شَمْسِيَّة(riyāḥ šamsiyya) and رِيَاح نَجْمِيَّة(riyāḥ najmiyya). What I don't like is the gender "m-pl". Inanimate objects and animals in plural are grammatically feminine, aren't they (which is reflected in the adjectives used)? And there's no distinction between masc. and fem. plural for objects. "m-pl" and "f-pl" should probably only be used for humans, IMO. Did I miss anything? I can't use simply "p" for plural. --Anatoli T. (обсудить/вклад) 22:56, 8 October 2014 (UTC)

I've edited Module:ar-headword so that it recognises "p" as the plural gender, rather than "m-p" or "f-p". —CodeCat 23:02, 8 October 2014 (UTC)
Yes, plural inanimate objects take feminine singular agreement in Arabic, regardless of what their singular gender is. Plural adjectives are used only for people. I'm not sure about animals, might depend on whether they are higher or lower animals, who knows? Probably just "plural" is correct as the gender. Benwing (talk) 23:06, 8 October 2014 (UTC)
OK, I noticed you deleted m-p and f-p as possibilities. They still apply to animate nouns, so should remain as possibilities. Benwing (talk) 23:48, 8 October 2014 (UTC)
@CodeCat Yes, please. Inanimate plural nouns are grammatically feminine singular (referred to as "she" - "هي" and use feminine adj. endings, have "broken" plural forms for nouns) but not humans or some animals, which use "they" pronoun (there is a masculine and feminine "they" - "هم" "m" and "هن" "f") and use plural noun and adjective endings (broken and sound). --Anatoli T. (обсудить/вклад) 03:52, 9 October 2014 (UTC)
Why can't we simply consider non-human plurals to be grammatically feminine singular (f), rather than "plural" (p)? --WikiTiki89 04:22, 9 October 2014 (UTC)
When plural nouns occur as dictionary entries, I think there should be some indication that these are plural rather than feminine singular. Perhaps they should be identified as plural inanimate. Benwing (talk) 04:30, 9 October 2014 (UTC)
But what makes them plural other than their meaning? The meaning is indicated in the definition. Also, we should not use the word inanimate since this applies to animal plurals as well. --WikiTiki89 04:34, 9 October 2014 (UTC)
The examples that Anatoli gave above were رِيَاح شَمْسِيَّة(riyāḥ šamsiyya) and رِيَاح نَجْمِيَّة(riyāḥ najmiyya), translated as "solar wind" and "stellar wind" even though the word "wind" in Arabic is plural. So the definition doesn't always indicate the plurality. The plurality is indicated in the fact that the word for wind is a broken plural. This explains why, e.g., a word that doesn't have a feminine ending has feminine agreement, and it also tells you that you can't pluralize these forms because they're already plural (contrary to English where terms "solar winds" and "stellar winds" exist and have the expected plural meaning). If you object to "inanimate" we could say "non-human" abbreviated "nonhum" or "non-hum" or something. Benwing (talk) 04:44, 9 October 2014 (UTC)
So then other than their etymologies, what makes these examples "plural"? Take for example English crossroads, which is grammatically singular. Other than its etymology, there is nothing "plural" about it. The only thing in mind that makes رِيَاحٌ(riyāḥun) plural is that it has a singular رِيحٌ(rīḥun). If رِيَاح شَمْسِيَّة(riyāḥ šamsiyya) does not have a singular, there is no basis left for me to call it a plural. (Now if we were discussing colloquial Arabic, these would all be grammatically plural and there would be no further confusion.) --WikiTiki89 04:56, 9 October 2014 (UTC)
I agree with Benwing's argument that it should be marked as plural, even if it's a plurale tantum, which doesn't have a singular by definition. Some consider them feminine singular but I think it's better to treat broken plurals as plurals. A note in "About Arabic" on non-human plurals would suffice, I think. --Anatoli T. (обсудить/вклад) 05:07, 9 October 2014 (UTC)
But what is it that makes it plural? That's what I really want to know. Saying "it is a broken plural therefore it is a plural" is just a circular argument (and a "broken plural" is really just a singular noun that is used in place of the plural). --WikiTiki89 05:32, 9 October 2014 (UTC)
Convention or agreement between dictionary creators, if you wish. What do YOU wish to make them? Feminine singular? It's just another option. What about the fact that you can't make it plural, anymore, the etymology (plural for "wind") or that ALL non-human plurals behave like that, e.g. بُيُوت(buyūt)? It's not feminine sg but plural, isn't it? رِيَاح شَمْسِيَّة(riyāḥ šamsiyya) just doesn't have singular, if we consider it a plurale tantum. --Anatoli T. (обсудить/вклад) 05:51, 9 October 2014 (UTC)
Well here's the problem (yes, it's theoretical, but this whole discussion is pretty theoretical): Suppose we have a word whose etymology is unknown or ambiguous, it is used with feminine-singular agreement, it itself has no plural and no singular, and it does not exist in the colloquial language. What criteria do we use to determine whether it is a feminine singular noun or a non-human broken plural? --WikiTiki89 06:02, 9 October 2014 (UTC)
Something to be added here is that broken plurals often have a form that tells you they're broken plurals, e.g. أَرْوَاح "souls" (plural of روح) is of a traditionally plural form. Other examples are صَحَارَى "deserts" (origin of Sahara) and كُتَّاب "writers". In this case, رِيَاح is less obvious because you have singular كِتَاب with the same construction. In any case, if you really have something that has all the characteristics you describe, plus the fact that its form doesn't tell you whether it's singular or plural, and that fact that its meaning doesn't tell you that either, then you have no call to say something is singular or plural, that's all, and you'd have to go by what the dictionaries say or just omit it entirely. Benwing (talk) 06:52, 9 October 2014 (UTC)
What about رُمَّانٌ(rummānun)? But I think you are right about أَرْوَاحٌ(ʾarwāḥun) and صَحَارَى(ṣaḥārā). If we look to other dictionaries, then the question remains about how those dictionaries determine whether the term is plural. And then this raises another question: Why do we need to know whether it is plural? In other words, what will our readers do with this information? --WikiTiki89 07:25, 9 October 2014 (UTC)

Providing gender and plurality is important, IMO, even if it's only for the etymology. Broken plural forms seldom look like feminine singular but are used grammatically as such. If we don't provide this info, then users may ask for it, even if it doesn't make much difference for communication. If the gender or plurality is not known, it's fien to show "?" - meaning it's not known. --Anatoli T. (обсудить/вклад) 01:25, 10 October 2014 (UTC)


Shouldn't the 3mp past of this kind of verb be حَيِيُوا(ḥayiyū) rather than حَيُّوا(ḥayyū)? --WikiTiki89 16:04, 21 October 2014 (UTC)

The expected 3mp past would actually be حَيُوا(ḥayū). Take a look at رَضِيَ(raḍiya). The form حَيُّوا(ḥayyū) is explicitly given in John Mace's book on Arabic verbs. Barron's "201 Arabic Verbs" on the other hand has حَيُوا(ḥayū) without gemination; presumably one of these is a misprint. I can't find any other book that lists the full conjugation of this verb. Benwing (talk) 20:59, 21 October 2014 (UTC)
Then are you sure that the conjugation at رَضِيَ(raḍiya) is correct? It seems to me that either the 3fs past should be رَضَتْ(raḍat) or the 3mp past should be رَضِيُوا(raḍiyū), but I may be wrong. --WikiTiki89 21:54, 21 October 2014 (UTC)
I'm pretty sure the conjugation is correct. I'll take a look when I have access to my verb tables but I remember encountering this exact situation. The page on w:Arabic verbs also has this conjugation. Benwing (talk) 00:37, 22 October 2014 (UTC)
I've verified that the conjugation is correct. Something similar happens with final-weak active participles ending in -in, where the -iy- drops before u and i in masculine plural -ūna and -īna but not before a in feminine plural -iyātun/-iyātin or dual -iyāni/-iyayni. Benwing (talk) 15:33, 22 October 2014 (UTC)
In that case, I'm still confused why there is a shadda in حَيُّوا(ḥayyū); it makes sense in the conjugation of حَيَّ(ḥayya), but not in that of حَيِيَ(ḥayiya). --WikiTiki89 21:46, 25 October 2014 (UTC)

Deletion requests[edit]

Could you explain your deletion requests such as this one:

Could you confirm that the form exists, and that information provided was correct? Note that a separate page is normal for all forms of words... Lmaltier (talk) 20:57, 25 October 2014 (UTC)

There has been an agreement that the lemma form does not include a definite article unless it's an inherent part of the lemma, and that we don't include forms with added definite article unless it has some special meaning. It's similar to not including "the cat", "the dog", "the octopus", etc. as lemma entries. Benwing (talk) 05:17, 26 October 2014 (UTC)
It's not the lemma form. But is it a form of the word? In Bulgarian, forms including the definite article are actual forms of the word, just like a plural form. Is it the same? What agreement do you refer to? Lmaltier (talk) 18:45, 26 October 2014 (UTC)
The definite article is a clitic attached to the beginning of a word. In formal Classical Arabic, sometimes the ending also changes slightly, although usually without changing the unvowelled spelling under which words are entered in the dictionary. I brought this issue up in the Grease Pit I think, and asked whether these forms should generally be deleted, and there was agreement to do so. The definite form is not like the plural form in Arabic because the plural is often highly unpredictable whereas the definite is totally predictable by fairly simple rules. I think the situation is different in Bulgarian because for Bulgarian the definite isn't always predictable from the indefinite, e.g. sometimes the stress moves onto the definite. I personally think that only the few cases where the unvoweled spelling changes in the definite should be included; in all other cases the lemma can be found from the definite by simply removing the al- (Arabic ال) from the beginning of the word. Including these forms for all words would seem to clutter things up needlessly. Benwing (talk) 20:30, 26 October 2014 (UTC)
A discussion is here Wiktionary:Beer_parlour/2014/October#Category:Arabic_definitive_nouns.3F.3F.3F. It's been a general consensus not to include words with proclitic definite articles. Besides the definite article, monosyllabic prepositions, consisting of only one written consonant, question marker أَ(ʾa), enclitic pronouns are also written without a space. They don't belong to the word. It's different from Bulgarian/Macedonian, Albanian and Scandinavian languages, where these endings are considered inflections. Korean particles and copulas are the same story - written together but don't belong to words. --Anatoli T. (обсудить/вклад) 21:27, 26 October 2014 (UTC)

Arabic ǰuna[edit]

Hello. I am looking for an Arabic word transliterated as ǰuna, meaning perhaps “tanner, skin-dresser” or “hatter”. Does it exist and what is the spelling? It is needed for ճոն(čon). --Vahag (talk) 07:11, 26 October 2014 (UTC)

It would be spelled جُنَة but I can't find any such word in any of my dictionaries. I looked at a lot of variations and there are words like jauna "disc of the sun" and jūn, jūna "bay" and junāh "sinners, gatherers" but nothing meaning "tanner" or "hatter". I even checked things like junʿa, junʾa, juʿna, juʾna, junẖa, juẖna, junha, juhna on the assumption that one of these weak consonants might have been omitted in borrowing but no such luck. Benwing (talk) 21:36, 26 October 2014 (UTC)
Thanks, my source is possibly unreliable in this case. --Vahag (talk) 09:13, 27 October 2014 (UTC)

Arabic phrasebook entries[edit]


I haved fixed صَبَاح الخَيْر(ṣabāḥ al-ḵayr) and صَبَاح النُور(ṣabāḥ an-nūr) as examples of SoP entries, such as phrasebook entries. It's cumbersome to add links to individual words, though. --Anatoli T. (обсудить/вклад) 00:31, 28 October 2014 (UTC)

forte possible[edit]

Wow...I must have been half-asleep! My source doesn't even say "fp", so I'm not sure where I got that from. But it does say "forte possible" without indicating the language. Quote: "forte possible. As loud as possible." This is from "The Modern Conductor" by Elizabeth Green. The source is a trusted standard for conductors. Bob the Wikipedian (talkcontribs) 13:33, 7 November 2014 (UTC)

Oops, it says "possibile". Didn't even see the 'i' there! Bob the Wikipedian (talkcontribs) 22:01, 7 November 2014 (UTC)
OK, well then forte possibile must be a real term (although it sounds odd to me). Benwing (talk) 09:26, 8 November 2014 (UTC)

Automatic translit and entering Arabic vocalisaton[edit]


I noticed that you sometimes leave the manual transliterations, even on fully vocalised native Arabic words, why is that? Do you think it's still inaccurate, especially with tāʾ marbūṭa? Also, I'd like to share with you that I use Firefox plug-in "Character palette" to enter Arabic diacritics - highly recommended if you use Firefox. It's quite convenient and easy. :) --Anatoli T. (обсудить/вклад) 06:07, 14 November 2014 (UTC)

Yeah, it's because of the tāʾ marbūṭa, so it gets rendered properly instead of as (t). I enter Arabic diacritics using the Arabic keyboard layout on the Mac, which has almost all the necessary stuff ... just missing dagger alif and hamzat al-waṣl. These ones, along with the left and right half-rings, get entered using the built-in Mac character palette (Control-Command-Space). If I find myself using Firefox, however, I'll definitely check out the "Character palette" plug in. Benwing (talk) 23:45, 14 November 2014 (UTC)



What's the deal with ʾiʿrāb? Are we supposed to use it in headwords and translations from English? I can see both - with and without. Is it still undecided? Sorry, don't remember the outcome of discussions. --Anatoli T. (обсудить/вклад) 04:41, 18 November 2014 (UTC)

Also, marking hamzat al-waṣl is usually problematic but I see the module can handle the elision without the diacritic. Do you mark it? --Anatoli T. (обсудить/вклад) 04:42, 18 November 2014 (UTC)
I still haven't quite decided what to do about ʾiʿrāb. Mostly, I've entered words without ʾiʿrāb because it looks strange to me to include it, and most existing entries don't include it. (The main exceptions are in {{ar-nisba}}, which includes ʾiʿrāb in its auto-generated entries, and in verbal nouns for verbs, which always have ʾiʿrāb in them.) I like the solution used in Hans Wehr, which leaves triptotes unmarked and marks diptotes with a superscript 2; possibly we could adopt this solution, and I could fix Module:ar-translit to ignore a superscript 2 when transliterating. What do you think?
As for hamzat al-waṣl, you're right that the translit module can generally manage to elide it when necessary, although I've still been inserting it. I don't feel strongly about this, though, and we could choose to leave it out. Why do you think it's problematic? Benwing (talk) 08:08, 18 November 2014 (UTC)
I think we should include them, and I have been doing so. --WikiTiki89 16:37, 18 November 2014 (UTC)
I see we still have disagreement on this. Superscript 2 for diptotes is a great idea! Adding hamzat al-waṣl is no longer problematic but non-ligature form لله is not displayed correctly with any diacritic before or after or when alif is missing. --Anatoli T. (обсудить/вклад) 21:19, 18 November 2014 (UTC)
I don't like the superscript 2 idea. If we want to be explicit in headwords, we can just put the word diptote with a link to a appendix. The ʾiʿrāb will look less weird once we start graying them out. Also, if we choose not to include ʾiʿrāb, should we make an exception for words like نَادٍ(nādin)? --WikiTiki89 22:58, 18 November 2014 (UTC)
I still hesitate about ʾiʿrāb, undecided myself but if most entries and don't have them, then we won't get consistency. We won't be able to grey it out in translations or other places with automatic transliteration, will we? Hans Wehr and rare web references with vocalisation don't use them either. Terms like نَادٍ(nādin) could be exceptions.
A superscript 2 could link to a diptote appendix or About Arabic page. --Anatoli T. (обсудить/вклад) 23:28, 18 November 2014 (UTC)
Yes we can gray them out with automatic transliterations. --WikiTiki89 23:37, 18 November 2014 (UTC)
I implemented graying out ʾiʿrāb but I don't like the idea much because it can't really be done properly automatically. For example, adverbial accusatives need the ʾiʿrāb displayed normally, and in Koranic quotes we presumably want to do so as well. We also display ʾiʿrāb in verbs, among other things. I would rather display the ʾiʿrāb when it belongs in the translit and leave it out otherwise. I also think the graying out looks a bit strange in {{ar-nisba}} examples like عَرَبِيّ(ʿarabiyy). For cases like وَادٍ(wādin) we should probably make an exception and include the ʾiʿrāb; likewise for words like مُسْتَشْفًى(mustašfan). This is also the convention used in Wehr's dictionary; or at least, the translit includes the ʾiʿrāb, when it doesn't for most words. Likewise, this dictionary displays ʾiʿrāb in translit of verbs consistently, in adverbial accusatives and sometimes in phrases when it's necessary to clarify the case relations, but not otherwise. For cases like وَادٍ(wādin) I also try to put an entry at وَادِي(wādī) that says it's the construct state, given the way these words are normally pronounced.
As for displaying the word diptote, this isn't a bad idea although the problem is that it can't so easily be done for plural inflections, which are the most common cases of diptotes (well, I suppose it could, with some hacking of Module:headword, although it's not clear whether it will look bad). Benwing (talk) 23:48, 18 November 2014 (UTC)
Plural inflections don't need to say that because they are a regular part of the grammar. It is only words that are lexically diptotes, such as مِصْرُ(miṣru) that need an indication. I don't think we should base everything off of Hans Wehr. Note that there is no logical reason why ʾiʿrāb should be included for verbs, but not for nouns. Also note that it is much easier for an Arabic beginner to remove the ʾiʿrāb than to add it. --WikiTiki89 00:06, 19 November 2014 (UTC)
I don't think it would be easy in adding ʾiʿrāb to all entries consistently in practical terms, unless someone commits to make a bot to do this. Re: it is much easier for an Arabic beginner to remove the ʾiʿrāb than to add it. Yes, totally, that's the main pro argument. --Anatoli T. (обсудить/вклад) 02:00, 19 November 2014 (UTC)
If we decide to do it, then we can worry about how to do it. But it shouldn't be too hard anyway. Arabic doesn't have nearly as many entries as English or Russian, for example. --WikiTiki89 02:04, 19 November 2014 (UTC)
Why don't plural inflections need it? Some broken plurals are diptotes, some are triptotes. For example, 4-character plurals of the form CaCāCiC and CaCāCīC are generally diptotes (including words like فَوَاكِه and جرَائِد which are based off of 3-character singulars), as are plurals of the form ʾaCCiCāʾ (and ʾaCiCCāʾ for geminate roots) and CuCaCāʾ, and words of the form CaCCān, generally intensive adjectives (but not words of the form CiCCān or CuCCān), and masculine color/defect/elatives of the form ʾaCCaC (and ʾaCaCC for geminate roots) and feminine color/defect adjectives of the form CaCCāʾ, and probably other cases as well. This is independent of the predictable declension of words in -ūn, -āt and -in, which are technically diptotes because they have only two distinct case forms but which have their own declensions separate from the normal diptote declension.
As for following Hans Wehr or not, John Mace's book on Arabic Verbs likewise includes ʾiʿrāb for verbs but not nouns and adjectives (including verbal nouns and participles), and the book "Introduction to Koranic and Classical Arabic" by Thackston does something similar, where verbs are transcribed with ʾiʿrāb as e.g. rajaʿa "to return" but nouns and adjectives are written with ʾiʿrāb only if they are diptotes, e.g. ğarīb- pl. ğurabāʾu "strange" (with a hyphen in place of the triptote ending -un). So I think there's a lot of precedent for something like this. I'm not opposed to the idea of writing ʾiʿrāb only for diptotes, as Thackston does; this would be an alternative to using a superscript 2. All of these books are likewise consistent in writing ʾiʿrāb for prepositions and particles, which I think is a good idea. I imagine one reason for this is that spoken MSA may be more likely to drop the case endings of nouns and adjectives than the ʾiʿrāb of verbs. Benwing (talk) 05:42, 19 November 2014 (UTC)
Now I'm learning something new. I thought that CaCāCiC/CaCāCīC and the others you've listed were ordinary triptotes. I thought that only sound plurals were diptotes. I'm assuming that the patterns you mention use the same declension as the sound -āt plural (i.e. -u(n) for nominative and -i(n) for genitive and accusative)? But in that case, indicating them with ʾiʿrāb is probably not a good idea. Maybe we should just explicitly indicate the accusative case of diptotes in the headword line. --WikiTiki89 13:33, 19 November 2014 (UTC)
They don't use the same declension as the sound -āt plural. They have indefinite nominative -u (no nunation), indefinite genitive/accusative -a, while the definite uses -u/-i/-a like for triptotes. This is the same declension as diptotes like مِصْرُ(miṣru) and أَحْمَدُ(ʾaḥmadu). This is all documented in w:Arabic nouns and adjectives. I imagine few Arabic speakers actually know these rules nowadays. Benwing (talk) 21:21, 20 November 2014 (UTC)
Thanks! That explains my previous confusion about words like أَحْمَدُ(ʾaḥmadu). --WikiTiki89 15:39, 21 November 2014 (UTC)

I also have "Introduction to Koranic and Classical Arabic" with answer keys :). --Anatoli T. (обсудить/вклад) 05:52, 19 November 2014 (UTC)

Arabic vowels and consonants[edit]

If I'm not mistaken, Arabic normally writes only consonants, but three of the consonant letters can also be used to indicate long vowels. Assuming that the word is fully vocalised, I wonder if there is a reliable way to tell whether a given consonant represents an actual vowel or its consonantal equivalent? I am asking this because I would like to write a function that extracts the consonants or vowels from a word. This means knowing which letters are vowels and which are consonants, obviously. —CodeCat 19:08, 25 November 2014 (UTC)

For ي and و, if there is another short vowel written on them, then they are consonants, otherwise they are long vowels. In the case there is a sukuun (the null vowel) on them, it is debatable whether to analyze them as a the second element of a diphthong or just as a consonant. For ا, the situation is a bit trickier. It almost always indicates a long vowel, but at the beginning of a word, it indicates an elidable epenthetic vowel before a consonant cluster. The tricky part is that if there is a prefix, the ا is still written but represents no sound at all (كَاسْمٍ(kasmin, like a name)) rather than a long vowel. This can be detected only by knowing that consonant clusters are forbidden after long vowels (except in the active participle of geminate roots, e.g. خَاصٌّ(ḵāṣṣun), but these have the very particular form C1āC2C2-). But I'm curious as to why you're writing this function. It may or may not be important to keep in mind that long vowels can interchange with semivowels within the same consonantal root (e.g. نُونٌ(nūnun) > تَنْوِينٌ(tanwīnun)). --WikiTiki89 19:33, 25 November 2014 (UTC)
Re: why you're writing this function: pls, see Module_talk:ar-headword#Plural_forms.2C_dual_forms.2C_etc. --Anatoli T. (обсудить/вклад) 21:31, 25 November 2014 (UTC)
One conceivable way is to use Module:ar-translit and then parse the transliteration. This already implements all the rules required to distinguish consonants from vowels. (Except that it doesn't handle cases like كَاسْمٍ(kasmin, like a name) but these won't show up in single lemmas -- this occurs only because ka- "like" is a clitic.) I don't know whether this is doable in reality, as you'd have to map back to the Arabic text somehow. If not then you should at least be able to reuse the code in Module:ar-translit that does the transliteration. Benwing (talk) 05:23, 26 November 2014 (UTC)

Gender and nmber of adjectives[edit]


Re: diff. Normally, adjectives (lemmas) don't display gender in any language in the headword. If masculine singular is the lemma, so it's used as lemma, other forms use form templates. --Anatoli T. (обсудить/вклад) 07:19, 28 November 2014 (UTC)

The template is for non-lemma plural adjective forms, e.g. أَغْبِيَاء(ʾaḡbiyāʾ), which is the masculine plural of adjective غَبِيّ(ḡabiyy), and the change is made to reflect the fact that the most common usage will be with masculine (broken) plurals. For non-lemma forms it seems reasonable to display their gender. Benwing (talk) 10:35, 28 November 2014 (UTC)


Hi, What's the gender/number of حِذَاء(ḥiḏāʾ)? Hans Wehr says "(pair of) leather boots or shoes", plural أَحْذِيَة(ʾaḥḏiya). --Anatoli T. (обсудить/вклад) 01:42, 29 November 2014 (UTC)

Hmmm, it's singular, I'm guessing masculine. Words ending in اء are often feminine but in this case it's not an ending but rather a form of the root consonant و, meaning that the word has the same pattern as كِتَاب. Benwing (talk) 05:09, 29 November 2014 (UTC)

A few WingerBot clinkers[edit]

ريالات, فرميونات, كامرات, كواركات, and ميكروبات all have the same module error. Chuck Entz (talk) 06:39, 30 November 2014 (UTC)

Thanks. Fixed them. Happened when the singular noun entry had a blank head. Benwing (talk) 08:38, 30 November 2014 (UTC)

ذكرى and دنيا[edit]

Just curious etymologically, do you know why ذِكْرَى(ḏikrā) and دُنْيَا(dunyā) don't take tanween (i.e. why aren't they ذِكْرًى(ḏikran) and دُنْيًا(dunyan))? --WikiTiki89 16:57, 12 December 2014 (UTC)

دُنْيَا(dunyā) is a nominalized feminine elative of دَنِيّ(daniyy, low), literally "the lowest (place)". It has the same pattern as كُبْرَى(kubrā), feminine of أَكْبَر(ʾakbar). It takes tall alif by the rule that alif maqṣūra is written as tall alif after yāʾ. I don't know what the proto-forms of these words are, nor of ذِكْرَى(ḏikrā), but I guess the reason for no tanween is that these words are underlyingly diptotes. Similarly, the masculine elative of دَنِيّ(daniyy, low) is أَدْنَى(ʾadnā) without tanween, underlyingly *ʾadnayu (cf. ʾakbaru) whereas a word like مَعْنًى(maʿnan, meaning) is underlyingly *maʿnayun (cf. maktabun). I'm not sure the reason why ذِكْرَى(ḏikrā) is a diptote. I'm also not sure why certain words like عَصًا(ʿaṣan, stick) have tall alif independently of a preceding yāʾ. (In the dialect of Mecca, alif maqṣūra was pronounced something like [e:] whereas tall alif was [a:].) Benwing (talk) 00:51, 13 December 2014 (UTC)
I would guess that عَصًا(ʿaṣan) is underlyingly *ʿaṣawun, not **ʿaṣayun, which is why it has tall alif. You answered my question with "these words are underlyingly diptotes", "دُنْيَا(dunyā) is a nominalized feminine elative of دَنِيّ(daniyy, low)", and "I'm not sure the reason why ذِكْرَى(ḏikrā) is a diptote". Thanks! --WikiTiki89 02:47, 13 December 2014 (UTC)
Good call on عَصًا(ʿaṣan). Benwing (talk) 08:13, 13 December 2014 (UTC)


What's the transliteration of صري, if the term is real? --Lo Ximiendo (talk) 16:09, 16 December 2014 (UTC)

I don't think this word actually exists. It's not in my dictionary. Benwing (talk) 18:58, 16 December 2014 (UTC)
The closest in Wehr appears to be مُصِرّ(muṣirr, persistent, resolute). Lane has a noun something like صِرِّي(ṣirrī) or صِرِّيّ(ṣirriyy) (?) meaning something like "a serious assertion, not a jest", occurring in the expression هِيَ مِنِّي صِرِّي(hiya minnī ṣirrī) in various variants meaning approximately "It is a serious assertion from me" said of an oath. It's clearly an archaic word which is why it isn't in Wehr. I still think we should delete this word. Benwing (talk) 19:10, 16 December 2014 (UTC)
Then what's the transliteration of حماقة? --Lo Ximiendo (talk) 21:58, 17 December 2014 (UTC)
@Lo Ximiendo fixed. --Anatoli T. (обсудить/вклад) 22:14, 17 December 2014 (UTC)
Can anyone verify the term مندثر and its transliteration? --Lo Ximiendo (talk) 12:15, 18 December 2014 (UTC)
Added translit. Benwing (talk) 11:24, 19 December 2014 (UTC)

Arabic for "to fry"[edit]

Is the following term قلا really an Arabic verb that means "to fry"? --Lo Ximiendo (talk) 11:21, 16 January 2015 (UTC)

@Lo Ximiendo Yes it is. I cleaned up the entry and added conjugation. Benwing (talk) 07:11, 17 January 2015 (UTC)

شجرة التفاح and كوكا كولا[edit]

I'm not sure whether the module errors on these entries are your fault- but I'm pretty sure you can fix whatever the problem is. Thanks! Chuck Entz (talk) 22:22, 18 January 2015 (UTC)

Fixed. Benwing (talk) 01:34, 19 January 2015 (UTC)
Since كوكا كولا is indeclinable like some loanwords and many words ending in alif, so may be the inflection table is unnecessary but the header should say so + indeclinable category? --Anatoli T. (обсудить/вклад) 01:39, 19 January 2015 (UTC)
A Russian indeclinable example бюро́(bjuró). A parameter in the header is "-" adds to Category:Russian indeclinable nouns. Just a suggestion, it may reduce the editing time. --Anatoli T. (обсудить/вклад) 02:02, 19 January 2015 (UTC)

Requested entry[edit]

Hi, are you able to make entry for مُفْتٍ(muftin), please? I'm not sure about the plural form and don't have my HW handy. :) --Anatoli T. (обсудить/вклад) 05:59, 19 January 2015 (UTC)


Please take a look at Category:Pages with module errors (currently 55 entries) and fix the problem. Thanks! Chuck Entz (talk) 14:45, 21 January 2015 (UTC)

Sorry about that. Stupid typo. I wish the stuff in Category:Pages with module errors showed up faster. It seems to take quite awhile for it to cycle through, longer than it used to. E.g. I fixed the error 20 minutes ago and still see all the pages listed. Benwing (talk) 07:41, 22 January 2015 (UTC)
I just do null edits on all the entries. If you open a bunch of them in separate tabs, you can do things with the first ones you opened while the more recent ones are still getting around to responding, and keep doing each step that way until you're done- it averages out to only a second or two per entry on a reasonably fast computer. All those errors are cleared, but there's a single one with a new error. Chuck Entz (talk) 03:54, 23 January 2015 (UTC)



Do you know why شْنِیتْزَل is not working? All internal diacritics are there, although I'm not 100% sure it's a sukūn or kasra after šīn. It should probably be manually transliterated as "š(i)nitzal" but the automatic test fails. --Anatoli T. (обсудить/вклад) 00:06, 6 February 2015 (UTC)

You had FARSI YEH in place of the YEH. If you change that, you get شْنِيتْزَل(šnītzal) and it works. Benwing (talk) 04:00, 6 February 2015 (UTC)
Oops. Thank you. :) I wonder how I managed to get it there... --Anatoli T. (обсудить/вклад) 04:04, 6 February 2015 (UTC)


Hi BW,

Why is the entry reporting "Arabic nouns with sound masculine plural" when it is a broken plural and not -ūn(a)/-īn(a)? Did I miss something or is it confusing سُجُون for an SMP? TIA. :) --Anatoli T. (обсудить/вклад) 13:02, 5 March 2015 (UTC)

It thinks the -ūn ending indicates a strong plural. I fixed it by adding an explicit ":tri" (triptote) notation. This also occurs for a few other words ending in -n, e.g. عَيْن pl. عُيُون and قَرْن pl. قُرُون. Possibly I should add a check for the form فُعُون and treat it as a broken plural. Benwing (talk) 14:39, 5 March 2015 (UTC)

Arabic etyma of Swahili terms[edit]

I don't have any Arabic resources nor do I know Arabic script, so I have been having a hard time finding the etyma of Swahili words that I've been adding that (to me, at least) seem very strongly like they derive from Arabic. I was wondering whether you'd be willing to help me out with finding the Arabic origins of words in Category:Swahili entries needing etymology, or at least recommend a good online resource for Arabic that I can search in Latin script. —Μετάknowledgediscuss/deeds 21:50, 9 March 2015 (UTC)

I can try to help you. There are a whole host of dictionaries here: [2] and you can search by Latin using the search button, although they tend to be sorted by Arabic root, which requires that you have some knowledge of how Arabic words are structured, because Arabic roots are generally three consonants, with the vowels inserted between them. I don't see too many Arabic-looking words among the category you linked to above, although maskini definitely comes from Arabic مِسْكِين(miskīn). Benwing (talk) 07:13, 10 March 2015 (UTC)
aidha is from أَيْضًا(ʾayḍan). Benwing (talk) 07:18, 10 March 2015 (UTC)
sahani is from صَحْن(ṣaḥn). Benwing (talk) 07:44, 10 March 2015 (UTC)
Thank you! There are also some entries, at least one of which I see you have noticed, that need some improving of their etymologies, like tajiri.
I'm pretty sure the following words in that category are from Arabic or via Arabic: ghasia, hafifu, hodari, imara, karamu, laini, ruhusa, shikamoo. If any of those have deducible etyma, that would be very helpful. I'll try the dictionary you linked to. —Μετάknowledgediscuss/deeds 08:05, 10 March 2015 (UTC)
hafifu is possibly خَفِيف(ḵafīf, light, slight, thin). ghasia is possibly غَاشِيَة(ḡāšiya, misfortune, faint, stupor, attendants) (?). Can't find any obvious etyma for hodari or karamu. imara is possibly إِمَارَة(ʾimāra, emirate, authority, power), although this is a noun not an adjective, and the power talked about is power of command rather than physical power. laini is possibly لَيِّن(layyin, soft, feeble, tender, gentle, supple). ruhusa is definitely رُخْصَة(ruḵṣa, permission). shikamoo I have no idea about. Benwing (talk) 09:06, 10 March 2015 (UTC)
Most of those match the regular sound changes, but imara seems off and ghasia would have turned out as *ghashia if that were the etymon, unless there is a dialectal form in Arabic with /s/. Thank you for all your trouble! —Μετάknowledgediscuss/deeds 17:02, 10 March 2015 (UTC)
You're welcome! Sorry I couldn't find better etyma. As for ghasia, Arabic is pretty strict about keeping /s/ and /ʃ/ apart so I don't think there are any dialectal forms with /ʃ/ -> /s/ in them. Benwing (talk) 18:39, 10 March 2015 (UTC)
I got my hands on some better Swahili resources, including an etymological dictionary. That said, I may have to learn Arabic script if I want them to be of any use to me in this regard. —Μετάknowledgediscuss/deeds 08:05, 11 March 2015 (UTC)
If you put up some screen shots I might be able to help. Benwing (talk) 23:08, 11 March 2015 (UTC)
Once I'm not terribly busy, I'll learn Arabic script so I don't have to be as reliant. Perhaps next week. —Μετάknowledgediscuss/deeds 07:46, 13 March 2015 (UTC)


Apparently the module disagrees with the conjugation type you gave it. Chuck Entz (talk) 08:57, 29 March 2015 (UTC)

Thanks. The module was correct; I fixed the conjugation type. Benwing (talk) 09:27, 29 March 2015 (UTC)

Automatic transliteration of بالـ[edit]

I thought this worked before, but I may be wrong: بِالتَّوْفِيق(bi-t-tawfīq). --WikiTiki89 15:26, 15 May 2015 (UTC)

It seems to work if you change the alif into an alif waṣla. I don't think I ever got it working in the case you give. Benwing (talk) 06:46, 16 May 2015 (UTC)
Yes, Module:ar-translit/testcases has a case with an ʾalif waṣla - بِٱلتَّأْكِيد(bi-t-taʾkīd). --Anatoli T. (обсудить/вклад) 09:22, 16 May 2015 (UTC)
Ok. That's probably why I remember it working. --WikiTiki89 19:01, 19 May 2015 (UTC)
Considering that "ٱ" is such a rare symbol, perhaps the rule should be that if "ال" follows a kasra or a ḍamma, then it should be considered an ʾalif waṣla? I don't know if it's hard to implement for Benwing. Shall I add بِالتَّوْفِيق(bi-t-tawfīq) (or similar) to test cases? --Anatoli T. (обсудить/вклад) 01:22, 21 May 2015 (UTC)
OK, I implemented this. 19:58, 21 May 2015 (UTC)
Cool! Maybe we should have it work for مِائَة as well? Or is that too risky and not common enough to be beneficial? --WikiTiki89 21:31, 21 May 2015 (UTC)
I think that's too much work for just one single case, and every new regex slows things down and risks leading to module errors on certain long appendix pages. Benwing (talk) 23:02, 21 May 2015 (UTC)
Well I was just thinking of making the regex you just added less restrictive (i.e. changing {"([\217\143\217\144])\216\167\217\132", "%1\217\177\217\132"}, to {"([\217\143\217\144])\216\167", "%1\217\177"},), but like I said, that might be too risky and not worth it for such a rare case. --WikiTiki89 17:15, 22 May 2015 (UTC)

Parameters of Arabic headword-line templates[edit]

I've been trying to figure out how the Arabic headword-line templates work and what parameters they take. It seems to me that many of them show a rather excessive number of forms on the headword line. For example, {{ar-noun}} apparently can list:

  1. construct state
  2. definite state
  3. oblique
  4. informal
  5. dual
  6. dual construct state
  7. dual definite state
  8. dual oblique
  9. dual informal
  10. plural
  11. plural construct state
  12. plural definite state
  13. plural oblique
  14. plural informal
  15. feminine
  16. feminine construct state
  17. feminine definite state
  18. feminine oblique
  19. feminine informal
  20. masculine
  21. masculine construct state
  22. masculine definite state
  23. masculine oblique
  24. masculine informal

This really is way way way too many forms to list on a single headword line. These templates should be trimmed down to only show the bare basics and the rest should be shown in an inflection table. —CodeCat 18:38, 19 May 2015 (UTC)

Yeah, it's a lot of potential forms, but most of them aren't used. In practice only the forms that can't be predicted are listed, and that's a small number. At least, that's the practice I've been following, and it was more or less the same in the existing entries before I came along, so it's pretty consistent. Generally, for nouns, only the plural is given; dual is given only when it can't be predicted, which is fairly rare (basically, only nouns ending in -ā, where the dual can be either -awān or -ayān or sometimes both). Masculine is used only for feminine nouns referring to people, where there is a corresponding masculine noun. Construct state is given only for nouns ending in -in (which can appear in the singular or broken plural), and informal is similarly given for adjectives in -in (which can likewise appear in the singular or broken masculine plural; adjectives don't have a construct state). The reason for this is the -in is written with a diacritic ـٍ (two slanted lines below the letter), and hence doesn't appear in unvocalized text or in the unvocalized page title; whereas the construct state, informal and definite all appear with -ī, written with an extra letter ي. Giving both forms emphasizes and clarifies the relation between the two, esp. since many users may be more familiar with the version with attached ي. Overall, there are typically only a couple of forms listed in the headword line, and if there are more it's usually because there are multiple broken plurals (in the extreme case, رَاجِل(rājil, pedestrian, footsoldier) has 13!).
This means that at least the following could potentially be removed:
  1. oblique (always predictable)
  2. definite state
  3. dual construct and informal
  4. feminine construct and informal

Benwing (talk) 01:06, 20 May 2015 (UTC)

The feminine and masculine forms probably have their own lemma page don't they? If so, then we don't need to list all the forms of them as they'll already be covered on that other page. As for the rest, I don't think we should be showing all possibly-unpredictable forms in the headword line. The idea of the headword line is to give a quick overview of the inflection, but listing all the irregularities is just too much. Consider for example what would happen if we tried that for Latin deus! So we really need to make a choice: which forms are the most essential and least predictable? Forms that are only unpredictable for a handful of words don't need to be in the headword line, that's what inflection tables are for. —CodeCat 01:12, 20 May 2015 (UTC)
How about you find an example of Arabic entry that has too many forms in the headword line (other than plurals), and then you can complain. Arabic doesn't have words that are as irregular as Latin deus. --WikiTiki89 12:59, 20 May 2015 (UTC)
I agree with Wikitiki; in practice this isn't really a big issue. As for feminine and masculine forms, in the case of nouns yes they have their own lemma page, but in this case we don't list the forms of them. Feminine plurals are only given for adjectives and even then only sometimes, generally when the dictionary gives them (which is only for adjectives that can modify people and generally only when the feminine plural is irregular). I'd say, things aren't broke so let's not try to fix them. Benwing (talk) 03:47, 21 May 2015 (UTC)
BTW what is the need for your latest changes to Module:ar-headword? Why the need to explicitly list def/def2/def3/def4 etc.? This seems very hacky, and something similar won't work for plurals, where there may well be more than 4 possibilities. The current code works fine without needing to do any of this. Benwing (talk) 03:53, 21 May 2015 (UTC)


Do you know what the difference in meaning is between forms III and IV at آجَرَ(ʾājara)? I tried reading the definitions in both of the cited references, but the old-fashioned, dry, concise English makes no sense to me. --WikiTiki89 21:45, 2 July 2015 (UTC)

Probably not that much. Form III usually takes a person as an object, so form III might have the meaning "hire out" rather than exactly "rent out", although according to Wehr (see this site, which has all the dictionaries: [3]) both "rent out" and "hire out" are possible form IV meanings as well. Benwing (talk) 09:23, 3 July 2015 (UTC)
It seems that Wehr doesn't even have form III. And it seems that Lane was saying that there was confusion between the two due to the coincidence of the past tense forms. --WikiTiki89 17:17, 6 July 2015 (UTC)


I note that your bot can remove redundant transliterations. Can it also delete transliterations for all items in Category:Terms with redundant transliterations/hy and Category:Terms with manual transliterations different from the automated ones/hy? --Vahag (talk) 16:09, 5 July 2015 (UTC)

It could potentially do this. With Arabic it uses some sophistication to decide whether the remove the translit or canonicalize it. It sounds like this isn't necessary for Armenian? Can it just unilaterally remove all the translits in the words in those two categories? Benwing (talk) 00:57, 6 July 2015 (UTC)
Yes, all transliteration should be blindly removed from everywhere. --Vahag (talk) 07:05, 6 July 2015 (UTC)
The same with Category:Terms with redundant transliterations/xcl, Category:Terms with manual transliterations different from the automated ones/ka, Category:Terms with redundant transliterations/ka. If the transliteration remains in unusual places (e.g. the headword format is not standard for xcl), don't bother with additional coding. Just pick the lowest hanging fruit. --Vahag (talk) 07:10, 6 July 2015 (UTC)
@Benwing, in case, you wish to help other languages as well, to make things easier and generic, all languages, for which manual (hardcoded) transliterations don't override the automatic ones, all manual transliterations could and should be removed from everywhere when templates are used. The list of language codes of such languages (ONLY THOSE) is in Module:links just after the line starting with "local override_translit". You'll see that Russian (and other Cyrillic-based Slavic languages), Hebrew, Yiddish, Hindi are NOT in that list, so they shouldn't be removed automatically. The list in Module:links is the list of languages, for which nobody objects to use 100% automatic transliterations. --Anatoli T. (обсудить/вклад) 07:25, 6 July 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Atitarev @Vahagn Petrosyan I wrote the code to do it but I want to have you guys verify some things about it. I wrote the code so it can do all the languages listed in override_translit in Module:links, but currently I'm only having it do hy, xcl, ka, el and grc. I'm doing all the pages listed in the categories "Terms with manual transliterations different from the automated ones/X", "Terms with redundant transliterations/X", "X lemmas" and "X non-lemma forms". I have some logic to determine which parameters to remove from which templates (see below). Currently 275 lines of Python (not including a separate, already-existing generic helper library). A few issues:

  • Currently I have it written to remove sc=Armn from hy and xcl, and sc=Grek from el. I already do something similar with removing sc=Arab from Arabic-language templates, because it's definitely redundant in this case. Is it safe for hy, xcl and el? I don't remove sc=Geor from ka or sc=polytonic from grc because in each case there are two scripts listed in the relevant entry in Module:languages/data2 and Module:languages/data3/g. Does anyone know what these script parameters are used for? Is it only in the transliteration modules?
  • The code will change stuff in every namespace, including User, Talk, User talk, Wiktionary:Beer parlour/*, etc. Is this OK?
  • PLEASE VERIFY THE FOLLOWING (Vahagn): As well as removing tr= and sc=Grek and sc=Geor, I remove various numbered parameters from various hy and xcl templates, which appear to be unused translit params. Before removing any such param, I double-check to make sure non-Latin characters aren't present, but I'd like someone (e.g. Vahagn) to look over this list:
    • Even-numbered params from xcl-noun-*pl*
    • Odd-numbered params from hy-noun-* and remaining xcl-noun-*
    • Odd-numbered params from xcl-decl-verb as well as all the hy-* and xcl-* headword templates except for hy-letter (where param 1 doesn't appear to be used but is Armenian)
    • Even-numbered params from hy-conj
    • Odd-numbered params from xcl-conj
  • I have general code to remove all tr= parameters from FOO-* where FOO is any language code; but so far I need a special case for {{grc-alt}} with |dial=muk because in this case it makes a call to {{head|gmy}} for Mycenaean Greek, which isn't on the list of override_translit languages. Perhaps it should be? (Before I run the script with --save, I will check the full list of modified templates to see if there are any other such cases.)
  • Many templates still use the tr= parameter, passing it to {{head}}. I assume that it will be ignored and hence is safe to remove. These templates should be changed to ignore the tr parameter entirely. The list of templates I've found so far that do this are:
    • hy-particle, hy-personal_pronoun, hy-phrase, hy-postp, hy-postp-form, hy-prefix, hy-proper-noun-form, hy-suffix
    • xcl-adj, xcl-adj-form, xcl-adv, xcl-con, xcl-interj, xcl-noun-form, xcl-numeral, xcl-particle, xcl-postp, xcl-prefix, xcl-prep, xcl-pron, xcl-pron-form, xcl-proper_noun, xcl-proper-noun-form, xcl-root, xcl-suffix, xcl-verb, xcl-verb-form
    • axm-adj, axm-adv, axm-interj, axm-noun, axm-prefix, axm-suffix, axm-verb
    • ka-adj, ka-adv, ka-pron, ka-proper noun, ka-verb
    • oge-noun
  • A more complex issue is with the template ka-decl-noun. All the even-numbered parameters are translit parameters, but the template manually inserts them after the corresponding Georgian text rather than using auto-transliteration. As a result I don't touch these parameters; but the template should be rewritten to use auto-transliteration, so these params can be removed.

The list of params removed is:

  • No params removed whose value is - or contains non-Western chars, as described above
  • sc=Grek and sc=Geor as described above
  • odd/even params from some (Old) Armenian templates, as described above
  • tr= from all FOO-* templates where FOO is any of the processed language codes, except for some cases with grc-alt, as described above
  • tr= from all templates where 1=FOO or lang=FOO where FOO is any of the processed language codes, except for {{borrowing}}, where |lang= is ignored
  • tr1=, tr2=, etc. from {{suffix}}, {{prefix}}, {{confix}}, {{affix}}, {{compound}}

Benwing (talk) 07:31, 7 July 2015 (UTC)

OK, I'm now ignoring stuff with these prefixes: "User:", "User talk:", "Talk:", "Appendix talk:", "Template talk:", "Wiktionary:Beer parlour", "Wiktionary:Translation requests", "Wiktionary:Grease pit", "Wiktionary:Etymology scriptorium", "Wiktionary:Information desk"
I also had to special case "xcl-noun-ն-pl", "xcl-noun-ն-2-pl", "xcl-noun-ն-3-pl", "xcl-noun-ո-ա-pl" ,"xcl-noun-հայր", "xcl-noun-տէր", "xcl-noun-այր", "xcl-noun-կին", "xcl-noun-collnum-*", "xcl-noun" and "xcl-adj", which don't follow the above odd/even rules. Benwing (talk) 08:23, 7 July 2015 (UTC)

That is very impressive, thanks for agreeing to help.

  • Removing sc= is desirable. Please remove sc=Armn from hy, xcl, axm; sc=Grek from el; sc=Geor from ka; sc=polytonic from grc. It doesn't matter that the last two can be written in different scripts. Nowadays the scripts are auto-detected. They can be safely removed.
  • I confirm removing various numbered parameters from various hy and xcl templates as specified above.
  • I have added gmy to the transliteration override list. You don't need a special code.
  • Many templates still use the tr= parameter, passing it to {{head}}. I assume that it will be ignored and hence is safe to remove. Yes, please remove. I will change the templates gradually.
  • Please do ignore special complex cases. Their templates are going to be completely rewritten eventually. --Vahag (talk) 10:12, 7 July 2015 (UTC)
OK, thanks for adding gmy. I've changed things to remove all the scripts as above. It looks like the code is set now but there are an awful lot of pages; it may take a couple of days of running time to run in --save mode (which is a lot slower than not saving changes). BTW I'm impressed at the number of Armenian and Old Armenian lemmas (10,000 of the former, around 5,000 of the latter) and especially the quality of the pages, with the lengthy etymologies and references and all. I don't think I've added more than 1,000 or so Arabic lemmas, maybe 2,000 at the most (not counting around 3,000 auto-generated entries) and that took a whole lot of work, and the etymology sections are really crappy (often nothing more than "derived from such and such a three-consonant root"; it doesn't help that there simply don't exist any real Arabic etymological dictionaries, hard as that is to believe). Benwing (talk) 10:47, 7 July 2015 (UTC)
I have noted your excellent work in Arabic. The poor state of Arabic lexicography is well-known. Do you have Badawi, Elsaid M.; Haleem, Muhammad Abdel, Arabic-English Dictionary of Qur'anic Usage, 2008? It is a recent and relatively good source. --Vahag (talk) 11:11, 7 July 2015 (UTC)
PS. Please check your email. --Vahag (talk) 12:04, 7 July 2015 (UTC)
Regarding the script tags: Nearly all script tags everywhere can be removed because the modules can infer them from the text itself. The only exceptions are when scripts are mixed in one template (which is a bad idea anyway) or when the entire text consists of punctuation marks or other characters that are shared by multiple scripts of that language. --WikiTiki89 11:39, 7 July 2015 (UTC)
@Wikitiki89 Can you cite an example of a translit module that does script auto-detection? Mostly the ones I've looked either ignore the script entirely, or they check for unusual scripts and pass things off to a different module (e.g. Module:grc-translit does that with Cypriot script). I haven't seen any that auto-detect, although it certainly should be possible. Benwing (talk) 12:06, 7 July 2015 (UTC)
The translit modules only get the script if it is specified as a parameter and they are the only module interface that retains this old-fashioned behavior (see Thread:User_talk:CodeCat/Module:jdt-translit_errors). However, I have used explicit script detection in Module:jdt-translit (using findBestScript from Module:scripts). This is often unnecessary if you don't need script-specific logic in the transliteration. For correctly displaying scripts, script detection is used implicitly everywhere when it is not overidden by sc=; for example, Aramaic in Hebrew script (עלמא) and in Syriac script (ܥܠܡܐ) display correctly without script tags. --WikiTiki89 12:21, 7 July 2015 (UTC)
Thanks. Benwing (talk) 13:04, 7 July 2015 (UTC)
Hi. Regarding {{ka-decl-noun}} you mentioned above: it is now obsolete. We use {{ka-infl-noun}} instead, which uses auto-transliteration and adds a postposition table. However, simply replacing all ka-decl-noun with ka-infl-noun is not a good idea, because we may lose contraction and get incorrect declension tables. This script by @Dixtosa correctly replaces the deprecated templates with the newer ones without losing any contraction info (while also ignoring manual transliteration). You might want to check that out. --Simboyd (talk) 12:04, 7 July 2015 (UTC)
Thanks. I don't have time now to look into this; it also looks like that script needs to be rewritten so it can be run in a bot rather than interactively (I assume it's interactive since it's Javascript). However, if you need some translit parameters removed in any templates, let me know. Benwing (talk) 15:38, 7 July 2015 (UTC)
correctDecl function should be working fine.--Dixtosa (talk) 16:43, 7 July 2015 (UTC)
@Dixtosa Can you go ahead and run your bot to convert all the remaining deprecated Georgian declensions, and then delete the deprecated templates? Benwing (talk) 16:50, 7 July 2015 (UTC)

30 changes in a row --Dixtosa (talk) 12:11, 9 August 2015 (UTC)

@Dixtosa I suppose I should batch up the changes .... did it trigger some filter? Benwing (talk) 21:58, 9 August 2015 (UTC)

Ͷ, ͷ[edit]

Hello Benwing. Your bot removed manual transliterations from Ͷ and from ͷ; in doing so, it introduced errors: Ͷ, ͷ qua Arcadocypriot tsan should be transliterated Ś, ś, whereas Ͷ, ͷ qua Melian beta should be transliterated B, b. Is there a way to prevent your bot from removing the manually overridden transliterations in those cases? — I.S.M.E.T.A. 18:02, 8 July 2015 (UTC)

Sorry about that! I thought that since grc is on the list of languages with override_translit set, that manual transliterations are always ignored. I know that tr=- is paid attention to and don't remove it. Is there an exception for {{head}} or something? If so, I won't remove them and will have to figure out which other cases need to be undone ... Benwing (talk) 21:48, 8 July 2015 (UTC)
@I'm so meta even this acronym Evidently {{head}} doesn't pay attention to override_translit. I tried to look through the other cases where |tr= was removed from {{head}}. The only case I found that arguably shouldn't have been removed was , which has one form with a long vowel and one with a short, and the translit reflected the vowel length even though Greek translit normally doesn't do this, e.g. μακρά has one pronunciation with a long final vowel and another with a short final vowel, and they're both translitted the same (and were even before my bot attacked the page). Benwing (talk) 22:56, 8 July 2015 (UTC)
@Vahagn Petrosyan I found a few cases with boldface or italics in the |head=, which is now reflected in the translit; should we remove the boldface from the head?
Page 7468 Κύριλλος: Removed tr=Cýrillos: {{head|el|proper noun|head='''Κύριλλος'''|tr=Cýrillos|g=m|sc=Grek}}
Page 7562 δράματα: Removed tr=''drāmata'': {{head|grc|noun form|sc=polytonic|head=δρᾱ́μᾰτᾰ|tr=''drāmata''|g=n}}
Page 5096 Հայաստանի Հանրապետություն: Removed tr=Hayastani Hanrapetut'yun: {{head|hy|proper noun|head='''[[Հայաստան|Հայաստանի]]''' '''[[հանրապետություն|Հանրապետություն]]'''|tr=Hayastani Hanrapetut'yun|sc=Armn}}
Page 8030 Ռուսաստանի Դաշնություն: Removed tr=Ṙusastani Dašnut’yun: {{head|hy|proper noun|head='''[[Ռուսաստան|Ռուսաստանի]]''' '''[[դաշնություն|Դաշնություն]]'''|tr=Ṙusastani Dašnut’yun|sc=Armn}}
Page 3338 ვახშამი: Removed tr=vaxšami: {{head|ka|noun|head='''ვახშამი'''|tr=vaxšami|sc=Geor}}
Page 10509 ფათერაკი: Removed tr=p'at'eraki: {{head|ka|noun|head='''ფათერაკი'''|sc=Geor|tr=p'at'eraki}}
Benwing (talk) 23:00, 8 July 2015 (UTC)
I don't know about the mechanics here; I just saw an isolated error. I fixed μᾰκρά; ObsequiousNewt did the same for . — I.S.M.E.T.A. 23:33, 8 July 2015 (UTC)
It's probably worth noting that the manual transliterations don't usually include vowel length of α/ι/υ (or, to be honest, any useful information that automatic translation gives.) Which is why override_translit is on. Which is why it's probably safe to remove manual transliterations. If you want to be sure, though, check if the manual transliteration contains one of ā, ī, ū, and leave it alone if it does. —ObsequiousNewt (εἴρηκα|πεποίηκα) 01:58, 9 July 2015 (UTC)
@ObsequiousNewt OK, I added that check and did a run; it output 250 warnings about long a/i/u on pages with the manual translit different from the automatic one. They're listed here: User:Benwing/grc-long-vowel-warnings If you want to fix them up, please do. Note that I already removed manual translit from all the Ancient Greek lemma pages, so there are surely many more cases that have already disappeared (I can create a list of those pages if you want). Benwing (talk) 04:24, 9 July 2015 (UTC)
User:Benwing/grc-more-long-vowel-warnings Benwing (talk) 04:55, 9 July 2015 (UTC)
Those were formatting errors. I fixed them. The bot is doing a great job, thanks for running it. --Vahag (talk) 09:40, 9 July 2015 (UTC)
You're welcome! Benwing (talk) 05:32, 10 July 2015 (UTC)

Ancient Greek transcriptions[edit]

Why is your bot reädding the 4th parameter to {{grc-noun}} & {{grc-proper noun}}? Neither template uses the 4th parameter at all, and AG has a sorting function. —JohnC5 05:41, 15 July 2015 (UTC)

The 4th parameter here is an old transliteration parameter, not a sorting parameter, and the cases I'm re-adding are exactly those where the Latin has a macron over a, i or u. I have other code that will transfer the macron to the Greek in the appropriate place and then remove the translit again. Benwing (talk) 06:55, 15 July 2015 (UTC)
Except that the templates don't do anything with the 4th parameter: see diff and diff. Chuck Entz (talk) 08:03, 15 July 2015 (UTC)
My mistake: it's the bot, not the template, that's supposed to be using it- never mind... Chuck Entz (talk) 08:11, 15 July 2015 (UTC)
Fair enough. Sorry to pester. —JohnC5 13:22, 15 July 2015 (UTC)
Hi, ιυ isn't a diphthong but a sequence of two vowels across a syllable break, so the first edit here was wrong: the smooth breathing should be over the ι and the acute over the υ. —Aɴɢʀ (talk) 09:49, 16 July 2015 (UTC)

Hello Benwing. Could you explain why, in this edit, your bot replaced four coronides: ⟨⟩ with 4 × ⟨ ̓⟩ (4 × [U+0020 SPACE, U+0313 COMBINING COMMA ABOVE])? That shouldn't happen, and I've rolled back its change. — I.S.M.E.T.A. 16:17, 16 July 2015 (UTC)

@Angr, I'm so meta even this acronym Oops. I'm going to look through the changelogs and see if there are any other cases, and fix them. Benwing (talk) 06:13, 17 July 2015 (UTC)
@I'm so meta even this acronym The coronides got changed because as part of the matching I first decomposed using Unicode NFKD form and then recomposed using NFKC form. The "K" in both of these does compatibility transformations, and it apparently includes transforming those coronides into the comma-above sequences. This compatibility transformation had the positive side effect of fixing a number of cases where Unicode U+00B5 was incorrectly being used to represent a Greek mu, but it also changed the coronides as above.

The only other error I found was in Ἀσία and Ἀσιανός, which had both a macron and a breve and my code wasn't expecting that, so the order ended up not quite right; fixed. Benwing (talk) 07:41, 17 July 2015 (UTC)

Thank you, Benwing. My guess is that [U+0020, U+0313] is the NFKC form for ʼ (U+02BC MODIFIER LETTER APOSTROPHE), (U+1FBD GREEK KORONIS), and ᾿ (U+1FBF GREEK PSILI); however, IMO, we should never use [SPACE + COMBINING CHARACTER] where there exists a standalone character for what we want. — I.S.M.E.T.A. 12:03, 17 July 2015 (UTC)

hajrr > هَجْر[edit]

Just curious, in this diff, how did your bot know to ignore the double r? --WikiTiki89 15:26, 20 July 2015 (UTC)

One of the many, many things my code does is to remove double consonants in the Latin when next to another consonant, except for certain cases I could think of where this might legitimately occur (e.g. mudhhib, a non-canonical representation for muḏhib). This occasionally removes double consonants when it maybe shouldn't, in some weird situations that are mostly errors anyway (one that arguably isn't is dunḡḡwān = Arabic rendering of the city of Dongguan) but it fixes a lot more errors than it creates. The code that does the matching-up of Arabic and Latin is about 1,500 lines of Python (this includes a Python version of the Lua code in Module:ar-translit, and a couple hundred test cases); it got this way because I kept going through the errors and accumulating test cases that I thought "should" work, and adding more code to get them to work. Not sure why I put so much effort into this; I guess it seemed an interesting problem. An early version of the matching code sits between lines 303 and 590 of Module:ar-translit; I will probably remove it because it's out of date and not used anywhere in Wiktionary itself. Benwing (talk) 11:04, 21 July 2015 (UTC)
I'm curious about what causes these double consonant errors in the first place. What kinds of situations were you seeing with consonants erroneously doubled in the transliteration? hajrr I could ascribe to the syllabicity of the r in this position. --WikiTiki89 13:10, 21 July 2015 (UTC)
Some other examples:
  • ʔela al-xalff
  • ḵalff
  • ḡusll
  • qawss
  • ʿalaa al-ʿakss
  • ʿabrr
  • ṭawdd
All of these occur at the end of a short word, but not all involve syllabic consonants. There are also some cases with the doubled consonant before another consonant:
  • ḵallf
  • ṣaffrāʿ
  • ḵassm
  • ḥarrba
  • tellk
Also some cases like fiyyyā with a triple consonant, which I also fix. I don't know where these errors originated. Benwing (talk) 13:04, 22 July 2015 (UTC)



Are you interested in becoming a Wiktionary admininistrator? Pls ping me and reply here. You'll need to allow to be contacted by email address (which you have) and set your time zone, if I'm not mistaken. --Anatoli T. (обсудить/вклад) 10:06, 22 July 2015 (UTC)

I'd support your nomination, Benwing. — I.S.M.E.T.A. 12:22, 22 July 2015 (UTC)
@Atitarev I'd be interested. I set my time zone; let me know if there's anything else I need to do. Benwing (talk) 12:48, 22 July 2015 (UTC)
Thanks. Please accept the nomination at Wiktionary:Votes/sy-2015-07/User:Benwing for admin, check the Babel list and fix the start/end dates - usually two weeks from the acceptance date and the vote can begin, I guess :). Good luck and happy editing! --Anatoli T. (обсудить/вклад) 13:02, 22 July 2015 (UTC)
@Atitarev Done. Benwing (talk) 13:20, 22 July 2015 (UTC)
Your vote has passed, you are an Admin. Please add your name to WT:Admin. Also, see Help:Sysop tools. —Stephen (Talk) 21:17, 11 August 2015 (UTC)
@Stephen G. Brown Thank you. Benwing (talk) 06:48, 12 August 2015 (UTC)


diff. Already fixed. Chuck Entz (talk) 04:07, 25 July 2015 (UTC)

Oops. Thanks for fixing it. Those changes marked as "manual" are cases where I manually edited the template and pushed the changes using my bot, so the mistake is my editing mistake and not a bot screwup. Benwing (talk) 06:26, 25 July 2015 (UTC)

Arabic terms with irregular pronunciations[edit]


I've started Category:Arabic terms with irregular pronunciations primarily for loanwards, similar to Russian, Thai categories and a Japanese category with a different name. Hopefully it's OK with you. --Anatoli T. (обсудить/вклад) 23:45, 10 August 2015 (UTC)

BTW, what should be done with indeclinable terms, like إنجلترا? Does it need a "Declension" section at all? Russian nouns are automatically made indeclinable by providing "-" where genitive form should be, e.g. Алматы. --Anatoli T. (обсудить/вклад) 23:49, 10 August 2015 (UTC)
In Arabic I've been creating declension sections for all nouns, even ones that are indeclinable. In this case there's some useful information in the table e.g. fact that it is definite even without preceding al-.
As for the category you've created, I have no problem with it, although I think words should be put there automatically when possible (e.g. by Module:ar-nominals; it's more complex than simply looking for manual translit, because of non-final tāʾ marbūṭa, but that should be fixable with a bit of work). Benwing (talk) 01:58, 11 August 2015 (UTC)
Yeah, I also thought about that. Category:Russian terms with irregular pronunciations is also populated automatically, except for terms, which use generic {{head}}, not {{ru-noun}} and similar. --Anatoli T. (обсудить/вклад) 02:01, 11 August 2015 (UTC)
@Atitarev I implemented this. Benwing (talk) 09:58, 11 August 2015 (UTC)


I originally suspected that the "u" in داود was long because it's annotated as such in the tajweed Korans, but it is also listed as such in Wehr; it looks like the spelling داوود (the standard in Persian) also exists in Arabic, and that داود was retained for whatever reason, whether because it's the Koranic spelling or because داوود looks strange or something like that. In any event, I think we can say fairly confidently that dāwūd should be our favored transliteration. Aperiarcam (talk) 18:54, 11 August 2015 (UTC)

If you go by the Quran, then you would have to transliterate ـه (the 3rd person masculine singular suffix pronoun) as -hū/-hī as well. What really matters is how it is pronounced today. --WikiTiki89 19:00, 11 August 2015 (UTC)
Right, I recognize that (and there are plenty of other oddities in Koranic pronunciation and spelling); I just meant that that's what first made me think it was a long "u." But my suspicion is that the Koran is at least partly responsible for the un-orthographic spelling داود (interestingly enough ar:w:داود uses one waw in the header and both spellings in the text, often with the spelling داوُد (with damma) even when vowels are not otherwise annotated.) Aperiarcam (talk) 19:08, 11 August 2015 (UTC)
That also reminds me of the spelling إبرهيم (or perhaps ابرهيم), which we should probably have since it seems to be common (or to have been common). --WikiTiki89 19:15, 11 August 2015 (UTC)
Yes, I think we should include these spellings at least for the prophets, but I think (could be wrong) داود is a little unusual in its currency in modern Arabic; the absence of medial alif is a much more predictable feature of Koranic spelling and I've never seen it in modern writing, but again I may just be wrong on this count. We have an entry for صلوة so I figure we should include any of these peculiar Koranic spellings we think may prove useful to somebody. Aperiarcam (talk) 19:24, 11 August 2015 (UTC)
That also reminds me of the spelling تورية, which I will add. --WikiTiki89 19:37, 11 August 2015 (UTC)
@Benwing: Can we add a feature to the transliteration module to ignore an unvocalized yāʾ or wāw following a dagger ʾalif? --WikiTiki89 19:49, 11 August 2015 (UTC)
I stand corrected about داود. Sorry about that, I thought it was a random mistake, like so many others I've fixed. @Wikitiki89 How many cases are there with dagger alif followed by unvocalized yāʾ or wāw? Benwing (talk) 21:38, 11 August 2015 (UTC)
I don't know how many, but it is a rather common classical spelling of many nouns (mostly derived from Aramaic, I believe) that are now standardized to ـَاة(-āh). I don't know what other situations this occurs in. --WikiTiki89 02:46, 12 August 2015 (UTC)

Root categories for Arabic[edit]

(Notifying Benwing, Atitarev, Mahmudmasri): Check out Category:Hebrew terms by root and {{HE root}}. Do you think we should do the same for Arabic? --WikiTiki89 12:19, 12 August 2015 (UTC)

Sure, why not. This should be easy enough to implement in {{ar-root}}. Benwing (talk) 12:22, 12 August 2015 (UTC)
The idea is to replace the etymologies that say "From the root XXX" because most of the time, the root is not the actual etymological source of the word, but an after-the-fact classification. This will associate the word with the root without implying it is derived from the root. It may be easier to implement it as part of {{ar-root}}, but it may be better to have a separate template like we do for Hebrew and leave {{ar-root}} for simply creating links to roots. This would also allow us to use the syntax {{AR root|ء ه ل|س ه ل}} in entries like أهلا وسهلا, which {{ar-root}} does not support. --WikiTiki89 12:34, 12 August 2015 (UTC)
Currently, {{ar-root}} supports either the syntax ك ت ب(k-t-b) or ك ت ب(k-t-b). The syntax with separate params is older. I could rewrite all the latter uses to the former ones; then we could use multiple params for multiple roots. I guess what you're referring to by separate templates is that one displays the root box on the right side and one would inline the root link; I could imagine implementing that with params to {{ar-root}} (e.g. presumably when you inline a root link you also want the box on the right side by default; you could e.g. have params |nobox= and |nolink= to turn off one or the other or both). Or separately named templates ... Benwing (talk) 12:44, 12 August 2015 (UTC)
I forgot to mention that {{ar-root}} as a linking template could be useful even in entries that do no belong to that root (such as "See also {{ar-root|XXX}}"), which is not meant to categorize. So I still think it would be better to keep them separate. --WikiTiki89 12:47, 12 August 2015 (UTC)
I see. This could be supported by a |nocat= param or a separate {{ar-root-link}}. One thing I'd like to avoid is having duplicate root link and root box templates in the common case where a root is linked in the etym section. (I understand you'd like to eliminate them but that is a long-term project unless it can be done automatically.) Benwing (talk) 13:02, 12 August 2015 (UTC)
There's nothing wrong with having duplicate templates. This isn't any scarier than duplicating inflection information in {{head}} and in a table. My plan was that could use a bot to add {{AR root}} everywhere where {{ar-root}} appears in an etymology and then work manually to add {{AR root}} everywhere else and remove {{ar-root}} where it is unneeded. The latter process would have to happen either way and would have to be done manually either way. --WikiTiki89 13:37, 12 August 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── I think we need a better name than {{AR root}}, which doesn't follow normal naming conventions and is very confusable with {{ar-root}}. I know that it parallels {{HE root}} and {{PIE root}} but these are likewise misnamed (esp. {{HE root}}). Perhaps {{ar-root-box}}? Also, I really don't like the idea of duplicating the stuff; I know we do it for headword/etymology but it's a pain in the ass there, and better avoided if possible. I'd rather have e.g. {{ar-root-box}} to insert a root box and a category, and {{ar-root-link}} to insert a root link, and {{ar-root}} to do both, to be rewritten manually as we get around to it (but note, there isn't an Arabic etymological dictionary so it's often not obvious what to replace the etym with). Note that {{ar-root-box}} can be called automatically from {{ar-verb}}, which knows the root of the verb in question and is able to derive it from the lemma and form class (sometimes with a bit of manual help when the auto-derived root would be ambiguous; this occurs principally with 2nd-weak and 3rd-weak roots of verbs of form class II and higher). The same thing applies to {{ar-verbal noun of}} so that verbal nouns get root boxes/cats, with a bit of bot work to propagate those manually-assisted root consonants. Same thing could potentially be done for active and passive participles using (not currently existing) templates {{ar-active participle of}} and {{ar-passive participle of}} if/when I get around to running my bot to create those participle entries (the code is all written and a few entries have already been created -- check the links to {{ar-act-participle}} and {{ar-pass-participle}}). Benwing (talk) 14:06, 12 August 2015 (UTC)

I don't care so much about the name. Having {{ar-verb}} call it, however, may clutter the page with too many boxes when only one is needed, especially when there are two of them on consecutive lines. --WikiTiki89 14:22, 12 August 2015 (UTC)
Suppport (late reply). Good idea - Category:Arabic terms by root. I thought about it too. --Anatoli T. (обсудить/вклад) 22:59, 12 August 2015 (UTC)
I guess {{ar-verb}} can simply categorize without creating the box. Benwing (talk) 04:50, 13 August 2015 (UTC)

You are an admin now[edit]

You no longer need to use this template. Either you use {{rfd}} if you want it to be discussed, or you can speedy delete it yourself. --WikiTiki89 18:03, 16 August 2015 (UTC)

OK. I was doing this out of habit more than anything else. Benwing (talk) 02:13, 17 August 2015 (UTC)

Belarusian and Ukrainian old adjective templates[edit]

Can you orphan {{be-adj1}}, {{be-adj2}}, {{be-adj3}}, {{be-adj4}}, {{be-adj5}}, and {{uk-adj1}}? No logic is required, all you need to do is subst: them. --WikiTiki89 18:43, 17 August 2015 (UTC)

OK. Benwing (talk) 23:35, 17 August 2015 (UTC)
@Wikitiki89 Done. Benwing (talk) 11:16, 18 August 2015 (UTC)
Thanks! --WikiTiki89 11:29, 18 August 2015 (UTC)

Category:Pages with Module errors[edit]

Your recent edit to Template:ru-decl-noun has added the page отруби to this category. --kc_kennylau (talk) 06:35, 20 August 2015 (UTC)

Removing stress from monosyllabic terms[edit]

Hi Benwing #2 :), also @Wikitiki89, Cinemantique

It's something nice to have and not urgent at all (!)

Do you think it would be possible to automatically remove stresses from monosyllabic terms in Module:ru-translit and the Cyrillic forms? The only reason why they exist in declension/conjugation tables is technical (the Russian Wiktionary also uses word stresses on in such cases). It doesn't really make sense to add a stress mark for monosyllabic terms.

Having to manually transliterate words with "ё" to change "(j)ó" for "(j)o" like чёрт(čort), лёд(ljod), ёж(jož), лёг(ljog) and supplying "|notrcat=1" is also annoying.

Please note, the word stress on monosyllabic terms is still important in expressions and shouldn't be removed, it's only for standalone terms or forms. --Anatoli T. (обсудить/вклад) 02:12, 21 August 2015 (UTC)

@Benwing. --Anatoli T. (обсудить/вклад) 02:13, 21 August 2015 (UTC)
I actually find them helpful in declenion/conjugation tables and would be opposed to removing them there. In headword lines and links, I don't care. --WikiTiki89 02:16, 21 August 2015 (UTC)
How is a word stress helpful when there's only one syllable and only one word? I asked you recently the same question about Hebrew and you said you opposed the stress mark on monosyllabic terms. What's the difference? --Anatoli T. (обсудить/вклад) 03:20, 21 August 2015 (UTC)
In a declension table, the monosyllable forms are there together with polysyllabic forms. Having stress marks on all of them helps you see the pattern in the declension. For Hebrew, the situation is different. Almost all words have final stress, so the stress mark is really only helpful when the stress is non-final. Also, Hebrew doesn't have declensions, or at least not the way Russian does and not with unpredictable stress patterns. --WikiTiki89 04:53, 21 August 2015 (UTC)
I don't think that seeing "зу́б" in the table, as opposed to "зуб" helps you see the declension pattern any better - regular or irregular. If it does, then how? --Anatoli T. (обсудить/вклад) 04:59, 21 August 2015 (UTC)
As for Hebrew, even if we do have tables, I still don't want to see the stress on monosyllabic forms in any language but I want to see it on inflected forms, derivations, feminine forms, etc. --Anatoli T. (обсудить/вклад) 05:04, 21 August 2015 (UTC)
When you put one word by itself, it doesn't help much, but when it's in a table, you can more easily see that "зу́б" and "зу́ба" have stress on the same syllable than if it said "зуб" and "зу́ба". --WikiTiki89 05:07, 21 August 2015 (UTC)
No, it gives nothing. Also, it's absolutely nontraditional, at least for Russian.--Cinemantique (talk) 06:37, 22 August 2015 (UTC)


Wouldn't it be better to just re-create the dative ending with attach_stressed than to artificially move the stress? --WikiTiki89 20:05, 21 August 2015 (UTC)

I did it that way in case the dative singular has an override, on the theory that the locative singular should always be stress-moved dative singular. Benwing2 (talk) 20:10, 21 August 2015 (UTC)
But theoretically, if the dative were overridden, you wouldn't know that the last vowel is stressable. In such cases, it wouldn't be unreasonable to expect the locative to also be overidden. But I don't think there are very many irregular datives (or any at all). --WikiTiki89 20:15, 21 August 2015 (UTC)
By the way, there seems to be a bug in the noun declension module that pre-reform declensions with loc=+ do not get their endings stressed. --WikiTiki89 14:43, 11 September 2015 (UTC)
@Wikitiki89 Can you point me to an example that fails? Benwing2 (talk) 19:25, 11 September 2015 (UTC)
Couldn't find one, so I made one. --WikiTiki89 20:15, 11 September 2015 (UTC)
@Wikitiki89 Problem was the call to make_unstressed() when calling m_links.full_link() in make_table() in old-style declensions, when there's already a link present (which is the case with locatives). This is theoretically a bug in full_link() in that it ignores the alt text in these circumstances, but since I think the purpose of make_unstressed() here is just to convert ё to е, I fixed it by only making that change. Benwing2 (talk) 23:50, 11 September 2015 (UTC)
But then there would still be a bug if the word had a ё in it, even though it is unlikely (but still possible) that such a case exists. The correct solution would be to not already have the link in entry. --WikiTiki89 19:41, 12 September 2015 (UTC)

invariable declension[edit]

Do we really need a whole declension table for invariable nouns? --WikiTiki89 21:39, 21 August 2015 (UTC)

Well, I used it in a noun that was listed as invariable in its attested cases, but only attested for 4 out of 6 cases in the singular; so it's useful to have a table. Also, I'm planning on extending the module to handle multi-word expressions where each word is declined individually, and in that scenario some of the words have to be treated as invariable, e.g. in крем для бритья, where I'm thinking of a syntax like {{ru-decl-noun-multi|кре́м|для*|бритья́*}}; or in Сент-Винсент и Гренадины, it would be {{ru-decl-noun-multi|Сент-Ви́нсент|и*|Гренади́н:а|n=pl|n1=sg}}; or in Красная площадь it would be {{ru-decl-noun-multi|Кра́сн*ая|пло́щад:ь-f}}. (What I'm doing here is bunching up the separate arguments of the current template into one, with colons separating arguments for nouns in the order STEM:DECL:ACCENT:BARE:PL:BAREPL and taking advantage of default values, and adjectival declensions of the form STEM*DECL, and invariable words of the form WORD*. I think this is less ugly in these circumstances than the alternatives.) Benwing2 (talk) 22:01, 21 August 2015 (UTC)
BTW the partly-attested invariable word in question is полпути. Benwing2 (talk) 22:08, 21 August 2015 (UTC)
Well actually, in words with пол-, the second part is always in the genitive and they are really more adverbial than nouns. --WikiTiki89 22:18, 21 August 2015 (UTC)
Some of them, like полсотни, seem to have full declensions. Benwing2 (talk) 22:25, 21 August 2015 (UTC)
Well that's only because it replaces the other cases with полу-, which creates normal declinable nouns. полусотня is actually a word by itself too. --WikiTiki89 22:39, 21 August 2015 (UTC)

Pre-reform declension of дерево[edit]

Why doesn't this work: {{ru-noun-old|де́рево|-ья||дере́в|or|c|gen_pl=дере́вьевъ,дере́въ,дерёвъ}}. Also interestingly, {{ru-noun-old|де́рево|ъ-ья||дере́в|or|c|gen_pl=дере́вьевъ,дере́въ,дерёвъ}} does something strange. --WikiTiki89 21:53, 15 October 2015 (UTC)

@Wikitiki89 Oops, fixed the bug. The second one is doing something strange because ъ-ья is a declension for nouns ending in -ъ rather than -о. Benwing2 (talk) 22:14, 15 October 2015 (UTC)
Thanks. But you can say the same for -ья in the modern declension, that it is for nouns ending in - rather than -о. --WikiTiki89 00:32, 16 October 2015 (UTC)
Well, -ья is overloaded in meaning; when it stands alone or following a gender, it's recognized specially and considered a "declension variant", otherwise it's considered an explicit declension. Perhaps I should have chosen a different signal for the declension variant, since -ья was already being used as an explicit declension; but it seemed to make the most sense that way. Benwing2 (talk) 05:10, 16 October 2015 (UTC)
No, you're right. It's just I was confused by the bug and wanted clarification. --WikiTiki89 17:21, 16 October 2015 (UTC)


The declension has errors in the plurals. For example, current genitive plural пауко́в-во́лков(paukóv-vólkov) should be пауко́в-волко́в(paukóv-volkóv) instead. —Stephen (Talk) 05:29, 20 October 2015 (UTC)

@Stephen G. Brown Fixed. Thanks for noticing it. Benwing2 (talk) 05:44, 20 October 2015 (UTC)

Your bot edits regarding pre-reform entries[edit]

I don't think pre-reform entries should have pronunciation sections, since they are just duplicates of the modern entries. I also think pre-reform declensions should use the categories for modern entries or should not be placed in categories at all. --WikiTiki89 17:58, 27 October 2015 (UTC)

@Atitarev, Cinemantique What do you think? It is not very hard to change the categories and put the pre-reform entries in modern categories. It is a bit trickier to eliminate the pronunciation sections since it means writing a bot to undo the previous changes, and it's not clear to me it's a good idea in any case -- I think it might be useful to have pronunciation sections since otherwise people may be confused by the old characters. Benwing2 (talk) 21:47, 27 October 2015 (UTC)
I support centralisation of contents. No need to duplicate. Soft-redirects don't need pronunciations. Actually the same should apply to Category:Russian spellings with е instead of ё. --Anatoli T. (обсудить/вклад) 22:03, 27 October 2015 (UTC)

отсос пограничного слоя[edit]

Error in the header line. I am not familiar with that encoding, so I can’t fix it. —Stephen (Talk) 05:07, 30 October 2015 (UTC)

Thanks. Benwing2 (talk) 05:09, 30 October 2015 (UTC)

Screenshot request[edit]

Hi Benwing. Sorry to pester you, but would you mind responding to this request of mine, please? — I.S.M.E.T.A. 22:02, 10 November 2015 (UTC)

@I'm so meta even this acronym Sorry! I seem to have problems reading. I saw your response but didn't manage to read it carefully and so missed your request. I'll post the screenshot in a second. Benwing2 (talk) 22:16, 10 November 2015 (UTC)
Wonderful! Thank you so much. — I.S.M.E.T.A. 03:31, 11 November 2015 (UTC)

WingerBot source[edit]

The github link on User:WingerBot does not work anymore, is the source code no longer available? Jberkel (talk) 11:08, 11 November 2015 (UTC)


Hi. The verb отпереть has a bad conjugation. The present tense and the imperatives should be like отопру́, etc., with отопр- at the beginning. —Stephen (Talk) 21:47, 14 November 2015 (UTC)

@Atitarev, Cinemantique, Wikitiki89 I'm not really sure how the verbal templates work; Anatoli or others, can you fix it? Benwing2 (talk) 21:58, 14 November 2015 (UTC)
Mostly done but I'll have to force |past_f=отперла́, like with some other verb types. --Anatoli T. (обсудить/вклад) 22:09, 14 November 2015 (UTC)
Fixed. --Anatoli T. (обсудить/вклад) 23:30, 14 November 2015 (UTC)
@Atitarev: please check Category:Pages with module errors. Chuck Entz (talk) 01:48, 15 November 2015 (UTC)
@Atitarev: Are there also colloquial forms отпёр/отпёрло/отпёрла/отпёрли, or am I imagining them? Also, the module errors were caused by your recent edit to the module. --WikiTiki89 02:43, 15 November 2015 (UTC)
Fixed the fix. Yes, отпёр/отпёрло/отпёрла/отпёрли are colloquial forms, especially in some vulgar senses. --Anatoli T. (обсудить/вклад) 08:11, 15 November 2015 (UTC)

About the voice recordings[edit]


Our renovation is in full swing. I have to apologise again for not making the recordings still. I just don't have a quiet moment at home at the moment. I will do it as soon as I have some time when it's convenient. --Anatoli T. (обсудить/вклад) 01:45, 3 December 2015 (UTC)

OK, sounds good. Good luck with your renovation. Benwing2 (talk) 06:46, 3 December 2015 (UTC)

I am not sure if you got my ping at User talk:Atitarev/recording.--Anatoli T. (обсудить/вклад) 21:01, 5 December 2015 (UTC)

@Atitarev I didn't get your ping, not sure why. Thanks very much for the recordings! I'll check them out ASAP. Benwing2 (talk) 05:31, 6 December 2015 (UTC)
Hi. I am not 100% sure if it is a great idea to change the test cases and pronunciation rules based on my recordings and your findings right now. I've provided one way of pronouncing those words, which is not the only correct way in some cases, besides, perhaps it's better to check with some references as well and have a discussion. What do you think? @Cinemantique, Wikitiki89, Wanjuscha, Stephen G. Brown, you're welcome to comment on my recordings and pronunciation rules (final -е). Cinemantique will probably oppose some new test cases in Module:User:Benwing2/ru-pron/testcases. E.g. "дава́йте" can be both [dɐˈvajtʲe] and [dɐˈvajtʲɪ] but only the latter is referenced. --Anatoli T. (обсудить/вклад) 23:19, 7 December 2015 (UTC)
@Atitarev Are you referring to the test cases in Module:User:Benwing2/ru-pron/testcases? I put them in about a week ago. They are almost all directly based on Cinemantique's references except for interpreting Avanesov's -ь as [-e] instead of [-ɪ]. They're not based off of your recordings, which I'm still going through. The code in Module:ru-pron is based off of the same thing as the testcases, but that code isn't enabled and I won't enable it until I get some sort of consensus. BTW I've asked both User:Cinemantique and User:Wikitiki89 for comments about final -е and haven't received any recently. I'm still hoping someone can look up and post the text of the references that Wikipedia uses to justify the pronunciation of final -е in жи́теле as [ˈʐɨtʲɪlʲɛ]: They are Avanesov 1975 pages 121-125 ("Фонетика современного русского литературного языка") and Avanesov 1985 page 666 ("Сведения о произношении и ударении", in Borunova, C.N.; Vorontsova, V.L.; Yes'kova, N.A., Орфоэпический словарь русского языка. Произношение. Ударение. Грамматические формы).
If you have time to do any more recordings, take a look at the new comments in Module:User:Benwing2/ru-pron/testcases. I've tried to create a bunch of minimal pairs or near-minimal pairs that should make it much clearer whether final -е in various circumstances is pronounced the same as -я, the same as -и, or neither. Without the direct comparisons, it's harder to say whether for example "дава́йте" is [dɐˈvajtʲe] or [dɐˈvajtʲɪ] or both.
BTW almost anything would be an improvement to what we have now, where most final -е's are rendered as [ʲə]. Benwing2 (talk) 23:44, 7 December 2015 (UTC)
I listened to the recordings at User talk:Atitarev/recording and I thought they were excellent. They are clear and correct. —Stephen (Talk) 23:52, 7 December 2015 (UTC)
Thank you. Benwing2 (talk) 23:53, 7 December 2015 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Atitarev BTW, comments on your recordings: First, thank you very much for making them. In general, they are quite fast for me, so I'm probably not the best to be making comments on them. But for the most part, in most words I hear something like final [ɛ], or at least it's lower than cardinal [e]; the exceptions are primarily -ое, where (at least in some words) I clearly hear [-ə], and after hard шжц, where in some words it sounds like [-ə] and in some words [-ɨ] and in some words in between. For example, in your pronunciation of дунове́ние near the beginning, the stressed е sounds clearly like cardinal [e] and the unstressed е is clearly lower. Similarly for other words in -е́ние, and also in дружне́е near the end (and for the matter, I hear cardinal [e] rather than [ɛ] in the э́ in Тайбэ́е). In general, I don't hear [-ɪ], although it's hard to say for sure because the vowels are so short and because [ɪ] can mean various different things, e.g. English [ɪ] in words like bit and pin is quite different from [i] and is probably much lower and more central than what is intended by Russian [ɪ]. If Russian [ɪ] is intended as something that's not hugely different from [i], then the only times I can recall that I heard that sound is in девяно́сто четы́ре, два́дцать четы́ре and in да́йте in да́йте мне (in the latter case, it may be because the -е isn't phrase-final). Benwing2 (talk) 00:17, 8 December 2015 (UTC)


Should be [lʲɪˈnʲæ(j)ɪt], not [lʲɪˈnʲaɪt]. --WikiTiki89 18:53, 7 December 2015 (UTC)

@Wikitiki89 I assume that in place of [lʲɪˈnʲæ(j)ɪt] you should have either [lʲɪˈnʲæjɪt] with /j/ and fronted [æ], or [lʲɪˈnʲaɪt] without /j/ and without fronted [æ], since fronting of [æ] occurs only between palatal consonants. Both versions are indicated. Is this not correct? Benwing2 (talk) 19:09, 7 December 2015 (UTC)
Oh, I didn't even realize that that word is the reason that there were two transcriptions at волк каждый год линяет, а всё сер бывает. Anyway, it would be [lʲɪˈnʲæɪt], because the /j/ is phonemically there even if it is dropped phonetically. Same goes for -я́и-. --WikiTiki89 19:19, 7 December 2015 (UTC)
@Wikitiki89 OK, that will simplify some things, although we will still need the double transcription for words like счастливо where there's optional palatal assimilation, unless something similar with underlying phonemic /sʲ/ or /s/ is going on there as well. Benwing2 (talk) 19:27, 7 December 2015 (UTC)
Yes, счастливо will need two transcriptions. --WikiTiki89 19:29, 7 December 2015 (UTC)
@Wikitiki89, Cinemantique I'm pretty sure the conjunction а is not reduced in волк каждый год линяет, а всё сер бывает and other cases. --Anatoli T. (обсудить/вклад) 03:41, 8 December 2015 (UTC)
I'm not sure. The difference between [ɐ] and [a] is fairly small, and since it is at the beginning of a word, it would never be [ə] in either case. But I tried saying a few phrases to myself, and I think that it is in fact reduced. --WikiTiki89 16:58, 8 December 2015 (UTC)

Some pronunciation issues I came across[edit]

  1. In Комите́т госуда́рственной безопа́сности(Komitét gosudárstvennoj bezopásnosti), the /t/ should not assimilate to the /g/. Perhaps this sort of assimilation only occurs when the next syllable is stressed?
  2. In ни пу́ха ни пера́(ni púxa ni perá), the ни(ni) should be reduced to [nʲɪ]. This also applies to many short function words.
  3. I think пролива́ть све́т(prolivátʹ svét) should produce [prəlʲɪˈvatʲ͡s ˈsvʲet]. That is, the onset of the [tʲ] is still palatalized, but the rest is not; even more precisely, the depalatalization occurs before the transition from plosive to fricative, but I don't know how to incorporate such details into IPA.
  4. I think that Сме́рть шпио́нам!(Smértʹ špiónam!), should be [ˈsmʲerʈ͡ʂ ʂpʲɪˈonəm]. There may or may not still be a trace of palatalization at the onset of the [ʈ], but even if there is, there is much less than in the case above.

I honestly think we should stop trying to provide these kind of overdetailed assimilation details. --WikiTiki89 22:12, 8 December 2015 (UTC)

For #1, as it happens I recently added a feature to disable assimilations, an underscore between consonants. #2 is easily fixable with a tie bar. For #4 we can disable the code that converts palatalized retroflexes to alveolopalatals, if Anatoli agrees. #3 will require a bit more work (although not that much). I don't mind including detailed assimilation info but we can take some of it out if Anatoli agrees, eg maybe we don't need to show assimilation only of the first half of an affricate, maybe it's enough to show full palatalization. Benwing2 (talk) 22:46, 8 December 2015 (UTC)
@Wikitiki89 Does vowel reduction occur in all instances of unstressed ни? The other expressions are ни к селу, ни к городу, ни крошки, ни о ком, ни пуха, ни рыба ни мясо, ни с того ни с сего, ни в пизду, ни в Красную Армию, ни хуя, ни за что, сколько волка ни корми, он всё в лес смотрит, несмотря ни на что, как ни в чём не бывало, во что бы то ни стало. If so, I can add ни to the list of unstressed particles and it will automatically be reduced. Note that this will make it sound like не. Benwing2 (talk) 23:09, 8 December 2015 (UTC)
@Atitarev, Wikitiki89 As for #1, Anatoli do you agree that the /t/ doesn't assimilate to [d], and is there a rule for this (e.g. what Wikitiki suggested) or is it irregular? Benwing2 (talk) 23:12, 8 December 2015 (UTC)
It think it can assimilate in fast speech to normal speech like in (@Wikitiki89, it's no different from "брат Ге́ны") but I understand the concern that maybe we are going too far with "overdetailed assimilations details"? --Anatoli T. (обсудить/вклад) 23:22, 8 December 2015 (UTC)
In that case I think it's OK to indicate assimilation. I think maybe #3 above is beyond the level of what we need to worry about but the others can be handled without difficulty. Benwing2 (talk) 23:24, 8 December 2015 (UTC)
@Atitarev: You're right, I would not usually say [brad ˈgʲenɨ] either. @Benwing: I cannot think of a situation where ни is not reduced, unless it is explicitly stressed (as in что́ бы ни́ было). --WikiTiki89 15:53, 9 December 2015 (UTC)
@Wikitiki89 I added ni to the list of accentless particles. Benwing2 (talk) 13:14, 10 December 2015 (UTC)
Is there a way to force reduction for other particles? Personally, I think that any word without an accent mark should be interpreted as accentless in the pronunciation module. That would make the behavior more consistent. I also found another bug: "ото все́х" should produce [ɐtɐ ˈfsʲex], not [ɐtə ˈfsʲex]. --WikiTiki89 16:23, 10 December 2015 (UTC)
Yes, you can force reduction by putting a dot over the vowel. Benwing2 (talk) 21:52, 10 December 2015 (UTC)
Oh, I was using the wrong Unicode character when I tried that. The problem is "да̇ здра́вствует" produces [də ˈzdrafstvʊ(j)ɪt], but how can I force it to produce [dɐ ˈzdrafstvʊ(j)ɪt]? (Interestingly, in most other phrases where да is followed by a stressed syllable, such as да ну́, it for some reason remains [də ˈ-].) --WikiTiki89 23:42, 10 December 2015 (UTC)
In most cases да is stressed but not as a particle. Wikitiki89 is right, "да здравствует" should give [dɐ ˈzdrastvʊ(j)ɪt] (the first "в" is silent but there are cases when "вств" is [fstv]). --Anatoli T. (обсудить/вклад) 00:02, 11 December 2015 (UTC)
You're right, I wasn't paying attention to the word здравствует there. I think that in the cases where да is stressed, even if it is a majority of cases, it should have a stress mark. --WikiTiki89 15:19, 11 December 2015 (UTC)

There isn't currently a way to do that. Ifs there a way to predict which of the two unstressed a variants should occur? If not then I'll have to add some way to specify which one is needed. Benwing2 (talk) 00:42, 11 December 2015 (UTC)

The pronunciation of unstressed vowel is the same, no different from other cases, ɐ for immediately pretonal, ə - otherwise. Maybe да should be added to the list of prefixes and an accent would be required when it's stressed? --Anatoli T. (обсудить/вклад) 00:53, 11 December 2015 (UTC)
@Atitarev, Wikitiki89 I'll add да to the list of unstressed particles. This will affect вот это да, да здравствует, лакома кошка до рыбки, да в воду лезть не хочется, ходить вокруг да около. But as for the pronunciation of unstressed а, Wikitiki says that it's not always predictable, e.g. да здравствует with [dɐ] but да ну with [də]. Do you agree? Benwing2 (talk) 21:33, 11 December 2015 (UTC)
It's [ɐ], especially in Moscow pronunciation.--Anatoli T. (обсудить/вклад) 01:28, 12 December 2015 (UTC)
The [də] could just be a me-ism. I'll try to investigate. --WikiTiki89 16:13, 14 December 2015 (UTC)
@Atitarev Can you check the pronunciation of слышал звон, да не знает, где он? Not sure that да should be stressed here. Benwing2 (talk) 21:40, 11 December 2015 (UTC)
Checked. Could could you please hide (i.e. display as a space) the tie bar from the Cyrillic spelling when "phon=" is used, as in только что? --Anatoli T. (обсудить/вклад) 06:11, 12 December 2015 (UTC)
@Atitarev Done. Benwing2 (talk) 09:51, 12 December 2015 (UTC)
@Wikitiki89 OK, there is in fact a way of controlling whether you get [ɐ] or [ə], at least in some cases. If the word with unstressed а is considered part of the phonological word with the following stressed syllable, you'll get [ɐ], otherwise you'll get [ə]. So if you add a tie bar, e.g. ото‿все́х, you'll get [ɐ]. In this particular phrase it isn't necessary any more because I added ото to the list of unstressed particles, so it will automatically link to the next word. Benwing2 (talk) 21:47, 11 December 2015 (UTC)
Great, thanks! --WikiTiki89 16:13, 14 December 2015 (UTC)


I wonder why you marked as "inherited" a term that is apparently not even supposed to have an entry? —CodeCat 20:29, 9 December 2015 (UTC)

@CodeCat, Wikitiki89 This entry was done with a bot. I think what Wikitiki did was reasonable. Benwing2 (talk) 12:15, 10 December 2015 (UTC)

Module error at Template:ru-verb[edit]

This looks like a side effect to an edit at Module:ru-headword. Please check into it. Thanks! Chuck Entz (talk) 00:19, 11 January 2016 (UTC)

@Chuck Entz Fixed. Benwing2 (talk) 04:26, 11 January 2016 (UTC)


I have explicitly opposed accenting, especially auto-accenting, direct quotations in discussions with you in the past. Was there any recent discussion about auto-accenting that I missed before you ran your auto-accent bot? --WikiTiki89 22:22, 11 January 2016 (UTC)

@Wikitiki89 Why do you oppose? --Anatoli T. (обсудить/вклад) 22:27, 11 January 2016 (UTC)
@Atitarev: I've already explained this in many discussions in the past: We shouldn't modify direct quotations. But the point is that Benwing should have discussed this before doing a bot run, especially (but not only) because he should have known that I expressed my opposition to this past. --WikiTiki89 22:31, 11 January 2016 (UTC)
Ah, you're talking about quotations. I thought you oppose links to lemmas in inflected forms. --Anatoli T. (обсудить/вклад) 22:39, 11 January 2016 (UTC)
Yes, I said "direct quotations". Here's an example edit: diff. But even ordinary links I would have liked to discuss before the bot run. --WikiTiki89 22:59, 11 January 2016 (UTC)
@Wikitiki89 Apologies! I don't remember those discussions, otherwise I wouldn't have done it. I did an auto-accenting run once before (including quotations, I think; at the very least, my old auto-accenting script didn't have provisions to avoid direct quotations). I remember discussing it with you guys and getting assent to do it before I did it last time, that's why I didn't ask again. But it's quite possible that I somehow overlooked your opposition to auto-accenting direct quotes last time I did it. In any case, if you want, I'll write a script to undo the auto-accenting of direct quotes. If you want this done, it should probably be done to {{ux}}, {{usex}} and {{lang}}, because that's generally how direct quotes are formatted. Benwing2 (talk) 00:23, 12 January 2016 (UTC)
BTW, note that quite a lot of existing quotations have accents in them (i.e. before my auto-accenting run), which almost certainly weren't in the original. I'm not quite sure why you oppose adding accents to direct quotes (we're a dictionary, after all), but I'm fine with reverting to the status quo ante. (Do you also oppose adding ё where it should be?) Benwing2 (talk) 00:27, 12 January 2016 (UTC)
Benwing2, if you revert the accents, please restore manual (accented) transliterations there may be.
I still think it's silly to try to preserve unaccented original text in dictionaries. In texts for foreigners and children accents and "ё" are used. Besides, nobody quotes Pushkin's or Lermontov's original orthography. Chinese book publishers would have to cancel character simplification if they were to preserve the original spellings. --Anatoli T. (обсудить/вклад) 00:48, 12 January 2016 (UTC)
@Benwing2: It would be great if you could revert your bot's additions, you don't have to remove them if it wasn't your bot that added them. Keep in mind that bot edits should not be controversial, even if manual edits can be. And yes, I also oppose adding ё when it was not in the source. I thought I had mentioned that accents should not be added to quotations when you first brought up this script here, but I guess my memory must be failing me. Anyway, here is a recent BP discussion we had on the topic, in which there was no strong consensus, but overall opinion was seemingly leaning toward not modifying quotations. @Atitarev: You made those same arguments in the BP discussion I just linked to, and I already responded to them there. I suggest you re-read my comments there. --WikiTiki89 01:14, 12 January 2016 (UTC)
@Wikitiki89, Atitarev Wikitiki, as you might surmise, I agree with Anatoli, but I will revert, per bot policy. Anatoli, it will revert exactly to the state it was before my bot changed it, including any manual transliterations. (It textually substitutes the old text for the new one.) Benwing2 (talk) 04:05, 12 January 2016 (UTC)
@Wikitiki89, Atitarev There were over 26,000 substitutions made by my auto-accenting script. Of those, about 1,000 involve one of the following four templates: {{ux}}, {{lang}}, {{usex}}, {{ru-ux}}. The large majority of these 1,000 aren't direct quotes, but other sorts of illustrating phrases or sentences. I'd like to only revert the actual direct quotes. The full set of 1,000 or so replacements involving those four templates is in User:Benwing2/ru-maybe-direct-quote. Can the two of you help go through this? The thing to do is to find and delete the lines that are direct quotes. Any remaining line will be left alone and not reverted. (It might be more logical to reverse things and have you delete the lines that aren't direct quotes, but there are many more of those.) Benwing2 (talk) 07:15, 12 January 2016 (UTC)
@Wikitiki89, Atitarev BTW, if you put a line at the end of where you've gotten, it will help you and others not duplicate work. Benwing2 (talk) 07:17, 12 January 2016 (UTC)
@Atitarev, Cinemantique, Wikitiki89 I've gotten through page 7502 (end of е section), and put ?? in a couple of places where I wasn't sure whether they were direct quotes or not. There are only a few direct quotes here, so going through them is fast. Benwing2 (talk) 10:22, 12 January 2016 (UTC)
Thanks. Somehow I don't feel like going through big lists right now. I had a hard day. If it's no hurry I'll do some over time. Even if you revert them all I won't complain. It's too much work :).--Anatoli T. (обсудить/вклад) 10:48, 12 January 2016 (UTC)
Instead of manually going through the lists, you can distinguish usage examples from citations by their context:
# definition
#: usage example
# definition
#* citation
#*: direct quotation
I can live with any errors remaining after that distinction is made. --WikiTiki89 14:02, 12 January 2016 (UTC)
@Atitarev: I think a good win-win solution would be to develop a template (or additional feature in the {{usex}}/{{ux}} template) that (with the help of JavaScript) would allow readers to switch between the original text and the fully accented text (perhaps even with links). But this would mean having to manually input both versions of the text (because the original text may or may not already have sporadic accents). It could even allowing showing and hiding the transliteration. --WikiTiki89 22:16, 12 January 2016 (UTC)
@Wikitiki89: Accented original texts are extremely rare. I think fully accented text should be added and converted to unaccented + ё -> е to get the original text. That method should also work for accented Arabic and the Japanese furigana. Accented Hebrew maybe not. Chinese already uses a semi-automated conversion. --Anatoli T. (обсудить/вклад) 22:37, 12 January 2016 (UTC)
@Atitarev: Fully accented texts are of course rare, but sporadic accents are not so rare. A text that otherwise does not use "ё" might still occasionally use it in a few places for disambiguation. This is a bit rarer with stress marks, but I have seen it. In Arabic, texts very frequently add fatḥatān or šadda where applicable, or ḍamma to mark the passive. In Hebrew, I have seen רוצָה to make clear that it is feminine (see the first quotation at קומבינה). But because there isn't much consistency to this, there is no way that an automated accent remover can predict which accents were in the text and which ones weren't. But I believe I've already mentioned these things to you before. Do you still have anything against this idea? --WikiTiki89 22:54, 12 January 2016 (UTC)
@Wikitiki89 I think we can work on the assumption that the original text is unaccented. Otherwise, some overrides should be required - more complexity. As for Hebrew, I meant cases when basic letters in the accented text don't match the number of letters in the unaccented variant. I can't think of an example but I hope you know what I mean. --Anatoli T. (обсудить/вклад) 00:18, 13 January 2016 (UTC)
That's exactly the problem. I don't think we should work on any assumptions at all about the original text. There's no need for overrides either, I don't know why you're making this more complex than it was. It's pretty simple to just provide both the original text and the fully accented and linked text. --WikiTiki89 00:25, 13 January 2016 (UTC)
IMO it's a pain in the ass to have to provide two versions of every text. And it's not necessary either in most cases. It should be set up so you provide the fully accented text, and it automatically derives the unaccented text unless you provide it yourself. I think this is what Anatoli is saying too, and it's compatible with having both versions available. It's similar to what's done now where the linked version of the text is derived from the accented version by default, but you can supply both if you want. Benwing2 (talk) 02:45, 13 January 2016 (UTC)
Quotations take a good deal of work to prepare anyway. The only extra step here is to copy and paste the text before adding accents, which is insignificant compared to the work involved in transcribing, citing, and adding accents. Or if you were to do it by bot, the bot would do that part for you. The concern with duplication is not the effort required to press Ctrl+C and Ctrl+V, but that the cost of maintenance is doubled. Luckily, quotations don't need much maintenance. Anyway, if this feature were to be implemented, no one's gonna force you to use it you don't want to. But more importantly, did you read my comment above (14:02 UTC)? --WikiTiki89 03:25, 13 January 2016 (UTC)
I still disagree; there's no point in duplication if it can be avoided. I did see your comment above (14:02 UTC) and I've been thinking today about how to implement it. It might end up taking less time for me to just go through the remainder of the list manually than to write the code to locate the quotations; on the other hand doing so will make life easier if I ever do another auto-accenting run. Benwing2 (talk) 03:28, 13 January 2016 (UTC)
If it helps, I think you can simplify the rule to simply check whether the line begins with #* (with or without a subsequent string of colons). I'm also not a fan of duplication, but I see this as the only viable solution for the general case (in specific cases, we may be able to avoid the duplication). Keep in mind that I still see the original text as the main focus of the quotation and the default display, the accented version is only a bonus, and does not have to be added for every quotation. --WikiTiki89 03:53, 13 January 2016 (UTC)
Well, we disagree also in whether the focus should be on the original or accented version, so it's not clear we can find a template that will satisfy both of us. As for checking for quotations, I think I can just use a capturing regex split and it will make it easy to check the context and also snarf the template. I don't think just checking for #* + colons is necessarily enough because some direct quotes are made to be always visible, but it might catch the large majority of them. I'll also look for ref= in the {{ux}} or {{usex}}. Benwing2 (talk) 04:00, 13 January 2016 (UTC)
@Wikitiki89 I undid auto-accenting of quotes. It ended up a lot easier than I thought. I just looked for #* + possible colons, and this seemed to have caught most everything. If you see any other quotes that need fixing, let me know. Benwing2 (talk) 21:00, 13 January 2016 (UTC)
I think that's good enough as a general rule for future auto-accenting bot runs. If a quote is not correctly placed under a bullet, that is the problem of whoever added it. --WikiTiki89 21:32, 13 January 2016 (UTC)
I just remembered about subscences. If you ever plan on running your auto-translit bot again, make sure it takes into account that there could be more than one hash mark before the asterisk. --WikiTiki89 21:44, 21 January 2016 (UTC)
@Wikitiki89 OK. Benwing2 (talk) 21:58, 21 January 2016 (UTC)

игого and огого[edit]


Could you add two more exceptions to [ɡ] pronunciations (translit as "g"), please - interjections игого́(igogó) (imitates the sound of horse neighing) and огого́(ogogó) (variant of ого́(ogó))? --Anatoli T. (обсудить/вклад) 01:45, 20 January 2016 (UTC)

Automatic vocalization[edit]

Automatic vocalization and transliteration of loanwords in Arabic is usually wrong and unnecessary. I advise you to exclude those from your robot. --Mahmudmasri (talk) 07:36, 26 January 2016 (UTC)

@Mahmudmasri The automatic transliteration may be incorrect in some instances (because they can be a variant pronunciation) but it may be overwritten manually with "tr=". Manual takes precedence over automatic. I totally disagree about the vocalisation. It's the native method of showing vocalisation, e.g. مِتْرُو m(metru, metro). Arabic sources don't provide either IPA or a Roman transliteration. The only verifiable source may be a recorded evidence from a media report. --Anatoli T. (обсудить/вклад) 10:06, 26 January 2016 (UTC)
@Benwing2 This message was meant for you but you probably didn't get a notification on this account.--Anatoli T. (обсудить/вклад) 20:45, 26 January 2016 (UTC)
Anatoli, thanks for the ping. @Mahmudmasri I did that run 6 months ago and I probably won't do another such run, so don't worry. I agree with Anatoli that we should try to provide transliteration where possible. In any case, those transliterations were already there. I don't think the vocalization of loanwords is necessarily a problem; I agree that sometimes there are variant pronunciations but those mostly concern issues that won't be reflected in the vocalization (e vs. i, o vs. u, long vs. short vowels), and didn't prevent Hans Wehr from giving pronunciations for loanwords. Benwing2 (talk) 22:22, 26 January 2016 (UTC)

Wiktionary:Beer parlour/2016/January#Arabic loanwords and vocalisations. --Mahmudmasri (talk) 20:36, 6 February 2016 (UTC)

@Benwing2 --Anatoli T. (обсудить/вклад) 00:50, 7 February 2016 (UTC)

New challenges?[edit]

I would be bored if I only had to work with Russian, besides, I hate Russian politicians and the way things are going there but I work with Russian because I can. Anyway, are you interested in working with other complex languages? You've already done some amazing work with Arabic and Russian. Pity you stopped working with Arabic. Loanwords could be improved for words in Hans Wehr. What do you think of partially automating Thai transliteration? It's much less predictable than its relative Lao and is more complicated (more letters and consonant clusters). The challenge is not only transliterate but provide tones, which Lao module doesn't do (but could). I've got only basic Thai but I am keen to improve it. We also have a couple of active or semi active editors who know Thai. Resources are not great but can be used to some extent.--Anatoli T. (обсудить/вклад) 06:51, 27 January 2016 (UTC)

I stopped working with Arabic for various reasons. I didn't really have enough knowledge of the language to feel competent to add new entries, and there are tons of missing entries, and I was concerned about copyright violation if I just use what Hans Wehr's dictionary says (there are out-of-copyright Arabic dictionaries but they are unreliable or hard-to-use, full of outdated senses and missing many modern senses). But also, there weren't any other active native-speaker editors working on the language, so I felt I was working more-or-less blindly. And there wasn't anything like ruwikt that I could find -- the Arabic Wiktionary is terrible. As for Thai, I'm about to start a new job so I'm wary of diving into a new language at the moment, but it's something I'll definitely consider for the future. Benwing2 (talk) 08:13, 27 January 2016 (UTC)
Yes, sure. I can't afford too much time either and for this I would need some concentration and my books. Before you even consider this project (if you decide to) you would need to do a feasibility study. It may turn out too hard or impossible for objective reasons. --Anatoli T. (обсудить/вклад) 09:29, 27 January 2016 (UTC)

algazarra, algazara[edit]

Do we have entries for the Arabic origins? Transliteration is usually given as alḡazara, pertaining from ḡazārah (abundance). Thanks. – Jberkel (talk) 23:01, 27 January 2016 (UTC)

We don't currently have an entry for غزارة. It's been a long time since I've created any Arabic entries; I might be able to manage this though. Benwing2 (talk) 00:40, 28 January 2016 (UTC)
No rush. It already helps to have the correct lemma to link to. Jberkel (talk) 12:19, 28 January 2016 (UTC)
@Jberkel I've started غَزَارَة(ḡazāra) using Benwing's templates and modules. الغَزَارَة(al-ḡazāra) is its definite form. The entry can be checked and improved but I think it's correct. --Anatoli T. (обсудить/вклад) 12:45, 28 January 2016 (UTC)
Thanks! About the definite form, this is the one which should be displayed (3rd parameter to {{m}}) in etymologies, when al- is part of the derived word? Jberkel (talk) 13:38, 28 January 2016 (UTC)
Thanks, Anatoli! Benwing2 (talk) 13:47, 28 January 2016 (UTC)
You're welcome. @Jberkel It's up to you, I don't think we have a policy on that. Same with azáfama, definite Arabic form is الزَحْمَة(az-zaḥma). --Anatoli T. (обсудить/вклад) 22:13, 28 January 2016 (UTC)
Yeah, I agree with Anatoli, we don't seem to have a policy, but I think it's a good idea. Benwing2 (talk) 22:20, 28 January 2016 (UTC)
It depends what you want to show and how much information is reasonable to provide. In case of غَزَارَة(ḡazāra), users might also want to know that "ال" is just a definite article, not part of the word. Interesting enough, English borrowing from Arabic Riyadh is without the article, but Russian Эр-Рия́д(Er-Rijád) matches closer the original اَلرِّيَاض(ar-riyāḍ). --Anatoli T. (обсудить/вклад) 00:39, 29 January 2016 (UTC)

Hello, Benwing2, all right? About "egl" and "eml" abbreviations...[edit]

I noticed that in the page "", since I wanted to write about Emilian-Romagnolo language, I had to write the abbreviation "egl" (that conversely usually refers only to Emilian language) instead of "eml"=Emilian-Romagnol language. Do you think in the future I'll have to write "egl" again, or there is something to fix upstream? Thank you in advance, --Gloria sah (talk) 16:39, 2 February 2016 (UTC)

Appendix:Russian pronunciation[edit]


Are you interested in expanding this a bit and describing what you learned about the Russian phonology? I meant to do it but procrastinated. Otherwise we just link it to Wikipedia. --Anatoli T. (обсудить/вклад) 12:09, 3 February 2016 (UTC)

I'll try to add to it but I think linking to Wikipedia is a good idea, looks like it's already done in fact. Benwing2 (talk) 17:12, 9 February 2016 (UTC)

Russian 5a verbs that start with вы́-[edit]

It looks like the imperative verb forms should have -и(те) instead of -ь(те). See ru-wikt versions of предвидеть and вылететь. --KoreanQuoter (talk) 15:03, 9 February 2016 (UTC)

@Atitarev Also summoning our prijatel'. --KoreanQuoter (talk) 15:04, 9 February 2016 (UTC)
Thanks! (предвидеть is already correct, maybe you meant another verb?) Luckily there are only 5 verbs involved here (выглядеть, выгнать, выдержать, вылететь, выстоять) and one of them (выгнать) is already correct. I think this is fixable by adding the и argument to the other verbs. We'll have to delete the bad forms but it should only be 8 pages. Benwing2 (talk) 17:03, 9 February 2016 (UTC)
Thanks for spotting! Good job, Taeho! Fixed the tables, sorry. Some forms need to deleted and some regenerated, some have more senses - indicative and imperative.--Anatoli T. (обсудить/вклад) 20:01, 9 February 2016 (UTC)
Also fixed выкипеть. This really bugged me for several days. --KoreanQuoter (talk) 00:11, 10 February 2016 (UTC)



I just want to make sure you got the answers to all your latest questions (and making changes accordingly). I didn't ping you in my answers but I hope you're keeping track of your questions and answers to them. :) --Anatoli T. (обсудить/вклад) 02:04, 4 March 2016 (UTC)

Yup, I saw all of them and fixed things up. Thanks! Benwing2 (talk) 02:56, 4 March 2016 (UTC)

Fun With Parameter 6[edit]

Please take a look at Cat:E. Chuck Entz (talk) 04:45, 9 March 2016 (UTC)

Thanks, I'm fixing them right now. Benwing2 (talk) 04:46, 9 March 2016 (UTC)

Clean up[edit]

I think there is a good need to clean up User:Benwing2/russian-freq-redlinks. I think I have to contribute russian-freq-redlinks even more. --KoreanQuoter (talk) 05:11, 25 March 2016 (UTC)

I'll rerun it soon. BTW if you want to contribute entries you should do it from the top. Benwing2 (talk) 05:14, 25 March 2016 (UTC)
I haven't touched the list for a while, so I'll remind of myself of that. Thank you. --KoreanQuoter (talk) 05:19, 25 March 2016 (UTC)
@KoreanQuoter I have to ask you ... why do you add obscure and/or obsolete terms, like претвори́ться(pretvorítʹsja)? Are you finding them among stuff you're reading? (My dictionary says претворить is obsolete and means "change, transform"; претвориться is presumably the intransitive equivalent.) IMO it would be much more helpful to add common terms. The obscure terms are filling up Category:Russian entries needing definition, making it rather less useful. Benwing2 (talk) 06:30, 25 March 2016 (UTC)
If you're looking for common terms, start at entry 8000 in User:Benwing2/russian-freq-redlinks and go through the adjectives that are red. Benwing2 (talk) 06:31, 25 March 2016 (UTC)
I've been very busy in real life for the past several months and I just can't concentrate distinguishing whether they're common or not. I've been thinking of 2-3 months of Wikibreak or just contributing only in the weekends, but I declined. So my editing style is literally messed up at this point. --KoreanQuoter (talk) 08:15, 25 March 2016 (UTC)
@KoreanQuoter OK. Sorry to be short with you. It seems in any case that претвори́ться(pretvorítʹsja) might not be obsolete after all. But I'd still recommend choosing terms based on the top of the frequency list. Benwing2 (talk) 08:16, 25 March 2016 (UTC)
Even if I don't make new lemma entries, I would often "organize" related terms of a single lemma entry. --KoreanQuoter (talk) 08:20, 25 March 2016 (UTC)


Hello, I've seen that you started a new discussion on the Beer parlour, so I think you're a quite expert user of the Wikitionary and like to discuss. Would you like to join this discussion? It's about the improper use of asterisks for Italian words with "syntactic gemination", introduced by an Italian user without asking anyone's opinion and without a consensus, but admins say that now a consensus is needed to remove them since nobody noticed them and said anything about them during the last months. So far, the few users who commented agreed that the asterisk symbol shouldn't be used, but I think that we need more users to say that the community reached a consensus... If you want to say your opinion, you're welcome to the talk!

Hi. You should create a user account to make it easier to respond to you and such; you'll also probably get more respect that way. I did see the discussion. I'm not sure what the correct answer is; I don't work on Italian. Benwing2 (talk) 00:56, 12 April 2016 (UTC)

Okay, no problem!

In case you weren't already aware of this...[edit]

See 6 Russian entries in Cat:E. Thanks! Chuck Entz (talk) 03:20, 15 April 2016 (UTC)

@Chuck Entz Oops. Somehow I haven't been checking for errors lately. How long have they been there? Must have been awhile ... Benwing2 (talk) 03:28, 15 April 2016 (UTC)
A couple of days. With only 6, I figured I could wait to see if they might get fixed in the course of your ongoing edits to the module. Chuck Entz (talk) 03:34, 15 April 2016 (UTC)
ОК thanks. They're all real errors of mine, fixed now. Benwing2 (talk) 03:35, 15 April 2016 (UTC)

template error in Arabic jayb[edit]

In the Related Terms of Arabic jayb (I don't know how to link directly to the Arabic script: it's linked from English sine), the template has two square brackets before the word 'sinuses' but only one after, so it's not showing up correctly. This was an edit done by your little AnthroPC, so it might need some retraining on this kind of link. - 10:38, 15 April 2016 (UTC)

Thanks. I fixed it. This was actually an error in a list I compiled by hand, which the bot then propagated; so it's not a programming error. Benwing2 (talk) 18:48, 15 April 2016 (UTC)

Relocation of "was wotd"[edit]

Hi, hope you're well! I was wondering if you could do a bot run to make the following simple correction, changing:

{{was wotd|2016|May|11}}


{{was wotd|2016|May|11}}

(The date is just an example.) There are a number of occurrences where the {{was wotd}} template was placed outside rather than within the "English" section, where it should be. Thanks. — SMUconlaw (talk) 10:49, 11 May 2016 (UTC)

@Smuconlaw I'll try to get to this soon, shouldn't be too hard. Benwing2 (talk) 06:08, 13 May 2016 (UTC)
Sure, no rush! Thanks. — SMUconlaw (talk) 09:15, 13 May 2016 (UTC)
Have you got some time to work on this? — SMUconlaw (talk) 17:22, 8 September 2016 (UTC)
@Smuconlaw Oops, sorry! I totally forgot about this. I'll try to get to it tonight or tomorrow, it should be pretty easy. Benwing2 (talk) 17:45, 8 September 2016 (UTC)
No worries. Thanks! — SMUconlaw (talk) 19:10, 8 September 2016 (UTC)
@Smuconlaw Sorry again, I got sidetracked. Will do ASAP. Benwing2 (talk) 18:58, 30 September 2016 (UTC)
@Smuconlaw Done. Benwing2 (talk) 03:03, 1 October 2016 (UTC)
Thank you! — SMUconlaw (talk) 10:32, 1 October 2016 (UTC)

I was off the grid[edit]

By the way, I hadn't been contributing a lot in Wiktionary due to real life issues. Anyways, how is Anatoli? --KoreanQuoter (talk) 01:48, 19 May 2016 (UTC)

He's been away for awhile. I've been pretty busy, with Wanjuscha's help. We've finished all the verbs in the 20,000-word frequency list and are working on the adjectives in the 12,000-12,999 range. Benwing2 (talk) 02:07, 19 May 2016 (UTC)


Please see CAT:E. I notice that all of them have at least one space in them. Thanks! Chuck Entz (talk) 03:43, 2 July 2016 (UTC)

@Chuck Entz Oops. My recent code is problematic with multiword expessions. I'm fixing it now. Benwing2 (talk) 05:32, 2 July 2016 (UTC)

Elision of hamzat al-wasl after vowels[edit]

I added these test cases to Module:ar-translit. I think it would be a good idea for it to work that way. If elision is desired, the vowel is not written. This solves the issues mentioned in the comments of the module. Could you make it work if you have time? --WikiTiki89 15:43, 6 July 2016 (UTC)

OK, I'll take a look at it. Benwing2 (talk) 15:54, 6 July 2016 (UTC)


We have a problem with Сан-Диего. The г automatically becomes v. --KoreanQuoter (talk) 15:38, 7 July 2016 (UTC)

@KoreanQuoter Good pickup! Диего should also be added as an exception.--Anatoli T. (обсудить/вклад) 20:57, 7 July 2016 (UTC)
Fixed. Benwing2 (talk) 02:35, 8 July 2016 (UTC)

French testcases module[edit]

(moved to Module talk:fr-pron)


(moved to Module talk:fr-pron)


I could be wrong, but this doesn't quite look like a form of опере́ться(operétʹsja)... Chuck Entz (talk) 03:03, 6 September 2016 (UTC)

No, you couldn't be wrong. @Benwing2 --WikiTiki89 03:26, 6 September 2016 (UTC)
Oops. I deleted this earlier. There's a module bug in generating this participle of опереться and запереться. I've been putting off fixing it for a while but it needs fixing now. Benwing2 (talk) 03:33, 6 September 2016 (UTC)
@Chuck Entz, Wikitiki89 Benwing2 (talk) 03:33, 6 September 2016 (UTC)
I fixed the forms, and the module fix is in the pipe. Benwing2 (talk) 04:55, 6 September 2016 (UTC)
опереться still shows the incorrect form. --Anatoli T. (обсудить/вклад) 05:17, 6 September 2016 (UTC)
Right, that's because I haven't copied over the module in Module:User:Benwing2/ru-verb to Module:ru-verb. I turned on the test-new-module feature, which compares the output of the two for all verbs, and I'm waiting till all the verbs get refreshed, which should be done soon. Benwing2 (talk) 05:23, 6 September 2016 (UTC)
Thank you. --Anatoli T. (обсудить/вклад) 05:28, 6 September 2016 (UTC)

газировать(gazirovatʹ), пузыриться(puzyritʹsja)[edit]

These Russian terms are marked as alternative forms of the same title. Is this correct? The only thing I notice that's different is the pronunciation. DTLHS (talk) 00:24, 24 September 2016 (UTC)

I did this intentionally because the stress patterns and conjugations of the two alternants differ. Logically, there's no reason they would need to be spelled the same way. Benwing2 (talk) 00:28, 24 September 2016 (UTC)
They share the etymology. Just different pronunciations. I think they should be split by pronunciation sections only.--Anatoli T. (обсудить/вклад) 00:58, 24 September 2016 (UTC)
I think the way I've handled it now is better (diff, diff). Incidentally, I think it's a bit redundant in the conjugation template to show "2a // 2a". Perhaps it should be smart enough to figure out that they're the same. --WikiTiki89 15:25, 26 September 2016 (UTC)
Looks OK to me and I fixed the "2a // 2a" bug. Benwing2 (talk) 13:38, 27 September 2016 (UTC)

No-noschwa module errors[edit]

You removed the noschwa function from Module:fr-pron without checking to see what invoked it, leading to a number of module errors in entries using {{fr-IPA}}. Please fix. Thanks! Chuck Entz (talk) 21:56, 30 September 2016 (UTC)

Oops, sorry! My bad. Fixed now. Benwing2 (talk) 22:08, 30 September 2016 (UTC)
Quickly, too! Thanks! Chuck Entz (talk) 22:32, 30 September 2016 (UTC)
There's a bunch of French headword module errors- something to do with sort keys. DTLHS (talk) 22:46, 22 October 2016 (UTC)
@DTLHS Oops! Fixed. Benwing2 (talk) 22:49, 22 October 2016 (UTC)

Spliting etymologies for nouns and verbs[edit]

Just so you know, I agree with you that etymologies for nouns and verbs should not be split as long as they are almost the same. Thus, in these cases, I find it unfortunate to create two etymology sections in an entry layout. What I do find acceptable is to create a separate bullet "(noun) From ..." in the same etymology section in addition to "(verb) From ..." bullet, but even that could be overkill. --Dan Polansky (talk) 18:27, 8 October 2016 (UTC)

Sorry, what is this in reference to? Benwing2 (talk) 18:29, 8 October 2016 (UTC)
This is in reference to a note I thought saw you made in a Beer parlour discussion, although I cannot quickly find it any more. No action is required; I just wanted to say I agree with you. --Dan Polansky (talk) 19:42, 8 October 2016 (UTC)
OK. I presume you're referring to English in particular. I'm not sure if I actually made that comment because I think we should often have two etym sections for related nouns vs. verbs, esp. if they go back to separate words in Old English. I agree it's more arguable if one of the two is a recent formation from the other, e.g. noun invite vs. verb invite. Benwing2 (talk) 19:51, 8 October 2016 (UTC)
Well, maybe it was someone else. Next time around, I should have a specific link handy. --Dan Polansky (talk) 20:39, 8 October 2016 (UTC)

More pronunciation modules[edit]

Are you interested in making more of these? Specifically, I'm interested in working on MOD:ny-IPA for Chichewa, as part of my infrastructural preparation for creating a large number of entries in the language. I just don't have the skill to create it myself, but I can write out a series of rules to be executed. If you're too busy, however, I understand that. Thanks! —Μετάknowledgediscuss/deeds 22:46, 16 October 2016 (UTC)

If you write out the rules I can look into implementing them. Benwing2 (talk) 01:28, 17 October 2016 (UTC)
I realised that I can write them out if that's helpful, but the table at w:Chewa language#Consonants pretty much covers it all (ignore the placeholder vowels after each consonant's orthographic form in the table). The template will have to be provided with a respelled version of the word; in most cases, it will be the same as what is on the headword-line, so with acute accents /á é í ó ú ḿ/ for high tone (which can just be given thus in the IPA); one extra thing is <m'>, which should be syllabic /m/ (rather than forming a digraph with whatever comes after it). The vowels <a e i o u> are the same in IPA. The only other things to note are that <zy> should be /ʒ/, <ŵ> should be /w(ᵝ)/, and <w> in the combinations <awu ewu iwu owa uwa> should be /(w)/. —Μετάknowledgediscuss/deeds 03:37, 17 October 2016 (UTC)
Oh, and syllables always end in a vowel, except for syllabic m (which will always be respelt as <m'> or <ḿ> when fed to the template). Stress is always penultimate, and no secondary stress needs to be marked. —Μετάknowledgediscuss/deeds 03:40, 17 October 2016 (UTC)
OK, thanks. I'll look into it when I have a chance. Benwing2 (talk) 03:44, 17 October 2016 (UTC)
I've had a start at Module:ny-IPA but I'm feeling a bit out of my depth- do you think you could finish it off? DTLHS (talk) 00:13, 13 November 2016 (UTC)

Elative of حي[edit]

Just wanted to let you know that your bot has an oversight in generating elative forms (see diff) and forgets to take into account the rule that alif maqsuura becomes a plain alif after yaa'. I only noticed because an anon corrected it today. It's probably not relevant since this bot run was a long time ago and this is a rather rare scenario, but if it were me, I'd want to know. --WikiTiki89 13:33, 4 November 2016 (UTC)