Writing Arabic in English

My journey in creating a phonetic Arabic keyboard

A phonetic keyboard is a type of keyboard layout designed to make typing in a particular language easier by mapping characters or letters to sounds rather than their traditional positions on the keyboard.

This makes it easier for beginners or casual users to engage with the language without the learning curve of mastering an entirely different keyboard layout.

However, there are a few problems we have to overcome when creating a phonetic Arabic keyboard.

Arabic is written from right to left

This actually wasn't a problem, as we could just use our old friend from CSS:

direction: rtl;

Arabic is a cursive script

This means letters are connected to each other and look different depending on their position in a word.

Let's take the letter ب (baa) for example.

- Beginning of the word: بيت (house - byt)

- Middle of the word: حبيبي (my love - habibi)

- End of the word: كتب (write - ktb)

- Isolated: كِتَاب (book - ktab)

Luckily, the browser's text rendering (mostly) picks the right form for us, so we don't have to worry about it. Thank you, Unicode.

Except on Safari, where we have to use a Zero-Width Joiner (ZWJ). More on that here

As an aside, look at how the word for "write" (كتب) closely resembles the word for "book" (كِتَاب). This is because Arabic is a root-based language, where meaning is derived from a root of 3 or 4 letters. More on that here

And don't mind the marks around the word for "book"; we will get to those in a bit.

Mapping English to Arabic

If we want a phonetic keyboard, we need to map Arabic letters to English keys in a way that makes sense for someone who is used to typing in English.

So if someone types the letter b, we want to spit out the closest corresponding Arabic letter, in this case ب (baa).

Lucky for us, 17 of the 28 Arabic letters have a rough equivalent in English, and multiple English letters actually map to the same sound in Arabic. (Click for audio)

This case is pretty simple to handle: we can just map each English letter to its Arabic letter and then write a simple lookup function.
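Here is a rough sketch of what such a lookup could look like in TypeScript. It is not the keyboard's actual source: the names `simpleLetters`, `lookup`, and `typeWord` are placeholders, and the table only covers letters that show up in this post's examples.

```typescript
// Partial, illustrative table: only letters used in this post's examples.
const simpleLetters: Record<string, string> = {
  a: "ا",
  b: "ب",
  t: "ت",
  d: "د",
  r: "ر",
  z: "ز",
  s: "س",
  f: "ف",
  k: "ك",
  l: "ل",
  m: "م",
  n: "ن",
  h: "ه",
  y: "ي",
};

// Return the Arabic letter for a key, or the key itself when it is unmapped.
function lookup(key: string): string {
  const mapped = simpleLetters[key];
  return mapped !== undefined ? mapped : key;
}

// Convert a whole string one key at a time.
function typeWord(keys: string): string {
  return keys.split("").map(lookup).join("");
}

const house = typeWord("byt"); // "بيت"
const dog = typeWord("klb"); // "كلب"
```

Unmapped keys fall through unchanged, so spaces and punctuation survive the conversion.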

What about the other 11 letters?

I decided to break up these remaining 11 letters into two categories. The first category is the letters that bear some similarity to the simple letters we have already looked at.

For the sake of this post, let's call these "Emphatic" letters, of which there are five.

The second category is the letters that don't really bear any resemblance to English letters.

Emphatic Letters

These letters are pronounced from the back of the throat and don't really have a direct equivalent in English, but they do bear some resemblance to other Arabic letters, in my opinion.

If you are being very generous, I think you can recognize some similarities between these letters and some of the simpler letters we have already looked at.

To me ح, ص, ض, ط, and ظ are similar-ish to ه, س, د, ت, and ز respectively. Just with a little more oomph.

What do you think? (Click on the letters for audio)

  • ح is a stronger version of ه

  • ص is a stronger version of س

  • ض is a stronger version of د

  • ط is a stronger version of ت

  • This is probably the most questionable one of all 👇

  • ظ is a stronger version of ز

Anyway. You can probably see where I am going with this. We can map these emphatic letters to the uppercase versions of the letters they resemble.

So ح, ص, ض, ط, and ظ would map to H, S, D, T, and Z respectively.

In our code we keep each row of the keyboard as two arrays: one for the base case, and one for when the user is holding shift or has pressed caps lock. We then swap between the two arrays depending on the state of the keyboard.

And then we can reuse our lookup function from before.
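The two-array idea might look something like the sketch below. The `KeyboardRow` shape, the `homeRow` contents, and the function names are assumptions made for illustration, not the component's real code. The uppercase entries follow the pairing of each emphatic letter with the simple letter it resembles (ح with h, ص with s, ض with d, ط with t, ظ with z).

```typescript
// Hypothetical shape of one keyboard row.
interface KeyboardRow {
  base: string[]; // keys shown normally
  shifted: string[]; // keys shown while shift is held or caps lock is on
}

// Illustrative row only; the real layout may differ.
const homeRow: KeyboardRow = {
  base: ["a", "s", "d", "f", "g", "h", "j", "k", "l"],
  shifted: ["A", "S", "D", "F", "G", "H", "J", "K", "L"],
};

// One table serves both arrays: lowercase keys give the simple letters,
// uppercase keys give their emphatic counterparts.
const letters: Record<string, string> = {
  h: "ه",
  s: "س",
  d: "د",
  t: "ت",
  z: "ز",
  H: "ح",
  S: "ص",
  D: "ض",
  T: "ط",
  Z: "ظ",
};

// Pick which array is active based on the modifier state.
function activeKeys(row: KeyboardRow, shiftHeld: boolean, capsLock: boolean): string[] {
  return shiftHeld || capsLock ? row.shifted : row.base;
}

// Same lookup as for the simple letters: unmapped keys pass through.
function lookup(key: string): string {
  const mapped = letters[key];
  return mapped !== undefined ? mapped : key;
}

const plain = activeKeys(homeRow, false, false).map(lookup);
const withShift = activeKeys(homeRow, true, false).map(lookup);
// plain[1] is "س" and withShift[1] is "ص"
```

Because the lookup only sees the key's label, it does not need to know whether shift or caps lock produced it.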

Unique letters

I don't have a good name for these letters... let's call them "Unique" letters, of which there are six. These letters don't really have an equivalent in English, and don't really bear any resemblance to the letters we have already looked at.

So yeah, these letters are a bit of a problem, and we have to get a bit creative.

However, if you look closely, the shapes of these letters look similar to ones we have already seen, just with some dots sprinkled on top.

  • س + some dots gives us ش

  • ح + one dot gives us خ

  • ت + one dot gives us ث

  • د + one dot gives us ذ

  • I ended up just using g for ayn (ع). I don't really have a good reason, but I needed a key.
  • ع + one dot gives us غ

So how can we sprinkle dots? We are quickly running out of keys.

I landed on ' as our dot sprinkler. So s' would give us ش (sheen).

H' would give us خ (remember, capital H because ح is emphatic).

t' would give us ث, etc.

I think you can see how we're quickly coming up with our own little shorthand.
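One way to sketch the dot sprinkler is to treat ' as a key that rewrites the last letter typed instead of adding a new one. This is only my guess at an approach, not the component's actual implementation. The names `pressKey` and `typeKeys` are made up, and the `dotted` table only contains the three pairs spelled out above.

```typescript
// A few base letters so the examples run; not the full layout.
const letters: Record<string, string> = {
  s: "س",
  H: "ح",
  t: "ت",
  r: "ر",
  y: "ي",
  f: "ف",
  a: "ا",
};

// Dotted counterparts, keyed by the undotted letter already in the output.
const dotted: Record<string, string> = {
  "س": "ش", // s' gives sheen
  "ح": "خ", // H' gives khaa
  "ت": "ث", // t' gives thaa
};

// Apply a single key press to the text typed so far.
function pressKey(output: string, key: string): string {
  if (key === "'") {
    const last = output.slice(-1);
    const withDots = dotted[last];
    // Swap the last letter for its dotted form; otherwise ignore the key.
    return withDots !== undefined ? output.slice(0, -1) + withDots : output;
  }
  const letter = letters[key];
  return output + (letter !== undefined ? letter : key);
}

// Feed a whole key sequence through pressKey, starting from empty output.
function typeKeys(keys: string): string {
  return keys.split("").reduce(pressKey, "");
}

const sherif = typeKeys("s'ryf"); // "شريف"
const tea = typeKeys("s'ay"); // "شاي"
```

Handling ' as an edit to the previous letter means the user sees س first and then watches it turn into ش, which matches the "sprinkling" idea.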

Okay, that's a lot of reading. How about we take a break and do some writing using the keyboard below?

Why don't you try writing my name, Sherif: s'ryf

Or house -- byt

Or dog -- klb

Or I like to drink tea -- ana bHb as'rb s'ay

  • أنا (ana) = I
  • بحب (bahib) = like
  • أشرب (ashrab) = drink
  • شاي (shay) = tea

PS. Notice the little mark above the first ا in "I" and "drink"? That's Hamza ء. We will get to him in a bit.

PPS. Notice how the word for tea, شاي (shay), is somewhat similar to chai, or even tea.

There we go, we are now able to write all 28 Arabic letters, but there is still more to be done!

Hamza ء

Hamza is the unofficial 29th letter of the Arabic alphabet. It is used to represent a glottal stop.

It can appear with three letters: ا (alif), و (waw), and ي (ya).

Above or below alif it is written as أ or إ

Above waw it is written as ؤ

Above ya it is written as ئ. This version is often used in the middle or at the end of a word.

So how can we represent hamza on our keyboard? I decided on - to represent hamza. And if you are adding hamza to a letter, you can simply type -- after it.

Since hamza can appear above or below alif, A-- will give us أ and a-- will give us إ.

Let's look at some examples, shall we? Let's try typing on our keyboard below.

- Why don't you try writing the name Ahmed: A--Hmd

- Or Ra'is, which means president: ry--ys

- Or Izzay, "How?": a--zay

So that's it, we are now Hamza compatible. Not the most interesting feature, but a necessary one.
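As a rough illustration, the hamza rules can be written as a small table of key sequences that is checked before the single-letter lookup. This is a sketch under my own assumptions rather than the keyboard's real code. The names `hamzaTokens` and `addHamza` are hypothetical, and the waw form (ؤ) is left out because the key for waw isn't given above.

```typescript
// Multi-key sequences come first so "A--" wins over a bare "-".
const hamzaTokens: Array<[string, string]> = [
  ["A--", "\u0623"], // أ hamza above alif
  ["a--", "\u0625"], // إ hamza below alif
  ["y--", "\u0626"], // ئ hamza on ya
  ["-", "\u0621"], // ء standalone hamza
];

// A few letters so the examples run; not the full layout.
const letters: Record<string, string> = {
  a: "ا",
  H: "ح",
  m: "م",
  d: "د",
  r: "ر",
  y: "ي",
  s: "س",
  z: "ز",
};

function addHamza(input: string): string {
  let output = "";
  let i = 0;
  while (i < input.length) {
    let consumed = 1;
    const key = input.charAt(i);
    const letter = letters[key];
    // Default: a plain letter, or the key itself when it is unmapped.
    let piece = letter !== undefined ? letter : key;
    for (const [token, arabic] of hamzaTokens) {
      if (input.slice(i, i + token.length) === token) {
        piece = arabic;
        consumed = token.length;
        break;
      }
    }
    output += piece;
    i += consumed;
  }
  return output;
}

const ahmed = addHamza("A--Hmd"); // "أحمد"
const president = addHamza("ry--ys"); // "رئيس"
const how = addHamza("a--zay"); // "إزاي"
```

Checking the three-key sequences before the lone - is what keeps A-- from being read as a letter followed by two standalone hamzas.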

Diacritics

Remember those marks we saw earlier around كِتَاب? The things below the first letter and above the second letter?

Those are diacritics.

Arabic diacritics are marks placed on or around a letter that define how the word is pronounced, and therefore how it is understood.

Arabic is a consonant-based language, and short vowels are often left out of written text. However, they are important for pronunciation and grammar. Sometimes they are used to differentiate between words that are spelled the same but have different meanings.

Let's look at an example.

كَتَب (katab) - "He wrote."

كُتُب (kutub) - "Books."

To cut to the chase, we will be using = as our hotkey for diacritics.

The diacritics we will implement are the following:

Fatha (short a)

َ

This mark is written as a very tiny line above the letter. Fatḥah is what makes the sound pronounced as an extra (a) sound added to the word. You can consider the A in the word ‘car’ a Fatḥah.

Since this mark is used to indicate a short "a", we can type a= to add a Fatḥah to the letter.

Kasrah (short i)

ِ

This mark is written as a very tiny line below the letter. Kasrah is what makes the sound pronounced as an extra (i) sound added to the word. You can consider the I in the word ‘sit’ a Kasrah.

Since this mark is used to indicate a short "i", we can type i= to add a Kasrah to the letter.

Dhammah (short u)

ُ

This mark is written as a very tiny, I don't know, loop above the letter (it looks like a miniature و). Dhammah is what makes the sound pronounced as an extra (u) sound added to the word.

كُتُب (kutub) - "Books." is a good example here

Since this mark is used to indicate a short "u", we can type u= to add a Dhammah to the letter.

Sukun (de-emphasize)

ْ

This mark is written as a very tiny circle above the letter. Sukun is what makes the sound pronounced as a stop. It is used to indicate that a letter is not followed by a vowel.

We can type h= to add a Sukun to the letter. I'm not sure how I landed on h, but I did.

Shaddah (emphasize)

ّ

This mark is written as a very tiny squiggle above the letter. Shaddah is what makes the sound pronounced as an extra emphasis on the letter. It is used to indicate that a letter is doubled.

Since the name of this mark starts with the letter s, we can type s= to add a Shaddah to the letter.

Tanwin

It is written on the final letter of a word. It is pronounced as an "an", "in", or "un" sound added to the word, and comes in three forms.

Tanwin (an)

ً

Tanwin is always added to the last letter of the word. In this case it adds an "an" sound to the word.

If you notice, it looks like two Fathas َ, which gave us our short a sound.

Since this mark is used to indicate an "an" sound, we can type an= to add this Tanwin to the letter.

Tanwin (in)

ٍ

In this case it adds an "in" sound to the word.

If you notice, it looks like two Kasrahs ِ, which gave us our short i sound.

Since this mark is used to indicate an "in" sound, we can type in= to add this Tanwin to the letter.

Tanwin (un)

ٌ

In this case it adds an "un" sound to the word.

If you notice, it sorta kinda looks like two Dhammahs ُ, which gave us our short u sound.

Since this mark is used to indicate an "un" sound, we can type un= to add this Tanwin to the letter.
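To make the = hotkey concrete, here is a minimal sketch that treats each diacritic as a key sequence ending in = and matches the longest sequence first. It is an illustration under my own assumptions, not the keyboard's actual source. The names `tokens`, `byLength`, and `addDiacritics` are made up, and only three letters are included so the examples run.

```typescript
// Key sequences and the combining marks they produce.
const tokens: Record<string, string> = {
  "an=": "\u064B", // tanwin (an)
  "in=": "\u064D", // tanwin (in)
  "un=": "\u064C", // tanwin (un)
  "a=": "\u064E", // fatha
  "i=": "\u0650", // kasrah
  "u=": "\u064F", // dhammah
  "h=": "\u0652", // sukun
  "s=": "\u0651", // shaddah
  // A handful of letters so the examples run; not the full layout.
  k: "ك",
  t: "ت",
  b: "ب",
};

// Longest sequences first, so "an=" is matched whole before anything shorter.
const byLength = Object.keys(tokens).sort((x, y) => y.length - x.length);

function addDiacritics(input: string): string {
  let output = "";
  let i = 0;
  while (i < input.length) {
    let consumed = 1;
    let piece = input.charAt(i); // unmapped keys pass through unchanged
    for (const token of byLength) {
      if (input.slice(i, i + token.length) === token) {
        piece = tokens[token];
        consumed = token.length;
        break;
      }
    }
    output += piece;
    i += consumed;
  }
  return output;
}

const wrote = addDiacritics("ka=ta=b"); // "كَتَب" (katab)
const books = addDiacritics("ku=tu=b"); // "كُتُب" (kutub)
```

Because the marks are Unicode combining characters, appending one right after a letter is enough for the browser to draw it above or below that letter.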

Let's take a look at some examples.

The word for bag is shanta, written as شنطة (s'nTh'), but without diacritics we lose some pronunciation hints.

Here it is with diacritics: شَنْطَة (shanta)

Let's try typing it without diacritics first: s'nTh'

And now with diacritics: s'a=nh=Ta=h'

If you press the audio button in the keyboard you can hear the slight difference in pronunciation.

Let's write a longer sentence now. Let's try "I like tea with milk": أَنَا بَحِبّ الشاي بِاللِّبَن (ana bahib al shay bil laban)
We can write that on our keyboard with full diacritics as
A--a=na=a ba=Hi=bs= als'ay bi=alla=s=ba=n

So there we go, we can now write Arabic in English, kinda. Is it a good idea? Is it practical? I don't know. But it was fun to work on.

Implementation

The keyboard was built as a web component for ease of portability. You can view the source code here

It can be easily added to any website by adding the following code to your HTML file.

<script type="module" src="https://cdn.jsdelivr.net/npm/arabic-virtual-keyboard/+esm"></script>
<arabic-keyboard></arabic-keyboard>

Or you can install it via npm

npm i arabic-virtual-keyboard

You can read more about the project in the Documentation

The keyboard is used throughout one of my personal projects, Parallel Arabic, which is a language learning platform for Egyptian Arabic. Check it out here