On: 2012 / 10 / 17 Viewed: 4575 times
Shorter URL for this post: http://ozh.in/vh

Ho haï, blog, long time no see! :)

I was checking stuff on various URL shorteners and noticed is.gd has one interesting feature: you can generate short URLs that are "pronounceable" (no "vgfhgt"). This is a great little touch: a random but pronounceable word is more memorable and less prone to typos, which also makes it a killer feature for randomly generated passwords, for example.

Generating pronounceable random words isn't difficult:

  • start with a vowel or a consonant
  • alternate between the two, adding letters until the word reaches the desired length
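
The two steps above can be sketched in a few lines of PHP. This is only a minimal illustration: the function name `simple_alternating_word` and the letter lists are my own assumptions, not something taken from is.gd.

```php
<?php
// Hypothetical helper: strict consonant/vowel alternation, one letter at a time.
function simple_alternating_word( $length = 6 ) {
    // single consonants only (no Q, it's often awkward in words)
    $cons = array(
        'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm',
        'n', 'p', 'r', 's', 't', 'v', 'w', 'x', 'z',
    );
    // single vowels only
    $vows = array( 'a', 'e', 'i', 'o', 'u', 'y' );

    // start with a vowel or a consonant
    $use_cons = ( mt_rand( 0, 1 ) === 0 );

    $word = '';
    while ( strlen( $word ) < $length ) {
        // pick one random letter from the current group
        $pool  = $use_cons ? $cons : $vows;
        $word .= $pool[ mt_rand( 0, count( $pool ) - 1 ) ];
        // alternate
        $use_cons = ! $use_cons;
    }

    return $word;
}

// Output is random: a 6-letter word that strictly alternates consonants and vowels
echo simple_alternating_word( 6 ), "\n";
```

Since every letter is a single character, the loop always lands exactly on the requested length.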

Simple, but maybe too simple: this generates words like 'abuco', 'misolo', 'xulanipo', etc. Natural words also use a few consecutive consonant combinations as well as vowel combos ('cheepo', 'bergam', …).

I ended up with this simple piece of code that gives more natural words:

    <?php
    /**
     * Generate random pronounceable words
     *
     * @param int $length Word length
     * @return string Random word
     */
    function random_pronounceable_word( $length = 6 ) {

        // consonant sounds
        $cons = array(
            // single consonants. Beware of Q, it's often awkward in words
            'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm',
            'n', 'p', 'r', 's', 't', 'v', 'w', 'x', 'z',
            // possible combinations excluding those which cannot start a word
            'pt', 'gl', 'gr', 'ch', 'ph', 'ps', 'sh', 'st', 'th', 'wh',
        );

        // consonant combinations that cannot start a word
        $cons_cant_start = array(
            'ck', 'cm',
            'dr', 'ds',
            'ft',
            'gh', 'gn',
            'kr', 'ks',
            'ls', 'lt', 'lr',
            'mp', 'mt', 'ms',
            'ng', 'ns',
            'rd', 'rg', 'rs', 'rt',
            'ss',
            'ts', 'tch',
        );

        // vowels
        $vows = array(
            // single vowels
            'a', 'e', 'i', 'o', 'u', 'y',
            // vowel combinations your language allows
            'ee', 'oa', 'oo',
        );

        // start with a vowel or a consonant?
        $current = ( mt_rand( 0, 1 ) === 0 ? 'cons' : 'vows' );

        $word = '';

        // becomes true once $cons_cant_start has been merged into $cons
        $all_cons = false;

        while( strlen( $word ) < $length ) {

            // After the first sound, use all consonant combos (merge only once)
            if( !$all_cons && $word !== '' ) {
                $cons = array_merge( $cons, $cons_cant_start );
                $all_cons = true;
            }

            // random sound from either $cons or $vows
            $pool = ( $current === 'cons' ? $cons : $vows );
            $rnd  = $pool[ mt_rand( 0, count( $pool ) - 1 ) ];

            // only keep the sound if it fits in the word length
            if( strlen( $word . $rnd ) <= $length ) {
                $word .= $rnd;
                // alternate sounds
                $current = ( $current === 'cons' ? 'vows' : 'cons' );
            }
        }

        return $word;
    }

(Bleh, just noticed the fancy curly quotes are back in code blocks. Use the pastebin for cut'n'paste code.)

Play with the demo: random pronounceable words.

Nothing elaborate enough to help you create an alien language for your next sci-fi movie, but I'm rather pleased with the random words it creates. It's also easy to adapt this to another (human) language: modify the groups of consecutive vowels and consonants to match what exists in your language.
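
To make that last point concrete, here is a rough sketch where the sound groups are passed in as a parameter instead of being hard-coded. Everything specific here is an assumption for illustration: the function name `random_word_from_sounds`, the array keys, and the French-flavoured sound lists, which are a quick guess and not a linguistic reference.

```php
<?php
// Hypothetical variant: the sound groups are supplied by the caller.
// Assumes ASCII-only sounds (strlen() counts bytes) and at least one
// single-letter entry in both 'cons' and 'vows', so the loop can always finish.
function random_word_from_sounds( array $sounds, $length = 6 ) {
    $cons = $sounds['cons'];
    $vows = $sounds['vows'];

    // start with a vowel or a consonant
    $current = ( mt_rand( 0, 1 ) === 0 ? 'cons' : 'vows' );
    $word    = '';
    $merged  = false;

    while ( strlen( $word ) < $length ) {
        // after the first sound, also allow combos that cannot start a word
        if ( ! $merged && $word !== '' ) {
            $cons   = array_merge( $cons, $sounds['cons_cant_start'] );
            $merged = true;
        }

        $pool = ( $current === 'cons' ? $cons : $vows );
        $rnd  = $pool[ mt_rand( 0, count( $pool ) - 1 ) ];

        // keep the sound only if it fits, then alternate
        if ( strlen( $word . $rnd ) <= $length ) {
            $word   .= $rnd;
            $current = ( $current === 'cons' ? 'vows' : 'cons' );
        }
    }

    return $word;
}

// Illustrative groups only (assumption), loosely inspired by French spelling
$french = array(
    'cons'            => array( 'b', 'c', 'd', 'f', 'g', 'l', 'm', 'n', 'p', 'r', 's', 't', 'v', 'ch', 'gr', 'pl', 'tr' ),
    'cons_cant_start' => array( 'll', 'nt', 'rd', 'rt', 'ss' ),
    'vows'            => array( 'a', 'e', 'i', 'o', 'u', 'ai', 'au', 'eu', 'oi', 'ou' ),
);

echo random_word_from_sounds( $french, 7 ), "\n";
```

Swapping in another language is then just a matter of passing a different array.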


Metastuff

This entry "Generate Random Pronouceable Words" was posted on 17/10/2012 at 9:52 pm.

3 Blablas

  1. 1
    Hajo Germany »
    replied, on 05/May/13 at 11:17 am # :

    Thanks! For some internal project I ported your code to bash:

    1. #!/bin/bash
    2.  
    3. function rpw {
    4.  
    5.         cons=(b c d f g h j k l m n p r s t v w x z pt gl gr ch ph ps sh st th wh)
    6.         conscs=(ck cm dr ds ft gh gn kr ks ls lt lr mp mt ms ng ns rd rg rs rt ss ts tch)
    7.         vows=(a e i o u y ee oa oo)
    8.  
    9.         len=$((($1+0 == 0) ? 6 : $1+0))
    10.         alt=$RANDOM
    11.         word=
    12.  
    13.         while [ ${#word} -lt $len ]; do
    14.  
    15.                 if [ $(($alt%2)) -eq 0 ]; then
    16.                         rc=${cons[(($RANDOM%${#cons[*]}))]}
    17.                 else
    18.                         rc=${vows[(($RANDOM%${#vows[*]}))]}
    19.                 fi
    20.  
    21.                 if [ $((${#word}+${#rc})) -gt $len ]; then continue; fi
    22.  
    23.                 word=$word$rc
    24.  
    25.                 ((alt++))
    26.  
    27.                 if [ ${#word} -eq 1 ]; then
    28.                         cons=(${cons[@]} ${conscs[@]})
    29.                 fi
    30.  
    31.         done
    32.  
    33.         echo $word
    34. }
    35.  
    36. rpw
    37.  
    38. rpw 4
  2. 2
    Ozh »
    wrote, on 05/May/13 at 11:38 am # :

    Hajo » nice :)

  3. 3
    Hajo Germany »
    commented, on 05/May/13 at 3:48 pm # :

    Oops, an error: you have to replace lines 27-29 with this:

    if [ ${#conscs[*]} -gt 0 ]; then
            cons=(${cons[@]} ${conscs[@]})
            conscs=()
    fi
