{"id":3945,"date":"2026-06-23T11:10:00","date_gmt":"2026-06-23T09:10:00","guid":{"rendered":"https:\/\/planetozh.com\/blog\/?p=3945"},"modified":"2026-06-23T11:10:00","modified_gmt":"2026-06-23T09:10:00","slug":"allowing-weird-characters-in-yourls-short-urls","status":"publish","type":"post","link":"https:\/\/planetozh.com\/blog\/2026\/06\/allowing-weird-characters-in-yourls-short-urls\/","title":{"rendered":"Allowing weird characters in YOURLS short URLs"},"content":{"rendered":"<p>One of the most frequent support requests we get for YOURLS goes like this: &quot;How do I get a <tt>-<\/tt> (or a <tt>\/<\/tt>, or a <tt>:<\/tt>) into my keywords?&quot; People want pretty custom short URLs like <tt>sho.rt\/my-talk<\/tt> or <tt>sho.rt\/v2:final<\/tt>, and by default YOURLS won&#39;t let them.<\/p>\n<p>The short answer is &quot;use the <tt>get_shorturl_charset<\/tt> filter.&quot; The long answer is that, depending on the character, that filter is either the whole solution or just the start of one. 
Here&#39;s why, with the colon as the interesting case.<\/p>\n<h2>The easy case: the hyphen<\/h2>\n<p>YOURLS ships with a bundled core plugin for the hyphen, and it&#39;s trivial code: basically add <tt>-<\/tt> to the allowed character set.<\/p>\n<div id=\"ig-sh-1\" class=\"syntax_hilite\">\n\n\t\t<div class=\"toolbar\">\n\n\t\t<div class=\"view-different-container\">\n\t\t\t\t\t\t<a href=\"#\" class=\"view-different\">&lt; View <span>plain text<\/span> &gt;<\/a>\n\t\t\t\t\t<\/div>\n\n\t\t<div class=\"language-name\">php<\/div>\n\n\t\t\n\t\t<br clear=\"both\">\n\n\t<\/div>\n\t\n\t<div class=\"code\">\n\t\t<ol class=\"php\" style=\"font-family:monospace\"><li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ Add hyphen to the allowed character set<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">yourls_add_filter( 'get_shorturl_charset', 'ozh_hyphen_in_charset' );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ Unless we are crafting a random keyword<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">yourls_add_action( 'add_new_link_create_keyword', function() {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; yourls_remove_filter( 
'get_shorturl_charset', 'ozh_hyphen_in_charset' );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">} );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">function ozh_hyphen_in_charset( $in ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; return $in . '-';<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">}<\/div><\/li>\n<\/ol>\t<\/div>\n\n<\/div>\n\n<p>Naturally, a lot of people start building on this to allow any custom character. Things are a bit more complicated than that.<\/p>\n<h2>Then someone asks for a colon<\/h2>\n<p>When YOURLS starts up, one of the first things it does is evaluate the &quot;request&quot;, i.e. the <tt>abc<\/tt> part in <tt>sho.rt\/abc<\/tt>.<\/p>\n<p>One of the handy bookmarklets YOURLS has is the &quot;Prefix-n-Shorten&quot; mechanism: just prepend your YOURLS URL to a long URL and shorten it on the fly. So the request can be a full URL, such as <tt>sho.rt\/https:\/\/longurl.com\/<\/tt>. 
While <tt>abc<\/tt> is an unambiguous request, a keyword containing a colon becomes ambiguous because it looks like a URL with a scheme:<\/p>\n<div id=\"ig-sh-2\" class=\"syntax_hilite\">\n\n\t\t<div class=\"toolbar\">\n\n\t\t<div class=\"view-different-container\">\n\t\t\t\t\t\t<a href=\"#\" class=\"view-different\">&lt; View <span>plain text<\/span> &gt;<\/a>\n\t\t\t\t\t<\/div>\n\n\t\t<div class=\"language-name\">php<\/div>\n\n\t\t\n\t\t<br clear=\"both\">\n\n\t<\/div>\n\t\n\t<div class=\"code\">\n\t\t<ol class=\"php\" style=\"font-family:monospace\"><li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">$request = yourls_get_request();<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ if request has a scheme : send to bookmarklet<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">if ( yourls_get_protocol( $keyword ) ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ ... 
redirect to \/admin\/index.php?...<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; yourls_redirect( \/* admin *\/, 302 );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; exit;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">}<\/div><\/li>\n<\/ol>\t<\/div>\n\n<\/div>\n\n<h2>The plugin to allow the colon<\/h2>\n<p>Because everything funnels through <tt>yourls_get_protocol()<\/tt>, and because that function is filterable (it ends with <tt>yourls_apply_filter( 'get_protocol', ... )<\/tt>), one filter fixes the whole chain. The trick is to teach <tt>get_protocol<\/tt> to recognize our own keywords and refuse to treat their colon as a scheme.<\/p>\n<div id=\"ig-sh-3\" class=\"syntax_hilite\">\n\n\t\t<div class=\"toolbar\">\n\n\t\t<div class=\"view-different-container\">\n\t\t\t\t\t\t<a href=\"#\" class=\"view-different\">&lt; View <span>plain text<\/span> &gt;<\/a>\n\t\t\t\t\t<\/div>\n\n\t\t<div class=\"language-name\">php<\/div>\n\n\t\t\n\t\t<br clear=\"both\">\n\n\t<\/div>\n\t\n\t<div class=\"code\">\n\t\t<ol class=\"php\" style=\"font-family:monospace\"><li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ Add the colon to the allowed short URL character set...<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">yourls_add_filter( 'get_shorturl_charset', 'ozh_colon_in_charset' );<\/div><\/li>\n<li style=\"font-weight: 
normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">function ozh_colon_in_charset( $charset ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; return $charset . ':';<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">}<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ ...unless we are crafting a random keyword, to keep auto-generated keywords colon-free<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">yourls_add_action( 'add_new_link_create_keyword', function() {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; yourls_remove_filter( 'get_shorturl_charset', 'ozh_colon_in_charset' );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">} );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 
1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">\/\/ Don't let a keyword that contains a colon be mistaken for a URI scheme<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">yourls_add_filter( 'get_protocol', 'ozh_colon_get_protocol', 10, 2 );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">function ozh_colon_get_protocol( $protocol, $url ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ Nothing was detected as a protocol: leave as is<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; if ( $protocol === '' ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; return $protocol;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; }<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ A genuine &quot;scheme:\/\/...&quot; URL (eg a bookmarklet target): leave as is<\/div><\/li>\n<li 
style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; if ( str_contains( $protocol, '\/' ) ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; return $protocol;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; }<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ Here $protocol is a bare &quot;scheme:&quot; with no slashes. 
If the whole candidate<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ string consists only of valid short URL charset characters, it's one of our<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; \/\/ keywords (eg &quot;foo:bar&quot;), not a URL: don't treat the colon as a scheme.<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; $pattern = yourls_make_regexp_pattern( yourls_get_shorturl_charset() );<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; if ( ! preg_match( '@[^' . $pattern . 
']@', $url ) ) {<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; return '';<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; }<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; return $protocol;<\/div><\/li>\n<li style=\"font-weight: normal;vertical-align:top\"><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">}<\/div><\/li>\n<\/ol>\t<\/div>\n\n<\/div>\n\n<p>The logic of the <tt>get_protocol<\/tt> filter, in plain English:<\/p>\n<ul>\n<li>No scheme detected? Leave it alone.<\/li>\n<li>The detected scheme contains a slash (<tt>http:\/\/<\/tt>)? That&#39;s a real URL, leave it alone.<\/li>\n<li>A bare <tt>scheme:<\/tt> with no slashes, and the entire candidate string is made only of charset characters? That&#39;s one of our keywords &#8211; return <tt>''<\/tt> so the colon stops being read as a scheme.<\/li>\n<\/ul>\n<p>Real URLs survive because they contain a <tt>\/<\/tt>, a <tt>.<\/tt>, an <tt>@<\/tt>, or some other character outside the keyword charset, so they fail that last test and keep their protocol. 
The bookmarklet still works.<\/p>\n<p>The colon plugin is available here if you want to drop it into <tt>user\/plugins\/<\/tt> and go: <a href=\"https:\/\/gist.github.com\/ozh\/ff7c454ce3fa9ce8ffeaba07f8324caa\">Allow Colon in YOURLS short URL<\/a>.<\/p>\n<h2>The one limitation<\/h2>\n<p>If you try to shorten a URL whose entire string fits inside the keyword charset plus the colon &#8211; think <tt>tel:123<\/tt> or <tt>urn:isbn<\/tt> &#8211; the plugin will neutralize its scheme and treat it as a keyword. In practice this should not happen, as any realistic URL you&#39;d shorten contains a <tt>\/<\/tt> or a <tt>.<\/tt> somewhere, which puts it safely outside the charset. Still, it&#39;s worth knowing the limitation is there.<\/p>\n<h2>Bonus: which other characters will make tricky keywords?<\/h2>\n<p>A few characters are reinterpreted somewhere down the pipeline that starts at <tt>yourls-loader.php<\/tt> in the root: <tt>?<\/tt>, <tt>+<\/tt> and <tt>\/<\/tt> are special cases that need to be handled differently.<\/p>\n<p>Others are modified or destroyed by the browser or at the HTTP layer before they reach YOURLS: <tt>#<\/tt>, spaces, <tt>\\<\/tt> and <tt>%<\/tt>. They should never be used in keywords.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the most frequent support requests we get for YOURLS goes like this: &quot;How do I get a - (or a \/, or a :) into my keywords?&quot; People want pretty custom short URLs like sho.rt\/my-talk or sho.rt\/v2:final, and by default YOURLS won&#39;t let them. 
The short answer is &quot;use the get_shorturl_charset filter.&quot; The long answer is that, depending on\u2026<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21],"tags":[359],"class_list":["post-3945","post","type-post","status-publish","format-standard","hentry","category-published","tag-yourls"],"_links":{"self":[{"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/posts\/3945","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/comments?post=3945"}],"version-history":[{"count":2,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/posts\/3945\/revisions"}],"predecessor-version":[{"id":3947,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/posts\/3945\/revisions\/3947"}],"wp:attachment":[{"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/media?parent=3945"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/categories?post=3945"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/planetozh.com\/blog\/wp-json\/wp\/v2\/tags?post=3945"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}