One of the most common support requests we get for YOURLS goes like this: "How do I get a - (or a /, or a :) into my keywords?" People want pretty custom short URLs like sho.rt/my-talk or sho.rt/v2:final, and by default YOURLS won't let them.
The short answer is "use the get_shorturl_charset filter." The long answer is that, depending on the character, that filter is either the whole solution or just the start of one. Here's why, with the colon as the interesting case.
The easy case: the hyphen
YOURLS ships with a bundled core plugin for the hyphen, and it's trivial code: basically add - to the allowed character set.
- // Add hyphen to the allowed character set
- yourls_add_filter( 'get_shorturl_charset', 'ozh_hyphen_in_charset' );
- // Unless we are crafting a random keyword
- yourls_add_action( 'add_new_link_create_keyword', function() {
- yourls_remove_filter( 'get_shorturl_charset', 'ozh_hyphen_in_charset' );
- } );
- function ozh_hyphen_in_charset( $in ) {
- return $in . '-';
- }
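To see why one extra character in the charset is enough in this simple case, here is a minimal standalone sketch. It does not call YOURLS: sketch_sanitize_keyword() and the default charset string are hypothetical stand-ins, and it assumes keywords are cleaned by stripping every character that is not in the allowed charset.

```php
<?php
// Hypothetical stand-in for a keyword sanitizer: NOT the YOURLS API.
// Assumption: any character outside the allowed charset is stripped.
function sketch_sanitize_keyword( string $keyword, string $charset ): string {
    // Escape the charset so '-' and other regex-special characters stay literal
    $pattern = preg_quote( $charset, '@' );
    return preg_replace( '@[^' . $pattern . ']@', '', $keyword );
}

// Assumed default charset: digits and lowercase letters
$default = '0123456789abcdefghijklmnopqrstuvwxyz';

echo sketch_sanitize_keyword( 'my-talk', $default ), "\n";       // mytalk (hyphen stripped)
echo sketch_sanitize_keyword( 'my-talk', $default . '-' ), "\n"; // my-talk (hyphen kept)
```

Appending a character to the charset string is all the filter callback does; everything that validates keywords against the charset then accepts it.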
So a lot of people start building on this to add any other custom character. Things are a bit more complicated than that, though.
Then someone asks for a colon
When YOURLS starts up, one of the first things it does is evaluate the "request", i.e. the abc part in sho.rt/abc.
One of the handy bookmarklets YOURLS has is the "Prefix-n-Shorten" mechanism: just prepend your YOURLS URL to a long URL to shorten it on the fly. So the request can be a full URL, such as sho.rt/https://longurl.com/. A request like abc is unambiguous, but a request containing a colon becomes ambiguous, because it looks like a URL:
- $request = yourls_get_request();
- // if request has a scheme : send to bookmarklet
- if ( yourls_get_protocol( $request ) ) {
- // ... redirect to /admin/index.php?...
- yourls_redirect( /* admin */, 302 );
- exit;
- }
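To make the ambiguity concrete, here is a simplified standalone sketch of a scheme detector. sketch_get_protocol() is a hypothetical function, not the real yourls_get_protocol(); it assumes a scheme is a letter followed by letters, digits, +, . or -, then a colon and an optional //.

```php
<?php
// Hypothetical, simplified scheme detector: NOT the real yourls_get_protocol().
// Assumption: scheme = letter, then letters/digits/+/./-, then ':' and optional '//'.
function sketch_get_protocol( string $request ): string {
    if ( preg_match( '!^[a-zA-Z][a-zA-Z0-9+.-]*:(//)?!', $request, $matches ) ) {
        return $matches[0];
    }
    return '';
}

// A plain keyword, a Prefix-n-Shorten request, and a keyword with a colon
foreach ( [ 'abc', 'https://longurl.com/', 'v2:final' ] as $request ) {
    printf( "%-22s => '%s'\n", $request, sketch_get_protocol( $request ) );
}
```

The plain keyword yields an empty string, while both the real URL and v2:final yield a non-empty "scheme" (https:// and v2:), so a check like the one above would send the keyword to the bookmarklet.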
The plugin to allow colon
Because everything funnels through yourls_get_protocol(), and because that function is filterable (it ends with yourls_apply_filter( 'get_protocol', ... )), one filter fixes the whole chain. The trick is to teach get_protocol to recognize our own keywords and refuse to treat their colon as a scheme.
- // Add the colon to the allowed short URL character set...
- yourls_add_filter( 'get_shorturl_charset', 'ozh_colon_in_charset' );
- function ozh_colon_in_charset( $charset ) {
- return $charset . ':';
- }
- // ...unless we are crafting a random keyword, to keep auto-generated keywords colon-free
- yourls_add_action( 'add_new_link_create_keyword', function() {
- yourls_remove_filter( 'get_shorturl_charset', 'ozh_colon_in_charset' );
- } );
- // Don't let a keyword that contains a colon be mistaken for a URI scheme
- yourls_add_filter( 'get_protocol', 'ozh_colon_get_protocol', 10, 2 );
- function ozh_colon_get_protocol( $protocol, $url ) {
- // Nothing was detected as a protocol: leave as is
- if ( $protocol === '' ) {
- return $protocol;
- }
- // A genuine "scheme://..." URL (eg a bookmarklet target): leave as is
- if ( str_contains( $protocol, '/' ) ) {
- return $protocol;
- }
- // Here $protocol is a bare "scheme:" with no slashes. If the whole candidate
- // string consists only of valid short URL charset characters, it's one of our
- // keywords (eg "foo:bar"), not a URL: don't treat the colon as a scheme.
- $pattern = yourls_make_regexp_pattern( yourls_get_shorturl_charset() );
- if ( ! preg_match( '@[^' . $pattern . ']@', $url ) ) {
- return '';
- }
- return $protocol;
- }
The logic of the get_protocol filter, in plain English:
- No scheme detected? Leave it alone.
- The detected scheme contains a slash (http://)? That's a real URL, leave it alone.
- A bare scheme: with no slashes, and the entire candidate string is made only of charset characters? That's one of our keywords – return '' so the colon stops being read as a scheme.
Real URLs survive because they contain a /, a ., an @, or some other character outside the keyword charset, so they fail that last test and keep their protocol. The bookmarklet still works.
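If you want to check that reasoning without a YOURLS install, the three rules can be sketched as a standalone function. The name sketch_filter_protocol() and the charset string are assumptions made for this illustration; the plugin itself gets its charset from YOURLS.

```php
<?php
// Standalone sketch of the decision logic: hypothetical names, NOT the plugin itself.
function sketch_filter_protocol( string $protocol, string $url, string $charset ): string {
    // No scheme detected, or a genuine "scheme://": keep as is
    if ( $protocol === '' || strpos( $protocol, '/' ) !== false ) {
        return $protocol;
    }
    // Bare "scheme:": neutralize it only if every character of $url is in the charset
    $pattern = preg_quote( $charset, '@' );
    if ( ! preg_match( '@[^' . $pattern . ']@', $url ) ) {
        return '';
    }
    return $protocol;
}

// Assumed charset: digits, lowercase letters, plus the colon
$charset = '0123456789abcdefghijklmnopqrstuvwxyz:';

var_dump( sketch_filter_protocol( 'foo:', 'foo:bar', $charset ) );                   // keyword
var_dump( sketch_filter_protocol( 'https://', 'https://longurl.com/', $charset ) );  // real URL
var_dump( sketch_filter_protocol( 'mailto:', 'mailto:me@example.com', $charset ) );  // real URL, no slashes
```

Only the first call returns an empty string. The mailto: case has no slashes, but its @ and . fall outside the charset, so it keeps its protocol.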
The colon plugin is available here if you want to drop it into user/plugins/ and go: Allow Colon in YOURLS short URL.
The one limitation
If you try to shorten a URL whose entire string fits inside the keyword charset plus the colon (think tel:123 or urn:isbn), the plugin will neutralize its scheme and treat it as a keyword. In practice this should not happen, since any realistic URL you'd shorten contains a / or a . somewhere, which puts it safely outside the charset. But it's worth knowing it's there.
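A quick way to tell whether a given URL falls into this edge case is to test it against the charset yourself. The sketch below is a standalone illustration: sketch_fits_charset() is a hypothetical helper and the charset string is an assumption, not a value read from YOURLS.

```php
<?php
// Hypothetical helper: true when $url contains only characters of $charset,
// i.e. when it would be read as a keyword rather than as a URL.
function sketch_fits_charset( string $url, string $charset ): bool {
    return ! preg_match( '@[^' . preg_quote( $charset, '@' ) . ']@', $url );
}

// Assumed charset: digits, lowercase letters, plus the colon
$charset = '0123456789abcdefghijklmnopqrstuvwxyz:';

foreach ( [ 'tel:123', 'urn:isbn', 'tel:+33123', 'https://longurl.com/' ] as $url ) {
    echo $url, ' => ', sketch_fits_charset( $url, $charset ) ? 'read as keyword' : 'kept as URL', "\n";
}
```

The first two strings fit the charset entirely and would be read as keywords; a single + or / is enough to keep the other two as URLs.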
Bonus: which other characters will make tricky keywords?
A few characters are reinterpreted somewhere down the pipeline that starts at yourls-loader.php in the root: ?, + and / are special cases that need to be handled differently.
Others are modified or destroyed by the browser or at the HTTP layer before they ever reach YOURLS: #, spaces, \ and %. These should never be used in keywords at all.
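As a rough illustration of why these characters are trouble, here is what generic PHP URL handling does with a few of them. This is plain PHP, not the YOURLS loader, and sho.rt is just the example domain used throughout this post.

```php
<?php
// Everything after '#' is a fragment: browsers never send it to the server
$fragment = parse_url( 'http://sho.rt/foo#bar' );
echo $fragment['path'], "\n";   // /foo

// Everything after '?' is the query string, not part of the path
$query = parse_url( 'http://sho.rt/foo?bar' );
echo $query['path'], "\n";      // /foo

// '+' means "space" in form-style decoding, and %xx sequences get decoded
var_dump( urldecode( 'a+b' ) );       // string(3) "a b"
var_dump( rawurldecode( 'a%20b' ) );  // string(3) "a b"
```

In each case the "keyword" that arrives is not the one that was typed, which is why these characters need special handling or should be avoided.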