The Nginx location directive matches against the unescaped URI. However, capturing a Unicode part of the URI is somewhat tricky.
First of all, we have to prepend our regular expression with (*UTF8), e.g.:
location ~* "(*UTF8)^/img/(.+)\.(jpg|jpeg|gif|png)$" {
# ...
}
Now we can use the ordinary $1, $2, ..., $N variables for the corresponding regular expression groups. In most cases they work well. For instance, we can successfully pass them to a proxy URI: the parts get URL-encoded and everything works fine, except in some cases when we use them as file/directory names, e.g.:
proxy_store $my_variable;
The problem is that a single UTF-8 character can be converted to 3 to 12 characters, because each of its 1 to 4 bytes becomes a three-character %XX sequence. A 3-character sequence in Chinese like '艾弗吉' gets converted to the 27-character string '%E8%89%BE%E5%BC%97%E5%90%89'. Obviously, we can reach Ext4's maximum filename length of 255 bytes with just 28 Chinese characters (28 x 9 = 252 characters once escaped).
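The arithmetic is easy to check outside Nginx. The snippet below is only a rough illustration in Python: it uses urllib.parse.quote to percent-encode the UTF-8 bytes, which is assumed here to approximate the escaping Nginx applies. The helper escaped_length and the constant EXT4_NAME_MAX are names made up for this example.

```python
from urllib.parse import quote

# Ext4 limits a single file name to 255 bytes.
EXT4_NAME_MAX = 255


def escaped_length(name: str) -> int:
    """Length of `name` after percent-encoding every UTF-8 byte."""
    # safe="" makes quote() escape everything except unreserved ASCII.
    return len(quote(name, safe=""))


sample = "艾弗吉"
encoded = quote(sample, safe="")
print(encoded)        # %E8%89%BE%E5%BC%97%E5%90%89
print(len(encoded))   # 27

# Each of these characters takes 3 bytes in UTF-8, i.e. 9 escaped characters.
print(escaped_length("艾" * 28))                   # 252
print(escaped_length("艾" * 29) > EXT4_NAME_MAX)   # True
```

So 28 such characters still fit, but only just: a 29th, or simply a file extension, pushes the escaped name past the limit.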
The following captures basename as an unescaped UTF-8 string:
location ~* "(*UTF8)^/img/(?<basename>(.+)\.(jpg|jpeg|gif|png))$" {
# $basename holds the unescaped UTF-8 string
}
Now we can pass $basename to directives accepting file/directory names.
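To see what the named group actually contains, here is a small sketch using Python's re module rather than Nginx itself. It relies on two assumptions: Python spells named groups (?P<name>...) instead of (?<name>...), and it already matches on decoded text, so the (*UTF8) prefix is dropped. The function capture is a hypothetical helper for this illustration only.

```python
import re

# Same structure as the location regex, adapted to Python's syntax.
# re.IGNORECASE plays the role of the "~*" modifier.
PATTERN = re.compile(
    r"^/img/(?P<basename>(.+)\.(jpg|jpeg|gif|png))$",
    re.IGNORECASE,
)


def capture(uri: str):
    """Return (basename, group 1, group 2, group 3), or None if no match."""
    m = PATTERN.match(uri)
    if m is None:
        return None
    return m.group("basename"), m.group(1), m.group(2), m.group(3)


print(capture("/img/艾弗吉.PNG"))
# ('艾弗吉.PNG', '艾弗吉.PNG', '艾弗吉', 'PNG')
print(capture("/img/notes.txt"))
# None
```

Note that the named group wraps the whole file name, extension included, and that it also takes a number: groups are counted by their opening parenthesis, so the inner name and extension groups move to positions 2 and 3.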
Hope this helps someone.