Software Engineer's Notes: 2010

17 December 2010

Anacron for a user

Normally anacron is launched by root. Here is how a regular user can run his or her own anacron jobs.

Prepare file and dirs

$ mkdir -p ~/etc/cron.daily ~/etc/cron.weekly ~/etc/cron.monthly \
~/var/spool/anacron
$ cp /etc/cron.daily/0anacron ~/etc/cron.daily/
$ cp /etc/cron.daily/.placeholder ~/etc/cron.daily/
$ cp /etc/cron.weekly/0anacron ~/etc/cron.weekly/
$ cp /etc/cron.weekly/.placeholder ~/etc/cron.weekly/
$ cp /etc/cron.monthly/0anacron ~/etc/cron.monthly/
$ cp /etc/cron.monthly/.placeholder ~/etc/cron.monthly/
$ find ~/etc/cron.* -name 0anacron -exec chmod u+x {} \;

~/etc/anacrontab

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=user@domain.com
LOGNAME=anacron

1   5   cron.daily   nice run-parts --report $HOME/etc/cron.daily
7   10  cron.weekly  nice run-parts --report $HOME/etc/cron.weekly
@monthly    15  cron.monthly nice run-parts --report $HOME/etc/cron.monthly

Replace the MAILTO value with your own mailbox. If the mailx package is installed, you should receive the error output of anacron jobs by e-mail.

Launch anacron on login

$ echo "
/usr/sbin/anacron -t \$HOME/etc/anacrontab -S \$HOME/var/spool/anacron
" >> ~/.bash_profile

UPDATE 20 DECEMBER 2010
Note that if ~/.profile exists, ~/.bash_profile may be ignored (as in my case). I wonder why, since the bash(1) man page says:

When bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.
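The lookup order quoted above can be sketched as a small function. This is only a model of the rule in bash(1); first_login_file and its optional home-directory argument are names made up for illustration. It says nothing about sessions that are not started as a bash login shell (for example a graphical login, which is only a guess about my case).

```shell
#!/bin/bash
# Prints the per-user startup file a bash login shell would pick,
# following the order quoted from bash(1).
# first_login_file is an illustrative name, not a bash builtin.
first_login_file() {
    local home=${1:-$HOME} f
    for f in .bash_profile .bash_login .profile; do
        if [[ -r $home/$f ]]; then      # exists and is readable
            printf '%s\n' "$home/$f"
            return 0
        fi
    done
    return 1    # none of the three files was found
}
```

Calling first_login_file without arguments checks the current user's home directory.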

Put your scripts in the ~/etc/cron.daily, ~/etc/cron.weekly and ~/etc/cron.monthly directories. Note that script file names must consist entirely of upper- and lower-case letters, digits, underscores and hyphens, since they are launched through run-parts (see run-parts(8)). For other file names, pass the --regex option to run-parts.
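To see which scripts would be silently skipped because of the naming rule, something like the following can help. It is a rough sketch that checks only the basic character set described above; skipped_by_run_parts is a hypothetical helper, not part of run-parts.

```shell
#!/bin/bash
# Lists files in a directory whose names fall outside the allowed set
# (letters, digits, underscores, hyphens).
# skipped_by_run_parts is an illustrative name.
skipped_by_run_parts() {
    local dir=$1 path name
    for path in "$dir"/*; do
        [[ -f $path ]] || continue      # also skips the unmatched glob in an empty dir
        name=${path##*/}
        if [[ ! $name =~ ^[A-Za-z0-9_-]+$ ]]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, skipped_by_run_parts ~/etc/cron.daily would print a name like backup.sh, because of the dot. Hidden files such as .placeholder are not matched by the glob and are not listed.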

25 November 2010

Shell script to back up SVN working copy changes

To copy the items shown by svn status, one can use a simple script like the following.


#!/bin/bash - 
set -o nounset                              # Treat unset variables as an error

# Displays the message, usage info and exits with error code 1
function my_usage()
{
    msg=${1:-}
    [[ $msg ]] && echo "$msg"
    echo "Usage:
    $0 source_dir destination_dir
    source_dir          Source directory
    destination_dir         Destination directory"
    exit 1
}

# Returns dir name with trailing slash
function my_get_dirname()
{
    dir=$1
    if [[ ${dir:${#dir}-1:1} != '/' ]]; then 
        dir=$dir"/"
    fi
    echo $dir
}

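# Validate args (with nounset, a missing $1 or $2 would abort the script)
if (( $# < 2 )); then
    my_usage "Source and destination directories are required"
fi

src_dir=`my_get_dirname "$1"`
dst_dir=`my_get_dirname "$2"`
verbose=1

if [[ ! -d $src_dir ]]; then 
    my_usage "'$src_dir' is not a directory"
elif [[ ! -d $dst_dir  ]]; then 
    my_usage "'$dst_dir' is not a directory"
fi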

# Remember current dir 
dir=`pwd`
cd "$src_dir"

# Loop through files and folders 
svn st | awk '{ print $2 }' | while read F
do
    if [[ $F != 'framework' ]]; then
        # Create directory, if not exists
        d=`dirname "$dst_dir$F"`
        if [[ ! -d $d ]]; then 
            mkdir -p "$d"
        fi

        # Copy file or directory
        if [[ -f $F ]]; then
            [[ $verbose = 1 ]] && echo "FILE $F"
            cp -f "$F" "$dst_dir$F"
        elif [[ -d $F ]]; then
            [[ $verbose = 1 ]] && echo "DIR $F"
            [[ ! -d "$dst_dir$F" ]] && mkdir -p "$dst_dir$F"
            cp -rf "$F" "$dst_dir$F/../"
        fi
    fi
done

# Go to the initial dir 
cd "$dir"

# vim: set textwidth=80:softtabstop=4:tabstop=4:shiftwidth=4:
# vim: set expandtab:autoindent:
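As a side note, the trailing-slash helper can also be written with parameter expansion. A minimal sketch, assuming the same contract as my_get_dirname (with_trailing_slash is just an illustrative name):

```shell
#!/bin/bash
# Prints the directory name with a trailing slash.
# "${dir%/}" drops one trailing slash if present, then one is appended.
with_trailing_slash() {
    local dir=$1
    printf '%s/\n' "${dir%/}"
}
```

Unlike echo $dir, printf with a quoted argument keeps spaces in the name intact.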

11 November 2010

Enable/Disable Apache virtual hosts on Ubuntu with Zenity

Here is a simple bash script I use to toggle virtual hosts in a GUI window.


#!/bin/bash
# get available & enabled site names and set the corresponding checkboxes in the zenity list below
vh_avail=(`ls -B /etc/apache2/sites-available/`);

i=0
s=""

while (( i< ${#vh_avail[*]} ))
do
 if [ -f /etc/apache2/sites-enabled/${vh_avail[$i]} ] ; then 
  s=$s"TRUE "
 else 
  s=$s"FALSE "
 fi
 s=$s${vh_avail[$i]}" "
 
 (( ++i ))
done

# get vhost name

vhosts=`(zenity --list --width=700 --height=500 --title="Choose vhost" \
 --checklist --column="" --column="vhost" $s)`;

case $? in 
 0) # OK
   a=(`echo "$vhosts"|tr "\|" " "`);
   i=0;
   di=0;
   
   #disable all
   
   while (( i< ${#vh_avail[*]} ))
   do
    if [ -f /etc/apache2/sites-enabled/${vh_avail[$i]} ] ; then 
     a2dissite ${vh_avail[$i]}
    fi
    (( ++i ))
   done
   
   # enable needed
   i=0;   
   while ((i < ${#a[*]}))
   do 
    if [ -f /etc/apache2/sites-available/${a[$i]} ] ; then
     a2ensite ${a[$i]}
    else
     (( di++ ))
    fi
    (( ++i ))   # advance in both branches; otherwise a missing vhost loops forever
   done 
   
   info="$(( ${#a[*]} - di )) vhosts enabled";
   
   if (( di>0 ));then
    info="$info\n$di vhosts not found in /etc/apache2/sites-available/"
    zenity --warning --text="$info"
   else 
    zenity --info --text="$info"
   fi
   
   # reload apache
   /etc/init.d/apache2 restart # reload does not always work
  ;;
 1) # Cancel
  exit 0
  ;; 
esac

So you can make a desktop launcher with the command

gksudo /share/scripts/bash/vhosts_ed.sh
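The script above collects the checklist rows in a plain string, which breaks if a site name contains a space. A possible variation, sketched under the assumption that the same two Apache directories are used, keeps the rows in an array. build_zenity_rows and zenity_rows are names made up for this example, and unlike ls -B it does not filter out backup files.

```shell
#!/bin/bash
# Fills the global array zenity_rows with TRUE/FALSE + name pairs,
# one pair per entry of the "available" directory.
# build_zenity_rows and zenity_rows are illustrative names.
build_zenity_rows() {
    local avail_dir=$1 enabled_dir=$2 path name
    zenity_rows=()
    for path in "$avail_dir"/*; do
        [[ -e $path ]] || continue      # empty directory: the glob did not match
        name=${path##*/}
        if [[ -e $enabled_dir/$name ]]; then
            zenity_rows+=(TRUE "$name")
        else
            zenity_rows+=(FALSE "$name")
        fi
    done
}
```

After build_zenity_rows /etc/apache2/sites-available /etc/apache2/sites-enabled, the rows can be passed to the zenity call as "${zenity_rows[@]}" in place of $s.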

06 November 2010

How to convert MP3 file tag encodings in Ubuntu

Converting ID3 tags for a large number of files was somewhat troublesome for me, especially with file names containing spaces and other "special" characters. I found the tools for the job and managed to convert Windows-1251 MP3 file tags to UTF-8. I also post a sample bash script here.

mp3info

This tool displays and modifies ID3 tags and is quite simple to use. I would have settled on it if it had no problems with long character sequences (it just cuts them off unpredictably!).

mid3iconv

Its job is exactly what I want: it converts ID3 tag encodings to Unicode. Usage is simplicity itself, e.g.

$ mid3iconv -e "Windows-1251" ~/Music/sample.mp3

Sample bash script

So the task is to iterate over the files and apply mid3iconv to each one. But one should take care of files that already have Unicode tags. Therefore, the following script tries to detect the encoding by means of the enca tool.


#!/bin/bash 
# Converts ID3 tag encodings of the MP3 files found under a directory.
# Requirements: 
# enca
# mp3info
# python-mutagen (mid3iconv)
#
# Example: bash mp3id3enc.sh --from-code=Windows-1251 --to-code=UTF-8

# Splits the first argument by the delimiter specified by the 2nd parameter 
# E.g. my_split_str $string $delim
# If the delimiter is not specified, the func defaults to '='
function my_split_str()
{
    if (( $# <=1 )) 
    then 
        IFS='=' 
    else 
        IFS=$2 
    fi 
    set -- $1
    echo $*
}

function usage()
{
    echo $0" [options]"
    echo "Options:
    --src-dir       Source dir
    --auto-detect-enc       Whether try to auto detect source encoding. Default: 0
    --from-code     Source text encoding. Default: Windows-1251
    --to-code       Target text encoding. Default: UTF-8
    --guess-lang        Language to use when guessing source text encoding. Default: ru
    "
    exit 1
}

# Set option defaults
src_dir='./'                # source directory
auto_detect_enc=0           # auto detect source encoding
from_code='Windows-1251'
to_code='UTF-8'
guess_lang='ru'

# Get options
for i in $*
do
  o=(`my_split_str $i '='`)
  case ${o[0]} in
    --src-dir)
        src_dir=${o[1]}
        # Add trailing slash, if not added yet
        if [[ ${src_dir#${src_dir%?}} != '/' ]]; then src_dir=$src_dir"/"; fi
      ;;
    --from-code)
      from_code=${o[1]}
      ;;
    --to-code)
      to_code=${o[1]}
      ;;
    --guess-lang)
      guess_lang=${o[1]}
      ;;
    --auto-detect-enc)
      auto_detect_enc=1
      ;;
    --help)
        usage
        exit 1
      ;;
    *)
      # unknown option
      echo "Unknown option ${o[0]}"
      usage
      exit 2
      ;;
  esac
done

find "$src_dir" -name "*.mp3" | while IFS= read -r FILENAME
do
    echo "$FILENAME..."

    # Try to detect encoding
    if [[ $auto_detect_enc = 1 ]]; then
        e=`mp3info -p "%t" "$FILENAME" | enca -gL $guess_lang`

        if [[ "$e" = *1251* ]]; then 
            from_code="Windows-1251"
        elif [[ "$e" = *CP866* || "$e" = *866* ]]; then
            from_code="CP866"
        elif [[ "$e" = *KOI8-R* ]]; then
            from_code="KOI8-R"
        else
            echo "couldn't detect encoding for "$FILENAME
            echo "$FILENAME ($e)" >> not-detected.log 
            continue
        fi

        echo "detected encoding: "$from_code
    fi

    t=`mp3info -p "%t" "$FILENAME" | enca -L ru`

    # For unrecognized encoding assume Windows-1251 
    if [[ $t = *Unrecognized* ]]; then 
        t="Windows-1251"
    fi
    echo "t='$t'"
    if [[ $t != *866* && $t != *1251* ]]; then 
        echo "$FILENAME skipped"
        continue
    fi

    mid3iconv -e "$from_code" "$FILENAME"

    echo
done

Note that a loop like


for FILENAME in $(find $src_dir -name "*.mp3"); do
...
done

fails with file names containing spaces and other special characters, because the output of find is split on whitespace.
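A more robust way to walk the files is to let find separate the names with NUL bytes. The sketch below is one possible form; collect_mp3 and mp3_files are illustrative names, and -print0 assumes GNU or BSD find.

```shell
#!/bin/bash
# Collects *.mp3 paths under a directory into the global array mp3_files.
# NUL-delimited output keeps spaces and even newlines in names intact.
# collect_mp3 and mp3_files are illustrative names.
collect_mp3() {
    local dir=$1 file
    mp3_files=()
    # Process substitution keeps the loop in the current shell,
    # so the array is still filled after the loop ends.
    while IFS= read -r -d '' file; do
        mp3_files+=("$file")
    done < <(find "$dir" -type f -name '*.mp3' -print0)
}
```

The collected names can then be processed with for f in "${mp3_files[@]}"; do ...; done.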

24 October 2010

Importing text files into Tomboy

Thanks to Scott Carpenter's post, I managed to write a Python script that imports the text files of a folder into Tomboy.

The script creates a note for each text file (".txt") using the Tomboy D-Bus API. txt2tomboy.py:


import dbus, gobject, dbus.glib
import os

# get the d-bus session bus
bus = dbus.SessionBus()
# access the tomboy d-bus object
obj = bus.get_object("org.gnome.Tomboy", "/org/gnome/Tomboy/RemoteControl")
# access the tomboy remote control interface
tomboy = dbus.Interface(obj, "org.gnome.Tomboy.RemoteControl")

extAllowed = ['.txt']
srcDir = 'path/to/notes/'
path = os.path.expanduser(srcDir)

dirlist = os.listdir(path)
dirlist.sort()

for root, dirs, files in os.walk(path):
    for filename in files: 
        #print join(root, filename)
        if os.path.splitext(filename)[1] in extAllowed:
            print os.path.join(root, filename)
            f = open(os.path.join(root, filename))
            # d-bus complains if string params aren't valid UTF-8
            title = filename

            # reset to start of file and read whole file
            f.seek(0)
            s = f.read()
            f.close()

            # creating named notes seems to prevent notes
            #   from showing up as "New Note NNN"
            note = tomboy.CreateNamedNote(title)

            # TODO: remove DisplayNote()/HideNote() in future versions
            # where the bug will hopefully be fixed.
            # Because of bug in Tomboy 1.8.0(at least I've noticed it in this version)
            # we have to workaround displaying the note before SetNoteContents :/
            tomboy.DisplayNote(note)
            tomboy.SetNoteContents(note, title + "\n\n" + s)
            tomboy.HideNote(note)

That's it. I just wanted to share what I've learnt.

P.S. This is my starting point in learning Python :)

UPDATE 25.10.2010 09:46 UTC+5

More flexible version:


#!/usr/bin/python
import dbus, gobject, dbus.glib
import os, sys, getopt

def usage():
    print __file__ + " [options]"
    print "Options:"
    print "-h --help        Display this help"
    print "-p --path        Path to source directory with text files"

def main(argv):
    path = './'                # source path 
    extAllowed = ['.txt']

    try:
        opts, args = getopt.getopt(argv[1:], "p:h", ["path=", "help"])

    except getopt.GetoptError:
        usage()
        sys.exit(2)

    for o, a in opts:
        if o in ('-h', '--help'):
            usage();
            sys.exit();

        elif o in ("-p", "--path"):
            path = os.path.expanduser(a)

    # get the d-bus session bus
    bus = dbus.SessionBus()
    # access the tomboy d-bus object
    obj = bus.get_object("org.gnome.Tomboy", "/org/gnome/Tomboy/RemoteControl")
    # access the tomboy remote control interface
    tomboy = dbus.Interface(obj, "org.gnome.Tomboy.RemoteControl")

    dirlist = os.listdir(path)
    dirlist.sort()

    for root, dirs, files in os.walk(path):
        for filename in files: 
            #print join(root, filename)
            if os.path.splitext(filename)[1] in extAllowed:
                print os.path.join(root, filename)
                f = open(os.path.join(root, filename))
                # d-bus complains if string params aren't valid UTF-8
                title = filename;

                # reset to start of file and read whole file
                f.seek(0)
                s = f.read();

                # creating named notes seems to prevent notes
                #   from showing up as "New Note NNN"
                note = tomboy.CreateNamedNote(title)
                # TODO: remove DisplayNote()/HideNote() in future versions
                # where the bug will hopefully be fixed.
                # Because of bug in Tomboy 1.8.0(at least I've noticed it in this version)
                # we have to workaround displaying the note before SetNoteContents :/
                tomboy.DisplayNote(note)
                tomboy.SetNoteContents(note, title + "\n\n" + s)
                tomboy.HideNote(note)

if __name__ == "__main__":
    main(sys.argv)

UPDATE Tue Jul 10 10:53:17 MSK 2012

The new Tomboy D-Bus API has a bug: SetNoteContents() doesn't work until the note has been displayed. I've updated the scripts with the following workaround:


tomboy.DisplayNote(note)
tomboy.SetNoteContents(note, title + "\n\n" + s)
tomboy.HideNote(note)

16 August 2010

Compiling Debian package for MySQL 5.1 with SphinxSE

I'd like to share my experience of compiling MySQL with SphinxSE the Debian way, i.e. making a DEB package from the MySQL and SphinxSE sources. I picked the easy way, using checkinstall, which implies a quick-and-dirty solution. If you want a package that complies with Debian policy, you are not in the right place (go here; this gives an overview).

Ubuntu's standard packages don't always provide the latest releases. At least this is true of both sphinxsearch and mysql-server.

If you already have MySQL installed, remove it:

$ sudo apt-get remove --purge mysql-server-5.1

UPDATE 19.08.2010 Otherwise, you should create a group and a user for MySQL as described here.

After the removal, the which mysql command should print nothing.

Download the Sphinx source tarball and untar it. We'll need just its mysqlse folder. For instance, I'm checking out the 1.10 beta release now, so I refer to the sphinx-1.10-beta folder below.

Download the MySQL source for the generic Linux platform and untar it:

$ tar -xf mysql-5.1.49.tar.gz

Currently 5.1.49 is the latest version, so I refer to the mysql-5.1.49 folder below. Copy the Sphinx mysqlse folder to mysql-5.1.49/storage/sphinx/:

$ cp -R sphinx-1.10-beta/mysqlse/ mysql-5.1.49/storage/sphinx/

Configure and precompile MySQL:

$ cd mysql-5.1.49
$ sh BUILD/autorun.sh
$ ./configure \
--prefix=/usr/local/mysql \
--with-plugins=sphinx \
--enable-assembler \
--with-mysqld-ldflags=-all-static \
--with-server-suffix=' SphinxSE 1.10 beta' \
--with-charset=utf8 
$ make && sudo make install

To get a DEB package, run checkinstall after make, instead of sudo make install:

$ sudo checkinstall -D \
--pkgname=mysql-server-5.1.49_sphinxse-1.10b \
--pkgversion=1.0

You should receive output like

...
======================== Installation successful ==========================

Copying documentation directory...
./
./README
./ChangeLog
./Docs/
./Docs/Makefile
./Docs/Makefile.am
./Docs/INSTALL-BINARY
./Docs/mysql.info
./Docs/Makefile.in
./INSTALL-SOURCE
./COPYING
./INSTALL-WIN-SOURCE
grep: /var/tmp/tmp.FFdtipwofv/newfile: No such file or directory

Copying files to the temporary directory...OK

Stripping ELF binaries and libraries...OK

Compressing man pages...OK

Building file list...OK

Building Debian package...OK

Installing Debian package...OK

Erasing temporary files...OK

Writing backup package...OK

Deleting temp dir...OK


**********************************************************************

 Done. The new package has been installed and saved to

 /home/ruslan/now/mysql-5.1.49/mysql-server-5.1.49-sphinxse-1.10b_1.0-1_i386.deb

 You can remove it from your system anytime using: 

      dpkg -r mysql-server-5.1.49-sphinxse-1.10b

**********************************************************************

Save the generated mysql-server-5.1.49-sphinxse-1.10b_1.0-1_i386.deb file somewhere, e.g. in /usr/src. Now you can install it from that file:

$ sudo dpkg -i mysql-server-5.1.49-sphinxse-1.10b_1.0-1_i386.deb

and see it in the package list:

$ dpkg --list mysql-server-5.1.49-sphinxse-1.10b 
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                        Version                     Description
+++-===========================-===========================-======================================================================
ii  mysql-server-5.1.49-sphinxs 1.0-1                       mysql-5.1.49 + SphinxSE 1.10 beta

Launch mysql server at startup:

$ sudo cp support-files/mysql.server /etc/init.d/mysql
$ sudo chmod +x /etc/init.d/mysql
$ sudo update-rc.d mysql defaults

Copy configuration:

$ sudo cp support-files/my-medium.cnf /etc/my.cnf

If you had MySQL installed before, the /etc/mysql/my.cnf file probably still exists with the old configuration. In that case overwrite it:

$ sudo cp -f support-files/my-medium.cnf /etc/mysql/my.cnf

or make the appropriate changes in it.

Make the MySQL binaries visible from everywhere, e.g. by appending the following line to ~/.bashrc:

export PATH=$PATH:/usr/local/mysql/bin:/usr/local/mysql/sbin

Now start MySQL daemon:

$ sudo /etc/init.d/mysql start

or, directly,

$ sudo /usr/local/mysql/bin/mysqld_safe --user=mysql &

basedir, datadir etc. can be overridden either in command line arguments or in /etc/init.d/mysql itself.

Now SPHINX should be in the engine list:

mysql> show engines \G
*************************** 1. row ***************************
      Engine: CSV
     Support: YES
     Comment: CSV storage engine
Transactions: NO
          XA: NO
  Savepoints: NO
*************************** 2. row ***************************
      Engine: SPHINX
     Support: YES
     Comment: Sphinx storage engine 0.9.9 ($Revision: 2391 $)
Transactions: NO
          XA: NO
  Savepoints: NO
*************************** 3. row ***************************
      Engine: MEMORY
     Support: YES
     Comment: Hash based, stored in memory, useful for temporary tables
Transactions: NO
          XA: NO
  Savepoints: NO
*************************** 4. row ***************************
      Engine: MyISAM
     Support: DEFAULT
     Comment: Default engine as of MySQL 3.23 with great performance
Transactions: NO
          XA: NO
  Savepoints: NO
*************************** 5. row ***************************
      Engine: MRG_MYISAM
     Support: YES
     Comment: Collection of identical MyISAM tables
Transactions: NO
          XA: NO
  Savepoints: NO
5 rows in set (0.00 sec)

UPDATE 18 AUGUST 2010

BTW, Ubuntu's standard mysql-server-5.1 package has a bug in the \s (status) command: it doesn't show current database info (charset, name etc.). There is no such bug after installing from the original source.

I'm still figuring out how to fix the bug with the DELETE key (it prints a tilde ("~") instead of removing a character). I happened to get rid of it on the first compile, but it appeared again on the next install. I suspect the mysql-common (5.1.41-3ubuntu12.3) package. I believe it will be fixed if I recompile mysql-server, but I have no spare time now.

04 August 2010

How to parse CLI arguments in bash

Here are some ideas on how to parse command line arguments in a bash script.

Using getopt

#!/bin/bash
echo "Before getopt"
for i
do
    echo $i
done
args=`getopt abc:d $*`
set -- $args
echo "After getopt"
for i
do
    echo "-->$i"
done

The code is from here.

Using getopts

#!/usr/bin/env bash
# cookbook filename: getopts_example
#
# using getopts
#
aflag=
bflag=
while getopts 'ab:' OPTION
do
  case $OPTION in
    a) aflag=1
    ;;
    b) bflag=1
    bval="$OPTARG"
    ;;
    ?) printf "Usage: %s: [-a] [-b value] args\n" $(basename $0) >&2
    exit 2
    ;;
  esac
done
shift $(($OPTIND - 1))

if [ "$aflag" ]
then
  printf "Option -a specified\n"
fi
if [ "$bflag" ]
then
  printf 'Option -b "%s" specified\n' "$bval"
fi
printf "Remaining arguments are: %s\n" "$*"

Source

Using sed

Here is how to parse using a regular expression.

#!/bin/bash 

# For string like part1=part2
# returns part2 
function my_arg_val()
{
  echo $1 | sed 's/\([-a-zA-Z0-9_]*=\)\|\(-[a-z]\)//'
}

# Get args
for i in $*
do
  case $i in
    --option-a=*)
      option_a=`my_arg_val $i` 
      ;;
    --option-b=*)
      option_b=`my_arg_val $i`
      ;;
    --default)
      is_default=YES
      ;;
    *)
      # unknown option
      exit 2
      ;;
  esac
done

echo " 
option_a=$option_a
option_b=$option_b
is_default=$is_default
"

Using $IFS

Here is my own script, the one I finally decided to use:

#!/bin/bash 

# Splits the first argument by the delimiter specified by the 2nd parameter 
# E.g. my_split_str $string $delim
# If the delimiter is not specified, the func defaults to '='
function my_split_str()
{
  if (( $# <=1 )) 
  then 
    IFS='=' 
  else 
    IFS=$2 
  fi 
  set -- $1
  echo $*
}

# Get args
for i in $*
do
  o=(`my_split_str $i '='`)
  case ${o[0]} in
    --option-a)
      option_a=${o[1]}
      ;;
    --option-b)
      option_b=${o[1]}
      ;;
    --default)
      is_default=YES
      ;;
    *)
      # unknown option
      echo "Unknown option ${o[0]}"
      exit 2
      ;;
  esac
done

echo " 
option_a=$option_a
option_b=$option_b
is_default=$is_default
"

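The same --name=value options can also be split with parameter expansion, which avoids touching IFS and keeps values with spaces intact when the arguments are iterated as "$@". This is only a sketch of that alternative, not the script I actually use; parse_args is an illustrative wrapper around the option names from the examples above.

```shell
#!/bin/bash
# Parses --option-a=VALUE, --option-b=VALUE and --default using
# parameter expansion instead of IFS splitting.
# parse_args is an illustrative name.
parse_args() {
    option_a= option_b= is_default=
    local arg key value
    for arg in "$@"; do
        key=${arg%%=*}      # text before the first '='
        value=${arg#*=}     # text after the first '=' (the whole arg if there is none)
        case $key in
            --option-a) option_a=$value ;;
            --option-b) option_b=$value ;;
            --default)  is_default=YES ;;
            *) echo "Unknown option $key" >&2; return 2 ;;
        esac
    done
}
```

A script would call it as parse_args "$@" and then read option_a, option_b and is_default.
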
02 August 2010

Colorful grep

This short article describes how to make the grep command highlight matches with a custom color without typing the --color option each time.

1. In ~/.bashrc find the following lines and uncomment them:

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi

2. In ~/.bash_aliases add the line:

alias grep='grep --color'

3. Append the following line to ~/.bashrc:

export GREP_COLOR='0;33'

Here '0;33' selects a yellow foreground for the matched text. The value is the color code without the leading escape sequence and without the trailing "m", since grep adds both itself. Recent GNU grep versions read GREP_COLORS='mt=0;33' instead. Now you can invoke grep without the --color option, and matches are highlighted with the custom color. The color codes are available e.g. here. You can compose a bash script like

#!/bin/bash
#
#   This file echoes a bunch of color codes to the 
#   terminal to demonstrate what's available.  Each 
#   line is the color code of one foreground color,
#   out of 17 (default + 16 escapes), followed by a 
#   test use of that color on all nine background 
#   colors (default + 8 escapes).
#

T='gYw'   # The test text

echo -e "\n                 40m     41m     42m     43m\
     44m     45m     46m     47m";

for FGs in '    m' '   1m' '  30m' '1;30m' '  31m' '1;31m' '  32m' \
           '1;32m' '  33m' '1;33m' '  34m' '1;34m' '  35m' '1;35m' \
           '  36m' '1;36m' '  37m' '1;37m';
  do FG=${FGs// /}
  echo -en " $FGs \033[$FG  $T  "
  for BG in 40m 41m 42m 43m 44m 45m 46m 47m;
    do echo -en "$EINS \033[$FG\033[$BG  $T  \033[0m";
  done
  echo;
done
echo

for reference.
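The same escape sequences can be wrapped in a small helper for use in your own scripts. A minimal sketch (colorize is an illustrative name; the code is passed without the trailing "m", e.g. 33 or "1;33"):

```shell
#!/bin/bash
# Wraps text in an SGR color sequence and resets attributes afterwards.
# colorize is an illustrative name.
colorize() {
    local code=$1; shift
    printf '\033[%sm%s\033[0m\n' "$code" "$*"
}
```

For instance, colorize 33 "warning" prints the word in yellow on most terminals.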

Related info

"Colorizing" Scripts: http://tldp.org/LDP/abs/html/colorizing.html

24 July 2010

How to handle filename encodings in Linux

Recently I upgraded a thumbnail import function in a CMS and faced an annoying problem: filename encodings differ between platforms. Filenames played a key role, as they are compared against product codes. So I was forced to guess the source encoding and convert it to UTF-8. Here are some ways to convert or detect filename encodings.

convmv

The simplest way to convert filename encodings is to use the convmv package. It supports about 124 encodings:

$ convmv --list | wc -l 
124

It can also detect whether a filename is already encoded in UTF-8; the --nosmart option switches this smartness off. For instance, Windows with a Russian locale usually encodes filenames in CP866. The following command converts filename encodings to UTF-8 within a folder and its subfolders:

$ convmv --notest -r -f cp866 -t utf8 folder
Your Perl version has fleas #37757 #49830 
mv "folder/�����.jpg" "folder/Пустыня.jpg"
mv "folder/�����_123.jpg" "folder/Коала_123.jpg"
mv "folder/������-1�.jpg" "folder/Пингвины-1А.jpg"
Ready!

To install convmv on Ubuntu:

$ sudo apt-get install convmv

enca

Detects and converts text file encodings. It can also be used to detect, e.g., a Russian filename encoding in a folder:

$ ls -1 folder | enca -L ru
IBM/MS code page 866
  LF line terminators

Here -L ru tells enca to assume the text is Russian when guessing the encoding. Of course, the encoding should be homogeneous within the folder. On my Ubuntu box it supports the following languages:

$ enca --list languages
belarussian: CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855 KOI8-U
  bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
      czech: ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
   estonian: ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
   croatian: CP1250 ISO-8859-2 IBM852 macce CORK
  hungarian: ISO-8859-2 CP1250 IBM852 macce CORK
 lithuanian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
    latvian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
     polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
    russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
     slovak: CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
    slovene: ISO-8859-2 CP1250 IBM852 macce CORK
  ukrainian: CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
    chinese: GBK BIG5 HZ
    none:

To convert just the file list, use the -x option. E.g.

$ ls -1 folder | enconv -L russian -x UTF-8
Коала_123.jpg
Пингвины-1А.jpg
afolder
cp866pics_with_folder.tar.gz
Пустыня.jpg

A shell script could iterate over the files and rename each one to the appropriate encoding. E.g.

$ cat test_enconv.sh
#!/bin/bash
export DEFAULT_CHARSET='UTF-8'
if [ $# -eq 0 ] ; then dir='./' ; else dir=$1 ; fi
files=(`ls -B $dir`);
i=0

if [ ${#files[*]} -eq 0 ] ; then 
    echo "No files in dir $dir"
    exit 0 
fi

echo "Detected encoding: "`ls $dir | enca -L russian`
cd $dir
while (( i< ${#files[*]} ))
do
    f=${files[$i]}
    if [ -f "$f" ] ; then 
        mv "$f" "$(echo "$f" | enconv -L russian)"
    fi
    (( ++i ))
done
exit 0

So the command:

$ ./test_enconv.sh folder

converts the names of the files directly inside folder (the script does not descend into subfolders). To install enca and enconv on Ubuntu:

$ sudo apt-get install enca

iconv

Converts text between a large number of encodings. On my box the list contains about this many names:

$ iconv --list | tr ',' "\n" | wc -l 
1168

It is handy and powerful. The bash script mentioned above can easily be modified to invoke iconv instead of enconv. However, the source encoding has to be detected beforehand somehow.


$ cat test_iconv.sh
#!/bin/bash
if [ $# -eq 0 ] ; then dir='./' ; else dir=$1 ; fi
files=(`ls -B $dir`);
from_encoding=$2
to_encoding='UTF-8'
i=0

if [ ${#files[*]} -eq 0 ] ; then 
    echo "No files in dir $dir"
    exit 0 
fi

cd $dir
while (( i< ${#files[*]} ))
do
    f=${files[$i]}
    if [ -f "$f" ] ; then 
        mv "$f" "$(echo "$f" | iconv --from-code=$from_encoding --to-code=$to_encoding)"
    fi
    (( ++i ))
done
exit 0

The script is called like the following:

$ ./test_iconv.sh folder CP866
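Both test scripts build the file list from ls output, so names with spaces get split. A possible variation that iterates with a glob instead is sketched below. to_utf8_name and rename_to_utf8 are names invented for this example, and the source encoding still has to be known in advance.

```shell
#!/bin/bash
# Converts one file name from a source encoding to UTF-8.
# to_utf8_name and rename_to_utf8 are illustrative names.
to_utf8_name() {
    local from=$1 name=$2
    printf '%s' "$name" | iconv --from-code="$from" --to-code=UTF-8
}

# Renames regular files directly inside a directory (no recursion).
rename_to_utf8() {
    local dir=$1 from=$2 path old new
    for path in "$dir"/*; do
        [[ -f $path ]] || continue
        old=${path##*/}
        new=$(to_utf8_name "$from" "$old") || continue   # skip names iconv rejects
        if [[ $new != "$old" ]]; then
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

It would be called like rename_to_utf8 folder CP866.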

Compressed files

Some archivers support pipe streaming, which allows extracted data to be post-processed before it is stored on the filesystem. For instance, tar has the --to-command option, which extracts files and pipes their contents to the standard input of a command. See http://www.gnu.org/software/tar/manual/tar.html#SEC84. The command could be a shell script:


$ cat dispatch_arc_file.sh 
#!/bin/bash
if [ -z "$TAR_REALNAME" ] ; then 
exit 1
fi

filename=$TAR_REALNAME      # Filename within archive
default_encoding='UTF-8'    # Target encoding 
filename_encoding=$1        # Source filename encoding

# Ignore files in folders
if [[ $filename == */* ]] ; then 
exit 0
fi

# Convert filename encoding
if [ $# -ne 0 ] ; then
filename=`echo "$filename" | iconv --from-code=$filename_encoding --to-code=$default_encoding`
fi

# Save file (tar pipes the file contents to standard input)
cat > "$filename"
exit 0

Then the following should extract files from archive.tar.gz converting filenames from CP866 encoding to UTF-8.

$ tar -xf archive.tar.gz --to-command='./dispatch_arc_file.sh CP866'

Zip is unfortunately not as flexible as tar. It is frustrating that Zip and Rar are far more popular than e.g. tar among Windows users. I wonder why archivers with such restrictive licenses prevail, while there are simple and handy open source tools like 7-zip. Nevertheless, unzip supports pipe streaming with the -p option. But it works just for bulk data, i.e. it doesn't separate the stream into files and passes all the uncompressed content to the program. I'll just quote unzip's help:

unzip -p foo | more => send contents of foo.zip via pipe into program more

Writing a program that reads Zip headers etc. is obviously not a good idea. One option is to extract the files to a folder first, and then convert the filenames with one of the above-mentioned methods, or with a simple script like this:


$ cat conv_filenames.php
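<?php
$path = rtrim($argv[1], '/');
if ($handle = opendir($path)) {
while (($file = readdir($handle)) !== false) {
     if ($file == '.' || $file == '..') continue;
     // readdir() returns bare names, so prepend the directory
     rename("$path/$file", "$path/" . iconv('CP866', 'UTF-8', $file));
}
closedir($handle);
}
?>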

Another option is to use a class like PHP's ZipArchive:


<?php
$zip = new ZipArchive;
if ($zip->open('test.zip') !== TRUE) die('failed');

for ($i=0; $i<$zip->numFiles; ++$i)
$zip->renameIndex($i, iconv('CP866','UTF-8',$zip->getNameIndex($i)));
$zip->extractTo('/my/directory/');

$zip->close();
?>

ZipArchive is available when PHP is compiled with zip support (the --enable-zip configure option; --with-zip as of PHP 7.4). The mb_convert_encoding function could be an alternative to iconv in PHP (see http://www.php.net/manual/en/function.mb-convert-encoding.php).
