March 29, 2013

Chocolate quote

Kinda appropriate for some recent events:
Il y a autant de générosité à recevoir qu'à donner
- Julien Green

Roughly translated to
There is as much generosity in receiving as there is in giving.

March 27, 2013

A bashism a week: substrings (dynamic offset and/or length)

Last week I talked about the substring expansion bashism and left the portable replacement of substring expansions with a dynamic offset and/or length as an exercise for the readers.

The following was part of the original blog post, but it was too long to have everything in one blog post. So here is one way to portably replace said code.

Let's consider that you have the file name foo_1.23-1.dsc of a given Debian source package; you could easily find its location under the pool/ directory with the following non-portable code:
file=foo_1.23-1.dsc
echo ${file:0:1}/${file%%_*}/$file

Which can be re-written with the following, portable, code:
file=foo_1.23-1.dsc
echo ${file%${file#?}}/${file%%_*}/$file
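
The nested expansion is easier to follow when split into two steps. The following is only an illustrative sketch; the variable names rest and first are not part of the original code:

```shell
file=foo_1.23-1.dsc

# Step 1: remove the first character, leaving the remainder.
rest=${file#?}

# Step 2: remove that remainder from the end, leaving the first character.
# The quotes make the shell treat $rest literally instead of as a pattern.
first=${file%"$rest"}

printf '%s\n' "$first/${file%%_*}/$file"
```

Quoting the inner value matters whenever the string may contain pattern characters such as * or ?.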

Now, in the Debian archive, source packages whose names start with the lib prefix are further split, so the code would need to take that into consideration if file is libbar_3.2-1.dsc.

Here's a non-portable way to do it:
file=libbar_3.2-1.dsc
if [ lib = "${file:0:3}" ]; then
    length=4
else
    length=1
fi

# Note the use of a dynamic length:
echo ${file:0:$length}/${file%%_*}/$file

While here's one portable way to do it:
file=libbar_3.2-1.dsc
case "$file" in
    lib*)
        length=4
    ;;
    *)
        length=1
    ;;
esac

length_pattern=
while [ 0 -lt $length ]; do
    length_pattern="${length_pattern}?"
    length=$(($length-1))
done

echo ${file%${file#$length_pattern}}/${file%%_*}/$file

The idea is to build a pattern with as many question marks as needed and use it in the expansions. Here are two functions that can replace substring expansion as long as the offset and length are not negative (bash also supports negative values) and the requested range stays within the string.

genpattern() {
    local pat=
    local i="${1:-0}"

    while [ 0 -lt $i ]; do
        pat="${pat}?"
        i=$(($i-1))
    done
    printf %s "$pat"
}

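substr() {
    local str="${1:-}"
    local offset="${2:-0}"
    local length="${3:-}"
    local rest

    # Drop the first $offset characters
    if [ 0 -lt "$offset" ]; then
        str="${str#$(genpattern "$offset")}"
    fi

    # Without a length, keep the rest of the string, like ${var:offset}
    if [ -z "$length" ]; then
        length="${#str}"
    fi

    # Keep the first $length characters; $rest is quoted so that it is
    # removed literally instead of being used as a pattern
    rest="${str#$(genpattern "$length")}"
    printf %s "${str%"$rest"}"
}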

Note that the functions use local variables to avoid polluting the global namespace. However, the local built-in is not required by POSIX:2001.
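
If local is a concern, one possible alternative (a sketch, not the code from the post) is to use a subshell as the function body, so that assignments never reach the caller. The name genpattern_sub is made up for this illustration:

```shell
# Hypothetical local-free variant: the ( ... ) body runs in a subshell,
# so pat and i are discarded when the function returns.
genpattern_sub() (
    pat=
    i="${1:-0}"

    while [ 0 -lt "$i" ]; do
        pat="${pat}?"
        i=$(($i - 1))
    done
    printf %s "$pat"
)

pat=untouched
genpattern_sub 3
echo
echo "pat is still: $pat"
```

The trade-off is an extra process for every call, which may matter in tight loops.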

Enough about substrings!

Remember, if you rely on a non-standard behaviour or feature, make sure you document it and, if feasible, check for it at run-time.
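
As an illustration of such a run-time check, here is a minimal sketch that probes for substring expansion. The function name has_substring_expansion is hypothetical, and the probe assumes that a shell lacking the feature fails with an error, as dash does:

```shell
# Hypothetical helper: succeeds only if the running shell understands
# ${var:offset:length}. The probe is hidden inside eval, in a subshell,
# so an unsupported expansion cannot abort the calling script.
has_substring_expansion() {
    ( eval 'probe=abc; [ "${probe:1:1}" = b ]' ) 2>/dev/null
}

if has_substring_expansion; then
    echo "substring expansion available"
else
    echo "substring expansion missing, using the portable code"
fi
```

The single quotes keep the shell from parsing the expansion until eval runs it.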

March 20, 2013

A bashism a week: substrings

Sometimes obtaining a substring in a shell script is needed. This week's bashism comes in handy, as it allows one to obtain a substring by indicating the offset and, optionally, the length of the substring. This is the ${varname:offset:length} bashism, also known as substring expansion.

The portable "replacements" are simple if the offset (and the length) are static. For example, the following code would print the substring of "foo" consisting of only the last two characters:
var=foo
# Replace the bashism ${var:1} with:
echo ${var#?}

The length can then be limited with additional pattern-matching removal expansions:
var="portable code"
# Replace the bashism ${var:3:5} with the following code

# Offset is 3, so we use three ? (interrogation) characters:
part=${var#???}

# Length is 5, so we use five ? characters:
echo ${part%${part#?????}}

As can be seen, it is not impossible to replace a substring expansion.

The portable code becomes slightly more complex if the offset and/or the length are dynamic. I leave that as an exercise for the readers.

Feel free to post your code as a comment (use the <pre> tags, please) or in another public way. My own response is already scheduled to be published next week at the same time as usual.

Note: substring expansions can also be replaced with a wide variety of external commands. This is a pure-POSIX shell scripting example.
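
For comparison, here is a sketch of the external-command route for the same ${var:3:5} example, assuming the POSIX cut and awk utilities are installed. Both count characters from 1, so an offset of 3 becomes position 4:

```shell
var="portable code"

# cut selects character positions 4 through 8 (offset 3, length 5).
with_cut=$(printf '%s\n' "$var" | cut -c4-8)

# awk's substr() takes a 1-based start position and a length.
with_awk=$(awk -v s="$var" 'BEGIN { print substr(s, 4, 5) }')

printf '%s\n' "$with_cut" "$with_awk"
```

Each of these starts at least one extra process, which the pure shell version avoids.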

March 13, 2013

A bashism a week: assigning to variables and special built-ins

Assigning a value to a variable when executing a command is a way to populate the command's environment, without the variable assignment persisting after the command completes. This is not true, however, when a special built-in is the command being executed.

POSIX:2001 states that "Variable assignments specified with special built-in utilities remain in effect after the built-in completes".

Not only is this tricky because it depends on whether a utility is a special built-in or not, but the bash interpreter does not respect that behaviour of the POSIX standard. That is, special built-ins are not so "special" to the bash interpreter.

This leaves two things to take into account when assigning to a variable when executing a command: whether the command is a special built-in, and whether bash is interpreting the script.

Now, the list of special built-ins is rather short and it would be a bit unusual to perform variable assignments when calling them, except for some cases: "exec", "eval", "." (dot), and ":" (colon).

It is important to note that ":" and "true" differ in this regard; the former is a special built-in, the latter is just a regular utility. Watch out for this kind of difference when using ":" or "true" to nullify a command. For example, compare:
$ dash -c '
method=sed
# some condition or user setting ends up making:
method=true
# later:
foo=bar $method
echo foo: $foo'
foo: 
To (abridged for brevity):
$ dash -c '
method=:
foo=bar $method
echo foo: $foo'
foo: bar
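
One way to sidestep the difference, sketched here as a suggestion rather than as a rule from the standard, is to confine the assignment to a subshell. The variable then never outlives the command, whatever the command is and whichever shell runs the script:

```shell
method=:            # could just as well be "true"
unset foo

# The assignment and the command run in a subshell, so the assignment
# is discarded afterwards even if $method is a special built-in.
( foo=bar; export foo; $method )

echo "foo: ${foo-<unset>}"
```

Conversely, if the assignment is meant to persist, a separate foo=bar statement before the command achieves that in every shell.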

March 06, 2013

A bashism a week: returning

Inspired by Thorsten Glaser's comment about where you can break from, this "bashism a week" is about a behaviour not implemented by bash.

return is a special built-in utility, and it should only be used in functions and in scripts executed by the dot utility. That's what the POSIX:2001 specification requires.

If you return from any other scope, for example by accidentally calling it from a script that was not sourced but executed directly, the bash shell won't forgive you: it does not abort the execution of commands. This can lead to undesired behaviour.

A wide variety of shell interpreters silently handle such calls to return as if exit had been called.

An easy way to avoid such undesired behaviour is to follow the best practice of setting the e option, i.e. set -e. With that option set at the moment of calling return outside of the allowed scopes, bash will abort the execution, as desired.

Note, however, that the POSIX specification does not guarantee the above behaviour either, as the result in such cases is "unspecified".
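
Since the result is unspecified, another option is to make sure return is never reached outside a function in the first place. The layout below is only a sketch; the main function and its messages are made up for this example:

```shell
# Hypothetical script layout: all logic lives in main(), so every
# return is executed inside a function, where its behaviour is defined.
main() {
    if [ "$#" -eq 0 ]; then
        echo "usage: main ARG" >&2
        return 2
    fi
    echo "processing $1"
}

# A real script would end with:  main "$@"
main example
main 2>/dev/null || echo "main returned $?"
```

With main "$@" as the last command, the exit status of the script is the status returned by the function, whether the script is executed directly or sourced.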