Trim Leading and Trailing Whitespace from a String
Hello All! 🙂
I’m back after a long hiatus with a Batch file that trims leading and trailing whitespace from a string. The program won’t choke on “poison characters” and doesn’t care about the state of delayed expansion. At last, something other people may find useful! 😉
To achieve this, however, I’ve had to employ several advanced Batch programming techniques. The code may be difficult for beginners to follow, but I’ve tried to explain how the program works and I’ve included links to fuller explanations of some of the techniques used.
The program is in two parts. First, some sample strings are read in and stored in variables. The strings contain “poison characters” and lots of tabs and spaces at either end.
The names of these variables are then passed twice to the second part of the program, the :strim subroutine: the first time with delayed expansion disabled and the /l switch set to trim leading whitespace, and the second time with delayed expansion enabled and the /r switch set to trim trailing whitespace.
Please note that results must always be displayed with delayed expansion enabled to prevent echo from choking on special characters.
The :strim subroutine accepts three arguments:
- The name of the variable to contain the trimmed string.
- The name of the variable to be trimmed.
- An optional /l or /r (left or right) switch to specify whether leading or trailing whitespace should be trimmed from the string (both sides are trimmed by default).
The subroutine will throw an error if the variable to be trimmed is undefined or contains a line feed. If the variable to be trimmed consists entirely of tabs and/or spaces, :strim will pare it down to nothing, output a warning message, and return an undefined (i.e., empty) result variable.
Assuming all goes well, the :strim subroutine will return the trimmed string stored in the result variable. Specify the same variable name twice when calling :strim if you want the original string to be overwritten by the trimmed string.
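Batch is awkward to run outside Windows, so the sketch below models the :strim calling convention in Python. This is a rough model of the behaviour described above and not the author's code. Several details are assumptions: a dict stands in for the environment, errors are raised as ValueError, and the warning wording is made up. Python's built-in strip methods also stand in for the real trimming logic.

```python
# A Python model of the :strim calling convention described above.
# Assumptions (not from the original Batch code): the environment is a dict,
# errors are raised as ValueError, and the warning text is made up.

WHITESPACE = " \t"  # only spaces and tabs count as whitespace here


def strim(env, result_name, var_name, side=None):
    """Trim env[var_name] and store the result in env[result_name].

    side: None trims both ends, "/l" trims leading, "/r" trims trailing.
    """
    if side not in (None, "/l", "/r"):
        raise ValueError(f"unknown switch {side!r}")
    # In Batch an empty variable is the same as an undefined one.
    if not env.get(var_name):
        raise ValueError(f"{var_name} is not defined")
    value = env[var_name]
    if "\n" in value:
        raise ValueError(f"{var_name} contains a line feed")
    if not value.strip(WHITESPACE):
        print(f"Warning: {var_name} contains only tabs and/or spaces")
        env.pop(result_name, None)  # leave the result variable undefined
        return
    if side in (None, "/r"):
        value = value.rstrip(WHITESPACE)
    if side in (None, "/l"):
        value = value.lstrip(WHITESPACE)
    # Passing the same name twice overwrites the original string.
    env[result_name] = value
```

For example, strim(env, "s", "s") replaces s with its trimmed value, just as passing the same variable name twice does in the Batch version.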
Next, we search through the Batch file for lines that begin with three colons. Any matches are read in, the three colons are stripped, and the remainder of each line is stored in a numbered variable. These lines are located at the end of the Batch file and supply the sample strings used to show how the :strim subroutine works. Add your own examples and try it out for yourself.
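The sample-collection step can also be modelled in Python. This is a hypothetical sketch, and the str1, str2, ... variable names are invented for illustration. Storing the samples on lines that begin with colons is safe because cmd treats such lines as labels and never executes them.

```python
# Hypothetical model of how the demo collects its sample strings: every line
# of the script that begins with ":::" becomes a numbered variable.
# The "str" prefix for the numbered names is an assumption for illustration.

def collect_samples(script_text, prefix=":::"):
    samples = {}
    count = 0
    for line in script_text.splitlines():
        if line.startswith(prefix):
            count += 1
            # Strip the three colons and keep everything else verbatim,
            # including leading and trailing tabs and spaces.
            samples[f"str{count}"] = line[len(prefix):]
    return samples
```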
And now we come to the heart of the program: the :strim subroutine. After all the precautionary checks have been performed, the str variable should contain at least one character that isn't a space or a tab. The need to ensure this will become clear later on.
Assuming both ends of the string are to be trimmed, we first remove any trailing whitespace. This is done by executing an indefinite for /l loop inside the in (...) clause of a for /f loop.
The stop variable is checked at the start of each iteration of the for /l loop. If it's not defined, the pos variable is decremented by 1 and then used as an index into str to retrieve the last (second last, third last, and so on) character of str. This character is stored in the chr variable. If chr is neither a space nor a tab, stop is set to 1. If chr is a space or a tab, the loop goes round again until stop becomes defined.
When stop finally becomes defined, the pos variable is incremented by 1 (because of the idiosyncratic nature of Batch's notation for substrings) using set /a, which also outputs the result as a side effect. The subshell inside the for /f loop's in (...) clause then exits, terminating the indefinite for /l loop.
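Here is a rough Python model of the backward scan just described. It is a sketch, not the author's code. A while True loop stands in for the indefinite for /l loop, and returning from the function stands in for the subshell exiting. The assertion expresses the precondition established by the earlier checks: without at least one character that isn't a space or a tab, the scan has nowhere to stop.

```python
# Python model of the right-trim scan (a sketch, not the author's Batch code).

def trailing_pos(s):
    """Return pos <= 0: minus the number of trailing spaces/tabs in s."""
    # Precondition from :strim's checks: s has a non-whitespace character.
    assert s.strip(" \t"), "s must contain a character that isn't a space or tab"
    pos = 0
    stop = False
    while True:                  # stands in for the indefinite for /l loop
        if stop:
            # The last decrement landed on a non-whitespace character, so
            # step back by one; pos now counts only the trailing whitespace.
            pos += 1
            return pos           # stands in for set /a output + subshell exit
        pos -= 1
        char = s[pos]            # last, second last, ... character of s
        if char not in " \t":
            stop = True
```

For instance, a string ending in a space, a tab and another space yields -3, while a string with no trailing whitespace yields 0.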
At this point, the pos variable should hold the number of characters that follow the last character in the string that isn't a space or a tab, expressed as a negative integer. This value is used to right-trim the string.
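The right-trim itself relies on Batch's substring notation. In Batch, a negative length such as !str:~0,-3! keeps everything except the last three characters, and Python's s[:-3] slice behaves the same way. The sketch below is a hypothetical model, not the author's code. It also guards the pos == 0 case, because both s[:0] and %str:~0,0% produce an empty string rather than the untouched original.

```python
# Python model of the right-trim step (a sketch under the assumptions above).

def right_trim(s, pos):
    """Drop the -pos trailing characters counted by the backward scan."""
    if pos > 0:
        raise ValueError("pos must be zero or negative")
    if pos == 0:
        # No trailing whitespace: return s unchanged instead of s[:0] == "".
        return s
    return s[:pos]  # like !str:~0,%pos%! with a negative pos
```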
Next, a sequence of substitutions must be performed on the string to prevent it from being corrupted if the subroutine was called with delayed expansion enabled.
Lastly, we exploit the well-known trick of using "tokens=*" in a for /f loop to trim leading whitespace from a string. At the same time, we use the loop to pass the string over the endlocal boundary and return it to the calling context stored in the result variable.
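The final step can be modelled in Python, too. This is a sketch, and the list-of-dicts "scope stack" is a hypothetical stand-in for setlocal/endlocal. With the default delimiters (space and tab), "tokens=*" skips any leading delimiters and hands the rest of the line to the loop variable unchanged. That value survives endlocal, so it can then be copied into the caller's scope.

```python
# Python model of the "tokens=*" trim and the return across endlocal
# (a sketch, not the author's Batch code).

DEFAULT_DELIMS = " \t"


def tokens_star(line, delims=DEFAULT_DELIMS):
    """What the loop variable receives from for /f "tokens=*" for one line."""
    return line.lstrip(delims)


def return_trimmed(scopes, result_name, local_name):
    """Left-trim a local variable and hand it back across endlocal."""
    value = tokens_star(scopes[-1][local_name])  # the for /f loop variable
    scopes.pop()                                 # endlocal: locals vanish
    scopes[-1][result_name] = value              # set the caller's variable
```

Note that interior and trailing whitespace are left alone. Only the leading tabs and spaces are removed.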
And we’re done! 🙂
The program above would not have been possible without the following:
Aacini’s post on the ultimate while loop in Batch.
Dave Benham demonstrates how to escape special characters inside an in (...) clause in this DosTips thread.
Jeb explains his safe return technique. He posts a subroutine that can return a string containing any character (except the null character, of course) without corrupting it, regardless of the state of delayed expansion when it was called. Cool. 😎
You may have noticed two lines ending with an exclamation mark in the code above. They weren't typos! Jeb explains the exclaims in this post. 😈
Request for Comments
What do you think? Do you have any questions? Did you spot any mistakes? Would you like to tell me about some related link or resource?