Trim Leading and Trailing Whitespace from a String

Posted On Mon, 30 Sep 2013

Filed under Batch

Hello All! 🙂

I’m back after a long hiatus with a Batch file that trims leading and trailing whitespace from a string. The program won’t choke on “poison characters” and doesn’t care about the state of delayed expansion. At last, something other people may find useful! 😉

To achieve this, however, I’ve had to employ several advanced Batch programming techniques. The code may be difficult for beginners to follow, but I’ve tried to explain how the program works and I’ve included links to fuller explanations of some of the techniques used.

Program

The program is in two parts. First, some sample strings are read in and stored in variables. The strings contain “poison characters” and lots of tabs and spaces at either end.

The names of these variables are then passed twice to the second part of the program, the :strim subroutine. The first pass runs with delayed expansion disabled and the /l switch set to trim leading whitespace; the second runs with delayed expansion enabled and the /r switch set to trim trailing whitespace.

Please note that results must always be displayed with delayed expansion enabled to prevent echo from choking on special characters.
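As a small illustration of that last point, here is a minimal sketch that stores a string containing characters cmd.exe normally treats as operators and then displays it. The variable name str and its sample value are made up for this example and are not part of the program below.

```batch
@echo off
setlocal disabledelayedexpansion
rem The surrounding quotes protect the special characters during assignment.
set "str=a & b | c > d"
rem Echoing the value with percent expansion would be unsafe here, because
rem the expanded text is parsed again and the operators would take effect.
setlocal enabledelayedexpansion
rem Delayed expansion happens after parsing, so the value is shown verbatim.
echo(!str!
endlocal
endlocal
```

Run as a Batch file, this should display the string unchanged, ampersand, pipe and all.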

The :strim subroutine accepts three arguments:

  1. The name of the variable to contain the trimmed string.
  2. The name of the variable to be trimmed.
  3. An optional /l or /r (left or right) switch to specify whether leading or trailing whitespace should be trimmed from the string (both sides are trimmed by default).

The subroutine will print an error message to stderr and exit with code 1 if the variable to be trimmed is undefined or contains a line feed. If the variable to be trimmed consists entirely of tabs and/or spaces, :strim will pare it down to nothing, output a warning message, and return an undefined (i.e., empty) result variable.

Assuming all goes well, the :strim subroutine will return the trimmed string stored in the result variable. Specify the same variable name twice when calling :strim if you want the original string to be overwritten by the trimmed string.

@if (@X)==(@Y) @goto dummy @end /* Batch
@echo off & setlocal enableextensions disabledelayedexpansion
cls & (call;)
(set lf=^

)
for /f delims^= %%t in ('cscript //e:jscript //nologo "%~dpf0"') do set tab=%%t

for /f delims^= %%v in ('findstr /bl ::: "%~dpf0"') do (
set /a n+=1 & set "val=%%v"
setlocal enabledelayedexpansion
for /f "tokens=1*" %%a in ("!n! "!val:~3!"") do (
endlocal & set "val%%a=%%~b")
)

echo(leading whitespace strimmed without delayed expansion
for /l %%i in (1 1 %n%) do call :strim res%%i val%%i /l
setlocal enabledelayedexpansion
for /l %%i in (1 1 %n%) do if defined res%%i echo(%%i:[!res%%i!]
echo(&echo(trailing whitespace strimmed with delayed expansion
for /l %%i in (1 1 %n%) do call :strim res%%i val%%i /r
for /l %%i in (1 1 %n%) do if defined res%%i echo(%%i:[!res%%i!]
endlocal

endlocal & goto :eof

:strim result= original= [/l|/r]
:: trims leading and trailing tabs and spaces from a string
:: http://wp.me/p2x3If-4t for more info
setlocal
set "nodelay=!"
setlocal enabledelayedexpansion
if not defined %2 (>&2 echo(var "%2" not defined
endlocal & endlocal & exit /b 1) else set "str=!%2!"
for %%l in ("!lf!") do if "!str!" neq "!str:%%~l=!" (
>&2 echo(var "%2" contains newlines
endlocal & endlocal & exit /b 1)
set "notblank="
for /f tokens^=*^ eol^= %%# in ("!str!") do set notblank=1
if not defined notblank (
>&2 echo(var "%2" has been strimmed down to nothing
endlocal & endlocal & set "%1=" & exit /b 0)

for %%v in ("left=" "right=" "stop=" "pos=0") do set %%v
set "side=%3" & set "side=!side:~1,1!"
if "%side%"=="l" (set left=1) else if "%side%"=="r" set right=1

if not defined left for /f %%r in ('
cmd /von /q /c for /l %%i in (^) do ^
if not defined stop (set /a pos-^=1 ^^^^^>nul ^^^^^& ^
for %%p in (^^!pos^^!^) do set "chr=^!str:~%%p,1^!" ^^^^^& ^
if "^!chr^!" neq " " if "^!chr^!" neq "%tab%" set stop^=1^
) else (set /a pos+^=1 ^^^^^& exit 0^)
') do set pos=%%r
if %pos% lss 0 set "str=!str:~0,%pos%!"

if not defined nodelay set "str=!str:^=^^^^!"
set "str=!str:"=""!"
if not defined nodelay set "str=%str:!=^^^!%" !
set "str=!str:""="!"

if not defined right (set "option=tokens^=*") else set "option=delims^="
for /f %option%^ eol^= %%a in ("!str!") do (
endlocal & endlocal & set "%1=%%a" !)
exit /b 0

:: change BS, ESC and TAB to their ctrl-char equivalents
:::    "..\..\path to\myprog.cmd    <TAB>
:::
:::<TAB>  <TAB><TAB>    <TAB>
:::!random!<TAB>%date% ^"<TAB>
::: <TAB> ^&^
:::  "^&"&<TAB><TAB>
:::  Bang!  DOUBLE!! <TAB> single  caret ^<TAB>"double caret ^^
:::<TAB><BS>backspace  esc=[<ESC>] ^^!" %% ^^^&& << || >>?<TAB>

JavaScript */
WScript.Echo("\x09");

Discussion

First, we have to store a tab in a variable. There’s no one way to do this that works across all versions of Windows, so we have to cheat and use JavaScript. See my post on generating control characters for a detailed example.

Next we search through the Batch file looking for lines that begin with three colons. Any matches are read in, the three colons are stripped, and the remainder of the line is stored in a numbered variable. These sample lines are located at the end of the Batch file and are used to show how the :strim subroutine works. Add your own examples and try it out for yourself.
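The self-reading step can be tried on its own. The following is a simplified sketch rather than the code from the listing: it only displays the matching lines instead of storing them in numbered variables, and the two sample lines at the end are invented for the demonstration.

```batch
@echo off
setlocal disabledelayedexpansion
rem Find every line of this file that begins with three colons.
for /f "delims=" %%v in ('findstr /b /l /c:":::" "%~f0"') do (
    set "line=%%v"
    rem Enable delayed expansion only after the assignment so that
    rem exclamation marks in the data are not consumed.
    setlocal enabledelayedexpansion
    rem Strip the three-colon prefix and show the rest in brackets.
    echo([!line:~3!]
    endlocal
)
endlocal & goto :eof

:::  first sample
:::second sample
```

The brackets make any leading whitespace visible, so the first sample should appear with its two leading spaces intact.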

And now we come to the heart of the program: the :strim subroutine. After all the precautionary checks are performed, the str variable should contain at least one character that isn’t a space or a tab. The need to ensure this will become clear later on.

Assuming both ends of the string are to be trimmed, we first remove any trailing whitespace. This is done by executing an indefinite for /l loop inside the in (...) clause of a for /f loop.

The stop variable is checked at the start of each iteration of the for /l loop. If it’s not defined, the pos variable is decremented by 1 and is then used as an index into str to retrieve the last (second last, third last, and so on) character of str. This character is stored in the chr variable. If chr is neither a space nor a tab, stop is set to 1. If chr is a space or a tab, the loop goes round again until stop becomes defined.
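To make the scan easier to follow, here is a deliberately simplified sketch of the same idea. It is not the technique used in :strim: it swaps the endless for /l loop in a subshell for a plain goto loop, it only checks for spaces (not tabs), and it assumes the string contains at least one character that isn’t a space. The sample value is hypothetical.

```batch
@echo off
setlocal enabledelayedexpansion
rem Sample value: three letters followed by two trailing spaces.
set "str=abc  "
set "pos=0"

:scan
rem Step one character further back from the end of the string.
set /a pos-=1
set "chr=!str:~%pos%,1!"
rem Keep scanning while the character is a space.
if "!chr!"==" " goto scan

rem Step forward again so that pos counts only the whitespace characters.
set /a pos+=1
if %pos% lss 0 set "str=!str:~0,%pos%!"
echo([!str!]
endlocal
```

With the sample value, pos ends up as -2 and the script should display [abc]. A string made up entirely of spaces would keep this loop running forever, which is why the precautionary checks matter.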

When stop finally becomes defined, the pos variable is incremented by 1 (because of the idiosyncratic nature of Batch’s notation for substrings) using set /a (which also outputs the result as a side-effect), and the subshell inside the for /f loop’s in (...) clause exits, terminating the indefinite for /l loop.
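The side-effect mentioned here is easy to see in isolation. In this minimal sketch (the expression and the variable name result are just placeholders), the command inside the in (...) clause runs in a child cmd.exe, where set /a prints the value it computes, and the for /f loop captures that output.

```batch
@echo off
setlocal
rem The loop below runs the quoted command in a child cmd.exe,
rem where set /a prints the value of the expression it evaluates.
for /f %%r in ('set /a 5+1') do set "result=%%r"
echo(%result%
endlocal
```

This should display 6. Inside a Batch file proper, set /a stays silent, which is why the subshell is needed to get the number back out.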

At this point, the pos variable should contain the number of trailing spaces and tabs, expressed as a negative integer (or 0 if there are none). If it is negative, it is used as the length in a substring expansion to right-trim the string.

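Next, a sequence of substitutions must be performed on the string to prevent it from being corrupted if the subroutine was called with delayed expansion enabled. Carets and exclamation marks are escaped (with quotation marks temporarily doubled along the way), because the final assignment is parsed in the caller’s context, where delayed expansion would otherwise consume them.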

Lastly, we exploit the well-known trick of using "tokens=*" in a for /f loop to trim leading whitespace from a string. At the same time, we use the loop to pass the string over the endlocal boundary and return it to the calling context stored in the result variable.
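Stripped of everything else, the "tokens=*" trick looks something like the sketch below. It is a bare-bones version under friendly assumptions: the sample string is hypothetical and contains no poison characters, so plain percent expansion is safe here, whereas :strim uses !str! and also sets eol to nothing so that a leading semicolon isn’t treated as a comment marker.

```batch
@echo off
setlocal disabledelayedexpansion
set "str=     leading spaces removed"
rem tokens=* skips the leading delimiters, which are space and tab by default,
rem and assigns the remainder of the line to the loop variable.
for /f "tokens=*" %%a in ("%str%") do set "str=%%a"
echo([%str%]
endlocal
```

This should display [leading spaces removed], with the brackets showing that the leading whitespace is gone.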

And we’re done! 🙂

Related Links

The program above would not have been possible without the following:

Request for Comments

What do you think? Do you have any questions? Did you spot any mistakes? Would you like to tell me about some related link or resource?

Whatever it is, I want to hear about it! I’m always interested in your feedback. Please leave a comment below or contact me by email or on Twitter.
