Scan Text Files for Control Characters
A large part of programming in Batch is taken up with processing text files. By “text file”, I mean a plain text file with Windows line endings. And “plain text” means no nasty control characters such as Control-Z or the infamous Null Character.
For example, the former is used by copy /a and type as the end-of-file marker, while echo interprets the latter as the end of input.
So it’s always a good idea to scan any text file of unknown origin for these troublesome characters before doing anything else. That’s why I wrote the ctrlscan.cmd program described below.
The program accepts a list of filenames, which may contain wildcards. Enter ctrlscan /? for basic usage info.
Two things worth noting at this point are:
Queue revealed in this DosTips topic that (call;) sets the dynamic variable errorlevel to 0 and (call) sets it to 1. A great tip and so useful! I’ve made extensive use of it throughout the program, so if you’re wondering what all those empty call statements are about, now you know. 🙂
Anyway, the location of the file of control characters is stored in the ctrls variable and defaults to %tmp%\ctrls.tmp, but feel free to change it to suit your own needs. If the file does not exist, a new one is created.
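To see the (call;) and (call) trick in isolation, here is a minimal demo. It is only an illustration of the technique, not an excerpt from ctrlscan.cmd:

```batch
@echo off
rem (call;) runs an empty CALL that succeeds, so errorlevel becomes 0.
(call;)
echo After (call;) errorlevel is %errorlevel%

rem (call) has nothing to call and fails, so errorlevel becomes 1.
(call)
echo After (call) errorlevel is %errorlevel%
```

Run it and it should report 0 and then 1. A typical pattern is to reset errorlevel with (call;) before a series of checks and set it with (call) whenever one of them fails, so the final errorlevel says whether anything went wrong.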
Btw, Windows XP users should comment and uncomment the lines indicated in the code.
The for loop expands any wildcards (e.g., *.htm?) in the command-line parameter (%1) into a list of filenames and assigns each one in turn to the %%f loop variable.
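Here is the wildcard expansion on its own, as a minimal sketch. Save it under any name you like (listfiles.cmd is just a placeholder) and run something like listfiles *.htm? to see the filenames it produces:

```batch
@echo off
rem A plain FOR (no /f) expands any wildcards in %1 into the matching
rem filenames and assigns each one in turn to %%f.
rem %%~f strips any surrounding quotes from the name.
for %%f in (%1) do echo "%%~f"
```

One catch: if %1 contains no wildcards, FOR passes the name through unchanged whether or not the file exists, which is one reason the validity checks in the next step are needed.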
If a filename passes all the usual validity checks, findstr performs a literal search (the /l switch) of the file for every string in %ctrls% (that’s what /g:"%ctrls%" does), and the /m switch ensures that only the filename is sent to output if there is a match. If nothing is found, the program moves on to the next file.
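A minimal sketch of this first scan for a single file might look like the following. It assumes %ctrls% already exists and holds one control character per line (findstr /g treats each line as a separate search string); the file variable and the messages are hypothetical, not taken from ctrlscan.cmd:

```batch
@echo off
setlocal
rem Default location from the article; assumed to exist already.
set "ctrls=%tmp%\ctrls.tmp"
rem Hypothetical: check the single file named by the first argument.
set "file=%~1"
if not exist "%ctrls%" (echo Cannot find "%ctrls%" & exit /b 2)
if not exist "%file%" (echo Cannot find "%file%" & exit /b 2)

rem /l = literal search, /g: = read the search strings from %ctrls%,
rem /m = print only the filename if any of the strings occurs in the file.
findstr /l /m /g:"%ctrls%" "%file%" || echo No control characters found in "%file%".
```

Because findstr sets errorlevel to 0 on a match and 1 otherwise, the || branch runs only for a clean file.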
If there is a match, the file is scanned a second time by a similar findstr command, only this time with /n, which numbers every line containing an offending character. A surrounding for /f loop captures each line number in its %%l variable. Finally, each line number is displayed along with the filename in an error message.
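The second pass can be sketched like this for a single file. Again, the file variable is hypothetical and this is an illustration of the technique, not the original code:

```batch
@echo off
setlocal
set "ctrls=%tmp%\ctrls.tmp"
rem Hypothetical: the single file to report on, from the first argument.
set "file=%~1"

rem findstr /n prefixes every matching line with "number:". Splitting each
rem output line at ":" (delims=:) leaves just the line number in %%l.
for /f "delims=:" %%l in ('findstr /l /n /g:"%ctrls%" "%file%"') do (
    echo Error: line %%l of "%file%" contains a control character.
)
```

This relies on a single, explicitly named file: in that case findstr does not prefix its output with the filename, so the first colon-delimited token is the line number.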
The for loop exits when all the expanded filenames have been processed. The next command-line parameter is then read in (shift /1), overwriting the current value of %1. If the new %1 is not empty, the program goes back to the :loop label and starts all over again. Otherwise, the program terminates.
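The control flow just described can be sketched as follows, with the per-file checks replaced by a simple echo. It mirrors the description above but is not the original code:

```batch
@echo off
:loop
rem Expand any wildcards in the current argument and process each match.
for %%f in (%1) do echo Processing "%%~f"
rem Discard %1; the remaining arguments each move down one place.
shift /1
rem Start over while another argument remains; otherwise fall off the end.
if not "%~1"=="" goto loop
```

Called with two arguments such as *.txt *.htm?, a script structured like this makes two passes through the loop, one per argument.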
The program exits with goto :EOF rather than exit /b 0 in order to preserve the value of errorlevel.
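Here is a small, hypothetical demo of why the choice matters. The :check subroutine sets errorlevel to 1 with (call) and returns with goto :EOF, so the caller still sees 1; ending :check with exit /b 0 instead would overwrite it with 0:

```batch
@echo off
call :check
echo errorlevel after :check is %errorlevel%
goto :EOF

:check
rem Hypothetical check that "fails": (call) sets errorlevel to 1.
(call)
rem goto :EOF returns to the caller without changing errorlevel.
goto :EOF
```

The echo line should report an errorlevel of 1.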
Please note that ctrlscan does not search for the control characters Tab (HT), Line Feed (LF), or Carriage Return (CR).
ctrlscan works with Unix text files as well as files with Windows line endings. But files from systems that use CR as the line terminator (such as Mac OS 9 or earlier) are treated as one long line.
ctrlscan is not suitable for Unicode text files.
ctrlscan has no limit on line length.
Well, that’s about it for now. The next version of this program will not only find control characters, but remove them as well! A program like that would be in real danger of being useful. 😉
Watch this space for updates. And in the meantime, feel free to leave a comment with any thoughts or suggestions you might have on the subject.
Read Microsoft Knowledge Base Article 934576 if you are a Windows XP user wishing to install
repl on the DosTips forum is a utility that accepts two strings as arguments. It searches input for the former, replaces any matches with the latter, and writes the result to output. Supports regular expressions.
FNR is an open source tool to find and replace text in multiple files. Actively maintained. Small download. No dependencies. Runs from the command line as well as the Windows UI. Support for regular expressions.
See also my post on how to generate any control character.