A More Robust Line Counter
In a previous post, I talked about how to count the number of lines in a text file. I explained the technique of piping the output from type file.txt into find /c /v "" and wrapping the whole thing inside a for /f loop to store the result in a variable. A simple and effective solution to a common Batch programming task. 🙂
Too bad it doesn’t work…
The problem is that sometimes you can't make assumptions about your input. There could be anything in that file you're trying to process: extremely long lines, Unix line endings, or control characters (including my old friend the Null Character).
For reasons unexplained, find outputs null characters as newlines (as does more), which means the old type file.txt | find /c /v "" trick will give an incorrect count if the input file contains any of the little pests. 😮
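To make the problem concrete, here is a small Python model (not Batch) that compares an LF-based line count with a count in which every NUL byte is treated as an extra line break. Treating NUL as a line break is an assumption standing in for the find behaviour described above, and both function names are made up for illustration.

```python
def count_lines_lf(data: bytes) -> int:
    """Count lines by LF bytes, plus one for a final line with no LF."""
    count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        count += 1
    return count


def count_lines_find_model(data: bytes) -> int:
    """Hypothetical model of `type file.txt | find /c /v ""`:
    assume every NUL (0x00) is output as an extra line break."""
    return count_lines_lf(data.replace(b"\x00", b"\n"))


# Three Windows-style lines; the second one contains a NUL.
sample = b"first line\r\nsecond\x00line\r\nthird line\r\n"
print(count_lines_lf(sample))          # 3
print(count_lines_find_model(sample))  # 4
```

One NUL is enough to inflate the modelled count by one, which is the kind of error the LF-based approach below avoids.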
A more robust solution is needed in these situations: one that won't be tripped up by anything in the input file.
The program listed below makes extensive use of findstr to determine the number of lines in a text file by searching for Line Feeds (ASCII 10, LF). LF marks the end of a line in both Windows and Unix text files: Windows ends each line with CR LF, Unix with LF alone.
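```bat
@echo off & setlocal enableextensions
rem Define lf as a single Line Feed character.
rem The blank line inside the parentheses is required.
(set lf=^

)
call :LFcount lc "%~1"
echo(file "%~1" has %lc% lines
endlocal & goto :EOF

:LFcount
rem First parameter: name of the variable that receives the count.
rem Second parameter: name of the file to examine.
setlocal enabledelayedexpansion
rem If some line lacks a terminating LF, it must be the last line: use its number.
rem Otherwise number all lines, append a sentinel line, and subtract 1.
findstr /mv "!lf!" "%~2" >nul && (
  for /f delims^=: %%n in ('cmd /v:on /c findstr /nv "^!lf^!" "%~2" 2^>nul') do set lastline=%%n
) || (
  for /f delims^=: %%n in ('(findstr /n "^" "%~2" ^& echo(#^) ^| findstr /bn # 2^>nul') do set /a lastline=%%n-1
)
endlocal & set "%1=%lastline%" & exit /b 0
```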
The main program in this example does nothing except define the lf variable, call the subroutine, and display the result. I've covered everything you would normally expect to find in the main program (such as file validity checks, command line parameter processing, error messages, etc.) in previous posts. No need to go over it all again… plus the fact that I'm extremely lazy. 😉
The :LFcount subroutine accepts two parameters: a variable name and the name of a file. The number of lines in the latter will be stored in a variable named after the former.
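First, does the file end with a line feed? The command findstr /mv "!lf!" succeeds if any line in the file lacks a terminating line feed. If so, findstr /nv outputs the line number and contents of every such line. But there can only be one such line, and it has to be the last line of the file. Its line number is captured by the surrounding for /f loop (delims=: separates the number from the line's contents) and stored in a variable. The inner findstr is run through cmd /v:on because for /f executes its command in a new instance of cmd, where delayed expansion is off by default and !lf! would not be expanded.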
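The first branch can be modelled in Python as follows. This is only a sketch of the logic, assuming lines are split at LF alone; lf_lines and last_unterminated_line_number are hypothetical names, not part of the Batch code.

```python
import re
from typing import List, Optional


def lf_lines(data: bytes) -> List[bytes]:
    """Split data into lines at LF only, keeping each LF terminator.
    A final line without an LF is kept as-is."""
    return re.findall(rb"[^\n]*\n|[^\n]+", data)


def last_unterminated_line_number(data: bytes) -> Optional[int]:
    """Sketch of the first branch: number every line (like findstr /n),
    keep the ones that do not end with LF (like findstr /v), and return
    the number of the last one. Only the final line can lack an LF."""
    last = None
    for number, line in enumerate(lf_lines(data), start=1):
        if not line.endswith(b"\n"):
            last = number  # like "set lastline=%%n": later matches overwrite earlier ones
    return last


print(last_unterminated_line_number(b"one\r\ntwo\r\nthree"))  # 3
print(last_unterminated_line_number(b"one\r\ntwo\r\n"))       # None
```

A None result corresponds to findstr /mv failing, which sends the Batch code to the second branch.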
If the file does end with a line feed, the procedure is a little more complicated. The output from findstr /n "^" "%~2", which prefixes every line of the file with its line number and a colon, is followed by the output of echo(#, and the combined stream is piped into findstr /bn #, which outputs the line number and contents of all lines found beginning with #. Again, if you think about it, there can only be one match: every line from the file now begins with a digit, so only the # line after the last line in the file can match. The line number is captured by the surrounding for /f loop, 1 is subtracted from it, and the result is stored in a variable.
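Here is the sentinel trick from the second branch modelled in Python. Again this is a sketch with a hypothetical function name; it shows why the appended # line is the only possible match even when lines in the file themselves start with #.

```python
import re


def count_with_sentinel(data: bytes) -> int:
    """Sketch of the second branch, for files that end with LF (or are empty).

    1. Prefix each line with "N:" (like findstr /n "^").
    2. Append a line containing just "#" (like echo(#).
    3. Number the combined stream and keep lines that begin with "#"
       (like findstr /bn #). Every file line now begins with a digit,
       so only the sentinel can match.
    4. Subtract 1 from the sentinel's line number.
    """
    file_lines = re.findall(rb"[^\n]*\n|[^\n]+", data)
    numbered = [b"%d:" % n + line for n, line in enumerate(file_lines, start=1)]
    stream = numbered + [b"#\n"]
    matches = [n for n, line in enumerate(stream, start=1) if line.startswith(b"#")]
    if len(matches) != 1:
        raise ValueError("expected exactly one line starting with #")
    return matches[0] - 1


# Lines starting with "#" in the file do not confuse the count.
print(count_with_sentinel(b"# a comment\n#another\nplain\n"))  # 3
print(count_with_sentinel(b""))                                # 0
```

An empty file yields only the sentinel on line 1, so the count comes out as 0 rather than needing a special case.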
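The result is passed back to the main program via a variable named after the first parameter. This works because %lastline% on the final line is expanded when the line is parsed, before endlocal discards the subroutine's local variables.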
The above code returned the correct number of lines for files with Windows or Unix line endings (but not Mac OS 9 or earlier, which end each line with a lone CR), files with extremely long lines, large files with over 120k lines, and files with lines containing control characters. The result is returned without any noticeable delay, even for large files. I haven't tested the program with Unicode text files.
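The Mac OS 9 limitation follows from counting LFs, as this small Python check illustrates (the sample bytes are made up for the example).

```python
# Classic Mac OS (9 and earlier) ended each line with a lone CR (0x0D).
classic_mac = b"alpha\rbeta\rgamma\r"

print(classic_mac.count(b"\r"))  # 3: three CR-terminated lines
print(classic_mac.count(b"\n"))  # 0: no LF at all, so an LF-based count sees one unterminated line
```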
Request for Comments
What do you think? Do you know a better way to do it? Is there an error in my code? Would you like to tell me about a link related to this post? Did you find a broken link or spelling mistake?